
Experiment-14:

Write a program to Implement Support Vector Machines

What is a Support Vector Machine?

It is a supervised machine learning algorithm in which we try to find the hyperplane that best separates the two
classes. Note: don't confuse SVM with logistic regression. Both algorithms try to find the best separating
hyperplane, but the main difference is that logistic regression takes a probabilistic approach (it models class
probabilities), whereas a support vector machine takes a geometric approach based on maximizing the margin.

When to use logistic regression vs Support vector machine?

Depending on the number of features you have, you can choose either Logistic Regression or SVM.

SVM works best when the dataset is small and complex. It is usually advisable to first use logistic regression
and see how it performs; if it fails to give good accuracy, you can go for SVM without a kernel, i.e. a linear
SVM (more about kernels in a later section). Logistic regression and a linear SVM have similar performance,
but depending on your features, one may be more efficient than the other.

Types of Support Vector Machine

Linear SVM

Linear SVM can be used only when the data is perfectly linearly separable. Perfectly linearly separable
means that the data points can be classified into 2 classes using a single straight line (in 2D).

Non-Linear SVM

When the data is not linearly separable we can use Non-Linear SVM: when the data points cannot be
separated into 2 classes by a straight line (in 2D), we use more advanced techniques such as kernel tricks to
classify them. In most real-world applications the data points are not linearly separable, so we use the
kernel trick to handle them.

Support Vectors: These are the points closest to the hyperplane. The separating line is defined with the help
of these data points.

Margin: the distance between the hyperplane and the observations closest to it (the support vectors). In SVM
a large margin is considered a good margin. There are two types of margins, hard margin and soft margin;
more on these in a later section.
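
For intuition, here is a minimal, self-contained sketch (scikit-learn, with a made-up toy dataset; not part of the experiment code below) showing how the support vectors and the margin width 2/||w|| of a fitted linear SVM can be inspected:

import numpy as np
from sklearn.svm import SVC

# Tiny, invented two-class dataset purely for illustration
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)         # the observations closest to the hyperplane
w = clf.coef_[0]                                           # normal vector w of the separating hyperplane
print("Margin width (2/||w||):", 2 / np.linalg.norm(w))    # the quantity SVM tries to maximize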

1. Linear Support Vector Machine:

The SVM classifier is defined in terms of the support vectors only, so we don't have to worry about the other
observations, since the margin is determined by the points closest to the hyperplane (the support vectors);
in logistic regression, by contrast, the classifier is defined over all the points. Hence SVM enjoys some
natural speed-ups.
Let's understand the working of SVM using an example. Suppose we have a dataset with two classes
(green and blue), and we want to classify a new data point as either blue or green.

To classify these points we could draw many decision boundaries, but which one is the best, and how do we
find it? NOTE: since we are plotting the data points in a 2-dimensional graph we call this decision boundary
a straight line, but if we have more dimensions we call it a "hyperplane".
The best hyperplane is the one that has the maximum distance from both classes, and finding it is the main
aim of SVM. SVM considers the different hyperplanes that classify the labels correctly and chooses the one
that is farthest from the data points, i.e. the one with the maximum margin.

Mathematical Intuition behind Support Vector Machine

Use of Dot Product in SVM:

Consider a random point X; we want to know whether it lies on the right side of the plane or the left side
(positive or negative).
To find this, we first treat the point as a vector (X) and then take a vector (w) perpendicular to the
hyperplane. Let's say the distance from the origin to the decision boundary, measured along w, is 'c'. Now
we take the projection of the vector X onto w.

We already know that the projection of one vector onto another is computed with the dot product. Hence, we
take the dot product of the x and w vectors. If the dot product is greater than 'c', the point lies on the right
side; if it is less than 'c', the point lies on the left side; and if it is equal to 'c', the point lies on the decision
boundary.
Why did we take this vector w perpendicular to the hyperplane? What we want is the distance of the vector
X from the decision boundary, and there are infinitely many points on the boundary from which to measure
that distance. So we standardise: we take the perpendicular as a reference, project all the other data points
onto this perpendicular vector, and then compare the distances.
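
As a tiny sketch of this rule (all numbers below are made up purely for illustration):

import numpy as np

w = np.array([2.0, 1.0])      # vector perpendicular to the decision boundary
b = -2.0                      # offset, so the boundary is w.x + b = 0
X = np.array([3.0, 4.0])      # the new point we want to classify

score = np.dot(w, X) + b      # compare the projection of X onto w against the boundary
if score > 0:
    print("X lies on the positive side")
elif score < 0:
    print("X lies on the negative side")
else:
    print("X lies on the decision boundary")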

In SVM we also have the concept of a margin. In the next section we will see how we find the equation of a
hyperplane and what exactly we need to optimize in SVM.

Margin in Support Vector Machine

We all know the equation of a hyperplane is w.x+b=0, where w is a vector normal to the hyperplane and b is
an offset.

To classify a point as negative or positive we need to define a decision rule. We can define the decision rule
as: if the value of w.x+b>0 then the point is positive, otherwise it is negative. Now we need (w,b) such that
the margin is as wide as possible. Let's call this distance 'd'.

To calculate 'd' we need the equations of the two margin lines L1 and L2 (the lines through the support
vectors on either side of the hyperplane). We assume the equation of L1 is w.x+b=1 and of L2 it is
w.x+b=-1. Now the questions are:

1. Why are the magnitudes equal? Why didn't we take 1 and -2?


2. Why did we take only 1 and -1, and not some other values like 24 and -100?

3. Why did we assume this line at all?

Let’s try to answer these questions:

1. We want our plane to be at an equal distance from both classes, which means L should pass through the
centre between L1 and L2; that's why we take equal magnitudes.

2. Let's say the equation of our hyperplane is 2x+y=2. We observe that even if we multiply the whole
equation by some other number, the line itself doesn't change (try plotting it on a graph). Hence, for
mathematical convenience, we can always rescale w and b so that the margin lines sit at +1 and -1.

3. Now the main question: why exactly do we need to assume this particular line? Let's see with the example
hyperplane 2x+y=2.

With the ±1 convention above, the margin lines for this hyperplane are 2x+y=3 and 2x+y=1, two parallel
lines at equal distance on either side of it.


Optimization function and its constraints

In order to get our optimization function, there are constraints to consider: "we'll calculate the distance (d)
in such a way that no positive or negative point can cross its margin line". Let's write these constraints
mathematically:
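
In the standard hard-margin form these constraints are: w.xi + b >= 1 for every positive point, and
w.xi + b <= -1 for every negative point.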

Rather than carrying two constraints forward, we can combine them into one. We assume that negative-class
points have y=-1 and positive-class points have y=1.

We can then say that for every point to be correctly classified the following condition must hold:
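
yi (w.xi + b) >= 1   for every training point i   (the single condition obtained by combining the two constraints above)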

Suppose a green point is correctly classified; then it satisfies w.x+b>=1, and if we multiply this by y=1 we
get exactly the condition above. Similarly, for a blue point with y=-1, multiplying w.x+b<=-1 by y=-1 again
gives the same condition. Hence, we need to maximize (d) such that this constraint holds true.
We take 2 support vectors, one from the negative class and the second from the positive class. The vector
between these two points x1 and x2 is (x2-x1). What we need is the shortest distance between them, which
can be found using the trick we used with the dot product: we take a vector 'w' perpendicular to the
hyperplane and then find the projection of the (x2-x1) vector on 'w'. Note: this perpendicular vector must be
a unit vector for this to work, as explained in the dot-product section. To make 'w' a unit vector we divide it
by the norm of 'w'.

We already know how to find the projection of one vector onto another: we take the dot product of the two
vectors. So, projecting (x2-x1) onto the unit vector w/||w||, the margin width is:
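
d = (x2 - x1) . w / ||w||    ...(1)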

Since x2 and x1 are support vectors lying on the margin lines, they satisfy yi(w.xi+b)=1, so
we can write:
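
w.x2 + b = 1     ...(2)   (positive support vector, y2 = +1)
w.x1 + b = -1    ...(3)   (negative support vector, y1 = -1)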
Putting equations (2) and (3) in equation (1) we get:
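
d = (w.x2 - w.x1) / ||w|| = ((1 - b) - (-1 - b)) / ||w|| = 2 / ||w||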

Hence the equation which we have to maximize is:
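
maximize 2/||w||   subject to   yi (w.xi + b) >= 1 for all i

which, in the usual formulation, is the same as minimizing ||w||^2 / 2 under the same constraint.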

We have now found our optimization function, but there is a catch: we rarely find this kind of perfectly
linearly separable data in practice, so the condition we derived here often cannot be used directly. The
problem we have just studied is called Hard Margin SVM; Soft Margin SVM is similar, but adds a few more
tricks so that points which violate the margin can be tolerated.

2. Non-Linear Support Vector Machine (Kernels):

The most interesting feature of SVM is that it can even work with a non-linear dataset, and for this we use
the "Kernel Trick", which makes it easier to classify the points. Suppose we have a dataset that cannot be
separated by a straight line:

Here we cannot draw a single line (or hyperplane) that classifies the points correctly. So what we do is
convert this lower-dimensional space to a higher-dimensional space using some mapping functions (for
example quadratic ones), which allows us to find a decision boundary that clearly divides the data points.
The functions that help us do this are called Kernels, and which kernel to use is determined by
hyperparameter tuning.
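
As a quick illustration of the idea (the data below is synthetic and invented for this example), the following sketch lifts 2-D points into 3-D with an explicit quadratic feature and compares a linear SVM before and after the lift with an RBF-kernel SVM, which performs a similar lift implicitly:

import numpy as np
from sklearn.svm import SVC

# Synthetic, non-linearly-separable data: the label depends on the distance from the origin
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 0.5).astype(int)

# Explicitly lift the 2-D points to 3-D by adding the quadratic feature x1^2 + x2^2
X_lifted = np.c_[X, X[:, 0]**2 + X[:, 1]**2]

print("linear kernel, original 2-D :", SVC(kernel='linear').fit(X, y).score(X, y))                # struggles
print("linear kernel, lifted 3-D   :", SVC(kernel='linear').fit(X_lifted, y).score(X_lifted, y))  # usually near perfect
print("rbf kernel, original 2-D    :", SVC(kernel='rbf').fit(X, y).score(X, y))                   # lift done implicitly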
Different Kernel functions

Some kernel functions which you can use in SVM are given below:

1. Polynomial kernel

Following is the formula for the polynomial kernel:
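
In its commonly used form (some libraries add an extra scaling constant), the polynomial kernel between two points X1 and X2 is:

K(X1, X2) = (X1 . X2 + 1)^d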

Here d is the degree of the polynomial, which we need to specify manually.

Suppose each sample has two features X1 and X2 and the output variable is Y. Then, using a degree-2
polynomial kernel, the implicit feature mapping can be written (up to constant factors) as:
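
(X1, X2)  ->  (X1, X2, X1², X2², X1·X2)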

So we basically need X1², X2² and X1·X2, and we can see that the original 2 dimensions got converted into 5
dimensions.

2. Sigmoid kernel

We can use it as a proxy for neural networks. A common form of the equation is:
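
K(X1, X2) = tanh(gamma * (X1 . X2) + r), where gamma and r are hyperparameters (in scikit-learn these correspond to gamma and coef0).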

It passes the input through a tanh (S-shaped) function, similar to a neural-network activation, mapping it into
a bounded range so that the classes can then be separated by a simple straight line.

3. RBF kernel

What it actually does is create non-linear combinations of the features to lift the samples into a higher-
dimensional feature space, where a linear decision boundary can separate the classes. It is the most used
kernel in SVM classification; the following formula describes it mathematically:
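
K(X₁, X₂) = exp( -||X₁ – X₂||² / (2σ²) )   (often written with γ = 1/(2σ²) as exp(-γ ||X₁ – X₂||²))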

where,

1. ‘σ’ controls the width of the kernel (it plays the role of a variance-like hyperparameter)


2. ||X₁ – X₂|| is the Euclidean Distance between two points X₁ and X₂
How to choose the right Kernel?

A common question is how to decide which kernel function will work well for your dataset. Choosing a
good kernel function matters because the performance of the model depends on it.

Choosing a kernel depends entirely on the kind of dataset you are working with. If it is linearly separable,
you should opt for the linear kernel function, since it is easy to use and its complexity is much lower than
that of the other kernel functions. A good starting point is the hypothesis that your data is linearly separable,
using a linear kernel.

You can then work your way up towards the more complex kernel functions. Usually we use SVM with the
RBF and linear kernels, because other kernels, such as the polynomial kernel, are rarely used due to poor
efficiency. But what if the linear and RBF kernels give approximately similar results? Which kernel do we
choose then? Let's understand this with an example; for simplicity I'll take only 2 features, i.e. 2 dimensions.
Consider the decision boundary of a linear SVM trained on 2 features of the iris dataset.
Here we see that a linear kernel works fine on this dataset, but now let's see how the RBF kernel performs.
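
A sketch like the following (using scikit-learn's built-in iris loader and only the two petal features, rather than the Iris.csv file used later) is one way to generate such decision-boundary plots for both kernels:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, 2:4], iris.target            # petal length and petal width only

# Grid covering the feature space, used to colour each region by its predicted class
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))

for kernel in ['linear', 'rbf']:
    clf = SVC(kernel=kernel).fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
    plt.title(kernel + ' kernel decision boundary')
    plt.show()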

We can observe that both kernels give similar results and both work well with our dataset, so which one
should we choose? Linear SVM is a parametric model: a model in which everything learned from the
training data is captured in a fixed set of parameters. In short, once trained, the only information needed to
make a prediction is those parameters.

The complexity of the RBF kernel grows as the training data size increases. Besides being more expensive
to train, the RBF kernel requires keeping the kernel matrix around, and the projection into the "infinite"
higher-dimensional space where the data becomes linearly separable also makes prediction more expensive.
On the other hand, if the dataset is not linear, using a linear kernel doesn't make sense; we would get very
low accuracy.

So for that kind of dataset we can use RBF without a second thought, because it can wrap the decision
boundary around the non-linear classes.
Linear SVM:

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('/content/Iris.csv')

dataset
dataset.head()

dataset.tail()

dataset.isna().sum()

X = dataset.iloc[:, [2, 3]].values

y = dataset.iloc[:, -1].values

"""''''''**NOTE: As we can see labels are categorical. KNeighborsClassifier does not accept string
labels. We need to use LabelEncoder to transform them into numbers. Iris-setosa correspond to 0, Iris-
versicolor correspond to 1 and Iris-virginica correspond to 2.**''''''"""

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

y = le.fit_transform(y)

# Splitting the dataset into a training set and a test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

from sklearn.svm import SVC # "Support vector classifier"

classifier = SVC(kernel='linear', random_state=0)


classifier.fit(X_train, y_train)

# Predicting the Test set results

y_pred = classifier.predict(X_test)

print(" Actual output \n {}".format(y_test),"\n predict outputs:\n {}".format(y_pred))

#confusion_matrix

from sklearn.metrics import confusion_matrix

import seaborn as sns

cm = confusion_matrix(y_test, y_pred)

# Transform to a DataFrame for a labelled heatmap
cm_df = pd.DataFrame(cm, index=['setosa', 'versicolor', 'virginica'], columns=['setosa', 'versicolor', 'virginica'])

plt.figure(figsize=(5.5,4))

sns.heatmap(cm_df, annot=True)

plt.ylabel('True label')

plt.xlabel('Predicted label')

plt.show()

# find accuracy_score

from sklearn.metrics import accuracy_score

print("the Accuracy of given model:",accuracy_score(y_test, y_pred)*100)


# find precision_score

from sklearn.metrics import precision_score

precision = precision_score(y_test, y_pred, average='micro')

print('Precision:', precision*100)

# calculate recall
from sklearn.metrics import recall_score
recall = recall_score(y_test, y_pred, average='micro')  # micro-averaged over all classes
print('Recall:', recall*100)

# f1_score

from sklearn.metrics import f1_score

f1score = f1_score(y_test, y_pred, average='micro')

print('F1 score:', f1score*100)

Non-Linear SVM:

# -*- coding: utf-8 -*-

"""SVM2.ipynb

Automatically generated by Colaboratory.

Original file is located at

https://colab.research.google.com/drive/1ppcAzf8a5DGsAeQaQAxtWEa7QfWm1s5g
# Non-linear Support Vector Machine Example

## Importing and exploring our Data

"""

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset= pd.read_csv('/content/Iris.csv')

dataset.shape

dataset.head()

dataset.tail()

dataset.isna().sum()

X = dataset.iloc[:, [2, 3]].values


y = dataset.iloc[:, -1].values

"""# ''''''**NOTE: As we can see labels are categorical. KNeighborsClassifier does not accept string
labels. We need to use LabelEncoder to transform them into numbers. Iris-setosa correspond to 0, Iris-
versicolor correspond to 1 and Iris-virginica correspond to 2.**''''''"""

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

y = le.fit_transform(y)

# Splitting the dataset into a training set and a test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

"""1. Polynomial Kernel:In the case of polynomial kernel, you also have to pass a value for the
degree parameter of the SVC class. This basically is the degree of the polynomial. Take a look at how we
can use a polynomial kernel to implement kernel SVM:**[bold text](https://)** **bold text**"""

from sklearn.svm import SVC

svclassifier = SVC(kernel='poly', degree=8)

svclassifier.fit(X_train, y_train)

y_pred = svclassifier.predict(X_test)

#confusion_matrix

from sklearn.metrics import confusion_matrix

import seaborn as sns

cm = confusion_matrix(y_test, y_pred)

# Transform to a DataFrame for a labelled heatmap
cm_df = pd.DataFrame(cm, index=['setosa', 'versicolor', 'virginica'], columns=['setosa', 'versicolor', 'virginica'])

plt.figure(figsize=(5.5,4))

sns.heatmap(cm_df, annot=True)

plt.ylabel('True label')

plt.xlabel('Predicted label')

plt.show()

# find accuracy_score

from sklearn.metrics import accuracy_score

print("the Accuracy of given model:",accuracy_score(y_test, y_pred)*100)

# find precision_score

from sklearn.metrics import precision_score

precision = precision_score(y_test, y_pred, average='micro')

print('Precision:', precision*100)
# calculate F1 score

from sklearn.metrics import f1_score

f1score = f1_score(y_test, y_pred, average='micro')

print('F1 score:', f1score*100)

# calculate recall

from sklearn.metrics import recall_score

recall = recall_score(y_test, y_pred, average='micro')

print('Recall:', recall*100)

"""2. Gaussian Kernel(rbf)"""

from sklearn.svm import SVC

svclassifier = SVC(kernel='rbf')

svclassifier.fit(X_train, y_train)

y_pred = svclassifier.predict(X_test)

#confusion_matrix

from sklearn.metrics import confusion_matrix

import seaborn as sns

cm = confusion_matrix(y_test, y_pred)

# Transform to a DataFrame for a labelled heatmap
cm_df = pd.DataFrame(cm, index=['setosa', 'versicolor', 'virginica'], columns=['setosa', 'versicolor', 'virginica'])

plt.figure(figsize=(5.5,4))

sns.heatmap(cm_df, annot=True)

plt.ylabel('True label')

plt.xlabel('Predicted label')

plt.show()

# find accuracy_score

from sklearn.metrics import accuracy_score

print("the Accuracy of given model:",accuracy_score(y_test, y_pred)*100)

# find precision_score

from sklearn.metrics import precision_score

precision = precision_score(y_test, y_pred, average='micro')

print('Precision:', precision*100)

# calculate F1 score

from sklearn.metrics import f1_score

f1score = f1_score(y_test, y_pred, average='micro')

print('F1 score:', f1score*100)

# calculate recall
from sklearn.metrics import recall_score

recall = recall_score(y_test, y_pred, average='micro')

print('Recall:', recall*100)

"""3. Sigmoid Kernel"""

from sklearn.svm import SVC

svclassifier = SVC(kernel='sigmoid')

svclassifier.fit(X_train, y_train)

y_pred = svclassifier.predict(X_test)

#confusion_matrix

from sklearn.metrics import confusion_matrix

import seaborn as sns

cm = confusion_matrix(y_test, y_pred)

# Transform to a DataFrame for a labelled heatmap
cm_df = pd.DataFrame(cm, index=['setosa', 'versicolor', 'virginica'], columns=['setosa', 'versicolor', 'virginica'])

plt.figure(figsize=(5.5,4))

sns.heatmap(cm_df, annot=True)

plt.ylabel('True label')

plt.xlabel('Predicted label')

plt.show()
# find accuracy_score

from sklearn.metrics import accuracy_score

print("the Accuracy of given model:",accuracy_score(y_test, y_pred)*100)

# find precision_score

from sklearn.metrics import precision_score

precision = precision_score(y_test, y_pred, average='micro')

print('Precision:', precision*100)

# calculate F1 score

from sklearn.metrics import f1_score

f1score = f1_score(y_test, y_pred, average='micro')

print('F1 score:', f1score*100)

# calculate recall

from sklearn.metrics import recall_score

recall = recall_score(y_test, y_pred, average='micro')

print('Recall:', recall*100)
