3. Unit 3 ML Part-2 Q&A
10. What are support vectors? Describe Large margin classification in SVM.[7M]July–
2023 Set -3[Remember]
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example:
SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with many images of cats and dogs so that it learns the different features of cats and dogs, and then we test it on this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of these support vectors, it classifies the new instance as a cat. Consider the below diagram:
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Large Margin refers to the wide separation or “margin” between the decision boundary
(hyperplane) and the nearest data points of each class.
In a large margin classification, the SVM tries to find a decision boundary that not only correctly
classifies the data points but also maximizes the margin between the classes.
A large margin tends to generalize better to newly added instances and makes the classifier less sensitive to outliers.
Adding more instances “off the street” will not affect the decision boundary at all; it is fully determined
by the instances located on the edge of the street. These instances are called the support vectors.
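For illustration only (a minimal sketch using scikit-learn and the Iris dataset, both of which are assumptions beyond these notes), the support vectors of a trained linear SVM can be inspected directly:
Code:
from sklearn import datasets
from sklearn.svm import SVC

# Keep two classes and two features so the data is (roughly) linearly separable
iris = datasets.load_iris()
X = iris.data[iris.target != 2, :2]
y = iris.target[iris.target != 2]

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)  # the instances on the edge of the street
print(clf.n_support_)        # number of support vectors per class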
Hard Margin Classification and Soft Margin Classification
Hard margin classification is a type of SVM classification in which the SVM aims to find a decision
boundary (hyperplane) that perfectly separates the data into two classes without any misclassifications.
If we strictly impose that all instances be off the street and on the right side, this is called hard margin
classification.
Hard margin classification only works if the data is linearly separable and is sensitive to outliers, so it is advisable to use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations. This is called soft margin classification.
In soft margin classification, the SVM allows for some misclassifications, and the goal is to find a
decision boundary that still maximizes the margin but tolerates a certain amount of classification errors.
This approach is used when the data is not perfectly separable due to overlapping points or outliers.
Trying to achieve a hard margin in such cases might result in an overly complex and sensitive model.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the points of each class that lie closest to the boundary; these points are called support vectors. The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
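As a brief sketch (the make_blobs data and the particular C values are illustrative assumptions, not part of the original answer), scikit-learn's SVC exposes the hyperparameter C that controls this hard/soft margin trade-off: a small C gives a wide street with more margin violations, while a large C approaches hard margin classification.
Code:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping clusters, so a perfect hard margin is not possible
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

soft = SVC(kernel='linear', C=0.1).fit(X, y)   # wide street, more violations tolerated
hard = SVC(kernel='linear', C=100).fit(X, y)   # narrow street, few violations tolerated

# The soft-margin model typically ends up with more support vectors
print(len(soft.support_), len(hard.support_))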
Non-Linear SVM Classification
Non-Linear SVM can be used to classify data when it cannot be separated into two classes by a
straight line (in the case of 2D). By using kernel functions, nonlinear SVMs can handle nonlinearly separable data: the kernel implicitly maps the original input data into a higher-dimensional feature space where the points become linearly separable, and the linear boundary found in that space corresponds to a nonlinear decision boundary in the original space.
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used
two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated
as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider the below image:
Since we are now in 3-D space, the separating boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space at z = 1, it becomes:
Hence we get a circle of radius 1 in the case of non-linear data.
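A minimal sketch of this idea in scikit-learn (the make_circles data and the explicit feature map are illustrative assumptions): we add the feature z = x² + y² by hand and fit a linear SVM in the lifted 3-D space.
Code:
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not separable by a straight line in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the third dimension z = x^2 + y^2
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X_lifted = np.hstack([X, z])

# A linear SVM can now separate the classes with a plane in 3-D
clf = SVC(kernel='linear').fit(X_lifted, y)
print(clf.score(X_lifted, y))  # close to 1.0 on this toy data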
Gaussian Kernel:
It is used to perform transformation when there is no prior knowledge about data.
Code:
from sklearn.svm import SVC

# x_train, y_train: training features and labels
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(x_train, y_train)
Sigmoid Kernel:
This function is equivalent to a two-layer perceptron model of a neural network, which is used as an activation function for artificial neurons.
Code:
from sklearn.svm import SVC

classifier = SVC(kernel='sigmoid')
classifier.fit(x_train, y_train)  # x_train, y_train: training features and labels
Polynomial Kernel:
It represents the similarity of vectors in the training set of data in a feature space over polynomials of
the original variables used in the kernel.
Code:
from sklearn.svm import SVC

classifier = SVC(kernel='poly', degree=4)
classifier.fit(x_train, y_train)  # x_train, y_train: training features and labels
Linear Kernel:
It is used when the data is linearly separable.
Code:
from sklearn.svm import SVC

classifier = SVC(kernel='linear')
classifier.fit(x_train, y_train)  # x_train, y_train: training features and labels
SVM Regression
13. Explain SVM regression in detail with a neat diagram.[7M]July–2023 Set -
2[Understand]
Support Vector Regression (SVR) is a machine learning technique used for regression tasks. It is a
variant of Support Vector Machines (SVM) and is designed to predict continuous numeric values,
making it suitable for tasks like time series forecasting, stock price prediction, and more.
Key points about SVR:
1. Objective: SVR aims to find a function that predicts a continuous target variable while
maximizing the margin between the predicted values and the actual data points.
2. Margin: SVR identifies a “margin” around the predicted regression line, and its goal is to fit the
line within this margin while minimizing the prediction error.
3. Support Vectors: In SVR, data points that are closest to the regression line and define the
margin are known as “support vectors.” These points play a crucial role in determining the
regression model.
4. Kernel Trick: SVR can use various kernel functions (e.g., linear, polynomial, radial basis
function) to transform the feature space, making it possible to model non-linear relationships
between input features and the target variable.
5. Hyperparameters: SVR requires tuning hyperparameters, such as the regularization parameter (C) and kernel parameters, to achieve the best model performance.
6. Loss Function: SVR typically uses an epsilon-insensitive loss function that allows for some
errors within a defined range (epsilon), and it penalizes errors outside this range more heavily.
7. Complexity Control: The regularization parameter (C) in SVR controls the trade-off between maximizing the margin and minimizing the prediction error. A smaller C leads to a wider margin with more errors allowed, while a larger C results in a narrower margin with fewer errors allowed.
8. Robustness: SVR is robust to outliers, as it primarily focuses on the data points close to the
margin (support vectors) and doesn’t heavily rely on all data points.
SVR is a regression technique that seeks to find a regression model with a margin around the predicted
values, allowing for a balance between fitting the data and avoiding overfitting. It is particularly useful
when dealing with non-linear relationships and can be adapted to various problem domains through the
choice of kernel functions.
Basic implementation of Support Vector Regression (SVR) in Python using the popular machine learning
library, scikit-learn. We’ll use a synthetic dataset for demonstration:
import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt
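The rest of the demonstration is not shown in the notes; the following is a minimal sketch of how it could continue (the noisy sine-curve data and the C and epsilon values are illustrative assumptions):

# Synthetic data: a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Fit an RBF-kernel SVR; C and epsilon control the margin/error trade-off
svr = SVR(kernel='rbf', C=100, epsilon=0.1)
svr.fit(X, y)

# Plot the data and the fitted regression curve
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, svr.predict(X), color='navy', label='SVR fit')
plt.legend()
plt.show()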
The most widely used kernels in SVM are the linear kernel, polynomial kernel, and Gaussian (radial basis function) kernel. The choice of kernel depends on the nature of the data and the task at hand. The linear kernel is used when the data is roughly linearly separable, whereas the polynomial kernel is used when the data has a complicated curved boundary. The Gaussian kernel is employed when the data has no clear boundaries and contains complicated areas of overlap.
Let’s take an example to understand the kernel trick in more detail. Consider a binary classification
problem where we have two classes of data points: red and blue. The data is not linearly separable in the
2D space. We can see this in the plot below:
To make this data linearly separable, we can use the kernel trick.
By applying the kernel trick to the data, we transform it into a higher-dimensional feature space where the
data becomes linearly separable. We can see this in the plot below, where the red and blue data points have
been separated by a hyperplane in the 3D space:
As we can see, the kernel trick has helped us find a solution for a non-linearly separable dataset.
The kernel trick is a powerful technique that enables SVMs to solve non-linear classification problems by
implicitly mapping the input data to a higher-dimensional feature space. By doing so, it allows us to find
a hyperplane that separates the different classes of data.
The kernel trick also helps us handle non-linear datasets that are more complex and cannot be solved or classified on the basis of a straight line.
A linearly separable dataset can be classified by any of the standard classification algorithms, but what about a non-linear one? We cannot place a single straight line between the classes; instead, the green points have to be separated from the red points by drawing a circle around them, and to do so we use the Gaussian RBF kernel function.
Using the Gaussian RBF kernel we can lift the points from the 2-D plane into a 3-D space, shifting all the green points above the red ones through a mapping function such as the Gaussian RBF, which introduces a hyperplane in the new space, as shown below:
This function allows us to introduce a hyperplane that distinguishes the green points from the red points by lifting the green points above the hyperplane and leaving the red points below it.
This is a visualization of how the function works and helps us distinguish between the two groups of points. The Gaussian RBF is an exponentially decreasing function whose centre is the landmark vector l:
K(x, l) = exp(-||x - l||² / (2σ²))
The radius of the base of this "mountain" is controlled by the constant σ (sigma); substituting the landmark vector l and σ into the function gives the desired output.
How to apply a non-linear SVM with the Gaussian RBF kernel in Python:
After importing the dataset and splitting the data into training and test sets, we import the SVC (Support Vector Classifier) class from the SVM module of the scikit-learn library.
Here we pass kernel='rbf', which selects the Gaussian RBF function.
We then compute the accuracy score of the Gaussian RBF model; in this example it achieves an accuracy of 93%, and finally we visualize the test results.
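The notes describe these steps without showing the code; a minimal sketch could look as follows (the make_moons dataset stands in for the notes' dataset, and the feature-scaling step is an added assumption):
Code:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-in non-linear dataset
x, y = make_moons(n_samples=500, noise=0.3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling usually helps the RBF kernel
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# kernel='rbf' selects the Gaussian RBF function
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(x_train, y_train)

# Test accuracy (the notes report about 93% on their dataset)
y_pred = classifier.predict(x_test)
print(accuracy_score(y_test, y_pred))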
16. Explain about Naïve Bayes classifier algorithm with an example.[7M]July–2023 Set
-3, Set -4[Understand]
The Naïve Bayes algorithm's name is made up of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the other features.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the
probability of a hypothesis with prior knowledge. It depends on the conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = [ P(B|A) × P(A) ] / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a hypothesis is true.
P(A) is Prior probability: Probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: Probability of the evidence.
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the weather conditions:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4

Likelihood table of the weather conditions:

Weather     No            Yes
Overcast    0             5             5/14 = 0.35
Rainy       2             2             4/14 = 0.29
Sunny       2             3             5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = 0.30 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 4/14 = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.50 * 0.29 / 0.35 = 0.41

Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player can play the game.
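As an optional check (a minimal sketch; the numeric encoding of Outlook and the use of scikit-learn's CategoricalNB are assumptions beyond the original answer), the same result can be reproduced in Python:
Code:
from sklearn.naive_bayes import CategoricalNB

# Encode Outlook: 0 = Overcast, 1 = Rainy, 2 = Sunny; Play: 0 = No, 1 = Yes
outlook = [[1], [2], [0], [0], [2], [1], [2], [0], [1], [2], [2], [1], [0], [0]]
play = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]

# Near-zero smoothing so the result matches the hand calculation above
model = CategoricalNB(alpha=1e-10)
model.fit(outlook, play)

# Posterior probabilities [P(No|Sunny), P(Yes|Sunny)] - roughly [0.40, 0.60]
print(model.predict_proba([[2]]))
print(model.predict([[2]]))  # predicts 1 (Yes) for a sunny day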