3. Unit 3 ML Part-2 Q&A

PART-B

Support Vector Machine:

10. What are support vectors? Describe Large margin classification in SVM.[7M]July–2023 Set -3[Remember]

Support Vector Machine


Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is
used for Classification as well as Regression problems. However, primarily, it is used for Classification
problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:

Example:

SVM can be understood with the example used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs, and we want a model that can accurately identify whether it is a cat or a dog. Such a model can be built using the SVM algorithm. We first train the model on many images of cats and dogs so that it learns the different features of cats and dogs, and then we test it on this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors); on the basis of these support vectors it will classify the new animal as a cat. Consider the below diagram:
The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Large Margin Classification

 Large Margin refers to the wide separation or “margin” between the decision boundary
(hyperplane) and the nearest data points of each class.

 In a large margin classification, the SVM tries to find a decision boundary that not only correctly
classifies the data points but also maximizes the margin between the classes.

 A large margin gives more reliable classification of newly added instances and makes the classifier less sensitive to outliers.
Adding more instances “off the street” will not affect the decision boundary at all; it is fully determined
by the instances located on the edge of the street. These instances are called the support vectors.
Hard Margin Classification and Soft Margin Classification

Hard margin classification is a type of SVM classification in which the SVM aims to find a decision
boundary (hyperplane) that perfectly separates the data into two classes without any misclassifications.
If we strictly impose that all instances be off the street and on the right side, this is called hard margin
classification.

There are two main issues with hard margin classification.


1. It only works if the data is linearly separable
2. It is quite sensitive to outliers

It is therefore advisable to use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting margin violations. This is called soft margin classification.

In soft margin classification, the SVM allows for some misclassifications, and the goal is to find a
decision boundary that still maximizes the margin but tolerates a certain amount of classification errors.
This approach is used when the data is not perfectly separable due to overlapping points or outliers.
Trying to achieve a hard margin in such cases might result in an overly complex and sensitive model.
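As a rough illustration (not part of the original notes), the sketch below uses scikit-learn's SVC, whose C hyperparameter controls this trade-off: a very large C behaves almost like hard margin classification, while a small C gives a softer margin. The toy data is made up purely for demonstration.

Code:
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2],    # class 1 cluster
               rng.randn(20, 2) - [2, 2]])   # class 0 cluster
y = np.array([1] * 20 + [0] * 20)

hard_like = SVC(kernel='linear', C=1e6).fit(X, y)  # almost no margin violations allowed
soft = SVC(kernel='linear', C=0.1).fit(X, y)       # wider street, some violations tolerated

print("support vectors (hard-like margin):", len(hard_like.support_vectors_))
print("support vectors (soft margin):", len(soft.support_vectors_))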

Linear SVM Classification


11. What is Linear classifier? Explain SVM linear classification.[7M]July–2023 Set -4[Remember]
Linear Classifiers
Linear classifiers make predictions based on a linear combination of the input features. Their decision
boundary in a two-dimensional feature space is a straight line, in three dimensions it’s a plane, and in
higher dimensions, it’s a hyperplane. When data is linearly separable (meaning classes can be separated
with a straight line or hyperplane), linear classifiers can perform exceptionally well. If the data isn’t
naturally linearly separable, sometimes it can be made so through feature transformations or encodings.
Some of the linear classification models are as follows:
 Logistic Regression
 Support Vector Machines having kernel = ‘linear’
 Single-layer Perceptron
 Stochastic Gradient Descent (SGD) Classifier
Logistic regression
Logistic regression is used for binary classification, where we use the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1: if the value of the logistic function for an input is greater than 0.5 (the threshold value), then it belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
Logistic Function – Sigmoid Function
 The sigmoid function is a mathematical function used to map the predicted values to probabilities.
 It maps any real value into another value within the range 0 to 1. The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve.
 The S-shaped curve is called the sigmoid function or the logistic function.
 In logistic regression, we use the concept of a threshold value, which decides between the two classes: values above the threshold tend towards 1, and values below the threshold tend towards 0 (a small sketch follows this list).
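As a small illustrative sketch (assumed, not taken from the notes), the snippet below computes the sigmoid for a few values of z and applies the 0.5 threshold rule described above.

Code:
import numpy as np

def sigmoid(z):
    # squeeze any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probabilities = sigmoid(z)                         # probability-like values in (0, 1)
predictions = (probabilities >= 0.5).astype(int)   # Class 1 if p >= 0.5, else Class 0
print(probabilities)
print(predictions)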

Types of Logistic Regression:


On the basis of the categories, Logistic Regression can be classified into three types:
1. Binomial: In binomial Logistic regression, there can be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types
of the dependent variable, such as “cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent
variables, such as “low”, “Medium”, or “High”.
Single-layer Perceptron
The perceptron is one of the oldest and first-introduced neural network models; it was proposed by Frank Rosenblatt in 1958. With only two input features x1 and x2, its decision boundary (the hyperplane) is a line, and there can be multiple such lines that segregate the data points, for example red and blue circles. The perceptron is the simplest form of an artificial neural network and is mainly used to compute logical gates such as AND, OR, and NOR, which have binary inputs and binary outputs. The main functionality of the perceptron is to:

 Take inputs from the input layer.
 Weight each input and sum them up.
 Pass the weighted sum through a nonlinear activation function to produce the output.

Single-layer neural network


Here the activation function can be anything like sigmoid, tanh, or ReLU; based on the requirement we choose the most appropriate nonlinear activation function to produce the best result. Now let us implement a single-layer perceptron.
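A minimal sketch of such an implementation is given below; it is an assumed example (not from the original notes) that trains a single-layer perceptron on the logical AND gate using a simple step activation.

Code:
import numpy as np

def step(z):
    # simple threshold activation: fire (1) when the weighted sum is non-negative
    return 1 if z >= 0 else 0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # AND-gate inputs
y = np.array([0, 0, 0, 1])                        # AND-gate outputs

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate
for _ in range(10):                      # a few passes over the data are enough here
    for xi, target in zip(X, y):
        pred = step(np.dot(w, xi) + b)   # weight the inputs and sum them up
        w += lr * (target - pred) * xi   # perceptron learning rule
        b += lr * (target - pred)

print([step(np.dot(w, xi) + b) for xi in X])   # expected output: [0, 0, 0, 1]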
Linear SVM Classification:
Linear SVMs use a linear decision boundary to separate the data points of different classes. When the data
can be precisely linearly separated, linear SVMs are very suitable. This means that a single straight line
(in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective
classes. A hyperplane that maximizes the margin between the classes is the decision boundary. The
working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that
has two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can
classify the pair (x1, x2) of coordinates in either green or blue. Consider the below image:
Since this is a 2-D space, we can easily separate these two classes using just a straight line. But there can be multiple lines that can separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary is called a hyperplane. The SVM algorithm finds the points from both classes that lie closest to the boundary; these points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
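The sketch below (with made-up 2-D data) is a hedged illustration of these ideas: it fits a linear SVC from scikit-learn and inspects the support vectors, the separating hyperplane, and the margin width 2/‖w‖.

Code:
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(42)
X = np.vstack([rng.randn(25, 2) + [2, 2],     # "green" class cluster
               rng.randn(25, 2) + [-2, -2]])  # "blue" class cluster
y = np.array([1] * 25 + [0] * 25)

clf = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2.0 / np.linalg.norm(w))   # distance between the two margin lines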
Non-Linear SVM Classification

12. Define Non-linear classification? Explain the list of kernels in SVM briefly.[7M]July–2023 Set -2[Remember]
Non-Linear SVM:

Non-Linear SVM can be used to classify data when it cannot be separated into two classes by a
straight line (in the case of 2D). By using kernel functions, nonlinear SVMs can handle nonlinearly
separable data. The original input data is transformed by these kernel functions into a higher-dimensional feature space, where the data points can be linearly separated. A linear SVM is then used in this transformed space, and its linear boundary corresponds to a non-linear decision boundary in the original space.

If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used
two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated
as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:

So now, SVM will divide the datasets into classes in the following way. Consider the below image:

Since we are now in 3-D space, the separating surface looks like a plane parallel to the x-axis. If we convert it back to 2-D space by taking the slice z = 1, it becomes:
Hence we get a circle of radius 1 around the non-linear data.
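A small sketch of this idea, using scikit-learn's make_circles as a stand-in ring-shaped dataset (an assumption for illustration only): adding the extra feature z = x² + y² lets a plain linear SVM separate the lifted data.

Code:
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)   # the extra dimension z = x^2 + y^2
X3 = np.hstack([X, z])                              # lift the 2-D points into 3-D

linear_2d = SVC(kernel='linear').fit(X, y)   # struggles in the original 2-D space
linear_3d = SVC(kernel='linear').fit(X3, y)  # a plane separates the lifted 3-D data

print("training accuracy in 2-D:", linear_2d.score(X, y))
print("training accuracy in 3-D:", linear_3d.score(X3, y))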

The List of kernels in SVM:


A kernel function is a method used to take data as input and transform it into the form required for processing. The term "kernel" refers to a set of mathematical functions used in the Support Vector Machine that provide a window to manipulate the data. A kernel function generally transforms the training data so that a non-linear decision surface becomes a linear decision surface in a higher-dimensional space. In essence, it returns the inner product between two points in a standard feature space.

Standard Kernel Function Equation: in general form, K(x, y) = φ(x) · φ(y), i.e. the inner product of the two points after mapping them into the feature space with φ.

Major Kernel Functions :-


For Implementing Kernel Functions, first of all, we have to install the “scikit-learn” library using the
command prompt terminal:

pip install scikit-learn

Gaussian Kernel:
It is used to perform the transformation when there is no prior knowledge about the data.

Gaussian Kernel Radial Basis Function (RBF):

The same as the above kernel function, with the radial basis method added to improve the transformation; a common form is K(x, y) = exp(−‖x − y‖² / (2σ²)).
Gaussian Kernel Graph

Code:
from sklearn.svm import SVC
classifier = SVC(kernel='rbf', random_state=0)
classifier.fit(x_train, y_train)  # fit on the training set (x_train, y_train)

Sigmoid Kernel:
This function is equivalent to a two-layer perceptron model of a neural network, and it is used as an activation function for artificial neurons.

Sigmoid Kernel Graph

Code:
from sklearn.svm import SVC
classifier = SVC(kernel ='sigmoid')
classifier.fit(x_train, y_train) # training set in x, y axis
Polynomial Kernel:
It represents the similarity of vectors in the training set of data in a feature space over polynomials of
the original variables used in the kernel.

Polynomial Kernel Graph

Code:
from sklearn.svm import SVC
classifier = SVC(kernel ='poly', degree = 4)
classifier.fit(x_train, y_train) # training set in x, y axis

Linear Kernel:
It is used when the data is linearly separable.

Code:
from sklearn.svm import SVC
classifier = SVC(kernel ='linear')
classifier.fit(x_train, y_train) # training set in x, y axis

SVM Regression
13. Explain SVM regression in detail with a neat diagram.[7M]July–2023 Set -2[Understand]
Support Vector Regression (SVR) is a machine learning technique used for regression tasks. It is a
variant of Support Vector Machines (SVM) and is designed to predict continuous numeric values,
making it suitable for tasks like time series forecasting, stock price prediction, and more.
Key points about SVR:
1. Objective: SVR aims to find a function that predicts a continuous target variable while
maximizing the margin between the predicted values and the actual data points.
2. Margin: SVR identifies a “margin” around the predicted regression line, and its goal is to fit the
line within this margin while minimizing the prediction error.
3. Support Vectors: In SVR, data points that are closest to the regression line and define the
margin are known as “support vectors.” These points play a crucial role in determining the
regression model.
4. Kernel Trick: SVR can use various kernel functions (e.g., linear, polynomial, radial basis
function) to transform the feature space, making it possible to model non-linear relationships
between input features and the target variable.
5. Hyperparameters: SVR requires tuning hyperparameters, such as the regularization parameter (C) and kernel parameters, to achieve the best model performance.
6. Loss Function: SVR typically uses an epsilon-insensitive loss function that allows for some
errors within a defined range (epsilon), and it penalizes errors outside this range more heavily.
7. Complexity Control: The regularization parameter (C) in SVR controls the trade-off between maximizing the margin and minimizing the prediction error. A smaller C leads to a wider margin with more errors allowed, while a larger C results in a narrower margin with fewer errors allowed.
8. Robustness: SVR is robust to outliers, as it primarily focuses on the data points close to the
margin (support vectors) and doesn’t heavily rely on all data points.

SVR is a regression technique that seeks to find a regression model with a margin around the predicted
values, allowing for a balance between fitting the data and avoiding overfitting. It is particularly useful
when dealing with non-linear relationships and can be adapted to various problem domains through the
choice of kernel functions.
Below is a basic implementation of Support Vector Regression (SVR) in Python using the popular machine learning library scikit-learn. We'll use a synthetic dataset for demonstration:
import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt

# Generate synthetic data


np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel()

# Fit the SVR model


svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1)
svr_rbf.fit(X, y)

# Predict on new data points


X_test = np.linspace(0, 5, 100)[:, np.newaxis]
y_pred = svr_rbf.predict(X_test)

# Plot the results


plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X_test, y_pred, color='navy', lw=2, label='RBF model')
plt.xlabel('Data')
plt.ylabel('Target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
In this code:
1. We generate a synthetic dataset with a sine wave relationship between X and y.
2. We create an SVR model with a radial basis function (RBF) kernel, set the regularization parameter (C) to 100, and the kernel coefficient (gamma) to 0.1.
3. We fit the SVR model to the data.
4. We generate new data points (X_test) for prediction and use the trained SVR model to make
predictions.
5. Finally, we visualize the original data points and the SVR model’s predictions.

Support Vector Machine

14. What is Kernel trick? Describe polynomial kernel function.[7M]July–2023 Set -4[Remember]

What is Kernel Trick ?


The “Kernel Trick” is a method used in Support Vector Machines (SVMs) to convert data (that is not
linearly separable) into a higher-dimensional feature space where it may be linearly separated.
This technique enables the SVM to identify a hyperplane that separates the data with the maximum margin,
even when the data is not linearly separable in its original space. The kernel functions are used to compute
the inner product between pairs of points in the transformed feature space without explicitly computing
the transformation itself. This makes it computationally efficient to deal with high dimensional feature
spaces.

The most widely used kernels in SVM are the linear kernel, the polynomial kernel, and the Gaussian (radial basis function) kernel. The choice of kernel depends on the nature of the data and the task at hand. The linear kernel is used when the data is roughly linearly separable, whereas the polynomial kernel is used when the class boundary is a complicated curve. The Gaussian kernel is employed when the data has no clear boundaries and contains complicated regions of overlap.
Let’s take an example to understand the kernel trick in more detail. Consider a binary classification
problem where we have two classes of data points: red and blue. The data is not linearly separable in the
2D space. We can see this in the plot below:
To make this data linearly separable, we can use the kernel trick.
By applying the kernel trick to the data, we transform it into a higher-dimensional feature space where the
data becomes linearly separable. We can see this in the plot below, where the red and blue data points have
been separated by a hyperplane in the 3D space:

As we can see, the kernel trick has helped us find a solution for a non-linearly separable dataset.
The kernel trick is a powerful technique that enables SVMs to solve non-linear classification problems by
implicitly mapping the input data to a higher-dimensional feature space. By doing so, it allows us to find
a hyperplane that separates the different classes of data.
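As a hedged numeric illustration of the trick itself (not from the original notes): for the quadratic kernel K(x, y) = (x·y)² in 2-D, the explicit feature map is φ(x) = [x1², √2·x1·x2, x2²], and the kernel returns exactly the same inner product without ever constructing φ.

Code:
import numpy as np

def phi(v):
    # explicit feature map for the quadratic kernel in 2-D
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def quadratic_kernel(x, y):
    # kernel trick: inner product in feature space, computed in the original space
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(y)))   # explicit mapping, then inner product
print(quadratic_kernel(x, y))   # same number, no mapping needed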
Polynomial Kernel: It represents the similarity of vectors in the training set of data in a feature space
over polynomials of the original variables used in the kernel.

Polynomial Kernel Graph

The equation for the polynomial kernel function is:


K(x,xi) = 1 + sum(x * xi)^d
This kernel is used when the data cannot be separated linearly. The polynomial kernel has a degree parameter (d) whose optimal value has to be found for each dataset. The d parameter is the degree of the polynomial kernel function, with a default value of d = 2. The greater the value of d, the more the resulting accuracy tends to fluctuate and the less stable the model becomes, because a higher d produces a more strongly curved decision boundary.
Code:
from sklearn.svm import SVC
classifier = SVC(kernel ='poly', degree = 4)
classifier.fit(x_train, y_train) # training set in x, y axis

15. Describe Gaussian RBF kernel in SVM.[7M]July–2023 Set -1[Understand]

The kernel trick helps us handle non-linear datasets, which are more complex and cannot be solved or classified on the basis of a straight line.
A linearly separable dataset can be handled by any of the linear classification algorithms, but what about a non-linear one? We cannot place a single straight line between the classes; instead, we have to separate the green points from the red points by drawing a circle around them, and to do so we use the Gaussian RBF kernel function.
By using the Gaussian RBF kernel we can shift the points from a 2-D plane to a 3-D space, lifting all the green points above the red ones with a mapping function (the Gaussian RBF) that introduces a separating hyperplane in the new space, just as shown below:

The Gaussian RBF kernel function is as follows:

K(x, l) = exp(−‖x − l‖² / (2σ²)), where l is the landmark vector and σ controls the width of the bell.
This function will allow us to introduce a hyperplane which will help us distinguish between the green
points and red points by uplifting the green points above the hyperplane and leaving the red points below
it.

This is a visualization of how the function works and helps us distinguish between the two groups of points. It is an exponentially decreasing function whose centre is marked by the landmark vector l. The radius of the base of this "mountain" is controlled by the constant σ; substituting the values of the landmark vector l and σ into the function gives the desired output.

How do we apply the non-linear SVM with the Gaussian RBF kernel in Python?
After importing the dataset and splitting the data into training and test sets, we import the SVC (Support Vector Classifier) class from the svm module of the scikit-learn library.

Here we set kernel='rbf', which uses the Gaussian RBF function. Predicting on the test set and computing the accuracy score of the Gaussian RBF model, the original example achieves an accuracy of about 93%; we then visualize the test results. A sketch of this workflow is shown below.
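The sketch assumes a hypothetical CSV file 'Social_Network_Ads.csv' with Age, EstimatedSalary, and Purchased columns (any dataset with two numeric features and a binary label would work the same way); the exact accuracy will depend on the data.

Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

data = pd.read_csv('Social_Network_Ads.csv')     # assumed file name
X = data[['Age', 'EstimatedSalary']].values      # assumed column names
y = data['Purchased'].values                     # assumed binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)              # feature scaling matters for RBF kernels
X_test = sc.transform(X_test)

classifier = SVC(kernel='rbf', random_state=0)   # Gaussian RBF kernel
classifier.fit(X_train, y_train)
print('test accuracy:', accuracy_score(y_test, classifier.predict(X_test)))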

We get the following results (graph of Estimated Salary vs Age):

Naïve Bayes Classifiers

16. Explain about Naïve Bayes classifier algorithm with an example.[7M]July–2023 Set -3, Set -4[Understand]

Naïve Bayes Classifier Algorithm


o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and
used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
o Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis,
and classifying articles.

Why is it called Naïve Bayes?

The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:

o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the
probability of a hypothesis with prior knowledge. It depends on the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a hypothesis
is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the below example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, then the Player should play or not?

Solution: To solve this, first consider the below dataset:

     Outlook    Play
0    Rainy      Yes
1    Sunny      Yes
2    Overcast   Yes
3    Overcast   Yes
4    Sunny      No
5    Rainy      Yes
6    Sunny      Yes
7    Overcast   Yes
8    Rainy      No
9    Sunny      No
10   Sunny      Yes
11   Rainy      No
12   Overcast   Yes
13   Overcast   Yes
Frequency table for the weather conditions:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4

Likelihood table for the weather conditions:

Weather     No            Yes           P(Weather)
Overcast    0             5             5/14 = 0.35
Rainy       2             2             4/14 = 0.29
Sunny       2             3             5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60

P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No)= 0.29

P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

So, as we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence, on a sunny day, the player can play the game.
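As an optional sketch (not part of the original answer), the snippet below reproduces this example with scikit-learn's CategoricalNB; the predicted class for Sunny is again "Yes", although the probabilities differ slightly from the hand calculation because the library applies Laplace smoothing by default.

Code:
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

outlook = [['Rainy'], ['Sunny'], ['Overcast'], ['Overcast'], ['Sunny'], ['Rainy'],
           ['Sunny'], ['Overcast'], ['Rainy'], ['Sunny'], ['Sunny'], ['Rainy'],
           ['Overcast'], ['Overcast']]
play = ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes',
        'No', 'No', 'Yes', 'No', 'Yes', 'Yes']

encoder = OrdinalEncoder()
X = encoder.fit_transform(outlook).astype(int)   # encode Outlook as a categorical feature

model = CategoricalNB()
model.fit(X, play)

sunny = encoder.transform([['Sunny']]).astype(int)
print(model.predict(sunny))         # expected: ['Yes']
print(model.predict_proba(sunny))   # [P(No|Sunny), P(Yes|Sunny)]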

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in multi-class predictions as compared to the other Algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.
