ML Unit 3
Support Vector Machines (SVMs) are a powerful and versatile class of supervised machine
learning algorithms used for classification and regression tasks. At its core, SVM aims to find an
optimal hyperplane that best separates data points into distinct classes. This hyperplane is chosen
to maximize the margin between the two classes, meaning it should be positioned in such a way
that it maximally separates the closest data points from each class. These closest data points are
known as "support vectors.' By focusing on support vectors, SVMs achieve better generalization
to new, unseen data.
A non-linear dataset is one in which the relationship between the input features and the target
variable cannot be effectively modeled by a straight line or a hyperplane. Non-linear datasets
often exhibit complex, curved, or irregular patterns that require more sophisticated models to
capture. Here's an example:
Example: Predicting a Student's Exam Score
Suppose you want to predict a student's exam score based on the number of hours they
studied and the number of hours they slept the night before. The relationship between
these features and the exam score might not be linear. It could be that studying more
hours initially increases the score, but after a certain point, further studying becomes
less effective. Also, the quality of sleep might interact with the effect of studying.
These relationships can be highly non-linear and are better captured by non-linear
models like decision trees, random forests, or support vector machines with non-linear
kernels.
In this case, plotting the data points would result in a curve or a complex shape that
doesn't fit a straight line or a simple plane.
Understanding whether your data is linear or non-linear is crucial for selecting the
appropriate machine learning model and algorithm. Linear regression, for example, is ideal for
linear datasets, while non-linear datasets require more advanced techniques to capture the
underlying patterns effectively.
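To make this concrete, here is a minimal Python sketch (not part of the original notes) that fits a plain linear model and an RBF-kernel SVM regressor to a synthetic, curved "hours studied vs. score" relationship; the data, the diminishing-returns formula, and the parameter values are illustrative assumptions.

```python
# Hedged sketch: linear vs. non-linear model on a synthetic curved relationship.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.RandomState(0)
hours = rng.uniform(0, 10, 200).reshape(-1, 1)   # hours studied (synthetic)
# Diminishing returns: the score rises quickly at first, then flattens out (plus noise).
score = 100 * (1 - np.exp(-0.5 * hours.ravel())) + rng.normal(0, 3, 200)

linear_model = LinearRegression().fit(hours, score)
kernel_model = SVR(kernel="rbf", C=10.0).fit(hours, score)

print("Linear model R^2:", linear_model.score(hours, score))
print("RBF-SVM R^2:     ", kernel_model.score(hours, score))  # typically higher on curved data
```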
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a powerful and versatile machine learning algorithm used for
classification and regression tasks. Its primary goal is to find an optimal hyperplane that best
separates different classes of data points in a high-dimensional space. SVM is particularly useful
when dealing with both linearly and non-linearly separable datasets. Here's a brief introduction
to SVM with relevant diagrams.
1. Intuition:
At its core, SVM aims to find a hyperplane that maximizes the margin
between two classes of
data points. This hyperplane is known as the decision boundary, and the margin
represents the
distance between the boundary and the nearest data points from each class. The idea is to ensure
that the decision boundary has the greatest separation, making it robust to classify new, unseen
data accurately.
2. Linear Separation
For linearly separable data, SVM finds a hyperplane that separates the two classes with the
largest possible margin. In a two-dimensional feature space, this hyperplane is a straight line.
Here's a simplified diagram to illustrate this concept:
The solid line is the decision boundary, and the dashed lines represent the margin.
3. Support Vectors
Support vectors are the data points closest to the decision boundary. These points play a critical
role in defining the margin and the overall performance of the SVM. The margin is determined
by the distance between the support vectors and the decision boundary, as shown below:
Fig: Support vectors and their distance to the decision boundary
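As a small illustration (not from the original notes), scikit-learn's SVC exposes the fitted support vectors directly; the two-blob dataset below is synthetic and only meant to show where to find them.

```python
# Hedged sketch: inspecting the support vectors of a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])  # two synthetic blobs
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("Number of support vectors per class:", clf.n_support_)
print("Support vectors:\n", clf.support_vectors_)
```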
4. Non-linear Separation
In many real-world scenarios, data may not be linearly separable. SVM can handle such cases by
mapping the data into a higher-dimensional space where it becomes linearly separable. This
transformation is done using a kernel function. Common kernel functions include the linear,
polynomial, and radial basis function (RBF) kernels. The following diagram demonstrates this
concept:
Fig: Iris classification using a non-linear kernel SVM
In this example, data is transformed into a 3D space where it becomes linearly separable.
In summary,
Support Vector Machine is a versatile machine learning algorithm that excels in both linear and
non-linear classification tasks. It aims to find the optimal hyperplane that maximizes the margin
between classes, and it uses support vectors to define this margin. By introducing kernel
functions, SVM can handle non-linear separations, making it a valuable tool in various
applications, from image classification to financial analysis.
SVMs advantages:
They provide strong theoretical foundations, making them a preferred choice for many
researchers and practitioners.
They are effective with high-dimensional data, relatively resistant to overfitting, and can
handle both binary and multiclass classification problems.
Moreover, SVMs are also used in regression tasks, known as Support Vector Regression
(SVR), where they aim to fit a hyperplane that approximates the target values as closely
as possible.
Linear Discriminant functions for Binary classification using SVM:
Linear discriminant functions are essential in binary classification tasks, and Support Vector
Machines (SVMs) utilize them to separate data into two distinct classes.
Let's explain this concept using simple equations and diagrams
In SVM, the linear discriminant function for binary classification can be represented as
f(x) = sign(w·x + b)
Where
f(x): The decision function that determines the class label of a data point x.
w: The weight vector that defines the orientation of the decision boundary.
b: The bias term that shifts the decision boundary away from the origin.
sign(·): The sign function, which assigns a class label based on the sign of the result.
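A minimal NumPy sketch of this decision function follows; the values of w and b below are illustrative assumptions, not learned parameters.

```python
# Hedged sketch of f(x) = sign(w·x + b) with illustrative w and b.
import numpy as np

w = np.array([0.8, -0.5])   # weight vector: orientation of the decision boundary
b = 0.2                     # bias term: offset of the boundary from the origin

def predict(x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([1.0, 0.5])))   # -> +1 (positive side)
print(predict(np.array([-1.0, 2.0])))  # -> -1 (negative side)
```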
Example
Let's illustrate this with a simple example. Suppose we have a 2D feature space for binary
classification, where Class A is represented by blue points and Class B is represented by black
points. Here's a diagram:
Fig: Linear decision boundary separating Class A (blue) and Class B (black)
In the diagram, the decision boundary is a straight line, represented by the linear discriminant function
w₁x₁ + w₂x₂ + b = 0
That separates the two classes. Any data point x falling on one side of the decision boundary is
classified as Class A, and on the other side as Class B.
" SVM's objective is to find the optimal values of w and b that maximize the margin
between the two classes while minimizing classification errors.
" The support vectors are the data points that are closest to the decision boundary, and they
are used to define the margin in an SVM.
The margin is the perpendicular distance from the decision boundary to the support
vectors.
SVM aims to maximize this margin while ensuring that data points are correctly
classified.
" If the data points are not linearly separable, SVM can use kernel functions to map the
data to a higher-dimensional space where linear separation is possible.
Large margin classifier for linearly separable data using SVM:
A key concept in Support Vector Machines (SVM) is the idea of a large margin classifier for
linearly separable data. The goal is to find a hyperplane that maximizes the margin between two
classes of data points. Here's an explanation with simple equations and diagrams:
Consider a binary classification problem where you have two classes: Class A and Class
B. The objective of SVM is to find a hyperplane that best separates these two classes
while maximizing the margin. The equation for this hyperplane is
w·x + b = 0
Where w is the weight vector that defines the orientation of the hyperplane and b is the bias term.
Fig: Optimal hyperplane with maximum margin
In the diagram, the decision hyperplane (the straight line) separates Class A from Class B. The
margin is the distance from the hyperplane to the closest data points from each class.
" SVM's objective is to find the optimal w and b that maximize this margin while ensuring
that data points are correctly classified.
In this ideal scenario of linearly separable data, the support vectors are the data points
closest to the hyperplane, and they are used to define the margin.
" SVM finds these support vectors and optimizes the margin by solving a constrained
optimization problem.
The large margin classifier provides a robust solution for linearly separable data, ensuring a
wider separation between classes and making it less sensitive to noise in the data.
Fig: Representing good and bad SVM classifier models in small and large margin cases
A large margin classifier in SVM for linearly separable data aims to find an optimal hyperplane
that maximizes the margin between two classes, ensuring a robust separation. Support vectors
define this margin, and SVM finds the best hyperplane by minimizing classification errors while
maximizing the margin, enhancing classification accuracy and robustness.
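As a hedged sketch of this idea (assuming scikit-learn and a toy, linearly separable dataset of my own construction), a near hard-margin classifier can be obtained by using a very large C; the geometric margin can then be read off as 2/‖w‖.

```python
# Hedged sketch: (near) hard-margin linear SVM on a tiny separable dataset.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0],        # Class A
              [-1, -1], [-2, -2], [-2, 0]])  # Class B
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)        # geometric margin = 2 / ||w||
print("w =", w, "b =", b, "margin width =", margin_width)
```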
Linear soft margin classifier for overlapping classes
A linear soft margin classifier in SVM addresses overlapping classes by allowing for some
classification errors. It introduces a penalty for misclassifications, striking a balance between
maximizing the margin and tolerating some errors.
Linear Soft Margin Classifier:
The linear soft margin classifier in SVM aims to find a hyperplane that best separates
overlapping classes, even when perfect separation isn't possible. It introduces a "slack
variable" (ξᵢ) to account for classification errors. The objective function is modified as
follows
min over w, b, ξ: (1/2)‖w‖² + C Σᵢ₌₁ⁿ ξᵢ
Subject to:
yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0
Where
C: A hyperparameter that controls the trade-off between maximizing the margin and minimizing
the misclassification error.
ξᵢ: Slack variable for the i-th data point.
n: The number of data points.
Fig: Soft margin hyperplane with slack variables
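A small illustrative sketch (with synthetic overlapping blobs and arbitrary C values, not taken from the notes) of how this trade-off behaves in scikit-learn: smaller C tolerates more slack and a wider margin, while larger C penalizes misclassification more heavily.

```python
# Hedged sketch: effect of C on a soft-margin linear SVM for overlapping classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(42)
# Two overlapping Gaussian blobs: perfect separation is impossible here.
X = np.vstack([rng.randn(50, 2) + [1, 0], rng.randn(50, 2) - [1, 0]])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<6} support vectors={clf.n_support_.sum():<4} "
          f"training accuracy={clf.score(X, y):.2f}")
```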
" Consider a simple 2D dataset where Cilass A(Green points) and Class B(blue points) are
not linearly separable in the original feature space:
In this diagram, it's evident that a straight-line decision boundary cannot separate the
classes effectively in the original 2D space.
Now, by using a kernel function, we implicitly map this data to a higher-dimensional
feature space, often referred to as a "kernel-induced feature space." Let's say we use a
radial basis function (RBF) kernel, K(x, x′) = exp(−γ‖x − x′‖²).
This RBF kernel implicitly maps the data to a higher-dimensional space where the classes
might become linearly separable
Fig: Non-linearly separable data in the original 2D space and the kernel-induced 3D space
In this new feature space, the data points might be linearly separable with the right choice
of kernel and kernel parameters, enabling SVM to find an optimal decision boundary that
maximizes the margin between classes.
The transformation into the kernel-induced feature space is implicit and doesn't require explicit
calculation of the transformed feature vectors. It allows SVM to handle non-linearly separable
data effectively.
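The following sketch illustrates this effect, using scikit-learn's make_circles as a stand-in non-linearly separable dataset (the gamma value is an arbitrary choice): a linear kernel performs near chance, while an RBF kernel separates the classes.

```python
# Hedged sketch: linear vs. RBF kernel on concentric-circle data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)   # K(x, x') = exp(-gamma * ||x - x'||^2)

print("Linear kernel accuracy:", linear_clf.score(X, y))  # close to chance (~0.5)
print("RBF kernel accuracy:   ", rbf_clf.score(X, y))     # close to 1.0
```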
Perceptron Algorithm:
The Perceptron algorithm is a simple supervised learning algorithm used for binary
classification. It's designed to find a linear decision boundary that separates data points of two
classes. Here's an explanation of the Perceptron algorithm with suitable diagrams:
The Perceptron algorithm works as follows:
1. Initialize the weights (w) and bias (b) to small random values or zeros.
2. For each data point (x) in the training dataset, compute the predicted class label (ŷ) using the
following formula
ŷ = sign(w·x + b)
Here, w represents the weight vector, x is the feature vector of the data point, and sign(·) is a
function that returns +1 for values greater than or equal to zero and -1 for values less than zero.
3. Compare the predicted class label (ŷ) to the true class label (y) of the data point. If they don't
match, update the weights and bias as follows:
w = w + α(y − ŷ)x
b = b + α(y − ŷ)
Here, α is the learning rate, and (y − ŷ) is the classification error. These updates help the
Perceptron adjust the decision boundary to classify the data points correctly.
4. Repeat the above steps for a fixed number of iterations or until the algorithm converges,
meaning no more misclassifications occur.
Fig: Perceptron
Let's illustrate the Perceptron algorithm with a simple 2D dataset and a linear decision boundary.
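Here is a minimal NumPy implementation of the update rule described above, run on a small, illustrative linearly separable 2D dataset (the points and learning rate are assumptions, not values from the notes).

```python
# Hedged sketch: the Perceptron update rule on a tiny linearly separable dataset.
import numpy as np

X = np.array([[2.0, 1.0], [3.0, 2.0], [1.5, 2.5],          # class +1
              [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.0]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(2)      # step 1: initialise weights and bias (zeros here)
b = 0.0
alpha = 0.1          # learning rate

for epoch in range(20):                                # step 4: repeat until convergence
    errors = 0
    for xi, yi in zip(X, y):
        y_hat = 1 if np.dot(w, xi) + b >= 0 else -1    # step 2: predict with sign(w·x + b)
        if y_hat != yi:                                # step 3: update only on a mistake
            w = w + alpha * (yi - y_hat) * xi
            b = b + alpha * (yi - y_hat)
            errors += 1
    if errors == 0:                                    # converged: no misclassifications left
        break

print("w =", w, "b =", b, "epochs used =", epoch + 1)
```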
Linear Regression using SVM:
Linear regression using Support Vector Machines (SVM) is a variation of SVM designed for
regression tasks. It aims to find a linear relationship between input features and a continuous
target variable.
In linear regression using SVM, the goal is to find a linear function that best
approximates the relationship between input features and the target variable. This linear
function is represented as:
f(x) = w·x + b
Where
f(x): The predicted target variable.
w: The weight vector.
b: The bias term.
" The linear regression objective is to minimize the mean squared error
predictions and the true target values (MSE) between the
The target variable (y) is represented on the vertical axis, and the input features (x) are on
the horizontal axis.
The non-linear function f(x) = Σᵢ αᵢ K(xᵢ, x) + b captures non-linear relationships
between input features and the target variable by implicitly mapping the data into a
higher-dimensional feature space using the kernel function.
The model can then make non-linear predictions based on the input features.
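As a closing sketch (assuming scikit-learn's SVR and an illustrative sine-shaped dataset; the C, gamma, and epsilon values are arbitrary choices), kernel-based Support Vector Regression can capture such non-linear relationships.

```python
# Hedged sketch: Support Vector Regression with an RBF kernel on noisy sine data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 120)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 120)

model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1).fit(X, y)
print("R^2 on training data:", model.score(X, y))
print("Prediction at x=1.5:", model.predict([[1.5]])[0])  # roughly sin(1.5) ~ 1.0
```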