ML NOTES BY PUSHPA
Supervised Learning
Classification and regression are the two primary tasks in supervised machine
learning; the key difference lies in the nature of the output:
classification deals with discrete outcomes (e.g., yes/no, categories),
while regression handles continuous values (e.g., price, temperature).
Both approaches require labeled data for training but differ in their
objectives—classification aims to find decision boundaries that separate
classes, whereas regression focuses on finding the best-fitting line to
predict numerical outcomes. Understanding these distinctions helps in
selecting the right approach for specific machine learning tasks.
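As a quick illustration, here is a minimal Python sketch on made-up toy data (the use of scikit-learn's LogisticRegression and LinearRegression is an assumption of this sketch, not something prescribed by the notes): the classifier outputs a category, while the regressor outputs a number.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # single input feature
y_class = np.array([0, 0, 0, 1, 1, 1])                    # discrete labels
y_reg = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.2])          # continuous targets

clf = LogisticRegression().fit(X, y_class)  # learns a decision boundary
reg = LinearRegression().fit(X, y_reg)      # learns a best-fitting line

print(clf.predict([[5.5]]))  # a class label, e.g. [1]
print(reg.predict([[5.5]]))  # a continuous value, roughly 5.6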
Classification Algorithms
There are different types of classification algorithms that have been
developed over time to give the best results for classification tasks. Don’t
worry if they seem overwhelming at first—we’ll dive deeper into each
algorithm, one by one, in the upcoming chapters.
Logistic Regression
Decision Tree
Random Forest
K-Nearest Neighbors
Support Vector Machine
Naive Bayes
Regression Algorithms
There are different types of regression algorithms that have been developed
over time to give the best results for regression tasks.
Lasso Regression
Ridge Regression
XGBoost Regressor
LGBM Regressor
Comparison between Classification and Regression
Feature | Classification | Regression
Output type | Discrete categories (e.g., yes/no) | Continuous values (e.g., price, temperature)
Objective | Find decision boundaries that separate classes | Find the best-fitting line to predict numerical outcomes
Example algorithms | Logistic Regression, Decision Tree, Random Forest, KNN, SVM, Naive Bayes | Lasso, Ridge, XGBoost Regressor, LGBM Regressor
Key Points:
Sigmoid function: logistic regression passes a linear combination of the input features through the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$, which squashes any real value into the range (0, 1) so the output can be read as a probability.
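As a small illustration, here is a minimal Python sketch of the sigmoid (plain NumPy, made-up scores); it turns any real-valued score into a number between 0 and 1:

import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical linear score w.x + b mapped to a probability-like output.
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(scores))  # approx [0.018 0.269 0.5 0.731 0.982]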
Random Forest
It is a method that combines the predictions of multiple decision trees to
produce a more accurate and stable result. It can be used for both
classification and regression tasks.
In classification tasks, a Random Forest classifier predicts categorical
outcomes from the input data: it builds multiple decision trees and outputs
the label that receives the majority of votes among the individual tree
predictions.
Random Forest classification works by creating multiple decision trees,
each trained on a random subset of the data. The process begins with bootstrap
sampling, where random rows of data are selected with replacement to
form a different training dataset for each tree.
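A minimal Python sketch of this idea, assuming scikit-learn and a made-up toy dataset (RandomForestClassifier performs the bootstrap sampling and majority voting described above internally):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 200 samples, 4 features, 2 classes.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Each tree is trained on a bootstrap sample (rows drawn with replacement);
# the forest predicts the label with the most votes among the trees.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))  # predicted class labels for the first five rows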
K-Nearest Neighbors (KNN)
Imagine you are deciding which fruit a new, unknown fruit is, based on its
shape and size. You compare it to fruits you already know.
If k = 3, the algorithm looks at the 3 closest fruits to the new one.
If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the
new fruit is an apple because most of its neighbours are apples.
Working of KNN algorithm
The K-Nearest Neighbors (KNN) algorithm operates on the principle of
similarity: it predicts the label or value of a new data point by
considering the labels or values of its K nearest neighbors in the training
dataset.
The k data points with the smallest distances to the target point are its
nearest neighbors.
Step 4: Voting for Classification or Taking Average for
Regression
When you want to classify a data point into a category (like spam or not
spam), the K-NN algorithm looks at the K closest points in the dataset.
These closest points are called neighbors. The algorithm then looks at
which category the neighbors belong to and picks the one that appears
the most. This is called majority voting.
In regression, the algorithm still looks for the K closest points. But
instead of voting for a class in classification, it takes the average of the
values of those K neighbors. This average is the predicted value for the
new point for the algorithm.
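A minimal Python sketch of both behaviours, assuming scikit-learn and made-up one-dimensional data with K = 3:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1], [2], [3], [10], [11], [12]])

# Classification: the 3 nearest neighbours vote on the label (majority voting).
y_class = np.array([0, 0, 0, 1, 1, 1])
knn_clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(knn_clf.predict([[2.5]]))  # -> [0]; all 3 neighbours belong to class 0

# Regression: the prediction is the average of the 3 nearest target values.
y_reg = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
knn_reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(knn_reg.predict([[2.5]]))  # -> [2.0]; mean of 1.0, 2.0 and 3.0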
Distance Metrics Used in KNN Algorithm
KNN uses distance metrics to identify the nearest neighbours; these neighbours
are then used for the classification or regression task. The most commonly
used distance metrics are listed below (a small Python sketch of all three
follows the list):
1. Euclidean Distance
This is the straight-line distance between two points.
$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
2. Manhattan Distance
This is the total distance you would travel if you could only move along
horizontal and vertical lines (like a grid or city streets). It’s also called
“taxicab distance” because a taxi can only drive along the grid-like streets
of a city.
$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
3. Minkowski Distance
This is a generalisation of the two metrics above, controlled by a parameter p: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
$d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$
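A minimal Python sketch of the three metrics, assuming plain NumPy and made-up points:

import numpy as np

def euclidean(x, y):
    # Straight-line distance: square root of the sum of squared differences.
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # "Taxicab" distance: sum of absolute differences along each axis.
    return np.sum(np.abs(x - y))

def minkowski(x, y, p):
    # Generalisation: p = 1 gives Manhattan, p = 2 gives Euclidean.
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b))     # 5.0
print(manhattan(a, b))     # 7.0
print(minkowski(a, b, 2))  # 5.0, same as Euclidean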
Support Vector Machine (SVM)
The best hyperplane, also known as the "hard margin," is the one that
maximizes the distance between the hyperplane and the nearest data points
from both classes. This ensures a clear separation between the classes. Of
the candidate hyperplanes in the accompanying figure, L2 is therefore chosen
as the hard margin.
Now consider a scenario in which one blue ball lies within the region of the
red balls, i.e., an outlier.
When data is not linearly separable (i.e., it can’t be divided by a straight
line), SVM uses a technique called kernels to map the data into a higher-
dimensional space where it becomes separable. This transformation helps
SVM find a decision boundary even for non-linear data.
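A minimal Python sketch, assuming scikit-learn and a made-up toy dataset of concentric circles (not linearly separable), where the RBF kernel lets the SVM find a non-linear decision boundary:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: no straight line can separate them in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional space
# where the two rings become separable.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm.score(X, y))  # training accuracy; close to 1.0 on this toy data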
Naive Bayes
The dataset is divided into two parts, namely, the feature matrix and
the response vector.
Feature matrix contains all the vectors (rows) of the dataset, in which each
vector consists of the values of the independent features. In the dataset
above, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
Response vector contains the value of the class variable (prediction or
output) for each row of the feature matrix. In that dataset, the class
variable is 'Play golf'.
Assumptions of Naive Bayes
The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome. More specifically:
Feature independence: This means that when we are trying to classify
something, we assume that each feature (or piece of information) in the
data does not affect any other feature.
Continuous features are normally distributed: If a feature is
continuous, then it is assumed to be normally distributed within each
class.
Discrete features have multinomial distributions: If a feature is
discrete, then it is assumed to have a multinomial distribution within
each class.
Features are equally important: All features are assumed to
contribute equally to the prediction of the class label.
No missing data: The data should not contain any missing values.
Types of Naive Bayes Model
There are three types of Naive Bayes model: Gaussian, Multinomial and
Bernoulli Naive Bayes.
Bernoulli Naive Bayes deals with binary features, where each feature
indicates whether a word appears or not in a document. It is suited for
scenarios where the presence or absence of terms is more relevant than
their frequency. Both the Multinomial and Bernoulli models are widely used
in document classification tasks.
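A minimal Python sketch of the Bernoulli variant, assuming scikit-learn and made-up binary word-presence features in the spirit of the document-classification use case above:

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Each row records whether the words ["free", "win", "meeting"] appear in a document.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = not spam

model = BernoulliNB().fit(X, y)
print(model.predict([[1, 1, 0]]))  # -> [1]; "free" and "win" present, likely spam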
K-Means Clustering
Q. Apply the K(=2)-Means algorithm over the data (185, 72), (170, 56), (168, 60),
(179, 68), (182, 72), (188, 77) for two iterations and show the clusters. Initially
choose the first two objects as the initial centroids.
Solution:
Given: number of clusters to be created (K) = 2, say c1 and c2; number of
iterations = 2; initial centroids = the first two objects, i.e. c1 = (185, 72)
and c2 = (170, 56).
The given data points can be represented in tabular form as:
Point | x | y
x1 | 185 | 72
x2 | 170 | 56
x3 | 168 | 60
x4 | 179 | 68
x5 | 182 | 72
x6 | 188 | 77
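Since the step-by-step solution is not reproduced in these notes, here is a minimal Python sketch of the procedure, assuming Euclidean distance, the first two objects as initial centroids, and two iterations:

import numpy as np

points = np.array([[185, 72], [170, 56], [168, 60],
                   [179, 68], [182, 72], [188, 77]], dtype=float)
centroids = points[:2].copy()  # c1 = (185, 72), c2 = (170, 56)

for iteration in range(2):
    # Assignment step: each point goes to the nearest centroid.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: each centroid becomes the mean of its assigned points.
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    print(f"Iteration {iteration + 1}: labels = {labels}, centroids = {centroids.round(2).tolist()}")

On this data the assignments stabilise after the first iteration: c1 covers (185, 72), (179, 68), (182, 72) and (188, 77) with centroid (183.5, 72.25), while c2 covers (170, 56) and (168, 60) with centroid (169.0, 58.0).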
How do we choose K?
Name | Age | Gender | Sport
Mark | 40 | 0 | Neither
Sara | 16 | 1 | Cricket
Zaira | 34 | 1 | Cricket
Sachin | 55 | 0 | Neither
Rahul | 40 | 0 | Cricket
Pooja | 20 | 1 | Neither
Smith | 15 | 0 | Cricket
Laxmi | 55 | 1 | Football
Michael | 15 | 0 | Football
Here male is denoted with the numeric value 0 and female with 1. Let's find
which class Angelina will fall into, taking k = 3 and her age as 5. We use
the Euclidean distance formula
$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
to find the distance between any two points.
To calculate the distance between Angelina and the other individuals in the dataset:
$d = \sqrt{(\text{age}_2 - \text{age}_1)^2 + (\text{gender}_2 - \text{gender}_1)^2}$
Here, Angelina has:
Age = 5
Gender = 1 (female)
1. Distance between Angelina and Ajay (age = 32, gender = 0):
$d = \sqrt{(5 - 32)^2 + (1 - 0)^2} = \sqrt{729 + 1} = \sqrt{730} \approx 27.02$
2. Distance between Angelina and Mark (age = 40, gender = 0):
$d = \sqrt{(5 - 40)^2 + (1 - 0)^2} = \sqrt{1225 + 1} = \sqrt{1226} \approx 35.01$
3. Distance between Angelina and Sara (age = 16, gender = 1):
$d = \sqrt{(5 - 16)^2 + (1 - 1)^2} = \sqrt{121 + 0} = \sqrt{121} = 11.00$
4. Distance between Angelina and Zaira (age = 34, gender = 1):
$d = \sqrt{(5 - 34)^2 + (1 - 1)^2} = \sqrt{841 + 0} = \sqrt{841} = 29.00$
5. Distance between Angelina and Sachin (age = 55, gender = 0):
$d = \sqrt{(5 - 55)^2 + (1 - 0)^2} = \sqrt{2500 + 1} = \sqrt{2501} \approx 50.01$
6. Distance between Angelina and Rahul (age = 40, gender = 0):
$d = \sqrt{(5 - 40)^2 + (1 - 0)^2} = \sqrt{1225 + 1} = \sqrt{1226} \approx 35.01$
7. Distance between Angelina and Pooja (age = 20, gender = 1):
$d = \sqrt{(5 - 20)^2 + (1 - 1)^2} = \sqrt{225 + 0} = \sqrt{225} = 15.00$
8. Distance between Angelina and Smith (age = 15, gender = 0):
$d = \sqrt{(5 - 15)^2 + (1 - 0)^2} = \sqrt{100 + 1} = \sqrt{101} \approx 10.05$
9. Distance between Angelina and Laxmi (age = 55, gender = 1):
$d = \sqrt{(5 - 55)^2 + (1 - 1)^2} = \sqrt{2500 + 0} = \sqrt{2500} = 50.00$
10. Distance between Angelina and Michael (age = 15, gender = 0):
$d = \sqrt{(5 - 15)^2 + (1 - 0)^2} = \sqrt{100 + 1} = \sqrt{101} \approx 10.05$
Name | Distance from Angelina
Ajay | 27.02
Mark | 35.01
Sara | 11.00
Zaira | 29.00
Sachin | 50.01
Rahul | 35.01
Pooja | 15.00
Smith | 10.05
Laxmi | 50.00
Michael | 10.05
With k = 3, the three nearest neighbours are Smith (10.05, Cricket), Michael
(10.05, Football) and Sara (11.00, Cricket). By majority voting, two of the
three play Cricket, so Angelina is classified into the Cricket group.
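A minimal Python sketch that reproduces this calculation (Ajay is omitted from the vote because his favourite sport is not listed in these notes; at distance 27.02 he is outside the 3 nearest neighbours anyway):

import numpy as np
from collections import Counter

# (name, age, gender, sport) rows from the table above.
data = [
    ("Mark", 40, 0, "Neither"), ("Sara", 16, 1, "Cricket"),
    ("Zaira", 34, 1, "Cricket"), ("Sachin", 55, 0, "Neither"),
    ("Rahul", 40, 0, "Cricket"), ("Pooja", 20, 1, "Neither"),
    ("Smith", 15, 0, "Cricket"), ("Laxmi", 55, 1, "Football"),
    ("Michael", 15, 0, "Football"),
]
angelina = np.array([5, 1])  # age 5, gender 1 (female)

# Euclidean distance from Angelina to every person, sorted ascending.
neighbours = sorted(
    (float(np.linalg.norm(np.array([age, gender]) - angelina)), name, sport)
    for name, age, gender, sport in data
)
k_nearest = neighbours[:3]
print(k_nearest)  # Smith and Michael at about 10.05, Sara at 11.0

# Majority vote among the 3 nearest neighbours' sports.
print(Counter(sport for _, _, sport in k_nearest).most_common(1)[0][0])  # Cricket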