02- KNN & Regression
• If we aren't sure which features to use for our machine learning model,
clustering can discover those patterns for us and figure out what stands
out in the data.
• Each cluster in the K-means algorithm is created in such a way that the
clusters are placed as far away from each other as possible. The data
points are allocated to the nearest centroid until no point is left
without a centroid.
Step-3
Assign each data point to the closest centroid based on the calculated
proximity value.
Step-4
Re-compute the centroids using the current cluster memberships. The
new centroid is simply the mean of the points in the cluster. Then
calculate the new proximity of the points with respect to the newly
found centroids.
Step-5
If a convergence criterion is not met, repeat Steps 2 to 4.
Step-6
Plot the elbow graph to select the best k value.
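Steps 2 to 5 translate almost directly into code. Below is a minimal NumPy sketch of that loop (the function name and iteration cap are my own choices, and it assumes no cluster ever ends up empty, since the mean of an empty cluster is undefined):

import numpy as np

def kmeans(X, centroids, max_iter=100):
    for _ in range(max_iter):
        # Step-2/Step-3: distance of every point to every centroid,
        # then assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: the new centroid is simply the mean of the points
        # currently assigned to that cluster.
        new_centroids = np.array([X[labels == k].mean(axis=0)
                                  for k in range(len(centroids))])
        # Step-5: convergence criterion - the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids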
Three Metrics:
1. Points – Euclidean distance
2. Vectors – Cosine distance
3. Sets – Jaccard distance
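For reference, SciPy provides all three metrics directly. In the sketch below the values are made-up examples, and sets are encoded as boolean membership vectors over a common universe, as scipy.spatial.distance expects:

from scipy.spatial.distance import euclidean, cosine, jaccard

# Points: straight-line distance between two coordinates.
print(euclidean([2, 10], [5, 8]))    # 3.61

# Vectors: 1 - cosine similarity; compares direction, not magnitude.
print(cosine([1, 0, 1], [2, 0, 2]))  # 0.0 (same direction)

# Sets: boolean membership vectors; fraction of disagreeing elements.
print(jaccard([True, True, False, True],
              [True, False, False, True]))  # 0.33 (differ on 1 of 3)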
K means Clustering Algorithm – Example
Iteration 1 (k = 3)
Step-2
Using the Euclidean distance method, calculate the distance of every data
point from the selected centroids (2,10), (5,8) and (1,2):

Data Point    (2,10)   (5,8)   (1,2)
A1 (2,10)     0        3.61    8.06
A2 (2,5)      5        4.24    3.16
A3 (8,4)      8.49     5       7.28
B1 (5,8)      3.61     0       7.21
B2 (7,5)      7.07     3.61    6.71
B3 (6,4)      7.21     4.12    5.39
C1 (1,2)      8.06     7.21    0
C2 (4,9)      2.24     1.41    7.61
K means Clustering Algorithm – Example (contd)
Step-3
Assign each data point to the centroid with the smallest Euclidean distance:

Data Point    (2,10)   (5,8)   (1,2)   Cluster
A1 (2,10)     0        3.61    8.06    1
A2 (2,5)      5        4.24    3.16    3
A3 (8,4)      8.49     5       7.28    2
B1 (5,8)      3.61     0       7.21    2
B2 (7,5)      7.07     3.61    6.71    2
B3 (6,4)      7.21     4.12    5.39    2
C1 (1,2)      8.06     7.21    0       3
C2 (4,9)      2.24     1.41    7.61    2

Step-4
Re-compute the centroid of each cluster as the mean of its member points:
1st cluster: no change, since only one point (2,10) falls in it.
2nd cluster: five points (8,4), (5,8), (7,5), (6,4), (4,9); new centroid =
((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6).
3rd cluster: two points (2,5), (1,2); new centroid = ((2+1)/2, (5+2)/2) = (1.5, 3.5).
Iteration 2 (k = 3)
Step-2 (repeat)
Again calculate the Euclidean distance of every data point from the newly
found centroids (2,10), (6,6) and (1.5,3.5).

Step-3 (repeat)
Assign each point to the closest new centroid, retaining the previous
cluster value for comparison:

Data Point    (2,10)   (6,6)   (1.5,3.5)   Cluster   New Cluster
A1 (2,10)     0        5.66    6.52        1         1
A2 (2,5)      5        4.12    1.58        3         3
A3 (8,4)      8.49     2.83    6.52        2         2
B1 (5,8)      3.61     2.24    5.7         2         2
B2 (7,5)      7.07     1.41    5.7         2         2
B3 (6,4)      7.21     2       4.53        2         2
C1 (1,2)      8.06     6.4     1.58        3         3
C2 (4,9)      2.24     3.61    6.04        2         1

Observe that point C2 has moved from cluster 2 to cluster 1, so the
clusters have not yet converged.
Iteration 3 (k = 3)
Step-4 (repeat): re-compute the centroids. Cluster 1 = {(2,10), (4,9)} →
(3, 9.5); cluster 2 = {(8,4), (5,8), (7,5), (6,4)} → (6.5, 5.25);
cluster 3 = {(2,5), (1,2)} → (1.5, 3.5).
Again calculate the Euclidean distance from the new centroids and re-assign:

Data Point    (3,9.5)   (6.5,5.25)   (1.5,3.5)   Cluster
B1 (5,8)      2.5       3.13         5.7         1
B2 (7,5)      6.02      0.56         5.7         2
B3 (6,4)      6.26      1.35         4.53        2
C1 (1,2)      7.76      6.39         1.58        3
C2 (4,9)      1.12      4.51         6.04        1

Point B1 now moves to cluster 1, so one more iteration is needed.

Iteration 4 (k = 3)
Re-computed centroids: cluster 1 = {(2,10), (5,8), (4,9)} → (3.67, 9);
cluster 2 = {(8,4), (7,5), (6,4)} → (7, 4.33); cluster 3 unchanged at (1.5, 3.5).

Data Point    (3.67,9)   (7,4.33)   (1.5,3.5)
B1 (5,8)      1.67       4.18       5.7
B2 (7,5)      5.21       0.67       5.7
B3 (6,4)      5.52       1.05       4.53
C1 (1,2)      7.49       6.44       1.58

No point changes cluster this time, so the algorithm has converged.
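For completeness, this worked example can be reproduced with scikit-learn (assuming it is installed) by seeding KMeans with the same initial centroids A1, B1 and C1; the printed centres should match the converged values above:

import numpy as np
from sklearn.cluster import KMeans

# The eight points of the worked example.
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]])
init = np.array([[2, 10], [5, 8], [1, 2]])  # A1, B1, C1 as seeds

km = KMeans(n_clusters=3, init=init, n_init=1).fit(X)
print(km.labels_)           # final cluster of each point
print(km.cluster_centers_)  # ≈ (3.67, 9), (7, 4.33), (1.5, 3.5)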
• When we see an elbow shape in the graph, we pick the K value where
the elbow is created; we call this the elbow point. Beyond the elbow
point, increasing the value of K does not lead to a significant
reduction in WCSS (the within-cluster sum of squares).
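Below is a sketch of how such an elbow graph could be produced for the example data with scikit-learn and matplotlib; KMeans exposes the WCSS of a fitted model through its inertia_ attribute, and the tested range of k is an arbitrary choice:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]])

wcss = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    wcss.append(km.inertia_)  # inertia_ is the WCSS for this k

plt.plot(range(1, 8), wcss, marker="o")
plt.xlabel("k"); plt.ylabel("WCSS")
plt.show()  # pick the k at the bend (elbow) of the curve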
Regression Model – Types
Regression is a technique for using data to identify relationships
among variables and to use these relationships to make predictions.
Its main types are:
1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
Simple Linear Regression – SLR
A simple linear regression refers to a model with just one explanatory
variable:
Y = β0 + β1X + ε
The goal of the linear regression algorithm is to estimate the values of
these coefficients (β0, β1, β2, …, βn) in such a way that the sum of
squared errors is minimized. This process is called the Ordinary Least
Squares (OLS) method. Whether least squares is optimal is a theoretical
issue that we do not address.
Simple Linear Regression – SLR (contd)
Simple linear regression fits a straight line through the data points so
as to minimise the error between the line and the points. It is one of
the simplest and most basic types of machine learning regression.
In practice, we rarely have just one explanatory variable, so we use
multiple rather than simple regression.
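Before moving on, a small illustration of the simple case: an OLS fit on made-up data where the true coefficients are β0 = 3 and β1 = 2 (the data, seed and noise level are all arbitrary choices):

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one explanatory variable, true β0 = 3, β1 = 2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1.0, size=50)

slr = LinearRegression().fit(X, y)  # OLS under the hood
print(slr.intercept_, slr.coef_)    # estimates of β0 and β1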
Multiple Linear Regression – MLR
Multiple Linear Regression (MLR) analyses the relationship of a dependent
outcome against multiple independent variables. It is an extension of
Simple Linear Regression (SLR) and assumes a linear relationship between
Y and the Xs. The goal is to estimate the linear equation coefficients
that best describe this relationship. The general multiple linear
regression model with K explanatory variables has the following form:
Y = β0 + β1X1 + ··· + βKXK + ε
fitted on n observations:
Y1 = β0 + β1X11 + ··· + βKXK1 + ε1
Y2 = β0 + β1X12 + ··· + βKXK2 + ε2
……
Yn = β0 + β1X1n + ··· + βKXKn + εn
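A corresponding sketch with K = 3 made-up explanatory variables (true coefficients chosen arbitrarily); the fitted intercept and coefficients should recover β0 and β1..βK:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with K = 3 explanatory variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta = np.array([2.0, -1.0, 0.5])                  # true β1..βK
y = 4.0 + X @ beta + rng.normal(0, 0.1, size=100)  # β0 = 4, plus noise ε

mlr = LinearRegression().fit(X, y)
print(mlr.intercept_)  # ≈ 4.0
print(mlr.coef_)       # ≈ [ 2.0 -1.0  0.5]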
Polynomial Regression
It is used to represent a non-linear relationship between dependent and
independent variables. It is a variant of the multiple linear regression
model, except that the best-fit line is curved rather than straight.
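One common way to implement this, sketched below with scikit-learn, is to expand the input into polynomial features and then fit an ordinary multiple linear regression on them (the degree and the data are illustrative choices):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up non-linear data: quadratic trend plus noise.
rng = np.random.default_rng(2)
X = np.linspace(-3, 3, 80).reshape(-1, 1)
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 0] ** 2 + rng.normal(0, 0.5, 80)

# Degree-2 features turn the problem into a multiple linear regression.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))  # prediction on the fitted curve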
Evaluating Model – Loss function
Both the cost function and the loss function are crucial concepts in
machine learning. The loss function calculates the error for a single
training example (data point); it measures how well or poorly the model
performed on one observation. The cost function, in contrast, is the
average of the loss function over the entire dataset. They are measured
using error metrics such as MSE and MAE.
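The distinction is easy to see in code; below is a minimal sketch with the squared-error loss and its averaged cost (MSE), using made-up numbers:

import numpy as np

def squared_loss(y_true, y_pred):
    # Loss: error of a single training example.
    return (y_true - y_pred) ** 2

def mse_cost(y_true, y_pred):
    # Cost: average of the loss over the entire dataset (here, MSE).
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])
print(squared_loss(y_true[0], y_pred[0]))  # 0.25 -> one observation
print(mse_cost(y_true, y_pred))            # 0.4167 -> whole dataset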
Gradient Descent
b = a − λ·∇f(a)
where b is the updated value of the parameter a and λ is the learning rate.

Gradient Descent (contd)
The error in the model can be different at different points, and our
objective is to find the quickest way to minimize it, to prevent
resource wastage.
A good way to make sure the gradient descent algorithm runs
properly is by plotting the cost function as the optimization
runs. We put the number of iterations on the x-axis and the
value of the cost function on the y-axis. We observe the cost
function after each iteration of gradient descent. The trajectory
of Cost function then provides us a way to spot how appropriate
the learning rate (λ) is. If the gradient descent algorithm is
working properly, the cost function should decrease after every
iteration.
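A toy sketch of this loop on f(a) = a², recording the cost at each iteration so it could be plotted as described (the function, the starting point and λ are arbitrary choices):

# Gradient descent on f(a) = a**2, so ∇f(a) = 2a.
lam = 0.1        # learning rate λ
a = 5.0          # starting point
history = []
for i in range(50):
    history.append(a ** 2)  # cost value at this iteration
    a = a - lam * (2 * a)   # update rule: b = a − λ·∇f(a)

# Plotting history against iteration number should give a curve
# that decreases after every iteration when λ is well chosen.
print(history[0], history[-1])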
[Figure: cost vs. number of iterations for a low, a high and a good learning rate]
Learning rate
The learning rate (λ) is decided by trial and error, and it plays a very
vital role in the convergence of the model. The number of iterations
gradient descent needs to converge can vary a lot: it can take 50
iterations, 60,000, or maybe even 3 million.
Generally we try learning rates (λ) ranging from low values, say 0.001,
0.003, 0.01, 0.03, 0.1, 0.3, 1, etc., and look at which one performs best.
An inappropriate learning rate causes problems:
If the learning rate is low, it may result in wasted time and resources:
a lower learning rate means more training time, and more training time
means increased cloud GPU costs.
A higher learning rate could result in a model that is not able to
predict anything accurately.
Class work – Let's solve for any three with both MAE and MSE.
If the slope is steep, convergence could be a problem, as the local
minimum could get overshot. If the slope is gradual, the model is
guaranteed to reach the local minimum, but it may take time.
Principle & Architecture of a Neural Network
Neural nets are a means of doing machine learning, in which a computer
learns to perform some task by analyzing training examples. They are
modeled loosely on the human brain and consist of thousands or even
millions of simple processing nodes that are densely interconnected. For
example, the deep neural net of Google Brain was reported to have over
100 billion artificial neurons in 2020, while OpenAI's GPT-3 contains
around 175 billion parameters. The inspiration for Artificial Neural
Networks comes from the human brain.
The artificial neural nets are organized into layers of nodes; data is
processed at every level of nodes and passed on to the next node.

[Figure: a single node. Inputs x1, x2, x3, …, xN are multiplied by
weights w1, w2, w3, …, wN and summed together with a bias b (+1 input)
to give the summed output Y; Y is compared with the threshold potential
TP (Y > TP or Y < TP) to decide the output y.]
Principle & Architecture of a Neural Network (contd)
In the figure on the previous slide, a node assigns a number known as a
"weight" to each of its incoming connections. When the network is active,
the node receives different inputs x1, x2, x3, …, xn over each of its
connections and multiplies each by the associated weight w1, w2, etc., as
shown in the figure. It then adds the resulting products together,
yielding a single number (yin). If that number is below a threshold value
(TP), i.e. yin < TP, the node passes no data to the next layer. If the
number exceeds the threshold value (yin > TP), the node "fires", passing
data to the next node in the hierarchy, and the process continues. The TP
can also sometimes be called an Activation function.
An Activation Function (ƒ) is a mathematical formula that helps the
neuron switch ON/OFF based on the value it generates for TP. The two
most common sigmoidal functions used are:
1. Binary sigmoidal function: this activation function squashes its
input to a value between 0 and 1.
   yin = ∑ xi·wi
   Y = ƒ(yin)
   ƒ(x) = sigm(x) = 1 / (1 + exp(−x))
2. Bipolar sigmoidal function: this activation function squashes its
input to a value between −1 and 1.
   ƒ(x) = sigm(x) = (1 − exp(−x)) / (1 + exp(−x))
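Both functions are one-liners in NumPy. Here is a sketch of a single node applying them to a weighted input sum (the input and weight values are made up):

import numpy as np

def binary_sigmoid(x):
    # Squashes its input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    # Squashes its input into the range (-1, 1).
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

# One node: weighted sum of the inputs, then the activation.
x = np.array([0.5, -1.0, 2.0])  # inputs x1..x3
w = np.array([0.8, 0.2, -0.5])  # weights w1..w3
y_in = np.sum(x * w)            # yin = Σ xi·wi
print(binary_sigmoid(y_in), bipolar_sigmoid(y_in))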
Types of Neural Network
Neural networks can be described in many ways. They can be classified
depending on their structure, data flow, the neurons used and their
density, the layers and their depth, activation filters, etc. Following
are some of the most important neural network structures popularly in use.
1. Feed-forward neural networks
These are the simplest form of neural networks, where input data travels
in one direction only, passing through artificial neural nodes (cells)
and exiting through output nodes (cells). The network may or may not have
hidden node layers; the number of layers depends on the complexity of the
function.
In a two-layer feed-forward neural network, the input layer takes in the
data and forwards it to the nodes of the hidden layer. Each node in the
hidden layer serves as a function that integrates its inputs and
transmits the output of the function to the node in the next layer. There
are no cyclic computations. The neuron is activated if its value is above
the threshold (usually 0) and produces 1 as output; the neuron is not
activated if its value is below the threshold, which is considered as −1.
This type of ANN computational model is used in technologies such as
facial recognition and computer vision.
2. Recurrent neural networks (RNNs)
This neural network starts with front propagation as in a feed-forward
network, but then saves the output of the processing nodes and feeds the
result back into the model. Thus these networks remember processed
information and reuse it in the future. If the network's prediction is
incorrect, the system self-learns and continues working toward the
correct prediction during the back-propagation stage. This type of ANN is
frequently used in text-to-speech conversion.
3. Convolutional neural networks (CNNs)
The CNN model is particularly popular in image recognition, facial
recognition, text digitization and NLP application areas, and CNNs are
among the most popular models used today. A convolutional neural network
contains a three-dimensional arrangement of neurons instead of the
standard two-dimensional array. The first layer is called a convolutional
layer. These convolutional layers create feature maps that record a
region of the image that is ultimately broken into rectangles and sent
out for nonlinear processing. The inputs are multiplied by weights and
fed to an activation function, which is generally nonlinear. Other use
cases include paraphrase detection, signal processing and image
classification.
4. Deep Neural Networks (DNNs)
A deep neural network is a much more complicated system than a "simple"
neural network. In a deep feed-forward neural network there can be many
hidden layers between the input layer and the output layer that model the
complex relations between the inputs and the outputs. They apply multiple
levels of nonlinear operations, go beyond the raw input data, and can
learn from previous experience. They aim at learning feature hierarchies,
where features at higher levels of the hierarchy are formed using the
features at lower levels.
Deep neural networks can recognize voice commands, identify voices,
recognize sounds and graphics, and do much more than a shallow neural
network. Deep learning networks utilize "Big Data" along with algorithms
in order to solve a problem, and these deep neural networks can solve
problems with limited or no human input.
Backpropagation (class work – please prepare this topic on your own)
Backpropagation is an optimization algorithm used in training artificial
neural networks. It helps the model update its weights by minimizing the
error between predicted and actual outputs using Gradient Descent.
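As a rough illustration of the idea (not the full algorithm), here is one backpropagation step for a single sigmoid neuron with squared loss; all values are made up:

import numpy as np

# Illustrative data: two inputs, one sigmoid neuron, squared loss.
x = np.array([0.5, -1.0])  # inputs
w = np.array([0.3, 0.7])   # current weights
target, lam = 1.0, 0.5     # desired output and learning rate λ

y = 1.0 / (1.0 + np.exp(-(w @ x)))  # forward pass (sigmoid neuron)
error = y - target                  # dLoss/dy for loss = 0.5*(y - target)**2
grad_w = error * y * (1.0 - y) * x  # chain rule back to the weights
w = w - lam * grad_w                # gradient-descent weight update
print(w)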
Thanks