02- KNN & Regression

Unit # 1

KNN, Regression, Loss Function, Gradient Descent, Neural Network, Back Propagation



Unsupervised Learning: Clustering

Machine learning algorithms fall into three broad families: Supervised Learning algorithms, Unsupervised Learning algorithms, and Reinforcement Learning algorithms.

Supervised Machine Learning

• Supervised learning is a type of machine learning in which machines are trained on well "labeled" training data and, on the basis of that data, predict the output. Labeled data means the input data is already tagged with the correct output. It is therefore often said that in supervised learning, a supervisor teaches the machine to predict the output correctly.
• Supervised learning can be used in cases where we know the inputs as well as the corresponding outputs.
• Supervised learning is used for two types of problems: Classification and Regression.
• Supervised learning is not close to true Artificial Intelligence, as we must first train the model on each kind of data before it can predict the correct output.
Unsupervised Learning: Clustering (contd.)

Unsupervised Machine Learning
• Unsupervised learning is another machine learning method, in which patterns are inferred from unlabeled input data. The goal of unsupervised learning is to find the structure and patterns in the input data.
• Unsupervised learning does not need any supervision. Instead, it finds patterns in the data on its own.
• Unsupervised learning can be used in cases where we have only input data and no corresponding output data.
• Unsupervised learning is used for two types of problems: Clustering and Association.
• Unsupervised learning is closer to true Artificial Intelligence, as it learns similarly to how a child learns daily routine things through experience.
Why Unsupervised Machine Learning?
1. A large amount of real-world data comes unannotated. Annotating large datasets is very costly, because it is expensive to have an expert label every data point; hence we can label only a few examples manually.
2. There may be cases where we do not even know how many or which classes the data is divided into. Example: Data Mining.
3. We may want to use clustering to gain some insight into the structure of the data before designing a classifier.
Clustering
Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way". A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
Why Clustering?

• We might want to use clustering for anomaly detection, or to find outliers in our data. It helps by finding groups of clusters and showing the boundaries that determine whether a data point is an outlier or not.

• If we aren't sure which features to use for our machine learning model, clustering can discover patterns that help us figure out what stands out in the data.

• Clustering is especially useful for exploring data we know nothing about. It might take some time to figure out which type of clustering algorithm works best, but when we do, we get invaluable insight into the data. We might find connections we never would have thought of.
Popular Clustering Algorithms
• K-means clustering algorithm
• DBSCAN clustering algorithm (Density-Based Spatial Clustering of Applications with Noise)
• GMM algorithm (Gaussian Mixture Model algorithm)
• BIRCH algorithm (Balanced Iterative Reducing and Clustering using Hierarchies)
• Affinity Propagation clustering algorithm
• Mean-Shift clustering algorithm
• OPTICS algorithm
• Spectral Clustering



K-means Clustering Algorithm
• The most commonly used clustering algorithm is K-means. It is a centroid-based algorithm, and it is said to be the simplest unsupervised learning algorithm. Here, K defines the number of predefined clusters that need to be generated.

• Clusters in the K-means algorithm are created in such a way that they are placed as far apart from each other as possible. Data points are allocated to the nearest centroid until no point is left without a centroid.

• As long as the data is numerical, it can be analyzed with the K-means algorithm. In addition to being easy to understand, this algorithm is also much faster than many other clustering algorithms.

• The drawback of this algorithm is that it does not cope well with non-linear data, data with many outliers, or categorical data.


K-means Clustering Algorithm - Steps
Step-1
Choose k (random) data points (seeds) to be the initial centroids (cluster centers).

Step-2
Calculate the proximity of the data points with respect to the selected centroids, using a proximity measure method.

Step-3
Assign each data point to the closest centroid based on the calculated proximity value.

Step-4
Re-compute the centroids using the current cluster memberships: the new centroid is simply the mean of the points in the cluster. Then calculate the new proximity of the points with respect to the newly found centroids.

Step-5
If the convergence criterion is not met, repeat steps 2 to 4.

Step-6
Plot an elbow graph to select the best k value (covered below). A minimal implementation of steps 1-5 is sketched right after this list.
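The sketch below is one way to render steps 1-5 in NumPy. The random seeding, the empty-cluster guard, and the "no centroid moved" stopping test are implementation assumptions, not details given on the slides.

```python
# Minimal K-means sketch, assuming Euclidean distance and random data-point seeding.
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step-1: choose k random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Step-2: Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Step-3: assign each point to its closest centroid.
        labels = dists.argmin(axis=1)
        # Step-4: the new centroid is simply the mean of its member points
        # (an empty cluster keeps its old centroid -- an assumption made here).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step-5: stop once no centroid moves, i.e. no re-assignments follow.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```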

Convergence (stopping) Criterion
• No (or minimal) re-assignment of data points to different clusters is possible.

Proximity Metric
• For clustering, we need to define a proximity measure for two data points. Proximity here means how similar or dissimilar the samples are with respect to each other.

Similarity measure S(xi, xk): large if xi and xk are similar.
Dissimilarity (or distance) measure D(xi, xk): small if xi and xk are similar.

Three metrics:
1. Points – Euclidean distance
2. Vectors – Cosine distance
3. Sets – Jaccard distance


Formulas
Points – Euclidean distance
The straight-line distance between two points. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is

E.D. = √((y1 − y2)² + (x1 − x2)²)

Vectors – Cosine distance
In cosine similarity, data objects in a dataset are treated as vectors. The formula to find the cosine similarity between two vectors is

cos(x, y) = x · y / (||x|| * ||y||)

where
• x · y = dot product of the vectors x and y
• ||x|| and ||y|| = length (magnitude) of the two vectors x and y
• ||x|| * ||y|| = ordinary product of the two magnitudes

Sets – Jaccard distance
The Jaccard Index between two sets is J(A, B) = |A ∩ B| / |A ∪ B|, and the Jaccard Distance is 1 − J(A, B).
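As a quick illustration, all three metrics take only a few lines of Python; the sample points, vectors, and sets below are invented for demonstration.

```python
# Sketch of the three proximity metrics from the slide above.
import numpy as np

def euclidean(p1, p2):
    # Straight-line distance between two points.
    return np.sqrt(np.sum((np.asarray(p1) - np.asarray(p2)) ** 2))

def cosine_similarity(x, y):
    # Dot product divided by the product of the magnitudes.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))

def jaccard_distance(a, b):
    # 1 - |intersection| / |union|.
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

print(euclidean((2, 10), (5, 8)))              # 3.61, matches the example table below
print(cosine_similarity([1, 2], [2, 4]))       # 1.0 (parallel vectors)
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```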



K-means Clustering Algorithm - Example
Step-1
Choose k (random) data points (seeds) to be the initial centroids (cluster centers).

In the example we choose three centroids randomly, say the points A1, B1, and C1, and prepare the table shown below.

Data points, iteration 1 (k = 3), with initial centroids A1 (2, 10), B1 (5, 8), and C1 (1, 2):

Point   (x, y)
A1      (2, 10)
A2      (2, 5)
A3      (8, 4)
B1      (5, 8)
B2      (7, 5)
B3      (6, 4)
C1      (1, 2)
C2      (4, 9)

Step-2
Using the Euclidean distance method, calculate the distance of every point from the three selected centroids, with the formula given on the previous slide. Doing this we get:

Point   (x, y)    d(2,10)  d(5,8)  d(1,2)
A1      (2, 10)   0        3.61    8.06
A2      (2, 5)    5        4.24    3.16
A3      (8, 4)    8.49     5       7.28
B1      (5, 8)    3.61     0       7.21
B2      (7, 5)    7.07     3.61    6.71
B3      (6, 4)    7.21     4.12    5.39
C1      (1, 2)    8.06     7.21    0
C2      (4, 9)    2.24     1.41    7.61
K-means Clustering Algorithm – Example (contd.)
Step-3
Assign each data point to a centroid based on the calculated Euclidean distance:

Point   (x, y)    d(2,10)  d(5,8)  d(1,2)   Cluster
A1      (2, 10)   0        3.61    8.06     1
A2      (2, 5)    5        4.24    3.16     3
A3      (8, 4)    8.49     5       7.28     2
B1      (5, 8)    3.61     0       7.21     2
B2      (7, 5)    7.07     3.61    6.71     2
B3      (6, 4)    7.21     4.12    5.39     2
C1      (1, 2)    8.06     7.21    0        3
C2      (4, 9)    2.24     1.41    7.61     2

Step-2 (repeat)
Re-compute the cluster centroids. The calculations are:

1st cluster: no change, since only one point, (2, 10), falls in it.
2nd cluster: total 5 points: (8, 4), (5, 8), (7, 5), (6, 4), (4, 9).
New 2nd centroid: ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6)
3rd cluster: total 2 points: (2, 5), (1, 2).
New 3rd centroid: ((2+1)/2, (5+2)/2) = (1.5, 3.5)

Hence the new table becomes as shown below.
Again calculate the Euclidean distance of every point from the newly found centroids (2, 10), (6, 6), and (1.5, 3.5):

Point   (x, y)    d(2,10)  d(6,6)  d(1.5,3.5)
A1      (2, 10)   0        5.66    6.52
A2      (2, 5)    5        4.12    1.58
A3      (8, 4)    8.49     2.83    6.52
B1      (5, 8)    3.61     2.24    5.70
B2      (7, 5)    7.07     1.41    5.70
B3      (6, 4)    7.21     2.00    4.53
C1      (1, 2)    8.06     6.40    1.58
C2      (4, 9)    2.24     3.61    6.04
Step-3 (repeat)
Assign points to the new clusters while retaining the previous cluster assignments for comparison. Observe that point C2 shifted from cluster 2 to the new cluster 1.

Point   (x, y)    d(2,10)  d(6,6)  d(1.5,3.5)   Cluster   New Cluster
A1      (2, 10)   0        5.66    6.52         1         1
A2      (2, 5)    5        4.12    1.58         3         3
A3      (8, 4)    8.49     2.83    6.52         2         2
B1      (5, 8)    3.61     2.24    5.70         2         2
B2      (7, 5)    7.07     1.41    5.70         2         2
B3      (6, 4)    7.21     2.00    4.53         2         2
C1      (1, 2)    8.06     6.40    1.58         3         3
C2      (4, 9)    2.24     3.61    6.04         2         1
Step-2 (repeat)
Re-compute the cluster centroids and calculate new Euclidean distances from the new centroids. The calculations are:

1st cluster: total 2 points: (2, 10), (4, 9).
New 1st centroid: ((2+4)/2, (10+9)/2) = (3, 9.5)
2nd cluster: total 4 points: (8, 4), (5, 8), (7, 5), (6, 4).
New 2nd centroid: ((8+5+7+6)/4, (4+8+5+4)/4) = (6.5, 5.25)
3rd cluster: total 2 points: (2, 5), (1, 2).
New 3rd centroid: ((2+1)/2, (5+2)/2) = (1.5, 3.5)

Hence the new centroids are (3, 9.5), (6.5, 5.25), and (1.5, 3.5), and the new table becomes as shown below.
Step-2 (contd.)
Calculate the Euclidean distance of every point from the new centroids (3, 9.5), (6.5, 5.25), and (1.5, 3.5):

Point   (x, y)    d(3,9.5)  d(6.5,5.25)  d(1.5,3.5)   Cluster
A1      (2, 10)   1.12      6.54         6.52         1
A2      (2, 5)    4.61      4.51         1.58         3
A3      (8, 4)    7.43      1.95         6.52         2
B1      (5, 8)    2.50      3.13         5.70         2
B2      (7, 5)    6.02      0.56         5.70         2
B3      (6, 4)    6.26      1.35         4.53         2
C1      (1, 2)    7.76      6.39         1.58         3
C2      (4, 9)    1.12      4.51         6.04         1

Step-3 (repeat)
Assign points to the new clusters while retaining the previous cluster assignments. Observe that point B1 now shifts from cluster 2 to the new cluster 1.

Point   (x, y)    d(3,9.5)  d(6.5,5.25)  d(1.5,3.5)   Cluster   New Cluster
A1      (2, 10)   1.12      6.54         6.52         1         1
A2      (2, 5)    4.61      4.51         1.58         3         3
A3      (8, 4)    7.43      1.95         6.52         2         2
B1      (5, 8)    2.50      3.13         5.70         2         1
B2      (7, 5)    6.02      0.56         5.70         2         2
B3      (6, 4)    6.26      1.35         4.53         2         2
C1      (1, 2)    7.76      6.39         1.58         3         3
C2      (4, 9)    1.12      4.51         6.04         1         1

Since a data point has again moved from one cluster to another, we need to re-compute the centroids, calculate the Euclidean distances, and assign the data points once more.
The new centroids are:

1st cluster: (2, 10), (4, 9), (5, 8) → ((2+4+5)/3, (10+9+8)/3) = (3.67, 9)
2nd cluster: (8, 4), (7, 5), (6, 4) → ((8+7+6)/3, (4+5+4)/3) = (7, 4.33)
3rd cluster: (2, 5), (1, 2) → (1.5, 3.5), unchanged

Euclidean distances from the new centroids (3.67, 9), (7, 4.33), and (1.5, 3.5):

Point   (x, y)    d(3.67,9)  d(7,4.33)  d(1.5,3.5)
A1      (2, 10)   1.94       7.56       6.52
A2      (2, 5)    4.33       5.04       1.58
A3      (8, 4)    6.62       1.05       6.52
B1      (5, 8)    1.67       4.18       5.70
B2      (7, 5)    5.21       0.67       5.70
B3      (6, 4)    5.52       1.05       4.53
C1      (1, 2)    7.49       6.44       1.58
C2      (4, 9)    0.33       5.55       6.04
Step-3 (repeat)
Assign points to the clusters once more:

Point   (x, y)    d(3.67,9)  d(7,4.33)  d(1.5,3.5)   Cluster   New Cluster
A1      (2, 10)   1.94       7.56       6.52         1         1
A2      (2, 5)    4.33       5.04       1.58         3         3
A3      (8, 4)    6.62       1.05       6.52         2         2
B1      (5, 8)    1.67       4.18       5.70         1         1
B2      (7, 5)    5.21       0.67       5.70         2         2
B3      (6, 4)    5.52       1.05       4.53         2         2
C1      (1, 2)    7.49       6.44       1.58         3         3
C2      (4, 9)    0.33       5.55       6.04         1         1

No point has changed its cluster, so the convergence criterion is met and the algorithm stops, with final centroids (3.67, 9), (7, 4.33), and (1.5, 3.5).
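As a sanity check on this worked example, the same eight points can be run through scikit-learn's KMeans, seeding it with A1, B1, and C1 exactly as in Step-1 (the availability of scikit-learn is an assumption here):

```python
# Verifying the worked example; the explicit init array reproduces the
# hand-chosen seeds A1, B1, C1, so a single run suffices (n_init=1).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])
init = np.array([[2, 10], [5, 8], [1, 2]])   # seeds A1, B1, C1

km = KMeans(n_clusters=3, init=init, n_init=1).fit(X)
print(km.cluster_centers_)   # approx. (3.67, 9), (7, 4.33), (1.5, 3.5)
print(km.labels_)            # final cluster of each point
```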
Elbow method to determine the value of k

• K-means clustering is one of the most used clustering algorithms in the field of data science. To successfully implement the K-means algorithm, we need to identify the number of clusters we want to create. This number of clusters is a hyperparameter: it must be defined before running the model.

• The elbow method is a graphical method for finding the optimal K value in a K-means clustering algorithm. The elbow graph shows the within-cluster-sum-of-squares (WCSS) values on the y-axis against the different values of K on the x-axis. The optimal K value is the point at which the graph forms an elbow.

• When we see an elbow shape in the graph, we pick the K value at which the elbow is created. We can call this the elbow point. Beyond the elbow point, increasing the value of K does not lead to a significant reduction in WCSS. A sketch of this procedure follows.
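The sketch below assumes scikit-learn and Matplotlib are available; KMeans exposes WCSS as its `inertia_` attribute, and the eight example points from the worked example stand in for real data.

```python
# Elbow method sketch: plot WCSS against K and look for the bend.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])

ks = range(1, 8)
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, wcss, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()   # pick the K where the curve bends (the elbow point)
```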



Regression

 It is a technique for using data to identify relationships among variables, and for using these relationships to make predictions when applied to new and unseen data.
 Common uses of regression include:
 Predicting continuous outcomes like house prices or stocks
 Predicting the success of future retail sales or marketing
 Predicting customer or user trends, such as on streaming services
 Predicting interest rates or stock prices from a variety of factors
 Predicting a patient's disease condition
What is a Regression Model?

 A model provides a function that describes the relationship between one or more independent variables and a response, dependent, or target variable.
 Y = mX + C
 Sale price of a house? Candidate features: sq. ft. area, no. of bedrooms, no. of bathrooms, floor level. Which combination should we use?

Selling Price = (sq.ft.) + (no. bedrooms) + (no. bath) + (floor level)
Selling Price = (sq.ft.) − (no. bedrooms) − (no. bath) − (floor level)
Selling Price = (sq.ft.) − 10*(no. bedrooms) − (no. bath / 100) − (floor level)*(−25/89)
Model (contd.)
Selling Price = β1 (sq.ft.) + β2 (no. bedrooms) + β3 (no. bath) + β4 (floor level)

 What are these β's?
 Can they be margins? Compensators? Adjustment factors?

 Selling Price = β1 (sq.ft.) + β2 (no. bedrooms) + β3 (no. bath) + β4 (floor level) + error
 Selling Price = β0 + β1 (sq.ft.) + β2 (no. bedrooms) + β3 (no. bath) + β4 (floor level) + error
???
 Selling Price = Y = β0 + β1X1 + ... + βKXK + ε
 Selling Price (Y) = ƒ{ ∑ βkXk }
 A regression model specifies a relation between a dependent variable Y and certain explanatory variables X1, ..., XK.
What is a Regression Model? (contd.)

 A model provides a function of summing:
 Sale Price = X1 + X2 + X3 + X4 ...
 The function can be empirical
 The function can be a polynomial
 Some relationship? Compare familiar series expansions and formulas:

Binomial expansion: (1 + x)^n = 1 + nx/1! + n(n−1)x²/2! + ...
Binomial theorem: (x + a)^n = Σ (k=0 to n) C(n, k) x^k a^(n−k)
Fourier series: f(x) = a0 + Σ (n=1 to ∞) [ an cos(nπx/L) + bn sin(nπx/L) ]
Pythagorean equation: a² + b² = c²
Area of a circle: A = πr²
Regression Model Types
 It is a technique for using data to identify relationships among variables and use these relationships to make predictions.

 Broad categories of regression are:
1. Simple Linear Regression (SLR)
2. Multiple Linear Regression (MLR)
3. Polynomial Regression
Simple Linear Regression (SLR)
 A simple linear regression refers to a model with just one explanatory variable.
 Y = β0 + β1X + ε
 The goal of the linear regression algorithm is to estimate the values of the coefficients (β0, β1, β2, ..., βn) in such a way that the sum of squared errors is minimized. This process is called the Ordinary Least Squares (OLS) method.
 Why least squares is optimal is a theoretical issue that we do not address here. A small worked sketch follows.
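The sketch below uses the closed-form OLS estimates b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄; the house-size data is invented purely for illustration.

```python
# Minimal OLS sketch for Y = b0 + b1*X with one explanatory variable.
import numpy as np

X = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)  # sq. ft. (invented)
Y = np.array([200, 280, 370, 450, 520], dtype=float)       # price, in thousands

# Closed-form least-squares estimates of the slope and intercept.
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

print(b0, b1)           # intercept and slope
print(b0 + b1 * 1800)   # prediction for an unseen 1800 sq. ft. house
```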
Simple Linear Regression (SLR) (contd.)
 A simple linear regression fits a straight line through the data points so as to minimize the error between the line and the points.
 It is one of the most simple and basic types of machine learning regression.
 In practice, we rarely have just one explanatory variable, so we use multiple rather than simple regression.
Multiple Linear Regression (MLR)
 Multiple Linear Regression (MLR) analyses the relationship of a dependent outcome against multiple independent variables. It is an extension of Simple Linear Regression (SLR) and assumes a linear relationship between Y and the X's.
 The goal is to estimate the linear equation coefficients that best describe this relationship.
 The general multiple linear regression model with K explanatory variables has the form
 Y = β0 + β1X1 + ... + βKXK + ε
 over the n observations:
 { (Y1, X11, ..., XK1), ε1
   (Y2, X12, ..., XK2), ε2
   ...
   (Yn, X1n, ..., XKn), εn }
 A fitting sketch in code follows.
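One way to estimate the β's is NumPy's least-squares solver; the feature matrix (sq. ft., bedrooms, baths, floor level) and the prices below are made-up numbers for illustration.

```python
# MLR sketch: solve for [b0, b1, ..., bK] by least squares.
import numpy as np

X = np.array([[1000, 2, 1, 1],
              [1500, 3, 2, 2],
              [2000, 3, 2, 1],
              [2500, 4, 3, 3],
              [3000, 4, 3, 2]], dtype=float)   # sq.ft., bedrooms, baths, floor
Y = np.array([200, 285, 360, 460, 525], dtype=float)  # prices, in thousands

# Prepend a column of ones so beta[0] plays the role of the intercept b0.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

print(beta)           # [b0, b1 (sq.ft.), b2 (bedrooms), b3 (baths), b4 (floor)]
print(A @ beta - Y)   # residuals (the error terms epsilon)
```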
Polynomial Regression
 It is used to represent a non-linear relationship between the dependent and independent variables. It is a variant of the multiple linear regression model, except that the best-fit line is curved rather than straight. A short sketch follows.
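A small sketch of a degree-2 polynomial fit with `np.polyfit`; the noisy quadratic data is an assumption for illustration.

```python
# Polynomial regression sketch: fit a curved (quadratic) best-fit line.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 2 * x**2 - x + 1 + rng.normal(0, 1, size=x.shape)  # quadratic plus noise

coeffs = np.polyfit(x, y, deg=2)   # least-squares fit of a degree-2 polynomial
print(coeffs)                      # approx. [2, -1, 1]
print(np.polyval(coeffs, 1.5))     # prediction at x = 1.5
```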
Evaluating the Model – Loss function
 Both the cost function and the loss function are crucial concepts in machine learning. The loss function calculates the error for a single training example (or data point). It measures how well or poorly the model performed on one observation.
 The cost function, on the other hand, is the average of the loss function over the entire dataset.
 They are measured with the following metrics.

 Mean Absolute Error (MAE): The predicted values are subtracted from the actual values to get the errors. The absolute values of the errors are summed and their mean is calculated. This metric gives a notion of the overall error of the model's predictions: the smaller (closer to 0), the better.
Evaluating the Model (contd.)
 Mean Squared Error (MSE): It is similar to the MAE metric, but it squares the errors instead of taking their absolute values.

 Root Mean Squared Error (RMSE): It is similar to the MSE metric, but it finally takes the square root of the value, so as to scale it back to the same units as the data.
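The three metrics in code, with invented y_true / y_pred values:

```python
# MAE, MSE, and RMSE on a toy set of actual vs. predicted values.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mae  = np.mean(np.abs(errors))   # Mean Absolute Error
mse  = np.mean(errors ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)              # Root MSE: back in the units of the data
print(mae, mse, rmse)
```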
Gradient Descent
 Gradient Descent is an optimization algorithm used for minimizing the cost function (error) of the model. We start by defining initial parameter values, and from there the gradient descent algorithm iteratively adjusts the values so as to minimize the given cost function. We can think of a gradient as the slope of a function; in mathematical terms, the gradient is the vector of partial derivatives with respect to the inputs. The higher the gradient, the steeper the slope and the faster a model can learn.
 The equation below describes the gradient descent algorithm: b is the next position during the descent, while a represents the current position. The factor λ is the learning rate, and the gradient term ∇f(a) (the partial derivatives) gives the direction of steepest ascent, so we step against it:

b = a − λ·∇f(a)
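A toy sketch of this update rule on the one-dimensional cost f(a) = (a − 4)², whose gradient is 2(a − 4); the cost function, starting point, and learning rate are assumptions for illustration.

```python
# Gradient descent sketch: repeatedly apply b = a - lr * grad_f(a).
def grad_f(a):
    return 2 * (a - 4)      # derivative of the cost f(a) = (a - 4)^2

a = 0.0                     # initial parameter value
lr = 0.1                    # learning rate (lambda)
for i in range(50):
    a = a - lr * grad_f(a)  # step down the steepest slope
print(a)                    # converges toward the minimum at a = 4
```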
Gradient Descent (contd.)
 The error in the model can differ at different points, and our objective is to find the quickest way to minimize it, to prevent wasting resources.
 A good way to make sure the gradient descent algorithm runs properly is to plot the cost function as the optimization runs: we put the number of iterations on the x-axis and the value of the cost function on the y-axis, and observe the cost function after each iteration of gradient descent. The trajectory of the cost function then gives us a way to judge how appropriate the learning rate (λ) is. If the gradient descent algorithm is working properly, the cost function should decrease after every iteration.
[Plot: cost function vs. number of iterations for a low, a high, and a good learning rate.]
Learning rate
 The learning rate (λ) is decided by trial and error, and it plays a vital role in the convergence of the model. The number of iterations gradient descent needs to converge can vary a lot: it can take 50 iterations, 60,000, or maybe even 3 million.
 Generally, candidate learning rates (λ) are chosen from low values such as 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, etc., and we look at which one performs best.
 An inappropriate learning rate can cause problems:
 If the learning rate is too low, it may result in wasted time and resources: a lower learning rate means more training time, and more time results in increased cloud GPU costs.
 A higher rate could result in a model that is not able to predict anything accurately.

 A desirable learning rate is one that is low enough that the network converges to something useful, but high enough that it can be trained in a reasonable amount of time.
Effect of the Cost function

Classwork – let us solve both MAE and MSE for any three examples.

[Two cost-curve figures: in one, the slope is steep, so convergence could be a problem as the local minimum could be overshot; in the other, the slope is gradual, so the model is guaranteed to reach the local minimum but may take time.]
Principle & Architecture of a Neural Network
Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. They are modeled loosely on the human brain and consist of thousands or even millions of simple processing nodes that are densely interconnected. For example, the deep neural net of Google Brain was reported to have over 100 billion artificial neurons in 2020, while OpenAI's GPT-3 contains around 175 billion parameters.

The inspiration for artificial neural networks comes from the human brain. Artificial neural nets are organized into layers of nodes; data is processed at every level and passed on to the next node.

[Figure: a single node. Inputs x1, x2, ..., xN arrive on weighted connections w1, w2, ..., wN, together with a bias b (a +1 input). The node sums the weighted inputs into Y and compares Y against a threshold potential TP (Y > TP or Y < TP).]
Principle & Architecture of a Neural Network (contd.)
In the figure on the previous slide, to each of its incoming connections a node assigns a number known as a "weight". When the network is active, the node receives different inputs x1, x2, x3, ..., xn over its connections and multiplies each by the associated weight w1, w2, etc., as shown in the figure. It then adds the resulting products together, yielding a single number yin. If that number is below a threshold value TP (yin < TP), the node passes no data to the next layer. If the number exceeds the threshold value (yin > TP), the node "fires" the next node in the hierarchy, and the process continues. The TP can also sometimes be realised as an Activation function.

An Activation Function (ƒ) is a mathematical formula that helps the neuron switch ON/OFF based on the value it generates:

yin = ∑ xi·wi
Y = ƒ(yin)

The two most common sigmoidal activation functions used are:

1. Binary sigmoidal function: squashes the input to between 0 and 1.
F(x) = sigm(x) = 1 / (1 + exp(−x))

2. Bipolar sigmoidal function: squashes the input to between −1 and 1.
F(x) = sigm(x) = (1 − exp(−x)) / (1 + exp(−x))
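One neuron's forward pass in code, under assumed inputs, weights, and bias:

```python
# Single-neuron sketch: weighted sum plus bias, then a sigmoidal activation.
import numpy as np

def binary_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))              # output in (0, 1)

def bipolar_sigmoid(x):
    return (1 - np.exp(-x)) / (1 + np.exp(-x))   # output in (-1, 1)

x = np.array([0.5, -1.0, 2.0])   # inputs x1..x3 (invented)
w = np.array([0.8, 0.2, -0.5])   # weights w1..w3 (invented)
b = 0.1                          # bias

y_in = np.dot(x, w) + b          # y_in = sum(x_i * w_i) + b
print(binary_sigmoid(y_in), bipolar_sigmoid(y_in))
```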
Types of Neural Network
Neural networks can be described in many ways. They can be classified by their structure, data flow, the neurons used and their density, the layers and their depth, activation filters, etc. The following are some of the most important neural network structures in popular use.

1. Feed-forward neural networks
These are the simplest form of neural networks, where input data travels in one direction only, passing through artificial neural nodes (cells) and exiting through output nodes (cells). The network may or may not have hidden node layers; the number of layers depends on the complexity of the function.
In a two-layer feed-forward neural network, the input layer takes in the data and forwards it to the nodes of the hidden layer. Each node in the hidden layer serves as a function that integrates its inputs and transmits the output of that function to the node in the next layer. There are no cyclic computations. A neuron is activated if its value is above the threshold (usually 0), in which case it produces 1 as an output; it is not activated if it is below the threshold, which is considered as −1.
This type of ANN computational model is used in technologies such as facial recognition and computer vision.
2. Recurrent neural networks (RNNs)
This neural network starts with front propagation, as in a feed-forward network, but then saves the output of its processing nodes and feeds the result back into the model. These networks thus remember processed information so they can reuse it in the future. If the network's prediction is incorrect, the system self-learns and continues working toward the correct prediction during the back propagation stage. This type of ANN is frequently used in text-to-speech conversion.

3. Convolutional neural networks (CNNs)
The CNN model is particularly popular in the image recognition, facial recognition, text digitization, and NLP application areas, and CNNs are among the most popular models used today. A convolutional neural network contains a three-dimensional arrangement of neurons instead of the standard two-dimensional array. The first layer is called a convolutional layer. These convolutional layers create feature maps that record a region of the image, which is ultimately broken into rectangles and sent out for nonlinear processing. The inputs are multiplied by weights and fed to an activation function, which is generally nonlinear. Other use cases include paraphrase detection, signal processing, and image classification.
4. Deep Neural Networks (DNNs)
A deep neural network is a much more complicated system than a "simple" neural network. In a deep feed-forward neural network there can be many hidden layers between the input layer and the output layer that model the complex relations between the inputs and the outputs. DNNs have multiple levels of nonlinear operations; they go beyond the raw input data and can learn from previous experience. They aim at learning feature hierarchies, where features at higher levels of the hierarchy are formed using the features at lower levels.
Deep neural networks can recognize voice commands, identify voices, recognize sounds and graphics, and do much more than a shallow neural network. Deep learning networks utilize "Big Data" along with algorithms in order to solve problems, and these deep neural networks can solve problems with limited or no human input.


Neural Network Training Methods
Neural network training is the process of teaching a neural network to perform a task. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs. There are many methods used to modify the weights. These methods are called learning rules, which are simply algorithms or equations. Backpropagation (Backward Propagation of Errors) is an optimization algorithm used in training artificial neural networks. It helps the model update its weights by minimizing the error between predicted and actual outputs using Gradient Descent.

Please prepare your own content for the Backpropagation topic.
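As a starting point for that topic, here is a minimal backpropagation sketch: a single sigmoid neuron trained by gradient descent on a mean-squared error. The data, learning rate, and epoch count are invented for illustration.

```python
# One-neuron backpropagation sketch: forward pass, chain rule, weight update.
import numpy as np

X = np.array([[0.0], [0.5], [1.0]])   # inputs (invented)
T = np.array([[0.2], [0.5], [0.8]])   # targets reachable by one sigmoid

rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()     # initial weight and bias
lr = 0.5                              # learning rate

for epoch in range(5000):
    y = 1.0 / (1.0 + np.exp(-(X * w + b)))   # forward pass
    err = y - T                               # dE/dy for mean-squared error
    grad = err * y * (1 - y)                  # chain rule through the sigmoid
    w -= lr * np.mean(grad * X)               # backward pass: update the weight...
    b -= lr * np.mean(grad)                   # ...and the bias by gradient descent

print(w, b)       # converges toward w ~ 2.77, b ~ -1.39
print(y.ravel())  # predictions approach the targets
```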
Thanks
