
Module 6 - Deep Learning

Simple feed forward networks – Computation graph for deep learning – Convolution neural networks – Learning algorithms – Generalization – Recurrent Neural Networks – Deep reinforcement learning
Neural Networks
Origin of Neural Networks – AI [since the 1940s]:
"Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains." [Wikipedia]
Neural Networks and the Brain
• brain
– set of interconnected modules
– performs information processing operations
• sensory input analysis
• memory storage and retrieval
• reasoning
• feelings
• consciousness

• neurons
– basic computational elements
– heavily interconnected with other neurons
[Russell & Norvig, 1995]
Neuron Diagram
• soma
– cell body
• dendrites
– incoming branches
• axon
– outgoing branch
• synapse
– junction between a dendrite and an axon from another neuron
[Russell & Norvig, 1995]


Neural Networks and the Brain (Cont.)
The human brain incorporates nearly 10 billion neurons and 60 trillion connections between them.

Our brain can be considered as a highly complex, non-linear and parallel information-processing system.

Learning is a fundamental and essential characteristic of biological neural networks.
Analogy between biological and artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron / Node
Dendrite                     Input
Axon                         Output
Synapse                      Weight
Also a good reference on the history of Neural Networks:
“A brief history of Neural Nets and Deep Learning” by A. Kurenkov
McCulloch-Pitts Neuron (M-P Neuron)

An M-P neuron is a simplified version of a biological neuron, receiving binary inputs (0 or 1) and generating a binary output.
These inputs are weighted equally (all weights set to 1), and the neuron's output is determined by applying a weighted sum to its inputs and comparing it to a threshold.
If the weighted sum is greater than or equal to the threshold, the neuron outputs 1; otherwise, it outputs 0.

The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
McCulloch-Pitts Neuron (M-P Neuron)

AND: An AND function neuron would only fire when ALL the inputs are ON, i.e., g(x) ≥ 3 here.
OR: An OR function neuron would fire if ANY of the inputs is ON, i.e., g(x) ≥ 1 here.
A small Python check of both gates follows the truth tables below.
McCulloch-Pitts Neuron (M-P Neuron)

AND GATE

x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1
McCulloch-Pitts Neuron (M-P Neuron)

OR GATE

x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 1
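
As a quick check, an M-P neuron can be written in a few lines of Python and verified against the two truth tables above (a sketch; for two inputs, the AND threshold is 2 and the OR threshold is 1):

def mp_neuron(inputs, threshold):
    # McCulloch-Pitts neuron: all weights are 1; output is 1 iff the
    # sum of the binary inputs reaches the threshold
    return 1 if sum(inputs) >= threshold else 0

# AND fires only when ALL inputs are ON -> threshold = number of inputs
# OR fires when ANY input is ON -> threshold = 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", mp_neuron([x1, x2], 2), "OR:", mp_neuron([x1, x2], 1))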
McCulloch-Pitts Neuron (M-P Neuron)
Two types of input

An input is known as an 'inhibitory input' if the weight associated with it is negative, and as an 'excitatory input' if the weight associated with it is positive.

Inhibitory inputs have an absolute veto power over any excitatory inputs.
McCulloch-Pitts Neuron (M-P Neuron)
The disadvantages of MCP (McCulloch-Pitts) neurons include:

1. Binary Output Limitation: MCP neurons produce only binary outputs (0 or 1), which limits their ability to represent complex patterns or continuous data.

2. Fixed Weights: In MCP neurons, all inputs are equally weighted (usually set to 1), which may not reflect the varying importance of different inputs in real-world scenarios.

3. Lack of Learning: MCP neurons do not have mechanisms for learning or adjusting their weights based on experience or training data, making them unsuitable for tasks requiring adaptation or optimization.

4. Inability to Handle Non-Linearities: Due to their linear threshold function, MCP neurons struggle with tasks that involve non-linear relationships or complex decision boundaries.

5. Limited Complexity: The simplicity of MCP neurons limits their ability to model sophisticated behaviors or cognitive processes found in biological neural networks.

6. Scalability: MCP neurons may not scale well to handle large amounts of data or complex networks, as their binary nature and fixed weights can lead to computational inefficiency.
Perceptron
The perceptron consists of 4 parts.

Input values or One input layer: The input layer of the perceptron is made of artificial input neurons and takes the initial data into the system for further processing.

Weights and Bias:
Weight: It represents the strength of the connection between units. If the weight from node 1 to node 2 is larger, then neuron 1 has a greater influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an additional parameter whose task is to shift the output produced from the weighted sum of the inputs to the neuron.

Net sum: It calculates the total weighted sum of the inputs.

Activation Function: Whether a neuron is activated or not is determined by an activation function, which takes the weighted sum plus the bias and produces the result.
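
To make the four parts concrete, here is a minimal sketch of a single perceptron's forward computation; the weights, bias, and threshold are illustrative values, not taken from the slides:

import numpy as np

def perceptron_forward(x, w, b, threshold=0.0):
    # Net sum: weighted inputs plus bias; step activation against the threshold
    net_sum = np.dot(w, x) + b
    return 1 if net_sum >= threshold else 0

x = np.array([1.0, 0.0])   # illustrative input
w = np.array([0.6, 0.4])   # illustrative weights
print(perceptron_forward(x, w, b=-0.5))  # -> 1, since 0.6 - 0.5 >= 0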
Perceptron vs McCulloch-Pitts Neuron

M-P Neuron: binary inputs, the weights are equal, and the threshold value is predefined.
Perceptron: the weights, including the threshold, can be learned, and the inputs can be real values.
Modern Neural Networks – Data Science (from Machine Learning) [since the 1990s but mostly after 2006]

"Until 2006 we didn't know how to train neural networks to surpass more traditional approaches, except for a few specialized problems. What changed in 2006 was the discovery of techniques for learning in so-called deep neural networks."
Activation Functions
1. Linear Activation Function
2. Non-linear Activation Functions

Linear Activation Function
Equation: f(x) = x
Range: (-infinity to infinity)
It doesn't help with the complexity of the usual data that is fed to neural networks.

Non-linear Activation Functions
These make it easy for the model to generalize or adapt to a variety of data and to differentiate between the outputs.
Activation Functions
The main terminologies needed to understand nonlinear functions are:

Derivative or Differential: Change in the y-axis w.r.t. change in the x-axis. It is also known as the slope.

Monotonic function: A function which is either entirely non-increasing or non-decreasing.
A function is monotonic if its first derivative (which need not be continuous) does not change sign.
Common Activation Functions

• Step(x) = 1 if x >= t, else 0
• Sign(x) = +1 if x >= 0, else -1
• Sigmoid(x) = 1/(1 + e^(-x))
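
These three functions translate directly into Python (a sketch; the step threshold t is a free parameter):

import numpy as np

def step(x, t=0.0):
    return np.where(x >= t, 1, 0)      # 1 if x >= t, else 0

def sign(x):
    return np.where(x >= 0, 1, -1)     # +1 if x >= 0, else -1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes input into (0, 1)

print(step(np.array([-1.0, 2.0])))   # [0 1]
print(sign(np.array([-1.0, 2.0])))   # [-1  1]
print(sigmoid(np.array([0.0])))      # [0.5]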
Sigmoid or Logistic Activation Function

The softmax function is a more generalized logistic activation function which is used for multiclass classification.

The function is differentiable. That means we can find the slope of the sigmoid curve at any point.

The function is monotonic, but the function's derivative is not.
Tanh or Hyperbolic Tangent Activation Function
The range of the tanh function is (-1 to 1). tanh is also sigmoidal (s-shaped).

The advantage is that negative inputs will be mapped strongly negative and zero inputs will be mapped near zero in the tanh graph.

The function is differentiable.

The function is monotonic while its derivative is not monotonic.

The tanh function is mainly used for classification between two classes.

Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
ReLU (Rectified Linear Unit) Activation Function
The ReLU is the most used activation function in the world right now, since it is used in almost all convolutional neural networks and deep learning models.

f(z) is zero when z is less than zero, and f(z) is equal to z when z is above or equal to zero: the ReLU is half rectified (from the bottom).

Range: [0 to infinity)

The function and its derivative are both monotonic.

But the issue is that all the negative values become zero immediately, which decreases the ability of the model to fit or train from the data properly. Any negative input given to the ReLU activation function turns the value into zero immediately in the graph, which in turn affects the resulting graph by not mapping the negative values appropriately.
Leaky ReLU

f(z) = z when z > 0 and f(z) = a·z when z ≤ 0. The leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.

When a is not fixed at 0.01, it is called Randomized ReLU.

Therefore the range of the Leaky ReLU is (-infinity to infinity).

Both Leaky and Randomized ReLU functions are monotonic in nature. Their derivatives are also monotonic in nature.
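
A minimal numpy sketch of ReLU and Leaky ReLU (the slope a = 0.01 follows the text above):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # zero for negative inputs, identity otherwise

def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)   # small slope a instead of a hard zero

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0. 0. 3.]
print(leaky_relu(z))  # [-0.02  0.    3.  ]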
Why is derivative/differentiation used?

When updating the curve, to know in which direction and how much to change or update the curve depending upon the slope.

That is why we use differentiation in almost every part of Machine Learning and Deep Learning.
Types of perceptron
Single Layer Perceptron
Multi-Layer Perceptron

Single layer: A single layer perceptron can learn only linearly separable patterns.

Multilayer: A multilayer perceptron has two or more layers and a greater processing power. It is mainly similar to a single-layer perceptron model but has more hidden layers.

A single layer perceptron (SLP) is a feed-forward network based on a threshold transfer function. SLP is the simplest type of artificial neural network and can only classify linearly separable cases with a binary target (1, 0).
The single layer perceptron does not have a priori knowledge, so the initial weights are assigned randomly.
SLP sums all the weighted inputs, and if the sum is above the threshold (some predetermined value), SLP is said to be activated (output = 1).

The input values are presented to the perceptron, and if the predicted output is the same as the desired output, then the performance is considered satisfactory and no changes to the weights are made.
If the output does not match the desired output, then the weights need to be changed to reduce the error.
AND GATE PROBLEM – Single Layer Perceptron

Assume all the initial weights are 0.9.

Record 1 (x1 = 0, x2 = 0): Σ = x1·w1 + x2·w2 = 0 × 0.9 + 0 × 0.9 = 0 → predicted output = 0 (correct)

Record 2 (x1 = 0, x2 = 1): Σ = x1·w1 + x2·w2 = 0 × 0.9 + 1 × 0.9 = 0.9

AND GATE PROBLEM – Single Layer Perceptron

Record 2: Σ = 0 × 0.9 + 1 × 0.9 = 0.9 → predicted output = 1

Error in prediction: the actual value is 0 but the perceptron predicted 1.
AND GATE PROBLEM – Single Layer Perceptron

Error () = Actual -Predicted  = 0-1 = -1

Update the weight


AND GATE PROBLEM – Single Layer Perceptron

Error () = Actual -Predicted


 = 0-1 = -1 Update the weight

= *(actual – predicted) *
 = 0.5 Assume
= 0.5 *(0 – 1) * 0 = 0.9 No change in weight

= *(actual – predicted) *

= 0.5 *(0 – 1) * 1 = 0.9 – 0.5 = 0.4 New weight updated


3RD RECORD (x1 = 1, x2 = 0)

Σ = x1·w1 + x2·w2 = 1 × 0.9 + 0 × 0.4 = 0.9 → predicted output = 1

Error in prediction (actual = 0). Update the weights with η = 0.5:

w1 = 0.9 + 0.5 × (0 − 1) × 1 = 0.9 − 0.5 = 0.4 (new weight updated)

w2 = 0.4 + 0.5 × (0 − 1) × 0 = 0.4 (no change in weight)


4TH RECORD (x1 = 1, x2 = 1)

Σ = 1 × 0.4 + 1 × 0.4 = 0.8 → predicted output = 1 (actual = 1: correct, no weight change)

Epoch 1

x1  x2  y  ŷ   Initial w1  Initial w2  Updated w1      Updated w2
0   0   0  0   0.9         0.9         No change       No change
0   1   0  1   0.9         0.9         No change       0.4 (updated)
1   0   0  1   0.9         0.4         0.4 (updated)   No change
1   1   1  1   0.4         0.4         No change       No change
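
The hand-worked epoch can be reproduced with a short Python sketch. The initial weights (0.9) and learning rate (0.5) follow the slides; the firing threshold is not stated there, so 0.5 is assumed (any value in (0.4, 0.8] gives the same predictions as the table above):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # AND gate inputs
y = np.array([0, 0, 0, 1])                      # AND gate targets

w = np.array([0.9, 0.9])   # initial weights from the worked example
eta = 0.5                  # learning rate, as in the slides
threshold = 0.5            # assumed (not given in the slides)

for epoch in range(10):
    errors = 0
    for xi, target in zip(X, y):
        predicted = 1 if np.dot(w, xi) >= threshold else 0  # step activation
        error = target - predicted
        if error != 0:
            w = w + eta * error * xi   # perceptron update rule
            errors += 1
    print(f"Epoch {epoch + 1}: weights = {w}")
    if errors == 0:
        break                          # converged: every record classified correctly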
AND GATE PROBLEM – Single Layer Perceptron
Because SLP is a linear classifier, if the cases are not linearly separable the learning process will never reach a point where all the cases are classified properly.
The most famous example of the inability of the perceptron to solve problems with linearly non-separable cases is the XOR problem.

However, a multi-layer perceptron using the backpropagation algorithm can successfully classify the XOR data.
Multi-layer feed-forward Artificial Neural Network

The inner layers for deeper processing of the inputs are known as hidden layers. The hidden layers are not directly exposed to the input or the output. This architecture is known as a Multilayer Perceptron (MLP).
Sigmoid neurons
Introducing sigmoid neurons, where the output function is much smoother than the step function.

We no longer see a sharp transition around the threshold -w0.

Also, the output y is no longer binary but a real value between 0 and 1, which can be interpreted as a probability.

Instead of a like/dislike decision we get the probability of liking the movie.
Perceptron Vs Sigmoid neurons

Perceptron: not smooth, not continuous (at w0), not differentiable.
Sigmoid neuron: smooth, continuous, differentiable.
A typical Supervised Machine Learning Setup
Earlier we mentioned that a single perceptron cannot deal with this data because it is not linearly separable.

What does "cannot deal with" mean?

What would happen if we use a perceptron model to classify this data?

Sure, it misclassifies 3 blue points and 3 red points, but we could live with this error in most real world applications.
From now on, we will accept that it is hard to drive the error to 0 in most cases and will instead aim to reach the minimum possible error.
A typical Supervised Machine Learning Setup

Objective/Loss/Error function: To guide the learning algorithm - the learning algorithm should aim to minimize the loss function.

Parameters: In all the above cases, w is a parameter which needs to be learned from the data.

Learning algorithm: An algorithm for learning the parameters (w) of the model (for example, the perceptron learning algorithm, gradient descent, etc.).
A typical Supervised Machine Learning Setup
Consider our movie example.

The learning algorithm should aim to find a w which minimizes the squared error between y and ŷ.
Learning Parameters: (Infeasible) guess work
Keeping this supervised ML setup in mind, we will now focus on this model and discuss an algorithm for learning the parameters of this model from some given data using an appropriate objective function.

σ stands for the sigmoid function (the logistic function in this case).

Consider a very simplified version of the model having just 1 input: f(x) = σ(wx + b) = 1 / (1 + e^(−(wx + b)))
Learning Parameters: (Infeasible) guess work
What does it mean to train the network?

Suppose we train the network with (x, y) = (0.5, 0.2) and (2.5, 0.9).

At the end of training we expect to find w*, b* such that:

f(0.5) → 0.2 and f(2.5) → 0.9
Learning Parameters: (Infeasible) guess work
Can we try to find such a w* and b* manually?

Let us try a random guess (say, w = 0.5, b = 0).

Clearly not good, but how bad is it?
Learning Parameters: (Infeasible) guess work

With some guess work and intuition we were able to find the right values for w and b.
Learning Parameters: (Infeasible) guess work
Let us look at the geometric interpretation of our "guess work" algorithm in terms of this error surface.
What's Next

There is a more efficient and principled way of doing the weight calculation:

Learning Parameters: Gradient Descent


Understanding the Mathematics behind Gradient Descent
Agile is a pretty well-known term in the software development process.
The basic idea behind it is simple: build something quickly ➡️ get it out there ➡️ get some feedback ➡️ make changes depending upon the feedback ➡️ repeat the process.
The goal is to get the product near the user and let feedback guide you toward the best possible product with the least error.
Also, the steps taken for improvement need to be small and should constantly involve the user.
The idea of "start with a solution as soon as possible, measure and iterate as frequently as possible" is Gradient Descent under the hood.
Gradient Descent

At step n, the weights of the neural network are all modified by the product of the hyperparameter α (the learning rate) times the gradient of the cost function, computed with those weights. If the gradient is positive, then we decrease the weights; and conversely, if the gradient is negative, then we increase them. (Following the gradient uphill instead would be Gradient Ascent.)

Neural networks have very complex loss surfaces and finding the optimum is difficult.
Understanding the Mathematics behind Gradient Descent
Objective
The gradient descent algorithm is an iterative process that takes us to the minimum of a function.

The formula below sums up the entire Gradient Descent algorithm in a single line.
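
For parameters θ, cost function J, and learning rate α, the standard single-step update (a sketch of the rule the slide refers to) is:

θ_new = θ_old − α · dJ/dθ

i.e., take a step of size α against the gradient.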
Understanding the Mathematics behind Gradient Descent
A Machine Learning Model
Consider a bunch of data points in a 2D space. Assume that the data is related to the height and weight of a group of students. Draw an arbitrary line in space that passes through some of these data points.

We are trying to predict some relationship between these quantities so that we can predict the weight of some new students afterward.

This is essentially a simple example of a supervised Machine Learning technique.
Understanding the Mathematics behind Gradient Descent
Predictions
Given a known set of inputs and their corresponding outputs, a machine learning model tries to make some predictions for a new set of inputs.

The Error would be the difference between the two predictions (the model's output and the desired output). This relates to the idea of a Cost function or Loss function.

Understanding the Mathematics behind Gradient Descent
Cost Function
A Cost Function/Loss Function evaluates the performance of our Machine Learning algorithm.
The Loss function computes the error for a single training example, while the Cost function is the average of the loss functions over all the training examples.
Let's say there are a total of 'N' points in the dataset, and for all those 'N' data points, we want to minimize the error.

So the Cost function would be the total squared error.

The goal of any Learning Algorithm is to minimize the Cost Function.
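
For a line ŷi = m·xi + b, one standard form of this cost consistent with the text is:

Cost(m, b) = (1/N) · Σ (yi − (m·xi + b))², summed over i = 1..N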
Understanding the Mathematics behind Gradient Descent
How do we minimize any function?

Suppose the cost function is of the form Y = X².

To minimize the function, we need to find the value of X that produces the lowest value of Y, which is the red dot in the figure.
It is pretty easy to locate the minima here since it is a 2D graph, but this may not always be the case, especially in higher dimensions.

We devise an algorithm to locate the minima, and that algorithm is called Gradient Descent.
Understanding the Mathematics behind Gradient Descent
Consider that you are walking along the graph below, and you are currently at the 'green' dot. You aim to reach the minimum, i.e., the 'red' dot, but from your position, you are unable to view it.
Possible actions would be:
You might go upward or downward.
Having decided which way to go, you might take a bigger step or a smaller step to reach your destination.

Essentially, there are two things that you should know to reach the minima: which way to go and how big a step to take.
Understanding the Mathematics behind Gradient Descent
The Minimum Value
From the tangent at the green point, we know that if we are moving upwards, we are moving away from the minima, and vice versa.

Also, the tangent gives us a sense of the steepness of the slope.

The slope at the blue point is less steep than that at the green point, which means it will take much smaller steps to reach the minimum from the blue point than from the green point.
Understanding the Mathematics behind Gradient Descent
Mathematical Interpretation of the Cost Function
• Let us now put all these learnings into a mathematical formula.
• In the equation y = mX + b, 'm' and 'b' are its parameters.
• During the training process, there will be a small change in their values.
• Let that small change be denoted by δ.
• The values of the parameters will be updated as m = m − δm and b = b − δb, respectively.
• Our aim here is to find those values of m and b in y = mX + b for which the error is minimum, i.e., values that minimize the cost function.

The idea is that by being able to compute the derivative/slope of the function, we can find the minimum of the function.
Understanding the Mathematics behind Gradient Descent
The Learning Rate
The size of the steps taken to reach the minimum or bottom is called the Learning Rate.

Derivatives

We use derivatives to decide whether to increase or decrease the weights so as to increase or decrease the objective function.

Two concepts from calculus are needed:
Chain Rule
Power Rule
Understanding the Mathematics behind Gradient Descent
Calculating Gradient Descent
We apply these rules of calculus to our original equation and find the derivative of the Cost Function w.r.t. both 'm' and 'b'.
Calculate the gradient of the Error w.r.t. both m and b:

m¹, b¹ = next position parameters; m⁰, b⁰ = current position parameters
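
Written out for the mean squared error cost above (a sketch; L denotes the learning rate):

∂E/∂m = (−2/N) · Σ xi · (yi − (m·xi + b))
∂E/∂b = (−2/N) · Σ (yi − (m·xi + b))

m¹ = m⁰ − L · ∂E/∂m
b¹ = b⁰ − L · ∂E/∂b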
Example
Find the local minima of the function y = (x+5)² starting from the point x = 3.
Step 1: Initialize x = 3. Then, find the gradient of the function, dy/dx = 2*(x+5). Learning rate → 0.01.

https://gist.github.com/rohanjoseph93/ecbbb9fb1715d5c248bcad0a7d3bffd2#file-gradient_descent-ipynb
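
A minimal Python sketch of this iteration (the stopping criterion and iteration cap are assumed, in the spirit of the linked gist):

cur_x = 3.0              # starting point
rate = 0.01              # learning rate
precision = 1e-6         # stop when the step becomes smaller than this
max_iters = 10000

df = lambda x: 2 * (x + 5)   # dy/dx for y = (x + 5)^2

for i in range(max_iters):
    prev_x = cur_x
    cur_x = cur_x - rate * df(prev_x)   # step against the gradient
    if abs(cur_x - prev_x) < precision:
        break

print("Local minimum occurs at x =", round(cur_x, 5))   # approaches -5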



Multi-layer Perceptron - Backpropagation algorithm
A multi-layer perceptron (MLP) has the same structure as a single layer perceptron, with one or more hidden layers.
The backpropagation algorithm consists of two phases:
• the forward phase, where the activations are propagated from the input to the output layer, and
• the backward phase, where the error between the observed actual value and the requested nominal value in the output layer is propagated backwards in order to modify the weights and bias values.
What is a Feed Forward Neural Network?
The most fundamental kind of neural network, in which input data travels in only one direction, passing through artificial neural nodes and leaving through output nodes. Input and output layers are always present, while hidden layers may or may not be present. Based on this, they are further divided into single-layered and multi-layered feed-forward neural networks.
Neural networks can modify their weights during training based on a rule known as the delta rule, which allows them to compare their outputs to the expected values.
The complexity of the representable function grows with the number of layers. Signals cannot spread backward; they can only go forward, and during this forward pass the weights are unchanged. Weights are applied to the inputs before being passed to an activation function.
Forward propagation
Propagate the inputs by summing all the weighted inputs and then computing the outputs using the sigmoid threshold.
Backward Propagation
Propagates the errors backward by apportioning them to each unit according to the amount of the error the unit is responsible for.
Back-Propagation in Multilayer Feedforward Neural Networks
Back-propagation refers to the method used during network training. More specifically, back-propagation refers to a simple method for calculating the gradient of the network, that is, the first derivative of the error with respect to the weights in the network.
The primary objective of network training is to estimate an appropriate set of network weights based upon a training dataset.
Many ways have been researched for estimating these weights, but they all involve minimizing some error function. The commonly used error function is the sum-of-squared errors:
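
In one common form (targets t, network outputs o, summed over output units k and training cases n; a standard definition consistent with the text):

E = (1/2) · Σn Σk (t_nk − o_nk)²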

Training uses one of several possible optimization methods to minimize this error term. Some of the more common are: steepest descent / gradient descent, quasi-Newton, conjugate gradient, and many various modifications of these optimization routines.
Back-Propagation in Multilayer Feedforward Neural Networks

Back-propagation is a method for calculating the first derivative of the error function with respect to each network weight.
Back-Propagation in Multilayer Feedforward Neural Networks
Example of a Feed Forward Neural Network

x1, x2 are the inputs
w1, w2 are the coefficient weights for each input
y is the output
z is the weighted input
b is the bias
σ(z) is the activation function, which represents the sigmoid function
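
Putting these variables together, the single-neuron computation is:

z = w1·x1 + w2·x2 + b
y = σ(z) = 1 / (1 + e^(−z))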
Variables and Parameters

A neural network classifying a 4×3 pixel picture.

Binary Classification of a Picture Using 1 Unit

Let's say we put the following picture as input. The weights are set as random values (0 to 1) and the bias is set as 0.

The inputs of the input layer, that is, the inputs of the hidden layer, would be as follows.
Binary Classification of a Picture Using 1 Unit
Calculate using all of the units of the neural network, and compute the final output of the whole network. Initialize the weights with random values (0~1) and the bias as 0.

Input layer → Hidden layer

Hidden layer → Output layer

Since y ≥ 0.5, y is close to 1; this means the number on the picture is classified as 1.

In other words, the parameters were aimlessly set up in the first place and happened to give an accurate result. When learning, the neural network adjusts its own parameters to give a more accurate classification result. To do this, an algorithm called "Backpropagation" is used to calculate how much update is needed for each weight.
Overall picture
What is the derivative of the logistic sigmoid function?
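The answer is the standard identity that makes the sigmoid so convenient for backpropagation:

σ(z) = 1 / (1 + e^(−z))  ⇒  σ'(z) = σ(z) · (1 − σ(z))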
Chain rule
As put by George F. Simmons: "If a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man."
Basic structure
Backpropagation Example
Use a neural network with two inputs, two hidden neurons, and two output neurons. Additionally, the hidden and output neurons will each include a bias.

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs. The initial weights and the biases are shown in the figure.

Example: a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.

Backpropagation Example
The Forward Pass

See what the neural network currently predicts given the weights and biases above and inputs of 0.05 and 0.10.
To do this we'll feed those inputs forward through the network.

To figure out the total net input to each hidden layer neuron, squash the total net input using an activation function (we use the logistic function), then repeat the process with the output layer neurons.
The Forward Pass
Backpropagation Example

Calculating the Total Error

The target output for o1 is 0.01 but the neural network outputs 0.75136507, so the error is E_o1 = ½ · (target_o1 − out_o1)².

Repeating this process for o2 (remembering that the target is 0.99), the total error for the neural network is the sum of these errors: E_total = E_o1 + E_o2.
Backpropagation Example
The Backwards Pass
Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and the network as a whole.
Output Layer
Consider w5: we want to know how much a change in w5 affects the total error, by applying the chain rule.
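
Concretely, the chain rule factors this sensitivity into three local derivatives, and the weight is then updated with learning rate η (the standard form for this construction):

∂E_total/∂w5 = (∂E_total/∂out_o1) · (∂out_o1/∂net_o1) · (∂net_o1/∂w5)

w5 ← w5 − η · ∂E_total/∂w5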


Backpropagation Example
The Backwards Pass – Output Layer

First, how much does the total error change with respect to the output?
Backpropagation Example
The Backwards Pass – Output Layer

how much does the output of o1 change with respect to its total net input?
Backpropagation Example
The Backwards Pass – Output Layer

Finally, how much does the total net input of o1 change with respect to w5?

Update the weight.

We perform the actual updates in the neural network after we have the new weights leading into the hidden layer neurons.
Backpropagation Example
Hidden Layer: continue the backwards pass by calculating new values for w1, w2, w3, and w4.
We use a similar process as we did for the output layer, but slightly different to account for the fact that the output of each hidden layer neuron contributes to the output (and therefore error) of multiple output neurons.

out_h1 affects both out_o1 and out_o2.
Backpropagation Example
Hidden Layer: continue the backwards pass by calculating new values for w1, w2, w3, and w4.

Calculate the partial derivative of the total net input to h1 with respect to w1.
Backpropagation Example
Hidden Layer

Repeating this for w2, w3, and w4
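
To tie the whole worked example together, here is a compact numpy sketch of one training loop for this 2-2-2 network. The inputs (0.05, 0.10) and targets (0.01, 0.99) follow the slides; the specific initial weights and biases from the figure are not reproduced here, so random values stand in for them, and the learning rate of 0.5 is an assumption in the spirit of the standard version of this example:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# 2-2-2 network; random initial parameters stand in for the figure's values
W1 = rng.uniform(size=(2, 2))   # input -> hidden weights
b1 = rng.uniform(size=2)        # hidden bias
W2 = rng.uniform(size=(2, 2))   # hidden -> output weights
b2 = rng.uniform(size=2)        # output bias

x = np.array([0.05, 0.10])      # inputs from the worked example
t = np.array([0.01, 0.99])      # target outputs
eta = 0.5                       # learning rate (assumed)

for step in range(10000):
    # ----- forward pass -----
    out_h = sigmoid(W1 @ x + b1)        # hidden activations
    out_o = sigmoid(W2 @ out_h + b2)    # network outputs
    E_total = 0.5 * np.sum((t - out_o) ** 2)   # sum of squared errors

    # ----- backward pass (chain rule) -----
    # delta_o = dE/dout_o * dout_o/dnet_o
    delta_o = (out_o - t) * out_o * (1 - out_o)
    # each hidden neuron contributes to both output errors
    delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)

    # gradient of E w.r.t. each weight, then a gradient descent update
    W2 -= eta * np.outer(delta_o, out_h)
    b2 -= eta * delta_o
    W1 -= eta * np.outer(delta_h, x)
    b1 -= eta * delta_h

print(f"final outputs: {out_o}, error: {E_total:.6f}")   # outputs approach the targets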


Backpropagation Example

https://www.javatpoint.com/pytorch-backpropagation-process-in-deep-neural-network
https://www.youtube.com/watch?v=wqPt3qjB6uA&list=PLLeO8f6PhlKYLwtebJZzCue0AWW7VVSn4&index=5

https://www.youtube.com/watch?v=QflXxNfMCKo

https://www.youtube.com/watch?v=QflXxNfMCKo&t=629s

https://www.youtube.com/watch?v=FglxznJkGPA

https://www.youtube.com/watch?v=1UmGvau4zm0
