Module 6
• neurons
– basic computational elements [Russell & Norvig, 1995]
– heavily interconnected with other neurons
Neuron Diagram
• soma
– cell body
• dendrites
– incoming branches
• axon
– outgoing branch
• synapse
– junction between a
dendrite and an axon
from another neuron
AND vs OR
An AND function neuron would only fire when ALL the inputs are ON, i.e., g(x) ≥ 3 here.
An OR function neuron would fire if ANY of the inputs is ON, i.e., g(x) ≥ 1 here.
McCulloch-Pitts Neuron (M-P Neuron)
AND GATE
x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1
McCulloch-Pitts Neuron (M-P Neuron)
OR GATE
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 1
McCulloch-Pitts Neuron (M-P Neuron)
Two types of input
An input is known as an ‘inhibitory input’ if the weight associated with it is negative, and as an ‘excitatory input’ if the weight associated with it is positive.
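A minimal Python sketch of an M-P neuron for the two-input AND and OR gates above (assuming unit excitatory weights and thresholds of 2 for AND and 1 for OR; the slide's g(x) ≥ 3 corresponds to a three-input AND):

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: fire (1) if the sum of the binary inputs
    reaches the threshold, otherwise output 0."""
    g = sum(inputs)                      # aggregation: sum of binary inputs
    return 1 if g >= threshold else 0

# AND gate: fire only when ALL inputs are ON -> threshold = number of inputs
# OR gate:  fire when ANY input is ON        -> threshold = 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", mp_neuron([x1, x2], threshold=2),
              "OR:",  mp_neuron([x1, x2], threshold=1))
```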
1. Binary Output Limitation: MCP neurons produce only binary outputs (0 or 1), which limits their ability to represent complex patterns or continuous data.
2. Fixed Weights: In MCP neurons, all inputs are equally weighted (usually set to 1), which may not reflect the varying importance of different inputs in real-world scenarios.
3. Lack of Learning: MCP neurons do not have mechanisms for learning or adjusting their weights based on experience or training data, making them unsuitable for tasks requiring adaptation or optimization.
4. Limited Complexity: The simplicity of MCP neurons limits their ability to model sophisticated behaviors or cognitive processes found in biological neural networks.
5. Scalability: MCP neurons may not scale well to handle large amounts of data or complex networks, as their binary nature and fixed weights can lead to computational inefficiencies.
Perceptron
The perceptron consists of 4 parts: the input values (one input layer), the weights and bias, the weighted (net) sum, and the activation function.
Input values or one input layer: The input layer of the perceptron is made of artificial input neurons and takes the initial data into the system for further processing.
Activation function: Whether a neuron is activated or not is determined by the activation function, which takes the weighted sum of the inputs, adds the bias to it, and produces the result.
Perceptron vs McCulloch-Pitts Neuron
Equation : f(x) = x
A function is monotonic if its first derivative (which need not be continuous) does not change sign.
Common Activation Functions
Range: [0, ∞)
The leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01 or so.
Both the Leaky and Randomized ReLU functions are monotonic in nature, and their derivatives are also monotonic.
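A short Python sketch of these activation functions and their derivatives (the choice of sigmoid/ReLU/Leaky ReLU here is an assumption about which functions the figures showed; a = 0.01 as noted above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, x)            # range [0, infinity)

def relu_derivative(x):
    return (x > 0).astype(float)          # 0 for x < 0, 1 for x > 0

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)      # small slope 'a' for negative inputs

def leaky_relu_derivative(x, a=0.01):
    return np.where(x > 0, 1.0, a)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x), leaky_relu(x), sigmoid(x))
```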
Why is the derivative/differentiation used?
The input values are presented to the perceptron, and if the predicted output is the same as the desired output, then the
performance is considered satisfactory and no changes to the weights are made.
If the output does not match the desired output, then the weights need to be changed to reduce the error.
AND GATE PROBLEM – Single Layer Perceptron
Σ = x1·w1 + x2·w2 = 0 × 0.9 + 0 × 0.9 = 0
Error in prediction: the actual value is 0 but the perceptron predicted 1.
AND GATE PROBLEM – Single Layer Perceptron
w(new) = w(old) + η × (actual – predicted) × x
Assume η = 0.5
w1(new) = 0.9 + 0.5 × (0 – 1) × 0 = 0.9 → no change in weight
w2(new) = 0.9 + 0.5 × (0 – 1) × 1 = 0.4 → updated
Epoch 1

X1 | X2 | Y | Y predicted (ŷ) | Initial Weight (X1) | Initial Weight (X2) | Updated Weight (X1) | Updated Weight (X2)
0  | 0  | 0 | 0               | 0.9                 | 0.9                 | No change           | No change
0  | 1  | 0 | 1               | 0.9                 | 0.9                 | No change           | 0.4 (updated)
1  | 0  | 0 | 1               | 0.9                 | 0.4                 | 0.4 (updated)       | No change
1  | 1  | 1 | 1               | 0.4                 | 0.4                 | No change           | No change
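A small Python sketch of the perceptron learning rule on the AND-gate data. The initial weights (0.9), learning rate (0.5), and update rule come from the slides; the step-activation threshold of 0.5 is an assumption chosen so the run reproduces the epoch-1 table above:

```python
# Perceptron learning for the AND gate.
# Assumptions (not stated explicitly on the slides): step activation with
# threshold 0.5, initial weights 0.9 and 0.9, learning rate (eta) 0.5.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = [0.9, 0.9]
eta = 0.5
threshold = 0.5

for epoch in range(1, 4):
    print(f"Epoch {epoch}")
    for (x1, x2), y in data:
        s = x1 * weights[0] + x2 * weights[1]      # weighted sum
        y_hat = 1 if s >= threshold else 0         # step activation
        # Perceptron update rule: w = w + eta * (actual - predicted) * x
        weights[0] += eta * (y - y_hat) * x1
        weights[1] += eta * (y - y_hat) * x2
        print(f"  x=({x1},{x2}) y={y} y_hat={y_hat} weights={weights}")
```

By epoch 2 the weights settle at 0.4 and 0.4, which classify all four AND-gate cases correctly under these assumptions.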
AND GATE PROBLEM – Single Layer Perceptron
Because the SLP is a linear classifier, if the cases are not linearly separable the learning process will never reach a point where all the cases are classified properly.
The most famous example of the perceptron's inability to solve problems with linearly non-separable cases is the XOR problem.
Objective/Loss/Error function: guides the learning algorithm – the learning algorithm should aim to minimize the loss function.
Learning algorithm: an algorithm for learning the parameters (w) of the model (for example, the perceptron learning algorithm, gradient descent, etc.).
A typical Supervised Machine Learning Setup
Consider our movie example
The learning algorithm should aim to find a w which minimizes the above function (the squared error between y and ŷ).
Learning Parameters: (Infeasible) guess work
Keeping this supervised ML setup in mind, we will now focus on this
model and discuss an algorithm for learning the parameters of this
model from some given data using an appropriate objective function
Suppose we train the network with (x, y) = (0.5, 0.2) and (2.5, 0.9)
With some guess work and intuition we were able to find the right values for w and b.
Learning Parameters: (Infeasible) guess work
Let us look at the geometric interpretation of our “guess work” algorithm in terms of this
error surface
What's Next
There is a more efficient and principled way of doing the weight calculation
At step n, the weights of the neural network are all modified by the product of the hyperparameter α (the learning rate) and the gradient of the cost function, computed with those weights. If the gradient is positive, we decrease the weights; conversely, if the gradient is negative, we increase them.
Gradient Ascent
Understanding the Mathematics behind Gradient Descent
Objective
The gradient descent algorithm is an iterative process that takes us to the minimum of a function.
The formula below sums up the entire Gradient Descent algorithm in a single line
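Presumably the single-line update referred to here is the standard one, written with learning rate α and cost function J(θ):

$$\theta_{new} = \theta_{old} - \alpha\,\frac{\partial J(\theta)}{\partial \theta}$$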
Understanding the Mathematics behind Gradient Descent
A Machine Learning Model
Consider a bunch of data points in a 2D space. Assume that the data is related to the height and weight of a group of students. Draw an arbitrary line in space that passes through some of these data points.
Devise an algorithm to locate the minima, and that algorithm is called Gradient Descent
If you decide which way to go, you might take a bigger step or a little
step to reach your destination
The idea is that by being able to compute the derivative/slope of the function, we can find the minimum of the function.
Understanding the Mathematics behind Gradient Descent
The Learning Rate
The size of the steps taken to reach the minimum (the bottom) is called the learning rate.
Derivatives
Chain Rule
Power Rule
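Stated in the usual form assumed here:

$$\text{Power rule: } \frac{d}{dx}x^{n} = n\,x^{n-1} \qquad \text{Chain rule: } \frac{d}{dx}f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\,g'(x)$$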
Understanding the Mathematics behind Gradient Descent
Calculating Gradient Descent
Apply these rules of calculus to our original equation and find the derivative of the cost function w.r.t. both ‘m’ and ‘b’.
Calculate the gradient of the error w.r.t. both m and b.
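Assuming the usual squared-error cost for the line y = mx + b, $E = \frac{1}{N}\sum_i \bigl(y_i - (m x_i + b)\bigr)^2$, the two gradients are:

$$\frac{\partial E}{\partial m} = -\frac{2}{N}\sum_i x_i\bigl(y_i - (m x_i + b)\bigr) \qquad \frac{\partial E}{\partial b} = -\frac{2}{N}\sum_i \bigl(y_i - (m x_i + b)\bigr)$$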
https://gist.github.com/rohanjoseph93/ecbbb9fb1715d5c248bcad0a7d3bffd2#file-gradient_descent-ipynb
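A compact Python sketch of the same procedure (separate from the linked notebook), using the MSE gradients above, an illustrative made-up dataset, and a fixed learning rate:

```python
import numpy as np

# Hypothetical data: x could be height, y could be weight (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

m, b = 0.0, 0.0              # start from an arbitrary line
learning_rate = 0.01
n = len(x)

for step in range(1000):
    y_pred = m * x + b
    error = y - y_pred
    # Gradients of the MSE cost with respect to m and b
    dm = -(2.0 / n) * np.sum(x * error)
    db = -(2.0 / n) * np.sum(error)
    # Gradient descent update: move against the gradient
    m -= learning_rate * dm
    b -= learning_rate * db

print(f"m = {m:.3f}, b = {b:.3f}")
```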
The complexity of the function a network can represent grows with the number of layers. Information cannot spread backward; it can only go forward. In this scenario, the weights are unchanged. The inputs are multiplied by the weights and summed before being passed to an activation function.
Forward propagation
Propagate the inputs forward by summing all the weighted inputs at each unit and then computing its output using the sigmoid threshold function.
Backward Propagation
Propagates the errors backward by apportioning them to each unit according to the amount of the error that the unit is responsible for.
Back-Propagation in Multilayer Feedforward Neural Networks
Back-propagation refers to the method used during network training. More specifically, back-propagation refers to a simple method for calculating the gradient of the network, that is, the first derivative of the error with respect to the weights in the network.
The primary objective of network training is to estimate an appropriate
set of network weights based upon a training dataset.
Many ways have been researched for estimating these weights, but they all involve minimizing some error function. The commonly used error function is the sum-of-squared errors:
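In the conventional notation assumed here, with targets t_k and network outputs o_k:

$$E = \frac{1}{2}\sum_{k}\bigl(t_k - o_k\bigr)^2$$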
Training uses one of several possible optimization methods to minimize this error term. Some of the more common are: steepest descent / gradient descent, quasi-Newton, conjugate gradient, and many variations of these optimization routines.
Back-Propagation in Multilayer Feedforward Neural Networks
Example of Feed Forward Neural Network
b is the bias.
Initialize the weights with random values (between 0 and 1) and the bias as 0.
Binary Classification of a Picture Using 1 Unit
Calculate the output of every unit of the neural network, and then the final output of the whole network: what does the neural network currently predict given the weights and biases above and inputs of 0.05 and 0.10?
To do this we’ll feed those inputs forward through the network.
We figure out the total net input to each hidden-layer neuron, squash the total net input using an activation function (here, the logistic function), then repeat the process with the output-layer neurons.
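The weights and biases “above” were shown on a figure; the sketch below uses the values from the widely circulated worked example this walkthrough appears to follow (w1–w4 = 0.15, 0.20, 0.25, 0.30; w5–w8 = 0.40, 0.45, 0.50, 0.55; b1 = 0.35, b2 = 0.60). These values are assumptions, but they reproduce the 0.75136507 output quoted next:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed example values (not shown on this slide): inputs, weights, biases.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30     # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55     # hidden -> output
b1, b2 = 0.35, 0.60

# Hidden layer: total net input, then squash with the logistic function.
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)    # ~0.59327
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)    # ~0.59688

# Output layer: repeat the process with the hidden-layer outputs.
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)   # ~0.75137
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)   # ~0.77293

print(out_o1, out_o2)
```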
The target output for o1 is 0.01 but the neural network outputs 0.75136507, so there is an error.
The total error for the neural network is the sum of these errors (one per output neuron).
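Assuming the ½(target − output)² squared-error form above, the quoted numbers give:

$$E_{o1} = \tfrac{1}{2}\,(0.01 - 0.75136507)^2 \approx 0.274811 \qquad E_{total} = E_{o1} + E_{o2}$$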
Backpropagation Example
The Backwards Pass
Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.
Output Layer
Consider w5: we want to know how much a change in w5 affects the total error, i.e., ∂E_total/∂w5.
First, how much does the total error change with respect to the output?
how much does the output of o1 change with respect to its total net input?
Finally, how much does the total net input of o1 change with respect to w5?
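Putting the three questions together with the chain rule (in standard notation):

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5}$$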
https://www.javatpoint.com/pytorch-backpropagation-process-in-deep-neural-network
https://www.youtube.com/watch?v=wqPt3qjB6uA&list=PLLeO8f6PhlKYLwtebJZzCue0AWW7VVSn4&index=5
https://www.youtube.com/watch?v=QflXxNfMCKo
https://www.youtube.com/watch?v=QflXxNfMCKo&t=629s
https://www.youtube.com/watch?v=FglxznJkGPA
https://www.youtube.com/watch?v=1UmGvau4zm0