Neural Network I
Overview
● What is a Neural Network; Artificial Neural Networks: biological neurons and their working
● Simulation of biological neurons for problem solving
● Learning rules and various activation functions (sigmoid, tanh, ReLU and softmax)
● McCulloch Pitts Neuron, Concept of Linear Separability
● Single layer Perceptron
● Feedforward Neural Networks
● Back Propagation networks
● Character Recognition Application
● Stochastic Gradient Descent
● Immunological computing
Introduction
● What is Neural Network??
● A method of computing, based on the interaction of multiple connected processing
elements.
● A powerful technique to solve many real world problems.
● The ability to learn from experience in order to improve their performance.
● At the core of a neural network is a mathematical model that is used to make predictions
or decisions based on input data.
● The neurons in a neural network are connected by weighted links that allow them to
communicate with one another.
● There are several types of neural networks, including feedforward neural networks,
convolutional neural networks, and recurrent neural networks.
Basics of Neural Network
● Neurons are cells that carry electrical impulses and are the basic units of the nervous system.
● Every neuron is made of a cell body (also called a soma), dendrites and an
axon. Dendrites and axons are nerve fibers. There are about 86 billion neurons
in the human brain, which comprises roughly 10% of all brain cells.
● Neurons are connected to one another and to other tissues. They do not touch; instead they form tiny gaps called synapses. These gaps can be chemical synapses or electrical synapses and pass the signal from one neuron to the next.
● Dendrite — It receives signals from other neurons.
● Soma (cell body) — It sums all the incoming signals to generate input.
● Axon — When the sum reaches a threshold value, neuron fires and the signal
travels down the axon to the other neurons.
● Synapses — The point of interconnection of one neuron with other neurons.
The amount of signal transmitted depends upon the strength (synaptic weights) of the connections.
Neurons
Comparing ANN and BNN
● As ANNs borrow this concept from biological neural networks (BNNs), there are a lot of similarities, though there are differences too.
● The main similarities are summarized in the analogy below.
Analogy of ANN with BNN
● Dendrites ↔ Inputs
● Soma (cell body) ↔ Node (summation unit)
● Synapses ↔ Weights (connection strengths)
● Axon ↔ Output
Learning
• Learning = learning by adaptation
• The objective of learning in biological organisms is to improve their
survival and reproductive success by adapting to changing environmental
conditions and developing new strategies for survival.
• Learning in biological organisms allows them to:
1. Respond to environmental changes
2. Improve their performance
3. Develop new behaviors
4. Enhance communication
Types of Learning in Neural Network
● Supervised Learning — Supervised learning is a type of machine
learning where the algorithm is trained on labeled data, which means that
the data is already categorized into specific classes or categories.
● Unsupervised Learning — Unsupervised learning is a type of machine
learning where the algorithm is trained on unlabeled data, which means
that the data is not categorized into specific classes or categories. The goal
of unsupervised learning is to find patterns and relationships in the data
without any prior knowledge of what the data represent.
● Reinforcement Learning — Reinforcement learning is a type of machine
learning where an agent learns to make decisions in an environment by
receiving feedback in the form of rewards or penalties.
Model of Artificial
Neural Network
● Receives n-inputs
● Multiplies each input by its
weight
● Applies activation function
to the sum of results
● Outputs result
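As a quick illustration of these four steps, here is a minimal NumPy sketch of a single artificial neuron. The input values, weights, bias, and the choice of a sigmoid activation are arbitrary assumptions for the example, not values from these slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(x, w, b):
    # multiply each input by its weight, sum them, add the bias ...
    z = np.dot(w, x) + b
    # ... then apply the activation function to the result
    return sigmoid(z)

# example with n = 3 inputs and arbitrary weights/bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
b = 0.1
print(artificial_neuron(x, w, b))
```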
Activation Functions
● The activation function is a mathematical “gate” in between
the input feeding the current neuron and its output going to
the next layer. They basically decide whether the neuron should
be activated or not.
● Activation functions in a neural network (NN) are mathematical
functions that are applied to the output of a neuron in the
network.
● The activation function introduces non-linearity into the
network and helps to produce a non-linear decision boundary
that can be used to model complex relationships in the input
data.
Activation Functions
Why do we use an activation function ?
If we do not have the activation function the weights and bias would simply
do a linear transformation.
A linear equation is simple to solve but is limited in its capacity to solve complex problems and has less power to learn complex functional mappings from data.
A neural network without an activation function is just a linear regression
model.
Generally, neural networks use non-linear activation functions, which can
help the network learn complex data, compute and learn almost any function
representing a question, and provide accurate predictions.
Why use a non-linear activation function?
If we were to use a linear activation function or identity activation function, then the neural network would just output a linear function of its input.
And so, no matter how many layers our neural network has, it will still behave just like a single-layer network, because composing these layers gives us another linear function, which is not expressive enough to model complex data.
Linear or Identity Activation Function
Equation: f(x) = x
Derivative: f’(x) = 1
Range: (-∞, +∞)
Two major problems:
1. Back-propagation becomes ineffective — the derivative of the function is a constant and has no relation to the input X, so it is not possible to go back and understand which weights in the input neurons can provide a better prediction.
2. All layers of the neural network collapse into one — with linear activation functions, no matter how many layers the neural network has, the last layer will still be a linear function of the first layer.
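A small NumPy check of the "collapse into one" point above: composing two purely linear layers gives exactly the same result as a single linear layer with combined weights. The matrices here are arbitrary random examples, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1 (linear)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2 (linear)

# two stacked linear layers
y_stacked = W2 @ (W1 @ x + b1) + b2

# one equivalent linear layer
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
y_single = W_eq @ x + b_eq

print(np.allclose(y_stacked, y_single))  # True: the stack is still linear
```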
Non-linear Activation Function
Modern neural network models use non-linear activation functions.
They allow the model to create complex mappings between the
network’s inputs and outputs, which are essential for learning
and modeling complex data, such as images, video, audio, and
data sets which are non-linear or have high dimensionality.
Almost any process imaginable can be represented as a
functional computation in a neural network, provided that the
activation function is non-linear.
Non-linear Activation Function
Non-linear functions address the problems of a linear activation
function:
They allow back-propagation because they have a derivative
function which is related to the inputs.
They allow “stacking” of multiple layers of neurons to create
a deep neural network. Multiple hidden layers of neurons are
needed to learn complex data sets with high levels of accuracy.
Activation Functions
● Some commonly used activation functions in NNs include:
● Sigmoid function: The sigmoid function is an S-shaped curve that maps any input value
to a value between 0 and 1. It is commonly used as the activation function in the output
layer of binary classification problems.
● ReLU (Rectified Linear Unit) function: The ReLU function maps any input value to 0 if
it is negative, and to the input value if it is positive. It is commonly used as the activation
function in the hidden layers of deep neural networks.
● Tanh (Hyperbolic tangent) function: The Tanh function is similar to the sigmoid
function, but it maps any input value to a value between -1 and 1. It is also commonly
used as an activation function in the hidden layers of neural networks.
● Softmax function: The softmax function is used in the output layer of multi-class
classification problems. It maps the output values of each neuron to a probability
distribution over the classes.
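A minimal NumPy sketch of the four functions listed above. Subtracting the maximum inside softmax is a common numerical-stability trick assumed here, not something stated in the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)

def softmax(x):
    e = np.exp(x - np.max(x))      # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 1.0, 3.0])
print(sigmoid(z), relu(z), tanh(z), softmax(z), sep="\n")
```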
Sigmoid Function
● It is a function which is plotted as ‘S’ shaped
graph.
● Equation : A = 1/(1 + e^(-x))
● Derivative: f’(x) = s*(1 − s), where s = f(x)
● Nature : Non-linear. For X values between -2 and 2, the curve is very steep, so small changes in x bring about large changes in the value of Y.
● Value Range : 0 to 1
● Uses : Usually used in the output layer of binary classification, where the result is either 0 or 1. Since the sigmoid value lies between 0 and 1, the result can be predicted as 1 if the value is greater than 0.5 and as 0 otherwise.
Sigmoid Function
Advantages:
1. The function is differentiable. That means we can find the slope of the sigmoid curve at any point.
2. Output values are bound between 0 and 1, normalizing the output of each neuron.
Disadvantages:
1. Vanishing gradient — for very large or very small inputs, the sigmoid curve flattens, which means the gradient (slope) becomes almost zero. With gradients so small, the neural network struggles to update its weights, slowing or even stopping learning.
2. Slow convergence — due to the vanishing gradient, updates to the model are minimal and the training process becomes very slow.
3. Outputs are not zero-centered — the sigmoid output ranges from 0 to 1, so it is always positive. This causes issues during weight updates, as the gradients can push all weights in the same direction, making optimization harder.
4. Computationally expensive — it requires evaluating an exponential.
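A quick numeric illustration of the vanishing-gradient point: the sigmoid derivative s*(1 − s) peaks at 0.25 for x = 0 and shrinks toward zero for large |x|. The sample points below are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  gradient = {sigmoid_grad(x):.6f}")
# the gradient is 0.25 at x = 0 and about 0.000045 at x = 10,
# so weight updates driven by it become negligible for large inputs
```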
Tanh Function
• The activation that almost always works better than the sigmoid function is the Tanh function, also known as the hyperbolic tangent function. It is actually a mathematically shifted and scaled version of the sigmoid function; both are similar and can be derived from each other.
• Equation : A = tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))
• Value Range :- -1 to +1
• Derivative: f’(x) = 1 − a², where a = tanh(x)
• Nature :- non-linear
• Uses :- Usually used in the hidden layers of a neural network, as its values lie between -1 and 1, so the mean of the hidden-layer activations comes out to be 0 or very close to it. This helps center the data by bringing the mean close to 0, which makes learning for the next layer much easier.
Tanh Function
Advantages:
1. Zero centered — Unlike the sigmoid function,
the tanh function outputs values between −1
and 1.
This helps the neural network model
inputs with strong negative, neutral, and
strong positive values more effectively,
leading to faster convergence.
2. The function is monotonic (consistently increasing), which simplifies learning.
3. It generally works better than the sigmoid function.
Disadvantage:
1. It also suffers from the vanishing gradient problem and hence slow convergence.
RELU Function
•It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of neural networks.
•Equation :- A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
•Value Range :- [0, inf)
•Nature :- non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function.
•Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and Tanh functions.
Softmax Function
● The softmax activation function is commonly
used in the output layer of a neural network
when performing multiclass classification.
● Nature :- non-linear
● softmax(x_i) = exp(x_i) / sum(exp(x_j)) for all j
● Uses :- Usually used when handling multiple classes; the softmax function is commonly found in the output layer of image classification problems.
● The softmax function is particularly useful in
multiclass classification tasks, where the goal is
to predict the probability of each possible class
for a given input.
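As a small illustration of this behaviour, a sketch with made-up class scores for a three-class problem: softmax turns the raw scores into probabilities that sum to 1, and the largest probability gives the predicted class.

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # stable softmax
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])        # raw scores for classes A, B, C
probs = softmax(logits)
print(probs, probs.sum())                 # a probability distribution summing to 1
print("predicted class:", ["A", "B", "C"][int(np.argmax(probs))])
```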
Activation function
● Sigmoid functions and their combinations generally work better in
the case of classification problems.
● Sigmoid and tanh functions are sometimes avoided due to the
vanishing gradient problem.
● The ReLU activation function is widely used and is the default choice, as it generally yields better results.
● The ReLU function should only be used in the hidden layers.
● The output layer can use a linear activation function in the case of regression problems.
Activation function
● The basic rule of thumb is: if you really don’t know what activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.
● If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
● If your output is for multi-class classification, Softmax is very useful to predict the probabilities of each class.
What is the Perceptron model in Machine Learning?
The Perceptron is a Machine Learning algorithm for supervised learning of binary classification tasks. A Perceptron is also understood as an artificial neuron or neural-network unit that helps detect certain input data computations in business intelligence.
The Perceptron model is also treated as one of the simplest types of Artificial Neural Networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters, i.e., input values, weights and bias, net sum, and an activation function.
What is Binary classifier in Machine Learning?
A binary classifier is a model used to categorize data into two distinct
classes (e.g., Yes/No, 1/-1, True/False).
A binary classifier predicts which of two classes a given input belongs
to:
● Positive class: Often labeled as 1.
● Negative class: Often labeled as −1 (or 0, depending on
convention).
Basic Components of Perceptron
○ Input Nodes or Input Layer:
This is the primary component of Perceptron which accepts the initial data
into the system for further processing. Each input node contains a real
numerical value.
○ Weight and Bias:
The weight parameter represents the strength of the connection between units. This is another important parameter of the Perceptron's components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept in a linear equation.
Basic Components of Perceptron
Activation Function:
These are the final and important components that help to determine whether
the neuron will fire or not. Activation Function can be considered primarily
as a step function.
Types of activation functions: step, sign, and sigmoid functions (see the Activation Functions of Perceptron slide).
How does Perceptron work?
In Machine Learning, Perceptron is considered as a single-layer neural network that
consists of four main parameters named input values (Input nodes), weights and Bias,
net sum, and an activation function.
The perceptron model begins with the multiplication of all input values and their
weights, then adds these values together to create the weighted sum.
Then this weighted sum is applied to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.
This step function or activation function plays a vital role in ensuring that the output is mapped between the required values (0, 1) or (-1, 1).
It is important to note that the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.
How does Perceptron work?
Step-1
In the first step, multiply all input values with their corresponding weight values and then add them to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:
∑wi*xi = x1*w1 + x2*w2 + … + xn*wn
Add a special term called bias 'b' to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function is applied with the above-mentioned weighted sum,
which gives us output either in binary form or a continuous value as follows:
Y = f(∑wi*xi + b)
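A minimal sketch of Step 1 and Step 2 in NumPy. The inputs, weights, and bias are invented for illustration, and the activation is a simple step function.

```python
import numpy as np

def step(z):
    # step / threshold activation: 1 if z > 0, else 0
    return 1 if z > 0 else 0

def perceptron_output(x, w, b):
    weighted_sum = np.dot(w, x) + b     # Step 1: sum(wi*xi) + b
    return step(weighted_sum)           # Step 2: Y = f(sum(wi*xi) + b)

x = np.array([1.0, 0.0, 1.0])
w = np.array([0.6, -0.4, 0.3])
b = -0.5
print(perceptron_output(x, w, b))
```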
Single Layer Perceptron
A single-layer perceptron is a type of artificial neural network that
consists of only one layer of artificial neurons.
It is the simplest type of neural network and was proposed by Frank
Rosenblatt in 1958.
Single-layer perceptrons have been used in various applications, including: pattern recognition, binary classification, control systems, medical diagnosis, and financial forecasting.
The perceptron consists of 4 parts:
Input value or One input layer: The input layer of the perceptron is made of
artificial input neurons and takes the initial data into the system for further
processing.
Weights and Bias:
Weight: It represents the dimension or strength of the connection between
units.
Bias: It is the same as the intercept added in a linear equation. Bias is a tunable parameter in neural networks that can help improve the accuracy and flexibility of the model by allowing it to learn more complex decision boundaries.
Net sum: It calculates the total sum.
Activation Function: Whether a neuron is activated or not is determined by an activation function, which is applied to the weighted sum plus bias to give the result.
The Perceptron Learning Rule
1. Initialize the weights: Start with random weights for each input.
2. Input the training data: Input the features into the perceptron and calculate
the output.
3. Calculate the error: Compare the predicted output with the desired output to
calculate the error.
4. Update the weights: Adjust the weights of the inputs based on the error. If the
predicted output is less than the desired output, increase the weights of the
inputs. If the predicted output is greater than the desired output, decrease the
weights of the inputs. The magnitude of the weight adjustment is proportional
to the error and the input value.
5. Repeat: Repeat steps 2 to 4 until the error is minimized or a maximum number of iterations is reached (a minimal sketch of this rule follows below).
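A hedged sketch of this learning rule on the logical AND function, which is linearly separable. The learning rate, number of epochs, and initial weights are assumptions chosen for the example.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                       # AND targets

rng = np.random.default_rng(42)
w = rng.normal(scale=0.1, size=2)                # 1. initialize the weights
b = 0.0
lr = 0.1                                         # learning rate

for epoch in range(20):                          # repeat over the data
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0 # 2. compute the output
        error = target - pred                    # 3. compare with the target
        w = w + lr * error * xi                  # 4. adjust weights by the error
        b = b + lr * error

print("weights:", w, "bias:", b)
print("predictions:", [1 if np.dot(w, xi) + b > 0 else 0 for xi in X])
# on this linearly separable toy problem the rule typically converges quickly
```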
Perceptron Function
The Perceptron is a function that maps its input “x”, multiplied with the learned weight coefficients, to an output value “f(x)”:
f(x) = 1 if w · x + b > 0, and 0 otherwise
In the equation given above:
“w” = vector of real-valued weights
“b” = bias (an element that adjusts the boundary away from the origin without any dependence on the input value)
“x” = vector of input x values
“m” = number of inputs to the Perceptron
The output can be represented as “1” or “0.” It can also be represented as “1” or “-1” depending on which activation function is used.
Activation Functions of Perceptron
The activation function applies a step rule (convert the numerical output
into +1 or -1) to check if the output of the weighting function is greater than
zero or not.
For example:
If ∑ wi*xi > 0, then the final output “o” = 1 (issue bank loan)
Else, the final output “o” = -1 (deny bank loan)
Step function gets triggered above a certain value of the neuron output; else
it outputs zero. Sign Function outputs +1 or -1 depending on whether
neuron output is greater than zero or not. Sigmoid is the S-curve and outputs
a value between 0 and 1.
Feedforward Neural Networks (FFNN)
A feedforward neural network (FFNN) is a type of artificial neural network where the information flows in one direction only, from the input layer through one or more hidden layers to the output layer.
The output of each layer is connected to the input of the next layer, and the weights
and biases of the connections are learned during the training process.
FFNNs are commonly used for tasks such as classification, control systems (e.g., robotics), and pattern recognition.
They can be trained using supervised learning, where the training data consists of input-
output pairs, and the network learns to map inputs to outputs.
The weights and biases of the network are updated during the training process using
backpropagation, which is an algorithm that computes the gradients of the loss function
with respect to the weights and biases.
How FFNN works
The input layer of an FFNN takes in the input data, which is usually in the form of a vector, and passes it through a series of hidden layers, each consisting of a set of neurons.
Each neuron in a hidden layer takes in the weighted sum of the outputs from the previous layer, adds a bias term, and applies an activation function to produce an output that is passed to the next layer.
The output layer produces the final output of the network, which is usually a
prediction or a classification.
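A minimal forward pass for a small FFNN with one hidden layer. The layer sizes, random weights, and the sigmoid/softmax choices are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)      # hidden layer: weighted sum + bias + activation
    return softmax(W2 @ h + b2)   # output layer: class probabilities

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # 4 inputs -> 5 hidden units
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # 5 hidden -> 3 output classes

x = rng.normal(size=4)            # one input vector
print(forward(x, W1, b1, W2, b2))
```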
A Multi-Layer Perceptron (MLP)
A Multi-Layer Perceptron (MLP) is a type of neural
network that consists of multiple layers of artificial
neurons.
MLPs are also known as feedforward neural networks.
The architecture of an MLP consists of an input layer,
one or more hidden layers, and an output layer.
Each layer is composed of multiple artificial neurons that
compute a weighted sum of the input signals and apply
an activation function to produce an output signal.
A Multi-Layer Perceptron (MLP)
The hidden layers in an MLP are responsible for extracting
features from the input data and transforming them into a
format that is suitable for the output layer.
The output layer produces the final output of the network,
which can be binary or continuous.
The learning process of an MLP involves adjusting the
weights of the input signals using backpropagation.
Backpropagation allows the MLP to learn from the training
data and improve its performance over time.
Compare single layer and multilayer perceptron model
Architecture:
Single-layer perceptrons have only one layer of neurons that directly connects to
the input data, whereas multilayer perceptrons consist of multiple layers of
neurons, including one or more hidden layers that lie between the input and output
layers.
Capabilities:
Single-layer perceptrons are limited to linearly separable problems, meaning they
can only learn and classify data that can be separated by a single straight line.
In contrast, multilayer perceptrons can learn and classify non-linearly
separable problems by using hidden layers to transform the input data into a
more complex feature space that can be separated by the output layer.
Compare single layer and multilayer perceptron model
Training:
Single-layer perceptrons use a simple learning rule called the Perceptron Learning
Algorithm, which adjusts the weights of the input signals to minimize the error
between the predicted and actual output. In contrast, multilayer perceptrons use a
more complex learning algorithm called backpropagation, which iteratively adjusts
the weights of all the neurons in the network to minimize the error between the
predicted and actual output.
Applications:
Single-layer perceptrons are typically used for simple binary classification
problems, such as predicting whether an email is spam or not. Multilayer perceptrons
are more powerful and can be used for a wide range of applications, including
image and speech recognition, natural language processing, and financial forecasting.
Back Propagation networks
Backpropagation is a supervised learning algorithm used for training neural networks.
The basic structure of a backpropagation network consists of an input
layer, one or more hidden layers, and an output layer.
Each layer is composed of one or more neurons, which receive inputs,
process them, and pass the outputs to the next layer.
The connections between the neurons are weighted, and these weights
are adjusted during training to improve the accuracy of the network's
predictions.
Back Propagation networks
During the training process, the network is fed a set of input-output
pairs, and the output of the network is compared to the desired output.
The error between the actual output and the desired output is
then back propagated through the network, and the weights are
adjusted to reduce the error.
This process is repeated many times, with the hope that the
network will eventually converge to a set of weights that produces
accurate predictions for new input data.
How Backpropagation Algorithm Works:
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron, from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs:
Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layers and adjust the weights such that the error is decreased.
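A hedged sketch of these five steps on a tiny network with one hidden layer, trained on XOR with a squared-error loss and sigmoid activations. The layer sizes, learning rate, and epoch count are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy dataset: XOR (not linearly separable, so a hidden layer is needed)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden (random init)
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
lr = 0.5

for epoch in range(5000):
    # forward pass: compute outputs layer by layer
    H = sigmoid(X @ W1 + b1)           # hidden activations
    P = sigmoid(H @ W2 + b2)           # predicted outputs

    # error at the output layer (squared-error derivative times sigmoid')
    dP = (P - Y) * P * (1 - P)
    # backpropagate the error to the hidden layer
    dH = (dP @ W2.T) * H * (1 - H)

    # adjust weights in the direction that decreases the error
    W2 -= lr * H.T @ dP
    b2 -= lr * dP.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0, keepdims=True)

P = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(P.round(3))   # after training, outputs typically approach [0, 1, 1, 0]
```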
Why We Need Backpropagation?
● Backpropagation is fast, simple and easy to program
● It has no parameters to tune apart from the number of inputs
● It is a flexible method as it does not require prior
knowledge about the network
● It is a standard method that generally works well
● It does not need any special mention of the features of the
function to be learned.
Concept of Linear Separability
● Linear separability is a concept in mathematics and particularly
in machine learning.
● Imagine you have some points scattered around on a piece of
paper, and you want to draw a straight line to separate them into
two groups.
● If you can draw such a line where all the points of one group are
on one side of the line, and all the points of the other group are
on the other side, then those points are said to be linearly
separable.
Concept of Linear Separability
● For example, let's say you have red and blue dots on a sheet of
paper, and you want to separate them with a straight line. If you
can draw a line in such a way that all the red dots are on one side
and all the blue dots are on the other side, then those dots are
linearly separable.
● Linear separability is important in machine learning because
it means that the data is easy to classify using a simple
algorithm like a linear classifier. If data is not linearly separable,
more complex methods may be needed to classify it accurately.
Concept of Linear Separability
● Linear separability is an important concept in neural networks. If the points in n-dimensional space can be divided by a hyperplane w1*x1 + w2*x2 + … + wn*xn + b = 0 such that all points of one class fall on one side and all points of the other class fall on the other side, the data is said to be linearly separable.
● For two-dimensional inputs, if there exists a line (with equation w1*x1 + w2*x2 + b = 0) that separates all samples of one class from the other class, then an appropriate perceptron can be derived from the equation of the separating line. Such classification problems are called “linearly separable”, i.e., separable by a linear combination of the inputs.
Character Recognition Application
Character recognition is a common application of neural networks, and
can be achieved using various types of neural networks, including
Feedforward neural networks, convolutional neural networks, and
recurrent neural networks
The network must be trained on a dataset of labeled character images
in order to learn to recognize characters.
During training, the network adjusts its weights based on the
difference between its predicted output and the true label of the input
image.
Once the network is trained, it can be used to make predictions on new,
unlabeled character images.
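A hedged sketch of this workflow using scikit-learn's built-in 8x8 digit images and a small multi-layer perceptron classifier. The library choice, hidden-layer size, and iteration count are my assumptions, not part of the slides; the example assumes scikit-learn is installed.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# labeled character (digit) images: 8x8 pixels flattened to 64 features
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# training: the network adjusts its weights to reduce prediction error
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

# prediction on new, unseen character images
print("test accuracy:", clf.score(X_test, y_test))
print("predicted labels:", clf.predict(X_test[:10]))
```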
OCR (Optical Character Recognition)
OCR is a technology that analyzes the text of a page and turns the
letters into code that may be used to process information.
OCR is a technique for detecting printed or handwritten text characters inside digital images of paper files, such as scanned paper records.
OCR systems are hardware and software systems that turn physical
documents into machine-readable text.
These digital versions can be highly beneficial to children and young
adults who struggle to read.
The essential application of OCR is to convert hard copy legal or
historical documents into PDFs.
How OCR works?
1. Image Pre-Processing
● Size normalization: This step ensures that all images are of the same
size for consistency. We use a method called bicubic interpolation to
resize images to a standard size.
● Binarization: Here, we convert grayscale images to binary images by
setting a threshold. Pixels above the threshold become white, while
those below become black. This helps in simplifying the image for
further processing.
● Smoothing: To make the edges of objects in the image smoother, we
use erosion and dilation techniques. This helps in reducing noise and
making the objects clearer.
How OCR works?
Text recognition: Once the image is pre-processed, we can start recognizing
text. There are two main methods for this:
● Pattern matching: This works well for typed documents with known
fonts. It compares parts of the image with patterns of characters it
knows.
● Feature extraction: This method looks at specific features of
characters, like lines and curves, to identify them.
How OCR works?
Postprocessing
After recognizing the text, the system converts it into a digital format. Some
systems also create annotated PDF files, which show both the original
scanned document and the recognized text.
Immunological computing
Immunological computing is like using the principles of our immune system to teach
computers how to recognize patterns, make decisions, and solve problems
effectively. It's a fascinating area of research that draws inspiration from nature to
develop smarter algorithms and systems.
Immune System Basics:
Our immune system is like a defense force in our body that helps to keep us healthy.
It can recognize harmful invaders, like bacteria or viruses, and fight them off to keep
us safe.
It does this by identifying foreign substances called antigens and producing antibodies
to neutralize them.
Immunological computing
How Immunological Computing Works:
In immunological computing, we mimic the behavior of the immune system to
solve computational problems.
Just like our immune system learns to recognize and respond to threats, in
immunological computing, algorithms learn to recognize patterns in data and
make decisions based on them.
Instead of antigens and antibodies, we use concepts like "patterns" and "rules".
The algorithms adapt and improve over time, similar to how our immune system
builds immunity to diseases.
Immunological computing
Applications:
Immunological computing can be used in various fields such as data mining,
pattern recognition, and optimization.
For example, in anomaly detection, it can help identify unusual patterns in
data that may indicate fraud or errors.
In optimization problems, it can be used to find the best solution among
many possibilities, similar to how our immune system finds the best
response to different threats.
Stochastic Gradient Descent
Gradient Descent (GD):
Imagine you are blindfolded on a hill and want to find the lowest point without any help.
You feel the slope under your feet and take small steps downhill. This is like Gradient
Descent where you iteratively move towards the minimum of a function by following the
direction of steepest descent.
Stochastic Gradient Descent (SGD):
Now, let's add a twist. Instead of relying on the slope at your current location alone, you
randomly pick a spot on the hill, feel the slope there, and take a step. Sometimes this spot
might be flat or even uphill, but over many such random steps, you tend to move towards
the bottom of the hill. This randomness helps in escaping local minima and can be faster
than regular Gradient Descent, especially for large datasets.
Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent
algorithm that is used for optimizing machine learning models.
In SGD, instead of using the entire dataset for each iteration, only a single
random training example (or a small batch) is selected to calculate the
gradient and update the model parameters. This random selection introduces randomness into the optimization process, hence the term “stochastic” in Stochastic Gradient Descent.
The advantage of using SGD is its computational efficiency, especially when
dealing with large datasets. By using a single example or a small batch, the
computational cost per iteration is significantly reduced compared to traditional
Gradient Descent methods that require processing the entire dataset.
Stochastic Gradient Descent
How it works:
● Start with an initial guess for the minimum point.
● Randomly shuffle your dataset.
● For each data point in the shuffled dataset:
○ Compute the gradient of the loss function at that point (i.e.,
the direction of steepest descent).
○ Update your guess for the minimum point by taking a small
step.
● Repeat this process for a fixed number of iterations or until the
improvement becomes very small.
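A minimal sketch of this loop for fitting a simple linear model y ≈ w*x + b with per-example (stochastic) updates. The synthetic data, learning rate, and epoch count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)   # noisy line

w, b = 0.0, 0.0            # initial guess for the minimum point
lr = 0.1                   # step size

for epoch in range(20):
    idx = rng.permutation(len(x))           # randomly shuffle the dataset
    for i in idx:                            # one example per update
        pred = w * x[i] + b
        grad_w = 2 * (pred - y[i]) * x[i]    # gradient of the squared error
        grad_b = 2 * (pred - y[i])
        w -= lr * grad_w                     # take a small step downhill
        b -= lr * grad_b

print(f"learned w = {w:.2f}, b = {b:.2f}   (true values: 3.0, 0.5)")
```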
Ad

More Related Content

Similar to Neural Networks and its related Concepts (20)

Sppu engineering artificial intelligence and data science semester 6th Artif...
Sppu engineering  artificial intelligence and data science semester 6th Artif...Sppu engineering  artificial intelligence and data science semester 6th Artif...
Sppu engineering artificial intelligence and data science semester 6th Artif...
pawaletrupti434
 
Deep learning Ann(Artificial neural network)
Deep learning Ann(Artificial neural network)Deep learning Ann(Artificial neural network)
Deep learning Ann(Artificial neural network)
aawezix
 
Neural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep LearningNeural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep Learning
rahuljain582793
 
2011 0480.neural-networks
2011 0480.neural-networks2011 0480.neural-networks
2011 0480.neural-networks
Parneet Kaur
 
Neural networks introduction
Neural networks introductionNeural networks introduction
Neural networks introduction
آيةالله عبدالحكيم
 
Unit 6: Application of AI
Unit 6: Application of AIUnit 6: Application of AI
Unit 6: Application of AI
Tekendra Nath Yogi
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101
AMIT KUMAR
 
Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02
Deepu Gupta
 
V2.0 open power ai virtual university deep learning and ai introduction
V2.0 open power ai virtual university   deep learning and ai introductionV2.0 open power ai virtual university   deep learning and ai introduction
V2.0 open power ai virtual university deep learning and ai introduction
Ganesan Narayanasamy
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
Derek Kane
 
Activation_function.pptx
Activation_function.pptxActivation_function.pptx
Activation_function.pptx
Mohamed Essam
 
Artificial neural network paper
Artificial neural network paperArtificial neural network paper
Artificial neural network paper
AkashRanjandas1
 
Unit 2 ml.pptx
Unit 2 ml.pptxUnit 2 ml.pptx
Unit 2 ml.pptx
PradeeshSAI
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Randa Elanwar
 
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
dhruvkeshav123
 
ANN.pptx
ANN.pptxANN.pptx
ANN.pptx
AROCKIAJAYAIECW
 
Machine learning PPT which shows the some deep learning concepts and code of ...
Machine learning PPT which shows the some deep learning concepts and code of ...Machine learning PPT which shows the some deep learning concepts and code of ...
Machine learning PPT which shows the some deep learning concepts and code of ...
workingmann08
 
ACTIVATION FUNCTIONS IN SOFT COMPUTING AW
ACTIVATION FUNCTIONS IN SOFT COMPUTING AWACTIVATION FUNCTIONS IN SOFT COMPUTING AW
ACTIVATION FUNCTIONS IN SOFT COMPUTING AW
sssmrockz
 
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdfNEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
SowmyaJyothi3
 
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdfDeep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
naveenraghavendran10
 
Sppu engineering artificial intelligence and data science semester 6th Artif...
Sppu engineering  artificial intelligence and data science semester 6th Artif...Sppu engineering  artificial intelligence and data science semester 6th Artif...
Sppu engineering artificial intelligence and data science semester 6th Artif...
pawaletrupti434
 
Deep learning Ann(Artificial neural network)
Deep learning Ann(Artificial neural network)Deep learning Ann(Artificial neural network)
Deep learning Ann(Artificial neural network)
aawezix
 
Neural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep LearningNeural Networks Basic Concepts and Deep Learning
Neural Networks Basic Concepts and Deep Learning
rahuljain582793
 
2011 0480.neural-networks
2011 0480.neural-networks2011 0480.neural-networks
2011 0480.neural-networks
Parneet Kaur
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101
AMIT KUMAR
 
Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02Neuralnetwork 101222074552-phpapp02
Neuralnetwork 101222074552-phpapp02
Deepu Gupta
 
V2.0 open power ai virtual university deep learning and ai introduction
V2.0 open power ai virtual university   deep learning and ai introductionV2.0 open power ai virtual university   deep learning and ai introduction
V2.0 open power ai virtual university deep learning and ai introduction
Ganesan Narayanasamy
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
Derek Kane
 
Activation_function.pptx
Activation_function.pptxActivation_function.pptx
Activation_function.pptx
Mohamed Essam
 
Artificial neural network paper
Artificial neural network paperArtificial neural network paper
Artificial neural network paper
AkashRanjandas1
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Randa Elanwar
 
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
SET-02_SOCS_ESE-DEC23__B.Tech%20(CSE-H+NH)-AIML_5_CSAI300
dhruvkeshav123
 
Machine learning PPT which shows the some deep learning concepts and code of ...
Machine learning PPT which shows the some deep learning concepts and code of ...Machine learning PPT which shows the some deep learning concepts and code of ...
Machine learning PPT which shows the some deep learning concepts and code of ...
workingmann08
 
ACTIVATION FUNCTIONS IN SOFT COMPUTING AW
ACTIVATION FUNCTIONS IN SOFT COMPUTING AWACTIVATION FUNCTIONS IN SOFT COMPUTING AW
ACTIVATION FUNCTIONS IN SOFT COMPUTING AW
sssmrockz
 
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdfNEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
SowmyaJyothi3
 
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdfDeep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
Deep Learning Study _ FInalwithCNN_RNN_LSTM_GRU.pdf
naveenraghavendran10
 

Recently uploaded (20)

π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Journal of Soft Computing in Civil Engineering
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
railway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forgingrailway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forging
Javad Kadkhodapour
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
Value Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous SecurityValue Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous Security
Marc Hornbeek
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
fluke dealers in bangalore..............
fluke dealers in bangalore..............fluke dealers in bangalore..............
fluke dealers in bangalore..............
Haresh Vaswani
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
railway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forgingrailway wheels, descaling after reheating and before forging
railway wheels, descaling after reheating and before forging
Javad Kadkhodapour
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
Value Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous SecurityValue Stream Mapping Worskshops for Intelligent Continuous Security
Value Stream Mapping Worskshops for Intelligent Continuous Security
Marc Hornbeek
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
fluke dealers in bangalore..............
fluke dealers in bangalore..............fluke dealers in bangalore..............
fluke dealers in bangalore..............
Haresh Vaswani
 
Ad

Neural Networks and its related Concepts

  • 2. Overview ● What is Neural Network, Artificial Neural Networks: Biological neurons and its working ● Simulation of biological neurons to problem solving ● Learning rules and various activation functions (sigmoid, tanh, relu and softmax ) ● McCulloch Pitts Neuron, Concept of Linear Separability ● Single layer Perceptron ● Feedforward Neural Networks ● Back Propagation networks ● Character Recognition Application ● Stochastic Gradient Descent ● Immunological computing
  • 3. Introduction ● What is Neural Network?? ● A method of computing, based on the interaction of multiple connected processing elements. ● A powerful technique to solve many real world problems. ● The ability to learn from experience in order to improve their performance. ● At the core of a neural network is a mathematical model that is used to make predictions or decisions based on input data. ● The neurons in a neural network are connected by weighted links that allow them to communicate with one another. ● There are several types of neural networks, including feedforward neural networks, convolutional neural networks, and recurrent neural networks.
  • 4. Basics of Neural Network ● A neuron is a cell that carries electrical impulses and are the basic units of the nervous system. ● Every neuron is made of a cell body (also called a soma), dendrites and an axon. Dendrites and axons are nerve fibers. There are about 86 billion neurons in the human brain, which comprises roughly 10% of all brain cells. ● Neurons are connected to one another and tissues. They do not touch and instead form tiny gaps called synapses. These gaps can be chemical synapses or electrical synapses and pass the signal from one neuron to the next. ● Dendrite — It receives signals from other neurons. ● Soma (cell body) — It sums all the incoming signals to generate input. ● Axon — When the sum reaches a threshold value, neuron fires and the signal travels down the axon to the other neurons. ● Synapses — The point of interconnection of one neuron with other neurons. The amount of signal transmitted depend upon the strength (synaptic weights) of the connections.
  • 6. Comparing ANN and BNN ● As this concept borrowed from ANN there are lot of similarities though there are differences too. ● Similarities are in the following table
  • 8. Analogy of ANN with BNN
  • 9. Learning • Learning = learning by adaptation • The objective of learning in biological organisms is to improve their survival and reproductive success by adapting to changing environmental conditions and developing new strategies for survival. • Learning in biological organisms allows them to: 1. Respond to environmental changes 2. Improve their performance 3. Develop new behaviors 4. Enhance communication
  • 10. Types of Learning in Neural Network ● Supervised Learning — Supervised learning is a type of machine learning where the algorithm is trained on labeled data, which means that the data is already categorized into specific classes or categories. ● Unsupervised Learning — Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled data, which means that the data is not categorized into specific classes or categories. The goal of unsupervised learning is to find patterns and relationships in the data without any prior knowledge of what the data represent. ● Reinforcement Learning — Reinforcement learning is a type of machine learning where an agent learns to make decisions in an environment by receiving feedback in the form of rewards or penalties.
  • 11. Model of Artificial Neural Network ● Receives n-inputs ● Multiplies each input by its weight ● Applies activation function to the sum of results ● Outputs result
  • 12. Activation Functions ● The activation function is a mathematical “gate” in between the input feeding the current neuron and its output going to the next layer. They basically decide whether the neuron should be activated or not. ● Activation functions in a neural network (NN) are mathematical functions that are applied to the output of a neuron in the network. ● The activation function introduces non-linearity into the network and helps to produce a non-linear decision boundary that can be used to model complex relationships in the input data.
  • 14. Why do we use an activation function ? If we do not have the activation function the weights and bias would simply do a linear transformation. A linear equation is simple to solve but is limited in its capacity to solve complex problems and have less power to learn complex functional mappings from data. A neural network without an activation function is just a linear regression model. Generally, neural networks use non-linear activation functions, which can help the network learn complex data, compute and learn almost any function representing a question, and provide accurate predictions.
  • 15. Why use a non-linear activation function? If we were to use a linear activation function or identity activation functions then the neural network will just output a linear function of input. And so, no matter how many layers our neural network has, it will still behave just like a single layer network because summing these layers will give us another linear function which is not strong enough to model data.
  • 16. Linear or Identity Activation Function
  • 17. Linear or Identity Activation Function Equation: f(x) = x Derivative: f’(x) = 1 Range: (-∞, +∞) Two major problems: 1. Back-propagation is not possible — The derivative of the function is a constant, and has no relation to the input, X. So it’s not possible to go back and understand which weights in the input neurons can provide a better prediction. 2. All layers of the neural network collapse into one — with linear activation functions, no matter how many layers in the neural network, the last layer will be a linear function of the first layer
  • 18. Non-linear Activation Function Modern neural network models use non-linear activation functions. They allow the model to create complex mappings between the network’s inputs and outputs, which are essential for learning and modeling complex data, such as images, video, audio, and data sets which are non-linear or have high dimensionality. Almost any process imaginable can be represented as a functional computation in a neural network, provided that the activation function is non-linear.
  • 19. Non-linear Activation Function Non-linear functions address the problems of a linear activation function: They allow back-propagation because they have a derivative function which is related to the inputs. They allow “stacking” of multiple layers of neurons to create a deep neural network. Multiple hidden layers of neurons are needed to learn complex data sets with high levels of accuracy.
  • 20. Activation Functions ● Some commonly used activation functions in NNs include: ● Sigmoid function: The sigmoid function is an S-shaped curve that maps any input value to a value between 0 and 1. It is commonly used as the activation function in the output layer of binary classification problems. ● ReLU (Rectified Linear Unit) function: The ReLU function maps any input value to 0 if it is negative, and to the input value if it is positive. It is commonly used as the activation function in the hidden layers of deep neural networks. ● Tanh (Hyperbolic tangent) function: The Tanh function is similar to the sigmoid function, but it maps any input value to a value between -1 and 1. It is also commonly used as an activation function in the hidden layers of neural networks. ● Softmax function: The softmax function is used in the output layer of multi-class classification problems. It maps the output values of each neuron to a probability distribution over the classes.
  • 21. Sigmoid Function ● It is a function which is plotted as ‘S’ shaped graph. ● Equation : A = 1/(1 + e-x ) ● Derivative: f’(x) = s*(1-s) ● Nature : Non-linear. Notice that X values lies between -2 to 2, Y values are very steep. This means, small changes in x would also bring about large changes in the value of Y. ● Value Range : 0 to 1 ● Uses : Usually used in output layer of a binary classification, where result is either 0 or 1, as value for sigmoid function lies between 0 and 1 only so, result can be predicted easily to be 1 if value is greater than 0.5 and 0 otherwise.
  • 22. Sigmoid Function Advantages: 1. The function is differentiable.That means, we can find the slope of the sigmoid curve at any two points. 2. Output values bound between 0 and 1, normalizing the output of each neuron. Disadvantages: 3. Vanishing gradient — For very large or very small inputs, the sigmoid curve flattens. This means the gradient (slope) becomes almost zero. With gradients so small, the neural network struggles to update its weights, slowing or even stopping learning. 4. Due to the vanishing gradient, the training process becomes very slow, as updates to the model are minimal. sigmoids have slow convergence. ● Outputs not zero centered.: The sigmoid output ranges from 0 to 1, so it is always positive. ● This causes issues during weight updates, as the gradients can push all weights in the same direction, making optimization harder. 1. Computationally expensive.
  • 23. Tanh Function • The activation that works almost always better than the sigmoid function is the Tanh function, also known as the hyperbolic tangent function. It is actually a mathematically shifted and scaled version of the sigmoid function; both are similar and can be derived from each other. • Equation : tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = 2*sigmoid(2x) − 1 • Value Range :- -1 to +1 • Derivative: f’(x) = 1 − a², where a = tanh(x) • Nature :- non-linear • Uses :- Usually used in hidden layers of a neural network, as its values lie between -1 and 1, so the mean of the hidden-layer activations comes out to be 0 or very close to it. This helps center the data by bringing the mean close to 0, which makes learning for the next layer much easier.
  • 24. Tanh Function Advantages: 1. Zero centered — unlike the sigmoid function, the tanh function outputs values between −1 and 1. This helps the neural network model inputs with strong negative, neutral, and strong positive values more effectively, leading to faster convergence. 2. The function is monotonic (it consistently increases), which simplifies learning; note that its derivative is not monotonic. 3. It generally works better than the sigmoid function. Disadvantage: 1. It also suffers from the vanishing gradient problem and hence slow convergence.
  • 25. ReLU Function •It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network. •Equation :- A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise. •Value Range :- [0, inf) •Nature :- non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function. •Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any time only some of the neurons are activated (those with positive inputs), which makes the network sparse and therefore efficient and easy to compute. In simple words, ReLU learns much faster than the sigmoid and tanh functions.
  • 26. Softmax Function ● The softmax activation function is commonly used in the output layer of a neural network when performing multiclass classification. ● Nature :- non-linear ● Equation :- softmax(x_i) = exp(x_i) / Σ_j exp(x_j) ● Uses :- Usually used when handling multiple classes; the softmax function is commonly found in the output layer of image classification problems. ● The softmax function is particularly useful in multiclass classification tasks, where the goal is to predict the probability of each possible class for a given input.
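An illustrative usage sketch (the logit values are made up): softmax turns the raw output-layer scores for three classes into a probability distribution, and the predicted class is the one with the highest probability.

import numpy as np

logits = np.array([2.0, 1.0, 0.1])          # raw scores from the output neurons
probs = np.exp(logits - logits.max())       # subtract the max for numerical stability
probs /= probs.sum()
print(probs, probs.sum())                   # approx. [0.659 0.242 0.099], sums to 1
print("predicted class:", int(np.argmax(probs)))   # class 0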
  • 27. Activation function ● Sigmoid functions and their combinations generally work better in the case of classification problems. ● Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem. ● The ReLU activation function is widely used and is the default choice, as it generally yields better results. ● The ReLU function should only be used in the hidden layers. ● The output layer can use a linear activation function in the case of regression problems.
  • 28. Activation function ● The basic rule of thumb is that if you really don’t know which activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days. ● If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer. ● If your output is for multi-class classification, then softmax is very useful for predicting the probability of each class.
  • 29. What is the Perceptron model in Machine Learning? The Perceptron is a Machine Learning algorithm for supervised learning of binary classification tasks. A Perceptron can also be understood as an artificial neuron, or neural network unit, that performs computations on the input data to detect features. The Perceptron model is treated as one of the best and simplest types of artificial neural networks; it is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
  • 30. What is Binary classifier in Machine Learning? A binary classifier is a model used to categorize data into two distinct classes (e.g., Yes/No, 1/-1, True/False). A binary classifier predicts which of two classes a given input belongs to: ● Positive class: Often labeled as 1. ● Negative class: Often labeled as −1 (or 0, depending on convention).
  • 31. Basic Components of Perceptron
  • 32. Basic Components of Perceptron ○ Input Nodes or Input Layer: This is the primary component of the Perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value. ○ Weight and Bias: The weight parameter represents the strength of the connection between units and is another of the most important parameters of the Perceptron. A weight is directly proportional to how strongly the associated input neuron influences the output. The bias can be thought of as the intercept in a linear equation.
  • 33. Basic Components of Perceptron Activation Function: This is the final and most important component; it determines whether the neuron will fire or not. In the Perceptron, the activation function can be considered primarily as a step function; the common choices (step, sign, and sigmoid) are described in the Activation Functions of Perceptron slide below.
  • 35. How does Perceptron work? In Machine Learning, the Perceptron is considered a single-layer neural network that consists of four main parameters: input values (input nodes), weights and bias, net sum, and an activation function. The Perceptron model begins by multiplying all input values by their weights and adding these products together to create the weighted sum. The activation function 'f' is then applied to this weighted sum to obtain the desired output. This activation function is also known as the step function and is represented by 'f'. This step function or activation function plays a vital role in ensuring that the output is mapped between the required values, (0,1) or (-1,1). It is important to note that the weight of an input is indicative of the strength of a node. Similarly, the bias value gives the ability to shift the activation function curve up or down.
  • 36. How does Perceptron work? Step-1 In the first step, multiply all input values by their corresponding weight values and then add them to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows: ∑wi*xi = x1*w1 + x2*w2 + … + xn*wn Add a special term called the bias 'b' to this weighted sum to improve the model's performance: ∑wi*xi + b Step-2 In the second step, an activation function is applied to the above weighted sum, which gives us an output either in binary form or as a continuous value, as follows: Y = f(∑wi*xi + b)
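A minimal sketch of these two steps (the input, weight, and bias values are illustrative assumptions):

import numpy as np

x = np.array([1.0, 0.0, 1.0])        # input values x1..xn
w = np.array([0.4, -0.2, 0.7])       # corresponding weights w1..wn
b = -0.5                             # bias term

weighted_sum = np.dot(w, x) + b      # Step 1: sum(wi*xi) + b
y = 1 if weighted_sum > 0 else 0     # Step 2: step activation f
print(weighted_sum, y)               # 0.6 -> the neuron fires, output 1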
  • 37. Single Layer Perceptron A single-layer perceptron is a type of artificial neural network that consists of only one layer of artificial neurons. It is the simplest type of neural network and was proposed by Frank Rosenblatt in 1958. The single-layer perceptron has been used in various applications, including pattern recognition, binary classification, control systems, medical diagnosis, and financial forecasting.
  • 38. The perceptron consists of 4 parts: Input value or One input layer: The input layer of the perceptron is made of artificial input neurons and takes the initial data into the system for further processing. Weights and Bias: Weight: It represents the strength of the connection between units. Bias: It is the same as the intercept added in a linear equation; the bias is a tunable parameter in neural networks that can help improve the accuracy and flexibility of the model by allowing it to learn more complex decision boundaries. Net sum: It calculates the total weighted sum. Activation Function: Whether a neuron is activated or not is determined by the activation function, which is applied to the weighted sum plus the bias to give the result.
  • 39. The Perceptron Learning Rule 1. Initialize the weights: Start with random weights for each input. 2. Input the training data: Feed the features into the perceptron and calculate the output. 3. Calculate the error: Compare the predicted output with the desired output to calculate the error. 4. Update the weights: Adjust the weights of the inputs based on the error. If the predicted output is less than the desired output, increase the weights of the inputs; if the predicted output is greater than the desired output, decrease them. The magnitude of the weight adjustment is proportional to the error and the input value. 5. Repeat: Repeat steps 2 to 4 until the error is minimized or a maximum number of iterations is reached.
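A minimal sketch of this learning rule applied to the AND function (the learning rate, epoch count, and random seed are arbitrary choices, not from the slides):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # training inputs
t = np.array([0, 0, 0, 1])                       # desired outputs (logical AND)

rng = np.random.default_rng(1)
w = rng.normal(size=2)            # 1. start with random weights
b = 0.0
lr = 0.1                          # learning rate

for epoch in range(20):           # 5. repeat until the error is minimized
    for xi, ti in zip(X, t):
        y = 1 if np.dot(w, xi) + b > 0 else 0    # 2. compute the output
        error = ti - y                           # 3. compare with the desired output
        w = w + lr * error * xi                  # 4. adjust weights by error * input
        b = b + lr * error

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # learned AND: [0, 0, 0, 1]

Because AND is linearly separable, the perceptron learning rule is guaranteed to find a separating line in a finite number of updates.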
  • 41. Perceptron Function The Perceptron is a function that maps its input vector “x”, multiplied by the learned weight coefficients, to an output value “f(x)”: f(x) = 1 if w · x + b > 0, and 0 otherwise, where w · x = ∑ wi*xi for i = 1..m. In this equation: “w” = vector of real-valued weights “b” = bias (an element that adjusts the decision boundary away from the origin without any dependence on the input value) “x” = vector of input values “m” = number of inputs to the Perceptron The output can be represented as “1” or “0.” It can also be represented as “1” or “-1” depending on which activation function is used.
  • 42. Activation Functions of Perceptron The activation function applies a step rule (converting the numerical output into +1 or -1) to check whether the output of the weighting function is greater than zero or not. For example: If ∑ wi*xi > 0 => final output “o” = 1 (issue bank loan) Else, final output “o” = -1 (deny bank loan) The step function is triggered above a certain value of the neuron output; otherwise it outputs zero. The sign function outputs +1 or -1 depending on whether the neuron output is greater than zero or not. The sigmoid is the S-curve and outputs a value between 0 and 1.
  • 43. Feedforward Neural Networks (FFNN) A feedforward neural network (FFNN) is a type of artificial neural network where the information flows in one direction only, from the input layer through one or more hidden layers to the output layer. The output of each layer is connected to the input of the next layer, and the weights and biases of the connections are learned during the training process. FFNNs are commonly used for tasks such as classification, control systems (for example, robotics), and pattern recognition. They can be trained using supervised learning, where the training data consists of input-output pairs, and the network learns to map inputs to outputs. The weights and biases of the network are updated during the training process using backpropagation, which is an algorithm that computes the gradients of the loss function with respect to the weights and biases.
  • 44. How FFNN works The input layer of an FFNN takes in the input data, which is usually in the form of a vector, and passes it through a series of hidden layers, each consisting of a set of neurons. Each neuron in a hidden layer takes in the weighted sum of the outputs from the previous layer, adds a bias term, and applies an activation function to produce an output that is passed to the next layer. The output layer produces the final output of the network, which is usually a prediction or a classification.
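A minimal sketch of one forward pass with a single hidden layer (the layer sizes and the ReLU/softmax choices are illustrative assumptions):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # input vector

W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)     # input -> hidden weights and biases
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)     # hidden -> output weights and biases

h = relu(W1 @ x + b1)         # hidden layer: weighted sum + bias, then activation
y = softmax(W2 @ h + b2)      # output layer: probabilities over 3 classes
print(y, y.sum())             # the probabilities sum to 1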
  • 46. A Multi-Layer Perceptron (MLP) A Multi-Layer Perceptron (MLP) is a type of neural network that consists of multiple layers of artificial neurons. MLPs are also known as feedforward neural networks. The architecture of an MLP consists of an input layer, one or more hidden layers, and an output layer. Each layer is composed of multiple artificial neurons that compute a weighted sum of the input signals and apply an activation function to produce an output signal.
  • 47. A Multi-Layer Perceptron (MLP) The hidden layers in an MLP are responsible for extracting features from the input data and transforming them into a format that is suitable for the output layer. The output layer produces the final output of the network, which can be binary or continuous. The learning process of an MLP involves adjusting the weights of the input signals using backpropagation. Backpropagation allows the MLP to learn from the training data and improve its performance over time.
  • 49. Compare single layer and multilayer perceptron model Architecture: Single-layer perceptrons have only one layer of neurons that directly connects to the input data, whereas multilayer perceptrons consist of multiple layers of neurons, including one or more hidden layers that lie between the input and output layers. Capabilities: Single-layer perceptrons are limited to linearly separable problems, meaning they can only learn and classify data that can be separated by a single straight line. In contrast, multilayer perceptrons can learn and classify non-linearly separable problems by using hidden layers to transform the input data into a more complex feature space that can be separated by the output layer.
  • 50. Compare single layer and multilayer perceptron model Training: Single-layer perceptrons use a simple learning rule called the Perceptron Learning Algorithm, which adjusts the weights of the input signals to minimize the error between the predicted and actual output. In contrast, multilayer perceptrons use a more complex learning algorithm called backpropagation, which iteratively adjusts the weights of all the neurons in the network to minimize the error between the predicted and actual output. Applications: Single-layer perceptrons are typically used for simple binary classification problems, such as predicting whether an email is spam or not. Multilayer perceptrons are more powerful and can be used for a wide range of applications, including image and speech recognition, natural language processing, and financial forecasting.
  • 51. Back Propagation networks Backpropagation is a supervised learning algorithm used for training neural networks. The basic structure of a backpropagation network consists of an input layer, one or more hidden layers, and an output layer. Each layer is composed of one or more neurons, which receive inputs, process them, and pass the outputs to the next layer. The connections between the neurons are weighted, and these weights are adjusted during training to improve the accuracy of the network's predictions.
  • 52. Back Propagation networks During the training process, the network is fed a set of input-output pairs, and the output of the network is compared to the desired output. The error between the actual output and the desired output is then backpropagated through the network, and the weights are adjusted to reduce the error. This process is repeated many times, with the aim that the network will eventually converge to a set of weights that produces accurate predictions for new input data.
  • 54. How Backpropagation Algorithm Works: 1. Inputs X arrive through the preconnected path. 2. The input is modeled using real weights W; the weights are usually selected randomly. 3. Calculate the output of every neuron, from the input layer, through the hidden layers, to the output layer. 4. Calculate the error in the outputs: Error = Actual Output – Desired Output 5. Travel back from the output layer to the hidden layer to adjust the weights so that the error is decreased.
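A rough sketch of these steps for a single training example and one hidden layer (the sigmoid activations, squared-error loss, learning rate, and layer sizes are assumptions, not taken from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])                        # 1. inputs arrive
t = np.array([1.0])                              #    desired output
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)    # 2. weights selected randomly
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
lr = 0.5

for step in range(100):
    h = sigmoid(W1 @ x + b1)                     # 3. outputs of the hidden neurons
    y = sigmoid(W2 @ h + b2)                     #    output of the output neuron

    error = y - t                                # 4. error in the output

    # 5. travel back: gradients of the squared error with respect to each weight
    delta_out = error * y * (1 - y)                  # output-layer delta
    delta_hid = (W2.T @ delta_out) * h * (1 - h)     # hidden-layer delta
    W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid

print(y)   # the output approaches the desired value 1.0 as the weights are adjusted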
  • 55. Why We Need Backpropagation? ● Backpropagation is fast, simple and easy to program ● It has no parameters to tune apart from the number of inputs ● It is a flexible method, as it does not require prior knowledge about the network ● It is a standard method that generally works well ● It does not need any special mention of the features of the function to be learned.
  • 56. Concept of Linear Separability ● Linear separability is a concept in mathematics and particularly in machine learning. ● Imagine you have some points scattered around on a piece of paper, and you want to draw a straight line to separate them into two groups. ● If you can draw such a line where all the points of one group are on one side of the line, and all the points of the other group are on the other side, then those points are said to be linearly separable.
  • 57. Concept of Linear Separability ● For example, let's say you have red and blue dots on a sheet of paper, and you want to separate them with a straight line. If you can draw a line in such a way that all the red dots are on one side and all the blue dots are on the other side, then those dots are linearly separable. ● Linear separability is important in machine learning because it means that the data is easy to classify using a simple algorithm like a linear classifier. If data is not linearly separable, more complex methods may be needed to classify it accurately.
  • 58. Concept of Linear Separability ● Linear separability is an important concept in neural networks. If the points belonging to the two classes in n-dimensional space can be separated by a hyperplane (a straight line in two dimensions), they are said to be linearly separable. ● For two-dimensional inputs, if there exists a line (whose equation is w1*x1 + w2*x2 + b = 0) that separates all samples of one class from the other class, then an appropriate perceptron can be derived from the equation of the separating line. Such classification problems are called “linearly separable”, i.e., the classes can be separated by a linear combination of the inputs.
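An illustrative check in code: the AND function is linearly separable (a single line w1*x1 + w2*x2 + b = 0 splits the two classes), while XOR is not, which is why a single-layer perceptron cannot learn XOR. The chosen line below is one example, not a unique answer.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_labels = np.array([0, 0, 0, 1])
xor_labels = np.array([0, 1, 1, 0])

# One line that separates the AND classes: x1 + x2 - 1.5 = 0
w, b = np.array([1.0, 1.0]), -1.5
print((X @ w + b > 0).astype(int))   # [0 0 0 1] matches and_labels, so AND is separable

# For XOR, the points (0,1) and (1,0) would have to fall on one side of a line
# and (0,0), (1,1) on the other; no choice of w and b achieves this, so XOR is
# not linearly separable and needs a hidden layer.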
  • 59. Character Recognition Application Character recognition is a common application of neural networks, and can be achieved using various types of neural networks, including feedforward neural networks, convolutional neural networks, and recurrent neural networks. The network must be trained on a dataset of labeled character images in order to learn to recognize characters. During training, the network adjusts its weights based on the difference between its predicted output and the true label of the input image. Once the network is trained, it can be used to make predictions on new, unlabeled character images.
  • 60. OCR (Optical Character Recognition) OCR is a technology that analyzes the text on a page and turns the letters into character codes that a computer can process. OCR is a technique for detecting printed or handwritten text characters inside digital images of paper files, such as scanned paper records. OCR systems are hardware and software systems that turn physical documents into machine-readable text. These digital versions can be highly beneficial to children and young adults who struggle to read. An essential application of OCR is to convert hard-copy legal or historical documents into PDFs.
  • 62. How OCR works? 1. Image Pre-Processing ● Size normalization: This step ensures that all images are of the same size for consistency. We use a method called bicubic interpolation to resize images to a standard size. ● Binarization: Here, we convert grayscale images to binary images by setting a threshold. Pixels above the threshold become white, while those below become black. This helps in simplifying the image for further processing. ● Smoothing: To make the edges of objects in the image smoother, we use erosion and dilation techniques. This helps in reducing noise and making the objects clearer.
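A rough sketch of these pre-processing steps using Pillow, NumPy, and SciPy (the file name 'digit.png', the 32x32 target size, and the threshold of 128 are illustrative assumptions, not values from the slides):

import numpy as np
from PIL import Image
from scipy import ndimage

img = Image.open("digit.png").convert("L")         # load the character image as grayscale

# Size normalization: resize every image to the same size using bicubic interpolation
img = img.resize((32, 32), Image.BICUBIC)

# Binarization: pixels above the threshold become white (1), the rest black (0)
arr = np.asarray(img)
binary = arr > 128

# Smoothing: erosion followed by dilation removes small specks of noise
smoothed = ndimage.binary_dilation(ndimage.binary_erosion(binary))
print(smoothed.shape)                              # (32, 32) binary image, ready for recognition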
  • 63. How OCR works? Text recognition: Once the image is pre-processed, we can start recognizing text. There are two main methods for this: ● Pattern matching: This works well for typed documents with known fonts. It compares parts of the image with patterns of characters it knows. ● Feature extraction: This method looks at specific features of characters, like lines and curves, to identify them.
  • 64. How OCR works? Postprocessing: After recognizing the text, the system converts it into a digital format. Some systems also create annotated PDF files, which show both the original scanned document and the recognized text.
  • 65. Immunological computing Immunological computing is like using the principles of our immune system to teach computers how to recognize patterns, make decisions, and solve problems effectively. It's a fascinating area of research that draws inspiration from nature to develop smarter algorithms and systems. Immune System Basics: Our immune system is like a defense force in our body that helps to keep us healthy. It can recognize harmful invaders, like bacteria or viruses, and fight them off to keep us safe. It does this by identifying foreign substances called antigens and producing antibodies to neutralize them.
  • 66. Immunological computing How Immunological Computing Works: In immunological computing, we mimic the behavior of the immune system to solve computational problems. Just like our immune system learns to recognize and respond to threats, in immunological computing, algorithms learn to recognize patterns in data and make decisions based on them. Instead of antigens and antibodies, we use concepts like "patterns" and "rules". The algorithms adapt and improve over time, similar to how our immune system builds immunity to diseases.
  • 67. Immunological computing Applications: Immunological computing can be used in various fields such as data mining, pattern recognition, and optimization. For example, in anomaly detection, it can help identify unusual patterns in data that may indicate fraud or errors. In optimization problems, it can be used to find the best solution among many possibilities, similar to how our immune system finds the best response to different threats.
  • 68. Stochastic Gradient Descent Gradient Descent (GD): Imagine you are blindfolded on a hill and want to find the lowest point without any help. You feel the slope under your feet and take small steps downhill. This is like Gradient Descent where you iteratively move towards the minimum of a function by following the direction of steepest descent. Stochastic Gradient Descent (SGD): Now, let's add a twist. Instead of relying on the slope at your current location alone, you randomly pick a spot on the hill, feel the slope there, and take a step. Sometimes this spot might be flat or even uphill, but over many such random steps, you tend to move towards the bottom of the hill. This randomness helps in escaping local minima and can be faster than regular Gradient Descent, especially for large datasets.
  • 69. Stochastic Gradient Descent Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that is used for optimizing machine learning models. In SGD, instead of using the entire dataset for each iteration, only a single random training example (or a small batch) is selected to calculate the gradient and update the model parameters. This random selection introduces randomness into the optimization process, hence the term “stochastic” in Stochastic Gradient Descent. The advantage of using SGD is its computational efficiency, especially when dealing with large datasets. By using a single example or a small batch, the computational cost per iteration is significantly reduced compared to traditional Gradient Descent, which requires processing the entire dataset.
  • 70. Stochastic Gradient Descent How it works (a minimal sketch follows): ● Start with an initial guess for the minimum point. ● Randomly shuffle your dataset. ● For each data point in the shuffled dataset: ○ Compute the gradient of the loss function at that point. ○ Update your guess for the minimum point by taking a small step in the direction of steepest descent (i.e., opposite to the gradient). ● Repeat this process for a fixed number of iterations or until the improvement becomes very small.
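A minimal sketch of these steps applied to fitting a straight line y = w*x + b with squared error (the synthetic data, learning rate, and epoch count are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=200)    # noisy targets around the line y = 3x + 1

w, b = 0.0, 0.0            # initial guess for the minimum point
lr = 0.1                   # step size

for epoch in range(20):
    order = rng.permutation(len(x))               # randomly shuffle the dataset
    for i in order:                               # one training example at a time
        pred = w * x[i] + b
        grad_w = 2 * (pred - y[i]) * x[i]         # gradient of the squared error
        grad_b = 2 * (pred - y[i])
        w -= lr * grad_w                          # small step against the gradient
        b -= lr * grad_b

print(w, b)   # close to the true values 3.0 and 1.0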

Editor's Notes

  • #9: Respond to environmental changes: By learning from past experiences, organisms can adjust their behavior to changing environmental conditions, such as changes in temperature, availability of resources, or presence of predators. Improve their performance: Organisms can improve their ability to perform various tasks, such as finding food or avoiding predators, through trial-and-error learning or observational learning. Develop new behaviors: Through learning, organisms can develop new behaviors that allow them to exploit new resources or adapt to new environmental challenges. Enhance communication: Learning can also improve communication between individuals within a species, allowing for the transmission of knowledge and cultural traditions.