
Module - 4

Artificial Neural Network

By
Dr. Ramkumar T
[email protected]

What is a Neural Network?
• Neural network: an information-processing paradigm inspired by biological nervous systems, such as our brain
• Structure: a large number of highly interconnected processing elements (neurons) working together
Artificial Neural Network
• The most fundamental unit of an artificial neural network is called an artificial neuron
• Why is it called a neuron? Where does the inspiration come from?
• The inspiration comes from biology (more specifically, from the brain – biological neurons)
• Biological neurons = neural cells = neural processing units
Biological Neurons
• dendrite: receives signals from other neurons
• synapse: point of connection to other neurons
• soma: processes the information
• axon: transmits the output of this neuron
Biological Neurons

An average human brain has around 10^11 (100 billion) neurons!
Biological Neurons
• Our sense organs interact with the outside world
• They relay information to the neurons
• The neurons (may) get activated and produce a response (laughter in this case)
• Of course, in reality, it is not just a single neuron which does all this
• There is a massively parallel interconnected network of neurons
• Some of these neurons may fire in response to this information and in turn relay information to the other neurons they are connected to
• These neurons may also fire, and the process continues, eventually resulting in a response (laughter in this case)
Definition of ANN
“Data processing system consisting of a large
number of simple, highly interconnected
processing elements (artificial neurons) in an
architecture inspired by the structure of the
cerebral cortex of the brain”

(Tsoukalas & Uhrig, 1997).


FEW APPLICATIONS OF NEURAL NETWORKS
• Pattern recognition, e.g. handwritten characters or face identification
• Diagnosis, or mapping symptoms to a medical case
• Speech recognition
• Human emotion detection
• Educational loan forecasting
Biological Neuron vs. Artificial Neuron

[Figure: the four basic components of a human biological neuron alongside the components of a basic artificial neuron]
McCulloch–Pitts Neuron
• McCulloch (a neuroscientist) and Pitts (a logician) proposed a highly simplified computational model of the neuron (1943)
• 'g' aggregates the inputs (which are binary in nature) and applies a thresholding parameter, and the function 'f' takes a decision based on this thresholding logic
• Some Boolean functions can be implemented using the McCulloch–Pitts (MP) neuron
Modelling Boolean Functions using McCulloch–Pitts Neurons
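The thresholding logic above can be captured in a few lines. A minimal sketch in Python (the gate implementations and threshold values are standard illustrations, not taken from this material):

```python
# A minimal McCulloch-Pitts neuron: fire (output 1) if the sum of the
# binary inputs meets the threshold, otherwise output 0.
def mp_neuron(inputs, threshold):
    return 1 if sum(inputs) >= threshold else 0

# AND fires only when both inputs are 1 (threshold = 2)
assert mp_neuron([1, 1], threshold=2) == 1
assert mp_neuron([1, 0], threshold=2) == 0

# OR fires when at least one input is 1 (threshold = 1)
assert mp_neuron([0, 1], threshold=1) == 1
assert mp_neuron([0, 0], threshold=1) == 0
```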
Perceptron model (1958)
• How to handle real data inputs?
• Are all inputs equal?
• What if we want to assign more importance to some inputs?
• Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
• Perceptron: a more general computational model than the McCulloch–Pitts neuron
Perceptron model (1958)
• Inputs are no longer limited to Boolean values
• Introduction of numerical weights for inputs
• Introduction of the term 'bias', an additional parameter which imitates an additional input (always equal to 1) and has its own weight
• Bias allows the network to customize the input–output mapping and produce arbitrary outputs
• Bias helps the model fit the given input data and controls the triggering of the output function
Perceptron model (1958)

A perceptron will fire if the weighted sum of its inputs is greater than the threshold
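A minimal sketch of this firing rule, assuming illustrative inputs, weights, and bias (writing the threshold as a negative bias b turns "weighted sum greater than threshold" into "z > 0"):

```python
# A perceptron over real-valued inputs: weighted sum plus bias, then a
# hard threshold at zero.
def perceptron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return 1 if z > 0 else 0                      # fire only above threshold

# Weights assign more importance to the first input than to the second
print(perceptron(x=[0.5, 2.0], w=[0.9, 0.1], b=-0.3))  # 1: 0.45 + 0.2 - 0.3 > 0
```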
Modelling Various Gates
• Modelling the following functions using a single perceptron is possible:
• AND
• OR
• NOT
• NAND
• NOR
• A single perceptron cannot deal with XOR gate data because it is not linearly separable
• Hence a multilayer network of perceptrons with a single hidden layer can be used to represent the XOR function, as the sketch below shows
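One way to see the multilayer solution is the standard textbook decomposition XOR(a, b) = AND(OR(a, b), NAND(a, b)); the sketch below uses illustrative weights and thresholds, not values from this material:

```python
# XOR from a two-layer network of step-function perceptrons:
# a hidden layer computes OR and NAND, and an output unit ANDs them.
def step(z):
    return 1 if z > 0 else 0

def xor(a, b):
    or_out   = step(1.0 * a + 1.0 * b - 0.5)           # OR
    nand_out = step(-1.0 * a - 1.0 * b + 1.5)          # NAND
    return step(1.0 * or_out + 1.0 * nand_out - 1.5)   # AND of hidden outputs

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # outputs 0, 1, 1, 0
```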
Feed-forward Networks
• A feed-forward neural network, also called a multi-layer perceptron, is a collection of neurons organized in layers
• The number of layers in the network (excluding the input layer) is known as its depth
• The neurons are arranged in the form of a directed acyclic graph, i.e., the information only flows in one direction, from input x to output y. Hence the term feed-forward
• It is used to approximate some function f. For instance, f could be a classifier that maps an input vector x to a category y
• Aims to minimize the loss function
Feed-forward Neural Network

[Figure: a feed-forward neural network]
Multi-Layered Feed-forward Networks
• Feedforward neural networks, also known as multi-layered networks of neurons, are called "feedforward" because information flows in one direction, from the input layer to the output layer, without looping back
• Multi-layered networks have one input layer, one output layer, and one or more hidden layer(s)
• The information only flows in one direction, from input x to output y. Hence the term feed-forward
• Aims to minimize the loss function
Multi-layered Feed-forward Neural Network

[Figure: a multi-layered feed-forward neural network]
Feed-forward Networks
• In order to predict the output value, inputs are propagated from the input layer to the output layer
• This whole process from the input layer to the output layer is known as forward propagation
• During this propagation, the inputs are multiplied by their respective weights at each layer and an activation function is applied on top of them
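A minimal forward-propagation sketch in plain Python (no frameworks). The 4-3-1 layer sizes anticipate the weight-count example on the next slide; the input values and random weights are illustrative assumptions:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Weighted sum plus bias for each neuron, then the activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)
x = [0.2, 0.4, 0.6, 0.8]                                            # input layer (4 units)
w1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 4x3 = 12 weights
w2 = [[random.uniform(-1, 1) for _ in range(3)]]                    # 3x1 = 3 weights
h = layer_forward(x, w1, [0.0] * 3)   # hidden layer (3 units)
y = layer_forward(h, w2, [0.0])       # output layer (1 unit)
print(y)
```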
How many weights are in this model?
• Input to hidden layer:
4 × 3 = 12
• Hidden layer to output layer:
3 × 1 = 3
• Total:
12 + 3 = 15
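The same count, as a quick programmatic check (a minimal sketch; bias parameters are deliberately excluded, matching the count above):

```python
# Weight count for a fully connected 4-3-1 network: each pair of adjacent
# layers contributes (units in) x (units out) weights.
layer_sizes = [4, 3, 1]
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)  # 4*3 + 3*1 = 15
```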
Feed-forward Neural Network - Layers
• It is composed of three types of layers:
a) Input Layer:
• The input layer accepts the input data and passes it to the next layer.
b) Hidden Layers:
• One or more hidden layers that process and transform the input data.
Each hidden layer has a set of neurons connected to the neurons of the
previous and next layers. These layers use activation functions, such as
ReLU or sigmoid, to introduce non-linearity into the network, allowing it
to learn and model more complex relationships between the inputs and
outputs.
c) Output Layer:
• The output layer generates the final output. Depending on the type of
problem, the number of neurons in the output layer may vary. For
example, in a binary classification problem, it would only have one
neuron. In contrast, a multi-class classification problem would have as
many neurons as the number of classes.
WHAT IS AN ACTIVATION FUNCTION?
• It is used to calculate the output response of a neuron.
• An activation function is applied to the sum of the weighted input signals to obtain the response.
• For neurons in the same layer, the same activation function is used.
• Activation functions may be linear or non-linear.
• Non-linear activation functions are used in a multilayer net.
Need for Sigmoid activation function
• The thresholding logic used by the step function is harsh
• There will always be a sudden change in the decision (from 0 to 1) when the weighted sum crosses the threshold value
• For most real-world applications we would expect a smoother decision function which gradually changes from 0 to 1
• This motivates sigmoid neurons, where the output function is much smoother than the step function
• One form of the sigmoid function is the logistic function: y = 1 / (1 + e^(−z))
Sigmoid Activation Function
• When the net input z to a sigmoid function is a large positive number, then y ≈ 1
• When the net input z to a sigmoid function is a large negative number, then y ≈ 0
• When the net input z to a sigmoid function is 0, then y = 0.5
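A quick numerical check of this limiting behaviour (a minimal sketch using Python's math module):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(10))   # ~1.0 for a large positive net input
print(sigmoid(-10))  # ~0.0 for a large negative net input
print(sigmoid(0))    # exactly 0.5 at z = 0
```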
Sigmoid Activation Function
• The output of a sigmoid activation function ranges between 0 and 1.
• The output of a neuron that has a sigmoid activation function is very smooth and gives nice continuous derivatives, which works well when training a neural network.
• Because of its capability to provide continuous values in the range of 0 to 1, the sigmoid function is generally used to output the probability with respect to a given class for binary classification.
Tanh Activation Function
• The input–output relationship for a tanh activation function is expressed as
y = tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))
• where z = w^T x + b is the net input to the tanh activation function.
Tanh Activation Function
• Tanh activation functions can output values between −1 and +1
• When the net input 'z' is a large positive number, then y ≈ 1
• When the net input 'z' is a large negative number, then y ≈ −1
• When the net input 'z' is 0, then y = 0
Rectified Linear Unit (ReLU) Activation Function
• In a rectified linear unit activation function, the output equals the net input to the neuron if the overall input is greater than 0;
• if the overall input is less than or equal to 0, the neuron outputs a 0.
• The range of values of the ReLU activation function is from 0 to ∞
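Minimal sketches of tanh and ReLU confirming the output ranges described above:

```python
import math

def tanh(z):
    return math.tanh(z)  # output in (-1, 1)

def relu(z):
    return max(0.0, z)   # output in [0, infinity)

print(tanh(10), tanh(-10), tanh(0))  # ~1.0, ~-1.0, 0.0
print(relu(2.5), relu(-2.5))         # 2.5, 0.0
```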
ANN – Supervised Machine Learning Setup
• The relationship between input 'x' and output 'y' is modelled as a function
• For a sigmoid function, y = 1 / (1 + e^(−z)), where z = w^T x + b
• Parameters to be learnt: weight (w) and bias (b)
• Algorithm to learn these parameters: gradient descent (back propagation of error)
• Objective/loss/error function: the learning algorithm aims to minimize the loss function
How does an ANN learn?
• The learning objective of an ANN is to minimize the cost function for better prediction
• How can we minimize the cost function?
• A neural network makes predictions using forward propagation.
• If we can change some values in the forward propagation, we can predict the correct output and minimize the loss.
• But what values can we change in the forward propagation? Obviously, we can't change the input and output.
• Gradient descent: an optimization technique used to learn the optimal values of the randomly initialized weight matrices
• With the optimal values of the weights, the ANN can predict the correct output and minimize the loss.
Gradient descent
• Plot the values of 'J(w)' (the cost) against 'w' (the weight)
• Gradients are derivatives; a gradient is the slope of a tangent line.
• There exists a value of the parameter 'w' which has the minimum cost 'J(w)'
• The gradient of the cost function is calculated as the partial derivative of the cost function 'J' with respect to each model parameter 'wj'. In notation: ∂J/∂wj
• In the gradient descent algorithm, we start with random model parameters and calculate the cost
• We keep updating the model parameters to move closer to the values that result in the minimum cost.
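A minimal gradient-descent sketch on an assumed one-dimensional cost J(w) = (w − 3)^2, whose gradient is 2(w − 3); the cost function, starting point, and learning rate are illustrative choices:

```python
# Gradient descent on J(w) = (w - 3)^2, which has its minimum at w = 3.
def gradient(w):
    return 2.0 * (w - 3.0)  # dJ/dw

w = 0.0        # start from an arbitrary initial parameter value
alpha = 0.1    # learning rate (step size)
for _ in range(100):
    w = w - alpha * gradient(w)  # step against the gradient
print(w)  # converges towards 3.0, the minimum of J(w)
```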
Back propagation of Network
• With gradient descent, we move our weights to a position where the cost is minimum. How do we update the weights?
• Propagate backwards through the network from the output layer to the input layer and calculate the gradient of the cost function with respect to all the weights between the output and the input layer
• After calculating the gradients, the old weights can be updated using the weight update rule: wj ← wj − α · ∂J/∂wj
• α (alpha) is called the learning rate; it is used to control the amount of weight adjustment at each step of training.
• This whole process of propagating backwards through the network from the output layer to the input layer and updating the weights of the network using gradient descent to minimize the loss is called back propagation.
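A minimal sketch of back propagation through a single sigmoid neuron with squared-error loss, applying the chain rule and the weight update rule above (the training example and hyperparameters are illustrative assumptions):

```python
import math

x, y = 1.5, 0.0            # one training example (input, target)
w, b, alpha = 0.8, 0.1, 0.5

for _ in range(50):
    z = w * x + b
    y_hat = 1 / (1 + math.exp(-z))       # forward pass (sigmoid neuron)
    # chain rule: dJ/dw = dJ/dy_hat * dy_hat/dz * dz/dw
    dJ_dyhat = 2 * (y_hat - y)           # J = (y_hat - y)^2
    dyhat_dz = y_hat * (1 - y_hat)       # derivative of the sigmoid
    w -= alpha * dJ_dyhat * dyhat_dz * x # weight update rule
    b -= alpha * dJ_dyhat * dyhat_dz     # bias update rule
print(w, b)  # the loss shrinks as y_hat approaches the target y
```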
Role of Learning Rate
• If the learning rate is large, then we take large steps and our gradient descent will be fast, but we might overshoot and fail to reach the global minimum, or become stuck at a local minimum.
• Hence the learning rate (step size) should be chosen optimally
Ways of doing Gradient descent
• Batch gradient descent
• Uses all of the training instances to update the model parameters in each iteration (all examples at once)
• Converges slowly, with accurate estimates of the error gradient
• Stochastic Gradient Descent (SGD)
• Updates the parameters using only a single training instance in each iteration; the training instance is usually selected randomly (one sample at a time)
• Converges fast, with noisy estimates of the error gradient
• Mini-batch Gradient Descent
• Divides the training set into smaller subsets, called batches, denoted by 'b'. A mini-batch 'b' is used to update the model parameters in each iteration.
• Mini-batch gradient descent is the most common implementation of gradient descent used in the field of deep learning
• A smaller batch size makes the learning process faster, but the variance of the validation dataset accuracy is higher.
• A bigger batch size has a slower learning process, but the variance of the validation dataset accuracy is lower.
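A minimal mini-batch gradient descent sketch on an assumed toy regression problem (y = 2x); setting batch_size to the dataset size gives batch gradient descent, and setting it to 1 gives SGD:

```python
import random

random.seed(0)
data = [(0.1 * i, 2 * 0.1 * i) for i in range(100)]  # toy dataset: y = 2x
w, alpha, batch_size = 0.0, 0.01, 10

for epoch in range(50):                  # several passes over the data
    random.shuffle(data)                 # mini-batches are drawn randomly
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # gradient of mean squared error J = mean((w*x - y)^2) w.r.t. w
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= alpha * grad                # one parameter update per mini-batch
print(w)  # approaches 2.0, the true slope
```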
Epoch
• One complete pass of the whole dataset through the neural network model is called an epoch
• One epoch means that the training dataset is passed forward and backward through the neural network once
• Too small a number of epochs results in underfitting, because the neural network has not learned enough
• On the other hand, too many epochs will lead to overfitting, where the model can predict the training data very well but cannot predict new unseen data (test data) well enough
• Assume that the total number of training records is 12,000; if the batch size is 6,000, then 2 iterations are needed for completing 1 epoch
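The iteration arithmetic, as a quick check (a minimal sketch; ceiling division covers the case where the batch size does not divide the dataset evenly):

```python
# Iterations per epoch = total records / batch size (12000 / 6000 = 2).
import math
total_records, batch_size = 12000, 6000
print(math.ceil(total_records / batch_size))  # 2 iterations complete 1 epoch
```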

