
Unit 3

The document discusses artificial neural networks and how they are inspired by biological neural networks. It provides details on the typical structure of an artificial neural network, including inputs, nodes, weights, and outputs that correspond to dendrites, cell nuclei, synapses, and axons in biological neural networks. The document also gives a brief history of artificial neural networks, from early models in the 1940s-1950s to the development of backpropagation in the 1980s that helped advance the field.
Copyright © All Rights Reserved

Neural Networks (NN)

Neurons

Scientists estimate that the human brain has around 100 billion neurons.

These neurons have hundreds of trillions of connections between them.

ANN learning is robust to errors in the training data and has been successfully applied to learning real-valued, discrete-valued, and vector-valued functions in problems such as interpreting visual scenes, speech recognition, and learning robot control strategies. The study of artificial neural networks (ANNs) has been inspired in part by the observation that biological learning systems are built of very complex webs of interconnected neurons. The human brain contains a densely interconnected network of approximately 10^11 to 10^12 neurons, each connected, on average, to 10^4 to 10^5 other neurons. Yet the human brain takes only about 10^-1 seconds to make surprisingly complex decisions. ANN systems are motivated to capture this kind of highly parallel computation based on distributed representations. Generally, ANNs are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs and produces a single real-valued output.
Although ANNs are loosely motivated by biological neural systems, there are many complexities of biological neural systems that are not modeled by ANNs. Some of them are shown in the figures.
Difference between Biological Neurons and Artificial Neurons

| Biological Neurons | Artificial Neurons |
|---|---|
| Major components: Axons, Dendrites, Synapse | Major components: Nodes, Inputs, Outputs, Weights, Bias |
| Information from other neurons, in the form of electrical impulses, enters the dendrites at connection points called synapses. The information flows from the dendrites to the cell, where it is processed. The output signal, a train of impulses, is then sent down the axon to the synapses of other neurons. | The arrangements and connections of the neurons make up the network, which has three layers. The first layer is called the input layer and is the only layer exposed to external signals. The input layer transmits signals to the neurons in the next layer, which is called a hidden layer. The hidden layer extracts relevant features or patterns from the received signals. Those features or patterns that are considered important are then directed to the output layer, which is the final layer of the network. |
| A synapse is able to increase or decrease the strength of the connection. This is where information is stored. | The artificial signals can be changed by weights in a manner similar to the physical changes that occur in the synapses. |
| Approx. 10^11 neurons. | 10^2 to 10^4 neurons with current technology. |

Difference between the human brain and computers in terms of how information is processed.

| Human Brain (Biological Neuron Network) | Computers (Artificial Neuron Network) |
|---|---|
| The human brain works asynchronously. | Computers (ANN) work synchronously. |
| Biological neurons compute slowly (several ms per computation). | Artificial neurons compute fast (<1 nanosecond per computation). |
| The brain represents information in a distributed way because neurons are unreliable and could die at any time. | In computer programs, every bit has to function as intended, otherwise these programs would crash. |
| Our brain changes its connectivity over time to represent new information and requirements imposed on us. | The connectivity between the electronic components in a computer never changes unless we replace its components. |
| Biological neural networks have complicated topologies. | ANNs are often in a tree structure. |
| Researchers have yet to find out how the brain actually learns. | ANNs use Gradient Descent for learning. |

History of Artificial Neural Network

The history of neural networking arguably began in the late 1800s with scientific endeavors to study the activity of the human brain. In 1890, William James published the first work about brain activity patterns. In 1943, McCulloch and Pitts created a model of the neuron that is still used today in artificial neural networks. This model is divided into two parts:

o A summation over weighted inputs.
o An output function of the sum.
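The two-part model above can be sketched in a few lines of Python. This is a minimal illustration rather than McCulloch and Pitts' original notation: the function name `mcculloch_pitts`, the binary inputs, the "fire when the sum reaches the threshold" convention, and the choice of unit weights with a threshold of 2 (which makes the unit behave like an AND gate) are all assumptions made for the example.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Illustrative McCulloch-Pitts unit (hypothetical helper name)."""
    # Part 1: summation over weighted inputs
    total = sum(w * x for w, x in zip(weights, inputs))
    # Part 2: output function of the sum; fires (1) once the sum reaches the threshold (assumed >=)
    return 1 if total >= threshold else 0


# With unit weights and threshold 2, the unit behaves like a two-input AND gate.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts([x1, x2], [1, 1], threshold=2))
```

Lowering the threshold to 1 turns the same unit into an OR gate, which shows that the behavior is fixed entirely by the weights and the threshold.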

Artificial Neural Network (ANN):

In 1949, Donald Hebb published "The Organization of Behavior," which proposed a law for synaptic learning. This law, later known as Hebbian Learning in honor of Donald Hebb, is one of the most straightforward and simple learning rules for artificial neural networks.
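A minimal sketch of a Hebbian update in Python. It assumes the textbook form of the rule used for supervised training, where each weight grows by input times target (w_i += x_i * t) and the bias by the target (b += t), applied in a single pass over bipolar (-1/+1) samples. The function names and the AND data set are illustrative assumptions, not part of Hebb's original text.

```python
def hebb_train(samples):
    """Hypothetical Hebb-rule trainer: w_i += x_i * t and b += t, one pass."""
    n = len(samples[0][0])
    weights = [0] * n
    bias = 0
    for x, t in samples:
        # Strengthen each connection in proportion to input activity times target activity
        weights = [w + xi * t for w, xi in zip(weights, x)]
        bias += t
    return weights, bias


def predict(x, weights, bias):
    """Bipolar output: +1 if the net input is non-negative, else -1 (assumed convention)."""
    net = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if net >= 0 else -1


# Bipolar AND: the output is +1 only when both inputs are +1
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = hebb_train(and_samples)
print(w, b)  # [2, 2] -2
```

With these weights every AND sample is classified correctly after a single pass; Hebb's rule has no error term, so it does not correct itself on data where one pass is not enough.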

In 1951, Marvin Minsky made the first Artificial Neural Network (ANN) while working at Princeton.

In 1958, "The Computer and the Brain" was published, a year after John von Neumann's death. In that book, von Neumann proposed numerous radical changes to how researchers had been modeling the brain.

Perceptron:

Perceptron was created in 1958 at Cornell University by Frank Rosenblatt. The perceptron was an attempt to use neural network procedures for character recognition. It was a linear system and was useful for solving problems where the input classes were linearly separable in the input space.
In 1960, Rosenblatt published the book "Principles of Neurodynamics," containing some of his research and ideas about modeling the brain.

Despite the early success of the perceptron and artificial neural network research, many people felt that these methods held limited promise. Among them were Marvin Minsky and Seymour Papert, whose 1969 book "Perceptrons" was used to discredit ANN research and focus attention on the apparent limitations of ANN work. One of the limitations that Minsky and Papert highlighted was the fact that the perceptron was not capable of distinguishing patterns that are not linearly separable in the input space. The book "Perceptrons" ushered in what some call the "quiet years," when interest in ANN research was at a minimum.
Although the perceptron failed to deal with non-linearly separable data, this was not an inherent failure of the technology but a matter of scale. In 1990, Hecht-Nielsen showed a two-layer perceptron (Mark), a three-layer machine when the input layer is counted, that was capable of tackling non-linear separation problems.

The backpropagation algorithm, initially discovered by Werbos in 1974, was rediscovered in 1986 with the publication of "Learning Internal Representations by Error Propagation" by Rumelhart, Hinton, and Williams. Backpropagation is a type of gradient descent algorithm used with artificial neural networks for error minimization and curve-fitting.

In 1987, the IEEE annual international ANN conference was started for ANN scientists. Also in 1987, the International Neural Network Society (INNS) was formed, followed by the INNS journal "Neural Networks" in 1988.

What is an Artificial Neural Network?

The term "Artificial Neural Network" is derived from biological neural networks, which make up the structure of the human brain. Just as the human brain has neurons interconnected with one another, artificial neural networks also have neurons that are interconnected with one another in the various layers of the network. These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites in a biological neural network represent inputs in an artificial neural network, the cell nucleus represents nodes, the synapse represents weights, and the axon represents the output.

Relationship between Biological neural network and artificial neural network:

| Biological Neural Network | Artificial Neural Network |
|---|---|
| Dendrites | Inputs |
| Cell nucleus | Nodes |
| Synapse | Weights |
| Axon | Output |

In the field of artificial intelligence, an Artificial Neural Network attempts to mimic the network of neurons that makes up the human brain, so that computers have an option to understand things and make decisions in a human-like manner. An artificial neural network is designed by programming computers to behave like simple interconnected brain cells.

There are around 100 billion neurons in the human brain. Each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can retrieve more than one piece of this data from our memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes inputs and gives an output, such as an "OR" gate, which takes two inputs. If one or both of the inputs are "On," then we get "On" in the output. If both inputs are "Off," then we get "Off" in the output. Here the output depends only on the input. Our brain does not perform the same task: the relationship between inputs and outputs keeps changing because the neurons in our brain are "learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we have to understand what a neural network consists of. A neural network consists of a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

An Artificial Neural Network primarily consists of three layers:


Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer is present between the input and output layers. It performs all the calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes the input, computes the weighted sum of the inputs, and adds a bias. This computation is represented in the form of a transfer function.

The weighted total is then passed as an input to an activation function to produce the output. Activation functions decide whether a node should fire or not. Only the nodes that fire make it to the output layer. There are distinct activation functions available that can be applied depending on the sort of task we are performing.
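To make the flow through the three layers concrete, here is a minimal Python sketch of a forward pass. It assumes a hypothetical network with 2 inputs, 3 hidden units and 1 output, sigmoid activations, and arbitrary untrained weights; the names `layer_forward` and `sigmoid` are illustrative, not from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """Each neuron: weighted sum of its inputs plus bias, passed through the activation."""
    outputs = []
    for neuron_weights, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + b  # transfer: weighted sum + bias
        outputs.append(activation(z))
    return outputs


# Hypothetical 2-3-1 network; weights are arbitrary illustration values, not trained.
hidden_w = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.0, -0.1, 0.2]
output_w = [[1.0, -1.5, 0.7]]
output_b = [0.05]

x = [1.0, 2.0]                                         # the input layer only passes the data on
h = layer_forward(x, hidden_w, hidden_b, sigmoid)      # hidden layer
y = layer_forward(h, output_w, output_b, sigmoid)      # output layer
print(h, y)
```

The sigmoid keeps every neuron's output between 0 and 1; a step function could be substituted where a strict fire/not-fire decision is wanted.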

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:


Artificial neural networks can perform more than one task simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, the data used by an ANN is stored across the whole network. The disappearance of a few pieces of data in one place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete data. The loss of performance here depends on the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to choose the examples and to train the network toward the desired output by demonstrating these examples to it. The network's success is directly proportional to the chosen instances, and if an event cannot be shown to the network in all its aspects, the network can produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural


networks. The appropriate network structure is accomplished through experience, trial,
and error.

Unexplained behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight into why and how it reached it. This decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, in accordance with their structure. Therefore, the realization of the network is hardware-dependent.
Difficulty of presenting the problem to the network:

ANNs can only work with numerical data. Problems must be converted into numerical values before being introduced to an ANN. The representation mechanism chosen here directly impacts the performance of the network, and it relies on the user's abilities.

The training duration of the network is unknown:

Training stops once the network's error is reduced to a specific value, and reaching this value does not guarantee optimum results.

How do artificial neural networks work?

An Artificial Neural Network can best be represented as a weighted directed graph, where the artificial neurons form the nodes. The connections between neuron outputs and neuron inputs can be viewed as directed edges with weights. The ANN receives the input signal from an external source in the form of a pattern or image, represented as a vector. These inputs are then mathematically denoted by x(n), for each of the n inputs.

Afterward, each input is multiplied by its corresponding weight (these weights are the details used by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is equal to zero, a bias is added to make the output non-zero or otherwise to scale up the system's response. The bias behaves like an extra input that is fixed at 1 and has its own weight. The total of the weighted inputs can, in principle, take any value from negative to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation function.
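The weighted-directed-graph view can be sketched directly in Python. This is a hypothetical example: the node names, edge weights, biases, and the use of tanh as the activation function that keeps the response bounded are all assumptions chosen for illustration.

```python
import math

# Hypothetical tiny network stored as a weighted directed graph:
# edges map (source_node, target_node) -> weight; each non-input node has its own bias.
edges = {
    ("x1", "h1"): 0.7, ("x2", "h1"): -0.2,
    ("x1", "h2"): 0.4, ("x2", "h2"): 0.9,
    ("h1", "out"): 1.2, ("h2", "out"): -0.8,
}
biases = {"h1": 0.1, "h2": -0.3, "out": 0.05}
order = ["h1", "h2", "out"]  # a topological order of the non-input nodes


def evaluate(inputs):
    values = dict(inputs)  # the input signals x(n) enter the graph here
    for node in order:
        # Sum the weighted values arriving on every incoming edge, then add the bias
        total = sum(w * values[src] for (src, dst), w in edges.items() if dst == node)
        total += biases[node]
        # The activation (tanh, by assumption) keeps the response within (-1, 1)
        values[node] = math.tanh(total)
    return values["out"]


print(evaluate({"x1": 1.0, "x2": 0.5}))
```

Evaluating nodes in topological order guarantees that every incoming value is available before a node is computed; this only works for graphs without cycles, that is, feed-forward networks.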

Neural Network: Algorithms


In a Neural Network, the learning (or training) process is initiated by dividing the data into three different sets:
o Training dataset: This dataset allows the Neural Network to learn the weights between nodes.
o Validation dataset: This dataset is used for fine-tuning the performance of the Neural Network.
o Test dataset: This dataset is used to determine the accuracy and margin of error of the Neural Network.
Once the data is segmented into these three parts, Neural Network algorithms are applied to them to train the Neural Network. The procedure used to carry out the training process in a Neural Network is known as optimization, and the algorithm used is called the optimizer. There are different types of optimization algorithms, each with its own characteristics, such as memory requirements, numerical precision, and processing speed.
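A minimal sketch of the three-way split in Python. The 70/15/15 proportions, the fixed random seed, and the function name `split_dataset` are assumptions for illustration; in practice the proportions are chosen to suit the amount of data available.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle, then cut into training, validation and test sets.

    The 70/15/15 proportions are an illustrative assumption, not a fixed rule.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)    # fixed seed so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # the remainder is held out for the final check
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before cutting matters: if the data is ordered (for example by class or by date), a plain slice would give the three sets very different distributions.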
Before we dive into the discussion of the different Neural Network algorithms, let’s
understand the learning problem first.

Applications of Neural Network


1. Every new technology needs assistance from previous ones, i.e., data from previous technologies is analyzed so that all the pros and cons can be studied correctly. Neural networks help make all of this possible.
2. Neural networks are suitable for research on animal behavior, predator/prey relationships, and population cycles.
3. It is easier to do proper valuation of property, buildings, automobiles, machinery, etc. with the help of neural networks.
4. Neural networks can be used in betting on horse races and sporting events and, most importantly, in the stock market.
5. They can be used to predict sentencing outcomes for crimes by using a large dataset of crime details as input and the resulting sentences as output.
6. By analyzing data and determining which records are faulty (files diverging from their peers), neural networks can support data mining, cleaning, and validation.
7. Neural networks can be used to predict targets from the echo patterns obtained from sonar, radar, seismic, and magnetic instruments.
8. They can be used efficiently in employee hiring, so that a company can hire the right employee based on the skills the candidate has and their predicted future productivity.
9. They have a large application in medical research.
10. They can be used for fraud detection in credit cards, insurance, or taxes by analyzing past records.

Introduction to ANN (Network Architectures)


An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the brain. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning largely involves adjustments to the synaptic connections that exist between the neurons.
The model of an artificial neural network can be specified by three entities:

o Interconnections
o Activation functions
o Learning rules

Interconnections:

Interconnection can be defined as the way processing elements (neurons) in an ANN are connected to each other. Hence, the arrangement of these processing elements and the geometry of their interconnections are essential in an ANN.
These arrangements always have two layers that are common to all network architectures: the input layer, which buffers the input signal, and the output layer, which generates the output of the network. The third layer is the hidden layer, whose neurons are in neither the input layer nor the output layer. These neurons are hidden from the people who interface with the system and act as a black box to them. By adding hidden layers with neurons, the system's computational and processing power can be increased, but training the system becomes more complex at the same time.
Types of Artificial Neural Network:

There are various types of Artificial Neural Networks (ANN). Modeled on the neurons and network functions of the human brain, an artificial neural network performs tasks in a similar way. Most artificial neural networks have some similarities with their more complex biological counterparts and are very effective at their intended tasks, for example, segmentation or classification.

Feedback ANN:

In this type of ANN, the output is fed back into the network to internally achieve the best-evolved results. According to the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections use feedback ANNs.

Feed-Forward ANN:
A feed-forward network is a basic neural network consisting of an input layer, an output layer, and at least one layer of neurons. By assessing its output in relation to its input, the strength of the network can be observed from the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.

There exist five basic types of neuron connection architecture:

1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network
1. Single-layer feed-forward network
In this type of network, we have only two layers, the input layer and the output layer, but the input layer is not counted because no computation is performed in it. The output layer is formed when different weights are applied to the input nodes and the cumulative effect per node is taken. The neurons of the output layer then collectively compute the output signals.
2. Multilayer feed-forward network

This network also has a hidden layer that is internal to the network and has no direct contact with the external layer. The existence of one or more hidden layers makes the network computationally stronger. It is called a feed-forward network because information flows from the input, through the intermediate computations, to determine the output Z. There are no feedback connections in which outputs of the model are fed back into itself.
3. Single node with its own feedback
Single Node with own Feedback

When outputs can be directed back as inputs to nodes in the same layer or a preceding layer, the result is a feedback network. Recurrent networks are feedback networks with closed loops. The above figure shows a single recurrent network having a single neuron with feedback to itself.
4. Single-layer recurrent network

The above network is a single-layer network with a feedback connection in which


the processing element’s output can be directed back to itself or to another
processing element or both. A recurrent neural network is a class of artificial neural
networks where connections between nodes form a directed graph along a
sequence. This allows it to exhibit dynamic temporal behavior for a time sequence.
Unlike feedforward neural networks, RNNs can use their internal state (memory) to
process sequences of inputs.
5. Multilayer recurrent network

In this type of network, a processing element's output can be directed to processing elements in the same layer and in the preceding layer, forming a multilayer recurrent network. Such networks perform the same task for every element of a sequence, with the output depending on the previous computations. Inputs are not needed at each time step. The main feature of a Recurrent Neural Network is its hidden state, which captures some information about a sequence.
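The role of the hidden state can be shown with a scalar recurrent unit in Python. This is a simplified sketch that assumes the common update h_t = tanh(w_x * x_t + w_h * h_(t-1) + b); the weight values and the function name `rnn_forward` are hypothetical.

```python
import math


def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    """Scalar recurrent unit (assumed form): h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x_t in sequence:
        # The same weights are reused at every time step; h carries memory forward
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0], w_x=0.8, w_h=0.5, b=0.0)
print(states)
```

Even though the second and third inputs are zero, the second and third states are not, because the hidden state carries information from the first input forward through the feedback weight w_h.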

Activation Function
Definition

In artificial neural networks, an activation function is one that outputs a small value for small inputs and a larger value if its inputs exceed a threshold. An activation function "fires" if the inputs are big enough; otherwise, nothing happens. An activation function, then, is a gate that checks whether an incoming value is higher than a threshold value.

Activation functions are helpful because they introduce non-linearities into neural networks and enable them to learn powerful operations. If the activation functions were taken out, a feedforward neural network could be refactored into a straightforward linear function or matrix transformation of its input.
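The collapse described above can be checked numerically. The sketch below stacks two linear layers with no activation function and shows that a single linear layer with W = W2·W1 and b = W2·b1 + b2 gives exactly the same output. The weights and helper names are arbitrary illustrative choices.

```python
def linear_layer(W, b, x):
    """y = W x + b for plain Python lists (no activation applied)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Two stacked layers with no activation function (hypothetical weights).
W1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.5, 0.0]
W2, b2 = [[2.0, -1.0]], [1.0]

x = [3.0, 4.0]
two_layers = linear_layer(W2, b2, linear_layer(W1, b1, x))

# The same mapping as a single linear layer: W = W2 W1, b = W2 b1 + b2
W = matmul(W2, W1)
b = linear_layer(W2, b2, b1)
one_layer = linear_layer(W, b, x)
print(two_layers, one_layer)  # [26.5] [26.5]
```

Inserting a non-linear function such as tanh between the two layers breaks this equivalence, which is what gives deeper networks their extra expressive power.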

The activation function decides whether a neuron should be activated, based on the weighted total of its inputs plus the bias. The activation function seeks to introduce nonlinearity into a neuron's output.
Explanation: As we are aware, neurons in neural networks operate in accordance with weights, biases, and their corresponding activation functions. Based on the error, the weights and biases of the neurons inside a neural network are updated. This process is known as back-propagation. Activation functions make back-propagation possible, since their gradients, together with the error, are used to update the weights and biases.

The two main categories of activation functions are:

o Linear Activation Function


o Non-linear Activation Functions
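As a small illustration of the two categories, the sketch below pairs one linear activation (the identity) with two non-linear ones, tanh and ReLU, together with the derivatives that back-propagation would use. The choice of these particular functions and the convention that ReLU's derivative is 0 at exactly z = 0 are assumptions for the example.

```python
import math

# Each entry: (function, derivative). The derivative is what back-propagation
# multiplies into the error signal when updating weights and biases.
ACTIVATIONS = {
    "linear": (lambda z: z, lambda z: 1.0),                              # linear category
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),              # non-linear
    "relu": (lambda z: max(0.0, z), lambda z: 1.0 if z > 0 else 0.0),    # non-linear; 0 at z = 0 by convention
}

for name, (f, df) in ACTIVATIONS.items():
    values = [round(f(z), 3) for z in (-2.0, 0.0, 2.0)]
    slopes = [round(df(z), 3) for z in (-2.0, 0.0, 2.0)]
    print(name, values, slopes)
```

The linear function has the same slope everywhere, so stacking linear layers adds no expressive power; the non-linear functions change slope with the input, which is what lets a network model curved decision boundaries.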
Learning and Adaptation

The learning rule is a technique or mathematical logic that helps a neural network learn from the existing conditions and improve its performance. It is an iterative procedure. In this tutorial, we will talk about the learning rules in Neural Networks.

A learning rule, or learning process, is a technique or a mathematical logic. It boosts the Artificial Neural Network's performance and is applied over the network. Thus, learning rules update the weights and bias levels of a network when the network is simulated in a specific data environment.
Perceptron Learning Algorithm

In Machine Learning and Artificial Intelligence, the perceptron is one of the most commonly used terms. It is the first step in learning Machine Learning and Deep Learning technologies, and it consists of a set of weights, input values or scores, and a threshold. The perceptron is a building block of an Artificial Neural Network. Frank Rosenblatt invented the perceptron in the late 1950s for performing certain calculations to detect input data capabilities or business intelligence.

The perceptron is a linear supervised machine learning algorithm. It is used for binary classification. This article introduces a very important binary classifier, the perceptron, which forms the basis for the most popular machine learning models nowadays: neural networks.

Introduction

The perceptron learning algorithm is also understood as an artificial neuron or neural network unit that helps to detect certain input data computations in business intelligence. The perceptron is treated as the most straightforward Artificial Neural Network. It is a supervised learning algorithm for binary classifiers. It is a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.

What is a Binary Classifier in Machine Learning?

In Machine Learning, binary classifiers are functions that decide whether input data, represented as a vector of numbers, belongs to a specific class.

In the perceptron setting, binary classifiers are linear classifiers. In simple words, such a classifier is a classification algorithm that makes its predictions using a linear predictor function that combines a weight vector with the feature vector.

Basic Components of Perceptron

Mr. Frank Rosenblatt invented the perceptron model as a binary classifier which
contains three main components. These are as follows:
o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the
system for further processing. Each input node contains a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is another of the most important parameters of the perceptron. A weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, the bias can be considered as the intercept in a linear equation.

o Activation Function:

These are the final and important components that help to determine whether the
neuron will fire or not. Activation Function can be considered primarily as a step
function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function
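The three activation functions listed above can be written as follows. This is a sketch under stated assumptions: `sign` returns 0 at exactly z = 0 (some perceptron implementations map 0 to +1 or -1 instead), and `step` uses a default threshold of 0.

```python
import math


def sign(z):
    """Bipolar output: +1 for positive, -1 for negative, 0 at exactly zero (assumed convention)."""
    if z > 0:
        return 1
    if z < 0:
        return -1
    return 0


def step(z, threshold=0.0):
    """Binary output: 1 once the net input reaches the threshold, else 0."""
    return 1 if z >= threshold else 0


def sigmoid(z):
    """Smooth, continuous output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-1.5, 0.0, 1.5):
    print(z, sign(z), step(z), round(sigmoid(z), 3))
```

Sign and step give hard decisions, which suits a classic perceptron, while the sigmoid's smooth output has a usable gradient, which is why it is preferred when training with gradient descent.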
The data scientist chooses the activation function based on the problem statement and the desired outputs. The activation function may differ (e.g., Sign, Step, and Sigmoid) across perceptron models, depending on whether the learning process is slow or suffers from vanishing or exploding gradients.

How does Perceptron work?

In Machine Learning, the perceptron is considered a single-layer neural network that consists of four main parameters: input values (input nodes), weights and bias, net sum, and an activation function. The perceptron model begins by multiplying all input values by their weights, then adds these values together to create the weighted sum. This weighted sum is then passed to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.

This step function or activation function plays a vital role in ensuring that the output is mapped to the required values, (0,1) or (-1,1). It is important to note that the weight of an input indicates the strength of that node. Similarly, the bias value gives the ability to shift the activation function curve left or right.

Perceptron model works in two important steps as follows:


Step-1

In the first step, multiply all input values by their corresponding weight values and then add them to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:

$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

Add a special term called bias 'b' to this weighted sum to improve the model's
performance.

$$\sum_{i=1}^{n} w_i x_i + b$$

Step-2

In the second step, an activation function is applied to the above weighted sum, which gives us an output either in binary form or as a continuous value, as follows:

$$Y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
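A worked numeric example of Steps 1 and 2, written in Python. The input values, weights, and bias are made-up numbers, and f is assumed to be a unit step that outputs 1 when the net input is positive and 0 otherwise.

```python
def perceptron_output(x, w, b):
    """Return (net input, output) for one perceptron; f is assumed to be a unit step."""
    # Step 1: weighted sum plus bias, i.e. sum(w_i * x_i) + b
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Step 2: Y = f(z)
    y = 1 if z > 0 else 0
    return z, y


z, y = perceptron_output(x=[1, 0, 1], w=[0.5, -0.75, 0.25], b=-0.5)
print(z, y)  # 0.25 1
```

Here the weighted sum is 0.5·1 + (-0.75)·0 + 0.25·1 = 0.75; adding the bias gives 0.25, and since 0.25 > 0 the output is 1.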

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as
follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single Layer Perceptron Model:

This is one of the simplest types of Artificial Neural Network (ANN). A single-layered perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to classify linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm starts without any previously learned data, so it begins with randomly allocated values for the weight parameters. It then sums up all the weighted inputs. If the total sum is more than a pre-determined threshold value, the model gets activated and shows the output value as +1.

If the output matches the expected (target) value, the performance of this model is considered satisfactory and the weights do not change. However, the model produces some discrepancies when multiple weighted input values are fed into it. Hence, to find the desired output and minimize errors, some changes to the input weights are necessary.

"Single-layer perceptron can learn only linearly separable patterns."
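The quoted limitation can be checked with a short experiment. The sketch below assumes the classic perceptron update w += lr·(t - y)·x with binary (0/1) targets, zero initial weights, a learning rate of 1, and a cap of 20 epochs; the function name `train_perceptron` is hypothetical. Training stops as soon as one full pass makes no mistakes.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule on 0/1 targets; returns (weights, bias, converged)."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if y != t:
                # Move the decision boundary toward the misclassified sample
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                b += lr * (t - y)
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training has converged
            return w, b, True
    return w, b, False


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
print("AND converged:", train_perceptron(AND)[2])  # True: AND is linearly separable
print("XOR converged:", train_perceptron(XOR)[2])  # False: XOR is not
```

AND converges after a few passes, while XOR never does, however many epochs are allowed, because no single straight line separates its two classes.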

Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model also has the
same model structure but has a greater number of hidden layers.

The multi-layer perceptron model is commonly trained with the backpropagation algorithm, which executes in two stages as follows:

o Forward Stage: In the forward stage, activations start from the input layer and terminate at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the desired output originates at the output layer and is propagated backward to the input layer.
Hence, a multi-layered perceptron model can be considered as multiple artificial neural network layers in which the activation function does not have to remain linear. Instead of a linear function, activation functions such as sigmoid, TanH, ReLU, etc., can be used.

A multi-layer perceptron model has greater processing power and can process linear
and non-linear patterns. Further, it can also implement logic gates such as AND, OR,
XOR, NAND, NOT, XNOR, NOR.
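As an illustration of the XOR claim, the sketch below wires a two-layer network of binary step units by hand. The weights and thresholds are hand-picked assumptions chosen for clarity, not learned values: one hidden unit behaves like OR, the other like AND, and the output unit fires for "OR and not AND", which is XOR.

```python
def step(v):
    """Binary step activation: 1 if the net input is positive, else 0."""
    return 1 if v > 0 else 0


def xor_mlp(x1, x2):
    """Two-layer perceptron with hand-set (illustrative) weights."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)   # hidden unit 1 acts as OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)  # hidden unit 2 acts as AND
    # Output unit: fires when OR is on but AND is off.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

The hidden layer remaps the four inputs so that the output unit only needs a single straight-line boundary, which is exactly what a lone perceptron lacks for XOR.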

Example:

Perceptron networks are single-layer feed-forward networks. These are also called single perceptron networks. The perceptron consists of an input layer, a hidden layer, and an output layer.

The input layer is connected to the hidden layer through weights, which may be inhibitory (−1), excitatory (+1), or zero (0). The activation function used is a binary step function for the input layer and the hidden layer.

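The output is
Y = f(yin)
where yin is the net input to the output unit.

The activation function, with threshold θ, is:

$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$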


The weights between the hidden layer and the output layer are updated to match the target output. The error is calculated from the actual output and the desired output.

If the output matches the target, no weight update takes place. The weights are initially set to 0 or 1 and adjusted successively until an optimal solution is found.

The weights in the network can be set to any values initially. Provided the training patterns are linearly separable, perceptron learning converges to a weight vector that gives the correct output for every training pattern, and this learning happens in a finite number of steps.

The Perceptron rule can be used for both binary and bipolar inputs.

Learning Rule for Single Output Perceptron


1) Let there be n training input vectors, where x(n) is an input vector and t(n) is its associated target value.
2) Initialize the weights and bias. Set them to zero for easy calculation.
3) Let the learning rate α be 1.
4) The input layer has an identity activation function, so x(i) = s(i).
5) Calculate the net input to the output unit:

$$y_{in} = b + \sum_{i} x_i w_i$$

6) Apply the activation function over the net input to obtain the output y = f(yin).

7) Based on the output, compare the desired target value (t) with the actual output. If y ≠ t, update the weights and bias:

$$w_i(\text{new}) = w_i(\text{old}) + \alpha\, t\, x_i, \qquad b(\text{new}) = b(\text{old}) + \alpha\, t$$

Otherwise, leave them unchanged.

8) Continue the iteration until there is no weight change. Stop once this condition is achieved.
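The steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming the three-level activation with threshold θ and a cap on the number of epochs. The function names are hypothetical. When it is run on the bipolar AND data from the example of the perceptron learning rule, it stops after two epochs with w1 = 1, w2 = 1, and bias −1.

```python
def activation(y_in, theta=0):
    """Perceptron activation: +1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1, theta=0, max_epochs=100):
    """Single-output perceptron learning rule (steps 2 to 8)."""
    n_inputs = len(samples[0][0])
    w = [0] * n_inputs  # step 2: weights start at zero
    b = 0               # step 2: bias starts at zero
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            # Step 5: net input y_in = b + sum(x_i * w_i).
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            # Step 6: apply the activation function.
            y = activation(y_in, theta)
            # Step 7: update only when the output differs from the target.
            if y != t:
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        # Step 8: stop after an epoch with no weight change.
        if not changed:
            return w, b, epoch
    raise RuntimeError("no convergence within max_epochs")


AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, epochs = train_perceptron(AND)
print(w, b, epochs)  # [1, 1] -1 2
```

Raising an error instead of silently returning after `max_epochs` makes non-separable data (such as XOR) visible rather than producing a half-trained result.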
Learning Rule for Multiple Output Perceptron
1) Let there be n training input vectors, where x(n) is an input vector and t(n) is its associated target vector.
2) Initialize the weights and bias. Set them to zero for easy calculation.
3) Let the learning rate α be 1.
4) The input layer has an identity activation function, so x(i) = s(i).
5) To calculate the output of each output unit j = 1 to m, compute its net input:

$$y_{in,j} = b_j + \sum_{i} x_i w_{ij}$$

6) Apply the activation function over the net input to obtain the output yj = f(yin,j).

7) Based on the output, compare the desired target value (tj) with the actual output (yj) and make weight adjustments. If yj ≠ tj:

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\, t_j\, x_i, \qquad b_j(\text{new}) = b_j(\text{old}) + \alpha\, t_j$$

Here wij is the weight of the connection link between the ith input and the jth output neuron, and tj is the target output for output unit j.
8) Continue the iteration until there is no weight change. Stop once this condition is achieved.
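Here is a sketch of the multiple-output version under the same assumptions: threshold θ = 0, learning rate 1, and hypothetical function names. Each output unit j has its own column of weights w[i][j] and its own bias b[j]. In this example, one unit learns AND and the other learns OR from the same bipolar inputs.

```python
def activation(y_in, theta=0):
    """Perceptron activation: +1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_multi_output(samples, alpha=1, theta=0, max_epochs=100):
    """Perceptron rule for m output units; w[i][j] links input i to output j."""
    n = len(samples[0][0])  # number of inputs
    m = len(samples[0][1])  # number of output units
    w = [[0] * m for _ in range(n)]
    b = [0] * m
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            for j in range(m):
                # Step 5: net input of output unit j.
                y_in = b[j] + sum(x[i] * w[i][j] for i in range(n))
                y = activation(y_in, theta)  # step 6
                if y != t[j]:  # step 7: adjust only unit j's weights
                    for i in range(n):
                        w[i][j] += alpha * t[j] * x[i]
                    b[j] += alpha * t[j]
                    changed = True
        if not changed:  # step 8
            return w, b, epoch
    raise RuntimeError("no convergence within max_epochs")


# Targets: (AND, OR) for each bipolar input pair.
data = [((1, 1), (1, 1)), ((1, -1), (-1, 1)),
        ((-1, 1), (-1, 1)), ((-1, -1), (-1, -1))]
w, b, epochs = train_multi_output(data)
print(w, b, epochs)  # [[1, 1], [1, 1]] [-1, 1] 2
```

The output units do not interact, so this amounts to training m independent single-output perceptrons that share the same inputs.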

Example Of Perceptron Learning Rule


Implementation of AND function using a Perceptron network for bipolar inputs and
output.

The input pattern will be x1, x2 and bias b. Let the initial weights be 0 and bias be 0.
The threshold is set to zero and the learning rate is 1.

AND Gate
X1   X2   Target
 1    1     1
 1   -1    -1
-1    1    -1
-1   -1    -1

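#1) X1 = 1, X2 = 1, and target output = 1

w1 = w2 = wb = 0 and x1 = x2 = b = 1, t = 1 (the bias is treated as a weight wb on a constant input b = 1).
Net input = yin = wb + x1·w1 + x2·w2 = 0 + 1·0 + 1·0 = 0
As the threshold is zero, the activation function gives f(0) = 0.

From here we get output = 0. Now check whether the output (y) equals the target (t).

y = 0 but t = 1, which means they are not the same; hence, the weights are updated:
w1 = 0 + 1·1·1 = 1, w2 = 0 + 1·1·1 = 1, wb = 0 + 1·1 = 1

The new weights are 1, 1, and 1 after the first input vector is presented.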
#2) X1 = 1, X2 = −1, b = 1, and target = −1, with w1 = 1, w2 = 1, wb = 1
Net input = yin = wb + x1·w1 + x2·w2 = 1 + 1·1 + (−1)·1 = 1
Since yin = 1 is greater than the threshold 0, the output is y = 1.
Therefore, again, the target −1 does not match the actual output 1, and the weights are updated:
w1 = 1 + (−1)(1) = 0, w2 = 1 + (−1)(−1) = 2, wb = 1 + (−1) = 0

Now the new weights are w1 = 0, w2 = 2, and wb = 0.

Similarly, by continuing with the next set of inputs, we get the following table (X1, X2: inputs; b: bias input; t: target; yin: net input; Y: calculated output; Δw1, Δw2, Δb: weight changes; W1, W2, wb: new weights):

X1  X2   b   t  yin   Y  Δw1  Δw2  Δb  W1  W2  wb

EPOCH 1
 1   1   1   1    0   0    1    1   1   1   1   1
 1  -1   1  -1    1   1   -1    1  -1   0   2   0
-1   1   1  -1    2   1    1   -1  -1   1   1  -1
-1  -1   1  -1   -3  -1    0    0   0   1   1  -1

EPOCH 2
 1   1   1   1    1   1    0    0   0   1   1  -1
 1  -1   1  -1   -1  -1    0    0   0   1   1  -1
-1   1   1  -1   -1  -1    0    0   0   1   1  -1
-1  -1   1  -1   -3  -1    0    0   0   1   1  -1

The epochs are the cycles of input patterns fed to the system until no weight change is required, at which point the iteration stops. In epoch 2, no weights change, so training ends with w1 = 1, w2 = 1, and wb = −1.
Hebbian Learning Algorithm

The Hebb network was proposed by Donald Hebb in 1949. According to Hebb’s rule, the weights increase in proportion to the product of the input and the output. This means that in a Hebb network, if two neurons are interconnected, the weight associated with their connection can be increased by changes in the synaptic gap. This network is suitable for bipolar data. The Hebbian learning rule is generally applied to logic gates.

The basis of the theory is that when our brains learn something new, neurons are activated and connected with other neurons, forming a neural network. These connections start off weak, but each time the stimulus is repeated, the connections grow stronger and stronger, and the action becomes more intuitive.

A good example is the act of learning to drive. When you start out, everything you
do is incredibly deliberate. You remind yourself to turn on your indicator, to check
your blind spot, and so on. However, after years of experience, these processes
become so automatic that you perform them without even thinking.

Neurons that fire together, wire together.

In the beginning, the values of all weights are set to zero. This learning rule can be used with both soft and hard activation functions. Since the desired responses of the neurons are not used in the learning process, this is an unsupervised learning rule. The absolute values of the weights grow in proportion to the learning time, which is undesirable.

The weights are updated as:

$$w(\text{new}) = w(\text{old}) + x\, y$$

that is, for each component i:

$$w_i(\text{new}) = w_i(\text{old}) + x_i\, y$$


Flowchart of Hebb training algorithm

Training Algorithm For Hebbian Learning Rule

The training steps of the algorithm are as follows:

• Initially, the weights are set to zero, i.e., wi = 0 for all inputs i = 1 to n, where n is the total number of input neurons.
• Let s be the input training vector and t its target output. The activation function for the inputs is generally set as an identity function, so xi = si.
• The activation function for the output is also set so that y = t.
• The weights and bias are adjusted as:

$$w_i(\text{new}) = w_i(\text{old}) + x_i\, y, \qquad b(\text{new}) = b(\text{old}) + y$$

• Steps 2 to 4 are repeated for each input vector and output.
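The training steps can be sketched in plain Python. This is a minimal illustration with a hypothetical function name. It assumes bipolar data and makes a single pass over the training pairs, as in the AND example that follows. The recorded history matches the "New Weights" column of that example's table.

```python
def hebb_train(samples):
    """Hebb rule, one pass: w_i += x_i * y and b += y, with y set to the target."""
    n = len(samples[0][0])
    w = [0] * n  # step 1: all weights start at zero
    b = 0        # the bias also starts at zero
    history = []
    for s, t in samples:
        x = list(s)  # identity activation on the inputs: x_i = s_i
        y = t        # output activation set to the target: y = t
        w = [wi + xi * y for wi, xi in zip(w, x)]
        b += y
        history.append((tuple(w), b))
    return w, b, history


AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, history = hebb_train(AND)
for row in history:
    print(row)
# ((1, 1), 1)
# ((0, 2), 0)
# ((1, 1), -1)
# ((2, 2), -2)
```

Unlike the perceptron rule, nothing here compares an output with the target before updating: every pattern changes the weights.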
Example Of Hebbian Learning Rule
Let us implement the logical AND function with bipolar inputs using Hebbian learning.

X1 and X2 are inputs, b is the bias input taken as 1, and the target value is the output of the logical AND operation over the inputs.

Input  Input  Bias  Target
X1     X2     b     y
 1      1     1      1
 1     -1     1     -1
-1      1     1     -1
-1     -1     1     -1

#1) Initially, the weights are set to zero and bias is also set as zero.
W1=w2=b=0

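#2) The first input vector is taken as [x1 x2 b] = [1 1 1], and the target value is 1.

The new weights will be:
w1(new) = w1(old) + x1·y = 0 + 1·1 = 1
w2(new) = w2(old) + x2·y = 0 + 1·1 = 1
b(new) = b(old) + y = 0 + 1 = 1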

#3) The above weights are the final new weights. When the second input is passed,
these become the initial weights.
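#4) Take the second input = [1 −1 1]. The target is −1. The changes are Δw1 = x1·y = −1, Δw2 = x2·y = 1, and Δb = y = −1, so the new weights are w1 = 0, w2 = 2, and b = 0.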

#5) Similarly, the other inputs and weights are calculated.


The table below shows all the inputs (X1, X2: inputs; b: bias input; y: target output; Δw1, Δw2: weight changes; Δb: bias change; W1, W2, b: new weights and bias):

X1  X2   b   y  Δw1  Δw2  Δb  W1  W2   b
 1   1   1   1    1    1   1   1   1   1
 1  -1   1  -1   -1    1  -1   0   2   0
-1   1   1  -1    1   -1  -1   1   1  -1
-1  -1   1  -1    1    1  -1   2   2  -2

Hebb Net for AND Function
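To check that the final Hebb weights (w1 = 2, w2 = 2, b = −2 from the table above) implement AND, a short sketch can evaluate the net on all four bipolar inputs. The bipolar step at threshold 0 used here is an assumption, because the notes do not state the output activation used for testing.

```python
W1, W2, B = 2, 2, -2  # final weights and bias from the Hebb training table


def hebb_and(x1, x2):
    """Bipolar step on the net input b + x1*w1 + x2*w2 (threshold 0 assumed)."""
    y_in = B + x1 * W1 + x2 * W2
    return 1 if y_in > 0 else -1


for x1, x2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(x1, x2, "->", hebb_and(x1, x2))
```

Only the input (1, 1) produces a positive net input (2). The other three give −2, −2, and −6, so the net reproduces the AND targets.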

Backpropagation Algorithm in Neural Network

Backpropagation is one of the important concepts of a neural network. Our task is to classify our data as well as possible. To do this, we have to update the weights and biases, but how can we do that in a deep neural network? In a linear regression model, we use gradient descent to optimize the parameters. Similarly, here we also use gradient descent, with the gradients computed by backpropagation.

For a single training example, the backpropagation algorithm calculates the gradient of the error function with respect to the network's weights. Backpropagation algorithms are a set of methods used to train artificial neural networks efficiently with a gradient descent approach that exploits the chain rule.
The main features of backpropagation are that it is an iterative, recursive, and efficient method for calculating the weight updates that improve the network until it is able to perform the task for which it is being trained. Backpropagation requires the derivatives of the activation functions to be known at network design time.

How Backpropagation Algorithm Works

The backpropagation algorithm computes the gradient of the loss function with respect to each weight by the chain rule. It efficiently computes the gradient one layer at a time, unlike a naive direct computation. It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.

Consider the following backpropagation neural network example diagram:

1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually selected randomly.
3. Calculate the output of every neuron, from the input layer through the hidden layers to the output layer.
4. Calculate the error in the outputs:

Error = Actual Output − Desired Output

5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.

Keep repeating the process until the desired output is achieved.


Why We Need Backpropagation?
The most prominent advantages of backpropagation are:

• Backpropagation is fast, simple, and easy to program.
• It has few parameters to tune apart from the number of inputs.
• It is a flexible method, as it does not require prior knowledge about the network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be learned.

Types of Backpropagation Networks


The two types of backpropagation networks are:

• Static backpropagation
• Recurrent backpropagation

1. Static backpropagation:
It is a kind of backpropagation network that produces a mapping from a static input to a static output. It is useful for solving static classification problems such as optical character recognition.

2. Recurrent backpropagation:
In recurrent backpropagation, the activations are fed forward until a fixed value is achieved. After that, the error is computed and propagated backward.

The main difference between the two methods is that the mapping is rapid in static backpropagation, while it is non-static in recurrent backpropagation.

Now, how is the error function used in backpropagation, and how does backpropagation work? Let us start with an example and work through it mathematically to understand exactly how backpropagation updates the weights.
Input values

X1=0.05
X2=0.10

Initial weight

W1=0.15 w5=0.40
W2=0.20 w6=0.45
W3=0.25 w7=0.50
W4=0.30 w8=0.55

Bias Values

b1=0.35 b2=0.60

Target Values

T1=0.01
T2=0.99

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass

To find the value of H1, we first multiply the input values by the corresponding weights and add the bias:

H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775
To calculate the final output of H1, we apply the sigmoid function:

$$H1_{final} = \frac{1}{1 + e^{-H1}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992$$

We will calculate the value of H2 in the same way as H1

H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925

To calculate the final output of H2, we apply the sigmoid function:

$$H2_{final} = \frac{1}{1 + e^{-H2}} = \frac{1}{1 + e^{-0.3925}} = 0.596884378$$
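The hidden-layer arithmetic above can be reproduced with a few lines of standard-library Python. The variable names mirror the notes. The `sigmoid` helper is an illustrative addition.

```python
import math


def sigmoid(z):
    """Logistic activation 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1 = 0.35

H1 = x1 * w1 + x2 * w2 + b1  # net input of hidden neuron 1
H2 = x1 * w3 + x2 * w4 + b1  # net input of hidden neuron 2
H1_final = sigmoid(H1)
H2_final = sigmoid(H2)
print(f"{H1:.4f} {H1_final:.6f}")  # 0.3775 0.593270
print(f"{H2:.4f} {H2_final:.6f}")  # 0.3925 0.596884
```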

Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.

To find the value of y1, we multiply the outputs of H1 and H2 by the corresponding weights and add the bias:

y1=H1×w5+H2×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597

To calculate the final output of y1, we apply the sigmoid function:

$$y1_{final} = \frac{1}{1 + e^{-y1}} = \frac{1}{1 + e^{-1.10590597}} = 0.75136507$$


We will calculate the value of y2 in the same way as y1

y2=H1×w7+H2×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214

To calculate the final output of y2, we apply the sigmoid function:

$$y2_{final} = \frac{1}{1 + e^{-y2}} = \frac{1}{1 + e^{-1.2249214}} = 0.772928465$$

Our target values are 0.01 and 0.99. The outputs y1final and y2final do not match the target values T1 and T2.

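Now, we find the total error, which is half the sum of the squared differences between the target outputs and the actual outputs. The total error is calculated as

$$E_{total} = \sum \tfrac{1}{2}(\text{target} - \text{output})^2 = \tfrac{1}{2}(T1 - y1_{final})^2 + \tfrac{1}{2}(T2 - y2_{final})^2$$

So, the total error is

$$E_{total} = \tfrac{1}{2}(0.01 - 0.75136507)^2 + \tfrac{1}{2}(0.99 - 0.772928465)^2 = 0.274811083 + 0.023560026 = 0.298371109$$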
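The complete forward pass and the total error can be checked with a short standard-library sketch. The variable names follow the notes, and the `sigmoid` helper is illustrative.

```python
import math


def sigmoid(z):
    """Logistic activation 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
T1, T2 = 0.01, 0.99

# Forward pass through the hidden layer.
H1_final = sigmoid(x1 * w1 + x2 * w2 + b1)
H2_final = sigmoid(x1 * w3 + x2 * w4 + b1)
# Forward pass through the output layer.
y1_final = sigmoid(H1_final * w5 + H2_final * w6 + b2)
y2_final = sigmoid(H1_final * w7 + H2_final * w8 + b2)

# Total error: half the sum of squared differences.
E_total = 0.5 * (T1 - y1_final) ** 2 + 0.5 * (T2 - y2_final) ** 2
print(f"{y1_final:.6f} {y2_final:.6f} {E_total:.6f}")  # 0.751365 0.772928 0.298371
```

The factor ½ is a convention that cancels the 2 produced when the squared term is differentiated in the backward pass.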


Now, we will backpropagate this error to update the weights using a backward pass.

Backward pass at the output layer

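To update a weight, we calculate how much it contributes to the total error. The error gradient for a weight w is obtained by differentiating the total error with respect to w:

$$\frac{\partial E_{total}}{\partial w}$$

We perform the backward pass starting with the last weight, w5. Etotal does not contain w5 directly, so we cannot differentiate it with respect to w5 in one step. Instead, we split the derivative into multiple terms with the chain rule:

$$\frac{\partial E_{total}}{\partial w5} = \frac{\partial E_{total}}{\partial y1_{final}} \times \frac{\partial y1_{final}}{\partial y1} \times \frac{\partial y1}{\partial w5}$$

Now, we calculate each term one by one:

$$\frac{\partial E_{total}}{\partial y1_{final}} = -(T1 - y1_{final}) = -(0.01 - 0.75136507) = 0.74136507$$

$$\frac{\partial y1_{final}}{\partial y1} = y1_{final}(1 - y1_{final}) = 0.75136507 \times (1 - 0.75136507) = 0.186815602$$

$$\frac{\partial y1}{\partial w5} = H1_{final} = 0.593269992$$

Multiplying the three terms gives the final result:

$$\frac{\partial E_{total}}{\partial w5} = 0.74136507 \times 0.186815602 \times 0.593269992 = 0.082167041$$

Now, we calculate the updated weight w5new with the gradient descent update rule, where η is the learning rate (η = 0.5 is the value consistent with the results in this example):

$$w5_{new} = w5 - \eta \frac{\partial E_{total}}{\partial w5} = 0.40 - 0.5 \times 0.082167041 = 0.35891648$$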

In the same way, we calculate w6new,w7new, and w8new and this will give us the following
values

w5new=0.35891648
w6new=0.408666186
w7new=0.511301270
w8new=0.561370121
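The output-layer updates can be sketched in plain Python. The learning rate η = 0.5 is an assumption inferred from the listed results, because the update formula is missing from the notes. The variable names are illustrative. Each gradient is the product of three chain-rule terms: (output − target), the sigmoid derivative output·(1 − output), and the hidden output carried by that weight.

```python
import math


def sigmoid(z):
    """Logistic activation 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


eta = 0.5  # learning rate (assumed; reproduces the values in the notes)
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
T1, T2 = 0.01, 0.99

# Forward pass.
H1_final = sigmoid(x1 * w1 + x2 * w2 + b1)
H2_final = sigmoid(x1 * w3 + x2 * w4 + b1)
y1_final = sigmoid(H1_final * w5 + H2_final * w6 + b2)
y2_final = sigmoid(H1_final * w7 + H2_final * w8 + b2)

# dE/d(net input) of each output neuron: (output - target) * sigmoid derivative.
delta_y1 = (y1_final - T1) * y1_final * (1 - y1_final)
delta_y2 = (y2_final - T2) * y2_final * (1 - y2_final)

# Each weight's gradient is the delta times the hidden output it carries.
w5_new = w5 - eta * delta_y1 * H1_final
w6_new = w6 - eta * delta_y1 * H2_final
w7_new = w7 - eta * delta_y2 * H1_final
w8_new = w8 - eta * delta_y2 * H2_final
print(f"{w5_new:.6f} {w6_new:.6f} {w7_new:.6f} {w8_new:.6f}")
# 0.358916 0.408666 0.511301 0.561370
```

Grouping the first two chain-rule factors into one "delta" per output neuron is a bookkeeping choice. The hidden-layer pass reuses these deltas.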

Backward pass at Hidden layer

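Now, we backpropagate to the hidden layer and update the weights w1, w2, w3, and w4, as we did for w5, w6, w7, and w8.

We start by calculating the error gradient for w1. Etotal does not contain w1 directly, so we split the derivative into multiple terms with the chain rule:

$$\frac{\partial E_{total}}{\partial w1} = \frac{\partial E_{total}}{\partial H1_{final}} \times \frac{\partial H1_{final}}{\partial H1} \times \frac{\partial H1}{\partial w1}$$

H1final affects both outputs. Writing Etotal = E1 + E2, where E1 = ½(T1 − y1final)² and E2 = ½(T2 − y2final)², the first term splits into two parts:

$$\frac{\partial E_{total}}{\partial H1_{final}} = \frac{\partial E_1}{\partial H1_{final}} + \frac{\partial E_2}{\partial H1_{final}}$$

Neither E1 nor E2 contains H1final directly, so each part is split again through y1 and y2:

$$\frac{\partial E_1}{\partial H1_{final}} = \frac{\partial E_1}{\partial y1_{final}} \times \frac{\partial y1_{final}}{\partial y1} \times \frac{\partial y1}{\partial H1_{final}} = 0.74136507 \times 0.186815602 \times 0.40 = 0.055399425$$

$$\frac{\partial E_2}{\partial H1_{final}} = \frac{\partial E_2}{\partial y2_{final}} \times \frac{\partial y2_{final}}{\partial y2} \times \frac{\partial y2}{\partial H1_{final}} = -0.217071535 \times 0.175510053 \times 0.50 = -0.019049119$$

Here ∂y1/∂H1final = w5 = 0.40 and ∂y2/∂H1final = w7 = 0.50 use the original (not yet updated) weights. Adding the two parts:

$$\frac{\partial E_{total}}{\partial H1_{final}} = 0.055399425 - 0.019049119 = 0.036350306$$

The remaining two terms are:

$$\frac{\partial H1_{final}}{\partial H1} = H1_{final}(1 - H1_{final}) = 0.593269992 \times (1 - 0.593269992) = 0.241300709$$

$$\frac{\partial H1}{\partial w1} = x1 = 0.05$$

Putting these values together gives the final result:

$$\frac{\partial E_{total}}{\partial w1} = 0.036350306 \times 0.241300709 \times 0.05 = 0.000438568$$

Now, we calculate the updated weight w1new with the same update rule:

$$w1_{new} = w1 - \eta \frac{\partial E_{total}}{\partial w1} = 0.15 - 0.5 \times 0.000438568 = 0.149780716$$

In the same way, we calculate w2new, w3new, and w4new, which gives the following values: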

w1new=0.149780716
w2new=0.19956143
w3new=0.24975114
w4new=0.29950229
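The hidden-layer updates can be sketched the same way. This standard-library illustration assumes η = 0.5 and repeats the forward pass and output deltas so that it runs on its own. The point to notice is that the error reaching each hidden neuron is computed with the original w5 to w8, not the freshly updated ones.

```python
import math


def sigmoid(z):
    """Logistic activation 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


eta = 0.5  # learning rate (assumed)
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
T1, T2 = 0.01, 0.99

# Forward pass.
H1_final = sigmoid(x1 * w1 + x2 * w2 + b1)
H2_final = sigmoid(x1 * w3 + x2 * w4 + b1)
y1_final = sigmoid(H1_final * w5 + H2_final * w6 + b2)
y2_final = sigmoid(H1_final * w7 + H2_final * w8 + b2)

# Output-neuron deltas, as in the output-layer pass.
delta_y1 = (y1_final - T1) * y1_final * (1 - y1_final)
delta_y2 = (y2_final - T2) * y2_final * (1 - y2_final)

# Error reaching each hidden output, using the ORIGINAL w5..w8.
dE_dH1 = delta_y1 * w5 + delta_y2 * w7
dE_dH2 = delta_y1 * w6 + delta_y2 * w8
# Multiply by each hidden neuron's sigmoid derivative.
delta_h1 = dE_dH1 * H1_final * (1 - H1_final)
delta_h2 = dE_dH2 * H2_final * (1 - H2_final)

# Gradient of each input-side weight is the hidden delta times its input.
w1_new = w1 - eta * delta_h1 * x1
w2_new = w2 - eta * delta_h1 * x2
w3_new = w3 - eta * delta_h2 * x1
w4_new = w4 - eta * delta_h2 * x2
print(f"{w1_new:.6f} {w2_new:.6f} {w3_new:.6f} {w4_new:.6f}")
# 0.149781 0.199561 0.249751 0.299502
```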

We have now updated all the weights. When we originally fed forward the inputs 0.05 and 0.1, the total error of the network was 0.298371109. After the first round of backpropagation, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point, when we feed forward 0.05 and 0.1, the output neurons generate 0.015912196 and 0.984065734, i.e., values close to our target values of 0.01 and 0.99.
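The whole procedure can be repeated in a loop. This sketch assumes η = 0.5, updates all eight weights simultaneously from one forward pass, and keeps the biases b1 and b2 fixed. Under those assumptions, it is intended to follow the figures quoted above, but treat it as an illustration rather than the notes' own code. The function names `forward`, `total_error`, and `train_step` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b1, b2):
    """Return hidden and output activations of the 2-2-2 network."""
    h1 = sigmoid(x[0] * w[0] + x[1] * w[1] + b1)
    h2 = sigmoid(x[0] * w[2] + x[1] * w[3] + b1)
    y1 = sigmoid(h1 * w[4] + h2 * w[5] + b2)
    y2 = sigmoid(h1 * w[6] + h2 * w[7] + b2)
    return h1, h2, y1, y2


def total_error(y1, y2, t):
    """Half the sum of squared differences between targets and outputs."""
    return 0.5 * (t[0] - y1) ** 2 + 0.5 * (t[1] - y2) ** 2


def train_step(x, w, t, b1, b2, eta=0.5):
    """One backpropagation step; all gradients use the current (old) weights."""
    h1, h2, y1, y2 = forward(x, w, b1, b2)
    d1 = (y1 - t[0]) * y1 * (1 - y1)  # output-neuron deltas
    d2 = (y2 - t[1]) * y2 * (1 - y2)
    dh1 = (d1 * w[4] + d2 * w[6]) * h1 * (1 - h1)  # hidden-neuron deltas
    dh2 = (d1 * w[5] + d2 * w[7]) * h2 * (1 - h2)
    grads = [dh1 * x[0], dh1 * x[1], dh2 * x[0], dh2 * x[1],  # w1..w4
             d1 * h1, d1 * h2, d2 * h1, d2 * h2]              # w5..w8
    return [wi - eta * g for wi, g in zip(w, grads)]


x, t = (0.05, 0.10), (0.01, 0.99)
b1, b2 = 0.35, 0.60  # biases held fixed in this sketch
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]

errors = []
for _ in range(10000):
    w = train_step(x, w, t, b1, b2)
    _, _, y1, y2 = forward(x, w, b1, b2)
    errors.append(total_error(y1, y2, t))

print(f"after 1 step: {errors[0]:.9f}, after 10,000: {errors[-1]:.10f}")
print(f"outputs: {y1:.9f} {y2:.9f}")
```

Updating the biases as well would be a natural extension. It would change the exact numbers, so this sketch leaves them fixed.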
Associative Memory Network
These neural networks work on the basis of pattern association: they can store different patterns, and when producing an output, they return one of the stored patterns by matching it with the given input pattern. These types of memories are also called Content-Addressable Memory (CAM). Associative memory performs a parallel search over the stored patterns, treating them as data files.

A content-addressable memory structure is a kind of memory structure that enables the recollection of data based on the degree of similarity between the input pattern and the patterns stored in the memory.

This type of memory is therefore robust and fault-tolerant, and it has some form of error-correction capability.

An associative memory is accessed by its content, as opposed to an explicit address as in a traditional computer memory system. The memory enables the recollection of information based on incomplete knowledge of its contents.

Working of Associative Memory:

Associative memory is a repository of associated patterns stored in some form. If the repository is triggered with a pattern, the associated pattern pair appears at the output. The input could be an exact or partial representation of a stored pattern.

If the memory is presented with an input pattern, say α, the associated pattern ω is recovered automatically.
There are two types of associative memory:

• Auto-associative memory
• Hetero-associative memory

Auto-associative memory:

An auto-associative memory recovers a previously stored pattern that most closely relates to the current pattern. It is also known as an auto-associative correlator.

This is a single layer neural network in which the input training vector and the output
target vectors are the same. The weights are determined so that the network stores a
set of patterns.

Architecture

As shown in the following figure, the architecture of the auto-associative memory network has ‘n’ input training vectors and the same number, ‘n’, of output target vectors.
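The notes do not give the weight rule for this network. The following sketch assumes a common choice, the Hebbian outer-product rule W = Σ sᵀs over the stored bipolar patterns, with the diagonal set to zero. Recall applies a bipolar sign function to x·W, with ties mapped to −1 as a further assumption. The sketch stores one pattern and recovers it from a copy with one flipped component. The function names are hypothetical.

```python
def train_auto(patterns):
    """Outer-product (Hebbian) weights: w[i][j] = sum of s_i * s_j, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += s[i] * s[j]
    return w


def recall(w, x):
    """One-step recall: y_j = sign(sum_i x_i * w[i][j]), ties mapped to -1."""
    n = len(x)
    y_in = [sum(x[i] * w[i][j] for i in range(n)) for j in range(n)]
    return [1 if v > 0 else -1 for v in y_in]


stored = [1, 1, -1, -1]
w = train_auto([stored])
noisy = [-1, 1, -1, -1]  # first component flipped
print(recall(w, noisy))  # [1, 1, -1, -1]
```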
Hetero Associative memory

Similar to Auto Associative Memory network, this is also a single layer neural
network. However, in this network the input training vector and the output target
vectors are not the same. The weights are determined so that the network stores a
set of patterns. Hetero associative network is static in nature, hence, there would be
no non-linear and delay operations.

In a hetero-associative memory, the recovered pattern is generally different from the input pattern not only in type and format but also in content. It is also known as a hetero-associative correlator.

Architecture

As shown in the following figure, the architecture of the hetero-associative memory network has ‘n’ input training vectors and ‘m’ output target vectors.
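The following sketch also assumes the Hebbian outer-product rule, this time w[i][j] = Σ si·tj over the stored input/output pairs, because the notes do not state the weight rule. The input vectors have n = 4 components and the targets have m = 2. The two stored inputs are chosen to be orthogonal so that recall is clean. The names and data are illustrative.

```python
def train_hetero(pairs):
    """Outer-product weights: w[i][j] = sum over pairs of s_i * t_j (n x m)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for s, t in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += s[i] * t[j]
    return w


def recall(w, x):
    """y_j = sign(sum_i x_i * w[i][j]), ties mapped to -1."""
    m = len(w[0])
    y_in = [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(m)]
    return [1 if v > 0 else -1 for v in y_in]


pairs = [([1, 1, -1, -1], [1, -1]),
         ([1, -1, 1, -1], [-1, 1])]
w = train_hetero(pairs)
for s, t in pairs:
    print(s, "->", recall(w, s))
```

Because the stored inputs are orthogonal, presenting one of them cancels the other pair's contribution to the net input. Correlated inputs would interfere with each other.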
Hopfield Networks


A Hopfield network is a special kind of neural network whose response differs from that of other neural networks: it is computed by an iterative process that converges to a stable state. It has just one layer of neurons, whose size matches the size of the input and output, which must be the same. For example, to make such a network recognize digits, we first present a list of correctly rendered digits to the network. Afterward, the network can transform a noisy input into the corresponding perfect output.

In 1982, John Hopfield introduced an artificial neural network to store and retrieve memories like the human brain. Here, a neuron is either on or off. The state of a neuron (on = +1 or off = 0) is updated based on the input it receives from the other neurons. A Hopfield network is first trained to store various patterns or memories. Afterward, it can recognize any of the learned patterns when given partial or even corrupted data about that pattern, i.e., it eventually settles down and restores the closest stored pattern. Thus, similar to the human brain, the Hopfield model has stability in pattern recognition.

A Hopfield network is a single-layer, recurrent network in which the neurons are fully connected, i.e., each neuron is connected to every other neuron. If there are two neurons i and j, the connection weight wij between them is symmetric, wij = wji, and there is no self-connection, wii = 0.

The Hopfield network is commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

A discrete Hopfield network operates in a discrete fashion; in other words, its input and output patterns are discrete vectors, which can be either binary {0, 1} or bipolar {+1, −1} in nature. The network has symmetric weights with no self-connections, i.e., wij = wji and wii = 0.

Architecture

The following are some important points to keep in mind about the discrete Hopfield network:

• This model consists of neurons with one inverting and one non-inverting output.
• The output of each neuron should be an input to the other neurons but not to itself.
• The weight (connection strength) is represented by wij.
• Connections can be excitatory as well as inhibitory. A connection is excitatory if the output of the neuron is the same as the input; otherwise, it is inhibitory.
• Weights should be symmetric, i.e., wij = wji.

The outputs from Y1 going to Y2, Yi, and Yn have the weights w12, w1i, and w1n, respectively. Similarly, the other arcs carry their own weights.

Training Algorithm

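During training of the discrete Hopfield network, the weights are updated. The input vectors can be either binary or bipolar, so the weight update is given for both cases. For a set of P stored patterns s(p) = (s1(p), ..., sn(p)), p = 1 to P:

Case 1, binary input patterns:

$$w_{ij} = \sum_{p=1}^{P} [2s_i(p) - 1][2s_j(p) - 1] \quad \text{for } i \ne j$$

Case 2, bipolar input patterns:

$$w_{ij} = \sum_{p=1}^{P} s_i(p)\, s_j(p) \quad \text{for } i \ne j$$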
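The two storage rules can be sketched directly in plain Python with hypothetical function names. The sketch shows that storing the binary pattern (1, 1, 1, 0) gives the same weight matrix as storing its bipolar form (1, 1, 1, −1), because 2s − 1 maps binary values to bipolar ones.

```python
def hopfield_weights_bipolar(patterns):
    """w_ij = sum_p s_i(p) * s_j(p) for i != j; w_ii = 0."""
    n = len(patterns[0])
    return [[0 if i == j else sum(s[i] * s[j] for s in patterns)
             for j in range(n)] for i in range(n)]


def hopfield_weights_binary(patterns):
    """w_ij = sum_p (2 s_i(p) - 1)(2 s_j(p) - 1) for i != j; w_ii = 0."""
    n = len(patterns[0])
    return [[0 if i == j else sum((2 * s[i] - 1) * (2 * s[j] - 1) for s in patterns)
             for j in range(n)] for i in range(n)]


W_bin = hopfield_weights_binary([[1, 1, 1, 0]])
W_bip = hopfield_weights_bipolar([[1, 1, 1, -1]])
for row in W_bin:
    print(row)
print(W_bin == W_bip)  # True
```

Both functions produce a symmetric matrix with a zero diagonal, matching the wij = wji and wii = 0 conditions stated earlier.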
Energy Function Evaluation

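An energy function is defined as a bounded, non-increasing function of the state of the system.

The energy function Ef, also called the Lyapunov function, determines the stability of the discrete Hopfield network and is characterized as follows:

$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i\, y_j\, w_{ij} - \sum_{i=1}^{n} x_i\, y_i + \sum_{i=1}^{n} \theta_i\, y_i$$

where yi is the output of neuron i, xi is its external input, and θi is its threshold.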
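The sketch below runs asynchronous recall on a small bipolar Hopfield network (weights wij = Σp si(p) sj(p), wii = 0) and evaluates Ef after each neuron update. It assumes a common discrete-Hopfield update, yin,i = xi + Σj yj wji, where yi becomes +1 above θi, becomes −1 below it, and is left unchanged at equality, with θi = 0. These choices are assumptions, not taken from the notes. When the network starts from a corrupted copy of the stored pattern, the energy never increases and the state settles on the stored pattern.

```python
def hopfield_weights(patterns):
    """Bipolar storage rule: w_ij = sum_p s_i(p) s_j(p) for i != j; w_ii = 0."""
    n = len(patterns[0])
    return [[0 if i == j else sum(s[i] * s[j] for s in patterns)
             for j in range(n)] for i in range(n)]


def energy(y, x, w, theta):
    """Ef = -1/2 sum_ij y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i."""
    n = len(y)
    quad = sum(y[i] * y[j] * w[i][j] for i in range(n) for j in range(n))
    ext = sum(xi * yi for xi, yi in zip(x, y))
    thr = sum(ti * yi for ti, yi in zip(theta, y))
    return -0.5 * quad - ext + thr


def recall(w, x, theta, max_sweeps=10):
    """Asynchronous updates in index order; returns final state and energy trace."""
    y = list(x)  # start from the test input
    energies = [energy(y, x, w, theta)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            y_in = x[i] + sum(y[j] * w[j][i] for j in range(len(y)))
            if y_in > theta[i]:
                new = 1
            elif y_in < theta[i]:
                new = -1
            else:
                new = y[i]  # keep the current state on a tie
            if new != y[i]:
                y[i] = new
                changed = True
            energies.append(energy(y, x, w, theta))
        if not changed:  # a full sweep without change: stable state reached
            break
    return y, energies


stored = [1, 1, 1, -1]
w = hopfield_weights([stored])
x = [-1, 1, 1, -1]  # corrupted copy: first component flipped
y, energies = recall(w, x, theta=[0, 0, 0, 0])
print(y)  # [1, 1, 1, -1]
print(energies)
```

Updating one neuron at a time is what guarantees the non-increasing energy for symmetric weights with a zero diagonal. Updating all neurons simultaneously can oscillate instead.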
