Unit 3
Neurons
Scientists agree that our brain has around 100 billion neurons.
ANN learning is robust to errors in the training data and has been successfully
applied to learning real-valued, discrete-valued, and vector-valued functions
for problems such as interpreting visual scenes, speech recognition, and
learning robot control strategies. The study of artificial neural networks (ANNs) has
been inspired in part by the observation that biological learning systems are built
of very complex webs of interconnected neurons in brains. The human brain
contains a densely interconnected network of approximately 10^11-10^12
neurons, each connected, on average, to 10^4-10^5 other neurons. Even so, the
human brain takes approximately 10^-1 seconds to make surprisingly complex
decisions. ANN systems are motivated by the goal of capturing this kind of
highly parallel computation based on distributed representations. Generally, ANNs
are built out of a densely interconnected set of simple units, where each unit takes
a number of real-valued inputs and produces a single real-valued output.
However, ANNs are only loosely motivated by biological neural systems; there are
many complexities of biological neural systems that are not modeled by ANNs. Some
of these are shown in the figures.
Difference between Biological Neurons and Artificial Neurons
Biological Neurons: Researchers are still trying to find out how the brain actually learns.
Artificial Neurons: ANNs use Gradient Descent for learning.
The history of neural networks arguably began in the late 1800s with scientific
endeavors to study the activity of the human brain. In 1890, William James published
the first work about brain activity patterns. In 1943, McCulloch and Pitts created a
model of the neuron that is still used today in artificial neural networks. This model
is segmented into two parts.
In 1951, Marvin Minsky made the first Artificial Neural Network (ANN) while
working at Princeton.
In 1958, "The Computer and the Brain" was published, a year after John von
Neumann's death. In that book, von Neumann proposed numerous extreme changes
to how analysts had been modeling the brain.
Perceptron:
Despite the early accomplishments of the perceptron and artificial neural network
research, there were many individuals who felt that there was limited promise in
these methods. Among these were Marvin Minsky and Seymour Papert, whose 1969
book Perceptrons was used to discredit ANN research and focus attention on the
apparent constraints of ANN work. One of the limitations that Minsky and Papert
highlighted was the fact that the Perceptron was not capable of distinguishing
patterns that are not linearly separable in input space with a linear classifier.
Regardless of the failure of the Perceptron to deal with non-linearly separable
data, this was not an inherent failure of the technology, but a matter of scale.
Hecht-Nielsen showed a two-layer perceptron in 1990, a three-layer machine
that was equipped for tackling non-linear separation problems. Perceptrons
introduced what some call the "quiet years," when interest in ANN research was
at a minimum.
In 1987, the IEEE annual international ANN conference began for ANN scientists.
Also in 1987, the International Neural Network Society (INNS) was formed, along with
the INNS Neural Networks journal in 1988.
The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are
known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output
There are around 1000 billion neurons in the human brain. Each neuron has
between 1,000 and 100,000 association points. In the human brain, data is stored
in a distributed manner, and we can extract more than one piece of this data when
necessary from our memory in parallel. We can say that the human brain is made
up of incredibly amazing parallel processors.
We can understand the artificial neural network with an example. Consider a digital
logic gate that takes an input and gives an output, such as an "OR" gate, which takes
two inputs. If one or both of the inputs are "On," then the output is "On." If both
inputs are "Off," then the output is "Off." Here the output depends only on the input.
Our brain does not perform the same task: the relationship between outputs and
inputs keeps changing, because the neurons in our brain are "learning."
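The fixed input-output behavior of the OR gate described above can be written as a tiny sketch. The single-neuron version with hand-picked weights is an illustrative assumption, not part of the original text:

```python
def or_gate(a, b):
    """Fixed logic: output is "On" (1) if one or both inputs are "On"."""
    return 1 if (a == 1 or b == 1) else 0

def or_neuron(a, b):
    """The same gate as a single neuron: weighted sum plus bias, then a step.
    The weights (1, 1) and bias (-0.5) are hand-picked for illustration."""
    net = 1 * a + 1 * b - 0.5
    return 1 if net > 0 else 0

# the two implementations agree on all four input combinations
for a in (0, 1):
    for b in (0, 1):
        assert or_gate(a, b) == or_neuron(a, b)
```

Unlike the fixed gate, the neuron's behavior would change if its weights were adjusted by learning, which is the point of the comparison in the text.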
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer presents in-between input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs
and includes a bias. This computation is represented in the form of a transfer function.
Data is stored on the whole network, not in a database as in traditional
programming. The disappearance of a couple of pieces of data in one place does not
prevent the network from working.
After ANN training, the network may produce output even with inadequate data.
The loss of performance here depends on the significance of the missing data.
Corruption of one or more cells of an ANN does not prevent it from generating
output, and this feature makes the network fault-tolerant.
This is the most significant issue with ANNs. When an ANN produces a solution, it
does not provide insight concerning why and how it was reached. This decreases
trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in
accordance with their structure. Therefore, the realization of the network depends
on suitable hardware.
Difficulty of showing the issue to the network:
ANNs can work only with numerical data. Problems must be converted into numerical
values before being introduced to the ANN. The representation mechanism chosen
here will directly impact the performance of the network, and it relies on the user's
abilities.
The network is trained until the error is reduced to a certain value, and this value
does not guarantee optimum results.
An Artificial Neural Network can be best represented as a weighted directed graph,
where the artificial neurons form the nodes. The associations between the neurons'
outputs and the neurons' inputs can be viewed as directed edges with weights. The
Artificial Neural Network receives the input signal from the external source in the
form of a pattern or image, represented as a vector. These inputs are then
mathematically assigned the notation x(n) for every n-th input.
Afterward, each input is multiplied by its corresponding weight (these weights are
the details utilized by the artificial neural network to solve a specific problem). In
general terms, these weights represent the strength of the interconnection between
neurons inside the artificial neural network. All the weighted inputs are summed
inside the computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or
otherwise to scale up the system's response. The bias can be seen as an extra input
fixed at 1 with its own weight. The total of the weighted inputs can lie anywhere
from 0 to positive infinity, so to keep the response within the limits of the desired
value, a certain maximum value is benchmarked, and the total of the weighted
inputs is passed through an activation function.
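The computation just described (weighted sum of the inputs, plus a bias, passed through an activation function that keeps the response within limits) can be sketched as a single unit. The sigmoid is used here as one common choice of activation:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias,
    squashed by a sigmoid so the response stays in (0, 1)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))

# With all-zero weights the weighted sum is just the bias (0), and sigmoid(0) = 0.5
print(neuron([0.5, 0.9], [0.0, 0.0], 0.0))  # 0.5
```

Note how the bias enters the sum exactly like an input fixed at 1 with its own weight.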
An artificial neural network is characterized by:
Interconnections
Activation functions
Learning rules
Interconnections:
There are various types of Artificial Neural Networks (ANN), which, depending upon
the human brain's neuron and network functions, perform tasks in a similar way.
The majority of artificial neural networks have some similarities with their more
complex biological counterparts and are very effective at their expected tasks, for
example, segmentation or classification.
Feedback ANN:
In this type of ANN, the output returns into the network to accomplish the best-
evolved results internally. As per the University of Massachusetts Lowell Centre for
Atmospheric Research, feedback networks feed information back into themselves
and are well suited to solving optimization issues. Feedback ANNs are used for
internal system error corrections.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an
output layer, and at least one layer of neurons. Through assessment of its output by
reviewing its input, the intensity of the network can be noticed based on the group
behavior of the associated neurons, and the output is decided. The primary
advantage of this network is that it figures out how to evaluate and recognize input
patterns.
This network also has a hidden layer that is internal to the network and has no direct
contact with the external layer. The existence of one or more hidden layers makes
the network computationally stronger. It is called a feed-forward network because
information flows forward from the input, through the intermediate computations,
to determine the output Z. There are no feedback connections in which outputs of
the model are fed back into it.
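A minimal sketch of such a feed-forward pass (input, then a hidden layer, then the output, with no feedback connections) is shown below; the layer sizes and weight values are arbitrary illustrative choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Compute one layer: each unit takes a weighted sum of all inputs plus
    its bias, then applies the activation. Information only flows forward."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -0.2]                                            # input layer
hidden = layer(x, [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1])   # hidden layer
z = layer(hidden, [[0.7, -0.5]], [0.05])                   # output Z
print(z)  # a single output value in (0, 1)
```

Because no output is ever fed back into an earlier layer, a single left-to-right pass fully determines Z.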
3. Single node with its own feedback
Single Node with own Feedback
When outputs can be directed back as inputs to the same layer or preceding layer
nodes, then it results in feedback networks. Recurrent networks are feedback
networks with closed loops. The above figure shows a single recurrent network
having a single neuron with feedback to itself.
4. Single-layer recurrent network
Activation Function
Definition
In artificial neural networks, an activation function is one that outputs a smaller
value for small inputs and a larger value if its inputs exceed a threshold. An
activation function "fires" if the inputs are big enough; otherwise, nothing happens.
In other words, an activation function is a gate that verifies whether an incoming
value is higher than a threshold value.
Activation functions are helpful because they introduce non-linearities into neural
networks, enabling them to learn powerful operations. If the activation functions
were removed, a feedforward neural network could be refactored into a
straightforward linear function or matrix transformation of its input.
By generating a weighted total and then adding a bias to it, the activation function
determines whether a neuron should be activated. The activation function's purpose
is to introduce nonlinearity into a neuron's output.
Explanation: As we are aware, neurons in neural networks operate in accordance
with their weights, biases, and corresponding activation functions. Based on the
error, the weight and bias values inside a neural network are modified. This process
is known as back-propagation. Activation functions make back-propagation possible,
since they provide the gradients, along with the error, required to update the biases
and weights.
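Common activation functions, together with the derivatives that back-propagation uses to form its gradients, can be sketched as:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)          # gradient used by back-propagation

def tanh_deriv(z):
    return 1 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)          # the non-linearity: without one of these,
                                # the network collapses into a linear map

print(sigmoid(0), sigmoid_deriv(0))  # 0.5 0.25
```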
Introduction
In Machine Learning, binary classifiers are defined as functions that help in deciding
whether input data, represented as a vector of numbers, belongs to some specific
class.
Frank Rosenblatt invented the perceptron model as a binary classifier, which
contains three main components. These are as follows:
o Input Nodes or Input Layer:
This is the primary component of Perceptron which accepts the initial data into the
system for further processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. This
is another most important parameter of the Perceptron's components. Weight is
directly proportional to the strength of the associated input neuron in deciding the
output. Further, the bias can be considered as the intercept in a linear equation.
o Activation Function:
This is the final and important component that helps to determine whether the
neuron will fire or not. The activation function can be considered primarily as a step
function. Common types are:
o Sign function
o Step function, and
o Sigmoid function
The data scientist uses the activation function to make a subjective decision based
on various problem statements and to form the desired outputs. The activation
function chosen (e.g., Sign, Step, or Sigmoid) may differ between perceptron models,
depending on whether the learning process is slow or suffers from vanishing or
exploding gradients.
This step function or Activation function plays a vital role in ensuring that output is
mapped between required values (0,1) or (-1,1). It is important to note that the weight
of input is indicative of the strength of a node. Similarly, an input's bias value gives the
ability to shift the activation function curve up or down.
Step-1
In the first step, multiply all input values by their corresponding weight values and
then add them up to determine the weighted sum. Add a special term called bias 'b'
to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function f is applied to this weighted sum, which
produces the output:
Y = f(∑wi*xi + b)
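The two steps (weighted sum plus bias, then the activation function f) can be written directly; a unit step is used for f here, and the weight and bias values are illustrative:

```python
def perceptron_output(x, w, b):
    # Step-1: weighted sum of the inputs plus the bias
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Step-2: apply the activation function f (a unit step here)
    return 1 if weighted_sum >= 0 else 0

# Example with weights 0.5, 0.5 and bias -0.7 (illustrative values)
print(perceptron_output([1, 1], [0.5, 0.5], -0.7))  # 1
print(perceptron_output([1, 0], [0.5, 0.5], -0.7))  # 0
```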
Based on the layers, Perceptron models are divided into two types. These are as
follows:
Single-Layer Perceptron Model:
This is one of the easiest types of Artificial Neural Networks (ANNs). A single-layer
perceptron model consists of a feed-forward network and includes a threshold
transfer function inside the model. The main objective of the single-layer perceptron
model is to analyze linearly separable objects with binary outcomes.
In a single-layer perceptron model, the algorithm does not contain recorded data, so
it begins with randomly allocated values for the weight parameters. Further, it sums
up all the weighted inputs. If the total sum of all the inputs is more than a pre-
determined value, the model gets activated and shows the output value as +1.
Multi-Layer Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model has the same
model structure but with a greater number of hidden layers. It executes in two
stages:
o Forward Stage: Activation functions start from the input layer in the forward
stage and terminate on the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified
as per the model's requirement. In this stage, the error between the actual
output and the desired output is propagated backward, beginning at the
output layer and ending at the input layer.
Hence, a multi-layered perceptron model can be considered as multiple artificial
neural networks having various layers, in which the activation function does not
remain linear, unlike in a single-layer perceptron model. Instead of linear, the
activation function can be sigmoid, TanH, ReLU, etc., for deployment.
A multi-layer perceptron model has greater processing power and can process linear
and non-linear patterns. Further, it can also implement logic gates such as AND, OR,
XOR, NAND, NOT, XNOR, NOR.
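As a sketch of this extra processing power, XOR (which a single-layer perceptron cannot compute) can be built from one hidden layer of two step-activated units, OR and NAND, followed by an AND output unit; the weights below are hand-picked for illustration:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    """Two-layer perceptron: the hidden layer computes OR and NAND of the
    inputs, and the output unit ANDs them together, which gives XOR."""
    h1 = step(x1 + x2 - 0.5)        # OR
    h2 = step(-x1 - x2 + 1.5)       # NAND
    return step(h1 + h2 - 1.5)      # AND

# matches XOR on all four binary input combinations
for a in (0, 1):
    for b in (0, 1):
        assert xor_mlp(a, b) == (a ^ b)
```

The hidden layer carves the input space with two lines, which is exactly what the single-layer model cannot do.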
Perceptron networks are single-layer feed-forward networks. They are also called
Single Perceptron Networks. The Perceptron consists of an input layer, a hidden
layer, and an output layer.
The input layer is connected to the hidden layer through weights which may be
inhibitory, excitatory, or zero (-1, +1 or 0). The activation function used is a binary
step function for the input layer and the hidden layer.
The output is
Y = f(y), where the net input y = b + ∑ xi*wi.
If the output matches the target, then no weight updation takes place. The weights
are initially set to 0 or 1 and adjusted successively till an optimal solution is found.
The weights in the network can be set to any values initially. The Perceptron learning
rule will converge to a weight vector that gives the correct output for all input
training patterns, and this learning happens in a finite number of steps.
The Perceptron rule can be used for both binary and bipolar inputs.
6) The activation function is applied over the net input to obtain an output.
7) Now based on the output, compare the desired target value (t) and the actual
output.
8) Continue the iteration until there is no weight change. Stop once this condition is
achieved.
Learning Rule for Multiple Output Perceptron
1) Let there be "n" training input vectors, and x(n) and t(n) are the associated
target values.
2) Initialize the weights and bias. Set them to zero for easy calculation.
3) Let the learning rate be 1.
4) The input layer has the identity activation function, so x(i) = s(i).
5) To calculate the output for each output unit from j = 1 to m, the net input is:
y_in(j) = b(j) + ∑i xi*wij
6) The activation function is applied over the net input to obtain an output.
7) Now, based on the output, compare the desired target value (t) and the actual
output, and make weight adjustments:
wij(new) = wij(old) + α*t(j)*xi
b(j)(new) = b(j)(old) + α*t(j)
where wij is the weight of the connection link between the ith input and the jth
output neuron, and t(j) is the target output for output unit j.
8) Continue the iteration until there is no weight change. Stop once this condition is
achieved.
The input pattern will be x1, x2 and bias b. Let the initial weights be 0 and bias be 0.
The threshold is set to zero and the learning rate is 1.
AND Gate
X1    X2    Target
 1     1      1
 1    -1     -1
-1     1     -1
-1    -1     -1
#1) X1= 1, X2= 1, b= 1 and target = 1, W1=0, W2=0, Wb=0
Net input= y = b + x1*w1 + x2*w2 = 0 + 1*0 + 1*0 = 0
From here we get, output = 0. Now check if output (y) = target (t).
y = 0 but t = 1, which means that these are not the same; hence weight updation
takes place:
Wi(new) = Wi(old) + t*xi
The new weights are 1, 1, and 1 after the first input vector is presented.
#2) X1= 1, X2= -1, b= 1 and target = -1, W1=1, W2=1, Wb=1
Net input= y = b + x1*w1 + x2*w2 = 1 + 1*1 + (-1)*1 = 1
The net output will be 1, since the net input 1 > 0.
X1  X2  Bias  Target  Net Input  Calculated Output  ΔW1  ΔW2  Δb   W1  W2  Wb
EPOCH 1
 1   1   1      1        0            0               1    1    1    1   1   1
 1  -1   1     -1        1            1              -1    1   -1    0   2   0
-1   1   1     -1        2            1               1   -1   -1    1   1  -1
-1  -1   1     -1       -3           -1               0    0    0    1   1  -1
EPOCH 2
 1   1   1      1        1            1               0    0    0    1   1  -1
 1  -1   1     -1       -1           -1               0    0    0    1   1  -1
-1   1   1     -1       -1           -1               0    0    0    1   1  -1
-1  -1   1     -1       -3           -1               0    0    0    1   1  -1
In the second pattern, the target = -1 did not match the actual output = 1, so weight
updates took place, as shown in the table.
An EPOCH is one cycle of all input patterns fed to the system; epochs are repeated
until no weight change is required, and then the iteration stops.
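The epochs above can be reproduced with a short training loop implementing the Perceptron rule used in this example (learning rate 1, weights and bias initialized to zero, ternary step activation):

```python
def f(net):
    # activation used in the worked example: 1, 0 or -1
    return 1 if net > 0 else (-1 if net < 0 else 0)

# bipolar AND gate training data: (x1, x2, target)
data = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]
w1 = w2 = b = 0
for epoch in range(2):
    for x1, x2, t in data:
        y = f(b + x1 * w1 + x2 * w2)   # net input, then activation
        if y != t:                     # update only on a mismatch
            w1 += t * x1
            w2 += t * x2
            b += t

print(w1, w2, b)  # 1 1 -1, matching the final weights in EPOCH 2
```

During the second epoch no pattern triggers an update, which is the stopping condition described above.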
Hebbian Learning Algorithm
The Hebb Network was proposed by Donald Hebb in 1949. According to Hebb's rule,
the weights increase proportionately to the product of input and output. It means
that in a Hebb network, if two neurons are interconnected, then the weights
associated with these neurons can be increased by changes in the synaptic gap.
This network is suitable for bipolar data. The Hebbian learning rule is generally
applied to logic gates.
The basis of the theory is when our brains learn something new, neurons are
activated and connected with other neurons, forming a neural network. These
connections start off weak, but each time the stimulus is repeated, the connections
grow stronger and stronger, and the action becomes more intuitive.
A good example is the act of learning to drive. When you start out, everything you
do is incredibly deliberate. You remind yourself to turn on your indicator, to check
your blind spot, and so on. However, after years of experience, these processes
become so automatic that you perform them without even thinking.
In the beginning, the values of all weights are set to zero. This learning rule can be
utilized with both hard and soft activation functions. Since the desired responses of
neurons are not used in the learning process, this is an unsupervised learning rule.
The absolute values of the weights are directly proportional to the learning time,
which is undesirable.
1) Initially, the weights are set to zero, i.e., wi = 0 for all inputs i = 1 to n, where
n is the total number of input neurons.
2) Let s be an input training vector and t the corresponding target output. The
activation function for the inputs is generally set as an identity function, so xi = si.
3) The activation for the output is also set to y = t.
4) The weights and bias are adjusted as:
wi(new) = wi(old) + xi*y
b(new) = b(old) + y
5) Steps 2 to 4 are repeated for each input vector and output.
Example Of Hebbian Learning Rule
Let us implement the logical AND function with bipolar inputs using Hebbian
Learning. X1 and X2 are the inputs, b is the bias taken as 1, and the target value y is
the output of the logical AND operation over the inputs.
X1   X2   b    y
 1    1   1    1
 1   -1   1   -1
-1    1   1   -1
-1   -1   1   -1
#1) Initially, the weights are set to zero and the bias is also set to zero:
W1 = W2 = b = 0
#2) Take the first input vector = [1 1 1] with target y = 1. The weight changes are
Δwi = xi*y, so the new weights are W1 = 1, W2 = 1, b = 1.
#3) The above weights are the final new weights. When the second input is passed,
these become the initial weights.
#4) Take the second input = [1 -1 1]. The target is -1.
X1   X2   b    y    ΔW1  ΔW2  Δb   W1   W2   b
 1    1   1    1     1    1    1    1    1    1
 1   -1   1   -1    -1    1   -1    0    2    0
-1    1   1   -1     1   -1   -1    1    1   -1
-1   -1   1   -1     1    1   -1    2    2   -2
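The table can be reproduced by applying the Hebb rule, wi(new) = wi(old) + xi*y, once per training pair:

```python
# bipolar AND training pairs: (x1, x2, bias input, target y)
pairs = [(1, 1, 1, 1), (1, -1, 1, -1), (-1, 1, 1, -1), (-1, -1, 1, -1)]
w1 = w2 = b = 0
for x1, x2, xb, y in pairs:
    # Hebb rule: each weight grows by the product of its input and the output
    w1 += x1 * y
    w2 += x2 * y
    b += xb * y

print(w1, w2, b)  # 2 2 -2, the final row of the table
```

Note that, unlike the Perceptron rule, the update is applied unconditionally: no comparison with an actual output is made.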
The Back-propagation algorithm in a neural network computes the gradient of the
loss function for a single weight by the chain rule. It efficiently computes one layer
at a time, unlike a naive direct computation. It computes the gradient, but it does
not define how the gradient is used. It generalizes the computation in the delta rule.
5. Travel back from the output layer to the hidden layer to adjust the weights
such that the error is decreased.
Static Back-propagation
Recurrent Backpropagation
1. Static back-propagation:
It is one kind of backpropagation network which produces a mapping of a static
input to a static output. It is useful for solving static classification issues such as
optical character recognition.
2. Recurrent Backpropagation:
In Recurrent Back-propagation in data mining, the activations are fed forward until
a fixed value is achieved. After that, the error is computed and propagated backward.
The main difference between these two methods is that the mapping is static (and
rapid) in static back-propagation, while it is non-static in recurrent backpropagation.
Now, how is the error function used in Backpropagation, and how does
Backpropagation work? Let us start with an example and work through it
mathematically to understand exactly how the weights are updated using
Backpropagation.
Input values
X1=0.05
X2=0.10
Initial weight
W1=0.15 w5=0.40
W2=0.20 w6=0.45
W3=0.25 w7=0.50
W4=0.30 w8=0.55
Bias Values
b1=0.35 b2=0.60
Target Values
T1=0.01
T2=0.99
Forward Pass
To find the value of H1, we first multiply the input values by the weights:
H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775
To calculate the final output of H1, we apply the sigmoid function:
out_H1 = 1/(1+e^(-H1)) = 1/(1+e^(-0.3775)) = 0.593269992
We calculate H2 in the same way:
H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925
out_H2 = 1/(1+e^(-0.3925)) = 0.596884378
Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and
H2. To find the value of y1, we first multiply the input values, i.e., the outputs of H1
and H2, by the weights:
y1=out_H1×w5+out_H2×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597
out_y1 = 1/(1+e^(-1.10590597)) = 0.75136507
y2=out_H1×w7+out_H2×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214
out_y2 = 1/(1+e^(-1.2249214)) = 0.772928465
Our target values are 0.01 and 0.99, so our out_y1 and out_y2 values do not match
the target values T1 and T2.
Now, we will find the total error, which is simply the squared difference between the
outputs and the target outputs, summed over the output neurons:
Etotal = Σ ½(target − output)²
E1 = ½(T1 − out_y1)² = ½(0.01 − 0.75136507)² = 0.274811083
E2 = ½(T2 − out_y2)² = ½(0.99 − 0.772928465)² = 0.023560026
Etotal = E1 + E2 = 0.298371109
To update a weight, we calculate the error corresponding to that weight with the
help of the total error. The error on weight w is calculated by differentiating the total
error with respect to w. Since Etotal is not written directly in terms of w5, we apply
the chain rule and split the derivative into terms we can differentiate easily:
∂Etotal/∂w5 = ∂Etotal/∂out_y1 × ∂out_y1/∂y1 × ∂y1/∂w5
Now, we calculate each term one by one:
∂Etotal/∂out_y1 = −(T1 − out_y1) = −(0.01 − 0.75136507) = 0.74136507
∂out_y1/∂y1 = out_y1(1 − out_y1) = 0.75136507×(1 − 0.75136507) = 0.186815602
∂y1/∂w5 = out_H1 = 0.593269992
Putting these values together gives the final result:
∂Etotal/∂w5 = 0.74136507 × 0.186815602 × 0.593269992 = 0.082167041
Now, we will calculate the updated weight w5new with the help of the following
formula (taking the learning rate η = 0.5, consistent with the values below):
w5new = w5 − η × ∂Etotal/∂w5 = 0.40 − 0.5 × 0.082167041 = 0.35891648
In the same way, we calculate w6new, w7new, and w8new, and this gives us the
following values:
w5new=0.35891648
w6new=0.408666186
w7new=0.511301270
w8new=0.561370121
Now, we will backpropagate to our hidden layer and update the weight w1, w2, w3,
and w4 as we have done with w5, w6, w7, and w8 weights.
As before, Etotal is not written directly in terms of w1, so we split the derivative into
terms we can differentiate easily:
∂Etotal/∂w1 = ∂Etotal/∂out_H1 × ∂out_H1/∂H1 × ∂H1/∂w1
Now, we calculate each term one by one. Because out_H1 affects both output
neurons, we split the first term further:
∂Etotal/∂out_H1 = ∂E1/∂out_H1 + ∂E2/∂out_H1
and each of these is again split with the chain rule through y1 and y2, since E1 and
E2 do not contain out_H1 directly.
We calculate the partial derivative of the total net input to H1 with respect to w1
the same way as we did for the output neuron:
∂H1/∂w1 = x1 = 0.05
Putting these values together gives the final result for ∂Etotal/∂w1.
Now, we will calculate the updated weight w1new with the same formula:
w1new = w1 − η × ∂Etotal/∂w1
In the same way, we calculate w2new, w3new, and w4new, and this gives us the
following values:
w1new=0.149780716
w2new=0.19956143
w3new=0.24975114
w4new=0.29950229
We have updated all the weights. When we fed the inputs 0.05 and 0.1 forward, we
found an error of 0.298371109 on the network. After the first round of
Backpropagation, the total error is down to 0.291027924. After repeating this
process 10,000 times, the total error is down to 0.0000351085. At this point, when
we feed the inputs 0.05 and 0.1 forward, the output neurons generate 0.015912196
and 0.984065734, i.e., values near our targets.
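The forward pass and the w5 update from this worked example can be checked in a few lines. The learning rate 0.5 is an assumption, chosen because it reproduces the updated weights listed above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# values from the worked example
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99
eta = 0.5  # assumed learning rate, consistent with the updates above

# forward pass
out_h1 = sigmoid(x1 * w1 + x2 * w2 + b1)          # 0.593269992
out_h2 = sigmoid(x1 * w3 + x2 * w4 + b1)          # 0.596884378
out_y1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)
out_y2 = sigmoid(out_h1 * w7 + out_h2 * w8 + b2)
e_total = 0.5 * (t1 - out_y1) ** 2 + 0.5 * (t2 - out_y2) ** 2

# backward pass for w5 via the chain rule
grad_w5 = -(t1 - out_y1) * out_y1 * (1 - out_y1) * out_h1
w5_new = w5 - eta * grad_w5

print(round(e_total, 9), round(w5_new, 8))  # 0.298371109 0.35891648
```

Repeating forward and backward passes in a loop drives the total error down, as described in the text.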
Associate Memory Network
These kinds of neural networks work on the basis of pattern association, which
means they can store different patterns, and at the time of giving an output, they
can produce one of the stored patterns by matching it with the given input pattern.
These types of memories are also called Content-Addressable Memories (CAM).
Associative memory makes a parallel search through the stored patterns as data
files.
This type of memory is robust and fault-tolerant because this memory model has
some form of error-correction capability.
If the memory is presented with an input pattern, say α, the associated pattern ω is
recovered automatically.
There are two types of associate memory –
Auto-associative memory:
This is a single layer neural network in which the input training vector and the output
target vectors are the same. The weights are determined so that the network stores a
set of patterns.
Architecture
Hetero-associative memory:
Similar to the Auto-associative Memory network, this is also a single-layer neural
network. However, in this network, the input training vector and the output target
vectors are not the same. The weights are determined so that the network stores a
set of patterns. A hetero-associative network is static in nature; hence, there are no
non-linear or delay operations.
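A minimal sketch of hetero-associative storage and recall, using the outer-product (Hebbian) rule to determine the weights; the two stored bipolar pattern pairs are illustrative assumptions:

```python
def store(pairs, n_in, n_out):
    """Weights by the outer-product rule: W[i][j] = sum over pairs of s_i * t_j."""
    w = [[0] * n_out for _ in range(n_in)]
    for s, t in pairs:
        for i in range(n_in):
            for j in range(n_out):
                w[i][j] += s[i] * t[j]
    return w

def recall(w, x):
    """A single static forward pass: net input per output unit, then a sign threshold."""
    return [1 if sum(x[i] * w[i][j] for i in range(len(x))) >= 0 else -1
            for j in range(len(w[0]))]

pairs = [([1, 1], [1, -1]), ([1, -1], [-1, 1])]   # illustrative bipolar pairs
w = store(pairs, 2, 2)
print(recall(w, [1, 1]), recall(w, [1, -1]))  # [1, -1] [-1, 1]
```

Here the input and target vectors differ (hetero-association); setting each target equal to its input would give the auto-associative case.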
Architecture
Hopfield Network
The Hopfield network is a special kind of neural network whose response is different
from that of other neural networks. Its output is calculated by a converging iterative
process. It has just one layer of neurons, whose number corresponds to the size of
the input and output, which must be the same. When such a network recognizes,
for example, digits, we present a list of correctly rendered digits to the network.
Subsequently, the network can transform a noisy input into the corresponding
perfect output.
In 1982, John Hopfield introduced an artificial neural network to store and retrieve
memory like the human brain. Here, a neuron is either in an on or an off state. The
state of a neuron (on: +1 or off: 0) will be updated, relying on the input it receives
from the other neurons. A Hopfield network is first trained to store various patterns
or memories. Afterward, it is able to recognize any of the learned patterns from
partial or even corrupted data about that pattern, i.e., it eventually settles down and
returns the closest pattern. Thus, similar to the human brain, the Hopfield model
has stability in pattern recognition.
A Hopfield network is a single-layered and recurrent network in which the neurons
are entirely connected, i.e., each neuron is associated with every other neuron. If
there are two neurons i and j, then there is a connection weight wij between them,
which is symmetric: wij = wji, with zero self-connectivity: wii = 0.
The Hopfield network is commonly used for auto-association and optimization tasks.
A discrete Hopfield network operates in a discrete-time fashion; in other words, the
input and output patterns are discrete vectors, which can be either binary (0, 1) or
bipolar (+1, −1) in nature. The network has symmetrical weights with no
self-connections, i.e., wij = wji and wii = 0.
Architecture
Following are some important points to keep in mind about discrete Hopfield
network −
o This model consists of neurons with one inverting and one non-inverting output.
o The output of each neuron should be the input of the other neurons but not the
input of itself.
o Weight/connection strength is represented by wij.
o Connections can be excitatory as well as inhibitory. A connection is excitatory if
the output of the neuron is the same as the input; otherwise, it is inhibitory.
o Weights should be symmetrical, i.e., wij = wji.
Training Algorithm
The energy function Ef, also called the Lyapunov function, determines the stability
of a discrete Hopfield network, and is characterized as follows −
Ef = −½ ∑i ∑j yi yj wij − ∑i xi yi + ∑i θi yi
where yi is the state of neuron i, xi is its external input, and θi is its threshold.
Every update of the network either decreases this energy or leaves it unchanged,
which is why the network settles into a stable state.
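A small sketch of a discrete bipolar Hopfield network with this energy function (external inputs xi and thresholds θi taken as zero for simplicity); the stored pattern is an illustrative assumption:

```python
def train(pattern):
    """Hebbian weights w[i][j] = y_i * y_j with zero self-connections (w[i][i] = 0)."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]

def energy(w, y):
    """Ef = -1/2 * sum_i sum_j y_i y_j w_ij (external inputs and thresholds = 0)."""
    n = len(y)
    return -0.5 * sum(y[i] * y[j] * w[i][j] for i in range(n) for j in range(n))

def update(w, y):
    """One asynchronous sweep: each neuron takes the sign of its net input."""
    y = list(y)
    for i in range(len(y)):
        net = sum(w[i][j] * y[j] for j in range(len(y)))
        y[i] = 1 if net >= 0 else -1
    return y

stored = [1, -1, 1, -1]
w = train(stored)
noisy = [1, 1, 1, -1]          # the stored pattern with one bit flipped
recalled = update(w, noisy)
print(recalled == stored, energy(w, stored) < energy(w, noisy))  # True True
```

The noisy pattern sits at a higher energy than the stored one, and the update sweep rolls the state downhill to the stored pattern, illustrating the convergence described above.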