
Computational Intelligence

From AI to Computational
intelligence
• Conventional AI involves methods characterized by
formalism and statistical analysis. It is also known as
symbolic AI, logical AI or neat AI. Methods include:
- Expert systems: apply reasoning capabilities to reach a
conclusion.
- Case-based reasoning: solving new problems based on
the solutions of similar past problems.
- Bayesian reasoning: represents a set of variables together
with a joint probability distribution and explicit
independence assumptions.
- Behavior-based AI: a modular method of building AI
systems by hand.
Computational intelligence
• Computational intelligence involves iterative
development or learning, where learning is based on
empirical data. It is also known as non-symbolic AI,
scruffy AI or soft computing. Methods include:
- Neural networks: systems with very strong pattern
recognition capabilities.
- Fuzzy systems: techniques for reasoning under
uncertainty.
- Evolutionary computation: applies biologically inspired
concepts such as populations, mutation and survival of
the fittest to generate progressively better solutions to a problem.
- Hybrid intelligent systems: combine the above techniques.
Computational Paradigms

• Hard Computing: precise models
- Symbolic logic & reasoning
- Numerical modelling & search
• Soft Computing: imprecise models
- Approximate reasoning
- Functional optimization & random search
Hard vs soft computing
• Hard computing:
This is the conventional methodology; it relies on
the principles of accuracy, certainty and
inflexibility, and is suitable for mathematical
problems.
• Soft computing:
This is a modern approach premised on the ideas of
approximation, uncertainty and flexibility.
What is soft computing

• Soft computing is the reverse of hard (conventional)
computing. It refers to a group of computational
techniques based on AI and natural selection. It
provides cost-effective solutions to complex real-life
problems for which no hard computing solution exists.
• Zadeh coined the term soft computing. The
objective of soft computing is to provide precise
approximations and quick solutions for complex
real-life problems.
Need of soft computing

• Conventional computing and analytical models do not provide a
solution to some real-world problems. In such cases, techniques like
soft computing are required to obtain an approximate solution.
• Hard computing is used for solving mathematical problems that
need a precise answer, but it fails to provide solutions for some
real-life problems. For real-life problems whose precise
solution does not exist, soft computing helps.
• Analytical models can be used for solving mathematical problems
and are valid for ideal cases. Real-world problems, however, do not
have an ideal case; they exist in non-ideal environments.
• Soft computing is not limited to theory; it also gives insights
into real-life problems.
• For all the above reasons, soft computing helps to model the
human mind, which is not possible with conventional
mathematical and analytical models.
Characteristics of Soft computing
• Soft computing provides an approximate yet usable solution for
real-life problems.
• The algorithms of soft computing are adaptive, so changes in the
environment do not disrupt the current process.
• Soft computing is based on learning from experimental data:
it does not require an explicit mathematical model to solve
the problem.
• Soft computing helps users solve real-world problems by
providing approximate results where conventional and analytical
models fail.
• It is based on fuzzy logic, genetic algorithms, machine learning,
ANNs, and expert systems.
Elements of soft computing

• Soft computing is viewed as a foundation component for
an emerging field of conceptual intelligence. Fuzzy Logic
(FL), Machine Learning (ML), Neural Networks (NN),
Probabilistic Reasoning (PR), and Evolutionary
Computation (EC) are the constituents of soft computing.
• The following are the three main types of techniques used by soft
computing:
- Fuzzy Logic
- Artificial Neural Networks (ANN)
- Genetic Algorithms
Differences Between Soft & Hard
Computing
• Programs: Soft computing can evolve its own programs; hard
computing requires programs to be written.
• Computation time: Soft computing takes less computation time;
hard computing takes more.
• Dependency: Soft computing depends on approximation and
disposition; hard computing is mainly based on binary logic and
numerical systems.
• Computation type: Soft computing uses parallel computation;
hard computing uses sequential computation.
• Accuracy: Soft computing needs robustness; hard computing
needs accuracy.
• Data: Soft computing can deal with noisy data; hard computing
can only deal with exact data.
• Result/Output: Soft computing gives approximate results; hard
computing gives exact and precise results.
• Examples: Soft computing — neural networks such as Madaline,
Adaline, ART networks; hard computing — any numerical
problem solved by traditional methods.
Introduction to Computational
intelligence
• Computational Intelligence (CI) is the theory, design,
application and development of biologically and
linguistically motivated computational paradigms.
• CI usually refers to the ability of computers to learn a
specific task from data or experimental observation.
• It mainly includes bottom-up approaches to solving
(hard) problems based on various heuristics (soft
computing), rather than the exact, logic-based
approaches of traditional artificial intelligence
(hard computing).
Advantages of Computational
intelligence
• Ability to deal with uncertainty
• Parallel approach
• Dealing with complexity
• Generating Novelty
• Low cost optimization
Paradigms of computational
intelligence
AI vs CI
Artificial neural networks
• Neural network: an information processing paradigm
inspired by biological nervous systems, such as our
brain.
• Structure: a large number of highly interconnected
processing elements (neurons) working together.
• Like people, they learn from experience (by example).
• Objective: to develop a computational model/device of
the brain that performs tasks such as pattern matching,
optimization and classification at a faster rate than
traditional systems.
Definition of ANN
“Data processing system consisting of a large
number of simple, highly interconnected
processing elements (artificial neurons) in an
architecture inspired by the structure of the
cerebral cortex of the brain”

(Tsoukalas & Uhrig, 1997).


Inspiration from Neurobiology
Biological neural networks
Biological neural networks
• A biological neuron has three main components:
dendrites, soma (or cell body) and axon.
• Dendrites receive signals from other neurons.
• The soma sums the incoming signals. When
sufficient input is received, the cell fires; that
is, it transmits a signal over its axon to other
cells.
Artificial Neurons

• ANN is an information processing system that has certain
performance characteristics in common with biological
nets.
• Several key features of the processing elements of ANN
are suggested by the properties of biological neurons:
1. The processing element receives many signals.
2. Signals may be modified by a weight at the receiving synapse.
3. The processing element sums the weighted inputs.
4. Under appropriate circumstances (sufficient input), the
neuron transmits a single output.
5. The output from a particular neuron may go to many other
neurons.
Artificial Neurons
• From experience:
examples / training data
• Strength of connection
between the neurons is
stored as a weight-value
for the specific
connection.
• Learning the solution to
a problem = changing
the connection weights
Artificial Neurons
• ANNs have been developed as generalizations of
mathematical models of neural biology, based on the
assumptions that:
1. Information processing occurs at many simple elements
called neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight, which, in a
typical neural net, multiplies the signal transmitted.
4. Each neuron applies an activation function to its net input
to determine its output signal.
Artificial Neurons
Differences between Biological NN
& ANN
Biological NN:
• Execution time is a few milliseconds.
• Can perform massive parallel operations simultaneously.
• Stores information in its interconnections.
• Possesses fault-tolerance capability.
• Control mechanism is complex.

Artificial Neural Network:
• Execution time is in nanoseconds.
• Can also perform operations simultaneously; processing is faster
than the brain.
• Stores information in contiguous memory locations.
• Has no fault tolerance.
• Control mechanism is simple.
Model Of A Neuron
[Figure: inputs X1, X2, X3 enter through connection weights Wa, Wb, Wc; a summing unit Σ computes the net input, and an activation function f() produces the output Y. The parts correspond to the biological neuron: input units = dendrites, connection weights = synapses, summing function = soma, output path = axon.]
Artificial Neurons
• A neural net consists of a large number of simple
processing elements called neurons, units, cells or
nodes.
• Each neuron is connected to other neurons by means
of directed communication links, each with an
associated weight.
• The weights represent information being used by the
net to solve a problem.
Artificial Neurons
• Each neuron has an internal state, called its activation or activity
level, which is a function of the inputs it has received. Typically, a
neuron sends its activation as a signal to several other neurons.
• It is important to note that a neuron can send only one signal at a
time, although that signal is broadcast to several other neurons.
• Neural networks are configured for a specific application, such
as pattern recognition or data classification, through a learning
process.
• In a biological system, learning involves adjustments to the
synaptic connections between neurons.
Artificial Neural Network
[Figure: inputs x1, x2 with synaptic weights w1, w2 arrive over the dendrites, a summing unit in the cell body computes the net input, and the result passes along the axon. Net input: yin = x1w1 + x2w2; activation: f(yin) = 1 if yin ≥ θ, and f(yin) = 0 otherwise.]
- A neuron receives inputs, determines the strength (weight) of each input, calculates the total
weighted input, and compares this total with a threshold value θ.
- The output value is in the range 0 to 1.
- If the total weighted input is greater than or equal to the threshold, the neuron produces an
output; if the total weighted input is less than the threshold, no output is produced.
Why we use Artificial Neural
Network
• The human brain has many desirable characteristics that
are not present in von Neumann computers. These include:
- Extracting patterns and detecting trends that are too
complex to be noticed otherwise
- Real-time operation based on massive parallelism
- Adaptive learning ability
- Distributed representation and computation
- Self-organization
- Inherent (natural) contextual information processing
- Fault tolerance via redundant information coding
- Low energy consumption
Multidisciplinary point of view of NN
Neural networks sit at the intersection of many disciplines:
• Neurobiology
• Artificial Intelligence
• Cognitive Psychology
• Mathematics (approximation theory, optimization)
• Physics (statistical physics)
• Economics (time series, data mining)
• Engineering (image/signal processing)
• Linguistics
History of ANN
• 1943 McCulloch-Pitts neurons
• 1949 Hebb’s law
• 1958 Perceptron (Rosenblatt)
• 1960 Adaline, better learning rule (Widrow,
Hoff)
• 1969 Limitations (Minsky, Papert)
• 1972 Kohonen nets, associative memory
• 1977 Brain State in a Box (Anderson)
• 1982 Hopfield net, constraint satisfaction
• 1986 Backpropagation (Rumelhart, Hinton,
Williams)
• 1987-1990 Adaptive Resonance Theory
(Carpenter and Grossberg)
• 1988 Neocognitron, character recognition
(Fukushima), Radial Basis Function Network
(Broomhead and Lowe)
Basic models of ANN
 The models of ANN are specified by three basic
entities, namely:
- The model's synaptic interconnections: the pattern of
connections between neurons
- The training or learning rules adopted for updating
and adjusting the connection weights: the method of
determining the connection weights
- Their activation functions: the function used to compute
the output signal from the net input
Connections:
• An ANN consists of a set of highly interconnected processing
elements (neurons) such that each processing element is
connected through weights to the other processing
elements or to itself.
• The arrangement of neurons into layers and the connection
pattern formed within and between layers is called the network
architecture.
• Neural networks are classified into single-layer or multilayer
neural nets.
• A layer is formed by taking a processing element and
combining it with other processing elements.
• Practically, a layer implies a stage: going stage by stage,
the input stage and output stage are linked with each other.
• These linked interconnections lead to the formation of various
network architectures.
Neuron connection architecture
• Five types of neuron connection architectures:
- Single-layer feed-forward network
- Multilayer feed-forward network
- Single node with its own feedback
- Single-layer recurrent network
- Multilayer recurrent network
• A network is said to be feed-forward if no output of a neuron
is an input to a node in the same layer or in a preceding layer.
Equivalently, a neural network that does not contain cycles (feedback loops)
is called a feed-forward network or perceptron.
• When outputs can be directed back as inputs to nodes in the same or a
preceding layer, the result is a feedback network.
Single-layer feed forward
• A layer is formed by taking
processing elements and
combining them with other
processing elements.
• The input and output are
linked with each other.
• The inputs are connected to
the processing nodes with
various weights, resulting
in a series of outputs, one per
node.
Multilayer feed forward network
• A multilayer feed-forward network is formed by the interconnection of
several layers.
• The input layer receives the input and buffers the input signal.
• The output layer generates the output of the network, and any layer formed
between the input and output layers is called a hidden layer.
• The hidden layer is internal to the network and has no direct contact with
the external environment.
• In a fully connected network, every output from one layer is
connected to each and every node in the next layer.
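A forward pass through such a fully connected network can be sketched with NumPy; the 3-2-1 layer sizes, the random weights, and the sigmoid activation are illustrative assumptions:

```python
import numpy as np

# Forward pass through a fully connected 3-2-1 feed-forward network.
# Layer sizes, weights and the sigmoid activation are illustrative.

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 2))   # input layer (3) -> hidden layer (2)
W_output = rng.normal(size=(2, 1))   # hidden layer (2) -> output layer (1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    h = sigmoid(x @ W_hidden)   # hidden layer: internal, no external contact
    y = sigmoid(h @ W_output)   # output layer generates the network output
    return y

y = forward(np.array([1.0, 0.5, -0.2]))
print(y.shape)  # (1,)
```

Every entry of `W_hidden` and `W_output` is used because the layers are fully connected: each output of one layer feeds every node of the next.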
Single node with its own feedback
• If the feedback of the output
of the processing element is
directed back as input to the
processing elements in the
same layer then it is called
lateral feedback.
• Recurrent networks are
feedback networks.
• The figure shows a simple
recurrent neural network
having a single neuron with
feedback to itself.
Single-layer recurrent network
• Single layer recurrent
network is a feedback
connection in which a
processing element’s
output can be directed
back to the processing
element itself or to the
other processing
element or to both.
Multilayer recurrent network
• A processing element
output can be directed
back to the nodes in a
preceding layer.
• Also, a processing
element output can be
directed back to the
processing element itself
and to other processing
elements in the same
layer.
on-center-off-surround or Lateral
inhibition structure
• In this structure, each
processing neuron
receives two different
classes of inputs:
 Excitatory- input
from nearby
processing elements.
 Inhibitory- input from
more distantly located
processing elements.
Learning
• Capability to learn is the main property of ANN.
• Learning or training is a process by which a neural network
adapts itself to a stimulus by making proper parameter
adjustments, resulting in the production of desired response.
• Types of learning:
- Parameter/weighted learning
- Structural learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
Parameter & structure Learning
• Parameter learning:
The learning that is used to update the
connecting weights in a neural net.
• Structure learning:
It focuses on the change in network structure
which includes the number of processing
elements as well as their connection types.
Supervised Learning
• During the training of ANN under
supervised learning, the input vector
is presented to the network, which will
give an output vector.
• This output vector is compared with
the desired output vector.
• An error signal is generated, if there is
a difference between the actual output
and the desired output vector.
• On the basis of this error signal, the
weights are adjusted until the actual
output is matched with the desired
output.
• This type of learning is done under the
supervision of a teacher, so the learning
process is dependent on the teacher.
Unsupervised learning
• During the training of ANN under
unsupervised learning, the input vectors
of similar type are combined to form
clusters.
• When a new input pattern is applied, then
the neural network gives an output
response indicating the class to which the
input pattern belongs.
• There is no feedback from the
environment as to what should be the
desired output and if it is correct or
incorrect.
• Hence, in this type of learning, the
network itself must discover the patterns
and features from the input data, and the
relation for the input data over the output.
• This type of learning is done without the
supervision of a teacher.
• This learning process is independent.
Reinforcement learning
• This type of learning is used to
reinforce or strengthen the network
over some critic information.
• During the training of network
under reinforcement learning, the
network receives some feedback
from the environment.
• This makes it somewhat similar to
supervised learning. However, the
feedback obtained here is evaluative
not instructive, which means there is
no teacher as in supervised
learning.
• After receiving the feedback, the
network performs adjustments of the
weights to get better critic
information in future.
comparison
Activation Function
• The activation function is used to calculate the output response
of a neuron.
• The sum of the weighted input signals is applied to an
activation function to obtain the response.
• Activation functions may be linear, threshold or non-linear:
Linear: The output is proportional to the total weighted
input.
Threshold: The output is set at one of two values, depending
on whether the total weighted input is greater than or less
than some threshold value.
Non-linear: The output varies continuously but not linearly
as the input changes.
Activation function
• An activation function may be defined as the extra force or effort applied over the input
to obtain an exact output. In ANN, activation functions are applied over the net input to
get the output. The following are some activation functions of interest −
 Linear activation function:
Also called the identity function, as it performs no input editing. It can be defined as
F(x) = x
 Sigmoid activation function:
It is of two types, as follows −
• Binary sigmoidal function − This activation function maps the input into the range 0 to 1.
It is positive in nature and always bounded: its output cannot be less than 0 or
more than 1. It is also strictly increasing: the higher the input, the higher the
output. With steepness parameter λ it can be defined as
f(x) = 1 / (1 + e^(−λx))
• Bipolar sigmoidal function − This activation function maps the input into the range −1 and
1. It can be positive or negative in nature. It is always bounded: its output
cannot be less than −1 or more than 1. It is also strictly increasing, like the sigmoid
function. It can be defined as
f(x) = (1 − e^(−λx)) / (1 + e^(−λx))
Activation Function
[Figure: graphs of four common activation functions.]
• Step function: Y = 1 if X ≥ 0, else Y = 0
• Sign function: Y = +1 if X ≥ 0, else Y = −1
• Sigmoid function: Y = 1 / (1 + e^(−X))
• Linear function: Y = X
Activation function
 Binary step function: this function can be defined as
f(x) = 1 if x ≥ Θ, else 0
where Θ represents the threshold value. This function is used
in single-layer nets to convert the net input to an output that is
binary (1 or 0).
 Bipolar step function: this function can be defined as
f(x) = 1 if x ≥ Θ, else −1
where Θ represents the threshold value. This function is used
in single-layer nets to convert the net input to an output that is
bipolar (+1 or −1).
 Ramp function: the ramp function is defined as
f(x) = 1 if x > 1; f(x) = x if 0 ≤ x ≤ 1; f(x) = 0 if x < 0
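The activation functions above translate directly into code. This sketch uses the standard textbook definitions; the default threshold of 0 and steepness λ = 1 are illustrative:

```python
import math

# Common activation functions; theta is a threshold, lam a steepness factor.

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0          # output in {0, 1}

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1         # output in {-1, +1}

def ramp(x):
    if x > 1:
        return 1
    if x < 0:
        return 0
    return x                               # linear between 0 and 1

def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))       # output in (0, 1)

def bipolar_sigmoid(x, lam=1.0):
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0  # output in (-1, 1)

print(binary_step(0.3), bipolar_step(-0.3), ramp(0.5))  # 1 -1 0.5
```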
Weight
• In the architecture of an ANN, each neuron is
connected to other neurons by means of directed
communication links, and each communication
link is associated with a weight.
• A weight contains information about the input signal.
• This information is used by the net to solve the
problem.
• The weights can be represented as a matrix.
• The weight matrix is also called the connection matrix.
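In code the connection matrix is simply a 2-D array in which entry w[i][j] is the weight from source neuron i to target neuron j; the numbers below are illustrative:

```python
import numpy as np

# Weight (connection) matrix for n = 2 source neurons feeding m = 3
# target neurons; row i holds the weights leaving source neuron i.
W = np.array([[0.1, 0.4, -0.2],
              [0.3, -0.5, 0.7]])

x = np.array([1.0, 2.0])      # activations of the two source neurons
net_input = x @ W             # net input arriving at each of the 3 targets
print(net_input)              # [ 0.7 -0.6  1.2]
```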
Weight Matrix
 T
 w1   w11w12 w13...w1m 
 wT 
 2  
 w 21w 22 w 23...w 2 m 
 wT 
 3
. 
W= .  =  .................. 

. 

 

 .


 ................... 
.   
 T
 wn   w n1w n 2 w n 3...w nm 
 
Bias
• The bias included in the network has an impact on
calculating the net input.
• A bias acts exactly like a weight on a connection
whose input signal is always 1.
• Bias is a constant that helps the model fit the
given data best. In other words, bias is a constant
that gives the model the freedom to shift its output.
• The bias improves the performance of the neural
network.
Why Bias is required?
• The relationship between input and output is
given by the equation of a straight line, y = mx + c,
where the constant c plays the role of the bias: it lets
the line fit data that does not pass through the origin.
Terminologies of ANN
• Weights: The weights contain information about the input signal. This
information is used by the net to solve a problem. Weight can be
represented in terms of matrix known as connection matrix.
• Bias: The bias included in the network has an impact on calculating the
net input. The bias is included by adding a component x₀ = 1 to the input
vector x; thus, the input vector becomes
X = (1, X₁, ….., Xi, …, Xn)
The bias can be of two types: positive and negative. A positive bias helps
increase the net input of the network, and a negative bias helps
decrease the net input of the network.
• Threshold: it is a set value based upon which the final output of the
network may be calculated. Threshold value is used in activation function.
• Learning Rate: It is used to control the amount of weight adjustment at
each step of training. The learning rate, ranging from 0 to 1, determines
the rate of learning at each time step. The learning rate is denoted by ‘α’.
Terminologies of ANN
• Momentum factor: Convergence is made faster if a
momentum factor is added to the weight updation process. If
momentum has to be used, the weights from one or more
previous training pattern must be saved. It helps the network
in large weight adjustment until the corrections are in the
same general direction for several patterns. It is used in
back propagation network.
• Vigilance Parameter: It is used in ART network. It is used to
control the degree of similarity required for patterns to be
assigned to the same cluster unit. It ranges from 0.7 to 1 to
perform useful work in controlling the number of clusters. It
is denoted by ‘ρ’.
Neural Network Learning Rules
• Being a complex adaptive system, learning in ANN implies that
a processing unit is capable of changing its input/output
behavior due to the change in environment.
• The importance of learning in ANN increases because the
activation function and the input/output vectors are fixed
when a particular network is constructed; to change the
input/output behavior, the weights must be adjusted.
• Hence, a method is required with the help of which the
weights can be modified. These methods are called
learning rules, and they are simply algorithms or
equations.
McCulloch-Pitts Neuron Model
(M-P Neuron Model)
• The first mathematical model of a biological neuron was given by
McCulloch and Pitts in 1943.
• This model does not exhibit any learning, but serves as a basic
building block that inspired further significant work in NN
research.
• M-P neurons are connected by directed weighted paths.
• A connection path can be excitatory or inhibitory: excitatory
connections have positive weights and inhibitory connections
have negative weights.
• All excitatory connections entering a particular neuron
carry the same weight.
• Each neuron is associated with a threshold value; the neuron
fires if the net input is greater than the threshold value.
McCulloch-Pitts Neuron Model

[Figure: inputs x1 … xn connect to neuron Y through excitatory weights w; inputs xn+1 … xn+m connect through inhibitory weights −p.]
• The activation of a McCulloch-Pitts neuron is binary.
• Each neuron has a fixed threshold θ:
– f(yin) = 1 if yin ≥ θ
– f(yin) = 0 if yin < θ
• The threshold is set so that inhibition is absolute.
Example of McCulloch-Pitts Model

• Generate the output of the logic AND function using the
M-P model.
Ans: The AND function returns a true value only if both
inputs are true; otherwise it returns a false value.
Truth table of the AND function:
x1 x2 | y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0
Both weights are taken as 1, so the net input is given as
yin = Σ (weight × input) = 1·x1 + 1·x2 = x1 + x2
The threshold on unit y is 2, and the output is y = f(yin).
From this, the activation of the output neuron can be
formed as
y = f(yin) = 1 if yin ≥ 2; 0 if yin < 2
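The worked example above maps directly onto code, using the weights (both 1) and threshold (2) derived for AND:

```python
# McCulloch-Pitts neuron implementing logical AND: both excitatory
# weights are 1 and the threshold is 2, as derived above.

def mp_and(x1, x2, theta=2):
    y_in = 1 * x1 + 1 * x2        # net input yin = x1 + x2
    return 1 if y_in >= theta else 0

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, mp_and(x1, x2))  # reproduces the AND truth table
```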
Hebbian Learning Rule

• This rule, one of the oldest and simplest, was introduced by
Donald Hebb in 1949. It is a kind of feed-forward,
unsupervised learning.
Basic Concept − This rule is based on a proposal given by Hebb,
who wrote −
• “When an axon of cell A is near enough to excite a cell B and
repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells
such that A’s efficiency, as one of the cells firing B, is
increased.”
• On the basis of above postulate, we can conclude that the
connections between two neurons might be strengthened if the
neurons fire at the same time and might weaken if they fire at
different times.
Hebbian Learning Rule
Mathematical Formulation − According to the Hebbian
learning rule, the weight of a connection increases at
every time step by
Δwji(t) = α · xi(t) · yj(t)
Here, Δwji(t) = increment by which the weight of the
connection increases at time step t
α = the positive, constant learning rate
xi(t) = the input value from the pre-synaptic neuron at time
step t
yj(t) = the output of the post-synaptic neuron at the same time step t
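One Hebbian update step, Δwji = α·xi·yj, can be sketched as follows; the learning rate and the input/output values are illustrative:

```python
# One step of the Hebbian update dw_ji = alpha * x_i * y_j, applied
# to every (input i, output j) pair. Values are illustrative.

def hebb_update(w, x, y, alpha=0.1):
    # w[j][i]: weight of the connection from input i to output j
    return [[w[j][i] + alpha * x[i] * y[j]
             for i in range(len(x))]
            for j in range(len(y))]

w = [[0.0, 0.0]]                     # one output neuron, two inputs
w = hebb_update(w, x=[1, -1], y=[1])
print(w)  # [[0.1, -0.1]]: co-active pair strengthened, anti-correlated weakened
```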
Perceptron Learning Rule

• This rule is an error-correcting, supervised learning
algorithm for single-layer feed-forward networks with a linear
activation function, introduced by Rosenblatt.
• Basic Concept − Being supervised in nature, the error is
calculated by comparing the desired/target output with the
actual output. If any difference is found, a change must be
made to the weights of the connections.
• Mathematical Formulation − Suppose we have a finite number
of input vectors x(n), along with their desired/target output
vectors t(n), where n = 1 to N.
• The output y is calculated on the basis of the net input, with
the activation function applied over that net input.
The updating of the weights is done in the
following two cases:
Case I − when t ≠ y, then
w(new) = w(old) + α·t·x
Case II − when t = y, then
no change in weight.
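The two cases above amount to: update only on a mistake. A sketch in Python, with a bipolar sign activation and illustrative values:

```python
# One perceptron weight update: w(new) = w(old) + alpha * t * x,
# applied only when the prediction disagrees with the target t.
# The bipolar sign activation and sample values are illustrative.

def predict(w, x, theta=0.0):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= theta else -1

def perceptron_update(w, x, t, alpha=1.0):
    y = predict(w, x)
    if y != t:                                     # Case I: t != y
        return [wi + alpha * t * xi for wi, xi in zip(w, x)]
    return w                                       # Case II: no change

w = perceptron_update([0.0, 0.0], x=[1, 1], t=-1)
print(w)  # [-1.0, -1.0]: the mistake moved the weights toward t*x
```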
Delta Learning Rule(Widrow-hoff rule)
• Introduced by Bernard Widrow and Marcian Hoff, and also called the
Least Mean Square (LMS) method, this rule minimizes the error over all
training patterns. It is a kind of supervised learning algorithm
with a continuous activation function.
• Basic Concept − The base of this rule is the gradient-descent approach.
The delta rule updates the synaptic weights so as to minimize the
error between the net input to the output unit and the target value.
• Mathematical Formulation − To update the synaptic weights, the delta
rule is given by
Δwi = α · xi · ej
Here Δwi = weight change for the i-th input;
α = the positive, constant learning rate;
xi = the input value from the pre-synaptic neuron;
ej = (t − yin), the difference between the desired/target output t and the
actual net input yin.
The updating of the weights is done in the
following two cases:
• Case I − when t ≠ y, then
w(new) = w(old) + Δw
• Case II − when t = y, then
no change in weight.
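A minimal delta-rule sketch: repeated presentations of one training pattern shrink the error e = t − yin toward zero. The pattern, target, and learning rate below are illustrative:

```python
# One delta-rule (LMS) update: dw_i = alpha * x_i * e, where
# e = t - y_in is the error between target and net input.

def delta_update(w, x, t, alpha=0.1):
    y_in = sum(wi * xi for wi, xi in zip(w, x))  # net input
    e = t - y_in                                 # error term
    return [wi + alpha * xi * e for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(100):                 # repeated presentations shrink the error
    w = delta_update(w, x=[1.0, 2.0], t=1.0)
print(w)  # converges toward [0.2, 0.4], which satisfies w1 + 2*w2 = 1
```

Because the update follows the negative gradient of the squared error, each step multiplies the remaining error by a constant factor smaller than one (for a suitably small α), which is the gradient-descent behaviour the rule is built on.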
Competitive Learning rule

• This rule is concerned with unsupervised
training in which the output nodes
compete with each other to
represent the input pattern.
• Basic Concept of a Competitive
Network − This network is just
like a single-layer feed-forward
network with feedback connections
between the outputs. The connections
between outputs are of inhibitory
type (shown by dotted lines in the figure),
which means the competitors never
support each other.
• Basic Concept of Competitive Learning Rule − There will be a
competition among the output nodes. Hence, the main concept is that
during training, the output unit with the highest activation to a given input
pattern, will be declared the winner. This rule is also called Winner-takes-
all because only the winning neuron is updated and the rest of the neurons
are left unchanged.
• Mathematical formulation − the important factors for the mathematical
formulation of this learning rule are:
• Condition to be a winner − Suppose a neuron yk wants to be the winner.
Then the following condition must hold:
yk = 1 if vk > vj for all j, j ≠ k; otherwise yk = 0
• It means that if any neuron, say yk, wants to win, then its induced local
field (the output of the summation unit), say vk, must be the largest among all
the other neurons in the network.
• Condition on the sum total of weights − Another constraint of
the competitive learning rule is that the sum total of the weights to
a particular output neuron is 1. For example, for neuron k:
Σj wkj = 1
• Change of weight for the winner − If a neuron does not
respond to the input pattern, then no learning takes place in
that neuron. However, if a particular neuron wins, then the
corresponding weights are adjusted as follows:
Δwkj = α(xj − wkj) if neuron k wins; Δwkj = 0 otherwise
Here α is the learning rate.
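A winner-takes-all step can be sketched as follows; the two weight rows, the input, and α = 0.5 are illustrative, and the rows are chosen to satisfy the sum-to-1 constraint:

```python
import numpy as np

# Winner-takes-all update: the neuron whose weights give the largest
# induced local field v_k = w_k . x wins, and only its weights move
# toward the input: dw_k = alpha * (x - w_k). Values are illustrative.

def competitive_step(W, x, alpha=0.5):
    v = W @ x                       # induced local field of each output
    k = int(np.argmax(v))           # winner: largest v_k
    W[k] += alpha * (x - W[k])      # only the winner is updated
    return k, W

W = np.array([[0.9, 0.1],           # two output neurons, two inputs;
              [0.1, 0.9]])          # each row sums to 1
winner, W = competitive_step(W, np.array([1.0, 0.0]))
print(winner)  # 0 - the neuron already closest to the input wins
print(W[0])    # moved halfway toward [1, 0]: [0.95, 0.05]
```

Note that the winner's row still sums to 1 after the update, since moving a unit-sum weight vector toward a unit-sum input preserves the sum constraint.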


Outstar Learning Rule

• This rule, introduced by Grossberg, is concerned with
supervised learning, because the desired outputs are known.
It is also called Grossberg learning.
• Basic Concept − This rule is applied over neurons
arranged in a layer. It is specially designed to produce a
desired output d from the layer of p neurons.
• Mathematical Formulation − The weight adjustments in
this rule are computed as follows:
Δwj = α(d − wj)
• Here d is the desired neuron output and α is the learning
rate.
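Applying Δwj = α(d − wj) repeatedly drives each weight toward its desired output component. A sketch with an illustrative desired vector and learning rate:

```python
# Outstar updates drive the weight vector toward the desired output d:
# dw_j = alpha * (d_j - w_j). Repeated application converges to d.

def outstar_update(w, d, alpha=0.25):
    return [wj + alpha * (dj - wj) for wj, dj in zip(w, d)]

w = [0.0, 0.0, 0.0]
d = [1.0, 0.5, -1.0]                # desired output of a layer of p = 3 neurons
for _ in range(50):
    w = outstar_update(w, d)
print(w)  # approximately [1.0, 0.5, -1.0]
```

Each step shrinks the gap d − wj by the factor (1 − α), so after many steps the weights reproduce the desired output exactly, which is the point of the rule.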
Perceptron
• Developed by Frank Rosenblatt
using the McCulloch-Pitts
model, the perceptron is the basic
operational unit of artificial
neural networks. It employs a
supervised learning rule and is
able to classify data into two
classes.
• Operational characteristics of
the perceptron: It consists of a
single neuron with an arbitrary
number of inputs along with
adjustable weights, but the output
of the neuron is 1 or 0 depending
upon the threshold. It also
includes a bias always having
weight 1.
[Figure: schematic representation of the perceptron.]
The perceptron has the following three basic elements:
• Links: a set of connection links, each carrying a weight,
including a bias always having weight 1.
• Adder: adds the inputs after they are multiplied by their
respective weights.
• Activation function: limits the output of the neuron. The most basic
activation function is a Heaviside step function with two
possible outputs: it returns 1 if the input is positive,
and 0 for any negative input.
Training Algorithm:
• Perceptron network can be trained for single output unit as well
as multiple output units.
Training Algorithm for Single Output Unit
Step 1 − Initialize the following to start the training −
- Weights - Bias - Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0
and the learning rate must be set equal to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Now obtain the net input with the following relation −
yin = b + ∑i xi wi
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output −
f(yin) = 1 if yin > θ; 0 if −θ ≤ yin ≤ θ; −1 if yin < −θ
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
• wi(new)=wi(old)+αtxi
• b(new)=b(old)+αt
Case 2 − if y = t then,
• wi(new)=wi(old)
• b(new)=b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which would happen
when there is no change in weight.
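Steps 1-8 above can be sketched in Python. The zero initialization, α = 1 and stopping rule follow the text; the AND data, the function name and the epoch cap are assumptions:

```python
def perceptron_train(samples, theta=0.0, alpha=1.0, max_epochs=100):
    """Single-output perceptron training (steps 1-8, a sketch).

    samples is a list of (x, t) pairs with bipolar targets.
    Weights and bias start at 0, as the text suggests.
    """
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            yin = b + sum(xi * wi for xi, wi in zip(x, w))
            # step activation with threshold theta
            y = 1 if yin > theta else (-1 if yin < -theta else 0)
            if y != t:                                  # Case 1: update
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                                 # stopping condition
            return w, b
    return w, b

# AND function in bipolar encoding (illustrative data)
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = perceptron_train(data)
```

On this data the loop stops after two epochs with w = [1, 1] and b = −1, which classifies all four patterns correctly.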
Training Algorithm for Multiple Output Units
Step 1 − Initialize the following to start the training −
Weights, Bias, Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning
rate must be set equal to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −
yinj = bj + ∑i xi wij
Here ‘bj’ is the bias on output unit j and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output for each
output unit j = 1 to m −
f(yinj) = 1 if yinj > θ; 0 if −θ ≤ yinj ≤ θ; −1 if yinj < −θ
Step 7 − Adjust the weight and bias
for i = 1 to n and j = 1 to m as
follows −
Case 1 − if yj ≠ tj then,
wij(new)=wij(old)+αtjxi
bj(new)=bj(old)+αtj
Case 2 − if yj = tj then,
wij(new)=wij(old)
bj(new)=bj(old)
Here ‘y’ is the actual output and ‘t’ is
the desired/target output.
Step 8 − Test for the stopping
condition, which will happen when
there is no change in weight.
Adaptive Linear Neuron (Adaline)
• Adaline, which stands for Adaptive
Linear Neuron, is a network having a
single linear unit. It was developed by
Widrow and Hoff in 1960. Some
important points about Adaline are as
follows −
• It uses bipolar activation function.
• It uses delta rule for training to
minimize the Mean-Squared Error
(MSE) between the actual output and
the desired/target output.
• The weights and the bias are
adjustable.
Architecture:
• The basic structure of Adaline is
similar to perceptron having an extra
feedback loop with the help of which
the actual output is compared with the
desired/target output. After comparison
on the basis of training algorithm, the
weights and bias will be updated.
Training Algorithm
Step 1 − Initialize the following to start the training −
Weights, Bias, Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to
0 and the learning rate must be set equal to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −
yin = b + ∑i xi wi
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output −
y = f(yin) = 1 if yin ≥ 0; −1 if yin < 0
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new)=wi(old) + α(t−yin)xi
b(new)=b(old) + α(t−yin)
Case 2 − if y = t then,
wi(new)=wi(old)
b(new)=b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
(t−yin) is the computed error.
Step 8 − Test for the stopping condition, which will happen when there
is no change in weight or the highest weight change occurred during
training is smaller than the specified tolerance.
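A sketch of Adaline's delta-rule training on the same bipolar AND data (illustrative; a smaller α than the text's 1 is used here so the mean-squared error actually shrinks):

```python
def adaline_train(samples, alpha=0.1, tol=1e-4, max_epochs=1000):
    """Adaline training with the delta rule (a sketch).

    Stops when the largest weight change in an epoch is below `tol`,
    or after max_epochs. Weights and bias start at 0.
    """
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        biggest = 0.0
        for x, t in samples:
            yin = b + sum(xi * wi for xi, wi in zip(x, w))
            err = t - yin                        # (t − y_in), the computed error
            for i in range(n):
                dw = alpha * err * x[i]          # Δw_i = α(t − y_in)x_i
                w[i] += dw
                biggest = max(biggest, abs(dw))
            b += alpha * err                     # Δb = α(t − y_in)
            biggest = max(biggest, abs(alpha * err))
        if biggest < tol:                        # stopping condition
            break
    return w, b

# bipolar AND (illustrative data)
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = adaline_train(data)
```

The weights settle near the least-squares solution (about 0.5, 0.5 with bias −0.5), and the bipolar sign of the net input then classifies all four patterns correctly.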
Multiple Adaptive Linear Neuron (Madaline)
• Madaline, which stands for Multiple
Adaptive Linear Neuron, is a network of
many Adalines in parallel. It has a single
output unit.
• It is just like a multilayer perceptron,
where the Adalines act as hidden units
between the input and the Madaline layer.
• The weights and the bias between the input
and Adaline layers are adjustable.
• The Adaline and Madaline layers have
fixed weights and a bias of 1.
• Training can be done with the help of the
Delta rule.
• Architecture
• The architecture of Madaline consists
of “n” neurons of the input
layer, “m” neurons of the Adaline layer,
and 1 neuron of the Madaline layer. The
Adaline layer can be considered as the
hidden layer as it is between the input
layer and the output layer, i.e. the
Madaline layer.
Training Algorithm

 We know that only the weights and bias between the input and the
Adaline layer are to be adjusted, and the weights and bias between the
Adaline and the Madaline layer are fixed.
Step 1 − Initialize the following to start the training −
Weights Bias Learning rate α
For easy calculation and simplicity, weights and bias must be set equal
to 0 and the learning rate must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-7 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
xi=si(i=1 to n)
Step 5 − Obtain the net input at each hidden (Adaline) unit
with the following relation −
Qinj = bj + ∑i xi wij, j = 1 to m
Step 6 − Apply the following activation function to obtain the final output at the
Adaline and the Madaline layer −
f(v) = 1 if v ≥ 0; −1 if v < 0
Output at the hidden (Adaline) unit: Qj = f(Qinj)
Final output of the network: y = f(yin), where yin = b0 + ∑j Qj vj
Step 7 − Calculate the error and adjust the weights as follows −


Case 1 − if y ≠ t and t = 1 then,
wij(new)=wij(old)+α(1−Qinj)xi
bj(new)=bj(old)+α(1−Qinj)
In this case, the weights would be updated on Qj where the net input is close to
0 because t = 1.
Case 2 − if y ≠ t and t = -1 then,
wik(new)=wik(old)+α(−1−Qink)xi
bk(new)=bk(old)+α(−1−Qink)
In this case, the weights would be updated on Qk where the net
input is positive because t = -1.
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Case 3 − if y = t then
There would be no change in weights.
Step 8 − Test for the stopping condition, which will happen when
there is no change in weight or the highest weight change
occurred during training is smaller than the specified
tolerance.
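A sketch of the Madaline forward pass. The hidden weights here are hand-picked to realise XOR (roughly what the training above would arrive at); the fixed output unit with weights 0.5 and bias 0.5 acts as an OR of the Adalines, as the architecture prescribes:

```python
def bipolar_step(v):
    """Bipolar activation: 1 if v >= 0, else -1."""
    return 1 if v >= 0 else -1

def madaline_forward(x, adaline_w, adaline_b, v, b_out):
    """Madaline forward pass: adjustable input→Adaline weights,
    fixed Adaline→output weights (an OR unit here)."""
    z = [bipolar_step(b + sum(wi * xi for wi, xi in zip(w, x)))
         for w, b in zip(adaline_w, adaline_b)]
    yin = b_out + sum(vj * zj for vj, zj in zip(v, z))
    return bipolar_step(yin)

# hand-picked weights that realise XOR (illustrative, not learned here):
# Adaline 1 detects x1 AND NOT x2, Adaline 2 detects NOT x1 AND x2
W_hidden = [(0.5, -0.5), (-0.5, 0.5)]
b_hidden = [-0.5, -0.5]
v, b_out = (0.5, 0.5), 0.5     # fixed output weights: OR of the Adalines
```

With these weights the network outputs 1 exactly when the two bipolar inputs differ, i.e. it computes XOR, which a single Adaline cannot do.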
Back Propagation Neural Networks
• A Back Propagation Network (BPN) is a
multilayer neural network consisting of
the input layer, at least one hidden layer
and an output layer.
• As its name suggests, back-propagation
takes place in this network. The error,
which is calculated at the output layer by
comparing the target output and the actual
output, is propagated back towards the
input layer.
Architecture:
• The architecture of BPN has three
interconnected layers having weights on
them.
• The hidden layer as well as the output
layer also has bias, whose weight is
always 1, on them.
• The working of BPN is in two phases:
One phase sends the signal from the
input layer to the output layer, and the
other phase back propagates the error
from the output layer to the input layer.
Training Algorithm

• For training, BPN will use binary sigmoid activation function. The training of BPN will
have the following three phases.
Phase 1 − Feed Forward Phase
Phase 2 − Back Propagation of error
Phase 3 − Updating of weights
All these steps will be concluded in the algorithm as follows
Step 1 − Initialize the following to start the training −
Weights Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue step 3-11 when the stopping condition is not true.
Step 3 − Continue step 4-10 for every training pair.
Phase 1
Step 4 − Each input unit receives input signal xi and sends it to the hidden unit for all i = 1 to n
Step 5 − Calculate the net input at the hidden unit using the following relation −
Qinj = b0j + ∑i xi vij, j = 1 to p
• Here b0j is the bias on hidden unit, vij is the weight on j unit of the hidden layer coming
from i unit of the input layer.
Now calculate the net output by applying the following activation
function −
Qj = f(Qinj) = 1 / (1 + e^−Qinj)
Send these output signals of the hidden layer units to the output
layer units.
Step 6 − Calculate the net input at the output layer unit using the
following relation −
yink = b0k + ∑j Qj wjk, k = 1 to m
• Here b0k is the bias on output unit, wjk is the weight on k unit of
the output layer coming from j unit of the hidden layer.
• Calculate the net output by applying the same activation function −
yk = f(yink)
Phase 2
Step 7 − Compute the error-correcting term, in correspondence with the target
pattern received at each output unit, as follows −
δk = (tk − yk) f′(yink)
On this basis, update the weight and bias as follows −
Δwjk = α δk Qj and Δb0k = α δk
Then, send δk back to the hidden layer.
Step 8 − Now each hidden unit sums its delta inputs from the output units,
δinj = ∑k δk wjk. The error term is δj = δinj f′(Qinj), and the weight and bias
updates are Δvij = α δj xi and Δb0j = α δj.
Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates its weight and bias as follows −
wjk(new)=wjk(old)+Δwjk
b0k(new)=b0k(old)+Δb0k
Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weight and bias as follows −
vij(new)=vij(old)+Δvij
b0j(new)=b0j(old)+Δb0j
Step 11 − Check for the stopping condition, which may be either the number of
epochs reached or the target output matches the actual output.
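The three phases can be sketched as a small BPN with one output unit. The binary-sigmoid activation and small random initial weights follow the text; the AND task, function names and hyper-parameters are illustrative choices:

```python
import math, random

def sigmoid(v):
    """Binary sigmoid activation used by the BPN."""
    return 1.0 / (1.0 + math.exp(-v))

def train_bpn(samples, n_hidden=2, alpha=0.5, epochs=5000, seed=1):
    """BPN training: feed-forward, back-propagate error, update weights."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    # small random initial weights, as the text suggests
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
    b0 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]   # hidden biases
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b1 = rng.uniform(-0.5, 0.5)                              # output bias
    for _ in range(epochs):
        for x, t in samples:
            # Phase 1: feed forward
            q = [sigmoid(b0[j] + sum(x[i] * v[i][j] for i in range(n)))
                 for j in range(n_hidden)]
            y = sigmoid(b1 + sum(q[j] * w[j] for j in range(n_hidden)))
            # Phase 2: back-propagate the error (f'(yin) = y(1−y))
            dk = (t - y) * y * (1 - y)                       # δ_k
            dj = [dk * w[j] * q[j] * (1 - q[j]) for j in range(n_hidden)]
            # Phase 3: update weights and biases
            for j in range(n_hidden):
                w[j] += alpha * dk * q[j]
                b0[j] += alpha * dj[j]
                for i in range(n):
                    v[i][j] += alpha * dj[j] * x[i]
            b1 += alpha * dk
    def predict(xq):
        q = [sigmoid(b0[j] + sum(xq[i] * v[i][j] for i in range(n)))
             for j in range(n_hidden)]
        return sigmoid(b1 + sum(q[j] * w[j] for j in range(n_hidden)))
    return predict

# learn AND (an easy target; XOR also works but may need more epochs)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
predict = train_bpn(data)
```

After training, the output is above 0.5 only for input (1, 1).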
Learning factors of Back-Propagation Network:
- Initial Weights
- Learning Rate
- Momentum factor
- Generalization
- Number of Training data
- Number of Hidden layer nodes
Merits of BPN
1. The mathematical formula can be applied to
any network and does not require any
special mention of the features of the
function to be learnt.
2. The computing time is reduced if weights
chosen are small at the beginning.
3. A batch update of the weights exists, which
provides a smoothing effect on the weight
correction term.
Demerits of BPN
1. The number of learning steps may be high and
also the learning phase has intensive calculation
2. The selection of the no. of hidden nodes in the
network is a problem.
- if the no. of hidden neurons is small, then the
function to be learnt may not be possibly
represented as the capacity of the network is
small
- if the no. of hidden neurons increases, it
increases the complexity of the network
3. For complex problems, it may require days or
weeks to train the network or it may not train
the network at all. Long training time results
in non-optimum step size
4. The network may get trapped in a local
minima even through there is a much deeper
minimum nearby
Applications of BPN
1. Optical character recognition
2. Image compression
3. Data compression
4. Control problems
5. Fault detection problems
Generalized Delta Learning Rule
Delta rule works only for the output layer. On the other hand,
generalized delta rule, based on back-propagation rule, is a
way of creating the desired values of the hidden layer.
Mathematical Formulation
For the activation function yk = f(yink), the net input on the
hidden layer as well as on the output layer can be
given by
yink = ∑j zj wjk
and zinj = ∑i xi vij
• After that, the weights are updated as needed, and the chain
rule is used to propagate the error back through these net inputs.
Tree neural Network
• TNN are used for pattern recognition.
• The main concept of this network is to use a small multilayer NN at each
decision-making node of a binary classification tree for extracting the
non-linear features.
• TNN extract the power of tree classifiers for using appropriate local
features at different levels and nodes of the tree.
Algorithm for a TNN consists of two phases:
1.) Tree growing phase: A large tree is grown by recursively finding the rules
for splitting until all the terminal nodes have pure or nearly pure class
membership, or cannot be split further.
2.) Tree pruning phase: A smaller tree is selected from the pruned sub-trees
to avoid overfitting the data.
 In the inner optimization problem, BPN algorithm is used to train the data.
 In outer optimization problem, a heuristic search method is used to find a
good pair of classes
Max Net
• This is also a fixed-weight network,
which serves as a subnet for
selecting the node having the highest
input. All the nodes are fully
interconnected and there exist
symmetrical weights on all these
weighted interconnections.
• It uses the mechanism which is an
iterative process and each node
receives inhibitory inputs from all
other nodes through connections.
The single node whose value is
maximum would be active or winner
and the activations of all other nodes
would be inactive. Max Net uses
identity activation function.
• The task of this net is accomplished
by the self-excitation weight of +1
and a mutual-inhibition weight of −ε,
where 0 < ε < 1/m and m is the number of nodes.
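A sketch of the Max Net iteration (the ε value and initial activations are illustrative; ε must satisfy 0 < ε < 1/m for m nodes):

```python
def maxnet(a, eps=0.15, max_iters=100):
    """Max Net: self-excitation +1, mutual inhibition −ε.

    Each step: a_i ← f(a_i − ε · Σ_{j≠i} a_j), where f is the identity
    for positive input and 0 otherwise. Iterates until only the node
    with the largest initial activation stays active.
    """
    f = lambda v: v if v > 0 else 0.0   # identity for positive input, else 0
    a = list(a)
    for _ in range(max_iters):
        total = sum(a)
        new = [f(ai - eps * (total - ai)) for ai in a]
        # stop once converged or a single winner remains active
        if new == a or sum(v > 0 for v in new) <= 1:
            return new
        a = new
    return a

result = maxnet([0.2, 0.4, 0.6, 0.8])   # node 3 (value 0.8) should win
```

After a few iterations only the node that started with the maximum input (here the last one) keeps a positive activation; all others are driven to 0.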
Neocognitron
• It is a multilayer feedforward network,
based on supervised learning, and is
used for visual pattern recognition,
mainly of hand-written characters.
Architecture
• It is a hierarchical network, which
comprises many layers and there is a
pattern of connectivity locally in those
layers.
• As in the diagram, neocognitron is
divided into different connected layers
and each layer has two cells which are
as follows −
• S-Cell − It is called a simple cell,
which is trained to respond to a
particular pattern or a group of
patterns.
• C-Cell − It is called a complex cell,
which combines the output from the S-cells
and simultaneously lessens the number
of units in each array. In another sense,
the C-cell displaces the result of the S-cell.
Training Algorithm
• Training of neocognitron is found to be progressed layer by layer. The
weights from the input layer to the first layer are trained and frozen.
Then, the weights from the first layer to the second layer are trained,
and so on. The internal calculations between S-cell and C-cell depend
upon the weights coming from the previous layers. Hence, the
training algorithm depends upon the calculations on S-cell and C-cell.
Calculations in S-cell
• The S-cell receives an excitatory signal from the previous
layer and inhibitory signals obtained within the same layer.
Calculations in C-cell
• The net input of C-layer is
C=∑sixi
Adaptive Resonance Theory
• It is based on competition and uses an
unsupervised learning model. Adaptive
Resonance Theory (ART) networks, as the name
suggests, are always open to new
learning (adaptive) without losing the old
patterns (resonance).
• Basically, an ART network is a vector classifier
which accepts an input vector and classifies it
into one of the categories depending upon which
of the stored patterns it resembles the most.
Operating Principle
The main operation of ART classification can be divided into the
following phases:
• Recognition phase − The input vector is compared with the
classification presented at every node in the output layer. The
output of the neuron becomes “1” if it best matches with the
classification applied, otherwise it becomes “0”.
• Comparison phase − In this phase, a comparison of the input
vector to the comparison layer vector is done. The condition for
reset is that the degree of similarity would be less than vigilance
parameter.
• Search phase − In this phase, the network searches for a reset as
well as for the match done in the above phases. Hence, if there is
no reset and the match is quite good, the classification is
over. Otherwise, the process is repeated and the other stored
patterns must be sent to find the correct match.
ART1

• It is a type of ART, which is designed to cluster binary vectors.

Architecture of ART1
It consists of the following two units −
Computational Unit − It is made up of the following :
• Input unit (F1 layer) − It further has the following two portions −
– F1a layer Input portion − In ART1, there is no processing in this portion; it
simply holds the input vectors. It is connected to the F1b layer interface portion.
– F1b layer Interface portion − This portion combines the signal from the input portion
with that of the F2 layer. The F1b layer is connected to the F2 layer through bottom-up
weights bij, and the F2 layer is connected to the F1b layer through top-down weights tji.
• Cluster Unit (F2 layer) − This is a competitive layer. The unit having the
largest net input is selected to learn the input pattern. The activation of all other
cluster unit are set to 0.
• Reset Mechanism − The work of this mechanism is based upon the similarity
between the top-down weight and the input vector. If the degree of this
similarity is less than the vigilance parameter, then the cluster is not allowed to
learn the pattern and a reset happens.
• Supplement Unit − Actually
the issue with the reset mechanism
is that the layer F2 must
be inhibited under certain
conditions and must also be
available when some learning
happens. That is why two
supplemental units,
namely G1 and G2, are added
along with the reset unit R. They
are called gain control units.
These units receive and send
signals to the other units present
in the network. ‘+’ indicates an
excitatory signal,
while ‘−’ indicates an inhibitory
signal.
Associative Memory Network
• Associative memory network can store a set of
patterns as memories.
• It is presented with a key pattern and responds by
producing one of the stored patterns, which
closely resembles or relates to the key pattern
• Thus, the recall is through association of the key
pattern, with the help of information memorized.
• These types of memories are also called
content-addressable memories.
• Pattern association is the process of forming
associations between related patterns.
• The patterns to be associated may be of the
same type or of different types.
• Associative memory nets are simplified models
of the human brain which can associate similar
patterns.
• Associative neural nets are single-layer nets in
which the weights are determined so as to store a
set of pattern associations.
• Associative nets are of two types:
- If the input vector pair is the same as the output
vector pair, the result is an Auto-associative
Net.
- If the input vector pair is different from the
output vector pair, it is a Hetero-associative
Net.
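A hetero-associative store/recall sketch using the Hebb outer-product rule (the patterns are illustrative; mutually orthogonal keys make the recall exact, and a noisy key still recalls the right association):

```python
import numpy as np

# Hetero-associative memory: store two bipolar pattern pairs.
S = np.array([[1, -1, 1, -1],    # key patterns (mutually orthogonal)
              [1, 1, -1, -1]])
T = np.array([[1, -1],           # patterns associated with each key
              [-1, 1]])
W = S.T @ T                      # Hebb rule: w_ij = Σ_p s_i(p) t_j(p)

def recall(x):
    """Recall the stored association for a (possibly noisy) key pattern."""
    return np.where(np.asarray(x) @ W >= 0, 1, -1)
```

Because the keys are orthogonal, `recall(S[0])` returns exactly `T[0]`; flipping one bit of the key still recovers the same association, which is the content-addressable behaviour described above.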
Hopfield Network
• Neural networks were designed on analogy with
the brain.
• The brain’s memory, however, works by
association.
• For example, we can recognize a familiar face even
in an unfamiliar environment within 100-200 ms.
• We can also recall a complete sensory experience,
including sounds and scenes, when we hear only a
few bars of music. The brain routinely associates
one thing with another.
Hopfield Network
• Multilayer neural networks trained with the back-
propagation algorithm are used for pattern
recognition problems.
• However, to emulate the human memory’s
associative characteristics we need a different type
of network: a recurrent neural network.
• A recurrent neural network has feedback loops
from its outputs to its inputs. The presence of such
loops has a profound impact on the learning
capability of the network.
• The Hopfield network is probably the second
most popular type of neural network after the
back-propogation model.
• It is based on Hebbian learning but uses
binary neurons.
• The Hopfield network can be used as
- Associative Memory
-Optimization Problems
• The basic idea of Hopfield network is that
it can store a set of exemplar patterns as
multiple stable states.
• Given a new input pattern which may be
partial or noisy, the network can converge to
one of the exemplar pattern that is nearest to
the input pattern.
• This is the basic concept of applying the
Hopfield network as associative memory.
Single-layer n-neuron Hopfield network
(Figure: a single layer of neurons 1…n, with input signals x1…xn and
output signals y1…yn; every neuron is connected to every other.)
• Hopfield network consists of a single layer of
neurons 1, 2, 3, 4……n
• The network is fully interconnected that is
every neurons in the network is connected to
every other neuron.
• The network is recurrent that is it has feed
backward capabilities.
• Each input/output xi, yi takes discrete
bipolar values, either 1 or −1.
• Each edge is associated with a weight wij which
satisfies the following conditions:
- The net has symmetrical weights with no self-
connections, i.e. the diagonal elements of the
weight matrix of a Hopfield net are zero:
wij = wji and wii = 0
- The Hopfield network is classified under
supervised learning since, at the beginning, it is
given correct exemplar patterns by a teacher.
Algorithm
• The weights to be used for the application algorithm
are obtained from the training algorithm.
• The activations are initialized from the input vector.
• The net input is calculated and the activation
function is applied.
• The output is calculated and broadcast to all other units.
• The process is repeated until the convergence of the net
is obtained.
Step 1: Initialize weights to store pattern (use Hebb Rule)
while activation of net are not converge perform step 2 to 8
Step 2: For each input vector x, repeat step 3 to 7
Step 3: Set initial activation of the net equal to the external
input vector x
yi = xi (i=1,2,3…n)
Step 4: Perform steps 5 to 7 for each yi
Step 5: Compute the net input
yin i = xi + ∑j yj wji
Step 6: Determine the activation (output signal)
yi = 1 if yin i > θ; yi unchanged if yin i = θ; 0 if yin i < θ
Step 7: Broadcast the value of yi to all other
units
Step 8: Test for convergence
The value of the threshold θ is usually taken to
be zero.
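Steps 1-8 can be sketched as follows (a bipolar variant is used here, i.e. −1 in place of the 0 output; the patterns and helper names are illustrative):

```python
import numpy as np

def hopfield_store(patterns):
    """Step 1: Hebb-rule storage — symmetric weights, zero diagonal."""
    P = np.array(patterns)
    W = P.T @ P
    np.fill_diagonal(W, 0)        # w_ii = 0
    return W

def hopfield_recall(W, x, theta=0.0, max_sweeps=20):
    """Steps 2-8: asynchronous recall from a (possibly noisy) input x."""
    y = np.array(x, dtype=float)  # step 3: initial activation = input
    for _ in range(max_sweeps):
        prev = y.copy()
        for i in range(len(y)):
            yin = x[i] + float(np.dot(y, W[:, i]))  # y_in,i = x_i + Σ_j y_j w_ji
            if yin > theta:
                y[i] = 1
            elif yin < theta:
                y[i] = -1         # bipolar variant; the text's version outputs 0
            # yin == theta: activation unchanged
        if np.array_equal(y, prev):                 # step 8: convergence
            break
    return y
```

Storing two orthogonal patterns and presenting a one-bit-corrupted key converges back to the nearest stored exemplar, which is the associative-memory behaviour described above.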
The Hopfield Network
•Example: Image reconstruction
•A 20×20 discrete Hopfield network was trained with 20
input patterns, including the one shown in the left figure and
19 random patterns like the one on the right.
The Hopfield Network
•After providing only one fourth of the “face” image
as initial input, the network is able to perfectly
reconstruct that image within only two iterations.
Problems with Hopfield Model
• It can only store a very limited number of
different patterns.
• Even so, the Hopfield model constitutes an interesting
neural approach to identifying partially occluded (closed-off)
objects and objects in noisy images.
KOHONEN MODEL
• The Kohonen model is used for
- Self Organization
- Competitive Learning
• Human learning is not limited to supervised and reinforcement learning
• For example:
A baby gains a tremendous knowledge early on, such as how mom,
dad and other objects around the baby look, sound and smell. The
baby does not learn this by being told what is correct and what is not.
• In other words, human have the ability to learn without being
supervised or graded.
• Such type of learning is called unsupervised learning
• The term unsupervised learning refers to a method in which the neural
network can learn by itself without external information during its
learning.
Self-Organising feature map
• Our brain is dominated by the cerebral cortex (outer
layer of brain), a very complex structure of billions of
neurons and hundreds of billions of synapses.
• The cortex includes areas that are responsible for
different human activities (motor, visual, auditory,
sensory, etc.), and associated with different sensory
inputs.
• In brain each sensory input is mapped into a
corresponding area of the cerebral cortex.
• The cortex is a self-organising computational map
in the human brain.
• Self Organization is an unsupervised learning method,
where neural network organizes itself to form useful
information.
• Competitive Learning
• In Competitive learning, neurons or connecting edges
compete with each other.
• The winner of the competition strengthen their weights
while the loser’s weights are unchanged or weakened
• By applying competitive learning, the neural network
can self-organize and achieve unsupervised learning
Architecture of Kohonen Network
(Figure: Kohonen network — input-layer neurons x1, x2 fully connected
to output-layer neurons y1, y2, y3.)
Kohonen Network
• It is a network of two layers, the first is the input layer
and second is the output layer called Kohonen Layer
• The neurons on the Kohonen layer are called Kohonen
Neurons
• Every input neurons are connected to every Kohonen
neurons with a variable associated weight.
• The network is feed-forward
• Input values representing patterns are presented
sequentially in time through the input layer, without
specifying desired output.
• The neurons in the Kohonen layer are arranged in
a one-dimensional or a two-dimensional array.
• In both cases, a neighborhood parameter or radius (r)
can be defined to indicate the neighborhood of a
specific neuron.
• The key principle for map formation is that training
should take place over an extended region of the
network centered on the maximally active node.
Training Algorithm
• Initially weights and learning rate are set
• The input vectors to be clustered are presented to
the network
• Once an input vector is presented, the winner unit is
calculated from the current weights, either by the
Euclidean distance method or by the sum-of-products
method
• Based on the winner unit selection, the weights are
updated for the particular winner unit using
competitive learning
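The training steps above can be sketched as a small one-dimensional Kohonen layer (the data, unit count, shrinking-radius schedule and random seed are illustrative choices):

```python
import numpy as np

def kohonen_train(X, n_units=4, alpha=0.5, radius=1, epochs=20, seed=0):
    """1-D Kohonen (self-organising) layer, a sketch.

    Winner by Euclidean distance; the winner and its neighbours within
    `radius` move toward the input. The radius shrinks to 0 halfway
    through training, as is usual for map formation.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 1.0, size=(n_units, X.shape[1]))  # initial weights
    for epoch in range(epochs):
        r = radius if epoch < epochs // 2 else 0           # shrink radius
        for x in X:
            j = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner unit
            for k in range(max(0, j - r), min(n_units, j + r + 1)):
                W[k] += alpha * (x - W[k])   # competitive weight update
    return W

# two clusters of 2-D inputs (illustrative data)
X = np.array([[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0]])
W = kohonen_train(X)
```

After training, each input has some unit whose weight vector lies close to it, i.e. the units have organised themselves around the two clusters without any desired output being specified.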