274 - Soft Computing LECTURE NOTES
ON
PRINCIPLES OF SOFT COMPUTING
Introduction
Basics of Soft Computing
• The idea of soft computing was initiated in 1981 when Lotfi A. Zadeh published his first paper on soft data analysis. [See also: L. A. Zadeh, "What is Soft Computing?", Soft Computing, Springer-Verlag, Germany/USA, 1997.]
• Zadeh defined Soft Computing as one multidisciplinary system: the fusion of the fields of Fuzzy Logic, Neuro-Computing, Evolutionary and Genetic Computing, and Probabilistic Computing.
• Soft Computing is the fusion of methodologies designed to model and enable solutions to
real world problems, which are not modeled or too difficult to model mathematically.
• The aim of Soft Computing is to exploit the tolerance for imprecision, uncertainty,
approximate reasoning, and partial truth in order to achieve close resemblance with
human like decision making.
• The Soft Computing development history can be summarized as
SC = EC + NN + FL (Soft Computing = Evolutionary Computing + Neural Networks + Fuzzy Logic)
EC = GP + ES + EP + GA (Evolutionary Computing = Genetic Programming + Evolution Strategies + Evolutionary Programming + Genetic Algorithms)
Definitions of Soft Computing (SC)
Lotfi A. Zadeh, 1992 : "Soft Computing is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision".
Hybridization of these constituent fields creates a successful synergistic effect; that is, hybridization creates a situation where different entities cooperate advantageously for a final outcome.
Hence, a clear definite agreement on what comprises Soft Computing has not yet been reached.
More new sciences are still merging into Soft Computing.
• The main goal of Soft Computing is to develop intelligent machines to provide solutions
to real world problems, which are not modeled, or too difficult to model mathematically.
• Its aim is to exploit the tolerance for Approximation, Uncertainty, Imprecision, and
Partial Truth in order to achieve close resemblance with human like decision making.
Approximation : here the model features are similar to the real ones, but not the same.
Uncertainty : here we are not sure that the features of the model are the same as that of the
entity (belief).
Imprecision : here the model features (quantities) are not the same as that of the real ones,
but close to them.
Soft Computing exploits these tolerances to achieve tractability, robustness and low solution cost. In effect, the role model for soft computing is the human mind.
The four fields that constitute Soft Computing (SC) are : Fuzzy Computing (FC),
Evolutionary Computing (EC), Neural computing (NC), and Probabilistic
Computing (PC), with the latter subsuming belief networks, chaos theory and parts
of learning theory.
Fuzzy Computing
In the real world there exists much fuzzy knowledge, that is, knowledge which
is vague, imprecise, uncertain, ambiguous, inexact, or probabilistic in nature.
Humans can use such information because human thinking and reasoning frequently involve fuzzy information, possibly originating from inherently inexact human concepts and from matching of similar rather than identical experiences.
Computing systems based upon classical set theory and two-valued logic cannot answer some questions the way a human does, because such questions do not have completely true answers.
We want computing systems that not only give human-like answers but also describe their reality levels. These levels need to be calculated using the imprecision and the uncertainty of the facts and rules that were applied.
Fuzzy Sets
If A is a crisp set defined on the space X, then we can state explicitly whether each element x of space X "is or is not" an element of A. Its characteristic function is
A : X → {0, 1}
A(x) = 1 if x ∈ A, 0 otherwise     (Eq. 2)
− Thus, in classical set theory A(x) has only the values 0 ('false') and 1 ('true'). Such sets are called crisp sets.
• Crisp and Non-crisp Set
The characteristic function A(x) of Eq. (2) for a crisp set is generalized for non-crisp sets; the generalized function is called the membership function.
− The proposition of fuzzy sets is motivated by the need to capture and represent real-world data with uncertainty due to imprecise measurement.
• Example 1 : Heap Paradox
This example represents a situation where vagueness and uncertainty are inevitable.
- If we remove one grain from a heap of grains, we will still have a heap.
- However, if we keep removing one-by-one grain from a heap of grains, there will be a
time when we do not have a heap anymore.
- The question is, at what time does the heap turn into a countable collection of grains
that do not form a heap? There is no one correct answer to this question.
■ Non-Crisp Representation to represent the notion of a tall person.
A student of height 1.79 m would belong to both the tall and the not-tall sets with a particular degree of membership. As the height increases, the membership grade within the tall set increases whilst the membership grade within the not-tall set decreases.
• Capturing Uncertainty
Therefore, Crisp Sets ⊆ Fuzzy Sets. In other words, crisp sets are special cases of fuzzy sets.
Example 1 : Set of prime numbers (a crisp set).
Example 2 : Set of SMALL numbers (a non-crisp set).
• Definition of Fuzzy Set
The value A(x) is the membership grade of the element x in a fuzzy set A.
Example : Consider the fuzzy set SMALL of small numbers, defined on the universal space of integers 1 to 12.
Assume:
SMALL(1) = 1, SMALL(2) = 1, SMALL(3) = 0.9, SMALL(4) = 0.6,
SMALL(5) = 0.4, SMALL(6) = 0.3, SMALL(7) = 0.2, SMALL(8) = 0.1,
SMALL(u) = 0 for u >= 9.
Set SMALL = {{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0}, {10, 0}, {11, 0}, {12, 0}}
Note that a fuzzy set can be defined precisely by associating with each x its grade of membership in SMALL.
Originally the universal space for fuzzy sets in fuzzy logic was defined only
on the integers. Now, the universal space for fuzzy sets and fuzzy
relations is defined with three numbers. The first two numbers specify the
start and end of the universal space, and the third argument specifies the
increment between elements. This gives the user more flexibility in
choosing the universal space.
The fuzzy set SMALL of small numbers, defined in the universal space X = { xi } = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, is presented as
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
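To make the notation concrete, here is a minimal sketch (Python, not part of the original notes) of the fuzzy set SMALL as a mapping from each element of the universal space to its membership grade; the helper name membership_grade is our own.

```python
# Fuzzy set over the universal space {1, ..., 12}: element -> membership grade.
SMALL = {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.6, 5: 0.4, 6: 0.3,
         7: 0.2, 8: 0.1, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0}

def membership_grade(fuzzy_set, x):
    """Return the grade of membership of x; elements outside the
    universal space are taken to have grade 0."""
    return fuzzy_set.get(x, 0.0)

print(membership_grade(SMALL, 3))   # 0.9
print(membership_grade(SMALL, 9))   # 0.0
```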
• Graphic Interpretation of Fuzzy Sets : PRIME Numbers
PRIME = FuzzySet [{{1, 0}, {2, 1}, {3, 1}, {4, 0}, {5, 1}, {6, 0}, {7, 1}, {8, 0}, {9, 0}, {10, 0}, {11, 1}, {12, 0}}, UniversalSpace → {1, 12, 1}]
• Graphic Interpretation of Fuzzy Sets : UNIVERSALSPACE
In any application of set or fuzzy set theory, all sets are subsets of a fixed set called the universal space, here defined as
UNIVERSALSPACE = FuzzySet [{{1, 1}, {2, 1}, {3, 1}, {4, 1}, {5, 1}, {6, 1}, {7, 1}, {8, 1}, {9, 1}, {10, 1}, {11, 1}, {12, 1}}, UniversalSpace → {1, 12, 1}]
Finite and Infinite Universal Space
• Graphic Interpretation of Fuzzy Sets : EMPTY
EMPTY = FuzzySet [{{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}, {6, 0}, {7, 0}, {8, 0}, {9, 0}, {10, 0}, {11, 0}, {12, 0}}, UniversalSpace → {1, 12, 1}]
Fuzzy Operations
Fuzzy set operations are operations on fuzzy sets; they are a generalization of the crisp set operations. Zadeh [1965] formulated fuzzy set theory in terms of the standard operations : Complement, Union, Intersection, and Difference.
Inclusion : FuzzyInclude [VERYSMALL, SMALL]
Equality : FuzzyEQUALITY [SMALL, STILLSMALL]
Complement : FuzzyNOTSMALL = FuzzyComplement [SMALL]
Union : FuzzyUNION = [SMALL ∪ MEDIUM]
Intersection : FUZZYINTERSECTION = [SMALL ∩ MEDIUM]
• Inclusion
The fuzzy set A is included in the fuzzy set B if and only if, for every x in the space X, A(x) ≤ B(x).
Example :
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
VERYSMALL = FuzzySet [{{1, 1}, {2, 0.8}, {3, 0.7}, {4, 0.4}, {5, 0.2}, {6, 0.1}, {7, 0}, {8, 0}, {9, 0}, {10, 0}, {11, 0}, {12, 0}}, UniversalSpace → {1, 12, 1}]
Include [VERYSMALL, SMALL]
• Comparability
Two fuzzy sets are comparable if one of them is a subset of the other set; otherwise they are incomparable.
Example : D is not a subset of C and C is not a subset of D, hence C and D are not comparable.
• Equality
Let A and B be fuzzy sets defined in the same space X. Then A and B are equal, denoted A = B, if and only if A(x) = B(x) for all x in X.
Example.
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
STILLSMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4},
{6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
• Complement
Let A and B be fuzzy sets defined in the same space X. Then the fuzzy set B is a complement of the fuzzy set A if and only if B(x) = 1 − A(x) for all x in X.
Example 1.
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
NOTSMALL = FuzzySet {{1, 0 }, {2, 0 }, {3, 0.1}, {4, 0.4}, {5, 0.6}, {6, 0.7},
{7, 0.8}, {8, 0.9}, {9, 1 }, {10, 1 }, {11, 1}, {12, 1}}
Example 2.
The empty set Φ and the universal set X, as fuzzy sets, are complements of one another : Φ' = X and X' = Φ.
Empty = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0}, {5, 0}, {6, 0},
{7, 0}, {8, 0}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
Universal = FuzzySet {{1, 1 }, {2, 1 }, {3, 1}, {4, 1}, {5, 1}, {6, 1},
{7, 1}, {8, 1}, {9, 1 }, {10, 1 }, {11, 1}, {12, 1}}
• Union
The union is defined as the smallest fuzzy set that contains both A and
B. The union of A and B is denoted by A ∪ B.
A(x) = 0.6 and B(x) = 0.4 ∴ (A ∪ B)(x) = max [0.6, 0.4] = 0.6
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
MEDIUM = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8},
{7, 1}, {8, 1}, {9, 0.7 }, {10, 0.4 }, {11, 0.1}, {12, 0}}
FUZZYUNION = [SMALL ∪ MEDIUM]
The notion of the union is closely related to that of the connective "or".
If "David is Young" or "David is Bald," then David is associated with the
union of A and B. Implies David is a member of A ∪ B.
■ Identity :
A ∪ Φ = A
input = Equality [SMALL ∪ EMPTY, SMALL]
output = True
A ∪ X = X
input = Equality [SMALL ∪ UniversalSpace, UniversalSpace]
output = True
■ Idempotence :
A ∪ A = A
input = Equality [SMALL ∪ SMALL, SMALL]
output = True
■ Commutativity :
A ∪ B = B ∪ A
input = Equality [SMALL ∪ MEDIUM, MEDIUM ∪ SMALL]
output = True
■ Associativity :
A ∪ (B ∪ C) = (A ∪ B) ∪ C
output = True
MEDIUM = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8},
{7, 1}, {8, 1}, {9, 0 }, {10, 0 }, {11, 0.1}, {12, 0}}
BIG = FuzzySet [{{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}, {6, 0.1}, {7, 0.2}, {8, 0.4}, {9, 0.6}, {10, 0.8}, {11, 1}, {12, 1}}]
• Intersection
The intersection is defined as the greatest fuzzy set included in both A and B. The intersection of A and B is denoted by A ∩ B.
A(x) = 0.6 and B(x) = 0.4 ∴ (A ∩ B)(x) = min [0.6, 0.4] = 0.4
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
MEDIUM = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8},
{7, 1}, {8, 1}, {9, 0.7 }, {10, 0.4 }, {11, 0.1}, {12, 0}}
FUZZYINTERSECTION = [SMALL ∩ MEDIUM] = FuzzySet [{{1, 0}, {2, 0}, {3, 0}, {4, 0.2}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0}, {10, 0}, {11, 0}, {12, 0}}, UniversalSpace → {1, 12, 1}]
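The standard operations above reduce to element-wise formulas: complement 1 − A(x), union max(A(x), B(x)), intersection min(A(x), B(x)). A small illustrative Python sketch (function names are ours, not from the notes):

```python
# Fuzzy sets over {1, ..., 12} as dicts of membership grades.
SMALL  = {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.6, 5: 0.4, 6: 0.3,
          7: 0.2, 8: 0.1, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0}
MEDIUM = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.2, 5: 0.5, 6: 0.8,
          7: 1.0, 8: 1.0, 9: 0.7, 10: 0.4, 11: 0.1, 12: 0.0}

def fuzzy_complement(a):
    # NOT A : membership 1 - A(x)
    return {x: 1.0 - m for x, m in a.items()}

def fuzzy_union(a, b):
    # A OR B : element-wise maximum
    return {x: max(a[x], b[x]) for x in a}

def fuzzy_intersection(a, b):
    # A AND B : element-wise minimum
    return {x: min(a[x], b[x]) for x in a}

print(fuzzy_union(SMALL, MEDIUM)[4])         # 0.6
print(fuzzy_intersection(SMALL, MEDIUM)[4])  # 0.2
print(fuzzy_complement(SMALL)[3])            # 0.1 (up to float rounding)
```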
Neural Computing
Biological Model:
The human brain consists of a large number (more than a billion) of neural
cells that process information. Each cell works like a simple processor. The
massive interaction between all cells and their parallel processing makes the brain's abilities possible. The structure of a neuron is shown below.
Myelin Sheath consists of fat-containing cells that insulate the axon from electrical
activity. This insulation acts to increase the rate of transmission of signals. A gap
exists between each myelin sheath cell along the axon. Since fat inhibits the
propagation of electricity, the signals jump from one gap to the next.
30
Nodes of Ranvier are the gaps (about 1 µm) between myelin sheath cells along the axon. Since fat serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical impulse along the axon.
Synapse is the point of connection between two neurons or a neuron and a muscle or
a gland. Electrochemical communication between neurons takes place at these
junctions.
Terminal Buttons of a neuron are the small knobs at the end of an axon that
release chemicals called neurotransmitters.
• Information flow in a Neural Cell
The input /output and the propagation of information are shown below.
Artificial Neuron
The signals build up in the cell. Finally the cell fires (discharges)
through the output. The cell can start building up signals again.
• Functions :
• McCulloch-Pitts (M-P) Neuron Equation
• Basic Elements of an Artificial Neuron
Weighting Factors
Threshold
The neuron output is y = Σ (i=1 to n) Xi Wi − Φk , where Φk is the threshold.
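As a quick illustration of the equation y = Σ Xi Wi − Φk, the following sketch (ours, not from the notes) computes the weighted sum of the inputs minus the threshold and then applies a hard-limit (binary) activation, which is an assumption made here for the example:

```python
def neuron_output(inputs, weights, threshold):
    """McCulloch-Pitts style unit: weighted sum minus threshold,
    followed by a binary hard-limit activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return 1 if net >= 0 else 0

# With weights (1, 1) and threshold 1.5 the unit realizes the AND function
# (compare the AND example in the back-propagation section of these notes).
print(neuron_output([1, 1], [1, 1], 1.5))  # 1
print(neuron_output([1, 0], [1, 1], 1.5))  # 0
```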
Activation Function
• Example :
y (threshold) = 1
• Single and Multi - Layer Perceptrons
The neurons are shown as circles in the diagram. Each neuron has several inputs and a single output. The neurons have gone under various names.
Fig. Single-layer and multi-layer perceptron structures : inputs, outputs, and (for the multi-layer perceptron) hidden layers.
• Perceptron
Genetic Algorithms
Genetic Algorithms (GAs) were invented by John Holland in early 1970's to mimic
some of the processes observed in natural evolution.
Later in 1992 John Koza used GAs to evolve programs to perform certain tasks.
He called his method "Genetic Programming" (GP).
GAs simulate the survival of the fittest among individuals over consecutive generations for solving a problem. Each generation consists of a population of character strings that are analogous to the chromosomes in our DNA
(Deoxyribonucleic acid). DNA contains the genetic instructions used in the
development and functioning of all known living organisms.
GAs direct the search into the region of better performance within the search space.
■ In nature, competition among individuals for scanty resources results in the fittest individuals dominating over the weaker ones.
• Why Genetic Algorithms
■ Each gene represents a specific trait (feature) of the organism and has
several different settings, e.g. setting for a hair color gene may be
black or brown.
■ When two organisms mate they share their genes. The resultant
offspring may end up having half the genes from one parent and half
from the other parent. This process is called crossover
(recombination).
Mutation means that elements of the DNA are changed a little. This change is mainly caused by errors in copying genes from parents.
Artificial Evolution and Search Optimization
• Taxonomy of Evolution & Search Optimization Classes
■ Enumerative Methods
These are the traditional search and control strategies. They search for a solution in a problem space within the domain of artificial intelligence. There are many control structures for search; the depth-first search and breadth-first search are the two most common ones. Search methods fall into two categories : uninformed and informed methods.
■ Calculus based techniques
■ Guided Random Search techniques
Evolutionary Algorithms (EAs)
EAs generate new solutions that are biased towards regions of the search space in which good solutions have already been found. Each feasible solution is marked by its value or fitness for the problem.
- Searching for a solution point means finding which one point (or more) among the many feasible solution points in the search space is the solution. This requires looking for some extremes, a minimum or a maximum.
- The search space can be fully known, but usually we know only a few points, and other points are generated as the process of finding the solution continues.
- The search can be very complicated; one does not know where to look for the solution or where to start.
- What we find is some suitable solution, not necessarily the best one. The solution found is often considered a good solution, because it is often not possible to prove what the real optimum is.
Associative Memory
Description of Associative Memory
■ A content-addressable memory is a type of memory that allows the recall of data based on the degree of similarity between the input pattern and the patterns stored in memory.
■ It refers to a memory organization in which the memory is accessed by its content, as opposed to an explicit address as in the traditional computer memory system.
■ This type of memory allows the recall of information based on partial knowledge of its contents.
■ It is a system that "associates" two patterns (X, Y) such that when one is encountered the other can be recalled. There are two classes of such memory :
- auto-associative and
- hetero-associative.
Adaptive Resonance Theory (ART)
Note : The terms nearest and closer are defined in many ways in clustering
algorithm. In ART, these two terms are defined in slightly different way by
introducing the concept of "resonance".
• Definitions of ART and other types of Learning
The unsupervised learning means that the network learns the significant
patterns on the basis of the inputs only. There is no feedback. There is no
external teacher that instructs the network or tells to which category a
certain input belongs. Learning in biological systems always starts as
unsupervised learning; Example : For the newly born, hardly any pre-
existing categories exist.
In reinforced learning, the teacher provides only limited feedback, like "on this input you performed well" or "on this input you have made an error". In the supervised mode of learning, the teacher provides the exact desired output for every input.
• Description of Adaptive Resonance Theory
− a reset module.
■ Recognition field : the best match is the single neuron whose set of weights (weight vector) matches the input vector most closely.
■ Vigilance parameter
■ Reset module
After the input vector is classified, the Reset module compares the
strength of the recognition match with the vigilance parameter.
- The Recognition neurons are disabled one by one by the reset function
until the vigilance parameter is satisfied by a recognition match.
There are two basic methods, the slow and fast learning.
- Fast learning method : here the algebraic equations are used to calculate
degree of weight adjustments to be made, and binary values are used.
Note : While fast learning is effective and efficient for a variety of tasks,
the slow learning method is more biologically plausible and can be used
with continuous-time networks (i.e. when the input vector can vary
continuously).
■ ART 1: The simplest variety of ART networks, accept only binary inputs.
■ Fuzzy ART : It implements fuzzy logic in ART’s pattern recognition, thus enhancing its ability to generalize. One very useful feature of fuzzy ART is
complement coding, a means of incorporating the absence of features into
pattern classifications, which goes a long way towards preventing
inefficient and unnecessary category proliferation.
Applications of Soft Computing
The relevance of soft computing for pattern recognition and image processing has already been established during the last few years. The subject has recently gained importance because of its potential applications in problems like :
- Medical Imaging,
- Forensic Applications,
- Signature Verification,
- Multimedia,
- Target Recognition,
Fundamentals of Neural Networks
Introduction
Why Neural Network
■ Conventional computers are good at fast arithmetic and at doing precisely what the programmer asks them to do.
Research History
The history is relevant because for nearly two decades the future of Neural
network remained uncertain.
McCulloch and Pitts (1943) are generally recognized as the designers of the
first neural network. They combined many simple processing units together
that could lead to an overall increase in computational power. They suggested
many ideas like : a neuron has a threshold level and once that level is
reached the neuron fires. It is still the fundamental way in which ANNs
operate. The McCulloch and Pitts's network had a fixed set of weights.
In the 1950s and 60's, many researchers (Block, Minsky, Papert, and Rosenblatt) worked on the perceptron. The perceptron model could be proved to converge to the correct weights, that is, the weights that solve the problem. The weight adjustment (learning algorithm) used in the perceptron was found to be more powerful than the learning rules used by Hebb. The perceptron caused great excitement; it was thought to produce programs that could think.
Minsky & Papert (1969) showed that perceptron could not learn those
functions which are not linearly separable.
Neural network research declined throughout the 1970s and until the mid-1980s because the perceptron could not learn certain important functions.
Neural networks regained importance in 1985-86, when the researchers Parker and LeCun discovered a learning algorithm for multi-layer networks, called back-propagation, that could solve problems that are not linearly separable.
Biological Neuron Model
The human brain consists of a large number, more than a billion of neural
cells that process information. Each cell works like a simple processor. The
massive interaction between all cells and their parallel processing only
makes the brain's abilities possible.
Myelin Sheath consists of fat-containing cells that insulate the axon from electrical
activity. This insulation acts to increase the rate of transmission of signals. A gap
exists between each myelin sheath cell along the axon. Since fat inhibits the
propagation of electricity, the signals jump from one gap to the next.
Nodes of Ranvier are the gaps (about 1 µm) between myelin sheath cells along the axon. Since fat serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical impulse along the axon.
Synapse is the point of connection between two neurons or a neuron and a muscle or
a gland. Electrochemical communication between neurons takes place at these
junctions.
Terminal Buttons of a neuron are the small knobs at the end of an axon that
release chemicals called neurotransmitters.
The input /output and the propagation of information are shown below.
■ Axons act as transmission lines to send activation to other neurons.
■ A set of input connections brings in activations from other neurons.
■ A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing / transfer / threshold function).
■ An output line transmits the result to other neurons.
In other words ,
Single Layer Feed-forward Network
The weighted sum of the inputs is calculated in each neuron node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1).
Fig. Single-layer feed-forward network : inputs xi, outputs yj, and weights wij connecting every input to every output.
Multi Layer Feed-forward Network
Fig. Multilayer feed-forward network in (ℓ – m – n) configuration.
■ The input layer neurons are linked to the hidden layer neurons; the weights on these links are referred to as input-hidden layer weights.
■ The hidden layer neurons are linked to the output layer neurons; the weights on these links are referred to as hidden-output layer weights.
■ A multi-layer feed-forward network with ℓ input neurons, m1 neurons in
the first hidden layers, m2 neurons in the second hidden layers, and n
output neurons in the output layers is written as (ℓ - m1 - m2 – n ).
Recurrent Networks
Example :
Learning Methods in Neural Networks
The learning methods in neural networks are classified into three basic types :
• Supervised Learning,
• Unsupervised Learning and
• Reinforced Learning.
These are further categorized, based on the rules used, as
• Hebbian,
• Gradient descent,
• Competitive and
• Stochastic learning.
• Classification of Learning Algorithms
Fig. Classification of neural network learning algorithms.
■ Supervised Learning : a teacher is present and provides the desired output for each input.
■ Unsupervised Learning : no teacher is present.
■ Reinforced Learning : a teacher is present, but gives only limited (right/wrong) feedback.
Note : The supervised and unsupervised learning methods are the most popular forms of learning compared to reinforced learning.
• Hebbian Learning
In this rule, the input-output pattern pairs (Xi , Yi) are associated by the weight matrix W, known as the correlation matrix, computed as
W = Σ (i=1 to n) Xi YiT
There are many variations of this rule proposed by the other researchers
(Kosko, Anderson, Lippman) .
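A minimal sketch of the correlation-matrix form of Hebbian learning above, using NumPy (an illustration, not code from the notes); each pattern pair contributes the outer product Xi YiT:

```python
import numpy as np

def hebbian_correlation_matrix(X_patterns, Y_patterns):
    """W = sum_i Xi Yi^T, the correlation (Hebbian) weight matrix
    associating the input-output pattern pairs."""
    W = np.zeros((X_patterns[0].size, Y_patterns[0].size))
    for Xi, Yi in zip(X_patterns, Y_patterns):
        W += np.outer(Xi, Yi)          # Hebbian outer-product contribution
    return W

X = [np.array([1, -1, 1]), np.array([-1, 1, 1])]
Y = [np.array([1, -1]), np.array([-1, 1])]
print(hebbian_correlation_matrix(X, Y))
```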
• Gradient descent Learning
• If ∆Wij is the weight update of the link connecting the i th and the j th neurons of the two neighbouring layers, then
∆Wij = η (∂E / ∂Wij)
where η is the learning rate parameter and ∂E / ∂Wij is the error gradient with reference to the weight Wij.
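The update above can be sketched as follows (illustrative Python; the quadratic error function here is our own example, not from the notes). Note that for error minimization the weight is moved against the gradient, hence the minus sign in the code:

```python
def gradient_descent_step(w, grad_E, eta=0.1):
    """Move each weight against its error gradient by a step of size eta."""
    return [wi - eta * gi for wi, gi in zip(w, grad_E)]

# Example: minimize E(w) = (w0 - 3)^2 + (w1 + 1)^2,
# whose gradient is (2*(w0 - 3), 2*(w1 + 1)).
w = [0.0, 0.0]
for _ in range(100):
    grad = [2 * (w[0] - 3), 2 * (w[1] + 1)]
    w = gradient_descent_step(w, grad, eta=0.1)
print(w)   # close to [3.0, -1.0]
```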
• Competitive Learning
• Stochastic Learning
• Taxonomy Of Neural Network Systems
In the previous sections, the Neural Network Architectures and the Learning
methods have been discussed. Here the popular neural network
systems are listed. The grouping of these systems in terms of architectures and
the learning methods are presented in the next slide.
– AM (Associative Memory)
– Boltzmann machines
– BSB ( Brain-State-in-a-Box)
– Cauchy machines
– Hopfield Network
– Neocognitron
77
– Perceptron
• Classification of Neural Network Systems with respect to Learning Methods and Architectures
Fig. Table grouping popular systems (e.g. Perceptron, RBF networks, Hopfield network, Boltzmann machines) under the learning methods gradient descent, Hebbian, competitive and stochastic.
■ Single-Layer NN Systems
Fig. Single-layer perceptron : inputs xi, outputs yj, weights wij.
The output is computed as
y j = f (net j) = 1 if net j ≥ 0, and 0 if net j < 0
where net j = Σ (i=1 to n) xi wij
■ Learning Algorithm : Training Perceptron
− If the output is correct, then no adjustment of the weights is made,
i.e. Wij(K+1) = Wij(K)
− If the output is 1 but should have been 0, then the weights are decreased on the active input links,
i.e. Wij(K+1) = Wij(K) − α · xi
− If the output is 0 but should have been 1, then the weights are increased on the active input links,
i.e. Wij(K+1) = Wij(K) + α · xi
where Wij(K) is the old weight, Wij(K+1) is the new adjusted weight, xi is the input, and α is the learning rate parameter.
A small α leads to slow learning and a large α leads to fast learning.
■ Definition : Sets of points in 2-D space are linearly separable if the sets
can be separated by a straight line.
Example :
Fig. Two sets of points S1 and S2 in 2-D space that are separable by a straight line.
Note : Perceptron cannot find weights for classification problems that are
not linearly separable.
• XOR Problem : Exclusive OR operation
X1  X2  Output
0    0    0   (even parity)
0    1    1
1    0    1
1    1    0   (even parity)
Fig. The four input points (0,0), (0,1), (1,0), (1,1) plotted in the x1, x2 plane; the even-parity points are drawn as circles and the odd-parity points as dots.
■ There is no way to draw a single straight line so that the circles are on one side of the line and the dots on the other side.
SC - Neural Network –Single Layer learning
■ Step 1 :
Create a perceptron with (n+1) input neurons x0, x1, . . . , xn, where x0 = 1 is the bias input. Let O be the output neuron.
■ Step 2 :
Initialize the weights W = (w0, w1, . . . , wn) to random values.
■ Step 3 :
Iterate through the input patterns Xj of the training set using the weight set, i.e. compute the weighted sum of inputs net j = Σ (i=0 to n) xi wi for each input pattern j.
■ Step 4 :
Compute the output yj using the step function
yj = f (net j) = 1 if net j ≥ 0, and 0 if net j < 0
■ Step 5 :
Compare the computed output yj with the target output yj for each input pattern j. If all the input patterns have been classified correctly, then output (read) the weights and exit.
■ Step 6 :
Otherwise, update the weights :
If the computed output yj is 1 but should have been 0, then wi = wi − α xi , i = 0, 1, 2, . . . , n
If the computed output yj is 0 but should have been 1, then wi = wi + α xi , i = 0, 1, 2, . . . , n
■ Step 7 :
Go to Step 3.
■ END
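A compact sketch of the training loop in Steps 1-7 (Python; the AND data set at the end is a hypothetical example used only for illustration):

```python
def train_perceptron(patterns, targets, alpha=0.5, max_epochs=100):
    """patterns: list of input vectors (without bias); targets: 0/1 labels.
    x0 = 1 acts as the bias input, w0 as the (negative) threshold."""
    n = len(patterns[0])
    w = [0.0] * (n + 1)                       # Steps 1-2: (n+1) weights
    for _ in range(max_epochs):               # Step 7: repeat
        all_correct = True
        for x, t in zip(patterns, targets):
            xb = [1.0] + list(x)              # prepend bias input x0 = 1
            net = sum(xi * wi for xi, wi in zip(xb, w))   # Step 3
            y = 1 if net >= 0 else 0          # Step 4
            if y != t:                        # Steps 5-6: adjust on error
                all_correct = False
                sign = alpha if t == 1 else -alpha
                w = [wi + sign * xi for wi, xi in zip(w, xb)]
        if all_correct:
            break
    return w

# Hypothetical example: learn the AND function.
print(train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1]))
```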
SC - Neural Network –ADALINE
• ADALINE Training Mechanism
■ The basic structure of an ADALINE is similar to a linear neuron with an extra feedback loop. During the training phase, the input pattern and the desired output are presented to the network.
■ The weights are adaptively adjusted based on the delta rule.
■ After the ADALINE is trained, an input vector presented to the network with fixed weights results in a scalar output.
■ Thus, the network performs an n-dimensional mapping to a scalar value.
■ The activation function is not used during the training phase. Once the weights are properly adjusted, the response of the trained unit can be tested by applying various inputs which are not in the training set. If the network produces responses consistent to a high degree with the test inputs, it is said that the network could generalize. The processes of training and generalization are two important attributes of this network.
Usage of ADALINE :
In practice, an ADALINE is used to make binary decisions; the output is sent through a binary threshold.
• Clustering
• Classification / Pattern recognition
• Function approximation
• Prediction systems
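A minimal sketch of ADALINE training with the delta (LMS) rule mentioned above: the linear output is used during training and a binary threshold only afterwards (illustrative Python; the learning rate, epoch count and data are assumptions made for the example):

```python
def train_adaline(patterns, targets, eta=0.05, epochs=50):
    """Delta rule: w <- w + eta * (t - y_linear) * x, using the linear
    output during training (no activation function)."""
    n = len(patterns[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            y_lin = sum(xi * wi for xi, wi in zip(x, w)) + b
            err = t - y_lin
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

def adaline_decision(x, w, b):
    # After training, the response is sent through a binary threshold.
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else -1

w, b = train_adaline([(1, 1), (1, -1), (-1, 1), (-1, -1)], [1, -1, -1, -1])
print(adaline_decision((1, 1), w, b))   # 1
```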
Back-Propagation Network
What is BPN ?
A single-layer network has severe restrictions; a multi-layer network can overcome many of them, but for a long time researchers did not present a solution to the question "how to adjust the weights from input to hidden layer"? An answer was found by Rumelhart, Hinton and Williams in 1986. The central idea behind this solution is that the errors for the units of the hidden layer are determined by back-propagating the errors of the units of the output layer.
1. Back-Propagation Network – Background
Making reasonable predictions about what is missing from the information available is a difficult task when there is no good theory to help reconstruct the missing data. It is in such situations that Back-propagation (Back-Prop) networks may provide some answers. A BackProp network consists of
- an input layer,
- at least one intermediate hidden layer, and
- an output layer.
• With BackProp networks, learning occurs during a training phase.
The steps followed during learning are :
− each input pattern in a training set is applied to the input units and then
propagated forward.
− the error signal for each such target output pattern is then back-propagated from the outputs to the inputs in order to adjust the weights in each layer of the network.
− after a BackProp network has learned the correct classification for a set
of inputs, it can be tested on a second set of inputs to see how well it
classifies untrained patterns.
• An important consideration
in applying BackProp learning is how
1.1 Learning :
AND function
if both weights are set to 1 and the threshold is set to 1.5, then
(1)(0) + (1)(0) < 1.5 assign 0 , (1)(0) + (1)(1) < 1.5 assign 0
(1)(1) + (1)(0) < 1.5 assign 0 , (1)(1) + (1)(1) > 1.5 assign 1
1.2 Simple Learning Machines
The net input is computed as the weighted sum of the inputs, net = Σ (i=1 to n) xi wi. If the net input is greater than the threshold θ, then the output unit is turned on (output 1); otherwise it is turned off (output 0). The difference between the actual output of the network and the target (desired) output is used as the error measure for learning.
• Error Measure ( learning rule )
― If the input vector is correctly classified (i.e., zero error), then the weights are left unchanged, and
― If the input vector is incorrectly classified (i.e., not zero error), then there are two cases to consider :
- if the output is 1 but should have been 0, the weights on the active inputs are decreased by 1, and
- if the output is 0 but should have been 1, the weights on the active inputs are increased by 1.
SC - NN – BPN – Background
∆ θ = - (tp - op) = - dp
SC - NN - BPN – Background
− all points on the right side of the line are +ve, therefore the output of
the neuron should be +ve.
− all points on the left side of the line are –ve, therefore the output of the
neuron should be –ve.
• Back Propagation Network
Learning By Example
The weight of the arc from the i th input neuron to the j th hidden layer neuron is Vij.
The weight of the arc from the i th hidden neuron to the j th output layer neuron is Wij.
2.1 Computation of Input, Hidden and Output Layers
Since the transfer function of the input layer is 1 (linear), the output of the input layer equals its input :
{ O }I = { I }I     (ℓ x 1)
(ℓ x 1 denotes the matrix row, column size)
- Let Vij be the weight of the arc between i th input neuron to j th hidden
layer.
■ The input to the hidden neuron is the weighted sum of the outputs of
the input neurons. Thus the equation
{ I }H = [ V ]T { O }I     (m x 1)
SC - NN – Back Propagation Network
Shown below is the p th neuron of the hidden layer. It has input from the outputs of the input layer neurons. If we consider the transfer function to be sigmoidal, then the output of the p th hidden neuron is given by
OHp = 1 / (1 + e^(−λ (IHp − θHp)))
where OHp is the output of the p th hidden neuron, IHp is the input of the p th hidden neuron, and θHp is the threshold of the p th neuron.
Treating each component of the input of the hidden neuron separately, we
get the outputs of the hidden neuron as given by above equation .
The input to the output neuron is the weighted sum of the outputs of the
hidden neurons. Accordingly, Ioq the input to the qth output neuron is
given by the equation
{ I }O = [ W ]T { O }H     (n x 1)
Shown below is the q th neuron of the output layer. It has input from the outputs of the hidden layer neurons.
OOq = 1 / (1 + e^(−λ (IOq − θOq)))
where OOq is the output of the q th output neuron, IOq is the input to the q th output neuron, and θOq is the threshold of the q th neuron.
2.2 Calculation of Error
Consider any r th output neuron. For the target output value Tor, given in the table of 'nset' input and output data pairs used for training, calculate the output Oor.
The Euclidean norm of error E¹ for the first training pattern is given by
E¹ = (1/2) Σ (r=1 to n) (Tor − Oor)²
This error function is for one training pattern. If we use the same technique for all the training patterns, we get
E (V, W) = Σ (j=1 to nset) E j (V, W, I)
The error function E (V, W) thus depends on the ℓ x m weights [V] and the m x n weights [W]. Training the network means finding the network parameters that minimize the error function E over the 'nset' training patterns.
SC - NN - BPN – Algorithm
• Back-Propagation Algorithm
The benefits of hidden layer neurons have been explained. The hidden layer
allows ANN to develop its own internal representation of input-output mapping.
The complex internal representation capability allows the hierarchical network
to learn any mapping and not just the linearly separable ones.
The network considered here has three layers : the input layer with ℓ nodes, the hidden layer with m nodes and the output layer with n nodes. An example of training a BPN with a five-pattern training set is shown for better understanding.
The basic algorithm loop structure and the step-by-step procedure of the back-propagation algorithm are illustrated in the next few slides :
Repeat
    { present a training pattern, propagate it forward, compute the error, back-propagate the error and adjust the weights }
End
• Step 1 :
Normalize the inputs and outputs with respect to their maximum values. For each training pair, assume that in normalized form there are ℓ inputs given by { I }I (ℓ x 1) and n outputs given by { O }O (n x 1).
• Step 2 :
Assume the number of neurons m in the hidden layer.
• Step 3 :
Let [V]⁰ = [ random weights ]
[W]⁰ = [ random weights ]
[∆V]⁰ = [∆W]⁰ = [0]
For general problems, λ can be assumed as 1 and the threshold value as 0.
• Step 4 :
For the training data, present one set of inputs and outputs. Then, by using the linear activation function, the output of the input layer may be evaluated as
{ O }I = { I }I     (ℓ x 1)
• Step 5 :
Compute the inputs to the hidden layer by multiplying the corresponding weights :
{ I }H = [ V ]T { O }I     (m x 1)
• Step 6 :
Let the hidden layer units evaluate the output using the sigmoidal function as
{ O }H = 1 / (1 + e^(−IHi))     (m x 1)
• Step 7 :
Compute the inputs to the output layer by multiplying the corresponding weights :
{ I }O = [ W ]T { O }H     (n x 1)
■ Step 8 :
Let the output layer units evaluate the output using the sigmoidal function as
{ O }O = 1 / (1 + e^(−IOj))     (n x 1)
■ Step 9 :
Calculate the error, using the difference between the network output and the desired output, for the j th training set as
EP = √( ∑ (Tj − OOj)² / n )
■ Step 10 :
Find a term { d } as
{ d } = (Tk − OOk) OOk (1 − OOk)     (n x 1)
■ Step 11 :
Find the [ Y ] matrix as
[ Y ] = { O }H ⟨ d ⟩     (m x n)
■ Step 12 :
Find [∆W]^(t+1) = α [∆W]^t + η [ Y ]     (m x n)
■ Step 13 :
Find { e } = [ W ] { d }     (m x 1)
{ d* } = { ei (OHi) (1 − OHi) }     (m x 1)
Find the [ X ] matrix as
[ X ] = { O }I ⟨ d* ⟩ = { I }I ⟨ d* ⟩     (ℓ x m)
• Step 14 :
Find [∆V]^(t+1) = α [∆V]^t + η [ X ]     (ℓ x m)
• Step 15 :
Find [V]^(t+1) = [V]^t + [∆V]^(t+1)
[W]^(t+1) = [W]^t + [∆W]^(t+1)
• Step 16 :
Find the error rate as
error rate = ∑ EP / nset
• Step 17 :
Repeat Steps 4 to 16 until the convergence in the error rate is less than the tolerance value.
• End of Algorithm
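The following sketch condenses Steps 1-17 into a single training loop for an ℓ-m-n network with sigmoidal hidden and output units (NumPy; λ = 1, zero thresholds, and the momentum factor α are assumptions matching the algorithm above; this is an illustration, not the notes' own implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bpn(I, T, m=2, eta=0.6, alpha=0.5, iters=2000):
    """I: (nset x l) normalized inputs, T: (nset x n) targets, m hidden neurons."""
    l, n = I.shape[1], T.shape[1]
    V = np.random.uniform(-1, 1, (l, m))       # Step 3: input-hidden weights
    W = np.random.uniform(-1, 1, (m, n))       #         hidden-output weights
    dV, dW = np.zeros_like(V), np.zeros_like(W)
    for _ in range(iters):
        for x, t in zip(I, T):
            O_I = x                             # Step 4: linear input layer
            O_H = sigmoid(V.T @ O_I)            # Steps 5-6
            O_O = sigmoid(W.T @ O_H)            # Steps 7-8
            d = (t - O_O) * O_O * (1 - O_O)     # Step 10
            Y = np.outer(O_H, d)                # Step 11
            dW = alpha * dW + eta * Y           # Step 12
            e = W @ d                           # Step 13
            d_star = e * O_H * (1 - O_H)
            X = np.outer(O_I, d_star)           # Step 14: the [X] matrix
            dV = alpha * dV + eta * X
            V, W = V + dV, W + dW               # Step 15
    return V, W

# Tiny illustration on the single training pair of the worked example below.
V, W = train_bpn(np.array([[0.4, -0.7]]), np.array([[0.1]]))
print(sigmoid(W.T @ sigmoid(V.T @ np.array([0.4, -0.7]))))  # approaches 0.1
```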
SC - NN - BPN – Algorithm
• Problem :
Consider the training data : inputs I1 = 0.4, I2 = -0.7 and target output TO = 0.1.
In this problem,
■ the values lie between -1 and +1, i.e., there is no need to normalize the values.
■ Step 1 : Input the first training set data
{ O }I = { I }I = {0.4, -0.7}T     (2 x 1)
■ Step 2 : Initialize the weights as
[V]⁰ = {{0.1, 0.4}, {-0.2, 0.2}}     (2 x 2) ;     [W]⁰ = {0.2, -0.5}T     (2 x 1)
■ Step 3 : Find { I }H = [ V ]T { O }I (ref. eq. of Step 5) :
{ I }H = {0.18, 0.02}T
{ O }H = { 1/(1 + e^(−0.18)) , 1/(1 + e^(−0.02)) }T = {0.5448, 0.505}T
{ I }O = [ W ]T { O }H = (0.2)(0.5448) + (−0.5)(0.505) = −0.14354
{ O }O = 1 / (1 + e^(0.14354)) = 0.4642
d = (TO − OO) OO (1 − OO) = (0.1 − 0.4642)(0.4642)(1 − 0.4642) = −0.09058
[ Y ] = { O }H ⟨ d ⟩ = {0.5448, 0.505}T (−0.09058) = {−0.0493, −0.0457}T
[∆W]¹ = α [∆W]⁰ + η [ Y ]     (assume η = 0.6)
= {−0.02958, −0.02742}T
{ e } = [ W ] { d } = {0.2, −0.5}T (−0.09058) = {−0.018116, 0.04529}T
{ d* } = { ei (OHi) (1 − OHi) }     (2 x 1)
[ X ] = { O }I ⟨ d* ⟩ = {{−0.001796, 0.004528}, {0.003143, −0.007924}}
[∆V]¹ = α [∆V]⁰ + η [ X ] = {{−0.001077, 0.002716}, {0.001885, −0.004754}}
• Step 15 :
[V]¹ = [V]⁰ + [∆V]¹ = {{0.0989, 0.4027}, {−0.1981, 0.1952}}
[W]¹ = [W]⁰ + [∆W]¹
• Step 16 :
Iterations are carried out till we get the error less than the tolerance.
• Step 17 :
Once the weights are adjusted, the network is ready for inferencing on new objects.
Associative Memory
• An auto-associative memory retrieves the previously stored pattern that most closely resembles the current input pattern.
• A hetero-associative memory retrieves a stored pattern which may be different from the input pattern not only in content but possibly also in type and format.
SC - AM description
1. Associative Memory
An associative memory is a content-addressable structure that maps a set of input patterns to a set of output patterns. The retrieved pattern may be different from the input pattern not only in content but possibly also in type and format.
Auto-associative memory
Hetero-associative memory
1.2 Working of Associative Memory
■ Example
Consider an associative memory that stores the pattern pairs (∆ , Γ ), ( , +) and (7 , 4). When the memory is triggered with an input pattern, say ∆, then the associated pattern Γ is retrieved automatically.
1.3 Associative Memory Classes
There are two classes of associative memory :
• auto-associative and
• hetero-associative memory.
1.4 Related Terms
■ Encoding or memorization
The weight matrix for an associated pattern pair (Xk , Yk) is computed from the pattern components, where (xi)k represents the i th component of pattern Xk and (yj)k represents the j th component of pattern Yk, for i = 1, 2, . . . , m and j = 1, 2, . . . , n. The overall weight matrix is obtained by summing the individual correlation matrices :
W = α Σ (k=1 to p) Wk
where α is a proportionality (normalizing) constant.
• Retrieval or recollection
Given an input pattern X, compute
input j = Σ (i=1 to m) xi wij
where input j is the weighted sum of the inputs, i.e. the activation value of node j, for j = 1, 2, . . . , n. The output Yj is then determined using a bipolar output function :
Y j = +1 if input j ≥ θ j , −1 otherwise
where θ j is the threshold value of output neuron j.
■ Errors and noise
The input pattern may contain errors and noise, or may be an incomplete
version of some previously encoded pattern.
When a corrupted input pattern is presented, the network will retrieve the
stored pattern that is closest to actual input pattern.
Thus, associative memories are robust and fault tolerant because of many
processing elements performing highly parallel and distributed
computations.
- Performance Measures
2. Associative Memory Models
The popular models are Hopfield Model and Bi-directional Associative Memory
(BAM) model.
The simplest and among the first studied associative memory models is the linear associator; it possesses a very low memory capacity and is therefore not much used.
The network architectures of these three models (linear associator, Hopfield network and BAM) are described in the next few slides.
In this section, the neural network architectures of these models and the
construction of the corresponding connection weight matrix W of the
associative memory are illustrated.
■ Linear Associator Model (two layers)
The linear associator consists of two layers of processing units, one serving as the input layer while the other as the output layer. The inputs are directly connected to the outputs via a series of weights. The links carrying weights connect every input to every output. The sum of the products of the weights and the inputs is calculated in each output node.
Fig. Linear associator : inputs x1 . . . xn, outputs y1 . . . ym, weights wij.
− all n input units are connected to all m output units via the connection weight matrix W; when an input pattern is presented, the stored pattern associated with the input pattern is retrieved.
− Encoding : The process of constructing the connection weight matrix is called encoding. During encoding the weight values of the correlation matrix Wk for an associated pattern pair (Xk , Yk) are computed as
(wij)k = (xi)k (yj)k
where (xi)k is the i th component of pattern Xk and (yj)k is the j th component of pattern Yk. The overall connection weight matrix is obtained by summing those individual correlation matrices Wk, i.e.,
W = α Σ (k=1 to p) Wk
− Decoding : Given an input pattern X, compute
input j = Σ (i=1 to m) xi wij
where input j stands for the weighted sum of the input or activation value of node j, for j = 1, 2, . . . , n, and xi is the i th component of pattern Xk, and then determine the unit's output using a bipolar output function :
Y j = +1 if input j ≥ θ j , −1 otherwise
143
where θ j is the threshold value of output neuron j .
Note : The output units behave like linear threshold units, computing a weighted sum of the input and producing -1 or +1 depending on whether the weighted sum is below or above a certain threshold value.
■ Auto-associative Memory Model : Hopfield model (single layer)
In the Hopfield model every unit is connected to every other unit in the network but not to itself, i.e. wii = 0, for i, j = 1, 2, . . . , m. The input to node j is
input j = Σ (i=1 to m) xi wij + I j     for j = 1, 2, . . . , m
where I j is an external input to node j.
− Each unit acts as both an input and an output unit. Like the linear associator, the connection weight matrix is obtained by summing the individual correlation matrices,
W = α Σ (k=1 to p) Wk     where α is a proportionality constant.
− Decoding : After memorization, the network can be used for retrieval; the process of retrieving a stored pattern is called decoding. Given an input pattern X, compute
input j = Σ (i=1 to m) xi wij
where input j stands for the weighted sum of the input or activation value of node j, for j = 1, 2, . . . , n, and xi is the i th component of pattern Xk, and then determine the unit's output using a bipolar output function :
Y j = +1 if input j ≥ θ j , −1 otherwise
Note : The output units behave like linear threshold units, computing a weighted sum of the input and producing -1 or +1 depending on whether the weighted sum is below or above a certain threshold value.
This decoding process is repeated until the network stabilizes on a stored pattern, where further computations do not change the output of the units.
■ Bidirectional Associative Memory (two-layer)
Fig. Bidirectional associative memory : input layer x1 . . . xn, output layer y1 . . . ym, and bidirectional weighted links wij between the two layers.
− In the bidirectional associative memory, a single associated pattern pair (Xk , Yk) is stored as its correlation matrix Wk, and the overall weight matrix is obtained by summing these,
i.e., W = α Σ (k=1 to p) Wk
SC - AM – auto correlator
In the previous section, the structure of the Hopfield model has been
explained. It is an auto-associative memory model which means patterns, rather
than associated pattern pairs, are stored in memory. In this section, the working
of an auto-associative memory (auto-correlator) is illustrated using some
examples.
Working of an auto-correlator :
Consider the three stored bipolar patterns
A1 = (-1, 1 , -1 , 1 )
A2 = ( 1, 1 , 1 , -1 )
A3 = (-1, -1 , -1 , 1 )
Thus, the outer products of each of these three bipolar patterns A1, A2, A3 are
[A1]T [A1] = {{1, -1, 1, -1}, {-1, 1, -1, 1}, {1, -1, 1, -1}, {-1, 1, -1, 1}}
[A2]T [A2] = {{1, 1, 1, -1}, {1, 1, 1, -1}, {1, 1, 1, -1}, {-1, -1, -1, 1}}
[A3]T [A3] = {{1, 1, 1, -1}, {1, 1, 1, -1}, {1, 1, 1, -1}, {-1, -1, -1, 1}}
and the connection matrix is
T = [t ij] = Σ (i=1 to 3) [Ai]T 4x1 [Ai] 1x4 = {{3, 1, 3, -3}, {1, 3, 1, -1}, {3, 1, 3, -3}, {-3, -1, -3, 3}}
The recall equation is
a j new = ƒ ( Σ (i) ai t ij , a j old )     for ∀ j = 1, 2, . . . , p
Consider the recall of A2 = (1, 1, 1, -1). The computation of α = ∑ ai t ij and β = a j old for the recall equation is :
i =       1       2       3       4                          α       β
j = 1 :  1x3  +  1x1  +  1x3  +  -1x-3   =  10     1
j = 2 :  1x1  +  1x3  +  1x1  +  -1x-1   =   6     1
j = 3 :  1x3  +  1x1  +  1x3  +  -1x-3   =  10     1
j = 4 :  1x-3 +  1x-1 +  1x-3 +  -1x3    = -10    -1
Therefore a j new = ƒ (ai t ij , a j old) for ∀ j = 1, 2, . . . , p is ƒ (α , β) :
a1 new = ƒ (10 , 1) = 1
a2 new = ƒ (6 , 1) = 1
a3 new = ƒ (10 , 1) = 1
a4 new = ƒ (-10 , -1) = -1
This gives back (1, 1, 1, -1) = A2, i.e. the stored pattern is correctly recalled.
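The construction of T and one recall pass can be sketched as follows (NumPy illustration, not the notes' code; the two-argument threshold ƒ keeps the old value when the weighted sum is zero, as defined for Φ later in these notes):

```python
import numpy as np

def build_T(patterns):
    """Connection matrix T = sum_i Ai^T Ai (outer products of stored patterns)."""
    T = np.zeros((len(patterns[0]), len(patterns[0])))
    for A in patterns:
        T += np.outer(A, A)
    return T

def recall(T, a):
    """One pass of a_j_new = f(sum_i a_i t_ij, a_j_old)."""
    alpha = a @ T
    return np.where(alpha > 0, 1, np.where(alpha < 0, -1, a))

A1, A2, A3 = [np.array(p) for p in
              [(-1, 1, -1, 1), (1, 1, 1, -1), (-1, -1, -1, 1)]]
T = build_T([A1, A2, A3])
print(recall(T, np.array([1, 1, 1, 1])))   # recovers A2 = (1, 1, 1, -1)
```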
The Hamming distance (HD) of a vector x from y, both of dimension m, is
HD (x , y) = Σ (i=1 to m) | xi − yi |
Consider a noisy vector A' = (1, 1, 1, 1). The HDs of A' from each of the stored patterns A1, A2, A3 are
HD (A' , A1) = 4
HD (A' , A2) = 2
HD (A' , A3) = 6
so A' is closest to A2. The computation of α = ∑ ai t ij and β for the recall of A' is :
i =       1       2       3       4                          α       β
j = 1 :  1x3  +  1x1  +  1x3  +  1x-3   =  4     1
j = 2 :  1x1  +  1x3  +  1x1  +  1x-1   =  4     1
j = 3 :  1x3  +  1x1  +  1x3  +  1x-3   =  4     1
j = 4 :  1x-3 +  1x-1 +  1x-3 +  1x3    = -4    -1
Therefore a j new = ƒ (ai t ij , a j old) for ∀ j = 1, 2, . . . , p is ƒ (α , β) :
a1 new = ƒ (4 , 1)
a2 new = ƒ (4 , 1)
a3 new = ƒ (4 , 1)
a4 new = ƒ (-4 , -1)
The recalled pattern is (1, 1, 1, -1) = A2, the stored pattern closest to the noisy input A'.
SC - Bidirectional hetero AM
An important feature of BAM is its ability to recall stored pairs, particularly in the presence of noise.
Definition : If the associated pattern pairs (X, Y) are different and if the model recalls a pattern Y given a pattern X, or vice versa, then it is termed hetero-associative memory.
Denote one layer as field A with elements Ai and the other layer as field B
with elements Bi.
Consider N training pairs { (A1 , B1) , (A2 , B2), . . , (Ai , Bi), . . (AN , BN) }
where Ai = (ai1 , ai2 ,..., ain) and Bi = (bi1 , bi2 ,..., bip) and
− the original correlation matrix of the BAM is
M0 = Σ (i=1 to N) [ Xi ]T [ Yi ]
where Xi = (xi1 , xi2 , . . . , xin) and Yi = (yi1 , yi2 , . . . , yip), and the energy function is
E = − α M βT
Example : Retrieve the nearest (Ai , Bi) pattern pair, given any pair (α , β). The recall proceeds as
β ' = Φ (α M)  and  α ' = Φ (β ' MT)
β " = Φ (α ' M)  and  α " = Φ (β " MT)
and so on, until α and β no longer change. Here Φ (F) = G = (g1 , g2 , . . . , gr) for F = (f1 , f2 , . . . , fr), with
gi = 1     if fi > 0
gi = 0 (binary) or -1 (bipolar)     if fi < 0
gi = previous gi     if fi = 0
Kosko has proved that this process will converge for any
correlation matrix M.
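A small sketch of BAM encoding and the recall iteration β' = Φ(αM), α' = Φ(β'Mᵀ) described above (NumPy; bipolar patterns assumed; an illustration of Kosko's scheme, not the notes' own code):

```python
import numpy as np

def bam_matrix(X, Y):
    """Correlation matrix M = sum_i Xi^T Yi for bipolar pattern pairs."""
    return sum(np.outer(x, y) for x, y in zip(X, Y))

def phi(f, prev):
    """Bipolar threshold: +1 if f>0, -1 if f<0, keep previous value if f==0."""
    return np.where(f > 0, 1, np.where(f < 0, -1, prev))

def bam_recall(M, alpha, iters=10):
    beta = phi(alpha @ M, np.ones(M.shape[1]))
    for _ in range(iters):
        alpha = phi(beta @ M.T, alpha)
        beta = phi(alpha @ M, beta)
    return alpha, beta

X = [np.array([ 1, -1, -1, -1, -1,  1]),
     np.array([-1,  1,  1, -1, -1, -1]),
     np.array([-1, -1,  1, -1,  1,  1])]
Y = [np.array([ 1,  1, -1, -1, -1]),
     np.array([ 1, -1,  1, -1, -1]),
     np.array([-1,  1,  1,  1, -1])]
M = bam_matrix(X, Y)
print(bam_recall(M, X[2]))   # recalls (X3, Y3) as in Example 1 below
```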
■ Addition and deletion of pattern pairs : the correlation matrix M can be updated, i.e.
− a new pair (X' , Y') can be added to the memory model by adding its correlation matrix, Mnew = M + X'T Y', and
− an existing pair (Xj , Yj) can be deleted from the memory model.
Deletion : subtract the matrix corresponding to an existing pair (Xj , Yj) from the correlation matrix M; then the new correlation matrix Mnew is given by
Mnew = M − XjT Yj
Note : A system that changes with time is a dynamic system. There are two types of dynamics in a neural network : during the training phase it iteratively updates the weights, and during the production phase it asymptotically converges to the solution patterns. State is a collection of qualitative and quantitative items that characterize the system, e.g., weights, data flows. The Energy function (or Lyapunov function) is a bounded function of the system state that decreases with time; the system solution is the minimum-energy state.
− to store a pattern, the value of the energy function for that pattern has to occupy a minimum point in the energy landscape relative to the other stored patterns. For the auto-associative case,
E(A) = − A M AT
We wish to retrieve the nearest (Ai , Bi) pair when any (α , β) pair is presented as an initial condition to the BAM. The neurons change their states until a bidirectional stable state (Af , Bf) is reached. Kosko has shown that such a stable state is reached for any matrix M, and that it corresponds to a local minimum of the energy function. Each cycle of decoding lowers the energy E, where the energy for any point (α , β) is given by
E = − α M βT
However, if the energy evaluated at the coordinates of the pair (Ai , Bi) does not constitute a local minimum, then that pair cannot be recalled, even though one starts with α = Ai; Kosko's encoding method does not ensure that the stored pairs are at a local minimum.
Example 1 : Consider N = 3 binary pattern pairs
A1 = ( 1 0 0 0 0 1 ) B1 = ( 1 1 0 0 0 )
A2 = ( 0 1 1 0 0 0 ) B2 =( 1 0 1 0 0 )
A3 = ( 0 0 1 0 1 1 ) B3 =( 0 1 1 1 0 )
The corresponding bipolar patterns are
X1 = ( 1 -1 -1 -1 -1 1 ) Y1 = ( 1 1 -1 -1 -1 )
X2 = ( -1 1 1 -1 -1 -1 ) Y2 =( 1 -1 1 -1 -1 )
X3 = ( -1 -1 1 -1 1 1 ) Y3 =( -1 1 1 1 -1 )
M = X1T Y1 + X2T Y2 + X3T Y3 =
{{ 1,  1, -3, -1,  1},
 { 1, -3,  1, -1,  1},
 {-1, -1,  3,  1, -1},
 {-1, -1, -1,  1,  3},
 {-3,  1,  1,  3,  1},
 {-1,  3, -1,  1, -1}}
Suppose we start with α = X3 and hope to retrieve the associated pair Y3 :
α M = ( -1 -1 1 -1 1 1 ) M = ( -6 6 6 6 -6 )
Φ (α M) = β ' = ( -1 1 1 1 -1 )
β ' MT = ( -5 -5 5 -3 7 5 )
Φ (β ' MT) = ( -1 -1 1 -1 1 1 ) = α '
α ' M = ( -1 -1 1 -1 1 1 ) M = ( -6 6 6 6 -6 )
Φ (α ' M) = β " = ( -1 1 1 1 -1 ) = β '
Hence the cycle terminates with
(α F , β F) = ( X3 , Y3 )
Hence, (X3 , Y3) is correctly recalled, a desired result.
Start with X2, and hope to retrieve the associated pair Y2 . Consider N
= 3 pattern pairs (A1 , B1 ) , (A2 , B2 ) , (A3 , B3 ) given by
A1 = ( 1 0 0 1 1 1 0 0 0 ) B1 = ( 1 1 1 0 0 0 0 1 0 )
A2 = ( 0 1 1 1 0 0 1 1 1 ) B2 = ( 1 0 0 0 0 0 0 0 1 )
A3 = ( 1 0 1 0 1 1 0 1 1 ) B3 = ( 0 1 0 1 0 0 1 0 1 )
X1 = ( 1 -1 -1 1 1 1 -1 -1 -1 ) Y1 = ( 1 1 1 -1 -1 -1 -1 1 -1 )
X2 = ( -1 1 1 1 -1 -1 1 1 1 ) Y2 = ( 1 -1 -1 -1 -1 -1 -1 -1 1 )
X3 = ( 1 -1 1 -1 1 1 -1 1 1 ) Y3 = ( -1 1 -1 1 -1 -1 1 -1 1 )
M = X1T Y1 + X2T Y2 + X3T Y3 =
{{-1,  3,  1,  1, -1, -1,  1,  1, -1},
 { 1, -3, -1, -1,  1,  1, -1, -1,  1},
 {-1, -1, -3,  1, -1, -1,  1, -3,  3},
 { 3, -1,  1, -3, -1, -1, -3,  1, -1},
 {-1,  3,  1,  1, -1, -1,  1,  1, -1},
 {-1,  3,  1,  1, -1, -1,  1,  1, -1},
 { 1, -3, -1, -1,  1,  1, -1, -1,  1},
 {-1, -1, -3,  1, -1, -1,  1, -3,  3},
 {-1, -1, -3,  1, -1, -1,  1, -3,  3}}
X2 = ( -1 1 1 1 -1 -1 1 1 1 ) Y2 = ( 1 -1 -1 -1 -1 -1 -1 -1 1 )
Φ (α M) = ( 1 -1 -1 -1 1 1 -1 -1 1 ) = β'
Φ (β ' M T) = ( -1 1 1 1 -1 -1 1 1 1 ) = α'
Φ (α ' M) = ( 1 -1 -1 -1 1 1 -1 -1 1 ) = β"
=β'
αF= α ' = ( -1 1 1 1 -1 -1 1 1 1 ) = X2
βF= β ' = ( 1 -1 -1 -1 1 1 -1 -1 1 ) ≠ Y2
But β ' is not Y2 . Thus the vector pair (X2 , Y2) is not recalled correctly
by Kosko's decoding process.
For the coordinates of the pair (α F , β F), the energy is EF = − α F M β FT = −75. Now consider the pattern
Y2' = ( 1 -1 -1 -1 1 -1 -1 -1 1 )
Its energy with X2 is
E = − X2 M Y2'T = −73
which is lower than E2, the energy at (X2 , Y2), confirming the hypothesis that (X2 , Y2) is not
Multiple training encoding strategy :
In Kosko's BAM the correlation matrix is M = Σ XiT Yi, computed from the pattern pairs. The system proceeds to retrieve the nearest pair given any pair (α , β) with the help of the recall equations. However, Kosko's encoding method does not ensure that the stored pairs are at a local minimum and hence it can result in incorrect recall.
The multiple training encoding strategy uses a pattern pair (Xi , Yi) a minimum number of times in training to ensure that the energy at (Ai , Bi) does not exceed the energy at points which are one Hamming distance away from this location.
The new value of the energy function E evaluated at (Ai , Bi) then becomes E (Ai , Bi) = − Ai M BiT, where M is the augmented correlation matrix.
Example : Multiple training encoding strategy for the recall of three pattern pairs.
A1 = ( 1 0 0 1 1 1 0 0 0 ) B1 = ( 1 1 1 0 0 0 0 1 0 )
A2 = ( 0 1 1 1 0 0 1 1 1 ) B2 = ( 1 0 0 0 0 0 0 0 1 )
A3 = ( 1 0 1 0 1 1 0 1 1 ) B3 = ( 0 1 0 1 0 0 1 0 1 )
X1 = ( 1 -1 -1 1 1 1 -1 -1 -1 ) Y1 = ( 1 1 1 -1 -1 -1 -1 1 -1 )
X2 = ( -1 1 1 1 -1 -1 1 1 1 ) Y2 = ( 1 -1 -1 -1 -1 -1 -1 -1 1 )
X3 = ( 1 -1 1 -1 1 1 -1 1 1 ) Y3 = ( -1 1 -1 1 -1 -1 1 -1 1 )
The pair (X2 , Y2) is used twice in training, so the augmented correlation matrix becomes
M = X1T Y1 + 2 X2T Y2 + X3T Y3
= {{-2,  4,  2,  2,  0,  0,  2,  2, -2},
 { 2, -4, -2, -2,  0,  0, -2, -2,  2},
 { 0, -2, -4,  0, -2, -2,  0, -4,  4},
 { 4, -2,  0, -4, -2, -2, -4,  0,  0},
 {-2,  4,  2,  2,  0,  0,  2,  2, -2},
 {-2,  4,  2,  2,  0,  0,  2,  2, -2},
 { 2, -4, -2, -2,  0,  0, -2, -2,  2},
 { 0, -2, -4,  0, -2, -2,  0, -4,  4},
 { 0, -2, -4,  0, -2, -2,  0, -4,  4}}
Now let α = X2 and verify that the corresponding pattern β = Y2 is correctly recalled :
Φ (α M) = ( 1 -1 -1 -1 -1 -1 -1 -1 1 ) = β '
β ' MT = ( -16 16 18 18 -16 -16 16 18 18 )
Φ (β ' MT) = ( -1 1 1 1 -1 -1 1 1 1 ) = α '
Φ (α ' M) = ( 1 -1 -1 -1 -1 -1 -1 -1 1 ) = β "
Here β " = β '. Hence the cycle terminates with
α F = α ' = ( -1 1 1 1 -1 -1 1 1 1 ) = X2
β F = β ' = ( 1 -1 -1 -1 -1 -1 -1 -1 1 ) = Y2
Thus (X2 , Y2) is correctly recalled using the augmented correlation matrix M. But it is not possible to recall (X1 , Y1) using the same matrix M, as shown in the next slide.
Note : The previous slide showed that the pattern pair (X2 , Y2 ) is
correctly recalled, using augmented correlation matrix
T T T
M = X1 Y1 + 2 X2 Y2 + X3 Y3
but then the same matrix M can not recall correctly the other
X1 = ( 1 -1 -1 1 1 1 -1 -1 -1 ) Y1 = ( 1 1 1 -1 -1 -1 -1 1 -1 )
α M = ( -6 24 22 6 4 4 6 22 -22 )
Φ (α M) = ( -1 1 1 1 1 1 1 1 -1 ) = β '
β ' MT = ( 16 -16 -18 -18 16 16 -16 -18 -18 )
Φ (β ' MT) = ( 1 -1 -1 -1 1 1 -1 -1 -1 ) = α '
α ' M = ( -14 28 22 14 8 8 14 22 -22 )
Φ (α ' M) = ( -1 1 1 1 1 1 1 1 -1 ) = β "
Here β " = β '. Hence the cycle terminates with
α F = α ' = ( 1 -1 -1 -1 1 1 -1 -1 -1 ) ≠ X1
β F = β ' = ( -1 1 1 1 1 1 1 1 -1 ) ≠ Y1
Thus, the pattern pair (X1 , Y1) is not correctly recalled using the augmented correlation matrix M.
To tackle this problem, the correlation matrix M needs to be further
augmented by multiple training of (X1 , Y1 ) as shown in the next slide.
The previous slide shows that pattern pair (X1 , Y1) cannot be recalled
under the same augmentation matrix M that is able to recall (X2 , Y2).
M = 2 X1T Y1 + 2 X2T Y2 + X3T Y3 =
{{-1,  5,  3,  1, -1, -1,  1,  3, -3},
 { 1, -5, -3, -1,  1,  1, -1, -3,  3},
 {-1, -3, -5,  1, -1, -1,  1, -5,  5},
 { 5, -1,  1, -5, -3, -3, -5,  1, -1},
 {-1,  5,  3,  1, -1, -1,  1,  3, -3},
 {-1,  5,  3,  1, -1, -1,  1,  3, -3},
 { 1, -5, -3, -1,  1,  1, -1, -3,  3},
 {-1, -3, -5,  1, -1, -1,  1, -5,  5},
 {-1, -3, -5,  1, -1, -1,  1, -5,  5}}
Now observe in the next slide that all three pairs can be correctly recalled.
Recall of (X1 , Y1) :
X1 = ( 1 -1 -1 1 1 1 -1 -1 -1 ) Y1 = ( 1 1 1 -1 -1 -1 -1 1 -1 )
α M = ( 3 33 31 -3 -5 -5 -3 31 -31 )
Φ (α M) = ( 1 1 1 -1 -1 -1 -1 1 -1 ) = β '
β ' MT = ( 13 -13 -19 23 13 13 -13 -19 -19 )
Φ (β ' MT) = ( 1 -1 -1 1 1 1 -1 -1 -1 ) = α '
α ' M = ( 3 33 31 -3 -5 -5 -3 31 -31 )
Φ (α ' M) = ( 1 1 1 -1 -1 -1 -1 1 -1 ) = β "
α F = α ' = ( 1 -1 -1 1 1 1 -1 -1 -1 ) = X1
β F = β ' = ( 1 1 1 -1 -1 -1 -1 1 -1 ) = Y1

Recall of (X2 , Y2) :
X2 = ( -1 1 1 1 -1 -1 1 1 1 ) Y2 = ( 1 -1 -1 -1 -1 -1 -1 -1 1 )
Φ (α M) = ( 1 -1 -1 -1 -1 -1 -1 -1 1 ) = β '
β ' MT = ( -15 15 17 19 -15 -15 15 17 17 )
Φ (β ' MT) = ( -1 1 1 1 -1 -1 1 1 1 ) = α '
Φ (α ' M) = ( 1 -1 -1 -1 -1 -1 -1 -1 1 ) = β "
α F = α ' = ( -1 1 1 1 -1 -1 1 1 1 ) = X2
β F = β ' = ( 1 -1 -1 -1 -1 -1 -1 -1 1 ) = Y2

Recall of (X3 , Y3) :
X3 = ( 1 -1 1 -1 1 1 -1 1 1 ) Y3 = ( -1 1 -1 1 -1 -1 1 -1 1 )
α M = ( -13 17 -1 13 -5 -5 13 -1 1 )
Φ (α M) = ( -1 1 -1 1 -1 -1 1 -1 1 ) = β '
β ' MT = ( 11 -11 27 -63 11 11 -11 27 27 )
Φ (β ' MT) = ( 1 -1 1 -1 1 1 -1 1 1 ) = α '
α ' M = ( -13 17 -1 13 -5 -5 13 -1 1 )
Φ (α ' M) = ( -1 1 -1 1 -1 -1 1 -1 1 ) = β "
α F = α ' = ( 1 -1 1 -1 1 1 -1 1 1 ) = X3
β F = β ' = ( -1 1 -1 1 -1 -1 1 -1 1 ) = Y3
Thus, all three pattern pairs are correctly recalled.
Thus, in general, the correlation matrix for multiple training encoding can be written as
M = Σ (i=1 to N) qi XiT Yi     where the qi 's are +ve real numbers.
■ Algorithm (multiple training encoding) :
The bipolar pattern pairs are
X = ( X1 , X2 , . . . , XN ) where Xi = ( x i 1 , x i 2 , . . . , x i n )
Y = ( Y1 , Y2 , . . . , YN ) where Yi = ( y i 1 , y i 2 , . . . , y i p )
Step 1 : Initialize the correlation matrix M to the null matrix : M ← [0]
Step 2 : For i ← 1 to N
    M ← M ⊕ [ qi ∗ Transpose ( Xi ) ⊗ ( Yi ) ]
  end
  (⊕ matrix addition, ⊗ matrix multiplication, ∗ scalar multiplication)
Step 3 : For a given input pattern A_M, compute the associated pattern as
    B' ← Φ ( A_M ⊗ M )
  end
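A sketch of this algorithm: the correlation matrix is built as M = Σ qi XiT Yi, where qi is the number of times pair i is used in training (NumPy illustration; the weights q = (2, 2, 1) correspond to the worked example above):

```python
import numpy as np

def multi_training_matrix(X, Y, q):
    """Augmented correlation matrix M = sum_i q_i * Xi^T Yi."""
    return sum(qi * np.outer(x, y) for qi, x, y in zip(q, X, Y))

def phi(f, prev):
    # Bipolar threshold: +1 if f>0, -1 if f<0, keep previous value if f==0.
    return np.where(f > 0, 1, np.where(f < 0, -1, prev))

def recall_pair(M, alpha, iters=10):
    beta = phi(alpha @ M, np.ones(M.shape[1]))
    for _ in range(iters):
        alpha = phi(beta @ M.T, alpha)
        beta = phi(alpha @ M, beta)
    return alpha, beta

X = [np.array([ 1, -1, -1,  1,  1,  1, -1, -1, -1]),
     np.array([-1,  1,  1,  1, -1, -1,  1,  1,  1]),
     np.array([ 1, -1,  1, -1,  1,  1, -1,  1,  1])]
Y = [np.array([ 1,  1,  1, -1, -1, -1, -1,  1, -1]),
     np.array([ 1, -1, -1, -1, -1, -1, -1, -1,  1]),
     np.array([-1,  1, -1,  1, -1, -1,  1, -1,  1])]
M = multi_training_matrix(X, Y, q=(2, 2, 1))
for x, y in zip(X, Y):
    print(np.array_equal(recall_pair(M, x)[1], y))   # True for all three pairs
```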
What is ART ?
■ The term "resonance" refers to the resonant state of a neural network in which a category prototype vector matches the current input vector closely enough for learning to occur.
• ART systems are well suited to problems that require online learning of large and evolving databases.
The back-propagation algorithm suffers from such a stability problem.
• In supervised learning, the input and the expected output of the system are provided, and the ANN is used to model the relationship between the two. Given an input set x and a corresponding output set y, the network is trained to learn the mapping from x to y.
• In unsupervised learning, the data and a cost function are provided. The ANN is trained to minimize the cost function by finding a suitable input-output relationship.
In each training iteration, the trainer provides the input to the network, and the network produces a result. This result is put into the cost function, and the total cost is used to update the weights. Weights are continually updated until the system output produces a minimal cost. Unsupervised learning is useful in situations where a cost function is known, but the data that minimizes that cost function over a particular input space is not known.
Among supervised learning methods, BPN is the most used and is well known for its ability to attack problems which we cannot solve explicitly. However, there are several technical problems with back-propagation-type algorithms. They are not well suited for tasks where the input space changes, and they are often slow to learn, particularly with many hidden units. Also, as noted above, they suffer from the stability problem: previously learned patterns may be lost when new patterns are presented.
SC – ART-Competitive learning
– For an output unit j, the input vector X = [x1 , x2 , x3]T and the weight vector Wj = [w1j , w2j , w3j]T are normalized to unit length. The activation value a j of the output unit is
a j = Σ (i=1 to 3) xi wij = XT Wj
and then the output unit with the highest activation is selected for further processing; this is what "competitive" implies.
– Assuming that output unit k has the maximal activation, the weights leading to this unit are updated according to the competitive (winner-take-all) learning rule
wk (t + 1) = ( wk (t) + η ( x (t) − wk (t) ) ) / || wk (t) + η ( x (t) − wk (t) ) ||
which is normalized to ensure that wk (t + 1) is always of unit length; only the weights at the winner output unit k are updated and all other weights remain unchanged.
– Alternatively, the activation can be computed as the Euclidean distance
a j = { Σ (i=1 to 3) (xi − wij)² }^(1/2) = || x − wj ||
and then the weights of the output unit with the smallest activation (distance) are updated according to the same competitive rule.
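A short sketch of the winner-take-all update described above: the winning output unit is the one whose (unit-length) weight vector has the highest activation XTwj, and only its weights move toward the input and are re-normalized (NumPy illustration; the random data are an assumption for the example):

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """W: (num_units x dim) matrix of unit-length weight vectors.
    Select the winner by highest activation and update only its weights."""
    x = x / np.linalg.norm(x)              # inputs normalized to unit length
    activations = W @ x                    # a_j = X^T W_j
    k = int(np.argmax(activations))        # winner-take-all
    w_new = W[k] + eta * (x - W[k])        # move the winner toward the input
    W[k] = w_new / np.linalg.norm(w_new)   # keep unit length
    return W, k

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
W /= np.linalg.norm(W, axis=1, keepdims=True)
for x in rng.normal(size=(100, 2)):
    W, _ = competitive_step(W, x)
print(W)
```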
Limitations of Competitive Learning :
– Competitive learning lacks the capability to add new clusters when deemed
necessary.
− How can a learning system remain adaptive (plastic) in response to significant input, yet stable in response to irrelevant input?
− How can a neural network remain plastic enough to learn new patterns and yet be able to maintain the stability of the already learned patterns?
− How does the system know when to switch between its plastic and its stable modes?
− What is the method by which the system can retain previously learned information while learning new things?
SC - ART networks
− The unsupervised ARTs, named ART1, ART2, ART3, . . ., are similar to many iterative clustering algorithms.
− The supervised ART algorithms are named with the suffix "MAP", as in ARTMAP. Here the algorithms cluster both the inputs and the targets, and associate the two sets of clusters.
The basic ART system is an unsupervised learning model. It typically consists of
− a comparison field and a recognition field composed of neurons,
− a vigilance parameter, and
− a reset module.
Fig. Basic ART structure : F1 (comparison) layer receiving the normalized input, F2 (recognition) layer, the reset module and the vigilance parameter ρ.
• Comparison field
• Recognition field
• Vigilance parameter
• Reset Module
The reset module compares the strength of the recognition match to the
vigilance parameter.
In the search procedure, the recognition neurons are disabled one by one
by the reset function until the vigilance parameter is satisfied by a
recognition match.
ART includes a wide variety of neural networks. ART networks follow both
supervised and unsupervised algorithms. The unsupervised ARTs as ART1,
ART2, ART3, . . . . are similar to many iterative clustering algorithms.
Fig. Simplified ART Architecture : the F1 layer (comparison field, STM) receives the normalized input; the F2 layer (recognition field, STM) holds the cluster units, including a unit for a new cluster; adaptive filter paths between the layers carry the bottom-up and top-down weights (LTM); a reset / expectation module applies the vigilance parameter ρ.
There are two sets of connections, each with their own weights, called :
− bottom-up weights, from each node of the F1 layer to every node of the F2 layer, and
− top-down weights, from each node of the F2 layer to every node of the F1 layer.
Supervised ARTs are named with the suffix "MAP", as ARTMAP, that
combines two slightly modified ART-1 or ART-2 units to form a supervised
learning model where the first unit takes the input data and the second
unit takes the correct output data. The algorithms cluster both the inputs
and targets, and associate the two sets of clusters. Fuzzy ART and Fuzzy
ARTMAP are generalization using fuzzy logic.
Fig. Taxonomy of ART networks (Grossberg, 1976) :
− Unsupervised ART learning : ART1, ART2, Fuzzy ART (Grossberg et al., 1987), Simplified ART (Alpaydin, 1998).
− Supervised ART learning : ARTMAP, Fuzzy ARTMAP (Grossberg et al., 1987), Gaussian ARTMAP (1992), Simplified ARTMAP (Alpaydin, 1998).
ART networks can "discover" structure in the data by finding how the data
is clustered. The ART networks are capable of developing stable clusters of
arbitrary sequences of input patterns by self-organization.
Note : For better understanding, in the subsequent sections, first the
iterative clustering algorithm (a non-neural approach) is presented then
the ART1 and ART2 neural networks are presented.
SC - Iterative clustering
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning.
- Cluster analysis does not use category labels that tag objects with prior
identifiers, i.e., class labels.
17
202
SC - Iterative clustering
- Example : a scatter of points that forms visually separable groups (clusters) of dots and squares.
The K-mean, ISODATA and Vector Quantization techniques are some of the
decision theoretic approaches for cluster formation among unsupervised
learning algorithms.
203
(Note : a recap of distance function in n-space is first mentioned and then vector
quantization clustering is illustrated.)
18
204
SC - Recap distance functions
Consider two elements X = (x1 , x2 , . . xi . . , xn) and Y = (y1 , y2 , . . yi . . , yn) of the n-dimensional space Rⁿ, where each xi , yi is a real number.
The vector space operations on Rⁿ include the dot product
    X • Y = Σ i=1,n (xi yi) = x1 y1 + x2 y2 + . . . + xn yn , a real number.
The dot product defines a norm, a distance function (metric), and an angle on Rⁿ :
    || X || = ( Σ i=1,n xi² )^1/2
    d(X , Y) = || X − Y || = ( Σ i=1,n (xi − yi)² )^1/2
    θ = cos⁻¹ ( X • Y / ( ||X|| ||Y|| ) )
206
SC – Recap distance functions
■ Euclidean Distance
The Euclidean distance between two points P = (p1 , p2 , . . pi . . , pn) and Q = (q1 , q2 , . . qi . . , qn) is
    d(P , Q) = ( Σ i=1,n (pi − qi)² )^1/2
In three dimensions this is the familiar ( (px − qx)² + (py − qy)² + (pz − qz)² )^1/2 .
208
SC - Vector quantization
The goal is to "discover" structure in the data by finding how the data is
clustered. One method for doing this is called vector quantization for
grouping feature vectors into clusters.
− Whenever a new input vector Xᵖ (the pth pattern) appears, the Euclidean distance d between it and the jth cluster center Cj is calculated as
    d = | Xᵖ − Cj | = ( Σ i=1,N ( Xᵖi − Cji )² )^1/2 ,   j = 1, . . , M
− The pattern is assigned to the cluster k with the smallest distance (the other clusters, j ≠ k, are left unchanged), and the center of that cluster is recomputed as the mean of its member patterns :
    Ck = (1/Nk) Σ X ,  summed over X ∈ Sk
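A small sketch of the threshold-based vector quantization clustering described above; the function name and the threshold value are illustrative, not from the notes.

import numpy as np

def vq_cluster(points, threshold=2.0):
    centers, members = [], []                       # cluster centers and member lists
    for p in points:
        p = np.asarray(p, dtype=float)
        if centers:
            d = [np.linalg.norm(p - c) for c in centers]
            j = int(np.argmin(d))
            if d[j] <= threshold:                   # join the nearest cluster, update its center
                members[j].append(p)
                centers[j] = np.mean(members[j], axis=0)
                continue
        centers.append(p.copy())                    # otherwise start a new cluster
        members.append([p])
    return centers, members

pts = [(2,3),(3,3),(2,6),(3,6),(6,3),(7,3),(6,4),(7,4),(2,4),(3,4),(2,7),(3,7)]
centers, members = vq_cluster(pts, threshold=2.0)   # gives 3 clusters as in the example below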
210
SC - Vector quantization
Points X Y Points X Y
1 2 3 7 6 4
2 3 3 8 7 4
3 2 6 9 2 4
4 3 6 10 3 4
5 6 3 11 2 7
6 7 3 12 3 7
− Take a new pattern, find its distances from all the cluster centers identified so far; if the smallest distance exceeds the threshold, start a new cluster, otherwise assign the pattern to the nearest cluster and update that cluster center as the mean of its members.
− Carrying this out for the 12 points with threshold distance 2.0 gives :
− No of clusters : 3
− Clusters Membership : S(1) = {P1, P2, P9, P10}; S(2) = {P3, P4, P11, P12}; S(3) = {P5, P6, P7, P8}.
212
SC - Vector quantization
Fig. Clusters formed (scatter plot of the 12 points with the cluster centers marked)
− Number of input patterns : 12
− Threshold distance assumed : 2.0
− No of clusters : 3
− Cluster centers : C1 = (2.5, 3.5) ;  C2 = (2.5, 6.5) ;  C3 = (6.5, 3.5)
− Clusters Membership : S(1) = {P1, P2, P9, P10}; S(2) = {P3, P4, P11, P12}; S(3) = {P5, P6, P7, P8}.
Note : About threshold distance
− See next slide, clusters for threshold distances as 3.5 and 4.5 .
23
214
SC - Vector quantization
- Example 2 : the same 12 points clustered with larger threshold distances (only the cluster outcome of the two plots is recoverable).
Fig Clusters formed
− Fig (b) for the threshold distance = 3.5 , two clusters formed.
− Fig (c) for the threshold distance = 4.5 , one cluster formed.
24
216
SC - ART Clustering
The taxonomy of important ART networks, the basic ART structure, and the
general ART architecture have been explained in the previous slides. Here only
Unsupervised ART (ART1 and ART2) Clustering are presented.
ART1 is a clustering algorithm that can learn and recognize binary patterns. ART2 is similar to ART1, but can learn and recognize arbitrary sequences of analog (real-valued) input patterns.
The ART1 architecture, the model description, the pattern matching cycle, and
the algorithm - clustering procedure, and a numerical example is presented in
this section.
25
217
SC - ART1 architecture
Fig. ART1 architecture — recognition layer F2 (STM) with gain control G2, comparison layer F1 (STM, n neurons) with gain control G1, bottom-up weights wij and top-down weights vji stored as LTM, reset neuron R with vigilance parameter ρ, and the binary input patterns IH = 1 . . h presented to F1.
The ART1 model consists of an "Attentional" and an "Orienting" subsystem.
26
219
SC - ART1 Model description
• Attentional Subsystem
(c) Gain control unit , Gain1 and Gain2, one for each layer.
(f) Interconnections among the nodes in each layer are not shown.
(h) Excitatory connection (+ve weights) from gain control to F1 and F2.
• Orienting Subsystem
220
(h) Reset layer for controlling the attentional subsystem overall dynamics.
27
221
SC - ART1 Model description
The comparison layer F1 receives the binary external input and passes it to the recognition layer F2, which tries to match it to one of the stored classification categories; the result is returned to F1.
− If Yes (match), then a new input vector is read and the cycle starts again.
− If No (mismatch), then the orienting subsystem resets the active F2 category and the search continues with the remaining categories.
Fig. Processing units of ART1 — unit x1i in F1 receives the external input Ii, a signal from Gain1 (G1) and top-down signals from F2 over the weights vji; unit x2j in F2 receives a signal from Gain2 (G2) and bottom-up signals from F1 over the weights wij, competes with the other F2 nodes (winner-take-all), and projects to the orienting (reset) subsystem.
223
SC - ART1 Pattern matching
The ART network structure does pattern matching and tries to determine
whether an input pattern is among the patterns previously stored in the
network or not.
► An input pattern I is presented to F1 and a pattern of activation X is produced across F1. The same input also excites the orienting subsystem A and the gain control G1.
► The output pattern S of F1 (an inhibitory signal) is sent to A; it cancels the excitatory effect of the input, so A remains inactive.
► The F1 output is also sent forward to F2, and net values are calculated in the F2 units as the sum of the products of the input values and the connection weights.
29
225
SC - ART1 Pattern matching
► The F2 units compete and the winning unit produces the output pattern U (e.g. U = 0 0 1 0). U sends an inhibitory signal to G1, switching it off, and is transformed (through the top-down weights) into the pattern V, which acts as a second input pattern for the F1 units; the F1 activity then changes to X* with output S*.
► Activities that develop over the nodes in the F1 or F2 layers are the STM traces (not shown in the fig).
226
• The 2/3 Rule
► Among the three possible sources of input to F1 or F2, only two are
used at a time. The units on F1 and F2 can become active only if two out of
the possible three sources of input are active. This feature is called the 2/3
rule.
► Due to the 2/3 rule, only those F1 nodes receiving signals from both the top-down pattern V and the input pattern I remain active.
30
227
SC - ART1 Pattern matching
► Orientation sub-system A : if the top-down pattern and the input mismatch, the reduced F1 output no longer cancels the excitation reaching A; A fires and sends a non-specific reset signal to F2.
► Nodes on F2 respond according to their present state : the active (winning) node is disabled for the duration of the current pattern and a new competition takes place on F2.
31
228
SC - ART1 Pattern matching
► The new competition on F2 produces a new pattern of activity Y*; the previously active node remains off because of the reset signal from A. The matching cycle repeats until a match is found, or until the network assigns some uncommitted node (or nodes) on F2 and begins to learn the new pattern. Learning takes place through the LTM traces (modification of weights).
This learning process does not start or stop abruptly but continues while the pattern matching process takes place. Whenever signals are sent over
connections, the weights associated with those connections are subject to
modification.
► When a match occurs, there is no reset signal and the network settles
down into a resonate state. During this stable state, connections remain
active for sufficiently long time so that the weights are strengthened. This
resonant state can arise only when a pattern match occurs or during
enlisting of new units on F2 to store an unknown pattern.
32
230
SC - ART1 Algorithm
(Ref: Fig ART1 Architecture, Model and Pattern matching explained before)
■ I(X) is the input data set of the form I(X) = { x(1), x(2), . . , x(t) } where t represents time or the number of vectors. Each x(t) has n elements; e.g., x(4) is the 4th vector and has 3 elements.
■ wj(t) = [ w1j(t) . . wij(t) . . wnj(t) ]ᵀ , T is transpose, is the bottom-up weight vector of the jth F2 unit; the weight matrix W(t) = (wij(t)) has one column Wj per F2 unit (entries w11, w12, . . , w31, w32, . . ).
■ vj(t) = [ vj1(t) . . vji(t) . . vjn(t) ]ᵀ , T is transpose, is the top-down weight vector of the jth F2 unit; the matrix V(t) = (vji(t)) has one row vj per F2 unit (entries v11, v12, v13, . . ).
■ For any two vectors u, v belonging to the same vector space Rⁿ, say u, v ∈ Rⁿ, the notation < u , v > = u · v = uᵀ v is the scalar product, computed component by component.
■ u ∧ v ∈ Rⁿ means the component-wise minimum, that is, the minimum of each pair of components min { ui , vi } , i = 1, n.
232
SC - ART1 Algorithm
The gain controls are
    G1 = 1 if the input IH ≠ 0 and the F2 output is zero, 0 otherwise ;
    G2 = 1 if the input IH ≠ 0 , 0 otherwise.
The degree of similarity required for an input to join a cluster is controlled by the vigilance parameter ρ. Each neuron at the output layer represents a cluster, and its top-down weight vector represents the prototype of the cluster.
233
SC - ART1 Algorithm
Step - 1 (Initialization)
■ Initialize bottom-up wij (t) and top-down vji (t) weights for time t.
Weight wij is from neuron i in F1 layer to neuron j in F2 layer; where i
= 1, n ; j = 1, m ; and weight matrix W(t) = (wij (t)) is of
type n x m.
    wj(t) = [ w1j(t) . . wij(t) . . wnj(t) ]ᵀ , T is transpose, with wij(0) = 1/(1 + n).
Weight vji is from neuron j in F2 layer to neuron i in F1 layer; the weight matrix V(t) = (vji(t)) is of type m x n, with
    vj(t) = [ vj1(t) . . vji(t) . . vjn(t) ]ᵀ , T is transpose, and vji(0) = 1.
■ Initialize the vigilance parameter ρ (here 0.3 ≤ ρ ≤ 0.5 is used) and the learning rate α = 0.9.
■ Special Rule : when there is an indecision (tie) between equally activated F2 nodes, the winner is taken as the second node among the equals.
235
SC - ART1 Algorithm
Time t = 1 : the first input vector IH=1 = x(1) is presented to the network. Then :
Step - 4 : Compute the activation yj of each node j in F2 :
    yj = Σ i=1,n ( Ii x wij ) ,   j = 1, . . , m
Step - 5 : Find k, the node in F2 that has the largest yk calculated in step 4 :
    yk = max ( yj ) over all nodes in F2. Else go to step 6.
Note : calculated in step 4, yj=1 = 1/4 and yj=2 = 1/4 are equal, which means an indecision tie. By the special rule the winner is the second node among the equals, i.e., k = 2.
Vigilance test : compute the match ratio
    r = || Vk · X(t) || / || X(t) ||
If r > ρ (resonance), go to step 6 and start learning; else disable node k and continue the search.
238
SC - ART1 Algorithm
Step - 6 : Form the comparison vector
    X*k = ( x*1 , x*2 , . . , x*n ) where x*i = vki x Ii , that is
    X*k = ( vk1 I1 , . . , vki Ii , . . , vkn In )ᵀ
Calculate the similarity between X*k and the input IH using :
    || X*k || / || IH || = ( Σ i=1,n x*i ) / ( Σ i=1,n Ii )
Example : if X*K=2 = {0 0 1} and IH=1 = {0 0 1}, then the similarity between X*k and IH is
    || X*K=2 || / || IH=1 || = 1/1 = 1
240
SC - ART1 Algorithm
Step - 7 : The similarity || X*K=2 || / || IH=1 || = 1 is > ρ. It means the similarity test between X*K=2 and IH=1 is true. Therefore update the weights of the winning node k :
    vki(new) = vki(old) x IHi   (component-wise),
    wki(new) = vki(new) / ( 0.5 + || vki(new) || ) ,   i = 1, 2, . . , n
(d) Update the weight matrices W(t) and V(t) and move to the next input vector, time t = 2.
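A compact sketch of the ART1 clustering procedure just described, using the same conventions as the example that follows (w(0) = 1/(1+n), v(0) = 1, fast learning with w = v / (0.5 + ||v||)); function and variable names are illustrative. Note that ties are broken here by taking the first maximal node, whereas the notes' special rule takes the second.

import numpy as np

def art1(patterns, m=2, rho=0.3):
    n = len(patterns[0])
    W = np.full((n, m), 1.0 / (1 + n))      # bottom-up weights
    V = np.ones((m, n))                     # top-down weights
    labels = []
    for I in map(np.array, patterns):
        disabled, label = set(), None
        while len(disabled) < m:
            y = I @ W                       # F2 activations
            y[list(disabled)] = -1.0
            k = int(np.argmax(y))           # winner
            match = (V[k] * I).sum() / max(I.sum(), 1)
            if match > rho:                 # resonance: learn
                V[k] = V[k] * I
                W[:, k] = V[k] / (0.5 + V[k].sum())
                label = k
                break
            disabled.add(k)                 # reset: disable node and search again
        labels.append(label)                # None => no cluster accepted the input
    return labels, W, V

labels, W, V = art1([[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]])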
242
SC - ART1 Numerical example
Input : seven binary vectors presented in sequence
    x(1) = { 0 0 1 }ᵀ ,  x(2) = { 0 1 0 }ᵀ ,  x(3) = { 0 1 1 }ᵀ ,  x(4) = { 1 0 0 }ᵀ ,
    x(5) = { 1 0 1 }ᵀ ,  x(6) = { 1 1 0 }ᵀ ,  x(7) = { 1 1 1 }ᵀ
– The variable t is time, here the natural numbers which vary from
1 to 7, is expressed as t = 1 , 7 .
243
– Each x(t) has 3 elements, hence input layer F1 contains n= 3 neurons;
40
244
SC - ART1 Numerical example
Step - 1
(Initialization)
■ Initialize bottom-up weights wij(t) and top-down weights vji(t) for time t = 1, where i = 1, n ; j = 1, m :
    wij(1) = 1/(1 + n) = 1/4 for all i, j ;  i.e. W(1) = [ 1/4 1/4 ; 1/4 1/4 ; 1/4 1/4 ]
    vji(1) = 1 for all j, i ;  i.e. V(1) = [ 1 1 1 ; 1 1 1 ]
■ Initialize the vigilance parameter ρ = 0.3 (usually 0.3 ≤ ρ ≤ 0.5).
■ Learning rate α = 0.9.
■ Special Rule : in case of an indecision (tie), the winner is the second node among the equals.
Step - 2
(Loop back from step 8)
Repeat
steps 3 to 10 for all input vectors IH = 1 to h=7 presented to
the
F1 layer; that is I(X) = { x(1), x(2), x(3), x(4), x(5), x(6), x(7) }
Step – 3 (Choose input pattern vector)
Present the first input pattern IH=1 = x(1) = { 0 0 1 }. As the input I ≠ 0, the gain G1 = 1 and thus activates all nodes in F1.
42
248
SC - ART1 Numerical example
Step – 4 : Compute the F2 activations for IH=1 = { 0 0 1 } :
    yj = Σ i=1,n ( Ii x wij )
    yj=1 = (0)(1/4) + (0)(1/4) + (1)(1/4) = 1/4 ;  yj=2 = (0)(1/4) + (0)(1/4) + (1)(1/4) = 1/4
Step – 5 : Find k, the node in F2 with the largest yk. Here yj=1 = 1/4 and yj=2 = 1/4 are equal, an indecision tie; by the special rule the winner is the second node, i.e., k = 2.
Vigilance test :
    r = || Vk=2 · X(t=1) || / || X(t=1) || = 1/1 = 1
    r > ρ = 0.3 , so resonance exists and learning starts. Go to Step 6.
250
SC - ART1 Numerical example
Step – 6 : Form the comparison vector, component by component :
    X*K=2 = ( vk1 I1 , . . , vki Ii , . . , vkn In )ᵀ = {1 1 1} x {0 0 1} = { 0 0 1 }
Calculate the similarity between X*k and the input IH (here n = 3) :
    || X*K=2 || / || IH=1 || = ( Σ i x*i ) / ( Σ i Ii ) = 1/1 = 1
Step – 7 : The similarity = 1 is > ρ, so the similarity test between X*K=2 and IH=1 is true. Therefore update the weights of node k = 2 :
    vk=2(new) = X*K=2 = { 0 0 1 }ᵀ
    wk=2,i(t=2) = vk=2,i(new) / ( 0.5 + || vk=2,i(new) || ) = {0 0 1} / 1.5 = { 0 0 2/3 }ᵀ
(d) Update the weight matrices W(t) and V(t) for the next input vector, time t = 2 :
    V(2) = [ 1 1 1 ; 0 0 1 ] ,  W(2) = [ 1/4 0 ; 1/4 0 ; 1/4 2/3 ]
254
SC - ART1 Numerical example
Time t = 2 ;  IH = I2 = { 0 1 0 } ;  V(2) = [ 1 1 1 ; 0 0 1 ] ,  W(2) = [ 1/4 0 ; 1/4 0 ; 1/4 2/3 ]
    yj=1 = (0)(1/4) + (1)(1/4) + (0)(1/4) = 1/4 = 0.25 ;  yj=2 = (0)(0) + (1)(0) + (0)(2/3) = 0
The winner is k = 1. Vigilance test :
    r = || Vk=1 · X(t=2) || / || X(t=2) || = 1/1 = 1  >  ρ = 0.3 , so resonance exists; start learning, component by component :
    X*K=1 = ( vk1 IH1 , . . , vki IHi , . . , vkn IHn )ᵀ = {1 1 1} x {0 1 0} = { 0 1 0 }
Similarity between X*K=1 = {0 1 0} and IH=2 = {0 1 0} is 1/1 = 1 > ρ, so the similarity test is true.
Update node k = 1 :
    vk=1(new) = { 0 1 0 }ᵀ ;  wk=1(t=3) = {0 1 0} / (0.5 + 1) = { 0 2/3 0 }ᵀ
(d) Update the weight matrices W(t) and V(t) for the next input vector, time t = 3 :
    V(3) = [ 0 1 0 ; 0 0 1 ] ,  W(3) = [ 0 0 ; 2/3 0 ; 0 2/3 ]
258
SC - ART1 Numerical example
Time t = 3 ;  IH = I3 = { 0 1 1 } ;  V(3) = [ 0 1 0 ; 0 0 1 ] ,  W(3) = [ 0 0 ; 2/3 0 ; 0 2/3 ]
    yj=1 = (0)(0) + (1)(2/3) + (1)(0) = 2/3 = 0.666 ;  yj=2 = (0)(0) + (1)(0) + (1)(2/3) = 2/3 = 0.666
An indecision tie; by the special rule the winner is the second node, so the decision is K = 2. Vigilance test :
    r = || Vk=2 · X(t=3) || / || X(t=3) || = 1/2 = 0.5  >  ρ = 0.3 , so resonance exists; start learning, component by component :
    X*K=2 = {0 0 1} x {0 1 1} = { 0 0 1 }
Similarity between X*K=2 = {0 0 1} and IH=3 = {0 1 1} is 1/2 = 0.5 > ρ, so the similarity test is true.
Update node k = 2 :
    vk=2(new) = { 0 0 1 }ᵀ ;  wk=2(t=4) = {0 0 1} / (0.5 + 1) = { 0 0 2/3 }ᵀ
(d) Update the weight matrices W(t) and V(t) for the next input vector, time t = 4 :
    V(4) = [ 0 1 0 ; 0 0 1 ] ,  W(4) = [ 0 0 ; 2/3 0 ; 0 2/3 ]
262
SC - ART1 Numerical example
Time t = 4 ;  IH = I4 = { 1 0 0 } ;  V = [ 0 1 0 ; 0 0 1 ] ,  W = [ 0 0 ; 2/3 0 ; 0 2/3 ]
    yj=1 = (1)(0) + (0)(2/3) + (0)(0) = 0 ;  yj=2 = (1)(0) + (0)(0) + (0)(2/3) = 0
Check node k = 1 :  r = || Vk=1 · X(t=4) || / || X(t=4) || = 0/1 = 0 ;  r < ρ = 0.3 , no resonance.
Check node k = 2 :  r = || Vk=2 · X(t=4) || / || X(t=4) || = 0/1 = 0 ;  r < ρ = 0.3 , no resonance.
Neither cluster accepts x(4); the weight matrices remain unchanged for the next input vector, time t = 5 :
    W(4) = W(3) ;  V(4) = V(3) ;  O(t=4) = { 1 1 }ᵀ
    V = [ 0 1 0 ; 0 0 1 ] ,  W = [ 0 0 ; 2/3 0 ; 0 2/3 ]
265
SC - ART1 Numerical example
Time t = 5 ;  IH = I5 = { 1 0 1 } ;  V = [ 0 1 0 ; 0 0 1 ] ,  W = [ 0 0 ; 2/3 0 ; 0 2/3 ]
    yj=1 = 0 ;  yj=2 = (1)(0) + (0)(0) + (1)(2/3) = 2/3 , so the winner is k = 2.
Vigilance test :
    r = || Vk=2 · X(t=5) || / || X(t=5) || = 1/2 = 0.5  >  ρ = 0.3
Input vector x(t=5) is accepted by F2 k=2 , which means x(5) ∈ A2 Cluster. Learning, component by component :
    X*K=2 = {0 0 1} x {1 0 1} = { 0 0 1 }
Similarity between X*K=2 = {0 0 1} and IH=5 = {1 0 1} is 1/2 = 0.5 > ρ, so the similarity test is true.
Update node k = 2 :
    vk=2(new) = { 0 0 1 }ᵀ ;  wk=2(t=6) = {0 0 1} / (0.5 + 1) = { 0 0 2/3 }ᵀ
(d) The weight matrices remain the same for the next input vector, time t = 6 :
    V(6) = [ 0 1 0 ; 0 0 1 ] ,  W(6) = [ 0 0 ; 2/3 0 ; 0 2/3 ]
269
SC - ART1 Numerical example
Time t = 6 ;  IH = I6 = { 1 1 0 } ;  V = [ 0 1 0 ; 0 0 1 ] ,  W = [ 0 0 ; 2/3 0 ; 0 2/3 ]
    yj=1 = (1)(0) + (1)(2/3) + (0)(0) = 2/3 ;  yj=2 = 0 , so the winner is k = 1.
Vigilance test :
    r = || Vk=1 · X(t=6) || / || X(t=6) || = 1/2 = 0.5  >  ρ = 0.3
Input vector x(t=6) is accepted by F2 k=1 , which means x(6) ∈ A1 Cluster. Learning, component by component :
    X*K=1 = {0 1 0} x {1 1 0} = { 0 1 0 }
Similarity between X*K=1 = {0 1 0} and IH=6 = {1 1 0} is 1/2 = 0.5 > ρ, so the similarity test is true.
Update node k = 1 :
    vk=1(new) = { 0 1 0 }ᵀ ;  wk=1(t=7) = {0 1 0} / (0.5 + 1) = { 0 2/3 0 }ᵀ
• The weight matrices remain the same for the next input vector, time t = 7 :
    V(7) = [ 0 1 0 ; 0 0 1 ] ,  W(7) = [ 0 0 ; 2/3 0 ; 0 2/3 ]
273
SC - ART1 Numerical example
Time t = 7 ;  IH = I7 = { 1 1 1 } ;  V = [ 0 1 0 ; 0 0 1 ] ,  W = [ 0 0 ; 2/3 0 ; 0 2/3 ]
    yj=1 = 2/3 ;  yj=2 = 2/3 , an indecision tie; by the special rule the decision is K = 2.
Vigilance test :
    r = || Vk=2 · X(t=7) || / || X(t=7) || = 1/3 = 0.333  >  ρ = 0.3 , so resonance exists; start learning, component by component :
    X*K=2 = {0 0 1} x {1 1 1} = { 0 0 1 }
Similarity between X*K=2 = {0 0 1} and IH=7 = {1 1 1} is 1/3 = 0.333 > ρ, so the similarity test is true.
Update node k = 2 :
    vk=2(new) = { 0 0 1 }ᵀ ;  wk=2(t=8) = {0 0 1} / (0.5 + 1) = { 0 0 2/3 }ᵀ
- Update the weight matrices W(t) and V(t) for time t = 8 :
    V(8) = [ 0 1 0 ; 0 0 1 ] ,  W(8) = [ 0 0 ; 2/3 0 ; 0 2/3 ]
277
SC - ART1 Numerical example
■ Remarks
These two weight matrices, given below, were arrived at after all the patterns 1 to 7 were input one-by-one to the network, which adjusted the weights following the algorithm presented :
    Top-down  V : vj=1 = ( 0 1 0 ) ,  vj=2 = ( 0 0 1 )
    Bottom-up W : Wj=1 = ( 0 , 2/3 , 0 ) ,  Wj=2 = ( 0 , 0 , 2/3 )
279
SC - ART2
4.3 ART2
Carpenter and Grossberg later developed ART2 for clustering continuous or real-valued input vectors. The capability of recognizing analog patterns is a significant enhancement to the system. The differences between ART2 and ART1 are :
The learning laws of ART2 are simple though the network is complicated.
59
280
Fuzzy Set Theory
■ The word "fuzzy" means "vagueness". Fuzziness occurs when the boundary
of a piece of information is not clear-cut.
• Classical set theory allows the membership of an element in a set to be only crisp : the element either belongs (1) or does not belong (0).
- Example:
− For some people, age 25 is young, and for others, age 35 is young.
− Age 35 has some possibility of being young and usually depends on the
context in which it is being considered.
281
3. Introduction
However, our systems are unable to answer many questions. The reason is,
most systems are designed based upon classical set theory and two-valued
logic which is unable to cope with unreliable and incomplete information and
give expert opinions.
We want our systems to be able to cope with unreliable and incomplete information and give expert opinions. Fuzzy sets have been able to provide solutions to many real world problems.
Fuzzy Set theory is an extension of classical set theory where elements have
degrees of membership.
282
• Classical Set Theory
A = { a1 , a2 , a3 , . . . , an }
If the elements ai (i = 1, 2, . . , n) of a set A are a subset of the universal set X, then set A can be represented for all elements x ∈ X by its characteristic function
    A(x) = 1 if x ∈ A ,  0 otherwise                    Eq.(1)
i.e. the characteristic function takes the value 1 for those elements x that belong to set A, and the value 0 for those elements x that do not belong to set A.
The notations used to express this mathematically are
    A : X → [0, 1]
    A(x) = 1 if x ∈ A ,  0 otherwise                    Eq.(2)
− Thus in classical set theory A(x) has only the values 0 ('false') and 1 ('true').
284
SC - Fuzzy set theory - Introduction
− A Fuzzy Set is any set that allows its members to have different
degree of membership, called membership function, in the interval
[0 , 1].
− Fuzzy logic is derived from fuzzy set theory, dealing with reasoning that is approximate rather than precisely deduced from classical two-valued logic.
285
SC - Fuzzy set theory - Introduction
The characteristic function A(x) of Eq. (2) for the crisp set is generalized for the non-crisp (fuzzy) sets; the generalized function is called the membership function.
− The proposition of Fuzzy Sets are motivated by the need to capture and
represent real world data with uncertainty due to imprecise
measurement.
07
286
• Representation of Crisp and Non-Crisp Set
• Non-Crisp Representation of the notion of a "tall" person (plots of membership value, 0 to 1, against height : the crisp set jumps from 0 to 1 at a threshold, the non-crisp set rises gradually).
A student of height 1.79m would belong to both tall and not tall sets
with a particular degree of membership.
As the height increases the membership grade within the tall set would
increase whilst the membership grade within the not-tall set would
decrease.
287
288
SC - Fuzzy set theory - Introduction
■ Capturing Uncertainty
•
• The notation used for the membership function A(x) of a fuzzy set A is
    A : X → [0, 1]
• Each membership function maps elements of a given universal base set X, which is itself a crisp set, into real numbers in [0, 1].
■ Example
Fig. Membership functions of a crisp set C and a fuzzy set F — the crisp membership C(x) jumps between 0 and 1, while the fuzzy membership F(x) varies gradually (through values such as 0.5) over x.
■ In the case of crisp sets the members of a set are either in the set (membership 1) or not in the set (membership 0).
290
SC - Fuzzy set theory - Introduction
10
291
SC - Fuzzy set theory – Fuzzy Set
2. Fuzzy Set
A fuzzy set A, defined in the universal space X, is a function defined in X which assumes values in the range [0, 1].
The value A(x) is the membership grade of the element x in the fuzzy set A.
Example : Set SMALL in the set X consisting of the natural numbers 1 to 12.
Assume:
SMALL(1) = 1, SMALL(2) = 1, SMALL(3) = 0.9, SMALL(4) = 0.6, SMALL(5) = 0.4, SMALL(6) = 0.3, SMALL(7) = 0.2, SMALL(8) = 0.1, and SMALL(u) = 0 for u >= 9.
Set SMALL = {{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0}, {10, 0}, {11, 0}, {12, 0}}
Note that a fuzzy set can be defined precisely by associating with each x ,
its grade of membership in SMALL.
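A tiny sketch showing how the fuzzy set SMALL above can be represented in Python as a mapping from each element of the universal space {1..12} to its membership grade (the names are illustrative, not a library API).

SMALL = {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.6, 5: 0.4, 6: 0.3,
         7: 0.2, 8: 0.1, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0}

def membership(fuzzy_set, x):
    """Grade of membership of x; elements outside the listed support get grade 0."""
    return fuzzy_set.get(x, 0.0)

print(membership(SMALL, 3))   # 0.9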
11
293
SC - Fuzzy set theory – Fuzzy Set
Originally the universal space for fuzzy sets in fuzzy logic was defined only
on the integers. Now, the universal space for fuzzy sets and fuzzy
relations is defined with three numbers.
The first two numbers specify the start and end of the universal space,
and the third argument specifies the increment between elements. This
gives the user more flexibility in choosing the universal space.
12
294
SC - Fuzzy set theory – Fuzzy Membership
in a fuzzy set A.
The Graphic Interpretation of fuzzy membership for the fuzzy sets : Small,
Prime Numbers, Universal-space, Finite and Infinite UniversalSpace, and
Empty are illustrated in the next few slides.
13
295
SC - Fuzzy set theory – Fuzzy Membership
The fuzzy set SMALL of small numbers, defined in the universal space X = {1, 2, . . , 12}, is written as
SMALL = FuzzySet [{{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0}, {10, 0}, {11, 0}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. Graphic interpretation of the fuzzy set SMALL — membership grade (0 to 1) plotted against x = 1 . . 12.
14
297
SC - Fuzzy set theory – Fuzzy Membership
PRIME = FuzzySet [{{1, 0}, {2, 1}, {3, 1}, {4, 0}, {5, 1}, {6, 0}, {7, 1}, {8, 0}, {9, 0}, {10, 0}, {11, 1}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. Graphic interpretation of the fuzzy set PRIME — membership grade plotted against x = 1 . . 12.
15
299
SC - Fuzzy set theory – Fuzzy Membership
In any application of set or fuzzy set theory, all sets are subsets of a fixed universal space.
UNIVERSALSPACE = FuzzySet [{{1, 1}, {2, 1}, {3, 1}, {4, 1}, {5, 1}, {6, 1}, {7, 1}, {8, 1}, {9, 1}, {10, 1}, {11, 1}, {12, 1}} , UniversalSpace → {1, 12, 1}]
Fig. Graphic interpretation of the UNIVERSALSPACE — membership grade 1 for every x = 1 . . 12.
16
301
SC - Fuzzy set theory – Fuzzy Membership
Examples:
17
302
SC - Fuzzy set theory – Fuzzy Membership
EMPTY = FuzzySet [{{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}, {6, 0}, {7, 0}, {8, 0}, {9, 0}, {10, 0}, {11, 0}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. Graphic interpretation of the EMPTY set — membership grade 0 for every x = 1 . . 12.
304
SC - Fuzzy set theory – Fuzzy Operation
Fuzzy set operations are operations on fuzzy sets; they are generalizations of the crisp set operations. Zadeh [1965] formulated the fuzzy set theory in terms of the standard operations : Complement, Union, Intersection, and Difference.
Inclusion :      FuzzyInclude [VERYSMALL, SMALL]
Equality :       FuzzyEQUALITY [SMALL, STILLSMALL]
Complement :     FuzzyNOTSMALL = FuzzyCompliment [Small]
Union :          FuzzyUNION = [SMALL ∪ MEDIUM]
Intersection :   FUZZYINTERSECTION = [SMALL ∩ MEDIUM]
19
305
SC - Fuzzy set theory – Fuzzy Operation
• Inclusion
The fuzzy set A is included in the fuzzy set B if and only if, for every x in the set X, A(x) ≤ B(x).
Example :
SMALL = FuzzySet [{{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0}, {10, 0}, {11, 0}, {12, 0}} , UniversalSpace → {1, 12, 1}]
VERYSMALL = FuzzySet [{{1, 1}, {2, 0.8}, {3, 0.7}, {4, 0.4}, {5, 0.2}, {6, 0.1}, {7, 0}, {8, 0}, {9, 0}, {10, 0}, {11, 0}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. The Fuzzy Operation : Inclusion — membership grades of B = SMALL and A = VERYSMALL plotted over X, with A(x) ≤ B(x) for every x.
20
307
SC - Fuzzy set theory – Fuzzy Operation
• Comparability
if one of the fuzzy sets is a subset of the other set, they are comparable.
Example 1:
Example 2 :
D is not a subset of C.
308
Property Related to Inclusion :
21
309
SC - Fuzzy set theory – Fuzzy Operation
• Equality
Let A and B be fuzzy sets defined in the same space X. Then A and B are equal, denoted A = B, if and only if A(x) = B(x) for all x in X.
Example.
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
STILLSMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4},
{6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
Fig. The fuzzy operation Equality — the membership plots of A = SMALL and B = STILLSMALL coincide over X.
Note : If the equality A(x) = B(x) is not satisfied even for one element x in the set X, then we say that A is not equal to B.
22
311
SC - Fuzzy set theory – Fuzzy Operation
• Complement
Let A and B be fuzzy sets defined in the same space X. Then the fuzzy set B is a complement of the fuzzy set A if and only if, for every x in X, B(x) = 1 − A(x).
Example 1.
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
NOTSMALL = FuzzySet {{1, 0 }, {2, 0 }, {3, 0.1}, {4, 0.4}, {5, 0.6}, {6, 0.7},
{7, 0.8}, {8, 0.9}, {9, 1 }, {10, 1 }, {11, 1}, {12, 1}}
Fig. The fuzzy operation Complement — the membership plots of A = SMALL and Ac = NOTSMALL over X, with Ac(x) = 1 − A(x).
23
313
SC - Fuzzy set theory – Fuzzy Operation
Example 2.
The empty set Φ and the universal set X, as fuzzy sets, are complements of one another :
    Φ' = X ,  X' = Φ
Empty = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0}, {5, 0}, {6, 0},
{7, 0}, {8, 0}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
Universal = FuzzySet {{1, 1 }, {2, 1 }, {3, 1}, {4, 1}, {5, 1}, {6, 1},
{7, 1}, {8, 1}, {9, 1 }, {10, 1 }, {11, 1}, {12, 1}}
Fig. The fuzzy operation Complement — the Empty set and the Universal set are complements of each other.
314
24
315
SC - Fuzzy set theory – Fuzzy Operation
• Union
The union is defined as the smallest fuzzy set that contains both A and B. The union of A and B is denoted by A ∪ B, with membership function
    (A ∪ B)(x) = max [ A(x) , B(x) ] for all x ∈ X
Example : A(x) = 0.6 and B(x) = 0.4  ∴ (A ∪ B)(x) = max [0.6, 0.4] = 0.6
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
MEDIUM = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8},
{7, 1}, {8, 1}, {9, 0.7 }, {10, 0.4 }, {11, 0.1}, {12, 0}}
FUZZYUNION = [SMALL ∪ MEDIUM]
= FuzzySet [{{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.5}, {6, 0.8}, {7, 1}, {8, 1}, {9, 0.7}, {10, 0.4}, {11, 0.1}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. FuzzyPlot [UNION] — membership grades of SMALL ∪ MEDIUM over X.
The notion of the union is closely related to that of the connective "or".
25
317
SC - Fuzzy set theory – Fuzzy Operation
• Intersection
intersection operation :
Fuzzy Intersection : (A
∩ B)(x) = min [A(x), B(x)] for all x ∈ X
A(x) = 0.6 and B(x) = 0.4 ∴ (A ∩ B)(x) = min [0.6, 0.4] = 0.4
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
MEDIUM = FuzzySet {{1, 0}, {2, 0}, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8}, {7, 1}, {8, 1}, {9, 0.7}, {10, 0.4}, {11, 0.1}, {12, 0}}
FUZZYINTERSECTION = [SMALL ∩ MEDIUM]
= FuzzySet [{{1, 0}, {2, 0}, {3, 0}, {4, 0.2}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0}, {10, 0}, {11, 0}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. FuzzyPlot [INTERSECTION] — membership grades of SMALL ∩ MEDIUM over X.
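A minimal sketch of the standard fuzzy set operations over a common universal space, using the min/max definitions given above; SMALL and MEDIUM carry the values from the examples, and the function names are illustrative.

SMALL  = {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.6, 5: 0.4, 6: 0.3,
          7: 0.2, 8: 0.1, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0}
MEDIUM = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.2, 5: 0.5, 6: 0.8,
          7: 1.0, 8: 1.0, 9: 0.7, 10: 0.4, 11: 0.1, 12: 0.0}

def f_union(A, B):         # (A ∪ B)(x) = max[A(x), B(x)]
    return {x: max(A[x], B[x]) for x in A}

def f_intersection(A, B):  # (A ∩ B)(x) = min[A(x), B(x)]
    return {x: min(A[x], B[x]) for x in A}

def f_complement(A):       # A'(x) = 1 - A(x)
    return {x: 1.0 - A[x] for x in A}

def f_difference(A, B):    # A - B = A ∩ B'
    return f_intersection(A, f_complement(B))

# e.g. f_union(SMALL, MEDIUM)[5] == 0.5 and f_intersection(SMALL, MEDIUM)[5] == 0.4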
26
319
SC - Fuzzy set theory – Fuzzy Operation
■ Difference
The fuzzy difference of two fuzzy sets A and B is defined as A − B = A ∩ B'.
Example : FUZZYDIFFERENCE = [MEDIUM ∩ SMALL']
MEDIUM = FuzzySet {{1, 0}, {2, 0}, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8}, {7, 1}, {8, 1}, {9, 0.7}, {10, 0.4}, {11, 0.1}, {12, 0}}
NOTSMALL (= SMALL') = FuzzySet {{1, 0}, {2, 0}, {3, 0.1}, {4, 0.4}, {5, 0.6}, {6, 0.7}, {7, 0.8}, {8, 0.9}, {9, 1}, {10, 1}, {11, 1}, {12, 1}}
FUZZYDIFFERENCE = [MEDIUM ∩ SMALL']
= FuzzySet [{{1, 0}, {2, 0}, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.7}, {7, 0.8}, {8, 0.9}, {9, 0.7}, {10, 0.4}, {11, 0.1}, {12, 0}} , UniversalSpace → {1, 12, 1}]
Fig. FuzzyPlot [DIFFERENCE] — membership grades of MEDIUM ∩ SMALL' over X.
27
321
SC - Fuzzy set theory – Fuzzy Properties
■ Identity :
    A ∪ Φ = A
    A ∪ X = X
    input = Equality [SMALL ∪ UniversalSpace , UniversalSpace] ,  output = True
■ Idempotence :
    A ∪ A = A ,  output = True
■ Commutativity :
    A ∪ B = B ∪ A ,  output = True
323
SC - Fuzzy set theory – Fuzzy Properties
■ Associativity :
    A ∪ (B ∪ C) = (A ∪ B) ∪ C ,  output = True
    Small  = FuzzySet {{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0.7}, {10, 0.4}, {11, 0}, {12, 0}}
    Medium = FuzzySet {{1, 0}, {2, 0}, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8}, {7, 1}, {8, 1}, {9, 0}, {10, 0}, {11, 0.1}, {12, 0}}
    Big    = FuzzySet {{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}, {6, 0.1}, {7, 0.2}, {8, 0.4}, {9, 0.6}, {10, 0.8}, {11, 1}, {12, 1}}
    Small ∪ (Medium ∪ Big) = FuzzySet {{1, 1}, {2, 1}, {3, 0.9}, {4, 0.6}, {5, 0.5}, {6, 0.8}, {7, 1}, {8, 1}, {9, 0.7}, {10, 0.8}, {11, 1}, {12, 1}}
29
325
SC - Fuzzy set theory – Fuzzy Properties
■ Absorption by the Empty Set :
    A ∩ Φ = Φ ,  output = True
■ Identity :
    A ∩ X = A ,  output = True
■ Idempotence :
    A ∩ A = A ,  output = True
■ Commutativity :
    A ∩ B = B ∩ A ,  output = True
■ Associativity :
    A ∩ (B ∩ C) = (A ∩ B) ∩ C ,  output = True
30
327
SC - Fuzzy set theory – Fuzzy Properties
■ Additional Properties
• Distributivity :  A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) ,  output = True
• Distributivity :  A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) ,  output = True
• Law of excluded middle :  A ∪ A' = X
• Law of contradiction :    A ∩ A' = Φ
31
329
SC - Fuzzy set theory – Fuzzy Properties
Let A and B be two crisp sets in the universe of discourse X and Y..
The Cartesian product of A and B is denoted by A x B
Defined as A x B = { (a , b) │ a ∈ A , b ∈ B }
Note : Generally A x B ≠ B x A
Example : Let A = {a, b, c} and B = {1, 2}
then A x B = { (a , 1), (a , 2), (b , 1), (b , 2), (c , 1), (c , 2) }
(shown graphically as six grid points, with A on the horizontal axis and B on the vertical axis).
• Cartesian product of two Fuzzy Sets
Let A and B be fuzzy sets defined on X and Y respectively. The membership grade of a pair (x , y) in A x B is
    A x B (x , y) = min [ A(x) , B(y) ]   or   A x B (x , y) = A(x) B(y)
Thus the Cartesian product A x B is a fuzzy set of ordered pairs (x , y), for all x ∈ X and y ∈ Y, with the grade of membership of (x , y) given by the above.
32
331
SC - Fuzzy set theory – Fuzzy Relations
- Fuzzy Relations
− Fuzzy relations offer the capability to capture the uncertainty and vagueness
in relations between sets and elements of a set.
In this section, first the fuzzy relation is defined and then expressing fuzzy
relations in terms of matrices and graphical visualizations. Later the properties
of fuzzy relations and operations that can be performed with fuzzy relations are
illustrated.
33
332
SC - Fuzzy set theory – Fuzzy Relations
A x B = { (x , y) | x ∈ A, y ∈ B } , where A and B are subsets of the universal sets U1 and U2.
A fuzzy relation on A x B, denoted by R or R(x , y), is defined as the set
    R = { ((x , y) , R(x , y)) | (x , y) ∈ A x B , R(x , y) ∈ [0, 1] }
Note : a fuzzy relation is a fuzzy set defined on a Cartesian product; it extends the membership concept from the 2-D space (x , R(x)) to the 3-D space ((x , y) , R(x , y)).
34
334
SC - Fuzzy set theory – Fuzzy Relations
Example : a fuzzy relation R(X, Y) on X = {x1, x2, x3}, Y = {y1, y2, y3} can be given by a membership matrix, e.g.
        y1    y2    y3
    x1  0     0.1   0.2
    x2  .     .     .
    x3  1     0.6   0.2
Assuming x1 = 1, x2 = 2, x3 = 3 and y1 = 1, y2 = 2, y3 = 3, the relation can be graphically represented by points in 3-D space (X, Y, R).
336
SC - Fuzzy set theory – Fuzzy Relations
− The first item is a list containing element and membership grade pairs,
{{v1, w1}, R11}, {{ v1, w2}, R12}, ... , {{ vn, wm}, Rnm}}.
where { v1, w1}, { v1, w2}, ... , { vn, wm} are the elements of the relation are
defined as ordered pairs, and { R11 , R12 , ... , Rnm} are the membership grades
of the elements of the relation that range from 0 to 1, inclusive.
− The second item is the universal space; for relations, the universal
space consists of a pair of ordered pairs,
where the first pair defines the universal space for the first set and the second
pair defines the universal space for the second set.
■ = FuzzyRelation [{{{1, 1}, 1}, {{1, 2}, 0.2}, {{1, 3}, 0.7}, {{1, 4}, 0}, {{2,
1}, 0.7}, {{2, 2}, 1}, {{2, 3}, 0.4}, {{2, 4}, 0.8}, {{3, 1},
0}, {{3, 2}, 0.6}, {{3, 3}, 0.3}, {{3, 4}, 0.5},
UniversalSpace → {{1, 3, 1}, {1, 4, 1}}]
This relation can be represented in the following two forms shown below
337
Elements of fuzzy relation are ordered pairs {vi , wj}, where vi is first and
wj is second element. The membership grades of the elements are
represented by the heights of the vertical lines.
36
338
SC - Fuzzy set theory – Fuzzy Relations
The first, the second and the total projections of a fuzzy relation R(x , y) are defined as
    R(1) = { ( x , R(1)(x) ) } where R(1)(x) = max over Y of R(x , y)
    R(2) = { ( y , R(2)(y) ) } where R(2)(y) = max over X of R(x , y)
    R(T) = max over X and Y of { R(x , y) | (x , y) ∈ A x B }
340
SC - Fuzzy set theory – Fuzzy Relations
The fuzzy relation R together with its first, second and total projections can be tabulated by adding a column R(1) (row maxima) and a row R(2) (column maxima); the total projection is the overall maximum.
Note :
For R(1), "max" means the maximum with respect to y while x is considered fixed.
For R(2), "max" means the maximum with respect to x while y is considered fixed.
Fig. Fuzzy plots of the 1st projection R(1) and the 2nd projection R(2) (membership grade against x and against y respectively).
342
SC - Fuzzy set theory – Fuzzy Relations
Consider two fuzzy relations R1 and R2 defined on pairs of variables (x , y) and (y , z), where x ∈ A, y ∈ B, z ∈ C.
◊ Max-Min Composition
    R1 ο R2 = { ((x , z) , max over y ( min ( R1(x , y) , R2(y , z) ) ) ) } ,  (x , z) ∈ A x C , y ∈ B
Thus R1 ο R2 is a relation in the domain A x C.
343
SC - Fuzzy set theory – Fuzzy Relations
Example : let R1(x , y) and R2(y , z) be given as membership matrices over x ∈ {x1, x2}, y ∈ {y1, y2, y3}, z ∈ {z1, z2, z3} (the number of columns of R1 equals the number of rows of R2).
Consider row x1 and column z1, i.e. the pair (x1 , z1); for all yj compute
    min ( R1(x1 , y1) , R2(y1 , z1) ) = min (0.1, 0.8) = 0.1
    min ( R1(x1 , y2) , R2(y2 , z1) ) = min (0.3, 0.2) = 0.2
    min ( R1(x1 , y3) , R2(y3 , z1) ) = min (0, 0.5) = 0
For x = x1 , z = z1 , y = yj , j = 1, 2, 3 :
    { (x1 , z1) , max (0.1, 0.2, 0) } , i.e. { (x1 , z1) , 0.2 }
Proceeding in the same way for (x1 , z2), (x1 , z3), (x2 , z1), (x2 , z2), (x2 , z3) gives the composed relation R1 ο R2 over {x1, x2} x {z1, z2, z3}.
40
345
SC - Fuzzy set theory – Fuzzy Relations
◊ Min-Max Composition
    R1 □ R2 = { ((x , z) , min over y ( max ( R1(x , y) , R2(y , z) ) ) ) } ,  (x , z) ∈ A x C , y ∈ B
For the same relations R1(x , y) and R2(y , z), proceeding as in the max-min composition but interchanging the roles of min and max gives the composed relation R1 □ R2 in the domain A x C; here, for example, the x1 row of R1 □ R2 is ( 0.3 , 0 , 0.1 ).
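A short sketch of max-min composition for fuzzy relations stored as nested dictionaries R1[x][y] and R2[y][z]; the R1 values are those used above, while the unstated entries of R2 (marked in the comment) are assumptions for the sake of a runnable example.

def max_min_composition(R1, R2):
    zs = next(iter(R2.values())).keys()
    return {x: {z: max(min(R1[x][y], R2[y][z]) for y in R1[x]) for z in zs}
            for x in R1}

R1 = {'x1': {'y1': 0.1, 'y2': 0.3, 'y3': 0.0},
      'x2': {'y1': 0.8, 'y2': 1.0, 'y3': 0.3}}
R2 = {'y1': {'z1': 0.8, 'z2': 0.0, 'z3': 0.0},   # y1 row partly assumed (only 0.8 appears above)
      'y2': {'z1': 0.2, 'z2': 1.0, 'z3': 0.6},
      'y3': {'z1': 0.5, 'z2': 0.0, 'z3': 0.4}}

print(max_min_composition(R1, R2)['x1']['z1'])   # 0.2, as in the worked step above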
41
347
Fuzzy Systems
Real-world knowledge exists in two forms : objective knowledge (equations, measurements) and subjective knowledge (linguistic, vague), which is often impossible to quantify.
Fuzzy Logic can coordinate these two forms of knowledge in a logical way.
• Many real world problems have been modeled, simulated, and replicated
with the help of fuzzy systems.
• Expert Systems design have become easy because their domains are
inherently fuzzy and can now be handled better;
348
examples : Decision-support systems, Financial planners, Diagnostic
system, and Meteorological system.
03
349
Sc – Fuzzy System Introduction
• Introduction
Any system that uses Fuzzy mathematics may be viewed as Fuzzy system.
The Fuzzy Set Theory - membership function, operations, properties and the
relations have been described in previous lectures. These are the prerequisites
for understanding Fuzzy Systems. The applications of Fuzzy set theory is Fuzzy
logic which is covered in this section.
Here the emphasis is on the design of fuzzy system and fuzzy controller in a
04
350
Sc – Fuzzy System Introduction
• Fuzzy System
Fig. Fuzzy System — input variables X1 . . Xn pass through Fuzzification, Fuzzy Inferencing (driven by the Fuzzy Rule Base and the Membership Functions) and Defuzzification to produce the output variables Y1 . . Ym.
− Input Vector : X = [x1 , x2 , . . , xn]ᵀ are crisp values, which are transformed into fuzzy sets in the fuzzification block.
− Output Vector : Y = [y1 , y2 , . . , ym]ᵀ comes out from the defuzzification block, which transforms an output fuzzy set back to a crisp value.
If (x is A ) AND (y is B ) . . . . . . THEN (z is C)
05
352
Sc – Fuzzy System – Fuzzy logic
1. Fuzzy Logic
A simple form of logic, called a two-valued logic is the study of "truth tables"
and logic circuits. Here the possible values are true as 1, and false as 0.
This simple two-valued logic is generalized and called fuzzy logic which treats
"truth" as a continuous quantity ranging from 0 to 1.
Definition : Fuzzy logic (FL) is derived from fuzzy set theory dealing with
reasoning that is approximate rather than precisely deduced from classical two-
valued logic.
06
353
Sc – Fuzzy System – Fuzzy logic
Logic is used to represent simple facts. Logic defines the ways of putting
symbols together to form sentences that represent facts. Sentences are
either true or false but not both are called propositions.
Examples :
    "x = x"  −  not a useful proposition (the symbol x has no assigned meaning by itself).
    Propositions : (a) The sky is blue. (b) Snow is cold. (c) 12 * 12 = 144.
354
Propositional logic : It is fundamental to all logic.
‡ Example ;
Sentence "Grass is green";
Proposition “yes”
07
355
Sc – Fuzzy System – Fuzzy logic
Statement : A simple statement is one that does not contain any other
statement as a part. A compound statement is one that has two or more
simple statements as parts called components.
The common connectives between two statements p and q are :
    Conjunction  p ∧ q   ( · , && , & , AND )  :  "both p and q are true"
    Disjunction  p ∨ q   ( + , || , | , OR )   :  "either p is true, or q is true, or both"
    Equivalence  p ↔ q                          :  "p and q are both true or both false"
356
Sc – Fuzzy System – Fuzzy logic
• Truth Value
p q ¬p ¬q p ∧ q p v q p→ q p↔ q q→ p
T T F F T T T T T
T F F T F T F F T
F T T F F T T F F
F F T T F F T T T
09
357
Sc – Fuzzy System – Fuzzy logic
■ Tautology
A tautology is a proposition formed by combining other propositions (p, q, r, . . ) which is true regardless of the truth or falsehood of p, q, r, . . . Two important tautologies involving the implication are
    (p → q) ↔ ¬ [ p ∧ (¬q) ]   and   (p → q) ↔ (¬p) ∨ q
A proof of these tautologies, using the truth table, is given below :
    p   q   p→q   ¬q   p∧(¬q)   ¬[p∧(¬q)]   ¬p   ¬p∨q
    T   T    T     F      F          T        F     T
    T   F    F     T      T          F        F     F
    F   T    T     F      F          T        T     T
    F   F    T     T      F          T        T     T
Note :
- The importance of these tautologies is that they express the membership function for p → q in terms of the membership functions of either the propositions p and ¬q, or ¬p and q.
359
Sc – Fuzzy System – Fuzzy logic
■ Equivalences
Some mathematical equivalence between Logic and Set theory and the
correspondence between Logic and Boolean algebra (0, 1) are given
below.
    Logic        Boolean algebra (0, 1) / Set theory
    T            1
    F            0
    ∧            x  ,  ∩
    ∨            +  ,  ∪
    ¬            ′  (i.e. complement ―)
    ↔            =
    p, q, r      a, b, c
360
Sc – Fuzzy System – Fuzzy logic
Using these facts and the equivalence between logic and set theory, we can obtain membership functions for p → q (x , y) :
    p→q (x , y) = 1 − min [ p(x) , 1 − q(y) ]          Eq (1)
    p→q (x , y) = max [ 1 − p(x) , q(y) ]              Eq (2)
Table-2 : evaluation at the crisp values 0 and 1
    p(x)  q(y)  1−q(y)  min[p(x), 1−q(y)]   Eq(1)   Eq(2)
     1     1      0            0              1       1
     1     0      1            1              0       0
     0     1      0            0              1       1
     0     0      1            0              1       1
Note :
■ The entries in the last two columns of table-2 agree with the entries in table-1 for p → q (the proof of the tautologies), reading T as 1 and F as 0.
■ The implication membership functions of Eq.1 and Eq.2 are not the only ones that give agreement with p → q. Others are :
    p→q (x , y) = 1 − p(x) ( 1 − q(y) )                Eq (3)
    p→q (x , y) = min [ 1 , 1 − p(x) + q(y) ]          Eq (4)
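A small sketch of the four implication membership functions Eq.(1)-Eq.(4) above, evaluated pointwise for membership grades p, q in [0, 1]; it assumes Eq.(1) and Eq.(2) have the forms reconstructed above from the two tautologies.

def imp_eq1(p, q):  # 1 - min[p, 1 - q]      (from  p->q <-> not[p and not q])
    return 1.0 - min(p, 1.0 - q)

def imp_eq2(p, q):  # max[1 - p, q]          (from  p->q <-> not p or q)
    return max(1.0 - p, q)

def imp_eq3(p, q):  # 1 - p(1 - q)
    return 1.0 - p * (1.0 - q)

def imp_eq4(p, q):  # min[1, 1 - p + q]
    return min(1.0, 1.0 - p + q)

# at the crisp values 0/1 all four agree with the truth table of p -> q
assert all(f(1, 0) == 0 and f(0, 0) == 1 for f in (imp_eq1, imp_eq2, imp_eq3, imp_eq4))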
362
Sc – Fuzzy System – Fuzzy logic
Modus Ponens and Modus Tollens
Modus Ponens is associated with the implication "A implies B" [A → B]. In terms of propositions p and q, Modus Ponens is expressed as
    ( p ∧ (p → q) ) → q
Modus Tollens is expressed as
    ( ¬q ∧ (p → q) ) → ¬p
363
Sc – Fuzzy System – Fuzzy logic
Like the extension of crisp set theory to fuzzy set theory, the extension of
crisp logic is made by replacing the bivalent membership functions of the
crisp logic with the fuzzy membership functions.
In crisp logic, the truth value acquired by the proposition are 2-valued,
namely true as 1 and false as 0.
In fuzzy logic, the truth values are multi-valued, such as absolutely true, partially true, absolutely false, etc., represented numerically as a real value between 0 and 1.
Note : The fuzzy variables in fuzzy sets, fuzzy propositions, fuzzy relations
~
etc are represented usually using symbol ~ as P but for the purpose of
easy to write it is always represented as P .
14
364
Sc – Fuzzy System – Fuzzy logic
- Recaps
1. Fuzzy Intersection operator ∩ (AND connective) applied to two fuzzy sets A and B with the membership functions A(x) and B(x), based on min/max operations, is A ∩ B = min [ A(x) , B(x) ] , x ∈ X   (Eq. 01)
2. Fuzzy Intersection operator ∩ applied to A and B, based on the algebraic product, is A ∩ B = A(x) B(x) , x ∈ X   (Eq. 02)
3. Fuzzy Union operator ∪ (OR connective) applied to two fuzzy sets A and B with the membership functions A(x) and B(x), based on min/max operations, is A ∪ B = max [ A(x) , B(x) ] , x ∈ X   (Eq. 03)
4. Fuzzy Union operator ∪ applied to A and B, based on the algebraic sum, is A ∪ B = A(x) + B(x) − A(x) B(x) , x ∈ X   (Eq. 04)
5. Fuzzy Complement applied to a fuzzy set A with the membership function A(x) is A' = 1 − A(x) , x ∈ X   (Eq. 05)
6. The max-min composition of fuzzy relations R1(x , y) and R2(y , z) gives a relation R1 ο R2 in the domain A x C.
15
366
Sc – Fuzzy System – Fuzzy logic
- Fuzzy Propositions
A fuzzy proposition is a statement whose truth value is a number between 0 and 1.
Example :  P : "Ram is honest" , with a truth value T(P) in [0, 1].
367
Sc – Fuzzy System – Fuzzy logic
■ Fuzzy Connectives
    Negation     ¬   :  ¬P      T(¬P) = 1 − T(P)
    Disjunction  ∨   :  P ∨ Q   T(P ∨ Q) = max [ T(P) , T(Q) ]
    Conjunction  ∧   :  P ∧ Q   T(P ∧ Q) = min [ T(P) , T(Q) ]
Here P, Q are fuzzy propositions and T(P), T(Q) are their truth values.
− The fuzzy implication "IF x is A THEN y is B" is equivalent to the relation
    R = (A x B) ∪ (¬A x Y)
− For the compound implication statement "IF x is A THEN y is B, ELSE y is C"
    R = (A x B) ∪ (¬A x C)
17
369
Sc – Fuzzy System – Fuzzy logic
Example : Let P : "Mary is efficient", T(P) = 0.8 , and Q : "Ram is efficient", T(Q) = 0.65. Then
    P ∧ Q : Mary is efficient and so is Ram, i.e. T(P ∧ Q) = min (T(P), T(Q)) = min (0.8, 0.65) = 0.65
    P ∨ Q : Either Mary or Ram is efficient, i.e. T(P ∨ Q) = max (T(P), T(Q)) = max (0.8, 0.65) = 0.8
18
370
Sc – Fuzzy System – Fuzzy logic
Example : Let X = {a, b, c, d} and Y = {1, 2, 3, 4}, with a fuzzy set A defined on X and fuzzy sets B, C defined on Y. Determine the implication relations
• If x is A THEN y is B
• If x is A THEN y is B Else y is C
Solution :
"IF x is A THEN y is B" is equivalent to R = (A x B) ∪ (¬A x Y). Computing the two terms for all x in the set X gives two matrices over X x Y (only the rows for a and d are recoverable here) :
    A x B :   a → (0, 0, 0, 0) ,  d → (0.2, 1, 0.8, 0)
    ¬A x Y :  a → (1, 1, 1, 1) ,  d → (0, 0, 0, 0)
Therefore R = (A x B) ∪ (¬A x Y), taking the element-wise max, gives
    R :       a → (1, 1, 1, 1) ,  d → (0.2, 1, 0.8, 0)
19
372
Sc – Fuzzy System – Fuzzy logic
Similarly, for "IF x is A THEN y is B Else y is C", R = (A x B) ∪ (¬A x C). With the same A and B, and C defined on Y, the two terms give (rows a and d) :
    A x B :   a → (0, 0, 0, 0) ,  d → (0.2, 1, 0.8, 0)
    ¬A x C :  a → (0, 0.4, 1, 0.8) ,  d → (0, 0, 0, 0)
Therefore R = (A x B) ∪ (¬A x C), taking the element-wise max, gives
    R :       a → (0, 0.4, 1, 0.8) ,  d → (0.2, 1, 0.8, 0)
20
374
Sc – Fuzzy System – Fuzzy logic
3. Fuzzy Quantifiers
A fuzzy quantifier expresses an imprecise quantity. There are two classes :
− Absolute quantifiers, e.g. "about 10", "more than 100", and
− Relative quantifiers, e.g. "almost all", "about half", "most".
375
Sc – Fuzzy System – Fuzzification
ƒ Fuzzification
Fuzzification converts a crisp input value into membership grades of the linguistic fuzzy sets defined over the input variable.
Example 1 : Speed X0 = 70 km/h — the crisp speed is characterized by its grades in the fuzzy sets defined over speed (V Low, Low, Medium, High, V High).
Example 2 : Speed X0 = 40 km/h — the figure (omitted) shows the fuzzification of the crisp value 40 km/h against the Low and Medium speed fuzzy sets, characterizing the car speed by two grades, one in each set.
377
Sc – Fuzzy System – Fuzzy Inference
Fuzzy Inference
Fuzzy Inferencing combines - the facts obtained from the fuzzification with the
rule base, and then conducts the fuzzy reasoning process.
23
378
Sc – Fuzzy System – Fuzzy Inference
Two fuzzy inference (approximate reasoning) rules are used :
− Fuzzy Modus Ponens :
    Premise 1 : IF x is A THEN y is B ;  Premise 2 : x is A ;  Conclusion : y is B.
− Fuzzy Modus Tollens :
    Premise 1 : IF x is A THEN y is B ;  Premise 2 : y is ¬B ;  Conclusion : x is ¬A.
380
Sc – Fuzzy System – Fuzzy Inference
Example :
Apply the fuzzy Modus Ponens rule to deduce that the Rotation is Quite Slow, given :
(i) If the temperature is High then the rotation is Slow, and (ii) the temperature is Very High.
Let H (High), VH (Very High), S (Slow) and QS (Quite Slow) indicate the associated fuzzy sets.
Let the set of temperatures be X = {30, 40, 50, 60, 70, 80, 90, 100}, the set of rotations per minute be Y = {10, 20, 30, 40, 50, 60}, and
    R(x , y) = max ( H x S , ¬H x Y )
H x S is non-zero only for the high temperatures (e.g. the row for 80 is ( 0 0 0.8 1 0.6 0 )), while ¬H x Y has rows of 1s for the non-high temperatures. Combining them gives R(x , Y), whose rows for 30, 40, 50, 60 and 100 are ( 1 1 1 1 1 1 ) and whose rows for 70 and 80 are ( 0 0 0.8 1 0.6 0 ).
Then, with VH = [ 0 0 0 0 0 0 0.9 1 ] over X,
    QS = VH ο R(x , y) = [ 1 1 1 1 1 1 ]
where linguistic variables xi, yj take the values of fuzzy sets Ai and Bj
respectively.
Example :
IF there is "heavy" rain and "strong" winds THEN there must be "severe" flood warnings.
Here heavy, strong, and severe are fuzzy sets qualifying the variables rain, wind, and flood warning respectively.
If the conclusion C to be drawn from a rule base R is the conjunction of the individual consequents of each rule, then
    C = C1 ∩ C2 ∩ . . . ∩ Cn , where  c(y) = min ( c1(y), c2(y), . . , cn(y) ) ,  ∀ y ∈ Y
On the other hand, if the conclusion C to be drawn from a rule base R is the disjunction of the individual consequents of each rule, then
    C = C1 ∪ C2 ∪ . . . ∪ Cn , where  c(y) = max ( c1(y), c2(y), . . , cn(y) ) ,  ∀ y ∈ Y
where Y is the universe of discourse.
28
384
• Defuzzification
Defuzzification transforms the output fuzzy set into a single crisp value. Widely used methods are :
− Centroid method, − Center of sums, − Mean of maxima.
Centroid method : the crisp output x* is the center of gravity of the output membership function μ(x) :
    x* = ( Σ i=1,n xi μ(xi) ) / ( Σ i=1,n μ(xi) )
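A minimal sketch of centroid defuzzification for a discretised output fuzzy set: xs are sample points of the output variable and mus their membership grades (the example values are illustrative).

def centroid(xs, mus):
    num = sum(x * m for x, m in zip(xs, mus))
    den = sum(mus)
    return num / den if den else None      # undefined if all grades are zero

# example: a roughly triangular output set sampled at a few points
print(centroid([10, 20, 30, 40, 50], [0.0, 0.3, 1.0, 0.3, 0.0]))   # 30.0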
385
Genetic Algorithms & Modeling
03
386
SC – GA - Introduction
• Introduction
Solving a problem means looking for the solution which is best among others. Unlike older AI systems, GAs do not break easily when the inputs change slightly or in the presence of reasonable noise.
387
1.1 Optimization
• In a panel design, for example, we want to limit the weight and put constraints on its shape.
388
• Optimization Methods
Optimization aims at design goals such as low cost, high performance and low loss.
The optimization methods are broadly classified into Linear Programming and Non-Linear Programming methods. Each of these is briefly discussed below, indicating the nature of the problems to which it is more applicable.
389
■ Linear Programming
− the optimal solution, is the one that minimizes (or maximizes) the
objective function.
390
Enumerative search goes through every point (one point at a time )
related to the function's domain space. At each point, all possible
solutions are generated and tested to find optimum solution. It is easy to
implement but usually require significant computation. In the field of
artificial intelligence, the enumerative methods are subdivided into two
categories:
09
391
SC – GA - Introduction
The two other search methodologies shown below, the Classical and the Enumerative methods, are first briefly explained. Later the Stochastic methods are discussed in detail. All these methods belong to non-linear search.
Fig. Search optimization techniques — Classical (calculus-based), Enumerative, and Guided random (stochastic) methods, the last including Evolutionary algorithms, Genetic algorithms and Simulated annealing.
393
SC – GA - Introduction
• Indirect methods :
394
11
395
SC – GA - Introduction
■ Enumerative Search
Here the search goes through every point (one point at a time) related to
the function's domain space.
− At each point, all possible solutions are generated and tested to find
optimum solution.
• Informed methods :
396
Next slide shows, the taxonomy of enumerative search in AI domain.
12
397
SC – GA - Introduction
− There are many control structures for search; the depth-first and breadth-first searches are the most common ones.
Fig. Taxonomy of enumerative search in the AI domain — uninformed (blind) methods such as depth-first search (with a fixed depth limit), depth-limited search, iterative-deepening DFS and queue-based search ordered by g(n); and informed (heuristic) methods such as generate-and-test, hill climbing, best-first search (priority queue ordered by h(n)), A* and AO* search, problem reduction, constraint satisfaction, and means-end analysis.
399
SC – GA - Introduction
• Stochastic Search
The stochastic search techniques are grouped into two major subclasses :
− Evolutionary algorithms.
400
− the search evolves throughout generations, improving the
features of potential solutions by means of biological inspired
operations.
14
401
SC – GA - Introduction
Fig. Taxonomy of Search Optimization techniques :
− Calculus-based (classical) techniques : Indirect methods and Direct methods (e.g. Newton, Fibonacci).
− Enumerative techniques : Uninformed search and Informed search.
− Guided random (stochastic) techniques : Evolutionary algorithms, e.g. Genetic Programming and Genetic Algorithms.
- Genetic Programming
15
403
SC – GA - Introduction
Development History
    EC = GP + ES + EP + GA
    Evolutionary Computing = Genetic Programming + Evolution Strategies + Evolutionary Programming + Genetic Algorithms
405
SC – GA - Introduction
17
406
SC – GA - Introduction
Possible settings for a trait (e.g. blue, brown) are called alleles.
Each gene has its own position in the chromosome called its locus.
When two organisms mate they share their genes; the resultant
offspring may end up having half the genes from one parent and half
from the other. This process is called recombination (cross over) .
The newly created offspring can then be mutated. Mutation means that the elements of DNA are changed a little. These changes are mainly caused by errors in copying genes from parents.
407
The fitness of an organism is measured by success of the organism in
its life (survival).
18
408
SC – GA - Introduction
Fig. General scheme of an evolutionary algorithm — Initialization produces a Population; Parents are selected, Recombination and Mutation produce Offspring; Survivor selection forms the next Population, until Termination.
Pseudo-Code :
BEGIN
    INITIALISE the population with random candidate solutions and EVALUATE each candidate;
    REPEAT UNTIL (termination condition) is satisfied DO
        ■ SELECT parents;
        ■ RECOMBINE pairs of parents;
        ■ MUTATE the resulting offspring;
        ■ EVALUATE the new candidates;
        ■ SELECT individuals for the next generation;
    OD
END.
410
SC – GA - Introduction
• Search Space
In solving problems, some solution will be the best among others. The space of all feasible solutions (among which the desired solution resides) is called the search space.
− Each possible solution can be "marked" by its value (or fitness) for the problem.
− Looking for a solution is then equal to looking for some extreme value
(minimum or maximum) in the search space.
− At times the search space may be well defined, but usually only a few
points in the search space are known.
20
411
SC – GA - Introduction
- Working Principles
Working principles :
412
Genetic algorithm begins with a set of solutions (represented by
chromosomes) called the population.
− Solutions from one population are taken and used to form a new
population. This is motivated by the possibility that the new population
will be better than the old one.
21
413
SC – GA - Introduction
The outline of the basic genetic algorithm is :
• [Start] Generate a random population of n chromosomes.
• [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
• [New population] Create a new population by repeating selection, crossover, mutation and accepting of offspring.
• [Replace] Use the newly generated population for a further run of the algorithm.
• [Test] If the end condition is satisfied, stop, and return the best solution in the current population.
• [Loop] Go to step 2.
414
22
415
SC – GA - Introduction
Fig. Flow chart of a Genetic Algorithm — Start → Seed Population → assign fitness to each individual → Survival of the Fittest (selection) → apply Crossover and Mutation operators to produce offspring → insert new individuals into the population → Terminate? (No : loop back; Yes : Finish).
417
SC – GA - Encoding
◊ Encoding
Example :
A Gene represents some data (eye color, hair color, sight, etc.).
Chromosome 1 : 1101100100110110
Chromosome 2 : 1101111000011110
− There are many other ways of encoding, e.g., encoding values as integer or
real numbers or some permutations and so on.
418
− The virtue of these encoding method depends on the problem to work on .
24
419
SC – GA - Encoding
■ Binary Encoding
Chromosome 1: 101100101100101011100101
Chromosome 2: 111111100000110000011111
− This encoding is often not natural for many problems and sometimes
corrections must be made after crossover and/or mutation.
Example 1:
420
(e.g. the integers 0 to 15 encoded as 4-bit binary strings : 2 → 0010, 4 → 0100, 5 → 0101, 8 → 1000, 10 → 1010, 11 → 1011, 14 → 1110, and so on.)
421
SC – GA - Encoding
Example 2 :
Every variable Xi has both lower and upper limits, XiL ≤ Xi ≤ XiU. A 4-bit string can represent integers from 0 to 15, so the strings (0000 0000) and (1111 1111) would represent the points ( X1L , X2L ) and ( X1U , X2U ) respectively.
The decoded value of a binary substring Sn-1 . . S2 S1 S0 is
    decoded value = Σ k=0,ni−1 ( 2^k Sk )
Consider a 4-bit string (0111); its decoded value is
    2³ x 0 + 2² x 1 + 2¹ x 1 + 2⁰ x 1 = 7
The corresponding variable value is obtained by the linear mapping
    Xi = XiL + ( (XiU − XiL) / (2^ni − 1) ) x (decoded value of string)
− For example, for a variable Xi with XiL = 2 and XiU = 17, the 4-bit string Xi = (1010) gives the decoded value
    Si = 1010 = 2³ x 1 + 2² x 0 + 2¹ x 1 + 2⁰ x 0 = 10 , and then
    Xi = 2 + ( (17 − 2) / (2⁴ − 1) ) x 10 = 12
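A small helper, following Example 2 above, that decodes a binary string and maps it into the interval [x_low, x_high]; the function and parameter names are illustrative.

def decode(bits, x_low, x_high):
    n = len(bits)
    value = int(bits, 2)                             # decoded value = sum of 2^k * s_k
    return x_low + (x_high - x_low) / (2**n - 1) * value

print(decode("1010", 2, 17))    # 12.0, as computed above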
26
423
SC – GA - Encoding
- Value Encoding
The Value encoding can be used in problems where values such as real
numbers are used. Use of binary encoding for this type of problems would
be difficult.
Examples :
    Chromosome A   a string of real values, e.g. 1.23  5.32  0.45  2.32  2.45
    Chromosome B   ABDJEIFJDHDIERJFDLDFLFEGT
27
424
SC – GA - Encoding
- Permutation Encoding
Chromosome A 153264798
Chromosome B 856723149
Examples :
There are eight queens. Find a way to place them on a chess board so
that no two queens attack each other. Here, encoding describes the
position of a queen on each row.
425
28
426
SC – GA - Encoding
• Tree Encoding
Example :
    Chromosome A : the expression tree  ( + x ( / 5 y ) )
    Chromosome B : the program tree     ( do_until step wall )
427
Note : Tree encoding is good for evolving programs. The programming
language LISP is often used. Programs in LISP can be easily parsed as a
tree, so the crossover and mutation is relatively easy.
29
428
SC – GA - Operators
Genetic operators are analogous to those which occur in the natural world :
− Reproduction (or Selection),
− Crossover (or Recombination), and
− Mutation.
− Population size says how many chromosomes are in population (in one
generation).
− If there are only few chromosomes, then GA would have a few possibilities
to perform crossover and only a small part of search space is explored.
− Research shows that after some limit, it is not useful to increase population
size, because it does not help in solving the problem faster. The population
size depends on the type of encoding and the problem.
429
30
430
SC – GA - Operators
Many reproduction operators exists and they all essentially do same thing.
They pick from current population the strings of above average and insert
their multiple copies in the mating pool in a probabilistic manner.
431
− Roulette wheel selection,
− Rank selection
− Boltzmann selection, −
Tournament selection,
− Steady state selection.
The Roulette wheel and Boltzmann selections methods are illustrated next.
31
432
SC – GA - Operators
• Example of Selection
The goal in this example of selection is to maximize the function f(x) = x² with x in the integer interval [0 , 31], i.e., x = 0, 1, . . , 30, 31.
Here n = 4 is the number of individuals in the population, i.e., the population size; n * pi is the expected count of copies of string i in the mating pool.
434
SC – GA - Operators
In fitness-proportionate (roulette wheel) selection :
Fig. Roulette wheel — each individual occupies a slice of the wheel proportional to its fitness (slices such as 5%, 8%, 9%, 13%, 17%, 20%, . . ); the wheel would therefore choose the individual with the largest slice (here the 5th) more often than the other individuals.
The probability of selecting the i th string is
    pi = Fi / ( Σ j=1,n Fj ) , where Fi is the fitness of string i and n is the population size; n x pi gives the expected number of copies of the i th string in the mating pool.
The cumulative probability of the first N strings is
    Cumulative ProbabilityN = Σ i=1,N pi
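A minimal sketch of fitness-proportionate (roulette wheel) selection; the population and fitness values are taken from the f(x) = x² example above, and the function name is illustrative.

import random

def roulette_select(population, fitnesses):
    total = sum(fitnesses)
    r = random.uniform(0, total)                # spin the wheel
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if r <= cumulative:
            return individual
    return population[-1]                       # guard against floating-point rounding

pop = ["01101", "11000", "01000", "10011"]
fit = [169, 576, 64, 361]                       # f(x) = x^2 for the decoded strings
mating_pool = [roulette_select(pop, fit) for _ in range(4)]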
436
SC – GA - Operators
• Boltzmann Selection
34
437
SC – GA - Operators
3.2 Crossover
Crossover combines (mates) two chromosomes (parents) to produce new chromosomes (offspring). The simplest form is the One-Point crossover; the others are Two-Point, Uniform, Arithmetic, and Heuristic crossovers.
The operators are selected based on the way the chromosomes are encoded.
35
438
SC – GA - Operators
• One-Point Crossover
Parent 1 11011|00100110110
Parent 2 11011|11000011110
Offspring 1 1 1 0 1 1 | 1 1 0 0 0 0 1 1 1 1 0
Offspring 2 1 1 0 1 1 | 0 0 1 0 0 1 1 0 1 1 0
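A short sketch of one-point crossover on binary strings; the crossover point is chosen at random unless given (in the example above it is fixed after bit 5).

import random

def one_point_crossover(p1, p2, point=None):
    if point is None:
        point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

o1, o2 = one_point_crossover("1101100100110110", "1101111000011110", point=5)
print(o1, o2)   # 1101111000011110  1101100100110110, as in the example above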
36
439
SC – GA - Operators
► Two-Point Crossover
Parent 1 11011|0010011|0110
Parent 2 11011|1100001|1110
Offspring 1   1 1 0 1 1 | 1 1 0 0 0 0 1 | 0 1 1 0
Offspring 2   1 1 0 1 1 | 0 0 1 0 0 1 1 | 1 1 1 0
37
440
SC – GA - Operators
► Uniform Crossover
Parent 1 1 1 0 1 1 0 0 1 0 0 1 1 0 1 1 0
Parent 2 1 1 0 1 1 1 1 0 0 0 0 1 1 1 1 0
If the mixing ratio is 0.5 approximately, then half of the genes in the
offspring will come from parent 1 and other half will come from parent 2.
The possible set of offspring after uniform crossover would be:
Offspring 1 11 12 02 11 11 12 12 02 01 01 02 11 12 11 11 02
Offspring 2 12 11 01 12 12 01 01 11 02 02 11 12 01 12 12 01
Note: The subscripts indicate which parent the gene came from.
38
441
SC – GA - Operators
• Arithmetic
Arithmetic crossover linearly combines two parent chromosomes :
    Offspring1 = a * Parent1 + (1 − a) * Parent2
    Offspring2 = (1 − a) * Parent1 + a * Parent2
Applying these two equations with the weighting factor a = 0.7, we get the two resulting offspring.
442
SC – GA - Operators
• Heuristic
The heuristic crossover operator uses the fitness values of the two parent chromosomes to determine the direction of the search :
    Offspring1 = BestParent + r * (BestParent − WorstParent)
    Offspring2 = BestParent
where r is a random number between 0 and 1.
443
SC – GA - Operators
3.3 Mutation
Mutation alters one or more gene values in a chromosome from its initial
state. This can result in entirely new gene values being added to the gene
pool. With the new gene values, the genetic algorithm may be able to
arrive at better solution than was previously possible.
The operators are selected based on the way chromosomes are encoded .
444
41
445
SC – GA - Operators
■ Flip Bit
The mutation operator simply inverts the value of the chosen gene. i.e. 0
goes to 1 and 1 goes to 0.
Original offspring 1 1 1 0 1 1 1 10 0 0 0 1 1 1 1 0
Original offspring 2 1 1 0 1 1 0 01 0 0 1 1 0 1 1 0
Mutated offspring 1 1 1 0 0 1 1 10 0 0 0 1 1 1 1 0
Mutated offspring 2 1 1 0 1 1 0 11 0 0 1 1 0 1 0 0
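A minimal sketch of flip-bit mutation: each bit is inverted independently with a small probability pm (the value of pm here is illustrative).

import random

def flip_bit_mutation(chromosome, pm=0.01):
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < pm else bit
        for bit in chromosome)

print(flip_bit_mutation("1101111000011110", pm=0.05))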
42
446
SC – GA - Operators
• Boundary
The mutation operator replaces the value of the chosen gene with either
the upper or lower bound for that gene (chosen randomly).
This mutation operator can only be used for integer and float genes.
• Non-Uniform
The mutation operator increases the probability such that the amount of
the mutation will be close to 0 as the generation number increases. This
mutation operator prevents the population from stagnating in the early
stages of the evolution then allows the genetic algorithm to fine tune the
solution in the later stages of evolution.
This mutation operator can only be used for integer and float genes.
• Uniform
The mutation operator replaces the value of the chosen gene with a
uniform random value selected between the user-specified upper and
lower bounds for that gene.
This mutation operator can only be used for integer and float genes.
• Gaussian
The mutation operator adds a Gaussian-distributed random value to the chosen gene; if the result falls outside the user-specified bounds, it is clipped.
This mutation operator can only be used for integer and float genes.
447
Examples to demonstrate and explain : Random population, Fitness, Selection,
Crossover, Mutation, and Accepting.
Example 1 :
Maximize the function f(x) = x2 over the range of integers from 0 . . . 31.
− Repeat until x = 31
1.
Devise a means to represent a solution to the problem :
2.
Devise a heuristic for evaluating the fitness of any particular solution :
The function f(x) is simple, so it is easy to use the f(x) value itself to rate
the fitness of a solution; else we might have considered a more simpler
heuristic that would more or less serve the same purpose.
3.
Coding - Binary and the String length :
448
GAs often process binary representations of solutions. This works well,
because crossover and mutation can be clearly defined for binary solutions. A
Binary string of length 5 can represents 32 numbers (0 to 31).
4.
Randomly generate a set of solutions :
5.
Evaluate the fitness of each member of the population :
    01101 → 13 ;  11000 → 24 ;  01000 → 8 ;  10011 → 19
    f : 13 → 169 ;  24 → 576 ;  8 → 64 ;  19 → 361
449
Max 576 0.49 1.97
SC – GA - Examples
6.
Produce a new generation of solutions by picking from the existing pool
of solutions with a preference for solutions which are better suited than
others:
We divide the range into four bins, sized according to the relative fitness of
the solutions which they represent.
By generating 4 uniform (0, 1) random values and seeing which bin they fall
into we pick the four strings that will form the basis for the next generation.
7.
Randomly pair the members of the new generation
Random number generator decides for us to mate the first two strings
together and the second two strings together.
8.
Within each pair swap parts of the members solutions to create
offspring which are a mixture of the parents :
450
For the first pair of strings: 01101 , 11000
01101 ⇒ 0 1 1 0 |1 ⇒ 01100
11000 ⇒ 1 1 0 0 |0 ⇒ 11001
For the second pair of strings: 11000 , 10011
    11000 ⇒ 1 1 |0 0 0 ⇒ 11011
    10011 ⇒ 1 0 |0 1 1 ⇒ 10000
9.
Randomly mutate a very small fraction of genes in the population :
With a typical (small) mutation probability per bit, it happens that none of the bits in our population are mutated.
10.
Go back and re-evaluate fitness of the population (new generation) :
451
The new generation 01100, 11001, 11011, 10000 decodes to 12, 25, 27, 16, giving fitness values 144, 625, 729, 256; Total (sum) 1754, with proportions summing to 1.000 and expected counts to 4.000.
Observe that :
■ The old strings 01101, 11000, 01000, 10011 have been replaced by 01100, 11001, 11011, 10000.
■ The total fitness has gone from 1170 to 1754 in a single generation.
■ The algorithm has already come up with the string 11011 (i.e. x = 27) as a possible solution.
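Putting the ten steps together, here is a compact Python sketch of the whole loop for maximizing f(x) = x² over 5-bit strings; the population size, mutation rate and generation count are illustrative assumptions.

    import random

    def fitness(bits):
        # Decode the 5-bit string to an integer and use f(x) = x^2 as fitness.
        return int(bits, 2) ** 2

    def evolve(pop_size=4, length=5, generations=20, mutation_rate=0.001):
        population = [''.join(random.choice('01') for _ in range(length))
                      for _ in range(pop_size)]
        for _ in range(generations):
            scores = [fitness(s) for s in population]
            # Roulette-wheel selection (add-one smoothing keeps the weights positive).
            pool = random.choices(population, weights=[f + 1 for f in scores],
                                  k=pop_size)
            # Pair the pool members and apply one-point crossover.
            nxt = []
            for a, b in zip(pool[0::2], pool[1::2]):
                cut = random.randint(1, length - 1)
                nxt += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            # Flip-bit mutation with a very small per-bit probability.
            population = [''.join(b if random.random() > mutation_rate else
                                  str(1 - int(b)) for b in s) for s in nxt]
        return max(population, key=fitness)

    print(evolve())   # tends towards 11111, i.e. x = 31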
Example 2 : Two bar pendulum
Two bars of lengths ℓ1 and ℓ2, with weights W1 and W2, are hinged together and supported at A. A horizontal force P acts at the free end C.
Find : the equilibrium configuration of the system, i.e. the angles θ1 and θ2.
Fig. Two bar pendulum (bars ℓ1, ℓ2 with weights W1, W2, joint angles θ1, θ2, horizontal force P at C)
Solution : Since there are two unknowns θ1 and θ2, each angle is coded as a 4-bit binary string, so that
Accuracy = (XU − XL) / (2⁴ − 1) = (90 − 0) / 15 = 6°
Hence, the binary coding and the corresponding angles Xi are given as
Xi = XiL + [ (XiU − XiL) / (2⁴ − 1) ] Si , where Si is the decoded value of the i-th chromosome.
e.g. the 6th chromosome binary code (0 1 0 1) has Si = 2³ × 0 + 2² × 1 + 2¹ × 0 + 2⁰ × 1 = 5, and the corresponding angle is
Xi = 0 + [ (90 − 0) / 15 ] × 5 = 30°
The binary coding and the angles are given in the table below.
No.   Si     Xi       No.   Si     Xi
1     0000   0        9     1000   48
2     0001   6        10    1001   54
3     0010   12       11    1010   60
4     0011   18       12    1011   66
5     0100   24       13    1100   72
6     0101   30       14    1101   78
7     0110   36       15    1110   84
8     0111   42       16    1111   90
The objective function to be minimized is
f(θ1, θ2) = − P (ℓ1 sin θ1 + ℓ2 sin θ2) − (W1 ℓ1 / 2) cos θ1 − W2 [ (ℓ2 / 2) cos θ2 + ℓ1 cos θ1 ]   (Eq. 1)
where θ1 and θ2 lie between 0° and 90°, both inclusive, i.e. 0 ≤ θ1 , θ2 ≤ 90°   (Eq. 3)
Since the objective function is negative, instead of minimizing the function f let us maximize − f = f ′. The maximum value of f ′ is 8 when θ1 and θ2 are both zero.
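A minimal Python sketch of the chromosome decoding and fitness evaluation for this problem; the parameter values P = 2, W1 = W2 = 2 and ℓ1 = ℓ2 = 2 are assumptions chosen only so that the maximum of f ′ comes out as 8, since the notes do not list them explicitly.

    import math

    # Assumed illustrative parameters (not given explicitly in the notes).
    P, W1, W2, L1, L2 = 2.0, 2.0, 2.0, 2.0, 2.0

    def decode(chromosome):
        # 8-bit chromosome: first 4 bits -> theta1, last 4 bits -> theta2,
        # each mapped onto 0..90 degrees in steps of 6 degrees.
        s1, s2 = int(chromosome[:4], 2), int(chromosome[4:], 2)
        return 90.0 / 15 * s1, 90.0 / 15 * s2

    def f_prime(chromosome):
        # Fitness f' = -f, where f is the objective function of Eq. 1.
        t1, t2 = (math.radians(a) for a in decode(chromosome))
        f = (- P * (L1 * math.sin(t1) + L2 * math.sin(t2))
             - (W1 * L1 / 2) * math.cos(t1)
             - W2 * ((L2 / 2) * math.cos(t2) + L1 * math.cos(t1)))
        return -f

    print(decode('01010101'))   # (30.0, 30.0)
    print(f_prime('00000000'))  # 8.0 with the assumed parameters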
First, randomly generate a population of 8 chromosomes, each an 8-bit string (4 bits for θ1 and 4 bits for θ2), as shown in the table below; for example, the first chromosome 0000 0000 decodes to θ1 = 0, θ2 = 0.
These angles and the corresponding fitness function values are shown below.
− The GA begins with a population of random strings.
− If the termination criteria are not met, the population is iteratively operated on by the three operators (reproduction, crossover and mutation) and re-evaluated, until the termination criteria are met.
Hybrid Systems
Integration of NN, FL and GA
What is Hybridization ?
− Auxiliary hybrid system : one technology calls the other technology as a subroutine;
− Fuzzy logic addresses the imprecision or vagueness in input and output,
− Genetic algorithms are inspired by biological evolution and can systemize random search for an acceptable solution.
• Introduction :
Fuzzy logic, Neural networks and Genetic algorithms are soft computing
methods which are inspired by biological computational processes and nature's
problem solving strategies.
Neural Networks (NNs) are highly simplified models of the human nervous system which mimic our ability to adapt to circumstances and learn from past experience.
Neural network systems are represented by different architectures, such as single and multilayer feed forward networks. The networks offer back propagation, generalization, associative memory and adaptive resonance theory.
Each of these technologies has provided efficient solutions to a wide range of problems belonging to different domains. However, each of them has its own advantages and limitations.
− Sequential hybrid system : the technologies are used in a pipelining fashion; one technology's output becomes another technology's input, and so on. However, this is one of the weakest forms of hybridization, since an integrated combination of the technologies is not present.
1.2 Neural Networks, Fuzzy Logic, and Genetic Algorithms Hybrids
Neural networks, fuzzy logic and genetic algorithms are three distinct technologies.
In a hybrid system, each technology is used to compensate for the limitations of the other.
■ Neuro-Fuzzy Hybrid
Neural Networks :
− Demerits : the weights are determined by minimizing the least squares error; the training time required is quite large; the training data has to be chosen over the entire range where the variables are expected to change.
Fuzzy logic :
− Merits : a fuzzy logic system addresses the imprecision of inputs and outputs, which are defined by fuzzy sets, and allows greater flexibility in formulating a detailed system description.
Integrating neural networks and fuzzy logic can extend the capabilities of the systems beyond either of these two technologies applied individually. The integrated systems have turned out to be useful in :
■ Neuro-Genetic Hybrid
Neural Networks : can learn various tasks from examples, classify
phenomena and model nonlinear relationships.
■ Fuzzy-Genetic Hybrid
− Fuzzy systems, like (feed forward) NNs, are universal approximators in the sense that they exhibit the capability to approximate general nonlinear functions to any desired degree of accuracy.
The adjustment of system parameters, so that the system output matches the training data, has been tackled using GAs. Several parameters of a fuzzy system, such as the input/output variables and the membership functions that define the fuzzy system, have been optimized using GAs.
Neural networks (NNs) are adaptive systems that change their structure based on external or internal information that flows through the network. Neural networks solve problems by self-learning and self-organizing.
The steps involved are :
− The pattern of activation arriving at the output layer is compared with the desired (target) output pattern, and the error is propagated back through the network to adjust the weights.
Limitations of BPN :
− BPN can recognize patterns similar to those it has learnt, but does not have the ability to recognize new patterns.
Genetic Algorithms (GAs) are adaptive search and optimization algorithms that mimic the principles of nature.
− The BPN determines its weights based on a gradient search technique and may therefore get trapped in a local minimum.
− GAs do not guarantee to find the global optimum solution, but are good at quickly finding a good, acceptable solution.
The GA based techniques for determining weights in a BPN are explained next.
2.1 GA based techniques for determining weights in a BPN
− low fit individuals are kept out of reproduction and so die, while fitter individuals reproduce and the offspring inherit the features of their ancestors,
− a fitness function is formulated to evaluate each candidate weight set,
All these aspects of GAs for determining the weights of a BPN are illustrated in the
next few slides.
SC – Hybrid Systems – GA based BPN
• Coding
The weights of the BPN are coded as genes in the chromosomes.
Example : consider a BPN with a 2 – 2 – 2 configuration (2 input, 2 hidden and 2 output neurons), with weights W11, W12, . . . between the input and hidden layers and V11, V12, . . . between the hidden and output layers.
− number of weights to be determined = (2 + 2) · 2 = 8,
− each weight is a real number; let each weight be represented by a gene of 5 digits,
− string S representing a chromosome is then 8 × 5 = 40 in length,
− i.e. choose 40 chromosomes as the initial population.
Fig. BPN with 2 – 2 – 2 configuration
SC – Hybrid Systems – GA based BPN
• Weight Extraction
Let x(kd+1) , x(kd+2) , . . . , x((k+1)d) represent the k-th gene (k ≥ 0) of a chromosome, where d is the number of digits per gene. The actual weight wk is given by
wk = + ( x(kd+2) · 10^(d−2) + x(kd+3) · 10^(d−3) + . . . + x((k+1)d) ) / 10^(d−2) ,  if 5 ≤ x(kd+1) ≤ 9
wk = − ( x(kd+2) · 10^(d−2) + x(kd+3) · 10^(d−3) + . . . + x((k+1)d) ) / 10^(d−2) ,  if 0 ≤ x(kd+1) < 5
i.e. the first digit of a gene determines the sign of the weight and the remaining d − 1 digits its magnitude.
The weights extracted from the genes of a chromosome, using this equation with d = 5, are illustrated below.
■ Gene 0 : 84321 ; since x1 = 8 ≥ 5, the weight extracted is
W0 = + ( 4 × 10³ + 3 × 10² + 2 × 10 + 1 ) / 10³ = + 4.321
■ Gene 1 : 46234 ; since 0 ≤ x6 = 4 < 5, the weight extracted is
W1 = − ( 6 × 10³ + 2 × 10² + 3 × 10 + 4 ) / 10³ = − 6.234
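A small Python sketch of this weight-extraction rule for 5-digit genes; the function name and the chromosome value in the usage line are illustrative assumptions.

    def extract_weights(chromosome, d=5):
        # Split the digit string into genes of d digits each; the first digit of a
        # gene gives the sign (>= 5 -> +, < 5 -> -), the rest give the magnitude.
        weights = []
        for k in range(len(chromosome) // d):
            gene = chromosome[k * d:(k + 1) * d]
            sign = 1.0 if int(gene[0]) >= 5 else -1.0
            magnitude = int(gene[1:]) / 10 ** (d - 2)
            weights.append(sign * magnitude)
        return weights

    print(extract_weights('8432146234'))   # [4.321, -6.234]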
SC – Hybrid Systems – GA based BPN
■ Fitness Function :
Example :
Consider a training set of input–target pairs, e.g. the input (I11 , I21) with the target (T11 , T21).
Let w01 , w02 , . . . , w040 be the weight sets extracted from the chromosomes C01 , C02 , . . . , C040 using the equation above.
Let o01 , o02 , o03 be the calculated outputs of the BPN for the training inputs.
Compute the fitness F1 : the error E between the calculated outputs and the targets is found first, and the fitness of the chromosome C01 is then given by
F1 = 1 / E .
Algorithm :
{
For each chromosome, extract its weight set w i ;
Keeping w i as a fixed weight setting, train the BPN for the N input instances;
Calculate the error E i for each of the input instances using the formula
E i = Σ j ( T ji − O ji )²   where O i is the output vector calculated by the BPN;
Find the root mean square of the errors over the N instances,
i.e. E = ( ( Σ i E i ) / N )^(1/2) ;
Calculate the fitness value F i for each individual string of the population as
F i = 1 / E ;
}
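A minimal Python sketch of this error and fitness computation; the function name and the toy target/output arrays in the usage line are illustrative assumptions.

    import math

    def fitness_from_errors(targets, outputs):
        # targets, outputs: lists of N vectors (one per training instance).
        # E_i = sum_j (T_ji - O_ji)^2, E = sqrt(mean of E_i), fitness F = 1/E.
        errors = [sum((t - o) ** 2 for t, o in zip(T, O))
                  for T, O in zip(targets, outputs)]
        E = math.sqrt(sum(errors) / len(errors))
        return 1.0 / E

    # Toy usage with two training instances of two outputs each:
    print(fitness_from_errors([[0.1, 0.9], [0.8, 0.2]],
                              [[0.2, 0.7], [0.6, 0.3]]))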
Thus the fitness values Fi for all chromosomes in the initial population are computed. Since the population size is p = 40, the values Fi , i = 1, 2, . . . , 40 are computed.
Fig. Computation of fitness values for the population : initial population of chromosomes C01 , C02 , . . . , C040 → extracted weights w01 , w02 , . . . , w040 → training of the BPN → compute fitness F i = 1/E → fitness values.
SC – Hybrid Systems – GA based BPN
• Reproduction of Offspring
The mating pool is formed in proportion to fitness, i.e., the best fit individuals have multiple copies while the worst fit individuals die off.
Having formed the mating pool, select parent pairs at random. The chromosomes of the respective pairs are combined using the crossover operator. The Fig. below shows the idea :
Fig. Crossover of parent chromosomes Pa and Pb : the segments A and B of the parents are exchanged to produce the offspring Oa and Ob.
Example :
In the initial population, the chromosome with the minimum fitness value Fmin is replaced by a copy of the chromosome with the maximum fitness value Fmax ; the remaining chromosomes C0k with fitness Fk are carried over unchanged.
Fig. Selection of parent chromosomes
Here, a sample "Selection of Parents" for the "Two Point Crossover" operator is shown. The crossover points of the chromosomes are randomly chosen for each parent pair, as shown in the Fig. below.
Fig. Selected parent pairs, with the crossover points marked on each pair.
The genes between the crossover points are exchanged, and mutation is then applied, as shown in the Fig. below, giving the new population P1.
Fig. New population P1
SC – Hybrid Systems – GA based BPN
• Convergence
Example :
− the fitness values of the new population P1 are computed, followed by reproduction and crossover.
− the best individuals are replicated, and the reproduction carried out using the two-point crossover operator forms the next generation P2 of the chromosomes.
− this cycle is repeated until the population converges; at that stage, the weights extracted from the population Pi are the final weights to be used by the BPN.
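Tying the coding, weight extraction and fitness steps together, a compact Python sketch of the GA loop for evolving BPN weights; the network evaluation is reduced to a clearly marked stub (evaluate_bpn), and the population size, gene length and two-point crossover details are illustrative assumptions.

    import random

    D = 5            # digits per gene (one gene per weight)
    N_WEIGHTS = 8    # e.g. the 2-2-2 BPN above
    POP_SIZE = 40

    def random_chromosome():
        return ''.join(random.choice('0123456789') for _ in range(N_WEIGHTS * D))

    def two_point_crossover(a, b):
        # Exchange the genes lying between two randomly chosen cut points.
        i, j = sorted(random.sample(range(1, len(a)), 2))
        return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

    def evolve(fitness_fn, generations=50):
        population = [random_chromosome() for _ in range(POP_SIZE)]
        for _ in range(generations):
            ranked = sorted(population, key=fitness_fn, reverse=True)
            # Replace the least fit chromosome by a copy of the fittest one.
            ranked[-1] = ranked[0]
            random.shuffle(ranked)
            population = []
            for a, b in zip(ranked[0::2], ranked[1::2]):
                population.extend(two_point_crossover(a, b))
        return max(population, key=fitness_fn)

    def evaluate_bpn(chromosome):
        # Stub standing in for: extract the weights from the chromosome (see the
        # Weight Extraction sketch above), run the BPN on the N training instances,
        # compute the error E and return F = 1/E. A toy placeholder is used here.
        digits = [int(c) for c in chromosome]
        return 1.0 / (1.0 + abs(sum(digits) - 180))   # placeholder, not a real BPN fitness

    best = evolve(evaluate_bpn)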
• Fuzzy Back Propagation Network
Neural Networks and Fuzzy Logic (NN-FL) represent two distinct methodologies, and the integration of NN and FL is called a Neuro-Fuzzy system.
The Fuzzy-BPN architecture maps fuzzy inputs to crisp outputs. Here, the neurons use LR-type fuzzy numbers.
• Definition
A fuzzy number M~ is of LR-type if there exist reference functions L (left) and R (right) and spreads α > 0, β > 0 such that
µM~(x) = L( (m − x) / α )   for x ≤ m , α > 0
µM~(x) = R( (x − m) / β )   for x ≥ m , β > 0
where m is the mean (mode) of M~, α is the left spread, β is the right spread, L is the left reference function and R is the right reference function. A common choice of reference functions is
L( (m − x) / α ) = max ( 0 , 1 − (m − x) / α )
R( (x − m) / β ) = max ( 0 , 1 − (x − m) / β )
The LR-type fuzzy number M~ can then be represented as (m, α, β) LR , shown below.
Fig. Membership degree µM~(x) of an LR-type fuzzy number, with mode m, left spread α and right spread β.
Note : If α and β are both zero, then the L-R type function indicates a crisp value. The choice of the L and R functions is specific to the problem.
• Operations on LR-type Fuzzy Numbers
Let M~ = (m, α , β) LR and N~ = (n, γ , δ) LR be two LR-type fuzzy numbers. Then :
■ Addition
(m, α , β) LR ⊕ (n, γ , δ) LR = (m + n, α + γ , β + δ) LR
■ Subtraction
(m, α , β) LR ⊖ (n, γ , δ) LR = (m − n, α + δ , β + γ) LR
■ Multiplication (approximate; the formula depends on the signs of m and n), e.g.
(m, α , β) LR ⊗ (n, γ , δ) LR = (mn , nα − mδ , nβ − mγ) RL   for m < 0 , n ≥ 0
(the cases m ≥ 0, n ≥ 0 and m < 0, n < 0 are handled by analogous formulas)
■ Scalar Multiplication
λ * (m, α , β) LR = (λm , λα , λβ) LR   for λ ≥ 0 , λ ∈ R
λ * (m, α , β) LR = (λm , −λβ , −λα) RL   for λ < 0 , λ ∈ R
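A minimal Python sketch of an LR-type fuzzy number of the form (m, α, β) with max(0, 1 − ·) reference functions, together with addition, subtraction and scalar multiplication as defined above; the class and method names are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class LRFuzzy:
        m: float      # mean (mode)
        alpha: float  # left spread
        beta: float   # right spread

        def mu(self, x):
            # Membership with L(u) = R(u) = max(0, 1 - u); zero spread means crisp.
            if x == self.m:
                return 1.0
            if x < self.m:
                return 0.0 if self.alpha == 0 else max(0.0, 1 - (self.m - x) / self.alpha)
            return 0.0 if self.beta == 0 else max(0.0, 1 - (x - self.m) / self.beta)

        def __add__(self, other):
            return LRFuzzy(self.m + other.m, self.alpha + other.alpha, self.beta + other.beta)

        def __sub__(self, other):
            return LRFuzzy(self.m - other.m, self.alpha + other.beta, self.beta + other.alpha)

        def scale(self, lam):
            # For lam < 0 the reference functions swap roles, so the spreads swap.
            if lam >= 0:
                return LRFuzzy(lam * self.m, lam * self.alpha, lam * self.beta)
            return LRFuzzy(lam * self.m, -lam * self.beta, -lam * self.alpha)

    M = LRFuzzy(5, 1, 2)
    N = LRFuzzy(3, 0.5, 0.5)
    print(M + N)          # LRFuzzy(m=8, alpha=1.5, beta=2.5)
    print((M - N).m)      # 2
    print(M.scale(-2))    # LRFuzzy(m=-10, alpha=4, beta=2)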
SC – Hybrid Systems – Fuzzy BPN
■ Fuzzy Neuron
The fuzzy neuron is the basic element of Fuzzy BP network. Fig. below
shows the architecture of the fuzzy neuron.
The fuzzy neuron forms a weighted sum of its LR-type fuzzy inputs, with the sum running over i = 0 . . . n and the 0-th term acting as the bias, using LR-type fuzzy weights; this aggregate is then passed through the activation function to produce the neuron's crisp output.
■ Architecture of Fuzzy BP
SC – Hybrid Systems – Fuzzy AM
A fuzzy logic system contains the sets used to categorize input data (i.e., fuzzification), the decision rules that are applied to each set, and then a way of generating a crisp (non-fuzzy) output from the rule results (i.e., defuzzification).
Associative memory allows a fuzzy rule base to be stored. The inputs are the degrees of membership, and the outputs are the fuzzy system's outputs.
The problem indicates that there are two input variables and one output variable. The inference engine is constructed based on the fuzzy rule table (S = Small, M = Medium, L = Large) :

                    Weight (X)
                    S      M      L
Stream (Y)    S     M      L      L
              M     S      M      L
              L     S      S      L
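The rule base above can be held in a simple associative structure; a minimal Python sketch, with the dictionary keys and names as illustrative assumptions.

    # Fuzzy rule base: (Weight label, Stream label) -> output label.
    RULES = {
        ('S', 'S'): 'M', ('M', 'S'): 'L', ('L', 'S'): 'L',
        ('S', 'M'): 'S', ('M', 'M'): 'M', ('L', 'M'): 'L',
        ('S', 'L'): 'S', ('M', 'L'): 'S', ('L', 'L'): 'L',
    }

    def infer(weight_label, stream_label):
        # Look up the output fuzzy label for a pair of input labels.
        return RULES[(weight_label, stream_label)]

    print(infer('M', 'S'))   # 'L'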
• Fuzzy Representation :
■ Defuzzification
Z COG = [ Σ j=1..n µc(Zj) Zj ] / [ Σ j=1..n µc(Zj) ]
where Zj is the control output at the quantization level j, and µc(Zj) represents its membership value in the output fuzzy set.
Referring to the Fig. in the previous slide and the formula for COG, we get the fuzzy set of the washing time as w = { 0.8/20, 0.4/35, 0.2/60 }. The calculated washing time using the COG formula is T = 41.025 min.
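A minimal Python sketch of the COG formula; the singleton values in the usage line are hypothetical and are not the washing-time figures of the example, which depend on the membership plot that is not reproduced here.

    def cog(levels, memberships):
        # Centre of gravity: sum(mu * z) / sum(mu) over the quantization levels.
        num = sum(mu * z for z, mu in zip(levels, memberships))
        den = sum(memberships)
        return num / den

    # Hypothetical illustration:
    print(cog([10, 20, 30], [0.2, 0.5, 0.3]))   # 21.0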
■ Simplified Fuzzy ARTMAP
− The supervised ART algorithms are named with the suffix "MAP", as in ARTMAP. These algorithms cluster both the inputs and the targets, and associate the two sets of clusters.
− The ART systems have many variations : ART1 (for binary inputs), ART2 (for continuous-valued inputs), Fuzzy ART and ARTMAP.
− ARTMAP combines two ART-1 or ART-2 units into a supervised learning structure. Here, the first unit takes the input data and the second unit takes the correct output data, which is then used to make the minimum possible adjustment of the vigilance parameter in the first unit in order to make the correct classification.
The Fuzzy ARTMAP model incorporates fuzzy-logic-based computations in an inter-ART module called the Map Field. The Map Field forms predictive associations between the categories of the ART modules and realizes a match-tracking rule. If ARTa and ARTb were disconnected, each module would self-organize category groupings for its respective input set.
In supervised mode, the mappings are learned between the input vectors a and b. A familiar example of supervised neural networks is the feed-forward network with back-propagation of errors.
− ARTMAP systems can learn both in a fast as well as in a slow match configuration, while BP networks can only learn in a slow mismatch configuration. This means that an ARTMAP system learns, or adapts its weights, only when the input matches an established category, while BP networks learn when the input does not match an established category.