SOFT COMPUTING NOTES
Soft Computing
Topics
1. Introduction
Why neural network ?, Research History, Biological Neuron model, Artificial
Neuron model, Notations, Neuron equation.
6. Single-Layer NN System
Single layer perceptron : Learning algorithm for training Perceptron, Linearly
separable task, XOR Problem; ADAptive LINear Element (ADALINE) :
Architecture, Training.
SC - Neural Network – Introduction
1. Introduction
■ The conventional computers are good for - fast arithmetic and for doing precisely what the programmer programs them to do.
■ The conventional computers are not so good for - interacting with noisy data or
data from the environment, massive parallelism, fault tolerance, and adapting to
circumstances.
■ The neural network systems help where we can not formulate an algorithmic solution, or where we can get lots of examples of the behaviour we require.
Research History
The history is relevant because for nearly two decades the future of Neural network
remained uncertain.
McCulloch and Pitts (1943) are generally recognized as the designers of the first neural
network. They combined many simple processing units together that could lead to an
overall increase in computational power. They suggested many ideas like : a neuron
has a threshold level and once that level is reached the neuron fires. It is still the
fundamental way in which ANNs operate. The McCulloch and Pitts's network had a
fixed set of weights.
Hebb (1949) developed the first learning rule, that is if two neurons are active at the
same time then the strength between them should be increased.
SC - Neural Network – Introduction
In the 1950s and 60s, many researchers (Block, Minsky, Papert, and Rosenblatt among others)
worked on the perceptron. The perceptron model could be proved to converge to the
correct weights, that is, to weights that solve the problem. The weight adjustment (learning
algorithm) used in the perceptron was found more powerful than the learning rules
used by Hebb. The perceptron caused great excitement. It was thought to produce
programs that could think.
Minsky & Papert (1969) showed that perceptron could not learn those functions
which are not linearly separable.
Neural network research declined throughout the 1970s and until the mid-80s
because the perceptron could not learn certain important functions.
Neural network regained importance in 1985-86. The researchers, Parker and LeCun
discovered a learning algorithm for multi-layer networks called back propagation that
could solve problems that were not linearly separable.
The human brain consists of a very large number (more than a billion) of neural cells that
process information. Each cell works like a simple processor. The massive interaction
between all cells and their parallel processing only makes the brain's abilities possible.
Dendrites are branching fibers that
extend from the cell body or soma.
Soma or cell body of a neuron contains
the nucleus and other structures, support
chemical processing and production of
neurotransmitters.
Axon is a singular fiber carries
information away from the soma to the
synaptic sites of other neurons (dendrites
and somas), muscles, or glands.
Axon hillock is the site of summation for incoming information. At any moment, the
collective influence of all neurons that conduct impulses to a given neuron will
determine whether or not an action potential will be initiated at the axon hillock
and propagated along the axon.
SC - Neural Network – Introduction
■ Soma processes the incoming activations and converts them into output
activations.
[Figure: a neuron receiving inputs 1 .. n and producing a single output]
SC - Neural Network – Introduction
■ A processing unit sums the inputs, and then applies a non-linear activation
function (i.e. squashing / transfer / threshold function).
In other words ,
- The input to a neuron arrives in the form of signals.
- The signals build up in the cell.
- Finally the cell discharges (cell fires) through the output .
- The cell can start building up signals again.
Notations

Functions

■ Threshold or Sign function : sgn(x), defined as

  sgn (x) = 1  if x ≥ 0
            0  if x < 0

  [Plot: sgn(x) - output O/P (0 to 1) against input I/P (-4 to 4)]

■ Threshold or Sign function : sigmoid(x), defined as a smoothed (differentiable) form of the threshold function

  sigmoid (x) = 1 / (1 + e^-x)

  [Plot: sigmoid(x) - output O/P (0 to 1) against input I/P (-4 to 4)]
[Figure: Simplified Model of a Real Neuron (Threshold Logic Unit) - inputs 1 .. n feed a single processing unit producing one output]
Neuron consists of three basic components - weights, thresholds, and a single activation
function.
[Figure: Basic Elements of an Artificial Linear Neuron - inputs x1 .. xn with synaptic weights W1 .. Wn, a summing junction, a threshold, and an activation function producing the output y]
■ Weighting Factors w
■ Threshold
The node's internal threshold Φ is the magnitude offset. It affects the activation of
the node output y as :
  Y = f (I) = f ( Σ (i = 1 to n) xi wi - Φk )
SC - Neural Network –Artificial Neuron Model
To generate the final output Y , the sum is passed on to a non-linear filter f called
Activation Function or Transfer function or Squash function which releases the
output Y.
In practice, neurons generally do not fire (produce an output) unless their total
input goes above a threshold value.
The total input for each neuron is the sum of the weighted inputs to the neuron
minus its threshold value. This is then passed through the sigmoid function. The
equation for the transition in a neuron is :
a = 1 / (1 + exp(- x))   where   x = Σi ai wi - Q
a  is the activation for the neuron,
ai is the activation for neuron i,
wi is the weight,
Q  is the threshold subtracted.
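As a quick illustration, a minimal Python sketch of this neuron equation; the input values, weights and threshold below are made-up examples, not from the notes:

import math

def neuron_activation(inputs, weights, threshold):
    """Weighted sum minus threshold, passed through the sigmoid."""
    x = sum(a_i * w_i for a_i, w_i in zip(inputs, weights)) - threshold
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values only:
print(neuron_activation([0.5, 0.8], [0.4, -0.3], 0.1))   # approximately 0.465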
■ Activation Function
The activation functions are chosen depending upon the type of problem to be
solved by the network.
SC - Neural Network –Artificial Neuron Model
Activation Functions f - Types
Over the years, researches tried several functions to convert the input into an outputs.
The most commonly used functions are described below.
- I/P Horizontal axis shows sum of inputs .
- O/P Vertical axis shows the value the function produces ie output.
- All functions f are designed to produce values between 0 and 1.
• Threshold Function
A threshold (hard-limiter) activation function is either a binary type or a
bipolar type as shown below.
By varying the slope (gain) parameter, different shapes of the function can be obtained; this adjusts the
abruptness of the function as it changes between the two asymptotic values.
• Example :
The neuron shown consists of four inputs with the weights shown below.
[Figure: Neuron Structure of the Example - inputs x1 = 1, x2 = 2, x3 = 5, x4 = 8 with synaptic weights +1, +1, -1, +2, a summing junction, threshold Φ = 0, and an activation function producing the output y]
The output I of the network, prior to the activation function stage, is
I = X^T . W = (1 x 1) + (2 x 1) + (5 x -1) + (8 x 2) = 14
With a binary activation function, the output of the neuron is
y (threshold) = 1, since I = 14 is above the threshold Φ = 0.
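A small Python check of this example; the code and variable names are illustrative, not part of the notes:

x = [1, 2, 5, 8]          # inputs x1 .. x4
w = [1, 1, -1, 2]         # synaptic weights
theta = 0                 # threshold

I = sum(xi * wi for xi, wi in zip(x, w))   # weighted sum, I = 14
y = 1 if I >= theta else 0                 # binary (hard-limiter) activation
print(I, y)                                # 14 1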
SC - Neural Network – Architecture
3. Neural Network Architectures
An Artificial Neural Network (ANN) is a data processing system consisting of a large number
of simple, highly interconnected processing elements (artificial neurons) in a network
structure that can be represented using a directed graph G, an ordered 2-tuple (V, E),
consisting of a set V of vertices and a set E of edges.
- The vertices may represent neurons (input/output) and
- The edges may represent synaptic links labeled by the weights attached. Example :
[Figure: a directed graph with vertices V1 .. V5 and edges e1 .. e5 connecting them]
V = { v1 , v2 , v3 , v4 , v5 } ,  E = { e1 , e2 , e3 , e4 , e5 }
The Single Layer Feed-forward Network consists of a single layer of weights , where
the inputs are directly connected to the outputs, via a series of weights. The synaptic
links carrying weights connect every input to every output, but not the other way. This
way it is considered a network of feed-forward type. The sum of the products of the
weights and the inputs is calculated in each neuron node, and if the value is above
some threshold (typically 0) the neuron fires and takes the activated value (typically
1); otherwise it takes the deactivated value (typically -1).
SC - Neural Network – Architecture
[Figure: Single layer feed-forward network - inputs x1 .. xn connected to outputs y1 .. ym through the weights w11 .. wnm]
As the name suggests, it consists of multiple layers. The architecture of this class of
network, besides having the input and the output layers, also have one or more
intermediary layers called hidden layers. The computational units of the hidden layer
are known as hidden neurons.
[Figure: Multilayer feed-forward network - input layer neurons xi (i = 1 .. ℓ), hidden layer neurons yj connected to the inputs through the weights vij, and output layer neurons zk connected to the hidden layer through the weights wjk]
- The hidden layer does intermediate computation before directing the input to
output layer.
- The input layer neurons are linked to the hidden layer neurons; the weights on
these links are referred to as input-hidden layer weights.
- The hidden layer neurons are linked to the output layer neurons; the weights on
these links are referred to as hidden-output layer weights.
- A multi-layer feed-forward network with ℓ input neurons, m1 neurons in the first
hidden layers, m2 neurons in the second hidden layers, and n output neurons in the
output layers is written as (ℓ - m1 - m2 – n ).
The Fig. above illustrates a multilayer feed-forward network with a configuration (ℓ -
m – n).
SC - Neural Network –Learning methods
Recurrent Networks
Example :
[Figure: Recurrent network - inputs x1 .. xℓ, outputs y1 .. ym, with feedback links from the outputs back to the inputs]
The learning methods in neural networks are classified into three basic types :
- Supervised Learning,
- Unsupervised Learning and
- Reinforced Learning
Neural Network
Learning algorithms
• Supervised Learning
- A teacher is present during learning process and presents expected output.
- Every input pattern is used to train the network.
- Learning process is based on comparison, between network's computed output and
the correct expected output, generating "error".
- The "error" generated is used to change network parameters that result improved
performance.
• Unsupervised Learning
- No teacher is present.
- The expected or desired output is not presented to the network.
- The system learns on its own by discovering and adapting to the structural features in
the input patterns.
• Reinforced learning
- A teacher is present but does not present the expected or desired output; it only
indicates whether the computed output is correct or incorrect.
- The information provided helps the network in its learning process.
- A reward is given for correct answer computed and a penalty for a wrong answer.
SC - Neural Network –Learning methods
Note : The Supervised and Unsupervised learning methods are most popular forms of
learning compared to Reinforced learning.
• Hebbian Learning
Hebb proposed a rule based on correlative weight adjustment.
In this rule, the input-output pattern pairs (Xi , Yi) are associated by
the weight matrix W, known as correlation matrix computed as
W = Σ (i = 1 to n) Xi Yi^T
where Yi^T is the transpose of the associated output vector Yi.
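A minimal Python sketch of building the correlation matrix W as a sum of outer products; the two bipolar pattern pairs below are made-up examples, not from the notes:

def outer(x, y):
    # Outer product x y^T as a list of lists
    return [[xi * yj for yj in y] for xi in x]

def hebbian_weights(pattern_pairs):
    n_in, n_out = len(pattern_pairs[0][0]), len(pattern_pairs[0][1])
    W = [[0.0] * n_out for _ in range(n_in)]
    for X, Y in pattern_pairs:          # accumulate X_i Y_i^T over all pairs
        P = outer(X, Y)
        W = [[W[i][j] + P[i][j] for j in range(n_out)] for i in range(n_in)]
    return W

pairs = [([1, -1, 1], [1, -1]), ([1, 1, -1], [1, 1])]
print(hebbian_weights(pairs))   # [[2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]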
• Gradient descent learning
- If ΔWij is the weight update of the link connecting the i th and the j th
neuron of the two neighbouring layers, then ΔWij is defined as
  ΔWij = η ( ∂E / ∂Wij )
where η is the learning rate parameter and ∂E / ∂Wij is the error gradient with reference to the weight Wij.
Note : The Widrow-Hoff's Delta rule and the Back-propagation learning rule are examples
of Gradient descent learning.
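A hedged illustration of one gradient-descent weight update for a single linear unit with squared error E = (1/2)(t - y)^2; the learning rate, initial weights and training pattern below are assumed values, and the minus sign follows the usual convention of descending the error gradient:

eta = 0.1                       # learning rate (assumed value)
w = [0.2, -0.4]                 # initial weights (assumed)
x, t = [1.0, 0.5], 1.0          # one training pattern (assumed)

y = sum(wi * xi for wi, xi in zip(w, x))        # linear output
grad = [-(t - y) * xi for xi in x]              # dE/dw_i = -(t - y) * x_i
w = [wi - eta * g for wi, g in zip(w, grad)]    # descend the gradient
print(w)                                        # approximately [0.3, -0.35]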
• Competitive Learning
- In this method, those neurons which respond strongly to the input stimuli have
their weights updated.
- When an input pattern is presented, all neurons in the layer compete, and the
winning neuron undergoes weight adjustment .
- This strategy is called "winner-takes-all".
• Stochastic Learning
- In this method the weights are adjusted in a probabilistic fashion.
- Example : Simulated annealing which is a learning mechanism
employed by Boltzmann and Cauchy machines.
In the previous sections, the Neural Network Architectures and the Learning methods
have been discussed. Here the popular neural network systems are listed. The grouping of
these systems in terms of architectures and the learning methods are presented in the next
slide.
– AM (Associative Memory)
– Boltzmann machines
– BSB ( Brain-State-in-a-Box)
– Cauchy machines
– Hopfield Network
– Neocognitron
– Perceptron
                              Learning Methods
                   Gradient descent       Hebbian            Competitive    Stochastic
 Single-layer      ADALINE, Hopfield,     AM, Hopfield       LVQ, SOFM      -
 feed-forward      Perceptron
 Multi-layer       CCM, MLFF, RBF         Neocognitron       -              -
 feed-forward
 Recurrent         RNN                    BAM, BSB,          ART            Boltzmann and
 networks                                 Hopfield                          Cauchy machines
[Figure: Single layer Perceptron - inputs x1 .. xn connected to output neurons y1 .. ym through the weights w11 .. wnm]
y j = f (net j) = 1  if net j ≥ 0        where  net j = Σ (i = 1 to n) xi wij
                  0  if net j < 0
SC - Neural Network –Single Layer learning
• Learning Algorithm : Training Perceptron
The training of the Perceptron is a supervised learning algorithm where weights are
adjusted to minimize the error whenever the computed output does not match the desired
output.
− If the output is 1 but should have been 0, then the weights are decreased
on the active input links,
  i.e.  Wij^(K+1) = Wij^K − α . xi
− If the output is 0 but should have been 1, then the weights are increased on
the active input links,
  i.e.  Wij^(K+1) = Wij^K + α . xi
where Wij^(K+1) is the new adjusted weight, Wij^K is the old weight, xi is the input and α is the learning rate parameter.
SC - Neural Network –Single Layer learning
- Definition : Sets of points in 2-D space are linearly separable if the sets can be
separated by a straight line.
- Generalizing, a set of points in n-dimensional space is linearly separable if there
is a hyperplane of (n-1) dimensions that separates the sets.
Example :
[Figure: two sets of points S1 and S2 in 2-D space - in one case a straight line separates S1 from S2 (linearly separable), in the other no single straight line can separate them]
Note : Perceptron cannot find weights for classification problems that are not
linearly separable.
• XOR Problem :
Exclusive OR operation

XOR truth table :
  Input x1   Input x2   Output
     0          0         0        Even parity
     1          1         0        Even parity
     0          1         1        Odd parity
     1          0         1        Odd parity

[Figure: the even parity points (0, 0), (1, 1) and the odd parity points (0, 1), (1, 0) plotted in the x1-x2 plane]

Even parity is an even number of 1 bits in the input; odd parity is an odd number of 1 bits in the input.
- There is no way to draw a single straight line so that the circles are on one side of
the line and the dots on the other side.
- Perceptron is unable to find a line separating even parity input
patterns from odd parity input patterns.
■ Step 5 :
Compare the computed output yj with the target output yj for
each input pattern j .
If all the input patterns have been classified correctly, then output (read) the
weights and exit.
■ Step 6 :
Otherwise, update the weights as given below :
If the computed output yj is 1 but should have been 0,
Then wi = wi − α xi ,  i = 0, 1, 2, ............ , n
If the computed output yj is 0 but should have been 1,
Then wi = wi + α xi ,  i = 0, 1, 2, ............ , n
where α is the learning parameter and is constant.
■ Step 7 :
goto step 3
■ END
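A minimal sketch of this training procedure in Python, assuming the threshold is folded in as a bias weight w0 with a constant input x0 = 1 and a learning parameter alpha; it is an illustration of the steps above, not the notes' own code:

def train_perceptron(patterns, targets, alpha=0.5, max_epochs=100):
    n = len(patterns[0])
    w = [0.0] * (n + 1)                        # w0 acts as the threshold weight
    for _ in range(max_epochs):
        all_correct = True
        for x, t in zip(patterns, targets):
            xa = [1.0] + list(x)               # x0 = 1 for the bias/threshold
            net = sum(wi * xi for wi, xi in zip(w, xa))
            y = 1 if net >= 0 else 0
            if y != t:                         # Step 6: update only on error
                all_correct = False
                sign = -1 if y == 1 else +1    # decrease if y=1,t=0; increase if y=0,t=1
                w = [wi + sign * alpha * xi for wi, xi in zip(w, xa)]
        if all_correct:                        # Step 5: stop when all patterns classified
            break
    return w

# Example: learn the (linearly separable) AND function
print(train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1]))

For the linearly separable AND function this loop converges; for XOR it would run until max_epochs without ever finding a separating line, as discussed above.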
SC - Neural Network –ADALINE

[Figure: ADALINE - inputs x1 .. xn with weights W1 .. Wn feed an output neuron; the neuron output is compared with the desired output, and the resulting error is used to adjust the weights]
■ After the ADALINE is trained, an input vector presented to the network with
fixed weights will result in a scalar output.
■ The activation function is not used during the training phase. Once the
weights are properly adjusted, the response of the trained unit can be tested by
applying various inputs, which are not in the training set. If the network
produces consistent responses to a high degree with the test inputs, it is
said that the network could generalize. The process of training and
generalization are two important attributes of this network.
Usage of ADALINE :
In practice, an ADALINE is used to
- Make binary decisions; the output is sent through a binary threshold.
- Realizations of logic gates such as AND, NOT and OR .
- Realize only those logic functions that are linearly separable.
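The notes above describe the ADALINE architecture and usage but not the weight update formula; the sketch below assumes the standard Widrow-Hoff (LMS) delta rule, with made-up values for the learning rate and number of epochs:

def train_adaline(patterns, targets, eta=0.1, epochs=50):
    n = len(patterns[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            y = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear output (no activation while training)
            err = t - y                                    # error against the desired output
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

# Realizing AND with bipolar targets; after training the output is sent through a binary threshold.
w, b = train_adaline([(0, 0), (0, 1), (1, 0), (1, 1)], [-1, -1, -1, 1])
print([1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
       for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])          # should print [0, 0, 0, 1]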
SC - Neural Network –ADALINE
Applications of Neural Network
Neural Network Applications can be grouped in following categories:
■ Clustering:
A clustering algorithm explores the similarity between patterns and places similar
patterns in a cluster. Best known applications include data compression and data
mining.
■ Classification/Pattern recognition:
The task of pattern recognition is to assign an input pattern (like handwritten
symbol) to one of many classes. This category includes algorithmic
implementations such as associative memory.
■ Function approximation :
The task of function approximation is to find an estimate of the unknown function
subject to noise. Various engineering and scientific disciplines require function
approximation.
■ Prediction Systems:
The task is to forecast some future values of a time-sequenced data. Prediction
has a significant impact on decision support systems. Prediction differs from function
approximation by considering time factor. System may be dynamic and may produce
different results for the same input data based on system state (time).
Soft Computing
2. Back-Propagation Algorithm
Algorithm for training Network - Basic loop structure, Step-by-step procedure; Example:
Training Back-prop network, Numerical example.
Back-Propagation Network
What is BPN ?
Minsky and Papert (1969) showed that a two layer feed-forward network can
overcome many restrictions, but they did not present a solution to the problem of
how to adjust the weights from the input to the hidden layer.
The real world presents situations where data is incomplete or noisy. Making
reasonable predictions about what is missing from the available information is a difficult
task when there is no good theory available to help reconstruct the missing
data. It is in such situations that Back-propagation (Back-Prop) networks may provide some
answers.
• Typically, units are connected in a feed-forward fashion with input units fully
connected to units in the hidden layer and hidden units fully connected to units in
the output layer.
• With BackProp networks, learning occurs during a training phase. The steps
followed during learning are :
− each input pattern in a training set is applied to the input units and then propagated
forward.
− the pattern of activation arriving at the output layer is compared with the correct
(associated) output pattern to calculate an error signal.
− the error signal for each such target output pattern is then back-propagated from
the outputs to the inputs in order to appropriately adjust the weights in each layer
of the network.
− after a BackProp network has learned the correct classification for a set of
inputs, it can be tested on a second set of inputs to see how well it classifies
untrained patterns.
Learning :
AND function

AND truth table :
  X1   X2   Y
  0    0    0
  0    1    0
  1    0    0
  1    1    1

[Figure: a single unit with input nodes A (input I1, weight W1) and B (input I2, weight W2) feeding an output node C producing the output O]
SC - NN - BPN – Background
− there are 4 inequalities in the AND function and they must be satisfied.
w10 + w2 0 < θ , w1 0 + w2 1 < θ ,
w11 + w2 0 < θ , w1 1 + w2 1 > θ
− one possible solution :
if both weights are set to 1 and the threshold is set to 1.5, then
(1)(0) + (1)(0) < 1.5 assign 0 , (1)(0) + (1)(1) < 1.5 assign 0
(1)(1) + (1)(0) < 1.5 assign 0 , (1)(1) + (1)(1) > 1.5 assign 1
• Example 1
AND Problem

AND truth table :
  X1   X2   Y
  0    0    0
  0    1    0
  1    0    0
  1    1    1

[Figure: a single unit with input nodes A (input I1, weight W1) and B (input I2, weight W2) feeding an output node C producing the output O]
SC - NN - BPN – Background
the output of the network is determined by calculating a weighted sum of its two
inputs and comparing this value with a threshold θ.
if the net input (net) is greater than the threshold, then the output is 1, else it is
0.
mathematically, the computation performed by the output unit is
net = w1 I1 + w2 I2 if net > θ then O = 1, otherwise O = 0.
• Example 2
Marital status and occupation
In the above example 1
the input characteristics may be : marital Status (single or married)
and their occupation (pusher or bookie).
this information is presented to the network as a 2-D binary input vector where 1st
element indicates marital status (single = 0, married = 1) and 2nd element
indicates occupation ( pusher = 0, bookie = 1 ).
the output, comprise "class 0" and "class 1".
by applying the AND operator to the inputs, we classify an individual as a
member of the "class 0" only if they are both married and a bookie; that is the
output is 1 only when both of the inputs are 1.
Rosenblatt (late 1950's) proposed learning networks called Perceptron. The task
was to discover a set of connection weights which correctly classified a set of binary
input vectors. The basic architecture of the perceptron is similar to the simple AND
network in the previous example.
The learning rule that Rosenblatt developed is based on determining the difference
between the actual output of the network and the target output (0 or 1), called the
"error measure", which is explained in the next slide.
― If the input vector is correctly classified (i.e., zero error), then the
weights are left unchanged, and
the next input vector is presented.
― If the input vector is incorrectly classified (i.e., not zero error), then
there are two cases to consider :
Δθ = - (tp - op) = - dp
Hidden Layer
The best example to explain where back-propagation can be used is the XOR
problem.
Consider a simple graph shown below.
− all points on the right side of the line are +ve, therefore the output of the neuron
should be +ve.
− all points on the left side of the line are –ve, therefore the output of
the neuron should be –ve.
XOR truth table :
  X1   X2   Y
  1    1    0
  1    0    1
  0    1    1
  0    0    0

[Figure: a network with inputs X1, X2, hidden nodes A, B and output node C producing Y, together with the XOR points plotted in the X1-X2 plane]
SC - NN – Back Propagation Network
2. Back Propagation Network
Learning By Example
The weight of the arc between the i th hidden neuron and the j th output layer neuron is Wij .
[Figure: Multi-layer feed-forward back-propagation network - input layer with ℓ nodes (inputs IIℓ, outputs OIℓ), hidden layer with m nodes (IHm, OHm), output layer with n nodes (IOn, OOn); weights Vij between the input and hidden layers, weights Wij between the hidden and output layers]
The table below indicates an 'nset' of input and output data. It shows ℓ
inputs and the corresponding n output data.
Table : 'nset' of input and output data
  No      I1     I2    ....   Iℓ        O1     O2    ....   On
  1       0.3    0.4   ....   0.8       0.1    0.56  ....   0.82
  2       :
  :
  nset
In this section, over a three layer network the computation in the input, hidden and output
layers are explained while the step-by-step implementation of the BPN algorithm by solving
an example is illustrated in the next section.
SC - NN – Back Propagation Network
Computation of Input, Hidden and Output Layers
(Ref.Previous slide, Fig. Multi-layer feed-forward back-propagation network)
Denoting the weight matrix or connectivity matrix between the hidden neurons and the output
neurons as [ W ], we can get the input to the output neurons as
{ I }O = [ W ]^T { O }H
(n x 1)   (n x m)  (m x 1)        (denotes matrix row, column sizes)
SC - NN – Back Propagation Network
• Output Layer Computation
Shown below is the q th neuron of the output layer. It has inputs from the outputs of
the hidden layer neurons.
If we consider the transfer function as a sigmoidal function, then the output of the q th
output neuron is given by
  OOq = 1 / ( 1 + e^-(IOq - θOq) )
[Figure: Example of treating the threshold in the output layer - the q th output neuron receives the hidden layer outputs OH1 .. OHm through the weights W1q .. Wmq, together with a bias input OH0 = -1 carrying the threshold θOq]
Note : here again the threshold is not treated as shown in the Fig; the outputs of the output neurons are given by the above equation.
SC - NN – Back Propagation Network
Calculation of Error
Consider any r th output neuron. For the target output value T, mentioned in the table
'nset' of input and output data, for the purpose of training calculate the output O .
The error norm in output for the r th output neuron is
E1r = (1/2) er² = (1/2) (T - O)²
where E1r is 1/2 of the second norm of the error er in the r th neuron for the given
training pattern.
er² is the square of the error, considered to make it independent of sign, +ve
or -ve, i.e. consider only the absolute value.
The Euclidean norm of error E1 for the first training pattern is given by
E1 = (1/2) Σ (r = 1 to n) (Tor - Oor)²
This error function is for one training pattern. If we use the same technique for all
the training patterns, we get
E (V, W) = Σ (j = 1 to nset) Ej (V, W, I)
SC - NN - BPN – Algorithm
where E is the error function that depends on the m (ℓ + n) weights of [ W ] and [ V ].
All that is stated is an optimization problem to be solved, where the objective or cost
function is usually defined to be maximized or minimized with respect to a set of
parameters. In this case, the network parameters that optimize the error function E
over the 'nset' of pattern sets [ I nset , t nset ] are the synaptic weight values
[ V ] of size ℓ x m and [ W ] of size m x n.
Back-Propagation Algorithm
The benefits of hidden layer neurons have been explained. The hidden layer allows ANN to
develop its own internal representation of input-output mapping. The complex internal
representation capability allows the hierarchical network to learn any mapping and not
just the linearly separable ones.
The basic algorithm loop structure, and the step by step procedure of Back-
propagation algorithm are illustrated in next few slides.
■ Step 1 :
Normalize the inputs and outputs with respect to their maximum values.
For each training pair, assume that in normalized form there are
ℓ inputs given by { I }I of size ℓ x 1, and
n outputs given by { O }O of size n x 1.
■ Step 2 :
Initialise the weights [ V ]^0 and [ W ]^0 to small random values, and set the weight changes [ ΔV ]^0 = [ ΔW ]^0 = [ 0 ].
■ Step 3 :
For the training data, we need to present one set of inputs and outputs. Present the pattern
as inputs to the input layer { I }I .
■ Step 4 :
Then, by using a linear activation function, the output of the input layer may be
evaluated as
{ O }I = { I }I
(ℓ x 1)   (ℓ x 1)
■ Step 5 :
{ I }H = [ V ]^T { O }I
(m x 1)   (m x ℓ)  (ℓ x 1)
■ Step 6 :
Let the hidden layer units evaluate the output using the sigmoidal function as
{ O }H = ( 1 / ( 1 + e^-(IHi) ) )
(m x 1)
SC - NN - BPN – Algorithm
■ Step 7 :
{ I }O = [ W ]^T { O }H
(n x 1)   (n x m)  (m x 1)
■ Step 8 :
Let the output layer units evaluate the output using the sigmoidal function as
{ O }O = ( 1 / ( 1 + e^-(IOj) ) )
(n x 1)
■ Step 9 :
Calculate the error using the difference between the network output and the
desired output for the j th training set as
EP = √ ( Σj (Tj - OOj)² ) / n
■ Step 10 :
Find a term { d } as
{ d } = (Tk - OOk) OOk (1 - OOk)
(n x 1)
SC - NN - BPN – Algorithm
■ Step 11 :
Find the [ Y ] matrix as
[ Y ] = { O }H ⟨ d ⟩
(m x n)   (m x 1) (1 x n)
■ Step 12 :
Find [ ΔW ]^(t+1) = α [ ΔW ]^t + η [ Y ]
(m x n)                (m x n)        (m x n)
where α is the momentum factor and η is the learning rate.
■ Step 13 :
Find { e } = [ W ] { d }
(m x 1)   (m x n) (n x 1)
Find { d* } = ei (OHi) (1 - OHi)
(m x 1)
Find the [ X ] matrix as
[ X ] = { O }I ⟨ d* ⟩ = { I }I ⟨ d* ⟩
(ℓ x m)   (ℓ x 1) (1 x m)    (ℓ x 1) (1 x m)
SC - NN - BPN – Algorithm
■ Step 14 :
Find [ ΔV ]^(t+1) = α [ ΔV ]^t + η [ X ]
(ℓ x m)                (ℓ x m)        (ℓ x m)
■ Step 15 :
Find [ V ]^(t+1) = [ V ]^t + [ ΔV ]^(t+1)
     [ W ]^(t+1) = [ W ]^t + [ ΔW ]^(t+1)
■ Step 16 :
Find the error rate as
error rate = Σ EP / nset
■ Step 17 :
Repeat steps 4 to 16 until the convergence in the error rate is less than the
tolerance value
■ End of Algorithm
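A Python sketch of one forward and backward pass of the algorithm above, using the data of the numerical example in the next section (inputs 0.4 and -0.7, target 0.1); the learning rate eta = 0.6 is an assumption inferred from that example's numbers, and the momentum term is taken as zero since the initial weight changes are zero:

import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

I   = [0.4, -0.7]                     # {I}_I, one training pattern
T   = [0.1]                           # target output
V   = [[0.1, 0.4], [-0.2, 0.2]]       # input-to-hidden weights [V], size l x m
W   = [[0.2], [-0.5]]                 # hidden-to-output weights [W], size m x n
eta = 0.6                             # assumed learning rate

# Steps 4-6: input layer output and hidden layer output
O_I = I[:]
I_H = [sum(V[i][j] * O_I[i] for i in range(2)) for j in range(2)]   # [V]^T {O}_I
O_H = [sigmoid(v) for v in I_H]                                     # ~ [0.5448, 0.505]

# Steps 7-8: output layer
I_O = [sum(W[j][k] * O_H[j] for j in range(2)) for k in range(1)]   # [W]^T {O}_H
O_O = [sigmoid(v) for v in I_O]                                     # ~ [0.4642]

# Steps 10-13: error terms
d      = [(T[k] - O_O[k]) * O_O[k] * (1 - O_O[k]) for k in range(1)]     # ~ -0.09058
e      = [sum(W[j][k] * d[k] for k in range(1)) for j in range(2)]       # [W]{d}
d_star = [e[j] * O_H[j] * (1 - O_H[j]) for j in range(2)]                # hidden-layer deltas

# Steps 12-15: weight changes and updated weights (momentum term zero here)
W = [[W[j][k] + eta * O_H[j] * d[k] for k in range(1)] for j in range(2)]
V = [[V[i][j] + eta * O_I[i] * d_star[j] for j in range(2)] for i in range(2)]
print(V)   # ~ [[0.0989, 0.4027], [-0.1981, 0.1952]]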
• Problem :
Consider a typical problem where there are 5 training sets.
In this problem,
- there are two inputs and one output.
- the values lie between -1 and +1 i.e., no need to normalize the values.
- assume two neurons in the hidden layers.
- the NN architecture is shown in the Fig. below.
[Figure: Multi-layer feed-forward neural network (MFNN) architecture with the data of the first training set - inputs 0.4 and -0.7, target output TO = 0.1, initial input-hidden weights [ V ]^0 = ( 0.1  0.4 ; -0.2  0.2 ) and initial hidden-output weights [ W ]^0 = ( 0.2 ; -0.5 )]
Using the values from steps 1 & 2 :
{ O }H = ( 1 / ( 1 + e^-(0.18) )  ;  1 / ( 1 + e^-(0.02) ) ) = ( 0.5448 ; 0.505 )
SC - NN - BPN – Algorithm
■ Step 5 : (ref eq. of step 7)
{ I }O = [ W ]^T { O }H = ( 0.2  -0.5 ) ( 0.5448 ; 0.505 ) = -0.14354
{ O }O = 1 / ( 1 + e^-(-0.14354) ) = 0.4642
{ e } = [ W ] { d } = ( 0.2 ; -0.5 ) ( -0.09058 ) = ( -0.018116 ; 0.04529 )
[ X ] = { O }I ⟨ d* ⟩ = ( 0.4 ; -0.7 ) ( -0.00449  0.01132 )
      = ( -0.001796   0.004528
           0.003143  -0.007924 )

[ ΔV ]^1 = [ ΔV ]^0 + η [ X ] = ( -0.001077   0.002716
                                   0.001885  -0.004754 )        (taking the learning rate η = 0.6)
from values at step 2 & step 8 above
SC - NN - BPN – Algorithm
■ Step 14 : (ref eq. of step 15)
[ V ]^1 = [ V ]^0 + [ ΔV ]^1 = ( 0.0989    0.4027
                                -0.1981    0.1952 )
■ Step 15 :
With the updated weights [V] and [ W ] , error is calculated again and
next training set is taken and the error will then get adjusted.
■ Step 16 :
Iterations are carried out till we get the error less than the tolerance.
■ Step 17 :
Soft Computing
Introduction to fuzzy set, topics : classical set theory, fuzzy set theory,
crisp and non-crisp Sets representation, capturing uncertainty, examples.
Fuzzy membership and graphic interpretation of fuzzy sets - small, prime
numbers, universal, finite, infinite, empty space; Fuzzy Operations -
inclusion, comparability, equality, complement, union, intersection,
difference; Fuzzy properties related to union, intersection, distributivity,
law of excluded middle, law of contradiction, and cartesian product.
Fuzzy relations : definition, examples, forming fuzzy relations,
projections of fuzzy relations, max-min and min-max compositions.
Fuzzy Set Theory
Soft Computing
Topics
2. Fuzzy set
Fuzzy Membership; Graphic interpretation of fuzzy sets : small, prime numbers, universal,
finite, infinite, empty space;
Fuzzy Operations : Inclusion, Comparability, Equality, Complement, Union,
Intersection, Difference;
Fuzzy Properties : Related to union – Identity, Idempotence, Associativity,
Commutativity ; Related to Intersection – Absorption, Identity, Idempotence,
Commutativity, Associativity; Additional properties - Distributivity, Law of excluded
middle, Law of contradiction; Cartesian product .
3. Fuzzy Relations
Definition of Fuzzy Relation, examples;
Forming Fuzzy Relations – Membership matrix, Graphical form; Projections of
Fuzzy Relations – first, second and global; Max-Min and Min-Max compositions.
Fuzzy Set Theory
• The word "fuzzy" means "vagueness". Fuzziness occurs when the boundary of a piece
of information is not clear-cut.
• Fuzzy sets have been introduced by Lotfi A. Zadeh (1965) as an extension of the
classical notion of set.
• Classical set theory allows the membership of the elements in the set in binary
terms, a bivalent condition - an element either belongs or does not belong to the
set.
Fuzzy set theory permits the gradual assessment of the membership of elements in
a set, described with the aid of a membership function valued in the real unit interval
[0, 1].
• Example:
Words like young, tall, good, or high are fuzzy.
Human thinking and reasoning frequently involve fuzzy information, originating from
inherently inexact human concepts. Humans, can give satisfactory answers, which are
probably true.
However, our systems are unable to answer many questions. The reason is, most
systems are designed based upon classical set theory and two-valued logic which is
unable to cope with unreliable and incomplete information and give expert opinions.
We want, our systems should also be able to cope with unreliable and incomplete
information and give expert opinions. Fuzzy sets have been able to provide solutions to
many real world problems.
Fuzzy Set theory is an extension of classical set theory where elements have degrees of
membership.
A : X → [0, 1]
A(x) = 1 , x is a member of A Eq.(1)
A(x) = 0 , x is not a member of A
− Thus in classical set theory A (x) has only the values 0 ('false')
and 1 ('true'). Such sets are called crisp sets.
− A Fuzzy Set is any set that allows its members to have different degree of
membership, called membership function, in the interval [0 , 1].
− Fuzzy logic is derived from fuzzy set theory dealing with reasoning that is
approximate rather than precisely deduced from classical predicate logic.
− The proposition of Fuzzy Sets are motivated by the need to capture and represent
real world data with uncertainty due to imprecise measurement.
■ Non-Crisp Representation to represent the notion of a tall person.
[Figure: degree or grade of truth plotted against height x, with a marker at 1.8 m - crisp logic (left, an abrupt step at 1.8 m) and non-crisp logic (right, a gradual transition around 1.8 m)]
A student of height 1.79m would belong to both tall and not tall sets with a
particular degree of membership.
As the height increases the membership grade within the tall set would increase
whilst the membership grade within the not-tall set would decrease.
• Capturing Uncertainty
Instead of avoiding or ignoring uncertainty, Lotfi Zadeh introduced Fuzzy Set theory
that captures uncertainty.
A : X → [0, 1]
■ Each membership function maps elements of a given universal base set X , which
is itself a crisp set, into real numbers in [0, 1] .
■ Example
[Figure: membership functions of a crisp set C, c(x), and of a fuzzy set F, F(x) - the crisp membership takes only the values 0 and 1, whereas the fuzzy membership takes intermediate values such as 0.5]
For the crisp set C, an element is either out of the set, with membership of degree " 0 ", or in the
set, with membership of degree " 1 ".
SC - Fuzzy set theory – Fuzzy Operation
Therefore, Crisp Sets ⊆ Fuzzy Sets
In other words, Crisp Sets are Special cases of Fuzzy Sets.
A Fuzzy Set is any set that allows its members to have different degree of
membership, called membership function, in the interval [0 , 1].
The first two numbers specify the start and end of the universal space, and the third
argument specifies the increment between elements. This gives the user more
flexibility in choosing the universal space.
Fuzzy Membership
The Graphic Interpretation of fuzzy membership for the fuzzy sets : Small, Prime
Numbers, Universal-space, Finite and Infinite UniversalSpace, and Empty are
illustrated in the next few slides.
In any application of sets or fuzzy sets theory, all sets are subsets of a fixed set
called universal space or universe of discourse denoted by X. Universal space X as a
fuzzy set is a function equal to 1 for all elements.
UNIVERSALSPACE = FuzzySet {{1, 1}, {2, 1}, {3, 1}, {4, 1}, {5, 1}, {6, 1},
{7, 1}, {8, 1}, {9, 1}, {10, 1}, {11, 1}, {12, 1}}
Therefore SetUniversal is represented as
SetUniversal = FuzzySet [{{1, 1}, {2, 1}, {3, 1}, {4, 1}, {5, 1}, {6, 1}, {7, 1},
{8, 1}, {9, 1}, {10, 1}, {11, 1}, {12, 1}} , UniversalSpace → {1, 12, 1}]
SC - Fuzzy set theory – Fuzzy Operation
Examples:
1. Let N be the universal space of the days of the week.
   N = {Mo, Tu, We, Th, Fr, Sa, Su}. N is finite.
2. Let M = {1, 3, 5, 7, 9, ...}. M is infinite.
3. Let L = {u | u is a lake in a city }. L is finite.
(Although it may be difficult to count the number of lakes in a city,
but L is still a finite universal set.)
An empty set is a set that contains only elements with a grade of membership equal to
0.
Example: Let EMPTY be a set of people, in Minnesota, older than 120. The Empty
set is also called the Null set.
Fuzzy Operations
Fuzzy set operations are operations on fuzzy sets. The fuzzy set operations are
generalizations of crisp set operations. Zadeh [1965] formulated the fuzzy set theory in
terms of the standard operations : Complement, Union, Intersection, and Difference.
In this section, the graphical interpretation of the following standard fuzzy set terms
and the Fuzzy Logic operations are illustrated.
SC - Fuzzy set theory – Fuzzy Operation
• Inclusion
Let A and B be fuzzy sets defined in the same universal space X.
The fuzzy set A is included in the fuzzy set B if and only if for every x in the set X we
have A(x) ≤ B(x)
Example :
The fuzzy set UNIVERSALSPACE numbers, defined in the universal
space X = { xi } = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} is presented as
SetOption [FuzzySet, UniversalSpace → {1, 12, 1}]
Fig Graphic Interpretation of Fuzzy Inclusion
FuzzyPlot [SMALL, VERYSMALL]
SC - Fuzzy set theory – Fuzzy Properties
• Comparability
Two fuzzy sets A and B are comparable if the condition A ⊆ B or B ⊆ A holds, i.e.,
if one of the fuzzy sets is a subset of the other set, they are comparable.
Example 1:
Let A = {{a, 1}, {b, 1}, {c, 0}} and
B = {{a, 1}, {b, 1}, {c, 1}}.
Then A is comparable to B, since A is a subset of B.
Example 2 :
Let C = {{a, 1}, {b, 1}, {c, 0.5}} and
D = {{a, 1}, {b, 0.9}, {c, 0.6}}.
Then C and D are not comparable since
C is not a subset of D and
D is not a subset of C.
• Equality
Let A and B be fuzzy sets defined in the same space X.
Then A and B are equal, which is denoted A = B,
if and only if for all x in the set X, A(x) = B(x).
Example.
The fuzzy set B SMALL
SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
Note : If equality A(x) = B(x) is not satisfied even for one element x in
the set X, then we say that A is not equal to B.
• Complement
Let A be a fuzzy set defined in the space X.
Then the fuzzy set B is a complement of the fuzzy set A, if and only if, for all x in
the set X, B(x) = 1 - A(x).
[Figure: membership grades of a fuzzy set A and of its complement Ac over the universal space X]
Example 2.
The empty set and the universal set X, as fuzzy sets, are
complements of one another.
Ø ' = X ,  X ' = Ø
Fig Graphic Interpretation of Fuzzy Complement
FuzzyPlot [EMPTY, UNIVERSALSPACE]
• Union
Let A and B be fuzzy sets defined in the space X.
The union is defined as the smallest fuzzy set that contains both A and B. The union of
A and B is denoted by A ∪ B.
The following relation must be satisfied for the union operation :
for all x in the set X, (A ∪ B)(x) = Max ( A(x), B(x) ).
Fig Graphic Interpretation of Fuzzy Union
FuzzyPlot [UNION]
The notion of the union is closely related to that of the connective "or". Let A is a
class of "Young" men, B is a class of "Bald" men.
If "David is Young" or "David is Bald," then David is associated with the
union of A and B. Implies David is a member of A B.
• Intersection
Let A and B be fuzzy sets defined in the space X. The intersection is defined as the
greatest fuzzy set that is included in both A and B. The intersection of A and B is denoted by
A ∩ B. The following relation must be satisfied for the intersection operation :
for all x in the set X, (A ∩ B)(x) = Min ( A(x), B(x) ).
Fuzzy Intersection : (A ∩ B)(x) = min [ A(x), B(x) ] for all x ∈ X
Example 1 : A(x) = 0.6 and B(x) = 0.4 ;  (A ∩ B)(x) = min [0.6, 0.4] = 0.4
Fig Graphic Interpretation of Fuzzy Intersection
FuzzyPlot [INTERSECTION]
• Difference
Let A and B be fuzzy sets defined in the space X. The difference of A and B is defined
through the complement and the intersection as A − B = A ∩ B' , that is,
for all x in the set X, (A − B)(x) = Min ( A(x), 1 − B(x) ).
[Figure: graphic interpretation of the fuzzy difference over the universal space X]
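A compact Python sketch of these standard operations, using a dictionary of membership grades over the universal space {1, ..., 12}; the SMALL and MEDIUM grades are the ones listed elsewhere in this section, and the difference is implemented as A ∩ B' as defined above:

X = range(1, 13)

SMALL  = {1: 1, 2: 1, 3: 0.9, 4: 0.6, 5: 0.4, 6: 0.3, 7: 0.2, 8: 0.1, 9: 0, 10: 0, 11: 0,   12: 0}
MEDIUM = {1: 0, 2: 0, 3: 0,   4: 0.2, 5: 0.5, 6: 0.8, 7: 1,   8: 1,   9: 0, 10: 0, 11: 0.1, 12: 0}

def complement(A):      return {x: 1 - A[x] for x in X}
def union(A, B):        return {x: max(A[x], B[x]) for x in X}
def intersection(A, B): return {x: min(A[x], B[x]) for x in X}
def difference(A, B):   return intersection(A, complement(B))   # A - B = A intersect B'
def included_in(A, B):  return all(A[x] <= B[x] for x in X)

print(union(SMALL, MEDIUM)[5], intersection(SMALL, MEDIUM)[5])   # 0.5 0.4
print(included_in(intersection(SMALL, MEDIUM), SMALL))           # True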
Fuzzy Properties
• Related to Union
■ Identity :
A ∪ Ø = A
input = Equality [SMALL ∪ EMPTY , SMALL]
output = True
A ∪ X = X
input = Equality [SMALL ∪ UniversalSpace , UniversalSpace]
output = True
■ Idempotence :
A ∪ A = A
input = Equality [SMALL ∪ SMALL , SMALL]
output = True
■ Commutativity :
A ∪ B = B ∪ A
input = Equality [SMALL ∪ MEDIUM , MEDIUM ∪ SMALL]
output = True
■ Associativity :
A ∪ (B ∪ C) = (A ∪ B) ∪ C
Small = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3},
{7, 0.2}, {8, 0.1}, {9, 0.7 }, {10, 0.4 }, {11, 0}, {12, 0}}
Medium = FuzzySet {{1, 0 }, {2, 0 }, {3, 0}, {4, 0.2}, {5, 0.5}, {6, 0.8},
{7, 1}, {8, 1}, {9, 0 }, {10, 0 }, {11, 0.1}, {12, 0}}
SC - Fuzzy set theory – Fuzzy Properties
• Related to Intersection
■ Identity :
A ∩ X = A
input = Equality [Small ∩ UniversalSpace , Small]
output = True
■ Idempotence :
A ∩ A = A
input = Equality [Small ∩ Small , Small]
output = True
■ Commutativity :
A ∩ B = B ∩ A
input = Equality [Small ∩ Big , Big ∩ Small]
output = True
■ Associativity :
A ∩ (B ∩ C) = (A ∩ B) ∩ C
input = Equality [Small ∩ (Medium ∩ Big) , (Small ∩ Medium) ∩ Big]
output = True
SC - Fuzzy set theory – Fuzzy Properties
• Additional Properties
Related to Intersection and Union
■ Distributivity :
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
input = Equality [Small ∪ (Medium ∩ Big) ,
                  (Small ∪ Medium) ∩ (Small ∪ Big)]
output = True
■ Distributivity :
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
input = Equality [Small ∩ (Medium ∪ Big) ,
                  (Small ∩ Medium) ∪ (Small ∩ Big)]
output = True
Thus the Cartesian product A x B is a fuzzy set of ordered pairs (x , y) for all
x ∈ X and y ∈ Y, with the grade of membership of (x , y) in X x Y given by the
above equations.
2. Fuzzy Relations
− Fuzzy relations offer the capability to capture the uncertainty and vagueness in relations
between sets and elements of a set.
In this section, first the fuzzy relation is defined and then expressing fuzzy relations in
terms of matrices and graphical visualizations. Later the properties of fuzzy relations and
operations that can be performed with fuzzy relations are illustrated.
Note :
The definition of a fuzzy relation is a generalization of the definition of a fuzzy set from
the 2-D space (x , μR (x)) to the 3-D space ((x , y) , μR (x , y)).
Cartesian product A x B is a relation by itself between x and y .
A fuzzy relation R is a subset of this 3-D space, namely
R = { ((x , y) , μR (x , y)) | (x , y) ∈ A x B , μR (x , y) ∈ [0, 1] } , where A x B x [0, 1] ⊆ U1 x U2 x [0, 1]
• Example of Fuzzy Relation
R = { ((x1 , y1) , 0) , ((x1 , y2) , 0.1) , ((x1 , y3) , 0.2) ,
      ((x2 , y1) , 0.7) , ((x2 , y2) , 0.2) , ((x2 , y3) , 0.3) ,
      ((x3 , y1) , 1) , ((x3 , y2) , 0.6) , ((x3 , y3) , 0.2) }
The relation can be written in membership matrix form as

            y1    y2    y3
 R =  x1    0     0.1   0.2
      x2    0.7   0.2   0.3
      x3    1     0.6   0.2
Note : Since the values of the membership function 0.7, 1, 0.6 in the direction of x, below
the major diagonal (0, 0.2, 0.2) of the matrix, are greater than the values 0.1, 0.2, 0.3 in the
direction of y, we say that the relation R describes : x is greater than y.
[Figure: graphical form of the fuzzy relation R, drawn as weighted links between the elements of the two sets]

Projections of Fuzzy Relations
The first and second projections of a fuzzy relation R are defined as
R(1) = { (x , μR(1) (x)) }  where  μR(1) (x) = max over y of μR (x , y) , i.e. max with respect to y while x is considered fixed;
R(2) = { (y , μR(2) (y)) }  where  μR(2) (y) = max over x of μR (x , y) , i.e. max with respect to x while y is considered fixed.
SC - Fuzzy set theory – Fuzzy Relations
The Total Projection is also known as Global projection
Note :
For R(1) , select max with respect to y while x is considered fixed.
For R(2) , select max with respect to x while y is considered fixed.
For R(T) , select max with respect to R(1) and R(2).
Fig Fuzzy plot of 1st projection R(1) Fig Fuzzy plot of 2nd projection R(2)
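An illustrative Python computation of the first, second and total projections for the example relation R given earlier in this section (the code itself is an assumption, not from the notes):

R = [[0.0, 0.1, 0.2],    # rows: x1, x2, x3
     [0.7, 0.2, 0.3],    # columns: y1, y2, y3
     [1.0, 0.6, 0.2]]

R1 = [max(row) for row in R]            # first projection: max over y with x fixed
R2 = [max(col) for col in zip(*R)]      # second projection: max over x with y fixed
RT = max(R1)                            # total (global) projection
print(R1, R2, RT)                       # [0.2, 0.7, 1.0] [1.0, 0.6, 0.3] 1.0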
• Max-Min Composition
The membership matrices of the two fuzzy relations R1 and R2 are :

             y1    y2    y3                    z1    z2    z3
 R1 =  x1    0.1   0.3   0           R2 = y1   0.8   0.2   0
       x2    0.8   1     0.3              y2   0.2   1     0.6
                                           y3   0.5   0     0.4

Note : The number of columns of the first table equals the number of rows of the second table. Compute the max-min composition, denoted by R1 ∘ R2 :
Step -1 Compute min operation (definition in previous slide). Consider row
x1 and column z1 , means the pair (x1 , z1) for all yj , j = 1, 2, 3, and perform
min operation
min ( R1 (x1 , y1) , R2 (y1 , z1)) = min (0.1, 0.8) = 0.1,
min ( R1 (x1 , y2) , R2 (y2 , z1)) = min (0.3, 0.2) = 0.2,
min ( R1 (x1 , y3) , R2 (y3 , z1)) = min ( 0, 0.5) = 0,
Step -2 Compute max operation (definition in previous slide).
For x = x1 , z = z1 , y = yj , j = 1, 2, 3,
Calculate the grade membership of the pair (x1 , z1) as
{ (x1 , z1) , max ( min (0.1, 0.8), min (0.3, 0.2), min (0, 0.5) ) }
i.e. { (x1 , z1) , max(0.1, 0.2, 0) }
i.e. { (x1 , z1) , 0.2 }
Hence the grade membership of the pair (x1 , z1) is 0.2 .
Similarly, find all the grade membership of the pairs
(x1 , z2) , (x1 , z3) , (x2 , z1) , (x2 , z2) , (x2 , z3)
The final result is

                 z1    z2    z3
 R1 ∘ R2 =  x1   0.2   0.3   0.3
            x2   0.8   1     0.6
Note : If the tables R1 and R2 are considered as matrices, the composition operation resembles
matrix multiplication, linking rows with columns; each cell is then occupied by the
max-min value (the product is replaced by min, the sum is replaced by max).
• Min-Max Composition
The min-max composition R1 ◻ R2 is defined by taking, for each pair (x , z), the min over y
of max ( μR1 (x , y) , μR2 (y , z) ). For the same relations R1 and R2 as above, the result is

                 z1    z2    z3
 R1 ◻ R2 =  x1   0.3   0     0.1
            x2   0.5   0.3   0.4

Note that R1 ∘ R2 ≠ R1 ◻ R2 , i.e. the max-min and min-max compositions are, in general, different.
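A short Python sketch that reproduces both composition tables above; the function names and list-of-lists representation are assumptions for illustration:

R1 = [[0.1, 0.3, 0.0],          # rows x1, x2 ; columns y1 .. y3
      [0.8, 1.0, 0.3]]
R2 = [[0.8, 0.2, 0.0],          # rows y1 .. y3 ; columns z1 .. z3
      [0.2, 1.0, 0.6],
      [0.5, 0.0, 0.4]]

def max_min(A, B):
    return [[max(min(A[i][j], B[j][k]) for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def min_max(A, B):
    return [[min(max(A[i][j], B[j][k]) for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

print(max_min(R1, R2))   # [[0.2, 0.3, 0.3], [0.8, 1.0, 0.6]]
print(min_max(R1, R2))   # [[0.3, 0.0, 0.1], [0.5, 0.3, 0.4]]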