01-ci-cs6
CS 30011
Version: 2024/08/13
0 Notices
  0.1 Top Announcement
  0.2 Links
  0.3 Time Table
  0.4 Lesson plan
  0.5 Grading Policy
1 Introduction
  1.1 Branches of Soft Computing
  1.2 Hybrid Systems
  1.3 Characteristics of Soft Computing
0.1 Top Announcement
Bookmark the link to the course folder in your web browser and join the WhatsApp group.
0.2 Links
Note This file may be old. Refer to the latest version of this file from the course folder.
Course folder
https://drive.google.com/drive/folders/1v1xYs8OubR6aoZS36rpY7U7Tq0 axsxR
Whatsapp group
https://chat.whatsapp.com/KatvgJ8BLEM88hbAt4izgL
0.3 Time Table
Batch: CI-CS6
Tue 4 p.m.   A-LH-009
Wed 11 a.m.  A-LH-009
Fri 1 p.m.   A-LH-009
5. Optimization   3   32-34
   1. Derivative-based Optimization and Derivative-free Optimization
   2. Genetic Algorithms (GA)
   3. Differential Evolution (DE)
0.4.1 Resources
Text Book
Neuro-Fuzzy and Soft Computing, Jang, Sun, Mizutani, PHI/Pearson Education
Reference Books
1. Neural Network Design, M. T. Hagan, H. B. Demuth, M. Beale, Thomson Learning / Vikas Publishing House
2. Genetic Algorithms in Search, Optimization, and Machine Learning, David E. Goldberg, Addison-Wesley, N.Y., 1989
3. Swarm Intelligence Algorithms: A Tutorial, Adam Slowik (Ed.), CRC Press, 2020
4. Introduction to Soft Computing, Roy and Chakraborty, Pearson Education
5. Fuzzy Logic with Engineering Applications, Timothy J. Ross, McGraw-Hill, 1997
6. Neural Networks: A Comprehensive Foundation, Simon Haykin, Prentice Hall
7. Neural Networks, Fuzzy Logic and Genetic Algorithms, S. Rajasekaran and G. A. V. Pai, PHI, 2003
Prelude
Soft computing is a term coined in contrast to hard computing. Hard computing means solving problems using precise methods to obtain exact solutions, for example logical reasoning and numerical search techniques: if x = 10, then 1/x = 0.1.
Soft Computing
Soft computing means solving problems using imprecise methods to obtain approximate solutions. Here approximate reasoning and randomized search techniques are used. ♣
For example, if x is high, then 1/x is low. Soft computing deals with problems characterized by uncertainty and imprecision, much like the human mind does. Such problems cannot be solved by hard computing.
1.2 Hybrid Systems
Hybrid systems are systems in which more than one technology is used to solve a problem. The aim is to consolidate the strengths of the individual technologies and eliminate their weaknesses.
1. Neuro-fuzzy: a fuzzy-logic-based neural network where fuzzy inputs are used.
2. Neuro-genetic: optimizing neural networks using genetic algorithms.
3. Fuzzy-genetic: running genetic algorithms with fuzzy constraints.
Prelude
5. If the total received impulse is greater than a threshold, the neuron fires and stimulates the neighboring neurons. Its action is inhibitory if it prevents the firing of the next neuron, or excitatory if it helps the next neuron fire.
1. Inspired by the biological neuron, a model of the artificial neuron can be used to build massively parallel computing structures.
2. The artificial neuron behaves like a biological neuron.
(a). It collects the weighted sum of the inputs from neighboring neurons:
     z_j = X W_j^T
(b). If the total input is above the firing potential θ_j, it sends an output signal to the other neurons:
     y_j = f(z_j − θ_j)
     Here W_j contains the weights and f is the activation function (or transfer function) for neuron j.
(c). To provide a nonzero threshold, one of the inputs, say x_0, can be fixed at 1. Its weight then acts as the threshold value:
     z_j = X W_j^T,    y_j = f(z_j)
     Here X contains x_0 = 1 and W_j contains the extra term w_{0j}.
3. In the absence of an activation function f, the neuron is not very useful. The activation function makes it a highly nonlinear element. One example of an activation function is the unit step function (also called the Heaviside function), f = u(z):

   u(z) = 0 if z < 0,
          1 if z ≥ 0.          (2.1)
The McCulloch-Pitts neuron is the simplest neuron model, consisting of only two layers. The input layer represents the inputs x_i connected to the neuron through links with weights w_i. The output layer consists of the neuron and its output y. The output is y_j = f(X W_j^T), where the weights w_i are fixed.
Example 2.1 Design a McCulloch-Pitts neuron that takes two binary inputs and performs the AND operation to get the output. [Derived on the board]
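A minimal Python sketch of Example 2.1; the particular weights and threshold (w1 = w2 = 1, θ = 2) are one workable choice for AND, not the only one.

# McCulloch-Pitts neuron: fires when the weighted sum of inputs reaches the threshold theta
def mcp_neuron(x, w, theta):
    z = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= theta else 0          # unit step activation, eq. (2.1)

# AND gate: with w1 = w2 = 1, only the input (1, 1) reaches the threshold of 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcp_neuron((x1, x2), (1, 1), 2))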
Exercise 2.1 Design a neuron that takes two binary inputs and performs the OR operation to get the output.
Exercise 2.2 Design a neuron that takes two binary inputs and performs the XOR operation to get the output.
(c). The gradient descent method suggests that the error can be minimized by modifying w_{ij} in the following manner:
     w_{ij} ← w_{ij} − η ∂Error/∂w_{ij}
This update equation is called the learning rule and η is called the learning rate.
3. Let us begin with a model that has a linear activation function f. When the activation is linear, the derivative of the error is:
   ∂Error/∂w_{ij} = (y_j − y_{tj}) x_i
4. Such neurons with adjustable weights, whose output is a linear function of the inputs, are called adaline (adaptive linear neuron).
5. The adaline can be trained by the weight adjustment formula obtained by substituting the expression for the error gradient into the learning rule:
   w_{ij} ← w_{ij} + η (y_{tj} − y_j) x_i
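A short Python sketch of adaline training with the update rule above; the training data (generated from a linear target) and the learning rate are illustrative choices, not taken from the text.

# Adaline: linear output y = x.w (with x0 = 1 as bias), trained by w <- w + eta*(yt - y)*x
def train_adaline(samples, eta=0.1, epochs=100):
    n_inputs = len(samples[0][0])
    w = [0.0] * (n_inputs + 1)                        # w[0] acts as the threshold weight
    for _ in range(epochs):
        for x, yt in samples:
            xa = (1.0,) + tuple(x)                    # prepend the fixed input x0 = 1
            y = sum(wi * xi for wi, xi in zip(w, xa)) # linear activation
            w = [wi + eta * (yt - y) * xi for wi, xi in zip(w, xa)]
    return w

# illustrative data generated from yt = 1 + 2*x1 - x2; the learned weights approach (1, 2, -1)
data = [((0, 0), 1), ((0, 1), 0), ((1, 0), 3), ((1, 1), 2)]
print(train_adaline(data))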
2.2.3 Perceptron
Perceptron
The perceptron is an extension of the McCulloch-Pitts model designed for pattern recognition. The first layer of the perceptron acts as a feature detector. Learning is achieved by making adjustments to the connection strengths and to the threshold θ.
♣
1. Since the derivative of the unit step function is zero everywhere (except at the origin), the weights of the links connecting inputs and outputs can be updated using the following simplified formula:
   w_{ij} ← w_{ij} + η y_{tj} x_i
   Note Weights are updated only when y_j ≠ y_{tj}.
2. The inputs to the perceptron may have been obtained by processing the original sensory inputs through an association layer. For example, the original inputs can be polar in nature (rather than binary) and are later converted to binary. The output may be polar or binary. For binary output the unit step function defined earlier in eq. 2.1 may be used. For polar output, the following sign function may be used:

   sign(z) = −1 if z < 0,
              0 if z = 0,
              1 if z > 0.          (2.2)
Example 2.2 Perform perceptron learning for the OR operation using polar inputs and outputs. Use a learning rate of η = 1. [Solved on the board]
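A Python sketch of Example 2.2, following the perceptron rule above (weights change only when y ≠ yt, with η = 1); the zero initial weights are an assumption, since the starting values used on the board are not given here.

def sign(z):                                      # polar activation, eq. (2.2)
    return -1 if z < 0 else (0 if z == 0 else 1)

def train_perceptron(samples, eta=1, epochs=10):
    w = [0.0, 0.0, 0.0]                           # [w0 (threshold weight), w1, w2], assumed start
    for _ in range(epochs):
        for x, yt in samples:
            xa = (1,) + tuple(x)                  # bias input x0 = 1
            y = sign(sum(wi * xi for wi, xi in zip(w, xa)))
            if y != yt:                           # update only on misclassification
                w = [wi + eta * yt * xi for wi, xi in zip(w, xa)]
    return w

# polar OR: the output is +1 unless both inputs are -1
data = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 1)]
print(train_perceptron(data))                     # converges to [1.0, 1.0, 1.0] for this data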
Exercise 2.3 Plot the above activation functions and determine if they are binary or polar.
Exercise 2.4 Derive and verify the derivatives f′(z) of the above activation functions.
As was evident from Exercise 2.2, a McCulloch-Pitts model cannot be designed for the XOR operation; it can likewise be inferred that the model cannot be trained to learn XOR by either the adaline or the perceptron learning rule. This is because adaline and perceptron have one input layer and one output layer; since the input layer is not counted, they are single-layer networks, and a single-layer network can only separate linearly separable patterns, which XOR is not.
A multiple-layer network is a network with an input layer, an output layer, and one or more additional layers called hidden layers. A multilayered network is also called a multilayer perceptron (MLP).
Then the output of a neuron j is y_j = f( Σ_{i=1}^{N_{l−1}} w_{ij} y_i ).
2.5 Madaline
Forward propagation
The network is presented with input patterns one by one and the outputs of all neurons are computed, starting from the first hidden layer up to the final output layer.
Backpropagation
Using the cumulative error, the weights are updated in the direction of gradient descent. The weight update mechanism consists of the following steps:
1. A pattern is presented to the system and an error signal is generated at the output nodes. [Eq. 2.3 and 2.4]
2. The error signal is propagated backwards from the output nodes to the input nodes. [Eq. 2.5]
3. Using the error values, the weights on the links are updated. [Eq. 2.6]
4. A small learning rate guarantees stable but slow convergence, while a high learning rate increases the chance of failure to converge; a small momentum factor can help increase the learning rate without divergence (oscillations). In [Eq. 2.6], α is the momentum factor.
5. This process is repeated until the error signal is weaker than a specified value. [Recursion]
y_{l,j} = f( Σ_{i=0}^{N_{l−1}} w_{ij} y_{l−1,i} )          (2.3)

δ_{L,i} = (y_{t,i} − y_{L,i}) f′_{L,i}          (2.4)

δ_{l,i} = ( Σ_{j=1}^{N_{l+1}} δ_{l+1,j} w_{ij} ) f′_{l,i}          (2.5)
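A compact Python sketch of forward propagation (eq. 2.3) and the backward pass (eqs. 2.4 and 2.5). The weight change with momentum is written as Δw = η δ_j y_i + α Δw_prev, an assumed form since eq. 2.6 itself is not reproduced above; the logistic activation is also an assumption, so the numbers this sketch produces need not match a board derivation exactly.

import math

def f(z):                                   # logistic activation; f'(z) = f(z) * (1 - f(z))
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    # eq. (2.3): each unit applies f to the weighted sum of the previous layer (index 0 is the bias input 1)
    h_in = [1.0] + list(x)
    h = [f(sum(w * v for w, v in zip(row, h_in))) for row in W1]
    o_in = [1.0] + h
    o = [f(sum(w * v for w, v in zip(row, o_in))) for row in W2]
    return h, o

def backprop_step(x, t, W1, W2, dW1, dW2, eta=0.5, alpha=0.1):
    h, o = forward(x, W1, W2)
    # eq. (2.4): error signal at the output layer
    d_out = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, o)]
    # eq. (2.5): error signal propagated back to the hidden layer (skip the bias column 0 of W2)
    d_hid = [sum(d_out[k] * W2[k][i + 1] for k in range(len(d_out))) * h[i] * (1.0 - h[i])
             for i in range(len(h))]
    # assumed eq. (2.6): delta_w = eta * delta_j * y_i + alpha * previous delta_w
    for j in range(len(W2)):
        for i, yi in enumerate([1.0] + h):
            dW2[j][i] = eta * d_out[j] * yi + alpha * dW2[j][i]
            W2[j][i] += dW2[j][i]
    for j in range(len(W1)):
        for i, yi in enumerate([1.0] + list(x)):
            dW1[j][i] = eta * d_hid[j] * yi + alpha * dW1[j][i]
            W1[j][i] += dW1[j][i]

# the 2-2-2 network of Example 2.3 below: all 12 weights start at 0.1
W1 = [[0.1] * 3 for _ in range(2)]      # weights into h1, h2 (bias + two inputs each)
W2 = [[0.1] * 3 for _ in range(2)]      # weights into y1, y2 (bias + two hidden units each)
dW1 = [[0.0] * 3 for _ in range(2)]
dW2 = [[0.0] * 3 for _ in range(2)]
for x, t in [((1, 0), (1, 0)), ((0, 1), (0, 1))]:      # one incremental update per pattern
    backprop_step(x, t, W1, W2, dW1, dW2)
print(W1, W2)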
Example 2.3 Consider a two-layer feedforward ANN with two inputs x1 and x2, hidden units h1 and h2, and output units y1 and y2. Initialize the weights with the value 0.1 each and compute their new values for the first two training iterations (one iteration on each input) using backpropagation. Assume a learning rate of 0.5 and a momentum factor of 0.1 with incremental weight updates.
[Figure: network for Example 2.3, with bias inputs x0 = 1 at both layers; weights w1-w6 lie on the input-to-hidden links and w7-w12 on the hidden-to-output links.]
x1 x2 t1 t2
1 0 1 0
0 1 0 1
Initial weights:
w1 = 0.1    w7 = 0.1
w2 = 0.1    w8 = 0.1
w3 = 0.1    w9 = 0.1
w4 = 0.1    w10 = 0.1
w5 = 0.1    w11 = 0.1
w6 = 0.1    w12 = 0.1
Updated weights:
w1 = 0.0997    w7 = 0.1554
w2 = 0.0997    w8 = 0.1304
w3 = 0.1       w9 = 0.1304
w4 = 0.0997    w10 = 0.0317
w5 = 0.0997    w11 = 0.0625
w6 = 0.1       w12 = 0.0625
Recurrent Network
A network whose nodes are connected to form a directed loop is called a recurrent network.
♣
The state of such a system at time t+1 depends on its state at time t. We can apply backpropagation to such networks by unfolding the network in time.
Each time frame adds a new layer consisting of copies of all the original nodes, hence only a limited number of frames can be supported. Applying backpropagation to the unfolded network is fairly straightforward. This is called BPTT (backpropagation through time).
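A small Python sketch of unfolding in time: a single recurrent unit h_t = tanh(w·h_{t−1} + u·x_t) is unrolled over the input sequence, and the gradient of a squared error at the final step is accumulated backwards through the copies. The scalar unit, tanh activation, and loss are illustrative choices, not from the text.

import math

def bptt(xs, target, w=0.5, u=0.3):
    # forward pass: unfold the loop, one copy of the unit per time frame
    hs = [0.0]                                  # h_0
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = 0.5 * (hs[-1] - target) ** 2

    # backward pass: ordinary backpropagation on the unfolded network
    dw, du = 0.0, 0.0
    dh = hs[-1] - target                        # dLoss/dh_T
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)            # back through tanh at frame t
        dw += da * hs[t - 1]                    # same w is shared by every frame
        du += da * xs[t - 1]
        dh = da * w                             # pass the error to the previous frame
    return loss, dw, du

print(bptt([1.0, 0.0, 1.0], target=0.5))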
RBFN
RBFNs are "locally tuned" receptive networks. There are no connection weights; rather, there are proximity weights on the links, which are used radially according to the following equation:

y = e^( −|x − w|² / σ² )          (2.7)
♣
The Euclidean distance between the inputs (x) and the weights (w) is calculated. The output signal is a Gaussian function of this Euclidean distance.
Architecture
The inputs are connected to a hidden layer containing receptive fields. The final output of the RBFN is a weighted sum of the outputs of the receptive fields (fig. 2.8 a); here the outputs of the receptive fields are multiplied by weights C. The output may be normalized by dividing the original output by the sum of the individual outputs (fig. 2.8 b). Finally, fig. 2.8 c and d show the "weighted sum" and "normalized sum" cases for a multiple-output RBFN, respectively.
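A brief Python sketch of eq. (2.7) and the two single-output modes described above (weighted sum and normalized sum); the centre positions, widths σ, and output weights C are illustrative values.

import math

def rbf(x, centre, sigma):
    # eq. (2.7): Gaussian of the Euclidean distance between input x and the receptive-field centre
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-d2 / sigma ** 2)

def rbfn(x, centres, sigmas, C, normalized=False):
    r = [rbf(x, c, s) for c, s in zip(centres, sigmas)]    # receptive-field outputs
    y = sum(ci * ri for ci, ri in zip(C, r))               # weighted sum (fig. 2.8 a)
    return y / sum(r) if normalized else y                 # normalized sum (fig. 2.8 b)

# illustrative single-output RBFN with two receptive fields
centres = [(0.0, 0.0), (1.0, 1.0)]
sigmas = [0.5, 0.5]
C = [1.0, 2.0]
print(rbfn((0.9, 1.1), centres, sigmas, C), rbfn((0.9, 1.1), centres, sigmas, C, normalized=True))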
3.1 Introduction
A crisp set or classical set is a set with a well-defined boundary. There is a clear distinction between belonging to the set and not belonging to it. E.g., is the tank more than half filled?
Law of contradiction      A ∩ A′ = ∅
Law of excluded middle    A ∪ A′ = X
Idempotency               A ∩ A = A
                          A ∪ A = A
Involution                (A′)′ = A
Commutativity             A ∩ B = B ∩ A
                          A ∪ B = B ∪ A
Associativity             (A ∪ B) ∪ C = A ∪ (B ∪ C)
                          (A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributivity            A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
                          A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Absorption                A ∪ (A ∩ B) = A ∩ (A ∪ B) = A
De Morgan's laws          (A ∪ B)′ = A′ ∩ B′
                          (A ∩ B)′ = A′ ∪ B′
Table 3.1: Operations on crisp sets
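A quick Python check of a few of the laws in Table 3.1 on a small universe; the sets X, A and B are arbitrary choices for illustration.

# Verify some crisp-set identities from Table 3.1 with Python set operations
X = set(range(10))              # universe
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
Ac, Bc = X - A, X - B           # complements

assert A & Ac == set()          # law of contradiction
assert A | Ac == X              # law of excluded middle
assert X - (A | B) == Ac & Bc   # De Morgan's law
assert X - (A & B) == Ac | Bc   # De Morgan's law
assert A | (A & B) == A and A & (A | B) == A   # absorption
print("all identities hold")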
A fuzzy set is a set without a crisp boundary. The transition from "belongs to the set" to "does not belong to the set" is gradual. E.g., is the tank almost full? Fuzziness does not arise from randomness, but from the imprecise nature of abstract concepts.
Fuzzy set
A fuzzy set A in X is defined as a set of ordered pairs:
A = { (x, µA(x)) | x ∈ X }
where µA(x) is the membership function of A.
Examples
1. A = {(11, 0.2), (12, 0.5), (13, 0.7), (14, 1), (15, 1), (16, 0.8), (17, 0.6)}
2. Ideal rainfall
   B = { (x, 1 / (1 + (log(x/4.5))^4)) | x ∈ (0, ∞) }
Figure 3.1
Triangular         µ(x) = 0 for x ≤ a;  (x − a)/(b − a) for a ≤ x ≤ b;  (c − x)/(c − b) for b ≤ x ≤ c;  0 for x ≥ c
Trapezoidal        µ(x) = 0 for x ≤ a;  (x − a)/(b − a) for a ≤ x ≤ b;  1 for b ≤ x ≤ c;  (d − x)/(d − c) for c ≤ x ≤ d;  0 for x ≥ d
Gaussian           µ(x) = e^( −(1/2)((x − c)/σ)² )
Generalized bell   µ(x) = 1 / (1 + |(x − c)/a|^(2b))
Sigmoidal          µ(x) = 1 / (1 + e^(−a(x − c)))
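The standard membership functions above, written as a minimal Python sketch; the parameter names follow the table (a, b, c, d for the break points, σ for the Gaussian width), and the sample arguments are illustrative.

import math

def triangular(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoidal(x, a, b, c, d):
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, c, sigma):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def generalized_bell(x, a, b, c):
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sigmoidal(x, a, c):
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

# e.g. a triangular set peaking at 5 on the interval (2, 8), and a Gaussian centred at 5
print(triangular(4.0, 2, 5, 8), gaussian(4.0, 5, 1.5))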
3.3 Terminology
support: all points where membership is not 0; support(A) = {x | µA(x) > 0}
core: all points where membership is full; core(A) = {x | µA(x) = 1}
normality: a normal fuzzy set has a non-empty core; core(A) ≠ ∅
crossover point: points where membership is 0.5; crossover(A) = {x | µA(x) = 0.5}
fuzzy singleton: a fuzzy set whose support is a single point with membership value 1
α-cut or α-level set: a crisp set with cutoff α; Aα = {x | µA(x) ≥ α}
strong α-cut: a crisp set with strict cutoff α; A′α = {x | µA(x) > α}
convex: every α-cut is convex; µA(m) ≥ min(µA(a), µA(b)) for all a ≤ m ≤ b
bandwidth: the distance between the crossover points of a normal, convex fuzzy set
symmetry: the set is symmetric around x = c if µA(c − x) = µA(c + x) ∀ x ∈ X
open left: lim_{x→−∞} µA(x) = 1 and lim_{x→+∞} µA(x) = 0
open right: lim_{x→−∞} µA(x) = 0 and lim_{x→+∞} µA(x) = 1
closed: lim_{x→−∞} µA(x) = lim_{x→+∞} µA(x) = 0
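A short Python sketch of a few of these definitions, applied to the discrete fuzzy set A from the examples above; the dictionary representation of a fuzzy set is an implementation choice.

# Discrete fuzzy set as a dict {element: membership}
A = {11: 0.2, 12: 0.5, 13: 0.7, 14: 1.0, 15: 1.0, 16: 0.8, 17: 0.6}

def support(A):
    return {x for x, mu in A.items() if mu > 0}

def core(A):
    return {x for x, mu in A.items() if mu == 1}

def alpha_cut(A, alpha, strong=False):
    return {x for x, mu in A.items() if (mu > alpha if strong else mu >= alpha)}

def crossover(A):
    return {x for x, mu in A.items() if mu == 0.5}

print(support(A))            # all of 11..17
print(core(A))               # {14, 15}
print(alpha_cut(A, 0.6))     # {13, 14, 15, 16, 17}
print(crossover(A))          # {12}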
Figure 3.3
Figure 3.4