Module5-Artificial Neural Networks-11Mar2024
Sensory neurons get information from different parts of the body and
bring it into the CNS.
Motor neurons receive information from other neurons and transmit
commands to the body parts.
The CNS consists of only interneurons which connect one neuron to
another neuron by receiving information from one neuron and
transmitting it to another neuron.
The basic functionality of a neuron is to receive information, process it
and then transmit it to another neuron or to a body part.
Artificial Neural Networks
Biological Neurons:
A typical biological neuron has four parts called dendrites, soma, axon and
synapse.
The body of the neuron is called the soma.
Dendrites accept the input information and process it in the cell body called soma.
A single neuron connects through its axon to around 10,000 other neurons, and
through these connections the processed information is passed from one neuron
to another.
A neuron fires if the input information crosses a threshold value and transmits
signals to another neuron through a synapse.
A synapse fires with electrical impulses called spikes, which are transmitted
to another neuron.
A single neuron can receive synaptic inputs from one neuron or multiple neurons.
These neurons form a network structure which processes input information and
generates a response.
The structure of a biological neuron is shown below.
Artificial Neurons:
Artificial neurons, also called nodes, are modelled on biological neurons.
A node or a neuron can receive one or more input information and process it.
Artificial neurons or nodes are connected by connection links to one another.
Each connection link is associated with a synaptic weight.
The structure of a single Artificial neuron is shown below.
Simple Model of Artificial Neuron:
The first mathematical model of a biological neuron was designed by McCulloch &
Pitts in 1943.
It includes two steps:
1. It receives weighted inputs from other neurons.
2. It operates with a threshold function or activation function.
The received inputs are combined into a weighted sum, which is given to the
activation function; if the sum exceeds the threshold value, the neuron fires.
The mathematical model of a neuron is shown below.
The neuron is the basic processing unit that receives a set of inputs x1, x2,…,xn
and their associated weights w1, w2,…,wn.
The Summation function ‘Net-sum’ given by Eq. (10.1) computes the weighted
sum of the inputs received by the neuron.
Net-sum = Σi xi wi          (10.1)
The activation function is a binary step function which outputs a value 1 if the
Net-sum is above the threshold value θ, and a 0 if the Net-sum is below the
threshold value θ.
So, the activation function is applied to Net-sum as shown in Eq. (10.2).
f(x) = Activation function(Net-sum) (10.2)
Then, the output of a neuron is given by,
Y = 1 if f(x) ≥ θ
    0 if f(x) < θ          (10.3)
The McCulloch & Pitts neuron model can represent only a few Boolean functions.
A Boolean function has binary inputs and generates a binary output.
For example, a neuron for AND Boolean function would fire when all the inputs
are 1, whereas a neuron for OR Boolean function would fire even if one input is 1.
The weight and threshold values are fixed in this mathematical model.
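As a rough illustration (not part of the original text), the sketch below implements such a fixed-weight, fixed-threshold McCulloch-Pitts neuron in Python and uses it to realize the AND and OR functions; the particular weight and threshold values chosen here are illustrative assumptions.

# A minimal sketch of a McCulloch-Pitts neuron: fixed weights, fixed threshold,
# binary step activation.

def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs reaches the threshold."""
    net_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net_sum >= threshold else 0

# AND fires only when both inputs are 1 (weights 1, 1 and threshold 2).
# OR fires when at least one input is 1 (weights 1, 1 and threshold 1).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", mp_neuron([x1, x2], [1, 1], threshold=2),
              "OR:", mp_neuron([x1, x2], [1, 1], threshold=1))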
Structure of an Artificial Neural Network:
ANN imitates a human brain, which exhibits some intelligence.
The network structure of ANN can be represented by a directed graph with a set
of neuron nodes and connection links or edges connecting these nodes as shown
in the Figure 10.4.
The nodes in the graph are arrayed in a layered manner and can process the
information in parallel.
The network shown in the above figure has three layers called input layer,
hidden layer and output layer.
The input layer receives the input information (x1, x2,…,xn) and passes it to
the nodes in the hidden layer.
The edges connecting the nodes from the input layer to the hidden layer
are associated with synaptic weights called as connection weights.
These computing nodes or neurons perform some computations based on
the input information (x1, x2,…,xn) received and if the weighted sum of the
inputs to a neuron is above the threshold value or the activation level of the
neuron, then the neuron fires.
Each neuron employs an activation function that determines the output of
the neuron.
The neuron transforms the input signals linearly by computing the sum of
the product of input signals and weights and adds biases to it.
Then, the activation function maps the weighted sum of the inputs to a non-
linear output value.
The node in the output layer produces the output as a single value.
Activation Functions:
Activation functions are mathematical functions associated with each
neuron in the neural network that map input signals to output signals.
These functions decide whether to fire a neuron or not based on the input
signals the neuron receives.
These functions normalize the output value of each neuron between 0 and 1
or between -1 and +1.
The Activation functions can be either linear or non-linear.
Linear functions are useful when the input values can be classified into any
one of the two groups and are generally used in binary perceptrons.
Non-linear functions are continuous functions that map the input in the
range of (0, 1) or (-1, 1), etc.
These functions are useful in learning high-dimensional data or complex
data such as audio, video and images.
Activation functions used in ANNs:
1. Identity Function or Linear Function:
f(x) = x          (10.4)
The value of f(x) increases linearly or proportionally with the value of x.
This function is useful when we do not want to apply any threshold.
The output would be just the weighted sum of input values.
The output value ranges between −∞ and +∞.
2. Binary Step Function:
f(x) = 1 if x ≥ θ
       0 if x < θ          (10.5)
The output value is binary, i.e., 0 or 1, based on the threshold value θ.
• If the input value x is greater than or equal to θ, it outputs 1, or else it outputs 0.
3. Bipolar Step Function:
f(x) = +1 if x ≥ θ
       −1 if x < θ          (10.6)
The output value is bipolar, i.e., +1 or −1, based on the threshold value θ.
If the input value x is greater than or equal to θ, it outputs +1, or else it outputs −1.
4. Sigmoidal Function or Logistic Function:
σ(x) = 1 / (1 + e^(−x))          (10.7)
It is a widely used non-linear activation function which produces an S-shaped
curve, and the output values are in the range of 0 and 1.
It has a vanishing gradient problem, i.e., the gradient becomes almost zero for
very low and very high input values, so the prediction barely changes there.
5. Bipolar Sigmoid Function:
σ(x) = (1 − e^(−x)) / (1 + e^(−x))          (10.8)
It outputs values between −1 and +1.
6. Ramp Function:
f(x) = 1 if x > 1
       x if 0 ≤ x ≤ 1
       0 if x < 0          (10.9)
8. ReLu - Rectified Linear Unit Function:
This function is used in deep learning neural network models in the hidden layers.
It avoids or reduces the vanishing gradient problem.
This function outputs a value of 0 for negative input values and works like a linear
function if the input values are positive.
r(x) = max(0, x) = x if x ≥ 0
                   0 if x < 0          (10.11)
9. Softmax Function:
This is a non-linear function used in the output layer that can handle multiple classes.
It calculates the probability of each target class which ranges between 0 and 1.
The probability of the input belonging to a particular class is computed by dividing the
exponential of the given input value by the sum of the exponential values of all the
inputs.
s(xi) = e^(xi) / Σj e^(xj), where i = 0, …, k          (10.12)
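To make the definitions above concrete, the following Python sketch implements the listed activation functions; the threshold parameter theta used for the step functions is an assumption standing in for the text's θ.

import math

def identity(x):              return x                                          # Eq. (10.4)
def binary_step(x, theta=0):  return 1 if x >= theta else 0                     # Eq. (10.5)
def bipolar_step(x, theta=0): return 1 if x >= theta else -1                    # Eq. (10.6)
def sigmoid(x):               return 1 / (1 + math.exp(-x))                     # Eq. (10.7)
def bipolar_sigmoid(x):       return (1 - math.exp(-x)) / (1 + math.exp(-x))    # Eq. (10.8)
def ramp(x):                  return max(0.0, min(1.0, x))                      # Eq. (10.9)
def relu(x):                  return max(0.0, x)                                # Eq. (10.11)

def softmax(xs):
    # Eq. (10.12): exponential of each input divided by the sum of exponentials.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.5), relu(-2.0), softmax([1.0, 2.0, 3.0]))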
Perceptron And Learning Theory
Perceptron is the first neural network model designed by Frank Rosenblatt
in 1958.
It is a linear binary classifier used for supervised learning.
Frank Rosenblatt modified the neuron model designed by McCulloch & Pitts by
combining two concepts: the McCulloch-Pitts model of the artificial neuron and
the Hebbian learning rule for adjusting weights.
He introduced variable weight values and an extra input that represents
bias to this model.
He proposed that artificial neurons could learn weights and thresholds
from data, and developed a supervised learning algorithm that enabled
the artificial neurons to learn the correct weights from training data by
themselves.
The perceptron model consists of 4 steps as shown in Figure 10.5.
1. Inputs from other neurons
2. Weights and bias
3. Net sum
4. Activation function
The modified neuron model receives a set of inputs x1, x2,…,xn, their associated
weights w1, w2,…,wn, and a bias.
The summation function ‘Net-sum’ given by Eq. (10.13) computes the weighted
sum of the inputs received by the neuron.
Net-sum = Σi xi wi          (10.13)
After computing the ‘Net-sum’, the bias value is added to it and inserted in the
activation function as shown below:
f(x) = Activation function (Net-sum + bias) (10.14)
The activation function is a binary step function which outputs a value 1 if f(x) is
above the threshold value θ, and a 0 if f(x) is below the threshold value θ.
Then, the output of a neuron is given by:
Y = 1 if f(x) ≥ θ
    0 if f(x) < θ          (10.15)
Example 10.1:
Consider a perceptron to represent the Boolean function AND with the initial weights
w1 = 0.3, w2 = -0.2, learning rate = 0.2 and bias = 0.4 as shown in Figure 10.6. The
activation function used here is the Step function f(x) which gives the output value as
binary, i.e., 0 or 1. If value of f(x) is greater than or equal to 0, it outputs 1 or else it
outputs 0. Design a perceptron that performs the Boolean function AND and update the
weights until the Boolean function gives the desired output.
Solution:
Desired output for Boolean function AND is shown in Table 10.1.
For each Epoch, the weighted sum is calculated, and the activation function is
applied to compute the estimated output Yest.
Then, compare Yest with Ydes to find the error.
If there is an error, then update the weights.
Tables 10.2 to 10.5 show how the weights are updated in the four Epochs.
We can observe that within 4 Epochs the perceptron learns; the weights are
updated to 0.3 and 0.2, with which the perceptron gives the desired output of
the Boolean AND function.
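The sketch below retraces the updates of Example 10.1 in Python. One assumption is made explicit: the bias value 0.4 is treated as the firing threshold θ against which the weighted sum is compared, which is the reading consistent with the final weights 0.3 and 0.2 producing the AND output.

# A sketch of perceptron learning for the AND function with the settings of
# Example 10.1: w1 = 0.3, w2 = -0.2, learning rate = 0.2, threshold = 0.4.

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # AND truth table
w = [0.3, -0.2]      # initial weights w1, w2
theta = 0.4          # firing threshold (the example's bias value)
alpha = 0.2          # learning rate

for epoch in range(1, 11):
    errors = 0
    for (x1, x2), y_des in data:
        net = x1 * w[0] + x2 * w[1]
        y_est = 1 if net >= theta else 0          # step activation
        error = y_des - y_est
        if error != 0:
            errors += 1
            w[0] += alpha * error * x1            # perceptron weight-update rule
            w[1] += alpha * error * x2
    print(f"Epoch {epoch}: w1 = {w[0]:.1f}, w2 = {w[1]:.1f}, errors = {errors}")
    if errors == 0:                               # converged: all outputs correct
        break
# Under these assumptions the run converges in four epochs to w1 = 0.3, w2 = 0.2.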
XOR Problem:
A perceptron model can solve all Boolean functions which are linearly
separable.
However, the XOR problem was identified in 1969 by Minsky and Papert.
An XOR function returns a 1, if the two inputs are not equal and a 0 if they
are equal.
The truth table of the XOR function is shown below.
Table 10.6: XOR Truth Table
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0
Since the XOR problem is not linearly separable, a single-layer perceptron
fails to classify it.
But this problem can be solved by a Multi-Layer Perceptron.
Initially, the Multi-Layer Perceptron (MLP) was not successful due to the
lack of an appropriate learning algorithm.
Werbos introduced a back propagation algorithm for the three-layered
perceptron network in 1974, and a general back propagation algorithm for a
multi-layered perceptron was introduced by Rumelhart and McClelland in 1986.
ANNs and Deep Neural Networks then became successful and are used to solve
many complex problems in the current era.
Thus, an MLP can solve any non-linearly separable problem.
Delta Learning Rule and Gradient Descent:
Learning in neural networks is performed by adjusting the network weights to
minimize the difference between the desired and estimated outputs.
This delta difference is measured as an error function or cost function.
This cost function is continuous and differentiable.
This way of learning, called the delta rule (also known as the Widrow-Hoff rule
or Adaline rule), is a type of back propagation applied to train the network.
The training error of a hypothesis is half the sum, over all training instances,
of the squared difference between the desired target output and the estimated
actual output, and is given by:
Training Error = ½ Σd∈T (ODesired − OEstimated)²          (10.16)
Where, T is the training dataset, ODesired and OEstimated are the desired target
output and estimated actual output respectively for a training instance d.
Gradient descent is an optimization approach used to minimize the cost function
by converging to a local minimum point, moving in the negative direction of the
gradient. Each step size during this movement is determined by the learning
rate and the slope of the gradient.
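A minimal sketch of delta-rule learning is given below: gradient descent on the squared-error cost of Eq. (10.16) for a single linear unit. The dataset, the initial weights and the learning rate here are illustrative assumptions.

# Delta-rule (Widrow-Hoff) learning: each weight moves in the negative gradient
# direction of the squared error for a single linear unit.

training_set = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
w = [0.1, 0.1]
alpha = 0.1          # learning rate (step size)

for epoch in range(50):
    error_sum = 0.0
    for x, o_desired in training_set:
        o_estimated = sum(xi * wi for xi, wi in zip(x, w))   # linear output
        delta = o_desired - o_estimated
        error_sum += 0.5 * delta ** 2                        # Eq. (10.16) per instance
        for i in range(len(w)):
            w[i] += alpha * delta * x[i]                     # negative gradient step
print("weights:", w, "training error:", error_sum)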
Gradient descent learning is the foundation of back propagation algorithm
used in MLP.
Before studying MLP, we will study the different types of neural networks
that differ in their structure, activation function and learning mechanism.
Types of Artificial Neural Networks
ANNs consist of multiple neurons arranged in layers.
There are different types of ANNs which differ by the network structure, activation
function involved, and the learning rules used.
In an ANN, there are three layers called input layer, hidden layer and output layer.
Any general ANN would consist of one input layer, one output layer and zero or
more hidden layers.
Feed Forward Neural Network:
This is the simplest neural network; it consists of neurons arranged in layers,
and the information is propagated only in the forward direction.
This model may or may not contain a hidden layer and there is no back
propagation.
Based on the number of hidden layers they are further classified into single-
layered and multi-layered feed forward networks.
These ANNs are simple to design and easy to maintain.
They are fast but cannot be used for complex learning.
They are used for simple classification and simple image processing, etc.
The model of a Feed Forward Neural Network is shown below.
Multi-Layer Perceptron (MLP):
In this type of ANN, the information flows in both directions.
In the forward direction, the inputs are multiplied by weights of neurons and
forwarded to the activation function of the neuron and then the output is
passed to the next layer.
If the output is incorrect, then the error is back propagated in the backward
direction to adjust the weights and biases to get the correct output.
Thus, the network learns with the training data.
This type of ANN is used in deep learning for complex classification, speech
recognition, medical diagnosis, forecasting, etc.
They are comparatively complex and slow.
The model of an MLP is shown below.
Learning in a Multi-layer Perceptron
Multi-Layer Perceptron (MLP) is a type of Feed Forward Neural Network with
multiple neurons arranged in layers.
All the neurons in a layer are fully connected to the neurons in the next layer.
The network has at least three layers with an input layer, one or more hidden
layers and an output layer.
The input layer is the visible layer, which just passes the input to the next layer.
The layers following the input layer are the hidden layers.
The hidden layers neither directly receive inputs nor send outputs to the
external environment.
The final layer is the output layer which outputs a single value or a vector of
values.
An MLP has at least one hidden layer, and as the number of hidden layers
increases, the learning becomes more complex; such networks form a deep
neural network.
It uses back propagation for supervised learning of the network.
The activation functions used in the layers can be linear or non-linear depending
on the type of the problem being modelled.
A sigmoid activation function is used if the problem is a binary classification
problem and a softmax activation function is used in a multi-class classification
problem.
The typical non-linear separable problem is the XOR problem which could be
solved with an MLP.
A single neuron can only classify the values into two categories by drawing a
single straight line.
But to classify the XOR inputs, we need two straight lines.
Hence, the problem could be solved if we add a hidden layer having two neurons
working in parallel in the same layer and combine the outputs to get a single
output.
One neuron should work as an OR gate and the other neuron should work as a
NAND gate.
The neuron in the output layer should work as an AND gate.
The MLP network learns with two phases called the forward phase and the
backward phase.
In the forward phase, an input vector from the training dataset is taken and given
to the network which outputs a value called the estimated value OEstimated.
This value is compared with the desired output value ODesired, and the error is
calculated.
The calculated error is back propagated in the backward phase to update the
weights and biases of the network.
This process is repeated for the entire training dataset.
The algorithm will be terminated after a pre-defined number of epochs or if the
training error is reduced below a threshold value.
This way of minimizing the training error by updating the weights and biases
may lead to an overfitting problem in neural networks.
There are several techniques available to overcome this overfitting problem.
One successful solution is to provide a set of validation data along with the
training data, use a cross-validation approach to estimate the number of
iterations or weight updates that produces the lowest error on the validation
set, and stop when the validation error begins to increase.
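A hedged sketch of this early-stopping idea is shown below; the four callables (one epoch of training, the validation-error measure, and the weight getter/setter) are hypothetical placeholders to be supplied by the caller, not functions defined in the text.

# Early stopping: keep training while the validation error improves, and roll
# back to the best weights once it has stopped improving for `patience` epochs.

def train_with_early_stopping(train_one_epoch, validation_error,
                              get_weights, set_weights,
                              max_epochs=1000, patience=5):
    best_error = float("inf")
    best_weights = get_weights()
    stale_epochs = 0
    for _ in range(max_epochs):
        train_one_epoch()                       # one pass of weight updates
        error = validation_error()              # error on the held-out set
        if error < best_error:
            best_error, best_weights, stale_epochs = error, get_weights(), 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:        # validation error began to rise
                break
    set_weights(best_weights)                   # keep the best model found
    return best_error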
Algorithm 10.2-Learning in an MLP:
Step 1: Forward Propagation
1. Calculate Input and Output in the Input Layer:
(Input layer is a direct transfer function, where the output of the node equals the input).
Input at Node j ‘Ij’ in the Input Layer is,
Ij = xj
Where, xj is the input received at Node j
Output at Node j ‘Oj’ in the Input Layer is,
Oj = Ij
2. Calculate Net Input and Output in the Hidden Layer and Output Layer:
Net Input at Node j in the Hidden Layer is,
Ij = Σi xi wij + x0 θj
Where, xi is the input from Node i, wij is the weight in the link from Node i to Node j
x0 is the input to bias node 0 which is always assumed as 1
θj is the weight in the link from the bias node 0 to Node j
Net Input at Node j in the Output Layer is,
Ij = Σi Oi wij + x0 θj
Where, Oi is the output from Node i
wij is the weight in the link from Node i to Node j
x0 is the input to bias node 0 which is always assumed as 1
θj is the weight in the link from the bias node 0 to Node j
Output at Node j is,
Oj = 1 / (1 + e^(−Ij))
Where, Ij is the input received at Node j
3. Estimate error at the node in the Output Layer:
Error = ODesired – OEstimated
Where, ODesired is the desired output value of the Node in the Output Layer
OEstimated is the estimated output value of the Node in the Output Layer
Step 2: Backward Propagation
1. Calculate error at each node:
For each Unit k in the Output Layer
Errork = Ok (1 - Ok) (ODesired - Ok)
Where, Ok is the output value at Node k in the Output Layer.
ODesired is the desired output value of the Node in the Output Layer.
For each unit j in the Hidden Layer
Errorj = Oj (1 − Oj) Σk Errork wjk
Where, Oj is the output value at Node j in the Hidden Layer.
Errork is the error at Node k in the Output Layer.
wjk is the weight in the link from Node j to Node k.
2. Update all weights and biases:
Update weights:
Δwij = α × Errorj × Oi
wij = wij + Δwij
Where, Oi is the output value at Node i.
Errorj is the error at Node j.
α is the learning rate.
wij is the weight in the link from Node i to Node j.
Δwij is the difference in weight that has to be added to wij.
Update Biases:
Δθj = α × Errorj
θj = θj + Δθj
Where, Errorj is the error at Node j
α is the learning rate
θj is the bias value from Bias Node 0 to Node j
Δθj is the difference in bias that has to be added to θj.
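The following Python sketch puts Algorithm 10.2 together for a small network with one hidden layer and a single sigmoid output neuron; the network size, the random initial weights and the learning rate are illustrative assumptions, not values from the text.

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, o_desired, w_ih, b_h, w_ho, b_o, alpha):
    # Forward propagation: hidden-layer and output-layer activations.
    hidden = [sigmoid(sum(xi * w_ih[i][j] for i, xi in enumerate(x)) + b_h[j])
              for j in range(len(b_h))]
    output = sigmoid(sum(hj * w_ho[j] for j, hj in enumerate(hidden)) + b_o)

    # Backward propagation: error terms for the output and hidden layers.
    err_out = output * (1 - output) * (o_desired - output)
    err_hid = [hidden[j] * (1 - hidden[j]) * err_out * w_ho[j]
               for j in range(len(hidden))]

    # Update all weights and biases.
    for j in range(len(hidden)):
        w_ho[j] += alpha * err_out * hidden[j]
        b_h[j] += alpha * err_hid[j]
        for i in range(len(x)):
            w_ih[i][j] += alpha * err_hid[j] * x[i]
    b_o += alpha * err_out
    return output, b_o

random.seed(0)
x, target, alpha = [1, 0, 1, 0], 1.0, 0.8
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]  # 4 inputs, 2 hidden
b_h = [0.1, -0.1]
w_ho = [0.2, -0.3]
b_o = 0.05
out, b_o = train_step(x, target, w_ih, b_h, w_ho, b_o, alpha)
print("estimated output after one iteration:", out)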
Example 10.2:
Consider learning in a Multi-Layer Perceptron. The given MLP consists of an Input
layer, one Hidden layer and an Output layer. The input layer has 4 neurons, the hidden
layer has 2 neurons, and the output layer has a single neuron. Train the MLP by updating
the weights and biases in the network.
Solution:
From the Figure 10.11, the weights and biases are tabulated in Table 10.7.
2. Calculate Net Input and Output in the Hidden Layer and Output Layer as
shown in Table 10.9.
Step 2: Backward Propagation
1. Calculate Error at each node as shown in Table 10.10.
For each unit k in the output layer, calculate:
Errork = Ok (1 - Ok) (Odesired - Ok)
For each unit j in the hidden layer, calculate:
Errorj = Oj (1 − Oj) Σk Errork wjk
2. Update weight using the formula:
Learning rate α = 0.8.
Δwij = α × Errorj × Oi
wij = wij + Δwij
The updated weights and bias are shown in Tables 10.11 and 10.12
respectively.
Update bias using the formula:
Δθj = α × Errorj
θj = θj + Δθj
Iteration 2:
Now, with the updated weights and biases:
1. Calculate Input and Output in the Input Layer as shown below.
Calculate Net Input and Output in the Hidden Layer and Output
Layer as shown below.
Radial Basis Function Neural Network
Typical Radial Basis Functions (RBF) are:
The Gaussian RBF which monotonically decreases with distance from the
centre which is given by,
H(x) = e^(−(x−c)²/r²)          (10.17)
Where, c is the centre and r is the radius.
A Multiquadric RBF which monotonically increases with distance from the
centre which is given by,
H(x) = √(r² + (x−c)²) / r          (10.18)
RBFNN architecture includes:
1. An input layer that feeds the input vector of n-dimension to the network
(x1, x2, ...., xn).
2. A hidden layer that comprises ‘m’ non-linear radial basis function neurons
where m ≥ n.
The hidden layer implements the Radial Basis Function called Gaussian function.
The output of a hidden layer neuron for an input vector x is given as in Eq. 10.17.
i.e., H(x) = e^(−(x−c)²/r²)
Where, x is the input vector, c is the centre and r is the radius.
Each RBF neuron in the hidden layer compares the input vector with its centre
using a bell-shaped (Gaussian) curve and outputs a similarity value between 0 and 1.
If the input is equal to the neuron centre, then the output is 1 but as the difference
increases, the activation value or the output of the neuron falls off exponentially
towards 0.
3. An output layer that computes the linear weighted sum of the output of each
neuron from the hidden layer neurons which is given by,
F(x) = Σi wi Hi(x)          (10.19)
Where, wi is the weight in the link from the Hidden Layer neuron i to the
Output Layer.
Hi(x) is the output of a Hidden Layer neuron i for an input vector x.
The architecture of a Radial Basis Function Neural Network is shown below.
Training or learning with RBFNN is very fast and the neural network is very
good at interpolation.
Algorithm 10.3-Radial Basis Function Neural Network:
Input: Input vector (x1, x2, ...., xn)
Output: Yn
Assign random weights for every connection from the Hidden layer to the Output
layer in the network in the range [-1, +1].
Forward Phase:
Step 1: Calculate Input and Output in the Input Layer:
(Input layer is a direct transfer function, where the output of the node equals the input).
Input at Node i ‘Ii’ in the Input Layer is
Ii = xi
where, xi is the input received at Node i.
Output at Node i ‘Oi’ in the Input Layer is
Oi = Ii
Step 2:
For each node j in the Hidden Layer, find the centre/receptor c and the variance r.
Define hidden layer neurons with Gaussian RBF whose output is:
Hj(x) = e^(−(x−cj)²/r²)
where, x is the input, cj is the centre and r is the radius.
Compute (x - cj)2 by applying Euclidean distance measure between x and cj.
Step 3:
For each node k in the Output Layer, compute linear weighted sum of the output
of each neuron k from the hidden layer neurons j.
Fk(x) = Σj wjk Hj(x)
where, wjk is the weight in the link from the Hidden Layer neuron j to the Output
Layer neuron k.
Hj(x) is the output of a Hidden Layer neuron j for an input vector x.
Backward Phase:
Step 1: Train the Hidden layer using Back propagation.
Step 2: Update the weights between the Hidden layer and Output layer.
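A minimal sketch of the forward phase of Algorithm 10.3 is given below; the centres, the radius and the randomly chosen output weights are illustrative assumptions.

import math, random

def gaussian_rbf(x, centre, r):
    """H(x) = exp(-(x - c)^2 / r^2), with (x - c)^2 the squared Euclidean distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (r ** 2))

def rbfnn_output(x, centres, r, weights):
    """F(x) = sum_j w_j * H_j(x), the linear weighted sum at a single output neuron."""
    return sum(w * gaussian_rbf(x, c, r) for w, c in zip(weights, centres))

random.seed(1)
centres = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]   # receptors c_j
weights = [random.uniform(-1, 1) for _ in centres]           # random weights in [-1, +1]
print(rbfnn_output((0.0, 1.0), centres, r=1.0, weights=weights))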
Example 10.3:
Consider the XOR Boolean function that has 4 patterns (0, 0) (0, 1) (1, 0) and (1, 1) in a
2-dimensional input space. Construct a RBFNN as shown in Figure 10.13 that classifies
the input pattern:
(0, 0) → 0
(0, 1) → 1
(1, 0) → 1
(1, 1) → 0
Solution:
Define 4 hidden layer neurons with Gaussian RBFs, one centred at each input pattern:
H1(x) = e^(−(x−c1)²/r²), c1 = (0, 0)
H2(x) = e^(−(x−c2)²/r²), c2 = (0, 1)
H3(x) = e^(−(x−c3)²/r²), c3 = (1, 0)
H4(x) = e^(−(x−c4)²/r²), c4 = (1, 1)
The following Table shows the obtained values during the calculations done
in the forward phase.
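Assuming a radius r = 1, the short sketch below recomputes the hidden-layer outputs Hj(x) of Example 10.3 for the four XOR patterns, i.e., the kind of values that forward-phase table contains.

import math

centres = [(0, 0), (0, 1), (1, 0), (1, 1)]
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]

for x in patterns:
    # H_j(x) = exp(-(x - c_j)^2) with squared Euclidean distance and r = 1.
    row = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))) for c in centres]
    print(x, [round(h, 3) for h in row])
# For example, the pattern (0, 0) gives [1.0, 0.368, 0.368, 0.135]:
# its similarity to each of the four centres.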
Self-Organizing Feature Map
Self-Organizing Feature Map (SOFM) is a special type of Feed Forward
ANN developed by Dr Teuvo Kohonen in 1982.
The Kohonen network is a competitive learning network, which is also called an
adaptive learning network.
SOFM is an unsupervised learning model that clusters data by mapping
high-dimensional data onto a two-dimensional map of neurons, or plane.
This model learns to cluster, or self-organize, high-dimensional data without
knowing the class membership of the input data.
These self-organizing nodes are also called as feature maps.
The mapping is based on the relative distance or similarity between the
points and the points that are near to each other in the input space are
mapped to nearby output map units in the SOFM.
Network Architecture and Operations:
The network architecture consists of only two layers called the Input layer
and the Output layer, and there are no Hidden layers.
The number of units in the Input layer is based on the length of the input
samples which is a vector of length ‘n’.
Each connection from the Input units in the Input layer to the output units
in the Output layer is assigned with random weights.
There is one weight vector of length ‘n’ associated with each output unit.
Output units have intra-layer connections; no weights are assigned to these
connections, but they are used when updating the weights.
The network architecture of SOFM is shown in Figure 10.14.
The SOFM network operates in two phases, the training phase and the
mapping phase.
During the training phase, the input layer is fed with input samples
randomly from the training data.
The units or neurons in the output layer are initially assigned with some
weights.
As the input sample is fed, each output unit computes a similarity score using
the Euclidean distance measure, and the units compete with each other.
The output unit closest to the input sample is chosen as the winning unit, and
its connection weights are adjusted by a learning factor.
Thus, the best-matching output unit, whose weights are adjusted, moves closer
to the input sample, and a topological feature map is formed.
This process is repeated until the map does not change.
During the mapping phase, the test samples are just classified.
Example 10.4:
Consider the example shown in Figure 10.15, with four training samples, each a
vector of length 4, and two output units. Train the SOFM network by determining
the class memberships of the input data.
Training Samples:
x1: (1, 0, 1, 0)
x2 : (1, 0, 0, 0)
x3 : (1, 1, 1, 1)
x4 : (0, 1, 1, 0)
Output Units: Unit 1, Unit 2, Learning rate α(t) = 0.6
Initial Weight matrix:
Unit 1: [0.3 0.5 0.7 0.2]
Unit 2: [0.6 0.7 0.4 0.3]
Solution:
Iteration 1:
Training Sample x1: (1, 0, 1, 0)
Weight Matrix:
Unit 1: [0.3 0.5 0.7 0.2]
Unit 2: [0.6 0.7 0.4 0.3]
Compute Euclidean distance between x1: (1, 0, 1, 0) and Unit 1 weights.
d2 = (0.3 - 1)2 + (0.5 - 0)2 + (0.7 - 1)2 + (0.2 - 0)2 = 0.87
Compute Euclidean distance between x1 : (1, 0, 1, 0) and Unit 2 weights.
d2 = (0.6 - 1)2 + (0.7 - 0)2 + (0.4 - 1)2 + (0.3 - 0)2 = 1.1
Unit 1 wins.
Update the weights of the winning unit.
New Unit 1 weights = [0.3 0.5 0.7 0.2] + 0.6 ([1 0 1 0] - [0.3 0.5 0.7 0.2])
= [0.3 0.5 0.7 0.2] + 0.6 [0.7 -0.5 0.3 -0.2]
= [0.3 0.5 0.7 0.2] + [0.42 -0.30 0.18 -0.12]
= [0.72 0.2 0.88 0.08]
Iteration 2:
Training Sample x2: (1, 0, 0, 0)
Weight Matrix:
Unit 1: [0.72 0.2 0.88 0.08]
Unit 2: [0.6 0.7 0.4 0.3]
Compute Euclidean distance between x2 : (1, 0, 0, 0) and Unit 1 weights.
d2 = (0.72 - 1)2 + (0.2 - 0)2 + (0.88 - 0)2 + (0.08 - 0)2 = 0.899
Compute Euclidean distance between x2 : (1, 0, 0, 0) and Unit 2 weights.
d2 = (0.6 - 1)2 + (0.7 - 0)2 + (0.4 - 0)2 + (0.3 - 0)2 = 0.9
Unit 1 wins.
Update the weights of the winning unit.
New Unit 1 weights = [0.72 0.2 0.88 0.08] + 0.6 ([1 0 0 0] - [0.72 0.2 0.88 0.08])
= [0.72 0.2 0.88 0.08] + 0.6 [0.28 -0.2 -0.88 -0.08]
= [0.72 0.2 0.88 0.08] + [0.17 -0.12 -0.53 -0.05] = [0.89 0.08 0.35 0.03]
Iteration 3:
Training Sample x3: (1, 1, 1, 1)
Weight Matrix:
Unit 1: [0.89 0.08 0.35 0.03]
Unit 2: [0.6 0.7 0.4 0.3]
Compute Euclidean distance between x3 : (1, 1, 1, 1) and Unit 1 weights.
d2 = (0.89 - 1)2 + (0.08 - 1)2 + (0.35 - 1)2 + (0.03 - 1)2 = 2.22
Compute Euclidean distance between x3 : (1, 1, 1, 1) and Unit 2 weights.
d2 = (0.6 - 1)2 + (0.7 - 1)2 + (0.4 - 1)2 + (0.3 - 1)2 = 1.1
Unit 2 wins.
Update the weights of the winning unit.
New Unit 2 weights = [0.6 0.7 0.4 0.3] + 0.6 ([1 1 1 1] - [0.6 0.7 0.4 0.3])
= [0.6 0.7 0.4 0.3] + 0.6 [0.4 0.3 0.6 0.7]
= [0.6 0.7 0.4 0.3] + [0.24 0.18 0.36 0.42] = [0.84 0.88 0.76 0.72]
Iteration 4:
Training Sample x4: (0, 1, 1, 0)
Weight Matrix:
Unit 1: [0.89 0.08 0.35 0.03]
Unit 2: [0.84 0.88 0.76 0.72]
Compute Euclidean distance between x4 : (0, 1, 1, 0) and Unit 1 weights.
d2 = (0.89 - 0)2 + (0.08 - 1)2 + (0.35 - 1)2 + (0.03 - 0)2 = 2.06
Compute Euclidean distance between x4 : (0, 1, 1, 0) and Unit 2 weights.
d2 = (0.84 - 0)2 + (0.88 - 1)2 + (0.76 - 1)2 + (0.72 - 0)2 = 1.3
Unit 2 wins.
Update the weights of the winning unit.
New Unit 2 weights = [0.84 0.88 0.76 0.72] + 0.6 ([0 1 1 0] - [0.84 0.88 0.76 0.72])
= [0.84 0.88 0.76 0.72] + 0.6 [-0.84 0.12 0.24 -0.72]
= [0.84 0.88 0.76 0.72] + [-0.5 0.07 0.14 -0.43] = [0.34 0.95 0.9 0.29]
The best mapping units for each of the samples are:
x1 : (1, 0, 1, 0) Unit 1
x2 : (1, 0, 0, 0) Unit 1
x3 : (1, 1, 1, 1) Unit 2
x4 : (0, 1, 1, 0) Unit 2
This process is continued for many epochs until the feature map does not
change.
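The competitive-learning steps of Example 10.4 can be retraced with the short Python sketch below; it reproduces the winner selection and weight updates of the four iterations (squared Euclidean distances are compared, which does not change which unit wins).

# SOFM training sketch: the output unit closest to the sample wins and its
# weight vector is moved towards the sample by the learning rate.

samples = [(1, 0, 1, 0), (1, 0, 0, 0), (1, 1, 1, 1), (0, 1, 1, 0)]
weights = [[0.3, 0.5, 0.7, 0.2],   # Unit 1
           [0.6, 0.7, 0.4, 0.3]]   # Unit 2
alpha = 0.6                        # learning rate

for t, x in enumerate(samples, start=1):
    # Squared Euclidean distance from the sample to each unit's weight vector.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    winner = dists.index(min(dists))
    # Move the winning unit's weights towards the sample.
    weights[winner] = [wi + alpha * (xi - wi) for wi, xi in zip(weights[winner], x)]
    print(f"Iteration {t}: x = {x}, winner = Unit {winner + 1}, "
          f"distances = {[round(d, 2) for d in dists]}")

print("Final weights:", [[round(w, 2) for w in unit] for unit in weights])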
Applications of Artificial Neural Networks
ANN learning mechanisms are used in many complex applications that involve
modelling of non-linear processes.
ANN models can be used to handle even noisy and incomplete data.
They are used to model complex patterns, recognize patterns and solve prediction
problems like humans in many areas such as:
1. Real-time applications: Face recognition, emotion detection, self-driving cars,
navigation systems, routing systems, target tracking, vehicle scheduling, etc.
2. Business applications: Stock trading, sales forecasting, customer behaviour
modelling, Market research and analysis, etc.
3. Banking and Finance: Credit and loan forecasting, fraud and risk evaluation,
currency price prediction, real-estate appraisal, etc.
4. Education: Adaptive learning software, student performance modelling, etc.
5. Healthcare: Medical diagnosis or mapping symptoms to a medical case, image
interpretation and pattern recognition, drug discovery, etc.
6. Other Engineering Applications: Robotics, aerospace, electronics, manufacturing,
communications, chemical analysis, food research, etc.
Advantages and Disadvantages of ANN
Advantages of ANN:
1. ANN can solve complex problems involving non-linear processes.
2. ANNs can learn and recognize the complex patterns and solve
problems as humans solve a problem.
3. ANNs have a parallel processing capability and can predict in
less time.
4. They have the ability to work with inadequate knowledge, as they can
handle even incomplete and noisy data.
5. They can scale well to larger data sets and outperform other
learning mechanisms.
Disadvantages of ANN:
1. An ANN requires processors with parallel processing capability to train the
network running for many epochs.
The function of each node requires CPU capacity, which is difficult to provide
for very large networks with a large amount of data.
2. They work like a black box, and it is difficult to understand their working in
inner layers.
It is hard to understand the relationship between the representations
learned at each layer.
3. Modelling with ANNs is also extremely complicated, and the development
takes a much longer time.
4. Generally, neural networks require more data than traditional machine
learning algorithms, and they do not perform well on small datasets.
5. They are also more computationally expensive than traditional learning
techniques.
Challenges of Artificial Neural Networks