
Chapter 2: Adaline

Dr. Gaur Sanjay B.C.


Visible and Hidden Neurons

• Visible neurons: they form the interface between the network and the
environment in which it operates. They can also be clamped onto specific
states determined by the environment.
• Hidden neurons: they operate independently of the environment. In the
free-running condition, all the neurons are allowed to operate freely.
Hebb Net
• The first learning law for artificial neural networks was designed by
Donald Hebb in 1949.
• For the Hebb net, the input and output data should be in bipolar form.
A limitation of the Hebb net is that it cannot learn from binary data.
• The Hebb net includes a bias, which acts as a weight on a connection
from a unit whose activation is always 1.

Figure: Architecture of a single-layer net
Algorithm
Step 1: Initially, all weights and the bias are set to zero:
        wi = 0 and b = 0.
Step 2: For each input-target pair, perform Steps 3-6.
Step 3: Set activations for the input units to the input vector: xi = si.
Step 4: Set the activation of the output unit to the target: y = t.
Step 5: Adjust the weights by applying the Hebb rule:
        wi(new) = wi(old) + xi·y   (Δwi = xi·y)
Step 6: Adjust the bias by applying the Hebb rule:
        b(new) = b(old) + y   (Δb = y)
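As an illustration (not part of the original slides), here is a minimal Python sketch of these steps for a bipolar training set; the AND-gate pairs of the worked example below are assumed as the input data.

```python
# Hebb rule for a single neuron with bias (bipolar inputs and targets).
def hebb_train(samples):
    """samples: list of ((x1, x2), target) pairs in bipolar form."""
    w1 = w2 = b = 0                      # Step 1: weights and bias start at zero
    for (x1, x2), t in samples:          # Step 2: loop over input/target pairs
        y = t                            # Step 4: output unit is set to the target
        w1 += x1 * y                     # Step 5: wi(new) = wi(old) + xi*y
        w2 += x2 * y
        b += y                           # Step 6: b(new) = b(old) + y
        print(f"x=({x1},{x2}) t={t}  ->  w1={w1}, w2={w2}, b={b}")
    return w1, w2, b

# Bipolar AND-gate training pairs, as in the worked example that follows.
and_gate = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
hebb_train(and_gate)                     # final weights: w1=2, w2=2, b=-2
```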
Worked example (bipolar AND gate):

Step 1: Initially
w1 = w2 = b = 0

Input (x1, x2, b) | Target y | Weight change (Δw1, Δw2, Δb) | Final weights (w1, w2, b)
Initially         |          |                              | (0, 0, 0)

Step 2: x1 = 1, x2 = 1, b = 1, y = 1 as per the table.

Weight changes:
Δw1 = x1·y = 1·1 = 1    Δw2 = x2·y = 1·1 = 1    Δb = y = 1

Updated weights:
w1(new) = w1(old) + x1·y = 0 + 1 = 1
w2(new) = w2(old) + x2·y = 0 + 1 = 1
b(new)  = b(old) + y     = 0 + 1 = 1

Input (x1, x2, b) | Target y | Weight change (Δw1, Δw2, Δb) | Final weights (w1, w2, b)
(1, 1, 1)         | 1        | (1, 1, 1)                    | (1, 1, 1)

Step 3: x1 = 1, x2 = -1, b = 1, y = -1 as per the table.

Weight changes:
Δw1 = x1·y = 1·(-1) = -1    Δw2 = x2·y = (-1)·(-1) = 1    Δb = y = -1

Updated weights:
w1(new) = w1(old) + x1·y = 1 - 1 = 0
w2(new) = w2(old) + x2·y = 1 + 1 = 2
b(new)  = b(old) + y     = 1 - 1 = 0

Input (x1, x2, b) | Target y | Weight change (Δw1, Δw2, Δb) | Final weights (w1, w2, b)
(1, -1, 1)        | -1       | (-1, 1, -1)                  | (0, 2, 0)

Step 4: Finally,

Input (x1, x2, b) | Target y | Weight change (Δw1, Δw2, Δb) | Final weights (w1, w2, b)
Initially         |          |                              | (0, 0, 0)
(1, 1, 1)         |  1       | (1, 1, 1)                    | (1, 1, 1)
(1, -1, 1)        | -1       | (-1, 1, -1)                  | (0, 2, 0)
(-1, 1, 1)        | -1       | (1, -1, -1)                  | (1, 1, -1)
(-1, -1, 1)       | -1       | (1, 1, -1)                   | (2, 2, -2)
Finally, the Hebb net for the AND gate is given by

v = b + w1·x1 + w2·x2
y = +1 if v ≥ 0
    -1 if v < 0

Input (x1, x2, b) | v = b + w1·x1 + w2·x2 | Output y
(1, 1, 1)         |  2                    |  1
(1, -1, 1)        | -2                    | -1
(-1, 1, 1)        | -2                    | -1
(-1, -1, 1)       | -6                    | -1
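A small Python check (an illustration, assuming the final weights w1 = 2, w2 = 2, b = -2 obtained above) that the trained Hebb net reproduces the AND-gate table:

```python
# Verify that the trained Hebb net (w1=2, w2=2, b=-2) reproduces the bipolar AND gate.
def classify(x1, x2, w1=2, w2=2, b=-2):
    v = b + w1 * x1 + w2 * x2            # net input
    return 1 if v >= 0 else -1           # hard-limit activation

for x1, x2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print((x1, x2), "->", classify(x1, x2))   # 1, -1, -1, -1 as in the table above
```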
The straight line separating the regions can be obtained after presenting
each input pair, from

x1·w1 + x2·w2 + b = 0,  i.e.  x2 = -(w1/w2)·x1 - b/w2  (of the form y = mx + c).

Thus, the line after the first iteration (w1 = 1, w2 = 1, b = 1):
x2 = -x1 - 1

Line after the second iteration (w1 = 0, w2 = 2, b = 0):
x2 = -(0/2)·x1 - 0/2 = 0
i.e. the boundary coincides with the x1-axis (x2 is plotted on the vertical axis).

Line after the third iteration (w1 = 1, w2 = 1, b = -1):
x2 = -(1/1)·x1 - (-1/1) = -x1 + 1

Line after the fourth iteration (w1 = 2, w2 = 2, b = -2):
x2 = -(2/2)·x1 - (-2/2) = -x1 + 1
Example: Develop a Perceptron for the AND function with binary inputs and
bipolar targets, without bias, up to the second epoch. (Take Case I with the
input (0,0) and Case II without it.) (SN Deepa)

Solution: Case I: with (0,0)

x1  x2
 1   1
 1   0
 0   1
 0   0
Step 1: Initially

Input (x1, x2) | Output y | Target t | (Δw1, Δw2) | Final weights (w1, w2)
Initially      |          |          |            | (0, 0)

The net input:  Yin = w1·x1 + w2·x2

The activation function (with threshold θ, taken as θ = 0 here):
f(Yin) =  1  if Yin > θ
          0  if -θ ≤ Yin ≤ θ
         -1  if Yin < -θ

The weight change (applied when y ≠ t, with learning rate α = 1 here):
Δwi = α·t·xi

The weight update:
w(new) = w(old) + Δw
Step 2: EPOCH 1

Input (x1, x2) | Output y | Target t | (Δw1, Δw2) | Final weights (w1, w2)
Initially      |          |          |            | (0, 0)
(1, 1)         | 0        |  1       | (1, 1)     | (1, 1)
(1, 0)         | 1        | -1       | (-1, 0)    | (0, 1)
(0, 1)         | 1        | -1       | (0, -1)    | (0, 0)
(0, 0)         | 0        | -1       | (0, 0)     | (0, 0)   <- final weights of epoch 1

Step 3: EPOCH 2

Input (x1, x2) | Output y | Target t | (Δw1, Δw2) | Final weights (w1, w2)
Initially      |          |          |            | (0, 0)
(1, 1)         | 0        |  1       | (1, 1)     | (1, 1)
(1, 0)         | 1        | -1       | (-1, 0)    | (0, 1)
(0, 1)         | 1        | -1       | (0, -1)    | (0, 0)
(0, 0)         | 0        | -1       | (0, 0)     | (0, 0)   <- final weights of epoch 2
Solution: Case II: without (0,0)

x1  x2
 1   1
 1   0
 0   1

Step 1: Initially

Input (x1, x2) | Output y | Target t | (Δw1, Δw2) | Final weights (w1, w2)
Initially      |          |          |            | (0, 0)

The net input:  Yin = w1·x1 + w2·x2          The weight change:  Δwi = α·t·xi (α = 1)

The activation function (threshold θ = 0):
f(Yin) = 1 if Yin > θ;  0 if -θ ≤ Yin ≤ θ;  -1 if Yin < -θ

The weight update:  w(new) = w(old) + Δw

Step 2: EPOCH 1

Input (x1, x2) | Output y | Target t | (Δw1, Δw2) | Final weights (w1, w2)
Initially      |          |          |            | (0, 0)
(1, 1)         | 0        |  1       | (1, 1)     | (1, 1)
(1, 0)         | 1        | -1       | (-1, 0)    | (0, 1)
(0, 1)         | 1        | -1       | (0, -1)    | (0, 0)   <- final weights of epoch 1

Step 3: EPOCH 2

Input (x1, x2) | Output y | Target t | (Δw1, Δw2) | Final weights (w1, w2)
Initially      |          |          |            | (0, 0)
(1, 1)         | 0        |  1       | (1, 1)     | (1, 1)
(1, 0)         | 1        | -1       | (-1, 0)    | (0, 1)
(0, 1)         | 1        | -1       | (0, -1)    | (0, 0)   <- final weights of epoch 2

Thus, from the above solution it is clear that without a bias, convergence
does not occur: the weights return to (0, 0) at the end of every epoch. Even
after omitting the input (0,0), convergence still does not occur.
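A minimal Python sketch of this experiment, assuming the learning rate α = 1 and threshold θ = 0 used in the tables above; it reproduces the weights returning to (0, 0) at the end of each epoch in both cases.

```python
# Perceptron rule WITHOUT a bias term, binary inputs / bipolar targets (theta = 0, alpha = 1).
def perceptron_epochs(samples, epochs=2):
    w1 = w2 = 0
    for epoch in range(1, epochs + 1):
        for (x1, x2), t in samples:
            y_in = w1 * x1 + w2 * x2
            y = 1 if y_in > 0 else (-1 if y_in < 0 else 0)   # threshold theta = 0
            if y != t:                                        # update only on error
                w1 += t * x1
                w2 += t * x2
        print(f"epoch {epoch}: w1={w1}, w2={w2}")
    return w1, w2

# AND with binary inputs and bipolar targets; Case I includes (0,0), Case II omits it.
case1 = [((1, 1), 1), ((1, 0), -1), ((0, 1), -1), ((0, 0), -1)]
perceptron_epochs(case1)        # ends every epoch at w=(0,0): no convergence
perceptron_epochs(case1[:-1])   # Case II, without (0,0): still w=(0,0)
```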
Application (Image Processing): The Iris Dataset Classification

The Iris flower data set is a multivariate data set conceived by Ronald
Fisher in 1936. Fisher was a British statistician and biologist.

He recorded the length and width of sepals and petals in centimeters for
three different species of flowers: Iris Setosa, Iris Virginica, and Iris
Versicolor.

The total number of records is 150, with 50 for every species. The
columns of the data set are organized as follows:
SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species
5.2           | 2.7          | 3.9           | 1.4          | Iris-versicolor
5.5           | 4.2          | 1.4           | 0.2          | Iris-setosa
5.6           | 2.5          | 3.9           | 1.1          | Iris-versicolor
6.3           | 2.5          | 5.0           | 1.9          | Iris-virginica
Activation Function
• Identity (Linear) Function:

Output: y = f(v) = v

Equation: the linear function has the equation of a straight line, i.e. y = ax.
No matter how many layers we have, if all of them are linear, the activation of the
last layer is still just a linear function of the input to the first layer.
Range: -inf to +inf
Uses: the linear activation function is normally used in only one place, the output layer.
Issues: the derivative of a linear function is a constant that no longer depends on the
input x, so it cannot introduce any non-linearity into the network.
For example: predicting the price of a house is a regression problem. The house price can
take any large or small value, so we can apply a linear activation at the output layer;
even in this case the network must have non-linear functions in its hidden layers.
Activation Function…
• Sigmoidal: an S-shaped curve; hyperbolic or logistic functions are commonly
used. The two main types of sigmoidal function are the binary and the bipolar
sigmoidal function.
• a). Binary sigmoidal function / logistic function (range 0 to 1):

Output = 1 / (1 + e^(-λx))

where λ is the steepness parameter.

Figure: Binary sigmoid curves for λ = 0.5, 1, 15 and 30.

Nature: non-linear. For x values between -2 and 2 the curve is very steep, so
small changes in x bring about large changes in the value of y.
Value range: 0 to 1
Uses: usually used in the output layer of a binary classifier, where the result is
either 0 or 1; since the sigmoid output lies between 0 and 1, the result can be
predicted as 1 if the value is greater than 0.5 and as 0 otherwise.
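A one-function Python sketch of the binary sigmoid with the steepness parameter λ (called lam below), for illustration:

```python
import math

# Binary sigmoid (logistic) activation with steepness parameter lam (lambda).
def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))   # output range (0, 1)

print(binary_sigmoid(0.0))          # 0.5 at the origin
print(binary_sigmoid(2.0, lam=15))  # close to 1 for a steep curve
```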
Activation Function…
b). Bipolar sigmoidal / hyperbolic tangent
• The desired range is between +1 and -1.

Output = 2 / (1 + e^(-λx)) - 1

where λ is the steepness parameter.

Figure: Bipolar sigmoid curves for λ = 0.5, 15, 25 and 30.

Value range: -1 to +1
Nature: non-linear
Uses: usually used in the hidden layers of a neural network; since its values lie
between -1 and 1, the mean of the hidden-layer activations comes out to be 0 or very
close to it. This helps centre the data by bringing the mean close to 0, which makes
learning for the next layer much easier.
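A corresponding sketch of the bipolar sigmoid; it also checks the standard identity that this function equals tanh(λx/2):

```python
import math

# Bipolar sigmoid with steepness lam; algebraically equal to tanh(lam*x/2).
def bipolar_sigmoid(x, lam=1.0):
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0   # output range (-1, 1)

x = 0.8
print(bipolar_sigmoid(x, lam=1.0), math.tanh(0.5 * x))   # the two values coincide
```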
Activation Function…
Signum Function / Hard limit:
• The function is defined as

Output = +1 for net ≥ 0
         -1 for net < 0

Activation Function…
Binary Activation Function
a). Unipolar:

Output = 1 for net ≥ 0
         0 for net < 0

b). Bipolar:

Output = +1 for net ≥ 0
         -1 for net < 0

Figure: Unipolar and bipolar step functions f(net).
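For completeness, a short sketch of the hard-limit and step activations; returning the upper value at net = 0 is an assumption, since the slide's inequality signs did not survive extraction:

```python
# Hard-limit (signum), unipolar step and bipolar step activations.
def signum(net):
    return 1 if net >= 0 else -1       # +1 for net >= 0, -1 otherwise

def unipolar_step(net):
    return 1 if net >= 0 else 0        # binary {0, 1} output

def bipolar_step(net):
    return 1 if net >= 0 else -1       # bipolar {-1, +1} output (same as signum)

for f in (signum, unipolar_step, bipolar_step):
    print(f.__name__, [f(v) for v in (-0.5, 0.0, 0.5)])
```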
Linear Networks
• The Adaline
Adaline (Adaptive Linear Element)
• In 1960, Widrow and Hoff developed the learning rule (Delta Rule), which is
very closely related to the Perceptron learning rule.
• Adaline and Madaline both use the Least Mean Square (LMS) error for
learning.
• Adaline uses bipolar activation for its input signals and target output.
• The weights and bias are adjusted by the Delta Rule, also known as the LMS
rule or the Widrow-Hoff rule.
Delta Rule
• "The adjustment made to a synaptic weight of a neuron is proportional to
the product of the error signal and the input signal of the synapse."

Δwi = α(t - y_in)·xi

where α    = learning rate
      t    = target value
      y_in = net input to the output unit = Σ wi·xi
      xi   = input
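A minimal Python sketch of one delta-rule update for a single output unit; the numeric check at the bottom uses the values that appear in the ANDNOT example later in this chapter (w1 = w2 = b = 0.2, α = 0.2):

```python
# One delta-rule (LMS / Widrow-Hoff) update for a single output unit with bias.
def delta_update(w, b, x, t, alpha):
    """w, x: lists of weights/inputs; t: target; alpha: learning rate."""
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))       # net input
    err = t - y_in                                        # error signal (t - y_in)
    w_new = [wi + alpha * err * xi for wi, xi in zip(w, x)]
    b_new = b + alpha * err
    return w_new, b_new

# Example: the first ANDNOT step used later (w1=w2=b=0.2, alpha=0.2, x=(1,1), t=-1).
print(delta_update([0.2, 0.2], 0.2, [1, 1], -1, 0.2))     # approx. ([-0.12, -0.12], -0.12)
```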
Derivation (Delta Rule) for a single output unit
• The mean squared error for a particular training pattern is

E = Σj (tj - y_j-in)²

• Take the partial derivative of E with respect to each weight (the change in
error with weight w1j):

∂E/∂w1j = ∂/∂w1j Σj (tj - y_j-in)²

Since w1j influences the error only at output unit yj,

∂E/∂w1j = ∂/∂w1j (tj - y_j-in)²
        = 2(tj - y_j-in)(-1) ∂y_j-in/∂w1j
        = -2(tj - y_j-in) ∂y_j-in/∂w1j
        = -2(tj - y_j-in) x1

• Thus, the error is reduced most rapidly by adjusting the weights in the
direction opposite to this gradient, which gives the delta rule

Δw1 = α(t - y_in)·x1
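A quick numerical sanity check of this derivation (illustrative values, not from the slides): the centred finite difference of E with respect to w1 matches the analytic gradient -2(t - y_in)x1.

```python
# Numerical check of the gradient used in the delta-rule derivation:
# E = (t - y_in)^2 with y_in = w1*x1 + w2*x2, so dE/dw1 = -2*(t - y_in)*x1.
def error(w1, w2, x1, x2, t):
    y_in = w1 * x1 + w2 * x2
    return (t - y_in) ** 2

w1, w2, x1, x2, t, eps = 0.3, -0.2, 1.0, -1.0, 1.0, 1e-6
numeric = (error(w1 + eps, w2, x1, x2, t) - error(w1 - eps, w2, x1, x2, t)) / (2 * eps)
analytic = -2 * (t - (w1 * x1 + w2 * x2)) * x1
print(numeric, analytic)   # both approximately -1.0 here
```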
Architecture (SN Deepa)
• Like a single-layer neuron, the Adaline has only one output unit.
• The output unit receives input from several input units and also from a
bias, whose activation is always +1.
• Each input neuron is connected to the output neuron by a weighted
interconnection (w1, w2, …, wn).
• These weights change as the training progresses.
Algorithm
Step 1: To start the training process, the weights and the bias are initially
set to random values (small non-zero random values).
Step 2: While the stopping condition is false, do Steps 3-7.
Step 3: For each bipolar training pair s:t, perform Steps 4-6.
Step 4: Set activations of the input units: xi = si, for i = 1 to n.
Step 5: Compute the net input y_in. For an Adaline with inputs x1, x2, x3 and
one bias b, the net input is given by:

net input y_in = b + Σi wi·xi

The activation function is used to compute the output y (with threshold θ):

y = f(y_in) =  1  if y_in > θ
               0  if -θ ≤ y_in ≤ θ
              -1  if y_in < -θ

Algorithm…
Step 6: If t ≠ y, update the bias and weights, for i = 1 to n:
wi(new) = wi(old) + α(t - y_in)·xi
b(new)  = b(old) + α(t - y_in)
else
wi(new) = wi(old)
b(new)  = b(old)
Step 7: Test for the stopping condition.
Stopping conditions:
➢ The weight changes become sufficiently small.
➢ A pre-decided number of iterations/epochs is reached.
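A compact Python sketch of this training loop. For simplicity it applies the delta-rule update on every pattern (pure LMS) rather than gating on t ≠ y, and it uses a fixed number of epochs as the stopping condition; the ANDNOT data of the next example is assumed.

```python
# Adaline training loop (delta rule), following the steps above; bipolar data assumed.
import random

def adaline_train(samples, alpha=0.2, epochs=2, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]   # Step 1: small random weights
    b = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):                          # Steps 2/7: fixed-epoch stopping rule
        for x, t in samples:                         # Step 3: each bipolar pair s:t
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))      # Step 5: net input
            err = t - y_in
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]  # Step 6: delta rule
            b = b + alpha * err
    return w, b

# ANDNOT example used in the next slides (bipolar inputs and targets).
andnot = [((1, 1), -1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]
print(adaline_train(andnot))
```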
Example: Develop an Adaline network for the ANDNOT function with bipolar
inputs and targets. Find the final weights after the second epoch.

• Truth table                    • Output = (x1 ANDNOT x2), i.e. y = x1 · x̄2
x1   x2   Output
 1    1   -1
 1   -1    1
-1    1   -1
-1   -1   -1

Figure (Architecture): inputs x1 and x2 connected to the output unit y through
weights w1 and w2, plus a bias b from a unit whose activation is always 1.
• Step 1: Initially the weights and the bias are assigned a small random value,
say 0.2 each. The learning rate is α = 0.2.
• Step 2: The weights are adjusted until the least mean square error is obtained.
Consider w1 = w2 = b = 0.2 and learning rate α = 0.2.

• The operations are carried out for two epochs.
Epoch 1, first iteration:
• x1 = 1, x2 = 1, t = -1, α = 0.2
• y_in = w1·x1 + w2·x2 + b = 0.2·1 + 0.2·1 + 0.2 = 0.6
• t - y_in = -1 - 0.6 = -1.6

Weight and bias changes, Δwi = α·(t - y_in)·xi and Δb = α·(t - y_in):
Δw1 = 0.2·(-1.6)·1 = -0.32
Δw2 = 0.2·(-1.6)·1 = -0.32
Δb  = 0.2·(-1.6)   = -0.32

Updated weights:
w1 = 0.2 - 0.32 = -0.12
w2 = 0.2 - 0.32 = -0.12
b  = 0.2 - 0.32 = -0.12

Summary (first iteration):
w1 = 0.2, w2 = 0.2, b = 0.2, x1 = 1, x2 = 1, α = 0.2, t = -1
y_in = w1·x1 + w2·x2 + b = 0.6
t - y_in = -1.6
Δw1 = -0.32, Δw2 = -0.32, Δb = -0.32
w1 = -0.12, w2 = -0.12, b = -0.12
Second iteration:
• x1 = 1, x2 = -1, t = 1, α = 0.2
• y_in = w1·x1 + w2·x2 + b = (-0.12)(1) + (-0.12)(-1) + (-0.12) = -0.12
• t - y_in = 1 - (-0.12) = 1.12

Δw1 = 0.2·(1.12)·1    =  0.224
Δw2 = 0.2·(1.12)·(-1) = -0.224
Δb  = 0.2·(1.12)      =  0.224

Updated weights:
w1 = -0.12 + 0.224 =  0.104
w2 = -0.12 - 0.224 = -0.344
b  = -0.12 + 0.224 =  0.104
Error with the weights w1 = 0.55, w2 = -0.33, b = 0.43:

Input (x1, x2, b) | Net input y_in | t  | E = (t - y_in)²
(1, 1, 1)         |  0.65          | -1 | 2.72
(1, -1, 1)        |  1.31          |  1 | 0.09
(-1, 1, 1)        | -0.45          | -1 | 0.30
(-1, -1, 1)       |  0.21          | -1 | 1.46

Net error = 4.57

Thus, the error is reduced from 5.7 to 4.57 after the second iteration.
It can be reduced further by taking more iterations.
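The error table above can be reproduced with a few lines of Python (the sum comes out to about 4.59 with unrounded arithmetic; the slides round it to 4.57):

```python
# Reproduce the error table: squared error per pattern with w1=0.55, w2=-0.33, b=0.43.
w1, w2, b = 0.55, -0.33, 0.43
patterns = [((1, 1), -1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]

total = 0.0
for (x1, x2), t in patterns:
    y_in = w1 * x1 + w2 * x2 + b
    e = (t - y_in) ** 2
    total += e
    print(f"x=({x1:2d},{x2:2d})  y_in={y_in:5.2f}  E={e:.2f}")
print("net error ~", round(total, 2))   # about 4.59 (reported as 4.57 in the slides)
```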


Madaline Rule 1
Architecture (from :SN Deepa)
Algorithm
Madaline Rule II
Algorithm
Algorithm…
Example
Q. Form a Madaline network for the XOR function
with bipolar inputs and targets using the MR-I
algorithm.

• Solution :
x1 x2 target
1 1 -1
1 -1 1
-1 1 1
-1 -1 -1
A scatterplot with two features of the Iris dataset

The scatterplot shown in the figure is generated by combining two features of
the dataset, more specifically the petal width and sepal width, for each species.
From the figure we can get an overall view of the three varieties as three classes.
The perceptron needs to be trained on the available data, which must be linearly
separable. (Linear separability is a property of two sets of points: if there
exists at least one line in the plane that separates the two sets of points, then
they are linearly separable.) The records of Iris-versicolor and Iris-virginica,
for example, are not linearly separable, whereas Iris-setosa and Iris-versicolor
fit the rule.
The perceptron used for the classification of the Iris data is shown in the
figure. Four input features (x1, x2, x3 and x4), namely SepalLengthCm,
SepalWidthCm, PetalLengthCm and PetalWidthCm, are used as inputs for training
the network to classify the data into the three classes Iris-versicolor,
Iris-setosa and Iris-virginica.
The network consists of a bias, which acts as a weight on a connection from a
unit whose activation is always 1.
During the training, the weights are updated as:
If t ≠ y
    w(new) = w(old) + Δw
else
    w(new) = w(old);
    stop the training.
The diagram shows the perceptron's process of receiving inputs and combining
them with weights. After training, in fact, the perceptron has determined a set
of weights. New records can then go through the net input function, which is
defined as follows:
net = Σ(i=1 to 4) wi·xi + b

The sigmoid activation function is then applied for the proper classification
of the data. The sigmoid function takes net as its input and gives an output
within the range 0 to 1:

y = 1 / (1 + e^(-λ·net))

where λ decides the steepness of the curve.

The output y decides which class the record belongs to.
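A minimal sketch (not from the slides) of a single sigmoid neuron trained on the two linearly separable species, Iris-setosa and Iris-versicolor, using the four features; it assumes scikit-learn is available just to load the Iris data, and it standardizes the features for faster convergence.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
mask = iris.target < 2                       # keep class 0 (setosa) and class 1 (versicolor)
X = iris.data[mask]
t = iris.target[mask].astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the four features

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.01, 4)                 # small random initial weights
b, lam, alpha = 0.0, 1.0, 0.1                # bias, sigmoid steepness, learning rate

for _ in range(100):                         # simple gradient-style training loop
    net = X @ w + b                          # net = sum_i w_i x_i + b
    y = 1.0 / (1.0 + np.exp(-lam * net))     # sigmoid output in (0, 1)
    grad = (t - y) * y * (1.0 - y)           # error times sigmoid derivative
    w += alpha * X.T @ grad / len(X)
    b += alpha * grad.mean()

pred = ((X @ w + b) > 0.0).astype(float)     # y > 0.5 is equivalent to net > 0
print("training accuracy:", (pred == t).mean())   # should be 1.0 for these two classes
```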


Back Propagation
Features of Back-propagation
• In 1961, the basic concepts of continuous backpropagation were derived in
the context of control theory by Henry J. Kelley and Arthur E. Bryson.
• Backpropagation is short for "backward propagation of errors". It is a
standard method of training artificial neural networks.
• Backpropagation is fast, simple and easy to program.
• It is applied to feedforward artificial neural networks.
• Backpropagation simplifies the network structure by removing weighted
links that have a minimal effect on the trained network.
• It is especially useful for deep neural networks working on error-prone
projects, such as image or speech recognition.
• The biggest drawback of backpropagation is that it can be sensitive to
noisy data.
Advantages of Back propagation
• Back-propagation is fast, simple and easy to program.
• It has no parameters to tune apart from the number of inputs.
• It is a flexible method, as it does not require prior knowledge about the
network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be
learned.
Types of Backpropagation Networks
The two types of backpropagation networks are:
• Static back-propagation: a backpropagation network that produces a
mapping from a static input to a static output. It is useful for solving
static classification problems such as optical character recognition.
• Recurrent backpropagation: the activity is fed forward until a fixed
value is achieved; after that, the error is computed and propagated
backward.
The main difference between the two methods is that the mapping is rapid
in static back-propagation, while it is non-static in recurrent
backpropagation.
Example:
inputs = [0.05, 0.10] and target outputs = [0.01, 0.99]
• Figure: the initial weights, the biases, and the training inputs/outputs.

Given training set: inputs 0.05 and 0.10, and target outputs 0.01 and 0.99.
The Forward Pass for Node H1
Here is how we calculate the total net input for h1 (the weighted sum of the
inputs plus the bias), after which the activation function is applied to get
the output of h1.

The Forward Pass for Node H2
Carrying out the same process for h2, we get its output in the same way.

The Forward Pass for Nodes O1 and O2
We repeat this process for the output-layer neurons, using the outputs of the
hidden-layer neurons as inputs.

Calculating the Total Error
Calculate the error for each output neuron using the squared error function
and sum them to get the total error.

For example, the target output for O1 is 0.01 but the neural network outputs
0.75136507, so its error can be computed as shown below.
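The numeric error for O1 can be completed from the quoted values, assuming the conventional E = ½(target − output)² form of the squared error used in this kind of worked example:

```python
# Squared error for output neuron O1, assuming E = 1/2*(target - out)^2;
# target and actual output are the values quoted in the text above.
target_o1, out_o1 = 0.01, 0.75136507
e_o1 = 0.5 * (target_o1 - out_o1) ** 2
print(e_o1)   # approximately 0.2748; E_total adds the corresponding O2 term
```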
The Backwards Pass
"Our goal with backpropagation is to update each of the weights in the network
so that they cause the actual output to be closer to the target output."
By applying the chain rule we know that, for an output-layer weight such as w5,

∂E_total/∂w5 = (∂E_total/∂out_o1) · (∂out_o1/∂net_o1) · (∂net_o1/∂w5)

First, how much does the total error change with respect to the output?
Next, how much does the output of O1 change with respect to its total net input?
The partial derivative of the logistic function is the output multiplied by one
minus the output.
Finally, how much does the total net input of O1 change with respect to w5?
Putting it all together, we multiply the three factors to obtain ∂E_total/∂w5.
To decrease the error, we then subtract this value (scaled by the learning rate)
from the current weight.
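A small sketch of these three chain-rule factors for w5. The hidden output out_h1, the initial value of w5 and the learning rate η are illustrative assumptions, since the slide images carrying the actual numbers are not reproduced here:

```python
# Chain rule for an output-layer weight w5 (illustrative values).
target_o1, out_o1, out_h1, eta = 0.01, 0.75136507, 0.5932, 0.5

dE_dout = -(target_o1 - out_o1)          # dE_total/dout_o1 for E = 1/2*(t - out)^2
dout_dnet = out_o1 * (1 - out_o1)        # logistic derivative: out * (1 - out)
dnet_dw5 = out_h1                        # net_o1 = ... + w5*out_h1, so d(net_o1)/d(w5) = out_h1

grad_w5 = dE_dout * dout_dnet * dnet_dw5
w5_new = 0.40 - eta * grad_w5            # subtract the gradient scaled by the learning rate
print(grad_w5, w5_new)
```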
Hidden Layer
Next, we continue the backwards pass by calculating new values for w1, w2, w3
and w4.
We know that out_h1 affects both out_o1 and out_o2, therefore ∂E_total/∂out_h1
needs to take into consideration its effect on both output neurons:

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

Similarly, the second part can be calculated through the output equation, and
substituting the values gives ∂E_o1/∂out_h1. Following the same process for
∂E_o2/∂out_h1, and adding the two terms, we obtain ∂E_total/∂out_h1.

Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and
then ∂net_h1/∂w for each weight, exactly as was done for the output layer.
We can now update w1 (and, in the same way, w2, w3 and w4), as sketched below.
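Putting the whole example together, here is a compact sketch of one forward and backward pass through a 2-2-2 network of this kind. The concrete weights, biases and learning rate are illustrative assumptions (the slides' values are in images), but the update equations are the ones derived above:

```python
# End-to-end sketch of a 2-2-2 forward/backward pass in the style of this example.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])               # inputs
t = np.array([0.01, 0.99])               # target outputs
W1 = np.array([[0.15, 0.20],             # hidden-layer weights (rows: h1, h2) - illustrative
               [0.25, 0.30]])
W2 = np.array([[0.40, 0.45],             # output-layer weights (rows: o1, o2) - illustrative
               [0.50, 0.55]])
b1, b2, eta = 0.35, 0.60, 0.5            # biases and learning rate - illustrative

# Forward pass
h = sigmoid(W1 @ x + b1)                 # hidden outputs out_h1, out_h2
o = sigmoid(W2 @ h + b2)                 # network outputs out_o1, out_o2
E = 0.5 * np.sum((t - o) ** 2)           # total squared error

# Backward pass (delta = dE/dnet for each layer)
delta_o = -(t - o) * o * (1 - o)         # output-layer deltas
delta_h = (W2.T @ delta_o) * h * (1 - h) # hidden-layer deltas via the chain rule

W2 -= eta * np.outer(delta_o, h)         # dE/dW2[i,j] = delta_o[i] * out_h[j]
W1 -= eta * np.outer(delta_h, x)         # dE/dW1[i,j] = delta_h[i] * x[j]
print("total error:", E)                 # about 0.2984 with these illustrative weights
```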
