Chapter 2 Adaline
Initially, w1 = w2 = b = 0.
Step 4: Finally,
Input              Target   Weight Change        Final Weights
x1    x2    b      y        Δw1   Δw2   Δb       w1    w2    b
Initially                                         0     0    0
 1     1    1       1        1     1     1        1     1    1
 1    -1    1      -1       -1     1    -1        0     2    0
-1     1    1      -1        1    -1    -1        1     1   -1
-1    -1    1      -1        1     1    -1        2     2   -2
Finally, the Hebb net for the AND gate is given by
v = b + w1·x1 + w2·x2
y = 1 if v ≥ 0
y = −1 if v < 0
Input              Net input                  Output
x1    x2    b      v = b + w1·x1 + w2·x2      y
 1     1    1       2                          1
 1    -1    1      -2                         -1
-1     1    1      -2                         -1
-1    -1    1      -6                         -1
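The two tables above can be reproduced with a few lines of code. The following is a minimal sketch (variable names are illustrative, not from the source) of the Hebb rule Δwi = xi·y, Δb = y applied to the bipolar AND data, followed by a check of the trained net.

```python
# Hebb learning for the bipolar AND gate (sketch; names are illustrative).
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w1 = w2 = b = 0
for (x1, x2), y in samples:
    # Hebb rule: weight change = input * target, bias change = target
    w1 += x1 * y
    w2 += x2 * y
    b += y
    print(f"x=({x1},{x2}) y={y}  ->  w1={w1}, w2={w2}, b={b}")

# Verify the trained net: v = b + w1*x1 + w2*x2, output = 1 if v >= 0 else -1
for (x1, x2), y in samples:
    v = b + w1 * x1 + w2 * x2
    print((x1, x2), v, 1 if v >= 0 else -1)
```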
The straight line separating the regions can be obtained after presenting each input pair. Thus, the line after the first iteration:
w1·x1 + w2·x2 + b = 0
x2 = −(w1/w2)·x1 − b/w2
After the first input: w1 = 1, w2 = 1, b = 1, so
x2 = −x1 − 1
Thus, the line after the second iteration:
b + w1·x1 + w2·x2 = 0
x2 = −(w1/w2)·x1 − b/w2
At b = 0, w1 = 0, w2 = 2:
x2 = −(0/2)·x1 − 0/2
x2 = 0, i.e. the line lies along the x1-axis.
Thus, the line after the third iteration:
b + w1·x1 + w2·x2 = 0
x2 = −(w1/w2)·x1 − b/w2
At b = −1, w1 = 1, w2 = 1:
x2 = −(1/1)·x1 − (−1/1)
x2 = −x1 + 1   (of the form y = mx + c)
Thus, the line after the fourth iteration:
b + w1·x1 + w2·x2 = 0
x2 = −(w1/w2)·x1 − b/w2
At b = −2, w1 = 2, w2 = 2:
x2 = −(2/2)·x1 − (−2/2)
x2 = −x1 + 1   (of the form y = mx + c)
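The slope and intercept of each separating line can also be checked numerically. A small sketch, assuming w2 ≠ 0 for the non-degenerate case:

```python
# Separating line x2 = -(w1/w2)*x1 - b/w2 after each Hebb update (sketch).
def boundary(w1, w2, b):
    if w2 == 0:
        return None                     # degenerate case: no x2-intercept form
    return (-w1 / w2, -b / w2)          # (slope, intercept)

# Weight/bias values taken from the table above, one triple per input pair.
for w1, w2, b in [(1, 1, 1), (0, 2, 0), (1, 1, -1), (2, 2, -2)]:
    slope, intercept = boundary(w1, w2, b)
    print(f"w=({w1},{w2}), b={b}:  x2 = {slope}*x1 + {intercept}")
```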
Example: Develop a Perceptron for the AND function with binary inputs and bipolar targets, without bias, up to the second epoch. (Take the first case with (0,0) and the second without (0,0).) (SN Deepa)
Solution: Case I: with (0,0)
Step 2: EPOCH 1:
Input          Output   Target   Weight Change      Final Weights
x1    x2       y        t        Δw1    Δw2         w1    w2
Initially                                            0     0
1     1        0         1        1      1           1     1
1     0        1        -1       -1      0           0     1
0     1        1        -1        0     -1           0     0
0     0        0        -1        0      0           0     0    FINAL WEIGHT
Step 3: EPOCH 2:
Input          Output   Target   Weight Change      Final Weights
x1    x2       y        t        Δw1    Δw2         w1    w2
Initially                                            0     0
1     1        0         1        1      1           1     1
1     0        1        -1       -1      0           0     1
0     1        1        -1        0     -1           0     0
0     0        0        -1        0      0           0     0    FINAL WEIGHT
Solution: Case II: without (0,0)
Truth-table inputs (the (0,0) pair is omitted from training):
x1    x2
1     1
1     0
0     1
0     0
Step 1: Initially
Input          Output   Target   Weight Change      Final Weights
x1    x2       y        t        Δw1    Δw2         w1    w2
Initially                                            0     0
The net input: y_in = w1·x1 + w2·x2
The activation function:
f(y_in) = 1    if y_in > θ
        = 0    if −θ ≤ y_in ≤ θ
        = −1   if y_in < −θ        (with θ = 0 here)
The weight change: Δwi = η·t·xi (with η = 1, applied when y ≠ t)
The weight update: w(new) = w(old) + Δw
Step 2: EPOCH 1:
Input          Output   Target   Weight Change      Final Weights
x1    x2       y        t        Δw1    Δw2         w1    w2
Initially                                            0     0
1     1        0         1        1      1           1     1
1     0        1        -1       -1      0           0     1
0     1        1        -1        0     -1           0     0    FINAL WEIGHT
Step 3: EPOCH 2:
Input          Output   Target   Weight Change      Final Weights
x1    x2       y        t        Δw1    Δw2         w1    w2
Initially                                            0     0
1     1        0         1        1      1           1     1
1     0        1        -1       -1      0           0     1
0     1        1        -1        0     -1           0     0    FINAL WEIGHT
Thus, from the above solution it is clear that without a bias the perceptron does not converge for the AND function. Even after omitting the (0,0) pair, convergence does not occur.
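A short simulation makes the non-convergence visible: without a bias term the weights return to (0, 0) at the end of every epoch. This is a sketch with illustrative names, using the same activation (θ = 0) and learning rate 1 as the tables above.

```python
# Perceptron (no bias) for the AND function: binary inputs, bipolar targets (sketch).
def f(y_in, theta=0):
    """Activation used in the text: 1 above theta, -1 below -theta, else 0."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

data = [((1, 1), 1), ((1, 0), -1), ((0, 1), -1), ((0, 0), -1)]   # Case I
w1 = w2 = 0
for epoch in range(1, 3):
    for (x1, x2), t in data:
        y = f(w1 * x1 + w2 * x2)
        if y != t:                       # update only when the output is wrong
            w1 += t * x1                 # learning rate taken as 1
            w2 += t * x2
    print(f"epoch {epoch}: w1={w1}, w2={w2}")
# The weights end every epoch back at (0, 0): no convergence, with or without (0,0).
```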
Application: The Iris Dataset Classification
The Iris flower data set is a multivariate data set conceived by Ronald
Fisher in 1936. Fisher was a British statistician and biologist.
He recorded the length and width of sepals and petals in centimeters for
three different species of flowers: Iris Setosa, Iris Virginica, and Iris
Versicolor.
The total number of records is 150, with 50 for every species. The
columns of the data set are organized as follows:
SepalLengthCm   SepalWidthCm   PetalLengthCm   PetalWidthCm   Species
5.2             2.7            3.9             1.4            Iris-versicolor
5.5             4.2            1.4             0.2            Iris-setosa
5.6             2.5            3.9             1.1            Iris-versicolor
6.3             2.5            5.0             1.9            Iris-virginica
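A minimal sketch of loading the data set, assuming scikit-learn is available (its bundled copy uses slightly different column names than the table above):

```python
# Load the Iris data set (sketch, assuming scikit-learn is installed).
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # sepal/petal length and width in cm
print(iris.target_names)    # ['setosa', 'versicolor', 'virginica']
print(iris.data.shape)      # (150, 4): 150 records, 50 per species
```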
Activation Function
• Identity Function / Linear Function:
Output: y = f(v) = v
Equation: the linear function has the equation of a straight line, i.e. y = ax.
No matter how many layers the network has, if all of them are linear, the activation of the last layer is still just a linear function of the input of the first layer.
Range: −inf to +inf
Uses: the linear activation function is normally used in only one place, the output layer.
Issues: the derivative of a linear function is a constant that does not depend on the input x, so it cannot introduce any non-linearity into the network.
For example, predicting the price of a house is a regression problem. The house price may take any large or small value, so a linear activation can be applied at the output layer. Even in this case the neural net must have non-linear functions in its hidden layers.
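The claim that stacked linear layers collapse into a single linear map can be verified directly. A small numpy sketch with arbitrary example weights:

```python
import numpy as np

# Two "layers" with identity activation collapse into one linear map.
W1 = np.array([[0.2, -0.5], [0.7, 0.1]])    # arbitrary example weights
W2 = np.array([[1.0, 0.3], [-0.4, 0.8]])
x = np.array([1.0, 2.0])

two_layers = W2 @ (W1 @ x)                  # layer by layer, f(v) = v
one_layer = (W2 @ W1) @ x                   # single equivalent weight matrix
print(np.allclose(two_layers, one_layer))   # True
```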
Activation Function…
• Sigmoidal: an S-shaped curve; hyperbolic-tangent or logistic functions are commonly used. The two main types of sigmoidal function are the binary and the bipolar sigmoidal function.
• a). Binary sigmoidal function/Logistic Function:
Output = 1 / (1 + e^(−λx)),   range: 0 to 1
(Figure: logistic curves for λ = 0.5, 1, 15, 30, plotted for x from 0 to 10.)
Nature: non-linear. For x values between −2 and 2 the curve is very steep, so small changes in x bring about large changes in the value of Y.
Value range: 0 to 1
Uses: usually used in the output layer of a binary classifier. Because the sigmoid output lies between 0 and 1, the result can simply be predicted as 1 if the value is greater than 0.5 and 0 otherwise.
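A sketch of the binary sigmoidal (logistic) function with the steepness parameter λ used in the plot:

```python
import numpy as np

def binary_sigmoid(x, lam=1.0):
    """Logistic function 1 / (1 + exp(-lam * x)); output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-lam * x))

x = np.linspace(-10, 10, 5)
print(binary_sigmoid(x, lam=1))    # smooth S-curve
print(binary_sigmoid(x, lam=30))   # approaches a 0/1 step as lam grows
```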
Activation Function…
b). Bipolar Sigmoidal / Hyperbolic Tangent
• The desired range is between +1 and −1.
Output = 2 / (1 + e^(−λx)) − 1
(Figure: bipolar sigmoid curves for λ = 0.5, 15, 25, 30, plotted for x from −10 to 10.)
Value range: −1 to +1
Nature: non-linear
In the limit of large λ this approaches the bipolar step function:
Output = +1 for net ≥ 0
       = −1 for net < 0
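The bipolar sigmoid can be written the same way; it equals tanh(λx/2), which is why it is also called the hyperbolic-tangent form. A short sketch:

```python
import numpy as np

def bipolar_sigmoid(x, lam=1.0):
    """2 / (1 + exp(-lam * x)) - 1; output range (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-lam * x)) - 1.0

x = np.linspace(-10, 10, 5)
print(np.allclose(bipolar_sigmoid(x), np.tanh(x / 2)))   # True
```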
Activation Function…
Binary Activation Function
a). Unipolar (binary step):
Output = +1 for net ≥ 0
       =  0 for net < 0
(Figure: f(net) jumps from 0 to +1 at net = 0.)
b). Bipolar (bipolar step):
Output = +1 for net ≥ 0
       = −1 for net < 0
(Figure: f(net) jumps from −1 to +1 at net = 0.)
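The two step (hard-limit) activations as code, assuming the convention that net = 0 maps to the "on" value:

```python
def unipolar_step(net):
    """Binary step: 1 for net >= 0, else 0."""
    return 1 if net >= 0 else 0

def bipolar_step(net):
    """Bipolar step: +1 for net >= 0, else -1."""
    return 1 if net >= 0 else -1
```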
Linear Networks
• The Adaline
Adaline (Adaptive Linear Element)
• In 1960 Widrow and Hoff developed the learning rule (Delta Rule), which is very closely related to the Perceptron learning rule.
• Adaline and Madaline both use the Least Mean Square (LMS) error for learning.
• Adaline uses a bipolar activation function for its inputs and target output.
• The weights and bias are adjusted by the Delta Rule, also called the LMS rule or Widrow-Hoff rule.
Delta Rule
• “The adjustment made to a synaptic weight of
a neuron is proportional to the product of the
error signal and the input signal of the
synapse”.
Δwi = η (t − y_in) xi
where η = learning rate
t = target value
y_in = net input to the output unit = Σ wi·xi
xi = input
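One delta-rule update written as code. This is a sketch using the quantities defined above; the function name and signature are illustrative:

```python
# One delta-rule (LMS / Widrow-Hoff) update for a single output unit (sketch).
def delta_rule_step(w, b, x, t, eta):
    """Return updated (w, b) for one training pattern."""
    y_in = sum(wi * xi for wi, xi in zip(w, x)) + b     # net input
    err = t - y_in                                      # (t - y_in)
    w = [wi + eta * err * xi for wi, xi in zip(w, x)]   # w_i += eta*(t - y_in)*x_i
    b = b + eta * err                                   # bias as a weight on input 1
    return w, b
```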
Derivation (Delta Rule) for single
output unit
• The mean squared error for a particular training pattern is
E = Σ_j (t_j − y_j,in)²
∂E/∂w_1j = ∂/∂w_1j Σ_j (t_j − y_j,in)²
Since w_1j influences the error only at output unit y_j,
∂E/∂w_1j = ∂/∂w_1j (t_j − y_j,in)²
         = 2 (t_j − y_j,in) · (−1) · ∂y_j,in/∂w_1j
         = −2 (t_j − y_j,in) · ∂y_j,in/∂w_1j
         = −2 (t_j − y_j,in) · x_1
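The result −2(t_j − y_j,in)·x_1 can be sanity-checked against a finite-difference estimate of the gradient. A small sketch with made-up numbers:

```python
# Check dE/dw1 = -2*(t - y_in)*x1 numerically (example values are arbitrary).
x = [0.5, -1.0]
w = [0.3, 0.8]
t = 1.0

def error(w):
    y_in = w[0] * x[0] + w[1] * x[1]
    return (t - y_in) ** 2

y_in = w[0] * x[0] + w[1] * x[1]
analytic = -2 * (t - y_in) * x[0]
h = 1e-6
numeric = (error([w[0] + h, w[1]]) - error([w[0] - h, w[1]])) / (2 * h)
print(analytic, numeric)   # the two values agree to ~6 decimal places
```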
• Truth table (Output = x1 ANDNOT x2, i.e. y = x1 AND (NOT x2)):
x1    x2    Output
 1     1    -1
 1    -1     1
-1     1    -1
-1    -1    -1
Architecture
(Figure: a single output neuron y with inputs x1 and x2 through weights w1 and w2, and a bias b on the constant input 1.)
• Step 1: Initially the weights and bias are assumed to take a random value, say 0.2. The learning rate is η = 0.2.
• Step 2: The weights are adjusted until the least mean square error is obtained.
Consider w1 = w2 = b = 0.2 and learning rate η = 0.2. For the first input (x1 = 1, x2 = 1, t = −1):
y_in = w1·x1 + w2·x2 + b = 0.2·1 + 0.2·1 + 0.2 = 0.6
t − y_in = −1 − 0.6 = −1.6
Δw1 = η(t − y_in)·x1 = −0.32
Δw2 = η(t − y_in)·x2 = −0.32
Δb  = η(t − y_in)    = −0.32
w1 = 0.2 − 0.32 = −0.12
w2 = 0.2 − 0.32 = −0.12
b  = 0.2 − 0.32 = −0.12
Epoch 1: First iteration
• If x1 = 1 and x2 = 1, t = −1, α = 0.2
• y_in = w1·x1 + w2·x2 + b = 0.2·1 + 0.2·1 + 0.2 = 0.6
• t − y_in = −1 − 0.6 = −1.6
Net error = 4.57
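The numbers of this first iteration can be reproduced directly; a sketch using the values from the slide:

```python
# First Adaline update for x = (1, 1), t = -1, with w1 = w2 = b = 0.2, eta = 0.2.
eta, t = 0.2, -1
w1 = w2 = b = 0.2
x1 = x2 = 1

y_in = w1 * x1 + w2 * x2 + b                                # 0.6
err = t - y_in                                              # -1.6
dw1, dw2, db = eta * err * x1, eta * err * x2, eta * err    # all -0.32
w1, w2, b = w1 + dw1, w2 + dw2, b + db                      # all -0.12
print(y_in, err, (dw1, dw2, db), (w1, w2, b))
```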
• Solution:
x1    x2    target
 1     1    -1
 1    -1     1
-1     1     1
-1    -1    -1
(Figure: a scatterplot of two features of the Iris dataset.)
net = Σ wi·xi + b   (sum over i = 1 … n)
Given a training set with inputs 0.05 and 0.10, and target outputs 0.01 and 0.99.
The Forward Pass for Node H1
Here's how we calculate the total net input for H1: apply net = Σ wi·xi + b to the inputs feeding H1, then squash the result with the logistic activation to obtain the output of H1.
Similarly for H2.
Calculating the Total Error
Calculate the error for each output neuron using the squared error function and sum
them to get the total error:
For example, the target output for O1 is 0.01 but the neural network outputs 0.75136507; with the squared-error function E = ½(target − out)², its error is ½(0.01 − 0.75136507)² ≈ 0.2748.
The Backwards Pass
"Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output."
Consider a weight feeding output O1, say w5. By applying the chain rule we know that:
∂E_total/∂w5 = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂w5
First, how much does the total error change with respect to the output?
Next, how much does the output of O1 change with respect to its total net input? The partial derivative of the logistic function is the output multiplied by (1 minus the output):
∂out_o1/∂net_o1 = out_o1 (1 − out_o1)
Finally, how much does the total net input of O1 change with respect to w5?
Putting it all together gives ∂E_total/∂w5.
To decrease the error, we then subtract this value (multiplied by the learning rate) from the current weight.
Hidden Layer
Next, we'll continue the backwards pass by calculating new values for w1, w2, w3 and w4.
We know that out_h1 affects both out_o1 and out_o2, therefore ∂E_total/∂out_h1 needs to take into consideration its effect on both output neurons.
Therefore: ∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1
Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and then ∂net_h1/∂w for each weight:
We can now update w1:
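A compact sketch of the complete forward and backward pass for a 2-2-2 network with logistic activations. The slide does not show the weight values, so the initial weights, biases, and learning rate below follow the commonly used values from Matt Mazur's worked example and should be treated as assumptions; with them the forward pass reproduces out_o1 ≈ 0.75136507.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training pair from the slide.
x = np.array([0.05, 0.10])
target = np.array([0.01, 0.99])
# Assumed initial parameters (Mazur's example values; not shown on the slide).
W1 = np.array([[0.15, 0.20],    # weights into h1
               [0.25, 0.30]])   # weights into h2
b1 = 0.35
W2 = np.array([[0.40, 0.45],    # weights into o1 (w5, w6)
               [0.50, 0.55]])   # weights into o2 (w7, w8)
b2 = 0.60
eta = 0.5                       # assumed learning rate

# Forward pass
net_h = W1 @ x + b1
out_h = sigmoid(net_h)
net_o = W2 @ out_h + b2
out_o = sigmoid(net_o)          # out_o[0] is about 0.75136507 with these values
E_total = 0.5 * np.sum((target - out_o) ** 2)

# Backward pass (chain rule)
delta_o = (out_o - target) * out_o * (1 - out_o)   # dE/dnet_o
dW2 = np.outer(delta_o, out_h)                     # dE/dW2
delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)   # dE/dnet_h
dW1 = np.outer(delta_h, x)                         # dE/dW1

# Subtract the gradient (times the learning rate) from the current weights.
W2 -= eta * dW2
W1 -= eta * dW1
print(E_total, W1, W2)
```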