Adaline
K. M. Leung
Department of Computer Science and Engineering
Polytechnic School of Engineering, NYU
2015.09.29
Abstract
A supervised learning algorithm known as the Widrow-Hoff rule, the Delta rule, or the LMS rule is introduced to train neural networks to classify patterns into two or more categories.
The output neuron of the Adaline receives a net input equal to the bias plus a weighted sum of its inputs,
$$ y_{in} = b + \sum_{i=1}^{N} x_i w_i . $$
Just like Hebb's rule and the Perceptron learning rule, the Delta rule is also a supervised learning rule. Thus we assume that we are given a training set
$$ \{ \mathbf{s}^{(q)}, t^{(q)} \}, \qquad q = 1, 2, \dots, Q, $$
where $\mathbf{s}^{(q)}$ is a training vector and $t^{(q)}$ is its corresponding targeted output value.
Also like Hebb's rule and the Perceptron rule, one cycles through the training set, presenting the training vectors one at a time to the NN. For the Delta rule, the weights and bias are updated so as to minimize the square of the difference between the net output and the target value for the particular training vector presented at that step.
Notice that this procedure is not exactly the same as minimizing the overall error between the NN outputs and their corresponding target values for all the training vectors. Doing so would require the solution of a large-scale optimization problem involving $N$ weight components and a single bias.
Multi-Parameter Minimization
To better understand the updating procedure for the weights and bias in the Delta rule, we need to digress and consider the topic of multi-parameter minimization. We assume that $E(\mathbf{w})$ is a scalar function of a vector argument $\mathbf{w}$. We want to find the point $\mathbf{w} \in \mathbb{R}^{n}$ at which $E$ takes on its minimum value.
Suppose we want to find the minimum value iteratively, starting with $\mathbf{w}(0)$. The iteration amounts to
$$ \mathbf{w}(k+1) = \mathbf{w}(k) + \Delta\mathbf{w}(k), \qquad k = 0, 1, \dots . $$
The question is how the changes in the weight vector should be chosen so that we end up with a lower value for $E$:
$$ E(\mathbf{w}(k+1)) < E(\mathbf{w}(k)). $$
For sufficiently small $\Delta\mathbf{w}(k)$, we obtain from Taylor's theorem
$$ E(\mathbf{w}(k+1)) = E\big(\mathbf{w}(k) + \Delta\mathbf{w}(k)\big) \approx E(\mathbf{w}(k)) + \mathbf{g}(k) \cdot \Delta\mathbf{w}(k), $$
where $\mathbf{g}(k) = \nabla E(\mathbf{w})\big|_{\mathbf{w} = \mathbf{w}(k)}$ is the gradient of $E(\mathbf{w})$ at $\mathbf{w}(k)$.
It is clear that $E(\mathbf{w}(k+1)) < E(\mathbf{w}(k))$ if $\mathbf{g}(k) \cdot \Delta\mathbf{w}(k) < 0$. The largest decrease in the value of $E(\mathbf{w})$ occurs in the direction $\Delta\mathbf{w}(k) = -\alpha\,\mathbf{g}(k)$, provided $\alpha$ is sufficiently small and positive. This direction is called the steepest-descent direction, and $\alpha$, which controls the size of the step, is called the learning rate. Thus, starting from $\mathbf{w}(0)$, the idea is to find a minimum of the function $E(\mathbf{w})$ iteratively by making successive steps along the local gradient direction, according to
$$ \mathbf{w}(k+1) = \mathbf{w}(k) - \alpha\,\mathbf{g}(k), \qquad k = 0, 1, \dots . $$
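To make the steepest-descent iteration concrete, here is a minimal sketch in Python with NumPy. The quadratic test problem, the learning rate `alpha`, and the stopping tolerance are illustrative choices, not part of the original notes.

```python
import numpy as np

def steepest_descent(grad_E, w0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Minimize E(w) by repeatedly stepping along -grad: w(k+1) = w(k) - alpha * g(k)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_E(w)                        # g(k), the gradient of E at w(k)
        w_new = w - alpha * g                # steepest-descent step
        if np.linalg.norm(w_new - w) < tol:  # stop once the step is negligible
            return w_new
        w = w_new
    return w

# Hypothetical test problem: E(w) = (w - c) . A (w - c), so grad E = 2 A (w - c)
A = np.array([[2.0, 0.0], [0.0, 1.0]])
c = np.array([1.0, -3.0])
quad_grad = lambda w: 2.0 * A @ (w - c)

print(steepest_descent(quad_grad, w0=[0.0, 0.0]))  # approaches the minimizer [1, -3]
```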
Delta Rule
Suppose at the $k$-th step in the training process, the current weight vector and bias are given by $\mathbf{w}(k)$ and $b(k)$, respectively, and the $q$-th training vector, $\mathbf{s}(k) = \mathbf{s}^{(q)}$, is presented to the NN. The total input received by the output neuron is
$$ y_{in} = b(k) + \sum_{i=1}^{N} s_i(k)\, w_i(k). $$
Since the transfer function is given by the identity function during training, the output of the NN is
$$ y(k) = y_{in} = b(k) + \sum_{i=1}^{N} s_i(k)\, w_i(k). $$
However, the target output is $t(k) = t^{(q)}$, and so if $y(k) \neq t(k)$ then there is an error given by $y(k) - t(k)$. This error can be positive or negative.
The Delta rule aims at finding the weights and bias so as to minimize the square of this error,
$$ E(\mathbf{w}(k)) = \big( y(k) - t(k) \big)^{2} = \left( b(k) + \sum_{i=1}^{N} s_i(k)\, w_i(k) - t(k) \right)^{2}. $$
Differentiating $E(\mathbf{w}(k))$ with respect to the weights and the bias and taking a steepest-descent step of size $\alpha$ gives the two updating formulas
$$ \mathbf{w}(k+1) = \mathbf{w}(k) - 2\alpha \big( y(k) - t(k) \big)\, \mathbf{s}(k), $$
$$ b(k+1) = b(k) - 2\alpha \big( y(k) - t(k) \big). $$
Notice that in the textbook by Fausett, the factors of 2 are missing from these two updating formulas. Equivalently, the learning rate there is twice the value used here.
We will now summarize the Delta rule. To save space, we use vector
notation, where vectors are denoted by boldface quantities.
3. Set $y = y_{in}$.
4. Update the weights and bias:
$$ \mathbf{w}^{\mathrm{new}} = \mathbf{w}^{\mathrm{old}} - 2\alpha \big( y - t^{(q)} \big)\, \mathbf{x}, $$
$$ b^{\mathrm{new}} = b^{\mathrm{old}} - 2\alpha \big( y - t^{(q)} \big). $$
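Below is a minimal sketch of this single-neuron Delta-rule loop in Python with NumPy. The function name `train_adaline`, the learning rate, the tolerance, and the sample data are illustrative choices, not part of the original notes.

```python
import numpy as np

def train_adaline(S, t, alpha=0.05, tol=1e-6, max_epochs=1000):
    """Delta-rule (LMS) training of a single Adaline.
    S: (Q, N) array of training vectors, t: (Q,) array of bipolar targets."""
    Q, N = S.shape
    w = np.zeros(N)
    b = 0.0
    for _ in range(max_epochs):
        max_change = 0.0
        for q in range(Q):
            y = b + S[q] @ w                  # identity transfer function during training
            err = y - t[q]                    # signed error y - t^(q)
            dw = -2.0 * alpha * err * S[q]    # w_new = w_old - 2*alpha*(y - t^(q)) * s^(q)
            db = -2.0 * alpha * err           # b_new = b_old - 2*alpha*(y - t^(q))
            w += dw
            b += db
            max_change = max(max_change, np.max(np.abs(dw)), abs(db))
        if max_change < tol:                  # stop when the updates become negligible
            break
    return w, b

# Illustrative run on bipolar AND-style data (essentially the worked example used later)
S = np.array([[ 1,  1], [ 1, -1], [-1,  1], [-1, -1]], dtype=float)
t = np.array([ 1, -1, -1, -1], dtype=float)
w, b = train_adaline(S, t)
print(w, b)   # hovers near w = [0.5, 0.5], b = -0.5
```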
Notice that for the Delta rule, unlike the Perceptron rule, training does not
stop even after all the training vectors have been correctly classified. The
algorithm continuously attempts to produce more robust sets of weights
and bias. Iteration is stopped only when changes in the weights and bias
are smaller than a preset tolerance level.
In general, there is no proof that the Delta rule will always lead to
convergence, or to a set of weights and bias that enable the NN to
correctly classify all the training vectors. One also needs to experiment
with the size of the learning rate. Too small a value may require too many
iterations. Too large a value may lead to non-convergence.
Because the identity function is used as the transfer function during training, the error at each step of the training process may never become small, even though an acceptable set of weights and bias may already have been found. In that case the weights will continually change from one iteration to the next. The amount of change is proportional to $\alpha$. Therefore in some cases one may want to gradually decrease $\alpha$ towards zero during the iteration, especially when one is close to obtaining the best set of weights and bias. Of course there are many ways in which $\alpha$ can be made to approach zero.
Instead of updating the weights after each training vector, one can also ask for the weights that minimize the mean squared error over the entire training set. Absorbing the bias as $w_0$ (with $s_0^{(q)} = 1$), this error is
$$ F(\mathbf{w}) = \frac{1}{Q} \sum_{q=1}^{Q} \big( y^{(q)} - t^{(q)} \big)^{2} = \frac{1}{Q} \sum_{q=1}^{Q} \left( \sum_{i=0}^{N} s_i^{(q)} w_i - t^{(q)} \right)^{2}, $$
where $y^{(q)} = \sum_{i=0}^{N} s_i^{(q)} w_i$ is the output for training vector $\mathbf{s}^{(q)}$.
Taking the partial derivative of $F(\mathbf{w})$ with respect to the $j$-th component of the weight vector gives
$$ \frac{\partial F(\mathbf{w})}{\partial w_j} = \frac{2}{Q} \sum_{q=1}^{Q} \big( y^{(q)} - t^{(q)} \big) \frac{\partial}{\partial w_j} \sum_{i=0}^{N} s_i^{(q)} w_i = \frac{2}{Q} \sum_{q=1}^{Q} \big( y^{(q)} - t^{(q)} \big) s_j^{(q)} $$
$$ = \frac{2}{Q} \sum_{q=1}^{Q} \left( \sum_{i=0}^{N} s_i^{(q)} w_i - t^{(q)} \right) s_j^{(q)} = 2 \left( \sum_{i=0}^{N} w_i C_{ij} - v_j \right), $$
where the correlation matrix $C$ and the vector $\mathbf{v}$ are defined by
$$ C_{ij} = \frac{1}{Q} \sum_{q=1}^{Q} s_i^{(q)} s_j^{(q)}, \qquad v_j = \frac{1}{Q} \sum_{q=1}^{Q} t^{(q)} s_j^{(q)}. $$
Setting the partial derivatives to zero gives the set of linear equations (written in matrix notation)
$$ \mathbf{w}\, C = \mathbf{v}. $$
Notice that the correlation matrix $C$ and the vector $\mathbf{v}$ can be easily computed from the given training set.
Assuming that the correlation matrix is nonsingular, the solution is therefore given by
$$ \mathbf{w} = \mathbf{v}\, C^{-1}, $$
where $C^{-1}$ is the inverse matrix of $C$. Notice that the correlation matrix is symmetric and has dimension $(N+1) \times (N+1)$.
Although the exact solution is formally available, computing it this way requires computing the inverse of the matrix $C$ or solving a system of linear equations. The computational complexity involved is $O\big((N+1)^{3}\big)$. For most practical problems, $N$ is so large that computing the solution this way is really not feasible.
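For moderate $N$, however, the exact solution is easy to compute directly. The sketch below (NumPy) builds $C$ and $\mathbf{v}$ from a training set with the bias absorbed as a leading column of ones and then solves $\mathbf{w}C = \mathbf{v}$; the helper name `lms_exact` is an illustrative choice.

```python
import numpy as np

def lms_exact(S, t):
    """Exact least-squares (LMS) weights for an Adaline, bias absorbed as w[0].
    S: (Q, N) training vectors, t: (Q,) targets.  Returns w of length N + 1."""
    Q = S.shape[0]
    S1 = np.hstack([np.ones((Q, 1)), S])   # prepend s_0 = 1 so the bias becomes w_0
    C = (S1.T @ S1) / Q                    # correlation matrix, (N+1) x (N+1), symmetric
    v = (t @ S1) / Q                       # v_j = (1/Q) sum_q t^(q) s_j^(q)
    return np.linalg.solve(C.T, v)         # solves w C = v without forming C^{-1}
```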
We also consider here the exact formal solution given in the last section. We will absorb the bias by appending a 1 in the leading position of each of the training vectors, so that the training set is

q    s^(q)          t^(q)
1    [1  1  1]       1
2    [1  1 -1]      -1
3    [1 -1  1]      -1
4    [1 -1 -1]      -1
We first compute the correlation matrix
$$ C = \frac{1}{4} \sum_{q=1}^{4} \mathbf{s}^{(q)T} \mathbf{s}^{(q)}
     = \frac{1}{4} \left( \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
     + \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} \begin{pmatrix} 1 & 1 & -1 \end{pmatrix}
     + \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \end{pmatrix}
     + \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} \begin{pmatrix} 1 & -1 & -1 \end{pmatrix} \right). $$
Thus
$$ C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$
Since $C$ is the identity matrix (the training vectors are as independent of each other as they can be), its inverse is just itself.
Then we compute the vector $\mathbf{v}$:
$$ \mathbf{v} = \frac{1}{4} \sum_{q=1}^{4} t^{(q)} \mathbf{s}^{(q)}
   = \frac{1}{4} \Big( \big[1\ \ 1\ \ 1\big] - \big[1\ \ 1\ \ {-1}\big] - \big[1\ \ {-1}\ \ 1\big] - \big[1\ \ {-1}\ \ {-1}\big] \Big)
   = \Big[ -\tfrac{1}{2}\ \ \tfrac{1}{2}\ \ \tfrac{1}{2} \Big]. $$
Therefore we have
$$ \mathbf{W} = \mathbf{v}\, C^{-1} = \Big[ -\tfrac{1}{2}\ \ \tfrac{1}{2}\ \ \tfrac{1}{2} \Big]. $$
Extracting the bias (the leading component) and the weights, we have
$$ b = -\tfrac{1}{2}, \qquad \mathbf{W} = \Big[ \tfrac{1}{2}\ \ \tfrac{1}{2} \Big], $$
and so the best decision boundary is given by the line
$$ x_2 = 1 - x_1, $$
which we know from before is the correct result.
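As a quick check, a few lines of NumPy reproduce this worked example (the array names are just for illustration):

```python
import numpy as np

# training vectors with the bias absorbed (leading 1) and their bipolar targets
S1 = np.array([[1,  1,  1],
               [1,  1, -1],
               [1, -1,  1],
               [1, -1, -1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)

C = (S1.T @ S1) / 4           # correlation matrix: the 3 x 3 identity
v = (t @ S1) / 4              # [-1/2, 1/2, 1/2]
W = v @ np.linalg.inv(C)      # W = v C^{-1} = [-1/2, 1/2, 1/2]
print(C, v, W, sep="\n")      # boundary: -1/2 + x1/2 + x2/2 = 0, i.e. x2 = 1 - x1
```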
[Figure: Adaline with multiple output neurons. Input units $X_1, \dots, X_n$ are fully connected through weights $w_{ij}$ to output units $Y_1, \dots, Y_m$, which produce the outputs $y_1, \dots, y_m$.]
We will absorb the biases as we did before with the Perceptron. Suppose at the $k$-th step in the training process, the current weight matrix and bias vector are given by $\mathbf{W}(k)$ and $\mathbf{b}(k)$, respectively, and one of the training vectors, $\mathbf{s}(k) = \mathbf{s}^{(q)}$ for some integer $q$ between 1 and $Q$, is presented to the NN. The output of neuron $Y_j$ is
$$ y_j(k) = y_{in,j} = \sum_{i=0}^{N} s_i(k)\, w_{ij}(k). $$
However, the target is $t_j(k) = t_j^{(q)}$, and so the error is $y_j(k) - t_j(k)$. Thus we want to find a set of $w_{mn}$ that minimizes the quantity
$$ E(\mathbf{W}(k)) = \sum_{j=1}^{M} \left( \sum_{i=0}^{N} s_i(k)\, w_{ij}(k) - t_j(k) \right)^{2}. $$
Note that
$$ \frac{\partial w_{ij}}{\partial w_{mn}} = \delta_{im}\,\delta_{jn}, $$
where $\delta_{ij}$ is the Kronecker delta defined by
$$ \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} $$
The reason is that if $i$ is not the same as $m$, or $j$ is not the same as $n$, then $w_{ij}$ and $w_{mn}$ refer to two different weights and are therefore independent of each other, so the partial derivative is 0. Otherwise they refer to the same weight and the partial derivative is 1.
Taking the partial derivative of $E(\mathbf{W}(k))$ with respect to $w_{mn}$ gives
$$ \frac{\partial}{\partial w_{mn}} E(\mathbf{W}(k)) = 2 \sum_{j=1}^{M} \big( y_j(k) - t_j(k) \big) \frac{\partial y_j}{\partial w_{mn}} . $$
Since
$$ \frac{\partial w_{ij}(k)}{\partial w_{mn}} = \delta_{i,m}\,\delta_{j,n}, $$
thus
$$ \frac{\partial y_j}{\partial w_{mn}} = \frac{\partial}{\partial w_{mn}} \sum_{i=0}^{N} s_i(k)\, w_{ij}(k) = \sum_{i=0}^{N} s_i(k)\, \delta_{i,m}\,\delta_{j,n} = s_m(k)\,\delta_{j,n}, $$
and so we have
$$ \frac{\partial}{\partial w_{mn}} E(\mathbf{W}(k)) = 2 \sum_{j=1}^{M} \big( y_j(k) - t_j(k) \big) s_m(k)\,\delta_{j,n} = 2 \big( y_n(k) - t_n(k) \big) s_m(k). $$
3. Set $\mathbf{y} = \mathbf{y}_{in}$.
4. Update the weights and biases:
$$ \mathbf{W}^{\mathrm{new}} = \mathbf{W}^{\mathrm{old}} - 2\alpha\, \mathbf{x}^{T} \big( \mathbf{y} - \mathbf{t}^{(q)} \big), $$
$$ \mathbf{b}^{\mathrm{new}} = \mathbf{b}^{\mathrm{old}} - 2\alpha\, \big( \mathbf{y} - \mathbf{t}^{(q)} \big). $$
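A minimal sketch of one such update step in NumPy, using the same conventions as the earlier single-neuron sketch; the names `delta_step_multi`, `W`, `b`, and `alpha` are illustrative.

```python
import numpy as np

def delta_step_multi(W, b, x, t, alpha=0.05):
    """One Delta-rule update for an Adaline layer with M output neurons.
    W: (N, M) weights, b: (M,) biases, x: (N,) training vector, t: (M,) bipolar targets."""
    y = b + x @ W                               # identity transfer function during training
    err = y - t                                 # error vector y - t^(q), shape (M,)
    W_new = W - 2.0 * alpha * np.outer(x, err)  # W_new = W_old - 2*alpha * x^T (y - t^(q))
    b_new = b - 2.0 * alpha * err               # b_new = b_old - 2*alpha * (y - t^(q))
    return W_new, b_new

# Illustrative call with N = 2 inputs and M = 2 output neurons
W, b = delta_step_multi(np.zeros((2, 2)), np.zeros(2),
                        x=np.array([1.0, 1.0]), t=np.array([1.0, 1.0]))
print(W, b)
```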
An Example
We will now treat the same example that we have considered before for the Perceptron with multiple output neurons. We use bipolar output neurons and the training set:

(class 1)  s^(1) = [ 1  1],  s^(2) = [ 1  2],  with t^(1) = t^(2) = [ 1  1]
(class 2)  s^(3) = [ 2 -1],  s^(4) = [ 2  0],  with t^(3) = t^(4) = [ 1 -1]
(class 3)  s^(5) = [-1  2],  s^(6) = [-2  1],  with t^(5) = t^(6) = [-1  1]
(class 4)  s^(7) = [-1 -1],  s^(8) = [-2 -2],  with t^(7) = t^(8) = [-1 -1]
The exact weights and biases for this example can be computed from the correlation matrix $C$ and the vector $\mathbf{v}$ in the same way as in the previous section.
Using these exact results, we can easily see how good or bad our iterative
solutions are.
It should be remarked that the most robust set of weights and biases is
determined only by a few training vectors that lie very close to the decision
boundaries. However in the Delta rule, all training vectors contribute in
some way. Therefore the set of weights and biases obtained by the Delta
rule is not necessarily always the most robust.
The Delta rule usually gives convergent results if the learning rate is not too large. The resulting set of weights and biases typically leads to correct classification of all the training vectors, provided such a set exists. How close this set is to the best choice depends on the starting weights and biases, the learning rate, and the number of iterations. We find that for this example much better convergence can be obtained if the learning rate at step $k$ is set to $\alpha = 1/k$.
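A sketch of how such a decaying schedule could be wired into the training loop for this multi-output case; apart from the schedule $\alpha = 1/k$ taken from the remark above, the function name, epoch count, and array conventions are illustrative choices.

```python
import numpy as np

def train_adaline_decay(S, T, max_epochs=200):
    """Multi-output Delta-rule training with the decaying learning rate alpha = 1/k.
    S: (Q, N) training vectors, T: (Q, M) bipolar targets."""
    Q, N = S.shape
    M = T.shape[1]
    W = np.zeros((N, M))
    b = np.zeros(M)
    k = 0
    for _ in range(max_epochs):
        for q in range(Q):
            k += 1
            alpha = 1.0 / k                         # learning rate shrinks toward zero
            y = b + S[q] @ W                        # identity transfer function
            err = y - T[q]
            W -= 2.0 * alpha * np.outer(S[q], err)  # Delta-rule updates
            b -= 2.0 * alpha * err
    return W, b
```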