
Widrow & Hoff Learning Rule
Dr. Nitish Katal

1
Outline

• Delta Learning Rule


• Single & Multiple Output Layers
• Choice of Learning Rate

2
Learning by Error Minimization
• What are the best learning parameters?
• Learn from your mistakes.

• For an input (x_i, t) in the training set,
• The cost of a mistake is (t − y_IN), i.e. (t − Σ_i x_i w_i)

• Define the cost (or loss) for a particular weight vector w to be:
• The sum of squared costs over the training set

E = \frac{1}{2}\,(t - y_{IN})^2

• One strategy for learning:
• Find the w with the least cost on this data
3
Delta Rule /
Widrow & Hoff Learning rule
• Learning: minimizing the mean squared error

E = \frac{1}{2}\,(t - y_{IN})^2

• Different strategies exist for learning by optimization
• Gradient descent is a popular algorithm

• The Delta rule for adjusting the weight for each pattern is given as:

\Delta w_I = \alpha\,(t - y_{IN})\,x_I

4
Delta Rule
Now:
x : vector of activations of the inputs
y_IN : net input to the output unit Y
t : target

y_{IN} = \sum_{i=1}^{n} x_i w_i

Then, the squared error for a particular training pattern is:

E = \frac{1}{2}\,(t - y_{IN})^2

where E is a function of all of the weights w_i, i = 1, …, n
5
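To make these definitions concrete, here is a minimal Python sketch (the names net_input, squared_error, and the sample values are illustrative assumptions, not from the slides) that computes the net input y_IN and the squared error E for one training pattern:

```python
# Minimal sketch: net input and squared error for a single training pattern.
def net_input(x, w):
    # y_IN = sum_i x_i * w_i  (the bias term appears later, in the ADALINE algorithm)
    return sum(xi * wi for xi, wi in zip(x, w))

def squared_error(t, y_in):
    # E = 1/2 * (t - y_IN)^2
    return 0.5 * (t - y_in) ** 2

x = [1, -1]        # one bipolar input pattern (assumed example)
w = [0.3, 0.2]     # current weights (assumed example)
t = -1             # target for this pattern

y_in = net_input(x, w)
print(y_in, squared_error(t, y_in))   # -> 0.1 and 0.605
```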
Delta Rule … (contd.)
Gradient of E:

\nabla E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_n} \right)

• The gradient points in the direction of the most rapid increase in E;
• The opposite direction gives the most rapid decrease in the error, so each weight is adjusted in the direction of -\frac{\partial E}{\partial w_I}.

Since:

\frac{\partial E}{\partial w_I} = \frac{\partial}{\partial w_I}\left[ \frac{1}{2}\,(t - y_{IN})^2 \right]
                               = -(t - y_{IN})\,\frac{\partial y_{IN}}{\partial w_I}
                               = -(t - y_{IN})\,x_I
6
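As a sanity check on this result, the following Python sketch (all names and values are assumptions for illustration) compares the analytic gradient −(t − y_IN)·x_I with a central finite-difference estimate for one pattern; the two should agree closely:

```python
# Sketch: verify dE/dw_I = -(t - y_IN) * x_I numerically for one pattern.
def net_input(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def error(x, w, t):
    return 0.5 * (t - net_input(x, w)) ** 2

x, w, t = [1.0, -1.0], [0.2, 0.4], -1.0   # assumed example values
eps = 1e-6

for I in range(len(w)):
    analytic = -(t - net_input(x, w)) * x[I]
    w_plus = list(w);  w_plus[I] += eps
    w_minus = list(w); w_minus[I] -= eps
    numeric = (error(x, w_plus, t) - error(x, w_minus, t)) / (2 * eps)
    print(I, analytic, numeric)
```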
Delta Rule … (contd.)
Derivation:

\nabla E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_n} \right)

So:

\frac{\partial E}{\partial w_I} = \frac{\partial}{\partial w_I}\left[ \frac{1}{2}\,(t - y_{IN})^2 \right]

We know that:

y_{IN} = \sum_{i=1}^{n} x_i w_i

Thus:

\frac{\partial E}{\partial w_I} = \frac{\partial}{\partial w_I}\left[ \frac{1}{2}\left( t - \sum_{i=1}^{n} x_i w_i \right)^2 \right]
                               = \left( t - \sum_{i=1}^{n} x_i w_i \right)\frac{\partial}{\partial w_I}\left( t - \sum_{i=1}^{n} x_i w_i \right)
                               = -\left( t - \sum_{i=1}^{n} x_i w_i \right) x_I

• The gradient points in the direction of the most rapid increase in E;
• The opposite direction gives the most rapid decrease in the error.
7
Summary : Delta/WH/LMS Rule

\Delta w_I = \alpha\,(t - y_{IN})\,x_I

w_i(new) = w_i(old) + \alpha\,(t - y_{IN})\,x_i

b(new) = b(old) + \alpha\,(t - y_{IN})

8
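A minimal Python sketch of this update rule for a single output unit (the function and variable names are illustrative, not from the slides):

```python
# One delta-rule (Widrow-Hoff / LMS) update for a single training pattern:
#   w_i(new) = w_i(old) + alpha * (t - y_IN) * x_i
#   b(new)   = b(old)   + alpha * (t - y_IN)
def delta_update(w, b, x, t, alpha):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    err = t - y_in
    w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
    b = b + alpha * err
    return w, b

w, b = delta_update(w=[0.1, 0.1], b=0.1, x=[1, 1], t=1, alpha=0.1)
print(w, b)   # -> approximately [0.17, 0.17] and 0.17 for these assumed starting values
```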
Choice of Learning Rate

9
Summary
• Real-life problems are complex and non-convex.
• Purpose of NN learning or training:
  • Minimise the output errors for a particular set of training data
  • By adjusting the network weights w_ij.
• Error function E(w_ij)
  • "Measures" how far the current network is from the desired one.
• Gradient of the error function
  • The partial derivatives ∂E(w_ij)/∂w_ij
  • Guide the direction to move in weight space to reduce the error.
• The learning rate α specifies:
  • The step sizes we take in weight space for each iteration of the weight-update equation.
• Keep iterating
  • Through weight space until the errors are 'small enough' (a loop sketch follows below).
• Choice of activation functions with derivatives

11
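The iteration summarized above can be sketched as a short training loop; this is only an illustration under assumed defaults (α = 0.1, an error tolerance, and a maximum epoch count as a safety cap), not a prescribed implementation:

```python
# Sketch of the overall cycle: apply delta-rule updates over the training set
# and keep iterating until the summed error is 'small enough' (or max_epochs is hit).
def train(patterns, alpha=0.1, tol=0.05, max_epochs=100):
    n = len(patterns[0][0])
    w, b = [0.0] * n, 0.0
    for epoch in range(max_epochs):
        total_error = 0.0
        for x, t in patterns:
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            err = t - y_in
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
            total_error += 0.5 * err ** 2
        if total_error < tol:      # stop once the errors are 'small enough'
            break
    return w, b, total_error
```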
ADALINE
Learning Rule & Architecture

12
Outline

• ADALINE
• Architecture
• Learning Algorithm
• Example
ADALINE : Introduction
• Adaline: Adaptive Linear Neuron
• A NN having a single linear unit.
• Developed by Widrow and Hoff in 1960.

• Features of Adaline
  • Uses a bipolar activation function:

    f(y_{IN}) = \begin{cases} +1 & \text{if } y_{IN} \ge 0 \\ -1 & \text{if } y_{IN} < 0 \end{cases}

  • Uses the delta rule for training, to minimize the MSE between the actual output and the desired/target output.
  • The weights and the bias are adjustable.
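The bipolar activation above can be written as a one-line Python helper (the name bipolar_activation is an assumption for illustration):

```python
def bipolar_activation(y_in):
    # f(y_IN) = +1 if y_IN >= 0, else -1
    return 1 if y_in >= 0 else -1
```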
Algorithm
Step 0: Initialize all weights w_i to small random values (i = 1 … n, where n is the number of input neurons).
        Set the learning rate α.
Step 1: While the stopping criterion is FALSE, do Steps 2 - 6.
Step 2: For each bipolar training pair, inputs s and target t, do Steps 3 - 5.
Step 3: Set the activations of the input units:
        x_i = s_i (i = 1 … n)
Step 4: Compute the net input to the output unit:
        y_{IN} = b + \sum_i x_i w_i
Step 5: Update the weights and bias (i = 1 … n):
        w_i(new) = w_i(old) + \alpha\,(t - y_{IN})\,x_i
        b(new) = b(old) + \alpha\,(t - y_{IN})
Step 6: Test the stopping criterion:
        If the largest weight change in Step 2 is smaller than a specified tolerance, then STOP; else continue.
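Below is a hedged Python sketch of Steps 0-6 (small random initialization, per-pattern updates, and the largest-weight-change stopping test). The defaults and the function name adaline_train are assumptions for illustration; note that the tolerance has to be chosen with the learning rate in mind, since per-pattern weight changes scale with α, and max_epochs caps the loop if the tolerance is never reached.

```python
import random

# Sketch of the ADALINE training algorithm (Steps 0-6).
def adaline_train(training_pairs, alpha=0.1, tol=0.01, max_epochs=1000):
    n = len(training_pairs[0][0])
    # Step 0: initialize weights and bias to small random values; set the learning rate.
    w = [random.uniform(-0.1, 0.1) for _ in range(n)]
    b = random.uniform(-0.1, 0.1)
    for epoch in range(max_epochs):              # Step 1: repeat until stopping criterion
        largest_change = 0.0
        for s, t in training_pairs:              # Step 2: each bipolar pair (s, t)
            x = list(s)                          # Step 3: x_i = s_i
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4: net input
            err = t - y_in
            for i in range(n):                   # Step 5: update weights and bias
                dw = alpha * err * x[i]
                w[i] += dw
                largest_change = max(largest_change, abs(dw))
            b += alpha * err
        if largest_change < tol:                 # Step 6: largest weight change < tolerance?
            break
    return w, b
```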
Choice of Learning Rate
• Hecht-Nielsen (1990) proposed:
  • An upper bound for its value can be found from the largest eigenvalue of the correlation matrix R of the input vectors x(p):

    R = \frac{1}{P} \sum_{p=1}^{P} x(p)^T x(p)

  • α < one-half the reciprocal of the largest eigenvalue of R.
  • Commonly a small value is chosen (α = 0.1).

• For single-layer neurons with n inputs, Widrow et al. (1988) proposed:

    0.1 \le n\alpha \le 1.0
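A small numpy sketch of this bound, assuming the input vectors x(p) are stored as the rows of a matrix X (the example bipolar patterns and the variable names are assumptions for illustration):

```python
import numpy as np

# Correlation matrix R of the input vectors and the resulting learning-rate bound.
X = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]], dtype=float)  # rows are x(p)
P = X.shape[0]
R = X.T @ X / P                          # R = (1/P) * sum_p x(p)^T x(p)
lam_max = np.max(np.linalg.eigvalsh(R))  # largest eigenvalue of the symmetric matrix R
alpha_upper = 0.5 / lam_max              # alpha < one-half the reciprocal of lam_max
print(R, lam_max, alpha_upper)
```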
Application
• Bipolar Inputs & Outputs for AND Logic

x1   x2   B    Target t
 1    1   1     1
-1    1   1    -1
 1   -1   1    -1
-1   -1   1    -1
Application
Initialization:
• Set w1, w2, b, α = some small value

Update rule:
w_i(new) = w_i(old) + \alpha\,(t - y_{IN})\,x_i
b(new) = b(old) + \alpha\,(t - y_{IN})

Inputs          Target   Net      Error        Weight Changes       Weights          MSE
x1   x2   B     t        y_IN     t − y_IN     Δw1   Δw2   Δb       w1   w2   b
1 1 1 1
-1 1 1 -1
1 -1 1 -1
-1 -1 1 -1
Sum of Error
1 1 1 1
-1 1 1 -1
1 -1 1 -1
-1 -1 1 -1
Sum of Error
Application
Initialization:
• Set w1, w2, b, α = some small value

Update rule:
w_i(new) = w_i(old) + \alpha\,(t - y_{IN})\,x_i
b(new) = b(old) + \alpha\,(t - y_{IN})

Inputs          Target   Net      Error        Weight Changes       Weights          MSE
x1   x2   B     t        y_IN     t − y_IN     Δw1   Δw2   Δw3      w1   w2   w3
1 1 1 1
-1 1 1 -1
1 -1 1 -1
-1 -1 1 -1
Sum of Error
1 1 1 1
-1 1 1 -1
1 -1 1 -1
-1 -1 1 -1
Sum of Error
w_i(new) = w_i(old) + \alpha\,(t - y_{IN})\,x_i
b(new) = b(old) + \alpha\,(t - y_{IN})
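The blank worksheet above can be filled in by running the updates directly. The sketch below assumes starting values w1 = w2 = b = 0.1 and α = 0.1 (illustrative choices, not values given on the slides) and prints one worksheet row per training pattern for two epochs, plus the summed squared error per epoch:

```python
# Fill in the worksheet: two epochs of delta-rule updates on bipolar AND.
data = [([1, 1], 1), ([-1, 1], -1), ([1, -1], -1), ([-1, -1], -1)]
w1 = w2 = b = 0.1      # assumed small initial values
alpha = 0.1            # assumed learning rate

for epoch in range(1, 3):
    sum_sq_error = 0.0
    for (x1, x2), t in data:
        y_in = b + x1 * w1 + x2 * w2
        err = t - y_in
        dw1, dw2, db = alpha * err * x1, alpha * err * x2, alpha * err
        w1, w2, b = w1 + dw1, w2 + dw2, b + db
        sum_sq_error += err ** 2
        print(f"{x1:3d} {x2:3d}  t={t:3d}  y_in={y_in:7.3f}  err={err:7.3f}  "
              f"dw1={dw1:6.3f}  dw2={dw2:6.3f}  db={db:6.3f}  "
              f"w1={w1:6.3f}  w2={w2:6.3f}  b={b:6.3f}")
    print(f"Epoch {epoch}: sum of squared errors = {sum_sq_error:.4f}")
```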
Application
w_i(new) = w_i(old) + \alpha\,(t - y_{IN})\,x_i
b(new) = b(old) + \alpha\,(t - y_{IN})
References
• L. Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications"
  • Chapter: 2
  • Topics: 2.4, 2.4.1, 2.4.2

• Machine Learning week 1: Cost Function, Gradient Descent and Univariate Linear
Regression
• https://bit.ly/3hSveVF

22
Thank You!

23
