Widrow & Hoff Learning Rule
Dr. Nitish Katal
Outline
Learning by Error Minimization
• What are the best learning parameters?
• Learn from your mistakes
• Define the cost (or loss) for a particular weight vector w to be:
• Sum of squared errors over the training set
$$E = \frac{1}{2}\left(t - y_{IN}\right)^{2}$$
• One strategy for learning:
• Find the w with least cost on this data
Delta Rule /
Widrow & Hoff Learning Rule
• Learning: minimizing mean squared error
$$E = \frac{1}{2}\left(t - y_{IN}\right)^{2}$$
• Different strategies exist for learning by optimization
• Gradient descent is a popular algorithm
• The Delta rule for adjusting each weight for a given training pattern is
$$\Delta w_I = \alpha\,(t - y_{IN})\,x_I$$
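As an illustration (not from the original slides), a minimal NumPy sketch of a single Delta-rule update; the variable names and values are assumed for the example:

```python
import numpy as np

# Minimal sketch of one Delta-rule (Widrow-Hoff) update for a single linear unit.
# The concrete values below are assumptions chosen only for illustration.
alpha = 0.1                      # learning rate
w = np.zeros(3)                  # weights for 3 inputs
x = np.array([1.0, -1.0, 1.0])   # input activations for one training pattern
t = 1.0                          # target for this pattern

y_in = np.dot(x, w)              # net input  y_IN = sum_i x_i * w_i
w += alpha * (t - y_in) * x      # Delta rule: w_I <- w_I + alpha * (t - y_IN) * x_I

print("error before update:", 0.5 * (t - y_in) ** 2)
print("updated weights:", w)
```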
Delta Rule
Now:
x : Vector of activations of the inputs
yIN : Net input to the output unit Y
t : Target
$$y_{IN} = \sum_{i=1}^{n} x_i w_i$$
Then, the squared error for a particular training pattern is:
$$E = \frac{1}{2}\left(t - y_{IN}\right)^{2}$$
So:
$$\frac{\partial E}{\partial w_I} = \frac{\partial}{\partial w_I}\left[\frac{1}{2}\bigl(t - y_{IN}\bigr)^{2}\right] = -\Bigl(t - \sum_{i=1}^{n} x_i w_i\Bigr)\,\frac{\partial}{\partial w_I}\Bigl(\sum_{i=1}^{n} x_i w_i\Bigr) = -\bigl(t - y_{IN}\bigr)\,x_I$$
Stepping in the direction of the negative gradient (with step size $\alpha$) therefore gives the Delta rule update $\Delta w_I = -\alpha\,\partial E/\partial w_I = \alpha\,(t - y_{IN})\,x_I$.
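As a quick sanity check of this derivative (an illustrative sketch with assumed toy values, not part of the original slides), the analytic gradient can be compared against a finite-difference approximation:

```python
import numpy as np

# Compare the analytic gradient dE/dw_I = -(t - y_IN) * x_I
# with a central finite-difference approximation (toy values assumed).
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, -0.3])
t = 1.0
eps = 1e-6

def error(weights):
    """Squared error E = 1/2 * (t - y_IN)^2 for one pattern."""
    return 0.5 * (t - np.dot(x, weights)) ** 2

analytic = -(t - np.dot(x, w)) * x            # gradient from the derivation above
numeric = np.array([
    (error(w + eps * np.eye(3)[i]) - error(w - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print("analytic:", analytic)
print("numeric :", numeric)                    # should agree to several decimal places
```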
Summary : Delta/WH/LMS Rule
$$\Delta w_I = \alpha\,(t - y_{IN})\,x_I$$
Summary
• Real-life problems are complex and non-convex.
• Purpose of NN Learning or Training:
• Minimise the output errors for a particular set of training data
• By adjusting the network weights wij.
• Error Function E(wij)
• “Measures” how far the current network is from the
desired one.
• Gradient of the Error Function
• Partial derivatives of error function ∂E(wij)/∂wij
• Guides the direction to move in weight space to
reduce the error.
• The learning rate 𝛼 specifies:
• The step sizes we take in weight space for each
iteration of the weight update equation.
• Keep iterating through weight space until the errors are ‘small enough’.
• Choice of activation functions with derivatives
ADALINE
Learning Rule & Architecture
Outline
• ADALINE
• Architecture
• Learning Algorithm
• Example
ADALINE : Introduction
• ADALINE: Adaptive Linear Neuron
• A neural network with a single linear unit.
• Features of Adaline
• Uses a bipolar activation function:
$$f(y_{IN}) = \begin{cases} +1 & \text{if } y_{IN} \ge 0 \\ -1 & \text{if } y_{IN} < 0 \end{cases}$$
• Uses the delta rule for training to minimize the MSE between the actual output and the desired/target output.
• The weights and the bias are adjustable.
Algorithm
Step 0 : Initialize all weights $w_i$ to small random values (i = 1 … n, where n is the number of input neurons).
         Set the learning rate $\alpha$.
Step 1 : While the stopping criterion is false, do Steps 2 - 6
Step 2 : For each bipolar training pair (input vector s, target t), do Steps 3 - 5
Step 3 : Set activations for the input units: $x_i = s_i$ (i = 1 … n)
Step 4 : Find the net input to the output unit:
$$y_{IN} = b + \sum_{i} x_i w_i$$
Step 5 : Update the weights and bias (i = 1 … n):
$$w_i(\text{new}) = w_i(\text{old}) + \alpha\,(t - y_{IN})\,x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha\,(t - y_{IN})$$
Step 6 : Check Stopping Criteria
If the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; else continue.
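Steps 0 - 6 translate directly into a short training loop. The sketch below is one possible implementation (the function name train_adaline and the default parameter values are assumptions, not part of the slides); it assumes bipolar inputs and targets as stated in Step 2:

```python
import numpy as np

def train_adaline(patterns, targets, alpha=0.1, tol=1e-4, max_epochs=1000):
    """ADALINE training loop following Steps 0-6 above (illustrative sketch)."""
    n = patterns.shape[1]
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.1, 0.1, size=n)        # Step 0: small random weights ...
    b = rng.uniform(-0.1, 0.1)                # ... and bias; alpha is the learning rate

    for _ in range(max_epochs):               # Step 1: repeat while stopping criterion is false
        largest_change = 0.0
        for x, t in zip(patterns, targets):   # Step 2: for each bipolar training pair (s, t)
            y_in = b + np.dot(x, w)           # Steps 3-4: x_i = s_i, net input y_IN
            dw = alpha * (t - y_in) * x       # Step 5: weight update
            db = alpha * (t - y_in)           #         bias update
            w += dw
            b += db
            largest_change = max(largest_change, float(np.max(np.abs(dw))), abs(db))
        if largest_change < tol:              # Step 6: stop when the largest change is small
            break
    return w, b
```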
Choice of Learning Rate
• Hecht-Nielsen (1990) proposed:
• An upper bound on its value can be found from the largest eigenvalue of the correlation matrix R of the input vectors x(p).
$$R = \frac{1}{P}\sum_{p=1}^{P} x(p)^{T}\,x(p)$$
• $\alpha$ should be less than half the reciprocal of the largest eigenvalue of R, i.e. $\alpha < 1/(2\lambda_{max})$.
• Commonly a small value is chosen (𝛼 = 0.1)
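A small sketch (assuming NumPy and, for concreteness, the four bipolar input vectors of the example that follows, including the bias input) of computing R and the learning-rate bound stated above:

```python
import numpy as np

# Correlation matrix R of the input vectors and the resulting bound on alpha.
# The four bipolar patterns below are taken from the example on the next slide.
X = np.array([[ 1,  1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [-1, -1, 1]], dtype=float)
P = X.shape[0]

# R = (1/P) * sum_p x(p)^T x(p)  (outer products of the input row vectors)
R = sum(np.outer(x, x) for x in X) / P

lam_max = np.max(np.linalg.eigvalsh(R))   # largest eigenvalue of the symmetric matrix R
alpha_upper = 0.5 / lam_max               # bound as stated above: alpha < 1/(2*lambda_max)
print("largest eigenvalue:", lam_max, "-> alpha should stay below", alpha_upper)
```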
Application
Training data (inputs, bias input b, and target):

 x1   x2    b    t
  1    1    1    1
 -1    1    1   -1
  1   -1    1   -1
 -1   -1    1   -1

Initialization:
• Set w1, w2, b, and $\alpha$ to some small values.
Per-pattern updates (as in Step 5):
$$w_i(\text{new}) = w_i(\text{old}) + \alpha\,(t - y_{IN})\,x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha\,(t - y_{IN})$$
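The targets correspond to the logical AND of the inputs in bipolar form. As a sketch of how the first epoch of updates would proceed on this data (the initial values w1 = w2 = b = 0.1 and α = 0.1 are chosen here purely for illustration):

```python
import numpy as np

# Trace one epoch of ADALINE updates on the four bipolar training pairs above.
# Initial w1, w2, b and alpha are assumed small values chosen for illustration.
patterns = np.array([[ 1,  1], [-1,  1], [ 1, -1], [-1, -1]], dtype=float)
targets = np.array([1, -1, -1, -1], dtype=float)

w = np.array([0.1, 0.1])
b = 0.1
alpha = 0.1

for x, t in zip(patterns, targets):
    y_in = b + np.dot(x, w)                  # Step 4: net input
    w = w + alpha * (t - y_in) * x           # Step 5: weight update
    b = b + alpha * (t - y_in)               #         bias update
    print(f"x={x}, t={t:+.0f}, y_in={y_in:+.2f}, w={np.round(w, 3)}, b={b:+.3f}")
```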
• Machine Learning week 1: Cost Function, Gradient Descent and Univariate Linear Regression
• https://bit.ly/3hSveVF
Thank You!