Learning Rules

This document discusses artificial neural network learning rules. It begins by introducing neural networks and their basic components and mathematics. It then discusses various neural network learning rules including Hebbian, perceptron, delta, and correlation rules. Key aspects of the general learning rule framework are presented. Examples are provided to illustrate how weights are updated under the Hebbian and perceptron learning rules.


Artificial Neural Network Learning Rules

Presented By:
Suman
Asstt. Prof.
S.S.I.E.T., DerraBassi
Neural Network

[Figure: a neural network maps inputs to outputs through weighted connections between cells.]

Artificial Neural Network

[Figure: a single artificial neuron with inputs x0 = 1, x1, ..., xn, weights w0, w1, ..., wn, and activation function g applied to the weighted sum.]
Neural Network Mathematics

A two-layer feedforward network computes its output in stages.

First layer, for inputs x1, ..., x4:
y_i^1 = f(x_i, w_i^1),  i = 1, 2, 3, 4

Second layer, taking the first-layer output vector y^1:
y_j^2 = f(y^1, w_j^2),  j = 1, 2, 3

Output layer:
out = f(y^2, w^3)
Neural Network Learning Rules

A neuron is considered to be an adaptive element. Its weights are modifiable depending on the input signal it receives, its output value, and the associated teacher response.

In some cases the teacher signal is not available and no error information can be used; the neuron then modifies its weights based only on the input and/or output. This is a case of unsupervised learning.
Types of Learning Rules

Hebbian
Perceptron
Delta
Widrow-Hoff
Correlation
Winner-take-all
Outstar
General Learning Rule

The weight vector
w_i = [w_i1  w_i2  ...  w_in]^t
increases in proportion to the product of the input x and the learning signal r.

The learning signal r is in general a function of w_i, x, and sometimes of the teacher's signal d_i:
r = r(w_i, x, d_i)

The increment of the weight vector w_i produced by the learning step at time t, according to the general rule, is
Δw_i(t) = c r[w_i(t), x(t), d_i(t)] x(t)
where c is a positive number called the learning constant, which determines the rate of learning.

The weight vector adapted at time t becomes, at the next instant or learning step,
w_i(t+1) = w_i(t) + c r[w_i(t), x(t), d_i(t)] x(t)

With the superscript denoting the discrete-time training step, the kth step is
w_i^(k+1) = w_i^k + c r(w_i^k, x^k, d_i^k) x^k

Learning thus assumes the form of a sequence of discrete-time weight modifications.
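The update loop above can be sketched in code. This is a minimal illustration (not part of the original slides), assuming NumPy; the learning-signal function r is passed in, so each rule discussed later plugs in its own r:

```python
import numpy as np

def train(w, samples, r, c=0.1):
    """General learning rule: w(t+1) = w(t) + c * r(w, x, d) * x.

    `r` is the rule-specific learning signal r(w, x, d); the desired
    response d may be None for unsupervised rules.
    """
    w = np.asarray(w, dtype=float).copy()
    for x, d in samples:
        w += c * r(w, x, d) * x
    return w

# Example: plugging in the Hebbian signal r = f(w.x) with discrete f = sgn
hebb = lambda w, x, d: np.sign(w @ x)
w = train([1.0, -1.0, 0.0, 0.5],
          [(np.array([1, -2, 1.5, 0]), None)], hebb, c=1.0)
print(w)  # [ 2.  -3.   1.5  0.5]
```

Passing r as a function mirrors the slides' framework: every rule below is this same loop with a different learning signal.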
Hebbian Learning Rule

The Hebbian rule represents purely feedforward, unsupervised learning.

Statement
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

The learning signal is simply equal to the neuron's output:
r = f(w_i^t x)

The increment of the weight vector becomes
Δw_i = c f(w_i^t x) x

The single weight is adapted using the increment
Δw_ij = c f(w_i^t x) x_j
which can be written as
Δw_ij = c o_i x_j

The rule requires the weights to be initialized at small values around 0 prior to learning. It states that if the cross product of output and input (the correlation term o_i x_j) is positive, the weight w_ij increases; otherwise the weight decreases.
Example

Assume the network shown in the figure, with initial weight vector
w^1 = [1  -1  0  0.5]^t
which needs to be trained using the set of three input vectors
x1 = [1  -2  1.5  0]^t
x2 = [1  -0.5  -2  -1.5]^t
x3 = [0  1  -1  1.5]^t
for an arbitrary choice of learning constant c = 1.
Network for Training
Step 1

Input x1 applied to the network results in activation
net^1 = w^1t x1 = [1  -1  0  0.5] [1  -2  1.5  0]^t = 3

The updated weights are
w^2 = w^1 + sgn(net^1) x1 = w^1 + x1 = [2  -3  1.5  0.5]^t
Step 2

The learning step with input x2:
net^2 = w^2t x2 = [2  -3  1.5  0.5] [1  -0.5  -2  -1.5]^t = -0.25

The updated weights are
w^3 = w^2 + sgn(net^2) x2 = w^2 - x2 = [1  -2.5  3.5  2]^t
Step 3

For input x3 we obtain
net^3 = w^3t x3 = [1  -2.5  3.5  2] [0  1  -1  1.5]^t = -3

The updated weights are
w^4 = w^3 + sgn(net^3) x3 = w^3 - x3 = [1  -3.5  4.5  0.5]^t
It can be seen that learning with discrete f(net) and c = 1 results in adding the entire input pattern vector to, or subtracting it from, the weight vector. In the case of a continuous f(net), the incrementing/decrementing vector is scaled down to a fractional value of the input pattern.
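The three steps above can be reproduced with a short script. This is an illustrative sketch (not from the slides), assuming NumPy and implementing sgn with np.sign:

```python
import numpy as np

def hebbian_discrete(w, inputs, c=1.0):
    """Hebbian rule with discrete activation: w <- w + c * sgn(w.x) * x."""
    w = np.asarray(w, dtype=float).copy()
    for x in inputs:
        w += c * np.sign(w @ x) * x
    return w

w1 = [1, -1, 0, 0.5]
xs = [np.array([1, -2, 1.5, 0]),
      np.array([1, -0.5, -2, -1.5]),
      np.array([0, 1, -1, 1.5])]
print(hebbian_discrete(w1, xs))  # [ 1.  -3.5  4.5  0.5]
```

The final vector matches w^4 from Step 3: each step adds or subtracts a whole input pattern.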
Example with Bipolar Continuous Activation

Solving the previous example with the bipolar continuous activation function
f(net) = 2 / (1 + e^(-λ net)) - 1,  λ = 1
the same inputs x_i and initial weight w^1 give:

STEP 1
f(net^1) = 0.905
w^2 = [1.905  -2.81  1.357  0.5]^t

STEP 2
f(net^2) = -0.077
w^3 = [1.828  -2.772  1.512  0.616]^t

STEP 3
f(net^3) = -0.932
w^4 = [1.828  -3.70  2.44  -0.783]^t
Comparison of learning using the discrete and continuous activation functions indicates that the weight adjustments are tapered for continuous f(net) but are generally in the same direction.
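Swapping the sign function for the bipolar sigmoid reproduces these tapered updates. Again an illustrative sketch, not from the slides:

```python
import numpy as np

def bipolar(net, lam=1.0):
    """Bipolar continuous activation f(net) = 2/(1 + exp(-lam*net)) - 1."""
    return 2.0 / (1.0 + np.exp(-lam * net)) - 1.0

def hebbian_continuous(w, inputs, c=1.0):
    """Hebbian rule with continuous activation: w <- w + c * f(w.x) * x."""
    w = np.asarray(w, dtype=float).copy()
    for x in inputs:
        w += c * bipolar(w @ x) * x
    return w

xs = [np.array([1, -2, 1.5, 0]),
      np.array([1, -0.5, -2, -1.5]),
      np.array([0, 1, -1, 1.5])]
print(hebbian_continuous([1, -1, 0, 0.5], xs))
# approximately [ 1.828 -3.704  2.445 -0.783]
```

Note how each increment is a fraction |f(net)| < 1 of the input pattern rather than the whole pattern.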
Perceptron Learning Rule

For the perceptron learning rule, the learning signal is the difference between the desired and actual neuron response. Learning is thus supervised, and the learning signal is equal to
r = d_i - o_i
where o_i = sgn(w_i^t x) and d_i is the desired response.

The weight adjustments in this method are obtained as follows:
Δw_i = c [d_i - sgn(w_i^t x)] x
and, for j = 1, 2, ..., n,
Δw_ij = c [d_i - sgn(w_i^t x)] x_j

Note that this rule is applicable only for binary neuron responses. The weights are adjusted if and only if o_i is incorrect. Since the desired response is either 1 or -1, the weight adjustment reduces to
Δw_i = ±2cx
with a plus sign when d_i = 1 and sgn(w_i^t x) = -1, and a minus sign when d_i = -1 and sgn(w_i^t x) = 1. The weight adjustment is inherently zero when the desired and actual responses agree.
Perceptron Learning Rule Example

Input vectors:
x1 = [1  -2  0  -1]^t
x2 = [0  1.5  -0.5  -1]^t
x3 = [-1  1  0.5  -1]^t

Initial weight vector:
w^1 = [1  -1  0  0.5]^t

Learning constant c = 0.1.

The teacher's desired responses for x1, x2, x3 are
d1 = -1, d2 = -1, d3 = 1.
Step 1

Input is x1, desired output is d1:
net^1 = w^1t x1 = [1  -1  0  0.5] [1  -2  0  -1]^t = 2.5

Correction in this step is necessary since d1 = -1 ≠ sgn(2.5). The updated weight vector is
w^2 = w^1 + 0.1 (-1 - 1) x1 = w^1 - 0.2 x1 = [0.8  -0.6  0  0.7]^t
Step 2

Input is x2, desired output is d2. For the present vector w^2 the activation value is
net^2 = w^2t x2 = [0.8  -0.6  0  0.7] [0  1.5  -0.5  -1]^t = -1.6

Correction is not performed in this step since d2 = sgn(-1.6) = -1, so w^3 = w^2.
Step 3

Input is x3, desired output is d3, present vector is w^3:
net^3 = w^3t x3 = [0.8  -0.6  0  0.7] [-1  1  0.5  -1]^t = -2.1

Correction is necessary in this step since d3 = 1 ≠ sgn(-2.1). The updated weight values are
w^4 = w^3 + 0.1 (1 + 1) x3 = w^3 + 0.2 x3 = [0.6  -0.4  0.1  0.5]^t
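The three perceptron steps can be checked with a short script (an illustrative sketch, not from the slides, assuming NumPy):

```python
import numpy as np

def perceptron_train(w, samples, c=0.1):
    """Discrete perceptron rule: w <- w + c * (d - sgn(w.x)) * x."""
    w = np.asarray(w, dtype=float).copy()
    for x, d in samples:
        o = 1.0 if w @ x > 0 else -1.0   # o = sgn(net)
        w += c * (d - o) * x             # zero update when o == d
    return w

w1 = [1, -1, 0, 0.5]
samples = [(np.array([1, -2, 0, -1]), -1),
           (np.array([0, 1.5, -0.5, -1]), -1),
           (np.array([-1, 1, 0.5, -1]), 1)]
print(perceptron_train(w1, samples))  # [ 0.6 -0.4  0.1  0.5]
```

Step 2 leaves the weights unchanged, exactly as in the worked example, because the sign of net already agrees with d2.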
Delta Learning Rule

The delta rule is only valid for continuous activation functions, in the supervised training mode. The learning signal for this rule is called delta and is defined as follows:
r = [d_i - f(w_i^t x)] f'(w_i^t x)

The term f'(w_i^t x) is the derivative of the activation function f(net) computed for net = w_i^t x.
Delta Learning Rule

The learning rule can be readily derived from the condition of least squared error between o_i and d_i. Calculating the gradient vector with respect to w_i of the squared error, defined as
E = (1/2) (d_i - o_i)^2 = (1/2) [d_i - f(w_i^t x)]^2

we obtain the error gradient vector
∇E = -(d_i - o_i) f'(w_i^t x) x

The components of the gradient vector are
∂E/∂w_ij = -(d_i - o_i) f'(w_i^t x) x_j,  for j = 1, 2, ..., n

Since minimization of the error requires the weight changes to be in the negative gradient direction, we take
Δw_i = -c ∇E = c (d_i - o_i) f'(net_i) x
where c is a positive constant. For the single weight, the adjustment becomes
Δw_ij = c (d_i - o_i) f'(net_i) x_j,  for j = 1, 2, ..., n

The weight adjustment is thus computed based on minimization of the squared error.
Example

Input vectors:
x1 = [1  -2  0  -1]^t
x2 = [0  1.5  -0.5  -1]^t
x3 = [-1  1  0.5  -1]^t

Initial weight vector:
w^1 = [1  -1  0  0.5]^t

Learning constant c = 0.1, and λ = 1.

The teacher's desired responses for x1, x2, x3 are
d1 = -1, d2 = -1, d3 = 1.
Step 1

Input is vector x1, initial weight vector is w^1:
net^1 = w^1t x1 = 2.5
o^1 = f(net^1) = 0.848
f'(net^1) = (1/2) [1 - (o^1)^2] = 0.140
w^2 = c (d1 - o^1) f'(net^1) x1 + w^1 = [0.974  -0.948  0  0.526]^t
Step 2

Input is vector x2, weight vector is w^2:
net^2 = w^2t x2 = -1.948
o^2 = f(net^2) = -0.75
f'(net^2) = (1/2) [1 - (o^2)^2] = 0.218
w^3 = c (d2 - o^2) f'(net^2) x2 + w^2 = [0.974  -0.956  0.002  0.531]^t


Step 3

Input vector is x3 and weight vector is w^3:
net^3 = w^3t x3 = -2.46
o^3 = f(net^3) = -0.842
f'(net^3) = (1/2) [1 - (o^3)^2] = 0.145
w^4 = c (d3 - o^3) f'(net^3) x3 + w^3 = [0.947  -0.929  0.016  0.505]^t

In this example the desired values are +1 and -1, so a correction is performed in each step, because d_i - f(net_i) ≠ 0 throughout.

This method requires small c values, since it is based on moving the weight vector in the weight space in the negative error gradient direction.
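The three delta-rule steps can be verified with a short script (an illustrative sketch, not from the slides), using the bipolar continuous activation with λ = 1, for which f'(net) = (1 - o^2)/2:

```python
import numpy as np

def f(net):
    """Bipolar continuous activation, lambda = 1."""
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def delta_train(w, samples, c=0.1):
    """Delta rule: w <- w + c * (d - o) * f'(net) * x."""
    w = np.asarray(w, dtype=float).copy()
    for x, d in samples:
        o = f(w @ x)
        w += c * (d - o) * 0.5 * (1.0 - o**2) * x
    return w

samples = [(np.array([1, -2, 0, -1]), -1),
           (np.array([0, 1.5, -0.5, -1]), -1),
           (np.array([-1, 1, 0.5, -1]), 1)]
print(np.round(delta_train([1, -1, 0, 0.5], samples), 3))
# approximately [ 0.947 -0.93   0.016  0.505]
```

Compare with the perceptron example on the same data: the corrections here are much smaller, because they are scaled by both (d - o) and f'(net).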
Widrow-Hoff Learning Rule

This rule is applicable to the supervised training of neural networks. It is independent of the activation function of the neurons used, since it minimizes the squared error between the desired output value d_i and the neuron's activation value net_i = w_i^t x.

The learning signal for this rule is defined as
r = d_i - w_i^t x

The weight vector increment under this learning rule is
Δw_i = c (d_i - w_i^t x) x
or, for the single weight, the adjustment is
Δw_ij = c (d_i - w_i^t x) x_j,  for j = 1, 2, ..., n

This rule is a special case of the delta learning rule: with the identity activation function f(net) = net, we obtain f'(net) = 1. The rule is sometimes called the LMS (Least Mean Square) learning rule.
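A minimal LMS sketch (not from the slides; the data below are hypothetical, generated from an assumed target weight vector to show the error shrinking over repeated sweeps):

```python
import numpy as np

def widrow_hoff(w, samples, c=0.1):
    """LMS rule: w <- w + c * (d - w.x) * x, independent of any activation."""
    w = np.asarray(w, dtype=float).copy()
    for x, d in samples:
        w += c * (d - w @ x) * x
    return w

# Hypothetical data: targets produced by a known weight vector
true_w = np.array([2.0, -1.0])
samples = [(x, x @ true_w) for x in (np.array([1.0, 0.5]),
                                     np.array([0.2, 1.0]),
                                     np.array([-0.5, 0.3]))]
w = np.zeros(2)
for _ in range(200):            # repeated sweeps shrink the squared error
    w = widrow_hoff(w, samples, c=0.3)
print(np.round(w, 3))           # approaches [ 2. -1.]
```

Because f(net) = net, no derivative term appears; the update is driven directly by the linear error d - w^t x.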
Correlation Learning Rule

Recall the general learning rule:
r = r(w_i, x, d_i)
Δw_i(t) = c r[w_i(t), x(t), d_i(t)] x(t)

By substituting r = d_i into the general rule, we obtain the correlation learning rule. The adjustments of the weight vector are
Δw_i = c d_i x,  Δw_ij = c d_i x_j,  for j = 1, 2, ..., n

Statement
If d_i is the desired response due to x_j, the corresponding weight increase is proportional to their product.

This rule can be interpreted as a special case of the Hebbian learning rule with a binary activation function and o_i = d_i. Hebbian learning is performed in an unsupervised environment, while correlation learning is supervised. Similar to Hebbian learning, this rule requires the weight initialization w = 0.
Winner-Take-All Learning Rule

This rule differs from all the rules discussed so far. It is an example of competitive learning, and it is used for unsupervised network training.

The learning is based on the premise that one of the neurons in the layer, say the mth, has the maximum response due to input x. This neuron is declared the winner. As a result of this winning event, the weight vector
w_m = [w_m1  w_m2  ...  w_mn]
(the weights highlighted in the figure) is the only one adjusted in the given unsupervised learning step:
Δw_m = α (x - w_m)

The individual weight adjustment becomes
Δw_mj = α (x_j - w_mj),  for j = 1, 2, ..., n
where α is a small learning constant, which decreases as learning progresses.

The winner selection is based on the maximum activation among all p neurons participating in the competition. This criterion corresponds to finding the weight vector that is closest to the input x. For i = 1, 2, ..., p:
w_m^t x = max(w_i^t x)
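One competitive step can be sketched as follows (an illustration, not from the slides; the weight rows and input are made-up values, and weight vectors are often kept normalized in practice):

```python
import numpy as np

def wta_step(W, x, alpha=0.1):
    """One winner-take-all step: only the winning row of W moves toward x."""
    m = int(np.argmax(W @ x))      # winner: neuron with maximum activation
    W = W.copy()
    W[m] += alpha * (x - W[m])     # Δw_m = α (x - w_m); other rows untouched
    return W

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.7]])
x = np.array([0.9, 0.8])
W2 = wta_step(W, x, alpha=0.5)
print(W2[2])  # the winning row, pulled halfway toward x: [0.8  0.75]
```

Only the row with maximum activation changes; repeated steps pull each winner toward the cluster of inputs it responds to.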
Outstar Learning Rule

This rule is designed to produce a desired response d from the layer of p neurons shown in the figure. It is used to learn repetitive and characteristic properties of input/output relationships. The rule is concerned with supervised learning; however, it is supposed to allow the network to extract statistical properties of the input and output signals.

The weight adjustments in this rule are computed as follows:
Δw_j = β (d - w_j)

The individual weight adjustments are
Δw_mj = β (d_m - w_mj),  for m = 1, 2, ..., p

Note that, in contrast to any learning rule discussed previously, the adjusted weights fan out of the jth node. In this learning method the weight vector is defined as
w_j = [w_1j  w_2j  ...  w_pj]^t

The rule ensures that the output pattern becomes similar to the undistorted desired output after repeated presentations of distorted output versions.
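The fan-out update can be sketched as below (illustrative only, not from the slides; the layer size, node index, and noise level are made-up values simulating "distorted versions" of the desired pattern d):

```python
import numpy as np

def outstar_step(W, j, d, beta=0.2):
    """One outstar step: the fan-out weights of node j move toward d."""
    W = W.copy()
    W[:, j] += beta * (d - W[:, j])   # Δw_j = β (d - w_j); w_j = column j
    return W

# Hypothetical setup: p = 3 output neurons; repeated presentations of a
# noisy version of d pull the fan-out weights of node 0 toward d itself.
rng = np.random.default_rng(0)
d = np.array([1.0, -1.0, 0.5])
W = np.zeros((3, 2))
for _ in range(300):
    W = outstar_step(W, j=0, d=d + rng.normal(0.0, 0.05, 3))
print(np.round(W[:, 0], 2))  # close to the undistorted d = [1, -1, 0.5]
```

Averaging over many distorted presentations is what lets the column converge near the undistorted pattern, matching the statement above.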
Summary of Learning Rules
