Perceptron Learning Rule
Objectives
One of the questions we raised in Chapter 3 was: "How do we determine the weight matrix and bias for perceptron networks with many inputs, where it is impossible to visualize the decision boundaries?" In this chapter we will describe an algorithm for training perceptron networks, so that they can learn to solve classification problems. We will begin by explaining what a learning rule is and will then develop the perceptron learning rule. We will conclude by discussing the advantages and limitations of the single-layer perceptron network. This discussion will lead us into future chapters.
Learning Rules

As we begin our discussion of the perceptron learning rule, we want to discuss learning rules in general. By learning rule we mean a procedure for modifying the weights and biases of a network. (This procedure may also be referred to as a training algorithm.) The purpose of the learning rule is to train the network to perform some task. Learning rules fall into three broad categories: supervised learning, in which the rule is provided with a set of examples of proper network behavior (inputs paired with correct target outputs); reinforcement learning, in which the network receives only a grade or score for its performance; and unsupervised learning, in which no targets are available. The perceptron learning rule of this chapter is an example of supervised learning.
Perceptron Architecture
Before we present the perceptron learning rule, let's expand our investigation of the perceptron network, which we began in Chapter 3. The general perceptron network is shown in Figure 4.1. The output of the network is given by

    a = hardlim(Wp + b).   (4.2)
Figure 4.1 Perceptron Network
(An R-element input p feeds a hard limit layer of S neurons: W is S x R, b is S x 1, and the output a = hardlim(Wp + b) is S x 1.)
The network weight matrix is

    W = [ w_{1,1}  w_{1,2}  ...  w_{1,R} ;
          w_{2,1}  w_{2,2}  ...  w_{2,R} ;
            ...      ...           ...   ;
          w_{S,1}  w_{S,2}  ...  w_{S,R} ].   (4.3)
We will define a vector composed of the elements of the ith row of W:

    iw = [ w_{i,1} ; w_{i,2} ; ... ; w_{i,R} ].   (4.4)
Now we can partition the weight matrix:

    W = [ 1w^T ; 2w^T ; ... ; Sw^T ].   (4.5)
This allows us to write the ith element of the network output vector as

    a_i = hardlim(n_i) = hardlim(iw^T p + b_i).   (4.6)
Recall that the hardlim transfer function is defined as:

    a = hardlim(n) = 1 if n ≥ 0, and 0 otherwise.   (4.7)
Therefore, if the inner product of the ith row of the weight matrix with the input vector is greater than or equal to -b_i, the output will be 1; otherwise the output will be 0. Thus each neuron in the network divides the input space into two regions. It is useful to investigate the boundaries between these regions. We will begin with the simple case of a single-neuron perceptron with two inputs.
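To make these calculations concrete, here is a minimal Python sketch of a single-neuron perceptron (the helper names `hardlim` and `neuron_output` are our own, not from the text; the book's demonstrations use MATLAB):

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0 (Eq. 4.7)."""
    return 1 if n >= 0 else 0

def neuron_output(w, b, p):
    """Single-neuron perceptron: a = hardlim(w . p + b)."""
    n = sum(wi * pi for wi, pi in zip(w, p)) + b
    return hardlim(n)

# Weights and bias used in the example that follows: w11 = 1, w12 = 1, b = -1.
w, b = [1.0, 1.0], -1.0
print(neuron_output(w, b, [2.0, 0.0]))  # a point on the "1" side of the boundary -> 1
print(neuron_output(w, b, [0.0, 0.0]))  # a point on the "0" side of the boundary -> 0
```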
Single-Neuron Perceptron

Let's consider a two-input perceptron with one neuron, as shown in Figure 4.2.

Figure 4.2 Two-Input/Single-Neuron Perceptron
(The neuron computes n = w_{1,1} p_1 + w_{1,2} p_2 + b and outputs a = hardlim(Wp + b).)
The output of this network is determined by

    a = hardlim(n) = hardlim(Wp + b)
      = hardlim(1w^T p + b) = hardlim(w_{1,1} p_1 + w_{1,2} p_2 + b).   (4.8)
The decision boundary is determined by the input vectors for which the net input n is zero:

    n = 1w^T p + b = w_{1,1} p_1 + w_{1,2} p_2 + b = 0.   (4.9)
To make the example more concrete, let's assign the following values for the weights and bias:
    w_{1,1} = 1,  w_{1,2} = 1,  b = -1.   (4.10)
The decision boundary is then

    n = 1w^T p + b = p_1 + p_2 - 1 = 0.   (4.11)

This defines a line in the input space. On one side of the line the network output will be 0; on the line and on the other side of the line the output will be 1. To draw the line, we can find the points where it intersects the p_1 and p_2 axes. To find the p_2 intercept, set p_1 = 0:
    p_2 = -b / w_{1,2} = -(-1)/1 = 1    if p_1 = 0.   (4.12)

To find the p_1 intercept, set p_2 = 0:

    p_1 = -b / w_{1,1} = -(-1)/1 = 1    if p_2 = 0.   (4.13)
To find out which side of the boundary corresponds to an output of 1, we only need to test one point. For the input p = [2; 0], the network output is

    a = hardlim(1w^T p + b) = hardlim([1  1][2; 0] - 1) = hardlim(1) = 1.   (4.14)

Therefore, the network output will be 1 for the region above and to the right of the decision boundary. This region is indicated by the shaded area in Figure 4.3.
Figure 4.3 Decision Boundary for Two-Input Perceptron
(The boundary 1w^T p + b = 0 crosses the axes at p_1 = 1 and p_2 = 1; the region above and to the right, toward which 1w points, gives a = 1, and the region below gives a = 0.)
We can also find the decision boundary graphically. The first step is to note that the boundary is always orthogonal to 1w, as illustrated in the adjacent figures. The boundary is defined by

    1w^T p + b = 0.   (4.15)

For all points on the boundary, the inner product of the input vector with the weight vector is the same. This implies that these input vectors will all have the same projection onto the weight vector, so they must lie on a line orthogonal to the weight vector. (These concepts will be covered in more detail in Chapter 5.) In addition, any vector in the shaded region of Figure 4.3 will have an inner product greater than -b, and vectors in the unshaded region will have inner products less than -b. Therefore the weight vector 1w will always point toward the region where the neuron output is 1.
After we have selected a weight vector with the correct angular orientation, the bias value can be computed by selecting a point on the boundary and satisfying Eq. (4.15).
Let's apply some of these concepts to the design of a perceptron network to implement a simple logic function: the AND gate. The input/target pairs for the AND gate are

    p_1 = [0; 0], t_1 = 0    p_2 = [0; 1], t_2 = 0    p_3 = [1; 0], t_3 = 0    p_4 = [1; 1], t_4 = 1.
The figure to the left illustrates the problem graphically. It displays the input space, with each input vector labeled according to its target. The dark circles indicate that the target is 1, and the light circles indicate that the target is 0.
The first step of the design is to select a decision boundary. We want to
have a line that separates the dark circles and the light circles. There are
an infinite number of solutions to this problem. It seems reasonable to
choose the line that falls "halfway" between the two categories of inputs, as
shown in the adjacent figure.
Next we want to choose a weight vector that is orthogonal to the decision boundary. The weight vector can be any length, so there are infinite possibilities. One choice is

    1w = [2; 2].   (4.16)
Finally, we need to find the bias, b. We can do this by picking a point on the decision boundary and satisfying Eq. (4.15). If we use the boundary point p = [1.5; 0] we find

    1w^T p + b = [2  2][1.5; 0] + b = 3 + b = 0    ⇒    b = -3.   (4.17)
We can now test the network on one of the input/target pairs. If we apply p_2 to the network, the output will be

    a = hardlim(1w^T p_2 + b) = hardlim([2  2][0; 1] - 3)
      = hardlim(-1) = 0,   (4.18)

which is equal to the target output t_2. Verify for yourself that all inputs are correctly classified.
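The AND-gate design can also be verified mechanically. This Python sketch (our own helper names, not code from the text) applies the chosen weights 1w = [2, 2] and bias b = -3 to all four input/target pairs:

```python
def hardlim(n):
    """Hard limit: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

# AND-gate network designed in the text: 1w = [2, 2], b = -3.
w, b = [2.0, 2.0], -3.0
and_pairs = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

for p, t in and_pairs:
    a = hardlim(w[0] * p[0] + w[1] * p[1] + b)
    print(p, a == t)  # each line should report True
```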
To experiment with decision boundaries, use the Neural Network Design
Demonstration Decision Boundaries (nnd4db).
Multiple-Neuron Perceptron

Note that for perceptrons with multiple neurons, as in Figure 4.1, there will be one decision boundary for each neuron. The decision boundary for neuron i will be defined by

    iw^T p + b_i = 0.   (4.19)

A single-neuron perceptron can classify input vectors into two categories, since its output can be either 0 or 1. A multiple-neuron perceptron can classify inputs into many categories, with each category represented by a different output vector; since each of the S elements of the output vector can be 0 or 1, there are a total of 2^S possible categories.
Perceptron Learning Rule

Now that we have examined the performance of perceptron networks, we are in a position to introduce the perceptron learning rule. This learning rule is an example of supervised training, in which the rule is provided with a set of examples of proper network behavior:

    {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q},   (4.20)

where p_q is an input to the network and t_q is the corresponding correct (target) output. As each input is applied to the network, the network output is compared to the target; the learning rule then adjusts the weights and biases in order to move the network output closer to the target.

Test Problem
In our presentation of the perceptron learning rule we will begin with a simple test problem and will experiment with possible rules to develop some intuition about how the rule should work. The input/target pairs for our test problem are

    p_1 = [1; 2], t_1 = 1    p_2 = [-1; 2], t_2 = 0    p_3 = [0; -1], t_3 = 0.
The problem is displayed graphically in the adjacent figure, where the two input vectors whose target is 0 are represented with a light circle, and the vector whose target is 1 is represented with a dark circle. This is a very simple problem, and we could almost obtain a solution by inspection. This simplicity will help us gain some intuitive understanding of the basic concepts of the perceptron learning rule.
The network for this problem should have two inputs and one output. To simplify our development of the learning rule, we will begin with a network without a bias. The network will then have just two parameters, w_{1,1} and w_{1,2}, as shown in Figure 4.4.

Figure 4.4 Test Problem Network
(A two-input, single-neuron perceptron with no bias: a = hardlim(Wp).)
By removing the bias we are left with a network whose decision boundary must pass through the origin. We need to be sure that this network is still able to solve the test problem. There must be an allowable decision boundary that can separate the vectors p_2 and p_3 from the vector p_1. The figure to the left illustrates that there are indeed an infinite number of such boundaries.
The adjacent figure shows the weight vectors that correspond to the allowable decision boundaries. (Recall that the weight vector is orthogonal to the decision boundary.) We would like a learning rule that will find a weight vector that points in one of these directions. Remember that the length of the weight vector does not matter; only its direction is important.
Constructing Learning Rules
Training begins by assigning some initial values for the network parameters. In this case we are training a two-input/single-output network without a bias, so we only have to initialize its two weights. Here we set the elements of the weight vector, 1w, to the following randomly generated values:

    1w^T = [1.0  -0.8].   (4.21)
We will now begin presenting the input vectors to the network. We begin with p_1:

    a = hardlim(1w^T p_1) = hardlim([1.0  -0.8][1; 2])
      = hardlim(-0.6) = 0.   (4.22)

The network has not returned the correct value. The network output is 0, while the target response, t_1, is 1.
We can see what happened by looking at the adjacent diagram. The initial weight vector results in a decision boundary that incorrectly classifies the vector p_1. We need to alter the weight vector so that it points more toward p_1, so that in the future it has a better chance of classifying it correctly.
One approach would be to set 1w equal to p_1. This is simple and would ensure that p_1 was classified properly in the future. Unfortunately, it is easy to construct a problem for which this rule cannot find a solution. The diagram to the lower left shows a problem that cannot be solved with the weight vector pointing directly at either of the two class 1 vectors. If we apply the rule 1w = p every time one of these vectors is misclassified, the network's weights will simply oscillate back and forth and will never find a solution.
Another possibility would be to add p 1 to 1w . Adding p 1 to 1w would make
1w point more in the direction of p 1 . Repeated presentations of p 1 would
cause the direction of 1w to asymptotically approach the direction of p 1 .
This rule can be stated:
    If t = 1 and a = 0, then 1w_new = 1w_old + p.   (4.23)
Applying this rule to our test problem results in new values for 1w:

    1w_new = 1w_old + p_1 = [1.0; -0.8] + [1; 2] = [2.0; 1.2].   (4.24)
We now move on to the next input vector and will continue making changes to the weights and cycling through the inputs until they are all classified correctly.

The next input vector is p_2. When it is presented to the network we find:

    a = hardlim(1w^T p_2) = hardlim([2.0  1.2][-1; 2])
      = hardlim(0.4) = 1.   (4.25)
The target t_2 in this case is 0, so the output is again incorrect. This time we want to move the weight vector away from the input, which suggests a second rule:

    If t = 0 and a = 1, then 1w_new = 1w_old - p.   (4.26)

Applying this to the test problem,

    1w_new = 1w_old - p_2 = [2.0; 1.2] - [-1; 2] = [3.0; -0.8].   (4.27)

Next we present p_3:

    a = hardlim(1w^T p_3) = hardlim([3.0  -0.8][0; -1])
      = hardlim(0.8) = 1.   (4.28)

The target t_3 is 0, so we apply Eq. (4.26) again:

    1w_new = 1w_old - p_3 = [3.0; -0.8] - [0; -1] = [3.0; 0.2].   (4.29)
The diagram to the left shows that the perceptron has finally learned to classify the three vectors properly. If we present any of the input vectors to the neuron, it will output the correct class for that input vector.

This brings us to our third and final rule: if it works, don't fix it.

    If t = a, then 1w_new = 1w_old.   (4.30)
Here are the three rules, which cover all possible combinations of output and target values:

    If t = 1 and a = 0, then 1w_new = 1w_old + p.
    If t = 0 and a = 1, then 1w_new = 1w_old - p.   (4.31)
    If t = a,           then 1w_new = 1w_old.
The three rules can be restated conveniently if we define a new variable, the perceptron error e:

    e = t - a.   (4.32)

In terms of the error, the rules become:

    If e = 1,  then 1w_new = 1w_old + p.
    If e = -1, then 1w_new = 1w_old - p.   (4.33)
    If e = 0,  then 1w_new = 1w_old.
Looking carefully at the first two rules in Eq. (4.33) we can see that the sign
of p is the same as the sign on the error, e. Furthermore, the absence of p
in the third rule corresponds to an e of 0. Thus, we can unify the three rules
into a single expression:
    1w_new = 1w_old + e p = 1w_old + (t - a) p.   (4.34)
This rule can be extended to train the bias by noting that a bias is simply
a weight whose input is always 1. We can thus replace the input p in Eq.
(4.34) with the input to the bias, which is 1. The result is the perceptron
rule for a bias:
    b_new = b_old + e.   (4.35)
Training Multiple-Neuron Perceptrons

The rule of Eq. (4.34) updates the weight vector of a single neuron. It generalizes to the multiple-neuron perceptron of Figure 4.1 by updating the ith row of the weight matrix:

    iw_new = iw_old + e_i p,   (4.36)

and the ith element of the bias vector:

    b_{i,new} = b_{i,old} + e_i.   (4.37)

The perceptron rule can be written conveniently in matrix notation:

    W_new = W_old + e p^T,   (4.38)

and

    b_new = b_old + e.   (4.39)
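The unified rule can be sketched as a single update function in Python (our own helper names, not the book's MATLAB code). Applied to the first step of the chapter's test problem, it reproduces the move of 1w toward p_1; the test network in the text has no bias, so b = 0 is simply carried along:

```python
def hardlim(n):
    """Hard limit: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def perceptron_update(w, b, p, t):
    """One presentation of the perceptron rule:
    e = t - a,  w <- w + e*p,  b <- b + e."""
    a = hardlim(sum(wi * pi for wi, pi in zip(w, p)) + b)
    e = t - a
    w_new = [wi + e * pi for wi, pi in zip(w, p)]
    return w_new, b + e, e

# Test problem, first step: 1w = [1.0, -0.8], p1 = [1, 2], t1 = 1.
# The misclassification (e = 1) adds p1 to the weights, giving [2.0, 1.2].
w, b, e = perceptron_update([1.0, -0.8], 0.0, [1.0, 2.0], 1)
print(w, e)
```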
To test the perceptron learning rule, consider again the apple/orange recognition problem of Chapter 3. The input/output prototype vectors will be

    p_1 = [1; -1; -1], t_1 = 0    p_2 = [1; 1; -1], t_2 = 1.   (4.40)
(Note that we are using 0 as the target output for the orange pattern, p 1 ,
instead of -1, as was used in Chapter 3. This is because we are using the
hardlim transfer function, instead of hardlims .)
Typically the weights and biases are initialized to small random numbers. Suppose that here we start with the initial weight matrix and bias:

    W = [0.5  -1  -0.5],  b = 0.5.   (4.41)

The first step is to apply the first input vector, p_1, to the network:

    a = hardlim(Wp_1 + b) = hardlim([0.5  -1  -0.5][1; -1; -1] + 0.5)
      = hardlim(2.5) = 1.   (4.42)
Then we can calculate the error:

    e = t_1 - a = 0 - 1 = -1.   (4.43)

The weight update is

    W_new = W_old + e p_1^T = [0.5  -1  -0.5] + (-1)[1  -1  -1] = [-0.5  0  0.5],   (4.44)

and the bias update is

    b_new = b_old + e = 0.5 + (-1) = -0.5.   (4.45)

The second iteration of the perceptron rule applies the second input vector:

    a = hardlim(Wp_2 + b) = hardlim([-0.5  0  0.5][1; 1; -1] + (-0.5))
      = hardlim(-1.5) = 0,   (4.46)

    e = t_2 - a = 1 - 0 = 1,   (4.47)

    W_new = W_old + e p_2^T = [-0.5  0  0.5] + (1)[1  1  -1] = [0.5  1  -0.5],   (4.48)

    b_new = b_old + e = -0.5 + 1 = 0.5.   (4.49)
The third iteration begins again with the first input vector:

    a = hardlim(Wp_1 + b) = hardlim([0.5  1  -0.5][1; -1; -1] + 0.5)
      = hardlim(0.5) = 1,   (4.50)

    e = t_1 - a = 0 - 1 = -1,   (4.51)

    W_new = W_old + e p_1^T = [0.5  1  -0.5] + (-1)[1  -1  -1]
          = [-0.5  2  0.5],   (4.52)
    b_new = b_old + e = 0.5 + (-1) = -0.5.   (4.53)
If you continue with the iterations you will find that both input vectors will now be correctly classified. The algorithm has converged to a solution. Note that the final decision boundary is not the same as the one we developed in Chapter 3, although both boundaries correctly classify the two input vectors.
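The full training loop for this example can be sketched in Python (our own function names, not the book's MATLAB demonstrations; the initial W and b are the values assumed above). Cycling through the two prototypes reproduces the final parameters of Eqs. (4.52) and (4.53):

```python
def hardlim(n):
    return 1 if n >= 0 else 0

def train(W, b, pairs, max_passes=20):
    """Cycle through the training set, applying the perceptron rule
    until every pattern is classified correctly."""
    for _ in range(max_passes):
        errors = 0
        for p, t in pairs:
            a = hardlim(sum(wi * pi for wi, pi in zip(W, p)) + b)
            e = t - a
            if e != 0:
                W = [wi + e * pi for wi, pi in zip(W, p)]
                b += e
                errors += 1
        if errors == 0:
            break
    return W, b

# Apple/orange problem with the initial values used in the text.
pairs = [([1, -1, -1], 0), ([1, 1, -1], 1)]
W, b = train([0.5, -1.0, -0.5], 0.5, pairs)
print(W, b)  # [-0.5, 2.0, 0.5] -0.5, the converged parameters
```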
To experiment with the perceptron learning rule, use the Neural Network
Design Demonstration Perceptron Rule (nnd4pr).
Proof of Convergence
Although the perceptron learning rule is simple, it is quite powerful. In fact, it can be shown that the rule will always converge to weights that accomplish the desired classification (assuming that such weights exist). In this section we will present a proof of convergence for the perceptron learning rule for the single-neuron perceptron shown in Figure 4.5.
Figure 4.5 Single-Neuron Perceptron
(An R-input, single-neuron perceptron with output a = hardlim(1w^T p + b).)
The network is provided with the following examples of proper network behavior:
    {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}.   (4.55)
Notation

To conveniently present the proof we will first introduce some new notation. We will combine the weight matrix and the bias into a single vector:
    x = [1w; b].   (4.56)
We will also augment the input vectors with a 1, corresponding to the bias input:

    z_q = [p_q; 1].   (4.57)

Now we can express the net input to the neuron as

    n = 1w^T p + b = x^T z.   (4.58)
The perceptron learning rule for a single-neuron perceptron (Eq. (4.34) and Eq. (4.35)) can now be written

    x_new = x_old + e z.   (4.59)

The error e can be 1, -1, or 0. If e = 0, no change is made to the weight vector; if e = 1, the input vector is added to it; if e = -1, the negative of the input vector is added. If we count only those iterations for which the weight vector actually changes, the learning rule becomes

    x(k) = x(k-1) + z'(k-1),   (4.60)

where z'(k-1) is the appropriate member of the set

    { z_1, z_2, ..., z_Q, -z_1, -z_2, ..., -z_Q }.   (4.61)
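The convenience of the augmented notation of Eqs. (4.56) and (4.57) is easy to verify numerically. The Python sketch below (our own helper names) checks that the net input computed with a separate bias equals the inner product of the augmented vectors, using the AND-gate parameters from earlier in the chapter:

```python
def net_input(w, b, p):
    """Net input with an explicit bias: n = w . p + b."""
    return sum(wi * pi for wi, pi in zip(w, p)) + b

def net_input_augmented(x, z):
    """Net input in augmented form: n = x . z."""
    return sum(xi * zi for xi, zi in zip(x, z))

w, b, p = [2.0, 2.0], -3.0, [1.0, 1.0]
x = w + [b]    # x = [1w; b]
z = p + [1.0]  # z = [p; 1]
print(net_input(w, b, p) == net_input_augmented(x, z))  # True
```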
We will assume that a weight vector exists that can correctly categorize all
Q input vectors. This solution will be denoted x∗ . For this weight vector
we will assume that

    x*^T z_q > δ > 0    if t_q = 1,   (4.62)

and

    x*^T z_q < -δ < 0    if t_q = 0.   (4.63)
Proof

We are now ready to begin the proof of the perceptron convergence theorem. The objective of the proof is to find upper and lower bounds on the length of the weight vector at each stage of the algorithm.
Assume that the algorithm is initialized with the zero weight vector: x(0) = 0. (This does not affect the generality of our argument.) Then, after k iterations (changes to the weight vector), we find from Eq. (4.60):

    x(k) = z'(0) + z'(1) + ... + z'(k-1).   (4.64)

If we take the inner product of the solution weight vector with the weight vector at iteration k we obtain

    x*^T x(k) = x*^T z'(0) + x*^T z'(1) + ... + x*^T z'(k-1).   (4.65)

From Eq. (4.61), Eq. (4.62) and Eq. (4.63) it follows that

    x*^T z'(i) > δ.   (4.66)

Therefore

    x*^T x(k) > kδ.   (4.67)
From the Cauchy-Schwarz inequality,

    (x*^T x(k))^2 ≤ ||x*||^2 ||x(k)||^2,   (4.68)

where

    ||x||^2 = x^T x.   (4.69)

If we combine Eq. (4.67) and Eq. (4.68) we can put a lower bound on the squared length of the weight vector at iteration k:

    ||x(k)||^2 ≥ (x*^T x(k))^2 / ||x*||^2 > (kδ)^2 / ||x*||^2.   (4.70)
Next we want to find an upper bound for the length of the weight vector.
We begin by finding the change in the length at iteration k:

    ||x(k)||^2 = x(k)^T x(k)
               = [x(k-1) + z'(k-1)]^T [x(k-1) + z'(k-1)]   (4.71)
               = x(k-1)^T x(k-1) + 2 x(k-1)^T z'(k-1) + z'(k-1)^T z'(k-1).
Note that

    x(k-1)^T z'(k-1) ≤ 0,   (4.72)
since the weights would not be updated unless the previous input vector
had been misclassified. Now Eq. (4.71) can be simplified to
    ||x(k)||^2 ≤ ||x(k-1)||^2 + ||z'(k-1)||^2.   (4.73)
We can repeat this process for ||x(k-1)||^2, ||x(k-2)||^2, etc., to obtain

    ||x(k)||^2 ≤ ||z'(0)||^2 + ||z'(1)||^2 + ... + ||z'(k-1)||^2.   (4.74)
If Π = max{ ||z'(i)||^2 }, this upper bound can be simplified to

    ||x(k)||^2 ≤ kΠ.   (4.75)
We now have an upper bound (Eq. (4.75)) and a lower bound (Eq. (4.70)) on
the squared length of the weight vector at iteration k . If we combine the
two inequalities we find
    kΠ ≥ ||x(k)||^2 > (kδ)^2 / ||x*||^2,    or    k < Π ||x*||^2 / δ^2.   (4.76)
Because k has an upper bound, this means that the weights will only be changed a finite number of times. Therefore, the perceptron learning rule will converge in a finite number of iterations.

The maximum number of iterations (changes to the weight vector) is inversely related to the square of δ. This parameter is a measure of how close the solution decision boundary is to the input patterns. This means that if the input classes are difficult to separate (are close to the decision boundary) it will take many iterations for the algorithm to converge.
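We can illustrate the bound of Eq. (4.76) numerically. The sketch below (our own code, not from the text) uses the apple/orange problem and takes the solution found earlier in the chapter as the known solution x*; the margin of that particular solution is used for δ, so the resulting bound applies to that choice of x* (a solution with a larger margin would give a tighter bound):

```python
# Apple/orange training set and a known solution x* = [1w; b],
# taken from the converged parameters found earlier in the chapter.
pairs = [([1, -1, -1], 0), ([1, 1, -1], 1)]
x_star = [-0.5, 2.0, 0.5, -0.5]

# Build the z' vectors: augmented inputs, negated when the target is 0.
zs = []
for p, t in pairs:
    z = p + [1.0]
    zs.append(z if t == 1 else [-zi for zi in z])

dot = lambda u, v: sum(a * b for a, b in zip(u, v))
delta = min(dot(x_star, z) for z in zs)  # margin of x*: min(3.5, 0.5) = 0.5
Pi = max(dot(z, z) for z in zs)          # largest squared input length: 4.0
bound = Pi * dot(x_star, x_star) / delta ** 2
print(delta, Pi, bound)  # 0.5 4.0 76.0 -> at most 76 weight changes
```

Strictly, the proof requires x*^T z' > δ, so any δ slightly below this margin works; using the margin itself gives the limiting value of the bound.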
Note that there are only three key assumptions required for the proof:
1. A solution to the problem exists, so that Eq. (4.66) is satisfied.
2. The weights are only updated when the input vector is misclassified,
therefore Eq. (4.72) is satisfied.
3. An upper bound, Π , exists for the length of the input vectors.
Because of the generality of the proof, there are many variations of the perceptron learning rule that can also be shown to converge. (See Exercise E4.9.)
Limitations
The perceptron learning rule is guaranteed to converge to a solution in a finite number of steps, so long as a solution exists. This brings us to an important question: what problems can a perceptron solve? Recall that a single-neuron perceptron is able to divide the input space into two regions. The boundary between the regions is defined by the equation

    1w^T p + b = 0.   (4.77)
This is a linear boundary, so a single-layer perceptron can only be used to classify input vectors that are linearly separable, i.e., that can be separated into their two target categories by a linear boundary. Many problems are not linearly separable; the classic example is the XOR gate. This problem is illustrated graphically on the left side of Figure 4.6, which also shows two other linearly inseparable problems. Try drawing a straight line between the vectors with targets of 1 and those with targets of 0 in any of the diagrams of Figure 4.6.
Summary of Results

Perceptron Architecture

    a = hardlim(Wp + b)

    W = [ 1w^T ; 2w^T ; ... ; Sw^T ]

    a_i = hardlim(n_i) = hardlim(iw^T p + b_i)

Decision Boundary

    iw^T p + b_i = 0.

The decision boundary is always orthogonal to the weight vector. Single-layer perceptrons can only classify linearly separable vectors.

Perceptron Learning Rule

    W_new = W_old + e p^T

    b_new = b_old + e,

where e = t - a.
Solved Problems
P4.1 Solve the three simple classification problems shown in Figure
P4.1 by drawing a decision boundary. Find weight and bias values
that result in single-neuron perceptrons with the chosen decision
boundaries.
The next step is to find the weights and biases. The weight vectors must be orthogonal to the decision boundaries, and pointing in the direction of points to be classified as 1 (the dark points). The weight vectors can have any length we like.
Now we find the bias values for each perceptron by picking a point on the decision boundary and satisfying Eq. (4.15):

    1w^T p + b = 0
    b = -1w^T p.
We can now check our solution against the original points. Here we test the first network on the input vector p = [-2; 2]:

    a = hardlim(1w^T p + b)
      = hardlim([-2  1][-2; 2] + 0)
      = hardlim(6)
      = 1.
We can use MATLAB to automate the testing process and to try new points. Here the first network is used to classify a point that was not in the original problem.

w = [-2 1]; b = 0;
a = hardlim(w*[1;1] + b)
a =
     0
P4.2 Convert the classification problem defined below into an equivalent problem definition consisting of inequalities constraining weight and bias values.

    p_1 = [0; 2], t_1 = 1    p_2 = [1; 0], t_2 = 1    p_3 = [0; -2], t_3 = 0    p_4 = [2; 0], t_4 = 0
Each target t_i indicates whether the net input in response to p_i must be less than 0, or greater than or equal to 0. For example, since t_1 is 1, we
know that the net input corresponding to p 1 must be greater than or equal
to 0. Thus we get the following inequality:
    W p_1 + b ≥ 0
    0 w_{1,1} + 2 w_{1,2} + b ≥ 0
    2 w_{1,2} + b ≥ 0.

Applying the same reasoning to the other three input/target pairs yields the complete set of inequalities:

    2 w_{1,2} + b ≥ 0     (i)
    w_{1,1} + b ≥ 0       (ii)
    -2 w_{1,2} + b < 0    (iii)
    2 w_{1,1} + b < 0     (iv)
Solving a set of inequalities is more difficult than solving a set of equalities. One added complexity is that there are often an infinite number of solutions (just as there are often an infinite number of linear decision boundaries that can solve a linearly separable classification problem).
However, because of the simplicity of this problem, we can solve it by graphing the solution spaces defined by the inequalities. Note that w_{1,1} only appears in inequalities (ii) and (iv), and w_{1,2} only appears in inequalities (i) and (iii). We can plot each pair of inequalities with two graphs.
(Two graphs are used: one shades the region of the w_{1,1}-b plane satisfying inequalities (ii) and (iv), and the other the region of the w_{1,2}-b plane satisfying inequalities (i) and (iii).)
Any weight and bias values that fall in both dark gray regions will solve
the classification problem.
Here is one such solution:
    W = [-2  3],  b = 3.
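The solution can be checked against the four inequalities directly; this small Python sketch (our own variable names) substitutes W = [-2, 3] and b = 3 into (i)-(iv):

```python
# Candidate solution from the text: w11 = -2, w12 = 3, b = 3.
w11, w12, b = -2.0, 3.0, 3.0

checks = [
    2 * w12 + b >= 0,   # (i)   from p1 = [0; 2],  t1 = 1
    w11 + b >= 0,       # (ii)  from p2 = [1; 0],  t2 = 1
    -2 * w12 + b < 0,   # (iii) from p3 = [0; -2], t3 = 0
    2 * w11 + b < 0,    # (iv)  from p4 = [2; 0],  t4 = 0
]
print(all(checks))  # True: all four inequalities are satisfied
```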
P4.3 We have a classification problem with four classes of input vector. The four classes are

    class 1: p_1 = [1; 1], p_2 = [1; 2],      class 2: p_3 = [2; -1], p_4 = [2; 0],
    class 3: p_5 = [-1; 2], p_6 = [-2; 1],    class 4: p_7 = [-1; -1], p_8 = [-2; -2].

Design a perceptron network to solve this problem.
Since there are four classes, and an S-neuron perceptron can divide its inputs into 2^S categories, we need a perceptron with at least two neurons. The network is therefore a two-input, two-neuron perceptron: W is 2 x 2, b is 2 x 1, and a = hardlim(Wp + b).
We choose a target vector for each class:

    class 1: t = [0; 0],    class 2: t = [0; 1],
    class 3: t = [1; 0],    class 4: t = [1; 1].
Next we select two decision boundaries, one per neuron, each separating the four classes into two groups according to the corresponding element of the target vectors, and choose weight vectors orthogonal to those boundaries:

    1w = [-3; -1]    and    2w = [1; -2].
Note that the lengths of the weight vectors are not important, only their directions. They must be orthogonal to the decision boundaries. Now we can calculate the bias by picking a point on a boundary and satisfying Eq. (4.15):
    b_1 = -1w^T p = -[-3  -1][0; 1] = 1,
    b_2 = -2w^T p = -[1  -2][0; 0] = 0.
In matrix form,

    W = [1w^T; 2w^T] = [-3  -1; 1  -2]    and    b = [1; 0],

which completes the design.
We would now like to train a perceptron network on the following classification problem, using the perceptron learning rule:

    p_1 = [2; 2], t_1 = 0    p_2 = [1; -2], t_2 = 1    p_3 = [-2; 2], t_3 = 0    p_4 = [-1; 1], t_4 = 1,

starting from the initial parameters

    W(0) = [0  0],  b(0) = 0.
We start by calculating the perceptron's output a for the first input vector p_1, using the initial weights and bias:

    a = hardlim(W(0)p_1 + b(0)) = hardlim([0  0][2; 2] + 0) = hardlim(0) = 1.
The output a does not equal the target value t 1 , so we use the perceptron
rule to find new weights and biases based on the error.
    e = t_1 - a = 0 - 1 = -1
    W(1) = W(0) + e p_1^T = [0  0] + (-1)[2  2] = [-2  -2]
    b(1) = b(0) + e = 0 + (-1) = -1
We now apply the second input vector p 2 , using the updated weights and
bias.
    a = hardlim(W(1)p_2 + b(1)) = hardlim([-2  -2][1; -2] - 1) = hardlim(1) = 1
This time the output a is equal to the target t_2. Application of the perceptron rule will not result in any changes.
W(2) = W(1)
b(2) = b(1)
We now apply the third input vector.
    a = hardlim(W(2)p_3 + b(2)) = hardlim([-2  -2][-2; 2] - 1) = hardlim(-1) = 0
The output in response to input vector p 3 is equal to the target t 3 , so there
will be no changes.
W(3) = W(2)
b(3) = b(2)
We now move on to the last input vector p 4 .
    a = hardlim(W(3)p_4 + b(3)) = hardlim([-2  -2][-1; 1] - 1) = hardlim(-1) = 0
This time the output a does not equal the appropriate target t_4. The perceptron rule will result in a new set of values for W and b.
    e = t_4 - a = 1 - 0 = 1
    W(4) = W(3) + e p_4^T = [-2  -2] + (1)[-1  1] = [-3  -1]
    b(4) = b(3) + e = -1 + 1 = 0
We now must check the first vector p 1 again. This time the output a is
equal to the associated target t 1 .
    a = hardlim(W(4)p_1 + b(4)) = hardlim([-3  -1][2; 2] + 0) = hardlim(-8) = 0
Therefore there are no changes.
W(5) = W(4)
b(5) = b(4)
The second presentation of p 2 results in an error and therefore a new set
of weight and bias values.
    a = hardlim(W(5)p_2 + b(5)) = hardlim([-3  -1][1; -2] + 0) = hardlim(-1) = 0
Here are those new values:
    e = t_2 - a = 1 - 0 = 1
    W(6) = W(5) + e p_2^T = [-3  -1] + (1)[1  -2] = [-2  -3]
    b(6) = b(5) + e = 0 + 1 = 1

If you check the remaining presentations, you will find that the network now classifies all four input vectors correctly, so the final values are

    W = [-2  -3],  b = 1.
Now we can graph the training data and the decision boundary of the solution. The decision boundary is given by
    n = Wp + b = w_{1,1} p_1 + w_{1,2} p_2 + b = -2 p_1 - 3 p_2 + 1 = 0.

The intercepts are

    p_2 = -b / w_{1,2} = -1/(-3) = 1/3    if p_1 = 0,
    p_1 = -b / w_{1,1} = -1/(-2) = 1/2    if p_2 = 0.
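The training sequence above can be replayed with a short Python loop (our own sketch, using the same presentation order as the hand calculation); it terminates at the same final parameters:

```python
def hardlim(n):
    return 1 if n >= 0 else 0

def train(W, b, pairs, max_passes=20):
    """Cycle through the data with the perceptron rule until error-free."""
    for _ in range(max_passes):
        changed = False
        for p, t in pairs:
            e = t - hardlim(sum(wi * pi for wi, pi in zip(W, p)) + b)
            if e:
                W = [wi + e * pi for wi, pi in zip(W, p)]
                b += e
                changed = True
        if not changed:
            break
    return W, b

pairs = [([2, 2], 0), ([1, -2], 1), ([-2, 2], 0), ([-1, 1], 1)]
W, b = train([0.0, 0.0], 0.0, pairs)
print(W, b)  # [-2.0, -3.0] 1.0, matching the hand calculation
```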
Next we apply the perceptron learning rule to the four-class problem of Problem P4.3. The input/target pairs are

    p_1 = [1; 1], t_1 = [0; 0]      p_2 = [1; 2], t_2 = [0; 0]
    p_3 = [2; -1], t_3 = [0; 1]     p_4 = [2; 0], t_4 = [0; 1]
    p_5 = [-1; 2], t_5 = [1; 0]     p_6 = [-2; 1], t_6 = [1; 0]
    p_7 = [-1; -1], t_7 = [1; 1]    p_8 = [-2; -2], t_8 = [1; 1].

Let's begin the algorithm with the following initial weights and biases:

    W(0) = [1  0; 0  1],  b(0) = [1; 1].

The first iteration presents p_1:

    a = hardlim(W(0)p_1 + b(0)) = hardlim([1  0; 0  1][1; 1] + [1; 1]) = hardlim([2; 2]) = [1; 1],
    e = t_1 - a = [0; 0] - [1; 1] = [-1; -1],

    W(1) = W(0) + e p_1^T = [1  0; 0  1] + [-1; -1][1  1] = [0  -1; -1  0],

    b(1) = b(0) + e = [1; 1] + [-1; -1] = [0; 0].
The second iteration presents p_2:

    a = hardlim(W(1)p_2 + b(1)) = hardlim([0  -1; -1  0][1; 2] + [0; 0]) = hardlim([-2; -1]) = [0; 0],

    e = t_2 - a = [0; 0] - [0; 0] = [0; 0],

    W(2) = W(1) + e p_2^T = [0  -1; -1  0],

    b(2) = b(1) + e = [0; 0].
The third iteration presents p_3:

    a = hardlim(W(2)p_3 + b(2)) = hardlim([0  -1; -1  0][2; -1] + [0; 0]) = hardlim([1; -2]) = [1; 0],

    e = t_3 - a = [0; 1] - [1; 0] = [-1; 1],

    W(3) = W(2) + e p_3^T = [0  -1; -1  0] + [-1; 1][2  -1] = [-2  0; 1  -1],
    b(3) = b(2) + e = [0; 0] + [-1; 1] = [-1; 1].
The fourth through eighth iterations present p_4 through p_8; each is classified correctly, so no changes are made and W(8) = W(3) = [-2  0; 1  -1], b(8) = b(3) = [-1; 1]. The ninth iteration presents p_1 again:

    a = hardlim(W(8)p_1 + b(8)) = hardlim([-2  0; 1  -1][1; 1] + [-1; 1]) = hardlim([-3; 1]) = [0; 1],

    e = t_1 - a = [0; 0] - [0; 1] = [0; -1],

    W(9) = W(8) + e p_1^T = [-2  0; 1  -1] + [0; -1][1  1] = [-2  0; 0  -2],

    b(9) = b(8) + e = [-1; 1] + [0; -1] = [-1; 0].
At this point the algorithm has converged, since all input patterns will be correctly classified. The final decision boundaries are displayed in Figure P4.7. Compare this result with the network we designed in Problem P4.3.
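The multiple-neuron version of the rule (Eqs. (4.38) and (4.39)) can be sketched in Python (our own function names) and run on this problem from the same initial conditions; it converges to W = [-2  0; 0  -2], b = [-1; 0], matching the hand calculation:

```python
def hardlim(n):
    return 1 if n >= 0 else 0

def forward(W, b, p):
    """Layer output: a_i = hardlim(iw . p + b_i) for each neuron i."""
    return [hardlim(sum(wij * pj for wij, pj in zip(row, p)) + bi)
            for row, bi in zip(W, b)]

def train(W, b, pairs, max_passes=50):
    """Multi-neuron perceptron rule: W <- W + e p^T, b <- b + e."""
    for _ in range(max_passes):
        changed = False
        for p, t in pairs:
            e = [ti - ai for ti, ai in zip(t, forward(W, b, p))]
            if any(e):
                W = [[wij + ei * pj for wij, pj in zip(row, p)]
                     for row, ei in zip(W, e)]
                b = [bi + ei for bi, ei in zip(b, e)]
                changed = True
        if not changed:
            break
    return W, b

pairs = [([1, 1], [0, 0]), ([1, 2], [0, 0]), ([2, -1], [0, 1]), ([2, 0], [0, 1]),
         ([-1, 2], [1, 0]), ([-2, 1], [1, 0]), ([-1, -1], [1, 1]), ([-2, -2], [1, 1])]
W, b = train([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], pairs)
print(W, b)  # [[-2.0, 0.0], [0.0, -2.0]] [-1.0, 0.0]
```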
Epilogue
In this chapter we have introduced our first learning rule: the perceptron learning rule. It is a type of learning called supervised learning, in which the learning rule is provided with a set of examples of proper network behavior. As each input is applied to the network, the learning rule adjusts the network parameters so that the network output will move closer to the target.
The perceptron learning rule is very simple, but it is also quite powerful. We have shown that the rule will always converge to a correct solution, if such a solution exists. The weakness of the perceptron network lies not with the learning rule, but with the structure of the network. The standard perceptron is only able to classify vectors that are linearly separable. We will see in Chapter 11 that the perceptron architecture can be generalized to multilayer perceptrons, which can solve arbitrary classification problems. The backpropagation learning rule, which is introduced in Chapter 11, can be used to train these networks.
In Chapters 3 and 4 we have used many concepts from the field of linear algebra, such as inner product, projection, distance (norm), etc. We will find in later chapters that a good foundation in linear algebra is essential to our understanding of all neural networks. In Chapters 5 and 6 we will review some of the key concepts from linear algebra that will be most important in our study of neural networks. Our objective will be to obtain a fundamental understanding of how neural networks work.
Further Reading
[BaSu83] A. Barto, R. Sutton and C. Anderson, "Neuron-like adaptive elements can solve difficult learning control problems," IEEE Transactions on Systems, Man and Cybernetics, Vol. 13, No. 5, pp. 834-846, 1983.
A classic paper in which a reinforcement learning algorithm is used to train a neural network to balance an inverted pendulum.
[Brog91] W. L. Brogan, Modern Control Theory, 3rd Ed., Englewood Cliffs, NJ: Prentice-Hall, 1991.

A well-written book on the subject of linear systems. The first half of the book is devoted to linear algebra. It also has good sections on the solution of linear differential equations and the stability of linear and nonlinear systems. It has many worked problems.
[McPi43] W. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133, 1943.
This article introduces the first mathematical model of a neuron, in which a weighted sum of input signals is compared to a threshold to determine whether or not the neuron fires.
[MiPa69] M. Minsky and S. Papert, Perceptrons, Cambridge, MA:
MIT Press, 1969.
A landmark book that contains the first rigorous study devoted to determining what a perceptron network is capable of learning. A formal treatment of the perceptron was needed both to explain the perceptron's limitations and to indicate directions for overcoming them. Unfortunately, the book pessimistically predicted that the limitations of perceptrons indicated that the field of neural networks was a dead end. Although this was not true, it temporarily cooled research and funding for research for several years.
[Rose58] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychological Review, Vol. 65, pp. 386-408, 1958.

This paper presents the first practical artificial neural network: the perceptron.
Exercises
E4.1 Consider the classification problem defined below:
    p_1 = [-1; 1], t_1 = 1    p_2 = [0; 0], t_2 = 1    p_3 = [1; -1], t_3 = 1    p_4 = [1; 0], t_4 = 0
    p_5 = [0; 1], t_5 = 0.
E4.2 Consider the classification problem defined below:

    p_1 = [-1; 1], t_1 = 1    p_2 = [-1; -1], t_2 = 1    p_3 = [0; 0], t_3 = 0    p_4 = [1; 0], t_4 = 0.

    i. Design a single-neuron perceptron to solve this problem graphically.

    ii. Test your solution on the four input/target pairs above.

    iii. Classify the following input vectors with your solution:

    p_5 = [-2; 0]    p_6 = [1; 1]    p_7 = [0; 1]    p_8 = [-1; -2]
iv. Which of the vectors in part (iii) will always be classified the same
way, regardless of the solution values for W and b ? Which may
vary depending on the solution? Why?
E4.3 Solve the classification problem in Exercise E4.2 by solving inequalities (as
in Problem P4.2), and repeat parts (ii) and (iii) with the new solution. (The
solution is more difficult than Problem P4.2, since you can't isolate the
weights and biases in a pairwise manner.)
E4.4 Solve the classification problem in Exercise E4.2 by applying the perceptron rule to the following initial parameters, and repeat parts (ii) and (iii) with the new solution.
    W(0) = [0  0],  b(0) = 0.
E4.5 Prove mathematically (not graphically) that the following problem is unsolvable for a two-input/single-neuron perceptron.
    p_1 = [-1; 1], t_1 = 1    p_2 = [-1; -1], t_2 = 0    p_3 = [1; -1], t_3 = 1    p_4 = [1; 1], t_4 = 0
E4.7 The vectors in the ordered set defined below were obtained by measuring the weight and ear lengths of toy rabbits and bears in the Fuzzy Wuzzy Animal Factory. The target values indicate whether the respective input vector was taken from a rabbit (0) or a bear (1). The first element of the input vector is the weight of the toy, and the second element is the ear length.

    p_1 = [1; 4], t_1 = 0    p_2 = [1; 5], t_2 = 0    p_3 = [2; 4], t_3 = 0    p_4 = [2; 5], t_4 = 0
    p_5 = [3; 1], t_5 = 1    p_6 = [3; 2], t_6 = 1    p_7 = [4; 1], t_7 = 1    p_8 = [4; 2], t_8 = 1
Now suppose that p_3 is changed to

    p_3 = [2; 2].
    i. Is the problem still linearly separable? Demonstrate your answer graphically.

    ii. Use MATLAB to initialize and train a network to solve this problem. Explain your results.

    iii. If p_3 is changed to

        p_3 = [2; 1.5],

    is the problem still linearly separable?
E4.9 One variation of the perceptron learning rule is

    W_new = W_old + α e p^T
    b_new = b_old + α e,

where α is called the learning rate. Prove that this algorithm also converges. Does the proof require a limit on the learning rate? Explain.