The Multilayer Perceptron
The artificial neuron was first proposed in 1943 by McCulloch and Pitts. It consists of a summing function with an internal threshold and "weighted" inputs, as shown below.
"or a neuron recei#ing n inputs, each input xi $ i ranging from 1 to n% is weighted by multiplying it with a weight wi . The sum of the wixi products gi#es the net acti#ation of the neuron. This acti#ation #alue is sub&ected to a transfer function to produce the neuron's output. The weight #alue of the connection or lin( carrying signals from a neuron i to a neuron j is termed wij..
[Figure: a neuron j receiving weighted inputs over links with weights wij]
Transfer functions

One of the design issues for ANNs is the type of transfer function used to compute the output of a node from its net activation. Among the popular transfer functions are:

- Step function
- Signum function
- Sigmoid function
- Hyperbolic tangent function

In the step function, the neuron produces an output only when its net activation reaches a minimum value T, known as the threshold. For a binary neuron i, whose output is a 0 or 1 value, the step function can be summarised as:

outputi = 1 if activationi >= T
outputi = 0 otherwise
Another type of transfer function is the signum function:

outputi = +1 if activationi > 0
outputi =  0 if activationi = 0
outputi = -1 if activationi < 0

The sigmoid transfer function produces a continuous value in the range 0 to 1. It has the form:

outputi = 1 / (1 + e^(-gain . activationi))
The parameter gain is determined by the system designer. It affects the slope of the transfer function around zero. The multilayer perceptron uses the sigmoid as its transfer function. A variant of the sigmoid transfer function is the hyperbolic tangent function. It has the form:
outputi = (e^u - e^(-u)) / (e^u + e^(-u))

where u = gain . activationi. This function has a shape similar to the sigmoid (shaped like an S), with the difference that the value of outputi ranges between -1 and 1.
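These four transfer functions can be sketched in a few lines of Python (a minimal illustration; the function names are ours, and gain and the threshold T are the parameters described above):

import math

def step(activation, T=0.0):
    return 1 if activation >= T else 0

def signum(activation):
    return (activation > 0) - (activation < 0)  # +1, 0 or -1

def sigmoid(activation, gain=1.0):
    return 1.0 / (1.0 + math.exp(-gain * activation))

def hyperbolic_tangent(activation, gain=1.0):
    u = gain * activation
    return (math.exp(u) - math.exp(-u)) / (math.exp(u) + math.exp(-u))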
[Figure: plots of the step, signum, sigmoid and hyperbolic tangent transfer functions against activationi]
The error function Ep is defined to be proportional to the square of the difference between the target tpj and the output opj:

Ep = 1/2 Σj (tpj - opj)²     (1)

The activation of each unit j, for pattern p, can be written as:

netpj = Σi wij opi     (2)

The output from each unit j is determined by the non-linear transfer function fj:

opj = fj(netpj)

We assume fj to be the sigmoid function, f(net) = 1 / (1 + e^(-k.net)), where k is a positive constant that controls the "spread" of the function.

The delta rule implements weight changes that follow the path of steepest descent on a surface in weight space. The height of any point on this surface is equal to the error measure Ep. This can be shown by demonstrating that the derivative of the error measure with respect to each weight is proportional to the weight change dictated by the delta rule, with a negative constant of proportionality, i.e.:

Δpwij ∝ -∂Ep/∂wij
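One step worth making explicit: this sigmoid has the closed-form derivative f'(net) = k.f(net).(1 - f(net)), which is where the k.opj.(1 - opj) factors in the error terms below come from. A quick numerical check (the values here are arbitrary):

import math

k = 2.0
def f(net):
    return 1.0 / (1.0 + math.exp(-k * net))

net, h = 0.3, 1e-6
numeric = (f(net + h) - f(net - h)) / (2 * h)  # central-difference estimate
closed = k * f(net) * (1 - f(net))             # k.f.(1 - f)
print(abs(numeric - closed) < 1e-9)            # True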
The multilayer perceptron learning algorithm using the generalised delta rule
1. Initialise weights (to small random values) and the transfer function.
2. Present an input pattern.
3. Adjust the weights, starting from the output layer and working backwards:

wij(t+1) = wij(t) + η δpj opi

wij(t) represents the weight from node i to node j at time t, η is a gain term, and δpj is an error term for pattern p on node j.

For output layer units:

δpj = k opj (1 - opj)(tpj - opj)

For hidden layer units:

δpj = k opj (1 - opj) Σk δpk wjk

where the sum is over the k nodes in the following layer.

The learning rule in a multilayer perceptron is not guaranteed to produce convergence, and it is possible for the network to fall into a situation (the so-called local minima) in which it is unable to learn the correct output.
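A minimal sketch of steps 1 to 3 in Python for a single hidden layer (illustrative only: the layer sizes, eta, k and the weight ranges are assumed values, and a real run would present many different patterns over many iterations):

import math, random

def sigmoid(net, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * net))

def train_pattern(x, target, w_ih, w_ho, eta=0.5, k=1.0):
    # Forward pass: input -> hidden -> output
    hidden = [sigmoid(sum(xi * w_ih[i][j] for i, xi in enumerate(x)), k)
              for j in range(len(w_ih[0]))]
    output = [sigmoid(sum(hj * w_ho[j][m] for j, hj in enumerate(hidden)), k)
              for m in range(len(w_ho[0]))]
    # Error terms: output layer first, then hidden (working backwards)
    delta_o = [k * o * (1 - o) * (t - o) for o, t in zip(output, target)]
    delta_h = [k * h * (1 - h) *
               sum(delta_o[m] * w_ho[j][m] for m in range(len(delta_o)))
               for j, h in enumerate(hidden)]
    # Weight updates: wij(t+1) = wij(t) + eta . delta_pj . o_pi
    for j in range(len(hidden)):
        for m in range(len(delta_o)):
            w_ho[j][m] += eta * delta_o[m] * hidden[j]
    for i in range(len(x)):
        for j in range(len(hidden)):
            w_ih[i][j] += eta * delta_h[j] * x[i]

# 1. Initialise weights to small random values (2 inputs, 3 hidden, 1 output)
random.seed(0)
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_ho = [[random.uniform(-0.5, 0.5)] for _ in range(3)]
# 2-3. Present an input pattern and adjust the weights
for _ in range(1000):
    train_pattern([1.0, 0.0], [1.0], w_ih, w_ho)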
The local minima problem can be countered by:

- Lowering the gain term progressively
- Addition of more nodes for better representation of patterns
- Introduction of a momentum term, which determines the effect of past weight changes on the current direction of movement in weight space:

wij(t+1) = wij(t) + η δpj opi + α (wij(t) - wij(t-1))

where α is the momentum term, 0 < α < 1.
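As an illustrative fragment (alpha, prev_dw and the other names are assumptions extending the earlier sketch, not part of the notes), the momentum version of the update for one weight layer could look like:

def momentum_update(w, delta, out, prev_dw, eta=0.5, alpha=0.9):
    # w[j][m]: weight from node j to node m; prev_dw holds the last changes
    for j in range(len(w)):
        for m in range(len(w[j])):
            dw = eta * delta[m] * out[j] + alpha * prev_dw[j][m]
            w[j][m] += dw
            prev_dw[j][m] = dw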
"ault Tolerance +eural networ(s are highly fault tolerant. This characteristic is also (nown as !graceful degradation!. Cecause of its distributed nature, a neural networ( (eeps on wor(ing e#en when a significant fraction of its neurons and interconnections fail. *lso, relearning after damage can be relati#ely =uic(.
"or many of the applications of neural networ(s, the underlying principle is that of pattern recognition. Target identification from sonar echoes has been de#eloped. 9i#en only a day of training, the net produced 100G correct identification of the target, compared to 93G scored by a Cayesian classifier. There are many commercial applications of networ(s in character recognition. )ne such system performs signature #erification on ban( che=ues. +etwor(s ha#e been applied to the problems of aircraft identification, and to terrain matching for automatic na#igation.
The multilayer perceptron also has its problems:

1. Large number of iterations required for learning; not suitable for real-time learning.
2. No guaranteed solution. Remedies such as the "momentum term" add to computational cost. Other remedies: using estimates of transfer functions; using transfer functions with easy-to-compute derivatives; using estimates of error values, e.g., a single global error value for the hidden layer.
3. Scaling problem: networks do not scale up well from small research systems to larger real systems. Both too many and too few units slow down learning.
The question one might ask at this point is: does an effective system need to mimic nature exactly?
:6"6:6+C6Ceale, :., J Kac(son, T., !+eural Computing, *n ntroduction!, Cristol , .ilger, c1990. $Contains full deri#ation of the generalised delta rule. *#ailable at Murdoch library%