The Multilayer Perceptron

The document summarizes the artificial neuron model and multilayer perceptrons. It discusses how artificial neurons operate by weighting inputs and applying a transfer function to calculate the output. Multilayer perceptrons connect neurons in layers to solve nonlinearly separable problems. They use continuous transfer functions like sigmoid and learn through backpropagation of errors between layers. Perceptrons can represent complex decision regions and are fault tolerant due to their distributed nature. Applications discussed include speech synthesis, financial analysis, and more.

The artificial neuron revisited

The synthetic or artificial neuron, which is a simple model of the biological neuron, was first proposed in 1943 by McCulloch and Pitts. It consists of a summing function with an internal threshold, and "weighted" inputs as shown below.

"or a neuron recei#ing n inputs, each input xi $ i ranging from 1 to n% is weighted by multiplying it with a weight wi . The sum of the wixi products gi#es the net acti#ation of the neuron. This acti#ation #alue is sub&ected to a transfer function to produce the neuron's output. The weight #alue of the connection or lin( carrying signals from a neuron i to a neuron j is termed wij..
[Figure: an artificial neuron with weighted inputs; the link carrying signals into neuron j has weight wij, and the arrow shows the direction of signal flow.]
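As a concrete illustration of the weighting and thresholding just described, here is a minimal Python sketch; the function name, weight values and threshold are invented for the example and are not taken from the text.

    # Minimal sketch of an artificial neuron: weighted sum plus a threshold (step) output.
    def step_neuron(inputs, weights, threshold):
        # net activation: sum of the wi.xi products
        activation = sum(x * w for x, w in zip(inputs, weights))
        # step transfer function with threshold T
        return 1 if activation > threshold else 0

    # Example with three inputs (values chosen arbitrarily)
    print(step_neuron([1, 0, 1], [0.5, -0.2, 0.3], threshold=0.6))   # prints 1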

Transfer functions

One of the design issues for ANNs is the type of transfer function used to compute the output of a node from its net activation. Among the popular transfer functions are:

- Step function
- Signum function
- Sigmoid function
- Hyperbolic tangent function

In the step function, the neuron produces an output only when its net activation reaches a minimum value, known as the threshold. For a binary neuron i, whose output is a 0 or 1 value, the step function can be summarised as:

outputi = 0 if activationi ≤ T
outputi = 1 if activationi > T


When the threshold T is 0, the step function is called the signum function.


Another form of the signum function is:

outputi = 1 if activationi > 0
outputi = 0 if activationi = 0
outputi = -1 if activationi < 0

The sigmoid transfer function produces a continuous value in the range 0 to 1. It has the form:

outputi = 1 / (1 + e^(-gain.activationi))

The parameter gain is determined by the system designer. It affects the slope of the transfer function around zero. The multilayer perceptron uses the sigmoid as the transfer function. A variant of the sigmoid transfer function is the hyperbolic tangent function. It has the form:

outputi = (e^u - e^(-u)) / (e^u + e^(-u)), where u = gain.activationi

This function has a shape similar to the sigmoid (shaped like an S), with the difference that the value of outputi ranges between -1 and 1.


[Figure: Functional form of the transfer functions. Four panels (Step function, Signum, Sigmoid, Hyperbolic Tangent), each plotting outputi against activationi.]
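The two continuous transfer functions can be written straight from the formulas above. This is a small illustrative sketch (the default gain of 1.0 is an assumption, not something fixed by the text).

    import math

    def sigmoid(activation, gain=1.0):
        # continuous output in the range 0 to 1; larger gain gives a steeper slope around zero
        return 1.0 / (1.0 + math.exp(-gain * activation))

    def hyperbolic_tangent(activation, gain=1.0):
        # same S shape as the sigmoid, but the output ranges between -1 and 1
        u = gain * activation
        return (math.exp(u) - math.exp(-u)) / (math.exp(u) + math.exp(-u))

    print(sigmoid(0.0))              # 0.5
    print(hyperbolic_tangent(0.0))   # 0.0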

Introduction to the multilayer perceptron

To be able to solve nonlinearly separable problems, a number of neurons are connected in layers to build a multilayer perceptron. Each of the perceptrons is used to identify small linearly separable sections of the inputs. Outputs of the perceptrons are combined into another perceptron to produce the final output. The hard-limiting (step) function used for producing the output prevents information on the real inputs flowing on to inner neurons. To solve this problem, the step function is replaced with a continuous function, usually the sigmoid function.


The Architecture of the Multilayer Perceptron

In a multilayer perceptron, the neurons are arranged into an input layer, an output layer and one or more hidden layers.
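The following sketch pushes an input vector through one hidden layer and one output layer using the sigmoid transfer function, as an illustration of this layered arrangement; the layer sizes and weight values are invented for the example.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def layer_forward(inputs, weights):
        # each node weights all of its inputs, sums them and applies the sigmoid
        return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
                for node_weights in weights]

    # Illustrative 2-3-1 network: 2 inputs, 3 hidden nodes, 1 output node
    hidden_weights = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.6]]   # one row of weights per hidden node
    output_weights = [[0.3, -0.8, 0.5]]                       # one row of weights per output node

    x = [1.0, 0.0]
    hidden = layer_forward(x, hidden_weights)
    output = layer_forward(hidden, output_weights)
    print(output)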

The Generalised Delta Rule

The learning rule for the multilayer perceptron is known as "the generalised delta rule" or the "backpropagation rule". The generalised delta rule repetitively calculates an error function for each input and backpropagates the error from one layer to the previous one. The weights for a particular node are adjusted in direct proportion to the error in the units to which it is connected.

Let
  Ep  = error function for pattern p
  tpj = target output for pattern p on node j
  opj = actual output for pattern p on node j
  wij = weight from node i to node j

The error function Ep is defined to be proportional to the square of the difference tpj - opj:

Ep = 1/2 Σj (tpj - opj)²    (1)

The activation of each unit j, for pattern p, can be written as:

netpj = Σi wij opi    (2)

The output from each unit j is determined by the non-linear transfer function fj:

opj = fj(netpj)

We assume fj to be the sigmoid function, f(net) = 1 / (1 + e^(-k.net)), where k is a positive constant that controls the "spread" of the function.

The delta rule implements weight changes that follow the path of steepest descent on a surface in weight space. The height of any point on this surface is equal to the error measure Ep. This can be shown by demonstrating that the derivative of the error measure with respect to each weight is proportional to the weight change dictated by the delta rule, with a negative constant of proportionality, i.e.,

Δpwij ∝ -∂Ep/∂wij
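As a brief worked step (not spelled out above, but following directly from the sigmoid assumed for fj), differentiating f shows where the factor k opj(1 - opj) in the error terms of the algorithm below comes from:

f'(net) = k.e^(-k.net) / (1 + e^(-k.net))² = k.f(net).(1 - f(net))

so that ∂opj/∂netpj = k opj(1 - opj).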

The multilayer perceptron learning algorithm using the generalised delta rule
1. Initialise weights (to small random values) and transfer function
2. Present input
3. Adjust weights by starting from the output layer and working backwards:

   wij(t + 1) = wij(t) + η δpj opi

   wij(t) represents the weight from node i to node j at time t, η is a gain term, and δpj is an error term for pattern p on node j.

   For output layer units:
   δpj = k opj(1 - opj)(tpj - opj)

   For hidden layer units:
   δpj = k opj(1 - opj) Σk δpk wjk

   where the sum is over the k nodes in the following layer.

The learning rule in a multilayer perceptron is not guaranteed to produce convergence, and it is possible for the network to fall into a situation (the so-called local minima) in which it is unable to learn the correct output.
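The algorithm above can be sketched as code. The following is a minimal, illustrative single-pattern training step for a network with one hidden layer, using the sigmoid with k = 1; the variable names, learning rate and toy data are assumptions made for the example.

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train_step(x, target, w_hidden, w_output, eta=0.5, k=1.0):
        # Forward pass through hidden and output layers
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        output = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_output]

        # Error terms for output layer units: delta = k.opj(1 - opj)(tpj - opj)
        delta_out = [k * o * (1 - o) * (t - o) for o, t in zip(output, target)]
        # Error terms for hidden layer units: delta = k.opj(1 - opj) * sum over the following layer
        delta_hid = [k * h * (1 - h) *
                     sum(d * w_output[j][i] for j, d in enumerate(delta_out))
                     for i, h in enumerate(hidden)]

        # Weight updates: wij(t + 1) = wij(t) + eta * delta_pj * o_pi
        for j, d in enumerate(delta_out):
            for i, h in enumerate(hidden):
                w_output[j][i] += eta * d * h
        for j, d in enumerate(delta_hid):
            for i, xi in enumerate(x):
                w_hidden[j][i] += eta * d * xi
        return output

    # Toy usage: 2 inputs, 2 hidden nodes, 1 output node, trained on a single pattern
    random.seed(0)
    w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
    w_output = [[random.uniform(-0.5, 0.5) for _ in range(2)]]
    for _ in range(1000):
        train_step([1.0, 0.0], [1.0], w_hidden, w_output)
    print(train_step([1.0, 0.0], [1.0], w_hidden, w_output))   # output approaches the target 1.0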


Multilayer Perceptrons as Classifiers

The single layer perceptron is limited to calculating a single line of separation between classes. Let us consider a two layer perceptron with two units in the input layer. If one unit is set to respond with a 1 if the input is above its decision line, and the other responds with a 1 if the input is below its decision line, then the second layer produces a solution in the form of a 1 if its input is above line 1 and below line 2.

[Fig. A 2-layer perceptron and the resulting decision region, bounded by line 1 and line 2.]
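A minimal sketch of the idea in the figure: two first-layer threshold units each test the input point against one line (the particular lines here are invented for the example), and a second-layer unit combines them with an AND so that it fires only for points above line 1 and below line 2.

    def step(x):
        return 1 if x > 0 else 0

    # First-layer units: each tests which side of a line the point (x, y) lies on.
    # Line 1: y = x      -> this unit fires for points above the line (y - x > 0)
    # Line 2: y = x + 2  -> this unit fires for points below the line (x + 2 - y > 0)
    def above_line_1(x, y):
        return step(y - x)

    def below_line_2(x, y):
        return step(x + 2 - y)

    # Second-layer unit: an AND of the two first-layer outputs,
    # so it fires only when the point lies between the two lines.
    def second_layer(a, b):
        return step(a + b - 1.5)

    for point in [(0.0, 1.0), (0.0, 3.0), (1.0, 0.0)]:
        a, b = above_line_1(*point), below_line_2(*point)
        print(point, "->", second_layer(a, b))   # only (0.0, 1.0) falls inside the region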

A three layer perceptron can therefore produce arbitrarily shaped decision regions, and is capable of separating any classes. This statement is referred to as the Kolmogorov theorem. Considering pattern recognition as a mapping function from unknown inputs to known classes, any function, no matter how complex, can be represented by a multilayer perceptron of no more than three layers.

The energy landscape

The behaviour of a neural network as it attempts to arrive at a solution can be visualised in terms of the error or energy function Ep. The energy is a function of the input and the weights. For a given pattern, Ep can be plotted against the weights to give the so-called energy surface. The energy surface is a landscape of hills and valleys, with points of minimum energy corresponding to wells and points of maximum energy found on peaks. The generalised delta rule aims to minimise Ep by adjusting weights so that they correspond to points of lowest energy. It does this by the method of gradient descent, where the changes are made in the steepest downward direction. All possible solutions are depressions in the energy surface, known as basins of attraction.

Learning Difficulties in Multilayer Perceptrons

Occasionally, the multilayer perceptron fails to settle into the global minimum of the energy surface and instead finds itself in one of the local minima. This is due to the gradient descent strategy followed. A number of alternative approaches can be taken to reduce this possibility:

- Lowering the gain term progressively
- Addition of more nodes for better representation of patterns

- Introduction of a momentum term α, which determines the effect of past weight changes on the current direction of movement in weight space (a small sketch of this update follows this list):

  wij(t + 1) = wij(t) + η δpj opi + α (wij(t) - wij(t - 1))

  where the momentum term satisfies 0 < α < 1.

- Addition of random noise to perturb the system out of a local minimum.
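A minimal sketch of the momentum update described in the list above; the values of eta, alpha and the gradient terms are illustrative only.

    # Momentum: remember the previous weight change and add a fraction alpha of it to the
    # current change, i.e. wij(t + 1) = wij(t) + eta.delta.o + alpha.(wij(t) - wij(t - 1)).
    def update_with_momentum(w, delta_times_o, prev_change, eta=0.25, alpha=0.9):
        change = eta * delta_times_o + alpha * prev_change
        return w + change, change

    w, prev = 0.1, 0.0
    for grad_term in [0.4, 0.35, 0.3]:        # successive delta_pj * o_pi values (made up)
        w, prev = update_with_momentum(w, grad_term, prev)
        print(round(w, 4))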

Advantages of Multilayer Perceptrons

The following two features characterise multilayer perceptrons, and artificial neural networks in general. They are mainly responsible for the "edge" these networks have over conventional computing systems.

Generalisation
Neural networks are capable of generalisation; that is, they classify an unknown pattern with other known patterns that share the same distinguishing features. This means noisy or incomplete inputs will be classified because of their similarity with pure and complete inputs.

"ault Tolerance +eural networ(s are highly fault tolerant. This characteristic is also (nown as !graceful degradation!. Cecause of its distributed nature, a neural networ( (eeps on wor(ing e#en when a significant fraction of its neurons and interconnections fail. *lso, relearning after damage can be relati#ely =uic(.


Applications of Multilayer Perceptrons

The multilayer perceptron with backpropagation has been applied in numerous applications ranging from OCR (Optical Character Recognition) to medicine. Brief accounts of a few are given below.

Speech synthesis
A very well known use of the multilayer perceptron is NETtalk [1], a text-to-speech conversion system developed by Sejnowski and Rosenberg in 1987. It consists of 203 input units, 120 hidden units, and 26 output units, with over 27,000 synapses. Each output unit represents one basic unit of sound, known as a phoneme. Context is utilised in training by presenting seven successive letters to the input, and the net learns to pronounce the middle letter. 90% correct pronunciation was achieved with the training set (80-87% with an unseen set). The network is resistant to damage and displays graceful degradation. Multilayer perceptrons are also being used for speech recognition in voice-activated control systems.

Financial applications
Examples include bond rating, loan application evaluation and stock market prediction. Bond rating involves categorising the bond issuer's capability. There are no hard and fast rules for determining these ratings. Statistical regression is inappropriate because the factors to be used are not well defined. Neural networks trained with backpropagation have consistently outperformed standard statistical techniques [1].

Pattern Recognition


"or many of the applications of neural networ(s, the underlying principle is that of pattern recognition. Target identification from sonar echoes has been de#eloped. 9i#en only a day of training, the net produced 100G correct identification of the target, compared to 93G scored by a Cayesian classifier. There are many commercial applications of networ(s in character recognition. )ne such system performs signature #erification on ban( che=ues. +etwor(s ha#e been applied to the problems of aircraft identification, and to terrain matching for automatic na#igation.

Limitations of Multilayer Perceptrons

1. Computationally expensive learning process

A large number of iterations is required for learning, so the method is not suitable for real-time learning.

2. No guaranteed solution

Remedies such as the "momentum term" add to the computational cost. Other remedies include:
- using estimates of transfer functions
- using transfer functions with easy-to-compute derivatives
- using estimates of error values, e.g., a single global error value for the hidden layer

3. Scaling problem

Multilayer perceptrons do not scale up well from small research systems to larger real systems. Both too many and too few units slow down learning.

Biological arguments against Backpropagation

Backpropagation is not used, or is used through different pathways, in biological systems.

Biological systems use only local information for self-adjustments.

The question one might ask at this point is: does an effective system need to mimic nature exactly?



:6"6:6+C6Ceale, :., J Kac(son, T., !+eural Computing, *n ntroduction!, Cristol , .ilger, c1990. $Contains full deri#ation of the generalised delta rule. *#ailable at Murdoch library%


