A Basic Introduction To Neural Networks
In "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989
Artificial neural networks (ANNs) are processing devices (algorithms or actual hardware)
that are loosely modeled after the neuronal structure of the mammalian cerebral cortex
but on much smaller scales. A large ANN might have hundreds or thousands of processor
units, whereas a mammalian brain has billions of neurons with a corresponding increase
in magnitude of their overall interaction and emergent behavior. Although most ANN
researchers are not concerned with whether their networks accurately resemble
biological systems, some are. For example, researchers have accurately simulated the
function of the retina and modeled the eye rather well.
Although the mathematics involved with neural networking is not a trivial matter, a user
can rather easily gain at least an operational understanding of a network's structure and
function. Most ANNs contain some form of 'learning rule' that modifies the weights of the
connections according to the input patterns the network is presented with. In a sense, ANNs
learn by example as do their biological counterparts; a child learns to recognize dogs
from examples of dogs.
Although there are many different kinds of learning rules used by neural networks, this
demonstration is concerned with only one: the delta rule. The delta rule is often used
by the most common class of ANNs, called 'backpropagational neural networks'
(BPNNs). Backpropagation is an abbreviation for the backwards propagation of error.
With the delta rule, as with other types of backpropagation, 'learning' is a supervised
process that occurs with each cycle or 'epoch' (i.e. each time the network is presented
with a new input pattern) through a forward activation flow of outputs, and the
backwards error propagation of weight adjustments. More simply, when a neural network
is initially presented with a pattern it makes a random 'guess' as to what it might be. It
then sees how far its answer was from the actual one and makes an appropriate
adjustment to its connection weights. More graphically, the process looks something like
this:
Note also that within each hidden layer node is a sigmoidal activation function, which
polarizes network activity and helps it to stabilize.
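The guess-and-adjust cycle described above can be sketched for the simplest possible case: a single sigmoidal unit trained with the delta rule. This is a minimal illustration, not a full backpropagational network. The function names, the learning rate of 0.5, the number of passes, and the choice of the logical OR patterns as training data are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    """Logistic activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, inputs):
    """Forward activation: weighted sum of inputs plus bias, then sigmoid."""
    x = list(inputs) + [1.0]  # constant 1.0 feeds the bias weight
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_delta_rule(patterns, beta=0.5, passes=5000, seed=0):
    """Train one sigmoid unit with the delta rule.

    patterns: list of (inputs, target) pairs.
    beta: learning rate (hypothetical value chosen for this sketch).
    Returns the learned weights; the last weight is the bias.
    """
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    # Small random starting weights: the network's initial 'guess'.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(passes):
        for inputs, target in patterns:
            x = list(inputs) + [1.0]
            output = predict(weights, inputs)
            # Error scaled by the sigmoid's slope, output * (1 - output).
            delta = (target - output) * output * (1.0 - output)
            # Adjust each weight in proportion to its input.
            weights = [w + beta * delta * xi for w, xi in zip(weights, x)]
    return weights


# Hypothetical training set: the logical OR of two inputs.
OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = train_delta_rule(OR_PATTERNS)
for inputs, target in OR_PATTERNS:
    print(inputs, target, round(predict(weights, inputs), 3))
```

After training, the unit's output should fall below 0.5 for the pattern (0, 0) and above 0.5 for the other three patterns. A network with hidden layers applies the same idea, but passes each node's share of the error backwards through the layers.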
Backpropagation performs a gradient descent within the solution's vector space towards a
'global minimum' along the steepest vector of the error surface. The global minimum is
that theoretical solution with the lowest possible error. The error surface itself is a
hyperparaboloid but is seldom 'smooth' as is depicted in the graphic below. Indeed, in
most problems, the solution space is quite irregular with numerous 'pits' and 'hills' which
may cause the network to settle down in a 'local minimum' which is not the best overall
solution.
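The effect of a local minimum can be seen on a toy problem. The sketch below assumes a made-up one-weight error function with two 'pits' and runs plain gradient descent from two different starting weights. The function, the step size of 0.01, and the starting points are hypothetical values chosen only for illustration.

```python
def error(w):
    """Hypothetical 1-D error surface with two minima."""
    return w**4 - 3 * w**2 + w


def error_slope(w):
    """Derivative of error() with respect to the weight w."""
    return 4 * w**3 - 6 * w + 1


def descend(w, rate=0.01, steps=1000):
    """Plain gradient descent: repeatedly step downhill along the slope."""
    for _ in range(steps):
        w -= rate * error_slope(w)
    return w


w_left = descend(-2.0)   # starts on the left side of the surface
w_right = descend(2.0)   # starts on the right side of the surface
print(w_left, error(w_left))
print(w_right, error(w_right))
```

The run started at -2.0 settles near w = -1.30, where the error is about -3.5. The run started at 2.0 settles near w = 1.13, where the error is only about -1.1. The second run is stuck in a local minimum, which is why several runs from different starting weights are often needed.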
Since the nature of the error space cannot be known a priori, neural network analysis
often requires a large number of individual runs to determine the best solution. Most
learning rules have built-in mathematical terms that assist in this process by controlling
the 'speed' (Beta-coefficient) and the 'momentum' of the learning. The speed of learning is
actually the rate of convergence between the current solution and the global minimum.
Momentum helps the network to overcome obstacles (local minima) in the error surface
and settle down at or near the global minimum.
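One common way to write the speed and momentum terms is shown below. It is a sketch under stated assumptions: the two-weight error function, the rate of 0.1, and the momentum of 0.9 are hypothetical, and the error surface is a smooth bowl with a single minimum at (1, -2). The example therefore shows only how the two terms enter the weight update, not an escape from a local minimum.

```python
def gradient(w1, w2):
    """Gradient of the hypothetical error E = (w1 - 1)^2 + 2 * (w2 + 2)^2."""
    return 2.0 * (w1 - 1.0), 4.0 * (w2 + 2.0)


def descend_with_momentum(w1, w2, rate=0.1, momentum=0.9, steps=500):
    """Gradient descent where each step keeps part of the previous step."""
    v1 = v2 = 0.0  # previous weight changes, initially zero
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        # New change = momentum * old change - rate * current slope.
        v1 = momentum * v1 - rate * g1
        v2 = momentum * v2 - rate * g2
        w1 += v1
        w2 += v2
    return w1, w2


print(descend_with_momentum(5.0, 5.0))
```

Setting momentum to 0.0 reduces this to plain gradient descent. With a nonzero momentum, the weights keep moving in the direction of recent steps even where the local slope is small, which is the mechanism that can carry a network across shallow pits.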
It is also possible to over-train a neural network, which means that the network has been
trained to respond exactly to only one type of input, much like rote
memorization. If this should happen, then learning can no longer occur and the network is
referred to as having been "grandmothered" in neural network jargon. In real-world
applications this situation is not very useful, since one would need a separate
grandmothered network for each new kind of input.