
MODULE 2

NEURAL NETWORKS

Introduction

The recent rise of interest in neural networks has its roots in the recognition that the
brain performs computations in a different manner than do conventional digital computers.
Computers are extremely fast and precise at executing sequences of instructions that have
been formulated for them. The human information-processing system, by contrast, is composed of neurons
switching at speeds about a million times slower than computer gates. Yet humans are more
efficient than computers at computationally complex tasks such as speech understanding.
Moreover, not only humans but even animals can process visual information better than
the fastest computers.
Artificial neural systems, or neural networks (NN), are physical cellular systems, which
can acquire, store, and utilize experiential knowledge. The knowledge is in the form of stable
states or mappings embedded in networks that can be recalled in response to the presentation
of cues. Neural network processing typically involves large-scale problems in terms of
dimensionality, the amount of data handled, and the volume of simulation or neural hardware
processing. This large-scale approach is both essential and typical of real-life applications.
With all this in view, the research community has put considerable effort into designing and
implementing various neural network models for different applications.
Now let us formally define the basic idea of a neural network:

Definition: A neural network is a computing system made up of a number of simple,


highly interconnected nodes or processing elements, which process information by their
dynamic state response to external inputs.

Humans and Computers


Human beings are more intelligent than computers in many respects. Computers do only logical
things well; tasks such as solving crossword puzzles, handling vision problems, or controlling an
arm to pick something up require exceptionally complex techniques, and at such tasks human
beings do better than computers.
Computers are designed to carry out one instruction after another extremely rapidly,
whereas our brains work with many much slower units. While a computer can typically carry out
a few million operations every second, the units in the brain respond only about ten times per
second. However, they work on many different things at once, which a computer cannot do.
The computer is a high-speed, serial machine and is used as such, in contrast to the slow,
highly parallel nature of the brain. Counting and adding are essentially serial activities, with one
step done after another, and so the computer beats the brain every time. Vision and speech
recognition, on the other hand, are highly parallel problems, with many different and conflicting
inputs triggering many different and conflicting ideas and memories; it is only the combination of
all these factors that allows us to perform such feats. Our brains operate in parallel easily, and
here we leave computers far behind.
The conclusion we can draw from all of this is that the problems we are trying to solve are
immensely parallel ones.

History of artificial neural networks

• The field of neural networks is not new. The first formal definition of a synthetic
neuron model, based on highly simplified considerations of the biological model, was
proposed by McCulloch and Pitts in 1943. The McCulloch-Pitts (MP) neuron model
resembles what is known as a binary logic device.
• The next major development after the MP neuron model occurred in 1949, when
D.O. Hebb proposed a learning mechanism for the brain that became the starting point
for artificial neural network (ANN) learning (training) algorithms. He postulated that as
the brain learns, it changes its connectivity patterns.
• The idea of a learning mechanism was first incorporated in an ANN by F. Rosenblatt in 1958.
• By introducing the least mean squares (LMS) learning algorithm, Widrow and Hoff
developed in 1960 a model of a neuron that learned quickly and accurately. This model
was called ADALINE, for ADAptive LInear NEuron. The applications of ADALINE
and its extension MADALINE (Many ADALINEs) include pattern recognition,
weather forecasting, and adaptive control. The monograph on learning machines by
Nils Nilsson (1965) summarized the developments of that time.
• In 1969, research in the field of ANN suffered a serious setback. Minsky and Papert
published a book on perceptrons in which they proved that single-layer neural networks
have limited ability to process data and are capable only of mappings that are linearly
separable. Applying careful mathematical techniques, they pointed out that the logical
Exclusive-OR (XOR) function could not be realized by such perceptrons.
• Further, Minsky and Papert argued that research into multi-layer neural networks would
be unproductive. Because of this pessimistic view, the field of ANN entered an almost
total eclipse for nearly two decades. Fortunately, Minsky and Papert's judgment has since
been disproved: multi-layer perceptron networks can solve nonlinearly separable problems.
• Nevertheless, a few dedicated researchers such as Kohonen, Grossberg, Anderson and
Hopfield continued their efforts.
• The study of learning in networks of threshold elements and of the mathematical theory
of neural networks was pursued by Shun-Ichi Amari (1972, 1977). Kunihiko Fukushima
developed a class of neural network architectures known as neocognitrons in 1980.
• There have been many impressive demonstrations of ANN capabilities: a network has
been trained to convert text to phonetic representations, which were then converted to
speech by other means (Sejnowski and Rosenberg 1987); other networks can recognize
handwritten characters (Burr 1987); and a neural-network-based image-compression
system has been devised (Cottrell, Munro, and Zipser 1987). These all use the
backpropagation network, perhaps the most successful of the current algorithms.
Backpropagation, invented independently in three separate research efforts (Werbos
1974, Parker 1982, and Rumelhart, Hinton and Williams 1986), provides a systematic
means for training multi-layer networks, thereby overcoming the limitations pointed out
by Minsky and Papert.

Characteristics of ANN

Artificial neural networks are biologically inspired; that is, they are composed of elements
that perform in a manner analogous to the most elementary functions of the biological
neuron. The important characteristics of artificial neural networks are learning from
experience, generalizing from previous examples to new ones, and abstracting essential
characteristics from inputs containing irrelevant data.
Learning

The NNs learn by example. Thus, NN architectures can be 'trained' with known
examples of a problem before they are tested for their 'inference' capability on unknown
instances of the problem. They can, therefore, identify objects they have not previously been
trained on. ANNs can modify their behavior in response to their environment. Shown a set of
inputs (perhaps with desired outputs), they self-adjust to produce consistent responses. A wide
variety of training algorithms is discussed in later units.
Parallel operation

The NNs can process information in parallel, at high speed, and in a distributed manner.

Mapping

The NNs exhibit mapping capabilities, that is, they can map input patterns to their associated
output patterns.

Generalization

The NNs possess the capability to generalize. Thus, they can predict new outcomes
from past trends. Once trained, a network's response can be, to a degree, insensitive to minor
variations in its input. This ability to see through noise and distortion to the pattern that lies
within is vital to pattern recognition in a real-world environment. It is important to note that
the ANN generalizes automatically as a result of its structure, not by using human intelligence
embedded in the form of ad hoc computer programs.
Robust
The NNs are robust systems and are fault tolerant. They can, therefore, recall full
patterns from incomplete, partial or noisy patterns.
Abstraction
Some ANNs are capable of abstracting the essence of a set of inputs, i.e. they can extract
features from a given set of data; for example, convolutional neural networks are used to extract
different features from images, such as edges, dark spots and shapes. Such networks are trained
on feature patterns, on the basis of which they can classify or cluster the given input set.

Applicability

ANNs are not a panacea. They are clearly unsuited to tasks such as calculating the payroll.
They are preferred for a large class of pattern-recognition tasks that conventional computers do
poorly, if at all.

Applications of ANN

Neural networks are preferred when the task involves processing large amounts of data. The
following are potential application areas of neural networks:
• Classification
• Prediction
• Data Association
• Data Conceptualization
• Data Filtering
• Optimization
In addition to the above, neural networks can be applied in fields such as medicine,
commerce and engineering.

The Biological motivation for ANN


ANNs are biologically inspired; that is, they are designed by looking at the organization of the
brain and considering network configurations and algorithms. The human nervous system, built of
cells called neurons, is of staggering complexity. It contains approximately a hundred thousand
million (10^11) basic neurons. Each of these neurons is connected to about ten thousand (10^4)
others. The connection of each neuron with other neurons forms a densely connected network
called a neural network. These massive interconnections provide exceptionally large computing
power and memory. The neuron accepts many inputs, which are all added up in some fashion. If
enough active inputs are received at once, the neuron is activated and "fires"; if not, it remains in
its inactive, quiet state. The schematic diagram of a biological neuron is shown in Fig. 2.1. From a
systems-theory point of view, the neuron can be considered a multiple-input single-output (MISO)
system, as shown in Fig. 2.2.

Fig. 2.1. A Schematic view of the biological neuron


Fig.2.2 Model representation of a biological neuron with multiple inputs

Human Artificial
Neuron Processing Element
Dendrites Combining Function
Cell Body Transfer Function
Axons Element Output
Synapses Weights

The soma is the body of the neuron. Attached to the soma are long, irregularly
shaped filaments called dendrites. These nerve processes are often less than a micron in
diameter and have complex branching shapes. The dendrites act as the connections through
which all the inputs to the neuron arrive. These cells are able to perform more complex
functions than simple addition on the inputs they receive, but simple summation is a
reasonable approximation.
Another type of nerve process attached to the soma is called an axon. This is electrically
active, unlike the dendrite, and serves as the output channel of the neuron. Axons always appear
on output cells, but are often absent from interneurons, which have both inputs and outputs
on dendrites. The axon is a non-linear threshold device, producing a voltage pulse, called an
action potential, that lasts about 1 millisecond (10^-3 s) when the resting potential within the
soma rises above a certain critical threshold.
The axon terminates in a specialized contact called a synapse that couples the axon with
the dendrite of another cell. There is no direct linkage across the junction; rather, it is a
temporary chemical one. The synapse releases chemicals called neurotransmitters when its
potential is raised sufficiently by the action potential. It may take the arrival of more than one
action potential before the synapse is triggered. The neurotransmitters released by the
synapse diffuse across the gap and chemically activate gates on the dendrites, which, when
open, allow charged ions to flow. It is this flow of ions that alters the dendrite potential and
provides a voltage pulse on the dendrite, which is then conducted along into the
next neuron body. Each dendrite may have many synapses acting on it, allowing massive
interconnectivity to be achieved.
Artificial Neuron
The artificial neuron is developed to mimic the first-order characteristics of the
biological neuron. Similar to the biological neuron, the artificial neuron receives many inputs
representing the outputs of other neurons. Each input is multiplied by a corresponding weight,
analogous to a synaptic strength. All of these weighted inputs are then summed and passed
through an activation function to determine the neuron's output. This artificial neuron model is
shown in Fig. 2.3.

Fig. 2.3 An artificial neuron


The mathematical model of the artificial neuron may be written as

u(t) = w1x1 + w2x2 + w3x3 + . . . + wnxn + w0x0
     = Σ (i = 0 to n) wi xi                                        (2.1)

where x0 = 1, so that w0 acts as the bias (threshold) weight, and

y(t) = f[u(t)]                                                     (2.2)
where f[.] is a nonlinear function called the activation function, the input-output function or
the transfer function. In equation (2.1) and Fig. 2.3, [x0, x1, . . ., xn] represent the inputs and
[w0, w1, . . ., wn] the corresponding synaptic weights. In vector form, we can represent
the neural inputs and the synaptic weights as
X = [x0, x1, . . ., xn]^T and W = [w0, w1, . . ., wn]
Equations (2.1) and (2.2) can then be written in vector form as:
U = WX                                                             (2.3)
Y = f[U]                                                           (2.4)
The activation function f[.] is chosen as a nonlinear function to emulate the nonlinear
behavior of conduction current mechanism in biological neuron. The behavior of the artificial
neuron depends both on the weights and the activation function. Sigmoidal functions are the
commonly used activation functions in multi-layer static neural networks. Other types of
activation functions are discussed in later units.
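As an illustration of equations (2.1)-(2.4), the following short Python sketch computes a neuron output as the weighted sum of inputs passed through a sigmoid activation; the function names and the sample numbers are illustrative choices, not taken from the text.

```python
import numpy as np

def sigmoid(u):
    """Unipolar sigmoid activation, f(u) = 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def artificial_neuron(x, w, f=sigmoid):
    """Single artificial neuron: y = f(W X), as in eqs. (2.1)-(2.4).

    x : inputs  [x0, x1, ..., xn] with x0 = 1 (bias input)
    w : weights [w0, w1, ..., wn] with w0 the bias weight
    """
    u = np.dot(w, x)          # u(t) = sum_i wi * xi   (2.1)/(2.3)
    return f(u)               # y(t) = f[u(t)]         (2.2)/(2.4)

# Example: three real inputs plus the fixed bias input x0 = 1
x = np.array([1.0, 0.5, -1.2, 0.8])     # [x0, x1, x2, x3]
w = np.array([0.1, 0.4, -0.3, 0.6])     # [w0, w1, w2, w3]
print(artificial_neuron(x, w))          # a value in (0, 1)
```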

McCulloch-Pitts Model
The McCulloch-Pitts model of the neuron is shown in Fig. 2.4(a). The inputs xi, for i = 1, 2, . . ., n,
are 0 or 1, depending on the absence or presence of an input impulse at instant k. The
neuron's output signal is denoted Y. The firing rule for this model is defined as follows:
Y^(k+1) = 1   if  Σ (i = 1 to n) wi xi^k ≥ T
Y^(k+1) = 0   if  Σ (i = 1 to n) wi xi^k < T

where the superscript k = 0, 1, 2, . . . denotes the discrete-time instant and wi is the multiplicative
weight connecting the ith input with the neuron's membrane. Note that wi = +1 for excitatory
synapses and wi = -1 for inhibitory synapses in this model, and T is the neuron's threshold value,
which needs to be exceeded by the weighted sum of the signals for the neuron to fire.
This model can perform the basic logic operations NOT, OR and AND, provided its weights and
thresholds are appropriately selected. Any multivariable combinational function can be
implemented using either the NOT and OR, or alternatively the NOT and AND, Boolean
operations. Examples of three-input NOR and NAND gates built with the McCulloch-Pitts neuron
model are shown in Fig. 2.4(b) and Fig. 2.4(c).

Fig.2.4 McCulloch-Pitts
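To make the firing rule concrete, here is a minimal Python sketch of a McCulloch-Pitts neuron; the particular weights and thresholds chosen for the three-input NOR and NAND gates are one possible selection consistent with the rule above (inhibitory weights of -1), not the exact values of Fig. 2.4.

```python
def mp_neuron(x, w, T):
    """McCulloch-Pitts neuron: fires (1) if sum(wi*xi) >= T, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= T else 0

# Three-input NOR: all weights inhibitory (-1), threshold T = 0
def nor3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], w=[-1, -1, -1], T=0)

# Three-input NAND: all weights inhibitory (-1), threshold T = -2
def nand3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], w=[-1, -1, -1], T=-2)

for bits in [(0, 0, 0), (0, 1, 0), (1, 1, 1)]:
    print(bits, "NOR:", nor3(*bits), "NAND:", nand3(*bits))
```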

Keyword definitions
Action potential: The pulse of electrical potential generated across the membrane of a
neuron (or an axon) following the application of a stimulus greater than the threshold value.
Axon: The output fiber of a neuron, which carries the information in the form of action
potentials to other neurons in the network.
Dendrite: The input line of the neuron that carries a temporal summation of action potentials
to the soma.
Excitatory neuron: A neuron that transmits an action potential that has excitatory (positive)
influence on the recipient nerve cells.
Inhibitory neuron: A neuron that transmits an action potential that has inhibitory (negative)
influence on the recipient nerve cells.
Lateral inhibition: The local spatial interaction where the neural activity generated by one
neuron is suppressed by the activity of its neighbors.
Latency: The time between the application of the stimulus and the peak of the resulting
action potential output.
Refractory period: The minimum time required for the axon to generate two consecutive
action potentials.
Neural state: A neuron is active if it’s firing a sequence of action potentials.
Neuron: The basic nerve cell for processing biological information.
Soma: The body of a neuron, which provides aggregation, thresholding and nonlinear
activation to dendrite inputs.
Synapse: The junction point between the axon (of a pre-synaptic neuron) and the dendrite
(of a post-synaptic neuron). This acts as a memory (storage) to the past-accumulated experience
(knowledge).
Activation Functions
Operations of Artificial Neuron
The schematic diagram of an artificial neuron is shown in Fig. 3.1. The artificial neuron
mainly performs two operations: one is summing the weighted net input and the second is
passing the net input through an activation function. The activation function is also called the
nonlinear function and sometimes the transfer function of the artificial neuron.
The net input of the jth neuron may be written as

NETj = w1x1 + w2x2 + w3x3 + . . . + wnxn − θj                      (3.1)

where θj is the threshold of the jth neuron,

Fig. 3.1 Artificial neuron.


X = [x1 x2 . . . xn] is the input vector and W = [w1 w2 . . . wn] is the synaptic weight vector.
The NETj signal is processed by an activation function F to produce the neuron's output signal

OUTj = F(NETj)                                                     (3.2)

What functional form should be selected for F(.)? Could it be a square root, log, e^x, x^3,
and so on? Mathematicians and computer scientists have found that the sigmoid (S-shaped)
function is the most useful. In addition to the sigmoid function, a number of other functions are
used in artificial neural networks. They are discussed in the next section.
Types of activation functions
The behavior of the artificial neuron depends both on the synaptic weights and on the
activation function. Sigmoid functions are the most commonly used activation functions in
multi-layered feedforward neural networks. Neurons with sigmoid functions bear a greater
resemblance to biological neurons than those with other activation functions. Another feature
of the sigmoid function is that it is differentiable and gives a continuous-valued output. Some of
the popular activation functions are described below along with their characteristics.
1. Sigmoid function (unipolar sigmoid). The characteristic of this function is shown in
Fig. 3.2 and its mathematical description is

y(x) = f(x) = 1 / (1 + e^(-x))                                     (3.1)

and its range is 0 < y < 1. The derivative of this function may be written as

y'(x) = f'(x) = f(x) (1 - f(x))                                    (3.2)

Fig. 3.2 A sigmoid (S-shaped) function

Moreover, sigmoid functions are continuous and monotonic, and remain finite even as x
approaches ±∞; these properties make them well suited for network training.
2. Hyperbolic tangent (bipolar sigmoid) function. The characteristic of this function is
shown in Fig. 3.3 and its mathematical description is

y(x) = f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))            (3.3)

Its range is -1 < y < 1 and its derivative can be obtained as

y'(x) = f'(x) = 1 - [f(x)]^2                                       (3.4)

Fig. 3.3 A hyperbolic tangent (bipolar sigmoid) function
3. Radial basis function. The Gaussian function is the most commonly used radially
symmetric function. Its characteristic is shown in Fig. 3.4 and its mathematical description is

y = f(x) = exp(-x^2 / 2)                                           (3.5)

Its range is 0 < y ≤ 1 and its derivative can be obtained as

y' = f'(x) = -x exp(-x^2 / 2)                                      (3.6)

Fig. 3.4 A Gaussian function

The function has its maximum response, f(x) = 1, when the input is x = 0, and the response
decreases towards f(x) = 0 as the input moves away from zero.
4. Hard limiter. The hard limiter function is mostly used for the classification of patterns.
Its characteristic is shown in Fig. 3.5 and its mathematical description is

f(u(t)) = sgn(u(t)) = +1 if u(t) ≥ 0
                      -1 if u(t) < 0                               (3.7)

Fig. 3.5 Hard limiter

This function is not differentiable; therefore, it cannot be used for applications requiring
continuous-valued outputs.
5. Piecewise linear. The piecewise linear function characteristic is shown in Fig. 3.6 and
its mathematical description is

f(u(t)) =  1        if g·u(t) ≥ 1
           g·u(t)   if -1 < g·u(t) < 1
          -1        if g·u(t) ≤ -1                                 (3.8)

where g is the gain (slope) of the linear region.

Fig. 3.6 Piecewise linear function
6. Linear. The linear function characteristic is shown in Fig. 3.8 and its mathematical
description is

f(u(t)) = g·u(t)                                                   (3.10)

Fig. 3.8 Linear function

It is differentiable and is mostly used for the output nodes of networks.
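For reference, the activation functions above (and the derivatives used later in gradient-based training) can be collected in a few lines of Python; the function names and the sample input are illustrative.

```python
import numpy as np

def sigmoid(x):                      # eq. (3.1), range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):                # eq. (3.2)
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_deriv(x):                   # eq. (3.4); np.tanh gives eq. (3.3)
    return 1.0 - np.tanh(x) ** 2

def gaussian(x):                     # eq. (3.5), radial basis function
    return np.exp(-x ** 2 / 2.0)

def hard_limiter(u):                 # eq. (3.7), +1 / -1, not differentiable
    return np.where(u >= 0, 1.0, -1.0)

def piecewise_linear(u, g=1.0):      # eq. (3.8), clipped linear with gain g
    return np.clip(g * u, -1.0, 1.0)

def linear(u, g=1.0):                # eq. (3.10)
    return g * u

x = np.linspace(-3, 3, 7)
print(sigmoid(x))
print(hard_limiter(x))
```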

Selection of activation function


The selection of an activation function depends upon the application for which the
neural network is used and also on the layer in which the neuron lies. The activation functions
mainly used are the sigmoid (unipolar sigmoid), the hyperbolic tangent (bipolar sigmoid), the
radial basis function, the hard limiter and the linear function. The sigmoid and hyperbolic tangent
functions perform well for prediction and process-forecasting types of problems.
However, they do not perform as well for classification networks; instead, the radial basis
function proves more effective for those networks, and is highly recommended for problems
involving fault diagnosis and feature categorization. The hard limiter suits classification
problems well. The linear function may be used at the output layer in feedforward networks.

Classification of Artificial Neural Networks

Introduction
The development of artificial neural networks is based on the understanding of biological
neural structures and of the learning mechanisms required for particular applications. This can
be summarized as (a) development of neural models based on the understanding of biological
neurons, (b) models of synaptic connections and structures, such as network topology, and
(c) the learning rules. Researchers have explored different neural network architectures and
used them for various applications. The classification of artificial neural networks (ANN) can
therefore be done based on structure and on the type of data.
A single neuron can perform simple pattern classification, but the power of neural
computation comes from connecting neurons into networks. The basic definition of artificial
neural networks as physical cellular networks that are able to acquire, store and utilize
experiential knowledge is related to the network's capabilities and performance. The simplest
network is a group of neurons arranged in a layer. This configuration is known as a single-layer
neural network. There are two types of single-layer networks, namely feed-forward and
feedback networks. A single-layer network of linear neurons (that is, with linear activation
functions) has very limited capability in solving nonlinear problems such as classification,
because its decision boundaries are linear. This can be made somewhat more powerful by
selecting nonlinear neurons (that is, nonlinear activation functions) in the single layer. Such
nonlinear classifiers have more complex decision boundaries and can solve harder problems.
Even so, single-layer networks of nonlinear neurons have limitations in handling closely spaced
nonlinear classes and fine control problems. Recent studies show that nonlinear neurons
arranged in multi-layer structures can simulate more complicated systems, achieve smooth
control and complex classification, and have capabilities beyond those of single-layer networks.
In this unit we first discuss classifications, then single-layer neural networks and multi-layer
neural networks. The structure of a neural network refers to how its neurons are interconnected.
4.2.0 Applications
Given the different types of artificial neural networks, these networks can be applied to broad
classes of applications, such as (i) Pattern Recognition and Classification, (ii) Image Processing
and Vision, (iii) System Identification and Control, and (iv) Signal Processing. The suitability of
networks is as follows:
(i) Pattern Recognition and Classification: Almost all networks can be used to solve
these types of problems.
(ii) Image Processing and Vision: The following networks are used for the
applications in this area: Static single layer networks, Dynamic single layer
networks, BAM, ART, Counter-propagation networks, First-Order dynamic
networks.
(iii) System Identification and Control: The following networks are used for the
applications in this area: Static multi layer networks, Dynamic multi layer networks
of types time-delay and Second-Order dynamic networks.
(iv) Signal Processing: The following networks are used for the applications in this
area: Static multi layer networks of type RBF, Dynamic multi layer networks of
types Cellular and Second-Order dynamic networks.
4.3.0 Single Layer Artificial Neural Networks
The simplest network is a group of neurons arranged in a layer. This configuration is
known as a single-layer neural network. This type of network comprises two layers, namely
the input layer and the output layer. The input layer neurons receive the input signals and the
output layer neurons produce the output signals. The synaptic links carrying the weights connect
every input neuron to every output neuron, but not vice versa. Such a network is said to be
feedforward in type or acyclic in nature. Despite the two layers, the network is termed single
layer, since it is the output layer alone which performs computation. The input layer merely
transmits the signals to the output layer; hence the name single-layer feedforward network.
Figure 4.2 illustrates an example network.
There are two types of single-layer networks, namely feed-forward and feedback
networks.
4.3.1 Feed forward single layer neural network
Consider m neurons arranged in a single layer, each neuron receiving n inputs, as shown
in Fig. 4.2.
The output and input vectors are, respectively,

O = [o1  o2  . . .  om]^T                                          (4.1)
X = [x1  x2  . . .  xn]^T

Weight wji connects the jth neuron with the ith input. The activation value for the jth neuron
is then

netj = Σ (i = 1 to n) wji xi ,    for j = 1, 2, . . ., m           (4.2)

The following nonlinear transformation involving the activation function f(netj), for
j = 1, 2, . . ., m, completes the processing of X. The transformation is performed by each of
the m neurons in the network:

oj = f(Wj^T X),    for j = 1, 2, . . ., m                          (4.3)

where the weight vector Wj contains the weights leading toward the jth output node and is
defined as

Wj = [wj1  wj2  . . .  wjn]^T                                      (4.4)

(b) Block diagram of single-layer network

Fig. 4.2 Feed-forward single-layer neural network

Introducing the nonlinear matrix operator F, the mapping of input space X to output space O
implemented by the network can be written as
O = F (W X) (4.5a)
where W is the weight matrix, also known as the connection matrix, and is represented as

W = [ w11  w12  . .  w1n
      w21  w22  . .  w2n
       .    .         .
       .    .         .
      wm1  wm2  . .  wmn ]                                         (4.5b)

The weight matrix will be initialized and it should be finalized through an appropriate
training method.
The nonlinear activation function f(.) on the diagonal of the matrix operator F(.) operates
component-wise on the activation value net of each neuron. Each activation value is, in turn,
a scalar product of the input with the respective weight vector; X is called the input vector and
O the output vector. The mapping of an input to an output shown in (4.5) is of the feed-forward
and instantaneous type, since it involves no delay between the input and the output. Therefore,
relation (4.5a) may be written in terms of time t as

O(t) = F(W X(t))                                                   (4.6)

Networks of this type can be connected in cascade to create a multi-layer network. Though
there is no feedback in the feedforward network while mapping from input X(t) to output O(t),
the output values are compared with the "teacher's" information, which provides the desired
output values. The error signal is used for adapting the network weights. The details will be
discussed in later units.
Example: To illustrate the computation of the output O(t) of a single-layer feed-forward
network, consider the input vector X(t) and the (initialized) network weight matrix W given
below, and assume the neurons use the hard limiter as their activation function.
X = [-1  1  -1]^T        W = [  1   0   1
                                0   1  -2
                                0   1   0
                                1   0  -3 ]

The output vector may be obtained from (4.5) as

O = F(W X) = [ sgn(-1-1)  sgn(+1+2)  sgn(1)  sgn(-1+3) ]^T = [ -1  1  1  1 ]^T

The output vector of this single-layer feedforward network is therefore [-1 1 1 1]^T.
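A quick numerical check of this example: the sketch below applies the hard limiter element-wise to W X, using the input vector and weight matrix as written above (NumPy is used purely for illustration).

```python
import numpy as np

# Input vector and weight matrix from the worked example above
X = np.array([-1, 1, -1])
W = np.array([[1, 0,  1],
              [0, 1, -2],
              [0, 1,  0],
              [1, 0, -3]])

def hard_limiter(u):
    """Bipolar hard limiter: +1 if u >= 0, else -1 (eq. 3.7)."""
    return np.where(u >= 0, 1, -1)

O = hard_limiter(W @ X)      # O = F(W X), eq. (4.5a)
print(O)                      # -> [-1  1  1  1]
```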

4.4 Types of connections


There are three different options available for connecting nodes to one another, as shown
in Fig. 4.5. They are:
Intralayer connection: the output from a node is fed into other nodes in the same layer.
Interlayer connection: the output from a node in one layer is fed into nodes in another layer.
Recurrent connection: the output from a node is fed back into itself.
Fig. 4.5 Different connection options of neural networks
In general, when building a neural network, its structure must be specified. In engineering
applications, the most suitable and most commonly preferred structure is the interlayer
connection topology. Within the interlayer connections there are two further options:
(i) feed-forward connections and (ii) feedback connections, as shown in Fig. 4.6.

Fig. 4.6 Feed-forward and feedback connections of neural networks
4.4.0 Multi Layer Artificial Neural Networks
Cascading a group of single-layer networks forms a feedforward multi-layer neural network,
in which the output of one layer provides the input to the subsequent layer. The input layer
receives its input from outside; the output of the input layer is connected to the first hidden
layer as its input, and the output layer receives its input from the last hidden layer. A multi-layer
neural network provides no increase in computational power over a single-layer neural network
unless there is a nonlinear activation function between layers. It is the nonlinear activation
function of each neuron in the hidden layers that enables multi-layer neural networks to solve
many complex problems, such as nonlinear function approximation, learning with generalization
and nonlinear classification.
A multi-layer neural network consists of an input layer, an output layer and hidden layers.
The number of nodes in the input layer depends on the number of inputs and the number of
nodes in the output layer depends on the number of outputs. The designer selects the number of
hidden layers and the number of neurons in each. According to Kolmogorov's theorem, a single
hidden layer is sufficient to realize any complicated input-output mapping.
Fig. 4.7. Multilayer feedforward network
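As a sketch of the cascade just described, the following Python snippet propagates an input through one hidden layer and one output layer using the unipolar sigmoid; the layer sizes, random weights and names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer(x, W, b, f=sigmoid):
    """One feedforward layer: output = f(W x + b)."""
    return f(W @ x + b)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.25])                  # 3 inputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # input -> hidden (4 neurons)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)    # hidden -> output (2 neurons)

h = layer(x, W1, b1)     # hidden-layer activations
o = layer(h, W2, b2)     # network output
print(h, o)
```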
Recurrent Networks
These networks differ from feedforward architectures in that there is at least one
feedback loop. Thus, in these networks there could, for example, exist one layer with feedback
connections, as shown in Fig. 4.8. There could also be neurons with self-feedback links, i.e. the
output of a neuron is fed back into itself as input.
The idea behind recurrent neural networks (RNNs) is to make use of sequential
information. In a traditional neural network we assume that all inputs (and outputs) are
independent of each other, but for many tasks that is a very bad assumption: if you want to
predict the next word in a sentence, you had better know which words came before it. RNNs are
called recurrent because they perform the same task for every element of a sequence, with the
output depending on the previous computations. Another way to think about RNNs is that they
have a "memory" which captures information about what has been computed so far. In theory
RNNs can make use of information in arbitrarily long sequences, but in practice they are limited
to looking back only a few steps.

Fig. 4.8. A recurrent neural network.
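A minimal sketch of the recurrence just described: the hidden state h acts as the "memory" carried from one element of the sequence to the next. The update rule h_t = tanh(W_x x_t + W_h h_{t-1}) is the standard simple-RNN form; all sizes and names below are illustrative.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, h0):
    """Simple recurrent layer: h_t = tanh(W_x x_t + W_h h_{t-1})."""
    h = h0
    states = []
    for x in xs:                      # one step per sequence element
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

rng = np.random.default_rng(1)
xs  = [rng.normal(size=3) for _ in range(5)]   # a length-5 input sequence
W_x = rng.normal(size=(4, 3)) * 0.5            # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.5            # hidden-to-hidden (feedback) weights
print(rnn_forward(xs, W_x, W_h, h0=np.zeros(4))[-1])
```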


Training Methods of Artificial Neural Networks

Introduction
The dynamics of a neuron consist of two parts: one is the dynamics of the
activation state and the second is the dynamics of the synaptic weights. Short Term
Memory (STM) in neural networks is modeled by the activation state of the network, while
Long Term Memory (LTM) is encoded in the synaptic weights through learning. The
main property of an artificial neural network is its ability to learn from its environment and its
history. The network learns about its environment and history through an iterative process of
adjustments applied to its synaptic weights and bias levels. Generally, the network becomes
more knowledgeable about its environment and history after each iteration of the learning
process. It is important to distinguish between representation and learning. Representation
refers to the ability of a perceptron (or other network) to simulate a specified function. Learning
requires the existence of a systematic procedure for adjusting the network weights to produce
that function. Here we discuss the most popular learning rules.

Definition of learning
There are many activities associated with the notion of learning, and we define
learning in the context of neural networks [1] as
"Learning is a process by which the free parameters of a neural network are adapted
through a process of stimulation by the environment in which the network is embedded. The
type of learning is determined by the manner in which the parameter changes take place."

Based on the above definition the learning process of ANN can be divided into the following
sequence of steps:
1. The ANN is stimulated by an environment.
2. The ANN undergoes changes in its free parameters as a result of the above
stimulation.
3. The ANN responds in a new way to the environment because of the changes that have
occurred in its internal structure.
PERCEPTRONS

8.0.0 Introduction

We know that the perceptron is one of the early models of the artificial neuron. It was proposed by
Rosenblatt in 1958. It is a single-layer neural network whose weights and biases can be trained to
produce a correct target vector when presented with the corresponding input vector. The perceptron is a
program that learns concepts, i.e. it can learn to respond with True (1) or False (0) for the inputs we
present to it, by repeatedly "studying" examples presented to it. The training technique used is called the
perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its
training vectors and to work with randomly distributed connections. Perceptrons are especially suited to
simple problems in pattern classification. We also give the perceptron convergence theorem.

WHAT IS A PERCEPTRON?

A perceptron is a binary classification algorithm modeled after the functioning of the human brain—
it was intended to emulate the neuron. The perceptron, while it has a simple structure, has the ability
to learn and solve very complex problems.

What is Multilayer Perceptron?

A multilayer perceptron (MLP) is a group of perceptrons, organized in multiple layers, that can
accurately answer complex questions. Each perceptron in the first layer (on the left) sends signals to
all the perceptrons in the second layer, and so on. An MLP contains an input layer, at least one hidden
layer, and an output layer.
The perceptron learns as follows:

1. Takes the inputs which are fed into the perceptrons in the input layer, multiplies them by their
weights, and computes the sum.
2. Adds the number one, multiplied by a “bias weight”. This is a technical step that makes it possible
to move the output function of each perceptron (the activation function) up, down, left and right
on the number graph.
3. Feeds the sum through the activation function—in a simple perceptron system, the activation
function is a step function.
4. The result of the step function is the output.
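The four steps above can be written directly as a short Python sketch of a single perceptron with a step activation; the weights chosen at the end, which make the perceptron compute a logical AND, are illustrative.

```python
import numpy as np

def step(u):
    """Step activation: 1 if u >= 0, else 0."""
    return 1 if u >= 0 else 0

def perceptron(x, w, bias_weight):
    # 1. multiply the inputs by their weights and compute the sum
    # 2. add the number one multiplied by the bias weight
    u = np.dot(w, x) + 1.0 * bias_weight
    # 3./4. feed the sum through the step function; its result is the output
    return step(u)

# Example: a perceptron computing the logical AND of two binary inputs
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))
```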

A multilayer perceptron is quite similar to a modern neural network. By adding a few ingredients, the
perceptron architecture becomes a full-fledged deep learning system:
Activation functions and other hyperparameters: a full neural network uses a variety of
activation functions which output real values, not boolean values like in the classic perceptron.
It is more flexible in terms of other details of the learning process, such as the number of training
iterations (iterations and epochs), weight initialization schemes, regularization, and so on. All
these can be tuned as hyperparameters.
Backpropagation: a full neural network uses the backpropagation algorithm, to perform
iterative backward passes which try to find the optimal values of perceptron weights, to
generate the most accurate prediction.
Advanced architectures: full neural networks can have a variety of architectures that can
help solve specific problems. A few examples are Recurrent Neural Networks (RNN),
Convolutional Neural Networks (CNN), and Generative Adversarial Networks (GAN).

WHAT IS BACKPROPAGATION AND WHY IS IT IMPORTANT?


After a neural network is defined with initial weights, and a forward pass is performed to generate the
initial prediction, there is an error function which defines how far away the model is from the true
prediction. There are many possible algorithms that can minimize the error function—for example, one
could do a brute force search to find the weights that generate the smallest error. However, for large
neural networks, a training algorithm is needed that is very computationally efficient. Backpropagation
is that algorithm—it can discover the optimal weights relatively quickly, even for a network with
millions of weights.

HOW DOES BACKPROPAGATION WORK?


1. Forward pass—weights are initialized and inputs from the training set are fed into the
network. The forward pass is carried out and the model generates its initial prediction.
2. Error function—the error function is computed by checking how far away the prediction is from
the known true value.
3. Backpropagation with gradient descent—the backpropagation algorithm calculates how much
the output values are affected by each of the weights in the model. To do this, it calculates partial
derivatives, going back from the error function to a specific neuron and its weight. This provides
complete traceability from total errors, back to a specific weight which contributed to that error.
The result of backpropagation is a set of weights that minimize the error function.
4. Weight update—weights can be updated after every sample in the training set, but this is usually
not practical. Typically, a batch of samples is run in one big forward pass, and then
backpropagation performed on the aggregate result. The batch size and number of batches used in
training, called iterations, are important hyperparameters that are tuned to get the best results.
Running the entire training set through the backpropagation process is called an epoch.
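As a concrete, deliberately tiny illustration of steps 1-4, the sketch below trains a one-hidden-layer network with batch gradient descent on an XOR data set. Everything here (layer sizes, learning rate, number of epochs, variable names) is an illustrative assumption rather than a prescribed implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)               # targets (XOR)

W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)     # input -> hidden
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)     # hidden -> output
lr = 0.5

for epoch in range(5000):
    # 1. forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # 2. error function (mean squared error)
    E = np.mean((Y - T) ** 2)
    # 3. backpropagation: partial derivatives of the error w.r.t. each weight
    dY = (Y - T) * Y * (1 - Y)           # error signal at the output layer
    dH = (dY @ W2.T) * H * (1 - H)       # error signal propagated to the hidden layer
    # 4. weight update (batch gradient descent)
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print("final error:", E, "predictions:", Y.ravel().round(2))
```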
Training algorithm of BPNN:

1. Inputs X arrive through the preconnected path.


2. Input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron from the input layer, to the hidden layers, to the
output layer.
4. Calculate the error in the outputs

ErrorB= Actual Output – Desired Output

5. Travel back from the output layer to the hidden layer to adjust the weights such that the error
is decreased.

Keep repeating the process until the desired output is achieved

Architecture of back propagation network:

As shown in the diagram, the architecture of a BPN has three interconnected layers with weights on
them. The hidden layer and the output layer also have a bias, whose associated input is always 1. As
is clear from the diagram, the BPN works in two phases: one phase sends the signal from the input
layer to the output layer, and the other phase back-propagates the error from the output layer to the
input layer.
CONVERGENCE AND LOCAL MINIMA
Backpropagation is only guaranteed to converge to a local, not a global, minimum. However, since each
weight in a network essentially corresponds to a different dimension in the error space, a local minimum
with respect to one weight may not be a local minimum with respect to the other weights; this can provide
an "escape route" from becoming trapped in local minima. If the weights are initialized to values close to
zero, the sigmoid threshold function is approximately linear, and so the network produces nearly linear
outputs. As the weights grow, the network becomes able to represent more complex, nonlinear functions.
The hope is that, by the time the weights are able to approximate the desired function, they will be close
enough to the global minimum that even becoming stuck in a local minimum will be acceptable. Common
heuristic methods to reduce the problem of local minima are:
• Add a momentum term to the weight-update rule (see the sketch after this list).
• Use stochastic gradient descent rather than true gradient descent.
• Train multiple networks on the same training data but initialize them with different random weights.
If the different networks lead to different local minima, choose the network that performs best on a
validation set, or keep all the networks and treat them as a committee whose output is the (possibly
weighted) average of the individual network outputs.
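The first heuristic, the momentum term, can be sketched in a few lines; the learning rate and momentum coefficient below are illustrative values.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent update with a momentum term.

    velocity accumulates an exponentially decaying sum of past gradients,
    which helps the weights roll through shallow local minima.
    """
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.zeros(3)
v = np.zeros(3)
for grad in [np.array([0.2, -0.1, 0.05])] * 10:   # stand-in gradients
    w, v = sgd_momentum_step(w, grad, v)
print(w)
```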
HIDDEN LAYER REPRESENTATION IN BACKPROPAGATION

The final values at the hidden neurons are computed using z^l, the weighted inputs in layer l, and a^l,
the activations in layer l. For layers 2 and 3 the equations are:

• l = 2:   z² = W¹x + b¹,    a² = f(z²)

• l = 3:   z³ = W²a² + b²,   a³ = f(z³)

W¹ and W² are the weights into layers 2 and 3, while b¹ and b² are the biases of those layers.

The activations a² and a³ are computed using an activation function f. Typically, this function f is
non-linear (e.g. sigmoid, ReLU, tanh), which allows the network to learn complex patterns in data. We
won't go over the details of how activation functions work here.

Looking carefully, you can see that all of x, z², a², z³, a³, W¹, W², b¹ and b² are missing the subscripts
shown in the 4-layer network illustration. The reason is that we have combined all the parameter values
into matrices, grouped by layer. This is the standard way of working with neural networks, and one
should be comfortable with the calculations. However, let us go over the equations to clear up any
confusion.

Let's pick layer 2 and its parameters as an example. The same operations can be applied to any layer in
the network.

• W¹ is a weight matrix of shape (n, m), where n is the number of output neurons (neurons in the next
layer) and m is the number of input neurons (neurons in the previous layer). For us, n = 2 and m = 4:

W¹ = [ (W_11)¹  (W_12)¹  (W_13)¹  (W_14)¹
       (W_21)¹  (W_22)¹  (W_23)¹  (W_24)¹ ]

NB: The first number in any weight's subscript matches the index of the neuron in the next layer (in
our case this is the Hidden_2 layer) and the second number matches the index of the neuron in the
previous layer (in our case this is the Input layer).

• x is the input vector of shape (m, 1), where m is the number of input neurons. For us, m = 4:

x = [x_1  x_2  x_3  x_4]^T

• b¹ is a bias vector of shape (n, 1), where n is the number of neurons in the current layer. For us, n = 2:

b¹ = [(b_1)¹  (b_2)¹]^T

Following the equation for z², we can use the above definitions of W¹, x and b¹ to derive it:

z² = W¹x + b¹

Now carefully observe the neural network illustration from above (Input and Hidden_1 layers). You will
see that z² can be expressed using (z_1)² and (z_2)², where (z_1)² and (z_2)² are the sums of the products
of every input x_i with the corresponding weight (W_ij)¹. This leads to the same equation for z² and
proves that the matrix representations for z², a², z³ and a³ are correct.
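The shape bookkeeping above is easy to verify numerically. The sketch below builds W¹ of shape (2, 4), x of shape (4, 1) and b¹ of shape (2, 1) with arbitrary illustrative values and computes z² = W¹x + b¹ and a² = f(z²).

```python
import numpy as np

def f(z):
    """Activation function (sigmoid chosen here for illustration)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(7)
W1 = rng.normal(size=(2, 4))     # weights into Hidden_2: n = 2, m = 4
x  = rng.normal(size=(4, 1))     # input vector, shape (m, 1)
b1 = np.zeros((2, 1))            # bias vector, shape (n, 1)

z2 = W1 @ x + b1                 # z² = W¹x + b¹  -> shape (2, 1)
a2 = f(z2)                       # a² = f(z²)
print(z2.shape, a2.shape)        # (2, 1) (2, 1)
```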

REMARKS ON THE BACKPROPAGATION ALGORITHM

Why Do We Need Backpropagation?

The most prominent advantages of backpropagation are:

• Backpropagation is fast, simple and easy to program.
• It has no parameters to tune apart from the number of inputs.
• It is a flexible method, as it does not require prior knowledge about the network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be learned.

(i) Convergence and local minima

Backpropagation is a multi-layer algorithm: in multi-layer neural networks it can go back and
change the weights of earlier layers.

All the neurons are interconnected, so that information is passed on to every neuron in the
network.

Using the backpropagation algorithm we minimize the errors by modifying the weights. This
minimization of errors is, however, only guaranteed locally, not globally.

The Representational Power of the Feed-Forward Networks


The representational power of a feed-forward network describes the class of functions it can
represent; it depends on the depth and width of the network.

Feed-forward networks can represent boolean functions, continuous functions and, given enough
hidden units, arbitrary functions.
(ii) Hypothesis Space Search and Inductive Bias

(iii) Inductive Bias
(iv) Hypothesis Space Search

(v) Hidden Layer Representation


By using the backpropagation algorithm, one can define new features in the hidden layer that are
not explicitly represented in the input.
