ML UNIT 2 ANN
NEURAL NETWORKS
Introduction
The recent rise of interest in neural networks has its roots in the recognition that the
brain performs computations in a different manner than do conventional digital computers.
Computers are extremely fast and precise at executing sequences of instructions that have
been formulated for them. A human information processing system is composed of neurons
switching at speeds about a million times slower than computer gates. Yet, humans are more
efficient than computers at computationally complex tasks such as speech understanding.
Moreover, not only humans, but also even animals, can process visual information better than
the fastest computers.
Artificial neural systems, or neural networks (NN), are physical cellular systems, which
can acquire, store, and utilize experiential knowledge. The knowledge is in the form of stable
states or mappings embedded in networks that can be recalled in response to the presentation
cues. Neural network processing typically involves dealing with large-scale problems in
terms of dimensionality, amount of data handled, and the volume of simulation or neural
hardware processing. This large-scale approach is both essential and typical for real-life
applications. With this in view, the research community has made an effort to design
and implement various neural network models for different applications.
Now let us formally define the basic idea of neural network:
The field of neural networks is not new. The first formal definition of a synthetic
neuron model, based on highly simplified considerations of the biological model, was
proposed by McCulloch and Pitts in 1943. The McCulloch-Pitts (MP) neuron model
resembles what is known as a binary logic device.
The next major development, after the MP neuron model was proposed, occurred in
1949, when D.O. Hebb proposed a learning mechanism for the brain that became the
starting point for artificial neural networks (ANN) learning (training) algorithms. He
postulated that as the brain learns, it changes its connectivity patterns.
The idea of a learning mechanism was first incorporated in ANN by F. Rosenblatt in 1958.
By introducing the least mean squares (LMS) learning algorithm, Widrow and Hoff
developed in 1960 a model of a neuron that learned quickly and accurately. This model
was called ADALINE for ADAptive LInear NEuron. The applications of ADALINE
and its extension to MADALINE (for Many ADALINEs) include pattern recognition,
weather forecasting, and adaptive controls. The monograph on learning machines by
Nils Nilsson (1965) summarized the developments of that time.
In 1969, research in the field of ANN suffered a serious setback. Minsky and Papert
published a book on perceptrons in which they proved that single layer neural networks
have limitations in their ability to process data: they are capable only of mappings that
are linearly separable. They pointed out, carefully applying mathematical techniques,
that the logical Exclusive-OR (XOR) function could not be realized by perceptrons.
Further, Minsky and Papert argued that research into multi-layer neural networks would
be unproductive. Due to this pessimistic view of Minsky and Papert, the field
of ANN entered into an almost total eclipse for nearly two decades. Fortunately, Minsky
and Papert's judgment has since been disproved; multi-layer perceptron networks can solve
nonlinearly separable problems such as XOR.
Nevertheless, a few dedicated researchers such as Kohonen, Grossberg, Anderson and
Hopfield continued their efforts.
The study of learning in networks of threshold elements and of the mathematical theory
of neural networks was pursued by Shun-Ichi Amari (1972, 1977). Also Kunihiko
Fukushima developed a class of neural network architectures known as neocognitrons
in 1980.
There have been many impressive demonstrations of ANN capabilities: a network has
been trained to convert text to phonetic representations, which were then converted to
speech by other means (Sejnowski and Rosenberg 1987); another network can recognize
handwritten characters (Burr 1987); and a neural network based image- compression
system has been devised (Cottrell, Munro, and Zipser 1987). These all use the
backpropagation network, perhaps the most successful of the current algorithms.
Backpropagation, invented independently in three separate research efforts (Werbos
1974, Parker 1982, and Rumelhart, Hinton and Williams 1986) provides a systematic
means for training multi-layer networks, thereby overcoming limitations presented by
Minsky.
Characteristics of ANN
Artificial neural networks are biologically inspired; that is, they are composed of elements
that perform in a manner that is analogous to the most elementary functions of the biological
neuron. The important characteristics of artificial neural networks are learning from
experience, generalizing from previous examples to new ones, and abstracting essential
characteristics from inputs containing irrelevant data.
Learning
The NNs learn by examples. Thus, NN architectures can be ‘trained’ with known
examples of a problem before they are tested for their ‘inference’ capability on unknown
instances of the problem. They can, therefore, identify new objects they were not explicitly trained on.
ANN can modify their behavior in response to their environment. Shown a set of inputs
(perhaps with desired outputs), they self-adjust to produce consistent responses. A wide
variety of training algorithms is discussed in later units.
Parallel operation
The NNs can process information in parallel, at high speed, and in a distributed manner.
Mapping
The NNs exhibit mapping capabilities, that is, they can map input patterns to their associated
output patterns.
Generalization
The NNs possess the capability to generalize. Thus, they can predict new outcomes
from past trends. Once trained, a network’s response can be to a degree, insensitive to minor
variations in its input. This ability to see through noise and distortion to the pattern that lies
within is vital to pattern recognition in a real-world environment. It is important to note that
the ANN generalizes automatically as a result of its structure, not by using human intelligence
embedded in the form of ad hoc computer programs.
Robust
The NNs are robust systems and are fault tolerant. They can, therefore, recall full
patterns from incomplete, partial or noisy patterns
Abstraction
Some ANNs are capable of abstracting the essence of a set of inputs, i.e., they can extract
features from a given set of data. For example, convolutional neural networks are used to extract
different features from images, such as edges, dark spots, and shapes. Such networks are trained
on feature patterns, based on which they can classify or cluster a given input set.
Applicability
ANNs are not a panacea. They are clearly unsuited to such tasks as calculating the payroll.
They are preferred for a large class of pattern-recognition tasks that conventional computers do
poorly, if at all.
Applications of ANN
Neural networks are preferred when the task is related to large-amount data processing. The
following are the potential applications of neural networks:
Classification
Prediction
Data Association
Data Conceptualization
Data Filtering
Optimization
In addition to the above fields, neural networks can apply to the fields of Medicine,
Commercial and Engineering, etc.
The correspondence between the biological neuron and the artificial neuron is summarized below:

Human         Artificial
---------     ------------------
Neuron        Processing Element
Dendrites     Combining Function
Cell Body     Transfer Function
Axons         Element Output
Synapses      Weights
The soma is the body of the neuron. Attached to the soma there are long irregularly
shaped filaments, called dendrites. These nerve processes are often less than a micron in
diameter, and have complex branching shapes. The dendrites act as the connections through
which all the inputs to the neuron arrive. These cells are able to perform more complex
functions than simple addition on the inputs they receive, but considering simple summation
is a reasonable approximation.
Another type of nerve process attached to the soma is called an axon. This is electrically
active, unlike the dendrite, and serves as the output channel of the neuron. Axons always appear
on output cells, but are often absent from interneurons, which have both inputs and outputs
on their dendrites. The axon is a non-linear threshold device, producing a voltage pulse, called an
action potential, that lasts about 1 millisecond (10^-3 s), when the resting potential within the
soma rises above a certain critical threshold.
The axon terminates in a specialized contact called a synapse that couples the axon with
the dendrite of another cell. There is no direct linkage across the junction; rather, it is a
temporary chemical one. The synapse releases chemicals called neurotransmitters when its
potential is raised sufficiently by the action potential. It may take the arrival of more than one
action potential before the synapse is triggered. The neurotransmitters that are released by the
synapse diffuse across the gap and chemically activate gates on the dendrites, which, when
open, allow charged ions to flow. It is this flow of ions that alters the dendrite potential, and
provides a voltage pulse on the dendrite, which is then conducted along into the
next neuron body. Each dendrite may have many synapses acting on it, allowing massive
interconnectivity to be achieved.
Artificial Neuron
The artificial neuron is developed to mimic the first-order characteristics of the
biological neuron. Similar to the biological neuron, the artificial neuron receives many inputs
representing the outputs of other neurons. Each input is multiplied by a corresponding weight,
analogous to the synaptic strength. All of these weighted inputs are then summed and passed
through an activation function to determine the neuron output. This artificial neuron model is
shown in Fig.2.3.
u(t) = Σ (i=1 to n) wi xi + θ = Σ (i=0 to n) wi xi        (2.1)

assuming w0 = θ and x0 = 1.
y(t) = f [u(t)] (2.2)
where f[.] is a nonlinear function called as the activation function, the input-output function or
the transfer function. In equation (2.1) and Fig.2.3, [x0, x1, …., xn] represent the inputs, [w0,
w1, . . . . , wn] represents the corresponding synaptic weights. In vector form, we can represent
the neural inputs and the synaptic weights as
X = [x0, x1, . . . , xn]T , and W = [w0, w1, . . . , wn]
Equations (2.1) and (2.2) can be represented in vector form as:
U = WX (2.3)
Y = f[U] (2.4)
The activation function f[.] is chosen as a nonlinear function to emulate the nonlinear
behavior of conduction current mechanism in biological neuron. The behavior of the artificial
neuron depends both on the weights and the activation function. Sigmoidal functions are the
commonly used activation functions in multi-layer static neural networks. Other types of
activation functions are discussed in later units.
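Equations (2.1)-(2.2) can be sketched in a few lines of Python. The weights, inputs, and bias value below are invented for illustration, and a sigmoidal f is assumed for the activation:

```python
import math

def neuron(x, w, theta):
    """Artificial neuron of equations (2.1)-(2.2):
    u = sum(w_i * x_i) + theta, then y = f(u)."""
    u = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return 1.0 / (1.0 + math.exp(-u))  # sigmoidal activation f(u)

# Example: two inputs, illustrative weights (not from the text)
y = neuron([1.0, 0.5], [0.4, -0.2], theta=0.1)
print(round(y, 4))  # -> 0.5987
```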
McCulloch-Pitts Model
The McCulloch-Pitts model of the neuron is shown in Fig. 2.4(a). The inputs xi , for i= 1,2, . .
.,n are 0 or 1, depending on the absence or presence of the input impulse at instant k. The
neuron’s output signal is denoted as Y. The firing rule for this model is defined as follows
Y^(k+1) = 1   if   Σ (i=1 to n) wi xi^k ≥ T
Y^(k+1) = 0   if   Σ (i=1 to n) wi xi^k < T
where superscript k = 0, 1, 2, . . . denotes the discrete-time instant, and wi is the multiplicative
weight connecting the i’th input with the neuron’s membrane. Note that wi =
+1 for excitatory synapses, wi = -1 for inhibitory synapses for this model, and T is the neuron’s
threshold value, which needs to be exceeded by the weighted sum of the signals for the neuron
to fire.
This model can perform the basic operations NOT, OR and AND, provided its weights and
thresholds are appropriately selected. Any multivariable combinational function can be
implemented using either the NOT and OR, or alternatively the NOT and AND, Boolean
operations. Examples of three-input NOR and NAND gates using the McCulloch-Pitts neuron
model are shown in (Fig.2.4 (b) and Fig 2.4(c)).
Fig.2.4 The McCulloch-Pitts neuron model
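The firing rule and the gate realizations above can be sketched in a few lines of Python. The particular weights and thresholds below are one valid choice for realizing three-input NOR and NAND with an MP neuron, not necessarily the ones shown in Fig.2.4:

```python
def mp_neuron(inputs, weights, T):
    """McCulloch-Pitts neuron: fires (1) iff the weighted sum reaches threshold T."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= T else 0

# Three-input NOR: inhibitory weights (-1) and threshold T = 0,
# so the neuron fires only when every input is 0.
def nor3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], [-1, -1, -1], T=0)

# Three-input NAND: inhibitory weights (-1) and threshold T = -2,
# so the neuron fails to fire only when all three inputs are 1.
def nand3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], [-1, -1, -1], T=-2)

print(nor3(0, 0, 0), nor3(1, 0, 0))    # -> 1 0
print(nand3(1, 1, 1), nand3(1, 0, 1))  # -> 0 1
```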
Keyword definitions
Action potential: The pulse of electrical potential generated across the membrane of a
neuron (or an axon) following the application of a stimulus greater than threshold value.
Axon: The output fiber of a neuron, which carries the information in the form of action
potentials to other neurons in the network.
Dendrite: The input line of the neuron that carries a temporal summation of action potentials
to the soma.
Excitatory neuron: A neuron that transmits an action potential that has excitatory (positive)
influence on the recipient nerve cells.
Inhibitory neuron: A neuron that transmits an action potential that has inhibitory (negative)
influence on the recipient nerve cells.
Lateral inhibition: The local spatial interaction where the neural activity generated by one
neuron is suppressed by the activity of its neighbors.
Latency: The time between the application of the stimulus and the peak of the resulting
action potential output.
Refractory period: The minimum time required for the axon to generate two consecutive
action potentials.
Neural state: A neuron is active if it is firing a sequence of action potentials.
Neuron: The basic nerve cell for processing biological information.
Soma: The body of a neuron, which provides aggregation, thresholding and nonlinear
activation to dendrite inputs.
Synapse: The junction point between the axon (of a pre-synaptic neuron) and the dendrite
(of a post-synaptic neuron). This acts as a memory (storage) for past accumulated experience
(knowledge).
Activation Functions
Operations of Artificial Neuron
The schematic diagram of the artificial neuron is shown in Fig.3.1. The artificial neuron
mainly performs two operations: one is summing the weighted net inputs, and the second is
passing this net input through an activation function. The activation function is also called the
nonlinear function, or sometimes the transfer function, of the artificial neuron.
The net input of the jth neuron may be written as

NETj = w1x1 + w2x2 + w3x3 + . . . + wnxn + θj        (3.1)

where θj is the threshold of the jth neuron.
1. Sigmoid (unipolar) function: f(x) = 1 / (1 + e^(-x)). Sigmoidal functions of this kind
are the most commonly used activation functions in network training.
2. Hyperbolic tangent (bipolar sigmoid) function: The characteristics of this function are
shown in Fig.3.3 and its mathematical description is

f(x) = 2 / (1 + e^(-x)) - 1

3. Gaussian function: The function has its maximum response, f(x) = 1, when the input is
x = 0, and the response decreases to f(x) = 0 as the input moves away from zero.
4. Hard limiter: The hard limiter function is mostly used in the classification of patterns;
the characteristics of this function are shown in Fig.3.5 and its mathematical description is

f(u(t)) = sign(u(t)) = 1 if u(t) ≥ 0,  -1 if u(t) < 0        (3.7)
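The activation functions above can be sketched directly in Python. The Gaussian is written here as exp(-x^2); the text gives no explicit formula, so the unit width is an assumption:

```python
import math

def sigmoid(x):          # unipolar sigmoid, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):  # hyperbolic-tangent-shaped, output in (-1, 1)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def gaussian(x):         # maximum response f(0) = 1, decays toward 0 (unit width assumed)
    return math.exp(-x * x)

def hard_limiter(u):     # sign function of equation (3.7)
    return 1 if u >= 0 else -1

print(round(sigmoid(0), 2), round(bipolar_sigmoid(0), 2), gaussian(0), hard_limiter(-0.5))
# -> 0.5 0.0 1.0 -1
```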
Introduction
The development of the artificial neuron is based on the understanding of biological
neural structure and learning mechanisms for required applications. This can be summarized
as (a) development of neural models based on the understanding of biological neurons, (b)
models of synaptic connections and structures, such as network topology, and (c) the learning
rules. Researchers have explored different neural network architectures and used them for
various applications. Therefore, the classification of artificial neural networks (ANN) can be
done based on structures and on the type of data.
A single neuron can perform a simple pattern classification, but the power of neural
computation comes from neuron connecting networks. The basic definition of artificial neural
networks as physical cellular networks that are able to acquire, store and utilize experiential
knowledge has been related to the network's capabilities and performance. The simplest
network is a group of neurons arranged in a layer. This configuration is known as a single
layer neural network. There are two types of single layer networks, namely feed-forward and
feedback networks. A single layer network of linear neurons (that is, with linear activation
functions) has very limited capabilities in solving nonlinear problems, such as classification,
because its decision boundaries are linear. This can be made a little more complex by selecting
nonlinear neurons (that is, with nonlinear activation functions) in the single layer network.
Nonlinear classifiers have complex-shaped decision boundaries, which can solve complex
problems. Even single layer networks of nonlinear neurons have limitations in separating
closely spaced nonlinear classes and in fine control problems. Recent studies show that
nonlinear neural networks in multi-layer structures can simulate more complicated systems,
achieve smooth control and complex classifications, and have capabilities beyond those of single
layer networks. In this unit we first discuss classifications, then single layer neural networks and
multi-layer neural networks. The structure of a neural network refers to how its neurons are
interconnected.
4.2.0 Applications
The different types of artificial neural networks can be applied to broad classes
of applications, such as (i) Pattern Recognition and Classification, (ii) Image Processing and
Vision, (iii) System Identification and Control, and (iv) Signal Processing. The suitability of
networks is as follows:
(i) Pattern Recognition and Classification: Almost all networks can be used to solve
these types of problems.
(ii) Image Processing and Vision: The following networks are used for the
applications in this area: Static single layer networks, Dynamic single layer
networks, BAM, ART, Counter – propagation networks, First – Order dynamic
networks.
(iii) System Identification and Control: The following networks are used for the
applications in this area: Static multi layer networks, Dynamic multi layer networks
of types time-delay and Second-Order dynamic networks.
(iv) Signal Processing: The following networks are used for the applications in this
area: Static multi layer networks of type RBF, Dynamic multi layer networks of
types Cellular and Second-Order dynamic networks.
4.3.0 Single Layer Artificial Neural Networks
The simplest network is a group of neurons arranged in a layer. This configuration is
known as a single layer neural network. This type of network comprises two layers, namely
the input layer and the output layer. The input layer neurons receive the input signals and the
output layer neurons produce the output signals. The synaptic links carrying the weights connect
every input neuron to every output neuron, but not vice-versa. Such a network is said to be
feedforward in type, or acyclic in nature. Despite the two layers, the network is termed single
layer, since it is the output layer alone which performs computation. The input layer merely
transmits the signals to the output layer. Hence the name single layer feedforward network.
Figure 4.2 illustrates an example network.
There are two types of single layer networks namely, feed-forward and feedback
networks.
4.3.1 Feed forward single layer neural network
Consider m numbers of neurons are arranged in a layer structure and each neuron
receiving n inputs as shown in Fig.4.2.
The input and output vectors are respectively

X = [x1 x2 . . . xn]T  and  O = [o1 o2 . . . om]T

Weight wji connects the jth neuron with the ith input. Then the activation value for the jth
neuron is

netj = Σ (i=1 to n) wji xi ,   j = 1, 2, . . . , m

The following nonlinear transformation, involving the activation function f(netj) for
j = 1, 2, . . . , m, completes the processing of X; the transformation is done by each of the m
neurons in the network:

oj = f(netj) ,   j = 1, 2, . . . , m
Introducing the nonlinear matrix operator F, the mapping of input space X to output space O
implemented by the network can be written as
O = F (W X) (4.5a)
where W is the weight matrix, also known as the connection matrix, represented as

        | w11  w12  .  .  w1n |
        | w21  w22  .  .  w2n |
W  =    |  .    .   .  .   .  |        (4.5b)
        | wm1  wm2  .  .  wmn |
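Equation (4.5a), O = F(W X), can be sketched in plain Python. The weight values below are invented for illustration, and a sigmoidal activation is assumed for F:

```python
import math

def forward(W, x):
    """Single-layer feedforward pass, equation (4.5a): O = F(W X).

    W is an m x n weight matrix (W[j][i] connects input i to neuron j),
    x is an n-element input vector; F applies the activation elementwise.
    """
    net = [sum(wji * xi for wji, xi in zip(row, x)) for row in W]  # net_j = sum_i w_ji x_i
    return [1.0 / (1.0 + math.exp(-n)) for n in net]               # o_j = f(net_j)

# Illustrative 2-neuron, 3-input layer (weights are made up for the example)
W = [[0.2, -0.5, 0.1],
     [0.7,  0.3, -0.4]]
o = forward(W, [1.0, 2.0, 3.0])
print([round(v, 4) for v in o])  # -> [0.3775, 0.525]
```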
Fig. 4.6 Feed forward and feed back connections of Neural Networks
4.4.0 Multi Layer Artificial Neural Networks
Cascading a group of single layer networks forms the feedforward neural network,
also known as a feedforward multi-layer neural network, in which the output
of one layer provides an input to the subsequent layer. The input layer receives input from
outside; the output of the input layer is connected to the first hidden layer as input. The output
layer receives its input from the last hidden layer. A multi-layer neural network provides no
increase in computational power over single layer neural networks unless there is a nonlinear
activation function between layers. Therefore, due to the nonlinear activation function of each
neuron in the hidden layers, multi-layer neural networks are able to solve many of the complex
problems, such as, nonlinear function approximation, learning generalization, nonlinear
classification etc.
A multi layer neural network consists of input layer, output layer and hidden layers.
The number of nodes in input layer depends on the number of inputs and the number of nodes
in the output layer depends upon the number of outputs. The designer selects the number of
hidden layers and the number of neurons in the respective layers. According to Kolmogorov's
theorem, a single hidden layer is sufficient to realize any complicated input-output mapping.
Fig. 4.7. Multilayer feedforward network
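The cascading described above can be sketched as follows. The 2-2-1 network size and all weight values are invented for the example, and sigmoidal activations are assumed:

```python
import math

def layer(W, b, x):
    """One layer: weighted sums plus bias, then sigmoid activation."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
            for row, bj in zip(W, b)]

def mlp(x, W1, b1, W2, b2):
    """Cascade of layers: the hidden layer's output feeds the output layer."""
    hidden = layer(W1, b1, x)
    return layer(W2, b2, hidden)

# Illustrative 2-2-1 network (all weights invented for the example)
W1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]
y = mlp([0.5, 0.25], W1, b1, W2, b2)
print(round(y[0], 4))  # -> 0.7311
```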
Recurrent Networks
These networks differ from feedforward network architectures in the sense that there
is at least one feedback loop. Thus, in these networks, for example, there could exist one
layer with feedback connections as shown in Fig. 4.8. There could also be neurons with self-
feedback links, i.e. the output of a neuron is fed back into itself as input.
The idea behind RNNs is to make use of sequential information. In a traditional
neural network we assume that all inputs (and outputs) are independent of each other. But for
many tasks that is a very bad idea: if you want to predict the next word in a sentence, you had
better know which words came before it. RNNs are called recurrent because they perform the
same task for every element of a sequence, with the output being dependent on the previous
computations. Another way to think about RNNs is that they have a "memory" which captures
information about what has been calculated so far. In theory RNNs can make use of information
in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.
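One recurrent step can be sketched as follows; the weight values, the tanh activation, and the two-unit hidden state are all invented for illustration:

```python
import math

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One recurrent step: the new hidden state depends on the current input
    x_t AND the previous hidden state h_prev, the network's 'memory'."""
    return [math.tanh(sum(w * x for w, x in zip(Wx[j], x_t)) +
                      sum(w * h for w, h in zip(Wh[j], h_prev)) + b[j])
            for j in range(len(b))]

# Illustrative weights (invented); the SAME weights are reused at every
# element of the sequence, which is what makes the network recurrent.
Wx = [[0.5], [-0.5]]
Wh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x_t in [1.0, -1.0, 1.0]:          # process a 3-step sequence
    h = rnn_step([x_t], h, Wx, Wh, b)
print([round(v, 3) for v in h])
```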
Introduction
The dynamics of a neuron consist of two parts: one is the dynamics of the
activation state, and the second is the dynamics of the synaptic weights. Short Term
Memory (STM) in neural networks is modeled by the activation state of the network, and
Long Term Memory encodes information in the synaptic weights through learning. The
main property of an artificial neural network is its ability to learn from its
environment and history. The network learns about its environment and history through an
interactive process of adjustments applied to its synaptic weights and bias levels. Generally,
the network becomes more knowledgeable about its environment and history after completing
each iteration of the learning process. It is important to distinguish between representation and
learning. Representation refers to the ability of a perceptron (or other network) to simulate a
specified function. Learning requires the existence of a systematic procedure for adjusting the
network weights to produce that function. Here we will discuss most of the popular learning rules.
Definition of learning
There are many activities associated with the notion of learning, and we define
learning in the context of neural networks [1] as follows:
“Learning is a process by which the free parameters of neural network are adapted
through a process of stimulation by the environment in which the network is embedded. The
type of learning is determined by the manner in which the parameter changes takes place”
Based on the above definition the learning process of ANN can be divided into the following
sequence of steps:
1. The ANN is stimulated by an environment.
2. The ANN undergoes changes in its free parameters as a result of the above
stimulation.
3. The ANN responds in a new way to the environment because of the changes that have
occurred in its internal structure.
PERCEPTRONS
8.0.0 Introduction
We know that the perceptron is one of the early models of the artificial neuron. It was proposed by
Rosenblatt in 1958. It is a single layer neural network whose weights and biases can be trained to
produce a correct target vector when presented with the corresponding input vector. The perceptron is a
program that learns concepts, i.e. it can learn to respond with True (1) or False (0) for inputs we present
to it, by repeatedly "studying" examples presented to it. The training technique used is called the
perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its
training vectors and work with randomly distributed connections. Perceptrons are especially suited for
simple problems in pattern classification. We also give the perceptron convergence theorem in this unit.
WHAT IS A PERCEPTRON?
A perceptron is a binary classification algorithm modeled after the functioning of the human brain;
it was intended to emulate the neuron. The perceptron, while it has a simple structure, has the ability
to learn and solve very complex problems.
A multilayer perceptron (MLP) is a group of perceptrons, organized in multiple layers, that can
accurately answer complex questions. Each perceptron in the first layer (on the left) sends signals to
all the perceptrons in the second layer, and so on. An MLP contains an input layer, at least one hidden
layer, and an output layer.
The perceptron learns as follows:
1. Takes the inputs which are fed into the perceptrons in the input layer, multiplies them by their
weights, and computes the sum.
2. Adds the number one, multiplied by a “bias weight”. This is a technical step that makes it possible
to move the output function of each perceptron (the activation function) up, down, left and right
on the number graph.
3. Feeds the sum through the activation function; in a simple perceptron system, the activation
function is a step function.
4. The result of the step function is the output.
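Steps 1-4 above can be sketched in a few lines of Python. The weights and bias below are invented for illustration and happen to realize a logical AND of two binary inputs:

```python
def perceptron_output(x, w, bias_weight):
    """Steps 1-4: weighted sum of the inputs, plus 1 * bias weight,
    fed through a step activation function."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + 1 * bias_weight  # steps 1-2
    return 1 if s >= 0 else 0                                   # steps 3-4

# Illustrative weights realizing logical AND of two binary inputs
w, bias = [1.0, 1.0], -1.5
print([perceptron_output([a, b], w, bias) for a in (0, 1) for b in (0, 1)])
# -> [0, 0, 0, 1]
```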
A multilayer perceptron is quite similar to a modern neural network. By adding a few ingredients, the
perceptron architecture becomes a full-fledged deep learning system:
Activation functions and other hyperparameters: a full neural network uses a variety of
activation functions which output real values, not boolean values like in the classic perceptron.
It is more flexible in terms of other details of the learning process, such as the number of training
iterations (iterations and epochs), weight initialization schemes, regularization, and so on. All
these can be tuned as hyperparameters.
Backpropagation: a full neural network uses the backpropagation algorithm, to perform
iterative backward passes which try to find the optimal values of perceptron weights, to
generate the most accurate prediction.
Advanced architectures: full neural networks can have a variety of architectures that can
help solve specific problems. A few examples are Recurrent Neural Networks (RNN),
Convolutional Neural Networks (CNN), and Generative Adversarial Networks (GAN).
Finally, the algorithm travels back from the output layer to the hidden layer to adjust the weights
such that the error is decreased.
As shown in the diagram, the architecture of a back-propagation network (BPN) has three
interconnected layers with weights on them. The hidden layer as well as the output layer also has a
bias, whose weight is always 1, applied to it. As is clear from the diagram, the working of the BPN
is in two phases: one phase sends the signal from the input layer to the output layer, and the other
phase back-propagates the error from the output layer to the input layer.
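The two phases can be sketched as follows. The network size (2-3-1), the learning rate, the sigmoid activation, and the XOR training set are all illustrative assumptions, not the exact BPN from the diagram:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Phase one: the signal travels from the input layer to the output layer."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, o

def backward(x, t, W1, b1, W2, b2, lr=0.5):
    """Phase two: the error is propagated back from the output layer and the
    weights are adjusted by gradient descent."""
    h, o = forward(x, W1, b1, W2, b2)
    d_out = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
    d_hid = [hj * (1 - hj) * sum(d_out[k] * W2[k][j] for k in range(len(d_out)))
             for j, hj in enumerate(h)]
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] += lr * d_out[k] * h[j]
        b2[k] += lr * d_out[k]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += lr * d_hid[j] * x[i]
        b1[j] += lr * d_hid[j]

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [0.0] * 3
W2 = [[random.uniform(-1, 1) for _ in range(3)]]
b2 = [0.0]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]  # XOR
err = lambda: sum((t[0] - forward(x, W1, b1, W2, b2)[1][0]) ** 2 for x, t in data)
e_before = err()
for _ in range(3000):
    for x, t in data:
        backward(x, t, W1, b1, W2, b2)
e_after = err()
print(e_after < e_before)  # training reduces the total squared error
```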
CONVERGENCE AND LOCAL MINIMA
Backpropagation is only guaranteed to converge to a local, and not a global, minimum. However, since each weight
in a network essentially corresponds to a different dimension in the error space, a local minimum with respect to
one weight may not be a local minimum with respect to other weights. This can provide an "escape route" from
becoming trapped in local minima. If the weights are initialized to values close to zero, the sigmoid threshold
function is approximately linear, and so the neurons produce approximately linear outputs. As the weights grow,
though, the network is able to represent more complex functions that are not linear in nature. The hope is that by
the time the weights are able to approximate the desired function, they will be close enough to the global minimum
that even becoming stuck in a local minimum will be acceptable. Common heuristic methods to reduce the problem
of local minima are:
• Add a momentum term to the weight-update rule.
• Use stochastic gradient descent rather than true gradient descent.
• Train multiple networks using the same training data, but initialize the networks with different random
weights. If the different networks lead to different local minima, choose the network that performs best on a
validation set of data; or all networks can be kept and treated as a committee whose output is the (possibly
weighted) average of the individual network outputs.
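The first heuristic, a momentum term, can be sketched as follows. The learning rate, momentum coefficient, and the 1-D quadratic error surface are invented for illustration:

```python
def momentum_update(w, grad, v, lr=0.1, beta=0.9):
    """Weight update with a momentum term: a fraction beta of the previous
    update is carried over, which smooths the trajectory and can help the
    search roll through shallow local minima."""
    v = beta * v - lr * grad        # new velocity
    return w + v, v                 # updated weight and velocity

# Illustrative 1-D error surface E(w) = w^2, so grad E = 2w (not from the text)
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_update(w, 2.0 * w, v)
print(abs(w) < 0.01)  # prints True: the weight has converged near the minimum
```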
HIDDEN LAYER REPRESENTATION IN BACK PROPAGATION
The final values at the hidden neurons are computed using z^l, the weighted inputs in
layer l, and a^l, the activations in layer l. For layers 2 and 3 the equations are:

l = 2:   z² = W¹x + b¹ ,   a² = f(z²)
l = 3:   z³ = W²a² + b² ,   a³ = f(z³)

W¹ and W² are the weights used in computing layers 2 and 3, while b¹ and b² are the biases
used in those computations.
Activations a² and a³ are computed using an activation function f. Typically, this function f is non-
linear (e.g. sigmoid, ReLU, tanh) and allows the network to learn complex patterns in data. We won’t go
over the details of how activation functions work here.
Looking carefully, you can see that all of x, z², a², z³, a³, W¹, W², b¹ and b² are missing their subscripts
presented in the 4-layer network illustration above. The reason is that we have combined all parameter
values in matrices, grouped by layers. This is the standard way of working with neural networks and
one should be comfortable with the calculations. However, I will go over the equations to clear up
any confusion.
Let’s pick layer 2 and its parameters as an example. The same operations can be applied to any layer in the
network.
W¹ is a weight matrix of shape (n, m) where n is the number of output neurons (neurons in the next
layer) and m is the number of input neurons (neurons in the previous layer). For us, n = 2 and m = 4.
W¹ = | W11  W12  W13  W14 |
     | W21  W22  W23  W24 |
NB: The first number in any weight's subscript matches the index of the neuron in the next layer (in
our case this is the Hidden_2 layer) and the second number matches the index of the neuron in the
previous layer.
x is the input vector of shape (m, 1) where m is the number of input neurons. For us, m = 4.
x = [x1  x2  x3  x4]T
b¹ is a bias vector of shape (n , 1) where n is the number of neurons in the current layer. For us, n = 2.
b¹ = [b1  b2]T
Following the equation for z², we can use the above definitions of W¹, x and b¹ to derive “Equation for z²”:
z² = W¹x + b¹ ,   i.e.   (z_j)² = Σ (i=1 to 4) (W_ji)¹ x_i + (b_j)¹  for j = 1, 2
You will see that z² can be expressed using (z_1)² and (z_2)² where (z_1)² and (z_2)² are the sums of the
multiplication between every input x_i with the corresponding weight (W_ij)¹.
This leads to the same "Equation for z²" and proves that the matrix representations for z², a², z³ and a³ are
correct.
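The matrix form z² = W¹x + b¹ can be checked numerically with nested lists. The shapes follow the text (n = 2, m = 4); the actual values are invented for the example:

```python
import math

# W1 has shape (n, m) = (2, 4): 2 output neurons, 4 input neurons
W1 = [[0.1, 0.2, 0.3, 0.4],
      [0.5, 0.6, 0.7, 0.8]]
x  = [1.0, 2.0, 3.0, 4.0]   # input vector of shape (m, 1) = (4, 1)
b1 = [0.1, 0.2]             # bias vector of shape (n, 1) = (2, 1)

# z^2 = W^1 x + b^1, component-wise: (z_j)^2 = sum_i (W_ji)^1 x_i + (b_j)^1
z2 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]

# a^2 = f(z^2), here with a sigmoid as the assumed activation f
a2 = [1.0 / (1.0 + math.exp(-z)) for z in z2]
print(z2)
```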
All neurons are interconnected, and their outputs converge so that information
is passed on to every neuron in the network.
Using the backpropagation algorithm, we minimize the error by modifying the weights.
This minimization of error is guaranteed only locally, not globally.
Boolean, continuous, and arbitrary functions can be represented by such networks.