Unit 5: Neural Networks
Typically, a neural network also includes a bias which adjusts the input
of the activation function.
The Artificial Neuron
3. A threshold activation function (or simply activation function, also
called a squashing function) produces an output signal only when the
incoming signal exceeds a specific threshold value. It is similar in
behaviour to the biological neuron, which transmits a signal only when
the total input signal meets its firing threshold.
Structure of an artificial neuron
• The output of the activation function, y, can be expressed as y = f(Σ_i w_i x_i + b), where f is the activation function, the w_i are the weights on the inputs x_i, and b is the bias.
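As an illustration (not from the slides), a minimal Python sketch of a single artificial neuron with a threshold activation; the example inputs, weights, bias, and threshold are assumed values:

import numpy as np

def artificial_neuron(x, w, b, threshold=0.0):
    # Weighted sum of the inputs plus the bias
    net = np.dot(w, x) + b
    # Threshold (step) activation: fire only when the net input meets the threshold
    return 1 if net >= threshold else 0

# Example with two inputs; the numbers are assumptions for illustration
print(artificial_neuron(x=np.array([0.5, 0.8]), w=np.array([0.4, 0.6]), b=-0.5))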
ADALINE network
Adaptive Linear Neural Element (ADALINE)
• Early single-layer ANN developed by Professor Bernard Widrow of
Stanford University.
• It has only one output neuron. The output value can be +1 or −1.
• A bias input x0 (where x0 = 1) having a weight w0 is added.
• The activation function is such that if the weighted sum of the inputs is positive or zero, then the output is +1, else it is −1. Formally, y = +1 if Σ_i w_i x_i ≥ 0, and y = −1 otherwise.
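A minimal Python sketch of the ADALINE output rule; the example inputs and weights below are assumptions:

import numpy as np

def adaline_output(x, w):
    # x includes the bias input x0 = 1; w includes the bias weight w0
    net = np.dot(w, x)
    # Bipolar step activation: +1 if the weighted sum is >= 0, else -1
    return 1 if net >= 0 else -1

x = np.array([1.0, 0.3, -0.7])   # [x0 = 1 (bias), x1, x2] -- assumed example input
w = np.array([0.1, 0.5, 0.2])    # [w0, w1, w2]            -- assumed example weights
print(adaline_output(x, w))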
ADALINE & MADALINE
• The supervised learning algorithm adopted by the ADALINE network
is known as Least Mean Square (LMS) or Delta rule.
• A network combining a number of ADALINEs is termed MADALINE
(Many ADALINE). MADALINE networks can be used to solve problems
that are not linearly separable.
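A minimal sketch of the LMS (delta) rule weight update for a single ADALINE; the learning rate and example values are assumptions:

import numpy as np

def lms_update(w, x, target, lr=0.1):
    # LMS / delta rule: adjust the weights in proportion to the error
    # between the target and the linear (pre-activation) output.
    net = np.dot(w, x)
    error = target - net
    return w + lr * error * x

w = np.zeros(3)                   # [w0 (bias), w1, w2]
x = np.array([1.0, 0.5, -0.2])    # [x0 = 1 (bias), x1, x2] -- assumed example input
w = lms_update(w, x, target=1.0)
print(w)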
Different Network Topologies
• Single layer feed-forward networks – Input layer projecting into the
output layer
Single layer feed-forward network
• It consists of only two layers: the input layer and the output layer.
• The input layer consists of a set of ‘m’ input neurons X1 , X2 , …, Xm
connected to each of the ‘n’ output neurons Y1 , Y2 , …, Yn .
• The connections carry weights w11 , w12 , …, wmn .
• The input layer neurons do not perform any processing – they simply
pass the input signals on to the output neurons.
• The computations are performed only by the neurons in the output
layer.
• So, though it has two layers of neurons, only one layer is performing
the computation.
Single layer feed-forward network
• The net signal input to the j-th output neuron is given by y_in_j = Σ_{i=1..m} x_i w_ij.
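As an illustrative sketch of the same computation in NumPy (the layer sizes, weights, and inputs are assumptions):

import numpy as np

# Single-layer feed-forward network: m input neurons fully connected to n output neurons.
# W[i, j] is the weight w_ij on the connection from input X_i to output Y_j.
m, n = 3, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(m, n))        # example weights (assumed)
x = np.array([0.2, 0.5, 0.9])      # example input signals (assumed)

y_in = x @ W                       # net input to each output neuron: sum_i x_i * w_ij
print(y_in)                        # an activation function would then be applied to y_in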
2-layer (1-hidden-layer) fully connected network
Multi-layer feed-forward network
• Each layer may have a varying number of neurons. For example, there may be ‘m’ neurons in the input layer, ‘n’ neurons in the single hidden layer, and ‘r’ neurons in the output layer.
• The net signal input to the k-th hidden layer neuron is given by z_in_k = Σ_{i=1..m} x_i w_ik.
• The net signal input to the k-th output layer neuron is given by y_in_k = Σ_{j=1..n} z_j w'_jk, where z_j is the output of the j-th hidden neuron.
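A sketch of the corresponding forward pass in NumPy; the layer sizes, weights, inputs, and the choice of sigmoid activation are assumptions for illustration:

import numpy as np

# Multi-layer (1-hidden-layer) feed-forward network: m inputs, n hidden, r outputs.
m, n, r = 3, 4, 2
rng = np.random.default_rng(1)
W_hidden = rng.normal(size=(m, n))   # w_ik: input i -> hidden k (assumed weights)
W_output = rng.normal(size=(n, r))   # w'_jk: hidden j -> output k (assumed weights)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

x = np.array([0.1, 0.7, 0.3])        # assumed example input
z = sigmoid(x @ W_hidden)            # hidden layer outputs
y = sigmoid(z @ W_output)            # output layer outputs
print(y)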
Multi-layer feed-forward network
Different Network Topologies
• Recurrent networks – A network with feedback, where some of its
outputs are fed back to some of its inputs (discrete time).
Recurrent networks
• In the case of recurrent neural networks, there is a small deviation
from the feed-forward architecture: there is a feedback loop from the
neurons in the output layer back to the input layer neurons.
• There may also be self-loops.
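As a rough sketch (not from the slides), a discrete-time recurrent step in which the previous output is fed back alongside the current input; the sizes, weights, and tanh activation are assumptions:

import numpy as np

rng = np.random.default_rng(2)
W_in = rng.normal(size=(3,))    # weights on the current inputs (assumed)
w_fb = rng.normal()             # weight on the fed-back previous output (assumed)

def recurrent_step(x_t, y_prev):
    # Feedback: the previous output is combined with the current input
    net = np.dot(W_in, x_t) + w_fb * y_prev
    return np.tanh(net)

y = 0.0
for x_t in [np.array([0.1, 0.4, -0.2]), np.array([0.3, 0.0, 0.5])]:
    y = recurrent_step(x_t, y)
print(y)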
Backpropagation
• The multi-layer feed-forward network is a commonly adopted
architecture.
• It has been observed that a neural network with even one hidden
layer can be used to reasonably approximate any continuous
function.
• The learning method adopted to train a multi-layer feed-forward
network is termed backpropagation, which we will study in the
next section.
What is Backpropagation?
• Backpropagation is the essence of neural network training.
• It is the method of fine-tuning the weights of a neural network based
on the error rate obtained in the previous epoch (i.e., iteration).
• Proper tuning of the weights allows you to reduce error rates and
make the model reliable by increasing its generalization.
• In neural networks, backpropagation is short for "backward
propagation of errors."
• It is a standard method of training artificial neural networks.
• This method helps calculate the gradient of a loss function with
respect to all the weights in the network.
Training Model
Summary of the steps
• Calculate the error – How far is your model output from the actual
output.
• Minimum Error – Check whether the error is minimized or not.
• Update the parameters – If the error is large, update the
parameters (weights and biases). After that, check the error again.
Repeat the process until the error becomes minimal.
• Model is ready to make a prediction – Once the error becomes
minimum, you can feed some inputs to your model and it will
produce the output.
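As a concrete miniature of these steps (purely illustrative; the data, single-weight model, and learning rate below are assumptions), consider fitting one weight by repeatedly checking and reducing the error:

import numpy as np

# Fit a single weight w so that w * x approximates the targets.
x = np.array([1.0, 2.0, 3.0])
target = np.array([2.0, 4.0, 6.0])

w = 0.5                                        # initial guess for the parameter
for epoch in range(1000):
    output = w * x                             # forward pass
    error = np.mean((target - output) ** 2)    # calculate the error
    if error < 1e-6:                           # is the error minimal?
        break                                  # model is ready to make predictions
    grad = np.mean(-2 * (target - output) * x) # how the error changes with w
    w -= 0.01 * grad                           # update the parameter
print(w, error)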
Backpropagation
• The backpropagation algorithm computes the gradient of the loss
function with respect to a single weight by the chain rule.
• It efficiently computes the gradients one layer at a time, unlike a
naive direct computation.
• It computes the gradient, but it does not define how the gradient is
used. It generalizes the computation in the delta rule.
• Put another way: we need to train our model, and one way to train it
is backpropagation.
Backpropagation
• The backpropagation algorithm looks for the minimum value of the
error function in weight space using a technique called the delta rule
or gradient descent. The weights that minimize the error function are
then considered to be a solution to the learning problem.
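In gradient descent / the delta rule, each weight is moved against the gradient of the error function. Written out (η denotes a learning rate; the symbol is assumed here, not taken from the slides):

w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}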
• Let’s understand how it works with an example:
• You have a dataset, which has labels.
Now, what we did here:
• We first initialized some random value to ‘W’ and propagated
forward.
• Then, we noticed that there is some error. To reduce that error, we
propagated backwards and increased the value of ‘W’.
• After that, we noticed that the error had increased, which told us
that we cannot keep increasing the ‘W’ value.
• So, we again propagated backwards and decreased the ‘W’ value.
• Now, we noticed that the error had reduced.
Now, what we do here:
• So, we are trying to get the value of weight such that the error
becomes minimum.
• Basically, we need to figure out whether we need to increase or
decrease the weight value.
• Once we know that, we keep updating the weight value in that
direction until the error becomes minimal.
• You might reach a point where, if you update the weight further, the
error will increase.
• At that point you need to stop, and that is your final weight value.
The Global Minimum
• We need to reach the
‘Global Loss Minimum’.
• Consider an example network with:
  • two inputs
  • two hidden neurons
  • two output neurons
  • two biases
• Below are the steps involved in Backpropagation:
• Step – 1: Forward Propagation
• Step – 2: Backward Propagation
• Step – 3: Putting all the values together and calculating the updated weight
value
How Backpropagation Works?
Step – 1: Forward Propagation
• We compute the net input and the output (activation) of each hidden neuron, and then of each output neuron, to obtain the network's predicted outputs; a worked sketch follows.
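A minimal NumPy sketch of the forward pass for a two-input, two-hidden, two-output network with biases; every number below (weights, biases, inputs, targets) is an assumed example value, not taken from the slides:

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

i = np.array([0.05, 0.10])                 # inputs i1, i2 (assumed)
W1 = np.array([[0.15, 0.25],               # input -> hidden weights (assumed)
               [0.20, 0.30]])
b1 = 0.35                                  # hidden-layer bias (assumed)
W2 = np.array([[0.40, 0.50],               # hidden -> output weights, incl. W5 (assumed)
               [0.45, 0.55]])
b2 = 0.60                                  # output-layer bias (assumed)

net_h = i @ W1 + b1                        # net input to h1, h2
out_h = sigmoid(net_h)                     # output of h1, h2
net_o = out_h @ W2 + b2                    # net input to o1, o2
out_o = sigmoid(net_o)                     # network outputs

target = np.array([0.01, 0.99])            # assumed target values
E_total = 0.5 * np.sum((target - out_o) ** 2)
print(out_o, E_total)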
How Backpropagation Works?
Step – 2: Backward Propagation
• Now, we will propagate backwards. This way we will try to reduce the
error by changing the values of weights and biases.
• Consider W5: we will calculate the rate of change of the error w.r.t. a
change in the weight W5.
Step – 2: Backward Propagation
• Let us now see how much the total net input of O1 changes w.r.t. W5.
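By the chain rule, the rate of change of the total error with respect to W5 factors into three terms (out_O1 and net_O1 denote the output and the total net input of neuron O1):

\frac{\partial E_{total}}{\partial W_5}
  = \frac{\partial E_{total}}{\partial out_{O1}}
    \cdot \frac{\partial out_{O1}}{\partial net_{O1}}
    \cdot \frac{\partial net_{O1}}{\partial W_5}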
Step – 3: Update weight value
• Putting all the values together, we calculate the updated weight value.
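The updated weight is obtained by moving W5 against the error gradient, scaled by a learning rate η (the symbol η is assumed here):

W_5^{+} = W_5 - \eta \, \frac{\partial E_{total}}{\partial W_5}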
• Similarly, we can calculate the other weight values as well.
• After that we will again propagate forward and calculate the output.
Again, we will calculate the error.
• If the error is minimal, we stop right there; otherwise, we again
propagate backwards and update the weight values.
• This process keeps repeating until the error becomes minimal.
Pseudo code for backpropagation
Assign all network inputs and outputs
Initialize all weights with small random numbers, typically between -1 and 1
repeat
    for every pattern in the training set
        Present the pattern to the network
        // Propagate the input forward through the network:
        for each layer in the network
            for every node in the layer
                1. Calculate the weighted sum of the inputs to the node
                2. Add the threshold (bias) to the sum
                3. Calculate the activation of the node
            end
        end
        // Propagate the errors backward through the network:
        for every node in the output layer
            calculate the error signal
        end
        for all hidden layers
            for every node in the layer
                1. Calculate the node's signal error
                2. Update each of the node's weights
            end
        end
        // Calculate Global Error
        Calculate the Error Function
    end
while ((number of iterations < maximum specified) AND
       (Error Function > specified threshold))
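A compact, runnable rendering of this pseudocode in Python/NumPy for a 1-hidden-layer network with sigmoid activations; the layer sizes, learning rate, stopping threshold, and toy XOR data are all assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Toy training set (XOR) -- an assumption for illustration
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Small random initial weights (and biases), as in the pseudocode
W1 = rng.uniform(-1, 1, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.uniform(-1, 1, size=(4, 1)); b2 = np.zeros(1)
lr = 0.5

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

for epoch in range(10000):
    # Propagate the input forward
    h = sigmoid(X @ W1 + b1)                      # hidden activations
    y = sigmoid(h @ W2 + b2)                      # output activations

    # Propagate the errors backward
    delta_out = (y - T) * y * (1 - y)             # error signal at the output layer
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # error signal at the hidden layer

    # Update the weights and biases
    W2 -= lr * h.T @ delta_out;  b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid;  b1 -= lr * delta_hid.sum(axis=0)

    # Calculate the global error and stop once it is small enough
    if 0.5 * np.sum((T - y) ** 2) < 1e-3:
        break

print(np.round(y, 2))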
Types of Backpropagation Networks
• Two Types of Backpropagation Networks are:
• Static Back-propagation
• Recurrent Backpropagation
Types of Backpropagation Networks
• Static back-propagation:
• It is a kind of backpropagation network that produces a mapping from a
static input to a static output. It is useful for solving static classification
problems such as optical character recognition.
• Recurrent backpropagation:
• In recurrent backpropagation, activations are fed forward until a fixed value
is achieved. After that, the error is computed and propagated backward.
• The main difference between the two methods is that the mapping is
immediate in static back-propagation, while it is not static in recurrent
backpropagation.
Why We Need Backpropagation?
• Most prominent advantages of Backpropagation are:
• Backpropagation is fast, simple and easy to program
• It has no parameters to tune apart from the number of inputs
• It is a flexible method as it does not require prior knowledge about
the network
• It is a standard method that generally works well
• It does not need any special mention of the features of the function
to be learned.
Why We Need Backpropagation?
• Imagine that a human is given a set of data and asked to classify it into
predefined classes.
• Like a human, the ANN will come up with "theories" about how the
samples fit into the classes.
• These are then tested against the correct outputs to see how accurate
the guesses of the network are.
• Radical changes in the latest theory are indicated by large changes in
the weights, and small changes may be seen as minor adjustments to
the theory.
Other issues
• There are also issues regarding generalizing a neural network.
• Issues to consider are the problems associated with under-training and
over-training on the data.
• Under-training can occur when the neural network is not complex enough
to detect a pattern in a complicated data set.
• This is usually the result of networks with so few hidden nodes that they
cannot accurately represent the solution, therefore under-fitting the data.
• On the other hand, over-training can result in a network that is too
complex, resulting in predictions that are far beyond the range of the
training data. Networks with too many hidden nodes will tend to over-fit
the solution.
Relationship between size and price
• To establish the relationship between size and price, we have taken the following steps:
• We’ve established the relationship using a linear equation, for which the plots
have been shown. The first plot has a high error on the training data points.
Therefore, it will not perform well on either the public or the private leaderboard.
This is an example of “Underfitting”. In this case, our model fails to capture the
underlying trend of the data
• In the second plot, we just found the right relationship between price and size,
i.e., low training error and generalization of the relationship
• In the third plot, we found a relationship which has almost zero training error.
This is because the relationship is developed by considering each deviation in the
data point (including noise), i.e., the model is too sensitive and captures random
patterns which are present only in the current dataset. This is an example of
“Overfitting”. In this relationship, there could be a high deviation between the
public and private leaderboards
Validation
• A common practice in data science competitions is to iterate over various
models to find a better performing model.
• However, it becomes difficult to distinguish whether this improvement in
score is coming because we are capturing the relationship better, or we are
just over-fitting the data.
• To answer this question, we use validation techniques. These methods
help us achieve more generalized relationships.
What is Cross Validation?
• Cross Validation is a technique which involves reserving a particular sample
of a dataset on which you do not train the model. Later, you test your
model on this sample before finalizing it.
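As a sketch of the idea (the fit and score callables are placeholders standing in for whatever model training and evaluation is used; libraries such as scikit-learn provide ready-made utilities for this), a simple k-fold style split that reserves one fold at a time for testing might look like:

import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    # Shuffle the sample indices and split them into k roughly equal folds.
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(fit, score, X, y, k=5):
    folds = k_fold_indices(len(X), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]                                    # the reserved sample
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])                # train without the reserved fold
        scores.append(score(model, X[test_idx], y[test_idx]))  # test on the reserved fold
    return np.mean(scores)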