Neural Networks

Unit 5

AML-Prof. Minal Chauhan


Biological Inspirations
• Humans perform complex tasks like vision, motor control, or language
understanding very well.
• One way to build intelligent machines is to try to imitate the (organizational principles of the) human brain.
• The brain is a highly complex, non-linear, and parallel computer, composed of some $10^{11}$ neurons that are densely connected (~$10^4$ connections per neuron). We have just begun to understand how the brain works...
Neural Network
• In any human being, the nervous system coordinates the different
actions by transmitting signals to and from different parts of the body.
• The nervous system is made up of a special type of cell, called a neuron or nerve cell, which has special structures that allow it to receive signals from, or send signals to, other neurons.
• Neurons connect with each other to transmit signals to or receive
signals from other neurons.
• This structure essentially forms a network of neurons or a neural
network.
• Coordinating these actions and taking decisions for such complex tasks, which may appear trivial, is possible because of this massive, parallel, complex network, i.e. the neural network.
Neural Network
• By virtue of billions of networked neurons that it possesses, the biological
neural network is a massively large and complex parallel computing
network.
• It is because of this massive parallel computing network that the nervous system helps human beings to perform actions or take decisions with a speed and ease that even the world's fastest supercomputer would envy.
• For example, think of the superb flying catches taken by fielders in the Cricket World Cup.
• A fielder decides when to jump, where to jump, and how high to jump by combining superior calculation based on past cricketing experience, an understanding of local on-ground conditions, and anticipation of how hard the ball has been hit. This is a highly complex task.
Biological Neuron
• The human nervous system has two main parts –
• the central nervous system (CNS) consisting of the brain and spinal cord
• the peripheral nervous system consisting of nerves and ganglia outside the
brain and spinal cord.
• The CNS integrates all information, in the form of signals, from the
different parts of the body.
• The peripheral nervous system, on the other hand, connects the CNS with the limbs and organs.
• Neurons are basic structural units of the CNS. A neuron is able to
receive, process, and transmit information in the form of chemical
and electrical signals.
Biological Neuron
• Dendrites – to receive signals from
neighbouring neurons.
• Soma – main body of the neuron
which accumulates the signals coming
from the different dendrites. It ‘fires’
when a sufficient amount of signal is
accumulated.
• Axon – the last part of the neuron, which receives the signal from the soma once the neuron ‘fires’ and passes it on to the neighbouring neurons through the axon terminals (to the adjacent dendrites of the neighbouring neurons).
• Synapse – There is a very small gap
between the axon terminal of one
neuron and the adjacent dendrite of
the neighbouring neuron.
Artificial Neural Networks
• Computational models inspired by the human brain:
• Massively parallel, distributed system, made up of simple processing
units (neurons)
• It resembles the brain in two respects:
• Synaptic connection strengths among neurons are used to store the acquired
knowledge.
• Knowledge is acquired by the network from its environment through a
learning process
Properties of ANNs
• Learning from examples
• labeled or unlabeled
• Adaptivity
• changing the connection strengths to learn things
• Non-linearity
• the non-linear activation functions are essential
• Fault tolerance
• if one of the neurons or connections is damaged, the whole network still works quite
well
• Thus, they might be better alternatives to classical solutions for problems characterised by:
• high-dimensional, noisy, imprecise, or imperfect data; and
• the lack of a clearly stated mathematical solution or algorithm.
The Artificial Neuron
• The biological neural network has been modelled in the form of ANN
with artificial neurons simulating the function of biological neurons.
As depicted in Figure 10.2, an input signal $x = (x_1, x_2, \ldots, x_n)$ comes to an artificial neuron. Each neuron has three major components:
1. A set of synapses, the i-th of which has weight $w_i$. A signal $x_i$ forms the input to the i-th synapse having weight $w_i$. The value of weight $w_i$ may be positive or negative. A positive weight has an excitatory effect, while a negative weight has an inhibitory effect on the output of the summation junction, $y_{in}$.
The Artificial Neuron
2. A summation junction, which adds the input signals after each has been weighted by its respective synaptic weight. Because it is a linear combiner or adder of the weighted input signals, the output of the summation junction, $y_{in}$, can be expressed as follows:
$y_{in} = \sum_{i=1}^{n} x_i w_i$
Typically, a neural network also includes a bias which adjusts the input
of the activation function.
The Artificial Neuron
3. A threshold activation function (or simply activation function, also
called squashing function) results in an output signal only when an
input signal exceeding a specific threshold value comes as an input. It is
similar in behaviour to the biological neuron which transmits the signal
only when the total input signal meets the firing threshold.
Structure of an artificial neuron
• Output of the activation function, $y_{out}$, can be expressed as follows:
$y_{out} = f(y_{in})$
• An artificial neuron:
- computes the weighted sum of its inputs (called its net input),
- adds its bias, and
- passes this value through an activation function.
• We say that the neuron “fires” (i.e. becomes active) if its output is above zero.
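The three steps listed above can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the function name `neuron_output`, the sigmoid default, and the numeric weights and bias are assumptions chosen for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    # Net input: weighted sum of the inputs (summation junction) plus the bias.
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Output y_out = f(net input)
    return activation(net)

y = neuron_output([1.0, 0.5], [0.4, -0.2], bias=0.1)
```

Here the net input is 0.4 − 0.1 + 0.1 = 0.4, so `y` equals the sigmoid of 0.4. Passing a different `activation` (for example a step function) changes only the last step.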
Activation functions
• Identity function: the identity function is used as an activation function for the input layer. It is a linear function having the form
$y_{out} = f(x) = x$ for all $x$
• The output remains the same as the input.
• There are different types of activation functions. The most commonly used are the identity/linear function, the threshold/step function, and the sigmoid function.
Activation functions
• Also called the squashing function, as it limits the amplitude of the output of the neuron.
• Many types of activation functions are used:
• linear: $a = f(n) = n$
• threshold (hard-limiting): $a = 1$ if $n \ge 0$, and $a = 0$ if $n < 0$
• sigmoid: $a = 1/(1 + e^{-n})$
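A minimal sketch of the three activation functions listed above, written as plain Python functions (the function names are illustrative; `n` is the neuron's net input):

```python
import math

def linear(n):
    # Identity: the output equals the net input.
    return n

def threshold(n):
    # Hard limiter: 1 if n >= 0, else 0.
    return 1 if n >= 0 else 0

def sigmoid(n):
    # Smooth squashing of any real n into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-n))

for n in (-2.0, 0.0, 2.0):
    print(n, linear(n), threshold(n), round(sigmoid(n), 4))
```

Only the sigmoid is differentiable everywhere, which is why it (rather than the hard limiter) is used when training with gradients.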
Activation functions
• Some activation
functions are listed here:
Early ANN
• The perceptron, one of the first artificial neural networks, was invented in 1958 by psychologist Frank Rosenblatt.
• It was intended to model how the human brain processed visual data and learned to recognize objects.
• Other researchers have since used similar ANNs to study human cognition.
• Researchers eventually realized that, in addition to providing insights into the functionality of the human brain, ANNs could be useful tools in their own right.
• Their pattern-matching and learning capabilities allowed them to address many problems that were difficult or impossible to solve by standard computational and statistical methods.
• By the late 1980s, many real-world organizations were using ANNs for a variety of purposes.
Early ANN
• Other early ANN Models: ADALINE, Hopfield Network
• Current Models:
• Deep Learning Architectures
• Multilayer feedforward networks (Multilayer perceptrons)
• Radial Basis Function networks
• Self Organizing Networks
Rosenblatt's perceptron model
• The Rosenblatt perceptron is a binary single-neuron model. Its inputs are integrated by adding the weighted inputs, whose weights are fixed once they have been obtained during the training stage. This simple single-neuron model has the main limitation of not being able to solve non-linearly separable problems.
Perceptron
• A perceptron is a single-layer neural network, and a multi-layer perceptron is what is usually called a neural network.
• The perceptron is a (binary) linear classifier used in supervised learning. It helps to classify the given input data. But how does it work?
• A typical neural network looks like this:
Perceptron
• The perceptron consists of 4 parts:
1. Input values, or one input layer
2. Weights and bias
3. Net sum
4. Activation function
• Neural networks work the same way as the perceptron. So, if you want to know how a neural network works, learn how a perceptron works.
Perceptron
But how does it work?
• The perceptron works in these simple steps:
a. All the inputs x are multiplied by their weights w. Let’s call these products k.
b. Add all the multiplied values and call the result the Weighted Sum.
c. Apply that weighted sum to the correct Activation Function.
• For Example: Unit Step Activation Function.
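Steps a to c can be traced in a few lines of Python. The sketch assumes a unit step that outputs 1 when the weighted sum is at least 0; the input values, weights, and bias are hypothetical:

```python
def unit_step(v):
    # Unit step activation: 1 if v >= 0, else 0.
    return 1 if v >= 0 else 0

def perceptron_forward(x, w, bias=0.0):
    k = [xi * wi for xi, wi in zip(x, w)]   # step a: products x_i * w_i
    weighted_sum = sum(k) + bias            # step b: weighted sum (plus bias)
    return unit_step(weighted_sum)          # step c: activation

out_a = perceptron_forward([1, 0, 1], [0.5, -0.4, 0.3], bias=-0.6)
out_b = perceptron_forward([0, 1, 0], [0.5, -0.4, 0.3], bias=-0.6)
```

For the first input the weighted sum is 0.8 − 0.6 = 0.2, so the perceptron outputs 1; for the second it is −0.4 − 0.6 = −1.0, so it outputs 0.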
Why?
• Why do we need weights and bias?
• Weights show the strength of each input's connection to the node.
• A bias value allows you to shift the activation function curve left or right, i.e. to change the input value at which the neuron fires.
• Why do we need an activation function?
• In short, activation functions are used to map the input into a required range, such as (0, 1) or (-1, 1).
Where we use Perceptron?
• Perceptron is usually used to classify the data into two parts.
Therefore, it is also known as a Linear Binary Classifier.
Perceptron Learning
• The perceptron receives a set of inputs x1, x2, …, xn.
• The linear combiner or adder node computes the linear combination of the inputs applied to the synapses with synaptic weights w1, w2, …, wn.
• Then, the hard limiter checks whether the resulting sum is positive or negative.
• If the input of the hard limiter node is positive, the output is +1, and if the input is negative, the output is −1. The hard limiter input is
$v = \sum_{i=1}^{n} w_i x_i$
Perceptron Learning
• The perceptron includes an adjustable value, or bias, as an additional weight w0.
• This additional weight w0 is attached to a dummy input x0, which is always assigned a value of 1.
• This consideration modifies the above equation to
$v = \sum_{i=0}^{n} w_i x_i = w_0 + \sum_{i=1}^{n} w_i x_i$
• The output is decided by the expression
$y = +1$ if $v > 0$, and $y = -1$ otherwise
Perceptron Learning
• The objective of perceptron is to classify a set of inputs into two
classes, c1 and c2 .
• This can be done using a very simple decision rule – assign the inputs
x1 , x2 , x3 , …, xn to c1 if the output of the perceptron, i.e. y , is +1
and c2 if y is −1.
• So, for an n-dimensional signal space, i.e. a space for ‘n’ input signals x1, x2, x3, …, xn, the simplest form of perceptron will have two decision regions, corresponding to the two classes, separated by a hyperplane defined by
$\sum_{i=0}^{n} w_i x_i = 0$
Perceptron Learning
• Therefore, for two input signals denoted by variables x1 and x2, the decision boundary is a straight line of the form
$w_0 + w_1 x_1 + w_2 x_2 = 0$
• So, for a perceptron having the values of synaptic weights w0, w1, and w2 as −2, ½, and ¼, respectively, the linear decision boundary will be of the form
$-2 + \frac{1}{2} x_1 + \frac{1}{4} x_2 = 0$, i.e. $2x_1 + x_2 = 8$
Perceptron Learning
• So, any point (x1, x2) which lies above the decision boundary will be assigned to class c1, and the points which lie below the boundary are assigned to class c2.
Example
• Let us examine if this perceptron is able to classify a set of points
given below:
• p1 = (5, 2) and p2 = (−1, 12) belonging to c1
• p3 = (3, −5) and p4 = (−2, −1) belonging to c2
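The check can also be done numerically. The sketch below is a hypothetical helper that assumes the output rule "+1 if the net input is positive, −1 otherwise" and evaluates $v = -2 + \frac{1}{2}x_1 + \frac{1}{4}x_2$ for each point:

```python
def perceptron_output(x1, x2, w0=-2.0, w1=0.5, w2=0.25):
    # The dummy input x0 = 1 carries the bias weight w0.
    v = w0 * 1 + w1 * x1 + w2 * x2
    return 1 if v > 0 else -1

points = {"p1": (5, 2), "p2": (-1, 12), "p3": (3, -5), "p4": (-2, -1)}
labels = {name: ("c1" if perceptron_output(*p) == 1 else "c2")
          for name, p in points.items()}
```

For p1, for instance, v = −2 + 2.5 + 0.5 = 1 > 0, so p1 is assigned to c1; for p3, v = −2 + 1.5 − 1.25 = −1.75, so p3 goes to c2.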
Example
• The same classification is obtained by plotting the points in the input space relative to the decision line.
Perceptron
• Thus, we can see that for a data set with linearly separable classes, perceptrons can always be employed to solve classification problems using decision lines (for two-dimensional space), decision planes (for three-dimensional space), or decision hyperplanes (for n-dimensional space).
• Appropriate values of the synaptic weights w1, w2, w3, …, wn can be obtained by training a perceptron.
• However, one assumption for the perceptron to work properly is that the two classes should be linearly separable.
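The slides do not spell out how training finds the weights. One common choice is Rosenblatt's perceptron learning rule, w ← w + η(t − y)x, applied whenever a sample is misclassified. The sketch below assumes that rule, bipolar targets (+1/−1), zero initial weights, and a hypothetical learning rate η = 0.1, and trains on the logical AND, which is linearly separable:

```python
def predict(w, x):
    # x already includes the dummy input x0 = 1, so w[0] acts as the bias w0.
    v = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if v > 0 else -1

def train_perceptron(samples, eta=0.1, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                 # w[0] is the bias weight w0
    for _ in range(max_epochs):
        errors = 0
        for x, t in samples:
            xa = [1.0] + list(x)        # prepend the dummy input x0 = 1
            y = predict(w, xa)
            if y != t:
                # Perceptron rule: move w toward the correct side of the boundary.
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                 # every sample classified correctly: converged
            break
    return w

and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(and_data)
```

Because the classes are linearly separable, the loop stops after a finite number of updates; on XOR (next slides) it would run to `max_epochs` without ever classifying all four points.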
Multi-layer perceptron
• A basic perceptron works very successfully for data sets which possess linearly separable patterns.
• However, in practice, linearly separable data is an ideal situation that is rarely met.
• A basic perceptron cannot learn to compute even a simple 2-bit XOR, because the two XOR classes cannot be separated by a single straight line.
Multi-layer perceptron
• A decision boundary made of two linear decision lines can clearly partition the XOR data. This is the philosophy used to design the multi-layer perceptron model.
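To make the "two decision lines" idea concrete, here is a hand-wired sketch (weights chosen by hand, not learned, and step units used only for readability) in which two hidden neurons each draw one line and the output neuron fires only between them:

```python
def step(v):
    return 1 if v >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # decision line 1: fires when at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)     # decision line 2: fires only when both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)   # output: 1 only for points between the two lines

table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
```

A trained multi-layer perceptron would use differentiable activations instead of `step`, but the geometry it learns for XOR is of this same two-line kind.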
Multi-layer perceptron
• The neural network contains one or more intermediate layers
between the input and the output nodes, which are hidden from both
input and output nodes.
• Each neuron in the network includes a non-linear activation function
that is differentiable.
• The neurons in each layer are connected with some or all the neurons
in the previous layer.
Multi-layer perceptron

ADALINE network
Adaptive Linear Neural Element (ADALINE)
• Early single-layer ANN developed by Professor Bernard Widrow of
Stanford University.
• It has only one output neuron. The output value can be +1 or −1.
• A bias input x0 (where x0 = 1) having a weight w0 is added.
• The activation function is such that if the weighted sum is positive or 0, then the output is 1, else it is −1. Formally, we can say,
$y = +1$ if $\sum_{i=0}^{n} w_i x_i \ge 0$, and $y = -1$ otherwise
ADALINE & MADALINE
• The supervised learning algorithm adopted by the ADALINE network
is known as Least Mean Square (LMS) or Delta rule.
• A network combining a number of ADALINEs is termed as MADALINE
(many ADALINE). MADALINE networks can be used to solve problems
related to nonlinear separability.
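The slides name the LMS (delta) rule without giving its formula. A minimal sketch, assuming the usual Widrow-Hoff form w ← w + η(t − net)x, where the error uses the linear net input rather than the thresholded output; the learning rate, epoch count, and the bipolar AND data are illustrative assumptions:

```python
def adaline_train(samples, eta=0.1, epochs=50):
    w = [0.0, 0.0, 0.0]                      # [w0 (bias), w1, w2]
    for _ in range(epochs):
        for (x1, x2), t in samples:
            x = [1.0, x1, x2]                # bias input x0 = 1
            net = sum(wi * xi for wi, xi in zip(w, x))
            # LMS / delta rule: the error uses the linear net input, not the +1/-1 output.
            w = [wi + eta * (t - net) * xi for wi, xi in zip(w, x)]
    return w

def adaline_output(w, x1, x2):
    net = w[0] + w[1] * x1 + w[2] * x2
    return 1 if net >= 0 else -1             # +1 if the weighted sum is positive or 0

bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = adaline_train(bipolar_and)
```

Unlike the perceptron rule, LMS keeps adjusting the weights even when every sample is already classified correctly, because it minimises the squared error of the linear output.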
Different Network Topologies
• Single layer feed-forward networks – Input layer projecting into the
output layer
Single layer feed-forward network
• It consists of only two layers– the input layer and the output layer.
• The input layer consists of a set of ‘m’ input neurons X1 , X2 , …, Xm
connected to each of the ‘n’ output neurons Y1 , Y2 , …, Yn .
• The connections carry weights w11 , w12 , …, wmn .
• The input layer of neurons does not conduct any processing – they
pass the input signals to the output neurons.
• The computations are performed only by the neurons in the output
layer.
• So, though it has two layers of neurons, only one layer is performing
the computation.
Single layer feed-forward network
• The net signal input to the output neurons is given by
$y_{in,k} = \sum_{i=1}^{m} x_i w_{ik}$
for the k-th output neuron.
• The signal output from each output neuron will depend on the
activation function used.
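The net input of the k-th output neuron, the sum over i of $x_i w_{ik}$, maps directly onto nested sums. A minimal pure-Python sketch, assuming a weight table `W[i][k]` (input neuron i to output neuron k) with arbitrary values and a unit-step activation:

```python
def single_layer_forward(x, W, activation):
    """x: m input signals; W[i][k]: weight from input X_i to output neuron Y_k."""
    n_out = len(W[0])
    # Net input of the k-th output neuron: sum_i x_i * w_ik
    y_in = [sum(x[i] * W[i][k] for i in range(len(x))) for k in range(n_out)]
    return [activation(v) for v in y_in]

W = [[0.5, -1.0,  0.2],
     [0.3,  0.8, -0.4]]          # m = 2 inputs, n = 3 output neurons (arbitrary values)
y = single_layer_forward([1.0, 2.0], W, activation=lambda v: 1 if v >= 0 else 0)
```

The input layer only passes `x` through; all the computation (weighted sums and activation) happens for the output neurons, as described above.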
Different Network Topologies
• Multi-layer feed-forward networks
– One or more hidden layers.
– Each layer receives input only from previous layers, typically only from the immediately preceding layer.
Figure: 2-layer (1-hidden-layer) fully connected network
Multi-layer feed-forward network
• Each of the layers may have a varying number of neurons. For example, a network may have ‘m’ neurons in the input layer, ‘r’ neurons in the output layer, and only one hidden layer with ‘n’ neurons.
• The net signal input to the k-th hidden layer neuron is given by
$h_{in,k} = \sum_{i=1}^{m} x_i w_{ik}$
• The net signal input to the k-th output layer neuron is given by
$y_{in,k} = \sum_{j=1}^{n} h_{out,j} \, w'_{jk}$, where $h_{out,j} = f(h_{in,j})$ is the output of the j-th hidden neuron and $w'_{jk}$ is the weight from hidden neuron j to output neuron k
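Stacking two weighted-sum layers gives the forward pass of the one-hidden-layer network described above (m inputs, n hidden neurons, r outputs). The sketch assumes sigmoid activations in both layers, omits biases, and uses arbitrary weights; `layer` and `mlp_forward` are hypothetical helper names:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(inputs, W, f):
    # W[i][k] connects input i to neuron k of this layer; net_k = sum_i inputs[i] * W[i][k]
    return [f(sum(inputs[i] * W[i][k] for i in range(len(inputs)))) for k in range(len(W[0]))]

def mlp_forward(x, W_hidden, W_output):
    h = layer(x, W_hidden, sigmoid)     # m inputs -> n hidden outputs h_out
    return layer(h, W_output, sigmoid)  # n hidden outputs -> r network outputs

W_hidden = [[0.1, 0.4], [0.2, -0.3], [-0.5, 0.6]]   # m = 3, n = 2
W_output = [[0.7], [-0.2]]                           # n = 2, r = 1
y = mlp_forward([1.0, 0.5, -1.0], W_hidden, W_output)
```

The output layer sees only the hidden outputs, never the raw inputs, which matches the "only from one layer to the next" connection pattern.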
Multi-layer feed-forward network
Different Network Topologies
• Recurrent networks – A network with feedback, where some of its
inputs are connected to some of its outputs (discrete time).
Recurrent networks
• Recurrent neural networks deviate slightly from this: there is a feedback loop from the neurons in the output layer to the input layer neurons.
• There may also be self-loops.
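A discrete-time sketch of the feedback idea: a single hypothetical neuron whose previous output is fed back as an extra input at the next time step. The tanh activation and the weights `w_in` and `w_fb` are assumptions for illustration:

```python
import math

def run_recurrent(inputs, w_in=0.8, w_fb=0.5, bias=0.0):
    """Single recurrent neuron: its previous output is fed back as an extra input."""
    y_prev = 0.0
    outputs = []
    for x_t in inputs:
        # The feedback loop: y_prev (the output at time t-1) enters the net input at time t.
        y_t = math.tanh(w_in * x_t + w_fb * y_prev + bias)
        outputs.append(y_t)
        y_prev = y_t
    return outputs

ys = run_recurrent([1.0, 0.0, 0.0])
```

Even though the second and third inputs are 0, the outputs stay non-zero for a while: the feedback gives the network a memory of earlier inputs, which a feed-forward network lacks.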
Backpropagation
• Multi-layer feed forward network is a commonly adopted
architecture.
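• It has been shown (the universal approximation theorem) that a feed-forward network with even a single hidden layer of enough non-linear neurons can approximate any continuous function on a bounded input domain to any desired accuracy.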
• The learning method adopted to train a multi-layer feed forward
network is termed as backpropagation, which we will study in the
next section.
What is Backpropagation?
• Backpropagation is the essence of neural network training.
• It is the method of fine-tuning the weights of a neural network based
on the error rate obtained in the previous epoch (i.e., iteration).
• Proper tuning of the weights allows you to reduce error rates and
make the model reliable by increasing its generalization.
• In neural networks, backpropagation is short for "backward propagation of errors."
• It is a standard method of training artificial neural networks.
• This method helps calculate the gradient of a loss function with
respect to all the weights in the network.
Training Model
Summarize the steps
• Calculate the error – How far is your model output from the actual
output.
• Minimum Error – Check whether the error is minimized or not.
• Update the parameters – If the error is large, update the parameters (weights and biases), then check the error again. Repeat the process until the error becomes minimum.
• Model is ready to make a prediction – Once the error becomes
minimum, you can feed some inputs to your model and it will
produce the output.
Backpropagation
• The Back propagation algorithm in neural network computes the
gradient of the loss function for a single weight by the chain rule.
• It efficiently computes one layer at a time, unlike a naive direct computation.
• It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.
• To put it another way, we need to train our model.
• One way to train our model is called Backpropagation.
Backpropagation
• The Backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent. The weights that minimize the error function are then considered to be a solution to the learning problem.
• Let’s understand how it works with an example:
• You have a dataset, which has labels.
Now, what we did here:
• We first initialized some random value to ‘W’ and propagated
forward.
• Then, we noticed that there is some error. To reduce that error, we propagated backwards and increased the value of ‘W’.
• After that, we noticed that the error had increased. We realized that we should not increase the ‘W’ value.
• So, we again propagated backwards and decreased the ‘W’ value.
• Now, we noticed that the error had reduced.
Now, what we do here:
• So, we are trying to get the value of weight such that the error
becomes minimum.
• Basically, we need to figure out whether we need to increase or
decrease the weight value.
• Once we know that, we keep on updating the weight value in that
direction until error becomes minimum.
• You might reach a point where, if you update the weight further, the error will increase.
• At that point you need to stop, and that is your final weight value.
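The "increase or decrease W?" question is answered by the sign of the derivative of the error with respect to W. A minimal sketch with a single hypothetical training sample (input x = 2, target 6), squared error, and an assumed learning rate of 0.05; the best weight here is W = 3:

```python
def error(w, x=2.0, target=6.0):
    return (target - w * x) ** 2            # squared error for one sample

def d_error(w, x=2.0, target=6.0):
    return -2.0 * x * (target - w * x)      # derivative dE/dW

w = 0.5                                      # arbitrary starting weight
lr = 0.05                                    # learning rate (step size)
history = [error(w)]
for _ in range(50):
    # Negative gradient -> increase W; positive gradient -> decrease W.
    w -= lr * d_error(w)
    history.append(error(w))
```

Starting below the optimum, the derivative is negative, so each update increases W; had we started above 3, the same line would decrease it. The error shrinks at every step and W settles at 3.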
The Global Minimum
• We need to reach the ‘Global Loss Minimum’.
• Backpropagation provides the gradients that guide the weights toward this minimum.
• Let’s now understand the math behind Backpropagation.
The Local Minimum
• We can add momentum to accelerate the learning process by "encouraging" the weight changes to continue in the same direction with larger steps.
• The momentum term can help keep the learning process from settling in a local minimum by "overstepping" a small "hill".
• Typically, the momentum term has a value between 0 and 1.
• It should be noted that no matter what
modifications are made to the
Backpropagation algorithm, such as
including the momentum term, there are
no guarantees that the network will not
settle in a local minimum.
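The slides do not give a momentum formula. A common formulation keeps a "velocity" that accumulates past steps: v ← μ·v − η·∂E/∂w, then w ← w + v, with momentum μ between 0 and 1. The sketch below assumes that form and uses a simple one-dimensional error E(w) = (w − 3)², so it only shows the update mechanics, not an escape from a local minimum:

```python
def grad(w):
    return 2.0 * (w - 3.0)          # gradient of E(w) = (w - 3)^2, minimum at w = 3

def gd_momentum(w=0.0, lr=0.1, momentum=0.9, steps=300):
    velocity = 0.0
    for _ in range(steps):
        # Keep a fraction of the previous step and add the new gradient step.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

w_final = gd_momentum()
w_plain = gd_momentum(momentum=0.0)  # momentum = 0 reduces to plain gradient descent
```

With momentum the weight overshoots and oscillates before settling; that same tendency to carry on past a point is what can carry it over a small hill, though, as noted above, nothing guarantees it.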
How Backpropagation Algorithm Works?
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron, from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs:
Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.
• Keep repeating the process until the desired output is achieved.
The network contains:

• two inputs
• two hidden neurons
• two output neurons
• two biases
• Below are the steps involved in Backpropagation:
• Step – 1: Forward Propagation
• Step – 2: Backward Propagation
• Step – 3: Putting all the values together and calculating the updated weight
value
How Backpropagation Works?
Step – 1: Forward Propagation
How Backpropagation Works?
Step – 2: Backward Propagation
• Now, we will propagate backwards. This way we will try to reduce the
error by changing the values of weights and biases.
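• Consider W5, a weight feeding output neuron O1. We will calculate the rate of change of the total error with respect to a change in W5. By the chain rule,
$\frac{\partial E_{total}}{\partial W5} = \frac{\partial E_{total}}{\partial out_{O1}} \cdot \frac{\partial out_{O1}}{\partial net_{O1}} \cdot \frac{\partial net_{O1}}{\partial W5}$
where $out_{O1}$ is the output of O1 and $net_{O1}$ is its total net input.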
Step – 2: Backward Propagation
• Now let’s see how much the total net input of O1 changes with respect to W5.
Step – 3: Update weight value
Putting all the values together and calculating the updated weight value with learning rate $\eta$:
$W5^{+} = W5 - \eta \cdot \frac{\partial E_{total}}{\partial W5}$
• Similarly, we can calculate the other weight values as well.
• After that we will again propagate forward and calculate the output.
Again, we will calculate the error.
• If the error is minimum we will stop right there, else we will again
propagate backwards and update the weight values.
• This process will keep on repeating until error becomes minimum.
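Steps 1 to 3 for W5 can be reproduced in a few lines. The numeric inputs, weights, biases, targets, and learning rate below are assumed illustrative values (the network layout matches the one described: two inputs, two hidden neurons, two outputs, two biases, sigmoid activations, squared error), and W5 is assumed to connect hidden neuron h1 to output O1:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Illustrative values (assumed)
i1, i2 = 0.05, 0.10
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35   # input -> hidden
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60   # hidden -> output
t1, t2 = 0.01, 0.99                                  # desired outputs
eta = 0.5                                            # learning rate

# Step 1: forward propagation
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2

# Step 2: backward propagation for W5 (chain rule)
dE_dout_o1 = out_o1 - t1                 # dE_total / d out_O1
dout_dnet_o1 = out_o1 * (1 - out_o1)     # derivative of the sigmoid
dnet_dw5 = out_h1                        # net_O1 = w5*out_h1 + w6*out_h2 + b2
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5

# Step 3: update the weight
w5_new = w5 - eta * dE_dw5
```

With these values the total error is about 0.2984 and W5 moves from 0.40 to about 0.3589; the other weights are updated the same way, using the old weights throughout the backward pass.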
Pseudo code for backpropagation
Assign all network inputs and outputs
Initialize all weights with small random numbers, typically between -1 and 1
repeat
    for every pattern in the training set
        Present the pattern to the network
        // Propagate the input forward through the network:
        for each layer in the network
            for every node in the layer
                1. Calculate the weighted sum of the inputs to the node
                2. Add the threshold (bias) to the sum
                3. Calculate the activation for the node
            end
        end
        // Propagate the errors backward through the network
        for every node in the output layer
            Calculate the error signal
        end
        for all hidden layers, from the last to the first
            for every node in the layer
                Calculate the node's error signal
            end
        end
        // Adjust the weights using the error signals
        Update each node's weights in the network
    end
    // Calculate Global Error
    Calculate the Error Function
while ((number of iterations < specified maximum) AND
       (Error Function > specified threshold))
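A runnable sketch that follows the structure of the pseudocode above for one hidden layer and one sigmoid output, trained on XOR. Assumptions: squared error, online (per-pattern) updates, a hypothetical learning rate of 0.5, 3 hidden neurons, and a fixed seed. Each pattern's error signals are computed before any weight changes, so the hidden errors use the old output weights. As the Local Minimum slide warns, some initializations can stall before XOR is learned:

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_backprop(patterns, n_hidden=3, eta=0.5, max_iterations=5000, target_error=0.01, seed=1):
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Small random weights in [-1, 1]; the last entry of each row is the bias (threshold) weight.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    errors, iteration, global_error = [], 0, float("inf")
    while iteration < max_iterations and global_error > target_error:
        global_error = 0.0
        for x, target in patterns:
            # Forward pass: weighted sum + bias, then activation, layer by layer.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = sigmoid(sum(w * hi for w, hi in zip(w_o, hb)))
            # Backward pass: error signal of the output node, then of each hidden node.
            delta_o = (y - target) * y * (1 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Update every weight in the network.
            w_o = [w - eta * delta_o * hi for w, hi in zip(w_o, hb)]
            for j in range(n_hidden):
                w_h[j] = [w - eta * delta_h[j] * xi for w, xi in zip(w_h[j], xb)]
            global_error += 0.5 * (target - y) ** 2   # uses the output seen before this update
        errors.append(global_error)
        iteration += 1
    return w_h, w_o, errors

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_h, w_o, errors = train_backprop(xor)
```

The `while` condition mirrors the pseudocode's stopping rule: training ends either when the iteration budget is used up or when the global error drops below the threshold.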
Types of Backpropagation Networks
• Two Types of Backpropagation Networks are:
• Static Back-propagation
• Recurrent Backpropagation
Types of Backpropagation Networks
• Static back-propagation:
• It is one kind of backpropagation network which produces a mapping of a
static input for static output. It is useful to solve static classification issues
like optical character recognition.
• Recurrent Backpropagation:
• Recurrent Back propagation in data mining is fed forward until a fixed value
is achieved. After that, the error is computed and propagated backward.
• The main difference between the two methods is that the mapping is rapid in static backpropagation, while it is non-static in recurrent backpropagation.
Why We Need Backpropagation?
• Most prominent advantages of Backpropagation are:
• Backpropagation is fast, simple and easy to program
• It has few parameters to tune (chiefly the learning rate) apart from the number of inputs
• It is a flexible method as it does not require prior knowledge about
the network
• It is a standard method that generally works well
• It does not need any special mention of the features of the function
to be learned.
Why We Need Backpropagation?
• A human is given a set of data and asked to classify it into predefined classes.
• Like a human, ANN will come up with "theories" about how the
samples fit into the classes.
• These are then tested against the correct outputs to see how accurate
the guesses of the network are.
• Radical changes in the latest theory are indicated by large changes in
the weights, and small changes may be seen as minor adjustments to
the theory.
Other issues
• There are also issues regarding generalizing a neural network.
• Issues to consider are problems associated with under-training and over-training.
• Under-training can occur when the neural network is not complex enough to detect a pattern in a complicated data set.
• This is usually the result of networks with so few hidden nodes that they cannot accurately represent the solution, therefore under-fitting the data.
• On the other hand, over-training can result in a network that is too complex: it fits the training data (including its noise) so closely that its predictions generalize poorly beyond that data. Networks with too many hidden nodes will tend to over-fit the solution.
Relationship between size and price
• To illustrate under- and over-fitting, we model the relationship between size and price in three ways (shown in the plots):
• We’ve established the relationship using a linear equation, for which the plot has been shown. The first plot has a high error on the training data points. Therefore, this model will not perform well on either the public or the private leaderboard. This is an example of “Underfitting”: the model fails to capture the underlying trend of the data.
• In the second plot, we just found the right relationship between price and size, i.e., low training error and a relationship that generalizes.
• In the third plot, we found a relationship which has almost zero training error. This is because the relationship is developed by considering each deviation in the data points (including noise), i.e., the model is too sensitive and captures random patterns which are present only in the current dataset. This is an example of “Overfitting”. With this relationship, there could be a high deviation between the public and private leaderboard scores.
Validation
• A common practice in data science competitions is to iterate over various
models to find a better performing model.
• However, it becomes difficult to distinguish whether this improvement in
score is coming because we are capturing the relationship better, or we are
just over-fitting the data.
• To answer this question, we use validation techniques. These techniques help us arrive at more generalized relationships.
What is Cross Validation?
• Cross Validation is a technique which involves reserving a particular sample of a dataset on which you do not train the model. Later, you test your model on this sample before finalizing it.
• Here are the steps involved in cross validation:
1. You reserve a sample data set.
2. Train the model using the remaining part of the dataset.
3. Test the model on the reserved sample (the validation set). This will help you gauge the effectiveness of your model’s performance. If your model delivers a positive result on validation data, go ahead with the current model. It rocks!
Methods
• Train/test split
• k-Fold Cross-Validation
• Leave-one-out Cross-Validation
• Leave-one-group-out Cross-Validation
• Nested Cross-Validation
• Time-series Cross-Validation
• Wilcoxon signed-rank test
• McNemar’s test
• 5x2CV paired t-test
• 5x2CV combined F test
Train/test split or The validation set approach
• In this approach, we reserve 50% of the dataset for validation and the
remaining 50% for model training.
• However, a major disadvantage of this approach is that, since we train the model on only 50% of the dataset, there is a good chance that we miss some interesting information about the data, which leads to higher bias.
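A minimal sketch of the validation set approach: shuffle once, then keep half the data for validation. The helper name `train_validation_split`, the fixed seed, and the 50% default are illustrative assumptions:

```python
import random

def train_validation_split(data, validation_fraction=0.5, seed=0):
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)      # shuffle so the split is random
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training set, validation set)

train, validation = train_validation_split(range(10))
```

Every point lands in exactly one of the two sets, so half the data never contributes to training, which is the source of the extra bias described above.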
Leave one out cross validation (LOOCV)
• In this approach, we reserve only one data point from the available
dataset, and train the model on the rest of the data. This process iterates
for each data point. This also has its own advantages and disadvantages.
Let’s look at them:
• We make use of all data points, hence the bias will be low
• We repeat the cross validation process n times (where n is number of data points)
which results in a higher execution time
• This approach leads to higher variation in testing model effectiveness because we
test against one data point. So, our estimation gets highly influenced by the data
point. If the data point turns out to be an outlier, it can lead to a higher variation
• LOOCV leaves one data point out. Similarly, you could leave p training examples out
to have validation set of size p for each iteration. This is called LPOCV (Leave P Out
Cross Validation)
k-fold cross validation
• From the above two validation methods, we’ve learnt:
1. We should train the model on a large portion of the dataset. Otherwise, we’ll fail to recognise the underlying trend in the data, which will eventually result in higher bias.
2. We also need a good ratio of testing data points. As we have seen above, too few test data points can lead to high variance when testing the effectiveness of the model.
3. We should iterate on the training and testing process multiple times, changing the train and test dataset distribution each time. This helps in validating the model effectiveness properly.
k-fold cross validation
1. Randomly split your entire dataset into k “folds”.
2. For each fold in your dataset, build your model on the other k – 1 folds of the dataset. Then, test the model on the held-out fold to check its effectiveness.
3. Record the error you see on each of the predictions.
4. Repeat this until each of the k folds has served as the test set.
5. The average of your k recorded errors is called the cross-validation error and will serve as your performance metric for the model.
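The five steps can be sketched in pure Python. The helper names, the squared-error metric, and the toy "predict the training mean" model are hypothetical choices for illustration; any `fit`/`predict` pair can be plugged in:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Step 1: shuffle indices 0..n-1 and split them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]  # first n % k folds get one extra
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_validation_error(xs, ys, k, fit, predict, seed=0):
    errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in test_set]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])   # step 2: train on k-1 folds
        fold_err = sum((predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx) / len(test_idx)
        errors.append(fold_err)                                               # step 3: record the error
    return sum(errors) / k                                                    # step 5: average over k folds

def fit_mean(xs, ys):          # hypothetical toy model: always predict the training mean
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
cv10 = cross_validation_error(xs, ys, k=10, fit=fit_mean, predict=predict_mean)
```

With k equal to the number of data points (k = 10 here) every fold holds one point, so the same function performs LOOCV.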
k-fold validation when k=10
How to choose the right value of k?
• Always remember, a lower value of k is more biased, and hence undesirable. On the other hand, a higher value of k is less biased, but can suffer from large variability. It is important to know that a smaller value of k always takes us towards the validation set approach, whereas a higher value of k leads to the LOOCV approach.
• Precisely, LOOCV is equivalent to n-fold cross validation where n is the number of
training examples.
Stratified k-fold cross validation
• Stratification is the process
of rearranging the data so
as to ensure that each fold
is a good representative of
the whole.
• For example, in a binary classification problem where each class comprises 50% of the data, it is best to arrange the data such that in every fold, each class comprises about half the instances.
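One simple way to stratify, sketched under the assumption that dealing each class's shuffled members round-robin into the k folds is acceptable (the helper name is hypothetical):

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k, seed=0):
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal this class's members round-robin so every fold gets its share.
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)   # continue the rotation where the previous class stopped
    return folds

labels = [0] * 10 + [1] * 10     # binary problem, 50% per class
folds = stratified_k_fold(labels, k=5)
```

In this 50/50 example every one of the 5 folds receives 2 instances of each class, i.e. each class makes up half of every fold.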
Stratified k-fold cross validation
Cross Validation for time series
• Splitting a time-series dataset randomly does not work because it breaks the temporal order of your data. For a time series forecasting problem, we perform cross validation in the following manner.
• Folds for time series cross validation are
created in a forward chaining fashion
• Suppose we have a time series for yearly
consumer demand for a product during a
period of n years. The folds would be
created like:
Cross Validation for time series
Cross Validation for time series
• We progressively select a new train and test
set.
• We start with a train set which has a
minimum number of observations needed
for fitting the model.
• Progressively, we change our train and test
sets with each fold. In most cases, 1 step
forecasts might not be very important.
• In such instances, the forecast origin can be
shifted to allow for multi-step errors to be
used.
• For example, in a regression problem, the
following code could be used for performing
cross validation.
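As a stand-in sketch (not the specific code the slide refers to), forward-chaining folds can be generated with a small hypothetical helper. `min_train` is the minimum number of observations needed to fit the model, and `horizon` shifts the forecast origin to allow multi-step errors:

```python
def forward_chaining_splits(n, min_train, horizon=1):
    """Yield (train_indices, test_indices) for a series of length n.

    Training always uses observations 0..origin-1 and testing uses the next
    `horizon` observations, so the model never sees the future.
    """
    for origin in range(min_train, n - horizon + 1):
        yield list(range(origin)), list(range(origin, origin + horizon))

one_step = list(forward_chaining_splits(5, min_train=2))
multi_step = list(forward_chaining_splits(6, min_train=3, horizon=2))
```

Each fold's training set grows by one observation, and the test indices always come after the training indices, unlike a random split.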
Custom Cross Validation Techniques
• Unfortunately, there is no single method that works best for all kinds
of problem statements.
• Often, a custom cross validation technique based on a feature, or
combination of features, could be created if that gives the user stable
cross validation scores while making submissions in hackathons.
