3. Cross Validation
Validation
Introduction
• Machine learning validation methods provide a means for us to estimate
generalization error.
• This is crucial for determining which model provides the best
predictions for unobserved data.
• In cases where large amounts of data are available, machine learning
data validation begins with splitting the data into three separate
datasets:
1. A training set is used to train the machine learning model(s) during
development.
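2. A validation set is used to estimate the generalization error of the
model created from the training set for the purpose of model selection.
3. A test set is held out until the end and is used to estimate the
generalization error of the final selected model.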
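The split described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name train_val_test_split, the 60/20/20 proportions, and the fixed seed are all assumptions made for the example, and the third dataset is taken to be the usual held-out test set.

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle `data` and split it into training, validation, and test lists."""
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be strictly between 0 and 1")
    items = list(data)
    # Shuffle first so that any ordering in the raw data does not leak
    # into one particular subset.
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]  # everything left over is used for training
    return train, val, test

# Hypothetical data: 100 example identifiers.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The three lists never overlap, so error measured on the validation or test list is measured on examples the model did not see during training.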
Cross-Validation in Machine Learning
• The model validation process in the previous section
works when we have large datasets.
• When data is limited, we can instead use a technique
called cross-validation.
• The purpose of cross-validation is to provide a
better estimate of a model's ability to perform on
unseen data.
• It provides a nearly unbiased estimate of the generalization
error, which is especially useful in the case of limited data.
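One common form of this idea is k-fold cross-validation, which the bullets above do not spell out: the data is divided into k folds, each fold serves once as the validation set while the remaining folds are used for training, and the k validation errors are averaged. The sketch below is an illustration under stated assumptions. The helper names k_fold_indices and cross_val_mse, the choice of k = 5, and the toy "model" that simply predicts the training mean are all hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        val_idx = folds[i]
        # Training indices are every index that is not in the current fold.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

def cross_val_mse(y, k=5, seed=0):
    """Average validation MSE of a toy model that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k, seed):
        mean = sum(y[j] for j in train_idx) / len(train_idx)  # "fit" the model
        mse = sum((y[j] - mean) ** 2 for j in val_idx) / len(val_idx)
        scores.append(mse)
    return sum(scores) / len(scores)

# Hypothetical targets for ten examples.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
print(cross_val_mse(y, k=5))
```

Every example is used for validation exactly once and for training k - 1 times, which is why this approach makes better use of a small dataset than a single fixed split.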
There are many reasons we may want to use cross-validation, for example:
Cross Validation