0% found this document useful (0 votes)
6 views

Cross-Validation in Machine Learning - Javatpoint

Cross-validation is a technique used in machine learning to validate model efficiency by training on a subset of data and testing on an unseen subset. Various methods of cross-validation include the validation set approach, leave-p-out, leave-one-out, k-fold, and stratified k-fold cross-validation, each with its own advantages and disadvantages. While cross-validation helps in assessing model performance and generalization, it also has limitations, particularly with inconsistent data and evolving datasets.

Uploaded by

manoj walekar
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

Cross-Validation in Machine Learning - Javatpoint

Cross-validation is a technique used in machine learning to validate model efficiency by training on a subset of data and testing on an unseen subset. Various methods of cross-validation include the validation set approach, leave-p-out, leave-one-out, k-fold, and stratified k-fold cross-validation, each with its own advantages and disadvantages. While cross-validation helps in assessing model performance and generalization, it also has limitations, particularly with inconsistent data and evolving datasets.

Uploaded by

manoj walekar
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Cross-Validation in Machine Learning


Cross-validation is a technique for validating the model efficiency by training it on the subset of
input data and testing on previously unseen subset of the input data. We can also say that it
is a technique to check how a statistical model generalizes to an independent dataset.

In machine learning, there is always the need to test the stability of the model. It means based
only on the training dataset; we can't fit our model on the training dataset. For this purpose, we
reserve a particular sample of the dataset, which was not part of the training dataset. After that,
we test our model on that sample before deployment, and this complete process comes under
cross-validation. This is something different from the general train-test split.

Hence the basic steps of cross-validations are:

Reserve a subset of the dataset as a validation set.

Provide the training to the model using the training dataset.

Now, evaluate model performance using the validation set. If the model performs well
with the validation set, perform the further step, else check for the issues.

Methods used for Cross-Validation


There are some common methods that are used for cross-validation. These methods are given
below:

1. Validation Set Approach

2. Leave-P-out cross-validation

3. Leave one out cross-validation

4. K-fold cross-validation

5. Stratified k-fold cross-validation

Validation Set Approach

We divide our input dataset into a training set and test or validation set in the validation set
approach. Both the subsets are given 50% of the dataset.

But it has one of the big disadvantages that we are just using a 50% dataset to train our model,
so the model may miss out to capture important information of the dataset. It also tends to
give the underfitted model.

https://www.javatpoint.com/cross-validation-in-machine-learning 2/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Leave-P-out cross-validation

In this approach, the p datasets are left out of the training data. It means, if there are total n
datapoints in the original input dataset, then n-p data points will be used as the training
dataset and the p data points as the validation set. This complete process is repeated for all the
samples, and the average error is calculated to know the effectiveness of the model.

There is a disadvantage of this technique; that is, it can be computationally difficult for the
large p.

Leave one out cross-validation

This method is similar to the leave-p-out cross-validation, but instead of p, we need to take 1
dataset out of training. It means, in this approach, for each learning set, only one datapoint is
reserved, and the remaining dataset is used to train the model. This process repeats for each
datapoint. Hence for n samples, we get n different training set and n test set. It has the
following features:

In this approach, the bias is minimum as all the data points are used.

The process is executed for n times; hence execution time is high.

This approach leads to high variation in testing the effectiveness of the model as we
iteratively check against one data point.

K-Fold Cross-Validation

K-fold cross-validation approach divides the input dataset into K groups of samples of equal
sizes. These samples are called folds. For each learning set, the prediction function uses k-1
folds, and the rest of the folds are used for the test set. This approach is a very popular CV
approach because it is easy to understand, and the output is less biased than other methods.

The steps for k-fold cross-validation are:

Split the input dataset into K groups

For each group:

Take one group as the reserve or test data set.

Use remaining groups as the training dataset

Fit the model on the training set and evaluate the performance of the model
using the test set.

https://www.javatpoint.com/cross-validation-in-machine-learning 3/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Let's take an example of 5-folds cross-validation. So, the dataset is grouped into 5 folds. On 1st
iteration, the first fold is reserved for test the model, and rest are used to train the model. On
2nd iteration, the second fold is used to test the model, and rest are used to train the model.
This process will continue until each fold is not used for the test fold.

Consider the below diagram:

Stratified k-fold cross-validation

This technique is similar to k-fold cross-validation with some little changes. This approach
works on stratification concept, it is a process of rearranging the data to ensure that each fold
or group is a good representative of the complete dataset. To deal with the bias and variance, it
is one of the best approaches.

It can be understood with an example of housing prices, such that the price of some houses
can be much high than other houses. To tackle such situations, a stratified k-fold cross-
validation technique is useful.

Holdout Method

This method is the simplest cross-validation technique among all. In this method, we need to
remove a subset of the training data and use it to get prediction results by training it on the
rest part of the dataset.

The error that occurs in this process tells how well our model will perform with the unknown
dataset. Although this approach is simple to perform, it still faces the issue of high variance,
and it also produces misleading results sometimes.

https://www.javatpoint.com/cross-validation-in-machine-learning 4/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Comparison of Cross-validation to train/test split in Machine


Learning
Train/test split: The input data is divided into two parts, that are training set and test set
on a ratio of 70:30, 80:20, etc. It provides a high variance, which is one of the biggest
disadvantages.

Training Data: The training data is used to train the model, and the dependent
variable is known.

Test Data: The test data is used to make the predictions from the model that is
already trained on the training data. This has the same features as training data
but not the part of that.

Cross-Validation dataset: It is used to overcome the disadvantage of train/test split by


splitting the dataset into groups of train/test splits, and averaging the result. It can be
used if we want to optimize our model that has been trained on the training dataset for
the best performance. It is more efficient as compared to train/test split as every
observation is used for the training and testing both.

Limitations of Cross-Validation
There are some limitations of the cross-validation technique, which are given below:

For the ideal conditions, it provides the optimum output. But for the inconsistent data, it
may produce a drastic result. So, it is one of the big disadvantages of cross-validation, as
there is no certainty of the type of data in machine learning.

In predictive modeling, the data evolves over a period, due to which, it may face the
differences between the training set and validation sets. Such as if we create a model for
the prediction of stock market values, and the data is trained on the previous 5 years
stock values, but the realistic future values for the next 5 years may drastically different,
so it is difficult to expect the correct output for such situations.

Applications of Cross-Validation
This technique can be used to compare the performance of different predictive modeling
methods.

It has great scope in the medical research field.

It can also be used for the meta-analysis, as it is already being used by the data scientists
in the field of medical statistics.

https://www.javatpoint.com/cross-validation-in-machine-learning 5/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

← Prev Next →

For Videos Join Our Youtube Channel: Join Now

Feedback

Send your Feedback to [email protected]

Help Others, Please Share

Learn Latest Tutorials

https://www.javatpoint.com/cross-validation-in-machine-learning 6/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Splunk SPSS Swagger Transact-SQL

Tumblr ReactJS Regex Reinforcement


Learning

R Programming RxJS React Native Python Design


Patterns

Python Pillow Python Turtle Keras

Preparation

Aptitude Reasoning Verbal Ability Interview Questions

Company Questions

Trending Technologies

https://www.javatpoint.com/cross-validation-in-machine-learning 7/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Artificial AWS Tutorial Selenium Cloud


Intelligence tutorial Computing
AWS
Artificial Selenium Cloud Computing
Intelligence

Hadoop tutorial ReactJS Data Science Angular 7


Tutorial Tutorial Tutorial
Hadoop
ReactJS Data Science Angular 7

Blockchain Git Tutorial Machine DevOps


Tutorial Learning Tutorial Tutorial
Git
Blockchain Machine Learning DevOps

B.Tech / MCA

DBMS tutorial Data Structures DAA tutorial Operating


tutorial System
DBMS DAA
Data Structures Operating System

Computer Compiler Computer Discrete


Network tutorial Design tutorial Organization and Mathematics
Architecture Tutorial
Computer Network Compiler Design
Computer Discrete
Organization Mathematics

Ethical Hacking Computer Software html tutorial


Graphics Tutorial Engineering
Ethical Hacking Web Technology
Computer Graphics Software
Engineering

Cyber Security Automata C Language C++ tutorial


tutorial Tutorial tutorial
C++
Cyber Security Automata C Programming

https://www.javatpoint.com/cross-validation-in-machine-learning 8/9
9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Java tutorial .Net Python tutorial List of


Framework Programs
Java Python
tutorial
Programs
.Net

Control Data Mining Data


Systems tutorial Tutorial Warehouse
Tutorial
Control System Data Mining
Data Warehouse

https://www.javatpoint.com/cross-validation-in-machine-learning 9/9

You might also like