Cross-Validation in Machine Learning - Javatpoint

Cross-validation is a technique used in machine learning to validate model efficiency by training on a subset of data and testing on an unseen subset. Various methods of cross-validation include the validation set approach, leave-p-out, leave-one-out, k-fold, and stratified k-fold cross-validation, each with its own advantages and disadvantages. While cross-validation helps in assessing model performance and generalization, it also has limitations, particularly with inconsistent data and evolving datasets.

Uploaded by

manoj walekar


9/12/24, 10:53 PM Cross-Validation in Machine Learning - Javatpoint

Cross-Validation in Machine Learning

Cross-validation is a technique for estimating how well a model performs by training it on one
subset of the input data and testing it on a previously unseen subset. Put another way, it is a
technique for checking how well a statistical model generalizes to an independent dataset.

In machine learning, we always need to test the stability of a model: we cannot judge it only by
how well it fits the training dataset. For this purpose, we reserve a sample of the dataset that
is not part of the training dataset and test the model on that sample before deployment. This
complete process comes under cross-validation. It differs from the general train-test split.

Hence the basic steps of cross-validation are:

Reserve a subset of the dataset as a validation set.

Train the model using the training dataset.

Evaluate model performance using the validation set. If the model performs well on the
validation set, proceed to the next step; otherwise, check for issues.
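
The three steps above can be sketched in a few lines of plain Python. This is a minimal
illustration rather than a production workflow: the toy data, the `MeanModel` class, the
`reserve_validation_set` helper, and the 25% validation fraction are all assumptions made for
the example.

```python
import random

class MeanModel:
    """Toy regressor (hypothetical): always predicts the mean of the training targets."""
    def fit(self, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, n):
        return [self.mean_] * n

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def reserve_validation_set(data, val_fraction=0.25, seed=0):
    """Step 1: shuffle a copy of the data and set aside a validation subset."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (training set, validation set)

targets = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 7.0, 3.5]
train, val = reserve_validation_set(targets)
model = MeanModel().fit(train)                                 # Step 2: train
val_error = mean_squared_error(val, model.predict(len(val)))   # Step 3: evaluate
print(f"validation MSE: {val_error:.3f}")
```

The validation points never reach `fit`, so the reported error reflects data the model has not seen.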

Methods used for Cross-Validation

The common methods used for cross-validation are:

1. Validation Set Approach

2. Leave-P-out cross-validation

3. Leave one out cross-validation

4. K-fold cross-validation

5. Stratified k-fold cross-validation

Validation Set Approach

In the validation set approach, we divide the input dataset into a training set and a test (or
validation) set. Each subset receives 50% of the dataset.

Its big disadvantage is that only 50% of the dataset is used to train the model, so the model
may fail to capture important information in the data. It also tends to produce an underfitted
model.

https://www.javatpoint.com/cross-validation-in-machine-learning 2/9

Leave-P-out cross-validation

In this approach, p data points are left out of the training data. That is, if the original
input dataset contains n data points in total, then n-p data points are used as the training
set and the remaining p data points as the validation set. This process is repeated for every
possible way of choosing p points from the n samples, and the average error is calculated to
measure the effectiveness of the model.

The disadvantage of this technique is that it can be computationally expensive, because the
number of possible combinations becomes very large even for moderate values of p.
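
To make the cost concrete, the sketch below enumerates leave-p-out splits with the standard
library. The function name `leave_p_out_splits` and the sizes used are assumptions chosen for
illustration.

```python
from itertools import combinations
from math import comb

def leave_p_out_splits(n, p):
    """Yield (train_indices, validation_indices) for every way of holding out p of n points."""
    indices = range(n)
    for held_out in combinations(indices, p):
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, list(held_out)

splits = list(leave_p_out_splits(n=5, p=2))
print(len(splits))    # 10 splits, i.e. C(5, 2)
print(splits[0])      # ([2, 3, 4], [0, 1])
print(comb(100, 5))   # 75287520 splits for n=100, p=5
```

Even a dataset of 100 points with p = 5 would require more than 75 million model fits, which is why this method is rarely run exhaustively.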

Leave one out cross-validation

This method is similar to leave-p-out cross-validation, but with p = 1: a single data point is
held out of training. For each learning set, only one data point is reserved, and the rest of
the dataset is used to train the model. This process repeats for each data point, so for n
samples we get n different training sets and n test sets. It has the following features:

The bias is minimal, as almost all the data points (n-1 of n) are used for training in each round.

The process is executed n times; hence execution time is high.

This approach leads to high variance in the estimate of the model's effectiveness, as each
round checks against only one data point.
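
A minimal sketch of the leave-one-out loop follows. The "model" here is just the mean of the
training targets, and the function name and data are assumptions for the example; any real
model would be refit in the same place.

```python
def leave_one_out_errors(ys):
    """Return the squared error on each held-out point, using a mean predictor."""
    errors = []
    for i, held_out in enumerate(ys):
        train = ys[:i] + ys[i + 1:]            # the other n - 1 points
        prediction = sum(train) / len(train)   # "fit" the toy model
        errors.append((held_out - prediction) ** 2)
    return errors

ys = [2.0, 4.0, 6.0, 8.0]
errors = leave_one_out_errors(ys)
print(len(errors))                 # 4: one model fit per data point
print(sum(errors) / len(errors))   # average error over the n rounds
```

The individual errors vary widely from round to round because each is measured on a single point, which is the high-variance behaviour noted above.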

K-Fold Cross-Validation

The k-fold cross-validation approach divides the input dataset into K groups of samples of equal
size. These groups are called folds. For each learning set, the prediction function is trained
on K-1 folds, and the remaining fold is used as the test set. This is a very popular CV approach
because it is easy to understand, and its output is less biased than that of other methods.

The steps for k-fold cross-validation are:

Split the input dataset into K groups

For each group:

Take one group as the reserve or test data set.

Use remaining groups as the training dataset

Fit the model on the training set and evaluate the performance of the model
using the test set.
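
The splitting step can be sketched with the standard library alone. The helper name
`k_fold_splits`, the shuffling, and the seed are assumptions for illustration; libraries such
as scikit-learn provide ready-made splitters for the same purpose.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n samples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[j::k] for j in range(k)]
    for j in range(k):
        test = folds[j]
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train, test

splits = list(k_fold_splits(n=10, k=5))
for train, test in splits:
    print(sorted(test), len(train))
```

Each sample index lands in exactly one test fold, so every observation is used for testing once and for training in the other rounds.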


Let's take an example of 5-fold cross-validation. The dataset is grouped into 5 folds. On the 1st
iteration, the first fold is reserved for testing the model, and the rest are used to train the
model. On the 2nd iteration, the second fold is used to test the model, and the rest are used to
train it. This process continues until each fold has been used as the test fold.

[Diagram: 5-fold cross-validation]

Stratified k-fold cross-validation

This technique is similar to k-fold cross-validation with a few small changes. It is based on the
concept of stratification, the process of rearranging the data to ensure that each fold or group
is a good representative of the complete dataset. It is one of the best approaches for dealing
with bias and variance.

It can be understood with the example of housing prices, where the price of some houses can be
much higher than that of others. A stratified k-fold cross-validation technique is useful in
such situations.
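
One simple way to stratify is to deal the samples of each class into the folds in turn. The
sketch below assumes the house prices have already been binned into two hypothetical classes,
"cheap" and "expensive"; the function name and data are illustrative, and no shuffling is done.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal each class's samples round-robin so every fold gets its share.
    for members in by_class.values():
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds

labels = ["cheap"] * 8 + ["expensive"] * 4
folds = stratified_folds(labels, k=4)
for fold in folds:
    print(Counter(labels[i] for i in fold))
```

Every fold ends up with two "cheap" and one "expensive" sample, matching the 2:1 ratio of the full dataset.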

Holdout Method

This is the simplest cross-validation technique of all. In this method, we remove a subset of the
training data, train the model on the rest of the dataset, and use the removed subset to get
prediction results.

The error measured in this process indicates how well the model will perform on an unknown
dataset. Although this approach is simple to perform, it still suffers from high variance, and it
sometimes produces misleading results.
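
The variance issue can be seen by repeating the holdout with different random splits. This is a
sketch under stated assumptions: the toy data, the mean predictor, the `holdout_error` helper,
and the 25% test fraction are all chosen for illustration.

```python
import random

def holdout_error(ys, test_fraction, seed):
    """Mean squared error of a mean predictor on one random holdout split."""
    shuffled = ys[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    prediction = sum(train) / len(train)
    return sum((y - prediction) ** 2 for y in test) / len(test)

ys = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0, 10.0]
estimates = [holdout_error(ys, test_fraction=0.25, seed=s) for s in range(5)]
print([round(e, 2) for e in estimates])
```

The estimate depends on which points happen to fall in the held-out subset, so different seeds will typically give noticeably different error values for the same model and data.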


Comparison of Cross-validation to train/test split in Machine Learning

Train/test split: The input data is divided into two parts, a training set and a test set, in a
ratio such as 70:30 or 80:20. Its estimate has high variance, which is one of its biggest
disadvantages.

Training Data: The training data is used to train the model, and the dependent
variable is known.

Test Data: The test data is used to make predictions with the model that has
already been trained on the training data. It has the same features as the training
data but is not part of it.

Cross-Validation dataset: It is used to overcome the disadvantage of the train/test split by
splitting the dataset into groups of train/test splits and averaging the results. It can be
used when we want to optimize a model that has been trained on the training dataset for
the best performance. It makes more efficient use of the data than a train/test split, as
every observation is used for both training and testing.
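
The averaging described above can be written compactly. The symbols are assumptions introduced
for illustration: k is the number of train/test splits and E_i is the error measured on the i-th
held-out group.

```latex
\mathrm{CV}_{(k)} = \frac{1}{k} \sum_{i=1}^{k} E_i
```

In k-fold cross-validation the held-out groups together cover the whole dataset, so every
observation contributes to exactly one E_i and to the training set of the other k-1 rounds.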

Limitations of Cross-Validation
There are some limitations of the cross-validation technique, which are given below:

Under ideal conditions, it provides the optimum output. But for inconsistent data, it may
produce drastically misleading results. This is one of the big disadvantages of
cross-validation, as there is no certainty about the type of data encountered in machine learning.

In predictive modeling, the data evolves over time, so differences may arise between the
training set and the validation sets. For example, if we create a model to predict stock
market values and train it on the previous 5 years of stock values, the actual values for
the next 5 years may be drastically different, so it is difficult to expect correct output
in such situations.

Applications of Cross-Validation
This technique can be used to compare the performance of different predictive modeling
methods.

It has great scope in the medical research field.

It can also be used for meta-analysis, as data scientists already use it in the field of
medical statistics.

