PRESENTATION
On
K-Fold Cross Validation Method
• Under the guidance of Mrs. Divya Gupta
(M.Tech)
Assistant Professor, Computer Science
Department, IERT ALD
• Made by Shubham Gupta
AKTU Roll No. 1511010041
B.Tech CSE 3rd Year 39
TABLE OF CONTENTS
• Data Sets
1-Training Data Sets
2-Testing Data Sets
3-Data Set Figure Representation
• Cross Validation
1-Definition
2-Methods of Cross Validation
• Hold-Out Method for Cross Validation
1-Definition
2-Need
3-Advantages
4-Disadvantages
• K-Fold Cross Validation Method
1-Definition
2-Need
3-Advantages
4-Disadvantages
• References
DATA SETS
• In machine learning, a common task is the study and construction of algorithms that can learn from data and make predictions on it. Such algorithms work by building a mathematical model from input data and using it to make data-driven predictions or decisions.
• The data used for this is called a data set.
• Data sets are classified into two types:
1-Training data set
2-Testing data set
TRAINING DATA SET
• A training data set is one for which we know the solution; in other words, we know both the input and the output data.
Eg- history (we already know its outcome).
• It is used to learn the pattern and build the model, so it should be the larger part, say 70% of the initial data.
• Also known as the development data set.
TESTING DATA SET
• A testing data set is one for which the model does not know the solution: its outputs are held back during training and used only to check the model's predictions.
Eg- the future (we don't know the outcomes of events that will occur in the future).
• It is used for validating the model, so it should be the smaller part, at most about 30% of the initial data.
• Also known as the validation data set.
DATA SET FIGURE REPRESENTATION
CROSS VALIDATION
• Cross validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set.
• So cross validation is used for:
1-Finding or estimating the expected error.
2-Selecting the best-fit model (the model that fits the data set best).
3-Avoiding over-fitted models (e.g. a model fitted over time, such as one for earthquakes).
METHODS USED FOR CROSS VALIDATION
• Four commonly used methods for cross validation are:
1-Hold-Out Sample Validation
2-K-Fold Cross Validation
3-Leave-One-Out Cross Validation
4-Bootstrap Methods
Here we discuss only two of them: hold-out sample validation and k-fold cross validation.
HOLD-OUT CROSS VALIDATION
• Step by step:
• Step 1: Take all the data.
• Step 2: Randomly divide it into two parts
(say 70% and 30%).
• Step 3: Use Part 1 as the development
(training) data set and Part 2 as the
testing data set.
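The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming the records are held in a plain list and using a fixed random seed so the split is repeatable; `holdout_split` is a hypothetical helper name, not a library function.

```python
import random

def holdout_split(records, train_fraction=0.7, seed=0):
    """Hold-out method: shuffle the data, then cut it into two parts."""
    shuffled = list(records)                  # Step 1: take all the data (as a copy)
    random.Random(seed).shuffle(shuffled)     # Step 2: put it in random order
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]     # Step 3: Part 1 = training, Part 2 = testing

train, test = holdout_split(range(500))
print(len(train), len(test))  # 350 150
```

In practice, scikit-learn's `sklearn.model_selection.train_test_split` performs the same kind of random split.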
WHY WE DO THIS IN THE HOLD-OUT METHOD
• To ensure that the model learns a general pattern
with little error, rather than just memorizing the training data.
• The pattern learned from the training data
must give similar results on the test/validation
data.
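As a sketch of this check, the example below fits a straight line on the 70% training part and compares its mean squared error on both parts. The data are synthetic and hypothetical (y = 2x + 1 plus Gaussian noise), and a least-squares line stands in for whatever model is being developed; if the two errors are close, the learned pattern generalizes.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(points, a, b):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(500)]
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in xs]   # hypothetical data
rng.shuffle(data)
train, test = data[:350], data[350:]                     # 70:30 hold-out split

a, b = fit_line(train)                                   # learn only from training data
print(f"train MSE={mse(train, a, b):.2f}  test MSE={mse(test, a, b):.2f}")
```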
ADVANTAGES / DISADVANTAGES
OF THE HOLD-OUT METHOD
• Advantages
1-Simplest method.
2-Works easily on large data sets.
3-Faster than the other methods.
• Disadvantage
1-Does not work well for small data sets, because the held-out part becomes too small to give a reliable error estimate (this is where k-fold cross validation comes in).
WHY WE NEED THE K-FOLD CROSS
VALIDATION METHOD
• Suppose we have a small data set, say 500 records.
• We split the data 70:30, as the hold-out
method says.
• Then only 150 records are left for testing, which is too few.
• To increase this, we could use a 50:50 ratio.
• But with a 50:50 ratio, only 250 records remain for training,
which is too few.
• Without enough training data, the model we
develop will have more error and will not be accurate.
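The arithmetic behind these numbers, for 500 records, can be checked with a few lines of Python. `split_sizes` is an illustrative helper that works with integer percentages to avoid floating-point rounding surprises.

```python
def split_sizes(n_records, train_percent):
    """Return (training, testing) record counts for a hold-out split."""
    n_train = n_records * train_percent // 100   # integer arithmetic, no float rounding
    return n_train, n_records - n_train

for percent in (70, 50):
    n_train, n_test = split_sizes(500, percent)
    print(f"{percent}:{100 - percent} split -> {n_train} training, {n_test} testing")
# 70:30 split -> 350 training, 150 testing
# 50:50 split -> 250 training, 250 testing
```

Whatever ratio is chosen, the two counts always add up to 500, so gaining testing records means losing training records.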
DILEMMA BETWEEN TRAINING
AND TESTING DATA
• More training data: a more
accurate model is developed,
with less error in the model.
• More testing data: more data with
which to check the model.
• This is where k-fold cross validation
comes in.
K-FOLD CROSS VALIDATION
• Let us assume k = 5, so this is 5-fold cross validation.
• First, take the data and divide it into 5
equal parts.
• Each part holds 20% of the data set.
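One way to divide the data into k equal parts is sketched below, assuming the records fit in a Python list. Shuffling first is an added assumption (common practice, so that each part is a random sample); dealing the records out round-robin keeps the part sizes within one of each other when the number of records is not a multiple of k. `make_folds` is a hypothetical helper name.

```python
import random

def make_folds(records, k=5, seed=0):
    """Shuffle the records, then deal them into k parts of (nearly) equal size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]   # part i gets every k-th record

folds = make_folds(range(500), k=5)
print([len(fold) for fold in folds])  # [100, 100, 100, 100, 100]
```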
K-FOLD CROSS VALIDATION
(CONTD.)
• Now use 4 parts for
development (training) and 1 part
for validation.
See the given figure.
K-FOLD CROSS VALIDATION
(CONTD.)
• Similarly, we repeat
the same procedure
four more times, each time
holding out a different part.
See the figure.
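The rotation over the five rounds can be written as a generator that, in round i, holds out part i for validation and joins the other parts for training. This is a minimal sketch: for brevity the parts are taken without shuffling, and `kfold_splits` is a hypothetical name (scikit-learn's `sklearn.model_selection.KFold` provides a library version of the same idea).

```python
def kfold_splits(folds):
    """Yield one (training, validation) pair per round."""
    for i, validation in enumerate(folds):
        # Training data = every part except the one held out in this round.
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation

k = 5
records = list(range(500))
folds = [records[i::k] for i in range(k)]   # 5 parts of 100 records each

for round_no, (training, validation) in enumerate(kfold_splits(folds), start=1):
    print(f"round {round_no}: {len(training)} training, {len(validation)} validation")
# every round: 400 training, 100 validation
```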
K-FOLD CROSS VALIDATION
(CONTD.)
• Points to be noted:
• Each part is used exactly once in the validation set.
• Similarly, each part is used 4 times in the training set.
• Hence every record is used for both training and validation, which increases the data available to both.
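Both properties can be verified by counting how often each record lands in each role across the rounds. A small sketch with 500 hypothetical records and k = 5:

```python
from collections import Counter

k = 5
records = list(range(500))
folds = [records[i::k] for i in range(k)]

train_counts, valid_counts = Counter(), Counter()
for i in range(k):
    valid_counts.update(folds[i])            # part i is the validation set in round i
    for j in range(k):
        if j != i:
            train_counts.update(folds[j])    # all other parts are training data

print(set(valid_counts.values()), set(train_counts.values()))  # {1} {4}
```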
ADVANTAGES OF K-FOLD CROSS VALIDATION
METHOD
• Given We have big data for model Development as in the Hold out method we have
only 500 data set now we have 500x5=2500 data sets in the K-Fold Cross validation
method .
• Given We have now a big data for validation. In case of Hold out method we have
only 150 data sets now in case of K-Fold cross validation method we have
100x5=500 data sets for validation.
• Hence we Have big data so it will more accurate as compared to other methods.
18
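The record counts for n = 500 and k = 5 can be reproduced as follows. The helper names are illustrative, and `kfold_counts` assumes the number of records is a multiple of k, as in the example.

```python
def holdout_counts(n_records, train_percent=70):
    """Training and validation record counts for a single hold-out split."""
    n_train = n_records * train_percent // 100
    return n_train, n_records - n_train

def kfold_counts(n_records, k=5):
    """(training records per round, training instances over all rounds, validated records)."""
    fold_size = n_records // k          # assumes n_records is a multiple of k
    train_per_round = n_records - fold_size
    return train_per_round, train_per_round * k, fold_size * k

print(holdout_counts(500))    # (350, 150)
print(kfold_counts(500, 5))   # (400, 2000, 500)
```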
DISADVANTAGES OF THE K-FOLD CROSS
VALIDATION METHOD
• The main disadvantage of the k-fold cross validation method is its computational cost.
• Because the model is trained k times, it needs roughly k times as much computation as the hold-out cross validation
method.
• Hence it is about k times slower.
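To make the cost concrete, the sketch below runs a complete k-fold error estimate and counts the training runs. The "model" is a deliberately trivial hypothetical one (always predict the mean of the training targets) and the data are synthetic; with a real model each fit dominates the run time, so k fits cost roughly k times the single fit of the hold-out method.

```python
import random
from statistics import mean

def fit_mean_model(train_targets):
    """Hypothetical model: always predict the mean of the training targets."""
    return mean(train_targets)

def kfold_cv_error(targets, k=5):
    """Return (average validation MSE over the k rounds, number of model fits)."""
    folds = [targets[i::k] for i in range(k)]
    errors, fits = [], 0
    for i in range(k):
        training = [t for j, fold in enumerate(folds) if j != i for t in fold]
        prediction = fit_mean_model(training)
        fits += 1                                     # one training run per round
        errors.append(mean((t - prediction) ** 2 for t in folds[i]))
    return mean(errors), fits

rng = random.Random(0)
targets = [rng.gauss(10, 2) for _ in range(500)]      # synthetic data
cv_error, fits = kfold_cv_error(targets, k=5)
print(f"estimated error {cv_error:.2f} after {fits} fits (hold-out needs 1)")
```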
REFERENCES
• Wikipedia
https://en.wikipedia.org/wiki/Training,_test,_and_validation_sets
• GeeksforGeeks
https://www.geeksforgeeks.org/cross-validation-machine-learning/
• Udacity (YouTube)
https://www.youtube.com/watch?v=TIgfjmp-4BA