K-Folds Cross Validation Method

PRESENTATION
On
K-Fold Cross Validation Method
• Under the Guidance of Mrs. Divya Gupta
(M.Tech)
Assistant Professor, Computer Science
Department, IERT ALD
• Made By Shubham Gupta
AKTU Roll No. 1511010041
B.Tech CSE 3rd Year 39

TABLE OF CONTENTS
• Data Sets
1-Training Data Sets
2-Testing Data Sets
3-Data Set Figure Representation
• Cross Validation
1-Definition
2-Methods of Cross Validation
• Hold-Out Method for Cross Validation
1-Definition
2-Need
3-Advantages
4-Disadvantages
• K-Fold Cross Validation Method
1-Definition
2-Need
3-Advantages
4-Disadvantages
• References

DATA SETS
• In machine learning, the study and construction of algorithms that can learn from and
make predictions on data is a common task. Such algorithms work by making
data-driven predictions or decisions through building a mathematical model from
input data.
• The data used is called a data set.
• Data sets are classified into two types:
1-Training Data Set
2-Testing Data Set

TRAINING DATA SET
• A data set in which we know the solution, in other words one where both the input
and the output data are known, is called a Training Data Set.
E.g. history (we know the outcome of past events).
• It is used for learning the result and building the algorithm or pattern. Hence it should be the
larger part, say 70% of the initial data.
• Also known as the Development data set.

TESTING DATA SET
• A data set in which the solution is treated as unknown, in other words one whose
outputs are withheld from the model while it learns, is called a Testing Data Set.
E.g. the future (we don't know the outcomes of events that will occur in the future).
• It is used for validation. Hence it should be the smaller part, at most about 30% of the initial data.
• Also known as the Validation data set.

DATA SET FIGURE REPRESENTATION

CROSS VALIDATION
• Cross validation is a model validation technique for assessing how the results of a
statistical analysis will generalize to an independent data set.
• So we can say cross validation is used for:
1-Finding or estimating the expected error.
2-Selecting the best-fit model (the model which fits the data set best).
3-Avoiding an over-fit model (e.g. a time-fit model, such as one for earthquakes).

METHODS USED FOR CROSS
VALIDATION
• Four commonly used methods for cross validation are:
1-Hold-out sample validation
2-K-Fold Cross Validation
3-Leave-one-out Cross Validation
4-Bootstrap methods
Here we will discuss only two of them: hold-out sample validation and K-Fold Cross
Validation.

HOLD OUT CROSS VALIDATION
• Step by step:
• Step 1: Take all the data.
• Step 2: Randomly divide it into two parts
(say 70% and 30%).
• Step 3: Use Part 1 as the development
(training) data set and Part 2 as the
testing data set.
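
The three steps can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name hold_out_split, the fixed seed and the 500 integer records are assumptions made for the example, not part of the original slides.

```python
import random

def hold_out_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and cut them into a training part and a testing part."""
    shuffled = list(records)                # Step 1: take all the data (copied, input untouched)
    random.Random(seed).shuffle(shuffled)   # Step 2: put the records in random order
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]   # Step 3: Part 1 = training, Part 2 = testing

train, test = hold_out_split(range(500))
print(len(train), len(test))  # 350 150
```

Fixing the seed makes the split repeatable; a different seed gives a different random 70:30 division of the same data.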

WHY WE DO THIS IN THE HOLD OUT
METHOD
• To ensure that we learn a generalized
pattern without much error.
• The pattern obtained from the training data
must show similar results on the test/validation
data.

ADVANTAGES / DISADVANTAGES
OF HOLD OUT METHOD
• Advantages
1-Simplest method.
2-Works easily on large data sets.
3-Fast compared to the other methods.
• Disadvantage
1-Does not work well for small data sets (this is where K-Fold Cross Validation comes in).

WHY WE NEED K-FOLD CROSS
VALIDATION METHOD
• Suppose we have a small data
set, say 500 records.
• We split the data 70:30, as the hold-out
method says.
• We then get only 150 records for testing, which is too few.
• To increase that, we could change the ratio to 50:50.
• But with a 50:50 ratio the training data
becomes too small.
• Without enough training data, the model
developed will have more error and will not be accurate.
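
The arithmetic behind this trade-off can be checked directly. The helper name split_sizes is hypothetical and is used here only to show the record counts for the two ratios mentioned above.

```python
def split_sizes(n_records, train_percent):
    """Return (training records, testing records) for a given training percentage."""
    n_train = n_records * train_percent // 100
    return n_train, n_records - n_train

for train_percent in (70, 50):
    print(train_percent, split_sizes(500, train_percent))
# 70 (350, 150)
# 50 (250, 250)
```

Moving from 70:30 to 50:50 gains 100 testing records but costs 100 training records, which is exactly the dilemma.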

DILEMMA IN TRAINING
AND TESTING DATA
• More training data: a more accurate
model will develop, with less error.
• More testing data: more values against
which to check the model.
• With a small data set, increasing one
decreases the other.
• This is where K-Fold CV comes in.

K-FOLD CROSS VALIDATION
• Let us assume k=5, so it will be 5-fold cross validation.
• First take the data and divide it into 5
equal parts.
• Each part will have 20% of the data set values.
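
A minimal sketch of the division into 5 parts, assuming the data is shuffled first (the slides do not say whether it is). The name make_folds, the seed and the 500 integer records are illustrative assumptions.

```python
import random

def make_folds(records, k=5, seed=0):
    """Shuffle the records and deal them out into k parts of (nearly) equal size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Part i takes every k-th record starting at position i,
    # so the part sizes differ by at most one record.
    return [shuffled[i::k] for i in range(k)]

folds = make_folds(range(500), k=5)
print([len(fold) for fold in folds])  # [100, 100, 100, 100, 100]
```

With 500 records and k=5, each part holds 100 records, i.e. 20% of the data.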

CONTD
• Now use 4 parts for
development and 1 part
for validation.
See the given figure.

CONTD
• Similarly, we
repeat the same
thing for the next
four parts, each taking
its turn as the validation part.
See the figure.
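
The rotation can be sketched as a loop in which each part is held out once. This is an illustrative sketch, not a definitive implementation: the generator name k_fold_rounds is hypothetical, and the 5 parts are built from consecutive integers purely for brevity.

```python
def k_fold_rounds(folds):
    """Yield (round number, training records, validation records), one round per part."""
    for i, validation in enumerate(folds):
        # Training data is everything except the part currently held out.
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield i + 1, training, validation

# Five parts of 100 records each (records 0..499).
folds = [list(range(start, start + 100)) for start in range(0, 500, 100)]

for round_no, training, validation in k_fold_rounds(folds):
    print(round_no, len(training), len(validation))
# 1 400 100
# 2 400 100
# 3 400 100
# 4 400 100
# 5 400 100
```

In practice a model would be trained on `training` and scored on `validation` inside the loop.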

CONTD
• Points to be noted
• Each part appears exactly 1 time in the validation set.
• Similarly, each part appears 4 times in the training set.
• Hence, over the 5 rounds, all of the data is used for both validation and training.
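
These counts can be verified with a short check, assuming 500 records split into 5 consecutive parts (an illustrative layout, not taken from the slides).

```python
from collections import Counter

k = 5
folds = [list(range(start, start + 100)) for start in range(0, 500, 100)]

validation_count = Counter()  # how often each record is used for validation
training_count = Counter()    # how often each record is used for training

for i in range(k):
    validation_count.update(folds[i])
    for j in range(k):
        if j != i:
            training_count.update(folds[j])

print(set(validation_count.values()), set(training_count.values()))  # {1} {4}
```

Every record is validated once and trained on k-1 = 4 times.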

ADVANTAGES OF K-FOLD CROSS VALIDATION
METHOD
• We have more data for model development. In the hold-out method with a 70:30 split, only
350 of the 500 records are used for training. In 5-fold cross validation each round trains on
400 records, and over the 5 rounds every record is used for training 4 times (400x5=2000
record uses).
• We have more data for validation. In the hold-out method we have
only 150 records for validation; in K-Fold cross validation we have
100x5=500 records, so every record is validated exactly once.
• Hence the error estimate is based on all of the data and is typically more reliable than the
hold-out estimate, especially for small data sets.

DISADVANTAGES OF K-FOLD CROSS
VALIDATION METHOD
• The main disadvantage of the K-Fold Cross Validation method is its computational cost.
• As we build the model K times, it requires heavier computation: roughly
K times as much as the Hold-Out Cross Validation
method.
• Hence it is roughly K times slower.
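
To make the cost concrete, the sketch below counts model fits for each method. The "model" is a hypothetical stand-in (it just predicts the mean of its training values), and fit_mean_model and fit_log are names invented for this example.

```python
fit_log = []  # one entry per model fit: the number of training records used

def fit_mean_model(training_values):
    """Hypothetical stand-in for model training: predict the mean of the training values."""
    fit_log.append(len(training_values))
    return sum(training_values) / len(training_values)

values = list(range(500))

# Hold-out: a single fit on the 70% training part.
fit_mean_model(values[:350])
hold_out_fits = list(fit_log)
fit_log.clear()

# 5-fold: one fit per round, each on the 4 parts not held out.
k = 5
folds = [values[i::k] for i in range(k)]
for i in range(k):
    training = [v for j, fold in enumerate(folds) if j != i for v in fold]
    fit_mean_model(training)
k_fold_fits = list(fit_log)

print(hold_out_fits, k_fold_fits)  # [350] [400, 400, 400, 400, 400]
```

One fit becomes five, and each of those five also sees slightly more training data than the single hold-out fit, so the K-times figure is an approximation.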

REFERENCES
• Wikipedia
https://en.wikipedia.org/wiki/Training,_test,_and_validation_sets
• GeeksforGeeks
https://www.geeksforgeeks.org/cross-validation-machine-learning/
• Udacity
https://www.youtube.com/watch?v=TIgfjmp-4BA
