Introduction of Holdout Method
Last Updated : 26 Aug, 2020
The holdout method is the simplest way to evaluate a classifier. In this method, the data set (a collection of data items or examples) is separated into two sets, called the training set and the test set.
A classifier assigns each data item in a given collection to a target category or class.
Example -
E-mails in our inbox being classified into spam and non-spam.
A classifier should be evaluated to find out its accuracy, error rate, and error estimates. This can be done using various methods. One of the most basic methods of evaluating a classifier is the holdout method.
In the holdout method, the data set is partitioned so that the larger share of the data belongs to the training set and the remaining data belongs to the test set.
Example -
If there are 20 data items, 12 are placed in the training set and the remaining 8 are placed in the test set.
- After partitioning the data set into two sets, the training set is used to build a model/classifier.
- After the classifier is constructed, the data items in the test set are used to measure the accuracy, error rate and error estimate of the model/classifier.
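The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`holdout_split`, `train_threshold_classifier`, `evaluate`), the toy threshold classifier and the made-up e-mail data are all assumptions introduced here, not part of the method itself.

```python
import random

def holdout_split(items, train_size, seed=0):
    """Shuffle a copy of items and split it into (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

def train_threshold_classifier(training_set):
    """Toy classifier (assumption): threshold at the midpoint of the class means."""
    spam = [x for x, label in training_set if label == "spam"]
    ham = [x for x, label in training_set if label == "non-spam"]
    threshold = (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2
    return lambda x: "spam" if x > threshold else "non-spam"

def evaluate(classifier, test_set):
    """Return (accuracy, error rate) measured on the test set only."""
    correct = sum(1 for x, label in test_set if classifier(x) == label)
    accuracy = correct / len(test_set)
    return accuracy, 1 - accuracy

# Hypothetical data: 20 e-mails as (suspicious word count, label) pairs.
data = [(n, "non-spam") for n in range(10)] + [(n, "spam") for n in range(10, 20)]

# 12 items go to the training set, the remaining 8 to the test set.
training_set, test_set = holdout_split(data, train_size=12)
classifier = train_threshold_classifier(training_set)
accuracy, error_rate = evaluate(classifier, test_set)
print(len(training_set), len(test_set), accuracy, error_rate)
```

The classifier never sees the 8 test items while it is being built, which is what makes the measured accuracy an estimate of performance on unseen data.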
However, it is vital to remember two statements with regard to the holdout method. These are :
If as many data items as possible are placed in the training set, the classifier has more examples to learn from, so its error rate tends to be lower and its accuracy higher. This is the sign of a good classifier/model.
Example -
A student 'gfg' is coached by a teacher. The teacher teaches her every topic that might appear in the exam. Hence, she tends to make very few mistakes in the exam and performs well.
The more training data are used to construct a classifier, the better it tends to perform on the data items from the test set that are used to test it.
If more data items are placed in the test set and used to test the classifier built from the training set, we obtain a more accurate evaluation of the classifier with respect to its accuracy, error rate and error estimate.
Example -
A student 'gfg' is coached by a teacher. The teacher teaches her some topics that might appear in the exam. If the student 'gfg' is given a number of exams on the basis of this coaching, her weak and strong points can be determined accurately.
If more test data are used to evaluate the constructed classifier, its error rate, error estimate and accuracy can be determined more accurately. Because the data set is fixed in size, every item moved into the test set is an item taken away from the training set, so the two statements pull in opposite directions.
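One way to see why a larger test set gives a more precise evaluation is the binomial standard error of a measured error rate. The sketch below assumes that the test items are independent draws from the same distribution; the formula and the function name `error_rate_standard_error` are introduced here for illustration and do not come from the text above.

```python
import math

def error_rate_standard_error(error_rate, test_size):
    """Binomial standard error of an error rate measured on test_size items."""
    return math.sqrt(error_rate * (1 - error_rate) / test_size)

# Same measured error rate (25%), increasingly large test sets.
for test_size in (8, 80, 800):
    se = error_rate_standard_error(0.25, test_size)
    print(test_size, round(se, 3))
```

With 8 test items the uncertainty is roughly 0.15, more than half of the measured error rate itself; with 800 items it shrinks to roughly 0.015.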
Problem :
While partitioning the whole data set into two parts, i.e. the training set and the test set, all data items belonging to a class, say GFG1, may end up in the test set, so that none of the data items of class GFG1 are in the training set. In that case the model/classifier is never trained on data items of class GFG1 and cannot be expected to recognise them.
Solution :
Stratification is a technique in which the data items belonging to each class, such as GFG1, are divided between the two data sets, i.e. the training set and the test set, so that every class is represented in both, typically in the same proportion as in the whole data set. As a result, the model/classifier is trained on data items belonging to class GFG1.
Example -
If four data items belong to class GFG1 and the data set is split in half, two of them are placed in the training set and two in the test set.
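A stratified split can be sketched by splitting each class separately and then combining the pieces. This is a minimal sketch under stated assumptions: the function name `stratified_holdout_split`, the second class GFG2 and the sample data are hypothetical, and the split is 50/50 to match the example above.

```python
import random
from collections import defaultdict

def stratified_holdout_split(items, train_fraction, seed=0):
    """Split (value, label) pairs so each class is divided by train_fraction."""
    # Group the items by their class label.
    by_class = defaultdict(list)
    for item in items:
        by_class[item[1]].append(item)

    rng = random.Random(seed)
    training_set, test_set = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Number of items of this class that go to the training set.
        cut = round(len(members) * train_fraction)
        training_set.extend(members[:cut])
        test_set.extend(members[cut:])
    return training_set, test_set

# Hypothetical data: 4 items of class GFG1 and 16 items of class GFG2.
data = [(i, "GFG1") for i in range(4)] + [(i, "GFG2") for i in range(4, 20)]
training_set, test_set = stratified_holdout_split(data, train_fraction=0.5)

print(sum(1 for _, label in training_set if label == "GFG1"))
print(sum(1 for _, label in test_set if label == "GFG1"))
```

Both counts are 2, so class GFG1 is present in the training set and in the test set. A class with only one data item can still land in just one of the two sets; this sketch does not handle that case.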