International Journal of Computer Science & Information Technology (IJCSIT) Vol 12, No 2, April 2020
DOI: 10.5121/ijcsit.2020.12202
ENACTMENT RANKING OF SUPERVISED
ALGORITHMS DEPENDENCE OF DATA SPLITTING
ALGORITHMS: A CASE STUDY OF REAL DATASETS
Hina Tabassum and Dr. Muhammad Mutahir Iqbal
Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan
ABSTRACT
We conducted a comparative analysis of different supervised classification techniques by integrating a set of different data splitting algorithms and demonstrate how the relative efficacy of the learning algorithms depends on sample complexity. The issue of sample complexity is discussed in terms of its dependence on the data splitting algorithms. In line with expectations, every supervised learning classifier demonstrated a different capability for different data splitting algorithms, and no direct way to compute an overall ranking of the techniques was available. We specifically focused on how the classifier ranking depends on the data splitting algorithm and devised a model built on weighted average ranks, the Weighted Mean Rank Risk Adjusted Model (WMRRAM), for consensus ranking of the learning classifier algorithms.
KEY WORDS
Supervised Learning Algorithms, Data Splitting Algorithms, Ranking, Weighted Mean Rank Risk-Adjusted Model
1. INTRODUCTION
Building computational models with generalization capability and high predictive accuracy is one of the main goals of machine learning. Supervised learning algorithms are trained to estimate the output of an unknown target variable or function. The noteworthy point is that a model trained on one dataset should also generalize to unseen datasets. Over-training leads to poor generalization: if the model over-fits the training data, correct predictions on new data are not possible. Sometimes only one dataset is accessible and it is not feasible to gather a new one; in that case we need a scheme to cope with the shortage of data by splitting the available data into training and test sets, but the splitting criterion may introduce bias into the comparison of the supervised learning classifiers. Various data splitting algorithms are used to split the original dataset into training and test datasets.
Which supervised classifier outperforms the others is restricted to the given domain of instances provided by the splitting algorithm. To appraise whether the selection of the splitting algorithm influences classifier performance, we compare four standard data splitting methods using multiple datasets from the UCI repository, both balanced and imbalanced, with between two and six classes. Fifteen supervised learning classifiers were trained to test whether classifier performance is affected by data and sample complexity or by a wrong choice of learning classifier. The stability of the data splitting algorithms is measured in terms of the error rate of each individual supervised learning classifier.
2. DATA SPLITTING APPROACHES
When only one dataset is available, numerous methods can be considered to carry out the required learning task. Splitting the data is a widely used study design for high dimensional datasets, and the available original dataset can be split into training, testing and validation datasets [1].
2.1. Training Datasets
A subset of the original dataset used for estimating and learning the parameters of the required machine learning algorithm.
2.2. Testing Datasets
A subset of the original dataset used to estimate the performance of the learned model.
3. STANDARD DATA SPLITTING ALGORITHMS
Several data splitting algorithms have been proposed in the literature, and they differ from one another in complexity and quality, sometimes to a statistically significant degree. The following commonly used data splitting algorithms are compared:
3.1. Hold-Out-Method
The hold-out method, also called test sample estimation [2], is the simplest of the data splitting algorithms: it divides the original dataset randomly into training and testing sets. Commonly studied holdout splits of the data into training and testing sets include 25:75, 30:70, 90:10, 66:34 and 50:50 [3]. The holdout method can increase the bias between the two datasets, i.e. training and testing, because both may have different distributions. Its main drawback is that if the data are not large, the method is inefficient. For example, in a classification problem it is possible that a subset misses instances of some class, which causes inefficient estimation and evaluation of the model. To obtain better results and to reduce the bias, the method is applied iteratively to the dataset and the resulting accuracies are averaged over all iterations; this procedure is also called the repeated holdout method. The hold-out method is a common way to avoid over-training [4].
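As an illustration of the repeated holdout procedure described above, the following is a minimal Python sketch assuming scikit-learn is available; the 70:30 ratio, the Iris data, the decision-tree classifier and the number of repetitions are arbitrary choices for the example, not values prescribed by the paper.

```python
# Minimal sketch of the (repeated) holdout method, assuming scikit-learn is installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

accuracies = []
for seed in range(10):  # repeated holdout: average over several random splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed)  # 70:30 train/test split
    model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

print("repeated holdout accuracy: %.3f +/- %.3f"
      % (np.mean(accuracies), np.std(accuracies)))
```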
3.2. Leave One-Out Method
The leave-one-out method is a special case of k-fold cross validation in which k = n, where n is the size of the original dataset, so each test set contains only one instance [5]. This method does not involve any subsampling and produces nearly unbiased estimates, but with large variance. Its drawback is that it is computationally expensive and difficult to apply in many real situations.
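A minimal leave-one-out sketch follows, again assuming scikit-learn; each of the n iterations trains on n-1 instances and tests on the single held-out instance. The dataset and classifier are illustrative choices.

```python
# Minimal sketch of leave-one-out evaluation, assuming scikit-learn is installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # train on n-1 instances, test on the single held-out instance
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    errors.append(model.predict(X[test_idx])[0] != y[test_idx][0])

print("leave-one-out error rate:", np.mean(errors))
```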
3.3. Cross Validation Method
Cross validation is the most popular resampling technique. We call it k-fold cross validation, and sometimes rotation estimation [2], where k is a parameter and the original dataset is divided into k disjoint folds of equal size. In each turn, one fold is used as the testing dataset and the remaining k-1 folds are used as the training dataset; the average of all accuracies is the resulting output of the model. The main drawback of this method is that it suffers from pessimistic bias; increasing the number of folds may reduce the bias at the cost of increased variance. The value of k is not fixed in general, but it is commonly set to ten [3], which shows good results across different domains of datasets. This method is similar to the repeated holdout method in that all instances are used iteratively for learning and evaluating the model.
Figure 1: Strategy of Cross-validation
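A sketch of k-fold cross-validation with k = 10, the value most commonly used as noted above, assuming scikit-learn; the naive Bayes classifier and the Iris data are placeholders for illustration.

```python
# Minimal sketch of k-fold cross-validation (k = 10), assuming scikit-learn is installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

accuracies = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = GaussianNB().fit(X[train_idx], y[train_idx])        # train on k-1 folds
    accuracies.append(accuracy_score(y[test_idx],               # test on the remaining fold
                                     model.predict(X[test_idx])))

print("10-fold cross-validation accuracy:", np.mean(accuracies))
```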
3.4. Bootstrap Method
Bootstrapping is a probabilistic statistical method, often used in situations where it is difficult to compute standard errors by parametric methods. The bootstrap method generates bootstrap samples with replacement from the original dataset [6]. Since sampling is done with replacement, each instance has an equal chance of being selected more than once. The overall error of the predicted model is given by averaging all bootstrap estimates. The most commonly used bootstrap approach is the 0.632 bootstrap, where 0.632 is the expected fraction of the original instances that appear in the training set (63.2%), while the remaining 36.8% appear as testing instances. Symbolically, the 0.632 bootstrap is defined as

$$Acc(T) = \frac{1}{B}\sum_{i=1}^{B}\left(0.632 \cdot Acc(B_i) + 0.368 \cdot Acc(T_i)\right),$$

where $Acc(B_i)$ is the accuracy of the model built with the $i$-th bootstrap training dataset and $Acc(T_i)$ is its accuracy on the original dataset [1]. The bootstrap method proves best for small datasets but shows high bias together with high variability.
Figure 2: Bootstrap Strategy
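A sketch of the 0.632 bootstrap estimate defined above, assuming numpy and scikit-learn; the number of replicates B, the classifier and the dataset are illustrative, and the first term is evaluated on the out-of-bag instances, which is the usual reading of $Acc(B_i)$.

```python
# Minimal sketch of the 0.632 bootstrap accuracy estimate, assuming scikit-learn.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n, B = len(y), 50  # B bootstrap replicates

estimates = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)              # bootstrap sample: n draws with replacement
    oob = np.setdiff1d(np.arange(n), idx)         # out-of-bag instances (~36.8% of the data)
    model = DecisionTreeClassifier().fit(X[idx], y[idx])
    acc_b = accuracy_score(y[oob], model.predict(X[oob]))  # accuracy of the bootstrap-trained model
    acc_t = accuracy_score(y, model.predict(X))            # accuracy on the original dataset
    estimates.append(0.632 * acc_b + 0.368 * acc_t)

print("0.632 bootstrap accuracy: %.3f" % np.mean(estimates))
```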
4. EXPERIMENTAL DATASETS
We used six benchmark real-world datasets from the UCI repository. The chosen datasets come from multiple fields and comprise balanced, imbalanced and multiclass datasets, in order to check the efficacy of the data splitting algorithms across different data domains. A detailed description of the benchmark datasets and their characteristics is given in Table 1.
Table 1: Experimental Benchmark Datasets
Data sets       No. of instances   Balanced/Imbalanced   Dimensions   Classes   Area
Abalone         4177               Imbalanced            08           03        Life
Breast Tissue   106                Imbalanced            09           06        Life
Wine            6463               Imbalanced            13           02        Social
Iris            150                Balanced              04           03        Plant
Car             38                 Imbalanced            07           05        Social
Diabetes        768                Imbalanced            08           02        Life
5. EVALUATION MEASURES FOR DATA SPLITTING ALGORITHMS
The evaluation measures used to assess the data splitting algorithms reflect the competency of the splitting techniques in selecting instances to train the model. The efficacy of a data splitting algorithm is measured as the difference between the error rate of instance classification against the target class on the original dataset and on the test dataset. Moreover, the performance of the splitting algorithms is also measured in terms of user-proposed versus automatic selection of instances by the data splitting algorithms, taking the learning time of the models into account.
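The efficacy measure described above, the difference between a classifier's error rate on the original dataset and on the test split produced by a splitting algorithm, can be sketched as follows; this is a minimal illustration assuming scikit-learn, with the holdout splitter and k-NN classifier standing in for the splitters and classifiers compared in the paper.

```python
# Sketch of the evaluation measure: difference in error rate between the original
# dataset and the test split produced by a data splitting algorithm (scikit-learn assumed).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def error_rate(model, X_eval, y_eval):
    return np.mean(model.predict(X_eval) != y_eval)

# Holdout split used as one example of a data splitting algorithm.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = KNeighborsClassifier().fit(X_tr, y_tr)

diff = error_rate(model, X, y) - error_rate(model, X_te, y_te)
print("difference in error rate (original vs. test split): %.3f" % diff)
```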
6. COMPARISON OF RESULTS
A boxplot is used to present the results of the standard data splitting methods for the fifteen supervised classification algorithms on each dataset separately. The datasets generated by the data splitting algorithms were used to train the supervised classification models on the training set, and the performance of the supervised classifiers was obtained on the unseen (testing) dataset. Each sub-figure of the boxplot corresponds to the performance of the supervised classifiers for one data splitting algorithm on the benchmark datasets.

The first dataset is the Abalone multiclass imbalanced dataset, containing 4177 instances in three classes, which was split into training and testing datasets using the four standard data splitting algorithms. The fifteen supervised learning classifiers were trained on the training dataset and the results were obtained on the unseen (testing) dataset. Significant performance with small variance was observed for the cross-validation algorithm, while the holdout, leave-one-out and bootstrap methods show high variance for all supervised learning classifiers.

The Wine dataset, the largest dataset used for evaluation, is a multiclass imbalanced dataset containing 6463 instances in three classes and was split into training and testing datasets using the four standard data splitting methods. The fifteen supervised learning classifiers were trained on the training dataset and the results obtained on the unseen (testing) dataset. On this dataset the holdout method gives good performance with smaller variance than the other three methods, i.e. bootstrap, cross-validation and leave-one-out. The performance prediction of the leave-one-out method shows the largest variance but is consistently optimistic.

The Iris dataset is a balanced three-class benchmark dataset from the UCI repository with 150 instances. The leave-one-out method performs significantly worst, followed by the cross-validation method, with high variance when the supervised classifiers are evaluated on unseen data. The holdout method performs well with small variance compared with the other three data splitting algorithms. The performance of the bootstrap method is also acceptable but has larger variance than the holdout method.

The Diabetes dataset is an imbalanced two-class dataset with 768 instances. As in the comparison on the other imbalanced datasets, the bootstrap method performs better when the supervised classifiers are evaluated on unseen data. In the Diabetes comparison, the boxplot shows an improvement for the holdout, cross-validation and leave-one-out algorithms, all with small variance.

The Breast Tissue and Car datasets are relatively small multiclass imbalanced datasets, with six and five classes and 106 and 38 instances respectively. The purpose of including these benchmark datasets is to assess the performance of the data splitting algorithms on small datasets; we obtained consistent results with small variance and the same pattern of error rates for all data splitting algorithms, except that the holdout method had large variance on the Car dataset.

No algorithm outperforms the others on all benchmark datasets: if one data splitting algorithm attains a better result with one supervised algorithm, it may in some cases give a poor result with other supervised algorithms. On multiclass and small datasets, the cross-validation, bootstrap and leave-one-out algorithms show good results, while the performance of the holdout algorithm is pessimistically biased because the other methods use the entire dataset for learning. On the balanced multiclass dataset, bootstrap gives a good result, while cross-validation and leave-one-out give optimistic results. On the very large binary-class dataset, approximately all data splitting algorithms perform well except the leave-one-out method. An obvious and noteworthy difference is observed between the performance of the supervised learning algorithms depending on whether the instances are proposed by the user or selected by the data splitting algorithms. The type of instances used to build the model affects the performance results of the classifiers significantly.
[Figure 3 consists of boxplots with one panel per dataset (Abalone, Wine, Iris, Breast Tissue, Diabetes, Car); the x-axis lists the splitting methods (HM, BM, CV, LOM) and the y-axis shows the difference in error rate.]
Figure 3: Difference in the Error rate of supervised Algorithms in dependence of User
Proposed and Data Splitting Algorithms
7. WEIGHTED MEAN RANK RISK ADJUSTED MODEL (WMRRAM)
However, the overall results show that the learning classifier performance on the six datasets in dependence of the data splitting algorithms is comparable, while noteworthy variation exists in the ranks of the classifiers. To overcome the variation in the rank data and arrive at a consolidated result, the WMRRAM is used. The Weighted Mean Rank Risk Adjusted Model involves first ranking the classifiers in each column of a two-way table and then computing the overall mean and standard deviation of the weighted rank data. The first step is to form a meta table by ranking the supervised algorithms in dependence of the data splitting algorithms, giving the lowest error rate a rank of 1, the next lowest error rate a rank of 2, and so on. Thus in each row of the meta table we have a set of values from 1 to 4, since there are four data splitting algorithms. The second step is stacking. Stacked generalization, known in the literature as stacking, is a scheme for combining the output of multiple classifiers in such a way that the output is compared with an independent set of instances and the true class [7]; in our case the combination is over the data splitting algorithms. As stacking builds on the concept of meta-learning [7], first $N$ supervised classifiers $S_i$, $i = 1, 2, \ldots, N$, are learned from the data splitting algorithms for each of the datasets $D_i$, $i = 1, 2, \ldots, N$. The output of the supervised classifiers $S_i$ on the evaluation datasets is subsequently ranked by the performance of the standard data splitting algorithms. The best-performing algorithm is assigned rank 1, rank 2 goes to the runner-up, and so on; average ranks are assigned where multiple data splitting algorithms have the same performance. Let $w(i)$ denote the weight assigned iteratively to the $i$-th data splitting algorithm, where $0 \le w(i) \le 1$; these weights are used to form new instances $I_j$, $j = 1, 2, \ldots, K$, of a new dataset $Z$, which then serves as a meta-level evaluation dataset. Each instance of the $Z$ dataset is of the form $S_i(I_j)$. Finally, we derived a global weighted mean rank risk adjusted model from the $Z$ meta-dataset. The advantage of stacking is that it consolidates the ranks; however, a weighted mean rank alone does not take into account the variability in the ranks, so the learning algorithm with the best mean rank may be one that receives quite a few poor ranks. For the consensus ranking of the supervised learning algorithms in dependence of the data splitting algorithms we use the $Z$ meta-dataset. Risk is a widely studied topic, particularly from the decision-making point of view, and has been discussed along many dimensions [8]. Decision makers can assign arbitrary numbers as weights. The calculations performed were based on the weights of each characteristic, and since the weighted mean rank does not account for the variability in the ranks, the supervised learning algorithm with the best mean rank in dependence of the data splitting algorithms may be one that gets quite a few poor ranks from some other data splitting algorithm. In order to reach a consensus result we used the WMRRAM approach, in which risk is taken as the variability and uncertainty in the ranking of the different learning algorithms, and the statistical properties of the rank data are used to reveal which supervised learning algorithm is ranked highest, which is ranked second, and so on, in dependence of the data splitting algorithms. The overall mean rank $\mu_{Z_i}$ is obtained using a formula inspired by Friedman's M statistic [9] and the standard deviation $\sigma_{Z_i}$ is calculated using the formula:
$$\mu_{Z_i} = \frac{1}{6}\sum_{j=1}^{6} Z_{ij} \qquad (1)$$

$$\sigma_{Z_i} = \sqrt{\frac{\sum_{j=1}^{6}\left(Z_{ij} - \mu_{Z_i}\right)^2}{6-1}} \qquad (2)$$
where $j$ denotes the datasets included in the study for evaluating the performance of the supervised classifiers in dependence of the data splitting algorithms, $j = 1, 2, \ldots, 6$, and $Z_{ij}$ is the rank of classifier $i$ on dataset $j$. The WMRRAM for the consensus ranking of the supervised classifiers is

$$WMRRAM_i = \mu_{Z_i} + \sigma_{Z_i}, \qquad (3)$$

i.e. the score increases or decreases in proportion to the variation in the ranks obtained by the different classifiers.
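A minimal sketch of the consensus score in equations (1)-(3): for each classifier, the mean and sample standard deviation of its ranks across the six datasets are combined into $WMRRAM_i = \mu_{Z_i} + \sigma_{Z_i}$, and the classifiers are ordered by that score (smaller is better). The rank matrix below is purely illustrative and is not the rank data of this study.

```python
# Sketch of the WMRRAM consensus ranking of equations (1)-(3); the rank values are
# illustrative, not the ranks reported in the paper.
import numpy as np

classifiers = ["LDA", "K-NN", "ID3", "MLP"]
# ranks[i, j]: (weighted) rank of classifier i on dataset j, j = 1..6
ranks = np.array([[1, 2, 1, 3, 2, 1],
                  [2, 1, 3, 2, 3, 2],
                  [4, 4, 4, 4, 4, 4],
                  [3, 3, 2, 1, 1, 3]], dtype=float)

mu = ranks.mean(axis=1)              # equation (1): mean rank over the 6 datasets
sigma = ranks.std(axis=1, ddof=1)    # equation (2): sample standard deviation (n - 1)
wmrram = mu + sigma                  # equation (3): risk-adjusted score

for name, score in sorted(zip(classifiers, wmrram), key=lambda t: t[1]):
    print("%-5s  WMRRAM = %.3f" % (name, score))
```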
The following table shows the ranking behavior of the supervised algorithms in dependence of the data splitting algorithms.
Table 2: Meta Table of Ranking of Supervised Classifiers dependence of
Data Splitting Algorithms
Classifiers WMRRAM Rank
LDA 5.80676102 1
K-NN 6.91198256 5
ID3 12.0018463 15
MLP 6.63202482 4
NBC 9.44948222 11
BVM 6.02228655 3
CVM 6.00478598 2
C4.5 8.43272106 9
C-RT 9.89072464 12
CS-CRT 9.95330797 13
CS-MC4 7.94174278 8
C-SVC 7.19845443 6
PLS-DA 11.0797238 14
PLS-LDA 9.2770369 10
RFT 7.55490695 7
Figure 4: Graphical representation of the ranking of supervised classifiers in dependence of data splitting algorithms
8. CONCLUSION
Evaluation and comparison of learning classifier performance is a popular topic nowadays, and after studying the literature we conclude that most articles focus on the performance of a few known learning algorithms on only one or two datasets, without considering the quality and ratio of the instances used to train or test the model. All learning algorithms have pros and cons, but beyond measuring the performance of a specific algorithm, this work shows the impact of data splitting algorithms on the ranking of learning algorithms using the proposed WMRRAM. The results show that the performance of the learning classifiers varies with the data domain, where the domains are characterized by the number of instances and attributes used in the comparison of the learning classifiers. Under the WMRRAM, the classifier LDA achieved the highest ranking score with a rank of 1, followed by CVM, BVM and MLP with ranks 2, 3 and 4, and ID3 with a rank of 15, in dependence of the data splitting algorithms. In short, the classifier ranking is strongly robust to sample complexity. Because of the methodology used, all the learning classifiers obtained acceptable performance rates and an adequate ranking on all related characteristics. However, when analyzing the results obtained from the software, it was quite problematic to select a single learning algorithm with the best performance. With reference to the above, the WMRRAM approach provides the best possible way of ranking the learning classifiers.
REFERENCES
[1] K. K. Dobbin and R. M. Simon, "Optimally splitting cases for training and testing high dimensional classifiers," BMC Medical Genomics, vol. 4, no. 31, pp. 1-8, 2011.
[2] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," presented at the International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[3] J. Awwalu and O. F. Nonyelum, "On Holdout and Cross Validation: A Comparison between Neural Network and Support Vector Machine," International Journal of Trend in Research and Development, vol. 6, no. 2, pp. 235-239, 2019.
[4] Z. Reitermanová, "Data Splitting," 2010.
[5] Y. Xu and R. Goodacre, "On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning," Journal of Analysis and Testing, vol. 2, pp. 249-262, 2018.
[6] C. Champagne, H. McNairn, B. Daneshfar, and J. Shang, "A bootstrap method for assessing classification accuracy and confidence for agricultural land use mapping in Canada," International Journal of Applied Earth Observation and Geoinformation, vol. 29, pp. 44-52, 2014.
[7] A. Ledezma, R. Aler, A. Sanchis, and D. Borrajo, "GA-stacking: Evolutionary stacked generalization," Intelligent Data Analysis, pp. 1-31, 2010, doi: 10.3233/IDA-2010-0410.
[8] A. Gosavi, "Analyzing Responses from Likert Surveys and Risk-adjusted Ranking: A Data Analytics Perspective," Procedia Computer Science, vol. 61, pp. 24-31, 2015, doi: 10.1016/j.procs.2015.09.139.
[9] S. M. Abdulrahman, P. Brazdil, J. N. van Rijn, and J. Vanschoren, "Speeding up algorithm selection using average ranking and active testing by introducing runtime," Machine Learning, vol. 107, pp. 79-108, 2017, doi: 10.1007/s10994-017-5687-8.
Presentation of the MIPLM subject matter expert Erdem Kaya
MIPLM
 
APGAR SCORE BY sweety Tamanna Mahapatra MSc Pediatric
APGAR SCORE  BY sweety Tamanna Mahapatra MSc PediatricAPGAR SCORE  BY sweety Tamanna Mahapatra MSc Pediatric
APGAR SCORE BY sweety Tamanna Mahapatra MSc Pediatric
SweetytamannaMohapat
 
Contact Lens:::: An Overview.pptx.: Optometry
Contact Lens:::: An Overview.pptx.: OptometryContact Lens:::: An Overview.pptx.: Optometry
Contact Lens:::: An Overview.pptx.: Optometry
MushahidRaza8
 
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
Drive Supporter Growth from Awareness to Advocacy with TechSoup Marketing Ser...
TechSoup
 
Introduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe EngineeringIntroduction to Vibe Coding and Vibe Engineering
Introduction to Vibe Coding and Vibe Engineering
Damian T. Gordon
 
Link your Lead Opportunities into Spreadsheet using odoo CRM
Link your Lead Opportunities into Spreadsheet using odoo CRMLink your Lead Opportunities into Spreadsheet using odoo CRM
Link your Lead Opportunities into Spreadsheet using odoo CRM
Celine George
 
Herbs Used in Cosmetic Formulations .pptx
Herbs Used in Cosmetic Formulations .pptxHerbs Used in Cosmetic Formulations .pptx
Herbs Used in Cosmetic Formulations .pptx
RAJU THENGE
 
apa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdfapa-style-referencing-visual-guide-2025.pdf
apa-style-referencing-visual-guide-2025.pdf
Ishika Ghosh
 
K12 Tableau Tuesday - Algebra Equity and Access in Atlanta Public Schools
K12 Tableau Tuesday  - Algebra Equity and Access in Atlanta Public SchoolsK12 Tableau Tuesday  - Algebra Equity and Access in Atlanta Public Schools
K12 Tableau Tuesday - Algebra Equity and Access in Atlanta Public Schools
dogden2
 
THE STG QUIZ GROUP D.pptx quiz by Ridip Hazarika
THE STG QUIZ GROUP D.pptx   quiz by Ridip HazarikaTHE STG QUIZ GROUP D.pptx   quiz by Ridip Hazarika
THE STG QUIZ GROUP D.pptx quiz by Ridip Hazarika
Ridip Hazarika
 
Odoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo SlidesOdoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo Slides
Celine George
 
How to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POSHow to Manage Opening & Closing Controls in Odoo 17 POS
How to Manage Opening & Closing Controls in Odoo 17 POS
Celine George
 
"Basics of Heterocyclic Compounds and Their Naming Rules"
"Basics of Heterocyclic Compounds and Their Naming Rules""Basics of Heterocyclic Compounds and Their Naming Rules"
"Basics of Heterocyclic Compounds and Their Naming Rules"
rupalinirmalbpharm
 
How to Set warnings for invoicing specific customers in odoo
How to Set warnings for invoicing specific customers in odooHow to Set warnings for invoicing specific customers in odoo
How to Set warnings for invoicing specific customers in odoo
Celine George
 
How to Create A Todo List In Todo of Odoo 18
How to Create A Todo List In Todo of Odoo 18How to Create A Todo List In Todo of Odoo 18
How to Create A Todo List In Todo of Odoo 18
Celine George
 
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...
larencebapu132
 
Stein, Hunt, Green letter to Congress April 2025
Stein, Hunt, Green letter to Congress April 2025Stein, Hunt, Green letter to Congress April 2025
Stein, Hunt, Green letter to Congress April 2025
Mebane Rash
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 

Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algorithms: A Case Study of Real Datasets

2.2. Testing Datasets

A subset of the original dataset used to estimate the performance of the trained model.

3. STANDARD DATA SPLITTING ALGORITHMS

Several data splitting algorithms have been proposed in the literature; their complexity and quality differ from one another, and the differences are statistically significant. The following commonly used data splitting algorithms are compared in this study.

3.1. Hold-Out Method

The hold-out method, also called test sample estimation [2], is the simplest of all data splitting algorithms: it randomly divides the original dataset into a training and a testing dataset. Commonly studied hold-out splits are 25:75, 30:70, 90:10, 66:34 and 50:50 [3]. The hold-out method can increase bias because the training and testing sets may have different distributions. Its main drawback is that it is inefficient when the dataset is not large; in a classification problem, for example, a subset may contain no instance of some class at all, which leads to inefficient estimation and evaluation of the model. To obtain better results and reduce bias, the method is applied iteratively and the resulting accuracies are averaged over all iterations; this procedure is called the repeated hold-out method. The hold-out method is a common way to avoid over-training [4].

3.2. Leave-One-Out Method

The leave-one-out method is the special case of k-fold cross validation where k = n, with n the size of the original dataset, so each test set contains only a single instance [5]. The method involves no subsampling and produces unbiased estimates with large variance. Its drawback is that it is computationally expensive and difficult to apply in many real situations.

3.3. Cross Validation Method

The cross validation method is the most popular resampling technique. It is known as k-fold cross validation, and sometimes as the rotation estimation method [2]; the parameter k determines the number of disjoint folds of equal size into which the original dataset is divided. In each turn one fold is used as the testing dataset and the remaining k-1 folds are used for training; the average of all fold accuracies is the resulting output of the model. The main drawback of this method is a pessimistic bias, and increasing the number of folds reduces the bias at the cost of increased variance. Although k is not fixed in general, it is commonly set to ten [3], which gives good results across different data domains. The method is similar to the repeated hold-out method in that all instances are used iteratively to learn and evaluate the model.

Figure 1: Strategy of Cross-validation
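The three schemes above map directly onto standard library routines. The following is a minimal Python sketch, assuming scikit-learn, the Iris data as a stand-in dataset, and a k-nearest-neighbour classifier chosen purely for illustration (not the paper's exact configuration), of the repeated hold-out, 10-fold cross-validation and leave-one-out estimates:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import (train_test_split, cross_val_score,
                                     KFold, LeaveOneOut)
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

# Repeated hold-out: average accuracy over several random 70:30 splits.
holdout = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    holdout.append(clf.fit(X_tr, y_tr).score(X_te, y_te))
print("repeated hold-out accuracy:", np.mean(holdout))

# k-fold cross-validation with the commonly used k = 10.
cv = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("10-fold CV accuracy:", cv.mean())

# Leave-one-out: the special case k = n.
loo = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", loo.mean())
```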
3.4. Bootstrap Method

Bootstrapping is a probabilistic statistical method, often used when it is difficult to compute standard errors by parametric methods. The bootstrap method generates bootstrap samples with replacement from the original dataset [6]; since sampling is with replacement, each instance has an equal chance of being selected more than once. The overall error of the predicted model is obtained by averaging all bootstrap estimates. The most commonly used bootstrap approach is the 0.632 bootstrap, where 0.632 is the expected fraction of instances of the original dataset that appear in the training set (63.2%), while the remaining 36.8% appear as testing instances. The 0.632 bootstrap estimate is defined as

$$Acc_T(B) = \frac{1}{B}\sum_{i=1}^{B}\left(0.632 \cdot Acc(B_i) + 0.368 \cdot Acc(T)\right)$$

where $Acc(B_i)$ is the accuracy of the model built with the i-th bootstrap training dataset and $Acc(T)$ is the accuracy on the original dataset [1]. The bootstrap method proves best for small datasets and shows high bias with high variability.

Figure 2: Bootstrap Strategy
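To make the estimator above concrete, here is a small hedged Python sketch, assuming scikit-learn, a decision tree as a placeholder learner and B = 50 resamples (all illustrative choices), that computes a 0.632 bootstrap accuracy by testing each resampled model on its out-of-bag instances:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
n, B = len(y), 50
rng = np.random.default_rng(0)
clf = DecisionTreeClassifier(random_state=0)

# Acc(T): accuracy of a model trained and evaluated on the full original dataset.
acc_T = clf.fit(X, y).score(X, y)

estimates = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)        # bootstrap sample drawn with replacement
    oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag (~36.8%) instances used for testing
    if oob.size == 0:                       # guard against the (rare) empty test set
        continue
    acc_Bi = clf.fit(X[idx], y[idx]).score(X[oob], y[oob])
    estimates.append(0.632 * acc_Bi + 0.368 * acc_T)

print("0.632 bootstrap accuracy:", np.mean(estimates))
```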
4. EXPERIMENTAL DATASETS

We used six benchmark real-world datasets from the UCI repository. The chosen datasets come from multiple fields and include balanced, imbalanced and multiclass data, in order to check the efficacy of the data splitting algorithms across different data domains. The benchmark datasets and their characteristics are detailed in Table 1.

Table 1: Experimental Benchmark Datasets

Data set        No. of instances   Balanced/Imbalanced   Dimensions   Classes   Area
Abalone         4177               Imbalanced            08           03        Life
Breast Tissue   106                Imbalanced            09           06        Life
Wine            6463               Imbalanced            13           02        Social
Iris            150                Balanced              04           03        Plant
Car             38                 Imbalanced            07           05        Social
Diabetes        768                Imbalanced            08           02        Life

5. EVALUATION MEASURES FOR DATA SPLITTING ALGORITHMS

The evaluation measure used to assess the data splitting algorithms is their competency in selecting instances to train the model. The efficacy of a data splitting algorithm is measured as the difference between the error rate of classifying instances of the original dataset into the target class and the error rate on the test dataset. Moreover, the performance of the splitting algorithms is also measured in terms of user-proposed versus automatic selection of instances by the data splitting algorithms, taking the learning time of the models into account.

6. COMPARISON OF RESULTS

Boxplots are used to present the results of the four standard data splitting methods for the fifteen supervised classification algorithms on each dataset separately. The datasets generated by the data splitting algorithms are used to train the supervised classification models, and the performance of the classifiers is obtained on the unseen (testing) dataset. Each sub-figure of the boxplot corresponds to the performance of the supervised classifiers under one data splitting algorithm on the benchmark datasets.

The first dataset is Abalone, a multiclass imbalanced dataset containing 4177 instances in three classes, split into training and testing datasets by the four standard data splitting algorithms. The fifteen supervised learning classifiers were trained on the training dataset and results were obtained on the unseen (testing) dataset. The cross-validation algorithm shows significant performance with small variance, while the hold-out, leave-one-out and bootstrap methods show high variance for all supervised learning classifiers. The Wine dataset, the largest dataset used in the evaluation, is a multiclass imbalanced dataset containing 6463 instances in three classes, again split by the four standard data splitting methods. On this dataset the hold-out method gives good performance with smaller variance than the other three methods (bootstrap, cross-validation and leave-one-out); the performance prediction of the leave-one-out method shows the largest variance but is stably optimistic. The Iris dataset is a balanced three-class benchmark dataset from the UCI repository with 150 instances. Here the leave-one-out method performs significantly worst, followed by the cross-validation method with high variance, when the supervised classifiers are evaluated on unseen data; the hold-out method performs well, with small variance compared to the other three data splitting algorithms, and the bootstrap method is also acceptable but with larger variance than hold-out. The Diabetes dataset is an imbalanced two-class dataset with 768 instances.
As with the other imbalanced datasets, the bootstrap method performs better on Diabetes when the supervised classifiers are evaluated on unseen data, and the boxplot shows an improvement for the hold-out, cross-validation and leave-one-out algorithms, all with small variance. Breast Tissue and Car are relatively small multiclass imbalanced datasets, with six and five classes and 106 and 38 instances respectively. The purpose of including these benchmark datasets is to assess the performance of the data splitting algorithms on small datasets; the results show remarkably small variance and the same pattern of error rates across all data splitting algorithms, except that the hold-out method has large variance on the Car dataset.

No algorithm outperforms the others on all benchmark datasets: if one data splitting algorithm attains a better result with one supervised algorithm, it may give a poor result with other supervised algorithms. On multiclass and small datasets the cross-validation, bootstrap and leave-one-out algorithms show good results, while the hold-out algorithm shows a pessimistic bias, since the former use the whole dataset for learning. On the balanced multiclass dataset the bootstrap method gives a good result, while cross-validation and leave-one-out show optimistic results. On the very large binary-class dataset, almost all data splitting algorithms perform well except the leave-one-out method. An obvious and noteworthy difference in the performance of the supervised learning algorithms is observed depending on whether the instances are user-proposed or selected by the data splitting algorithms: the type of instances used to build the model significantly affects the classifiers' performance results.

[Figure 3 consists of boxplots of the difference in error (y-axis) for the hold-out (HM), bootstrap (BM), cross-validation (CV) and leave-one-out (LOM) methods (x-axis), with one panel per dataset: Abalone, Wine, Iris, Breast Tissue, Diabetes and Car.]

Figure 3: Difference in the error rate of supervised algorithms in dependence of user-proposed and data splitting algorithms
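For one classifier, one dataset and one splitting method, the difference-in-error quantity summarised in Figure 3 can be computed along the following lines. This is a hedged sketch assuming scikit-learn, a naive Bayes classifier and a single 70:30 hold-out split as placeholder choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# One splitting method (here: a single hold-out split) produces the test instances.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

clf = GaussianNB().fit(X_tr, y_tr)
err_original = 1.0 - clf.score(X, y)      # error rate on the full original dataset
err_test = 1.0 - clf.score(X_te, y_te)    # error rate on the split-off test dataset

# The value contributed to the corresponding boxplot panel.
print("difference in error:", err_original - err_test)
```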
7. WEIGHTED MEAN RANK RISK ADJUSTED MODEL (WMRRAM)

The overall results show that the performance of the learning classifiers on the six datasets, in dependence of the data splitting algorithms, is comparable, yet noteworthy variation exists in the ranks of the classifiers. To overcome this variation in the rank data and reach a consolidated result, the Weighted Mean Rank Risk Adjusted Model (WMRRAM) is used. The method involves first ranking the data in each column of a two-way table and then computing the overall mean and standard deviation of the weighted rank data.

The first step is to form a meta table by ranking the supervised algorithms in dependence of the data splitting algorithms: the lowest error rate receives rank 1, the next lowest rank 2, and so on. Thus each row of the meta table contains a set of values from 1 to 4, since there are 4 data splitting algorithms.

The second step is stacking. Stacked generalization, known in the literature as stacking, is a scheme for combining the output of multiple classifiers in such a way that the output is compared with an independent set of instances and the true class [7]; in our case the comparison is by data splitting algorithms. Since stacking builds on the concept of meta learning [7], first the N supervised classifiers S_i, i = 1, 2, ..., N, are learned from the data splitting algorithms for each of the datasets D_i, i = 1, 2, ..., N. The outputs of the classifiers S_i on the evaluation datasets are then ranked by the performance of the standard data splitting algorithms: the best-performing algorithm is assigned rank 1, the runner-up rank 2, and so on. Average ranks are assigned when multiple data splitting algorithms perform equally. Let w(i), with 0 ≤ w(i) ≤ 1, denote the weight assigned iteratively to the i-th data splitting algorithm; these weights are used to form the new instances I_j, j = 1, 2, ..., K, of a new dataset Z, which then serves as the meta-level evaluation dataset. Each instance of the dataset Z is of the form S_i(I_j). Finally, a global weighted mean rank risk adjusted model is induced from the meta-dataset Z.

A limitation of plain mean ranks is that the learning algorithm with the best mean rank may be one that receives quite a few poor ranks, because the mean does not take the variability of the ranks into account. For the consensus ranking of the supervised learning algorithms in dependence of the data splitting algorithms we therefore use the meta-dataset Z. Risk is a widely studied topic, particularly from the decision-making point of view, and has been discussed in many dimensions [8]; decision makers can assign arbitrary numbers as weights. Because calculations based only on the weighted mean rank of each characteristic ignore the variability of the ranks, a supervised learning algorithm with the best mean rank might still obtain several poor ranks under some data splitting algorithm. To reach a consensus result we use the WMRRAM approach, in which risk is taken as the variability and uncertainty in the ranking of the different learning algorithms, and statistical properties of the rank data reveal which supervised learning algorithm is ranked highest, which second, and so on, in dependence of the data splitting algorithms.
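The meta-table step described above can be sketched as follows. The snippet is a hypothetical illustration: the error-rate values are invented for demonstration, and pandas with average ranks for ties is an assumed implementation choice, not the paper's actual tooling.

```python
import pandas as pd

# Hypothetical error rates of one classifier on the six datasets under the
# four splitting algorithms (rows: datasets, columns: splitting algorithms).
errors = pd.DataFrame(
    {"Holdout":   [0.21, 0.18, 0.05, 0.30, 0.25, 0.40],
     "Bootstrap": [0.19, 0.18, 0.06, 0.28, 0.24, 0.35],
     "CV":        [0.17, 0.16, 0.04, 0.27, 0.22, 0.33],
     "LOO":       [0.20, 0.17, 0.07, 0.29, 0.26, 0.38]},
    index=["Abalone", "Breast Tissue", "Wine", "Iris", "Car", "Diabetes"])

# Rank the splitting algorithms within each row: lowest error -> rank 1,
# with average ranks assigned to ties (as described above).
meta_table = errors.rank(axis=1, method="average")
print(meta_table)
```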
The overall mean rank is obtained using a formula inspired by Friedman's M statistic [9], and the standard deviation $\sigma_Z$ is calculated as follows:

$$\mu_Z = \frac{1}{6}\sum_{j=1}^{6} I_j \qquad (1)$$

$$\sigma_Z = \sqrt{\frac{\sum_{j=1}^{6}\left(I_j - \mu_Z\right)^2}{J-1}} \qquad (2)$$
Here j indexes the datasets included in the study for the evaluation of the performance of the supervised classifiers in dependence of the data splitting algorithms, j = 1, 2, ..., 6 (so J = 6). The WMRRAM score for the consensus ranking of the supervised classifiers is

$$\mathrm{WMRRAM}_i = \mu_{Z_i} + \sigma_{Z_i} \qquad (3)$$

i.e. the score increases or decreases in proportion to the variation in the ranks obtained by the different classifiers. The following table shows the ranking behaviour of the supervised algorithms in dependence of the data splitting algorithms.

Table 2: Meta Table of Ranking of Supervised Classifiers dependence of Data Splitting Algorithms

Classifier   WMRRAM        Rank
LDA          5.80676102    1
K-NN         6.91198256    5
ID3          12.0018463    15
MLP          6.63202482    4
NBC          9.44948222    11
BVM          6.02228655    3
CVM          6.00478598    2
C4.5         8.43272106    9
C-RT         9.89072464    12
CS-CRT       9.95330797    13
CS-MC4       7.94174278    8
C-SVC        7.19845443    6
PLS-DA       11.0797238    14
PLS-LDA      9.2770369     10
RFT          7.55490695    7

Figure 4: Graphical representation of the ranking of supervised classifiers in dependence of data splitting algorithms
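Given a meta table of ranks, equations (1)-(3) reduce to a mean and a standard deviation per classifier. The sketch below is a hypothetical illustration: the rank values are invented, and the reading of equation (3) as mean rank plus standard deviation follows the reconstruction above rather than a verified formula.

```python
import pandas as pd

# Hypothetical weighted ranks I_j of three classifiers over the six datasets
# (rows: classifiers, columns: datasets j = 1..6); illustrative values only.
ranks = pd.DataFrame(
    [[2.0, 1.0, 3.0, 2.0, 1.0, 2.5],
     [1.0, 3.0, 1.0, 1.0, 2.0, 1.0],
     [3.0, 2.0, 2.0, 3.0, 3.0, 2.5]],
    index=["LDA", "CVM", "ID3"],
    columns=[f"dataset_{j}" for j in range(1, 7)])

mu = ranks.mean(axis=1)              # equation (1): mean rank over the 6 datasets
sigma = ranks.std(axis=1, ddof=1)    # equation (2): standard deviation with J - 1
wmrram = mu + sigma                  # equation (3) as reconstructed above

# Consensus ranking: lower WMRRAM score -> better (rank 1).
result = pd.DataFrame({"WMRRAM": wmrram,
                       "Rank": wmrram.rank(method="min").astype(int)})
print(result.sort_values("Rank"))
```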
8. CONCLUSION

Evaluation of learning-classifier performance and comparisons between classifiers are popular topics nowadays, and a conclusion drawn from the literature is that most articles focus on the performance of a few well-known learning algorithms on only one or two datasets, without considering the quality and ratio of the instances used to train or test the model. All learning algorithms have pros and cons, but in addition to measuring the performance of specific algorithms, this work shows the impact of data splitting algorithms on the ranking of learning algorithms by means of the proposed WMRRAM. The results show that the performance of the learning classifiers varies with the data domain, where the domains are fixed in terms of the number of instances and attributes used in the comparison of the learning classifiers. Under the WMRRAM, the classifier LDA attains the highest ranking with a rank of 1, CVM, BVM and MLP follow with ranks 2, 3 and 4, and ID3 ranks 15, in dependence of the data splitting algorithms. In short, the ranking of the classifiers is strongly robust to sample complexity. With the methodology used, all the learning classifiers obtained acceptable performance rates and an adequate ranking in all related characteristics; nevertheless, when analysing the results mined from the software, it was quite problematic to select a single learning algorithm with the best performance. In this respect, the WMRRAM approach provides the best possible way of ranking the learning classifiers.

REFERENCES

[1] K. K. Dobbin and R. M. Simon, "Optimally splitting cases for training and testing high dimensional classifiers," BMC Medical Genomics, vol. 4, no. 31, pp. 1-8, 2011.
[2] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," presented at the International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[3] J. Awwalu and O. F. Nonyelum, "On holdout and cross validation: A comparison between neural network and support vector machine," International Journal of Trend in Research and Development, vol. 6, no. 2, pp. 235-239, 2019.
[4] Z. Reitermanová, "Data splitting," 2010.
[5] Y. Xu and R. Goodacre, "On splitting training and validation set: A comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning," Journal of Analysis and Testing, vol. 2, pp. 249-262, 2018.
[6] C. Champagne, H. McNairn, B. Daneshfar, and J. Shang, "A bootstrap method for assessing classification accuracy and confidence for agricultural land use mapping in Canada," International Journal of Applied Earth Observation and Geoinformation, vol. 29, pp. 44-52, 2014.
[7] A. Ledezma, R. Aler, A. Sanchis, and D. Borrajo, "GA-stacking: Evolutionary stacked generalization," Intelligent Data Analysis, pp. 1-31, 2010, doi: 10.3233/IDA-2010-0410.
[8] A. Gosavi, "Analyzing responses from Likert surveys and risk-adjusted ranking: A data analytics perspective," Procedia Computer Science, vol. 61, pp. 24-31, 2015, doi: 10.1016/j.procs.2015.09.139.
[9] S. M. Abdulrahman, P. Brazdil, J. N. van Rijn, and J. Vanschoren, "Speeding up algorithm selection using average ranking and active testing by introducing runtime," Machine Learning, vol. 107, pp. 79-108, 2017, doi: 10.1007/s10994-017-5687-8.