
Informatics in Medicine Unlocked 15 (2019) 100180


A Random Forest based predictor for medical data classification using feature ranking

Md. Zahangir Alam1, M. Saifur Rahman, M. Sohel Rahman∗
Department of CSE, BUET, ECE Building, West Palasi, Dhaka, 1205, Bangladesh

ARTICLE INFO

Keywords: Medical data classification; Feature ranking; Random forest; Disease predictors

ABSTRACT

Medical data classification is considered to be a challenging task in the field of medical informatics. Although many works have been reported in the literature, there is still scope for improvement. In this paper, a feature ranking based approach is developed and implemented for medical data classification. The features of a dataset are ranked using some suitable ranker algorithms, and subsequently the Random Forest classifier is applied only on highly ranked features to construct the predictor. We have conducted extensive experiments on 10 benchmark datasets and the results are promising. We present highly accurate predictors for 10 different diseases, as well as suggest a methodology that is sufficiently general and is expected to perform well for other diseases with similar datasets.

∗ Corresponding author. E-mail addresses: [email protected] (Md. Z. Alam), [email protected] (M.S. Rahman), [email protected] (M.S. Rahman).
1 Supported by an ICT Ph.D. Fellowship.

https://doi.org/10.1016/j.imu.2019.100180
Received 31 January 2019; Received in revised form 3 April 2019; Accepted 5 April 2019; Available online 13 April 2019.
2352-9148/© 2019 Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).

1. Introduction

In recent times, the application of computational or machine intelligence in medical diagnostics has become quite common. Machine intelligence aided decision systems are often being adopted to assist (but not to replace) a physician in diagnosing the disease of a patient. A physician typically accumulates her knowledge based on patient symptoms and the confirmed diagnoses; thus, diagnostic accuracy is highly dependent on a physician's experience. Since it is now relatively easy to acquire and store a large amount of information digitally, the deployment of computerized medical decision support systems has become a viable approach to assisting physicians to swiftly and accurately diagnose patients [1]. Such a system can be seen as performing a classification task, as the goal is to make a prediction (i.e., a diagnosis) on a new case based on the available records and features (of previously known cases). Such classification tasks are considered to be among the most challenging tasks in medical informatics [2].

While various statistical techniques may be applied to medical data classification, the major drawback of these approaches is that they depend on some assumptions (e.g., related to the properties of the relevant data) for their successful application [3,4]. Knowing the properties of a dataset is a difficult task and is sometimes not feasible. On the other hand, soft computing based approaches are less dependent on such knowledge.

A number of soft computing based classifiers have been proposed and analyzed in the literature to classify medical data accurately. Abbass et al. proposed a system with the Pareto-differential evolution algorithm with a local search scheme, termed the Memetic Pareto Artificial Neural Network (MPANN), to diagnose breast cancer [5]. Subsequently, Kiyan et al. [6] presented a statistical neural network-based approach to diagnose breast cancer. In Ref. [7], Karabatak et al. developed an expert system for detecting breast cancer, where, to reduce the dimensions of the dataset, Association Rules (AR) were used. Peng et al. proposed a hybrid feature selection approach to address the issues of high dimensionality of biomedical data, and experimented on the breast cancer dataset [8]. Fan et al. combined case-based data clustering and a fuzzy decision tree to design a hybrid model for medical data classification [9]. The model was executed on two datasets, WBC and liver disorders. Azar et al. proposed three classification methods, namely, radial basis function (RBF), multilayer perceptron (MLP), and probabilistic neural network (PNN), and experimented on a breast cancer dataset [10]. In their experiments, PNN showed better performance than MLP.

During the last three years, several works on medical data classification have been reported in the literature, albeit only on a breast cancer dataset. Examples include, but may not be limited to, the back propagation (BP-NN) approach [11], the fuzzy-rough nearest neighbor method [12], PCA followed by Support Vector Machine (SVM) with Recursive Feature Elimination (SVM-RFE) [13], PCA in combination with a feed-forward neural network [14], ANN with MLP and also BP-NN [15], deep belief network (DBN) [16], SVM ensembles with bagging and boosting [17], and a knowledge-based system using Expectation Maximization (EM) clustering, noise removal, and Regression Trees (CART) [18]. Motivated by the promising results of [16], very recently, Karthik et al. [19] have worked on breast cancer classification using Deep Neural Networks (DNN) and achieved better results than others. On the other hand, Khan et al. have presented a model for breast cancer and Parkinson's disease prediction, in which ensembles of Evolutionary Wavelet Neural Networks have been used [20].

On the other hand, Anooj et al. employed weighted fuzzy rules to develop a clinical decision support system (CDSS) for heart disease prediction [21]. They first generated fuzzy rules based on historical data for better learning, and subsequently developed the CDSS based on those. Also, the fuzzy rules were weighted based on the importance of attributes. Samb et al. [22] proposed a modified SVM-RFE and conducted experiments on multiple medical datasets (e.g., SPECT Heart Data). They have also incorporated local search operators into their algorithm. Jaganathan et al. [23] employed feature selection using the concept of fuzzy entropy. An earlier work by Polat and Gunes [24] had also proposed a feature selection approach based on Kernel F-Score. Jabbar et al. [25] developed a hybrid approach using K-Nearest Neighbor (KNN) and Genetic Algorithm (GA). An intelligent medical decision system based on an evolutionary strategy was developed in Ref. [26] using Neural Network (NN), GA, SVM, KNN, MLP, RBF, PNN, Self-Organizing Map (SOM), and Naive Bayes (NB) as classifiers.

Khanmohammadi et al. developed a CDSS [27] and experimented on ten medical datasets, and based on their experiments reported SVM to be the most desirable classification algorithm for developing a CDSS. Dennis et al. presented an efficient medical data classification system based on an Adaptive Genetic Fuzzy System (AGFS) [28]. In this methodology, rules are first generated from data, and then optimized rule selection is performed using GA. Seera et al. [29] proposed a hybrid intelligent system and conducted experiments on breast cancer, Diabetes, and Liver Disorders datasets. Alwidian et al. also have considered breast cancer prediction and have introduced WCBA, an efficient weighted classification based on association rules algorithm [30]. They have experimentally shown that WCBA, in most cases, outperforms the other Association Classification (AC) algorithms.

Most of the works discussed above have focused on a limited number of datasets (e.g., 1–3). On the other hand, in this paper, our focus has been on finding a general methodology for medical datasets. The quest for a general methodology, however, is not new, and we do find a few attempts towards that direction in the literature. For example, a number of evolutionary Extreme Learning Machine (ELM) models have been reported for medical data classification in recent years, albeit with a slightly different focus. Mohapatra et al. first discussed the idea of classifying binary medical datasets based on ELM [2] and, very recently, Eshtay et al. have proposed an ELM based Competitive Swarm Optimization (CSO) technique for this task [31]. Both of these works have focused on experimentation by varying the number of hidden neurons (of the ELM). More will be discussed on this issue in a later section (Section 3.13).

The contributions of this paper are as follows.

• We propose a general methodology for medical data classification that employs a feature ranking and selection strategy followed by an appropriate training of a suitable classifier algorithm. We use a number of feature ranking strategies and the Random Forest algorithm as the final classifier for our predictors. We have conducted a thorough evaluation of our approach on 10 benchmark datasets and presented insightful discussions on the results.
• We present highly accurate predictors for 10 different diseases, as well as suggest a generalized methodology that should perform well for other diseases with similar datasets. The proposed framework is also expected to be useful in any other domain that exhibits similar characteristics of features.
• We identify and report the most important features from the respective datasets based on how they contribute in the prediction tasks, and present an insight on the results from a medical point of view. We also conduct an ablation study to verify the importance of the features selected for a model, and confirm the positive contribution thereof on the classification task.

2. Materials and methods

2.1. Datasets

The datasets we have used were collected from the University of California at Irvine (UCI) Machine Learning Repository [32]. In particular, we use ten benchmark datasets, corresponding to ten diseases, as described in Table 1. More details of the datasets are provided in the supplementary material. Since we want to do an independent testing of our model, for each disease, training and testing samples are separated, applying a random split following the strategy of [2] (please see Table 1). The datasets organized in training and testing samples can be downloaded from the following link to reproduce our experiments: https://github.com/zahangirbd/medical_data_for_classification.

Table 1
Brief description of the datasets used in this research.

Dataset                          ID    No. of Features   Training Samples   Testing Samples
Wisconsin Breast Cancer          WBC   9                 499                200
Pima Indians Diabetes            PID   8                 576                192
Bupa                             Bp    6                 200                145
Hepatitis                        Hp    19                80                 75
Heart-Statlog                    HtS   13                180                90
SpectF                           SF    44                176                91
SaHeart                          SHt   9                 304                158
PlanningRelax                    PRx   12                120                62
Parkinsons                       PkS   22                130                65
Hepatocellular Carcinoma (HCC)   HCC   49                110                55

2.2. Model construction overview

A diagram of our model construction workflow is shown in Fig. 1. For each disease, the same workflow is followed to create an independent model for that disease. We first check whether all features are important for the classification task. This is done using several feature ranking algorithms. Then, based on the ranking, a subset of the top-ranked features is selected. Finally, the Random Forest algorithm is applied on the selected features to train and construct the final model.

Fig. 1. Model construction overview.
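For concreteness, the workflow of Fig. 1 can be sketched in a few lines of Python. The snippet below is only an illustrative approximation, not the paper's actual implementation: the paper uses Weka's rankers, whereas here scikit-learn's mutual_info_classif (an information-gain style scorer) stands in for a ranker, and the cut-off k is an assumed parameter.

# A minimal sketch of the Section 2.2 workflow, assuming scikit-learn as a
# stand-in for the Weka tools used in the paper. The ranker choice and the
# value of k are illustrative assumptions, not the paper's exact settings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

def build_predictor(X_train, y_train, k):
    """Rank features, keep the top-k, and fit a Random Forest on them."""
    # Score every feature against the class label (information-gain style).
    scores = mutual_info_classif(X_train, y_train, random_state=0)
    # Indices of the k highest-ranked features.
    top_k = np.argsort(scores)[::-1][:k]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train[:, top_k], y_train)
    return model, top_k

Given a held-out test split, the returned column indices must be applied to the test matrix as well, e.g. model.predict(X_test[:, top_k]).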
We apply 10-fold cross validation during the model training. Notably, cross validation is a method to evaluate a predictive model by partitioning the original sample into a training set to train the model, and a validation/test set to evaluate it. In 10-fold cross validation, the original samples are randomly partitioned into 10 equal sized subsamples; among these subsamples, a single subsample is retained as the validation data for testing the model, while the remaining 9 subsamples are used as training data.

2.3. Feature ranking and selection

Features are evaluated and ranked using some ranking algorithms. These algorithms evaluate/rank each of the features in the dataset in the context of the output variable (i.e., the Class). Many algorithms are available for feature ranking in the literature [33–35]. We use the following feature ranking options from the Waikato Environment for Knowledge Analysis (popularly known as Weka) tool [36]: InfoGainAttributeEval, GainRatioAttributeEval, CorrelationAttributeEval, OneRAttributeEval, ReliefFAttributeEval, RandomForest and SVM. More details of these algorithms are provided in the supplementary materials. Once the ranked features are identified, several combinations of features are selected from among the features based on their ranking scores.

2.4. Final model construction

For each disease, we construct multiple models. In particular, for each disease, we identify 3 separate subsets of highly ranked features, and thus in the sequel construct three separate models. We refer to these models using the following form: ModelI<datasetID>, ModelII<datasetID>, ModelIII<datasetID>. Additionally, we have constructed a baseline model using all the features, i.e., ignoring the feature ranking exercise; this model is referred to as ModelBase<datasetID>. For example, for Diabetes, we have four models, namely, ModelBasePID, ModelIPID, ModelIIPID and ModelIIIPID. We use the Random Forest algorithm as the classifier to train and construct the model, and apply 10-fold cross validation.

3. Results

All experiments have been conducted on a machine with an Intel(R) Core(TM) i5-7200U CPU (@ 2.50 GHz), 8 GB of RAM, and the Windows 10 operating system. For applying various classification algorithms, Weka 3.8 [36] has been used. We evaluate the performance of the models using different popular metrics from the literature, namely, accuracy, sensitivity (or recall), precision, F-score, Area under the Receiver Operating Characteristics Curve (AUROC), Area under the Precision-Recall Curve (AUPR), and Root Mean Squared Error (RMSE). Details of these metrics and relevant notions have been discussed in the supplementary material. All through the experiments, the 10-fold cross validation result is the average of the results.

For each disease, we proceed with the following experimental plan. Recall that, in addition to the base model (with all features), based on different ranking algorithms, we train three other models taking different subsets of the top-ranked features. We evaluate these four models and identify the one that performs the best. Then we take the base model and the best performing model and conduct independent testing on these two only. Finally, based on the independent testing results, we provide a comparison with the state-of-the-art, considering each disease separately.

The supplementary material additionally describes the mapping between feature names and their numbers, and reports the detailed ranking results for each dataset.

3.1. Results on breast cancer dataset

Based on the results of feature ranking, three subsets of features are finalized, and corresponding models are constructed as follows.

• ModelIWBC: Based on the results produced by the ranker GainRatioAttributeEval, Feature 1 is removed; the rest of the features are used to construct this model.
• ModelIIWBC: Considering the results of the rankers InfoGainAttributeEval and GainRatioAttributeEval, Features 1, 9 are excluded while constructing this model.
• ModelIIIWBC: Based on the outcome of the ranker InfoGainAttributeEval, Features 1, 4, 5, 8, 9 are excluded and the other four features are used to construct this model.

Fig. 2 (left panel) reports the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is evident that ModelIWBC and ModelIIWBC perform better than ModelIIIWBC and ModelBaseWBC. Although the performances of ModelIWBC and ModelIIWBC are similar, considering the RMSE value, ModelIIWBC seems to have a slight edge over ModelIWBC. As the best performer, we conduct independent testing for ModelIIWBC and compare the results with the baseline (ModelBaseWBC). Fig. 2 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 2. Results of the models on the Breast Cancer Dataset.
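The following sketch shows how the 10-fold protocol and the metrics above fit together. The paper computes these in Weka 3.8, so this scikit-learn version is only an assumed equivalent: labels are taken to be 0/1, and RMSE is computed on the predicted class probabilities, which is our reading of Weka's convention.

# A sketch of 10-fold cross validation with the metrics listed above,
# assuming binary 0/1 labels; an illustrative stand-in for the Weka runs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, n_splits=10):
    """Return per-metric means over a stratified 10-fold split."""
    names = ("accuracy", "recall", "precision", "f_score", "auroc", "aupr", "rmse")
    metrics = {m: [] for m in names}
    for tr, va in StratifiedKFold(n_splits, shuffle=True, random_state=0).split(X, y):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[tr], y[tr])
        pred = clf.predict(X[va])
        prob = clf.predict_proba(X[va])[:, 1]  # probability of the positive class
        metrics["accuracy"].append(accuracy_score(y[va], pred))
        metrics["recall"].append(recall_score(y[va], pred))
        metrics["precision"].append(precision_score(y[va], pred))
        metrics["f_score"].append(f1_score(y[va], pred))
        metrics["auroc"].append(roc_auc_score(y[va], prob))
        metrics["aupr"].append(average_precision_score(y[va], prob))
        metrics["rmse"].append(np.sqrt(np.mean((prob - y[va]) ** 2)))
    return {m: float(np.mean(v)) for m, v in metrics.items()}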
3.2. Results on diabetes dataset

Based on the results of feature ranking, three subsets of features are finalized, and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIPID: Based on the outcome of the rankers InfoGainAttributeEval and GainRatioAttributeEval, Feature 3 is excluded; the rest of the features are used to construct this model.
• ModelIIPID: Considering the ranked features produced by the rankers GainRatioAttributeEval and CorrelationAttributeEval, Features 3, 4 are excluded while constructing this model.
• ModelIIIPID: Features 3, 7 are excluded based on the outcome of the ranker InfoGainAttributeEval; thus the model is constructed using the rest of the features.

Fig. 3 (left panel) presents the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIIPID performs better than the other models. As the best performer, we conduct independent testing for ModelIIPID and compare the results with the baseline (ModelBasePID). Fig. 3 shows the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 3. Results of the models on the Diabetes Dataset.

3.3. Results on Bupa Dataset

Based on the results of feature ranking, three subsets of features are finalized and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIBp: Based on the outcome of the rankers InfoGainAttributeEval and GainRatioAttributeEval, Feature 1 is excluded; the rest of the features are used to construct this model.
• ModelIIBp: Considering the ranked features produced by the ranker CorrelationAttributeEval, Feature 3 is excluded while constructing this model.
• ModelIIIBp: Features 1, 3 are excluded based on the outcome of the ranker GainRatioAttributeEval; thus the model is constructed using the rest of the features.

Fig. 4 (left panel) displays the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIIBp performs better than the other models. As the best performer, we conduct independent testing for ModelIIBp and compare the results with the baseline (ModelBaseBp). Fig. 4 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 4. Results of the models on the Bupa Dataset.

3.4. Results on Hepatitis Dataset

Based on the results of feature ranking, three subsets of features are finalized and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIHp: Based on the outcome of the ranker InfoGainAttributeEval, Features 1, 3, 7–10, 13, 16 are excluded; the rest of the eleven features are used to construct this model.
• ModelIIHp: Considering the ranked features produced by the rankers InfoGainAttributeEval and GainRatioAttributeEval, Features 1, 8, 9, 16 are excluded while constructing this model.
• ModelIIIHp: Features 1, 7–10, 13, 16 are excluded based on the outcome of the rankers InfoGainAttributeEval and GainRatioAttributeEval; thus the model is constructed using the rest of the features.

Fig. 5 (left panel) shows the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIIIHp performs better than the other models. As the best performer, we conduct independent testing for ModelIIIHp and compare the results with the baseline (ModelBaseHp). Fig. 5 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 5. Results of the models on the Hepatitis Dataset.
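The per-disease model families of Sections 3.1–3.10 all follow the same recipe, which can be sketched as follows. The exclusion lists shown are the Diabetes (PID) ones quoted above, the 1-based feature indexing follows the text, and the scikit-learn rendering is an assumption rather than the authors' Weka setup.

# A sketch of assembling one per-disease model family: each model drops the
# features flagged by a ranker (1-based indices, as in the text) and is then
# compared against the all-features baseline on the held-out test split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

EXCLUSIONS = {"ModelBase": [], "ModelI": [3], "ModelII": [3, 4], "ModelIII": [3, 7]}

def test_accuracy(X_tr, y_tr, X_te, y_te, excluded):
    """Train on all columns except `excluded` and score on the test set."""
    keep = [i for i in range(X_tr.shape[1]) if i + 1 not in excluded]  # 1-based
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr[:, keep], y_tr)
    return clf.score(X_te[:, keep], y_te)

# accuracies = {name: test_accuracy(X_tr, y_tr, X_te, y_te, ex)
#               for name, ex in EXCLUSIONS.items()}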
3.5. Results on Statlog (Heart) dataset

Based on the results of feature ranking, three subsets of features are finalized and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIHtS: Based on the outcome of the ranker InfoGainAttributeEval, Features 1, 4 are excluded; the rest of the features are used to construct this model.
• ModelIIHtS: Considering the ranked features produced by the ranker CorrelationAttributeEval, Features 4, 6 are excluded while constructing this model.
• ModelIIIHtS: Feature 5 is excluded based on the outcome of the ranker ReliefFAttributeEval; thus the model is constructed using the rest of the features.

Fig. 6 (left panel) demonstrates the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIIIHtS performs better than the other models. As the best performer, we conduct independent testing for ModelIIIHtS and compare the results with the baseline (ModelBaseHtS). Fig. 6 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 6. Results of the models on the Statlog (Heart) Dataset.
3.6. Results on SPECTF dataset

Based on the results of feature ranking, three subsets of features are finalized and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelISF: Based on the results produced by the ranker InfoGainAttributeEval, Features 1, 20, 21, 23, 24, 27, 31, 33, 38 are removed; the rest of the features are used to construct this model.
• ModelIISF: Considering the results of the ranker RandomForest, Features 2, 4, 14, 20, 27, 28, 30, 36, 38, 40, 43 are excluded to construct this model.
• ModelIIISF: Based on the outcome of the ranker SVM, Features 1, 24–33 are excluded and the rest of the features are used to construct this model.

Fig. 7 (left panel) shows the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that one of the feature-selected models performs better than the other models. As the best performer, we conduct independent testing for it and compare the results with the baseline (ModelBaseSF). Fig. 7 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 7. Results of the models on the SPECTF Dataset.

3.7. Results on SaHeart dataset

Based on the results of feature ranking, three subsets of features are finalized, and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelISHt: Based on the results produced by the ranker ReliefFAttributeEval, Features 2, 3, 5, 7 are removed; the rest of the features are used to construct this model.
• ModelIISHt: Considering the results of the ranker RandomForest, Features 4, 8 are excluded to construct this model.
• ModelIIISHt: Based on the outcome of the ranker SVM, Features 1, 2, 5, 6, 7 are excluded and the rest are used to construct this model.

Fig. 8 (left panel) demonstrates the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIISHt performs better than the other models. As the best performer, we conduct independent testing for ModelIISHt and compare the results with the baseline (ModelBaseSHt). Fig. 8 also displays the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 8. Results of the models on the SaHeart Dataset.
3.8. Results on PlanningRelax dataset

Based on the results of feature ranking, three subsets of features are finalized and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIPRx: Based on the results produced by the ranker CorrelationAttributeEval, Features 1, 2, 5, 6, 10, 11 are removed; the rest of the features are used to construct this model.
• ModelIIPRx: Considering the results of the rankers InfoGainAttributeEval, GainRatioAttributeEval and SVM, Features 1, 8, 9 are excluded to construct this model.
• ModelIIIPRx: Based on the outcome of the ranker RandomForest, Feature 6 is excluded and the rest are used to construct this model.

Fig. 9 (left panel) displays the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIPRx performs better than the other models. As the best performer, we conduct independent testing for ModelIPRx and compare the results with the baseline (ModelBasePRx). Fig. 9 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 9. Results of the models on the PlanningRelax Dataset.

3.9. Results on Parkinsons dataset

Based on the results of feature ranking, three subsets of features are finalized, and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIPkS: Based on the results produced by the ranker InfoGainAttributeEval, Features 17, 18 are removed; the rest of the features are used to construct this model.
• ModelIIPkS: Considering the results of the ranker InfoGainAttributeEval, Features 2, 4, 16–18, 20, 21 are excluded to construct this model.
• ModelIIIPkS: Based on the outcome of the ranker SVM, Features 1, 11–20 are excluded and the rest are used to construct this model.

Fig. 10 (left panel) presents the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIIIPkS performs better than the other models. As the best performer, we conduct independent testing for ModelIIIPkS and compare the results with the baseline (ModelBasePkS). Fig. 10 also displays the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 10. Results of the models on the Parkinsons Dataset.

3.10. Results on hepatocellular carcinoma (HCC) dataset

Based on the results of feature ranking, three subsets of features are finalized and corresponding models are constructed as follows. The detailed ranking results are provided in the supplementary materials.

• ModelIHCC: Based on the results produced by the ranker InfoGainAttributeEval, Features 1, 26, 28–30, 33–35 are removed; the rest of the features are used to construct this model.
• ModelIIHCC: Considering the results of the ranker RandomForest, Features 24–26, 29, 30, 32, 36, 39, 40, 42, 44, 48 are excluded to construct this model.
• ModelIIIHCC: Based on the outcome of the ranker SVM, Features 7, 29, 41, 49 are excluded and the rest are used to construct this model.

Fig. 11 (left panel) displays the 10-fold cross validation performance of the above three models along with the baseline model. From these results it is observed that ModelIHCC performs better than the other models. As the best performer, we conduct independent testing for ModelIHCC and compare the results with the baseline (ModelBaseHCC). Fig. 11 also presents the independent testing results (mid panel) and the corresponding confusion matrix (right panel). The contribution of feature ranking and selection is evident from the results.

Fig. 11. Results of the models on the HCC Dataset.
3.11. Comparison

The feature ranking based models proposed in this paper have achieved better training (cross-validation) accuracy as well as better independent testing accuracy for medical data classification as compared with the baseline (i.e., without feature ranking). Their performance is also quite promising in comparison with the state-of-the-art. Fig. 12 presents the testing accuracy and F-score of the best models reported in this paper and of the state-of-the-art; results of other methods have been taken from the respective papers, and the results that are not available in the literature are left missing in the bar-charts.

Fig. 12. Comparison of the proposed models' testing accuracy and F-score with the state-of-the-art. "w/o Rank" means the baseline model for that disease and "Rank" means the best model constructed after feature selection and ranking. We do not report base ELM results from the study [2] as the work [31] is very recent.

Now as we can see, in testing accuracy, our best model outperforms all previous approaches for most of the datasets. In the Breast Cancer dataset, the recent work by Karthik et al. [19], which employs a deep neural network (DNN) following a recursive feature elimination step, is ahead of us in accuracy, albeit only slightly. Karthik et al. also presented the sensitivity (i.e., recall) and specificity (i.e., True Negative Rate = 1 − False Positive Rate) with the help of a bar chart (Fig. 5 of [19]). From the bar chart we see that their sensitivity is slightly less than 0.98, whereas our best model's sensitivity (i.e., recall) is 0.985 (cf. Fig. 2). The specificity of DNN [19] is slightly better than that of our best model. Overall, DNN [19] performs slightly better than ours but, notably, the approach of [19] is only designed for the Breast Cancer dataset.

In the Hepatitis dataset, our testing accuracy is only second to the claimed 100% accuracy of ICSELM [2]. However, this performance should be checked against the F-scores presented in Fig. 12: the F-score of ICSELM is quite low (0.6875) as compared to our best model (0.859), suggesting that our model is more robust than ICSELM.

In all other datasets, considering the accuracy, our model outperforms CSO-ELM [31] (and also CSO-RELM) except in the Heart dataset; for this dataset the accuracy reported for CSO-ELM and CSO-RELM in Ref. [31] is 0.8612 and 0.8402 respectively. Our best model is not far behind, having an accuracy of 0.8333 with an F-score of 0.830, suggesting the robustness of our predictor. Unfortunately, Eshtay et al. [31] did not report any other metric, nor did they provide the confusion matrix for further evaluation/comparison. On another note, both CSO-ELM and CSO-RELM exploit a metaheuristic technique called Competitive Swarm Optimization (CSO). Due to the inherent stochastic nature of CSO, Eshtay et al. [31] conducted 30 independent runs. According to their report, the worst case accuracy of CSO-ELM across these 30 runs is 0.8152, which is lower than our accuracy; for CSO-RELM this is even worse, 0.7826. With respect to the ELM based works of [2,31], we have some more observations, as discussed in a later section (Section 3.13).
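Since specificity is used here alongside sensitivity, the following minimal helper makes the two definitions concrete. It assumes binary 0/1 labels and the row/column convention of scikit-learn's confusion_matrix; it is an illustration, not part of the authors' pipeline.

# Sensitivity (recall) and specificity (true negative rate = 1 - false
# positive rate) read off a binary confusion matrix. Assumed layout follows
# sklearn.metrics.confusion_matrix: rows are true classes, columns predicted.
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity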
3.12. Ablation study

We have conducted an ablation study on the best-performing model for each dataset. In particular, for each disease, we have removed the lowest ranked feature from the best-performing model, which leads to the formation of another model, called the ablation model. For example, ModelIHCC is the best-performing model for the HCC dataset, with Feature 32 being the lowest ranked feature thereof. Now, through removing Feature 32, we get the ablation model. Then the performances of the best-performing model and the corresponding ablation model are compared with each other. Fig. 13 presents the training and testing accuracies of the best-performing model and the corresponding ablation model for each of the datasets.

Fig. 13. The results of the ablation study on the best-performing model of each dataset.

From the results presented in Fig. 13, it is evident that when the least-ranked feature is removed from the best-performing model, the performance degrades. There is only one exception to this finding and that is for the SaHeart dataset, albeit only in the context of training accuracy; for testing accuracy we do find that the ablation model performs worse. Thus we can be confident that all selected features (through feature ranking and selection) are indeed contributing towards the robustness of the predictor for each disease.
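The ablation step itself is mechanical and can be sketched as below; the helper reuses the hypothetical test_accuracy() function from the sketch following Section 3.4, and feature indices are 1-based as in the text.

# A sketch of the Section 3.12 ablation: retrain the best-performing model
# with its lowest-ranked retained feature also removed, and compare the two
# test accuracies. Illustrative only; not the authors' Weka procedure.
def ablation_gap(X_tr, y_tr, X_te, y_te, excluded, lowest_ranked):
    """Accuracy drop when the least important retained feature is ablated."""
    best = test_accuracy(X_tr, y_tr, X_te, y_te, excluded)
    ablated = test_accuracy(X_tr, y_tr, X_te, y_te, excluded + [lowest_ranked])
    return best - ablated  # a positive gap means the feature was contributing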
3.13. Discussions

We have proposed a general methodology (Fig. 14) for medical data classification that employs a feature ranking and selection strategy followed by model training and construction using a suitable classifier algorithm. We believe that this general methodology would be useful for any medical dataset for prediction/diagnosis tasks. To judge the efficacy, robustness and generality of our approach, we have experimented on 10 medical datasets using a number of feature ranking strategies, and applied the Random Forest algorithm as the final classifier for our predictors. Our experiments have clearly suggested that feature ranking and selection is useful.

Fig. 14. General methodology.

We have also conducted an ablation study on each dataset to verify the importance of the features selected for the model and to confirm the positive contribution thereof on the classification task. Section 3.12 has demonstrated these results. These findings further strengthen our claim that our general methodology of feature ranking and selection does play a strong role in robust model construction.

As a by-product, we also have a suggestion concerning feature importance in each dataset (from a classification point of view). For example, for the Diabetes dataset, our experiments suggest that Features 3 and 4 have less contribution/importance in the context of disease prediction. This is interesting, as Feature 3 is 'diastolic blood pressure (mm Hg)' and Feature 4 is 'triceps skin fold thickness (mm)'. On the other hand, across all ranking algorithms, Feature 2 ('plasma glucose concentration at 2 h in an oral glucose tolerance test') has been ranked as the most important feature, which is in accord with a medical perspective. As another example, for the Heart Disease dataset, all rankers have ranked Feature 12 as the most important feature; Feature 12 represents the 'number of major vessels colored by fluoroscopy' and hence is indeed a very important feature from a clinical point of view.

From another angle, as has been briefly indicated before, the recent works of [2,31] have focused on experimentation by varying the number of hidden neurons (of the ELM). However, those studies remain inconclusive, as sometimes the testing accuracy was found better with a higher number of hidden neurons, and sometimes with a lower number.
For example, for the Breast Cancer dataset, the testing accuracy is higher for a lower number of hidden neurons (97% for 10 hidden neurons and 83.87% for 300), and for the Hepatitis dataset the relation is reversed: 86.12% for 10 and 100% for 300 [2]. Similar issues are also observed in Ref. [31]. Hence, for a new medical dataset, it is difficult to make even an educated guess as to what should be the ideal number of neurons to obtain the best results. On the other hand, from our experiments it is evident that the feature ranking and selection strategy provides better results than the baseline, and thus we present a general methodology that is expected to perform well consistently across all medical datasets (which have similar characteristics in the data pattern).

We have also conducted some experiments using other classifier algorithms (in place of Random Forest). It has been observed that, unlike Random Forest, other classifiers do not perform consistently well across all datasets. For example, the testing accuracies of SVM and Bayes Net on the Breast Cancer dataset are 0.99 and 0.975 respectively. Thus SVM actually outperforms Random Forest (the accuracy of our best model, ModelIIWBC, is 0.9850) on the Breast Cancer dataset. But for the Bupa dataset, both SVM and Bayes Net fall significantly short, with a testing accuracy of only 0.689655 each (the accuracy of our best model, ModelIIBp, is 0.786207).

4. Conclusion

Medical data classification is one of the complex and challenging tasks in medical informatics. Due to its complex nature, various methods have been proposed in the literature. In this paper, we have revisited this challenge and have made an effort to present a generalized methodology for this classification task. In particular, we have proposed a feature ranking based methodology that employs a feature selection strategy, and trains and constructs the model based on only the highly ranked features. To elaborate, we first check whether all features are important for the classification task. This is done by applying several feature ranking algorithms through 10-fold cross validation. Then, based on the ranking, a subset of the top-ranked features is formed based on 10-fold cross validation performance. Finally, the Random Forest algorithm is applied on the selected features to train and construct the final model. In fact, we form several subsets of top-ranked features and, corresponding to each subset, train a model. Thus we have several models to compare and choose from.

We have conducted extensive experiments on 10 benchmark datasets and our results are promising. Our feature ranking and selection based models have performed consistently better than the baseline (without feature ranking) model. Our best models are also found to be competitive with the state-of-the-art. We have also done a limited ablation study to verify the importance of the features selected for the model, and to confirm the positive contribution thereof on the classification task. To conclude, we not only have developed highly accurate predictors for 10 different diseases, but also have presented a general methodology that should perform well for other diseases which have similar characteristics in the data pattern.

While the use of a feature ranking and selection strategy is not new in the applied machine learning literature, to the best of our knowledge, it had not been comprehensively investigated in the context of medical data classification before the current research work. From this angle, this can be seen as the first attempt where the feature ranking and selection scheme has been applied in the context of medical data classification using Random Forest as the final classifier. Also, the ablation study conducted here is new in this application domain. Thus, while this paper does not provide any new theoretical advancement in the context of machine learning, we believe that our rigorous experiments along with an appropriate ablation study have advanced the state-of-the-art. In fact, through extensive experiments on 10 different benchmark datasets, we have been able to show that our approach is indeed useful and sufficiently general. It is also worth mentioning that the top-ranked features have been found to be important from a medical point of view. Therefore, while one may argue that feature ranking and selection should always lead to a better predictor/classifier, in our case we have the added value of explaining the phenomenon in a real (clinical) context. Thus we expect that a medical practitioner would be more confident (and comfortable, so to speak) in using our predictor.

Although not reported here, we have also conducted extensive experiments with other classifiers, such as Support Vector Machine (SVM), Bayes Network, and Multilayer Perceptron, and found the better contribution of Random Forest across all datasets as compared to the other classifiers. The use of 10 datasets is also a strong feature of this research, as only one study [31] in the literature, so far as we know, has used all of these datasets for experimentation, albeit with a different goal: they have worked on the evolutionary Extreme Learning Machine (ELM) model, focusing on compacting networks by reducing the number of neurons in the hidden layer. This is clearly in contrast with our focus of finding a general methodology for medical data classification. Notably, their study remained inconclusive, as sometimes the testing accuracy was found better with a higher number of hidden neurons and sometimes with a lower number. On the other hand, we present a general methodology that is expected to perform well consistently across all medical datasets (which have similar characteristics in the data pattern).

Ethical statement

A. No financial and personal relationships with other people or organizations exist that could inappropriately influence (bias) this work.
B. The authors confirm that this manuscript reports the original research work of the authors and has not been published or submitted for consideration elsewhere.

Conflicts of interest

None declared.

Acknowledgment

The first author is a recipient of the ICT Ph.D. Fellowship administered by the ICT Division, Government of the People's Republic of Bangladesh, website: http://www.ictd.gov.bd/.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.imu.2019.100180.

References
[1] Chabat F, Hansell DM, Yang G-Z. Computerized decision support in medical imaging. IEEE Eng Med Biol Mag 2000;19(5):89–96.
[2] Mohapatra P, Chakravarty S, Dash P. An improved cuckoo search based extreme learning machine for medical data classification. Swarm Evol Comput 2015;24:25–49. https://doi.org/10.1016/j.swevo.2015.05.003.
[3] Duda RO, Hart PE. Pattern classification and scene analysis. NY, USA: Wiley-Interscience, John Wiley & Sons; 1973.
[4] Coast D, Stern R, Cano G, Briller S. An approach to cardiac arrhythmia analysis using hidden Markov models. IEEE Trans Biomed Eng 1990;37(9):826–36. https://doi.org/10.1109/10.58593.
[5] Abbass HA. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 2002;25(3):265–81.
[6] Kiyan T, Yildirim T. Breast cancer diagnosis using statistical neural networks. J Electr Electron Eng 2004;4(2):1149–53.
[7] Karabatak M, Ince MC. An expert system for detection of breast cancer based on association rules and neural network. Expert Syst Appl 2009;36(2):3465–9. https://doi.org/10.1016/j.eswa.2008.02.064.
[8] Peng Y, Wu Z, Jiang J. A novel feature selection approach for biomedical data classification. J Biomed Inform 2010;43:15–23. https://doi.org/10.1016/j.jbi.2009.07.008.
[9] Fan C-Y, Chang P-C, Lin J-J, Hsieh J. A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Appl Soft Comput 2011;24:632–44. https://doi.org/10.1016/j.asoc.2009.12.023.
[10] Azar AT, El-Said SA. Performance analysis of support vector machines classifiers in breast cancer mammography recognition. Neural Comput Appl 2014;4(5):1163–77. https://doi.org/10.1007/s00521-012-1324-4.
[11] Bhattacherjee A, Roy S, Paul S, Roy P, Kausar N. Classification approach for breast cancer detection using back propagation neural network: a study. Biomedical Image Analysis and Mining Techniques for Improved Health Outcomes; 2015. https://doi.org/10.4018/978-1-4666-8811-7.ch010.
[12] Onan A. A fuzzy-rough nearest neighbor classifier combined with consistency-based subset evaluation and instance selection for automated diagnosis of breast cancer. Expert Syst Appl 2015;42(20):6844–52. https://doi.org/10.1016/j.eswa.2015.05.006.
[13] Yin Z, Fei Z, Yang C, Chen A. A novel SVM-RFE based biomedical data processing approach: basic and beyond. IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society. IEEE; 2016. p. 7143–8. https://doi.org/10.1109/IECON.2016.7793954.
[14] Jhajharia S, Varshney HK, Verma S, Kumar R. A neural network based breast cancer prognosis model with PCA processed features. 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE; 2016. p. 1896–901. https://doi.org/10.1109/ICACCI.2016.7732327.
[15] Jouni H, Issa M, Harb A, Jacquemod G, Leduc Y. Neural network architecture for breast cancer detection and classification. IEEE International Multidisciplinary Conference on Engineering Technology (IMCET). IEEE; 2016. p. 37–41. https://doi.org/10.1109/IMCET.2016.7777423.
[16] Abdel-Zaher AM, Eldeib AM. Breast cancer classification using deep belief networks. Expert Syst Appl 2016;46:139–44. https://doi.org/10.1016/j.eswa.2015.10.015.
[17] Huang M-W, Chen C-W, Lin W-C, Ke S-W, Tsai C-F. SVM and SVM ensembles in breast cancer prediction. PLoS One 2017;12(1):e0161501.
[18] Nilashi M, Ibrahim APDO, Ahmadi H, Shahmoradi L. A knowledge-based system for breast cancer classification using fuzzy logic method. Telematics Inform 2017;34(4).
[19] Karthik S, Perumal RS, Mouli PVSSRC. Breast cancer classification using deep neural networks. Knowledge Computing and Its Applications; 2018. p. 227–41. https://doi.org/10.1007/978-981-10-6680-1_12.
[20] Khan MM, Mendes A, Chalup SK. Evolutionary wavelet neural network ensembles for breast cancer and Parkinson's disease prediction. PLoS One 2018;13(2):e0192192. https://doi.org/10.1371/journal.pone.0192192.
[21] Anooj PK. Clinical decision support system: risk level prediction of heart disease using weighted fuzzy rules. J King Saud Univ Comput Inf Sci 2012;24(1):27–40. https://doi.org/10.1016/j.jksuci.2011.09.002.
[22] Samb ML, Camara F, Ndiaye S, Slimani Y, Esseghir MA. A novel RFE-SVM-based feature selection approach for classification. Int J Adv Sci Technol 2012;43. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.641.8266.
[23] Jaganathan P, Kuppuchamy R. A threshold fuzzy entropy based feature selection for medical database classification. Comput Biol Med 2013;43(12). https://doi.org/10.1016/j.compbiomed.2013.10.016.
[24] Polat K, Gunes S. A new feature selection method on classification of medical datasets: Kernel F-score feature selection. Expert Syst Appl 2009;36:10367–73.
[25] Jabbar MA, Deekshatulu B, Chandra P. Classification of heart disease using k-nearest neighbor and genetic algorithm. Procedia Technol 2013;10:85–94. https://doi.org/10.1016/j.protcy.2013.12.340.
[26] Gorunescu F, Belciug S. Evolutionary strategy to develop learning-based decision systems. Application to breast cancer and liver fibrosis stadialization. J Biomed Inform 2014;49:112–18. https://doi.org/10.1016/j.jbi.2014.02.001.
[27] Khanmohammadi S, Rezaeiahari M. AHP based classification algorithm selection for clinical decision support system development. Procedia Comput Sci 2014;36:328–34. https://doi.org/10.1016/j.procs.2014.09.101.
[28] Dennis B, Muthukrishnan S. AGFS: Adaptive Genetic Fuzzy System for medical data classification. Appl Soft Comput 2014;24:242–52. https://doi.org/10.1016/j.asoc.2014.09.032.
[29] Seera M, Lim CP. A hybrid intelligent system for medical data classification. Expert Syst Appl 2014;41:2239–49. https://doi.org/10.1016/j.eswa.2013.09.022.
[30] Alwidian J, Hammo BH, Obeid N. WCBA: weighted classification based on association rules algorithm for breast cancer disease. Appl Soft Comput 2018;62:536–49. https://doi.org/10.1016/j.asoc.2017.11.013.
[31] Eshtay M, Faris H, Obeid N. Improving extreme learning machine by competitive swarm optimization and its application for medical diagnosis problems. Expert Syst Appl 2018;104:134–52. https://doi.org/10.1016/j.eswa.2018.03.024.
[32] University of California at Irvine (UCI) Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets.html.
[33] Dash M, Liu H. Feature selection for classification. Intell Data Anal 1997;1(1–4):131–56. https://doi.org/10.1016/S1088-467X(97)00008-5.
[34] Ruiz R, Riquelme JC, Aguilar-Ruiz JS. Fast feature ranking algorithm. In: Palade V, Howlett RJ, Jain LC, editors. KES 2003 (LNAI 2773); 2003. p. 325–31. http://www.lsi.us.es/~riquelme/publicaciones/kes03.pdf.
[35] Novakovic J, Strbac P, Bulatovic D. Toward optimal feature selection using ranking methods and classification algorithms. Yugosl J Oper Res 2011;21(1):119–35. https://doi.org/10.2298/YJOR1101119N.
[36] Waikato Environment for Knowledge Analysis (Weka). https://www.cs.waikato.ac.nz/ml/weka/.
