Particle Swarm Optimization-Based Random Forest Framework for the Classification of Chronic Diseases
ABSTRACT In this paper, a hybrid metaheuristic-based machine learning approach is proposed for the classification of various Chronic Diseases (CDs). CDs often get misdiagnosed due to various issues, viz., similar and overlapping symptoms, sensitive devices, lack of clinical experts, etc. To address these issues, this study utilizes a fusion of Particle Swarm Optimization with Random Forest (PSORF) for the automatic identification of CDs. The PSORF approach comprises two main components: PSO, which obtains the minimal optimal feature set and optimizes the performance of the RF classifier, and the RF classifier, which classifies multiple CDs. In this research, five different CD datasets have been used in a series of experiments conducted to identify the best approach for the classification of CDs. To address the issues of imbalanced and incomplete data in the datasets used, the Synthetic Minority Oversampling Technique (SMOTE) and Expectation Maximization (EM) imputation have been applied before training the model. This improves the data quality before analysis. Furthermore, the performance of the PSO and RF components has been compared with other metaheuristic techniques and ML classifiers in terms of different performance metrics. For this purpose, Friedman’s test has been employed to calculate the mean ranks of all the classifiers across all the datasets for different metrics. The results showed that the proposed technique achieved the highest mean rank in terms of Accuracy, F-measure, and Receiver Operating Characteristics (ROC) across all five datasets.
INDEX TERMS Chronic diseases, machine learning, metaheuristic techniques, multi-classification, PSO,
SMOTE.
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023
A. Singh et al.: PSORF Framework for the Classification of CDs
identifies gaps in the use of ML techniques for detecting CDs. Section III outlines the materials and methods utilized in this study. In addition, it briefly explains the benchmark feature selection and ML techniques. Furthermore, it explains the proposed methodology and all its stages in detail. Section IV illustrates the experimental work carried out on different datasets using the proposed approach and compares the proposed approach with other ML and metaheuristic techniques. It further explains the limitations and future work of the study. Section V concludes the study.

II. RELATED WORK AND RESEARCH GAPS
In the literature, several researchers have examined numerous ML models for the detection of various CDs to help clinical decision-making. In this regard, this section discusses multiple works on the classification of breast cancer, heart disease, diabetes, and respiratory diseases, as shown in Tables 1, 2, 3, and 4, and also identifies the research gaps.
Upon review of prior research, it was found that many studies employed metaheuristic optimization algorithms for feature selection and different machine learning (ML) and deep learning (DL) models for disease classification, as outlined in Tables 1, 2, 3, and 4. While these previous studies have yielded promising outcomes, there are still some areas for further research and improvement, as described below.
• Imbalanced dataset: It must be acknowledged that previous research has frequently depended on imbalanced datasets to predict diseases, producing biased outcomes. However, a study conducted by Zhang et al. [21] resolved this issue by utilizing the SMOTE filter. It is crucial to meticulously scrutinize potential biases when interpreting research findings.
• Missing data: In this study, it was found that the Exasens dataset contains some missing values that must be addressed before being used in the training model. If left untreated, such values can significantly impact the accuracy of the classification model. Previous studies by Ramachandra and Murthy [27] and Gill and Pathwar [28] did not address these missing values. However, Amutha and Sekar [26] utilized the KNN Imputation method to address this issue. While this method is effective, adaptive, and flexible, it can be susceptible to outliers and is computationally expensive.
• Lower performances: Previous studies have clearly demonstrated that certain datasets exhibit lower performance levels due to missing data, lack of feature selection, and computationally expensive models [22], [29], [30]. It has been observed that studies that employed metaheuristic-based ML classifiers outperformed those using DL models when comparing studies that utilized the same dataset. While DL models are known for their automatic feature selection, it is important to note that tuning these features and the model’s parameters can consume a significant amount of computational resources. On the other hand, utilizing metaheuristic optimization algorithms for feature selection with ML classifiers greatly reduces the computational power required. Therefore, it can be concluded that Metaheuristic Optimization (MHO) based ML classifiers have been shown to outperform DL models.
• Accuracy Paradox: Despite various issues and research gaps, previous studies achieved excellent performances. The accuracy paradox may be at play here. Even though the training model achieves high accuracy levels, it has low predictive value. This is especially true when handling an imbalanced Breast cancer dataset, where the accuracy rate can be over 97% in all cases [12], [13], [14], [15], [16], [17]. However, such a model trained on this data may not perform well in identifying cancer patients in real-life situations, despite producing accurate training results due to a high proportion of cancer patients’ examples.
• No statistical testing: It is worth noting that only a few studies in the literature utilized statistical testing to validate their models and achieve optimal performance [13], [22]. Most studies instead compared various ML and DL models using different performance metrics to determine the top performer. However, these results were not adequately explained in those studies.
In order to create a reliable and effective model, this study has addressed all of the research gaps mentioned previously. The issue of imbalanced and missing data is tackled in Section III, while Section IV thoroughly explains and confirms the classification performance of the proposed model.

TABLE 1. Previous works done for the detection of breast cancer using different feature selection and ML approaches on the Wisconsin breast cancer dataset.
TABLE 2. Previous works done for the detection of Coronary artery disease using different feature selection and ML approaches on the Z-Alizadehsani dataset.
TABLE 3. Previous works done for the detection of Diabetes using different feature selection and ML approaches on the Vanderbilt diabetes dataset.
TABLE 4. Previous works done for the detection of Respiratory Disease using different feature selection and ML approaches on the ICBHI and Exasens datasets.

III. MATERIALS AND METHODS
In this section, the materials and methods utilized in this study are examined. It describes the different datasets employed in this study and then discusses the benchmark metaheuristic and ML classification techniques. It further presents the different stages of the proposed methodology in detail.

A. DATASETS
This study has employed five publicly available datasets as evaluation benchmarks: the International Conference on Biomedical Health Informatics (ICBHI) lung sound database [35], Wisconsin Breast Cancer Dataset (WBCD) [36], Z-Alizadehsani dataset [37], Exasens dataset [38], and Diabetes dataset [39] collected from UCI library, Kaggle, and dataworld. For convenience, the datasets ICBHI, WBCD, Z-Alizadehsani, Exasens, and Diabetes are referred to as D1, D2, D3, D4, and D5 respectively. Detailed information regarding each dataset is presented in Table 5.
In this study, structured data consisting of symptomatic information in accordance with the respective diseases has been considered for the evaluation of ML classifiers for classifying Chronic diseases. The distribution of instances into different classes corresponding to different diseases is shown in Figure 2. Further details regarding the datasets are mentioned as follows:
patients with several missing values. Before deploying the dataset into this study, 13 patients with heavily missing data were excluded.¹¹
¹¹ Diabetes dataset, modified dataset by Robert Hoyt, accessed on 20/09/23.

B. BENCHMARK TECHNIQUES
This section discusses the benchmark techniques utilized in this study for comparing and validating the performance of the proposed approach. As mentioned earlier, the proposed approach comprises two components, i.e., PSO and RF. Hence, for comparison purposes, two sets of benchmark techniques have been utilized. One set is for comparing feature selection techniques and another set is for comparing the proposed approach with state-of-the-art classifiers.

1) FEATURE SELECTION
In this study, to compare and validate the performance of PSO, three benchmark metaheuristic optimization techniques, GA [15], [16], [19], Bat [19], and FA [19], have been employed. These are population-based algorithms in which the agents perform both local and global searches. They are iterative in nature. They generally start from a randomly chosen solution and move forward. The goal is to improve the solution at each iteration until no further improvements can be made. It is generally not advisable to use the Firefly algorithm as one of the benchmark techniques due to its "center bias operator" problem [46], because this operator biases the algorithm toward optima located at the center of the feasible set. Despite this, numerous studies in the literature have utilized this algorithm for feature selection and tuning of hyper-parameters of ML classifiers. For comparison purposes, this study has incorporated both types of MHO algorithms, one with and others without a center bias operator problem.

2) ML CLASSIFIERS
This section discusses the state-of-the-art classifiers that were employed to assess and verify the effectiveness of the proposed method.
• Naïve Bayes: The name of this supervised learning classifier is an amalgamation of two terms. The term "naive" indicates that the algorithm assumes conditional independence between all features, given the value of the class variable. On the other hand, the term "Bayes" indicates that the method is based on the Bayes theorem [12], [27]. This theorem describes the relationship between the class variable (denoted as $z$) and the dependent feature vectors ($y_1$ through $y_n$), as shown in (1).
$$P(z \mid y_1, \ldots, y_n) = \frac{P(z)\, P(y_1, \ldots, y_n \mid z)}{P(y_1, \ldots, y_n)} \tag{1}$$
There are different versions of Naïve Bayes which differ only in terms of the assumption they make regarding the distribution $P(y_i \mid z)$ [32].
• Multilayer Perceptron (MLP): It is the simplest form of neural network that learns a function $f(\cdot): \mathbb{R}^p \to \mathbb{R}^q$ by training on a dataset, where $p$ is the number of dimensions for the input and $q$ is the number of dimensions for the output [13], [16]. The model consists of three layers: the "Input layer" consists of a set of neurons $\{x_i \mid x_1, x_2, \ldots, x_n\}$ indicating the input features; the middle layer is the "Hidden layer", consisting of one or more layers containing neurons that transform the previous layer values into a weighted linear summation and then apply a non-linear activation function $g(\cdot): \mathbb{R}^p \to \mathbb{R}^q$; and the last layer is the "Output layer", which receives the input from the hidden layer and transforms it into the output values [25].
• Sequential Minimal Optimization (SMO): A supervised learning algorithm designed for the training of SVM, as its training requires solving large, complicated Quadratic Programming (QP) optimization problems. This problem becomes more cumbersome when dealing with large datasets, leading to a running time of $O(N^3)$ [47]. SMO breaks these large QP problems into small QP problems which can then be solved analytically. As a result, SMO scales between linearly and quadratically in the training set size, making it faster than standard QP-based SVM training.
• Bagging: It is an averaging ensemble classifier that builds several estimators independently and then averages their predictions. The idea is that the combined estimators perform better than single estimators due to the reduction in variance. It works best with strong and complex models, as it reduces their overfitting [23].

C. PROPOSED METHODOLOGY
This section introduces the details of the proposed approach PSO-RF for the multiclassification of Chronic Diseases. Additionally, the various stages of the proposed approach are exhibited in Figure 3. The key elements of each stage are briefly elaborated on as follows:

1) STAGE 1: DATA PREPROCESSING
In this stage, the original raw data has been treated in terms of quantity and quality by having it pass through different sub-stages to enhance the performance of the proposed approach. The various sub-stages are shown in Figure 4. The datasets were first checked for their types. Among all the datasets, dataset D1 was unstructured and needed to be converted into structured data using Python programming. Hence, the .csv file containing the patient id and disease has been aligned with the .txt file of different .wav files to get a structured file. In addition, it has been observed from Table 5 that the Exasens dataset suffers from a missingness problem: 33.36% of the values in the whole dataset are missing. In this regard, this study has deployed
FIGURE 3. Overview of the proposed PSO-RF approach. The PSO-RF consists of a preprocessing module, a metaheuristic feature selector, and an
ensemble Random Forest classifier.
FIGURE 4. Representation of multiple stages of preprocessor module for treating raw unstructured data.
TABLE 6. Increment in the number of instances after applying SMOTE across all the datasets.
TABLE 7. Balancing the weights of an imbalanced ICBHI dataset D1 using classbalancer.
TABLE 8. Description of different evaluation metrics utilized in this study.
Chronic disease metadata as the diagnosis of a disease is done using the differential diagnosis method where the idea is to rule out the non-related diseases. Hence, a lot of tests such as laboratory tests, scans, X-rays, and blood tests were done, not all of which are really required, and which may not be related to the actual disease. The presence of these unrelated tests might cause an overfitting problem [34]. Therefore, FS is essential before training the classification model as it will lead to a faster, more accurate, and cost-effective model.
For this purpose, this study has utilized a metaheuristic approach, PSO, introduced by Kennedy and Eberhart [48]. It is a stochastic population-based approach influenced by fish schooling or bird flocking behavior. It is different from other optimization algorithms like Differential Evolution in that it does not depend on any gradient or differential gradient. It simply explores and exploits the search space using the particle’s position and velocity information. There are various advantages of PSO, including being computationally inexpensive, having low system requirements, faster convergence, easy implementation, etc. [49]. It is mostly used for finding the maxima or minima of a function defined over a multidimensional vector space. It performs feature selection by considering the features as particles in a high dimensional space where each particle in the swarm is a candidate solution. The fitness function is calculated for each particle in the swarm based on its position [13], [16], [19]. Each particle’s position is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, where $d$ denotes the dimension. Likewise, every particle has an associated velocity, denoted by $V_i = (v_{i1}, v_{i2}, \ldots, v_{id})$. After each iteration, the velocity and position values of each particle are updated from time instant $t$ to $t + 1$ as shown in (2) and (3) respectively.
$$V_i(t+1) = w V_i(t) + c_1 r_1 \big(\mathrm{pbest}_i(t) - X_i(t)\big) + c_2 r_2 \big(\mathrm{gbest}_i(t) - X_i(t)\big) \tag{2}$$
$$X_i(t+1) = X_i(t) + V_i(t+1) \tag{3}$$
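The update rules (2) and (3) can be sketched for a single particle as follows. This is a minimal illustration rather than the authors' implementation: the function name `pso_step` and the default values `w=0.7`, `c1=1.5`, and `c2=1.5` are assumptions, and one random factor is drawn per term (drawing one per dimension is an equally common variant).

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply Eq. (2) and then Eq. (3) to one particle.

    x, v, pbest, gbest are equal-length lists of floats.
    w, c1, c2 defaults are illustrative assumptions, not values from the paper.
    rng is anything exposing .random(), e.g. the random module or random.Random(seed).
    """
    r1 = rng.random()  # random factor for the cognitive term, in [0, 1)
    r2 = rng.random()  # random factor for the social term, in [0, 1)

    # Eq. (2): inertia + pull toward personal best + pull toward global best
    v_new = [
        w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        for xi, vi, pb, gb in zip(x, v, pbest, gbest)
    ]
    # Eq. (3): move the particle along its new velocity
    x_new = [xi + vn for xi, vn in zip(x, v_new)]
    return x_new, v_new
```

Passing `rng=random.Random(42)` makes a run repeatable. When a particle already sits on both its personal and the global best, only the inertia term remains, so the velocity simply shrinks by the factor `w`.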
In the above equations, $w$ is the inertia constant with values between 0 and 1. It determines how much of its previous velocity each particle retains. In the same way, $r_1$ and $r_2$ are random numbers with values ranging from 0 to 1. Meanwhile, $c_1$ and $c_2$ are coefficients linked to cognitive and social aspects. They control the trade-off between exploration and exploitation, as $c_1$ helps in finding the local minima and $c_2$ helps in finding the global minima among the local minima. The determination of the optimal local and global value is based on the variables pbest and gbest respectively. These variables depend on the position of the particle $X_i(t)$ as shown in (4) and (5). In order to determine the pbest and gbest values, the fitness function ($f$) of a particle at instant $t + 1$ is compared with its fitness function at instant $t$.
$$\mathrm{pbest}_i(t) = X_i(t) \quad \text{if } f(X_i) < f(\mathrm{pbest}_i) \tag{4}$$
Also,
$$\mathrm{gbest}_i(t) \in \{\mathrm{pbest}_1(t), \ldots, \mathrm{pbest}_m(t)\} \;\big|\; f(\mathrm{gbest}_i(t)) = \min\{f(\mathrm{pbest}_1(t)), \ldots, f(\mathrm{pbest}_m(t))\} \tag{5}$$
The complete procedure for the proposed approach is illustrated in Algorithm 1, where the preprocessed training set $S = \{(p_1, q_1), \ldots, (p_n, q_m)\}$, consisting of $n$ rows and $m$ columns, is considered, with $S \in D$, i.e., $S$ could be any of the five datasets $D$. The selected optimal feature set $F_i$ was then passed to the training model, Random Forest. The goal was to select the feature set that maximizes the classification accuracy and minimizes the number of features. To achieve this goal, the fitness function ($f$) set for PSO is shown in (6).
$$\mathrm{Fitness}(f) = \theta \cdot \mathrm{acc}(f) + (1 - \theta) \cdot \left(1 - \frac{N_s}{N_f}\right) \tag{6}$$
where $N_s$ and $N_f$ define the number of selected features and the total number of features respectively. The classification accuracy is denoted by $\mathrm{acc}(f)$, and $\theta$ signifies the weighting factor between the classification accuracy and the number of selected features.

Algorithm 1. The basic idea of RF is to form a single strong classifier by combining multiple decision trees, either by taking the average of their outputs or by taking the majority vote. In previous works, RF has shown excellent performance compared to other classifiers [12], [15]. The reason is that it uses bagging for the ensemble process, which reduces the correlation between the trees. Also, the variance and overfitting of the classifier are reduced [20], [31]. Moreover, by restricting the features, the decision trees can learn faster and hence can be built in a small amount of time.
Algorithm 1 also considers a forest $L$ comprising various small decision trees $l$, wherein for each $l$ belonging to $L$, it selects a bootstrap sample $S^*$ from $S$. Furthermore, for each node of the tree, a very small feature set $sf$ is obtained from $F$, which is then used for node splitting.

IV. EXPERIMENTAL RESULTS AND DISCUSSION
The experimental work conducted on the five chronic disease datasets, namely D1, D2, D3, D4, and D5, is discussed in this section. The experiments illustrate the efficacy of the components of the proposed model by comparing them with the conventional feature selection and classification methods. Moreover, Friedman’s test has also been employed as a statistical test for validating the performance of the proposed approach against previous methods.

A. EXPERIMENTAL SETUP
All experiments were run on a Windows 11 machine with an AMD Ryzen 5 4600H with Radeon Graphics processor and 24 GB RAM. All the computations in this study have been done using three different software tools. The preprocessing and classification have been done using Weka and Jupyter Notebook. In addition, for statistical testing, the SPSS tool has been utilized.
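For readers reproducing the setup in a Python notebook, the fitness in (6) can be sketched as below. This is an illustration, not the authors' code: the names `psorf_fitness` and `count_selected`, the default `theta=0.9`, the 0.5 threshold for turning a continuous particle position into a feature mask, and the convention that accuracy is a fraction in [0, 1] are all assumptions not stated in the paper.

```python
def count_selected(position, threshold=0.5):
    """Hypothetical helper: treat each coordinate above `threshold` as a selected feature."""
    return sum(1 for p in position if p > threshold)


def psorf_fitness(accuracy, n_selected, n_total, theta=0.9):
    """Eq. (6): theta * acc(f) + (1 - theta) * (1 - Ns / Nf).

    accuracy   -- classification accuracy as a fraction in [0, 1] (assumption)
    n_selected -- Ns, number of selected features
    n_total    -- Nf, total number of features
    theta      -- weighting factor; 0.9 is an illustrative default only
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_selected <= n_total")
    return theta * accuracy + (1.0 - theta) * (1.0 - n_selected / n_total)
```

With this form, a larger value is better: at equal accuracy, a subset with fewer features scores higher, and `theta` controls how strongly accuracy outweighs subset size. An optimizer written for minimization would need to minimize the negated value instead.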
TABLE 9. Value of parameters set for Genetic, PSO, Firefly, and Bat algorithm across all datasets.
FIGURE 12. Comparison of MAE and RMSE across all classifiers in terms
of mean Rank calculated by Friedman’s Test.
The results obtained from the experimentation work illustrated two important observations.
• Firstly, a situation of accuracy paradox has arisen for dataset D1. The performance of all the classifiers for different metrics on dataset D1 appears ideal, which is implausible. This is due to the presence of a high imbalance across the classes of dataset D1. These biased outcomes have resulted from the biased data.
• Secondly, there are cases where multiple classifiers have shown similar results for the same metric. For example, for dataset D3, SMO, Bagging, and RF have shown similar performance in terms of Accuracy.
Hence, to further assess the classification performance of the proposed approach against other state-of-the-art classifiers, some statistical tests are required, which are discussed in the subsection below.

E. STATISTICAL TESTING
In this section, a thorough comparison has been conducted between the proposed approach and other benchmark classifiers, utilizing Friedman’s statistical test to determine the results [19]. This test, with the associated p-value, has been performed for multiple comparisons. It has been undertaken to detect the performance difference between PSO-RF and different classifiers. The null hypothesis considered for this study, at a significance level of 0.05, was that there is no significant difference between PSO-RF and other classifiers. A significant difference is indicated by p < 0.05. The test statistics obtained for Friedman’s test are shown in Table 13.
It is worth mentioning that the performance difference between PSO-RF and other classifiers is statistically significant (p < 0.05) for Accuracy, F-measure, and RMSE. Hence, the null hypothesis that there is no significant difference between PSO-RF and other classifiers is rejected for these metrics.
The Friedman mean ranks obtained from the above experimental results for different classifiers across different evaluation metrics are shown in Figures 11 and 12. In terms of Accuracy, ROC, and F-measure, the higher the rank of the classifier, the better the classifier. Whereas for MAE and RMSE, the lower the error rank, the better the classifier.
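The mean ranks used in Friedman’s test can be reproduced with a short sketch. Assumptions: `friedman_mean_ranks` is a hypothetical helper name, ranks are assigned in ascending order within each dataset (so a higher rank means a larger metric value, consistent with the interpretation given above), tied values receive the average rank, and the chi-square statistic is the uncorrected form, without the tie correction that statistical packages may apply.

```python
def friedman_mean_ranks(scores):
    """Compute Friedman mean ranks and the uncorrected chi-square statistic.

    scores[d][c] is the metric value of classifier c on dataset d.
    Returns (mean_ranks, chi2), where mean_ranks[c] is the average rank of
    classifier c over all datasets (rank 1 = smallest value in a dataset).
    """
    n = len(scores)      # number of datasets (blocks)
    k = len(scores[0])   # number of classifiers (treatments)
    rank_sums = [0.0] * k

    for row in scores:
        order = sorted(range(k), key=lambda c: row[c])  # ascending by value
        i = 0
        while i < k:
            # extend j over a run of tied values
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg_rank
            i = j + 1

    mean_ranks = [s / n for s in rank_sums]
    # Friedman statistic with k - 1 degrees of freedom (no tie correction)
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2
```

The returned statistic would be compared against a chi-square distribution with k − 1 degrees of freedom to obtain the p-value; the sketch leaves that step to a statistics package.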
TABLE 11. Comparison of performance of classifiers across all five datasets (D1, D2, D3, D4, and D5) in terms of accuracy (in %), ROC (in %), F-measure
(in %).
TABLE 12. Comparison of performance of classifiers across all five datasets (D1, D2, D3, D4, and D5) in terms of MAE and RMSE.
TABLE 13. Values of different test statistics across different performance metrics obtained during Friedman’s test.
FIGURE 14. Comparison of Proposed approach with previous studies with respect to WBCD Dataset in terms of Accuracy. Multilayer perceptron+Open source development Model Algorithm (MLP+ODMA), Support Vector Machine-Wolf Optimization Algorithm (SVM+WOA), Particle Swarm Optimization+Artificial Neural Network (PSO+ANN).
FIGURE 16. Comparison of Proposed approach with previous studies with respect to Exasens Dataset D4 in terms of Accuracy. Deep Convolution neural network (DCNN).
result of the presence of a high imbalance in the dataset. The ML classifiers utilized in this study are not complex enough to deal with such highly imbalanced data. Secondly, to tackle the imbalanced data problem, this study has utilized the SMOTE filter, which might result in the generation of some noisy data. Therefore, in the future, this study aims to provide a suitably complex AI-based predictive model for the multi-classification of diseases in dataset D1. Furthermore, for the problem of imbalanced datasets, different variants of SMOTE can be applied in future studies.

V. CONCLUSION
This study aimed to provide an efficient machine learning framework, PSORF, that can not only detect but also classify similar Chronic diseases such as COPD, Asthma, Bronchiectasis, etc. For this purpose, this study considered five different datasets across which a series of experiments have been performed. The datasets obtained from public repositories suffered from missing values and imbalanced data problems that were rectified through EM Imputation and SMOTE techniques. The processed data was then passed through the PSO-RF framework, which provided the best optimal feature set and efficient classification results on all the datasets. In addition, to validate the classification performance of the PSORF framework, both PSO and RF were compared with different metaheuristic and ML classifiers respectively. The performance of PSO and other metaheuristic techniques, namely Firefly, Bat, and Genetic search, was compared through radar graphs on the basis of various evaluation metrics. It was evident from the graphs that across all the datasets, PSO provided the best results. Hence, for further evaluation, five different PSO-based classifiers were compared by using various performance metrics. The results showed that among all the classifiers, the PSO-based RF classifier outperformed the other classifiers in terms of Accuracy, F-measure, and ROC. However, there were some classifiers whose performances were similar across all the datasets. Therefore, for further clarification on the classification performance of the classifiers, Friedman’s test was performed. The test results showed that among all the classifiers PSO-RF achieved the highest rank, indicating that it has outperformed other classifiers. The proposed PSO-RF framework not only classified the binary Chronic diseases such as Breast cancer, Diabetes, and Heart disease but also classified multiple chronic diseases that were symptomatically similar, such as COPD, Asthma, Pneumonia, Bronchiectasis, etc.

REFERENCES
[1] C. W. Tsao, A. W. Aday, Z. I. Almarzooq, A. Alonso, A. Z. Beaton, M. S. Bittencourt, A. K. Boehme, A. E. Buxton, A. P. Carson, Y. Commodore-Mensah, and M. S. Elkind, "Heart disease and stroke statistics, 2022 update: A report from the American heart association," Circulation, vol. 145, no. 8, pp. e153–e639, 2022.
[2] A. Singh, N. Prakash, and A. Jain, "A review on prevalence of worldwide COPD situation," in Proceedings of Data Analytics and Management (Lecture Notes in Networks and Systems), vol. 572, A. Khanna, Z. Polkowski, and O. Castillo, Eds. Singapore: Springer, 2023.
[3] L. J. Grimm, C. S. Avery, E. Hendrick, and J. A. Baker, "Benefits and risks of mammography screening in women ages 40 to 49 years," J. Primary Care Community Health, vol. 13, Jan. 2022, Art. no. 215013272110583.
[4] S. Selvakani, K. Vasumathi, and V. Aadhiseshan, "Application of machine learning in predicting heart disease," Asian Basic Appl. Res. J., vol. 5, pp. 61–68, Apr. 2023.
[5] A. Chaurasia, "Ensemble technique to predict heart disease using machine learning classifiers," Netw. Biol., vol. 13, no. 1, p. 1, 2023.
[6] G. N. Ahamad, Shafiullah, H. Fatima, Imdadullah, S. M. Zakariya, M. Abbas, M. S. Alqahtani, and M. Usman, "Influence of optimal hyperparameters on the performance of machine learning algorithms for predicting heart disease," Processes, vol. 11, no. 3, p. 734, Mar. 2023.
[7] A. Singh and N. Prakash, "A review of AI models for prediction and detecting heart disease for improved wellbeing," Vivekananda J. Res., vol. 10, pp. 14–25, Oct. 2021.
[8] S. W. Ali, M. Asif, M. Rashid, S. Tanvir, S. Shams, and S. Abid, "Detection of crackle and wheeze in lung sound using machine learning technique for clinical decision support system," Vawkum Trans. Comput. Sci., vol. 11, no. 1, pp. 67–78, 2023.
[9] M. A. Elsadig, A. Altigani, and H. T. Elshoush, "Breast cancer detection using machine learning approaches: A comparative study," Int. J. Electr. Comput. Eng., vol. 13, no. 1, p. 736, Feb. 2023.
[10] V. R. Allugunti, "Breast cancer detection based on thermographic images using machine learning and deep learning algorithms," Int. J. Eng. Comput. Sci., vol. 4, no. 1, pp. 49–56, Jan. 2022.
[11] B. S. Abunasser, M. R. J. Al-Hiealy, I. S. Zaqout, and S. S. Abu-Naser, "Breast cancer detection and classification using deep learning Xception algorithm," Int. J. Adv. Comput. Sci. Appl., vol. 13, no. 7, pp. 223–228, 2022.
[12] T. O. Oladele, B. J. Olorunsola, T. O. Aro, H. B. Akande, and O. A. Olukiran, "Nature-inspired meta-heuristic optimization algorithms for breast cancer diagnostic model: A comparative study," FUOYE J. Eng. Technol., vol. 6, no. 1, pp. 26–29, Mar. 2021.
[13] R. O. Ogundokun, S. Misra, M. Douglas, R. Damaševičius, and R. Maskeliūnas, "Medical Internet-of-Things based breast cancer diagnosis using hyperparameter-optimized neural networks," Future Internet, vol. 14, no. 5, p. 153, May 2022.
[14] B. Sahu, S. Mohanty, and S. Rout, "A hybrid approach for breast cancer classification and diagnosis," ICST Trans. Scalable Inf. Syst., vol. 6, no. 20, Jul. 2018, Art. no. 156086.
[15] B. J. Olorunsola, T. O. Oladele, T. O. Aro, H. Babalola, and O. A. Olukiran, "Performance comparison of selected swarm intelligence algorithms on breast cancer diagnosis," Afr. J. MIS, vol. 3, no. 1, pp. 5–21, 2021.
[16] Z. Guo, L. Xu, and N. A. Asgharzadeholiaee, "A homogeneous ensemble classifier for breast cancer detection using parameters tuning of MLP neural network," Appl. Artif. Intell., vol. 36, no. 1, Dec. 2022, Art. no. 2031820.
[17] X. Jia, X. Sun, and X. Zhang, "Breast cancer identification using machine learning," Math. Problems Eng., vol. 2022, pp. 1–8, Oct. 2022.
[18] H. Huang, X. Feng, S. Zhou, J. Jiang, H. Chen, Y. Li, and C. Li, "A new fruit fly optimization algorithm enhanced support vector machine for diagnosis of breast cancer based on high-level features," BMC Bioinf., vol. 20, no. S8, pp. 1–14, Jun. 2019.
[19] A. Gupta, R. Kumar, H. S. Arora, and B. Raman, "C-CADZ: Computational intelligence system for coronary artery disease detection using Z-Alizadeh Sani dataset," Int. J. Speech Technol., vol. 52, no. 3, pp. 2436–2464, Feb. 2022.
[20] Y. A. Z. A. Fajri, W. Wiharto, and E. Suryani, "Hybrid model feature selection with the bee swarm optimization method and Q-learning on the diagnosis of coronary heart disease," Information, vol. 14, no. 1, p. 15, Dec. 2022.
[21] S. Zhang, Y. Yuan, Z. Yao, J. Yang, X. Wang, and J. Tian, "Coronary artery disease detection model based on class balancing methods and LightGBM algorithm," Electronics, vol. 11, no. 9, p. 1495, May 2022.
[22] J. Hassannataj Joloudari, F. Azizi, M. A. Nematollahi, R. Alizadehsani, E. Hassannatajjeloudari, I. Nodehi, and A. Mosavi, "GSVMA: A genetic support vector machine ANOVA method for CAD diagnosis," Frontiers Cardiovascular Med., vol. 8, p. 2178, Feb. 2022.
[23] B. Kolukisa and B. Bakir-Gungor, "Ensemble feature selection and classification methods for machine learning-based coronary artery disease diagnosis," Comput. Standards Interfaces, vol. 84, Mar. 2023, Art. no. 103706.
[24] B. Kolukisa, L. Yavuz, A. Soran, B.-G. Burcu, D. Tuncer, A. Onen, and V. C. Gungor, "Coronary artery disease diagnosis using optimized adaptive ensemble machine learning algorithm," Int. J. Bioscience, Biochemistry Bioinf., vol. 10, no. 1, pp. 58–65, 2020.
[25] A. Singh and A. Payal, "CAD diagnosis by predicting stenosis in arteries using data mining process," Intell. Decis. Technol., vol. 15, no. 1, pp. 59–68, Mar. 2021.
[26] S. Amutha and J. R. Sekar, "An optimized framework for diabetes mellitus diagnosis using grid search based support vector machine," in Proc. Int. Conf. Comput., Commun., Signal Process. Cham, Switzerland: Springer, Jan. 2023, pp. 153–167.
[27] A. C. Ramachandra and D. Murthy, "Diabetes prediction using machine learning approach," Strad Res., vol. 10, no. 8, 2023.
[28] S. Gill and P. Pathwar, "Prediction of diabetes using various feature selection and machine learning paradigms," in Modern Approaches in Machine Learning & Cognitive Science: A Walkthrough. Cham, Switzerland: Springer, 2022, pp. 133–146.
[29] P. Rajendra and S. Latifi, "Prediction of diabetes using logistic regression and ensemble techniques," Comput. Methods Programs Biomed. Update, vol. 1, Jan. 2021, Art. no. 100032.
[30] J. Dhar, "Multistage ensemble learning model with weighted voting and genetic algorithm optimization strategy for detecting chronic obstructive pulmonary disease," IEEE Access, vol. 9, pp. 48640–48657, 2021.
[31] R. R. Irshad, S. Hussain, S. S. Sohail, A. S. Zamani, D. Ø. Madsen,
[47] Y. Wan, Z. Wang, and T.-Y. Lee, "Incorporating support vector machine with sequential minimal optimization to identify anticancer peptides," BMC Bioinf., vol. 22, no. 1, p. 286, May 2021.
[48] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Netw. (ICNN), vol. 4, Aug. 2002, pp. 1942–1948.
[49] A. Singh and A. Jain, "Financial fraud detection using bio-inspired key optimization and machine learning technique," Int. J. Secur. Appl., vol. 13, no. 4, pp. 75–90, Dec. 2019, doi: 10.33832/ijsia.2019.13.4.08.
[50] J. S. Park, K. Kim, J. H. Kim, Y. J. Choi, K. Kim, and D. I. Suh, "A machine learning approach to the development and prospective evaluation of a pediatric lung sound classification model," Sci. Rep., vol. 13, no. 1, p. 1289, Jan. 2023.

AKANSHA SINGH received the B.Tech. and M.Tech. degrees in computer science and engineering from Guru Gobind Singh Indraprastha University (GGSIPU), Delhi, India, in 2017 and 2020, respectively, where she is currently pursuing the Ph.D. degree in computer science and engineering. Her research interests include machine learning, computational metaheuristic models, deep learning, bioinformatics, and data mining. She was a recipient of two best paper
A. A. Alattab, A. A. A. Ahmed, K. A. A. Norain, and O. A. S. Alsaiari, awards at an international and national conference respectively. Her honors
‘‘A novel IoT-enabled healthcare monitoring framework and improved
include the Short Term Research Fellowship (STRF) from GGSIPU and the
grey wolf optimization algorithm-based deep convolution neural network
STEM Fellowship.
model for early diagnosis of lung cancer,’’ Sensors, vol. 23, no. 6, p. 2932,
Mar. 2023.
[32] P. S. Zarrin, N. Roeckendorf, and C. Wenger, ‘‘In-vitro classification of NUPUR PRAKASH received the B.E. degree
saliva samples of COPD patients and healthy controls using machine in electronics and communication and the M.E.
learning tools,’’ IEEE Access, vol. 8, pp. 168053–168060, 2020.
degree in computer science and technology from
[33] G. Petmezas, G.-A. Cheimariotis, L. Stefanopoulos, B. Rocha, R. P. the University of Roorkee (now IIT Roorkee), in
Paiva, A. K. Katsaggelos, and N. Maglaveras, ‘‘Automated lung sound
1981 and 1986, respectively, and the Ph.D. degree
classification using a hybrid CNN-LSTM network and focal loss function,’’
from Punjab University, in 1998. She is currently
Sensors, vol. 22, no. 3, p. 1232, Feb. 2022.
a Professor with the Department of Computer
[34] S. W. Ali, M. Asif, M. Rashid, S. Tanvir, S. Shams, and S. Abid, ‘‘Detection
of crackle and wheeze in lung sound using machine learning technique for
Science and Engineering and holds the position of
clinical decision support system,’’ Vawkum Trans. Comput. Sci., vol. 11, the Vice Chancellor of The Northcap University,
no. 1, pp. 67–78, 2023. Gurgaon, India. Prior to joining The NorthCap
[35] ICBHI Dataset. Accessed: Jun. 20, 2023. [Online]. Available: University, she was the Vice-Chancellor of Indira Gandhi Delhi Technical
https://paperswithcode.com/dataset/icbhi-respiratory-sound-database University for Women; the Principal of the Indira Gandhi Institute of
[36] WBCD. Accessed: Jun. 20, 2023. [Online]. Available: Technology, Delhi; the Dean of the School of Engineering and Technology;
https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data and the Dean of the School of ICT, Guru Gobind Singh Indraprastha
[37] Z-Alizadehsani Dataset. Accessed: Jun. 20, 2023. [Online]. Available: University, Government of Delhi. She has been a strong propagator of STEM
https://archive.ics.uci.edu/dataset/extention-of-z-alizadehsani-dataset education among girls and has won many awards and accolades. She has
[38] EXASENS. Accessed: Jun. 20, 2023. [Online]. Available: guided 12 Ph.D. scholars and authored more than 100 research papers and
https://archive.ics.uci.edu/dataset/523/exasens articles in various national and international journals/conferences of repute.
[39] Diabetes Prediction Dataset. Accessed: Sep. 20, 2023. [Online]. Available: Her H-index and i10 index are 17 and 30, respectively, with 1844 citations.
https://data.world/informatics-edu/diabetes-prediction Her research interests include artificial neural networks, natural language
[40] M. Zhang, M. Li, L. Guo, and J. Liu, ‘‘A low-cost AI-empowered processing, mobile communication, secure wireless networks, and machine
stethoscope and a lightweight model for detecting cardiac and respiratory learning algorithms. She is a Life Member of the Computer Society of India
diseases from lung and heart auscultation sounds,’’ Sensors, vol. 23, no. 5, (CSI) and a Former Member of the IEEE Women in Engineering (WIE),
p. 2591, Feb. 2023. USA. She has chaired various expert committees of UGC, NBA, and NAAC.
[41] C. Wall, L. Zhang, Y. Yu, A. Kumar, and R. Gao, ‘‘A deep ensemble neural
network with attention mechanisms for lung abnormality classification
using audio inputs,’’ Sensors, vol. 22, no. 15, p. 5566, Jul. 2022. ANURAG JAIN received the M.Tech. degree from
[42] A. Mohamed, E. Amer, S. N. Eldin, J. Khaled, and M. Hossam, IIT Kharagpur and the Ph.D. degree from Guru
‘‘The impact of data processing and ensemble on breast cancer detection Gobind Singh Indraprastha University, Delhi,
using deep learning,’’ J. Comput. Commun., vol. 1, no. 1, pp. 27–37, India.
Feb. 2022. He is currently a Professor with Guru Gob-
[43] X. Wang, I. Ahmad, D. Javeed, S. Zaidi, F. Alotaibi, M. Ghoneim, ind Singh Indraprastha University. He is doing
Y. Daradkeh, J. Asghar, and E. Eldin, ‘‘Intelligent hybrid deep learning research in the areas of healthcare, cybersecurity,
model for breast cancer detection,’’ Electronics, vol. 11, no. 17, p. 2767, and speech processing. He has also been involved
Sep. 2022. in identifying the importance of ML and data
[44] H. Mohammedqasim, R. Mohammedqasem, O. Ata, and E. I. Alyasin, science in his research domain. He has published
‘‘Diagnosing coronary artery disease on the basis of hard ensemble voting many national and international research papers in many reputed journals
optimization,’’ Medicina, vol. 58, no. 12, p. 1745, Nov. 2022. and conferences. His i10 index is 14 with nearly 675 citations. His research
[45] Vanderbilt Diabetes Datasets. Accessed: Sep. 20, 2023. [Online]. Avail- interests include speech processing, natural language processing, artificial
able: https://hbiostat.org/data/ intelligence, machine learning, and data mining in the healthcare domain.
[46] J. Kudela, ‘‘The evolutionary computation methods no one should use,’’ Prof. Jain is a Life Member of the Computer Society of India (CSI).
2023, arXiv:2301.01984.