
Neuroscience Informatics 4 (2024) 100169

Contents lists available at ScienceDirect

Neuroscience Informatics
journal homepage: www.elsevier.com/locate/neuri

Original article

An ensemble machine learning-based approach to predict cervical cancer using hybrid feature selection

Khandaker Mohammad Mohi Uddin a,*, Abdullah Al Mamun b, Anamika Chakrabarti b, Rafid Mostafiz c, Samrat Kumar Dey d

a Department of Computer Science and Engineering, Southeast University, Bangladesh
b Department of Computer Science and Engineering, Dhaka International University, Bangladesh
c Institute of Information Technology, Noakhali Science and Technology University, Bangladesh
d School of Science and Technology, Bangladesh Open University, Bangladesh

a r t i c l e  i n f o

Article history:
Received 27 May 2024
Received in revised form 2 August 2024
Accepted 5 August 2024

Keywords:
Cervical cancer
Machine learning
SelectKBest
XGBoost
PCA
Random forest
Multilayer perceptron
Voting classifier

a b s t r a c t

Cervical cancer has recently emerged as a leading cause of premature death among women. Around 85% of cervical cancer cases occur in underdeveloped countries. There are several risk factors associated with cervical cancer. This study describes a novel predictive model that uses early screening and risk trends from individual health records to forecast cervical cancer patients' prognoses. The study applies machine learning classification techniques to investigate the risk factors for cervical cancer and additionally uses a voting method to evaluate all models and select the most appropriate one. The dataset used in this study contains missing values and shows a significant class imbalance, so the Random Oversampling technique was used as a sampling method. We used Principal Component Analysis (PCA) and XGBoost feature selection techniques to determine the most important features. To predict the outcome, we used several machine learning classifiers, including Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), AdaBoost (AdB), Gradient Boosting (GB), Multilayer Perceptron (MLP), and Nearest Centroid Classifier (NCC). To demonstrate the efficacy of the suggested model, a comparison of its accuracy, sensitivity, and specificity was performed. Using the Random Oversampling approach together with an ensemble ML method (hard voting on RF and MLP), we achieved 99.19% accuracy. The results demonstrate that the ensemble ML classifier (hard voting) handles the classification problem better when the feature set is reduced and the high class-imbalance problem is addressed.
© 2024 The Author(s). Published by Elsevier Masson SAS. This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

According to the World Health Organization (WHO), women's cancers such as ovarian, bone, breast, and cervical cancer are the leading causes of premature female death [1]. Cervical cancer, also known as cancer of the cervix, is a malignant tumor that develops when the tissue cells wrapping around the cervix grow and multiply uncontrollably without following the normal cell division process [2]. Cervical cancer was expected to be the third most common disease among women worldwide by 2020, with 604,000 new cases reported. Nearly 90% of the expected 342,000 cervical cancer deaths in 2020 will occur in middle- to low-income countries, and the risk of cervical cancer is six times higher for women with HIV than for women without HIV, with HIV accounting for 5% of all cervical cancer cases [3,4]. Furthermore, HIV causes an abnormally high rate of cervical cancer worldwide. The primary risk factor for cervical cancer is a persistent infection with high-risk types of HPV. Human Papillomavirus (HPV) is a group of viruses that can infect the skin and mucous membranes. There are over 200 different types of HPV, which are divided into low-risk and high-risk categories. Low-risk HPV types can cause common skin or genital warts, but they are rarely associated with cancer. High-risk HPV types have been linked to a variety of cancers, including cervical cancer and cancers of the vulva, vagina, penis, anus, and throat. HPV is most commonly transmitted through intimate skin-to-skin contact, but it can also spread through other forms of close contact. Vaccines are available to prevent infection with the most common high-risk types of HPV, significantly lowering the risk of associated cancers; these high-risk types are thought to be behind almost all cases of cervical cancer.

* Corresponding author.
E-mail addresses: [email protected] (K.M.M. Uddin), [email protected] (A. Al Mamun), [email protected] (A. Chakrabarti), rafi[email protected] (R. Mostafiz), [email protected] (S.K. Dey).

https://doi.org/10.1016/j.neuri.2024.100169
2772-5286/© 2024 The Author(s). Published by Elsevier Masson SAS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Fig. 1. Cervical Cancer: Symptoms.

Smoking is another major risk factor for cervical cancer. There is compelling evidence that women who have used the pill for five years or more are significantly more likely to develop cervical cancer if they are HPV-positive; at the same time, taking the pill appears to reduce the risk of uterine and ovarian cancers. Smoking and passive smoking have other negative effects as well, such as a weakened immune system. At some point in their lives, approximately 8 out of 10 women acquire an HPV infection and are therefore at risk of developing cervical cancer; however, most HPV infections do not lead to cervical cancer [5,6]. In 2023, an estimated 13,960 US women may be diagnosed with invasive cervical cancer, the incidence having fallen by roughly 50% since the mid-1970s, and 4,310 deaths are expected in the US. The average age at diagnosis is 50, with more than 20% of cases discovered after the age of 65 [7]. A wealth of knowledge about cancer has been accumulated as a result of advancements in medical technology, and it is now readily available to a wide range of medical researchers [8]. Machine learning researchers are constantly striving to improve the cervical cancer data that is now available and can be studied using predictive models. Predictive models developed using machine learning have accelerated the diagnosis of cervical cancer [9]. However, one or more of the following challenges continue to limit these models' ability to predict cervical cancer outcomes: the absence of dimension reduction techniques, of resampling and data balancing methods to address skewed data, and of measures to resolve overfitting in Decision Tree (DT) models. Dimensionality reduction refers to techniques used to reduce the number of features or variables in a dataset while retaining its essential characteristics; common methods include Principal Component Analysis (PCA) and feature selection techniques. A decision tree is a type of model used in data analysis and machine learning [10-12]. Furthermore, an innovative approach to developing a machine learning model may provide the opportunity to combat cervical cancer more effectively and improve the future for girls and women. Based on the patient's risk variables and preliminary screening findings from each person's medical data, this work proposes a machine learning model for projecting how cervical cancer may progress in a patient. This work makes a significant contribution to the field of healthcare by presenting a prediction model that can aid in the accurate identification of cervical cancer using ML approaches. Other cancers, such as blood, breast, and prostate cancers, could also be diagnosed using the described technique in a medical setting. The primary goal of this study is to forecast the outcomes of biopsies performed on cervical cancer patients. When evaluating the model's performance, sensitivity takes precedence over other metrics such as accuracy because it reduces the chance of missing a positive result in patients who actually have cervical cancer.
Fig. 1 shows the common symptoms of cervical cancer.

The following noteworthy contributions are made in the current study. A straightforward, accurate, and efficient machine-learning-based cervical cancer prediction model has been developed.

• Different feature representations were applied to acquire a deeper understanding of their role in the risk prediction of cervical cancer.
• Missing value management, data encoding, feature selection, and resampling are the steps in data preprocessing.
• The ROS technique is used to balance the data and avoid bias.
• Optimal feature selection is done by utilizing XGBoost, SelectKBest, PCA, or a combination of XGBoost and PCA.
• By deploying numerous classifiers instead of a single machine learning method, ensemble learning guarantees the correctness of the prediction model.

2. Related work

This section focuses on different prediction algorithms presented in recent years for predicting cervical cancer, categorizing them as machine learning-based approaches.

Jiayi Lu et al. [13] introduced an innovative ensemble technique tailored to estimate the risk of cervical cancer. Their approach addresses shortcomings in traditional voting-based historical analyses by advocating for a data rectification process to enhance prediction accuracy. Additionally, they explore the potential benefits of gene assistance modules in refining predictability. Through rigorous measurements, their method, utilizing the voting approach, demonstrates promising accuracy in predicting cervical cancer risk.

Nithya et al. [14] aimed to deepen understanding of cervical cancer risk factors by leveraging machine learning (ML) techniques in the R environment. They systematically investigated various feature selection techniques to identify pivotal elements for accurate prediction. Their approach involves iterative cycles of model training, employing diverse feature selection methods to uncover essential features and establish an optimal feature selection model.

Akter et al. [15] employed three ML models, Decision Trees (DT), Random Forests (RF), and XGBoost, to forecast cervical cancer risk based on behavioral and feature data. Their study showcases significant performance improvements over existing techniques, achieving an impressive 93.33% accuracy.

Asadi et al. [16] reaffirmed the potential of ML in enhancing cervical cancer prediction. Their findings underscore the effectiveness of Decision Tree (DT) algorithms in discerning crucial predictors, contributing to improved predictive accuracy.


Fig. 2. The visual representation of the proposed method.

Sujay et al. [17] applied ML methodologies to expedite and refine the determination process, particularly in swiftly identifying cancerous samples. By leveraging high-resolution biopsy data, their study showcases the Bayes Net approach, achieving an impressive Area Under the Curve (AUC) of 95% and an accuracy of 96.38% after training various ML models.

I. J. Ratul et al. [18] presented supervised ML techniques aimed at anticipating early signs of cervical cancer using a dataset from the UCI Machine Learning repository. Their study rigorously evaluates performance measures including Accuracy, Recall, F1-score, Precision, and ROC-AUC, providing valuable insights into early threat prediction.

Tanimu et al. [9] proposed a decision tree classification approach for analyzing cervical cancer risk factors. Through meticulous examination of Recursive Feature Elimination (RFE), Least Absolute Shrinkage and Selection Operator (LASSO), and other feature selection methods, they identify pertinent qualities crucial for cervical cancer identification.

3. Materials and method

Fig. 2 shows the complete workflow diagram for the proposed paradigm. At first, the data were pre-processed for missing value handling and data encoding to convert categorical variables into numerical values. Several feature selection approaches are then used to extract the most important features from the dataset, and a resampling method is used to balance the dataset. Following that, we divided the dataset into 80% for training and 20% for testing. Various machine-learning techniques are then applied to the training data to train the models, and the trained models are run on the test dataset before applying the voting classifier. The confusion matrix is used to assess the effectiveness of our model. Finally, we compare the results to determine the best model. A minimal sketch of this workflow is given below.
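The following sketch illustrates the overall workflow under stated assumptions: the file name risk_factors.csv, the chosen classifiers, and the split parameters are illustrative assumptions rather than the exact configuration reported by the authors.

```python
# Illustrative end-to-end workflow: preprocess, split 80/20, train several
# classifiers, and compare their test accuracy. Names and parameters are assumptions.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Load the risk-factor data (hypothetical local CSV export of the UCI dataset).
data = pd.read_csv("risk_factors.csv").replace("?", np.nan).astype(float)
X = data.drop(columns=["Biopsy"])   # Biopsy is the target used in this study
y = data["Biopsy"]
X = X.fillna(X.median())            # simple missing-value handling

# 80% training / 20% testing, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

models = {
    "RF": RandomForestClassifier(random_state=42),
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))
```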
3.1. Dataset description

The Hospital 'Universitario de Caracas' in Caracas, Venezuela, donated the information for this dataset. The Risk Factors dataset can be found in the UCI ML repository [19]; it is a public dataset with 858 patient samples and 36 attributes. Among the 36 features, the main ones are Age, Number_of_sexual_partners, First_sexual_intercourse, Num_of_pregnancies, Smokes, Smokes_years, Smokes_packs_per_year, IUD, IUD_years, …, Dx_Cancer, Dx_CIN, Dx_HPV, Dx, Hinselmann, Schiller, Citology, and Biopsy. Hinselmann, Schiller, Citology, and Biopsy are the four possible target features; in this study, we use Biopsy as our main target outcome. Based on the biopsy result, Fig. 3 shows the class distribution of patients, where 803 are non-cancerous and 55 are cancerous.
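A hedged sketch of inspecting the UCI risk-factors data follows; the file name and the use of '?' as the missing-value marker reflect the common CSV export of this dataset and are assumptions, not details stated in the paper.

```python
# Inspect the Cervical Cancer (Risk Factors) data: shape, class balance, missingness.
import pandas as pd

# '?' marks missing entries in the public CSV export (assumption).
df = pd.read_csv("risk_factors.csv", na_values="?")
print(df.shape)                      # expected: (858, 36)
print(df["Biopsy"].value_counts())   # expected: 803 non-cancerous, 55 cancerous
print(df.isna().sum().sort_values(ascending=False).head())  # columns with the most missing values
```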
3.2. Dataset preprocessing

To achieve maximum accuracy, the dataset needs to be processed before being fitted to the ML models. Pre-processing techniques are used to address problems such as missing values, outliers, label encoding, and others. Missing values are present in this dataset, and it needs to be cleaned before applying the machine learning models. When a column contains missing values, the mean or median of the column's non-missing values is used as the imputation to fill in the blanks. The number of dimensions in the training dataset is reduced using PCA, and the most crucial variables in the dataset are chosen using the feature selection method. Fig. 5 shows the numerical column representation of Age, Number_of_sexual_partners, Num_of_pregnancies, Smokes, Dx_Cancer, Dx, Hinselmann, Schiller, and Citology, which are among the most important values in the dataset.
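A minimal sketch of the mean/median imputation step described above, assuming numeric columns and using scikit-learn's SimpleImputer; the choice of which columns receive mean versus median imputation is an assumption.

```python
# Fill missing values with the median (robust to outliers) or the mean of each column.
from sklearn.impute import SimpleImputer

median_imputer = SimpleImputer(strategy="median")
X_imputed = median_imputer.fit_transform(X)   # X: numeric feature matrix containing NaNs

# Equivalent pandas one-liners:
# X = X.fillna(X.median())   # median imputation
# X = X.fillna(X.mean())     # mean imputation
```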


Fig. 3. Cancerous and non-cancerous patients.

Table 1
Important attributes selected by PCA.

Priority  Features
1   Citology
2   Dx
3   Dx_Cancer
4   Dx_CIN
5   Dx_HPV
6   Hinselmann
7   Schiller
8   STDs_AIDS
9   STDs_cervical_condylomatosis
10  STDs_genital_herpes
11  STDs_Hepatitis_B
12  STDs_HPV
13  STDs_molluscum_contagiosum
14  STDs_Number_of_diagnosis
15  STDs_pelvic_inflammatory_disease
16  STDs_syphilis
17  STDs_Time_since_first_diagnosis
18  STDs_Time_since_last_diagnosis
19  STDs_vaginal_condylomatosis
20  Biopsy

Table 2
Selected features (XGBoost).

Priority  Attributes                         Importance
1   Schiller                                 0.372705
2   Hinselmann                               0.160002
3   Citology                                 0.076914
4   Dx_Cancer                                0.055241
5   STDs_syphilis                            0.052083
6   Dx                                       0.036602
7   Dx_HPV                                   0.029568
8   STDs_Time_since_first_diagnosis          0.026331
9   Num_of_pregnancies                       0.025440
10  Smokes_packs_per_year                    0.022946
11  Hormonal_Contraceptives_years            0.018842
12  STDs_Number_of_diagnosis                 0.017675
13  STDs_number                              0.016101
14  IUD_years                                0.013764
15  STDs_Time_since_last_diagnosis           0.012161
16  Age                                      0.010484
17  Smokes_years                             0.010361
18  Number_of_sexual_partners                0.008126
19  Hormonal_Contraceptives                  0.007358
20  STDs                                     0.006977

Table 3
Important features using SelectKBest.

Priority  Features
1   STDs_cervical_condylomatosis
2   STDs_vaginal_condylomatosis
3   STDs_syphilis
4   STDs_pelvic_inflammatory_disease
5   STDs_genital_herpes
6   STDs_molluscum_contagiosum
7   STDs_AIDS
8   STDs_Hepatitis_B
9   STDs_HPV
10  STDs_Number_of_diagnosis
11  STDs_Time_since_first_diagnosis
12  STDs_Time_since_last_diagnosis
13  Dx_Cancer
14  Dx_CIN
15  Dx_HPV
16  Dx
17  Hinselmann
18  Schiller
19  Citology
20  STDs_cervical_condylomatosis
3.2.1. Feature selection

Feature selection is employed in the ML process to increase accuracy. Selecting the most crucial variables and eliminating the redundant and unnecessary ones also increases the algorithm's ability for prediction. In this study, we use PCA, SelectKBest, and XGBoost to pick the most important variables.

a) PCA

Principal component analysis, a popular technique, is used to analyze large datasets with a high number of characteristics per observation, improving the interpretability of the data while preserving the most information. The ability to view multidimensional data is another PCA benefit. PCA is a formal statistical technique for reducing a dataset's dimensionality [20]. Table 1 shows the attributes selected through the PCA feature selection approach.

b) XGBoost

In the machine learning process, feature selection is utilized to increase accuracy. The algorithm's ability to forecast is further enhanced by selecting the most crucial variables and eliminating the redundant and unnecessary ones. We utilized XGBoost to determine which factors were most important for our investigation. XGBoost is a potent machine-learning algorithm based on gradient-boosting decision trees. The basic idea is to combine the predicted results from many separate decision tree models that have undergone multiple training rounds. The approach prioritizes samples with high error rates and alters the contribution of each underlying decision tree model by weighting it to gradually improve performance at each step of the model [21]. The features chosen using the XGBoost feature selection approach are shown in Table 2.

c) SelectKBest

SelectKBest is a feature selection method that chooses features based on the highest scores of a given scoring function. As the scoring function, the scikit-learn library's mutual_info_classif is employed; it ranks characteristics based on their interdependence [22]. The important features chosen using the SelectKBest feature selection approach are shown in Table 3.
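The sketch below shows, under stated assumptions, how the three selectors described above can be applied with scikit-learn and XGBoost; the value k=20 mirrors the 20-feature tables, while the other parameters and the reuse of X_train/y_train from the earlier sketch are assumptions.

```python
# Rank features with SelectKBest (mutual information), XGBoost importances, and PCA.
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import PCA
from xgboost import XGBClassifier

# SelectKBest with mutual_info_classif, keeping the top 20 features.
skb = SelectKBest(score_func=mutual_info_classif, k=20)
X_kbest = skb.fit_transform(X_train, y_train)
kbest_features = X_train.columns[skb.get_support()]

# XGBoost feature importances, sorted in descending order (cf. Table 2).
xgb = XGBClassifier(eval_metric="logloss", random_state=42).fit(X_train, y_train)
xgb_ranking = pd.Series(xgb.feature_importances_, index=X_train.columns).sort_values(ascending=False)

# PCA retaining 20 components for dimensionality reduction.
pca = PCA(n_components=20)
X_pca = pca.fit_transform(X_train)
```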
d) XGBoost + PCA

Combining XGBoost with PCA can be a strong tool for improving the performance of our machine-learning model. PCA is commonly used for dimensionality reduction, which can help alleviate the curse of dimensionality and improve XGBoost's efficiency. PCA is first used to minimize the number of dimensions in the feature space; this entails computing the principal components and picking a subset of them based on the desired level of variance retention. After that, an XGBoost classifier is trained with the PCA-transformed features. Table 4 shows the attributes selected through the XGBoost + PCA feature selection approach, and Fig. 4 shows the correlation between the selected features.
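A minimal sketch of chaining PCA in front of an XGBoost classifier, assuming the scikit-learn Pipeline API; the number of components and the classifier settings are assumptions.

```python
# Chain PCA dimensionality reduction with an XGBoost classifier.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from xgboost import XGBClassifier

pca_xgb = Pipeline([
    ("pca", PCA(n_components=20)),   # keep 20 components (assumption)
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
])
pca_xgb.fit(X_train, y_train)
print("Test accuracy:", pca_xgb.score(X_test, y_test))
```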


Table 4
Important features using XGBoost + PCA.

Priority  Features
1   Dx
2   Schiller
3   Citology
4   Hinselmann
5   STDs_AIDS
6   Dx_CIN
7   Dx_HPV
8   STDs_HPV
9   Smokes_packs_per_year
10  Smokes_years
11  Dx_Cancer
12  STDs_vaginal_condylomatosis
13  Hormonal_Contraceptives
14  STDs_molluscum_contagiosum
15  STDs_cervical_condylomatosis
16  IUD_years
17  STDs_Number_of_diagnosis
18  STDs_Time_since_first_diagnosis
19  STDs_Time_since_last_diagnosis
20  Biopsy
3.2.2. Imbalanced data handling

This dataset has 803 patients who are non-cancerous and 55 patients who are cancerous. An oversampling process is used to handle this dataset, and Fig. 6 shows the visualization of the balanced data. This method's main goal is to replicate minority class samples at random [23]. By replicating the initial samples, Random Oversampling (ROS) enlarges the dataset; the essential point is that sample variety is maintained, since ROS does not produce fresh samples [24].

The number of patients with cancer was matched to the number of patients without cancer, resulting in 803 patients in each group. It is crucial to remember that this modification was made for analytical reasons. Data is frequently unbalanced in real-world situations, and balancing techniques are used to correct this imbalance.

In order to balance the dataset, duplicate samples from the original data are created via random oversampling. Although this method can help address class imbalances, the oversampled data lacks variability, so it may cause overfitting, a situation in which the model performs well on training data but poorly on unseen data. To address this problem, it is important to consider additional strategies such as data augmentation, more sophisticated oversampling techniques (like SMOTE), or ensemble methods that can lessen overfitting and enhance model generalization.

Since random oversampling was used to balance the cancerous data with the non-cancerous data, there is a risk of data leakage, where the same data points may inadvertently appear in both the training and testing subsets. To address this, it is crucial to ensure that the training and testing data are completely separated, as in the sketch below.
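A hedged sketch of the oversampling step using imbalanced-learn's RandomOverSampler; applying it only to the training split (after train_test_split) is the leakage-free ordering implied above, and the random_state values are assumptions.

```python
# Oversample the minority (cancerous) class on the training split only,
# so that no duplicated sample can leak into the test set.
from collections import Counter
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

ros = RandomOverSampler(random_state=42)
X_train_bal, y_train_bal = ros.fit_resample(X_train, y_train)

print("Before:", Counter(y_train))      # imbalanced: many negatives, few positives
print("After: ", Counter(y_train_bal))  # both classes equally represented
```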
3.3. Machine learning classifier

In this study, performance has been evaluated using 10 classifiers: SVM, RF, KNN, DT, NB, LR, AdB, GB, NCC, and MLP. Following that, voting is applied to the models with the best accuracy among all classifiers.

3.3.1. Support vector machine

A Support Vector Machine is a supervised learning algorithm that is used for classification and regression applications. Its major goal is to select the appropriate hyperplane for separating data points into distinct classes [25]. Through the use of kernel functions, SVMs work well in high-dimensional spaces and are successful for both linearly and non-linearly separable datasets [26]. The fundamental concept underlying the support vector machine is to identify a hyperplane that maximizes the margin, that is, the distance between the hyperplane and the closest data points from each class, the support vectors [27]. Equation (1) represents the linear SVM classifier.

f(a) = sign(x · a + b)    (1)

Here,

• f(a) is the decision function that predicts the class of the input a.
• x is the weight vector perpendicular to the hyperplane.
• a is the input feature vector.
• b is the bias or intercept term.

The hyperplane or decision boundary dividing the classes is represented by the equation (x · a) + b = 0 [28]. The data points nearest to this hyperplane are called support vectors.

3.3.2. Random forest

RF is a group of decision trees that uses the consensus of several samples to increase the level of accuracy on a provided dataset. It can tackle challenges with both classification and regression [29]. In comparison to a single decision tree, the Random Forest approach decreases overfitting and improves accuracy by merging several weak models into a single, stronger model [30]. Equation (2) represents the RF classifier.

R(a) = Σ_{R=1}^{A} X_R 1(i ∈ B_R) = Σ_{R=1}^{A} X_R φ(i; z_R)    (2)

This is the basic decision tree in RF, where X_R is the R-th region's typical response, B_R and z_R are the parameters that the split is made on, and φ(i; z_R) is the corresponding indicator function.

3.3.3. K-Nearest Neighbor

Among supervised learning techniques, the non-parametric KNN method detects similarities with the training set [31]. For classification and regression applications, the KNN technique is a popular and straightforward supervised learning algorithm. It is an instance-based, non-parametric algorithm that categorizes new instances according to how similar they are to the ones that already exist in the training dataset [32]. Equations (3) to (5) represent the distance functions used by the KNN classifier.

Euclidean = sqrt( Σ_{i=1}^{a} (b_i − c_i)² )    (3)

Manhattan = Σ_{i=1}^{a} |b_i − c_i|    (4)

Minkowski = ( Σ_{i=1}^{a} |b_i − c_i|^d )^{1/d}    (5)

KNN is commonly based on distance measures such as the Euclidean, Manhattan, and Minkowski distances. These functions evaluate the separation between the current data point and all other data points, where b and c are two instances in the dataset, a is the number of features, and b_i and c_i are the values of the i-th feature in instances b and c, respectively.
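As a small illustration of Equations (3) to (5), the sketch below computes the three distances with NumPy; the sample vectors are made-up values, not data from the study.

```python
# Distance functions used by KNN, computed for two illustrative feature vectors.
import numpy as np

b = np.array([1.0, 2.0, 3.0])
c = np.array([2.0, 0.0, 3.5])
d = 3  # Minkowski order (d = 2 recovers the Euclidean distance)

euclidean = np.sqrt(np.sum((b - c) ** 2))
manhattan = np.sum(np.abs(b - c))
minkowski = np.sum(np.abs(b - c) ** d) ** (1.0 / d)
print(euclidean, manhattan, minkowski)
```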


Fig. 4. Correlation of features (XGBoost + PCA).

3.3.4. Decision tree

Decision Tree is a well-known supervised machine learning technique for classification and regression tasks. It operates by recursively splitting the feature space into smaller parts and making judgments based on feature values. Each internal node represents a decision based on a feature value, and each leaf node represents the class label or regression value [33]. Equation (6) represents the DT classifier, where A′ gives the resulting decision.

A′(a_i) = [D_0(a_i) − D(a_i)] + K[Σ_0(a_i) − Σ(a_i)] + Σ_{j=1}^{b_i} x_{i+j} y_0 a_{i+j}    (6)

Assuming a dataset D = {(a_1, b_1), (a_2, b_2), …, (a_n, b_n)}, where a_i denotes the feature vector and b_i is the corresponding class label, the DT learns a function f : x → y, where x is the feature space and y is the set of class labels [34].
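A minimal sketch of fitting a decision tree on the balanced training split from the earlier sketches; the max_depth value and the scoring call are assumptions for illustration.

```python
# Fit a decision tree classifier and report its test accuracy.
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(max_depth=5, random_state=42)  # shallow tree to limit overfitting
dt.fit(X_train_bal, y_train_bal)
print("DT test accuracy:", dt.score(X_test, y_test))
```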


Fig. 5. Numerical representation of the selected columns.

3.3.5. Naive Bayes

Based on the Bayes theorem and assuming predictor independence, the Naive Bayes classifier is a probabilistic machine learning model. It is extensively employed in classification jobs, particularly in spam filtering and text classification [35]. The outcome of many complicated problems can be predicted simply and accurately by using NB [9]. It operates based on the Bayes theorem and considers each attribute independently. Equation (7) represents the NB classifier.

x(A | B) = x(A, B) / x(B) = x(B | A) x(A) / Σ_{y′=1}^{Z} x(B | A′) x(A′)    (7)

Here,

• x(A | B) is the probability of event A occurring given that event B has occurred.
• x(B | A) is the probability of event B occurring given that event A has occurred.
• x(A) and x(B) are the probabilities of events A and B occurring, respectively.
• x(A | B) also denotes the chance of a future occurrence, and x(A) is the prior probability.

3.3.6. Logistic regression

To establish a link between categorical dependent variables and independent variables, LR, a typical supervised learning approach, computes the likelihood and predicts the outcome of a categorical dependent variable [36]. Logistic regression is a common statistical technique; it represents the likelihood that an instance falls into a specific class.


Fig. 5. (continued).

Logistic Regression relies on the logistic function, often referred to as the sigmoid function [37]. Equation (8) represents the LR classifier.

P(a = b_1) = 1 / (1 + e^{−(W · b + x)})    (8)

Here,

• P(a = b_1) is the probability of the target variable a being class 1 given the input features.
• b represents the input feature vector.
• W is the weight vector.
• x is the bias or intercept term.
• e is the base of the natural logarithm.

Fig. 6. Cancerous and non-cancerous patients after using Oversampling.

3.3.7. Adaptive boost

Adaptive boosting is also known as AdaBoost. It is an ensemble method built on the boosting algorithm [38]. The algorithm aims to create a single strong classifier by combining several weak classifiers. The primary concept behind AdaBoost is to merge numerous weak classifiers into a powerful classifier. Each weak classifier focuses on the instances that the previous ones misclassified, gradually increasing the overall classification performance [39]. Equation (9) represents the AdB classifier.

B(z) = sign( Σ_{x=1}^{X} θ_x b_x(z) )    (9)

Here,

• b_x denotes the x-th weak classifier.

3.3.8. Gradient boosting

Gradient boosting refers to a group of ML techniques that merge many weak learning models to produce a potent prediction model [38]. Gradient Boosting is an ensemble learning method that sequentially combines several weak learners to create a strong prediction model. The main principle of gradient boosting is to minimize the errors of the prior models so as to enhance the predictive performance iteratively.


Specifically, for classification tasks, the Gradient Boosting classifier is employed [40]. Equation (10) represents the GB classifier.

G_x(a) = G_{x−1}(a) + η · h_x(a)    (10)

Here,

• G_x(a) represents the prediction of the ensemble model at the x-th iteration.
• G_{x−1}(a) is the prediction of the ensemble model up to the (x − 1)-th iteration.
• η is the learning rate.
• h_x(a) is the weak learner that is fit to the residuals of the previous model.

3.3.9. Nearest centroid

The Nearest Centroid Classifier is a simple and intuitive machine-learning classification algorithm. It works on the principle of locating the centroid of each class in the feature space and then classifying new data points by assigning them to the class whose centroid is closest to the data point in the feature space [41]. It entails predicting a new example's class label by identifying the class whose centroid in the training dataset it is nearest to [38]. Equation (11) represents the NC classifier.

μ_i = (1 / |A_i|) Σ_{x ∈ A_i} y_x    (11)

Here,

• A_i is the set of samples in class i, and |A_i| is the number of samples in that class.
• y_x represents the feature vector of the x-th sample in class i, and μ_i is the resulting class centroid.

3.3.10. Multilayer perceptron

The Multilayer Perceptron is a type of feedforward artificial neural network that has multiple layers of nodes, including an input layer, one or more hidden layers, and an output layer. Every node in one layer is connected to every node in the next layer. The MLP can learn complicated correlations in data and is frequently used for classification and regression problems [42]. Its activation functions may be linear or nonlinear. Equation (12) represents the MLP classifier.

ẑ = f_2( Σ_{i=1}^{N_2} w_i^{(2)} · f_1( Σ_{j=1}^{N_1} w_{ij}^{(1)} x_j + b_i^{(1)} ) + b^{(2)} )    (12)

Here,

• ẑ is the predicted output.
• f_1 and f_2 are the activation functions.
• N_1 and N_2 are the numbers of neurons.
• x_j represents the input features.
• w_{ij}^{(1)} and w_i^{(2)} are the weights.
• b_i^{(1)} and b^{(2)} are the biases.
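A brief sketch of an MLP classifier corresponding to Equation (12), using scikit-learn and the balanced training split from the earlier sketches; the hidden-layer size and iteration limit are assumptions.

```python
# Train a feedforward MLP with one hidden layer on the balanced training data.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
mlp.fit(X_train_bal, y_train_bal)
print("MLP test accuracy:", mlp.score(X_test, y_test))
```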
3.3.11. Voting classifier

A voting classifier is a type of ML classifier that trains a variety of models and selects the class with the greatest support according to how the individual models vote. There are two categories: hard voting and soft voting [43]. In this proposed work, both hard and soft voting are employed. Equation (13) represents hard voting and Equation (14) represents soft voting.

hard voting = argmax_j Σ_{i=1}^{N} 1(C_i predicts class j)    (13)

soft voting = argmax_j Σ_{i=1}^{N} P(C_i predicts class j)    (14)

Here,

• C_i represents the individual classifiers.
• N represents the number of individual classifiers.
• 1 is the indicator function.
• P denotes the predicted probability from C_i.

4. Results

A total of 20% and 80% of the data were used for testing and training the proposed system, respectively. Several ML classifiers, including SVM, RF, KNN, DT, NB, LR, AdB, GB, MLP, and NCC, are used to determine which model performs best, and an ensemble machine learning classifier is applied on top of the best results; the voting classification method is used in this case. The best machine learning model among these ten classifiers is identified using performance metrics such as F1-score, accuracy, recall, and precision.

4.1. System specification

Some resources were required to perform this research. The materials listed in Table 5 were used in the development of the suggested framework.

Table 5
Environment setup of the system.

Resources           Details
CPU                 Intel® Core™ i5-10400 CPU @ 2.90GHz
GPU                 Intel® UHD Graphics 630
RAM                 16 GB
Experimental tools  Google Colab, Jupyter Notebook

4.2. Confusion matrix

Accuracy should not be the only factor taken into consideration when assessing a model. The confusion matrix is used to compute the metrics. A confusion matrix is made up of TP, TN, FP, and FN; it is a table layout that enables the performance of the model to be visualized. True positive (TP) denotes a cancerous person who is correctly predicted to have cancer. True negative (TN) means a non-cancerous person who was correctly predicted to be cancer-free. False positive (FP) refers to the mistaken diagnosis of cancer in a non-cancerous individual. False negative (FN) is a result that suggests a malignant person is cancer-free. FN should be kept as low as possible because it is clearly the most critical element. These performance metrics were used in this study to assess the performance of the model we have proposed. The metrics are defined below using their basic representations:

i. Accuracy: The proportion of the model's correct predictions out of all the predictions made; it is defined by Equation (15).

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (15)

ii. Sensitivity: The model's capacity to accurately detect cervical cancer in individuals. A sensitivity of 100% means that all cervical cancer cases were correctly predicted by the selected model; it is defined by Equation (16).

Sensitivity = TP / (TP + FN)    (16)

iii. Specificity: This measure assesses how well the model can identify individuals who do not have cervical cancer; it is defined by Equation (17).

Specificity = TN / (TN + FP)    (17)

iv. Precision: This metric counts the percentage of predicted cervical cancer cases that the model identified correctly; it is defined by Equation (18).

Precision = TP / (TP + FP)    (18)

v. F-Measure: The model's sensitivity and precision are combined to form the F-Measure, which is their harmonic mean; it is defined by Equation (19).

F-Measure = TP / (TP + ½(FP + FN))    (19)
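The sketch below ties the two preceding pieces together under stated assumptions: it builds hard- and soft-voting ensembles from RF and MLP (the pairing reported as best in this study) and derives the metrics of Equations (15) to (19) from the confusion matrix; the estimator settings and the reuse of the earlier training variables are illustrative assumptions.

```python
# Hard/soft voting over RF and MLP, evaluated with confusion-matrix-based metrics.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

estimators = [
    ("rf", RandomForestClassifier(random_state=42)),
    ("mlp", MLPClassifier(max_iter=1000, random_state=42)),
]
hard_vote = VotingClassifier(estimators=estimators, voting="hard").fit(X_train_bal, y_train_bal)
soft_vote = VotingClassifier(estimators=estimators, voting="soft").fit(X_train_bal, y_train_bal)

for name, model in [("hard", hard_vote), ("soft", soft_vote)]:
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. (15)
    sensitivity = tp / (tp + fn)                    # Eq. (16)
    specificity = tn / (tn + fp)                    # Eq. (17)
    precision   = tp / (tp + fp)                    # Eq. (18)
    f_measure   = tp / (tp + 0.5 * (fp + fn))       # Eq. (19)
    print(name, accuracy, sensitivity, specificity, precision, f_measure)
```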


Fig. 7 shows the confusion matrices for all ML classifiers, where the best features are selected by the PCA and XGBoost feature selection methods.

The evaluation metrics sensitivity, specificity, precision, and F-Measure are particularly essential in the context of binary classification problems. They provide essential details about a classifier's performance and are essential in figuring out how well the model functions overall. These metrics provide a comprehensive view of the classifier's performance in various aspects:

• Sensitivity and specificity focus on the correct identification of positive and negative instances.
• Precision focuses on the accuracy of positive predictions.
• F-Measure provides a balanced assessment that takes into account both false positives and false negatives.

Table 6 shows the model evaluation of the different machine-learning classifiers. The best combination, XGBoost + PCA + Random Oversampling + Random Forest, achieved 100% sensitivity with 97.93%, 97.88%, and 98.93% specificity, precision, and F-measure, respectively, while the worst combination, SelectKBest + PCA + Random Oversampling + K-Nearest Neighbor, achieved 81.33% sensitivity with 96.45%, 97.14%, and 96.38% specificity, precision, and F-measure, respectively.

Accuracy is an important evaluation metric in machine learning because it provides a simple measure of how well a model performs overall. It denotes the proportion of correctly predicted instances to the total number of instances in a dataset. Table 7 shows the accuracy of the ten machine-learning classifiers with all combinations of feature selection models and Random Oversampling. With XGBoost + PCA, Logistic Regression achieved the best accuracy at 97.67% and Naïve Bayes the worst at 91.47%. With XGBoost + PCA + Random Oversampling, Random Forest achieved the best accuracy of 98.94% and Naïve Bayes the worst of 91.41%. With SelectKBest + PCA, Logistic Regression and the Nearest Centroid Classifier achieved the best accuracy of 96.90% and Naïve Bayes the worst of 86.43%. With SelectKBest + PCA + Random Oversampling, Random Forest achieved the best accuracy of 96.45% and the Nearest Centroid Classifier the worst of 90.54%. With XGBoost + SelectKBest + PCA, Logistic Regression achieved the best accuracy of 97.29% and Naïve Bayes the worst of 92.25%. Finally, with XGBoost + SelectKBest + PCA + Random Oversampling, Random Forest achieved the best accuracy of 98.69% and the Nearest Centroid Classifier the worst of 89.60%.

Table 8 shows the accuracy of the ensemble machine-learning classifiers built on Gradient Boosting, Random Forest, and Multilayer Perceptron with all combinations of feature selection models and Random Oversampling. Here, the combination of XGBoost + PCA + Random Oversampling with hard voting on Random Forest and Multilayer Perceptron achieved the best accuracy of 99.19%.

Fig. 8 illustrates the accuracy of hard voting and soft voting with the combination of XGBoost + PCA + Random Oversampling, where hard voting on Random Forest and Multilayer Perceptron achieved the best accuracy of 99.19% and soft voting on Gradient Boosting and Random Forest achieved the worst accuracy of 98.63%.

4.3. ROC & AUC

The ROC curve is a method for displaying a model's efficiency. It is a comprehensive index that reflects the sensitivity and specificity of continuous variables, and the curve depicts the relationship between sensitivity and specificity. The area under the ROC curve (AUC) is frequently used to assess the performance of models. The AUC measures the overall two-dimensional area under the ROC curve, and the AUC value increases as the model's performance improves.

Fig. 9 illustrates the ROC curve and AUC value of all machine-learning models with the combination of XGBoost + PCA + Random Oversampling, where the best AUC value of 96.14% was achieved by Random Forest and the lowest AUC value of 85.33% by the Adaptive Boost classifier.

Fig. 10 illustrates the ROC curve and AUC value of all machine-learning models with the combination of SelectKBest + PCA + Random Oversampling, where the best AUC value of 92.11% was achieved by the Support Vector Machine and the lowest AUC value of 72.92% by the K-Nearest Neighbor classifier.

Fig. 11 illustrates the ROC curve and AUC value of all machine-learning models with the combination of XGBoost + SelectKBest + PCA + Random Oversampling, where the best AUC value of 94.55% was achieved by the Naïve Bayes classifier and the lowest AUC value of 78.28% by the Decision Tree classifier.

5. Discussion

In this work, we compared our proposed model's efficacy in predicting the risk of cervical cancer with the most recent approaches published by Tanimu J. et al. [9], Abdoh et al. [44], and Ijaz et al. [45]. Our model showed encouraging results in terms of accuracy, sensitivity, and specificity by combining sampling techniques with machine learning algorithms. Abdoh et al. [44] used SMOTE-RF, SMOTE-RF-RFE, and SMOTE-RF-PCA techniques, whereas Ijaz et al. [45] used DBSCAN, SMOTETomek, RF, iForest, and SMOTE in various combinations. Additionally, Tanimu J. et al. [9] focused on RFE, LASSO, SMOTETomek, and DT algorithms.

As Table 9 shows, our approach performed better than these methods in terms of accuracy, achieving a remarkable 99.19% when using a voting technique on Random Forest (RF) and Multilayer Perceptron (MLP). Furthermore, our model achieved competitive accuracies of 99.00% and 98.63% in various combinations, including ROS+XGBoost+PCA+RF and ROS+XGBoost+PCA+MLP. At the same time, we achieved a significant accuracy of 98.57% using ROS+XGBoost+PCA+GB. Furthermore, our model shows higher sensitivity than the previously mentioned methodologies.
10
K.M.M. Uddin, A. Al Mamun, A. Chakrabarti et al. Neuroscience Informatics 4 (2024) 100169

Table 6
Model Evaluation of different ML classifiers using the combination of different
feature selection approaches.

Data Model Sensitivity Specificity Precision F-Measure


SVM 92.52% 95.37% 95.52% 94.00%
RF 100% 97.93% 97.88% 98.93%
KNN 100% 97.22% 97.14% 98.55%
XGBoost
DT 100% 96.98% 96.89% 98.42%
+
NB 93.81% 89.26% 88.67% 91.71%
ROS
LR 93.96% 93.09% 93.03% 93.49%
+
AdB 99.74% 97.21% 97.14% 98.42%
PCA
GB 100% 97.22% 97.14% 98.55%
MLP 100% 97.10% 97.01% 98.48%
NCC 87.06% 95.03% 95.52% 91.09%

SVM 91.79% 95.82% 96.01% 93.85%


RF 98.44% 94.62% 94.40% 96.38%
KNN 81.33% 96.45% 97.14% 88.54%
KBest
DT 97.78% 93.46% 93.15% 95.41%
+
NB 93.82% 87.82% 86.92% 90.24%
ROS
LR 91.77% 95.57% 95.77% 93.72%
+
AdB 97.29% 94.10% 93.90% 95.56%
PCA
GB 97.41% 93.88% 93.65% 95.49%
MLP 96.13% 95.90% 95.89% 96.01%
NCC 86.78% 95.15% 95.64% 91.00%

SVM 93.35% 94.33% 94.40% 93.87%


RF 100% 97.45% 97.38% 98.68%
XGBoost
KNN 100% 97.33% 97.26% 98.61%
+
DT 100% 95.14% 94.89% 97.38%
KBest
NB 93.36% 89.74% 89.29% 91.28%
+
LR 94.24% 93.69% 93.65% 93.94%
ROS
AdB 100% 95.82% 95.64% 97.77%
+
GB 100% 97.10% 97.01% 98.48%
PCA
MLP 100% 97.22% 97.14% 98.55%
NCC 85.41% 94.92% 95.52% 90.18%

Table 7
The Accuracy of all ML classifiers.

Accuracy
Model Name XGBoost + PCA XGBoost + PCA + ROS KBest + PCA KBest + PCA + ROS KBest + XGBoost + PCA KBest + XGBoost + PCA + ROS
SVM 96.51% 93.90% 93.80% 93.71% 95.35% 93.84%
RF 96.12% 98.94% 92.64% 96.45% 96.12% 98.69%
KNN 96.51% 98.57% 92.64% 87.42% 96.90% 98.63%
DT 94.19% 98.44% 92.64% 95.52% 93.41% 97.45%
NB 91.47% 91.41% 86.43% 90.60% 92.25% 91.47%
LR 97.67% 93.52% 96.90% 93.59% 97.29% 93.96%
AdB 94.57% 98.44% 92.64% 95.64% 94.96% 97.82%
GB 94.96% 98.57% 93.02% 95.58% 96.12% 98.51%
MLP 97.29% 93.51% 92.64% 96.01% 96.90% 98.57%
NCC 96.90% 90.66% 96.90% 90.54% 96.12% 89.60%

Table 8
The Accuracy of Ensemble (Hard Voting& Soft Voting) ML classifiers.

Accuracy
Model Name XGBoost +PCA +ROS + Voting (Hard) KBest +PCA +ROS + Voting (Hard) KBest +XGBoost +PCA +ROS + Voting (Hard)
GB + RF 99.07% 95.89% 99.00%
GB + MLP 99.00% 95.89% 99.00%
RF + MLP 99.19% 96.26% 99.00%
GB + RF + MLP 98.82% 95.89% 98.75%
Accuracy
Model Name XGBoost + PCA + ROS + Voting (Soft) KBest + PCA + ROS + Voting (Soft) KBest + XGBoost + PCA + ROS + Voting (Soft)
GB + RF 98.63% 95.95% 98.63%
GB + MLP 98.75% 95.58% 98.63%
RF + MLP 99.00% 96.51% 98.69%
GB + RF + MLP 98.88% 95.89% 98.63%


Fig. 7. Confusion matrices of (a) SVM (b) RF (c) KNN (d) DT (e) NB (f) LR (g) NCC (h) AdB (i) GB (j) MLP.

Table 9
Comparison results of our model with past studies.

Studies Method Features count Accuracy Sensitivity Specificity


Abdoh et al. [44]      SMOTE-RF                                30  96.06%  94.55%  97.51%
                       SMOTE-RF-RFE                            18  95.87%  94.42%  97.26%
                       SMOTE-RF-PCA                            11  95.74%  94.16%  97.76%
Ijaz et al. [45]       DBSCAN+SMOTETomek+RF                    10  97.72%  97.43%  98.01%
                       DBSCAN+SMOTE+RF                         10  97.22%  96.43%  98.01%
                       iForest+SMOTETomek+RF                   10  97.50%  97.91%  97.08%
                       iForest+SMOTE+RF                        10  97.58%  97.45%  97.58%
Tanimu, J.J. [9]       RFE+DT                                  20  97.65%  85.71%  98.72%
                       LASSO+DT                                10  96.47%  71.43%  98.71%
                       RFE+SMOTETomek+DT                       20  98.82%  100%    98.71%
                       LASSO+SMOTETomek+DT                     10  92.94%  85.71%  93.59%
Our Proposed Model     ROS+XGBoost+PCA+RF                      20  98.94%  100%    97.93%
                       ROS+XGBoost+PCA+GB                      20  98.57%  100%    97.22%
                       ROS+XGBoost+PCA+MLP                     20  98.51%  100%    97.10%
                       ROS+KBest+PCA+RF                        20  96.45%  98.44%  94.62%
                       ROS+KBest+PCA+AB                        20  95.64%  97.29%  94.10%
                       ROS+KBest+PCA+MLP                       20  96.01%  96.13%  95.90%
                       ROS+XGBoost+PCA+Hard Voting (RF+MLP)    20  99.19%  100%    –


Fig. 7. (continued).

Fig. 8. Accuracy of Voting Classifiers.


Fig. 9. ROC curve and AUC (XGBoost+ROS+PCA).
Fig. 10. ROC curve and AUC (KBest+ROS+PCA).
Fig. 11. ROC curve and AUC (XGBoost+KBest+ROS+PCA).

6. Conclusion

Cervical cancer is now recognized as a major cause of death among women. The World Health Organization (WHO) reports that more than 85% of all cervical cancer cases occur in developing countries. Thanks to machine learning, we can identify factors that increase the risk of this disease in women. In our study, we tested ten different machine learning models and used an ensemble ML classifier (hard voting) to predict cervical cancer based on patient data regarding risk factors. By balancing the classes in our dataset, we significantly improved the accuracy of our predictions, and we found that random oversampling particularly enhanced our classification results. Compared to previous methods, our recommended model showed great improvement in experimental tests. Our best-performing model, utilizing hard voting with a combination of MLP, RF, XGBoost, and PCA, achieved an accuracy of 99.19% and 100% sensitivity. Although we also explored feature selection techniques like SelectKBest and a combination of XGBoost and SelectKBest, they did not perform as well as XGBoost alone with PCA. In future research, we will gather a larger dataset to improve the performance of our system. We will also look into more advanced class balancing algorithms and ensemble-based classifiers like boosting and bagging to create an online screening tool.

Human and animal rights

The authors declare that the work described has not involved experimentation on humans or animals.

Informed consent and patient details

The authors declare that this report does not contain any personal information that could lead to the identification of the patient(s) and/or volunteers.

Funding

This work did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions

All authors attest that they meet the current International Committee of Medical Journal Editors (ICMJE) criteria for Authorship.

Declaration of competing interest

The authors declare that they have no known competing financial or personal relationships that could be viewed as influencing the work reported in this paper.

Acknowledgement

We are thankful to Dr. Umme Raihan Siddiqi, Department of Physiology, Shaheed Suhrawardy Medical College (ShSMC), Dhaka, Bangladesh, for her immense support in study design, dataset validation, developing medical terminology, appropriate mobile app implementation, and deployment.


References

[1] WHO, Comprehensive cervical cancer prevention and control: a healthier future for girls and women, Geneva, Switzerland, 2013.
[2] B. Nithya, V. Ilango, Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction, SN Appl. Sci. 1 (2019) 1–16.
[3] H. Sung, J. Ferlay, R.L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, F. Bray, Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA Cancer J. Clin. 71 (3) (2021) 209–249.
[4] D. Stelzle, L.F. Tanaka, K.K. Lee, A.I. Khalil, I. Baussano, A.S. Shah, D.A. McAllister, S.L. Gottlieb, S.J. Klug, A.S. Winkler, F. Bray, Estimates of the global burden of cervical cancer associated with HIV, Lancet Glob. Health 9 (2) (2021) e161–e169.
[5] W. Wang, E. Arca, A. Sinha, K. Hartl, N. Houwing, S. Kothari, Cervical cancer screening guidelines and screening practices in 11 countries: a systematic literature review, Prev. Med. Rep. 28 (2022) 101813.
[6] A. Welfare, Cancer Data in Australia, AIHW, Canberra, 2022.
[7] M.M. Carneiro, Reflections on Pink October, Women & Health 61 (10) (2021) 915–916.
[8] S.E. Jujjavarapu, S. Deshmukh, Artificial neural network as a classifier for the identification of hepatocellular carcinoma through prognostic gene signatures, Curr. Genomics 19 (6) (2018) 483–490.
[9] J.J. Tanimu, M. Hamada, M. Hassan, H.A. Kakudi, J.O. Abiodun, Machine learning method for classification of cervical cancer, Electronics 11 (2022) 463.
[10] T.M. Alam, M.M.A. Khan, M.A. Iqbal, W. Abdul, M. Mushtaq, Cervical cancer prediction through different screening methods using data mining, Int. J. Adv. Comput. Sci. Appl. 10 (2) (2019).
[11] D.N. Punjani, K.H. Atkotiya, Cervical cancer test identification classifier using decision tree method, Int. J. Res. Advent Technol. 7 (4) (2019).
[12] A. Choudhury, A framework for safeguarding artificial intelligence systems within healthcare, British J. Healthcare Manag. 25 (8) (2019) 1–6.
[13] J. Lu, E. Song, A. Ghoneim, M. Alrashoud, Machine learning for assisting cervical cancer diagnosis: an ensemble approach, Future Gener. Comput. Syst. 106 (2020) 199–205.
[14] B. Nithya, V. Ilango, Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction, SN Appl. Sci. 1 (2019) 1–16.
[15] L. Akter, M.M. Islam, M.S. Al-Rakhami, M.R. Haque, Prediction of cervical cancer from behavior risk using machine learning techniques, SN Comput. Sci. 2 (2021) 1–10.
[16] F. Asadi, C. Salehnasab, L. Ajori, Supervised algorithms of machine learning for the prediction of cervical cancer, J. Biomed. Phys. Eng. 10 (4) (2020) 513.
[17] S.K. Suman, N. Hooda, Predicting risk of cervical cancer: a case study of machine learning, J. Stat. Manag. Syst. 22 (4) (2019) 689–696.
[18] I.J. Ratul, A. Al-Monsur, B. Tabassum, A.M. Ar-Rafi, M.M. Nishat, F. Faisal, Early risk prediction of cervical cancer: a machine learning approach, in: 2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), IEEE, May 2022, pp. 1–4.
[19] Kelwin Fernandes, Jaime Cardoso, Jessica Fernandes, Cervical cancer (risk factors), UCI Machine Learning Repository, https://doi.org/10.24432/C5Z310, 2017.
[20] I.T. Jolliffe, J. Cadima, Principal component analysis: a review and recent developments, Philos. Trans. R. Soc. A, Math. Phys. Eng. Sci. 374 (2065) (2016) 20150202.
[21] C. Bentéjac, A. Csörgő, G. Martínez-Muñoz, A comparative analysis of gradient boosting algorithms, Artif. Intell. Rev. 54 (2021) 1937–1967.
[22] P. Fabian, Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825.
[23] H. Li, J. Li, P.C. Chang, J. Sun, Parametric prediction on default risk of Chinese listed tourism companies by using random oversampling, isomap, and locally linear embeddings on imbalanced samples, Int. J. Contemp. Hosp. Manag. 35 (2013) 141–151.
[24] R. Bardenet, M. Brendel, B. Kégl, M. Sebag, Collaborative hyperparameter tuning, in: International Conference on Machine Learning, PMLR, May 2013, pp. 199–207.
[25] H. Tan, Machine learning algorithm for classification, J. Phys. Conf. Ser. 1994 (1) (2021) 012016, IOP Publishing.
[26] S.R. Amendolia, G. Cossu, M.L. Ganadu, B. Golosio, G.L. Masala, G.M. Mura, A comparative study of k-nearest neighbour, support vector machine and multilayer perceptron for thalassemia screening, Chemom. Intell. Lab. Syst. 69 (1–2) (2003) 13–20.
[27] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[28] C.M. Bishop, N.M. Nasrabadi, Pattern Recognition and Machine Learning, vol. 4, no. 4, Springer, New York, 2006, p. 738.
[29] S. Wan, Y. Liang, Y. Zhang, M. Guizani, Deep multi-layer perceptron classifier for behavior analysis to estimate Parkinson's disease severity using smartphones, IEEE Access 6 (2018) 36825–36833.
[30] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[31] L.E. Peterson, K-nearest neighbor, Scholarpedia 4 (2) (2009) 1883.
[32] N.S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, Am. Stat. 46 (3) (1992) 175–185.
[33] L. Breiman, Classification and Regression Trees, Routledge, 2017.
[34] T. Hastie, R. Tibshirani, J.H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2, Springer, New York, 2009, pp. 1–758.
[35] D. Meurers, Natural language processing and language learning, in: Encyclopedia of Applied Linguistics, 2012, pp. 4193–4205.
[36] T. Rymarczyk, E. Kozłowski, G. Kłosowski, K. Niderla, Logistic regression for machine learning in process tomography, Sensors 19 (15) (2019) 3400.
[37] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, vol. 112, Springer, New York, 2013, p. 18.
[38] R. Rojas, AdaBoost and the super bowl of classifiers: a tutorial introduction to adaptive boosting, Tech. Rep. 1 (1), Freie Universität Berlin, 2009, pp. 1–6.
[39] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1) (1997) 119–139.
[40] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001) 1189–1232.
[41] R. Szeliski, Computer Vision: Algorithms and Applications, Springer Nature, 2022.
[42] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536.
[43] D. Ruta, B. Gabrys, Classifier selection for majority voting, Inf. Fusion 6 (1) (2005) 63–81.
[44] S.F. Abdoh, M.A. Rizka, F.A. Maghraby, Cervical cancer diagnosis using random forest classifier with SMOTE and feature reduction techniques, IEEE Access 6 (2018) 59475–59485.
[45] M.F. Ijaz, M. Attique, Y. Son, Data-driven cervical cancer prediction model with outlier detection and over-sampling methods, Sensors 20 (10) (2020) 2809.