Cervical Cancer Prediction Through Different Screening Methods Using Data Mining

Talha Mahboob Alam1, Muhammad Milhan Afzal Khan2, Muhammad Atif Iqbal3, Abdul Wahab4, Mubbashar Mushtaq5
Computer Science and Engineering Department, University of Engineering and Technology Lahore, Pakistan1,2,3,5
School of Systems and Technology, University of Management and Technology Lahore, Pakistan4
Abstract—Cervical cancer remains an important reason of occurrence is abundant in low and middle income countries
deaths worldwide because effective access to cervical screening [9]. The important task of cervical cancer is screening. An
ev
methods is a big challenge. Data mining techniques including ideal screening test is the one that is least incursive, easy to
decision tree algorithms are used in biomedical research for achieve, acceptable to subject, cheap and effective in
predictive analysis. The imbalanced dataset was obtained from diagnosing the disease process in its early incursive stage
the dataset archive belongs to the University of California, when the treatment is easy for ailment. There are four
Irvine. Synthetic Minority Oversampling Technique (SMOTE) screening methods including cervical cytology also called Pap
r
has been used to balance the dataset in which the number of smear test, biopsy, Schiller and Hinslemann [10]. Cytology
instances has increased. The dataset consists of patient age,
screening method is a microscopic analysis of cells scratched
number of pregnancies, contraceptives usage, smoking patterns
and chronological records of sexually transmitted diseases
from the cervix and is used to detect cancerous or pre-
(STDs). Microsoft azure machine learning tool was used for
simulation of results. This paper mainly focuses on cervical
cancer prediction through different screening methods using
data mining techniques like Boosted decision tree, decision forest
er
cancerous conditions of the cervix [11]. Biopsy method is a
surgical process which includes finding of a living tissue
sample for performing diagnosis [12]. The solution of iodine
has applied for visual inspection of cervix known as
Hinslemann test. Lugol's iodine is used for visual inspection
pe
and decision jungle algorithms as well performance evaluation
has done on the basis of AUROC (Area under Receiver operating of cervix after smearing Lugol's iodine detection rate of
characteristic) curve, accuracy, specificity and sensitivity. 10-fold doubtful region over the cervix, this is also known as Schiller
cross-validation method was utilized to authenticate the results test [13].
and Boosted decision tree has given the best results. Boosted
decision tree provided very high prediction with 0.978 on The size of data is increasing gradually. Expansive,
AUROC curve while Hinslemann screening method has used. complex and useful datasets have now expanded in all the
The results obtained by other classifiers were significantly worse different fields of science, business and especially in
ot
than boosted decision tree. healthcare domain. With these larger data sets, the capacity to
mine beneficial hidden knowledge in these huge volume of
Keywords—Boosted decision tree; cervical cancer; data mining; data is gradually significant in today’s economical world. The
dcision trees; decision forest; decision jungle; screening methods method of applying novel techniques for discovering
tn
control [1]. Each year around 8.2 million people die from mathematics and other domains, it is now possible to extract
cancer which is 13% of total deaths worldwide. In 2017, only the meaningful information from raw data. Data mining is
26% of under developing countries reported having screening helpful where large collections of healthcare data are available
services available for public. In 90% developed countries [15]. Several data mining techniques like support vector
treatment services are available compared to less than 26% of machine (SVM), kernel learning methods as well as clustering
techniques were used in healthcare [16]. With the rise of
ep
uterus from the vagina where cervical cancer occurs [4]. measures have proved to be ineffective because the number of
Sexually transmitted human papillomavirus (HPV) is the parameters for screening of cervical cancer are still debatable
important cause of cervical cancer [5-8]. Cervical cancer [4, 8, 10]. The methods and techniques have been used for
388 | P a g e
www.ijacsa.thesai.org
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=3474371
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 10, No. 2, 2019
screening of cervical cancer are limited to a small number of parameters. The available literature for screening of cervical cancer explores mainly the Papanicolaou (Pap) smear test [17], hormonal status, FIGO stage [18] and cervical intraepithelial neoplasia (CIN) [19], but only a single parameter was used for screening prediction of cervical cancer. The available data mining techniques using a large number of parameters [20-23] did not give effective results. A comparison of studies for screening prediction of cervical cancer, along with their approaches, is presented in Table 1. Effective results were not found in screening prediction of cervical cancer while using a huge number of parameters with the help of data mining techniques. As the current techniques are not sufficient, it is necessary to explore all parameters or symptoms for screening prediction of cervical cancer. Decision tree methods have been used to predict cervical cancer, but the demographic and medical attributes were different in previous studies. The aim of this study was to predict cervical cancer based on demographic information, tumor-related parameters, sexually transmitted disease (STD) related parameters and important medical records.
TABLE I. COMPARISON OF STUDIES FOR SCREENING PREDICTION OF CERVICAL CANCER

Reference | Data Set Repository | Attributes | Instances | Technique | Results
[20] | Universitario de Caracas Hospital patients | 28 | 858 | Hybrid method using deep learning | AUC = 0.6875
[24] | NCBI | 61 | 160 | CART algorithm | Accuracy = 83.87%
[25] | Chung Shan Medical University Hospital Tumor Registry | 38 | 75 | SVM | —
[18] | State Hospital in Rzeszow | 10 | 107 | GEP / MLP / PNN / RBFNN | AUROC = 0.72 / 0.67 / 0.56 / 0.48
— | — | — | — | PNN / MLP / GEP / RBFNN / k-Means | AUROC = 0.818 / 0.659 / 0.651 / 0.640 / 0.406
II. RELATED WORK

Kelwin Fernandes et al. [20] presented an automated method for predicting the outcome of a patient's biopsy for the diagnosis of cervical cancer using the medical history of patients. Their technique allows a joint and fully supervised optimization method for high-dimensional reduction and classification. They discovered certain medical results from the embedding spaces and confirmed them through the medical literature. R. Vidya and G. M. Nasira [24] predicted cervical cancer using random forest with K-means learning and implemented the techniques in the MATLAB tool. These experiments were performed with the help of the NCBI dataset to construct decision trees using classification methods. Yulia et al. [25] predicted cervical cancer using Pap smear test results. The Pap smear test results were divided into two categories: cancerous and non-cancerous patients. Three classification methods, Naïve Bayes, support vector machine and random forest, were used to compute the results, of which random forest gave the better results. Jimin Kahng et al. [21] predicted cervical cancer development using SVM. Weka was used to train and test the data set as well as to analyze relationships between attributes. Chang et al. [17] predicted the recurrence of cervical cancer in patients using MARS (Multivariate Adaptive Regression Splines) and the C5.0 algorithm. MARS powerfully estimated the relationship between a dependent variable and a set of descriptive variables in a pairwise regression. C5.0 used a greedy method in which a top-down approach was used to build the decision tree and then train the data with the help of significant attributes. Maciej Kusy et al. [18] presented neural networks to predict adverse events in cervical cancer patients. The MLP is a type of neural network where the input signal is fed forward through a number of layers; it contains an input layer, hidden layers and an output layer. The GEP classifier delivered efficient results in the prediction of adverse events in cervical cancer as compared to the other methods. Kelwin Fernandes et al. [26] used a transfer learning technique for cervical cancer screening. Their study consists of linear predictive models. Positive results were obtained in most experiments as compared to other methods.

III. METHODOLOGY

Our methodology consists of three main steps. The first step is data set selection. The second step includes preprocessing, in which the original data is prepared for classification. The last step contains building an effective classification-based model for prediction.

A. Dataset

A publicly available dataset has been utilized [28] in this research, obtained from the UCI repository. The dataset contains 858 patients and 36 attributes, which include patient age, number of pregnancies, contraceptive usage, smoking patterns and chronological records of sexually transmitted diseases (STDs).

B. Data Preprocessing

Data mining fundamentally depends on the quality of data. Raw data is generally vulnerable to noise, missing values, outliers and inconsistency, so it is vital for the selected data to be processed before being mined. Preprocessing the data is an essential step to enhance data efficiency; it is one of the most vital data mining steps, dealing with preparation and transformation of the dataset to make knowledge discovery more efficient. The following steps were used to preprocess the data in this study for the experiments.

Step 1: Ignore some instances and attributes with a high ratio of missing values, which makes the data consistent. This method is very effective because there were several instances and attributes with missing values in the dataset used. Some attributes in this dataset, like STDs: Time since first diagnosis and STDs: Time since last diagnosis, had more than 80% of their data missing, so these attributes were deleted. Two attributes, STDs: cervical condylomatosis and STDs: AIDS, had constant values, so these were also deleted.

Step 2: There were many attributes with missing values, like number of pregnancies, hormonal contraceptives etc., where missing values were denoted in the data as "?". These values were replaced with the median value of the respective class. The median value was computed as follows [29]: for sorted values x(1) ≤ … ≤ x(n),

Median = x((n+1)/2) if n is odd; Median = (x(n/2) + x(n/2+1)) / 2 if n is even.

Step 3: The other important task was outlier detection in the data. An outlier is a data object that deviates significantly from the rest of the objects. In this study, two attributes, age and number of partners, contain outliers. To solve this issue, lower and upper threshold limits were defined and these outliers were replaced with the median value.

Step 4: Normalization is a scaling technique of data preprocessing. There are several methods of normalization, i.e. Min-Max, Z-score and decimal scaling normalization [30]. Decimal scaling normalization was applied using the standard rule v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1. Numeric attributes like number of pregnancies, hormonal contraceptives etc. are scaled between [0-10], and Boolean attributes like smokes, HPV, STD etc. are scaled to [0,1].

Step 5: After data cleaning, the cervical cancer data set consists of 734 instances and 32 attributes. This data is imbalanced, because only 70 instances are cancerous and 663 are non-cancerous diagnosed patients. To overcome this problem of imbalanced data, the Synthetic Minority Oversampling Technique (SMOTE) has been used. This is a statistical method for increasing the number of instances in a dataset in a balanced way. The module works by producing new instances from the existing minority cases supplied as input. By using SMOTE, majority instances do not change. The new instances are not just copies of existing minority
classes, because the algorithm takes samples of the feature space for each target class and its nearest neighbors, generating new instances that combine the features of the target class. This method makes the samples more generic [32]. x_i is a minority class sample; its nearest neighbors are searched and one neighbor x_zi is randomly selected, then a random number δ in [0,1] is drawn. The new sample is created as:

x_new = x_i + δ · (x_zi − x_i)

SMOTE outperforms the random oversampling method because it also avoids the overfitting problem [33]. Using the SMOTE function, the total number of instances increased: after SMOTE, the minority class was oversampled from 70 to 563 instances.

C. Classification Models

A supervised method for classification is the decision tree, which is very popular because most biomedical data mining tasks have already used decision trees for efficient prediction [18]. Three decision tree methods were used in this study, as follows.

1) Boosted decision tree: The transformation of a weak classifier into a vigorous or strong classifier is the key role of boosting. A weak classifier is generally a poor-performance prediction model which leads to low accuracy due to a high misclassification rate. The boosted method works best when the majority votes of all weak learners for each prediction are combined in such a way that the final prediction results are effective. At each iteration a weak learner is added to the base learner and trained with respect to the error of the whole ensemble. When weak learners are added iteratively to an ensemble, it delivers precise classification. A learning method that consecutively tries new models to provide extra accuracy on the class variable leads to gradient boosting. The negative gradient of the loss function is correlated with each new model, which tends to minimize the error. Friedman [34] presented complete details of the boosted decision tree.

Step 1: Fit a decision tree h_m(x) to the pseudo-residuals. J_m represents the number of leaves; the input space is divided into disjoint regions R_1m, …, R_Jmm, and a constant value is predicted in each region. The output can be written as:

h_m(x) = Σ_{j=1}^{J_m} b_jm · 1(x ∈ R_jm)

where b_jm denotes the predicted value in region R_jm.

Step 2: The model is then updated and the previous value is discarded. The new function is written as:

F_m(x) = F_{m−1}(x) + γ_m h_m(x),  with  γ_m = argmin_γ Σ_{i=1}^{n} L(y_i, F_{m−1}(x_i) + γ h_m(x_i))

Terminal nodes or leaves are denoted by J in the tree. The accuracy of the boosted decision tree improves as the number of leaves and the size of the tree increase, but overfitting and longer processing times may occur.

2) Decision forest: The other algorithm that performs classification by utilizing an ensemble learning method is known as decision forest. Ensemble methods generalize rather than depend on a single model: a generalized model generates multiple associated models and merges them, which gives better results. Mostly, ensemble models offer better accuracy as compared to a single decision tree. Decision forest differs from the random forest method, in which the individual decision trees might only use some randomized portion of the data or features. There are many methods to ensemble decision trees, but voting is one of the most effective for producing results in an ensemble model [35]. Decision forest works by constructing multiple decision trees and then voting on the most popular output class. By utilizing the whole data set and different starting points, a set of classification trees is constructed. Decision forest outputs a non-normalized frequency histogram of labels for each decision tree. The probability of each label is determined by an aggregation method which sums the histograms and then normalizes the results. The final decision of the ensemble is based on the trees, in which high prediction confidence depends on high weight. Criminisi [36] presented complete details of decision forests.

Step 1: Forest training is done by optimizing the parameters θ_j of the weak learner at each split node j, where S_j denotes the parent set and θ_j the split parameters:

θ_j* = argmax_{θ_j} I(S_j, θ_j)

Step 2: The objective (or loss) function, denoted I, takes the value of the information gain, described as:

I(S_j, θ_j) = H(S_j) − Σ_{i∈{L,R}} (|S_j^i| / |S_j|) H(S_j^i)

where H(S_j) is the entropy of the example set at the parent node, |S_j^i|/|S_j| denotes the weighting of the left/right children, and H(S_j^i) represents the entropy of the corresponding child set.

Step 3: The entropy of a generic set S of training points is:

H(S) = − Σ_{c∈C} p(c) log p(c)
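The split-selection rule of Steps 1-3 can be sketched for a single node. The following is an illustrative reimplementation, not the internals of the Azure ML modules used in the study; the candidate thresholds and toy data are hypothetical:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(S) = -sum_c p(c) log2 p(c) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(x, y):
    """Pick the threshold on feature x that maximizes the information gain I."""
    best_gain, best_t = -1.0, None
    for t in np.unique(x)[:-1]:                  # candidate thresholds
        left, right = y[x <= t], y[x > t]
        # I = H(S) - sum_i (|S_i|/|S|) H(S_i) over the two children
        gain = entropy(y) - (len(left) / len(y)) * entropy(left) \
                          - (len(right) / len(y)) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

For example, with feature values [1, 2, 3, 4] and labels [0, 0, 1, 1], the threshold t = 2 separates the classes perfectly, so the gain equals the full parent entropy of 1 bit.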
3) Decision jungle: This method contrasts with the random forest method.
With a large amount of data, the number of nodes in decision trees grows exponentially with depth. The decision jungles method compares two new node-merging algorithms that jointly optimize both the features and the structure of directed acyclic graphs (DAGs) powerfully. DAGs have the same structure as decision trees, except that nodes can have multiple parents. Node splitting and node merging are determined by the objective function and the entropies of the weighted sums at the leaves. The training of DAGs is done level by level, combining the objective function over both the structure of the DAG and the split functions. At each level, the algorithm jointly learns the features and the branching structure of the nodes. This is done by minimizing an objective function defined over the predictions. Decision jungles require radically less memory while considerably improving generalization. Shotton [37] presented comprehensive details of decision jungles.

Step 1: The sets of parent and child nodes are denoted by N_p and N_c. θ_j denotes the parameters of the split feature function for parent node j, and S_i denotes the set of labeled instances that reach node i. The set of instances that reach any child node j is:

S_j({θ_i}, {l_i}, {r_i}) = [⋃_{i∈N_p : l_i=j} S_i^L(θ_i)] ∪ [⋃_{i∈N_p : r_i=j} S_i^R(θ_i)]

Step 2: The objective function E related to the current level of the DAG is a function of {θ_j}, {l_j}, {r_j}. The difficulty of learning the parameters of the decision DAG is resolved as a joint minimization of the objective over the split parameters {θ_j} and the child assignments {l_j}, {r_j}. The task of learning the current level of a DAG can be written as the minimization of the total weighted entropy of instances, defined as:

E({θ_j}, {l_j}, {r_j}) = Σ_{j∈N_c} |S_j| H(S_j)

{θ_j}, {l_j}, {r_j} present the features and branches for all parent nodes, the sum runs over the child nodes, |S_j| is the number of examples at node j, and H(S_j) denotes the entropy of the examples that reach child node j.

Step 4: To solve the minimization problem, a cluster search method was used, which alternates between optimizing the branching variables and the split parameters but optimizes the branching variables more globally.

IV. RESULTS AND DISCUSSION

In this study numerous methods were examined, and the three methods with the best performance are presented. The 10-fold cross-validation method was used in the evaluation of the proposed methods. Cross-validation was used because it uses the entire training dataset for both training and evaluation, instead of some portion [38]. Among the 858 patients, 124 had a huge number of missing values due to privacy concerns, and the remaining 734 were considered. Using the SMOTE method, the imbalanced dataset problem was overcome and the number of instances was increased. The new balanced dataset consists of 32 attributes and 1226 patients, of which 563 were cancer patients and 663 non-cancer patients, as shown in the confusion matrix of Fig. 1. The median age of the patients was 26 years (range, 13-84). The median number of sexual partners was 2 (range, 1-10). The median age at first sexual intercourse was 17 (range, 10-32), and the median number of pregnancies was 2 (range, 0-10).
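The SMOTE step recapped above (x_new = x_i + δ · (x_zi − x_i)) can be illustrated with a minimal reimplementation. This sketch is not the Azure ML SMOTE module used in the study, and the toy minority samples are hypothetical:

```python
import numpy as np

def smote_sample(X_minority, k=3, rng=None):
    """Generate one synthetic minority sample: x_new = x_i + delta * (x_zi - x_i)."""
    rng = rng or np.random.default_rng(0)
    i = rng.integers(len(X_minority))
    x_i = X_minority[i]
    # distances from x_i to every minority sample
    d = np.linalg.norm(X_minority - x_i, axis=1)
    neighbors = np.argsort(d)[1:k + 1]          # k nearest neighbors, skipping x_i itself
    x_zi = X_minority[rng.choice(neighbors)]    # pick one neighbor at random
    delta = rng.random()                        # delta drawn from [0, 1)
    return x_i + delta * (x_zi - x_i)           # a point on the segment between x_i and x_zi
```

Because the synthetic point lies on the line segment between two real minority samples, it is bounded by the feature ranges of its parents, which is what makes the oversampled class "more generic" than plain duplication.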
There were four screening methods (target attributes) in the data set, labeled as biopsy, cytology, Schiller and Hinselmann. These four screening methods have been used to diagnose cancer, and each screening method was trained with the same dataset but individually. The boosted decision tree outperformed all other methods, as shown in Table 2. The Hinselmann screening method also outperformed the other methods, with an AUROC of 0.978, which was slightly higher than biopsy but significantly higher than cytology and Schiller. The AUROC also gave better results with the boosted decision tree: 0.974 on biopsy, 0.959 on cytology and 0.943 on the Schiller target attribute. The complete performance of the proposed models is given in Fig. 3, and the performance on the AUROC curve is shown in Fig. 2.

The boosted decision tree, decision forest and decision jungle algorithms were used to determine the prediction ability of the tested models by computing the accuracy, sensitivity, specificity and AUROC. The AUROC is a good measure for evaluating the performance of classification models [39-42]. The AUROC performance of the proposed models is shown in Fig. 2. The AUROC is a summary measure of performance that indicates whether, on average, a true positive is ranked higher than a false positive. The AUROC was also used for the evaluation of different techniques [18, 27] in biomedical data mining.
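The ranking interpretation of the AUROC given above can be checked directly: counting the fraction of positive-negative pairs ranked correctly reproduces scikit-learn's roc_auc_score. The labels and scores below are hypothetical, for illustration only:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and classifier scores, only to illustrate the metric.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# AUROC = P(score of a random positive > score of a random negative);
# with no tied scores, this pairwise count matches roc_auc_score exactly.
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
manual_auc = np.mean([p > n for p in pos for n in neg])

print(manual_auc, roc_auc_score(y_true, y_score))  # both 8/9 here
```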
TABLE II. AUROC OBTAINED BY THE ML TECHNIQUES ON THE RISK PREDICTION TASK WITH MULTIPLE SCREENING METHODS: BIOPSY, CYTOLOGY, SCHILLER AND HINSELMANN. PERFORMANCE WAS ALSO EVALUATED IN TERMS OF ACCURACY, SENSITIVITY AND SPECIFICITY

Screening Method | Technique | Accuracy | Sensitivity | Specificity | AUROC
Biopsy | Boosted Decision Tree | — | — | — | 0.974
Biopsy | Decision Forest | — | — | — | —
Biopsy | Decision Jungle | 0.863 | 0.733 | 0.968 | 0.929
Cytology | Boosted Decision Tree | 0.934 | 0.893 | 0.965 | 0.959
Cytology | Decision Forest | 0.888 | 0.790 | 0.963 | 0.935
Cytology | Decision Jungle | 0.879 | 0.735 | 0.989 | 0.929
Schiller | Boosted Decision Tree | 0.909 | 0.870 | 0.942 | 0.943
Schiller | Decision Forest | 0.865 | 0.766 | 0.948 | 0.918
Schiller | Decision Jungle | 0.863 | 0.726 | 0.978 | 0.908
Hinselmann | Boosted Decision Tree | 0.941 | 0.896 | 0.974 | 0.978
Hinselmann | Decision Forest | 0.892 | 0.793 | 0.965 | 0.945
Hinselmann | Decision Jungle | 0.879 | 0.730 | 0.991 | 0.934
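Results like those tabulated above come from 10-fold cross-validated AUROC estimates. A rough scikit-learn analogue of that protocol might look as follows; the study itself used Azure ML modules, and the random data here merely stands in for the real preprocessed attributes:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))            # stand-in for the 32 preprocessed attributes
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # stand-in for one screening target

# Boosted trees with a cap on leaves per tree (the study used 10-20),
# scored by AUROC over stratified 10-fold cross-validation.
model = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                   max_leaf_nodes=20)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(round(scores.mean(), 3))  # mean AUROC over the 10 folds
```

Stratified folds keep each fold's class ratio close to that of the full (SMOTE-balanced) dataset, so every fold's AUROC is estimated on a comparable class mix.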
Fig. 2. Comparison of the Area under Receiver Operating Characteristic (AUROC) Curve between the Boosted Decision Tree (Blue Line) and the Decision Forest (Red Line), as these Models Give the Best Results. Plots are Shown for the Models with Threshold = 5.
Fig. 3. The Results in Terms of Accuracy, Sensitivity, Specificity and AUROC Curve in the Prediction of Cervical Cancer.
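The accuracy, sensitivity and specificity reported in Fig. 3 and Table 2 are all derived from the confusion matrix. A small illustrative computation follows, with hypothetical labels rather than the study's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predictions for one screening target, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                   # true positive rate (recall)
specificity = tn / (tn + fp)                   # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(sensitivity, specificity, accuracy)      # all 0.75 for this toy data
```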
About 50% of cervical cancer identifications are in females aged 35-54, around 20% are diagnosed at more than 65 years old, and around 15% between the ages of 20-30. The median age at diagnosis of cervical cancer is 48 years. Cervical cancer is significantly unusual in females younger than 20. In any case, several young females end up infected with different sorts of human papillomavirus (HPV), which can increase their danger of getting cervical cancer in the future. Young females with early abnormal changes who do not have regular checkups are at high risk of cervical cancer when they reach the age of 40 [43-45]. The main risk factor for cervical cancer growth is HPV. Sexual relations with infected persons are another risk factor for HPV. Different parameters with respect to sexual relations, such as relations with multiple persons, are also danger factors for females which lead to cervical cancer. Sexually active females have never been in as much danger of cervical cancer as those who have multiple sexual partners [46, 47]. Smoking is related to a higher risk of precancerous changes in the cervix and of progression to invasive cervical cancer, particularly for women infected with HPV. Women with a weak immune system are more prone to getting HPV [48].

This study exploited late advancements in statistical learning for handling high-dimensional data with numerous features. Other promising areas of research in these […] developments in technology [53]. Generally, the problem of large-dimensional data modelling has been solved by variable reduction methods in the preprocessing and in the post-processing stage. Several data mining methods, like artificial neural networks, support vector machines and the k-nearest neighbor method, were also used to resolve the high-dimensional classification problem [54]. In this study, the high-dimensional classification problem was resolved by using decision tree methods, because only those attributes were considered which showed the highest relevance to the screening method (target class). The Hinselmann screening method showed high performance because Hinselmann is also a traditional method of screening for cervical cancer which is effective [55-57]. The performance of the biopsy screening method was slightly lower than the Hinselmann screening method. From various studies, it was also found that biopsy screening has a huge impact on cervical cancer detection [58, 59]. The use of the boosted decision tree was preferred because it focuses on misclassified instances and has a tendency to increase accuracy. Boosting is one way to decrease the misclassification rate because, inside boosting, iteration is introduced [60]. In general, this increases the degree of accuracy in classification. The boosted decision tree is an ensemble model in which the results from various models are consolidated; the outcome acquired from an ensemble model is normally superior to the outcome from any individual model. In this study, the maximum number of leaves per tree was 20 and the minimum number of leaves per tree was 10. The learning rate was set to 0.1, but processing time slightly increases […] slow when a large number of trees are made. These algorithms are fast to train but quite slow to create predictions once they are trained. The accuracy may increase when the number of
trees is also increased [64], but this also leads to a slower model for prediction. In most real-world applications the decision forest is fast enough, but in some situations run-time performance is important and other methods would be chosen. Decision forest was also used to understand protein interactions and to make predictions based on all the protein domains [65]. Other applications of decision forest were the prediction of different types of liver diseases, including alcoholic liver damage and liver cirrhosis [66]. Other than biomedical classification, the decision forest method was applied for academic data analysis [67] as well as classification and forecasting of chronic kidney disease [68]. Decision jungles were used for feature selection for images, with some modification to achieve efficient results with modest training time [69].

V. CONCLUSION

Nowadays, cervical cancer is a common disease and its screening often involves very time-consuming clinical tests. In this perspective, machine learning can deliver efficient methods to speed up the diagnosis procedure. In this research work, data mining methods, especially tree-based algorithms, enable sound prediction for cervical cancer patients. The imbalanced data set problem, in which cancerous patients were too few compared to non-cancerous patients, was resolved by using the SMOTE method. The prediction ability of the boosted decision tree, measured by the AUROC value, outperformed decision forest and decision jungle. The low AUROC values for the decision forest and decision jungle methods disqualified them as the best predictive classifiers. We believe that with the growing collection of cervical cancer patient data and the rapidly advancing methods for analyzing this data, we will begin to be able to identify the best screening method for cervical cancer patients, which will be informative for patient care. In future, this study can be used as a prototype to develop a healthcare system for cervical cancer patients.

REFERENCES

[9] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, "Global cancer statistics 2018: GLOBOCAN estimates of cancer incidence and mortality worldwide for 36 cancers in 185 countries," CA: A Cancer Journal for Clinicians, vol. 68, pp. 394-424, 2018.
[10] R. A. Kerkar, "Screening for cervical cancer: an overview."
[11] G. Guvenc, A. Akyuz, and C. H. Açikel, "Health belief model scale for cervical cancer and Pap smear test: psychometric testing," Journal of Advanced Nursing, vol. 67, pp. 428-437, 2011.
[12] M. T. Galgano, P. E. Castle, K. A. Atkins, W. K. Brix, S. R. Nassau, and M. H. Stoler, "Using biomarkers as objective standards in the diagnosis of cervical biopsies," The American Journal of Surgical Pathology, vol. 34, p. 1077, 2010.
[13] H. Ramaraju, Y. Nagaveni, and A. Khazi, "Use of Schiller's test versus Pap smear to increase detection rate of cervical dysplasias," International Journal of Reproduction, Contraception, Obstetrics and Gynecology, vol. 5, pp. 1446-1450, 2017.
[14] N. Jothi and W. Husain, "Data mining in healthcare - a review," Procedia Computer Science, vol. 72, pp. 306-313, 2015.
[15] P. Ahmad, S. Qamar, and S. Q. A. Rizvi, "Techniques of data mining in healthcare: a review," International Journal of Computer Applications, vol. 120, 2015.
[16] T. M. Alam and M. J. Awan, "Domain analysis of information extraction techniques," International Journal of Multidisciplinary Sciences and Engineering, vol. 9, pp. 1-9, 2018.
[17] C.-C. Chang, S.-L. Cheng, C.-J. Lu, and K.-H. Liao, "Prediction of recurrence in patients with cervical cancer using MARS and classification," International Journal of Machine Learning and Computing, vol. 3, p. 75, 2013.
[18] M. Kusy, B. Obrzut, and J. Kluska, "Application of gene expression programming and neural networks to predict adverse events of radical hysterectomy in cervical cancer patients," Medical & Biological Engineering & Computing, vol. 51, pp. 1357-1365, 2013.
[19] J. M. Yamal, M. Guillaud, E. N. Atkinson, M. Follen, C. MacAulay, S. B. Cantor, et al., "Prediction using hierarchical data: Applications for automated detection of cervical cancer," Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 8, pp. 65-74, 2015.
[20] K. Fernandes, D. Chicco, J. S. Cardoso, and J. Fernandes, "Supervised deep learning embeddings for the prediction of cervical cancer diagnosis," PeerJ Computer Science, vol. 4, p. e154, 2018.
[21] J. Kahng, E.-H. Kim, H.-G. Kim, and W. Lee, "Development of a cervical cancer progress prediction tool for human papillomavirus-…
[30] S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, "Data preprocessing for supervised learning," International Journal of Computer Science, vol. 1, pp. 111-117, 2006.
[31] S. Patro and K. K. Sahu, "Normalization: A preprocessing stage," arXiv preprint arXiv:1503.06462, 2015.
[32] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[33] Z. Zheng, Y. Cai, and Y. Li, "Oversampling method for imbalanced classification," Computing and Informatics, vol. 34, pp. 1017-1037, 2016.
[34] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Annals of Statistics, pp. 1189-1232, 2001.
[35] L. Rokach, "Decision forest: Twenty years of research," Information Fusion, vol. 27, pp. 111-125, 2016.
[36] A. Criminisi and J. Shotton, Decision Forests for Computer Vision and Medical Image Analysis: Springer Science & Business Media, 2013.
[37] J. Shotton, T. Sharp, P. Kohli, S. Nowozin, J. Winn, and A. Criminisi, "Decision jungles: Compact and rich models for classification," in Advances in Neural Information Processing Systems, 2013, pp. 234-242.
[38] D. Krstajic, L. J. Buturovic, D. E. Leahy, and S. Thomas, "Cross-validation pitfalls when selecting and assessing regression and classification models," Journal of Cheminformatics, vol. 6, p. 10, 2014.
[39] F. Garrido, W. Verbeke, and C. Bravo, "A robust profit measure for binary classification model evaluation," Expert Systems with Applications, vol. 92, pp. 154-160, 2018.
[40] M. Vihinen, "How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis," in BMC Genomics, 2012, p. S2.
[41] D. J. Hand, "Measuring classifier performance: a coherent alternative to the area under the ROC curve," Machine Learning, vol. 77, pp. 103-123, 2009.
[42] K. Hajian-Tilaki, "Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation," Caspian Journal of Internal Medicine, vol. 4, p. 627, 2013.
[43] C. Sun, A. J. Brown, A. Jhingran, M. Frumovitz, L. Ramondetta, and D. C. Bodurka, "Patient preferences for side effects associated with cervical cancer treatment," International Journal of Gynecological Cancer: Official Journal of the International Gynecological Cancer Society, vol. 24, p. 1077, 2014.
[44] International Collaboration of Epidemiological Studies of Cervical Cancer, "Cervical cancer and hormonal contraceptives: collaborative reanalysis of individual data for 16,573 women with cervical cancer and 35,509 women without cervical cancer from 24 epidemiological studies," The Lancet, vol. 370, pp. 1609-1621, 2007.
[45] G. Danaei, S. Vander Hoorn, A. D. Lopez, C. J. Murray, M. Ezzati, and
[47] S. de Sanjosé, M. Brotons, and M. A. Pavón, "The natural history of human papillomavirus infection," Best Practice & Research Clinical Obstetrics & Gynaecology, vol. 47, pp. 2-13, 2018.
[48] E. Mazarico, R. Gómez, L. Guirado, N. Lorente, and E. Gonzalez-Bosquet, "Relationship between smoking, HPV infection, and risk of cervical cancer," European Journal of Gynaecological Oncology, vol. 392, p. 2936, 2015.
[49] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, pp. 1-39, 2010.
[50] A. Franco-Arcega, L. Flores-Flores, and R. F. Gabbasov, "Application of decision trees for classifying astronomical objects," in 2013 12th Mexican International Conference on Artificial Intelligence (MICAI), 2013, pp. 181-186.
[51] K. Chitra and B. Subashini, "Data mining techniques and its applications in banking sector," International Journal of Emerging Technology and Advanced Engineering, vol. 3, pp. 219-226, 2013.
[52] N. Öcal, M. K. Ercan, and E. Kadıoğlu, "Predicting financial failure using decision tree algorithms: An empirical test on the manufacturing industry at Borsa Istanbul," International Journal of Economics and Finance, vol. 7, 2015.
[53] V. Pappu and P. M. Pardalos, "High-dimensional data classification," in Clusters, Orders, and Trees: Methods and Applications: In Honor of Boris Mirkin's 70th Birthday, F. Aleskerov, B. Goldengorin, and P. M. Pardalos, Eds. New York, NY: Springer, 2014, pp. 119-150.
[54] M. Zekić-Sušac, S. Pfeifer, and N. Šarlija, "A comparison of machine learning methods in a high-dimensional classification problem," Business Systems Research Journal, vol. 5, pp. 82-96, 2014.
[55] Y. Eraso, "Migrating techniques, multiplying diagnoses: the contribution of Argentina and Brazil to early 'detection policy' in cervical cancer," História, Ciências, Saúde-Manguinhos, vol. 17, pp. 33-51, 2010.
[56] M. Aref-Adib and T. Freeman-Wang, "Cervical cancer prevention and screening: the role of human papillomavirus testing," The Obstetrician & Gynaecologist, vol. 18, pp. 251-263, 2016.
[57] I. Löwy, "Cancer, women, and public health: the history of screening for cervical cancer," História, Ciências, Saúde-Manguinhos, vol. 17, pp. 53-67, 2010.
[58] P. Ghosh, G. Gandhi, P. Kochhar, V. Zutshi, and S. Batra, "Visual inspection of cervix with Lugol's iodine for early detection of premalignant & malignant lesions of cervix," The Indian Journal of Medical Research, vol. 136, p. 265, 2012.
[59] K. Petry, J. Horn, A. Luyten, and R. Mikolajczyk, "Punch biopsies shorten time to clearance of high-risk human papillomavirus infections of the uterine cervix," BMC Cancer, vol. 18, p. 318, 2018.
[60] A. Niculescu-Mizil and R. Caruana, "Obtaining calibrated probabilities from boosting."
[61] V. Athanasiou and M. Maragoudakis, "A novel, gradient boosting framework for sentiment analysis in languages where NLP resources are not plentiful: a case study for modern Greek," Algorithms, vol. 10, p. 34, 2017.
[62] S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, and N. Qureshi, "Can machine-learning improve cardiovascular risk prediction using routine clinical data?," PLoS ONE, vol. 12, p. e0174944, 2017.
[63] Z. Wei, W. Wang, J. Bradfield, J. Li, C. Cardinale, E. Frackelton, et al., "Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease," The American Journal of Human Genetics, vol. 92, pp. 1008-1012, 2013.
[64] S. Fong, W. Song, R. Wong, C. Bhatt, and D. Korzun, "Framework of temporal data stream mining by using incrementally optimized very fast decision forest," in Internet of Things and Big Data Analytics Toward Next-Generation Intelligence, Springer, 2018, pp. 483-502.
Primary Hepatoma, Liver Cirrhosis, and Cholelithiasis," Journal of Healthcare Engineering, vol. 2018, 2018.
[67] A. J. Fernández-García, L. Iribarne, A. Corral, and J. Criado, "A comparison of feature selection methods to optimize predictive models based on decision forest algorithms for academic data analysis," in World Conference on Information Systems and Technologies, 2018, pp. 338-347.
[68] W. Gunarathne, K. Perera, and K. Kahandawaarachchi, "Performance evaluation on machine learning classification techniques for disease classification and forecasting through data analytics for chronic kidney disease (CKD)," in 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), 2017, pp. 291-296.
[69] S. Baek, K. I. Kim, and T.-K. Kim, "Deep convolutional decision jungle for image classification," arXiv preprint arXiv:1706.02003, 2017.