Prediction of Cervical Cancer Using Machine Learning and Deep Learning Algorithms
Volume 4 Issue 6, September-October 2020 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD33378 | Volume – 4 | Issue – 6 | September-October 2020 Page 426
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
Random Forest Classifier (RF), Support Vector Machine
(SVM) and Neural networks, persons with cervical cancer
will be classified. This can reduce the number of further
tests a patient must undergo.
TARGET VARIABLES
The target variables are the results of four tests. Each is
coded as '1' or '0' in our data set, where '1' represents a
malignant tumor and '0' indicates a benign tumor.
They are detailed below.
Fig 2 - Count of Target Variables vs Age
SCHILLER
In this test, Schiller's iodine solution is applied to the cervix
under direct vision. Normal cervical mucosa contains
glycogen and stains brown, whereas abnormal or cancer-affected
areas do not take up the stain. The abnormal areas
can then be biopsied and examined histologically. Schiller's
iodine has the same composition as Lugol's iodine, but
Lugol's iodine is more concentrated. When Schiller's
iodine is not available, Lugol's iodine can be used as an
alternative. Schiller's test is not specific for cervical cancer,
as areas of inflammation and keratosis may also fail to take up
the stain.
CYTOLOGY
Cytology is the medical and scientific study of cells. It is the
branch of pathology that diagnoses diseases and conditions
through the examination of tissue samples from the body.
Cytological examinations are performed on body fluids
(for example urine, blood and cerebrospinal fluid) or on
material that is aspirated (drawn out via suction into a
syringe) from the body. Cervical screening is performed using
cervical cytology (Pap test), a human papillomavirus
(HPV) test, or a combination of the two tests.
HINSELMANN
Colposcopic examination was almost impossible to perform
because the focal distance was no more than 80 mm.
The Hinselmann test attempts to solve this problem by
pulling out the uterine cervix. This procedure makes the
examined part anemic (blanched), which can bias the final
result, and a small amount of blood may also leak. Besides
that, the patient can feel pain if the portio is held with thin
forceps.
BIOPSY
Cervical biopsy is a procedure that removes tissue from the
cervix to test for abnormal or precancerous conditions, or
cervical cancer. Cervical biopsies can be done in several
ways; in each, a tissue sample is removed for testing.
C. Feature Engineering And Data Visualization
During the feature engineering process, the data are classified
by measurement level into numerical data and categorical data.
The target variables are handled separately.
Numerical data are represented as numbers; features that
cannot be grouped are classified as numerical data.
Categorical data describe categories or groups, including
answers to yes-or-no questions.
Fig 3 - Heatmap Correlation: Numerical & Categorical Data
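The split by measurement level described in the feature engineering step can be sketched as follows. This is a minimal illustration, not the authors' code: the rule that a feature whose observed values are only 0 and 1 (a yes/no answer) is categorical, the column names and the values are all hypothetical. The second helper computes a Pearson correlation matrix of the kind a heatmap such as Fig 3 would display, assuming Pearson correlation was used.

```python
import numpy as np

def split_by_measurement_level(columns):
    """Split feature names into (numerical, categorical).

    Hypothetical rule: a feature whose observed values are only 0 and 1
    (answers to a yes/no question) is categorical; every other feature is
    numerical. Missing values (NaN) are ignored when inspecting a column.
    """
    numerical, categorical = [], []
    for name, values in columns.items():
        observed = {float(v) for v in values if not np.isnan(v)}
        if observed <= {0.0, 1.0}:
            categorical.append(name)
        else:
            numerical.append(name)
    return numerical, categorical

def correlation_matrix(columns, names):
    """Pearson correlation between the named features (one row per feature)."""
    data = np.array([columns[name] for name in names], dtype=float)
    return np.corrcoef(data)

# Illustrative toy data, not values from the study.
columns = {
    "Age": [18, 25, 34, 45, 52],
    "Number of sexual partners": [1, 2, 3, 2, 4],
    "Smokes": [0, 1, 0, 1, 1],
}
numerical, categorical = split_by_measurement_level(columns)
corr = correlation_matrix(columns, numerical + categorical)
print(numerical, categorical)
print(np.round(corr, 2))
```

The resulting matrix could then be drawn with a plotting function such as seaborn's `heatmap`.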
E. Build Models
Modelling was done on the original data after default data
cleaning, with scaling applied where necessary. For all the
classifier algorithms, the dataset is split into 75% for training
and 25% for testing. Feature selection is performed and the
models are built on the selected features.
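A minimal NumPy sketch of the 75%/25% split and of scaling. It assumes a random (non-stratified) shuffle with a fixed seed and z-score standardization fitted on the training part only; the paper does not state which scaler or seed was used, and the function names are hypothetical. Like scikit-learn's `train_test_split(X, y, test_size=0.25)`, it rounds the test size up.

```python
import math
import numpy as np

def split_75_25(X, y, seed=0):
    """Shuffle the rows and hold out 25% of them (rounded up) for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = math.ceil(0.25 * len(X))
    test, train = order[:n_test], order[n_test:]
    return X[train], X[test], y[train], y[test]

def standardize(X_train, X_test):
    """Z-score scaling with mean and standard deviation taken from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std

# Toy data: 20 rows, 2 features, 3 positive cases (illustrative only).
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 17 + [1] * 3)
X_train, X_test, y_train, y_test = split_75_25(X, y)
X_train_s, X_test_s = standardize(X_train, X_test)
print(X_train.shape, X_test.shape)  # (15, 2) (5, 2)
```

With labels as imbalanced as these, passing `stratify=y` to scikit-learn's `train_test_split` would keep the class ratio equal in both parts; whether the paper's split was stratified is not stated.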
Logistic regression, SVM, Decision tree, Random forests and
Deep neural network (DNN) were the algorithms used.
F. Model Evaluation and Testing
The conventional evaluation methods do not accurately
measure model performance when faced with imbalanced
datasets. In particular, when working with imbalanced data,
accuracy is not an appropriate measure of model performance.
To fully evaluate the effectiveness of our model, we must
examine precision and recall as well.
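Precision, recall and F1-score follow directly from a confusion matrix. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes) and uses the two confusion matrices reported in this section; the function name is hypothetical.

```python
def per_class_metrics(cm):
    """Return (accuracy, {class: (precision, recall, f1)}) for a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    report = {}
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum: all predicted as k
        actual = sum(cm[k])                          # row sum: all truly k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[k] = (precision, recall, f1)
    return accuracy, report

dnn_cm = [[192, 0], [20, 3]]
rf_cm = [[165, 1], [15, 1]]
for name, cm in (("DNN", dnn_cm), ("Random Forest", rf_cm)):
    accuracy, report = per_class_metrics(cm)
    print(name, f"accuracy={accuracy:.2f}")
    for k, (p, r, f) in report.items():
        print(f"  class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For the DNN matrix this reproduces the reported values: accuracy 0.91; precision, recall and F1 of 0.91, 1.00 and 0.95 for class 0 and 1.00, 0.13 and 0.23 for class 1. For the Random Forest matrix shown, class-1 recall is 1/16 (about 0.06) while accuracy is 0.91, which is the kind of gap the accuracy caveat refers to.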
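Model | Accuracy | Precision (class 0) | Precision (class 1) | Recall (class 0) | Recall (class 1) | F1-score (class 0) | F1-score (class 1)
Deep Neural Networks (DNN) | 91% | 0.91 | 1.00 | 1.00 | 0.13 | 0.95 | 0.23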
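The conclusion credits Random Forest with SMOTE, so the resampled data mentioned in this section presumably comes from SMOTE oversampling. The sketch below is a minimal NumPy version of the SMOTE idea (each synthetic minority row is a random point on the segment between a minority sample and one of its k nearest minority neighbours), assuming purely numeric features and a binary target; it is not the authors' implementation, and the function names are hypothetical. In practice the `SMOTE` class of the imbalanced-learn library, with its `fit_resample` method, would typically be used. Oversampling should be applied to the training split only, so the test set keeps the real class ratio.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows by SMOTE-style interpolation."""
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)  # cannot use more neighbours than other minority samples
    # Pairwise Euclidean distances between minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours per sample
    base = rng.integers(0, n, size=n_new)  # minority sample each new row starts from
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

def oversample(X, y, minority=1, k=5, seed=0):
    """Balance a binary training set by appending synthetic minority rows."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = smote(X_min, n_new, k=k, seed=seed)
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority)])

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 2))      # illustrative features
y_train = np.array([0] * 17 + [1] * 3)  # 3 positives out of 20
X_res, y_res = oversample(X_train, y_train)
print(np.bincount(y_res))  # [17 17]
```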
We obtain a reasonable accuracy and recall rate for the Random
Forest model with both the original data and the resampled data,
and for the DNN. However, resampling is not applicable to the DNN.
Confusion Matrix for Random Forest:
Accuracy: 0.9120879120879121
[[165   1]
 [ 15   1]]
Confusion Matrix for DNN:
[[192   0]
 [ 20   3]]
The confusion matrices show that both Random Forest and
DNN are able to identify the affected people at a reasonable
rate compared to the other algorithms.
Hence, we conclude that Random Forest with SMOTE and the
DNN are the best overall models so far for predicting the cancer
indicator when all four target variables (Biopsy, Cytology,
Hinselmann and Schiller) are combined and classified as a
single multi-class target variable.
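The conclusion refers to combining the four target variables into one cancer indicator, but the combination rule is not given in this section. A minimal sketch, assuming one of two plausible encodings: the number of positive tests (a 0 to 4 multi-class label) or a binary flag that is 1 when any test is positive. The column names follow the test names used in the paper; the helper and the records are hypothetical.

```python
# Hypothetical column names: the four test results, each coded 1 (positive) or 0 (negative).
TARGETS = ("Hinselmann", "Schiller", "Cytology", "Biopsy")

def combine_targets(record):
    """Return (number of positive tests, any-positive indicator) for one patient record."""
    positives = sum(int(record[name]) for name in TARGETS)
    return positives, int(positives > 0)

records = [
    {"Hinselmann": 0, "Schiller": 0, "Cytology": 0, "Biopsy": 0},
    {"Hinselmann": 0, "Schiller": 1, "Cytology": 0, "Biopsy": 1},
    {"Hinselmann": 1, "Schiller": 1, "Cytology": 1, "Biopsy": 1},
]
for record in records:
    print(combine_targets(record))  # (0, 0), then (2, 1), then (4, 1)
```

The count keeps more information for a multi-class model, while the binary flag matches the two-class confusion matrices reported for Random Forest and the DNN.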