
2019-05 Machine Learning Techniques For Detecting and Predicting Breast Cancer

This document discusses using machine learning techniques to detect and predict breast cancer. It introduces the challenges of analyzing mammography images and prognostic data for breast cancer. Different machine learning classification algorithms like random forests, support vector machines, naive Bayes, decision trees, k-nearest neighbors, logistic regression and artificial neural networks are described for categorizing breast cancer patient data and detecting features in mammograms.


International Journal of Innovative Technology and Exploring Engineering (IJITEE)

ISSN: 2278-3075, Volume-8 Issue-7 May, 2019

Machine Learning Techniques for Detecting and Predicting Breast Cancer
Rati Shukla, Vikash Yadav, Parashu Ram Pal, Pankaj Pathak

Rati Shukla, GIS Cell, MNNIT Allahabad
Vikash Yadav, Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad
Parashu Ram Pal, Department of Information Technology and Engineering, ABES Engineering College, Ghaziabad
Pankaj Pathak, Symbiosis International (Deemed University), Pune, Maharashtra

Revised Manuscript Received on May 06, 2019

Abstract: Breast cancer causes a huge number of deaths every year, partly because of ineffective screening and inappropriate classification methods. Breast cancer is not a single homogeneous disease: it differs greatly among different categories of patients and even within each individual tumor. Classifying cancer patients into risk classes such as high, medium and low using machine learning methodologies has opened many research directions in life science data. Machine learning is therefore a very useful methodology for studying and modelling the different classes of cancer development and prognosis. Machine learning methods are powerful and effective tools for key feature extraction and classification from complex cancer data sets. In this study, we examine the applicability of different machine learning classification techniques employed in the prediction and prognosis of breast cancer.

Index Terms: Breast Cancer, Classification, Neural Network, Support Vector Machine, Cancer Susceptibility

I. INTRODUCTION

Breast cancer develops in the cells of the breast. The first signs of breast cancer are a lump in the breast and an abnormal mammogram. Early warning signs include any change in nipple size or shape, nipple retraction, abnormal bloody discharge and changes in temperature. The rate of breast cancer is much higher in developed countries than in developing countries. Researchers suggest that lifestyle factors (an unbalanced diet, physical inactivity) and psychological problems such as excessive stress and sadness affect the quality of life of cancer patients. To select effective treatments for chronic diseases such as breast cancer, it is essential to weigh carefully the risks and benefits of each treatment [1]. Machine learning approaches are an efficient way to categorize cancer patient data on the basis of different symptoms, and many researchers have applied machine learning to classify various biological datasets [2][3].

Breast cancer has become one of the major causes of death among women [4]. Mammography is one of the most effective techniques for identifying breast cancer symptoms, but the quality of mammography images is often very poor, so mammograms are not easy to interpret. The most important abnormalities in breast cancer are calcifications and masses. Microcalcifications appear as high-frequency components, together with noise, against a lower-frequency background.

Fig. 1: Representation of Cancerous Tissues and Fatty Tissues

Correct segmentation of each microcalcification is challenging because microcalcifications vary in shape and size and because of high-frequency noise in the superimposed surrounding tissue [4]. Prognosis is an important topic that physicians and cancer patients often find complicated to discuss, and providing prognostic details is one way to improve patient understanding [6][7]. Pathological approaches for determining tumor status and symptoms are useful for breast cancer prediction and prognosis. Machine learning prediction techniques have greatly benefited the health care industry by offering different classes of monitoring tools aimed at improving the protection and wellness programs of cancer patients [8]. Medical databases are complex to handle, and several studies have investigated them. Real-time medical data must be pre-processed, and medical data with many features can be improved with an effective feature selection algorithm that handles both binary and multi-class data [9].

The main goal of ArrayExpress is to serve as a repository for high-quality gene expression data and experimental protocols [11]. One of the important dimensions of bioinformatics research is the analysis of coherent patterns in gene expression data [12][13]. Microarray techniques are very valuable and powerful, so it is important to extract the greatest value from microarray data, especially from larger microarray sample series [14]. Many fields, such as decision-making, finance and medical research, can use multi-relational classification for their purposes [16][17][19][26].

Published By:
Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number G5859058719/19©BEIESP 2658

II. PROBLEM AND CHALLENGES


Constructing efficient and accurate classifiers for biological applications is one of the leading challenges in machine learning [4]. Computational biologists are increasingly needed to handle and help interpret the large amount of data that is steadily accumulating in genomic studies [10]. Computing techniques for analyzing gene and protein expression, and high-throughput data in the form of pictures and images, are becoming important for understanding diseases and for future drug discovery [15]. Classification of gene expression using machine learning is a research area that presents new challenges because of the unique characteristics of the problem [18]. The main challenges in gene expression classification using machine learning are the huge number of gene expression features, the identification of relevant features, the noise inherent in the dataset, and classification accuracy and reliability.

III. CLASSIFIER FOR BREAST CANCER PREDICTION


Machine learning uses data mining algorithms to find patterns in large datasets. A number of breast cancer prediction models have been developed and deployed using statistical and machine learning techniques. Artificial intelligence has taken a prominent place in the scientific and technical development community [21]. Effective machine learning classification techniques for extracting information from massive volumes of healthcare data include Random Forest (RF), Support Vector Machines (SVM), Naive Bayes Classifier (NBC), Decision Trees (DT), K-Nearest Neighbor (KNN), Logistic Regression and Artificial Neural Networks (ANN). ML methods are generally used to reveal hidden correlations between diseases and gene expression. Timely detection and correct investigation of the problem using ML techniques can help doctors save the lives of cancer patients. Despite the varied appearance and complexity of tumors, machine-learning-based brain tumor detection methods achieve satisfactory accuracy; a high degree of accuracy is required because human life is involved. A number of ML techniques have been applied widely to characterize disease diagnosis and prediction. With machine learning, physicians can assess the clinical performance of anticancer drugs for individual patients by transferring knowledge obtained from cell-line data. For example, a classification model can be built from gene expression measurements of samples from patients who have cancer in the left, right or both lobes of the prostate, using these locations as classes. Classifying different cancer cases with standard machine learning strategies to find relevant genes is a valuable and effective way of analysis.

A. Naive Bayes Classifier (NBC)
NBC is an efficient machine learning algorithm for classification problems based on Bayes' theorem, with the assumption that the predictors are independent. In this probabilistic approach to classification, the relationship between the input features and the class is expressed as probabilities.

Fig. 2: Machine Learning Classifier to Classify Cancerous Features

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where $P(A \mid B)$ is the conditional probability of event A occurring given that event B is true; $P(A)$ and $P(B)$ are the probabilities of occurrence of events A and B respectively; and $P(B \mid A)$ is the conditional probability of event B occurring given that event A is true.

B. Logistic Regression (Predictive Learning Model)
Logistic regression is a supervised machine learning approach for building prediction models. It generalizes the idea of linear regression to situations where the outcome variable is categorical, and it mostly focuses on binary classification. A linear model of the probability,

$$P(Y=1) = \alpha_0 + \alpha_1 X$$

can give a predicted probability $P(Y=1)$ above 1 or below 0. Logistic regression therefore passes the same linear predictor through the logistic function,

$$P(Y=1) = \frac{1}{1 + e^{-(\alpha_0 + \alpha_1 X)}}$$

so that, as a predictive learning model, it always outputs a value between 0 and 1.

C. Support Vector Machines (SVM)
SVM is nowadays one of the most effective supervised machine learning classifiers, used for classification or regression problems in the medical sciences. Support vector machines are based on two key concepts: selecting a hyperplane that separates the two classes, and maximizing the margin, i.e. the distance between the hyperplane and the nearest data points. A hyperplane is one that separates a set


of data points with different class memberships. A schematic example is shown in Fig. 2, where the objects belong to either class BLUE or class BLACK [22][23].
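To make the hyperplane-and-margin idea concrete, the sketch below trains a linear SVM with the Pegasos sub-gradient method on a few made-up 2-D points. The training method, the function names `train_linear_svm` and `predict`, the regularization value `lam`, and the toy data are all assumptions for illustration; the paper does not specify an implementation, and real microarray data would normally be handled with an established SVM library and possibly a kernel.

```python
"""Minimal linear SVM sketch (Pegasos sub-gradient training).

Assumptions, not taken from the paper: linear kernel, labels in {-1, +1},
bias handled by appending a constant feature, toy 2-D data.
"""
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the last weight acts as the bias term.
    return list(x) + [1.0]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by cycling over the data."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            xa = _augment(xi)
            violates = yi * _dot(w, xa) < 1.0     # inside the margin or misclassified?
            # Shrink w (gradient of the regulariser, which widens the margin) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and pull w toward y_i * x_i for every margin violator.
            if violates:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xa)]
    return w


def predict(w, x):
    # The separating hyperplane is w . (x, 1) = 0; its sign gives the class.
    return 1 if _dot(w, _augment(x)) >= 0 else -1


# Toy data: class +1 (say "BLACK") upper right, class -1 (say "BLUE") lower left.
X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])                 # → [1, 1, 1, 1, -1, -1, -1, -1]
print(predict(w, (4, 4)), predict(w, (-4, -4)))   # → 1 -1
```

Each step shrinks the weight vector and nudges it toward any point that falls inside the margin, which is the hinge-loss form of the maximum-margin objective described above.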

Fig. 3: Support Vector Machines (SVM)

The main idea behind SVM-based classification is to create a decision boundary between the data points of the two classes, with the widest possible margin, that enables labels to be predicted. SVM is very helpful in handling classification tasks for high-dimensional and sparse microarray data.

D. Decision Trees (DT)
A DT recursively divides cancer patient data into different classes based on feature values. Decision trees provide an efficient tool for classification and prediction in the life sciences [24][25]. A machine learning perspective can also be used to assess the activity of abnormal pathways in tumors, which can help to identify hidden responders.

Fig. 4: Decision tree for breast cancer diagnosis

E. Artificial Neural Networks (ANN)
An ANN is a computational model inspired by the function and structure of the biological nervous system. The idea behind an ANN is a set of interconnected neurons. A neural network has three main components used for transformation: the input layer, the hidden layer and the output layer. The neurons are connected by edges, and each edge between neurons (input, hidden, output) is associated with a value called a weight.

Fig. 5: Deep Learning Neural Network

F. K-Nearest Neighbor (KNN)
KNN is one of the simplest algorithms: it stores all existing cases and classifies new cases based on a similarity measure. It is a simple non-parametric classification method that finds the k nearest neighbors of a case in a reference set and takes a majority vote among them. "Nearest" can be measured with the Euclidean distance method (EDM):

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $x$ and $y$ are n-dimensional feature vectors.

Fig. 6: Cancerous Tumor Classification Using KNN Algorithm

IV. PREDICTION OF BREAST CANCER USING ML TECHNIQUES

Breast cancer can reappear in the same place months or years after treatment; this is called recurrent breast cancer. Breast cancer that has returned and spread to other parts of the patient's body, such as the liver, brain or lungs, is called metastasis or distant recurrence. Patients with metastatic breast cancer commonly feel angry, scared, stressed, outraged and depressed. Local recurrence is usually found on a diagnostic mammogram (an X-ray examination of the breast), during a physical examination by a cancer specialist, or when the patient notices a change in the breast. The surgeon removes the tumor, which is then examined by a pathologist and tested for hormone receptor status. If the physical examination findings are abnormal, pathological results can confirm or rule out metastasis. Common symptoms include discomfort, nipple discharge, an inverted nipple, visible growth, a new breast shape or size, crusting of the nipple, lumps, shortness of breath, weight loss and bone pain. Machine learning classification and feature selection methods are applied before building models for predicting cancer recurrence.
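As a rough illustration of applying feature selection before building a recurrence model, the sketch below ranks features with a Fisher score, keeps the top k, and classifies a new record with k-nearest neighbours using the Euclidean distance from Section F. The Fisher-score filter, the helper names (`fisher_score`, `select_top_k`, `project`, `euclidean`, `knn_predict`), the value of k, and the toy records are assumptions; the numbers are made up and are not clinical data, and the paper does not state which selection method or classifier is used for recurrence prediction.

```python
import math
from collections import Counter


def fisher_score(column, labels):
    """(mean_a - mean_b)^2 / (var_a + var_b) for a two-class problem."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    a, b = groups.values()  # assumes exactly two classes
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)  # epsilon avoids /0


def select_top_k(X, labels, k):
    """Return the sorted indices of the k features with the highest score."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(row, cols):
    """Keep only the selected feature columns of one record."""
    return [row[j] for j in cols]


def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(X, labels, query, k=3):
    """Majority vote among the k training records closest to the query."""
    nearest = sorted(range(len(X)), key=lambda i: euclidean(X[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy records: 3 numeric features; label 1 = recurrence, 0 = none.
X = [
    [1.0, 5.0, 0.9], [1.2, 1.0, 1.1], [0.8, 3.0, 1.0],  # label 0
    [3.0, 4.0, 3.1], [3.2, 2.0, 2.9], [2.8, 0.5, 3.0],  # label 1
]
labels = [0, 0, 0, 1, 1, 1]

kept = select_top_k(X, labels, k=2)          # feature 1 overlaps across classes
X_sel = [project(row, kept) for row in X]
query = project([2.9, 1.0, 3.2], kept)       # a new, hypothetical record
print(kept)                                  # → [0, 2]
print(knn_predict(X_sel, labels, query))     # → 1
```

Here feature 1 takes overlapping values in both classes, so the filter discards it and the distance computation uses only the two informative features.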


Efficient feature selection can directly reduce the number of original features by selecting a subset of them that still retains the information needed for classification. Combining recent machine learning techniques with the expertise of cancer surgeons and scientists on breast cancer recurrence is a major future trend in computational biology and appears to give accurate results.

V. DISEASE PROGNOSIS

The success of a disease prognosis relies entirely on the quality of the underlying medical diagnosis, but prognosis is more than a simple diagnostic investigation. The prognostic accuracy of a machine-learning-based approach can be analyzed against that of established clinical predictors and visual assessments. Cancer prognosis is mainly concerned with three predictive tasks: susceptibility, recurrence and survival. Among the various machine learning algorithms, SVM provides more accurate results. Gene mutation profiles of cancer patients have been used effectively with unsupervised ML methods to find clinically distinct subgroups of breast cancer patients. A related study on detecting melanoma skin cancer using high-level features of skin lesions can also be consulted [27].

VI. CONCLUSION

In summary, this study addresses the importance and effective use of machine learning techniques for analyzing different classes of breast cancer and reducing the mortality rate. Various machine learning techniques are available for analyzing biological data related to breast cancer, and ML is actively involved in addressing breast-cancer-related complications. Good eating habits and lifestyle influence breast-cancer-related risks. The use of machine learning classifiers in cancer prediction and prognosis can decrease mortality. The studies analyzed focus on developing efficient and accurate predictive models using supervised machine-learning classification algorithms. Applying machine learning classification techniques such as ANN, NBC, DT, CNN and SVM to feature selection, to the study of multi-dimensional biological data and to data integration is a good resource for understanding breast cancer prediction and prognosis.

REFERENCES

1. Simes, R. J. (1985). Treatment selection for cancer patients: application of statistical decision theory to the treatment of advanced ovarian cancer. Journal of Chronic Diseases, 38(2), 171-186.
2. Asri, H., Mousannif, H., Al Moatassime, H., & Noel, T. (2016). Using machine learning algorithms for breast cancer risk prediction and diagnosis. Procedia Computer Science, 83, 1064-1069.
3. Maclin, P. S., Dempsey, J., Brooks, J., & Rand, J. (1991). Using neural networks to diagnose cancer. Journal of Medical Systems, 15(1), 11-19.
4. Malvia, S., Bagadi, S. A., Dubey, U. S., & Saxena, S. (2017). Epidemiology of breast cancer in Indian women. Asia Pacific Journal of Clinical Oncology.
5. Gangnon, R. E., Stout, N. K., Alagoz, O., Hampton, J. M., Sprague, B. L., & Trentham-Dietz,
6. (2018). Contribution of breast cancer to overall mortality for US women. Medical Decision Making, 38(1_suppl), 24S-31S.
7. McCarthy, J. F., Marx, K. A., Hoffman, P. E., Gee, A. G., O'Neil, P., Ujwal, M. L., & Hotchkiss, J. (2004). Applications of Machine Learning and High Dimensional Visualization in Cancer Detection, Diagnosis, and Management. Annals of the New York Academy of Sciences, 1020(1), 239-262.
8. Hagerty, R. G., Butow, P. N., Ellis, P. M., Dimitry, S., & Tattersall, M. H. N. (2005). Communicating prognosis in cancer care: a systematic review of the literature. Annals of Oncology, 16(7), 1005-1053.
9. Nithya, B., & Ilango, V. (2017, June). Predictive analytics in health care using machine learning tools and techniques. In 2017 International Conference on Intelligent Computing and Control Systems (ICICCS) (pp. 492-499). IEEE.
10. Vanaja, S., & Kumar, K. R. (2014). Analysis of feature selection algorithms on classification: a survey. International Journal of Computer Applications, 96(17).
11. Cohen, J. (2004). Bioinformatics—an introduction for computer scientists. ACM Computing Surveys (CSUR), 36(2), 122-158.
12. Parkinson, H., Sarkans, U., Shojatalab, M., Abeygunawardena, N., Contrino, S., Coulson, R., & Lilja, P. (2005). ArrayExpress—a public repository for microarray gene expression data at the EBI. Nucleic Acids Research, 33(suppl_1), D553-D555.
13. Jiang, D., Pei, J., & Zhang, A. (2005). An interactive approach to mining gene expression data. IEEE Transactions on Knowledge and Data Engineering, 17(10), 1363-1378.
14. Korenberg, M. J. (Ed.). (2007). Microarray data analysis: methods and applications (Vol. 377). Springer Science & Business Media.
15. Zhang, Y., & Rajapakse, J. C. (2009). Machine learning in bioinformatics (Vol. 4). John Wiley & Sons.
16. Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
17. Yin, X., Han, J., Yang, J., & Yu, P. S. (2006). CrossMine: Efficient classification across multiple database relations. In Constraint-Based Mining and Inductive Databases (pp. 172-195). Springer, Berlin, Heidelberg.
18. Yin, X., Han, J., Yang, J., & Yu, P. S. (2006). Efficient classification across multiple database relations: A CrossMine approach. IEEE Transactions on Knowledge and Data Engineering, 18(6), 770-783.
19. Lu, Y., & Han, J. (2003). Cancer classification using gene expression data. Information Systems, 28(4), 243-268.
20. Menden, M. P., Iorio, F., Garnett, M., McDermott, U., Benes, C. H., Ballester, P. J., & Saez-Rodriguez, J. (2013). Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE, 8(4), e61318.
21. Agrawal, R., & Srikant, R. (1994, September). Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB (Vol. 1215, pp. 487-499).
22. Kuo, C. Y., Yu, L. C., Chen, H. C., & Chan, C. L. (2018). Comparison of models for the prediction of medical costs of spinal fusion in Taiwan Diagnosis-Related Groups by machine learning algorithms. Healthcare Informatics Research, 24(1), 29-37.
23. Jiang, Y., Xie, J., Han, Z., Liu, W., Xi, S., Huang, L., & Yu, J. (2018). Immunomarker Support Vector Machine Classifier for Prediction of Gastric Cancer Survival and Adjuvant Chemotherapeutic Benefit. Clinical Cancer Research, clincanres-0848.
24. Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.
25. Elsayad, A. M., & Elsalamony, H. A. (2013). Diagnosis of breast cancer using decision tree models and SVM. International Journal of Computer Applications, 83(5).
26. Elsalamony, H. A., & Elsayad, A. M. (2013). Bank Direct Marketing Based on Neural Network and C5.0 Models. International Journal of Engineering and Advanced Technology (IJEAT), 2(6).
27. Yadav, V., & Kaushik, D. V. (2018). A Study on Automatic Early Detection of Skin Cancer. International Journal of Advanced Intelligence Paradigms (IJAIP), ISSN online: 1755-0394, ISSN print: 1755-0386, Vol. 12, No. 3/4, pp. 392-399, March 2019, U.K., DOI: 10.1504/IJAIP.2018.10015438.
28. Yadav, V., & Kaushik, D. V. (2018). Detection of Melanoma Skin Disease by Extracting High Level Features for Skin Lesions. International Journal of Advanced Intelligence Paradigms (IJAIP), ISSN online: 1755-0394, ISSN print: 1755-0386, Vol. 11, Nos. 3/4, pp. 397-408, September 2018, U.K., DOI: 10.1504/IJAIP.2018.10015438.


AUTHORS PROFILE
Rati Shukla received her BSc in 2006, MCA degree in
2009 from U.P. Technical University Lucknow and MTech
(Computer Science and Engineering) degree in 2014 from
Motilal Nehru National Institute of Technology, Allahabad
(U.P. India). She is pursuing her PhD at GIS Cell Motilal
Nehru National Institute of Technology, Allahabad (U.P.
India). She worked as a Guest Faculty at the Department of
Computer Science and Engineering, Motilal Nehru National Institute of
Technology, Allahabad (U.P. India) from 2010 to 2012. Her areas of interest
are genetic algorithms and data structures.

Dr. Vikash Yadav received his B.Tech (Computer
Science & Engineering) degree in 2009 from Dr.
Ambedkar Institute of Technology for Handicapped,
Kanpur (U.P. India), M.Tech (Software Engineering)
degree in 2013 from Motilal Nehru National Institute of
Technology, Allahabad (U.P. India) and Ph.D (Computer
Science & Engineering) degree from Dr. A.P.J Abdul Kalam University
(Formerly U. P. Technical University) Lucknow, (U.P. India) in 2017 in the
field of Image Processing. He is currently working as an Assistant Professor in
the Department of Computer Science & Engineering, ABES Engineering
College, Ghaziabad, India, and has more than 7 years of Teaching/Research
experience and published more than 30 research papers in various
National/International Conferences/Journals. He is also a reviewer of various
SCI/SCIE/Scopus indexed journals. His areas of interest include Data
Structure, Data Mining, Image Processing and Big Data Analytics.

Dr. Parashu Ram Pal obtained his Master's degree and Ph.D. in 1998 and 2010 respectively. He is working as a Professor in the Department of Information Technology, ABES Engineering College, Ghaziabad, India. His areas of interest are DBMS, Data Mining, Automata Theory, Computer Graphics and Computer Architecture. He has published more than 30 research papers in various international and national journals and conferences. He has been devoted to education, research and development for more than twenty years and always tries to create a proper environment for imparting quality education with the spirit of service to humanity. He believes in motivating staff and students to achieve excellence in education and research.

Dr. Pankaj Pathak obtained his Master's degree and Ph.D. in 2005 and 2014 respectively. He is working as an Assistant Professor in Symbiosis Institute of Telecom Management. His areas of interest are Data Mining, AI, and Smart Technologies. He has published several research papers in the areas of Data Mining, IoT security and Speech Recognition Technology.
