Survey on Supervised Machine Learning in the Diagnosis and Detection of Breast Cancer
Authors: Ashish, Devvrat, Yogesh, Indian Institute Of Information Technology Kota, Rajasthan.

Guide: Dr. Veena Khandelwal Indian Institute Of Information Technology Kota, Rajasthan.

Abstract: Breast cancer stands as one of the most prevalent and life-threatening diseases
affecting women worldwide. It is the leading cause of cancer-related deaths among
women, emphasizing the critical need for early and accurate detection. Timely diagnosis
of breast cancer significantly improves patient outcomes, as early-stage cancers are not
only more treatable but also have a higher survival rate. However, traditional diagnostic
methods, such as mammography, ultrasound, and biopsy, often face challenges like inter-
observer variability, limited sensitivity in certain populations, and high costs associated
with repeated screenings. These limitations underscore the need for advanced tools to
aid clinicians in providing precise and efficient diagnoses. In this context, supervised
machine learning has emerged as a transformative technology in medical diagnostics.
Leveraging vast amounts of labeled data, supervised learning algorithms are capable of
identifying intricate patterns in medical datasets, such as imaging studies, genetic
information, and patient records. These patterns can be too subtle for human observers to
detect consistently, so models that capture them can help healthcare professionals make
data-driven decisions with improved accuracy and reliability.

1. INTRODUCTION
This research survey aims to explore the recent advancements in the application of
supervised machine learning techniques for breast cancer detection and diagnosis. The
focus includes analyzing various algorithms, such as Support Vector Machines (SVM),
Random Forests, K-Nearest Neighbors (KNN), and Convolutional Neural Networks (CNN),
that have been extensively utilized to build predictive models. These models are trained
on diverse datasets, ranging from imaging modalities like mammograms to structured
clinical records, to accurately classify and predict the presence of breast cancer. By
systematically reviewing technical articles from reputable sources, this survey highlights
the methodologies, datasets, and evaluation metrics employed in existing studies.
Furthermore, it discusses the performance and limitations of different algorithms,
shedding light on their role in clinical applications. Through this analysis, the survey seeks
to provide a comprehensive understanding of how machine learning can enhance
diagnostic accuracy, reduce false positives and negatives, and ultimately contribute to
better patient outcomes. The potential of machine learning in breast cancer research is
immense, promising a future where AI-driven tools could personalize treatment plans,
reduce diagnostic errors, and alleviate the burden on healthcare systems. This survey
underscores the importance of continued research and innovation in this field, with the
ultimate goal of improving the lives of millions of women globally.

Breast cancer is the most common cancer in women worldwide and ranks as the second most
common cancer overall. In 2022 alone, approximately 2.3 million women were diagnosed
with breast cancer, and the disease caused about 670,000 deaths globally. Remarkably,
about half of these cases occur in women with no specific risk factors other than age and
sex. The disease's impact highlights the urgency for advanced detection and treatment
methods. Early detection plays a critical role in survival. The 5-year relative survival
rate for breast cancer in the U.S. is 91%. For localized breast cancer the rate is 99%,
and for regional breast cancer it is 86%. For distant breast cancer, however, the 5-year
relative survival rate drops to just 31%. These statistics underscore the importance of
timely diagnosis and intervention, and the role that cutting-edge technologies like
supervised machine learning can play in improving outcomes for patients.[1]

2. SURVEY
2.1 RANGE AND DISTRIBUTION
The survey conducted for this research involved an in-depth analysis of 75 articles
selected from a total pool of 471 articles available on IEEE Xplore. These articles were
chosen based on three primary search parameters: "Machine Learning," "Supervised,"
and "Breast Cancer." The selected articles represent a diverse range of studies published
between 2010 and 2024, reflecting significant advancements in the application of
supervised machine learning in the field of breast cancer diagnosis and detection. The
chosen range of articles highlights the evolving methodologies and innovations in
supervised learning techniques over this period. This timeline captures not only the
historical progression but also the growing sophistication in the tools, algorithms, and
datasets used for breast cancer detection and diagnosis.[2]

2.2 COMMON METHODS


The surveyed articles were further classified by the supervised learning methods
used to build their machine learning models. The classification shows which algorithms
are most prevalent, each suited to particular challenges in breast cancer diagnosis and
detection. K-Nearest Neighbors (KNN) emerged as the most frequently used method,
appearing in all 75 articles, primarily because it delivers reliable results by
classifying each new sample according to the labels of the most similar known cases.
Other commonly used algorithms included Decision Tree Classifiers (DTC), Random Forests,
Support Vector Machines (SVM), Logistic Regression, Multilayer Perceptrons (MLP),
Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Naive Bayes
classifiers. Fig 2.1 and Table 2.1 show how often each algorithm appears across the
surveyed studies.[3][4][5][6][7][8][9][10][11][12][13][14][15][16]
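
To make the nearest-neighbour idea concrete, the following is a minimal sketch of KNN
classification in plain Python. The feature values, the labels, and the function name
knn_predict are hypothetical and chosen only for illustration; they are not taken from
any of the surveyed articles, which typically rely on library implementations.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Keep the k closest samples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (e.g. scaled tumour radius and texture).
train_X = [(0.2, 0.3), (0.3, 0.2), (0.25, 0.35),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]

prediction = knn_predict(train_X, train_y, query=(0.82, 0.88), k=3)
print(prediction)
```

In this sketch k is fixed at 3; in practice k is usually tuned on held-out data, and
features are scaled first so that no single measurement dominates the distance.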

Fig 2.1 The graph illustrates the distribution of methods employed across 75 research articles to predict
the presence of breast cancer. Each method is represented on the x-axis, while the y-axis indicates the
frequency of its occurrence (number of articles using the method). The graph highlights how often each
method appears, showing which methods are most commonly utilized and which are less frequent.

Method                 Number of articles
DTC                    65
Random Forest          44
MLP                    15
ANN                    32
Logistic Regression    68
KNN                    75
SVM                    60
CNN                    38
Naive Bayes            24

Table 2.1 demonstrates the distribution of machine learning methods employed across 75 research
articles to predict the presence of breast cancer. Each row lists a specific method (e.g., DTC, Random
Forest, SVM) alongside the corresponding number of occurrences (frequency).
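
The counts in Table 2.1 can also be read as shares of the 75 surveyed articles. The short
sketch below does this arithmetic, assuming each count is the number of articles that use
the method; the variable names are illustrative. Because the counts sum to 421, an article
evaluates about 5.6 methods on average under that assumption.

```python
# Counts copied from Table 2.1; the survey covers 75 articles in total.
TOTAL_ARTICLES = 75
method_counts = {
    "DTC": 65,
    "Random Forest": 44,
    "MLP": 15,
    "ANN": 32,
    "Logistic Regression": 68,
    "KNN": 75,
    "SVM": 60,
    "CNN": 38,
    "Naive Bayes": 24,
}

# Fraction of surveyed articles that use each method.
shares = {method: count / TOTAL_ARTICLES for method, count in method_counts.items()}

# Rank methods from most to least frequently used.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
for method, share in ranked:
    print(f"{method:<20} {share:6.1%}")

# Average number of methods evaluated per article.
methods_per_article = sum(method_counts.values()) / TOTAL_ARTICLES
```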

2.3 DATASETS USED
A critical component of this survey involved categorizing the articles
based on the datasets used for training and testing machine learning models. The studies
utilized a wide variety of datasets, ranging from raw to pre-processed versions,
depending on the specific requirements of the model. Notable datasets include:
• Wisconsin Breast Cancer Dataset (WBCD)[2][4]: One of the most extensively used
datasets due to its simplicity and accessibility.
• UCI Machine Learning Repository Datasets[7][13][14]: Widely adopted for training
models in academic research.
• NKI and FIGO[12][13][14]: Datasets focused on patient genetic profiles and clinical
stages, respectively.
• RSNA and MIAS[5][6][18]: Imaging datasets utilized for mammography analysis.
• DDSM and SEER[11][17]: Large-scale datasets used for advanced diagnostic techniques
and population-based studies.
While many datasets were used across studies, only a few were used consistently enough
to allow comparison between studies, because preprocessing methods and feature
engineering techniques varied from one article to the next.

Fig 2.2 The figure illustrates the distribution of datasets employed across the studies reviewed in this
survey. It categorizes the datasets based on their application in training and testing machine learning
models for breast cancer prediction.

2.4 MODEL ACCURACY
The effectiveness of the models was evaluated based on their
performance in detection and diagnosis tasks. Models achieving high accuracy in these
tasks are particularly valuable in clinical settings, where precision is paramount.
• Detection Accuracy: Among the reviewed articles, Gradient Boosting Classifier
(GBC) models reported the highest performance in detecting breast cancer,
achieving an accuracy of 99%. This indicates that the model identified cancerous
patterns with few errors on the datasets studied.
• Diagnosis Accuracy: Most supervised learning models, including SVM, Random Forest,
and ANN, reported accuracies ranging between 97% and 98%. These results
highlight the reliability of machine learning algorithms in diagnosing breast
cancer when trained on well-curated datasets.
The analysis also revealed the potential for improving accuracy through tailored
preprocessing techniques, balanced train-test splits, and algorithm optimization. The
findings from this survey provide valuable insights into the current state of machine
learning in breast cancer diagnostics and lay the groundwork for future research in this
domain.
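
One way to read the balanced train-test splits mentioned above is as stratified splits,
in which the training and test sets keep the same proportion of benign and malignant
cases. That reading is an assumption rather than something the surveyed articles state.
The sketch below shows an 80:20 stratified split in plain Python; the function name
stratified_split and the label counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in both parts."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 60 benign and 40 malignant samples.
labels = ["benign"] * 60 + ["malignant"] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

test_malignant = sum(labels[i] == "malignant" for i in test_idx)
print(len(train_idx), len(test_idx), test_malignant)
```

Splitting per class rather than over the whole dataset keeps a rare class from being
under-represented in the test set by chance.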

3. SCOPE
On the Basis of the Articles: What Does the Future Hold?
Supervised learning has become
an indispensable part of medical diagnostics, offering transformative potential in cancer
detection and classification. With its ability to construct predictive models from labelled
datasets, supervised learning algorithms analyse complex patterns within data—such as
medical imaging, patient records, and genomic information—to assist clinicians in
diagnosing diseases like breast cancer. This technological integration not only enhances
the precision of cancer detection but also fosters a deeper understanding of the
underlying factors contributing to the disease. The application of supervised learning is
rapidly evolving, paving the way for more sophisticated models that are capable of
handling diverse data formats and complex relationships. With continued advancements
in computational power and data availability, supervised learning is poised to
revolutionize the landscape of medical diagnostics, making earlier and more accurate
breast cancer detection accessible to broader populations.
3.1 STRONG STATISTICS IN FAVOR
The potential of supervised machine learning in breast
cancer detection and diagnosis is underscored by several key factors:
3.1.1. Improvement in Diagnostic Accuracy
Supervised learning algorithms have
demonstrated high accuracy in breast cancer diagnostics. For instance, Multilayer
Perceptron (MLP) models, when trained with an 80:20 training-to-testing data split, have
achieved an accuracy of 99.12%. Similarly, algorithms like Support
Vector Machines (SVM) and Random Forests deliver consistently high performance, with
accuracy rates exceeding 97% across various datasets. This improvement in diagnostic
accuracy is not merely theoretical but has practical implications. With accuracy at these
levels, few cases are misclassified, so the models can reduce both false negatives and
false positives, supporting timely and accurate detection of breast cancer, which is
critical for improving patient outcomes.
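
Accuracy on its own does not show how errors divide between false negatives and false
positives, so it can help to report sensitivity and specificity alongside it. The sketch
below computes all three from predicted and true labels. The function name
diagnostic_metrics and the example labels are hypothetical, and the example is
deliberately imbalanced to show that a model can reach 95% accuracy while missing half of
the malignant cases.

```python
def diagnostic_metrics(y_true, y_pred, positive="malignant"):
    """Compute accuracy, sensitivity and specificity for a binary task.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of malignant cases detected
        "specificity": tn / (tn + fp),  # share of benign cases correctly cleared
    }


# Hypothetical test set: 10 malignant and 90 benign cases.
y_true = ["malignant"] * 10 + ["benign"] * 90
# The model finds only 5 of the 10 malignant cases and clears every benign case.
y_pred = ["malignant"] * 5 + ["benign"] * 5 + ["benign"] * 90

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```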
3.1.2. Versatility of Algorithms
A distinguishing feature of supervised learning is the
versatility of its algorithms, which can be tailored to analyse different types of medical
data. For example:
• SVM: Widely used for classifying medical images, such as mammograms, due to its
ability to handle both linear and non-linear relationships within the data.
• Random Forests: Particularly effective for structured data, including patient
records, genetic profiles, or clinical test results.
• Multilayer Perceptron (MLP) and Logistic Regression: Often utilized in predictive
modelling for assessing the likelihood of breast cancer based on multiple clinical and
demographic factors.
The ability of these algorithms to adapt to varying data types makes them invaluable
tools in the diverse field of medical diagnostics, particularly for a multifaceted
condition like breast cancer.
3.1.3. Handling High-Dimensional Data
Breast cancer diagnosis involves interpreting
complex, high-dimensional datasets that span across multiple modalities, including
imaging (e.g., mammography, ultrasound), genomic sequences, and clinical parameters.
Supervised learning excels in this domain by leveraging dimensionality reduction
techniques and feature selection methods to identify the most critical attributes while
discarding irrelevant noise. For instance, models trained on high-dimensional datasets
can analyse intricate relationships between tumour size, shape, and density in imaging
data, or identify significant genetic mutations that contribute to breast cancer. By doing
so, these models not only improve diagnostic accuracy but also enhance the
interpretability of results, enabling clinicians to make more informed decisions.
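
As an illustration of feature selection, the sketch below ranks features by the absolute
Pearson correlation between each feature and the diagnosis label and keeps the top k.
This is one simple filter-style approach among many, not the method of any particular
surveyed article. The feature names, the values, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_features(columns, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: 1 = malignant, 0 = benign.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "radius": [1.0, 1.2, 1.1, 2.0, 2.2, 2.1],   # tracks the label closely
    "texture": [0.5, 0.9, 0.4, 0.8, 0.6, 0.7],  # weakly related
    "noise": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],    # constant, uninformative
}

selected = select_top_features(columns, target, k=2)
print(selected)
```

A correlation filter only captures linear, one-feature-at-a-time relationships; methods
such as principal component analysis or model-based importance scores address cases where
features matter only in combination.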

3.2 FUTURE OUTLOOK
The integration of supervised learning into medical diagnostics is
just the beginning of a larger revolution in personalized healthcare. As these algorithms
become more sophisticated, the future holds promising advancements, such as:
• Real-Time Diagnostics: Immediate and accurate analysis of patient data, enabling faster
intervention.
• Integration with Other AI Technologies: Synergy with unsupervised and reinforcement
learning techniques to uncover hidden patterns and optimize treatment strategies.
• Personalized Medicine: Tailoring treatment plans based on an individual’s unique
genetic and clinical profile, informed by supervised learning models.

4. CONCLUSION
Supervised machine learning has emerged as a transformative tool in breast cancer
detection and diagnosis, offering high accuracy, versatility, and the ability to
handle high-dimensional medical data. Models such as Multilayer Perceptron (MLP),
Support Vector Machines (SVM), and Random Forests consistently report accuracies above
97% in the surveyed studies, improving diagnostic precision and enabling early detection. The
adaptability of these algorithms to diverse data types, ranging from imaging to genomic
profiles, underscores their potential in addressing the multifaceted challenges of breast
cancer diagnostics. As research continues to advance, supervised learning is poised to
revolutionize healthcare by enabling real-time diagnostics, fostering personalized
treatment plans, and integrating seamlessly with other AI technologies. However,
challenges like data privacy, interpretability, and bias need to be addressed to ensure
equitable and reliable applications. In conclusion, supervised learning represents a
powerful ally in the fight against breast cancer, driving progress toward more accurate,
accessible, and personalized healthcare solutions, ultimately saving lives and improving
patient outcomes on a global scale.

References
1. https://www.who.int/news-room/fact-sheets/detail/breast-cancer
2. https://ieeexplore.ieee.org/search/searchresult.jsp?queryText=Machine%20Learning&highlight=true&returnType=SEARCH&matchPubs=true&searchWithin=Supervised%20Learning&searchWithin=Breast%20Cancer&returnFacets=ALL&ranges=2010_2024_Year
3. P. Singh, J. Nagill and K. Saini, "Using Supervised Learning for Breast Cancer
Detection using AI&ML," 2023 5th International Conference on Advances in
Computing, Communication Control and Networking (ICAC3N), Greater Noida,
India, 2023, pp. 281-285, doi: 10.1109/ICAC3N60023.2023.10541492.
4. M. Gupta and B. Gupta, "A Comparative Study of Breast Cancer Diagnosis Using
Supervised Machine Learning Techniques," 2018 Second International Conference
on Computing Methodologies and Communication (ICCMC), Erode, India, 2018, pp.
997-1002, doi: 10.1109/ICCMC.2018.8487537.
5. Anshuman and U. Kumar, "Machine Learning model for detection of Breast
Cancer," 2021 5th International Conference on Information Systems and Computer
Networks (ISCON), Mathura, India, 2021, pp. 1-4, doi:
10.1109/ISCON52037.2021.9702416.
6. A. Kumar, R. Patra and A. Ghosh, "Model Selection for Predicting Breast Cancer
using Supervised Machine Learning Algorithms," 2020 IEEE 1st International
Conference for Convergence in Engineering (ICCE), Kolkata, India, 2020, pp. 320-
324, doi: 10.1109/ICCE50343.2020.9290578.
7. M. Akhil and P. V. S. Kumar, "Breast Cancer Prognosis using Machine Learning
Applications," 2022 4th International Conference on Advances in Computing,
Communication Control and Networking (ICAC3N), Greater Noida, India, 2022, pp.
488-493, doi: 10.1109/ICAC3N56670.2022.10074517.
8. A. Bah and M. Davud, "Analysis of Breast Cancer Classification with Machine
Learning based Algorithms," 2022 2nd International Conference on Computing
and Machine Intelligence (ICMI), Istanbul, Turkey, 2022, pp. 1-4, doi:
10.1109/ICMI55296.2022.9873696.
9. P. C. Chhipa, R. Upadhyay, G. G. Pihlgren, R. Saini, S. Uchida and M. Liwicki,
"Magnification Prior: A Self-Supervised Method for Learning Representations on
Breast Cancer Histopathological Images," 2023 IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2023, pp. 2716-2726,
doi: 10.1109/WACV56688.2023.00274.
10. I. Koç, W. Tashan, I. Shayea and A. Zhetpisbayeva, "Breast Cancer Detection Based
on Machine Learning," 2024 IEEE 13th International Conference on
Communication Systems and Network Technologies (CSNT), Jabalpur, India, 2024,
pp. 1-6, doi: 10.1109/CSNT60213.2024.10545785.
11. K. Shilpa, T. Adilakshmi and K. Chitra, "Applying Machine Learning Techniques To
Predict Breast Cancer," 2022 Second International Conference on Interdisciplinary
Cyber Physical Systems (ICPS), Chennai, India, 2022, pp. 17-21, doi:
10.1109/ICPS55917.2022.00011.

12. V. L. K. Vasista, K. Sona, J. Pedarla, B. Sahithi, T. K. R. K. Rao and K. B. Prakash,
"Predicting Breast Cancer Using Classical Machine Learning and Deep Learning
Algorithms," 2023 International Conference on Intelligent and Innovative
Technologies in Computing, Electrical and Electronics (IITCEE), Bengaluru, India,
2023, pp. 988-991, doi: 10.1109/IITCEE57236.2023.10090883.
13. P. P. Sengar, M. J. Gaikwad and A. S. Nagdive, "Comparative Study of Machine
Learning Algorithms for Breast Cancer Prediction," 2020 Third International
Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India,
2020, pp. 796-801, doi: 10.1109/ICSSIT48917.2020.9214267.
14. P. Sathiyanarayanan, S. Pavithra, M. Sai Saranya and M. Makeswari,
"Identification of Breast Cancer Using The Decision Tree Algorithm," 2019 IEEE
International Conference on System, Computation, Automation and Networking
(ICSCAN), Pondicherry, India, 2019, pp. 1-6, doi: 10.1109/ICSCAN.2019.8878757.
15. M. R. Ahmed, M. A. Ali, J. Roy, S. Ahmed and N. Ahmed, "Breast Cancer Risk
Prediction based on Six Machine Learning Algorithms," 2020 IEEE Asia-Pacific
Conference on Computer Science and Data Engineering (CSDE), Gold Coast,
Australia, 2020, pp. 1-5, doi: 10.1109/CSDE50874.2020.9411572.
16. R. Kumar, M. Chaudhry, H. K. Patel, N. Prakash, A. Dogra and S. Kumar, "An
Analysis of Ensemble Machine Learning Algorithms for Breast Cancer Detection:
Performance and Generalization," 2024 11th International Conference on
Computing for Sustainable Global Development (INDIACom), New Delhi, India,
2024, pp. 366-370, doi: 10.23919/INDIACom61295.2024.10498618.
17. D. Ghadge, S. Hon, T. Saraf, T. Wagh, A. Tambe and Y. S. Deshmukh, "Analysis on
Machine Learning-Based Early Breast Cancer Detection," 2024 4th International
Conference on Innovative Practices in Technology and Management (ICIPTM),
Noida, India, 2024, pp. 1-5, doi: 10.1109/ICIPTM59628.2024.10563587.
18. D. Mitra, N. Sharma, M. Rashid and R. Singh, "Classification Rules based Breast
Cancer Detection using Machine Learning Approach," 2022 5th International
Conference on Contemporary Computing and Informatics (IC3I), Uttar Pradesh,
India, 2022, pp. 1274-1278, doi: 10.1109/IC3I56241.2022.10072832.
