Report MP
DEEP LEARNING BASED BREAST CANCER DIAGNOSIS SYSTEM
A mini project report submitted by
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
MARCH 2024
BONAFIDE CERTIFICATE
This is to certify that the project report entitled, “Deep Learning based Breast Cancer
Diagnosis System” is a bonafide record of Mini Project work done during the even semester of
the academic year 2023-2024 by
in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology
in Computer Science and Engineering of Karunya Institute of Technology and Sciences.
First and foremost, I praise and thank ALMIGHTY GOD whose blessings have bestowed
I am grateful to our beloved founders, the late Dr. D.G.S. Dhinakaran, C.A.I.I.B, Ph.D, and
Dr. Paul Dhinakaran, M.B.A, Ph.D, for their love and always remembering us in their prayers.
I extend my thanks to our Vice Chancellor Dr. Prince Arulraj, M.E., Ph.D., and our
Registrar Dr. Elijah Blessing, M.E., Ph.D, for giving me this opportunity to do the project.
I wish to extend my thanks to our Pro-Vice Chancellor (RC) Dr. E.J. James
and Dr. Ridling Margaret Waller, Pro-Vice Chancellor (QS), for providing me this opportunity
to do the project.
I would like to thank Dr. Ciza Thomas, Dean, School of Computer Science and
Technology, for his direction and invaluable support in completing the project.
I would like to place my heart-felt thanks and gratitude to Dr. Immanuel Johnraja,
M.E., Ph.D., Head of the Division, Computer Science and Engineering for his encouragement
and guidance.
I am pleased to be indebted to Mrs. Keirolona Safana Seles, M.Tech., Assistant
Professor, Division of CSE, for her invaluable support, advice and encouragement.
I also thank all the staff members of the Department for extending their helping hands to
I would also like to thank all my friends and my parents who have prayed and helped me
Acknowledgement
Abstract
1. Introduction
1.1 Introduction
1.2 Objectives
1.3 Motivation
1.4 Overview of the Project
1.5 Chapter wise Summary
2. Literature Survey
2.1 Classification of skin cancer with deep neural networks
2.2 Deep CNN for computer-aided detection
2.3 Image Augmentation for deep learning
2.4 Breast Cancer Detection by Fusing Multiple CNNs
2.5 Pattern Recognition
2.6 Summary
3. Proposed Architecture
4. Implementation
4.1 Modules Description
4.2 Implementation Details
4.3 Tools used
5. Test Results
5.1 Findings
5.2 Results
6. Conclusions and Further Scope
References
Breast cancer diagnosis remains a critical challenge in oncology, necessitating accurate and efficient
classification methods. In this project, we propose a deep learning-based approach for breast cancer
classification utilizing the Wisconsin Breast Cancer (WBC) dataset. The objective is to develop a
robust model capable of distinguishing between malignant and benign tumors. Drawing insights
from a comprehensive literature review, we integrate state-of-the-art convolutional neural networks
(CNNs) and techniques such as transfer learning, ensemble learning, attention mechanisms, and
adversarial training. These methodologies aim to enhance classification accuracy, robustness, and
generalization ability. Additionally, we investigate domain adaptation and self-supervised learning
strategies to address data scarcity issues and reduce dependency on medical imaging data. The
resulting model classifies breast tumors with an accuracy of 96.4%, providing valuable insights
into the tumor's malignancy without the need for direct medical imaging. The proposed
model holds promise for improving diagnostic precision and patient care in breast oncology,
particularly in scenarios where medical imaging data is limited or unavailable. Furthermore, the
findings and methodologies developed in this project have broader implications for the
advancement of machine learning-based healthcare solutions, facilitating more effective and
accessible diagnostic tools for breast cancer detection and treatment planning.
1.1 Introduction
Breast cancer is one of the most prevalent cancers affecting women worldwide, with timely and
accurate diagnosis being critical for effective treatment. Traditional methods of breast cancer
classification often rely on subjective interpretation, leading to variations in diagnostic accuracy.
In recent years, deep learning techniques, particularly convolutional neural networks (CNNs),
have emerged as promising tools for improving the accuracy and efficiency of breast cancer
diagnosis. Leveraging the power of CNNs, this project focuses on exploring their application for
breast cancer classification using the Wisconsin Breast Cancer (WBC) dataset. By integrating
advanced techniques such as transfer learning, ensemble learning, and attention mechanisms, we
aim to develop a robust model capable of accurately distinguishing between malignant and
benign tumors. The insights gained from this study have the potential to revolutionize breast
cancer diagnosis by providing clinicians with more reliable and objective tools for tumor
classification, ultimately leading to improved patient outcomes and personalized treatment
strategies.
1.2 Objectives
To Develop a Robust CNN Model: Develop a robust CNN model capable of accurately
distinguishing between malignant and benign breast tumors with high precision and recall.
To Address Class Imbalance: Address class imbalance issues within the dataset through
techniques such as oversampling, undersampling, or class-weighted loss functions to ensure
equitable learning across malignant and benign classes.
To Evaluate Model Generalization: Assess the generalization ability of the developed model by
evaluating its performance on unseen data, including cross-validation and testing on
independent datasets.
To Provide Insights for Clinical Application: Provide insights and recommendations for the
clinical application of the developed CNN model, including its integration into existing
diagnostic workflows and its potential impact on patient care and treatment planning.
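Two of the class-imbalance remedies named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: the helper names `balanced_class_weights` and `random_oversample` are hypothetical, the labels are toy values, and the weighting formula n_samples / (n_classes * class_count) is the common "balanced" heuristic (the same one scikit-learn uses for `class_weight="balanced"`).

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            # sample (with replacement) from the rows belonging to class c
            idx = rng.choice(np.flatnonzero(y == c), size=target - n, replace=True)
            X_parts.append(X[idx])
            y_parts.append(y[idx])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Toy labels: 0 = benign (majority), 1 = malignant (minority)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X = np.arange(16, dtype=float).reshape(8, 2)
print(balanced_class_weights(y))  # {0: 0.6666666666666666, 1: 2.0}
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))         # [6 6]
```

In Keras, a dictionary like the one returned by `balanced_class_weights` can be passed to `model.fit` through its `class_weight` argument, which scales each sample's loss by its class weight instead of duplicating rows.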
1.3 Motivation
The motivation behind this project stems from the transformative potential of deep learning
techniques, particularly convolutional neural networks (CNNs), in revolutionizing breast cancer
diagnosis. By harnessing the power of CNNs and advanced techniques such as transfer learning,
ensemble learning, and attention mechanisms, we aim to develop a robust model capable of
accurately distinguishing between malignant and benign breast tumors. Such a model holds
promise for improving diagnostic accuracy, facilitating early detection, and ultimately enhancing
patient outcomes in breast oncology.
Furthermore, the exploration of deep learning methodologies in breast cancer classification aligns
with the broader goal of advancing machine learning applications in healthcare. By developing
reliable and objective tools for tumor classification, this project contributes to the ongoing efforts
to enhance personalized medicine and improve the quality of care for breast cancer patients.
Ultimately, the motivation behind this project lies in its potential to make a meaningful impact on
breast cancer diagnosis and treatment, potentially saving lives and improving quality of life for
individuals.
1.4 Overview of the Project
Breast cancer is a significant health concern worldwide, necessitating accurate and timely diagnosis
for effective treatment. Traditional methods of breast cancer classification often rely on subjective
interpretation, leading to variations in diagnostic accuracy. This project aims to address this
challenge by leveraging the power of deep learning techniques, particularly convolutional neural
networks (CNNs), for breast cancer classification using the Wisconsin Breast Cancer (WBC)
dataset.
The project begins with an exploration of deep learning methodologies, focusing on understanding
CNN architectures and their applicability to medical image analysis. Subsequently, we preprocess
the WBC dataset, perform data augmentation, and address class imbalance issues to ensure robust
model training.
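The report does not spell out its preprocessing and augmentation steps, so the following is only a sketch under stated assumptions. It applies z-score standardization fitted on the training split alone, so that no test-set statistics leak into training. It then applies "jitter" augmentation, which appends copies of each row with small Gaussian noise. The choice of techniques, the helper names (`fit_standardizer`, `standardize`, `jitter_augment`) and the `noise_scale` value are all assumptions, not the project's documented pipeline.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and standard deviation on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardize(X, mean, std):
    """Scale features to zero mean and unit variance using training statistics."""
    return (X - mean) / std

def jitter_augment(X, y, copies=1, noise_scale=0.05, seed=0):
    """Append noisy copies of each (standardized) row; labels are repeated unchanged."""
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, noise_scale, size=X.shape) for _ in range(copies)]
    return np.concatenate([X, *noisy]), np.concatenate([y] * (copies + 1))

# Hypothetical training rows with two features (the second one is constant)
X_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
y_train = np.array([0, 1, 1])
mean, std = fit_standardizer(X_train)
Z = standardize(X_train, mean, std)
Z_aug, y_aug = jitter_augment(Z, y_train, copies=2)
print(Z_aug.shape, y_aug.shape)  # (9, 2) (9,)
```

The same `mean` and `std` would be reused unchanged on validation and test rows, and augmentation would be applied only to the training split.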
The core of the project involves developing a CNN model tailored for breast cancer classification.
We employ transfer learning techniques to leverage pre-trained models such as VGG or ResNet,
fine-tuning them on the WBC dataset to adapt to breast cancer classification tasks. Ensemble
learning approaches are explored to further enhance classification performance, combining the
strengths of multiple models for improved accuracy and generalization.
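The report does not say how the ensemble combines its members. One common option, assumed here, is soft voting: average each model's predicted probability of malignancy and threshold the average at 0.5. The probabilities below are made-up placeholders standing in for the outputs of, for example, fine-tuned VGG and ResNet models, and `soft_vote` is a hypothetical helper.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average the malignant-class probabilities predicted by several models."""
    probs = np.stack(prob_list)              # shape (n_models, n_samples)
    if weights is None:
        weights = np.ones(len(prob_list))    # equal vote for every model
    weights = np.asarray(weights, dtype=float)
    return weights @ probs / weights.sum()   # weighted mean per sample

# Hypothetical outputs of three models on four cases
p_a = np.array([0.90, 0.40, 0.20, 0.55])
p_b = np.array([0.80, 0.60, 0.10, 0.35])
p_c = np.array([0.70, 0.65, 0.30, 0.40])
p_ens = soft_vote([p_a, p_b, p_c])
labels = (p_ens >= 0.5).astype(int)          # 1 = malignant, 0 = benign
print(labels)  # [1 1 0 0]
```

In the second case two of the three models lean towards malignant, and the averaged probability (0.55) decides the label. In the fourth case a single model's borderline 0.55 is outvoted. Non-uniform `weights` could favour models that validate better.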
Furthermore, attention mechanisms are integrated into the CNN architecture to highlight important
features in the breast cancer images, aiding in accurate tumor detection and classification.
Adversarial training strategies are employed to enhance the model's robustness against noise and
artifacts commonly encountered in medical imaging data.
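The report does not name its adversarial-training method. As an assumption, the sketch below uses the Fast Gradient Sign Method (FGSM). It is applied to a plain logistic model rather than a CNN so that the input gradient can be written in closed form; in a real CNN, the framework's automatic differentiation would supply that gradient. The weights, inputs and `eps` value are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fgsm_logistic(X, y, w, b, eps):
    """FGSM for a logistic model p = sigmoid(X @ w + b).

    For binary cross-entropy, the gradient of the loss with respect to the
    input x is (p - y) * w, so each row is nudged by eps in the sign
    direction that increases its loss.
    """
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)

w = np.array([2.0, -1.0])
b = 0.0
X = np.array([[1.0, 0.5], [-0.5, 1.0]])
y = np.array([1.0, 0.0])
X_adv = fgsm_logistic(X, y, w, b, eps=0.1)
# Adversarial training would add (X_adv, y) to each training batch.
print(bce(y, sigmoid(X @ w + b)) < bce(y, sigmoid(X_adv @ w + b)))  # True
```

Training on the perturbed rows alongside the clean ones teaches the model to keep its prediction stable under small input changes of size `eps`.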
The developed CNN model undergoes rigorous evaluation to assess its performance on unseen data,
including cross-validation and testing on independent datasets. Comparative analysis against
baseline models and traditional machine learning algorithms validates the superiority of the CNN
model in breast cancer classification.
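The cross-validation step above is not detailed in the report. A plausible variant for an imbalanced benign/malignant dataset is stratified k-fold, in which every fold keeps roughly the overall class ratio. The plain NumPy sketch below makes the idea concrete. In practice scikit-learn's `StratifiedKFold` does the same job. `stratified_kfold_indices` is a hypothetical helper, and k = 4 is chosen only for the toy labels.

```python
import numpy as np

def stratified_kfold_indices(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs where each fold keeps roughly the class ratio of y."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        # deal this class's samples round-robin across the k folds
        for i, sample in enumerate(idx):
            folds[i % k].append(sample)
    all_idx = np.arange(len(y))
    for f in folds:
        test_idx = np.array(sorted(f))
        train_idx = np.setdiff1d(all_idx, test_idx)  # everything not in the test fold
        yield train_idx, test_idx

# Toy labels: 12 benign (0) and 8 malignant (1)
y = np.array([0] * 12 + [1] * 8)
for train_idx, test_idx in stratified_kfold_indices(y, k=4):
    print(np.bincount(y[test_idx]))  # each fold: [3 2]
```

The model would be trained once per fold on `train_idx` and scored on `test_idx`, and the k scores would then be averaged. Testing on an independent dataset remains a separate step.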
Ultimately, the project aims to provide valuable insights for clinical application, offering a reliable
and objective tool for breast cancer diagnosis that can potentially improve patient outcomes.
1.5 Chapter wise Summary
Introduction: The introduction chapter of our project lays the groundwork by emphasizing the
importance of utilizing Convolutional Neural Networks (CNNs) for breast cancer classification
using the Wisconsin Breast Cancer (WBC) dataset. It outlines the project's objectives and
motivations and provides an overview, offering a clear roadmap for the subsequent chapters.
Literature Survey: The literature review chapter provides a succinct overview of existing
research relevant to breast cancer classification using CNNs and similar datasets. It offers
context by identifying gaps, evaluating methodologies, and synthesizing findings within the
scholarly discourse, informing our project's rationale and approach.
Proposed Architecture:
The proposed architecture chapter delineates the structural framework for developing and
implementing a CNN model for breast cancer classification. It outlines data preprocessing
techniques, model architectures, and training strategies to be employed, serving as a blueprint for
organizing and executing the project's investigation into breast cancer classification.
Implementation: The implementation chapter delves into the practical details of our project's
execution. It includes descriptions of data preprocessing steps, model configurations, and
training procedures. Additionally, it provides insights into the development process, including
the rationale behind the chosen implementation techniques and tools.
Conclusions and Further Scope: The final chapter summarizes the key findings and conclusions
drawn from our project on breast cancer classification using CNNs and the WBC dataset. It also
discusses potential future avenues for research and development, paving the way for continued
exploration and innovation in this field of medical image analysis.
2.1 Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M.,
& Thrun, S. (2017). Dermatologist-level classification of skin cancer with
deep neural networks. Nature, 542(7639), 115-118.
Esteva et al. developed a deep neural network model capable of classifying skin cancer with a
level of accuracy comparable to dermatologists. The model achieved dermatologist-level
performance in classifying skin lesions as malignant or benign based on images. The findings
demonstrate the potential of deep learning techniques, specifically convolutional neural networks
(CNNs), in automating skin cancer diagnosis. The model's ability to accurately classify skin
lesions suggests its potential utility as a diagnostic aid for dermatologists, potentially improving
diagnostic accuracy and patient outcomes.
2.2 Shin, H. C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., ... & Summers,
R. M. (2016). Deep convolutional neural networks for computer-aided
detection: CNN architectures, dataset characteristics and transfer learning.
IEEE transactions on medical imaging, 35(5), 1285-1298.
Shin et al. investigated the application of deep convolutional neural networks (CNNs) for
computer-aided detection in medical imaging, evaluating them on thoraco-abdominal lymph node
detection and interstitial lung disease classification. They explored different CNN architectures,
dataset characteristics, and transfer learning strategies to enhance detection performance.
2.4 Wang, J., Yang, X., Cai, H., Tan, W., Jin, C., Li, L., ... & Ma, F. (2020).
Breast Cancer Detection Using Deep Learning Algorithms by Fusing
Multiple CNN Models. IEEE Access, 8, 23549-23558.
Wang et al. proposed a novel approach for breast cancer detection by fusing multiple CNN
models. They demonstrated that ensemble learning, which combines predictions from diverse
CNN architectures, can improve classification performance compared to individual models. The
findings suggest that ensemble learning techniques can enhance the accuracy and reliability of
breast cancer detection systems. By aggregating predictions from multiple CNN models,
researchers can develop more robust and effective diagnostic tools for breast cancer screening
and diagnosis.
2.5 Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A.
(2016). Learning deep features for discriminative localization. In
Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 2921-2929).
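The cited work introduces class activation mapping (CAM). For a network that ends in global average pooling followed by a linear classifier, the map for class c is the weighted sum of the last convolutional layer's feature maps, M_c(x, y) = sum_k w_k^c f_k(x, y), where w_k^c are that class's classifier weights. The NumPy sketch below implements only this formula. The feature maps and the weights of the "malignant" output unit are made-up toy arrays, and `class_activation_map` is a hypothetical helper.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM as in Zhou et al. (2016): M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: array (K, H, W) from the last convolutional layer.
    class_weights: array (K,) connecting the pooled features to class c.
    """
    # contract the channel axis K of both arrays, leaving an (H, W) map
    return np.tensordot(class_weights, feature_maps, axes=(0, 0))

# Two 2x2 feature maps and the weights for a hypothetical "malignant" output unit
f = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 2.0]]])
w_malignant = np.array([0.5, 1.0])
cam = class_activation_map(f, w_malignant)
print(cam)
```

Upsampled to the input resolution, such a map highlights the image regions that contributed most to the class score.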
2.6 Summary
This literature survey explores the application of deep learning techniques, particularly
convolutional neural networks (CNNs), in breast cancer classification. Studies demonstrate
CNNs' effectiveness in automating diagnosis, enhancing computer-aided detection, and
improving model robustness through data augmentation and ensemble learning. Additionally,
interpretability techniques such as Grad-CAM offer insights into CNN model predictions,
fostering trust in automated diagnostic systems. Overall, the survey underscores CNNs'
potential to revolutionize breast cancer diagnosis and treatment.
CNN Module:
This module builds, trains, and evaluates the convolutional neural network that classifies
breast tumors as benign or malignant.
Visualization Module:
The visualization module provides interactive tools for representing the dataset and the analysis
results. It utilizes libraries such as Matplotlib and Plotly to create scatter plots, heatmaps, box
plots, and training curves that aid in data interpretation and communication.
The implementation of the project "Deep Learning based Breast Cancer Diagnosis System"
involves several key steps to develop a robust and efficient system for classifying breast tumors:
Fig 4.2.1 (Implementation Process)
The model is compiled using binary cross entropy loss as the loss function and the Adam
optimizer. It is trained on the training data using the fit method, specifying the number of epochs
and batch size. Validation data is provided to monitor the model's performance during training
and prevent overfitting.
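The compile-and-fit code itself is not reproduced in the report. The sketch below is an assumption rather than the project's code, and it spells out what that step computes. It runs mini-batch training with binary cross-entropy loss and the Adam update rule, using Adam's usual default hyperparameters (beta1 = 0.9, beta2 = 0.999), and records a validation loss after each epoch. For brevity it trains a single logistic layer on synthetic data instead of a CNN. `train_with_adam`, `add_bias`, the learning rate and the data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def add_bias(X):
    """Append a column of ones so the bias is learned as an extra weight."""
    return np.hstack([X, np.ones((len(X), 1))])

def train_with_adam(X, y, X_val, y_val, epochs=30, batch_size=32, lr=0.01,
                    beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    Xb, Xvb = add_bias(X), add_bias(X_val)
    theta = np.zeros(Xb.shape[1])          # weights plus bias term
    m = np.zeros_like(theta)               # Adam first-moment estimate
    v = np.zeros_like(theta)               # Adam second-moment estimate
    t = 0
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        order = rng.permutation(len(Xb))   # reshuffle every epoch
        for start in range(0, len(Xb), batch_size):
            batch = order[start:start + batch_size]
            p = sigmoid(Xb[batch] @ theta)
            grad = Xb[batch].T @ (p - y[batch]) / len(batch)  # d(BCE)/d(theta)
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)   # bias correction
            v_hat = v / (1 - beta2 ** t)
            theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
        # per-epoch losses, analogous to the loss / val_loss curves Keras reports
        history["loss"].append(bce(y, sigmoid(Xb @ theta)))
        history["val_loss"].append(bce(y_val, sigmoid(Xvb @ theta)))
    return theta, history

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic, linearly separable labels
theta, hist = train_with_adam(X[:240], y[:240], X[240:], y[240:])
val_acc = np.mean((sigmoid(add_bias(X[240:]) @ theta) >= 0.5) == y[240:])
```

In Keras the same configuration corresponds to `model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])` followed by `model.fit(X_train, y_train, epochs=..., batch_size=..., validation_data=(X_val, y_val))`. A validation loss that rises while the training loss keeps falling is the usual sign of overfitting.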
Visualization:
Interactive visualization tools are employed to represent the dataset and the analysis results in
intuitive and informative ways. This includes creating scatter plots, heatmaps, box plots, and
training curves to aid in data interpretation and communication.
These implementation details provide a systematic approach to building, training, and evaluating
a CNN model for breast cancer classification, with the option for further optimization and
deployment for practical use.
Python:
Python serves as the primary programming language for implementing the project's functionalities.
It offers a rich ecosystem of libraries and tools for data manipulation, analysis, and visualization,
making it well-suited for exploratory data analysis (EDA) tasks.
Pandas:
Pandas is a powerful Python library for data manipulation and analysis. It provides data structures
such as DataFrames and Series, along with a wide range of functions for cleaning, filtering, and
transforming tabular data such as the WBC dataset.
17 | 26 P a g e Mini Project 2023-2024
NumPy:
NumPy is a fundamental library for numerical computing in Python. It provides support for multi-
dimensional arrays, mathematical functions, and linear algebra operations, which are essential for
handling the dataset's numerical features and performing statistical analysis.
TensorFlow:
A powerful deep learning framework used for building and training neural network models,
including CNNs.
Scikit-learn:
Scikit-learn is a popular Python library for machine learning and data mining. It offers a wide
range of algorithms and tools for clustering, classification, regression, and dimensionality
reduction, which can be applied to the breast cancer data for preprocessing, evaluation, and
comparison with baseline models.
Keras:
An intuitive high-level neural networks API that serves as the interface for building and training
models in TensorFlow.
These tools collectively provide a comprehensive environment for implementing, testing, and
evaluating the breast cancer classification model using CNNs, enabling efficient development
and analysis of machine learning solutions.
Accuracy and Sensitivity: The developed CNN model achieves an accuracy of 96.4%. This high
accuracy indicates that the model correctly distinguishes between benign and malignant tumors in
the large majority of cases, although accuracy alone does not quantify sensitivity (the proportion
of malignant tumors that are detected).
Robustness: The CNN model demonstrates robustness in distinguishing between benign and
malignant tumors, indicating its reliability and consistency in breast cancer diagnosis. Interest
requiring further investigation.
Clinical Impact: The CNN model's ability to accurately classify breast cancer cases has
significant clinical implications. It can assist healthcare professionals in making timely and
informed decisions, leading to improved patient outcomes and potentially saving lives.
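Because accuracy and sensitivity measure different things, reporting both is informative. The sketch below uses hypothetical toy labels (1 = malignant, 0 = benign), not the project's results. It computes accuracy, sensitivity and specificity from the four confusion-matrix counts, and shows how a model can score high accuracy on imbalanced data while still missing malignant cases. `binary_metrics` is a hypothetical helper.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on malignant = 1) and specificity (recall on benign = 0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # malignant, detected
    tn = np.sum((y_true == 0) & (y_pred == 0))  # benign, correctly cleared
    fp = np.sum((y_true == 0) & (y_pred == 1))  # benign, flagged as malignant
    fn = np.sum((y_true == 1) & (y_pred == 0))  # malignant, missed
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }

# Toy example: 8 benign and 2 malignant cases; the model misses one malignant case
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Here the accuracy is 0.9 while the sensitivity is only 0.5. For a diagnostic aid, the missed malignant case (a false negative) is usually the costlier error.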
A box plot (or boxplot) is a technique used in descriptive statistics to visually represent the
location, dispersion, and skewness of numerical data through its quartiles. The central
rectangular box represents the interquartile range (IQR) of the dataset, which spans from the
first quartile (Q1) to the third quartile (Q3). The length of the box indicates the spread of the
middle 50% of the data. The line inside the box represents the median (Q2) of the dataset. From
our dataset, we determined that the range lies between 1 and 6.
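The quantities a box plot draws can be computed directly. The sketch below uses NumPy's `percentile` (default linear interpolation) on a hypothetical list of feature values, not the project's data. `box_plot_stats` is a hypothetical helper.

```python
import numpy as np

def box_plot_stats(values):
    """Return the quartiles and interquartile range that a box plot draws."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {"Q1": float(q1), "median": float(median), "Q3": float(q3), "IQR": float(q3 - q1)}

# Hypothetical values of one feature (not the project's data)
print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# {'Q1': 3.0, 'median': 5.0, 'Q3': 7.0, 'IQR': 4.0}
```

Matplotlib's `boxplot` draws the same box. Note that different quartile conventions (interpolation methods) can shift Q1 and Q3 slightly for small samples.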
Fig 5.2.4 (Model Loss)
Chapter 6
Conclusions and Further Scope
In conclusion, the use of Convolutional Neural Networks (CNNs) for breast cancer
classification has proven highly accurate in discerning malignancies. Through the
implementation of CNN models, we achieved an accuracy of 96.4% in classifying breast cancer
cases. The developed CNN model performs strongly in distinguishing between benign and
malignant tumors, showcasing its potential as a powerful tool for precise breast cancer diagnosis.
By leveraging the ability of CNNs to extract intricate patterns from medical images, our model not
only enhances diagnostic protocols but also holds promise in improving patient prognosis. The
robustness of the model is evident from its ability to accurately classify tumors, thereby assisting
clinicians in making informed decisions regarding patient treatment and management strategies.
Future Scope:
Despite the promising results achieved in this study, there are several avenues for future
exploration and enhancement:
Real-Time Diagnosis: Developing real-time diagnostic tools that integrate our CNN model into
clinical workflows could facilitate rapid and accurate breast cancer diagnosis, enabling timely
interventions and improving patient outcomes.
Clinical Validation: Conducting extensive clinical validation studies to assess the model's
performance across diverse patient populations and healthcare settings is essential for ensuring
its reliability and generalizability in real-world scenarios.
Ethical Considerations: Addressing ethical considerations such as data privacy, bias mitigation,
and equitable access to healthcare services is crucial to ensure the responsible deployment and
adoption of AI-driven diagnostic tools in clinical practice.