
PROJECT REPORT

Benign and Malignant Colon Cancer Prediction using Deep CNN models

Name: Nithish S
Reg. No: 953621243039
Company Name: Hera Diagnostics, Rajapalayam
Time Frame: 26/06/2023 to 04/08/2023

1. Project Description
Diagnosing a patient with cancer usually involves multiple steps: non-invasive tests for initial screening, followed by invasive procedures to confirm the initial findings. Both stages are time consuming, and the invasive procedures in particular, such as biopsies (removing a piece of tissue, muscle, fluid, or blood from around the affected region, usually through surgery), take a long time. After a sample is removed from the patient, it is placed on a slide, stained with chemicals such as hematoxylin and eosin, and left to dry. The slides are then analyzed manually by a pathologist, which takes hours. The pathologist typically looks for features such as cell polarity, cell size and shape, invasion into the stroma, mitotic figures, chromatin content, and nuclear-to-cytoplasmic ratio.
● Problem Statement: The main purpose of the project is to reduce the effort of pathologists, who spend hours analyzing slide samples.
● Goals: Develop an AI model that augments the diagnosis process rather than taking over the pathologist's job.
● Dataset: The dataset is an open-source collection that contains only images and has no annotations. Each image has a dimension of 768x768 pixels. The data is divided into two equally sized categories, 1. Benign and 2. Adenocarcinoma, with 5000 images in each category.

2. Methodology
We use AI methods, specifically CNN (Convolutional Neural Network) based algorithms, for precise classification of cells. These algorithms learn directly from images by extracting features and identifying the patterns underlying the input data. In addition to a CNN trained from scratch, applying transfer learning with pre-trained models such as VGG16 or ResNet can speed up training and improve accuracy by leveraging features learned from large image datasets. This combination of established CNN architectures and transfer learning provides a robust foundation for our AI-assisted cell classification system. A minimal sketch of the transfer-learning setup is shown below.
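To make the approach concrete, the following is a minimal transfer-learning sketch in Keras/TensorFlow, assuming 224x224 RGB inputs and a frozen VGG16 base with a small binary classification head. The layer sizes, dropout rate, and learning rate are illustrative assumptions, not the exact settings used in the project.

```python
# A minimal transfer-learning sketch (assumed configuration, for illustration).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load the VGG16 convolutional base pre-trained on ImageNet, without its classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained features for the first training phase

# Add a small classification head for the two classes: benign vs. adenocarcinoma.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # binary output
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Freezing the base and training only the new head first is a common way to avoid destroying the pre-trained features before any optional fine-tuning.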
Day-wise Tasks Completed During the Internship Period

Day 1 (26/06/2023)
• Project kickoff involves officially starting the project, discussing goals, and clarifying the scope.
• Define the objectives clearly, including what the AI model should achieve.
• Understanding the dataset requirements entails identifying the dataset source, format, size, and any specific preprocessing needs.

Day 2 (27/06/2023)
• Collecting and organizing the dataset involves acquiring the dataset you'll be working with.
• Ensure that you have a balanced distribution of benign and adenocarcinoma images to prevent bias in the model.
• Organize the data in a structured manner for ease of access during later stages.

Day 3 (28/06/2023)
• On this day, I wrote code using image-processing libraries (such as OpenCV or PIL) to resize each image to a predefined dimension, such as 224x224 pixels (a sketch follows this entry).
• Ensure that the aspect ratio is maintained during resizing to prevent distortion.
• This consistency in image dimensions makes it easier to feed the data into the neural network during training.
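The sketch below shows one way this resizing step could be done with Pillow, assuming a hypothetical dataset/raw directory laid out with one folder per class; the 224x224 target size, white padding colour, and *.jpeg file pattern are illustrative assumptions. Padding onto a square canvas preserves the aspect ratio (for 768x768 source images the resize is already distortion-free, but padding keeps the step general).

```python
# A minimal resizing sketch (paths and target size are assumptions).
from pathlib import Path
from PIL import Image

TARGET = (224, 224)
SRC = Path("dataset/raw")       # hypothetical layout: dataset/raw/<class_name>/*.jpeg
DST = Path("dataset/resized")   # mirrored layout holding the resized copies

for path in SRC.rglob("*.jpeg"):
    out_path = DST / path.relative_to(SRC)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(path) as img:
        img = img.convert("RGB")
        # Shrink in place while keeping the aspect ratio, then paste onto a
        # square canvas so no distortion is introduced.
        img.thumbnail(TARGET)
        canvas = Image.new("RGB", TARGET, (255, 255, 255))
        offset = ((TARGET[0] - img.width) // 2, (TARGET[1] - img.height) // 2)
        canvas.paste(img, offset)
        canvas.save(out_path)
```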

Day 4 (30/06/2023)
• Complete the resizing of any remaining images from the dataset and perform a quality check.
• After resizing, visually inspect a sample of resized images to ensure they look appropriate and haven't been distorted.
• Additionally, check the file format (e.g., JPEG, PNG) to ensure consistency.

Day 5 (01/07/2023)
• I began applying data augmentation techniques to increase the dataset size.
• Data augmentation involved applying various transformations to the existing images to create new training examples. Common augmentation techniques included rotations, flips, translations, brightness and contrast adjustments, and zoom.
• I implemented these techniques using libraries like Augmentor, OpenCV, or the image augmentation functions available in deep learning frameworks like TensorFlow and PyTorch (see the sketch below). Data augmentation helped the model generalize better by exposing it to a wider range of variations in the training data.
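The sketch below shows one way these augmentations could be configured with Keras' ImageDataGenerator, one of the framework options mentioned above; every parameter value and the dataset/resized path are illustrative assumptions, not the exact settings used in the project.

```python
# A minimal augmentation sketch (all parameter values are illustrative).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,            # random rotations
    horizontal_flip=True,         # flips
    vertical_flip=True,
    width_shift_range=0.1,        # translations
    height_shift_range=0.1,
    brightness_range=(0.8, 1.2),  # brightness adjustments
    zoom_range=0.1,               # zoom
)

# Stream augmented batches from a folder with one sub-folder per class.
train_flow = augmenter.flow_from_directory(
    "dataset/resized",            # hypothetical path with one folder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
)
```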

Day 6 (03/07/2023)
• Continuing with data augmentation, I ensured that a sufficient number of augmented images were generated to expand the training dataset.
• I kept track of the transformations applied to each image for later reference.
• Additionally, I started documenting my data preprocessing procedures, as this documentation would be valuable for reproducibility and future reference.

Day 7 (04/07/2023)
• Splitting the dataset into training and testing sets ensures that separate portions of data are available for training and for evaluating the model.
• A common split is 80% for training and 20% for testing to assess model performance (a split sketch follows this entry).
• After augmentation and splitting, a health check was performed on the dataset to ensure proper training of the CNN model.
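A minimal 80/20 split sketch using scikit-learn's train_test_split is shown below, assuming the resized images sit in per-class folders named benign and adenocarcinoma (hypothetical names); stratifying on the labels keeps both classes balanced across the splits.

```python
# A minimal 80/20 split sketch (folder names and file pattern are assumptions).
from pathlib import Path
from sklearn.model_selection import train_test_split

paths, labels = [], []
for label, class_dir in enumerate(["benign", "adenocarcinoma"]):  # assumed folder names
    for p in Path("dataset/resized", class_dir).glob("*.jpeg"):
        paths.append(str(p))
        labels.append(label)

# Stratify so both classes keep the same proportion in each split.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.20, random_state=42, stratify=labels
)
print(len(train_paths), "training images,", len(test_paths), "test images")
```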

Day 8 (05/07/2023)
• Read research papers about the different CNN architectures available and analyzed the advantages and disadvantages of each architecture.
• After studying multiple CNN architectures, I understood that a well-chosen architecture is required for better feature extraction and higher model confidence.
• Taking this into consideration, I designed a simple CNN architecture.

Day 9 (06/07/2023)
• The focus is on establishing the initial structure of the deep learning model.
• This involves defining the neural network architecture, selecting appropriate activation functions, specifying the loss function relevant to the classification task, and deciding on weight initialization methods.
• Choices regarding the architecture might include the number and type of layers, whether to employ well-established architectures like VGG16 or ResNet, or designing a custom architecture tailored to the dataset.

Day 10 (07/07/2023)
• On this day, attention shifts to the configuration of the training pipeline.
• This encompasses critical decisions such as optimizer selection (e.g., Adam, SGD, RMSprop), setting an initial learning rate, determining the batch size for mini-batch training, and considering regularization techniques to mitigate overfitting.
• These two days (9 and 10) lay the foundation for the subsequent model development and training phases, ensuring a well-structured and optimized neural network ready for learning from the data.

Day 11 (08/07/2023)
• On Day 11, I embarked on the crucial task of model development. This phase involved crafting the neural network architecture, whether a predefined Convolutional Neural Network (CNN) or a custom design tailored to my dataset.
• I determined the arrangement of layers, including convolutional, pooling, and fully connected layers, as well as the activation functions used between them, such as Rectified Linear Units (ReLU). A sketch of such an architecture follows this entry.
• This architecture formed the backbone of my deep learning model and was essential for feature extraction from the input images, setting the stage for accurate cell classification in subsequent phases of the project.
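The following is a minimal Keras sketch of a simple custom CNN of the kind described in this entry; the number of filters, the layer depth, and the dropout rate are illustrative choices, not the exact architecture that was used.

```python
# A minimal custom-CNN sketch (layer sizes are illustrative assumptions).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                    # regularization against overfitting
    layers.Dense(1, activation="sigmoid"),  # benign vs. adenocarcinoma
])
```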

Day 12 (10/07/2023)
• On Day 12, I translated my chosen neural network architecture into code, configuring the layers, activations, and other architectural elements.
• This involved setting up the model's structure, specifying input and output dimensions, and compiling the model with the chosen loss function and optimizer.
• Subsequently, I trained the model on a limited portion of my dataset, typically a smaller subset. This initial testing allowed a rapid assessment of the model's behavior, helping me catch any glaring issues or architectural flaws early on (see the sketch below).
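A minimal sketch of this compile-and-smoke-test step is shown below, assuming the model object from the previous sketch and the augmented training generator defined earlier; the optimizer, learning rate, subset size, and epoch count are illustrative assumptions.

```python
# Compile the model and sanity-check it on a small slice of data
# (assumes `model` and `train_flow` from the earlier sketches).
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Train briefly on only a handful of batches to catch obvious architectural
# or data-pipeline problems before committing to a full training run.
history = model.fit(
    train_flow,
    steps_per_epoch=20,   # small subset for the smoke test
    epochs=3,
)
```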

Day 13 (11/07/2023)
• On Day 13, I reviewed the results from Day 12's initial testing. If the model exhibited any shortcomings, this was the time to make the necessary adjustments to its architecture, hyperparameters, or preprocessing steps.
• Fine-tuning involved experimenting with different learning rates, batch sizes, or regularization techniques.
• After these refinements, I validated the model on a more substantial subset of the dataset, providing a more representative assessment of its performance and generalization capabilities.

Day 14 (12/07/2023)
• On Day 14, I continued refining the model iteratively, considering the insights gained from validation and performance on the smaller dataset.
• I focused on achieving a balance between model complexity and generalization. Additionally, I started calculating and documenting key performance metrics, such as accuracy, precision, recall, and F1-score.
• These metrics provided a quantitative measure of the model's effectiveness in classifying cells, guiding further adjustments if necessary.

Day 15 (13/07/2023)
• On this day, I focused on analyzing the results obtained from the initial training of my model.
• This crucial step involved a comprehensive examination of various aspects, including the loss curves, accuracy, and the model's behavior on the validation data.
• By closely scrutinizing these metrics, I gained valuable insights into how well my model was performing and where it might need improvement.
• The loss curves provided information about the convergence of the training process, while the accuracy metrics offered a glimpse into the model's classification capabilities.
• The behavior on validation data allowed me to assess the model's generalization. This analysis laid the foundation for informed decision-making regarding the model's refinement.

Day 16 (14/07/2023)
• Building on the insights gained from Day 15's analysis, Day 16 was dedicated to implementing optimizations to enhance the model's performance.
• This encompassed a range of actions, such as fine-tuning hyperparameters to strike the right balance between underfitting and overfitting, introducing regularization techniques like dropout or L2 regularization to combat overfitting, and potentially making adjustments to the model architecture itself.
• These optimizations aimed to address the identified areas for improvement and refine the model's capabilities. By iteratively refining the model based on data-driven insights, I ensured that it continued to evolve and meet the performance standards required for my specific classification task.

Day 17 (15/07/2023)
• On Day 17, I embarked on the pivotal step of model training. This involved feeding the entire training dataset into my neural network and iteratively adjusting the model's parameters to minimize the chosen loss function.
• Training a deep learning model, especially a complex one, often demanded significant computational resources, including GPU acceleration.
• The duration of this process varied, and it could take several hours or even days, contingent on factors like model complexity and dataset size. This phase allowed the model to uncover intricate patterns and relationships within the data.

Day 18 (17/07/2023)
• On Day 18, model training continued from where it left off. It was essential to monitor the training process closely, keeping an eye on key indicators like training loss and accuracy.
• I also considered early stopping strategies to prevent overfitting, wherein training stopped when validation performance ceased to improve (see the sketch below).
• The goal was to ensure that the model converged to a point where it had learned to generalize effectively from the training data.
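A minimal sketch of the full training run with early stopping is shown below, assuming the compiled model and training generator from the earlier sketches plus a hypothetical val_flow validation generator; the epoch budget, patience, and checkpoint filename are illustrative assumptions.

```python
# Full training run with early stopping and checkpointing
# (assumes `model` and `train_flow` from earlier sketches; `val_flow` is hypothetical).
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop once validation loss stops improving, and keep the best weights seen so far.
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ModelCheckpoint("colon_cnn_best.h5", monitor="val_loss", save_best_only=True),
]

history = model.fit(
    train_flow,
    validation_data=val_flow,   # hypothetical validation generator
    epochs=50,
    callbacks=callbacks,
)
```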

Day 19 (18/07/2023)
• On this day, I delved into fine-tuning, a critical aspect of model refinement.
• Fine-tuning involved making iterative adjustments to the model architecture, hyperparameters, and other factors based on observations and insights gained from the validation results during the earlier stages of training.
• This step aimed to enhance the model's performance, increase its ability to generalize, and address any potential issues or limitations identified.

Day 20 (19/07/2023)
• On Day 20, fine-tuning efforts persist. The adjustments made to the model continue to be refined based on insights and observations.
• Concurrently, validation on the validation dataset is carried out to gauge the impact of these adjustments on the model's performance.
• This iterative process of fine-tuning ensures that the model reaches an optimal state, achieving the highest possible accuracy and robustness while minimizing overfitting.

Day 21 (20/07/2023)
• On Day 21, evaluating the model's performance on the test dataset is a pivotal step in assessing its real-world applicability.
• This process involves exposing the model to new, unseen data, mirroring its performance in practical scenarios.
• It serves as a litmus test, revealing how well the model generalizes beyond the training data and highlighting any potential issues such as overfitting or underfitting.

Day 22 (21/07/2023)
• Moving to Day 22, the focus turns to quantifying the model's performance comprehensively. Calculating and documenting key performance metrics, including accuracy, precision, recall, and F1-score, offers an objective and clear understanding of the model's effectiveness in the classification task (a metrics sketch follows this entry).
• These metrics go beyond a mere accuracy percentage, providing insights into the model's ability to correctly classify each class, identify false positives, and assess its overall precision and recall.
• This documentation becomes instrumental in gauging the model's suitability for its intended medical diagnosis application and guiding any necessary refinements.
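These metrics could be computed with scikit-learn as in the sketch below, assuming a hypothetical test_flow generator created with shuffle=False (so predictions and labels stay aligned) and the trained model; the 0.5 decision threshold and the label encoding are assumptions.

```python
# A minimal evaluation sketch (test_flow is a hypothetical, non-shuffled generator).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

probs = model.predict(test_flow)              # predicted probabilities
y_pred = (probs.ravel() > 0.5).astype(int)    # assumed threshold and label encoding
y_true = test_flow.classes                    # ground-truth labels in file order

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```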

Day 23 (22/07/2023)
• On Day 23, I focused on creating a user-friendly graphical user interface (GUI) using Flask, HTML, CSS, and JavaScript.
• I designed the layout, styled elements with CSS, and added interactivity with JavaScript.
• This phase aimed to establish a visually appealing and responsive interface for pathologists to interact with the AI model.

Day 24 (24/07/2023)
• Continuing from Day 23, Day 24 involved integrating the GUI with Flask for backend functionality (a minimal endpoint sketch follows this entry).
• User inputs were processed, and real-time results from the AI model were provided.
• Thorough testing and user feedback guided refinements in HTML, CSS, and JavaScript, ensuring a smooth and effective user experience for pathologists interacting with the AI model through the web-based GUI.
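The following is a minimal Flask sketch of the kind of prediction endpoint described in this entry, assuming the trained model was saved as colon_cnn_best.h5 and that an index.html template provides the upload form; the route names, filenames, and class ordering are illustrative assumptions and must match whatever was actually used during training.

```python
# A minimal Flask prediction endpoint (routes, filenames, and class order are assumptions).
import numpy as np
from flask import Flask, render_template, request, jsonify
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("colon_cnn_best.h5")
CLASSES = ["Benign", "Adenocarcinoma"]   # assumed label encoding: 0 = benign, 1 = adenocarcinoma

@app.route("/")
def index():
    return render_template("index.html")   # upload form built with HTML/CSS/JS

@app.route("/predict", methods=["POST"])
def predict():
    # Read the uploaded slide image and preprocess it the same way as during training.
    file = request.files["image"]
    img = Image.open(file.stream).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype="float32")[None, ...] / 255.0
    prob = float(model.predict(x)[0][0])
    label = CLASSES[int(prob > 0.5)]
    return jsonify({"prediction": label, "probability": prob})

if __name__ == "__main__":
    app.run(debug=True)
```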

Day 25 (25/07/2023)
• On Day 25, the focus was on integrating the AI model into the pathologist's workflow by connecting the graphical user interface (GUI) with the model.
• This involved ensuring a smooth and secure data flow between the GUI and the AI model, allowing pathologists to use the tool effectively.

Day 26 (26/07/2023)
• Day 26 was dedicated to rigorous testing and validation of the integrated system. Comprehensive testing scenarios included user interactions, data inputs, and system responses (an example of an automated check follows this entry).
• This ensured a reliable and stable system, free from usability or technical issues, and validated the accuracy of the AI model's predictions within the GUI context.
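As one example of such an automated check, the sketch below posts a sample image to the locally running Flask app and prints the response; the URL, port, and sample_slide.jpeg filename are illustrative assumptions.

```python
# A minimal end-to-end check against the running Flask app (URL and filename are assumptions).
import requests

with open("sample_slide.jpeg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:5000/predict",
        files={"image": f},
    )

response.raise_for_status()
print(response.json())   # e.g. a JSON object with "prediction" and "probability" fields
```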

Day 27 (27/07/2023)
• On Day 27, I documented the entire project, including code, model architecture, usage instructions, and troubleshooting steps.
• This documentation was critical for future reference, ensuring that both users and developers could understand and utilize the system effectively.
• The emphasis was on creating well-organized and comprehensive documentation.

Day 28 (28/07/2023)
• Day 28 was dedicated to reviewing and finalizing the project documentation to ensure it was comprehensive, accurate, and user-friendly.
• I paid close attention to clarity, organization, and accessibility for individuals with varying levels of technical expertise.
• This step guaranteed that the project's knowledge was well-preserved and readily accessible to all stakeholders.

Day 29 (31/07/2023)
• On Day 29, I conducted thorough final testing and addressed any remaining issues or glitches to ensure that the project was well-prepared for deployment in a real-world setting.
• This phase involved subjecting the entire system, including the AI model, GUI, and integration components, to comprehensive testing scenarios.
• The goal was to identify and rectify any anomalies, usability issues, or technical glitches that could hinder the smooth operation of the system.

Day 30 (01/08/2023)
• On Day 30, the focus remained on the optimization of the AI model. I delved into fine-tuning model parameters and hyperparameters, leveraging user feedback and performance evaluations.
• This involved adjusting learning rates, batch sizes, or regularization techniques to enhance the model's overall performance. The aim was to ensure that the AI model operated at its peak accuracy and robustness.

Day 31 (02/08/2023 and 03/08/2023)
• Continuing into Day 31, the emphasis shifted towards refining the user interface (UI) based on user feedback and usability testing.
• I worked on making the UI more intuitive, visually appealing, and user-friendly. This included revising layout designs, improving user navigation, and optimizing the overall user experience.
• By aligning the UI more closely with user expectations and needs, I ensured that the interaction between pathologists and the AI-assisted system was as smooth and effective as possible.

Day 32 (04/08/2023)
• On the final day, Day 32, I prepared a project presentation for my internship team and supervisor.
• The presentation covered the key aspects of the project, including its objectives, development process, achieved results, and potential future applications.
• I was ready to field questions and provide insights into my AI-assisted cell classification system, highlighting its significance and the value it brings to the medical diagnosis domain.
• This presentation marked the culmination of my efforts and showcased the project's readiness for real-world implementation and further advancements.

Conclusion
Through this project, we have successfully achieved several significant goals aimed at enhancing
the diagnostic process in the field of cell classification for cancer diagnosis. Our primary
objective was to develop an AI-assisted cell classification system that augments the pathologist's
workflow, reducing their workload while maintaining a high standard of accuracy and reliability.

One of the key accomplishments was the development of a robust deep learning model, based on
Convolutional Neural Networks (CNN), which demonstrated exceptional capabilities in
accurately classifying cells into benign and adenocarcinoma categories. This model, fine-tuned
through rigorous optimization, significantly reduces the time and effort required for manual slide
analysis. It acts as a valuable second opinion for pathologists, aiding in the identification of
crucial features such as cell polarity, size, shape, invasion into stroma, mitotic figures, chromatin
content, and nuclear-cytoplasmic ratio.
1. Adenocarcinoma Samples 2. Benign Samples

Furthermore, the integration of this AI model into a user-friendly graphical user interface (GUI)
marked another milestone. The GUI, built using Flask, HTML, CSS, and JavaScript, ensures
seamless interaction between pathologists and the AI system. It provides a platform where users
can effortlessly upload cell images, receive prompt analysis, and access classification results in a
comprehensible format.

3.Training and Validation Loss 4. Training and Validation Accuracy

In addition to the technical achievements, the project prioritized usability and user feedback.
Continuous optimization based on user inputs led to an interface that not only streamlines the
diagnostic process but also caters to the specific needs and expectations of medical professionals.
This collaborative approach ensured that the system aligns with the highest standards of medical
accuracy and usability.
5. Confusion Matrix

As a result of these accomplishments, we have created an AI-assisted cell classification system that significantly reduces the efforts of pathologists without compromising accuracy. It
demonstrates the potential to revolutionize the diagnostic process, offering a powerful tool for
early cancer detection and efficient disease management.

6. Final Review From Project Team

In conclusion, this project underscores the successful fusion of cutting-edge AI technology with
medical expertise. It represents a significant step forward in the quest to enhance cancer
diagnosis, reduce human error, and ultimately improve patient outcomes. Our achievement lies
not only in the development of a sophisticated AI model and user-friendly interface but also in
the promise of a brighter future for medical professionals and patients alike.
