
Major Project Report

The document discusses developing a deep learning based lung cancer detection system using CT scan images. It aims to identify the type of lung cancer from images in order to enable early detection. The methodology section describes using a convolutional neural network model with techniques like data augmentation and backpropagation to analyze CT scans and detect cancer types.

Uploaded by

Aman Mourya

DEEP LEARNING BASED LUNG CANCER DETECTION SYSTEM

B.Tech Major Project Synopsis Report (KEC – 753)

DEPARTMENT OF ELECTRONICS ENGINEERING


RAJKIYA ENGINEERING COLLEGE, KANNAUJ

PROJECT GUIDE:
Mr. Ashwini Kumar Upadhyay
Assistant Professor
Dept. of Electronics Engineering

PROJECT MEMBERS:
Harshita Gupta (1908390300026)
Sakshi Maurya (1908390300039)
Shubhash Chandra (1908390300047)
Divyanshu Dev (18839030015)

TABLE OF CONTENTS

1. Introduction
a) Motivation & Objective
b) Previous Work
c) Brief Description about the Work
2. Methodology
3. Dataset
4. Experiment
5. Result
6. Conclusion & Future Work

LIST OF FIGURES

Fig 1 Deep Learning Architecture
Fig 2 Common Activation Function
Fig 3 Convolutional Neural Network Architecture
Fig 4 Max and Average Pooling Operation
Fig 5 VGG-16 Architecture
Fig 6 Model Summary
Fig 7 Lung Adenocarcinoma CT Scan image
Fig 8 Large cell carcinoma
Fig 9 Squamous Cell Carcinoma
Fig 10 Normal CT Scan image
Fig 11 Model Metrics Visualization

INTRODUCTION

MOTIVATION -
Modern medicine can repair or replace many parts of the human body, from bones to organs to
hands and faces, but damage to the lungs and brain remains especially difficult to treat. Early
detection of lung or brain damage is therefore essential to improving survival, and this is the
main motivation for this project. Lung cancer can be diagnosed with several techniques,
including chest X-rays, computed tomography (CT), and magnetic resonance imaging (MRI), but
even with these reports doctors cannot always accurately predict the stage of the cancer or the
size of the tumor. There is therefore a great need for new techniques, namely image processing
methods, that support manual analysis and predict cancer more accurately.

OBJECTIVE -
The main goal of this project is to identify the type of lung cancer. Lung cancer is one of the
leading causes of cancer death today. Identifying it at an early stage is extremely important
because early detection greatly improves survival rates. The presence of lung cancer can be
diagnosed by CT imaging of the lungs, but manual detection by a physician may lead to false
reporting. The focus of this project is therefore a computerized method for cancer detection.
The project analyzes CT images using various image processing operations to obtain accurate
detection results. A CT image is first preprocessed to remove noise and enhance the image.
Morphological features are then extracted using a deep learning model, which performs the
lung cancer detection.

PREVIOUS WORK

In the medical field, detecting specific findings in CT scan images is difficult and complicated.
However, CNN-based methods are widely used in studies on disease detection and
pre-diagnosis.
The first computer-aided detection (CAD) systems for detecting lung nodules were developed
in the late 1980s, but these attempts were not practical because the computing resources needed
for advanced image analysis were lacking at the time. Since the arrival of graphics processing
units (GPUs) and convolutional neural networks (CNNs), the performance of image analysis and
decision support systems has improved significantly. Researchers have proposed many deep
learning models for medical image analysis, and some of the most relevant methods for
detecting and classifying lung nodules are mentioned here.

Ruchita Tekade et al. proposed a 3D multipath VGG-like network, evaluated on 3D cubes
extracted from the LIDC-IDRI, LUNA16 and Kaggle Data Science Bowl 2017 datasets [2].
Maithily Marathe et al. implemented a lung cancer detection system using a convolutional
neural network, compared their model with VGG16, and achieved good validation accuracy [3].
CT scan analysis techniques produce many false positive results in the early stage of lung
cancer diagnosis. Therefore, a multi-strategy-based approach is required for early-stage lung
cancer detection.

BRIEF DESCRIPTION ABOUT THE WORK -
Lung cancer is considered one of the most serious diseases in terms of death rate. It causes
more deaths than breast, prostate, liver or skin cancer. Smoking is the most commonly reported
risk factor among lung cancer patients, and common symptoms include chest pain, cough, and
coughing up blood. Not all patients have these symptoms; some have joint pain or headaches
instead. There are no fully reliable signs or symptoms of lung cancer. It is stated that only 15%
of cases are detected at an early stage. Early diagnosis followed by the best available treatment
is critical for patients, but it is also costly for the healthcare system. Cancer can be detected
using X-ray, computed tomography (CT) scans, positron emission tomography (PET) scans,
magnetic resonance imaging (MRI), etc. Radiologists prefer CT scans for detecting lung cancer.
These scans are created by combining X-ray images taken from different angles, but analysing
thousands of scans at a time is a burden for radiologists. The main goal of this work is therefore
to detect lung cancer and identify its type from CT scan images using deep learning.
For medical data, deep learning based models often perform well compared with earlier
state-of-the-art methods. Deep learning is a subfield of machine learning that works well for
object and image detection and classification tasks. From our research, it seems that many of
the automated systems implemented for lung cancer detection are based on traditional
approaches.
In this work, the Chest CT-Scan Images dataset, obtained from the Kaggle website, is used.
The dataset is split into training, testing and validation sets for evaluation of the model. We use
a deep learning based CNN model, VGG16, for detection of lung cancer.
First, we import Python libraries that handle the data and perform both typical and complex
tasks in a single line of code, and then apply a data augmentation technique to preprocess the
data. The TensorFlow library is used to build the CNN model; its Keras framework contains all
the functionality needed to define the architecture of a CNN and train it on the data. The Adam
optimizer is used to optimize the CNN model. We then evaluate the model on the testing dataset
using different metrics, predict the type of cancer with this architecture, and obtain good
accuracy on the test set.
The whole implementation was done on Google Colaboratory, a platform well suited to running
machine learning and deep learning algorithms.

METHODOLOGY
DEEP LEARNING
Deep learning is a type of artificial intelligence (AI) that is capable of recognizing patterns
and making decisions from large datasets. It is a subfield of machine learning that uses many
layers of artificial neural networks to automatically learn representations of data, and it can be
used for tasks such as image recognition and classification and natural language processing.
For example, a deep learning system could be trained to recognize objects in an image. In this
case, the system would need to learn to identify different shapes, colors, and textures in order
to accurately classify the objects in the image.
Deep learning is a powerful technology for solving complex tasks. Many deep learning (DL)
algorithms exist, such as recurrent neural networks, deep neural networks and convolutional
neural networks.

Figure 1: Deep Learning Architecture

Artificial Neural Network


An Artificial Neural Network is an interconnected architecture with an input layer where the
input data is placed, one or more hidden layers where artificial neurons are stacked on top of
each other, and an output layer where the prediction or classification is made.
Forward Propagation
Forward propagation is the technique in which data moves sequentially through the input
layer, the hidden layers and the output layer.
Back Propagation
Back-propagation works in the opposite direction to forward propagation: it passes the error
backward through the network. It is used to adjust the weights of the neural network after the
error of the forward pass has been computed.
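
The two passes can be illustrated on a single neuron. The following is a minimal sketch, not the report's code: the sigmoid activation, the squared-error loss, the learning rate of 0.5 and the sample values are all assumptions chosen to keep the example small.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward propagation: weighted sum of the inputs, then the activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def loss(w, b, x, target):
    """Squared-error loss for one sample (assumed for this sketch)."""
    return 0.5 * (forward(w, b, x) - target) ** 2

def backward_step(w, b, x, target, lr=0.5):
    """One back-propagation step: compute gradients and update the weights."""
    y = forward(w, b, x)
    # Chain rule: dL/dz = (y - target) * sigmoid'(z), where sigmoid'(z) = y * (1 - y)
    dz = (y - target) * y * (1.0 - y)
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
    new_b = b - lr * dz
    return new_w, new_b

w, b = [0.1, -0.2], 0.0
x, target = [1.0, 2.0], 1.0
loss_before = loss(w, b, x, target)
for _ in range(100):
    w, b = backward_step(w, b, x, target)
loss_after = loss(w, b, x, target)
print(loss_before, loss_after)
```

Each call to backward_step runs a forward pass, measures the error, and moves the weights against the gradient, so the loss after the loop is lower than before it.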

Activation Functions
Activation functions are an important part of neural networks. They allow neural networks to
solve problems by modelling non-linear functions. The three most commonly used activation
functions are Sigmoid, TanH and ReLU. Activation functions are used in both forward and
back propagation: in forward propagation the activation function computes the output of each
neuron, which is then compared with the true value to calculate the loss, and in back
propagation its derivative is used to update the parameters of the neural network. Fig. 2 shows
the activation functions commonly used in neural networks.

Figure 2: Common Activation Function
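
As an illustration, the three functions and the derivatives used during back propagation can be written in a few lines of Python. This is a sketch using the standard textbook definitions; the function names are chosen here and do not come from the report's code.

```python
import math

def sigmoid(z):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes any real value into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through and maps negative values to 0."""
    return max(0.0, z)

# Derivatives, as used when gradients are propagated backward.
def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```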

Softmax Activation Function


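Softmax is an activation function that calculates a probability distribution over the possible
outcomes. It is used for classification with multiple classes.
Mathematically, for a vector of scores z with K classes, Softmax is defined as
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$$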
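
A minimal Python sketch of softmax, assuming the standard definition (each score is exponentiated and divided by the sum of all exponentials). Subtracting the maximum score first is a common numerical-stability step added here; it does not change the result. The four example scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four classes
probs = softmax([2.0, 1.0, 0.1, -1.0])
print(probs)
```

The class with the largest score receives the largest probability, which is why softmax is used as the last layer of a multi-class classifier.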

CONVOLUTIONAL NEURAL NETWORKS
Convolutional Neural Networks apply deep learning to computer vision. A convolutional
neural network is also known as a ConvNet or CNN. A CNN is useful for feature extraction and
for classification of objects in an image, and it is built as a stack of different layers. CNNs are
widely used in the healthcare industry; some of the applications are tumor or cancer detection,
drug discovery, and disease diagnosis.
A CNN has three types of layer:
a) Convolution Layer
b) Pooling Layer
c) Fully-Connected Layer
Figure 3: Convolutional Neural Network Architecture

Convolutional Layer
The layers that apply the linear convolution operation in a convolutional neural network are
called convolutional layers. Each node in such a layer slides a feature detector (also called a
filter or kernel) over the input image to extract a different feature from it.
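
The sliding feature detector can be sketched with NumPy. This is an illustration only: the 4x4 image and the 1x2 edge kernel are made-up values, and the function computes the unpadded, stride-1 cross-correlation that deep learning libraries call convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the image patch, summed
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half (0), bright right half (1)
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# Toy kernel that responds where brightness increases from left to right
kernel = np.array([[-1.0, 1.0]])
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the column where the dark and bright halves meet, which is the sense in which a kernel detects a feature.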
Pooling Layer
The pooling layer selects the most important features from the convolved feature maps, a step
also known as subsampling. There are two types of pooling, max pooling and average pooling.
Max pooling is the more widely used in research: the maximum value among the values inside
the pooling window is selected as the sampled feature.
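
Both pooling variants can be sketched with NumPy as follows. The 2x2 window with non-overlapping blocks is an assumption (it is the most common setting), and the 4x4 feature map holds made-up values.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

fm = np.array([[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]], dtype=float)
max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="average")
print(max_pooled)
print(avg_pooled)
```

Either way the 4x4 map shrinks to 2x2; max pooling keeps the strongest response in each window, while average pooling keeps the mean.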

Figure 4: Max and Average Pooling Operation

TRANSFER LEARNING VGG-16


VGG16 is a convolutional neural network model developed by the Visual Geometry Group
(VGG) at the University of Oxford. It was introduced by Simonyan and Zisserman in their paper
"Very Deep Convolutional Networks for Large-Scale Image Recognition". It has 16 weight
layers: a stack of 13 convolutional layers followed by 3 fully connected (FC) layers. The final
layer is the soft-max layer.
The model is trained on the ImageNet dataset, which is a large database of images used in
image classification and recognition. The VGG16 model has achieved excellent performance
in image classification tasks, as well as object detection and segmentation. By using relatively
small 3x3 convolutional filters, VGG16 is able to capture fine-grained details from the image.
The model is large, with about 138 million parameters, so it is usually reused rather than trained
from scratch. VGG16 is a popular model for transfer learning, in which a model pre-trained on
a large dataset is reused to classify images from a different dataset. This reduces the amount of
data and training time needed to get good results.
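
A transfer-learning setup of this kind can be sketched with the Keras API in TensorFlow. This is a hedged illustration, not the report's actual model: the frozen base, the Flatten + Dense head and the function name build_vgg16_classifier are assumptions, while the four classes, the 224x224 input size, the Adam optimizer, the categorical cross-entropy loss and the learning rate of 0.002 are taken from the report.

```python
import tensorflow as tf

def build_vgg16_classifier(num_classes=4, input_shape=(224, 224, 3),
                           weights="imagenet", learning_rate=0.002):
    """VGG16 convolutional base with a new softmax head (head is an assumption)."""
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pre-trained filters fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"])
    return model
```

Calling build_vgg16_classifier() downloads the ImageNet weights on first use; passing weights=None builds the same architecture with random weights, which is enough for checking shapes.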

Figure 5 : VGG-16 Architecture

MODEL SUMMARY

Figure 6: Model Summary

BLOCK DIAGRAM

Image from Database -> Preprocessing -> Feature Extraction -> Classify by CNN model ->
Lung Cancer Detection

Lung Cancer Detection:
YES -> LUNG CANCER DETECTED
NO -> NO LUNG CANCER DETECTED

DATASET

Description
Building deep learning models requires a lot of data. For this project, datasets were researched
and identified before any real work began. Since the project places heavy emphasis on building
models, a key part of it relies on the dataset. Prior to coding, we had to ensure that we had a
suitable dataset to build a model with.

CHEST CT SCAN IMAGES DATASET


In our study, the data was collected from the Kaggle website [4]. This dataset is used to train
our CNN (convolutional neural network) model for lung cancer detection. It contains a total of
1000 CT scans of patients in jpg/png format. The CT scan images need to be pre-processed
before they are fed to the CNN model. The dataset is split into a 70% training set to train our
model, a 20% testing set, and a 10% validation set to evaluate its performance.
The dataset consists of 4 classes: ADENOCARCINOMA, LARGE CELL CARCINOMA,
SQUAMOUS CELL CARCINOMA and NORMAL CT-SCAN IMAGES.
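
The 70/20/10 split can be sketched in plain Python. The file names below are hypothetical placeholders, and the random shuffle with a fixed seed is an assumption; the report does not state how its split was produced.

```python
import random

def split_dataset(items, train=0.7, test=0.2, seed=42):
    """Shuffle the items and cut them into train, test and validation parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    train_set = items[:n_train]
    test_set = items[n_train:n_train + n_test]
    val_set = items[n_train + n_test:]  # whatever remains (10% here)
    return train_set, test_set, val_set

# Hypothetical file names standing in for the 1000 CT scan images
paths = [f"scan_{i:04d}.png" for i in range(1000)]
train_set, test_set, val_set = split_dataset(paths)
print(len(train_set), len(test_set), len(val_set))
```

With 1000 images this gives 700 for training, 200 for testing and 100 for validation, and no image appears in more than one set.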
Adenocarcinoma
Lung adenocarcinoma is a type of lung cancer that starts in the glandular cells of the lung. It is
the most common type of lung cancer, and lung cancer is the leading cause of cancer death in
both men and women. It is also one of the most difficult types of cancer to treat. The exact cause
of lung adenocarcinoma is unknown, but smoking is the biggest risk factor; other risk factors
include exposure to certain chemicals. Symptoms of lung adenocarcinoma include a persistent
cough, chest pain, shortness of breath, fatigue, and weight loss. CT scans are commonly used
in its diagnosis.

Figure 7: LUNG ADENOCARCINOMA

Large cell carcinoma
Large cell carcinoma is an aggressive type of cancer that begins in the lung cells. It belongs to
the non-small cell lung cancers, the most common group of lung cancers, which accounts for
80-85% of all lung cancers. Most lung cancers are caused by smoking, but other factors such as
radon gas, secondhand smoke, and certain chemicals can also increase a person's risk.
Symptoms of large cell carcinoma vary from mild to severe and may include persistent cough,
chest pain, and shortness of breath.

Figure 8: LARGE CELL CARCINOMA

Squamous cell carcinoma


This type of lung cancer is most often found in the central part of the lung or in one of the major
branches of the airway, where the larger bronchi join the trachea. Squamous cell lung cancer
accounts for about 30% of all non-small cell lung cancers and is commonly associated with
smoking.

Figure 9: SQUAMOUS CELL CARCINOMA

The last folder of the dataset contains the normal CT scan images.

Figure 10: NORMAL CT SCAN IMAGE

EXPERIMENT
TECHNOLOGY DECISION
Overview
This section gives details of the technologies used for this project. Although many tools are
available, we found that the tools outlined here perform well for the problem that needs to be
solved.
PYTHON
Python is a high-level, general-purpose programming language. It is widely used in scientific
computing and can be used for a wide range of common tasks, from data mining to software
development. Python is the main language used in this project.
GOOGLE COLAB
Google Colab is developed by Google Research. Colab allows anyone to write and run
arbitrary Python code through a browser and is particularly suited to machine learning, data
analysis, and training. Technically, Colab is a hosted Jupyter notebook service. Libraries like
NumPy, Pandas, Matplotlib, and TensorFlow are supported by Google Colab.
Importing Python libraries such as NumPy, Pandas, TensorFlow, and Matplotlib makes it easy
to process data and perform common and complex operations with one line of code.
NumPy
NumPy is a Python library that performs numerical calculations efficiently. It is optimized for
array-based mathematics, and its vectorized operations are typically much faster than
equivalent loops written with Python's built-in math functions.
Pandas
Pandas is a Python library that, like NumPy, is used for data preprocessing and preparation.
One of the key features of Pandas is its DataFrame and Series data structures. These data
structures are optimized and include flexible indexing that supports operations such as
reshaping, slicing, merging, and concatenation. Pandas and NumPy are very efficient when
used together to manage data.
Matplotlib
Matplotlib is a Python plotting library that allows programmers to create a wide variety of
graphs and visualizations with ease. Its main strength is that it simplifies creating
visualizations. Matplotlib also works very well with Pandas and NumPy.
OpenCV
Open Source Computer Vision (OpenCV) is a well-established computer vision library written
in C/C++, with interfaces for C++, Python, and Java. It is a powerful imaging library that
includes many tools for image processing, feature extraction, and more.

TensorFlow
TensorFlow is an open-source deep learning library developed by Google and originally used
by Google Brain for machine learning and deep learning research. At its core, TensorFlow is
designed to compute with multidimensional arrays called tensors, and one of its strengths is
its ability to flexibly distribute computations across different devices such as CPUs and GPUs.
Keras
Keras is a high-level deep learning framework that provides a simpler interface on top of
platforms like TensorFlow and Theano. Compared to other frameworks, Keras is more
minimalistic.
A data augmentation technique is applied for data preprocessing.
DATA AUGMENTATION -
Data augmentation is a set of methods for artificially increasing the amount of data by creating
new data points from existing data. This involves making small changes to existing samples to
create new ones. Data augmentation improves the performance of machine learning models
by generating new, valid examples for the training set. When the data set of a machine learning
model is large and sufficiently varied, the model is more accurate and performs better.
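
The idea of creating new samples through small changes can be sketched with NumPy. The specific transformations below (horizontal flip, a shift of up to 2 pixels, a brightness change of up to 10%) are illustrative assumptions, not the augmentation settings used in this project, and the 8x8 random array stands in for a CT image scaled to the range [0, 1].

```python
import numpy as np

def augment(image, rng):
    """Return a slightly modified copy of a grayscale image with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                 # mirror left to right
    shift = int(rng.integers(-2, 3))         # shift between -2 and +2 pixels
    out = np.roll(out, shift, axis=1)        # note: pixels wrap around the edge
    factor = rng.uniform(0.9, 1.1)           # brightness change of up to 10%
    return np.clip(out * factor, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # stand-in for one CT image
augmented = [augment(image, rng) for _ in range(4)]
print(len(augmented), augmented[0].shape)
```

Each call produces a different variant of the same image, so one original scan can contribute several training examples.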
Three main settings are specified when compiling a model: the optimizer, the loss function,
and the metrics.
OPTIMIZER
The optimizer minimizes the cost function using gradient descent. The Adam optimizer is used
to optimize the CNN model during training, together with some hyperparameters. Adam
(Adaptive Moment Estimation) is an optimization algorithm based on gradient descent. It is
very effective for large problems with large amounts of data or many parameters, and it requires
little memory. Intuitively, it is a combination of two gradient descent methodologies:
a) gradient descent with momentum
b) RMSProp (root mean square propagation)
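
How the two ideas combine can be sketched for a single parameter. This follows the standard Adam update rule; the decay rates 0.9 and 0.999 and the epsilon value are the commonly used defaults (an assumption here), the learning rate 0.002 is the value listed in Table I, and the quadratic test function is made up.

```python
import math

def adam_minimize(grad, theta, lr=0.002, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=5000):
    """Minimize a one-parameter function given its gradient, using Adam."""
    m = 0.0  # running mean of gradients (the momentum part)
    v = 0.0  # running mean of squared gradients (the RMSProp part)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
theta = adam_minimize(lambda x: 2 * (x - 3), theta=0.0)
print(theta)
```

The momentum term smooths the direction of the update, while dividing by the root of the squared-gradient average scales the step size for each parameter.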

LOSS FUNCTION
A loss function is used to track whether a model improves with training.
CATEGORICAL CROSS-ENTROPY LOSS FUNCTION
This is also known as logarithmic loss, log loss or logistic loss. Each predicted class probability
is compared to the desired output for that class, 0 or 1, and a loss is calculated that penalizes
the probability based on how far it is from the expected value. Categorical cross-entropy is
used for multi-class classification deep learning models. The aim of training is to reduce this
loss.

Cross-entropy is defined as
$$L = -\sum_{i=1}^{K} y_i \log(\hat{y}_i)$$
where K is the number of classes, y_i is 1 for the true class and 0 otherwise, and \hat{y}_i is the
predicted probability of class i.

Mathematical Equation of Categorical Cross Entropy
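
A minimal Python sketch of categorical cross-entropy for one sample, assuming the standard definition (the negative sum of the true labels times the logarithm of the predicted probabilities). The small eps guard against log(0) and the example probability vectors are additions made for this illustration.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: y_true is one-hot, y_pred holds class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 0]                    # the third of four classes is correct
confident = [0.05, 0.05, 0.85, 0.05]     # mostly correct prediction
unsure = [0.25, 0.25, 0.25, 0.25]        # no preference between classes

loss_confident = categorical_cross_entropy(y_true, confident)
loss_unsure = categorical_cross_entropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Only the probability assigned to the true class contributes to the loss, so the confident, correct prediction is penalized less than the undecided one.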

METRICS -
Metrics are used to evaluate the model's predictions on the training and validation data.
To preprocess the dataset, all images are resized to a target size of 224x224.
For training our model, the following hyperparameters are used, with the values shown in
Table I:
- Epochs: the number of complete passes over the training data made while training the
neural network.
- Learning rate: a tuning parameter that determines the step size at every iteration.
- Batch size: the number of samples used in one iteration.

TABLE I. HYPERPARAMETERS UTILIZED WITH THEIR VALUES

HYPERPARAMETER      VALUE SET FOR CNN MODEL
EPOCHS              20
BATCH SIZE          64
LEARNING RATE       0.002

PERFORMANCE METRICS
The deep learning model described above is evaluated using four basic metrics.
ACCURACY: The proportion of correctly classified instances out of the total number of
instances. Accuracy combines the true positive and true negative counts.
$$\text{ACCURACY} = \frac{TP + TN}{TP + TN + FP + FN}$$

PRECISION: The proportion of instances predicted as positive that are actually positive.
$$\text{PRECISION} = \frac{TP}{TP + FP}$$

RECALL: The proportion of actual positive instances that are correctly identified.
$$\text{RECALL} = \frac{TP}{TP + FN}$$

F1 SCORE: The harmonic mean of precision and recall. It can be represented as:
$$\text{F1-SCORE} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

a) True Positive (TP):
The algorithm predicts the positive class correctly. For our dataset, an image containing
cancer is classified as cancer.
b) True Negative (TN):
The algorithm predicts the negative class correctly. For our dataset, a non-cancer image is
classified as non-cancer.
c) False Positive (FP):
The algorithm predicts the positive class incorrectly: a non-cancer image is classified as cancer.
d) False Negative (FN):
The algorithm predicts the negative class incorrectly: an image containing cancer is classified
as non-cancer.
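
The four metrics can be computed directly from the four counts. The following sketch uses made-up counts (40 true positives, 45 true negatives, 5 false positives, 10 false negatives) that are not results from this project; the guards against division by zero are an addition.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

For a four-class problem such as this one, the counts are usually computed per class (one class against the rest) and the resulting scores are then averaged.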

RESULT
Lung cancer detection using the VGG16 CNN model achieved a testing accuracy of 77.5% and
a training accuracy of 81.5%. This indicates that the model can recognize and differentiate
between healthy and cancerous lung tissue with reasonable accuracy. The result is
encouraging, as it has the potential to improve the early detection of lung cancer and provide
more accurate diagnoses. However, the model should be further evaluated with more data to
ensure that it is performing optimally. Additionally, the model should be tested with different
types of data to further improve its accuracy. Overall, the VGG16 CNN model shows good
promise for the early detection of lung cancer.

Figure 11: Model Metrics Visualization

PREDICTIONS MADE BY THE MODEL

a) For normal chest CT scan

b) For Adenocarcinoma

c) For Large Cell Carcinoma

d) For Squamous cell carcinoma

CONCLUSION AND FUTURE WORK

In this work, we implemented a convolutional neural network-based system for lung cancer
detection. We used a dataset from the Kaggle site for this study and applied segmentation and
preprocessing techniques. All CT scan images were resized to 224×224 pixels. The CNN model
was trained using the training data, and its performance was evaluated using the test dataset,
on which we obtained an accuracy of 77.3%. The training accuracy for VGG16 is 81.4%. We
aim to reach an accuracy of 95-100%.

FUTURE WORK
As future work, we plan to develop an accurate web application for lung cancer screening. The
website will be built with Python and deep learning technologies, using libraries such as Keras,
TensorFlow, seaborn and scikit-learn. In this system, the user's name and an input image are
selected from the database. The system then determines whether the image is cancerous or not
and displays the result.

REFERENCES

1. Nasrullah Nasrullah, Jun Sang, Mohammad S. Alam, Muhammad Mateen, Bin Cai and
Haibo Hu, "Automated Lung Nodule Detection and Classification Using Deep Learning
Combined with Multiple Strategies".
2. Ruchita Tekade, Dr. R.K. Rajeswari, "Lung Cancer Detection and Classification using
Deep Learning", IEEE, 2018.
3. Maithily Marathe, Madhuri Bhalekar, "Detection of Lung Cancer using CT Scans with
Deep Learning Approach", IEEE, 2022.
4. Chest CT-Scan Images Dataset, 2022.
https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images (accessed on
10 October 2022).

APPENDIX
CODE
