Major Project Report
1. Introduction
   a) Motivation & Objective
   b) Previous Work
   c) Brief Description of the Work
2. Methodology
3. Dataset
4. Experiment
5. Result
6. Conclusion & Future Work
INTRODUCTION
MOTIVATION
Modern technology allows doctors to replace almost any part of the human body, from bones to organs to hands and faces, with the exception of the brain and lungs. Early detection of lung or brain damage is therefore essential to improving survival, and this is the main motivation for this project. There are many ways to diagnose lung cancer, such as chest X-rays, computed tomography (CT), and magnetic resonance imaging (MRI), but even when analyzing these reports, doctors cannot accurately predict the stage of the cancer or the size of the tumor. There is therefore a great need for new techniques, namely image-processing methods, that improve on manual analysis and serve as tools for more accurate cancer prediction.
OBJECTIVE
The main goal of this project is to identify the type of lung cancer. Lung cancer is among the deadliest diseases today, and identifying it at an early stage is extremely important because early detection is the most effective way to improve survival rates. The presence of lung cancer can be diagnosed by CT imaging of the lungs, but manual reading by a physician may lead to false reports. This project therefore focuses on the need for a computerized method of cancer detection. It analyzes CT images using various image-processing operations to obtain accurate detection results. A CT image is first reviewed and then preprocessed to remove noise and enhance the image; morphological features are then extracted using a deep learning model, and lung cancer detection is performed.
PREVIOUS WORK
In the medical field, detecting specific conditions in CT scan images is very difficult and complicated. However, methods using CNNs are widely used in many studies for disease detection and pre-diagnosis.
The first computer-aided detection (CAD) systems for detecting lung nodules were developed in the late 1980s, but these early attempts saw little adoption because the computing resources needed for advanced image analysis techniques were lacking at the time. Since the advent of graphics processing units (GPUs) and convolutional neural networks (CNNs), the power of image analysis and decision-support computer systems has improved significantly. Researchers have proposed many deep learning models for medical image analysis, and some of the methods most relevant to detecting and classifying lung nodules are mentioned here.
BRIEF DESCRIPTION OF THE WORK
Cancer is considered among the most serious diseases in terms of death rates; it causes more deaths than breast, prostate, liver, or skin cancer. The most common risk factor reported for lung cancer patients is smoking, and other symptoms include chest pain, cough, and coughing up blood. Not all patients have these symptoms; some have joint pain or headaches instead. There are no definitive early signs or symptoms of lung cancer, and it is estimated that only 15% of cases are detected at an early stage. Early diagnosis of lung cancer together with the best treatment is critical for many patients, yet costly for the healthcare system. Cancer detection can be performed using X-rays, computed tomography (CT) scans, positron emission tomography (PET) scans, magnetic resonance imaging (MRI), and so on. CT scans are preferred by radiologists for the detection of lung cancer. These scans are created by combining X-ray images taken from different angles, but analyzing thousands of scans at the same time is a burden for many radiologists. The main goal of this work is therefore to detect lung cancer and its stages from CT scan images using deep learning.
For medical imaging data, deep learning based models work well compared to traditional state-of-the-art methods. Deep learning is a subfield of machine learning that performs well on object and image detection and classification tasks. From our research, it seems that many existing automated systems for lung cancer detection are based on traditional methods.
In this work, the Chest CT-Scan Images dataset obtained from the Kaggle website is used. The dataset is split into training, testing, and validation sets for evaluation of the model. We use a deep learning based CNN model, VGG16, for detection of lung cancer. First, we import the Python libraries needed to handle the data and perform common and complex tasks with a single line of code, and then apply data augmentation techniques to preprocess the data. The TensorFlow library is used to build the CNN model; the Keras framework within TensorFlow contains all the functionality one needs to define the architecture of a CNN and train it on the data. The Adam optimizer is used to optimize the CNN model, and the model's performance is then evaluated on the testing dataset using different metrics; the architecture predicts the type of cancer and achieves good accuracy on testing. The whole implementation was done on Google Colaboratory, a platform well suited to running machine learning and deep learning algorithms.
METHODOLOGY
DEEP LEARNING
Deep learning is a type of artificial intelligence (AI) capable of recognizing patterns and making decisions from large datasets. It uses many layers of artificial neural networks to process data and can be used for tasks such as image recognition and classification and natural language processing. Deep learning is a subfield of machine learning that uses layers of artificial neural networks to automatically learn representations of data. For example, a deep learning system can be trained to recognize objects in an image; in this case, the system must learn to identify different shapes, colors, and textures in order to accurately classify the objects in the image.
Deep learning is a powerful way to solve complex tasks. Many deep learning (DL) algorithms exist, such as recurrent neural networks, deep neural networks, and convolutional neural networks.
Figure 1: Deep Neural Networks
Activation Functions
Activation functions are an important part of neural networks: they allow a network to solve non-linear problems. The three most commonly used activation functions are sigmoid, tanh, and ReLU. Activation functions are used in both forward and backward propagation. In forward propagation, the activation function's output is used when computing the loss against the true value; in backward propagation, its gradient is used to update the parameters of the neural network. Figure 2 shows activation functions commonly used in neural networks.
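As a small illustration, the three activation functions can be written in a few lines of NumPy (a sketch for intuition, not code from the original report):

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes input into (-1, 1); zero-centered, unlike sigmoid.
    return np.tanh(x)

def relu(x):
    # Passes positive values through unchanged and zeroes out negatives.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approximately [0.119 0.5 0.881]
print(tanh(x))     # approximately [-0.964 0. 0.964]
print(relu(x))     # [0. 0. 2.]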
CONVOLUTIONAL NEURAL NETWORKS
Convolutional neural networks bring deep learning to computer vision. A convolutional neural network is also known as a ConvNet or CNN. CNNs are useful for feature extraction and for classifying objects in an image; a CNN is essentially a stack of different layers. CNNs are widely preferred in the healthcare industry, with applications including tumor and cancer detection, drug discovery, and disease diagnosis.
Basically, a CNN has three kinds of layers:
a) Convolutional layer
b) Pooling layer
c) Fully connected layer
Figure 3: Convolutional Neural Network Architecture
Convolutional Layer
The linear functions used in convolutional neural networks are called convolutional layers. Each node in a hidden layer applies a feature detector (filter) to extract different features from the input image.
Pooling Layer
The pooling layer extracts and selects the most important features from the convolved features; this is also known as subsampling. There are two types of pooling, max pooling and average pooling. Max pooling, in which the maximum value within the pooling window is selected as the sampled feature, is the more widely used in research.
Figure 4: Max and Average Pooling Operation
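To make the Figure 4 operations concrete, the following minimal sketch applies Keras' max and average pooling layers to a tiny hand-made feature map (the input values are illustrative):

import numpy as np
import tensorflow as tf

# A tiny 4x4 single-channel "feature map" (batch of 1).
x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [0, 2, 5, 7],
              [3, 1, 4, 8]], dtype=np.float32).reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2)(x)

# Max pooling keeps the largest value in each 2x2 window.
print(max_pool.numpy().reshape(2, 2))  # [[6. 2.], [3. 8.]]
# Average pooling keeps the mean of each 2x2 window.
print(avg_pool.numpy().reshape(2, 2))  # [[3.5 1.25], [1.5 6.0]]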
MODEL SUMMARY
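The model summary in the original report was included as a screenshot, which did not survive. The following is a minimal sketch of how such a summary can be produced, assuming a frozen ImageNet-pretrained VGG16 base with a custom four-class head; the exact head (256-unit dense layer, 0.5 dropout) is an illustrative assumption, not recovered from the report.

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Pretrained VGG16 convolutional base without its original classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained filters

# Custom classification head for the four classes in the dataset (assumed head).
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),
])

model.summary()  # prints the layer-by-layer summary this section refers to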
BLOCK DIAGRAM
Block diagram of the proposed system: an image from the database undergoes preprocessing, features are extracted and classified by the CNN model, and the system reports either lung cancer detected or no lung cancer detected.
DATASET
Description
Building deep learning models requires a lot of data. For this project, datasets were researched and identified before any real work began. Since there is a heavy emphasis on building models in this project, a key part of the project relies on the dataset, so prior to coding I had to ensure I had a good dataset to work with.
Large cell carcinoma
Large cell carcinoma is an aggressive type of non-small cell lung cancer (NSCLC) that begins in the lung cells. NSCLC is the most common category of lung cancer, accounting for 80-85% of all lung cancers. Most lung cancers are caused by smoking, but other factors such as radon gas, secondhand smoke, and certain chemicals can also increase a person's risk. Symptoms vary from mild to severe and may include a persistent cough, chest pain, and shortness of breath.
The last folder contains the normal CT scans.
EXPERIMENT
TECHNOLOGY DECISION
This section gives details about the technologies used for this project. Although many tools exist on the market, the tools outlined here were found to perform well for the problem that needs to be solved.
PYTHON
Python is a high-level, general-purpose programming language. It is widely used in scientific computing and can be applied to a wide range of tasks, from data mining to software development. Python is the main language used in this project.
GOOGLE COLAB
Google Colab is developed by Google Research. Colab allows anyone to write and run arbitrary Python code through a browser and is particularly well suited to machine learning, data analysis, and education. Libraries such as NumPy, Pandas, Matplotlib, and TensorFlow are supported by Google Colab out of the box.
Importing Python libraries such as NumPy, Pandas, TensorFlow, and Matplotlib makes it easy to process data and perform common and complex operations with a single line of code.
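As a sketch, the typical imports for this kind of project look as follows (the exact set used in the original notebook is an assumption):

# Typical imports; the exact set used in the original code is assumed.
import numpy as np               # numerical arrays
import pandas as pd              # tabular data handling
import matplotlib.pyplot as plt  # plotting sample images and training curves
import cv2                       # OpenCV, for image loading and preprocessing
import tensorflow as tf          # deep learning framework
from tensorflow import keras     # high-level model-building API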
NumPy
NumPy is a Python library that performs numerical calculations efficiently. The library is optimized for solving mathematical problems and performs mathematical operations more efficiently than Python's built-in math facilities.
Pandas
Pandas is a library in Python that, like NumPy, is used for data preprocessing and preparation. Key features of Pandas are its DataFrame and Series data structures, which are optimized and include convenient indexing that enables operations such as reorganization, slicing, merging, and concatenation. Pandas and NumPy are very efficient when used together to manage data.
Matplotlib
Matplotlib is a Python plotting library that allows programmers to create a wide variety of graphs and visualizations with ease. Its great strength is that it simplifies creating visualizations, and it works very well with Pandas and NumPy.
OpenCV
Open Source Computer Vision (OpenCV) is a well-established computer vision library written in C/C++, with bindings for Python and Java. It is a powerful imaging library that includes many tools for image processing, feature extraction, and more.
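As an illustration of the kind of preprocessing OpenCV enables, the following minimal sketch reads a CT image, applies light denoising, and resizes it to the network's input size; the file path and the specific blur settings are hypothetical:

import cv2

img = cv2.imread("chest_ct/sample.png")     # hypothetical path; loads a BGR NumPy array
img = cv2.GaussianBlur(img, (3, 3), 0)      # light smoothing to reduce noise
img = cv2.resize(img, (224, 224))           # match the 224x224 network input size
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; most models expect RGB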
Tensorflow
TensorFlow is an open-source deep learning library developed by Google and originally used by Google Brain for machine learning and deep learning research. At its core, TensorFlow is designed for computation on multidimensional arrays called tensors, and what makes it great is its ability to flexibly distribute computation across different devices such as CPUs and GPUs.
Keras
Keras is a deep learning framework that abstracts away much of the code needed by lower-level platforms such as TensorFlow and Theano. Compared to other frameworks, Keras is more minimalistic.
Next, a data augmentation technique is applied for data preprocessing.
DATA AUGMENTATION
Data augmentation is a set of methods to artificially increase the amount of data by creating
new data points from existing data. This involves making small changes to the data to create
new data points. Data augmentation is useful for improving the performance and results of
machine learning models by generating new and good examples for your training data set.
When the data set of a machine learning model is large and sufficient, the model is more
accurate and performs better.
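As a minimal sketch, data augmentation can be set up with Keras' ImageDataGenerator; the specific augmentation parameters below are illustrative assumptions, since the report does not list exact values:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings (assumed values).
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1,          # random zoom in/out
    horizontal_flip=True,    # mirror images left/right
)

# Validation and test data are only rescaled, never augmented.
test_gen = ImageDataGenerator(rescale=1.0 / 255)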
There are three main options to configure when compiling a model: the optimizer, the loss function, and the metrics.
OPTIMIZER
This is the technique used to minimize the cost function via gradient descent. The Adam optimizer is used to optimize the CNN model and the training process, together with some hyperparameters. Adaptive Moment Estimation (Adam) is an optimization algorithm for gradient descent. It is very effective on large problems with many data points or parameters, and it requires little memory. Intuitively, it is a combination of the "gradient descent with momentum" algorithm and the "RMSProp" algorithm; the Adam optimizer thus combines two gradient descent methodologies:
a) momentum
b) root mean square propagation (RMSProp)
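For reference, the standard Adam update rules (the usual textbook formulation, not stated in the original report) combine these two estimates as follows, for gradient g_t, learning rate η, and decay rates β₁ and β₂:

m_t = β₁ · m_{t−1} + (1 − β₁) · g_t (momentum: moving average of the gradients)
v_t = β₂ · v_{t−1} + (1 − β₂) · g_t² (RMSProp: moving average of the squared gradients)
m̂_t = m_t / (1 − β₁ᵗ), v̂_t = v_t / (1 − β₂ᵗ) (bias correction)
θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ε) (parameter update)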
LOSS FUNCTION
The loss function is used to track whether the model improves with training.
CATEGORICAL CROSS-ENTROPY LOSS FUNCTION
This is also known as logarithmic loss, log loss, or logistic loss. Each predicted class probability is compared to the desired output for the actual class (0 or 1), and a loss is calculated that penalizes the probability according to how far it is from the expected value. Categorical cross-entropy is used for multi-class classification in deep learning models; the aim is to reduce the model's loss.
Cross-entropy is defined as

Loss = − Σ (i = 1 to C) yᵢ · log(ŷᵢ)

where C is the number of classes, yᵢ is 1 for the true class and 0 otherwise, and ŷᵢ is the predicted probability for class i. For example, if the true class is the second of four and the model predicts probabilities (0.1, 0.7, 0.1, 0.1), the loss is −log(0.7) ≈ 0.357.
METRICS
Metrics are used to evaluate the model on the training and the validation data.
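Putting the three compile options together, a minimal sketch (assuming the model built in the Model Summary section; the learning rate value is an illustrative assumption):

from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-4),  # adaptive gradient descent (assumed rate)
    loss="categorical_crossentropy",     # multi-class log loss
    metrics=["accuracy"],                # tracked on training and validation data
)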
For preprocessing the dataset, all images are resized to a target size of 224×224. Some hyperparameters are used for training our model: epochs is the number of iterations for training the neural network, the learning rate is a tuning parameter that determines the step size at each iteration, and the batch size is the number of samples used in one iteration, as shown in Table I.
TABLE I. HYPERPARAMETERS UTILIZED WITH THEIR VALUES

Hyperparameter    Value
Epochs            20
Batch size        64
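A minimal training sketch using the hyperparameters from Table I, assuming the generators from the Data Augmentation section and folder paths following the Kaggle dataset layout (the paths are assumptions):

# Flow images from the dataset folders at the 224x224 target size.
train_data = train_gen.flow_from_directory(
    "Data/train", target_size=(224, 224), batch_size=64, class_mode="categorical")
valid_data = test_gen.flow_from_directory(
    "Data/valid", target_size=(224, 224), batch_size=64, class_mode="categorical")

# Train with the Table I hyperparameters: 20 epochs, batch size 64.
history = model.fit(train_data, validation_data=valid_data, epochs=20)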
PERFORMANCE METRICS
The deep learning model described above is evaluated based on several basic parameters, including accuracy and precision.
ACCURACY: A metric that measures the proportion of correctly classified instances out of the total number of instances; it combines the true positive and true negative counts.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
PRECISION: Precision states how many of the instances the model selects as positive are identified accurately.

Precision = TP / (TP + FP)
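In practice these metrics need not be computed by hand; as a small sketch (with made-up labels), scikit-learn provides them directly:

from sklearn.metrics import accuracy_score, precision_score

# Illustrative labels: y_true are actual classes, y_pred are model predictions.
y_true = [0, 1, 2, 3, 1, 0, 2, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2, 3]

print(accuracy_score(y_true, y_pred))                    # 0.75 (6 of 8 correct)
print(precision_score(y_true, y_pred, average="macro"))  # per-class precision, averaged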
RESULT
The VGG16 CNN model for lung cancer detection achieves a testing accuracy of 77.5% and a training accuracy of 81.5%. This indicates that the model is able to recognize and differentiate between healthy and cancerous lung tissue with reasonable accuracy: it identifies the presence of cancerous tissue in lung images 77.5% of the time on the test set. This result is encouraging, as such a model has the potential to improve the early detection of lung cancer and support more accurate diagnoses. However, the model should be evaluated further with more data to ensure that it performs optimally, and it should be tested on different types of data to further improve its accuracy. Overall, the VGG16 CNN model shows promise for the early detection of lung cancer.
Figure: sample model prediction for an adenocarcinoma CT scan.
CONCLUSION AND FUTURE WORK
In this work, we implemented a convolutional neural network based system for lung cancer detection, using a dataset from the Kaggle site. We applied segmentation and preprocessing techniques, resizing all CT scan images to 224×224 pixels. The CNN model was then trained on the training data, and its performance was evaluated on the test dataset, where it achieved an accuracy rate of 77.3% (the training accuracy for VGG16 is 81.4%). We aim for a maximum accuracy of 95-100%.
FUTURE WORK
As future work, we aim to develop an accurate web application for lung cancer screening, built using technologies such as Python and deep learning, with libraries such as Keras, TensorFlow, Seaborn, and scikit-learn. In this system, the user's name and an input image are selected from the database; the system then determines whether the given image is cancerous and displays the result.
REFERENCES
1. Nasrullah Nasrullah, Jun Sang, Mohammad S. Alam, Muhammad Mateen, Bin Cai and Haibo Hu, "Automated Lung Nodule Detection and Classification Using Deep Learning Combined with Multiple Strategies".
2. Ruchita Tekade and Dr. R.K. Rajeswari, "Lung cancer detection and classification using deep learning", IEEE, 2018.
3. Maithily Marathe and Madhuri Bhalekar, "Detection of Lung Cancer using CT scans with Deep Learning approach", IEEE, 2022.
4. Chest CT-Scan Images Dataset, 2022. https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images (accessed 10 October 2022).
APPENDIX
CODE
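The code in the original appendix was included as screenshots, which did not survive. The following is a minimal end-to-end sketch of the pipeline described in this report (frozen VGG16 base, data augmentation, Adam optimizer, categorical cross-entropy, 224×224 inputs, 20 epochs, batch size 64); the directory names, classifier head, and learning rate are assumptions, not recovered code.

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)
BATCH_SIZE = 64
EPOCHS = 20

# Augmented generator for training; plain rescaling for validation and test.
train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=15,
                               zoom_range=0.1, horizontal_flip=True)
plain_gen = ImageDataGenerator(rescale=1.0 / 255)

# Folder names follow the Kaggle chest CT-scan dataset layout (assumed paths).
train_data = train_gen.flow_from_directory("Data/train", target_size=IMG_SIZE,
                                           batch_size=BATCH_SIZE, class_mode="categorical")
valid_data = plain_gen.flow_from_directory("Data/valid", target_size=IMG_SIZE,
                                           batch_size=BATCH_SIZE, class_mode="categorical")
test_data = plain_gen.flow_from_directory("Data/test", target_size=IMG_SIZE,
                                          batch_size=BATCH_SIZE, class_mode="categorical",
                                          shuffle=False)

# Frozen ImageNet-pretrained VGG16 base with a small 4-class head (assumed head).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),
])

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Train, then evaluate on the held-out test set.
model.fit(train_data, validation_data=valid_data, epochs=EPOCHS)
test_loss, test_acc = model.evaluate(test_data)
print(f"Test accuracy: {test_acc:.3f}")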