Report of Mini Project
On
LUNG CANCER DETECTION
Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR,
ANANTHAPURAM
In Partial Fulfillment of the Requirements for the Award of the Degree of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE & TECHNOLOGY
Submitted By
K. KAVYA - (18691A2813)
K.V. POOJITHA - (18691A2831)
D.V. PRATHYUSHA - (18691A2833)
K. REESHPA - (18691A2838)
Under the Guidance of
Dr. Rajakumar R
Associate Professor
Department of Computer Science & Technology
BONAFIDE CERTIFICATE
This is to certify that the mini project work entitled “LUNG CANCER DETECTION” is a
bonafide work carried out by
K. KAVYA - (18691A2813)
K.V. POOJITHA - (18691A2831)
D.V. PRATHYUSHA - (18691A2833)
K. REESHPA - (18691A2838)
Submitted in partial fulfillment of the requirements for the award of degree Bachelor of
Technology in the stream of Computer Science & Technology in Madanapalle Institute of
Technology and Science, Madanapalle, affiliated to Jawaharlal Nehru Technological
University Anantapur, Anantapur during the academic year 2021-2022.
ACKNOWLEDGEMENT
We sincerely thank Dr. C.Yuvaraj, M.E., Ph.D., Principal for guiding and providing
facilities for the successful completion of our mini project at Madanapalle Institute of
Technology and Science, Madanapalle.
We express our deep sense of gratitude to Dr. M. Sreedevi, Ph.D., Professor and Head,
Department of CST for her valuable guidance and constant encouragement given to us during this
work.
We also wish to place on record our gratefulness to the other faculty of the CST Department and
to our friends and our parents for their help and cooperation during our mini project work.
DECLARATION
We hereby declare that the results embodied in this project “LUNG CANCER
DETECTION” were carried out by us under the guidance of Dr. Rajakumar R, Associate Professor, Dept. of
Computer Science & Technology, in partial fulfillment of the requirements for the award of Bachelor of Technology in
Computer Science & Technology from Jawaharlal Nehru Technological University
Anantapur, Anantapur, and that we have not submitted the same to any other university/institute for the
award of any other degree.
Date :
Place :
PROJECT ASSOCIATES:
K. KAVYA - (18691A2813)
K.V. POOJITHA - (18691A2831)
D.V. PRATHYUSHA - (18691A2833)
K. REESHPA - (18691A2838)
I certify that the above statement made by the students is correct to the best of my knowledge.
Table of Contents
ACKNOWLEDGEMENT III
DECLARATION IV
ABSTRACT VIII
1 INTRODUCTION 1
1.1 Motivation 2
1.2 Existing System 2
1.2.1 Limitations of existing system 3
1.3 Objectives 3
1.4 Outcomes 3
1.5 Applications 3
1.6 Structure of Project (System Analysis) 3
1.6.1 Requisites Accumulating and Analysis 4
1.6.2 System Design 4
1.6.3 Implementation 4
1.6.4 Testing 4
1.6.5 Deployment of System and Maintenance 4
1.7 Functional Requirements 5
1.8 Nonfunctional Requirements 5
1.8.1 Examples 6
1.8.2 Advantages 6
1.8.3 Disadvantages 6
2 LITERATURE SURVEY 7
2.1 Literature Survey Conclusion 12
3 PROBLEM ANALYSIS 13
3.1 Existing Approach 13
3.1.1 Drawbacks 13
3.2 Proposed System 13
3.2.1 Advantages 13
3.3 Software and Hardware Requirements 14
3.4 About dataset 14
3.5 Algorithms 14
3.6 Flow chart 16
4 SYSTEM METHODOLOGY 17
4.1 CT scan 17
4.2 CAD 18
4.3 Feature Detection 18
5 IMPLEMENTATION 20
5.1 Code 20
6 EXPERIMENTAL RESULTS 40
7 TESTING 43
7.1 Software testing 43
7.1.1 Types of testing 43
8 CONCLUSION AND FUTURE SCOPE 44
8.1 Conclusion 44
8.2 Future scope 44
REFERENCES 45
LIST OF FIGURES
1 Lung CT 1
3 Project SDLC 3
4 Flow chart 16
5 Output 40, 41
ABSTRACT
Cancer is treacherous and appalling, and if it is not detected at an early stage it poses a huge risk to health. Hence the early detection of cancer is vital, and it can be accomplished through deep learning, machine learning and computer vision, which have significant potential with their recent advancements and developments and promise great value in early detection; this can be achieved by applying a well-honed deep learning algorithm to CT scans. Because early identification is exceedingly demanding, Low Dose Computed Tomography (LDCT) is used, and for the best results different types of approaches are explored and compared. The methods employed are segmentation techniques, 3D Convolutional Neural Networks (CNN) and U-Net building. The aim of this venture is to assess the information (slices of CT scans) furnished by numerous pre-processing strategies and to examine it using machine learning algorithms, in this case 3D Convolutional Neural Networks, to train and validate the model, in order to create an accurate model which may be used to decide whether or not someone has cancer. This will substantially assist in the identification and removal of cancer cells in the early stages.
Keywords: Lung Cancer, Deep Learning, Convolutional Neural Networks, Low Dose Computed Tomography, Computer Aided Detection, Feature Extraction, CT scan, Watershed Algorithm, U-Net, Image Segmentation, VGG model, False Positives.
CHAPTER 1
INTRODUCTION
Lung cancer is a disease in which the cells of the lung tissues grow uncontrollably and form tumors.
It is the leading cause of death from cancer among both men and women worldwide, accounting for over
1.8 million deaths according to the World Health Organization (WHO) and the GLOBOCAN 2020 global cancer survey.
The changes that cause this arise mainly from the interaction between a person's genetic factors and three
categories of external agents: physical carcinogens, such as ultraviolet and ionizing radiation; chemical
carcinogens, such as asbestos, components of tobacco smoke, aflatoxin (a food contaminant) and arsenic
(a drinking water contaminant); and biological carcinogens, such as infections from certain viruses, bacteria or parasites.
Fig. 1. Lung cancer shown on an X-ray as small blurred dots.
According to estimates from the World Health Organization (WHO) in 2019, cancer is the first or
second leading cause of death before the age of 70 years in 112 of 183 countries and ranks third or
fourth in a further 23 countries.
Cancer's rising prominence as a leading cause of death partly reflects marked declines in mortality
rates of stroke and coronary heart disease, relative to cancer, in many countries. The extent to which
the position of cancer as a cause of premature death reflects national levels of social and economic
development can be seen by comparing the maps in Figure 1 and Figure 2A, the latter depicting the
4-tier Human Development Index (HDI) based on the United Nations' 2019 Human Development
Report.
Fig. 2. (A) The 4-Tier Human Development Index (HDI) and (B) 20 Areas of the World. The sizes of
the respective populations are included in the legend. Source: United Nations Procurement
Division/United Nations Development Program.
Hence, early detection of cancer is an immediate need for action; the cancer burden can also be
reduced through early detection and through appropriate treatment and care of patients who
develop cancer. Many cancers have a high chance of cure if diagnosed early and treated
appropriately. Early detection of lung cancer is important because it allows for timely treatment and
has potential to reduce deaths. The way lung cancer is diagnosed is by inspecting a patient's CT
scan images, looking for small blobs in the lungs called nodules. Finding a nodule is not in itself
indicative of cancer; the nodules have to have certain characteristics (shape, size, etc.) to support a
cancer diagnosis.
1.1 MOTIVATION
From our investigation and analysis, we have found that there are multiple algorithms and
techniques used for detection and segmentation, but each has one drawback or another that
makes it fall behind. Hence, we compare the best approaches to give the best results, thereby
making early detection efficient and less time consuming.
1.2.1 Limitations of existing system
The recent emergence of deep convolutional neural networks has provided some attractive solutions
for domain transfer, mainly through the use of 2D and 3D networks.
1.3 OBJECTIVES
The paramount objective is to detect the location of cancerous lung nodules, to classify the
lung cancer and its severity, and to use the best method for early detection by comparing
machine learning and deep learning methods such as flattening, pooling, U-Net, etc.
1.4 OUTCOMES
We describe the methods, implementation steps, and results of lung cancer detection. Within the
approaches explored, we used three different methods to handle the labels: averaging nodule encodings per
patient, labeling nodules with the same label given to the patient, or, finally, not doing nodule
detection at all and opting instead for a 3D model on the raw images. Of these, the first method
gave the best score.
1.5 APPLICATIONS
This strategy is used for the early detection of cancer, for large datasets, and for locating cancer in the lung.
Practical Implementation
Deployment of Application of System
Project Maintenance
1.6.3 Implementation
Implementation is the phase where we endeavor to produce the practical output of the work done in the
design stage. Most of the coding of the business logic comes into action in this stage, and it is the
main and crucial part of the project.
1.6.4 Testing
UNIT TESTING
It is done by the developer at every stage of the project; fine-tuning of bugs and module
dependencies is also done by the developer, but here we only fix the runtime mistakes.
MANUAL TESTING
Because our project is at an academic level, we are unable to conduct any automated testing;
therefore, we rely on manual testing using trial-and-error methods.
1.6.5 Deployment of System and Maintenance
Once the project is complete, we will deploy the client system in the real world. Throughout our
academic break, we solely launched the client system in our college lab with all appropriate
equipment and a Windows OS. Our project's maintenance is a one-time procedure.
1.7 FUNCTIONAL REQUIREMENTS
1. Data Collection
2. Data Pre-processing
3. Training and Testing
4. Modelling
5. Predicting
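A minimal sketch of this functional flow is given below, with random stand-in data in place of the real pre-processed CT features and labels (steps 1 and 2), so that the train/test split (step 3) and the later fit/predict calls (steps 4 and 5) can be seen end to end; the array shapes are illustrative assumptions only.

import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 100 samples of 32 pre-processed features and binary labels
X = np.random.rand(100, 32)          # placeholder for pre-processed CT features
y = np.random.randint(0, 2, 100)     # placeholder for cancer / no-cancer labels

# Step 3: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Steps 4 and 5 would then be: model.fit(X_train, y_train) and model.predict(X_test)
print(X_train.shape, X_test.shape)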
1.8 NONFUNCTIONAL REQUIREMENTS
Non-functional requirements (NFRs) also keep functional requirements in line, so to speak. Attributes that make the product affordable, easy to use, and accessible, for example, come from NFRs. Some actual examples: each page must load within 2 seconds; the process must finish within 3 hours so data is available by 8 a.m. local time after an overnight update; the system must meet the Web Content Accessibility Guidelines (WCAG) 2.1; database security must meet HIPAA requirements; users shall be prompted to provide an electronic signature before loading a new page. Describing non-functional requirements is just as critical as describing functional requirements. Types of non-functional requirements include:
Usability requirement
Serviceability requirement
Manageability requirement
Recoverability requirement
Security requirement
Data Integrity requirement
Capacity requirement
Availability requirement
Scalability requirement
Interoperability requirement
Reliability requirement
Maintainability requirement
Regulatory requirement
Environmental requirement
1.8.1 Examples
Here are some examples of non-functional requirements:
The date format must be as follows: month.date.year.
The web dashboard must be available to US users 98 percent of the time every month
during business hours EST.
1.8.2 Advantages
They ensure a positive user experience and software that is simple to use.
1.8.3 Disadvantages
Cons/drawbacks of non-functional requirements are:
Non-functional requirements may affect various high-level software subsystems.
They require special attention during the software architecture/high-level design phase, which
increases costs.
Their implementation does not usually map to a specific software subsystem.
It is hard to modify non-functional requirements once you pass the architecture phase.
CHAPTER 2
LITERATURE SURVEY
In 2004, Aristofanes C. Silva et al. [1] published a paper on diagnosing lung nodules by
means of the dice coefficient, where their major contribution was improving the dice coefficient
index and providing an optimized ROC curve and skeletonization. The algorithms employed here are texture
processing algorithms, mainly statistical, spectral and structural ones, and the proposed methods are a
variety, i.e., the Spatial Gray Level Dependence Method (SGLDM), the Gray Level Difference Method (GLDM),
and Gray Level Run Length Matrices (GLRLM). Simulated mainly with MATLAB and achieving an accuracy of
79.33%, its major drawback is that no dataset was used.
In 2019, M. Attique Khan et al. [2] published a paper regarding lung cancer detection in
which their major contribution was developing a novel design of contrast-stretching-based
classical feature fusion processing for localizing the cancer classification. The algorithm utilized here is a
CNN along with a gamma-correction maximum-intensity-weight approach and an entropy-based approach
with NCA (Neighborhood Component Analysis); their program was simulated with tools such as Jupyter
Notebook and CAD, which is an important notion. The Lung Data Science Bowl is the dataset used,
providing a maximum accuracy of 99.4%; however, difficulty in locating small lesions has been a
limitation for them.
In 2012, Maxine Tan et al. [3] published a paper whose main idea is a feature-deselective
neuroevolutionary classifier, with the novel Feature-Deselective NeuroEvolution (FD-NEAT) format
as their major goal and contribution. A genetic algorithm has been used along with the proposed
FD-NEAT classifier, and an ANN and an SVM respectively. The LIDC database has been utilized,
which gives an accuracy of about 83.93%, the challenge being the complexity in the number of nodes.
In 2015, Shuang Feng Dai et al. [4] published a paper on neurocomputing whose major
focus and contribution was providing the most robust method for lung segmentation so as to
minimize the post-processing time as much as possible. The algorithms used here are GMMs and the
EM algorithm, with minimum-cut theory as one of the methods for improved graph cuts alongside
morphological calculations. The dataset was derived from the General Hospital of Ningxia
Medical University and simulated with a Python tool; it achieved an accuracy of 85% and a sensitivity of
86%. Despite all this, it has drawbacks such as very high time consumption and high expense.
In 2019, Marjolein Heuvelmans et al. [5] published a paper whose main focus and successful major
contribution was to identify early-stage lung cancer while preventing unnecessary workup for
benign nodules, using a deep LR algorithm, with Lung-RADS and VDT (Volume Doubling Time) as
the proposed methods. With the NLST dataset and Python as the simulation tool, it
achieved an accuracy and sensitivity of 94.5% and 99% respectively, yet it is limited by the challenge of
high time consumption for asymmetric lungs.
In 2019, Yuan Huang et al. [6] published a paper regarding lung segmentation using a fully
convolutional network, providing as their major contribution a new weakly supervised training
scheme, with EWT GAN as the proposed solution and using two datasets, LOBE and LOLA 2011,
respectively. Even though they achieved an accuracy of 98% compared to the other methods
mentioned in their paper, with CAD and Python as the simulation tools, the major challenge
or drawback has been difficulty in detection if any other moderate lung diseases are present.
In 2018, Zhengwe Hui et al. [7] published a paper on pulmonary nodule detection, where
their major contribution was to improve nodule candidate detection using deep neural
networks; a multi-level network treats the nodule detection task as a pixel-level
segmentation problem, and the proposed methods are KNN and NMS. The dataset utilized is LUNA16,
achieving a 94.03% sensitivity score and one third fewer false positives with CAD and Python tools, the
limitation being heavily unbalanced positives and negatives. In 2019, Nadas El-Askary et al. [8]
published a paper on lung nodule feature extraction where their major contribution and goal was
to improve the early detection of nodules; a five-stage model and the random forest algorithm are used, the
proposed methods are SVM and KNN, and the LIDC database is the dataset used. Simulated with CAD and
Python, it achieved 90.73% accuracy, 90.67% sensitivity and 90.08% specificity, the
major limitation being its complex nature and multiple steps, which make it time consuming.
In 2016, Xinyan Li et al. [9] published a paper on enhanced lung segmentation which aims to
propose an efficient and accurate lung segmentation. The improved method used in this paper
combines mathematical morphology and kernel graph cuts, and KMC and OTSC are the algorithms
used. The absence of a dataset is the major drawback, the method is slower in nature, and it was simulated
with the Jupyter Notebook IDE and CAD.
In 2020, Bijaya Kumar et al. [10] published a paper regarding benign nodule segmentation;
its main contribution is that it considered benign tissue, adenocarcinoma and squamous cell
carcinoma. A CNN is the algorithm used, and the proposed solutions are machine learning, data
acquisition, data formatting and testing, with the dataset being RGB color histopathology images
in .jpeg format. Simulated with Python and CAD, it achieved 91% accuracy and 94% sensitivity,
but the challenge is the evidence of more false negatives.
In 2017, Jong Won Kim et al. [11] published a paper regarding the diagnosis of lung cancer using deep
neural networks, where their major contribution has been the segmentation of chest CT data and its
storage in 3D arrays. The algorithm used here is a 2D CNN, and DST, SVM, ANN and VGG are the proposed
methods. Simulated with Python and MATLAB, it achieved a varying accuracy between 40% and 70%,
and the limitation is that no dataset is used.
In 2021, Peter M. et al. [13] published a paper regarding nodule detection where their major
focus has been gaining information on nodule characteristics and more detailed information on
nodules, and their major contribution is that they were able to do so and improve the accuracy. LCP-CNN
is the algorithm used here, along with area-under-the-ROC-curve (AUC) analysis as the proposed
work; the US National Lung Screening Trial is the dataset used, and the simulation tools are CAD
and Python, which provided an accuracy of 97%, but the drawback is the imbalance between
malignant and benign cases.
In 2020, Wariya Chintanapakdee et al. [14] published a paper regarding early cancer detection;
their major contribution was detecting early-stage cancerous cells, their algorithm is the
LCS (Lung Cancer Screening) program with CNN, and the proposed work used here is radiology. The
simulation tools used are CAD and Jupyter Notebook, the dataset used is the Medicare
conversion factor, an accuracy of 74.7% was achieved, and the limitation is that it is time
consuming. In 2020, Ying Su et al. [15] published a paper regarding lung cancer detection in which
their major contribution was providing automatic detection of lung nodules; the
algorithm used here is the R-CNN algorithm, and the proposed method is an artificial intelligence
technique. Simulated with CAD and Python and achieving an accuracy of 91.2%, the dataset used
here is the international public database (LIDC-IDRI), and the drawback is that the optimization of
the R-CNN is not up to the mark.
In 2019, Onur Ozdemir et al. [16] published a paper regarding lung cancer detection and
segmentation where their major contribution was the characterization of model uncertainty in their
deep learning system and well-calibrated classification for CT analysis. The algorithm used here is an
end-to-end probabilistic diagnostic system for lung cancer built on deep 3D CNNs, and the proposed work
is an end-to-end automated diagnostic tool to diagnose lung cancer. The simulation tools are CADe and
CADx, and the datasets used here are LUNA16 and Kaggle, which gave results of 0.921 and
0.869 coefficient indices, the challenge being the lack of nodule detection in LUNA16.
In 2020, Wadood Abdul et al. [18] published a paper regarding lung cancer segmentation
where their major contribution was providing a CNN model: the ALCDC system is built to detect and
classify whether tumors found in the lungs are malignant or benign, and the ALCDC system is validated using
images from LIDC-IDRI. The comparison shows that the proposed ALCDC system
performs better than existing state-of-the-art systems, with SVM and KNN as the proposed works.
The simulation tools are a CAD system and Python, and the dataset is the LIDC-IDRI database, which gave
a result of 97.02% accuracy, the drawback being that it is time consuming.
In 2018, Bohdon Chapaliuk et al. [19] published a paper regarding lung cancer segmentation
where their major contribution was providing trained C3D and 3D DenseNet networks for whole-image
classification; 3D CNNs are the algorithms used, alongside U-Net and ANN as the proposed works.
Simulated with CAD and Jupyter Notebook, with the LIDC-IDRI dataset, it achieved 85% accuracy,
the challenge being the evidence of anomalies. In 2018, Ruchita Tekade et al. [20] published a paper
regarding lung nodule detection where the major contribution is the improved efficiency of lung
nodule detection. 2D and 3D CNN algorithms have been employed here, the proposed method is the
U-Net architecture, LUNA16 and LIDC-IDRI are the datasets used, the simulation tool used here is
Python 3.5, and the reported result is an accuracy of 95.66%, the major drawback and challenge
being excessive complexity.
In 2018, Goran Jakimovski et al. [21] published a paper regarding lung nodule detection and
lung feature extraction where the major contribution is a DNN that includes convolutional layers
that can search for anomalies. The algorithm used here is a 3D DNN, which is used to test the
deep neural networks, and the proposed works are PBR, LBP and various feature extractions. The
simulation tools are CAD along with Jupyter Notebook, the accuracy achieved is 90.1%, and the absence of
a dataset is the main drawback. In 2017, Lei Fan et al. [22] published a paper regarding lung
segmentation where the major contributions are image segmentation, pooling and 3D CNN; the
algorithm used here is a 3D CNN, which is efficient for lung detection, and the proposed works are
histograms and Zernike. The database used here is a CT image dataset, the simulation tool is CAD,
the accuracy is 67.7%, and the major drawbacks are that it is inefficient and slow.
In 2018, Guobin Zhang et al. [23] published a paper regarding lung nodule detection where
the major contribution is a detailed report of the five major components in a computer-aided
detection system, which are: data acquisition, pre-processing, lung segmentation, nodule detection
and false positive detection. A CNN algorithm and K-means clustering are the algorithm and the
proposed method used respectively, the dataset utilized here is the LIDC-IDRI dataset,
and the simulation tools are CAD and Python, which gave results of 89.49% sensitivity
and 84% accuracy, with time consumption and complex nature being the major cons and
disadvantages.
CHAPTER 3
PROBLEM ANALYSIS
3.1 EXISTING APPROACH
Existing methods are used to identify cancerous (malignant) and noncancerous (benign) nodules, which are
small growths of cells inside the lung. We all know that we need to detect malignant lung
nodules at an early stage to cure lung cancer, which is crucial for the prognosis. In CT scan images they
present differently on the basis of slight morphological changes, locations and clinical
biomarkers, from which we need to identify the nodules that are cancerous. Here we measure the
probability of malignancy for the early detection of cancerous lung nodules. Several diagnostic
procedures are used by physicians for the early diagnosis of malignant lung nodules, such as
clinical settings, computed tomography (CT) scan analysis, positron emission tomography (PET)
and needle-prick biopsy analysis. Mostly, for investigation purposes, we need computed
tomography (CT) images.
3.1.1 Drawbacks
In this, there is no significant difference in detection sensitivities between low-dose and standard-
dose CT images.
3.2.1 Advantages
Sequential_2 and VGG_16 show good behavior when training the model, reaching high levels of
test accuracy.
3.3 SOFTWARE AND HARDWARE REQUIREMENTS
Software Requirements
The software requirements of the model used here are the following:
Python IDLE 3.7 (or)
Jupyter Notebook
Hardware Requirements
Minimum hardware requirements are needed to run the model. A dataset that needs to store large
data/arrays in memory will require more RAM; images are used in this model to find the nodules.
Operating system: Windows 10
Processor: Intel i3
RAM: 4 GB
Hard disk: 250 GB
3.5 ALGORITHMS
3D-CNN
First, we need to know about the CNN: it is a convolutional neural network, a deep learning algorithm that
can take in an input image, assign importance to various aspects/objects in the image, and
differentiate one from the other. The pre-processing required in a ConvNet is much lower compared
to other classification algorithms. In this architecture there are two types of CNN:
segmentation CNN and classification CNN. Firstly, a segmentation CNN identifies the regions in an image
belonging to one or more classes of semantically interpretable objects. Secondly, a classification CNN
assigns each pixel (or image) to one or more of a given set of real-world object categories. A ConvNet can
successfully capture the spatial and temporal dependencies in an image through the application of
relevant filters. We have also used the 3D-CNN algorithm, which is the best among all techniques that
were used in this model.
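The following is a minimal sketch of such a 3D CNN in Keras; the input volume size, filter counts and layer arrangement are illustrative assumptions, not the exact model used in this project.

from tensorflow.keras import layers, models

# A 3D CNN convolves over a whole stack of CT slices at once, so the kernels
# can learn the 3D shape of a nodule instead of looking at one 2D slice at a time.
model = models.Sequential([
    layers.Conv3D(16, (3, 3, 3), activation='relu',
                  input_shape=(64, 64, 64, 1)),      # depth x height x width x channels
    layers.MaxPooling3D((2, 2, 2)),
    layers.Conv3D(32, (3, 3, 3), activation='relu'),
    layers.MaxPooling3D((2, 2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),           # cancer / no cancer
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])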
WATERSHED
This is a region-based technique that utilizes image morphology. It requires the selection of at least one
marker interior to each object of the image, including the background as a separate object. It is a
classical segmentation algorithm used for separating different objects in an image, and its
transform is defined on a grayscale image. For better segmentation we integrate a Sobel filter with the
watershed algorithm. It also removes the external layers of the lungs, where a lung filter with
morphological operations and morphological gradients provides better segmented lungs. We tried
different models like Sequential_1, Sequential_2 and VGG16-Net. VGG-Net gives more appreciable
results in object classification than any other model. It follows an arrangement of convolution and max
pooling layers consistently throughout the whole architecture, where the pooling operation calculates the
maximum value in each patch of each feature map. The VGG net learns to extract the features that can
distinguish the objects and is used to classify unseen objects.
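A minimal sketch of this idea is shown below, assuming the slice is given in Hounsfield units; the marker thresholds and structuring element are simplified assumptions rather than the exact values used in the implementation chapter.

import numpy as np
from scipy import ndimage
from skimage import filters, segmentation

def watershed_lungs(ct_slice):
    # The Sobel gradient acts as the relief map that the watershed floods
    gradient = filters.sobel(ct_slice)

    # Crude markers: very dark voxels are treated as lung/air, bright ones as body
    markers = np.zeros_like(ct_slice, dtype=np.int32)
    markers[ct_slice < -400] = 1      # inside the lungs (assumed HU threshold)
    markers[ct_slice > 0] = 2         # surrounding tissue

    labels = segmentation.watershed(gradient, markers)
    lung_mask = labels == 1

    # Morphological closing smooths the lung filter, as described above
    return ndimage.binary_closing(lung_mask, structure=np.ones((5, 5)))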
LBP
Another algorithm we used is LBP (local binary pattern), a simple yet very efficient texture
operator which labels the pixels of an image by thresholding the neighborhood of each pixel and
considering the result as a binary number. The LBP operator transforms an image into an array or an image
of integer labels describing the small-scale appearance of the image.
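A minimal sketch with scikit-image is given below; the radius, number of neighbours and the random stand-in slice are illustrative assumptions.

import numpy as np
from skimage.feature import local_binary_pattern

radius = 1
n_points = 8 * radius

image = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in for one grayscale CT slice

# Each pixel is labelled by thresholding its circular neighbourhood against
# the centre pixel; the result is an image of integer texture codes.
lbp = local_binary_pattern(image, n_points, radius, method='uniform')

# The histogram of the codes is the texture feature vector for this slice
hist, _ = np.histogram(lbp.ravel(), bins=np.arange(0, n_points + 3), density=True)
print(hist)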
AUTO ENCODERS
Next, we use the autoencoder: an autoencoder is a type of artificial neural network used
to learn efficient codings of unlabeled data. The encoding is validated and refined by attempting to
regenerate the input from the encoding. Autoencoders can be used for image denoising, image
compression, and in some cases even the generation of image data.
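A minimal sketch of a dense autoencoder is shown below; the flattened 64x64 input size and layer widths are assumptions for illustration.

from tensorflow.keras import layers, models

inputs = layers.Input(shape=(64 * 64,))
encoded = layers.Dense(128, activation='relu')(inputs)            # encoder: compress to a small code
decoded = layers.Dense(64 * 64, activation='sigmoid')(encoded)    # decoder: rebuild the input

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
# Training uses the input as its own target, e.g.
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)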
FLATTENING
Flattening converts the data into a 1D array for input to the next layer. We flatten the
output of the convolutional layers to create a single long feature vector, which is connected to the final
classification model, called the fully connected layer.
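The sketch below illustrates this in Keras; the 64x64 input and filter count are assumed values, chosen only to show how the stacked feature maps become one long vector.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),             # 31 x 31 x 32 feature maps at this point
    layers.Flatten(),                        # -> one vector of 31*31*32 = 30752 values
    layers.Dense(1, activation='sigmoid'),   # fully connected classification head
])
model.summary()                              # the summary shows the flattened shape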
POOLING
A pooling layer is another building block of a CNN. Its function is to progressively reduce the spatial
size of the representation in order to reduce the number of parameters and the computation in the network. The
pooling layer operates on each feature map independently; the most common approach is max pooling. It is
used in CNNs to consolidate the features learned by the convolutional layer's feature maps, and it
basically helps reduce overfitting during model training by compressing the
features in the feature map.
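A tiny numeric sketch of 2x2 max pooling is given below; the 4x4 feature map is made up for illustration.

import numpy as np
import tensorflow as tf

# Each 2x2 patch of the feature map is reduced to its maximum value,
# halving the spatial size (4x4 -> 2x2) while keeping the strongest activations.
feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 2],
                        [7, 2, 9, 1],
                        [3, 1, 4, 8]], dtype=np.float32).reshape(1, 4, 4, 1)

pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(feature_map)
print(pooled.numpy().reshape(2, 2))
# [[6. 5.]
#  [7. 9.]]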
U-NET
U-Net is a CNN that was developed for biomedical image segmentation. The network is based
on the fully convolutional network, and its architecture was modified and extended to work with
fewer training images and to yield more precise segmentations. The main idea is to supplement a
usual contracting network with successive layers in which pooling operations are replaced by
upsampling operators. One important modification in U-Net is that there are a large number of feature
channels in the upsampling part, which allow the network to propagate context information to higher
resolution layers.
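The sketch below shows the U-Net idea in miniature; the sizes and the single skip connection are illustrative assumptions, not the full architecture used here.

from tensorflow.keras import layers, models

def tiny_unet(input_shape=(128, 128, 1)):
    inputs = layers.Input(input_shape)

    # Contracting path: extract features and reduce resolution
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)

    # Expanding path: upsample and concatenate the high-resolution encoder features
    u1 = layers.UpSampling2D()(c2)
    u1 = layers.concatenate([u1, c1])              # skip connection
    c3 = layers.Conv2D(16, 3, padding='same', activation='relu')(u1)

    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c3)   # per-pixel lung/nodule mask
    return models.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer='adam', loss='binary_crossentropy')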
CHAPTER 4
SYSTEM METHODOLOGY
4.1 CT SCAN IMAGES
A computerized tomography (CT) scan uses computers and rotating X-ray machines to create
cross-sectional images of the body. These images provide more detailed information than normal X-
ray images. They can show the soft tissues, blood vessels, and bones in various parts of the body. A
CT scan may be used to visualize the head, shoulders, spine, heart, abdomen, knee, chest.
During a CT scan, you lie in a tunnel-like machine while the inside of the machine rotates and takes
a series of X-rays from different angles. These pictures are then sent to a computer, where they’re
combined to create images of slices, or cross-sections, of the body. They may also be combined to
produce a 3D image of a particular area of the body.
It’s very important to stay still while CT images are being taken because movement can result in
blurry pictures. Your doctor may also ask to hold your breath for a short period during the test to
prevent your chest from moving up and down.
4.1.3 What do CT Scan Results Mean?
CT scan results are considered normal if the radiologist didn’t see any tumors, blood clots,
fractures, or other abnormalities in the images. If any abnormalities are detected during the CT scan,
you may need further tests or treatments, depending on the type of abnormality found.
4.3 FEATURE DETECTION
Feature extraction is a general term for methods of constructing combinations of the variables to get
around the problems that arise when analyzing data with a large number of variables, while still describing
the data with sufficient accuracy. Many machine
learning practitioners believe that properly optimized feature extraction is the key to effective
model construction. The biggest advantage of Deep Learning is that we do not need to manually
extract features from the image. The network learns to extract features while training. You just feed
the image to the network (pixel values). When the input data to an algorithm is too large to be
processed and it is suspected to be redundant (e.g., the same measurement in both feet and meters,
or the repetitiveness of images presented as pixels), then it can be transformed into a reduced set of
features (also named a feature vector). Determining a subset of the initial features is called feature
selection.[2] The selected features are expected to contain the relevant information from the input
data, so that the desired task can be performed by using this reduced representation instead of the
complete initial data.
Feature extraction involves reducing the number of resources required to describe a large set of
data. When performing analysis of complex data, one of the major problems stems from the number
of variables involved. Analysis with a large number of variables generally requires a large amount
of memory and computation power; it may also cause a classification algorithm to overfit the training
samples and generalize poorly to new samples.
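As a minimal sketch of this idea, the activations of an intermediate layer of a trained network can be reused as an automatically learned feature vector; the small model below is built from scratch for illustration, so its layer names, sizes and the stand-in slices are assumptions rather than the project's actual classifier.

import numpy as np
from tensorflow.keras import layers, models

# A small stand-in CNN; in practice this would be the trained classifier from Chapter 5
cnn = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(name='features'),
    layers.Dense(1, activation='sigmoid'),
])

# Reuse everything up to the Flatten layer as an automatic feature extractor
feature_extractor = models.Model(inputs=cnn.inputs,
                                 outputs=cnn.get_layer('features').output)

slices = np.random.rand(8, 64, 64, 1)          # stand-in for 8 CT slices
features = feature_extractor.predict(slices)   # learned feature vectors
print(features.shape)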
IMAGE PROCESSING
Algorithms are used to detect features such as shapes, edges, or motion in a digital image or
video.
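For example, a simple Sobel edge filter highlights shape boundaries such as the lung wall in a CT slice; this is a minimal sketch, and 'slice.png' is a placeholder file name.

import numpy as np
from skimage import filters, io

ct_slice = io.imread('slice.png', as_gray=True)   # placeholder input image
edges = filters.sobel(ct_slice)                   # edge-strength map
io.imsave('edges.png', (edges * 255 / edges.max()).astype(np.uint8))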
CHAPTER 5
IMPLEMENTATION
5.1 CODE
Lung Cancer Convolutional Network:
from google.colab import drive
drive.mount('/content/drive')
import SimpleITK as sitk
import numpy as np
import pandas as pd
import os
import glob
from PIL import Image  # used later to save extracted nodule patches
%matplotlib inline
from IPython.display import clear_output
pd.options.mode.chained_assignment = None
annotations = pd.read_csv("/content/drive/MyDrive/annotations.csv")
candidates =pd.read_csv("/content/drive/MyDrive/candidates.csv")
annotations.head()
candidates.info()
print(len(candidates[candidates['class'] == 1]))
print(len(candidates[candidates['class'] == 0]))
import multiprocessing
num_cores = multiprocessing.cpu_count()
print(num_cores)
class CTScan(object):
    def __init__(self, filename=None, coords=None):
        self.filename = filename
        self.coords = coords
        self.ds = None
        self.image = None

    def reset_coords(self, coords):
        self.coords = coords

    def read_mhd_image(self):
        # Read the .mhd scan referenced by this candidate from Google Drive
        path = glob.glob("/content/drive/MyDrive/seg-lungs-LUNA16-20220103T180646Z-001/"
                         + self.filename + '.mhd')
        self.ds = sitk.ReadImage(path[0])
        self.image = sitk.GetArrayFromImage(self.ds)
    def get_resolution(self):
        return self.ds.GetSpacing()

    def get_origin(self):
        return self.ds.GetOrigin()

    def get_ds(self):
        return self.ds

    def get_voxel_coords(self):
        # Convert the world (mm) coordinates into voxel indices using origin and spacing
        origin = self.get_origin()
        resolution = self.get_resolution()
        voxel_coords = [np.absolute(self.coords[j] - origin[j]) / resolution[j]
                        for j in range(len(self.coords))]
        return tuple(voxel_coords)

    def get_image(self):
        return self.image

    def get_subimage(self, width):
        # Crop a width x width patch around the candidate location on its axial slice
        self.read_mhd_image()
        x, y, z = self.get_voxel_coords()
        x, y, z = int(x), int(y), int(z)
        subImage = self.image[z, y - width // 2:y + width // 2, x - width // 2:x + width // 2]
        return subImage

    def normalizePlanes(self, npzarray):
        # Clip Hounsfield units to [-1000, 400] and rescale to the [0, 1] range
        maxHU = 400.
        minHU = -1000.
        npzarray = (npzarray - minHU) / (maxHU - minHU)
        npzarray[npzarray > 1] = 1.
        npzarray[npzarray < 0] = 0.
        return npzarray

    def save_image(self, filename, width):
        # Save the normalized patch as an 8-bit grayscale image
        image = self.get_subimage(width)
        image = self.normalizePlanes(image)
        Image.fromarray(image * 255).convert('L').save(filename)
positives = candidates[candidates['class']==1].index
negatives = candidates[candidates['class']==0].index
scan = CTScan(np.asarray(candidates.iloc[negatives[600]])[0], \
np.asarray(candidates.iloc[negatives[600]])[1:-1])
scan.read_mhd_image()
x, y, z = scan.get_voxel_coords()
image = scan.get_image()
dx, dy, dz = scan.get_resolution()
x0, y0, z0 = scan.get_origin()
filename = '1.3.6.1.4.1.14519.5.2.1.6279.6001.100398138793540579077826395208'
coords = (70.19, -140.93, 877.68)#[877.68, -140.93, 70.19]
scan = CTScan(filename, coords)
scan.read_mhd_image()
x, y, z = scan.get_voxel_coords()
image = scan.get_image()
dx, dy, dz = scan.get_resolution()
x0, y0, z0 = scan.get_origin()
positives
np.random.seed(42)
negIndexes =np.random.choice(negatives,len(positives)*5,replace=False)
candidatesDf = candidates.iloc[list(positives)+list(negIndexes)]
from sklearn.model_selection import train_test_split
X = candidatesDf.iloc[:,:-1]
y = candidatesDf.iloc[:,-1]
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state = 42)
X_train.size
y_train
y_test
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.20, random_state=42)
X_train.size
X_train
y_train
len(X_train)
X_train.to_pickle('/content/drive/My Drive/preprocessed_data/traindata')
X_test.to_pickle('/content/drive/My Drive/preprocessed_data/testdata')
X_val.to_pickle('/content/drive/My Drive/preprocessed_data/valdata')
def normalizePlanes(npzarray):
    maxHU = 400.
    minHU = -1000.
    npzarray = (npzarray - minHU) / (maxHU - minHU)
    npzarray[npzarray > 1] = 1.
    npzarray[npzarray < 0] = 0.
    return npzarray
print('number of positive cases are ' + str(y_train.sum()))
print('total set size is ' + str(len(y_train)))
print('percentage of positive cases are'+str(y_train.sum()*1.0/len(y_train)))
# Duplicate the positive (nodule) rows twice with shifted indices to reduce class imbalance
tempDf = X_train[y_train == 1]
tempDf = tempDf.set_index(X_train[y_train == 1].index + 1000000)
X_train_new = X_train.append(tempDf)
tempDf = tempDf.set_index(X_train[y_train == 1].index + 2000000)
X_train_new = X_train_new.append(tempDf)
5.2 LUNG_CANCER_2
%matplotlib inline
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import pydicom as dicom
import os
import scipy.ndimage
import matplotlib.pyplot as plt
from skimage import measure, morphology, segmentation
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import scipy.ndimage as ndimage
from google.colab import drive
drive.mount('/content/drive')
# Some constants
INPUT_FOLDER = '/content/drive/My Drive/lungcancer/stage1_train'
patients = os.listdir(INPUT_FOLDER)
patients.sort()
print(len(patients))
#patients.remove('.DS_Store')
# Load the scans in given folder path
def load_scan(path):
    # Read every DICOM slice in the folder, skipping macOS metadata files
    ds = []
    for s in os.listdir(path):
        if s != '.DS_Store':
            ds.append(s)
    slices = [dicom.read_file(path + '/' + s, force=True) for s in ds]
    slices.sort(key=lambda x: int(x.InstanceNumber))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except:
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices
def get_pixels_hu(scans):
    image = np.stack([s.pixel_array for s in scans])
    # Convert to int16 (from sometimes int16),
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)
    # Convert the raw pixel values to Hounsfield units using the DICOM rescale tags
    intercept = scans[0].RescaleIntercept
    slope = scans[0].RescaleSlope
    if slope != 1:
        image = slope * image.astype(np.float64)
        image = image.astype(np.int16)
    image += np.int16(intercept)
    return np.array(image, dtype=np.int16)

def plot_3d(p, threshold=400):
    # 3D rendering of the resampled lung volume (mesh-building body omitted here)
    ...
    ax.set_xlim(0, p.shape[0])
    ax.set_ylim(0, p.shape[1])
    ax.set_zlim(0, p.shape[2])
    return plt.show()

plot_3d(pix_resampled, 400)
def largest_label_volume(im, bg=-1):
    # Return the label (excluding the background) that occupies the most voxels
    vals, counts = np.unique(im, return_counts=True)
    counts = counts[vals != bg]
    vals = vals[vals != bg]
    if len(counts) > 0:
        return vals[np.argmax(counts)]
    return None

# Pick the pixel in the very corner to determine which label is air.
# Improvement: Pick multiple background labels from around the patient
# More resistant to "trays" on which the patient lays cutting the air
# around the person in half
background_label = labels[0, 0, 0]
Watershed Algorithm
#Watershed algorithm
watershed = segmentation.watershed(sobel_gradient, marker_watershed)
blackhat_struct = ndimage.iterate_structure(blackhat_struct, 8)
#Perform the Black-Hat
outline += ndimage.black_tophat(outline, structure=blackhat_struct)
#Use the internal marker and the Outline that was just created to generate the lungfilter
lungfilter = np.bitwise_or(marker_internal, outline)
#Close holes in the lungfilter
#fill_holes is not used here, since in some slices the heart would be reincluded by accident
lungfilter = ndimage.morphology.binary_closing(lungfilter, structure=np.ones((7,7)), iterations=3)
#Apply the lungfilter (note the filtered areas being assigned -2000 HU)
segmented = np.where(lungfilter == 1, image, -2000*np.ones((512, 512)))
#### nodule
lung_nodule_1 = np.bitwise_or(marker_internal, image)
lung_nodule = np.where(lungfilter == 1, lung_nodule_1, np.zeros((512, 512)))
#Some Testcode:
(test_segmented, lung_nodule, test_lungfilter, test_outline, test_watershed, test_sobel_gradient,
 test_marker_internal, test_marker_external, test_marker_watershed) = seperate_lungs(test_patient_images[100])
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (512, 512, 1), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding Convolution
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dropout(0.5))
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.summary()
import keras as k
import time
NAME = "test_1-{}".format(int(time.time()))
callbacks = [
# k.callbacks.EarlyStopping(patience=3, monitor='val_loss'),
k.callbacks.TensorBoard(log_dir='logs\{}'.format(NAME)),
k.callbacks.ModelCheckpoint('test_model_1.h5', save_best_only=True)]
hist_1 = classifier.fit(aug_train.flow(trainX, trainY, batch_size=32),
                        steps_per_epoch=100, epochs=50, verbose=1,
                        validation_data=(testX, testY), callbacks=callbacks)
aug_train.fit(trainX)
classifier_2 = Sequential()
# Step 1 - Convolution
classifier_2.add(Conv2D(32, (3, 3), input_shape = (512, 512, 1), activation = 'relu'))
# Step 2 - Pooling
classifier_2.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier_2.add(Flatten())
data_preview.shape
test_image = data_preview[50]
test_image.shape
test_image.dtype
plt.imshow(test_image)
plt.imshow(test_image, cmap = 'gray')
import skimage
image = skimage.color.gray2rgb(test_image)
plt.imshow(image)
image.shape
import cv2
import numpy as np
img = np.array(test_image, dtype=np.uint8)
color_img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
plt.imshow(color_img)
color_img.shape
plt.imshow(data_preview[50], cmap='Accent')
labels[50]
plt.imshow(data_preview[100], cmap='Accent')
CHAPTER 6
EXPERIMENTAL RESULTS
Lung Cancer Convolution Outputs
Lung_Cancer2 Output
CHAPTER 7
TESTING
7.1 SOFTWARE TESTING
Testing: Testing is a manner of executing software with the purpose of locating errors. To
make our software perform properly it must be error free. If testing is carried out
efficiently, it will eliminate all of the mistakes from the software.
Integration Testing
Alpha Testing
Beta Testing
Alpha Testing
A type of testing of a software product or system performed at the developer's site. Usually, it is
carried out by the end users.
Beta Testing
Final testing before releasing the application for commercial purposes. It is usually carried out
by end users or others.
Performance Testing
Testing carried out to assess the compliance of a system or component with specified
performance requirements. It is commonly carried out by the performance
engineer.
CHAPTER 8
CONCLUSION AND FUTURE SCOPE
8.1 CONCLUSION
We evaluated different approaches that used various machine learning techniques. Within the
approaches described above, we explored three different methods to handle the labels: averaging
nodule encodings per patient, labeling nodules with the same label given to the patient, or, finally,
to not do nodule detection at all and opt instead for a 3D model on the raw images. Thus, we have
concluded from the explored approaches that the best one is the first approach, with the most
efficiency and the fewest drawbacks.
the model and minimizing the losses effectively. We can move ahead with the deployment
process, making this software ground-breaking for the medical industry and enhancing the
health care system, thus causing a massively beneficial effect on the lives of many individuals.
REFERENCES
[1] D Shiloh, Elizabeth, H. Khanna, C. Sunil A novel segmentation approach for improving diagnostic
accuracy of CAD systems for detecting lung cancer from chest computed tomography images,
ACM.2012.
[2] Jong won Kim, Hojun Lee, Taeson Yoon. Automated Diagnosis of Lung Cancer with the Use of
Deep Convolutional Neural Networks on Chest CT ACM.2017.
[3] Nadas El-Askary, Mohammed AM. Saleem, Mohammed I. Roushdy Feature Extraction and Analysis
for Lung Nodule Classification using Random Forest. ACM.2019.
[4] Xinyan li, Shaeting Feng, Daru pan. Enhanced lung segmentation in chest CT images based on kernel
graph cuts, ACM.2016.
[5] Shuang feng Dai, Ke Lu a , Jiayang Dong, Yifei Zhang , Yong Chen. A novel approach of lung
segmentation on chest CT images using graph cuts. ACM.2015.
[6] Yuan Huang, Fugen Zhu Lung Segmentation Using a Fully Convolutional Neural Network with
Weekly Supervision, ACM.2018
[7] Zhengwe Hui, Ajim Muhammad, Ming Zhu. Pulmonary Nodule Detection in CT Images via Deep
Neural Network: Nodule Candidate Detection. ACM.2018
[8] Paulo Cezar P. Carvalho, Aristfanes c silva, Marcelo Gattas, Diagnosis of Lung Nodule Using Gini
Coefficient and Skeletonization in Computerized Tomography Images, ACM.2004
[9] Maxine Tan, Rudi Deckles, Jan P Cornelis, Bart Jarens. Analysis of a Feature- Deselective
Neuroevolutionary Classifier (FD-NEAT) in a Computer-Aided Lung Nodule Detection System for
CT Images. ACM.2012.
[10] Ying Su , Dan Li , Xiaodong Chen , Lung Nodule Detection based on Faster R-CNN Framework,
Computer Methods and Programs in Biomedicine (2020).
[11] Lakshmanaprabu S.K., et al., Optimal deep learning model for classification of lung cancer on CT
images, Future Generation Computer Systems (2018).
[12] M. Attique Khan, S. Rubab, Asifa Kashif, Muhammad Imran Sharif, Nazeer Muhammad, Jamal
Hussain Shah, Yu-Dong Zhang, Suresh Chandra Satapathy. Lungs cancer classification
from CT images: An integrated design of contrast based classical features fusion and selection. (2019)
[13] K. Mya, M. Tun, and A. S. Khaing, “Feature Extraction and Classification of Lung Cancer Nodule
using Image Processing Technique,” Int. J. Eng. Res. Technol., vol. 3, no. 3, pp. 2204–2211, 2014.
[14] Wariya Chintanapakdee, Dexter P. Mendoza, Eric W. Zhang, Ariel Botwin, Matthew D. Gilman,
Justin F. Gainor, Jo-Anne O. Shepard, Subba R. Digumarthy. Detection of Extrapulmonary Malignancy
During Lung Cancer Screening: 5-Year Analysis at a Tertiary Hospital. (2020)
[15] Marjolein A Heuvelmans, Matthijs Oudkerk Deep learning to stratify lung nodules on annual
follow-up CT.(2019)
[16] Marjolein A. Heuvelmans, Peter M.A. van Ooijen, Sarim Ather, Carlos Francisco Silva, Daiwei Han,
Claus Peter Heussel, William Hickes, Hans-Ulrich Kauczor, Petr Novotny, Heiko Peschl, Mieneke Rook,
Roman Rubtsov, Oyunbileg von Stackelberg, Maria T. Tsakok, Carlos Arteta, Jerome Declerck, Timor Kadir,
Lyndsey Pickup, Fergus Gleeson, Matthijs Oudkerk. (2021)
[17] Anna Meldo, Lev Utkin, Maxim Kovalev, Ernest Kasimov. The natural language
explanation algorithms for the lung cancer computer-aided diagnosis system. (2020)
[18] Lei Fan, Zhaoqiang Xia, Xiaobiao Zhang, Xiaoyi Feng. Lung Nodule Detection Based on 3D
Convolutional Neural Networks LCD. (2019).
[19] Goran Jakimovski Danco Davcev Lung cancer medical image recognition using Deep Neural
Networks LCD. (2018).
[20] Brahim Ait Skourt, Nikola S. Nikolov. Feature-Extraction Methods for Lung-Nodule Detection:
A Comparative Deep Learning Study. IEEE. (2019).
[21] Ruchita tekade, Rajeshwari K. Lung Cancer detection and Classification using Deep Learning 2018
Fourth International Conference on Computing Communication Control and Automation
(ICCUBEA)
[22] Bohdon Chapaliuk, Yuriy Zaychenko. Deep learning approach in a computer-aided detection system
for lung cancer. IEEE. (2018)
[23] Wadood Abdul, An Automatic Lung Cancer Detection and Classification (ALCDC) System Using
Convolutional Neural Network 2020 IEEE 13th International Conference on Developments in
eSystems Engineering (DeSE)
[24] Hongyang Jiang, He Ma, Wei Qian, Mengdi Gao and Yan Li. An Automatic Detection System of
Lung Nodule Based on Multi-Group Patch-Based Deep Learning Network.
DOI: 10.1109/JBHI.2017.2725903, IEEE Journal of Biomedical and Health Informatics
[25] Onur Ozdemir, Rebecca L. Russell, Andrew A. Berlin. A 3D Probabilistic Deep Learning System for
Detection and Diagnosis of Lung Cancer Using Low-Dose CT Scans. IEEE Transactions on Medical
Imaging, vol. 39, no. 5, May 2020.