D15 Final
Lung cancer is one of the leading causes of death worldwide,
with a mortality of approximately five million cases annually, which is
higher than that of breast cancer and prostate cancer combined. However, early detection
and diagnosis can improve the survival rate. The main contribution of this research is the
detection and classification of different kinds of lung cancer. Machine learning
techniques, namely support vector machine (SVM) and K-Nearest Neighbors (KNN),
were evaluated on a chest CT scan image dataset for texture feature classification. The
proposed technique achieves accuracies of 96% and 94% for
support vector machine and K-Nearest Neighbors, respectively, which is better than
state-of-the-art techniques.
INTRODUCTION
Lung cancer, also known as lung carcinoma, is a malignant tumor characterized
by uncontrolled growth of cells in the tissues of the lung. It must be treated
early to keep it from spreading by metastasis to other parts of the body.
Long-term tobacco smoking is the primary factor in about 85% of lung cancers.
About 10–15% of cases occur in people who have never smoked; these are often
attributed to air pollution, secondhand smoke, asbestos, and radon gas.
Lung cancer is the leading cause of cancer-related death among men.
Hence, it is essential to develop a new, robust method to diagnose lung
cancer at an earlier stage.
For the present study, a dataset of CT image samples and the KNN and
SVM algorithms have been taken for analysis.
OBJECTIVES
The main aim of this research is the detection and classification of
lung cancer using machine learning techniques.
To improve performance in terms of accuracy.
To solve complex classification problems using a support vector machine.
To make classification and feature extraction more accurate.
EXISTING SYSTEM
Different diagnostic techniques are used for lung cancer
detection, such as MRI (magnetic resonance imaging), CT (computed
tomography), and PET (positron emission tomography).
The existing lung cancer detection and classification systems
are based on hand-engineered techniques, and their accuracy and
other performance measures are limited.
Manual interpretation of lung cancer CT images is
time-consuming and requires great care.
PROPOSED SYSTEM
The proposed system detects and classifies lung cancer by using SVM-
and KNN-based classification of lung images into cancerous and
non-cancerous.
The main objective of this work is to detect cancerous lung nodules
in the given input lung image and to classify the cancer images, and
their severity, quickly and effectively.
The proposed technique achieves accuracies of 97% and 95% for
support vector machine and K-Nearest Neighbors, respectively,
which is better than state-of-the-art techniques.
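The cancerous / non-cancerous decision described above can be illustrated with a small linear SVM. This is only a sketch under stated assumptions: the slides do not say which kernel or solver was used, so a linear kernel trained by subgradient descent on the hinge loss stands in here, and the function names, hyperparameters, and the four toy feature vectors are hypothetical (label +1 = cancerous, -1 = non-cancerous).

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lam: float = 0.01,   # assumed regularization strength
    lr: float = 0.1,     # assumed learning rate
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is misclassified or inside the margin: hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (cancerous) or -1 (non-cancerous) from the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, already-standardized feature vectors (hypothetical, not from the dataset).
features = [[2.0, 1.0], [-2.0, -1.0], [1.0, 2.0], [-1.0, -2.0]]
labels = [1, -1, 1, -1]

w, b = train_linear_svm(features, labels)
print([predict(w, b, x) for x in features])
```

In practice a library implementation with a tuned kernel would likely replace this loop; the sketch only shows how the margin condition drives the weight updates.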
Software Requirements
Operating system: Windows 7 or above
Coding language: Python
Modules required: numpy, pandas, flask
Hardware Requirements
Processor: i3 or above
Hard disk: 16 GB available hard disk space (32-bit) or 20 GB (64-bit)
DATASET
DATA PREPARATION
The data used for this research are secondary data obtained
from the publicly available Lung Image Database Consortium and
Image Database Resource Initiative (LIDC-IDRI) collection [21].
It contains diagnostic and lung cancer screening thoracic
computed tomography scans with marked-up annotated lesions.
This dataset is considered the benchmark for lung cancer
research and data science challenges in the machine learning
community.
To create this dataset, seven academic centers and eight
medical imaging companies collaborated.
For our purposes, the dataset and labels were taken from the
initial phase of the process, to ensure that the algorithm
captures the full range of cancer cases from patients.
Additionally, due to a lack of resources, only snapshots of the CT
scans were acquired.
In other words, the classifier learned from individual images rather than from
full CT scans. In cancer cases, the snapshot was deliberately chosen
from the layer containing the nodule.
METHODOLOGY
Figure: methodology diagram with stages labeled SVM kernel selection and classification.
This research was conducted in an exploratory manner. Because the
major goal of this thesis is to classify lung cancer, a dataset was
created using secondary data from the Cancer Imaging Archive,
which included snapshots of CT scans of patients with and without
cancer.
Before training, several preprocessing procedures were applied to all
the images in order to enhance the features in the snapshots.
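The slides do not name the preprocessing procedures, so the following is only an illustrative stand-in: a linear contrast stretch, one common way to make structures in a grayscale snapshot easier to distinguish. The function name and the toy 2x2 image are hypothetical.

```python
from typing import List


def contrast_stretch(image: List[List[float]], out_max: int = 255) -> List[List[int]]:
    """Linearly rescale intensities so the darkest pixel maps to 0 and the brightest to out_max."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return all zeros.
        return [[0 for _ in row] for row in image]
    return [[round((p - lo) * out_max / (hi - lo)) for p in row] for row in image]


# Toy 2x2 grayscale "snapshot" (assumed values, not real CT data).
snapshot = [[100, 110], [120, 150]]
print(contrast_stretch(snapshot))  # [[0, 51], [102, 255]]
```

Other steps such as denoising or resizing could be chained in the same way before features are extracted.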
I used deep learning technologies for training, namely
convolutional neural networks for classification, and K-Nearest
Neighbors and Support Vector Machine for detection.
Figure: Chest CT scan images of lung cancer subtypes: (a) adenocarcinoma,
(b) large cell carcinoma, (c) squamous cell carcinoma, and (d) normal.
CLASSIFICATION
After selecting and extracting texture features, a feature vector was
created. SVM and KNN are used to classify the feature vector. The
fundamental goal of SVM is to find the hyperplane that separates the
classes with the largest margin, whether the feature space is low- or
high-dimensional.
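The texture feature vector that feeds the classifiers could be built, for example, from a gray-level co-occurrence matrix (GLCM). The slides do not state which texture descriptors were used, so the GLCM, the three statistics chosen, and the function names below are assumptions made for illustration. Pixel values are assumed to be already quantized to integers in the range 0 to levels - 1.

```python
from typing import List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Return the normalized, symmetric co-occurrence matrix for pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p: List[List[float]]) -> List[float]:
    """Build a [contrast, angular second moment, homogeneity] vector from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    asm = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return [contrast, asm, homogeneity]


# Two toy 2x2 images with gray levels {0, 1}.
smooth = [[0, 0], [1, 1]]   # horizontal neighbors are identical
striped = [[0, 1], [0, 1]]  # horizontal neighbors always differ
print(texture_features(glcm(smooth, levels=2)))   # [0.0, 0.5, 1.0]
print(texture_features(glcm(striped, levels=2)))  # [1.0, 0.5, 0.5]
```

The two toy images share the same intensity histogram but yield different contrast and homogeneity values, which is the kind of difference a texture-based classifier relies on.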
By selecting an appropriate number of neighbors (k), the K-Nearest
Neighbor (KNN) classifier was tuned to identify and classify lung
cancer more effectively.
The Euclidean distance is used to measure how far a query sample is from
each training sample, and the KNN classifier assigns the class by
majority vote among the k nearest neighbors.
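The distance-and-voting rule can be sketched in a few lines. This is a minimal illustration rather than the project's implementation: the function names, the value k = 3, and the toy two-dimensional feature vectors are all assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    train_x: List[Sequence[float]],
    train_y: List[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Label the query by majority vote among its k nearest training samples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(zip(train_x, train_y), key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy texture feature vectors (hypothetical values).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
train_y = ["normal", "normal", "cancerous", "cancerous", "cancerous"]

print(knn_predict(train_x, train_y, [0.15, 0.15]))  # normal
print(knn_predict(train_x, train_y, [0.9, 0.9]))    # cancerous
```

With two classes, an odd k avoids tied votes; for more classes a tie-breaking rule would need to be chosen explicitly.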
IMPLEMENTATION
Detection
Classification
CONCLUSION