Lung Cancer Detection Using Ensemble Techniques
Abstract:- This paper implements a system for enhancing the detection of lung cancer through an ensemble approach that combines the predictive outputs of three distinct convolutional neural networks (CNNs): ResNet50, EfficientNet, and InceptionNet. By drawing on the different architectural features and learning capabilities of these CNNs, the ensemble method fuses their individual predictions to achieve higher accuracy and robustness in identifying potential lung cancer manifestations.

Keywords:- Lung Cancer Detection; CNN; Ensemble Techniques; ResNet50; VGG16; InceptionNet.

I. INTRODUCTION

This paper introduces a methodology to enhance lung cancer detection by integrating predictions from the ResNet50, EfficientNet, and InceptionNet convolutional neural networks. The ensemble approach averages the outputs of these models, aiming for higher accuracy and robustness in identifying potential lung cancer manifestations. In evaluation, the proposed ensemble method reached an accuracy of 90.2%, offering a promising avenue for advancing clinical diagnosis and patient outcomes in health management. A system is also proposed to streamline the work of organizations, researchers, and medical professionals through automated processes. This system entails the development of application programming interfaces (APIs) that allow seamless interaction with the model and the databases. Its core functionality is the classification of CT scans in large-scale batches, followed by systematic storage of the processed data in the database infrastructure.

Goal and Objectives

Implementing a model to classify CT-scan images of lungs as cancerous or non-cancerous.
Delivering the model to the end user in a cost-effective and quick way.
Diagnosing patients and detecting early signs of lung cancer to encourage early intervention.

II. MODEL TRAINING

The data collection process started with gathering the CT-scan images of lungs, which formed the most critical input for training the models. The images came from the Chest CT-Scan Images Dataset, a well-established Kaggle dataset containing more than 1,400 medical images. To keep the data organized, the dataset was systematically divided into three distinct directories: train, test, and validation. The train directory held 70% of the images, whereas the test and validation directories held 20% and 10%, respectively, so that the models could later be evaluated robustly.

After the data collection was complete, a detailed data cleaning phase was performed. In this phase, every image was assigned to one of four classes: normal, adenocarcinoma, large cell carcinoma, and squamous cell carcinoma. This categorization was essential for the subsequent stages of training and validation. In addition, every inappropriate or unusable image was identified and removed from the dataset to preserve its quality.

With a clean and well-organized dataset in place, the next step was data preprocessing. This phase involved a series of transformations to prepare the images for model training. First, the images were relabeled and indexed according to their respective classes to facilitate efficient data handling. They were then rescaled and resized to a standard dimension of 224 by 224 pixels, ensuring uniformity across the dataset. To further improve model generalization and robustness, several data augmentation techniques were applied, including horizontal flipping, contrast adjustment, and grayscale conversion.

Following data preprocessing, the focus shifted to model loading and initialization. The pre-trained CNNs ResNet50, EfficientNet, and InceptionNet were selected for their well-established architectures and strong performance in image classification tasks. These models were loaded along with their pre-trained weights, allowing them to reuse the knowledge gained from extensive training on large-scale image datasets.

The subsequent training phase involved feeding the preprocessed images into the input layers of the CNNs. The pre-trained weights were kept non-trainable (frozen), so the features learned during pre-training were reused to extract meaningful features from the input data while the models were fitted over multiple epochs. A total of 15 epochs was chosen to balance model convergence against computational cost.

Upon completion of individual model training, an ensemble method was employed to combine the predictive outputs of the three CNNs. This ensemble model draws on the differing perspectives of multiple CNN architectures, leading to improved prediction accuracy and robustness.
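The averaging of model outputs mentioned in the Introduction can be illustrated with a small sketch. It assumes that each of the three CNNs returns a probability vector over the four classes and that the ensemble takes the unweighted mean (soft voting). The class order, the function names `average_ensemble` and `predict_label`, and the example probabilities are hypothetical and chosen only for illustration; they are not taken from the paper's implementation.

```python
from typing import List, Sequence

# Assumed class order for this sketch; the paper names the classes but not their index order.
CLASSES = ["normal", "adenocarcinoma", "large cell carcinoma", "squamous cell carcinoma"]


def average_ensemble(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of the class-probability vectors (soft voting)."""
    if not per_model_probs:
        raise ValueError("at least one model output is required")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must predict the same number of classes")
    n_models = len(per_model_probs)
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict_label(per_model_probs: Sequence[Sequence[float]]) -> str:
    """Pick the class with the highest averaged probability."""
    avg = average_ensemble(per_model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best]


# Made-up outputs for one CT-scan image (illustrative values only).
resnet_probs = [0.10, 0.60, 0.20, 0.10]
efficientnet_probs = [0.05, 0.40, 0.45, 0.10]
inception_probs = [0.10, 0.50, 0.30, 0.10]

ensemble_probs = average_ensemble([resnet_probs, efficientnet_probs, inception_probs])
ensemble_label = predict_label([resnet_probs, efficientnet_probs, inception_probs])
```

In this made-up example, EfficientNet alone would favor large cell carcinoma, but the averaged vector favors adenocarcinoma, which shows how the ensemble can override a single model's prediction.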
Model evaluation was conducted using the test data. Precision, recall, and F1 score were calculated, and the rates of type 1 and type 2 errors were noted.
The model was saved to disk. APIs were created to interact with the model and the database.
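The evaluation metrics named above can be computed from the counts of true and false positives and negatives. The sketch below assumes a binary framing (1 = cancerous, 0 = non-cancerous), in line with the stated goal of classifying scans as cancerous or non-cancerous; collapsing the four classes in this way is an assumption, and the function name `binary_metrics` and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1, and type 1 / type 2 error rates for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # type 1 error: healthy scan flagged as cancerous
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # type 2 error: cancerous scan missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "type1_rate": fp / (fp + tn) if (fp + tn) else 0.0,  # false positive rate
        "type2_rate": fn / (fn + tp) if (fn + tp) else 0.0,  # false negative rate
    }


# Illustrative labels only, not results from the paper.
sample_true = [1, 1, 1, 1, 0, 0, 0, 0]
sample_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sample_metrics = binary_metrics(sample_true, sample_pred)
```

For a screening task such as this one, the type 2 rate (missed cancers) is usually the more costly error, which is one reason to report it separately rather than relying on accuracy alone.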
Fig 1: Model
III. RESULTS
V. CONCLUSION