
diagnostics

Article
A CAD System for Lung Cancer Detection Using Hybrid Deep
Learning Techniques
Ahmed A. Alsheikhy 1, * , Yahia Said 1 , Tawfeeq Shawly 2 , A. Khuzaim Alzahrani 3 and Husam Lahza 4

1 Department of Electrical Engineering, College of Engineering, Northern Border University, Arar 91431, Saudi Arabia; [email protected]
2 Department of Electrical Engineering, Faculty of Engineering at Rabigh, King Abdulaziz University,
Jeddah 21589, Saudi Arabia; [email protected]
3 Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences,
Northern Border University, Arar 91431, Saudi Arabia; [email protected]
4 Department of Information Technology, College of Computing and Information Technology,
King Abdulaziz University, Jeddah 21589, Saudi Arabia; [email protected]
* Correspondence: [email protected]

Abstract: Lung cancer starts and spreads in the tissues of the lungs, more specifically, in the tissue that forms air passages. This cancer is reported as the leading cause of cancer deaths worldwide. In addition to being the most fatal, it is the most common type of cancer. Nearly 47,000 patients are diagnosed with it annually worldwide. This article proposes a fully automated and practical system to identify and classify lung cancer. This system aims to detect cancer in its early stage to save lives if possible or reduce the death rates. It involves a deep convolutional neural network (DCNN) technique, VGG-19, and another deep learning technique, long short-term memory networks (LSTMs). Both tools detect and classify lung cancers after being customized and integrated. Furthermore, image segmentation techniques are applied. This system is a type of computer-aided diagnosis (CAD). After several experiments on MATLAB were conducted, the results show that this system achieves more than 98.8% accuracy when using both tools together. Various schemes were developed to evaluate the considered disease. Three lung cancer datasets, downloaded from the Kaggle website and the LUNA16 grand challenge, were used to train the algorithm, test it, and prove its correctness. Lastly, a comparative evaluation between the proposed approach and some works from the literature is presented. This evaluation focuses on four performance metrics: accuracy, recall, precision, and F-score. This system achieved an average of 99.42% accuracy and 99.76, 99.88, and 99.82% for recall, precision, and F-score, respectively, when VGG-19 was combined with LSTMs. In addition, the results of the comparison evaluation show that the proposed algorithm outperforms other methods and produces exquisite findings. This study concludes that this model can be deployed to aid and support physicians in diagnosing lung cancer correctly and accurately. This research reveals that the presented method has functionality, competence, and value among other implemented models.

Keywords: lung cancer; classification; diagnosis; DCNN; VGG-19; LSTMs; medical informatics; artificial intelligence; CAD

Citation: Alsheikhy, A.A.; Said, Y.; Shawly, T.; Alzahrani, A.K.; Lahza, H. A CAD System for Lung Cancer Detection Using Hybrid Deep Learning Techniques. Diagnostics 2023, 13, 1174. https://doi.org/10.3390/diagnostics13061174

Academic Editor: Chiara Martini

Received: 27 January 2023; Revised: 9 March 2023; Accepted: 17 March 2023; Published: 19 March 2023

1. Introduction

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Diagnostics 2023, 13, 1174. https://doi.org/10.3390/diagnostics13061174 https://www.mdpi.com/journal/diagnostics

Artificial intelligence (AI) has been used in various applications, such as in the educational, industrial, economic, and medical fields. In the medical field, AI can detect and predict diseases. Recently, researchers have turned their attention to using AI in genomics to save lives and provide solutions for numerous diseases.

According to the World Health Organization (WHO) and [1], lung cancer has been reported as a widespread global disease that occurs due to the uncontrolled growth of tissues [1,2]. There has been a significant increase in the death rate from lung cancer [1]. In 2020, the International Association of Cancer Society (IACS) reported that 235,760 patients were diagnosed with lung cancer [1]. In the same year, around 132,000 deaths were announced by the IACS [1].
Lungs are spongy organs in the body. They are responsible for intaking oxygen from inhalation and expending carbon dioxide through exhalation [1–3]. There are numerous signs and symptoms of lung cancer. These symptoms include, but are not limited to, persistent coughing, coughing blood, shortness of breath, wheezing, and pain in the bones and chest [1,2]. These symptoms typically appear when the disease is at an advanced stage as opposed to an early stage [1,3–7]. Several factors can lead to a patient's increased risk of lung cancer, such as prolonged smoking, exposure to secondhand smoke [2], exposure to radiation, and a family history of lung cancer [1,3,6,7].

The disease is diagnosed using CT scans, sputum cytology, or biopsy, which involves taking out a sample of infected lung tissue. Treatment plans are chosen based on several considerations, such as the patient's overall health, disease stage, and preference. A physician's recommendation is often a patient's best course of action. Typically, a treatment plan includes surgery, chemotherapy, and radiation [8].

In general, two types of lung cancer exist, and these types are as follows [1,8–10]:
1. Non-small cell lung cancer (NSCLC) is the most common variant, which grows and spreads slowly;
2. Small cell lung cancer (SCLC) is caused by smoking and spreads faster than NSCLC.

Adenocarcinoma, large cell carcinoma (LCC), and squamous cell carcinoma (SCC) are identified as the subtypes of NSCLC [11–16]. At the same time, small cell carcinoma and combined small cell carcinoma are classified as the subtypes of SCLC. Figure 1 depicts the types of lung cancer. In this research, NSCLC is considered and studied.

Figure 1. Types of lung cancer.

In a recently published paper on lung cancer detection, the authors utilized circles inside the lungs to detect cancer. The circles were indicators of cancerous lungs. Herein, these circles are identified and classified using the proposed system.

1.1. Research Problem and Motivations
This article builds on Saudi Arabia's Vision 2030 to improve the quality of life of individuals and society by developing a reliable approach to identifying lung cancer and providing
accurate diagnoses that would save lives. The system offers healthcare providers an avenue
for the comprehensive, integrated, and effective detection of lung cancer and the appropriate treatment administration to patients. Various studies have been conducted to identify
and categorize the disease using numerous algorithms. The highest achieved accuracy was
99%, as in [3], while the F-score was 99.19%. Therefore, this study aims to achieve higher
accuracy and F-score results.
The motivations of this research stem from Saudi Arabia's promising Vision 2030. In 2016, the Saudi government announced and initiated this vision to start a new era, one that has become the dream of every citizen and resident of Saudi Arabia. The objectives of this vision are:
i. Enhancing the economy of the country;
ii. Providing digital services and transforming public services into a digital world;
iii. Increasing safety and security;
iv. Providing a better life and improving the quality of life.
This vision involves 12 programs and initiatives, one of which is called the Quality-
of-Life Program. It aims to create a better environment for Saudi citizens and residents
by providing the necessary and possible lifestyle options. These options will support the
government by engaging all people in the culture, sports, and activities inside the country.
This engagement will increase social life, create more jobs, and improve the economy. In addition, adopting new digital health solutions is critical to enhancing the quality of life. The authors of this research want to take part in this promising vision by proposing and developing a new approach that can raise the current level of lung cancer diagnosis, identification, and analysis in Saudi Arabia.

1.2. Research Contributions


In this research, the contribution is achieved by proposing an automated, intelligent
system to spot, identify, and classify NSCLC and its subtypes with high accuracy using two
deep learning techniques. As found in the recently published articles in this field, as in [3],
the current approaches detect and classify the tumors, and the maximum obtained accuracy was near 99%. Thus, the proposed system aims to close this gap by implementing and developing a new model for lung cancer identification and categorization with high accuracy, which was observed to be over 99.3%.
The rest of the paper is organized as follows: a literature review is presented in the
next subsection, and Section 2 details the developing approach. The results and comparison
evaluation between the developed method and other research in the literature are described
in Section 3. The discussion is provided in Section 4, and the conclusion is given in Section 5.

1.3. Related Work


Hosseini et al. [1] provided a systematic review of lung cancer using deep learning
approaches. The authors reviewed 32 conferences and journals related to lung cancer from
2016 to 2021 and combed through databases from IEEE Xplore, ScienceDirect, Springer Link,
Wiley, and Google Scholar. Numerous algorithms for lung cancer detection were studied,
evaluated, and analyzed based on their architecture, used datasets, and performance
metrics, such as accuracy and sensitivity. Readers can find more information in [1].
Sousa et al. [2] analyzed a deep learning method developed on U-Net and ResNet34
structures. This procedure was performed on four different types of cross-cohort datasets.
The authors evaluated the mean of a performance metric called the dice similarity coefficient
(DSC) and found it to be more than 0.93. Two radiation experts spotted and determined
the limitations of the developed method. The authors confirmed that there was a slight
degradation for consolidation while testing two pathologist cases. In this article, the
developed system reaches over 99.3% and 99.2% accuracy and F-scores, respectively. The
presented algorithm evaluates the disease based on four performance metrics, as stated
earlier. In addition, two deep learning tools were utilized to evaluate the system on three
datasets. Interested readers can refer to [2] for additional information.
In [3], Nazir et al. proposed an approach to optimize lung cancer diagnosis using
image fusion for segmentation purposes. This study incorporated and integrated the fusion
method, designed and developed on a Laplacian pyramid (LP) decomposition with an
adaptive sparse representation (ASR). This process worked because it fragmented the
medical CT images into different sizes. At this point, the LP method was applied to fuse
these sizes into four layers. A dataset was used to evaluate their approach using DSC as
the performance metric, and it was nearly 0.9929. The authors claimed that their method
produced better results than others recently published in the field. In contrast, the presented
model uses three datasets and two deep learning techniques to achieve 99.42% accuracy,
as the experiments show in Section 4. The algorithm reaches 99.76, 99.88, and 99.82% for
recall, precision, and the F-score, respectively. This system developed three schemes and
determined a confusion matrix for every scheme. These outcomes show that the proposed
model is better than the developed method in [3] as it achieves higher outputs. Readers are
advised to refer to [3] for additional information.
In [4], Dayma developed a manual process machine to detect lung cancer readily.
This process involved numerous CT images and a Gabor filter, and a dataset of 1800 images, of which 900 were images of kids diagnosed with lung cancer. Each image was 200 × 200 pixels, and the dataset was collected from the IMBA Home Database. Unfortunately, there was no mention of any performance metric or obtained results. In contrast,
the presented system is a CAD model, and three datasets were used to prove its flow and
results. This CAD system achieves 99.42% average accuracy, and its maximum reached
99.61%. Moreover, the reached results of the other considered performance metrics were
99.76, 99.88, and 99.82% for recall, precision, and the F-score, respectively. Two deep learning techniques were used with three schemes to achieve these results. Both deep learning models were customized, as shown in the next section.
Hasan and Al Kabir [7] developed algorithms to determine whether cancer had spread
in a patient’s lungs. These algorithms worked based on methods of image processing
and statistical learning. The algorithms were tested on a dataset from the Kaggle site
with 198 images. It achieved around 72.2% accuracy, which is significantly lower than
the accuracy of the approach examined in this article, which is 99.42%. The proposed
algorithm in this study achieves 99.76, 99.88, and 99.82% for recall, precision, and the
F-score, respectively. These outcomes are exquisite and show that this method is better
than the implemented one in [7]. Three scenarios were developed to reach acceptable
results, as shown in Section 3. Moreover, the implemented method in [7] used one dataset with 198 images, while the presented system in this study used three datasets with 1463 images for testing.
Nasser and Abu Naser [9] developed an artificial neural network (ANN) model to
detect whether a human body contained lung cancer. This process involved numerous
symptoms utilized as inputs to an ANN to diagnose the disease. Survey Lung Cancer, a
dataset, was used to train and validate the model. It had an accuracy of nearly 96.67% after
running more than 1,418,000 learning cycles. The approach took more time to reach this
level of accuracy than the proposed approach in this article, which achieves more than 99%
accuracy in fewer learning cycles. This difference demonstrates the superiority of the latter
approach. Furthermore, compared to the model in [9], the execution time of this article’s
approach is of a shorter duration. The CAD system achieved 99.42, 99.76, 99.88, and 99.82%
for accuracy, recall, precision, and the F-score, respectively, using two deep learning tools,
VGG-19 and LSTMs. These tools were customized and encapsulated, as illustrated in the
next section. Readers are advised to refer to [9] for more information.
Bhatia et al. [15] implemented an algorithm for lung cancer detection using deep
residual learning on CT-scan images. The authors used U-Net and ResNet models to extract
features and highlight potential regions that are vulnerable to cancer. Multiple classifiers, viz. XGBoost and RF, were utilized to predict cancer, combining their individual predictions. This approach achieved 84% accuracy on the LIDC-IDRI dataset. In contrast, the presented system reached
99.42, 99.76, 99.88, and 99.82% for accuracy, recall, precision, and the F-score, respectively.
This system utilized three scenarios using VGG-19 and LSTMs after performing various
modifications. The obtained outcomes are better than the developed model in [15]. In
addition, three datasets of CT scans and X-ray images were used.
Madan et al. [16] presented a technique to identify lung cancer using an ensemble CNN
approach. This algorithm used CT scans and X-ray images from two datasets, one of which contained 1623 images, and achieved an accuracy of around 93%, lower than the accuracy obtained by the proposed CAD algorithm in this article. Conclusively, the proposed method in this article achieves 99.42% accuracy, which is better than the results reached in [16]. Moreover, 99.76, 99.88, and 99.82% were achieved for recall, precision, and the F-score, respectively. VGG-19 and LSTMs were modified to be utilized in this study.

2. Materials and Methods


2.1. Problem Statement
Healthcare providers need a solution that can give accurate diagnoses and results. The
Ministry of Health in Saudi Arabia utilizes several devices to support and assist physicians
in identifying lung cancer at an early stage with higher accuracy. Some implemented lung cancer diagnosis and classification methods reached an accuracy between 98.8 and 99%, as in [2,3]. Hence, the authors intend to participate in the vision by providing a reliable solution that detects and classifies lung cancer precisely, with an accuracy higher than 99.3% and less processing time. The target processing time, also known as the execution time, is less than 3.5 s for each input.

2.2. Research Objectives


This study aims to achieve numerous goals, and these objectives are summarized
as follows:
• To explore, study, and analyze current methods in lung cancer to mark and locate
their vulnerabilities;
• To search for available lung cancer datasets, download them, and conduct an analysis;
• To implement a feasible model to identify and categorize lung cancer by incorporating
the convolutional neural network (VGG-19) and LSTMs;
• To determine the number of identified cancer cells using the developed model;
• To evaluate numerous performance parameters to assess the proposed algorithm with
other state-of-the-art approaches using four parameters: accuracy, precision, recall,
and the F-score;
• To build a confusion matrix that characterizes how the proposed model categorizes a given test dataset appropriately and accurately. The proposed CAD system generally generates three confusion matrices as three schemes are utilized.
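For illustration, a confusion matrix of the kind described in the last objective can be tallied with a short helper. This is a hypothetical sketch with made-up labels and predictions; the paper built its matrices in MATLAB, and none of the values below come from its experiments.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) label pairs into a labels x labels matrix:
    rows are true classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Illustrative class names only; the paper's classes are listed in Figure 2
labels = ["normal", "adenocarcinoma", "large cell", "squamous cell"]
actual = ["normal", "adenocarcinoma", "adenocarcinoma", "squamous cell"]
predicted = ["normal", "adenocarcinoma", "large cell", "squamous cell"]
print(confusion_matrix(actual, predicted, labels))
```

The diagonal of the resulting matrix holds the correctly classified counts; everything off the diagonal is a misclassification.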

2.3. Datasets
The utilized datasets were downloaded from the Kaggle website [17,18] and LUNA16
grand challenge [19]. These datasets were approximately 70.125 GB, with 2,351 images of
CT scans and X-ray types. These images were divided into four categories: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and normal. In total, 888 images
were reserved for training and validation purposes, while the testing dataset included
1463 images. For the training and validation dataset, each category contained 222 images.
The total number of extracted features per image was 22, such as area, diameter, texture,
and radius. Therefore, the proposed system extracted 51,722 features/characteristics for all
utilized images. Table 1 illustrates complete details about the used datasets, which include
the type of images, the type of dataset, the size, and the ground truth.
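As a quick consistency check on these counts (an illustrative snippet, not part of the paper's pipeline), the image and feature totals line up:

```python
# Dataset bookkeeping from Section 2.3; variable names are illustrative.
images_per_source = {"Chest CT scans": 1000, "LIDC-IDRI": 463, "LUNA16": 888}
total_images = sum(images_per_source.values())   # 2351 images overall

train_val = 4 * 222                # four categories x 222 images = 888
test = total_images - train_val    # leaves 1463 images for testing

features_per_image = 22            # area, diameter, texture, radius, ...
total_features = total_images * features_per_image

print(total_images, train_val, test, total_features)  # 2351 888 1463 51722
```

The 51,722 extracted features reported in the text are exactly 2351 images × 22 features per image.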

2.4. The Utilized Deep Learning Techniques (DLTs)


2.4.1. VGG-19
This tool is a convolutional neural network with 19 deep layers. It is considered
another variation of the VGG technique. Visual Geometry Group developed it; thus, it is
known as VGG. VGG-19 contains 16 convolutional layers, 3 fully connected layers, 5 max-pool layers, and 1 soft-max layer. In addition, it requires 19.6 billion FLOPs. This tool accepts
images of 224 × 224. Rectified linear unit (ReLU) activates each hidden layer to stabilize
the system and regulate its parameters. In addition, the inputs were normalized using a
batch normalization method. Moreover, the Adam optimizer was included in this tool. The
obtained final characteristics matrix was paved and flattened to be fed into the dense layers.
The kernel size was set to 3 × 3. The size of the pooling filter in every hidden layer was
2 × 2 with four strides. Due to space limitations, the architecture of VGG-19 is omitted.
Table 2 summarizes the utilized hyperparameters for this study’s three datasets.

Table 1. The utilized datasets.

Properties Chest CT Scans [17] LIDC-IDRI [18] LUNA16 [19]


Number of images 1000 463 888
Size 125 MB 4 GB 66 GB
Ground truth Yes Yes Yes
Type CT scan X-rays CT scan

Table 2. The hyperparameters of VGG-19.

Parameter Used Value


Input size (n) 224 × 224
Activation (ACT) ReLU
Kernel size (KS) 3 × 3
Pool filtering size (np) 2 × 2
Stride number (sn) 4
Padding (pad) Same
Optimizer Adam
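To make the layer-size bookkeeping implied by Table 2 concrete, a small helper can compute the spatial output size of a convolution or pooling layer. This is an illustrative Python sketch (the study itself used MATLAB), applying the standard "same"/"valid" padding formulas to the Table 2 values.

```python
import math

def output_size(n, kernel, stride, padding="same"):
    """Spatial output size of a conv/pool layer for an n x n input."""
    if padding == "same":
        # With "same" padding the output size depends only on the stride
        return math.ceil(n / stride)
    # With "valid" padding no border pixels are added
    return (n - kernel) // stride + 1

# A 224 x 224 input through a 3 x 3 convolution, stride 1, "same" padding
print(output_size(224, kernel=3, stride=1))   # 224

# A 2 x 2 pool with the stride of 4 listed in Table 2
print(output_size(224, kernel=2, stride=4))   # 56
```

With "same" padding and stride 1, the 3 × 3 convolutions preserve the 224 × 224 input size, so only the pooling layers shrink the feature maps.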

2.4.2. Long Short-Term Memory Networks (LSTMs)


Another deep learning tool encapsulated in the proposed CAD system is LSTMs. Tanh was used as the activation inside this tool. To make the customization and integration of VGG-19 and LSTMs smooth and possible, the values of the common hyperparameters remained unchanged. Table 3 lists the utilized hyperparameters of LSTMs in this study. The Adam optimizer is included in this network as well.

Table 3. LSTMs’ hyperparameters and their values.

Parameter Value
Number of cells (nc) 50
Number of units in every cell (nu) 64
Activation (ACT) Tanh
Optimizer (opt) Adam
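The parameter budget implied by Table 3 can be estimated with the standard LSTM formula. The input width per cell is not stated in the paper, so the example below assumes, purely for illustration, the 22 features per image from Section 2.3.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters in one standard LSTM layer: four gates,
    each with input weights, recurrent weights, and a bias vector."""
    return 4 * (input_dim * units + units * units + units)

# Assumption (not from the paper): each cell sees the 22 extracted features
print(lstm_param_count(input_dim=22, units=64))   # 22272
```

With 64 units per cell (Table 3), each such layer holds roughly 22 thousand trainable parameters under this assumed input width.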

2.5. The Proposed Methodology


The proposed approach in this article can detect and classify lung cancer quickly and accurately. This approach involves incorporating and elaborating numerous stages to achieve every physician's core goal: saving lives. The proposed algorithm contains a DCNN, specifically VGG-19, and LSTMs to detect and classify the disease. The proposed model in this article identifies and classifies only NSCLC quickly and correctly. This model uses principal component analysis (PCA) to reduce and minimize the resulting error [20–23]. In this proposed model, we deliver rich detail along with figures and charts to explain some results and represent outputs from the presented algorithm that are generated and initiated by the implemented approach.

In a recently published paper on lung cancer detection, the authors utilized circles inside the lungs to detect cancer. The circles were indicators of cancerous lungs. Here, the proposed algorithm uses DCNN and LSTMs after modifying them to detect lung cancer nodules and classify them. Moreover, it can help radiologists to provide efficient pattern recognition. Figure 2 depicts a block diagram of the presented model to identify and classify lung cancer nodules using three schemes of the considered tools. The used datasets from the Kaggle website and LUNA16 grand challenge are investigated to see how the presented approach works and responds to numerous procedures and operations.

Figure 2. Flowchart of the proposed model.

The model starts by preprocessing to remove noise, resize the inputs, and convert all inputs into gray images. Gabor filters and a discrete wavelet transformation method distinguish lung regions and separate these regions from the original images.
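The grayscale-and-resize part of this preprocessing step can be sketched as follows. This is a hypothetical NumPy illustration: the paper performed these operations in MATLAB, and the noise removal, Gabor filtering, and wavelet steps are not reproduced here.

```python
import numpy as np

def preprocess(image, target=224):
    """Convert an RGB image to grayscale and resize it to
    target x target pixels by nearest-neighbor sampling."""
    # Luminosity grayscale conversion
    gray = image[..., 0] * 0.299 + image[..., 1] * 0.587 + image[..., 2] * 0.114
    # Nearest-neighbor resize (a stand-in for proper interpolation)
    rows = np.arange(target) * gray.shape[0] // target
    cols = np.arange(target) * gray.shape[1] // target
    return gray[np.ix_(rows, cols)]

# A random 512 x 512 RGB "scan" becomes a 224 x 224 grayscale array
img = np.random.rand(512, 512, 3)
print(preprocess(img).shape)   # (224, 224)
```

The 224 × 224 target matches the VGG-19 input size listed in Table 2.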
The second stage of the proposed model is the deep learning phase, where VGG-19 and LSTMs are ensembled to identify and categorize the disease accordingly. Initially, the outputs from the previous step are normalized to support the training and testing stages, and all incoming data are resized to 224 × 224. Convolutional layers are required to generate the relevant features and map them according to the input information. Each filter in the convolutional layers contains a determined number of parameters that can learn through the training phase. These layers produce output features that are smaller than the input data. The number of utilized filters in the proposed model is 16, and these filters have a dimension of 3 × 3 with a size of 55 × 55, as illustrated in Table 2. In the medical field, the most utilized pool is the max-pool technique. Thus, it is adopted and modified in this research to extract features from every block of the characteristics maps. The proposed approach extracts numerous features to learn by itself. The total number of extracted features is 22 per image, including the area of the detected RoIs, diameter, standard deviation, and mean. The proposed CAD system extracts 51,722 features for all images. The size of the extracted features is reduced in the max-pooling phase using the principal component analysis (PCA) method. After generating the outputs of the extracted features, a sample of the training images is fed into the model to improve the accuracy of the profound learning results.
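The PCA reduction mentioned above can be sketched with a few lines of NumPy. This is an illustrative implementation (the paper does not publish its code); it projects the feature matrix onto the leading principal components via an eigendecomposition of the covariance matrix.

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors X (rows = images, columns = features)
    onto their k leading principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top

# 2351 images x 22 features (the counts from Section 2.3), reduced to 8
X = np.random.rand(2351, 22)
print(pca_reduce(X, 8).shape)   # (2351, 8)
```

The choice of 8 retained components here is arbitrary; the paper does not state how many components its PCA step keeps.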
The next stage is identifying the potential RoIs according to the learned data and
drawing red circles around all identified RoIs. This operation is performed in MATLAB
using a built-in function. After that, the detected RoIs are classified according to the
learned features using the utilized DLTs. The presented model classifies the disease into
healthy, adenocarcinoma, squamous cell carcinoma, and large cell carcinoma, as illustrated
in Figure 2. The extracted required characteristics are flattened and transformed into a
one-dimensional array. Every image is normalized in every layer in the presented system.
The multiplication of the generated matrices is carried out in the fully connected layers. The
soft-max layers distribute the extracted features into different groups. These groups are
utilized to identify and classify the detected RoIs into their suitable classes, as depicted in
Figure 2.
Finally, the proposed model evaluates numerous performance parameters to compare
it with other state-of-the-art methods. The computed parameters are:
1. True Positive (TP): indicates the number of adequately identified types in the given dataset;
2. False Positive (FP): determines the number of types that are mispredicted;
3. True Negative (TN): gives the number of healthy lungs identified correctly;
4. False Negative (FN): measures the number of negative samples identified incorrectly;
5. Precision (PR): shows the ratio of the identified types over the summation of the
classes that are identified incorrectly plus the actual classes that are correctly classified,
as demonstrated in the equation below:

PR = TP/(TP + FP) (1)

6. Recall (RE): computes the ratio of the identified classes over the summation of the
actual images plus the number of negative types that are incorrectly classified, as
depicted in (2):
RE = TP/(TP + FN) (2)
7. Accuracy (Acc): this parameter indicates how well the proposed approach performs, and it is evaluated as follows:

Acc = (TP + TN)/N (3)

where N is the total number of images being tested and computed as follows:

N = TP + TN + FN + FP (4)

8. F-score: represents the harmonic mean of two performance metrics of the presented CAD algorithm: recall and precision. Therefore, the higher the value is, the better the model is developed. This metric is evaluated as follows:

F-score = 2 × [(PR × RE)/(PR + RE)] (5)
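As an illustrative check (not the authors' MATLAB code), Equations (1)-(5) can be evaluated directly from the four counts; plugging in the VGG-19 counts reported later in Table 4 (TP = 796, TN = 17, FN = 29, FP = 8) reproduces the reported 95.647% accuracy:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy, and F-score per Equations (1)-(5)."""
    n = tp + tn + fp + fn                                      # Equation (4)
    precision = tp / (tp + fp)                                 # Equation (1)
    recall = tp / (tp + fn)                                    # Equation (2)
    accuracy = (tp + tn) / n                                   # Equation (3)
    f_score = 2 * (precision * recall) / (precision + recall)  # Equation (5)
    return precision, recall, accuracy, f_score

# VGG-19 counts from Table 4: TP = 796, TN = 17, FP = 8, FN = 29
pr, re, acc, f1 = classification_metrics(796, 17, 8, 29)
print(f"PR={pr:.3%}  RE={re:.3%}  Acc={acc:.3%}  F-score={f1:.3%}")
```

The computed accuracy matches Table 4 exactly (95.647%), and the precision rounds to the 99% reported there.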

The implemented model herein has various advantages. These advantages are
as follows:
A. It is easy to run and operate;
B. Procedures are automated to minimize human intervention;
C. It is a dependable and practical solution;
D. No specific modules are mandatory.
The proposed Algorithm 1 to detect and classify lung cancer is illustrated as follows:

Algorithm: Lung Cancer Detection and Classification


Input: an image: CT Scan or X-ray.
Output: the detection and classification of Lung Cancer: NSCLC and its subtypes.
1. Read an image or a sequence of images from a file.
2. In the preprocessing phase: Do the following:
3. Remove any detected noise.
4. Rescale the inputs to the required size of VGG-19 and LSTMs.
5. Apply various filters to enhance the pixels of the inputs.
6. Transform the resultant image into a gray image.
7. End of Preprocessing phase.
8. For the Deep Learning phase: Do the following:
9. Create a Zero matrix with a size = size of the input image by 4.
10. For i =1: 4
11. Perform a filtration: Gabor Filter and DWT to determine the magnitude and wavelength for
every pixel.
12. Perform a masking operation using the morphological process to extract the required
features, such as Area, shape, diameter, and correlation.
13. Determine a dynamic threshold for every image.
14. Invert the image to separate the foreground and the background.
15. Create a Binary image to detect and classify the disease with a size = 1024 × 1024.
16. Find any potential area and draw a circle around it.
17. Determine the number of detected areas and their drawn circles.
18. For i = 1: 1024
19. For j = 1: 1024
20. Compute the number of white pixels z to compare it with the threshold.
21. Plot the detected circles.
22. If z > threshold:
23. Cancer is Detected.
24. End
25. Classify the detected cancer as Adenocarcinoma, LCC, or SCC.
26. End
27. End of Deep Learning phase.
28. Calculate the required performance parameters: accuracy, precision, recall, and F-score.
29. End of the algorithm.
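Steps 18-23 of the listing (scanning the 1024 × 1024 binary image and comparing the white-pixel count z against the threshold) can be sketched as follows; this is an illustrative Python sketch, not the authors' MATLAB implementation, and the threshold value is a stand-in for their dynamic per-image threshold:

```python
import numpy as np

def detect_by_white_pixels(binary_image, threshold):
    """Flag an image as cancerous when the count of white (True) pixels in
    the binary mask exceeds a threshold, mirroring steps 18-23 above.
    Illustrative sketch only; the paper computes the threshold dynamically."""
    z = int(np.count_nonzero(binary_image))   # number of white pixels
    return z, z > threshold

mask = np.zeros((1024, 1024), dtype=bool)
mask[100:140, 200:260] = True                 # synthetic 40 x 60 bright region
z, detected = detect_by_white_pixels(mask, threshold=1000)
print(z, detected)                            # 2400 True
```

In the full pipeline this decision is made per detected region, after which the region is circled and passed to the classifier.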

3. Results
Numerous simulation evaluation experiments were performed to verify the proposed
approach and test its functionality and outputs. Several scenarios were tested to illustrate
how the system works to detect the type of lung cancer present. In addition, the compu-
tation of performance parameters is presented as well. All simulation tests were carried
out on a Microsoft Windows hosting machine running Windows 11 Pro as its operating system, with a 2.4 GHz clock and 16 GB of RAM.
A further provision in this article is the comparative assessment between some literary
works and the developed approach. The use of MATLAB in all simulation scenarios was
paramount as it possesses built-in tools for image processing purposes. These tools were
utilized and employed in the developed method. The implemented model/algorithm was
tested over 200 times, and it took between 6 and 8 h for the training phase to achieve its
highest results. During the training stage, the for-loop instruction was set to run around
3000 times to permit the algorithm to learn deeply to reach and accomplish acceptable
results. The proposed approach produced four subgraphs for its outputs of detection
and classification. Moreover, determining the number of detected lung cancer areas was
provided. This article offers only three scenarios/cases of cancerous lungs. These cases
represent the subtypes of NSCLC; these scenarios are as follows:
3.1. Scenario 1: Adenocarcinoma
Figure 3 illustrates four subgraphs: an original CT-Scan image in (a); (b) shows its segmented output with red circles around potential areas of possible cancer; (c) outlines only the detected regions or spots containing the disease; and (d) shows the cancerous lung with all detected cancer spots. Furthermore, the number of the detected nodules and their classification type are also offered.

Figure 3. The obtained results of Case 1: (a) an input, (b) segmented potential RoIs, (c) the outlined spots, and (d) type of detected tumor: Adenocarcinoma.
As demonstrated and presented in Figure 3, the implemented model segmented the potential and possible regions of interest (RoIs) that could contain tumors. These regions of interest were determined by adding red circles, as illustrated in Figure 3b. Then, the proposed method outlined these regions of interest by removing all the other parts, as depicted in Figure 3c. Finally, these red circles/dots were restored to their original locations and placed in the original image, as in Figure 3d. The classification process result is shown in Figure 3d.
3.2. Scenario 2: Large Cell Carcinoma
Figure 4 illustrates an original CT-scan image in (a); (b) shows its segmented output with red circles around areas of possible cancer; (c) displays only the detected regions or spots containing the disease; and (d) shows the cancerous lung and the type of detected tumors. Furthermore, Figure 4 includes the computed number of detected cancer nodules and classification type.

Figure 4. The obtained results of Case 2: (a) an input, (b) segmented potential RoIs, (c) the outlined spots, and (d) type of detected tumor: LCC.
The proposed CAD system generated the outputs, as shown in Figure 4. It discovers and learns by itself, through the deep learning phase, which type of discovered cancer has been identified. Each RoI is encircled in red, as depicted in Figure 4b, and all cancer spots are outlined and shown alone, as in Figure 4c. Moreover, the model verifies the number of all found cancer cells and their types, as in Figure 4d.
3.3. Scenario 3: Squamous Cell Carcinoma
Figure 5 illustrates an original CT-scan image in (a); (b) shows its segmented output with red circles around areas of possible cancer; and (c) displays only the detected regions or spots. In addition, Figure 5 illustrates the estimated number of cancer spots the model identified and the result of the classification calculations.
Figure 5. The results of Case 3: (a) an input, (b) segmented potential RoIs, (c) the outlined spots, and (d) type of detected tumor: SCC.
Tables 4–6 list all the values of the performance parameters under consideration and measured by the proposed model for the three developed schemes. Table 4 represents the results of using VGG-19 alone, Table 5 shows the achieved results of LSTMs, while Table 6 demonstrates the outputs of combining VGG-19 and LSTMs. Accuracy, precision, recall, and F-scores are measured in percentages. In total, 850 CT-scan and X-ray images were used. The developed algorithm reaches the highest outcomes of accuracy, precision, recall, and F-score, as shown in Table 6.

Table 4. The evaluated performance metrics of VGG-19.

Performance Metric    Evaluated Value: N = 850 Images
TP                    796 (Adenocarcinoma = 245, LCC = 161, SCC = 233)
TN                    17
FN                    29
FP                    8
Accuracy              95.647%
Precision             99%
Recall                96.484%
F-score               97.726%
Table 5. The achieved outcomes of LSTMs.

Performance Metric    Evaluated Value: N = 850 Images
TP                    811 (Adenocarcinoma = 273, LCC = 201, SCC = 337)
TN                    24
FN                    9
FP                    6
Accuracy              98.235%
Precision             99.266%
Recall                98.902%
F-score               99.084%
Table 6. The evaluated performance analysis of combined VGG-19 and LSTMs.

Performance Metric    Evaluated Value: N = 850 Images
TP                    833 (Adenocarcinoma = 307, LCC = 289, SCC = 237)
TN                    11
FN                    2
FP                    1
Accuracy              99.42%
Precision             99.880%
Recall                99.760%
F-score               99.820%

The accuracy obtained increases to almost 99.61% when increasing the number of iterations inside the model and applying more testing inputs. Figure 6 depicts the graphical representation of the evaluated performance metrics of the three developed schemes.

Figure 6. The evaluated performance metrics analysis.
performance

Using the VGG-19 model standalone, the first scheme achieved the minimum results
for all considered metrics, while the LSTMs reached better values. The last scheme, combin-
ing both techniques, achieved the highest outcomes for the considered metrics, as shown in
Figure 6. Table 7 lists the comparison results for the average accuracy, precision, and recall
values between the proposed algorithm and some research in the literature. Moreover, the
utilized tools are included in Table 7, and N.M. stands for not mentioned. In this research,
the obtained values of Table 6 are used in the comparison analysis.

Table 7. Performance metrics and their values.

Works                            Utilized Technology      Precision    Recall     Accuracy
Sousa et al., 2021 [2]           Fused image technique    N.M.         89%        99%
Nazir et al., 2021 [3]           LP + ASR                 89%          N.M.       99%
Al-Yasriy et al., 2020 [5]       CNN                      95.714%      N.M.       93.548%
Hasan et al., 2019 [7]           Image processing         N.M.         N.M.       72.2%
Nasser and Abu-Naser, 2019 [9]   ANN                      N.M.         N.M.       96.67%
Madan et al., 2019 [16]          XGBoost + RFA            N.M.         N.M.       84%
The proposed algorithm           VGG-19 and LSTMs         99.880%      99.760%    99.42%

Table 7 shows that the presented model generates better results than other works regarding all performance metrics. The implemented methods in [7,16] achieved the lowest accuracies of 72.2% and 84%, respectively, while the works in [5,9] reached moderate accuracies of 93.548% and 96.67%, respectively. However, the proposed model displayed a maximum accuracy of 99.61%, which no other method has achieved.
Table 8 demonstrates the obtained confusion matrix of the presented approach for 850 images of the third scheme, which gave the highest outcomes. Green marks the appropriately and correctly categorized classes, while red marks the improperly identified types. In addition, all subtypes of NSCLC are identified by the light orange color. Adenocarcinoma is Class A, Class B refers to LCC, Class C represents SCC, and Class D denotes healthy tissue.

Table 8. The obtained confusion matrix of the third scheme.

                   True Class
Predicted Class    Class A (310)    Class B (289)    Class C (237)    Class D (14)
Class A            307 (99.032%)    0                0                1 (7.143%)
Class B            1 (0.323%)       286 (98.962%)    0                2 (0.692%)
Class C            0                0                237 (100%)       0
Class D            2 (14.286%)      0                0                12 (85.714%)
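The percentages on the diagonal of Table 8 are each correctly predicted count divided by its true-class (column) total, e.g., 307/310 ≈ 99.032% for Class A. A small script (illustrative only, using the counts as printed in the table) reproduces them:

```python
# Confusion matrix: rows = predicted class, columns = true class (Table 8).
matrix = [
    [307, 0,   0,   1],   # predicted Class A (adenocarcinoma)
    [1,   286, 0,   2],   # predicted Class B (LCC)
    [0,   0,   237, 0],   # predicted Class C (SCC)
    [2,   0,   0,   12],  # predicted Class D (healthy)
]
totals = [310, 289, 237, 14]  # true-class column totals as printed

for i, name in enumerate("ABCD"):
    diag = matrix[i][i]   # correctly classified count for this class
    print(f"Class {name}: {diag}/{totals[i]} = {100 * diag / totals[i]:.3f}%")
```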

Figures 7 and 8 illustrate the achieved receiver operating characteristic (ROC) curve
of the proposed system and the error histogram chart. Figure 7 shows the performance of
classification by the system. Ten different thresholds were applied to obtain the classification
results. In Figure 8, the error histogram was achieved with 20 bins. This figure includes
the training, validation, and testing datasets. Figure 9 depicts the obtained cross-entropy
results of the training, validation, and testing datasets. The best value of this quantity
occurred at epoch 11.

Figure 7. The evaluated performance metrics analysis.



Figure 8. The achieved error histogram chart.



Figure 9. The achieved cross-entropy results.

4. Discussion

The x-fold cross-validation technique was utilized in this research to evaluate the proposed model. This technique is a statistical method to estimate how the algorithm behaves and generates its outputs. In this article, five folds were conducted. Figure 10 represents the average graphical outputs for five different runs of the considered performance metrics: accuracy, precision, and recall of the third scheme after 150 iterations. In Figure 10, adenocarcinoma is called Class A, LCC is Class B, and SCC is Class C.

Figure 10. Result analysis of the proposed model.


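The five-fold split described above can be sketched generically; the fold assignment below (shuffled, near-equal folds over the 850 images) is an assumption, since the paper does not publish its exact splitting code:

```python
import numpy as np

def kfold_indices(num_samples, k, seed=0):
    """Split sample indices into k shuffled, near-equal folds, as in the
    x-fold (here five-fold) cross-validation described above. Sketch only."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)
    return np.array_split(indices, k)

folds = kfold_indices(850, 5)                 # 850 images, five folds
for i, test_fold in enumerate(folds):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # train/evaluate the model here; each image is held out exactly once
    assert len(test_fold) + len(train) == 850
print([len(f) for f in folds])                # [170, 170, 170, 170, 170]
```

Averaging the metrics over the five held-out folds yields the curves reported in Figure 10.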

The proposed algorithm identifies and classifies NSCLC accurately, as demonstrated in the previous graphs. The implemented algorithm in this research and study integrates the VGG-19 tool with the LSTM technique to perform deep learning to diagnose and categorize lung tumors to reach and achieve the acceptable range of the considered performance parameters, as shown in Table 6. The presented CAD system surpasses other developed methods, as shown in Table 7, as no other methods have achieved the same results that were reached by the proposed system. The combination and integration of VGG-19 and LSTMs yielded the best and highest outcomes. Various steps and phases took place and were employed appropriately.

The executed and conducted evaluations on the achieved outcomes of the developed model show that it can distinguish, determine, and classify NSCLC correctly. Tables 4–6 list all model values when employing it on the 850 inputs. These inputs were CT scans and X-rays. The same tables detail the total number of inputs determined and correctly categorized and the number of inputs classified inappropriately. Moreover, the comparative evaluation between the developed algorithm and its procedures and other state-of-the-art works is provided in Table 7. The resultant confusion matrix of the third scheme is represented in Table 8. This matrix shows that the presented system has the ability and the capability to identify and classify the disease correctly. These evaluations indicate and imply that the proposed model surpasses and outperforms other works in all performance metrics. Figure 11 illustrates the obtained accuracy and the loss function charts of the third scheme when the learning rate L was 0.01 for 15 epochs. This chart contains 465 iterations, with 31 iterations for each epoch. In addition, the validation occurs every 30 iterations. The black dashed lines refer to the validation process. The accuracy and the loss function become steady and stable after five epochs. The loss function converges nearly to 0, whereas the accuracy reaches 99.8%, as shown in the same graph.

Figure 11. The achieved accuracy and loss function charts.



Figure 12 illustrates the comparative accuracy analysis between the proposed system and some developed literature models. It shows that the presented model achieves the best accuracy results, and no other method could reach that accuracy level. The lowest achieved accuracy was in [7,16], while moderate values were obtained in [5,9]. The highest reached accuracy was in [3]. However, the proposed CAD method outperforms all these methods and achieved 99.42% on average, while its maximum result was 99.61%.

Figure 12. The comparative accuracy analysis charts between the proposed approach and some works [3,5,7,9,16].

5. Conclusions

In this study, the developed model reveals a robust, trustworthy, and highly efficient system to detect and classify tumors from CT scans and X-ray images correctly and accurately. The implemented algorithm involves various tools, such as the Gabor filter, discrete wavelet transformation, PCA, and other filters, to deliver acceptable outcomes and results. This model possesses the capability and the ability to distinguish, differentiate, and classify tumors of the NSCLC types. The performance of the presented method was evaluated on the three utilized datasets from the Kaggle website and the LUNA16 grand challenge. From the attained findings, the implemented model surpasses all other methods regarding the considered metrics: accuracy, precision, and recall. The developed system generally shows considerable enhancements and enrichment on all considered metrics. The developed algorithm demonstrates its utility, efficiency, and accuracy in detecting and classifying NSCLC. Validation through MATLAB shows effective performance producing accepted outputs. The proposed algorithm reaches an average of 99.42% accuracy and around 99.61% when the number of iterations increases significantly. In addition, in some cases, the proposed system achieves 99.8% accuracy, and this is the only work that could reach this accuracy.

Future works are in place to improve the algorithm, leading to the appropriate classification of all subtypes and the production of accurate results from classification operations with minimal execution time.
Author Contributions: Conceptualization, A.A.A. and A.K.A.; data curation, Y.S. and A.K.A.; formal analysis, A.A.A. and Y.S.; funding acquisition, T.S.; investigation, H.L.; methodology, A.A.A.; project administration, A.A.A. and T.S.; resources, H.L. and A.K.A.; software, Y.S.; supervision, T.S.; validation, Y.S. and H.L.; visualization, H.L.; writing—original draft, A.A.A.; writing—review and editing, T.S. and A.K.A. All authors have read and agreed to the published version of the manuscript.

Funding: This research work was funded by the Institutional Fund Projects under grant no. (IFPIP:
646-829-1443). The authors gratefully acknowledge technical and financial support provided by the
Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The utilized datasets in this study were downloaded from the Kaggle
website, and their links are available upon request.
Acknowledgments: This research work was funded by the Institutional Fund Projects under grant no.
(IFPIP: 646-829-1443). The authors gratefully acknowledge technical and financial support provided
by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.
Conflicts of Interest: The authors declare they have no conflict of interest to report regarding the
present study.

References
1. Hosseini, H.; Monsefi, R.; Shadroo, S. Deep Learning Applications for Lung Cancer Diagnosis: A Systematic Review. arXiv 2022,
arXiv:2201.00227.
2. Sousa, J.; Pereira, T.; Silva, F.; Silva, M.C.; Vilares, A.T.; Cunha, A.; Oliveira, H.P. Lung Segmentation in CT Images: A Residual
U-Net Approach on A Cross-Cohort Dataset. Appl. Sci. 2022, 12, 1959. [CrossRef]
3. Nazir, I.; Ul Haq, I.; Khan, M.M.; Qureshi, M.B.; Ullah, H.; Butt, S. Efficient Pre-Processing and Segmentation for Lung Cancer
Detection Using Fused CT Images. Electronics 2021, 11, 34. [CrossRef]
4. Dayma, M.M. Lung Cancer Detection Using MATLAB. IOSR J. Comput. Eng. (IOSR-JCE) 2021, 23, 35–40.
5. Al-Yasriy, H.F.; Al-Husieny, M.S.; Mohsen, F.Y.; Khalil, E.A.; Hassan, Z.S. Diagnosis of Lung Cancer Based on CT Scans Using
CNN. In Proceedings of the 2nd International Scientific Conference of Al-Ayen University (ISCAU-2020): IOP Conf. Series:
Material Science and Engineering, Thi-Qar, Iraq, 15–16 July 2020; p. 10.
6. Ahmed, B.T. Lung Cancer Prediction and Detection Using Image Processing Mechanisms: An Overview. Signal Image Process.
Lett. (SIMPLE) 2019, 1, 20–31. [CrossRef]
7. Hasan, R.; Al Kabir, M. Lung Cancer Detection and Classification Based on Image Processing and Statistical Learning. arXiv 2019,
arXiv:1911.1065.
8. Available online: https://healthcare.utah.edu/huntsmancancerinstitute/news/2019/11/even-non-smokers-can-get-lung-cancer.
php (accessed on 5 March 2022).
9. Nasser, I.M.; Abu-Naser, S.S. Lung Cancer Detection Using Artificial Neural Network. Int. J. Eng. Inf. Syst. 2019, 3, 17–23.
10. Available online: https://www.lungevity.org/for-patients-caregivers/lung-cancer-101/types-of-lung-cancer (accessed on 5
March 2022).
11. Melisa, B. Image Detection Using the VGG-19 Convolutional Neural Network. Available online: https://medium.com/mlearning-
ai/image-detection-using-convolutional-neural-networks-89c9e21fffa3 (accessed on 22 November 2022).
12. Khattar, A.; Quadri, S.M.K. Generalization of Convolutional Network Domain Adaptation Network for Classification of Disaster
Images on Twitter. Multimed. Tools Appl. 2022, 81, 30437–30464. [CrossRef]
13. Dolphin, R. LSTM Networks: A Detailed Explanation. Available online: https://towardsdatascience.com/lstm-networks-a-
detailed-explanation-8fae6aefc7f9 (accessed on 26 December 2022).
14. Yeturu, K. Machine learning algorithms, applications, and practices in data science. Handb. Stat. 2020, 43, 81–206.
15. Bhatia, S.; Sinha, Y.; Goel, L. Lung Cancer Detection: A Deep Learning Approach. In Soft Computing for Problem Solving; Advances
in Intelligent Systems and Computing; Bansal, J., Das, K., Nagar, A., Deep, K., Ojha, A., Eds.; Springer Nature: Singapore, 2019;
Volume 817, pp. 699–705.
16. Madan, B.; Panchal, A.; Chavan, D. Lung Cancer Detection Using Deep Learning. In Proceedings of the 2nd International
Conference on Advances in Science and Technology (ICAST-2019), Makassar, Indonesia, 5–6 November 2019; p. 3.
17. Available online: https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images/ (accessed on 4 March 2022).
18. Available online: https://www.kaggle.com/datasets/raddar/nodules-in-chest-xrays-lidcidri (accessed on 11 December 2022).
19. Available online: https://luna16.grand-challenge.org/Download/ (accessed on 8 March 2023).
20. Makaju, S.; Prasad, P.W.C.; Alsadoon, A.; Singh, A.K.; Elchouemi, A. Lung Cancer Detection Using CT Scan Images. Procedia
Comput. Sci. 2018, 125, 107–114. [CrossRef]
21. Mahersia, H.; Zaroug, M.; Gabralla, L. Lung Cancer Detection on CT Scan Images: A Review on the Analysis Techniques. Int. J.
Adv. Res. Artif. Intell. (IJARAI) 2015, 4, 38–45. [CrossRef]

22. Tun, K.M.M.; Khaing, A.S. Feature Extraction and Classification of Lung Cancer Nodule Using Image Processing Techniques. Int.
J. Eng. Res. Technol. (IJERT) 2014, 3, 2204–2210.
23. Kanitkar, S.; Thombare, N.D.; Lokhande, S.S. Lung Cancer Detection and Classification: A review. Int. J. Eng. Res. Technol. (IJERT)
2013, 2, 2312–2315.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
