12(10), 1582-1591
Article DOI:10.21474/IJAR01/19789
DOI URL: http://dx.doi.org/10.21474/IJAR01/19789
RESEARCH ARTICLE
CLASSIFICATION OF PULMONARY LESIONS USING GAN AND SEMI SUPERVISED GAN
Amirul Haikal Abdullah, Siti Salasiah Mokri, Ahmad Abid Hakimi Md Salleh and Qurratu’Aini Thaqifah
Ithanin
Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering & Built Environment,
UniversitiKebangsaan Malaysia, Malaysia.
Manuscript Info
Manuscript History
Received: 29 August 2024
Final Accepted: 30 September 2024
Published: October 2024

Key words:-
Lung Nodules, Classification, Generative Adversarial Network, Semi-Supervised, Computed Tomography

Abstract
Lung cancer is a leading cause of cancer-related deaths, with early detection being important for effective treatment. This study investigates automated lung nodule classification using Generative Adversarial Networks (GANs) and the semi-supervised GAN (SGAN) to address data scarcity in lung nodule classification. By generating synthetic lung nodule images, the SGAN and conventional GAN models are evaluated on image quality, variation, and classification metrics, including accuracy, specificity, sensitivity, and test set loss. Although the SGAN produces superior synthetic images, both models achieve a comparable classification accuracy of 95%, with the GAN demonstrating slightly higher specificity and sensitivity. The results show that while the SGAN generates enhanced synthetic images, the improvement does not translate into better classification, owing to the grayscale nature of the images and the already effective feature extraction by the classifier.

Copyright, IJAR, 2024. All rights reserved.
Introduction:-
In 2023, lung cancer accounted for a significant 21% of all cancer-related deaths in the United States. The main factor that increases the risk of lung cancer is smoking, whether active or passive (Sung et al., 2021). Lung cancer is the uncontrolled growth of cells in the lung tissues (Zhang et al., 2015). There are often no obvious symptoms during the early stage of the disease, even though there is a high probability that small nodules are already present in the lungs at this stage.
Early detection of lung cancer is made feasible by computed tomography (CT) screening. Normally, the diagnostic decision is made manually based on the experience of the physicians, but this approach is difficult and time-consuming due to the large number of CT scans to be reviewed (Tran et al., 2019). At present, the diagnosis is supported by computer-aided diagnosis (CAD) systems that assist physicians in detecting lung nodules and diagnosing them as benign (non-cancerous) or malignant (cancerous).
The current trend in screening and diagnosis of medical images is the use of advanced deep learning techniques, including for the problem of lung nodule classification in CT images (Cheng et al., 2024). However, any deep learning method fundamentally requires a high volume of training images to achieve good performance, and this becomes a constraint in biomedical image processing research (Islam et al., 2024). The dataset limitation can be tackled by using Generative Adversarial Networks (GANs) to produce more data as well as to alleviate the problem of unbalanced datasets (Li & Wang, 2024). Accordingly, the GAN, as proposed by Goodfellow et al. (2014), has been widely applied to the generation of artificial images in various fields of medical imaging (Yi, Walia & Babyn, 2018), including the classification task.
Following GAN, the Semi-supervised Generative Adversarial Network (SGAN) was proposed by Salimans et al. (2016) to improve network performance when labeled data is scarce. Unlike GAN, which requires a large amount of labeled data for effective training, the semi-supervised GAN optimally uses both labeled and unlabeled data, which makes it advantageous in any domain where labeled data is scarce, as labeling is well known to be costly and time-consuming. Significantly, a semi-supervised GAN can generalize well from limited labeled data supplemented by a vast amount of unlabeled data. For example, semi-supervised GANs have produced high-quality images and better classification accuracy with fewer labeled examples (Kushwaha et al., 2024; Toutouh et al., 2023).
Thus, the aim of this paper is to investigate and compare the effectiveness of the semi-supervised GAN against the conventional GAN in the generation of synthetic images and the classification of lung nodules (benign or malignant) in CT images. This study focuses on 2D lung CT images.
Literature Review:-
The application of deep learning methods has achieved good performance in many medical imaging domains, including the lung nodule classification task. Fundamentally, deep learning, which is inspired by neurological architecture, is a subset of machine learning that makes predictions over several layers of a neural network (Tran et al., 2019). The Convolutional Neural Network (CNN) is a multilayer network that comprises several convolutional layers followed by max-pooling layers and, finally, fully connected layers (Lee et al., 2017). The convolutional layers perform feature extraction from the input images, while the pooling layers reduce the dimensions of the feature maps.
To date, several CNN-based deep learning architectures, such as ShuffleNet, DenseNet, GoogleNet, and MobileNet, have been implemented to classify lung nodules in CT images. These architectures require large-scale labeled datasets to avoid overfitting and to generalize well (Zhao et al., 2018). Acquiring large amounts of labeled data, however, presents a significant challenge: the process is costly and time-consuming, requiring expert involvement to label data that often suffers from a low signal-to-noise ratio.
Following this, GANs were proposed to tackle the problems of scarce real data, data imbalance, and privacy. Apart from generating synthetic images, various studies in the field of medical imaging have utilized GANs for other purposes such as noise suppression, artifact suppression, and domain translation between different modalities. GANs have been successfully used to generate synthetic lung nodule images in large quantities and to improve classification performance with CNN-based classifiers (Mohd Isham et al., 2021; Onishi et al., 2019).
Basically, a GAN consists of two networks, namely the generator and the discriminator. The generator network generates synthetic images that mimic the real images, while the discriminator learns to differentiate synthetic images from real ones, outputting whether a given image is fake or real. The two networks work hand in hand: throughout the training process, the generator and discriminator compete against each other to improve their effectiveness, formulated as a min-max game. The parameters of both the generator and the discriminator are updated through backpropagation (Dash, Ye & Wang, 2021).
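As a rough illustration of this min-max training procedure (not the exact configuration used in this study), the PyTorch sketch below shows one alternating update of the discriminator and generator; the network bodies, image size, and learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

latent_dim = 100  # size of the random noise vector fed to the generator

# Generator: maps a noise vector to a flattened 64x64 synthetic image.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 64 * 64), nn.Tanh(),
)

# Discriminator: maps a flattened image to a real/fake probability.
D = nn.Sequential(
    nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_images):
    """One min-max update; real_images: (batch, 64*64) scaled to [-1, 1]."""
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator update: label real images as 1 and generated ones as 0.
    fake_images = G(torch.randn(batch, latent_dim)).detach()  # freeze G
    loss_D = bce(D(real_images), ones) + bce(D(fake_images), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: try to make the discriminator output 1 for fakes.
    loss_G = bce(D(G(torch.randn(batch, latent_dim))), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

Detaching the generated batch during the discriminator update ensures that only the discriminator's parameters receive gradients in that step, mirroring the alternating optimization of the min-max game.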
Beyond generating synthetic images for augmentation, GANs have been widely used in related applications. For example, GANs have been used to segment CT images that are then classified using models like VGG16, achieving accuracy rates of up to 97% (Swaminathan et al., 2023). The Opt_att-GANC model combines CT and PET images using an attention-based GAN, achieving an accuracy of 93.74% (Nandipati & Devarakonda, 2023). Furthermore, GANs have been integrated with Graph Neural Networks (GNNs) to analyze multimodal data, capturing relationships between CT images, clinical data, and molecular data, leading to improved accuracy and interpretability in lung cancer detection (Pushpa et al., 2023). GANs are also used to generate synthetic medical data to address data imbalance, as seen in breast cancer diagnosis, where models like SNGAN and CGAN were effective in augmenting mammography and ultrasound data, respectively (Jiménez-Gaona et al., 2024). In skin disease classification, GANs improve dataset diversity by generating realistic images of various skin conditions, which are then used to train CNN models (Mounica, 2024).
The SGAN, proposed by Salimans et al. (2016), offers significant advantages over traditional GANs in generating synthetic images, primarily by optimizing the use of both labeled and unlabeled data. The network enhances the quality and diversity of generated images while reducing the reliance on extensive labeled datasets.
Specific to lung disease classification using SGAN, Liu, Wang & Rong (2019) reported that extending unsupervised GAN methods to semi-supervised GANs significantly improves supervised learning with limited labeled data for classifying six lung diseases on frontal chest X-ray images. They used additional labels on GAN-synthesized samples to guide the training process and optimized the network parameters using semi-supervised training strategies.
Madani et al. (2018) likewise showed that the SGAN can extensively learn structure within unlabeled X-ray image data, compensating for a low number of labeled images. This results in a substantial reduction in the amount of labeled data needed to achieve performance similar to supervised training. They also reported that the semi-supervised GAN is more robust to variation in the data domain, achieving better accuracy than a supervised CNN when labeled data is limited; that is, the SGAN requires less labeled training data to match a supervised CNN classifier. For example, their SGAN model needed only 10 labeled images per class to achieve an accuracy of 73.08%, an accuracy that would require somewhere between 250 and 500 labeled images for a conventional CNN classifier.
Oluwasanmi et al. (2021) used an SGAN to detect and classify COVID-19 in lung CT images, correctly identifying 48 true positive COVID-19 cases out of 50 images and achieving 96% sensitivity. A semi-supervised model was built following the GAN learning framework. During training, a small portion of the dataset was set aside to be trained in a supervised manner, with the discriminator acting as the supervised classifier. In the study, 100 images from each of the two classes were subsampled as the labeled set, totaling 200; another 100 images per class were set aside as a test set, totaling another 200; and the remaining 1000 images were designated as the unlabeled set. The discriminator utilized a ResNet50 architecture as the classifier, which achieved the best result compared with DenseNet, Inception, MobileNet, and VGG16. In Odena (2016), the SGAN-trained classifier also performed as well as or better than a standalone CNN model on the MNIST handwritten digit recognition task when trained with 25, 50, 100, and 1000 labeled examples.
In brief, the classification of lung tumors using SGAN is an active research area in medical image analysis. SGAN is still relatively new in its application across medical classification domains, and various studies are underway to better understand SGAN and to improve the stability of the GAN training process.
Methodology:-
Figure 1 shows the conventional GAN architecture, which comprises a generator and a discriminator network. The generator takes a random noise vector and produces a new image; the resulting image is passed to the discriminator, which decides whether the image belongs to the real training data or not. The discriminator thus takes real and fake images and determines their status through a probability, where 0 denotes a fake and 1 a real image. The architecture is then connected to a CNN classifier to classify the image: in this study, the lung nodule CT images are classified into K = 2 labels (benign or malignant) using a ResNet18 classifier.
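As a sketch of this classifier stage, the snippet below adapts a torchvision ResNet18 to two output classes. The single-channel input adaptation is our assumption for grayscale CT patches; the paper itself only specifies ResNet18 and 224 x 224 images.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None)
# Assumed adaptation: accept 1-channel (grayscale) input instead of RGB.
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# Two-way output head: benign vs. malignant (K = 2).
model.fc = nn.Linear(model.fc.in_features, 2)

x = torch.randn(8, 1, 224, 224)         # a batch of 8 grayscale patches
probs = torch.softmax(model(x), dim=1)  # per-class probabilities, shape (8, 2)
```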
Figure 2 shows the SGAN architecture, which has two inputs: labeled and unlabeled data. The generator takes random numbers and produces a synthetic image from the noise. All the real and generated images are then channeled to the discriminator, which applies a softmax over K+1 classes, where the extra (K+1)-th class represents fake images. Again, the discriminator takes real and fake images and determines their status through a probability between 0 and 1.
A real image is classified according to the labels that have been learned, while an image with no similarity to the existing labels is assigned to the additional (K+1)-th class. Since only two label classes are needed here, the discriminator classifies the real images into K = 2 labels (benign/malignant), and the extra class is treated as the class for fake data. As training proceeds, the generator produces better images while the discriminator discriminates better.
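The following sketch shows, under our own illustrative assumptions about the discriminator body, how the K+1 objective described above can be written: a supervised cross-entropy term on labeled images plus unsupervised terms that push real images away from, and generated images into, the extra fake class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 2  # real classes (benign, malignant); index K is the extra "fake" class

# Illustrative discriminator body; the study's actual network differs.
disc = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, K + 1),  # logits over: benign, malignant, fake
)

def discriminator_loss(labeled_x, labels, unlabeled_x, generated_x):
    # Supervised term: labeled real images should land in their true class.
    sup = F.cross_entropy(disc(labeled_x), labels)

    # Unsupervised terms, using p(fake | x) from the (K+1)-way softmax:
    # real (unlabeled) images should avoid the fake class, while
    # generated images should fall into it.
    p_fake_real = F.softmax(disc(unlabeled_x), dim=1)[:, K]
    p_fake_gen = F.softmax(disc(generated_x), dim=1)[:, K]
    unsup = (-torch.log(1 - p_fake_real + 1e-8).mean()
             - torch.log(p_fake_gen + 1e-8).mean())
    return sup + unsup
```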
In this study, the Lung Image Database Consortium image collection (LIDC-IDRI) was used (Armato et al., 2011). This database contains 1018 CT scans from 1010 patients, with a total of 244,527 images, collected through the collaboration of seven academic institutions and eight medical imaging companies. Figure 3 shows some of the lung nodule CT images. For each CT scan, the DICOM images have an in-plane resolution of 512 x 512 pixels, with the number of slices per scan varying from 65 to 764 and averaging about 240 slices. For deep learning training, the data was saved in PNG format. In this study, a total of 223 benign and 1423 malignant images of size 224 x 224 were used as PNG files. Of these, 490 images were set aside for validation and testing, 200 images (100 benign, 100 malignant) served as the labeled training set, and the rest were used as unlabeled training images. The experiments were run on an Intel Xeon CPU at 2.20 GHz with an NVIDIA T4 GPU, 12.7 GB of system memory, and 15 GB of graphics memory. The training parameters for GAN and SGAN were kept identical to allow a fair performance comparison between the two, as shown in Table 1.
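For concreteness, a minimal sketch of this split is shown below; the file paths, shuffling, and class balance of the held-out set are our assumptions, since the paper specifies only the overall counts.

```python
import random

random.seed(0)  # assumed seed for reproducibility
benign = [f"benign/{i}.png" for i in range(223)]        # hypothetical paths
malignant = [f"malignant/{i}.png" for i in range(1423)]
random.shuffle(benign)
random.shuffle(malignant)

labeled = benign[:100] + malignant[:100]   # 200 labeled training images
held_out = benign[100:] + malignant[100:]  # 1446 remaining images
random.shuffle(held_out)
val_test = held_out[:490]                  # 490 validation/test images
unlabeled = held_out[490:]                 # 956 unlabeled training images
```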
Fig 3:- Samples of benign and malignant nodules from the LIDC-IDRI dataset (Armato et al., 2011).
To measure the performance of both networks, quantitative and qualitative evaluations were carried out. For the qualitative evaluation, the generated images were assessed on their quality and diversity. The quantitative evaluation metrics are the cross-entropy loss, classification accuracy, specificity, and sensitivity. The cross-entropy loss is defined as:
$$\text{Cross-entropy loss} = -\sum_{i} y_i \log(p_i) \qquad (1)$$

where $y_i$ is the ground-truth label (0 or 1) for the $i$-th sample, and $p_i$ is the predicted probability of class 1 (the model output) for the $i$-th sample.
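A minimal sketch of Equation (1) for a batch of predictions is given below; the inputs shown are placeholder values, not results from the study.

```python
import math

def cross_entropy_loss(y, p, eps=1e-12):
    # Directly implements Eq. (1): the sum over samples of -y_i * log(p_i).
    # (Binary cross-entropy as usually implemented adds a symmetric
    # -(1 - y_i) * log(1 - p_i) term for the negative class.)
    return -sum(yi * math.log(pi + eps) for yi, pi in zip(y, p))

loss = cross_entropy_loss(y=[1, 0, 1], p=[0.9, 0.2, 0.7])  # placeholder values
```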
Accuracy measures the proportion of correct predictions made by the network out of the total number of samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (2)$$
Specificity is the number of samples correctly labeled as negative relative to the total number of negative samples:

$$\text{Specificity} = \frac{TN}{FP + TN} \qquad (3)$$
Sensitivity is the number of samples correctly labeled as positive relative to the total number of positive samples:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$
Here, a True Positive (TP) is a malignant sample correctly classified as malignant, and a True Negative (TN) is a benign sample correctly classified as benign. Conversely, a False Positive (FP) is a benign sample incorrectly classified as malignant, and a False Negative (FN) is a malignant sample incorrectly classified as benign.
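A small sketch computing Equations (2) to (4) from confusion counts is shown below, using the paper's convention that malignant is the positive class; the counts are placeholders, not values from this study.

```python
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # Eq. (2)
    specificity = tn / (fp + tn)                # Eq. (3)
    sensitivity = tp / (tp + fn)                # Eq. (4)
    return accuracy, specificity, sensitivity

# Placeholder confusion counts, not values from this study:
acc, spec, sens = metrics(tp=90, tn=85, fp=10, fn=15)
print(f"accuracy={acc:.3f} specificity={spec:.3f} sensitivity={sens:.3f}")
```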
Results and Discussion:-
Fig 4:- Image generation by the SGAN architecture at 1 epoch (left) and 50 epochs (right).
Figure 5 shows the new CT image data generated by a conventional GAN at 1 epoch and 50 epochs. The images
generated at 50 epochs have higher clarity and greater diversity than those produced at 1 epoch. Moreover, the details
in these images closely match the characteristics of real data. The quality of the generated images could be further
enhanced by increasing the number of epochs; however, in this study, the maximum is set to 50 epochs for both
architectures. At each epoch, both architectures generate 64 additional images, all of which are saved as PNG files.
Fig 5:- Image generation by the GAN architecture at 1 epoch (left) and 50 epochs (right).
The graph in Figure 6 illustrates the change in test loss during image generation with the conventional GAN model. Initially, the loss is high, but it decreases rapidly over the first 20 epochs, indicating that the model learns quickly as it improves its capability to generate realistic images. At around 20 epochs, the curve flattens at about 0.004, reaching a convergence point. Beyond approximately 40 epochs, there is a slight increase in test loss, which may be indicative of overfitting, where continued training introduces noise rather than further enhancing image quality.
The graph in Figure 7 presents the test loss for the SGAN architecture across epochs and shows a distinct pattern compared with Figure 6. There is an abrupt drop in test loss from a high initial value within just a few epochs, followed by stabilization: the test loss remains consistently close to zero for the remaining epochs, indicating that the SGAN rapidly converges to a minimal loss and maintains stability.
By comparison, the GAN's test loss curve shows a gradual decrease that settles around a stable point but never reaches as close to zero as the SGAN's. Additionally, the test loss increases after 40 epochs, which indicates overfitting as the model starts incorporating noise rather than improving the quality of the generated images. In contrast, the SGAN reaches a low test loss more quickly and maintains it without fluctuation, showing more robust convergence and generalization. This stability indicates that the SGAN has learned the data effectively, requiring fewer epochs to reach optimal performance and reducing the risk of overfitting, and suggests that the SGAN may be better suited for generating high-quality images with fewer training epochs.
Table 2 tabulates the classification performance of both GAN and SGAN with a ResNet18 classifier. Both the conventional GAN and the SGAN achieve an accuracy of 95% in classifying nodules, indicating that, in terms of overall accuracy, both models perform equally well. However, further analysis shows that the conventional GAN attains a higher specificity of 0.1029 compared with 0.0846 for the SGAN, suggesting that the conventional GAN is slightly better at correctly identifying negative (benign) cases and reducing the likelihood of false positives. In terms of sensitivity, the conventional GAN also outperforms the SGAN, with a value of 0.9099 compared with the SGAN's 0.8997; this higher sensitivity indicates that the conventional GAN is more effective at correctly identifying positive (malignant) cases, hence reducing false negatives. Finally, the test set loss, which reflects each model's performance during classification, is also recorded: the conventional GAN has a significantly lower test set loss of 0.0109 compared with the SGAN's 0.0686, implying that the conventional GAN is more robust and generalizes better than the SGAN.
Despite the SGAN's capability to stabilize faster when generating synthetic images, this advantage does not translate into improved classification performance. Both the conventional GAN and the SGAN achieve the same accuracy of 95%, indicating comparable effectiveness in nodule classification. This suggests that the quality of the synthetic images, although visually enhanced with the SGAN as indicated by the near-zero test loss (Figure 7), has limited impact on the model's ability to classify nodules correctly.
This is possibly due to the nature of the generated images, which are grayscale. While the SGAN may excel at creating synthetic images with finer details, these improvements may not meaningfully affect the model's performance in the classification task: grayscale images lack the color information that could otherwise contribute to richer feature extraction, potentially limiting the benefit of the SGAN's enhanced image quality. Moreover, the classification model (ResNet18) already extracts the relevant features effectively from the grayscale images, regardless of the improvements offered by the SGAN. Thus, while the SGAN might produce better images for visual inspection, the grayscale nature of the data and the feature requirements of the classifier reduce the benefit of having better synthetic images.
In summary, although the SGAN is advantageous in producing visually superior synthetic images, these improvements in image generation do not lead to a significant gain in classification performance. The grayscale format and the specific features important to nodule classification appear to allow both GAN and SGAN to perform equally, emphasizing that visual quality alone does not necessarily enhance model accuracy in the classification task.
Conclusion:-
The study evaluates the image generation and classification performance of the SGAN and conventional GAN models. The performance of data generation by both architectures was assessed qualitatively and quantitatively: for the qualitative evaluation, the generated nodule images were assessed in terms of quality and diversity, while for the quantitative evaluation, the metric used is the cross-entropy loss recorded at each epoch. While the SGAN and the conventional GAN produce images with different levels of quality and diversity, both models achieve a satisfactory level of similarity between generated and real images. When analyzing the test loss at each epoch, the SGAN shows a better average cross-entropy loss, with values closer to 0, indicating its superior performance in generating images that closely mimic real data. In terms of classification performance, the study shows that while both the SGAN and the conventional GAN achieve the same accuracy of 95%, the conventional GAN outperforms the SGAN in the other metrics, namely specificity, sensitivity, and test set loss. These metrics indicate that the conventional GAN distinguishes between the classes (benign vs. malignant) with higher reliability, reducing false positives and false negatives.
Acknowledgement:-
The authors would like to acknowledge the Ministry of Higher Education Malaysia (MOHE) for the Fundamental Research Grant Scheme (FRGS), Project Code: FRGS/1/2024/TK07/UKM/02/6, and Universiti Kebangsaan Malaysia.
References:-
1. Sung, H., Ferlay, J., Siegel, R. L., Laversanne, M., Soerjomataram, I., Jemal, A., & Bray, F. (2021). Global
cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185
countries. CA: A Cancer Journal for Clinicians, 71(3), 209–249. https://doi.org/10.3322/caac.21660
2. Zhang, M., Zhuo, N., Guo, Z., Zhang, X., Liang, W., Zhao, S., & He, J. (2015). Establishment of a mathematic
model for predicting malignancy in solitary pulmonary nodules. Journal of Thoracic Disease, 7(10), 1833.
3. Tran, G. S., Nghiem, T. P., Nguyen, V. T., Luong, M., Burie, J. C., & Levin-Schwartz, Y. (2019). Improving
accuracy of lung nodule classification using deep learning with focal loss. Journal of Healthcare Engineering,
2019. https://doi.org/10.1155/2019/2048356
4. Cheng, X., Li, J., Mi, M., Wang, H., Wang, J., & Su, P. (2024). Accuracy study on deep learning-based CT
image analysis for lung nodule detection and classification. Traitement Du Signal, 41(2), 891–899.
https://doi.org/10.18280/ts.410229
5. Islam, T., Hafiz, M. S., Rahman, J., Kabir, M. M., & Mridha, M. F. (2024). A systematic review of deep learning
data augmentation in medical imaging: Recent advances and future research directions. Healthcare Analytics.
https://doi.org/10.1016/j.health.2024.100340
6. Li, S., & Wang, T. (2024). Data enhancement method based on cyclic generation adversarial network. Journal
of Computing and Electronic Information Management. https://doi.org/10.54097/2oatvge7
7. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio,
Y. (2014, June 10). Generative adversarial networks. ArXiv preprint. https://arxiv.org/abs/1406.2661
8. Yi, X., Walia, E., & Babyn, P. (2018). Generative adversarial network in medical imaging. ArXiv preprint.
https://arxiv.org/abs/1809.07294
9. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques
for training GANs. Advances in Neural Information Processing Systems, 29.
10. Kushwaha, M., Nandanwar, A. K., Varghese, A., Choudhary, J., & Singh, D. P. (2024). Improved semi-supervised image classification using GAN. 2024 IEEE SCEECS.
https://doi.org/10.1109/sceecs61402.2024.10481944
11. Toutouh, J., Nalluru, S., Hemberg, E., & O'Reilly, U. (2023). Semi-supervised generative adversarial networks
with spatial coevolution for enhanced image generation and classification. Applied Soft Computing.
https://doi.org/10.1016/j.asoc.2023.110890
12. Lee, J. G., Jun, S., Cho, Y. W., Lee, H., Kim, G. B., Seo, J. B., & Kim, N. (2017). Deep learning in medical imaging: General overview. Korean Journal of Radiology, 18(4), 570–584.
13. Zhao, L., Zhu, D., Lu, J., & Luo, Y. (2018). Synthetic medical images using F&BGAN for improved accuracy. Symmetry, 1(16).
14. Mohd Isham, N. N., Mokri, S. S., Abd Rahni, A. A., & Ali, N. F. (2021). Classification of lung nodules in CT images using conditional generative adversarial–convolutional neural network. International Journal of Nonlinear Analysis and Applications, 12(Special Issue), 1047–1058.
15. Onishi, Y., Teramoto, A., Tsujimoto, M., Tsukamoto, T., Saito, K., Toyama, H., & Imaizumi, K. (2019). Automated pulmonary nodule classification in computed tomography images using a deep convolutional neural network trained by generative adversarial networks. 2019 International Conference on Biomedical Engineering and Bioinformatics.
16. Dash, A., Ye, J., & Wang, G. (2021). A review of generative adversarial networks (GANs) and their