Intensity-based statistical features for classification of lungs CT scan nodules using
artificial intelligence techniques
Sheeraz Akram, Muhammad Younus Javed, Ayyaz Hussain, Farhan Riaz & M.
Usman Akram
To cite this article: Sheeraz Akram, Muhammad Younus Javed, Ayyaz Hussain, Farhan Riaz &
M. Usman Akram (2015) Intensity-based statistical features for classification of lungs CT scan
nodules using artificial intelligence techniques, Journal of Experimental & Theoretical Artificial
Intelligence, 27:6, 737-751, DOI: 10.1080/0952813X.2015.1020526
1. Introduction
Humans suffer from many diseases, and cancer is among the most dangerous of them. A large
number of people suffer from lung cancer (Greenlee, Murray, Bolden, & Wingo, 2000), and more
people die from lung cancer than from any other cancer (Jung et al., 2011). The survival rate of
lung cancer patients, which is only 14% at present, can be increased by more than 50% through
early detection (Jung et al., 2011). Although the survival rate has improved significantly, there
is still a need to increase it.
Detecting lung cancer at an early stage requires an inner view of the body. In our case, this
view is provided by CT scans, which produce 3D lung images. The raw lung CT scans are not of
good quality, so they need to be enhanced before any lesion can be identified.
Cancer is caused by lesions that develop in different parts of the body. A lesion is referred
to as a nodule if it causes cancer, and as a non-nodule otherwise. The first main task in the
design of a CAD system is segmentation: the volume of a particular body part, such as the lung
volume, is separated from the complete image so that processing can focus on the object of
interest. The next task is to separate from the lung volume the objects that are not part of the
lungs. These objects are unwanted lesions, and they are the potential nodules. The final task of
the CAD system is to classify the potential nodules into nodules and non-nodules.
Lung CT scans are examined by medical experts, but experts with the same level of expertise
are not available everywhere. To meet the need for such expert guidance, the field of medical
imaging introduced computer-aided diagnostic (CAD) systems, which help the medical specialist
to identify and categorise the problem.
The rest of this paper is organised as follows. Section 2 reviews the literature related to the
proposed technique, and Section 3 briefly describes the proposed method. Section 4 covers
preprocessing of the lung CT scans, Section 5 covers candidate nodule extraction, feature
extraction and nodule pruning, and Section 6 covers candidate nodule up-sampling. Section 7
presents Support Vector Machine (SVM)-based classification, Section 8 presents the results and
discussion, and the last section concludes the work.
2. Literature review
Lesions are detected automatically by scanning the lung images, and various methods have been
introduced to classify these lesions as nodules or non-nodules. In Ozekes (2007), the density
value of each pixel is calculated, rule-based lung region segmentation is performed and the
Regions of Interest (ROIs) are extracted using an 8-directional search. The classification is
performed by Location Change Measurement, and the nodules are then searched for in the ROI
images using a trained Genetic Algorithm. In Ozekes, Osman, and Ucan (2008), the lung is
segmented by a Genetic Cellular Neural Network and the ROIs are extracted using an
8-directional search. The nodules are detected by searching through the 3D image with a 3D
template using a convolution-based filter, and Fuzzy Rule-Based Thresholding is used to further
refine the detected nodules. In Xujiong Ye, Lin, Dehmeshki, Slabaugh, and Beddoe (2009), the lung
segmentation is performed, and the 3D nodules are extracted by anti-geometric diffusion,
volumetric shape features, Gaussian filtering and multi-scale dot enhancement filtering. The 3D
potential nodules are segmented. The 2D and 3D features are calculated, and the Rule-Based
filtering and weighted SVM are used for Nodule classification. In Retico et al. (2009), the
pleural regions are identified by directional-gradient concentration and morphological opening.
The ROIs are extracted from segmented pleura region. The features are extracted and candidate
nodules are classified using a Feed-forward Neural Network. In Da Silva Sousa, Silva, Paiva, and
Nunes (2010), a region growing algorithm is used to identify the lung parenchyma, a rolling-ball
method is used to correct the boundaries of the pleura, and region growing is applied again to
identify the lung nodules; an SVM is then used to reduce the false positives. In Lee, Kouzani,
and Hu (2010), the training data-set is first divided into nodules and non-nodules using
clustering, and the resulting clusters are used to train an SVM. This is ensemble classification
aided by clustering. In Maeda et al. (2011), the consecutive CT images are temporally subtracted to
detect candidate nodules. The features of candidate nodules are calculated and the candidates
are refined using rule-based feature analysis. The feature space is reduced using PCA. The
Artificial Neural Network is used for nodule classification. In Tan, Deklerck, Jansen, Bister,
and Cornelis (2011), isotropic resampling of the CT image is performed to change its resolution,
and the lung is segmented using established segmentation techniques. The nodule centre is
estimated from the divergence of the normalised gradient. Multi-scale nodule and vessel
enhancement filtering is used to segment nodule clusters. Invariant, shape and regional
descriptors are calculated, and a mix of ANN, GA (FD-NEAT) and SVM is used for feature selection
and nodule classification. In Choi and Choi (2012), the lung volume is
extracted by thresholding, contour correction and morphological operations. The candidate
nodules are obtained by multiple thresholding technique from lung volume. The extracted
candidate nodules are pruned using rule-based pruning method. The rules are designed
according to the features of a lung nodule. The features are extracted from the candidate
nodules. The genetic programming classifier is trained and used for the classification of nodules
and non-nodules. Choi and Choi (2013) introduce the hierarchical block classification approach
using SVM for nodule classification. The CT image is split into blocks. The non-informative
blocks are discarded. The block image is enhanced and the object is segmented in the block
image. The location of the block is adjusted. The features are extracted from nodule candidate
block images. The SVM is used to classify candidate nodules as nodules and non-nodules.
In Tartar, Kilic, and Akan (2013), the features are selected from the images of nodules and
non-nodules. The features include 2D-PCA values, statistical values of 2D-PCA and 2D geometric
shape features, each with minimum Redundancy Maximum Relevance. The best features are selected
from the extracted features, and ANN, RF, Bagging and AdaBoost are used for training and
testing. In El-Baz et al. (2013), the lung nodules, arteries, veins, bronchi and bronchioles are
isolated from the surrounding structures. 2D and 3D deformable templates are used to describe
the geometry of the nodules, and a genetic optimisation algorithm is also used in the detection.
Later, the false positives among the detected nodules are reduced. In Rebouças Filho, Cortez,
and Albuquerque (2013), segmentation in lung CT images is performed using region growing, 3D
region growing variations and multi-thresholding to segment the blood vessels, lung emphysema
and bones. Previous work focuses separately on individual parts such as segmentation, feature
extraction and classification. There is a need for work that addresses all of these parts in one
place.
3. Proposed method
In this paper, the classification of nodules using SVM is proposed. The block diagram is given
in Figure 1. The proposed classification consists of the following steps. First, the lung CT scan
is processed to obtain the lung lobe region. In the next step, the candidate nodules are obtained
and pruning is performed to reduce the false positives. Later, features are extracted from the
candidate nodules and the required features are selected. The number of candidate nodules is
increased by up-sampling, and an SVM classifier is used to classify them as nodules or
non-nodules. The proposed method improves the accuracy of nodule classification. It is
evaluated on the Lung Image Database Consortium (LIDC) database (Reeves et al., 1997).
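The overview above does not say how the up-sampling is carried out. As an illustration only, the sketch below uses random oversampling with replacement, which is one common way to increase the number of samples in the smaller class until the classes are balanced. The function name `oversample_minority` and the fixed seed are assumptions made for this example and are not taken from the paper.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of the smaller classes until every class
    has as many samples as the largest one (assumed up-sampling strategy)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

With one nodule and three non-nodules as input, the function returns three samples of each class; the two added nodule samples are copies of the original one.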
4. Preprocessing
The lung CT image contains values in Hounsfield units (HU). The lung CT scans are 3D
images. Each scan contains between 100 and 250 slices, and each slice has 512 rows
and 512 columns. The lung CT scans are preprocessed using the steps given next.
4.1 Thresholding
The HU values in each lung CT scan range from -2000 to +2000 HU. The lung area is a
low-density area ranging from -1000 to -450 HU, called the non-body area. The CT scanner area
is also part of the non-body area of the lung CT scan. The body area contains the surroundings
of the lung lobes. Because the lungs are in the non-body area, the scan is thresholded at
-500 HU (Brown et al., 1997; Hu, Hoffman, & Reinhardt, 2001; Jemal et al., 2009). The voxels
with values below -500 HU contain the lung volume, and the voxels with values above +500 HU
contain the body volume. Figure 2(b) shows the result of thresholding.
If voxel value is less than -500 HU then
    Voxel value = minimum HU value in lung CT
Else
    Voxel value = maximum HU value in lung CT
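The thresholding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the volume is assumed to be a nested list indexed as [slice][row][column], and the function name `threshold_volume` and the parameter `cutoff_hu` are hypothetical.

```python
def threshold_volume(volume, cutoff_hu=-500):
    """Binarise a CT volume of HU values.

    Voxels below the cutoff are set to the minimum HU value found in the
    scan, all other voxels to the maximum HU value found in the scan.
    """
    flat = [v for sl in volume for row in sl for v in row]
    lowest, highest = min(flat), max(flat)
    return [
        [[lowest if v < cutoff_hu else highest for v in row] for row in sl]
        for sl in volume
    ]
```

For a toy one-slice volume `[[[-1000, -600], [-400, 200]]]`, the two voxels below -500 HU become -1000 and the other two become 200.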
scanner producing a cylinder around the lungs and the body area. This cylinder needs to be
removed. The 3D connected component approach is applied to make all the component
boundaries accurate (Messay, Hardie, & Rogers, 2010; Suárez-Cuenca et al., 2009). The
non-body component touching the sides of the lung CT image is removed, and its voxel values are
set to the background value, i.e. the value of the body voxels. Figure 2(c) shows the
background-removed lung CT image.
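The background removal step can be illustrated with a flood fill over a thresholded volume. The sketch below rests on assumptions that the text does not state: the volume is a nested list indexed as [slice][row][column] holding only two values, "sides" is read as the row and column borders of each slice, and components are grown through 6-connected neighbours. The function name `remove_background` is hypothetical.

```python
from collections import deque


def remove_background(volume, non_body, body):
    """Set every non-body component that touches a slice side to the body value.

    Returns a new volume; the input is left unchanged.
    """
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    out = [[row[:] for row in sl] for sl in volume]
    queue = deque()

    # Seed the fill with non-body voxels on the row/column borders.
    for z in range(depth):
        for r in range(rows):
            for c in range(cols):
                on_side = r in (0, rows - 1) or c in (0, cols - 1)
                if on_side and out[z][r][c] == non_body:
                    out[z][r][c] = body
                    queue.append((z, r, c))

    # Grow through 6-connected neighbours (assumed connectivity).
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, r, c = queue.popleft()
        for dz, dr, dc in steps:
            nz, nr, nc = z + dz, r + dr, c + dc
            inside = 0 <= nz < depth and 0 <= nr < rows and 0 <= nc < cols
            if inside and out[nz][nr][nc] == non_body:
                out[nz][nr][nc] = body
                queue.append((nz, nr, nc))
    return out
```

Non-body regions that are fully enclosed by body voxels, such as the lungs, are not reached by the fill and keep their value.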
median slice. Multiple threshold values are calculated as the vessels and nodules have different
densities depending on the type of nodule. These ROIs contain both nodules and vessels.
The 2D statistical features for intensity values are extracted from the median slice of the
segmented object. The 3D statistical features for intensity values are extracted from the 3D
segmented object.
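The statistical features for intensity values are calculated as given in Equations (1)–(4):

$$\text{Mean}\ (\bar{X}) = \frac{\sum_{i=1}^{n} X_i}{n}, \tag{1}$$

$$\text{Variance}\ (s^{2}) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{n - 1}, \tag{2}$$

$$\text{Kurtosis} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{4}}{(n - 1)\,s^{4}}, \tag{3}$$

$$\text{Skewness} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{3}}{(n - 1)\,s^{3}}. \tag{4}$$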
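As a rough sketch, the four features of Equations (1)–(4) can be computed as follows. Several points are assumptions made for illustration: the skewness denominator is read as (n - 1)s^3 by analogy with the kurtosis formula, the segmented object is represented as a list of slices that each hold the intensity values of the object's voxels in that slice, and the "median slice" is taken to be the middle slice of the object. The function names are hypothetical.

```python
import math


def intensity_features(values):
    """Mean, variance, kurtosis and skewness of a list of intensity values.

    Requires at least two values that are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    s = math.sqrt(variance)
    kurtosis = sum((x - mean) ** 4 for x in values) / ((n - 1) * s ** 4)
    # Assumed reading of Equation (4): denominator (n - 1) * s^3.
    skewness = sum((x - mean) ** 3 for x in values) / ((n - 1) * s ** 3)
    return [mean, variance, kurtosis, skewness]


def features_2d_and_3d(object_slices):
    """Concatenate 2D features (middle slice) and 3D features (whole object)."""
    median_slice = object_slices[len(object_slices) // 2]
    all_voxels = [v for sl in object_slices for v in sl]
    return intensity_features(median_slice) + intensity_features(all_voxels)
```

For the values 1, 2, 3, 4, 5 this gives a mean of 3, a variance of 2.5, a kurtosis of 1.36 and a skewness of 0.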
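The accuracy, sensitivity, specificity and AUC (area under the receiver operating characteristic
[ROC] curve) are measured. The first three are given in Equations (5)–(7), where TP, TN, FP and
FN denote the numbers of true positives, true negatives, false positives and false negatives:

$$\text{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP}, \tag{5}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \tag{6}$$

$$\text{Specificity} = \frac{TN}{TN + FP}. \tag{7}$$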
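Equations (5)–(7) translate directly into code. The short sketch below takes the four confusion-matrix counts as input; the function name `classification_metrics` and the example counts in the usage note are illustrative and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    sensitivity = tp / (tp + fn)  # fraction of true nodules detected
    specificity = tn / (tn + fp)  # fraction of non-nodules rejected
    return accuracy, sensitivity, specificity
```

For example, with 90 true positives, 10 false negatives, 80 true negatives and 20 false positives, the accuracy is 0.85, the sensitivity 0.90 and the specificity 0.80.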
Table 3 shows the result of 2D statistical features for intensity values. With 50–50
training–testing ratio, 93.37% accuracy, 94.92% sensitivity and 91.81% specificity are achieved.
Figure 4 shows the ROC curve for 2D statistical features for intensity values for 30–70, 50–50
and 70–30 training–testing ratios. The AUC is highest for the 70–30 training–testing ratio.
Table 4 shows the result of the 3D statistical features for intensity values. With 50–50
training–testing ratio, 84.90% accuracy, 87.65% sensitivity and 82.15% specificity are achieved.
Figure 5 shows the ROC curve for 3D statistical features for intensity values for 30–70, 50–50
and 70–30 training–testing ratios. The AUC is highest for the 70–30 training–testing ratio.
Table 5 shows the result of 2D and 3D statistical features for intensity values. With 50–50
training–testing ratio, 96.54% accuracy, 96.31% sensitivity and 96.77% specificity are achieved.
Figure 6 shows the ROC curve for 2D and 3D statistical features for intensity values for 30–70,
50–50 and 70–30 training–testing ratios. The AUC is highest for the 70–30 training–testing ratio.
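The text reports AUC values but does not state how they are computed. A standard approach, shown here as an assumption rather than as the authors' procedure, is to sweep a decision threshold over the classifier scores to obtain the ROC points and then integrate them with the trapezoidal rule. The function names and the convention that label 1 marks a nodule are chosen for this example.

```python
def roc_points(labels, scores):
    """ROC points (false positive rate, true positive rate), from (0, 0) to (1, 1).

    labels: 1 for nodule, 0 for non-nodule; scores: higher means more nodule-like.
    Both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples that share this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a curve given as ordered (x, y) points, by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A classifier that ranks every nodule above every non-nodule yields an AUC of 1.0, while a classifier with random scores is expected to yield about 0.5.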
Figures 7 and 8 show the scatter graphs of 2D statistical features for intensity values and 3D
statistical features for intensity values.
Suzuki, Armato, Li, Sone, and Doi (2003) worked with nodules of size 8–20 mm with a
sensitivity of 80.3%. Rubin et al. (2004) worked with nodules of size ≥ 3 mm with a sensitivity
of 76%. Dehmeshki, Ye, Lin, Valdivieso, and Amin (2007) worked with nodules of size 3–20 mm
with a sensitivity of 90%. Suárez-Cuenca et al. (2009) worked with nodules of size 4–27 mm with
a sensitivity of 80%. Opfer and Wiemker (2007) worked with nodules of size ≥ 4 mm with a
sensitivity of 74%. Sahiner et al. (2007) worked with nodules of size 3–36.4 mm with a
sensitivity of 79%. Messay, Hardie, and Rogers (2010) worked with nodules of size 3–30 mm with
a sensitivity of 82.66%. Choi and Choi (2012) worked with nodules of size 3–30 mm with a
sensitivity of 94.1%. Choi and Choi (2013) worked with nodules of size 3–30 mm with a
sensitivity of 95.28%. The proposed method works for nodules of size 3–30 mm with a sensitivity
of 96.31%, which is better than the earlier techniques. The performance comparison of earlier
CAD systems and the proposed CAD system is presented in Table 6.
9. Conclusion
In this paper, a novel technique for automatic pulmonary nodule detection, based on
intensity-based statistical features and an SVM, is presented. Thresholding, background removal,
hole-filling and contour correction are performed to extract the lung volume. The candidate
nodules are extracted from the lung volume and pruned using rules based on information about
nodules. The intensity-based statistical 2D and 3D features are extracted. The SVM classifier is
trained using nodule samples from the data-set and evaluated on nodules extracted from the LIDC
data-set. The classifier achieves a sensitivity of 96.31% with an accuracy of 96.54%, which
improves on the existing CAD systems. In future work, the sensitivity and accuracy can be
further improved by reducing the false positives through a better candidate nodule pruning
algorithm.
Acknowledgement
This work is not financially supported by any funding agency.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
Brown, M. S., McNitt-Gray, M. F., Mankovich, N. J., Goldin, J. G., Hiller, J., Wilson, L. S., & Aberle, D. R.
(1997). Method for segmenting chest CT image data using an anatomical model: Preliminary results.
IEEE Transactions on Medical Imaging, 16, 828–839. doi:10.1109/42.650879
Choi, W. J., & Choi, T. S. (2012). Genetic programming-based feature transform and classification for the
automatic detection of pulmonary nodules on computed tomography images. Information Sciences, 212,
57–78. doi:10.1016/j.ins.2012.05.008
Choi, W. J., & Choi, T. S. (2013). Automated pulmonary nodule detection system in computed tomography
images: A hierarchical block classification approach. Entropy, 15, 507–523.
doi:10.3390/e15020507
Da Silva Sousa, J. R. F. S., Silva, A. C., De Paiva, A. C., & Nunes, R. A. (2010). Methodology for automatic
detection of lung nodules in computerized tomography images. Computer Methods and Programs in
Biomedicine, 98(1), 1–14. doi:10.1016/j.cmpb.2009.07.006
Dehmeshki, J., Ye, X., Lin, X., Valdivieso, M., & Amin, H. (2007). Automated detection of lung nodules in
CT images using shape-based genetic algorithm. Computerized Medical Imaging and Graphics, 31,
408–417. doi:10.1016/j.compmedimag.2007.03.002
El-Baz, A., Elnakib, A., El-Ghar, M. A., Gimel’farb, G., Falk, R., & Farag, A. (2013). Automatic detection
of 2D and 3D lung nodules in chest spiral CT scans. International Journal of Biomedical Imaging,
2013, 1–11.
Greenlee, R. T., Murray, T., Bolden, S., & Wingo, P. A. (2000). Cancer statistics, 2000. CA: A Cancer
Journal for Clinicians, 50, 7–33. doi:10.3322/canjclin.50.1.7
Hu, S., Hoffman, E. A., & Reinhardt, J. M. (2001). Automatic lung segmentation for accurate quantitation
of volumetric X-ray CT images. IEEE Transactions on Medical Imaging, 20, 490–498.
doi:10.1109/42.929615
Jemal, A., Siegel, R., Ward, E., Hao, Y., Xu, J., & Thun, M. J. (2009). Cancer statistics, 2009. CA:
A Cancer Journal for Clinicians, 59, 225–249. doi:10.3322/caac.20006
Jung, K. W., Won, Y. J., Park, S., Kong, H. J., Sung, J., Shin, H. R., . . . Lee, J. S. (2011). Cancer statistics
in Korea: Incidence, mortality and survival in 2005. Journal of Korean Medical Science, 43, 1–11.
Jusoh, N. A., & Zain, J. M. (2009). Application of freeman chain codes: An alternative recognition
technique for Malaysian car plates. International Journal of Computer Science and Network
Security, 9, 222–227.
Lee, S. L. A., Kouzani, A. Z., & Hu, E. J. (2010). Random forest based lung nodule classification aided by
clustering. Computerized Medical Imaging and Graphics, 34, 535–542.
doi:10.1016/j.compmedimag.2010.03.006
Maeda, S., Tomiyama, Y., Kim, H., Miyake, N., Itai, Y., Tan, . . . Yamamoto, A. (2011). Detection of lung
nodules in thoracic MDCT images based on temporal changes from previous and current images.
Journal of Advanced Computational Intelligence and Intelligent Informatics, 15, 707–713.
Messay, T., Hardie, R. C., & Rogers, S. K. (2010). A new computationally efficient CAD system for
pulmonary nodule detection in CT imagery. Medical Image Analysis, 14, 390–406.
doi:10.1016/j.media.2010.02.004
Opfer, R., & Wiemker, R. (2007, March). Performance analysis for computer-aided lung nodule detection
on LIDC data. In Medical imaging 2007: Image perception, observer performance, and technology
assessment, Vol. 6515 of Proceedings of the SPIE, San Diego, CA, 65151C.
Ozekes, S. (2007). Rule-based lung region segmentation and nodule detection via Genetic Algorithm
trained template matching. Istanbul Commerce University Journal of Science, 6, 17–30.
Ozekes, S., Osman, O., & Ucan, O. N. (2008). Nodule detection in a lung region that’s segmented with
using genetic cellular neural networks and 3D template matching with fuzzy rule based thresholding.
Korean Journal of Radiology, 9(1), 1–9. doi:10.3348/kjr.2008.9.1.1
Paik, D. S., Beaulieu, C. F., Rubin, G. D., Acar, B., Jeffrey, Jr, R. B., Yee, J., Dey, J., & Napel, S. (2004).
Surface normal overlap: A computer-aided detection algorithm with application to colonic polyps
and lung nodules in helical CT. IEEE Transactions on Medical Imaging, 23, 661–675.
doi:10.1109/TMI.2004.826362
Rebouças Filho, P. P., Cortez, P. C., & de Albuquerque, V. H. C. (2013). 3D segmentation and visualization of
lung and its structures using CT images of the thorax. Journal of Biomedical Science and Engineering,
6, 1099–1108. doi:10.4236/jbise.2013.611138
Reeves, A. P., Biancardi, A. M., Apanasovich, T. V., Meyer, C. R., MacMahon, H., Beek, E. J., . . . Clarke,
L. P. (1997). The Lung Image Database Consortium (LIDC): A comparison of different size metrics
for pulmonary nodule measurements. Academic Radiology, 14, 1475–1485.
Retico, A., Fantacc, M. E., Gori, I., Kasae, P., Golosio, B., Piccioli, A., . . . Tangaro, S. (2009). Pleural
nodule identification in low-dose and thin-slice lung computed tomography. Computers in Biology
and Medicine, 39, 1137–1144.
Rubin, G. D., Lyo, J. K., Paik, D. S., Sherbondy, A. J., Chow, L. C., Leung, A. N., . . . Napel, S. (2004).
Pulmonary nodules on multi-detector row CT scans: Performance comparison of radiologists and
computer-aided detection. Radiology, 234, 274–283.
Sahiner, B., Hadjiiski, L. M., Chan, H., Shi, J., Cascade, P. N., Kazerooni, E. A., . . . Poopat, C. (2007,
March). Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: Observer
performance study. In Proceedings of SPIE 6515, medical imaging 2007: Image perception,
observer performance, and technology assessment, Vol. 6515 of Proceedings of the SPIE, San
Diego, CA, 65151D.
Suárez-Cuenca, J. J., Tahoces, P. G., Souto, M., Lado, M. J., Remy-Jardin, M., Remy, J., & Vidal, J. J.
(2009). Application of the iris filter for automatic detection of pulmonary nodules on computed
tomography images. Computers in Biology and Medicine, 39, 921–933.
Suzuki, K., Armato, S. G., Li, F., Sone, S., & Doi, K. (2003). Massive training artificial neural network
(MTANN) for reduction of false positives in computerized detection of lung nodules in low-dose
computed tomography. Medical Physics, 30, 1602–1617. doi:10.1118/1.1580485
Tan, M., Deklerck, R., Jansen, B., Bister, M., & Cornelis, J. (2011). A novel computer-aided lung nodule
detection system for CT images. Medical Physics, 38, 5630–5645. doi:10.1118/1.3633941
Tartar, A., Kilic, N., & Akan, A. (2013). Classification of pulmonary nodules by using hybrid features.
Computational and Mathematical Methods in Medicine, 2013, 1–11. doi:10.1155/2013/148363
Xujiong, Y., Xinyu, L., Dehmeshki, J., Slabaugh, G., & Beddoe, G. (2009). Shape-based computer-aided
detection of lung nodules in thoracic CT images. IEEE Transactions on Biomedical Engineering, 56,
1810–1820. doi:10.1109/TBME.2009.2017027