Analysis on Various Feature Extraction Methods for Medical Image Classification
All content following this page was uploaded by S. Vani Kumari on 19 November 2020.
1 Introduction
Soft Computing (SC) comprises computational methodologies that exploit tolerance for imprecision and uncertainty to obtain robust, tractable and low-cost solutions. Real-world problems are strongly affected by imprecision, uncertainty and categorical data, so SC is applied in areas such as pattern recognition, image processing (especially of medical images) and data mining. Neural Networks (NN), probabilistic reasoning, Support Vector Machines, fuzzy logic and evolutionary computation are the principal techniques of soft computing [1].
Medical image processing is used to enhance the interpretability of the acquired images and to improve the assessment of particular features, for both automated and manual data management [2]. For this reason, the medical research community relies on digital image processing to generate adequate records.
Preprocessing is an important phase in medical image processing. Its purpose is to improve image quality, because low contrast, unwanted noise and weak boundaries are common in medical images. The preprocessing phase includes tasks such as background removal and filtering. Contour methods can be used to separate the background from the foreground and to identify continuous boundaries in a medical image [3]. The Active Contour Method is a well-known contour method for separating the relevant pixels from the background [4]. After the object has been separated from the background, filtering is applied to improve image quality. The Wiener Filter is one of the popular filtering techniques used to remove Gaussian, salt-and-pepper and speckle noise from medical images [5]. The Discrete Fourier Transform (DFT) is a popular technique that can be used to obtain rotation invariance. Image segmentation is used to identify the part of interest that contains the abnormalities in medical images [6]. The Watershed Algorithm is a popular region-based segmentation method that can also separate overlapping objects [7].
The texture of an image describes the arrangement of its intensities and other characteristic properties. Textural features play an important role in image processing and are therefore used in applications such as remote sensing, medical image processing and content-based image retrieval. During feature extraction, relevant features are extracted from the segmented part of interest. The Gray Level Co-occurrence Matrix (GLCM) is a well-known feature extraction method that uses co-occurrence (dependency) matrices built from the gray levels and the spatial distribution of pixels [8]. These co-occurrence matrices are used to measure the texture of the image. The Gray Level Run Length Matrix (GLRM) is another well-known feature extraction method, based on runs of gray levels in the image [9]. A run length is the number of adjacent pixels in a particular direction that have the same gray intensity.
Local Binary Pattern (LBP) is a widely used approach for extracting features from images in computer vision [10]. LBP requires only simple calculations and is invariant to monotonic illumination (gray-level) changes. It is used for texture analysis of real-time data in many applications, such as face analysis and motion analysis. LBP relies on histogram statistics and encodes the relationship between the center pixel and its neighbors.
Completed Local Binary Patterns (CLBP) are popular texture feature extraction methods based on neighborhood properties [11]. Local Tetra Patterns (LTrP) is another popular feature extraction method. It assigns each pixel one of four directions computed from horizontal and vertical first-order derivatives, and encodes each neighbor according to whether its direction matches that of the center pixel [12].
Recent advances in Computer-Aided Detection and Diagnosis require classification methods, in which classifiers are trained on medical image datasets to recognize disease effectively [13]. Classification is used extensively in statistics, machine learning and related fields. Well-known classifiers include Support Vector Machines, Decision Trees, Neural Networks and Bayesian classifiers. Neural Networks and SVMs can learn from previous examples and can therefore be applied in medical diagnosis [14, 15].
In this study, three popular methods are applied in the preprocessing phase: Active Contour, Wiener Filter and DFT. Five widely used feature extraction methods are considered for extracting features from medical images: GLCM, GLRM, LBP, CLBP and LTrP. Two well-known classifiers are then used for classification: the multilayer perceptron trained with back-propagation (MLPBPN) and SVM.
Section 2 discusses related work, Sect. 3 addresses the methods and methodology, Sect. 4 discusses the experimental results, and Sect. 5 concludes the paper.
2 Related Work
Oliveira et al. [4] reviewed various image segmentation methods that can be applied to skin lesions and discussed the importance of the Active Contour Method for boundary detection. Hemalatha et al. [16] discussed the importance of the Active Contour Method for segmenting different medical images. Contours are groups of points obtained by an interpolation operation. Active Contour models are used in medical imaging for early diagnosis and detection of abnormalities, and also to separate the foreground from the background. George et al. [17] compared five filtering techniques (Gaussian, Wiener, Adaptive, Mean and Median) on a mammogram image dataset and showed that the Wiener Filter performs best among them. Bianconi and Fernández [6] proposed a method that converts rotation-dependent features into rotation-invariant features by applying the Discrete Fourier Transform, and evaluated it on four texture datasets.
Avinash et al. [7] developed an image processing method for a lung cancer dataset that uses a Gabor filter for image enhancement and the Watershed Method for segmentation. Nayak et al. [18] applied the Watershed Algorithm to segment images by identifying the local minima in the surrounding region and growing catchment basins from them. Their method was applied to breast images for cancer detection. The efficiency of the segmentation was evaluated in terms of correct classification, overlapped area and Dice coefficient.
22 S. Vani Kumari and K. Usha Rani
Abbas et al. [8] extracted ten different GLCM features from segmented skin lesions and performed classification using SVM. The classifier was evaluated in terms of sensitivity, accuracy and specificity. Öztürk and Akdemir [9] applied GLCM, LBP and other feature extraction methods to histopathological images to identify cancerous regions. They used KNN, SVM, LDA and Boosted Trees classifiers to classify these images.
Camlica et al. [10] proposed a methodology that classifies medical images for content-based retrieval using saliency-based folded data. Textural features were extracted using LBP and classified with SVM. The method was compared in terms of training time with and without folding. Chen et al. [11] proposed a method for classifying remote sensing data that uses a Gabor filter for image enhancement and CLBP for feature extraction. Oberoi et al. [12] developed a retrieval framework for retinal images that extracts features using LTrP. In this framework, the texture of an image is described by the direction-based relationship between a center pixel and its neighboring pixels.
Gautam et al. [14] classified mammogram images using four GLCM features and a back-propagation neural network (BP-NN). Mehdy et al. [19] reviewed various neural networks that can be applied to detect breast cancer in its early stages. Karayılan and Kılıç [20] applied MLPBPN to a heart disease dataset, using the 13 clinical features provided with the Cleveland dataset, and evaluated the method with four performance metrics.
Chang et al. [21] developed a CAD system that classifies thyroid nodules as benign or malignant using an SVM classifier. The features of the segmented thyroid images were extracted using GLRM. Naraei et al. [22] used both MLP and SVM to classify an online heart disease database based on clinical features. Their experiments used cross-validation.
This section presents various methods considered in this study for preprocessing,
segmentation, extraction of features and classification. This section also provides
details on the proposed methodology and description of the data sets used in this
study for experimentation.
3.1 Preprocessing
Medical diagnosis is a crucial task that involves processing medical images. Medical image acquisition is prone to noise, so the images must be preprocessed.
(a) Active Contour: the contour is described parametrically as a set of co-ordinates at the control points and is used to eliminate the background. In Eq. (1), $m(s)$ and $n(s)$ are the $m$ and $n$ co-ordinates along the contour, and $s$ is the normalized index of the control points [16]:
$$v(s) = \big(m(s),\, n(s)\big), \qquad s \in [0, 1] \tag{1}$$
The energy function combines internal and external energy. The internal energy comprises elastic and bending forces. The external energy pulls the curve toward the edges and may also include the effect of external constraints.
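The internal energy can be made concrete with a small sketch. It evaluates the discrete internal energy of a closed contour in the classical snake formulation, $E_{int} = \tfrac{1}{2}\sum_i \big(\alpha\,|v_{i+1}-v_i|^2 + \beta\,|v_{i+1}-2v_i+v_{i-1}|^2\big)$. The weights `alpha` (elasticity) and `beta` (bending) are assumed names, because the paper gives no parameter values. Only the internal term is shown; the external edge term and the contour evolution are omitted.

```python
import numpy as np


def internal_energy(contour, alpha=0.1, beta=0.1):
    """Discrete internal energy of a closed snake (sketch, assumed weights).

    contour: (N, 2) array of control points v_i = (m_i, n_i).
    alpha weights the elastic (first-difference) term,
    beta weights the bending (second-difference) term.
    """
    v = np.asarray(contour, dtype=float)
    nxt = np.roll(v, -1, axis=0)   # v_{i+1}, wrapping around (closed contour)
    prev = np.roll(v, 1, axis=0)   # v_{i-1}
    elastic = np.sum((nxt - v) ** 2, axis=1)             # |v_{i+1} - v_i|^2
    bending = np.sum((nxt - 2 * v + prev) ** 2, axis=1)  # |v_{i+1} - 2 v_i + v_{i-1}|^2
    return 0.5 * np.sum(alpha * elastic + beta * bending)


# Unit square traversed as a closed contour
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
print(internal_energy(square, alpha=1.0, beta=1.0))  # 6.0
```

The energy does not change when the contour is translated. It grows quadratically when the contour is scaled, which is why minimizing it keeps the contour short and smooth.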
(b) Wiener Filter is used to attain the minimum mean-square amplitude error within the passband [17]. Let $d_s(t)$ be the distorted signal with Fourier transform $F_t(w)$. Let $d_e(t)$ be the desired signal after the distortion has been eliminated, with Fourier transform $F_d(w)$. The noise spectrum is denoted $n(w)$. Equation (2) gives the output $O(w)$ when the noisy input passes through the filter $W_i(w)$, and Equation (3) gives the error with respect to the desired signal:
$$O(w) = W_i(w)\,\big[F_t(w) + n(w)\big] \tag{2}$$
$$E(w) = F_d(w) - O(w) \tag{3}$$
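The paper uses MATLAB R2018a but does not name the filtering function. A common way to apply a Wiener filter to images is the pixelwise adaptive variant used by MATLAB's `wiener2`, which works from the local mean and variance. The sketch below reimplements that idea in NumPy under several assumptions: an odd `size` × `size` window, reflective border padding, and a noise variance estimated as the average local variance when none is supplied. It illustrates the technique and is not the authors' code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_wiener(img, size=3, noise_var=None):
    """Pixelwise adaptive Wiener filter (sketch; size should be odd)."""
    img = np.asarray(img, dtype=float)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")            # border handling: our assumption
    windows = sliding_window_view(padded, (size, size))  # one window per pixel
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    if noise_var is None:
        noise_var = local_var.mean()                     # estimated noise power
    # Gain is ~0 in flat (noise-dominated) areas and ~1 near edges and detail.
    num = np.maximum(local_var - noise_var, 0.0)
    den = np.maximum(local_var, noise_var)
    gain = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return local_mean + gain * (img - local_mean)


rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)
noisy = clean + rng.normal(0, 10, clean.shape)
filtered = adaptive_wiener(noisy, size=5)
print(noisy.std(), filtered.std())  # the filtered image has a much smaller spread
```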
(c) DFT transforms a finite sequence of equally spaced samples of a function into equally spaced samples of its discrete-time Fourier transform [6]. Here, DFT is used to obtain rotation invariance of images.
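One way DFT yields rotation invariance, following the idea in [6], works on direction-dependent features. First compute a texture feature along several equally spaced orientations. Then take the magnitude of the DFT along the orientation axis. Rotating the image by a multiple of the angular step only circularly shifts that vector, and a circular shift changes only the phase of the DFT coefficients, not their magnitude. The paper does not detail how DFT is applied to whole images. The function name and the 8-orientation example below are therefore assumptions.

```python
import numpy as np


def dft_rotation_invariant(directional_features):
    """Map orientation-dependent features to rotation-invariant ones (sketch).

    directional_features: 1-D array with one value per equally spaced
    orientation (e.g. a texture statistic at 0, 45, ..., 315 degrees).
    A rotation by k steps circularly shifts this array; a circular shift
    multiplies each DFT coefficient by a unit-modulus phase factor, so the
    magnitudes are unchanged.
    """
    return np.abs(np.fft.fft(np.asarray(directional_features, dtype=float)))


features = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # 8 orientations
rotated = np.roll(features, 3)  # same texture, rotated by 3 x 45 degrees
print(np.allclose(dft_rotation_invariant(features),
                  dft_rotation_invariant(rotated)))  # True
```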
Labels are then assigned to identify the watersheds in the image. The regions assigned to different watersheds are referred to as water parting regions. In this way, the algorithm distinguishes between pixels that belong to different regions.
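The flooding idea behind the Watershed Algorithm can be sketched as a marker-controlled, priority-queue flood. This is a simplified variant of Meyer's flooding algorithm and not necessarily the variant used in the paper. Pixels are visited in order of increasing elevation, typically the gradient magnitude of the preprocessed image. Each unlabeled neighbor takes the label of the basin that reaches it first. The `markers` input (one positive integer per basin, 0 elsewhere) is an assumption of this sketch. This version labels every pixel and does not draw explicit watershed lines.

```python
import heapq

import numpy as np


def watershed(elevation, markers):
    """Marker-controlled watershed by priority-queue flooding (4-connectivity).

    elevation: 2-D array, e.g. gradient magnitude of the preprocessed image.
    markers:   2-D int array, >0 for seed pixels of each basin, 0 elsewhere.
    Returns a label image in which every pixel belongs to one basin.
    """
    elevation = np.asarray(elevation, dtype=float)
    labels = np.array(markers, dtype=int)  # copy, so the input is not modified
    h, w = elevation.shape
    heap, order = [], 0  # 'order' breaks ties so equal elevations pop first-in, first-out
    for r, c in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (elevation[r, c], order, int(r), int(c)))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]  # flood from the current basin
                heapq.heappush(heap, (elevation[nr, nc], order, nr, nc))
                order += 1
    return labels


# Two basins separated by a ridge in column 3
elevation = np.tile([0, 1, 2, 3, 2, 1, 0], (3, 1))
markers = np.zeros_like(elevation)
markers[1, 0], markers[1, 6] = 1, 2
print(watershed(elevation, markers))
```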
This subsection presents five well-known feature extraction techniques for extracting
features from different medical data sets.
(a) GLCM is a well-known statistical feature extraction method that characterizes texture through the relationship among gray levels [23]. Each entry gives the probability that two pixels separated by distance de in direction ϑ have gray levels m and n. The matrix is square, with as many rows and columns as the image has distinct gray levels. The features extracted from GLCM are Autocorrelation, Correlation, Contrast, Cluster Prominence, Dissimilarity, Cluster Shade, Energy, Homogeneity, Maximum Probability, Entropy, Variance, Sum Entropy, Difference Variance, Information Measure of Correlation, Sum Variance, Inverse Difference Normalized and Inverse Difference Moment Normalized.
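A minimal GLCM sketch helps show how the matrix and a few of the listed features are computed. Several choices are assumptions, because the paper does not state its settings: a single offset (distance 1 at 0°), a non-symmetric matrix, energy as the sum of squared probabilities, homogeneity with weight 1/(1+|i−j|), and entropy in bits. Other toolboxes define some of these features differently.

```python
import numpy as np


def glcm(img, dx=1, dy=0, levels=None):
    """Normalized gray-level co-occurrence matrix for one offset (sketch).

    P[i, j] is the probability that a pixel with gray level i has a
    neighbour with gray level j at column offset dx and row offset dy
    (dx=1, dy=0 means distance 1 at 0 degrees). Not symmetrized.
    """
    img = np.asarray(img, dtype=int)
    if levels is None:
        levels = img.max() + 1
    h, w = img.shape
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    ref = img[r0:r1, c0:c1]                        # reference pixels
    nbr = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]    # their neighbours
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)    # count gray-level pairs
    return P / P.sum()


def glcm_features(P):
    """A few of the features listed in the text (assumed definitions)."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    nz = P[P > 0]
    return {
        "contrast": np.sum(P * (i - j) ** 2),
        "dissimilarity": np.sum(P * np.abs(i - j)),
        "energy": np.sum(P ** 2),                  # angular second moment
        "homogeneity": np.sum(P / (1.0 + np.abs(i - j))),
        "entropy": -np.sum(nz * np.log2(nz)),
        "correlation": np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j),
        "max_probability": P.max(),
    }


img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img)  # distance 1, 0 degrees
print(glcm_features(P)["contrast"])  # ≈ 0.583 (= 7/12)
```

In practice, features are often averaged over several offsets (for example 0°, 45°, 90° and 135°) to reduce directional bias.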
(b) GLRM uses a 2-D matrix to analyze texture through runs of pixels with equal gray levels. A run length is the number of adjacent pixels with the same gray intensity in a specific direction [9]. The seven features extracted from the GLR matrix are Low Gray Level Run Emphasis, High Gray Level Run Emphasis, Gray Level Non-uniformity, Long Run Emphasis, Run Length Non-uniformity, Run Percentage and Short Run Emphasis.
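The seven run-length features named above can be sketched as follows. Two choices are assumptions because the paper does not specify them: runs are counted only in the horizontal (0°) direction, and gray levels are indexed from 1 in the gray-level emphasis terms so that level 0 does not cause a division by zero.

```python
import numpy as np


def run_length_matrix(img, levels=None):
    """Horizontal (0 degree) gray-level run-length matrix (sketch).

    R[g, L - 1] counts runs of gray level g that are L pixels long.
    """
    img = np.asarray(img, dtype=int)
    if levels is None:
        levels = img.max() + 1
    w = img.shape[1]
    R = np.zeros((levels, w))
    for row in img:
        start = 0
        for col in range(1, w + 1):
            # a run ends at the end of the row or where the gray level changes
            if col == w or row[col] != row[start]:
                R[row[start], col - start - 1] += 1
                start = col
    return R


def glrlm_features(R, n_pixels):
    """The seven run-length features named in the text."""
    g = np.arange(1, R.shape[0] + 1)[:, None]  # gray levels, 1-based (assumption)
    j = np.arange(1, R.shape[1] + 1)[None, :]  # run lengths
    n_runs = R.sum()
    return {
        "SRE": np.sum(R / j ** 2) / n_runs,          # Short Run Emphasis
        "LRE": np.sum(R * j ** 2) / n_runs,          # Long Run Emphasis
        "GLN": np.sum(R.sum(axis=1) ** 2) / n_runs,  # Gray Level Non-uniformity
        "RLN": np.sum(R.sum(axis=0) ** 2) / n_runs,  # Run Length Non-uniformity
        "RP": n_runs / n_pixels,                     # Run Percentage
        "LGRE": np.sum(R / g ** 2) / n_runs,         # Low Gray Level Run Emphasis
        "HGRE": np.sum(R * g ** 2) / n_runs,         # High Gray Level Run Emphasis
    }


img = np.array([[0, 0, 1],
                [1, 1, 1]])
feats = glrlm_features(run_length_matrix(img), img.size)
print(feats["RP"])  # 0.5 (3 runs / 6 pixels)
```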
(c) LBP uses local region statistics, binary encoding and histogram statistics [10]. In local region statistics, a pixel is selected as the center pixel, and its neighbors on a circle around it are visited either clockwise or anticlockwise. In binary encoding, every neighbor with a value less than the center pixel is encoded as 0 and every other neighbor as 1. In histogram statistics, the binary codes are converted to decimal numbers and their histogram is computed.
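The three LBP stages map directly onto a basic 3×3 implementation. In this sketch, neighbors are visited clockwise from the top-left pixel, and neighbor k contributes bit 2^k; this ordering is an assumed convention. Table 2 reports 4 LBP features, but the paper does not say how the histogram is summarized, so the sketch stops at the 256-bin histogram.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel;
# neighbour k contributes bit 2**k to the code (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp(img):
    """Basic 3x3 LBP codes for all interior pixels."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=int)
    for k, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(int) << k  # 0 if smaller, else 1
    return codes


def lbp_histogram(img):
    """256-bin histogram of the LBP codes (the texture descriptor)."""
    return np.bincount(lbp(img).ravel(), minlength=256)


patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp(patch))  # [[241]]
```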
(d) CLBP is a widely used method for describing the local texture of an image. In addition to sign information, CLBP captures magnitude and center-pixel gray-level information [11]. It therefore describes the difference between the center pixel and its neighbors in terms of sign, magnitude and pixel level. CLBP decomposes the local differences into two components, signs (CLBP_S) and magnitudes (CLBP_M), and encodes the center pixel level separately. CLBP features are extracted based on uniform and rotation-invariant patterns.
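The three CLBP components can be sketched for the 3×3 neighborhood (P = 8, R = 1). The thresholds follow the original CLBP formulation as we understand it. CLBP_M compares each difference magnitude with the mean magnitude over the whole image. CLBP_C compares the center pixel with the mean gray level of the image. The uniform and rotation-invariant mappings mentioned above are not applied here.

```python
import numpy as np

# Same clockwise neighbour ordering as a basic LBP (assumed convention)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def clbp(img):
    """CLBP_S, CLBP_M and CLBP_C maps for interior pixels (sketch)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    diffs = np.stack([img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc] - center
                      for dr, dc in OFFSETS])             # d_p = g_p - g_c
    weights = (1 << np.arange(8)).reshape(8, 1, 1)
    s = ((diffs >= 0) * weights).sum(axis=0)              # sign component (= LBP)
    mags = np.abs(diffs)
    m = ((mags >= mags.mean()) * weights).sum(axis=0)     # magnitude vs. mean magnitude
    c = (center >= img.mean()).astype(int)                # center vs. mean gray level
    return s, m, c


patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
s, m, c = clbp(patch)
print(s, m, c)  # [[241]] [[76]] [[1]]
```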
(e) LTrP is another well-known feature extraction method. It computes first-order derivatives in the horizontal and vertical directions and assigns each pixel one of four directions. It then encodes the relationship between the center pixel and its neighbors according to whether their directions agree, and converts the resulting tetra pattern into binary patterns [12]. The magnitude of the center pixel is also encoded in binary form, and all the binary patterns are converted to decimal values. Finally, a histogram is computed by traversing all the pixels.
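The core LTrP steps (direction map, tetra pattern, and splitting into binary patterns) are sketched below. Directions follow the usual LTrP rule: 1 if both derivatives are ≥ 0, 2 if the horizontal derivative is negative and the vertical one is not, 3 if both are negative, and 4 otherwise. Taking the right-hand and lower pixels as the horizontal and vertical neighbors is an assumption. The magnitude pattern and the final histograms are omitted for brevity.

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def directions(img):
    """Direction (1-4) of each pixel from first-order derivatives.

    Assumption: the horizontal neighbour is the pixel to the right and the
    vertical neighbour is the pixel below; the last row/column is dropped.
    """
    img = np.asarray(img, dtype=float)
    dh = img[:-1, 1:] - img[:-1, :-1]   # I(g_h) - I(g_c)
    dv = img[1:, :-1] - img[:-1, :-1]   # I(g_v) - I(g_c)
    return np.where(dh >= 0, np.where(dv >= 0, 1, 4),
                    np.where(dv >= 0, 2, 3))


def tetra_pattern(D):
    """Tetra pattern of each interior pixel of the direction map D.

    Neighbour k gets 0 if it has the same direction as the centre pixel,
    otherwise its own direction (1-4). Shape: (rows, cols, 8).
    """
    D = np.asarray(D)
    h, w = D.shape
    center = D[1:h - 1, 1:w - 1]
    codes = []
    for dr, dc in OFFSETS:
        nbr = D[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes.append(np.where(nbr == center, 0, nbr))
    return np.stack(codes, axis=-1)


def binary_pattern(tp, q):
    """Binary pattern for direction q: bit k set where neighbour k has direction q."""
    return ((tp == q) * (1 << np.arange(8))).sum(axis=-1)


img = np.add.outer(np.arange(6), np.arange(6)) % 4  # toy texture
tp = tetra_pattern(directions(img))
print(tp.shape)  # (3, 3, 8)
```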
3.4 Classification
Five different medical datasets are used in this study for experimentation. All the datasets are publicly available [24–28]. Table 1 gives the total number of images in each dataset and the number of benign and malignant images in each.
Table 1 Description of datasets
Dataset name   Total images   Benign   Malignant
Brain          692            71       621
Breast         322            207      115
Skin           200            80       120
Thyroid        76             14       62
Colon          235            154      81
In this study, a novel methodology is proposed for finding the best feature extraction method for classifying medical images. In the first step, the medical images are preprocessed by applying the Active Contour Method, the Wiener Filter and the DFT in sequence. The Active Contour Method separates the background from the foreground, the Wiener Filter enhances image quality, and the DFT is used to obtain rotation invariance. In the second step, segmentation is performed with the Watershed Algorithm to obtain the segmented part of interest. In the third step, five different feature extraction methods are applied to all the segmented images. Finally, in the fourth step, two well-known classifiers are applied to find the best feature extraction method. The entire methodology is presented in Fig. 1, and the proposed method is as follows:
Step 1: Preprocess the medical images:
(a) Apply the Active Contour Method to separate the background from the foreground.
(b) Apply the Wiener Filter to the foreground image.
(c) Apply DFT to the filtered image.
Step 2: Segment the image by applying the Watershed Algorithm to obtain the segmented part of interest.
Step 3: Extract features from the segmented part of interest using GLCM, GLRM, LBP, CLBP and LTrP.
Step 4: Classify the segmented part of interest using MLPBPN and SVM.
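Step 4 does not specify the SVM configuration (kernel, regularization), and the authors' MATLAB setup is not described. As an illustration only, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on the hinge loss. The linear kernel, `lam`, `epochs` and the synthetic two-cluster data (standing in for extracted feature vectors) are all assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient on the hinge loss).

    X: (n, d) feature matrix, y: labels in {-1, +1}.
    A constant column is appended so the bias is learned as a weight
    (and, as a simplification, regularized with the other weights).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1   # inside margin or misclassified
            w *= 1.0 - eta * lam                 # shrink (regularization)
            if violates:
                w += eta * y[i] * Xb[i]         # hinge-loss sub-gradient step
    return w


def predict_svm(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)


# Toy example: two well-separated clusters standing in for extracted features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
w = train_linear_svm(X, y)
print(np.mean(predict_svm(w, X) == y))  # training accuracy
```

For texture features that are not linearly separable, a kernel SVM (for example with an RBF kernel) would be the more usual choice.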
The proposed method was evaluated on the images of all five datasets. All the images are preprocessed with the three-step process: background removal with the Active Contour Method, image quality enhancement with the Wiener Filter, and DFT to obtain rotation invariance. After preprocessing, the images are segmented with the Watershed Algorithm. MATLAB R2018a is used for the experiments. Figure 2 shows original images and the corresponding images segmented by the Watershed Algorithm for the various medical image datasets. For uniform processing, each image obtained after segmentation is 256 × 256 pixels.
Five different feature extraction methods are applied to all the segmented images to obtain features from the segmented part of interest. Each method extracts a different number of features. Table 2 lists the number of features extracted from each segmented image in this work.
The aim of this work is to find the best feature extraction method for medical images. Each feature extraction method is tested by feeding the features it extracts from all the images of the five datasets to each of two classifiers separately. The two well-known classifiers, MLPBPN and SVM, are trained and tested to analyze the performance of each feature extraction method. Accuracy is used as the performance metric. It is the ratio of the number of correct predictions to the total number of predictions made.
Fig. 1 Proposed methodology
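The MLPBPN side of this evaluation can be sketched as follows. The code trains a one-hidden-layer perceptron with back-propagation (sigmoid units, mean cross-entropy loss, full-batch gradient descent) on 80% of the samples and reports accuracy on the remaining 20%, in line with the 80% learning level used in the results. The network size, learning rate, epoch count, loss function and synthetic data are assumptions; the paper does not report its MLPBPN architecture or training settings.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer MLP trained with back-propagation (binary output)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.5, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, (hidden, 1)), np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                 # forward: hidden layer
        out = sigmoid(h @ W2 + b2)               # forward: output probability
        delta2 = (out - t) / n                   # dLoss/dz2 for sigmoid + cross-entropy
        delta1 = (delta2 @ W2.T) * h * (1 - h)   # back-propagate to hidden layer
        W2 -= lr * h.T @ delta2
        b2 -= lr * delta2.sum(axis=0)
        W1 -= lr * X.T @ delta1
        b1 -= lr * delta1.sum(axis=0)
    return W1, b1, W2, b2


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    prob = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()
    return (prob >= 0.5).astype(int)


def split_80_20(X, y, seed=0):
    """Random 80% training / 20% testing split."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(0.8 * len(y))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.r_[np.zeros(50, dtype=int), np.ones(50, dtype=int)]
X_tr, y_tr, X_te, y_te = split_80_20(X, y)
params = train_mlp(X_tr, y_tr)
print("test accuracy:", np.mean(predict_mlp(params, X_te) == y_te))
```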
Fig. 2 Original and segmented images: (a) Brain (b) Breast (c) Colon (d) Skin (e) Thyroid
Table 2 No. of features extracted from each feature extraction method
Feature extraction method   No. of features
GLCM                        22
GLRM                        7
LBP                         4
CLBP                        8
LTrP                        16
Accuracy is computed as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where
TP  No. of correct classifications of positive examples (True Positive)
FP  No. of incorrect classifications of negative examples (False Positive)
FN  No. of incorrect classifications of positive examples (False Negative)
TN  No. of correct classifications of negative examples (True Negative).
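The accuracy formula can be computed directly from the four confusion-matrix counts. This minimal sketch assumes binary labels with 1 as the positive class (for example malignant) and 0 as the negative class (benign); the paper does not state this convention.

```python
import numpy as np


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


print(accuracy([1, 0, 1, 0, 1], [1, 0, 0, 0, 1]))  # 0.8
```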
Tables 3 and 4 give the results of the feature extraction methods on the different datasets for MLPBPN and SVM, respectively, at an 80% learning level (80% of the images used for training).
With MLPBPN, the highest accuracy on the Brain dataset is 64.1%, achieved by both LBP and CLBP. On the Breast, Skin and Colon datasets, GLCM achieves the highest accuracies of 97.1%, 70% and 90%, respectively. Interestingly, all five methods achieve the same accuracy of 80% on the Thyroid dataset. Overall, the MLPBPN results show that GLCM performs better than the other four methods. Figure 3 shows the MLPBPN results graphically.
With SVM, GLCM achieves the highest accuracies on the Brain, Skin, Thyroid and Colon datasets: 100%, 62.5%, 80% and 97.87%, respectively. On the Breast dataset, the highest accuracy of 64.1% is achieved by GLRM, LBP and CLBP. Interestingly, all methods except LTrP achieve the same accuracy of 80% on the Thyroid dataset. Overall, the SVM results show that GLCM has the highest classification accuracy for the majority of datasets. Figure 4 shows the SVM results graphically.
5 Conclusion
considered for obtaining rotation invariance. In the segmentation step, the well-known Watershed Algorithm is used to find the segmented part of interest in the image. In the feature extraction step, the five well-known methods GLCM, GLRM, LBP, CLBP and LTrP are applied separately to extract features from the segmented part of interest. To evaluate the feature extraction methods, two popular classifiers, MLPBPN and SVM, are used in the classification step, and their results are compared in terms of accuracy. The experiments show that, of the five feature extraction methods, GLCM gives the best overall performance with both classifiers (SVM and MLPBPN), achieving the highest accuracy on most of the medical image datasets considered.
References
1. D. Ibrahim, An overview of soft computing. Proc. Comput. Sci. 102, 34–38 (2016)
2. N. Dey, A.S. Ashour, Computing in medical image analysis, in Soft Computing Based Medical
Image Analysis (Academic Press, 2018), pp. 3–11
3. R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd edn. (Prentice-Hall, 2007). ISBN-10: 013168728X
4. R.B. Oliveira et al., Computational methods for the image segmentation of pigmented skin
lesions: a review. Comput. Methods Prog. Biomed. 131, 12141 (2016)
5. J-C. Yoo, C.W. Ahn, Image restoration by blind-Wiener filter. IET Image Process. 8(12),
815–823 (2014)
6. F. Bianconi, A. Fernández, Rotation invariant co-occurrence features based on digital circles
and discrete Fourier transform. Pattern Recogn. Lett. 48, 34–41 (2014)
7. S. Avinash, K. Manjunath, S. Senthil Kumar, An improved image processing analysis for the
detection of lung cancer using Gabor filters and watershed segmentation technique, in 2016
International Conference on Inventive Computation Technologies (ICICT), vol. 3 (IEEE, 2016)
8. Z. Abbas et al., An efficient gray-level co-occurrence matrix (GLCM) based approach towards
classification of skin lesion, in 2019 Amity International Conference on Artificial Intelligence
(AICAI) (IEEE, 2019)
9. Ş. Öztürk, B. Akdemir, Application of feature extraction and classification methods for
histopathological image using GLCM, LBP, LBGLCM, GLRLM and SFTA. Proc. Comput.
Sci. 132, 40–46 (2018)
10. Z. Camlica, H.R. Tizhoosh, F. Khalvati, Medical image classification via svm using lbp features
from saliency-based folded data, in 2015 IEEE 14th International Conference on Machine
Learning and Applications (ICMLA) (IEEE, 2015)
11. C. Chen et al., Gabor-filtering-based completed local binary patterns for land-use scene
classification, in 2015 IEEE International Conference on Multimedia Big Data (IEEE, 2015)
12. A. Oberoi et al., A framework for medical image retrieval using local tetra patterns. Int. J. Eng.
Technol. 5(1), 27–36 (2013)
13. Q. Li, R.M. Nishikawa (eds.), Computer-Aided Detection and Diagnosis in Medical Imaging
(Taylor & Francis, 2015)
14. A. Gautam et al., An improved mammogram classification approach using back propagation
neural network, in Data Engineering and Intelligent Computing (Springer, Singapore, 2018),
pp. 369–376
15. M.Q. Khan et al., Classification of melanoma and nevus in digital images for diagnosis of skin
cancer. IEEE Access 7, 90132–90144 (2019)
16. R.J. Hemalatha et al., Active contour based segmentation techniques for medical image analysis,
in Medical and Biological Image Analysis (2018), p. 17
17. M.J. George, D.A.S. Dhas, Preprocessing filters for mammogram images: a review, in 2017
Conference on Emerging Devices and Smart Systems (ICEDSS) (IEEE, 2017)
18. T. Nayak et al., Automatic segmentation and breast density estimation for cancer detection
using an efficient watershed algorithm, in Data Analytics and Learning (Springer, Singapore,
2019), pp. 347–358
19. M.M. Mehdy et al., Artificial neural networks in image processing for early detection of breast
cancer, in Computational and Mathematical Methods in Medicine (2017)
20. T. Karayılan, Ö. Kılıç, Prediction of heart disease using neural network, in 2017 International
Conference on Computer Science and Engineering (UBMK) (IEEE, 2017)
21. Y. Chang et al., Computer-aided diagnosis for classifying benign versus malignant thyroid
nodules based on ultrasound images: a comparison with radiologist-based assessments. Med.
Phys. 43(1), 554–567 (2016)
22. P. Naraei, A. Abhari, A. Sadeghian, Application of multilayer perceptron neural networks
and support vector machines in classification of healthcare data, in 2016 Future Technologies
Conference (FTC) (IEEE, 2016)
23. R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification. IEEE
Trans. Syst. Man Cybern. SMC-3(6) (1973)
24. https://figshare.com/articles/brain_tumor_dataset/1512427
25. http://peipa.essex.ac.uk/info/mias.html
26. https://www.dropbox.com/s/k88qukc20ljnbuo/PH2Dataset.rar?file_subpath=%2FPH2
Dataset
27. http://cimalab.intec.co/applications/thyroid/
28. https://wiki.cancerimagingarchive.net/display/Public/CT+COLONOGRAPHY#b72bcf6147ed4fb9935e37f82d01af06