Comparison of Feature Selection Methods
B. Ashok
Int. Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 6, Issue 1 (Part 1), January 2016, pp. 94-99
ABSTRACT
Even though great attention has been given to cervical cancer diagnosis, it is a tough task to examine a Pap smear slide through a microscope. Image processing and machine learning techniques help the pathologist to take a proper decision. In this paper, we present a diagnosis method using cervical cell images obtained by the Pap smear test. Image segmentation is performed by a multi-thresholding method, and texture and shape features related to cervical cancer are extracted. Feature selection is achieved using the Mutual Information (MI), Sequential Forward Selection (SFS), Sequential Floating Forward Selection (SFFS) and Random Subset Feature Selection (RSFS) methods.
Keywords - Segmentation, Feature Extraction, Feature Selection, Multi-thresholding, Classification
information at the time of feature extraction should be avoided. Multi-resolution local binary pattern (MRLBP) [14] variant texture feature extraction techniques have been presented, in which image segmentation was performed using the discrete wavelet transform (DWT) with the Daubechies wavelet (db2) as the decomposition filter. A Deep Sparse Support Vector Machine (DSSVM) based feature selection method has been discussed [15], where colour and texture features were extracted using superpixels. The importance of local texture (LTP) and local shape (MultiHPOG) features has been discussed [16], with a further improvement in performance achieved by adding a global shape feature (Gabor wavelets). Feature selection based on information content has a drawback: it ignores the interaction between the features and the classifier, which may lead to the selection of irrelevant features. To overcome this problem, mutual information based techniques are used [17]. Even though new feature extraction, feature selection and classification methods have been introduced, a need for newer and better techniques still exists. In our previous works [18][19], various segmentation methods were analyzed.

[Figure: Block diagram of the proposed method (input image → image preprocessing → segmentation using multi-thresholding)]

Image preprocessing includes noise removal and image enhancement; image resizing is also performed. The cell image is converted from an RGB image to a grayscale image.

3.1 Image segmentation
Segmentation splits the input image into the desired regions so that features can be extracted from them. There are many methods for segmenting an image, such as edge-based methods, region-based methods, the watershed method and thresholding methods. In this work, we use a multi-thresholding method to segment the input image. The nucleus and the cytoplasm of the cervical cell image are segmented.

[Figure: Segmentation stages (RGB image, grayscale image, segmented nucleus, segmented cytoplasm)]
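To make the multi-thresholding step concrete, the following is a minimal sketch using scikit-image's multi-Otsu thresholding. The library choice, the file name and the mapping of intensity classes to nucleus and cytoplasm are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal multi-Otsu thresholding sketch (not the authors' exact pipeline).
import numpy as np
from skimage import io, color, filters

image = io.imread("pap_smear_cell.png")      # hypothetical input file
gray = color.rgb2gray(image)                 # RGB -> grayscale, as in the text

# Two thresholds split the grayscale histogram into three intensity classes.
thresholds = filters.threshold_multiotsu(gray, classes=3)
regions = np.digitize(gray, bins=thresholds)  # class label 0, 1 or 2 per pixel

nucleus_mask = regions == 0     # darkest class taken as nucleus (assumption)
cytoplasm_mask = regions == 1   # middle class taken as cytoplasm (assumption)
```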
Equiv Diameter, Major Axis Length, Minor Axis Length, Perimeter, roundness, position, brightness and the nucleus-to-cytoplasm ratio; the cytoplasm-related features are Area, Centroid, Eccentricity, Equiv Diameter, Major Axis Length, Minor Axis Length, Perimeter, brightness and roundness. In total, 30 shape features are extracted. In this proposed work, by combining the texture and shape features, we extract 44 features in total.
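Several of these shape features can be computed directly from a binary nucleus or cytoplasm mask. The sketch below uses scikit-image's regionprops as one possible implementation; the mask variables are the hypothetical outputs of the segmentation step, and the authors' own code is not shown in the paper.

```python
# Sketch: computing some of the listed shape features from a binary mask.
import numpy as np
from skimage import measure

def shape_features(mask: np.ndarray) -> dict:
    # Assumes `mask` is a binary image containing one segmented region.
    props = measure.regionprops(mask.astype(int))[0]
    return {
        "area": props.area,
        "centroid": props.centroid,
        "eccentricity": props.eccentricity,
        "equiv_diameter": props.equivalent_diameter,
        "major_axis_length": props.major_axis_length,
        "minor_axis_length": props.minor_axis_length,
        "perimeter": props.perimeter,
        # Roundness = 4*pi*area / perimeter^2 (1.0 for a perfect circle).
        "roundness": 4.0 * np.pi * props.area / props.perimeter ** 2,
    }

# Nucleus-to-cytoplasm ratio from the two masks (hypothetical variables):
# nc_ratio = shape_features(nucleus_mask)["area"] / shape_features(cytoplasm_mask)["area"]
```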
IV. Feature Selection
Feature selection is an optimization process that selects a minimum number of effective features. Feature selection is required to reduce the curse of dimensionality caused by irrelevant features and overfitting, the cost of feature measurement, and the computational burden. In order to select a feature subset, a search method and an objective function are required. Small sample sizes and the choice of objective function are some of the potential difficulties in the feature selection process. The number of candidate feature subsets grows with the number of features: as features are added, the number of possible subsets increases as well.
There are a large number of search methods, which can be grouped into three categories: exponential algorithms, sequential algorithms and randomized algorithms. Exponential algorithms evaluate a number of subsets that grows exponentially with the dimensionality of the search space; Exhaustive Search, Branch and Bound, and Beam Search come under exponential algorithms. Sequential algorithms add or remove features sequentially, but have a tendency to become trapped in local minima; representative examples of sequential search include Sequential Forward Selection, Sequential Backward Selection, Sequential Floating Selection and Bidirectional Search. Randomized algorithms incorporate randomness into their search procedure to escape local minima; examples are Random Generation plus Sequential Selection, Simulated Annealing and Genetic Algorithms.
Objective functions fall into three types: filter methods, wrapper methods and embedded methods. Filter methods evaluate features independently of the classification algorithm; the objective function evaluates feature subsets by their information content, typically interclass distance, mutual information or other information-theoretic measures. In a wrapper method, the objective function is a pattern classifier, which evaluates feature subsets by their predictive accuracy (recognition rate on test data) through statistical re-sampling or cross-validation. Embedded methods perform selection as part of training a specific classifier. In this paper, the filter-based Mutual Information (MI) method and the wrapper-based methods Sequential Forward Selection (SFS), Sequential Floating Forward Selection (SFFS) and Random Subset Feature Selection (RSFS) are used to select the features.

4.1 Mutual Information (MI)
Basically, mutual information is a measurement of similarity. The mutual information of two random variables is a measure of the variables' mutual dependence, that is, the correlation between the two variables. Increasing mutual information is often equivalent to minimizing conditional entropy. MI comes under the filter methods. Generally, filter methods provide a ranking of the features; choosing the selection point in the ranking is performed through cross-validation. In feature selection, MI is used to characterize the relevance and redundancy between features.
The mutual information between two discrete random variables X and Y, jointly distributed according to p(x, y), is given by

I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]    (1)
        = H(X) − H(X|Y)    (2)
        = H(Y) − H(Y|X)    (3)
        = H(X) + H(Y) − H(X, Y)    (4)

where H(X) and H(Y) are the individual entropies, H(X|Y) and H(Y|X) are the conditional entropies, H(X, Y) is the joint entropy, p(x, y) is the joint distribution and p(x) is the marginal probability distribution.
In this work, all 44 features are ranked by the MI algorithm, and the top 10 features are selected from the ranking list.
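As an illustration of this ranking step, the sketch below uses scikit-learn's mutual information estimator; the variables X (the n-sample by 44-feature matrix) and y (the class labels) are assumed to be available.

```python
# Sketch: rank all features by mutual information with the class label
# and keep the top 10, as described above.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

mi_scores = mutual_info_classif(X, y, random_state=0)  # one score per feature
ranking = np.argsort(mi_scores)[::-1]                  # best feature first
top10 = ranking[:10]                                   # indices of selected features
X_selected = X[:, top10]
```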
4.2 Sequential Forward Selection (SFS)
Sequential Forward Selection is a deterministic single-solution method. It is the simplest and fastest greedy search algorithm: it starts with a single feature and iteratively adds features until the termination criterion is met. The following algorithm explains the concept of SFS.
Algorithm
Step 1: Inclusion. Use the basic sequential feature selection method to select the most significant feature with respect to the feature set and include it in the feature subset. Stop if the desired number of features has been selected; otherwise go to step 2.
Step 2: Conditional exclusion. Find the least significant feature k in the feature subset. If it is the feature just added, then keep it and return to step 1. Otherwise, exclude the feature k. Note that feature
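A compact sketch of this inclusion/conditional-exclusion loop is given below. The function name and score_fn are hypothetical; score_fn would be a wrapper objective such as cross-validated classifier accuracy on the candidate subset, and the code is an illustration rather than the authors' implementation.

```python
# Sketch of the inclusion / conditional-exclusion loop described above.
# score_fn(subset) is a hypothetical wrapper objective, e.g. cross-validated
# classifier accuracy using only the features in `subset`.
def floating_forward_selection(all_features, score_fn, k_desired):
    subset = []
    while len(subset) < k_desired:
        # Step 1: Inclusion - add the most significant remaining feature.
        remaining = [f for f in all_features if f not in subset]
        best = max(remaining, key=lambda f: score_fn(subset + [f]))
        subset.append(best)
        # Step 2: Conditional exclusion - drop the least significant feature,
        # unless it is the feature that was just added.
        while len(subset) > 2:
            worst = max(subset, key=lambda f: score_fn([g for g in subset if g != f]))
            improved = score_fn([g for g in subset if g != worst]) > score_fn(subset)
            if worst == best or not improved:
                break
            subset.remove(worst)
    return subset

# Hypothetical usage: select 10 of the 44 features.
# selected = floating_forward_selection(list(range(44)), cv_accuracy, 10)
```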
[Figure 3: SVM classifier]

VI. Results and Discussion
In this work, 150 Pap smear test images were collected from Rajah Muthiah Medical College, Annamalainagar. Out of the 150 images, 100 images are used for training the SVM classifier and 50 images are used for testing.
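The train/test protocol described above can be sketched as follows. The variables X_selected and y are assumed from the feature selection step, the assumption that the first 100 samples form the training set is arbitrary, and the SVM kernel is not specified in the paper.

```python
# Sketch: train an SVM on 100 feature vectors and evaluate on 50.
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X_train, y_train = X_selected[:100], y[:100]      # 100 training images (assumed order)
X_test, y_test = X_selected[100:150], y[100:150]  # 50 test images

clf = SVC(kernel="rbf")  # kernel choice is an assumption
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```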