A Review of Feature Selection and Its Methods
B. Venkatesh, J. Anuradha
SCOPE, Vellore Institute of Technology, Vellore, TN, 632014, India
E-mail: [email protected]
Abstract: In the present digital era, the data generated by various applications is increasing drastically both row-wise and column-wise; this creates a bottleneck for analytics and increases the burden on machine learning algorithms used for pattern recognition. This curse of dimensionality can be handled through reduction techniques. Dimensionality Reduction (DR) can be performed in two ways, namely Feature Selection (FS) and Feature Extraction (FE). This paper presents a survey of feature selection methods. From this extensive survey we can conclude that most FS methods work on static data. However, after the emergence of IoT and web-based applications, data are generated dynamically and grow at a fast rate, so they are likely to be noisy, which further hinders the performance of learning algorithms. As the size of the data set increases, the scalability of FS methods is jeopardized, and the existing DR algorithms do not address these issues with dynamic data. Using FS methods not only reduces the burden of the data but also avoids overfitting of the model.
Keywords: Dimensionality Reduction (DR), Feature Selection (FS), Feature
Extraction (FE).
1. Introduction
As data increases exponentially, the quality of data available for processing by data mining, pattern recognition, image processing, and other machine learning algorithms decreases gradually. Bellman calls this scenario the "Curse of Dimensionality". Higher-dimensional data leads to the prevalence of noisy, irrelevant and redundant data, which in turn causes overfitting of the model and increases the error rate of the learning algorithm. To handle these problems, "Dimensionality Reduction" techniques are applied as part of the preprocessing stage. Feature Selection (FS) and Feature Extraction (FE) are the most commonly used dimensionality reduction approaches. FS is used to clean up noisy, redundant and irrelevant data; as a result, performance is boosted.
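To make the filtering idea above concrete, the following is a minimal stdlib-only sketch of one simple filter-style step: dropping near-constant (low-variance) columns, which carry little information for a learner. The dataset, threshold, and function names here are illustrative assumptions, not part of the surveyed methods.

```python
# Minimal sketch of a filter-style feature selection step: keep only the
# columns whose variance exceeds a small threshold, discarding
# near-constant, uninformative features. Data and threshold are toy values.

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def low_variance_filter(rows, threshold=0.01):
    """Return indices of columns whose variance exceeds the threshold."""
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    return [j for j, col in enumerate(columns) if variance(col) > threshold]

# Toy dataset: column 1 is constant and is filtered out.
data = [
    [2.0, 1.0, 0.3],
    [1.5, 1.0, 0.9],
    [3.1, 1.0, 0.1],
    [2.7, 1.0, 0.7],
]
print(low_variance_filter(data))  # prints [0, 2]
```

Real toolkits implement the same idea (for example, variance thresholding in scikit-learn), usually followed by relevance-based criteria such as correlation or mutual information with the class label.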
In FS, a subset of features is selected from the original set of features based on feature redundancy and relevance. Based on relevance and redundancy, Yu and Liu [1] in 2004 classified feature subsets into four types: 1) noisy and irrelevant; 2) redundant and weakly relevant; 3) weakly relevant and non-redundant; 4) strongly relevant. A feature that is not required for prediction accuracy is known as an irrelevant feature. Some of the popular approaches that fit into filter and wrapper methods are models, search strategies, feature quality measures, and feature evaluation.
The set of features is a key factor in determining the hypothesis space of a predictive model. The number of features and the size of the hypothesis space are directly proportional, i.e., as the number of features increases, the hypothesis space grows as well. For example, if a dataset has M binary features with a binary class label, the hypothesis space contains 2^(2^M) candidate hypotheses. The hypothesis space can be further reduced by discarding redundant and irrelevant features.
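The doubly exponential growth of the hypothesis space can be seen directly with a few lines of code; the function name below is an illustrative choice:

```python
# Number of distinct binary hypotheses over M binary features:
# each of the 2**M input combinations can be labelled 0 or 1
# independently, giving 2**(2**M) candidate hypotheses.
def hypothesis_space_size(m):
    return 2 ** (2 ** m)

for m in range(1, 5):
    print(m, hypothesis_space_size(m))
# prints:
# 1 4
# 2 16
# 3 256
# 4 65536
```

Removing even a single irrelevant feature therefore shrinks the search space by far more than a constant factor, which is why FS has such a large effect on learning cost.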
The relevance of a feature is measured based on the characteristics of the data, not on its value. Statistics is one such technique that reveals the relationship between features and their importance.
The distortion caused by irrelevant and redundant features is not due to the presence of useless information; it is because those features lack a statistical relationship with the other features. A feature may be irrelevant individually, yet become relevant when joined with other features [2].
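The point above can be illustrated with a toy XOR target: each feature alone shares zero mutual information with the label, but the two features together determine it completely. This is a stdlib-only sketch with synthetic data; the helper function is our own, not from any surveyed method.

```python
# Mutual information I(X; Y) estimated from empirical frequencies.
# With an XOR label, each feature is individually irrelevant (MI = 0)
# but the feature pair is fully relevant (MI = 1 bit).
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p = c / n
        mi += p * log2(p / ((px[x] / n) * (py[y] / n)))
    return mi

f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]  # XOR of the two features

print(mutual_information(f1, label))              # 0.0: irrelevant alone
print(mutual_information(f2, label))              # 0.0: irrelevant alone
print(mutual_information(list(zip(f1, f2)), label))  # 1.0: relevant jointly
```

This is exactly why purely univariate filter criteria can discard features that a multivariate criterion (or a wrapper method) would keep.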
4. Applications
These days there is a growing demand for computational power, processing capacity and storage to handle the volume of data in various fields such as large-scale image processing, microarray data, graph theory, gene selection, network security and so on. Massive data is a major concern for learning models. To improve the performance of the learner, it is essential to apply dimensionality reduction techniques to generate compact and error-free data for better results. The following paragraphs explain each application area in detail.
Table 4 presents the different FS methods and their applications; the filter-based approaches are the most widely used.