Feature Selection
Outline
• Introduction
• What is Feature selection?
• Is Feature Selection required?
• Motivation for Feature Selection
• Relevance of Features
• Variable Ranking
• Feature Subset Selection
Introduction
• The volume of data is practically exploding by the
day. Moreover, the data that is available now is
becoming increasingly unstructured.
• A universal problem of intelligent (learning) agents
is where to focus their attention.
• It is critical to understand which aspects of the
problem at hand are important or necessary to
solve it,
– i.e. to discriminate between the relevant and irrelevant
parts of experience.
What is Feature selection?
(or Variable Selection)
• Problem of selecting some subset of a learning
algorithm’s input variables upon which it should
focus attention, while ignoring the rest.
• In other words, Dimensionality Reduction. As
humans, we do this constantly!
What is Feature selection?
(or Variable Selection)
• Given a set of features F = { f1, …, fi, …, fn }, the
Feature Selection problem is to find a subset F' ⊆ F
that “maximizes the learner’s ability to classify
patterns”.
• Formally, F' should maximize some scoring
function (a sketch of one such function follows).
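
A minimal sketch of what “maximize a scoring function” can look like in
practice, assuming scikit-learn is available; cross-validated accuracy of a
simple classifier stands in for the (unspecified) scoring function, and the
dataset is synthetic:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Synthetic data: 8 features, only 3 of them informative.
    X, y = make_classification(n_samples=300, n_features=8,
                               n_informative=3, random_state=0)

    def score_subset(feature_idx):
        """Scoring function: mean CV accuracy using only the given features."""
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X[:, feature_idx], y, cv=5).mean()

    # A better F' scores higher; compare a candidate subset to all features.
    print(score_subset([0, 1, 2]), score_subset(list(range(8))))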
Is Feature Selection required?
Two Thoughts
Motivation for Feature Selection
• Especially when dealing with a large number of variables,
there is a need for Dimensionality Reduction.
• Feature Selection can significantly improve a learning
algorithm’s performance.
• The Curse of Dimensionality: the number of samples needed
to generalize accurately grows exponentially with the
number of features.
Feature Selection — Optimality?
• In theory, the goal is to find an optimal feature
subset (one that maximizes the scoring function).
• In real-world applications this is usually not
possible.
– For most problems it is computationally intractable to
search the whole space of possible feature subsets.
– One usually has to settle for approximations of the
optimal subset.
– Most of the research in this area is devoted to finding
efficient search heuristics (one is sketched below).
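
Since searching all 2^n subsets is intractable, one common search heuristic
is greedy forward selection: start from the empty set and repeatedly add the
single feature that most improves the score. A sketch under the same
assumptions as above, reusing the score_subset function from the previous
sketch (the stopping rule here is one simple choice among many):

    def forward_selection(n_features, score_fn):
        """Greedy forward selection: grow the subset one feature at a time."""
        selected, best_score = [], float("-inf")
        remaining = set(range(n_features))
        while remaining:
            # Evaluate each remaining feature added to the current subset.
            cand, cand_score = max(
                ((f, score_fn(selected + [f])) for f in remaining),
                key=lambda t: t[1])
            if cand_score <= best_score:
                break  # no single addition improves the score; stop
            selected.append(cand)
            remaining.remove(cand)
            best_score = cand_score
        return selected, best_score

    print(forward_selection(8, score_subset))  # approximate, not optimal F'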
Relevance of Features
• There are several definitions of relevance in the
literature:
– relevance of one variable, relevance of a variable given
other variables, relevance given a certain learning
algorithm, etc.
– Most definitions are problematic, because there are
problems where all features would be declared
irrelevant.
– This motivates two degrees of relevance: weak and
strong relevance.
• A feature is relevant iff it is weakly or strongly
relevant, and irrelevant (redundant) otherwise.
Relevance of Features
• Strong Relevance of a variable/feature:
– Let Si = {f1, …, fi−1, fi+1, …, fn} be the set of all
features except fi. Denote by si a value assignment to
all features in Si.
– A feature fi is strongly relevant iff there exist some
xi, y and si for which p(fi = xi, Si = si) > 0 such that
p(Y = y | fi = xi, Si = si) ≠ p(Y = y | Si = si)
– This means that removing fi alone will always degrade
the performance of an optimal Bayes classifier (a toy
check follows below).
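
A toy check of the strong-relevance condition (an illustrative example of my
own, not from the slides): with y = f1 XOR f2 and f1, f2 independent fair
coins, f1 is strongly relevant because conditioning on it changes p(Y | f2):

    from itertools import product

    # Enumerate the joint distribution p(f1, f2, y) for y = f1 XOR f2.
    joint = {(f1, f2, f1 ^ f2): 0.25 for f1, f2 in product([0, 1], repeat=2)}

    def cond_p_y(y, fixed):
        """p(Y = y | features at the given indices fixed to the given values)."""
        num = sum(p for (a, b, yy), p in joint.items()
                  if yy == y and all((a, b)[i] == v for i, v in fixed.items()))
        den = sum(p for (a, b, yy), p in joint.items()
                  if all((a, b)[i] == v for i, v in fixed.items()))
        return num / den

    print(cond_p_y(1, {0: 1, 1: 0}))  # p(Y=1 | f1=1, f2=0) = 1.0
    print(cond_p_y(1, {1: 0}))        # p(Y=1 | f2=0)       = 0.5, so f1 matters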
Relevance of Features
• Weak Relevance of a variable/feature:
– A feature fi is weakly relevant iff it is not strongly
relevant, and there exists a subset of features Si' ⊆ Si
for which there exist some xi, y and si' with
p(fi = xi, Si' = si') > 0 such that
p(Y = y | fi = xi, Si' = si') ≠ p(Y = y | Si' = si')
– This means that there exists a subset of features Si'
such that the performance of an optimal Bayes classifier
on Si' is worse than on Si' ∪ { fi } (a matching toy
check follows below).
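
A matching toy check for weak relevance (again my own example): if f2 is an
exact copy of f1 and y = f1, neither copy is strongly relevant, since each is
redundant given the other, yet each is weakly relevant once the other copy is
dropped (Si' = {}). This reuses the cond_p_y helper from the sketch above:

    joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}  # p(f1, f2, y) with f2 = f1 = y

    # Not strongly relevant: both print 1.0, i.e. f1 adds nothing given f2.
    print(cond_p_y(1, {0: 1, 1: 1}), cond_p_y(1, {1: 1}))

    # Weakly relevant: with Si' = {} the marginal p(Y=1) is 0.5,
    # but p(Y=1 | f1=1) is 1.0, so f1 is informative on its own.
    p_y1 = sum(p for (a, b, yy), p in joint.items() if yy == 1)
    print(cond_p_y(1, {0: 1}), p_y1)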
Variable Ranking
• Variable Ranking is the process of ordering the
features by the value of some scoring function,
which usually measures feature relevance (a minimal
sketch follows).
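
A minimal sketch of variable ranking, assuming absolute Pearson correlation
with the label as the scoring function (one common choice; the slide leaves
the scoring function open):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 2 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(size=200)  # y driven by f2, f0

    # Score each feature independently, then sort best-first.
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    ranking = np.argsort(scores)[::-1]
    print(ranking)  # features 2 and 0 should come out on top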