Feature Construction
Dimensionality reduction – motivation (2)
– May improve performance of the classification algorithm by removing irrelevant
features
– Defying the curse of dimensionality: simpler models result in improved
generalization
– Classification algorithm may not scale up to the size of the full feature set
either in space or time
– Allows us to better understand the domain
– Cheaper to collect and store data based on reduced feature set.
Two approaches for dimensionality reduction
– Feature construction
– Feature selection
Feature construction (a)
• Linear methods
– Principal component analysis (PCA)
– Independent component analysis (ICA)
– ….
• Non-linear methods
– Non-linear component analysis (NLCA)
– Kernel PCA
– Local linear embedding (LLE)
– ….
Principal component analysis
(PCA) (1)
• PCA is mostly used as a tool in exploratory data analysis and for
making predictive models.
• PCA involves the calculation of the eigenvalue decomposition of
a data covariance matrix, usually after mean centering the data
for each attribute.
• PCA is mathematically defined as an orthogonal linear
transformation that transforms the data to a new coordinate
system such that the greatest variance by any projection of the
data comes to lie on the first coordinate (called the first principal
component), the second greatest variance on the second
coordinate, and so on.
• PCA is theoretically the optimal linear scheme, in terms of mean
squared reconstruction error, for compressing a set of high-dimensional
vectors into a set of lower-dimensional vectors and then
reconstructing the original set.
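The procedure described above (mean-centering, covariance eigendecomposition, variance-ordered projection, reconstruction) can be sketched as follows. This is a minimal illustration, not a production implementation; the data, dimensions, and the choice of k = 2 retained components are assumptions for the example.

```python
import numpy as np

# Illustrative data: 5-D points with (nearly) 2-D latent structure.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))                  # latent 2-D coordinates
A = rng.normal(size=(2, 5))
X = Z @ A + 0.01 * rng.normal(size=(200, 5))   # 5-D observations + small noise

# 1. Mean-center each attribute.
mu = X.mean(axis=0)
Xc = X - mu

# 2. Eigenvalue decomposition of the data covariance matrix
#    (eigh, since the covariance matrix is symmetric).
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)

# 3. Sort components by decreasing variance: the first principal
#    component carries the greatest variance, and so on.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Compress to k dimensions, then reconstruct in the original space.
k = 2
W = eigvecs[:, :k]          # projection matrix (top-k components)
Y = Xc @ W                  # lower-dimensional representation
X_rec = Y @ W.T + mu        # linear reconstruction

mse = np.mean((X - X_rec) ** 2)
```

Since the data is nearly 2-dimensional, the reconstruction error `mse` is small: per the optimality property above, it equals the average of the variances along the discarded components.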
Principal component analysis
(PCA) (2)
• The applicability of PCA is limited by the assumptions
made in its derivation:
• 1. Linearity: the observed data are assumed to be
linear combinations of certain basis vectors.
• 2. PCA finds uncorrelated (orthogonal) axes of the
data; these are statistically independent only under
a Gaussian assumption.
• 3. High signal-to-noise ratio: principal components
with larger variance are assumed to correspond to
interesting dynamics, and those with lower variance
to noise.
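Assumption 2 can be illustrated directly: projecting mean-centered data onto the principal axes yields coordinates whose sample covariance is diagonal, i.e. the new coordinates are uncorrelated. The Gaussian data below is an assumption chosen for the example; only in the Gaussian case does uncorrelated also imply statistically independent.

```python
import numpy as np

# Correlated 2-D Gaussian data (illustrative covariance matrix).
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0],
                            [[2.0, 1.2],
                             [1.2, 1.0]], size=1000)

# Mean-center and project onto the principal axes.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Y = Xc @ eigvecs

# Covariance after projection: diagonal, so the new axes are uncorrelated.
C_Y = np.cov(Y, rowvar=False)
off_diag = C_Y[0, 1]
```

The off-diagonal entry of `C_Y` vanishes (up to floating-point error) by construction, since the eigenvectors diagonalize the sample covariance exactly.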
The End