Unit_5(Dimensionality_Reduction)


• The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction.
• In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated; in such cases, dimensionality reduction techniques are required.
Wrapper Method
Embedded Method:

• Feature selection using the embedded method combines the advantages of both filter and wrapper methods.
• In this approach, feature selection occurs as part of the model training process.
• The model itself selects the most relevant features during training, based on their contribution to model performance.
• This method is particularly efficient and effective when working with regularization techniques.
Feature Selection Methods: Useful Tricks & Tips

Here are some useful tricks and tips for feature selection:
• Understand Your Data: Before selecting features, thoroughly understand your
dataset. Know the domain and the relationships between different features.
• Filter Methods: Use statistical measures like correlation, chi-square, or mutual
information to rank features based on their relevance to the target variable.
• Wrapper Methods: Employ algorithms like Recursive Feature Elimination (RFE) or
Forward/Backward Selection, which select subsets of features based on the
performance of a specific machine learning algorithm.
• Embedded Methods: Some machine learning algorithms inherently perform feature selection during training. Examples include LASSO (L1 regularization) and tree-based methods like Random Forests (a combined code sketch of the filter, wrapper, and embedded approaches follows this list).
*LASSO (Least Absolute Shrinkage and Selection Operator) regression is a type of linear regression that
incorporates regularization to prevent overfitting and enhance model interpretability.
• Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) or
t-distributed Stochastic Neighbor Embedding (t-SNE) can reduce the dimensionality of
your data while retaining most of the information.
• Feature Importance: For tree-based algorithms like Random Forest or Gradient
Boosting Machines (GBM), you can use the built-in feature importance attribute to
select the most important features.
• Domain Knowledge: Leverage domain expertise to identify features that are likely to
be important. Sometimes, features that seem irrelevant on the surface might be crucial
when considering domain-specific insights.
• Regularization: Regularization techniques like LASSO (L1 regularization) penalize
the absolute size of the coefficients, effectively performing feature selection by driving
some coefficients to zero.
• Cross-Validation: Perform feature selection within each fold of cross-validation to
ensure that your feature selection process is not biased by the specific dataset splits.
• Ensemble Methods: Combine the results of multiple feature selection methods to get a
more robust set of selected features.
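
To make the filter, wrapper, and embedded approaches concrete, here is a minimal scikit-learn sketch; the dataset, models, and hyperparameters are illustrative assumptions rather than recommendations from these notes:

# Illustrative sketch: filter, wrapper, and embedded feature selection with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import numpy as np

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)          # scale features before RFE / L1 models

# Filter: rank features by mutual information with the target
filter_sel = SelectKBest(mutual_info_classif, k=10).fit(X, y)

# Wrapper: Recursive Feature Elimination around a logistic regression model
wrapper_sel = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded: L1 regularization drives some coefficients exactly to zero
embedded = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
embedded_mask = embedded.coef_.ravel() != 0

print("filter keeps:  ", np.flatnonzero(filter_sel.get_support()))
print("wrapper keeps: ", np.flatnonzero(wrapper_sel.get_support()))
print("embedded keeps:", np.flatnonzero(embedded_mask))

The three selected subsets usually overlap but rarely coincide, which is why combining methods (as suggested under Ensemble Methods above) tends to give a more robust result.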
Principal Component Analysis (PCA)

• This feature extraction method reduces the dimensionality of large data sets while preserving the maximum amount of information.
• Principal Component Analysis emphasizes variation and captures important patterns and relationships between variables in the dataset.
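
As a quick illustration of this idea, here is a minimal scikit-learn sketch; the dataset and the 95% variance threshold are assumptions, not part of these notes:

# Illustrative sketch: PCA with scikit-learn, keeping enough components
# to explain roughly 95% of the variance.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)     # PCA is sensitive to feature scale

pca = PCA(n_components=0.95)              # keep components covering ~95% of variance
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_)      # variance captured by each component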
How Does Principal Component Analysis (PCA)
Work?
• In general, not all features are equally important; certain features account for a large percentage of the variance in the dataset.
• The motivation behind the PCA algorithm is that there are certain features
that capture a large percentage of variance in the original dataset.
• So it's important to find the directions of maximum variance in the
dataset.
• These directions are called principal components.
• And PCA is essentially a projection of the dataset onto the principal
components.
• So how do we find the principal components?
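
The following sections answer this through the covariance matrix and its eigenvectors; as a preview, here is a minimal from-scratch NumPy sketch of those steps (the random data is an assumption):

# Illustrative sketch: principal components from the covariance matrix with NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 samples, 5 features (assumed data)

X_centered = X - X.mean(axis=0)            # center each feature
cov = np.cov(X_centered, rowvar=False)     # 5 x 5 covariance matrix

# Eigenvectors of the covariance matrix are the principal components
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by variance, descending
components = eigvecs[:, order[:2]]         # keep the top-2 directions

X_projected = X_centered @ components      # project the data onto the components
print(X_projected.shape)                   # (100, 2)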
What is Covariance Matrix?

• The variance-covariance matrix is a square matrix with diagonal elements that represent
the variance and the non-diagonal components that express covariance.
• The covariance between two variables can take any real value: positive, negative, or zero.
• A positive covariance suggests that the two variables tend to increase or decrease together, whereas a negative covariance indicates that one tends to increase as the other decreases.
• If two variables do not vary together, their covariance is zero.
Example: Find the covariance matrix
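
The data for this example is not reproduced in these notes. As an illustration only, a covariance matrix for some assumed observations can be computed with NumPy:

# Illustrative sketch: building a variance-covariance matrix with NumPy.
# The observations below are made up for demonstration.
import numpy as np

X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

# rowvar=False: each column is a variable, each row an observation
cov = np.cov(X, rowvar=False)
print(cov)            # diagonal = variances, off-diagonal = covariances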
Example 2: Find the eigenvalues and eigenvectors of the 3 x 3 matrix

    A = | 2   1   3  |
        | 1   2   3  |
        | 3   3   20 |

Sol:
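
The worked solution is not shown here, but the result can be checked numerically with a minimal NumPy sketch:

# Numerical check of Example 2: eigenvalues/eigenvectors of A with NumPy.
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, 2.0, 3.0],
              [3.0, 3.0, 20.0]])

# A is symmetric, so eigh returns real eigenvalues in ascending order
vals, vecs = np.linalg.eigh(A)
print(vals)           # approximately [1., 2., 21.]
print(vecs)           # columns are the corresponding eigenvectors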
Singular Value Decomposition (SVD)

• Singular Value Decomposition is a matrix factorization technique widely used in various applications, including linear algebra, signal processing, and machine learning.
• It decomposes a matrix into three other matrices, allowing for the representation of the original matrix in a reduced form.
Decomposition of Matrix:

Given a matrix M of size m x n (or a data frame with m rows and n columns), SVD decomposes it into three matrices:

M = U * Σ * Vᵗ

where, in the reduced (rank-r) form, U is an m x r matrix with orthonormal columns, Σ is an r x r diagonal matrix, and V is an n x r matrix with orthonormal columns (so Vᵗ is r x n).
Here r is the rank of the matrix M.
• The diagonal elements of Σ are the singular values of the original matrix M,
and they are arranged in descending order.
• The columns of U are the left singular vectors of M. These vectors form an
orthogonal basis for the column space of M.
• The columns of V are the right singular vectors of M. These vectors form an orthogonal basis for the row space of M.
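
A minimal NumPy sketch of this decomposition and a low-rank reconstruction (the matrix values are assumptions):

# Illustrative sketch: SVD of a small matrix with NumPy and a rank-k reconstruction.
import numpy as np

M = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])                    # a 2 x 3 example matrix (assumed)

U, s, Vt = np.linalg.svd(M, full_matrices=False)    # reduced (thin) SVD
Sigma = np.diag(s)                                  # singular values, descending order

print(np.allclose(M, U @ Sigma @ Vt))               # True: M = U * Sigma * V^T

# Keep only the largest singular value for a rank-1 approximation of M
k = 1
M_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(M_approx)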
In summary,
• PCA is suitable for unsupervised dimensionality reduction,
• LDA is effective for supervised problems with a focus on class separability, and
• SVD is versatile, catering to various applications including collaborative filtering and matrix
factorization.
The choice depends on the nature of your data and the goals of your analysis.
