PCA_gl

Principal Component Analysis (PCA) is a dimensionality-reduction method that transforms a large set of variables into a smaller one while retaining most of the information. The process involves standardization, covariance matrix computation, and calculating eigenvectors and eigenvalues to identify principal components. PCA is primarily used for linear models and is important for feature selection and improving scores in linear-based algorithms.

Principal Component Analysis

What is Principal Component Analysis?

• A dimensionality-reduction method that reduces the dimensionality of large
data sets by transforming a large set of variables into a smaller one that still
contains most of the information in the large set.

• Reducing the number of variables of a data set naturally comes at the expense of
accuracy, but the trick in dimensionality reduction is to trade a little accuracy for
simplicity.

• The idea of PCA is simple: reduce the number of variables of a data set while
preserving as much information as possible.
Step by Step Explanation of PCA

• STEP 1: STANDARDIZATION
• STEP 2: COVARIANCE MATRIX COMPUTATION
• STEP 3: COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE COVARIANCE
MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS
Step 1: Standardization

• Standardize the range of the continuous initial variables so that each one of them
contributes equally to the analysis.

• More specifically, standardization is critical prior to PCA because PCA is
quite sensitive to the variances of the initial variables: a variable with a
large range would dominate over variables with small ranges. Transforming the
data to comparable scales prevents this problem.
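The standardization step can be sketched as follows (a minimal NumPy sketch; the data values are hypothetical, chosen only to show features on very different scales):

```python
import numpy as np

# Toy data set: 5 samples, 3 features on very different scales
# (hypothetical values, for illustration only)
X = np.array([
    [1.0, 200.0, 0.01],
    [2.0, 180.0, 0.05],
    [3.0, 240.0, 0.02],
    [4.0, 210.0, 0.07],
    [5.0, 260.0, 0.03],
])

# Standardize: subtract the mean and divide by the standard deviation,
# column by column, so every feature has mean 0 and variance 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # 1 for every column
```

After this step each feature contributes equally to the analysis, regardless of its original units.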
Step 2: Covariance Matrix Computation

• The aim of this step is to understand how the variables of the input data set
vary from the mean with respect to each other, or in other words, to see if
there is any relationship between them.
• Sometimes variables are highly correlated in such a way that they contain
redundant information.
• So, in order to identify these correlations, we compute the covariance matrix.
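A small sketch of this step, using synthetic data (assumed here only to make the correlation visible):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features and one independent one (synthetic data)
a = rng.normal(size=100)
b = 0.9 * a + 0.1 * rng.normal(size=100)   # strongly correlated with a
c = rng.normal(size=100)                   # unrelated to a and b
X = np.column_stack([a, b, c])

# Standardize first (Step 1), then compute the covariance matrix
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# rowvar=False: rows are observations, columns are variables
cov = np.cov(X_std, rowvar=False)
print(np.round(cov, 2))
```

The large off-diagonal entry for the (a, b) pair flags the redundant information, while the entries involving c stay near zero.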
Step 3: Compute the Eigenvectors and Eigenvalues of
the Covariance Matrix
• An eigenvector of a matrix A is a nonzero vector v in R^n such that Av = λv,
for some scalar λ.
• An eigenvalue of A is a scalar λ such that the equation Av = λv has
a nontrivial solution.
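The eigendecomposition and the projection onto the principal components can be sketched like this (synthetic data, assumed only so that PCA has a correlation to find):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 1] += 2.0 * X[:, 0]          # inject correlation between two features

# Steps 1 and 2: standardize, then compute the covariance matrix
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(X_std, rowvar=False)

# eigh is meant for symmetric matrices such as a covariance matrix;
# it returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort in descending order: the eigenvector with the largest eigenvalue
# is the first principal component
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Check the defining property Av = λv for the top component
v, lam = eigvecs[:, 0], eigvals[0]
assert np.allclose(cov @ v, lam * v)

# Keep the top 2 components and project the data onto them
X_reduced = X_std @ eigvecs[:, :2]
print(X_reduced.shape)  # (200, 2)
```

Each principal component is an eigenvector of the covariance matrix, and its eigenvalue measures how much of the total variance that component captures.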
PCA After Reduction
PCA (During Interviews)

• PCA is mostly discussed in the context of linear models (linear regression
and logistic regression).
• Stress the importance of the standardization transformation being performed.
• Explain PCA when questions are asked about improving the scores of
linear-based algorithms or about feature selection.
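One way to illustrate this in an interview is a scikit-learn pipeline that standardizes, applies PCA, and then fits a linear model (a sketch on synthetic data; the dataset sizes and parameters are illustrative assumptions, not part of the original slides):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data with many redundant (correlated) features
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, n_redundant=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standardize -> PCA (keep enough components for 95% of the variance)
# -> logistic regression
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("components kept:", model.named_steps["pca"].n_components_)
print("test accuracy:", model.score(X_te, y_te))
```

Because the redundant features are linear combinations of others, PCA keeps far fewer than 20 components while the logistic regression still fits well, which is exactly the feature-selection story the bullets above describe.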
