PCA_gl
• Reducing the number of variables of a data set naturally comes at the expense of
accuracy, but the trick in dimensionality reduction is to trade a little accuracy for
simplicity.
• PCA is simple: it reduces the number of variables of a data set while preserving as
much information as possible.
Step-by-Step Explanation of PCA
• STEP 1: STANDARDIZATION
• STEP 2: COVARIANCE MATRIX COMPUTATION
• STEP 3: COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE COVARIANCE
MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS
Step 1: Standardization
• Standardize the range of the continuous initial variables so that each one of them
contributes equally to the analysis.
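The standardization step above can be sketched with NumPy; the data values here are a hypothetical toy example (two features on very different scales), not from the slides:

```python
import numpy as np

# Toy data: 4 samples, 2 features on very different scales (hypothetical values).
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [4.0, 400.0]])

# Z-score standardization: subtract each column's mean and divide by its
# standard deviation, so every variable contributes equally to the analysis.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this step each column has mean 0 and standard deviation 1, so no single variable dominates the covariance computation that follows.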
Step 2: Covariance Matrix Computation
• The aim of this step is to understand how the variables of the input data set vary
from the mean with respect to each other, or in other words, to see whether there
is any relationship between them.
• Sometimes variables are highly correlated and therefore contain redundant
information.
• To identify these correlations, we compute the covariance matrix.
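A minimal sketch of the covariance computation, assuming the data has already been standardized (the values are illustrative):

```python
import numpy as np

# Standardized data: rows are samples, columns are variables (illustrative values).
X_std = np.array([[-1.2, -0.9],
                  [-0.4,  1.1],
                  [ 0.4, -0.3],
                  [ 1.2,  0.1]])

# Covariance matrix: entry (i, j) measures how variables i and j vary together.
# rowvar=False tells NumPy that variables are in columns, not rows.
cov = np.cov(X_std, rowvar=False)
```

The result is a square, symmetric matrix with one row and column per variable; large off-diagonal entries flag the correlated, redundant variables mentioned above.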
Step 3: Compute the Eigenvectors and Eigenvalues of
the Covariance Matrix
• An eigenvector of A is a nonzero vector v in Rⁿ such that Av = λv for some scalar λ.
• An eigenvalue of A is a scalar λ such that the equation Av = λv has
a nontrivial solution.
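The defining equation Av = λv can be checked numerically. This sketch uses a small hypothetical covariance matrix and `np.linalg.eigh`, which is suited to symmetric matrices:

```python
import numpy as np

# A hypothetical 2x2 covariance matrix of two correlated standardized variables.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort by decreasing eigenvalue: the first principal component is the direction
# of greatest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Verify the defining equation Av = λv for the top component.
v = eigvecs[:, 0]
assert np.allclose(cov @ v, eigvals[0] * v)
```

The eigenvectors are the principal components, and each eigenvalue measures how much of the total variance its component captures.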
PCA After Reduction
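The reduction itself amounts to projecting the standardized data onto the top-k principal components. A minimal sketch, assuming standardized toy data (values are illustrative, not from the slides):

```python
import numpy as np

# Standardized toy data (illustrative values).
X_std = np.array([[-1.2, -0.9],
                  [-0.4,  1.1],
                  [ 0.4, -0.3],
                  [ 1.2,  0.1]])

# Eigendecomposition of the covariance matrix, sorted by decreasing eigenvalue.
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Keep only the top principal component (k = 1) and project the data onto it.
k = 1
X_reduced = X_std @ eigvecs[:, :k]
```

Each sample is now described by k numbers instead of the original number of variables; this is the accuracy-for-simplicity trade mentioned at the start.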
PCA (During Interviews)
• These transformed features are mostly used only with linear models (Linear
Regression and Logistic Regression).
• Stress the importance of the transformation (standardization) being performed.
• Explain PCA when asked about improving the scores of linear algorithms or about
feature selection.