
The Math Behind PCA (Principal Component Analysis)

What is PCA?

Principal Component Analysis (PCA) is a linear transformation technique used to reduce the dimensionality of a dataset while preserving as much variance (information) as possible.

It does this by transforming the original variables into a new set of uncorrelated variables called principal components, ordered by how much variance they capture from the data.

The Mathematical Steps of PCA

Step 1: Standardize the Data

PCA is sensitive to scale, so we start by centering and standardizing the dataset:

x_i^(std) = (x_i − μ_i) / σ_i

Where:

 x_i = original value of the feature

 μ_i = mean of the feature

 σ_i = standard deviation of the feature

This step ensures all features contribute equally.
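A minimal NumPy sketch of this step (the dataset values below are made up purely for illustration):

import numpy as np

# Hypothetical toy dataset: 5 samples, 3 features (values invented for illustration)
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 1.1],
              [3.1, 3.0, 0.4]])

# Standardize each feature: subtract its mean, divide by its standard deviation
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)      # sample standard deviation
X_std = (X - mu) / sigma

# After standardization every feature has mean ~0 and standard deviation ~1
print(X_std.mean(axis=0))
print(X_std.std(axis=0, ddof=1))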

Step 2: Compute the Covariance Matrix

Next, we measure the relationships (covariances) between all pairs of features.
C = (1 / (n − 1)) X^T X

Where:

 X = standardized data matrix (rows: samples, columns: features)

 C = covariance matrix (symmetric)

Each element c_ij of C tells us the covariance between features i and j.
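Continuing the sketch from Step 1, the covariance matrix follows directly from the standardized matrix and can be checked against np.cov:

n = X_std.shape[0]

# Covariance matrix of the standardized data: C = (1 / (n - 1)) X^T X
C = (X_std.T @ X_std) / (n - 1)

# np.cov computes the same matrix (rowvar=False treats columns as features)
assert np.allclose(C, np.cov(X_std, rowvar=False))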


Step 3: Compute the Eigenvalues and Eigenvectors

We solve the eigen decomposition of the covariance matrix:


C v = λ v

Where:

 λ = eigenvalue (amount of variance captured)

 v = eigenvector (direction of the new axis)

You get:

 A set of eigenvectors (principal directions)

 A set of eigenvalues (explained variance per direction)

The eigenvector with the highest eigenvalue is the first principal component, and so on.
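In code, because C is symmetric, np.linalg.eigh is the natural choice (continuing the sketch):

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

# Column eigvecs[:, i] pairs with eigvals[i]; verify C v = λ v for each pair
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(C @ v, lam * v)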

Step 4: Sort Eigenvalues and Select Top k

Sort eigenvalues in descending order and pick the top k components that
capture the most variance.

The explained variance ratio is calculated as:


Explained Variance Ratio_k = λ_k / Σ_i λ_i
This helps in choosing how many components to keep.
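A short continuation of the sketch, sorting the components and computing the explained variance ratios:

# Sort eigenvalues (and their matching eigenvectors) in descending order
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Explained variance ratio: each eigenvalue divided by the sum of all eigenvalues
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)             # per-component share of the variance
print(np.cumsum(explained_ratio))  # cumulative share, useful for picking k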

Step 5: Form the Projection Matrix

Let’s say we choose k eigenvectors (columns of matrix W):


W = [v_1, v_2, …, v_k]

Where each v_i is an eigenvector (principal component).
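In the running sketch, the projection matrix is simply the first k columns of the sorted eigenvector matrix (k = 2 here is an arbitrary illustrative choice):

k = 2                      # number of components to keep (illustrative choice)
W = eigvecs[:, :k]         # projection matrix: top-k eigenvectors as columns
print(W.shape)             # (n_features, k)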

Step 6: Project the Data

Transform the original data into the new space:


Z=XW

 X = standardized data matrix

 W = matrix of top k eigenvectors

 Z = transformed data in lower dimensions

Each row of Z is a data point represented in the principal component space.
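The projection itself is a single matrix multiplication (continuing the sketch):

# Project the standardized data onto the top-k principal directions
Z = X_std @ W
print(Z.shape)                  # (n_samples, k)

# Columns of Z are uncorrelated; their variances equal the top-k eigenvalues
print(np.cov(Z, rowvar=False))
print(eigvals[:k])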

Geometric Intuition

 PCA finds the axes of greatest variance in the data.

 These new axes (principal components) are orthogonal (perpendicular).

 Data is rotated and projected onto the new axes.

 The first few principal components often capture most of the variability.
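As a sanity check, the manual steps above should agree with scikit-learn's PCA fitted on the same standardized data, up to the sign of each component (the sign of an eigenvector is arbitrary). A hedged sketch, assuming scikit-learn is available and reusing the arrays from the earlier snippets:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
Z_sklearn = pca.fit_transform(X_std)

# Scores match up to a per-component sign flip
assert np.allclose(np.abs(Z), np.abs(Z_sklearn))

# Explained variance ratios match the manual eigenvalue-based calculation
assert np.allclose(explained_ratio[:2], pca.explained_variance_ratio_)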
