Module 2 Lab 2
Section 1: What is Principal Component Analysis (PCA) and Why Use It?
PCA is a technique for simplifying complex datasets by reducing the number of features
(dimensions) while preserving as much important information (variance) as possible [1] [2].
Why use PCA?
To visualize high-dimensional data in 2D or 3D.
To speed up model training and reduce the risk of overfitting.
To remove noise and redundancy from data.
To find patterns or groupings that are hard to see in the original data.
Step 1: Standardization
Example:
If one feature ranges from 1–1000 and another from 0–1, the first would dominate the
analysis, because its variance is far larger, unless both are standardized to a common scale.
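The scaling effect in this example can be checked numerically. The snippet below is a minimal sketch, assuming z-score standardization (subtract the mean, divide by the sample standard deviation); the data values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: column 0 spans roughly 1-1000, column 1 spans 0-1.
X = np.array([
    [1.0, 0.10],
    [250.0, 0.40],
    [600.0, 0.55],
    [1000.0, 0.90],
])

# Z-score standardization: subtract each column's mean,
# then divide by its sample standard deviation (ddof=1 is an assumption).
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
Z = (X - mean) / std

# Before scaling, column 0 has a vastly larger variance than column 1.
print("raw variances:         ", X.var(axis=0, ddof=1))
# After scaling, both columns have variance 1 and contribute equally.
print("standardized variances:", Z.var(axis=0, ddof=1))
```

After standardization both features have mean 0 and variance 1, so neither dominates purely because of its units.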
Step 2: Covariance Matrix Calculation
What is Covariance?
It measures how two features vary together.
Positive covariance: the two features tend to increase together. Negative covariance: one
tends to increase as the other decreases.
Covariance Matrix:
A table showing the covariance between every pair of features.
For 3 features, it’s a 3x3 matrix.
Why?
It captures the relationships between features, and its eigenvectors and eigenvalues define
the principal components [2] [3].
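As an illustration, a covariance matrix for three features can be computed with NumPy's np.cov. This is a sketch on made-up data, not part of the lab's dataset; the array X and its values are assumptions chosen so that one pair of features moves together and another pair moves in opposite directions.

```python
import numpy as np

# Hypothetical data: 5 samples (rows), 3 features (columns).
X = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.0],
    [3.0, 5.9, 6.0],
    [4.0, 8.2, 3.0],
    [5.0, 9.8, 1.0],
])

# rowvar=False tells NumPy that each column (not each row) is a feature.
C = np.cov(X, rowvar=False)

print(C.shape)  # (3, 3)
print(C)

# The diagonal holds each feature's variance.
# C[0, 1] is positive: features 0 and 1 rise together.
# C[0, 2] is negative: feature 2 falls as feature 0 rises.
```

The matrix is symmetric, because the covariance of features i and j is the same as the covariance of j and i.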
Sample  A  B  C
1       2  3  4
2       3  4  5
3       4  5  6
Step-by-step:
1. Standardize A, B, C.
2. Compute covariance matrix (3x3).
3. Find the eigenvectors and eigenvalues of the covariance matrix.
4. Sort the eigenvectors by eigenvalue (largest first) and select the top 2 (PC1, PC2).
5. Project data onto PC1 and PC2 to get new values for each sample.
6. Plot samples on a 2D graph using PC1 and PC2.
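Steps 1 to 5 can be sketched in NumPy using the three samples from the table. This is a minimal sketch, not a definitive implementation: the use of the sample standard deviation (ddof=1), the choice of np.linalg.eigh for the eigen-decomposition, and all variable names are assumptions. Step 6 (plotting) is left out; the two columns of scores are the values that would go on the PC1 and PC2 axes.

```python
import numpy as np

# Samples 1-3 from the table; columns are features A, B, C.
X = np.array([
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
])

# 1. Standardize each feature (ddof=1 is an assumption).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized features (3x3).
C = np.cov(Z, rowvar=False)

# 3. Eigen-decomposition. eigh is meant for symmetric matrices and
#    returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Sort by eigenvalue, largest first, and keep the top 2 eigenvectors.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]
W = eigvecs[:, :2]  # columns are PC1 and PC2

# 5. Project the standardized data onto PC1 and PC2.
scores = Z @ W

# Share of total variance carried by each component.
explained = eigvals / eigvals.sum()

print("eigenvalues:       ", np.round(eigvals, 6))
print("explained variance:", np.round(explained, 6))
print("scores (PC1, PC2):")
print(np.round(scores, 6))
```

Because B = A + 1 and C = A + 2 in this table, the three features are perfectly correlated. PC1 therefore carries all of the variance, and every sample's PC2 score is zero up to rounding. The sign of PC1 is arbitrary, so the PC1 scores may come out with either sign.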
Section 6: Summary Table
Step                  What Happens?                             Why It Matters
Covariance Matrix     Measures how features vary together       Finds relationships between features
Principal Components  New axes capturing most information       Reduce data size, keep important info
Explained Variance    Shows how much info each component keeps  Helps choose how many components to keep
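One common way to use explained variance when choosing the number of components is to keep the smallest number whose cumulative share passes a threshold. The sketch below assumes a 95% threshold and uses made-up eigenvalues; both are illustrative assumptions, not values from this lab.

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted largest first.
eigvals = np.array([2.5, 1.0, 0.4, 0.1])

# Share of total variance explained by each component.
ratio = eigvals / eigvals.sum()

# Running total: variance explained by the first k components.
cumulative = np.cumsum(ratio)

# Assumed cutoff: keep enough components to explain at least 95%.
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)

print("explained variance ratio:", ratio)
print("cumulative:              ", cumulative)
print("components to keep:      ", k)
```

With these numbers the first two components explain 87.5% of the variance and the first three explain 97.5%, so three components are kept.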
⁂
1. https://www.pickl.ai/blog/a-step-by-step-complete-guide-to-principal-component-analysis-pca-for-beginners/
2. https://www.turing.com/kb/guide-to-principal-component-analysis
3. https://www.datacamp.com/tutorial/pca-analysis-r