
Detailed Explanation of Module 2 Lab 2: Principal Components Analysis (PCA) – Updated with All Your Questions


This guide explains every concept and step in the PCA lab, integrating your follow-up questions
and using beginner-friendly language and examples. Each section is structured for clarity and
depth, with practical context and simple analogies.

Section 1: What is Principal Component Analysis (PCA) and Why Use It?
PCA is a technique for simplifying complex datasets by reducing the number of features
(dimensions) while preserving as much important information (variance) as possible [1] [2] .
Why use PCA?
To visualize high-dimensional data in 2D or 3D.
To speed up machine learning and reduce overfitting.
To remove noise and redundancy from data.
To find patterns or groupings that are hard to see in the original data.

Section 2: Step-by-Step PCA Process

Step 1: Standardization (Normalization)


Why?
Features may have different units or scales (e.g., height in cm, weight in kg).
Standardization rescales all features to have mean = 0 and standard deviation = 1, so
each feature contributes equally [2] [3] .
How?
For each value, subtract the mean of that feature and divide by its standard deviation: z = (x − μ) / σ.

Example:
If one feature ranges from 1–1000 and another from 0–1, the first would dominate the
analysis unless standardized.
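
A minimal sketch of this step in NumPy (the example values are made up for illustration):

```python
import numpy as np

# Toy data: feature 0 spans roughly 1-1000, feature 1 spans 0-1
X = np.array([[100.0, 0.2],
              [550.0, 0.5],
              [1000.0, 0.9]])

# Standardize: subtract each feature's mean and divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~0 for every feature
print(X_std.std(axis=0))   # 1 for every feature
```
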
Step 2: Covariance Matrix Calculation
What is Covariance?
It measures how two features vary together.
Positive covariance: features increase together; negative: one increases as the other
decreases.
Covariance Matrix:
A table showing the covariance between every pair of features.
For 3 features, it’s a 3x3 matrix.
Why?
It helps find relationships between features and is the foundation for finding principal
components [2] [3] .
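
A small NumPy sketch of computing the covariance matrix (the standardized values below are made up):

```python
import numpy as np

# X_std: standardized data, shape (n_samples, n_features)
X_std = np.array([[-1.2,  0.3,  0.9],
                  [ 0.0, -1.1,  0.2],
                  [ 1.2,  0.8, -1.1]])

# np.cov treats rows as variables by default, so set rowvar=False
# for the usual (samples x features) layout
cov_matrix = np.cov(X_std, rowvar=False)

print(cov_matrix.shape)  # (3, 3) for 3 features
print(cov_matrix)        # positive entry: the pair rises together; negative: opposite
```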

Step 3: Eigenvectors and Eigenvalues


Eigenvectors:
Directions (axes) in the data space along which variance is maximized.
Eigenvalues:
Tell how much variance is along each eigenvector.
Why?
The eigenvector with the largest eigenvalue points in the direction of the greatest
variance in the data [2] [3] .
How?
Solve the equation C v = λ v, where C is the covariance matrix, v is the eigenvector, and λ is the eigenvalue [2].
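
A sketch of the eigendecomposition in NumPy (the covariance matrix here is invented for illustration):

```python
import numpy as np

# C: covariance matrix from the previous step (symmetric)
C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# Solve C v = lambda v. eigh is used because C is symmetric;
# it returns the eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(C)

print(eigenvalues)          # variance along each eigenvector
print(eigenvectors[:, -1])  # eigenvector with the largest eigenvalue (last column)
```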

Step 4: Computing the Principal Components (Selection and Construction)


What are Principal Components?
New axes (directions) created from the original features, capturing the most variance.
The first principal component (PC1) captures the most variance; the second (PC2)
captures the next most, and so on [2] [3] .
How are they computed?
1. Pair eigenvectors and eigenvalues.
2. Sort eigenvectors by their eigenvalues in descending order.
3. Select the top k eigenvectors (those with the highest eigenvalues); these are your
principal components [2] [3] .
4. Project the standardized data onto these new axes (multiply the data by the selected
eigenvectors), creating new features (PC1, PC2, ...) [2] [3] .
Simple Example:
Imagine you have two features (height and weight).
After standardization and covariance calculation, you find two eigenvectors.
The first points in the direction where the data is most spread out (maybe a diagonal
through the data cloud).
The second is perpendicular to the first.
You keep the first one (or both) as your principal components and project your data
onto these axes for easier analysis [2] [3] .
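
Putting steps 1-3 together, a minimal NumPy sketch of selecting and constructing the principal components (the standardized values are made up):

```python
import numpy as np

# Standardized data from Step 1 (illustrative values)
X_std = np.array([[-1.2,  0.3,  0.9],
                  [ 0.0, -1.1,  0.2],
                  [ 1.2,  0.8, -1.1]])

cov_matrix = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# 1-2. Pair eigenvalues with their eigenvectors (columns) and sort, largest first
order = np.argsort(eigenvalues)[::-1]
top_vectors = eigenvectors[:, order]

# 3. Keep the top k eigenvectors as the projection matrix W
k = 2
W = top_vectors[:, :k]   # shape: (n_features, k)

# 4. Project the standardized data onto the new axes
X_pca = X_std @ W        # columns are PC1 and PC2
print(X_pca)
```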

Step 5: Transforming Data to the New Subspace


How?
Multiply the original standardized data by the matrix of selected eigenvectors.
Each data point now has coordinates in terms of the principal components, not the
original features [2] [3] .
Why?
This transformation gives you a new dataset with fewer features but most of the
important information preserved.
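
In practice the whole pipeline can be run in a few lines; a hedged sketch using scikit-learn's StandardScaler and PCA (assuming scikit-learn is installed; the data values are invented):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# X: original data, shape (n_samples, n_features)
X = np.array([[170.0, 65.0, 30.0],
              [180.0, 80.0, 35.0],
              [160.0, 55.0, 28.0],
              [175.0, 72.0, 40.0]])

X_std = StandardScaler().fit_transform(X)   # Step 1: standardization
pca = PCA(n_components=2)                   # Steps 2-4 happen inside fit
X_new = pca.fit_transform(X_std)            # Step 5: coordinates in PC space

print(X_new.shape)   # (4, 2): fewer features, most of the variance kept
```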

Section 3: Interpreting and Using Principal Components

Explained Variance and Choosing Number of Components


Explained Variance:
The proportion of the dataset’s total variance captured by each principal component.
The first few PCs often capture most of the variance.
Cumulative Explained Variance:
Add up the explained variance for the top PCs to see how much total information you
keep.
Rule of Thumb: Keep enough PCs to explain 90% of the variance [3] .
Example:
If PC1 explains 70% and PC2 explains 20%, the first two together explain 90% of the
data’s variance.
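
A short sketch of checking explained variance with scikit-learn (the data here is random, purely to make the snippet runnable):

```python
import numpy as np
from sklearn.decomposition import PCA

# Any standardized (n_samples, n_features) matrix will do for illustration
rng = np.random.default_rng(0)
X_std = rng.normal(size=(100, 5))

pca = PCA().fit(X_std)                     # keep all components for inspection
ratios = pca.explained_variance_ratio_     # fraction of total variance per PC
cumulative = np.cumsum(ratios)

# Rule of thumb: keep enough PCs to reach ~90% cumulative explained variance
n_keep = int(np.argmax(cumulative >= 0.90)) + 1
print(ratios, cumulative, n_keep)
```
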
Visualization
2D or 3D Scatter Plots:
Plot data using the first two or three PCs as axes.
Color points by class (e.g., benign/malignant).
If classes separate well, PCA has revealed useful structure [3] .
Loadings:
Show how much each original feature contributes to each principal component.
High positive or negative values mean strong influence.
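
A sketch of both ideas, using scikit-learn's built-in breast-cancer dataset as a stand-in for the lab data (assumes scikit-learn and matplotlib are installed):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 30 features, two classes (benign/malignant)
data = load_breast_cancer()
X_std = StandardScaler().fit_transform(data.data)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# 2D scatter plot of PC1 vs PC2, coloured by class
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=data.target, cmap="coolwarm", s=10)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()

# Loadings: how much each original feature contributes to each PC
loadings = pca.components_   # shape (2, 30)
print(loadings[0])           # large positive/negative values = strong influence on PC1
```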

Section 4: Practical Applications of PCA


Data Visualization:
Make complex, high-dimensional data visible and understandable [3] .
Noise Reduction:
Remove less important components (those with low variance) to reduce noise (a short sketch follows this list).
Feature Engineering:
Use PCs as new features for machine learning models.
Real-World Examples:
Image compression, facial recognition, anomaly detection, recommendation systems,
and healthcare data analysis [1] [3] .
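
For the noise-reduction idea above, a minimal sketch: keep only the high-variance components and map back to the original space (the signal/noise setup is invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# A strong low-dimensional signal plus small random noise
rng = np.random.default_rng(1)
signal = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
X = signal + 0.1 * rng.normal(size=(200, 10))

# Keep the top components, then reconstruct; the low-variance (mostly noise)
# directions are discarded in the round trip
pca = PCA(n_components=2)
X_denoised = pca.inverse_transform(pca.fit_transform(X))

print(np.abs(X_denoised - signal).mean())  # reconstruction stays close to the signal
```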

Section 5: PCA in Action – Simple Example


Suppose you have three features (A, B, C) for each sample.

Sample A B C

1 2 3 4

2 3 4 5

3 4 5 6

Step-by-step:
1. Standardize A, B, C.
2. Compute covariance matrix (3x3).
3. Find eigenvectors/eigenvalues.
4. Sort and select top 2 eigenvectors (PC1, PC2).
5. Project data onto PC1 and PC2 to get new values for each sample.
6. Plot samples on a 2D graph using PC1 and PC2.
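
A compact NumPy sketch of these six steps on the tiny table above (plotting is left as a comment, since only three samples are involved):

```python
import numpy as np

# The three samples with features A, B, C from the table above
X = np.array([[2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [4.0, 5.0, 6.0]])

# 1. Standardize A, B, C
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute covariance matrix (3x3)
C = np.cov(X_std, rowvar=False)

# 3. Find eigenvectors/eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(C)

# 4. Sort and select top 2 eigenvectors (PC1, PC2)
order = np.argsort(eigenvalues)[::-1]
W = eigenvectors[:, order[:2]]

# 5. Project data onto PC1 and PC2
X_pca = X_std @ W
print(X_pca)

# 6. Plot the samples on a 2D graph using the two columns of X_pca as axes
```
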
Section 6: Summary Table
Step | What Happens? | Why It Matters
Standardization | Rescale features to mean 0, std 1 | Prevents features with large values from dominating
Covariance Matrix | Measures how features vary together | Finds relationships between features
Eigenvectors/Eigenvalues | Find new axes (directions) of max variance | Basis for principal components
Principal Components | New axes capturing most information | Reduce data size, keep important info
Explained Variance | Shows how much info each component keeps | Helps choose how many components to keep
Data Projection | Transform data onto new axes | Enables visualization and better modeling

Section 7: Key Takeaways


PCA is a powerful tool for simplifying complex data and improving analysis.
It works by finding new axes (principal components) that capture the most variance.
The process involves standardization, covariance calculation, finding
eigenvectors/eigenvalues, selecting top components, and projecting data.
PCA is widely used for visualization, noise reduction, and as a preprocessing step for
machine learning.

If you want a deeper explanation of any step, or a code example for a specific part, just ask!

1. https://www.pickl.ai/blog/a-step-by-step-complete-guide-to-principal-component-analysis-pca-for-beginners/
2. https://www.turing.com/kb/guide-to-principal-component-analysis
3. https://www.datacamp.com/tutorial/pca-analysis-r
