
A Step-by-Step Explanation of Principal Component Analysis (PCA)

Learn how to use a PCA when working with large data sets.

Written by Zakaria Jaadi

Image: Shutterstock / Built In

Updated by Brennan Whitfield | Feb 23, 2024 | Reviewed by Sadrach Pierre

Principal component analysis (PCA) is a dimensionality reduction method used to simplify a large data set into a smaller set while still maintaining significant patterns and trends.

Principal component analysis can be broken down into five steps. I’ll go through each step,
providing logical explanations of what PCA is doing and simplifying mathematical concepts
such as standardization, covariance, eigenvectors and eigenvalues without focusing on how to
compute them.

How Do You Do a Principal Component Analysis?

1. Standardize the range of continuous initial variables.
2. Compute the covariance matrix to identify correlations.
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
4. Create a feature vector to decide which principal components to keep.
5. Recast the data along the principal components axes.
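In practice, all five steps are wrapped by standard libraries. Here is a minimal sketch of the whole pipeline, assuming scikit-learn and NumPy are available; the small matrix X is a made-up example rather than data from this article:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 6 observations of 3 variables.
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 2.1],
    [2.2, 2.9, 0.9],
    [1.9, 2.2, 1.2],
    [3.1, 3.0, 0.3],
    [2.3, 2.7, 1.1],
])

# Step 1: standardize so each variable contributes equally.
X_std = StandardScaler().fit_transform(X)

# Steps 2-5: PCA computes the covariance structure, its eigenvectors and
# eigenvalues, keeps the requested number of components, and recasts the
# data onto the new axes.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # (6, 2)
print(pca.explained_variance_ratio_)  # share of variance per component
```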

First, some basic (and brief) background is necessary for context.

What Is Principal Component Analysis?


Principal component analysis, or PCA, is a dimensionality reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set.

What Are Principal Components?


Principal components are new variables that are constructed as linear combinations or
mixtures of the initial variables. These combinations are done in such a way that the new
variables (i.e., principal components) are uncorrelated and most of the information within the
initial variables is squeezed or compressed into the first components. So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on, until you have something like what is shown in the scree plot below.
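To make this "squeezing" of information concrete, here is a small sketch (scikit-learn and NumPy assumed; the correlated random data is hypothetical) that prints the kind of decreasing variance percentages a scree plot visualizes:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical 10-dimensional data built from only 3 underlying signals,
# so most of the variance ends up in the first few components.
base = rng.normal(size=(200, 3))
noise = 0.1 * rng.normal(size=(200, 10))
X = base @ rng.normal(size=(3, 10)) + noise

pca = PCA().fit(X)

# Percentage of variance carried by each of the 10 principal components;
# the values decrease, which is exactly what a scree plot shows.
print(np.round(100 * pca.explained_variance_ratio_, 2))
```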

Geometrically speaking, principal components represent the directions of the data that explain a maximal amount of variance, that is to say, the lines that capture most of the information in the data. The relationship between variance and information here is that the larger the variance carried by a line, the larger the dispersion of the data points along it; and the larger the dispersion along a line, the more information it carries. To put all this simply, just think of principal components as new axes that provide the best angle from which to see and evaluate the data, so that the differences between the observations are more visible.

How PCA Constructs the Principal Components


As there are as many principal components as there are variables in the data, principal
components are constructed in such a manner that the first principal component accounts for
the largest possible variance in the data set. For example, let's assume that the scatter plot of our data set is as shown below. Can we guess the first principal component? Yes, it's approximately the line that matches the purple marks, because it goes through the origin and it's the line along which the projection of the points (red dots) is the most spread out. Or, mathematically speaking, it's the line that maximizes the variance, i.e., the average of the squared distances from the projected points (red dots) to the origin.

The second principal component is calculated in the same way, with the condition that it is
uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for
the next highest variance.

This continues until a total of p principal components have been calculated, equal to the
original number of variables.
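The following NumPy sketch, using an invented 2-D point cloud, checks both claims numerically: projections onto the first principal component have the largest variance of any direction tried, and the second principal component is perpendicular to the first.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical, elongated 2-D point cloud, centered at the origin.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix, ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
pc1, pc2 = eigvecs[:, order[0]], eigvecs[:, order[1]]

# Variance of the projections onto a handful of candidate directions.
for angle in np.linspace(0.0, np.pi, 7):
    d = np.array([np.cos(angle), np.sin(angle)])
    print(f"variance along {np.round(d, 2)}: {(X @ d).var():.3f}")

print("variance along PC1:", (X @ pc1).var())  # largest of all directions
print("PC1 . PC2 =", float(pc1 @ pc2))         # ~0, i.e. perpendicular
```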

Step-by-Step Explanation of PCA

Step 1: Standardization
The aim of this step is to standardize the range of the continuous initial variables so that each
one of them contributes equally to the analysis.

More specifically, the reason why it is critical to perform standardization prior to PCA is that the latter is quite sensitive to the variances of the initial variables. That is, if there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate over those with small ranges (for example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1), which will lead to biased results. So, transforming the data to comparable scales can prevent this problem.

Mathematically, this can be done by subtracting the mean and dividing by the standard
deviation for each value of each variable.

Once the standardization is done, all the variables will be transformed to the same scale.
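As a concrete sketch of this z-score transformation (NumPy assumed; the toy matrix X, with one wide-range and one narrow-range variable, is hypothetical):

```python
import numpy as np

# Hypothetical data: one variable ranging roughly 0-100, one roughly 0-1.
X = np.array([
    [85.0, 0.21],
    [40.0, 0.55],
    [63.0, 0.80],
    [12.0, 0.05],
    [97.0, 0.42],
])

# z = (value - mean) / standard deviation, applied column by column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0).round(10))  # ~0 for each variable
print(X_std.std(axis=0))             # 1 for each variable
```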

Step 2: Covariance Matrix Computation
The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions) that has as entries the covariances associated with all possible pairs of the initial variables. For example, for a 3-dimensional data set with 3 variables x, y and z, the covariance matrix is a 3×3 matrix of this form:

Covariance Matrix for 3-Dimensional Data:

Cov(x,x)  Cov(x,y)  Cov(x,z)
Cov(y,x)  Cov(y,y)  Cov(y,z)
Cov(z,x)  Cov(z,y)  Cov(z,z)

Since the covariance of a variable with itself is its variance (Cov(a,a) = Var(a)), in the main diagonal (top left to bottom right) we actually have the variances of each initial variable. And since the covariance is commutative (Cov(a,b) = Cov(b,a)), the entries of the covariance matrix are symmetric with respect to the main diagonal, which means that the upper and the lower triangular portions are equal.
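Both properties are easy to verify numerically. A short sketch, assuming NumPy and a hypothetical standardized data set with three variables:

```python
import numpy as np

# Hypothetical standardized data: 5 observations of 3 variables x, y, z.
X_std = np.array([
    [ 1.2, -0.8,  0.5],
    [-0.3,  0.4, -1.1],
    [ 0.7,  1.0,  0.9],
    [-1.4, -0.9, -0.2],
    [-0.2,  0.3, -0.1],
])

# 3 x 3 covariance matrix; rowvar=False means columns are the variables.
cov = np.cov(X_std, rowvar=False)

print(cov.round(3))
print(np.allclose(cov, cov.T))          # True: Cov(a, b) = Cov(b, a)
print(np.allclose(np.diag(cov),
                  X_std.var(axis=0, ddof=1)))  # diagonal holds the variances
```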

What do the covariances that we have as entries of the matrix tell us about the
correlations between the variables?

It’s actually the sign of the covariance that matters:

If positive: the two variables increase or decrease together (correlated).

If negative: one increases when the other decreases (inversely correlated).

Now that we know that the covariance matrix is no more than a table that summarizes the correlations between all the possible pairs of variables, let's move to the next step.

Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
Eigenvectors and eigenvalues are the linear algebra concepts we need to compute from the covariance matrix in order to determine the principal components of the data. The eigenvectors of the covariance matrix are the directions of the axes where there is the most variance (the most information), and these are what we call the principal components. Eigenvalues are simply the coefficients attached to the eigenvectors, and they give the amount of variance carried by each principal component.

By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the
principal components in order of significance.

Principal Component Analysis Example:

Let’s suppose that our data set is 2-dimensional with 2 variables x,y and that the eigenvectors
and eigenvalues of the covariance matrix are as follows:

Principal Component Analysis Example

If we rank the eigenvalues in descending order, we get λ1>λ2, which means that the
eigenvector that corresponds to the first principal component (PC1) is v1 and the one that
corresponds to the second principal component (PC2) is v2.

After obtaining the principal components, to compute the percentage of variance (information) accounted for by each component, we divide the eigenvalue of each component by the sum of the eigenvalues. If we apply this to the example above, we find that PC1 and PC2 carry 96 percent and 4 percent of the variance of the data, respectively.
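Here is a short sketch of that computation with NumPy. The 2×2 covariance matrix below is hypothetical; its entries were chosen only so that the resulting split comes out close to the 96 percent / 4 percent mentioned above:

```python
import numpy as np

# Hypothetical 2 x 2 covariance matrix for variables x and y.
cov = np.array([
    [0.617, 0.615],
    [0.615, 0.717],
])

# eigh returns eigenvalues in ascending order for symmetric matrices,
# so reverse to rank them from highest to lowest.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Percentage of variance (information) carried by PC1 and PC2.
explained = 100 * eigvals / eigvals.sum()
print(explained.round(1))   # roughly [96.3, 3.7]
```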

Step 4: Create a Feature Vector

In this step, we choose whether to keep all of the components or discard those of lesser significance (those with low eigenvalues), and form with the remaining ones a matrix of vectors called the feature vector. So, the feature vector is simply a matrix that has as columns the eigenvectors of the components we decide to keep. This makes it the first step toward dimensionality reduction, because if we choose to keep only p eigenvectors (components) out of n, the final data set will have only p dimensions.
Principal Component Analysis Example:

Continuing with the example from the previous step, we can either form a feature vector with
both of the eigenvectors v1 and v2:

Principal Component Analysis eigenvectors
Or discard the eigenvector v2, which is the one of lesser significance, and form a feature vector
with v1 only:

Principal Component Analysis feature vector
Discarding the eigenvector v2 will reduce dimensionality by 1 and will consequently cause a loss of information in the final data set. But given that v2 carries only 4 percent of the information, the loss is not important: we will still have the 96 percent of the information that is carried by v1.

So, as we saw in the example, it's up to you to choose whether to keep all the components or discard the ones of lesser significance, depending on what you are looking for. If you just want to describe your data in terms of new, uncorrelated variables (the principal components) without seeking to reduce dimensionality, leaving out the less significant components is not needed.
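Continuing the hypothetical example from step 3 (NumPy assumed), forming the feature vector is just a matter of keeping the ranked eigenvectors you want as columns of a matrix:

```python
import numpy as np

# Hypothetical eigenvectors (columns) already ranked by eigenvalue,
# as produced in the step 3 sketch: first column is v1, second is v2.
eigvecs = np.array([
    [0.678, -0.735],
    [0.735,  0.678],
])

# Keep both components: the feature vector is the full 2 x 2 matrix.
feature_vector_full = eigvecs[:, :2]

# Or discard v2 and keep only v1: a 2 x 1 feature vector,
# reducing the final data set to one dimension.
feature_vector_reduced = eigvecs[:, :1]

print(feature_vector_full.shape)      # (2, 2)
print(feature_vector_reduced.shape)   # (2, 1)
```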

Step 5: Recast the Data Along the Principal Components Axes

In this last step, the aim is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the ones represented by the principal components (hence the name principal component analysis). This can be done by multiplying the standardized original data set (observations in rows) by the feature vector (kept eigenvectors in columns).
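A final sketch tying the steps together (NumPy assumed; the data is again invented): standardize, decompose the covariance matrix, keep the first eigenvector as the feature vector, and multiply to recast the data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical raw data: 100 observations of 2 correlated variables.
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.6], [0.0, 0.4]])

# Step 1: standardize.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Steps 2-3: covariance matrix and its eigendecomposition, ranked.
eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Step 4: feature vector keeping only the first principal component.
feature_vector = eigvecs[:, :1]

# Step 5: recast the data along the principal component axes.
final_data = X_std @ feature_vector

print(final_data.shape)   # (100, 1): one dimension instead of two
```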

An overview of principal component analysis (PCA). | Video: Visually Explained

