Principal Component Analysis Concepts

Principal Component Analysis (PCA) is a technique used to reduce the dimensionality of large data sets by transforming the data to a new set of variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. PCA involves computing the eigenvectors and eigenvalues of the covariance matrix to determine the principal components that best explain the variance in the data.


PCA

Principal Component Analysis Concepts


[email protected]
QU1HPBT85A

Proprietary content. ©Great


ThisLearning.
file is meantAll
forRights
personalReserved. Unauthorized use oronly.
use by [email protected] distribution prohibited
Sharing or publishing the contents in part or full is liable for legal action.
Principal Component Analysis

1. Main idea: seek the most accurate representation of the data in a lower dimensional space

2. Example in 2-D: project the data onto a 1-D subspace (a line) with minimal projection error

3. In both pictures above, the data points (black dots) are projected onto a line, but the second line is closer to the actual points (smaller projection error) than the first

4. Notice that the good line to use for projection lies in the direction of largest variance

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
PCA Pt 2

5. After the data is projected onto the best line, the coordinate system must be transformed to get a 1-D representation for the vector y

6. Note that the new data y has the same variance as the old data x in the direction of the green line

7. PCA preserves the largest variances in the data

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
PCA Pt 3

8. In general, PCA on n dimensions results in a new set of n dimensions. The one that captures the maximum variance in the underlying data is principal component 1; principal component 2 is orthogonal to it

9. Example in 2-D: project the data onto a 1-D subspace (a line) with minimal projection error

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
Mechanics of Principal Component Analysis


http://setosa.io/ev/principal-component-analysis/
Principal Component Analysis steps

1. Begin by standardizing the data: subtract from each dimension its mean, which shifts the data points to the origin, i.e. the data is centered at the origin

2. Generate the covariance matrix / correlation matrix across all the dimensions

3. Perform eigen decomposition, that is, compute the eigen vectors (the principal components) and the corresponding eigen values (the magnitudes of variance captured)

4. Sort the eigen pairs in descending order of eigen values and select the one with the largest value. This is the first principal component, which captures the maximum information from the original data (see the code sketch after the reference below)

Ref: http://www.cs.haifa.ac.il/~rita/uml_course/add_mat/PCA.pdf
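A minimal NumPy sketch of these four steps; the random data matrix and all variable names here are illustrative, not taken from the original lab:

import numpy as np

# Step 1: centre the data so every dimension has zero mean
X = np.random.rand(100, 4)                     # stand-in data: 100 samples, 4 dimensions
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix across the dimensions (rowvar=False: columns are variables)
cov_matrix = np.cov(X_centered, rowvar=False)

# Step 3: eigen decomposition; the eigen vectors are the principal components
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)

# Step 4: sort the eigen pairs in descending order of eigen values
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
print('PC1 captures %.1f%% of the total variance' % (100 * eig_vals[0] / eig_vals.sum()))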
Principal Component Analysis (Performance issues)

1. PCA effectiveness depends on the scales of the attributes. If attributes have different scales, PCA will pick the variable with the highest variance rather than picking attributes based on correlation (see the sketch after this list)

2. Changing the scales of the variables can change the PCA

3. Interpreting PCA can become challenging in the presence of discrete data

4. Presence of skew in the data, with a long thick tail, can impact the effectiveness of the PCA (related to point 1)

5. PCA assumes a linear relationship between attributes. It is ineffective when the relationships are non-linear
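A small scikit-learn sketch of point 1 (the synthetic data is illustrative): without scaling, PCA latches onto the attribute with the largest variance even though it is pure noise.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 500)                      # informative attribute, unit scale
x2 = x1 + rng.normal(0, 0.1, 500)               # strongly correlated with x1
noise = rng.normal(0, 1000, 500)                # uninformative, but huge scale
X = np.column_stack([x1, x2, noise])

pca = PCA(n_components=1).fit(X)
print(pca.components_)                          # ~[0, 0, 1]: the large-scale noise dominates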

Lab-3 Principal Component Analysis on iris data set

Description – Explore the iris data set and perform PCA

The data set is winequality-red.csv

[email protected]
QU1HPBT85A

Sol: PCA-iris.ipynb
Principal Component Analysis (Signal to noise ratio)

Signal – all valid values for a variable, spanning from its minimum to its maximum on each axis (X min to X max, Y min to Y max in the figure). Represents valid data.

Noise – the spread of data points around the best fit line. For a given value of x there are multiple values of y (some on the line and some around it); this spread is due to random factors.

Signal to Noise Ratio – variance of the signal / variance of the noise. The greater the SNR, the better the model will be (see the sketch after this slide).

import pandas as pd
import matplotlib.pyplot as plt

X_std_df = pd.DataFrame(X_std)                  # X_std: standardized data from the lab
axes = pd.plotting.scatter_matrix(X_std_df)     # pairwise scatter plots of the dimensions
plt.tight_layout()
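As a rough numeric illustration of this definition (synthetic data, assumed for the example): fit a line, then compare the variance of the fitted values (signal) to the variance of the residuals (noise).

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2 * x + rng.normal(0, 1.5, 200)             # linear signal plus random spread

slope, intercept = np.polyfit(x, y, 1)          # best fit line
y_fit = slope * x + intercept
snr = np.var(y_fit) / np.var(y - y_fit)         # variance of signal / variance of noise
print('SNR = %.1f' % snr)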

Principal Component Covariance Matrix

1. Variance is measured within the dimensions and covariance is measured among the dimensions

2. Express the total variance (variance within and cross variance between dimensions) as a matrix

3. The covariance matrix is a mathematical representation of the total variance of individual dimensions and across dimensions

Covariance matrix for three dimensions x, y and z:
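The matrix pictured on the slide did not survive extraction; it is presumably the standard covariance matrix

C = \begin{bmatrix} \mathrm{var}(x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{var}(y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{var}(z) \end{bmatrix}

with the variances of x, y and z on the diagonal and the pairwise covariances off the diagonal.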

# cov_matrix: the covariance matrix above, e.g. np.cov(X_std, rowvar=False)
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)


Improving SNR through PCA (Scaling the dimensions)

1. The mean is subtracted from all the points on both dimensions, i.e. (xi – x̄) and (yi – ȳ)

2. The dimensions are transformed using algebra into a new set of dimensions

3. The transformation is a rotation of the axes in mathematical space: in the figure, the 1st principal component lies along the direction of largest variance and the 2nd principal component is orthogonal to it

from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)       # centre and scale each dimension
eig_vals, eig_vecs = np.linalg.eig(cov_matrix)

PCA (Calculating total variance: covariance and variance)

4. Multiplying the two matrices produces the matrix of total variance, also called the covariance matrix (a square, symmetric matrix)
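The figure for this step is missing; the two matrices are presumably the centered data matrix X (n samples × d dimensions) and its transpose, which give the usual sample covariance matrix

C = \frac{1}{n-1}\, X^{\top} X

a d × d square, symmetric matrix.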

[email protected]
QU1HPBT85A

Improving SNR through PCA (Principal components)

5. The original data points are now represented by the red dots on the new dimensions

6. The projection also introduces an error of representation (the vertical red lines from the blue dots to the corresponding red dots on the new dimension); in the figure, the spread along the new axis (X min to X max) is the signal and the spread off the axis is the noise

7. The axis rotation is done such that the new dimension captures the maximum variance in the data points and also reduces the total error of representation

print('Eigen Vectors \n%s' % eig_vecs)
print('\n Eigen Values \n%s' % eig_vals)

Properties of principal components and their covariance matrix

8. Thus, to find the principal components we need to obtain a diagonal matrix from the original covariance matrix

9. For this we have to transform the matrix A into a new matrix B such that the covariance matrix of B is a diagonal matrix (refer to part 2, bullet 5); a numerical check follows below
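A minimal numerical check of this property (toy data, variable names illustrative): projecting the data A onto the eigenvectors of its covariance matrix yields B whose covariance matrix is diagonal.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))   # correlated toy data
A = A - A.mean(axis=0)                                    # centre it

eig_vals, eig_vecs = np.linalg.eig(np.cov(A, rowvar=False))
B = A @ eig_vecs                                          # transform A to B

print(np.round(np.cov(B, rowvar=False), 6))               # diagonal: covariances are ~0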

PCA for dimensionality reduction

1. PCA can also be used to reduce dimensions

2. Arrange all eigen vectors along with their corresponding eigen values in descending order of eigen values

3. Plot a cumulative eigen value graph as shown below

4. Eigen vectors with an insignificant contribution to the total eigen values can be removed from the analysis (e.g. eigen vectors 6 and 7 below)
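The cumulative graph itself is missing; a sketch of how it is typically produced (the eigen values here are illustrative placeholders, matplotlib assumed):

import numpy as np
import matplotlib.pyplot as plt

eig_vals = np.array([4.2, 2.5, 1.1, 0.6, 0.4, 0.15, 0.05])   # sorted descending
cumulative = np.cumsum(eig_vals) / eig_vals.sum()

plt.step(range(1, len(cumulative) + 1), cumulative, where='mid')
plt.xlabel('Number of eigen vectors kept')
plt.ylabel('Cumulative share of total eigen value')
plt.show()

# keep only components that together cover, say, 95% of the variance
k = int(np.searchsorted(cumulative, 0.95)) + 1
print('Keep the first %d components' % k)                    # drops eigen vectors 6 and 7 here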

END

Thanks
[email protected]
QU1HPBT85A

Proprietary content. ©Great


ThisLearning.
file is meantAll
forRights
personalReserved. Unauthorized use oronly.
use by [email protected] distribution prohibited
Sharing or publishing the contents in part or full is liable for legal action.
