Data Mining
Lecture # 9
Data Preprocessing
(Ch # 3)
Data Preprocessing
Dimensionality reduction is a part of Data
Preprocessing
Data preprocessing has the following four
major steps:
1. Data cleaning
2. Data integration
3. Data reduction
4. Data Transformation and Discretization
Data Reduction
Obtain a reduced representation of the data set that is
much smaller in volume, yet closely
maintains the integrity of the original data.
Different strategies are:
• Dimensionality Reduction
• Numerosity reduction
• Data Compression
Dimensionality Reduction
(DR)
The process of reducing the number of random
variables or attributes under consideration.
Two very common methods are Wavelet
Transforms and Principal Components
Analysis (PCA)
Numerosity Reduction
Replace the original data volume by alternative,
smaller forms of data representation.
• Parametric methods: regression and log-linear models
• Nonparametric methods: histograms, clustering, sampling and data cube aggregation
Data Compression
Transformations are applied to obtain a reduced or
compressed representation of the original data.
• Lossless: the original data can be reconstructed
from the compressed data without any information loss.
• Lossy: only an approximation of the original data
can be reconstructed.
DR: Principal Components Analysis
(PCA)
Why PCA?
PCA is a useful statistical technique that has
found applications in:
• Face recognition
• Image compression
• Reducing the dimensionality of data
PCA Goal:
Removing Dimensional Redundancy
The major goal of PCA in Data Science and Machine
Learning is to remove the “dimensional redundancy”
from data.
What does that mean?
A typical dataset contains several dimensions (variables) that
may or may not correlate.
Dimensions that correlate vary together.
The information represented by a set of dimensions with high
correlation can be extracted by studying just one dimension
that represents the whole set.
Hence the goal is to reduce the dimensions of a dataset to a
smaller set of representative dimensions that do not correlate.
PCA Goal:
Removing Dimensional Redundancy
[Figure: a data set with 12 dimensions, Dim 1 through Dim 12.]
Analyzing 12-dimensional data is challenging!
PCA Goal:
Removing Dimensional Redundancy
[Figure: the same 12 dimensions, Dim 1 through Dim 12.]
But some dimensions represent redundant
information. Can we "reduce" these?
PCA Goal:
Removing Dimensional Redundancy
Let's assume we have a "PCA black box" that
can reduce the correlating dimensions.
Pass the 12-dimensional data set through the
black box to get a three-dimensional data set.
PCA Goal:
Removing Dimensional Redundancy
Given an appropriate reduction,
analyzing the reduced data set
is much more efficient than
analyzing the original "redundant" data.
[Figure: the 12 dimensions (Dim 1 ... Dim 12) enter the PCA black box
and come out as three dimensions (Dim A, Dim B, Dim C).]
Pass the 12-dimensional data set through the
black box to get a three-dimensional data set.
Mathematics inside PCA Black box: Bases
Let's now give the "black box" a mathematical form.
In linear algebra, the dimensions of a space are described by a linearly
independent set of vectors, called a basis, that spans the space;
i.e., each point in that space is a linear combination of the basis vectors.
E.g., consider the simplest example: the standard basis of $\mathbb{R}^n$,
consisting of the coordinate axes.
Every point in $\mathbb{R}^3$ is a linear combination of the standard basis of $\mathbb{R}^3$:
$e_1 = (1, 0, 0), \quad e_2 = (0, 1, 0), \quad e_3 = (0, 0, 1)$
$(2, 3, 3) = 2\,(1, 0, 0) + 3\,(0, 1, 0) + 3\,(0, 0, 1)$
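A quick numerical check of this linear combination (a small sketch; the variable names are just illustrative):

```python
import numpy as np

e1, e2, e3 = np.eye(3)            # standard basis vectors of R^3
point = 2 * e1 + 3 * e2 + 3 * e3  # linear combination of the basis
print(point)                      # [2. 3. 3.]
```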
PCA Goal: Change of Basis
Assume X is the 6-dimensional data set given as input.
PCA looks for a new basis, i.e. a change of basis, in which to express this data.

Mean, Standard Deviation and Variance
The mean of a data set,
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n},$
doesn't tell us a lot about the data set;
different data sets can have the same mean.
The standard deviation (SD) measures the spread of the data in a data set:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Variance is another measure of
the spread of data in a data set. It is
almost identical to SD (it is the square of the SD):
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
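A small numeric sketch of these three quantities, using the sample versions with n - 1 in the denominator as in the formulas above (the data values are taken from the worked example later in the lecture):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])

mean = x.mean()
std  = x.std(ddof=1)   # sample standard deviation, divides by n - 1
var  = x.var(ddof=1)   # sample variance, divides by n - 1

print(mean, std, var)  # var == std ** 2
```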
Covariance
SD and variance are 1-dimensional measures.
1-D data sets could be:
• Heights of all the people in the room
• Salaries of employees in a company
• Marks in a quiz
However, many data sets have more than one dimension.
Our aim is to find any relationship between different dimensions,
e.g. finding the relationship between students' results and their hours of study.
Covariance is used to measure the relationship between two dimensions:
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
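A brief sketch of computing this sample covariance with NumPy; the hours/marks values here are hypothetical, used only to illustrate the formula.

```python
import numpy as np

hours = np.array([9, 15, 25, 14, 10, 18, 0, 16, 5, 19])    # hypothetical study hours
marks = np.array([39, 56, 93, 61, 50, 75, 32, 85, 42, 70]) # hypothetical marks

# Manual sample covariance, matching the formula above
cov_manual = ((hours - hours.mean()) * (marks - marks.mean())).sum() / (len(hours) - 1)

# np.cov returns the full 2x2 covariance matrix; cov(H, M) is an off-diagonal entry
cov_np = np.cov(hours, marks)[0, 1]

print(cov_manual, cov_np)  # both positive: hours and marks increase together
```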
Covariance Interpretation
We have a data set of students' study hours (H) and marks
achieved (M).
We find cov(H, M).
The exact value of the covariance is not as important as its sign
(i.e. positive or negative):
+ve: both dimensions increase together
-ve: as one dimension increases, the other decreases
Zero: there exists no relationship between the two dimensions
Covariance Matrix
Covariance is always measured between 2 dimensions.
What if we have a data set with more than 2 dimensions?
We have to calculate more than one covariance measurement.
E.g. from a 3-dimensional data set (dimensions x, y, z)
we could calculate cov(x,y), cov(x,z) and cov(y,z).
Covariance Matrix
We can use the covariance matrix to hold the
covariances of all the possible pairs of dimensions.
Since cov(a,b) = cov(b,a),
the matrix is symmetrical about the main diagonal.
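A small sketch showing the covariance matrix of a 3-dimensional data set with NumPy; the data is random and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))   # 50 samples of a 3-dimensional data set (x, y, z)

# rowvar=False: each column is a dimension, each row is an observation
C = np.cov(data, rowvar=False)

print(C.shape)              # (3, 3): one entry per pair of dimensions
print(np.allclose(C, C.T))  # True: symmetric, since cov(a, b) == cov(b, a)
```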
Eigenvectors and Eigenvalues
More formally defined:
Let A be an n×n matrix. A nonzero vector v that satisfies
$A v = \lambda v$
for some scalar $\lambda$ is called an eigenvector of A,
and $\lambda$ is the eigenvalue of A corresponding to the
eigenvector v.
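A minimal NumPy sketch verifying the definition $A v = \lambda v$ on a small example matrix; the matrix itself is arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are the v's

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    print(np.allclose(A @ v, lam * v))         # True: A v == lambda v
```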
Principal Components Analysis
(PCA)
PCA is a method to identify a
new set of predictors, as linear
combinations of the original
ones, that captures the
'maximum amount' of
variance in the observed data.
It is also a technique for identifying patterns in data,
and for expressing data in such a way as to
highlight similarities and differences.
PCA is used to reduce the dimensionality of data
without losing the integrity of the information.
Principal Components Analysis
(PCA)
Definition
Principal Components Analysis (PCA) produces a list of
p principal components (Y1, . . . , Yp) such that:
• Each Yi is a linear combination of the original
predictors, and its vector norm is 1.
• The Yi's are pairwise orthogonal.
• The Yi's are ordered in decreasing order of the
amount of observed variance they capture.
That is, the observed data shows more variance
in the direction of Y1 than in the direction of Y2.
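A brief sketch checking these three properties with scikit-learn's PCA; the random 5-dimensional data is an assumption for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))

pca = PCA().fit(X)
components = pca.components_                  # rows are the principal directions Yi

print(np.linalg.norm(components, axis=1))     # each close to 1 (unit norm)
print(np.allclose(components @ components.T,
                  np.eye(5)))                 # True: pairwise orthogonal
print(pca.explained_variance_)                # ordered from largest to smallest
```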
Step 2: Subtract the mean from each of the data points
Step 1 & Step 2 (the column means are mean(X1) = 1.81, mean(X2) = 1.91):

 X1     X2    X1 - mean   X2 - mean   (X1 - mean)^2   (X2 - mean)^2
 2.5    2.4      0.69        0.49        0.4761          0.2401
 0.5    0.7     -1.31       -1.21        1.7161          1.4641
 2.2    2.9      0.39        0.99        0.1521          0.9801
 1.9    2.2      0.09        0.29        0.0081          0.0841
 3.1    3.0      1.29        1.09        1.6641          1.1881
 2.3    2.7      0.49        0.79        0.2401          0.6241
 2.0    1.6      0.19       -0.31        0.0361          0.0961
 1.0    1.1     -0.81       -0.81        0.6561          0.6561
 1.5    1.6     -0.31       -0.31        0.0961          0.0961
 1.1    0.9     -0.71       -1.01        0.5041          1.0201
Sum: 18.1   19.1     0           0           5.549           6.449
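A sketch of steps 1 and 2 in NumPy, using the data from the table above:

```python
import numpy as np

# Step 1: the example data (columns X1, X2)
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Step 2: subtract the mean of each column from every data point
mean = data.mean(axis=0)        # [1.81, 1.91]
adjusted = data - mean          # mean-adjusted data, each column now sums to 0

print(adjusted.sum(axis=0))     # ~[0, 0]
```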
Step 3: Calculate the covariance matrix
$\text{cov} = \begin{pmatrix} 0.616555556 & 0.615444444 \\ 0.615444444 & 0.716555556 \end{pmatrix}$
Step 4: Calculate the eigenvalues and
eigenvectors of the covariance matrix
using the following equation:
$\det(\text{cov} - \lambda I) = 0$
where $\lambda$ is an eigenvalue and $I$ is the identity matrix;
each eigenvector $v$ then satisfies $(\text{cov} - \lambda I)\, v = 0$.
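Continuing the sketch, steps 3 and 4 in NumPy; the printed values should match the covariance matrix above up to rounding.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
adjusted = data - data.mean(axis=0)

# Step 3: covariance matrix (rowvar=False -> columns are the dimensions)
C = np.cov(adjusted, rowvar=False)

# Step 4: eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(C)

print(C)            # approx [[0.6166, 0.6154], [0.6154, 0.7166]]
print(eigenvalues)  # approx [0.0491, 1.2840] (order is not guaranteed)
```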
What does this all mean?
[Figure: plot of the mean-adjusted data points with the two eigenvectors of the covariance matrix drawn over them.]
Conclusion
Eigenvectors give us information about the
patterns in the data.
Looking at the graph on the previous slide, see
how one of the eigenvectors goes through the
middle of the points.
The second eigenvector tells us about another,
weaker pattern in the data.
So by finding the eigenvectors of the covariance
matrix we are able to extract the lines that
characterize the data.
Step 5: Choosing components and forming a feature
vector
The eigenvector with the highest eigenvalue is the
principal component of the data set.
In our example, the eigenvector with the
largest eigenvalue was the one that pointed
down the middle of the data.
So, once the eigenvectors are found, the
next step is to order them by eigenvalue,
highest to lowest.
This gives the components in order of
significance.
Cont’d
Now, here comes the idea of dimensionality
reduction and data compression:
You can decide to ignore the components of
least significance.
You do lose some information, but if the
eigenvalues are small you don't lose much.
More formally stated (see next slide):
Cont’d
We have n dimensions,
so we will find n eigenvectors.
But if we choose only the first p eigenvectors,
then the final data set has only p dimensions (see the sketch below).
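A sketch of step 5: ordering the eigenvectors by eigenvalue and keeping the top p (here p = 1, as in "Choice-2" on the next slide). The numeric values are the approximate results of the worked example; the sign of each eigenvector is arbitrary.

```python
import numpy as np

# Eigenvalues and eigenvectors as computed in step 4 (approximate values)
eigenvalues = np.array([0.0490834, 1.2840277])
eigenvectors = np.array([[-0.7352, -0.6778],
                         [ 0.6778, -0.7352]])   # columns correspond to the eigenvalues

# Order components by eigenvalue, highest first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep only the first p components to form the feature vector
p = 1
feature_vector = eigenvectors[:, :p]
print(feature_vector)   # approx [[-0.6778], [-0.7352]]
```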
Step 6: Deriving the new dataset
Now, we have chosen the components (eigenvectors) that
we want to keep. We can write them in the form of a matrix
of vectors (a feature vector). In our example we have two
eigenvectors, so we have two choices:
Choice-1: with two eigenvectors
$\begin{pmatrix} 0.7351 & 0.6778 \\ 0.6778 & 0.7351 \end{pmatrix}$
Choice-2: with one eigenvector, i.e. the first eigenvector only
$\begin{pmatrix} 0.6778 \\ 0.7351 \end{pmatrix}$
Cont'd
To obtain the final data set, we multiply the
transpose of the chosen feature vector with the
transpose of the mean-adjusted data matrix, i.e.
$\text{FinalData} = \text{FeatureVector}^{T} \times \text{MeanAdjustedData}^{T}$
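A sketch of step 6, projecting the mean-adjusted example data onto the chosen component (continuing the earlier sketches; as before, the sign of the component is arbitrary).

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
adjusted = data - data.mean(axis=0)

# Feature vector for Choice-2: keep only the principal eigenvector
feature_vector = np.array([[0.6778],
                           [0.7351]])

# FinalData = FeatureVector^T x MeanAdjustedData^T -> one row per kept component
final_data = feature_vector.T @ adjusted.T

print(final_data.shape)   # (1, 10): each original point is now a single value
```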
Summary of PCA Steps
Step 2: Subtract the mean
Step 3: Calculate the covariance matrix
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
Step 5: Choose components and form a feature vector