
Lecture 9 - Data Reduction



Data Preprocessing
- Data Reduction
Data Preprocessing

• Data Preprocessing: An Overview
• Data Quality
• Major Tasks in Data Preprocessing
• Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation and Data Discretization

2
Data Reduction Strategies

• Data reduction: Obtain a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results
• Why data reduction? A database/data warehouse may store terabytes of data. Complex data analysis may take a very long time to run on the complete data set.
• Data reduction strategies
  • Dimensionality reduction, e.g., remove unimportant attributes
  • Numerosity reduction (some simply call it: Data Reduction)
  • Data compression

3
Data Reduction Strategies

• Data reduction strategies
  • Dimensionality reduction, e.g., remove unimportant attributes
    • Wavelet transforms
    • Principal Components Analysis (PCA)
    • Feature subset selection, feature creation
  • Numerosity reduction (some simply call it: Data Reduction)
    • Regression and Log-Linear Models
    • Histograms, clustering, sampling
    • Data cube aggregation
  • Data compression

4
Data Reduction: Dimensionality Reduction

• Curse of dimensionality
  • When dimensionality increases, data becomes increasingly sparse
  • Density and distance between points, which are critical to clustering and outlier analysis, become less meaningful
• Dimensionality reduction
  • Avoids the curse of dimensionality
  • Helps eliminate irrelevant features and reduce noise
  • Reduces the time and space required in data mining
  • Allows easier visualization
• Dimensionality reduction techniques
  • Wavelet transforms
  • Principal Component Analysis
  • Supervised and nonlinear techniques (e.g., feature selection)

5
Visualization Problem
• It is not easy to visualize multivariate data
  • 1D: dot
  • 2D: bivariate plot (i.e., the X-Y plane)
  • 3D: X-Y-Z plot
  • 4D: ternary plot with a color code / tetrahedron
  • 5D, 6D, etc.: ???
Motivation

• Given data points in d dimensions


• Convert them to data points in r<d dimensions
• With minimal loss of information
Basics of PCA
• PCA is useful when we need to extract meaningful information from multivariate data sets.
• The technique works by reducing the dimensionality of the data.


What is a Principal Component?

• A principal component can be defined as a linear combination of optimally weighted observed variables.
What are the new axes?

[Figure: PC1 and PC2 axes drawn over the original variables A and B]

• Orthogonal directions of greatest variance in data


• Projections along PC1 discriminate the data most along any one axis
Principal Component Analysis

PCA: orthogonal projection of the data onto a lower-dimensional linear space that...
• maximizes the variance of the projected data (purple line)
• minimizes the mean squared distance between each data point and its projection (sum of blue lines)

14
The Principal Components
• Vectors originating from the center of mass
• Principal component #1 points in the direction of the largest variance.
• Each subsequent principal component…
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace

15
2D Gaussian dataset

16
1st PCA axis

17
2nd PCA axis

18
Principal component analysis
• Principal component analysis (PCA) is a procedure that uses the correlations between the variables to identify which combinations of variables capture the most information about the dataset.

• Mathematically, it determines the eigenvectors of the covariance matrix and sorts them by importance according to their corresponding eigenvalues.
Basics for Principal Component Analysis

• Orthogonal/Orthonormal

• Standard deviation, Variance, Covariance

• The Covariance matrix

• Eigenvalues and Eigenvectors


Covariance

• Standard deviation and variance are 1-dimensional

• How much do the dimensions vary from the mean with respect to each other?

• Covariance is measured between 2 dimensions:

  $cov(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$

• We can easily see that if X = Y, we end up with the variance


Covariance Matrix

• Let X = (X₁, ..., X_p)ᵀ be a random vector.

• Then the covariance matrix of X, denoted by Cov(X), is the matrix whose (i, j)-th entry is $cov(X_i, X_j)$.

• The diagonals of Cov(X) are the variances $var(X_i)$.

• In matrix notation, $Cov(X) = E[(X - E[X])(X - E[X])^T]$.

• The covariance matrix is symmetric (see the sketch below).
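As an illustration (not from the slides), here is a minimal numpy sketch that builds a covariance matrix for a small made-up dataset and checks the two properties above: the diagonal holds the variances and the matrix is symmetric.

```python
import numpy as np

# Made-up data: 3 variables (rows) observed 5 times (columns).
X = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
              [1.0, 3.0, 2.0, 5.0, 4.0],
              [9.0, 7.0, 5.0, 3.0, 1.0]])

# Covariance matrix with the 1/n convention (bias=True divides by n).
C = np.cov(X, bias=True)

print(C)
print(np.allclose(C, C.T))                     # symmetric
print(np.allclose(np.diag(C), X.var(axis=1)))  # diagonal entries are the variances
```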


Orthogonality/Orthonormality

[Figure: the unit vectors v1 = (1, 0) and v2 = (0, 1) plotted in the plane]

Example: ⟨v1, v2⟩ = ⟨(1, 0), (0, 1)⟩ = 0

• Two vectors v1 and v2 for which <v1,v2>=0 holds are said to be orthogonal

• Unit vectors which are orthogonal are said to be orthonormal.
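A quick numpy check of these two definitions (illustrative only):

```python
import numpy as np

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])

print(np.dot(v1, v2))                          # 0.0 -> v1 and v2 are orthogonal
print(np.linalg.norm(v1), np.linalg.norm(v2))  # both 1.0 -> they are also orthonormal
```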


Eigenvalues/Eigenvectors

• Let A be an n×n square matrix and x an n×1 column vector. Then a (right) eigenvector of A is a nonzero vector x such that

  $A x = \lambda x$,   where $\lambda$ is the eigenvalue and $x$ is the eigenvector.

Procedure:
• Find the eigenvalues: solve the characteristic equation $\det(A - \lambda I) = 0$ for the lambdas.
• Find the corresponding eigenvectors (see the sketch below).

Transformation

• We are looking for a transformation of the data matrix X (p×n) such that

  $Y = a^T X = a_1 X_1 + a_2 X_2 + \dots + a_p X_p$

Transformation

What is a reasonable choice for the weights $a$?

Remember: we wanted a transformation that maximizes information. That means it should capture the variance in the data.

Maximize the variance of the projection of the observations on the Y variables!

Find $a$ such that $Var(a^T X)$ is maximal.

The matrix $C = Var(X)$ is the covariance matrix of the $X_i$ variables.
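To make the goal concrete, here is a small illustration of my own: among unit-length weight vectors a, the projection variance Var(aᵀX) = aᵀCa is largest when a points along the leading eigenvector of C.

```python
import numpy as np

rng = np.random.default_rng(0)
# 2 variables, 500 observations (columns), drawn from a made-up distribution.
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 1]], size=500).T

C = np.cov(X, bias=True)
eigvals, eigvecs = np.linalg.eigh(C)
a_best = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue

for _ in range(3):                             # random unit-length directions
    a = rng.normal(size=2)
    a /= np.linalg.norm(a)
    print("random a:      ", a @ C @ a)
print("leading eigvec:", a_best @ C @ a_best)  # the largest value = max eigenvalue
```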


Transformation
Can we intuitively see that in a picture?

[Figure: two candidate projection directions, one labeled "Good" and one labeled "Better"]

$$Cov(X) = \begin{pmatrix} v(x_1) & c(x_1, x_2) & \cdots & c(x_1, x_p) \\ c(x_1, x_2) & v(x_2) & \cdots & c(x_2, x_p) \\ \vdots & \vdots & \ddots & \vdots \\ c(x_1, x_p) & c(x_2, x_p) & \cdots & v(x_p) \end{pmatrix}$$
PCA algorithm
(based on sample covariance matrix)
• Given data {x₁, …, x_m}, compute the covariance matrix Σ:

  $\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x_i - \bar{x})(x_i - \bar{x})^T$   where   $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$

• PCA basis vectors = the eigenvectors of Σ

• Larger eigenvalue ⇒ more important eigenvector (a sketch follows below)

29
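A minimal numpy sketch of this algorithm; the function name and return convention are my own, not from the lecture.

```python
import numpy as np

def pca_basis(X):
    """PCA basis vectors from the sample covariance matrix.

    X: m x d array (m data points, d features).
    Returns the eigenvalues in decreasing order and the matching
    eigenvectors as columns.
    """
    x_bar = X.mean(axis=0)              # sample mean
    Phi = X - x_bar                     # centered data
    Sigma = (Phi.T @ Phi) / len(X)      # covariance matrix, 1/m convention
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]   # larger eigenvalue = more important
    return eigvals[order], eigvecs[:, order]
```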
PCA – zero mean
• Suppose we are given x₁, x₂, ..., x_M (N × 1 vectors), where N = number of features and M = number of data points.

Step 1: compute the sample mean

  $\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$

Step 2: subtract the sample mean (i.e., center the data at zero)

  $\Phi_i = x_i - \bar{x}$

Step 3: compute the sample covariance matrix Σx

  $\Sigma_x = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{M}\sum_{i=1}^{M}\Phi_i \Phi_i^T = \frac{1}{M} A A^T$

  where A = [Φ₁ Φ₂ ... Φ_M] is the N × M matrix whose columns are the Φᵢ.

30
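As a quick sanity check of my own, the (1/M)·AAᵀ form above matches numpy's biased covariance estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 10))          # N = 3 features, M = 10 data points (columns)

x_bar = X.mean(axis=1, keepdims=True)
A = X - x_bar                         # the columns of A are the centered vectors Phi_i
Sigma_x = (A @ A.T) / X.shape[1]      # (1/M) * A A^T

print(np.allclose(Sigma_x, np.cov(X, bias=True)))  # True
```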
PCA - Steps
Step 4: compute the eigenvalues/eigenvectors of Σx

  $\Sigma_x u_i = \lambda_i u_i$,   where we assume $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N$

Note: most software packages return the eigenvalues (and corresponding eigenvectors) in decreasing order; if not, you can explicitly put them in this order.

Since Σx is symmetric, u₁, u₂, …, u_N form an orthogonal basis in Rᴺ and we can represent any x ∈ Rᴺ as:

  $x - \bar{x} = \sum_{i=1}^{N} y_i u_i = y_1 u_1 + y_2 u_2 + \dots + y_N u_N$

  where   $y_i = \frac{(x - \bar{x})^T u_i}{u_i^T u_i} = (x - \bar{x})^T u_i$   if $\|u_i\| = 1$

i.e., this is just a "change" of basis: the coordinates $(x_1, \dots, x_N)$ of x become the coordinates $(y_1, \dots, y_N)$ (see the sketch below).

Note: most software packages normalize the uᵢ to unit length to simplify calculations; if not, you can explicitly normalize them.

31
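A small numpy sketch of this change of basis (my own illustration with made-up data): projecting onto the unit eigenvectors and mapping back recovers x exactly when all N components are kept.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))          # 100 points, N = 4 features
x_bar = X.mean(axis=0)

Sigma = ((X - x_bar).T @ (X - x_bar)) / len(X)
eigvals, U = np.linalg.eigh(Sigma)     # columns of U are unit-length eigenvectors u_i
U = U[:, np.argsort(eigvals)[::-1]]    # decreasing eigenvalue order

x = X[0]
y = U.T @ (x - x_bar)                  # y_i = (x - x_bar)^T u_i
x_back = x_bar + U @ y                 # change of basis back; lossless when K = N
print(np.allclose(x, x_back))          # True
```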
PCA - Steps
Step 5: dimensionality reduction step – approximate x using only the first K eigenvectors (K << N), i.e., those corresponding to the K largest eigenvalues, where K is a parameter:

  $\hat{x} = \bar{x} + \sum_{i=1}^{K} y_i u_i$   (a sketch follows below)

32
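Continuing in the same illustrative style, truncating to the first K eigenvectors gives the approximation x̂ (names and data are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))          # N = 4 features
x_bar = X.mean(axis=0)

Sigma = ((X - x_bar).T @ (X - x_bar)) / len(X)
eigvals, U = np.linalg.eigh(Sigma)
U = U[:, np.argsort(eigvals)[::-1]]

K = 2                                  # keep only the K most important eigenvectors
x = X[0]
y = U[:, :K].T @ (x - x_bar)           # K projection coefficients y_1, ..., y_K
x_hat = x_bar + U[:, :K] @ y           # approximate reconstruction of x
print(np.linalg.norm(x - x_hat))       # reconstruction error; zero only when K = N
```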
Example
• Compute the PCA of the following dataset:

(1,2),(3,3),(3,5),(5,4),(5,6),(6,5),(8,7),(9,8)

• Compute the sample covariance matrix (see the sketch below):

• The eigenvalues can be computed by finding the roots of the characteristic polynomial:

33
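An illustrative numpy computation for this dataset (my own code; it uses the 1/M convention from the earlier slides, so the lecture's numbers may differ by a constant factor if it divides by M−1 instead):

```python
import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)

x_bar = X.mean(axis=0)                      # (5.0, 5.0)
Phi = X - x_bar
Sigma = (Phi.T @ Phi) / len(X)              # [[6.25, 4.25], [4.25, 3.5]]

# Eigenvalues are the roots of the characteristic polynomial det(Sigma - lambda*I) = 0.
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)                              # eigenvalues in ascending order
print(eigvecs)                              # corresponding unit-length eigenvectors (columns)
```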
Example (cont’d)
• The eigenvectors are the solutions of the systems:

  $\Sigma_x u_i = \lambda_i u_i$

Note: if uᵢ is a solution, then c·uᵢ is also a solution for any c ≠ 0.

Eigenvectors can be normalized to unit length using:

  $\hat{v}_i = \frac{v_i}{\|v_i\|}$
34
Choosing the projection dimension K?

• K is typically chosen based on how much information (variance) we want to preserve:

  Choose the smallest K that satisfies   $\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \geq T$,   where T is a threshold (e.g., 0.9).

• If T = 0.9, for example, we “preserve” 90% of the information (variance) in the data.

• If K = N, then we “preserve” 100% of the information in the data (i.e., it is just a “change” of basis and $\hat{x} = x$). A sketch follows below.

35
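A minimal sketch of this rule (my own helper; eigvals is assumed to hold the eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λ_N):

```python
import numpy as np

def choose_k(eigvals, T=0.9):
    """Smallest K whose leading eigenvalues capture at least a fraction T of the variance."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, T) + 1)

eigvals = np.array([5.0, 2.5, 1.5, 0.7, 0.3])  # made-up eigenvalues, decreasing
print(choose_k(eigvals, T=0.9))                # -> 3, since (5.0 + 2.5 + 1.5) / 10.0 = 0.9
```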
Data Normalization

• The principal components depend on the units used to measure the original variables as well as on the range of values they assume.

• Data should always be normalized prior to using PCA.

• A common normalization method is to transform all the data to have zero mean and unit standard deviation:

  $\frac{x_i - \mu}{\sigma}$,   where μ and σ are the mean and standard deviation of the i-th feature xᵢ (a sketch follows below)

36
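A short illustrative snippet of this zero-mean, unit-variance normalization applied before PCA (variable names and data are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
# Two features measured on very different scales.
X = rng.normal(loc=[10.0, 200.0], scale=[1.0, 50.0], size=(100, 2))

mu = X.mean(axis=0)
sigma = X.std(axis=0)
Z = (X - mu) / sigma              # each feature now has zero mean and unit std

print(Z.mean(axis=0).round(6))    # approximately [0, 0]
print(Z.std(axis=0).round(6))     # [1, 1]
```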
