
Machine Learning: Foundation and Applications

Dimensionality Reduction

Plaban Kumar Bhowmick


Introduction to Dimensionality Reduction

 Curse of Dimensionality
 Large number of input features may lead to poor performance

 Dimensionality Reduction
 Reduce the number of redundant and noisy features
Dimensionality Reduction: Why

- Classifiers already select a good set of features

- But we still need dimensionality reduction
  - Complexity may depend on the input size
    ▪ Discarding irrelevant features reduces memory and computation
  - It reduces the cost of feature extraction
  - Simpler models are more robust (low variance) on small datasets
  - The data can be explained by fewer (hidden or latent) features
  - The data can be visualized in lower dimensions (e.g., t-SNE)
Dimensionality Reduction Methods

- Feature Selection
  - Filtering and wrapper-based methods

- Feature Extraction Methods
  - Singular Value Decomposition (SVD)
  - Principal Component Analysis (PCA)
  - Linear Discriminant Analysis (LDA)

- Neural network-based methods
  - Autoencoder
Feature Selection
Subset Selection

- Find the best subset of the set of features
  - The smallest number of dimensions that contribute most to the accuracy
  - Total number of possible subsets of d features: 2^d
    ▪ Infeasible to check them all
  - Instead, iteratively select and test subsets
    ▪ Stop when a performance criterion is met
    ▪ Testing is performed on a validation set
    ▪ F denotes a feature set of input dimensions x_i, i = 1, ..., d
    ▪ E(F) is the error incurred on the validation sample when only the features in F are used
- Sequential Forward Selection
- Sequential Backward Selection
Forward Selection (Wrapper Method)

- Start with no features: F = ∅

- At each step
  - For each input dimension x_i not in F:
    ▪ Train the model on the training set with F ∪ {x_i}
    ▪ Calculate E(F ∪ {x_i}) on the validation set
  - Add x_j to F if E(F ∪ {x_j}) = min_i E(F ∪ {x_i}) and E(F ∪ {x_j}) < E(F)
  - Stop
    ▪ If adding any feature does not decrease E, or
    ▪ If the decrease in error is too small, or
    ▪ If we have reached the desired performance level
Backward Selection

- Start with all features: F = {x_1, ..., x_d}

- At each step
  - For each x_i in F:
    ▪ Train the model on the training set with F − {x_i}
    ▪ Calculate E(F − {x_i}) on the validation set
  - Remove x_j from F if E(F − {x_j}) = min_i E(F − {x_i}) and removing it does not increase the error (or increases it only slightly)
  - Stop
    ▪ If removing any feature increases E significantly, or
    ▪ If we have reached the desired number of features

(A code sketch of forward selection follows below.)
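Below is a minimal sketch of sequential forward selection with a scikit-learn-style estimator. The names estimator, X_train, y_train, X_val, y_val and the accuracy-based error are illustrative assumptions, not part of the original slides; backward selection is the mirror image, starting from the full feature set and greedily removing features.

import numpy as np

def forward_selection(estimator, X_train, y_train, X_val, y_val, min_decrease=1e-3):
    """Greedy sequential forward selection (wrapper method) -- illustrative sketch."""
    n_features = X_train.shape[1]
    selected = []                 # F: the current feature subset
    best_err = np.inf             # E(F): validation error of the current subset
    while len(selected) < n_features:
        errs = {}
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]                       # candidate subset F u {x_j}
            estimator.fit(X_train[:, cols], y_train)
            errs[j] = 1.0 - estimator.score(X_val[:, cols], y_val)  # validation error
        j_best = min(errs, key=errs.get)
        if best_err - errs[j_best] < min_decrease:
            break                 # no candidate feature decreases the error enough
        selected.append(j_best)
        best_err = errs[j_best]
    return selected, best_err

For example, selected, err = forward_selection(LogisticRegression(max_iter=1000), X_tr, y_tr, X_va, y_va) would return the chosen column indices and the final validation error.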
Feature Extraction Methods
Singular Value Decomposition (SVD)
Use Case: Information Retrieval

Query Q: "a good physicist"
Document collection:
- "Feynman was a good physicist."  → high similarity
- "Newton was a British physicist" → low similarity

- The query vector Q and the document vectors D_i are very high-dimensional
  o hundreds of millions of dimensions
- Each vector is very sparse
  o most entries are zero

cos(Q, D_i) = (Q · D_i) / (|Q| |D_i|)

Can we reduce the dimensions of the query and document vectors so that they capture their semantics (meaning)? (See the sketch below.)
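For concreteness, here is a small sketch of the cosine score above over a made-up bag-of-words vocabulary (the vocabulary and the vectors are illustrative assumptions, not taken from the slides):

import numpy as np

def cosine(q, d):
    """cos(q, d) = (q . d) / (|q| |d|)"""
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d) / denom if denom else 0.0

# toy vocabulary: [a, good, physicist, feynman, newton, was, british]
q  = np.array([1, 1, 1, 0, 0, 0, 0])   # query: "a good physicist"
d1 = np.array([1, 1, 1, 1, 0, 1, 0])   # "Feynman was a good physicist."
d2 = np.array([1, 0, 1, 0, 1, 1, 1])   # "Newton was a British physicist"
print(cosine(q, d1), cosine(q, d2))    # d1 scores higher than d2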
Vector Space Model
Problems with VSM

- Two problems with the VSM:

  - Synonymy
    ▪ car and automobile
    ▪ A query containing "car" does not match a document containing "automobile"
  - Polysemy
    ▪ Words have more than one meaning
    ▪ bank, python
    ▪ A query may refer to "python" as a language while the document refers to the snake
The Problem
Example: Vector Space Model (from Lillian Lee)

- Synonymy: one document uses {auto, engine, bonnet, tyres, lorry, boot} and another uses {car, emissions, hood, make, model, trunk}. The two documents share no terms, so they will have a small cosine, but they are related.
- Polysemy: a document about cars ({car, emissions, hood, make, model, trunk}) and a document about machine learning ({hidden, Markov, model, emissions, normalize, learning, make}) share terms such as "model" and "make", so they will have a large cosine, but they are not truly related.
Motivating Example

Query Q = "car model" (query vector: automobile 0, car 1, model 1, learning 0)

              d1  d2  d3  d4  d5  d6
relevance      R   R   R   R  NR  NR
automobile     1   1   0   1   0   0
car            1   0   1   1   0   0
model          1   1   1   2   1   1
learning       0   0   0   1   1   1
score Q·d      2   1   2   3   1   1

d2 is relevant but scores low because it says "automobile" instead of "car" (synonymy); d5 and d6 are not relevant but still score 1 because "model" also occurs in machine-learning documents (polysemy).
LSA: Motivating Example

Let us do this with linear algebra. Suppose the matrix is modified so that "automobile" and "car" get identical rows:

Query Q = "car model"

              d1  d2  d3  d4  d5  d6
relevance      R   R   R   R  NR  NR
automobile     1   1   1   1   0   0
car            1   1   1   1   0   0
model          1   1   1   2   1   1
learning       0   0   0   1   1   1
score Q·d      2   2   2   3   1   1

Now every relevant document scores higher than the non-relevant ones.
Bit of Linear Algebra: Rank

              d1  d2  d3  d4  d5  d6
automobile     1   1   0   1   0   0
car            1   0   1   1   0   0
model          1   1   1   2   1   1
learning       0   0   0   1   1   1

Let C be an m × n matrix.
rank(C) = the number of linearly independent rows or columns.

The rank of this matrix is 4.
Bit of Linear Algebra: Rank

Let x1 = (1, 1, 1, 1, 0, 0) and x2 = (0, 0, 0, 1, 1, 1).

              d1  d2  d3  d4  d5  d6    combination
automobile     1   1   1   1   0   0    x1
car            1   1   1   1   0   0    x1
model          1   1   1   2   1   1    x1 + x2
learning       0   0   0   1   1   1    x2

Rank of this matrix is 2
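A quick numerical check of the two rank claims with numpy (the matrices are copied from the two slides above):

import numpy as np

C1 = np.array([[1, 1, 0, 1, 0, 0],   # automobile
               [1, 0, 1, 1, 0, 0],   # car
               [1, 1, 1, 2, 1, 1],   # model
               [0, 0, 0, 1, 1, 1]])  # learning

C2 = np.array([[1, 1, 1, 1, 0, 0],
               [1, 1, 1, 1, 0, 0],
               [1, 1, 1, 2, 1, 1],
               [0, 0, 0, 1, 1, 1]])

print(np.linalg.matrix_rank(C1))  # 4
print(np.linalg.matrix_rank(C2))  # 2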


Rank Leads to Dimensionality

Take the basis x1 = (1, 1, 0) and x2 = (0, 0, 1) over the three terms (car, automobile, learning):

- D1, D2, D3: (1, 1, 0) in term space → (1, 0) in the 2-D concept space
- D4:         (1, 1, 1) in term space → (1, 1)
- D5, D6:     (0, 0, 1) in term space → (0, 1)
Matrix Factorization

The rank-2 matrix factors as a product of a term-concept matrix and a concept-document matrix:

1 1 1 1 0 0       1 0
1 1 1 1 0 0   =   1 0   ×   1 1 1 1 0 0
1 1 1 2 1 1       1 1       0 0 0 1 1 1
0 0 0 1 1 1       0 1

- Rows of the left factor (terms × concepts): automobile, car, model, learning
- Concepts: "car model" and "machine learning model"
- The concept space is a lower-dimensional representation of the documents: this is the idea behind Latent Semantic Analysis
First Step towards LSA

C (rank 4):              C′ (rank 2), an approximation of C:

1 1 0 1 0 0              1 1 1 1 0 0
1 0 1 1 0 0       ≈      1 1 1 1 0 0
1 1 1 2 1 1              1 1 1 2 1 1
0 0 0 1 1 1              0 0 0 1 1 1
Criteria for Approximation?

- Given: an m × n term-document matrix C and a target rank k

- Find: a matrix C_k of (column) rank at most k such that C_k is as close to C as possible

- Closeness is measured by the Frobenius norm of X = C − C_k
  ▪ ||X||_F = sqrt( Σ_i Σ_j X_ij² )
Eigenvectors and Eigenvalues

- A real symmetric m × m matrix A has m eigenvectors (u_1, ..., u_m) and m eigenvalues (λ_1, ..., λ_m)
- The eigenvectors (u_i) are pairwise orthogonal
  ▪ u_i · u_j = 0 for i ≠ j
- The eigenvectors are normal (unit length)

- How to compute λ and u?
  ▪ Solve det(A − λI) = 0 for the eigenvalues λ
  ▪ Use (A − λI) u = 0 to determine the eigenvectors u

Eigenvalue and Eigenvector

Example: A = [[6, 5], [5, 6]]
Eigenvalues: det(A − λI) = (6 − λ)² − 25 = 0  ⇒  λ1 = 11, λ2 = 1
Eigenvectors: (A − 11I) x = 0 ⇒ x1 = [1, 1]^⊤;  (A − I) x = 0 ⇒ x2 = [1, −1]^⊤
Eigenvector Decomposition

- Eigenvector Decomposition (EVD) or Matrix Diagonalization (symmetric matrix): A = U Λ U^⊤

A = [[6, 5], [5, 6]],   λ1 = 11, λ2 = 1,   x1 = [1, 1]^⊤,   x2 = [1, −1]^⊤

A = [[6, 5], [5, 6]] = (1/√2)[[1, 1], [1, −1]] · [[11, 0], [0, 1]] · (1/√2)[[1, 1], [1, −1]]
                               U                        Λ                      U^⊤
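The same decomposition can be reproduced with numpy.linalg.eigh, the standard routine for symmetric matrices (note that it returns eigenvalues in ascending order and may flip eigenvector signs):

import numpy as np

A = np.array([[6.0, 5.0],
              [5.0, 6.0]])

vals, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order: [1, 11]
print(vals)                           # [ 1. 11.]
print(vecs)                           # columns ~ [1, -1]/sqrt(2) and [1, 1]/sqrt(2)
print(vecs @ np.diag(vals) @ vecs.T)  # reconstructs A = U Lambda U^T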
Singular Value Decomposition

- Term-document matrices (C) are generally not square and symmetric

- Singular value decomposition (SVD)
  - Let U be the m × m matrix whose columns are the orthogonal eigenvectors of C C^⊤
  - Let V be the n × n matrix whose columns are the orthogonal eigenvectors of C^⊤ C

Singular Value Decomposition (SVD)

Let r be the rank of C. Then there exists an SVD of C:

C = U Σ V^⊤

1. The eigenvalues λ_1, ..., λ_r of C C^⊤ are the same as the eigenvalues of C^⊤ C
2. Σ is an m × n matrix
   ▪ with Σ_ii = σ_i = sqrt(λ_i) for 1 ≤ i ≤ r, zero otherwise
Dimensionality Reduction

Σ (m × n) has the singular values σ_1, σ_2, ..., σ_r on the diagonal of its top-left r × r block; the remaining (m − r) × n and m × (n − r) blocks are zero.

C      =        U          ·     Σ     ·        V^⊤
m × n         m × m            m × n           n × n
       (eigenvectors of C C^⊤)         (eigenvectors of C^⊤ C)

Keeping only the non-zero part of Σ: U becomes m × r, Σ becomes r × r, and V^⊤ becomes r × n.
Singular Value Decomposition (SVD)

C ≈ U′ Σ′ V′^⊤, where U′, Σ′, V′ are U, Σ, V restricted to the leading concepts

- C: input term-document matrix
  - m × n matrix (m terms, n documents)
- U: left singular matrix
  - m × r matrix (m terms, r concepts)
- Σ: singular values
  - r × r diagonal matrix (strength of each concept)
- V: right singular matrix
  - n × r matrix (n documents, r concepts)
Singular Value Decomposition (SVD)

C ≈ U Σ V^⊤ = Σ_i σ_i (u_i ∘ v_i^⊤)

The m × n matrix C is (approximately) a sum of rank-1 matrices:

C ≈ σ_1 u_1 v_1^⊤ + σ_2 u_2 v_2^⊤ + ...
Low Rank Approximation

- The problem
  - Given an m × n matrix C and a positive integer k, find another m × n matrix C_k of rank at most k that minimizes the Frobenius norm of X = C − C_k

Low Rank Approximation from SVD

- Steps (see the sketch below)
  1. Given C, find its SVD: C = U Σ V^⊤
  2. Obtain Σ_k from Σ by replacing with zero the r − k smallest singular values on the diagonal of Σ
  3. Compute C_k = U Σ_k V^⊤
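A sketch of these three steps with numpy, using the rank-4 matrix C from the motivating example and k = 2:

import numpy as np

C = np.array([[1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 0],
              [1, 1, 1, 2, 1, 1],
              [0, 0, 0, 1, 1, 1]], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(C, full_matrices=False)  # step 1: C = U diag(s) Vt
s_k = s.copy()
s_k[k:] = 0.0                                     # step 2: zero the r - k smallest singular values
C_k = U @ np.diag(s_k) @ Vt                       # step 3: the rank-k approximation
print(np.round(C_k, 2))
print(np.linalg.matrix_rank(C_k))                 # 2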
What did we do?

C ≈ U_k Σ_k V_k^⊤. The k columns of U_k are the hidden or latent concepts, and the k × n matrix Σ_k V_k^⊤ gives a lower-dimensional representation of each document in the latent space.
Term-Document Matrix

docid   text
d1      ship ocean voyage
d2      ocean boat
d3      ship
d4      voyage trip
d5      voyage
d6      trip

Term-document matrix:

          d1  d2  d3  d4  d5  d6
ship       1   0   1   0   0   0
boat       0   1   0   0   0   0
ocean      1   1   0   0   0   0
voyage     1   0   0   1   1   0
trip       0   0   0   1   0   1
Singular Value Decomposition

C = U Σ V^⊤

- U: the latent concepts (terms × concepts)
- Σ: the strength of each concept
- V^⊤: the documents in the new (concept) space

The columns of Σ V^⊤ give the representation of each document in the latent space.

(Numerical U, Σ and V^⊤ for the term-document matrix above.)
Python Libraries for SVD

numpy
- numpy.linalg.svd: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.svd.html

SciPy
- scipy.linalg.svd: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html
- scipy.sparse.linalg.svds: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html
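A minimal usage comparison of the three routines listed above, computing k = 2 latent dimensions for the ship/boat/ocean term-document matrix (the choice of k and the comparison printout are just illustrative):

import numpy as np
from scipy.linalg import svd as scipy_svd
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

#             d1 d2 d3 d4 d5 d6
C = np.array([[1, 0, 1, 0, 0, 0],    # ship
              [0, 1, 0, 0, 0, 0],    # boat
              [1, 1, 0, 0, 0, 0],    # ocean
              [1, 0, 0, 1, 1, 0],    # voyage
              [0, 0, 0, 1, 0, 1]], dtype=float)  # trip

U1, s1, Vt1 = np.linalg.svd(C, full_matrices=False)  # dense SVD, all singular values
U2, s2, Vt2 = scipy_svd(C, full_matrices=False)      # same decomposition via SciPy/LAPACK
U3, s3, Vt3 = svds(csc_matrix(C), k=2)               # sparse SVD, only the k largest values

print(s1[:2], sorted(s3, reverse=True))              # the leading singular values agree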
SVD: Summary

- A general matrix C can be decomposed as C = U Σ V^⊤
  - The columns of U are eigenvectors of C C^⊤
  - The columns of V are eigenvectors of C^⊤ C
  - Σ is a diagonal matrix with σ_i = sqrt(λ_i)

- Dimensionality reduction as a low-rank approximation of C
  - Discard some of the smallest entries in Σ
Principal Component Analysis (PCA)
A Toy Example

Physics model: x = f(t) = A cos(ωt)

Naïve experimenter:
- How many dimensions are important for the measurement?
- Which dimensions are important?

Experimental setup:
- Measure the ball's position in 3D
- Three cameras (120 Hz) record the movement of the system

How do we analyze the movie data to understand the behavior of the system?

Source: https://www.cs.cmu.edu/~elaw/papers/pca.pdf
A Toy Example

Each sample is recorded as a 6-dimensional vector, one (x, y) pair per camera:

X = [x_A, y_A, x_B, y_B, x_C, y_C]^⊤   (camera A, camera B, camera C)

What is an orthonormal basis for (x_A, y_A)?  {(1, 0), (0, 1)}

The naïve basis reflects the way the data has been collected.
A Toy Example

X = [x_A, y_A, x_B, y_B, x_C, y_C]^⊤

For one camera (2-dimensional):

B = [b1; b2] = [[1, 0], [0, 1]]

In general (m-dimensional):

B = [b1; b2; ...; bm] = the m × m identity matrix

Each row vector b_i is an orthonormal basis vector, and each data point can be trivially expressed as a linear combination of {b_i}.
Change of Basis: Core PCA Idea

- Change of Basis
  - Is there another basis, a linear combination of the original basis, that best re-expresses the original dataset?

- Requirement of linearity in PCA
  - Linearity simplifies the re-expression by restricting the set of potential bases

- PCA re-expresses the data as a linear combination of its basis vectors
Change of Basis: Core PCA Idea

Original data: X is m × n (toy example: 6 × 72000, i.e., 10 minutes of recording at 120 Hz)

Change of basis: Y_{m×n} = P_{m×m} X_{m×n}
- p_i: the i-th row of P
- x_i: the i-th column of X
- y_i: the i-th column of Y

Interpretation of the change of basis:
- The matrix P performs a linear transform from X to Y
- Geometrically, P performs a rotation and a stretch
- The rows of P, {p_1, ..., p_m}, are the set of new basis vectors

P X = [p_1; p_2; ...; p_m] [x_1 x_2 ... x_n]

Y = [[p_1·x_1 ... p_1·x_n], ..., [p_m·x_1 ... p_m·x_n]],  so each column is  y_i = [p_1·x_i, p_2·x_i, ..., p_m·x_i]^⊤
Change of Basis: Core PCA Idea

(Figure: the standard basis b_1 = [1, 0, 0], b_2 = [0, 1, 0], b_3 = [0, 0, 1] is rotated into a new basis p_1, p_2, p_3; the coordinates of a point x_i in the new basis are y_i[1] = p_1·x_i, y_i[2] = p_2·x_i, y_i[3] = p_3·x_i.)
Change of Basis: Core PCA Idea

- Principal components of X
  - What is the best way to re-express X?
    ▪ We need an objective: what property should Y have?
  - What is a good choice of basis P?
Setting up the Objective

- What does "best express the data" mean?
  - Separate the wheat from the chaff

- The original data is garbled
  - Three potential causes: noise, rotation, redundancy

- Find a mathematical (linear algebra) goal for deciphering the garbled data
Modelling Noise with Variance

Purity of data: Signal-to-Noise Ratio

SNR = σ²_signal / σ²_noise

SNR ≫ 1 indicates high-precision data
Modelling Noise with Variance

(Figure: 2D scatter of one camera's measurements; the long axis of the cloud is the signal direction, the short axis is the noise direction.)

- The spring moves in one direction; any measurement that deviates from that straight line must be noise
- The SNR measures the "fatness" of the cloud
  - A thin cloud along the signal direction: highly precise data
  - A circular cloud (SNR ≈ 1): noisy data
  - A thin cloud along the noise direction: highly noisy data
- Directions with the largest variance correspond to the dynamics of interest
- Maximizing variance amounts to an appropriate rotation of the naïve basis: finding the direction of the signal
Modelling Redundancy

Covariance as a Model of Noise and Redundancy

Two sets of measurements (both zero-mean):
a = [a_1 a_2 ... a_n],   b = [b_1 b_2 ... b_n]

Variance:    σ²_a = ⟨a_i a_i⟩_i,   σ²_b = ⟨b_i b_i⟩_i
Covariance:  σ²_ab = ⟨a_i b_i⟩_i = (1/(n−1)) a b^⊤

- Covariance measures the degree of linear relationship between two variables
- A large (small) value indicates high (low) redundancy
Measurement types

X = [x_1; x_2; ...; x_m]   (row x_i holds all n measurements of one particular measurement type)

C_X = (1/(n−1)) X X^⊤ =
  | σ²_1    σ²_12  ...  σ²_1m |
  | σ²_21   σ²_2   ...  σ²_2m |
  |   ...     ...  ...    ... |
  | σ²_m1   σ²_m2  ...  σ²_m  |
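A small numerical sketch of this covariance-matrix computation (random toy data with hypothetical sizes, not the camera recordings):

import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 1000                      # 6 measurement types, 1000 observations
X = rng.normal(size=(m, n))
X -= X.mean(axis=1, keepdims=True)  # subtract the mean of each measurement type (row)

C_X = (X @ X.T) / (n - 1)           # m x m covariance matrix
print(np.allclose(C_X, np.cov(X)))  # True: matches numpy's covariance (rows = variables)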
Covariance Matrix

- The (i, j) entry of C_X is the dot product between the vector of measurement type i (x_i) and the vector of measurement type j (x_j), scaled by 1/(n−1)

- C_X is a square symmetric m × m matrix

- Diagonal terms: the variance of a particular measurement type

- Off-diagonal terms: the covariance between measurement types

Two goals:

- (1) Minimize redundancy, measured by the covariances
- (2) Maximize the signal, measured by the variances

Diagonalizing C_Y

What should the covariance matrix C_Y of Y look like?
- We want low redundancy, i.e., low absolute covariance
- So, to minimize redundancy, set all the off-diagonal terms of C_Y to zero

C_X (generally has non-zero off-diagonal terms)   →   C_Y (diagonal)

DIAGONALIZATION
Diagonalizing C_Y

- What should the diagonalization method be?

- PCA assumes all the basis vectors {p_1, ..., p_m} are orthonormal
  - p_i · p_j = δ_ij  [Kronecker delta]
  - P is an orthonormal matrix

- PCA assumes the directions with the largest variance are the most important
  - [principal components]
  - P is a generalized rotation that aligns the basis vectors with the directions of maximum variance
PCA Method

- Select the normalized direction p_1 in the m-dimensional space along which the variance of X is maximized

- Find another normalized direction p_2 along which the variance is maximized, subject to
  - p_2 being orthogonal to all the previously computed vectors

- Repeat until m vectors {p_1, ..., p_m} are selected

How do we formalize this in linear algebra?
Important Assumptions
- Linearity:
  - The change of basis assumes linearity
  - To include non-linearity: kernel PCA

- Mean and variance are sufficient statistics
  - The only class of distributions for which this holds is the exponential family
  - X should be exponentially distributed (e.g., Gaussian)

- Large variances correspond to interesting dynamics

- Principal components are orthogonal
Solving PCA with Linear Algebra

Find an orthonormal matrix P, where Y = P X, such that

C_Y = (1/(n−1)) Y Y^⊤

is diagonalized.

The rows of P are the principal components of X.
Solving PCA with Linear Algebra

Define A = X X^⊤; A is a symmetric matrix, so it can be diagonalized:

A = E Λ E^⊤,   E = [e_1 e_2 ... e_m]

where the columns e_i of E are the eigenvectors of A and Λ is the diagonal matrix of its eigenvalues.

Assume: P = E^⊤, i.e., A = P^⊤ Λ P
Solving PCA with Linear Algebra

With P = E^⊤:   C_Y = (1/(n−1)) P A P^⊤ = (1/(n−1)) Λ

- The principal components of X are the eigenvectors of X X^⊤ (equivalently of C_X); they are the rows of P
- The i-th diagonal value λ_i of C_Y is the variance of X along p_i

y_i = [p_1·x_i, p_2·x_i, ..., p_m·x_i]^⊤
SVD and PCA

SVD over the (mean-centred) data matrix:
X = U Σ V^⊤

Data transform:
Y = U^⊤ X

C_Y = (1/(n−1)) Λ,   with Λ = Σ²  (λ_i = σ_i²)

so

C_Y = (1/(n−1)) Σ²
PCA: Summary

- PCA is an unsupervised dimensionality reduction method

- Steps
  - Organize the data as an m × n matrix X
    ▪ m: number of measurement types
    ▪ n: number of observations
  - Subtract off the mean of each measurement type (row x_i)
  - Calculate the SVD of X, or the eigenvectors of the covariance matrix C_X
Linear Discriminant Analysis (LDA)
PCA Issue

(Figure: two classes of points, 'o' and '+'. The direction of maximum variance found by PCA mixes the two classes, while the direction found by LDA separates them.)

PCA: unsupervised        LDA: supervised
LDA Intuition

1) The centres of the clusters should be far apart
   → high between-class variance

2) The data points in each class should be close to their centroid
   → low within-class variance
Linear (Fisher's) Discriminant Analysis

- Two-class classification problem

- Reduce the dimensionality of the data to one dimension
  - Project a data item x onto one dimension: y = w^⊤ x

- The one-dimensional y is used for classification
Linear Discriminant Analysis

- Objective
  - Find the direction w such that the projected data points are well separated

- Objective (restated)
  - Maximize the distance between the projected class means
  - Minimize the within-class variance
Projections

- Projection of a data point x:  w^⊤ x

- Projection of the class-1 mean μ_1:  w^⊤ μ_1

- Projection of the class-2 mean μ_2:  w^⊤ μ_2

- Projection of the class-1 covariance Σ_1:  w^⊤ Σ_1 w

- Projection of the class-2 covariance Σ_2:  w^⊤ Σ_2 w
Revisiting the Goals

Maximize the distance between the projected means:

(w^⊤ μ_1 − w^⊤ μ_2)² = (w^⊤ μ_1 − w^⊤ μ_2)(w^⊤ μ_1 − w^⊤ μ_2)
                     = w^⊤ (μ_1 − μ_2)(μ_1 − μ_2)^⊤ w
                     = w^⊤ S_B w

where S_B = (μ_1 − μ_2)(μ_1 − μ_2)^⊤ is the between-class covariance.

Goal: max_w  w^⊤ S_B w
Revisiting the Goals

Minimize the within-class variance:

w^⊤ Σ_1 w + w^⊤ Σ_2 w = w^⊤ (Σ_1 + Σ_2) w = w^⊤ S_w w

where S_w = Σ_1 + Σ_2 is the within-class covariance.

Goal: min_w  w^⊤ S_w w
Combining Two Goals

max_w w^⊤ S_B w    and    min_w w^⊤ S_w w

max_w  (w^⊤ S_B w) / (w^⊤ S_w w)        Rayleigh quotient

Constrained optimization problem:

max_w  w^⊤ S_B w    such that    w^⊤ S_w w = 1
Constrained Optimization

max_w w^⊤ S_B w   such that   w^⊤ S_w w = 1

Lagrangian:  L = w^⊤ S_B w − λ (w^⊤ S_w w − 1)

∂L/∂w = 0
2 S_B w − 2 λ S_w w = 0
S_B w = λ S_w w
S_w^{−1} S_B w = λ w

Constrained Optimization

S_w^{−1} S_B w = λ w

- w is an eigenvector of S_w^{−1} S_B

- What is the rank of S_B?
  - For two classes, S_B = (μ_1 − μ_2)(μ_1 − μ_2)^⊤ has rank 1

- How many eigenvectors of S_w^{−1} S_B have non-zero eigenvalues?
  - Only one, and it gives the projection direction w*
LDA Steps

- Find the matrices S_B and S_w

- Compute S_w^{−1} S_B

- Find w*, the eigenvector of S_w^{−1} S_B with the largest eigenvalue

- Project each data point: y = w*^⊤ x

(See the code sketch below.)


Multi-Class LDA

https://content.iospress.com/articles/ai-communications/aic729
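A sketch of the two-class LDA steps listed above in numpy (the function name and the 1/n-normalized class covariances are assumptions chosen to match the worked example on the next slides):

import numpy as np

def fisher_lda_direction(X1, X2):
    """X1, X2: (n_i x d) arrays of class-1 and class-2 points.
    Returns the unit-length projection direction w*."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    diff = (mu1 - mu2).reshape(-1, 1)
    S_B = diff @ diff.T                                       # between-class scatter
    S_w = np.cov(X1.T, bias=True) + np.cov(X2.T, bias=True)   # within-class scatter
    vals, vecs = np.linalg.eig(np.linalg.inv(S_w) @ S_B)      # S_w^-1 S_B w = lambda w
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / np.linalg.norm(w)

# Projecting a data point x onto one dimension: y = w @ x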
LDA Example (Two Class)

Data X (last column is the class label):

 4  1  c1        9 10  c2
 2  4  c1        6  8  c2
 2  3  c1        9  5  c2
 3  6  c1        8  7  c2
 4  4  c1       10  8  c2

μ_1 = [3.00, 3.60],   μ_2 = [8.40, 7.60]

S_B = [[29.16, 21.60],
       [21.60, 16.00]]

S_w1 = [[ 0.80, -0.40],     S_w2 = [[ 1.84, -0.04],
        [-0.40,  2.64]]             [-0.04,  2.64]]

S_w = S_w1 + S_w2 = [[ 2.64, -0.44],
                     [-0.44,  5.28]]

https://content.iospress.com/articles/ai-communications/aic729
LDA Example (Two Class)

S_w^{-1} S_B w = λ w
| S_w^{-1} S_B − λ I | = 0

| 11.89 − λ     8.81     |
|  5.08         3.76 − λ |  = 0        ⇒   λ = 15.65

[[11.89, 8.81], [5.08, 3.76]] [w_1, w_2]^⊤ = 15.65 [w_1, w_2]^⊤

w* = [−0.91, −0.39]^⊤

https://content.iospress.com/articles/ai-communications/aic729
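A quick self-contained numerical check of this worked example (class covariances normalized by 1/n to match the numbers above):

import numpy as np

X1 = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)     # class c1
X2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)   # class c2

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)                 # [3.0, 3.6], [8.4, 7.6]
d = (mu1 - mu2).reshape(-1, 1)
S_B = d @ d.T                                               # [[29.16, 21.6], [21.6, 16.0]]
S_w = np.cov(X1.T, bias=True) + np.cov(X2.T, bias=True)     # [[2.64, -0.44], [-0.44, 5.28]]

vals, vecs = np.linalg.eig(np.linalg.inv(S_w) @ S_B)
i = np.argmax(np.real(vals))
w = np.real(vecs[:, i])
print(round(float(np.real(vals[i])), 2))       # ~ 15.65
print(np.round(w / np.linalg.norm(w), 2))      # ~ [0.92, 0.39], i.e. w* up to sign and rounding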
PCA vs LDA

(Figure: coffee odour detection from gas-sensor measurements; the same data projected with PCA and with LDA.)
LDA: Summary

- LDA is a supervised dimensionality reduction technique

- The formulation of LDA is based on two objectives
  - Maximize the distance between the means of the classes
  - Minimize the variance of the data points within each class
