Dimensionality Reduction
Dimensionality Reduction: Why

Curse of Dimensionality: a large number of input features may lead to poor performance.
Dimensionality reduction: reduce the number of redundant and noisy features.
Feature Selection
Filtering and wrapper-based methods
Greedy forward selection: add a feature $x_i$ if adding it decreases the error.
Stop:
▪ If adding any feature does not decrease the error, or
▪ If the decrease in error is too small, or
▪ If we have reached the desired performance level
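As a concrete illustration, here is a minimal sketch of greedy wrapper-based forward selection in Python (the least-squares model, the validation split, and the tolerance `tol` are illustrative assumptions, not from the slides):

```python
import numpy as np

def forward_select(X_train, y_train, X_val, y_val, tol=1e-4):
    """Greedy forward feature selection (wrapper method, illustrative sketch)."""
    def val_error(cols):
        # Fit least squares on the selected columns, score on held-out data.
        w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
        return np.mean((X_val[:, cols] @ w - y_val) ** 2)

    selected = []
    remaining = list(range(X_train.shape[1]))
    best_err = np.inf
    while remaining:
        errs = {j: val_error(selected + [j]) for j in remaining}
        j_best = min(errs, key=errs.get)
        if best_err - errs[j_best] <= tol:   # no (or too small) decrease: stop
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_err = errs[j_best]
    return selected
```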
Feature Extraction Methods
Singular Value Decomposition (SVD)
Use Case: Information Retrieval
[Figure: a query "Newton was a good physicist" is matched against a document collection; the document "Feynman was a good physicist." receives high similarity, while "Newton was a British physicist" receives low similarity.]

Each document $\vec{D_i}$ and the query $\vec{Q}$ are represented as term vectors, and documents are ranked by cosine similarity:

$$\cos(\vec{Q}, \vec{D_i}) = \frac{\vec{Q} \cdot \vec{D_i}}{\lVert \vec{Q} \rVert\, \lVert \vec{D_i} \rVert}$$

These vectors are very high-dimensional
o hundreds of millions of dimensions
and each is a very sparse vector
o most entries are zero.
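To make the cosine formula concrete, here is a tiny Python example over an assumed 4-term vocabulary (the vocabulary and the binary weights are made up for illustration):

```python
import numpy as np

def cosine(q, d):
    """Cosine similarity between query and document term vectors."""
    return q @ d / (np.linalg.norm(q) * np.linalg.norm(d))

# Assumed toy vocabulary: [newton, good, physicist, british]
q  = np.array([1.0, 1.0, 1.0, 0.0])   # query: "Newton ... good physicist"
d1 = np.array([0.0, 1.0, 1.0, 0.0])   # "Feynman was a good physicist"
d2 = np.array([1.0, 0.0, 1.0, 1.0])   # "Newton was a British physicist"
print(cosine(q, d1))   # ~0.82 (high)
print(cosine(q, d2))   # ~0.67 (low)
```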
[Figure: two term clouds. Left (synonymy): auto/car, bonnet/hood, boot/trunk, lorry, engine, emissions, tyres, make, model. Right (polysemy): "model" and "make" in a different sense, alongside hidden, Markov, normalize, learning, emissions.]

Synonymy: documents that use different words for the same concept will have small cosine similarity but are related.
Polysemy: documents that share a word used in different senses will have large cosine similarity but are not truly related.
Motivating Example

            D1  D2  D3  D4  D5  D6
automobile   1   1   0   1   0   0
car          1   0   1   1   0   0
model        1   1   1   2   1   1
learning     0   0   0   1   1   1
Motivating Example

Query: $Q$ = "car model", i.e. query vector $q = (0, 1, 1, 0)$ over (automobile, car, model, learning).

            D1  D2  D3  D4  D5  D6    q
             R   R   R   R  NR  NR
automobile   1   1   0   1   0   0    0
car          1   0   1   1   0   0    1
model        1   1   1   2   1   1    1
learning     0   0   0   1   1   1    0
q · D        2   1   2   3   1   1

The relevant document D2 (which says "automobile" instead of "car") scores 1, no better than the non-relevant documents D5 and D6.
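The scores above are just a matrix-vector product; a quick check in Python:

```python
import numpy as np

# Rows: automobile, car, model, learning; columns: D1..D6
C = np.array([[1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 0],
              [1, 1, 1, 2, 1, 1],
              [0, 0, 0, 1, 1, 1]])
q = np.array([0, 1, 1, 0])   # query "car model"
print(q @ C)                 # [2 1 2 3 1 1]
```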
LSA: Motivating Example

Let us do this with Linear Algebra. Suppose we could fill in the entries the documents "should" have, treating car and automobile as one concept:

            D1  D2  D3  D4  D5  D6    q
             R   R   R   R  NR  NR
automobile   1   1   1   1   0   0    0
car          1   1   1   1   0   0    1
model        1   1   1   2   1   1    1
learning     0   0   0   1   1   1    0
q · D        2   2   2   3   1   1

Now every relevant document scores higher than every non-relevant one.
Bit of Linear Algebra: Rank

Let $C$ be an $m \times n$ matrix. Its rank is the number of linearly independent rows (or columns).

            D1  D2  D3  D4  D5  D6
automobile   1   1   0   1   0   0
car          1   0   1   1   0   0
model        1   1   1   2   1   1
learning     0   0   0   1   1   1

The smoothed matrix has rank 2: every column is a combination of $x_1 = (1, 1, 1, 0)^\top$ and $x_2 = (0, 0, 1, 1)^\top$:

            x1  x1  x1  x1+x2  x2  x2       x1  x2
automobile   1   1   1    1     0   0        1   0
car          1   1   1    1     0   0        1   0
model        1   1   1    2     1   1        1   1
learning     0   0   0    1     1   1        0   1

[Figure: with axes car, automobile, learning, documents D1, D2, D3 sit at (1,1,0), D4 at (1,1,1), and D5, D6 at (0,0,1); in the 2-dimensional concept space ($x_1$, $x_2$) they map to (1,0), (1,1), and (0,1) respectively.]
Matrix Factorization

The rank-2 matrix factors exactly into a term-concept matrix times a concept-document matrix:

$$\begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}$$
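A quick numpy check that the factorization is exact and the matrix really has rank 2 (the factor names W and H are my own, for illustration):

```python
import numpy as np

C = np.array([[1, 1, 1, 1, 0, 0],
              [1, 1, 1, 1, 0, 0],
              [1, 1, 1, 2, 1, 1],
              [0, 0, 0, 1, 1, 1]])
W = np.array([[1, 0], [1, 0], [1, 1], [0, 1]])   # terms x concepts
H = np.array([[1, 1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1, 1]])               # concepts x documents
assert (W @ H == C).all()
print(np.linalg.matrix_rank(C))   # 2
```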
Criteria for Approximation?

Eigenvalues: scalars $\lambda$ such that $A x = \lambda x$ for some non-zero vector $x$.
Eigenvectors: the corresponding non-zero vectors $x$.
Eigenvector Decomposition

$$A = \begin{bmatrix} 6 & 5 \\ 5 & 6 \end{bmatrix}, \qquad \lambda_1 = 11,\ \lambda_2 = 1, \qquad x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\ x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$A = \begin{bmatrix} 6 & 5 \\ 5 & 6 \end{bmatrix} = \underbrace{\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 11 & 0 \\ 0 & 1 \end{bmatrix}}_{\Lambda} \underbrace{\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}}_{U^\top}$$
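This decomposition can be verified numerically (note np.linalg.eigh returns eigenvalues in ascending order, so the columns come out in the order $\lambda_2, \lambda_1$):

```python
import numpy as np

A = np.array([[6.0, 5.0],
              [5.0, 6.0]])
lam, U = np.linalg.eigh(A)       # eigh: for symmetric matrices
print(lam)                       # [ 1. 11.]
print(U)                         # columns proportional to (1,-1) and (1,1), scaled by 1/sqrt(2)
print(U @ np.diag(lam) @ U.T)    # recovers A
```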
Singular Value Decomposition

Any $m \times n$ matrix $C$ of rank $r$ can be written as $C = U \Sigma V^\top$, with $U$ and $V$ orthogonal. Let $\Sigma$ be an $m \times n$ matrix
• with $\Sigma_{ii} = \sigma_i$ for $1 \le i \le r$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, zero otherwise:

$$\Sigma_{m \times n} = \begin{bmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_r)_{\,r \times r} & 0_{\,m \times (n-r)} \\ 0_{\,(m-r) \times n} & \end{bmatrix}$$
Dimensionality Reduction

$$\underbrace{C}_{m \times n} = \underbrace{U}_{m \times m}\, \underbrace{\Sigma}_{m \times n}\, \underbrace{V^\top}_{n \times n}$$

The columns of $U$ are eigenvectors of $C C^\top$; the columns of $V$ are eigenvectors of $C^\top C$.

Keeping only the top $r$ singular values gives

$$C \approx \underbrace{U}_{m \times r}\, \underbrace{\Sigma}_{r \times r}\, \underbrace{V^\top}_{r \times n}$$
Singular Value Decomposition (SVD)

The reduced form $C \approx U' \Sigma' V'^\top$ keeps only the first $r$ columns of $U$ and $V$ and the top-left $r \times r$ block of $\Sigma$.

$C$: input term-document matrix
  $m \times n$ matrix ($m$ terms, $n$ documents)
$U$: left singular matrix
  $m \times r$ matrix ($m$ terms, $r$ concepts)
$\Sigma$: singular values
  $r \times r$ diagonal matrix (strength of each concept)
$V$: right singular matrix
  $n \times r$ matrix ($n$ documents, $r$ concepts)
Singular Value Decomposition (SVD)

$$C \approx U \Sigma V^\top = \sum_i \sigma_i\, u_i \circ v_i^\top$$

[Figure: $C$ ($m \times n$) drawn as $U$ ($m \times r$) times $\Sigma$ ($r \times r$) times $V^\top$ ($r \times n$).]

Equivalently, $C$ is a weighted sum of rank-1 outer products:

$$C \approx \sigma_1 \boldsymbol{u_1} \boldsymbol{v_1}^\top + \sigma_2 \boldsymbol{u_2} \boldsymbol{v_2}^\top + \cdots$$
Low Rank Approximation

The problem: given an $m \times n$ matrix $C$ and a positive integer $k$, find another matrix $C_k$ of rank at most $k$ that minimizes the Frobenius norm of $C - C_k$.

Low Rank Approximation from SVD

Steps:
1. Given $C$, find its SVD: $C = U \Sigma V^\top$
2. Keep only the top $k$ singular values: set $\sigma_{k+1}, \dots, \sigma_r$ to zero in $\Sigma$, giving $\Sigma_k$
3. Compute $C_k = U \Sigma_k V^\top$
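A minimal numpy sketch of these steps, applied to the term-document matrix from the motivating example:

```python
import numpy as np

def low_rank_approx(C, k):
    """Rank-k approximation of C via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

C = np.array([[1., 1., 0., 1., 0., 0.],
              [1., 0., 1., 1., 0., 0.],
              [1., 1., 1., 2., 1., 1.],
              [0., 0., 0., 1., 1., 1.]])
print(np.round(low_rank_approx(C, 2), 2))   # rank-2 reconstruction of C
```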
What did we do?

[Figure: $C \approx U_k \Sigma_k V_k^\top$. The $k$ columns of $U_k$ are hidden or latent concepts; each column of $\Sigma_k V_k^\top$ ($k \times n$) is a lower-dimensional representation of a document in latent space.]
Term-Document Matrix

docid   text
d1      ship ocean voyage
d2      ocean boat
d3      ship
d4      voyage trip
d5      voyage
d6      trip

Term-document matrix:

         d1  d2  d3  d4  d5  d6
ship      1   0   1   0   0   0
boat      0   1   0   0   0   0
ocean     1   1   0   0   0   0
voyage    1   0   0   1   1   0
trip      0   0   0   1   0   1
Singular Value Decomposition

[Figure: the term-document matrix factored as $C = U * \Sigma * V^\top$.]
numpy
  numpy.linalg.svd:
  https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.svd.html
scipy
  scipy.linalg.svd:
  https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html
  scipy.sparse.linalg.svds:
  https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html
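For example, decomposing the toy term-document matrix above with numpy.linalg.svd (choosing k = 2 latent dimensions for illustration):

```python
import numpy as np

# Rows: ship, boat, ocean, voyage, trip; columns: d1..d6
C = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)
print(np.round(s, 2))                  # singular values, largest first
k = 2
doc_latent = np.diag(s[:k]) @ Vt[:k]   # k-dim document representations
print(np.round(doc_latent, 2))         # one column per document
```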
SVD: Summary
A Toy Example

Naïve Experimenter:
How many dimensions are important for measurement?
Which dimensions are important?

Experimental Setup:
Measure a ball's position in 3D
Three cameras (120 Hz) record the movement of the system
$$\vec{X} = \begin{bmatrix} x_A \\ y_A \\ x_B \\ y_B \\ x_C \\ y_C \end{bmatrix} \qquad (x_A, y_A\text{: camera A};\ x_B, y_B\text{: camera B};\ x_C, y_C\text{: camera C})$$

The naïve basis reflects the way the data has been collected.
A Toy Example

Consider the naïve basis in which each sample $\vec{X}$ is recorded:
$$B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \qquad (m\text{-dimensional; here } m = 6)$$

Each row vector $b_i$ is an orthonormal basis vector. Each data point can be trivially expressed as a linear combination of $\{b_i\}$.
Change of Basis: Core PCA Idea

Change of Basis: is there another basis, which is a linear combination of the original basis, that best re-expresses the original dataset?

$$Y = P X$$

Here $X$ is the $6 \times 72000$ data matrix (10 mins of recording at 120 Hz), each column of $X$ is one sample, each row of $P$ is a new basis vector, and each column of $Y$ is the corresponding sample expressed in the new basis.
Change of Basis: Core PCA Idea

Interpretation of Change of Basis: the matrix $P$ performs a linear transform from $X$ to $Y$.

$$P X = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}, \qquad Y = \begin{bmatrix} p_1 \cdot x_1 & \cdots & p_1 \cdot x_n \\ \vdots & \ddots & \vdots \\ p_m \cdot x_1 & \cdots & p_m \cdot x_n \end{bmatrix}, \qquad y_i = \begin{bmatrix} p_1 \cdot x_i \\ p_2 \cdot x_i \\ \vdots \\ p_m \cdot x_i \end{bmatrix}$$
Change of Basis: Core PCA Idea

[Figure: a data point $x_i$ drawn in the standard basis $b_1 = [1,0,0]$, $b_2 = [0,1,0]$, $b_3 = [0,0,1]$ and in a new basis $p_1, p_2, p_3$; the coordinates of $y_i$ are the projections $p_1 \cdot x_i$, $p_2 \cdot x_i$, $p_3 \cdot x_i$.]
Change of Basis: Core PCA Idea

Principal Components of $X$: to choose good basis vectors, look at the covariance between measurement types. For two mean-centred measurement vectors,

$$a = [a_1\ a_2\ \dots\ a_n], \qquad b = [b_1\ b_2\ \dots\ b_n], \qquad \sigma^2_{ab} = \frac{1}{n-1}\, a b^\top$$
Each row of $X$ holds all $n$ measurements for a particular measurement type:

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}, \qquad C_X = \frac{1}{n-1} X X^\top = \begin{bmatrix} \sigma^2_{1} & \sigma^2_{12} & \cdots & \sigma^2_{1m} \\ \sigma^2_{21} & \sigma^2_{2} & \cdots & \sigma^2_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2_{m1} & \sigma^2_{m2} & \cdots & \sigma^2_{m} \end{bmatrix}$$
Covariance Matrix

Two Goals:
1. Minimize redundancy: covariances between different measurement types should vanish, i.e. the off-diagonal entries of $C_Y$ should be $0$.
2. Maximize signal: the variances on the diagonal of $C_Y$ should be large and sorted in decreasing order.

Both goals are met if $P$ turns $C_X$ into a diagonal $C_Y$: DIAGONALIZATION.
Diagonalizing $C_X$

$C_X$ is a symmetric matrix, and every symmetric matrix $A$ can be diagonalized by an orthogonal matrix of its eigenvectors:

$$A = E \Lambda E^\top, \qquad E = \begin{bmatrix} e_1 & e_2 & \cdots & e_m \end{bmatrix}$$
Solving PCA with Linear Algebra

$$A = E \Lambda E^\top$$

Assume: pick the new basis $P = E^\top$, the (transposed) eigenvectors of $X X^\top$. Then

$$A = P^\top \Lambda P$$
Solving PCA with Linear Algebra

With $P = E^\top$:

$$C_Y = \frac{1}{n-1} \Lambda, \qquad y_i = \begin{bmatrix} p_1 \cdot x_i \\ p_2 \cdot x_i \\ \vdots \\ p_m \cdot x_i \end{bmatrix}$$

$C_Y$ is diagonal, as desired: the rows of $P$ (the eigenvectors) are the principal components.
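A compact sketch of this recipe on synthetic data (the toy data generation is an assumption for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# m = 6 redundant measurement types, n = 1000 samples
t = rng.normal(size=1000)
X = np.vstack([t, 2 * t, -t, 3 * t, t, 0.5 * t]) + 0.05 * rng.normal(size=(6, 1000))
X -= X.mean(axis=1, keepdims=True)   # mean-centre each row

lam, E = np.linalg.eigh(X @ X.T)     # eigendecompose X X^T (ascending order)
P = E[:, ::-1].T                     # rows of P = principal components, largest first
Y = P @ X                            # data in the new basis
C_Y = Y @ Y.T / (X.shape[1] - 1)
print(np.round(C_Y, 3))              # ~diagonal; variance concentrated in row 0
```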
SVD and PCA

SVD over a (mean-centred) data matrix:

$$X = U \Sigma V^\top$$

Data Transform:

$$Y = U^\top X$$

$$C_Y = \frac{1}{n-1} \Lambda, \qquad \Lambda = \Sigma^2,\ \lambda_i = \sigma_i^2 \implies C_Y = \frac{1}{n-1} \Sigma^2$$

The left singular vectors of $X$ are the eigenvectors of $X X^\top$, so PCA can be computed directly from the SVD of the data matrix.
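A quick numerical check of this equivalence (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 500))
X -= X.mean(axis=1, keepdims=True)

lam, E = np.linalg.eigh(X @ X.T)                   # PCA route: eigenvalues of X X^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # SVD route

print(np.allclose(np.sort(lam), np.sort(s**2)))    # True: Lambda = Sigma^2
Y = U.T @ X
C_Y = Y @ Y.T / (X.shape[1] - 1)
print(np.allclose(np.diag(C_Y), s**2 / (X.shape[1] - 1)))  # True: C_Y = Sigma^2 / (n-1)
```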
PCA: Summary
[Figure: two scatter plots of the same two-class data (o and +), one projected along the PCA direction, one along the LDA direction.]

PCA: Unsupervised
LDA: Supervised
LDA Intuition

1) Centres of clusters should be far apart: high between-class variance
2) Projected points within each class should stay close together: low within-class variance

Objective: to find the direction $w$ such that the projected data points are well separated.
Objective (Restated)
Maximize the distance between the projected class means
Minimize the within-class variance
Projections

Projection of a data point $x$: $w^\top x$
Projection of class mean $\mu_1$: $w^\top \mu_1$
Projection of class mean $\mu_2$: $w^\top \mu_2$
Projection of covariance $\Sigma_1$: $w^\top \Sigma_1 w$
Projection of covariance $\Sigma_2$: $w^\top \Sigma_2 w$
Revisiting the Goals

Maximize the distance between the projected class means:

$$\bigl(w^\top (\mu_1 - \mu_2)\bigr) \bigl((\mu_1 - \mu_2)^\top w\bigr) = w^\top (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top w = w^\top S_B w$$

where $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top$ is the between-class covariance:

$$\max_w\ w^\top S_B w$$
Revisiting the Goals

Minimize the within-class variance of the projected data:

$$w^\top \Sigma_1 w + w^\top \Sigma_2 w = w^\top (\Sigma_1 + \Sigma_2) w = w^\top S_w w, \qquad \min_w\ w^\top S_w w$$

Combining both goals:

$$\max_w\ \frac{w^\top S_B w}{w^\top S_w w} \qquad \text{(Rayleigh Quotient)}$$
Constrained Optimization

$$\max_w\ w^\top S_B w \quad \text{such that} \quad w^\top S_w w = 1$$
Lagrangian: $L(w, \lambda) = w^\top S_B w - \lambda\,(w^\top S_w w - 1)$

$$\frac{\partial L}{\partial w} = 0 \implies 2 S_B w - 2 \lambda S_w w = 0 \implies S_B w = \lambda S_w w \implies S_w^{-1} S_B w = \lambda w$$
Constrained Optimization

$$S_w^{-1} S_B w = \lambda w$$

$w$ is an eigenvector of $S_w^{-1} S_B$. Compute $w^*$ as the eigenvector of $S_w^{-1} S_B$ with the largest eigenvalue.

https://content.iospress.com/articles/ai-communications/aic729
LDA Example (Two Class)

$$X = \begin{bmatrix} 4 & 1 & c_1 \\ 2 & 4 & c_1 \\ 2 & 3 & c_1 \\ 3 & 6 & c_1 \\ 4 & 4 & c_1 \\ 9 & 10 & c_2 \\ 6 & 8 & c_2 \\ 9 & 5 & c_2 \\ 8 & 7 & c_2 \\ 10 & 8 & c_2 \end{bmatrix}, \qquad \mu_1 = [3.00\ \ 3.60], \quad \mu_2 = [8.40\ \ 7.60]$$

$$S_B = \begin{bmatrix} 29.16 & 21.60 \\ 21.60 & 16.00 \end{bmatrix}$$

$$S_{w1} = \begin{bmatrix} 0.80 & -0.40 \\ -0.40 & 2.64 \end{bmatrix}, \quad S_{w2} = \begin{bmatrix} 1.84 & -0.04 \\ -0.04 & 2.64 \end{bmatrix}, \quad S_w = S_{w1} + S_{w2} = \begin{bmatrix} 2.64 & -0.44 \\ -0.44 & 5.28 \end{bmatrix}$$

https://content.iospress.com/articles/ai-communications/aic729
LDA Example (Two Class)

$$S_w^{-1} S_B\, w = \lambda w \implies \bigl|\, S_w^{-1} S_B - \lambda I \,\bigr| = 0$$

$$\begin{vmatrix} 11.89 - \lambda & 8.81 \\ 5.08 & 3.76 - \lambda \end{vmatrix} = 0 \implies \lambda = 15.65$$

$$\begin{bmatrix} 11.89 & 8.81 \\ 5.08 & 3.76 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = 15.65 \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \implies w^* = [-0.91\ \ -0.39]^\top$$

https://content.iospress.com/articles/ai-communications/aic729
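The worked example can be reproduced in a few lines of numpy (within-class scatter normalized by class size n, which is what reproduces the slide's numbers; the recovered eigenvector may differ from $w^*$ in sign and rounding):

```python
import numpy as np

X1 = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)
X2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
d = (mu1 - mu2).reshape(-1, 1)
S_B = d @ d.T                                     # between-class scatter
S_w = ((X1 - mu1).T @ (X1 - mu1) / len(X1)
       + (X2 - mu2).T @ (X2 - mu2) / len(X2))     # within-class scatter

lam, W = np.linalg.eig(np.linalg.inv(S_w) @ S_B)  # solve S_w^{-1} S_B w = lambda w
w = W[:, np.argmax(lam.real)].real
print(np.round(max(lam.real), 2))                 # 15.65
print(np.round(w / np.linalg.norm(w), 2))         # +-[0.92 0.39]
```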
PCA vs LDA

LDA: Summary