Dimensionality Reduction: Linear Discriminant Analysis (LDA)
We use LDA when we want better separability for classification problems. It is also a
dimensionality reduction technique.
Even with a 3D representation, it's hard for us to tell whether the data is separated properly,
because depth is not perceived on a 2D graph.
Linear Discriminant Analysis (LDA) is like PCA, but it focuses on maximizing the separability
among known categories.
Using a simple example, we will try to reduce a 2D graph into a 1D graph.
LDA creates the new axis according to two criteria, considered simultaneously:
1st criterion: maximize the distance between the means of the two categories, $\mu_1$ (green) and $\mu_2$ (red).
2nd criterion: minimize the variation within each category (which LDA calls 'scatter', denoted $s^2$).
- $s_1^2$ is the scatter for the green category and $s_2^2$ is the scatter for the red category.
And both criteria are considered simultaneously using the formula:

$$\frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2} \quad \longrightarrow \quad \frac{\text{ideally large}}{\text{ideally small}}$$
Here, the numerator is squared because the difference between the two means may be negative
(depending on whether the green mean or the red mean is larger); squaring it keeps the criterion positive.
Also, in the above formula, we can call $(\mu_1 - \mu_2)$ the distance, $d$. Hence the formula becomes:

$$\frac{d^2}{s_1^2 + s_2^2}$$
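To make this ratio concrete, here is a minimal numeric sketch; the green and red values below are made-up 1-D projections, not data from the example:

import numpy as np

# Made-up 1-D projections of the two categories (purely illustrative numbers).
green = np.array([1.0, 1.5, 2.0, 2.5])
red = np.array([5.0, 5.5, 6.0, 6.5])

d_squared = (green.mean() - red.mean()) ** 2                              # (mu_1 - mu_2)^2, ideally large
scatter = ((green - green.mean()) ** 2).sum() + ((red - red.mean()) ** 2).sum()  # s_1^2 + s_2^2, ideally small

print(d_squared / scatter)                                                # the ratio LDA tries to maximize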
We shall now see why the distance b/w the two means and the scatters are important.
Consider this dataset:
Here, the data is spread out pretty well along the x-axis, but there is an overlap along
the y-axis. In this case, if we only maximize the distance b/w the means, then
we will get something like this:
And this separation isn’t great as we’ll have a lot of overlap in the middle.
However, if we optimize the distance b/w the means and the scatter, we’d get:
In this case, we’d get a nice separation b/w the categories.
Although the means in this graph are a little closer than they were in the middle graph,
the scatter is much smaller, and therefore the separation is good.
When there are three categories, the same idea generalizes: LDA finds a point central to all the data, maximizes the (squared) distances $d_1, d_2, d_3$ between each category's mean and that central point, and minimizes the scatter within each category:

$$\frac{d_1^2 + d_2^2 + d_3^2}{s_1^2 + s_2^2 + s_3^2}$$
When we only use 2 genes, this is no big deal. The data started out on an X/Y plot, and plotting
them on a new X/Y plot doesn't change much.
But what if we used 10,000 genes? That would mean we’d need 10,000 dimensions to draw
the data. And here is where being able to create 2 axes that maximize the separation of 3
categories becomes important.
In the LDA graph, although the separation isn't perfect, it is still easy to see the 3 categories
(using the 3 colored circles).
The PCA graph does not separate the categories nearly as well as LDA; we can see a lot
of overlap b/w the black and blue points. However, PCA wasn't even trying to separate the
categories; it was just looking for the genes with the most variation.
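As a rough sketch of this idea (using synthetic data in place of the gene measurements shown in the figures; the make_classification call, sizes, and random seed below are illustrative assumptions), both projections can be computed with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for a high-dimensional "gene" dataset with 3 known categories
# (smaller than the 10,000 genes mentioned above, to keep the sketch light).
X, y = make_classification(n_samples=300, n_features=1000, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# LDA uses the labels y and can create at most (3 - 1) = 2 discriminant axes here.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# PCA ignores y and simply keeps the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Plotting X_lda vs X_pca (colored by y) typically shows clearer class separation for LDA.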
These are a few differences b/w LDA & PCA; now let’s talk about some similarities:
o Both methods rank the new axes that they create in order of importance.
PC1 (the first new axis that PCA creates) accounts for the most variation in
the data.
PC2 (the second new axis) does the second-best job.
LD1 (the first new axis that LDA creates) accounts for the most variation
between the categories.
LD2 (the second new axis) does the second-best job.
o Also, both methods let you dig in and see which genes are driving the new axes.
In PCA, this means looking at the loading scores.
In LDA, we can see which genes (or variables) correlate with the new
axes (a sketch follows below).
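One way to inspect this in scikit-learn is sketched below; the wine dataset is just a convenient stand-in for the gene data, and the two attributes shown are real scikit-learn attributes for these estimators:

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)

pca = PCA(n_components=2).fit(X)
print(pca.components_)      # loading scores: weight of each original feature on PC1 and PC2

lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print(lda.scalings_)        # weight of each original feature on the discriminant axes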
SUMMARY
LDA is like PCA — both try to reduce dimensions.
— PCA looks at the genes with the most variation.
— LDA tries to maximize the separation of known categories.
More Notes
LDA is also closely related to principal component analysis (PCA) and factor analysis in that
all three look for linear combinations of variables which best explain the data.
LDA explicitly attempts to model the difference between the classes of data.
PCA, in contrast, does not take into account any difference in class.
Factor Analysis builds the feature combinations based on differences rather than similarities.
LDA works when the measurements made on independent variables for each observation
are continuous quantities.
o When dealing with categorical independent variables, the equivalent
technique is Discriminant Correspondence Analysis.
Consider the image on the right: we see that there are two categories.
We need to enhance the separability of these different categories, i.e., make the
difference b/w the classes clearer.
Lastly, the datapoints below will be projected as follows after LDA:
QDA (Quadratic Discriminant Analysis): the idea in QDA is to move beyond a straight-line boundary
so that we can still classify.
Thus, a boundary that looks like a convex curve in 2D will look like a plane in 3D, and it will easily
separate the classes in higher dimensions.
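A small sketch of this point; the ring-shaped toy data below is an assumption, chosen so that no straight line can separate the classes, while scikit-learn's QuadraticDiscriminantAnalysis can fit a curved boundary:

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

# Toy data: one class clustered at the origin, the other on a ring around it.
rng = np.random.default_rng(0)
inner = rng.normal(0.0, 0.5, size=(200, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
outer = np.column_stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)])

X = np.vstack([inner, outer])
y = np.array([0] * 200 + [1] * 200)

print(LinearDiscriminantAnalysis().fit(X, y).score(X, y))     # near 0.5: a linear boundary fails
print(QuadraticDiscriminantAnalysis().fit(X, y).score(X, y))  # near 1.0: a curved boundary works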
PCA vs LDA

Purpose:
- PCA: to capture maximum variance.
- LDA: to provide a maximally separable boundary.

Objective:
- PCA is an unsupervised technique that aims to maximize the variance of the data along the principal components. The goal is to identify the directions that capture the most variation in the data.
- LDA, on the other hand, is a supervised technique that aims to maximize the separation between different classes in the data. The goal is to identify the directions that capture the most separation between the classes.

Dimensionality Reduction:
- PCA reduces the dimensionality of the data by projecting it onto a lower-dimensional space.
- LDA reduces the dimensionality of the data by creating a linear combination of the features that maximizes the separation between the classes.

Output:
- PCA outputs principal components, which are linear combinations of the original features. These principal components are orthogonal to each other and capture the most variation in the data.
- LDA outputs discriminant functions, which are linear combinations of the original features that maximize the separation between the classes.

Interpretation:
- PCA is often used for exploratory data analysis, as the principal components can be used to visualize the data and identify patterns.
- LDA is often used for classification tasks, as the discriminant functions can be used to separate the classes.

Performance:
- PCA is generally faster and more computationally efficient than LDA, as it does not require labeled data.
- LDA, however, may be more effective at capturing the most important information in the data when class labels are available.
PCA:
- It's an unsupervised ML technique (meaning no numeric target or label is involved).
- Used mostly to reduce dimensionality.
- Some information is lost, as we only keep the PCs that provide >90% of the variance.

LDA:
- It's a supervised ML technique used for classification tasks (a target is involved).
- It focuses on finding features such that the separability b/w groups is maximized.
- Used to enhance classification by increasing separability.
- While some finer details are lost, LDA retains the most discriminative information for classification purposes; it aims to preserve the information that helps distinguish between the different classes.

Code:
- PCA: pca.fit(x) followed by pca.transform(x). No 'y' (target) is passed because it is an unsupervised ML technique.
- LDA: lda.fit_transform(x, y). Here 'y' (the target) is passed because it is a supervised ML technique.
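A runnable version of the calls above, assuming scikit-learn; the Iris dataset and the choice of two components are illustrative assumptions, not part of the original notes:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

x, y = load_iris(return_X_y=True)

pca = PCA(n_components=2)
pca.fit(x)                         # unsupervised: no target passed
x_pca = pca.transform(x)

lda = LinearDiscriminantAnalysis(n_components=2)
x_lda = lda.fit_transform(x, y)    # supervised: the target y is required

print(x_pca.shape, x_lda.shape)    # both reduce 4 features down to 2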
LDA vs PCA: When to use which method?
PCA is an unsupervised learning algorithm while LDA is a supervised learning
algorithm.
This means that PCA finds directions of maximum variance regardless of class labels
while LDA finds directions of maximum class separability.
So now that you know how each method works, when should you use PCA vs LDA for
dimensionality reduction?
In general, you should use LDA when your goal is classification – that is, when you
have labels for your data points and want to predict which label new points will have
based on their feature values.
On the other hand, if you don’t have labels for your data or if your goal is simply to
find patterns in your data (not classification), then PCA will likely work better.
That said, there are some situations where LDA may outperform PCA even when you’re not
doing classification.
For example, imagine that your data has 100 features but only 10% of those features
are actually informative (the rest are noise).
If you run PCA on this dataset, it will identify all 100 components since its goal is
simply to maximize variance.
However, because only 10% of those components are actually informative, 90% of
them will be useless.
If you were to run LDA on this same dataset, it would keep far fewer directions (at most the
number of classes minus one), since its goal of capturing class separability is better served by
discarding the noisy features (a sketch follows after this list).
Thus, if noise dominates your dataset, then LDA may give better results even if your
goal isn’t classification!
Because LDA makes stronger assumptions about the structure of your data, it will
often perform better than PCA when your dataset satisfies those assumptions but
worse when it doesn’t.
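As a sketch of the 100-feature scenario above (the make_classification call, the sample size, and the two-class setup are illustrative assumptions): PCA will happily rank up to 100 directions, while LDA keeps at most one here because there are only two classes.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 100 features, only 10 of which actually carry class information; the rest are noise.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           n_redundant=0, n_classes=2, random_state=0)

pca = PCA().fit(X)
print(pca.explained_variance_ratio_.shape)   # (100,): PCA ranks all directions by variance

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.explained_variance_ratio_.shape)   # (1,): at most n_classes - 1 discriminant axes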
Assumptions of LDA:
o LDA assumes that the data has a Gaussian distribution and that the covariance
matrices of the different classes are equal.
o It also assumes that the data is linearly separable, meaning that a linear decision
boundary can accurately classify the different classes.
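Under these assumptions (a shared covariance matrix $\Sigma$, class means $\mu_k$, and class priors $\pi_k$), one standard way to write the LDA decision rule is to assign $x$ to the class with the largest score:

$$\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \log \pi_k$$

Because every $\delta_k$ is linear in $x$, the boundary between any two classes (where $\delta_j(x) = \delta_k(x)$) is a hyperplane; if the true class covariances differ, this assumption breaks down and QDA's quadratic boundaries are more appropriate.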
Advantages of LDA:
o It is a simple and computationally efficient algorithm.
o It can work well even when the number of features is much larger than the number
of training samples.
o It can handle multicollinearity (correlation between features) in the data.
Disadvantages of LDA:
o It assumes that the data has a Gaussian distribution, which may not always be the
case.
o It assumes that the covariance matrices of the different classes are equal, which may
not be true in some datasets.
o It assumes that the data is linearly separable, which may not be the case for some
datasets.
o It may not perform well in high-dimensional feature spaces.
o LDA is used specifically for solving supervised classification problems with multiple
classes, something that plain (binary) logistic regression cannot do directly. However, LDA
does not work when the class distributions share the same mean.
o In such a situation, LDA cannot produce a new axis that linearly separates the
classes. To solve this problem, non-linear discriminant analysis is used in machine
learning.