ML Unit 3
Both Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are
linear transformation techniques that are commonly used for dimensionality reduction.
PCA can be described as an “unsupervised” algorithm, since it “ignores” class labels and
its goal is to find the directions (the so-called principal components) that maximize the
variance in a dataset.
In contrast to PCA, LDA is “supervised” and computes the directions (“linear
discriminants”) that will represent the axes that maximize the separation between multiple
classes.
Although it might sound intuitive that LDA is superior to PCA for a multi-class
classification task where the class labels are known, this might not always be the case.
Linear Discriminant Analysis
Under Linear Discriminant Analysis, we are basically looking for:
1. Which set of parameters best describes the association of an object with its group?
2. What is the best classification model that separates those groups?
It is widely used for modeling differences between groups, i.e., separating samples into two or
more classes. Suppose we have two classes and we need to classify them efficiently.
Linear Discriminant Analysis (Contd.)
Classes can have multiple features. Using one single feature to classify may result in some
overlapping of variables, so there is a need to increase the number of features in order to
avoid that overlapping and, in return, obtain a proper classification.
Linear Discriminant Analysis (Contd.)
Performing Linear Discriminant Analysis is a three-step process.
I. Calculate the ‘separability’ between the classes. Known as the between-class variance, it is
defined as the distance between the means of the different classes, and it allows the algorithm to
put a quantitative measure on ‘how difficult’ the problem is (closer means = harder problem).
This separability is stored in a ‘between-class scatter matrix’.
Linear Discriminant Analysis (Contd.)
Performing Linear Discriminant Analysis is a three-step process.
II. Compute the within-class variance, or the distance between the mean of each class and the
samples within that class. This is another factor in the difficulty of separation: higher variance
within a class makes clean separation more difficult.
Linear Discriminant Analysis (Contd.)
Performing Linear Discriminant Analysis is a three-step process.
III. Construct a lower-dimensional space that maximizes the between-class variance
(‘separability’) and minimizes the within-class variance. This objective is known as Fisher’s
Criterion, and Linear Discriminant Analysis can be computed using singular value decomposition,
an eigenvalue decomposition, or a least-squares method.
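As an illustration of these three steps, here is a minimal NumPy sketch; the function name `lda_directions` and its arguments are placeholders introduced here, not part of the original notes.

```python
import numpy as np

def lda_directions(X, y, n_components=1):
    """Minimal LDA sketch: build the between-class (S_B) and within-class (S_W)
    scatter matrices, then solve the eigenvalue problem behind Fisher's criterion.
    X is (n_samples, n_features); y holds the class labels."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        # Within-class scatter: spread of each class around its own mean.
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        # Between-class scatter: distance of each class mean from the overall mean.
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)
    # Fisher's criterion: directions w that maximize w^T S_B w / w^T S_W w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]
```

The eigenvectors returned here span the lower-dimensional space that separates the classes; in practice a library implementation would be preferred over this sketch.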
Linear Discriminant Analysis (Contd.)
How to Prepare Data for LDA?
Classification Problems: LDA is intended for classification problems where the output
variable is categorical. LDA supports both binary and multi-class classification.
Remove Outliers: Consider removing outliers from your data. These can skew the basic
statistics used to separate classes in LDA, such as the mean and the standard deviation.
Same Variance: LDA assumes that each input variable has the same variance. It is almost
always a good idea to standardize your data before using LDA so that it has a mean of 0
and a standard deviation of 1.
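As a hedged sketch of this preparation (assuming scikit-learn is available; the random data and variable names are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative data: X is (n_samples, n_features), y is a categorical label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Standardize so every input variable has mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)

# Fit LDA on the standardized data and inspect its class predictions.
lda = LinearDiscriminantAnalysis()
lda.fit(X_std, y)
print(lda.predict(X_std[:5]))
```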
Linear Discriminant Analysis (Contd.)
Extensions to LDA
Linear Discriminant Analysis is a simple and effective method for classification. Because it
is simple and so well understood, there are many extensions and variations to the method.
Some popular extensions include:
Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance
(or covariance when there are multiple input variables).
For customers’ recognition: LDA helps to identify and choose the parameters that describe a
group of customers who are highly likely to buy similar products.
For predictions: LDA is widely used for prediction and hence in decision making; for example,
“will you read a book?” yields a predicted result in one of two possible classes: reading the
book or not.
Linear Discriminant Analysis (Contd.)
For face recognition: this is the most famous application in the field of computer vision. Every
face is represented by a large number of pixel values; LDA reduces the number of features to a
more manageable number before the classification task is carried out. A template is created from
the newly produced dimensions, which are a linear combination of pixel values.
In medical: LDA is used to classify the state of patients’ diseases as mild, moderate or severe
based on various parameters and the medical treatment the patient is undergoing, which helps
guide the course of treatment.
In learning: nowadays, robots are trained to learn and talk so as to work like human beings, and
this can be treated as a classification problem. LDA forms similar groups based on various
parameters such as frequencies, pitches, sounds, tunes, etc.
Principal Component Analysis
Background for PCA:
Have you come across a situation where you have so many variables that you are unable to
understand the relationships between them? In such a situation you are likely to overfit your
model to the data.
In these kinds of situations you need to reduce your feature space in order to understand the
relationships between the variables, which also lowers the chance of overfitting.
Reducing the dimension of the feature space is called “Dimensionality Reduction”. It can be
achieved either by “Feature Exclusion” or by “Feature Extraction”.
Feature exclusion is about dropping variables and keeping only those features that can be used
to predict the target, whereas feature extraction is about creating new features from the
existing ones.
Suppose we have 5 independent features and we create 5 new features on the basis of the old 5
features; this is how feature extraction works (a small sketch follows).
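For instance, here is a small sketch of feature extraction, using scikit-learn’s PCA as the extractor; the data and the choice of keeping 2 of the new features are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# 5 original independent features for 200 illustrative samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))

# Feature extraction: build 5 new features as combinations of the old 5,
# then keep only the most informative ones (here, the first 2).
pca = PCA(n_components=5).fit(X)
X_new = pca.transform(X)[:, :2]
print(X_new.shape)  # (200, 2)
```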
Principal Component Analysis (Contd.)
What is Principal Component Analysis?
In both of the pictures above, the data points (black dots) are projected onto one line, but the
second line is closer to the actual points (less projection error) than the first one.
The good line used for projection lies in the direction of largest variance.
Principal Component Analysis (Contd.)
What is Principal Component Analysis?
The coordinate system needs to be modified so as to retrieve a 1D representation for the vector
y after the data is projected onto the best line.
In the direction of the green line, the new data y and the old data x have the same variance.
Principal Component Analysis (Contd.)
What is Principal Component Analysis?
The first principal component captures the maximum variance in the underlying data, and the
second principal component is orthogonal to it.
Principal Component Analysis (Contd.)
When to Use PCA?
Case 1: When you want to reduce the number of variables, but you are unable to identify which
variables you can drop from consideration.
Case 2: When you want to check whether the variables are independent of each other.
Case 3: When you are prepared to make the independent features less interpretable.
In all three of the above cases you can use PCA.
Principal Component Analysis (Contd.)
Principal Component Analysis steps:
Step 1: Standardize the data (detailed below).
Step 2: Create a correlation matrix or covariance matrix for all the desired dimensions.
Step 3: Calculate the eigenvectors, which are the principal components, and the respective
eigenvalues, which capture the magnitude of the variance.
Step 4: Arrange the eigenpairs in decreasing order of their eigenvalues and pick the pair with
the maximum eigenvalue; this is the first principal component, which preserves the maximum
information from the original data.
Principal Component Analysis (Contd.)
Principal Component Analysis steps:
STEP 1: STANDARDIZATION
The aim of this step is to standardize the range of the continuous initial variables so that
each one of them contributes equally to the analysis.
More specifically, the reason why it is critical to perform standardization prior to PCA is
that the latter is quite sensitive to the variances of the initial variables.
That is, if there are large differences between the ranges of the initial variables, the
variables with larger ranges will dominate over those with small ranges, which will lead to
biased results.
So, transforming the data to comparable scales can prevent this problem.
Principal Component Analysis (Contd.)
Principal Component Analysis steps:
STEP 1: STANDARDIZATION
Mathematically, this can be done by subtracting the mean and dividing by the standard
deviation for each value of each variable.
Once the standardization is done, all the variables will be transformed to the same scale.
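A minimal NumPy illustration of this standardization step (the data values are made up):

```python
import numpy as np

# Standardization: for each variable, subtract its mean and divide by its
# standard deviation so every variable ends up on the same scale.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # approximately 1 for every column
```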
Principal Component Analysis (Contd.)
Principal Component Analysis steps:
STEP 2: COVARIANCE MATRIX COMPUTATION
Since the covariance of a variable with itself is its variance (Cov(a,a)=Var(a)), in the main
diagonal (Top left to bottom right) we actually have the variances of each initial variable.
And since the covariance is commutative (Cov(a,b)=Cov(b,a)), the entries of the
covariance matrix are symmetric with respect to the main diagonal, which means that the
upper and the lower triangular portions are equal.
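Continuing the sketch from Step 1 (assuming `X_std` is the standardized data from the previous snippet), the covariance matrix could be computed as follows:

```python
import numpy as np

# Covariance matrix of the standardized data (X_std from the previous step).
# rowvar=False treats columns as variables, so the result is
# (n_features x n_features), with the variances on the main diagonal and a
# symmetric layout, since Cov(a, b) = Cov(b, a).
cov_matrix = np.cov(X_std, rowvar=False)
print(cov_matrix)
```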
Principal Component Analysis (Contd.)
Principal Component Analysis Steps:
STEP 3: Compute the Eigenvectors and Eigenvalues of the Covariance Matrix to
Identify the Principal Components
Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from
the covariance matrix in order to determine the principal components of the data.
Before getting to the explanation of these concepts, let’s first understand what we mean by
principal components.
Principal components are new variables that are constructed as linear combinations or
mixtures of the initial variables.
These combinations are done in such a way that the new variables (i.e., principal
components) are uncorrelated and most of the information within the initial variables is
squeezed or compressed into the first components.
So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to
put the maximum possible information in the first component, then the maximum remaining
information in the second, and so on.
Principal Component Analysis (Contd.)
Principal Component Analysis Steps:
STEP 3: Compute the Eigenvectors and Eigenvalues of the Covariance Matrix to
Identify the Principal Components
Organizing information in principal components this way will allow you to reduce
dimensionality without losing much information, by discarding the components with
low information and considering the remaining components as your new variables.
An important thing to realize here is that the principal components are less interpretable
and don’t have any real meaning, since they are constructed as linear combinations of the
initial variables.
Geometrically speaking, principal components represent the directions of the data that
explain a maximal amount of variance, that is to say, the lines that capture most information
of the data.
The relationship between variance and information here is that the larger the variance
carried by a line, the larger the dispersion of the data points along it, and the larger the
dispersion along a line, the more information it carries.
Principal Component Analysis (Contd.)
How PCA Constructs The Principal Components?
As there are as many principal components as there are variables in the data, principal
components are constructed in such a manner that the first principal component accounts for
the largest possible variance in the data set.
The second principal component is calculated in the same way, with the condition that it is
uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts
for the next highest variance.
This continues until a total of p principal components have been calculated, equal to the
original number of variables.
Now that we understand what we mean by principal components, let’s go back to
eigenvectors and eigenvalues.
The first thing you need to know about them is that they always come in pairs, so that every
eigenvector has an eigenvalue.
And their number is equal to the number of dimensions of the data.
For example, for a 3-dimensional data set, there are 3 variables, therefore there are 3
eigenvectors with 3 corresponding eigenvalues.
Principal Component Analysis (Contd.)
How PCA Constructs The Principal Components?
Eigenvectors and eigenvalues are behind all the magic explained above: the eigenvectors of the
covariance matrix are actually the directions of the axes along which there is the most
variance (the most information), and those directions are what we call the principal
components.
And eigenvalues are simply the coefficients attached to eigenvectors, which give the
amount of variance carried in each Principal Component.
By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the
principal components in order of significance.
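Pulling the earlier snippets together, here is a minimal sketch of this ranking and projection step; it assumes `cov_matrix` and `X_std` from the previous snippets.

```python
import numpy as np

# Eigendecompose the covariance matrix, rank the eigenpairs, and project.
eigvals, eigvecs = np.linalg.eigh(cov_matrix)  # eigh: the covariance matrix is symmetric

# Sort eigenpairs by eigenvalue, highest to lowest.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of variance carried by each principal component.
explained = eigvals / eigvals.sum()
print(explained)

# Keep the top-k components and project the standardized data onto them.
k = 1
X_pca = X_std @ eigvecs[:, :k]
```

Ranking by eigenvalue is exactly the ordering by significance described above: the first column of the sorted eigenvectors is the first principal component.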
Principal Component Analysis (Contd.)
Principal Component Analysis (Performance issues)
The potency of PCA is directly dependent on the scale of the attributes. If the variables are
on different scales, PCA will favor the variable with the largest values, without taking the
correlations into account (a short illustration follows below).
The effectiveness of PCA can also be influenced by the appearance of skew in the data with
long, thick tails.
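A short illustration of the scale sensitivity mentioned above (the feature scales are made-up assumptions): without standardization, the first component explains almost all of the variance simply because one feature has much larger values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: without scaling, the first
# principal component is dominated by the large-scale feature.
rng = np.random.default_rng(2)
X = np.column_stack([rng.normal(scale=1000.0, size=300),   # large-scale feature
                     rng.normal(scale=1.0, size=300)])      # small-scale feature

print(PCA(n_components=1).fit(X).explained_variance_ratio_)
print(PCA(n_components=1).fit(StandardScaler().fit_transform(X)).explained_variance_ratio_)
```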