AI Unit 4

Pattern Recognition System

Patterns are everywhere in this digital world. A pattern can either be seen physically or observed mathematically by applying algorithms. In pattern recognition, a pattern comprises the following two fundamental things:

•	A collection of observations
•	The concept behind the observations

Designing a recognizer also requires differentiating between good and bad features and understanding feature properties.
In a statistical-classification problem, a decision boundary is a hypersurface that partitions the underlying vector space into two sets. A decision boundary is the region of a problem space in which the output label of a classifier is ambiguous. A classifier is a hypothesis or discrete-valued function that is used to assign (categorical) class labels to particular data points.

A classifier is used to partition the feature space into class-labeled decision regions, while decision boundaries are the borders between decision regions.
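As a concrete, hedged illustration (the library, data, and classifier choice are illustrative assumptions, not prescribed by the text), the following sketch fits a linear classifier on two toy classes; its decision boundary is the hypersurface where the predicted label changes:

```python
# Illustrative sketch: a classifier partitions a 2-D feature space into
# class-labeled decision regions; the decision boundary is where the
# predicted label changes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two Gaussian blobs serve as toy classes
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

# For a linear classifier the boundary is the hypersurface w.x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print(f"decision boundary: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")

# Points on either side of the boundary receive different labels
print(clf.predict([[0.0, 0.0], [3.0, 3.0]]))
```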

A Sensor: A sensor is a device used to measure a property, such as pressure, position, temperature, or acceleration, and respond with feedback.

A Preprocessing Mechanism: Segmentation is used; it is the process of partitioning data into multiple segments. It can also be defined as the technique of dividing or partitioning data into parts called segments.

A Feature Extraction Mechanism: Feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretations. It can be manual or automated.

A Description Algorithm: Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation.

A Training Set: Training data is a certain percentage of an overall dataset, alongside the testing set. As a rule, the better the training data, the better the algorithm or classifier performs.

Statistical Approach and Structural Approach

Descriptive Statistics: It summarizes data from a sample using indexes such as the mean or standard deviation.
Inferential Statistics: It draws conclusions from data that are subject to random variation.

Examples of patterns include sentence patterns, phrase patterns, formulas, and idioms.

Pattern recognition is a subfield of machine learning that focuses on the automatic discovery of patterns and regularities in data. It involves developing algorithms and models that can identify patterns in data and make predictions or decisions based on those patterns.

There are several basic principles and design considerations that are important in
pattern recognition:

Feature representation: The way in which the data is represented or encoded is critical for the success of a pattern recognition system. It is important to choose features that are relevant to the problem at hand and that capture the underlying structure of the data.
Similarity measure: A similarity measure is used to compare the similarity between
two data points. Different similarity measures may be appropriate for different
types of data and for different problems.
Model selection: There are many different types of models that can be used for
pattern recognition, including linear models, nonlinear models, and probabilistic
models. It is important to choose a model that is appropriate for the data and the
problem at hand.
Evaluation: It is important to evaluate the performance of a pattern recognition
system using appropriate metrics and datasets. This allows us to compare the
performance of different algorithms and models and to choose the best one for the
problem at hand.
Preprocessing: Preprocessing is the process of preparing the data for analysis. This
may involve cleaning the data, scaling the data, or transforming the data in some
way to make it more suitable for analysis.
Feature selection: Feature selection is the process of selecting a subset of the most
relevant features from the data. This can help to improve the performance of the
pattern recognition system and to reduce the complexity of the model.
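A minimal sketch tying these considerations together, assuming scikit-learn and one of its built-in datasets (both illustrative choices, not prescribed by the text): preprocessing, feature selection, model choice, and evaluation appear as explicit stages.

```python
# Hedged pipeline sketch of the design considerations above:
# preprocessing (scaling), feature selection, model selection
# (a linear classifier), and evaluation (cross-validated accuracy).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),                   # preprocessing
    ("select", SelectKBest(f_classif, k=10)),      # feature selection
    ("model", LogisticRegression(max_iter=1000)),  # model choice
])

# Evaluation: compare systems using an explicit metric and dataset
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f}")
```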
Statistical Pattern Recognition
Statistical pattern recognition is the branch of statistics that deals with the identification and classification of patterns in data. It is a type of supervised learning, where the data is labeled with class labels that indicate the class to which a particular instance belongs. The goal of statistical pattern recognition is to learn a model that can accurately classify new data instances based on their features.

Statistical pattern recognition is the area of machine learning that focuses on finding patterns and regularities in data. It enables machines to gain knowledge from data, enhance performance, and make choices based on what they have discovered. The goal of statistical pattern recognition is to find relationships between variables that can be used for prediction or classification tasks. This article explores the various techniques used in statistical pattern recognition and how these methods are applied to solve real-world problems.

The importance of pattern recognition lies in its ability to detect complex relations among
variables without explicit programming instructions. By using statistical models, machines can
identify regularities in data that would otherwise require manual labor or trial-and-error
experimentation by humans. In addition, machines can generalize from existing knowledge bases
to predict new outcomes more accurately than before.

Statistical pattern recognition is becoming increasingly important within many industries due to its ability to automate certain processes and to provide valuable insights into large datasets that might otherwise remain hidden beneath the surface. This article provides an overview of different techniques used for identifying patterns within data and explains how they are employed to solve practical problems effectively.

What Is Statistical Pattern Recognition With Example?


Statistical pattern recognition (SPR) is a field of data analysis that uses mathematical models and algorithms to identify patterns in large datasets. It can be used for various tasks, such as handwriting or speech recognition, classification of objects in images, and natural language processing. SPR employs several techniques, including support vector machines, neural networks, linear discriminants, Bayesian methods, k-nearest neighbors, and other feature extraction algorithms.

In terms of applications, SPR has been successfully applied to problems like cursive handwriting recognition and automated medical diagnosis. In the case of handwriting recognition, an algorithm works by extracting features using a feature extraction algorithm and then matching them with existing model parameters. The same principle applies when solving more complex tasks such as image classification, where deep learning may be employed instead of traditional methods like discriminant analysis. Similarly, machine vision systems use SPR techniques to identify objects within an image and classify them according to specific criteria. Furthermore, modern robotics also utilizes SPR concepts to enable robots to recognize their environment better, the iRobot Roomba vacuum cleaner being one example.
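As a hedged sketch of the handwriting-recognition principle just described (the dataset and library are illustrative assumptions), features are extracted as raw pixel intensities and new inputs are matched with k-nearest neighbors, one of the SPR techniques listed above:

```python
# Illustrative handwriting-recognition sketch: pixel intensities of
# 8x8 digit images serve as features; k-nearest neighbors matches
# new inputs against the training examples.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)   # 8x8 images flattened to 64 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
print(f"test accuracy: {knn.score(X_te, y_te):.3f}")
```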

By combining different types of data transformations with statistical modeling approaches such as supervised learning algorithms and unsupervised clustering techniques, it is possible to uncover new insights from datasets that could otherwise have gone unnoticed. With advancements in computing power and technologies like artificial intelligence (AI), this area continues to grow rapidly, allowing us to investigate further into the world of big data.

What Is Statistical Pattern Recognition In Cognitive Psychology?


A method of cognitive psychology known as statistical pattern recognition employs learning
algorithms to detect patterns in data automatically. This technology can be used for shape
recognition, where features such as the size and orientation of the object are extracted from an
image using feature selection techniques and then converted into a feature vector input which
describes the identity of the object. The classification approach uses this information to classify
objects or events according to their properties, with Bayesian pattern classifiers being one of the
most common methods. These optimal classifiers use probabilities rather than hard limitations on
parameters when making decisions about how to group different classes together; they also allow
for prior knowledge about certain groups of objects or events to be taken into account when
doing so. By combining these two elements—feature extraction and classification—statistical
pattern recognition enables accurate automatic recognition processes.
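A minimal sketch of a Bayesian pattern classifier in this spirit, assuming scikit-learn's Gaussian naive Bayes (an illustrative choice): the model assigns the class with the highest posterior probability, and prior knowledge about the classes can be supplied explicitly.

```python
# Illustrative Bayesian classifier: decisions use posterior
# probabilities rather than hard limits, and prior knowledge about
# class frequencies is passed in via `priors` (values illustrative).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(2.5, 1.0, (40, 2))])
y = np.array([0] * 60 + [1] * 40)

# Encode prior knowledge that class 0 is more common
gnb = GaussianNB(priors=[0.7, 0.3]).fit(X, y)
print(gnb.predict_proba([[1.2, 1.2]]))  # posterior class probabilities
```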

What Are The 3 Components Of The Pattern Recognition?


Statistical pattern recognition is a cognitive psychology technique used to identify patterns in
data. It has three components: unsupervised classification, signal processing, and structural
pattern recognition. Unsupervised classification uses algorithms that classify items into groups
without any prior information about the groups or labels assigned to them. Signal processing
involves extracting meaningful signals from noisy sources like audio files, images and videos.
Structural pattern recognition deals with recognizing shapes and objects in an image using
various techniques such as automatic number plate recognition (ANPR).

Accuracy in recognition systems depends on the parameter vector that describes each item being classified by its features. Probabilistic pattern classifiers are often used for this purpose, along with discriminant analysis techniques for feature extraction. Machine learning methods also play an important role in statistical pattern recognition, as they use ensemble learning techniques to combine multiple machine learning models to increase accuracy.

Is Pattern Recognition A Cognitive Skill?


Pattern recognition is a cognitive skill that involves the automatic analysis of data to identify
patterns and classify them. It can be applied in various fields, such as speech recognition, image
recognition and feature detection. In order to perform pattern recognition tasks accurately,
models must be developed which are able to detect features within datasets and represent these
through probability distributions. For example, supervised learning algorithms such as neural networks may be used, alongside principal component analysis (PCA) for feature selection and posterior probability calculations for class probabilities.

In addition, when dealing with large datasets it is important to select appropriate features using
automated methods like PCA or feature extraction techniques which rely on statistical measures
like mean-shift clustering or histogram equalization. Furthermore, by using machine learning
techniques based on probabilistic graphical models such as Bayesian networks we can achieve
higher accuracy rates than traditional approaches due to their ability to capture complex
dependencies between variables. However, all of these methods require careful tuning so that the
model’s performance stays optimal over time. By understanding how different kinds of pattern
recognition systems work and how they interact with each other, it is possible to develop more
effective solutions for real-world problems.

Conclusion
Statistical pattern recognition is an important analytical tool in many fields. It has been used to
identify patterns in data and make predictions based on those patterns. The three components of
the pattern recognition process are feature extraction, classification, and decision making. These
processes allow for insights into complex datasets that may not be immediately apparent from
visual inspection alone.

In cognitive psychology, statistical pattern recognition can be used to understand how cognition
works. By analyzing patterns in behavior or other measures of cognitive functioning, researchers
can gain insight into the underlying processes involved in a person's thinking and problem-
solving abilities. This knowledge can then be applied to develop interventions designed to
improve these skills.

Pattern recognition is also considered a cognitive skill due to its ability to detect patterns and
plan ahead. This allows people to better anticipate future events or outcomes based on past
experience, allowing them to effectively plan their actions accordingly. Understanding this
concept is essential for any profession that requires analysis of complex data sets - such as
business analysts or engineers - as it gives them the tools necessary to accurately assess
information quickly and come up with solutions faster than ever before.

Statistical pattern recognition


37steps.com/189/statisticalpr/

September 13, 2012

Statistical pattern recognition refers to the use of statistics to learn from examples. It means to collect observations, study and
digest them in order to infer general rules or concepts that can be applied to new, unseen observations. How should this be
done in an automatic way? What tools are needed?

Previous discussions on prior knowledge and Plato and Aristotle make clear that learning from observations is impossible when context is missing. Context information should be available for learning to occur. Why? Because otherwise we don't know where to look or how to define meaningful patterns. Without reference to the context, there is no way to derive a single general statement from the observations, because many of them are equally possible. Context is necessary to make a choice.

There is a trade-off between how much knowledge we already have and the number of observations needed to gain some specific additional insight. If we know everything, no new observations are needed. The less we know in advance, the more examples are needed to reach a specific target. This all depends on how ambitious we are in reaching a specific goal.

Discussions like this are very symbolic. Knowledge does not have a size that can be measured; it can at most be ranked partially: from a specific starting point it can grow. Information theory might be helpful if we accept that knowledge can be expressed in bits of information. This always implies an uncertainty, yet prior knowledge usually comes as certain. The expert cannot estimate how convinced he is that he is right. The medical doctor who tells us what to measure, and where, or what the clear examples of a disease are, cannot tell us the probability that this is correct. Moreover, in daily life we may be absolutely certain after a finite number of observations: we meet somebody and within a second we are sure it is our neighbor. This does not fit in an information-theoretic approach.


This is again an aspect of the struggle to learn from examples. Prior knowledge might be wrong, but it is the foundation for new knowledge. For the time being, we solve this dilemma by converting existing knowledge into facts we are sure of, with unknowns between them that have to be uncovered. In the so-called Bayesian approaches it is assumed that some prior probabilities for the unknowns are known, for sure (!?). The next step in any learning approach, Bayesian or not, is to bring the known facts, the observations, and the unknowns into the same framework, in some mathematical description, by which they can be related. This is called representation.

On the basis of the representation, the observations can be related. The pattern recognition task is to generalize these relations into rules that should hold for new observations outside the set of given ones. The entire process of representation and generalization can be illustrated with an example. There are some real-world objects. The given knowledge is that they come in two different, distinguishable classes, the red ones and the blue ones. It is also given that their size (area) and perimeter length are important for distinguishing them. Examples are given; the rule to distinguish them is unknown. It should be found such that it can be applied to new observations for which the class is unknown and has to be estimated.

The objects are represented in a 2-dimensional vector space. Every object is represented as a point (a vector) in this space. It
appears that the training examples of the two classes are represented in almost separable regions in this space. A straight line
may serve as a decision boundary. It may be applied to new objects with unknown class membership to estimate the class
they belong to.


In this example, some training objects still end up on the wrong side. So it is to be expected that the classification rule is not perfect and that it will make errors on future examples. The main target in the development of a pattern recognition system is to make this error as small as possible. Additional targets may be the minimization of the cost of learning and classification. To achieve these targets we need to study:

1. representation: the way the objects are related, in this case by a 2-dimensional vector space.
2. generalization: the rules and concepts that can be derived from the given representation of the set of examples.
3. evaluation: accurate and trustworthy estimates of the performance of the system.
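A hedged sketch of this three-step cycle on synthetic data (all values and names are illustrative): objects are represented as (area, perimeter) vectors, a straight-line rule is generalized from examples, and the error is evaluated on held-out observations.

```python
# Illustrative representation / generalization / evaluation cycle.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
# 1. Representation: (area, perimeter) feature vectors for two classes
red = rng.normal([4.0, 8.0], 0.9, (60, 2))
blue = rng.normal([6.0, 10.5], 0.9, (60, 2))
X = np.vstack([red, blue])
y = np.array([0] * 60 + [1] * 60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 2. Generalization: learn a linear decision boundary from the examples
clf = Perceptron().fit(X_tr, y_tr)

# 3. Evaluation: estimate the error on observations not used for training
print(f"estimated error: {1 - clf.score(X_te, y_te):.3f}")
```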

Principal Component Analysis

Principal Component Analysis (PCA) is an unsupervised learning algorithm used for dimensionality reduction in machine learning. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the principal components. It is one of the popular tools used for exploratory data analysis and predictive modeling, and a technique for drawing strong patterns from a dataset by reducing its dimensionality while preserving as much variance as possible.

PCA generally tries to find the lower-dimensional surface onto which to project the high-dimensional data.

PCA works by considering the variance of each attribute, because attributes with high variance indicate a good split between the classes, and it reduces the dimensionality accordingly. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.

The PCA algorithm is based on some mathematical concepts such as:

o Variance and Covariance

o Eigenvalues and Eigenvectors

Some common terms used in PCA algorithm:

1. Dimensionality: It is the number of features or variables present in the given dataset; more simply, the number of columns in the dataset.

2. Correlation: It signifies how strongly two variables are related to each other, such that if one changes, the other also changes. The correlation value ranges from -1 to +1: -1 occurs when the variables are inversely related to each other, and +1 indicates that the variables are directly related to each other.

3. Orthogonal: It means that the variables are uncorrelated with each other, so the correlation between any pair of variables is zero.

4. Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v, i.e., Mv = λv for some scalar λ (the corresponding eigenvalue).

5. Covariance Matrix: A matrix containing the covariances between each pair of variables is called the covariance matrix.
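These terms can be checked numerically; a minimal sketch with NumPy (an illustrative choice) computes a covariance matrix and verifies the eigenvector relation Mv = λv:

```python
# Illustrative check: covariance matrix of two correlated variables,
# and the eigenvector relation M v = lambda v.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + 0.2 * rng.normal(size=500)    # y is correlated with x

M = np.cov(np.vstack([x, y]))               # 2x2 covariance matrix
print("covariance matrix:\n", M)

eigvals, eigvecs = np.linalg.eigh(M)        # eigh: M is symmetric
v = eigvecs[:, -1]                          # eigenvector of largest eigenvalue
print(np.allclose(M @ v, eigvals[-1] * v))  # True: M v = lambda v
```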

Principal Components in PCA

As described above, the transformed new features, or the output of PCA, are the principal components. The number of these PCs is either equal to or less than the number of original features present in the dataset. Some properties of these principal components are given below:
Each principal component must be a linear combination of the original features.
These components are orthogonal, i.e., the correlation between any pair of them is zero.
The importance of each component decreases when going from 1 to n: the 1st PC has the most importance, and the nth PC has the least.

Steps for PCA algorithm:

Getting the dataset

Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the
training set, and Y is the validation set.

Representing data into a structure

Now we will represent our dataset in a structure: a two-dimensional matrix of the independent variables X. Here each row corresponds to a data item, and each column corresponds to a feature. The number of columns equals the dimensionality of the dataset.

Standardizing the data

In this step, we will standardize our dataset. Within a particular column, features with high variance would otherwise appear more important than features with lower variance. If the importance of features should be independent of their variance, we divide each data item in a column by the standard deviation of the column. We will name the resulting matrix Z.

Calculating the Covariance of Z

To calculate the covariance of Z, we take the matrix Z and transpose it; after transposing, we multiply it by Z. The output matrix Z^T Z is the covariance matrix of Z (up to scaling by the number of samples).

Calculating the Eigen Values and Eigen Vectors

Now we need to calculate the eigenvalues and eigenvectors of the resultant covariance matrix. Eigenvectors of the covariance matrix are the directions of the axes carrying the most information (variance), and the eigenvalue associated with each eigenvector measures the variance along that direction.

Sorting the Eigen Vectors

In this step, we take all the eigenvalues and sort them in decreasing order, from largest to smallest, and simultaneously sort the eigenvectors accordingly into a matrix P. The resultant sorted matrix is named P*.
Calculating the new features or principal components

Here we calculate the new features by multiplying the standardized data matrix Z by the matrix P*. In the resultant matrix Z*, each observation is a linear combination of the original features, and the columns of Z* are independent of each other.

Remove less important features from the new dataset.

Now the new feature set is obtained, so we decide what to keep and what to remove: only the relevant or important features are kept in the new dataset, and the unimportant features are removed.
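The steps above can be followed literally in a short from-scratch sketch, assuming NumPy and synthetic data (illustrative choices); variable names mirror the text (Z, P*, Z*):

```python
# From-scratch PCA following the listed steps.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # dataset: 100 items, 5 features
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]    # make two features correlated

# Standardize: zero mean, unit standard deviation per column
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of Z (the text's Z^T Z, scaled by the sample count)
C = (Z.T @ Z) / (len(Z) - 1)               # same as np.cov(Z, rowvar=False)

# Eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)

# Sort eigenvectors by decreasing eigenvalue to get P*
order = np.argsort(eigvals)[::-1]
P_star = eigvecs[:, order]

# New features: project Z onto the principal components
Z_star = Z @ P_star

# Keep only the most important components (here the first 2)
Z_reduced = Z_star[:, :2]
print(Z_reduced.shape)  # (100, 2)
```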

Applications of Principal Component Analysis:

i. PCA is mainly used as a dimensionality reduction technique in various AI applications such as computer vision, image compression, etc.

ii. It can also be used for finding hidden patterns when data has high dimensions. Some fields where PCA is used are finance, data mining, psychology, etc.
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for classification tasks in
machine learning. It is a technique used to find a linear combination of features that best separates the
classes in a dataset.
• LDA works by projecting the data onto a lower-dimensional space that maximizes the separation
between the classes. It does this by finding a set of linear discriminants that maximize the ratio
of between-class variance to within-class variance. In other words, it finds the directions in the
feature space that best separate the different classes of data.
• LDA assumes that the data has a Gaussian distribution and that the covariance matrices of the
different classes are equal. It also assumes that the data is linearly separable, meaning that a
linear decision boundary can accurately classify the different classes.
• LDA has several advantages, including:
o It is a simple and computationally efficient algorithm.
o It can work well even when the number of features is much larger than the number of
training samples.
o It can handle multicollinearity (correlation between features) in the data.

However, LDA also has some limitations, including:


i. It assumes that the data has a Gaussian distribution, which may not always be the case.
ii. It assumes that the covariance matrices of the different classes are equal, which may not be true
in some datasets.
iii. It assumes that the data is linearly separable, which may not be the case for some datasets.
iv. It may not perform well in high-dimensional feature spaces.
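A minimal usage sketch, assuming scikit-learn's LinearDiscriminantAnalysis and the Iris dataset (illustrative choices): LDA is fitted as a classifier and simultaneously used to project the data onto at most (number of classes - 1) discriminant axes.

```python
# Illustrative LDA usage: supervised dimensionality reduction plus
# classification in one model.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis(n_components=2).fit(X_tr, y_tr)

# Project onto at most (n_classes - 1) = 2 discriminant axes
print(lda.transform(X_te).shape)         # (n_samples, 2)
print(f"accuracy: {lda.score(X_te, y_te):.3f}")
```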

Linear Discriminant Analysis, Normal Discriminant Analysis, or Discriminant Function Analysis is a dimensionality reduction technique that is commonly used for supervised classification problems. It is used for modelling differences between groups, i.e. separating two or more classes, by projecting features from a higher-dimensional space into a lower-dimensional space.

For example, suppose we have two classes that we need to separate efficiently, and the classes have multiple features. Using only a single feature to classify them may result in some overlapping, so we keep increasing the number of features for proper classification.

Example:
Suppose we have two sets of data points belonging to two different classes that we want to classify. When the data points are plotted on the 2D plane, there may be no straight line that can separate the two classes completely. Hence, in this case, LDA is used to reduce the 2D graph into a 1D graph in order to maximize the separability between the two classes.

Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis and projects the data onto it so as to maximize the separation between the two categories, hence reducing the 2D graph to a 1D graph.

Two criteria are used by LDA to create a new axis:


i. Maximize the distance between means of the two classes.
ii. Minimize the variation within each class.

A new axis is generated in the 2D plane such that it maximizes the distance between the means of the two classes and minimizes the variation within each class. In simple terms, this newly generated axis increases the separation between the data points of the two classes. After generating this new axis using the above-mentioned criteria, all the data points of the classes are plotted on it.

But Linear Discriminant Analysis fails when the means of the distributions are shared, as it becomes impossible for LDA to find a new axis that makes both classes linearly separable. In such cases, we use non-linear discriminant analysis.
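The two criteria above can be implemented directly for two classes; the following from-scratch sketch (synthetic data, NumPy only, all values illustrative) computes the Fisher direction w = Sw^-1 (m1 - m0), the new 1-D axis onto which both classes are projected:

```python
# From-scratch sketch of the two criteria for two classes: maximize the
# distance between projected means, minimize the within-class variation.
import numpy as np

rng = np.random.default_rng(3)
X0 = rng.normal([0.0, 0.0], 1.0, (80, 2))   # class 0
X1 = rng.normal([3.0, 2.0], 1.0, (80, 2))   # class 1

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter: summed scatter of each class around its mean
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Fisher direction: the new 1-D axis that best separates the classes
w = np.linalg.solve(Sw, m1 - m0)
w /= np.linalg.norm(w)

# Projecting onto w reduces the 2-D problem to 1-D
print("projected class means:", (X0 @ w).mean(), (X1 @ w).mean())
```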

Linear Discriminant Analysis (LDA) in Machine Learning


Linear Discriminant Analysis (LDA) is one of the commonly used dimensionality reduction techniques in machine learning
to solve more than two-class classification problems. It is also known as Normal Discriminant Analysis (NDA) or
Discriminant Function Analysis (DFA).

This can be used to project the features of higher dimensional space into lower-dimensional space in order to reduce resources
and dimensional costs. In this topic, "Linear Discriminant Analysis (LDA) in machine learning”, we will discuss the LDA
algorithm for classification predictive modeling problems, limitation of logistic regression, representation of linear Discriminant
analysis model, how to make a prediction using LDA, how to prepare data for LDA, extensions to LDA and much more. So, let's
start with a quick introduction to Linear Discriminant Analysis (LDA) in machine learning.

Note: Before starting this topic, it is recommended to learn the basics of the logistic regression algorithm and to have a basic understanding of classification problems in machine learning as a prerequisite.

What is Linear Discriminant Analysis (LDA)?


Although the logistic regression algorithm is limited to only two classes, linear discriminant analysis is applicable to classification problems with more than two classes.

Linear discriminant analysis is one of the most popular dimensionality reduction techniques used for supervised classification problems in machine learning. It is also considered a pre-processing step in many applications of pattern classification.


Whenever there is a requirement to separate two or more classes having multiple features efficiently, the Linear Discriminant Analysis model is considered the most common technique to solve such classification problems. For example, if we have two classes with multiple features and need to separate them efficiently, classifying them using only a single feature may show overlapping.

To overcome the overlapping issue in the classification process, we must keep increasing the number of features.

Example:

Let's assume we have to classify two different classes having two sets of data points in a 2-dimensional plane.


However, it is impossible to draw a straight line in a 2-D plane that separates these data points efficiently, but using linear discriminant analysis we can reduce the 2-D plane to a 1-D plane. Using this technique, we can also maximize the separability between multiple classes.

How does Linear Discriminant Analysis (LDA) work?

Linear discriminant analysis is used as a dimensionality reduction technique in machine learning, with which we can transform a 2-D or 3-D graph into a 1-dimensional plane.


Let's consider an example where we have two classes in a 2-D plane having an X-Y axis, and we need to classify them efficiently. As we have already seen in the above example, LDA enables us to draw a straight line that can completely separate the two classes of data points. Here, LDA uses the X-Y axes to create a new axis by separating the classes with a straight line and projecting the data onto this new axis.

Hence, we can maximize the separation between these classes and reduce the 2-D plane into 1-D.

To create a new axis, Linear Discriminant Analysis uses the following criteria:


It maximizes the distance between means of two classes.

It minimizes the variance within the individual class.

Using the above two conditions, LDA generates a new axis in such a way that it can maximize the distance between the means of
the two classes and minimizes the variation within each class.

In other words, we can say that the new axis will increase the separation between the data points of the two classes and plot them
onto the new axis.

Why LDA?

Logistic regression is one of the most popular classification algorithms, but it falls short in the case of multiple classification problems with well-separated classes, which LDA handles efficiently.

LDA can also be used in data pre-processing to reduce the number of features, which cuts the computing cost significantly.


LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to extract useful data from different faces. Coupled
with eigenfaces, it produces effective results.

Drawbacks of Linear Discriminant Analysis (LDA)


LDA is specifically used to solve supervised classification problems for two or more classes, which is not possible using logistic regression in machine learning. But LDA also fails in some cases where the means of the distributions are shared: in this case, LDA fails to create a new axis that makes both classes linearly separable.

To overcome such problems, we use non-linear Discriminant analysis in machine learning.

Extension to Linear Discriminant Analysis (LDA)


Linear discriminant analysis is one of the simplest and most effective methods for solving classification problems in machine learning. It has many extensions and variations, as follows:

1. Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are multiple input variables).

2. Flexible Discriminant Analysis (FDA): It is used when there are non-linear combinations of inputs, such as splines.

3. Regularized Discriminant Analysis (RDA): This uses regularization in the estimate of the variance (actually covariance) and hence moderates the influence of different variables on LDA.

Real-world Applications of LDA


Some of the common real-world applications of Linear discriminant Analysis are given below:

Face Recognition
Face recognition is a popular application of computer vision, where each face is represented as a combination of a number of pixel values. In this case, LDA is used to reduce the number of features to a manageable number before going through the classification process. It generates a new template in which each dimension consists of a linear combination of pixel values. If a linear combination is generated using Fisher's linear discriminant, then it is called Fisher's face.

Medical
In the medical field, LDA has a great application in classifying the patient disease on the basis of various parameters of
patient health and the medical treatment which is going on. On such parameters, it classifies disease as mild, moderate, or
severe. This classification helps the doctors in either increasing or decreasing the pace of the treatment.

Customer Identification
LDA is currently being applied in customer identification. With the help of LDA, we can easily identify and select the features that characterize the group of customers who are likely to purchase a specific product in a shopping mall. This can be helpful when we want to identify such a group of customers.

For Predictions
LDA can also be used for making predictions, and hence in decision making. For example, "will you buy this product?" gives a predicted result of one of two possible classes: buying or not buying.

In Learning
Nowadays, robots are being trained for learning and talking to simulate human work, and this can be treated as a classification problem. In this case, LDA builds similar groups on the basis of different parameters such as frequencies, sound, tunes, etc.


Difference between Linear Discriminant Analysis and PCA


Below are some basic differences between LDA and PCA:

PCA is an unsupervised algorithm that does not care about classes and labels and only aims to find the principal
components to maximize the variance in the given dataset. At the same time, LDA is a supervised algorithm that aims to
find the linear discriminants to represent the axes that maximize separation between different classes of data.

LDA is much more suitable for multi-class classification tasks than PCA. However, PCA is assumed to perform comparatively well for small sample sizes.

Both LDA and PCA are used as dimensionality reduction techniques, where PCA is first followed by LDA.
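A hedged sketch of this PCA-then-LDA combination, assuming scikit-learn and the digits dataset (illustrative choices): unsupervised variance-preserving reduction is followed by supervised class-separating reduction.

```python
# Illustrative "PCA first, then LDA" chain as a pipeline.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

pipe = Pipeline([
    ("pca", PCA(n_components=20)),                       # unsupervised
    ("lda", LinearDiscriminantAnalysis(n_components=2)), # supervised
])
X_2d = pipe.fit_transform(X, y)
print(X_2d.shape)  # (1797, 2)
```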

How to Prepare Data for LDA


Below are some suggestions that one should always consider while preparing the data to build the LDA model:

Classification Problems: LDA is mainly applied for classification problems to classify the categorical output variable. It is
suitable for both binary and multi-class classification problems.

Gaussian Distribution: The standard LDA model assumes a Gaussian distribution of the input variables. One should review the univariate distribution of each attribute and transform it into a more Gaussian-looking distribution, e.g., using log and root transforms for exponential distributions and Box-Cox for skewed distributions.

Remove Outliers: It is good to first remove outliers from the data, because they can skew the basic statistics used to separate classes in LDA, such as the mean and the standard deviation.

Same Variance: As LDA assumes that all input variables have the same variance, it is always better to standardize the data before fitting an LDA model, so that each variable has a mean of 0 and a standard deviation of 1.
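A minimal sketch of these preparation suggestions, assuming scikit-learn (an illustrative choice): a Box-Cox power transform makes skewed features more Gaussian-looking and standardizes them to zero mean and unit variance before LDA is fitted.

```python
# Illustrative preparation pipeline: Box-Cox requires strictly positive
# inputs (the exponential samples below are positive); PowerTransformer
# also standardizes to zero mean / unit variance by default, covering
# the "same variance" suggestion above.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(4)
# Skewed (exponential) features for two classes
X = np.vstack([rng.exponential(1.0, (50, 3)), rng.exponential(2.0, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

pipe = Pipeline([
    ("prep", PowerTransformer(method="box-cox")),
    ("lda", LinearDiscriminantAnalysis()),
])
print(f"training accuracy: {pipe.fit(X, y).score(X, y):.3f}")
```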

