AI Unit 4
Patterns are everywhere in the digital world. A pattern can either be observed physically or derived mathematically by applying algorithms. In Pattern Recognition, a pattern comprises the following two fundamental things:
• Collection of observations
• The concept behind the observation
Designing a recognition system also requires the ability to differentiate between good and bad features and an understanding of feature properties.
1. In a statistical-classification problem, a decision boundary is a hypersurface that partitions the underlying vector space into two sets. A decision boundary is the region of a problem space in which the output label of a classifier is ambiguous. A classifier is a hypothesis or discrete-valued function that is used to assign (categorical) class labels to particular data points.
2. A classifier is used to partition the feature space into class-labeled decision regions, while decision boundaries are the borders between decision regions (a small code illustration is given below).
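As a minimal illustration of these definitions (a sketch in plain Python/NumPy; the weight vector, bias, and sample points are invented purely for illustration), a linear scoring function assigns class labels to points in a 2-D feature space: the two sign regions of the score are the class-labeled decision regions, and the line where the score is zero is the decision boundary.

import numpy as np

# A simple linear classifier: predict class 1 if w.x + b > 0, otherwise class 0.
# The weights and bias below are arbitrary, chosen only for illustration.
w = np.array([1.5, -0.8])
b = -0.2

def classify(points):
    # The hyperplane w.x + b = 0 is the decision boundary; the two half-planes
    # on either side of it are the class-labeled decision regions.
    scores = points @ w + b
    return (scores > 0).astype(int)

# A few points in the 2-D feature space.
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 1.0], [2.0, 1.0]])
print(classify(points))  # -> [0 1 0 1]: the space is partitioned into two regions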
There are several basic principles and design considerations that are important in pattern recognition (a short code sketch after the list illustrates how several of them fit together):
i. Feature representation: The way in which the data is represented or encoded is critical for
the success of a pattern recognition system. It is important to choose features that are
relevant to the problem at hand and that capture the underlying structure of the data.
ii. Similarity measure: A similarity measure is used to compare the similarity between two
data points. Different similarity measures may be appropriate for different types of data
and for different problems.
iii. Model selection: There are many different types of models that can be used for pattern
recognition, including linear models, nonlinear models, and probabilistic models. It is
important to choose a model that is appropriate for the data and the problem at hand.
iv. Evaluation: It is important to evaluate the performance of a pattern recognition system
using appropriate metrics and datasets. This allows us to compare the performance of
different algorithms and models and to choose the best one for the problem at hand.
v. Preprocessing: Preprocessing is the process of preparing the data for analysis. This may
involve cleaning the data, scaling the data, or transforming the data in some way to make
it more suitable for analysis.
vi. Feature selection: Feature selection is the process of selecting a subset of the most
relevant features from the data. This can help to improve the performance of the pattern
recognition system and to reduce the complexity of the model.
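To make these design considerations concrete, the following rough sketch (Python, assuming scikit-learn is installed; the Iris dataset and all parameter choices are illustrative, not prescriptive) strings together preprocessing, feature selection, model selection, and evaluation in one pipeline.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                   # preprocessing
    ("select", SelectKBest(f_classif, k=2)),       # feature selection
    ("model", LogisticRegression(max_iter=200)),   # model selection: a linear model
])
pipeline.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, pipeline.predict(X_test)))  # evaluation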
Statistical Pattern Recognition
Statistical pattern recognition is the branch of statistics
that deals with the identification and classification of patterns in data. It is a type of supervised
learning, where the data is labeled with class labels that indicate which class a particular instance
belongs to. The goal of statistical pattern recognition is to learn a model that can accurately
classify new data instances based on their features.
The topic of machine learning known as statistical pattern recognition focuses on finding
patterns and regularities in data. It enables machines to gain knowledge from data, enhance
performance, and make choices based on what they have discovered. The goal of Statistical
Pattern Recognition is to find relationships between variables that can be used for prediction or
classification tasks. This article will explore the various techniques used in Statistical Pattern
Recognition and how these methods are applied to solve real-world problems.
The importance of pattern recognition lies in its ability to detect complex relations among
variables without explicit programming instructions. By using statistical models, machines can
identify regularities in data that would otherwise require manual labor or trial-and-error
experimentation by humans. In addition, machines can generalize from existing knowledge bases
to predict new outcomes more accurately than before.
Statistical Pattern Recognition is becoming increasingly important within many industries due to
its ability to automate certain processes as well as providing valuable insights into large datasets
that may otherwise remain hidden beneath the surface. With this article, we aim to provide an
overview of different techniques used for identifying patterns within data and explain how they
are employed in solving practical problems effectively.
In terms of applications, SPR has been successfully applied to problems like cursive handwriting
recognition and automated medical diagnosis. In the case of handwriting recognition, an algorithm works by extracting features with a feature extraction algorithm and then matching them against existing model parameters. The same principle applies when solving more complex tasks
such as image classification where deep learning may be employed instead of traditional
methods like discriminant analysis. Similarly, machine vision systems use SPR techniques to
identify objects within an image and classify them according to specific criteria. Furthermore,
modern robotics also utilizes SPR concepts to enable robots to recognize their environment better, the iRobot Roomba vacuum cleaner being one example.
Accuracy in recognition systems depends on the parameter vector which describes each item
being classified by its features. Probabilistic pattern classifiers are often used for this purpose
along with discriminant analysis techniques for feature extraction. Machine learning methods
also play an important role in statistical pattern recognition as they use ensemble learning
techniques to combine multiple models of machine learning together to increase accuracy.
In addition, when dealing with large datasets it is important to select appropriate features using
automated methods like PCA or feature extraction techniques which rely on statistical measures
like mean-shift clustering or histogram equalization. Furthermore, by using machine learning
techniques based on probabilistic graphical models such as Bayesian networks we can achieve
higher accuracy rates than traditional approaches due to their ability to capture complex
dependencies between variables. However, all of these methods require careful tuning so that the
model’s performance stays optimal over time. By understanding how different kinds of pattern
recognition systems work and how they interact with each other, it is possible to develop more
effective solutions for real-world problems.
Conclusion
Statistical pattern recognition is an important analytical tool in many fields. It has been used to
identify patterns in data and make predictions based on those patterns. The three components of
the pattern recognition process are feature extraction, classification, and decision making. These
processes allow for insights into complex datasets that may not be immediately apparent from
visual inspection alone.
In cognitive psychology, statistical pattern recognition can be used to understand how cognition
works. By analyzing patterns in behavior or other measures of cognitive functioning, researchers
can gain insight into the underlying processes involved in a person's thinking and problem-
solving abilities. This knowledge can then be applied to develop interventions designed to
improve these skills.
Pattern recognition is also considered a cognitive skill due to its ability to detect patterns and
plan ahead. This allows people to better anticipate future events or outcomes based on past
experience, allowing them to effectively plan their actions accordingly. Understanding this
concept is essential for any profession that requires analysis of complex data sets - such as
business analysts or engineers - as it gives them the tools necessary to accurately assess
information quickly and come up with solutions faster than ever before.
Statistical pattern recognition refers to the use of statistics to learn from examples. It means to collect observations, study and
digest them in order to infer general rules or concepts that can be applied to new, unseen observations. How should this be
done in an automatic way? What tools are needed?
Previous discussions on prior knowledge and Plato and Aristotle make clear that learning from observations is impossible
when context is missing. Context information should be available for learning to occur! Why? Because, otherwise, we don’t
know where to look or how to define meaningful patterns. Without reference to the context, there is no way to derive a
single general statement on the observations, because many of them are equally possible. Context is necessary to make a
choice.
There is a trade-off between how much knowledge we already have and the number of observations needed to gain some specific additional insight. It is shown in the figure on the right. If we know everything, no new observations are needed. The less we know in advance, the more examples are needed to reach a specific target. This all depends on how ambitious we are in reaching a
specific goal.
Discussions like this are very symbolic. Knowledge does not have a size that can
be measured. It can at most be ranked partially: from a specific starting point it
can grow. Information theory might be helpful when we accept that knowledge
can be expressed in bits of information. This always implies an uncertainty. Prior
knowledge however usually comes as certain. The expert cannot estimate how
convinced he is that he is right. The medical doctor who tells us what to measure where, or what are clear examples of a
disease cannot tell what the probability is that this is correct. Moreover, in daily life, we may be absolutely certain after a finite
number of observations: we meet somebody and within a second we are sure it is our neighbor. This does not fit in an
information theoretic approach.
This is again an aspect of the struggle to learn from examples. Prior knowledge might be wrong, but it is the foundation for new knowledge. For the time being we solve this dilemma by converting existing knowledge into facts that we are sure of, with unknowns between them that still have to be uncovered. In the so-called Bayesian approaches it is assumed that some prior probability for the unknowns is known, for sure (!?). The next step in any learning approach, Bayesian or not, is to bring
the known facts, the observations and the unknowns in the same framework, in some mathematical description, by which they
can be related. This is called representation.
On the basis of the representation the observations can be related. The pattern recognition task is to generalize these relations
to rules that should hold for new observations outside the set of the given ones. The entire process of representation and
generalization is illustrated by the below figure. There are some real world objects. The given knowledge is that they come in
two different, distinguishable classes, the red ones and the blue ones. It is also given that their size (area) and perimeter length
are important for distinguishing them. Examples are given. The rule to distinguish them is unknown. It should be found such that it can be applied to new observations for which the class is unknown and has to be estimated.
The objects are represented in a 2-dimensional vector space. Every object is represented as a point (a vector) in this space. It
appears that the training examples of the two classes are represented in almost separable regions in this space. A straight line
may serve as a decision boundary. It may be applied to new objects with unknown class membership to estimate the class
they belong to.
In this example some training objects are still classified incorrectly, so it is to be expected that the classification rule is not perfect and that it will make errors on future examples. The main target in the development of a pattern recognition system is to make this error as small as possible. Additional targets may be the minimization of the cost of learning and classification. To achieve these
targets we need to study:
1. representation: the way the objects are related, in this case by a 2-dimensional vector space.
2. generalization: the rules and concepts that can be derived from the given representation of the set of examples.
3. evaluation: accurate and trustworthy estimates of the performance of the system (a short code sketch combining these three steps follows below).
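A minimal sketch of these three steps in Python (NumPy and scikit-learn assumed; the red/blue objects and their area and perimeter values are made up for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Representation: each object becomes a point (area, perimeter) in a 2-D vector space.
rng = np.random.default_rng(0)
red = rng.normal(loc=[2.0, 6.0], scale=0.5, size=(50, 2))    # class 0 objects
blue = rng.normal(loc=[4.0, 9.0], scale=0.5, size=(50, 2))   # class 1 objects
X = np.vstack([red, blue])
y = np.array([0] * 50 + [1] * 50)

# Generalization: learn a straight-line decision boundary from the given examples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Evaluation: estimate how often the rule will go wrong on new, unseen objects.
print("estimated error:", 1 - clf.score(X_test, y_test))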
Principal Component Analysis
Principal Component Analysis is an unsupervised learning algorithm that is used for the
dimensionality reduction in machine learning. It is a statistical process that converts the
observations of correlated features into a set of linearly uncorrelated features with the help of
orthogonal transformation. These new transformed features are called the Principal Components.
It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique to draw out strong patterns from a given dataset by reducing its dimensionality.
PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data. PCA works by considering the variance of each attribute, because an attribute with high variance indicates a good split between the classes, and hence it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
Some common terms used in the PCA algorithm are given below (a short numerical illustration follows this list):
1. Dimensionality: The number of features or variables present in the given dataset, i.e., the number of columns in the data.
2. Correlation: It signifies how strongly two variables are related to each other; if one changes, the other variable also changes. The correlation value ranges from -1 to +1. Here, -1 occurs if the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.
3. Orthogonal: It means that the variables are not correlated to each other, and hence the correlation between a pair of variables is zero.
4. Eigenvectors: If there is a square matrix M and a non-zero vector v, then v is an eigenvector of M if Mv is a scalar multiple of v.
5. Covariance Matrix: A matrix containing the covariances between pairs of variables is called the covariance matrix.
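A small numerical illustration of these terms using NumPy (the data matrix is made up; np.corrcoef, np.cov, and np.linalg.eig are standard NumPy routines):

import numpy as np

# A tiny data matrix: 5 observations of 2 correlated variables.
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]])

# Correlation between the two variables (close to +1 here).
print("correlation:\n", np.corrcoef(X, rowvar=False))

# Covariance matrix of the variables.
C = np.cov(X, rowvar=False)

# Eigenvectors v and eigenvalues lam of C satisfy C @ v = lam * v.
lam, V = np.linalg.eig(C)
print("eigenvalues:", lam)
print("C v = lam v ?", np.allclose(C @ V[:, 0], lam[0] * V[:, 0]))

# Eigenvectors of a symmetric matrix are orthogonal: their dot product is ~0.
print("orthogonality check:", V[:, 0] @ V[:, 1])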
As described above, the transformed new features or the output of PCA are the Principal
Components. The number of these PCs is either equal to or less than the number of original features
present in the dataset. Some properties of these principal components are given below:
The principal component must be the linear combination of the original features.
These components are orthogonal, i.e., the correlation between a pair of variables is zero.
The importance of each component decreases when going from 1 to n; the 1st PC has the most importance, and the nth PC has the least importance.
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the
training set, and Y is the validation set.
Now we will represent our dataset in a structure, namely a two-dimensional matrix of the independent variable X. Here each row corresponds to a data item, and each column corresponds to a feature. The number of columns gives the dimensionality of the dataset.
In this step, we will standardize our dataset. In a particular column, features with high variance would otherwise be treated as more important than features with lower variance. If the importance of features should be independent of their variance, we divide each data item in a column by the standard deviation of the column. We will name the resulting matrix Z.
To calculate the covariance of Z, we take the matrix Z, transpose it, and multiply the transpose by Z. The output matrix is the covariance matrix of Z.
Now we need to calculate the eigenvalues and eigenvectors of the resulting covariance matrix. The eigenvectors of the covariance matrix are the directions of the axes with the most information (variance), and the corresponding eigenvalues give the amount of variance along those directions.
In this step, we will take all the eigenvalues and sort them in decreasing order, that is, from largest to smallest, and simultaneously sort the eigenvectors accordingly into a matrix of eigenvectors. The resulting sorted matrix is named P*.
Calculating the new features or Principal Components
Here we will calculate the new features. To do this, we multiply Z by the P* matrix, projecting the data onto the sorted eigenvectors. In the resulting matrix Z*, each observation is a linear combination of the original features, and the columns of Z* are uncorrelated with each other.
Once the new feature set is obtained, we decide what to keep and what to remove: only the relevant or important components are kept in the new dataset, and the unimportant ones are removed.
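The steps above can be written compactly in NumPy. This is a rough sketch under the convention that rows are observations and columns are features; the names Z, P_star, Z_star, and k simply mirror the text and are not from any library:

import numpy as np

def pca(X, k):
    # Standardization: center each column and divide by its standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Covariance of Z: transpose of Z multiplied by Z (scaled by n - 1).
    C = (Z.T @ Z) / (Z.shape[0] - 1)

    # Eigenvalues and eigenvectors of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Sort the eigenvectors by decreasing eigenvalue to obtain P*.
    order = np.argsort(eigvals)[::-1]
    P_star = eigvecs[:, order]

    # New features Z*: project Z onto the sorted eigenvectors and keep the first k.
    Z_star = Z @ P_star
    return Z_star[:, :k]

# Example usage on a small made-up dataset: reduce 3 features to 2 principal components.
X = np.array([[2.5, 2.4, 1.0], [0.5, 0.7, 0.2], [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9], [3.1, 3.0, 1.4], [2.3, 2.7, 1.0]])
print(pca(X, k=2))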
PCA can also be used for finding hidden patterns when data has high dimensions. Some fields where PCA is used are finance, data mining, psychology, etc.
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for classification tasks in
machine learning. It is a technique used to find a linear combination of features that best separates the
classes in a dataset.
• LDA works by projecting the data onto a lower-dimensional space that maximizes the separation
between the classes. It does this by finding a set of linear discriminants that maximize the ratio
of between-class variance to within-class variance. In other words, it finds the directions in the
feature space that best separate the different classes of data.
• LDA assumes that the data has a Gaussian distribution and that the covariance matrices of the
different classes are equal. It also assumes that the data is linearly separable, meaning that a
linear decision boundary can accurately classify the different classes.
• LDA has several advantages, including:
o It is a simple and computationally efficient algorithm.
o It can work well even when the number of features is much larger than the number of
training samples.
o It can handle multicollinearity (correlation between features) in the data.
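A quick usage sketch with scikit-learn's LinearDiscriminantAnalysis (scikit-learn is an assumed dependency and the toy data is invented):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two classes described by two features each (made-up numbers).
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],
              [6.0, 8.0], [6.5, 7.5], [7.0, 8.2]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)   # project the 2-D data onto a single discriminant axis
print(X_1d.ravel())              # the two classes are well separated on this axis
print(lda.predict([[2.0, 2.0], [6.0, 8.0]]))  # classify new points -> [0 1]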
For example, we have two classes and we need to separate them efficiently. Classes can have multiple
features. Using only a single feature to classify them may result in some overlapping as shown in the
below figure. So, we will keep on increasing the number of features for proper classification.
Example:
Suppose we have two sets of data points belonging to two different classes that we want to classify. As
shown in the given 2D graph, when the data points are plotted on the 2D plane, there’s no straight line
that can separate the two classes of the data points completely. Hence, in this case, LDA (Linear
Discriminant Analysis) is used which reduces the 2D graph into a 1D graph in order to maximize the
separability between the two classes.
Here, Linear Discriminant Analysis uses both the axes (X and Y) to create a new axis and projects data
onto a new axis in a way to maximize the separation of the two categories and hence, reducing the 2D
graph into a 1D graph.
In the above graph, it can be seen that a new axis (in red) is generated and plotted in the 2D graph such
that it maximizes the distance between the means of the two classes and minimizes the variation within
each class. In simple terms, this newly generated axis increases the separation between the data points
of the two classes. After generating this new axis using the above-mentioned criteria, all the data points
of the classes are plotted on this new axis and are shown in the figure given below.
But Linear Discriminant Analysis fails when the means of the distributions are shared, as it becomes impossible for LDA to find a new axis that makes both classes linearly separable. In such cases, we use non-linear discriminant analysis.
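The "new axis" described above can be computed directly from Fisher's criterion: maximize the distance between the projected class means while minimizing the within-class scatter, which leads to the standard direction w = Sw^-1 (m1 - m0). Below is a rough two-class sketch in NumPy; the data is invented and the helper name fisher_direction is ours, not a library function.

import numpy as np

def fisher_direction(X0, X1):
    # Class means.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix: sum of the two per-class scatter matrices.
    S0 = (X0 - m0).T @ (X0 - m0)
    S1 = (X1 - m1).T @ (X1 - m1)
    Sw = S0 + S1
    # Fisher's direction: w = Sw^-1 (m1 - m0), normalized to unit length.
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Made-up 2-D data for two classes.
rng = np.random.default_rng(1)
X0 = rng.normal([2.0, 2.0], 0.5, size=(40, 2))
X1 = rng.normal([4.0, 5.0], 0.5, size=(40, 2))

w = fisher_direction(X0, X1)
# Projecting onto w reduces the 2-D data to 1-D while keeping the classes apart.
print("projected class means:", (X0 @ w).mean(), (X1 @ w).mean())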
This can be used to project the features of higher dimensional space into lower-dimensional space in order to reduce resources
and dimensional costs. In this topic, "Linear Discriminant Analysis (LDA) in machine learning", we will discuss the LDA
algorithm for classification predictive modeling problems, limitation of logistic regression, representation of linear Discriminant
analysis model, how to make a prediction using LDA, how to prepare data for LDA, extensions to LDA and much more. So, let's
start with a quick introduction to Linear Discriminant Analysis (LDA) in machine learning.
Note: Before starting this topic, it is recommended to learn the basics of Logistic Regression algorithms and a basic
understanding of classification problems in machine learning as a prerequisite.
Whenever there is a requirement to separate two or more classes having multiple features efficiently, the Linear Discriminant
Analysis model is considered the most common technique to solve such classification problems. For example, if we have two classes with multiple features and need to separate them efficiently, classifying them using a single feature may result in overlapping.
To overcome the overlapping issue in the classification process, we must increase the number of features regularly.
Example:
Let's assume we have to classify two different classes having two sets of data points, as shown in the below image:
However, it is impossible to draw a straight line in the 2-D plane that can separate these data points efficiently. But using Linear Discriminant Analysis, we can reduce the 2-D plane to a 1-D line. Using this technique, we can also maximize the separability between multiple classes.
Let's consider an example where we have two classes in a 2-D plane having an X-Y axis, and we need to classify them efficiently. As we have already seen in the above example, LDA enables us to draw a straight line that can completely separate the two classes of data points. Here, LDA uses the X-Y axes to create a new axis by separating the classes with a straight line and projecting the data onto this new axis.
Hence, we can maximize the separation between these classes and reduce the 2-D plane into 1-D.
To create a new axis, Linear Discriminant Analysis uses the following criteria:
• Maximize the distance between the means of the two classes.
• Minimize the variation within each class.
Using the above two conditions, LDA generates a new axis in such a way that it maximizes the distance between the means of the two classes and minimizes the variation within each class.
In other words, we can say that the new axis will increase the separation between the data points of the two classes and plot them
onto the new axis.
Why LDA?
LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to extract useful data from different faces. Coupled
with eigenfaces, it produces effective results.
3. Regularized Discriminant Analysis (RDA): This introduces regularization into the estimate of the variance (actually covariance) and hence moderates the influence of different variables on LDA.
Face Recognition
Face recognition is a popular application of computer vision, where each face is represented as a combination of a large number of pixel values. In this case, LDA is used to reduce the number of features to a manageable number before going
through the classification process. It generates a new template in which each dimension consists of a linear combination of
pixel values. If a linear combination is generated using Fisher's linear discriminant, then it is called Fisher's face.
Medical
In the medical field, LDA has a great application in classifying patient disease on the basis of various parameters of patient health and the ongoing medical treatment. On such parameters, it classifies the disease as mild, moderate, or severe. This classification helps the doctors in either increasing or decreasing the pace of the treatment.
Customer Identification
LDA is also applied in customer identification: with its help, we can identify and select the features that specify the group of customers who are likely to purchase a specific product in a shopping mall.
For Predictions
LDA can also be used for making predictions and thus in decision making; for example, it can give a predicted result of either of two possible classes, such as buying or not buying.
In Learning
Nowadays, robots are being trained for learning and talking, which can be treated as a classification problem. In this case, LDA builds similar groups on the basis of parameters such as frequencies, sound, tunes, etc.
PCA is an unsupervised algorithm that does not care about classes and labels and only aims to find the principal
components to maximize the variance in the given dataset. At the same time, LDA is a supervised algorithm that aims to
find the linear discriminants to represent the axes that maximize separation between different classes of data.
LDA is much more suitable for multi-class classification tasks compared to PCA. However, PCA is considered to perform well for a comparatively small sample size.
Both LDA and PCA are used as dimensionality reduction techniques; when they are combined, PCA is applied first, followed by LDA (see the sketch below).
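A sketch of that combination using scikit-learn (the Iris data and the choice of three PCA components are arbitrary; this is only one reasonable way to chain the two techniques):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# PCA first (unsupervised, variance-driven), then LDA (supervised, class-separation-driven).
pipe = Pipeline([
    ("pca", PCA(n_components=3)),
    ("lda", LinearDiscriminantAnalysis()),
])
print("cross-validated accuracy:", cross_val_score(pipe, X, y, cv=5).mean())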
Classification Problems: LDA is mainly applied for classification problems to classify the categorical output variable. It is
suitable for both binary and multi-class classification problems.
Gaussian Distribution: The standard LDA model assumes a Gaussian distribution of the input variables. One should review the univariate distribution of each attribute and transform them into more Gaussian-looking distributions, for example using log and root transforms for exponential distributions and Box-Cox for skewed distributions.
Remove Outliers: It is good to first remove outliers from your data, as they can skew the basic statistics used to separate classes in LDA, such as the mean and the standard deviation.
Same Variance: As LDA always assumes that all the input variables have the same variance, it is good practice to standardize the data before fitting an LDA model, so that each variable has a mean of 0 and a standard deviation of 1.