unit-3

The document discusses dimensionality reduction techniques, including feature selection and feature extraction, to address the challenges posed by high-dimensional data, known as the curse of dimensionality. It highlights methods such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) for reducing complexity while retaining essential information. Additionally, it outlines the benefits and limitations of feature selection methods and compares PCA and LDA in terms of their applications and characteristics.


Contents


Dimensionality reduction: The curse of dimensionality

Principal component analysis

Feature selection

Discriminant analysis: Fisher linear discriminant, multiple discriminant analysis
Dimensionality Reduction
Dimensionality reduction is a technique used to reduce the number of features in a
dataset while retaining as much of the important information as possible.
Dimensionality reduction: The curse of dimensionality
• It is a process of transforming high-dimensional data into a lower-dimensional
space that still preserves the essence of the original data.
• In machine learning, high-dimensional data refers to data with a large number of
features or variables.
• Dimensionality reduction can help to mitigate the problems caused by high dimensionality (such as overfitting, data sparsity, and high computational cost) by reducing the complexity of the model and improving its generalization performance.
• There are two main approaches to dimensionality reduction: feature selection
and feature extraction.
1. Feature Selection:
• Feature selection involves selecting a subset of the original features that are most
relevant to the problem at hand.
• The goal is to reduce the dimensionality of the dataset while retaining the most
important features.
2. Feature Extraction:
• Feature extraction involves creating new features by combining or transforming
the original features.
• The goal is to create a set of features that captures the essence of the original
data in a lower-dimensional space.
Differences between Feature selection and Feature extraction

• Feature Selection selects a subset of relevant features from the original set of features; Feature Extraction extracts a new set of features that are more informative and compact.
• Feature Selection reduces the dimensionality of the feature space and simplifies the model; Feature Extraction captures the essential information from the original features and represents it in a lower-dimensional feature space.
• Feature Selection methods can be categorized into filter, wrapper, and embedded methods; Feature Extraction methods can be categorized into linear and nonlinear methods.
• Feature Selection may lose some information and introduce bias if the wrong features are selected; Feature Extraction may introduce some noise and redundancy if the extracted features are not informative.
Curse of Dimensionality
• Curse of Dimensionality refers to a set of problems that arise when working with
high-dimensional data.
• A dataset with a large number of attributes, generally of the order of a hundred or more, is referred to as high-dimensional data.
• The Curse of Dimensionality refers to the various challenges and complications
that arise when analyzing and organizing data in high-dimensional spaces (often
hundreds or thousands of dimensions).
• Dimensions refer to the features or attributes of data. For instance, in a dataset of houses, the dimensions could include the house's price, size, number of bedrooms, and location.
• The curse of dimensionality occurs mainly because adding more features or dimensions tends to increase the complexity of the data without necessarily increasing the amount of useful information.
• In high-dimensional spaces, most data points are at the "edges" or "corners,"
making the data sparse.
• The primary solution to the curse of dimensionality is "dimensionality reduction."
It's a process that reduces the number of random variables under consideration
by obtaining a set of principal variables.
• By reducing the dimensionality, we can retain the most important information in
the data while discarding the redundant or less important features.
• The difficulties related to training machine learning models on high-dimensional data are referred to as the 'Curse of Dimensionality'; the short sketch below illustrates how distances behave as the number of dimensions grows.
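To make the sparsity point above concrete, here is a small NumPy sketch on synthetic uniform data; the sample size of 500 and the dimensions tried are arbitrary illustrative choices, not values from the slides. As the number of dimensions grows, the nearest and farthest points from a query point become almost equally distant.

```python
import numpy as np

rng = np.random.default_rng(0)

# As dimensionality grows, the nearest and farthest points from a query point
# end up almost equally far away: the data becomes sparse and distance-based
# reasoning degrades.
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                      # 500 points in the unit hypercube
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from the first point
    print(f"d={d:4d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
```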
Domains of curse of dimensionality
1) Anomaly Detection
• Anomaly detection is used for finding unforeseen items or events in the dataset.
• In high-dimensional data, anomalies often exhibit a remarkable number of attributes that are irrelevant in nature, and certain objects occur more frequently in neighbor lists than others.
2) Combinations
• As the number of possible input combinations increases, the complexity grows rapidly, and the curse of dimensionality occurs.
How to Mitigate Curse of Dimensionality
• To mitigate the problems associated with high-dimensional data, a suite of techniques generally referred to as 'dimensionality reduction techniques' is used.
• Dimensionality reduction techniques fall into one of two categories: 'feature selection' or 'feature extraction'.
Feature selection techniques
• The attributes are tested for their worthiness and then selected or eliminated.
• In the Low Variance filter technique, the variance in the distribution of each attribute in the dataset is compared, and attributes with very low variance are eliminated.
• In the High Correlation filter technique, the pairwise correlation between attributes is determined; one attribute from each pair that shows very high correlation is eliminated and the other is retained. A small sketch of both filters follows below.
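Below is a minimal Python sketch of both filters, assuming scikit-learn and pandas are available; the toy columns ('x1', 'x2', 'constant', 'x3') and the thresholds (0.01 for variance, 0.95 for correlation) are illustrative choices, not values from the slides.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
# Toy dataset: "constant" has almost no variance, "x2" is nearly a copy of "x1".
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "constant": np.full(200, 5.0) + rng.normal(scale=1e-3, size=200),
    "x3": rng.normal(size=200),
})
df["x2"] = df["x1"] * 0.99 + rng.normal(scale=0.05, size=200)

# Low Variance filter: drop attributes whose variance falls below a threshold.
vt = VarianceThreshold(threshold=0.01)
vt.fit(df)
low_var_kept = df.columns[vt.get_support()]
print("kept after low-variance filter:", list(low_var_kept))

# High Correlation filter: drop one attribute from each highly correlated pair.
corr = df[low_var_kept].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
print("dropped by high-correlation filter:", to_drop)
```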
Feature extraction techniques
• The high dimensional attributes are combined in low dimensional components
(PCA or ICA) or factored into low dimensional factors (FA).
Principal Component Analysis (PCA)
• PCA is a dimensionality-reduction technique in which high-dimensional correlated data is transformed into a lower-dimensional set of uncorrelated components, referred to as principal components.
Factor Analysis (FA)
• Factor analysis is based on the assumption that all the observed attributes in a
dataset can be represented as a weighted linear combination of latent factors.
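As a rough illustration of both extraction styles, the following sketch assumes scikit-learn is available; the synthetic data driven by two latent factors, the sizes, and the noise level are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
# Toy high-dimensional data: 10 correlated attributes driven by 2 latent factors.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + rng.normal(scale=0.1, size=(200, 10))

# PCA: transform correlated attributes into uncorrelated principal components.
Z_pca = PCA(n_components=2).fit_transform(X)

# Factor analysis: model observed attributes as weighted linear combinations
# of a small number of latent factors.
Z_fa = FactorAnalysis(n_components=2).fit_transform(X)

print(Z_pca.shape, Z_fa.shape)   # (200, 2) (200, 2)
```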
Feature Selection Techniques
• Feature selection is a way of selecting the subset of the most relevant features
from the original features set by removing the redundant, irrelevant or noisy
features.
• Feature selection is one of the important concepts of machine learning, which
highly impacts the performance of the model.
• It is the process of selecting some attributes from a given collection of
prospective features and then discarding the rest of the attributes that were
considered
• A feature is an attribute that has an impact on a problem or is useful for the
problem, and choosing the important features for the model is known as feature
selection.
• Selecting the best features helps the model to perform well.
• Suppose we want to create a model that automatically decides which car should be crushed for spare parts.
• The dataset contains the model of the car, the year, the owner's name, and the mileage.
• The owner's name does not contribute to the model's performance, as it does not determine whether the car should be crushed or not.
• Remove the owner column and select the rest of the features (columns) for model building, as in the sketch below.
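A minimal pandas sketch of this manual selection step; the column names and values are hypothetical stand-ins for the car dataset described above.

```python
import pandas as pd

# Hypothetical toy dataset mirroring the car example above.
cars = pd.DataFrame({
    "model": ["Alto", "Swift", "i20"],
    "year": [2004, 2012, 2018],
    "owner_name": ["A", "B", "C"],   # irrelevant to the crush/keep decision
    "mileage_km": [210000, 90000, 30000],
})

# Feature selection by hand: drop the column that carries no predictive signal.
features = cars.drop(columns=["owner_name"])
print(features.columns.tolist())   # ['model', 'year', 'mileage_km']
```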
Benefits of Feature selection method
1) Reduce overfitting, as less redundant data means less chance of making decisions based on noise.
2) Improve accuracy by removing misleading and unimportant data
3) Reduce training time since data with fewer columns mean faster training
4) It helps in avoiding the curse of dimensionality.
5) It helps in the simplification of the model so that it can be easily interpreted by
the researchers.
Shortcomings of Feature Selection Method
1) Feature selection methods are hard to apply to high-dimensional data.
2) The more features that are present, the longer feature selection takes to complete.
3) There is a risk of overfitting when there are not enough observations.
The two types of Feature Selection techniques
• Supervised Feature Selection technique
Supervised Feature selection techniques consider the target variable and can be
used for the labeled dataset.
• Unsupervised Feature Selection technique
Unsupervised Feature selection techniques ignore the target variable and can be
used for the unlabeled dataset.
Three techniques under supervised feature Selection
1) Wrapper Methods
• In wrapper methodology, selection of features is done by considering it as a
search problem, in which different combinations are made, evaluated, and
compared with other combinations.
• It trains the learning algorithm iteratively using different subsets of features.
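As an illustration of a wrapper method, the sketch below uses scikit-learn's Recursive Feature Elimination (RFE) on the built-in breast cancer dataset; the choice of estimator and of keeping 5 features is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)   # scale so the linear model converges

# Wrapper method: RFE treats selection as a search, repeatedly fitting the
# estimator and discarding the weakest features until only 5 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X_scaled, y)
print(list(X.columns[rfe.support_]))
```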
2) Filter Methods
• In the Filter Method, features are selected on the basis of statistical measures.
• This method does not depend on the learning algorithm and chooses the features
as a pre-processing step.
• The filter method filters out irrelevant features and redundant columns from the model by ranking them with different metrics.
• Filter methods remove features that have low correlation with the target variable before training the final ML model.
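A minimal filter-method sketch with scikit-learn's SelectKBest, which ranks features by an ANOVA F-test against the target; the dataset and k = 5 are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Filter method: rank features by a statistical score (ANOVA F-test) computed
# against the target, independently of any learning algorithm.
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print(list(X.columns[selector.get_support()]))
```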
3) Embedded methods
• Embedded feature selection approaches incorporate feature selection into the machine learning algorithm as an integral component of the learning process.
• This allows for simultaneous classification and feature selection to take place
within the method.
• A few examples of common embedded approaches are the LASSO feature
selection algorithm, the random forest feature selection algorithm, and the
decision tree feature selection algorithm.
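A short embedded-method sketch using LASSO together with scikit-learn's SelectFromModel; the regularization strength alpha=0.01 is an arbitrary illustrative value.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_scaled = StandardScaler().fit_transform(X)

# Embedded method: LASSO learns the model and performs selection at the same
# time, since the L1 penalty drives the coefficients of weak features to zero.
lasso = Lasso(alpha=0.01).fit(X_scaled, y)
selected = SelectFromModel(lasso, prefit=True).get_support()
print(list(X.columns[selected]))
```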
PCA
• Karl Pearson was the first person to come up with this idea.
• PCA is based on the idea that when data from a higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in the lower-dimensional space should be as large as possible.
• Principal component analysis (PCA) is a way to get important variables from a
large set of variables in a data set.
• PCA is more useful when you have data with three or more dimensions.
Merits of Dimensionality Reduction
• It helps to compress data, which reduces the amount of space needed to store it
and the amount of time it takes to process it.
• If there are any redundant features, it also helps to get rid of them.
Limitations of Dimensionality Reduction
• Some information may be lost.
• PCA fails when the mean and covariance are not enough to describe a dataset.
Steps Involved in the PCA
Step 1: Standardize the dataset.
Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
Step 4: Sort eigenvalues and their corresponding eigenvectors.
Step 5: Pick the top k eigenvalues and form a matrix from their corresponding eigenvectors.
Step 6: Transform the original matrix.
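The six steps can be sketched directly in NumPy as follows; the synthetic 100 x 5 dataset and the choice of k = 2 components are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy dataset: 100 samples, 5 features
k = 2                                    # number of principal components to keep

# Step 1: standardize the dataset.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric

# Step 4: sort eigenvalues (and their eigenvectors) in decreasing order.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: pick the top-k eigenvectors to form the projection matrix.
W = eigvecs[:, :k]

# Step 6: transform the original (standardized) data into the new subspace.
X_pca = X_std @ W
print(X_pca.shape)                       # (100, 2)
```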
Linear Discriminant Analysis (LDA)
• Linear Discriminant Analysis is a linear classification method that is recommended when there are more than two classes.
• Linear Discriminant analysis is one of the most popular dimensionality reduction
techniques used for supervised classification problems in machine learning.
• To separate two or more classes having multiple features efficiently, the Linear
Discriminant Analysis model is considered the most common technique to solve
such classification problems.
• LDA is used to solve classification problems where the output variable is categorical.
• LDA can also be used in data pre-processing to reduce the number of features,
just as PCA, which reduces the computing cost significantly.
• LDA is also used in face detection algorithms
• LDA is used to extract useful data from different faces.
• LDA is used to minimize the number of features to a manageable number before
going through the classification process.
• Face recognition is the popular application of computer vision, where each face is
represented as the combination of a number of pixel values.
• LDA has a great application in classifying a patient's disease as mild, moderate, or severe on the basis of various parameters of the patient's health and the medical treatment that is under way. This classification helps the doctors in either increasing or decreasing the pace of the treatment.
• LDA can also be used for making predictions and hence in decision making. For example, "will you buy this product?" gives a predicted result in one of two possible classes: buying or not buying.
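A brief scikit-learn sketch showing LDA used both as a classifier for a categorical output and as supervised dimensionality reduction; the built-in Iris dataset and the train/test split are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LDA as a classifier for a categorical output variable (3 iris classes).
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
print("test accuracy:", lda.score(X_test, y_test))

# LDA as supervised dimensionality reduction: at most (n_classes - 1) = 2
# discriminant axes that best separate the classes.
X_2d = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_train, y_train)
print("reduced shape:", X_2d.shape)
```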
Differences between PCA and LDA
1) PCA is an unsupervised method; LDA is a supervised method.
2) PCA is commonly used when there are no class labels; LDA finds the linear combinations of features that best separate two or more classes.
3) PCA is used for data compression, noise reduction, and visualization; LDA is used for classification tasks and as preprocessing for algorithms like Logistic Regression, SVM, and Neural Networks.
4) PCA is sensitive to outliers; LDA is slightly less sensitive to outliers than PCA.
5) PCA is well-suited for general dimensionality reduction, data exploration, and noise reduction; LDA is ideal for classification-related preprocessing.
