
Unit-2 Feature Engineering
Introduction
• Feature engineering is a critical preparatory process in machine learning.
• It is responsible for taking raw input data and converting it into well-aligned
features which are ready to be used by machine learning models.

What is a feature?
A feature is an attribute of a data set that is used in a machine learning process.
Attributes which are meaningful to a machine learning problem are called features.
Selection of the subset of features which are meaningful for machine learning is
a sub-area of feature engineering which draws a lot of research interest.
The features in a data set are also called its dimensions.
A data set having ‘n’ features is called an n-dimensional data set.
● The figure below has five attributes or features, namely Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width and Species. Out of these, the
feature ‘Species’ represents the class variable and the remaining features
are the predictor variables.
Feature engineering
● Feature engineering refers to the process of translating a data set into
features such that these features are able to represent the data set more
effectively and result in a better learning performance.

It has two major elements:
1. Feature transformation
2. Feature subset selection
Feature transformation
It transforms the data, structured or unstructured, into a new set of features
which can represent the underlying problem that machine learning is trying
to solve.
There are two variants of feature transformation:
1. Feature construction: the feature construction process discovers missing
information about the relationships between features and augments the feature
space by creating additional features.
2. Feature extraction: the process of extracting or creating a new set of
features from the original set of features using some functional mapping.

Both are sometimes known as feature discovery.


Feature subset selection (or simply feature selection)

● Objective of feature selection is to derive a subset of features from the full
feature set which is most meaningful in the context of a specific machine
learning problem.
FEATURE TRANSFORMATION
● Feature transformation is used as an effective tool for dimensionality
reduction and hence for boosting learning model performance.

● Broadly, there are two distinct goals of feature transformation:

1. Achieving the best reconstruction of the original features in the data set
2. Achieving the highest efficiency in the learning task


Feature construction
● Feature construction involves transforming a given set of input features to
generate a new set of more powerful features.
● There are certain situations where feature construction is an essential
activity before we can start with the machine learning task.
These situations are:
1. when features have categorical values and the machine learning algorithm needs
numeric value inputs
2. when features have numeric (continuous) values and need to be
converted to ordinal values
3. when text-specific feature construction needs to be done
Encoding categorical (nominal) variables
● Let’s take the example of another data set on athletes, as presented in the figure. The data
set has the features age, city of origin, parents athlete (i.e. indicating whether any one
of the parents was an athlete) and chance of win.
● The feature chance of win is the class variable while the others are predictor
variables.
● Suppose a classification algorithm (like kNN) or a regression algorithm requires
numerical inputs to learn from.
● In this case, feature construction can be used to create new dummy features
which are usable by machine learning algorithms, as sketched below.
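A minimal sketch of how dummy features can be constructed for nominal variables with pandas; the column names and values below are illustrative assumptions, not taken from the original figure.

```python
import pandas as pd

# Hypothetical athletes data set; the column names and values are assumptions.
df = pd.DataFrame({
    "City of origin": ["City A", "City B", "City A", "City C"],
    "Parents athlete": ["Yes", "No", "Yes", "No"],
    "Chance of win": ["Yes", "No", "Yes", "Yes"],
})

# One-hot encode the nominal predictor variables into numeric dummy features.
dummies = pd.get_dummies(df[["City of origin", "Parents athlete"]], dtype=int)
print(dummies.head())
```

Each nominal value becomes its own 0/1 dummy column, which algorithms such as kNN or regression can consume directly.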
Encoding categorical (ordinal) variables

● Let’s assume that there are three variables: science marks, maths marks
and grade, as shown in the figure (an encoding sketch follows below).
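A minimal sketch of ordinal encoding, assuming a grade feature with ordered levels such as A, B and C (the exact levels and marks below are assumptions):

```python
import pandas as pd

# Hypothetical data set; the grade levels and their order are assumptions.
df = pd.DataFrame({
    "Science marks": [78, 56, 91, 64],
    "Maths marks": [82, 60, 88, 70],
    "Grade": ["B", "C", "A", "B"],
})

# Map each ordinal category to an integer that preserves its order.
grade_order = {"C": 1, "B": 2, "A": 3}
df["Grade_encoded"] = df["Grade"].map(grade_order)
print(df)
```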
Transforming numeric (continuous) features to
categorical features
● Sometimes there is a need to transform a continuous numerical variable
into a categorical variable, as sketched below.
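A minimal sketch of binning a continuous variable into categories with pandas; the bin edges and labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical continuous feature (e.g. marks out of 100).
marks = pd.Series([35, 48, 62, 71, 88, 93])

# Discretize the continuous values into ordered categorical bins.
grades = pd.cut(marks, bins=[0, 50, 70, 100], labels=["low", "medium", "high"])
print(grades)
```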
text-specific feature construction

● All machine learning models need numerical data as input, so the text data
in the data sets needs to be transformed into numerical features.
● Text data is converted to a numerical representation by following a process
known as vectorization.
● In this process, word occurrences in all documents belonging to the
corpus are consolidated in the form of a bag-of-words.
There are three major steps that are followed:
1. tokenize
2. count
3. normalize
text-specific feature construction

● In order to tokenize a corpus, the blank spaces and punctuation marks are used
as delimiters to separate out the words, or tokens.
● Then the number of occurrences of each token is counted for each
document.
● A matrix is then formed with each token representing a column and a
specific document of the corpus (the collection of documents)
representing each row.
● Each cell contains the count of occurrences of the token in a specific
document.
● This matrix is known as a document-term matrix (a minimal sketch of the process follows below).
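A minimal sketch of the tokenize-and-count steps using scikit-learn's CountVectorizer; the three documents below form a made-up corpus for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# A tiny illustrative corpus; the documents are made up for the example.
corpus = [
    "machine learning needs numeric features",
    "features are built from raw data",
    "raw text needs vectorization",
]

# Tokenize and count: each row is a document, each column a token.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
print(dtm.toarray())  # the document-term matrix
```

Normalization (for example TF-IDF weighting) can then be applied to the raw counts as the third step.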
Feature extraction

● In feature extraction, new features are created from a combination of
original features.
● Some of the commonly used operators for combining the original
features include the following (a small sketch follows after this list):
● 1. For Boolean features: Conjunctions, Disjunctions, Negation, etc.
● 2. For nominal features: Cartesian product, M of N, etc.
● 3. For numerical features: Min, Max, Addition, Subtraction,
Multiplication, Division, Average, Equivalence, Inequality, etc.
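A minimal sketch of combining two hypothetical numerical features with some of the operators listed above:

```python
import numpy as np

# Two hypothetical numerical features.
x1 = np.array([2.0, 5.0, 3.0, 8.0])
x2 = np.array([4.0, 1.0, 6.0, 2.0])

# New features built by combining the originals.
f_sum = x1 + x2              # addition
f_ratio = x1 / x2            # division
f_max = np.maximum(x1, x2)   # element-wise max
f_equal = (x1 == x2)         # equivalence (Boolean feature)
```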
Principal Component Analysis
● Main guiding philosophy of principal component analysis (PCA) is
feature extraction.
● A new set of features is extracted from the original features; the new features are
quite dissimilar in nature to the original ones.
● So an n-dimensional feature space gets transformed into an m-dimensional
feature space.
● To understand PCA, we need to take a brief deep dive into the vector space concept in
linear algebra.
● A vector is a quantity having both magnitude and direction, and hence it can
determine the position of a point relative to another point in
Euclidean space.
● A vector space is a set of vectors. Vector spaces have the property that every vector
in them can be represented as a linear combination of a smaller set of vectors,
called basis vectors.
● Any vector ‘v’ in a vector space can be represented as
v = a1·u1 + a2·u2 + … + an·un
where a1, …, an represent the ‘n’ scalars and u1, …, un represent the basis vectors.
● Basis vectors are orthogonal to each other.
● Two orthogonal vectors are completely unrelated or independent of each
other.
● Principal Component Analysis (PCA) is used to reduce the dimensionality
of a data set by finding a new set of variables, smaller than the original set
of variables, retaining most of the sample’s information, and useful for the
regression and classification of data.
● The objective of PCA is to make the transformation in such a way that

1. The new features are distinct, i.e. the covariance between the new features (the
principal components) is 0.
2. The principal components are generated in order of the variability in the data that they
capture. Hence, the first principal component should capture the maximum
variability, the second principal component should capture the next highest
variability, and so on.
3. The sum of the variances of the new features, or principal components, should be equal
to the sum of the variances of the original features.
● PCA works based on a process called eigenvalue decomposition of the covariance
matrix of a data set.
● Below are the steps to be followed (a minimal sketch follows after the list):
● 1. First, calculate the covariance matrix of the data set.
● 2. Then, calculate the eigenvalues of the covariance matrix.
● 3. The eigenvector having the highest eigenvalue represents the direction in
which there is the highest variance. So this helps in identifying the first
principal component.
● 4. The eigenvector having the next highest eigenvalue represents the direction in
which the data has the highest remaining variance and is also orthogonal to the first
direction. So this helps in identifying the second principal component.
● 5. Like this, identify the top ‘k’ eigenvectors having the top ‘k’ eigenvalues so as to
get the ‘k’ principal components.
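A minimal sketch of the eigenvalue-decomposition steps above using NumPy; the data set is randomly generated for illustration and ‘k’ is assumed to be 2.

```python
import numpy as np

# Hypothetical data set: 100 samples with 5 original features.
X = np.random.rand(100, 5)

# Step 1: covariance matrix of the mean-centered data.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Step 2: eigenvalues and eigenvectors of the covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Steps 3-5: sort by decreasing eigenvalue and keep the top k eigenvectors.
order = np.argsort(eigenvalues)[::-1]
k = 2
components = eigenvectors[:, order[:k]]

# Project the data onto the k principal components.
X_reduced = X_centered @ components
print(X_reduced.shape)  # (100, 2)
```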
Linear Discriminant Analysis
Linear discriminant analysis (LDA) is another commonly used feature extraction
technique like PCA or SVD.
The objective of LDA is similar in the sense that it intends to transform a data set
into a lower dimensional feature space.
● However, unlike PCA, the focus of LDA is not to capture the data set variability.
Instead, LDA focuses on class separability, i.e. it transforms the features so that the
classes are well separated, which helps avoid over-fitting of the machine learning model.
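A minimal sketch of LDA for feature extraction using scikit-learn on the Iris data set mentioned earlier; reducing to 2 components is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# The Iris data set: 4 predictor features and a class variable (species).
X, y = load_iris(return_X_y=True)

# LDA uses the class labels to find directions that maximize class separability.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)  # (150, 2)
```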
