ML Unit 2 Part 2
Feature Engineering
Introduction
• Feature engineering is a critical preparatory process in machine learning.
• It is responsible for taking raw input data and converting it into well-aligned
features which are ready to be used by the machine learning models.
What is a feature?
A feature is an attribute of a data set that is used in a machine learning process.
Attributes which are meaningful to a machine learning problem are called features.
Selecting the subset of features which are meaningful for machine learning is a
sub-area of feature engineering that draws a lot of research interest.
The features in a data set are also called its dimensions.
A data set having ‘n’ features is called an n-dimensional data set.
● The below Figure has five attributes or features, namely Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width and Species. Out of these, the
feature ‘Species’ represents the class variable and the remaining features
are the predictor variables.
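As a minimal illustration of this split between the class variable and the predictor variables, the sketch below loads the Iris data set with scikit-learn; the renamed columns follow the figure's naming and are an assumption on my part:

```python
# A minimal sketch: the Iris data set has four predictor features
# and one class variable (Species). Assumes scikit-learn and pandas
# are available; columns are renamed to match the figure.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={
    "sepal length (cm)": "Sepal.Length",
    "sepal width (cm)": "Sepal.Width",
    "petal length (cm)": "Petal.Length",
    "petal width (cm)": "Petal.Width",
    "target": "Species",
})

X = df.drop(columns="Species")  # predictor variables
y = df["Species"]               # class variable
print(X.shape)                  # (150, 4) -- a 4-dimensional feature space
```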
Feature engineering
● Feature engineering refers to the process of translating a data set into
features such that these features are able to represent the data set more
effectively and result in a better learning performance.
● Let’s assume that there are three variables – science marks, maths marks
and grade, as shown in the Figure (see the sketch below).
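The figure itself is not reproduced here, but a common construction on such data (a hypothetical illustration, not necessarily the figure's exact example) is to derive a new feature, such as total marks, from the existing numeric variables:

```python
# A hypothetical sketch of feature construction: deriving a new
# feature (total marks) from science and maths marks. The data
# values here are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "science_marks": [78, 56, 90, 45],
    "maths_marks":   [82, 61, 88, 50],
    "grade":         ["A", "B", "A", "C"],
})

# Construct a feature that may represent the data more effectively.
df["total_marks"] = df["science_marks"] + df["maths_marks"]
print(df)
```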
Transforming numeric (continuous) features to
categorical features
● Sometimes there is a need to transform a continuous numerical variable
into a categorical variable, for example by binning marks into grades, as
sketched below.
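A minimal sketch of this transformation using pandas; the bin edges and grade labels are illustrative assumptions, not a fixed rule:

```python
# A minimal sketch: binning a continuous variable (marks) into a
# categorical variable (grade) with pandas. Bin edges and labels
# are illustrative assumptions.
import pandas as pd

marks = pd.Series([35, 48, 62, 71, 88, 95])

grade = pd.cut(
    marks,
    bins=[0, 40, 60, 80, 100],       # boundaries of the categories
    labels=["D", "C", "B", "A"],     # one label per bin
)
print(grade.tolist())                # ['D', 'C', 'B', 'B', 'A', 'A']
```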
Text-specific feature construction
● All machine learning models need numerical data as input, so the text data
in a data set needs to be transformed into numerical features.
● Text data is converted to a numerical representation through a process
known as vectorization.
● In this process, word occurrences in all documents belonging to the
corpus are consolidated in the form of a bag-of-words.
There are three major steps that are followed:
1. tokenize
2. count
3. normalize
● In order to tokenize a corpus (a collection of documents), the blank spaces
and punctuation marks are used as delimiters to separate out the words, or tokens.
● Then the number of occurrences of each token is counted for each
document.
● A matrix is then formed with each token representing a column and each
document of the corpus representing a row.
● Each cell contains the count of occurrences of the token in the
corresponding document.
● This matrix is known as a document-term matrix.
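A minimal sketch of these steps with scikit-learn's CountVectorizer, which tokenizes, counts, and builds the document-term matrix; the two example sentences are made up:

```python
# A minimal sketch: building a document-term matrix with
# scikit-learn. CountVectorizer tokenizes each document and counts
# token occurrences; rows are documents, columns are tokens.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(corpus)     # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # the token columns
print(dtm.toarray())                       # counts per document
# For the 'normalize' step, TfidfVectorizer performs the same
# tokenize-and-count process and then applies tf-idf weighting.
```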
Feature extraction
In Principal Component Analysis (PCA), the original features are transformed into a
new set of features, the principal components, which satisfy the following:
1. The new features are distinct, i.e. the covariance between the new features (the
principal components) is 0.
2. The principal components are generated in order of the variability in the data that
they capture. Hence, the first principal component should capture the maximum
variability, the second principal component should capture the next highest
variability, etc.
3. The sum of variance of the new features, or the principal components, should be equal
to the sum of variance of the original features.
● PCA works based on a process called eigenvalue decomposition of the covariance
matrix of a data set.
● Below are the steps to be followed:
1. First, calculate the covariance matrix of the data set.
2. Then, calculate the eigenvalues of the covariance matrix.
3. The eigenvector having the highest eigenvalue represents the direction in
which there is the highest variance. This helps in identifying the first
principal component.
4. The eigenvector having the next highest eigenvalue represents the direction in
which the data has the highest remaining variance, and which is also orthogonal to
the first direction. This helps in identifying the second principal component.
5. In this way, identify the top ‘k’ eigenvectors having the top ‘k’ eigenvalues so as to
get the ‘k’ principal components.
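A minimal NumPy sketch of these steps on random data; it also checks properties 1 and 3 listed above. The data and the choice k = 2 are illustrative assumptions:

```python
# A minimal sketch of PCA via eigenvalue decomposition of the
# covariance matrix. The random data and k = 2 are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))       # 100 samples, 4 features
X = X - X.mean(axis=0)              # center the data

# Step 1: covariance matrix of the data set.
cov = np.cov(X, rowvar=False)

# Step 2: eigenvalues and eigenvectors (eigh suits symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)

# Steps 3-5: sort by decreasing eigenvalue and keep the top k.
order = np.argsort(eigvals)[::-1]
k = 2
W = eigvecs[:, order[:k]]           # top-k principal directions
Z = X @ W                           # data in the new feature space

# Property 1: covariance between principal components is ~0
# (off-diagonal entries of this matrix are near zero).
print(np.round(np.cov(Z, rowvar=False), 6))
# Property 3: total variance is preserved across all components.
print(np.isclose(eigvals.sum(), np.trace(cov)))
```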
Linear Discriminant Analysis
Linear discriminant analysis (LDA) is another commonly used feature extraction
technique like PCA or SVD.
The objective of LDA is similar in the sense that it intends to transform a data set
into a lower-dimensional feature space.
● However, unlike PCA, the focus of LDA is not to capture the data set's variability.
Instead, LDA focuses on class separability, i.e. transforming the features so that the
classes are well separated, so as to avoid over-fitting of the machine learning model.
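A minimal sketch using scikit-learn's LinearDiscriminantAnalysis on the Iris data from earlier; n_components=2 is an illustrative choice (for LDA it can be at most the number of classes minus one):

```python
# A minimal sketch: LDA as a feature extraction technique that
# projects data into a lower-dimensional space chosen for class
# separability. Unlike PCA, it uses the class labels.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)     # supervised: fit uses y

print(Z.shape)                  # (150, 2) -- 4 features reduced to 2
```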