ML Unit 2 Part 2
Feature Engineering
Introduction
• Feature engineering is a critical preparatory process in machine learning.
• It is responsible for taking raw input data and converting it into well-aligned
features which are ready to be used by the machine learning models.
What is a feature?
A feature is an attribute of a data set that is used in a machine learning process.
Attributes which are meaningful to a machine learning problem are called features.
Selecting the subset of features which are meaningful for machine learning is a
sub-area of feature engineering that draws a lot of research interest.
The features in a data set are also called its dimensions.
A data set having ‘n’ features is called an n-dimensional data set.
● The below Figure has five attributes or features, namely Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width and Species. Out of these, the
feature ‘Species’ represents the class variable and the remaining features
are the predictor variables.
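As a minimal illustration of this split between the class variable and the predictor variables, the sketch below loads the Iris data set with scikit-learn; the renamed columns follow the figure's naming and are an assumption on my part:

```python
# A minimal sketch: the Iris data set has four predictor features
# and one class variable (Species). Assumes scikit-learn and pandas
# are available; columns are renamed to match the figure.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={
    "sepal length (cm)": "Sepal.Length",
    "sepal width (cm)": "Sepal.Width",
    "petal length (cm)": "Petal.Length",
    "petal width (cm)": "Petal.Width",
    "target": "Species",
})

X = df.drop(columns="Species")  # predictor variables
y = df["Species"]               # class variable
print(X.shape)                  # (150, 4) -- a 4-dimensional feature space
```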
Feature engineering
● Feature engineering refers to the process of translating a data set into
features such that these features are able to represent the data set more
effectively and result in a better learning performance.
● Let’s assume that there are three variables – science marks, maths marks
and grade, as shown in the Figure (see the sketch below).
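The figure itself is not reproduced here, but a common construction on such data (a hypothetical illustration, not necessarily the figure's exact example) is to derive a new feature, such as total marks, from the existing numeric variables:

```python
# A hypothetical sketch of feature construction: deriving a new
# feature (total marks) from science and maths marks. The data
# values here are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "science_marks": [78, 56, 90, 45],
    "maths_marks":   [82, 61, 88, 50],
    "grade":         ["A", "B", "A", "C"],
})

# Construct a feature that may represent the data more effectively.
df["total_marks"] = df["science_marks"] + df["maths_marks"]
print(df)
```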
Transforming numeric (continuous) features to
categorical features
● Sometimes there is a need to transform a continuous numerical variable
into a categorical variable, for example by binning marks into grades, as
sketched below.
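A minimal sketch of this transformation using pandas; the bin edges and grade labels are illustrative assumptions, not a fixed rule:

```python
# A minimal sketch: binning a continuous variable (marks) into a
# categorical variable (grade) with pandas. Bin edges and labels
# are illustrative assumptions.
import pandas as pd

marks = pd.Series([35, 48, 62, 71, 88, 95])

grade = pd.cut(
    marks,
    bins=[0, 40, 60, 80, 100],       # boundaries of the categories
    labels=["D", "C", "B", "A"],     # one label per bin
)
print(grade.tolist())                # ['D', 'C', 'B', 'B', 'A', 'A']
```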
Text-specific feature construction
● All machine learning models need numerical data as input, so the text data
in a data set needs to be transformed into numerical features.
● Text data is converted to a numerical representation through a process
known as vectorization.
● In this process, word occurrences in all documents belonging to the
corpus are consolidated in the form of a bag-of-words.
There are three major steps that are followed:
1. tokenize
2. count
3. normalize
● In order to tokenize a corpus (a collection of documents), the blank spaces
and punctuation marks are used as delimiters to separate out the words, or tokens.
● Then the number of occurrences of each token is counted for each
document.
● A matrix is then formed with each token representing a column and each
document of the corpus representing a row.
● Each cell contains the count of occurrences of the token in the
corresponding document.
● This matrix is known as a document-term matrix.
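A minimal sketch of these steps with scikit-learn's CountVectorizer, which tokenizes, counts, and builds the document-term matrix; the two example sentences are made up:

```python
# A minimal sketch: building a document-term matrix with
# scikit-learn. CountVectorizer tokenizes each document and counts
# token occurrences; rows are documents, columns are tokens.
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(corpus)     # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # the token columns
print(dtm.toarray())                       # counts per document
# For the 'normalize' step, TfidfVectorizer performs the same
# tokenize-and-count process and then applies tf-idf weighting.
```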
Feature extraction
In Principal Component Analysis (PCA), the original features are transformed into a
new set of features, the principal components, which satisfy the following:
1. The new features are distinct, i.e. the covariance between the new features (the
principal components) is 0.
2. The principal components are generated in order of the variability in the data that
they capture. Hence, the first principal component should capture the maximum
variability, the second principal component should capture the next highest
variability, etc.
3. The sum of variance of the new features, or the principal components, should be equal
to the sum of variance of the original features.
● PCA works based on a process called eigenvalue decomposition of the covariance
matrix of a data set.
● Below are the steps to be followed:
1. First, calculate the covariance matrix of the data set.
2. Then, calculate the eigenvalues of the covariance matrix.
3. The eigenvector having the highest eigenvalue represents the direction in
which there is the highest variance. This helps in identifying the first
principal component.
4. The eigenvector having the next highest eigenvalue represents the direction in
which the data has the highest remaining variance, and which is also orthogonal to
the first direction. This helps in identifying the second principal component.
5. In this way, identify the top ‘k’ eigenvectors having the top ‘k’ eigenvalues so as to
get the ‘k’ principal components.
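A minimal NumPy sketch of these steps on random data; it also checks properties 1 and 3 listed above. The data and the choice k = 2 are illustrative assumptions:

```python
# A minimal sketch of PCA via eigenvalue decomposition of the
# covariance matrix. The random data and k = 2 are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))       # 100 samples, 4 features
X = X - X.mean(axis=0)              # center the data

# Step 1: covariance matrix of the data set.
cov = np.cov(X, rowvar=False)

# Step 2: eigenvalues and eigenvectors (eigh suits symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)

# Steps 3-5: sort by decreasing eigenvalue and keep the top k.
order = np.argsort(eigvals)[::-1]
k = 2
W = eigvecs[:, order[:k]]           # top-k principal directions
Z = X @ W                           # data in the new feature space

# Property 1: covariance between principal components is ~0
# (off-diagonal entries of this matrix are near zero).
print(np.round(np.cov(Z, rowvar=False), 6))
# Property 3: total variance is preserved across all components.
print(np.isclose(eigvals.sum(), np.trace(cov)))
```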
Linear Discriminant Analysis
Linear discriminant analysis (LDA) is another commonly used feature extraction
technique like PCA or SVD.
The objective of LDA is similar in the sense that it intends to transform a data set
into a lower-dimensional feature space.
● However, unlike PCA, the focus of LDA is not to capture the data set's variability.
Instead, LDA focuses on class separability, i.e. transforming the features so that the
classes are well separated, so as to avoid over-fitting of the machine learning model.
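A minimal sketch using scikit-learn's LinearDiscriminantAnalysis on the Iris data from earlier; n_components=2 is an illustrative choice (for LDA it can be at most the number of classes minus one):

```python
# A minimal sketch: LDA as a feature extraction technique that
# projects data into a lower-dimensional space chosen for class
# separability. Unlike PCA, it uses the class labels.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)     # supervised: fit uses y

print(Z.shape)                  # (150, 2) -- 4 features reduced to 2
```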