Unit 2: Feature Engineering
Objectives
INTRODUCTION
Feature:
Feature engineering
1. Feature transformation
Feature construction
Example:
Data set: a real estate dataset with attributes apartment length, apartment breadth, and price of the apartment.
A more informative feature, apartment area = length × breadth, can be constructed from the two existing attributes.
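A minimal sketch of this kind of feature construction is shown below; the pandas column names (length, breadth, price) and the sample values are illustrative assumptions, not taken from an actual real estate dataset.

```python
import pandas as pd

# Hypothetical real estate records (column names and values are illustrative)
df = pd.DataFrame({
    "length":  [40, 55, 30],          # apartment length
    "breadth": [30, 40, 25],          # apartment breadth
    "price":   [90.0, 150.0, 55.0],   # price of the apartment
})

# Feature construction: build a new, more informative feature (area)
# out of the existing length and breadth attributes.
df["area"] = df["length"] * df["breadth"]
print(df)
```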
Situations where feature construction is an essential
activity:
1. Encoding categorical (nominal) variables
Athletes dataset:
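A minimal sketch of one-hot encoding such a nominal attribute, assuming a hypothetical athletes table with a country column (the column names and values are illustrative):

```python
import pandas as pd

# Hypothetical athletes data with a nominal (categorical) attribute
athletes = pd.DataFrame({
    "name":    ["A", "B", "C", "D"],
    "country": ["India", "USA", "India", "Kenya"],
})

# One-hot encoding: each category becomes a new 0/1 feature
encoded = pd.get_dummies(athletes, columns=["country"], prefix="country")
print(encoded)
```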
2. Transforming numeric (continuous) features to
categorical features
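A small sketch of converting a continuous feature into a categorical one by binning; the age values and bin boundaries are illustrative assumptions:

```python
import pandas as pd

ages = pd.Series([12, 25, 37, 58, 71], name="age")

# Transform the continuous feature into a categorical one by binning
age_group = pd.cut(
    ages,
    bins=[0, 18, 40, 60, 120],
    labels=["child", "young", "middle-aged", "senior"],
)
print(pd.concat([ages, age_group.rename("age_group")], axis=1))
```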
3. Text-specific feature construction
To construct text-specific features, the corpus is first tokenized: blank spaces and punctuation marks are used as delimiters to separate out the words, or tokens.
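A brief sketch of tokenizing a tiny corpus and turning the tokens into count features; the use of scikit-learn's CountVectorizer here is an assumption for illustration, not something prescribed by the slides:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog barks at the fox.",
]

# CountVectorizer tokenizes on whitespace/punctuation and builds
# one numeric count feature per token in the vocabulary.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
print(X.toarray())
```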
Definition:
Consider a dataset with a feature set F = (F1, F2, …, Fn).
After feature extraction using a mapping function f(F1, F2, …, Fn), we obtain a new set of features (Feat1, Feat2, …, Featm), each a combination of the original features. For example:
Feat1 = 0.3 × 34 + 0.9 × 34.5 = 41.25
Feat2 = 34.5 + 0.5 × 23 + 0.6 × 233 = 185.80
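The same computation expressed as a matrix product, with the weights chosen to reproduce the two numbers above (the weight matrix itself is an illustrative assumption):

```python
import numpy as np

# One record with original feature values matching the worked example above
x = np.array([34.0, 34.5, 23.0, 233.0])

# Each extracted feature is a linear combination of the original features.
W = np.array([
    [0.3, 0.9, 0.0, 0.0],   # Feat1 = 0.3*F1 + 0.9*F2
    [0.0, 1.0, 0.5, 0.6],   # Feat2 = F2 + 0.5*F3 + 0.6*F4
])

feats = W @ x
print(feats)   # [ 41.25 185.8 ]
```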
The most popular feature extraction algorithms used in
machine learning:
Principal Component Analysis (PCA)
Example:
If the height is more, generally the weight is also more, and vice versa, i.e. the two features are strongly correlated.
Basis vector:
A vector is a quantity having both magnitude and direction, and hence can determine the position of a point relative to another point in Euclidean space. Any vector v in the space can be written as a linear combination of a set of basis vectors:
v = a1·u1 + a2·u2 + … + an·un
where a represents the 'n' scalars and u represents the basis vectors.
Orthogonality of vectors in an n-dimensional vector space can be thought of as an extension of the vectors being perpendicular in a two-dimensional vector space, i.e. the vectors are completely unrelated or independent of each other.
The objective of PCA is to make the transformation in such a way that the new features are uncorrelated with each other (their pairwise covariance is zero) and the components, taken in order, capture as much of the variability of the data as possible.
X     Y
4     11
8      4
13     5
7     14
Total = 32    Total = 34
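A minimal PCA sketch on this small table, assuming the standard recipe (centre the data, take the covariance matrix, use its eigenvectors as principal components):

```python
import numpy as np

# Data from the table above (4 observations, 2 features X and Y)
X = np.array([
    [4, 11],
    [8, 4],
    [13, 5],
    [7, 14],
], dtype=float)

# Step 1: centre the data (feature means are 32/4 = 8 and 34/4 = 8.5)
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix of the centred data
C = np.cov(Xc, rowvar=False)

# Step 3: eigen-decomposition; eigenvectors are the principal components
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: project the data onto the principal components
scores = Xc @ eigvecs
print("eigenvalues:", eigvals)
print("projected data:\n", scores)
```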
Singular value decomposition
For a data set D, SVD reduces the dimensionality by projecting onto the first k right singular vectors:
D′ = D × [v1, v2, …, vk]

Worked example for
A = [  1  -1
      -2   2
       2   2 ]

Step 1: Find AᵀA.
AᵀA = [  9  -1
        -1   9 ]

Step 2: Find the eigenvalues of AᵀA.
The characteristic equation of AᵀA is |AᵀA − λI| = 0:
|  9−λ   −1  |
|  −1   9−λ  | = 0
which expands to λ² − (sum of the diagonal elements of AᵀA)·λ + det(AᵀA) = 0, i.e. λ² − 18λ + 80 = 0,
giving the eigenvalues λ1 = 10 and λ2 = 8.

Step 3: Find the eigenvectors by substituting each eigenvalue into the characteristic matrix and solving (AᵀA − λI)X = 0.
For λ1 = 10:
[ 9−10    −1  ]   [ −1  −1 ]
[  −1   9−10  ] = [ −1  −1 ]
so −x − y = 0, giving x = 1, y = −1 and X1 = [1, −1]ᵀ.
For λ2 = 8 the same steps give X2 = [1, 1]ᵀ.

Step 4: Normalize the eigenvectors.
v1 = (1/√2)[1, −1]ᵀ and v2 = (1/√2)[1, 1]ᵀ, so
V = (1/√2) [  1   1
             −1   1 ]

Step 5: The singular values are the square roots of the eigenvalues: σ1 = √10 and σ2 = √8 = 2√2.

Step 6: Find the left singular vectors from ui = A·vi / σi.
A·v1 = (1/√2)[2, −4, 0]ᵀ, so u1 = [1/√5, −2/√5, 0]ᵀ
A·v2 = (1/√2)[0, 0, 4]ᵀ, so u2 = [0, 0, 1]ᵀ
u3 is chosen as a unit vector orthogonal to u1 and u2: u3 = [2/√5, 1/√5, 0]ᵀ

Step 7:
Now combine u1, u2, u3 and write the complete U matrix; similarly assemble Σ and Vᵀ:
U = [  1/√5   0   2/√5
      −2/√5   0   1/√5
        0     1    0   ]
Σ = [ √10    0
       0    2√2
       0     0  ]
Vᵀ = (1/√2) [ 1  −1
              1   1 ]
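The hand calculation can be cross-checked with NumPy; note that numpy.linalg.svd may return singular vectors with the opposite sign, which is an equally valid decomposition:

```python
import numpy as np

# The matrix from the worked example above
A = np.array([
    [ 1, -1],
    [-2,  2],
    [ 2,  2],
], dtype=float)

# NumPy computes the same factorisation A = U * Sigma * V^T
U, s, Vt = np.linalg.svd(A)
print("singular values:", s)   # sqrt(10) and sqrt(8)
print("U:\n", U)
print("V^T:\n", Vt)

# Rebuild A from the factors to confirm the decomposition
Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
print("U @ Sigma @ V^T:\n", U @ Sigma @ Vt)
```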
Linear Discriminant Analysis
The between-class scatter matrix is computed as
SB = Σi Ni (mi − m)(mi − m)ᵀ
where mi is the sample mean for each class, m is the overall mean of the data set, and Ni is the sample size of each class.
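A small sketch of computing this between-class scatter matrix for a toy two-class dataset (the data values are illustrative):

```python
import numpy as np

# Toy data with two classes (values are illustrative)
X = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0],    # class 0
              [9.0, 10.0], [6.0, 8.0], [9.0, 5.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

m = X.mean(axis=0)                        # overall mean of the data set
S_B = np.zeros((X.shape[1], X.shape[1]))
for c in np.unique(y):
    Xc = X[y == c]
    m_i = Xc.mean(axis=0)                 # sample mean of class c
    N_i = Xc.shape[0]                     # sample size of class c
    d = (m_i - m).reshape(-1, 1)
    S_B += N_i * (d @ d.T)                # between-class scatter

print(S_B)
```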
4.3 FEATURE SUBSET SELECTION
Example:
The student weight data set has features such as Roll Number, Age, Height, and Weight. The Roll Number carries no information about the Weight and can safely be left out of the feature subset.
4.3.2 Key drivers of feature selection – feature
relevance and redundancy
Feature relevance:
In supervised learning, the input dataset, which is the training dataset, has a class label attached; a feature is relevant if it contributes information that helps in predicting that class label.
Feature Redundancy…
4.3.3 Measures of feature relevance and redundancy
1. Correlation-based measures
2. Distance-based measures, and
3. Other coefficient-based measures
1. Correlation-based similarity measure
Correlation is a measure of linear dependency between
two random variables.
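A minimal sketch of measuring that linear dependency with the Pearson correlation coefficient (the feature values are illustrative):

```python
import numpy as np

# Two hypothetical features measured on the same records
f1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
f2 = np.array([1.0, 3.0, 5.0, 9.0, 11.0])

# Pearson correlation: covariance divided by the product of the std deviations
r = np.corrcoef(f1, f2)[0, 1]
print(r)   # close to +1 -> strong linear dependency (redundant features)
```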
2. Distance-based similarity measure
The most common distance measure is the Euclidean distance, which, between two features F1 and F2, is calculated as
d(F1, F2) = √( Σi (F1i − F2i)² )
where the sum runs over the n rows of the data set.
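A short sketch of the calculation, with illustrative feature values:

```python
import numpy as np

# Values of two features over the same n records
F1 = np.array([2.0, 4.0, 6.0, 8.0])
F2 = np.array([1.0, 5.0, 7.0, 6.0])

# Euclidean distance: square root of the summed squared element-wise differences
d = np.sqrt(np.sum((F1 - F2) ** 2))
print(d)
```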
2. Distance between Binary Vectors
3. Other similarity measures ..
Jaccard index (Jaccard similarity coefficient):
J = n11 / (n01 + n10 + n11)
where
n11 = number of cases where both the features have value 1
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0
Jaccard distance d = 1 - J
3. Other similarity measures ..
Simple matching coefficient (SMC):
SMC = (n11 + n00) / (n00 + n01 + n10 + n11)
where
n11 = number of cases where both the features have value 1
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0
n00 = number of cases where both the features have value 0
Quite understandably, the total count of rows, n = n00 +
n01 + n10 + n11 . All values have been included in the
calculation of SMC.
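A small sketch computing the four counts and then both the Jaccard similarity and the SMC for two illustrative binary features:

```python
import numpy as np

# Two binary features over the same records
f1 = np.array([1, 1, 0, 0, 1, 0, 1])
f2 = np.array([1, 0, 0, 1, 1, 0, 0])

n11 = np.sum((f1 == 1) & (f2 == 1))
n10 = np.sum((f1 == 1) & (f2 == 0))
n01 = np.sum((f1 == 0) & (f2 == 1))
n00 = np.sum((f1 == 0) & (f2 == 0))

jaccard = n11 / (n11 + n01 + n10)              # ignores the 0-0 matches
smc = (n11 + n00) / (n00 + n01 + n10 + n11)    # uses all four counts

print("Jaccard similarity:", jaccard, " Jaccard distance:", 1 - jaccard)
print("Simple matching coefficient:", smc)
```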
Cosine similarity…
Cosine similarity measures the cosine of the angle between two feature vectors x and y:
cos(x, y) = (x · y) / (‖x‖ ‖y‖)
A value close to 1 means the two vectors point in nearly the same direction.
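A minimal sketch of the same computation (the vectors are illustrative):

```python
import numpy as np

x = np.array([3.0, 2.0, 0.0, 5.0])
y = np.array([1.0, 0.0, 0.0, 0.0])

# cos(x, y) = (x . y) / (||x|| * ||y||)
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_sim)
```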
4.3.4 Overall feature selection process
Feature selection is the process of selecting a subset of features in a data set. It consists of four steps:
1. Generation of possible feature subsets
2. Subset evaluation
3. Stopping the search, based on a stopping criterion
4. Validation of the result
Reasons for Feature selection
1. Simple model
2. Shorter training time
3. Avoid curse of dimensionality
4. Reduce overfitting
Validation
Feature selection approaches:
1. Filter approach
2. Wrapper approach
3. Hybrid approach
4. Embedded approach
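A hedged sketch contrasting the filter and wrapper approaches using scikit-learn's SelectKBest and RFE on the Iris data; this is one possible illustration, not the specific procedure from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter approach: score each feature independently of any learning model
filt = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print("filter scores:", filt.scores_)

# Wrapper approach: search feature subsets using a learning model
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("wrapper-selected mask:", rfe.support_)
```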
Filter Approach
Wrapper approach
Hybrid approach
Embedded approach