
FEATURE SELECTION

Importance of Feature Selection

Feature selection helps the underlying Machine Learning (ML) technique:
• faster training;
• reduced overfitting;
• reduced complexity of the ML model;
• improved accuracy, if a proper feature subset is selected;
• improved interpretability.
Classification of Feature Selection Methods

Input: n points in d dimensions. Output: n points in m dimensions (m < d).
Group 1: Wrapper methods and Embedded methods (model-dependent).
Group 2: Filter methods (model-independent).
1. Filter method:
• Feature selection does not depend on the underlying ML model.
• Features are selected using various statistical measures/tests and their correlation with the outcome variable.
• Examples: Pearson's correlation, ρ(X, Y) = cov(X, Y) / (σX σY); Mutual Information, I(X; Y) (defined below).
2. Wrapper method:
• Selects relevant features based on the performance of an ML model.
• Forward Selection: start from no features; in each iteration add one feature, and continue as long as performance improves.
• Backward Elimination: start from all features; remove the least important feature, and continue as long as performance improves.
• Recursive Feature Elimination: uses a greedy algorithm for feature selection.
3. Embedded method:
• Uses a combination of the filter and wrapper ideas; selection happens inside the model training itself.
• Example: LASSO.
A sketch of all three families follows this list.
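A minimal sketch of the three families, assuming scikit-learn and its breast-cancer demo dataset; the choice of k = 5 selected features, the logistic-regression wrapper model, and the LASSO penalty α = 0.05 are illustrative assumptions, not prescribed by these notes:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)            # put all features on one scale

# 1. Filter: rank features by a statistical measure (here mutual information
#    with the target), independently of any downstream model.
filt = SelectKBest(mutual_info_classif, k=5).fit(X, y)
print("filter  :", filt.get_support(indices=True))

# 2. Wrapper: recursive feature elimination, driven by the fitted
#    logistic-regression model (its coefficient magnitudes).
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit(X, y)
print("wrapper :", np.where(rfe.support_)[0])

# 3. Embedded: the L1 penalty of LASSO drives irrelevant coefficients to
#    exactly zero, so selection happens inside the model fit itself.
lasso = Lasso(alpha=0.05).fit(X, y)
print("embedded:", np.where(lasso.coef_ != 0)[0])
```

The three calls typically agree on a core of strong features while differing at the margin, which is exactly the trade-off the taxonomy describes: filters are cheapest, wrappers track model performance most closely, and embedded methods sit in between.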
Mutual Information and Conditional Mutual Information

• Mutual Information (MI)
• Represents the statistical dependence between two variables.
• Reflects the ideas of relevance, redundancy, and complementarity.
• Conditional Mutual Information (CMI)
• Selecting features by CMI corresponds to maximizing the conditional likelihood of the target given the selected features.
• Example: a feature subset can be removed from the complete feature set provided it does not significantly increase the "information" about the target, given that the remaining features are already observed (i.e., its CMI with the target, conditioned on the rest, is negligible).
PRELIMINARIES

Feature space: X ⊆ R^d
Target space: Y
Dataset: D = {(x_i, y_i) | i ∈ {1, …, N}}, where (x_i, y_i) ~ p(X, Y) for all i
Entropy of X: H(X) = -Σ_x p(x) log p(x)
Mutual Information: I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
Equivalently, I(X; Y) = D_KL( p(X, Y) || p(X) p(Y) )
• MI is symmetric: I(X; Y) = I(Y; X)
• CMI satisfies the chain rule: I(X, Z; Y) = I(X; Y) + I(Z; Y | X)
Given a set of indices A with complement Ā, the CMI between Y and X_A given X_Ā is I(Y; X_A | X_Ā).
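A small numeric check of these definitions, and of the CMI-based removal criterion above. The toy distribution (X2 and Y are noisy 90% copies of X1, so Y depends on X1 only) and the plug-in entropy estimator are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.integers(0, 2, n)                        # the "real" signal
x2 = np.where(rng.random(n) < 0.9, x1, 1 - x1)    # noisy copy of x1
y  = np.where(rng.random(n) < 0.9, x1, 1 - x1)    # target driven only by x1

def entropy(*vs):
    """Plug-in estimate of the joint entropy H(vs) in bits."""
    _, counts = np.unique(np.stack(vs, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mi(a, b):
    # I(A; B) = H(A) + H(B) - H(A, B)
    return entropy(a) + entropy(b) - entropy(a, b)

# I(Y; X2 | X1) = H(Y, X1) + H(X2, X1) - H(Y, X2, X1) - H(X1)
cmi = entropy(y, x1) + entropy(x2, x1) - entropy(y, x2, x1) - entropy(x1)

print(f"I(Y; X2)      = {mi(y, x2):.3f} bits")    # clearly positive
print(f"I(Y; X2 | X1) = {cmi:.3f} bits")          # ~0: X2 is redundant given X1
```

X2 is marginally relevant (I(Y; X2) ≈ 0.32 bits) yet adds essentially nothing once X1 is observed (I(Y; X2 | X1) ≈ 0), so the CMI criterion would remove it, which is the redundancy argument made above.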
Example: feature selection with a Genetic Algorithm. Assume there are 8 features.
Encoding: an 8-bit binary chromosome, where bit j = 1 means feature f_j is selected.
f1 f2 f3 f4 f5 f6 f7 f8
1  0  0  1  1  0  1  0
This chromosome encodes the feature subset {f1, f4, f5, f7}.

Fitness/Objective:
• Classification Problem
• Using the feature set {f1, f4, f5, f7}, train any classifier (e.g., KNN, SVM, RF) and compute the percentage of correct classification (%CF) on the training data:
%CF = (# of correctly classified points × 100) / total training samples
• Select another feature set (chromosome) and use the same classifier to find its %CF.
• Iterate the process. After termination, the features encoded in the best chromosome in terms of %CF are the selected features.
• Clustering Problem
• Using the feature set {f1, f4, f5, f7}, execute any clustering algorithm (e.g., K-Means, FCM) and compute J, Jm, or any cluster validity index.
• Select another feature set and use the same method to compute the same index.
• Iterate the process. After termination, the features encoded in the best chromosome in terms of the index are the selected features.
A sketch of this evaluation loop, for both variants, follows below.
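A minimal sketch of the chromosome-evaluation loop, assuming scikit-learn; the wine demo dataset, KNN with k = 3, KMeans with 3 clusters, the silhouette index standing in for J/Jm, and the random search standing in for GA selection/crossover/mutation are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
d = X.shape[1]                       # here d = 13 features rather than 8

def pct_cf(chromosome):
    """Classification fitness: %CF of a KNN trained on the selected features."""
    mask = chromosome.astype(bool)
    if not mask.any():
        return 0.0                   # an empty feature set is invalid
    knn = KNeighborsClassifier(n_neighbors=3).fit(X[:, mask], y)
    correct = (knn.predict(X[:, mask]) == y).sum()
    return 100.0 * correct / len(y)  # %CF on the training data

def cluster_index(chromosome):
    """Clustering fitness: silhouette index of KMeans on the selected features
    (any cluster validity index could be plugged in here instead)."""
    mask = chromosome.astype(bool)
    if not mask.any():
        return -1.0
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, mask])
    return silhouette_score(X[:, mask], labels)

# Random search over chromosomes stands in for the full GA machinery.
rng = np.random.default_rng(0)
best, best_cf = None, -1.0
for _ in range(200):
    chrom = rng.integers(0, 2, size=d)
    cf = pct_cf(chrom)               # swap in cluster_index(chrom) for clustering
    if cf > best_cf:
        best, best_cf = chrom, cf

print(f"best %CF = {best_cf:.1f} using features {np.where(best)[0]}")
```

A real GA would replace the random-sampling loop with a population evolved by selection, crossover, and bit-flip mutation, but the fitness functions are exactly the ones described above: the chromosome is only ever touched as a feature mask.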
References:
• G. James, D. Witten, T. Hastie, and R. Tibshirani (2013). An Introduction to Statistical Learning. Springer.
• L. Yu and H. Liu (2003). "Feature selection for high-dimensional data: a fast correlation-based filter solution". Proc. 20th International Conference on Machine Learning: 856–863.
• I. S. Oh and B. R. Moon (2004). "Hybrid genetic algorithms for feature selection". IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11): 1424–1437.
• D. P. Muni, N. R. Pal, and J. Das (2006). "Genetic programming for simultaneous feature selection and classifier design". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 36(1): 106–117.
• S. Mallik, T. Bhadra, and U. Maulik (2017). "Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data". IEEE Transactions on Nanobioscience, 16(1): 3–10.
• S. Bandyopadhyay, T. Bhadra, and U. Maulik (2015). "Variable Weighted Maximal Relevance Minimal Redundancy Criterion for Feature Selection using Normalized Mutual Information". Journal of Multiple-Valued Logic and Soft Computing, 25(2–3): 189–213.
