Feature Selection
Classification of Feature Selection Methods
[Figure: taxonomy of feature selection methods, grouping filters, wrappers, and embedded methods]
1. Filter methods:
• Feature selection does not depend on the underlying ML model.
• Features are selected using statistical measures/tests and their correlation with the outcome variable.
• Example: Pearson's correlation: ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)
• Mutual information: I(X; Y) = Σ_{x,y} p(x, y) log [p(x, y) / (p(x) p(y))] (both criteria are sketched below)
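A minimal sketch of both filter criteria, assuming a labeled NumPy dataset; scikit-learn's mutual_info_classif scores the features, and the Iris data is only a stand-in for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.datasets import load_iris  # stand-in dataset for illustration

X, y = load_iris(return_X_y=True)

# Pearson's correlation of each feature with the outcome variable.
pearson_scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                           for j in range(X.shape[1])])

# Mutual information between each feature and the class labels;
# SelectKBest keeps the k highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("Pearson scores:", pearson_scores.round(3))
print("MI scores:", selector.scores_.round(3))
```

Note that neither score consults any downstream model: the ranking depends only on the data, which is what makes this a filter method.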
2. Wrapper methods:
• Select relevant features based on the performance of an ML model (a forward-selection sketch follows this list).
• Forward selection: start with no features; in each iteration, add the feature that improves performance the most, and stop when performance no longer improves.
• Backward elimination: start with all features; in each iteration, remove the least important feature, and stop when performance no longer improves.
• Recursive feature elimination (RFE): a greedy procedure that repeatedly trains the model and removes the weakest feature(s).
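A minimal sketch of forward selection, assuming a NumPy dataset; cross-validated accuracy of a KNN classifier stands in for "performance":

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris  # stand-in dataset for illustration

X, y = load_iris(return_X_y=True)

def score(features):
    # Cross-validated accuracy on the chosen feature subset.
    return cross_val_score(KNeighborsClassifier(), X[:, features], y, cv=5).mean()

selected, best = [], 0.0
remaining = list(range(X.shape[1]))
while remaining:
    # Try adding each remaining feature; keep the best single addition.
    trial = max(remaining, key=lambda f: score(selected + [f]))
    trial_score = score(selected + [trial])
    if trial_score <= best:  # stop when performance no longer improves
        break
    selected.append(trial)
    remaining.remove(trial)
    best = trial_score

print("Selected features:", selected, "CV accuracy:", round(best, 3))
```

Backward elimination is the mirror image: start from all column indices and greedily drop the feature whose removal hurts the score least.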
3. Embedded methods:
• Perform feature selection as part of model training, combining the qualities of filter and wrapper methods.
• Example: LASSO, whose L1 penalty shrinks some coefficients exactly to zero, discarding the corresponding features (sketched below).
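A minimal sketch of embedded selection with LASSO, assuming a NumPy regression dataset; the diabetes data and the alpha value are stand-ins for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import load_diabetes  # stand-in dataset for illustration

X, y = load_diabetes(return_X_y=True)

# The L1 penalty (controlled by alpha) drives uninformative
# coefficients exactly to zero during training.
lasso = Lasso(alpha=1.0).fit(X, y)

# Features with nonzero coefficients are the selected ones.
selected = np.flatnonzero(lasso.coef_)
print("Selected feature indices:", selected)
```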
Mutual Information and Conditional Mutual Information
Feature space: X ⊆ R^d
Target space: Y
Dataset: D = {(x_i, y_i) | i ∈ {1, …, N}}, where (x_i, y_i) ~ p(X, Y) for all i
Entropy of X: H(X) = - Σ_x p(x) log p(x)
Mutual information: I(X; Y) = H(X) - H(X | Y)
Conditional mutual information: I(X; Y | Z) = H(X | Z) - H(X | Y, Z)
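A minimal sketch of plug-in (empirical) estimates of these quantities from discrete samples, using the identities I(X; Y) = H(X) + H(Y) - H(X, Y) and I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z):

```python
import numpy as np

def entropy(*samples):
    # Empirical (joint) entropy of one or more discrete sample vectors.
    joint = np.stack(samples, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)
```

For example, mutual_information(np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1])) returns 1.0 bit, since either variable fully determines the other.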
Fitness/Objective:
• Classification problem:
• Using a feature subset, e.g. {f1, f4, f5, f7}, train any classifier (KNN, SVM, RF, …) and then compute the percentage of correct classification (%CF) on the training data:
%CF = (# of correctly classified points × 100) / total training samples
• Select another feature subset and use the same classifier to find its %CF.
• Iterate the process. After termination, the features encoded in the best chromosome in terms of %CF are the selected features.
• Clustering problem:
• Using a feature subset, e.g. {f1, f4, f5, f7}, execute any clustering algorithm (k-means, FCM, …) and compute J/J_m or any cluster validity index.
• Select another feature subset and use the same method to find the same index.
• Iterate the process. After termination, the features encoded in the best chromosome in terms of the index are the selected features. (Both fitness computations are sketched below.)
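A minimal sketch of both fitness computations, assuming each chromosome is a binary mask over the feature columns; KNN and k-means are stand-ins for the classifier and clustering algorithm:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

def classification_fitness(X, y, mask):
    # %CF on the training data for the feature subset encoded by `mask`.
    Xs = X[:, mask.astype(bool)]
    clf = KNeighborsClassifier(n_neighbors=3).fit(Xs, y)
    correct = np.sum(clf.predict(Xs) == y)
    return 100.0 * correct / len(y)

def clustering_fitness(X, mask, k=3):
    # k-means objective J on the selected features (lower is better).
    Xs = X[:, mask.astype(bool)]
    return KMeans(n_clusters=k, n_init=10).fit(Xs).inertia_
```

The search loop (a GA or any other metaheuristic) calls the appropriate fitness function on each candidate mask and keeps the best chromosome found.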