
FEATURE SELECTION

Importance of Feature Selection

Feature selection helps the underlying Machine Learning (ML) technique:
• faster training;
• reduced overfitting;
• reduced complexity of the ML model;
• improved accuracy, if a proper feature subset is selected;
• improved interpretability.
Classification of Feature Selection Methods

Input: n points in d dimensions. Output: n points in m dimensions (m < d).
Group 1: Wrapper methods and Embedded methods (model-dependent).
Group 2: Filter methods (model-independent).
1. Filter method:
• Feature selection does not depend on the underlying ML model.
• Features are selected using various statistical measures/tests and their correlation with the outcome variable.
• Examples: Pearson's correlation, ρ(X, Y) = cov(X, Y) / (σX σY); Mutual Information, I(X; Y) (defined below).
2. Wrapper method:
• Selects relevant features based on the performance of an ML model.
• Forward Selection: start from no features; in each iteration add one feature, and continue as long as performance improves.
• Backward Elimination: start from all features; remove the least important feature, and continue as long as performance improves.
• Recursive Feature Elimination: uses a greedy algorithm for feature selection.
3. Embedded method:
• Uses a combination of the filter and wrapper ideas; selection happens inside the model training itself.
• Example: LASSO.
A sketch of all three families follows this list.
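A minimal sketch of the three families, assuming scikit-learn and its breast-cancer demo dataset; the choice of k = 5 selected features, the logistic-regression wrapper model, and the LASSO penalty α = 0.05 are illustrative assumptions, not prescribed by these notes:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)            # put all features on one scale

# 1. Filter: rank features by a statistical measure (here mutual information
#    with the target), independently of any downstream model.
filt = SelectKBest(mutual_info_classif, k=5).fit(X, y)
print("filter  :", filt.get_support(indices=True))

# 2. Wrapper: recursive feature elimination, driven by the fitted
#    logistic-regression model (its coefficient magnitudes).
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit(X, y)
print("wrapper :", np.where(rfe.support_)[0])

# 3. Embedded: the L1 penalty of LASSO drives irrelevant coefficients to
#    exactly zero, so selection happens inside the model fit itself.
lasso = Lasso(alpha=0.05).fit(X, y)
print("embedded:", np.where(lasso.coef_ != 0)[0])
```

The three calls typically agree on a core of strong features while differing at the margin, which is exactly the trade-off the taxonomy describes: filters are cheapest, wrappers track model performance most closely, and embedded methods sit in between.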
Mutual Information and Conditional Mutual Information

• Mutual Information (MI)
• Represents the statistical dependence between two variables.
• Reflects the ideas of relevance, redundancy, and complementarity.
• Conditional Mutual Information (CMI)
• Selecting features by CMI corresponds to maximizing the conditional likelihood of the target given the selected features.
• Example: a feature subset can be removed from the complete feature set provided it does not significantly increase the "information" about the target, given that the remaining features are already observed (i.e., its CMI with the target, conditioned on the rest, is negligible).
PRELIMINARIES

Feature space: X ⊆ R^d
Target space: Y
Dataset: D = {(x_i, y_i) | i ∈ {1, …, N}}, where (x_i, y_i) ~ p(X, Y) for all i
Entropy of X: H(X) = -Σ_x p(x) log p(x)
Mutual Information: I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
Equivalently, I(X; Y) = D_KL( p(X, Y) || p(X) p(Y) )
• MI is symmetric: I(X; Y) = I(Y; X)
• CMI satisfies the chain rule: I(X, Z; Y) = I(X; Y) + I(Z; Y | X)
Given a set of indices A with complement Ā, the CMI between Y and X_A given X_Ā is I(Y; X_A | X_Ā).
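A small numeric check of these definitions, and of the CMI-based removal criterion above. The toy distribution (X2 and Y are noisy 90% copies of X1, so Y depends on X1 only) and the plug-in entropy estimator are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.integers(0, 2, n)                        # the "real" signal
x2 = np.where(rng.random(n) < 0.9, x1, 1 - x1)    # noisy copy of x1
y  = np.where(rng.random(n) < 0.9, x1, 1 - x1)    # target driven only by x1

def entropy(*vs):
    """Plug-in estimate of the joint entropy H(vs) in bits."""
    _, counts = np.unique(np.stack(vs, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mi(a, b):
    # I(A; B) = H(A) + H(B) - H(A, B)
    return entropy(a) + entropy(b) - entropy(a, b)

# I(Y; X2 | X1) = H(Y, X1) + H(X2, X1) - H(Y, X2, X1) - H(X1)
cmi = entropy(y, x1) + entropy(x2, x1) - entropy(y, x2, x1) - entropy(x1)

print(f"I(Y; X2)      = {mi(y, x2):.3f} bits")    # clearly positive
print(f"I(Y; X2 | X1) = {cmi:.3f} bits")          # ~0: X2 is redundant given X1
```

X2 is marginally relevant (I(Y; X2) ≈ 0.32 bits) yet adds essentially nothing once X1 is observed (I(Y; X2 | X1) ≈ 0), so the CMI criterion would remove it, which is the redundancy argument made above.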
Example: feature selection with a Genetic Algorithm. Assume there are 8 features.
Encoding: an 8-bit binary chromosome, where bit j = 1 means feature f_j is selected.
f1 f2 f3 f4 f5 f6 f7 f8
1  0  0  1  1  0  1  0
This chromosome encodes the feature subset {f1, f4, f5, f7}.

Fitness/Objective:
• Classification Problem
• Using the feature set {f1, f4, f5, f7}, train any classifier (e.g., KNN, SVM, RF) and compute the percentage of correct classification (%CF) on the training data:
%CF = (# of correctly classified points × 100) / total training samples
• Select another feature set (chromosome) and use the same classifier to find its %CF.
• Iterate the process. After termination, the features encoded in the best chromosome in terms of %CF are the selected features.
• Clustering Problem
• Using the feature set {f1, f4, f5, f7}, execute any clustering algorithm (e.g., K-Means, FCM) and compute J, Jm, or any cluster validity index.
• Select another feature set and use the same method to compute the same index.
• Iterate the process. After termination, the features encoded in the best chromosome in terms of the index are the selected features.
A sketch of this evaluation loop, for both variants, follows below.
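A minimal sketch of the chromosome-evaluation loop, assuming scikit-learn; the wine demo dataset, KNN with k = 3, KMeans with 3 clusters, the silhouette index standing in for J/Jm, and the random search standing in for GA selection/crossover/mutation are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
d = X.shape[1]                       # here d = 13 features rather than 8

def pct_cf(chromosome):
    """Classification fitness: %CF of a KNN trained on the selected features."""
    mask = chromosome.astype(bool)
    if not mask.any():
        return 0.0                   # an empty feature set is invalid
    knn = KNeighborsClassifier(n_neighbors=3).fit(X[:, mask], y)
    correct = (knn.predict(X[:, mask]) == y).sum()
    return 100.0 * correct / len(y)  # %CF on the training data

def cluster_index(chromosome):
    """Clustering fitness: silhouette index of KMeans on the selected features
    (any cluster validity index could be plugged in here instead)."""
    mask = chromosome.astype(bool)
    if not mask.any():
        return -1.0
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, mask])
    return silhouette_score(X[:, mask], labels)

# Random search over chromosomes stands in for the full GA machinery.
rng = np.random.default_rng(0)
best, best_cf = None, -1.0
for _ in range(200):
    chrom = rng.integers(0, 2, size=d)
    cf = pct_cf(chrom)               # swap in cluster_index(chrom) for clustering
    if cf > best_cf:
        best, best_cf = chrom, cf

print(f"best %CF = {best_cf:.1f} using features {np.where(best)[0]}")
```

A real GA would replace the random-sampling loop with a population evolved by selection, crossover, and bit-flip mutation, but the fitness functions are exactly the ones described above: the chromosome is only ever touched as a feature mask.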
References:
• G. James, D. Witten, T. Hastie, and R. Tibshirani (2013). An Introduction to Statistical Learning. Springer.
• L. Yu and H. Liu (2003). "Feature selection for high-dimensional data: a fast correlation-based filter solution". Proc. 20th International Conference on Machine Learning: 856–863.
• I. S. Oh and B. R. Moon (2004). "Hybrid genetic algorithms for feature selection". IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11): 1424–1437.
• D. P. Muni, N. R. Pal, and J. Das (2006). "Genetic programming for simultaneous feature selection and classifier design". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 36(1): 106–117.
• S. Mallik, T. Bhadra, and U. Maulik (2017). "Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data". IEEE Transactions on Nanobioscience, 16(1): 3–10.
• S. Bandyopadhyay, T. Bhadra, and U. Maulik (2015). "Variable Weighted Maximal Relevance Minimal Redundancy Criterion for Feature Selection using Normalized Mutual Information". Journal of Multiple-Valued Logic and Soft Computing, 25(2–3): 189–213.
