Feature Selection Techniques in Machine Learning
Dr. Shalini Gambhir
Feature selection methods can significantly enhance model performance, reduce overfitting, and improve interpretability.
There are three main categories of supervised feature selection methods: filter-based, wrapper-based, and embedded approaches.
• 1) Filter-based Methods
• Filter methods evaluate the intrinsic properties of features using univariate statistics, making them computationally efficient and independent of the machine learning algorithm (a code sketch follows the use cases below).
• Use Cases
• Used in regression models (e.g., linear regression with p-values).
• Applied in machine learning classification problems to reduce dimensionality.
• Useful in high-dimensional datasets where feature reduction is required.
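• A minimal filter-based sketch, assuming scikit-learn; the dataset, the ANOVA F-test scorer, and k=10 are illustrative assumptions rather than choices prescribed above:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Score each feature independently with a univariate statistic
# (here the ANOVA F-test) and keep the k highest-scoring features.
X, y = load_breast_cancer(return_X_y=True)          # illustrative dataset
selector = SelectKBest(score_func=f_classif, k=10)  # k=10 is an arbitrary cutoff
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)              # (569, 30) -> (569, 10)

Note that no learning algorithm is involved: the scores come from the statistic alone, which is what makes filter methods cheap and model-independent.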
2.3 Exhaustive Feature Selection
• What is Exhaustive Feature Selection?
• Exhaustive Feature Selection is a comprehensive feature selection technique that evaluates all possible combinations of features to determine the optimal subset. Unlike Forward or Backward Selection, which add or remove features iteratively, Exhaustive Selection tests every feature combination to find the best-performing one. For n features this means evaluating 2ⁿ − 1 non-empty subsets, so the method is practical only when n is small.
• How Exhaustive Feature Selection Works
1.Generate All Possible Feature Combinations
1. The algorithm considers every possible subset of features, from a single feature to
all available features.
2.Train and Evaluate a Model for Each Combination
1. A model is trained and tested for each possible feature subset.
2. A predefined metric (e.g., accuracy, R², AUC-ROC) is used to measure performance.
3.Select the Best Subset
1. The feature subset that yields the highest model performance is selected.
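• A minimal exhaustive-selection sketch, assuming scikit-learn; the wine dataset, the logistic-regression model, and the 6-feature cap are illustrative assumptions:

from itertools import combinations

import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
X = X[:, :6]  # cap at 6 features so only 2^6 - 1 = 63 subsets are scored

best_score, best_subset = -np.inf, None
for r in range(1, X.shape[1] + 1):                     # subset sizes 1..6
    for subset in combinations(range(X.shape[1]), r):  # every subset of size r
        model = LogisticRegression(max_iter=5000)
        score = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(f"Best subset {best_subset} with CV accuracy {best_score:.3f}")

Each additional feature doubles the number of subsets to score, which is why the feature cap matters here.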
• Use Cases
• Suitable for small datasets with a limited number of features.
• Used in high-stakes applications where model accuracy is critical, such as healthcare and finance.
• Helpful when computational resources are not a constraint.
2.4 Recursive Feature Elimination (RFE)
• Recursive Feature Elimination (RFE) is an iterative feature selection
technique that removes the least significant features step by step, refining
the model at each iteration until the best subset of features is selected.
• How RFE Works
1.Train a Model on All Features
1. A machine learning model (e.g., linear regression, SVM, decision tree) is trained on
the full feature set.
2.Rank Feature Importance
1. The model assigns importance scores to each feature.
3.Remove the Least Important Feature(s)
1. The feature with the lowest importance score is removed.
4.Repeat Until Desired Number of Features is Reached
1. The process continues recursively until the specified number of features remains.
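• A minimal RFE sketch, assuming scikit-learn; the linear-SVM estimator and n_features_to_select=10 are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)             # illustrative dataset
estimator = SVC(kernel="linear")                       # linear kernel exposes coef_ for ranking
rfe = RFE(estimator, n_features_to_select=10, step=1)  # drop one feature per round
rfe.fit(X, y)
print("Kept features:", [i for i, kept in enumerate(rfe.support_) if kept])
print("Ranking (1 = kept):", list(rfe.ranking_))

Swapping the estimator (e.g., a decision tree with feature_importances_) can change which features survive, illustrating the model-dependence noted below.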
• Disadvantages of RFE
• Computational Cost: More expensive than simple methods like Forward or Backward Selection.
• Feature Importance is Model-Dependent: The ranking depends on the choice of the model, which may lead
to different selections for different algorithms.
• Use Cases
• Commonly used in predictive modeling for selecting the most important features.
• Useful in scenarios where feature selection needs to be automated without exhaustive search.
• Works well for models like Support Vector Machines (SVM), Logistic Regression, and Decision Trees.
1.5 Autoencoder
• Autoencoders are neural networks that learn to compress and reconstruct data. The compressed (latent) representation can be used for feature selection.
• Benefits of PCA
• Reduces Dimensionality – Helps in handling large datasets.
• Removes Redundancy – Eliminates correlated features.
• Improves Computation Speed – Useful in ML models.
• Enhances Visualization – Converts high-dimensional data into 2D or 3D for better interpretation (see the sketch below).
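• A minimal PCA sketch, assuming scikit-learn; the dataset and the 2-component projection are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=2)               # project 30 features to 2D for visualization
X_2d = pca.fit_transform(X)
print(X_2d.shape)                       # (569, 2)
print(pca.explained_variance_ratio_)    # variance captured by each component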
• Benefits of ICA
• Separates Mixed Signals – Used in noise removal and feature extraction.
• Enhances Data Interpretability – Useful in medical and financial applications.
• Removes Redundant Information – Makes data analysis more efficient.
• ICA is widely used in speech processing, EEG signal analysis, and financial data modeling (see the sketch below).
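• A minimal ICA sketch, assuming scikit-learn; the two synthetic sources and the mixing matrix are illustrative assumptions:

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # sine + square-wave sources
S += 0.05 * rng.standard_normal(S.shape)          # small observation noise
A = np.array([[1.0, 0.5], [0.5, 2.0]])            # mixing matrix
X = S @ A.T                                       # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                      # recovered independent sources
print(S_est.shape)                                # (2000, 2)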
• Benefits of NMF
• Enhances Interpretability – Outputs are easily understandable.
• Sparse Representations – Captures essential features with reduced redundancy.
• Used in Various Applications – Topic modeling, image processing, bioinformatics.
• NMF is widely used in recommender systems, text mining, and signal processing (see the sketch below).
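• A minimal NMF topic-modeling sketch, assuming scikit-learn; the four-document corpus and the choice of 2 topics are illustrative assumptions:

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "stocks and bonds rallied in the market",
    "the market saw stocks fall on rate fears",
    "patients received a new treatment in the trial",
    "the clinical trial tested treatment safety",
]
X = TfidfVectorizer().fit_transform(docs)  # non-negative document-term matrix
nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)   # document-topic weights
H = nmf.components_        # topic-term weights; sparse and interpretable
print(W.shape, H.shape)    # (4, 2) and (2, number_of_terms)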
• Architecture of an Autoencoder (a code sketch follows this list)
1. Input Layer → Takes the raw data (e.g., images, text).
2. Encoder → Maps input to a lower-dimensional representation.
3. Bottleneck (Latent Space) → Captures the most important features.
4. Decoder → Reconstructs the original data from the latent space.
5. Output Layer → Produces a reconstructed version of the input.
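• A minimal autoencoder sketch, assuming TensorFlow/Keras is available; the dataset, the 8-unit bottleneck, and the training settings are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras

X, _ = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)      # scale to [0, 1] for reconstruction

inputs = keras.Input(shape=(X.shape[1],))
latent = keras.layers.Dense(8, activation="relu")(inputs)               # encoder → bottleneck
outputs = keras.layers.Dense(X.shape[1], activation="sigmoid")(latent)  # decoder

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=50, batch_size=32, verbose=0)  # target equals input

encoder = keras.Model(inputs, latent)        # encoder alone yields the compressed features
X_compressed = encoder.predict(X, verbose=0)
print(X.shape, "->", X_compressed.shape)     # (569, 30) -> (569, 8)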
• Applications of Autoencoders
• Image Denoising & Compression
• Anomaly Detection (Fraud, Medical Imaging)
• Data Generation (Variational Autoencoders - VAEs)