Feature engineering

Feature selection is a machine learning technique aimed at identifying the most relevant features from a larger set to enhance model performance and interpretability. It can be approached through filter methods, which rank features based on statistical scores, or wrapper methods, which evaluate feature subsets based on model performance. Additionally, embedded methods integrate feature selection into the model training process, with techniques like Recursive Feature Elimination and Genetic Algorithms being commonly used.


Feature selection is a technique in machine learning that involves choosing a subset of the most relevant features from a larger set of features to use in building a model. The goal is to select the features that have the highest impact on the target variable and remove the redundant, irrelevant, or noisy features, which can improve the performance of the model, reduce overfitting, and make the model easier to interpret.

There are two main approaches to feature selection:

1. Filter Methods: These methods rank the features based on a statistical score and select the
top-k features. Examples of filter methods include Pearson's correlation, mutual
information, and chi-squared test.
2. Wrapper Methods: These methods use the model performance as the evaluation criterion
to choose the best features. Examples of wrapper methods include Recursive Feature
Elimination (RFE), which removes the least important feature at each iteration, and
genetic algorithms.

In addition, there are also embedded methods, which incorporate feature selection as a part of the
model training process. Examples include decision trees, regularization methods such as Lasso
and Ridge Regression, and deep learning models with Dropout.
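
As a concrete illustration of the embedded approach, here is a minimal sketch using scikit-learn's SelectFromModel with a Lasso estimator; the synthetic dataset and the alpha value are illustrative assumptions, not part of the text above.

# Embedded feature selection: L1 regularization drives the coefficients of
# uninformative features to zero, and SelectFromModel keeps the rest.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

selector = SelectFromModel(Lasso(alpha=0.1))   # alpha is an illustrative value
X_selected = selector.fit_transform(X, y)

print("Selected feature indices:", selector.get_support(indices=True))
print("Reduced shape:", X_selected.shape)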

Filter methods

Filter methods for feature selection in machine learning are a set of techniques used to select a
subset of the original features based on a certain criterion. They are mainly divided into two
categories:

1. Statistical-based methods: These methods use statistical tests to evaluate the relevance of
each feature to the target variable. Some common methods include the chi-squared test,
correlation coefficient, and ANOVA.
2. Correlation-based methods: These methods measure the correlation between features and
the target variable, and then select features with a higher correlation coefficient. Some
popular correlation-based methods are Pearson's correlation, Spearman's correlation, and
Kendall's correlation.

Filter methods are generally fast and easy to implement, but they evaluate each feature in isolation: they do not account for interactions between features or for how the features behave inside the final model, which may result in suboptimal performance.
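
As a minimal sketch of the filter approach, the example below uses scikit-learn's SelectKBest to score each feature independently against the target and keep the top k; the dataset and k = 10 are illustrative assumptions.

# Filter-style selection: rank features by a per-feature statistical score.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# ANOVA F-test between each feature and the class label.
anova = SelectKBest(score_func=f_classif, k=10).fit(X, y)

# Mutual information also captures non-linear dependence.
mi = SelectKBest(score_func=mutual_info_classif, k=10).fit(X, y)

print("Top features (ANOVA):", anova.get_support(indices=True))
print("Top features (mutual info):", mi.get_support(indices=True))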

Wrapper methods

Wrapper methods for feature selection in machine learning are a set of techniques that use a
machine learning algorithm to evaluate the performance of different feature subsets. The goal of
wrapper methods is to find the optimal subset of features that provides the best performance in
terms of a specific evaluation metric.

The basic idea behind wrapper methods is to train a machine learning model on various
combinations of features, and then select the combination that gives the best performance. This
process can be repeated multiple times, using different algorithms and/or different evaluation
metrics.

Some popular wrapper methods for feature selection include Recursive Feature Elimination
(RFE), Sequential Backward Selection (SBS), and Genetic Algorithms (GA). Wrapper methods
are more computationally expensive than filter methods, but they can provide a more accurate
representation of the relationship between features and the target variable, which can lead to
better performance.
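
The basic wrapper idea can be sketched directly: pick a candidate subset of features, train the model on just those columns, and score it with cross-validation. The model, dataset, and the restriction to small subsets below are illustrative assumptions.

# Core wrapper idea: score candidate feature subsets by actual model performance.
from itertools import combinations

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
model = LogisticRegression(max_iter=5000)

best_score, best_subset = -1.0, None
# Try all 3-feature subsets of the first 8 features (kept small on purpose:
# exhaustive wrapper search grows combinatorially with the number of features).
for subset in combinations(range(8), 3):
    score = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
    if score > best_score:
        best_score, best_subset = score, subset

print(f"Best subset {best_subset} with CV accuracy {best_score:.3f}")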

Wrapper methods (cont.)

Wrapper methods for feature selection in machine learning are a set of techniques that evaluate
feature subsets by training a model with a given set of features and then evaluating its
performance. The subset with the best performance is selected. Some common wrapper methods
include:

1. Recursive Feature Elimination (RFE): It involves training a model with all the features
and then removing the least important features one by one until the desired number of
features is reached.
2. Backward Feature Elimination: Closely related to RFE; it also starts with all the features
and iteratively removes the least important ones, typically re-evaluating model
performance after each removal.
3. Forward Feature Selection: It involves training models with subsets of increasing size,
starting from an empty set and adding one feature at a time.
4. Genetic Algorithm-based Feature Selection: It is a meta-heuristic optimization technique
inspired by the process of natural selection and genetics.

Wrapper methods are more computationally expensive than filter methods, but they often result
in better performance as they take into account the relationship between features and the target
variable.

Recursive Feature Elimination

Recursive Feature Elimination (RFE) is a feature selection method in machine learning that
involves recursively removing the least important features until the desired number of features is
reached.

The process of RFE is as follows:

1. Train a model with all the features
2. Rank the features based on their importance, as determined by the model
3. Remove the feature with the lowest importance
4. Train the model again with the remaining features
5. Repeat steps 3 and 4 until the desired number of features is reached

RFE is often used with linear models such as Logistic Regression, SVM, and Linear Regression,
but it can be used with any model that has a feature importance attribute.
RFE is a wrapper method, which means it evaluates feature subsets by training a model and
evaluating its performance, unlike filter methods that use statistical tests to select features. This
makes RFE a more computationally expensive method, but it often results in better performance
as it takes into account the relationship between features and the target variable.
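
A minimal sketch of RFE with scikit-learn, using logistic regression as the underlying estimator; the dataset and the target of 10 features are illustrative assumptions.

# Recursive Feature Elimination: repeatedly drop the least important feature.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# step=1 removes one feature per iteration until 10 remain, ranking features
# by the magnitude of the logistic-regression coefficients at each step.
rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=10, step=1)
rfe.fit(X, y)

print("Selected feature indices:", rfe.get_support(indices=True))
print("Feature ranking (1 = selected):", rfe.ranking_)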

Backward Feature elimination method

Backward Feature Elimination is a feature selection method in machine learning that involves
starting with all the features and iteratively removing the least important features until the
desired number of features is reached.

The process of Backward Feature Elimination is as follows:

1. Train a model with all the features
2. Rank the features based on their importance, as determined by the model
3. Remove the feature with the lowest importance
4. Train the model again with the remaining features
5. Repeat steps 3 and 4 until the desired number of features is reached or until a stopping
criterion is met

Backward Feature Elimination is closely related to Recursive Feature Elimination (RFE): both start with all the features and iteratively discard the least important ones. The main practical difference is the removal criterion: RFE typically ranks features by model-derived importances (such as coefficient magnitudes), while classical backward elimination re-evaluates model performance or statistical significance after each candidate removal. Forward Feature Selection, by contrast, starts with no features and adds the most useful ones.

Backward Feature Elimination is often used with linear models such as Logistic Regression,
SVM, and Linear Regression, but it can be used with any model that has a feature importance
attribute.

Like RFE, Backward Feature Elimination is a wrapper method that evaluates feature subsets by
training a model and evaluating its performance, making it a computationally expensive method.
However, it often results in better performance as it takes into account the relationship between
features and the target variable.

Backward Feature Elimination is a technique used in Machine Learning to reduce the number of features in a model by removing the least important ones. Here are a few techniques commonly used to perform, or to approximate the effect of, backward elimination:

1. Recursive Feature Elimination (RFE): RFE starts with all features and iteratively removes the feature with the lowest importance (for example, the smallest coefficient magnitude) until a desired number of features is reached.
2. Lasso Regression: strictly an embedded method, but with a similar effect. L1 regularization penalizes the coefficients of less important features and drives them towards zero; features with a coefficient of zero are then removed.
3. Random Forest: Random Forest models provide a built-in feature importance measure that can be used to rank features and eliminate the least important ones.
4. Gradient Boosting: Gradient Boosting models can likewise rank features based on their contribution to the model's predictions, and the least important ones can be eliminated.
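
In scikit-learn, performance-based backward elimination can be sketched with SequentialFeatureSelector in backward mode: starting from all features, it drops, at each step, the feature whose removal hurts cross-validated performance the least. The estimator, dataset, and target of 5 features are illustrative assumptions.

# Backward elimination driven by cross-validated model performance.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)

sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,      # illustrative target size
    direction="backward",
    cv=5,
)
sfs.fit(X, y)

print("Kept feature indices:", sfs.get_support(indices=True))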

Forward Feature Selection

Forward Feature Selection is a feature selection method in machine learning that involves
starting with an empty set of features and adding one feature at a time until the desired number of
features is reached.

The process of Forward Feature Selection is as follows:

1. Start with an empty set of selected features
2. For each candidate feature, train a model on the currently selected features plus that candidate and evaluate its performance
3. Add the candidate feature that results in the greatest improvement in performance
4. Repeat steps 2 and 3 until the desired number of features is reached or until a stopping criterion is met

Forward Feature Selection is a greedy algorithm, as it only adds the feature that results in the
greatest improvement in performance at each step, without considering the interactions between
features.

Forward Feature Selection is often used with linear models such as Logistic Regression, SVM,
and Linear Regression, but it can be used with any model.

Like other wrapper methods, Forward Feature Selection evaluates feature subsets by training a
model and evaluating its performance, making it a computationally expensive method. However,
it often results in better performance as it takes into account the relationship between features
and the target variable.
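
The greedy forward procedure described above can be sketched in a short loop: at each step, try adding each remaining feature, score the enlarged subset with cross-validation, and keep the best addition. The dataset, model, and target of 5 features are illustrative assumptions (scikit-learn's SequentialFeatureSelector with direction="forward" implements the same idea).

# Greedy forward selection: add the feature that most improves CV accuracy.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
model = LogisticRegression(max_iter=5000)

selected, remaining = [], list(range(X.shape[1]))
while len(selected) < 5:                       # stop once 5 features are chosen
    scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best = max(scores, key=scores.get)         # feature giving the biggest gain
    selected.append(best)
    remaining.remove(best)
    print(f"Added feature {best}, CV accuracy {scores[best]:.3f}")

print("Selected features:", selected)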

Forward Feature Selection is a technique used in Machine Learning to select the most relevant
features for a model by adding them one-by-one. Here are a few examples of forward feature
selection:

1. Stepwise Regression: Stepwise Regression starts with an empty model and adds (and, in some variants, later removes) features based on their statistical significance, stopping when no remaining feature meets the entry criterion.
2. Forward Selection: Forward Selection adds a single feature at a time, at each step choosing the feature that provides the greatest improvement to the model, and never removes a feature once it has been added.
3. Genetic Algorithms: Genetic Algorithms can be used to evolve a population of feature subsets and select the best one based on model performance. The algorithm can be set up to consider adding a new feature to a subset in each iteration.
4. Greedy Algorithms: Greedy Algorithms start with an empty model and add features one by one based on their contribution to the model's accuracy, at each step selecting the feature that results in the greatest improvement.
5. Floating Feature Selection (e.g. Sequential Floating Forward Selection, SFFS): extends forward selection with conditional removal steps, so that a feature added earlier can be dropped later if doing so improves performance. The search continues until no further improvement in accuracy can be achieved.

Genetic Algorithm

Genetic Algorithm (GA) is a meta-heuristic optimization technique inspired by the process of natural selection and genetics that can be used for feature selection in machine learning.

GA works by representing a set of features as a chromosome, and then using genetic operators
such as crossover and mutation to generate new chromosomes. The chromosomes are evaluated
based on a fitness function that measures the performance of the model trained with the
corresponding set of features. The chromosomes with the best fitness are selected to generate the
next generation, until a stopping criterion is met.

GA can be used with any type of machine learning model, and it is especially useful when the
relationship between features and the target variable is complex.

Like other wrapper methods, GA evaluates feature subsets by training a model and evaluating its
performance, making it a computationally expensive method. However, it has the advantage of
considering the interactions between features and is therefore well-suited for problems where the
relationship between features and the target variable is non-linear.
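
A minimal from-scratch sketch of GA-based feature selection: each chromosome is a binary mask over the features, fitness is the cross-validated accuracy of a model trained on the masked columns, and new generations are produced by crossover and mutation. The population size, rates, number of generations, dataset, and model are all illustrative assumptions.

# GA-based feature selection: evolve binary feature masks by their CV accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if mask.sum() == 0:                        # an empty feature set is invalid
        return 0.0
    model = LogisticRegression(max_iter=5000)
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=3).mean()

def crossover(a, b):
    point = rng.integers(1, n_features)        # single-point crossover
    return np.concatenate([a[:point], b[point:]])

def mutate(mask, rate=0.05):
    flips = rng.random(n_features) < rate      # flip each bit with small probability
    return np.where(flips, 1 - mask, mask)

population = rng.integers(0, 2, size=(20, n_features))   # 20 random chromosomes
for generation in range(10):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[::-1][:10]]  # keep the fittest half
    children = [mutate(crossover(parents[rng.integers(10)], parents[rng.integers(10)]))
                for _ in range(10)]
    population = np.vstack([parents] + children)
    print(f"Generation {generation}: best CV accuracy {scores.max():.3f}")

best = population[np.argmax([fitness(ind) for ind in population])]
print("Selected feature indices:", np.flatnonzero(best))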
