
Feature Selection

Outline
• Introduction
• What is Feature selection?
• Is Feature Selection required?
• Motivation for Feature Selection.
• Relevance of Features
• Variable Ranking
• Feature Subset Selection
Introduction
• The volume of data is practically exploding by the day. Not only this, the data that is available now is becoming increasingly unstructured.
• A universal problem of intelligent (learning) agents is where to focus their attention.
• It is critical to understand “Which aspects of the problem at hand are important/necessary to solve it?”
– i.e. discriminate between the relevant and irrelevant parts
of experience.
What is Feature selection?
(or Variable Selection)
• Problem of selecting some subset of a learning
algorithm’s input variables upon which it should
focus attention, while ignoring the rest.
• In other words, Dimensionality Reduction. As humans, we do this constantly!
What is Feature selection?
(or Variable Selection)
• Given a set of features F = { f1, …, fi, …, fn }, the Feature Selection problem is to find a subset F′ ⊆ F that “maximizes the learner’s ability to classify patterns”.
• Formally, F′ should maximize some scoring function, as written out below.
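A standard way to write this objective (a common formalization, not taken verbatim from the original slides) is

F' = \arg\max_{G \subseteq F} S(G)

where S(·) is the chosen scoring function, estimated from the training data.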
Is Feature Selection required?
Two Thoughts
Motivation for Feature Selection.
• Especially when dealing with a large number of variables
there is a need for Dimensionality Reduction.
• Feature Selection can significantly improve a learning
algorithm’s performance.
• The Curse of Dimensionality
Feature Selection — Optimality?
• In theory, the goal is to find an optimal feature-
subset (one that maximizes the scoring function).
• In real world applications this is usually not
possible.
– For most problems it is computationally intractable to
search the whole space of possible feature subsets.
– One usually has to settle for approximations of the
optimal subset.
– Most of the research in this area is devoted to finding
efficient search-heuristics.
Relevance of Features
• There are several definitions of relevance in
literature.
– Relevance of one variable, relevance of a variable given other variables, relevance given a certain learning algorithm, ...
– Most definitions are problematic, because there are
problems where all features would be declared to be
irrelevant
– This can be defined through two degrees of
relevance: weak and strong relevance.
• A feature is relevant iff it is weakly or strongly relevant, and irrelevant (redundant) otherwise.
Relevance of Features
• Strong Relevance of a variable/feature:
– Let Si = {f1, …, fi−1, fi+1, …, fn} be the set of all features except fi. Denote by si a value-assignment to all features in Si.
– A feature fi is strongly relevant iff there exist some xi, y and si for which p(fi = xi, Si = si) > 0 such that
p(Y = y | fi = xi, Si = si) ≠ p(Y = y | Si = si)
– This means that removal of fi alone will always result
in a performance deterioration of an optimal Bayes
classifier.
Relevance of Features
• Weak Relevance of a variable/feature:
– A feature fi is weakly relevant iff it is not strongly relevant, and there exists a subset of features Si′ of Si for which there exist some xi, y and si′ with p(fi = xi, Si′ = si′) > 0 such that
p(Y = y | fi = xi, Si′ = si′) ≠ p(Y = y | Si′ = si′)
– This means that there exists a subset of features Si′ such that the performance of an optimal Bayes classifier on Si′ is worse than on Si′ ∪ { fi }.
Variable Ranking
• Variable Ranking is the process of ordering the
features by the value of some scoring function,
which usually measures feature-relevance.

• Resulting set: the features ordered (ranked) by their scores S(fi).
• The score S(fi) is computed from the training data, measuring some criterion of relevance for feature fi. By convention, a high score is indicative of a valuable (relevant) feature.
Variable Ranking
• A simple method for feature selection using variable
ranking is to select the k highest ranked features
according to S.

• This is usually not optimal, but often preferable to other, more complicated methods.
• It is computationally efficient: only the calculation and sorting of n scores is required (see the sketch below).
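As an illustration, here is a minimal sketch (assuming scikit-learn is available) of ranking features and keeping the k highest-scoring ones, using the ANOVA F-statistic as the scoring function S; the synthetic data set and the value of k are made up for the example:

# Select the k highest-ranked features according to a univariate score S.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, n_redundant=2,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)   # S = ANOVA F-score
X_top_k = selector.fit_transform(X, y)

ranking = np.argsort(selector.scores_)[::-1]        # features ordered by score
print("Feature ranking (best first):", ranking)
print("Selected feature indices:", selector.get_support(indices=True))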
Ranking Criteria
• Correlation criteria
• Information-theoretic criteria
Ranking Criteria poses some questions
• Can variables with small score be automatically discarded?
• The answer is NO!
• Even variables with a small score can improve class separability.
• In the slide’s two-variable illustration, whether this helps depends on the correlation between x1 and x2.
Ranking Criteria poses some questions
• Can a useless variable (i.e. one with a small score) be useful
together with others?
• The answer is YES!
• The correlation between individual variables and the target is not enough to assess relevance.
• The correlation / co-variance between pairs of variables has to
be considered too (potentially difficult).
• Also, the diversity of features needs to be considered.
Ranking Criteria poses some questions
• Can two variables that are useless by themselves be useful together?
• The answer is YES!
• Such interactions can be detected using information-theoretic criteria.
• Mutual information can also detect non-linear dependencies among variables, but it is harder to estimate than correlation.
• It is a measure of “how much information (in terms of entropy) two random variables share” (see the sketch below).
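A minimal sketch of the XOR case (a standard illustration, not taken from the original slides), assuming NumPy and scikit-learn: each variable alone carries essentially no information about the target (near-zero correlation and near-zero single-variable mutual information), but the pair determines the target exactly:

# Two variables that are useless by themselves can be useful together (XOR).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=2000)
x2 = rng.integers(0, 2, size=2000)
y = x1 ^ x2                                   # target = XOR of the two variables

X = np.column_stack([x1, x2])

# Individually, each variable tells us (almost) nothing about y ...
print("corr(x1, y):", np.corrcoef(x1, y)[0, 1])
print("corr(x2, y):", np.corrcoef(x2, y)[0, 1])
print("MI of each variable with y:",
      mutual_info_classif(X, y, discrete_features=True, random_state=0))

# ... but together they determine y completely.
joint = (2 * x1 + x2).reshape(-1, 1)          # encode the pair as one variable
print("MI of the pair with y:",
      mutual_info_classif(joint, y, discrete_features=True, random_state=0))
print("CV accuracy using both features:",
      cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean())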
Variable Ranking
Single Variable Classifiers
• Idea: Select variables according to their individual
predictive power
• Criterion: Performance of a classifier built with 1
variable e.g. the value of the variable itself
• The predictive power is usually measured in terms of error rate (or criteria based on the false positive and false negative rates).
• Also, a combination of single-variable classifiers (SVCs) can be deployed using ensemble methods (boosting, ...). A single-variable ranking sketch follows below.
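A minimal sketch of single-variable-classifier ranking, assuming scikit-learn; the use of decision stumps and ROC-AUC as the metric is an illustrative choice, not prescribed by the slides:

# Rank features by the performance of a classifier trained on each feature alone.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)

scores = []
for j in range(X.shape[1]):
    stump = DecisionTreeClassifier(max_depth=1)        # one-variable classifier
    auc = cross_val_score(stump, X[:, [j]], y, cv=5,
                          scoring="roc_auc").mean()
    scores.append(auc)

ranking = np.argsort(scores)[::-1]                     # best single features first
print("Single-variable AUC scores:", np.round(scores, 3))
print("Feature ranking:", ranking)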
Feature Subset Selection
The Goal of Feature Subset Selection is to find the optimal
feature subset. Feature Subset Selection Methods can be
classified into three broad categories.
– Filter Methods
– Wrapper Methods
– Embedded Methods
For Feature Subset Selection we need:
– A measure for assessing the goodness of a feature subset
(scoring function)
– A strategy to search the space of possible feature subsets
– Finding a minimal optimal feature set for an arbitrary target concept is hard; good search heuristics are needed.
Filter Methods
Feature Subset Selection
• Filter Methods:
– Filter methods select features from a dataset independently of any machine learning algorithm.
– These methods rely only on the characteristics of the variables, so features are filtered out of the data before learning begins.
– These methods are powerful and simple and help to
quickly remove features.
– These are generally the first step in any feature
selection pipeline.
Feature Subset Selection
• Advantages of Filter Methods:
– Selected features can be used in any machine learning algorithm.
– They’re computationally inexpensive: you can process thousands of features in a matter of seconds.
– Filter methods are very good for eliminating
irrelevant, redundant, constant, duplicated, and
correlated features.
Feature Subset Selection
• Filter methods are of two types:
– Univariate
– Multivariate
Feature Subset Selection
Univariate filter methods
• They evaluate and rank a single feature according to certain
criteria.
• They treat each feature individually and independently of
the feature space.
• This is how it functions in practice:
– It ranks features according to certain criteria.
– Then select the highest ranking features according to those
criteria.
• One problem that can occur with univariate methods is that they may select redundant variables, as they don’t take the relationships between features into consideration.
Feature Subset Selection
• Multivariate filter methods, on the other hand, evaluate
the entire feature space.
• They take into account features in relation to other ones
in the dataset.
• These methods are able to handle duplicated, redundant,
and correlated features.
Feature Subset Selection
• Basic Filter Methods:
– Constant features, which show a single value in all the observations in the dataset. These features provide no information that allows ML models to predict the target.
– Quasi-constant features, in which a single value occupies the majority of the records.
– Duplicated features, which are self-explanatory: the same feature appears more than once (a removal sketch follows below).
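A minimal sketch of these basic filters with pandas; the basic_filters helper and the quasi-constant threshold are illustrative assumptions:

# Remove constant, quasi-constant, and duplicated features from a DataFrame.
import pandas as pd

def basic_filters(df: pd.DataFrame, quasi_constant_threshold: float = 0.99) -> pd.DataFrame:
    # Constant features: a single unique value across all observations.
    constant = [c for c in df.columns if df[c].nunique() == 1]
    df = df.drop(columns=constant)

    # Quasi-constant features: one value occupies most of the records.
    quasi = [c for c in df.columns
             if df[c].value_counts(normalize=True).iloc[0] >= quasi_constant_threshold]
    df = df.drop(columns=quasi)

    # Duplicated features: identical columns (transpose, then use duplicated()).
    duplicated = df.columns[df.T.duplicated()].tolist()
    return df.drop(columns=duplicated)

df = pd.DataFrame({"a": [1, 1, 1, 1],          # constant
                   "b": [0, 0, 0, 1],          # quasi-constant (75% zeros)
                   "c": [1, 2, 3, 4],
                   "d": [1, 2, 3, 4]})         # duplicate of c
print(basic_filters(df, quasi_constant_threshold=0.75).columns.tolist())  # ['c']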
Correlation Filter Methods
• Correlation is defined as a measure of the linear relationship
between two quantitative variables, like height and weight. You could also define correlation as a measure of how strongly one variable depends on another.
• A high correlation is often a useful property—if two variables are
highly correlated, we can predict one from the other.
• Therefore, we generally look for features that are highly
correlated with the target, especially for linear machine learning
models.
• However, if two variables are highly correlated with each other, they provide redundant information with regard to the target. Essentially, we can make an accurate prediction of the target with just one of the redundant variables (see the sketch below).
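A minimal sketch of this idea as a multivariate filter, assuming pandas and NumPy; the drop_correlated helper and the 0.9 threshold are illustrative assumptions:

# Drop one feature from each highly correlated pair, keeping the first one seen.
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] >= threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({"x": x,
                   "x_copy": x + rng.normal(scale=0.01, size=200),  # near-duplicate of x
                   "z": rng.normal(size=200)})
print(drop_correlated(df).columns.tolist())   # ['x', 'z']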
Correlation Filter Methods
• There are a number of methods to measure the
correlation between variables.
• Pearson correlation coefficient: it’s used to summarize the strength of the linear relationship between two variables, and can vary between +1 and −1:
– 1 means a positive correlation: the values of one variable
increase as the values of another increase.
– -1 means a negative correlation: the values of one variable
decrease as the values of another increase.
– 0 means no linear correlation between the two variables.
Correlation Filter Methods
• The assumptions of the Pearson correlation coefficient:
– Both variables should be normally distributed.
– There is a straight-line relationship between the two variables.
– Data is equally distributed around the regression line.
• The formula for the Pearson correlation coefficient between variables x and y is:
r = Σ (xi − x̄)(yi − ȳ) / √( Σ (xi − x̄)² · Σ (yi − ȳ)² )
• Sometimes two variables can be related in a nonlinear relationship, which can be stronger or weaker across the distribution of the variables.
Correlation Filter Methods
• Spearman’s rank correlation coefficient is a non-parametric
test that’s used to measure the degree of association
between two variables with a monotonic function (an increasing or decreasing relationship).
• The measured strength between the variables using Spearman’s correlation varies between +1 and −1.
• Spearman’s coefficient is suitable for both continuous and
discrete ordinal variables.
• The Spearman’s rank correlation test doesn’t carry any
assumptions about the distribution of the data.
Correlation Filter Methods
• Kendall’s rank correlation coefficient is a non-parametric
test that measures the strength of the ordinal association
between two variables.
• It calculates a normalized score for the number of
matching or concordant rankings between the two data
samples.
• Kendall’s correlation varies between 1 (high) and -1 (low).
• This type of correlation is best suited for discrete data.

• τ = (C − D) / (n(n − 1)/2), where
C = number of concordant pairs (ordered in the same way)
D = number of discordant pairs (ordered differently).
Correlation Filter Methods
• Concordant: Ordered in the same way (consistency).
– A pair of observations is considered concordant if (x2 − x1) and (y2 − y1) have the same sign.
• Discordant: Ordered differently (inconsistency).
– A pair of observations is considered discordant if (x2 − x1) and (y2 − y1) have opposite signs (a sketch computing all three coefficients follows below).
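A minimal sketch that computes the three coefficients discussed above, assuming pandas (whose DataFrame.corr supports all three methods) and SciPy:

# Compute Pearson, Spearman, and Kendall correlations for the same data.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = x ** 3 + rng.normal(scale=0.1, size=100)   # monotonic but non-linear relation

df = pd.DataFrame({"x": x, "y": y})
for method in ("pearson", "spearman", "kendall"):
    print(method, round(df.corr(method=method).loc["x", "y"], 3))

# Equivalent per-pair calls from SciPy:
print("pearsonr  :", stats.pearsonr(x, y)[0])
print("spearmanr :", stats.spearmanr(x, y).correlation)
print("kendalltau:", stats.kendalltau(x, y).correlation)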
Correlation Filter Methods
• Calculate correlation coefficient for the following data
Filter Methods
• Filter methods tend to select features independently and work with (essentially) any machine learning algorithm.
• These methods tend to ignore the effect of the
selected feature subset on the performance of the
algorithm.
• In addition, filter methods often evaluate features
individually. In that case, some variables can be useless
for prediction in isolation, but they can be quite useful
when combined with other variables.
• To prevent those issues, wrapper methods join the
party in selecting the best feature subsets.
Wrapper Methods
Wrapper Methods
• Wrapper methods work by evaluating subsets of features with a machine learning algorithm: a search strategy moves through the space of possible feature subsets, and each subset is evaluated by the quality of the performance of a given algorithm trained on it.
• These methods are called greedy algorithms because they aim to find the combination of features that results in the best-performing model, which is computationally expensive and often impractical in the case of exhaustive search.
Wrapper Methods
• Practically any combination of a search strategy and a machine learning algorithm can be used as a wrapper.
• Wrapper Methods: Advantages
– They detect the interaction between variables.
– They find the optimal feature subset for the desired machine learning algorithm.
• The wrapper methods usually result in better predictive accuracy than filter methods.
Wrapper Methods: Process
• Search for a subset of features: using a search method, we select a subset of features from the available ones.
• Build a machine learning model: a chosen ML algorithm is trained on the previously selected subset of features.
• Evaluate model performance: we evaluate the newly trained ML model with a chosen metric.
• Repeat: the whole process starts again with a new subset of features, a new ML model trained, and so on.
Stopping Criteria
• At some point in time, we need to stop searching for a
subset of features.
• To do this, we have to put in place some pre-set
criteria.
• These criteria need to be defined by the machine
learning engineer.
• Here are a couple of examples of these criteria:
– Model performance decreases.
– Model performance increases.
– A predefined number of features is reached.
• The pre-set criteria can, for example, be based on metrics like ROC-AUC for classification or RMSE for regression.
Search methods
• Forward Feature Selection: this method starts with no features and adds one at a time.
• Backward Feature Elimination: this method starts with all features present and removes one feature at a time.
• Exhaustive Feature Selection: this method tries all possible feature combinations.
• Bidirectional Search: this last one does both forward and backward feature selection simultaneously in order to get one unique solution.
Forward Feature Selection
• Forward feature selection, or sequential forward feature selection (SFS), is an iterative method in which we start by evaluating all features individually and then select the one that results in the best performance.
• In the next step, it tests all possible combinations of the selected feature with the remaining features and retains the pair that produces the best algorithmic performance.
• The loop continues, adding one feature at a time in each iteration, until the pre-set criterion is reached (see the sketch below).
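A minimal sketch using scikit-learn's SequentialFeatureSelector; the estimator, metric, and number of features to select are illustrative assumptions:

# Sequential forward selection (SFS) wrapped around a logistic regression model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=5, random_state=0)

sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5,    # stopping criterion
                                direction="forward",
                                scoring="roc_auc", cv=5)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))
X_selected = sfs.transform(X)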
Backward Feature Elimination
• In backward feature elimination, or sequential backward feature selection (SBS), we start with all the features in the dataset and evaluate the performance of the algorithm.
• Then, at each iteration, backward feature elimination removes the one feature whose removal produces the best-performing algorithm according to an evaluation metric.
• This feature can also be described as the least significant feature among the remaining ones.
• The process continues, removing feature after feature, until a certain criterion is satisfied (see the sketch below).
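The same scikit-learn selector can run in the backward direction; a minimal sketch under the same illustrative assumptions as before:

# Sequential backward selection (SBS): start from all features, remove one at a time.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=5, random_state=0)

sbs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5,
                                direction="backward",      # remove features one by one
                                scoring="roc_auc", cv=5)
sbs.fit(X, y)
print("Remaining feature indices:", sbs.get_support(indices=True))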
Exhaustive Feature Selection
• It searches across all possible feature combinations. Its aim
is to find the best performing feature subset.
• It creates all the subsets of features from 1 to N,
with N being the total number of features, and for each
subset, it builds a machine learning algorithm and selects
the subset with the best performance.
• The parameters that you can adjust here are the minimum and the maximum number of features to consider (the 1 and N above).
• That way, we can reduce this method’s computation time if we choose reasonable values for these parameters (see the sketch below).
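A minimal brute-force sketch with itertools, kept deliberately small because the number of candidate subsets grows combinatorially; min_k and max_k play the role of the minimum and maximum number of features:

# Exhaustive feature selection: score every subset of size min_k..max_k.
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

min_k, max_k = 1, 3                      # bounds on the subset size
best_score, best_subset = -1.0, None
for k in range(min_k, max_k + 1):
    for subset in combinations(range(X.shape[1]), k):
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("Best subset:", best_subset, "CV accuracy:", round(best_score, 3))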
Bidirectional Feature Selection
• Bidirectional search begins the search in both directions, performing SFS and SBS concurrently.
• The two searches stop in two cases:
– (1) when one search finds the best subset comprised of m features before it reaches the exact middle of the search space, or
– (2) when both searches reach the middle of the search space.
• Bidirectional search takes advantage of both SFS and SBS.
• But this can lead to an issue of converging to a different
solution. To avoid this and to guarantee SFS and SBS converge
to the same solution, we make the following constraints:
– Features already selected by SFS are not removed by SBS.
– Features already removed by SBS are not added by SFS.
Embedded Methods
Embedded Methods
• Wrapper methods provide a good way to ensure that the
selected features are the best for a specific machine
learning model.
• These methods will provide better results in terms of
performance, but they’ll also cost us a lot of computation
time/resources.
• But what if we could include the feature selection
process in ML model training itself? That could lead us to
even better features for that model, in a shorter amount
of time. This is where embedded methods come into
play.
Embedded Methods
• Embedded methods complete the feature selection
process within the construction of the machine
learning algorithm itself.

• A learning algorithm takes advantage of its own variable selection process and performs feature selection and classification/regression at the same time.
Embedded Methods: Advantages
• They take into consideration the interaction of features
like wrapper methods do.
• They are fast, like filter methods.
• They are more accurate than filter methods.
• They find the feature subset for the algorithm being
trained.
• They are much less prone to overfitting.
Embedded Methods: Process
• First, these methods train a machine learning model.
• Then they derive feature importance from that model, which is a measure of how important each feature is when making a prediction.
• Finally, they remove non-important features using the derived feature importance.
A few embedded methods for feature selection
• Regularization in machine learning adds a penalty to the different parameters of a model to reduce its freedom.
• This penalty is applied to the coefficient that
multiplies each of the features in the linear model,
and is done to avoid overfitting, make the model
robust to noise, and to improve its generalization.
Types of Regularization
• L1 regularization shrinks some of the coefficients to zero, indicating that certain predictors or features will be multiplied by zero when estimating the target. Thus, they won’t be added to the final prediction of the target; this means that these features can be removed because they aren’t contributing to the final prediction (a selection sketch follows below).
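A minimal sketch of L1-based selection with scikit-learn's SelectFromModel; the L1-penalized logistic regression and the value of C are illustrative assumptions:

# Embedded selection with an L1-penalized logistic regression:
# features whose coefficients are driven to zero are discarded.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X, y)

coefs = selector.estimator_.coef_.ravel()
print("Non-zero coefficients:", np.sum(coefs != 0))
print("Selected feature indices:", selector.get_support(indices=True))
X_selected = selector.transform(X)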
Types of Regularization
• L2 regularization, on the other hand, doesn’t set the coefficients to zero; it only shrinks them toward zero. That’s why we use only L1 for feature selection.
• L1/L2 regularization (the elastic net) is a combination of the L1 and L2 penalties. Because it incorporates the L1 penalty, we can still end up with features whose coefficient is zero, similar to L1.
Tree-based Feature Importance
• Tree-based algorithms and models (e.g., random forests) are well-established algorithms that not only offer good predictive performance but can also provide us with what we call feature importance as a way to select features.
• Feature importance
– Feature importance tells us which variables are more important in
making accurate predictions on the target variable/class. In other
words, it identifies which features are the most used by the
machine learning algorithm in order to predict the target.
• Random forests provide us with feature importance through two straightforward measures: mean decrease in impurity and mean decrease in accuracy (a sketch follows below).
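A minimal sketch, assuming scikit-learn: feature_importances_ exposes the impurity-based importances, permutation_importance approximates the mean-decrease-accuracy view, and SelectFromModel keeps features above a threshold (the "median" threshold is an illustrative choice):

# Tree-based feature importance with a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean decrease in impurity (built-in) and mean decrease in accuracy (permutation).
print("Impurity-based importances:", forest.feature_importances_.round(3))
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print("Permutation importances:  ", perm.importances_mean.round(3))

# Keep only the features whose impurity importance exceeds the median importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0),
                           threshold="median").fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))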
