Artificial Intelligence: Feature Engineering III

The document outlines a lecture on feature engineering for aspiring AI engineers, focusing on how to identify and extract meaningful features from data. It covers various methods for feature selection, including univariate feature selection, recursive feature elimination, and feature importance using ensemble methods. Additionally, it provides practical examples using statistical tests like chi-square, ANOVA, and mutual information to select relevant features.


Master in Artificial Intelligence

Feature Engineering III


Enrichmentors Growing through Excellence over 40 years to become Best in Management
Purpose
The purpose of this section is to help you learn how to identify and
extract meaningful features from data on your way to becoming a
successful Artificial Intelligence (AI) Engineer.

By the end of this lecture, you will learn the following:
• How to select a subset of the most relevant features



How to select a subset of the most relevant features

1. Understand the Problem and Data
2. Perform Exploratory Data Analysis (EDA)
3. Feature Engineering
4. Evaluate Feature Importance
5. Dimensionality Reduction
6. Feature Selection
7. Iterate and Experiment
8. Domain Knowledge Integration
9. Validate and Test



How to select a subset of the most relevant features

• Univariate Feature Selection: based on statistical tests such as
  chi-square, ANOVA, or mutual information.
• Recursive Feature Elimination (RFE): iteratively remove the least
  important features based on model performance.
• Feature Importance: assess the importance of features using ensemble
  methods like Random Forests or Gradient Boosting.
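RFE and ensemble-based feature importance appear again later without code, so here is a minimal sketch of both. The iris data, the choice of two features, and the estimators are illustrative stand-ins, not part of the lecture:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Recursive Feature Elimination: repeatedly fit the model and drop the
# weakest feature until only n_features_to_select remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)
print("RFE-selected feature mask:", rfe.support_)

# Feature importance from an ensemble: a higher score means the feature
# contributed more to the forest's splits
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print("Random Forest importances:", forest.feature_importances_)
```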



How to select based on statistical tests

Univariate feature selection scores each feature independently against the
target using statistical tests. The following slides work through three
such tests: chi-square, ANOVA, and mutual information.



How to select based on the Chi-Square Test
The chi-square test is used to assess the independence between categorical
variables. It measures the significance of the association between a
categorical feature and a categorical target variable.

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Assuming X contains your features and y contains your target variable
# Note: chi2 requires non-negative feature values (e.g. counts or frequencies)

# Select the k best features based on the chi-square test
k = 5  # Number of features to select
selector = SelectKBest(score_func=chi2, k=k)
X_new = selector.fit_transform(X, y)
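A runnable end-to-end version of the snippet above, using scikit-learn's bundled iris data (whose measurements are non-negative, as chi2 requires) and k=2 purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # 150 samples, 4 non-negative features

# Keep the 2 features with the largest chi-square statistics
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print("chi-square scores:", selector.scores_)
print("reduced shape:", X_new.shape)  # (150, 2)
```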



How to select based on ANOVA (Analysis of Variance)
ANOVA is used to assess the significance of differences in the means of numerical
features across the categories of a categorical target variable.

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# Assuming X contains your features and y contains your target variable

# Compute ANOVA F-values and select the k best features
k = 5  # Number of features to select
selector = SelectKBest(score_func=f_classif, k=k)
X_new = selector.fit_transform(X, y)
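A concrete version on the iris data, also showing how to recover which columns the selector kept via get_support() (the dataset and k=2 are illustrative choices, not from the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

data = load_iris()
X, y = data.data, data.target

# A large F-value means the feature's class means differ strongly
selector = SelectKBest(score_func=f_classif, k=2)
selector.fit(X, y)

# get_support() returns a boolean mask over the original columns
mask = selector.get_support()
selected = [name for name, keep in zip(data.feature_names, mask) if keep]
print("ANOVA-selected features:", selected)
```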



How to select based on Mutual Information
Mutual information measures the dependency between two variables, regardless of
their types (categorical or numerical). It quantifies the amount of information
obtained about one variable through the other.

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif

# Assuming X contains your features and y contains your target variable

# Compute mutual information scores and select the k best features
k = 5  # Number of features to select
selector = SelectKBest(score_func=mutual_info_classif, k=k)
X_new = selector.fit_transform(X, y)
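A runnable version on the iris data. Because mutual_info_classif uses a randomized nearest-neighbour estimator, fixing random_state (via functools.partial, an illustrative choice) makes the scores reproducible:

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# partial() lets us pass random_state through SelectKBest's score_func
selector = SelectKBest(
    score_func=partial(mutual_info_classif, random_state=0), k=2
)
X_new = selector.fit_transform(X, y)

# Mutual information scores are non-negative; 0 means independence
print("mutual information scores:", selector.scores_)
```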



How to evaluate the selected features

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Assuming X_new contains the selected features and y the target variable

# Initialize your model
model = LogisticRegression(max_iter=1000)

# Evaluate model performance using cross-validation
scores = cross_val_score(model, X_new, y, cv=5)  # Adjust cv as needed
print("Mean Accuracy:", scores.mean())
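Note that fitting the selector on all of X before cross-validating can leak information from the held-out folds. A safer variant (an illustrative sketch on the iris data, not from the lecture) wraps selection and model in a Pipeline so the selector is re-fitted on each training fold:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Selection inside the pipeline is re-fitted on each training fold,
# so the held-out fold never influences which features are kept
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("model", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print("Mean Accuracy:", scores.mean())
```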



How to iterate and tune

Iterate and Tune: tune hyperparameters and experiment with different
machine learning algorithms.

Iterate to evaluate the impact on model performance of:
• The feature selection method
• The number of selected features (k)
• The model type
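The iteration over k can be sketched as a simple sweep, scoring each candidate with cross-validation (the iris data and logistic regression are stand-ins for your own data and model):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Try each candidate k and record cross-validated accuracy
for k in range(1, X.shape[1] + 1):
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    score = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k}: mean accuracy = {score:.3f}")
```

The same loop structure extends naturally to the other two axes: swap the score_func for a different selection method, or the final estimator for a different model type.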



How to select a subset of the most relevant features

• Univariate Feature Selection: based on statistical tests such as
  chi-square, ANOVA, or mutual information.
• Recursive Feature Elimination (RFE): iteratively remove the least
  important features based on model performance.
• Feature Importance: assess the importance of features using ensemble
  methods like Random Forests or Gradient Boosting.



What is next?



Master in Artificial Intelligence

Feature Engineering III

