
Copyright Notice

These slides are distributed under the Creative Commons License.

DeepLearning.AI makes these slides available for educational purposes. You may not use or
distribute these slides for commercial purposes. You may make copies of these slides and
use or distribute them for educational purposes as long as you cite DeepLearning.AI as the
source of the slides.

For the rest of the details of the license, see
https://creativecommons.org/licenses/by-sa/2.0/legalcode
Feature Engineering,
Transformation and Selection

Welcome
Feature Engineering

Introduction to
Preprocessing
“Coming up with features is difficult,
time-consuming, and requires expert knowledge.
Applied machine learning often requires careful
engineering of the features and dataset.”
— Andrew Ng
Outline

● Squeezing the most out of data

● The art of feature engineering

● Feature engineering process

● How feature engineering is done in a typical ML pipeline


Squeezing the most out of data

● Making data useful before training a model


● Representing data in forms that help models learn
● Increasing predictive quality
● Reducing dimensionality with feature engineering
Art of feature engineering

An iterative cycle:

● Feature engineering
● Tune the objective function
● Launch and reiterate
● Combine features and make new features
Typical ML pipeline

During training: feature engineering runs as batch processing over the whole dataset.
During serving: feature engineering runs as real-time processing on each request.
Key points

● Feature engineering can be difficult and time consuming, but also very
important to success
● Squeezing the most out of data through feature engineering enables
models to learn better
● Concentrating predictive information in fewer features enables more
efficient use of compute resources
● Feature engineering during training must also be applied correctly
during serving
Feature Engineering

Preprocessing
Operations
Outline

● Main preprocessing operations


● Mapping raw data into features
● Mapping numeric values
● Mapping categorical values
● Empirical knowledge of data
Main preprocessing operations

● Data cleansing
● Feature tuning
● Representation transformation
● Feature extraction
● Feature construction
Mapping raw data into features

The process of creating features from raw data is feature engineering. Raw data doesn't come to us as feature vectors.

Raw Data:

0: {
  house_info : {
    num_rooms : 6
    num_bedrooms : 3
    street_name: "Shorebird Way"
    num_basement_rooms: -1
    ...
  }
}

Feature Vector (after feature engineering):

[6.0, 1.0, 0.0, 0.0, 9.321, -2.20, 1.01, 0.0, ...]
Mapping categorical values

Street names vocabulary:
{'Charleston Road', 'North Shoreline Boulevard', 'Shorebird Way', 'Rengstorff Avenue'}

String features can be handled with one-hot encoding. For the raw value street_name: "Shorebird Way", the encoded feature has a 1 for "Shorebird Way" and 0 for all others:

street_name feature = [0, 0, ..., 0, 1, 0, ..., 0]
Categorical Vocabulary

# From a vocabulary list
vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
    key=feature_name,
    vocabulary_list=["kitchenware", "electronics", "sports"])

# From a vocabulary file
vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_file(
    key=feature_name,
    vocabulary_file="product_class.txt",
    vocabulary_size=3)
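If the model needs the dense one-hot representation from the earlier slide rather than the sparse categorical column itself, the categorical column can be wrapped in an indicator column. A minimal sketch, assuming a product_class string feature (the feature name is illustrative, not from the slides):

import tensorflow as tf

feature_name = "product_class"  # hypothetical feature name, for illustration only

# Categorical column backed by an explicit vocabulary list (as on this slide)
categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(
    key=feature_name,
    vocabulary_list=["kitchenware", "electronics", "sports"])

# indicator_column wraps it so the model receives a dense one-hot / multi-hot vector
one_hot_column = tf.feature_column.indicator_column(categorical_column)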
Empirical knowledge of data

● Text: stemming, lemmatization, TF-IDF, n-grams, embedding lookup
● Images: clipping, resizing, cropping, blur, Canny filters, Sobel filters, photometric distortions
Key points

● Data preprocessing: transforms raw data into a clean and training-ready dataset
● Feature engineering maps:
  ○ Raw data into feature vectors
  ○ Integer values to floating-point values
  ○ Normalizes numerical values
  ○ Strings and categorical values to vectors of numeric values
  ○ Data from one space into a different space
Feature Engineering

Feature Engineering
Techniques
Outline

● Feature Scaling

● Normalization and Standardization

● Bucketizing / Binning

● Other techniques
Feature engineering techniques

Numerical range:
● Scaling
● Normalizing
● Standardizing

Grouping:
● Bucketizing
● Bag of words
Scaling
● Converts values from their natural range into a
prescribed range
○ E.g. Grayscale image pixel intensity scale is [0,255]
usually rescaled to [-1,1]
image = (image - 127.5) / 127.5

● Benefits
  ○ Helps neural nets converge faster
  ○ Helps avoid NaN errors during training
  ○ Helps the model learn appropriate weights for each feature
Normalization

Maps values from their original range (e.g., [10, 10000]) into [0, 1].

[Plots: original vs. normalized feature distribution]
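A minimal sketch of this mapping, min-max normalization to [0, 1]; the values are made up for illustration:

import numpy as np

# Hypothetical feature values in the original range [10, 10000]
x = np.array([10.0, 250.0, 4000.0, 10000.0])

# Min-max normalization: x_norm = (x - x_min) / (x_max - x_min), which lands in [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)  # approximately [0.0, 0.024, 0.399, 1.0]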
Standardization (z-score)

● The z-score expresses a value as the number of standard deviations away from the mean
● Example: values in the original range (e.g., [10, 10000]) map to roughly [-3σ, +3σ] after standardization

[Plots: standard normal curve marked from -3σ to +3σ; original vs. standardized feature distribution]
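A minimal sketch of z-score standardization on the same kind of made-up values:

import numpy as np

# Hypothetical feature values
x = np.array([10.0, 250.0, 4000.0, 10000.0])

# z-score: z = (x - mean) / standard deviation
z = (x - x.mean()) / x.std()
print(z.mean(), z.std())  # approximately 0.0 and 1.0: centered, unit variance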
Bucketizing / Binning

Bucket boundaries at 1960, 1980, and 2000 split the date range into four buckets, which are then one-hot encoded:

Date Range              Represented as...
< 1960                  [1, 0, 0, 0]
>= 1960 but < 1980      [0, 1, 0, 0]
>= 1980 but < 2000      [0, 0, 1, 0]
>= 2000                 [0, 0, 0, 1]
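A minimal sketch of the same bucketing with NumPy; the year values are made up:

import numpy as np

# Hypothetical construction years
years = np.array([1955, 1972, 1999, 2015])

# Bucket boundaries from the table above; np.digitize returns a bucket index per value
boundaries = [1960, 1980, 2000]
bucket_ids = np.digitize(years, boundaries)      # [0, 1, 2, 3]

# One-hot encode the bucket index, matching the representation in the table
one_hot = np.eye(len(boundaries) + 1)[bucket_ids]
print(one_hot)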
Binning with Facets
Other techniques

Dimensionality reduction in embeddings:
● Principal component analysis (PCA)
● t-Distributed stochastic neighbor embedding (t-SNE)
● Uniform manifold approximation and projection (UMAP)

Feature crossing
TensorFlow embedding projector
● Intuitive exploration of
high-dimensional data

● Visualize & analyze

● Techniques

○ PCA

○ t-SNE

○ UMAP

○ Custom linear projections

● Ready to play

@ projector.tensorflow.org
Key points

● Feature engineering:
○ Prepares, tunes, transforms, extracts and constructs features.

● Feature engineering is key for model refinement


● Feature engineering helps with ML analysis
Feature Engineering

Feature Crosses
Outline

● Feature crosses
● Encoding features
Feature crosses

● Combines multiple features together into a new feature
● Encodes nonlinearity in the feature space, or encodes the same information in fewer features

We can create many different kinds of feature crosses, for example (a sketch follows below):

● [A x B]: multiplying the values of two features
● [A x B x C x D x E]: multiplying the values of five features
● [Day of week, Hour] => [Hour of week]
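A minimal sketch of the [Day of week, Hour] => [Hour of week] cross with TensorFlow feature columns; the column names and bucket size are assumptions for illustration, not taken from the slides:

import tensorflow as tf

# Assumes 'day_of_week' and 'hour' arrive as string-valued input features
hour_of_week = tf.feature_column.crossed_column(
    keys=["day_of_week", "hour"], hash_bucket_size=168)  # 7 days x 24 hours = 168 combinations

# Wrap in an indicator (or embedding) column before feeding it to a model
hour_of_week_one_hot = tf.feature_column.indicator_column(hour_of_week)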


Encoding features

[Scatter plot: healthy trees vs. sick trees with a classification boundary]

Need for encoding non-linearity

[Scatter plot: healthy trees vs. sick trees where a straight classification boundary cannot separate the classes well]
Census dataset
Key points

● Feature crossing: a synthetic feature encoding nonlinearity in the feature space.
● Feature coding: transforming a categorical feature into a continuous variable.


Feature Transformation At Scale

Preprocessing Data
At Scale
Probably not ideal

[Diagram: preprocessing implemented in Python for training and re-implemented in Java for serving, i.e. two separate code paths]

ML Pipeline (TensorFlow Extended)

[Diagram: the TFX pipeline over training & eval data: ExampleGen → StatisticsGen → SchemaGen → ExampleValidator → Transform → Trainer → Evaluator → Pusher, deploying to TensorFlow Serving, TensorFlow JS, and TensorFlow Lite]
Outline

● Inconsistencies in feature engineering

● Preprocessing granularity

● Pre-processing training dataset

● Optimizing instance-level transformations

● Summarizing the challenges


Preprocessing data at scale

● Real-world models: terabytes of data
● Large-scale data processing frameworks
● Consistent transforms between training & serving
Inconsistencies in feature engineering

● Training & serving code paths are different
● Diverse deployment scenarios:
  ○ Mobile (TensorFlow Lite)
  ○ Server (TensorFlow Serving)
  ○ Web (TensorFlow JS)
● Risks of introducing training-serving skew
● Skew will lower the performance of your serving model


Preprocessing granularity

Transformations come at two granularities:

Instance-level          Full-pass
Clipping                Min-max scaling
Multiplying             Standard scaling
Expanding features      Bucketizing
etc.                    etc.
When do you transform?

Pre-processing the training dataset

Pros                        Cons
Run once                    Transformations reproduced at serving
Compute on entire dataset   Slower iterations


How about ‘within’ a model?

Transforming within the model

Pros                        Cons
Easy iterations             Expensive transforms
Transformation guarantees   Long model latency
                            Transformations per batch: skew


Why transform per batch?

● For example, normalizing features by their average
● Access to a single batch of data, not the full dataset
● Ways to normalize per batch (a sketch follows below):
  ○ Normalize by the average within a batch
  ○ Precompute the average and reuse it during normalization
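A minimal sketch contrasting the two options with made-up numbers (this is not code from the slides):

import numpy as np

batch = np.array([4.0, 8.0, 12.0])    # one batch of a feature, made up for illustration

# Option 1: normalize by the average within the batch (the statistic drifts batch to batch)
centered_within_batch = batch - batch.mean()

# Option 2: precompute the average over the full training set and reuse it for every batch
precomputed_mean = 10.0               # hypothetical full-dataset mean computed ahead of time
centered_with_precomputed = batch - precomputed_mean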


Optimizing instance-level transformations

● Indirectly affect training efficiency
● Typically accelerators sit idle while the CPUs transform
● Solution:
  ○ Prefetching transforms for better accelerator efficiency (a tf.data sketch follows below)
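A minimal tf.data sketch of that prefetching idea; the transformation and dataset are placeholders, not from the slides:

import tensorflow as tf

def transform_fn(x):
    # Hypothetical instance-level transformation running on the CPU
    return (tf.cast(x, tf.float32) - 127.5) / 127.5

dataset = (tf.data.Dataset.range(1000)
           .map(transform_fn, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           # Prefetch: the CPU prepares the next batches while the accelerator trains
           .prefetch(tf.data.AUTOTUNE))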


Summarizing the challenges

● Balancing predictive performance


● Full-pass transformations on training data
● Optimizing instance-level transformations for better training efficiency
(GPUs, TPUs, …)
Key points

● Inconsistent data affects the accuracy of the results


● Need for scaled data processing frameworks to process large datasets
in an efficient and distributed manner
Preprocessing Data At Scale

TensorFlow Transform
Outline

● Going deeper

● Benefits of using TensorFlow Transform

● Applies feature transformations

● tf.Transform Analyzers
Enter tf.Transform

[Diagram: training data flows as input data into Transform, producing Transformed Data; the Trainer consumes it and produces Trained Models for the serving system, all on top of pipeline + metadata storage]


Inside TensorFlow Extended

[Diagram: the TFX pipeline over training & eval data: ExampleGen → StatisticsGen → SchemaGen → ExampleValidator → Transform → Trainer → Evaluator → Pusher, deploying to TensorFlow Serving, TensorFlow JS, and TensorFlow Lite]
tf.Transform layout

● User-provided transform code (tf.Transform)
● Schema for parsing
● Applied during training
● Embedded during serving
● Performance optimizations

[Diagram: ExampleGen and SchemaGen produce Data and a Schema; together with the user's Transform Code they feed the Transform component, which emits a Transform Graph and Transformed Data consumed by the Trainer]
tf.Transform: Going deeper

[Diagram:
Training: Raw Data is processed by tf.Transform (Apache Beam preprocessing plus the emitted tf.Transform TensorFlow graph) into Processed Data, which feeds Model Training and yields a Trained Model (TensorFlow graph).
Serving: an inference request hits a SavedModel containing both the tf.Transform TensorFlow graph and the trained model's TensorFlow graph, and returns a prediction.]
tf.Transform Analyzers

● They behave like TensorFlow ops, but run only once during training
● For example: tft.min computes the minimum of a tensor over the training dataset (a sketch follows below)
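A minimal sketch of analyzers inside a preprocessing function; the feature name 'x' is a placeholder:

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    x = inputs['x']
    # tft.min and tft.max are analyzers: each runs once over the full training dataset
    # during the analyze phase and is then embedded as a constant in the transform graph
    x_min = tft.min(x)
    x_max = tft.max(x)
    return {'x_scaled': (x - x_min) / (x_max - x_min)}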
How Transform applies feature transformations

[Diagram: the same transform graph is applied during training and during serving]
Benefits of using tf.Transform

● Emitted tf.Graph holds all necessary constants and transformations


● Focus on data preprocessing only at training time
● Works in-line during both training and serving
● No need for preprocessing code at serving time
● Consistently applied transformations irrespective of deployment
platform
Analyzers framework

tf.Transform analyzers, grouped by purpose:

● Scaling: scale_to_z_score, scale_to_0_1
● Bucketizing: quantiles, apply_buckets, bucketize
● Vocabulary: bag_of_words, tfidf, ngrams
● Dimensionality reduction: pca
tf.Transform preprocessing_fn

def preprocessing_fn(inputs):
    ...
    for key in DENSE_FLOAT_FEATURE_KEYS:
        outputs[key] = tft.scale_to_z_score(inputs[key])

    for key in VOCAB_FEATURE_KEYS:
        outputs[key] = tft.vocabulary(inputs[key], vocab_filename=key)

    for key in BUCKET_FEATURE_KEYS:
        outputs[key] = tft.bucketize(inputs[key], FEATURE_BUCKET_COUNT)

    return outputs


Commonly used imports

import tensorflow as tf
import apache_beam as beam
import apache_beam.io.iobase

import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
Feature Transformation At Scale

Hello World
with tf.Transform
Hello world with tf.Transform

1. Data: collect raw data
2. Define metadata: prepare metadata for the dataset using DatasetMetadata
3. Transform: define the preprocessing function with tf.Transform analyzers
4. Constant graph: generate a constant graph with the required transformations
Collect raw samples (Data)

[
{'x': 1, 'y': 1, 's': 'hello'},
{'x': 2, 'y': 2, 's': 'world'},
{'x': 3, 'y': 3, 's': 'hello'}
]
Inspect data and prepare metadata (Data)

from tensorflow_transform.tf_metadata import (
    dataset_metadata, dataset_schema)

raw_data_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec({
        'y': tf.io.FixedLenFeature([], tf.float32),
        'x': tf.io.FixedLenFeature([], tf.float32),
        's': tf.io.FixedLenFeature([], tf.string)
    }))
Preprocessing data (Transform)

def preprocessing_fn(inputs):
    """Preprocess input columns into transformed columns."""
    x, y, s = inputs['x'], inputs['y'], inputs['s']
    x_centered = x - tft.mean(x)
    y_normalized = tft.scale_to_0_1(y)
    s_integerized = tft.compute_and_apply_vocabulary(s)
    x_centered_times_y_normalized = (x_centered * y_normalized)
    return {
        'x_centered': x_centered,
        'y_normalized': y_normalized,
        's_integerized': s_integerized,
        'x_centered_times_y_normalized': x_centered_times_y_normalized,
    }
Tensors in… tensors out

Inputs:
  x = [1, 2, 3]
  y = [1, 2, 3]
  s = ['hello', 'world', 'hello']

TensorFlow Ops → Outputs:
  x - tft.mean(x)                        → x_centered = [-1.0, 0.0, 1.0]
  tft.scale_to_0_1(y)                    → y_normalized = [0.0, 0.5, 1.0]
  tft.compute_and_apply_vocabulary(s)    → s_integerized = [0, 1, 0]
  x_centered * y_normalized              → x_centered_times_y_normalized = [-0.0, 0.0, 1.0]
Running the pipeline

def main():
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        transformed_dataset, transform_fn = (
            (raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(
                preprocessing_fn))

    transformed_data, transformed_metadata = transformed_dataset

    print('\nRaw data:\n{}\n'.format(pprint.pformat(raw_data)))
    print('Transformed data:\n{}'.format(pprint.pformat(transformed_data)))

if __name__ == '__main__':
    main()
Before transforming with tf.Transform

# Raw data:
[{'s': 'hello', 'x': 1, 'y': 1},
{'s': 'world', 'x': 2, 'y': 2},
{'s': 'hello', 'x': 3, 'y': 3}]
After transforming with tf.Transform
# After transform
[{'s_integerized': 0,
'x_centered': -1.0,
'x_centered_times_y_normalized': -0.0,
'y_normalized': 0.0},
{'s_integerized': 1,
'x_centered': 0.0,
'x_centered_times_y_normalized': 0.0,
'y_normalized': 0.5},
{'s_integerized': 0,
'x_centered': 1.0,
'x_centered_times_y_normalized': 1.0,
'y_normalized': 1.0}]
Key points

● tf.Transform allows the pre-processing of input data and creating features
● tf.Transform allows defining pre-processing pipelines and their execution using large-scale data processing frameworks
● In a TFX pipeline, the Transform component implements feature engineering using TensorFlow Transform
Feature Selection

Feature Spaces
Outline

● Introduction to Feature Spaces


● Introduction to Feature Selection
● Filter Methods
● Wrapper Methods
● Embedded Methods
Feature space

● An N-dimensional space defined by your N features
● Not including the target label

[Diagrams: a feature vector, a 3D feature space over X0, X1, X2, and a 2D scatter plot over X0, X1]
Feature space

3D feature space example:

No. of Rooms (X0)   Area (X1)     Locality (X2)   Price (Y)
5                   1200 sq. ft   New York        $40,000
6                   1800 sq. ft   Texas           $30,000

Y = f(X0, X1, X2), where f is your ML model acting on the feature space X0, X1, X2
2D Feature space - Classification

[Scatter plots over x0, x1 showing three degrees of class separation: Ideal, Realistic, Poor]

Drawing the decision boundary

● The model learns a decision boundary
● The boundary is used to classify data points

[Scatter plot over x0, x1 with the learned decision boundary]
Feature space coverage

● Train/Eval datasets representative of the serving dataset


○ Same numerical ranges
○ Same classes
○ Similar characteristics for image data
○ Similar vocabulary, syntax, and semantics for NLP data
Ensure feature space coverage

● Data affected by: seasonality, trend, drift.


● Serving data: new values in features and labels.
● Continuous monitoring, key for success!
Feature Selection

Feature Selection
Feature selection

● Identify the features that best represent the relationship
● Remove features that don't influence the outcome
● Reduce the size of the feature space
● Reduce the resource requirements and model complexity

[Diagram: all features → feature selection → useful features]
Why is feature selection needed?

● Reduce storage and I/O requirements
● Minimize training and inference costs
Feature selection methods

● Unsupervised
● Supervised
Unsupervised feature selection

1. Unsupervised
   ● The feature-target relationship is not considered
   ● Removes redundant features (correlation), as sketched below
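A minimal pandas sketch of that idea, dropping one feature from every highly correlated pair; the DataFrame and the 0.9 threshold are assumptions for illustration:

import numpy as np
import pandas as pd

def drop_redundant_features(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    # Absolute pairwise correlations between features (the target is never consulted)
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop one feature from every pair whose correlation exceeds the threshold
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)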
Supervised feature selection

2. Supervised
● Uses features-target variable relationship
● Selects those contributing the most
Supervised methods

● Filter methods
● Wrapper methods
● Embedded methods
Practical example

● Feature selection techniques on the Breast Cancer (Diagnostic) Dataset
● Predicting whether a tumour is benign or malignant
Feature list

The dataset has 30 numeric features per tumour: the mean, standard error, and "worst" values of radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension, plus the diagnosis label. The id and Unnamed: 32 (all NaN) columns are irrelevant features.


Performance evaluation

We train a RandomForestClassifier model (sklearn.ensemble) on the selected features; a sketch of the evaluation helper used for the tables in this section appears below.

Metrics (sklearn.metrics):

Method         Feature Count   Accuracy   AUROC      Precision   Recall    F1 Score
All Features   30              0.967262   0.964912   0.931818    0.97619   0.953488
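Later slides call a helper named evaluate_model_on_features whose body is not shown; the following is only a plausible sketch of such a helper under the setup described above (RandomForestClassifier plus sklearn.metrics):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, roc_auc_score, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import train_test_split
import pandas as pd

def evaluate_model_on_features(X, Y):
    """Train a RandomForestClassifier on the given features and report the metrics."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, stratify=Y, random_state=123)
    model = RandomForestClassifier().fit(X_train, Y_train)
    preds = model.predict(X_test)
    return pd.DataFrame([{
        'Feature Count': X.shape[1],
        'Accuracy': accuracy_score(Y_test, preds),
        'AUROC': roc_auc_score(Y_test, preds),
        'Precision': precision_score(Y_test, preds),
        'Recall': recall_score(Y_test, preds),
        'F1 Score': f1_score(Y_test, preds),
    }])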


Feature Selection

Filter Methods
Filter methods

Within supervised feature selection:
● Filter methods: correlation, univariate feature selection
● Wrapper methods
● Embedded methods
Filter methods

● Correlated features are usually redundant, so remove them!

Popular filter methods:
● Pearson correlation
  ○ Between features, and between the features and the label
● Univariate feature selection
Filter methods

[Flow: Set of All Features → Selecting the Best Subset → ML Model → Performance]
Correlation matrix

● Shows how features are related:
  ○ To each other (bad)
  ○ And with the target variable (good)
● Values fall in the range [-1, 1]
  ○ 1: high positive correlation
  ○ -1: high negative correlation

[Heatmap of the correlation matrix over features + target, color scale from -0.2 to 1.0]
Feature comparison statistical tests

● Pearson's correlation: linear relationships
● Kendall Tau rank correlation coefficient: monotonic relationships & small sample size
● Spearman's rank correlation coefficient: monotonic relationships

Other methods:
● Mutual information
● F-Test
● Chi-Squared test

A pandas sketch of the three correlation measures follows below.
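A minimal pandas sketch of the three correlation coefficients; the toy DataFrame is made up for illustration:

import pandas as pd

# Toy DataFrame: two features plus a binary target column (values are made up)
df_toy = pd.DataFrame({
    'radius_mean':   [12.0, 20.5, 13.1, 18.7],
    'texture_mean':  [14.2, 21.3, 15.0, 19.9],
    'diagnosis_int': [0, 1, 0, 1],
})

pearson  = df_toy.corr(method='pearson')    # linear relationships (the pandas default)
kendall  = df_toy.corr(method='kendall')    # monotonic relationships, small samples
spearman = df_toy.corr(method='spearman')   # monotonic relationships

# Absolute correlation of each column with the target
print(pearson['diagnosis_int'].abs().sort_values(ascending=False))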
Determine correlation

# Pearson's correlation by default
cor = df.corr()

plt.figure(figsize=(20,20))
# Seaborn heatmap of the correlation matrix
sns.heatmap(cor, annot=True, cmap=plt.cm.PuBu)
plt.show()
Selecting features

cor_target = abs(cor["diagnosis_int"])

# Keep features whose absolute correlation with the target is above 0.2
relevant_features = cor_target[cor_target > 0.2]
Performance table

Method         Feature Count   Accuracy   AUROC      Precision   Recall    F1 Score
All Features   30              0.967262   0.964912   0.931818    0.97619   0.953488
Correlation    21              0.974206   0.973684   0.953488    0.97619   0.964706  (best result)
Univariate feature selection in SKLearn

SKLearn univariate feature selection routines:
1. SelectKBest
2. SelectPercentile
3. GenericUnivariateSelect

Statistical tests available:
● Regression: f_regression, mutual_info_regression
● Classification: chi2, f_classif, mutual_info_classif
SelectKBest implementation

def univariate_selection():
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, stratify=Y, random_state=123)

    X_train_scaled = StandardScaler().fit_transform(X_train)
    X_test_scaled = StandardScaler().fit_transform(X_test)

    # Chi-squared requires non-negative values, so rescale to [0, 1]
    min_max_scaler = MinMaxScaler()
    Scaled_X = min_max_scaler.fit_transform(X_train_scaled)

    selector = SelectKBest(chi2, k=20)  # Use the Chi-Squared test
    X_new = selector.fit_transform(Scaled_X, Y_train)
    feature_idx = selector.get_support()
    feature_names = df.drop("diagnosis_int", axis=1).columns[feature_idx]
    return feature_names
Performance table

Method              Feature Count   Accuracy   AUROC      Precision   Recall    F1 Score
All Features        30              0.967262   0.964912   0.931818    0.97619   0.953488
Correlation         21              0.974206   0.973684   0.953488    0.97619   0.964706  (best result)
Univariate (Chi2)   20              0.960317   0.95614    0.91111     0.97619   0.94252
Feature Selection

Wrapper Methods
Wrapper methods

Within supervised feature selection:
● Filter methods: correlation, univariate feature selection
● Wrapper methods: forward elimination, backward elimination, recursive feature elimination
● Embedded methods
Wrapper methods

[Flow: Set of All Features → Generate a Subset → ML Model → Performance, iterating to select the best subset]
Wrapper methods

Popular wrapper methods


1. Forward Selection
2. Backward Selection
3. Recursive Feature Elimination
Forward selection

1. Iterative, greedy method


2. Starts with 1 feature
3. Evaluate model performance when adding each of the additional
features, one at a time
4. Add next feature that gives the best performance
5. Repeat until there is no improvement
Backward elimination

1. Start with all features
2. Evaluate model performance when removing each of the included features, one at a time
3. Remove the next feature that gives the best performance
4. Repeat until there is no improvement

(A scikit-learn sketch of forward and backward selection follows below.)
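The slides implement RFE below but do not show code for forward or backward selection, so here is only a minimal scikit-learn sketch using SequentialFeatureSelector (available in sklearn.feature_selection from version 0.24), assuming X is the feature DataFrame and Y the labels from this example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

model = RandomForestClassifier(random_state=47)

# direction='forward' starts from no features and adds one at a time;
# direction='backward' starts from all features and removes one at a time
sfs = SequentialFeatureSelector(model, n_features_to_select=20, direction='forward')
sfs.fit(X, Y)

selected_feature_names = X.columns[sfs.get_support()]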
Recursive feature elimination (RFE)

1. Select a model to use for evaluating feature importance


2. Select the desired number of features
3. Fit the model
4. Rank features by importance
5. Discard least important features
6. Repeat until the desired number of features remains
Recursive feature elimination

def run_rfe():
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

    X_train_scaled = StandardScaler().fit_transform(X_train)
    X_test_scaled = StandardScaler().fit_transform(X_test)

    model = RandomForestClassifier(criterion='entropy', random_state=47)
    rfe = RFE(model, n_features_to_select=20)
    rfe = rfe.fit(X_train_scaled, y_train)
    feature_names = df.drop("diagnosis_int", axis=1).columns[rfe.get_support()]
    return feature_names

rfe_feature_names = run_rfe()

rfe_eval_df = evaluate_model_on_features(df[rfe_feature_names], Y)
rfe_eval_df.head()
Performance table

Method                          Feature Count   Accuracy   AUROC      Precision   Recall    F1 Score
All Features                    30              0.96726    0.96491    0.931818    0.97619   0.953488
Correlation                     21              0.97420    0.97368    0.953488    0.97619   0.964705
Univariate (Chi2)               20              0.96031    0.95614    0.91111     0.97619   0.94252
Recursive Feature Elimination   20              0.97420    0.97368    0.953488    0.97619   0.964706  (best result)
Feature Selection

Embedded Methods
Embedded methods

Within supervised feature selection:
● Filter methods: correlation, univariate feature selection
● Wrapper methods: forward elimination, backward elimination, recursive feature elimination
● Embedded methods: L1 regularization, feature importance
Feature importance

● Assigns scores for each feature in data


● Discard features scored lower by feature importance
Feature importance with SKLearn

● A feature importance score is built into tree-based models (e.g., RandomForestClassifier)
● Feature importance is available as the property feature_importances_
● We can then use SelectFromModel to select features from the trained model based on the assigned feature importances
Extracting feature importance

def feature_importances_from_tree_based_model_():
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, stratify=Y, random_state=123)
    model = RandomForestClassifier()
    model = model.fit(X_train, Y_train)

    feat_importances = pd.Series(model.feature_importances_, index=X.columns)
    feat_importances.nlargest(10).plot(kind='barh')
    plt.show()

    return model
Feature importance plot

[Horizontal bar chart of the 10 largest feature importances; importance values range from 0 to about 0.15]


Select features based on importance

def select_features_from_model(model):
    model = SelectFromModel(model, prefit=True, threshold=0.012)
    feature_idx = model.get_support()
    feature_names = df.drop("diagnosis_int", axis=1).columns[feature_idx]
    return feature_names
Tying together and evaluation

# Calculate and plot feature importances
model = feature_importances_from_tree_based_model_()

# Select features based on feature importances
feature_imp_feature_names = select_features_from_model(model)
Performance table

Method                          Feature Count   Accuracy   ROC        Precision   Recall      F1 Score
All Features                    30              0.96726    0.964912   0.931818    0.9761900   0.953488
Correlation                     21              0.97420    0.973684   0.953488    0.9761904   0.964705
Univariate Feature Selection    20              0.96031    0.95614    0.91111     0.97619     0.94252
Recursive Feature Elimination   20              0.9742     0.973684   0.953488    0.97619     0.964706  (best result)
Feature Importance              14              0.96726    0.96491    0.931818    0.97619     0.953488
Review

● Intro to Preprocessing
● Feature Engineering
● Preprocessing Data at Scale
○ TensorFlow Transform
● Feature Spaces
● Feature Selection
○ Filter Methods
○ Wrapper Methods
○ Embedded Methods
