
ADVICE ABOUT PRACTICAL ASPECTS OF ML

Jesse Davis
Goals of this Lecture: Address Practical Aspects of Machine Learning

 Massaging the data for better performance
 Discussing how to set up an appropriate empirical evaluation
 Identifying potential pitfalls
 At a high level: a bunch of things I wish I had known for
   Performing academic empirical evaluations
   Dealing with real-world "applied" tasks
Part I: Selecting Features
Dimensionality Reduction

 Represent the data with fewer dimensions! ☺
 Effectively: Alter the given feature space
 Two broad ways:
   Construct a new feature space
   Simply drop dimensions in the given space
Why Dimensionality Reduction?

 Easier learning – fewer parameters
   What if |features| ≫ |training examples|?
 Better visualization
   Hard to understand more than 3D or 4D
 Discover the "intrinsic dimensionality" of the data
   High-dimensional data may truly be low dimensional
 More interpretable models
   Interested in which features are relevant for the task
 Improved efficiency
   Fewer features = less memory / runtime
Don’t Some Algorithms Do This?

 Decision trees:
   Select the most promising feature at each node
   The tree only contains a subset of the features
 Problem: Irrelevant attributes can degrade performance due to data fragmentation
   Data is split into smaller and smaller sets
   Even a random attribute can look good on little data, purely by chance
   More data does not help

Principal Component Analysis

 First principal component: the direction of the largest variance
 Each subsequent principal component:
   Orthogonal to the previous ones, and
   The direction of the largest variance of the residuals

[Figure: 2D data in the (x1, x2) plane with the first principal component direction u1]

Big Idea: Rotate the axes and drop irrelevant ones!
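To make the rotate-and-drop idea concrete, here is a minimal sketch using scikit-learn's PCA (illustrative, not from the slides; the toy data and the choice of two components are assumptions):

```python
# Minimal PCA sketch: fit on toy data, keep only the highest-variance directions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 200 examples, 10 features (toy data)

pca = PCA(n_components=2)                 # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)          # rotate the axes, then drop the rest

print(X_reduced.shape)                    # (200, 2)
print(pca.explained_variance_ratio_)      # fraction of variance captured per component
```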


Eigenfaces [Turk, Pentland ’91]

Input images:
 N images
 Each 50 × 50 pixels
 2500 features

The figure can be misleading: it is best to think of the data as an N × 2500 matrix, i.e., |examples| × |features|.
Reduce Dimensionality: 2500 → 15

[Figure: the average face, the first principal component, and the other eigenface components]
Problematic Data Set for PCA

PCA cannot capture NON-LINEAR structure!


PCA Conclusions

 PCA:
   Rotate the axes and sort the new dimensions in order of "importance"
   Discard the low-significance dimensions
 Uses:
   Get a compact description
   Ignore noise
   Improve classification (hopefully)
 Not magic:
   Doesn't know the class labels
   Can only capture linear variation
 One of many tricks to reduce dimensionality!

Feature Selection: Two Approaches

Filtering-based feature selection:
 All features → FS algorithm scores and ranks each feature and picks the top k → ML algorithm → model

Wrapper-based feature selection:
 All features → FS algorithm calls the ML algorithm many times and uses it to help select features → ML algorithm → model
Filter-Based Approaches

 Idea: Measure each feature's usefulness in isolation (i.e., independently of the other features)
 Pro: Very fast, so it scales to large feature sets or large data sets
 Cons:
   Misses feature interactions
   May select many redundant features
Approach 1: Correlation

 Information gain (for a discrete attribute A):

$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$

 Pearson correlation (for a continuous feature $f_i$):

$R(f_i, y) = \frac{\mathrm{cov}(f_i, y)}{\sqrt{\mathrm{var}(f_i)\,\mathrm{var}(y)}} = \frac{\sum_{k=1}^{m} (f_{k,i} - \bar{f}_i)(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (f_{k,i} - \bar{f}_i)^2 \, \sum_{k=1}^{m} (y_k - \bar{y})^2}}$
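A minimal sketch of a correlation-based filter (illustrative; the data X, target y, and the value of k are assumptions, not from the slides):

```python
# Correlation-based filter: score each feature in isolation, keep the top k.
import numpy as np

def top_k_by_correlation(X, y, k):
    """X: (m, d) feature matrix, y: (m,) target. Returns indices of the k
    features with the largest absolute Pearson correlation with y."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 3] + 0.1 * rng.normal(size=100)   # feature 3 is informative
print(top_k_by_correlation(X, y, k=5))     # feature 3 should rank first
```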
Approach 2: Single-Variable Classifier

 Select each variable according to its individual predictive performance
 Build a classifier with just that one variable:
   Discrete: Decision stump
   Continuous: Threshold the variable's value
 Measure performance using accuracy, balanced accuracy, AUC, etc.
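For continuous features and a binary label, one convenient single-variable score is the AUC obtained by thresholding the feature itself; a minimal sketch (illustrative, not from the slides):

```python
# Score each continuous feature as a one-variable threshold classifier via AUC.
# (Illustrative sketch; X, y are assumed to be a binary-labeled toy dataset.)
import numpy as np
from sklearn.metrics import roc_auc_score

def single_variable_auc(X, y):
    """Returns one AUC per feature; using the raw feature value as the score
    is equivalent to sweeping over all possible thresholds."""
    aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
    return np.maximum(aucs, 1.0 - aucs)    # a feature that ranks "backwards" is still useful

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 2] > 0).astype(int)              # feature 2 determines the label
print(single_variable_auc(X, y).round(2))  # feature 2 should score near 1.0
```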
Wrapper-Based Feature Selection

 Feature selection = search
   State = a set of features
   Start state:
    ◼ Forward selection: Empty set
    ◼ Backward elimination: Full set
 Operators:
   Forward: add a feature
   Backward: remove a feature
 Scoring function: The learned model's performance (on a tuning set or via cross-validation) using the state's feature set
Forward Feature Selection

Greedy search (aka "hill climbing"):

 Start with the empty set {}: 50%
 Add one feature: {F1} 62%, {F2} 72%, ..., {Fd} 52% → keep the best: {F2}
 Add a second feature to {F2}: {F1,F2} 74%, {F2,F3} 73%, ..., {F2,Fd} 84% → keep the best, and continue
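A minimal sketch of greedy forward selection as a wrapper (illustrative; the logistic regression model, 5-fold CV scoring, and stopping rule are assumptions, not from the slides):

```python
# Greedy forward selection: repeatedly add the feature that most improves CV score.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

def forward_select(X, y, max_features):
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        scores = {j: cross_val_score(LogisticRegression(max_iter=1000),
                                     X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:       # no candidate improves: stop
            break
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_score

# Toy usage
X, y = make_classification(n_samples=300, n_features=15, n_informative=4, random_state=0)
print(forward_select(X, y, max_features=5))
```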
Backward Feature Selection

Greedy search (aka "hill climbing"):

 Start with all features {F1,…,Fd}: 75%
 Remove one feature: {F2,…,Fd} 72%, {F1,F3,…,Fd} 82%, ..., {F1,…,Fd-1} 78% → keep the best: remove F2
 Remove a second feature from {F1,F3,…,Fd}: {F3,…,Fd} 80%, {F1,F4,…,Fd} 83%, ..., {F1,F3,…,Fd-1} 81% → keep the best, and continue
Forward vs. Backward Selection

Forward:
 Faster in the early steps because there are fewer features to test
 Fast when choosing a small subset of the features
 Misses features whose usefulness requires other features (feature synergy)

Backward:
 Fast when choosing all but a small subset of the features
 Preserves features whose usefulness requires other features (e.g., area requires both length and width)
[Figure: Impact of feature selection on classification of fMRI data [Pereira et al. ’05]]
Feature Selection vs. Dimensionality Reduction

 Feature selection: Project onto a lower-dimensional subspace perpendicular to the removed feature
 Dimensionality reduction: Allows other kinds of projection

[Figure: left, dropping x2 projects the data onto the x1 axis; right, projecting the data onto rotated axes]
Feature Selection in Practice

 You cannot globally select the best features over the entire data set before evaluation
   This is cheating: data leaks from the test set into the training set
   Results would be overoptimistic
 Feature selection must be performed separately for each fold (see the sketch below)
 Implication: Each fold could have a different feature set
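One way to guarantee that selection happens per fold is to wrap it in a pipeline, so the selector is refit on each training split; a minimal scikit-learn sketch (illustrative; the SelectKBest filter and the classifier are assumptions, not from the slides):

```python
# Feature selection refit inside each CV fold via a Pipeline,
# so no information from a fold's test split leaks into selection.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),       # filter chosen per training split
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())     # honest estimate: no leakage
```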


Advice for Evaluation
Empirical Evaluation: Think about What You Want to Demonstrate

 Many relevant questions:
   Do we beat competitors?
   Are we more data efficient than the competition?
   Are we faster than the competition?
 Good practices:
   Pose a question / hypothesis and answer it
   Also include a naive baseline (see the sketch below), such as one that:
    ◼ Always predicts the majority class
    ◼ Returns the mean value in the training data
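scikit-learn's dummy estimators implement exactly these naive baselines; a minimal sketch (illustrative, not from the slides):

```python
# Naive baselines: always predict the majority class / the training-set mean.
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

X = np.zeros((6, 1))                       # features are ignored by dummy estimators
y_cls = np.array([0, 0, 0, 0, 1, 1])       # imbalanced toy labels
y_reg = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

majority = DummyClassifier(strategy="most_frequent").fit(X, y_cls)
mean_reg = DummyRegressor(strategy="mean").fit(X, y_reg)

print(majority.predict(X))                 # all 0s: the majority class
print(mean_reg.predict(X))                 # all 3.5: the training mean
# If your model cannot clearly beat these baselines, the result is not convincing.
```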
Case Study: RPE for Professional Soccer Players

 Given: GPS and accelerometer data from a player's training session
 Predict: The player's rating of perceived exertion (RPE)
 Question: Is the model valid across seasons?

[Figure: MAE (0.00–1.20) of the train-set average baseline, a neural net, and LASSO]
Results: Is an Individual Model More Accurate Than a Team Model?

[Figure: mean absolute error (0.65–0.90) for individual vs. team models, for a neural net, a boosted tree, and LASSO; lower is better]
How Does the Amount of Data Affect Performance?

[Figure: AUCPR (0.00–0.50) vs. number of training databases (1, 2, 3) for TODTLER, DTM, LSM, and a random baseline]

 Learning curve: Show performance as a function of the amount of training data
Case Study: Activity Recognition

 Given: 3D accelerometer data from a phone
 Predict: The person's activity (walking, ascending stairs, descending stairs, cycling, jogging)
 Hypothesis: Deriving new signals will help
 Setup: Simulate different attachments by rotating the axes
 Approaches compared:
   TSFuse + GBT
   TSFresh + GBT (time-series features, but no fusion)
   RNN (LSTM)
Results: Activity Recognition

[Figure: performance comparison of TSFuse, TSFresh, and the RNN]
Case Study: Energy-Efficient Prediction

 Motivation: Learned models are often deployed on devices with resource constraints (e.g., battery)
 Question: How does the feature selection strategy affect performance?
   Static selection: Always consider k features
   Dynamic selection: May ignore some features
 Approach: Fix a maximum feature budget
RCV: Speedup and Weighted Accuracy vs. Feature Budget

[Figure: speedup factor (0–6) and Δ weighted accuracy (−0.01 to 0.02) as a function of the feature budget (0–1000) for IG and ΔCP. Annotation: our approach makes 4X more predictions on the same resource budget]
Comparing Run Times Is a Dark Art

 What to measure: Wall-clock time or CPU time? (see the sketch below)
 Be sure to run everything on identically configured machines
 Should you include the time to tune models?
   Easy to manipulate
   Also very relevant…
 Differences can be due to:
   Programming languages
   How optimized the code is (definitely relevant)
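For reference, Python's standard library exposes both clocks; a minimal sketch of the distinction (illustrative, not from the slides):

```python
# Wall-clock time vs. CPU time for the same piece of work.
import time

def work():
    return sum(i * i for i in range(10_000_000))

wall_start, cpu_start = time.perf_counter(), time.process_time()
work()
wall = time.perf_counter() - wall_start   # elapsed real time (includes waiting, other load)
cpu = time.process_time() - cpu_start     # CPU time consumed by this process only

print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
# Report which one you measured; on a busy or multi-threaded machine they can differ a lot.
```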
Evaluate Design Decisions: Ablation or Lesion Study

 When designing your algorithm / model you make lots of design choices:
   Which features
   Which normalizations
   Which functionality
 Ablative analysis tries to explain the difference between some (much poorer) baseline performance and the current performance
 Remove aspects of the system and measure the effect on performance (see the sketch below)
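A minimal sketch of an ablation loop over feature groups (illustrative; the feature groups, the gradient-boosted model, and the CV metric are assumptions, not from the slides):

```python
# Ablation over feature groups: drop one group at a time, retrain, and compare.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=0)
groups = {"gps": [0, 1, 2, 3], "accelerometer": [4, 5, 6, 7], "derived": [8, 9, 10, 11]}  # hypothetical

def cv_score(cols):
    return cross_val_score(GradientBoostingClassifier(), X[:, cols], y, cv=5).mean()

full = cv_score(list(range(X.shape[1])))
print(f"full model: {full:.3f}")
for name, cols in groups.items():
    kept = [c for c in range(X.shape[1]) if c not in cols]
    score = cv_score(kept)
    print(f"without {name}: {score:.3f}  (drop = {full - score:.3f})")
```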
Case Study: Fatigue Protocol Data

 Rating of perceived exertion (RPE): scale of 6–20
 Given: IMU data from a runner (sensor placements: upper arm, wrist, tibia, or both)
 Predict: The runner's current fatigue level
Pre-processing: Normalizations Based on Domain Knowledge

 RPE evolution is trial-dependent: Normalize to the first value
 Normalize features based on the change from the first windows
 Domain insight: The change in feature values over time is key
Effects of Feature Normalization for Gradient Boosted Trees

[Figure: MAE of RPE for two no-learning baselines that make constant predictions (median RPE and personalized median) and for gradient boosted trees with and without feature normalization]
Case Study: Resource Monitoring

 Given: Real water usage data from a retail store
   Univariate measurement, sampled every 5 minutes
   Patterns of interest include maintenance and abnormally high usage
 Do: Detect periods of high usage
 Approach: Semi-supervised learning
   Simple statistical features, day of week, etc.
   The above features plus learned shape patterns
Results: Anomaly Detection for Water Usage

[Figure: area under the ROC curve over time for simple features vs. simple features plus learned patterns]
Potential Problems or Pitfalls
Cross-Validation Errors

 You must repeat the entire data processing pipeline on every fold of cross-validation, using only that fold's TRAINING DATA
   E.g., you cannot do preprocessing over the entire data set (feature selection, parameter tuning, etc.)
 Did I tweak my algorithm a million times until I got good results?
   Solution: Use one or two datasets for development, then expand the evaluation
 Are there temporal dependencies in the data?
Temporal Data Is Trickier!

 Setting 1: One season of data from training sessions of a professional football team (see the sketch below)
   [Timeline: from season start to season end; training uses the first 80% of the data, testing the last 20%]
 Setting 2: Predicting adverse drug reactions
   [Timeline: the patient's history forms the training data; after the first prescription, a censoring window determines whether an adverse reaction occurs]
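A minimal sketch of a chronological split for such temporal data (illustrative; the function and argument names are assumptions, not from the slides):

```python
# Chronological train/test split: never let the model peek at the future.
import numpy as np

def chronological_split(X, y, timestamps, train_fraction=0.8):
    """Sort by time, train on the earliest train_fraction, test on the rest."""
    order = np.argsort(timestamps)
    cut = int(len(order) * train_fraction)
    train_idx, test_idx = order[:cut], order[cut:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy usage
t = np.arange(100)                              # timestamps in chronological order
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.arange(100) % 2
X_tr, y_tr, X_te, y_te = chronological_split(X, y, t)
print(len(X_tr), len(X_te))                     # 80 20

# A standard shuffled K-fold split would mix late-season sessions into training
# while earlier sessions land in the test set, leaking information that would
# not be available at prediction time.
```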


Class Imbalance

 Real-world problems often have more examples of one class (negatives) than the other (positives)
 One class is rare: anomaly detection, cancer, goals in a soccer match, etc.
 This causes difficulties for learners: It is hard to beat always predicting the majority class!
Idea 1: Sampling

 Oversample the minority class: May lead to overfitting
 Undersample the majority class: Odd to throw away data
 SMOTE: Generate synthetic minority examples (see the sketch below)
   Find a minority example's nearest neighbors
   Interpolate between them

[Figure: a minority example and one of its neighbors, with a synthetic example placed on the line segment between them]
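A minimal sketch of the SMOTE-style interpolation step (illustrative; real implementations, e.g., imbalanced-learn's SMOTE, add more machinery):

```python
# SMOTE-style oversampling: interpolate between a minority point and one of
# its nearest minority-class neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_synthetic, k=5, seed=0):
    """X_min: (n, d) minority-class examples. Returns n_synthetic new points."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: each point is its own neighbor
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))                      # pick a minority example
        j = idx[i][rng.integers(1, k + 1)]                # pick one of its k nearest neighbors
        lam = rng.random()                                # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# Toy usage
X_min = np.random.default_rng(1).normal(size=(20, 3))    # toy minority class
print(smote_like(X_min, n_synthetic=5).shape)             # (5, 3)
```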
Idea 2: Manipulate the Learner

 Change the cost function: Penalize mistakes on the minority class more heavily (see the sketch below)
 Optimize towards something that better captures the skew:
   Balanced accuracy: $\mathrm{BA} = \frac{0.5 \times TP}{TP + FN} + \frac{0.5 \times TN}{FP + TN}$
   F1: $F_1 = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
   ROC curves
   Precision / recall curves
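Both ideas are available off the shelf; a minimal scikit-learn sketch (illustrative; the logistic regression model and the synthetic 5%-positive dataset are assumptions, not from the slides):

```python
# Penalize minority-class mistakes more heavily and score with skew-aware metrics.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)  # 5% positives
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights the loss inversely to class frequency
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("balanced accuracy:", balanced_accuracy_score(y_te, pred))
print("F1:", f1_score(y_te, pred))
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```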
My Model Is Not Accurate Enough

 Suppose: Activity recognition into walking, running, ascending stairs, and descending stairs
   Five minutes of data from ten subjects
   Divide the data into 5-second windows, which yields 600 examples
   Use five simple features from the X, Y, Z acceleration
   Train a linear separator using the log loss
   Optimize using gradient descent
   Leave-one-subject-out CV: 70% accuracy

Question: What do I do?
Possible Fixes

 More data
 More / better features
 Change the optimizer
 Change the objective function
 Change the model class

Question: What do I do?
 Option 1: "Grad student descent" (try everything)
 Option 2: Debug the learning process
Look at the Learning Curve

[Figure: two learning curves plotting training and test error against the number of training examples]

 Case 1: Train and test error are both high and close together
   ⇒ More/better features
   ⇒ A more expressive model?
 Case 2: Train error is low, but test error is high
   ⇒ More data
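scikit-learn can generate such curves directly; a minimal sketch (illustrative; the model and synthetic data are assumptions, not from the slides):

```python
# Plot training vs. cross-validated test error as the training set grows.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

plt.plot(sizes, 1 - train_scores.mean(axis=1), label="train error")
plt.plot(sizes, 1 - test_scores.mean(axis=1), label="test error")
plt.xlabel("# training examples"); plt.ylabel("error"); plt.legend()
plt.show()
# A large gap between the curves suggests more data will help;
# two high, close curves suggest better features or a more expressive model.
```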


Conclusions

 Feature selection is important in practice
 Think about what you want to show in your empirical evaluation
 Practical issues are hard:
   It is a lot of guess-and-check at first
   Eventually you develop intuitions
 Generally speaking: Features and data matter more than the model

Questions?