Week 4 Lecture Slides BUS265 2023

This document discusses evaluating machine learning models to avoid overfitting. It introduces concepts like training and test sets, cross-validation, and various performance metrics. A case study on predicting used car prices is presented to illustrate model building and selection. Different regression models are fit to the car price data and their performance on training and test sets is compared to select the best model.

BUS265 Machine Learning and Digital Technology
Lecture 4: Building a Machine Learning Model for Prediction

Dr Valentin Danchev
School of Business and Management
Queen Mary University of London
Model performance

• When building a supervised model, how do I know that my model is any good?
• With powerful, flexible algorithms searching for patterns or models, there is a serious danger of overfitting.
  - Overfitting is sometimes a difficult concept.
  - The general idea is that "if you look hard enough, you'll find something", even if it does not generalize beyond the particular training data.
  - "If you torture the data long enough, it will confess" (Ronald Coase).

2
Machine learning process

3
Model performance
• Generalisation: we want models to apply not just to the exact training set, but to the general population from which the training data came.

4
Model performance

• There is no single choice or procedure that will eliminate over-fitting.
  - Recognize over-fitting and manage complexity in a principled way.

5
Model complexity via geometric interpretation
3 models
(which do you prefer?)

6
How can we judge whether our modeling has overfit?

(Figure: under-fitting, good fit, and over-fitting)

7
Tool for model performance evaluation:
The fitting curve
(Figure: fitting curve, ranging from under-fitting through a good fit to over-fitting)
• Over-fitting: the model "memorizes" the properties of the particular training set rather than learning the underlying phenomenon.
• In-sample evaluation is in favour of "memorizing".
• On the training data such a memorizing model would look best, but on new data it would perform badly.

8
Finding the best-fitting model

Holdout data = Test data

9
Holdout validation

We are interested in generalization: the performance on data not used for training.
Given only one data set, we hold out some data for evaluation.
• The holdout set used for final evaluation is called the test set.
• Accuracy on the training data is sometimes called "in-sample" accuracy, vs. "out-of-sample" accuracy on the test data.
  - a.k.a. "holdout accuracy"
  - an estimate of "generalization accuracy"

10
Holdout validation — simple hold-out set
Partition the data into a training set and a test set (e.g. 2/3 to 1/3, or 80% to 20%); a minimal code sketch follows below.
• In some domains it makes sense to partition temporally (training set before time t, test set after time t).

Challenges:
1. What if, by accident, you selected a particularly easy or hard test set?
2. Do you have an idea of the variation in model accuracy due to training? What would the model accuracy be if you selected a different training set?
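A minimal sketch of this hold-out split using scikit-learn, with a small toy stand-in for the used-cars data (the column names here are illustrative assumptions, not necessarily those used in the Colab notebook):

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Toy stand-in for the used-cars data (hypothetical columns)
    cars = pd.DataFrame({
        "age": [1, 3, 5, 7, 9, 11, 2, 4, 6, 8],                    # years
        "odometer": [10, 40, 65, 90, 120, 150, 25, 55, 80, 105],   # thousand miles
        "price": [18, 14, 11, 9, 7, 5, 16, 12, 10, 8],             # thousand dollars
    })

    X = cars[["age", "odometer"]]
    y = cars["price"]

    # 80% / 20% train/test partition; random_state makes the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)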

11
Holdout validation — Cross-validation (CV)
(Figure: 4-fold CV)
• Partition the data into k "folds" (randomly).
• Run the training/test evaluation k times.

12
Holdout validation — Cross-validation (CV)

• Each fold is the test set once (the remaining folds are combined to form the training set).
• Eventually the model is tested on all the data (each data point exactly once).
• We can compute the average and variance of the accuracy measure(s) across folds.
• Better use of a limited dataset: CV computes its estimates over all the data (see the sketch below).
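A minimal sketch of k-fold cross-validation with scikit-learn, using toy data in place of the used-cars dataset (the model and data are illustrative assumptions, not the notebook's exact setup):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    # Toy data standing in for used-car features (age, odometer) and prices
    X = np.array([[1, 10], [3, 40], [5, 65], [7, 90], [9, 120], [11, 150], [2, 25], [4, 55]])
    y = np.array([18, 14, 11, 9, 7, 5, 16, 12])

    # 4-fold CV: each fold serves as the test set once.
    # scikit-learn returns negative MSE for the "neg_mean_squared_error" scoring.
    scores = cross_val_score(LinearRegression(), X, y, cv=4, scoring="neg_mean_squared_error")
    mse_per_fold = -scores

    print("MSE per fold:", mse_per_fold)
    print("Mean MSE:", mse_per_fold.mean(), "Variance:", mse_per_fold.var())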

13
How to choose the model
Measuring predictive ability

Metrics to evaluate classification models
• Accuracy: the fraction of predictions our model got right.
• But in business applications, different errors (different decisions) have different costs and benefits associated with them.

14
Metrics to evaluate classification models—Accuracy

• Accuracy is the fraction of predictions our model got right.
• But in business applications, different errors (different decisions) have different costs and benefits associated with them.
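As a standard formula (using the confusion-matrix counts defined on the next slide), accuracy over all predictions is:

\[
\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}
\]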

15
Metrics to evaluate classification models: Confusion matrix
A confusion matrix (also called a "contingency table") represents the different sorts of errors made by a classification model. Entries are counts of correct classifications and counts of errors.

                     Actual +      Actual -
    Predicted Y      True +        False +
    Predicted N      False -       True -

Not all errors are equal: think about a False Negative (False-) result in medicine, which indicates that a person does not have a specific disease/condition when the person actually does have that disease/condition. More on classification next week…
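A minimal sketch of computing a confusion matrix with scikit-learn on hypothetical labels (1 = has the condition, 0 = does not); note that scikit-learn puts actual classes on the rows and predicted classes on the columns, the transpose of the table above:

    from sklearn.metrics import confusion_matrix

    # Hypothetical true and predicted labels for ten cases
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

    # With labels 0 and 1, the result is laid out as:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))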

16
Metrics to evaluate regression models
• R-squared (R²)
  - Does not take the overfitting problem into account (Adjusted R² accounts for the number of predictors).
  - Used for classical in-sample evaluation.
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE): the square root of the MSE.
• Bayesian information criterion (BIC): includes a penalty term for the number of parameters in the model, to address overfitting.
• The closer the model predictions are to the observations, the smaller the MSE/RMSE/BIC.
• Models with lower MSE/RMSE/BIC are generally preferred (see the sketch below).
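A minimal sketch of computing MSE, RMSE and R² with scikit-learn and numpy on hypothetical observed and predicted prices (BIC is normally read off a fitted model, e.g. in statsmodels, so it is not computed here):

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    # Hypothetical observed and predicted used-car prices (thousand dollars)
    y_test = np.array([18, 14, 11, 9, 7])
    y_pred = np.array([17, 15, 10, 9, 8])

    mse = mean_squared_error(y_test, y_pred)   # mean of squared errors
    rmse = np.sqrt(mse)                        # same units as the target
    r2 = r2_score(y_test, y_pred)              # proportion of variance explained

    print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}, R^2 = {r2:.3f}")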

17
CASE STUDY: Predicting Used Car Value

Link to the Colab notebook:
https://colab.research.google.com/drive/1AgaUxnarPm5Vdk3fQ1jZlyX_noD3Dixr?usp=sharing

18
Price cars

19
Prediction setup

20
Prediction setup

21
Prediction setup

22
Loss function

23
Square loss

24
Mean Squared Error (MSE)
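In symbols (standard definitions, consistent with the slides' use of squared loss), for a prediction ŷ of an observed value y:

\[
L(\hat{y}, y) = (\hat{y} - y)^2,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\]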

25
Case study: used cars data

26
Case study - used cars: features

27
Case study: models by hand

28
Case study: Car price model results

29
External validity, avoiding overfitting and model
selection

30
Underfit, overfit

31
Underfitting and overfitting the original data

32
Overfitting

33
Reason for overfitting

34
Model fit evaluation

35
Model fit evaluation

36
Finding the best model by best fit and penalty

37
Finding the best model by training and test samples

38
5-fold cross-validation

39
Finding the best model by cross-validation
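A minimal sketch (an illustrative assumption, not the notebook's exact code) of choosing among candidate regression models of increasing complexity by cross-validated RMSE, preferring the model with the lowest value:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Toy stand-in for the used-car data: age (years) vs. price (thousand dollars)
    X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
    y = np.array([18, 16, 14, 13, 11, 10, 9, 8, 7, 6])

    # Candidate models: polynomial regressions of increasing degree
    for degree in (1, 2, 3, 4):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
        rmse = np.sqrt(-neg_mse).mean()
        print(f"degree {degree}: cross-validated RMSE = {rmse:.2f}")
    # Select the candidate with the lowest cross-validated RMSE.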

40
Case study: Model selection

41
Case study: Model selection

42
Case study: Model selection

Model 4: the lowest RMSE on the test sample

43
Acknowledgements

• Courses/slides by Foster Provost, Panos Adamopoulos, Karolis Urbonas, Leonid Zhukov, Mladen Kolar, John Kelleher, Chirag Shah, Gabor Bekes and Gabor Kezdi

44
Thank you
