Week 4 Lecture Slides BUS265 2023
Digital Technology
Lecture 4: Building a Machine Learning Model for Prediction
Machine learning process
Model performance
• Generalisation: we want models to apply not just to the exact training set, but to the general population from which the training data came
Model performance
Model complexity via geometric interpretation
3 models (which do you prefer?)
How can we judge whether our modeling has overfit?
Tool for model performance evaluation:
The fitting curve
[Figure: fitting curve, with regions labelled Under-fitting, Good Fit and Over-fitting]
• Over-fitting: the model “memorizes” the properties of the particular training set rather than learning the underlying phenomenon
• In-sample evaluation is in favour of “memorizing”
• On the training data the model on the right (the over-fitted one) would be best
• But on new data it would be bad (a code sketch of this effect follows below)
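As an illustration only (not from the slides), a minimal Python sketch of the fitting-curve idea using synthetic data and scikit-learn: as model complexity (here, polynomial degree) grows, the training error keeps falling, while the error on held-out data eventually rises again.

```python
# Fitting-curve sketch (illustrative, synthetic data): compare in-sample error with
# error on a held-out set as the model becomes more flexible.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy "underlying phenomenon"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 3, 10, 20):                             # increasing model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    mse_train = mean_squared_error(y_train, model.predict(X_train))
    mse_test = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={mse_train:.3f}  test MSE={mse_test:.3f}")
```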
Finding the best-fitting model
Holdout validation
Holdout validation — simple hold-out set
Partition data into a training and a testing set (2/3 to 1/3, or 80% to 20%); a code sketch follows below
• In some domains it makes sense to partition temporally (training set before time t, test set after time t)
Challenges:
1. What if by accident you selected a particularly easy/hard test set?
2. Do you have an idea of the variation in model accuracy due to training? What would the model accuracy be if you selected a different training set?
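A minimal sketch of a simple hold-out split in Python with scikit-learn. The file name and column names below are hypothetical placeholders, not the actual case-study data.

```python
# Simple hold-out validation (sketch): 80% of the rows for training, 20% for testing.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

cars = pd.read_csv("used_cars.csv")              # hypothetical file
X = cars[["age", "odometer"]]                    # hypothetical feature columns
y = cars["price"]                                # hypothetical target column

# random_state fixes the random split so the result is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Hold-out RMSE: {rmse:.1f}")
```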
Holdout validation — Cross-validation (CV)
4-fold CV
• Partition data into k “folds” (randomly)
• Run training/test evaluation k times
Holdout validation — Cross-validation (CV)
• Each fold is the test set once (the rest are combined into the training set)
• Eventually the model is tested on all the data (each data point once)
• Can compute the average and variance of the accuracy measure(s) across folds
• Better use of a limited dataset: CV computes its estimates over all the data (a code sketch follows below)
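A minimal sketch of k-fold cross-validation with scikit-learn, reusing the hypothetical X and y from the hold-out sketch above; it reports the RMSE of each fold plus their mean and standard deviation.

```python
# k-fold cross-validation (sketch): every observation is in the test set exactly once,
# so we can report both the average and the spread of the accuracy measure.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression

cv = KFold(n_splits=4, shuffle=True, random_state=42)     # 4 folds, as in the slide

# scikit-learn returns negative MSE for this scoring rule, so flip the sign and take the root
neg_mse = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error")
rmse_per_fold = np.sqrt(-neg_mse)

print("RMSE per fold:", np.round(rmse_per_fold, 1))
print(f"mean RMSE = {rmse_per_fold.mean():.1f}, std = {rmse_per_fold.std():.1f}")
```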
How to choose the model
Measuring predictive ability
Metrics to evaluate classification models—Accuracy
Metrics to evaluate classification models—Confusion Matrix
The confusion matrix (also called a “contingency table”) represents the different sorts of errors made by a classification model. Entries are counts of correct classifications and counts of errors.

              Actual +    Actual -
Predicted Y   True+       False+
Predicted N   False-      True-

Not all errors are equal: think about a False Negative (False-) result in medicine that indicates that a person does not have a specific disease/condition when the person actually does have the disease/condition.

More on classification next week…
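A minimal sketch of computing a confusion matrix and accuracy with scikit-learn (the labels are made up for illustration; note that scikit-learn puts the actual class on the rows, whereas the table above puts the predicted class on the rows).

```python
# Confusion matrix and accuracy for a binary classifier (illustrative labels only).
from sklearn.metrics import confusion_matrix, accuracy_score

y_actual    = [1, 1, 0, 1, 0, 0, 1, 0]    # 1 = has the condition, 0 = does not
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]    # classifier output

# labels=[1, 0] puts the positive class first; rows = actual, columns = predicted,
# so the layout is [[True+, False-], [False+, True-]]
cm = confusion_matrix(y_actual, y_predicted, labels=[1, 0])
print(cm)
print("Accuracy:", accuracy_score(y_actual, y_predicted))
```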
Metrics to evaluate regression models
• R-squared
• Does not take the overfitting problem into account (Adjusted R-squared adds a correction for the number of predictors)
• Used for classical in-sample evaluation
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE) (the square root of MSE)
• Bayesian information criterion (BIC)—includes a penalty term for the number of parameters in the model to address overfitting
• The closer the model predictions are to the observations, the smaller the MSE/RMSE/BIC
• Models with lower MSE/RMSE/BIC are generally preferred (see the sketch below)
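A minimal sketch of these regression metrics in Python on made-up data. The BIC line uses the standard Gaussian-likelihood formula n·ln(SSE/n) + k·ln(n), which can differ by an additive constant from what particular packages report.

```python
# MSE, RMSE, R-squared and a simple BIC for a fitted linear model (sketch, toy data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])   # toy predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])              # toy outcome

model = LinearRegression().fit(X, y)
pred = model.predict(X)

mse = mean_squared_error(y, pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, pred)

# BIC for a Gaussian linear model: n*ln(SSE/n) + k*ln(n), k = number of estimated coefficients
n, k = len(y), X.shape[1] + 1                               # +1 for the intercept
sse = np.sum((y - pred) ** 2)
bic = n * np.log(sse / n) + k * np.log(n)

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  BIC={bic:.2f}")
```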
CASE STUDY: Predicting Used Car Value
Price cars
Prediction setup
Prediction setup
Prediction setup
Loss function
Square loss
Mean Squared Error (MSE)
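For reference, the standard definitions behind these slides (written out here, not copied from the slide figures): the squared loss for one prediction, and the MSE/RMSE that average it over n observations.

```latex
% Squared loss for one observation, and the mean squared error over n observations
\[
  L(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2,
  \qquad
  \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
  \qquad
  \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\]
```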
Case study: used cars data
Case study - used cars: features
Case study: models by hand
Case study: Car price model results
External validity, avoiding overfitting and model selection
Underfit, overfit
Underfitting and overfitting the original data
Overfitting
Reason for overfitting
Model fit evaluation
Model fit evaluation
Finding the best model by best fit and penalty
Finding the best model by training and test samples
5-fold cross-validation
Finding the best model by cross-validation
Case study: Model selection
Case study: Model selection
Case study: Model selection
Model 4 has the lowest RMSE on the test sample
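A minimal sketch of this kind of model selection in Python: compare candidate regression specifications by cross-validated RMSE and keep the one with the lowest value. The file name, feature sets and "Model 1–4" labels are hypothetical stand-ins, not the actual case-study models.

```python
# Model selection by cross-validated RMSE (sketch; hypothetical feature sets).
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression

cars = pd.read_csv("used_cars.csv")              # hypothetical file
candidates = {                                   # hypothetical model specifications
    "Model 1": ["age"],
    "Model 2": ["age", "odometer"],
    "Model 3": ["age", "odometer", "engine_size"],
    "Model 4": ["age", "odometer", "engine_size", "num_owners"],
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)
for name, features in candidates.items():
    scores = cross_val_score(LinearRegression(), cars[features], cars["price"],
                             cv=cv, scoring="neg_mean_squared_error")
    print(f"{name}: cross-validated RMSE = {np.sqrt(-scores).mean():.1f}")
# Prefer the specification with the lowest cross-validated RMSE.
```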
Acknowledgements
Thank you