19 Assessing Model Accuracy
Asim Tewari, IIT Bombay ME 781: Engineering Data Mining and Applications
Measuring the Quality of Fit
Mean Squared Error (MSE):
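For reference, the usual definition of the training MSE can be written as below. The symbols are assumptions chosen to match the $\hat{f}(x_0)$ notation used later in these slides: $n$ training observations $(x_i, y_i)$ and a fitted function $\hat{f}$.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{f}(x_i) \bigr)^2
```

The test MSE has the same form, but the average is taken over observations that were not used to fit $\hat{f}$.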
Overfitting
[Figure] Left: data simulated from f, shown in black. Three estimates of f are
shown: the linear regression line (orange curve) and two smoothing spline fits
(blue and green curves). Right: training MSE (grey curve), test MSE (red curve),
and minimum possible test MSE over all methods (dashed line). Squares represent
the training and test MSEs for the three fits shown in the left-hand panel.
When a given method yields a small training MSE but a large test MSE, we are said to
be overfitting the data. In the right-hand panel this is the rising arm of the U-shaped test MSE curve.
Overfitting
• When a given method yields a small training MSE but a
large test MSE, we are said to be overfitting the data.
• The statistical learning procedure may be picking up
patterns that are caused by random chance rather
than by true properties of the unknown function f.
• Regardless of whether or not overfitting has occurred,
we almost always expect the training MSE to be
smaller than the test MSE because most statistical
learning methods either directly or indirectly seek to
minimize the training MSE.
• Overfitting refers specifically to the case in which a less
flexible model would have yielded a smaller test MSE.
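The points above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true function `true_f`, the noise level `sigma`, the sample sizes, and the use of polynomial degree as a stand-in for flexibility are all hypothetical choices made here (the slides use smoothing splines, not polynomials).

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_f(x):
    # Hypothetical "unknown" function f, assumed only for this illustration.
    return np.sin(2 * np.pi * x)


def train_test_mse(degrees, n_train=30, n_test=1000, sigma=0.3, seed=0):
    """Return (train_mse, test_mse) lists, one entry per polynomial degree."""
    rng = np.random.default_rng(seed)
    # One training set and one independent test set, both drawn from f plus noise.
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = true_f(x_train) + rng.normal(0.0, sigma, n_train)
    x_test = rng.uniform(0.0, 1.0, n_test)
    y_test = true_f(x_test) + rng.normal(0.0, sigma, n_test)

    train_mse, test_mse = [], []
    for d in degrees:
        # Least-squares polynomial fit; a higher degree means a more flexible model.
        f_hat = Polynomial.fit(x_train, y_train, d)
        train_mse.append(float(np.mean((y_train - f_hat(x_train)) ** 2)))
        test_mse.append(float(np.mean((y_test - f_hat(x_test)) ** 2)))
    return train_mse, test_mse


if __name__ == "__main__":
    degrees = [1, 3, 5, 9, 12]
    train, test = train_test_mse(degrees)
    for d, tr, te in zip(degrees, train, test):
        print(f"degree {d:2d}: training MSE = {tr:.3f}, test MSE = {te:.3f}")
```

Because each lower-degree polynomial is a special case of the higher-degree one, the training MSE cannot increase as the degree grows. The test MSE carries no such guarantee, which is what leaves room for overfitting.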
Overfitting examples
[Figures: test MSE and training MSE curves.]
The Bias-Variance Trade-Off
The U-shape observed in the test MSE curves is the
result of two competing properties of statistical
learning methods: variance and bias.
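The standard decomposition behind this statement can be written as follows. It assumes a test observation $y_0 = f(x_0) + \varepsilon$, where the noise $\varepsilon$ has mean zero and is independent of the training data; the expectation is taken over repeated training sets.

```latex
E\bigl( y_0 - \hat{f}(x_0) \bigr)^2
  = \mathrm{Var}\bigl( \hat{f}(x_0) \bigr)
  + \bigl[ \mathrm{Bias}\bigl( \hat{f}(x_0) \bigr) \bigr]^2
  + \mathrm{Var}(\varepsilon)
```

The first two terms are non-negative, so the expected test MSE can never fall below $\mathrm{Var}(\varepsilon)$, the irreducible error.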
Training and Testing
The Bias-Variance Trade-Off
• Variance refers to the amount by which
$\hat{f}(x_0)$ would change if we estimated it using a
different training data set.
• Ideally the estimate for f should not vary too
much between training sets.
• In general, more flexible statistical methods
have higher variance.
The Bias-Variance Trade-Off
• Bias refers to the error that is introduced by
approximating a real-life problem, which may
be extremely complicated, by a much simpler
model.
• A biased model will, on average, either overpredict
or underpredict the results.
• Generally, more flexible methods result in less
bias.
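The definitions of variance and bias can be checked numerically by refitting a model on many simulated training sets and looking at how the prediction at one point behaves. The sketch below is illustrative only: `true_f`, the noise level `sigma`, the point `x0`, and the use of polynomial degree as the measure of flexibility are hypothetical choices. As a simplifying assumption, every training set shares the same inputs and differs only in its noise.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_f(x):
    # Hypothetical true function f, assumed only for this illustration.
    return np.sin(2 * np.pi * x)


def bias_variance_at(x0, degree, n_sets=500, n_train=30, sigma=0.3, seed=0):
    """Monte Carlo estimate of (squared bias, variance) of f_hat(x0)."""
    rng = np.random.default_rng(seed)
    # Fixed design (assumption): the same inputs are reused for every training set.
    x = np.linspace(0.0, 1.0, n_train)
    preds = np.empty(n_sets)
    for s in range(n_sets):
        # A new training set: same inputs, fresh noise.
        y = true_f(x) + rng.normal(0.0, sigma, n_train)
        preds[s] = Polynomial.fit(x, y, degree)(x0)
    # Squared bias: squared gap between the average prediction and the truth.
    bias_sq = float((preds.mean() - true_f(x0)) ** 2)
    # Variance: spread of the prediction across training sets.
    variance = float(preds.var())
    return bias_sq, variance


if __name__ == "__main__":
    for degree in (1, 3, 7):
        b, v = bias_variance_at(0.25, degree)
        print(f"degree {degree}: squared bias = {b:.4f}, variance = {v:.4f}")
```

In this setup the straight-line fit is expected to show the larger squared bias and the degree-7 fit the larger variance, which mirrors the trade-off described in the bullets.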
The Bias-Variance Trade-Off
[Figures: test MSE and training MSE curves.]
The vertical dotted line indicates the flexibility level corresponding to the smallest test MSE.