Statistics 622: Calibration
Module 4
Contents
OVERVIEW ............................................................ 3
METHODOLOGY ......................................................... 3
INTRODUCTION ........................................................ 4
CALIBRATION ......................................................... 4
ANOTHER REASON FOR CALIBRATION ...................................... 6
ROLE OF CALIBRATION IN REGRESSION ................................... 6
CHECKING THE CALIBRATION OF A REGRESSION ............................ 7
CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP) ...................... 8
TESTING FOR A LACK OF CALIBRATION ................................... 10
CALIBRATION PLOT .................................................... 12
CHECKING THE CALIBRATION ............................................ 15
CHECKING CALIBRATION IN MULTIPLE REGRESSION ......................... 17
CALIBRATING A MODEL ................................................. 20
PREDICTING WITH A SMOOTHING SPLINE .................................. 21
BUNDLING THE CALIBRATION INTO ONE REGRESSION ........................ 22
OTHER FORMS OF CALIBRATION .......................................... 25
DISCUSSION .......................................................... 25
Prediction lies at the heart of statistical modeling. Statistical models extrapolate the data and conditions that we've observed into new situations and forecast results in future time periods. Ideally, these predictions are right on target, but that's wishful thinking. Since predictions only estimate what's going to happen, at least we can be sure that they are right on average. For example, suppose we're trying to predict how well sales representatives perform in the field based on their success in a training program. People vary widely, so we cannot expect to predict exactly how each will perform; too many other random factors come into play.

We should, though, be right on average. Suppose our model predicts the sales volumes generated weekly by sales reps. Among those predicted to book, say, $10,000 in sales next week, we ought to demand that the average sales of these reps is in fact $10,000. At least then the total predicted sales volume will come close to the actual total, even if we under- or over-predict the sales of individuals.

That's what calibration is all about: being right on average. Calibration does not ask much of a model; it's a minimal but essential requirement. Models that are not calibrated, even if they have a high R2, are misleading and miss the opportunity to be better. There's no excuse for using an uncalibrated model because it's a problem that's easily fixed, once you know to look for it.
Statistics 622
4-2
Fall 2011
Overview
This lecture introduces calibration and methods for checking whether a model is calibrated.

Calibration
Calibration is a fundamental property of any predictive model. If a model is not calibrated, its predictions are systematically biased under some conditions, too high or too low on average, and lead to poor decisions. The test for calibration uses the plot of Y on the fitted values. It is similar to checking for a nonlinear pattern in the scatterplot of Y versus X when choosing a transformation.

Outline
Simple regression provides the initial setting. We'll calibrate a simple regression in two ways. The first is more familiar, but won't work in multiple regression. The second generalizes to multiple regression. Then we'll move to multiple regression. The second calibration method used in simple regression works fine in multiple regression.
Methodology
Smoothing spline
Smoothing splines fit smooth, gradually changing trends that may not be linear. Calibrating a model using splines improves its predictions, albeit without explaining what was wrong in the original model.
Introduction
Better models produce better predictions in several ways:
The better the model, the more closely its predictions track the average of the response.
The better the model, the more precisely its predictions match the response (e.g., smaller prediction intervals).
The better the model, the more likely it is that the prediction errors are normally distributed (as assumed by the MRM).

Consequences of systematic errors
To be right on average is critical. Unless predictions from a model are right on average, the model cannot be used for economic decisions. Calibration is about being right on average.
A high R2 does not imply calibration. Calibration is neither a consequence of nor a precursor to a large R2. A model with R2 = 0.05 may be calibrated, and a model with R2 = 0.999 need not be calibrated.
In simple regression, calibration is related to the choice of transformations that capture nonlinear relationships.
Calibration
Definition
A model is calibrated if its predictions are correct on average:
ave(Response | Predicted value) = Predicted value,
or in symbols, E(Y | Yhat) = Yhat.
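The definition can be checked numerically by grouping cases with similar predicted values and comparing each group's average response to its average prediction. Here is a minimal sketch in Python on simulated data (not part of the course's JMP workflow; the curved "true" relation is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a response that is curved in x, then fit a straight line to it.
n = 500
x = rng.uniform(0, 10, n)
y = 20 * np.sqrt(x) + rng.normal(0, 3, n)   # the true relation is curved
b1, b0 = np.polyfit(x, y, 1)                # a miscalibrated linear fit
pred = b0 + b1 * x

# Group cases by predicted value; a calibrated model has
# mean(y) close to mean(pred) within every group, not just overall.
bins = np.quantile(pred, np.linspace(0, 1, 11))
which = np.clip(np.digitize(pred, bins) - 1, 0, 9)
for g in range(10):
    m = which == g
    print(f"bin {g}: mean pred = {pred[m].mean():6.2f}, mean y = {y[m].mean():6.2f}")
```

The overall averages of `pred` and `y` agree exactly (least squares guarantees that), but within the extreme bins the linear fit is off by several units: right on average overall, yet not calibrated.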
Calibration in Simple Regression (display.jmp)
[Figure: scatterplot of Sales versus Display Feet with the linear fit.]
The linear fit misses the average amount sold for each
amount of shelf space devoted to its display. The objective
is to identify a smooth curve that captures the relationship
between the display feet and the amount sold at the stores.
Smoothing spline
We can show the average sales for each number of display feet by adding a smoothing spline to the plot.2
A spline is a smooth curve that connects the average values of Y over subsets of cases identified by intervals of adjacent values of X.
1 This example begins in BAUR on page 12 and shows up several times in that casebook. This example illustrates the problem of resource allocation: to display more of one item requires showing less of other items. The resource is the limited shelf space in stores.
2 In the Fit Y by X view of the data, choose the Fit Spline option from the tools revealed by the red triangle in the upper left corner of the window. Pick the flexible option.
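The idea behind the spline, averaging Y over adjacent X values, can be sketched directly. With replicated X values like the display data, the simplest version is the mean of Y at each X (the numbers below are hypothetical, not the actual display.jmp data):

```python
from collections import defaultdict

# Hypothetical (display feet, sales) pairs; the real data are in display.jmp.
data = [(1, 70), (1, 105), (2, 120), (2, 145), (3, 190), (3, 155),
        (4, 200), (4, 230), (5, 215), (5, 245), (6, 235), (6, 225)]

groups = defaultdict(list)
for feet, sales in data:
    groups[feet].append(sales)

# The crude "spline" here is just the sequence of group means over adjacent X values.
means = {feet: sum(v) / len(v) for feet, v in sorted(groups.items())}
for feet, avg in means.items():
    print(f"{feet} ft: average sales = {avg:.1f}")
```

A real smoothing spline interpolates smoothly between such averages and lets a slider (lambda in JMP) control how closely the curve chases them.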
[Figure: scatterplot of Sales versus Display Feet with the smoothing spline added.]
The spline tool is easy to control. Use the slider to vary the smoothness of the curve shown in the plot. See the BAUR casebook, p. 13, for further discussion of splines and these data.
4 Replications (several cases with matching values of all of the Xs) are a great asset if your data has them. Usually, they only appear in simple regression because there are too many unique combinations of predictors in multiple regression. Because the display data has replications, JMP's Lack of Fit test checks for calibration. See the BAUR casebook, page 247.
To add a polynomial to a scatterplot, click the red triangle in the upper left corner of the output
window and pick Fit Polynomial. To match the spline often requires 5 or 6 powers.
[Figure: scatterplot of Sales versus Display Feet with the 5th-degree polynomial fit.]

Summary of Fit
RSquare                      0.8559
Root Mean Square Error      38.22968

Parameter Estimates
Term                        Estimate     Std Error   t Ratio   Prob>|t|
Intercept                  109.52749     70.2328       1.56     0.1266
Display Feet                37.9317      15.51394      2.45     0.0189
(Display Feet-4.40426)^2    12.189222     9.094144     1.34     0.1875
(Display Feet-4.40426)^3    -2.36853      6.066507    -0.39     0.6982
(Display Feet-4.40426)^4    -2.070732     1.236305    -1.67     0.1016
(Display Feet-4.40426)^5     0.108778     0.560066     0.19     0.8470
Partial F-test
Do not use the t-statistics to assess the polynomial because of collinearity (even with centering). We're not interpreting the coefficients; all we want to learn is whether R2 significantly improved.
The partial F-test does what we need. It finds a very significant increase for the 4 added predictors.
The degrees-of-freedom divisor in the numerator of the ratio is 4 (not 5) because the polynomial adds 4 more predictors. The d.f. for the denominator of F is n - 2 (for the original fit) - 4 (added by the polynomial).
F = ((0.8559 - 0.712)/4) / ((1 - 0.8559)/41) = (41 x 0.1439)/(4 x 0.1441) = 10.2
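The same arithmetic can be packaged in a small function (a sketch, not JMP output; the critical value still has to come from an F table):

```python
# Partial F-test for adding q predictors to a model that already has
# k_reduced slopes, using the R-squared values of the two fits.
def partial_f(r2_full, r2_reduced, n, k_reduced, q):
    df_resid = n - k_reduced - q - 1        # residual d.f. of the full model
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)

# Numbers from the display-feet example: linear fit R2 = 0.712,
# 5th-degree polynomial R2 = 0.8559, n = 47, with 4 added powers.
f = partial_f(0.8559, 0.712, n=47, k_reduced=1, q=4)
print(f"partial F = {f:.1f}")   # prints: partial F = 10.2
```

Compare the result to an F(4, 41) distribution; a value above roughly 2.6 is significant at the 5% level, so 10.2 is decisive.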
[Figure: scatterplot of Sales versus Display Feet with the fitted curve.]
[BTW, it does not matter which log you use (unless you're interpreting by taking derivatives). Logs are proportional to each other. For instance, log10(x) = loge(x)/loge(10).]
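The proportionality is easy to verify numerically:

```python
import math

# Logs in different bases differ only by a constant factor,
# so log10(x) = loge(x) / loge(10) for every positive x.
for x in (2.0, 250.0, 1e6):
    print(x, math.log10(x), math.log(x) / math.log(10))  # equal up to round-off
```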
Calibration Plot
Generalize to multiple regression
A calibration plot offers an equivalent test of calibration, one that generalizes to multiple regression. We need this generalization because testing directly for a lack of calibration is less convenient in models with more than one predictor.
The improvement is statistically significant even though none of the t-stats for the added coefficients is significant. There's too much collinearity. When you add several terms at once, use the partial F to judge the increase.
7 F = ((0.8559 - 0.815)/4) / ((1 - 0.8559)/41) = 2.9.
Definition
A calibration plot is a scatterplot of the actual values of
the response on the y-axis and the predicted values on the
x-axis. (This plot is part of the default output produced by
Fit Model.)
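The calibration-plot check can be sketched on simulated data (not JMP output). One fact worth knowing: when you regress the actual response on least-squares predictions, the straight-line part of the plot always has intercept 0 and slope 1 by construction, so the real test looks for curvature around that line, not at the line itself:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
y = 3 + 2 * x + rng.normal(0, 1, 400)   # truly linear, so OLS is calibrated

b1, b0 = np.polyfit(x, y, 1)            # ordinary least-squares line
pred = b0 + b1 * x

# Regress actual on predicted: intercept and slope are 0 and 1 exactly
# (up to round-off) because residuals are orthogonal to fitted values.
slope, intercept = np.polyfit(pred, y, 1)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

This is why the JMP output for the linear fit in a calibration plot shows an intercept of 0 and a slope of 1: those estimates carry no information about calibration.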
[Figure: calibration plot of Sales Actual versus Pred Formula Sales, alongside the scatterplot of Sales versus Display Feet.]
Linear Fit8
Summary of Fit
RSquare                       0.712
Root Mean Square Error       51.591
Observations (or Sum Wgts)       47

Parameter Estimates
Term                 Estimate   Std Error   t Ratio   Prob>|t|
Intercept                   0       26.51      0.00     1.0000
Pred Formula Sales          1        0.09     10.55     <.0001

8 JMP sometimes uses scientific notation for numbers very close to zero, something like 1.0e-13, which translates to 1 x 10^-13. The difference from zero in this output is due to round-off errors in the underlying numerical calculations; computers use binary arithmetic that does not represent exactly every number that you write down.
[Figure: scatterplot of Sales versus Display Feet comparing the polynomial and smoothing-spline fits.]

Why use a 5th-order polynomial, you might ask? It's arbitrary, but it seems to match the fit of a smoothing spline so long as we stay away from the edges of the plot.
Summary of Fit
RSquare                        0.8559
Root Mean Square Error        38.2297
Mean of Response             268.1300
Observations (or Sum Wgts)    47.0000

Parameter Estimates
Term                             Estimate     Std Error   t Ratio   Prob>|t|
Intercept                       20.765257    106.1972       0.20     0.8459
Pred Formula Sales               0.9541011     0.390224     2.45     0.0189
(Pred Formula Sales-268.13)^2    0.0077119     0.005754     1.34     0.1875
(Pred Formula Sales-268.13)^3   -0.000038      0.000097    -0.39     0.6982
(Pred Formula Sales-268.13)^4   -8.29e-7       4.9e-7      -1.67     0.1016
(Pred Formula Sales-268.13)^5    1.1e-9        5.6e-9       0.19     0.8470
10 BS, p. 699. The example of modeling a time series using a polynomial illustrates what can happen when you extrapolate a polynomial of high degree (degree 6) outside the range of the observed data. See Figure 27.8 on page 701.
This model is part of the JMP data file; run the Uncalibrated regression script.
Checking Calibration in Multiple Regression

Summary of Fit
RSquare                    0.689
Root Mean Square Error     4.951
Mean of Response          22.248
Observations                 500

Parameter Estimates
Term            Estimate   Std Error   t Ratio   Prob>|t|
Intercept          72.10        3.11     23.18     <.0001
NOx               -20.83        3.25     -6.41     <.0001
Distance           -1.55        0.19     -8.18     <.0001
Lower class        -0.74        0.04    -17.66     <.0001
Pupil/Teacher      -1.28        0.12    -10.88     <.0001
Zoning              0.05        0.01      3.48     0.0006
Calibration plot
We find curvature in the plot of the housing values on the
fitted values from the regression.
As happens often, a lack of calibration is most evident at
the extremes with the smallest and largest fitted values.
[Figure: calibration plot of Value versus Pred Formula Value.]

Summary of Fit
RSquare                   0.689
Root Mean Square Error     4.93

Parameter Estimates
Term                 Estimate   Std Error   t Ratio   Prob>|t|
Intercept            -7.5e-14       0.706      0.00     1.0000
Pred Formula Value          1       0.030     33.18     <.0001
[Figure: calibration plot of Value versus its predictions with the 5th-degree polynomial fit.]

Summary of Fit
RSquare                       0.742
Root Mean Square Error        4.506
Observations (or Sum Wgts)      500

F = ((0.742 - 0.689)/4) / ((1 - 0.742)/(500 - 6 - 4)) = (490 x 0.053)/(4 x 0.258) = 25.2

12 The degrees of freedom in the F are 4 (for the 4 nonlinear terms in the polynomial) and n - 6 - 4 (6 for the intercept and slopes in the original fit plus 4 for the non-linear terms).
Calibrating a Model
What should be done when a model is not calibrated?
Simple regression. Ideally, find a substantively motivated
transformation, such as a log, that captures the curvature.
Routine adjustments such as those for calibration are
no substitute for knowing the substance of the problem.
Multiple regression. Again, find the right transformation or
a missing predictor. This can be hard to do, but some
methods can find these (and indeed work for this data set
and model).
If you only need predictions, calibrate the fit.
(a) Use predictions from the polynomial used to test for
calibration, or better yet
(b) Use a spline that matches the polynomial fit in the test
for calibration. The spline avoids the edge effects that
make polynomials go wild when extrapolating.
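Option (b) can be sketched on simulated data, with a crude binned-means smoother standing in for JMP's spline (the variables and numbers below are made up, not the Boston data): learn a map from the model's predictions to the average response, then predict with the smoothed value instead of the raw prediction.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 600)
y = 20 * np.sqrt(x) + rng.normal(0, 3, 600)   # curved truth
b1, b0 = np.polyfit(x, y, 1)                  # uncalibrated linear model
pred = b0 + b1 * x

# Fit the calibration map: average y within bins of pred, then interpolate.
edges = np.quantile(pred, np.linspace(0, 1, 21))
mids, means = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    m = (pred >= lo) & (pred <= hi)
    mids.append(pred[m].mean())
    means.append(y[m].mean())
calibrated = np.interp(pred, mids, means)     # recalibrated predictions

print("RMSE before:", np.sqrt(np.mean((y - pred) ** 2)))
print("RMSE after: ", np.sqrt(np.mean((y - calibrated) ** 2)))
```

Interpolating between bin means keeps the map flat beyond the extreme bins, which mimics the spline's tame behavior at the edges rather than a polynomial's wild extrapolation.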
Example
The matching spline (lambda=238.43) has similar R2 to the
fit of the polynomial model used in the test (74.2%), but the
spline controls the behavior at the edges of the plot.
[Figure: calibration plot with the matching spline. R-Square 0.748328, Sum of Squares Error 9784.732.]
13 JMP makes this easy. Recall the fitted multiple regression, set the Degree item to 5, select the predicted value column, and finally use the Macro > Polynomial to Degree button to add powers of the predicted values. Just remember to remove the 1st power. Easier done than said.
Bundling the Calibration into One Regression

Collinearity
The addition of these powers adds quite a bit of collinearity that will obscure the effects of the original explanatory variables, so you will need to return to the original equation to interpret them. Be sure centering is turned on for a bit less collinearity. It would be better to add so-called orthogonal polynomials to the regression, but that requires more manual calculations, so we'll not go there.

Example
To illustrate, I will redo the prior calibration for the 5-predictor model for the Boston data. I relabeled the powers of the predictions to make the output fit. The fit is slightly better than the prior calibration regression since it gets to modify the original slopes in addition to calibrating the fit.
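The bundled fit can be sketched in a few lines on simulated predictors (not the Boston variables): refit the original X columns together with centered powers 2 through 5 of the first-stage predictions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 3))                        # three generic predictors
y = np.exp(0.5 * X @ np.array([1.0, 0.7, -0.4])) + rng.normal(0, 0.2, n)

def fit(A, y):
    """OLS with an intercept; returns coefficients and fitted values."""
    A1 = np.column_stack([np.ones(len(A)), A])
    beta, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return beta, A1 @ beta

_, pred = fit(X, y)                                # first-stage linear fit
c = pred - pred.mean()                             # center before taking powers
X2 = np.column_stack([X, c**2, c**3, c**4, c**5])  # bundle powers into one design
_, pred2 = fit(X2, y)

def r2(yhat):
    return 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)

print(f"R2 original: {r2(pred):.3f}, R2 bundled: {r2(pred2):.3f}")
```

Because the bundled design contains the original columns, its R2 can only increase; the gain here comes from letting the powers of the prediction absorb the curvature while the original slopes adjust.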
Summary of Fit
RSquare                      0.772
RSquare Adj                  0.768
Root Mean Square Error       4.256
Mean of Response            22.248
Observations (or Sum Wgts)     500

Parameter Estimates
Term                 Estimate    Std Error   t Ratio   Prob>|t|
Intercept            45.12621      4.1677     10.83     <.0001*
NOx                 -11.52448      2.9539     -3.90     0.0001*
Distance             -0.18500      0.2103     -0.88     0.3794
Lower class          -0.64874      0.0710     -9.14     <.0001*
Pupil/Teacher        -0.50426      0.1310     -3.85     0.0001*
Zoning               -0.05706      0.0143     -3.99     <.0001*
(Pred-22.2482)^2      0.04167      0.0088      4.74     <.0001*
(Pred-22.2482)^3      0.00401      0.0010      3.84     0.0001*
(Pred-22.2482)^4      0.00003      0.0000      0.80     0.4249
(Pred-22.2482)^5     -0.00001      0.0000     -2.04     0.0419*
Calibration plot
The calibration plot for the revised model looks fine. This figure shows the calibration plot for the predictions from this multiple regression. There's a line and a spline in the plot; the two are basically indistinguishable.
[Figure: calibration plot of Value versus Pred Formula Value for the revised model.]
Profile plot
The profile plot shows that the calibration has a substantial
impact on the fit.
You can see how it only impacts the fit for tracts with
either high or low property values.
Prediction
Now that we have one regression model, we can use the
methods in the next lecture to anticipate the accuracy of
predictions.
Discussion
Some points to keep in mind
The reason to calibrate a model is so that the predictions are correct, on average. Economic decision-making requires calibrated predictions. As a by-product, the model also has a smaller RMSE.
We fit the polynomial to test for calibration only because JMP's spline tool does not tell us the information needed to use the partial F-test (i.e., the number of added variables).
Splines are subject to personal impressions. Methods are available (not in JMP) that provide a more objective measure of how much to smooth the fit.