Predictions in Linear Regression Model
Consider the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$, where the random errors are independently distributed following $N(0, \sigma^2)$. The parameters $\beta_0$ and $\beta_1$ are estimated by ordinary least squares as
$$b_0 = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{s_{xy}}{s_{xx}},$$
where
$$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
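As a quick illustration, here is a minimal Python sketch of these estimators (the data values are made up for illustration; only numpy is assumed):

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()
s_xy = np.sum((x - x_bar) * (y - y_bar))   # corrected cross-product
s_xx = np.sum((x - x_bar) ** 2)            # corrected sum of squares

b1 = s_xy / s_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate
print(b0, b1)
```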
Suppose the value of the study variable is to be predicted at a given point $x_0$. For predicting the average value $E(y) = \beta_0 + \beta_1 x_0$, the predictor is $p_m = b_0 + b_1 x_0$ (here $m$ stands for "mean"). Its predictive bias is
$$E\left[ p_m - E(y) \right] = E(b_0 - \beta_0) + x_0\, E(b_1 - \beta_1) = 0 + 0 = 0.$$
Thus the predictor $p_m$ is an unbiased predictor of $E(y)$.
Predictive variance:
The predictive variance of $p_m$ is
$$\begin{aligned}
PV(p_m) &= \operatorname{Var}(b_0 + b_1 x_0) \\
&= \operatorname{Var}\left( \bar{y} + b_1 (x_0 - \bar{x}) \right) \\
&= \operatorname{Var}(\bar{y}) + (x_0 - \bar{x})^2 \operatorname{Var}(b_1) + 2 (x_0 - \bar{x}) \operatorname{Cov}(\bar{y}, b_1) \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2 (x_0 - \bar{x})^2}{s_{xx}} + 0 \\
&= \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].
\end{aligned}$$
Its estimate, obtained by replacing $\sigma^2$ with $\hat{\sigma}^2 = MSE$, is
$$\widehat{PV}(p_m) = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$
Prediction interval:
The $100(1-\alpha)\%$ prediction interval for $E(y)$ is obtained as follows. The predictor $p_m$ is a linear combination of normally distributed random variables, so it is also normally distributed as
$$p_m \sim N\left( \beta_0 + \beta_1 x_0,\; PV(p_m) \right).$$
Consequently, when $\sigma^2$ is known,
$$P\left[ -z_{\alpha/2} \le \frac{p_m - E(y)}{\sqrt{PV(p_m)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$
which gives the prediction interval for $E(y)$ as
$$\left[ p_m - z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_m + z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{p_m - E(y)}{\sqrt{MSE \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}} \right)}}$$
is a $t$-distribution with $(n-2)$ degrees of freedom, so the $100(1-\alpha)\%$ prediction interval for $E(y)$ becomes
$$\left[ p_m - t_{\alpha/2, n-2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_m + t_{\alpha/2, n-2} \sqrt{MSE \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
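A minimal sketch of this $t$-based interval in Python, continuing the hypothetical data above (scipy is assumed to be available for the $t$ quantile):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y.mean())) / s_xx
b0 = y.mean() - b1 * x_bar

# MSE = SSE / (n - 2), the usual unbiased estimate of sigma^2
resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (n - 2)

x0 = 3.5                      # point at which E(y) is predicted
pm = b0 + b1 * x0             # predictor of the average value
pv_hat = mse * (1.0 / n + (x0 - x_bar) ** 2 / s_xx)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(pm - t_crit * np.sqrt(pv_hat), pm + t_crit * np.sqrt(pv_hat))
```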
Note that the width of the prediction interval for $E(y)$ is a function of $x_0$. The interval width is minimum at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases. This is expected, since the best estimates of $y$ are made at $x$-values near the center of the data, and the precision of estimation deteriorates as we move toward the boundary of the $x$-space.
Prediction of actual value:
When the predictor is meant for the actual value of the study variable at $x = x_0$, it is given by
$$p_a = b_0 + b_1 x_0.$$
Here $a$ means "actual". The true value of $y$ in the prediction period is given by $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$, where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of this predictor is the same as that of the average-value predictor, but its predictive error and other properties are different. This is the dual nature of the predictor.
Predictive bias:
The predictive error of $p_a$ is given by
$$\begin{aligned}
p_a - y_0 &= b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0) \\
&= (b_0 - \beta_0) + (b_1 - \beta_1) x_0 - \varepsilon_0.
\end{aligned}$$
Thus we find that
$$E(p_a - y_0) = E(b_0 - \beta_0) + E(b_1 - \beta_1) x_0 - E(\varepsilon_0) = 0 + 0 - 0 = 0,$$
so $p_a$ provides unbiased prediction for the actual value.
Predictive variance:
Because the future observation $y_0$ is independent of $p_a$, the predictive variance of $p_a$ is
$$\begin{aligned}
PV(p_a) &= E(p_a - y_0)^2 \\
&= E\left[ (b_0 - \beta_0) + (x_0 - \bar{x})(b_1 - \beta_1) + \bar{x}(b_1 - \beta_1) - \varepsilon_0 \right]^2 \\
&= \operatorname{Var}(b_0) + (x_0 - \bar{x})^2 \operatorname{Var}(b_1) + \bar{x}^2 \operatorname{Var}(b_1) + \operatorname{Var}(\varepsilon_0) \\
&\qquad + 2 (x_0 - \bar{x}) \operatorname{Cov}(b_0, b_1) + 2 \bar{x} \operatorname{Cov}(b_0, b_1) + 2 (x_0 - \bar{x})\, \bar{x}\, \operatorname{Var}(b_1)
\end{aligned}$$
[the rest of the terms are 0, assuming the independence of $\varepsilon_0$ with $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$]
$$\begin{aligned}
&= \operatorname{Var}(b_0) + \left[ (x_0 - \bar{x})^2 + \bar{x}^2 + 2 (x_0 - \bar{x}) \bar{x} \right] \operatorname{Var}(b_1) + \operatorname{Var}(\varepsilon_0) + 2 \left[ (x_0 - \bar{x}) + \bar{x} \right] \operatorname{Cov}(b_0, b_1) \\
&= \operatorname{Var}(b_0) + x_0^2 \operatorname{Var}(b_1) + \operatorname{Var}(\varepsilon_0) + 2 x_0 \operatorname{Cov}(b_0, b_1) \\
&= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right) + \frac{\sigma^2 x_0^2}{s_{xx}} + \sigma^2 - \frac{2 \sigma^2 x_0 \bar{x}}{s_{xx}} \\
&= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].
\end{aligned}$$
Its estimate, obtained by replacing $\sigma^2$ with $\hat{\sigma}^2 = MSE$, is
$$\widehat{PV}(p_a) = \hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$
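As an optional sanity check, the formula for $PV(p_a)$ can be verified empirically. The following small Monte Carlo sketch (hypothetical fixed design, $\sigma^2 = 1$) compares the empirical variance of $p_a - y_0$ with the expression above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical fixed design
n, sigma2 = len(x), 1.0
beta0, beta1, x0 = 1.0, 2.0, 5.0

x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)

errs = []
for _ in range(100_000):
    y = beta0 + beta1 * x + rng.normal(0, np.sqrt(sigma2), n)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / s_xx
    b0 = y.mean() - b1 * x_bar
    y0 = beta0 + beta1 * x0 + rng.normal(0, np.sqrt(sigma2))  # future observation
    errs.append(b0 + b1 * x0 - y0)                            # p_a - y_0

empirical = np.var(errs)
theoretical = sigma2 * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
print(empirical, theoretical)   # the two should agree closely
```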
Prediction interval:
If $\sigma^2$ is known, then the distribution of
$$\frac{p_a - y_0}{\sqrt{PV(p_a)}}$$
is $N(0, 1)$. Therefore
$$P\left[ -z_{\alpha/2} \le \frac{p_a - y_0}{\sqrt{PV(p_a)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$
which gives the prediction interval for $y_0$ as
$$\left[ p_a - z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_a + z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the statistic
$$\frac{p_a - y_0}{\sqrt{MSE \left( 1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}} \right)}}$$
follows a $t$-distribution with $(n-2)$ degrees of freedom. The $100(1-\alpha)\%$ prediction interval for $y_0$ in this case is obtained from
$$P\left[ -t_{\alpha/2, n-2} \le \frac{p_a - y_0}{\sqrt{\widehat{PV}(p_a)}} \le t_{\alpha/2, n-2} \right] = 1 - \alpha,$$
which gives the prediction interval for $y_0$ as
$$\left[ p_a - t_{\alpha/2, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\;\; p_a + t_{\alpha/2, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
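The same interval in Python, continuing the earlier hypothetical data; the only change relative to the mean-value interval is the extra 1 inside the variance factor:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(x)

x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y.mean())) / s_xx
b0 = y.mean() - b1 * x_bar
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 3.5
pa = b0 + b1 * x0
# Note the extra "1 +" relative to the mean-value interval:
pv_hat = mse * (1.0 + 1.0 / n + (x0 - x_bar) ** 2 / s_xx)

t_crit = stats.t.ppf(0.975, df=n - 2)   # alpha = 0.05
print(pa - t_crit * np.sqrt(pv_hat), pa + t_crit * np.sqrt(pv_hat))
```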
The prediction interval is of minimum width at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases. The prediction interval for $p_a$ is wider than the prediction interval for $p_m$ because the interval for $p_a$ depends on both the error from the fitted model and the error associated with the future observation.
Within sample prediction in multiple linear regression model
Consider the multiple regression model with $k$ explanatory variables,
$$y = X\beta + \varepsilon, \qquad (1)$$
where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of $n$ observations on the study variable, $X$ is an $n \times k$ matrix of observations on the explanatory variables, $\beta$ is a $k \times 1$ vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random error components or disturbance terms following $N(0, \sigma^2 I_n)$. If an intercept term is present, take the first column of $X$ to be $(1, 1, \ldots, 1)'$.
Let the parameter $\beta$ be estimated by its ordinary least squares estimator $b = (X'X)^{-1} X'y$. Then the predictor is $p = Xb$, which can be used for predicting both the actual and the average values of the study variable. This is the dual nature of the predictor.
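A minimal Python sketch of this estimator and predictor (the design matrix and coefficients are hypothetical, simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: n = 20 observations, k = 3 columns (intercept + 2 regressors)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^{-1} X'y
p = X @ b                               # within-sample predictor p = Xb
print(b)
```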
Prediction of average value:
Since $E(b) = \beta$, we have
$$E\left[ p - E(y) \right] = E\left[ Xb - X\beta \right] = X\, E(b - \beta) = 0,$$
which proves that the predictor $p = Xb$ provides unbiased prediction for the average value.
The predictive variance of $p$ is
$$PV_m(p) = E\left[ (p - E(y))'(p - E(y)) \right] = E\left[ \varepsilon' H \varepsilon \right] = \sigma^2 \operatorname{tr}(H) = \sigma^2 k,$$
where $H = X(X'X)^{-1}X'$ and $p - E(y) = X(b - \beta) = H\varepsilon$.
If $\sigma^2$ is known, then
$$P\left[ -z_{\alpha/2} \le \frac{p - E(y)}{\sqrt{PV_m(p)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$
which gives the prediction interval for $E(y)$ as
$$\left[ p - z_{\alpha/2} \sqrt{PV_m(p)},\;\; p + z_{\alpha/2} \sqrt{PV_m(p)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{p - E(y)}{\sqrt{\widehat{PV}_m(p)}}$$
is a $t$-distribution with $(n-k)$ degrees of freedom, so
$$P\left[ -t_{\alpha/2, n-k} \le \frac{p - E(y)}{\sqrt{\widehat{PV}_m(p)}} \le t_{\alpha/2, n-k} \right] = 1 - \alpha,$$
which gives the prediction interval for $E(y)$ as
$$\left[ p - t_{\alpha/2, n-k} \sqrt{\widehat{PV}_m(p)},\;\; p + t_{\alpha/2, n-k} \sqrt{\widehat{PV}_m(p)} \right].$$
For predicting the actual value, $p - y = Xb - y = -(I_n - H)\varepsilon$, so $E(p - y) = 0$ and
$$PV_a(p) = E\left[ (p - y)'(p - y) \right] = \sigma^2 \operatorname{tr}(I_n - H) = \sigma^2 (n - k).$$
Comparing the performances of $p$ in predicting the actual and average values, we find that $p$ is a better predictor of the average value than of the actual value when
$$PV_m(p) < PV_a(p), \quad \text{or} \quad k < n - k, \quad \text{or} \quad 2k < n,$$
i.e., when the total number of observations is more than twice the number of explanatory variables.
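For example, with $n = 20$ observations and $k = 3$ explanatory variables, $PV_m(p) = 3\sigma^2$ while $PV_a(p) = 17\sigma^2$, so the average value is predicted far more precisely; the condition $2k < n$ (here $6 < 20$) clearly holds.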
If $\sigma^2$ is known, then
$$P\left[ -z_{\alpha/2} \le \frac{p - y}{\sqrt{PV_a(p)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$
which gives the prediction interval for $y$ as
$$\left[ p - z_{\alpha/2} \sqrt{PV_a(p)},\;\; p + z_{\alpha/2} \sqrt{PV_a(p)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{p - y}{\sqrt{\widehat{PV}_a(p)}}$$
is a $t$-distribution with $(n-k)$ degrees of freedom; the corresponding interval is obtained by replacing $z_{\alpha/2}$ with $t_{\alpha/2, n-k}$.
Outside sample prediction in multiple linear regression model
Further, suppose a set of $n_f$ observations on the same set of $k$ explanatory variables is also available, but the corresponding $n_f$ observations on the study variable are not available. Assuming that this set of observations also follows the structure of model (1), we can write
$$y_f = X_f \beta + \varepsilon_f, \qquad (2)$$
where $X_f$ is an $n_f \times k$ matrix of observations on the explanatory variables and $\varepsilon_f$ is an $n_f \times 1$ vector of disturbances following $N(0, \sigma^2 I_{n_f})$. It is also assumed that the elements of $\varepsilon_f$ are independent of those of $\varepsilon$.
We now consider the prediction of the $y_f$ values for given $X_f$ from model (2). This can be done by estimating the regression coefficients from model (1) based on the $n$ observations and using them in formulating the predictor in model (2). If ordinary least squares estimation is used to estimate $\beta$ in model (1) as
$$b = (X'X)^{-1} X'y,$$
then the corresponding predictor is
$$p_f = X_f b = X_f (X'X)^{-1} X'y.$$
Prediction of average value:
The prediction error for the average value is
$$\begin{aligned}
p_f - E(y_f) &= X_f b - X_f \beta \\
&= X_f (b - \beta) \\
&= X_f (X'X)^{-1} X' \varepsilon.
\end{aligned}$$
Then
$$E\left[ p_f - E(y_f) \right] = X_f (X'X)^{-1} X' E(\varepsilon) = 0.$$
Thus $p_f$ provides unbiased prediction for the average value.
The predictive covariance matrix of $p_f$ is
$$\operatorname{Cov}_m(p_f) = E\left[ (p_f - E(y_f))(p_f - E(y_f))' \right] = \sigma^2 X_f (X'X)^{-1} X_f',$$
and the predictive variance is $PV_m(p_f) = \operatorname{tr} \operatorname{Cov}_m(p_f) = \sigma^2 \operatorname{tr}\left[ (X'X)^{-1} X_f' X_f \right]$.
If $\sigma^2$ is unknown, then $\sigma^2$ is replaced by $\hat{\sigma}^2 = MSE$ in the expressions of the predictive covariance matrix and predictive variance, and their estimates are
$$\widehat{\operatorname{Cov}}_m(p_f) = \hat{\sigma}^2 X_f (X'X)^{-1} X_f', \qquad \widehat{PV}_m(p_f) = MSE \cdot \operatorname{tr}\left[ (X'X)^{-1} X_f' X_f \right].$$
If $\sigma^2$ is known, the $100(1-\alpha)\%$ prediction interval for $E(y_f)$ is
$$\left[ p_f - z_{\alpha/2} \sqrt{PV_m(p_f)},\;\; p_f + z_{\alpha/2} \sqrt{PV_m(p_f)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{p_f - E(y_f)}{\sqrt{\widehat{PV}_m(p_f)}}$$
is a $t$-distribution with $(n-k)$ degrees of freedom, so
$$P\left[ -t_{\alpha/2, n-k} \le \frac{p_f - E(y_f)}{\sqrt{\widehat{PV}_m(p_f)}} \le t_{\alpha/2, n-k} \right] = 1 - \alpha,$$
which gives the prediction interval for $E(y_f)$ as
$$\left[ p_f - t_{\alpha/2, n-k} \sqrt{\widehat{PV}_m(p_f)},\;\; p_f + t_{\alpha/2, n-k} \sqrt{\widehat{PV}_m(p_f)} \right].$$
Prediction of actual value:
The prediction error for the actual value is
$$\begin{aligned}
p_f - y_f &= X_f b - X_f \beta - \varepsilon_f \\
&= X_f (b - \beta) - \varepsilon_f.
\end{aligned}$$
Then
$$E\left[ p_f - y_f \right] = X_f E(b - \beta) - E(\varepsilon_f) = 0,$$
so $p_f$ also provides unbiased prediction for the actual value. Using the independence of $\varepsilon_f$ and $\varepsilon$, the predictive covariance matrix is
$$\operatorname{Cov}_a(p_f) = E\left[ (p_f - y_f)(p_f - y_f)' \right] = \sigma^2 \left[ X_f (X'X)^{-1} X_f' + I_{n_f} \right],$$
and the predictive variance is
$$PV_a(p_f) = \operatorname{tr} \operatorname{Cov}_a(p_f) = \sigma^2 \left[ \operatorname{tr}\left( (X'X)^{-1} X_f' X_f \right) + n_f \right].$$
The estimates of the predictive covariance matrix and predictive variance can be obtained by replacing $\sigma^2$ with $\hat{\sigma}^2 = MSE$ as
$$\widehat{\operatorname{Cov}}_a(p_f) = \hat{\sigma}^2 \left[ X_f (X'X)^{-1} X_f' + I_{n_f} \right], \qquad \widehat{PV}_a(p_f) = MSE \left[ \operatorname{tr}\left( (X'X)^{-1} X_f' X_f \right) + n_f \right].$$
If $\sigma^2$ is known, then
$$P\left[ -z_{\alpha/2} \le \frac{p_f - y_f}{\sqrt{PV_a(p_f)}} \le z_{\alpha/2} \right] = 1 - \alpha,$$
which gives the prediction interval for $y_f$ as
$$\left[ p_f - z_{\alpha/2} \sqrt{PV_a(p_f)},\;\; p_f + z_{\alpha/2} \sqrt{PV_a(p_f)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{p_f - y_f}{\sqrt{\widehat{PV}_a(p_f)}}$$
is a $t$-distribution with $(n-k)$ degrees of freedom, so
$$P\left[ -t_{\alpha/2, n-k} \le \frac{p_f - y_f}{\sqrt{\widehat{PV}_a(p_f)}} \le t_{\alpha/2, n-k} \right] = 1 - \alpha,$$
which gives the prediction interval for $y_f$ as
$$\left[ p_f - t_{\alpha/2, n-k} \sqrt{\widehat{PV}_a(p_f)},\;\; p_f + t_{\alpha/2, n-k} \sqrt{\widehat{PV}_a(p_f)} \right].$$
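A Python sketch of outside-sample prediction under these formulas. The data are hypothetical, and instead of the aggregate (trace-based) variance above, this sketch forms element-wise intervals from the diagonal of the estimated covariance matrices, a common practical variant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical within-sample data (n = 20, k = 3) and n_f = 4 new design points
n, k, n_f = 20, 3, 4
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
X_f = np.column_stack([np.ones(n_f), rng.normal(size=n_f), rng.normal(size=n_f)])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
p_f = X_f @ b                                  # predictor p_f = X_f b
mse = np.sum((y - X @ b) ** 2) / (n - k)       # estimate of sigma^2

# Estimated predictive covariance matrices for average and actual values
cov_m = mse * (X_f @ XtX_inv @ X_f.T)
cov_a = cov_m + mse * np.eye(n_f)              # extra I_{n_f} term for actual values

# Element-wise t intervals (alpha = 0.05) for E(y_f) and y_f
t_crit = stats.t.ppf(0.975, df=n - k)
se_m, se_a = np.sqrt(np.diag(cov_m)), np.sqrt(np.diag(cov_a))
print(np.column_stack([p_f - t_crit * se_m, p_f + t_crit * se_m]))  # for E(y_f)
print(np.column_stack([p_f - t_crit * se_a, p_f + t_crit * se_a]))  # for y_f
```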
Simultaneous prediction of average and actual values of study variable
Predictions are generally obtained either for the average values of the study variable or for its actual values. In many applications, it may not be appropriate to confine attention to only one of the two; in some situations it is more appropriate to predict both values simultaneously. For example, suppose a firm sells fertilizer to users. The company's interest lies in predicting the average yield, which it would like to use to show that the average yield of the crop increases by using its fertilizer. The user, on the other hand, is not interested in the average value but would like to know the actual increase in yield from using the fertilizer. Suppose the seller and the user both make predictions through regression modeling. Using the classical tools, the statistician can predict either the actual value or the average value, which safeguards the interest of only the user or only the seller. Instead, it is desirable to safeguard the interests of both by striking a balance between the objectives of the seller and the user. This can be achieved by combining the predictions of actual and average values, which is done by formulating an objective function or target function. Such a target function has to be flexible: it should allow different weights to be assigned to the two kinds of predictions depending upon their importance in a given application, and it should reduce to the individual actual-value and average-value predictions as special cases.
Now we consider simultaneous prediction in the within-sample and outside-sample cases.
Within-sample simultaneous prediction:
Define the target function
$$T = \lambda y + (1 - \lambda) E(y), \qquad 0 \le \lambda \le 1,$$
where $\lambda$ is a weight reflecting the relative importance of predicting the actual values; $\lambda = 1$ gives actual-value prediction and $\lambda = 0$ gives average-value prediction. Consider the predictor $p = Xb$. Since $T = X\beta + \lambda \varepsilon$, we have $p - T = X(b - \beta) - \lambda \varepsilon$. Thus
$$E(p - T) = X E(b - \beta) - \lambda E(\varepsilon) = 0.$$
So $p$ provides unbiased prediction for $T$.
The variance is
$$\begin{aligned}
\operatorname{Var}(p) &= E\left[ (p - T)'(p - T) \right] \\
&= E\left[ \left( (b - \beta)'X' - \lambda \varepsilon' \right)\left( X(b - \beta) - \lambda \varepsilon \right) \right] \\
&= E\left[ \varepsilon' X (X'X)^{-1} X'X (X'X)^{-1} X' \varepsilon + \lambda^2 \varepsilon'\varepsilon - \lambda (b - \beta)'X'\varepsilon - \lambda \varepsilon' X (b - \beta) \right] \\
&= E\left[ (1 - 2\lambda)\, \varepsilon' X (X'X)^{-1} X' \varepsilon + \lambda^2 \varepsilon'\varepsilon \right] \\
&= \sigma^2 (1 - 2\lambda) \operatorname{tr}\left[ (X'X)^{-1} X'X \right] + \sigma^2 \lambda^2 \operatorname{tr} I_n \\
&= \sigma^2 \left[ (1 - 2\lambda)\, k + \lambda^2 n \right].
\end{aligned}$$
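A small simulation sketch (hypothetical design, $\lambda = 0.5$, $\sigma^2 = 1$) that checks this variance formula empirically:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, lam, sigma2 = 15, 3, 0.5, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix

vals = []
for _ in range(100_000):
    eps = rng.normal(0, np.sqrt(sigma2), n)
    y = X @ beta + eps
    p = H @ y                           # p = Xb
    T = X @ beta + lam * eps            # target T = lam*y + (1 - lam)*E(y)
    d = p - T
    vals.append(d @ d)

empirical = np.mean(vals)
theoretical = sigma2 * ((1 - 2 * lam) * k + lam ** 2 * n)
print(empirical, theoretical)   # should be close
```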
Outside-sample simultaneous prediction:
Consider the outside-sample model
$$y_f = X_f \beta + \varepsilon_f; \qquad E(\varepsilon_f) = 0, \quad V(\varepsilon_f) = \sigma^2 I_{n_f},$$
where $y_f$ is $n_f \times 1$, $X_f$ is $n_f \times k$, $\beta$ is $k \times 1$, and $\varepsilon_f$ is $n_f \times 1$, with $\beta$ estimated from the within-sample model by
$$b = (X'X)^{-1} X'y.$$
Define the target function $T_f = \lambda y_f + (1 - \lambda) E(y_f)$ and the predictor
$$p_f = X_f b.$$
Since $T_f = X_f \beta + \lambda \varepsilon_f$, we have $p_f - T_f = X_f (b - \beta) - \lambda \varepsilon_f$ and $E(p_f - T_f) = 0$, so $p_f$ provides unbiased prediction for $T_f$.
The variance of $p_f$ is
$$\begin{aligned}
\operatorname{Var}(p_f) &= E\left[ (p_f - T_f)'(p_f - T_f) \right] \\
&= E\left[ \left( (b - \beta)'X_f' - \lambda \varepsilon_f' \right)\left( X_f (b - \beta) - \lambda \varepsilon_f \right) \right] \\
&= E\left[ \varepsilon' X (X'X)^{-1} X_f' X_f (X'X)^{-1} X' \varepsilon + \lambda^2 \varepsilon_f'\varepsilon_f - 2\lambda\, \varepsilon_f' X_f (X'X)^{-1} X' \varepsilon \right] \\
&= \sigma^2 \left[ \operatorname{tr}\left( X (X'X)^{-1} X_f' X_f (X'X)^{-1} X' \right) + \lambda^2 n_f \right] \\
&= \sigma^2 \left[ \operatorname{tr}\left( (X'X)^{-1} X_f' X_f \right) + \lambda^2 n_f \right],
\end{aligned}$$
where the cross term has zero expectation by the independence of $\varepsilon_f$ and $\varepsilon$.
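As with the within-sample case, a quick simulation sketch (hypothetical design, $\lambda = 0.5$, $\sigma^2 = 1$) can be used to check this formula:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, n_f, lam, sigma2 = 15, 3, 4, 0.5, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
X_f = np.column_stack([np.ones(n_f), rng.normal(size=(n_f, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
XtX_inv = np.linalg.inv(X.T @ X)

vals = []
for _ in range(100_000):
    eps = rng.normal(0, np.sqrt(sigma2), n)
    eps_f = rng.normal(0, np.sqrt(sigma2), n_f)
    b = XtX_inv @ X.T @ (X @ beta + eps)
    d = X_f @ b - (X_f @ beta + lam * eps_f)   # p_f - T_f
    vals.append(d @ d)

theoretical = sigma2 * (np.trace(XtX_inv @ X_f.T @ X_f) + lam ** 2 * n_f)
print(np.mean(vals), theoretical)   # should agree closely
```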