Chapter 306
Multiple Regression
with Serial Correlation
Introduction
The regular Multiple Regression routine assumes that the random-error components are independent from one
observation to the next. However, this assumption is often not appropriate for business and economic data.
Instead, it is more appropriate to assume that the error terms are positively correlated over time. These are called
autocorrelated or serially correlated data.
Consequences of the error terms being serially correlated include inefficient estimation of the regression
coefficients, underestimation of the error variance (MSE), underestimation of the variance of the regression
coefficients, and inaccurate confidence intervals.
The presence of serial correlation can be detected by the Durbin-Watson test and by plotting the residuals against
their lags.
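The error terms are assumed to follow the first-order autoregressive model
$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$$
where
$|\rho| < 1$ is the serial correlation
$u_t \sim N(0, \sigma^2)$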
The subscript t represents the time period. In econometric work, these u’s are often called the disturbances. They
are the ultimate error terms. Further details on this model can be found in chapter 12 of Neter, Kutner,
Nachtsheim, and Wasserman (1996).
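To make the error model concrete, the following is a minimal sketch that simulates first-order autoregressive errors and checks that the correlation between the series and its lag is close to ρ. The function names, the seed, and the choice to start the series from its stationary distribution are assumptions made for illustration; they are not part of NCSS.

```python
import numpy as np

def simulate_ar1_errors(n, rho, sigma, seed=0):
    """Generate eps_t = rho * eps_{t-1} + u_t with u_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma, size=n)        # the disturbances u_t
    eps = np.empty(n)
    # Assumption: the first error is scaled so the series starts in its
    # stationary distribution (variance sigma^2 / (1 - rho^2)).
    eps[0] = u[0] / np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
    return eps

def lag1_correlation(e):
    """Pearson correlation between a series and its first lag."""
    e = np.asarray(e, dtype=float)
    return float(np.corrcoef(e[1:], e[:-1])[0, 1])

eps = simulate_ar1_errors(n=5000, rho=0.7, sigma=1.0)
print("lag-1 correlation:", round(lag1_correlation(eps), 3))
```

Plotting `eps[1:]` against `eps[:-1]` gives the residual-versus-lag plot mentioned above; a clear upward trend indicates positive serial correlation.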
© NCSS, LLC. All Rights Reserved.
NCSS Statistical Software NCSS.com
Cochrane-Orcutt Procedure
Several methods have been suggested to estimate the autoregressive error model. We have adopted the Cochrane-
Orcutt procedure as given in Neter, Kutner, Nachtsheim, and Wasserman (1996). This is an iterative procedure
that involves several steps.
1. Ordinary least squares. The regression coefficients are estimated using ordinary least squares. The array
of residuals is calculated.
2. Estimation of ρ. The serial correlation is estimated from the current residuals $e_t = Y_t - \hat{Y}_t$ using the
formula
$$\hat{\rho} = \frac{\sum_{t=2}^{n} e_t\, e_{t-1}}{\sum_{t=2}^{n} e_{t-1}^2}$$
3. Obtain transformed data. A new set of data is created using the formulas
$$Y_t' = Y_t - \hat{\rho}\,Y_{t-1}$$
$$X_{1t}' = X_{1t} - \hat{\rho}\,X_{1,t-1}$$
$$\vdots$$
$$X_{pt}' = X_{pt} - \hat{\rho}\,X_{p,t-1}$$
4. Fit model to transformed data. Ordinary least squares is used to fit the following multiple regression to
the transformed data.
$$Y_t' = b_0' + b_1' X_{1t}' + b_2' X_{2t}' + \cdots + b_p' X_{pt}'$$
5. Create the regression model for the untransformed data. The regression equation of the untransformed
data is created using the following equations.
$$b_0 = \frac{b_0'}{1 - \hat{\rho}}$$
$$b_1 = b_1'$$
$$b_2 = b_2'$$
$$\vdots$$
$$b_p = b_p'$$
The estimated standard errors of the regression coefficients are given by
$$s(b_0) = \frac{s(b_0')}{1 - \hat{\rho}}$$
$$s(b_1) = s(b_1')$$
$$s(b_2) = s(b_2')$$
$$\vdots$$
$$s(b_p) = s(b_p')$$
6. Iterate until convergence is reached. Steps 2 – 4 are then repeated until the value of $\hat{\rho}$ stabilizes. Usually,
only four or five iterations are necessary.
7. Calculate Durbin-Watson test on transformed residuals. As a final diagnostic check, the Durbin-Watson
test may be run on the residuals $e_t' = Y_t' - \hat{Y}_t'$ from the transformed regression model.
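The steps above can be sketched in a few lines of NumPy. This is an illustrative outline under stated assumptions, not the NCSS implementation: the function names `ols` and `cochrane_orcutt`, the default limits, and the stopping rule (stop when the change in the estimate of ρ falls below a threshold) are hypothetical choices made for this example.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and standard errors."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    mse = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return coef, se

def cochrane_orcutt(X, y, max_iter=5, min_rho_change=1e-5):
    """Sketch of the iterative procedure; returns (coef, se, rho)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    A = np.column_stack([np.ones(len(y)), X])
    coef, se = ols(X, y)                                  # step 1: ordinary least squares
    rho = 0.0
    for _ in range(max_iter):
        e = y - A @ coef                                  # residuals on the original scale
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])    # step 2: estimate rho
        y_t = y[1:] - rho_new * y[:-1]                    # step 3: transform the data
        X_t = X[1:] - rho_new * X[:-1]
        coef_t, se_t = ols(X_t, y_t)                      # step 4: fit transformed model
        coef, se = coef_t.copy(), se_t.copy()             # step 5: slopes carry over
        coef[0] = coef_t[0] / (1.0 - rho_new)             # intercept divided by 1 - rho
        se[0] = se_t[0] / (1.0 - rho_new)
        converged = abs(rho_new - rho) < min_rho_change   # step 6: check stabilization
        rho = rho_new
        if converged:
            break
    return coef, se, rho
```

Calling `cochrane_orcutt(X, y)` with `X` as an n-by-p array (or a single column) returns the back-transformed coefficients with the intercept first, their standard errors, and the final estimate of ρ. Note that the first observation is lost when the data are transformed.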
Durbin-Watson Test
The Durbin-Watson test is often used to test for positive or negative, first-order, serial correlation. It is calculated
as follows
$$DW = \frac{\sum_{j=2}^{N} \left(e_j - e_{j-1}\right)^2}{\sum_{j=1}^{N} e_j^2}$$
The distribution of this test statistic is difficult to obtain because it involves the X values. Originally, Durbin and Watson (1950, 1951) gave
a pair of bounds to be used. However, these bounds leave a wide range of values for which the test is inconclusive. Instead of
using these bounds, we calculate the probability using the beta distribution approximation suggested by Durbin and
Watson (1951). This approximation has been shown to be accurate to three decimal places in most cases, which is all
that is needed for practical work.
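As a small illustration of the statistic itself, the sketch below computes DW directly from an array of residuals. It does not compute the probability level, which requires the beta approximation described above; the function name is an assumption made for this example.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Residuals that alternate in sign (strong negative serial correlation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # 12 / 4 = 3.0
# Residuals that never change (extreme positive serial correlation).
print(durbin_watson([2.0, 2.0, 2.0]))          # 0 / 12 = 0.0
```

The statistic ranges from 0 to 4. Values near 2 suggest little first-order serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation.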
Forecasts
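The forecast for a specific set of independent variable values, j periods after the last observed period n, adds the
expected carry-over of the last residual to the regression prediction:
$$F_{n+j} = \hat{Y}_{n+j} + \hat{\rho}^{\,j} e_n$$
where
$$e_n = Y_n - \hat{Y}_n$$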
The approximate $1 - \alpha$ prediction interval for this forecast is
$$F_{n+j} \pm t_{1-\alpha/2,\,n-3}\; s_F$$
where $s_F$ is the standard error of the prediction interval based on the transformed data.
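The arithmetic of a forecast and its interval can be sketched as follows. This example assumes the standard first-order autoregressive forecast, in which the regression prediction is adjusted by the last residual multiplied by the estimated ρ raised to the number of periods ahead. The function names and all numeric inputs are hypothetical, and the t critical value and standard error are passed in rather than computed.

```python
def ar1_forecast(y_hat_future, last_residual, rho_hat, steps_ahead):
    """Regression prediction plus the expected carry-over of the last residual."""
    return y_hat_future + (rho_hat ** steps_ahead) * last_residual

def prediction_interval(forecast, t_critical, s_f):
    """Approximate interval: forecast plus or minus t_critical times s_F."""
    half_width = t_critical * s_f
    return forecast - half_width, forecast + half_width

# Hypothetical inputs: prediction 100.0, last residual 4.0, rho-hat 0.5, two periods ahead.
f = ar1_forecast(100.0, 4.0, 0.5, 2)          # 100.0 + 0.25 * 4.0 = 101.0
low, high = prediction_interval(f, 2.0, 1.5)  # 101.0 -/+ 3.0
print(f, low, high)
```

Because the absolute value of ρ is less than one, the adjustment shrinks toward zero as the forecast horizon grows, so distant forecasts approach the plain regression prediction.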
Data Structure
The data are entered in two or more variables. An example of data appropriate for this procedure is shown below.
These data give the annual values for several economic statistics. Later in this chapter, these data will be used in
an example in which Housing is forecast from Mort5Yr and DispInc. These data are stored in a dataset called
Housing.NCSS. Note that only two decimal places are displayed here, although more decimal places are stored in
the dataset.
Procedure Options
This section describes the options available in this procedure.
Variables Tab
This panel specifies the variables used in the analysis.
Dependent Variable
Y: Dependent Variable(s)
This option specifies one or more dependent (Y) variables. If more than one variable is specified, a separate
analysis is run for each.
If you want to create powers and cross-products of these variables, specify an appropriate model in the ‘Custom
Model’ field under the Model tab.
If you want to create predicted values of Y for values of X not in your database, add the X values to the bottom of
the database. These rows will not be used during the estimation phase, but predicted values will be generated for them
on the reports.
Estimation Options
Maximum Cochrane-Orcutt Iterations
This is the maximum number of iterations that the procedure will cycle through. Some authors recommend only
one iteration. Others recommend stopping once the Durbin-Watson test is not significant. This option lets you
stop after a specific number of iterations. Usually, four or five iterations should be plenty.
Minimum Rho Change
If the change in rho (serial correlation) from one iteration to the next is less than this amount, the algorithm will
stop iterating. We suggest you use a small amount such as 0.00001.
Alpha Levels
Alpha of C.I.’s and Tests
The value of alpha for the statistical tests and confidence intervals is specified here. Usually, this number will
range from 0.1 to 0.001. A common choice for alpha is 0.05, but this value is a legacy from the age before
computers when only printed tables were available. You should determine a value appropriate for your particular
study.
Alpha of Assumptions
This value specifies the significance level that must be achieved to reject a preliminary test of an assumption. In
regular hypothesis tests, common values of alpha are 0.05 and 0.01. However, most statisticians recommend that
preliminary tests use a larger alpha such as 0.10, 0.15, or 0.20.
We recommend 0.20.
Model Tab
These options control the regression model.
Model Specification
Which Model Terms
This option specifies which terms (terms, powers, cross-products, and interactions) are included in the regression
model. For a time-series regression model, select Up to 1-Way.
The other options on this tab are covered in detail in the Multiple Regression chapter. We refer you to that chapter
for further details.
Reports Tab
The following options control which reports and plots are displayed.
Select Reports
Run Summary ... Residuals
Each of these options specifies whether the indicated report is calculated and displayed. Note that since some of
these reports provide results for each row, they may be too long for normal use when requested on large
databases.
Report Options
Show All Rows
This option determines whether row-by-row statistics are displayed for all rows or for only a few designated rows.
When checked, predicted values, residuals, and other row-by-row statistics will be displayed for all rows used in
the analysis.
When not checked, predicted values and other row-by-row statistics will be displayed for only those rows in
which the dependent variable’s value is missing.
Precision
Specifies the precision of numbers in the report. Single precision will display seven-place accuracy, while
double precision will display thirteen-place accuracy.
Variable Names
This option lets you select whether to display variable names, variable labels, or both.
Skip Line After
The names of the indicator variables can be too long to fit in the space provided. If the name contains more
characters than the number specified here, only the name is shown on the first line of the report and the rest of the
output is placed on the next line.
Enter 1 when you want each variable’s results printed on two lines.
Enter 100 when you want each variable’s results printed on a single line.
Plots Tab
These options control the titles and style files used on each of the plots.
Select Plots
Histogram ... Residuals vs X Plot
Indicate whether to display these plots. Click the plot format button to change the plot settings.
Storage Tab
These options let you specify if, and where on the dataset, various statistics are stored.
Warning: Any data already in these columns are replaced by the new data. Be careful not to specify columns that
contain important data.
This report summarizes the multiple regression results. It presents the variables used, the number of rows used,
and the basic results. The estimated value of the autocorrelation (rho) has been added to this report. Otherwise, it
is identical to the corresponding report in the regular Multiple Regression procedure.
Note that values such as R², Mean Square Error, etc., are calculated on the transformed data.
For each variable, the count, arithmetic mean, standard deviation, minimum, and maximum are computed. This
report is particularly useful for checking that the correct variables were selected.
Estimated Model
445.136489079996+ 4.4434007069797E-03*DispInc-8.53714263704248*Mort5Yr
This section reports the values and significance tests of the regression coefficients. Note that the intercept has
been corrected by dividing by 1-rho. Other than this, the report has the same definitions as in regular Multiple
Regression.
This section reports the analysis of variance table. Note it was calculated from the transformed data on the last
iteration. Other than this, the report has the same definitions as in regular Multiple Regression.
This section reports the autocorrelation structure of the residuals both before and after the model is corrected for
serial correlation. It has the same definitions as in the regular Multiple Regression report.
Confidence intervals for the mean response of Y given specific levels for the IV’s are provided here.
A prediction interval for the individual response of Y given specific values of the IV’s is provided here for each
row. Note that the forecasts start where the actual housing values are blank.
Residual Report

                                            Absolute
       Actual      Predicted                Percent
Row    Housing     Housing      Residual    Error
1      420.722     445.738
2      431.522     447.504      -15.982     3.704
3      448.085     462.874      -14.789     3.300
4      447.923     464.696      -16.773     3.745
5      451.401     454.340       -2.939     0.651
6      432.474     438.269       -5.795     1.340
7      403.338     409.328       -5.990     1.485
8      407.922     411.549       -3.627     0.889
.      .           .             .          .
.      .           .             .          .
.      .           .             .          .
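The Residual and Absolute Percent Error columns can be reproduced from the Actual and Predicted columns. The sketch below assumes that the residual is actual minus predicted and that the absolute percent error is 100 times the absolute residual divided by the absolute actual value; these definitions are inferred from the displayed values rather than stated in this chapter, and the function name is hypothetical.

```python
# (row, actual Housing, predicted Housing) copied from the report above.
rows = [
    (2, 431.522, 447.504),
    (3, 448.085, 462.874),
    (4, 447.923, 464.696),
    (5, 451.401, 454.340),
    (6, 432.474, 438.269),
    (7, 403.338, 409.328),
    (8, 407.922, 411.549),
]

def residual_and_abs_pct_error(actual, predicted):
    """Residual (actual - predicted) and absolute percent error (assumed definitions)."""
    residual = actual - predicted
    return residual, 100.0 * abs(residual) / abs(actual)

for row, actual, predicted in rows:
    residual, pct = residual_and_abs_pct_error(actual, predicted)
    print(f"{row:>3} {actual:9.3f} {predicted:9.3f} {residual:9.3f} {pct:7.3f}")
```

For row 2, for example, the residual is 431.522 - 447.504 = -15.982 and the absolute percent error is 100 × 15.982 / 431.522 ≈ 3.704, which matches the report.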
Histogram
The purpose of the histogram and density trace of the residuals is to evaluate whether they are normally
distributed. A dot plot is also given that highlights the distribution of points in each bin of the histogram. Unless
you have a large sample size, it is best not to rely on the histogram for visually evaluating normality of the
residuals. The better choice would be the normal probability plot.
Y vs X’s Plots
Actually, a regression analysis should always begin with a plot of Y versus each IV. These plots often show
outliers, curvilinear relationships, and other anomalies.