Lecture Note 11 (2014 S)
Forecasting, Box-Jenkins, and Unit Root Tests
11.1 Forecast Evaluation
11.2 Forecast Exercise
11.3 Box-Jenkins Methodology
11.4 Unit Root Tests
11.1 Forecast Evaluation
Recall:
Suppose we know the true data-generating process, for example the AR(1) model $y_t = a_1 y_{t-1} + \varepsilon_t$. Then the one-step-ahead forecast would be $E_t[y_{t+1}] = a_1 y_t$, and the realized value differs from this forecast by the unpredictable shock $\varepsilon_{t+1}$. Therefore, we always encounter forecast error.
In practice, we never know the actual order of the process or the true parameters, i.e., $p$, $q$, $a_1, \ldots, a_p$, $b_1, \ldots, b_q$ are all unknown and must be estimated.
Obviously, the forecasts and forecast errors will differ across competing models, so we need criteria to compare them.
Suppose we have 500 observations and we use the first 450 to estimate the model. At t = 450 we forecast $y_{451}$ and obtain the forecast error $e_{451} = y_{451} - \hat{y}_{451}$. At t = 451, we can use the data available at that point to re-estimate the model and calculate $\hat{y}_{452}$ and $e_{452}$, and so on. Repeating the process, we will have 50 forecasts ($\hat{y}_{451}, \ldots, \hat{y}_{500}$) and 50 forecast errors ($e_{451}, \ldots, e_{500}$). How the estimation sample is updated at each step defines the forecasting scheme.
(a) _______ scheme
At time 450, when we forecast $y_{451}$, the model is estimated based on observations 1-450.
At time 451, when we forecast $y_{452}$, the model is estimated based on observations 1-451.
At time 452, when we forecast $y_{453}$, the model is estimated based on observations 1-452.
(b) _______ scheme
At time 450, when we forecast $y_{451}$, the model is estimated based on observations 1-450.
At time 451, when we forecast $y_{452}$, the model is estimated based on observations 2-451.
At time 452, when we forecast $y_{453}$, the model is estimated based on observations 3-452.
(c) _______ scheme
At time 450, when we forecast $y_{451}$, the model is estimated based on observations 1-450.
At time 451, when we forecast $y_{452}$, the model is still estimated based on observations 1-450 (the estimates are not updated).
At time 452, when we forecast $y_{453}$, the model is still estimated based on observations 1-450.
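As an illustration, scheme (a), the expanding estimation window, could be coded in Stata along the following lines. This is only a sketch, assuming 500 observations of a variable y that has been tsset on the time variable t and an AR(1) forecasting model; all names are illustrative.

generate yhat = .
forvalues i = 450/499 {
    quietly arima y if t <= `i', ar(1)            /* re-estimate on observations 1..i        */
    quietly predict fhat                          /* one-step-ahead predictions              */
    quietly replace yhat = fhat if t == `i' + 1   /* keep only the forecast of y at time i+1 */
    drop fhat
}
generate e   = y - yhat                           /* the 50 one-step-ahead forecast errors   */
generate sqe = e^2
summarize sqe                                     /* the mean of sqe is the out-of-sample MSE */

A rolling scheme would replace the condition `if t <= `i'` with a fixed-length window such as `if t > `i'-450 & t <= `i'`, and a fixed scheme would estimate the model only once on observations 1-450.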
Forecast Evaluation Criteria
Let $e_i = y_i - \hat{y}_i$, $i = 1, \ldots, n$, denote the out-of-sample forecast errors.
(A) Mean Squared Error: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^2$
(B) Root Mean Squared Error: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$
(C) Mean Percentage Error: $\mathrm{MPE} = \frac{100}{n}\sum_{i=1}^{n} \frac{e_i}{y_i}$
(D) Mean Absolute Error: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$
(E) Mean Absolute Percentage Error: $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left|\frac{e_i}{y_i}\right|$
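Assuming the out-of-sample forecast errors are stored in a variable e and the actual values in y (names are illustrative, following the notation above), these criteria can be computed in Stata along these lines:

generate sqe  = e^2
generate abse = abs(e)
generate pe   = 100*e/y                 /* percentage error           */
generate ape  = abs(pe)                 /* absolute percentage error  */
quietly summarize sqe
display "MSE  = " r(mean) "    RMSE = " sqrt(r(mean))
quietly summarize pe
display "MPE  = " r(mean)
quietly summarize abse
display "MAE  = " r(mean)
quietly summarize ape
display "MAPE = " r(mean)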
How do we know whether the MSEs of two competing models are statistically different from each other? Put differently, how do we know that the difference between the two models is not due to pure chance?
The F Statistic
Taking the MSE as an example, we can use the F test
$F = \dfrac{\sum_{i=1}^{n} e_{Ai}^2}{\sum_{i=1}^{n} e_{Bi}^2} \sim F(n, n)$,
where the numerator is calculated based on the forecast errors of Model A and the denominator based on those of Model B. Under the null hypothesis, Models A and B are indifferent (equally accurate), so F should be close to 1.
o Note that the F test here is valid only if
(i) the forecast errors have zero mean and are normally distributed,
(ii) the forecast errors are serially uncorrelated, and
(iii) the forecast errors of the two models are contemporaneously uncorrelated with each other.
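A sketch of this F test in Stata, assuming the squared forecast errors of the two models are stored in sqE_A and sqE_B (as in the example later in this note) and that both models have the same number of forecasts:

quietly summarize sqE_A
scalar mseA = r(mean)
scalar n    = r(N)
quietly summarize sqE_B
scalar mseB = r(mean)
scalar F = max(mseA, mseB)/min(mseA, mseB)    /* larger MSE in the numerator */
display "F = " F "    one-sided p-value = " Ftail(n, n, F)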
Denote the loss differential between the two forecasts by $d_i = e_{Ai}^2 - e_{Bi}^2$.
Two models are equally good if $E(d_i) = 0$.
If Assumptions (i) and (ii) discussed for the F test hold, Granger and Newbold (1976) showed that, for a quadratic loss function, testing $E(d_i) = 0$ is equivalent to testing $\rho_{xz} = 0$, where $x_i = e_{Ai} + e_{Bi}$ and $z_i = e_{Ai} - e_{Bi}$ (note that $d_i = x_i z_i$). The test statistic is
$t = \dfrac{r_{xz}}{\sqrt{(1 - r_{xz}^2)/(n-1)}} \sim t_{n-1}$,
where $r_{xz}$ is the sample correlation between $x$ and $z$ and $n$ is the number of forecasts.
Example:
Suppose we have one-step-ahead forecast errors for 7 periods from each of two competing models. We compute the loss differential $d_i$ and regress $d$ on a constant only, i.e., $d_i = \alpha + u_i$. If the null hypothesis $\alpha = 0$ is true, the usual t statistic $t = \hat{\alpha}/se(\hat{\alpha})$ follows a t distribution with $n - 1 = 6$ degrees of freedom, where $n = 7$ is the number of forecasts.
We consider the AR(1) and AR(2) models, and use the first 90 observations for estimation. We keep the last 10 observations for forecast evaluation.
Stata> arima y, arima (1,0,0) noconstant
ARIMA regression

Sample: 1 - 100                                 Number of obs   =        100
                                                Wald chi2(1)    =     120.43
Log likelihood = -146.5589                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .7881723   .0718208    10.97   0.000     .6474061    .9289385
-------------+----------------------------------------------------------------
      /sigma |   1.042636   .0546465    19.08   0.000     .9355311    1.149742
------------------------------------------------------------------------------

. estat ic

------------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df         AIC        BIC
-------------+----------------------------------------------------------------
           . |    100          .   -146.5589       2    297.1177   302.3281
------------------------------------------------------------------------------
Stata> arima y, arima (2,0,0) noconstant
Comparing the information criteria of the two models:
(a) The AIC and BIC suggest the AR(1) is better; the AR(1) fits the estimation sample better.
(b) But this doesn't necessarily mean the AR(1) delivers a better forecast.
(c) The estimates are based on 90% of the sample observations. We can evaluate the forecast performances of the AR(1) and the AR(2) based on the remaining 10% of the observations.
ARIMA regression

Sample: 1 - 100                                 Number of obs   =        100
                                                Wald chi2(2)    =     120.05
Log likelihood = -146.5302                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .8056681   .0917879     8.78   0.000     .6257671    .9855691
         L2. |  -.0255046   .1024235    -0.25   0.803    -.2262508    .1752417
-------------+----------------------------------------------------------------
      /sigma |   1.042314   .0551962    18.88   0.000      .934131    1.150496
------------------------------------------------------------------------------

------------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df         AIC        BIC
-------------+----------------------------------------------------------------
           . |    100          .   -146.5302       3    299.0604   306.8759
------------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
One-step-ahead forecast
Stata> predict yA if t > (90) /* 1-step-ahead forecast for y after the 90th observation */
Stata> tsline y yA /* graphically compare the true value y and the forecast yA */
Stata> tsline yA if t > (90) || tsline y if t < (91) /* plot the forecast yA and the historical y together */
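If we literally want the parameter estimates to be based on only the first 90 observations, as described above, one possibility (a sketch; the model and cut-off follow the example) is to restrict the estimation sample before predicting:

Stata> arima y if t <= 90, arima(1,0,0) noconstant /* estimate on observations 1-90 only */
Stata> predict yA if t > 90 /* one-step-ahead forecasts for observations 91-100 */
Stata> tsline y yA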
The AR(1) model:
[Figure: historical y and the one-step-ahead forecast (xb prediction, one-step) plotted against t, 0-100.]
The AR(2) model:
[Figure: historical y and the one-step-ahead forecast (xb prediction, one-step) plotted against t, 0-100.]
Note: Although the BIC and AIC suggest the AR(1) model is better, graphically the forecast points are very similar between these two models.
Forecast Evaluation: one-step-ahead forecast
Stata> generate errorA = y - yA /* calculate the forecast error */
Stata> generate SqError = errorA*errorA /* square the forecast errors */
Stata> summarize SqError /* the reported mean is the mean squared error */
Model A:
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       sqE_A |        10    2.568757    5.488667   .1343741   18.02609

Model B:
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       sqE_B |        11    2.330664    5.306183   .0487499   18.17375

If we compare the MSEs of Model A and Model B, Model B's MSE is smaller.
Suppose we create a variable called d, defined as the loss differential between Model A and Model B (for quadratic loss, the difference between the squared forecast errors of the two models, where errorA contains the forecast errors of Model A). Regress d on the constant term only and test whether the constant is significantly different from zero.
Another test is the Granger-Newbold test. We create the variables x = errorA + errorB and z = errorA - errorB, and use the t test discussed previously. The t statistic is 0.0488, which is clearly less than the critical value, so we cannot reject the null hypothesis that the two models are equally good.
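A sketch of the Granger-Newbold calculation in Stata, assuming errorA and errorB contain the one-step-ahead forecast errors of Models A and B (errorB created analogously to errorA above):

generate x = errorA + errorB
generate z = errorA - errorB
quietly correlate x z
scalar rxz = r(rho)
scalar n   = r(N)
scalar tGN = rxz/sqrt((1 - rxz^2)/(n - 1))    /* Granger-Newbold t statistic */
display "t = " tGN "    5% critical value = " invttail(n - 1, 0.025)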
J-step-ahead forecast
Stata> tsline y_J if t > (90) || tsline y if t < (91)
The true model here is an AR(1) without a constant, so its (long-run) mean is 0. [Recall: $E(y_t) = a_0/(1 - a_1) = 0$ when $a_0 = 0$.]
Therefore, over time the forecast values converge to 0.
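To see why, recall the standard j-step-ahead forecast of an AR(1) (a short derivation, with notation as above):
$$E_t[y_{t+j}] = a_0\left(1 + a_1 + \cdots + a_1^{\,j-1}\right) + a_1^{\,j}\, y_t \;\longrightarrow\; \frac{a_0}{1 - a_1} \quad \text{as } j \to \infty \text{ (for } |a_1| < 1),$$
which equals 0 here because $a_0 = 0$.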
[Figure: J-step-ahead forecast and historical y plotted against t, 0-100.]
Example:
Use 200 simulated data points from an AR(1) process, $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$. We use the first 180 data points to estimate the model and then forecast 20 points ahead.
Stata> arima y if t < (181), ar(1) /* estimation uses data points 1, 2, ..., 180 */
Stata> predict y_J, dynamic(180) /* at time 180, forecast y_181, ..., y_200 dynamically */
Stata> tsline y_J if t > (179) || tsline y if t < (181)
Over time, the J-step-ahead forecasts converge to the (long-run) mean of the series, $a_0/(1 - a_1)$.
[Figure: J-step-ahead forecasts and historical y plotted against time, 0-200.]
Example: different commands for estimating ARIMA models.
Use 10,000 simulated data points from an AR(1) process with a constant, where $\varepsilon_t$ is white noise.
Stata A> arima y, arima(1,0,0)
Stata B> arima y, ar(1) /* for an ARMA(p,q) model we can type: arima y, ar(1/p) ma(1/q) */
Commands A and B produce identical output:

ARIMA regression

Sample: 0 - 9999                                Number of obs   =      10000
                                                Wald chi2(1)    =    3359.37
Log likelihood = -14251.09                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
       _cons |   4.017689   .0200645   200.24   0.000     3.978363    4.057015
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .4988331   .0086065    57.96   0.000     .4819647    .5157015
-------------+----------------------------------------------------------------
      /sigma |   1.006175   .0069936   143.87   0.000     .9924676    1.019882
------------------------------------------------------------------------------
Stata C> reg y L1.y /* note that if we have 3 lags of y, we can type: reg y L1.y L2.y L3.y */
Stata D> arima y L1.y /* here L1.y enters as an explanatory variable in the arima command */
Note: Command C gives us the OLS estimates of the AR(1) regression.
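One detail worth keeping in mind when comparing the reg and arima output (a general Stata point, not specific to this example): in the arima output the reported _cons is the mean of the series, whereas reg reports the intercept of the lag regression. For an AR(1) the two are linked by
$$a_0 = \mu\,(1 - a_1) \approx 4.0177 \times (1 - 0.4988) \approx 2.01.$$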
11.3 Box-Jenkins Methodology
Several assumptions are imposed in the models.
o The error term $\varepsilon_t$ of the time series model is white noise: its mean is zero, its variance is a constant $\sigma^2$, and it is not serially correlated.
How do we build an ARMA model for a given time series? Please state the procedure step by step.
(1) Identification
o Stationary or non-stationary? (If non-stationary, take the difference of the variable first, $\Delta y_t = y_t - y_{t-1}$.)
o If the series is stationary, then we might use correlograms to decide the model and the number of lags.
(2) Estimation
o Run the regressions. If we can't decide on the model in stage (1), we can use the AIC (or BIC) criterion here.
(3) Diagnostic Checking
o We need to include enough AR and MA terms to make sure the residuals of the model are white noise.
o The coefficients at lags p and q must be significant, but the interior ones need not be. We can drop the interim terms if they are not useful.
o Test the residuals; if the residuals are white noise, the model is considered adequate.
(4) Forecasting (either one-step-ahead or j-step-ahead); see the Stata sketch below.
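A compact Stata sketch of the four steps, assuming a tsset series y; the AR(1) specification and the 90-observation estimation sample are illustrative, following the earlier example:

ac y                                  /* (1) Identification: correlograms        */
pac y
dfuller y                             /*     check stationarity                  */
arima y if t <= 90, arima(1,0,0)      /* (2) Estimation                          */
estat ic                              /*     compare AIC/BIC across candidates   */
predict res if t <= 90, residuals     /* (3) Diagnostic checking                 */
wntestq res                           /*     Ljung-Box white-noise test          */
predict yhat if t > 90                /* (4) Forecasting: one-step-ahead         */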
11.4 Unit Root Tests
Stationarity is extremely important for time series analysis (not just for the AR, MA, and ARMA models).
If a time series is not stationary, we must transform the non-stationary series into a stationary series before any regression analysis; otherwise, the regression results are not reliable. Usually, taking the (first) difference of the variable is one possible way to make the series stationary, i.e., $\Delta y_t = y_t - y_{t-1}$.
Graphically, the ACF diagram of the series gives a hint: if the autocorrelations die out very slowly, we suspect the series is non-stationary. But how do we formally test whether the series is stationary?
Suppose we have an AR(1) model, $y_t = a_1 y_{t-1} + \varepsilon_t$.
A unit root exists if $a_1 = 1$.
Therefore, we can test whether $a_1$ is significantly different from 1 or not.
Alternatively, subtracting $y_{t-1}$ from both sides, we can test the regression $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$, where $\gamma = a_1 - 1$, and examine whether $\gamma$ is zero.
11.4.1 Dickey-Fuller (DF) Test
The null hypothesis is $H_0: \gamma = 0$ (the series has a unit root); the alternative is $H_1: \gamma < 0$ (the series is stationary). Under the null, the t statistic of $\hat{\gamma}$ does not follow the usual t distribution, so it is compared with the Dickey-Fuller critical values.
There are three versions of the DF test:
Case 1: $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$
o Case 1 is the simplest form.
Case 2: $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$
o We include an intercept in the model.
Case 3: $\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t$
o We include an intercept and a trend in the model.
Whether to include the intercept and/or the time trend is an empirical question.
The DF test is very restrictive, because it applies only to an AR(1).
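In Stata the three cases correspond to different dfuller options (shown here for the inflation example that follows; Case 2 is the default):

Stata> dfuller inf, noconstant /* Case 1: no intercept, no trend */
Stata> dfuller inf /* Case 2: intercept only */
Stata> dfuller inf, trend /* Case 3: intercept and trend */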
Example: Fertility Rate
Example: Singapore Inflation Rate (data file can be found in edventure)
[Figure: Singapore inflation rate (inf) plotted against time, 1980m1-2010m1.]
Stata> dfuller inf, regress trend lags(1)
/* inf (inflation) is the variable name; I include a time trend and 1 lag in the unit root test */

Dickey-Fuller test for unit root                   Number of obs   =       357

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.784            -3.986            -3.426            -3.130
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0174

Stata> dfuller inf, regress /* without including the trend */

Dickey-Fuller test for unit root                   Number of obs   =       357

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -3.798            -3.451            -2.876            -2.570
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0029
Augmented Dickey-Fuller (ADF) Test
As discussed previously, the DF test applies only to an AR(1). But an AR(1) might not capture all of the serial correlation in $y_t$; if the error term is serially correlated, the DF test is wrong (invalid). The ADF test therefore augments the test regression with lagged differences of $y_t$, and, as before, we can include an intercept and/or a time trend in the model:
Case (1): $\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + \varepsilon_t$
Case (2): $\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + \varepsilon_t$
Case (3): $\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=1}^{k} \beta_i \Delta y_{t-i} + \varepsilon_t$
It is called the augmented DF test because the test regression is augmented by lags of $\Delta y_t$.
We can graph the series and its correlograms, but how many lags of $\Delta y_t$ should we include?
Schwert (1989) suggested that we can set the number of lags no larger than $p_{\max} = \big[\, 12 \times (T/100)^{1/4} \big]$, where $T$ is the number of observations and $[\cdot]$ denotes the integer part.
Example:
If we have 358 monthly inflation data points, then we set the number of lags in the ADF test at most up to $[\,12 \times (358/100)^{1/4}] = 16$.
Starting from the maximum lag, if the estimate of the coefficient on the longest lag is insignificant, we drop that lag and re-estimate; we stop once the coefficient on the longest remaining lag is significant.
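As a quick arithmetic check of the rule of thumb with T = 358:

Stata> display floor(12*(358/100)^(1/4)) /* = 16 */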
At first, we include 16 lags. Stata> dfuller inf, regress lags(16)
Since the estimate of the coefficient on the 16th lag is not significant, we reduce the number of lags and re-estimate.

Phillips-Perron (PP) Test
The Phillips-Perron (PP) test is an alternative to the ADF test. Practically, the PP test is better than the ADF test because it is more robust to violations of the classical linear assumptions (such as heteroskedasticity and serial correlation in the errors). Another advantage is that we do not need to specify the number of lags.
Example: Inflation rate
Stata> pperron inf

Phillips-Perron test for unit root                 Number of obs   =       357
                                                   Newey-West lags =         5

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(rho)          -27.674           -20.386           -14.000           -11.200
 Z(t)             -4.243            -3.451            -2.876            -2.570
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0006

The test statistic is larger (in absolute value) than the critical values, so we reject the null hypothesis of a unit root.
Kwiatkowski, Phillips, Schmidt and Shin (KPSS) developed an alternative (and complement) to the traditional unit root tests, such as the ADF and PP tests.
For the ADF and PP tests, the null hypothesis is that the series has a unit root (non-stationary).
For the KPSS test, the null hypothesis is that the series is stationary.
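Stata does not ship a built-in KPSS command, but a user-written one (kpss, by Christopher Baum) is available from SSC; a sketch, assuming it is installed:

Stata> ssc install kpss
Stata> kpss inf /* null hypothesis: the inf series is stationary */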
To make sure the first-differenced variable is stationary, we perform the unit root tests again on the first difference $\Delta y_t$. If $\Delta y_t$ is stationary, then we can start the B-J approach on the differenced series. Otherwise, take the difference of $\Delta y_t$ again.
We call this an ARIMA(p, 1, q) model; the 1 indicates that the series has been differenced once to achieve stationarity.
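A sketch of this check in Stata; the series y and the ARIMA(1,1,1) order are illustrative only:

Stata> dfuller D.y /* unit root test on the first difference */
Stata> arima y, arima(1,1,1) /* an ARMA(1,1) fitted to the differenced series D.y */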
Recall the Box-Jenkins approach: Stationarity → Identification → Estimation → Diagnostic Checking → Forecasting.
Sometimes non-stationarity is due to seasonal effects, i.e., some seasons have a different pattern from the others. Therefore, we can take first-order differences combined with a seasonal difference at lag 4 (for quarterly data), as sketched below.
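In Stata, the seasonal-difference operator S#. can be combined with the ordinary difference operator; for example (a sketch for quarterly data, variable name illustrative):

Stata> generate dsy = S4.D.y /* first difference combined with a seasonal difference at lag 4 */
Stata> tsline dsy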
[A] Vector Autoregression (VAR): a generalization of the AR model to multiple time series.
[B] Parameter Instability and Structural Change (* not tested in the exam)
Time series data are often non-stationary. This can be due to a time trend, and/or a seasonal component, and/or a structural change.
Trend-stationary: once the trend is removed, the series is a stationary process.
How do we test for a structural change (or break)? How do we model the change?
We may know the date of the event, for instance the global financial crisis, 9/11, etc., that changes the (econometric) system.
Suppose you suspect there is a structural change at a particular date. Then it is straightforward to use the Chow test.
(1) $y_t = a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + \varepsilon_{1t}$
(2) $y_t = b_0 + b_1 y_{t-1} + \cdots + b_p y_{t-p} + \varepsilon_{2t}$
We use the data before the change, for example before 9/11 or July 1997, to estimate Model 1, and the data on and after 9/11 to estimate Model 2.
$H_0: a_0 = b_0,\ a_1 = b_1,\ \ldots,\ a_p = b_p$ (no structural change) vs $H_1$: $H_0$ is wrong.
The Chow test statistic is
$F = \dfrac{(SSR_R - SSR_1 - SSR_2)/k}{(SSR_1 + SSR_2)/(n_1 + n_2 - 2k)} \sim F(k,\ n_1 + n_2 - 2k)$,
where $SSR_R$ is the sum of squared residuals from the restricted model estimated on the whole sample, $SSR_1$ and $SSR_2$ are the sums of squared residuals from the two subsample regressions, $n_1$ and $n_2$ are the subsample sizes, and $k$ is the number of estimated parameters in each model.
In practice, we plot the graph first.
We suspect there is a structural change at date 101 because after date 100 the series y increases dramatically. In this simple exercise, we also suspect that the model is an AR(2) because of the ac and pac graphs (if we are lucky enough).
We can see that the main difference between before and after date 100 is (perhaps) only the intercept: there is a jump.
So we can run the regression using the AR(2) model on the whole data set to get $SSR_R$. In addition, we run two regression models: one uses data points 1-100, and the other uses data points 101-213. The F statistic can then be computed from the three sums of squared residuals, as sketched below.
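Using the stored sums of squared residuals, the Chow F statistic can be computed along these lines (a sketch; break date 100 and the AR(2) specification follow the example, and the exact subsample cut-offs are illustrative):

quietly reg y l1.y l2.y                     /* restricted model: whole sample           */
scalar ssr_r = e(rss)
quietly reg y l1.y l2.y if t <= 100         /* subsample 1                              */
scalar ssr_1 = e(rss)
scalar n1    = e(N)
quietly reg y l1.y l2.y if t > 100          /* subsample 2                              */
scalar ssr_2 = e(rss)
scalar n2    = e(N)
scalar k = 3                                /* parameters per model: 2 lags + constant  */
scalar F = ((ssr_r - ssr_1 - ssr_2)/k) / ((ssr_1 + ssr_2)/(n1 + n2 - 2*k))
display "Chow F = " F "    5% critical value = " invFtail(k, n1 + n2 - 2*k, 0.05)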
[Figure: the simulated series y plotted against time, 0-200.]
The whole data set (observations 1-213):

. reg y l1.y l2.y

      Source |       SS       df       MS              Number of obs =     213
-------------+------------------------------           F(  2,   210) = 5172.54
       Model |  36640.1616     2  18320.0808           Prob > F      =  0.0000
    Residual |  743.777018   210  3.54179533           R-squared     =  0.9801
-------------+------------------------------           Adj R-squared =  0.9799
       Total |  37383.9386   212  176.339333           Root MSE      =   1.882

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .9205832     .06888    13.37   0.000     .7847984    1.056368
         L2. |   .0716431   .0690029     1.04   0.300     -.064384    .2076702
       _cons |   .0184165   .1335337     0.14   0.890    -.2448218    .2816548
------------------------------------------------------------------------------

For Model (1):

. reg y l1.y l2.y if t<(101)

      Source |       SS       df       MS              Number of obs =      98
-------------+------------------------------           F(  2,    95) =  889.39
       Model |   1921.9442     2  960.972099           Prob > F      =  0.0000
    Residual |  102.646597    95   1.0804905           R-squared     =  0.9493
-------------+------------------------------           Adj R-squared =  0.9482
       Total |   2024.5908    97  20.8720701           Root MSE      =  1.0395

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .7065436   .0993953     7.11   0.000      .509219    .9038683
         L2. |    .211723   .0948349     2.23   0.028     .0234518    .3999942
       _cons |  -1.618651   .3933838    -4.11   0.000    -2.399617   -.8376855
------------------------------------------------------------------------------
For Model (2):

. reg y l1.y l2.y if t>(103)

      Source |       SS       df       MS              Number of obs =     112
-------------+------------------------------           F(  2,   109) =  116.15
       Model |  276.248939     2  138.124469           Prob > F      =  0.0000
    Residual |  129.619879   109   1.1891732           R-squared     =  0.6806
-------------+------------------------------           Adj R-squared =  0.6748
       Total |  405.868817   111  3.65647583           Root MSE      =  1.0905

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .6979468   .0948083     7.36   0.000     .5100398    .8858538
         L2. |   .0949843    .090681     1.05   0.297    -.0847425     .274711
       _cons |   1.806572   .4558879     3.96   0.000     .9030167    2.710127
------------------------------------------------------------------------------

We calculate the F statistic value and compare it with the critical F value. Using the sums of squared residuals above ($SSR_R \approx 743.78$, $SSR_1 \approx 102.65$, $SSR_2 \approx 129.62$, $k = 3$), the F statistic is roughly 150, while at the 5% significance level the critical value is only about 2.65. Therefore, we reject the null hypothesis of no structural change.
To use the Chow test, we need to specify the date of the structural change and to assume that the change fully manifests itself at that date. But this may not always be appropriate; for example, there is no particular date at which we can say that significant climate change has occurred.
In addition, we need enough observations in each subsample. Otherwise, the estimated coefficients have little precision.
We can use recursive estimation to detect whether the estimated coefficients change abruptly.
> rolling, recursive window(#) clear: regress y l1.y
/* note that Stata will clear all results except the rolling estimates */
/* first estimation sample: 1-#, second: 1-(#+1), third: 1-(#+2), etc. */
> tsset end
/* we need to tell Stata again that this is time series data, because the previous data were cleared; we always use this command after rolling */
> tsline coefficient_name1 coefficient_name2 /* e.g., tsline _b_cons _stat_1 */
Example: $y_t$ is generated from two different AR(1) models, t = 1, ..., 200. Observations 1-100 were simulated from Model 1, and the remaining observations were simulated from Model 2.
> In total, 131 sets of estimates were computed; the number of observations used increases by 1 each time.
> Since all results were eliminated except the 131 sets of estimates, we need to use the tsset command again. Then we plot the intercept estimates and the slope estimates against time.
. rolling, recursive window(70) clear: reg y l1.y
(running regress on estimation sample)

Rolling replications (131)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
...............................

. tsset end
        time variable:  end, 70 to 200
                delta:  1 unit

. tsline _b_cons _stat_1
The intercept estimates (_b[_cons]) change dramatically after the 100th data point. This signals that there might be a structural change.
Similarly, the slope estimates seem to increase after the 100th data point, but the change is roughly just within (0.75, 0.95).
The estimated intercepts are within (1.55, 2) for the first 30 sets of estimates; then, after the 100th point, the estimated intercept converges towards 0.5. This is because, for the very last few sets of estimates, the model is largely estimated on data simulated from a model with an intercept of 0.5.
[Figure: recursive estimates _b[_cons] and _b[L.y] plotted against end, 70-200.]