
#1

EC 485: Time Series Analysis in a Nut Shell


Data Preparation: #2
1) Plot data and examine for stationarity
2) Examine ACF for stationarity
3) If not stationary, take first differences
4) If variance appears non-constant, take logarithm before first differencing
5) Examine the ACF after these transformations to determine if the series is now stationary

Model Identification and Estimation:
1) Examine the ACF and PACFs of your (now) stationary series to get some ideas about what ARIMA(p,d,q) models to estimate.
2) Estimate these models.
3) Examine the parameter estimates, the SBC statistic, and the test of white noise for the residuals.

Forecasting:
1) Use the best model to construct forecasts.
2) Graph your forecasts against actual values.
3) Calculate the Mean Squared Error for the forecasts. (A minimal end-to-end sketch follows.)
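
Putting the three stages together, here is a minimal SAS sketch of the whole workflow. The dataset and variable names (series, y, date) are placeholders rather than anything from the slides, and the candidate model shown is arbitrary:

   PROC ARIMA DATA=series;
      IDENTIFY VAR=y(1);                                 /* ACF/PACF of first differences */
      ESTIMATE p=1 q=1;                                  /* one candidate: ARIMA(1,1,1)   */
      FORECAST LEAD=6 ID=date INTERVAL=month OUT=fcst;   /* 6-step-ahead forecasts        */
   RUN;

Each stage is examined in detail on the slides that follow.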
Data Preparation: #3
1) Plot data and examine. Do a visual inspection to determine if your series is non-stationary.

2) Examine the Autocorrelation Function (ACF) for stationarity. The ACF for a non-stationary series will show large autocorrelations that diminish only very slowly at large lags. (At this stage you can ignore the partial autocorrelations, and you can always ignore what SAS calls the inverse autocorrelations.)

3) If not stationary, take first differences. SAS will do this automatically in the IDENTIFY VAR=y(1) statement, where the variable to be “identified” is y and the 1 refers to first-differencing.

4) If variance appears non-constant, take logarithm before first differencing. You would compute the log in the DATA step, before the IDENTIFY statement:

   ly = log(y);        /* in the DATA step */
   PROC ARIMA;
      IDENTIFY VAR=ly(1);

5) Examine the ACF after these transformations to determine if the series is now stationary.
#4
In this presentation, a variable measuring capacity utilization for the U.S. economy is modeled. The data are monthly from 1967:1 – 2004:03.

It will be used as an example of how to carry out the three steps outlined on the previous slide.

We will remove the last 6 observations, 2003:10 – 2004:03, so that we can construct out-of-sample forecasts and compare our models’ ability to forecast.
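
One way to implement this hold-out in SAS is to keep every date but set the series to missing after 2003:09, so estimation uses only the shorter sample while the actual values stay available for comparison (as cu2 in the forecast table near the end). A sketch, where the source dataset name raw is an assumption:

   DATA one;
      SET raw;                               /* hypothetical source dataset     */
      cu2 = cu;                              /* keep the actuals for comparison */
      IF date > '01SEP2003'd THEN cu = .;    /* hold out 2003:10 - 2004:03      */
   RUN;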
Capacity Utilization 1967:1 – 2004:03 (in levels) #5

This plot of the raw data indicates non-stationarity, although there does not appear to be a strong trend.
#6
The ARIMA Procedure

This ACF is produced by SAS using the code:

   PROC ARIMA;
      IDENTIFY VAR=cu;

SAS will also produce an inverse autocorrelation plot, which you can ignore, and a partial autocorrelation plot, which we will use in the modeling stage.

   Name of Variable = cu
   Mean of Working Series     81.61519
   Standard Deviation          3.764998
   Number of Observations    441

   Autocorrelations
   Lag    Covariance    Correlation
     0     14.175211        1.00000
     1     13.884523        0.97949
     2     13.485201        0.95132
     3     13.007277        0.91761
     4     12.434837        0.87722
     5     11.820231        0.83387
     6     11.191805        0.78953
     7     10.561770        0.74509
     8      9.900866        0.69846
     9      9.215675        0.65013
    10      8.479804        0.59821
    11      7.713914        0.54418
    12      6.928244        0.48876
    13      6.160440        0.43459
    14      5.422593        0.38254
    15      4.717018        0.33277
    16      4.051825        0.28584
    17      3.390746        0.23920
    18      2.751886        0.19413
   (SAS also prints an asterisk bar chart of these correlations next to the table.)

This ACF clearly indicates a non-stationary series: the autocorrelations diminish only very slowly.
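
A useful yardstick when reading these tables: the dotted limits SAS draws in its ACF bar chart are approximate two-standard-error bounds, roughly ±2/√T at short lags. With T = 441 that is about ±2/21 ≈ ±0.095, so autocorrelations of 0.98 and 0.95 at the first lags are enormous by comparison and die out far too slowly for a stationary series.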
First differences of Capacity Utilization 1967:1 – 2004:03 #7

This graph of first differences appears to be stationary.


#8
This ACF was produced in SAS using the code:

   PROC ARIMA;
      IDENTIFY VAR=cu(1);
   RUN;

where the (1) tells SAS to use first differences.

   Name of Variable = cu
   Period(s) of Differencing                   1
   Mean of Working Series               -0.03295
   Standard Deviation                   0.584287
   Number of Observations                    440
   Observation(s) eliminated by differencing   1

   Autocorrelations
   Lag    Covariance    Correlation
     0      0.341391        1.00000
     1      0.126532        0.37064
     2      0.093756        0.27463
     3      0.079004        0.23142
     4      0.062319        0.18254
     5      0.021558        0.06315
     6      0.020578        0.06028
     7      0.018008        0.05275
     8      0.029300        0.08583
     9      0.040026        0.11724
    10      0.020880        0.06116
    11      0.010021        0.02935
    12     -0.0071559      -0.02096
    13     -0.026090       -0.07642
    14     -0.031699       -0.09285
    15     -0.032960       -0.09654
    16     -0.023544       -0.06897
    17     -0.021426       -0.06276
    18     -0.0084132      -0.02464
   (Again, SAS prints an asterisk bar chart beside these correlations.)

This ACF shows the autocorrelations diminishing fairly quickly, so we decide that the first difference of the capacity utilization rate is stationary.
#9
In addition to the autocorrelation function (ACF) and partial autocorrelation function (PACF), SAS will print out an autocorrelation check for white noise. Specifically, it prints out the Ljung-Box statistics, called Chi-Square below, and their p-values. If the p-values are very small, as they are below, then we can reject the null hypothesis that all of the autocorrelations up to the stated lag are jointly zero. For example, for our capacity utilization data (first differences):

H0: ρ1 = ρ2 = ρ3 = ρ4 = ρ5 = ρ6 = 0 (the data series is white noise)
H1: at least one is non-zero
χ² = 136.45 with a p-value of less than 0.0001 ⇒ easily reject H0
Autocorrelation Check for White Noise

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6    136.45     6    <.0001     0.371  0.275  0.231  0.183  0.063  0.060
 12    149.50    12    <.0001     0.053  0.086  0.117  0.061  0.029 -0.021
 18    164.64    18    <.0001    -0.076 -0.093 -0.097 -0.069 -0.063 -0.025
 24    221.29    24    <.0001    -0.059 -0.064 -0.118 -0.114 -0.145 -0.257

A check for white noise on your stationary series is important, because if your series is white noise there is nothing to model and thus no point in carrying out any estimation or forecasting. We see here that the first difference of capacity utilization is not white noise, so we proceed to the modeling and estimation stage. Note: we can ignore the autocorrelation check for the data before differencing, because that series is non-stationary.
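
For reference, the Ljung-Box statistic SAS reports here is the standard

   Q = T(T+2) Σ_{k=1}^{m} ρ̂k² / (T − k),

which is distributed χ² with m degrees of freedom under the white-noise null. As a check against the output above, with T = 440 first differences and m = 6: Q = 440·442·(0.371²/439 + 0.275²/438 + 0.231²/437 + 0.183²/436 + 0.063²/435 + 0.060²/434) ≈ 136.5, matching SAS’s 136.45 up to rounding of the printed autocorrelations.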
Model Identification and Estimation: #10
1) Examine the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of your (now) stationary series to get some ideas about what ARIMA(p,d,q) models to estimate. The “d” in ARIMA stands for the number of times the data have been differenced to render them stationary. This was already determined in the previous section.

The “p” in ARIMA(p,d,q) measures the order of the autoregressive component. To get an idea of what orders to consider, examine the partial autocorrelation function. If the time-series has an autoregressive order of 1, called AR(1), then we should see only the first partial autocorrelation coefficient as significant. If it has an AR(2), then we should see only the first and second partial autocorrelation coefficients as significant. (Note that they could be positive and/or negative; what matters is the statistical significance.) Generally, the partial autocorrelation function (PACF) will have significant correlations up to lag p, and will quickly drop to near-zero values after lag p.
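
A quick way to build intuition for this cut-off is to simulate a series whose order you know and look at its PACF. A minimal sketch; the coefficients, seed, and dataset name are arbitrary choices for illustration, not from the slides:

   DATA ar2sim;
      y1 = 0; y2 = 0;                  /* start-up values for the two lags         */
      DO t = 1 TO 500;
         e = RANNOR(12345);            /* N(0,1) shock with a fixed seed           */
         y = 0.5*y1 + 0.3*y2 + e;      /* stationary AR(2): phi1 = 0.5, phi2 = 0.3 */
         OUTPUT;
         y2 = y1; y1 = y;              /* shift the lags forward                   */
      END;
   RUN;

   PROC ARIMA DATA=ar2sim;
      IDENTIFY VAR=y;                  /* the PACF should be near zero beyond lag 2 */
   RUN;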
#11
Here is the partial autocorrelation function (PACF) for the first-differenced capacity utilization series. Notice that the first two (maybe three) partial autocorrelations are statistically significant. This suggests an AR(2) or AR(3) model. There is a statistically significant partial autocorrelation at lag 24 (not printed here), but this can be ignored: remember that 5% of the time we can get a correlation more than two standard deviations from zero when in fact the true one is zero.
Partial Autocorrelations

Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 0.37064 | . |******* |
2 0.15912 | . |*** |
3 0.10330 | . |** |
4 0.04939 | . |*. |
5 -0.07279 | .*| . |
6 0.00433 | . | . |
7 0.01435 | . | . |
8 0.06815 | . |*. |
9 0.08346 | . |** |
10 -0.02903 | .*| . |
11 -0.03996 | .*| . |
12 -0.07539 | **| . |
13 -0.08379 | **| . |
14 -0.03419 | .*| . |
15 -0.02101 | . | . |
16 0.01950 | . | . |
17 -0.00768 | . | . |
18 0.01681 | . | . |
Model Identification and Estimation: (cont’d) #12

The “q” measures the order of the moving average component. To get an idea of what orders to consider, we examine the autocorrelation function. If the time-series is a moving average of order 1, called an MA(1), we should see only one significant autocorrelation coefficient, at lag 1. This is because an MA(1) process has a memory of only one period. If the time-series is an MA(2), we should see only two significant autocorrelation coefficients, at lags 1 and 2, because an MA(2) process has a memory of only two periods. Generally, for a time-series that is an MA(q), the autocorrelation function will have significant correlations up to lag q, and will quickly drop to near-zero values after lag q.
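
To see why the memory cuts off exactly at q, here is a one-line derivation for the MA(1) case, yt = εt + λ1 εt-1 with var(εt) = σ²:

   cov(yt, yt-1) = E[(εt + λ1 εt-1)(εt-1 + λ1 εt-2)] = λ1 σ²
   cov(yt, yt-k) = 0 for k > 1, since the shock terms no longer overlap.

So the ACF of an MA(1) is exactly zero beyond lag 1, and the same argument gives zero beyond lag q for an MA(q).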

For the capacity utilization time-series, we see that the ACF decays, but only for the first 4 lags; then it appears to drop off to zero abruptly. Therefore, an MA(4) might be considered.

Our initial guess is ARIMA(2,1,4), where the 1 tells us that the data have been first-differenced to render them stationary.
2) Estimate the Models: #13

To estimate the model in SAS is fairly straightforward. Go back to the PROC ARIMA step and add ESTIMATE statements. Here we will estimate four models: ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(2,1,0), and ARIMA(2,1,4). Although we believe the last of these will be the best, it is instructive to estimate a simple AR(1) on our differenced series (the ARIMA(1,1,0)), a model with an AR(1) and an MA(1) on the differenced series (the ARIMA(1,1,1)), and a model with only an AR(2) term (the ARIMA(2,1,0)).

   PROC ARIMA;
      IDENTIFY VAR=cu(1);      /* tells SAS that d = 1 for all models */
      ESTIMATE p = 1;          /* estimates an ARIMA(1,1,0) */
      ESTIMATE p = 1 q = 1;    /* estimates an ARIMA(1,1,1) */
      ESTIMATE p = 2;          /* estimates an ARIMA(2,1,0) */
      ESTIMATE p = 2 q = 4;    /* estimates an ARIMA(2,1,4) */
   RUN;
#14
3) Examine the parameter estimates, the SBC statistic, and the test of white noise for the residuals.

On the next few slides you will see the results of estimating the 4 models discussed in the previous section. We are looking at the statistical significance of the parameter estimates. We also want to compare measures of overall fit; we will use the SBC statistic. It is based on the sum of squared residuals from estimating the model, and it balances the reduction in degrees of freedom against the reduction in the sum of squared residuals from adding more variables (lags of the time-series). The lower the sum of squared residuals, the better the model. SAS calculates the SBC as:

   SBC = -2 ln(L) + k ln(T)

where k = p + q + 1 is the number of parameters estimated, T is the sample size, and L is the likelihood, which essentially depends on the sum of squared residuals. The model with the lowest SBC measure is considered “best”. SBC can be positive or negative. NOTE: SAS’s formula differs slightly from the one in the textbook.
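
As a quick arithmetic check of this formula against the output on the next slide: since AIC = -2 ln(L) + 2k, the two criteria differ by SBC - AIC = k(ln T - 2). For the ARIMA(1,1,0), k = 2 (MU and the AR coefficient) and T = 440, so k(ln T - 2) = 2(6.087 - 2) ≈ 8.17, which matches 722.8502 - 714.6766 ≈ 8.17.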
This is the ARIMA(1,1,0) model: Δyt = β0 + β1 Δyt-1 + εt #15
Conditional Least Squares Estimation

Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
MU -0.03528 0.04115 -0.86 0.3918 0
AR1,1 0.37113 0.04440 8.36 <.0001 1

MU and AR1,1 above are the estimates of β0 and β1.

   Constant Estimate        -0.02219
   Variance Estimate        0.295778
   Std Error Estimate       0.543854
   AIC                      714.6766
   SBC                      722.8502
   Number of Residuals      440
   * AIC and SBC do not include log determinant.

Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     17.95     5    0.0030    -0.059  0.103  0.109  0.114 -0.021  0.029
 12     22.89    11    0.0183     0.006  0.040  0.092  0.017  0.022 -0.008
 18     27.95    17    0.0455    -0.052 -0.048 -0.058 -0.022 -0.043  0.020
 24     50.98    23    0.0007    -0.039 -0.008 -0.079 -0.037 -0.032 -0.198
 30     62.85    29    0.0003    -0.071 -0.045 -0.087 -0.026 -0.056  0.082
 36     68.07    35    0.0007    -0.046  0.056 -0.042 -0.027 -0.041 -0.040

Things to notice: the parameter estimate on the AR(1) term, β1, is statistically significant, which is good. However, the autocorrelation check of the residuals tells us that the residuals from this ARIMA(1,1,0) are not white noise, with a p-value of 0.003. We have left important information in the residuals that could be used. We need a better model.
#16
This is the ARIMA(1,1,1) model: Δyt = β0 + β1 Δyt-1 + εt + λ1 εt-1
Conditional Least Squares Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
MU -0.04037 0.05586 -0.72 0.4703 0
MA1,1 0.46161 0.09410 4.91 <.0001 1
AR1,1 0.75599 0.06951 10.88 <.0001 1

MU, MA1,1, and AR1,1 above are the estimates of β0, λ1, and β1.

   Constant Estimate        -0.00985
   Variance Estimate        0.286071
   Std Error Estimate       0.534856
   AIC                      700.9892
   SBC                      713.2496
   Number of Residuals      440
   * AIC and SBC do not include log determinant.

Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
6 4.71 4 0.3187 -0.001 -0.012 0.031 0.045 -0.079 -0.034
12 10.53 10 0.3953 -0.029 0.032 0.097 0.031 0.023 -0.012
18 16.75 16 0.4021 -0.062 -0.061 -0.059 -0.016 -0.017 0.045
24 35.15 22 0.0374 -0.002 0.014 -0.048 -0.008 -0.024 -0.190
30 45.51 28 0.0196 -0.072 -0.028 -0.066 -0.017 -0.022 0.104
36 49.89 34 0.0386 -0.003 0.070 -0.023 -0.025 -0.038 -0.040

Things to notice: the parameter estimates of the AR(1) term β1 and of the MA(1) term λ1 are statistically significant. Also, the autocorrelation check of the residuals tells us that the residuals from this ARIMA(1,1,1) are white noise, since the Chi-Square statistics up to a lag of 18 have p-values greater than 10%, meaning we cannot reject the null hypothesis that the autocorrelations up to lag 18 are jointly zero (p-value = 0.4021). Also, the SBC statistic is smaller. So we might be done …
#17
This is the ARIMA(2,1,0) model: Δyt = β0 + β1 Δyt-1 + β2 Δyt-2 + εt
Conditional Least Squares Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
MU -0.03783 0.04829 -0.78 0.4338 0
AR1,1 0.31208 0.04726 6.60 <.0001 1
AR1,2 0.15929 0.04726 3.37 0.0008 2

   Constant Estimate        -0.02
   Variance Estimate        0.288946
   Std Error Estimate       0.537537
   AIC                      705.3888
   SBC                      717.6491
   Number of Residuals      440
   * AIC and SBC do not include log determinant.
Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6      8.67     4    0.0700    -0.017 -0.045  0.085  0.089 -0.045 -0.007
 12     13.96    10    0.1747    -0.010  0.038  0.096  0.023  0.019 -0.007
 18     18.73    16    0.2832    -0.054 -0.053 -0.052 -0.020 -0.025  0.030
 24     38.35    22    0.0167    -0.016 -0.004 -0.063 -0.009 -0.022 -0.193
 30     47.43    28    0.0123    -0.067 -0.021 -0.070 -0.031 -0.034  0.085
 36     51.02    34    0.0305    -0.019  0.053 -0.029 -0.030 -0.033 -0.037

This model has statistically significant coefficient estimates, but the white-noise null for the residuals up to lag 6 is rejected at the 10% level (p-value = 0.0700), casting some doubt on this model. We won’t place much meaning in the Chi-Square statistics for lags beyond 18. The SBC statistic is larger than the ARIMA(1,1,1)’s, which is not good.
This is the ARIMA(2,1,4) model: #18
Δyt = β0 + β1 Δyt-1 + β2 Δyt-2 + εt + λ1 εt-1 + λ2 εt-2 + λ3 εt-3 + λ4 εt-4
Conditional Least Squares Estimation
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
MU -0.03613 0.04697 -0.77 0.4423 0
MA1,1 0.48913 0.29916 1.64 0.1028 1
MA1,2 -0.43438 0.13474 -3.22 0.0014 2
MA1,3 -0.17179 0.05634 -3.05 0.0024 3
MA1,4 -0.11146 0.08044 -1.39 0.1666 4
AR1,1 0.78020 0.29788 2.62 0.0091 1
AR1,2 -0.44336 0.19274 -2.30 0.0219 2

   Constant Estimate        -0.02396
   Variance Estimate        0.284717
   Std Error Estimate       0.533589
   AIC                      702.8553
   SBC                      731.4627
   Number of Residuals      440
   * AIC and SBC do not include log determinant.
Autocorrelation Check of Residuals
To Chi- Pr >
Lag Square DF ChiSq ---------------Autocorrelations---------------
6 0.00 0 <.0001 -0.000 0.003 0.005 0.020 -0.009 0.068
12 5.66 6 0.4624 0.028 0.032 0.072 0.008 0.022 -0.002
18 9.94 12 0.6212 -0.049 -0.050 -0.054 -0.016 -0.024 0.026
24 27.26 18 0.0743 -0.029 -0.003 -0.063 -0.022 -0.022 -0.177
30 35.68 24 0.0590 -0.058 -0.030 -0.070 -0.025 -0.048 0.076
36 40.12 30 0.1025 -0.027 0.056 -0.034 -0.033 -0.040 -0.040

Two of the parameter estimates (MA1,1 and MA1,4) are not statistically significant, telling us the model is not “parsimonious”, and the SBC statistic (731.46) is larger than the SBC for the ARIMA(1,1,1) model (713.25). Ignore the first Chi-Square statistic, since it has 0 degrees of freedom after estimating a model with 7 parameters. The Chi-Square statistics at lags 12 and 18 are statistically insignificant, indicating white noise.
Forecasts: #19

   PROC ARIMA;
      IDENTIFY VAR=cu(1);
      ESTIMATE p=1;                                       /* any model goes here */
      FORECAST LEAD=6 ID=date INTERVAL=month OUT=fore1;
   RUN;

We calculate the Mean Squared Error for the 6 out-of-sample forecasts. Graphs appear on the next four slides. We find that the fourth model produces forecasts with the smallest MSE. SAS automatically adjusts the data from first differences back into levels.

Use the actual values for CU and the forecasted values below to generate a mean squared prediction error for each model estimated. The formula is MSE = (1/6) Σ (fcut − cut)², where fcu is a forecast and cu is the actual value; a sketch of this calculation follows the table.

Obs   date    cu    cu2     fcu1     sd1       fcu2     sd2       fcu3     sd3       fcu4     sd4
441   SEP03  74.9   74.9   74.4778  0.54385   74.5294  0.53486   74.5596  0.53754   74.6211  0.53359
442   OCT03   .     75.0   75.0263  0.54385   75.0215  0.53486   75.0048  0.53754   75.1540  0.53359
443   NOV03   .     75.7   75.0509  0.92295   75.1034  0.87485   75.0813  0.88678   75.3396  0.87138
444   DEC03   .     75.8   75.0379  1.23500   75.1555  1.19316   75.1018  1.22371   75.3883  1.18650
445   JAN04   .     76.2   75.0109  1.49834   75.1851  1.49534   75.1004  1.52680   75.3511  1.50072
446   FEB04   .     76.7   74.9787  1.72697   75.1976  1.78205   75.0833  1.80183   75.2766  1.81196
447   MAR04   .     76.5   74.9445  1.93039   75.1972  2.05370   75.0577  2.05184   75.2110  2.08938
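
To compute the MSEs in SAS, here is a minimal sketch for the first model, assuming the OUT= dataset has been merged with the actual values so that cu2 holds the realized series and fcu1 the ARIMA(1,1,0) forecasts, as in the table above (repeat with fcu2–fcu4 for the other models):

   DATA err1;
      SET fore1;
      IF '01OCT2003'd <= date <= '01MAR2004'd;   /* the 6 hold-out months    */
      sqerr = (fcu1 - cu2)**2;                   /* squared forecast error   */
   RUN;

   PROC MEANS DATA=err1 MEAN;
      VAR sqerr;                                 /* the printed mean is the MSE */
   RUN;

Worked out from the table above, the MSEs come to roughly 1.30, 0.96, 1.13, and 0.79 for models 1 through 4, so the ARIMA(2,1,4) forecasts best out of sample even though it was not the preferred model in-sample.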
#20
[Graphs: out-of-sample forecasts from each model plotted against the actual values]
#21
Granger Causality (Predictability) Test

Yt = β0 + β1 Yt-1 + β2 Yt-2 + … + βp Yt-p + α1 Xt-1 + α2 Xt-2 + … + αp Xt-p + ut

We can test whether another variable X helps to predict our series Yt. This can be done through a simple F-test on the α parameters: if these are jointly zero, then the variable X has no “predictive content” for variable Y. See textbook, Chapter 14.

data one;                /* the source dataset name (raw) and the definition   */
   set raw;              /* of dcu (first difference of cu) are assumptions;   */
                         /* the rest follows the original slide                */
   dcu = cu - lag(cu);

   lcpi = lag(cpi);
   inf = 12*100*(log(cpi) - log(lcpi));   /* annualized monthly inflation */

   dinf = inf - lag(inf);
   ldinf = lag(dinf);
   l2dinf = lag2(dinf);
   l3dinf = lag3(dinf);
   l4dinf = lag4(dinf);

   ldcu = lag(dcu);
   l2dcu = lag2(dcu);
   l3dcu = lag3(dcu);
   l4dcu = lag4(dcu);
run;
proc autoreg data=one;
model dcu = ldcu l2dcu l3dcu l4dcu ldinf l2dinf l3dinf l4dinf ;
test ldinf=0,l2dinf=0,l3dinf=0,l4dinf=0;
run;
#22
The AUTOREG Procedure
Dependent Variable dcu
Ordinary Least Squares Estimates

   SSE               116.390576    DFE                  432
   MSE                  0.26942    Root MSE         0.51906
   SBC               718.852042    AIC           682.050639
   Regress R-Square      0.2090    Total R-Square    0.2090
   Durbin-Watson         1.9847

                           Standard     Approx
Variable    DF   Estimate     Error    t Value   Pr > |t|
Intercept    1     0.1455    0.0459       3.17     0.0016
ldcu         1     0.2585    0.0481       5.37     <.0001
l2dcu        1     0.1118    0.0493       2.27     0.0240
l3dcu        1     0.0771    0.0486       1.59     0.1137
l4dcu        1     0.0534    0.0468       1.14     0.2545
ldinf        1   0.006012  0.009440       0.64     0.5246
l2dinf       1    -0.0218    0.0101      -2.15     0.0318
l3dinf       1  -0.006926    0.0102      -0.68     0.4959
l4dinf       1    -0.0103  0.009566      -1.08     0.2802

Test 1

                     Mean
Source       DF     Square    F Value    Pr > F
Numerator     4   1.295144       4.81    0.0008
Denominator 432   0.269423
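
As a check on how this F statistic is built: it is the ratio of the two mean squares printed above, F = [(SSE_restricted − SSE_unrestricted)/4] / [SSE_unrestricted/432] = 1.295144 / 0.269423 ≈ 4.81. With a p-value of 0.0008 we easily reject the joint null that the four inflation lags are zero: inflation has predictive content for (Granger-causes) the change in capacity utilization.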
