Some Notes On Univariate Time Series Analysis
D. Diagnostic Analysis
E. Estimation
F. Forecasting
G. General comments
Appendices
AR(p) and Yule Walker equations
MA(q)
Spectral density functions
James B. McDonald
Brigham Young University
Minor revisions 3/2000
0. Basic Problem: Consider the problem of, given n observations on a single variable Y (Y1, Y2, ..., Yn), obtaining forecasts of Y at time period n + h, denoted by Ŷn+h or Yn(h). This might be pictorially represented as:
The data might correspond to GNP, consumer prices, foreign exchange rates, telephone calls,
unemployment rates, product sales, the number of empty hospital beds in a hospital, commodity prices,
or any other time series. The forecasting problem becomes that of attempting to obtain a prediction
forecast h periods in the future from known observations. Many techniques have been developed to
extrapolate from past observations into the future. The techniques differ in structure and assumptions
made including the "amount" of past information used in forecasting. Some techniques assume that past
levels or changes or percentage changes can be used to forecast the next period. Other techniques are
based upon "moving" averages of past values and possibly allow for trends or seasonality in the
underlying series. When forecasting, we would do well to remember the admonition found in the 88th Section of the Doctrine and Covenants (79th verse) about studying things "which must shortly come to pass."
Autoregressive Integrated Moving Average (ARIMA) models are a very general specification which includes some of the previously mentioned forecasting approaches as special cases, as well as many other approaches. These techniques will be studied in order of increasing "sophistication."
Before getting into the details of time series forecasting techniques, it is useful to give a brief
overview of applications of these techniques. They can be used (1) on a "stand alone" basis to predict future values of a dependent variable of interest, YN(h), by extrapolating past trends; (2) to predict a future value for an independent variable (Xt), XN(h), which can be substituted into an econometric model to obtain forecasts of the dependent variable; and finally (3) to predict systematic components in the residuals. Thus time series techniques can be used separately or in conjunction with econometric models.
1. Stationarity
Consider a stochastic process Yt which is defined for all integer values of t. Yt is said to be weakly (covariance) stationary if

E(Yt) = μ
Var(Yt) = σ²                                                  (A.1a-c)
Cov(Yt, Yt-s) = γs
for all t. A stronger definition of stationarity is that the joint distribution functions F(Yt+1,...,
Yt+n) and F(Ys+1,..., Ys+n) are identical for any value of t and s.
[Figure: four plots of Yt against t illustrating stationary and non-stationary behavior.]
The first series would be classified as stationary, whereas the other three would not be.
The autocorrelation coefficients are defined in terms of the autocovariances (γs) by

ρs = γs/γ0 = correlation(Yt, Yt-s)                            (A.2)
[Figure: correlogram, ρs plotted against the lag s = 0, 1, 2, 3, ....]
For the AR(1) specification, Yt = φ1Yt-1 + εt, it can be shown [do this using the lag or backshift operator] that
E(Yt) = 0

Var(Yt) = σ²/(1 - φ1²) = γ0

Cov(Yt, Yt-s) = φ1^s σ²/(1 - φ1²) = φ1^s γ0 = γs

ρs = γs/γ0 = φ1^s, which decay exponentially for |φ1| < 1.
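As a quick numerical check of these formulas, the following sketch (illustrative only; it assumes the numpy library, and the series length and value of φ1 are arbitrary choices) simulates an AR(1) and compares the sample autocorrelations with the theoretical values φ1^s.

import numpy as np

rng = np.random.default_rng(0)
phi1, n = 0.8, 5000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                        # Y_t = phi1*Y_{t-1} + eps_t
    y[t] = phi1 * y[t - 1] + eps[t]

def sample_acf(x, max_lag):                  # rho_hat_s = sum (x_t - xbar)(x_{t-s} - xbar) / sum (x_t - xbar)^2
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return [np.sum(x[s:] * x[:-s]) / denom for s in range(1, max_lag + 1)]

for s, r in enumerate(sample_acf(y, 5), start=1):
    print(f"lag {s}: sample rho = {r:.3f}   theoretical phi1^s = {phi1 ** s:.3f}")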
While the AR(1) model is stationary if |φ1| < 1, many economic series are not stationary. Remember, a time series is not stationary if the mean, variance, or covariances change with time. Thus a series which increases over time or is characterized by heteroskedasticity is not stationary. Since many forecasting techniques are based upon the assumption that the series is stationary, it is comforting to note that some non-stationary series can be transformed into stationary series. Two important examples follow.

a. Random walk with drift:
Yt = μ + Yt-1 + εt                                            (A.4)
Repeated substitution (taking Y0 = 0) yields

Yt = μt + Σ_{i=0}^{t} εt-i                                    (A.5)
We note that Yt is not stationary because the mean of Yt (μt) increases (linearly) with t and the variance of Yt, about the linear trend, increases with time. It is important to note that the random walk with drift is integrated of order one, I(1), because the first difference of the series is stationary. More generally, a series is said to be integrated of order d, I(d), if it must be differenced d times to obtain a stationary series.

b. Trend stationary model:
Yt = μ + βt + εt                                              (A.7)

where εt ~ N[0, σ²]. Like the random walk with drift, the trend stationary model has a mean (μ + βt) which increases linearly with t; however, the variance of Yt, about its trend, is a constant σ², in contrast to the random walk with drift, whose variance increases with time. The first differences of the trend stationary series have a constant mean (β) and variance 2σ², but the errors are moving averages: the correlation between ΔYt and ΔYt-1 is -1/2 and the correlation between ΔYt and ΔYt-s for s > 1 is zero. The optimal estimation approaches for these two series are different. The best way to estimate the random walk with drift is to take first differences and then estimate the unknown parameters associated with the differenced series. The best way to work with the trend stationary series is to estimate a polynomial in "t" and then analyze the residuals. Thus, the two non-stationary series call for different treatments.
c. The difference between these two important series can be summarized as in the
following table.
Random walk with drift (difference stationary):
• Yt = μt + Σ_{i=0}^{t} εt-i
• Behavior: increasing mean and increasing variance
• The impact of innovations (εj) persists.

Trend stationary:
• Yt = μ + βt + εt
• Behavior: increasing mean, constant variance about the trend
• The impact of innovations (εj) dies out.
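To make the contrast concrete, here is a small sketch (hypothetical parameter values; assumes numpy) that simulates both series and applies the appropriate treatment to each: first differences for the random walk with drift, detrending for the trend stationary series.

import numpy as np

rng = np.random.default_rng(1)
n, mu, beta, sigma = 500, 0.5, 0.5, 1.0
t = np.arange(n)
eps = rng.normal(scale=sigma, size=n)

rw = np.cumsum(mu + eps)                     # random walk with drift: Y_t = mu + Y_{t-1} + eps_t
ts = mu + beta * t + eps                     # trend stationary:       Y_t = mu + beta*t + eps_t

d_rw = np.diff(rw)                                        # differences: mean mu, variance sigma^2
resid = ts - np.polyval(np.polyfit(t, ts, 1), t)          # residuals from a fitted linear trend

print("differenced random walk: mean %.2f  variance %.2f" % (d_rw.mean(), d_rw.var()))
print("detrended trend series:  mean %.2f  variance %.2f" % (resid.mean(), resid.var()))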
A practical problem is that it may be difficult to differentiate between the trend and difference stationary models and, hence, to know the most efficient estimation technique to use. One approach to discriminating between the two models is to consider the following "nesting" regression:

Yt = α0 + γYt-1 + α2 t + εt

(1) γ = 0: trend stationary (TS)
(2) γ = 1: random walk with drift or DS
There are a series of tests known as Dickey-Fuller tests which explore the null hypothesis of a unit root using the regression

Yt - Yt-1 = α0 + [α1 = (γ-1)]Yt-1 + α2 t + εt

The hypothesis γ = 1 implies that the coefficients of the variables t and Yt-1 equal 0. Standard t-tests are not appropriate, and special tables have been constructed using Monte Carlo methods. If the errors in this regression are autocorrelated, the simple Dickey-Fuller regression is not adequate.
In these cases, the augmented Dickey-Fuller test is based on estimating the equation

Yt - Yt-1 = α0 + [α1 = (γ-1)]Yt-1 + α2 t + Σ_{j=1}^{p-1} φj ΔYt-j + εt          (A.10)

where the term involving the summation has been added to pick up the impact of autocorrelation. The test for differencing is performed by testing the null hypothesis

H0: α1 = (γ-1) = 0
COINT Y / options

SHAZAM reports a number of tests for unit roots and difference stationarity which are based on regressions of the form (A.10), with and without the time trend; the version without the trend term (α2 t) will be referred to as (A.11). (A.11) would be used for series not associated with a time trend, whereas (A.10) would be used in the presence of a time trend. The summation term in (A.11) has the same interpretation as in (A.10). Among the reported tests are:

(2) α0 = α1 = 0 in (A.11) using an F test - unit root test with zero drift
(4) α0 = α1 = α2 = 0 in (A.10) using an F test - unit root test with zero drift
(5) α1 = α2 = 0 in (A.10) using an F test - unit root test with non-zero drift
Notes:
1. Tests of α1 = 0, the null hypothesis of a unit root, are one-tailed tests. The hypothesis is rejected if the estimated test statistic is less than the reported critical value. For example, for series without time trends, the asymptotic critical value for a unit root t-test [with no time trend] is -3.43:

if α̂1/s_α̂1 < -3.43, then we would reject the hypothesis of a unit root.

In the case of unit roots, one approach is to work with first differences of the data. There are other approaches.
2. Peter Phillips and others have explored the use of fractional differences, Δ^d with 0 < d < 1, which provides an intermediate ground between working with levels (d = 0) and first differences (d = 1).
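For readers working outside SHAZAM, a rough equivalent in Python might look like the sketch below. It assumes the statsmodels package; the adfuller function and its regression options ("c" for drift only, "ct" for drift plus trend) should be checked against that library's documentation.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=500))                      # a random walk: a unit root should NOT be rejected

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="ct")
print(f"ADF statistic {stat:.2f}, p-value {pvalue:.3f}, 1% critical value {crit['1%']:.2f}")

stat_d, pvalue_d, *rest = adfuller(np.diff(y), regression="c")   # first differences: unit root should be rejected
print(f"ADF on first differences {stat_d:.2f}, p-value {pvalue_d:.3f}")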
Have you ever wondered what happens if you regress one series on an unrelated series
when both series grow over time? This was the question explored by Granger and Newbold in a
classic paper published in the Journal of Econometrics in 1974. Their finding may not be
surprising. Often, in these cases, standard statistical tests suggest statistical significance when, in
fact, there is no relationship. They concluded that in such cases much larger t-statistics would be
needed than suggested by traditional t-tables. Granger and Newbold find a critical value of 11.2
more appropriate than 1.96. Their study was based on Monte Carlo simulations. Peter Phillips works with a more general model, uses analytical methods rather than simulation studies, and recommends the use of a critical value given by (N^0.5)(the critical value from the t table). The bottom line of all of this is that if we fit relationships to trending series, much larger critical values are needed before concluding that a relationship is statistically significant.
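The following Monte Carlo sketch (illustrative; assumes numpy, with arbitrary sample size and replication count) reproduces the flavor of the Granger-Newbold finding: regressing one random walk on an unrelated random walk and counting how often the conventional |t| > 1.96 rule declares significance.

import numpy as np

rng = np.random.default_rng(3)
n, reps, rejections = 100, 1000, 0
for _ in range(reps):
    y = np.cumsum(rng.normal(size=n))
    x = np.cumsum(rng.normal(size=n))                    # independent of y by construction
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (n - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(b[1] / se) > 1.96:                            # nominal 5% test
        rejections += 1
print(f"fraction of nominal 5% tests that 'reject': {rejections / reps:.2f}")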
4. Cointegration
An interesting problem arises if Y and X are integrated of different orders. This makes it impossible for the error term, Yt - βXt, to be stationary. Greene gives a very abbreviated discussion of cointegration.
An important general class of stochastic models which has been widely adopted for time series is the class of autoregressive integrated moving average (ARIMA) models. These models include
several of the models in the previous section as special cases and are extremely versatile in terms of
their statistical properties. An ARIMA model with parameters (p,d,q) is defined as follows:
Δ^d Yt = Y*t

or

φ(B)Y*t = θ(B)εt

where
• φ(B) = 1 - φ1B - φ2B² - ... - φpB^p and θ(B) = 1 - θ1B - θ2B² - ... - θqB^q
• B^s Yt = Yt-s (B is the backshift or lag operator)
• εt ~ N(0, σ²)
This model is denoted ARIMA(p,d,q) and, as previously mentioned, includes many useful special cases, for example:

ARIMA(1, 0, 0):  Yt - φ1Yt-1 = εt

(6) ARIMA(0, 0, 0):  Yt = εt  (white noise)
Note that the portion of the expression on the left hand side of the equal sign in equation
(4.3) is the Autoregressive (AR) portion with p lags and that on the right hand side is the
Moving average component (MA) with q lags. The d refers to the number of times the
series is differenced. Other special cases include exponential smoothing ARIMA (0,1,1) and
the Holt-Winters nonseasonal model corresponds to ARIMA (0,2,2).
Traditional applications of ARIMA models as forecasting tools involve the following four steps:
(1) IDENTIFICATION--determine values for p, d, and q. In other words, the appropriate number of autoregressive lags (p) and moving average lags (q), as well as the appropriate degree of differencing, need to be determined.
"d" is selected to be the number of times that the series must be differenced to obtain a
stationary series.
Different ARIMA models will be associated with different behavioral patterns for the true autocorrelation and partial autocorrelation coefficients. These patterns depend upon the values of p and q as well as the values of the coefficients φi and θi in the model. Many programs are available which will estimate the autocorrelation and partial autocorrelation coefficients corresponding to a series of data.
The SHAZAM command used in “identifying” the form of the ARIMA model is
Alternative and complementary approaches to the identification process involve the use of the spectral density function or the specification of an objective function which is optimized over p, q, φ, and θ. The Akaike information criterion, AIC = -2 ln(likelihood) + 2(p+q), is an example of this procedure. The AIC is then minimized over p and q as well as the coefficients in the ARIMA specification.
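A sketch of the AIC approach follows, assuming the statsmodels ARIMA implementation (the class location, fit behavior, and the .aic attribute should be verified against that library's documentation); the simulated data and the size of the (p, q) grid are illustrative choices.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = np.zeros(300)
eps = rng.normal(size=300)
for t in range(1, 300):                                  # AR(1) data with phi1 = 0.7, for illustration
    y[t] = 0.7 * y[t - 1] + eps[t]

d = 0                                                    # degree of differencing chosen beforehand
best = None
for p in range(3):
    for q in range(3):
        res = ARIMA(y, order=(p, d, q)).fit()
        if best is None or res.aic < best[0]:
            best = (res.aic, p, q)
print(f"minimum AIC {best[0]:.1f} at (p, q) = ({best[1]}, {best[2]})")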
After the model has been identified (values for p,d,q selected), the second step involves
estimating the specified model.
The estimation is carried out with the SHAZAM ARIMA command, where Y denotes the name of the variable, NAR denotes the number of autoregressive parameters to be estimated, NMA denotes the number of moving average parameters to be estimated, RESID saves the residuals in a variable denoted "e" to be used in possible additional diagnostics, COEF= saves the coefficients in a variable "b" to be used in forecasting, and NDIF=d denotes the number of times the series should be differenced.
The estimation routine is based upon an equivalent AR(∞) representation of the errors εt:

εt = Yt - φ1Yt-1 - ... - φpYt-p + θ1εt-1 + ... + θqεt-q

or

εt = θ^{-1}(B)φ(B)Δ^d Yt

In either representation εt depends upon the φi's and θi's (autoregressive and moving average parameters). The associated sum of squared errors is given by

SSE(φ, θ) = Σ_t εt²
This formulation gives MLE for normally distributed error terms. Alternative
formulations based on different probability density functions can be
employed.
(3) Diagnostics--Given that the hypothesized model has been estimated, tests are
performed to check the validity of the conjectured model.
• Given estimated values for the φ's and θ's, we can obtain estimated residuals:

ε̂t = θ̂^{-1}(B)φ̂(B)Δ^d Yt

• The estimated residuals (ε̂t) can then be tested for the existence of underlying patterns. This can be done by checking for patterns in the associated autocorrelation and partial autocorrelation coefficients. In a correctly specified model, the estimated residuals should be white noise, without statistically significant autocorrelation coefficients or partial autocorrelation coefficients. A "Q statistic" provides the basis for a statistical test of the hypothesis that the autocorrelation coefficients of the residuals are zero. Using the Box-Pierce Q-statistic
Q = T Σ_{k=1}^{p} rk² ~ χ²(p)

or the Box-Ljung Q statistic

Q = T(T + 2) Σ_{k=1}^{p} rk²/(T - k) ~ χ²(p)
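A minimal sketch (assuming numpy; the lag count and data are arbitrary) computing both Q statistics directly from the residual autocorrelations; the computed values would be compared against a chi-square critical value.

import numpy as np

def q_statistics(resid, max_lag):
    e = resid - resid.mean()
    T = len(e)
    denom = np.sum(e ** 2)
    r = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, max_lag + 1)])
    q_bp = T * np.sum(r ** 2)                                                  # Box-Pierce
    q_lb = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, max_lag + 1)))      # Box-Ljung
    return q_bp, q_lb

rng = np.random.default_rng(5)
print(q_statistics(rng.normal(size=200), 12))            # white noise: both should be "small"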
(4) Forecast
This section has attempted to give a brief overview of alternative models, their characteristics and identification, estimation, model diagnostics, and forecasting procedures. Each of these issues will be considered in additional detail in subsequent sections. We turn to a more thorough analysis of a number of simple ARIMA(p,d,q) models, including an investigation of the behavior of the associated autocorrelation coefficients.
C. Characteristics of some ARIMA Models and Identification
Yt - φ1Yt-1 = εt                                              (C.1)

E(Yt) = 0                                                     (C.2)

Var(Yt) = σ²/(1 - φ1²) = γ0

Cov(Yt, Yt-s) = φ1^s σ²/(1 - φ1²) = φ1^s γ0 = γs

Note φ1 = ρ1.
Some sample output from the theory program illustrates these patterns. The reader might experiment with some other values of φ1.
DOS mode: TheoryTS
ENTER THE NUMBER OF AUTOREGRESSIVE PARAMETERS: 1
ENTER THE NUMBER OF MOVING AVERAGE PARAMETERS: 0
ENTER THE NUMBER OF AUTOCORRELATION COEFFICIENTS: 15
ENTER THE VALUE OF PHI(1): .8
****AUTO CORRELATION COEFFICIENTS****
LAGS VALUES
15 I* 0.035
14 I* 0.044
13 I* 0.055
12 I* 0.069
11 I* 0.086
10 I* 0.107
9 I * 0.134
8 I * 0.168
7 I * 0.210
6 I * 0.262
5 I * 0.328
4 I * 0.410
3 I * 0.514
2 I * 0.640
1 I * 0.800
I.I.I.I.I.I.I.I.I.I.I.I.I.I
-1 0 +1
****PARTIAL AUTO CORRELATION COEFFICIENTS****
LAGS VALUES
15 * -0.000
14 * -0.000
13 * -0.000
12 * -0.000
11 * -0.000
10 * -0.000
9 * -0.000
8 * -0.000
7 * -0.000
6 * -0.000
5 * -0.000
4 * -0.000
3 * -0.000
2 * -0.000
1 * I -0.800
I.I.I.I.I.I.I.I.I.I.I.I.I.I.I
-1 0 +1
Note that in each case the autocorrelation coefficients decline geometrically as φ1^s. The partial autocorrelation coefficients are all equal to zero except for φ11 (the first), which is equal to φ1. It will also be instructive to note that an AR(1) with |φ1| < 1 can be written as an infinite moving average, MA(∞). This demonstration, along with a derivation of previous results, follows. It might also be mentioned that if the process is AR(p), then the first p partial autocorrelation coefficients may be nonzero and all others zero. The corresponding autocorrelation coefficients decline to zero.
E(Yt) = Σ_{i=0}^{∞} φ1^i E(εt-i) = 0

Var(Yt) = σ² Σ_{i=0}^{∞} φ1^{2i} = σ²/(1 - φ1²)

Cov(Yt, Yt-s) = φ1^s Var(Yt-s) = φ1^s σ²/(1 - φ1²)

ρs = γs/γ0 = φ1^s
εt = Σ_{i=0}^{∞} θ1^i B^i Yt = Σ_{i=0}^{∞} θ1^i Yt-i

or

Yt + θ1Yt-1 + θ1²Yt-2 + ... = εt

which is an AR(∞).
Looking at the form of the coefficients of the lagged Yt's suggests that the partial autocorrelation coefficients will decline geometrically. This is in fact what happens for an MA(1).
From the form of the moving average model we obtain the following:

E(Yt) = E(εt) - θ1E(εt-1) = 0

Var(Yt) = σ² + θ1²σ² = σ²(1 + θ1²)

Cov(Yt, Yt-1) = -θ1E(εt-1²) = -θ1σ²

Cov(Yt, Yt-s) = 0 for s > 1

Therefore

ρs = -θ1/(1 + θ1²)     s = 1
   = 0                 s = 2, 3, ...
This result demonstrates that a moving average process of order 1 (MA(1)) only has a "memory" of one period. Similarly, a moving average process of order q (MA(q)) only has a "memory" of q periods. In other words, for an MA(q) model, ρs = 0 for s > q. The impact of Yt on the Y's will completely die out after q periods.
The following computer printout illustrates typical behavior of the autocorrelation and partial autocorrelation coefficients corresponding to an MA(1) with θ1 = .9. Note that there is only one nonzero autocorrelation coefficient and the partial autocorrelation coefficients decline geometrically. Computational details will be reviewed following the graphs.
ENTER 1 FOR NEW PROCESS. 2 FOR THE SAME PROCESS: 1
ENTER THE NUMBER OF AUTOREGRESSIVE PARAMETERS: 0
ENTER THE NUMBER OF MOVING AVERAGE PARAMETERS: 1
ENTER THE NUMBER OF AUTOCORRELATION COEFFICIENTS: 15
ENTER THE VALUE OF THETA (1): .9
* * * * AUTO CORRELATION COEFFICIENTS * * * *
LAGS VALUES
15 * 0.000
14 * 0.000
13 * 0.000
12 * 0.000
11 * 0.000
10 * 0.000
9 * 0.000
8 * 0.000
7 * 0.000
6 * 0.000
5 * 0.000
4 * 0.000
3 * 0.000
2 * 0.000
1 * I -0.497
I.I.I.I.I.I.I.I.I.I.I.I.I.I.I
-1 0 +1
ρ1 = -θ1/(1 + θ1²) = -.9/(1 + .81) = -.497

ρi = 0 for i ≥ 2
(c) In summary and a preview, the patterns of the autocorrelation and partial
autocorrelation coefficients for an AR(p) and MA(q) "appear" as follows
              Autocorrelation                         Partial autocorrelation

AR(p)         decays geometrically                    p "spikes" (zero after lag p)

MA(q)         q "spikes" (zero after lag q)           decays geometrically
φ(B)Yt = εt                                                   (C.4)

(a) An AR(p) will be stationary and can be written as an MA(∞) if the roots of φ(z) are greater than one in absolute value.

Details: Factoring the polynomial, φ(B) = (1 - φ̃1B)(1 - φ̃2B)···(1 - φ̃pB). Now consider

Yt = {1/φ(B)}εt = [1/(1 - φ̃1B)] ··· [1/(1 - φ̃pB)] εt

which is valid if the |φ̃i| < 1, i.e., the roots of φ(z) are greater than one in absolute value.
ρk = φ1ρk-1 + φ2ρk-2 + ... + φpρk-p,   k = 1, 2, ..., p       (C.5)

(C.5) is referred to as the system of Yule-Walker equations. The φi's can then be expressed in terms of the ρi's and the ρi's can be expressed in terms of the φi's. For example, for

p = 1:  ρ1 = φ1

p = 2:  [ρ1; ρ2] = [1  ρ1; ρ1  1][φ1; φ2], so that

        φ1 = ρ1(1 - ρ2)/(1 - ρ1²),      ρ1 = φ1/(1 - φ2)

        φ2 = (ρ2 - ρ1²)/(1 - ρ1²),      ρ2 = [φ2(1 - φ2) + φ1²]/(1 - φ2)
Derivation of the Yule-Walker Equations: Multiply (C.3) by Yt-k and take the expected value:

γk - φ1γk-1 - ... - φpγk-p = E(εtYt-k) = 0   for k > 0

or, dividing by γ0,

ρk = φ1ρk-1 + ... + φpρk-p.
If different values of p are selected and the "last coefficient" φp obtained for each p, using the Yule-Walker equations, then these coefficients are referred to as partial autocorrelation coefficients and are useful in determining the order of the autoregressive process. This is analogous to deciding how many terms to include in a multiple regression. For an autoregressive process of order p, the first p partial autocorrelations will be nonzero and higher order partial autocorrelation coefficients will equal zero. The partial autocorrelation coefficients are denoted by φii.
ρ1 = φ1 = φ11

φ22 = det[ 1  ρ1 ; ρ1  ρ2 ] / det[ 1  ρ1 ; ρ1  1 ] = (ρ2 - ρ1²)/(1 - ρ1²)

φ33 = det[ 1  ρ1  ρ1 ; ρ1  1  ρ2 ; ρ2  ρ1  ρ3 ] / det[ 1  ρ1  ρ2 ; ρ1  1  ρ1 ; ρ2  ρ1  1 ]

More generally, φii is a ratio of two i×i determinants: the denominator is the correlation matrix with (j,k) element ρ|j-k|, and the numerator is the same matrix with its last column replaced by (ρ1, ρ2, ..., ρi)′.
Note that for an AR(1):

φ11 = ρ1

φ22 = (ρ2 - ρ1²)/(1 - ρ1²) = (ρ1² - ρ1²)/(1 - ρ1²) = 0

φii = 0 for i ≥ 2.
The Yule-Walker equations can then be used to estimate the corresponding partial
autocorrelation coefficients. The associated asymptotic standard errors were shown to
be
s_φ̂kk = 1/√T    for k ≥ p + 1                                (C.7)
For example, consider an AR(2) with φ1 = .2 and φ2 = .7. The Yule-Walker equations give

ρk = φ1ρk-1 + φ2ρk-2,   ρ0 = 1

ρ1 = φ1/(1 - φ2) = .2/(1 - .7) = 2/3 = .667

ρ2 = [φ2(1 - φ2) + φ1²]/(1 - φ2) = [(.7)(.3) + .04]/(1 - .7) = .25/.30 = .833

φ11 = ρ1 = .667

φ22 = (ρ2 - ρ1²)/(1 - ρ1²) = (5/6 - (2/3)²)/(1 - (2/3)²) = 21/30 = .7

φ33 = 0
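The same numbers can be reproduced with the Durbin-Levinson recursion, which solves the Yule-Walker equations for successively larger p; the sketch below (assuming numpy; the recursion is standard, but this particular coding is only illustrative) recovers φ11 = .667, φ22 = .7, and φ33 = 0 for the AR(2) example.

import numpy as np

def pacf_from_acf(rho):
    """rho[k-1] holds rho_k; returns phi_kk for k = 1, ..., len(rho)."""
    pacf = [rho[0]]
    phi_prev = np.array([rho[0]])                        # phi_{k-1,j}, j = 1, ..., k-1
    for k in range(2, len(rho) + 1):
        num = rho[k - 1] - np.sum(phi_prev * rho[k - 2::-1])
        den = 1.0 - np.sum(phi_prev * rho[:k - 1])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf.append(phi_kk)
    return np.array(pacf)

phi1, phi2 = 0.2, 0.7
rho = [phi1 / (1 - phi2), (phi2 * (1 - phi2) + phi1 ** 2) / (1 - phi2)]
for _ in range(3):                                       # extend with rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])
print(np.round(pacf_from_acf(np.array(rho)), 3))         # expect 0.667, 0.7, 0, 0, 0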
Yt = θ(B)εt                                                   (C.8)

In order for the process defined by (C.8) to be invertible, the roots of

θ(z) = Π_{i=1}^{q} (1 - θ̃iz) = 0

must have modulus greater than one.

Hint: θ^{-1}(B) = Π_j (1 - θ̃jB)^{-1} = Π_j Σ_{i=0}^{∞} (θ̃jB)^i is valid if |θ̃j| < 1 for all j.
γk = E(YtYt-k)

γ0 = (1 + θ1² + ... + θq²)σ²

γk = (-θk + θk+1θ1 + ... + θqθq-k)σ²      for k = 1, 2, ..., q          (C.10)
   = 0                                     for k > q

ρk = (-θk + θk+1θ1 + ... + θqθq-k)/(1 + θ1² + ... + θq²)   k = 1, 2, ..., q     (C.11)
   = 0 for k > q
From (C.10) we see that the autocorrelation function of an MA(q) has a "cut-off" at lag q. We might say that an MA(q) has a memory of length q.
The estimated autocorrelations have asymptotic variance

s²_ρ̂k = (1/T)[1 + 2 Σ_{i=1}^{q} ρi²]    for k > q            (C.12)
The reader should be reminded that the appropriateness of the convention of using the
limiting normal density with (C.7) or (C.12) for purposes of assessing statistical
significance is questionable for small samples.
15 * -0.023
14 *I -0.027
13 *I -0.032
12 *I -0.037
11 *I -0.044
10 *I -0.052
9 *I -0.062
8 *I -0.074
7 * I -0.088
6 * I -0.107
5 * I -0.127
4 * I -0.165
3 * I -0.189
2 * I -0.313
1 * I -0.261
I.I.I.I.I.I.I.I.I.I.I.I.I.I.I
-1 0 +1
Given that the conditions of stationarity and invertibility are satisfied, we note that an ARMA(p,q) process can be expressed as

AR(∞):  θ^{-1}(B)φ(B)Yt = εt                                  (C.14)

MA(∞):  Yt = φ^{-1}(B)θ(B)εt                                  (C.15)

Multiplying (C.13) by Yt-k and taking expected values, we see that

γk = φ1γk-1 + ... + φpγk-p + E(Yt-kεt) - θ1E(Yt-kεt-1) - ... - θqE(Yt-kεt-q)

Since εt-j is uncorrelated with Y's dated earlier than t-j, the cross terms vanish for k > q, and the autocorrelations then satisfy the same difference equation as those of an AR(p).
These results will assist in the determination of p, q. The following tables are taken
from Nelson (p. 89) and Box and Jenkins (pp. 176-7) and provide useful summary
information to assist in the determination of p and q. It should be noted that the
autocorrelation coefficients and partial autocorrelations provide the basis for this
determination. Recall that the asymptotic standard errors of ρ̂k and φ̂kk are given by

sρ̂k = [(1/T)(1 + 2(ρ1² + ... + ρq²))]^{1/2}

sφ̂kk = 1/√T    for k > p.
Order (2,d,0): preliminary estimate φ2 = (ρ2 - ρ1²)/(1 - ρ1²); admissible region includes φ2 - φ1 < 1.

Order (0,d,2): ρ2 = -θ2/(1 + θ1² + θ2²); admissible region includes θ2 - θ1 < 1.
______________________________________________________________________________
Order (1,d,1)
______________________________________________________________________________
Behavior of ρk: decays exponentially from the first lag.

Preliminary estimates from ρ1 = (1 - θ1φ1)(φ1 - θ1)/(1 + θ1² - 2φ1θ1),   ρ2 = ρ1φ1.
(5) ARIMA(p,d,q). The foregoing discussion has assumed that the underlying series is stationary. If this is not the case, for any of several reasons, the previous approaches are not strictly appropriate. For example, if a time series doesn't have a fixed mean, then the series isn't stationary. It will frequently be the case that Yt - Yt-1 = (I-B)Yt, or (I-B)²Yt, or (I-B)^d Yt, will be stationary. If such a value of d can be determined, then the previous techniques can be applied to the "differenced" series, (I-B)^d Yt. For example, if Yt is basically distributed with constant variance about a linear (quadratic) trend, then (I-B)Yt ((I-B)²Yt) will be stationary. If Yt shows evidence of trend stationarity, then Y could be regressed on a polynomial in "t" and the previously described techniques applied to the resulting residuals. Sometimes nonlinear transformations of Yt, such as ln Yt, may facilitate the search for a stationary process.
D. Diagnostic Analysis
There are several approaches which can be utilized to determine the "validity" of the
estimated model and three of the most common involve (1) considering more general models,
(2) an analysis of estimated residuals and (3) the Q-statistic.
(1) Generalized Model. Assume that ARIMA(p,d,q) has been "identified." The researcher
might estimate an ARIMA(p',d,q') where p' and q' are larger than p, q and then check
the statistical significance of the additional coefficients. This approach has at least two limitations: the validity of the statistical inference (test statistics) is questionable for small samples, and an ARIMA(p,d,q) process is determined by its autocorrelation structure only up to common polynomial factors in B, so an overfitted model can contain redundant, cancelling factors. t-"type" statistics or likelihood ratio tests may be used.
(2) Analysis of estimated residuals. The autocorrelation coefficients of the estimated residuals are given by

ρ̂k(ε̂t) = Σ ε̂tε̂t-k / Σ ε̂t²
It should be mentioned that the distributional characteristics of ε̂t are not necessarily exactly the same as those of εt. Autocorrelation and partial autocorrelation coefficients of the ε̂t's are frequently used in this analysis. If the model has been correctly specified, the estimated residuals and the associated autocorrelation and partial autocorrelation coefficients should correspond to white noise. However, if the estimated residuals appear to have an AR or MA component, then the model should be respecified. For example, if the ε̂t's appear to be an AR(1), then one more AR component should be included in the specification for the Yt series.
(3) Q-Statistics.
Box and Pierce, and Ljung and Box, define Q statistics of the form

Q = T(T + 2) Σ_{k=1}^{M} ρ̂k²/(T - k)

which, for a correctly specified model, is approximately distributed as χ² with M - p - q degrees of freedom; large values of Q lead to rejection of the hypothesis that the residuals are white noise.
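In practice the whole diagnostic step can be sketched as follows (illustrative; assumes statsmodels, whose ARIMA class and acorr_ljungbox function, and their exact return formats, should be checked against the library documentation).

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(6)
y = np.zeros(300)
eps = rng.normal(size=300)
for t in range(1, 300):                                  # AR(1) data, for illustration
    y[t] = 0.6 * y[t - 1] + eps[t]

res = ARIMA(y, order=(1, 0, 0)).fit()                    # fit the conjectured model
print(acorr_ljungbox(res.resid, lags=[12]))              # small Q / large p-value: residuals look like white noise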
E. Estimation
Once values for p and q have been determined, the coefficients in the ARIMA (p,d,q) need
to be estimated in order to use the model.
The model can be written as

φ(B)Yt = θ(B)εt

or

εt = θ^{-1}(B)φ(B)Yt                                          (E.3)

In either representation εt depends upon the φi's and θi's (autoregressive and moving average parameters). The associated sum of squared errors is given by

SSE(φ, θ) = Σ_t εt² = Σ_t {θ^{-1}(B)φ(B)Yt}²
Under the assumption that the εt's are independently and identically distributed as N(0, σ²), the log likelihood function is

ℓ(θ, φ, σ²) = ln [ e^{-Σεt²/2σ²} / ((2π)^{N/2}(σ²)^{N/2}) ]
            = -SSE(φ,θ)/(2σ²) - (N/2)ln(2π) - (N/2)ln(σ²)
A close inspection of the expression for SSE reveals that εt depends on previous random disturbances and observations on the variable Y; for example, ε1 depends on the unobservable presample values of Y and ε. Several approaches to these problems have been proposed. This is frequently referred to as the question of how to initialize the series. One approach is to replace the unobservable values of Yt and εt by their expected values--in this case 0; hence

ε1 = Y1
SSE = Σ_{t=1}^{N} εt²

or, conditioning on the first p observations,

SSE = Σ_{t=p+1}^{N} (Yt - φ1Yt-1 - ... - φpYt-p + θ1εt-1 + ... + θqεt-q)²
where ρi is estimated by

ρ̂i = Σ(Yt - Ȳ)(Yt-i - Ȳ) / Σ(Yt - Ȳ)²
F. Forecasts
Write the (stationary, invertible) model in its MA(∞) form,

Yt = Ψ(B)εt = Σ_{j=0}^{∞} ψjεt-j

It can be shown that the optimal (minimum mean squared error) forecast of YN+h at time N is given by

YN(h) = E[YN+h | YN, YN-1, ...] = Σ_{j=0}^{∞} ψh+jεN-j        (F.3)

A recurrence relationship can also be developed from (F.3) which facilitates evaluation of forecasts. For example, for an AR(1) (ψj = φ1^j):

YN(1) = φ1YN
YN(h) = φ1YN(h-1) = φ1^h YN

The corresponding forecast error is YN+h - YN(h) = Σ_{j=0}^{h-1} ψjεN+h-j, with variance

Var[YN+h - YN(h)] = σ² Σ_{j=0}^{h-1} ψj²                      (F.7)
Note: (1) That the variance of the forecast error increases as the lead time
increases
[Figure: actual series, point forecasts YN(h), and widening forecast intervals plotted against t for periods beyond N.]
(4) The expression for the variance of the forecast error in (F.7) doesn't take
account of parameter uncertainty.
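For the AR(1) case, both the forecast recursion and the growing forecast error variance can be evaluated in a few lines (assuming numpy; the parameter values are illustrative).

import numpy as np

phi1, sigma2, y_N = 0.8, 1.0, 2.5                        # illustrative values
for h in range(1, 6):
    forecast = phi1 ** h * y_N                           # Y_N(h) = phi1^h * Y_N
    var_err = sigma2 * np.sum(phi1 ** (2 * np.arange(h)))   # sigma^2 * sum_{j=0}^{h-1} phi1^(2j)
    print(f"h = {h}:  Y_N(h) = {forecast:.3f}   forecast error variance = {var_err:.3f}")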
G. General Comments
(2) If the time series exhibits seasonal behavior [large ρ12 (monthly) or ρ4 (quarterly)], then
the previously developed model can be modified to incorporate this behavior into the
modeling process.
(3) Just a reminder: the simple exponential smoothing is an ARIMA(0,1,1) model and the
Holt-Winters nonseasonal predictor is an ARIMA (0,2,2) model.
(4) A number of texts suggest that time series models based on ARIMA formulations
should only be expected to be "successful" if at least 40 observations are available. For smaller samples, use other techniques, such as exponential smoothing or Holt-Winters.
(5) A complementary method of analysis is that of spectral analysis. The spectral density of a stationary series is given by

s(f) = 2[γ0 + 2 Σ_{k=1}^{∞} γk cos(2πkf)],   0 ≤ f ≤ ½

The spectral density function provides information about the cyclical behavior of a series and shows how the variance of a stochastic process is distributed over a continuous range of frequencies.
For an ARMA process, Yt = φ^{-1}(B)θ(B)εt, the spectral density can be obtained by evaluating the polynomials at B = e^{-iω}, -π < ω < π, and simplifying with De Moivre's theorem. For example:
ARMA(0,0):   σε²

ARMA(1,1):   σ²(1 + θ1² - 2θ1 cos ω) / (1 + φ1² - 2φ1 cos ω)

ARMA(1,2):   σ²(1 + θ1² + θ2² - 2θ1(1 - θ2) cos ω - 2θ2 cos 2ω) / (1 + φ1² - 2φ1 cos ω)

ARMA(2,1):   σ²(1 + θ1² - 2θ1 cos ω) / (1 + φ1² + φ2² - 2φ1(1 - φ2) cos ω - 2φ2 cos 2ω)

ARMA(2,2):   σ²(1 + θ1² + θ2² - 2θ1(1 - θ2) cos ω - 2θ2 cos 2ω) / (1 + φ1² + φ2² - 2φ1(1 - φ2) cos ω - 2φ2 cos 2ω)
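The table entries are easy to evaluate numerically; the sketch below (assuming numpy; parameter values and the frequency grid are illustrative, and any constant scale factor is ignored) computes the ARMA(1,1) spectral density over a range of frequencies.

import numpy as np

def arma11_spectrum(omega, phi1, theta1, sigma2=1.0):
    num = sigma2 * (1 + theta1 ** 2 - 2 * theta1 * np.cos(omega))
    den = 1 + phi1 ** 2 - 2 * phi1 * np.cos(omega)
    return num / den

omega = np.linspace(0.01, np.pi, 6)
print(np.round(arma11_spectrum(omega, phi1=0.8, theta1=0.3), 3))   # power concentrated at low frequencies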
COMPUTER PROGRAMS (TIME SERIES ANALYSIS)
I. THEORYTS