
TOPIC FOUR

IV. Box-Jenkins methodology


IV.1 Overview

We consider how to fit an ARIMA(p,d,q) model to historical data $\{x_1, x_2, \ldots, x_n\}$. We assume that trends and seasonal effects have been removed from the data.

The methodology developed by Box and Jenkins consists of three distinct steps:

• Tentative identification of an ARIMA model
• Estimation of the parameters of the identified model
• Diagnostic checks

If the tentatively identified model passes the diagnostic tests, it can be used for forecasting.

If it does not, the diagnostic tests should indicate how the model should be modified, and a new cycle of

• Identification
• Estimation
• Diagnostic checks

is performed.

IV.2 Model selection

a. Identification of white noise

Recall: in a simple linear regression model, $y_i = \beta_0 + \beta_1 x_i + e_i$, $e_i \sim \mathrm{IN}(0, \sigma^2)$, we use regression diagnostic plots of the residuals $\hat{e}_i$ to test the goodness of fit of the model, i.e. whether the assumptions $e_i \sim \mathrm{IN}(0, \sigma^2)$ are justified.

The error variables $e_i$ form a zero-mean white noise process: they are uncorrelated, with common variance $\sigma^2$.

Recall: $\{e_t : t \in \mathbb{Z}\}$ is a zero-mean white noise process if

$$E(e_t) = 0 \quad \forall t, \qquad \gamma_k = \mathrm{Cov}(e_t, e_{t-k}) = \begin{cases} \sigma^2, & k = 0 \\ 0, & \text{otherwise.} \end{cases}$$

Thus the ACF and PACF of a white noise process (when plotted against $k$) are flat at zero:

[Figure: ACF $\rho_k$ and PACF $\phi_{kk}$ of a white noise process plotted against lag $k$, both identically zero for $k = 1, 2, \ldots$]

i.e. apart from $\rho_0 = 1$, we have $\rho_k = 0$ for $k = 1, 2, \ldots$ and $\phi_{kk} = 0$ for $k = 1, 2, \ldots$

Question: how do we test if the residuals from a time series model look like a realisation of a white noise process?

Answer: we look at the SACF and SPACF of the residuals. In studying the SACF and SPACF, we realise that even if the original process was white noise, we would not expect $r_k = 0$ for $k = 1, 2, \ldots$ and $\hat{\phi}_{kk} = 0$ for $k = 1, 2, \ldots$, as $r_k$ is only an estimate of $\rho_k$ and $\hat{\phi}_{kk}$ is only an estimate of $\phi_{kk}$.

Question: how close to 0 should $r_k$ and $\hat{\phi}_{kk}$ be, if $\rho_k = 0$ for $k = 1, 2, \ldots$ and $\phi_{kk} = 0$ for $k = 1, 2, \ldots$?

Answer: if the original model is white noise, $X_t = \mu + e_t$, then for each $k$ the SACF and SPACF satisfy

$$r_k \sim N\!\left(0, \tfrac{1}{n}\right) \quad \text{and} \quad \hat{\phi}_{kk} \sim N\!\left(0, \tfrac{1}{n}\right)$$

This is true for large samples, i.e. for large values of $n$.

Values of $r_k$ or $\hat{\phi}_{kk}$ outside the range $\left(-\frac{2}{\sqrt{n}}, \frac{2}{\sqrt{n}}\right)$ can be taken as suggesting that a white noise model is inappropriate.

However, these are only approximate 95% confidence intervals. If $\rho_k = 0$, we can be 95% certain that $r_k$ lies between these limits. This means that 1 value in 20 will lie outside these limits even if the white noise model is correct. Hence a single value of $r_k$ or $\hat{\phi}_{kk}$ outside these limits would not be regarded as significant on its own, but three such values might well be significant.
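As a quick illustration (a minimal sketch, assuming NumPy; the helper name sample_acf is our own), the sample autocorrelations and the approximate limits $\pm 2/\sqrt{n}$ can be computed directly:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    c0 = np.sum(xc**2) / n                          # sample autocovariance at lag 0
    return np.array([np.sum(xc[k:] * xc[:-k]) / (n * c0)
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
e = rng.normal(size=200)                            # simulated white noise
r = sample_acf(e, max_lag=20)
limit = 2 / np.sqrt(len(e))                         # approximate 95% limits
print("lags outside +/- 2/sqrt(n):", np.nonzero(np.abs(r) > limit)[0] + 1)
```

For a true white noise series we expect roughly 1 lag in 20 to be flagged by chance, in line with the remark above.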

There is an overall goodness-of-fit test, based on all the $r_k$'s in the SACF rather than on individual $r_k$'s, called the portmanteau test of Ljung and Box. It consists of checking whether the $m$ sample autocorrelation coefficients of the residuals are too large to resemble those of a white noise process (for which they should all be negligible).

Given residuals from an estimated ARMA(p,q) model, under the null hypothesis that all $\rho_k = 0$, the Q-statistic is asymptotically $\chi^2$-distributed with $s = m - p - q$ degrees of freedom, or, if a constant (say $\mu$) is included, with $s = m - p - q - 1$ degrees of freedom.

If the white noise model is correct then

$$Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \sim \chi^2_s, \qquad \text{where } s = m - p - q.$$

That is, under the null hypothesis that all $\rho_k = 0$, the Q-statistic given above is asymptotically $\chi^2$-distributed with $s$ degrees of freedom. If the Q-statistic is found to be greater than the 95th percentile of that $\chi^2$ distribution, the null hypothesis is rejected, which means that the alternative hypothesis that "at least one autocorrelation is non-zero" is accepted. Statistical packages print these statistics. For large $n$, the Ljung-Box Q-statistic closely approximates the Box-Pierce statistic:

$$n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \;\approx\; n \sum_{k=1}^{m} r_k^2$$

The overall diagnostic test is therefore performed as follows (for centred realisations):
• Fit an ARMA(p,q) model
• Estimate the (p+q) parameters
• Test if

$$Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \sim \chi^2_{m-p-q}$$

Remark: the Ljung-Box Q-statistic above was first suggested as an improvement on the simpler Box-Pierce test statistic

$$Q = n \sum_{k=1}^{m} r_k^2,$$

which was found to perform poorly even for moderately large sample sizes.
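A minimal sketch of the Ljung-Box test (reusing the sample_acf helper defined earlier, and assuming SciPy is available for the $\chi^2$ tail probability):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_Q(residuals, m, p=0, q=0):
    """Ljung-Box portmanteau statistic on the first m residual autocorrelations."""
    n = len(residuals)
    r = sample_acf(residuals, m)                    # r_1, ..., r_m
    Q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, m + 1)))
    s = m - p - q                                   # degrees of freedom
    return Q, chi2.sf(Q, df=s)

Q, pval = ljung_box_Q(e, m=20)                      # e: the white noise series above
print(f"Q = {Q:.2f}, p-value = {pval:.3f}")         # large p-value: no evidence against white noise
```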

b. Identification of MA(q)

Recall: for an MA(q) process, $\rho_k = 0$ for all $k > q$, i.e. the "ACF cuts off after lag q".

To test if an MA(q) model is appropriate, we see if $r_k$ is close to 0 for all $k > q$. If the data do come from an MA(q) model, then for $k > q$ (since only the first $q+1$ coefficients $\rho_0, \rho_1, \ldots, \rho_q$ can be non-zero),

$$r_k \sim N\!\left(0, \; \frac{1}{n}\left(1 + 2\sum_{i=1}^{q} \rho_i^2\right)\right)$$

and 95% of the $r_k$'s should lie in the interval

$$\left[\, -1.96\sqrt{\frac{1}{n}\left(1 + 2\sum_{i=1}^{q} \rho_i^2\right)}, \;\; +1.96\sqrt{\frac{1}{n}\left(1 + 2\sum_{i=1}^{q} \rho_i^2\right)} \,\right]$$

(note that it is common to use 2 instead of 1.96 in the above formula). We would expect 1 in 20 values to lie outside the interval. In practice, the $\rho_i$'s are replaced by the $r_i$'s. The "confidence limits" on SACF plots are based on this. If $r_k$ lies outside these limits it is "significantly different from zero" and we conclude that $\rho_k \neq 0$. Otherwise, $r_k$ is not significantly different from zero and we conclude that $\rho_k = 0$.

[Figure: SACF plot of $r_k$ against lag $k$, with dashed horizontal confidence limits above and below zero.]
For $q = 0$, the limits for $k = 1$ are

$$\left(-\frac{1.96}{\sqrt{n}}, \; \frac{1.96}{\sqrt{n}}\right)$$

as for testing for a white noise model. Coefficient $r_1$ is compared with these limits. For $q = 1$, the limits for $k = 2$ are

$$\left(-1.96\sqrt{\frac{1}{n}\left(1 + 2r_1^2\right)}, \;\; 1.96\sqrt{\frac{1}{n}\left(1 + 2r_1^2\right)}\right)$$

and $r_2$ is compared with these limits. Again, 2 is often used in place of 1.96.
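A short sketch of these widening limits (our own helper, reusing sample_acf; the $\rho_i$'s are replaced by the $r_i$'s as described above):

```python
import numpy as np

def ma_q_limits(r, n, q):
    """Approximate 95% limits for r_k, k > q, under an MA(q) model
    (Bartlett's formula with rho_i replaced by r_i)."""
    half_width = 1.96 * np.sqrt((1 + 2 * np.sum(r[:q]**2)) / n)
    return -half_width, half_width

r = sample_acf(e, max_lag=20)                       # e: series from earlier
for q in (0, 1, 2):
    low, high = ma_q_limits(r, len(e), q)
    print(f"q = {q}: limits for k > {q} are ({low:.3f}, {high:.3f})")
```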

c. Identification of AR(p)

Recall: for an AR(p) process, we have $\phi_{kk} = 0$ for all $k > p$, i.e. the "PACF cuts off after lag p".

To test if an AR(p) model is appropriate, we see if the sample estimate of $\phi_{kk}$ is close to 0 for all $k > p$. If the data do come from an AR(p) model, then for $k > p$,

$$\hat{\phi}_{kk} \sim N\!\left(0, \tfrac{1}{n}\right)$$

and 95% of the sample estimates should lie in the interval

$$\left(-\frac{2}{\sqrt{n}}, \; \frac{2}{\sqrt{n}}\right)$$

The "confidence limits" on SPACF plots are based on this: if the sample estimate of $\phi_{kk}$ lies outside these limits, it is "significant".

[Figure: sample PACF of an AR(1) series plotted against lag $k = 1, \ldots, 15$; only the lag-1 value is large, and the later values lie close to zero, within the confidence limits.]
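The SPACF itself can be computed from the sample autocorrelations via the Durbin-Levinson recursion. A minimal sketch (the recursion is standard; the implementation and names here are our own, reusing sample_acf):

```python
import numpy as np

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)                      # r_1, ..., r_max_lag
    pacf = np.zeros(max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf[0] = phi[1, 1] = r[0]
    for k in range(2, max_lag + 1):
        num = r[k - 1] - np.sum(phi[k - 1, 1:k] * r[k - 2::-1])
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[:k - 1])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k - 1] = phi[k, k]
    return pacf

print(sample_pacf(e, 10))                           # for white noise, all values near zero
```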
IV.3 Model fitting

a. Fitting an ARMA(p,q) model

We make the following assumptions:

• An appropriate value of d has been found and $\{z_{d+1}, z_{d+2}, \ldots, z_n\}$ is stationary.

• Sample mean $\bar{z} = 0$; if not, subtract $\hat{\mu} = \bar{z}$ from each $z_i$.

• For simplicity, we assume that d = 0 (to simplify upper and lower limits of sums).

We look for an ARMA(p,q) model for the data z:

• If the SACF appears to cut off after lag q, an MA(q) model is indicated (we use the tests of significance described previously).

• If the SPACF appears to cut off after lag p, an AR(p) model is indicated.

If neither the SACF nor the SPACF cuts off, mixed models must be considered, starting with ARMA(1,1).

b. Parameter estimation: LS and ML

Having identified the values for the parameters p and q, we must now estimate the values of the parameters $\phi_1, \phi_2, \ldots, \phi_p$ and $\theta_1, \theta_2, \ldots, \theta_q$ in the model

$$Z_t = \phi_1 Z_{t-1} + \cdots + \phi_p Z_{t-p} + e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q}$$

Least squares (LS) estimation is equivalent to maximum likelihood (ML) estimation if $e_t$ is assumed normally distributed.
Example: in the AR(p) model, $e_t = Z_t - \phi_1 Z_{t-1} - \cdots - \phi_p Z_{t-p}$. The estimators $\hat{\phi}_1, \ldots, \hat{\phi}_p$ are chosen to minimise

$$\sum_{t=p+1}^{n} \left(z_t - \hat{\phi}_1 z_{t-1} - \cdots - \hat{\phi}_p z_{t-p}\right)^2$$

Once these estimates are obtained, the residual at time $t$ is given by

$$\hat{e}_t = z_t - \hat{\phi}_1 z_{t-1} - \cdots - \hat{\phi}_p z_{t-p}$$
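Since this is an ordinary least squares problem, the $\hat{\phi}_i$ can be found with a standard linear solver. A minimal sketch for a centred series z (the helper name is our own):

```python
import numpy as np

def fit_ar_ls(z, p):
    """Least squares estimates of phi_1, ..., phi_p for a centred series z."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Design matrix: the row for time t holds (z_{t-1}, ..., z_{t-p}), t = p+1, ..., n
    X = np.column_stack([z[p - j:n - j] for j in range(1, p + 1)])
    y = z[p:]
    phi_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ phi_hat
    return phi_hat, residuals
```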

For general ARMA models, $\hat{e}_t$ cannot be deduced directly from the $z_t$. In the MA(1) model, for instance,

$$\hat{e}_t = z_t - \hat{\theta}_1 \hat{e}_{t-1}$$

We can solve this iteratively for $\hat{e}_t$ as long as some starting value $\hat{e}_0$ is assumed. For an ARMA(p,q) model, the list of starting values is $(\hat{e}_0, \hat{e}_1, \ldots, \hat{e}_{q-1})$. The starting values are estimated recursively by backforecasting:
0. Assume $(\hat{e}_0, \hat{e}_1, \ldots, \hat{e}_{q-1})$ are all zero
1. Estimate the $\phi_i$ and $\theta_j$
2. Use forecasting on the time-reversed process $\{z_n, \ldots, z_1\}$ to predict values for $(\hat{e}_0, \hat{e}_1, \ldots, \hat{e}_{q-1})$
3. Repeat cycle (1)-(2) until the estimates converge.
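For instance, given a current estimate $\hat{\theta}_1$ and the starting value $\hat{e}_0 = 0$ of step 0, the MA(1) residuals follow by direct recursion (a sketch of a single cycle; full backforecasting would then refine $\hat{e}_0$ as described above):

```python
import numpy as np

def ma1_residuals(z, theta1, e0=0.0):
    """Residuals e_t = z_t - theta1 * e_{t-1}, computed recursively."""
    e_hat = np.empty(len(z))
    prev = e0                                       # starting value (zero on the first cycle)
    for t, zt in enumerate(z):
        e_hat[t] = zt - theta1 * prev
        prev = e_hat[t]
    return e_hat
```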

c. Parameter estimation: method of moments

• Calculate the theoretical ACF of the ARMA(p,q) model: the $\rho_k$'s will be functions of the $\phi$'s and $\theta$'s.

• Set $\rho_k = r_k$ and solve for the $\phi$'s and $\theta$'s. These are the method of moments estimators.

Example: you have decided to fit the following MA(1) model

$$x_n = e_n + \theta e_{n-1}, \qquad e_n \sim N(0, 1)$$

You have calculated $\hat{\gamma}_0 = 1$, $\hat{\gamma}_1 = -0.25$. Estimate $\theta$.

We have $r_1 = \hat{\gamma}_1 / \hat{\gamma}_0 = -0.25$.

Recall: $\gamma_0 = (1 + \theta^2)\sigma^2 = 1 + \theta^2$ and $\gamma_1 = \theta\sigma^2 = \theta$ here, from which $\rho_1 = \dfrac{\theta}{1 + \theta^2}$.

Setting $\rho_1 = r_1 = \dfrac{\theta}{1+\theta^2} = -0.25$ and solving for $\theta$ gives $\theta = -0.268$ or $\theta = -3.732$.

Recall: the MA(1) process is invertible if and only if $|\theta| < 1$. So for $\theta = -0.268$ the model is invertible, but for $\theta = -3.732$ the model is not invertible.

Note: if $\hat{\gamma}_1 = -0.5$ here, then $\rho_1 = r_1 = \dfrac{\theta}{1+\theta^2} = -0.5$, which gives $(\theta + 1)^2 = 0$, so $\theta = -1$ is a repeated root, and the resulting model is not invertible.
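The two candidate estimates can be read off by solving the quadratic $r_1\theta^2 - \theta + r_1 = 0$ (equivalent to setting $\rho_1 = r_1$). A quick check of the worked example (our own snippet):

```python
import numpy as np

def mm_theta_ma1(r1):
    """Method of moments estimates of theta in MA(1): solve r1 = theta / (1 + theta^2)."""
    roots = np.roots([r1, -1.0, r1])                # r1*theta^2 - theta + r1 = 0
    return roots[np.abs(roots) < 1], roots          # (invertible root(s), all roots)

invertible, all_roots = mm_theta_ma1(-0.25)
print(all_roots)                                    # roots -0.268 and -3.732
print(invertible)                                   # [-0.268]: the invertible estimate
```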

Now, let us estimate $\sigma^2 = \mathrm{Var}(e_t)$.

Recall that in the simple linear model $Y_i = \beta_0 + \beta_1 X_i + e_i$, $e_i \sim \mathrm{IN}(0, \sigma^2)$, $\sigma^2$ is estimated by

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{e}_i^2$$

where $\hat{e}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$ is the $i$th residual. Here we use

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{t=p+1}^{n} \hat{e}_t^2 = \frac{1}{n} \sum_{t=p+1}^{n} \left(z_t - \hat{\phi}_1 z_{t-1} - \cdots - \hat{\phi}_p z_{t-p} - \hat{\theta}_1 \hat{e}_{t-1} - \cdots - \hat{\theta}_q \hat{e}_{t-q}\right)^2$$

No matter which estimation method is used, this parameter is estimated last, as estimates of the $\phi$'s and $\theta$'s are required first.

Note: in using either least squares or maximum likelihood estimation we also find the residuals $\hat{e}_t$, whereas when using the method of moments to estimate the $\phi$'s and $\theta$'s, these residuals have to be calculated afterwards.

Note: for large n, there will be little difference between LS, ML and Method of Moments estimators.

d. Diagnostic checking

Assume we have identified a tentative ARIMA(p,d,q) model and calculated the estimates $\hat{\mu}, \hat{\sigma}, \hat{\phi}_1, \ldots, \hat{\phi}_p, \hat{\theta}_1, \ldots, \hat{\theta}_q$.

We must perform diagnostic checks based on the residuals. If the ARMA(p,q) model is a good approximation to the underlying time series process, then the residuals $\hat{e}_t$ will form a good approximation to a white noise process.

(I) Tests to see if the residuals are white noise:


• Study the SACF and SPACF of the residuals. Do $r_k$ and $\hat{\phi}_{kk}$ lie outside $\left(-\frac{1.96}{\sqrt{n}}, \frac{1.96}{\sqrt{n}}\right)$?

• Portmanteau test of the residuals (carried out on the residual SACF):

$$n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \sim \chi^2_{m-s}, \qquad s = \text{number of parameters of the model}$$

If the SACF or SPACF of the residuals has too many values outside the interval $\left(-\frac{1.96}{\sqrt{n}}, \frac{1.96}{\sqrt{n}}\right)$, we conclude that the fitted model does not have enough parameters and a new model with additional parameters should be fitted.

The portmanteau test may also be used for this purpose. Other tests are:

• Inspection of the graph of $\{\hat{e}_t\}$

• Counting turning points

• Study of the sample spectral density function of the residuals

(II) Inspection of the graph of $\{\hat{e}_t\}$:

• plot $\hat{e}_t$ against $t$

• plot $\hat{e}_t$ against $z_t$

Any patterns evident in these plots may indicate that the residuals are not a realisation of a set of independent (uncorrelated) variables, and so the model is inadequate.
(III) Counting Turning Points:

This is a test of independence: are the residuals a realisation of a set of independent variables? The possible configurations of three successive values are:

[Figure: the six possible orderings of three successive values, (a)-(f); a turning point occurs in all configurations except (a) and (b).]

Since four out of the six possible configurations exhibit a turning point, the probability of observing one is 4/6 = 2/3.

If $y_1, y_2, \ldots, y_n$ is a sequence of numbers, the sequence has a turning point at time $k$ if

either
$y_{k-1} < y_k$ AND $y_k > y_{k+1}$
or
$y_{k-1} > y_k$ AND $y_k < y_{k+1}$

Result: if $Y_1, Y_2, \ldots, Y_N$ is a sequence of independent random variables, then

• the probability of a turning point at time $k$ is 2/3

• the expected number of turning points is $\frac{2}{3}(N-2)$

• the variance is $(16N - 29)/90$

[Kendall and Stuart, "The Advanced Theory of Statistics", 1966, vol. 3, p. 351]

Therefore, the number of turning points in a realisation of $Y_1, Y_2, \ldots, Y_N$ should lie within the 95% confidence interval:

$$\left[\frac{2}{3}(N-2) - 1.96\sqrt{\frac{16N-29}{90}}, \;\; \frac{2}{3}(N-2) + 1.96\sqrt{\frac{16N-29}{90}}\right]$$
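A compact sketch of the turning point test (our own implementation of the count and the interval above):

```python
import numpy as np

def turning_point_test(e):
    """Count turning points and return the 95% interval expected under independence."""
    e = np.asarray(e, dtype=float)
    N = len(e)
    mid, left, right = e[1:-1], e[:-2], e[2:]
    count = np.sum(((mid > left) & (mid > right)) | ((mid < left) & (mid < right)))
    mean = 2.0 / 3.0 * (N - 2)
    sd = np.sqrt((16 * N - 29) / 90.0)
    return count, (mean - 1.96 * sd, mean + 1.96 * sd)

count, (low, high) = turning_point_test(e)          # e: residual-like series from earlier
print(f"turning points: {count}, 95% interval: ({low:.1f}, {high:.1f})")
```

A count outside the interval suggests the residuals are not a realisation of independent variables.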

Study of the sample spectral density function of the residuals:

Recall: the spectral density function of a white noise process is $f(\omega) = \sigma^2/2\pi$, $-\pi < \omega < \pi$. So the sample spectral density function of the residuals should be roughly constant for a white noise process.
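A rough check via the periodogram (a sketch; the normalisation $1/(2\pi n)$ is chosen to match the spectral density convention above, so for white noise the values should scatter around $\sigma^2/2\pi$):

```python
import numpy as np

def periodogram(e):
    """Sample spectral density of a centred series at the Fourier frequencies."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    n = len(e)
    freqs = 2 * np.pi * np.arange(1, n // 2 + 1) / n
    I = np.abs(np.fft.rfft(e)[1:n // 2 + 1])**2 / (2 * np.pi * n)
    return freqs, I

freqs, I = periodogram(e)                           # e: residual-like series from earlier
print(f"mean level {I.mean():.4f} vs sigma^2/(2*pi) = {e.var()/(2*np.pi):.4f}")
```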
