ARIMA: Model Building Blocks
The Box-Jenkins model building process
• Model identification
  – Autocorrelations
  – Partial-autocorrelations
• Model estimation
• Model validation
  – Certain diagnostics are used to check the validity of the model
• Model forecasting

Partial-autocorrelations (PACs)
• Partial-autocorrelations are another set of statistical measures used to identify time series models
• The PAC is similar to the AC, except that the PAC at lag k measures the correlation between observations k periods apart after the effects of all the intermediate observations within the lag have been partialled out (Box & Jenkins, 1976)
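Identification starts from the sample ACs of the series. The following is a minimal sketch that assumes the usual estimator $r_k = \sum_{t=1}^{n-k}(z_t - \bar z)(z_{t+k} - \bar z) / \sum_{t=1}^{n}(z_t - \bar z)^2$; the function name `sample_acf` is hypothetical and only used for illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series (illustrative helper).

    r_k = sum_{t=1}^{n-k} (z_t - zbar)(z_{t+k} - zbar) / sum_{t=1}^{n} (z_t - zbar)^2
    """
    n = len(series)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    dev = [z - mean for z in series]  # deviations from the series mean
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelations are undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # An alternating series is strongly negatively correlated at lag 1.
    print(sample_acf([1, -1, 1, -1], 2))  # [-0.75, 0.5]
```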
Partial-autocorrelations (cont.)
• PACs can be calculated from the values of the ACs: each PAC is obtained from a different set of linear equations describing a pure autoregressive model whose order equals the lag of the partial-autocorrelation being computed
• The PAC at lag k is denoted by $\phi_{kk}$
  – The double subscript kk emphasizes that $\phi_{kk}$ is the autoregressive parameter $\phi_k$ of the autoregressive model of order k

Model identification
• The sample ACs and PACs are computed for the series and compared with the theoretical autocorrelation and partial-autocorrelation functions of the candidate models being investigated
[Diagram: Stationarity and invertibility conditions; Theoretical ACs and PACs]
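The "different set of linear equations" for each lag can be solved recursively rather than one system at a time. This sketch uses the Durbin-Levinson recursion (a standard way to solve the successive Yule-Walker systems, named here as the technique chosen, not necessarily the one the slides assume); `pacf_from_acf` is a hypothetical name. $\phi_{kk}$ is taken as the last coefficient of the fitted AR(k) model.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk from autocorrelations (illustrative helper).

    rho[0] is the lag-1 autocorrelation, rho[1] lag 2, and so on.
    For each k the AR(k) Yule-Walker equations are solved via the
    Durbin-Levinson recursion; phi_kk is the last AR(k) coefficient.
    """
    pacf = []
    phi_prev = []  # coefficients phi_{k-1,1..k-1} of the AR(k-1) model
    for k in range(1, len(rho) + 1):
        r_k = rho[k - 1]
        if k == 1:
            phi_kk = r_k
            phi_curr = [phi_kk]
        else:
            num = r_k - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            # Update the remaining AR(k) coefficients from the AR(k-1) ones.
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


if __name__ == "__main__":
    # AR(1) with phi_1 = 0.5 has rho_k = 0.5**k; its PAC cuts off after lag 1.
    print(pacf_from_acf([0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
```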
Stationarity requirements for an AR(1) model (cont.)
• For an autoregressive model of order p, the theoretical autocorrelation function satisfies the following difference equation:
  $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \dots + \phi_p \rho_{k-p}$
  which, for p = 1 and with $\rho_0 = 1$, has the solution
  $\rho_k = \phi_1^k, \quad k > 0$
  i.e., an exponential decay
• For a stationary AR(1) model ($|\phi_1| < 1$), the theoretical autocorrelation function therefore decays exponentially to zero
• However, the theoretical partial-autocorrelation function has a cut off after lag 1
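The difference equation can be evaluated directly once the first p autocorrelations are known. This sketch (hypothetical helper `ar_theoretical_acf`) assumes the AR parameters are stationary and does not check it: it first solves the equation for k = 1..p, using $\rho_0 = 1$ and $\rho_{-j} = \rho_j$, then applies the recursion for higher lags. For p = 1 it reproduces $\rho_k = \phi_1^k$.

```python
def ar_theoretical_acf(phi, max_lag):
    """Theoretical rho_1..rho_max_lag of a stationary AR(p); phi = [phi_1, ..., phi_p]."""
    p = len(phi)
    # Linear system A x = b for x = [rho_1, ..., rho_p] from the equation at k = 1..p.
    a = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for k in range(1, p + 1):
        a[k - 1][k - 1] += 1.0
        for i in range(1, p + 1):
            j = abs(k - i)
            if j == 0:
                b[k - 1] += phi[i - 1]  # phi_i * rho_0, with rho_0 = 1
            else:
                a[k - 1][j - 1] -= phi[i - 1]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, p))) / a[r][r]
    rho = [1.0] + x  # rho[0] = 1, then rho_1..rho_p
    # Higher lags: rho_k = phi_1 rho_{k-1} + ... + phi_p rho_{k-p}.
    for k in range(p + 1, max_lag + 1):
        rho.append(sum(phi[i] * rho[k - 1 - i] for i in range(p)))
    return rho[1:max_lag + 1]


if __name__ == "__main__":
    print(ar_theoretical_acf([0.6], 3))       # approximately [0.6, 0.36, 0.216]
    print(ar_theoretical_acf([0.5, 0.3], 3))  # AR(2): rho_1 = 0.5 / 0.7, ...
```

For an AR(2) with a complex pair of characteristic roots the same recursion produces the damped sine wave pattern mentioned for higher order models.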
Invertibility requirements for an MA(1) model (cont.)
• For an MA(1) model (always stationary, and invertible when $|\theta_1| < 1$), the theoretical autocorrelation function has a cut off after lag 1

Theoretical PACs
• The partial-autocorrelations of a time series produce patterns that are the reverse of the autocorrelation patterns with respect to AR and MA parameters: for an AR process the AC decays and the PAC cuts off, whereas for an MA process the AC cuts off and the PAC decays
[Figure: Permissible regions for the AR and MA parameters]

Higher order models
• For an AR model of order p > 1:
  – The autocorrelation function consists of a mixture of damped exponentials and damped sine waves
  – The partial-autocorrelation function has a cut off after lag p
• For an MA model of order q > 1:
  – The autocorrelation function has a cut off after lag q
  – The partial-autocorrelation function consists of a mixture of damped exponentials and damped sine waves
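In practice the cut off is judged on sample values. A rough sketch, assuming the common approximate 95% band of $\pm 2/\sqrt{n}$ (exact for white noise; the band for sample ACs of an MA process is wider, so treat the result as a hint only). `last_significant_lag` is a hypothetical helper: a PAC cut off at lag p suggests AR(p), an AC cut off at lag q suggests MA(q).

```python
import math


def last_significant_lag(values, n):
    """Largest lag whose sample AC or PAC lies outside +/- 2/sqrt(n); 0 if none.

    values[0] is the value at lag 1; n is the length of the series.
    """
    limit = 2.0 / math.sqrt(n)
    last = 0
    for lag, v in enumerate(values, start=1):
        if abs(v) > limit:
            last = lag
    return last


if __name__ == "__main__":
    # n = 100 gives a band of +/- 0.2; only lag 1 of this PAC is outside it.
    print(last_significant_lag([0.8, 0.05, -0.03], 100))  # 1
```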
[Figure: Theoretical ACs and PACs for AR(1) and AR(2) processes]

[Figure: Theoretical ACs and PACs for MA(1) and MA(2) processes]

[Figure: Theoretical ACs and PACs for an ARMA(1,1) process]
Model estimation
• There are three objectives in estimating a specific Box-Jenkins model for a given series:
1- Determine optimum values for the selected AR and/or MA parameters so that the sum of squared residuals is minimized. These are called "the least squares parameters"; when the residuals are normally distributed they also closely approximate "the maximum likelihood parameters".
2- Obtain residuals $a_t$ that are not correlated with one another.
3- Use as few parameters as necessary to obtain an adequate model, i.e., make sure the model is parsimonious (not overspecified).
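Objective 1 has a closed form in the simplest case. This sketch assumes a zero-mean AR(1) (subtract the mean first) and uses conditional least squares, i.e., it minimizes $\sum_{t=2}^{n}(z_t - \phi_1 z_{t-1})^2$; models with MA terms need iterative nonlinear least squares or maximum likelihood instead. `fit_ar1_least_squares` is a hypothetical name.

```python
def fit_ar1_least_squares(series):
    """Conditional least squares for a zero-mean AR(1): z_t = phi_1 z_{t-1} + a_t.

    Minimizing sum (z_t - phi_1 z_{t-1})^2 gives
        phi_1 = sum z_t z_{t-1} / sum z_{t-1}^2.
    Returns the estimate and the residuals a_2..a_n.
    """
    n = len(series)
    num = sum(series[t] * series[t - 1] for t in range(1, n))
    den = sum(series[t - 1] ** 2 for t in range(1, n))
    phi = num / den
    residuals = [series[t] - phi * series[t - 1] for t in range(1, n)]
    return phi, residuals


if __name__ == "__main__":
    # A noise-free series generated by z_t = 0.5 z_{t-1} is fitted exactly.
    print(fit_ar1_least_squares([8.0, 4.0, 2.0, 1.0, 0.5]))
```

The residuals returned here are what objective 2 and the diagnostic checks below are applied to.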
Model verification
Diagnostic checks:
• Parameter diagnostics: confidence limits; correlations
• Overfitting
• Residual diagnostics: residual mean; residual mean percent error; correlogram; Q-statistic; cumulative periodogram; normality; error variance
• Closeness of fit statistics: average absolute error; residual standard error; average absolute percent error; index of determination

Residual mean or mean error (ME)
• The residual mean is simply the average of all the computed residuals
• If the residual mean is significantly nonzero, then the fitted values are consistently either higher or lower than the original series values
• To check whether a residual mean is significantly nonzero (with 95% confidence), its magnitude can be compared with $2 S_a / \sqrt{n}$, where $S_a$ is the standard deviation of the residuals and n is the number of residuals
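The residual mean check is a short computation. A minimal sketch, assuming $S_a$ is the sample standard deviation of the residuals; `residual_mean_check` is a hypothetical name.

```python
import math
import statistics


def residual_mean_check(residuals):
    """Compare |mean residual| with 2 * S_a / sqrt(n) (roughly 95% confidence).

    Returns (mean, limit, significantly_nonzero).
    """
    n = len(residuals)
    mean = statistics.mean(residuals)
    s_a = statistics.stdev(residuals)  # sample standard deviation of the residuals
    limit = 2.0 * s_a / math.sqrt(n)
    return mean, limit, abs(mean) > limit


if __name__ == "__main__":
    print(residual_mean_check([1.0, -1.0, 1.0, -1.0]))  # mean 0: not significant
    print(residual_mean_check([1.0, 1.1, 0.9, 1.0]))    # fits consistently too low
```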
[Figure: Correlogram of the residuals; the solid lines represent the 95% confidence limits of two standard deviations]

Q statistic of the residuals
• The Q-statistic is used to judge whether the autocorrelations of the residual series, as a whole, are significantly nonzero
• By comparing the Q-statistic with a critical test value (the chi-square value), we can determine, with a certain degree of confidence, whether the residual autocorrelations, tested as a whole, are significant
• Using the Ljung-Box formula:
  $Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k}$
  where n is the number of residuals, $r_k$ is the residual autocorrelation at lag k, and m is the number of lags tested
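The Ljung-Box formula translates almost line for line into code. This sketch (hypothetical helper `ljung_box_q`) only computes Q; the critical value still has to come from a chi-square table or, for example, `scipy.stats.chi2.ppf`. A common convention for residuals of an ARMA(p, q) fit is to use m − p − q degrees of freedom.

```python
def ljung_box_q(residuals, m):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{m} r_k^2 / (n - k) for a residual series."""
    n = len(residuals)
    if not 0 < m < n:
        raise ValueError("m must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    dev = [a - mean for a in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, m + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom  # residual AC at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    # r_1 = -0.75 for this series, so Q = 4 * 6 * 0.5625 / 3 = 4.5 with m = 1.
    print(ljung_box_q([1.0, -1.0, 1.0, -1.0], 1))
```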
[Figure: Normality of the residuals]

[Figure: Error variance; residuals plotted against predicted effluent TSS (mg/L)]
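[Figure: Seasonal AC and PAC patterns]

Index of determination (R²)
$R^2 = 1 - \frac{\sum_t a_t^2}{\sum_t (z_t - \bar{z})^2}$
where $a_t$ are the residuals, $z_t$ the observed series values, and $\bar{z}$ their mean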
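A minimal sketch of the index of determination, assuming the usual definition with residuals $a_t = z_t - \hat z_t$ in the numerator and the total sum of squares $\sum_t (z_t - \bar z)^2$ in the denominator; `index_of_determination` is a hypothetical name.

```python
def index_of_determination(observed, fitted):
    """R^2 = 1 - sum(a_t^2) / sum((z_t - zbar)^2), with residuals a_t = z_t - fitted_t."""
    n = len(observed)
    zbar = sum(observed) / n
    sse = sum((z - f) ** 2 for z, f in zip(observed, fitted))  # sum of squared residuals
    sst = sum((z - zbar) ** 2 for z in observed)               # total variation of the series
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Residuals of +/-0.5 against a total sum of squares of 5 give R^2 = 0.8.
    print(index_of_determination([1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 2.5, 4.5]))
```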