Lecture Notes TS Econometrics
Short-Term Movements
For the AR(1) model $Y_t = \beta Y_{t-1} + e_t$, substituting the model into the OLS estimator gives:
$$\hat{\beta}_{OLS} = \frac{\sum(\beta Y_{t-1}^2 + e_t Y_{t-1})}{\sum Y_{t-1}^2}$$
Problems with OLS in Models with LDVs
$$\hat{\beta}_{OLS} = \frac{\sum \beta Y_{t-1}^2}{\sum Y_{t-1}^2} + \frac{\sum e_t Y_{t-1}}{\sum Y_{t-1}^2}$$
Since $\beta$ is a constant we can pull it out of the summation and simplify:
$$\hat{\beta}_{OLS} = \beta \cdot \frac{\sum Y_{t-1}^2}{\sum Y_{t-1}^2} + \frac{\sum e_t Y_{t-1}}{\sum Y_{t-1}^2}$$
$$\hat{\beta}_{OLS} = \beta + \frac{\sum e_t Y_{t-1}}{\sum Y_{t-1}^2}$$
Problems with OLS in Models with LDVs
Thus, we can see that the OLS estimate $\hat{\beta}_{OLS}$ is equal to the true value of $\beta$ plus a bias term.
Fortunately, this bias shrinks in larger samples (that is, the estimate is said to be “consistent”).
If the errors are autocorrelated, then the problem is worse: OLS estimates are biased and inconsistent.
That is, the problem of bias doesn’t go away even in infinitely large samples.
Simulated data in the first figure shows that OLS estimates on LDVs are biased in
small samples, but that this bias diminishes as the sample size increases.
The second figure shows that OLS’s bias does not diminish in the case where the
errors are autocorrelated.
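The small-sample result is easy to check directly. Below is a minimal simulation sketch in Python/NumPy; the true coefficient $\beta = 0.5$ and the sample sizes are assumed, illustrative values rather than the ones behind the figures. The average OLS estimate sits below the truth at T = 25 and converges to it as T grows.

```python
# Minimal sketch: OLS bias on a lagged dependent variable shrinks with T.
# beta = 0.5 and the sample sizes are assumed, illustrative values.
import numpy as np

rng = np.random.default_rng(42)
beta = 0.5  # true AR(1) coefficient (assumed for illustration)

for T in (25, 100, 1000):
    estimates = []
    for _ in range(2000):
        e = rng.standard_normal(T)
        Y = np.zeros(T)
        for t in range(1, T):
            Y[t] = beta * Y[t - 1] + e[t]
        # OLS slope without intercept: sum(Y_t * Y_{t-1}) / sum(Y_{t-1}^2)
        estimates.append((Y[1:] @ Y[:-1]) / (Y[:-1] @ Y[:-1]))
    print(T, round(np.mean(estimates), 3))  # mean estimate approaches 0.5
```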
AR(p) Model
The idea of an autoregressive model can be extended to include lags
reaching farther back than one period.
In general, a process is said to be AR(p) if:
$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + e_t$$
Remember, this is still based on the assumption that the process is stationary.
Macroeconomic theory in most cases does not tell you how many lags to include in a model; the problem is mainly an econometric one.
An AR model with more lags accommodates richer dynamics and makes the residuals closer to white noise.
There are a few instances where economic theory does suggest the number of lags to be included in a model.
AR(p) Model
Paul Samuelson’s multiplier-accelerator model is a typical example.
It begins with GDP for a two-sector economy:
$$Y_t = C_t + I_t$$
$$C_t = \beta_0 + \beta_1 Y_{t-1}$$
Investment is modelled as a function of growth in consumption:
$$I_t = \beta_2 (C_t - C_{t-1}) + e_t$$
These three equations imply an AR(2) model:
$$Y_t = \beta_0 + \beta_1 (1 + \beta_2) Y_{t-1} - \beta_1 \beta_2 Y_{t-2} + e_t$$
or
$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + e_t$$
AR(p) Model
with the $\alpha$’s defined accordingly. Samuelson’s model accommodates different kinds of dynamics (dampening, oscillating, etc.) depending on the estimated parameters.
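As a rough illustration, the sketch below simulates the implied AR(2) under one assumed parameterization ($\beta_0 = 10$, $\beta_1 = 0.8$, $\beta_2 = 0.9$, values chosen only for demonstration); this particular choice produces damped oscillations, while other choices give monotone or explosive paths.

```python
# Sketch of Samuelson's multiplier-accelerator dynamics with assumed,
# illustrative parameter values (not taken from the notes).
import numpy as np

beta0, beta1, beta2 = 10.0, 0.8, 0.9
alpha1 = beta1 * (1 + beta2)   # AR(2) coefficient on Y_{t-1}
alpha2 = -beta1 * beta2        # AR(2) coefficient on Y_{t-2}

rng = np.random.default_rng(0)
T = 60
Y = np.full(T, beta0 / (1 - beta1))  # start near the steady state
for t in range(2, T):
    Y[t] = beta0 + alpha1 * Y[t - 1] + alpha2 * Y[t - 2] + rng.standard_normal()
print(np.round(Y[:10], 2))  # damped oscillation for these parameters
```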
MA(1) Process
Since data generating processes are presumed to be purely random in nature, $Y_t$ could be modelled as a weighted average of all previous errors.
Thus, the simplest form of an MA model is:
$$Y_t = e_t$$
$$e_t = u_t + \beta u_{t-1}$$
$$u_t \sim iid\; N(\mu, \sigma_u^2)$$
This can be condensed to:
$$Y_t = u_t + \beta u_{t-1}$$
MA(q) Models
Moving Average models can be functions of lags deeper than 1.
The general form of the Moving Average model with lags of one through q is:
$$Y_t = u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \dots + \beta_q u_{t-q}$$
Non-Zero AR Processes
What is the mean of an AR(1) process with a constant, $Y_t = \beta_0 + \beta_1 Y_{t-1} + e_t$? Taking expectations:
$$E[Y_t] = \beta_0 + E[\beta_1 Y_{t-1}] + E[e_t]$$
$$E[Y_t] = \beta_0 + \beta_1 E[Y_{t-1}]$$
Stationarity implies that $E[Y_t] = E[Y_{t-1}]$, so
$$E[Y_t] = \beta_0 + \beta_1 E[Y_t]$$
Grouping like terms:
$$E[Y_t] - \beta_1 E[Y_t] = \beta_0$$
$$E[Y_t][1 - \beta_1] = \beta_0$$
Non-Zero AR Processes
Making $E[Y_t]$ the subject:
$$E[Y_t] = \frac{\beta_0}{1 - \beta_1}, \quad NB\; \beta_1 \neq 1$$
If the process were AR(p), then the expectation generalizes to:
$$E[Y_t] = \frac{\beta_0}{1 - \beta_1 - \beta_2 - \dots - \beta_p}$$
What is the variance of a non-zero AR(1) process?
$$Var(Y_t) = Var(\beta_0 + \beta_1 Y_{t-1} + e_t) = \beta_1^2 Var(Y_{t-1}) + \sigma_e^2$$
Since stationarity implies $Var(Y_t) = Var(Y_{t-1})$, solving gives:
$$Var(Y_t) = \frac{\sigma_e^2}{1 - \beta_1^2}$$
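A quick numerical check, sketched with assumed parameter values: simulate a long AR(1) with a constant and compare the sample moments with $\beta_0/(1 - \beta_1)$ and $\sigma_e^2/(1 - \beta_1^2)$.

```python
# Sketch: verify the AR(1) mean and variance formulas by simulation.
# beta0, beta1 and sigma_e are assumed, illustrative values.
import numpy as np

beta0, beta1, sigma_e = 2.0, 0.6, 1.0
rng = np.random.default_rng(1)
T = 200_000
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = beta0 + beta1 * Y[t - 1] + sigma_e * rng.standard_normal()

print(Y.mean(), beta0 / (1 - beta1))         # both close to 5.0
print(Y.var(), sigma_e**2 / (1 - beta1**2))  # both close to 1.5625
```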
Non-Zero MA Processes
Now, let’s consider the following MA(1) process with an intercept:
$$Y_t = \alpha + u_t + \beta u_{t-1}, \quad u_t \sim N(0, \sigma_u^2)$$
The constant $\alpha$ allows the mean of the process to be non-zero.
What are the features of this type of MA(1) model? What is the mean of such a process?
$$E[Y_t] = E[\alpha + u_t + \beta u_{t-1}]$$
$$E[Y_t] = \alpha + E[u_t] + \beta E[u_{t-1}]$$
$$E[Y_t] = \alpha + 0 + 0$$
$$E[Y_t] = \alpha$$
The rather straightforward result is that the mean of an MA(1) process is equal to the intercept.
Non-Zero MA Processes
This generalises to any MA(q):
$$E[Y_t] = E[\alpha + u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \dots + \beta_q u_{t-q}]$$
$$E[Y_t] = \alpha + E[u_t] + \beta_1 E[u_{t-1}] + \beta_2 E[u_{t-2}] + \dots + \beta_q E[u_{t-q}]$$
$$E[Y_t] = \alpha + 0 + 0 + \dots + 0$$
$$E[Y_t] = \alpha$$
Non-Zero MA Processes
What is the variance of a non-zero MA(1) process?
$$Var(Y_t) = Var(\alpha + u_t + \beta u_{t-1})$$
$$Var(Y_t) = Var(u_t) + \beta^2 Var(u_{t-1})$$
$$Var(Y_t) = \sigma_u^2 (1 + \beta^2)$$
We moved from the first to the second line because, since the $u_t$ are white noise at all $t$, there is no covariance between $u_t$ and $u_{t-1}$. We moved to the third line because $\alpha$ and $\beta$ are not random variables.
Notice that the variance does not depend on the added constant ($\alpha$). That is, adding a constant affects the mean of an MA process, but does not affect its variance.
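The same kind of check works here (again with assumed values): the sample mean of the simulated process should be near $\alpha$ and the sample variance near $\sigma_u^2(1 + \beta^2)$, whatever $\alpha$ is.

```python
# Sketch: verify the MA(1) mean and variance formulas by simulation.
# alpha, beta and sigma_u are assumed, illustrative values.
import numpy as np

alpha, beta, sigma_u = 3.0, 0.5, 2.0
rng = np.random.default_rng(2)
u = sigma_u * rng.standard_normal(200_001)
Y = alpha + u[1:] + beta * u[:-1]

print(Y.mean(), alpha)                      # ~3.0
print(Y.var(), sigma_u**2 * (1 + beta**2))  # ~5.0
```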
Dealing with Non-zero Means
If we subtract the mean ($\bar{Y}$) from $Y_t$:
$$\tilde{Y}_t = Y_t - \bar{Y}$$
the resulting variable $\tilde{Y}_t$ will have a mean of zero:
$$E[\tilde{Y}_t] = E[Y_t - \bar{Y}] = E[Y_t] - E[\bar{Y}] = \bar{Y} - \bar{Y} = 0$$
But the same variance:
$$Var(\tilde{Y}_t) = Var(Y_t - \bar{Y}) = Var(Y_t) - 0 = Var(Y_t)$$
Subtracting a constant shifts our variable (changes its mean) but does not affect the dynamics nor the variance of the process.
Dealing with Non-zero Means
How do we deal with an AR process that doesn’t have a mean of zero?
We could directly estimate a model with an intercept.
Alternatively, we could de-mean the data.
Then we can estimate an AR process in the de-meaned variables without an intercept.
Suppose we have a random variable $Y_t$ with a non-zero mean $\bar{Y}$.
Since the mean carries no subscript $t$, it is time invariant.
We can prove that de-meaning a TS variable converts it from a non-zero-mean AR(1) process to a zero-mean AR(1) process.
Dealing with Non-zero Means
Given a non-zero AR(1) process:
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + e_t$$
Remember that the mean of $Y$ is $\bar{Y} = \frac{\beta_0}{1 - \beta_1}$.
If we subtract the mean, $\tilde{Y}_t = Y_t - \bar{Y}$, the resulting variable $\tilde{Y}_t$ will have a zero mean:
$$E[\tilde{Y}_t] = E[Y_t - \bar{Y}] = E[Y_t] - E[\bar{Y}] = \bar{Y} - \bar{Y} = 0$$
But the variance would remain the same:
$$Var(\tilde{Y}_t) = Var(Y_t - \bar{Y}) = Var(Y_t) - 0 = Var(Y_t)$$
Subtracting a constant shifts our variable (changes its mean) but does not affect the dynamics nor the variance of the process.
Dealing with Non-zero Means
We can now prove that de-meaning the variable changes the model from an AR(1) with a constant to our more familiar zero-mean AR(1) process.
Given a non-zero AR(1) process:
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + e_t$$
By definition $Y_t = \tilde{Y}_t + \bar{Y}$, so substituting for the mean gives:
$$Y_t = \tilde{Y}_t + \frac{\beta_0}{1 - \beta_1}$$
Dealing with Non-zero Means
$$\tilde{Y}_t + \frac{\beta_0}{1 - \beta_1} = \beta_0 + \beta_1 \left(\tilde{Y}_{t-1} + \frac{\beta_0}{1 - \beta_1}\right) + e_t$$
$$\tilde{Y}_t = \beta_0 + \frac{\beta_1 \beta_0}{1 - \beta_1} - \frac{\beta_0}{1 - \beta_1} + \beta_1 \tilde{Y}_{t-1} + e_t$$
$$\tilde{Y}_t = \frac{\beta_0 (1 - \beta_1) + \beta_1 \beta_0 - \beta_0}{1 - \beta_1} + \beta_1 \tilde{Y}_{t-1} + e_t$$
$$\tilde{Y}_t = \frac{\beta_0 - \beta_0 \beta_1 + \beta_1 \beta_0 - \beta_0}{1 - \beta_1} + \beta_1 \tilde{Y}_{t-1} + e_t$$
$$\tilde{Y}_t = 0 + \beta_1 \tilde{Y}_{t-1} + e_t$$
$$\tilde{Y}_t = \beta_1 \tilde{Y}_{t-1} + e_t$$
Dealing with Non-zero Means
De-meaning the variables transforms the non-zero AR(1) process (i.e., one with a constant) to a zero-mean AR(1) process (i.e., one without a constant).
Thus, whenever you are looking at a zero-mean AR(p) process, just remember that the Y’s represent deviations of a variable $Y$ from its mean.
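A small sketch of this equivalence (with assumed parameters): fitting an AR(1) with a constant on $Y$, and an AR(1) without a constant on the de-meaned series, should return essentially the same AR coefficient.

```python
# Sketch: demeaning removes the constant but leaves the AR dynamics.
# beta0 = 1.0 and beta1 = 0.7 are assumed, illustrative values.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
T, beta0, beta1 = 5000, 1.0, 0.7
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = beta0 + beta1 * Y[t - 1] + rng.standard_normal()

with_const = ARIMA(Y, order=(1, 0, 0), trend="c").fit()
demeaned = ARIMA(Y - Y.mean(), order=(1, 0, 0), trend="n").fit()
print(with_const.params[1], demeaned.params[0])  # AR coefficients agree
```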
ARMA(𝒑, 𝒒) Models
Let's complicate things a little more by bringing together AR and MA processes.
That is, there is a more general class of process called ARMA(𝒑, 𝒒) models
that consist of (a) an autoregressive component with 𝑝 lags, and (b) a
moving average component with 𝑞 lags
An ARMA(p, q) model looks like:
$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + u_t + \gamma_1 u_{t-1} + \gamma_2 u_{t-2} + \dots + \gamma_q u_{t-q}$$
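A minimal sketch of simulating and estimating such a model in Python with statsmodels; the ARMA(1,1) coefficients 0.6 and 0.4 are assumed, illustrative values.

```python
# Sketch: simulate an ARMA(1,1) and recover its parameters.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(4)
ar = np.array([1, -0.6])  # lag-polynomial convention: 1 - 0.6L
ma = np.array([1, 0.4])   # 1 + 0.4L
y = ArmaProcess(ar, ma).generate_sample(nsample=5000)

res = ARIMA(y, order=(1, 0, 1), trend="n").fit()
print(res.params)  # AR coefficient near 0.6, MA coefficient near 0.4
```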
Model Selection in ARMA(p, q) Processes
Theoretical ACFs and PACFs
Introduction
To be able to tell whether a data generating process is an AR or an MA process, we need to rely on statistical properties associated with AR and MA models.
The classic Box and Jenkins (1976) procedure could be used to do this.
This procedure checks whether a time series mimics the properties of various theoretical models before estimation is actually carried out.
This involves comparing the estimated ACFs and PACFs from the data with the
theoretical ACF and PACFs implied by the various model types.
A more recent approach is to use various “information criteria” to aid in model
selection
ACFs and PACFs each come in two flavors: theoretical and empirical. The former
is implied by a model; the latter is a characteristic of the data.
Autocorrelation functions
It is not always easy to see from the plot of a time series whether it is stationary.
It is useful to consider some statistics related to a time series, such as the autocorrelation function (ACF).
Under weak or covariance stationarity, we can define the $k$-th order autocovariance $\gamma_k$ as:
$$\gamma_k = E[(Y_t - \mu)(Y_{t-k} - \mu)] = cov(Y_t, Y_{t-k}) = cov(Y_{t-k}, Y_t)$$
As the autocovariances are not independent of the units in which the variables are measured, it is common to standardize by defining autocorrelations $\rho_k$ as:
$$\rho_k = \frac{cov(Y_t, Y_{t-k})}{Var(Y_t)}$$
Note that $\rho_0 = 1$, while $-1 \leq \rho_k \leq 1$.
Autocorrelation functions
The autocorrelation function (ACF) describes the correlation between 𝑌𝑡 and
its lag 𝑌𝑡−𝑘 as a function of 𝑘.
The ACF plays a major role in modelling the dependencies among
observations, because it characterizes the process describing the evolution of
𝑌𝑡 over time
From the ACF we can infer the extent to which one value of the process is
correlated with previous values and thus the length and strength of the
memory of the process.
It indicates how long (and how strongly) a shock in the process (𝜀𝑡 ) affects
the values of 𝑌𝑡 .
Autocorrelation functions
For the AR(1) process:
$$Y_t = \beta Y_{t-1} + e_t$$
we have autocorrelation coefficients:
$$\rho_k = \beta^k$$
For the MA(1) process:
$$Y_t = \mu + \varepsilon_t + \alpha \varepsilon_{t-1}$$
we have:
$$\rho_1 = \frac{\alpha}{1 + \alpha^2}$$
and $\rho_k = 0$ for $k = 2, 3, 4, \dots$
Autocorrelation functions
Consequently,
a shock in an MA(1) process affects 𝑌𝑡 in two periods only,
while a shock in the AR(1) process affects all future observations with a
decreasing effect.
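These theoretical ACFs can be computed directly; the sketch below uses statsmodels' ArmaProcess with assumed coefficients ($\beta = 0.8$ for the AR(1), $\alpha = 0.5$ for the MA(1)). Note the lag-polynomial sign convention the library uses.

```python
# Sketch: theoretical ACFs of an AR(1) and an MA(1) (assumed coefficients).
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

ar1 = ArmaProcess(np.array([1, -0.8]), np.array([1]))  # Y_t = 0.8*Y_{t-1} + e_t
ma1 = ArmaProcess(np.array([1]), np.array([1, 0.5]))   # Y_t = e_t + 0.5*e_{t-1}

print(ar1.acf(5))  # 1, 0.8, 0.64, 0.512, ...: geometric decay, beta^k
print(ma1.acf(5))  # 1, then 0.5/(1 + 0.25) = 0.4, then zeros after lag 1
```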
Theoretical PACFs
Theoretical Partial ACFs are more difficult to derive, so we will only outline their
general properties.
Theoretical PACFs are similar to ACFs, except they remove the effects of other
lags.
That is, the PACF at lag 2 filters out the effect of autocorrelation from lag 1.
Likewise, the partial autocorrelation at lag 3 filters out the effect of autocorrelation
at lags 2 and 1.
A useful rule of thumb is that Theoretical PACFs are the mirrored opposites of
ACFs.
While the ACF of an 𝐴𝑅(𝑝) process dies down exponentially, the PACF has spikes
at lags 1 through 𝑝, and then zeros at lags greater than 𝑝.
The ACF of an MA(q) process has non-zero spikes up to lag q and zeros afterward, while the PACF dampens toward zero, often with a bit of oscillation.
Summary: Theoretical ACFs and PACFs
For AR(p) processes:
The ACF decays slowly.
The PACF shows spikes at lags 1 through p, with zeros afterward.
For MA(q) processes:
The ACF shows spikes at lags 1 through q, with zeros afterward.
The PACF decays slowly, often with oscillation.
For ARMA(p, q) processes:
The ACF decays slowly.
The PACF decays slowly.
Information Criteria [IC]
To select the appropriate lag length or model for an ARMA process, we sometimes ignore the ACFs and PACFs and rely on information criteria (IC).
An information criterion is a measure of the quality of a statistical model that considers:
How well the model fits the data.
The complexity of the model.
ICs are used to compare alternative models fitted to the same data set.
An IC embodies two main factors:
a term which is a function of the sum of squared errors (SSE);
a penalty for the loss of degrees of freedom from adding extra parameters.
Information Criteria [IC]
Adding a new variable will have two competing effects on the IC: the SSE will fall, but the value of the penalty term will increase.
The objective is to choose the number of parameters which minimises the value of the information criterion.
So, adding an extra term will reduce the value of the criterion only if the fall in the SSE is sufficient to more than outweigh the increased value of the penalty term.
Hence, all else being equal, a model with a lower IC is superior to a model with a higher value.
The IC imposes a harsher penalty for adding regressors that have no significant impact on your dependent variable.
Information Criteria [IC]
The two most commonly used ICs are the:
Akaike’s Information Criterion (AIC)
Schwarz’s Information Criterion (SIC) or Schwarz Bayesian Information Criterion (SBC/BIC)
Most regression output comes with the ICs, which you can use to compare with other models.
When using STATA you may have to use the command “estat ic”.
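In Python, a rough equivalent is to loop over candidate orders and compare the AIC/BIC values that statsmodels reports; the sketch below uses simulated data whose true order, ARMA(1,1), is an assumption of the example.

```python
# Sketch: IC-based order selection over a small grid of ARMA(p, q) models.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(10)
# Assumed true process: ARMA(1,1) with AR = 0.6, MA = 0.4.
y = ArmaProcess(np.array([1, -0.6]), np.array([1, 0.4])).generate_sample(2000)

for p in range(3):
    for q in range(3):
        res = ARIMA(y, order=(p, 0, q), trend="n").fit()
        print(f"ARMA({p},{q}): AIC={res.aic:.1f}  BIC={res.bic:.1f}")
# Pick the (p, q) pair with the smallest criterion value.
```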
Chapter Three
Stationarity and Invertibility
What Is Stationarity?
Several methods in time series analysis are only valid if the underlying TS
variable is stationary.
The more stationary something is, the more predictable it is.
A series is said to be stationary when its mean, variance and autocovariance
are time invariant
In cross-sectional analysis, the mean of a variable $Y$ is $E[Y] = \mu$.
In the case of a TS variable, the mean may end up being $E[Y_t] = \mu_t$.
This implies the observed mean is time dependent; hence, over time the observed mean could change.
This would also affect the variance and autocovariances.
What Is Stationarity?
A TS variable is therefore said to be stationary when:
$$E[Y_t] = \mu$$
$$Var(Y_t) = \sigma^2$$
$$Cov(Y_t, Y_{t+k}) = Cov(Y_{t+a}, Y_{t+k+a})$$
That is, the covariance between $Y_t$ and $Y_{t+k}$ does not depend upon which $t$ is; the time variable could be shifted forward or backward by $a$ periods and the same covariance relationship would hold.
What matters is the distance between the two observations.
For example, the covariance between $Y_{1990}$ and $Y_{1993}$ is the same as the covariance between $Y_{1995}$ and $Y_{1998}$ or between $Y_{2001}$ and $Y_{2004}$, i.e.,
$$Cov(Y_{1990}, Y_{1993}) = Cov(Y_{1995}, Y_{1998}) = Cov(Y_{2001}, Y_{2004}) = Cov(Y_t, Y_{t+3})$$
The Importance of Stationarity
Stationary processes are better understood than non-stationary ones, and we
know how to estimate them better.
The test statistics of certain non-stationary processes do not follow the usual
distributions.
Knowing how a process is nonstationary will allow us to make the necessary
corrections.
If we regress two completely unrelated integrated processes on each other,
then a problem called “spurious regression” can arise.
The Importance of Stationarity (spurious regression)
Spurious regression is a problem that arises when regression analysis indicates a strong relationship between two or more variables when, in fact, they are totally unrelated.
That is, they may be seemingly related either due to coincidence, or the presence of a certain unseen third factor (usually called a confounding variable).
That is, if both $Y_t$ and $X_t$ are nonstationary and we regress $X_t$ on $Y_t$, it is likely that we will obtain a result indicating a strong relationship between $Y_t$ and $X_t$, even though there is no real connection.
They both depend upon time, so they would seem to be affecting each other.
This problem will be examined in detail in the next chapter.
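A classic demonstration, sketched below: regress one simulated random walk on another, completely independent, one. The t-statistic and R² are typically large even though the series are unrelated by construction.

```python
# Sketch: spurious regression between two independent random walks.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 500
Y = np.cumsum(rng.standard_normal(T))  # independent random walks
X = np.cumsum(rng.standard_normal(T))

res = sm.OLS(Y, sm.add_constant(X)).fit()
print(res.tvalues[1], res.rsquared)  # often "significant" despite no relation
```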
Restrictions on AR Coefficients Which Ensure Stationarity
Not all AR processes are stationary.
Some grow without limit.
Some have variances which change over time.
Restrictions on AR(1) Coefficients
Consider an AR(1) process:
$$Y_t = \beta Y_{t-1} + e_t \quad (1)$$
It is easy to see that it will grow without bound if $\beta > 1$, and it will decrease without bound if $\beta < -1$.
The process will only settle down and have a constant expected value if $|\beta| < 1$.
This might be intuitively true, but we need a method that also works for higher-order AR processes.
First, we rewrite Eq. (1) in terms of the lag operator $L$.
Restrictions on AR(1) Coefficients
$$Y_t = \beta L Y_t + e_t$$
Grouping like terms:
$$Y_t - \beta L Y_t = e_t$$
$$Y_t (1 - \beta L) = e_t$$
The term in parentheses is referred to as a polynomial in the lag operator (the lag polynomial, or characteristic polynomial). It is usually denoted by:
$$\Phi(L) = 1 - \beta L$$
Stationarity is achieved if and only if the roots of the lag polynomial are greater than one in absolute value. Setting $L = z$, we solve for the values of $z$ that set the polynomial equal to zero:
$$1 - z\beta = 0$$
$$1 = z\beta$$
$$z^* = \frac{1}{\beta}$$
Thus, our lag polynomial has one root, and it is equal to $\frac{1}{\beta}$.
Restrictions on AR(1) Coefficients
The AR process is stationary if its roots are greater than 1 in magnitude:
$$|z^*| > 1$$
which is to say that
$$|\beta| < 1$$
The AR(1) process is stationary if the roots of its lag polynomial are greater than one (in absolute value); and this is assured if $\beta$ is less than one in absolute value.
What Are Unit Roots, and Why Are They Bad?
“Unit roots” refer to the roots of the lag polynomial.
In the AR(1) process, if there were a unit root, then:
$$z^* = \frac{1}{\beta} = 1$$
So $\beta = 1$, which means that the AR process is a random walk.
The problem with unit-root processes (that is, with processes that contain random walks) is that they look stationary in small samples.
But treating them as stationary leads to very misleading results.
Moreover, regressing one non-stationary process on another leads to many “false positives” where two variables seem related when they are not.
Unit roots represent a specific type of non-stationarity, which we will explore in the next chapter.
The Connection Between AR and MA Processes
• Under certain conditions, AR processes can be expressed as infinite
order MA processes. Same is true for MA processes.
• To go from AR to MA, the AR process must be stationary.
• To go from MA to AR, the MA process must be invertible
AR(1) to MA(∞)
There is an important link between AR and MA processes.
A stationary AR process can be expressed as an MA process, and vice versa.
Let's convert an AR(1) process into an MA(∞).
Consider the following AR(1) process:
$$Y_t = \beta Y_{t-1} + e_t \quad (1.1)$$
Since the $t$ subscript is arbitrary, we can write the equation above for earlier periods:
$$Y_{t-1} = \beta Y_{t-2} + e_{t-1} \quad (1.2)$$
$$Y_{t-2} = \beta Y_{t-3} + e_{t-2} \quad (1.3)$$
Substituting (1.3) into (1.2) and (1.2) into (1.1):
AR(1) to MA(∞)
$$Y_t = \beta(\beta(\beta Y_{t-3} + e_{t-2}) + e_{t-1}) + e_t$$
$$Y_t = \beta^3 Y_{t-3} + \beta^2 e_{t-2} + \beta e_{t-1} + e_t$$
Continuing the substitutions indefinitely yields:
$$Y_t = e_t + \beta e_{t-1} + \beta^2 e_{t-2} + \beta^3 e_{t-3} + \dots$$
Thus, the AR(1) process is an MA(∞) process:
$$Y_t = e_t + \gamma_1 e_{t-1} + \gamma_2 e_{t-2} + \gamma_3 e_{t-3} + \dots$$
where $\gamma_1 = \beta$, $\gamma_2 = \beta^2$, $\gamma_3 = \beta^3$, and so forth.
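This mapping can be verified with statsmodels' arma2ma, which returns the MA(∞) weights; for an assumed $\beta = 0.8$ they should equal $\beta^k$.

```python
# Sketch: the MA(infinity) weights of an AR(1) are beta^k.
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

beta = 0.8  # assumed, illustrative value
weights = arma2ma(np.array([1, -beta]), np.array([1]), lags=6)
print(weights)  # [1.0, 0.8, 0.64, 0.512, 0.4096, 0.32768]
```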
AR(1) to MA(∞)
Can an AR(1) process always be expressed in this way?
No. The reason an AR(1) process is not always an MA(∞) lies in our ability to continue the substitution indefinitely: the substitutions only converge if $|\beta| < 1$, so that the term $\beta^k Y_{t-k}$ vanishes as $k$ grows.
Invertibility: MA(1) to AR(∞)
Just as we converted an AR(1) process to an MA(∞) process, we can also convert an MA(1) process to an AR(∞) process, as long as the MA process is “invertible”.
Consider the MA(1) model:
$$Y_t = u_t + \gamma u_{t-1}$$
which can be rewritten as:
$$u_t = Y_t - \gamma u_{t-1}$$
This also implies that:
$$u_{t-1} = Y_{t-1} - \gamma u_{t-2}$$
$$u_{t-2} = Y_{t-2} - \gamma u_{t-3}$$
$$u_{t-3} = Y_{t-3} - \gamma u_{t-4}$$
Invertibility: MA(1) to AR(∞)
and so forth.
Substituting each of these equations into the one before it yields:
$$u_t = Y_t + \sum_{i=1}^{\infty} (-\gamma)^i Y_{t-i}$$
Invertibility: MA(1) to AR(∞)
Equivalently:
$$Y_t = u_t - \sum_{i=1}^{\infty} (-\gamma)^i Y_{t-i}$$
which is an AR(∞) process, with $\beta_1 = \gamma$, $\beta_2 = -\gamma^2$, $\beta_3 = \gamma^3$, $\beta_4 = -\gamma^4$, and so on.
The condition required for us to continue the substitutions above indefinitely is analogous to what was required for stationarity when dealing with AR processes: the MA(1) process is invertible only if $|\gamma| < 1$.
Non-stationarity and ARIMA(p, d, q) Processes
Since the beginning of the semester we have focused on TS whose means did not exhibit long-run growth.
It is time to drop this assumption.
After all, many economic and financial time series do not have a constant mean.
Examples include Ghana’s GDP per capita, the CPI, and the GSE Composite Index.
Non-stationary ARIMA models include the “random walk” and the “random walk with drift”.
Differencing
Differencing is to time series what differentiation is to calculus.
If we have a TS whose mean is increasing, we can apply the difference
operator enough times to render the series stationary.
If a series needs to be differenced once in order to make it stationary, we say
that the series is “integrated of order one” or “𝐼(1).”
A series that needs to be differenced twice is “integrated of order two” and is
“𝐼(2).”
In general, if a series needs to be differenced 𝑑 times, it is said to be “integrated
of order d” and is “𝐼(𝑑).”
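A short sketch of the mechanics with NumPy: an I(1) series becomes stationary after one difference, an I(2) series after two (the simulated series are assumptions of the example).

```python
# Sketch: the difference operator applied to I(1) and I(2) series.
import numpy as np

rng = np.random.default_rng(6)
e = rng.standard_normal(500)
y1 = np.cumsum(e)    # I(1): a random walk
y2 = np.cumsum(y1)   # I(2): a cumulated random walk

d1 = np.diff(y2)        # first difference of the I(2) series -> I(1)
d2 = np.diff(y2, n=2)   # second difference -> I(0)
print(np.allclose(d2, e[2:]))  # True: differencing twice recovers the shocks
```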
Mean and Variance of the Random Walk (Without Drift)
Given a random walk process, i.e. an AR(1) with the coefficient on the lagged dependent variable equal to one:
$$Y_t = Y_{t-1} + e_t \quad (1)$$
Applying the back-shift (lag) operator to the lagged term in Eq. (1) yields:
$$Y_t = Y_{t-2} + e_{t-1} + e_t$$
Continuing such substitution back to period $t = 0$ allows us to write the random walk as:
$$Y_1 = Y_0 + e_1$$
$$Y_2 = Y_1 + e_2 = Y_0 + e_1 + e_2$$
$$Y_3 = Y_2 + e_3 = Y_0 + e_1 + e_2 + e_3$$
Thus,
$$Y_t = Y_0 + \sum_{i=1}^{t} e_i$$
Taking expectations on both sides of the equation:
$$E[Y_t] = E\left[Y_0 + \sum_{i=1}^{t} e_i\right] = Y_0$$
Mean and Variance of the Random Walk (Without Drift)
The random walk model tends to be very unpredictable, so our best guess in period 0 of what $Y$ will be in period $t$ is just $Y$’s value right now, at period zero.
Taking the variance of a random walk process:
$$Var(Y_t) = Var(Y_0 + e_1 + e_2 + \dots + e_t)$$
Since each of the error terms is drawn independently of the others, there is no covariance between them.
Hence, we can push the variance calculation through the additive terms:
$$Var(Y_t) = Var(Y_0) + Var(e_1) + Var(e_2) + \dots + Var(e_t)$$
$$Var(Y_t) = 0 + \sigma^2 + \sigma^2 + \dots + \sigma^2$$
$$Var(Y_t) = t\sigma^2$$
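A simulation sketch of this result: the cross-sectional variance of many simulated random walks (with $\sigma = 1$, an assumed value) grows roughly linearly in $t$.

```python
# Sketch: Var(Y_t) = t * sigma^2 across many simulated random walks.
import numpy as np

rng = np.random.default_rng(7)
paths = np.cumsum(rng.standard_normal((10_000, 100)), axis=1)  # 10k walks
for t in (10, 50, 100):
    print(t, round(paths[:, t - 1].var(), 1))  # roughly 10, 50, 100
```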
Mean and Variance of the Random Walk (Without Drift)
Since the variance of $Y_t$ is a function of $t$, the process is not variance stationary.
Thus, a clear feature of the RWM is the persistence of random shocks, which never die out. Hence random walk processes are said to have infinite memory.
However, taking the first difference makes it a stationary process, i.e.:
$$\Delta Y_t = Y_t - Y_{t-1} = (Y_{t-1} + e_t) - Y_{t-1}$$
$$\Delta Y_t = e_t$$
Though $Y_t$ may be nonstationary, its first difference is stationary.
The Random Walk with Drift
A random walk with drift adds a constant to the random walk:
$$Y_t = \beta_0 + Y_{t-1} + e_t$$
This process can be expressed in slightly different terms, which we will find useful. Given an initial value of $Y_0$, which we arbitrarily set to zero, then:
$$Y_0 = 0$$
$$Y_1 = \beta_0 + Y_0 + e_1 = \beta_0 + e_1$$
$$Y_2 = \beta_0 + Y_1 + e_2 = \beta_0 + \beta_0 + e_1 + e_2 = 2\beta_0 + e_1 + e_2$$
$$Y_3 = \beta_0 + Y_2 + e_3 = \beta_0 + 2\beta_0 + e_1 + e_2 + e_3 = 3\beta_0 + e_1 + e_2 + e_3$$
$$Y_t = t\beta_0 + \sum_{i=1}^{t} e_i$$
The Mean and Variance of the Random Walk with Drift
Taking the expectation of the last equation above:
$$E[Y_t] = t\beta_0$$
The variance of a random walk with drift is likewise:
$$Var(Y_t) = Var\left(t\beta_0 + \sum_{i=1}^{t} e_i\right) = t\sigma^2$$
Differencing renders the process stationary:
$$\Delta Y_t = \beta_0 + e_t$$
Let $Z_t = \Delta Y_t$ and we see that:
$$Z_t = \beta_0 + e_t$$
The variable $Z_t$ is just white noise with a mean of $\beta_0$.
Deterministic Trend
A deterministic trend is another example of a non-stationary series.
Consider:
$$Y_t = \beta_0 + \beta_1 t + e_t$$
where $t$ denotes the time elapsed and the $\beta$s are parameters; the only random component in the model is $e_t$, the IID errors.
The mean is given as:
$$E[Y_t] = E[\beta_0 + \beta_1 t + e_t] = \beta_0 + \beta_1 t$$
Taking the first difference:
$$\Delta Y_t = (\beta_0 + \beta_1 t + e_t) - (\beta_0 + \beta_1 (t-1) + e_{t-1})$$
$$\Delta Y_t = \beta_1 + e_t - e_{t-1}$$
Since this first-differenced series does not depend upon time, the mean and variance of the first-differenced series also do not depend upon time:
$$E[\Delta Y_t] = E[\beta_1 + e_t - e_{t-1}] = \beta_1$$
Random Walk with Drift vs Deterministic Trend
The random walk with drift model is:
$$Y_t = \beta_0 + Y_{t-1} + e_t = t\beta_0 + \sum_{i=1}^{t} e_i$$
with mean and variance of:
$$E[Y_t] = t\beta_0$$
$$Var(Y_t) = t\sigma_e^2$$
In order to make it stationary, it needs to be differenced. The deterministic trend model is:
$$Y_t = \beta_0 + \beta_1 t + e_t$$
with mean and variance of:
$$E[Y_t] = \beta_0 + \beta_1 t$$
$$Var(Y_t) = \sigma_e^2$$
Both models have means which increase linearly over time.
This makes it very difficult to visually identify which process generated the
data.
The variance of the random walk with drift, however, grows over time, while
the variance of the deterministic trend model does not.
Unit Root Tests
Consider an AR(1) process with drift:
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + e_t, \quad e_t \sim iid(0, \sigma^2)$$
The simple approach to unit root testing is to estimate the above equation using OLS and examine the estimated $\hat{\beta}_1$.
Use a t-test with null $H_0: \beta_1 = 1$ (non-stationary)
against the alternative $H_A: \beta_1 < 1$ (stationary).
The test statistic is obtained as:
$$t = \frac{\hat{\beta}_1 - 1}{se(\hat{\beta}_1)}$$
Reject the null hypothesis when the test statistic is a large negative number.
Unit Root Tests
The problems with the simple t-test approach are that:
Lagged dependent variables ⇒ $\hat{\beta}_1$ is biased downwards in small samples (i.e., dynamic bias).
When $\beta_1 = 1$, we have a non-stationary process and standard regression analysis is invalid (i.e., the test statistic has a non-standard distribution).
Dickey-Fuller (DF) Approach to Unit Root Testing
$$Y_t = \rho Y_{t-1} + e_t$$
$$\Delta Y_t = \rho Y_{t-1} - Y_{t-1} + e_t$$
$$\Delta Y_t = (\rho - 1) Y_{t-1} + e_t$$
$$\Delta Y_t = \delta Y_{t-1} + e_t$$
With the DF test we test:
$$H_0: \delta = 0$$
$$H_A: \delta \neq 0$$
Augmented Dickey Fuller Test (ADF)
The main criticism against the DF test is that its power is low if the process is stationary but with a root close to the non-stationary boundary.
E.g., the DF test is poor at distinguishing between $\rho = 1$ and $\rho = 0.95$.
Moreover, the DF test assumes that the error term is white noise.
To help solve this problem, the ADF test adds lagged differences of the dependent variable to the test regression, and so accommodates higher lag orders.
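In practice the ADF test is a one-liner; the sketch below uses statsmodels' adfuller on two simulated series (a random walk and a stationary AR(1) with $\beta = 0.5$, both assumed examples). The regression argument chooses among no constant ('n'), drift ('c'), and drift plus trend ('ct') specifications.

```python
# Sketch: ADF test on a unit-root series and on a stationary AR(1).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(8)
rw = np.cumsum(rng.standard_normal(400))  # unit-root series
ar = np.zeros(400)
for t in range(1, 400):
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()

for name, series in [("random walk", rw), ("AR(1), beta=0.5", ar)]:
    stat, pval, *_ = adfuller(series, regression="c", autolag="AIC")
    print(f"{name}: ADF stat = {stat:.2f}, p-value = {pval:.3f}")
# The random walk fails to reject H0; the stationary AR(1) rejects it.
```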
Testing Unit Roots - Phillips Perron Test (PP)
Unlike the ADF test, which includes more lags to correct for the problem of serial correlation in the error term, the PP test employs a nonparametric statistical method to take care of the serial correlation in the error term without adding lagged difference terms.
However, the PP and ADF tests have the same asymptotic distribution.
Critique of ADF and PP
The major issues associated with the ADF and PP tests have to do with the size and power of these tests.
By the size of the test we are referring to the level of significance of the test, that is, the probability of committing a Type I error.
By the power of a test we are referring to the probability of rejecting a false null hypothesis. This is calculated as one minus the probability of committing a Type II error.
Critique of ADF and PP
SIZE OF THE TEST
Both the ADF and PP tests are very sensitive to the way they are conducted. This is because we could have a:
▪ Pure random walk;
▪ Random walk with drift; and
▪ Random walk with drift and trend.
This implies that if you don’t have an idea about the true model of the series, you may end up with wrong conclusions.
E.g., if the true model is a pure random walk but we estimate a random walk with drift, we may conclude at a 5% significance level that the series is stationary, when the actual level of significance may be higher.
Critique of ADF and PP
POWER OF THE TEST
The ADF and PP tests tend to accept the null of a unit root more frequently than is warranted.
That is, they have low power: they find unit roots even when they do not exist.
Reasons for this situation include the following:
❖ Power depends on the time span of the data more than the mere size of the sample.
❖ Thus, the power is greater when the time span is large.
❖ A unit root test based on 30 observations over 30 years may have greater power than one based on, say, 90 observations over a span of 90 days.
Critique of ADF and PP
POWER OF TEST (2)
If θ ≈ 1 but not exactly 1, the tests may declare such a series non-stationary.
These tests also assume a single unit root, i.e., they assume that a given time series is I(1). But if a series is integrated of a higher order, there would be more than one unit root.
In addition, if there are structural breaks in a particular series, the unit root tests may be unable to detect them.
Testing Unit Roots - Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) Test
The KPSS test differs from the other unit root tests described, because it has a null hypothesis that the series is (trend) stationary.
Rejection of the null hypothesis is an indication of a unit root:
$$H_0: y_t \sim I(0) \text{ (stationarity)}$$
against
$$H_1: y_t \sim I(1) \text{ (a unit root)}$$
Therefore, if the results of the tests above indicate a unit root but the result of the KPSS test indicates a stationary process, one should be cautious and opt for the latter result.
Very powerful test, but it has problems with structural breaks (say, volatility
shifts).
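A sketch of the KPSS test with statsmodels (the simulated series are assumed examples); note the reversed null relative to the ADF test, and that regression='ct' would test trend stationarity rather than level stationarity.

```python
# Sketch: KPSS test; the null here is stationarity, not a unit root.
import numpy as np
from statsmodels.tsa.stattools import kpss

rng = np.random.default_rng(9)
stationary = rng.standard_normal(400)
random_walk = np.cumsum(rng.standard_normal(400))

for name, series in [("stationary", stationary), ("random walk", random_walk)]:
    stat, pval, _, _ = kpss(series, regression="c", nlags="auto")
    print(f"{name}: KPSS stat = {stat:.3f}, p-value = {pval:.3f}")
# Small p-values reject the null of stationarity (evidence of a unit root).
```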