
Time Series Econometrics

William G. Cantah (PhD)


Data Science and Economic Policy
Centre for Data Archiving, Management, Analysis and Advocacy
(CDAMAA)
Course Outline
Introduction
- Understanding Time Series
- Components of Time Series
- What Makes Time Series Econometrics Unique
- Notations and Some Basic Properties of Time Series
- Statistical Review (Students' Reading Assignment)
- Specifying Time in Stata
ARMA(𝒑, 𝒒) Processes
- Introduction (Stationarity & Purely Random Processes)
- AR(1) Models
- AR(p) Models
- MA(1) Models
- MA(q) Models
- Non-Zero ARMA Processes
- ARMA(𝒑, 𝒒) Models
Model Selection in ARMA(𝒑, 𝒒) Processes
- Theoretical ACFs and PACFs
- Empirical ACFs and PACFs
- Putting it all together
- Information criteria
Course Outline
Stationarity and Invertibility
- What is stationarity
- The importance of stationarity
- Restrictions on AR and MA Processes
- What are Unit Roots and why are they bad?
Non-Stationarity and ARIMA(𝒑, 𝒅, 𝒒) Processes
- Differencing
- The random walk
- The random walk with drift
- Deterministic trend
- Random walk with drift vs deterministic trend
- Differencing and detrending appropriately
Unit Root Tests
- Dickey-Fuller tests
- ADF
- DF-GLS tests
- Phillips-Perron tests
- KPSS tests
- Nelson and Plosser
- Testing for Seasonal Unit Roots
ARCH & GARCH
- Conditional and Unconditional Moments
- ARCH Models
- GARCH Models
VAR (Basics)
- A simple VAR(1) and how to estimate it
- How many lags to include
- Expressing VARs in matrix form
- Stability
- Long-run levels: including a constant
- Expressing a VAR as a VMA process
- Impulse Response Functions
- Forecasting
- Granger Causality
Cointegration
- Cointegration and the Error Correction Mechanism
- Deriving the ECM
- Engle and Granger's residual-based tests of cointegration
- Basic introduction to the Bounds Testing approach
- Cointegration implies Granger Causality
Introduction

Understanding Time Series


Introduction
Planning is necessary for individuals, businesses, governments and
institutions.
Planning requires understanding of the past, and the present.
This implies you would need datasets collected over a period of time.
This is what gives rise to time series analysis.
A Time Series is a set of observations taken at specified times, usually at equal intervals.
It can also be seen as a sequence of numerical data in which each item is
associated with a particular instant in time.
It could also be seen as the sequential measurement of a given phenomenon
taken at regular time intervals.
Introduction Cont.
Examples: monthly inflation rate, daily closing prices of stock indices, weekly measures of money supply, annual economic growth rate, etc.
A time series variable could be either a stock series or a flow series.
Stock series are measures of activity at a point in time and can be thought of as stocktakes (e.g., labour force surveys take stock of whether a person was employed in the reference week).
Flow series are series which are measures of activity to date (e.g., Balance of Payments).
An analysis of a single sequence of data is called univariate time-series
analysis.
An analysis of several sets of data for the same sequence of time periods is
called multivariate time-series analysis.
Examples of Time Series in Ghana (figures omitted)
Uses of Time Series
The most important use of studying time series is that it helps us to predict
the future behaviour of the variable based on past experience
It is helpful for business planning as it helps in comparing the actual current
performance with the expected one
From time series, we get to study the past behaviour of the phenomenon or
the variable under consideration to inform future decisions
We can compare the changes in the values of different variables at different
times or places, etc.
Introduction
Components of Time Series
Components of Time Series Data
Because time series involves the observation of a phenomenon over a period
of time, time series variables tend to be affected by several factors over a
period.
Some of these factors stem from the fact that the variable has been observed over a long period of time and as such follows a particular pattern or trend.
TS variables could also be affected by specific occurrences within the year, which may create an annual pattern in the variable being observed.
Also, due to economic downturns and booms that may occur over a long time, TS variables could be associated with some ups and downs.
Some TS variables are also affected by unpredictable events such as earthquakes, flooding, pandemics, etc., which would affect the regular patterns in the TS variable.
Components of Time Series Data
(Diagram) Components of time series: long-term movements (the trend); short-term movements (seasonal variations and cyclical variations); and random or irregular movements.
Trend
The tendency for data to increase or
decrease over a period of time.
A trend is generally a long-term
phenomenon.
Trend results from long term effects of
socio-economic and political factors.
It may show the growth or decline in a
time series over a long period.
This is the type of tendency which
continues to persist for a very long
period
Seasonal Variations
These are the rhythmic forces which operate in a regular and periodic manner over a span of less than a year.
They have the same or almost the same pattern during a period of 12 months.
This variation will be present in a time series if the data are recorded hourly, daily, weekly, monthly, or quarterly.
These variations come into play either because of natural forces or man-made conventions.
The various seasons or climatic conditions play an important role in seasonal variations.
Cyclical Variations
The variations in a time series which repeat themselves over a span of more than one year are the cyclic variations.
This oscillatory movement has a
period of oscillation of more than a
year.
One complete period is a cycle.
This cyclic movement is sometimes
called the ‘Business Cycle’.
Random or Irregular Movements
There is another factor which causes
the variation in the variable under
study.
They are not regular variations and
are purely random or irregular.
These fluctuations are unforeseen,
uncontrollable, unpredictable, and
are erratic.
These forces include earthquakes, wars, floods, famines, and other disasters.
Summary of the Components of Time Series
Introduction
What Makes Time Series Econometrics Unique
Uniqueness of Time Series Econometrics
Uniqueness of Time Series Econometrics
Panel (a), which shows a cross-sectional data set, relies on the assumption that observations are independent.
If we take a sample of 10,000 people and quiz them about their employment status, we are likely to get a mixture of answers.
Even if we are in a particularly bad spell in the economy, one person's unemployment status is not likely to affect another person's unemployment status.
i.e., if Kofi is unemployed, it doesn't mean Yaw can't get employed.
But if we are focused on the unemployment rate, year after year, then this
year’s performance is likely influenced by last year’s economy.
The observations in time series are almost never independent. Usually, one
observation is correlated with the previous observation
Uniqueness of Time Series Econometrics
This creates a situation where TS variables tend to have a strong dependency on their past.
TS variables tend to have strong relationships between current and past values.
With this strong dependency, most TS variables tend to evolve slowly, which makes the past a useful guide for the future.
Furthermore, because a number of TS variables co-move in time, they also tend to have strong relationships with these other variables.
i.e., TS variables tend to have strong intra- and inter-variable correlations.
Unlike cross-sectional variables, where samples tend to be independent and normally distributed, TS variables, per their data generating processes, could be affected by irregular occurrences which could result in outliers in the data.
Uniqueness of Time Series Econometrics
These situations tend to make it difficult to use OLS for effective time series analysis:
OLS estimates are sensitive to outliers.
OLS attempts to minimize the sum of squares for errors; time series with a
trend will result in OLS placing greater weight on the first and last
observations.
OLS treats the regression relationship as deterministic, whereas time series
have many stochastic trends.
We can do better modeling dynamics than treating them as a nuisance.
Introduction
Notations and Some Basic Properties of Time
Series
Time Series and Their Features
Though time series are used in all aspects of economic analysis, their formal study requires special statistical concepts and techniques, without which erroneous inferences and conclusions may all too readily be drawn.
A time series on some variable 𝒙 would be denoted as 𝒙𝒕 , where the subscript
𝒕 represents time, with 𝒕 = 𝟏 being the first observation available on 𝒙 and
𝒕 = 𝑻 being the last.
The complete set of time series 𝒕 = 𝟏, 𝟐, … , 𝑻 will often be referred to as the
observation period.
The observations are typically measured at equally spaced intervals.
Time series can be used to calculate future forecasts &, hence, unknown
values of 𝒙𝒕 at say, times 𝑻 + 𝟏, 𝑻 + 𝟐, … 𝑻 + 𝒉, where 𝒉 is the forecast
horizon
Notations
Random variables will be denoted by capital letters
𝑿, 𝒀, 𝒁
Realisations of the random variable will take lower-case letters
𝒙, 𝒚, 𝒛
The value of 𝑿 at a particular time period 𝒕, will be denoted 𝑿𝒕 𝒐𝒓 𝒙𝒕
Unknown parameters will be denoted with Greek letters such as
𝜷, 𝜸, 𝝁 … 𝒆𝒕𝒄.
Estimates of these parameters will be denoted with a hat, e.g., 𝜷̂ is an estimate of 𝜷.
Sometimes we speak of a “Lag Operator” or “Lag Polynomial.”
Lag operator is used to capture the previous values of TS variable
E.g., 𝑿𝒕−𝟏 , 𝑿𝒕−𝟐 … 𝑿𝒕−𝒌
Notations
We also have the difference operator (𝚫) in TS in which the previous value of
a TS variable is subtracted from the current value.
i.e., 𝚫𝐗 𝐭 = 𝐗 𝐭 − 𝐗 𝐭−𝟏
The Lag and Differencing Operators
The lag operator L is defined for a time series {𝒀𝒕 } by
𝑳𝒀𝒕 = 𝒀𝒕−𝟏
The operator can be defined for linear combinations by
𝑳(𝜶𝒀𝒕 + 𝜷𝑿𝒕) = 𝜶𝒀𝒕−𝟏 + 𝜷𝑿𝒕−𝟏
In addition to being linear, the lag operator preserves inner products:
⟨𝑳𝒀𝒔, 𝑳𝒀𝒕⟩ = 𝑪𝒐𝒗(𝒀𝒔−𝟏, 𝒀𝒕−𝟏) = 𝑪𝒐𝒗(𝒀𝒔, 𝒀𝒕) = ⟨𝒀𝒔, 𝒀𝒕⟩
An operator of this type is called a unitary operator
The Lag and Differencing Operators
There is a natural calculus of operators. For example, we can define powers of 𝑳 naturally by:
𝑳²𝒀𝒕 = 𝑳(𝑳𝒀𝒕) = 𝑳𝒀𝒕−𝟏 = 𝒀𝒕−𝟐
𝑳³𝒀𝒕 = 𝑳(𝑳²𝒀𝒕) = 𝒀𝒕−𝟑
⋮
𝑳^𝒌𝒀𝒕 = 𝒀𝒕−𝒌
and linear combinations by
(𝜶𝑳^𝒌 + 𝜷𝑳^𝒍)𝒀𝒕 = 𝜶𝒀𝒕−𝒌 + 𝜷𝒀𝒕−𝒍
Other operators can be defined in terms of 𝑳.
The differencing operator is defined by
𝚫𝒀𝒕 = (𝟏 − 𝑳)𝒀𝒕 = 𝒀𝒕 − 𝒀𝒕−𝟏
The Lag and Differencing Operators
Differencing is of fundamental importance when dealing with models of
non-stationary TS. Again, we can define powers of this operator
𝚫²𝒀𝒕 = 𝚫(𝚫𝒀𝒕) = 𝚫(𝒀𝒕 − 𝒀𝒕−𝟏) = (𝒀𝒕 − 𝒀𝒕−𝟏) − (𝒀𝒕−𝟏 − 𝒀𝒕−𝟐) = 𝒀𝒕 − 𝟐𝒀𝒕−𝟏 + 𝒀𝒕−𝟐
ARMA(𝒑, 𝒒) Processes
Stationarity and Purely Random Processes
ARMA Model
ARMA is a combination of two different models. i.e., Autoregressive (AR)
and Moving Average (MA) models.
AR Model involves using the past observations of a TS variable to predict
future behaviour of the series.
AR model specifies that the output variable depends linearly on its own
previous values and on a stochastic term.
A Moving Average model uses past forecast errors in a regression
Thus, Both rely on previous data to help predict future outcomes.
AR and MA models are the building blocks of all Time Series Models
Stationarity and Non-Stationarity
In order to use AR and MA models the data must be “well behaved.”
Formally, the data need to be “stationary.”
Roughly speaking, a time series is stationary if its behaviour does not change over time.
This implies that the values always tend to vary about the same level and that their variability is constant over time.
That is, a TS is said to be stationary when its average value does not vary with time.
Stationarity plays a fundamental role in macroeconometrics.
However not all time series that we encounter are stationary.
Indeed, nonstationary series tend to be the rule rather than the exception.
Stationarity and Non-Stationarity
Suppose that you have a TS variable 𝒀𝒕 with realisations 𝒀𝟎, 𝒀𝟏, 𝒀𝟐, …
𝒀𝒕 is mean stationary if the expected value of 𝒀 is not time dependent.
That is, the expected value of 𝒀 should not be a function of time:
𝑬(𝒀𝒕) = 𝑬(𝒀𝟏) = 𝑬(𝒀𝟐) = ⋯ = 𝑬(𝒀) = 𝝁
Likewise, 𝒀 is said to be variance stationary if its variance is also not a function of time:
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝒀𝟏) = 𝑽𝒂𝒓(𝒀𝟐) = ⋯ = 𝑽𝒂𝒓(𝒀) = 𝝈²
Finally, 𝒀 is covariance stationary if the covariance of 𝒀 with its own lagged values depends only upon the length of the lag, but not on the specific time period or the direction of the lag:
𝑪𝒐𝒗(𝒀𝒕, 𝒀𝒕+𝟏) = 𝑪𝒐𝒗(𝒀𝒕−𝟏, 𝒀𝒕)
Stochastic Processes
Suppose you are the manager at a betting shop, and one of your jobs is to
track and predict the flow of cash into and from the shop.
How much cash will you have on hand on Tuesday of next week?
Suppose you have daily data extending back for the previous 1000 days.
Let 𝒀𝒕 denote the net flow of cash into the shop on day 𝑡.
Can we predict tomorrow’s cash flow 𝒀𝒕+𝟏 , given what happened today,
𝒀𝒕 , yesterday (𝒀𝒕−𝟏 ) and before?
Consider a model of the following form:
𝒀𝒕 = 𝒆𝒕
where 𝒆𝒕 ~ 𝒊𝒊𝒅 𝑵(𝟎, 𝟏)
Stochastic Processes
The equation above implies that 𝒀𝒕 is just pure random error.
Though this may not be a useful or even accurate model of a betting shop, it is a useful starting point to aid understanding of TS modelling.
Note that each day's cash flow is independent of the previous day's cash flow.
And the amount of money that flows into the shop is also offset by the outflow.
Thus, the average cash flow would be zero, i.e.,
𝑬(𝒀𝒕) = 𝑬(𝒆𝒕) = 𝟎
Since the mean of 𝒆𝒕 is zero for all 𝒕, the process is mean stationary, and its variance is 𝑽(𝒀𝒕) = 𝑽(𝒆𝒕) = 𝟏.
The AR(1) Model
A model which depends only on the previous outputs of the system is called
an autoregressive model.
Consider, now a different type of model:
𝒀𝒕 = 𝜷𝒀𝒕−𝟏 + 𝒆𝒕
We’ll look more closely at this simple random process.
It is the workhorse of timeseries econometrics, and we will make extensive
use of its properties throughout this course.
Let's use the ARexamples.dta to try out an estimation
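Since ARexamples.dta is course material, the sketch below instead simulates an AR(1) series (𝜷 = 0.7 is an arbitrary choice) and estimates it in Stata:

* Sketch: simulate an AR(1) process and estimate beta (stand-in for ARexamples.dta)
clear
set seed 12345
set obs 500
gen t = _n
tsset t
gen e = rnormal()
gen y = e in 1
replace y = 0.7*L.y + e in 2/l    // builds the recursion row by row
arima y, ar(1) noconstant          // ML estimate of the AR(1) model
regress y L.y, noconstant          // OLS on the lagged dependent variable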
Problems with OLS in Models with LDVs
OLS estimates are sensitive to outliers.
OLS attempts to minimize the sum of squares for errors; time series with a
trend will result in OLS placing greater weight on the first and last
observations.
OLS treats the regression relationship as deterministic, whereas time series
have many stochastic trends.
AR models explicitly have lagged dependent variables (LDVs)
This means that even if the errors are IID and serially uncorrelated, OLS
estimates of the parameters will be biased.
Problems with OLS in Models with LDVs
To see this, consider a simple AR(1) model:
𝒀𝒕 = 𝜷𝒀𝒕−𝟏 + 𝒆𝒕    (1)
where |𝜷| < 𝟏 and 𝒆𝒕 ~ 𝒊𝒊𝒅 𝑵(𝟎, 𝝈²).
We will see shortly that this restriction on 𝜷 implies that 𝑬(𝒀) = 𝒀̄ = 𝟎.
The variable 𝒀𝒕−𝟏 on the right-hand side is a lagged dependent variable.
Estimating OLS of 𝒀 on its lag produces a biased (but consistent) estimate of 𝜷.
Problems with OLS in Models with LDVs
To see this, recall from introductory econometrics that the OLS estimate of 𝜷 is
𝜷̂_OLS = 𝑪𝒐𝒗(𝒀𝒕, 𝒀𝒕−𝟏)/𝑽𝒂𝒓(𝒀𝒕−𝟏) = Σ𝒀𝒕𝒀𝒕−𝟏 / Σ𝒀²𝒕−𝟏    (2)
Substituting 𝒀𝒕 from (1) gives:
𝜷̂_OLS = Σ(𝜷𝒀𝒕−𝟏 + 𝒆𝒕)𝒀𝒕−𝟏 / Σ𝒀²𝒕−𝟏
𝜷̂_OLS = Σ(𝜷𝒀²𝒕−𝟏 + 𝒆𝒕𝒀𝒕−𝟏) / Σ𝒀²𝒕−𝟏
Problems with OLS in Models with LDVs
𝜷̂_OLS = Σ𝜷𝒀²𝒕−𝟏 / Σ𝒀²𝒕−𝟏 + Σ𝒆𝒕𝒀𝒕−𝟏 / Σ𝒀²𝒕−𝟏
Since 𝜷 is a constant we can pull it out of the summation and simplify:
𝜷̂_OLS = 𝜷·(Σ𝒀²𝒕−𝟏 / Σ𝒀²𝒕−𝟏) + Σ𝒆𝒕𝒀𝒕−𝟏 / Σ𝒀²𝒕−𝟏
𝜷̂_OLS = 𝜷 + Σ𝒆𝒕𝒀𝒕−𝟏 / Σ𝒀²𝒕−𝟏
Problems with OLS in Models with LDVs
Thus, we can see that the OLS estimate 𝜷̂_OLS is equal to the true value of 𝜷 plus some bias: although 𝒆𝒕 is uncorrelated with 𝒀𝒕−𝟏, the second term is not zero in expectation in finite samples, because the denominator contains squared 𝒀 values whose later terms depend on 𝒆𝒕.
Fortunately, this bias shrinks in larger samples (that is, the estimate is said to be "consistent").
If the errors are autocorrelated then the problem is worse.
OLS estimates are biased and inconsistent.
That is, the problem of bias doesn’t go away even in infinitely large samples
Simulated data in the first figure shows that OLS estimates on LDVs are biased in
small samples, but that this bias diminishes as the sample size increases.
The second figure shows that OLS’s bias does not diminish in the case where the
errors are autocorrelated.
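A sketch of the kind of Monte Carlo that produces such figures (the sample sizes, the true 𝜷 = 0.5, and 1,000 replications are all arbitrary choices):

* Sketch: Monte Carlo of OLS bias with a lagged dependent variable
capture program drop ar1mc
program define ar1mc, rclass
    syntax [, n(integer 50) beta(real 0.5)]
    drop _all
    set obs `n'
    gen t = _n
    tsset t
    gen e = rnormal()
    gen y = e in 1
    replace y = `beta'*L.y + e in 2/l
    regress y L.y, noconstant
    return scalar b = _b[L.y]
end

set seed 2024
foreach n in 25 50 100 400 {
    quietly simulate b = r(b), reps(1000) nodots: ar1mc, n(`n') beta(0.5)
    quietly summarize b
    display "T = `n'   mean OLS estimate = " %6.4f r(mean) "   (true beta = 0.5)"
}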
AR(p) Model
The idea of an autoregressive model can be extended to include lags
reaching farther back than one period.
In general, a process is said to be 𝐴𝑅(𝑝) if:
𝒀𝒕 = 𝜷𝟏 𝒀𝒕−𝟏 + 𝜷𝟐 𝒀𝒕−𝟐 + ⋯ + 𝜷𝒑 𝒀𝒕−𝒑 + 𝒆𝒕
Remember, this is still based on the assumption that the process is stationary.
Macroeconomic theory in most cases does not tell you how many lags to include in a model; the problem is mainly an econometric one.
AR models with more lags accommodate richer dynamics and make the residuals closer to white noise.
There are a few instances where economic theory suggests the number of lags to be included in a model.
AR(p) Model
Paul Samuelson’s multiplier-accelerator model is a typical example.
It begins with GDP for a two-sector economy
𝒀𝒕 = 𝑪𝒕 + 𝑰𝒕
Consumption is a function of last period's income:
𝑪𝒕 = 𝜷𝟎 + 𝜷𝟏𝒀𝒕−𝟏
Investment is modelled as a function of growth in consumption:
𝑰𝒕 = 𝜷𝟐(𝑪𝒕 − 𝑪𝒕−𝟏) + 𝒆𝒕
These three equations imply an AR(2) model:
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏(𝟏 + 𝜷𝟐)𝒀𝒕−𝟏 − 𝜷𝟏𝜷𝟐𝒀𝒕−𝟐 + 𝒆𝒕
or
𝒀𝒕 = 𝜶𝟎 + 𝜶𝟏𝒀𝒕−𝟏 + 𝜶𝟐𝒀𝒕−𝟐 + 𝒆𝒕
AR(p) Model
with the 𝜶's properly defined. Samuelson's model accommodates different kinds of dynamics (dampening, oscillating, etc.), depending on the estimated parameters.
MA (1) Process
Since data generating processes are presumed to be purely random in nature, 𝒀𝒕 could be modelled as a weighted average of current and previous errors.
Thus, the simplest form of an MA model is:
𝒀𝒕 = 𝒆𝒕
𝒆𝒕 = 𝒖𝒕 + 𝜷𝒖𝒕−𝟏
𝒖𝒕 ~ 𝒊𝒊𝒅 𝑵(𝟎, 𝝈²𝒖)
This can be condensed to:
𝒀𝒕 = 𝒖𝒕 + 𝜷𝒖𝒕−𝟏
MA(q) Models
Moving Average models can be functions of lags deeper than 1.
The general form of the Moving Average model with lags one through 𝒒 is:
𝒀𝒕 = 𝒖𝒕 + 𝜷𝟏𝒖𝒕−𝟏 + 𝜷𝟐𝒖𝒕−𝟐 + ⋯ + 𝜷𝒒𝒖𝒕−𝒒 = Σ𝜷𝒊𝒖𝒕−𝒊 (i = 0, …, q), with 𝜷𝟎 = 𝟏
Non-Zero AR Processes
While many processes have a zero mean, many more do not. E.g., GDP or
GNP don’t vary around zero.
Similarly, unemployment rate, discount rate, MPR all do not vary around zero.
The zero mean assumption we have used so far is for easy understanding of
key concepts.
Consider a stationary AR(1) process with an additional constant term.
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏 𝒀𝒕−𝟏 + 𝒆𝒕 (𝟏)
Non-Zero AR Processes
Taking the expectations of both sides:
𝑬(𝒀𝒕) = 𝑬(𝜷𝟎 + 𝜷𝟏𝒀𝒕−𝟏 + 𝒆𝒕)
= 𝜷𝟎 + 𝑬(𝜷𝟏𝒀𝒕−𝟏) + 𝑬(𝒆𝒕)
= 𝜷𝟎 + 𝜷𝟏𝑬(𝒀𝒕−𝟏)
Stationarity implies that 𝑬(𝒀𝒕) = 𝑬(𝒀𝒕−𝟏), so
𝑬(𝒀𝒕) = 𝜷𝟎 + 𝜷𝟏𝑬(𝒀𝒕)
Grouping like terms:
𝑬(𝒀𝒕) − 𝜷𝟏𝑬(𝒀𝒕) = 𝜷𝟎
𝑬(𝒀𝒕)[𝟏 − 𝜷𝟏] = 𝜷𝟎
Non-Zero AR Processes
Making 𝑬(𝒀𝒕) the subject:
𝑬(𝒀𝒕) = 𝜷𝟎/(𝟏 − 𝜷𝟏),  NB: 𝜷𝟏 ≠ 𝟏
If the process were AR(p), then the expectation generalizes to
𝑬(𝒀𝒕) = 𝜷𝟎/(𝟏 − 𝜷𝟏 − 𝜷𝟐 − ⋯ − 𝜷𝒑)
What is the variance of a non-zero AR(1) process?
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝜷𝟎 + 𝜷𝟏𝒀𝒕−𝟏 + 𝒆𝒕)
= 𝑽𝒂𝒓(𝜷𝟎) + 𝑽𝒂𝒓(𝜷𝟏𝒀𝒕−𝟏) + 𝑽𝒂𝒓(𝒆𝒕)
Non-Zero AR Processes
= 𝟎 + 𝜷²𝟏𝑽𝒂𝒓(𝒀𝒕−𝟏) + 𝝈²𝒆
Stationarity implies that 𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝒀𝒕−𝟏), so
𝑽𝒂𝒓(𝒀𝒕) = 𝜷²𝟏𝑽𝒂𝒓(𝒀𝒕) + 𝝈²𝒆
𝑽𝒂𝒓(𝒀𝒕) = 𝝈²𝒆/(𝟏 − 𝜷²𝟏)
Non-Zero MA Processes
Now, let's consider the following MA(1) process with an intercept:
𝒀𝒕 = 𝜶 + 𝒖𝒕 + 𝜷𝒖𝒕−𝟏,  𝒖𝒕 ~ 𝑵(𝟎, 𝝈²𝒖)
The constant 𝜶 allows the mean of the process to be non-zero.
What are the features of this type of MA(1) model? What is the mean of such a process?
𝑬(𝒀𝒕) = 𝑬(𝜶 + 𝒖𝒕 + 𝜷𝒖𝒕−𝟏)
𝑬(𝒀𝒕) = 𝜶 + 𝑬(𝒖𝒕) + 𝜷𝑬(𝒖𝒕−𝟏)
𝑬(𝒀𝒕) = 𝜶 + 𝟎 + 𝟎
𝑬(𝒀𝒕) = 𝜶
The rather straightforward result is that the mean of an MA(1) process is equal to the intercept.
Non-Zero MA Processes
This generalises to any MA(q):
𝑬(𝒀𝒕) = 𝑬(𝜶 + 𝒖𝒕 + 𝜷𝟏𝒖𝒕−𝟏 + 𝜷𝟐𝒖𝒕−𝟐 + ⋯ + 𝜷𝒒𝒖𝒕−𝒒)
𝑬(𝒀𝒕) = 𝜶 + 𝑬(𝒖𝒕) + 𝜷𝟏𝑬(𝒖𝒕−𝟏) + 𝜷𝟐𝑬(𝒖𝒕−𝟐) + ⋯ + 𝜷𝒒𝑬(𝒖𝒕−𝒒)
𝑬(𝒀𝒕) = 𝜶 + 𝟎 + 𝟎 + ⋯ + 𝟎
𝑬(𝒀𝒕) = 𝜶
Non-Zero MA Processes
What is the variance of a non-zero MA(1) process?
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝜶 + 𝒖𝒕 + 𝜷𝒖𝒕−𝟏)
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝜶) + 𝑽𝒂𝒓(𝒖𝒕) + 𝜷²𝑽𝒂𝒓(𝒖𝒕−𝟏)
𝑽𝒂𝒓(𝒀𝒕) = 𝝈²𝒖 + 𝜷²𝝈²𝒖
𝑽𝒂𝒓(𝒀𝒕) = 𝝈²𝒖(𝟏 + 𝜷²)
We moved from the first to the second line because, since the 𝒖𝒕 are white noise at all 𝒕, there is no covariance between 𝒖𝒕 and 𝒖𝒕−𝟏. We moved to the third line because 𝜶 and 𝜷 are not random variables, so 𝑽𝒂𝒓(𝜶) = 𝟎.
Notice that the variance does not depend on the added constant (𝜶). That is, adding a constant affects the mean of an MA process, but does not affect its variance.
Dealing with Non-zero Means
If we subtract the mean (𝒀̄) from 𝒀𝒕,
𝒀̃𝒕 = 𝒀𝒕 − 𝒀̄
the resulting variable 𝒀̃𝒕 will have a mean of zero:
𝑬(𝒀̃𝒕) = 𝑬(𝒀𝒕 − 𝒀̄) = 𝑬(𝒀𝒕) − 𝑬(𝒀̄) = 𝒀̄ − 𝒀̄ = 𝟎
But the same variance:
𝑽𝒂𝒓(𝒀̃𝒕) = 𝑽𝒂𝒓(𝒀𝒕 − 𝒀̄) = 𝑽𝒂𝒓(𝒀𝒕) − 𝟎 = 𝑽𝒂𝒓(𝒀𝒕)
Subtracting a constant shifts our variable (changes its mean) but does not affect the dynamics nor the variance of the process.
Dealing with Non-zero Means
How do we deal with an AR process that doesn't have a mean of zero?
We could directly estimate a model with an intercept.
Alternatively, we could de-mean the data.
Then we can estimate an AR process in the de-meaned variable without an intercept.
Assume we have a random variable 𝒀𝒕 with a non-zero mean 𝒀̄.
Since there is no subscript 𝒕 on the mean, the mean is time invariant.
We can show that de-meaning a TS variable transforms it from a non-zero AR(1) process into a zero-mean AR(1) process.
Dealing with Non-zero Means
Given a non-zero AR(1) process:
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏𝒀𝒕−𝟏 + 𝒆𝒕
Remember that the mean is 𝒀̄ = 𝜷𝟎/(𝟏 − 𝜷𝟏).
If we subtract the mean, 𝒀̃𝒕 = 𝒀𝒕 − 𝒀̄, the resulting variable 𝒀̃𝒕 has a zero mean:
𝑬(𝒀̃𝒕) = 𝑬(𝒀𝒕 − 𝒀̄) = 𝑬(𝒀𝒕) − 𝑬(𝒀̄) = 𝒀̄ − 𝒀̄ = 𝟎
But the variance remains the same:
𝑽𝒂𝒓(𝒀̃𝒕) = 𝑽𝒂𝒓(𝒀𝒕 − 𝒀̄) = 𝑽𝒂𝒓(𝒀𝒕) − 𝟎 = 𝑽𝒂𝒓(𝒀𝒕)
Subtracting a constant shifts our variable (changes its mean) but does not affect the dynamics nor the variance of the process.
Dealing with Non-zero Means
We can now prove that de-meaning the variable changes the model from an AR(1) with a constant to our more familiar zero-mean AR(1) process.
Given a non-zero AR(1) process:
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏𝒀𝒕−𝟏 + 𝒆𝒕
Since 𝒀̃𝒕 = 𝒀𝒕 − 𝒀̄, we can write 𝒀𝒕 = 𝒀̃𝒕 + 𝒀̄, i.e.,
𝒀𝒕 = 𝒀̃𝒕 + 𝜷𝟎/(𝟏 − 𝜷𝟏)
Dealing with Non-zero Means
Substituting this into the AR(1) equation:
𝒀̃𝒕 + 𝜷𝟎/(𝟏 − 𝜷𝟏) = 𝜷𝟎 + 𝜷𝟏[𝒀̃𝒕−𝟏 + 𝜷𝟎/(𝟏 − 𝜷𝟏)] + 𝒆𝒕
𝒀̃𝒕 = 𝜷𝟎 + 𝜷𝟏𝜷𝟎/(𝟏 − 𝜷𝟏) − 𝜷𝟎/(𝟏 − 𝜷𝟏) + 𝜷𝟏𝒀̃𝒕−𝟏 + 𝒆𝒕
𝒀̃𝒕 = [𝜷𝟎(𝟏 − 𝜷𝟏) + 𝜷𝟏𝜷𝟎 − 𝜷𝟎]/(𝟏 − 𝜷𝟏) + 𝜷𝟏𝒀̃𝒕−𝟏 + 𝒆𝒕
𝒀̃𝒕 = [𝜷𝟎 − 𝜷𝟎𝜷𝟏 + 𝜷𝟏𝜷𝟎 − 𝜷𝟎]/(𝟏 − 𝜷𝟏) + 𝜷𝟏𝒀̃𝒕−𝟏 + 𝒆𝒕
𝒀̃𝒕 = 𝟎 + 𝜷𝟏𝒀̃𝒕−𝟏 + 𝒆𝒕
𝒀̃𝒕 = 𝜷𝟏𝒀̃𝒕−𝟏 + 𝒆𝒕
Dealing with Non-zero Means
De-meaning the variables transforms the non-zero AR(1) process (i.e., one with a constant) to a zero-mean AR(1) process (i.e., one without a constant).
Thus, whenever you are looking at a zero-mean AR(p) process, just remember that the 𝒀's represent deviations of a variable from its mean.
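A brief sketch of this equivalence in Stata, assuming some tsset series y (for instance the simulated AR(1) series from the earlier sketch) is in memory:

* Sketch: AR(1) with a constant vs. a zero-mean AR(1) on de-meaned data
quietly summarize y
gen ytilde = y - r(mean)            // de-mean the series
arima y, ar(1)                      // intercept estimated directly
arima ytilde, ar(1) noconstant      // equivalent zero-mean formulation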
ARMA(𝒑, 𝒒) Models
Let's complicate things a little more, by bringing together AR and MA
processes.
That is, there is a more general class of process called ARMA(𝒑, 𝒒) models
that consist of (a) an autoregressive component with 𝑝 lags, and (b) a
moving average component with 𝑞 lags
An ARMA(𝒑, 𝒒) model looks like:
𝒀𝒕 = 𝜷𝟏 𝒀𝒕−𝟏 + 𝜷𝟐 𝒀𝒕−𝟐 + ⋯ + 𝜷𝒑 𝒀𝒕−𝒑 + 𝒖𝒕 + 𝜸𝟏 𝒖𝒕−𝟏 + 𝜸𝟐 𝒖𝒕−𝟐 + ⋯ + 𝜸𝒒 𝒖𝒕−𝒒
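In Stata, such a model can be fit with the arima command; a minimal sketch with illustrative lag orders, assuming a tsset series y is in memory:

* Sketch: fitting an ARMA(2,1) to a tsset series y (orders are illustrative)
arima y, arima(2,0,1)    // middle argument is d, the order of differencing
estat ic                 // AIC/BIC, used for model comparison later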
Model Selection in ARMA(𝒑,𝒒)
Processes
Theoretical ACFs and PACFs
Introduction
To be able to tell whether a data generating process is an AR or an MA process, we need to rely on statistical properties associated with AR and MA models.
The classic Box and Jenkins (1976) procedure could be used to do this.
This procedure is to check whether a time series mimics the properties of various
theoretical models before estimation is actually carried out
This involves comparing the estimated ACFs and PACFs from the data with the
theoretical ACF and PACFs implied by the various model types.
A more recent approach is to use various “information criteria” to aid in model
selection
ACFs and PACFs each come in two flavors: theoretical and empirical. The former
is implied by a model; the latter is a characteristic of the data.
Autocorrelation functions
It is not always easy to see from the plot of a time series whether it is stationary.
It is useful to consider some statistics related to a time series, such as the AUTOCORRELATION FUNCTION.
Under weak or covariance stationarity, we can define the 𝒌th-order autocovariance 𝜸𝒌 as:
𝜸𝒌 = 𝑬[(𝒀𝒕 − 𝝁)(𝒀𝒕−𝒌 − 𝝁)] = 𝑪𝒐𝒗(𝒀𝒕, 𝒀𝒕−𝒌)
As the autocovariances are not independent of the units in which the variables are measured, it is common to standardize by defining autocorrelations 𝝆𝒌 as
𝝆𝒌 = 𝑪𝒐𝒗(𝒀𝒕, 𝒀𝒕−𝒌)/𝑽𝒂𝒓(𝒀𝒕)
Note that 𝝆𝟎 = 𝟏, while −𝟏 ≤ 𝝆𝒌 ≤ 𝟏.
Autocorrelation functions
The autocorrelation function (ACF) describes the correlation between 𝑌𝑡 and
its lag 𝑌𝑡−𝑘 as a function of 𝑘.
The ACF plays a major role in modelling the dependencies among
observations, because it characterizes the process describing the evolution of
𝑌𝑡 over time
From the ACF we can infer the extent to which one value of the process is
correlated with previous values and thus the length and strength of the
memory of the process.
It indicates how long (and how strongly) a shock in the process (𝜀𝑡 ) affects
the values of 𝑌𝑡 .
Autocorrelation functions
For the AR(1) process:
𝒀𝒕 = 𝜷𝒀𝒕−𝟏 + 𝒆𝒕
we have autocorrelation coefficients
𝝆𝒌 = 𝜷^𝒌
For the MA(1) process:
𝒀𝒕 = 𝝁 + 𝜺𝒕 + 𝜶𝜺𝒕−𝟏
we have
𝝆𝟏 = 𝜶/(𝟏 + 𝜶²)
and 𝝆𝒌 = 𝟎 for 𝒌 = 𝟐, 𝟑, 𝟒, …
Autocorrelation functions
Consequently,
a shock in an MA(1) process affects 𝑌𝑡 in two periods only,
while a shock in the AR(1) process affects all future observations with a
decreasing effect.
Theoretical PACFs
Theoretical Partial ACFs are more difficult to derive, so we will only outline their
general properties.
Theoretical PACFs are similar to ACFs, except they remove the effects of other
lags.
That is, the PACF at lag 2 filters out the effect of autocorrelation from lag 1.
Likewise, the partial autocorrelation at lag 3 filters out the effect of autocorrelation
at lags 2 and 1.
A useful rule of thumb is that Theoretical PACFs are the mirrored opposites of
ACFs.
While the ACF of an 𝐴𝑅(𝑝) process dies down exponentially, the PACF has spikes
at lags 1 through 𝑝, and then zeros at lags greater than 𝑝.
The ACF of an 𝑀𝐴(𝑞) process has non-zero spikes up to lag 𝑞 and zero afterward,
while the PACF dampens toward zero, and often with a bit of oscillation
Summary: Theoretical ACFs and PACFs
For AR(𝒑) processes:
- The ACF decays slowly.
- The PACF shows spikes at lags 1 through 𝒑, with zeros afterward.
For MA(𝒒) processes:
- The ACF shows spikes at lags 1 through 𝒒, with zeros afterward.
- The PACF decays slowly, often with oscillation.
For ARMA(𝒑, 𝒒) processes:
- The ACF decays slowly.
- The PACF decays slowly.
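The empirical counterparts can be computed directly in Stata; a sketch, assuming a tsset series y (the 20-lag window is an arbitrary choice):

* Sketch: empirical ACF and PACF in Stata
corrgram y, lags(20)    // table of autocorrelations and partial autocorrelations
ac y, lags(20)          // ACF plot with confidence bands
pac y, lags(20)         // PACF plot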
Information Criteria [IC]
To select the appropriate lag length or model for an ARMA process, we sometimes ignore the ACFs and PACFs and rely on the IC.
An information criterion is a measure of the quality of a statistical model
that considers:
How well the model fits the data
Complexity of the model.
IC are used to compare alternative models fitted to the same data set.
IC is an embodiment of two main factors:
a term which is a function of the sum of squared errors (SSE)
penalty for the loss of degrees of freedom from adding extra parameters
Information Criteria [IC]
Adding a new variable will have two competing effects on the IC: the SSE will
fall but the value of the penalty term will increase
The objective is to choose the number of parameters which minimises the
value of the information criteria.
So, adding an extra term will reduce the value of the criterion only if the fall in the SSE is sufficient to more than outweigh the increased value of the penalty term.
Hence, all else being equal, a model with a lower IC is superior to a model with a higher value.
The IC adds a harsher penalty for adding more regressors that have no significant impact on your dependent variable.
Information Criteria [IC]
The two most commonly used IC are the:
Akaike’s Information Criterion (AIC)
Schwarz’s Information Criterion (SIC) or Schwarz Bayesian Information Criterion
(SBC/BIC)
Most regression output comes with IC values that you can use to compare models.
When using Stata you may have to use the command "estat ic".
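A sketch of IC-based comparison across a few candidate specifications (the orders shown are illustrative), assuming a tsset series y:

* Sketch: comparing candidate ARMA specifications by AIC/BIC
quietly arima y, arima(1,0,0)
estat ic
quietly arima y, arima(2,0,0)
estat ic
quietly arima y, arima(1,0,1)
estat ic    // prefer the specification with the smallest AIC/BIC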
Chapter Three
Stationarity and Invertibility
What Is Stationarity?
Several methods in time series analysis are only valid if the underlying TS
variable is stationary.
The more stationary something is, the more predictable it is.
A series is said to be stationary when its mean, variance and autocovariance
are time invariant
In cross-sectional analysis, the mean of a variable 𝒀 is 𝑬(𝒀) = 𝝁.
In the case of a TS variable, the mean may end up being 𝑬(𝒀𝒕) = 𝝁𝒕.
This implies the observed mean is time dependent; hence, as 𝒕 increases the observed mean could also change.
This would also affect the variance and autocovariances.
What Is Stationarity?
A TS variable is therefore said to be stationary when:
𝑬(𝒀𝒕) = 𝝁
𝑽𝒂𝒓(𝒀𝒕) = 𝝈²
𝑪𝒐𝒗(𝒀𝒕, 𝒀𝒕+𝒌) = 𝑪𝒐𝒗(𝒀𝒕+𝒂, 𝒀𝒕+𝒌+𝒂)
That is, the covariance between 𝒀𝒕 and 𝒀𝒕+𝒌 does not depend upon which 𝒕 it is; the time index could be shifted forward or backward by 𝒂 periods and the same covariance relationship would hold.
What matters is the distance between the two observations.
For example, the covariance between 𝒀𝟏𝟗𝟗𝟎 and 𝒀𝟏𝟗𝟗𝟑 is the same as the covariance between 𝒀𝟏𝟗𝟗𝟓 and 𝒀𝟏𝟗𝟗𝟖 or between 𝒀𝟐𝟎𝟎𝟏 and 𝒀𝟐𝟎𝟎𝟒, i.e.,
𝑪𝒐𝒗(𝒀𝟏𝟗𝟗𝟎, 𝒀𝟏𝟗𝟗𝟑) = 𝑪𝒐𝒗(𝒀𝟏𝟗𝟗𝟓, 𝒀𝟏𝟗𝟗𝟖) = 𝑪𝒐𝒗(𝒀𝟐𝟎𝟎𝟏, 𝒀𝟐𝟎𝟎𝟒) = 𝑪𝒐𝒗(𝒀𝒕, 𝒀𝒕+𝟑)
The Importance of Stationarity
Stationary processes are better understood than non-stationary ones, and we
know how to estimate them better.
The test statistics of certain non-stationary processes do not follow the usual
distributions.
Knowing how a process is nonstationary will allow us to make the necessary
corrections.
If we regress two completely unrelated integrated processes on each other,
then a problem called “spurious regression” can arise.
The Importance of Stationarity (spurious regression)
Spurious regression is a problem that arises when regression analysis
indicates a strong relationship between two or more variables when in fact,
they are totally unrelated.
That is, they may be seemingly related either due to coincidence, or the presence of a certain third unseen factor (usually called a confounding variable).
That is, if both 𝒀𝒕 and 𝑿𝒕 are nonstationary and we regress 𝑿𝒕 on 𝒀𝒕, it is likely that you will obtain a result that indicates a strong relationship between 𝒀𝒕 and 𝑿𝒕, even though there is no real connection.
They both depend upon time, so they would seem to be affecting each other.
This problem will be examined in detail in the next chapter.
Restrictions on AR coefficients
Which Ensure Stationarity
Not all AR processes are stationary.
Some grow without limit.
Some have variances which change over time.
Restrictions on AR(1) Coefficients
Consider an 𝑨𝑹(𝟏) process,
𝒀𝒕 = 𝜷𝒀𝒕−𝟏 + 𝒆𝒕 (𝟏)
It is easy to see that it will grow without bound if 𝜷 > 𝟏.
It will decrease without bound if 𝜷 < −𝟏.
The process will only settle down and have a constant expected value if |𝜷| < 𝟏.
This might be intuitively true, but we need to develop a method that also works for higher-order AR(p) processes.
First, we rewrite EQ (1) in terms of the lag operator 𝑳
Restrictions on AR(1) Coefficients
𝒀 = 𝜷𝑳𝒀 + 𝒆𝒕 Stationarity is achieved if and only if
Grouping like terms the root of the lag polynomial are
𝒀 − 𝜷𝑳𝒀 = 𝒆𝒕 greater than one in absolute value.
𝒀 𝟏 − 𝜷𝑳 = 𝒆𝒕 if 𝑳 = 𝒛, we solve for the values of 𝒛
The term in parentheses is referred that set the polynomial equal to zero
to as: 𝟏 − 𝒛𝜷 = 𝟎
a polynomial in the lag operator 𝟏 = 𝒛𝜷

𝟏
Lag polynomial 𝒛 =
Characteristic polynomial 𝜷
It is usually denoted by Thus, our lag polynomial has one
𝟏
𝚽 𝐋 = 𝟏 − 𝜷𝑳 root, and it is equal to
𝜷
Restrictions on AR(1) Coefficients
The AR process is stationary if the roots of its lag polynomial are greater than 1 in magnitude:
|𝒛*| > 𝟏
which is to say that
|𝜷| < 𝟏
That is, the AR(1) process is stationary if the roots of its lag polynomial are greater than one in absolute value, and this is assured if 𝜷 is less than one in absolute value.
What Are Unit Roots, and Why Are They Bad?
"Unit roots" refer to roots of the lag polynomial that are equal to one.
In the AR(1) process, if there were a unit root, then 𝒛* = 𝟏/𝜷 = 𝟏.
So 𝜷 = 𝟏, which means that the AR process is a random walk.
The problem with unit-root processes, that is, with processes that contain random walks, is that they look stationary in small samples.
But treating them as stationary leads to very misleading results.
Moreover, regressing one non-stationary process on another leads to many "false positives" where two variables seem related when they are not.
Unit roots represent a specific type of non-stationarity, which we will explore in the next chapter.
The Connection Between AR and
MA Processes
• Under certain conditions, AR processes can be expressed as infinite
order MA processes. Same is true for MA processes.
• To go from AR to MA, the AR process must be stationary.
• To go from MA to AR, the MA process must be invertible
AR(1) to MA(∞)
There is an important link between AR and MA processes.
A stationary AR process can be expressed as an MA process, and vice versa.
Let's convert an AR(1) process into an MA(∞).
Consider the following AR(1) process:
𝒀𝒕 = 𝜷𝒀𝒕−𝟏 + 𝒆𝒕    (1.1)
Since the 𝒕 subscript is arbitrary, we can also write:
𝒀𝒕−𝟏 = 𝜷𝒀𝒕−𝟐 + 𝒆𝒕−𝟏    (1.2)
𝒀𝒕−𝟐 = 𝜷𝒀𝒕−𝟑 + 𝒆𝒕−𝟐    (1.3)
Substituting (1.3) into (1.2) and (1.2) into (1.1):
𝒀𝒕 = 𝜷[𝜷(𝜷𝒀𝒕−𝟑 + 𝒆𝒕−𝟐) + 𝒆𝒕−𝟏] + 𝒆𝒕
𝒀𝒕 = 𝜷³𝒀𝒕−𝟑 + 𝜷²𝒆𝒕−𝟐 + 𝜷𝒆𝒕−𝟏 + 𝒆𝒕
Continuing the substitutions indefinitely yields:
𝒀𝒕 = 𝒆𝒕 + 𝜷𝒆𝒕−𝟏 + 𝜷²𝒆𝒕−𝟐 + 𝜷³𝒆𝒕−𝟑 + ⋯
Thus, the AR(1) process is an MA(∞) process:
𝒀𝒕 = 𝒆𝒕 + 𝜸𝟏𝒆𝒕−𝟏 + 𝜸𝟐𝒆𝒕−𝟐 + 𝜸𝟑𝒆𝒕−𝟑 + ⋯
where 𝜸𝟏 = 𝜷, 𝜸𝟐 = 𝜷², 𝜸𝟑 = 𝜷³, and so forth.
AR(1) to MA(∞)
Can an AR(1) process always be expressed in this way?
No. The reason an AR(1) process is not always an MA(∞) lies in our ability to continue the substitution indefinitely: the substitution only converges if |𝜷| < 𝟏, i.e., if the process is stationary.
Invertibility: MA(1) to AR(∞)
Just as we converted an AR(1) process to an MA(∞) process, we can also
convert an MA(1) process to an AR(∞) process.
This is possible as long as the MA process is "invertible".
Consider the MA(1) model:
𝒀𝒕 = 𝒖𝒕 + 𝜸𝒖𝒕−𝟏
which can be rewritten as:
𝒖𝒕 = 𝒀𝒕 − 𝜸𝒖𝒕−𝟏
This also implies that
𝒖𝒕−𝟏 = 𝒀𝒕−𝟏 − 𝜸𝒖𝒕−𝟐
𝒖𝒕−𝟐 = 𝒀𝒕−𝟐 − 𝜸𝒖𝒕−𝟑
𝒖𝒕−𝟑 = 𝒀𝒕−𝟑 − 𝜸𝒖𝒕−𝟒
and so forth.
Substituting each equation into the one before it:
𝒖𝒕 = 𝒀𝒕 − 𝜸𝒀𝒕−𝟏 + 𝜸²𝒖𝒕−𝟐
𝒖𝒕 = 𝒀𝒕 − 𝜸𝒀𝒕−𝟏 + 𝜸²𝒀𝒕−𝟐 − 𝜸³𝒖𝒕−𝟑
𝒖𝒕 = 𝒀𝒕 − 𝜸𝒀𝒕−𝟏 + 𝜸²𝒀𝒕−𝟐 − 𝜸³𝒀𝒕−𝟑 + 𝜸⁴𝒖𝒕−𝟒
Repeating this process indefinitely yields:
𝒖𝒕 = 𝒀𝒕 − 𝜸𝒀𝒕−𝟏 + 𝜸²𝒀𝒕−𝟐 − 𝜸³𝒀𝒕−𝟑 + ⋯
𝒖𝒕 = 𝒀𝒕 + Σ(−𝜸)^𝒊 𝒀𝒕−𝒊 (i = 1, 2, …)
Invertibility: MA(1) to AR(∞)
Equivalently,
𝒀𝒕 = 𝒖𝒕 − Σ(−𝜸)^𝒊 𝒀𝒕−𝒊 (i = 1, 2, …)
which is an AR(∞) process, with 𝜷𝟏 = 𝜸, 𝜷𝟐 = −𝜸², 𝜷𝟑 = 𝜸³, 𝜷𝟒 = −𝜸⁴, and so on.
The condition required for us to continue the substitutions indefinitely, |𝜸| < 𝟏, is analogous to what was required for stationarity when dealing with AR processes.
Non-Stationarity and ARIMA(𝒑, 𝒅, 𝒒) Processes
Our focus so far has been on TS whose means do not exhibit long-run growth.
It is time to drop this assumption.
After all, many economic and financial time series do not have a constant mean.
Examples include Ghana's GDP per capita, the CPI, and the GSE Composite Index.
Non-stationary ARIMA models include the "random walk" and the "random walk with drift".
Differencing
Differencing is to time series what differentiation is to calculus.
If we have a TS whose mean is increasing, we can apply the difference
operator enough times to render the series stationary.
If a series needs to be differenced once in order to make it stationary, we say
that the series is “integrated of order one” or “𝐼(1).”
A series that needs to be differenced twice is “integrated of order two” and is
“𝐼(2).”
In general, if a series needs to be differenced 𝑑 times, it is said to be “integrated
of order d” and is “𝐼(𝑑).”
Mean and Variance of the Random Walk (Without Drift)
Given a random walk process, i.e., an AR(1) with the coefficient on the lagged dependent variable equal to one:
𝒀𝒕 = 𝒀𝒕−𝟏 + 𝒆𝒕    (1)
Applying the lag operator to both sides of eqn (1) and continuing the substitution back to period 𝒕 = 𝟎 allows us to write the random walk as:
𝒀𝟏 = 𝒀𝟎 + 𝒆𝟏
𝒀𝟐 = 𝒀𝟏 + 𝒆𝟐 = 𝒀𝟎 + 𝒆𝟏 + 𝒆𝟐
𝒀𝟑 = 𝒀𝟐 + 𝒆𝟑 = 𝒀𝟎 + 𝒆𝟏 + 𝒆𝟐 + 𝒆𝟑
Thus,
𝒀𝒕 = 𝒀𝟎 + Σ𝒆𝒊 (i = 1, …, t)
Taking expectations on both sides of the equation:
𝑬(𝒀𝒕) = 𝑬(𝒀𝟎 + Σ𝒆𝒊) = 𝒀𝟎
Mean and Variance of the Random Walk (Without Drift)
Random walks tend to be very unpredictable, so our best guess at period 0 of what 𝒀 will be in period 𝒕 is just 𝒀's value right now, at period zero.
Taking the variance of a random walk process:
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝒀𝟎 + 𝒆𝟏 + 𝒆𝟐 + ⋯ + 𝒆𝒕)
Since each of the error terms is drawn independently of the others, there is no covariance between them.
Hence, we can push the variance calculation through the additive terms:
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝒀𝟎) + 𝑽𝒂𝒓(𝒆𝟏) + 𝑽𝒂𝒓(𝒆𝟐) + ⋯ + 𝑽𝒂𝒓(𝒆𝒕)
𝑽𝒂𝒓(𝒀𝒕) = 𝟎 + 𝝈² + 𝝈² + ⋯ + 𝝈²
𝑽𝒂𝒓(𝒀𝒕) = 𝒕𝝈²
Mean and Variance of the Random Walk (Without Drift)
Since the variance of 𝒀𝒕 is a function of 𝒕, the process is not variance stationary.
Thus, a clear feature of the random walk model is the persistence of random shocks, which never die out. Hence random walk processes are said to have infinite memory.
However, taking the first difference makes it a stationary process, i.e.,
𝜟𝒀𝒕 = 𝒀𝒕 − 𝒀𝒕−𝟏 = (𝒀𝒕−𝟏 + 𝒆𝒕) − 𝒀𝒕−𝟏
𝜟𝒀𝒕 = 𝒆𝒕
Though 𝒀𝒕 may be nonstationary, its first difference is stationary.
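A sketch that simulates a pure random walk and its first difference, illustrating the contrast:

* Sketch: a pure random walk and its (stationary) first difference
clear
set seed 99
set obs 300
gen t = _n
tsset t
gen e = rnormal()
gen y = sum(e)       // y_t = e_1 + ... + e_t, a random walk with Y0 = 0
gen dy = D.y         // equals e_t: white noise
tsline y dy          // the level wanders; the difference does not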
The Random Walk with Drift
𝒀𝒕 = 𝜷𝟎 + 𝒀𝒕−𝟏 + 𝒆𝒕
This process can be expressed in slightly different terms, which we will find useful. Given an initial value of 𝒀𝟎, which we arbitrarily set to zero:
𝒀𝟎 = 𝟎
𝒀𝟏 = 𝜷𝟎 + 𝒀𝟎 + 𝒆𝟏 = 𝜷𝟎 + 𝒆𝟏
𝒀𝟐 = 𝜷𝟎 + 𝒀𝟏 + 𝒆𝟐 = 𝟐𝜷𝟎 + 𝒆𝟏 + 𝒆𝟐
𝒀𝟑 = 𝜷𝟎 + 𝒀𝟐 + 𝒆𝟑 = 𝟑𝜷𝟎 + 𝒆𝟏 + 𝒆𝟐 + 𝒆𝟑
𝒀𝒕 = 𝒕𝜷𝟎 + Σ𝒆𝒊 (i = 1, …, t)
The Mean and Variance of the Random Walk with Drift
Taking expectations of the last equation above:
𝑬(𝒀𝒕) = 𝒕𝜷𝟎
The variance of a random walk with drift is:
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝒕𝜷𝟎 + Σ𝒆𝒊) = 𝑽𝒂𝒓(Σ𝒆𝒊) = 𝒕𝝈²𝒆
Thus both the mean and variance of a random walk with drift are functions of time.
The Random Walk with Drift
Taking the first difference makes it stationary:
𝜟𝒀𝒕 = 𝒀𝒕 − 𝒀𝒕−𝟏 = (𝜷𝟎 + 𝒀𝒕−𝟏 + 𝒆𝒕) − 𝒀𝒕−𝟏
𝜟𝒀𝒕 = 𝜷𝟎 + 𝒆𝒕
Let 𝒁𝒕 = 𝜟𝒀𝒕 and we see that:
𝒁𝒕 = 𝜷𝟎 + 𝒆𝒕
The variable 𝒁𝒕 is just white noise around a mean of 𝜷𝟎.
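Extending the previous random-walk sketch with a drift term (the drift of 0.2 is an arbitrary choice), assuming that data set is still in memory:

* Sketch: random walk with drift and its first difference
gen yd = sum(0.2 + e)    // y_t = 0.2t + cumulative shocks
gen dyd = D.yd           // white noise around the drift beta0 = 0.2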
Deterministic Trend
Deterministic trend is also another example of a non-stationary series.
Consider,
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏 𝒕 + 𝒆𝒕
where 𝒕 denotes the time elapsed and the 𝜷's are parameters; the only random component in the model is 𝒆𝒕, the IID error.
The mean and variance are given as:
𝑬(𝒀𝒕) = 𝑬(𝜷𝟎 + 𝜷𝟏𝒕 + 𝒆𝒕) = 𝜷𝟎 + 𝜷𝟏𝒕
𝑽𝒂𝒓(𝒀𝒕) = 𝑽𝒂𝒓(𝜷𝟎 + 𝜷𝟏𝒕 + 𝒆𝒕) = 𝑽𝒂𝒓(𝒆𝒕) = 𝝈²𝒆
Thus, a deterministic trend process has a non-stationary mean (it grows linearly with time) and a stationary variance (equal to 𝝈²𝒆).
Deterministic Trend
Taking the first difference introduces an MA unit root:
𝜟𝒀𝒕 = 𝒀𝒕 − 𝒀𝒕−𝟏 = (𝜷𝟎 + 𝜷𝟏𝒕 + 𝒆𝒕) − (𝜷𝟎 + 𝜷𝟏(𝒕 − 𝟏) + 𝒆𝒕−𝟏)
𝜟𝒀𝒕 = 𝜷𝟏𝒕 + 𝒆𝒕 − 𝜷𝟏𝒕 + 𝜷𝟏 − 𝒆𝒕−𝟏
𝜟𝒀𝒕 = 𝜷𝟏 + 𝒆𝒕 − 𝒆𝒕−𝟏
Since this first-differenced series does not depend upon time, its mean and variance also do not depend upon time:
𝑬(𝜟𝒀𝒕) = 𝑬(𝜷𝟏 + 𝒆𝒕 − 𝒆𝒕−𝟏) = 𝜷𝟏
𝑽𝒂𝒓(𝜟𝒀𝒕) = 𝑽𝒂𝒓(𝒆𝒕 − 𝒆𝒕−𝟏) = 𝑽𝒂𝒓(𝒆𝒕) + 𝑽𝒂𝒓(𝒆𝒕−𝟏) = 𝟐𝝈²𝒆
Deterministic Trend
Notice that the first-differenced model now has an MA unit root in the error
terms.
Never take first differences to remove a deterministic trend.
Rather, regress Y on time, and then work with the residuals.
These residuals now represent Y that has been linearly detrended.
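A sketch of linear detrending in Stata, assuming a tsset series y with a deterministic trend:

* Sketch: detrend by regressing on time, then keep the residuals
gen trend = _n
regress y trend                    // fit Y_t = b0 + b1*t + e_t
predict ydetrended, residuals      // the linearly detrended series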
Random Walk with Drift vs Deterministic Trend
There are many similarities between random walks with drift and deterministic
trend processes.
They are both non-stationary, but the source of the non-stationarity is different.
It is worthwhile to look at these models side by side. The "random walk with drift" is
𝒀𝒕 = 𝜷𝟎 + 𝒀𝒕−𝟏 + 𝒆𝒕 = 𝒕𝜷𝟎 + Σ𝒆𝒊 (i = 1, …, t)
with mean and variance of
𝑬(𝒀𝒕) = 𝒕𝜷𝟎
𝑽𝒂𝒓(𝒀𝒕) = 𝒕𝝈²𝒆
Random Walk with Drift vs Deterministic Trend
In order to make it stationary, it needs to be differenced. The deterministic trend model is:
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏𝒕 + 𝒆𝒕
with mean and variance of
𝑬(𝒀𝒕) = 𝜷𝟎 + 𝜷𝟏𝒕
𝑽𝒂𝒓(𝒀𝒕) = 𝝈²𝒆
Both models have means which increase linearly over time.
This makes it very difficult to visually identify which process generated the
data.
The variance of the random walk with drift, however, grows over time, while
the variance of the deterministic trend model does not.
Unit Root Tests
Consider an AR(1) process with drift:
𝒀𝒕 = 𝜷𝟎 + 𝜷𝟏𝒀𝒕−𝟏 + 𝒆𝒕,  𝒆𝒕 ~ 𝒊𝒊𝒅(𝟎, 𝝈²)
The simple approach to unit root testing is to estimate the above equation using OLS and examine the estimated 𝜷̂𝟏.
Use a t-test with null 𝑯𝟎: 𝜷𝟏 = 𝟏 (non-stationary)
against the alternative 𝑯𝑨: 𝜷𝟏 < 𝟏 (stationary).
The test statistic is obtained as (𝜷̂𝟏 − 𝟏)/𝑺𝒆(𝜷̂𝟏).
Reject the null hypothesis when the test statistic is a large negative number.
Unit Root Tests
The problem with the simple t-test approach is that:
Lagged dependent variables ⇒ 𝜷̂𝟏 is biased downwards in small samples (dynamic bias).
When 𝜷𝟏 = 𝟏, we have a non-stationary process and standard regression analysis is invalid (the statistic has a non-standard distribution).
Dickey-Fuller (DF) Approach to Unit Root Testing
𝒀𝒕 = 𝝆𝒀𝒕−𝟏 + 𝒆𝒕
Subtracting 𝒀𝒕−𝟏 from both sides:
𝜟𝒀𝒕 = 𝝆𝒀𝒕−𝟏 − 𝒀𝒕−𝟏 + 𝒆𝒕
𝜟𝒀𝒕 = (𝝆 − 𝟏)𝒀𝒕−𝟏 + 𝒆𝒕
𝜟𝒀𝒕 = 𝜹𝒀𝒕−𝟏 + 𝒆𝒕
With the DF test we test
𝑯𝟎: 𝜹 = 𝟎 (unit root)
𝑯𝑨: 𝜹 < 𝟎 (stationary)
Augmented Dickey-Fuller Test (ADF)
The main criticism of the DF test is that its power is low if the process is stationary but with a root close to the non-stationary boundary.
E.g., the DF test is poor at distinguishing between 𝝆 = 𝟏 and 𝝆 = 𝟎.𝟗𝟓.
Moreover, the DF test assumes that the error term is white noise.
The ADF test was developed to address these weaknesses: it augments the DF regression with lagged differences of the dependent variable, and so accommodates higher-order serial correlation.
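Both tests are built into Stata as dfuller; a sketch, with illustrative lag and deterministic-term choices, assuming a tsset series y:

* Sketch: DF and ADF tests in Stata
dfuller y                  // plain DF regression with a constant
dfuller y, lags(4)         // ADF with 4 lagged differences
dfuller y, lags(4) trend   // ADF with lagged differences and a trend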
Testing Unit Roots – Phillips-Perron Test (PP)
Unlike the ADF test, which includes more lags to correct for the problem of serial correlation in the error term,
the PP test employs a nonparametric statistical method to take care of the serial correlation in the error term without adding lagged difference terms.
However, the PP and the ADF tests have the same asymptotic distribution.
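The corresponding Stata command is pperron; a sketch with illustrative options, assuming a tsset series y:

* Sketch: Phillips-Perron test (the Newey-West lag choice is illustrative)
pperron y, lags(4)
pperron y, lags(4) trend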
Critique of ADF and PP
The major issue associated with the ADF and PP tests has to do with their size and power.
By the size of the test we are referring to the level of significance of the test, that is, the probability of committing a Type I error.
By the power of a test we are referring to the probability of rejecting a false null hypothesis. This is calculated by subtracting the probability of committing a Type II error from 1.
Critique of ADF and PP
SIZE OF THE TEST
Both the ADF and PP tests are very sensitive to the way the test is conducted, because the true process could be a:
▪ pure random walk;
▪ random walk with drift; or
▪ random walk with drift and trend.
This implies that if you don't have an idea about the true model of the series, you may end up with wrong conclusions.
E.g., if the true model is (1) but we estimate (2), we may conclude at a 5% significance level that the series is stationary, when the actual level of significance may be higher.
Critique of ADF and PP
POWER OF THE TEST
The ADF and PP tests tend to accept the null of a unit root more frequently than is warranted.
That is, they have low power: they find unit roots even when they do not exist.
Reasons for this situation include the following:
❖ Power depends on the time span of the data more than the mere size of the sample.
❖ Thus, power is greater when the time span is large.
❖ A unit root test based on 30 observations over 30 years may have greater power than one based on, say, 90 observations over a span of 90 days.
Critique of ADF and PP
POWER OF THE TEST (2)
If 𝝆 ≈ 𝟏 but not exactly 1, the test may declare such a series non-stationary.
These tests also assume a single unit root, i.e., they assume that a given time series is I(1). But if a series is integrated of a higher order, there would be more than one unit root.
In addition, if there are structural breaks in a particular series, the unit root tests may be unable to detect them.
Testing Unit Roots - Kwiatkowski, Phillips, Schmidt,
and Shin (KPSS) Test
The KPSS test differs from the other unit root tests described, because it has a null hypothesis that the series is (trend) stationary.
Rejection of the null hypothesis is an indication of a unit root:
𝐻0: 𝑦𝑡 ~ 𝐼(0) (stationarity)
against 𝐻1: 𝑦𝑡 ~ 𝐼(1) (a unit root).
Therefore, if the results of the tests above indicate a unit root but the result of the KPSS test indicates a stationary process, one should be cautious and opt for the latter result.
It is a very powerful test, but it has problems with structural breaks (say, volatility shifts).
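In Stata, the KPSS test is available as the user-written kpss command (installable from SSC); a sketch, assuming a tsset series y:

* Sketch: KPSS test in Stata (kpss is a user-written command)
ssc install kpss        // one-time installation
kpss y                  // null: trend stationarity (unlike dfuller/pperron)
kpss y, notrend         // null: level stationarity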