Basic Econometrics (Lectures)
Bo Sjö
Linköping, Sweden
email:[email protected]
1 Introduction
I Basic Statistics
3 Time Series Modeling - An Overview
7 Introduction to Time Series Modeling
7.1 Descriptive Tools for Time Series
7.1.1 Weak and Strong Stationarity
7.1.2 Weak Stationarity, Covariance Stationary and Ergodic Processes
7.1.3 Strong Stationarity
7.1.4 Finding the Optimal Lag Length and Information Criteria
12 Non-Stationarity and Co-integration
12.0.1 The Spurious Regression Problem
12.0.2 Integrated Variables and Co-integration
12.0.3 Approaches to Testing for Co-integration
16 Encompassing
20 References
20.1 Appendix 1
20.2 Appendix III Operators
1. INTRODUCTION
”He who controls the past controls the future.” George Orwell in
"1984".
Please respect that this is work in progress. It has never been my intention to
write a commercial book, or a perfect textbook in time series econometrics. It is
simply a collection of lectures in a popular form that can serve as a complement
to ordinary textbooks and articles used in education. The parts dealing with
tests for unit roots (order of integration) and cointegration are not well developed.
These topics have a memo of their own, "A Guide to Testing for Unit Roots and
Cointegration".
When I started to put these lecture notes together some years ago I decided
on the title "Lectures in Modern Time Series Econometrics" because I thought that
the contents were a bit "modern" compared to standard econometric textbooks.
During the fall of 2010, as I started to update the notes, I thought that it was
time to remove the word "modern" from the title. A quick look in Damodar
Gujarati's textbook "Basic Econometrics" from 2009 convinced me to keep the
word "modern" in the title. Gujarati's text on time series hasn't changed since the
1970s even though time series econometrics has changed completely since the 70s.
Thus, under these circumstances I see no reason to change the title, at least not
yet.
There are four ways in which one can do time series econometrics. The first is to use
the approach of the 1970s: view your time series model just like any linear regression,
and impose a number of ad hoc restrictions that will hide all the problems you
find. This is not a good approach. It is only found in old textbooks
and never in today's research. You might only see it used in journals of very low
scientific standing. Second, you can use theory to derive a time series model, and interesting
parameters, which you then estimate with appropriate estimators. Examples
of this are to derive utility functions, assume that agents have rational expectations,
etc. This is a proper research strategy. However, it typically takes good data,
and you need to be original in your approach, but you can get published in good
journals. The third approach is simply to do a statistical description of the data
series, in the form of a vector autoregressive system, or the reduced form of the vector
error correction model. This system can be used for forecasting, analysing relationships
among data series, and investigating responses to unforeseen shocks such
as drastic changes in energy prices, money supply etc. The fourth way is to go
beyond the vector autoregressive system and try to estimate structural parameters
in the form of elasticities and policy intervention parameters. If you forget about
the first method, the choice depends on the problem at hand and how you choose to
formulate it. This book aims at telling you how to use methods three and four.
The basic thinking is that your data is the real world; theories are abstractions
that we use to understand the real world. In applied econometric time series you
should always strive to build well-defined statistical models, that is, models that
are consistent with the data chosen. There is a complex statistical theory behind
all this, which I will try to popularize in this book. I do not see this book as a
substitute for an ordinary textbook. It is simply a complement.
1.1 Outline of this Book/Text/Course/Workshop
This book is intended for people who have done a basic course in statistics and
econometrics, either at the undergraduate or at the graduate level. If you did an
undergraduate course I assume that you did it well. Econometrics is a type of
course where every lecture, and every textbook chapter, leads to the next level. The
best way to learn econometrics is to be active: read several books and work on your
own with econometric software. No teacher can teach you how to run software.
That is something you have to learn on your own by practicing how to use the
software. There are some very good software packages out there. The outline
differs between the graduate and Ph.D. levels mainly in the theoretical parts. At
the Ph.D. level, there is more stress on theoretical backgrounds.
1) I will begin by talking about why econometrics is different from statistics,
and why econometric time series is different from the econometrics you meet in
many basic textbooks.
2) I will very briefly review basic statistics and linear regression, and stress
what you should know in terms of testing and modeling dynamic models. For
most students that will imply going back and doing some quick repetition.
3) Introduction to statistical theory, including maximum likelihood, random
variables, density functions and stochastic processes.
4) Basic time series properties and processes.
5) Using and understanding ARFIMA and VAR modelling techniques.
6) Testing for non-stationarity in the form of stochastic trends, i.e. tests for unit
roots.
7) The spurious regression problem.
8) Testing for and understanding cointegration.
9) Testing for Granger non-causality.
10) The theory of reduction, exogeneity, and building dynamic models and
systems.
11) Modelling time-varying variances: ARCH and GARCH models.
12) The implications and consequences of rational expectations for econometric
modelling.
13) Non-linearities.
14) Additional topics.
For most of these topics I have developed more or less self-instructing exercises.
decisions on some view of the economy where we assume that certain events are
linked to each other in more or less complex ways. Economists call this a model
of the economy. We can describe the economy and the behavior of the individuals
in terms of multivariate stochastic processes. Decisions based on stochastic
sequences play a central role in economics and in finance. Stochastic processes are
the basis for our understanding of the behavior of economic agents and of how
their behavior determines the future path of the economy. Most econometric textbooks
deal with stochastic time series as a special application of the linear regression
technique. Though this approach is acceptable for an introductory course in
econometrics, it is unsatisfactory for students with a deeper interest in economics
and finance. To understand the empirical and theoretical work in these areas, it
is necessary to understand some of the basic philosophy behind stochastic time
series.
This is work in progress. It is based on my lectures on Modern Economic
Time Series Analysis at the Department of Economics, first at the University
of Gothenburg and later at the University of Skövde and Linköping University in
Sweden. The material is not ready for widespread distribution. This work, most
likely, contains lots of errors, some known by the author and some not
yet detected. The different sections do not necessarily follow in a logical order.
Therefore, I invite anyone who has opinions about this work to share them with me.
The first part of this work provides a repetition of some basic statistical concepts,
which are necessary for understanding modern economic time series analysis.
The motive for repeating these concepts is that they play a larger role in econometrics
than many contemporary textbooks in econometrics indicate. Econometrics
did not change much from the first edition of Johnston in the 60s until the
revised version of Kmenta in the mid 80s. However, the critique
against the use of econometrics delivered by Sims, Lucas, Leamer, Hendry
and others, in combination with new insights into the behavior of non-stationary
time series and the rapid development of computer technology, has revolutionized
econometric modeling and resulted in an explosion of knowledge. The demands on
writing a decent thesis, or a scientific paper, based on econometric methods have
risen far beyond what one can learn in an introductory course in econometrics.
In the media you often hear about this and that being proved by scientific research.
In the late 1990s newspapers told that someone had proved that genetically modified
(GM) food could be dangerous. The news spread quickly, and according to
the story the original article had been stopped from being published by scientists
with suspicious motives. Various lobby groups immediately jumped up: GM food
was dangerous, should be banned, and more money should go into this line of
research. What had happened was the following. A researcher claimed to have
shown that GM food was bad for health. He presented these results to a number
of media people, who distributed them. (Remember the fuss about 'cold
fusion'.) The results were presented in a paper sent to a scientific journal for
publication. The journal, however, did not publish the article. It was dismissed
because the results were not based on a sound scientific method. The researcher
had fed rats with potatoes. One group of rats got GM potatoes, the other group
of rats got normal non-GM potatoes. The rats that got GM potatoes seemed
to develop cancer more often than the control group. The statistical difference
2. INTRODUCTION TO ECONOMETRIC TIME SERIES
"Time is a great teacher, but unfortunately it kills all its pupils" Louis Hector
Berlioz
A time series is simply data ordered by time. For an econometrician time series
is usually data that is also generated over time in such a way that time can be
seen as a driving factor behind the data. Time series analysis is simply approaches
that look for regularities in these data ordered by time.
In comparison with other academic fields, the modeling of economic time series
is characterized by the following problems, which partly motivate why econometrics
is a subject of its own:
The empirical sample sizes in economics are generally small, especially compared
with many applications in physics or biology. Typical sample sizes
range between 25 and 100 observations. In many areas anything below 500
observations is considered a small sample.
Economic time series are dependent in the sense that they are correlated with
other economic time series. In economic science, problems are almost
never concerned with univariate series. Consumption, as an example, is a
function of income, and at the same time, consumption also affects income
directly and through various other variables.
Economic time series are often dependent over time. Many series display
high autocorrelation, as well as cross autocorrelation with other variables
over time.
Economic time series are generally non-stationary. Their means and variances
change over time, implying that estimated parameters might follow unknown
distributions instead of standard tabulated distributions like the normal
distribution. Non-stationarity arises from productivity growth and price
inflation. Non-stationary economic series appear to be integrated, driven by
stochastic trends, perhaps as a result of stochastic changes in total factor
productivity. Integrated variables, and in particular the need to model
them, are not that common outside economics. In some situations, therefore,
inference in econometrics becomes quite complicated, and requires the development
of new statistical techniques for handling stochastic trends. The
concepts of cointegration and common trends, and the recently developed
asymptotic theory for integrated variables, are examples of this.
Economic time series cannot be assumed to be drawn from samples in the
way assumed in classical statistics. The classical approach is to start from
a population from which a sample is drawn. Since the sampling process can
be controlled, the variables which make up the sample can be seen as random
variables. Hypotheses are then formulated and tested conditionally on
the assumption that the random variables have a specific distribution. Economic
time series are seldom random variables drawn from some underlying
population in the classical statistical sense. Observations do not represent
Finally, from the view of economics, the subject of statistics deals mainly
with the estimation of, and inference about, covariances only. The econometrician,
however, must also give estimated parameters an economic interpretation.
This problem cannot always be solved ex post, after a model has been estimated.
When it comes to time series, economic theory is an integrated part
of the modeling process. Given a well-defined statistical model, estimated
parameters should represent the behavior of economic agents. Many econometric
studies fail because researchers assume that their estimates can be given an
economic interpretation without considering the statistical properties of the
model, or the simple fact that there is in general no one-to-one correspondence
between observed variables and the concepts defined in economic theory.1
2.1 Programs
Here is a list of statistical software that you should be familiar with, please google
(those recommended for time series are marked with *):
- *RATS and CATS in RATS, Regression Analysis of Time Series and Cointegrating
Analysis of Time Series (www.estima.com)
- *PcGive - Comes highly recommended. Included in the Oxmetrics modules; see
also Timberlake Consultants for more programs.
- *Gretl (Free GNU license, very good for students in econometrics)
- *JMulti (Free, for multivariate time series analysis, updated? The discussion
forum is quite dead, www.jmulti.com)
- *EViews
- Gauss (good for simulation)
- STATA (used by the World Bank, good for microeconometrics, panel data,
OK on time series)
- LIMDEP ('Mostly free' with some editions of Greene's Econometrics textbook?,
you need to pay for duration models?)
- SAS - Statistical Analysis System (good for big data sets, but not time series,
mainly medicine, "the calculus program for decision makers")
- Shazam
And more, some are very special programs for this and that, ... but I don't
find them worth mentioning in this context.
1 For a recent discussion about the controversies in econometrics see The Economic Journal
1996.
You should also know about C, C++, and LaTeX to be a good econometrician.
Please google.
For Data Envelopment Analysis (DEA) I recommend Tom Coelli’s DEAP 2.1
or Paul W. Wilson’s FEAR.
Given the general definition of time series above, there are many types of time series.
The focus in econometrics, macroeconomics and finance is on stochastic time series,
typically in the time domain, which are non-stationary in levels but become what
is called covariance stationary after differencing.
In a broad perspective, time series analysis typically aims at making time series
more understandable by decomposing them into different parts. The aim of this
introduction is to give a general overview of the subject. A time series is any
sequence ordered by time. The sequence can be either deterministic or stochastic.
The primary interest in economics is in stochastic time series, where the sequence
of observations is made up of the outcomes of random variables. A sequence of
stochastic variables ordered by time is called a stochastic time series process.
The random variables that make up the process can either be discrete random
variables, taking on a given set of integer numbers, or be continuous
random variables taking on any real number between $-\infty$ and $+\infty$. While discrete
random variables are possible they are not that common in economic time series
research.
variables. Stocks are variables that can be observed at a point in time, like the money stock or
inventories. Flows are variables that can only be observed over some period, like consumption or
GDP. In this context price variables include prices, interest rates and similar variables which can
be observed on a market at a given point in time. Combining these variables into a multivariate
process and constructing econometric models from observed variables in discrete time produces
further problems, and in general they are quite difficult to solve without using continuous time
methods. Usually, careful discrete time models will reduce the problems to a large extent.
1. To be completed...
3 For simplicity we assume a linear process. An alternative is to assume that the components
are multiplicative, $x_t = T_{t,d}\, S_{t,d}\, C_{t,d}\, I_t$.
To be completed...
Random variables, OLS, minimizing the sum of squares, assumptions 1-5(6),
understanding, multiple regression, multicollinearity, properties of the OLS estimator.
Matrix algebra.
Tests and 'solutions' for heteroscedasticity (cross-section), and autocorrelation
(time series).
If you took a good course you should have learned the three golden rules: test,
test, test, and learned about the properties of the OLS estimator.
Generalized least squares (GLS).
System estimation: demand and supply models.
Further extensions:
Panel data, Tobit, Heckit, discrete choice, probit/logit, duration.
Time series: distributed lag models, partial adjustment models, error correction
models, lag structure, stationarity vs. non-stationarity, co-integration.
What you need to know ...
What you probably do not know but should know.
OLS
Ordinary least squares is a common estimation method. Suppose there are two
series $\{y_t, x_t\}$,
$$y_t = \alpha + \beta x_t + \varepsilon_t.$$
Minimize the sum of squares over the sample $t = 1, 2, \dots, T$,
$$S = \sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2.$$
Take the derivatives of S with respect to $\alpha$ and $\beta$, set the expressions to zero,
and solve for $\hat\alpha$ and $\hat\beta$:
$$\frac{\partial S}{\partial \alpha} = 0, \qquad \frac{\partial S}{\partial \beta} = 0, \qquad \hat\beta = \frac{s_{xy}}{s_x^2}.$$
$$TSS = ESS + RSS$$
$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$$
$$R^2 = 1 - \frac{RSS}{TSS} = \frac{ESS}{TSS}$$
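As a concrete illustration (a minimal sketch of my own, assuming only NumPy; the series and parameter values are simulated and not taken from the text), the following Python snippet estimates α and β by solving the least-squares normal equations and computes R² from the TSS/RSS decomposition above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two series {y_t, x_t}, t = 1..T, from y_t = alpha + beta*x_t + e_t
T, alpha, beta = 100, 1.0, 0.5
x = rng.normal(size=T)
e = rng.normal(scale=0.3, size=T)
y = alpha + beta * x + e

# OLS: minimize S = sum (y_t - a - b*x_t)^2 via the normal equations
X = np.column_stack([np.ones(T), x])        # regressor matrix with a constant
a_hat, b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Decompose the total sum of squares: TSS = ESS + RSS, R^2 = 1 - RSS/TSS
resid = y - X @ np.array([a_hat, b_hat])
TSS = np.sum((y - y.mean()) ** 2)
RSS = np.sum(resid ** 2)
R2 = 1 - RSS / TSS

print(f"alpha_hat={a_hat:.3f}, beta_hat={b_hat:.3f}, R^2={R2:.3f}")
```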
Basic assumptions
1) $E(\varepsilon_t) = 0$ for all t
6) $\varepsilon_t \sim NID(0, \sigma^2)$
Discuss these properties.
Properties
Gauss-Markov, BLUE
Deviations
Misspecification: adding an extra variable, forgetting a relevant variable
Multicollinearity
Errors-in-variables problem
Homoscedasticity vs. heteroscedasticity
Autocorrelation
Basic Statistics
3. TIME SERIES MODELING - AN OVERVIEW
The basic reason for dealing with stochastic models rather than deterministic
models is that we are faced with random variables. A popular definition of
random variables goes like this: a random variable is a variable that can take on
more than one value.1 For every possible value that a random variable can take
on there is a number between zero and one that describes the probability that
the random variable will take on this value. In the following a random variable is
indicated with a tilde, as in $\tilde X$.
In statistical terms, a random variable is associated with the outcome of a
statistical experiment. All possible outcomes of such an experiment can be called
the sample space. If S is a sample space with a probability measure and if $\tilde X$
is a real-valued function defined over S, then $\tilde X$ is called a random variable.
There are two types of random variables: discrete random variables, which
only take on a specific number of real values, and (absolutely) continuous random
variables, which can take on any value between $-\infty$ and $+\infty$. It is also possible to examine
discontinuous random variables, but we will limit ourselves to the first two types.
If the discrete random variable $\tilde X$ can take k values $(x_1, \dots, x_k)$,
the probability of observing a value $x_j$ can be stated as,
$$P(x_j) = p_j. \quad (3.1)$$
$$F(x) = P(\tilde X \le x) \quad \text{for } -\infty < x < \infty, \quad (3.3)$$
The fundamental theorem of integral calculus gives us the following expression
for the probability that $\tilde X$ takes on a value less than or equal to x,
$$F(x) = \int_{-\infty}^{x} f(u)\,du. \quad (3.5)$$
It follows that for any two constants a and b, with a < b, the probability
that $\tilde X$ takes on a value in the interval from a to b is given by
$$F(b) - F(a) = \int_{-\infty}^{b} f(u)\,du - \int_{-\infty}^{a} f(u)\,du \quad (3.6)$$
$$= \int_{a}^{b} f(u)\,du, \quad (3.7)$$
where E is the expectation operator and f(x) is the value of its probability
function at $\tilde X$. Thus, $E(\tilde X)$ represents the mean of the discrete random variable
$\tilde X$, or, in other words, the first moment of the random variable. For a continuous
random variable $\tilde X$, the mathematical expectation is
$$E(\tilde X) = \int_{-\infty}^{\infty} x\, f(x)\,dx. \quad (3.9)$$
Like density, the term moment, or moment about the origin, has its explanation
in physics. (In physics the length of a lever arm is measured as the distance from
the origin. Or, if we refer to the example with the rod above, the first moment
around the mean would correspond to the horizontal center of gravity of the rod.)
Reasoning from intuition, the mean can be seen as the midpoint of the limits of
the density. The midpoint can be scaled in such a way that it becomes the origin
of the x-axis.
The term "moments of a random variable" is a more general way of talking
about the mean and variance of a variable. Setting g(x) equal to x, we get the
r:th moment around the origin,
$$\mu_r' = E(\tilde X^r) = \sum x^r f(x). \quad (3.11)$$
The first moment is nothing else than the mean, or the expected value of $\tilde X$.
The second moment about the mean is the variance. Higher moments give additional information
about the distribution and density functions of random variables.
Now, defining $g(\tilde X) = (\tilde X - \mu_1')^r$ we get what is called the r:th moment about
the mean of the distribution of the random variable $\tilde X$. For r = 0, 1, 2, 3, ... we
get for a discrete variable,
$$\mu_r = E[(\tilde X - \mu_1')^r] = \sum (x - \mu_1')^r f(x), \quad (3.13)$$
and when $\tilde X$ is continuous,
$$\mu_r = E[(\tilde X - \mu_1')^r] = \int_{-\infty}^{\infty} (x - \mu_1')^r f(x)\,dx. \quad (3.14)$$
The second moment about the mean, also called the second central moment,
is nothing else than the variance of g(x) = x,
$$var(\tilde X) = \int_{-\infty}^{\infty} [x - E(\tilde X)]^2 f(x)\,dx \quad (3.15)$$
$$= \int_{-\infty}^{\infty} x^2 f(x)\,dx - [E(\tilde X)]^2 \quad (3.16)$$
$$= E(\tilde X^2) - [E(\tilde X)]^2, \quad (3.17)$$
where f(x) is the value of the probability density function of the random variable
$\tilde X$ at x. A more generic expression for the variance is dispersion. We can say that ...
In time series econometrics, and financial economics, there is a small set of distributions
that one has to know. The following is a list of common distributions:
Normal distribution: $N(\mu, \sigma^2)$
Log-normal distribution: $LogN(\mu, \sigma^2)$
Student t distribution: $St(\mu, \sigma^2, \nu)$
Cauchy distribution: $Ca(\mu, \sigma^2)$
Gamma distribution: $Ga(\alpha, \beta)$
Chi-square distribution: $\chi^2(\nu)$
F distribution: $F(d_1, d_2)$
Poisson distribution: $Pois(\lambda)$
Uniform distribution: $U(a, b)$
This test is known as the Jarque-Bera (JB) test and is the most common
test for normality in regression analysis. The null hypothesis is that the series is
normally distributed. Let $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ represent the mean, the variance, the
skewness and the kurtosis. The null of a normal distribution is rejected if the test
statistic is significant. The fact that the test is only valid asymptotically means
that we do not know the reason for a rejection in a limited sample. In a less than
asymptotic sample, rejection of normality is often caused by outliers. If we think
the most extreme value(s) in the sample are non-typical outliers, 'removing' them
from the calculation of the sample moments usually results in a non-significant JB
test. Removing outliers is ad hoc. It could be that these outliers are typical
values of the true underlying distribution.
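As a rough sketch (my own illustration, not from the text): the JB statistic combines sample skewness S and kurtosis K as JB = T/6·(S² + (K − 3)²/4) and is compared with a χ²(2) distribution. A minimal Python version, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

def jarque_bera(x):
    """Jarque-Bera test statistic and asymptotic chi-square(2) p-value."""
    x = np.asarray(x, dtype=float)
    T = x.size
    m = x - x.mean()
    s2 = np.mean(m**2)                      # second central moment
    skew = np.mean(m**3) / s2**1.5          # sample skewness
    kurt = np.mean(m**4) / s2**2            # sample kurtosis (normal -> 3)
    jb = T / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
    pval = stats.chi2.sf(jb, df=2)          # asymptotic p-value
    return jb, pval

rng = np.random.default_rng(1)
print(jarque_bera(rng.normal(size=500)))             # should not reject normality
print(jarque_bera(rng.standard_t(df=3, size=500)))   # fat tails, usually rejects
```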
2 For these moments to be meaningful, the series must be stationary. Also, we would like
{x_t} to be an independent process. Finally, notice that the estimators of the higher
moments suggested here are not necessarily efficient estimators.
3 This test statistic is for a variable with a non-zero mean. If the variable is adjusted for its
mean (say an estimated residual), the second term should be removed from the expression.
We will now generalize the work of the previous sections by considering a vector
of n random variables,
$$\tilde X = (\tilde X_1, \tilde X_2, \dots, \tilde X_n), \quad (3.18)$$
whose elements are continuous random variables with density functions $f(x_1), \dots, f(x_n)$,
and distribution functions $F(x_1), \dots, F(x_n)$. The joint distribution will
look like,
$$F(x_1, x_2, \dots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(x_1, x_2, \dots, x_n)\,dx_1 \cdots dx_n, \quad (3.19)$$
For independent random variables we can define the r:th product moment as,
$$E(\tilde X_1^{r_1} \tilde X_2^{r_2} \cdots \tilde X_n^{r_n}) \quad (3.21)$$
$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n}\, f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n \quad (3.22)$$
$$= E(\tilde X_1^{r_1})\,E(\tilde X_2^{r_2}) \cdots E(\tilde X_n^{r_n}). \quad (3.23)$$
It follows from this result that the variance of a sum of independent random
variables is merely the sum of the individual variances,
$$var(\tilde X_1 + \tilde X_2 + \dots + \tilde X_n) = var(\tilde X_1) + var(\tilde X_2) + \dots + var(\tilde X_n). \quad (3.24)$$
$$cov(\ddot Z, \ddot Z) = B\,\Sigma\,B', \quad (3.28)$$
and
$$cov(\ddot Y, \ddot Z) = A\,\Sigma\,B', \quad (3.29)$$
which in this case would mean that $y_t$, the observation at time t, is dependent
on all earlier observations on $\tilde Y_t$.
It is seldom that we deal with independent variables when modeling economic
time series. For example, a simple first order autoregressive model like $y_t = a y_{t-1} + \varepsilon_t$
implies dependence between the observations. The same holds for all
time series models. Despite this shortcoming, density functions with independent
random variables are still good tools for describing time series modelling, because
the results based on independent variables carry over to dependent variables in
almost every case.
In this section we look at the linear regression model starting from two random
variables $\tilde Y$ and $\tilde X$. Two regressions can be formulated,
$$y = \alpha + \beta x + \varepsilon, \quad (3.34)$$
and
$$x = \gamma + \delta y + \nu. \quad (3.35)$$
Whether one chooses to condition y on x, or x on y, depends on the parameter
of interest. In the following it is shown how these regression expressions are constructed
from the correlation between x and y, and their first moments, by making
use of the (bivariate) joint density function of x and y. (One can view this section
as an exercise in using density functions.)
$$D(y \mid x; \theta), \quad (3.36)$$
The parameters in 3.38 can be estimated by using means, variances and covariances
of the variables, or, in other terms, by using some of the lower moments
of the joint distribution of $\tilde X$ and $\tilde Y$. Hence, the first step is to rewrite 3.38 in such a
way that we can write $\alpha$ and $\beta$ in terms of the means of $\tilde X$ and $\tilde Y$.
Looking at the LHS of 3.38 it can be seen that a multiplication of the conditional
density with the marginal density for $\tilde X$, g(x), leads to the joint density.
Given the joint density we can choose to integrate out either x or y. In this case
we choose to integrate over x. Thus we have, after multiplication,
$$\int y\, D(y \mid x; \theta)\,dy\, g(x) = \alpha\, g(x) + \beta x\, g(x). \quad (3.39)$$
$$\mu_y = \alpha + \beta\, E(\tilde X) = \alpha + \beta \mu_x. \quad (3.41)$$
$$E(\tilde Y \mid x; \theta) = \alpha + \beta x. \quad (3.42)$$
We now have one equation to solve for the two unknowns. Since we have
used up the means, let us turn to the variances by multiplying both sides of 3.38
with x and performing the same operations again.
Integrating over x,
$$\int\!\!\int x\, y\, D(y \mid x; \theta)\, g(x)\,dy\,dx = \alpha \int x\, g(x)\,dx + \beta \int x^2 g(x)\,dx. \quad (3.44)$$
$$\sigma_{xy} = \beta\, \sigma_x^2, \quad (3.48)$$
which gives
$$\beta = \frac{\sigma_{xy}}{\sigma_x^2}. \quad (3.50)$$
Using these expressions in the linear regression line leads to,
$$E(\tilde Y \mid x; \theta) = \mu_y + \frac{\sigma_{xy}}{\sigma_x^2}(x - \mu_x) = \alpha + \beta x, \quad (3.51)$$
$$E(\tilde X \mid y; \theta) = \mu_x + \frac{\sigma_{yx}}{\sigma_y^2}(y - \mu_y) = \gamma + \delta y. \quad (3.52)$$
We can now make use of the correlation coefficient and the parameter $\beta$ in the
linear regression. The correlation coefficient between $\tilde X$ and $\tilde Y$ is defined as,
$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{or} \quad \sigma_x \sigma_y \rho = \sigma_{xy}. \quad (3.53)$$
So, if the two variables are independent their covariance is zero, and the correlation
is also zero. Therefore, the conditional mean of each variable does not
depend on the mean and variance of the other variable. The final message
is that a non-zero correlation, between two normal random variables, results in a
linear relationship between them. With a multivariate model, with more than two
random variables, things are more complex.
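To make the moment-based construction concrete, here is a small Python sketch (my own illustration; the parameter values are invented) that draws from a bivariate normal distribution and checks that the regression slope computed from the moments, β = σ_xy/σ_x², matches the OLS slope of y on x.

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw from a bivariate normal with a non-zero correlation
mu = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
x, y = rng.multivariate_normal(mu, cov, size=10_000).T

# Slope and intercept from the lower moments of the joint distribution
beta_moments = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # sigma_xy / sigma_x^2
alpha_moments = y.mean() - beta_moments * x.mean()               # mu_y - beta*mu_x

# The same slope from an OLS regression of y on a constant and x
X = np.column_stack([np.ones_like(x), x])
alpha_ols, beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta_moments, beta_ols)    # essentially identical
print(alpha_moments, alpha_ols)
```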
The MLE approach starts from a random variable $\tilde X_t$, and a sample of T independent
observations $(x_1, x_2, \dots, x_T)$. The joint density function is
$$f(x_1, x_2, \dots, x_T; \theta) = f(x; \theta) = \prod_{t=1}^{T} f(x_t; \theta), \quad (4.1)$$
$$L(\theta; x), \quad (4.3)$$
where the parameters are seen as a function of the sample. It is often convenient
to work with the log of the likelihood instead, leading to the log likelihood
$$\log L(\theta; x) = l(\theta; x). \quad (4.4)$$
What is left is to find the maximum of this function with respect to the parameters
in $\theta$. The maximum, if it exists, is found by solving the system of k
simultaneous equations,
$$\frac{\partial l(\theta; x)}{\partial \theta_i} = 0, \quad (4.5)$$
$$var(\hat\theta) = [I(\theta)]^{-1}. \quad (4.8)$$
So far we have not assigned any specific distribution to the density function.
Let us assume a sample of T independent normal random variables $\{\tilde X_t\}$. The
normal distribution is particularly easy to work with since it only requires two
parameters to describe it. We want to estimate the first two moments, the mean
$\mu$ and the variance $\sigma^2$, thus $\theta = (\mu, \sigma^2)$. The likelihood is,
$$L(\theta; x) = (2\pi\sigma^2)^{-T/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{T}(x_t - \mu)^2\right], \quad (4.9)$$
and,
$$\frac{\partial l}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(x_t - \mu)^2. \quad (4.12)$$
$$\sum_{t=1}^{T}(x_t - \mu)^2 - T\sigma^2 = 0. \quad (4.14)$$
If this system is solved for $\mu$ and $\sigma^2$ we get the estimates of the mean and the
variance as1
$$\hat\mu_x = \frac{1}{T}\sum_{t=1}^{T}x_t, \quad (4.15)$$
$$\hat\sigma_x^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \hat\mu_x)^2 = \frac{1}{T}\sum_{t=1}^{T}x_t^2 - \left[\frac{1}{T}\sum_{t=1}^{T}x_t\right]^2. \quad (4.16)$$
1 The correction involves multiplying the estimated variance by T/(T - 1).
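The following Python sketch (my own numeric illustration, with made-up sample values) maximizes the log likelihood l(μ, σ²; x) for a simulated normal sample and confirms that the optimum matches the closed-form estimates (4.15)-(4.16).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=200)   # simulated sample, T = 200
T = x.size

def neg_loglik(params):
    """Negative normal log likelihood; params = (mu, log_sigma2)."""
    mu, log_s2 = params
    s2 = np.exp(log_s2)                         # keep the variance positive
    return 0.5 * (T * np.log(2 * np.pi * s2) + np.sum((x - mu) ** 2) / s2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, s2_hat = res.x[0], np.exp(res.x[1])

# Closed-form ML estimates: sample mean and variance with divisor T
print(mu_hat, x.mean())
print(s2_hat, np.mean((x - x.mean()) ** 2))
```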
We have derived the maximum likelihood estimates for a single independent normal
variable. How does this relate to a linear regression model? Earlier, when
we discussed the moments of a variable, we showed how it was possible, as a general
principle, to substitute a random variable with a function of the variable.
The same reasoning applies here. Say that $\tilde X$ is a function of two other random
variables $\tilde Y$ and $\tilde Z$. Assume the linear model
$$y_t = \beta z_t + x_t, \quad (4.22)$$
where $\tilde Y$ is a random variable, with observations $\{y_t\}$, and $z_t$ is, for the time
being, assumed to be a deterministic variable. (This is not a necessary assumption.)
Instead of using the symbol $x_t$ for observations on the random variable $\tilde X$, let us
set $x_t = \varepsilon_t$ where $\varepsilon_t \sim NID(0, \sigma^2)$. Thus, we have formulated a linear regression
model with a white noise residual. This linear equation can be rewritten as,
$$\varepsilon_t = y_t - \beta z_t, \quad (4.23)$$
where the RHS is the function to be substituted for the single normal variable
$x_t$ used in the MLE example above. The algebra gets a bit more complicated but
the principal steps are the same.2 The unknown parameters in this case are $\beta$ and $\sigma^2$.
2 As a consequence of the more complex algebra, the computer algorithms for estimating the variables
will also get more complex. For the ordinary econometrician there are a lot of software
packages that cover most of the cases.
The last factor in this expression can be identified as the sum of squares function,
$S(\beta)$. In matrix form we have,
$$S(\beta) = \sum_{t=1}^{T}(y_t - \beta z_t)^2 = (Y - Z\beta)'(Y - Z\beta) \quad (4.25)$$
and
$$l(\beta, \sigma^2; y, z) = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log \sigma^2 - \frac{1}{2\sigma^2}(Y - Z\beta)'(Y - Z\beta). \quad (4.26)$$
$$\frac{\partial S}{\partial \beta} = -2Z'(Y - Z\beta). \quad (4.27)$$
Notice that the ML estimator of the linear regression model is identical to the
OLS estimator.
The variance estimate is,
$$\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon / T, \quad (4.29)$$
which in contrast to the OLS estimate is biased.
To obtain these estimates we did not have to make any direct assumptions
about the distribution of $y_t$ or $z_t$. The necessary and sufficient condition is that $y_t$
conditional on $z_t$ is normal, which means that $y_t - \beta z_t = \varepsilon_t$ should follow a normal
distribution. This is the reason why MLE is feasible even though $y_t$ might be a
dependent AR(p) process. In the AR(p) process the residual term is an independent
normal random variable. The MLE is given by substitution of the independently
distributed normal variable with the conditional mean of $y_t$.
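A small Python check (again my own illustration with made-up numbers): the slope that maximizes the likelihood equals the OLS slope, while the ML variance estimate divides by T instead of T − k and is therefore biased downwards.

```python
import numpy as np

rng = np.random.default_rng(4)
T, beta, sigma = 50, 0.8, 1.0
z = rng.normal(size=T)
y = beta * z + rng.normal(scale=sigma, size=T)

# OLS / ML estimate of beta (they coincide for the linear model with normal errors)
beta_hat = (z @ y) / (z @ z)
resid = y - beta_hat * z

sigma2_ml = resid @ resid / T          # ML estimate, biased (divides by T)
sigma2_ols = resid @ resid / (T - 1)   # OLS estimate, degrees-of-freedom corrected

print(beta_hat, sigma2_ml, sigma2_ols)
```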
The above results can be extended to a vector of normal random variables. In
this case we have a multivariate normal distribution where, in the bivariate case,
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} \quad \text{with} \quad |\Sigma| = \sigma_1^2\sigma_2^2(1 - \rho^2), \quad (4.32)$$
where $\rho$ is the correlation coefficient. As can be seen, $|\Sigma| > 0$ unless $\rho^2 = 1$. If
$\sigma_{12} = \sigma_{21} = 0$, the two processes are independent and can be estimated individually.
(To be completed: add figure of a normally distributed variable with the value of the likelihood
function L on the vertical axis and the parameter value on the horizontal axis, with
$\hat\theta$ indicating the maximum value of L.)
There are three approaches to testing a statistical model. The first is
to start with an unrestricted model and impose restrictions on the estimated
model. The second approach is to impose the restrictions prior to estimation, and
estimate a restricted model. The test is then performed by asking if the restrictions
should be lifted. The third approach is to test for significant differences between
an estimated restricted model and an estimated unrestricted model. The last
approach involves estimating two models, rather than one.
The three approaches of testing are named
Likelihood Ratio tests (LR) - estimate both the unrestricted and the restricted
models.
A test is labeled Wald, Lagrange Multiplier or Likelihood Ratio depending
on how it is constructed. A typical Wald test is the "t-test" for significance.
A Lagrange multiplier test is the LM test of autocorrelation. Finally, the
F-test for testing the significance of one or several parameters in a group
represents a typical Likelihood Ratio test.
$$\lambda = \frac{\hat L_R}{\hat L_U}. \quad (5.1)$$
This leads to the test statistic $-2\ln\lambda$, which has a $\chi^2(R)$ distribution, where
R is the number of restrictions.
The Wald test compares (squared) estimated parameters with their variances.
In a linear regression, if the residual is $NID(0, \sigma^2)$, then $\hat\beta \sim N(\beta, var(\hat\beta))$, so
$(\hat\beta - \beta) \sim N(0, var(\hat\beta))$, and a standard t-test will tell if $\beta$ is significant or not.
More generally, if we have a vector of normally distributed random variables
$x \sim N_J(\mu, \Sigma)$, then we have
$$(x - \mu)'\,\Sigma^{-1}(x - \mu) \sim \chi^2(J). \quad (5.2)$$
The LM test statistic for testing if the parameters $\alpha_1$ to $\alpha_p$ are zero amounts to
estimating the equation with OLS and calculating the test statistic $T \cdot R^2$, distributed
as $\chi^2(p)$ under the null of no autocorrelation. Similar tests can be formulated for
testing various forms of heteroscedasticity.
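As a sketch of the T·R² principle (my own illustration, using a hand-rolled Breusch-Godfrey-type auxiliary regression rather than any particular package's procedure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Simulate a regression whose errors follow an AR(1) process
T, p = 200, 2
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

# Step 1: OLS of y on a constant and x, keep the residuals
X = np.column_stack([np.ones(T), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: auxiliary regression of e_t on X and p lags of e_t
lags = np.column_stack([np.r_[np.zeros(i), e[:-i]] for i in range(1, p + 1)])
Z = np.column_stack([X, lags])
fit = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
R2 = 1 - np.sum((e - fit) ** 2) / np.sum((e - e.mean()) ** 2)

# LM statistic T*R^2 is chi-square(p) under the null of no autocorrelation
LM = T * R2
print(LM, stats.chi2.sf(LM, df=p))
```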
Tests can often be formulated in such a way that they follow both $\chi^2$ and
F-distributions. In less than large samples the F-distribution is the better one to use.
The general rule for choosing among tests based on the F or the $\chi^2$ distribution
is to use the F distribution, since it has better small-sample properties.
If the information matrix is known (meaning that it is not necessary to estimate
it), all three tests would lead to the same test statistic, regardless of the chosen
distribution, $\chi^2$ or F. If all three approaches led to the same test statistic,
we would have $R_W = R_{LR} = R_{LM}$. However, when the information matrix is
estimated we get the following relation between the tests: $R_W \ge R_{LR} \ge R_{LM}$.
Remember (1) that when dealing with limited samples the three tests might
lead to different conclusions, and (2) that if the null is rejected the alternative can
never be accepted. As a matter of principle, statistical tests only reject the
null hypothesis. Rejection of the null does not lead to accepting the alternative
hypothesis; it leads only to the formulation of a new null. As an example, in a test
where the null hypothesis is homoscedasticity, the alternative is not necessarily
heteroscedasticity. Tests are generally derived on the assumption that "everything
else" is OK in the model. Thus, in this example, rejection of homoscedasticity
could be caused by autocorrelation, non-normality, etc. The econometrician has
to search for all possible alternatives.
6. RANDOM WALKS, WHITE NOISE AND ALL THAT
This section looks at different types of stochastic time series processes that are
important in economics and finance. A time series is a series where the data
is ordered by time. A random variable $\tilde X$ is a variable which can take on more
than one value, and for each value it can take on there is a number between zero
and one that describes the probability of observing that value. We distinguish
between discrete and continuous random variables. Discrete random variables can
only take on a finite number of outcomes. A continuous random variable can take
on any value between $-\infty$ and $+\infty$. The mathematical model of the probabilities
associated with a random variable is given by the distribution function F(x),
$F(x) = P(\tilde X \le x)$. If we have a continuous random variable, we can define the
probability density function of the random variable as $f(x) = dF(x)/dx$. Random
variables are characterized by their probability functions, and their moments.
First, second, third and fourth moments all describe the characteristics of a
random variable. By estimating these we describe a random variable. All moments
have direct implications for risk-and-return decisions. Mean = return, variance
= risk; skewness and kurtosis imply deviations from normality and might affect
behavior. To be completed.
A stochastic time series process is then made up of a random variable that
over time can take on more than one value.
We denote a stochastic process as $\{\tilde X_t\}_0^T$, indicating that it starts at time zero
and continues to time T. To define a stochastic time series process we start
with the random variable $\tilde X_t$, which can take on different values
at the future periods $i = 1, 2, 3, \dots, n$, where n might go to infinity. Often we
will talk about the conditional expectation of $\tilde X_t$: we want to estimate the most
likely future value, given the information we have today. A stochastic time series
process can be discrete or continuous. A discrete series is only changing values
at discrete time periods, while a continuous process is, or can potentially be, changing
values continuously and not only at discrete time intervals.
The conditional expectation is written as $E(\tilde X_{t+1} \mid I_t)$ or $E_t(\tilde X_{t+1})$. To formalize
the use of conditional expectations, assume a probability space $(\Omega, z, P)$, where
$\Omega$ is the total sample space (or possible states of the world), z denotes the tribe
of subsets of $\Omega$ that are outcomes (observations), and P is a probability measure
associated with the outcomes. A very practical question in modeling is if there
exists a simple mathematical form for associating outcomes with probabilities.
Usually we will refer to the tribe of subsets z as the information set $I_t$. We will
assume that information is not forgotten by the decision makers, so the information
set is increasing over time,
$$I_{t_0} \subseteq I_{t_1} \subseteq \dots \subseteq I_{t_k} \subseteq I_{t_{k+1}} \subseteq \dots$$
In a discrete time setting we refer to these increasing sets as an increasing sequence
of sigma-fields. In a continuous time setting, where new information arrives
continuously, rather than at discrete time intervals, the increasing information set
is referred to as a filtration, or an increasing family of sigma-algebras. A very unofficial ...
A random variable is a white noise process if its expected mean is equal to zero,
$$E[\varepsilon_t] = \mu = 0, \quad (6.1)$$
its variance exists and is constant, $\sigma^2$, and there is no memory in the process,
so the autocorrelation function is zero,
$$E[\varepsilon_t \varepsilon_t] = \sigma^2 \quad (6.2)$$
$$E[\varepsilon_t \varepsilon_s] = 0 \quad \text{for } t \ne s. \quad (6.3)$$
In addition, the white noise process is supposed to follow a normal and independent
distribution, $\varepsilon_t \sim NID(0, \sigma^2)$. A standardized white noise has a
distribution like NID(0, 1); dividing $\varepsilon_t$ by $\sigma = \sqrt{\sigma^2}$ gives $(\varepsilon_t/\sigma) \sim NID(0, 1)$.
The independent normal distribution has some important characteristics. First,
if we add normal random variables together, the sum will have a mean equal to the
sum of the means of all variables. Thus, adding T standardized white noise variables together as
$z_T = \sum_{t=1}^{T}(\varepsilon_t/\sigma)$ forms a new variable with mean $E(z_T) = E(\varepsilon_1/\sigma) + E(\varepsilon_2/\sigma) +
\dots + E(\varepsilon_T/\sigma) = (1/\sigma)[E(\varepsilon_1) + E(\varepsilon_2) + \dots + E(\varepsilon_T)] = 0$. Since each variable is independent,
we have the variance as $\sigma_z^2 = \sigma_{z,1}^2 + \sigma_{z,2}^2 + \dots + \sigma_{z,T}^2 = 1 + 1 + \dots + 1 = T$.
The random variable is distributed as $z_T \sim NID(0, T)$, with a standard deviation
given as $\sqrt{T}$. As the forecast horizon for $z_T$ increases, a 95% forecast confidence
interval also increases with $\pm 1.96\sqrt{T}$.
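A quick Python simulation (my own illustration) of this point: summing T standardized white noise draws many times, and checking that the variance of the sum is close to T, so the 95% interval grows with √T.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n_sims = 50, 20_000

# Each row is one simulated path of T standardized white noise draws
eps = rng.normal(size=(n_sims, T))
z_T = eps.sum(axis=1)              # the sum z_T of T independent N(0,1) variables

print(z_T.mean())                  # close to 0
print(z_T.var())                   # close to T = 50
print(1.96 * np.sqrt(T))           # half-width of the 95% forecast interval
```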
In the same way, we can define the distribution, mean and variance over
subsets of time. Suppose $\varepsilon_t \sim N(0, 1)$ is defined for the period of a year. The variable
will be distributed over six months as N(0, 1/2), with a standard deviation of
$1/\sqrt{2}$; over three months the distribution is N(0, 1/4), with a standard deviation
of $1/\sqrt{4}$. For any fraction 1/n of the year, the distribution becomes NID(0, 1/n)
and the standard deviation $1/\sqrt{n}$. This property of the variable, following from the
assumption of an independent distribution, is known as the Markov property. Given that
$x_0$ is generated from an independent normal distribution $N(\mu, \sigma^2)$, the expected
future value of $x_t$ at time $0 + T$ is distributed as $N(\mu T, \sigma^2 T)$.
To sum up, it follows from the definition that a white noise process is not
linearly predictable from its own past. The expected mean of a white noise,
conditional on its history, is zero,
$$E[\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \dots, \varepsilon_1] = E[\varepsilon_t] = 0. \quad (6.4)$$
This is a relatively weak condition. A white noise process might be predicted
by other variables, and by its own past using non-linear functions.
A process is called an innovation if it is unpredictable given some information
set $I_t$. A process $y_t$ is an innovation process with respect to an information set if
$$E[y_t \mid I_t] = 0. \quad (6.5)$$
The non-parametric white noise can be used to define (or generate) autoregressive
models (AR) and moving average models (MA). The AR(p) model is
$$y_t = \delta + a_1 y_{t-1} + \dots + a_p y_{t-p} + \varepsilon_t, \quad (6.6)$$
where $\delta$ is related to $E(y_t) = \mu$, and $\varepsilon_t \sim NID(0, \sigma^2)$. Or, using the lag operator $L^i x_t = x_{t-i}$,
$$A(L) y_t = \varepsilon_t, \quad (6.7)$$
where $A(L) = (1 - a_1 L - a_2 L^2 - \dots - a_p L^p)$. The eigenvalues associated with
this polynomial inform about the time path of $y_t$. The moving average model of
order q is
$$y_t = \mu + B(L)\varepsilon_t, \quad (6.9)$$
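To illustrate (a sketch of my own, not from the text), the following Python snippet simulates an AR(1) process and a white noise series and compares their first-order sample autocorrelations.

```python
import numpy as np

rng = np.random.default_rng(7)
T, a1 = 500, 0.8

eps = rng.normal(size=T)           # white noise
y = np.zeros(T)                    # AR(1): y_t = a1*y_{t-1} + eps_t
for t in range(1, T):
    y[t] = a1 * y[t - 1] + eps[t]

def acf1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return (x[1:] @ x[:-1]) / (x @ x)

print(acf1(eps))   # close to 0
print(acf1(y))     # close to a1 = 0.8
```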
where n might be equal to infinity. This definition does not rule out the case
that there are other variables that can be correlated with $x_t$ and thereby also
predict $x_{t+1}$. We can also say that a random walk has an infinitely long memory.
The mean is zero, and the variance and autocovariance are equal to $var(\tilde X_t) = \sigma^2 t$ and
$Cov(\tilde X_t, \tilde X_{t-n}) = \sigma^2(t - n)$.
$$E(x_t) = E\left(\sum_{i=1}^{t}\varepsilon_i\right) = 0, \quad (6.12)$$
$$var(x_t) = E(x_t^2) = E\left[\left(\sum_{i=1}^{t}\varepsilon_i\right)^2\right] = \sum_{i=1}^{t}\sum_{j=1}^{t}E[\varepsilon_i\varepsilon_j] = \sigma^2 t. \quad (6.13)$$
We can see that, given a sufficiently large number of observations, there is an
infinite memory. All theoretical autocorrelations are equal to 1.0.
$$x_t = x_0 + \sum_{i=1}^{t}\varepsilon_i. \quad (6.16)$$
Thus, a random walk is a sum of white noise errors from the beginning of the
series ($x_0$). Hence, the value of today is dependent on shocks built up from the beginning
of the series. All shocks in the past are still affecting the series today.
Furthermore, all shocks are equally important. The process formed by $\sum_{i=1}^{t}\varepsilon_i$ is
called a stochastic trend. In contrast to a deterministic trend, the stochastic trend
is changing its slope in a random way period by period. Ex post a stochastic trend
might look like a deterministic trend. Thus, it is not really possible to determine
whether a variable is driven by a stochastic or a deterministic trend, or a combination
of both.
If we add a constant term to the model we get a random walk with drift,
$$x_t = \mu + x_{t-1} + \varepsilon_t, \quad (6.17)$$
where the constant $\mu$ represents the drift term. In this process $x_t$ is driven by
both a deterministic and a stochastic trend. If we perform the same backward
substitution as above, we get,
$$x_t = \mu t + \sum_{i=1}^{t}\varepsilon_i + x_0. \quad (6.18)$$
$$x_t = \alpha + \beta t, \quad (6.19)$$
A random walk process is also a series integrated of order one; it is also called a
unit root process, and it contains a stochastic trend. Furthermore, a random walk
process can also be embedded in another process, say an ARIMA(p, d, q) process.
The problem is that it is difficult to do inference on random walk variables
(and integrated variables) because the estimated parameter on the lagged term will
not follow a standard normal distribution. Hence, ordinary t, chi-square and
F distributions are not suitable for inference. Parameter estimates will generally
be asymptotically unbiased, but their standard errors and variances do not follow
standard distributions.
For instance, a common t-test cannot be used to test if a = 1 in the regression
$$x_t = a x_{t-1} + \varepsilon_t. \quad (6.20)$$
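A Monte Carlo sketch in Python (my own illustration) of why this matters: under the null a = 1, the "t-ratio" for â − 1 is not centred like a standard normal but is shifted to the left (the Dickey-Fuller shape), so normal critical values would be misleading.

```python
import numpy as np

rng = np.random.default_rng(8)
T, n_sims = 200, 5000
t_ratios = np.empty(n_sims)

for s in range(n_sims):
    eps = rng.normal(size=T)
    x = np.cumsum(eps)                      # random walk, true a = 1
    y, ylag = x[1:], x[:-1]
    a_hat = (ylag @ y) / (ylag @ ylag)      # OLS of x_t on x_{t-1}
    resid = y - a_hat * ylag
    s2 = resid @ resid / (len(y) - 1)
    se = np.sqrt(s2 / (ylag @ ylag))
    t_ratios[s] = (a_hat - 1.0) / se        # "t-test" of a = 1

# The simulated distribution is skewed to the left of N(0,1)
print(np.mean(t_ratios), np.percentile(t_ratios, [5, 50, 95]))
```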
$$E[(\tilde X_{t+s} - \tilde X_t) \mid I_t] = E(\tilde X_{t+s} \mid I_t) - x_t = 0, \quad (6.22)$$
which says that, on average, the expected value is growing over time. A supermartingale
is defined as
$$E[\tilde X_{t+s} \mid I_t] \le x_t,$$
from the process by conditioning or direct calculation. In fact most variables can be transformed
into a martingale in this way. An alternative way of transforming a variable into a martingale is
to transform its probability distribution. In this method you look for a probability distribution
which is 'equivalent' to the one generating the conditional expectations. This type of distribution
is called an equivalent martingale distribution.
variable ($\tilde G_t$), we have $E(P_{t+1}) = g_t + P_t$, which takes us even further away from
the random walk.
It is important to distinguish between martingales and random walks. Financial
theory ends in statements about the expected mean of a variable with respect
to a given information set. A random walk is defined in terms of its own past only.
Thus, saying that a variable is a random walk does not exclude the case that there
exists an information set for which the variable is not a martingale.
Furthermore, the residuals in a random walk model are by definition independent,
if we assume them to be white noise. But a martingale describes the behavior
of the first moment of a random variable. It does not imply independence between
the higher moments of the series. If we model a martingale by a first order autoregressive
process, we might find that the errors are dependent through higher
moments. The variance of $\varepsilon_t$ is then not $\sigma^2$, but a function of its own past, like
$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \nu_t, \quad (6.23)$$
where $\nu_t$ is a white noise process. This is a first order ARCH(1) model (Auto
Regressive Conditional Heteroscedasticity), which implies that a large shock to
the series is likely to be followed by another large shock. In addition, it implies
that the residuals are not independent of each other.
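A short Python sketch (my own illustration, with arbitrary parameter values) that simulates an ARCH(1) process and shows that the levels are roughly uncorrelated while the squared series is autocorrelated, i.e. volatility clustering:

```python
import numpy as np

rng = np.random.default_rng(9)
T, a0, a1 = 2000, 0.2, 0.7

eps = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = a0 / (1 - a1)                    # start at the unconditional variance
for t in range(1, T):
    sigma2[t] = a0 + a1 * eps[t - 1] ** 2    # ARCH(1) conditional variance
    eps[t] = np.sqrt(sigma2[t]) * rng.normal()

def acf1(x):
    x = x - x.mean()
    return (x[1:] @ x[:-1]) / (x @ x)

print(acf1(eps))        # near zero: levels are serially uncorrelated
print(acf1(eps ** 2))   # clearly positive: dependence through the second moment
```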
The conclusion is that we must be careful when reading articles which claim
that the exchange rate, or some other variable, should be, or is, a random walk; often
what the authors really mean is that the variable is a martingale, conditional on
some information.
The martingale property is directly related to the efficient market hypothesis
(EMH), which sets out the conditions under which changes in asset prices become
unpredictable given different types of information.
Markov3 processes represent a general type of series with the property that the
value at time t contains all information necessary to form probability assessments
of all future values of the variable. Compared with the martingale property above,
this property is more far-reaching. The martingale property is concerned with the
conditional expectation of a variable, and not with the actual distribution function
and the higher moments of the variable. Markov processes and the associated
Markov property are important because they help us to form stochastic time series
processes. In economics and finance we would like to explain how expectations are generated
and how expectations affect the outcome of observed prices and quantities on
various markets.
In particular, in financial economics and the pricing of derivatives, we like to
model asset prices as continuous stochastic processes. Once we can trace the price
of an asset continuously over time into the future, we can also determine the price
of derivatives through replication and arbitrage. In addition, we learn how to use
derivatives to continuously hedge risky positions.4
To predict or generate future possible paths of a Markov variable, we only
need to know the most recent value, or its recent values. This is,
3 Markov is known for a number of results, including the so-called Markov estimates that
value from some underlying asset, and (2) at the time of expiration has exactly the same price
as the underlying asset.
$$\text{Prob}(\tilde X_{t+s} \le x_{t+s} \mid x_1, x_2, \dots, x_t) = \text{Prob}(\tilde X_{t+s} \le x_{t+s} \mid x_t), \quad (6.24)$$
where s > 0. The expression says that all probability statements about future
values of the random variable $\tilde X_{t+s}$ depend only on the value the variable
takes at time t, and do not depend on earlier realizations. By stating that a
variable is a Markov process we put a restriction on the memory of the process.
The AR(1) model, and the random walk, are first-order Markov processes,
$$x_t = a_1 x_{t-1} + \varepsilon_t \quad \text{where } \varepsilon_t \sim NID(0, \sigma^2). \quad (6.25)$$
Given that we know that $\varepsilon_t$ is a white noise process ($NID(0, \sigma^2)$) and can
observe $x_t$, we know all there is to know about $x_{t+1}$, since $x_t$ contains
all the information about the future. In practical terms, it is not necessary to work
with the whole series, only a limited 'present'. We can also say that the 'future'
of the process, given the 'present', is independent of the 'past'. For a first order
Markov process, the expected value of $\tilde X_{t+1}$, given all its possible present and
historical values $\tilde X_t, \tilde X_{t-1}, \tilde X_{t-2}, \dots$, can be expressed as,
$$E[\tilde X_{t+1} \mid \tilde X_t, \tilde X_{t-1}, \tilde X_{t-2}, \dots, \tilde X_1] = E[\tilde X_{t+1} \mid \tilde X_t]. \quad (6.26)$$
Thus, a first order Markov process is also a martingale. Typically, the value of
$\tilde X_t$ is known at time t as $x_t$. The Markov property is a very convenient property if we
want to build theoretical models describing the continuous evolution of asset prices.
We can focus on the value today, and generate future time series, irrespective
of the past history of the process. Furthermore, at each period in the future we can
easily determine an 'exact' future value, which is the 'equilibrium' price for that
period.
The white noise process, as an example, is a Markov process. This follows from
the fact that we assumed that each $\varepsilon_t$ was independent of its own past, and
future. One outcome of the assumption of a normal and independent process was
that we could relatively easily form predictions and confidence intervals given only
the value of $\varepsilon_t$ 'today'.
The definition of a Markov process can be extended to an m:th order Markov
process, for which we have
$$E[\tilde X_{t+1} \mid \tilde X_t, \tilde X_{t-1}, \tilde X_{t-2}, \dots, \tilde X_1] = E[\tilde X_{t+1} \mid \tilde X_t, \tilde X_{t-1}, \dots, \tilde X_{t-m}]. \quad (6.27)$$
6.8 Brownian Motions
Consider the random walk model, $x_t = x_{t-1} + \varepsilon_t$, and assume that the distance
between t and t − 1 becomes smaller and smaller. As the distance between the
observations gets smaller, the function will in the end get so close to a continuous
function that it becomes indistinguishable from a function in continuous time,
$x(t) = x(t-1) + \varepsilon(t)$. This takes us to the random walk in continuous time,
known as a Brownian motion or Wiener process. This section introduces Brownian
motions (Wiener processes), geometric Brownian motions, jump diffusion models and
the Ornstein-Uhlenbeck process.
There are (at least) two very important reasons for studying Wiener processes.
The first is that the limiting distribution of most non-stationary variables in economics
and finance is given as a function of a Brownian motion. It is this knowledge
that helps us to understand the distribution of estimates based on non-stationary
variables. The second reason for learning about Brownian motions is
that they play an important role in modeling asset prices in finance.
A word of warning: though Brownian motions have nice mathematical properties,
it is not necessarily so that they also fit given data series better. Normal discrete
empirical modelling will take you a long way.
The random walk is defined in discrete time. The intuition behind the random
walk and the Brownian motion is as follows. If we let the steps between t and t − 1
become infinitely small, the random walk can be said to converge to a Brownian
motion (or Wiener process). As the distance between t and t − 1, alternatively
between t and t + 1, shrinks, it becomes harder and harder to distinguish between a discrete
time process and a continuous time process. In the end, the difference will be so
small that it will not matter.
These processes have a long history. The Brownian motion was named after
a Scottish botanist, Robert Brown, who in 1827 observed that small particles immersed
in a liquid exhibited ceaseless irregular motion. Brown himself, however,
named a few persons who had observed this phenomenon before him. In 1900 a
French mathematician named Bachelier described the random variation in stock
prices when he wanted to explain option prices. In 1917 Einstein observed similar
behavior in gas molecules. Finally, Norbert Wiener gave the process a rigorous
mathematical treatment in a series of papers during 1918 and 1923.
Is there a difference between what we call a Wiener process and a Brownian
motion? In practice the answer is no. The two terms can be, and are, used interchangeably.
If you look at the details you will find that the Brownian motion
has normally distributed increments. The Wiener process, on the other hand, is
explicitly assumed to be a martingale. No such statement is made for the Brownian
motion.5 In practice, these differences mean nothing (for more information
search for the Lévy theorem). In econometrics there is a tendency to use Wiener
processes to represent univariate processes and Brownian motions for multivariate
processes.
The most important characteristic of a Brownian motion is that all increments
are independent, and not predictable from the past. Thus the Brownian motion
can be said to be a martingale and it fulfills the Markov property. The latter means
that the distribution of future values at (t + dt) depends only on the current value
of x(t). This is a good characteristic of models describing uncertainty, in particular
situations where nature is evolving as a function of random steps that we cannot
5 See Neftci, Salih (2000), An Introduction to the Mathematics of Financial Derivatives, 2nd ed.
In terms of the change over a specific (possibly observable) time period we need to
introduce the notation $\Delta t$ to represent the change over some fraction of time t. By
using this notation we can let $\Delta t$ be a year or a month, and then, by changing $\Delta$, we
can let the length of the period become smaller and smaller. The change due to
the deterministic trend is written, per unit of time, as $\mu\Delta t$. The stochastic noise
that we add to dx over a given interval is written as $\sigma\varepsilon_t\sqrt{\Delta t}$, where $\varepsilon_t \sim NID(0, 1)$.
In the limit, as $\Delta t \to 0$, we have that $\Delta x \to dx$. In terms of small intervals the
Brownian motion becomes
$$\Delta x_t = \mu\Delta t + \sigma\varepsilon_t\sqrt{\Delta t}. \quad (6.29)$$
To understand the asymptotic properties of the Brownian motion we could let $t \to \infty$,
but there is a better way to see what happens. If we study a standardized
Brownian motion/Wiener process W(t) over the interval [0, T] we will find that
we can divide this interval into segments $t_i - t_{i-1}$,
The arithmetic Brownian motion is not well suited for asset prices, as their changes
seldom display a normal distribution. The log of asset prices, and returns, are better
described by a normal distribution. This takes us to the geometric Brownian
motion
$$\frac{dx_t}{x_t} = \mu\,dt + \sigma\,dW_t.$$
What happens here is that we assume that $\ln x_t$ has a normal distribution,
meaning that $x_t$ follows a log normal distribution, and $\mu\,dt + \sigma\,dW_t$ follows a
normal variable. Ito's lemma can be used to show that
$$d\ln x_t = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t.$$
The expected value of the geometric Brownian motion is $E(dx_t/x_t) = \mu\,dt$, and
the variance is $Var(dx_t/x_t) = \sigma^2\,dt$.
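A minimal Python sketch (my own illustration; the drift, volatility and step size are arbitrary) of simulating a geometric Brownian motion path with an Euler scheme on the log price, using the Ito-corrected drift above:

```python
import numpy as np

rng = np.random.default_rng(10)
mu, sigma = 0.05, 0.20          # annual drift and volatility (made-up values)
T, n = 1.0, 252                 # one year of daily steps
dt = T / n

# Simulate log prices: d ln x = (mu - sigma^2/2) dt + sigma dW
dW = rng.normal(scale=np.sqrt(dt), size=n)
log_x = np.log(100.0) + np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * dW)
x = np.exp(log_x)               # the simulated GBM price path, x_0 = 100

print(x[0], x[-1])
```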
There are several ways in which the model can be modified to better suit
real world asset prices. One way is to introduce jumps in the process, so-called
"jump diffusion models". This is done by adding a Poisson process to the geometric
Brownian motion,
$$\frac{dx_t}{x_t} = \mu\,dt + \sigma\,dW_t + U_t\,dN_t(\lambda),$$
where $U_t$ is a normally distributed random variable and $N_t$ represents a Poisson
process with intensity $\lambda$ to account for jumps in the price process.
The random walk model is good for asset prices, but not for interest rates.
The movements of interest rates are more bounded than those of asset prices. In this case
the so-called Ornstein-Uhlenbeck process provides a more realistic description of
the dynamics,
$$dr_t = \kappa(b - r_t)\,dt + \sigma\,dW_t.$$
Thus the idea behind the Ornstein-Uhlenbeck process is that it restricts the movements
of the variable r to be mean reverting, or to stay in a band, around b,
where b can be zero.
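For completeness, a Python sketch (my own illustration, with invented values for the mean-reversion speed κ, long-run level b and volatility σ) of an Euler-discretized Ornstein-Uhlenbeck path:

```python
import numpy as np

rng = np.random.default_rng(11)
kappa, b, sigma = 2.0, 0.03, 0.01    # mean-reversion speed, long-run level, volatility
T, n = 5.0, 5 * 252                  # five years of daily steps
dt = T / n

r = np.empty(n + 1)
r[0] = 0.10                          # start well above the long-run level b
for t in range(n):
    dW = rng.normal(scale=np.sqrt(dt))
    r[t + 1] = r[t] + kappa * (b - r[t]) * dt + sigma * dW   # Euler step

print(r[0], r[-1], r.mean())         # the path is pulled back towards b
```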
with a variance
$$var[\tilde X(t) - \tilde X(s)] = \sigma^2(t - s), \quad (6.35)$$
where $0 \le s < t$. Finally, since the increments are a martingale difference
process, we can assume that these increments follow a normal distribution, so
$\Delta\tilde X(t) \sim N[0, \sigma^2(t - s)]$. These assumptions lead to the density function,
$$D[x(t)] = \frac{1}{\sqrt{2\pi\sigma^2 t_1}}\exp\left[-\frac{x_1^2}{2\sigma^2 t_1}\right]\prod_{i=2}^{n}\frac{1}{\sqrt{2\pi\sigma^2(t_i - t_{i-1})}}\exp\left[-\frac{(x_i - x_{i-1})^2}{2\sigma^2(t_i - t_{i-1})}\right]. \quad (6.36)$$
When $\sigma^2 = 1$, the process is called a standard Wiener process or standard
Brownian motion. That the Brownian motion is quite special can be seen from
this density function. The sample path is continuous, but not differentiable.
[In physics this is explained as the motion of a particle which at no time has a
velocity.]
Wiener processes are of interest in economics for many reasons. First, they
offer a way of modeling uncertainty, especially in financial markets, where we
sometimes have an almost continuous stream of observations. Secondly, many
macroeconomic variables appear to be integrated or near integrated. The limiting
distributions of such variables are known to be best described as functions of
Wiener processes. In general we must assume that these distributions are non-standard.
To sum up, there are five important things to remember about the Brownian
motion/Wiener process:
1. It represents the continuous time (asymptotic) counterpart of a random walk.
2. It always starts at zero and is defined over 0 ≤ t < ∞.
3. The increments, any change between two points, regardless of the length of
the interval, are not predictable, are independent, and are distributed as N(0, (t − s)σ²), for 0 ≤ s < t.
4. It is continuous over 0 ≤ t < ∞, but nowhere differentiable. The intuition
behind this result is that a differential would imply predictability, which would
go against the previous condition.
5. Finally, a function of a Brownian motion/Wiener process will behave like a
Brownian motion/Wiener process.
The last characteristic is important, because most economic time series variables
can be classified as random walks, integrated or near-integrated processes.
In practice this means that their variances, covariances etc. have distributions
that are functionals of Brownian motions. Even in small samples, functionals
of Brownian motions describe the distributions associated with economic variables
that display tendencies of stochastic growth better than standard distributions do.
"Time is a great teacher, but unfortunately it kills all its pupils" Louis Hector
Berlioz
A time series is simply data ordered by time, and time series analysis is simply
a set of approaches that look for regularities in such data. Stochastic time
series play an important part in economics and finance. To forecast and analyse
these series it is necessary to take into account not only their stochastic nature but
also the fact that they are non-stationary, dependent over time and correlated
with each other. In theoretical models, the emphasis on intertemporal
decision making highlights the role expectations play in a world where decisions
must be made from information sets made up of stochastic processes.
All time series techniques aim at making the series more understandable by
decomposing them into different parts. This can be done in several ways. The
aim of this introduction is to give a general overview of the subject. A time series is
any sequence ordered by time. The sequence can be either deterministic or stochastic.
The primary interest in economics is in stochastic time series, where the
sequence is made up of random variables. A sequence of stochastic variables
ordered by time is called a stochastic time series process. The random variables
making up the process can either be discrete, taking on a given set of integer
values, or be continuous random variables taking on any real number between
−∞ and +∞. While discrete random variables are possible they are not common.
Stochastic time series can be analysed in the time domain or in the frequency
domain. The former approach analyses stochastic processes in given
time periods like days, weeks, years etc. The frequency approach aims at decomposing
the process into frequencies by using trigonometric functions like sines and cosines.
Spectral analysis is an example of analysis in the frequency domain, used to
identify regularities like seasonal factors, trends and systematic lags in adjustment.
In economics and finance, where we are faced with given observations and we
study the behavior of agents operating in real time, the time domain is the most
interesting road ahead. There are relatively few problems that are interesting to
analyze in the frequency domain.
Another dimension in modeling is processes in discrete time or in continuous
time. The principal difference here is that the stochastic variables in a
continuous time process can be measured at any time t, and that they can take
different values at any time. In a discrete time process, the variables are observed
at fixed intervals of time (t), and they do not change between these observation
points. Truly discrete time variables are not common in finance and economics; there
are few, if any, variables that remain fixed between their points of observation.
The distinction between continuous time and discrete time is not a matter of
measurability alone. A common mistake is to confuse the fact that economic
variables are measured at discrete time intervals with the variables themselves being
discrete time processes. The money stock is generally measured and recorded as an
end-of-month value. This way of measuring the stock of money does not imply that
it remains unchanged between the observation intervals; instead it changes whenever
the money market is open. The same holds for variables like production and consumption.
These activities take place 24 hours a day, during the whole year, but they are measured
as flows of income and consumption over given periods.
It is useful to distinguish between stock, flow and price variables. Stocks are variables
that can be observed at a point in time, like the money stock or inventories. Flows are
variables that can only be observed over some period, like consumption or GDP. In this
context price variables include prices, interest rates and similar variables which can be
observed in a market at a given point in time. Combining these variables into a multivariate
process and constructing econometric models from variables observed in discrete time produces
further problems, and in general they are quite difficult to solve without using continuous time
methods. Usually, careful discrete time modeling will reduce the problems to a large extent.
Present ARIMA
A class of models: ARIMA(p,d,q) and ARFIMA(p,d,q)
Operators
Box-Jenkins
Non-stationarity, dynamics
Trend
Seasonal effects
Deterministic variables
Theory ARIMA:
After ARIMA?
ARIMAX, transfer function (RDL), ARCH/GARCH
Structural: single equation, ADL
2 For simplicity we assume a linear process. An alternative is to assume that the components
Random variables are described by their moments. Stochastic time series can be
described by their means, variances and autocovariances. Given a random variable
Ỹ_t which generates an observed process {y_t}, the mean and the variance are
E{Ỹ_t} = μ and var{Ỹ_t} = σ². The autocovariance at lag k is γ_k = E[(Ỹ_t − μ)(Ỹ_{t−k} − μ)].
Given the variance, and the standard deviation, of the estimated autocorrelation coefficient,
it becomes possible to set up a significance test. Asymptotically, this t-test has a
normal distribution, with an expected value of zero under the null of no autocorrelation
(no memory in the series). In a limited sample, a value of ρ̂_k larger than
two times its standard error is considered significant.
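As an illustration of this rule of thumb, the sketch below computes sample autocorrelations and compares them with the approximate two-standard-error band ±2/√T. It is a generic example with simulated data and an assumed statsmodels installation, not a procedure prescribed in the text.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
T = 200
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):                  # an AR(1) series with a1 = 0.6, used only as illustrative data
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()

rho = acf(y, nlags=10)                 # sample autocorrelations rho_0 ... rho_10
band = 2.0 / np.sqrt(T)                # approximate two-standard-error band

for k in range(1, 11):
    flag = "significant" if abs(rho[k]) > band else ""
    print(f"lag {k:2d}: rho_hat = {rho[k]:6.3f} {flag}")
```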
The next question is how much autocorrelation is left between the observations
at t and t − k (Ỹ_t and Ỹ_{t−k}) after we remove (condition on) the autocorrelation
at the intermediate lags. Removing the autocorrelation means that we first calculate
the mean of Ỹ_t conditional on the intermediate observations Ỹ_{t−1}, ..., Ỹ_{t−k+1}; another way of
3 Standard practice is to calculate the first K ≈ T/4 sample autocorrelations.
A fundamental issue when analyzing time series processes is whether they are
stationary or not. As a first, general definition, we can say that a non-stationary
series changes its behavior over time such that the mean is changing over time.
Many economic time series are non-stationary in the sense that they are growing
over time, their estimated variances are also growing and the covariance function
never dies out. In other words, the calculations of the mean, autocovariances etc.
depend on the time period we study, and inference becomes impossible. A
stationary series, on the other hand, displays a behavior which is independent of
the time period, and it becomes possible to test for significance. Non-stationarity
must either be removed before modeling or included in the model. This requires
that we know what type of non-stationarity we are dealing with.
The problem with non-stationarity is that a series can be non-stationary in an
infinite number of ways. And, to make the problem even more complex, some types
of non-stationarity will skew the distributions of the estimates such that inference
based on standard distributions such as the t, the F or the χ² distribution
is not only wrong but completely misleading. In order to model time series, we
need to understand what non-stationarity is, how to estimate it and how to deal
with it.
Of the two concepts, weak stationarity is the practical one. Weak stationarity
is defined in terms of the first two moments of the process, the mean and the
variance. A process {x_t} is (weakly) stationary if (1) the mean is independent of
time t,

E{x_t} = μ,

(2) the variance exists and is less than infinity,

var{x_t} = σ² < ∞,

and (3) the autocovariances depend only on the distance in time k between observations,

cov{x_t, x_{t−k}} = γ_k.

In addition, for the memory of the process to die out we require that

cov(x_t, x_{t−k}) → 0 as k → ∞.
In this chapter we deal with a very broad class of models named
ARMA models, autoregressive moving average models. These are
a set of models that describe the process {x_t} as a function of its
own lags and a white noise process. The autoregressive model of
order p [AR(p)] is

x_t = a_0 + a_1 x_{t−1} + ... + a_p x_{t−p} + e_t,

and the moving average model of order q [MA(q)] is

x_t = a_0 + e_t − b_1 e_{t−1} − ... − b_q e_{t−q}.

In empirical work, the question is to find the correct lag length. If we choose
too few lags the model will by definition be misspecified, and the assumption of
normally distributed white noise residuals will be wrong. On the other hand,
adding more lags to the AR or MA process will make the model capture more
of the possible memory of the process, but the estimates will be inefficient. We
need to add as few lags as possible, without rejecting the assumption of white
noise residuals. The Box-Jenkins method suggests that we start with a relatively
large number of lags and test for autocorrelation. Among those models which
show no significant autocorrelation, we then pick the model with the lowest
information criterion.
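A sketch of this selection strategy, assuming the statsmodels package and a simulated series: fit ARMA(p, q) models over a small grid, keep only those whose residuals pass a Ljung-Box check, and pick the smallest information criterion among the survivors. The grid bounds and the 5 per cent cut-off are illustrative choices, not values from the text.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(2)
T = 400
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):                      # illustrative AR-type data
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

candidates = []
for p in range(0, 3):
    for q in range(0, 3):
        res = ARIMA(y, order=(p, 0, q)).fit()
        # Ljung-Box test on the residuals: keep the model only if white noise is not rejected
        lb = acorr_ljungbox(res.resid, lags=[10], return_df=True)
        if lb["lb_pvalue"].iloc[0] > 0.05:
            candidates.append((res.bic, p, q))

bic, p, q = min(candidates)
print(f"chosen ARMA({p},{q}) with BIC = {bic:.1f}")
```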
In the Box-Jenkins approach, testing for white noise is equivalent to testing for
autocorrelation. The typical test for autocorrelation is the Box-Pierce test, also
known as the portmanteau test or Q-test, and in modified form as the Ljung-Box test.
To test for p:th order autocorrelation in a mean adjusted series ε̂_t, calculate the
k:th order autocorrelation coefficient,

ρ̂_k = [Σ_{t=k+1}^{T} ε̂_t ε̂_{t−k}] / [Σ_{t=1}^{T} ε̂_t²],

for k = 1, 2, ..., p. The Box-Pierce test statistic is then given by

BP = T Σ_{k=1}^{p} ρ̂_k².

Under the null of no autocorrelation this test statistic has a χ²(p) distribution.
The Box-Pierce statistic is best suited for testing the residuals of an AR model.
A modification, for ARMA and more general regression models, is the so-called
Ljung-Box statistic,

BL = T(T + 2) Σ_{r=1}^{p} ρ̂_r² / (T − r).

Lag length can also be chosen with an information criterion such as the AIC,
ln σ̂²_ε + 2k/T, where σ̂²_ε is the estimated residual variance and k the number of
estimated parameters. Since the estimated residual variance gets smaller the more
lags there are in the model, the last term (2k/T) tries to compensate for the number
of estimated parameters in the model. The smaller the value of the information
criterion, the better the model, as long as there is no autocorrelation.
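The two statistics above are easy to compute directly; the sketch below does so for a residual series and compares the Ljung-Box version with the chi-square critical value. It assumes numpy and scipy, and the lag order p = 10 is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import chi2

def portmanteau(resid, p=10):
    """Box-Pierce and Ljung-Box statistics for the first p autocorrelations."""
    e = resid - resid.mean()
    T = len(e)
    denom = np.sum(e**2)
    rho = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, p + 1)])
    bp = T * np.sum(rho**2)
    bl = T * (T + 2) * np.sum(rho**2 / (T - np.arange(1, p + 1)))
    return bp, bl

rng = np.random.default_rng(3)
resid = rng.standard_normal(300)          # residuals that are white noise by construction
bp, bl = portmanteau(resid, p=10)
print(f"BP = {bp:.2f}, BL = {bl:.2f}, chi2(10) 5% critical value = {chi2.ppf(0.95, 10):.2f}")
```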
For models with both AR and MA components, Hannan and Rissanen suggested
a different approach,
When dealing with time series and dynamic econometric models, the expressions
are easier to handle with the backward shift operator (B) or the lag operator
(L).5 The backward shift operator is the symbol most often used in statistical
textbooks. Econometricians tend to use the lag operator more often. The first
order lag operator is defined as

Lx_t = x_{t−1},     (7.4)

and applying it n times gives

L^n x_t = x_{t−n}.     (7.5)

The lag operator is an expression such that, when it is multiplied with an
observation at any given time, it shifts the observation one period backwards.
5 The practical difference between using the lag operator or the backward shift operator is
that the lag operator also affects the conditional expectations generator E_t, which is of interest
when working with economic theories dealing with expectations.
A lag polynomial applied to x_t can thus be written

a_0 x_t + a_1 x_{t−1} + a_2 x_{t−2} + ... + a_p x_{t−p} = a_0 x_t + a_1 Lx_t + a_2 L²x_t + ... + a_p L^p x_t.

Notice that the lag operator can be moved across the equality sign. The AR(1)
model, x_t = a_1 x_{t−1} + ε_t, can be written as (1 − a_1 L)x_t = ε_t, or A(L)x_t = ε_t, or
x_t = [A(L)]^{−1} ε_t. If necessary, the lag length of the process can be indicated as
A_p(L). An ARMA(p, q) process can be written compactly as A(L)x_t = B(L)ε_t.
Skipping the indication of lag lengths for convenience, the ARMA model can be
written as x_t = [A(L)]^{−1} B(L)ε_t or, depending on the context, as
[B(L)]^{−1} A(L)x_t = ε_t. Thus, the lag operator works like any mathematical expression.
However, whether or not moving the lag operator around results in a meaningful
expression is associated with the principles of stationarity and invertibility,
known as duality.
The function A(L) is a convenient way of writing the sequence. More generally
we can refer to any expression of the type A(L) as a generating function. This
includes the mean operator, the variance and covariance operators etc. Generating
functions summarize a lot of information about sequences in a compact way and
are an important tool in time series analysis. Their main advantage is that they
save time and make the expressions much simpler, since a number of mathematical
operations can be applied to generating functions. As an example, given certain
conditions concerning the sum Σa_i, we can invert A(L), so that A(L)^{−1}A(L) = 1.
The generating function for the lag operator is

D(L) = Σ_{i=0}^{k} d_i L^i,     (7.8)

where the coefficients d_i are generated by some other function. The point here is that it is often
easier to do manipulations on D(L) directly than on each individual element in
the expression. In the example above, we would refer to A(L)x_t as the generating
function of x_t.
A property of generating functions is that they are additive. If we have two
series a_i and b_i, i = 0, 1, 2, ..., and define a third series as c_i = a_i + b_i, it then
follows that

C(L) = A(L) + B(L).     (7.9)

Generating functions can also be multiplied; if D(L) = A(L)B(L), the coefficients of D(L) are given by

d_i = a_0 b_i + a_1 b_{i−1} + a_2 b_{i−2} + ... + a_i b_0 = Σ_{h=0}^{i} a_h b_{i−h}.     (7.10)

The results stated in this section should be compared with chapter 19, below,
which shows how long-run multipliers, etc. can be derived from the lag operator.
Given the definition of the lag operator (or the backward shift operator), the difference
operator (Δ) is defined as

Δ = 1 − L,     (7.12)

the discrete time counterpart of the derivative

Dx = dx/dt,     (7.13)

where D = d/dt.
Differences of higher order are denoted in the same way as for the lag operator.
Thus for the second difference of x_t we write

Δ²x_t = (1 − L)²x_t = (1 − 2L + L²)x_t = x_t − 2x_{t−1} + x_{t−2},     (7.14)

and, in general,

Δ^d x_t = (1 − L)^d x_t.

The seasonal difference is Δ_s x_t = (1 − L^s)x_t, where the subscript s indicates the
interval over which we take the (seasonal) difference. If x_t is quarterly, setting s = 4
leads to the yearly changes in the series. This new series can then be differenced by
using the difference operator,

Δ^d Δ_s x_t = (1 − L)^d (1 − L^s)x_t.

The solution to a dynamic model can be decomposed as

y_t = y_p + y_c,     (7.15)

where y_p represents the particular solution, the long-run steady state equilibrium
or the stationary long-run mean of y_t, and y_c represents the complementary
solution, the deviation from the long-run steady state.
Dynamic stability requires that y_c vanishes as T → ∞. The roots of the polynomial
A(L) tell us if this occurs. Given a change in ε_t, what will happen to
y_{t+1}, y_{t+2}, ...? Will the series explode, continue to grow for ever, or change
temporarily until it returns to the steady state equilibrium described by y_p? The roots
are given by solving for the r:s in the following equation,

r^p + a_1 r^{p−1} + a_2 r^{p−2} + ... + a_p = 0.     (7.16)
This equation gives the latent roots of the polynomial. The condition for
stability, when using the latent roots, is that the roots should be less than unity,
or "that the roots should lie inside the unit circle". Roots equal to unity, so-called
unit roots, imply an ever-growing series (a stochastic trend), and roots greater than unity
imply an explosive process. Complex roots indicate that the adjustment is cyclical;
though not very likely, the process could even follow an explosive cyclical path or
show cyclical permanent shocks. If the process is stationary, then following a shock y_t will
return to its stationary long-run mean. The case with one or several "unit roots" is of
particular interest because it represents stochastic growth in a non-stationary variable.
Series with one or more unit roots are also called integrated series. Many economic time
series appear to have a unit root, or roots close to unity.
Using latent roots to define stability is common, but it is not the only way.
An alternative is to work with the characteristic polynomial in z = 1/r,

1 + a_1 z + a_2 z² + ... + a_p z^p = 0.     (7.17)

If the roots of this polynomial are greater than unity in absolute value, |z| > 1, so that they
"lie outside the unit circle", the process is stationary; if the roots are less than unity the
process is explosive. The historical literature on time series uses both definitions;
latent roots, or eigenvalues, are now the established "standard".
The parameters of an ARMA model might not be unique. To see the conditions
for uniqueness, decompose the polynomials of the ARMA process A(L)y_t = B(L)ε_t
into their factors6 as

A(L) = ∏_{i=1}^{p} (1 − λ_i L),     (7.18)

and

B(L) = ∏_{j=1}^{q} (1 − η_j L).     (7.19)

6 If A(L) contains the polynomial 1 − L the process is said to have a unit root.
There is a link between AR and MA models, as the presentation of the lag operator
indicated. An AR process with an infinite number of lags can, under certain
conditions, be rewritten as a finite MA process. In a similar way, an infinite moving
average process can be inverted to an autoregressive process of finite order.
These results have two practical implications. The first is that, in practical
modelling, a long MA process can often be rewritten as a shorter AR process instead,
and the other way around. The second implication is that the two processes
are complementary to each other. The combination of AR and MA into ARMA
will lead to relatively parsimonious models, meaning models with quite few parameters.
In fact, it is quite uncommon to find ARMA models above the order
p = 2 and q = 2.
The AR(1) process, y_t = a_1 y_{t−1} + ε_t, can be written as (1 − a_1 L)y_t = ε_t, and
in the next step as y_t = (1 − a_1 L)^{−1} ε_t. The term (1 − a_1 L)^{−1} represents the sum
of an infinite moving average process,

y_t = [1/(1 − a_1 L)] ε_t = Σ_{i=0}^{∞} b_i ε_{t−i} = B(L)ε_t,  with b_i = a_1^i.

If the process includes a constant, y_t = μ + a_1 y_{t−1} + ε_t, the same inversion gives the
unconditional mean

E(y_t) = μ/(1 − a_1).

The last step is simply an application of the solution to an infinite geometric series, which
works in this case as long as the AR process is stationary, |a_1| < 1. It is important
that you understand the use of the expectations operator in this example because
the technique is frequently used to derive a number of results. We could have
reached the result in a simpler way if we had used the lag operator. Take
expectations of (1 − a_1 L)y_t = μ + ε_t. The lag operator is a deterministic
factor, so E(Ly_t) = E(y_t), and therefore E(y_t)(1 − a_1) = μ, or E(y_t) = μ/(1 − a_1).
Again, the left hand side is the sum of an infinite process. If there is no constant,
μ = 0, it follows immediately that E(y_t) = 0.
What is the variance of the process y_t? The answer is given by understanding
that E(y_t y_t) = Var(y_t) = σ_y². Thus, start from the AR(1) process, multiply
both sides with y_t to get y_t y_t = a_1 y_t y_{t−1} + y_t ε_t. Next, take expectations of
both sides, E(y_t y_t) = a_1 E(y_t y_{t−1}) + E(y_t ε_t), and substitute y_t y_{t−1} and y_t ε_t as
(a_1 y_{t−1} + ε_t)y_{t−1} = a_1 y²_{t−1} + ε_t y_{t−1} and y_t ε_t = (a_1 y_{t−1} + ε_t)ε_t. From this we get
a_1² E(y²_{t−1}) and a_1 E(ε_t y_{t−1}) + E(ε_t²). In the latter expression we have by definition
that E(ε_t y_{t−1}) = 0 (recall the basic assumptions of OLS) and that E(ε_t²) = σ_ε².
Putting the results together gives

σ_y² = σ_ε² / (1 − a_1²).

The autocovariance at lag k is γ_k = a_1^k σ_y², so the autocorrelation function is

ρ_k = γ_k / γ_0 = a_1^k.

From this expression it is obvious that the autocorrelation function for the
AR(1) process dies out slowly as the lag length k increases.
Calculating the mean, variance, autocovariances and autocorrelations for AR(1),
AR(2), MA(1) and MA(2) processes is a standard exercise in time series courses,
followed by an investigation of the unit root case a_1 = 1. To be completed...
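As a small check of the result ρ_k = a_1^k, the sketch below simulates an AR(1) process and compares sample autocorrelations with the theoretical ones. The sample size and a_1 = 0.8 are illustrative choices, and the statsmodels package is assumed.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(4)
T, a1 = 2000, 0.8
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):
    y[t] = a1 * y[t - 1] + rng.standard_normal()

sample_acf = acf(y, nlags=6)
for k in range(1, 7):
    print(f"lag {k}: sample {sample_acf[k]:.3f}  theory {a1**k:.3f}")
```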
7.3.1 Seasonality
7.3.2 Non-stationarity
(To be completed)
Differencing until stationarity is achieved is the standard Box-Jenkins approach, but it
is a bit ad hoc. In econometrics the approach is instead to test first, and only reject the
null of integration of order one when there is strong evidence against it. Alternatives
include linear deterministic trends, polynomial trends etc. These are dangerous: they
amount to spurious detrending under the maintained hypothesis of integrated variables.
7.4 Aggregation
The following section offers a brief discussion of the problems of aggregation.
The interested reader is referred to the literature to learn more [Wei (1990) is a
good textbook with many references on the subject; see also Sjöö (1990, ch. 4)].
Aggregation of series means aggregation over agents and markets, or aggregation
over time. The stock of money, measured by (M3), at the end of the month
represents an aggregation over individuals. A series like aggregate consumption in
the national accounts, represents an aggregation over both individuals and time.
Aggregation over time is usually referred to as temporal aggregation. Money
holdings is a stock variable which can be measured at any point in time. Temporal
aggregation of a stock variable implies picking observations with larger intervals,
using say a money series measured at the end of a quarter, instead of at the end
of each month. Consumption, on the other hand, is a flow variable; it cannot
be measured at a point in time, only as the sum of consumption over a given
period. Temporal aggregation in this case implies taking the sum of consumption
over intervals. The distinction is of importance because the effects of temporal
aggregation are different for stock and flow variables.
Aggregation, both over time and over individuals, can change the functional form
of the distribution of the variables, and it can affect the residual variance
and t-values. Exactly how aggregation changes a model varies from situation
to situation. There are, however, some general conclusions regarding temporal
aggregation which we will repeat in this section. In many situations there is little
we can do about these problems, except working with continuous time models
and/or selecting series with a low degree of temporal aggregation. That the problem
is hard to deal with is no excuse for forgetting or hiding it, as is done in many
textbooks in econometrics. The area of aggregation is an interesting challenge for
econometricians since it has not been explored as much as it deserves.
An interesting example of the consequences of aggregation is given in Christiano
and Eichenbaum (1987). They show how one can get extremely different
results by using discrete time models with yearly, quarterly and monthly data
compared with a continuous time model. They tried to estimate the speed of
adjustment in the stock of inventories in the U.S. national accounts. Using a
continuous time model they estimated the average time for closing 95% of the
gap between the desired and the actual stock of inventories to be 17 days. The
discrete time models predicted much slower adjustment: using monthly data the result was
46 days, with quarterly data 7 months, and with yearly data 5 1/2 years!
Aggregation also becomes an important problem if we have a theory that describes
the stochastic behavior of a variable which we would like to test with
empirical data. There are many results, in macro and finance, that predict that
series should follow a random walk, or be the outcome of a martingale process.
estimated strength of the relationship and can therefore lead to wrong conclusions
from Granger non-causality tests. For flow variables, on the other hand, temporal
aggregation turns one-directional causality into what will appear to be two-sided
causality. In this situation a clear warning is in order.
Finally, we also look at the aggregation of two random variables, X̃_t and Ỹ_t.
Suppose that they are two independent stationary processes with mean zero,

E[X̃_t | y_t] = E[Ỹ_t | x_t] = 0.     (7.22)

The autocovariances of X̃_t and Ỹ_t are

cov(x_t, x_{t−k}) = γ_{x,k},     (7.23)
cov(y_t, y_{t−k}) = γ_{y,k}.     (7.24)

The sum of X̃_t and Ỹ_t is

Z̃_t = X̃_t + Ỹ_t,     (7.25)

which will have an autocovariance equal to

γ_{z,k} = γ_{x,k} + γ_{y,k}.     (7.26)

More generally, if

X̃_t ~ ARMA(p, m)     (7.27)

and

Ỹ_t ~ ARMA(q, n),     (7.28)

and

Z̃_t = X̃_t + Ỹ_t,     (7.29)

then

Z̃_t ~ ARMA(x_1, x_2),     (7.30)

where x_1 ≤ p + q and x_2 ≤ max(p + n, q + m). As an example, think of a series
which is measured with a white noise error. That is, the true series is added to a
white noise series. If the true series is AR(p) then the result of this aggregation
will be an ARMA(p, p) process.
We can conclude this section by stating that aggregation leads to a loss of
information which, if the aggregation is large, might fool us into assuming that the
random walk is the appropriate model. The extent to which aggregation leads
us to wrong conclusions has not been fully established, partly because we
need better data on shorter time intervals than what is available. Remember that
ignoring problems is not a way of solving them. One way of dealing with the
problems of aggregation is to use continuous time econometric techniques instead;
see Sjöö (1993) for a discussion and further references.
8. Transfer function: y_t = [B(L)/A(L)]x_t + [θ(L)/φ(L)]ε_t.
Notice that the transfer function is also a rational distributed lag since it
contains a ratio of two lag structures. Also, (7) and (8) can be viewed as distributed
lag models since D(L) = [B(L)/A(L)]. Notice that rational distributed lag models
require some information about B(L) to be workable.
Imposing restrictions on the lag structure B(L) in distributed lag models leads
to further models:
10. Polynomial distributed lag (PDL) models, where B(L) declines according
to some polynomial function, decided a priori (= Almon lags).
11. All other types of a priori restrictions on B(L) not covered by (9) and
(10).7
12. The error correction model. This model embraces all of the above models
as special cases. The following explains why this is so.
Introduction to Error Correction Models
Economic time series are often non-stationary; their means and variances change
over time. The trend component in the data can be either deterministic or stochastic,
or a combination of both. Fitting a deterministic trend assumes that the
data series grow at a fixed rate each period. This is seldom a good way of
describing trends in economic time series. Instead they are better
described as containing stochastic trends with a drift. The series might be growing
over time, but it is not possible to predict whether it grows or declines in the
next period. Variables with stochastic trends can be made stationary by taking
first differences. This type of variable is called integrated of order 1, where the
order of integration is determined by the number of times the variable needs to be
differenced before it becomes stationary.
A necessary condition for fitting trending data in an econometric model is that
the variables share the same trend, otherwise there is no meaningful long-run
relationship between them.8 Testing for co-integration is a way of testing whether the data
series share a common stochastic trend.
7 Restrictions are put on the lag process to make the estimation more effective. A priori
restrictions can be motivated by a limited sample and multicollinearity that affects the estimated
standard errors of the individual lags. These types of restrictions are not used anymore. Today,
it is recognized that it is more important to focus on information criteria, white noise residuals and
building a well-defined statistical model, instead of imposing restrictions that might not be valid.
8 The exception is tests of the efficient market hypothesis, and related tests of rational expectations.
See Appendix A in Sjöö and Sweeney (1998) and Sjöö (1998).
Consider the regression

y_t = α + βx_t + ε_t.     (7.31)

If both y_t and x_t are integrated variables of the same order, a necessary condition
for a statistically meaningful long-run relationship is that the residual term
(ε_t) is stationary. If that is the case, the error term from the regression can be
seen as temporary deviations from the long run, and α and β can be viewed as
estimates of the long-run steady state relation between x and y.
A general way of building a model of time series, without imposing ad hoc a
priori restrictions, is the autoregressive distributed lag model. For two variables
we have

A(L)y_t = B(L)x_t + ε_t,     (7.32)

where the lag polynomials are A(L) = Σ_{i=0}^{k} a_i L^i and B(L) = Σ_{i=0}^{k} b_i L^i, with
the first coefficient in A(L) set to unity, a_0 = 1. The lag length is chosen such that
the error term becomes a white noise process, ε_t ~ NID(0, σ²). The long-run
solution of this model is given by

y_t = βx_t + ε_t,     (7.33)

where β = B(1)/A(1). Without loss of generality we can use the difference operator,
Δx_t = x_t − x_{t−1}, to rewrite the autoregressive distributed lag model as an error correction
model,

Δy_t = Σ_{i=0}^{k} γ_i Δx_{t−i} + Σ_{i=1}^{k} δ_i Δy_{t−i} + λECM_{t−1} + ε_t.     (7.34)
Multivariate models are introduced later. For the time being we can conclude our listing
of models with the following.
Given an autoregressive, or distributed lag, structure A(L), B(L) or D(L), the long-run
static solution of the model is found by setting L = 1. The intuition is that
in the long run there will be no changes in the explanatory variables, so it does
not matter whether we explain y_t by x_t and/or x_{t−i}.
The conditional mean of y_t in an ADL model, for example, is

E_t{y_t} = [A(L)]^{−1}B(L)x_t.     (8.1)

Setting L = 1 gives the long-run static solution

y = [B(1)/A(1)]x,     (8.2)

or, in the distributed lag form,

y = D(1)x.     (8.3)
The total effect of a change in x_t is given by the sum of the coefficients in D(L)
when L = 1. If there are m lags in D(L), the total multiplier is

D(1) = β_0 + β_1 + β_2 + ... + β_m = Σ_{j=0}^{m} β_j.     (8.5)
The interim multiplier after j lags is the cumulated sum

D_j = Σ_{i=0}^{j} β_i,     (8.6)

which can be normalized by the total multiplier,

w_j = [Σ_{i=0}^{j} β_i] / D(1),     (8.7)

such that it represents the share of the total multiplier up until the j:th lag.
Notice that m could be equal to infinity if we have a stable model with stationary
variables, such that the infinite sum of the β_i converges to a constant in the long run.
The mean lag can be derived in a more sophisticated way, by differentiating
D(L) with respect to L and then dividing by D(1). That is,

D(L) = β_0 + β_1 L + β_2 L² + ... + β_s L^s, and     (8.9)

D′(L) = β_1 + 2β_2 L + 3β_3 L² + ... + sβ_s L^{s−1}.     (8.10)

By dividing D′(1) by D(1) we get, as a general result for ADL models, the mean lag D′(1)/D(1).
Finally we have the median lag, representing the number of periods required
for 50% of the total effect to be achieved. The median lag is obtained by solving for
the smallest j such that Σ_{i=0}^{j} β_i ≥ 0.5·D(1).
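A small numerical sketch of these multiplier and lag formulas, using an arbitrary illustrative set of distributed lag coefficients β_j rather than estimates from any model in the text:

```python
import numpy as np

beta = np.array([0.4, 0.3, 0.2, 0.1])      # illustrative distributed lag coefficients beta_0..beta_3
j = np.arange(len(beta))

total = beta.sum()                          # D(1): total long-run multiplier, eq. (8.5)
shares = beta.cumsum() / total              # cumulative shares of the total multiplier, eq. (8.7)
mean_lag = (j * beta).sum() / total         # D'(1)/D(1): the mean lag
median_lag = int(np.argmax(shares >= 0.5))  # smallest j with at least 50% of the total effect

print(total, shares, mean_lag, median_lag)
```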
A vector autoregression (VAR) of order k for the vector x_t is

x_t = Σ_{i=1}^{k} A_i x_{t−i} + e_t,  or  A(L)x_t = e_t,     (9.1)

where A_i is the matrix of coefficients at lag number i, A(L) = Σ_{i=0}^{k} A_i L^i with
A_0 a diagonal matrix, and e_t is a vector of white noise residual terms. Notice that
all variables across all equations have the same lag length (k). This is so because
it makes it possible to estimate the system with OLS. If the lag order is allowed to vary,
the VAR must be estimated with the seemingly unrelated regression (SUR) method.
A VAR model can be inverted into its vector moving average (VMA) form.
1 See Wei (1989) for a presentation of the Box-Jenkins technique in a multivariate framework.
The MA form is convenient for analysing the properties of a VAR and for investigating
the consequences of shocks to the system. Estimation, however, is usually
done in the VAR form, and is straightforward since each equation can be estimated
individually with OLS. The lag length (k) of the autoregressive process is chosen
such that the estimated residual process, in combination with constants, trends,
dummy variables and seasonals, becomes a white noise process in each equation.
The idea is that the lag length is equal for all variables in all equations.
A second order VAR of dimension p with a constant is

x_t = a_0 + A_1 x_{t−1} + A_2 x_{t−2} + e_t,

where x_t = (x_{1t}, x_{2t}, ..., x_{pt})′, a_0 is a p × 1 vector of constants, A_1 and A_2 are
p × p coefficient matrices, and e_t = (e_{1t}, e_{2t}, ..., e_{pt})′ is the vector of residuals.
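A minimal sketch of estimating such a system, assuming the statsmodels package and using simulated data in place of any particular data set; the lag order is chosen with an information criterion, as discussed above.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(5)
T = 300
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])
x = np.zeros((T, 2))
for t in range(1, T):                      # simulate a simple bivariate VAR(1) as illustrative data
    x[t] = A1 @ x[t - 1] + rng.standard_normal(2)

model = VAR(x)
res = model.fit(maxlags=8, ic='bic')       # lag length chosen by BIC
print(res.k_ar)                            # selected lag order
print(res.coefs.shape)                     # (k, p, p) array of estimated A_i matrices
```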
VAR models were strongly advocated by Sims (1980) as a response to what he
described as "incredible restrictions" imposed on standard structural econometric
models, which dominated empirical time series econometrics up until the mid 1980s.
As an illustration, the reduced form of a simple bivariate structural model is the VAR

y_t = π_{10} + π_{11} y_{t−1} + π_{12} x_{t−1} + e_{1t},     (9.4)
x_t = π_{20} + π_{21} y_{t−1} + π_{22} x_{t−1} + e_{2t}.     (9.5)
Thus, the parameters of the VAR are complex functions of some underlying
structural model, and as such they are on their own quite uninteresting for
economic analysis. It is the lag structure, and sometimes the signs, that are more
interesting. The two residuals in this VAR are

e_{1t} = (ε_{1t} − b_{12}ε_{2t}) / (1 − b_{12}b_{21}),     (9.6)

and

e_{2t} = (ε_{2t} − b_{21}ε_{1t}) / (1 − b_{12}b_{21}).     (9.7)

These residuals are both white noise terms, but they are correlated with each
other whenever the coefficients b_{12} or b_{21} are different from zero.
The generalization of the structural system above, setting z_t = {y_t, x_t}′, is

Bz_t = Γ_0 + Γ_1 z_{t−1} + ε_t,     (9.8)

where

B = [1  b_{12}; b_{21}  1],  Γ_0 = [γ_{10}; γ_{20}]  and  Γ_1 = [γ_{11}  γ_{12}; γ_{21}  γ_{22}].

If both sides of (9.8) are multiplied with B^{−1} the result is

z_t = B^{−1}Γ_0 + B^{−1}Γ_1 z_{t−1} + B^{−1}ε_t = Π_0 + Π_1 z_{t−1} + e_t.     (9.9)

2 OLS is appropriate here because all equations contain the same explanatory variables. However, if we set some lags to zero and have
a system with different lags in different equations, SUR will be a more efficient estimator than
OLS.
Both the variance decomposition and the impulse response analysis require that
the residual covariance matrix of the VAR is orthogonalized. This is so because
the errors e_t are dependent on each other through the B^{−1} matrix. Unless the
residuals of the VAR are orthogonalized it will not be possible to identify a shock
as a unique shock coming from one specific variable.3 There are several ways
of performing the orthogonalization of the residuals. (In the following we assume
that the VAR is made up of stationary variables.) The idea is that restrictions
must be put on the covariance matrix of the VAR.
A recursive (Choleski type) ordering in a three-variable system is

e_{1t} = ε_{1t}
e_{2t} = c_{21}ε_{1t} + ε_{2t}
e_{3t} = c_{31}ε_{1t} + c_{32}ε_{2t} + ε_{3t},     (9.11)

while an alternative, non-recursive identification scheme could be

e_{1t} = ε_{1t} + c_{13}ε_{3t}
e_{2t} = c_{21}ε_{1t} + ε_{2t}
e_{3t} = c_{31}ε_{2t} + ε_{3t}.     (9.12)

In the structural VAR (SVAR) approach the restrictions are written in matrix form as

Az_t = A_0 + A_1 z_{t−1} + Bε_t.
Once the error process is set up in such a way that the errors are orthogonal,
it becomes possible to analyze the effects of one specific shock on the system and
to trace its effects through the other variables.
4 A fourth approach is offered by Blanchard and Quah (1989), and builds on classifying shocks
as temporary or permanent. This approach can be seen as an extension of the SVAR approach
to processes including integrated variables with common trends.
First, think about your system: what is it that you want to explain, and how
could it be modelled as a recursive system? Second, estimate the equations by
OLS, with the same lag length on all variables across the equations, to avoid having
to use the SUR estimation technique. Third, investigate outliers and shifts and put in
the appropriate dummy variables. Fourth, try to find a short lag structure with
white noise residuals. Fifth, if you cannot fulfil the fourth step, minimize an information
criterion; in this case AIC is not the best choice, use BIC or something else.
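A sketch of a Choleski-type orthogonalization and the resulting impulse responses, assuming the statsmodels package and simulated data; the ordering of the variables in the data matrix is the identifying assumption.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(6)
T = 300
x = np.zeros((T, 2))
for t in range(1, T):                        # illustrative bivariate VAR(1) data
    x[t] = np.array([[0.5, 0.1], [0.2, 0.4]]) @ x[t - 1] + rng.standard_normal(2)

res = VAR(x).fit(2)

# Choleski decomposition of the residual covariance matrix: Sigma_u = P P'.
# The lower triangular P maps orthogonal shocks into the VAR residuals, which
# corresponds to the recursive scheme in (9.11).
P = np.linalg.cholesky(res.sigma_u)

irf = res.irf(10)            # impulse responses over 10 periods
fevd = res.fevd(10)          # forecast error variance decomposition
print(P)
print(irf.orth_irfs.shape)   # orthogonalized responses: (periods + 1, p, p)
print(fevd.decomp.shape)
```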
The orthogonalization of the residuals can offer some interesting intellectual challenges,
especially in the SVAR approach. If the variables in the VAR are integrated
variables which are also co-integrating, we are faced with some interesting problems.
In the co-integrated VAR model there will be both stationary shocks and
permanent shocks, and identifying these two types in the system is not always
easy. If the VAR is of dimension p, there can be at most r co-integrating vectors,
0 ≤ r ≤ p, and p − r common stochastic trends. Juselius (2006) ("The
Co-integrated VAR Model", Oxford University Press) shows how an identification
of the structural MA model, and an orthogonalization of the residuals, can be done
in terms of both the short run and the long run of the system.
The VAR(2), with no constants, trends or other deterministic variables, will
have the following VECM representation after finding r co-integrating vectors,

Δx_t = Γ_1 Δx_{t−1} + αβ′x_{t−1} + ε_t.

The MA version of this model is

x_t = C Σ_{i=1}^{t} ε_i + C*(L)ε_t + x_0,

where the first term on the right hand side represents the stochastic trends
in the system and the second term represents the stationary part. The C matrix
thus represents everything that is not captured by the stationary co-integrating
relations, and it is related to the co-integrating vectors as C = β_⊥(α_⊥′Γβ_⊥)^{−1}α_⊥′.
VAR models represent statistical descriptions of data series. As such, a VAR is a basis for
reducing your model and moving towards more ordinary structural econometric models,
such as vector error correction models (VECMs). Estimating a VAR is then a
way of making sure that the final model is a well-defined statistical model, i.e. a
model that is consistent with the data chosen.
1. We have talked about what you can do with a VAR in terms of forecasting,
simulations, impulse responses, forecast error decomposition and Granger
causality testing. In this context we meet the so-called SVAR, the structural
VAR. There are, however, a number of other VARs that one needs to know
about. The problems of working with VARs are obvious: there is a large
number of parameters to estimate, the estimated parameters might not be
stable over time, and there are variables that are not modelled in
the VAR because the VAR would otherwise get too large to handle. If you want to use
the VAR for forecasting you need to address these problems. To handle the
problem of time-varying parameters there are Time-Varying-Parameter
VARs (TVP-VARs). In addition there are various VAR modeling techniques
that deal with regime changes: Markov switching VARs, threshold VARs,
floor and ceiling VARs, and smooth transition VARs.
To work with a large number of variables and reduce the model it is possible to
use factor analysis, which takes us to Factor Augmented VARs (FA-VARs). Another
approach is to use a priori information about parameters and their distributions,
as represented by Bayesian VARs (BVARs). The latter is a popular
approach in many central banks.
We can illustrate the problem in the following way. Your model predicts that
the inflation rate will vary around 10%, while at the same time you have additional
information indicating that inflation will fluctuate around 5 per cent, say because there
has been a sudden drop in inflation. What do you do? One approach is simply to reduce
the constant term and predict changes in inflation around 5 per cent instead. A
more ambitious approach is to incorporate more information in your model, from
more data, and to place more emphasis on recent observations etc. Changing the
constant is easy and quite normal. But as you start walking along the path of making
assumptions about the data and the parameters of the model, you might go too
far in the other direction. As long as we talk about forecasting, the proof is in the
pudding: the best forecast wins. But when we talk about the best policy to achieve
goals in the future, you have to be much more careful.
The type of VARs we have discussed so far are basically statistical representations
of the data. Without further restrictions, and without the incorporation of long-run steady
state relations in the form of co-integrating vectors, their relative predictive ability
will be quite poor. Also, the economy is more complex, involving many more
variables than the two to six variables that can be handled in a standard VAR.
If your model contains fifty or one hundred variables there will be too many lags
and coefficients to estimate. One way of dealing with this problem is to use so-called
Bayesian VARs (BVARs). In the BVAR you can use prior information to reduce
the number of coefficients you need to estimate. The BVAR is popular among many
central banks, including both the ECB and the Fed, as a way to construct better and
bigger VARs for forecasting.5
5 Gary Koop at University of Strathclyde has a home page with course material dealing with
BVAR models.
Whether one variable affects another in such a way that it can be said
to cause the other variable is a fundamental question in all sciences. However,
validating empirically that one variable is caused by another is problematic
in economics, since it is often quite difficult to set up controlled experiments.
Granger (1969), building upon work done by Wiener, was the first to formalize
an empirical concept of causality in economics. Granger's basic idea is that the
future cannot predict the present or the past. It follows, as a necessary condition,
that for one variable (x_t) to cause another variable (y_t), lagged values of x_t must
predict y_t. This can be tested with the following vector autoregressive model,

y_t = Σ_{i=1}^{k} α_i y_{t−i} + Σ_{i=1}^{k} β_i x_{t−i} + e_t,     (9.13)

where the lag length is the same as before and e_t ~ NID(0, ω²). If lagged
values of x_t do not improve the prediction of y_t (the β_i are jointly zero), x_t is not
Granger causing y_t; correspondingly, if lagged values of y_t predict x_t in the
equation for x_t, y_t is Granger causing x_t. In some situations testing the
reverse relationship is of no interest. For instance, the inflation rate in a small
open economy should not Granger cause the inflation rate of the World.
The main weakness of the Granger non-causality test is the assumption that
the error process in the VAR is not only a white noise process, but also a white
noise innovation process with respect to all relevant information for explaining
the movements of x_t and y_t. This is an important issue which is often forgotten
in applied work, where bivariate systems are the rule rather than the exception.
Granger's basic definition of non-causality is based on the assumption that all factors
relevant for predicting y_t are known. Let I_t represent all relevant information,
both past and present, and let X_t be present and past observations on x_t, such that
X_t = (x_t, x_{t−1}, x_{t−2}, ..., x_0); I_{t−1} and X_{t−1} represent past observations only.
The variable x_t can therefore be said to Granger cause y_t if the mean square error
(MSE) increases when y_t is regressed against the information set with X_{t−1}
removed. In the bivariate case this can be stated as

MSE(ŷ_t | I_{t−1}) < MSE[ŷ_t | (I_{t−1} − X_{t−1})].     (9.15)
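A sketch of the F-type test of the restriction that all β_i in (9.13) are zero, assuming statsmodels and simulated data in which x genuinely leads y; both the data generating process and the lag length are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(7)
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()   # x leads y by construction

# The first column is the variable being explained (y), the second the candidate cause (x)
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=4, verbose=False)
fstat, pvalue, _, _ = results[4][0]["ssr_ftest"]
print(f"F = {fstat:.2f}, p-value = {pvalue:.4f}")   # a small p-value rejects non-causality
```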
10. INTRODUCTION TO EXOGENEITY AND MULTICOLLINEARITY
10.1 Exogeneity
Consider the simple system

y_t = βx_t + ε_{1t},
Δx_t = ε_{2t}.

If ε_{1t} and ε_{2t} are both stationary, it follows that x_t is I(1) and that y_t is I(0)
if β = 0. On the other hand, if β ≠ 0, it follows that y_t is I(1). To estimate β
it is required that y_t does not simultaneously influence x_t. If y_t or Δy_t is part of
the x_t equation (and thus embedded in ε_{2t}), the result is that
E(ε_{1t}ε_{2t}) ≠ 0, and we can write ε_{1t} = γε_{2t} + u_t, where for simplicity we assume
that u_t ~ N(0, σ_u²).
Now, if we estimate β with OLS, the outcome will be a biased estimate of
β, since E(x_tε_{1t}) = E[x_t(γε_{2t} + u_t)] ≠ 0, and we can no longer assume that x_t and ε_{1t}
are independent. This is an example of lack of weak exogeneity. With the first model it
is not possible to estimate the parameter of interest β; the outcome from OLS is
a different and biased value.
Weak exogeneity spells out the conditions under which it is possible to obtain
unbiased and efficient estimates. The definition is based on splitting the joint density
function into a conditional density and a marginal density function,

D(y_t, x_t; θ) = D(y_t | x_t; θ_1)·D(x_t; θ_2),

where the parameters of interest must be recoverable from the conditional model alone.
Strong exogeneity spells out the conditions for conditional forecasting and simulations
of a model with non-modelled variables. The condition is weak exogeneity
plus the requirement that the marginal model does not depend on the endogenous variable;
thus the marginal process must not be Granger caused by y_t.
1 The condition of no correlation between the error terms is easily understandable if we assume
that {y_t, z_t} is a bivariate normal process. Set up the density function, and determine the
condition under which it is possible to estimate the parameters of interest from the conditional model
only.
2 Regarding Johansen's test, it is important to remember that it is model dependent. The test
is performed conditionally on the short-run dynamics of the variables included in the system,
the dummy variables and the specification of the deterministic trend.
Super exogeneity determines the conditions for using the estimated parameters
for policy decisions. The condition is weak exogeneity plus the requirement that the
parameters of the conditional model are stable with respect to changes in the marginal model. For
instance, if the money supply rule changes, the parameters of the marginal process
will also change. If this also leads to changes in the parameters of the conditional
model, the conditional model cannot be used to analyse the implications of policy
changes. Thus, super exogeneity defines the situations in which the Lucas critique is
not valid.
Consider the regression model

y_t = β_0 + β_1x_t + β_2z_t + ε_t.

The estimated parameters of this model are analysed under the assumption that
there is no correlation between the explanatory variables. The parameter β_1 is understood as
the effect on y_t following a unit change in x_t while holding the other variable in
the model (z_t) constant. In the same way β_2 measures the effect on y_t while x_t is
held constant. Another way of expressing this is the following: E{y_t | z_t} = β_1x_t
and E{y_t | x_t} = β_2z_t, which tells us that the effect of one parameter cannot be
analysed in isolation from the rest of the model. The effect of z_t in the model is
not on y_t in itself; it is on y_t conditional on x_t. Holding, say, x_t constant in the model,
while z_t is free to vary, implies that we study the effect on y_t after 'removing' the effects
of x_t on y_t. If x_t and z_t are correlated it is not possible to keep one of them constant
while the other is changing. This is the multicollinearity problem.
The statistical problem is best understood by looking at the OLS variance of
β̂_1,

Var(β̂_1) = σ² / [Σ(x_t − x̄)²(1 − r²_{xz})],

where r_{xz} is the correlation between x_t and z_t: the higher the correlation, the larger
the variance of the estimate.
Consider the model y_t = β_1x_t + β_2x_{t−1} + ε_t, where x_t and x_{t−1} are typically highly
correlated. Adding and subtracting β_1x_{t−1} gives

y_t = β_1Δx_t + β_3x_{t−1} + ε_t.     (10.3)

The transformation is just a reparameterization and does not affect the residual
term. The parameter β_3 = β_1 + β_2, which is the long-run static solution of the
model. Thus we get an estimate of the short-run effect on y_t from β_1 and at
the same time a direct estimate of the static long-run solution from β_3. If the
collinearity between x_t and x_{t−1} is high, it can be assumed to be quite small between
Δx_t and x_{t−1}. Since our final interest in modelling economic time series
is to find a well-defined statistical model which mimics the DGP of the variable(s),
multicollinearity is not really a problem. We will therefore not deal with this topic
any further.
This section looks at a number of unit root tests which can be applied to determine
the order of integration of a variable. The following tests are presented.
The Dickey-Fuller test is one of the oldest tests. The test builds on the assumed
DGP

y_t = y_{t−1} + ε_t, with ε_t ~ NID(0, σ²).

Given this DGP, subtract y_{t−1} from both sides, and estimate the equation

a) Δy_t = πy_{t−1} + ε_t,

or put a constant term in the regression, to allow for the alternative of a
deterministic trend in y_t,

b) Δy_t = μ + πy_{t−1} + ε_t,

or put in both a constant and a time trend in the estimated equation, to allow
for both a linear deterministic trend and a quadratic deterministic trend in y_t,

c) Δy_t = μ + πy_{t−1} + δt + ε_t,

where π = 0 if y_t is I(1). In this regression, we know that π̂ will be biased
downwards in a limited sample. Thus, we can put all the risk on the negative
side and perform a one-sided test, instead of a two-sided standard t-test. The one-sided
t-test is H0: π = 0, y_t ~ I(1), against H1: π < 0, y_t ~ I(0). The correct 't-statistic'
for testing the significance of π̂ is tabulated in Fuller (1976), under the assumption that y_t is a random walk,
Δy_t ~ N(0, σ²). The correct distribution for the "t-test" can also be calculated from
MacKinnon (1991), for the exact sample size at hand. In practice the differences
are small, though. The t-statistics for the constant term and the trend term are
tabulated in Dickey and Fuller (1980). Notice that the null hypothesis is that
Δy_t = ε_t, where ε_t is white noise. The econometrician, however, will not know
1 To understand why the constant represents a linear deterministic trend, go back to the
The DF-test, like all tests of I(1) versus I(0), is sensitive to deviations from the
assumption ε_t ~ NID(0, σ²). The assumption of NID errors is critical to the
simulated distributions in Fuller (1976). If there is autocorrelation in the residual
process, the OLS estimated residuals will be inappropriate: the residual variance estimate
will be biased and inconsistent. The ADF-test seeks to solve the problem by
augmenting the equations with lagged Δy_t,

Δy_t = πy_{t−1} + Σ_{i=1}^{k} γ_iΔy_{t−i} + ε_t,     (11.1)

or

Δy_t = μ + πy_{t−1} + Σ_{i=1}^{k} γ_iΔy_{t−i} + ε_t,     (11.2)

or

Δy_t = μ + πy_{t−1} + δt + Σ_{i=1}^{k} γ_iΔy_{t−i} + ε_t.     (11.3)
The asymptotic test statistic is distributed as in the DF-test, and the same recommendation
applies to these equations: make sure there is a meaningful alternative
hypothesis, and therefore start with the model including both a constant and a trend.
The ADF test is better than the original DF-test since the augmentation leads
to empirical white noise residuals. As for the DF-test, the ADF test must be set
up in such a way that it has a meaningful alternative hypothesis, and higher orders of
integration must be tested before the single unit root case.2
2 Sjöö (2000b) explains in some detail how the test is used in practice.
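In practice the ADF regressions above are rarely run by hand; the sketch below uses the adfuller function, assuming the statsmodels package, on a simulated random walk. The choice of deterministic terms ('ct': constant and trend) mirrors the recommendation to start from the most general specification.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(8)
y = np.cumsum(rng.standard_normal(300))     # a random walk, so the null of a unit root is true

# regression='ct': constant and linear trend; the lag length k is chosen by AIC
stat, pvalue, usedlag, nobs, crit, _ = adfuller(y, regression="ct", autolag="AIC")
print(f"ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}, lags used = {usedlag}")
print("critical values:", crit)
```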
The ADF-test tries to solve the problem of non-white noise residuals by adding
lags of the dependent variable. It should be stressed that the ADF-test is quite
adequate as a data descriptive device under the maintained hypothesis that the
variables in a sample are integrated of order one. There are, however, a number of
tests which try to improve on some of the weaknesses of the ADF-test. Phillips and
Perron (1988) suggest a non-parametric correction of the test statistic so that the
Dickey-Fuller distribution can be used even in cases when the residual in the DF-test
is not white noise. (The KPSS-test below is a more recent modification of the same
principle.) The method starts from the estimated t-value (t_π̂) and the estimated
residuals from the DF equation. The test statistic (the t-value) is modified
with the following formula,
t*_π = t_π̂(S/S_l) − T(S_l² − S²)[std.err(π̂)/s] / (2S_l),     (11.4)
where s is the residual variance from the DF regression,

S² = T^{−1} Σ_{t=1}^{T} ε̂_t²,     (11.5)

and

S_l² = T^{−1} Σ_{t=1}^{T} ε̂_t² + 2T^{−1} Σ_{j=1}^{l} [1 − j(l + 1)^{−1}] Σ_{t=j+1}^{T} ε̂_t ε̂_{t−j}.     (11.6)
The KPSS test takes stationarity as the null hypothesis. Estimate the regression

y_t = α + βt + e_t,     (11.7)

form the partial sums of the residuals, S_t = Σ_{i=1}^{t} ê_i, and base the test statistic on
T^{−2} Σ_{t=1}^{T} S_t² / s²(k), where the long-run variance is estimated as

s²(k) = T^{−1} Σ_{t=1}^{T} ê_t² + 2T^{−1} Σ_{s=1}^{k} w(s, k) Σ_{t=s+1}^{T} ê_t ê_{t−s}.     (11.10)
The critical values for the test are given in Kwiatkowski et al. (1992). A Bartlett
type window, w(s, k) = 1 − [s/(k + 1)], is used so that the estimated (sample)
test statistic corresponds to the simulated distribution, which is based on white
noise residuals. The KPSS test appears to be powerful against the alternative of
a fractionally integrated series. That is, a rejection of I(0) does not necessarily point to I(1),
as in most unit root tests, but rather to an I(d) process where 0 < d < 1. These
types of series are called fractionally integrated. A high value of d implies a long
memory process. In contrast to an integrated series, I(1) or I(2) etc., a fractionally
integrated series is mean reverting; see Baillie and Bollerslev (1994).
This test builds on the conclusion that for a unit root variable the estimated residuals
are inappropriate and will indicate that unrelated variables are statistically
significant (spurious regression). Therefore estimate

Model 1: y_t = α + βt + ε_{1t},     (11.11)

where RSS_1 and RSS_2 are the residual sums of squares from model 1 and model 2
respectively, and s²(k) is as above.
We can conclude that, among these tests, the ADF test is robust as long as
the lag structure is correctly specified. The gains from correcting the estimated
residual variance seem to be small.
Rejecting one unit root does not necessarily mean that one can accept the alternative
of an I(0) series. Sometimes unit root tests will reject the assumption of
a unit root even though the series is clearly non-stationary. There are several
alternatives to the I(1) hypothesis.
The segmented trend approach was launched by Perron (1989). He argues that
few series really are I(1). If we have detailed knowledge about the data generating
process, we might establish that series have different deterministic trends for different
time periods. The fact that these segmented trends shift over time implies
that unit root tests cannot reject the hypothesis of an integrated variable. Thus,
instead of detecting the correct deterministic trend(s), the test approximates the
changing deterministic trend with a stochastic trend. Perron (1989) demonstrates
this fact and derives a test for a known break date in the series. Banerjee et al.
(1992) develop a test for an unknown break date. The problem with this approach
is that we somehow have to estimate these segmented trends. Sometimes it will
be possible to argue for segmented trends, like World War One and Two, etc.,
but in principle we are left more or less with ad hoc estimates of what might be
segmented trends.
3 Testing for integration should be done according to the Pantula principle: since higher order
integration dominates lower order integration, test from higher to lower order, and stop when it
is not possible to reject the null. For instance, a test for I(1) versus I(0) assumes that there are
no I(2) processes. The presence of higher order integration might ruin the test for lower order
integration; therefore start with I(2) and only if I(2) is rejected will it be meaningful to test for
I(1), etc.
For the class of integrated series discussed above the difference operator was assumed
to be of order d = 1. The choice between d = 0 and d = 1 might be too restrictive
in some situations, especially if unit root tests reject I(0) in favour of the I(1)
hypothesis when we have theoretical information that suggests that I(1) is implausible,
or highly unrealistic. For example, unit root tests might find that both
the forward and the spot foreign exchange rates are I(1), and that the forward premium
(f − s), the log difference, is also I(1), indicating no mean reversion in this
difference series, and that the forward and the spot rates are not co-integrating.
The expectations part of the forward rate would then have to be extremely small, or
irrational in some sense, so that risk premia are causing the I(1) behavior.
Autoregressive Fractional Difference Moving Average (ARFIMA) models represent a
more general class of model than ARMA and ARIMA models; see Granger and
Joyeux (1980) and Granger (1980). The ARFIMA(p, d, q) model is defined as
A(L)(1 − L)^d x_t = B(L)ε_t, where the difference parameter d is allowed to take
non-integer values.
Most macroeconomic and finance variables are non-stationary. This has enormous
consequences for the use of statistical methods in economic research. Statistical
theory assumes that variables are stationary; if they are not stationary, statistical
inference is generally not possible. It does not matter that numerous old textbooks
in econometrics and research papers have ignored the problem. The problems
associated with non-stationary variables in econometrics have been known since the
1920s, but did not get a solution until the end of the 1980s. In principle there are two
ways of dealing with non-stationarity: you must either remove the non-stationarity
before setting up the econometric model, or set up a model of non-stationary
variables that forms a stationary relation. Typically, in neither of these cases can
you use standard inference based on t-, chi-square or F-distributions.
Now, variables can be non-stationary in an infinite number of ways. In practice,
there are broadly two types of non-stationary variables of interest in econometrics.
The first type is variables that are stationary around a deterministic trend. The second
type is variables that are stationary around a stochastic trend. Stochastic trend variables
are also known as integrated variables. Most variables in economics and finance
seem to be driven by stochastic trends.
The problem with stochastic trend variables (integrated variables) is not only that
they do not follow standard distributions; if you try to use standard distributions
you will most likely be fooled into thinking there are significant relations
when in fact there are none. This is known as the spurious regression problem
in the literature.
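The spurious regression problem is easy to reproduce by simulation. The sketch below, assuming statsmodels, regresses two independent random walks on each other many times and records how often the t-test on the slope 'rejects' at the conventional 5 per cent level; the number of replications and the sample size are arbitrary illustrative choices.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T, reps = 200, 1000
false_rejections = 0

for _ in range(reps):
    y = np.cumsum(rng.standard_normal(T))      # independent random walks:
    x = np.cumsum(rng.standard_normal(T))      # no true relation between y and x
    res = sm.OLS(y, sm.add_constant(x)).fit()
    if abs(res.tvalues[1]) > 1.96:             # naive 5% two-sided t-test on the slope
        false_rejections += 1

# With stationary data this share would be about 5%; with independent random
# walks it is typically far above one half.
print(false_rejections / reps)
```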
Historically, trends were dealt with by removing what people assumed was a
linear deterministic trend. This was done in the following way. The non-stationary
variable was regressed against a constant and a linear trend variable,

y_t = α + βt + ỹ_t,     (12.1)

or, allowing for a higher order polynomial trend,

y_t = α + β_1t + β_2t² + ... + β_nt^n + ỹ_t.     (12.2)
This approach of fitting deterministic trends can be extended to cyclical
trends, using trigonometric functions in combination with the time trend. In
the literature there are various deterministic filters that aim at removing long-run
(supposedly deterministic) trends, such as the so-called Hodrick-Prescott filter.
However, if the series is driven by a stochastic trend, the estimates from
these models will not follow standard distributions and the regression will impose
a spurious autocorrelation pattern on the spuriously detrended variable ỹ_t. Thus,
until you have investigated the non-stationarity properties of the series and tested
for stochastic trends (the order of integration) it is not possible to do any meaningful
econometric modelling.
For a pure random walk, y_t = y_{t-1} + v_t, repeated substitution gives

y_t = y_0 + Σ_{i=1}^{t} v_i.    (12.4)

The expression shows how the random walk variable is made up of the sum of all historical white noise shocks to the series. The sum represents the stochastic trend. The variable is non-stationary, but we cannot predict how it changes, at least not by looking at the history of the series. (See also the discussion of random walks in the section on different stochastic processes above.) The stochastic trend is removed by taking the first difference of the series. In the random walk case this implies that Δy_t = v_t is a stationary variable with constant mean and variance. Variables driven by stochastic trends are also called integrated variables, because the sum process represents the integrated property of these variables.
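The sum-of-shocks representation and the effect of differencing can be illustrated with a short sketch; the sample size, seed and the use of statsmodels' adfuller are illustrative assumptions, not part of the text above.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
v = rng.standard_normal(500)        # white noise shocks
y = 10.0 + np.cumsum(v)             # y_t = y_0 + sum of v_1..v_t: a stochastic trend
dy = np.diff(y)                     # first difference removes the trend: dy_t = v_t

print("ADF p-value, level:     ", adfuller(y)[1])   # typically large: I(1)
print("ADF p-value, difference:", adfuller(dy)[1])  # typically near zero: I(0)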
A generic representation is the combination of deterministic and stochastic trends,

y_t = α + β t + τ_t + ỹ_t,    (12.5)

where τ_t = τ_{t-1} + v_t, v_t is NID(0, σ²), β t is the deterministic trend and ỹ_t is a stationary process representing the stationary part of y_t. In this model, the stochastic trend is represented by Σ_{i=1}^{t} v_i.
An alternative trend representation is segmented deterministic trends, in which separate deterministic trends t_1, t_2, etc. apply to different periods, such as wars, or policy regimes such as exchange rate or monetary policy regimes. Segmented trends are an alternative to stochastic trends (see Perron, 1989), but the problem is that the identification of these different trends may be ad hoc. Given a suitable choice of trends almost any empirical series can be made stationary, but are the different trends really picking up anything interesting that is not embraced by the assumption of stochastic trends, arising from innovations with permanent effects on the economy?
Consider the regression of one integrated variable on another,

y_t = α + β x_t + ε_t.    (12.7)

In the case when the two I(1) variables share the same stochastic trend and form an I(0) residual, we say that they are co-integrating.
Remark 1 If x_t has more than two elements there can be more than one co-integrating vector β.
Remark 3 This definition and the Granger Representation Theorem (Engle and Granger, 1987) tell us that if there is co-integration then there is also an error correction representation, and there must be Granger causality in at least one direction.
Under the general null hypothesis of independent and integrated variables, estimated variances and test statistics do not follow standard distributions. Therefore the way ahead is to test for co-integration, and then try to formulate a regression model (or system) in terms of stationary variables only. Traditionally there are two approaches to testing for co-integration: residual-based approaches and other approaches. The first type starts with the formulation of a co-integration regression, a regression model with integrated variables. Co-integration is then determined by investigating the residual(s) from that regression. The Engle and Granger two-step procedure starts from the co-integrating regression,
y_t = α + β x_t + z_t,    (12.8)

where the estimated residuals are ẑ_t. If the variables are co-integrating, ẑ_t will be I(0). The second step is to perform an Augmented Dickey-Fuller unit root test on the estimated residual,

Δẑ_t = α + ρ ẑ_{t-1} + Σ_{i=1}^{k} γ_i Δẑ_{t-i} + ε_t.    (12.9)
The relevant test statistics are not the ones tabulated in Fuller (1976). Instead you have to look up newly simulated tables in Engle and Granger (1987), Engle and Yoo (1987), or Banerjee et al. (1993). The reason is that the unit root test is now performed, not on a univariate process, but on a variable constructed from several stochastic processes. The test statistic will change depending on how many explanatory variables there are in the model.
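As a sketch of the two-step procedure, the example below simulates two series sharing a stochastic trend, runs the co-integrating regression, and then applies statsmodels' coint(), which uses Engle-Granger-type critical values rather than the Fuller tables; the data, lag choices and seed are assumptions for illustration only.

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(2)
T = 300
trend = np.cumsum(rng.standard_normal(T))          # common stochastic trend
x = trend + rng.standard_normal(T)
y = 0.5 + 2.0 * trend + rng.standard_normal(T)

# Step 1: co-integrating regression and its residual z_hat.
z_hat = sm.OLS(y, sm.add_constant(x)).fit().resid
# Step 2: unit root test on the residual, here via the packaged Engle-Granger test.
tstat, pvalue, crit = coint(y, x)
print("EG test statistic:", tstat, "p-value:", pvalue)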
Remark 4 Remember that the t-statistics, and the estimated standard deviations, from the co-integrating regression must be treated with caution, even if we find cointegration. Unless x_t is exogenous the estimated parameters follow unknown non-normal distributions, even asymptotically.
Remark 5 For the outcome of the test, it should not matter which variable is chosen to be the dependent variable. As an economist you might favour setting one variable as dependent and interpreting the parameters as long-run economic parameters (elasticities etc.).
There are a number of problems with the Engle and Granger two-step procedure.
The first is that the tabulated (non-standard) test statistic assumes white noise residuals. The augmentation tries to deal with this, but in most cases it is only a crude approximation.
Second, the test assumes a common factor in the dynamic processes of y_t and x_t. In practice this restriction is quite restrictive and the test will not behave well when it is violated.
Third, consider adding a further integrated variable u_t to the co-integrating regression,

y_t = α + β_1 x_t + β_2 u_t + ξ_t.    (12.10)
If y_t and x_t are co-integrating, they already form one linear combination (z_t) which is stationary. If u_t ~ I(1) is not co-integrating with the other variables, OLS will set β_2 to zero, and the estimated residual ξ̂_t is I(0). This is why the test will only work if there is only one co-integrating vector among the variables. If y_t and x_t are not co-integrating, then adding u_t ~ I(1) might lead to a co-integrating relation. Thus, in this respect the test is limited, and testing must be done by creating logical chains of bi-variate co-integration hypotheses.
Other residual-based tests try to solve at least the first problem by adjusting the test statistics in the second step, so that they always fulfil the criteria for testing the null correctly. Some approaches try to transform the co-integrating regression in such a way that the estimated parameters follow a standard normal distribution. A better alternative for testing for co-integration among more than two variables is offered by Johansen's test. This test finds long-run steady-state, or co-integrating, relations in the VAR representation of a system. Let the VAR,
A_k(L) x_t = μ D_t + ε_t,    (12.11)

represent the system. The VAR is a p-dimensional system, the variables are assumed to be integrated of order d, {x_t} ~ I(d); D_t is a vector of deterministic variables (constants, dummies, seasonals and possibly trends) and μ is the associated coefficient matrix. The residual process is normally distributed white noise, ε_t ~ NID(0, Σ). It is important to find the optimal lag length in the VAR, and to have normally distributed white noise error terms, because the test uses a full information maximum likelihood (FIML) estimator. FIML estimators are notoriously sensitive to small samples and misspecification, which is why care must be taken in the formulation of the VAR. Once the VAR has been found, it can be rewritten in error correction form,
Δx_t = Π x_{t-1} + Σ_{i=1}^{k} Γ_i Δx_{t-i} + μ D_t + ε_t.    (12.12)
In practical use the problem is to formulate the VAR; the program will rewrite the VAR for the user automatically. Johansen's test builds on the result that if x_t is I(d), co-integration implies that there exist vectors β such that β'x_t ~ I(d - b). In a practical situation we will assume that x_t ~ I(1) and, if there is cointegration, β'x_t ~ I(0). If there is cointegration, the matrix Π must have reduced rank. The rank of Π indicates the number of independent rows in the matrix. Thus, if x_t is a p-dimensional process, the rank (r) of the matrix Π determines the number of co-integrating vectors (β), or the number of linear steady-state relations among the variables in {x_t}. Zero rank (r = 0) implies no cointegrating vectors, full rank (r = p) means that all variables are stationary, while reduced rank (0 < r < p) means cointegration and the existence of r co-integrating vectors among the variables.
The procedure is to estimate the eigenvalues of Π and determine their significance.1
1 The test is called the trace test and its use is explained in the Sjö guide to testing for unit roots and cointegration.
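A hedged sketch of the trace test with statsmodels' coint_johansen is given below; the simulated system, the constant term (det_order = 0) and two lagged differences are illustrative choices, not recommendations.

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(3)
T = 400
trend = np.cumsum(rng.standard_normal(T))                 # one common stochastic trend
X = np.column_stack([trend + rng.standard_normal(T),
                     2.0 * trend + rng.standard_normal(T),
                     np.cumsum(rng.standard_normal(T))])  # a third, independent I(1) series

res = coint_johansen(X, det_order=0, k_ar_diff=2)
print("trace statistics:   ", res.lr1)          # one statistic per null r <= q
print("5% critical values: ", res.cvt[:, 1])    # compare row by row to determine r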
In practical use the problem is to formulate the VAR; the program will rewrite the VAR for the user and present the estimated α and β vectors. Sometimes it is necessary to understand how the VECM is found. Consider a two-dimensional VAR model, where the deterministic terms have been removed for simplification. Start with the first equation and subtract y_t from both sides of the equal sign.
This chapter looks at the common trends approach and some of the economics behind co-integration, for instance the question of creating positive or negative shocks in stabilization policy.
An important characteristic of integrated variables is that they become stationary after differencing. The definition of an integrated series is: a series, or a vector of series, y_t with no deterministic component, which has a stationary, invertible ARMA representation after differencing d times, is said to be integrated of order d, denoted y_t ~ I(d).
It is possible to have variables driven by both stochastic and deterministic trends. In the very long run a deterministic trend will always dominate over a stochastic trend. In a limited sample, however, it becomes an empirical question whether the deterministic trend is sufficiently strong to have an effect on the distributions of the estimates of the model.1
We know, from the Wold representation theorem, that if y_t is I(0) and has no deterministic process, it can be written as an infinite moving average process. (If the series has a deterministic process this can be removed before solving for the MA process.)

y_t = C(L) ε_t,    (13.1)

where L is the lag operator and ε_t ~ iid(0, σ²). Now, suppose that y_t is I(1); then its first difference is stationary and has an infinite MA representation,

Δy_t = C(L) ε_t.    (13.2)
Under the assumption that ε_t ~ iid(0, σ²), we also have that

y_t = Δ^{-1} C(L) ε_t = [1/(1 - L)] C(L) ε_t,    (13.3)

where 1/(1 - L) represents the sum of an infinite series. For a limited sample we get, approximately,

y_t = y_0 + (1 + L + L² + ... + L^{t-1}) C(L) ε_t,    (13.4)

y_t ≈ y_0 + C(1) Σ_{i=1}^{t} ε_i.    (13.5)
The forecasts are decomposed into what is known at time t and what is going to happen between t and t + h. The latter is unknown at time t; therefore we have to form the conditional forecast of y_{t+h} at time t,

y_{t+h|t} = y_0 + Σ_{i=1}^{t} Σ_{j=0}^{t+h-i} C_j ε_i.

The effect of a shock today (at time t) on future periods is found by taking the derivative of the above expression with respect to a change in ε_t,

∂y_{t+h|t} / ∂ε_t = Σ_{j=0}^{h} C_j → C(1) as h → ∞.    (13.9)
Thus, the long-run effect of a shock today can be expressed by the static long-run solution of the MA representation of y_t (equal to the sum of the MA coefficients).
The persistence of a shock depends on the value of C(1). If C(1) happens to be 0, there is no long-run effect of today's shock. Otherwise we have three cases: C(1) between zero and one, C(1) = 1, or C(1) greater than unity. If C(1) is greater than 0 but less than unity, the shock will die out in the future. If C(1) = 1, the integrated variable (unit root) case, a shock will be as important today as it is for all future periods. Finally, if C(1) is greater than one (explosive roots) the shock magnifies into the future, and we have an unstable system.
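The long-run effect C(1) can be approximated numerically as the cumulated impulse responses of the differenced series. The sketch below assumes, purely for illustration, that Δy_t follows an ARMA(1,1); the statsmodels ArmaProcess helper is used to produce the MA coefficients.

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Assumed illustrative process: dy_t = 0.5*dy_{t-1} + e_t + 0.3*e_{t-1}
ar = np.array([1.0, -0.5])
ma = np.array([1.0, 0.3])
psi = ArmaProcess(ar, ma).arma2ma(lags=200)   # impulse responses C_0, C_1, ...

C1 = psi.sum()                                # long-run effect of a unit shock
print("C(1) approximately:", C1)              # here (1 + 0.3)/(1 - 0.5) = 2.6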
If the series are truly I(1), spectral analysis can be applied to measure the persistence of a shock exactly. The persistence of shocks has interesting implications for economic policy. If shocks are very persistent, or explosive in some cases, it might be good policy to try to avoid negative shocks but create positive shocks. In our stabilization policy example, this can be understood as the authorities should be careful with deflationary policies, for instance, since they might result in high and persistent social costs; see Mankiw and Shapiro (198x) for a discussion of these issues.
In the following, the MA representation of systems of integrated processes is analysed. For this purpose let y_t be a vector of I(1) variables. Using the lag operator identity L = 1 - (1 - L) = 1 - Δ,2 Wold's decomposition theorem gives the MA representation of the differenced system, Δy_t = C(L) ε_t.
If y_t is a vector of I(1) variables, then we know from above that if the matrix C(1) ≠ 0, any shock to the series has infinite effects on future levels of y_t. Let us consider a linear combination of these variables, β'y_t = z_t. Multiplying the expression by β' gives

Δz_t = β'C(1) ε_t + β'C*(L) Δε_t.    (13.11)

In general, it is the case that when y_t is I(1), z_t is I(1) as well. Thus a linear combination of integrated variables will also be integrated, implying that β'C(1) ≠ 0 in general; co-integration requires precisely that β'C(1) = 0.
2 This is the same as y_t = Δy_t + Ly_t = [(1 - L) + L] y_t.

τ_t = τ_{t-1} + J ε_t = (1 + L + L² + ... + L^{t-1}) J ε_t.    (13.15)
The variable τ_t represents the common trends, modelled as random walks. Setting the initial condition τ_0 = 0, the level of y_t is solved as

y_t = y_0 + A τ_t + C*(L) ε_t,    (13.16)

which shows that y_t is driven by the common trends representation A τ_t. It can also be shown that, since C(1) = AJ, we have β'A = 0, which implies that the co-integrating vectors β remove the common trends from β'y_t.
Consider again the VAR in levels,

A(L) y_t = μ D_t + ε_t,    (14.1)

where the error correction form of the system has short-run coefficient matrices

Γ_i = -(I - A_1 - ... - A_i) = -(I - Σ_{j=1}^{i} A_j),    (14.3)

and long-run matrix

Π = -(I - Σ_{j=1}^{k} A_j) = -A(1).    (14.4)
Notice that in this example the system was rewritten such that the variables in levels (y_{t-k}) end up at the k:th lag. As an alternative, it is possible to rewrite the system such that the levels enter at the first lag, followed by lags of Δy_{t-i}. The two ways of rewriting the system are identical; the preferred form depends on one's preferences.
Since y_t is integrated of order one and Δy_t is stationary, it follows that there can be at most p - 1 steady-state relationships between the non-stationary variables in y_t. Hence, p - 1 is the largest possible number of linearly independent rows in the Π-matrix. The latter is determined by the number of significant eigenvalues in the estimated matrix Π̂ = -Â(1). Let r be the rank of Π; then rank(Π) = 0 implies that there is no combination of the variables that leads to stationarity. In other words, there is no cointegration. If we have rank(Π) = p, the matrix is said to have full rank, and all variables in y_t must be stationary. Finally, reduced rank, 0 < r < p, means that there are r co-integrating vectors in the system. Once a reduced rank has been determined, the matrix can be written as Π = αβ', where β'y_t represents the co-integrating relations, and α is a matrix of adjustment coefficients measuring the strength by which each co-integrating vector affects an element of Δy_t. Whether the co-integrating vectors β'y_t are referred to as long-run or steady-state relations is mainly a question of terminology.
The α and β vectors are estimated after first concentrating out the short-run dynamics and the deterministic terms in two auxiliary regressions,

Δy_t = Σ_{i=1}^{k-1} Γ_{1,i} Δy_{t-i} + Φ_0 D_t + R_{1t},    (14.5)

y_{t-k} = Σ_{i=1}^{k-1} Γ_{2,i} Δy_{t-i} + Φ_2 D_t + R_{kt}.    (14.6)
The system in 14.1 can now be written in terms of the residuals above as

R_{1t} = αβ' R_{kt} + e_t.    (14.7)

The vectors α and β can now be estimated by forming the product moment matrices S_{11}, S_{kk} and S_{1k} from the residuals R_{1t} and R_{kt},

S_{ij} = T^{-1} Σ_{t=1}^{T} R_{it} R'_{jt},   i, j = 1, k.    (14.8)

For fixed β vectors, α is given by α̂(β) = S_{1k} β (β'S_{kk} β)^{-1}, and the sum of squares function is Σ̂(β) = S_{11} - α̂(β)(β'S_{kk} β) α̂(β)'. Minimizing this sum of squares function leads to maximum likelihood estimates of α and β. The estimates of β are found by solving the eigenvalue problem

|λ S_{kk} - S_{k1} S_{11}^{-1} S_{1k}| = 0.
From this expression two likelihood ratio tests for determining the number of non-zero eigenvalues are formulated. The first test concerns the hypothesis that the number of eigenvalues is less than or equal to some given number q, H_0: r ≤ q, against the unrestricted model H_1: r ≤ p. The test is given by

-2 ln(Q; q | p) = -T Σ_{i=q+1}^{p} ln(1 - λ̂_i).    (14.11)

The second test is used for the hypothesis that the number of eigenvalues is less than or equal to q against H_1: r = q + 1, and is given by

-2 ln(Q; q | q + 1) = -T ln(1 - λ̂_{q+1}).

The moving average representation of the system, including the deterministic terms, can be written as

y_t = C(L)(ε_t + μ + Φ D_t).    (14.14)
(To be completed...)
The modelling of stochastic difference equations introduces some problems which clearly violate the assumptions behind the classical linear regression model. With some care most of these problems can be solved. The most important factors are whether the data series are stationary, and whether the residuals are white noise. As long as the variables are stationary and the residual is a white noise process, OLS estimation is generally feasible. Autocorrelated residuals, however, mean that the OLS estimator is no longer consistent when lagged dependent variables appear among the regressors. In this situation the model must either be re-specified, or the whole model, including the autoregressive process in the residuals, must be estimated by maximum likelihood.
To understand the differences between the estimation of stochastic difference equations and the classical linear regression model, we will introduce these differences step by step; in all there are six models of interest here.
The following sections do not present any rigorous proofs concerning the properties of the OLS estimator. The aim is only to review known problems and introduce some new ones.
y = Xβ + ε,    (15.1)

where X is assumed to have full column rank, which ensures that the inverse of (X'X) exists. Minimizing the sum of squared residuals leads to the following OLS estimator of β,

β̂ = (X'X)^{-1}(X'y) = β + (X'X)^{-1}(X'ε).    (15.5)

The estimated parameter β̂ is equal to its true value plus an additional term. For the estimate to be unbiased the last term must have expectation zero. If we assume that the x's are deterministic the problem is relatively easy. A correct specification of the model, E(ε) = 0, leads to the result that β̂ is unbiased. Taking expectations gives

E(β̂) = E(β) + E[(X'X)^{-1}(X'ε)] = β + E[(X'X)^{-1}] E(X'ε),    (15.8)

and, under the classical assumptions,

(β̂ - β) ~ NID(0, σ²(X'X)^{-1}).    (15.9)

of a constant is equal to the constant; since the true parameter β can be treated as a constant we have E(β) = β.
When the regressor is a deterministic time trend t̃, the OLS estimator can be written as

β̂ = β + [T^{-1} Σ_{t=1}^{T} t̃²]^{-1} [T^{-1} Σ_{t=1}^{T} t̃ ε_t].    (15.11)

Taking expectations leads to the result that β̂ is unbiased. The most important reason why this regression works well is the t̃ variable in the denominator: as t̃ grows, the denominator gets larger and larger compared to the numerator, so the ratio goes to zero much faster than otherwise.
Applying OLS to time series data introduces the problem of stochastic explanatory variables. The explanatory variables can be stochastic on their own, and lags of the dependent variable imply stochastic regressors. Let the model be

y_t = β x_t + ε_t,    (15.12)
where consistency requires that the regressors and the errors are uncorrelated, E(x_t ε_t) = 0, or, alternatively,

plim T^{-1}(X'ε) = 0.    (15.18)

The intuition behind this result is that, because ε_t is zero on average, we are multiplying x_t by something that is zero on average, so the sample average of (x_t ε_t) will converge to zero.
The practical implication is that, given a sufficiently large sample, the OLS estimator will be unbiased, efficient and consistent even when the explanatory variables are stochastic. If ε_t ~ NID(0, σ²), we also have, conditional on the stochastic process {x_t}_1^T, that the estimated β is distributed as

β̂ | x_t ~ N[β, σ²(X'X)^{-1}],    (15.19)

and

(β̂ - β) | x_t ~ N[0, σ²(X'X)^{-1}],    (15.20)

making β̂ an unbiased and consistent estimator, with a normal distribution such that standard distributions can be used for inference.
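A small Monte Carlo experiment illustrates the point: with a stationary stochastic regressor that is independent of the error term, the OLS estimates centre on the true value and look approximately normal. All parameter values, sample sizes and the seed below are assumptions chosen for the illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
beta, T, reps = 1.5, 200, 2000
estimates = []
for _ in range(reps):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = 0.7 * x[t - 1] + rng.standard_normal()   # stationary AR(1) regressor
    eps = rng.standard_normal(T)                         # independent of x
    y = beta * x + eps
    estimates.append(sm.OLS(y, x).fit().params[0])

est = np.array(estimates)
print("mean:", est.mean(), "std:", est.std())            # centred close to 1.5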
The example can be extended by two assumptions. First, let the residuals be e_t ~ iid(0, σ²): they are independent and identically distributed as before, but not necessarily normal. Second, let the process {x_t}_1^T be only covariance stationary in the long run, allowing the sample covariance to vary with time in a limited sample, E(X'X) = (1/T) Σ_t x_t x_t = Q_t. The processes {x_t}_1^T and {ε_t}_1^T are independent as above. Under these conditions the estimated β is

β̂ = β + Q_t^{-1} [(1/T) Σ_{t=1}^{T} x_t ε_t].    (15.21)

The estimate β̂_t can vary with t since Q_t varies with time. To establish that OLS is a consistent estimator we need to establish that

[(1/T) Σ_{t=1}^{T} x_t x_t]^{-1} = Q_t^{-1} →p Q^{-1},    (15.22)
3 In a multivariate model we would say that Q converges to a matrix of constants.
so that, asymptotically,

(β̂_t - β) →d N(0, σ² Q^{-1}).    (15.24)

Since (1/T) = (1/T^{1/2})(1/T^{1/2}), the CLT can be invoked by rewriting the expression as

T^{1/2}(β̂_t - β) = [(1/T) Σ_{t=1}^{T} x_t x_t]^{-1} [(1/T^{1/2}) Σ_{t=1}^{T} x_t ε_t],    (15.27)

where the LHS and the second factor on the RHS correspond to the CLT. From that factor we get, as T goes to infinity,

(1/T^{1/2}) Σ_{t=1}^{T} x_t ε_t ⇒ N(0, σ² Q).    (15.28)

Moreover, we can conclude that the rate of convergence is T^{1/2}; the factor (1/T^{1/2}) represents the speed at which the estimate β̂_t converges to its true value β.
Consider next the first-order autoregressive model

y_t = α y_{t-1} + ε_t,    (15.29)

leading to

α̂ = [T^{-1} Σ_{t=1}^{T} y²_{t-1}]^{-1} [T^{-1} Σ_{t=1}^{T} y_{t-1} ε_t].    (15.31)
t=1 t=1
This is similar to the stochastic regressor case, but here {y_{t-1}} and {ε_t} cannot be assumed to be independent, so the expectation of their product no longer factors into E(y_{t-1})E(ε_t), and α̂ can be biased in a limited sample. The dependence can be explained as follows: ε_t affects y_t, and y_t is, through the AR(1) process, correlated with y_{t+1}, so y_{t+1} is correlated with ε_t. The long-run covariance (lrcov) between y_{t-1} and ε_t is defined as

lrcov(y_{t-1}, ε_t) = T^{-1} Σ_{t=1}^{T} y_{t-1} ε_t + Σ_{k=1}^{∞} E(y_{t-1} ε_{t+k}) + Σ_{k=1}^{∞} E(y_{t-1+k} ε_t),    (15.32)

where the first term on the RHS is the sample estimate of the covariance, and the last two terms capture leads and lags in the cross-correlation between y_{t-1} and ε_t. As long as y_t is covariance stationary and ε_t is iid, the sample estimate of the covariance will converge to its true long-run value.
This dependence from ε_t to y_{t+1} is not of major importance for estimation. Since (y_{t-1} ε_t) is still a martingale difference sequence w.r.t. the history of y_t and ε_t, that is, E{y_{t-1} ε_t | y_{t-2}, y_{t-3}, ..., ε_{t-1}, ε_{t-2}, ...} = 0, it can be established, in line with the CLT,4 that

(1/T^{1/2}) Σ_{t=1}^{T} y_{t-1} ε_t ⇒ N(0, σ² Q).    (15.33)

Using the same assumptions and notation as above, the variance is given by E(y_{t-1} ε_t ε_t y_{t-1}) = E(ε²_t) E(y_{t-1} y_{t-1}) = σ² Q_t. These results are sufficient to establish that OLS is a consistent estimator, though not necessarily unbiased in a limited sample. It follows that the distribution of the estimated α, and its rate of convergence, is as above. The results are the same for higher-order stochastic difference models.
In this section we look at the AR(1) model with an autoregressive residual process. Let the error process be

ε_t = ρ ε_{t-1} + v_t,    (15.34)

where v_t ~ iid(0, σ²). In this case we get the following expression,
4 This result is established by the so-called Mann-Wald theorem.
which establishes that the OLS estimator is biased and inconsistent. Only the last covariance term can be assumed to go to zero as T goes to infinity.
In this situation OLS is always inconsistent.5 Thus, the conclusion is that with a lagged dependent variable OLS is only feasible if there is no serial correlation in the residual. There are two solutions in this situation: respecify the equation so that the serial correlation is removed from the residual process, or turn to an iterative ML estimation of the model (y_t - α y_{t-1} - ρ ε_{t-1} = v_t). The latter specification implies common factor restrictions, which, if not tested, amount to an ad hoc assumption. The approach was extremely popular in the late 70s and early 80s, when people relied on a priori assumptions, in the form of adaptive expectations or costly adjustment for example, to derive their estimated models. Often economists started from a static formulation of the economic model and then added assumptions about expectations or adjustment costs. These assumptions could then lead to an infinite lag structure with white noise residuals. To estimate the model these authors called upon the so-called Koyck transformation to reduce the model to a first-order autoregressive stochastic difference model, with an assumed first-order serially correlated residual term.
converges to a constant while the error process v_t converges to NID(0, σ²). This is a result of the CLT.
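The inconsistency can be illustrated with a small simulation: a lagged dependent variable combined with an AR(1) error keeps the OLS estimate away from the true value even in large samples. The parameter values, sample size and seed are illustrative assumptions only.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
alpha, rho, T, reps = 0.5, 0.5, 500, 1000
est = []
for _ in range(reps):
    y = np.zeros(T)
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = rho * e[t - 1] + rng.standard_normal()   # AR(1) error
        y[t] = alpha * y[t - 1] + e[t]                  # AR(1) model with that error
    est.append(sm.OLS(y[1:], y[:-1]).fit().params[0])

# The average estimate stays well above the true alpha = 0.5 even with T = 500,
# because y_{t-1} is correlated with the autocorrelated error.
print("mean OLS estimate of alpha:", np.mean(est))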
With three observations, we have that the joint probability density function is equal to the density function of X̃_3, conditional on X̃_2 and X̃_1, multiplied by the conditional density for X̃_2, multiplied by the marginal density for X̃_1. It follows that, for a sample of T observations, the likelihood function can be written as

L(θ; x) = Π_{t=2}^{T} D(x_t | X_{t-1}; θ) f(x_1).    (15.41)
The problem with estimating the model

y_t = α y_{t-1} + ε_t,    (15.42)

when the true process is a random walk (α = 1), is that the estimated α does not follow a normal distribution, not even asymptotically. The problem here is not inconsistency, but the non-standard distribution of the estimated parameters. This is clearly established in Fuller (1976), where the results from simulating the empirical distribution of the random walk model are presented. Fuller generated data series from a driftless random walk model and estimated the following models:
a) y_t = α y_{t-1} + ε_t,
b) y_t = μ + α y_{t-1} + ε_t,
c) y_t = μ + β(t - t̄) + α y_{t-1} + ε_t,
where μ is a constant and (t - t̄) a mean-adjusted deterministic trend. These equations follow from the random walk model. The reason for setting up the three models is that the modeller will not know, in practice, that the data are generated by a driftless random walk. S/he will therefore add a constant (representing a deterministic growth trend in y_t) or a constant and a trend. The models are easy to understand: simply subtract y_{t-1} from both sides of the random walk model,
y_t - y_{t-1} = α y_{t-1} - y_{t-1} + ε_t,    (15.43)

which leads to

Δy_t = (α - 1) y_{t-1} + ε_t = π y_{t-1} + ε_t,    (15.44)
where W(t) indicates that the sample moments converge to random variables which are functions of a Wiener process, and are therefore distributed according to non-standard distributions. If the residuals are white noise we get the so-called Dickey-Fuller distributions.
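Fuller's simulation experiment is easy to reproduce in outline. The sketch below generates driftless random walks, estimates model b) with a constant, and looks at the left tail of the t-ratio on y_{t-1}; the sample size, number of replications and seed are illustrative assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T, reps = 200, 2000
tvals = []
for _ in range(reps):
    y = np.cumsum(rng.standard_normal(T))            # driftless random walk
    dy, ylag = np.diff(y), y[:-1]
    res = sm.OLS(dy, sm.add_constant(ylag)).fit()    # dy_t = mu + pi*y_{t-1} + e_t
    tvals.append(res.tvalues[1])

# The 5% quantile lies near the Dickey-Fuller value of about -2.9,
# far below the -1.65 suggested by the standard normal distribution.
print("5% quantile of simulated t-ratios:", np.quantile(tvals, 0.05))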
The intuition behind this result is that an integrated variable has an infinite memory, so the correlation between y_{t-1} and ε_t does not disappear as T grows. The non-standard distribution remains, and gets worse if we choose to regress two independent integrated variables against each other. Assume that x_t and y_t are two random walk variables, such that

y_t = y_{t-1} + ε_t,  and  x_t = x_{t-1} + v_t,    (15.47)

where both ε_t and v_t are NID(0, σ²). In this case, β would equal zero in the model y_t = β x_t + u_t.
The estimated t-value from this model, when y_t and x_t are independent random walks, should converge to zero. This is not what happens when y_t and x_t are integrated; instead the t-value grows with the sample size, and we are back at the spurious regression problem.
Often you will find that there are several alternative variables that you can put into a model; there might be several measures of income, capital or interest rates to choose from. Starting from a general model and reducing it to a specific one, several models of the same dependent variable might display white noise innovation terms and stable parameters with signs and sizes that are in line with economic theory.
A typical example is given by Mankiw and Shapiro (1986), who argue that in a money demand equation private consumption is a better scale variable than income. Thus, we are faced with two empirical models of money demand.1 The first model is

m_t = β_0 + β_1 y_t + β_2 y_{t-1} + β_3 r_t + ε_t,    (16.1)

and the second is

m_t = γ_0 + γ_1 c_t + γ_2 c_{t-1} + γ_3 r_t + v_t.    (16.2)
Which of these models is the best one, given that both can be claimed to be good estimates of the data generating process? The better model is the one that explains more of the systematic variation of m_t and explains where other models go wrong. Thus, the better model will encompass the not-so-good models. The crucial factor is that y_t and c_t are two different variables, which leads to a non-nested test.
To understand the difference between nested and non-nested tests, set β_2 = 0. This is a nested test because it involves a restriction on the first model only. Now set β_1 = β_2 = 0; this is also a nested test, because it only reduces the information of model one. If γ_1 = γ_2 = 0, this is also a nested test, of the second model. Thus, setting β_1 = β_2 = 0, or γ_1 = γ_2 = 0, only produces special cases of each model.
The problem we would like to address here is whether to choose y_t or c_t as "the scale variable" in the money demand equation. This is a non-nested test, because the test cannot be written as a restriction in terms of one model only.
The first thing to consider is that a stable model is better than an unstable one, so if only one of the models is stable, that is the one to choose. The next step is to compare the residual variances and choose the model with the significantly smaller error variance.
However, variance domination is not really sufficient. PcGive therefore offers more tests that allow the comparison of Model one versus Model two, and vice versa. Thus, there are three possible outcomes: Model one is better, Model two is better, or there is no significant difference between the two models.
1 For simplicity we assume that there is only one lag on income and consumption. This should not be seen as a restriction; the lag length can vary between the first and the second model.
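PcGive reports its own battery of encompassing statistics; as a hand-made illustration of the idea, the sketch below uses the Davidson-MacKinnon J-test, in which the fitted values of the rival model are added to the first model and tested for significance. The data, variable names and parameter values are simulated assumptions, not the money demand data discussed above.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T = 300
y_scale = rng.standard_normal(T)                        # "income" candidate
c_scale = 0.8 * y_scale + rng.standard_normal(T)        # "consumption" candidate
m = 1.0 + 0.5 * c_scale + 0.3 * rng.standard_normal(T)  # simulated money demand

model1 = sm.OLS(m, sm.add_constant(y_scale)).fit()      # scale variable: income
model2 = sm.OLS(m, sm.add_constant(c_scale)).fit()      # scale variable: consumption

# J-test of model one against model two: are model two's fitted values
# significant once model one's own regressors are included?
aug = sm.add_constant(np.column_stack([y_scale, model2.fittedvalues]))
print("t-value on rival fitted values:", sm.OLS(m, aug).fit().tvalues[-1])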
17. ARCH MODELS
An ARCH model consists of a mean equation for y_t and a variance equation, in which the error variance depends on its own lagged values. The first equation is referred to as the mean equation and the second as the variance equation. Together they form an ARCH model; both equations must be estimated simultaneously. In the mean equation, βx_t is simply an expression for the conditional mean of y_t. In a real situation this can be a set of explanatory variables, or an AR or ARMA process. It is understood that y_t is stationary and I(0), otherwise the variance will not exist.
This example is an ARCH model of order one, ARCH(1). ARCH models can be said to represent an ARMA process in the variance. The implication is that a high variance in period t-1 will be followed by higher variances in periods t, t+1, t+2 etc. How long the shock persists depends, as in the ARMA model, on the size of the coefficients in the variance equation. The general ARCH(q) model is
y_t = β x_t + ε_t,   ε_t ~ D(0, h_t),    (17.4)

h_t = ω + α_1 ε²_{t-1} + α_2 ε²_{t-2} + ... + α_q ε²_{t-q} = ω + Σ_{i=1}^{q} α_i ε²_{t-i}.    (17.5)
The expression for the variance shows an autoregressive process in the variance of ε_t. The distribution of the residual term is deliberately left undetermined. In ARCH models normality is one option, but often the residual process will be non-normal, display thicker tails, and be leptokurtic. Thus, other distributions, such as the Student t-distribution, can be a better alternative.
The Student t-distribution is characterised by the mean, the variance and the degrees-of-freedom parameter. In this case the residual process is ε_t ~ St(0, h_t, ν), where ν is a positive parameter that measures the relative importance of the peak in relation to the thickness of the tails. The Student t-distribution is a symmetrical distribution that contains the normal distribution as a special case, as ν → ∞.
The ARCH process can be detected by testing for ARCH and by inspecting the PACF and ACF of the estimated squared residuals ε̂²_t. As is the case for AR models, ARCH has a more general form, the Generalised ARCH (GARCH), which adds lags of the conditional variance h_t. A long lag structure in the ARCH process can be substituted with lagged conditional variances to create a shorter process, just as for ARMA processes. A GARCH(1,1) model is written as
y_t = β x_t + ε_t,   ε_t ~ D(0, h_t),    (17.6)

h_t = ω + α_1 ε²_{t-1} + δ_1 h_{t-1},    (17.7)

and the general GARCH(q, p) model as

y_t = β x_t + ε_t,   ε_t ~ D(0, h_t),    (17.8)

h_t = ω + Σ_{i=1}^{q} α_i ε²_{t-i} + Σ_{i=1}^{p} δ_i h_{t-i}.    (17.9)
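A GARCH(1,1) process is easy to simulate directly, which helps to see how volatility clustering and fat tails arise. The parameter values below are illustrative assumptions, and the conditional distribution is taken to be normal.

import numpy as np

rng = np.random.default_rng(8)
T = 2000
omega, alpha, delta = 0.1, 0.1, 0.8          # omega > 0, alpha + delta < 1
eps = np.zeros(T)
h = np.zeros(T)
h[0] = omega / (1.0 - alpha - delta)          # start from the unconditional variance
for t in range(1, T):
    h[t] = omega + alpha * eps[t - 1] ** 2 + delta * h[t - 1]
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()

# Excess kurtosis above 3 reflects the fat tails generated by the ARCH effect.
kurt = ((eps - eps.mean()) ** 4).mean() / eps.var() ** 2
print("kurtosis of eps:", kurt)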
To explore ARCH models, let us start with the following AR(1) model, which could represent an asset price,

y_t = α y_{t-1} + ε_t,    (17.10)

with the one-step-ahead conditional expectation

E_t(y_{t+1} | y_t) = α y_t.    (17.11)

We can see that while the conditional expectation of y_{t+1} depends on the information set I_t = {y_t}, both the conditional variance (Var_t) and the unconditional variance (Var) do not depend on I_t = {y_t}.
If we extend the forecasts k periods ahead we get, by repeated substitution,

y_{t+k} = α^k y_t + Σ_{i=1}^{k} α^{k-i} ε_{t+i}.    (17.15)

The first term is the conditional expectation of y_t k periods ahead. The second term is the forecast error. Hence, the conditional variance of y_t k periods ahead is equal to

Var_t(y_{t+k}) = σ² Σ_{i=1}^{k} α^{2(k-i)}.    (17.16)
It can be seen that the forecast of y_{t+k} depends on the information at time t. The conditional variance, on the other hand, depends on the length of the forecast horizon (k periods into the future), but not on the information set. Nothing says that this conditional variance should be stable. Like the forecast of y_t, it could very well depend on available information as well, and therefore change over time.
So let us turn to the simplest case, where the errors follow an ARCH(1) model. We have the following model: y_t = α y_{t-1} + ε_t, where ε_t ~ D(0, h_t), E(ε_t) = 0, E(ε_t ε_{t-i}) = 0 for i ≠ 0, and h_t = w + γ ε²_{t-1}.
The process is assumed to be stable, |α| < 1, and since ε²_{t-1} is positive we must have w > 0 and γ ≥ 0. Notice that the errors are not autocorrelated, but at the same time they are not independent, since they are correlated in higher moments through the ARCH effect. Thus, we cannot assume that the errors really are normally distributed. If we choose to use the normal distribution as a basis for ML estimation, this is only an approximation. (As an alternative we could think of using the t-distribution, since the distribution of the errors tends to have fatter tails than the normal.) Looking at the conditional expectations of the mean and the variance of this process, E_t(y_{t+1} | y_t) = α y_t and Var_t(y_{t+1} | y_t) = h_{t+1} = w + γ(y_t - α y_{t-1})².
We can see that both depend on the available information at time t. Especially, it should be noticed that the conditional variance of y_{t+1} is increased by both positive and negative shocks to y_t.
Extending the conditional variance expression k periods ahead, as above, we get

Var_t(y_{t+k} | y_t) = Σ_{i=1}^{k} α^{2(k-i)} E_t(h_{t+i}).    (17.17)
Substituting for h_t, using the unconditional variance σ²,

h_t = (1 - γ) σ² + γ ε²_{t-1},    (17.21)

we get the relationship between the conditional and the unconditional variances of y_t. The expected value of h_t in any future period t + i is

E(h_{t+i}) = σ² + γ E[h_{t+i-1} - σ²].    (17.22)

The first term on the RHS is the long-run unconditional forecast variance of y_t. The second term represents the memory in the process, given by the presence of (h_{t+i-1} - σ²). If γ < 1, the influence of (h_{t+i-1} - σ²) will die out in the long run and the second term vanishes. Thus, for long-run forecasts it is only the unconditional forecast variance which is of importance. Under the assumption γ < 1, the memory in the ARCH effect dies out. (Below we will relax this assumption and allow for unit roots in the ARCH process.)
ARCH models represent a class of models where the variance is changing over time in a systematic way. Let us now define different types of ARCH models. In all these models there is always a mean equation, which must be correctly specified for the ARCH process to be modelled correctly.
1) ARCH(q): the ARCH model of order q,

h_t = ω + Σ_{i=1}^{q} α_i ε²_{t-i} = ω + A(L) ε²_t.    (17.24)
This is the basic ARCH model, from which we now introduce different effects.
2) GARCH(q, p): Generalized ARCH models.
If q is large, it is possible to get a more parsimonious representation by adding lagged h_t to the model. This is like using ARMA instead of AR models. A GARCH(q, p) model is

h_t = ω + Σ_{i=1}^{q} α_i ε²_{t-i} + Σ_{i=1}^{p} δ_i h_{t-i},

where p ≥ 0, q > 0, ω > 0, α_i ≥ 0, and δ_i ≥ 0. The sum of the estimated parameters, Σ α_i + Σ δ_i, shows the memory of the process. A sum equal to unity indicates that shocks to the variance have permanent effects, as in a random walk model. A high value of the sum, but less than unity, indicates a long memory process: it takes a long time before shocks to the variance disappear.
If the roots of [1 - B(L)] = 0 are outside the unit circle the process is invertible and
h_t = ω [1 - B(L)]^{-1} + A(L)[1 - B(L)]^{-1} ε²_t    (17.26)
    = ω [1 - Σ_{i=1}^{p} δ_i]^{-1} + Σ_{i=1}^{∞} d_i ε²_{t-i}    (17.27)
    = a + D(L) ε²_t,  an ARCH(∞) process.    (17.28)

If D(L) < 1 then GARCH = ARCH(∞). Moreover, if the long-run solution of the model, B(1), is < 1, the d_i will decrease for all i > max(p, q).
GARCH models are standard tools, in particular for modelling foreign exchange rate markets and financial market data. Often the GARCH(1,1) is the preferred choice. GARCH captures some empirical observations quite well. The distributions of many financial series display fatter tails than the standard normal distribution, and GARCH models in combination with the assumption of a normal distribution for the residual can generate such distributions. However, many series, like foreign exchange rates, display both fatter tails and leptokurtosis (the peak of the distribution is "higher" than the normal). A GARCH process combined with the assumption that the errors follow the t-distribution can generate this type of observed data.
Before continuing with different ARCH models, we can look at an alternative formulation of ARCH models which shows their similarity with ordinary time series models. Define the innovations in the conditional variance as

v_t = ε²_t - h_t.    (17.29)

Substituting h_t = ε²_t - v_t into the GARCH model gives

[1 - B(L)] ε²_t = ω + A(L) ε²_t + [1 - B(L)] v_t,    (17.31)

and

[1 - B(L) - A(L)] ε²_t = ω + [1 - B(L)] v_t,    (17.32)

which is an ARMA process in ε²_t. This shows us that we can identify a GARCH process using the same tools as for an ARMA model, that is, by looking at the autocorrelations and partial autocorrelations of ε̂²_t estimated from OLS.
Solving for the GARCH(1,1) model,

ε²_t = ω + (α_1 + δ_1) ε²_{t-1} - δ_1 v_{t-1} + v_t.    (17.33)

If α_1 + δ_1 = 1, or Σ(α_i + δ_i) = 1 in the GARCH(q, p) model, we get what is called an integrated GARCH model.
There exist various ways of "putting" the variance "back" into the mean equation. The example above assumes that it is the standard error which is the interesting variable in the mean equation.
6) IGARCH: Integrated GARCH.
When the coefficients sum to unity we get a model with extremely long memory (similar to the random walk model). Unlike the cases discussed earlier, the shocks to the variance will not die out; current information remains important for all future forecasts. We talk about an integrated variance and persistence in variance. A significant constant term in a GARCH process can be understood as mean reversion of the variance. If the variance is not mean-reverting, integrated GARCH is an alternative: in a GARCH(1,1) process one can set the constant to zero and restrict the two parameters to sum to unity.
7) EGARCH: Exponential GARCH and ARCH models (exponential because of the logs of the variables in the GARCH model). These models have the interesting characteristic that they allow for different reactions to negative and positive shocks, a phenomenon observed in many financial markets. In the output, the first lagged residual indicates the effect of a positive shock, while the second lagged residual (in absolute terms) indicates the effect of a negative shock.
Let us now turn to the estimation of ARCH and GARCH models. The main problem is the distribution of the error terms; in general they are not normally distributed. The most used alternatives are the t-distribution and the gamma distribution. In applications to finance and foreign exchange rates a t-distribution is often motivated by the fact that the empirical distributions of these variables display fatter tails than the normal distribution.
If we assume that the residuals of the model follow a normal distribution, the conditional distribution is ε_t | I_{t-1} ~ N(0, h_t). Using that assumption, the following log likelihood function is maximised,

log L = -(T/2) log 2π - (1/2) Σ_{t=1}^{T} log h_t - (1/2) Σ_{t=1}^{T} ε²_t / h_t.    (17.36)

Notice that there are two equations involved here, the mean equation and the variance equation. The process is correctly modelled only when both equations are correctly modelled.
To estimate ARCH and GARCH processes, non-standard algorithms are generally needed. If y_{t-i} is among the regressors, some iterative method is always required. (GAUSS, RATS and SAS provide such facilities.) There are also special programs which deal with ARCH, GARCH and multivariate ARCH. The research strategy is to begin by testing for ARCH with standard test procedures. The following LM test for ARCH of order q is an example,

ε̂²_t = γ_1 ε̂²_{t-1} + γ_2 ε̂²_{t-2} + ... + γ_q ε̂²_{t-q} + φ y_t + v_t,    (17.37)

where T·R² ~ χ²(q). Notice that this requires that E(ε_t) = 0 and E(ε_t ε_{t-i}) = 0 for i ≠ 0.
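The LM test amounts to one auxiliary OLS regression; a minimal sketch is given below, applied to a placeholder residual series (the function name, the choice q = 4 and the white noise input are assumptions for illustration). statsmodels also provides a packaged version, het_arch, in statsmodels.stats.diagnostic.

import numpy as np
import statsmodels.api as sm
from scipy import stats

def arch_lm_test(resid, q):
    u2 = np.asarray(resid) ** 2
    Y = u2[q:]
    X = sm.add_constant(np.column_stack([u2[q - i:-i] for i in range(1, q + 1)]))
    r2 = sm.OLS(Y, X).fit().rsquared          # squared residuals on q own lags
    lm = len(Y) * r2                          # T*R^2, chi-square(q) under the null
    return lm, 1.0 - stats.chi2.cdf(lm, q)

rng = np.random.default_rng(9)
print(arch_lm_test(rng.standard_normal(500), q=4))   # white noise: no ARCH expected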
If ARCH is found, or suspected, use standard time series techniques to identify the process. The specification of an ARCH model can be tested by Lagrange multiplier tests or likelihood ratio tests. As in time series modelling, the Box-Ljung test on the estimated residuals from an ARCH equation serves as a misspecification test. ARCH-type processes are seldom found in low frequency data; high frequency data is generally needed to observe these effects: daily, weekly, sometimes monthly data, but hardly ever quarterly or yearly data.
Finally, remember two things. First, ARCH effects imply thicker tails than the standard normal distribution. It is not obvious that the normal distribution should be used; on the other hand, there is no obvious alternative either. Often the normal distribution is the best approximation, unless there is some other information. One example of other information is that some series are leptokurtic, with a higher peak than the normal, in combination with fat tails; in that case the t-distribution might be an alternative. Thus, using the normal density function is often an approximation. Second, correct inference on ARCH effects builds upon a correct specification of the mean equation. Misspecification tests of the mean equation are therefore necessary.
Consider a model where y_t is driven by the expected value of x_{t+1},

y_t = β E{x_{t+1} | I_t} + e_t, or
y_t = β xᵉ_t + e_t,    (18.1)

where xᵉ_t is the expected value of the variable x_t.1
Without given values of the expectation there are two types of common mistakes in econometric models of expectations-driven stochastic processes. The first mistake is to substitute xᵉ_t with the observed value x_t. This leads to an errors-in-variables problem, since x_t = xᵉ_t + v_t, where E(v_t) = 0. The errors-in-variables problem implies that β will not be estimated correctly; OLS is inconsistent for the estimation of the original parameter.
The second mistake is to model the process for x_t and substitute this process into 18.1. Assume that the variable x_t follows an AR(2) process, x_t = a_1 x_{t-1} + a_2 x_{t-2} + v_t. Substituting this into 18.1 gives

y_t = β a_1 x_{t-1} + β a_2 x_{t-2} + e_t = π_1 x_{t-1} + π_2 x_{t-2} + e_t.    (18.2)
This estimated model also gives the wrong results if we are interested in estimating the (deep) behavioural parameter β. The variables x_{t-1} and x_{t-2} are not weakly exogenous for the parameter of interest (β) in this case. The estimated parameters will be a mixture of the deep behavioural parameter and the parameters of the expectations generating process (a_1 and a_2).
Not only are the estimates biased, but policy conclusions based on this estimated model will also be misleading. If the parameters of the marginal model (a_1 and a_2) describe some policy reaction function, say a particular type of money supply rule, then changing this rule, i.e. changing a_1 and a_2, will also change π_1 and π_2. This is a typical example of when super exogeneity does not hold, and when an estimated model cannot be used to form policy recommendations.
What is the solution to this dilemma of estimating "deep" behavioural parameters, in order to understand the working of the economy better?
1. One conclusion is that econometrics will not work. The problems of correctly specifying the expectations process, in combination with short samples, make it impossible to use econometrics to estimate "deep" parameters. A better alternative is to construct micro-based theoretical models and simulate these models (for example, using calibration techniques).
2. Sims's solution was to advocate VAR models, and avoid estimating "deep" parameters. VAR models can then be used to increase our understanding of the economy, and to simulate the consequences of unpredictable events, like monetary or fiscal policy shocks, in order to optimize policy.
3. Though the rational expectations critique (Lucas, Sims and others) seems to be devastating for structural econometric modelling, the critique has yet to be proven. In surprisingly many situations, policy changes appear to have small effects on estimated equations, e.g. the effects of the switch in monetary policy in the UK in the early 1980s.
If the expectation is replaced by a generated forecast x̂_t, the model becomes

y_t = β x̂_t + u_t, where u_t = e_t - β(x̂_t - xᵉ_t) = e_t - β(v_t - v̂_t).    (18.3)
(To be completed)
There are several ways to test models involving expectations:
1. Test if the difference between the expectation and the outcome is a martingale difference process, conditional on assumptions regarding risk premiums.
2. Test for "news". Under the assumption of rational expectations, the expectations-driven variable should only react to the unpredictable event, "news", but not to events that can be predicted. These assumptions are directly testable as soon as we have a forecasting model for xᵉ_t.
3. Variance bounds tests. Again, given xᵉ_t, it follows that the variance of y_t in equation 18.1 must be higher than the variance of xᵉ_t.
Encompassing tests
If a model based on taking account of assumed rational expectations behaviour is "the correct model", it follows that this model should encompass other models which lack this feature. Thus, encompassing tests can be used to discriminate between models based on rational expectations and other models.
Rethink the problem. Have you forgotten some important explanatory variable?
Look for outliers and test their effects. Use dummies, trends etc. if they can be motivated. Look for structural breaks; consider the sample size.
Use first (and/or second) differences instead, to get a model with only stationary I(0) variables that leads to estimated parameters with well-defined distributions. You have to conclude that your model might not be good for long-run analysis.
Continue with the modelling process to get at least the least bad of all possible models. If possible, show that there may be strong a priori information that justifies the model. Add that cointegration is only an asymptotic result, and that your sample is too short.
Consider stopping the modelling, and conclude that the absence of cointegration is an interesting conclusion in itself! (Data problems, wrong theory, missing explanatory factors etc.) Do not waste too much time on a problem where the answers will depend on ad hoc assumptions concerning distributions, or on unstable results which will be totally model dependent.
information criteria for each model. Then, when you press "Progress", you will see both F-tests for lag order and information criteria for the different VAR models you estimated.
Study outliers. Use dummies and trends to get white noise, but remember that they should be motivated.
Rethink the problem or stop. RESET test! (Perhaps you should try to condition on some other variable instead?)
Is the equation in line with what you think can be an economically meaningful long-run equilibrium? Check signs and sizes of parameters.
Establish the stability of the conditional model without using ad hoc trends or dummies (= criterion for stability).
Test for instability in the marginal model. If it is unstable while the conditional model is stable, you have super exogeneity. If the marginal model is unstable you can go one step further by forcing the marginal model to be stable, by imposing trends and dummies in such a way that it becomes stable. Then put these trends and dummies into the conditional model and test whether they are significant there. If not, you have super exogeneity, and can reject parts of the assumptions in the rational expectations theory.
X. STOP when you find a model that is consistent with the data chosen, and where the parameters make "economic sense".
In other words, "a well-defined statistical model".
That is a model with white noise innovation residuals and stable parameters, which also encompasses all other rival models. Encompassing means that your model explains other models, picks up more of the variation in the dependent variable, and has an economic meaning.
Be open-minded and inform the reader of the tests and the problems you have found. Don't try to prove things which can easily be rejected by a simple test. The rule is to minimize the number of assumptions behind your model, and remember that the errors are the outcome of the formulation of the model.
Anderson, T.W. (1971) The Statistical Analysis of Time Series, John Wiley & Sons, New York.
Anderson, T.W. (1984) An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York.
Banerjee, A., J. Dolado, J.W.Galbraith and D.F. Hendry, (1993) Cointegration,
Error-Correction and the Econometric Analysis of Non-stationary Data, (Oxford
University Press, Oxford).
Baillie, Richard T. and Tim Bollerslev (1994) "The Long Memory of the Forward Premium," Journal of International Money and Finance 13 (5), p. 565-571.
Baillie, Richard T., Tim Bollerslev and Hans Ole Mikkelsen (1996) "Fractionally Integrated Generalized Autoregressive Conditional Heteroskedasticity," Journal of Econometrics 74, 3-30.
Banerjee, A., R.L. Lumsdaine and J.H. Stock (1992) "Recursive and Sequential Tests of the Unit Root and Trend Break Hypothesis: Theory and International Evidence," Journal of Business and Economic Statistics ?.
Cheung, Y. and K. Lai (1993), Finite Sample Sizes of Johansen’s Likelihood
Ratio Tests for Cointegration, Oxford Bulletin of Economics and Statistics 55, p.
313-328.
Cheung, Y. and K. Lai (1995) “A Search for Long Memory in International
Stock Markets Returns,” Journal of International Money and Finance 14 (4),
p.597-615.
Davidson, James (1994) Stochastic Limit Theory, Oxford University Press, Oxford.
Dickey, D. and W.A. Fuller (1979), Distribution of the Estimators for Au-
toregressive Time Series with a Unit Root, Journal of the American Statistical
Association 74.
Diebold, F.X. and G.D. Rudebusch (1989) "Long Memory and Persistence in Aggregate Output," Journal of Monetary Economics 24 (September), p. 189-209.
Eatwell, J., M. Milgate and P. Newman, eds. (1990) Econometrics (Macmillan, London).
Eatwell, J., M. Milgate and P. Newman, eds. (1990) Time Series and Statistics (Macmillan, London).
Engle, Robert F. ed. (1995) ARCH Selected Readings, Oxford University Press,
Oxford.
Engle, R.F. and C.W.J. Granger, eds. (1991), Long-Run Economic Relation-
ships. Readings in Cointegration, (Oxford University Press, Oxford).
Engle, R.F. and B.S. Yoo (1991) “Cointegrated Economic Time Series: An
Overview with New Results, in R.F Engle and C.W. Granger, eds., Long-Run
Economic Relationships. Readings In Cointegration (Oxford University Press,
Oxford).
Ericsson, Neil R. and John S. Irons (1994) Testing Exogeneity, Oxford Univer-
sity Press, Oxford.
Fuller, Wayne A. (1996) Introduction to Statistical Time Series, John Wiley & Sons, New York.
Freund, J.E. (1972) Mathematical Statistics, 2nd ed. (Prentice-Hall, London).
Granger and Newbold (1986), Forecasting Economic Time Series, (Academic
Press, San Diego).
Granger, C.W.J. and T. Lee (1989) Multicointegration, Advances in Econo-
metrics, 8, 71-84.
Hamilton, James D. (1994) Time Series Analysis, Princeton University Press, Princeton, New Jersey.
Hargreaves, Colin P., ed. (1994) Nonstationary Time Series Analysis and Cointegration, Oxford University Press, Oxford.
Harvey, A. (1990) The Econometric Analysis of Time Series (Philip Allan, New York).
Hendry, David F. (1995) Dynamic Econometrics, Oxford University Press, Ox-
ford.
Hylleberg, Svend (1992) Modelling Seasonality, Oxford University Press, Ox-
ford.
Johansen, Sören (1995) Likelihood-Based Inference in Cointegrated Vector Au-
toregressive Models, Oxford University Press, Oxford.
Johnston, J. (1984) Econometric Methods (McGraw-Hill, Singapore).
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt and Y. Shin (1992) "Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root," Journal of Econometrics 54, p. 159-178.
Lo, Andrew W. (1991) "Long-Term Memory in Stock Market Prices," Econometrica 59 (5, September), p. 1279-1313.
Maddala, G.S. (1988) Introduction to Econometrics (Macmillan, New York).
Morrison, D.F. (1967) Multivariate Statistical Methods, McGraw-Hill, New
York).
Pagan, A.R. and M.R. Wickens (1989) ”Econometrics: A Survey,” Economic
Journal, 1-113.
Park, J.Y. (1990), ”Testing for Unit Roots and Cointegration by Variable Ad-
dition,”in T. B. Fomby and G.F. Rhodes (eds.) Co-integration, Spurious Regres-
sions, and Unit Roots: Advances in Econometrics 8, JAI Press, New York.
Perron, Pierre (1989) "The Great Crash, the Oil Price Shock and the Unit Root Hypothesis," Econometrica 57, 1361-1401.
Phillips, P.C.B. (1988) "Reflections on Econometric Methodology," The Economic Record - Symposium on Econometric Methodology, December, 344-359.
Sjöö, Boo (2000) Testing for Unit Roots and Cointegration, memo.
Sowell, F.B. (1992) “Modeling Long-Memory Behavior with the Fractional
ARMA Model,” Journal of Monetary Economics 29 (April),p. 277-302.
Spanos, A. (1986) Statistical Foundations of Econometric Modelling (Cam-
bridge University Press, Cambridge).
Wei, William W.S. (1990) Time Series Analysis. Univariate and Multivariate
Methods, (Addison-Wesley Publishing Company, Redwood City).
20.1 APPENDIX 1
cause monthly cycles in the data.1 Smoothing methods, of course, are closely related to spectral analysis. In this appendix we concentrate on two filters, or lag windows, which represent the "best", or most commonly used, methods for time series in the time domain.
Start from a time series, r_t. What we are looking for is some weights b_i such that the filtered series x_t is free of low frequency components,

x_t = Σ_{i=-k}^{+k} b_i r_{t+i}.    (20.1)

In this formula the window is applied both backwards and forwards, implying a combination of backward and forward looking behaviour. Whether this is a good or a bad thing depends totally on the series at hand, and is left to the judgment of the econometrician. The alternative is to let the window end at time i = 0. The literature is filled with methods for calculating the weights b_i; in this appendix we will look at the two most commonly used: the Parzen window and the Tukey-Hanning window.
The Parzen window is calculated using the following weights,

w_i = 1 - 6(|i|/k)² + 6(|i|/k)³,   for |i| ≤ k/2,
w_i = 2(1 - |i|/k)³,               for k/2 ≤ |i| ≤ k,
w_i = 0,                           for |i| > k,

where k is the size of the lag window. The Parzen window tries to fit a third-degree polynomial to the original series.
An alternative is the so-called Tukey-Hanning window, calculated as

w_i = (1/2)[1 + cos(π i / k)],     for |i| ≤ k,
w_i = 0,                           for |i| > k.
Like the Parzen window, the weights need to be normalized. Under optimal conditions, that is, the correct identification of the underlying cycles, the difference between x_t and r_t will appear normally distributed. The problem is to determine the bandwidth, the size of the window, or k in the formula above. Unfortunately there is no easy way to determine this in practice. Choosing the size of the lag window involves a choice between low bias in the mean and a high variance of the smoothed series. The larger the window, the smaller the variance but the higher the bias. In practice, make sure that the weights at the end of the window are close to zero, and then judge the best fit by comparing x_t - r_t. As a rule of thumb, choose a bandwidth equal to N^{2/5}, the number of observations (N) raised to the power of 2 over 5. An alternative rule is to set the bandwidth equal to N^{1/4}, or to make a decision based on the last significant autocorrelation. Since the choice of the window is always ad hoc in some sense, great care is needed if the smoothed series is going to be used to reveal correlations of "great economic consequence".
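The Parzen weights above translate directly into a short filtering routine. The window size k, the simulated series and the normalization of the weights are illustrative assumptions; in practice k should be chosen along the lines discussed above.

import numpy as np

def parzen_weights(k):
    i = np.arange(-k, k + 1)
    a = np.abs(i) / k
    w = np.where(a <= 0.5, 1 - 6 * a**2 + 6 * a**3, 2 * (1 - a)**3)
    return w / w.sum()                      # normalize so the weights sum to one

rng = np.random.default_rng(10)
r = np.cumsum(rng.standard_normal(300))     # placeholder series r_t
k = 12
x = np.convolve(r, parzen_weights(k), mode="same")   # two-sided filtered series x_t
print(r[:3], x[:3])                         # note: the ends of x suffer edge effects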
APPENDIX II
Testing the Random Walk Hypothesis using the Variance Ratio Test.
For a random walk, x_t = x_{t-1} + ε_t, where ε_t ~ NID(0, σ²), we have that the variance of x_t is σ²t and that the autocovariance function is cov(x_t, x_{t-k}) = (t - k)σ². It follows that the variance of a one-period difference is var(x_t - x_{t-1}) = σ²_1 and that var(x_t - x_{t-k}) = kσ²_1. Defining σ²_k = k^{-1} var(x_t - x_{t-k}), for a random walk the estimated variance ratio VR(k) = σ̂²_k / σ̂²_1 should not be significantly different from one. The estimated (unbiased)
1 To be clear, we are not saying that daily interest rates necessarily contain monthly cycles,
only that it might be the case. One example is daily observations of the Swedish overnight
interbank rate.
autocovariances are given, for k = 1, as

σ̂²_1 = (T - 1)^{-1} Σ_{t=1}^{T} (x_t - x_{t-1} - μ̂)²,    (20.2)

and, under homoskedasticity, the asymptotic variance of VR(k) is

φ(k) = 2(2k - 1)(k - 1) / (3kT).    (20.4)
Under these assumptions a test statistic is given as

Z(k) = [VR(k) - 1] / [φ(k)]^{1/2} →a N(0, 1),    (20.5)

where the heteroskedasticity-robust version uses the weights

δ̂(j) = Σ_{t=j+1}^{T} (x_t - x_{t-1} - μ̂)² (x_{t-j} - x_{t-j-1} - μ̂)² / [Σ_{t=1}^{T} (x_t - x_{t-1} - μ̂)²]²,    (20.7)

and is given by

Z*(k) = [VR(k) - 1] / [φ*(k)]^{1/2} →a N(0, 1).    (20.8)
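The homoskedastic version of the test can be written in a few lines; the function below is a sketch under the assumptions of the formulas above (the choice k = 5 and the simulated random walk are for illustration only).

import numpy as np

def variance_ratio(x, k):
    x = np.asarray(x, dtype=float)
    T = len(x) - 1
    mu = (x[-1] - x[0]) / T                            # mean one-period change
    var1 = np.sum((np.diff(x) - mu) ** 2) / (T - 1)    # one-period variance
    diffk = x[k:] - x[:-k]                             # k-period differences
    vark = np.sum((diffk - k * mu) ** 2) / (k * (T - k + 1))
    vr = vark / var1
    phi = 2 * (2 * k - 1) * (k - 1) / (3 * k * T)      # asymptotic variance of VR(k)
    z = (vr - 1) / np.sqrt(phi)
    return vr, z

rng = np.random.default_rng(11)
walk = np.cumsum(rng.standard_normal(1000))
print(variance_ratio(walk, k=5))   # VR close to one and |Z| small under the random walk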
When dealing with random variables and series of data, there are some operators that simplify the work. This chapter presents the rules for some common operators applied
to random variables and series of observations. These are the expectations operator, the variance operator, the covariance operator, the lag operator, the difference operator, and the sum operator.2 The formal proofs behind these operators are not given; instead the chapter states the basic rules for using the operators.
All operators serve the basic purpose of simplifying the calculations and communication involving random variables. Take the expectations operator (E) as an example. Writing E(x_t) means the same as "I will calculate the mean (or the first moment) of the observations on the random variable X̃.3 But I am not telling exactly which specific estimator I would be using, if I were to estimate the mean from empirical data, because in this context it is not important."
One important use of operators is in investigating the properties of estimators under different assumptions concerning the underlying process: for instance, the properties of the OLS estimator when the explanatory variables are stochastic, when the variables in the model are trending, etc.
The first operator is the expectations operator. This is a linear operator and is therefore easy to apply, as shown by the following rules. In the following, let $c$ and $k$ be two non-random constants, let $\mu_i$ be the mean of variable $i$, and let $\sigma_{ij}$ be the covariance between variable $i$ and variable $j$. It follows that,
$$E(c) = c.$$
$$E(c\tilde{X}) = cE(\tilde{X}) = c\mu_x.$$
$$E(k + c\tilde{X}) = k + cE(\tilde{X}) = k + c\mu_x.$$
$$E(\tilde{X} + \tilde{Y}) = E(\tilde{X}) + E(\tilde{Y}) = \mu_x + \mu_y.$$
$$E(\tilde{X}\tilde{Y}) = E(\tilde{X})E(\tilde{Y}) + \mathrm{cov}(\tilde{X}, \tilde{Y}) = \mu_x\mu_y + \sigma_{xy}.$$
$$E(\tilde{X}^2) = E(\tilde{X})E(\tilde{X}) + \mathrm{var}(\tilde{X}) = \mu_x^2 + \sigma_x^2.$$
The expectations operator is linear and straightforward to use, with one important exception: the expectation of a ratio. This is an important exception since it represents a quite common problem.
$$E\!\left(\frac{\tilde{Y}}{\tilde{X}}\right) \text{ is not equal to } \frac{E(\tilde{Y})}{E(\tilde{X})}.$$
The problem is that the numerator and the denominator are not necessarily independent. In this situation it is necessary to use the $\mathrm{plim}$ operator, or alternatively to let the number of observations go to infinity and use convergence in probability or distribution to analyze the outcome.
In the derivation of the OLS estimator, the following transformation is often used, when $\tilde{X}$ is viewed as given:
$$E\!\left(\frac{\tilde{Y}}{\tilde{X}}\right) = E\!\left(\frac{1}{\tilde{X}}\tilde{Y}\right) = E(\tilde{W}\tilde{Y}).$$
A similar problem occurs in financial economics. If $F$ is the forward foreign exchange rate and $S$ is the spot rate, then $E(F/S) \neq E(F)/E(S)$. However, $E(\ln F - \ln S) = E(\ln F) - E(\ln S)$.
2 The probability limit operator is introduced in a later chapter.
3 Notice the di¤erence between an estimator and an estimate.
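A small simulation illustrates the point about ratios; the distributions below are arbitrary choices made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=200_000)
x = np.exp(z)                                  # strictly positive denominator
y = 1.0 + 0.5 * x + rng.normal(size=200_000)   # numerator depends on the denominator

print(np.mean(y / x))             # simulated E(Y/X)
print(np.mean(y) / np.mean(x))    # E(Y)/E(X): a clearly different number
```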
For the variance operator, $\mathrm{var}(\cdot)$ or $V(\cdot)$, we have the following rules:
$$\mathrm{var}(c) = 0.$$
$$\mathrm{var}(c\tilde{X}) = c^2\,\mathrm{var}(\tilde{X}) = c^2\sigma_x^2.$$
$$\mathrm{var}(k + c\tilde{X}) = c^2\,\mathrm{var}(\tilde{X}) = c^2\sigma_x^2.$$
$$\mathrm{var}(\tilde{Y} + \tilde{X}) = \mathrm{var}(\tilde{Y}) + \mathrm{var}(\tilde{X}) + 2\,\mathrm{cov}(\tilde{Y}, \tilde{X}) = \sigma_y^2 + \sigma_x^2 + 2\sigma_{yx}.$$
If $\tilde{Y}$ and $\tilde{X}$ are independent, the covariance term is zero and we get
$$\mathrm{var}(\tilde{Y} + \tilde{X}) = \mathrm{var}(\tilde{Y}) + \mathrm{var}(\tilde{X}) = \sigma_y^2 + \sigma_x^2.$$
The covariance operator ($\mathrm{cov}$) has already been used above. It can be thought of as a generalization of the variance operator. Suppose we have two elements of $\tilde{X}$, call them $\tilde{X}_i$ and $\tilde{X}_j$. The elements can be two random variables in a multivariate process, or refer to observations at different times ($i$) and ($j$) of the same univariate time series process. The covariance between $\tilde{X}_i$ and $\tilde{X}_j$ is
$$\mathrm{cov}(\tilde{X}_i, \tilde{X}_j) = E\{[\tilde{X}_i - E(\tilde{X}_i)][\tilde{X}_j - E(\tilde{X}_j)]\} = \sigma_{ij}.$$
[To be completed!]
The covariance matrix of a random variable $\tilde{X}$ with $p$ elements can be defined as
$$
E\{[\tilde{X} - E(\tilde{X})][\tilde{X} - E(\tilde{X})]'\} =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & & \vdots \\
\vdots & \vdots & & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \cdots & \sigma_{pp}
\end{bmatrix}
$$
where $\sigma_{ii} = \sigma_i^2$, the variance of the $i$:th element.
Like the expectations and the variance operators, there are some simple rules. If we add constants $a$ and $b$ to $\tilde{X}_i$ and $\tilde{X}_j$,
$$\mathrm{cov}(\tilde{X}_i + a,\ \tilde{X}_j + b) = \mathrm{cov}(\tilde{X}_i, \tilde{X}_j).$$
If we multiply $\tilde{X}_i$ and $\tilde{X}_j$ by the constants $a$ and $b$ respectively, we get
$$\mathrm{cov}(a\tilde{X}_i,\ b\tilde{X}_j) = ab\,\mathrm{cov}(\tilde{X}_i, \tilde{X}_j).$$
The covariance operator is sometimes also written as $C(\cdot)$.
In the following, $\sum$ represents the sum operator. The basic definition of the sum operator is
$$
\sum_{i=m}^{n} x_i = x_m + x_{m+1} + x_{m+2} + \cdots + x_n, \tag{20.9}
$$
where $m$ and $n$ are integers, and $m \le n$. The important characteristic of the sum operator is that it is linear; all proofs of the following rules of the sum operator build on this fact.
If $k$ is a constant,
$$
\sum_{i=1}^{n} kx_i = k\sum_{i=1}^{n} x_i. \tag{20.10}
$$
Some important rules deal with series of integer numbers, like a deterministic time trend $t = 1, 2, \ldots, T$. These are of interest when dealing with integrated variables and determining the 'order in probability', that is the order of convergence, here indicated with $O(\cdot)$:
$$
\sum_{t=1}^{T} t = 1 + 2 + \cdots + T = (1/2)[T(T+1)] = (1/2)[(T+1)^2 - (T+1)] = O(T^2), \tag{20.11}
$$
$$
\sum_{t=1}^{T} t^2 = 1^2 + 2^2 + \cdots + T^2 = (1/6)[T(T+1)(2T+1)] = (1/3)[(T+1)^3 - (3/2)(T+1)^2 + (1/2)(T+1)] = O(T^3), \tag{20.12}
$$
$$
\sum_{t=1}^{T} t^3 = 1^3 + 2^3 + \cdots + T^3 = (1/4)[T^2(T+1)^2] = (1/4)[(T+1)^4 - 2(T+1)^3 + (T+1)^2] = O(T^4). \tag{20.13}
$$
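A quick numerical check of (20.11)-(20.13), for an arbitrary $T$, can be done in a few lines of Python.

```python
T = 250
t = range(1, T + 1)
print(sum(ti for ti in t),      T * (T + 1) // 2)                # eq. (20.11)
print(sum(ti ** 2 for ti in t), T * (T + 1) * (2 * T + 1) // 6)  # eq. (20.12)
print(sum(ti ** 3 for ti in t), (T * (T + 1) // 2) ** 2)         # eq. (20.13)
```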
For the probability limit operator we have, among other rules,
$$
\mathrm{plim}(x^{-1}) = [\mathrm{plim}(x)]^{-1}, \tag{20.20}
$$
$$
\mathrm{plim}(A^{-1}) = [\mathrm{plim}(A)]^{-1}. \tag{20.23}
$$
These rules hold regardless of whether the variables are independent or not.
The lag operator shifts a series back in time,
$$L^n x_t = x_{t-n}.$$
The difference operator is defined as $\Delta = 1 - L$, such that
$$\Delta x_t = x_t - x_{t-1},$$
$$x_t = \Delta x_t + x_{t-1},$$
or as
$$x_{t-1} = x_t - \Delta x_t.$$
Differencing at higher order is done as
$$\Delta^d x_t = (1 - L)^d x_t.$$
Setting $d = 2$ we get
$$
\Delta^2 x_t = (1 - L)^2 x_t = (1 - 2L + L^2)x_t = x_t - 2x_{t-1} + x_{t-2} = \Delta x_t - \Delta x_{t-1}.
$$
The letter $d$ indicates the order of differencing, which can be an integer such as $-2$, $-1$, $0$, $1$ or $2$. It is also possible to use real numbers, typically between $-1.5$ and $+1.5$. With non-integer differencing we arrive at fractional integration and so-called long-memory series. If variables are expressed in logs, as is typical in time series, the first difference is a close approximation to the percentage growth rate.
The lag operator is sometimes called the backward shift operator and is then indicated with the symbol $B^n$. The difference operator, defined with the backward shift operator, is written as $\nabla^d = (1 - B)^d$. Econometricians tend to use the terms lag operator and difference operator with the symbols above, while time series statisticians often use the backward shift notation.
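As a small illustration of the lag and difference operators in practice, the pandas sketch below verifies that $\Delta^2 x_t$ equals $x_t - 2x_{t-1} + x_{t-2}$ on a simulated series; the data are made up for the example.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.cumsum(np.random.default_rng(2).normal(size=20)))

lag1 = x.shift(1)                           # L x_t = x_{t-1}
d1 = x.diff()                               # (1 - L) x_t = x_t - x_{t-1}
d2 = x.diff().diff()                        # (1 - L)^2 x_t
check = x - 2 * x.shift(1) + x.shift(2)     # x_t - 2 x_{t-1} + x_{t-2}

print(np.allclose(d2.dropna(), check.dropna()))   # True: the two agree
```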