Lecture Notes
Anders Warne
1. Outline
• Macroeconomic background
– Sims (1980)
– Stock and Watson (1988)
• Vector Autoregressions
1. Stationarity vs. nonstationarity
2. Structural models
3. Dynamic experiments
4. Estimation
– Lütkepohl (1991), chapter 2
– Hamilton (1994), chapter 11
– Sims (1980)
– Cooley and LeRoy (1985)
– Runkle (1987)
• Cointegration and Common Trends
– Johansen and Juselius (1990)
– King, Plosser, Stock, and Watson (1991)
– Mellander, Vredin, and Warne (1992)
– Englund, Vredin, and Warne (1994)
– Jacobson, Vredin, and Warne (1996)
These notes will not discuss estimation and inference in structural VARs; the reader is
instead advised to consult the sources listed above. Rather, the main purpose is to explain
terminology and concepts by focusing on a few simple examples. Some familiarity with
ARIMA models is assumed.
Example: Let $Y_t$ and $C_t$ denote (the natural logarithms of) aggregate income and consumption,
respectively. Consider the following version of the permanent income hypothesis (PIH)
for $t = 1, 2, \ldots$:
\[ Y_t = Y_t^p + v_t, \tag{2.1} \]
\[ Y_t^p = \mu_Y + Y_{t-1}^p + u_t, \tag{2.2} \]
\[ C_t = Y_t^p, \tag{2.3} \]
where $(u_t, v_t)$ is iid$(0, \mathrm{Diag}(\sigma_u^2, \sigma_v^2))$ and $Y_0^p$ is fixed. Solving for permanent income, $Y_t^p$, in
terms of $Y_0^p$ and $u_i$ we obtain
\[ Y_t^p = Y_0^p + \mu_Y t + \sum_{i=1}^{t} u_i. \tag{2.4} \]
Substituting (2.4) into (2.1) and (2.3) gives
\[ \begin{aligned}
Y_t &= Y_0^p + \mu_Y t + \sum_{i=1}^{t} u_i + v_t, \\
C_t &= Y_0^p + \mu_Y t + \sum_{i=1}^{t} u_i.
\end{aligned} \tag{2.5} \]
Note: Yt and Ct are nonstationary since, conditional on Y0 , the mean and the variance for
both variables depend on t. For example,
p p
E[Yt |Y0 ] = Y0 + µY t, (2.6)
p
V [Yt |Y0 ] = σu2 t + σv2 . (2.7)
Furthermore, the following transformations of aggregate income and consumption are weakly
stationary¹
\[ \Delta C_t = \mu_Y + u_t, \tag{2.9} \]
\[ C_t - Y_t = -v_t. \tag{2.10} \]
Here, ∆ = 1−L is the first difference operator and L is the lag operator, i.e. Lxt = xt−1 . Tech-
nically, we have found that Yt , Ct are integrated of order 1 (denoted by I(1)) and cointegrated
of order (1,1) (denoted by CI(1,1)). The latter property means that a linear combination of
I(1) variables is I(0) (weakly stationary).
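The two stationary transformations can be checked numerically. The following is a minimal sketch in Python, assuming hand-picked (hypothetical) shock sequences and parameter values; the function name `simulate_pih` is introduced here for illustration only.

```python
from itertools import accumulate

def simulate_pih(mu_y, y0_p, u, v):
    """Build income and consumption paths for t = 1..T from the shocks u_t and v_t."""
    cum_u = list(accumulate(u))  # running sum of permanent shocks
    # Permanent income as in (2.4): Y^p_t = Y^p_0 + mu_y * t + sum_{i<=t} u_i
    yp = [y0_p + mu_y * (t + 1) + cum_u[t] for t in range(len(u))]
    y = [yp[t] + v[t] for t in range(len(u))]  # income, eq. (2.1)
    c = list(yp)                               # consumption, eq. (2.3)
    return y, c

# Hypothetical parameter values and shocks, chosen only for illustration
mu_y, y0_p = 0.5, 10.0
u = [0.3, -0.1, 0.4, 0.0, -0.2]
v = [0.1, 0.2, -0.3, 0.05, 0.0]
y, c = simulate_pih(mu_y, y0_p, u, v)

# The change in consumption equals mu_y + u_t, as in (2.9)
dc = [c[t] - c[t - 1] for t in range(1, len(c))]
# The combination C_t - Y_t equals -v_t, as in (2.10)
gap = [c[t] - y[t] for t in range(len(c))]
```

Neither `dc` nor `gap` contains the accumulated sum of the $u_i$, which is why both are stationary even though the levels are not.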
The term integrated comes from the observation that, e.g., $Y_t$ in (2.5) includes a component
where we sum from 1 to $t$ (discrete integration) over a stationary variable. Since we
sum once over this interval we say that this component is integrated of order 1. The change
in $Y_t$ includes zero such summations and is therefore integrated of order zero.
¹ A time series is said to be weakly stationary if the first and second moments are invariant (in an absolute
sense) with respect to time.
3. Vector Autoregressions
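Equation (2.9) can be rewritten as
\[ C_t = \mu_Y + C_{t-1} + u_t. \tag{3.1} \]
That is, aggregate consumption is described by an AR(1) process. Moreover, equation (2.8)
also gives us that aggregate income is related to consumption according to
\[ Y_t = C_t + v_t. \tag{3.2} \]
Substituting (3.1) into (3.2) gives
\[ \begin{aligned}
Y_t &= \mu_Y + C_{t-1} + u_t + v_t \\
    &= \mu_Y + C_{t-1} + w_t,
\end{aligned} \tag{3.3} \]
where $w_t = u_t + v_t$. Equations (3.3) and (3.1) form a bivariate system in $x_t = (Y_t, C_t)'$ with
innovations $\varepsilon_t = (w_t, u_t)'$, or more compactly
\[ x_t = \mu + \Pi_1 x_{t-1} + \varepsilon_t, \tag{3.5} \]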
• $\varepsilon_t$ and $x_{t-1}$ are uncorrelated. A consequence of this is that $\varepsilon_t = x_t - E[x_t \mid x_{t-1}, x_{t-2}, \ldots]$.
In other words, $\varepsilon_t$ is a Wold innovation, i.e. it represents the new information in $x_t$
relative to its history.
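• The covariance matrix of $\varepsilon_t$ is given by
\[ \begin{aligned}
E[\varepsilon_t \varepsilon_t'] &= \begin{bmatrix} E[w_t^2] & E[w_t u_t] \\ E[w_t u_t] & E[u_t^2] \end{bmatrix} \\
&= \begin{bmatrix} \sigma_u^2 + \sigma_v^2 & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 \end{bmatrix} \equiv \Sigma.
\end{aligned} \tag{3.6} \]
This matrix is positive definite since $a' \Sigma a > 0$ for all $a \in \mathbb{R}^2$: $a \neq 0$.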
• The system in (3.5) is nonstationary since the individual time series, Yt and Ct , are
nonstationary.
• The system in (3.5) is a reduced form, i.e. neither the parameters (µ, Π1 , Σ) nor the
innovations εt have an economic interpretation.
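The positive definiteness of $\Sigma$ in (3.6) can be verified by completing the square. The short derivation below is a sketch that assumes $\sigma_u^2 > 0$ and $\sigma_v^2 > 0$ and writes $a = (a_1, a_2)'$.

```latex
\begin{aligned}
a' \Sigma a &= (\sigma_u^2 + \sigma_v^2) a_1^2 + 2 \sigma_u^2 a_1 a_2 + \sigma_u^2 a_2^2 \\
            &= \sigma_u^2 (a_1 + a_2)^2 + \sigma_v^2 a_1^2 .
\end{aligned}
```

Both terms are nonnegative, and they vanish together only if $a_1 = 0$ and $a_1 + a_2 = 0$, that is, only if $a = 0$.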
\[ B_0 x_t = \gamma + B_1 x_{t-1} + \eta_t, \tag{3.7} \]
where
We can examine how Y and C react to permanent and transitory income shocks by using
equation (2.5). For income we find that a one standard deviation shock at an arbitrary t in
permanent and transitory income, respectively, implies the following responses:
These dynamic functions are called impulse response functions. Notice that permanent in-
come shocks have permanent effects on income and consumption, while transitory income
shocks only have transitory effects on income and no effect on consumption.
We can also study the relative importance of the two shocks through forecast error variance
decompositions. To derive these parameters we first note that income at $t + j$ is given
by
\[ Y_{t+j} = Y_0^p + \mu_Y (t + j) + \sum_{i=1}^{t+j} u_i + v_{t+j}. \]
Hence, the expectation of $Y_{t+j}$ conditional on current ($t$) and past values of $x$ (and the
parameters) is
\[ E[Y_{t+j} \mid x_t, x_{t-1}, \ldots] = Y_0^p + \mu_Y (t + j) + \sum_{i=1}^{t} u_i. \]
The forecast error is the difference between these two expressions, $\sum_{i=t+1}^{t+j} u_i + v_{t+j}$, and its
variance is $j\sigma_u^2 + \sigma_v^2$.
The forecast error variance thus contains two parts; one part is due to permanent income
shocks while the remainder is due to transitory income shocks. The share of the total
forecast error variance explained by permanent income shocks is thus
\[ w_{yp,j} = \frac{j \sigma_u^2}{j \sigma_u^2 + \sigma_v^2}, \]
while the share explained by transitory income shocks is
\[ w_{y\tau,j} = \frac{\sigma_v^2}{j \sigma_u^2 + \sigma_v^2}. \]
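The two shares are simple to tabulate. The sketch below is a minimal Python illustration; the function name `pih_income_fevd` and the variance values are hypothetical and chosen only to show how the permanent share grows with the horizon $j$.

```python
def pih_income_fevd(j, sigma_u2, sigma_v2):
    """Shares of the j-step forecast error variance of income in the PIH example."""
    total = j * sigma_u2 + sigma_v2          # j permanent shocks plus one transitory shock
    permanent_share = j * sigma_u2 / total
    transitory_share = sigma_v2 / total
    return permanent_share, transitory_share

# Hypothetical variances: both shocks equally volatile
shares_by_horizon = {j: pih_income_fevd(j, 1.0, 1.0) for j in (1, 4, 20)}
```

With equal variances the permanent shock explains half of the 1 step ahead variance, and its share approaches one as $j$ grows.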
Remarks:
4. Stability and Stationarity
Let $x_t \in \mathbb{R}^n$ be a vector of random variables generated by the following Gaussian VAR model:
\[ x_t = \mu + \sum_{j=1}^{p} \Pi_j x_{t-j} + \varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{4.1} \]
where $\varepsilon_t$ is iid $N(0, \Sigma)$, while $\Sigma$ is positive definite and $x_0, \ldots, x_{1-p}$ are fixed. The parameter $p$ is called the lag length
(order) and is assumed to be finite.
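Consider first the AR(1) case ($n = p = 1$):
\[ x_t = \mu + \Pi_1 x_{t-1} + \varepsilon_t, \]
\[ (1 - \Pi_1 L) x_t = \mu + \varepsilon_t. \]
If $|\Pi_1| < 1$, then the polynomial $(1 - \Pi_1 z)$ is invertible for all $|z| \leq 1$ and the AR(1) process
is said to be stable. It now follows that
\[ \begin{aligned}
x_t &= (1 - \Pi_1 L)^{-1} (\mu + \varepsilon_t) \\
    &= \sum_{j=0}^{\infty} \Pi_1^j L^j (\mu + \varepsilon_t) \\
    &= \sum_{j=0}^{\infty} \Pi_1^j \mu + \sum_{j=0}^{\infty} \Pi_1^j \varepsilon_{t-j} \\
    &= \mu / (1 - \Pi_1) + \sum_{j=0}^{\infty} \Pi_1^j \varepsilon_{t-j}.
\end{aligned} \]
Note that stability implies
\[ \lim_{j \to \infty} |\Pi_1|^j = 0. \]
The variance of $x_t$ is
\[ \begin{aligned}
V[x_t] &= E\Big[ \Big( \sum_{j=0}^{\infty} \Pi_1^j \varepsilon_{t-j} \Big)^2 \Big] \\
       &= \sum_{j=0}^{\infty} \Pi_1^{2j} E[\varepsilon_{t-j}^2] \\
       &= \Sigma / (1 - \Pi_1^2).
\end{aligned} \]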
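The autocovariances can be computed similarly. Note first that
\[ \begin{aligned}
x_t &= \mu / (1 - \Pi_1) + \sum_{j=0}^{h-1} \Pi_1^j \varepsilon_{t-j} + \sum_{j=h}^{\infty} \Pi_1^j \varepsilon_{t-j} \\
    &= \mu / (1 - \Pi_1) + \sum_{j=0}^{h-1} \Pi_1^j \varepsilon_{t-j} + \sum_{j=0}^{\infty} \Pi_1^{j+h} \varepsilon_{t-h-j}.
\end{aligned} \]
Hence,
\[ \begin{aligned}
C[x_t, x_{t-h}] &= E\Big[ \Big( \sum_{j=0}^{h-1} \Pi_1^j \varepsilon_{t-j} + \sum_{j=0}^{\infty} \Pi_1^{j+h} \varepsilon_{t-h-j} \Big) \Big( \sum_{j=0}^{\infty} \Pi_1^j \varepsilon_{t-h-j} \Big) \Big] \\
&= \sum_{j=0}^{\infty} \Pi_1^{2j+h} E[\varepsilon_{t-h-j}^2] \\
&= \Pi_1^h \sum_{j=0}^{\infty} \Pi_1^{2j} \Sigma \\
&= \Pi_1^h V[x_t].
\end{aligned} \]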
Finally,
Conclusion 1: When xt is generated by a stable (|Π1 | < 1) AR(1) process and εt is iid (0, Σ)
(we have not used the assumption of normality), then xt is also
1. weakly stationary since the first and second moments are invariant with respect to
time
2. ergodic since the dependence between xt and xt−h (in an absolute sense) declines as
the distance h increases.
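The closed forms for the variance and the autocovariances of a stable AR(1) can be compared with truncated versions of the infinite sums. This is a minimal Python sketch; the parameter values and the truncation point `n_terms` are assumptions made for illustration.

```python
def ar1_moments(pi1, sigma2, h, n_terms=500):
    """Truncated-sum approximations of V[x_t] and C[x_t, x_{t-h}] for a stable AR(1)."""
    var = sigma2 * sum(pi1 ** (2 * j) for j in range(n_terms))
    cov = sigma2 * sum(pi1 ** (2 * j + h) for j in range(n_terms))
    return var, cov

# Hypothetical stable parameter values (|pi1| < 1)
pi1, sigma2 = 0.8, 1.5
var, cov3 = ar1_moments(pi1, sigma2, h=3)

closed_var = sigma2 / (1 - pi1 ** 2)   # Sigma / (1 - Pi_1^2)
closed_cov3 = pi1 ** 3 * closed_var    # Pi_1^h V[x_t] with h = 3
```

Neither moment depends on $t$, and the autocovariance shrinks geometrically in $h$, which is the content of Conclusion 1.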
Let us now examine the general case. Consider the matrix polynomial
\[ \Pi(z) = I_n - \sum_{j=1}^{p} \Pi_j z^j, \]
obtained from
\[ \Pi(L) x_t = \mu + \varepsilon_t. \]
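Example: Suppose $n = 2$. Then
\[ \Pi(z) = \begin{bmatrix} \Pi_{11}(z) & \Pi_{12}(z) \\ \Pi_{21}(z) & \Pi_{22}(z) \end{bmatrix}. \]
Now,
\[ \Pi_{ij}(z) = \begin{cases} 1 - \sum_{k=1}^{p} \Pi_{ii,k} z^k & \text{if } j = i, \\ - \sum_{k=1}^{p} \Pi_{ij,k} z^k & \text{otherwise.} \end{cases} \]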
The parameters φk are determined directly from Πij,k . For instance, φ1 = Π11,1 + Π22,1 ,
whereas φ2 = Π11,2 + Π22,2 + Π21,1 Π12,1 − Π11,1 Π22,1 . The third equality above determines
the λi ’s from the φk ’s. Notice that while φk is a real number and unique, λi is a complex
number and typically not unique. That is, we need to use some ordering rule before the λi ’s
can be uniquely determined.
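Conclusion 2: Let $\det[\Pi(z)] = \prod_{i=1}^{np} (1 - \lambda_i z)$, where $|\lambda_{np}| \geq |\lambda_{np-1}| \geq \ldots \geq |\lambda_1| \geq 0$.
Then $\Pi(z)$ is invertible if and only if $|\lambda_{np}| < 1$.
Let $|\lambda_i|$ denote the modulus of $\lambda_i$. Suppose that $\lambda_1 = .5 + .6\imath$, $\lambda_2 = .5 - .6\imath$, where $\imath = \sqrt{-1}$.
Then
\[ |\lambda_1| = \sqrt{.5^2 + .6^2} = .7810 = |\lambda_2|. \]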
An equivalent condition for invertibility is that $\det[\Pi(z)] = 0$ only if $|z| > 1$. The
$z$'s which imply that the determinant is zero are called roots, and this condition states that
all roots must lie outside the unit circle. The $\lambda_i$'s are eigenvalues of the matrix:
\[ \Pi = \begin{bmatrix}
\Pi_1 & \Pi_2 & \cdots & \Pi_{p-1} & \Pi_p \\
I_n & 0 & \cdots & 0 & 0 \\
0 & I_n & & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & I_n & 0
\end{bmatrix}. \]
This matrix is found when we rewrite the VAR(p) system into a VAR(1) system for $X_t =
(x_t', x_{t-1}', \ldots, x_{t-p+1}')'$, an $np \times 1$ vector. In the AR(1) case, $\Pi = \Pi_1 = \lambda_1$ and the invertibility
(stability) condition was found to be $|\Pi_1| < 1$.
The results in Conclusion 1 thus also hold when np > 1. That is, if xt is a stable VAR(p)
process, then xt is weakly stationary and ergodic.
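The link between the $\lambda_i$'s, the companion matrix and the roots can be illustrated with a scalar AR(2), where $n = 1$ and $p = 2$. The sketch below is an assumption-laden illustration in Python: the coefficients $\phi_1 = 1$ and $\phi_2 = -.61$ are chosen here so that the eigenvalues are the pair $.5 \pm .6\imath$ used above, and `ar2_companion_eigenvalues` is a hypothetical helper name.

```python
import cmath

def ar2_companion_eigenvalues(phi1, phi2):
    """Eigenvalues of the companion matrix [[phi1, phi2], [1, 0]] of a scalar AR(2).

    They solve the characteristic equation lam^2 - phi1 * lam - phi2 = 0.
    """
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2

# Coefficients chosen (as an assumption) so that lambda = .5 +/- .6i
phi1, phi2 = 1.0, -0.61
lam1, lam2 = ar2_companion_eigenvalues(phi1, phi2)

moduli = (abs(lam1), abs(lam2))   # both equal sqrt(.61)
roots = (1 / lam1, 1 / lam2)      # zeros of Pi(z) = 1 - phi1 z - phi2 z^2
stable = max(moduli) < 1          # Conclusion 2: largest modulus below one
```

The eigenvalues lie inside the unit circle exactly when the roots $z = 1/\lambda_i$ lie outside it, so the two statements of the stability condition coincide.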
where
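\[ \begin{aligned}
x_t &= B_0^{-1} \gamma + \sum_{j=1}^{p} B_0^{-1} B_j x_{t-j} + B_0^{-1} \eta_t \\
    &= \mu + \sum_{j=1}^{p} \Pi_j x_{t-j} + \varepsilon_t,
\end{aligned} \tag{5.3} \]
where
with $\Sigma = B_0^{-1} \Omega (B_0')^{-1}$ being positive definite since $\Omega$ is positive definite and $B_0$ invertible.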
The general answer is, of course, no. This follows directly from the observation that there
are $n^2$ additional parameters in (5.1) relative to (5.3). Accordingly, if the parameters in (5.3)
are uniquely determined (from the distribution for $x_t$), then to achieve identification of the
parameters in (5.1) it is necessary (but generally not sufficient) to impose $n^2$ restrictions
(identifying assumptions) on its parameters.
Although uniqueness is often at the heart of what econometricians tend to mean by a
structural model, there is no unique definition as to what a structural model is. Let F (x; θ)
be a distribution function for x which depends on a vector of parameters, θ.
Definition 1 (statistics): A structural model for x is given by a function F (x; θ) such that θ
is uniquely determined from the probability distribution for x.
Question 5: Is the VAR model in (5.3) a structural model according to this definition?
The answer is yes. (µ, Π1 , . . . , Πp , Σ) is uniquely determined from the first and second
moments for xt . For example, in the case when p is equal to one and the mean of x is zero
we have that
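\[ \begin{aligned}
\mu &= 0, \\
\Pi_1 &= E[x_t x_{t-1}'] \, E[x_{t-1} x_{t-1}']^{-1}, \\
\Sigma &= E[x_t x_t'] + \Pi_1 E[x_{t-1} x_{t-1}'] \Pi_1' - E[x_t x_{t-1}'] \Pi_1' - \Pi_1 E[x_{t-1} x_t'].
\end{aligned} \]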
Hence, the VAR parameters are uniquely determined from the population moments of x.
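The mapping from population moments back to $(\Pi_1, \Sigma)$ can be traced numerically for a bivariate, zero-mean, stable VAR(1). The following Python sketch is illustrative only: the parameter values are hypothetical, and the second moments are generated by iterating $\Gamma_0 = \Pi_1 \Gamma_0 \Pi_1' + \Sigma$ to its fixed point before applying the formulas above.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def combine(a, b, sign=1.0):
    """Return a + sign * b for 2x2 matrices."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

# Hypothetical stable VAR(1) parameters
pi1 = [[0.5, 0.1], [0.2, 0.3]]
sigma = [[1.0, 0.3], [0.3, 2.0]]

# Population second moments: gamma0 = E[x_t x_t'], gamma1 = E[x_t x_{t-1}']
gamma0 = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(500):
    gamma0 = combine(matmul(matmul(pi1, gamma0), transpose(pi1)), sigma)
gamma1 = matmul(pi1, gamma0)

# Recover the parameters from the moments alone
pi1_hat = matmul(gamma1, inverse(gamma0))
quad = matmul(matmul(pi1_hat, gamma0), transpose(pi1_hat))
sigma_hat = combine(
    combine(combine(gamma0, quad), matmul(gamma1, transpose(pi1_hat)), -1.0),
    matmul(pi1_hat, transpose(gamma1)), -1.0)
```

Up to rounding, `pi1_hat` and `sigma_hat` reproduce `pi1` and `sigma`, which is the sense in which the reduced form is identified.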
Since (γ, B0 , B1 , . . . , Bp , Ω) is not uniquely determined from (µ, Π1 , . . . , Πp , Σ), the answer
must be no. However, once the parameters of (5.1) are uniquely determined, they are indeed
structural in this sense.
An alternative notion of what a structure (or structural model) is, comes from David
Hendry. My understanding of what he means by a structure in time series econometrics is
the following:
Definition 2 (“Hendry”): A structure is a set of features of the data that remain constant
over time, e.g. properties which do not vary across different policy regimes.
An analogy would be a classroom, where the room is a structure whereas the chairs, tables,
students and teachers are not. While this definition has certain appeals from a practical
(empirical) point of view, it is of limited theoretical interest since parameters are usually
considered constant. In other words, the models in (5.1) and in (5.3) are both structures in
Hendry’s sense since the parameters are taken to be constant. Moreover, as with Definition
1, there is basically no economics in Hendry’s idea of what a structure is.
The variables x can here include endogenous as well as exogenous variables, while the
parameters are taken to be constant (over time or cross sections).
What is important in this definition is that “the parameters” have an economic meaning
(that the variables have an economic meaning is implicitly assumed). But this doesn’t mean
that any transformation of the parameters, e.g. φ = f (θ) for a particular function f (·), can
be given an economic interpretation. Note also, that this definition does not say that θ
satisfies Definition 1. In other words, the structural parameters need not be identified!
This definition indicates that a structural model should be useful for, e.g., policy analysis.
In that sense, it is similar to Hendry’s idea of a structure. Moreover, Sims’s definition
suggests that θ is identified, i.e. it satisfies the statistical notion of a structural model.
Finally, in order for the results of the actions to be meaningful to an economist, θ must have
an economic interpretation. Hence, Sims’s definition seems to incorporate all the above
notions of what a structural model is. Moreover, it suggests that (5.1) can be a structural
model whereas (5.3) cannot.²
This definition presumes a time series perspective. It states that a variable x1 can be
causal for x2 if x1 occurs (is realized) before x2 . However, in practice the sampling frequency
of macroeconomic time series is typically (much) lower than the frequency between causal
events, thus making use of this definition somewhat doubtful.
The early structural VAR analyses, e.g. Sims (1980), are based on so called Wold causal
chains with independent innovations (shocks). An economic interpretation is given to the
shocks (the actions, e.g. a monetary policy shock) and to the dynamic responses in the
endogenous variables (impulse response functions and variance decompositions).
To show that Wold causal chains are exactly identifying, note first that
1. B0 is lower triangular (recursive) yields n(n − 1)/2 restrictions.
2. Ω is diagonal (mutually independent innovations) yields n(n − 1)/2 restrictions.
This gives us a total of $n(n-1)$ identifying restrictions. To exactly identify the parameters of
(5.1) we need at least $n$ additional restrictions. These are given by either letting all the diagonal
elements of $B_0$ or of $\Omega$ be equal to unity. For impulse response functions and variance
decompositions, these two choices of the $n$ normalizing assumptions are equivalent.
Consider first the case where $\Omega = I_n$. Then $\Sigma = B_0^{-1} (B_0')^{-1}$. Since $B_0$ is lower triangular,
its inverse is also lower triangular. Let $P$ denote the inverse of $B_0$. With $\Sigma = P P'$ the matrix
$P$ is called the Choleski factor of $\Sigma$ and it can be shown that $P$ is uniquely determined up to
an orthogonal transformation $N$ such that $N$ is diagonal with diagonal elements equal to 1
or −1. That is, $P^* = P N$ is also lower triangular and satisfies $\Sigma = P^* P^{*\prime}$. In plain English
this means that each structural shock is identified up to its sign!
² This statement assumes that “actions” have an economic meaning.
Notice that all pij ’s are real numbers since Σ is assumed to be positive definite. Moreover,
we have chosen the orthogonal matrix N = I2 .
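The sign indeterminacy of the Choleski factor can be seen in a bivariate example. The sketch below is a minimal Python illustration with a hypothetical covariance matrix; `cholesky_2x2` and `times_transpose` are helper names introduced here.

```python
import math

def cholesky_2x2(s11, s12, s22):
    """Lower triangular P with positive diagonal such that P P' = Sigma (Sigma positive definite)."""
    p11 = math.sqrt(s11)
    p21 = s12 / p11
    p22 = math.sqrt(s22 - p21 ** 2)
    return [[p11, 0.0], [p21, p22]]

def times_transpose(p):
    """Return P P' for a 2x2 matrix P."""
    return [[sum(p[i][k] * p[j][k] for k in range(2)) for j in range(2)] for i in range(2)]

# Hypothetical reduced form covariance matrix
sigma = [[4.0, 2.0], [2.0, 3.0]]
p = cholesky_2x2(4.0, 2.0, 3.0)      # corresponds to the choice N = I_2

# A diagonal orthogonal N with entries +1 and -1 flips the sign of the second shock
n_diag = [1.0, -1.0]
p_star = [[p[i][j] * n_diag[j] for j in range(2)] for i in range(2)]
```

Both `p` and `p_star` are lower triangular and reproduce the same $\Sigma$; they differ only in the sign of the second column.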
Consider now the case when we choose to impose the n normalizing assumptions on the
diagonal of B0 . In the n = 2 case we have that
ω11 = σ11 ,
Here we find that ωii > 0 since Σ is positive definite, while the sign of β21 depends on the
sign of the covariance between the two residuals in the reduced form VAR.
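Alternatively, suppose $B_0$ and $\Omega$ are given by
\[ B_0 = \begin{bmatrix} 1 & -\beta_{12} \\ 0 & 1 \end{bmatrix}, \qquad \Omega = \begin{bmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{bmatrix}. \]
In this case
\[ \begin{aligned}
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
&= \begin{bmatrix} 1 & -\beta_{12} \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -\beta_{12} & 1 \end{bmatrix}^{-1} \\
&= \begin{bmatrix} 1 & \beta_{12} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \beta_{12} & 1 \end{bmatrix} \\
&= \begin{bmatrix} \omega_{11} + \beta_{12}^2 \omega_{22} & \beta_{12} \omega_{22} \\ \beta_{12} \omega_{22} & \omega_{22} \end{bmatrix}.
\end{aligned} \]
\[ \omega_{22} = \sigma_{22} \]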
All these choices of $B_0, \Omega$ are observationally equivalent. The first and the second structural
models are equivalent up to a choice of normalization, while the third has very different
implications for the behavior of $x$ except when $\sigma_{12} = 0$.
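Observational equivalence means that different structures imply the same $\Sigma$. The Python sketch below compares a lower and an upper triangular $B_0$ with unit diagonal for one hypothetical $\Sigma$. The closed-form solutions used for $\beta_{21}$, $\beta_{12}$ and the $\omega_{ii}$ are obtained here by matching elements of $\Sigma = B_0^{-1} \Omega (B_0')^{-1}$ and should be read as a worked assumption rather than as formulas quoted from the notes.

```python
def sigma_from_structure(b0_inv, omega_diag):
    """Sigma = B0^{-1} Omega (B0^{-1})' for a diagonal Omega (2x2 case)."""
    return [[sum(b0_inv[i][k] * omega_diag[k] * b0_inv[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]

# Hypothetical reduced form covariance matrix
s11, s12, s22 = 2.0, 0.6, 1.0

# Lower triangular B0 with unit diagonal, so B0^{-1} = [[1, 0], [b21, 1]]
b21 = s12 / s11
lower = sigma_from_structure([[1.0, 0.0], [b21, 1.0]], [s11, s22 - s12 ** 2 / s11])

# Upper triangular B0 with unit diagonal, so B0^{-1} = [[1, b12], [0, 1]]
b12 = s12 / s22
upper = sigma_from_structure([[1.0, b12], [0.0, 1.0]], [s11 - s12 ** 2 / s22, s22])
```

Both `lower` and `upper` equal the original $\Sigma$, so the data cannot discriminate between the two recursive orderings.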
As long as we choose to identify $B_0$ and $\Omega$ from the covariance matrix $\Sigma$, all structures will
be related to the Choleski decomposition of $\Sigma$. For instance, suppose $\Omega = I_n$. Then there
exists an infinite number of orthogonal matrices $N$ such that $B_0 = (P N)^{-1}$. In the bivariate
case, one such orthogonal matrix is:
\[ N = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}. \]
Hence, we no longer have a recursive structure!
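Example: PIH revisited. Remember that the VAR(1) system for $Y_t$ and $C_t$ can be written as
\[ \begin{bmatrix} C_t \\ Y_t \end{bmatrix} = \begin{bmatrix} \mu_Y \\ \mu_Y \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} C_{t-1} \\ Y_{t-1} \end{bmatrix} + \begin{bmatrix} u_t \\ w_t \end{bmatrix}, \tag{5.5} \]
where $w_t = u_t + v_t$, while
\[ \Sigma = \begin{bmatrix} \sigma_u^2 & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_v^2 \end{bmatrix}. \tag{5.6} \]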
where
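Question 7: Given (5.5) and (5.6), can we derive (5.7) from $\Sigma = B_0^{-1} \Omega (B_0')^{-1}$ with $\Omega$ diagonal
and $B_0$ lower triangular with unit diagonal elements?
Accordingly,
\[ \omega_{11} = \sigma_u^2, \qquad \beta_{21} = 1, \qquad \omega_{22} = \sigma_v^2, \]
and
\[ B_0 = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}, \qquad \Omega = \begin{bmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix}. \]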
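Now, suppose we change the ordering of the variables while
\[ B_0 = \begin{bmatrix} 1 & 0 \\ -\beta_{21} & 1 \end{bmatrix}, \qquad \Omega = \begin{bmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{bmatrix}. \]
Notice that $\omega_{ii} > 0$ (as they should be) and that $0 < \beta_{21} < 1$. Moreover, the resulting
structural VAR system is now given by
\[ \begin{bmatrix} 1 & 0 \\ -\beta_{21} & 1 \end{bmatrix} \begin{bmatrix} Y_t \\ C_t \end{bmatrix} = \begin{bmatrix} \mu_Y \\ (1 - \beta_{21}) \mu_Y \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 1 - \beta_{21} \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ C_{t-1} \end{bmatrix} + \begin{bmatrix} u_t + v_t \\ (1 - \beta_{21}) u_t - \beta_{21} v_t \end{bmatrix}. \]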
Hence, the first “structural shock” is the sum of the permanent and the transitory income
shock, while the second is another linear combination of the true structural shocks.
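The parameters of the reordered system can be computed from the shock variances. The Python sketch below assumes the ordering $x_t = (Y_t, C_t)'$ with $\Sigma$ as in (3.6), and solves $\Sigma = B_0^{-1} \Omega (B_0')^{-1}$ element by element; the resulting expressions for $\beta_{21}$ and $\omega_{ii}$ are derived here for illustration, and the variance values are hypothetical.

```python
def reversed_ordering(sigma_u2, sigma_v2):
    """Recursive identification with income ordered first in the PIH example."""
    s11 = sigma_u2 + sigma_v2     # variance of the income innovation w_t = u_t + v_t
    s12 = sigma_u2                # covariance between w_t and u_t
    s22 = sigma_u2                # variance of the consumption innovation u_t
    beta21 = s12 / s11
    omega11 = s11
    omega22 = s22 - s12 ** 2 / s11
    return beta21, omega11, omega22

# Hypothetical variances of the permanent and transitory shocks
sigma_u2, sigma_v2 = 1.0, 3.0
beta21, omega11, omega22 = reversed_ordering(sigma_u2, sigma_v2)

# Covariance between the two constructed shocks, u + v and (1 - beta21) u - beta21 v
cov_shocks = (1 - beta21) * sigma_u2 - beta21 * sigma_v2
# Variance of the second constructed shock
var_second = (1 - beta21) ** 2 * sigma_u2 + beta21 ** 2 * sigma_v2
```

The constructed shocks are uncorrelated and have variances $\omega_{11}$ and $\omega_{22}$, so they satisfy the statistical requirements, yet each one mixes the permanent and the transitory income shock.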
The last case illustrates the Cooley and LeRoy (1985) critique against arbitrary orderings
of the variables when the identifying assumptions are based on Wold causal chains. If the
identifying assumptions do not rely on a particular economic theory, the resulting “structural
shocks” can be pure nonsense shocks. Note, however, that this second example is not
empirically irrelevant. In fact, models of consumption have a long tradition of using these
identifying assumptions; see e.g. Davidson, Hendry, Srba, and Yeo (1978). To be fair, the
context where they have been used is very different from that of structural VARs.
If we interpret the second equation in the above structural VAR model as a consumption
function we find after a bit of algebra that it can be written as
where ψt = (1−β21 )ut −β21 vt . We have already noted that 0 < β21 < 1 and that income and
consumption are CI(1,1) with (Ct −Yt ) being a cointegration relation. The relationship in (5.8)
is consistent with a Keynesian consumption function in the empirical modelling tradition
of the so called LSE school (Sargan, Hendry, etc.). The cointegration relationship would then
have the interpretation of a long run consumption rule (or function). The true values of the
parameters suggest that, ceteris paribus, an increase in current income by 1 percent leads
to an increase in current consumption by less than 1 percent, while consumption over the
long run level in the previous period (Ct−1 > Yt−1 ) leads to a partial decrease in current
consumption.
The assumption which is critical here is that ∆Yt and ψt are uncorrelated. In terms of
the Wold causal chain this means that income is predetermined, i.e. current income does
not depend on current consumption.
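For reference, an error correction form of the consumption equation can be obtained from the second row of the structural VAR with income ordered first. The derivation below is a sketch under that assumption; the arrangement of terms in the original equation (5.8) may differ.

```latex
\begin{aligned}
C_t &= \beta_{21} Y_t + (1 - \beta_{21}) \mu_Y + (1 - \beta_{21}) C_{t-1} + \psi_t , \\
\Delta C_t &= (1 - \beta_{21}) \mu_Y + \beta_{21} \Delta Y_t - \beta_{21} (C_{t-1} - Y_{t-1}) + \psi_t .
\end{aligned}
```

The second line follows by subtracting $C_{t-1}$ from both sides and adding and subtracting $\beta_{21} Y_{t-1}$. The coefficient on $\Delta Y_t$ lies between 0 and 1, and the coefficient on the lagged cointegration relation is negative, in line with the verbal description above.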
To choose between the PIH and the Keynesian consumption function we may turn to
examining overidentifying assumptions. In our example, the PIH implies that consumption
is a random walk (with drift) and is thus consistent with the Hall (1978) version of this
hypothesis. In terms of the VAR in (5.3) this implies 2 restrictions on Π1 . Once we have
established that the data is consistent with these restrictions and we choose to use these
restrictions in our analysis, the consumption function in (5.8) is no longer an interesting
competing theory.
To sum up, we have shown that two sets of identifying assumptions can yield results
which make sense to an economist. The data will not help us choose between these two
structures and we can always find an economist who will argue in favor of one of these
theories over the other. Still, when we attempt to identify a structural model, economic
theory is, in my opinion, the best guide available to us. If competing theories provide
restrictions on the parameters of the reduced form VAR, these may help us choose which
theory is consistent with the data.
The notion that a set of impulses and a propagation mechanism are useful tools when
analysing an economy goes back to Frisch (1933) and Slutzky (1937).
Impulse response analysis addresses the question:
Question 8: How does x react (over time) to a change in one of the shocks?
Suppose that our VAR(p) model is stable so that $x_t$ is weakly stationary. The resulting
VMA representation of the VAR is then
\[ \begin{aligned}
x_t &= \Pi(L)^{-1} (\mu + \varepsilon_t) \\
    &= \delta + C(L) \varepsilon_t,
\end{aligned} \]
where
\[ C(z) = I_n + \sum_{i=1}^{\infty} C_i z^i. \]
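\[ \eta_t = B_0 \varepsilon_t. \tag{6.2} \]
\[ \begin{aligned}
x_t &= \delta + C(L) B_0^{-1} B_0 \varepsilon_t \\
    &= \delta + R(L) \eta_t,
\end{aligned} \tag{6.3} \]
where
\[ R(z) = \sum_{i=0}^{\infty} R_i z^i, \tag{6.4} \]
with
\[ R_i = \begin{cases} B_0^{-1} & \text{if } i = 0, \\ C_i B_0^{-1} & \text{otherwise.} \end{cases} \]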
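Example: Let
\[ I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 \end{bmatrix}, \qquad \Omega^{1/2} = \begin{bmatrix} \sqrt{\omega_{11}} & 0 \\ 0 & \sqrt{\omega_{22}} \end{bmatrix}. \]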
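Using the structural VMA representation in (6.3) we find that the impulse response function
is:
\[ \begin{aligned}
\mathrm{resp}(x_{t^*} \mid \eta_{t^*} = e_j \sqrt{\omega_{jj}}) &= R_0 e_j \sqrt{\omega_{jj}}, \\
\mathrm{resp}(x_{t^*+1} \mid \eta_{t^*} = e_j \sqrt{\omega_{jj}}) &= R_1 e_j \sqrt{\omega_{jj}}, \\
&\;\;\vdots \\
\mathrm{resp}(x_{t^*+i} \mid \eta_{t^*} = e_j \sqrt{\omega_{jj}}) &= R_i e_j \sqrt{\omega_{jj}}.
\end{aligned} \tag{6.6} \]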
for all j ∈ {1, 2} since xt is ergodic. In other words, for weakly stationary VAR(p) models,
the response in x from any shock vanishes in the long run. Hence, we can say that xt is
mean reverting.
Question 10: Is the experiment in equation (6.5) relevant from a statistical point of view?
Shocks at t ∗ and t ∗ +i are independent and, moreover, different structural shocks are
independent. Thus, the experiment is consistent with the assumptions about ηt .
Question 11: Is the experiment in equation (6.5) relevant from an economics point of view?
If the shocks can be given a credible economic interpretation, the answer would be yes.
However, there is no guarantee that the responses in x will be fully consistent with the
economic model which the identification of the shocks is based on. This can occur when the
economic model implies overidentifying restrictions on the parameters of the VAR model
and these restrictions are not consistent with the data.
An important assumption in structural VAR modelling is that the structural shocks are
linear combinations of the residuals in the reduced form VAR model (the so called Wold
innovations). To illustrate the relevance of this assumption, consider the following univariate
process
\[ \eta_t = \alpha \eta_{t-1} + \psi_t - \alpha^{-1} \psi_{t-1}, \tag{6.7} \]
where $|\alpha| < 1$ and $\psi_t \sim$ iid $N(0, \sigma_\psi^2)$. This looks like an ordinary ARMA(1,1) model, where
the AR polynomial is invertible while the MA polynomial is not. Also, the MA coefficient is
equal to the inverse of the AR coefficient. The polynomial $(1 - \alpha^{-1} z)/(1 - \alpha z)$ is called a
Blaschke factor.
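Since the AR polynomial is invertible, it follows that $\eta_t$ is weakly stationary and that its
mean is zero. Moreover, the variance is given by
\[ \begin{aligned}
E[\eta_t^2] &= E[\alpha^2 \eta_{t-1}^2 + \psi_t^2 + \alpha^{-2} \psi_{t-1}^2 + 2 \alpha \eta_{t-1} \psi_t - 2 \eta_{t-1} \psi_{t-1} - 2 \alpha^{-1} \psi_t \psi_{t-1}] \\
&= \alpha^2 E[\eta_{t-1}^2] + \sigma_\psi^2 + \alpha^{-2} \sigma_\psi^2 - 2 \sigma_\psi^2.
\end{aligned} \]
Solving for $\sigma_\eta^2 = E[\eta_t^2] = E[\eta_{t-1}^2]$ gives
\[ \sigma_\eta^2 = \frac{(\alpha^{-2} - 1) \sigma_\psi^2}{1 - \alpha^2}. \tag{6.8} \]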
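Similarly, the first autocovariance is given by
\[ \begin{aligned}
E[\eta_t \eta_{t-1}] &= \alpha \sigma_\eta^2 - \alpha^{-1} \sigma_\psi^2 \\
&= [\alpha (\alpha^{-2} - 1) \sigma_\psi^2 - \alpha^{-1} (1 - \alpha^2) \sigma_\psi^2] / (1 - \alpha^2) \\
&= 0.
\end{aligned} \]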
Accordingly, the parameters $\alpha, \sigma_\psi^2$ cannot be uniquely determined from the distribution for
$\eta$ since this random variable is not serially correlated. Still, for any pair $(\alpha, \sigma_\psi^2)$ consistent
with the population variance of $\eta$, there is a dynamic reaction in $\eta$ from a shock to $\psi$. For
instance, consider the experiment
\[ \psi_t = \begin{cases} \sigma_\psi & \text{if } t = t^*, \\ 0 & \text{if } t > t^*. \end{cases} \]
\[ \mathrm{resp}(\eta_{t^*} \mid \psi_{t^*} = \sigma_\psi) = \sigma_\psi, \]
Question 12: Should we worry about Blaschke factors?
According to Lippi and Reichlin (1993), modern macroeconomic models which are lin-
earized into dynamic systems tend to include noninvertible MA components. While this is
certainly a problem from the point of view of estimating a multivariate ARMA model, we
should keep in mind that noninvertibility of an MA term does not mean that there exists
an AR factor whose coefficient is the inverse of the MA coefficient in question. Still, it em-
phasizes the point made earlier that sound structural VAR analysis should rest on a firm
theoretical basis.
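The Blaschke factor example can be checked numerically by expanding $(1 - \alpha^{-1} z)/(1 - \alpha z)$ into its MA($\infty$) coefficients. The Python sketch below is an illustration under assumed values $\alpha = .5$ and $\sigma_\psi^2 = 1$; the coefficient formula $c_0 = 1$, $c_k = \alpha^{k-1}(\alpha - \alpha^{-1})$ for $k \geq 1$ is derived here from the geometric expansion of $(1 - \alpha z)^{-1}$.

```python
def blaschke_ma_coefficients(alpha, n_terms):
    """Coefficients c_k of (1 - z / alpha) / (1 - alpha z) = sum_k c_k z^k, truncated."""
    return [1.0] + [alpha ** (k - 1) * (alpha - 1.0 / alpha) for k in range(1, n_terms)]

def autocovariance(coeffs, lag, sigma2):
    """Autocovariance of eta at the given lag implied by the MA coefficients."""
    return sigma2 * sum(coeffs[k] * coeffs[k + lag] for k in range(len(coeffs) - lag))

# Assumed parameter values for illustration
alpha, sigma_psi2 = 0.5, 1.0
c = blaschke_ma_coefficients(alpha, 200)

var_eta = autocovariance(c, 0, sigma_psi2)   # compare with sigma_psi2 / alpha^2
acov1 = autocovariance(c, 1, sigma_psi2)     # zero up to rounding
acov2 = autocovariance(c, 2, sigma_psi2)     # zero up to rounding
```

The coefficients `c` are the responses in $\eta$ to a unit shock in $\psi$ and they are clearly nonzero after impact, yet the autocovariances of $\eta$ vanish. The dynamics are therefore invisible in the second moments of $\eta$.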
A variance decomposition, or innovation accounting, measures the share of the forecast
error variance which is accounted for by a particular shock. Hence, variance decompositions
address the question:
Question 13: How important is a particular shock (relative to all the other shocks) for
explaining the fluctuations in x?
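To construct the forecast error variance, from (6.3) we have for all $h \geq 1$ that
\[ x_{t+h} = \delta + \sum_{k=0}^{\infty} R_k \eta_{t+h-k}. \]
The optimal prediction of $x_{t+h}$ given all information available at period $t$ is the conditional
expectation.³ Hence,
\[ E[x_{t+h} \mid x_t, x_{t-1}, \ldots] = \delta + \sum_{k=h}^{\infty} R_k \eta_{t+h-k}. \tag{6.9} \]
The forecast error is therefore
\[ \varphi_{t+h|t} = x_{t+h} - E[x_{t+h} \mid x_t, x_{t-1}, \ldots] = \sum_{k=0}^{h-1} R_k \eta_{t+h-k}, \tag{6.10} \]
a VMA process of order $(h - 1)$. Consequently, the forecast error covariance matrix for $x$ is
\[ V_h = E[\varphi_{t+h|t} \varphi_{t+h|t}'] = \sum_{k=0}^{h-1} R_k \Omega R_k'. \tag{6.11} \]
Notice that this covariance matrix is invariant to the choice of identification, i.e. $V_h =
\sum_{k=0}^{h-1} C_k \Sigma C_k'$.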
For a particular variable $i \in \{1, \ldots, n\}$ the $h$ steps ahead forecast error variance is given
by the $i$:th diagonal element of $V_h$. With $e_i$ being the $i$:th column of $I_n$ this variance can be
written as
\[ v_{i,h} = e_i' V_h e_i = \sum_{k=0}^{h-1} e_i' R_k \Omega R_k' e_i. \tag{6.12} \]
³ By optimal we mean that it has the smallest mean square error among all unbiased predictors.
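Let $R_{ij,k}$ denote the $(i, j)$:th element of $R_k$. It then follows that
\[ \begin{aligned}
e_i' R_k \Omega R_k' e_i &= \begin{bmatrix} R_{i1,k} & \cdots & R_{in,k} \end{bmatrix} \begin{bmatrix} \omega_{11} & & 0 \\ & \ddots & \\ 0 & & \omega_{nn} \end{bmatrix} \begin{bmatrix} R_{i1,k} \\ \vdots \\ R_{in,k} \end{bmatrix} \\
&= \sum_{j=1}^{n} R_{ij,k}^2 \omega_{jj}.
\end{aligned} \]
Summing over $k$ gives
\[ v_{i,h} = \sum_{k=0}^{h-1} \sum_{j=1}^{n} R_{ij,k}^2 \omega_{jj}. \tag{6.13} \]
Dividing both sides by $v_{i,h}$ we get
\[ \begin{aligned}
1 &= \sum_{k=0}^{h-1} \sum_{j=1}^{n} R_{ij,k}^2 \omega_{jj} / v_{i,h} \\
  &= \sum_{j=1}^{n} \sum_{k=0}^{h-1} R_{ij,k}^2 \omega_{jj} / v_{i,h} \\
  &= \sum_{j=1}^{n} w_{ij,h}.
\end{aligned} \tag{6.14} \]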
The parameter wij,h takes values in the unit interval and measures the fraction of the h
steps ahead forecast error variance for variable i which is accounted for by shock j.
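Example: Suppose $n = 2$ with $\Omega$ diagonal and $B_0$ lower triangular with unit diagonal elements.
With $R_0 = B_0^{-1}$ the 1 step ahead forecast error variance is
\[ \begin{aligned}
V_1 &= R_0 \Omega R_0' \\
&= \begin{bmatrix} 1 & 0 \\ \beta_{21} & 1 \end{bmatrix} \begin{bmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{bmatrix} \begin{bmatrix} 1 & \beta_{21} \\ 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} \omega_{11} & \beta_{21} \omega_{11} \\ \beta_{21} \omega_{11} & \omega_{22} + \beta_{21}^2 \omega_{11} \end{bmatrix}.
\end{aligned} \]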
Hence, the 1 step ahead forecast error variance for each variable is invariant with respect to
the choice of identification. The variance decompositions, however, are not invariant. For
1 step ahead forecast errors, the share of the total variance for the first variable which is
explained by the first (second) shock is unity (zero). For the second variable, the share due
to the first shock is $\beta_{21}^2 \omega_{11} / \sigma_{22}$ while the share due to the second shock is $\omega_{22} / \sigma_{22}$. If we
instead assume that the $B_0$ matrix is upper triangular (with unit diagonal elements), for the
second variable we find that the share of the 1 step ahead forecast error variance
due to the second (first) shock is unity (zero). Moreover, for the first variable both shocks
may now account for the error variance.
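The shares $w_{ij,h}$ in (6.14) can be computed directly from (6.13). The Python sketch below does this for a stable VAR(1), where $C_k = \Pi_1^k$ and $R_k = C_k B_0^{-1}$; the parameter values are hypothetical and `fevd` is a helper name introduced here.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def fevd(pi1, b0_inv, omega_diag, h):
    """Shares w_{ij,h} for a VAR(1): rows are variables i, columns are shocks j."""
    n = len(pi1)
    c_k = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # C_0 = I_n
    contrib = [[0.0] * n for _ in range(n)]
    for _ in range(h):
        r_k = matmul(c_k, b0_inv)                     # R_k = C_k B0^{-1}
        for i in range(n):
            for j in range(n):
                contrib[i][j] += r_k[i][j] ** 2 * omega_diag[j]
        c_k = matmul(pi1, c_k)                        # C_{k+1} = Pi_1 C_k
    return [[contrib[i][j] / sum(contrib[i]) for j in range(n)] for i in range(n)]

# Hypothetical stable VAR(1) with a lower triangular B0 (unit diagonal, beta21 = 0.4)
pi1 = [[0.5, 0.1], [0.2, 0.3]]
b0_inv = [[1.0, 0.0], [0.4, 1.0]]
omega_diag = [1.0, 0.5]

shares_h1 = fevd(pi1, b0_inv, omega_diag, 1)
shares_h8 = fevd(pi1, b0_inv, omega_diag, 8)
```

At $h = 1$ the first row is (1, 0), as stated in the example, while at longer horizons the second shock also contributes to the first variable through $\Pi_1$.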
It has long been recognized that many macroeconomic time series are trending and thus
not well described as weakly stationary. To transform the data into appropriate stationary
series various detrending techniques have been considered. Common among these are the
linear trend model and the first difference model.
Example: PIH revisited. From equation (2.5) we find that both variables have a linear trend
when $\mu_Y \neq 0$. However, removal of this trend does not make the variables stationary since
they also include a stochastic trend, $\sum_{i=1}^{t} u_i$. Hence, the linear trend model is not appropriate
for rendering nonstationary variables stationary in this model.
By taking first differences, we know from equations (2.8) that these transformations make
income and consumption stationary. Still, there does not exist a VAR model with finite lag
order for the first differences. In fact, if we subtract Ct−1 from the consumption equation
of (3.4) we have that
∆Ct = µY + ut , (7.1)
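Hence, once the left hand side variables have been transformed into first differences, the
levels of lagged income and consumption still appear on the right hand side of the model.
Moreover, the matrix of coefficients on the lagged levels has reduced rank (lower rank than
dimension). Specifically,
\[ \Pi = \begin{bmatrix} 0 & 0 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} = \alpha \beta', \]
where the vectors $\alpha, \beta$ have rank 1. Finally, the product $\Pi x_{t-1}$ yields
\[ \Pi x_{t-1} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} C_{t-1} \\ Y_{t-1} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} (C_{t-1} - Y_{t-1}). \]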
Hence, the product produces a vector of weights α on the cointegration relation between
consumption and income. VAR models in first differences do not take this relation into
account and are therefore misspecified.
An alternative way of deriving an appropriate transformation of the variables in the VAR
model is to calculate the number of unit roots. If this number is lower than the dimension
of the VAR, then a VAR model in first differences will be overdifferenced. In the PIH case
we have that
\[ \Pi(z) = \begin{bmatrix} 1 - z & 0 \\ -z & 1 \end{bmatrix}. \tag{7.4} \]
This matrix polynomial has exactly 1 unit root and no roots inside the unit circle. Hence,
the number of variables (2) exceeds the number of unit roots.
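To generalize these observations, consider again the VAR(1) model for $x_t$ in (3.5). Subtracting
$x_{t-1}$ from both sides we obtain
\[ \begin{aligned}
\Delta x_t &= \mu + (\Pi_1 - I_n) x_{t-1} + \varepsilon_t \\
&= \mu + \Pi x_{t-1} + \varepsilon_t,
\end{aligned} \tag{7.5} \]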
To ensure that xt is I(d) with the integer d ≥ 0, we shall assume that det[Π(z)] = 0 if and
only if |z| > 1 or z = 1. In other words, there are neither explosive (|z| < 1) nor seasonal
(z = −1) roots.
While the first two cases are not too difficult to understand, the third case is far from
obvious. Specifically, what is the importance of the condition that “ . . . the number of unit
roots is equal to n − r . . . ”?
Accordingly, $\det[\Pi(z)] = 1 - 2z + z^2 = (1 - z)^2$. Hence, there are 2 unit roots. At the same
time,
\[ \Pi = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}, \]
has rank 1. Hence, the number of unit roots is greater than $n - r = 1$. In this case, $x_t$ is still
integrated but not I(1).
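If we subtract $x_{t-1}$ from both sides of equation (7.6) we get
\[ \begin{aligned}
\begin{bmatrix} \Delta x_{1,t} \\ \Delta x_{2,t} \end{bmatrix}
&= \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \\
&= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \\
&= \begin{bmatrix} 1 \\ 1 \end{bmatrix} (x_{1,t-1} - x_{2,t-1}) + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}.
\end{aligned} \tag{7.7} \]
From this equation it can be seen that $(x_{1,t} - x_{2,t})$ is integrated of an order less than $x_{1,t}$
and $x_{2,t}$. Subtracting $\Delta x_{2,t}$ from $\Delta x_{1,t}$ we obtain
\[ \begin{aligned}
\Delta x_{1,t} - \Delta x_{2,t} &= \Delta (x_{1,t} - x_{2,t}) \\
&= \varepsilon_{1,t} - \varepsilon_{2,t}.
\end{aligned} \tag{7.8} \]
In other words, $(x_{1,t} - x_{2,t})$ is I(1) and we must therefore have that $x_t$ is I(2).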
Hence, this example illustrates that the condition “ . . . the number of unit roots is equal
to (n − r ) . . . ” rules out the cases when xt is I(d) with d ≥ 2.
In the PIH case, the number of unit roots is exactly equal to (n − r ) and thus satisfies the
conditions in case (3) above.
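The comparison between the number of unit roots and $n - r$ can be made concrete for the two bivariate examples. In a VAR(1), $z = 1$ is a root of $\det[I_n - \Pi_1 z]$ exactly when $\lambda = 1$ is an eigenvalue of $\Pi_1$, so the Python sketch below counts unit eigenvalues. It assumes $\Pi_1 = I_2 + \Pi$ for the I(2) example and takes $\Pi_1$ for the PIH from (5.5); the helper names are hypothetical.

```python
import cmath

def eigenvalues_2x2(m):
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

def count_unit_roots(pi1, tol=1e-9):
    """Number of eigenvalues of Pi_1 equal to one (unit roots of a bivariate VAR(1))."""
    return sum(1 for lam in eigenvalues_2x2(pi1) if abs(lam - 1) < tol)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# PIH: x_t = (C_t, Y_t)', Pi = alpha beta' with alpha = (0, 1)' and beta = (1, -1)'
pih_pi1 = [[1.0, 0.0], [1.0, 0.0]]
pih_units = count_unit_roots(pih_pi1)          # equals n - r = 1
pih_beta_alpha = dot([1.0, -1.0], [0.0, 1.0])  # beta' alpha, nonzero

# I(2) example: Pi = [[1, -1], [1, -1]], alpha = (1, 1)' and beta = (1, -1)'
i2_pi1 = [[2.0, -1.0], [1.0, 0.0]]
i2_units = count_unit_roots(i2_pi1)            # exceeds n - r = 1
i2_beta_alpha = dot([1.0, -1.0], [1.0, 1.0])   # beta' alpha, zero
```

In both cases $\Pi$ has rank $r = 1$, but only the PIH system has exactly $n - r$ unit roots. The I(2) example fails the condition, and correspondingly $\beta'\alpha = 0$ there.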
Returning to the VAR(1) model in (7.5), we can express the matrix $\Pi$ as
\[ \Pi = \alpha \beta', \tag{7.9} \]
where $\alpha, \beta$ are $n \times r$ matrices with full column rank. The error correction representation of
the model can now be expressed as
\[ \Delta x_t = \mu + \alpha \beta' x_{t-1} + \varepsilon_t. \tag{7.10} \]
When $\Pi$ has reduced rank and the number of unit roots equals the rank reduction $(n - r)$,
then $x_t$ is CI(1,1) with $\beta' x_t$ being the $r$ cointegration relations.⁴
Note that the parameters $(\alpha, \beta)$ are not uniquely determined. For any $r \times r$ nonsingular
matrix $\xi$ we have that $\beta^{*\prime} x_t = \xi \beta' x_t$ is also I(0). With $\alpha^* = \alpha \xi^{-1}$ it follows that $\Pi = \alpha^* \beta^{*\prime}$.
In other words, the cointegration space, sp($\beta$), is uniquely determined from $\Pi$, but the basis
is not.
In the PIH example, we have that $(C_t - Y_t)$ is I(0), but so is $a(C_t - Y_t)$ for any finite $a \neq 0$.
⁴ In the VAR(1) model, the condition that the number of unit roots is equal to the rank reduction is equivalent
to rank$[\alpha' \beta] = r$. In the I(2) example, for instance, we have that $\alpha' \beta = 0$; for parametric conditions for $x_t$ to
be I(1) in the VAR(p) model, see Johansen (1991).
Question 14: How do we invert VAR models with unit roots?
Example: For the PIH, the VAR model with a cointegration constraint is given in (7.3). With
(Ct − Yt ) = −vt the income growth equation can be written
\[ \begin{aligned}
\Delta Y_t &= \mu_Y + w_t - v_{t-1} \\
&= \mu_Y + w_t + u_{t-1} - w_{t-1}.
\end{aligned} \]
or
In this case, the inverted error correction model is an MA(1) process for the first differences.
As we shall see below, the MA representation for ∆xt is usually of infinite order.
Notice also that C(z) = I2 + C1 z is not invertible. Specifically,
det[C(z)] = 1 − z.
This is, of course, just the other side of the coin of the fact that there does not exist a finite
order VAR model for the first differences.
The result that $C(z)$ has a unit root means that $C = C(1)$ has reduced rank. In particular,
\[ C = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}. \tag{7.13} \]
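The unit root in $C(z)$ and the reduced rank of $C(1)$ can be confirmed with a few lines of Python. The sketch assumes the ordering $x_t = (C_t, Y_t)'$ and $\varepsilon_t = (u_t, w_t)'$, and reads $C_1 = \begin{bmatrix} 0 & 0 \\ 1 & -1 \end{bmatrix}$ off the consumption and income growth equations; this matrix is a reconstruction made here, not a quotation.

```python
def c_of_z(z):
    """C(z) = I_2 + C_1 z with C_1 = [[0, 0], [1, -1]] for the PIH example."""
    return [[1.0, 0.0], [z, 1.0 - z]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

det_at_03 = det2(c_of_z(0.3))   # equals 1 - 0.3
c_one = c_of_z(1.0)             # the matrix C = C(1)
det_at_one = det2(c_one)        # zero, so C has reduced rank
```

Evaluating at $z = 1$ reproduces (7.13): both rows of $C$ are identical, so its rank is 1.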
Hence, although C(z) itself does not have a common factor, (1 − z), its deviation from C
does. We can therefore express the C(z) polynomial as
We have thus found that the “conditional” MA representation for consumption and income
contains (i) an I(1) component $(\delta t + C \sum_{i=1}^{t} \varepsilon_i)$; (ii) an I(0) component $(C^* \varepsilon_t)$; and (iii) initial
values $(x_0 - C^* \varepsilon_0)$. The fact that the MA representation includes the third component is
the reason why I call it a conditional representation.
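The I(1) component of the conditional MA representation can also be expressed as
\[ \begin{aligned}
x_t^p &= \begin{bmatrix} \mu_Y \\ \mu_Y \end{bmatrix} t + \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \sum_{i=1}^{t} \begin{bmatrix} u_i \\ w_i \end{bmatrix} \\
&= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \Big( \mu_Y t + \sum_{i=1}^{t} u_i \Big) \\
&= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} \mu_Y \\ \mu_Y \end{bmatrix} t + \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \sum_{i=1}^{t} \begin{bmatrix} u_i \\ w_i \end{bmatrix} \\
&= \beta_\perp \alpha_\perp' \mu t + \beta_\perp \alpha_\perp' \sum_{i=1}^{t} \varepsilon_i.
\end{aligned} \tag{7.17} \]
Here, $\alpha_\perp' \alpha = 0$ and $\beta_\perp' \beta = 0$. From equation (7.17) it can be seen that income and consumption
have 1 common trend. This trend can be represented by $\alpha_\perp' (\mu t + \sum_{i=1}^{t} \varepsilon_i)$. Moreover,
we find that $\beta' x_t^p = 0$ since $\beta' x_t$ is I(0) and cannot include the I(1) component in $x_t$. Hence,
the cointegration vector acts as a detrending model.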
To generalize these results to the VAR(1) model, note first that equation (7.10) can be
rewritten as
\[ x_t = \mu + (I_n + \alpha \beta') x_{t-1} + \varepsilon_t. \tag{7.18} \]
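Premultiplying this system by $\beta'$ yields
$$
\beta' x_t = \beta'\mu + (I_r + \beta'\alpha)\beta' x_{t-1} + \beta'\varepsilon_t, \qquad (7.19)
$$
a VAR(1) model for the cointegration relations. Solving this model recursively we obtain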
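$$
\begin{aligned}
\beta' x_t &= \sum_{i=0}^{\infty} (I_r + \beta'\alpha)^i \beta'\mu + \sum_{i=0}^{\infty} (I_r + \beta'\alpha)^i \beta'\varepsilon_{t-i} \\
&= -(\beta'\alpha)^{-1}\beta'\mu + \sum_{i=0}^{\infty} (I_r + \beta'\alpha)^i \beta'\varepsilon_{t-i}, \qquad (7.20)
\end{aligned}
$$
an MA(∞) representation for the r cointegration relations. Notice that if $\mathrm{rank}(\beta'\alpha) < r$, then
the polynomial $(I_r - (I_r + \beta'\alpha)z)$ contains a unit root. This is ruled out by the assumption
that the number of unit roots equals (n − r).$^5$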
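Substituting equation (7.20) for $\beta' x_{t-1}$ in (7.10) we obtain the MA representation for
$\Delta x_t$. Specifically, it is given by
$$
\begin{aligned}
\Delta x_t &= \big(I_n - \alpha(\beta'\alpha)^{-1}\beta'\big)\mu + \varepsilon_t + \sum_{i=1}^{\infty} \alpha(I_r + \beta'\alpha)^{i-1}\beta'\varepsilon_{t-i} \\
&= \delta + \sum_{i=0}^{\infty} C_i \varepsilon_{t-i}, \qquad (7.21)
\end{aligned}
$$
where $C_0 = I_n$.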
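The coefficients in (7.21) can be compared with the responses obtained directly from the levels VAR(1) in (7.18), where the response of $\Delta x_{t+i}$ to $\varepsilon_t$ is $\Pi^i - \Pi^{i-1}$ with $\Pi = I_n + \alpha\beta'$. The sketch below uses a hypothetical 3-variable system with r = 2; the values of `alpha` and `beta` are illustrative only and are chosen so that $(I_r + \beta'\alpha)$ has its eigenvalues inside the unit circle.

```python
import numpy as np

# Hypothetical adjustment and cointegration matrices (n = 3, r = 2).
alpha = np.array([[-0.5, 0.0], [0.2, -0.4], [0.0, 0.3]])
beta = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
n, r = beta.shape

Pi = np.eye(n) + alpha @ beta.T    # levels VAR(1): x_t = mu + Pi x_{t-1} + eps_t
rho = np.eye(r) + beta.T @ alpha   # VAR(1) matrix of the cointegration relations


def C_formula(i):
    """C_i from (7.21): C_0 = I_n, C_i = alpha rho^(i-1) beta' for i >= 1."""
    if i == 0:
        return np.eye(n)
    return alpha @ np.linalg.matrix_power(rho, i - 1) @ beta.T


def C_recursive(i):
    """Response of Delta x_{t+i} to eps_t implied by the levels VAR(1)."""
    if i == 0:
        return np.eye(n)
    return np.linalg.matrix_power(Pi, i) - np.linalg.matrix_power(Pi, i - 1)


stable = bool(np.all(np.abs(np.linalg.eigvals(rho)) < 1.0))
max_gap = max(np.abs(C_formula(i) - C_recursive(i)).max() for i in range(8))

print(stable)
print(max_gap < 1e-12)
```

The two calculations agree because $\Pi\alpha = \alpha(I_r + \beta'\alpha)$, so that $\Pi^{i-1}\alpha\beta' = \alpha(I_r + \beta'\alpha)^{i-1}\beta'$.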
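To show that C(z) has unit roots, note first that
$$
\begin{aligned}
C &= I_n + \sum_{i=1}^{\infty} \alpha(I_r + \beta'\alpha)^{i-1}\beta' \\
&= I_n - \alpha(\beta'\alpha)^{-1}\beta'. \qquad (7.22)
\end{aligned}
$$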
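Second, for any $\alpha_\perp, \beta_\perp \in \mathbb{R}^{n \times (n-r)}$ of rank (n − r) such that $\alpha_\perp'\alpha = 0$ and $\beta_\perp'\beta = 0$ it holds
that
$$
I_n = \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp' + \alpha(\beta'\alpha)^{-1}\beta'. \qquad (7.23)
$$
This can be verified through premultiplication of both sides by $\alpha_\perp'$ or $\beta'$ or through
postmultiplication by $\alpha$ or $\beta_\perp$. The choice of basis for $\alpha_\perp$ and $\beta_\perp$ is irrelevant since $\alpha_\perp^* = \alpha_\perp\zeta$,
$\beta_\perp^* = \beta_\perp\xi$ (where $\zeta$ and $\xi$ are nonsingular (n − r) × (n − r) matrices) satisfy
$$
\beta_\perp^*(\alpha_\perp^{*\prime}\beta_\perp^*)^{-1}\alpha_\perp^{*\prime} = \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp'.
$$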
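Both the decomposition of the identity matrix and its invariance to the choice of basis can be checked numerically. In the sketch below, the helper `perp` is a hypothetical convenience function that takes an orthonormal basis from the singular value decomposition, and `alpha`, `beta`, `zeta` and `xi` are illustrative values rather than estimates.

```python
import numpy as np


def perp(a):
    """Orthonormal basis for the orthogonal complement of the columns of a."""
    u, _, _ = np.linalg.svd(a, full_matrices=True)
    return u[:, a.shape[1]:]


# Hypothetical adjustment and cointegration matrices (n = 3, r = 2).
alpha = np.array([[-0.5, 0.0], [0.2, -0.4], [0.0, 0.3]])
beta = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
n, r = beta.shape
a_perp, b_perp = perp(alpha), perp(beta)

# The two terms on the right hand side of (7.23).
P_trend = b_perp @ np.linalg.inv(a_perp.T @ b_perp) @ a_perp.T
P_coint = alpha @ np.linalg.inv(beta.T @ alpha) @ beta.T
identity_gap = np.abs(P_trend + P_coint - np.eye(n)).max()

# Change of basis with arbitrary nonsingular (n-r) x (n-r) matrices.
zeta, xi = np.array([[2.5]]), np.array([[-0.7]])
a_star, b_star = a_perp @ zeta, b_perp @ xi
P_trend_star = b_star @ np.linalg.inv(a_star.T @ b_star) @ a_star.T
basis_gap = np.abs(P_trend_star - P_trend).max()

rank_trend = np.linalg.matrix_rank(P_trend)

print(identity_gap < 1e-12, basis_gap < 1e-12, rank_trend)
```

The first term has rank n − r, here 1, which is the number of common trends in this example.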
5 When $\beta'\alpha$ has full rank r (explosive roots have already been ruled out by assumption), it follows that
$\sum_{i=0}^{\infty} (I_r + \beta'\alpha)^i = (I_r - (I_r + \beta'\alpha))^{-1}$, i.e. the matrix $(I_r + \beta'\alpha)$ has all eigenvalues inside the unit circle so that
the sum of its powers from zero to s converges to a finite matrix as s becomes very large.
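Combining (7.22) and (7.23) we have that
$$
C = \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp' \qquad (7.24)
$$
in the VAR(1) model.$^6$ Accordingly, we find that
$$
\mathrm{rank}[C] = n - r. \qquad (7.25)
$$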
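The deviation of C(z) from C can be factorized by repeatedly splitting off the factor (1 − z):
$$
\begin{aligned}
C(z) - C &= \sum_{i=1}^{\infty} C_i z^i - \sum_{i=1}^{\infty} C_i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - \sum_{i=1}^{\infty} C_i z + \sum_{i=1}^{\infty} C_i z^i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - \sum_{i=1}^{\infty} C_i z + C_1 z + \sum_{i=2}^{\infty} C_i z^i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - \sum_{i=2}^{\infty} C_i z + \sum_{i=2}^{\infty} C_i z^i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - (1 - z)\sum_{i=2}^{\infty} C_i z - \sum_{i=2}^{\infty} C_i z^2 + \sum_{i=2}^{\infty} C_i z^i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - (1 - z)\sum_{i=2}^{\infty} C_i z - \sum_{i=2}^{\infty} C_i z^2 + C_2 z^2 + \sum_{i=3}^{\infty} C_i z^i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - (1 - z)\sum_{i=2}^{\infty} C_i z - \sum_{i=3}^{\infty} C_i z^2 + \sum_{i=3}^{\infty} C_i z^i \\
&= -(1 - z)\sum_{i=1}^{\infty} C_i - (1 - z)\sum_{i=2}^{\infty} C_i z - (1 - z)\sum_{i=3}^{\infty} C_i z^2 - \sum_{i=3}^{\infty} C_i z^3 + \sum_{i=3}^{\infty} C_i z^i \\
&= (1 - z)\sum_{j=0}^{k} \Big( -\sum_{i=j+1}^{\infty} C_i \Big) z^j - \sum_{i=k+1}^{\infty} C_i z^{k+1} + \sum_{i=k+1}^{\infty} C_i z^i. \qquad (7.26)
\end{aligned}
$$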
The last two terms on the right hand side of the last equality vanish as k becomes very large,
while the first term converges when the $C_i$ matrices satisfy a summability condition. In that
case, we obtain
$$
C(z) - C = (1 - z)\sum_{j=0}^{\infty} C_j^* z^j, \qquad C_j^* = -\sum_{i=j+1}^{\infty} C_i, \quad j = 0, 1, \ldots. \qquad (7.27)
$$
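The factorization in (7.27) can be verified numerically for the VAR(1) case by truncating the MA representation at a large lag. The sketch assumes the same hypothetical `alpha` and `beta` as a stand-in for a stable system; the truncation point `M` and the evaluation point `z` are arbitrary choices.

```python
import numpy as np

# Hypothetical adjustment and cointegration matrices (n = 3, r = 2).
alpha = np.array([[-0.5, 0.0], [0.2, -0.4], [0.0, 0.3]])
beta = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
n, r = beta.shape
rho = np.eye(r) + beta.T @ alpha

M = 200  # truncation lag; rho^M is numerically negligible here
Cs = np.array(
    [np.eye(n)]
    + [alpha @ np.linalg.matrix_power(rho, i - 1) @ beta.T for i in range(1, M + 1)]
)                                    # Cs[i] = C_i, shape (M + 1, n, n)
C_long = Cs.sum(axis=0)              # C = C(1)

# tails[j] = sum_{i=j+1}^{M} C_i, built from reverse cumulative sums.
tails = np.cumsum(Cs[:0:-1], axis=0)[::-1]
C_star = -tails                      # C*_j for j = 0, ..., M - 1

z = 0.6
lhs = np.tensordot(z ** np.arange(M + 1), Cs, axes=1) - C_long
rhs = (1.0 - z) * np.tensordot(z ** np.arange(M), C_star, axes=1)
gap = np.abs(lhs - rhs).max()

C_closed = np.eye(n) - alpha @ np.linalg.inv(beta.T @ alpha) @ beta.T

print(gap < 1e-12)
print(np.allclose(C_long, C_closed))
```

For a polynomial truncated at lag M the factorization holds exactly, so `gap` only reflects rounding error; the second check compares the truncated sum with the closed form in (7.22).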
The summability condition we require $C_i$ to satisfy is such that the $C_j^*$ matrices are
absolutely summable. That is,
$$
\sum_{j=0}^{\infty} |C_j^*| = \sum_{j=0}^{\infty} \Big| \sum_{i=j+1}^{\infty} C_i \Big| \le \sum_{i=1}^{\infty} i\,|C_i| < \infty.
$$
Hence, it is sufficient that the $C_i$ matrices are 1-summable. For finite order VAR models, this
condition will always be satisfied since their MA representations have exponentially decreasing
(in an absolute sense) parameters.
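The step from the double sum to the weighted single sum is a change in the order of summation. A short derivation, assuming that $|\cdot|$ denotes elementwise absolute values and that the sums converge:

```latex
\begin{align*}
\sum_{j=0}^{\infty} \Big| \sum_{i=j+1}^{\infty} C_i \Big|
  &\le \sum_{j=0}^{\infty} \sum_{i=j+1}^{\infty} |C_i|
  && \text{(triangle inequality)} \\
  &= \sum_{i=1}^{\infty} \sum_{j=0}^{i-1} |C_i|
  && \text{(each $C_i$ appears for $j = 0, \ldots, i-1$)} \\
  &= \sum_{i=1}^{\infty} i \, |C_i| .
\end{align*}
```

Each $|C_i|$ is counted exactly i times, which is why 1-summability of the $C_i$ matrices delivers absolute summability of the $C_j^*$ matrices.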
6 Notice that in the PIH case, $\alpha_\perp'\beta_\perp = 1$.
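In our VAR(1) model, the $C_j^*$ matrices are for all j ≥ 0
$$
\begin{aligned}
C_j^* &= -\sum_{i=j+1}^{\infty} \alpha(I_r + \beta'\alpha)^{i-1}\beta' \\
&= -\alpha \sum_{i=j}^{\infty} [I_r + \beta'\alpha]^i \beta' \\
&= -\alpha(I_r + \beta'\alpha)^j \sum_{i=0}^{\infty} [I_r + \beta'\alpha]^i \beta' \\
&= \alpha(I_r + \beta'\alpha)^j (\beta'\alpha)^{-1}\beta'. \qquad (7.28)
\end{aligned}
$$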
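The closed form in (7.28) can be compared with the defining tail sums, and evaluated for the PIH. In the sketch below the function names, the truncation lag `M` and the 3-variable `alpha` and `beta` are hypothetical; the PIH values $\alpha = (0, 1)'$ and $\beta = (1, -1)'$ are assumptions consistent with income adjusting to $C_t - Y_t$.

```python
import numpy as np


def C_star_closed(alpha, beta, j):
    """C*_j from the last line of (7.28)."""
    r = beta.shape[1]
    rho = np.eye(r) + beta.T @ alpha
    return (alpha @ np.linalg.matrix_power(rho, j)
            @ np.linalg.inv(beta.T @ alpha) @ beta.T)


def C_star_tail(alpha, beta, j, M=400):
    """C*_j from the first line of (7.28), truncated at lag M."""
    r = beta.shape[1]
    rho = np.eye(r) + beta.T @ alpha
    return -sum(alpha @ np.linalg.matrix_power(rho, i - 1) @ beta.T
                for i in range(j + 1, M + 1))


# Hypothetical 3-variable system with r = 2.
alpha = np.array([[-0.5, 0.0], [0.2, -0.4], [0.0, 0.3]])
beta = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
gap = max(np.abs(C_star_closed(alpha, beta, j) - C_star_tail(alpha, beta, j)).max()
          for j in range(5))

# PIH: I_r + beta' alpha = 0, so only C*_0 is nonzero.
a_pih = np.array([[0.0], [1.0]])
b_pih = np.array([[1.0], [-1.0]])
pih0 = C_star_closed(a_pih, b_pih, 0)
pih1 = C_star_closed(a_pih, b_pih, 1)

print(gap < 1e-12)
print(pih0)
print(pih1)
```

In the PIH case the I(0) component is therefore $C_0^*\varepsilon_t$ alone, which is why the MA representation for the first differences is of order 1.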
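It is now straightforward to show that these matrices indeed are absolutely summable$^7$ and
thus that the C(z) matrix polynomial can be expressed as
$$
C(z) = C + (1 - z)C^*(z), \qquad C^*(z) = \sum_{j=0}^{\infty} C_j^* z^j. \qquad (7.29)
$$
Using this expression in the MA representation (7.21), where $\delta = C\mu$, and solving for the
levels we obtain
$$
\begin{aligned}
x_t &= x_{t-1} + C\mu + C\varepsilon_t + \sum_{j=0}^{\infty} C_j^*(\varepsilon_{t-j} - \varepsilon_{t-j-1}) \\
&= \tilde{x}_0 + C\mu t + C\sum_{i=1}^{t}\varepsilon_i + \sum_{j=0}^{\infty} C_j^*\varepsilon_{t-j}. \qquad (7.30)
\end{aligned}
$$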
Again we find that xt includes (i) an I(1) component; (ii) an I(0) component; and (iii) initial
values, denoted by x̃0 .
The I(1) component is of particular interest in the so called common trends model; see
King et al. (1991). Specifically, while this component is made up of n linear combinations
of the accumulated Wold innovations, only (n − r ) of these combinations are linearly inde-
pendent. In other words, there are fewer trends than variables. From equation (7.30) we
find that
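$$
C\Big(\mu t + \sum_{i=1}^{t}\varepsilon_i\Big) = \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\Big(\alpha_\perp'\mu t + \sum_{i=1}^{t}\alpha_\perp'\varepsilon_i\Big). \qquad (7.31)
$$
Hence, the reduced form linearly independent (n − r) common trends are given by
$(\alpha_\perp'\mu t + \sum_{i=1}^{t}\alpha_\perp'\varepsilon_i)$, while the coefficients on these trends are $\beta_\perp(\alpha_\perp'\beta_\perp)^{-1}$.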
Structural common trends models were first suggested by Blanchard and Quah (1989),
King et al. (1991), and Shapiro and Watson (1988). The basic idea is to make identifying
assumptions about the long run responses in the endogenous variables with respect to
the structural shocks. The observation that the number of linearly independent common
trends in the reduced form conditional MA representation is smaller than the number of
endogenous variables is a central ingredient. This suggests that structural shocks can be
decomposed into (i) shocks with permanent effects on x (trend shocks); and (ii) shocks
which only have temporary (transitory) effects on x. Moreover, since the I(0) component
and the change in the I(1) component in (7.30) are correlated, the structural trend shocks
typically lead to cyclical fluctuations around the trends as well as changes in the trends. When
$x_t$ contains macroeconomic variables, we can think about this as shocks to growth also
having an influence on business cycle fluctuations.
7 This follows from the fact that $(I_r + \beta'\alpha)$ has all eigenvalues inside the unit circle.
The common trends approach is based on identifying B0 and Ω using more reduced form
parameters than just Σ. In particular, the restrictions implied by cointegration are used for
identification through the matrix C.
Example: Consider again the PIH. To exactly identify the parameters of a structural VAR
model such that it has a common trends interpretation we need to impose 4 identifying
assumptions. By letting Ω be the identity matrix we already have 3 of these restrictions.
The remaining restriction will be imposed on B0 such that only one of the structural shocks
has a long run effect on x.
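Collecting the initial values in $\tilde{x}_0$ the reduced form common trends representation is
$$
x_t = \tilde{x}_0 + \delta t + C\sum_{i=1}^{t}\varepsilon_i + C^*\varepsilon_t. \qquad (7.32)
$$
With the structural shocks $\eta_t = (\varphi_t, \psi_t)' = B_0\varepsilon_t$, the corresponding structural common
trends representation is
$$
x_t = \tilde{x}_0 + \delta t + A\sum_{i=1}^{t}\varphi_i + \Phi\eta_t. \qquad (7.33)
$$
Here, the 2 × 1 vector A is defined from $CB_0^{-1} = [A \;\; 0]$, and the 2 × 2 matrix $\Phi = C^*B_0^{-1}$.
Since $\Omega = I_2$, the parameters of $B_0$ must also satisfy $B_0^{-1}(B_0')^{-1} = \Sigma$.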
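With C given by (7.13), both rows of $CB_0^{-1}$ are equal to the first row of $B_0^{-1}$. For the second
column of $CB_0^{-1}$ to contain zeros only, the (1,2) element of the inverse of $B_0$ must therefore be zero.
Since the inverse is then lower triangular, it follows that $B_0$ itself must be lower triangular, i.e.
$$
B_0 = \begin{bmatrix} \beta_{11} & 0 \\ \beta_{21} & \beta_{22} \end{bmatrix}.
$$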
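To uniquely determine the remaining 3 elements of $B_0$ we use the relation $\Sigma = B_0^{-1}(B_0')^{-1}$.
This gives us
$$
\begin{aligned}
\begin{bmatrix} \sigma_u^2 & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_v^2 \end{bmatrix}
&= \begin{bmatrix} 1/\beta_{11} & 0 \\ -\beta_{21}/(\beta_{11}\beta_{22}) & 1/\beta_{22} \end{bmatrix}
\begin{bmatrix} 1/\beta_{11} & -\beta_{21}/(\beta_{11}\beta_{22}) \\ 0 & 1/\beta_{22} \end{bmatrix} \\
&= \begin{bmatrix} 1/\beta_{11}^2 & -\beta_{21}/(\beta_{11}^2\beta_{22}) \\ -\beta_{21}/(\beta_{11}^2\beta_{22}) & 1/\beta_{22}^2 + \beta_{21}^2/(\beta_{11}^2\beta_{22}^2) \end{bmatrix}.
\end{aligned}
$$
Taking the diagonal elements of $B_0$ to be positive, the solution is $\beta_{11} = 1/\sigma_u$, $\beta_{22} = 1/\sigma_v$, and
$\beta_{21} = -1/\sigma_v$.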
These parameters are equivalent to the trend innovation, $\varphi_t$, being a permanent income
shock, and the temporary innovation, $\psi_t$, being a transitory income shock. This can be
seen from the contemporaneous effects on consumption and income from one standard
deviation shocks being
$$
B_0^{-1} = \begin{bmatrix} \sigma_u & 0 \\ \sigma_u & \sigma_v \end{bmatrix},
$$
where the first column contains the effects on x from the trend shock and the second column
the effects from the temporary shock. The long run responses are given by
$$
CB_0^{-1} = \begin{bmatrix} \sigma_u & 0 \\ \sigma_u & 0 \end{bmatrix}.
$$
The long run is reached after 1 period in this example, and comparing the above results to
those in section 2 we find that they are indeed equivalent.
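Because $B_0^{-1}$ is lower triangular with a positive diagonal and satisfies $\Sigma = B_0^{-1}(B_0')^{-1}$, it coincides with the Cholesky factor of Σ in this example. The sketch below illustrates this with hypothetical values for $\sigma_u$ and $\sigma_v$; it is a numerical check of the example, not a general identification routine.

```python
import numpy as np

sigma_u, sigma_v = 0.8, 1.5   # hypothetical standard deviations

# Covariance matrix of eps_t = (u_t, w_t)' with w_t = u_t + v_t.
Sigma = np.array([[sigma_u**2, sigma_u**2],
                  [sigma_u**2, sigma_u**2 + sigma_v**2]])
C = np.array([[1.0, 0.0], [1.0, 0.0]])   # long run matrix, equation (7.13)

# Lower triangular factor with positive diagonal: Sigma = B0_inv @ B0_inv.T
B0_inv = np.linalg.cholesky(Sigma)
B0 = np.linalg.inv(B0_inv)

# Long run responses: the second column should contain zeros only.
long_run = C @ B0_inv

print(B0_inv)
print(B0)
print(long_run)
```

With these values `B0_inv` has rows $(\sigma_u, 0)$ and $(\sigma_u, \sigma_v)$, and only the first structural shock has a long run effect on consumption and income.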
For the n variable case, imposing the necessary $n^2$ identifying assumptions is somewhat
more involved. First, n(n + 1)/2 restrictions are given by assuming that $\Omega = I_n$. Second,
(n − r)r restrictions are obtained from $CB_0^{-1} = [A \;\; 0]$, where A is an n × (n − r) matrix. These
assumptions imply that the first (n − r) structural shocks have a long run effect on at least
one of the x variables, while the remaining r shocks have only temporary effects on x. To
identify the (n − r) trend shocks, (n − r)(n − r − 1)/2 restrictions need to be imposed on A
(which implies the same number of restrictions on $B_0$), while the r transitory shocks can be
identified by, e.g., restricting r(r − 1)/2 elements of the final r columns of $\Phi_0 = C_0^* B_0^{-1}$.
How to achieve this is discussed in some detail by King et al. (1991), Mellander et al. (1992),
and Englund et al. (1994). In addition, the paper by Jacobson et al. (1996) discusses how
to relate the structural common trends coefficients of the matrix A to familiar economic
theory parameters.
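The four groups of restrictions listed above add up to $n^2$ for any admissible cointegration rank. A short sketch of this bookkeeping, where the function name and the dictionary labels are hypothetical:

```python
def restriction_counts(n, r):
    """Number of identifying restrictions by source for n variables and rank r."""
    counts = {
        "Omega = I_n": n * (n + 1) // 2,
        "C B0^{-1} = [A 0]": (n - r) * r,
        "trend shocks (A)": (n - r) * (n - r - 1) // 2,
        "transitory shocks (Phi_0)": r * (r - 1) // 2,
    }
    counts["total"] = sum(counts.values())
    return counts


# PIH example: n = 2 variables and r = 1 cointegration relation.
pih = restriction_counts(2, 1)
print(pih)

# The total equals n^2 for every 0 <= r <= n.
all_match = all(restriction_counts(n, r)["total"] == n * n
                for n in range(1, 8) for r in range(0, n + 1))
print(all_match)
```

For the PIH this gives 3 restrictions from Ω and 1 from the long run matrix, matching the 4 identifying assumptions used in the example.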
References
Blanchard, O. J., and Quah, D. (1989). “The Dynamic Effects of Aggregate Demand and Supply
Disturbances.” American Economic Review, 79, 655–673.
Cooley, T. F., and LeRoy, S. F. (1985). “Atheoretical Macroeconometrics: A Critique.” Journal of
Monetary Economics, 16, 283–308.
Davidson, J. E. H., Hendry, D. F., Srba, F., and Yeo, S. (1978). “Econometric Modelling of the Ag-
gregate Time-series Relationship between Consumers’ Expenditure and Income in the United
Kingdom.” Economic Journal, 88, 661–692.
Englund, P., Vredin, A., and Warne, A. (1994). “Macroeconomic Shocks in an Open Economy: A
Common Trends Representation of Swedish Data, 1871–1990.” In V. Bergström and A. Vredin
(Eds.), Measuring and Interpreting Business Cycles (pp. 125–223). Oxford, England: Clarendon
Press.
Frisch, R. (1933). “Propagation Problems and Impulse Problems in Dynamic Economics.” In J. Åkerman
(Ed.), Economic Essays in Honour of Gustav Cassel (pp. 171–205). London, England: George
Allen.
Hall, R. E. (1978). “Stochastic Implications of the Life Cycle–Permanent Income Hypothesis: Theory
and Evidence.” Journal of Political Economy, 86, 971–987.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, N.J.: Princeton University Press.
Jacobson, T., Vredin, A., and Warne, A. (1996). “Common Trends and Hysteresis in Scandinavian
Unemployment.” European Economic Review.
Johansen, S. (1991). “Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian
Vector Autoregressive Models.” Econometrica, 59, 1551–1580.
Johansen, S., and Juselius, K. (1990). “Maximum Likelihood Estimation and Inference on Cointegra-
tion: With Applications to the Demand for Money.” Oxford Bulletin of Economics and Statistics,
52, 169–210.
King, R. G., Plosser, C. I., Stock, J. H., and Watson, M. W. (1991). “Stochastic Trends and Economic
Fluctuations.” American Economic Review, 81, 819–840.
Lippi, M., and Reichlin, L. (1993). “The Dynamic Effects of Aggregate Demand and Supply Distur-
bances: Comment.” American Economic Review, 83, 644–658.
Lütkepohl, H. (1991). Introduction to Multiple Time Series. Berlin, Germany: Springer-Verlag.
Mellander, E., Vredin, A., and Warne, A. (1992). “Stochastic Trends and Economic Fluctuations in a
Small Open Economy.” Journal of Applied Econometrics, 7, 369–394.
Runkle, D. E. (1987). “Vector Autoregressions and Reality.” Journal of Business and Economic
Statistics, 5, 437–454.
Shapiro, M. D., and Watson, M. W. (1988). “Sources of Business Cycle Fluctuations.” In S. Fischer
(Ed.), NBER Macroeconomics Annual (pp. 111–148). Cambridge, MA: MIT Press.
Sims, C. A. (1980). “Macroeconomics and Reality.” Econometrica, 48, 1–48.
Slutzky, E. (1937). “The Summation of Random Causes as the Source of Cyclic Processes.” Econo-
metrica, 5, 105–146.
Stock, J. H., and Watson, M. W. (1988). “Variable Trends in Economic Time Series.” Journal of
Economic Perspectives, 2, 147–174.
Anders Warne, Sveriges Riksbank, 103 37 Stockholm, Sweden
E-mail address: [email protected]
URL: http://www.riksbank.com/
http://www.farfetched.nu/anders/