A Step-By-Step Introduction To VAR Models (With Simulations On Matlab) - Michele Piffer
∗
DIW Berlin, Mohrenstrasse 58, 10117 Berlin, Germany. Email: [email protected], personal
web page: https://sites.google.com/site/michelepiffereconomics/. These notes can be reproduced
freely for educational and research purposes as long as they contain this notice and are retained for
personal use or distributed for free. All errors are mine. Please get in touch if you find typos or
mistakes.
Contents
1 The nature of the problem 3
2 Forecasting 7
4 Identification 11
4.1 Recursive identification - Cholesky . . . . . . . . . . . . . . . . . . . 15
4.2 Other zero restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 Sign restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.4 Long run restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 External instruments . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.6 Identification through heteroskedasticity . . . . . . . . . . . . . . . . 23
4.7 Restrictions on contemporaneous independence . . . . . . . . . . . . 25
List of Figures
1 Example 1 - Generating the data . . . . . . . . . . . . . . . . . . . . 6
2 Example 2 - Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Example 3 - Impulse response to shock to the first variable (no constant) 11
4 Example 3 - Impulse response to shock to the first variable (constant) 12
5 Example 4 - The case of observationally equivalent structural representations . . . 15
6 Example 5 - Cholesky identification . . . . . . . . . . . . . . . . . . . 19
7 Example 8 - Identification through sign restrictions . . . . . . . . . . 22
8 Example 11 - AR and MA representations of the same data . . . . . . 33
1 The nature of the problem
Suppose you want to model the interaction between variable xt and variable zt , say,
GDP and inflation. You are willing to accept as a maintained hypothesis that they
affect one another both contemporaneously and up to one lag, but not after one lag.
In addition, you are willing to assume that the relationship is linear, and that only
these variables should enter the model. The model you are after is

x_t = δ_1 z_t + δ_2 x_{t−1} + δ_3 z_{t−1} + s^x_t
z_t = µ_1 x_t + µ_2 x_{t−1} + µ_3 z_{t−1} + s^z_t    (1)

One way to eliminate the contemporaneous terms is tedious substitution of one
equation into the other, which yields

x_t = (δ_2 + δ_1µ_2)/(1 − δ_1µ_1) · x_{t−1} + (δ_3 + δ_1µ_3)/(1 − δ_1µ_1) · z_{t−1} + 1/(1 − δ_1µ_1) · s^x_t + δ_1/(1 − δ_1µ_1) · s^z_t
z_t = (µ_1δ_2 + µ_2)/(1 − µ_1δ_1) · x_{t−1} + (µ_1δ_3 + µ_3)/(1 − µ_1δ_1) · z_{t−1} + µ_1/(1 − µ_1δ_1) · s^x_t + 1/(1 − µ_1δ_1) · s^z_t
A better approach, instead, consists of using matrix algebra. Rewrite first the system
in equations (1) as
[1 −δ_1; −µ_1 1] [x_t; z_t] = [δ_2 δ_3; µ_2 µ_3] [x_{t−1}; z_{t−1}] + [s^x_t; s^z_t]
Premultiplying both sides by the inverse of the matrix on the left hand side yields
[x_t; z_t] = 1/(1 − δ_1µ_1) [δ_2 + δ_1µ_2, δ_3 + δ_1µ_3; µ_1δ_2 + µ_2, µ_1δ_3 + µ_3] [x_{t−1}; z_{t−1}] + 1/(1 − δ_1µ_1) [1 δ_1; µ_1 1] [s^x_t; s^z_t]
which is of course the same system obtained through the tedious substitutions. For
simplicity, rewrite the above equations as

x_t = β_1 x_{t−1} + β_2 z_{t−1} + r^x_t
z_t = β_3 x_{t−1} + β_4 z_{t−1} + r^z_t    (2)
where, by now, it should be understood that the parameters {βi }4i=1 are not free
parameters, but they are some function of the underlying parameters {δi , µi }3i=1 .
The same holds for the error terms that enter equations (2). These errors are not
free random variables, but they are driven by the underlying errors in the initial
model through the relationship
[r^x_t; r^z_t] = 1/(1 − δ_1µ_1) [1 δ_1; µ_1 1] [s^x_t; s^z_t]
Contrary to equations (1), equations (2) can be estimated consistently using OLS
equation by equation. In fact, the regressors x_{t−1} and z_{t−1} depend directly on
r^x_{t−1} and r^z_{t−1} (and indirectly on all previous errors {r^x_{t−i}, r^z_{t−i}}_{i=2}^∞ via the
lagged dependent variables), but not on r^x_t and r^z_t. Equations (1) provide the structural
representation of the model, while equations (2) provide the reduced form represen-
tation of the model. Note that estimating the reduced form model of equations (2)
gives four estimates (βi , i = 1, .., 4), while the structural representation of the same
data had 6 elements (δi , µi , i = 1, .., 3). Moving from the estimated reduced form to
the structural is not straightforward. We will get back to this later.
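The consistency claim above can be checked by simulation. The notes' Matlab codes are not reproduced in this excerpt, so the following is a minimal Python/NumPy sketch with purely hypothetical parameter values: it generates data from the structural model and verifies that OLS on the reduced form recovers the implied autoregressive matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
delta1, delta2, delta3 = 0.2, 0.5, 0.1   # hypothetical structural parameters
mu1, mu2, mu3 = 0.3, 0.2, 0.4

A = np.array([[1.0, -delta1],
              [-mu1, 1.0]])              # contemporaneous (structural) matrix
A_lag = np.array([[delta2, delta3],
                  [mu2, mu3]])           # lagged structural coefficients
B = np.linalg.inv(A)                     # B = A^{-1}
B_minus = B @ A_lag                      # implied reduced-form AR matrix

T = 50_000
y = np.zeros((T, 2))
s = rng.standard_normal((T, 2))          # structural shocks
for t in range(1, T):
    y[t] = B_minus @ y[t - 1] + B @ s[t]   # y_t = B_- y_{t-1} + r_t

# OLS equation by equation: regress y_t on y_{t-1}
B_hat = np.linalg.lstsq(y[:-1], y[1:], rcond=None)[0].T
print(np.max(np.abs(B_hat - B_minus)))   # small: OLS is consistent for B_-
```

Estimating the structural coefficients directly by OLS (regressing x_t on z_t, say) would instead suffer from simultaneity, which is the point of the example.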
A few words of caution are in order at this point. While we do acknowledge that
x_t should be part of the model for variable z_t (and vice versa), the omission
of x_t from the equation estimated for z_t is not a source of omitted variable bias, since we
are estimating a transformation of the underlying model. The contemporaneous
relationship between x_t and z_t that is clearly displayed in equations (4) is equally
present in equations (2). It simply enters the model of equations (2) through the
correlation of the errors. This correlation should be clear from equation (5) and will
be crucial in the discussion of identification issues.
From now on, refer to the reduced form model using linear algebra:

y_t = B_- y_{t−1} + r_t    (3)

where bold lower-case letters indicate vectors and upper-case letters indicate matrices.
VAR analysis starts with the estimation of a reduced form VAR. Here
y_t = (x_t, z_t)′ and r_t = (r^x_t, r^z_t)′, where ′ indicates transpose, i.e. y_t and r_t are column
(2 × 1) vectors. Note that the model considered here uses only one lag as regressor,
i.e., it is a VAR(1). We will use a more general notation with the lag operator in
Section 5.
The structural form of the model, instead, can be written in two ways:

A y_t = A_- y_{t−1} + s_t
y_t = B_- y_{t−1} + B s_t    (4)
Note that what distinguishes the structural form from the reduced form is whether
structural shocks appear explicitly or not. It should nevertheless be clear that
rt = Bst (5)
with B = A^{−1} and B_- = A^{−1} A_- = B A_-. In other words, when working with
a reduced form model we do acknowledge that the reduced form shocks come from
some underlying structural shocks, but we do not take a stand on how the mapping
works. In most of these notes I will use the B form of the structural form. Section 7
discusses in further detail the difference between A form and B form of the structural
shocks.
In these notes I will only discuss stationary VARs. A VAR in the form of (3) is
stationary if all eigenvalues of the matrix B_- are inside the unit circle, i.e.
the maximum eigenvalue is smaller than 1 in modulus.
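The stationarity condition can be checked in one line. A small sketch with a hypothetical B_- (Python/NumPy rather than the notes' Matlab):

```python
import numpy as np

B_minus = np.array([[0.5, 0.2],
                    [0.1, 0.4]])               # hypothetical AR matrix
moduli = np.abs(np.linalg.eigvals(B_minus))    # eigenvalues may be complex
is_stationary = bool(np.max(moduli) < 1.0)
print(is_stationary)   # True
```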
model, nor the number of lags. Assume that you know the true model
up to the parameter values, i.e. you do know that only xt , zt should enter
the model, that the model has no constant and that it has only one lag of
each variable. Convince yourself that estimating the reduced form model
yields consistent estimates, while estimating directly the structural model
does not.
Figure 1: Example 1 - Generating the data (panels for Variable 1 and Variable 2; plots omitted)
2 Forecasting
Suppose you have data until time T and you want to predict the course of variables
x and z for the future. To do so, one can use model (3) to generate forecasts of x
and z recursively. In order to do so, one needs to substitute {r_t}_{t=T+1}^{T+h} with their
expected value. Since rt depends on st through equation (5) and since the structural
shocks equal zero in expectation, the expectation of the corresponding reduced form
shocks is zero. The forecasts periods are then computed as
ŷT +1 = B̂− yT
ŷT +2 = B̂− ŷT +1
ŷT +3 = B̂− ŷT +2
...
where B̂_- stands for the OLS estimate of B_- and ŷ_{T+h} is the forecast of the vector
y for period T + h, given the information in the first T periods. Forecasts tend to
become poorer as the horizon increases, since the effects of future shocks are not taken into
account, period after period. This should become clear in the following example.
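The recursion above can be sketched as follows (Python/NumPy rather than the notes' Matlab; B̂_- and y_T are hypothetical numbers):

```python
import numpy as np

B_hat_minus = np.array([[0.5, 0.2],
                        [0.1, 0.4]])    # hypothetical OLS estimate of B_-
y_T = np.array([1.0, -0.5])             # last observed data point

H = 5
forecasts = [y_T]
for h in range(H):
    # y_hat_{T+h+1} = B_hat_- y_hat_{T+h}
    forecasts.append(B_hat_minus @ forecasts[-1])

print(forecasts[1])   # one-step-ahead forecast y_hat_{T+1}
```

Because the AR matrix is stationary, the forecasts shrink toward zero (the unconditional mean of a model with no constant) as the horizon grows.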
Figure 2: Example 2 - Forecasting
(Panels for Variable 1 and Variable 2; plots omitted.)
Suppose for the moment that we do actually know the true matrices B− and B.
This is of course never the case in practice, but it is an innocuous assumption for the
sake of the point made in this specific example (it allows us to keep sample uncertainty
out of the analysis). Suppose you want to generate pseudo data corresponding to
a completely random extraction of structural shocks. Doing so requires extracting
T · 2 realizations of structural shocks S̄ = [s̄1 , s̄2 , ..., s̄T ], s̄t = (s̄xt , s̄zt )0 , converting
them into reduced form shocks through the equality r̄t = Bs̄t and then generating
the observables Y = [y1 , y2 , ..., yT ] recursively, given some initial value y0 . Call
the generated data Y(S̄)^{pseudo}. Suppose now that you generate new data by feeding
into the model the very same set of structural shocks S̄, except that the structural
shock to x at some period τ is augmented by ε. In other words, the new data,
Y(S̃)^{pseudo}, are generated from the same initial values y_0 and from the structural
shocks {s̃^x_t, s̃^z_t}_{t=1}^T with s̃^z_t = s̄^z_t ∀t, s̃^x_t = s̄^x_t ∀t ≠ τ and s̃^x_t = s̄^x_t + ε for t = τ. These
new shocks will necessarily map into identical reduced form shocks for t < τ and
for t > τ, and into a different set of reduced form shocks for t = τ. Hence, by
construction, the dynamics of the observables are identical up to t = τ and can differ
afterwards. In particular, they can differ both at time t = τ (due to the matrix
B) and at any t > τ (due to the autoregressive component B_- of the model). Any
difference in the pattern of Y corresponding to S̄ and S̃ would be attributed to the
single additional structural shock given to variable x. This difference, computed
as φ_τ = y(S̃)^{pseudo}_τ − y(S̄)^{pseudo}_τ, is called the impulse response of variables x and z to
a structural shock to variable x. Impulse responses play a key role in structural
analysis. An impulse response describes the effects produced by a structural shock
on the endogenous variables.
The above description of an impulse response function uses its full definition, but
computing impulse responses does not really require generating pseudo data every
time, as we did above. In fact, since the model is linear, the impulse responses
computed using the above methodology are the same irrespective of the initial set
of structural shocks S̄, as long as the value of ε is kept unchanged. A particularly
convenient set of values for S̄ is of course zero. If s̄_t = (0, 0)′ ∀t, the corresponding
observables Y(s̄)^{pseudo} equal zero at each point in time, so we do not even need to
subtract them. It then follows that one can compute the same impulse response of
y_t to a structural shock of size ε to variable x simply as the level of the variables
corresponding to the structural shocks s̄_τ = (ε, 0)′ and zeros for the other periods:
φ_0 = v
φ_1 = B_- φ_0 = B_- v
φ_2 = B_- φ_1 = (B_-)² v
...
φ_τ = B_- φ_{τ−1} = (B_-)^τ v

where

v = (v^x, v^z)′ = B (ε, 0)′ = ε (b_{11}, b_{21})′ = ε b_1    (7)
where b1 stands for the first column of B. The vector v takes the name of impulse
vector. In this case, the impulse vector corresponds to a shock to variable x. If one
wants to consider a shock to variable z, then the impulse vector will be computed as
B (0, ε)′ = ε b_2, where b_2 represents the second column of B.
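The recursion φ_h = (B_-)^h v can be sketched as follows (a Python/NumPy illustration with hypothetical B_- and B; ε = 1):

```python
import numpy as np

B_minus = np.array([[0.5, 0.2],
                    [0.1, 0.4]])   # hypothetical AR matrix
B = np.array([[1.0, 0.0],
              [0.3, 1.0]])         # hypothetical impact matrix
eps = 1.0

v = B @ np.array([eps, 0.0])       # impulse vector: eps times the first column of B
H = 10
phi = [v]
for h in range(H):
    phi.append(B_minus @ phi[-1])  # phi_{h+1} = B_- phi_h

print(phi[0], phi[1])
```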
So far we have considered the thought experiment of giving a vector of structural
shocks (ε, 0)′ or (0, ε)′. In principle, one could also compute impulse responses to a
reduced form shock instead of a structural shock. This would require using an
impulse vector of the form

v = (ε, 0)′  or  v = (0, ε)′    (8)
Mathematically, one could well compute this, but economically it would not make
much sense. In fact, such an analysis would not appreciate the fact that reduced
form shocks are necessarily correlated with each other because they come from some
underlying structural shocks. This correlation is important, because it was this very
specific correlation that captured the contemporaneous correlation among variables.
We saw in the introduction to these notes that this contemporaneous correlation
is captured by the matrices A or B in the structural form. An impulse response
analysis is computed on structural shocks st . Given a set of structural shocks of
interest, to compute impulse responses one must compute the reduced form shocks
endogenously as a function of the shock considered, using matrix B. It then becomes
crucial to have not only an estimate of B_-, but also an estimate of B. Doing so is
not straightforward and requires taking a stand on the identification of the model.
Figure 3: Example 3 - Impulse response to shock to the first variable (no constant)
(Panels: Variable 1, Variable 2; lines compare "pseudo" and "pseudo + extra shock"; plots omitted.)
4 Identification
Identification is a very broad concept in Econometrics, and these notes steer clear
from attempting to give a formal definition of the concept. Here, it suffices to say that
identifying the structural model means estimating the matrix B (or equivalently A;
see Section 7). Once this matrix is identified, impulse responses can be computed, the full
structural form can be derived and additional analysis like variance decomposition
Figure 4: Example 3 - Impulse response to shock to the first variable (constant)
(Panels: Variable 1, Variable 2; lines compare "pseudo" and "pseudo + extra shock"; plots omitted.)
the off-diagonal elements of Σ are non-zero due to the fact that the reduced form shocks
come from the same set of underlying structural shocks, and hence are correlated with each
other. Getting an estimate of Σ is not complicated, because the estimation of the reduced
form model yields consistent estimates of the residuals, which implies the following
estimator for Σ:
Σ̂ = (1/T) Σ_{t=2}^{T} r̂_t r̂_t′    (10)
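As a quick numerical check of estimator (10), here is a Python/NumPy sketch with a hypothetical true Σ (the notes' sum runs over fitted residuals; here the residuals are simulated directly):

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma_true = np.array([[1.0, 0.3],
                       [0.3, 0.5]])          # hypothetical true Sigma
T = 20_000
r_hat = rng.multivariate_normal(np.zeros(2), Sigma_true, size=T)

Sigma_hat = (r_hat.T @ r_hat) / T            # Sigma_hat = (1/T) sum_t r_t r_t'
print(np.max(np.abs(Sigma_hat - Sigma_true)))   # shrinks as T grows
```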
By assumption, i.e. by the relationship
rt = Bst
the following equality holds:
Σ = B D B′    (11)

where D = E(s_t s_t′) is the true variance covariance matrix of the structural shocks.
Since these are structural shocks, they are not correlated with each other, hence D is
diagonal. Assume that the structural shocks are normally distributed (I will return
to this point in Section 4.7). In our application with two variables from the previous
Sections, Σ̂ has 3 elements (one is lost since the matrix is by construction symmetric),
B has 4 elements and D has two. Clearly, we cannot solve for 6 elements having 3
pieces of information. The system is not identified, which implies that to estimate B
uniquely we need additional restrictions, i.e. additional information. Note that this
is not an estimation issue but an identification issue. For this reason, I will address
the problem considering Σ rather than Σ̂, i.e. I will discuss this issue keeping sample
uncertainty out and assuming that we know the true Σ.
More in general, with n variables included in the VAR, the matrix Σ contains
n(n + 1)/2 elements, the matrix B contains n2 elements and the matrix D contains n
elements. It is standard to assume a normalization at this point and set D = I. This
eliminates n elements to be estimated, although the system is still under-identified.
Under this assumption, the key condition imposed by the data to estimate B is
Σ = B B′    (12)
• The key challenge in solving system (12) comes from the fact that the system
is non-linear in the elements of B;
• Some solve the identification problem by adding restrictions from the theory,
others from statistical features of the data. These are two conceptually very
different approaches;
• Identification approaches are divided into pointwise identification and set iden-
tification. Pointwise identification consists of adding sufficient and appropriate
restrictions to identify a single matrix B consistent with condition (12), given
an estimated matrix Σ̂. Set identification consists of adding fewer restrictions,
implying a whole set of candidate matrices B that are equally consistent with
condition (12);
In this introductory part of the notes it is probably pointless to dig more into
these concepts and discuss rank conditions, local vs. global identification and so on. I
discuss these issues in greater detail in “Notes on the uniqueness in the identification
of SVARs”, available on my website. For a more complete analysis the reader is
referred to Fisher (1966), Lütkepohl (2007) and Rubio-Ramirez et al. (2010). See
also Amisano and Giannini (2002).
Figure 5: Example 4 - The case of observationally equivalent structural representa-
tions
(Panels for Variable 1 and Variable 2; plots omitted.)
4.1 Recursive identification - Cholesky
Suppose we are willing to impose the restriction that the top-right entry of B equals
zero, i.e. that B should be searched for within the set of
[b_{11} 0; b_{21} b_{22}]    (13)
Under this restriction, it can be shown that there is a unique solution for B in
equation (12) up to the sign of the columns. This means that, given one matrix B that
satisfies condition (12), flipping the signs of all entries of the first column of B, of
the second column of B, or of both columns generates a new matrix that equally
satisfies condition (12). This degree of indeterminacy actually holds for all matrices, not
only for triangular matrices. Importantly, it can be shown that, in the triangular
case, these are the only matrices that satisfy condition (12). A standard normalization
consists of imposing that the diagonal elements of the solution for B are positive,
which happens to be an innocuous imposition.
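A Python/NumPy sketch of this (with a hypothetical Σ): numpy's Cholesky factor is lower triangular with a positive diagonal, which is exactly the normalized solution, and flipping the sign of a column leaves condition (12) satisfied.

```python
import numpy as np

Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.7]])          # hypothetical reduced-form covariance
B = np.linalg.cholesky(Sigma)           # lower triangular, positive diagonal

print(np.allclose(B @ B.T, Sigma))      # condition (12) holds: True

B_flip = B.copy()
B_flip[:, 0] *= -1                      # flip the sign of the first column
print(np.allclose(B_flip @ B_flip.T, Sigma))   # still True: observational equivalence
```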
Having understood the mathematical properties of a unique (up to sign of columns)
solution for B under the triangular assumption, it is key to understand the economic
meaning of it. In fact, it is economic theory that should guide a researcher towards
adopting (i.e. imposing) a triangular structure on B or not. Put differently, it is
the economic meaning of the restrictions imposed that allows a researcher to attach
interpretations to the underlying structural shocks identified.
Let’s use a standard example, which is the example of monetary shocks. For the
moment, do not worry about what a monetary shock could possibly be, I will return
to this point in Section 8. Consider a VAR model with 3 variables, i.e. real GDP,
inflation and the federal funds rate. Identifying the model adding these variables in
this order, i.e.

y_t = (y^{GDP}_t, y^{inflation}_t, y^{fedfunds}_t)′

and using the triangular structure of B, i.e.

[y^{GDP}_t; y^{inflation}_t; y^{fedfunds}_t] = B_-(L) [y^{GDP}_{t−1}; y^{inflation}_{t−1}; y^{fedfunds}_{t−1}] + [b_{11} 0 0; b_{21} b_{22} 0; b_{31} b_{32} b_{33}] [s^1_t; s^2_t; s^3_t]
implies that:
• a shock to GDP affects contemporaneously all three variables (due to the first
column of B), but GDP is affected contemporaneously only by its own shock
(due to the first row of B);
• a shock to inflation affects contemporaneously only inflation and the fed funds
(due to the second column of B), but inflation is affected contemporaneously
only by its own shock and the shock to GDP (due to the second row of B);
• a shock to the fed funds rate affects contemporaneously only the federal funds
rate (due to the third column of B), but the federal funds rate is affected by
all shocks (due to the third row of B);
• after one lag, all shocks could potentially affect all variables, due to the au-
toregressive component B− (L) of the model.
Convince yourself that this is the case by rewriting the reduced form shocks as

r^{GDP}_t = b_{11} s^{GDP}_t
r^{infl}_t = b_{21} s^{GDP}_t + b_{22} s^{infl}_t
r^{ffunds}_t = b_{31} s^{GDP}_t + b_{32} s^{infl}_t + b_{33} s^{ffunds}_t

Note that the response of the central bank to the economy is captured by the terms
{B_-}′_3 y_{t−1} + b_{31} s^{GDP}_t + b_{32} s^{infl}_t, with {B_-}′_3 the last row of the 3 × 3 matrix B_-,
while the stochastic response of the policy rate is captured by b_{33} s^{ffunds}_t.
Do these restrictions make economic sense? It depends on the frequency of the
data. If the data used are monthly, then it is realistic that an exogenous variation in
the policy rate takes at least one month to affect the economy, and that the federal
funds rate responds endogenously to contemporaneous developments of GDP and
inflation. Instead, if the data are yearly, it would be harder to assume that the lags
between a monetary intervention and the corresponding response of the economy are
so long that an effect could not occur before a whole year. Of course, settling this
calls for the judgement of the researcher.
What happens if one adds a financial variable to the model? Suppose one orders
this financial variable, say return on stocks, after the federal funds. This means that
a monetary shock does affect contemporaneously financial markets, which is realistic.
Nevertheless, it also implies that contemporaneous developments on financial markets
are not taken into account by the central bank when setting the interest rate, which
is not easy to justify. Similarly, if one orders the financial variable before the federal
funds rate, then the central bank does take contemporaneous financial developments
into account when setting the interest rate, but contemporaneous developments on
the interest rate affect the financial variable only with a lag. Again, depending on
the data and on the exact variables used, this assumption might be easy or hard to
defend.
There is another way of thinking about the identification restrictions imposed,
which uses identified structural shocks rather than impulse responses. Remember
that reduced form and structural shocks are related by the equation rt = Bst . Once
B is estimated, we can trace back the structural shocks from the reduced form shocks
using
st = B −1 rt
Since B is lower triangular, its inverse is lower triangular. This means that in the
Cholesky ordering, what allows us to attach a specific economic label to a certain shock
is the argument that, given the reduced form shocks estimated in the VAR, it is the
only structural shock that is contemporaneously correlated with all reduced form
shocks of the variables ordered before, and contemporaneously uncorrelated with
the reduced form shocks of the variables ordered after. In the example above, what
distinguished the monetary policy shock from the other identified structural shocks
is that it is the only structural shock that is correlated with the reduced form shocks
of all 3 variables. This can be seen from the fact that, given

B^{−1} = A = [a_{11} 0 0; a_{21} a_{22} 0; a_{31} a_{32} a_{33}]
the structural shocks are computed as s_{1t} = a_{11} r_{1t}, s_{2t} = a_{21} r_{1t} + a_{22} r_{2t} and
s_{3t} = a_{31} r_{1t} + a_{32} r_{2t} + a_{33} r_{3t}. Such restrictions are the counterpart of the triangularity
restrictions on the impulse responses. If such restrictions are not theoretically sound
given the specific case at hand, the Cholesky decomposition is not appropriate, no
matter how convenient or easy it is to implement it.
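Recovering the shocks via s_t = B^{−1} r_t can be sketched as follows (hypothetical numbers; note that the inverse of a lower triangular B is again lower triangular):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.5, 0.8]])       # hypothetical lower-triangular impact matrix
r_t = np.array([0.3, -0.2])      # hypothetical reduced-form shocks at time t

A = np.linalg.inv(B)             # A = B^{-1}, also lower triangular
s_t = A @ r_t                    # s_1t loads only on r_1t; s_2t on r_1t and r_2t
print(s_t)
```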
A standard result in Cholesky identification is that the recursive structure
matters in terms of blocks, which allows for an easy partial identification. To see this,
consider a VAR of 5 variables, with variable z ordered third. The Cholesky
identification of the structural shock to variable z is independent of the exact ordering
within the blocks of variables that enter before and after z. A standard reference for
this is Christiano et al. (1999).
Figure 6: Example 5 - Cholesky identification
(Panels: IRF of Variab. 1, shock 1; IRF of Variab. 2, shock 1; plots omitted.)
Once one has checked whether the matrix is identifiable or not, there are numerical routines
available that one can use to actually estimate B under the restrictions imposed. A
good reference on this is Binning (2013), while a more general and more comprehensive
reference is Rubio-Ramirez et al. (2010).
and the Givens rotation.
Identification through sign restrictions is a set identification, because there
are usually several candidate structural representations of the data that meet the
restrictions and that are equally consistent with the data (unless one intentionally
adds as many restrictions as needed to be left with only one candidate model; one
example is Canova and De Nicolo (2002)). A standard way of representing the results
is to report an error band covering between 5% and 95% of the models generated,
considered at each time t. It is also common to report either the median or the median
target (see Fry and Pagan (2011)). It is worth remembering that the uncertainty band
attached to the impulse responses under sign restrictions does not give a measure of
sample uncertainty, but of model uncertainty.
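One common algorithm behind this set identification (the details here are an assumption, since the algorithm is not spelled out in this excerpt) draws random orthogonal matrices Q, forms candidates B = chol(Σ)Q, and keeps the draws whose impact responses satisfy the assumed signs. A Python/NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.7]])               # hypothetical reduced-form covariance
P = np.linalg.cholesky(Sigma)

accepted = []
for _ in range(500):
    Q, R = np.linalg.qr(rng.standard_normal((2, 2)))
    Q = Q @ np.diag(np.sign(np.diag(R)))     # makes the draw uniform over orthogonal matrices
    B_cand = P @ Q                           # B_cand B_cand' = Sigma for any orthogonal Q
    if np.all(B_cand[:, 0] > 0):             # assumed restriction: shock 1 raises both variables on impact
        accepted.append(B_cand)

print(len(accepted))                         # the retained set of candidate models
```

Every retained B_cand satisfies condition (12) exactly; the restrictions only select among the observationally equivalent candidates.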
Last, one can combine a triangular structure for most of the variables of B with
a rotation in a subspace of B, in case, say, the first variables in the VAR
allow for a triangular structure and the last ones do not. For an application of
this type see Eickmeier and Hofmann (2013).
Figure 7: Example 8 - Identification through sign restrictions
y_t = C(L) B s_t    (14)
    = B s_t + C_1 B s_{t−1} + C_2 B s_{t−2} + C_3 B s_{t−3} + ...

where the matrix B has the same meaning as in the notation used so far and the
matrices {C_i}_{i=1}^∞ are a function of the underlying autoregressive parameters.
Suppose
that the first variable of the VAR enters in first differences. Assuming that
C(1)_{1,2} = Σ_{j=0}^{∞} c_{1,2,j} = 0
means that the second structural shock does affect the first variable, but overall the
long run effect is zero because the positive first differences and the negative first
differences cancel each other. This means that if one has a theory of no long run
impact of a certain structural shock, then one can add this set of restrictions to
identify the SVAR. In particular, the restriction would be imposed on the matrix
C(1)B, where the elements in C(1) do not allow for restrictions, because they will
reflect parameters estimated in the reduced form model. The long run restrictions
are then imposed in the matrix B, which is the matrix that we are trying to identify.
The standard reference for this approach is Blanchard and Quah (1989).
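One standard way to impose such a restriction (an assumption here, since the computation is not spelled out in this excerpt) is B = C(1)^{−1} chol(C(1) Σ C(1)′), which makes the long run impact matrix C(1)B lower triangular. A Python/NumPy sketch for a hypothetical VAR(1), where C(1) = (I − B_-)^{−1}:

```python
import numpy as np

B_minus = np.array([[0.5, 0.2],
                    [0.1, 0.4]])             # hypothetical VAR(1) coefficients
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.7]])               # hypothetical reduced-form covariance

C1 = np.linalg.inv(np.eye(2) - B_minus)      # long-run multiplier C(1) for a VAR(1)
B = np.linalg.inv(C1) @ np.linalg.cholesky(C1 @ Sigma @ C1.T)

long_run = C1 @ B                            # lower triangular by construction
print(np.allclose(B @ B.T, Sigma), abs(long_run[0, 1]) < 1e-10)
```

B still satisfies condition (12), while the second structural shock has no long run effect on the first variable.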
Σ_1 = B B′  and  Σ_2 = B Λ B′
The variance covariance matrix of structural shocks in period T1 is normalized to
I, while Λ measures the relationship between variances. Note that adding regimes
introduces additional parameters to estimate, namely the parameters in the matrix
Λ, but it also introduces information to exploit, namely, the elements in Σ2 . It
is this additional information that this methodology exploits in order to achieve
identification.
In terms of estimation strategy for B, several applications use a maximum likelihood
estimator on the structural model, which by construction simultaneously estimates
B and Λ, together with the parameters of the reduced form model. If
instead one wants to estimate the reduced form model first with OLS and then back
out B given the variance-covariance matrices of the reduced form shocks, one needs
to decompose the matrices Σ_1 and Σ_2 into B and Λ.¹
The key challenge of this identification approach is that, in itself, it does not
allow one to attach economic interpretations to the impulse responses. In fact, under
the identifying strategies of a triangular structure, sign restrictions and long run
restrictions, it is a specific feature of the impulse responses that allows one to interpret the
underlying shock in one way or another. Under identification through heteroskedasticity,
instead, the researcher exploits a variation in the statistical properties of the
data, and this, in itself, is silent about the interpretation of the shocks. It is then common
to attach economic interpretations based on the shape of the impulse responses.
Note also that the validity of this identification approach is only as good as the argument
that the data have undergone volatility regimes.
Applications of this methodology can be found, for instance, in Rigobon and
Sack (2004) and Lanne and Lütkepohl (2008). Other papers, including Lütkepohl
and Netšunajev (2014), do not impose the timing of the changes in the volatility
regimes, but estimate them through Markov switching. More recently, Fanelli and
Bacchiocchi (2015) extended this methodology by abandoning the assumption that
the B matrix is invariant across regimes.
¹ One way to do so is the following. Start with any decomposition of Σ_1 into a candidate matrix
B_c such that B_c B_c′ = Σ_1. Given B_c, construct the matrix C = B_c^{−1} Σ_2 (B_c^{−1})′. Apply the spectral
decomposition to C, i.e. compute E_c and V_c such that E_c V_c E_c′ = C. Estimate the matrix Λ with V_c
(by construction, Λ must be a diagonal matrix, and V_c satisfies this requirement). Then, note that
the condition

C = E_c V_c E_c′ = B_c^{−1} Σ_2 (B_c^{−1})′

implies

Σ_2 = B_c E_c V_c E_c′ B_c′ = (B_c E_c V_c^{1/2})(B_c E_c V_c^{1/2})′
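One consistent reading of the footnote's algorithm is B = B_c E_c and Λ = V_c, which reproduces both covariance matrices. A Python/NumPy sketch with hypothetical Σ_1, Σ_2:

```python
import numpy as np

Sigma1 = np.array([[1.0, 0.4],
                   [0.4, 0.7]])              # hypothetical regime-1 covariance
Sigma2 = np.array([[2.0, 0.5],
                   [0.5, 1.5]])              # hypothetical regime-2 covariance

B_c = np.linalg.cholesky(Sigma1)             # any B_c with B_c B_c' = Sigma1 works
B_c_inv = np.linalg.inv(B_c)
C = B_c_inv @ Sigma2 @ B_c_inv.T             # symmetric positive definite
vals, E_c = np.linalg.eigh(C)                # spectral decomposition: E_c diag(vals) E_c' = C
Lam = np.diag(vals)                          # estimate of Lambda (diagonal)
B = B_c @ E_c

print(np.allclose(B @ B.T, Sigma1), np.allclose(B @ Lam @ B.T, Sigma2))
```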
Example 9: [ADD]
There are at least four main notations that one can use. Notation 1 is
particularly convenient to rewrite and simplify a VAR(p) into a VAR(1).
Notation 4 is particularly convenient for Bayesian estimation of the VAR.
Notation 3 simplifies coding and has been used in the Matlab codes
used for the examples.
Notation 1

x_t = B* x_{t−1} + C* ȳ_t + r*_t    (16)

with x_t = (y_t′, y_{t−1}′)′ (of dimension 2m × 1), B* = [B_1 B_2; I 0] (2m × 2m),
C* = (C′, 0′)′ (2m × m_c) and r*_t = (r_t′, 0′)′, i.e.

[y_t; y_{t−1}] = [B_1 B_2; I 0] [y_{t−1}; y_{t−2}] + [C; 0] ȳ_t + [r_t; 0]
This form takes the name of companion form and greatly simplifies the
coding, as it turns the VAR(p) into a VAR(1), i.e. a VAR with one lag
only. For example, to see if a VAR model with more than one lag is
stationary, one can first rewrite it in companion form and then check the
maximum eigenvalue of the matrix B*, as mentioned in Section 1.
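The steps above can be sketched as follows (Python/NumPy, with hypothetical VAR(2) coefficient matrices B_1, B_2):

```python
import numpy as np

B1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])                   # hypothetical first-lag coefficients
B2 = np.array([[0.2, 0.0],
               [0.1, 0.1]])                   # hypothetical second-lag coefficients
m = 2

B_star = np.block([[B1, B2],
                   [np.eye(m), np.zeros((m, m))]])   # companion matrix of the VAR(2)

max_mod = np.max(np.abs(np.linalg.eigvals(B_star)))
print(max_mod < 1.0)                          # stationarity check via the companion form
```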
Notation 2

y_t = B(L) y_{t−1} + C ȳ_t + r_t    (17)

with, for a 3-variable VAR(2), B(L) = B_{L=1} + B_{L=2} L and

[y_{1t}; y_{2t}; y_{3t}] = (B_{L=1} + B_{L=2} L) [y_{1t−1}; y_{2t−1}; y_{3t−1}] + [c_1; c_2; c_3] + [r_{1t}; r_{2t}; r_{3t}]

where the (i, j) element of B(L) is β_{ij,L=1} + β_{ij,L=2} L.
Notation 3

Y = X A + E    (18)

with Y of dimension (T × m), X of dimension (T × k), A of dimension (k × m) and
E of dimension (T × m). Each row of Y is y_t′, each row of X is (y_{t−1}′, y_{t−2}′, 1),
and the coefficients are stacked as

A = [B_{L=1}′; B_{L=2}′; c′]

so that, row by row,

y_t′ = (y_{t−1}′, y_{t−2}′, 1) [B_{L=1}′; B_{L=2}′; c′] + r_t′
Notation 4

y = (I_m ⊗ X) β + r    (19)

where y (of dimension Tm × 1) stacks the columns of Y, Z = I_m ⊗ X is of
dimension (mT × mk), β (of dimension mk × 1) stacks the columns of A, and r
stacks the reduced form errors equation by equation. For the 3-variable VAR(2)
with a constant (k = 7):

[y_1; y_2; y_3] = [X 0 0; 0 X 0; 0 0 X] β + [r_1; r_2; r_3]

with β = (β_{11,L=1}, β_{12,L=1}, β_{13,L=1}, β_{11,L=2}, β_{12,L=2}, β_{13,L=2}, c_1,
β_{21,L=1}, ..., c_2, β_{31,L=1}, ..., c_3)′.
average representation. One may of course ask why one should bother
representing the data as a moving average, given that impulse responses
can be more easily computed using the autoregressive component. It will
be shown that a moving average representation allows one to push the
analysis further and implement variance and historical decompositions.
Since the literature uses this alternative notation at least as frequently
as the autoregressive notation, it is very important to become familiar
with it. Note that the model does not change, it is just expressed using
an alternative, equivalent form.
Consider the general VAR(p) model. For the moment, assume that the
model does not have a constant. The following are equivalent notations
of the autoregressive reduced form representation of the data:
y_t = B_-(L) y_{t−1} + r_t
y_t = B_1 y_{t−1} + B_2 y_{t−2} + ... + B_p y_{t−p} + r_t
(I − B_1 L − B_2 L² − ... − B_p L^p) y_t = r_t
y_t = Σ_{l=1}^{p} B_l y_{t−l} + r_t
The only difference from what we have seen so far is the use of the lag
operator L, which is the operator such that Lp xt = xt−p . Note that only
p lags enter.
The model can be rewritten as a moving average representation, i.e. a
representation in which variables at time t are not expressed as a linear
function of variables in the previous periods up to a certain lag, but as a
weighted function of all previous reduced form shocks. Formally,
yt = C−(L) rt
yt = rt + C1 rt−1 + C2 rt−2 + ...
yt = (I + C1 L + C2 L² + ...) rt
yt = rt + Σ_{l=1}^{∞} Cl rt−l

Note that C0 = I. Note also that lags up to infinity are used. Mathematically, the following equality links the polynomials:

I + C1 L + C2 L² + ... = (I − B1 L − B2 L² − ... − Bp L^p)^{−1}
The case with one lag is easy, since one can use recursive substitution:

yt = B− yt−1 + rt
yt = B−(B− yt−2 + rt−1) + rt = B−² yt−2 + B− rt−1 + rt
   = ...
yt = B−^s yt−s + Σ_{τ=0}^{s−1} B−^τ rt−τ

One can either stop the recursive substitution at t − s and write the data as a combination of an initial observation and of the shocks that occurred after it, or continue with the substitution and write the data exclusively as a function of shocks. If the model is stationary, B−^s yt−s goes to zero as s goes to infinity. The moving average representation is hence C1 = B−, C2 = B−², C3 = B−³, ... .
Things are instead a bit more tricky if the VAR has more than one lag. If the VAR is not of order one but a general VAR(p), one either uses the companion matrix and applies the same rule as for a VAR(1), or uses the general recursion, which is

C1 = B1
C2 = B1 C1 + B2
C3 = B1 C2 + B2 C1 + B3
...
Cp = B1 Cp−1 + B2 Cp−2 + ... + Bp−1 C1 + Bp
Cp+s = B1 Cp+s−1 + B2 Cp+s−2 + ... + Bp−1 Cs+1 + Bp Cs, ∀s > 0
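The recursion above can be sketched in a few lines. The following is a minimal implementation in Python/numpy rather than Matlab, with a function name of my own choosing:

```python
import numpy as np

def ma_coefficients(B_list, horizon):
    """MA matrices C_0 = I, C_1, ..., C_horizon from the recursion
    C_s = B_1 C_{s-1} + B_2 C_{s-2} + ... (terms with negative index dropped)."""
    m = B_list[0].shape[0]
    C = [np.eye(m)]
    for s in range(1, horizon + 1):
        Cs = np.zeros((m, m))
        for l, B in enumerate(B_list, start=1):
            if s - l >= 0:
                Cs = Cs + B @ C[s - l]
        C.append(Cs)
    return C

# for a VAR(1) the recursion collapses to C_s = B_1^s
B1 = np.array([[0.5, 0.1], [0.2, 0.3]])
C = ma_coefficients([B1], 5)
assert np.allclose(C[3], np.linalg.matrix_power(B1, 3))
```

The final check reproduces the VAR(1) result derived above, C_s = B−^s.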
The next step is to introduce the difference between reduced form and
structural moving average representation. Just as an autoregressive rep-
resentation of the data can be specified using the reduced form or the
structural form (i.e. expressing the data in rt or in st ), the moving av-
erage representation of the data can be specified in reduced form or in
structural form. The representation reported above is the reduced form
moving average representation. The structural moving average represen-
tation of the data is simply obtained by replacing the reduced form shocks
with the appropriate function of structural shocks, i.e.
yt = C−(L) B st
   = B st + C1 B st−1 + C2 B st−2 + ...        (20)
   = B st + Σ_{l=1}^{∞} Cl B st−l

or simply

yt = D0 st + D1 st−1 + D2 st−2 + ...        (21)
   = Σ_{l=0}^{∞} Dl st−l
with Dl = Cl B and D0 = B. The impulse responses to, say, the first structural shock are then

φ0 = v
φ1 = C1 v
φ2 = C2 v
...
φτ = Cτ v

where v = b1, the first column of B. Note that the impulse responses computed in this way are identical to the impulse responses computed from the autoregressive representation in Section 3. Note again that computing impulse responses with an impulse vector v = (1, 0, ..., 0)′ would make little sense, as discussed in Section 3.
The model considered so far did not include a constant. If the model includes a constant, the autoregressive reduced form representation is simply

yt = c + B1 yt−1 + B2 yt−2 + ... + Bp yt−p + rt
(I − B1 L − B2 L² − ... − Bp L^p) yt = c + rt

The reduced form moving average representation becomes

yt = (I − B1 − B2 − ... − Bp)^{−1} c + (I − B1 L − B2 L² − ... − Bp L^p)^{−1} rt
The inverse of I − B1 L − B2 L² − ... − Bp L^p leads to the parameters in C(L), just as shown before. Note, instead, that (I − B1 − B2 − ... − Bp)^{−1} c is the expected value of y. In fact, absent any realization of structural shocks, the value of yt that is consistent with the model is the vector m such that m = c + B1 m + ... + Bp m, i.e. m = (I − B1 − ... − Bp)^{−1} c (remember that we are only considering stationary VARs here). Last, moving from the reduced form to the structural representation only requires substituting the reduced form shocks with the structural shocks and arranging the notation accordingly. The fact that the moving average terms do not change depending on the presence of a constant, but only depend on the underlying autoregressive components, should convince you of why we did not need the constant term in Example 3 to generate impulse responses using the operative definition.
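The fixed-point property of the unconditional mean can be checked directly. A minimal Python/numpy sketch with illustrative parameter values:

```python
import numpy as np

B1 = np.array([[0.5, 0.1], [0.2, 0.3]])  # illustrative stationary VAR(1)
c = np.array([1.0, 2.0])

# unconditional mean m = (I - B1)^{-1} c ...
m = np.linalg.solve(np.eye(2) - B1, c)

# ... is the value the model reproduces absent any shock: m = c + B1 m
assert np.allclose(m, c + B1 @ m)
```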
Example 12: Consider again Example 10, except that the values of the constant terms are set equal to zero. Compute the parameters of the moving average reduced form representation of the data using both the mathematical result reported in the section above and the short-cut of first rewriting the model as a VAR(1) using a companion matrix. Compare the results.
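A sketch of the comparison the example asks for, in Python/numpy rather than Matlab and with illustrative coefficient values (not those of Example 10):

```python
import numpy as np

# illustrative VAR(2) coefficients
B1 = np.array([[0.5, 0.1], [0.2, 0.3]])
B2 = np.array([[0.1, 0.0], [0.0, 0.1]])
m = 2

# companion matrix of the VAR(2) written as a VAR(1)
F = np.block([[B1, B2], [np.eye(m), np.zeros((m, m))]])

# MA matrices via the recursion C_s = B1 C_{s-1} + B2 C_{s-2} ...
C = [np.eye(m), B1]
for s in range(2, 11):
    C.append(B1 @ C[s - 1] + B2 @ C[s - 2])

# ... versus the upper-left m x m block of F^s (the VAR(1) rule)
for s in range(11):
    assert np.allclose(C[s], np.linalg.matrix_power(F, s)[:m, :m])
```

Both routes deliver the same C matrices, as the example anticipates.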
Consider the VAR(p) with two variables written in the structural moving
average representation (21). Call the 2 × 1 h-periods-ahead forecast error
ψh . Consider for the moment the one-period ahead forecast error. From
the structural moving average representation of the VAR it should be
clear that
ψh=1 = yt+1 − E(yt+1 |t) = D0 st+1
Figure 8: Example 11 - AR and MA representations of the same data

[Figure: panels for Variable 1 and Variable 2 comparing results computed from the AR representation and from the MA representation.]
The expected value of ψh=1 is clearly zero, which we knew already from the forecasting example in Section 2. The variance of ψh=1, instead, can be decomposed into the variances of the underlying stochastic processes s1t and s2t. Using simple math we find that V(ψh=1) = D0 V(st) D0′, i.e.

V(ψ1,h=1) = d²11,0 σ1² + d²12,0 σ2² = Σ_{j=1}^{2} d²1j,0 σj²
V(ψ2,h=1) = d²21,0 σ1² + d²22,0 σ2² = Σ_{j=1}^{2} d²2j,0 σj²
This means that the share of the variance of the forecast error of variable i explained by the volatility of the structural shock to variable j at horizon h = 1 is

Rij,h=1 = d²ij,0 σj² / Σ_{g=1}^{2} d²ig,0 σg²        (22)
For time horizons longer than h = 1 nothing much changes, except that there are more terms to be considered. For instance, the two-period ahead forecast error is

ψh=2 = yt+2 − E(yt+2|t) = D0 st+2 + D1 st+1

so that, for example, the contribution of the second shock to the variance of the forecast error of the first variable is Σ_{τ=0}^{h−1} d²12,τ σ2². To isolate this component we need ei = (1, 0)′ and ej = (0, 1)′.
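The variance-share formula extends to any horizon by summing over τ = 0, ..., h−1. A minimal Python/numpy sketch (the D matrices and shock variances below are illustrative, and `fevd_share` is my own name):

```python
import numpy as np

# illustrative structural impulse matrices D_0, D_1 for a bivariate VAR
D = [np.array([[1.0, 0.0], [0.5, 1.0]]),
     np.array([[0.6, 0.2], [0.3, 0.7]])]
sigma2 = np.array([1.0, 0.25])  # variances of s_1t, s_2t

def fevd_share(D, sigma2, i, j, h):
    """Share of the h-step forecast-error variance of variable i
    explained by structural shock j (equation (22) extended to h >= 1)."""
    num = sum(D[tau][i, j] ** 2 * sigma2[j] for tau in range(h))
    den = sum(D[tau][i, g] ** 2 * sigma2[g]
              for tau in range(h) for g in range(len(sigma2)))
    return num / den

# shares across shocks sum to one at every horizon
for h in (1, 2):
    assert np.isclose(sum(fevd_share(D, sigma2, 0, j, h) for j in range(2)), 1.0)
```

With d12,0 = 0 in this example, the first shock explains the entire one-step forecast-error variance of the first variable, consistent with equation (22).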
where the series goes to infinity. Calling d^i_h column i of matrix Dh and sjt shock j at time t, the moving average can be rewritten as
\[
y_t = \ldots
+ \underbrace{d^j_0 s_{jt} + d^j_1 s_{j,t-1} + d^j_2 s_{j,t-2} + \ldots}_{y^{hd,j}_t,\ \text{contribution of shock } s_j}
+ \ldots
+ \underbrace{d^n_0 s_{nt} + d^n_1 s_{n,t-1} + d^n_2 s_{n,t-2} + \ldots}_{y^{hd,n}_t,\ \text{contribution of shock } s_n}
\tag{24}
\]
Consider two thought experiments. The first one is: what is the contribution of sjt, i.e. of the realization of shock j at time t, to the variables of the system? This will be d^j_0 sjt on yt, d^j_1 sjt on yt+1, d^j_2 sjt on yt+2, and so on. Note that this is simply the impulse response [d^j_0, d^j_1, d^j_2, ...] to shock j, multiplied by a shock of size sjt rather than of size equal to the arbitrary scaling factor used in Section 3. Historical decomposition does something different, although related: what is the contribution of shock sj (rather than of realization sjt) to the variables yt of the system? From equation (24) it should be clear that this equals

ỹ^{hd,j}_t = d^j_0 sjt + d^j_1 sj,t−1 + d^j_2 sj,t−2 + ... + d^j_{t−1} sj,1        (27)
Note that in principle we are still in population: apart from the truncation of the full moving average process, the expression holds for the true realizations of the shocks and for the true parameter values. In applied work the Dl matrices and the shocks sjt are replaced with estimates, for obvious reasons.
A few remarks are due:
relevant this error is, one can plot both the true data and the sum
of historical decompositions. The difference should be bigger the
closer we are to the initial period.
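The mechanics of the historical decomposition can be sketched with a small simulation. A minimal Python/numpy example with illustrative parameter values, where the initial condition is set to zero so that the shock contributions sum exactly to the data:

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 200, 2
B1 = np.array([[0.5, 0.1], [0.2, 0.3]])    # illustrative stationary VAR(1)
Bmat = np.array([[1.0, 0.0], [0.5, 1.0]])  # illustrative impact matrix B
s = rng.standard_normal((T, m))            # structural shocks

# simulate y_t = B1 y_{t-1} + B s_t from y_0 = 0, so D_tau = B1^tau B exactly
y = np.zeros((T, m))
for t in range(T):
    y[t] = (B1 @ y[t - 1] if t > 0 else np.zeros(m)) + Bmat @ s[t]

P = [np.eye(m)]                            # P[tau] = B1^tau
for _ in range(T - 1):
    P.append(P[-1] @ B1)

def hist_decomp(j):
    """Contribution of shock j: sum over tau of B1^tau b_j s_{j,t-tau}."""
    bj = Bmat[:, j]
    yhd = np.zeros((T, m))
    for t in range(T):
        for tau in range(t + 1):
            yhd[t] += P[tau] @ bj * s[t - tau, j]
    return yhd

# with a zero initial condition the contributions sum exactly to the data
assert np.allclose(hist_decomp(0) + hist_decomp(1), y)
```

In an application the initial condition is not zero, which is exactly why the gap between the data and the sum of contributions is larger near the start of the sample.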
By now, it should be clear that this set of restrictions implies that the
first shock does not have any contemporaneous effect on y2t , the sec-
ond shock does not have any contemporaneous effect on y3t and the last
shock does not have any overall contemporaneous effect on y1t . Consider
now the corresponding structural form. Simple math shows that in the
corresponding A matrix no zero appears, i.e.
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]
restrictions on A is much harder (they are non-linear restrictions among
the entries {ai,j }).
Consider now the case in which the true structural form is
\[
A = \begin{pmatrix} a_{11} & a_{12} & 0 \\ 0 & a_{22} & a_{23} \\ a_{31} & 0 & a_{33} \end{pmatrix}
\]

y1t = −a11^{−1} a12 y2t + ... + a11^{−1} s1t
y2t = −a22^{−1} a23 y3t + ... + a22^{−1} s2t
y3t = −a33^{−1} a31 y1t + ... + a33^{−1} s3t
By symmetry with the example given above, we know that the B matrix
does not display any zero entry, and hence each structural shock affects
all variables. While this might not be obvious from the system specified
in A form, one should be able to see that, for example, s1t affects the
first variable directly, which indirectly affects the third variable, which
indirectly affects the second variable. Similarly, s2t affects the second
variable directly, the first variable indirectly and the third variable indi-
rectly. So, what do the restrictions mean? They mean that, conditioning
on the third variable, the first shock should have no effect on the second
variable. Or similarly, that conditioning on the first variable, a shock to
the second variable should not affect contemporaneously the third vari-
able. Such restrictions reflect restrictions on elasticities rather than on
overall effects. An example can be found in Caldara and Kamps (2012).
The example above should have clarified the distinction between A and B forms, but it is worth taking the discussion a step forward and deriving impulse responses. For simplicity I will only consider the impact effects. Consider the model
\[
\begin{pmatrix} a_{11} & -a_{12} \\ -a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
= \ldots + \begin{pmatrix} s_{1t} \\ s_{2t} \end{pmatrix}
\]
or equivalently,

y1t = (a12/a11) y2t + ... + (1/a11) s1t        (28)
y2t = (a21/a22) y1t + ... + (1/a22) s2t        (29)
The B matrix corresponding to this model in A form is
\[
B = \frac{1}{a_{11}a_{22} - a_{21}a_{12}} \begin{pmatrix} a_{22} & a_{12} \\ a_{21} & a_{11} \end{pmatrix}
\]
Accordingly, the impulse vector corresponding to a shock s1t of size 1 is
\[
\phi = \frac{1}{a_{11}a_{22} - a_{21}a_{12}} \begin{pmatrix} a_{22} \\ a_{21} \end{pmatrix}
\]
This is the overall impact effect of such a shock.
Consider now the A specification. The same overall effect emerges by construction from computing the impulse vector step by step. To this purpose, simplify notation from model (28) and define γ11 = a12/a11, γ12 = 1/a11, γ21 = a21/a22, γ22 = 1/a22. Model (28) then rewrites as y1t = γ11 y2t + ... + γ12 s1t and y2t = γ21 y1t + ... + γ22 s2t. A unit shock s1t moves y1t by γ12 on impact; this moves y2t by γ21 γ12, which feeds back into y1t, and so on:

∆1 y1t = γ12
∆2 y1t = (γ11 γ21) γ12
∆3 y1t = (γ11 γ21)² γ12
∆4 y1t = (γ11 γ21)³ γ12
...

Summing the geometric series gives γ12/(1 − γ11 γ21) = a22/(a11 a22 − a12 a21), which coincides with the impact effect computed from the B model. The same can be verified regarding the effect on y2t.
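The equivalence of the two routes can be checked numerically. A minimal Python/numpy sketch with illustrative entries for A:

```python
import numpy as np

a11, a12, a21, a22 = 2.0, 0.5, 0.4, 1.5   # illustrative entries of the A matrix

# B form: the overall impact of a unit shock s_1t is the first column of A^{-1}
A = np.array([[a11, -a12], [-a21, a22]])
impact = np.linalg.inv(A)[:, 0]

# A form: accumulate the rounds of the multiplier described in the text
g11, g12, g21 = a12 / a11, 1.0 / a11, a21 / a22
step_by_step = sum((g11 * g21) ** k * g12 for k in range(200))

# the geometric series converges to the B-form impact on y_1t
assert np.isclose(step_by_step, impact[0])
```

Convergence requires |γ11 γ21| < 1, which holds here and in any model where the feedback loop is stable.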
While apparently more complicated, computing the impulse response this way allows one to isolate the effects of multipliers. Consider for example the case in which the two equations considered are a negatively sloped demand function and a positively sloped supply function. An exogenous
increase in supply would increase the quantity for each price level. Nev-
ertheless, such a higher quantity would be demanded only at a lower
price. But the decrease in price would decrease the quantity supplied,
which would then increase the quantity demanded, and so on. Overall,
this argument leads to the new equilibrium. The total effect can be read
from the B form of the model. An alternative example is the case of a positively sloped demand function for some financial asset, say due to speculation.
An increase in price increases demand, which increases the price, which
again increases demand and so on. The A form specifies this mechanism
directly, while the B form captures the overall impact.
The literature has developed several arguments why at least a small part
of the variations in the federal funds rate might be exogenous. One
of these is to consider that the policy rate is set as the outcome of a
discussion among members of a committee, a process that might leave
room for some exogeneity. For example, one can assume that whether
one member of the committee was more convincing than another one
on a specific day, given an identical information set available, depends
on factors that are external to the state of the economy (for example,
mood, whether his or her kids let him/her sleep the night before etc.).
These and other interpretations are discussed, for example, in Romer and
Romer (2004).
Another interesting case on the general nature and existence of structural
shocks is the case of a model that pins down the joint determination of
price and quantity in a given market. If structural analysis is required,
one could be led to interpret the two underlying structural shocks as a
price shock and a quantity shock. This, though, opens the question of what a price shock could possibly be. In fact, if one considers a Walrasian
setting, variations in prices depend on demand and supply, so a variation
in price should be decomposed into the underlying cause, otherwise it
remains endogenous. The same holds with regard to a quantity shock.
Instead of interpreting shocks in terms of a price shock and a quantity
shock, it is more common to label shocks as a demand shock and a supply
shock. In such a scenario, sign restrictions are usually used for identifi-
cation. Note that this means that in an equation featuring as dependent
variable, say, oil price, the corresponding structural shock cannot just
be labelled “oil price shock”. For a discussion on this point see Kilian
(2009).
References
Amisano, G. and Giannini, C. (2002). Topics in structural var economet-
rics.
Bernanke, B. S., Boivin, J., and Eliasz, P. (2005). Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, pages 387-422.
Fry, R. and Pagan, A. (2011). Sign restrictions in structural vector
autoregressions: A critical review. Journal of Economic Literature,
49(4):938–960.
Herwartz, H. (2015). Structural modelling with independent innovations.
Kilian, L. (2009). Not all oil price shocks are alike: Disentangling demand
and supply shocks in the crude oil market. American Economic Review,
99(3):1053–1069.
Lanne, M. and Lütkepohl, H. (2008). Identifying monetary policy shocks
via changes in volatility. Journal of Money, Credit and Banking,
40(6):1131–1149.
Lütkepohl, H. (2007). New introduction to multiple time series analysis.
Springer Science & Business Media.
Lütkepohl, H. (2014). Structural vector autoregressive analysis in a data
rich environment: A survey.
Lütkepohl, H. and Netšunajev, A. (2014). Disentangling demand and
supply shocks in the crude oil market: How to check sign restrictions
in structural vars. Journal of Applied Econometrics, 29(3):479–496.
Mertens, K. and Ravn, M. O. (2013). The dynamic effects of personal
and corporate income tax changes in the united states. The American
Economic Review, 103(4):1212–1247.
Rigobon, R. (2003). Identification through heteroskedasticity. Review of
Economics and Statistics, 85(4):777–792.
Rigobon, R. and Sack, B. (2004). The impact of monetary policy on asset
prices. Journal of Monetary Economics, 51(8):1553–1575.
Romer, C. and Romer, D. (2004). A new measure of monetary shocks:
Derivation and implications. The American Economic Review, pages
1055–1084.
Rubio-Ramirez, J. F., Waggoner, D. F., and Zha, T. (2010). Structural
vector autoregressions: Theory of identification and algorithms for in-
ference. The Review of Economic Studies, 77(2):665–696.
Stock, J. H. and Watson, M. W. (2011). Dynamic factor models. Oxford
Handbook of Economic Forecasting, 1:35–59.
Stock, J. H. and Watson, M. W. (2012). Disentangling the channels of the
2007-2009 recession. Technical report, National Bureau of Economic
Research.