
A step-by-step introduction to VAR models

(with simulations on Matlab)


Michele Piffer∗
February 2015

WORK IN PROGRESS: COMMENTS ARE WELCOME

These notes provide a concise and simplified introduction to vector autoregressive
models. They explain what is meant by identification of the structural form and
introduce impulse responses, variance decomposition, historical decomposition and
related tools. The intuition is supported by simulations on Matlab. Codes are available
on my webpage. The interested reader is strongly encouraged to write and work on
his/her own codes to challenge his/her own understanding of the topic. Figures for
each example are copied at the end of the notes. A pdf file of the .m files used is also
available on my webpage.


∗ DIW Berlin, Mohrenstrasse 58, 10117 Berlin, Germany. Email: [email protected], personal
web page: https://sites.google.com/site/michelepiffereconomics/. These notes can be reproduced
freely for educational and research purposes as long as they contain this notice and are retained for
personal use or distributed for free. All errors are mine. Please get in touch if you find typos or
mistakes.

Contents

1 The nature of the problem
2 Forecasting
3 Impulse response analysis
4 Identification
  4.1 Recursive identification - Cholesky
  4.2 Other zero restrictions
  4.3 Sign restrictions
  4.4 Long run restrictions
  4.5 External instruments
  4.6 Identification through heteroskedasticity
  4.7 Restrictions on contemporaneous independence
5 Alternative autoregressive notation
6 VARs as a moving average
  6.1 Variance decomposition of the forecast error
  6.2 Historical decomposition
7 A, B and AB specification of the SVAR
8 Interpreting structural shocks

List of Figures

1 Example 1 - Generating the data
2 Example 2 - Forecasting
3 Example 3 - Impulse response to shock to the first variable (no constant)
4 Example 3 - Impulse response to shock to the first variable (constant)
5 Example 4 - The case of observationally equivalent structural representations
6 Example 5 - Cholesky identification
7 Example 8 - Identification through sign restrictions
8 Example 11 - AR and MA representations of the same data
1 The nature of the problem
Suppose you want to model the interaction between variable $x_t$ and variable $z_t$, say,
GDP and inflation. You are willing to accept as a maintained hypothesis that they
affect one another both contemporaneously and up to one lag, but not after one lag.
In addition, you are willing to assume that the relationship is linear, and that only
these variables should enter the model. The model you are after is

$$x_t = \delta_1 z_t + \delta_2 x_{t-1} + \delta_3 z_{t-1} + s^x_t \qquad\qquad (1)$$
$$z_t = \mu_1 x_t + \mu_2 x_{t-1} + \mu_3 z_{t-1} + s^z_t$$

The contemporaneous interdependence between variables is picked up by the parameters
$\delta_1$ and $\mu_1$, while the lagged interdependence is reflected by the remaining
parameters. The error terms $s^x_t$ and $s^z_t$ are assumed to be independently and normally
distributed with unit variance.
The key problem in VARs is that we cannot directly estimate equations (1)
consistently because the regressors $z_t$ and $x_t$ are endogenous. The most immediate
way of seeing this is the following. Is $z_t$ exogenous in the first equation of the system,
given that it enters as a regressor in that equation? We can address this question using
the system itself, since it spells out the determinants of $z_t$. By the second equation,
$z_t$ depends on $x_t$, which in turn depends on $s^x_t$ from the first equation. This means
that the regressor $z_t$ in the first equation depends indirectly on $s^x_t$ and hence is
endogenous to the error term in the first equation. The same logic holds with respect
to the second equation. As you will simulate in Example 1, if one regressor in an
equation is endogenous, then the OLS estimates of the coefficients of all regressors
are inconsistent. Using maximum likelihood estimation would not solve the problem,
since this is not an estimation issue but an identification issue, as will be explained.
A more formal way of seeing the endogeneity problem discussed above requires
rewriting the variables at t only as a function of predetermined variables and contemporaneous
errors. The tedious way of doing so consists of substituting one equation
into the other and solving for $x_t$ and $z_t$, which gives

$$x_t = \frac{\delta_2 + \delta_1\mu_2}{1-\mu_1\delta_1}\,x_{t-1} + \frac{\delta_3 + \delta_1\mu_3}{1-\mu_1\delta_1}\,z_{t-1} + \frac{1}{1-\mu_1\delta_1}\,s^x_t + \frac{\delta_1}{1-\mu_1\delta_1}\,s^z_t$$
$$z_t = \frac{\mu_1\delta_2 + \mu_2}{1-\mu_1\delta_1}\,x_{t-1} + \frac{\mu_1\delta_3 + \mu_3}{1-\mu_1\delta_1}\,z_{t-1} + \frac{\mu_1}{1-\mu_1\delta_1}\,s^x_t + \frac{1}{1-\mu_1\delta_1}\,s^z_t$$
A better approach, instead, consists of using matrix algebra. Rewrite first the system
in equations (1) as
$$\begin{pmatrix} 1 & -\delta_1 \\ -\mu_1 & 1 \end{pmatrix}\begin{pmatrix} x_t \\ z_t \end{pmatrix} = \begin{pmatrix} \delta_2 & \delta_3 \\ \mu_2 & \mu_3 \end{pmatrix}\begin{pmatrix} x_{t-1} \\ z_{t-1} \end{pmatrix} + \begin{pmatrix} s^x_t \\ s^z_t \end{pmatrix}$$
Premultiplying both sides by the inverse of the matrix on the left hand side yields
$$\begin{pmatrix} x_t \\ z_t \end{pmatrix} = \frac{1}{1-\delta_1\mu_1}\begin{pmatrix} \delta_2 + \delta_1\mu_2 & \delta_3 + \delta_1\mu_3 \\ \mu_2 + \delta_2\mu_1 & \mu_3 + \delta_3\mu_1 \end{pmatrix}\begin{pmatrix} x_{t-1} \\ z_{t-1} \end{pmatrix} + \frac{1}{1-\delta_1\mu_1}\begin{pmatrix} 1 & \delta_1 \\ \mu_1 & 1 \end{pmatrix}\begin{pmatrix} s^x_t \\ s^z_t \end{pmatrix}$$

which is of course the same system obtained through the tedious substitutions. For
simplicity, rewrite the above equations as

$$x_t = \beta_1 x_{t-1} + \beta_2 z_{t-1} + r^x_t \qquad\qquad (2)$$
$$z_t = \beta_3 x_{t-1} + \beta_4 z_{t-1} + r^z_t$$

where, by now, it should be understood that the parameters $\{\beta_i\}_{i=1}^{4}$ are not free
parameters, but are functions of the underlying parameters $\{\delta_i, \mu_i\}_{i=1}^{3}$.
The same holds for the error terms that enter equations (2). These errors are not
free random variables, but are driven by the underlying errors in the initial
model through the relationship

$$\begin{pmatrix} r^x_t \\ r^z_t \end{pmatrix} = \frac{1}{1-\delta_1\mu_1}\begin{pmatrix} 1 & \delta_1 \\ \mu_1 & 1 \end{pmatrix}\begin{pmatrix} s^x_t \\ s^z_t \end{pmatrix}$$

Contrary to equations (1), equations (2) can be estimated consistently using OLS
equation by equation. In fact, the regressors $x_{t-1}$ and $z_{t-1}$ depend directly on $r^x_{t-1}$
and $r^z_{t-1}$ (and indirectly on all previous errors $\{r^x_{t-i}, r^z_{t-i}\}_{i=2}^{\infty}$ via the lagged
dependent variables), but not on $r^x_t$ and $r^z_t$. Equations (1) provide the structural
representation of the model, while equations (2) provide the reduced form representation
of the model. Note that estimating the reduced form model of equations (2)
gives four estimates ($\beta_i$, $i = 1, .., 4$), while the structural representation of the same
data has six parameters ($\delta_i$, $\mu_i$, $i = 1, .., 3$). Moving from the estimated reduced form to
the structural form is not straightforward. We will get back to this later.
A few words of caution are in order at this point. While we do acknowledge that
$x_t$ should be part of the model for variable $z_t$ (and vice versa), the omission
of $x_t$ from the equation estimated for $z_t$ is not a source of omitted variable bias, since we
are estimating a transformation of the underlying model. The contemporaneous
relationship between $x_t$ and $z_t$ that is clearly displayed in equations (1) is equally
present in equations (2). It simply enters the model of equations (2) through the
correlation of the errors. This correlation should be clear from equation (5) below and will
be crucial in the discussion of identification issues.
From now on, refer to the reduced form model using linear algebra:

$$y_t = B_- y_{t-1} + r_t \qquad\qquad (3)$$

where bold lower-case letters indicate vectors and upper-case letters indicate matrices.
VAR analysis starts with the estimation of a reduced form VAR. Here
$y_t = (x_t, z_t)'$ and $r_t = (r^x_t, r^z_t)'$, where $'$ indicates transpose, i.e. $y_t$ and $r_t$ are column
($2 \times 1$) vectors. Note that the model considered here uses only one lag as regressor,
i.e., it is a VAR(1). We will use a more general notation with the lag operator in
Section 5.
The structural form of the model, instead, can be written in two ways:

$$Ay_t = A_- y_{t-1} + s_t \qquad\qquad (4)$$
$$y_t = B_- y_{t-1} + Bs_t$$

Note that what distinguishes the structural form from the reduced form is whether
structural shocks appear explicitly or not. It should nevertheless be clear that

$$r_t = Bs_t \qquad\qquad (5)$$

with $B = A^{-1}$, and that $B_- = A^{-1}A_- = BA_-$. In other words, when working with
a reduced form model we do acknowledge that the reduced form shocks come from
some underlying structural shocks, but we do not take a stand on how the mapping
works. In most of these notes I will use the B form of the structural model. Section 7
discusses in further detail the difference between the A form and the B form of the
structural model.
In these notes I will only discuss stationary VARs. A VAR in the form of (3) is
stationary if the maximum eigenvalue of the matrix $B_-$ lies inside the unit circle, i.e.
is smaller than 1 in modulus.

Example 1: Generate data on $y_t$ from the model with structural parameters

$$A = \begin{pmatrix} 1 & -5.12 \\ 2.19 & 1 \end{pmatrix} \quad\text{and}\quad A_- = \begin{pmatrix} -1.01 & -2.02 \\ 1.52 & -0.30 \end{pmatrix}$$

There is only one lag in the true model. Check that the model is stationary.
Compute the corresponding reduced form parameters and generate
pseudo data using equation (3). To do so, extract first a set of structural
shocks $s_t$, map these shocks into reduced form shocks $r_t$ and then generate
the data $y_t$ recursively, starting from an initial observation of $(0, 0)'$.
Note that this is the expected value of $y_t$. Try to generate data starting
far away from the expected value.
Convince yourself that generating the data using equations (4) rather
than (3) is much harder.
Convince yourself that the structural shocks are uncorrelated with each
other within period (i.e., contemporaneously uncorrelated), but the reduced
form shocks are not. Convince yourself of the fact that none of
these shocks are serially correlated, i.e. correlated across time.
Estimate the reduced form parameters using OLS equation by equation.
Of course, in real life we do not know which variables enter the true
model, nor the number of lags. Assume that you know the true model
up to the parameter values, i.e. you do know that only $x_t, z_t$ should enter
the model, that the model has no constant and that it has only one lag of
each variable. Convince yourself that estimating the reduced form model
yields consistent estimates, while estimating the structural model directly
does not.
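A minimal Matlab sketch of this exercise (variable names and the sample size are illustrative choices and need not match the .m files on my webpage):

    % Structural parameters of Example 1
    A    = [1 -5.12; 2.19 1];
    Amin = [-1.01 -2.02; 1.52 -0.30];
    % Reduced form parameters: y_t = Bmin*y_{t-1} + r_t, with r_t = B*s_t
    B    = inv(A);                   % maps structural into reduced form shocks
    Bmin = A\Amin;                   % autoregressive matrix B_-
    disp(max(abs(eig(Bmin))))        % stationarity check: must be below 1
    % Generate pseudo data recursively, starting from y_0 = (0,0)'
    T = 50;
    s = randn(2,T);                  % structural shocks, N(0,I)
    r = B*s;                         % implied reduced form shocks
    y = zeros(2,T);
    y(:,1) = r(:,1);                 % since y_0 = (0,0)'
    for t = 2:T
        y(:,t) = Bmin*y(:,t-1) + r(:,t);
    end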

Figure 1: Example 1 - Generating the data. [Panels: structural shocks to variables 1 and 2; reduced form shocks to variables 1 and 2; the generated variables 1 and 2.]
2 Forecasting
Suppose you have data until time T and you want to predict the course of variables
x and z for the future. To do so, one can use model (3) to generate forecasts of x
and z recursively. In order to do so, one needs to substitute $\{r_t\}_{t=T+1}^{T+h}$ with their
expected value. Since $r_t$ depends on $s_t$ through equation (5) and since the structural
shocks equal zero in expectation, the expectation of the corresponding reduced form
shocks is zero. The forecasts are then computed as

$$\hat{y}_{T+1} = \hat{B}_- y_T$$
$$\hat{y}_{T+2} = \hat{B}_- \hat{y}_{T+1}$$
$$\hat{y}_{T+3} = \hat{B}_- \hat{y}_{T+2}$$
$$\dots$$

where $\hat{B}_-$ stands for the OLS estimate of $B_-$ and $\hat{y}_{T+h}$ is the forecast of the vector
y for period T+h, given the information on the first T periods. Forecasts tend to
become poorer as the horizon grows, given that the effects of future shocks are not taken into
account period after period. This should become clear after the following example.

Example 2: Following the previous example, assume you have T = 100
observations. Generate forecasts of x and z for 10 periods using both
the true, in principle unknown, matrix $B_-$ and the estimated matrix.
Then, extract one possible realization of the variables and compare.
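A sketch of the recursion (here Bmin_hat denotes the OLS estimate of $B_-$ and y the 2 x T data matrix; both names are illustrative):

    % Forecasts: future reduced form shocks are replaced by their
    % expected value, which is zero
    H = 10;
    yhat = zeros(2,H);
    yhat(:,1) = Bmin_hat*y(:,end);          % forecast of y_{T+1}
    for h = 2:H
        yhat(:,h) = Bmin_hat*yhat(:,h-1);   % forecast of y_{T+h}
    end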

3 Impulse response analysis


As should be clear from the argument above, a key feature of forecasting is that it
does not require taking a stand on the matrix B that maps structural into reduced
form shocks, but only on the matrix $B_-$ capturing the autoregressive component. Since
$B = A^{-1}$, not taking a stand on B is equivalent to saying that the researcher does
not need the structural model, but can simply rely on its reduced form representation.
Of course, the researcher does acknowledge that the errors $r_t$ come from some
combination of structural shocks $s_t$ through an unknown relationship captured by
the matrix B, but since these structural shocks are zero in expectation, the corresponding
reduced form shocks are equally zero in expectation. The rest of these
notes deals with structural analysis.
Let’s introduce structural analysis using one of its most popular applications, the
impulse responses, and let’s introduce impulse responses with an example. Suppose
we have the structural model:

$$y_t = B_- y_{t-1} + Bs_t \qquad\qquad (6)$$
Figure 2: Example 2 - Forecasting. [Panels: variables 1 and 2 over the sample; forecasts with true and estimated parameters against a possible realization.]

Suppose for the moment that we do actually know the true matrices $B_-$ and B.
This is of course never the case in practice, but it is an innocuous assumption for the
sake of the point made in this specific example (it allows us to keep sample uncertainty
out of the analysis). Suppose you want to generate pseudo data corresponding to
a completely random extraction of structural shocks. Doing so requires extracting
$T \cdot 2$ realizations of structural shocks $\bar{S} = [\bar{s}_1, \bar{s}_2, ..., \bar{s}_T]$, $\bar{s}_t = (\bar{s}^x_t, \bar{s}^z_t)'$, converting
them into reduced form shocks through the equality $\bar{r}_t = B\bar{s}_t$ and then generating
the observables $Y = [y_1, y_2, ..., y_T]$ recursively, given some initial value $y_0$. Call
the generated data $Y(\bar{S})^{pseudo}$. Suppose now that you generate new data by feeding
into the model the very same set of structural shocks $\bar{S}$, except that the structural
shock to x at some period $\tau$ is augmented by $\epsilon$. In other words, the new data,
$Y(\tilde{S})^{pseudo}$, are generated from the same initial values $y_0$ and from the structural
shocks $\{\tilde{s}^x_t, \tilde{s}^z_t\}_{t=1}^{T}$ with $\tilde{s}^z_t = \bar{s}^z_t, \forall t$, $\tilde{s}^x_t = \bar{s}^x_t, \forall t \neq \tau$ and $\tilde{s}^x_t = \bar{s}^x_t + \epsilon$, $t = \tau$. These
new shocks will necessarily map into identical reduced form shocks for $t < \tau$ and
for $t > \tau$, and into a different set of reduced form shocks for $t = \tau$. Hence, by
construction, the dynamics of the observables are identical up to $t = \tau$ and can differ
afterwards. In particular, they can differ both at time $t = \tau$ (due to the matrix
B) and at any $t > \tau$ (due to the autoregressive component $B_-$ of the model). Any
difference in the pattern of Y corresponding to $\bar{S}$ and $\tilde{S}$ is attributed to the
single additional structural shock $\epsilon$ given to variable x. This difference, computed
as $\phi_\tau = y(\tilde{S})^{pseudo}_\tau - y(\bar{S})^{pseudo}_\tau$, is called the impulse response of variables x and z to
a structural shock to variable x. Impulse responses play a key role in structural
analysis. An impulse response describes the effects produced by a structural shock
on the endogenous variables.
The above description of an impulse response function uses its full definition, but
computing impulse responses does not really require generating pseudo data every
time, as we did above. In fact, since the model is linear, the impulse responses
computed using the above methodology are the same irrespective of the initial set
of structural shocks $\bar{S}$, as long as the value of $\epsilon$ is kept unchanged. A particularly
convenient set of values for $\bar{S}$ is of course zero. If $\bar{s}_t = (0, 0)', \forall t$, the corresponding
observables $Y(\bar{s})^{pseudo}$ equal zero at each point in time, so we do not even need to
subtract them. It then follows that one can compute the same impulse response of
$y_t$ to a structural shock of size $\epsilon$ to variable x simply as the level of the variables
corresponding to the structural shocks $\bar{s}_\tau = (\epsilon, 0)'$ and zeros for the other
periods:

$$\phi_0 = v$$
$$\phi_1 = B_- \phi_0 = B_- v$$
$$\phi_2 = B_- \phi_1 = (B_-)^2 v$$
$$\dots$$
$$\phi_\tau = B_- \phi_{\tau-1} = (B_-)^\tau v$$

where

$$v = \begin{pmatrix} v^x \\ v^z \end{pmatrix} = B\begin{pmatrix} \epsilon \\ 0 \end{pmatrix} = \begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix}\epsilon = b_1\epsilon \qquad\qquad (7)$$

and $b_1$ stands for the first column of B. The vector v takes the name of impulse
vector. In this case, the impulse vector corresponds to a shock to variable x. If one
wants to consider a shock to variable z, then the impulse vector is computed as
$B \cdot (0, \epsilon)' = b_2\epsilon$, where $b_2$ represents the second column of B.

So far we have considered the thought experiment of giving a vector of structural
shocks $(\epsilon, 0)'$ or $(0, \epsilon)'$. In principle, one could also compute impulse responses to a
reduced form shock instead of a structural shock. This would require
giving an impulse vector of the form

$$v = \begin{pmatrix} \epsilon \\ 0 \end{pmatrix} \quad\text{or}\quad v = \begin{pmatrix} 0 \\ \epsilon \end{pmatrix} \qquad\qquad (8)$$

Mathematically, one could well compute this, but economically it would not make
much sense. In fact, such an analysis would not appreciate the fact that reduced
form shocks are necessarily correlated with each other because they come from some
underlying structural shocks. This correlation is important, because it is this very
specific correlation that captures the contemporaneous correlation among variables.
We saw in the introduction to these notes that this contemporaneous correlation
is captured by the matrices A or B in the structural form. An impulse response
analysis is computed on structural shocks $s_t$. Given a set of structural shocks of
interest, to compute impulse responses one must compute the reduced form shocks
endogenously as a function of the shock considered, using the matrix B. It then becomes
crucial to have not only an estimate of $B_-$, but also an estimate of B. Doing so is
not straightforward and requires taking a stand on the identification of the model.

Example 3: Start from Example 1. Compute the true impulse responses
to a shock to x and to a shock to z. By true impulse responses we
mean those corresponding to the true values of the parameters of the
model. Set the size of the shock equal to one standard deviation of the
corresponding variable. Compute the impulse responses using both the
formal definition of impulse responses (i.e. as the difference between
observables given alternative sets of structural shocks) and the operative
definition (i.e. as the recursive substitution from a single impulse vector).
Convince yourself that they are the same.
Now, assume that the model also has a constant. Convince yourself that
the operative definition should not include the constant term in the computation
of impulse responses, but only the autoregressive component.
Why? Starting from the same values of pseudo structural shocks, do you
expect the impulse responses to be the same or different depending on
whether or not the model has a non-zero constant term? Also, does the
presence of a constant affect whether the model is stationary?
Replicate the analysis giving a two standard deviation shock. Make sure
you see why the impulse responses are simply scaled up by the same
factor.
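A sketch of the operative definition (B and Bmin as in the sketch for Example 1; the shock size epsi is an illustrative choice):

    % Impulse responses to a shock to x, operative definition
    H    = 30;
    epsi = 1;                        % shock size (illustrative)
    phi  = zeros(2,H);
    phi(:,1) = B(:,1)*epsi;          % impulse vector: first column of B
    for h = 2:H
        phi(:,h) = Bmin*phi(:,h-1);  % only the autoregressive part enters
    end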

Figure 3: Example 3 - Impulse response to shock to the first variable (no constant). [Panels: pseudo structural and reduced form shocks with and without the extra shock; variables 1 and 2; impulse responses from the true and the operative definition.]

4 Identification
Identification is a very broad concept in econometrics, and these notes steer clear
of attempting to give a formal definition of the concept. Here, it suffices to say that
identifying the structural model means estimating the matrix B (or, equivalently, A; see
Section 7). Once this matrix is identified, impulse responses can be computed, the full
structural form can be derived, and additional analyses such as variance decomposition
can be carried out.
Figure 4: Example 3 - Impulse response to shock to the first variable (constant). [Same panel layout as Figure 3.]


Let's start by listing what we have and what we need. Of course, we need an
estimate of B. What we have in order to achieve this is the variance covariance matrix
of the reduced form shocks $r_t$ in the estimated reduced form model (3), which we
call

$$\Sigma = E(r_t r_t') \qquad\qquad (9)$$

Note that the true variance covariance matrix of $r_t$ has non-zero off-diagonal elements
due to the fact that reduced form shocks come from the same set of underlying
structural shocks, and hence are correlated with each other. Getting an estimate of
$\Sigma$ is not complicated, because the estimation of the reduced form model yields consistent
estimates of the residuals, which implies the following estimator for $\Sigma$:

$$\hat{\Sigma} = \frac{\sum_{t=2}^{T} \hat{r}_t \hat{r}_t'}{T} \qquad\qquad (10)$$
By assumption, i.e. by the relationship

$$r_t = Bs_t$$

the following equality holds:

$$\Sigma = BDB' \qquad\qquad (11)$$

where $D = E(s_t s_t')$ is the true variance covariance matrix of the structural shocks.
Since these are structural shocks, they are not correlated with each other, hence D is
diagonal. Assume that the structural shocks are normally distributed (I will return
to this point in Section 4.7). In our application with two variables from the previous
sections, $\hat{\Sigma}$ has 3 distinct elements (one is lost since the matrix is by construction symmetric),
B has 4 elements and D has two. Clearly, we cannot solve for 6 elements having 3
pieces of information. The system is not identified, which implies that to estimate B
uniquely we need additional restrictions, i.e. additional information. Note that this
is not an estimation issue but an identification issue. For this reason, I will address
the problem considering $\Sigma$ rather than $\hat{\Sigma}$, i.e. I will discuss the issue keeping sample
uncertainty out and assuming that we know the true $\Sigma$.
More generally, with n variables included in the VAR, the matrix $\Sigma$ contains
n(n+1)/2 distinct elements, the matrix B contains $n^2$ elements and the matrix D contains n
elements. It is standard to assume a normalization at this point and set D = I. This
eliminates n elements to be estimated, although the system is still under-identified.
Under this assumption, the key condition imposed by the data to estimate B is

$$\Sigma = BB' \qquad\qquad (12)$$

which contains $n^2$ unknowns, to be solved for using the n(n+1)/2 distinct elements of $\Sigma$.


There are many approaches that the literature has proposed in order to estimate
B. I will give a quick introduction to the most popular ones. Before doing so, a few
general points are worth keeping in mind:

• The key challenge in solving system (12) comes from the fact that the system
is non-linear in the elements of B;

• Some solve the identification problem by adding restrictions from theory,
others from statistical features of the data. These are two conceptually very
different approaches;
• Identification approaches are divided into pointwise identification and set identification.
Pointwise identification consists of adding sufficient and appropriate
restrictions to identify a single matrix B consistent with condition (12), given
an estimated matrix $\hat{\Sigma}$. Set identification consists of adding fewer restrictions,
implying a whole set of candidate matrices B that are equally consistent with
condition (12);

• Under pointwise identification, it is useful to know ex ante whether the
specific set of restrictions that one is willing to impose on B is enough
for B to be uniquely identified. If one is after
pointwise identification, a unique solution for B exists under the necessary but
not sufficient condition that we impose at least n(n-1)/2 restrictions, i.e. the
$n^2$ parameters in B minus the n(n+1)/2 elements in $\Sigma$. Note that this simply
means that B has no more unknowns than $\Sigma$ has distinct elements. If this is the case, and if the
restrictions imposed are sufficiently "asymmetric", then the matrix could be
identified;

• Given a VAR with n variables, there is a conceptual difference between trying to
identify all n structural shocks, i.e. trying to attach an economically meaningful
interpretation to all of them, and trying to identify only a subset of shocks. The
latter case is referred to as partial identification. Partially identifying the
model, though, still requires choosing whether the identification of the subset
of shocks is intended to be a pointwise identification or a set identification.

In this introductory part of the notes it is probably pointless to dig deeper into
these concepts and discuss rank conditions, local vs. global identification and so on. I
discuss these issues in greater detail in "Notes on the uniqueness in the identification
of SVARs", available on my website. For a more complete analysis the reader is
referred to Fisher (1966), Lütkepohl (2007) and Rubio-Ramirez et al. (2010). See
also Amisano and Giannini (2002).

Example 4: Start from the structural model in Example 1. Generate
some data from it. Then compute a different set of structural shocks
that would imply exactly the same dataset. By doing so, you generate
sets of structural shocks that are observationally equivalent. Show that
the impulse responses corresponding to the two structural representations
considered are different. For simplicity, use the true parameter values
instead of the estimated parameter values.
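A sketch of one way to construct observationally equivalent shocks (the rotation angle is an arbitrary choice):

    % For any orthogonal Q, the pair (B*Q, Q'*s) generates exactly the
    % same reduced form shocks, and hence the same data, as (B, s)
    theta = pi/4;                                         % arbitrary angle
    Q  = [cos(theta) -sin(theta); sin(theta) cos(theta)]; % Q*Q' = I
    B2 = B*Q;                                             % new impact matrix
    s2 = Q'*s;                                            % new structural shocks
    % B2*s2 = B*(Q*Q')*s = B*s = r, yet the IRFs implied by B2 differ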

Figure 5: Example 4 - The case of observationally equivalent structural representations. [Panels: variables 1 and 2; the two sets of structural shocks; impulse responses to both shocks under Case 1 and Case 2.]

4.1 Recursive identification - Cholesky


By far the most approachable identification scheme is the Cholesky identification.
With this identification, exactly n(n-1)/2 additional restrictions are imposed, which
take the form of zero restrictions, or exclusion restrictions.
Consider again the $2 \times 2$ case studied so far. In this case, we need one restriction.
Suppose we are willing to impose the restriction that the top-right entry of B equals
zero, i.e. that B should be searched for within the set of matrices

$$\begin{pmatrix} b_{11} & 0 \\ b_{21} & b_{22} \end{pmatrix} \qquad\qquad (13)$$

Under this restriction, it can be shown that there is a unique solution for B in
equation (12), up to the sign of its columns. This means that, given one matrix B that
satisfies condition (12), flipping the signs of all entries of the first column of B, of
the second column of B, or of both columns generates a new matrix that equally
solves equation (12). This degree of indeterminacy actually holds for all matrices, not
only for triangular ones. Importantly, it can be shown that, in the triangular
case, these are the only matrices that satisfy equation (12). A standard normalization
consists of imposing that the diagonal elements of the solution for B are positive,
which happens to be an innocuous restriction.
Having understood the mathematical properties of a unique (up to column signs)
solution for B under the triangular assumption, it is key to understand its economic
meaning. In fact, it is economic theory that should guide a researcher towards
adopting (i.e. imposing) a triangular structure on B or not. Put differently, it is
the economic meaning of the restrictions imposed that allows a researcher to attach
interpretations to the underlying structural shocks identified.
Let's use a standard example, which is the example of monetary shocks. For the
moment, do not worry about what a monetary shock could possibly be; I will return
to this point in Section 8. Consider a VAR model with 3 variables, i.e. real GDP,
inflation and the federal funds rate. Identifying the model adding these variables in
this order, i.e.

$$y_t = \begin{pmatrix} y^{GDP}_t \\ y^{inflation}_t \\ y^{fedfunds}_t \end{pmatrix}$$

and using the triangular structure of B, i.e.

$$\begin{pmatrix} y^{GDP}_t \\ y^{inflation}_t \\ y^{fedfunds}_t \end{pmatrix} = B_-(L)\begin{pmatrix} y^{GDP}_{t-1} \\ y^{inflation}_{t-1} \\ y^{fedfunds}_{t-1} \end{pmatrix} + \begin{pmatrix} b_{11} & 0 & 0 \\ b_{21} & b_{22} & 0 \\ b_{31} & b_{32} & b_{33} \end{pmatrix}\begin{pmatrix} s^1_t \\ s^2_t \\ s^3_t \end{pmatrix}$$

implies that:

• a shock to GDP affects all three variables contemporaneously (due to the first
column of B), but GDP is affected contemporaneously only by its own shock
(due to the first row of B);

• a shock to inflation affects contemporaneously only inflation and the fed funds
rate (due to the second column of B), but inflation is affected contemporaneously
only by its own shock and the shock to GDP (due to the second row of B);

• a shock to the fed funds rate affects contemporaneously only the federal funds
rate (due to the third column of B), but the federal funds rate is affected by
all shocks (due to the third row of B);

• after one lag, all shocks could potentially affect all variables, due to the autoregressive
component $B_-(L)$ of the model.
Convince yourself that this is the case by rewriting the reduced form shocks as a
function of the structural shocks:

$$r_t = \begin{pmatrix} r^{GDP}_t \\ r^{infl}_t \\ r^{ffunds}_t \end{pmatrix} = \begin{pmatrix} b_{11} \\ b_{21} \\ b_{31} \end{pmatrix}s^{GDP}_t + \begin{pmatrix} 0 \\ b_{22} \\ b_{32} \end{pmatrix}s^{infl}_t + \begin{pmatrix} 0 \\ 0 \\ b_{33} \end{pmatrix}s^{ffunds}_t$$

Note that the endogenous response of the central bank to the economy is captured by the terms
$\{B_-\}_3' y_{t-1} + b_{31}s^{GDP}_t + b_{32}s^{infl}_t$, with $\{B_-\}_3'$ the last row of the $3 \times 3$ matrix $B_-$,
while the stochastic component of the policy rate is captured by $b_{33}s^{ffunds}_t$.
Do these restrictions make economic sense? It depends on the frequency of the
data. If the data used are monthly, then it is realistic that an exogenous variation in
the policy rate takes at least one month to affect the economy, and that the federal
funds rate responds endogenously to contemporaneous developments in GDP and
inflation. Instead, if the data are yearly, it would be harder to assume that the lags
between a monetary intervention and the corresponding response of the economy are
so long that an effect could not occur within a whole year. Of course, this calls for
the judgement of the researcher.
What happens if one adds a financial variable to the model? Suppose one orders
this financial variable, say the return on stocks, after the federal funds rate. This means that
a monetary shock does affect financial markets contemporaneously, which is realistic.
Nevertheless, it also implies that contemporaneous developments in financial markets
are not taken into account by the central bank when setting the interest rate, which
is not easy to justify. Similarly, if one orders the financial variable before the federal
funds rate, then the central bank does take contemporaneous financial developments
into account when setting the interest rate, but contemporaneous developments in
the interest rate affect the financial variable only with a lag. Again, depending on
the data and on the exact variables used, this assumption might be easy or hard to
defend.
There is another way of thinking about the identification restrictions imposed,
which uses the identified structural shocks rather than the impulse responses. Remember
that reduced form and structural shocks are related by the equation $r_t = Bs_t$. Once
B is estimated, we can trace back the structural shocks from the reduced form shocks
using

$$s_t = B^{-1}r_t$$

Since B is lower triangular, its inverse is lower triangular. This means that in the
Cholesky ordering, what allows us to attach a specific economic label to a certain shock
is the argument that, given the reduced form shocks estimated in the VAR, it is the
only structural shock that is recovered from the reduced form shocks of its own variable
and of the variables ordered before it, without involving the reduced form shocks of
the variables ordered after it. In the example above, what
distinguishes the monetary policy shock from the other identified structural shocks
is that it is the only structural shock that is computed using the reduced form shocks
of all 3 variables. This can be seen from the fact that, given

$$B^{-1} = A = \begin{pmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

the structural shocks are computed as $s_{1t} = a_{11}r_{1t}$, $s_{2t} = a_{21}r_{1t} + a_{22}r_{2t}$ and $s_{3t} =
a_{31}r_{1t} + a_{32}r_{2t} + a_{33}r_{3t}$. Such restrictions are the counterpart of the triangularity
restrictions on the impulse responses. If such restrictions are not theoretically sound
given the specific case at hand, the Cholesky decomposition is not appropriate, no
matter how convenient or easy it is to implement.
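In Matlab this identification step is a one-liner (a sketch; Sigma_hat and r_hat denote the estimated covariance matrix and the residuals):

    % Lower triangular B with positive diagonal such that B*B' = Sigma_hat
    B_hat = chol(Sigma_hat,'lower');
    % Recovered structural shocks: s_t = B^{-1}*r_t
    s_hat = B_hat\r_hat;             % r_hat is the matrix of residuals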
A standard result in the Cholesky identification is that the recursive structure
matters in terms of blocks, which allows for an easy partial identification. To see this,
consider a VAR of 5 variables, with variable z ordered third. The Cholesky identification
of the structural shock to variable z is independent of the exact ordering
within the blocks of variables that enter before and after z. A standard reference for
this is Christiano et al. (1999).

Example 5: Assume that

$$A = \begin{pmatrix} 2.5 & 0 \\ -1.19 & 1 \end{pmatrix} \quad\text{and}\quad A_- = \begin{pmatrix} 0.06 & -0.03 \\ -0.03 & 0.06 \end{pmatrix}$$

Compute and compare the true and the estimated impulse responses.
Note that a recursive structure of A in the structural model maps
into a recursive structure of $B = A^{-1}$.
Convince yourself that the economic interpretation discussed above holds
equally well for the B representation and for the A representation. What
if instead the true model does not allow a recursive structure, i.e. the
true B is not actually triangular?

Example 6: Consider now a VAR model with matrices

$$A = \begin{pmatrix} 2.5 & 0 & 0 \\ -1.19 & 1 & 0 \\ -0.8 & 1.1 & 2 \end{pmatrix} \quad\text{and}\quad A_- = \begin{pmatrix} 0.6 & -0.3 & 0 \\ -0.3 & 0.6 & 0.11 \\ 0.4 & 0 & 0.1 \end{pmatrix}$$

Show that the true and the estimated impulse responses to a shock to
the third variable under a Cholesky structure are unchanged when
reshuffling the variables ordered before it.

Figure 6: Example 5 - Cholesky identification. [Panels: true and estimated IRFs of variables 1 and 2 to shocks 1 and 2.]

4.2 Other zero restrictions


The Cholesky identification imposes n(n-1)/2 zero restrictions, which are enough to
satisfy the so-called order condition. While this is not in itself enough to guarantee
that B is actually identified, under the triangular ordering of these restrictions the
matrix is actually identified.
One could instead impose just as many restrictions, but not recursively. Whether
the corresponding matrix B is identified or not depends on the exact restrictions imposed.
Once one has checked whether the matrix is identifiable, numerical procedures are
available to actually estimate B under the restrictions imposed. A
good reference on this is Binning (2013), while a more general and more comprehensive
reference is Rubio-Ramirez et al. (2010).

4.3 Sign restrictions


The identification strategies considered so far aim for pointwise identification, i.e. the
estimation of a unique B (again, unique up to the sign of its columns) that satisfies condition
(12). An alternative approach starts from the following consideration: after all, what
allows one to attach to a candidate impulse response the interpretation of, say, a
response to a technology shock is the specific behaviour of that impulse response.
If one acknowledges this, then one could generate several candidate matrices B that
are equally consistent with condition (12) and rule out the ones whose corresponding
impulse responses do not satisfy the theoretical restrictions imposed by the prior of
the researcher.
Coding this identification approach is fairly simple because it relies on a simple
intuition. Suppose that $\hat{B}$ is a candidate matrix satisfying condition (12). Define a new
candidate matrix as $\tilde{B} = \hat{B} \cdot Q$, where Q is an $n \times n$ matrix such that $QQ' = Q'Q = I$
(i.e. Q is an orthogonal matrix). $\tilde{B}$ equally satisfies condition (12) because
$\Sigma = \tilde{B}\tilde{B}' = \hat{B}QQ'\hat{B}' = \hat{B}\hat{B}'$. As long as one has an algorithm that generates
matrices Q, one can generate as many candidate matrices B as desired. Since
each corresponds to a potentially different set of impulse responses, one can then rule
out the orthogonal matrices which achieve a transformation that is not compatible
with the a priori restrictions. Such restrictions could be, for example, that GDP
does not decrease on impact in response to an expansionary monetary shock, or that
after h periods the impulse response to a technology shock takes a certain shape.
A key step in this identification strategy is the generation of orthogonal matrices.
There are two main approaches. One uses rotation matrices. For example, the
rotation matrix

$$R_{1,2} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

has the orthogonality property thanks to the equality $\cos(\theta)^2 + \sin(\theta)^2 = 1$. If you
post-multiply a $3 \times 3$ candidate matrix $\hat{B}$ by $R_{1,2}$ you obtain a new matrix with
third column identical to the third column of $\hat{B}$ and with first two columns equal
to some linear combination of the first two columns of $\hat{B}$. Similar matrices can be
computed to change other columns of the multiplied matrix. Another approach to
generating orthogonal matrices is to extract any matrix M and then apply the QR
decomposition to it. By construction, this decomposition generates
matrices Q and R such that $M = Q \cdot R$ and Q is orthogonal. The main algorithms used
for the QR decomposition are the Gram-Schmidt process, the Householder reflection
and the Givens rotation.
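A sketch of the second approach (the sign normalization is a common convention that makes the factorization unique, not something imposed by Matlab's qr):

    % Draw a random orthogonal matrix via the QR decomposition
    M = randn(3,3);                  % any random square matrix
    [Q,R] = qr(M);                   % M = Q*R, with Q orthogonal
    Q = Q*diag(sign(diag(R)));       % normalize so that diag(R) > 0
    disp(Q*Q')                       % should return the identity matrix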
The identification through sign restrictions is a set identification, because there
are usually several candidate structural representations of the data that meet the
restrictions and that are equally consistent with the data (unless one intentionally
adds as many restrictions as needed to be left with only one candidate model; one
example is Canova and De Nicolo (2002)). A standard way of representing the results
is to report an error band covering between 5% and 95% of the models generated,
considered at each time t. It is also common to report either the median or the median
target (see Fry and Pagan (2011)). It is worth remembering that the uncertainty band
attached to the impulse response under sign restrictions does not give a measure of
sample uncertainty, but of model uncertainty.
Last, one can combine a triangular structure for most of the variables with
a rotation in a subspace of B, in case, say, the first variables in the VAR
allow for a triangular structure while the last ones do not. For an application of
this type see Eickmeier and Hofmann (2013).

Example 7: Generate a $3 \times 3$ matrix as the product of three rotation matrices
that use the same rotation angle $\theta$ and that rotate, respectively, the first
and the second columns, the first and the third columns, and the second and
the third columns of a postmultiplied matrix. Show that the generated
matrix is still orthogonal. Why is this the case? Show that this does not
depend on the exact rotation matrix. Then, compute a $3 \times 3$ orthogonal
matrix using the QR decomposition of a $3 \times 3$ matrix extracted from a
multivariate normal distribution.

Example 8: Assume that

$$A = \begin{pmatrix} 1 & -5.12 \\ 2.19 & 1 \end{pmatrix} \quad\text{and}\quad A_- = \begin{pmatrix} 0.06 & -0.03 \\ -0.03 & 0.06 \end{pmatrix}$$

Compute the true impulse responses. Estimate the model and compute
the impulse responses when mistakenly imposing a Cholesky structure.
Assume you have some theory telling you that a shock to variable 1
should decrease variable 2 on impact and a shock to variable 2 should
increase variable 1 on impact. Remember the normalization according to
which each shock should affect its own variable positively. Generate 100
models using the QR decomposition on random normal extractions of $2 \times 2$
matrices. Plot the error bands for model uncertainty.
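A minimal sketch of the accept/reject loop for Example 8 (names and the number of draws are illustrative):

    % Sign restrictions: keep the candidate matrices whose impact effects
    % satisfy the restrictions of Example 8
    B0 = chol(Sigma_hat,'lower');      % initial candidate, B0*B0' = Sigma_hat
    accepted = {};
    for i = 1:100
        [Q,R] = qr(randn(2,2));
        Q  = Q*diag(sign(diag(R)));    % random orthogonal matrix
        Bc = B0*Q;                     % still satisfies Bc*Bc' = Sigma_hat
        Bc = Bc*diag(sign(diag(Bc)));  % own effects normalized positive
        if Bc(2,1) < 0 && Bc(1,2) > 0  % shock 1 lowers var. 2, shock 2 raises var. 1
            accepted{end+1} = Bc;      % store; IRFs then follow as in Example 3
        end
    end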

4.4 Long run restrictions


It will be argued more extensively in Section 6 that the autoregressive representation
of the data used so far can be replaced with a moving average representation.

Figure 7: Example 8 - Identification through sign restrictions. [Panels: IRFs of variables 1 and 2 to shocks 1 and 2, comparing the sign-restricted set, the true model and the (wrong) Cholesky identification.]
In particular, one can rewrite the data $y_t$ as

$$y_t = C(L)Bs_t = Bs_t + C_1 Bs_{t-1} + C_2 Bs_{t-2} + C_3 Bs_{t-3} + \dots \qquad\qquad (14)$$

where the matrix B has the same meaning as in the notation used so far and the
matrices $\{C_i\}_{i=1}^{\infty}$ are a function of the underlying autoregressive parameters. Suppose
that the first variable of the VAR enters in first differences. Assuming that

$$C(1)_{1,2} = \sum_{j=0}^{\infty} c_{1,2,j} = 0$$

means that the second structural shock does affect the first variable, but the overall
long run effect is zero because the positive first differences and the negative first
differences cancel each other out. This means that if one has a theory implying no long run
impact of a certain structural shock, then one can add this restriction to
identify the SVAR. In particular, the restriction is imposed on the matrix
C(1)B: the elements in C(1) do not allow for restrictions, because they
reflect parameters estimated in the reduced form model, so the long run restrictions
are in effect imposed on the matrix B, which is the matrix that we are trying to identify.
The standard reference for this approach is Blanchard and Quah (1989).
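For a VAR(1) the computation can be sketched as follows (Bmin_hat and Sigma_hat denote the estimated reduced form parameters; the restriction makes C(1)B lower triangular):

    % Long run identification: C(1) = (I - B_-)^{-1} for a VAR(1)
    C1    = inv(eye(2) - Bmin_hat);          % long run multiplier
    Theta = chol(C1*Sigma_hat*C1','lower');  % lower triangular long run impact
    B_lr  = C1\Theta;                        % implied B; note B_lr*B_lr' = Sigma_hat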

4.5 External instruments


It is sometimes the case that one has a variable external to the model that is likely
correlated with a structural shock to be identified in the VAR. For example, suppose
that one wants to identify a monetary shock in the VAR. The literature has worked
on estimation strategies for monetary shocks that do not use VARs, for example
using the approach by Romer and Romer (2004) on Greenbook data. This makes it
possible to exploit the additional information contained in these external instruments
to identify the structural monetary shock in the VAR out of the estimated reduced
form shocks.
This identification strategy is discussed separately in the notes titled "Identification
using external instruments" available on my webpage. The standard references
for this approach are Stock and Watson (2012) and Mertens and Ravn (2013).

4.6 Identification through heteroskedasticity


The identification strategies considered so far exploit economic theory in order to
introduce additional restrictions. Another prominent approach proposed in the liter-
ature exploits statistical properties of the data. One leading case is the identification
through heteroskedasticity proposed by Rigobon (2003).
The idea is the following. Suppose you have economic reasons to believe that
there have been changes in the variances of all the structural shocks, for example,
due to more general instability in certain periods. Then, one can divide the sample
period into subperiods, say for example two, T1 and T2 , depending on the variance
regime. Under the assumption that the impact effect of shocks is the same across
regimes (i.e. that the B matrix is not regime-specific), the following system holds:

$$\Sigma_1 = BB' \quad\text{and}\quad \Sigma_2 = B\Lambda B'$$
The variance covariance matrix of the structural shocks in period $T_1$ is normalized to
I, while $\Lambda$ measures the relationship between the variances across regimes. Note that adding regimes
introduces additional parameters to estimate, namely the parameters in the matrix
$\Lambda$, but it also introduces information to exploit, namely the elements in $\Sigma_2$. It
is this additional information that this methodology exploits in order to achieve
identification.
In terms of the estimation strategy for B, several applications use a maximum likelihood
estimator on the structural model, which by construction simultaneously estimates
B and $\Lambda$, together with the reduced form parameters. If
instead one wants to estimate the reduced form model first with OLS and then back
out B given the variance-covariance matrices of the reduced form shocks, one needs
to decompose the matrices $\Sigma_1$ and $\Sigma_2$ into B and $\Lambda$.¹
The key challenge of this identification approach is that, in itself, it does not
allow one to attach economic interpretations to the impulse responses. In fact, under
the identifying strategies of a triangular structure, sign restrictions and long run restrictions,
it is a specific feature of the impulse responses that allows one to interpret the
underlying shock in one way or another. Under identification through heteroskedasticity,
instead, the researcher exploits a variation in the statistical properties of the
data, and this, in itself, is silent about the interpretation of the shocks. It is common to
then attach economic interpretations based on the shape of the impulse responses.
Note also that the validity of this identification approach is only as good as the argument
that the data have undergone volatility regimes.
Applications of this methodology can be found, for instance, in Rigobon and
Sack (2004) and Lanne and Lütkepohl (2008). Other papers, including Lütkepohl
and Netšunajev (2014), do not impose the timing of the changes in the volatility
regimes, but estimate them through Markov switching. More recently, Fanelli and
Bacchiocchi (2015) extended this methodology by abandoning the assumption that
the B matrix is invariant across regimes.

¹ One way to do so is the following. Start with any decomposition of $\Sigma_1$ into a candidate matrix
$B_c$ such that $B_c B_c' = \Sigma_1$. Given $B_c$, construct a matrix $C = (B_c^{-1})\Sigma_2(B_c^{-1})'$. Apply the spectral
decomposition to C, i.e. compute $E_c$ and $V_c$ such that $E_c V_c E_c' = C$. Estimate the matrix $\Lambda$ with $V_c$
(by construction, $\Lambda$ must be a diagonal matrix, and $V_c$ satisfies this requirement). Then, note that
the condition

$$C = E_c V_c E_c' = (B_c^{-1})\Sigma_2(B_c^{-1})'$$

implies

$$\Sigma_2 = B_c E_c V_c E_c' B_c' = B_c E_c V_c^{1/2} V_c^{1/2} E_c' B_c'$$

Estimate the matrix B with

$$B = B_c E_c$$

Since $E_c$ contains the normalized eigenvectors of C, $E_c E_c' = I$, hence $E_c$ is orthogonal, meaning
that $B = B_c E_c$ is a candidate matrix for B.
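A sketch of the steps in this footnote (Sigma1 and Sigma2 denote the estimated covariance matrices in the two regimes):

    % Identification through heteroskedasticity, following the footnote
    Bc      = chol(Sigma1,'lower');    % candidate with Bc*Bc' = Sigma1
    C       = (Bc\Sigma2)/Bc';         % C = Bc^{-1}*Sigma2*(Bc^{-1})'
    [Ec,Vc] = eig(C);                  % spectral decomposition, Ec*Vc*Ec' = C
    Lambda  = Vc;                      % diagonal by construction
    B_het   = Bc*Ec;                   % satisfies B*B' = Sigma1, B*Lambda*B' = Sigma2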

Example 9: [ADD]

4.7 Restrictions on contemporaneous independence


Another statistical approach to identification exploits the following idea.
In principle, structural shocks should not have anything to do with each
other, i.e. they should be independent. So far, we assumed that the
underlying structural shocks were normal. This implied that all rotations
of structural shocks through an orthogonal matrix yielded uncorrelated
structural shocks, which in the case of normality implies independence.
If, instead, the structural shocks are not normal, then the absence of
correlation does not in itself imply that the structural shocks are independent.
Hence, one could select rotations of an initial candidate B matrix that
imply not only uncorrelated but also independent shocks. To do so, tests
of independence must be used. Note that the challenge discussed in
the previous subsection of attaching an economic interpretation to the
computed shocks remains in place.
An application of this approach can be found, for example, in Herwartz
(2015).

5 Alternative autoregressive notation


So far we have used a very simplified notation that allowed for only one
lag in the regressors. Rather different notations are used in the literature.
While possibly more complicated at first sight, these alternative
notations make life much easier in several cases. Note that a change in
notation affects only our way of organizing the data; conceptually
nothing changes in terms of the data generating process that is
assumed behind the observables.
This section follows Canova (2007). For simplicity, assume the case of
2 lags, 3 endogenous variables and a constant. I will only consider the
reduced form model here. In all cases considered, the model is

$$\underbrace{y_t}_{m\times 1} = \underbrace{B_1}_{m\times m}\underbrace{y_{t-1}}_{m\times 1} + \underbrace{B_2}_{m\times m}\underbrace{y_{t-2}}_{m\times 1} + \underbrace{C}_{m\times m_c}\underbrace{\bar{y}_t}_{m_c\times 1} + \underbrace{r_t}_{m\times 1} \qquad\qquad (15)$$

There are at least four main notations that one can use. Notation 1 is
particularly convenient for rewriting and simplifying a VAR(p) into a VAR(1).
Notation 4 is particularly convenient for Bayesian estimation of the VAR.
Notation 3 simplifies coding and has been used in the Matlab codes
for the examples.

Notation 1

$$\underbrace{x_t}_{2m\times 1} = \underbrace{B^*}_{2m\times 2m}\underbrace{x_{t-1}}_{2m\times 1} + \underbrace{C^*}_{2m\times m_c}\underbrace{\bar{y}_t}_{m_c\times 1} + \underbrace{r^*_t}_{2m\times 1} \qquad\qquad (16)$$

with $x_t = (y_t', y_{t-1}')'$, $B^* = \begin{pmatrix} B_1 & B_2 \\ I & 0 \end{pmatrix}$, $C^* = (C', 0')'$ and $r^*_t = (r_t', 0')'$, i.e.

$$\begin{pmatrix} y_t \\ y_{t-1} \end{pmatrix} = \begin{pmatrix} B_1 & B_2 \\ I & 0 \end{pmatrix}\begin{pmatrix} y_{t-1} \\ y_{t-2} \end{pmatrix} + \begin{pmatrix} C \\ 0 \end{pmatrix}\bar{y}_t + \begin{pmatrix} r_t \\ 0 \end{pmatrix}$$

This form takes the name of companion form and greatly simplifies the
coding, as it turns the VAR(p) into a VAR(1), i.e. a VAR with one lag
only. For example, to see whether a VAR model with more than one lag is
stationary, one can first rewrite it in companion form and then check the
maximum eigenvalue of the matrix $B^*$, as mentioned in Section 1.
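As a sketch, for the VAR(2) above (B1 and B2 denote the m x m estimated autoregressive matrices):

    % Companion form and stationarity check of a VAR(2)
    m     = size(B1,1);
    Bstar = [B1 B2; eye(m) zeros(m)];
    if max(abs(eig(Bstar))) < 1
        disp('the VAR is stationary')
    end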

Notation 2

$$\underbrace{y_t}_{m\times 1} = \underbrace{B(L)}_{m\times mp}\underbrace{y_{t-1}}_{m\times 1} + \underbrace{C}_{m\times m_c}\underbrace{\bar{y}_t}_{m_c\times 1} + \underbrace{r_t}_{m\times 1} \qquad\qquad (17)$$

or equivalently, opening up the matrix and vector elements,

$$\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix} = \begin{pmatrix} \beta_{11,L=1} + \beta_{11,L=2}L & \beta_{12,L=1} + \beta_{12,L=2}L & \beta_{13,L=1} + \beta_{13,L=2}L \\ \beta_{21,L=1} + \beta_{21,L=2}L & \beta_{22,L=1} + \beta_{22,L=2}L & \beta_{23,L=1} + \beta_{23,L=2}L \\ \beta_{31,L=1} + \beta_{31,L=2}L & \beta_{32,L=1} + \beta_{32,L=2}L & \beta_{33,L=1} + \beta_{33,L=2}L \end{pmatrix}\begin{pmatrix} y_{1t-1} \\ y_{2t-1} \\ y_{3t-1} \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} + \begin{pmatrix} r_{1t} \\ r_{2t} \\ r_{3t} \end{pmatrix}$$

Notation 3

$$\underbrace{Y}_{T\times m} = \underbrace{X}_{T\times k}\,\underbrace{A}_{k\times m} + \underbrace{E}_{T\times m} \qquad\qquad (18)$$

or equivalently, opening up the matrix elements,

$$\begin{pmatrix} \vdots \\ y_t' \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots & \vdots & \vdots \\ y_{t-1}' & y_{t-2}' & 1 \\ \vdots & \vdots & \vdots \end{pmatrix}\begin{pmatrix} \beta_{11,L=1} & \beta_{21,L=1} & \beta_{31,L=1} \\ \beta_{12,L=1} & \beta_{22,L=1} & \beta_{32,L=1} \\ \beta_{13,L=1} & \beta_{23,L=1} & \beta_{33,L=1} \\ \beta_{11,L=2} & \beta_{21,L=2} & \beta_{31,L=2} \\ \beta_{12,L=2} & \beta_{22,L=2} & \beta_{32,L=2} \\ \beta_{13,L=2} & \beta_{23,L=2} & \beta_{33,L=2} \\ c_1 & c_2 & c_3 \end{pmatrix} + \begin{pmatrix} \vdots \\ r_t' \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots & \vdots & \vdots \\ y_{t-1}' & y_{t-2}' & 1 \\ \vdots & \vdots & \vdots \end{pmatrix}\begin{pmatrix} B_{L=1}' \\ B_{L=2}' \\ c' \end{pmatrix} + \begin{pmatrix} \vdots \\ r_t' \\ \vdots \end{pmatrix}$$

This form is referred to by some as the compact form.

Notation 4

$$\underbrace{y}_{Tm\times 1} = \underbrace{(I_m \otimes X)}_{mT\times mk}\,\underbrace{\beta}_{mk\times 1} + \underbrace{r}_{Tm\times 1} \qquad\qquad (19)$$

or equivalently, opening up the matrix and vector elements,

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} X & 0 & 0 \\ 0 & X & 0 \\ 0 & 0 & X \end{pmatrix}\begin{pmatrix} \beta_{(1)} \\ \beta_{(2)} \\ \beta_{(3)} \end{pmatrix} + \begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix}$$

where $y_i$ and $r_i$ are the $T\times 1$ vectors collecting, respectively, the observations and the reduced form shocks of variable i, X is the $T\times 7$ matrix with rows $(y_{t-1}', y_{t-2}', 1)$, and $\beta_{(i)} = (\beta_{i1,L=1}, \beta_{i2,L=1}, \beta_{i3,L=1}, \beta_{i1,L=2}, \beta_{i2,L=2}, \beta_{i3,L=2}, c_i)'$ collects the seven coefficients of equation i.

Example 10: Assume that the true structural parameters are

$$A = \begin{pmatrix} 1 & -5.12 \\ 2.19 & 1 \end{pmatrix}$$

$$A_-(L=1) = \begin{pmatrix} 0.06 & -0.03 \\ -0.03 & 0.06 \end{pmatrix} \quad\text{and}\quad A_-(L=2) = \begin{pmatrix} -0.02 & -0.01 \\ -0.01 & 0.02 \end{pmatrix}$$

The true, unknown constants are equal to -4 and -11. Generate
data. Estimate the model after organizing the data using
Notation 1, Notation 3 and Notation 4. Check that you get
exactly the same estimates.
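A sketch of the estimation step in Notation 3 (data denotes a T x m matrix of observations; the name is my own choice):

    % OLS in compact form (Notation 3): two lags and a constant
    Y = data(3:end,:);                       % rows y_t'
    X = [data(2:end-1,:), data(1:end-2,:), ...
         ones(size(data,1)-2,1)];            % rows (y_{t-1}', y_{t-2}', 1)
    A_hat = (X'*X)\(X'*Y);                   % all equations at once
    E_hat = Y - X*A_hat;                     % reduced form residuals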

6 VARs as a moving average


So far, we have expressed the data as an autoregressive process. An
alternative, equivalent representation of the same dataset is the moving
average representation. It would be fair, of course, to ask ourselves
why we should bother representing the data as a moving average, given
that impulse responses can be computed more easily using the autoregressive
component. It will be shown that a moving average representation
makes it possible to push the analysis further and implement variance and
historical decompositions. Since the literature uses this alternative notation at
least as frequently as the autoregressive notation, it is very important
to become familiar with it. Note that the model does not change; it
is just expressed in an alternative, equivalent form.
Consider the general VAR(p) model. For the moment, assume that the
model does not have a constant. The following are equivalent notations
of the autoregressive reduced form representation of the data:

$$y_t = B_-(L)y_{t-1} + r_t$$
$$y_t = B_1 y_{t-1} + B_2 y_{t-2} + ... + B_p y_{t-p} + r_t$$
$$(I - B_1 L - B_2 L^2 - ... - B_p L^p)y_t = r_t$$
$$y_t = \sum_{l=1}^{p} B_l y_{t-l} + r_t$$

The only difference from what we have seen so far is the use of the lag
operator L, which is the operator such that $L^p x_t = x_{t-p}$. Note that only
p lags enter.
The model can be rewritten in a moving average representation, i.e. a
representation in which the variables at time t are expressed not as a linear
function of the variables in the previous periods up to a certain lag, but as a
weighted function of all previous reduced form shocks. Formally,

$$y_t = C(L)r_t$$
$$y_t = r_t + C_1 r_{t-1} + C_2 r_{t-2} + ...$$
$$y_t = (I + C_1 L + C_2 L^2 + ....)r_t$$
$$y_t = r_t + \sum_{l=1}^{\infty} C_l r_{t-l}$$

Note that $C_0 = I$. Note also that lags up to infinity are used. Mathematically,
the following equality links the polynomials:

$$(I - B_1 L - B_2 L^2 - ... - B_p L^p)^{-1} = I + C_1 L + C_2 L^2 + ....$$

Of course, the tricky part is to solve for the matrices $C_l$ given the $B_l$.
The case with one lag is easy, since one can use recursive substitution:

$$y_t = B_- y_{t-1} + r_t$$
$$y_t = B_-(B_- y_{t-2} + r_{t-1}) + r_t = B_-^2 y_{t-2} + B_- r_{t-1} + r_t$$
$$= ...$$
$$y_t = B_-^s y_{t-s} + \sum_{\tau=0}^{s-1} B_-^\tau r_{t-\tau}$$

One can either stop the recursive substitution at $t-s$ and write the data
as a combination of an initial observation and of the shocks occurring after
that, or continue with the substitution and write the data exclusively
as a function of shocks. If the model is stationary, $B_-^s y_{t-s}$ goes to zero as
s goes to infinity. The moving average representation is hence $C_1 = B_-$,
$C_2 = B_-^2$, $C_3 = B_-^3$, ... .
Things are instead a bit more tricky if the VAR has more than one lag. In
case the VAR is not of order one but is a general VAR(p), one either uses
the companion matrix and applies the same rule as for a VAR(1), or uses the
general rule, which is

$$C_1 = B_1$$
$$C_2 = B_1 C_1 + B_2$$
$$C_3 = B_1 C_2 + B_2 C_1 + B_3$$
$$...$$
$$C_p = B_1 C_{p-1} + B_2 C_{p-2} + ... + B_{p-1} C_1 + B_p$$
$$C_{p+s} = B_1 C_{p+s-1} + B_2 C_{p+s-2} + ... + B_{p-1} C_{s+1} + B_p C_s, \quad \forall s > 0$$
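A sketch of this recursion (Bcell is a cell array {B1,...,Bp}; H is the number of MA terms to compute; both are illustrative names):

    % MA coefficients C_1,...,C_H from the recursion above, with C_0 = I
    p = numel(Bcell);  m = size(Bcell{1},1);  H = 20;
    C = cell(H,1);
    for l = 1:H
        C{l} = zeros(m);
        for j = 1:min(l,p)
            if l == j
                C{l} = C{l} + Bcell{j};          % term B_j*C_0
            else
                C{l} = C{l} + Bcell{j}*C{l-j};   % term B_j*C_{l-j}
            end
        end
    end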

The next step is to introduce the difference between the reduced form and
the structural moving average representation. Just as an autoregressive
representation of the data can be specified in reduced form or in
structural form (i.e. expressing the data in terms of $r_t$ or of $s_t$), the moving
average representation of the data can be specified in reduced form or in
structural form. The representation reported above is the reduced form
moving average representation. The structural moving average representation
of the data is simply obtained by replacing the reduced form shocks
with the appropriate function of the structural shocks, i.e.

$$y_t = C(L)Bs_t = Bs_t + C_1 Bs_{t-1} + C_2 Bs_{t-2} + ... = Bs_t + \sum_{l=1}^{\infty} C_l Bs_{t-l} \qquad\qquad (20)$$

or simply

$$y_t = D_0 s_t + D_1 s_{t-1} + D_2 s_{t-2} + ... = \sum_{l=0}^{\infty} D_l s_{t-l} \qquad\qquad (21)$$

with $D_l = C_l B$. Note that imposing $D_0 = I$ would miss the whole point
of the analysis conducted so far.
The representation (21) allows us to rethink the impulse responses introduced
in Section 3. Under equation (21), the impulse response to a
vector of structural shocks $(\epsilon, 0, .., 0)'$ is simply

$$\phi_0 = v$$
$$\phi_1 = C_1 v$$
$$\phi_2 = C_2 v$$
$$...$$
$$\phi_\tau = C_\tau v$$

where

$$v = b_1 \epsilon$$

Note that the impulse responses computed in this way are identical to
the impulse responses computed from the autoregressive representation in
Section 3. Note again that computing impulse responses with an impulse
vector $v = (\epsilon, 0, ..., 0)'$ would make little sense, as discussed in Section 3.
The model considered so far did not include a constant. If the model
includes a constant, the autoregressive reduced form representation is
simply

$y_t = c + B_1 y_{t-1} + B_2 y_{t-2} + ... + B_p y_{t-p} + r_t$
$(I - B_1 L - B_2 L^2 - ... - B_p L^p) y_t = c + r_t$

The reduced form moving average representation becomes

$y_t = (I - B_1 - B_2 - ... - B_p)^{-1} c + (I - B_1 L - B_2 L^2 - ... - B_p L^p)^{-1} r_t$

The inverse of $I - B_1 L - B_2 L^2 - ... - B_p L^p$ leads to the parameters in
$C(L)$, just as shown before. Note, instead, that $(I - B_1 - B_2 - ... - B_p)^{-1} c$
is the expected value of $y$. In fact, absent any realization of structural
shocks, the value of $y_t$ that is consistent with the model is the $m$ such that
$m = c + B_1 m + ... + B_p m$, i.e. $m = (I - B_1 - ... - B_p)^{-1} c$ (remember that we
are only considering stationary VARs here). Last, moving from the reduced
form to the structural representation only requires substituting reduced form
with structural shocks and rearranging the notation accordingly. The fact
that the moving average terms do not change with the presence of a
constant, but depend only on the underlying autoregressive components,
should convince you of why we did not need the constant term in Example 3
to generate impulse responses using the operative definition.
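In Matlab, the unconditional mean is a one-line computation (a sketch, again assuming `Bmats` and a $k \times 1$ constant `c` are given):

% Sketch: unconditional mean of a stationary VAR with a constant
Bsum = sum(Bmats, 3);                  % B_1 + B_2 + ... + B_p
m = (eye(size(Bsum, 1)) - Bsum) \ c;   % m = (I - B_1 - ... - B_p)^{-1} c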

Example 11: Consider again Example 1. Compute the elements of the
moving average structural representation of the model. Generate the
data using both the autoregressive reduced form representation and the
moving average structural representation. Compare the results.
Compute the true impulse responses using both the autoregressive reduced
form representation and the moving average representation. Compare the
results.

Example 12: Consider again Example 10, except that the constant terms
are set equal to zero. Compute the parameters of the moving average
reduced form representation of the data using both the mathematical
result reported in the section above and the shortcut of first rewriting
the model as a VAR(1) using the companion matrix. Compare the results.

6.1 Variance decomposition of the forecast error


Impulse responses describe the effects of a single shock. A natural complementary question is how important each structural shock is for the overall fluctuations of each variable. The variance decomposition of the forecast error addresses this question by attributing the variance of the $h$-periods-ahead forecast error of each variable to the underlying structural shocks.

Consider the VAR(p) with two variables written in the structural moving
average representation (21). Call $\psi_h$ the $2 \times 1$ $h$-periods-ahead forecast
error. Consider for the moment the one-period-ahead forecast error. From
the structural moving average representation of the VAR it should be
clear that

$\psi_{h=1} = y_{t+1} - E(y_{t+1}|t) = D_0 s_{t+1}$

Figure 8: Example 11 - AR and MA representations of the same data.
[Figure: six panels. The top row shows the generated series for Variable 1
and Variable 2; the remaining panels show the IRFs of each variable to
each shock, computed from both the AR representation and the MA
representation.]

The expected value of $\psi_{h=1}$ is clearly zero, which we knew already from
the forecasting example in Section 2. The variance of $\psi_{h=1}$, instead, can
be decomposed into the variances of the underlying stochastic processes
$s_{1t}$ and $s_{2t}$. Using simple math we find that

$V(\psi_{h=1}) = D_0 V(s_t) D_0' = \begin{pmatrix} d_{11,0} & d_{12,0} \\ d_{21,0} & d_{22,0} \end{pmatrix} \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} \begin{pmatrix} d_{11,0} & d_{21,0} \\ d_{12,0} & d_{22,0} \end{pmatrix} = \begin{pmatrix} d_{11,0}^2 \sigma_1^2 + d_{12,0}^2 \sigma_2^2 & d_{11,0} d_{21,0} \sigma_1^2 + d_{12,0} d_{22,0} \sigma_2^2 \\ d_{11,0} d_{21,0} \sigma_1^2 + d_{12,0} d_{22,0} \sigma_2^2 & d_{21,0}^2 \sigma_1^2 + d_{22,0}^2 \sigma_2^2 \end{pmatrix}$

i.e.

$V(\psi_{1,h=1}) = d_{11,0}^2 \sigma_1^2 + d_{12,0}^2 \sigma_2^2 = \sum_{j=1}^{2} d_{1j,0}^2 \sigma_j^2$
$V(\psi_{2,h=1}) = d_{21,0}^2 \sigma_1^2 + d_{22,0}^2 \sigma_2^2 = \sum_{j=1}^{2} d_{2j,0}^2 \sigma_j^2$

This means that the share of the variance of the forecast error of variable $i$
explained by the volatility of the structural shock to variable $j$ at horizon
$h = 1$ is

$R_{ij,h=1} = \frac{d_{ij,0}^2 \sigma_j^2}{\sum_{g=1}^{2} d_{ig,0}^2 \sigma_g^2}$   (22)

For time horizons longer than $h = 1$ nothing much changes, except that
there are more terms to be considered. For instance, the two-period-ahead
forecast error is

$\psi_{h=2} = y_{t+2} - E(y_{t+2}|t) = D_0 s_{t+2} + D_1 s_{t+1}$

Matrix algebra makes the computations more convenient, especially when
we have more than two variables in the VAR. In particular, the variance
of the $h$-period-ahead forecast error, i.e. the denominator in equation
(22), equals

$V(\psi_h) = \sum_{\tau=0}^{h-1} D_{\tau} V(s_t) D_{\tau}'$

whose diagonal gives a $k \times 1$ vector of total variances for the denominator
of the generalized ratio (22). The numerator, instead, can be computed
from

$\sum_{\tau=0}^{h-1} [e_i' D_{\tau} e_j]^2 \cdot V(s_t)_{j,j}$

where $e_i$ and $e_j$ are the $i$-th and the $j$-th columns of the $k \times k$
identity matrix, with $k$ the number of variables. For example, in the
$2 \times 2$ case considered above we know that the contribution of the second
shock to the variance of the first variable is $\sum_{\tau=0}^{h-1} d_{12,\tau}^2 \sigma_2^2$. To isolate this
component we need $e_i = (1, 0)'$ and $e_j = (0, 1)'$.
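The following Matlab sketch implements the decomposition, under my own naming conventions: `D` is assumed to be a $k \times k \times H$ array with `D(:,:,h)` $= D_{h-1}$, and `Sigma_s` the diagonal covariance matrix of the structural shocks.

% Sketch: forecast error variance decomposition at horizon H
k = size(D, 1); H = size(D, 3);
totvar = zeros(k, 1);              % denominator of (22): total variances
for tau = 1:H
    totvar = totvar + diag(D(:,:,tau) * Sigma_s * D(:,:,tau)');
end
R = zeros(k, k);                   % R(i,j): share of variable i explained by shock j
for i = 1:k
    for j = 1:k
        R(i,j) = sum(squeeze(D(i,j,:)).^2) * Sigma_s(j,j) / totvar(i);
    end
end
% each row of R sums to one by construction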

Example 13: [ADD ]

6.2 Historical decomposition


Impulse responses trace the effect of a generic shock $j$ of size $\epsilon$ on the
entire set of variables. When exactly in time this shock is given is irrel-
evant, due to the linearity of the model, while the size of $\epsilon$ only affects
the scale of the response, not the shape.
It is sometimes of interest to trace the effect of the actual realizations of
shock $j$ on the dynamics of variable $i$, rather than of a shock of arbitrary
size $\epsilon$. We certainly know that the shocks $s_{j,t+1}, s_{j,t+2}, s_{j,t+3}, ...$ do not matter
for $y_t$, but we know that $y_t$ depends on $s_{jt}, s_{j,t-1}, s_{j,t-2}, ...$. The thought
experiment is how much shock $j$ as a whole, i.e. the actual realizations
$s_{jt}, s_{j,t-1}, s_{j,t-2}, ...$, contributed to determining $y_t, y_{t-1}, y_{t-2}, ...$. Note that for
the moment the analysis is in population, so no estimation uncertainty
exists.
To address this question, rewrite the VAR in the structural moving av-
erage representation as shown in Section 6:

$y_t = D_0 s_t + D_1 s_{t-1} + D_2 s_{t-2} + ...$   (23)

where the series goes to infinity. Calling $d_h^j$ column $j$ of matrix $D_h$ and
$s_{jt}$ shock $j$ at time $t$, the moving average can be rewritten as

$y_t = \underbrace{d_0^1 s_{1t} + d_1^1 s_{1,t-1} + d_2^1 s_{1,t-2} + ...}_{y_t^{hd,1}, \text{ contribution of shock } s^1}$
$+ ...$
$+ \underbrace{d_0^j s_{jt} + d_1^j s_{j,t-1} + d_2^j s_{j,t-2} + ...}_{y_t^{hd,j}, \text{ contribution of shock } s^j}$   (24)
$+ ...$
$+ \underbrace{d_0^n s_{nt} + d_1^n s_{n,t-1} + d_2^n s_{n,t-2} + ...}_{y_t^{hd,n}, \text{ contribution of shock } s^n}$

Consider two thought experiments. The first one is: what is the contri-
bution of $s_{jt}$, i.e. of the realization of shock $j$ at time $t$, on the variables
of the system? This will be $d_0^j s_{jt}$ on $y_t$, $d_1^j s_{jt}$ on $y_{t+1}$, $d_2^j s_{jt}$ on $y_{t+2}$ and so
on. Note that this is simply the impulse response $[d_0^j, d_1^j, d_2^j, ...]$ to shock
$j$, multiplied by a shock of size $s_{jt}$ rather than of size equal to the arbi-
trary scaling factor $\epsilon$ used in Section 3. Historical decomposition does
something different, although related: what is the contribution of shock
$s^j$ (rather than of the single realization $s_{jt}$) to the variables $y_t$ of the
system? From equation (24) it should be clear that this equals

$y_t^{hd,j} = d_0^j s_{jt} + d_1^j s_{j,t-1} + d_2^j s_{j,t-2} + ...$   (25)

Note that by construction $\sum_{j=1}^{n} y_t^{hd,j} = y_t$.

In applied work we can compute $D_0, D_1, ..., D_{\infty}$, but we cannot estimate
$s_{-\infty}, ..., s_{\infty}$, since we do not have data for such a time period. Instead, we
can only use the data in the dataset, covering the generic period $t = 1, 2, ..., T$.
Hence, the moving average decomposition holds only approximately, al-
though the approximation improves if we are sufficiently away from $t = 1$,
due to the stationarity of the system. The approximation is hence

$\tilde{y}_t = D_0 s_t + D_1 s_{t-1} + D_2 s_{t-2} + ... + D_{t-1} s_1 \approx y_t$   (26)

The historical decomposition then becomes

$\tilde{y}_t^{hd,j} = d_0^j s_{jt} + d_1^j s_{j,t-1} + d_2^j s_{j,t-2} + ... + d_{t-1}^j s_{j1}$   (27)

Note that in principle we are still in population, i.e., other than the
truncation of the full moving average process, the expression holds
for the true realizations of the shocks and for the true parameter values.
In applied work $D_j$ and $s_{jt}$ are replaced with estimates, for obvious
reasons.
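A Matlab sketch of equation (27), under my own naming conventions: `D` as above (with at least $T$ slices) and `S` a $k \times T$ matrix of estimated structural shocks with `S(:,t)` $= s_t$.

% Sketch: historical decomposition, truncated at the start of the sample
[k, T] = size(S);
Yhd = zeros(k, k, T);              % Yhd(:,j,t): contribution of shock j to y_t
for t = 1:T
    for tau = 0:t-1                % only shocks dated 1, ..., t enter
        Yhd(:,:,t) = Yhd(:,:,t) + D(:,:,tau+1) * diag(S(:,t-tau));
    end
end
% sum(Yhd(:,:,t), 2) reproduces y_t up to the truncation error in (26)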
A few remarks are due:

• Historical decompositions should be interpreted as the cumulative
effect of shocks on variables. They should not instead be interpreted
as "what would have occurred had only one type of shock occurred",
i.e. as the counterfactual value of $y_t$, $t = 1, 2, ..., T$ had all structural
shocks except $s_{jt}$, $t = 1, 2, ..., T$, been equal to zero. Such an
interpretation would go against the Lucas critique, since agents
would probably have behaved differently had they observed that
no shock other than the $s^j$ shock was hitting the economy, and their
adjustment would impact on the matrices $D_j$. Historical decomposition
only interprets the observed data as a composition of cumulative
effects of several structural shocks.

• The historical decomposition incorporates an approximation error
due to the truncation of the moving average. To appreciate how
relevant this error is, one can plot both the true data and the sum
of the historical decompositions. The difference should be bigger the
closer we are to the initial period.

Example 14: [ADD ]

7 A, B and AB specification of the SVAR


So far, it might seem that specifying a structural model in A form or in
B form is equivalent, because once identification is achieved, one can
move from one notation to the other by inverting A. While this is true
mathematically, it is important to bear in mind that different specifica-
tions allow one to think about restrictions very differently.
To see this, consider the following example. Suppose we have estimated a
reduced form model in three variables, and that we are willing to impose
the following zero restrictions on the B matrix:

$B = \begin{pmatrix} b_{11} & b_{12} & 0 \\ 0 & b_{22} & b_{23} \\ b_{31} & 0 & b_{33} \end{pmatrix}$

By now, it should be clear that this set of restrictions implies that the
first shock does not have any contemporaneous effect on $y_{2t}$, the sec-
ond shock does not have any contemporaneous effect on $y_{3t}$ and the last
shock does not have any overall contemporaneous effect on $y_{1t}$. Consider
now the corresponding structural form. Simple math shows that in the
corresponding A matrix no zero appears, i.e.

$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$

From this matrix it might seem that all variables contemporaneously
affect one another, which would imply that any shock has an impact
on any variable. How can this be, given what we assumed in the
specification of the B matrix? The reason is that, while A might seem
to display no restriction, it actually does contain restrictions, because its
entries are not free parameters but are functions of the entries of B,
which are restricted. This means that, in the case considered, thinking of
the model in B form is more appropriate, because imposing restrictions
on B is easy (they are just zero restrictions), while imposing the implied
restrictions on A is much harder (they are non-linear restrictions among
the entries $\{a_{ij}\}$).
Consider now the case in which the true structural form is

$A = \begin{pmatrix} a_{11} & a_{12} & 0 \\ 0 & a_{22} & a_{23} \\ a_{31} & 0 & a_{33} \end{pmatrix}$

Note that the corresponding system of equations is

$y_{1t} = -a_{11}^{-1} a_{12} y_{2t} + ... + a_{11}^{-1} s_{1t}$
$y_{2t} = -a_{22}^{-1} a_{23} y_{3t} + ... + a_{22}^{-1} s_{2t}$
$y_{3t} = -a_{33}^{-1} a_{31} y_{1t} + ... + a_{33}^{-1} s_{3t}$

By symmetry with the example given above, we know that the B matrix
does not display any zero entry, and hence each structural shock affects
all variables. While this might not be obvious from the system specified
in A form, one should be able to see that, for example, $s_{1t}$ affects the
first variable directly, which indirectly affects the third variable, which
indirectly affects the second variable. Similarly, $s_{2t}$ affects the second
variable directly, the first variable indirectly and the third variable indi-
rectly. So, what do the restrictions mean? They mean that, conditioning
on the third variable, the first shock should have no effect on the second
variable. Or similarly, that conditioning on the first variable, a shock to
the second variable should not contemporaneously affect the third vari-
able. Such restrictions reflect restrictions on elasticities rather than on
overall effects. An example can be found in Caldara and Kamps (2012).
The example above should have clarified the distinction between the A and
B forms, but it is worth taking the discussion a step forward and deriving
impulse responses. For simplicity I will consider impulse responses
only with regard to the impact effects. Consider the model

$\begin{pmatrix} a_{11} & -a_{12} \\ -a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} ... \\ ... \end{pmatrix} + \begin{pmatrix} s_{1t} \\ s_{2t} \end{pmatrix}$

or equivalently,

$y_{1t} = \frac{a_{12}}{a_{11}} y_{2t} + ... + \frac{1}{a_{11}} s_{1t}$   (28)
$y_{2t} = \frac{a_{21}}{a_{22}} y_{1t} + ... + \frac{1}{a_{22}} s_{2t}$   (29)

The B matrix corresponding to this model in A form is

$B = \frac{1}{a_{11} a_{22} - a_{21} a_{12}} \begin{pmatrix} a_{22} & a_{12} \\ a_{21} & a_{11} \end{pmatrix}$

Accordingly, the impulse vector corresponding to a shock $s_{1t}$ of size 1 is

$\phi = \frac{1}{a_{11} a_{22} - a_{21} a_{12}} \begin{pmatrix} a_{22} \\ a_{21} \end{pmatrix}$
This is the overall impact effect of such a shock.
Consider now the A specification. The same overall effect emerges by
construction from computing the impulse vector step by step. To this
purpose, simplify notation from model (28) and define $\gamma_{11} = a_{12}/a_{11}$,
$\gamma_{12} = 1/a_{11}$, $\gamma_{21} = a_{21}/a_{22}$, $\gamma_{22} = 1/a_{22}$. Model (28) then rewrites as

$y_{1t} = \gamma_{11} y_{2t} + ... + \gamma_{12} s_{1t}$   (30)
$y_{2t} = \gamma_{21} y_{1t} + ... + \gamma_{22} s_{2t}$   (31)

The instantaneous effect of a shock to $s_{1t}$ of size 1 on variable $y_{1t}$ is
$\Delta^1 y_{1t} = \gamma_{12}$. By equation (31) this generates an instantaneous effect
on $y_{2t}$ of $\Delta^1 y_{2t} = \gamma_{21} \cdot \Delta^1 y_{1t} = \gamma_{21} \gamma_{12}$. This, in turn, affects again $y_{1t}$
through equation (30) by a marginal effect equal to $\Delta^2 y_{1t} = \gamma_{11} \Delta^1 y_{2t}$.
By repeating the substitutions, we find that the marginal effects on variable
$y_{1t}$ are

$\Delta^1 y_{1t} = \gamma_{12}$
$\Delta^2 y_{1t} = (\gamma_{11} \gamma_{21}) \gamma_{12}$
$\Delta^3 y_{1t} = (\gamma_{11} \gamma_{21})^2 \gamma_{12}$
$\Delta^4 y_{1t} = (\gamma_{11} \gamma_{21})^3 \gamma_{12}$
$...$

Consequently, under the appropriate stationarity restriction $|\gamma_{11} \gamma_{21}| < 1$,
the overall effect is

$\sum_{i=1}^{\infty} \Delta^i y_{1t} = \gamma_{12} \sum_{i=0}^{\infty} (\gamma_{11} \gamma_{21})^i = \frac{\gamma_{12}}{1 - \gamma_{11} \gamma_{21}} = \frac{a_{22}}{a_{11} a_{22} - a_{21} a_{12}}$

which coincides with the one computed from the B model. The same can
be verified regarding the effect on $y_{2t}$.
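This equivalence is easy to check numerically. The following Matlab sketch uses illustrative parameter values of my own choosing (with $|\gamma_{11} \gamma_{21}| < 1$) and compares the B-form impact effect with the accumulated multiplier rounds:

% Sketch: step-by-step A-form multipliers vs. the B-form impact effect
a11 = 1.0; a12 = 0.3; a21 = 0.5; a22 = 1.2;    % illustrative values
A = [a11, -a12; -a21, a22];
B = inv(A);                                     % overall impact effects
g11 = a12/a11; g12 = 1/a11; g21 = a21/a22;
partial = cumsum(g12 * (g11*g21).^(0:50));      % partial sums of the multiplier rounds
disp([B(1,1), partial(end)])                    % both equal a22/(a11*a22 - a21*a12)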
While apparently more complicated, computing the impulse response this
way allows one to isolate the effects of multipliers. Consider for example the
case in which the two equations considered are a negatively sloped de-
mand function and a positively sloped supply function. An exogenous
increase in supply would increase the quantity for each price level. Nev-
ertheless, such a higher quantity would be demanded only at a lower
price. But the decrease in price would decrease the quantity supplied,
which would in turn raise the quantity demanded, and so on. Overall,
this argument leads to the new equilibrium. The total effect can be read
from the B form of the model. An alternative example is the case of an
upward-sloping demand function for some financial asset, say due to speculation.
An increase in price increases demand, which increases the price, which
again increases demand and so on. The A form specifies this mechanism
directly, while the B form captures the overall impact.

In short, which specification of the structural form is more appropriate
depends on the case at hand, i.e., on the restrictions that we are willing to
impose. The A form models instantaneous links among the endogenous
variables, while the B form models the impact effect of structural shocks.
In some cases one needs to impose restrictions on both elasticities and
total effects. In such cases, the AB form is used:

$A y_t = A(L) y_{t-1} + B s_t$

A detailed assessment of A, B and AB models can be found in Lütkepohl
(2007). See also Amisano and Giannini (2002).

8 Interpreting structural shocks


We saw that VAR models (i.e. reduced form VAR models, or RVARs)
only use the dynamics of the data, and that SVARs are used to estimate
structural shocks and several statistics related to them. A useful prelim-
inary step in the analysis of a certain structural shock is to ask whether
there is any theoretical justification for such an exogenous variation to
exist in the first place. A good example of this is the case of a monetary
shock.
Literally speaking, a monetary shock is a variation in the federal funds
rate that is exogenous to the state of the economy (or, to be more precise,
exogenous to the variables included in the model). A good starting ques-
tion is whether it can ever be the case that the fed funds rate
experiences an exogenous variation. After all, central banks always set
interest rates in response to something, i.e. in response to the current or
the expected state of the economy. Taken literally, this argument would
leave little room for the existence of a monetary shock.

The literature has developed several arguments for why at least a small part
of the variations in the federal funds rate might be exogenous. One
of these is to consider that the policy rate is set as the outcome of a
discussion among the members of a committee, a process that might leave
room for some exogeneity. For example, one can assume that whether
one member of the committee was more convincing than another
on a specific day, given an identical information set available, depends
on factors that are external to the state of the economy (for example,
mood, or whether his or her kids let him or her sleep the night before).
These and other interpretations are discussed, for example, in Romer and
Romer (2004).
Another interesting case on the general nature and existence of structural
shocks is that of a model that pins down the joint determination of
price and quantity in a given market. If structural analysis is required,
one could be led to interpret the two underlying structural shocks as a
price shock and a quantity shock. This, though, opens the question of
what a price shock could possibly be. In fact, if one considers a Walrasian
setting, variations in prices depend on demand and supply, so a variation
in price should be decomposed into its underlying cause, otherwise it
remains endogenous. The same holds with regard to a quantity shock.
Instead of interpreting shocks in terms of a price shock and a quantity
shock, it is more common to label shocks as a demand shock and a supply
shock. In such a scenario, sign restrictions are usually used for identifi-
cation. Note that this means that in an equation featuring as dependent
variable, say, the oil price, the corresponding structural shock cannot just
be labelled an "oil price shock". For a discussion of this point see Kilian
(2009).

References
Amisano, G. and Giannini, C. (2002). Topics in structural VAR econometrics.

Bernanke, B. S., Boivin, J., and Eliasz, P. (2005). Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, pages 387–422.

Binning, A. (2013). Underidentified SVAR models: A framework for combining short and long-run restrictions with sign restrictions.

Blanchard, O. J. and Quah, D. (1989). The dynamic effects of aggregate demand and supply disturbances. The American Economic Review, 79(4):655–673.

Brandt, P. T. and Williams, J. T. (2007). Multiple time series models. Number 148. Sage.

Caldara, D. and Kamps, C. (2012). The analytics of SVARs: A unified framework to measure fiscal multipliers. FEDS Working Paper Series 2012-20.

Canova, F. (2007). Methods for applied macroeconomic research, volume 13. Princeton University Press.

Canova, F. and De Nicolo, G. (2002). Monetary disturbances matter for business fluctuations in the G-7. Journal of Monetary Economics, 49(6):1131–1159.

Canova, F. and Pina, J. P. (1999). Monetary policy misspecification in VAR models. CEPR Discussion Papers 2333.

Christiano, L. J., Eichenbaum, M., and Evans, C. L. (1999). Monetary policy shocks: What have we learned and to what end? Handbook of Macroeconomics, 1:65–148.

Eickmeier, S. and Hofmann, B. (2013). Monetary policy, housing booms, and financial (im)balances. Macroeconomic Dynamics, 17(4):830–860.

Fanelli, L. and Bacchiocchi, E. (2015). Identification in structural vector autoregressive models with structural changes, with an application to U.S. monetary policy. Oxford Bulletin of Economics and Statistics.

Fisher, F. M. (1966). The Identification Problem in Econometrics. Robert E. Krieger Publishing.

Fry, R. and Pagan, A. (2011). Sign restrictions in structural vector autoregressions: A critical review. Journal of Economic Literature, 49(4):938–960.

Herwartz, H. (2015). Structural modelling with independent innovations.

Kilian, L. (2009). Not all oil price shocks are alike: Disentangling demand and supply shocks in the crude oil market. American Economic Review, 99(3):1053–1069.

Lanne, M. and Lütkepohl, H. (2008). Identifying monetary policy shocks via changes in volatility. Journal of Money, Credit and Banking, 40(6):1131–1149.

Lütkepohl, H. (2007). New introduction to multiple time series analysis. Springer Science & Business Media.

Lütkepohl, H. (2014). Structural vector autoregressive analysis in a data rich environment: A survey.

Lütkepohl, H. and Netšunajev, A. (2014). Disentangling demand and supply shocks in the crude oil market: How to check sign restrictions in structural VARs. Journal of Applied Econometrics, 29(3):479–496.

Mertens, K. and Ravn, M. O. (2013). The dynamic effects of personal and corporate income tax changes in the United States. The American Economic Review, 103(4):1212–1247.

Rigobon, R. (2003). Identification through heteroskedasticity. Review of Economics and Statistics, 85(4):777–792.

Rigobon, R. and Sack, B. (2004). The impact of monetary policy on asset prices. Journal of Monetary Economics, 51(8):1553–1575.

Romer, C. and Romer, D. (2004). A new measure of monetary shocks: Derivation and implications. The American Economic Review, pages 1055–1084.

Rubio-Ramirez, J. F., Waggoner, D. F., and Zha, T. (2010). Structural vector autoregressions: Theory of identification and algorithms for inference. The Review of Economic Studies, 77(2):665–696.

Stock, J. H. and Watson, M. W. (2011). Dynamic factor models. Oxford Handbook of Economic Forecasting, 1:35–59.

Stock, J. H. and Watson, M. W. (2012). Disentangling the channels of the 2007-2009 recession. Technical report, National Bureau of Economic Research.
