Notes On The Identification of VARs Using External Instruments - Michele Piffer
Michele Piffer∗
June 2, 2020
These notes derive and discuss the methodology proposed by Stock and Watson
(2012) and Mertens and Ravn (2013) to identify Structural VAR models.
∗
DIW Berlin, Mohrenstrasse 58, 10117 Berlin, Germany. Email: [email protected], personal
web page: https://sites.google.com/site/michelepiffereconomics/. These notes can be reproduced
freely for educational and research purposes as long as they contain this notice and are retained for
personal use or distributed for free. All errors are mine. Please get in touch if you find typos or
mistakes.
1 Preliminary
The identification of structural shocks within Vector Autoregressive (VAR) models
usually consists of obtaining the $B$ matrix in the model
\[ r_t = B s_t = b\, s_t^{\text{monetary}} + B^* s_t^{\text{others}} \]
where $s_t^{\text{monetary}}$ is the structural monetary shock, $b$ is the corresponding
column of $B$, $s_t^{\text{others}}$ is a vector including the non-monetary shocks and $B^*$ is
a matrix containing the $n - 1$ column vectors corresponding to such shocks. If instead
we are interested in estimating the realizations of the monetary shocks, then we are
after the row vector $a$ (written as a column vector) of the matrix $A = B^{-1}$, which
maps the reduced form shocks $r_t = B s_t$ back into structural shocks, i.e.
\[ s_t^{\text{monetary}} = a' r_t \]
These notes discuss and derive an approach to obtaining $b$ and $a$. The main
case considered is the one in which we want to identify only one structural shock and
have only one instrument for it. The literature has also discussed other
possibilities. For example, Mertens and Ravn (2013) identify two structural shocks
using two correlated instruments. Stock and Watson (2012), instead, identify several
shocks of interest, and for each structural shock they consider different candidate
instruments. For simplicity, the key intuition of the approach is derived here using
the simplest possible model. Most of the intuition is outlined using a VAR with
2 variables, while the extension to a VAR with $n$ variables is briefly discussed
during the exposition. Some derivations are available in Olea et al. (2012).
1
An informal introduction to these models is available on
https://sites.google.com/site/michelepiffereconomics/VARs.pdf?attredirects=0d=1.
2 Estimating Impulse Response Functions
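Consider the reduced form VAR given by
\[ y_t = \text{lags} + r_t \]
with
\[
r_t = B s_t = b_1 s_{a,t} + b_2 s_{b,t},
\qquad
\begin{pmatrix} r_{1,t} \\ r_{2,t} \end{pmatrix}
= \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\begin{pmatrix} s_{a,t} \\ s_{b,t} \end{pmatrix}
= \begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix} s_{a,t}
+ \begin{pmatrix} b_{12} \\ b_{22} \end{pmatrix} s_{b,t}
\]
and $V(s_t)$ is known to equal the identity matrix. The reduced form model delivers
a set of covariance restrictions,
\[ \Sigma = BB' \]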
The reduced form shocks $r_{1,t}$ and $r_{2,t}$ correspond to variables $y_{1,t}$ and $y_{2,t}$,
respectively. The structural shocks $s_{a,t}$ and $s_{b,t}$, instead, are left intentionally
unspecified and do not correspond to a defined ordering of the variables, i.e. they are
not necessarily an $s_{1,t}$ and an $s_{2,t}$ shock. To understand why this distinction is relevant,
consider two examples. In a VAR with output and the federal funds rate, it makes sense
to name the structural shock in the structural equation of the federal funds rate
the monetary policy shock. In a VAR with quantity and price, it makes more sense
to interpret the structural shocks as demand and supply shocks, and not as a price
and a quantity shock. We will get back to this later.
Suppose we want to conduct structural analysis on the shock $s_{a,t}$. We have a
variable $m_t$ such that
\[ E(m_t s_{a,t}) = \phi \]
\[ E(m_t s_{b,t}) = 0 \]
Note that we do not specify the relationship between $m_t$ and $s_{a,t}$, but only the
expected value of the product, i.e. the covariance, given that structural shocks have
expected value equal to 0. $m_t$ could be some complicated function of $s_{a,t}$, in which
case the covariance $\phi$ would be determined by the parameters of that
function. Nevertheless, as long as there is some correlation between $s_{a,t}$ and $m_t$, we
do not need to determine where this correlation comes from.
The key equality to exploit is
\[ E(r_t m_t) = E(B s_t m_t) = b_1 E(s_{a,t} m_t) + b_2 E(s_{b,t} m_t) = b_1 \phi \]
If we had the true value of $\phi$ and the population values of the vector $E(r_t m_t)$, we
could recover the true values of $b_{11}$ and $b_{21}$ simply as $b_{i1} = E(r_{i,t} m_t)\phi^{-1}$. Similarly,
define the ratio
\[ \mu_1 = \frac{b_{21}}{b_{11}} \]
If we had the population vector $E(r_t m_t)$ but no value for $\phi$, we could recover the
true ratio $\mu_1$ as
\[ \mu_1 = \frac{E(r_{2,t} m_t)}{E(r_{1,t} m_t)} \]
If instead we have neither $\phi$ nor the population value of $E(r_t m_t)$, we might still aim
for an estimate of $b_{11}$, $b_{21}$ and/or $b_{21}/b_{11}$. The rest of these notes operates under this
scenario, which is the only realistic one in applied work. We assume we have data
on $m_t$ and some estimates of $r_t$ from the estimation of the reduced form model.
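The key equality can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original derivation: the matrix `B`, the value of `phi` and the data-generating process for the instrument (`m_t = phi * s_a,t + noise`) are illustrative assumptions, chosen only so that $E(m_t s_{a,t}) = \phi$ and $E(m_t s_{b,t}) = 0$ hold by construction.

```python
import numpy as np


def simulate(B, phi, T, seed=0):
    """Draw structural shocks, reduced form shocks r_t = B s_t and an instrument.

    The instrument m_t = phi * s_a,t + noise is an assumed process, chosen so
    that E(m_t s_a,t) = phi and E(m_t s_b,t) = 0.
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((T, 2))      # columns: s_a, s_b, with V(s_t) = I
    r = s @ B.T                          # row t holds r_t' = (B s_t)'
    m = phi * s[:, 0] + rng.standard_normal(T)
    return s, r, m


B = np.array([[1.0, 0.5],
              [0.4, 2.0]])               # illustrative B, first column is b_1
phi = 0.8
s, r, m = simulate(B, phi, T=200_000)

moments = r.T @ m / len(m)               # sample analogue of E(r_t m_t)
mu1_hat = moments[1] / moments[0]        # estimate of b21 / b11

print("sample E(r_t m_t):", moments)
print("b_1 * phi        :", B[:, 0] * phi)
print("mu_1 estimate    :", mu1_hat)
```

With a large sample the two printed vectors should be close, and the ratio of their elements should be close to $b_{21}/b_{11}$ even though $\phi$ is never used in computing it.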
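There are many ways of exploiting the information provided by the instrument
$m_t$ in order to estimate $b_{ij}$. The first step is to get $\mu_1$, which can be done in
three equivalent ways.
1. Sample moments: replace the population moments with their sample counterparts,
\[ \hat\mu_1 = \frac{\hat E(r_{2,t} m_t)}{\hat E(r_{1,t} m_t)} \]
The poorer our estimates $\hat E(r_{i,t} m_t)$ of the population moments, the poorer
our estimate of $\mu_1$.
2. Single regressions: Given the regressions
\[ r_{1,t} = \delta_1 m_t + \epsilon_{1,t} \]
\[ r_{2,t} = \delta_2 m_t + \epsilon_{2,t} \]
we can estimate $\mu_1$ as
\[ \hat\mu_1 = \frac{\hat\delta_2}{\hat\delta_1} \]
The corresponding estimate coincides with the one from point 1 because, from
OLS estimation,
\[ \hat\delta_i = \frac{\hat E(r_{i,t} m_t)}{\hat E(m_t^2)} \qquad (2) \]
Hence, the term $\hat E(m_t^2)$ drops out when taking the ratio $\hat\delta_2/\hat\delta_1$, delivering the same
estimate for $\mu_1$.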
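3. 2SLS: Given the regression
\[ r_{2,t} = \gamma_1 r_{1,t} + \eta_t \]
one can estimate $\mu_1$ with $\gamma_1$, obtained using Two-Stage Least Squares estimation
with $m_t$ as the instrument for $r_{1,t}$. This estimate also coincides (by construction)
with the one from point 1 (and hence also point 2) because the fitted values
from the first step would be
\[ \hat r_{1,t} = \frac{\hat E(r_{1,t} m_t)}{\hat E(m_t^2)} \cdot m_t \]
so the second step yields
\[
\hat\gamma_1
= \frac{\hat E(r_{2,t} \hat r_{1,t})}{\hat E(\hat r_{1,t}^2)}
= \frac{\hat E\left( r_{2,t} \cdot \frac{\hat E(r_{1,t} m_t)}{\hat E(m_t^2)} m_t \right)}
       {\hat E\left( \left[ \frac{\hat E(r_{1,t} m_t)}{\hat E(m_t^2)} \right]^2 m_t^2 \right)}
= \frac{\frac{\hat E(r_{1,t} m_t)}{\hat E(m_t^2)} \hat E(r_{2,t} m_t)}
       {\frac{\hat E(r_{1,t} m_t)^2}{\hat E(m_t^2)^2} \hat E(m_t^2)}
= \frac{\hat E(r_{2,t} m_t)}{\hat E(r_{1,t} m_t)}
\]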
One might wonder why one would run a 2SLS estimation when the results
are identical to those of the other two approaches. One answer is that
2SLS simplifies things when one has more than one instrument per shock of
interest. For example, with one shock of interest and 2 instruments,
one can implement a GMM estimator with the 2SLS procedure by first
regressing the VAR residuals in one equation on all instruments, and then
regressing the remaining VAR residuals on the fitted values obtained from the
preliminary regression.
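The equivalence of the three estimates of $\mu_1$ is an algebraic identity, so it holds exactly in any sample. A minimal sketch in Python with NumPy follows; the simulated data (matrix `B`, instrument process, sample size) are illustrative assumptions and only serve to have some residuals and an instrument to work with.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
B = np.array([[1.0, 0.5],
              [0.4, 2.0]])                       # illustrative B matrix
s = rng.standard_normal((T, 2))                  # structural shocks
r = s @ B.T                                      # reduced form shocks
m = 0.8 * s[:, 0] + rng.standard_normal(T)       # assumed instrument process
r1, r2 = r[:, 0], r[:, 1]

# 1. Ratio of sample moments
mu_moments = np.mean(r2 * m) / np.mean(r1 * m)

# 2. Ratio of OLS slopes from regressing each residual on m (no constant)
delta1 = (m @ r1) / (m @ m)
delta2 = (m @ r2) / (m @ m)
mu_ols = delta2 / delta1

# 3. 2SLS: first stage r1 on m, second stage r2 on the fitted values
r1_fitted = delta1 * m
mu_2sls = (r1_fitted @ r2) / (r1_fitted @ r1_fitted)

print(mu_moments, mu_ols, mu_2sls)
```

The three printed numbers agree up to floating point error, whatever the sample size.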
In short, given some data on an instrument mt for the structural shock sa,t , we
cannot immediately estimate the single elements of the column of the B matrix
corresponding to the shock sa,t , but we can at least consistently estimate the ratio
µ1 = b21 /b11 . Note that the covariance restrictions included in Σ have not been used
so far.
Given an estimate of µ1 , the impulse responses of variables y1,t and y2,t to the
structural shock sa,t can be computed in the following ways:
1. Combine the information $b_{21} = \hat\mu_1 b_{11}$ with the covariance restrictions $\Sigma = BB'$
of the reduced form model, solve for $b_1$, and compute impulse responses using
the resulting estimate of $b_1$. Given our knowledge of $V(s_t) = I$ in population, this gives
the impulse response to a one standard deviation shock. Note that solving for $b_1$
instead of $\mu_1$ is not immediate; we will discuss it further below.
2. Without solving for the full vector $b_1$, compute impulse responses starting from
\[ \text{relative impulse vector} = \begin{pmatrix} 1 \\ \hat\mu_1 \end{pmatrix} \cdot 1 \]
from point 3 above and scaling again the size of the shock as desired. We refer
to this vector as the inconsistently-estimated impulse vector since it delivers
$b_1$ up to the scale factor $\phi / E(m_t^2)$.
Before continuing, it can be useful to remark that the definition of $\mu_1$ was arbitrary.
In principle, we could compute impulse responses also using the ratio
\[ \mu_2 = \frac{b_{11}}{b_{21}} = \mu_1^{-1} \]
and then
\[ \text{relative impulse vector} = \begin{pmatrix} \hat\mu_2 \\ 1 \end{pmatrix} \]
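The two normalizations describe the same direction in the space of impact responses and differ only in the implied size of the shock. A minimal sketch in Python with NumPy; the helper `relative_impulse_vector` and the value of `mu1_hat` are hypothetical and chosen for illustration only.

```python
import numpy as np


def relative_impulse_vector(mu_hat, impact=1.0):
    """Impact responses normalized so that variable 1 moves by `impact`."""
    return impact * np.array([1.0, mu_hat])


mu1_hat = 0.4                                   # illustrative value, not an estimate
mu2_hat = 1.0 / mu1_hat

vec_eq1 = relative_impulse_vector(mu1_hat)      # normalized on equation 1: (1, mu1)'
vec_eq2 = np.array([mu2_hat, 1.0])              # normalized on equation 2: (mu2, 1)'

# Rescaling the size of the shock, e.g. so that variable 1 moves by 0.25
vec_small = relative_impulse_vector(mu1_hat, impact=0.25)

print(vec_eq1, vec_eq2, vec_small)
```

Here `vec_eq2` equals `vec_eq1` divided by `mu1_hat`, which is why the choice of normalization matters when the estimated denominator is close to zero.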
do not know the true values $b_{11}$ and $b_{21}$, we might want to define the ratio $\mu$ so as to
minimize the risk of dividing by a parameter close to zero. For example, if we are
after uncertainty shocks, we might want to define $\mu$ by dividing the vector $b$ by $b_i$,
where $i$ is the equation of the measure of uncertainty included in the VAR. This
works under the assumption that an uncertainty shock is likely to have a non-zero
impact on the uncertainty measure. In general, estimating the inconsistently-estimated
impulse vector does not require taking a stand on which equation will be
used in the normalization for the definition of the relative impulse vector. If instead
we are after the absolute impulse vector, we need to choose which variable to use to
define the relative impulse vector. Asymptotically this choice is irrelevant; in finite
samples it is not.
The generalization of the estimation of the relative impulse vector to VAR
models with $n$ variables is immediate and will be omitted. The only thing that
changes is that the scalar $\mu$ becomes a vector $\mu$.
\[ r_{1,t} = \text{const}_1 + \delta_1 m_t + \epsilon_{1,t} \]
\[ r_{2,t} = \text{const}_2 + \delta_2 m_t + \epsilon_{2,t} \]
In principle, the ratio $\delta_2/\delta_1$ is not necessarily equivalent to the same ratio computed
from the regressions in the previous section. This turns out to be irrelevant. In fact,
the estimator for $\delta_i$ would now be combining the data as
(and as long as the regressors are not close to multicollinear, otherwise the inverse
of $X'X$ would be imprecisely calculated). Hence the term $\hat E(r_{i,t})\hat E(m_t)$ equals zero.
This implies that the inclusion of a constant term in the single regression approach
is irrelevant: it affects the individual estimates, but not the ratio that we are after.
This holds in finite samples and does not rely on any asymptotic result.
Consider now the other approach, i.e. the 2SLS approach. When including a
constant, the regression in the first stage is
\[ r_{1,t} = \text{const}_1 + \delta_1 m_t + \epsilon_{1,t} \]
The derivations will be skipped, but the results from the previous section hold, i.e.
the estimate of $\gamma_1$ coincides with the estimate of the ratio $\delta_2/\delta_1$. It follows that,
again, the inclusion of a constant term is irrelevant, as long as it is included in the
regressions of both the first and the second step of the 2SLS.
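The irrelevance of the constant for the ratio can be checked in a small sample. The sketch below, in Python with NumPy, assumes that the reduced form residuals have exactly zero sample mean (as OLS residuals from a VAR estimated with a constant do) and that the instrument has a non-zero mean; the simulated data and the helper `slopes` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
B = np.array([[1.0, 0.5],
              [0.4, 2.0]])                        # illustrative B matrix
s = rng.standard_normal((T, 2))
r = s @ B.T
r = r - r.mean(axis=0)                            # zero sample mean, as for VAR residuals
m = 3.0 + 0.8 * s[:, 0] + rng.standard_normal(T)  # instrument with non-zero mean


def slopes(r, m, constant):
    """OLS slope on m for each column of r, with or without a constant."""
    X = np.column_stack([np.ones_like(m), m]) if constant else m[:, None]
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    return coef[-1]                               # last row: slope on m, one per equation


d_no_const = slopes(r, m, constant=False)
d_const = slopes(r, m, constant=True)

ratio_no_const = d_no_const[1] / d_no_const[0]
ratio_const = d_const[1] / d_const[0]
print(d_no_const, d_const)
print(ratio_no_const, ratio_const)
```

The individual slopes differ across the two specifications, while the ratio is identical up to floating point error.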
While the relevance and exogeneity conditions are not testable, we can test whether
there is a strong correlation between the instrument and the VAR innovations. In
the notation used so far, a strong relationship between the instrument $m_t$ and the
reduced form innovations $r_{i,t}$ is a necessary condition for $m_t$ to be considered a useful
tool to inspect the underlying drivers of $r_{i,t}$. One way to do so is to consider the
regressions from the Single Regression approach. These regressions are replicated
here for convenience, for the general equation $i$, both without and with a constant term:
\[ r_{i,t} = \delta_i m_t + \epsilon_{i,t} \qquad (4) \]
\[ r_{i,t} = \text{const} + \delta_i m_t + \epsilon_{i,t} \qquad (5) \]
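One possible way to quantify the strength of the relationship in regression (5) is the F statistic for the null hypothesis $\delta_i = 0$. The sketch below, in Python with NumPy, is an illustration under the assumption of homoskedastic errors; the function name `first_stage_f` and the simulated instruments are hypothetical and not taken from the notes.

```python
import numpy as np


def first_stage_f(r_i, m):
    """F statistic for H0: delta_i = 0 in r_i = const + delta_i * m + error.

    Assumes homoskedastic errors; with a single restriction, F equals the
    squared t statistic on delta_i.
    """
    T = len(m)
    X = np.column_stack([np.ones(T), m])
    coef, *_ = np.linalg.lstsq(X, r_i, rcond=None)
    resid = r_i - X @ coef
    sigma2 = resid @ resid / (T - 2)                 # residual variance
    var_delta = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
    return coef[1] ** 2 / var_delta


rng = np.random.default_rng(5)
T = 500
s = rng.standard_normal((T, 2))
r1 = s[:, 0] + 0.5 * s[:, 1]                         # illustrative reduced form shock
m_strong = 0.8 * s[:, 0] + rng.standard_normal(T)    # correlated with s_a
m_noise = rng.standard_normal(T)                     # unrelated to both shocks

f_strong = first_stage_f(r1, m_strong)
f_noise = first_stage_f(r1, m_noise)
print(f_strong, f_noise)
```

An instrument that is unrelated to the residuals delivers a small F statistic, while an informative one delivers a large value.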
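Lags are left unspecified to make notation less hefty. $s_t^y$ and $s_t^{mp}$ stand for the
productivity shock and the monetary policy shock, respectively. The equivalent way of
writing this model is
\[
\begin{pmatrix} 1 & -a_{12} \\ -a_{21} & 1 \end{pmatrix}
\begin{pmatrix} y_t \\ i_t \end{pmatrix}
= \begin{pmatrix} \text{lags} \\ \text{lags} \end{pmatrix}
+ \begin{pmatrix} s_t^y \\ s_t^{mp} \end{pmatrix}
\]
Note that the monetary shock enters the equation of the federal funds rate while the
productivity shock enters the equation of output.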
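If we rewrite the VAR in the $B$ structural specification we get
\[
\begin{pmatrix} y_t \\ i_t \end{pmatrix}
= \begin{pmatrix} \text{lags} \\ \text{lags} \end{pmatrix}
+ \underbrace{\frac{1}{1 - a_{12} a_{21}}
  \begin{pmatrix} 1 & a_{12} \\ a_{21} & 1 \end{pmatrix}}_{B}
  \begin{pmatrix} s_t^y \\ s_t^{mp} \end{pmatrix}
= \begin{pmatrix} \text{lags} \\ \text{lags} \end{pmatrix}
+ \frac{1}{1 - a_{12} a_{21}}
  \left[ \begin{pmatrix} 1 \\ a_{21} \end{pmatrix} s_t^y
       + \begin{pmatrix} a_{12} \\ 1 \end{pmatrix} s_t^{mp} \right]
\]
Of course, one can reshuffle the columns of the $B$ matrix and the corresponding
shocks:
\[
\begin{pmatrix} y_t \\ i_t \end{pmatrix}
= \begin{pmatrix} \text{lags} \\ \text{lags} \end{pmatrix}
+ \frac{1}{1 - a_{12} a_{21}}
  \left[ \begin{pmatrix} a_{12} \\ 1 \end{pmatrix} s_t^{mp}
       + \begin{pmatrix} 1 \\ a_{21} \end{pmatrix} s_t^y \right]
= \begin{pmatrix} \text{lags} \\ \text{lags} \end{pmatrix}
+ \underbrace{\frac{1}{1 - a_{12} a_{21}}
  \begin{pmatrix} a_{12} & 1 \\ 1 & a_{21} \end{pmatrix}}_{\tilde B}
  \begin{pmatrix} s_t^{mp} \\ s_t^y \end{pmatrix}
\]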
Note that $B$ and $\tilde B$ differ only in the ordering of the columns. Since the reduced
form shocks have not changed, $\Sigma$ has not changed, implying $\Sigma = BB' = \tilde B \tilde B'$. Put
differently, the ordering of the shocks in the $B$ form of the model is arbitrary.3
Nevertheless, it is not arbitrary in the $A$ form, i.e. you cannot reshuffle the $B$ matrix
and then expect an $\tilde A$ specification in which $s_t^y$ enters the equation of the federal
funds rate and $s_t^{mp}$ enters the equation of output. To see this, compute the $A$ form
of the reshuffled model and note that
\[
\underbrace{\begin{pmatrix} -a_{21} & 1 \\ 1 & -a_{12} \end{pmatrix}}_{\tilde A}
\begin{pmatrix} y_t \\ i_t \end{pmatrix}
= \begin{pmatrix} \text{lags} \\ \text{lags} \end{pmatrix}
+ \begin{pmatrix} s_t^{mp} \\ s_t^y \end{pmatrix}
\]
In short, the order of the shocks in the $B$ form is arbitrary, and reshuffling the columns
of $B$ and the corresponding shocks implies reshuffling the rows of the corresponding
$A$ model.
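3 Note that
\[
BB' = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
      \begin{pmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{pmatrix}
    = \begin{pmatrix} b_{11}^2 + b_{12}^2 & b_{11} b_{21} + b_{12} b_{22} \\
                      b_{11} b_{21} + b_{12} b_{22} & b_{21}^2 + b_{22}^2 \end{pmatrix}
\]
\[
\tilde B \tilde B' = \begin{pmatrix} b_{12} & b_{11} \\ b_{22} & b_{21} \end{pmatrix}
                     \begin{pmatrix} b_{12} & b_{22} \\ b_{11} & b_{21} \end{pmatrix}
                   = \begin{pmatrix} b_{12}^2 + b_{11}^2 & b_{12} b_{22} + b_{11} b_{21} \\
                                     b_{12} b_{22} + b_{11} b_{21} & b_{22}^2 + b_{21}^2 \end{pmatrix}
                   = BB'
\]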
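The effect of reshuffling can be verified numerically. A minimal sketch in Python with NumPy follows; the values of `a12` and `a21` are illustrative, and the matrix `A` is the one implied by inverting the $B$ matrix of the output and federal funds rate example.

```python
import numpy as np

a12, a21 = 0.3, -0.5                      # illustrative structural coefficients
A = np.array([[1.0, -a12],
              [-a21, 1.0]])               # A form implied by the example's B matrix
B = np.linalg.inv(A)

B_tilde = B[:, ::-1]                      # swap the two columns of B
A_tilde = np.linalg.inv(B_tilde)          # A form of the reshuffled model

print(B @ B.T)
print(B_tilde @ B_tilde.T)                # same covariance matrix
print(A_tilde)                            # rows of A in reversed order
```

The covariance matrix is unchanged by the column swap, while the rows of the $A$ matrix are swapped accordingly.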
The implication of this remark for the use of external instruments in VARs is the
following: the ratio $\mu_1$, or equivalently the ratio $\mu_2$, is not the ratio of the elements
of a particular column of $B$, but of an arbitrarily positioned column of the $B$ matrix.
Calling $j$ the arbitrary position of the shock of interest within the vector of structural
shocks $s_t$, the column (row) vector in the $B$ ($A$) matrix for the estimation of impulse
responses (structural shocks) will be the $j$-th column (row).
\[ E(r_t m_t) = b_1 \phi \]
Since we can get an estimate of the left-hand side, we can estimate $b_1$ up to a scale,
i.e. up to the unknown $\phi$ on the right-hand side, which scales our
estimate of $b_1$ up or down.
While this might look like a limitation, it is actually a strength, because it allows us
to get around a measurement error problem. To see this, consider again the model
that links the reduced-form shocks to the structural shocks
\[ r_{1,t} = b_{11} s_{a,t} + b_{12} s_{b,t} \]
\[ r_{2,t} = b_{21} s_{a,t} + b_{22} s_{b,t} \]
The identification problem consists of estimating the elements $b_{ij}$ using the reduced
form shocks $r_{j,t}$ and an instrument. Suppose we have the true realizations of the
structural shock $s_{a,t}$. Then the regression of $r_{i,t}$ on $s_{a,t}$ leaves the remaining
component $b_{i2} s_{b,t}$ in the error term and delivers a consistent estimate of $b_{i1}$. The estimate,
in fact, equals
\[
\hat b_{i1} = \frac{\sum_t r_{i,t} s_{a,t}}{\sum_t s_{a,t}^2}
= \frac{\sum_t (b_{i1} s_{a,t} + b_{i2} s_{b,t}) s_{a,t}}{\sum_t s_{a,t}^2}
= b_{i1} + b_{i2} \frac{\sum_t s_{b,t} s_{a,t} / T}{\sum_t s_{a,t}^2 / T}
\to b_{i1}
\]
since $s_{a,t}$ and $s_{b,t}$, being structural shocks, should be uncorrelated. In this case, we
consistently estimate the single parameters $b_{11}$ and $b_{21}$ rather than just their ratio.
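Consider instead the case in which we have a noisy measure of $s_{a,t}$ rather than
the true values. In particular, let us write this measure as
\[ m_t = s_{a,t} + \epsilon_t \quad \text{with } E(\epsilon_t s_{b,\tau}) = 0, \ \forall \tau \]
Then the regression of $r_{i,t}$ on $m_t$ estimates the model
\[ r_{i,t} = b_{i1} m_t + q_t, \qquad q_t = b_{i2} s_{b,t} - b_{i1} \epsilon_t \]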
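The OLS estimate of $b_{i1}$ does not converge to $b_{i1}$ because the correlation between $m_t$
and $q_t$ is not zero, a simple manifestation of the measurement error bias. Nevertheless,
the ratio of the inconsistent estimates gives a consistent estimate of the ratio. To see this,
rewrite first the estimate as
\[
\hat b_{i1} = \frac{\sum_t r_{i,t} m_t}{\sum_t m_t^2}
= \frac{\sum_t (b_{i1} s_{a,t} + b_{i2} s_{b,t}) m_t}{\sum_t m_t^2}
= \frac{b_{i1} \sum_t s_{a,t} m_t + b_{i2} \sum_t s_{b,t} m_t}{\sum_t m_t^2}
\]
Dividing numerator and denominator by $T$, $\sum_t s_{b,t} m_t / T \to 0$ and
$\sum_t s_{a,t} m_t / T \to \phi$, so that $\hat b_{i1} \to b_{i1} \phi / E(m_t^2)$. The scale factor
is common to both equations and cancels in the ratio, hence $\hat b_{21}/\hat b_{11} \to b_{21}/b_{11}$.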
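The contrast between the two regressions can be illustrated with simulated data. The sketch below, in Python with NumPy, assumes a measurement error with unit variance that is independent of both structural shocks; the matrix `B` and the sample size are also illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200_000
B = np.array([[1.0, 0.5],
              [0.4, 2.0]])                       # illustrative B matrix
s = rng.standard_normal((T, 2))                  # structural shocks s_a, s_b
r = s @ B.T                                      # reduced form shocks
m = s[:, 0] + rng.standard_normal(T)             # noisy measure of s_a (assumed noise)

# Regression on the true shock: consistent for b_11 and b_21
b_true_shock = r.T @ s[:, 0] / (s[:, 0] @ s[:, 0])

# Regression on the noisy measure: both coefficients are attenuated
b_noisy = r.T @ m / (m @ m)

print("on true shock :", b_true_shock)
print("on noisy proxy:", b_noisy)
print("ratio         :", b_noisy[1] / b_noisy[0])
```

Under these assumptions $E(m_t^2) = 2$ and $\phi = 1$, so the coefficients from the noisy regression are roughly half the true values, while their ratio stays close to $b_{21}/b_{11}$.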
On the contrary, the following measurement error would not imply consistent esti-
mates of the relative impulse vector
7 Recovering the full B matrix
So far we have discussed a method to estimate the ratio $\mu$ of the parameters in the
arbitrary column $i$ of the $B$ matrix. Let us now recover the full column $i$. This can
be useful, for instance, to study the impulse responses to a one standard deviation
shock. The key piece of information to use is the covariance restrictions of the model.
Note that so far we have made no use of these restrictions, but only of the instrument
$m_t$. Using this additional piece of information allows us to estimate the impulse vector
$b$ not only up to a scale, but in absolute terms. Nevertheless, the vector will still be
estimated up to a sign convention.
Consider the VAR with 2 variables studied so far. The covariance restrictions are
expressed in the system $\Sigma = BB'$. More formally, the system is
\[ \Sigma_{11} = b_{11}^2 + b_{12}^2 \qquad (6) \]
\[ \Sigma_{12} = b_{11} b_{21} + b_{12} b_{22} \qquad (7) \]
\[ \Sigma_{22} = b_{21}^2 + b_{22}^2 \qquad (8) \]
In a standard VAR we aim to solve for $b_{ij}$ using $\Sigma_{ij}$. Here, instead, we make use of
the additional information included in the ratio $\mu$, which has already been estimated.
Without loss of generality, consider the ratio $\mu = b_{21}/b_{11}$. All we need to
solve for in order to recover the entire vector $b_1 = (b_{11}, b_{21})'$ is $b_{11}$.
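This parameter can be estimated as follows. Substituting out $b_{21} = \mu b_{11}$, the
system becomes
\[ \Sigma_{11} = b_{11}^2 + b_{12}^2 \]
\[ \Sigma_{12} = \mu b_{11}^2 + b_{12} b_{22} \]
\[ \Sigma_{22} = \mu^2 b_{11}^2 + b_{22}^2 \]
Solving the first equation for $b_{11}^2$ and substituting it into the second gives
\[ b_{12} b_{22} - \mu b_{12}^2 = \Sigma_{12} - \mu \Sigma_{11} \]
\[ b_{12} (b_{22} - \mu b_{12}) = \Sigma_{12} - \mu \Sigma_{11} \]
\[ b_{12}^2 (b_{22} - \mu b_{12})^2 = (\Sigma_{12} - \mu \Sigma_{11})^2 \]
\[ b_{12}^2 = \big( \underbrace{b_{22}^2 + \mu^2 b_{12}^2 - 2 b_{22} b_{12} \mu}_{\gamma} \big)^{-1} (\Sigma_{12} - \mu \Sigma_{11})^2 \]
\[ b_{12}^2 = \gamma^{-1} (\Sigma_{12} - \mu \Sigma_{11})^2 \]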
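We do have $(\Sigma_{12} - \mu\Sigma_{11})^2$, but we don't have $\gamma$, which includes the parameter $b_{12}$
that we are after. Nevertheless, we can rewrite $\gamma$ as a function of known parameters
in the following way. First, note that the term $b_{22} b_{12}$ appears in equation (7). Take
equation (8) and subtract equation (7) times $\mu$ to get
\[ b_{22}^2 - \mu b_{22} b_{12} = \Sigma_{22} - \mu \Sigma_{12} \]
Then, to get rid of $b_{22}^2$, note that it appears in equation (8). Start from equation (8)
and subtract equation (6) times $\mu^2$, obtaining
\[ b_{22}^2 - \mu^2 b_{12}^2 = \Sigma_{22} - \mu^2 \Sigma_{11} \]
Twice the first expression minus the second gives $\gamma = \Sigma_{22} - 2\mu\Sigma_{12} + \mu^2\Sigma_{11}$.
Having $\gamma$, solve for $b_{12}^2$, then solve for $b_{11}$. This allows us to compute the full column of
the $B$ matrix corresponding to the $s_{a,t}$ shock.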
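The steps for the 2-variable case can be sketched as follows in Python with NumPy. The closed form $\gamma = \Sigma_{22} - 2\mu\Sigma_{12} + \mu^2\Sigma_{11}$ used in the code follows from combining equations (6) to (8) with $b_{21} = \mu b_{11}$; it is worked out for this illustration rather than quoted from a specific equation of the notes. The function name `recover_first_column` and the matrix `B` are hypothetical, and the positive root is taken for $b_{11}$.

```python
import numpy as np


def recover_first_column(Sigma, mu):
    """Recover b_1 = (b11, b21)' from Sigma = BB' and mu = b21 / b11.

    Uses gamma = (b22 - mu * b12)^2 written in terms of Sigma and mu,
    and takes the positive root for b11 (sign convention).
    """
    gamma = Sigma[1, 1] - 2.0 * mu * Sigma[0, 1] + mu**2 * Sigma[0, 0]
    b12_sq = (Sigma[0, 1] - mu * Sigma[0, 0]) ** 2 / gamma
    b11 = np.sqrt(Sigma[0, 0] - b12_sq)          # from Sigma11 = b11^2 + b12^2
    return np.array([b11, mu * b11])


B = np.array([[1.0, 0.5],
              [0.4, 2.0]])                        # illustrative B matrix
Sigma = B @ B.T                                   # population covariance of r_t
mu = B[1, 0] / B[0, 0]                            # population ratio b21 / b11

b1 = recover_first_column(Sigma, mu)
print(b1)                                         # compare with B[:, 0]
```

In applied work `Sigma` and `mu` would be replaced by their estimates, and the sign of the recovered column would be set according to the desired direction of the shock.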
the relative impulse vector, if, when solving for b11 with the positive sign, a monetary
shock of size 1 implies a decrease of the federal funds rate, then one can generate a
monetary tightening by simply giving the same shock and by taking the solution to
b11 with the negative sign.
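Let us conclude this section by showing the derivations for the case in which the
VAR has more than 2 variables. The only difference with respect to the case with 2
variables is that some operations carried out above with scalars must now be rethought
in matrix notation. For a VAR with $n + 1$ variables, write the $B$ matrix as
\[ B = \begin{pmatrix} b_{11} & b_{12}' \\ b_{21} & B_{22} \end{pmatrix} \]
$b_{11}$ is the scalar in the (1,1) entry of the $B$ matrix, $b_{21}$ is the $n \times 1$ vector containing
the first column of the $B$ matrix after excluding its first entry, $b_{12}$ is the $n \times 1$ vector
containing the first row of the $B$ matrix after excluding its first entry (written as a column
vector, so that $b_{12}'$ is a row vector), and $B_{22}$ is the $n \times n$ matrix containing the remaining
elements of the $B$ matrix.
Write the system of equations as
\[ \sigma_{11} = b_{11}^2 + b_{12}' b_{12} \]
\[ \sigma_{21} = b_{21} b_{11} + B_{22} b_{12} \]
\[ \Sigma_{22} = b_{21} b_{21}' + B_{22} B_{22}' \]
where $\sigma_{11}$, $\sigma_{21}$ and $\Sigma_{22}$ are the corresponding blocks of $\Sigma$.
The external instrument will deliver the vector $\mu = b_{21}/b_{11}$. We can substitute out
$b_{21} = \mu b_{11}$, obtaining
\[ \sigma_{11} = b_{11}^2 + b_{12}' b_{12} \]
\[ \sigma_{21} = \mu b_{11}^2 + B_{22} b_{12} \]
\[ \Sigma_{22} = \mu \mu' b_{11}^2 + B_{22} B_{22}' \]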
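Hence
\[
\begin{aligned}
b_{12}' b_{12}
&= (\sigma_{21} - \sigma_{11}\mu)' \left[ (B_{22} - \mu b_{12}')^{-1} \right]' (B_{22} - \mu b_{12}')^{-1} (\sigma_{21} - \sigma_{11}\mu) \\
&= (\sigma_{21} - \sigma_{11}\mu)' \big[ \underbrace{(B_{22} - \mu b_{12}')(B_{22} - \mu b_{12}')'}_{\Gamma} \big]^{-1} (\sigma_{21} - \sigma_{11}\mu)
\end{aligned}
\]
with
\[ \Gamma = \Sigma_{22} - \sigma_{21}\mu' - \mu\sigma_{21}' + \sigma_{11}\mu\mu' \]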
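The matrix version can be sketched in Python with NumPy as follows. The expression used for $\Gamma$, namely $\Sigma_{22} - \sigma_{21}\mu' - \mu\sigma_{21}' + \sigma_{11}\mu\mu'$, is obtained here by expanding $(B_{22} - \mu b_{12}')(B_{22} - \mu b_{12}')'$ with the covariance restrictions and should be read as a derivation made for this illustration. The function name `recover_column` and the 3-variable matrix `B` are hypothetical.

```python
import numpy as np


def recover_column(Sigma, mu):
    """First column of B from Sigma = BB' and the vector mu = b21 / b11.

    Takes the positive root for b11 (sign convention).
    """
    sigma11 = Sigma[0, 0]
    sigma21 = Sigma[1:, 0]
    Sigma22 = Sigma[1:, 1:]
    d = sigma21 - sigma11 * mu                       # equals (B22 - mu b12') b12
    Gamma = (Sigma22 - np.outer(sigma21, mu) - np.outer(mu, sigma21)
             + sigma11 * np.outer(mu, mu))
    b12_b12 = d @ np.linalg.solve(Gamma, d)          # b12' b12
    b11 = np.sqrt(sigma11 - b12_b12)                 # from sigma11 = b11^2 + b12' b12
    return np.concatenate([[b11], mu * b11])


B = np.array([[1.0, 0.5, -0.3],
              [0.4, 2.0, 0.1],
              [-0.2, 0.3, 1.5]])                     # illustrative B matrix
Sigma = B @ B.T
mu = B[1:, 0] / B[0, 0]

b1 = recover_column(Sigma, mu)
print(b1)                                            # compare with B[:, 0]
```

With two variables the function reduces to the scalar formulas of the 2-variable case.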
\[ s_t = B^{-1} r_t = A r_t \]
Since the procedure discussed so far does not deliver the full $B$ matrix but only
a column of it, this cannot be done. Nevertheless, we could recover the shocks of
interest if we had only one row of $A$, as long as it is the row corresponding to the
shock of interest.
More precisely, the identification of the impulse response corresponding to shock
$s_{a,t}$ gave us the vector $b$, where the position of this vector within the $B$ matrix is
irrelevant. What we are after is the row vector $a$ of the $A = B^{-1}$ matrix corresponding
to the position of the $b$ column in $B$. When we have $a$, we can estimate the
structural shocks of interest as
\[ s_{a,t} = a' r_t \]
$a$ gives the weights we should use to combine the reduced form shocks in $r_t$ in order
to obtain the shock of interest.
The following procedure, developed for example in Olea et al. (2012), can be used
when having one instrument for one shock of interest. The first step is to estimate
$\phi$, where we had that
\[ \phi = E(m_t s_{a,t}) \]
We started the notes discussing the equality
\[ E(r_t m_t) = b\phi \]
In this expression, $E(r_t m_t)$ can be computed from the data. $b$, instead, can be
estimated using the procedure discussed until now. This means that we can recover
an estimate $\hat\phi$ using any of the equations in the above equality, taking the ratio
of the element in $E(r_t m_t)$ over the corresponding element in the $b$ vector.
Then, consider the regression of the instrument $m_t$ on the reduced form shocks
$r_t$. Note that this is the reverse of the regression we considered in Section 2 to
estimate the relative impulse vector. The model is
\[ m_t = r_t' \iota + \text{error} \]
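\[
\begin{aligned}
\hat\iota &= (R'R)^{-1} R'm \\
&= \left( \frac{\sum_{t=1}^T r_t r_t'}{T} \right)^{-1} \frac{\sum_{t=1}^T r_t m_t}{T} \\
&= \left( B \, \frac{\sum_{t=1}^T s_t s_t'}{T} \, B' \right)^{-1} \frac{\sum_{t=1}^T r_t m_t}{T} \\
&\to (BB')^{-1} E(r_t m_t) \\
&= B'^{-1} B^{-1} b \phi \\
&= B'^{-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \phi \\
&= A' \begin{pmatrix} 1 \\ 0 \end{pmatrix} \phi \\
&= a\phi
\end{aligned}
\]
\[ (R'R)^{-1} R'm \to a\phi \]
\[ \left( \frac{\sum_{t=1}^T r_t r_t'}{T} \right)^{-1} \frac{\sum_{t=1}^T r_t m_t}{T} \to a\phi \]
\[ \Sigma^{-1} \hat E(r_t m_t) \to a\phi \]
which implies
\[ \hat a = \hat\phi^{-1} \Sigma^{-1} \hat E(r_t m_t) \]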
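The two steps (estimate $\phi$, then rescale the regression of $m_t$ on $r_t$) can be sketched in Python with NumPy. The simulated data, the instrument process and the way $\hat b$ is obtained (the 2-variable closed form with $\gamma = \Sigma_{22} - 2\mu\Sigma_{12} + \mu^2\Sigma_{11}$ and the positive root for $b_{11}$) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200_000
B = np.array([[1.0, 0.5],
              [0.4, 2.0]])                            # illustrative B matrix
s = rng.standard_normal((T, 2))
r = s @ B.T
m = 0.8 * s[:, 0] + rng.standard_normal(T)            # assumed instrument process

Sigma_hat = r.T @ r / T                               # sample covariance of r_t
moments = r.T @ m / T                                 # sample analogue of E(r_t m_t)

# Estimate b from mu and the covariance restrictions (2-variable closed form)
mu = moments[1] / moments[0]
gamma = Sigma_hat[1, 1] - 2 * mu * Sigma_hat[0, 1] + mu**2 * Sigma_hat[0, 0]
b12_sq = (Sigma_hat[0, 1] - mu * Sigma_hat[0, 0]) ** 2 / gamma
b11_hat = np.sqrt(Sigma_hat[0, 0] - b12_sq)
b_hat = np.array([b11_hat, mu * b11_hat])

# Step 1: phi from one element of E(r_t m_t) = b * phi
phi_hat = moments[0] / b_hat[0]

# Step 2: coefficients of the regression of m on r, rescaled by phi
a_hat = np.linalg.solve(Sigma_hat, moments) / phi_hat

s_a_hat = r @ a_hat                                   # estimated structural shock
print(a_hat, np.linalg.inv(B)[0])
print(np.corrcoef(s_a_hat, s[:, 0])[0, 1])
```

In this simulation the estimated weights are close to the first row of $B^{-1}$, and the recovered series is almost perfectly correlated with the true shock $s_{a,t}$.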
The above methodology allows us to estimate the $a$ vector when we have one instrument
for the shock of interest. When one has $g$ instruments things change, since we
would not have a single condition $E(r_t m_t) = b\phi$, but $g$ conditions $E(r_t m_{j,t}) = b\phi_j$,
$j = 1, 2, ..., g$. In Section 2 we briefly mentioned that the 2SLS approach allows us to
combine all these moment conditions to obtain a single estimate of the $b$ vector.
If we then want to estimate the corresponding $a$ vector we cannot proceed as just
discussed, since this would require making use of only one of the moment conditions
in $E(r_t m_{j,t}) = b\phi_j$. What we can do instead is to start from the estimated $b$ vector,
and combine it with the covariance restrictions of the model.
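More precisely, we make use of the following result regarding the inverse of a
partitioned matrix. Given
\[ B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \]
the inverse of $B$ is
\[
B^{-1} = \begin{pmatrix}
(B_{11} - B_{12} B_{22}^{-1} B_{21})^{-1} & -(B_{11} - B_{12} B_{22}^{-1} B_{21})^{-1} B_{12} B_{22}^{-1} \\
-(B_{22} - B_{21} B_{11}^{-1} B_{12})^{-1} B_{21} B_{11}^{-1} & (B_{22} - B_{21} B_{11}^{-1} B_{12})^{-1}
\end{pmatrix}
\]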
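The partitioned inverse formula can be checked against a direct numerical inverse. A minimal sketch in Python with NumPy; the function name `partitioned_inverse` and the matrix `B` are illustrative, and the formula requires the diagonal blocks and their Schur complements to be invertible.

```python
import numpy as np


def partitioned_inverse(B, k):
    """Inverse of B from its blocks, with B11 the leading k x k block."""
    B11, B12 = B[:k, :k], B[:k, k:]
    B21, B22 = B[k:, :k], B[k:, k:]
    inv = np.linalg.inv
    S1 = inv(B11 - B12 @ inv(B22) @ B21)      # upper-left block of the inverse
    S2 = inv(B22 - B21 @ inv(B11) @ B12)      # lower-right block of the inverse
    return np.block([[S1, -S1 @ B12 @ inv(B22)],
                     [-S2 @ B21 @ inv(B11), S2]])


B = np.array([[1.0, 0.5, -0.3],
              [0.4, 2.0, 0.1],
              [-0.2, 0.3, 1.5]])              # illustrative B matrix
print(partitioned_inverse(B, 1))
print(np.linalg.inv(B))
```

With `k = 1` the first row of the result is the row of $A = B^{-1}$ associated with the first column of $B$.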
In the application developed here, we have solved for the first column of the $B$ matrix
and need to derive the first row of its inverse. Using again the notation from Section
7, write the $B$ matrix as
\[ B = \begin{pmatrix} \hat b_{11} & b_{12}' \\ \hat b_{21} & B_{22} \end{pmatrix} \]
where the hat indicates the elements that we have. We want to solve for the first row of
$A = B^{-1}$ as a function of, at most, $\hat b_{11}$, $\hat b_{21}$, $\Sigma$.
Applying the rule for the inverse of a partitioned matrix, we get that the $a$ vector,
written as a column, equals
\[ a = (\hat b_{11} - b_{12}' B_{22}^{-1} \hat b_{21})^{-1} \begin{pmatrix} 1 \\ -B_{22}'^{-1} b_{12} \end{pmatrix} \qquad (14) \]
This means that if we can solve for $B_{22}'^{-1} b_{12}$, or equivalently for $b_{12}' B_{22}^{-1}$, we get $a$,
irrespective of how many instruments were used to obtain the first column of the $B$
matrix. Note that we do not need to solve for $b_{12}$ and $B_{22}$ separately.
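Equation (14) can be checked numerically when the full $B$ matrix is known. The sketch below, in Python with NumPy, uses an illustrative 3-variable `B`. It also compares the result with $\Sigma^{-1} b$, which equals $a$ because $(BB')^{-1} b = B'^{-1} B^{-1} b$ and $B^{-1} b$ is the first unit vector; this shortcut is noted here as a cross-check and is not the route followed in the notes.

```python
import numpy as np

B = np.array([[1.0, 0.5, -0.3],
              [0.4, 2.0, 0.1],
              [-0.2, 0.3, 1.5]])                 # illustrative B matrix
b11, b21 = B[0, 0], B[1:, 0]                     # first column: the elements we have
b12, B22 = B[0, 1:], B[1:, 1:]                   # remaining elements of B

w = np.linalg.solve(B22.T, b12)                  # B22'^{-1} b12
scale = 1.0 / (b11 - w @ b21)                    # (b11 - b12' B22^{-1} b21)^{-1}
a = scale * np.concatenate([[1.0], -w])          # equation (14), as a column

Sigma = B @ B.T
a_from_sigma = np.linalg.solve(Sigma, B[:, 0])   # cross-check: Sigma^{-1} b

print(a)
print(np.linalg.inv(B)[0])
print(a_from_sigma)
```

All three printed vectors coincide, and only the combination $B_{22}'^{-1} b_{12}$ enters the formula.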
The term $b_{12}' B_{22}^{-1}$ can be computed starting from the system of equations in
Section 7. In particular, start from
Equation (15) can be rearranged as
References
Mertens, K. and M. O. Ravn (2013). The dynamic effects of personal and corporate
income tax changes in the United States. The American Economic Review 103 (4),
1212–1247.