Koop 2009
Article info
Article history: Received 18 August 2007; Accepted 25 November 2008; Available online 24 December 2008
JEL classification: C11; C32; E52
Keywords: Structural VAR; Monetary policy; Bayesian; Mixture innovation model; Time-varying parameter model

Abstract
This paper investigates whether the monetary transmission mechanism has changed or whether apparent changes are due to changes in the volatility of exogenous shocks. Also, the question of whether any changes have been gradual or abrupt is considered. A mixture innovation model is used which extends the class of time-varying vector autoregressive models with stochastic volatility. The advantage of our extension is that it allows us to estimate whether, where, when and how parameter change is occurring. Our empirical results indicate that the transmission mechanism, the volatility of exogenous shocks and the correlations between exogenous shocks are all changing.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Questions of interest to policymakers typically involve the inter-relationships between several macroeconomic
variables. To investigate such questions, it is common to build a macroeconomic model (e.g. based on a vector
autoregressive, VAR, model) where exogenous shocks impact on the variables under study. The manner in which the
exogenous variables affect the variables of interest is referred to as the transmission mechanism. Traditionally, estimation
of the transmission mechanism (or features such as impulse responses which shed light on it) was considered a major goal
of many macroeconomic papers. However, empirical researchers have realized two important things. First, the
transmission mechanism may not be constant over time. Second, the way the exogenous shocks are generated (and, in
particular, their variance) can change over time.
Consider, for instance, U.S. monetary policy and the question of whether the macroeconomic events of the 1970s were
due to bad policy or bad luck. Some authors (e.g. Boivin and Giannoni, 2006; Cogley and Sargent, 2001; Lubik and
Schorfheide, 2004) have argued that the way the Fed reacted to inflation has changed over time (e.g. under the Volcker and
Greenspan chairmanship, the Fed was more aggressive in fighting inflation pressures than under Burns). This is the ‘‘bad
policy’’ story and is an example of a change in the transmission mechanism. Others (e.g. Sims and Zha, 2006) have
emphasized that the variance of the exogenous shocks has changed over time and that this alone may explain many
All authors are Fellows of the Rimini Centre for Economic Analysis. The authors would like to thank the Leverhulme Trust for financial support under
Grant F/00 273/J.
Corresponding author.
E-mail address: [email protected] (G. Koop).
0165-1889/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.jedc.2008.11.003
ARTICLE IN PRESS
998 G. Koop et al. / Journal of Economic Dynamics & Control 33 (2009) 997–1017
apparent changes in monetary policy. This is the ‘‘bad luck’’ story. Yet others (e.g. Primiceri, 2005) have found that both the
transmission mechanism and the variance of the exogenous shocks have changed over time.
This brief (and very incomplete) discussion of the literature is intended to motivate the basic point that an
understanding of monetary policy should be based on multivariate models where the transmission mechanism and the
variances of the exogenous shocks can both potentially change over time. Another important issue is whether any such
change is gradual or abrupt. Many models have been used to investigate such issues in the literature. However, most of
them (including some of the DSGE-based models) use extended versions of VARs as building blocks. There is a large
literature (e.g. George et al., 2008) which points out that even standard VARs can be over-parameterized and tries to find
various ways of minimizing this problem. When one turns to extensions of VARs with time-varying parameters, such
over-parameterization worries become even more serious. Such considerations motivate the present paper. In it, we re-examine
some of the existing empirical literature on U.S. monetary policy using a class of models which is flexible enough to nest
many of the existing specifications, but is more tightly parameterized in key dimensions. Most importantly, it allows us to
estimate the form and nature of how parameters (and, thus, the transmission mechanism) evolve over time.
Our model is based on a time-varying VAR similar to that used in Primiceri (2005) or Cogley and Sargent (2001, 2005),
but extends this type of model in important ways. Like Primiceri (2005) and Cogley and Sargent (2005), we have a
multivariate model where both the transmission mechanism and the error covariance matrix can change over time.
However, unlike Primiceri (2005) and the related time-varying parameter VAR (TVP-VAR) literature (e.g. Cogley and
Sargent, 2001, 2005; Cogley et al., 2005), we do not impose as many restrictions on the time variation of the parameters.
Instead, to model the change in parameters over time, we draw on the mixture innovation approach of Gerlach et al. (2000)
and Giordani and Kohn (2008) as a way of letting the data speak about how parameters evolve as well as keeping the model
more tightly parameterized in key dimensions. Exact details will be provided in the next section. But, to motivate the basic
ideas, note that there are two main approaches to modelling changes in parameters over time: one can estimate a model
with a small number of structural breaks (usually one or two). Alternatively, one can estimate a time-varying parameter
(TVP) model where the parameters are allowed to change with each new observation, usually according to a random walk.
A TVP model can be interpreted as imposing $T-1$ breaks in a sample of size $T$. Thus, we have two extremes: models with
very few (but usually large) breaks or those with many (usually small) breaks. The approach adopted in this paper allows
for the estimation of the number of breaks. Thus, we nest the two extreme cases and can let the data tell us if there are few
(or no) changes in the parameters or whether change is constant and gradual. Another advantage of our approach relative
to the TVP-VAR literature is that, by estimating the parameters to be constant over periods, we can obtain a more
parsimonious model, mitigating concerns about over-parameterization. Our model also allows for the three different blocks
of parameters we work with (the VAR coefficients, a block which relates to the error variances and another relating to error
covariances) to evolve in completely different ways (or even for some or all blocks not to change at all). Thus, we can
estimate whether and how change occurs in a very flexible manner, as opposed to assuming a specific model with
parameter change of a particular sort.
After developing our model and appropriate Bayesian econometric methods, we present empirical results. We work
with a standard system involving inflation, unemployment and interest rates. We present results relating to the
transmission mechanism and the volatility of exogenous shocks. We find evidence of gradual change in all of our
parameters and reinforce the findings of Primiceri (2005). Relative to the existing literature, a crude summary of our results
might run as follows. The model of Primiceri (2005) is best, but the model of Cogley and Sargent (2005) is not too bad
(although there are some restrictions in this model which are rejected, these have only minor macroeconomic
implications). Models which only have time variation in the error covariance matrix (i.e. with constant VAR coefficients)
are a bit worse. They accurately recover patterns in the exogenous shocks, but can be misleading about the transmission
mechanism. However, models with a constant error covariance matrix such as Cogley and Sargent (2001) or a traditional
VAR are strongly rejected and can yield seriously misleading policy inferences.
2. The models
The models used in this paper all begin with a state space model involving a measurement equation:
$$y_t = Z_t \alpha_t + \varepsilon_t \tag{1}$$
and a state equation:
$$\alpha_{t+1} = \alpha_t + R_t \eta_t, \tag{2}$$
where $y_t$ is a $p \times 1$ vector of observations on the dependent variables, $\alpha_t$ an $m \times 1$ vector of states (in our case, these are
the VAR coefficients), $\varepsilon_t$ are independent $N(0, H_t)$ random vectors and $\eta_t$ are independent $N(0, Q_t)$ random vectors for
$t = 1, \ldots, T$. The errors in the two equations, $\varepsilon_t$ and $\eta_s$, are independent of one another for all $t$ and $s$.$^1$ $Z_t$ is the appropriate
$p \times m$ matrix of data on explanatory variables. In our case, we are working with extensions of VARs and, hence, each row of
1
This is the standard assumption, but it can easily be relaxed if desired.
$Z_t$ contains lags of all dependent variables and an intercept and other deterministic terms. For future reference, note that
we will use $R_t$ to control the structural breaks in our model.
This model, which is a familiar one in the state space literature, nests a wide range of commonly used models. A VAR is
obtained if we set $R_t = 0_m$ for all $t$ and, thus, the VAR coefficients are constant over time. TVP-VARs of the sort used, e.g., in
Cogley and Sargent (2001) are obtained by setting $R_t = I_m$ for all $t$. Technical details on how Bayesian econometric methods
can be used to carry out inference in this model are provided in the Appendix. Suffice it to note here that a great advantage
of staying in the framework of the state space model given by (1) and (2) is that standard methods of posterior simulation
are available. In particular, Markov chain Monte Carlo (MCMC) algorithms can be used to draw the states, $\alpha = (\alpha_1', \ldots, \alpha_T')'$.
In our empirical work, we use the method of Durbin and Koopman (2002).
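The nesting described above can be illustrated with a minimal simulation sketch. It is not the estimation method of the paper: the scalar dependent variable, the regressors (an intercept and one lag), the initial state of 0.5 and the innovation standard deviations are all hypothetical choices, and the switch `r` stands in for a scalar version of $R_t$.

```python
import random

def simulate_states(T, m, r, q_sd, seed=0):
    """Simulate alpha_{t+1} = alpha_t + r * eta_t with eta_t ~ N(0, q_sd^2 I).

    r = 0 keeps the coefficients constant (a standard VAR);
    r = 1 lets them follow a random walk (a TVP model).
    """
    rng = random.Random(seed)
    alpha = [0.5] * m                      # hypothetical initial state
    path = [list(alpha)]
    for _ in range(T - 1):
        alpha = [a + r * rng.gauss(0.0, q_sd) for a in alpha]
        path.append(list(alpha))
    return path

def simulate_observations(path, h_sd, seed=1):
    """Generate a scalar y_t = Z_t alpha_t + eps_t with Z_t = (1, y_{t-1})."""
    rng = random.Random(seed)
    y, y_lag = [], 0.0
    for alpha in path:
        z = [1.0, y_lag]                   # intercept and one lag
        y_t = sum(zi * ai for zi, ai in zip(z, alpha)) + rng.gauss(0.0, h_sd)
        y.append(y_t)
        y_lag = y_t
    return y

constant_path = simulate_states(T=50, m=2, r=0, q_sd=0.05)   # VAR case
tvp_path = simulate_states(T=50, m=2, r=1, q_sd=0.05)        # TVP case
y = simulate_observations(tvp_path, h_sd=0.1)
```

With `r=0` every row of `constant_path` equals the initial state, while with `r=1` the coefficients drift a little each period.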
As discussed in the Introduction, there is strong empirical evidence that volatility issues are important in many
macroeconomic problems. Thus, the error covariance matrix in the measurement equation, $H_t$, should be allowed to vary
over time. To motivate the particular specification we choose, note that the Great Moderation of the business cycle implies
that it is important that the variances of macroeconomic variables should be allowed to change over time. However, many
key aspects of the transmission mechanism relate to the covariances between the errors. For instance, in many models, the
immediate effect of changes in monetary policy on inflation is dependent upon the correlation between the errors in the
interest rate and inflation equations. Thus, it is potentially important to allow for both the error variances and covariances
to change over time.
Following Primiceri (2005), we use a triangular reduction of the measurement error covariance, $H_t$, such that
$$A_t H_t A_t' = \Sigma_t \Sigma_t'$$
or
$$H_t = A_t^{-1} \Sigma_t \Sigma_t' (A_t^{-1})', \tag{3}$$
where $\Sigma_t$ is a diagonal matrix with diagonal elements $\sigma_{j,t}$ for $j = 1, \ldots, p$ and $A_t$ is the lower triangular matrix:
$$A_t = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21,t} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ a_{p1,t} & \cdots & a_{p(p-1),t} & 1 \end{bmatrix}.$$
Note that the assumption that $A_t$ is lower triangular is not an identification assumption (identification will be discussed in
the next section); it is merely a particular way of parameterizing the reduced form covariance matrix. However, as
discussed in Primiceri (2005, Section 3.1), this choice of parameterization means that, in theory, empirical results could be
sensitive to the way the variables are ordered (i.e. due to the prior on the unknown elements of $A_t$ depending on the
ordering). In practice, with this data set, results are very similar for different orderings.
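As a numerical illustration of the triangular reduction in (3), the following sketch builds $H_t$ from a unit lower triangular $A_t$ and the diagonal of $\Sigma_t$, then checks that $A_t H_t A_t' = \Sigma_t \Sigma_t'$. The matrix entries and the helper names are hypothetical and chosen only for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv_unit_lower(A):
    """Invert a lower triangular matrix with ones on its diagonal."""
    p = len(A)
    inv = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for j in range(p):                 # column j solves A x = e_j
        for i in range(j + 1, p):      # forward substitution below the diagonal
            inv[i][j] = -sum(A[i][k] * inv[k][j] for k in range(j, i))
    return inv

def reduced_form_covariance(A, sigma):
    """Return H = A^{-1} Sigma Sigma' (A^{-1})' for Sigma = diag(sigma)."""
    p = len(A)
    A_inv = inv_unit_lower(A)
    sigma_sq = [[sigma[i] ** 2 if i == j else 0.0 for j in range(p)]
                for i in range(p)]
    return matmul(matmul(A_inv, sigma_sq), transpose(A_inv))

# Hypothetical values for a three-variable system at one date t.
A_t = [[1.0, 0.0, 0.0],
       [0.5, 1.0, 0.0],
       [-0.2, 0.3, 1.0]]
sigma_t = [1.0, 0.5, 2.0]
H_t = reduced_form_covariance(A_t, sigma_t)

# Check the defining identity A_t H_t A_t' = Sigma_t Sigma_t'.
check = matmul(matmul(A_t, H_t), transpose(A_t))
```

Here `check` should reproduce the diagonal matrix of squared standard deviations up to rounding, and `H_t` is symmetric by construction.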
To model evolution in $\Sigma_t$ and $A_t$ we must specify additional state equations. For $\Sigma_t$ a stochastic volatility framework can
be used. In particular, if $\sigma_t = (\sigma_{1,t}, \ldots, \sigma_{p,t})'$, $h_{i,t} = \ln(\sigma_{i,t})$, $h_t = (h_{1,t}, \ldots, h_{p,t})'$ then Primiceri uses
$$h_{t+1} = h_t + u_t, \tag{4}$$
where $u_t$ is $N(0, W)$ and is independent over $t$ and of $\varepsilon_t$ and $\eta_t$. Technical details for drawing $h$ in an MCMC algorithm are
given in the Appendix. Here we stress only that standard algorithms are required. In our empirical work, we use the
algorithm of Kim et al. (1998).
To describe the manner in which $A_t$ evolves, we first stack the unrestricted elements by rows into a $p(p-1)/2$ vector as
$a_t = (a_{21,t}, a_{31,t}, a_{32,t}, \ldots, a_{p(p-1),t})'$. These are allowed to evolve according to the state equation:
$$a_{t+1} = a_t + \zeta_t, \tag{5}$$
where $\zeta_t$ is $N(0, C)$ and is independent over $t$ and of $u_t$, $\varepsilon_t$ and $\eta_t$. Following Primiceri (2005), we assume $C$ to have a block
diagonal structure such that the coefficients in $C$ belonging to each equation are independent of one another. With regards
to our MCMC algorithm, this means that we can transform the original measurement equation so that the Durbin and
Koopman (2002) algorithm can be used to draw the states one equation at a time.
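The stacking of the free elements of $A_t$ by rows, and the grouping of those elements by equation that underlies the block diagonal structure of $C$, can be sketched as follows. The function names and the example matrix are hypothetical.

```python
def stack_by_rows(A):
    """Collect the free below-diagonal elements of A row by row."""
    return [A[i][j] for i in range(1, len(A)) for j in range(i)]

def unstack_by_rows(a, p):
    """Rebuild the p x p unit lower triangular matrix from the stacked vector."""
    A = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    values = iter(a)
    for i in range(1, p):
        for j in range(i):
            A[i][j] = next(values)
    return A

def equation_blocks(p):
    """Index ranges of the stacked vector belonging to each row of A."""
    blocks, start = [], 0
    for i in range(1, p):              # row i has i free elements
        blocks.append(range(start, start + i))
        start += i
    return blocks

A_t = [[1.0, 0.0, 0.0],
       [0.5, 1.0, 0.0],
       [-0.2, 0.3, 1.0]]               # hypothetical values
a_t = stack_by_rows(A_t)               # (a21, a31, a32)
blocks = [list(b) for b in equation_blocks(3)]
```

For $p = 3$ the stacked vector has $p(p-1)/2 = 3$ elements, and the blocks are `[0]` for the second equation and `[1, 2]` for the third, so a block diagonal $C$ would consist of one $1 \times 1$ block and one $2 \times 2$ block.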
Primiceri (2005) uses the model given by (1)–(5), which we will refer to as a TVP-VAR with stochastic volatility, in a
study of the evolution of monetary policy.$^2$ Cogley and Sargent (2005) use a similar specification but one which has
parameters comparable to $A_t$ being constant over time. Cogley and Sargent (2001) uses an even more restricted variant of
this specification which does not have multivariate stochastic volatility (i.e. it is a TVP-VAR, but $H_t$ is constant over time).
This (very incomplete) discussion of the related literature is meant to motivate that this class of models is receiving a great
deal of attention by macroeconomists. These are very flexible models which are well-suited for estimating transmission
mechanisms and their evolution over time. However, they contain very many parameters and, thus, there is the risk that
2
To be precise, Primiceri (2005) uses a slightly restricted version of this model where C is assumed to be block diagonal, thus slightly reducing the
number of parameters to estimate.
they will over-fit the data. A common symptom of over-fitting is a model that yields good in-sample performance, but poor
out-of-sample forecast performance. It is perhaps significant that most papers using this sort of model present only
in-sample results. These considerations motivate the development of models which are flexible, but allow for a tighter
parameterization to lessen the risks of over-fitting. It is to such an extension that we now turn.
TVP models imply that coefficients change every time period, although the magnitude of the change in coefficients can
be restricted by the state equation. That is, typically the error covariance matrix in the state equation is estimated to be
small and, thus, $\alpha_{t+1}$ is close to $\alpha_t$. Thus, TVP models work well when the evolution of coefficients is constant but gradual.
Loosely speaking, TVP models can be thought of as ‘‘many small breaks’’ models. In contrast to TVP models, there is a large
literature which assumes that fewer changes in coefficients occur, but when a structural break occurs, the magnitude of the
change in coefficients is unrestricted. Loosely speaking these can be thought of as ‘‘few large breaks’’ models. Examples
include Chib (1998), Maheu and Gordon (2008), Pastor and Stambaugh (2001) and Pesaran et al. (2006). See the discussion
in Koop and Potter (2007) for attempts to reconcile these different approaches.
Mixture innovation models are an increasingly popular class of models for structural breaks.
McCulloch and Tsay (1993) is an early example of such an approach, Gerlach et al. (2000) develops a very efficient
computational algorithm and Giordani and Kohn (2008) applies mixture innovation models to change-point problems.
The mixture innovation aspect arises by allowing some or all of the states and parameters in the previous models to be
determined (up to a set of unknown parameters) by a sequence of Markov random vectors $K = (K_1, \ldots, K_T)'$. As we shall see
shortly, these vectors will control the structural breaks in the model. In our model, we allow for breaks in the VAR
coefficients ($\alpha_t$) and the measurement error covariance matrix ($H_t$). Remember that $H_t = A_t^{-1} \Sigma_t \Sigma_t' (A_t^{-1})'$ and, thus, the
measurement error covariance matrix is parameterized in terms of $\Sigma_t$ and $A_t$. Given that some authors (e.g. Cogley and
Sargent, 2005) assume a time-invariant $A_t$, there does seem to be interest in models with breaks in the error variances, but
not covariances. Accordingly, we allow for an unknown number of breaks in $\alpha_t$, $\Sigma_t$ and $A_t$ and (of empirical importance and
in contrast to much of the literature on structural breaks), we allow for breaks in these three sets of parameters to occur at
different times. Accordingly, we let $K_t = (K_{1t}, K_{2t}, K_{3t})'$ for $t = 1, \ldots, T$, where $K_{1t} \in \{0, 1\}$ controls breaks in the VAR
coefficients, $K_{2t} \in \{0, 1\}$ controls breaks in $\Sigma_t$ and $K_{3t} \in \{0, 1\}$ controls breaks in $A_t$.
We extend the TVP-VAR with stochastic volatility model as follows. In (2), the state equation which controls the
evolution of $\alpha_t$, we set $R_t = K_{1t}$. Note that this implies that there are time periods when the VAR coefficients remain
constant ($K_{1t} = 0$) and times when a break in the VAR coefficients can occur ($K_{1t} = 1$).
Eqs. (4) and (5) are the state equations which control the evolution in $\Sigma_t$ and $A_t$. We generalize these to
$$h_{t+1} = h_t + K_{2t} u_t \tag{6}$$
and
$$a_t = a_{t-1} + K_{3t} \zeta_t. \tag{7}$$
Thus, these vectors of parameters can either remain constant ($K_{2t} = 0$ and/or $K_{3t} = 0$) or a break can occur ($K_{2t} = 1$ and/or
$K_{3t} = 1$). All other assumptions given for the TVP-VAR with stochastic volatility model still hold.
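The gating role of the break indicators can be seen in a short simulation sketch of a scalar state that moves only in periods where the indicator equals one. The break probability, innovation standard deviation and function name are hypothetical, and the indicators are drawn independently each period purely for illustration.

```python
import random

def gated_random_walk(T, p_break, innov_sd, x0=0.0, seed=0):
    """Simulate x_{t+1} = x_t + K_t * innovation with K_t in {0, 1}.

    K_t = 1 with probability p_break. Setting p_break = 0 gives a constant
    parameter and p_break = 1 gives an ordinary random walk.
    """
    rng = random.Random(seed)
    x, path, K = x0, [x0], []
    for _ in range(T - 1):
        k = 1 if rng.random() < p_break else 0
        x = x + k * rng.gauss(0.0, innov_sd)   # no movement when k == 0
        K.append(k)
        path.append(x)
    return path, K

path, K = gated_random_walk(T=200, p_break=0.1, innov_sd=0.5)
n_breaks = sum(K)
```

The simulated path is piecewise constant, changing value only at the dates flagged in `K`, so the number of distinct regimes is determined by the indicators rather than fixed in advance.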
Note that all of the models previously discussed are nested within this mixture innovation extension of the TVP-VAR
with stochastic volatility. If $K_{1t} = K_{2t} = K_{3t} = 1$ for $t = 1, \ldots, T$, then we obtain the TVP-VAR with stochastic volatility of
Primiceri (2005). If $K_{1t} = K_{2t} = K_{3t} = 0$ for $t = 1, \ldots, T$, then we obtain the traditional VAR with constant parameters. If
$K_{1t} = 1$ and $K_{2t} = K_{3t} = 0$ then we obtain a homoskedastic TVP-VAR as in Cogley and Sargent (2001). If $K_{1t} = K_{2t} = 1$ and
$K_{3t} = 0$ then we obtain the model of Cogley and Sargent (2005). Different configurations allow for change-points to occur at
different times. It is also worth stressing that, unlike most other approaches to structural break modelling, the mixture
innovation framework allows us to deal with the case where there is an unknown number of changepoints. As discussed in
Koop and Potter (2008) this is an advantageous feature since imposing the restriction that a fixed number of breaks occur
leads to models with undesirable characteristics.
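The special cases listed above can be summarized as a lookup from a configuration of the indicators to the model it reproduces. This is only an illustrative sketch: the function name and the representation of $K$ as a list of tuples $(K_{1t}, K_{2t}, K_{3t})$ are assumptions, not part of the estimation algorithm.

```python
SPECIAL_CASES = {
    (1, 1, 1): "TVP-VAR with stochastic volatility (Primiceri, 2005)",
    (0, 0, 0): "traditional VAR with constant parameters",
    (1, 0, 0): "homoskedastic TVP-VAR (Cogley and Sargent, 2001)",
    (1, 1, 0): "Cogley and Sargent (2005)",
}

def nested_model(K):
    """Name the nested model implied by a full sequence of indicator triples.

    A special case arises only when the same triple holds in every period;
    any other sequence is a general mixture innovation configuration.
    """
    patterns = set(K)
    if len(patterns) == 1:
        return SPECIAL_CASES.get(next(iter(patterns)),
                                 "general mixture innovation model")
    return "general mixture innovation model"

T = 5
primiceri = nested_model([(1, 1, 1)] * T)
constant_var = nested_model([(0, 0, 0)] * T)
mixed = nested_model([(1, 1, 1), (0, 1, 0), (0, 0, 0), (1, 0, 1), (0, 1, 1)])
```

Any sequence in which the indicators differ across periods falls outside the four special cases, which is the situation the mixture innovation approach is designed to handle.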
Note that this model is more flexible than a standard TVP-VAR with stochastic volatility, since it nests it along with a
myriad of other models. However, it also can be more parsimonious since, if any of $K_{1t}$, $K_{2t}$ or $K_{3t}$ equals zero, then the
corresponding parameter vector (i.e. $\alpha_t$, $h_{t+1}$ or $a_t$) does not change, reducing the number of parameters that needs to be
estimated. This allows for more precise inference. Thus, although there is a sense in which we are adding new parameters
to the model by including the mixture innovation extension, the manner in which these parameters are added allows for
more parsimonious models to be estimated.
To complete the model, we must specify a hierarchical prior for $K$. The posterior simulation algorithm for $K$ discussed in
the Appendix will work provided the hierarchical prior for $K_t$ is Markov. In our empirical work, we adopt a Bernoulli
distribution:
$$p(K_{jt} = 1) = p_j \tag{8}$$
for $j = 1, 2, 3$. Thus, $p_j$ is the probability that a break occurs at time $t$, for $j = 1, 2, 3$ (i.e. corresponding to $\alpha_t$, $\Sigma_t$ or $A_t$). This is
treated as an unknown parameter and estimated from the data. Note that our prior assumes that changes occur
independently in $\alpha_t$, $\Sigma_t$ or $A_t$ (i.e. in the prior $K_{1t}$, $K_{2t}$ and $K_{3t}$ are independent of one another, contemporaneously and
at all leads and lags). We note that, in some applications, this prior independence assumption may be undesirable. If so,
Gerlach et al. (2000) shows how this assumption can be relaxed. This extension is computationally more cumbersome and
we do not pursue it here.
The MCMC algorithm is described in detail in the Technical appendix. Here we note only that it takes the MCMC
algorithm for the TVP-VAR with stochastic volatility and adds extra steps taken from Giordani and Kohn (2008) for the
mixture innovation aspect of the model.
3. Macroeconomic issues
To investigate issues relating to monetary policy, it is common (e.g. Cogley and Sargent, 2001, 2005; Primiceri, 2005;
Stock and Watson, 2001) to use a short term interest rate as being under the control of the Fed (the ‘‘policy block’’) with the
inflation and unemployment rates representing the ‘‘non-policy block’’. Accordingly, we use data from 1953Q1 through
2006Q2 on the unemployment rate (seasonally adjusted civilian unemployment rate, all workers over age 16), interest rate
(yield on three month Treasury bill rate) and inflation rate (the annual percentage change in a chain-weighted GDP price
index).3
The models described thus far are reduced form models. Identifying assumptions must be made to allow for structural
interpretation. We go from our time-varying reduced form VARs to time-varying structural form VARs in a standard way
(see, e.g., Primiceri, 2005) and begin by ordering our dependent variables as inflation, unemployment and interest rates in
the vector $y_t$. In particular, in our reduced form models the errors in the measurement equation, $\varepsilon_t$, were $N(0, H_t)$ where $H_t$
is parameterized as in (3). The structural form errors, $u_t$, are assumed to be $N(0, I)$ and the structural form model has
$$y_t = Z_t \alpha_t + U_t u_t, \tag{9}$$
where $U_t$ imposes the identifying restrictions. With regards to the policy block, we assume that the shock to the interest
rate equation (i.e. the monetary policy shock) has no immediate effect on inflation and unemployment. This is a standard
assumption used, among many others, by Bernanke and Mihov (1998), Christiano et al. (1999) and Primiceri (2005). With
regards to the non-policy block we assume that the shock to the unemployment equation has no immediate effect on
inflation.$^4$ These assumptions imply that $U_t$ is lower triangular. The relationship between the reduced form and structural
form parameters thus becomes
$$U_t = A_t^{-1} \Sigma_t$$
and our MCMC draws of $A_t$ and $\Sigma_t$ can be directly transformed to provide draws of $U_t$ and, thus, impulse responses.
There are, of course, many macroeconomic features that can be presented with a structural VAR model such as the one
discussed. However, for our policy question, the most important ones relate to monetary policy. With regards to the
exogenous shocks, we simply plot their standard deviations (i.e. the diagonal elements of $\Sigma_t$) with the standard deviation of
the interest rate equation being of greatest importance as reflecting the monetary policy shock. With regards to the
transmission mechanism, impulse response functions are of interest. Given our interest in evolving monetary policy,
we focus on the impulse response of the variables in the non-policy block (i.e. inflation and unemployment) to policy (i.e. to
the monetary shock).
With nonlinear time series models such as the TVP-VARs we are working with, there are some issues which arise with
impulse response analysis which do not arise with linear (time-invariant) models (see Koop, 1996; Koop et al., 1996). These
are discussed in the Technical appendix. Suffice it to note here that, following other authors such as Primiceri (2005), we
calculate impulse responses for a shock at time $t$ with response over any time period from $t$ to $t+n$, based on the
parameters as they are at time t.
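A minimal sketch of this calculation, under the assumption that the VAR lag matrices and the blocks $A_t$ and $\Sigma_t$ are held fixed at their date-$t$ values, uses the standard moving-average recursion for a VAR. All numerical values and function names below are hypothetical; the variables are ordered as inflation, unemployment and the interest rate, so the third column of each response matrix corresponds to the monetary policy shock.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv_unit_lower(A):
    """Invert a lower triangular matrix with ones on its diagonal."""
    p = len(A)
    inv = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for j in range(p):
        for i in range(j + 1, p):
            inv[i][j] = -sum(A[i][k] * inv[k][j] for k in range(j, i))
    return inv

def impact_matrix(A, sigma):
    """Impact matrix A^{-1} Sigma with Sigma = diag(sigma)."""
    A_inv = inv_unit_lower(A)
    p = len(A)
    return [[A_inv[i][j] * sigma[j] for j in range(p)] for i in range(p)]

def impulse_responses(B_lags, U, horizon):
    """Responses at horizons 0..horizon with parameters fixed at date t.

    Entry [h][i][j] is the response of variable i, h periods after a
    one-standard-deviation structural shock j.
    """
    p = len(U)
    Phi = [[[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]]
    for h in range(1, horizon + 1):
        acc = [[0.0] * p for _ in range(p)]
        for lag, B in enumerate(B_lags, start=1):
            if lag <= h:               # Phi_h = sum over lags of B_lag Phi_{h-lag}
                term = matmul(B, Phi[h - lag])
                acc = [[acc[i][j] + term[i][j] for j in range(p)]
                       for i in range(p)]
        Phi.append(acc)
    return [matmul(P, U) for P in Phi]

# Hypothetical date-t parameters for a VAR with two lags.
A_t = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [-0.2, 0.3, 1.0]]
sigma_t = [1.0, 0.5, 2.0]
B1 = [[0.5, 0.0, -0.1], [0.0, 0.6, 0.1], [0.2, -0.1, 0.7]]
B2 = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

U_t = impact_matrix(A_t, sigma_t)
irf = impulse_responses([B1, B2], U_t, horizon=4)
inflation_to_policy = [irf[h][0][2] for h in range(5)]
```

Because the impact matrix is lower triangular, the monetary policy shock has a zero contemporaneous effect on inflation and unemployment in this sketch, in line with the identifying restrictions described in the text.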
4. Empirical results
We divide our empirical results into two sub-sections. The first of these discusses the evolution of parameters in the VAR
model and, in particular, whether there is evidence for parameter change and, if so, in which parameters and of what sort.
The second presents results on our macroeconomic features of interest. Throughout, we present results with two lags in the
VAR and an intercept (but no additional deterministic terms).
3
The data were obtained from the Federal Reserve Bank of St. Louis website, http://research.stlouisfed.org/fred2/.
4
This assumption is more controversial since we could have assumed the inflation shock had no immediate effect on unemployment. Primiceri
(2005) discusses issues relating to the ordering of variables in the TVP-VAR in detail on his page 827. Briefly, given the lower-triangular structure of $A_t$, the
ordering of the variables in the non-policy block could affect inference on the covariance matrix. But, in practice, for this data set, empirical results are
very similar for this alternative ordering.
Before presenting empirical results relating to macroeconomic features of interest, we present some direct evidence on
whether breaks have occurred in our three blocks of parameters (i.e. the VAR coefficients, the volatilities, $\Sigma_t$, and $A_t$ which
relates to the error covariances) and, if so, of what sort. A convenient vehicle for discussing these issues is through our
mixture innovation variables which control the changes in the three sets of parameters, $K_1$, $K_2$ and $K_3$ (or their associated
transition probabilities, $p_1$, $p_2$ and $p_3$). As discussed in Section 2 of the paper, by setting particular values for $K_1$, $K_2$ and $K_3$,
we can obtain many different models of interest. The ones we consider are listed in Table 1. We consider various restricted
versions of our model as noted in Table 1 including the models of Primiceri (2005) and of Cogley and Sargent (2005) where
the latter restricts $A_t$ to be constant over time. We also consider a homoskedastic TVP-VAR model as well as a model with
multivariate stochastic volatility, but constant VAR coefficients. This latter is motivated by papers such as Sims and Zha
(2006), which have found support for models with no changes in the VAR coefficients (but substantive changes in the error
covariance matrix).
The prior used in the paper is described in the Technical appendix. It is a training sample prior of the sort used by
Primiceri (2005) and Cogley and Sargent (2001, 2005). Indeed, for the TVP-VAR with stochastic volatility it is the same as
Primiceri’s prior. The new parameters relate to the mixture innovation extension. As discussed in the Appendix, we use
Beta priors for $p_j$ and, thus, $B(b_{1j}, b_{2j})$ for $j = 1, 2, 3$. The properties of the Beta distribution are given, e.g., in Koop (2003, p.
330) and, from these, it can be seen that if $b_{1j} = 1$, $b_{2j} = 1$ for all $j$, then $E(p_j) = \frac{1}{2}$ (with standard deviation 0.29). This is our
Benchmark Prior, which says, a priori, that there is a 50% chance of a break occurring in any time period. The standard
deviation is very large, indicating a relatively noninformative prior.
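The moments quoted for the Benchmark Prior follow from the standard formulas for the Beta distribution, as the sketch below verifies. The conjugate update shown alongside is the textbook Beta-Bernoulli result and is included as an assumption about how such a prior combines with 0/1 break indicators; it is not taken from the Appendix, and the indicator values are hypothetical.

```python
import math

def beta_mean_sd(b1, b2):
    """Mean and standard deviation of a Beta(b1, b2) distribution."""
    mean = b1 / (b1 + b2)
    var = b1 * b2 / ((b1 + b2) ** 2 * (b1 + b2 + 1))
    return mean, math.sqrt(var)

def update_beta(b1, b2, K):
    """Conjugate update of a Beta(b1, b2) prior given 0/1 indicators K."""
    breaks = sum(K)
    return b1 + breaks, b2 + len(K) - breaks

# Benchmark Prior: Beta(1, 1), i.e. uniform on [0, 1].
prior_mean, prior_sd = beta_mean_sd(1.0, 1.0)

# Hypothetical indicators: breaks in three of four periods.
post_b1, post_b2 = update_beta(1.0, 1.0, [1, 1, 0, 1])
post_mean, post_sd = beta_mean_sd(post_b1, post_b2)
```

The prior standard deviation is $\sqrt{1/12} \approx 0.29$, matching the value in the text, and the update simply adds the number of break periods to the first hyperparameter and the number of no-break periods to the second.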
Inspired by much of the structural break literature which works with models with a small number of breaks (e.g.,
among many others, Pesaran et al., 2006), we might also be interested in working with a model which only allows for, say,
one or two breaks. However, one of the advantages of the mixture innovation approach to structural break modelling is that
it does not impose, a priori, a fixed number of breakpoints on the data. Instead it estimates the number of breakpoints in a
data-based fashion. So, we cannot simply choose a mixture innovation model with, say, one or two breaks imposed.
However, we can tighten the prior on the transition probabilities towards such a model. This is what we do in the model
labelled ‘‘Few Breaks’’ in Table 1. In particular, for the prior hyperparameter values listed in Table 1 for this model we have
$E(p_j) = 0.001$ (with standard deviation 0.010) for $j = 1, 2, 3$.
Table 2 presents empirical results relating to the question of which type of model receives support from the data. An
advantage of our mixture innovation approach is that evidence for or against any particular restricted version of our model
(such as those listed in Table 1) can be revealed by looking at the posteriors of parameters such as $p_1$, $p_2$ and $p_3$. The usual
method of Bayesian model comparison is through marginal likelihoods and (although we do present marginal likelihoods)
these can be more sensitive to prior information than posteriors (especially with models such as ours with
high-dimensional parameter spaces). Thus, much of our discussion of how monetary policy evolves relates to $p_1$, $p_2$ and $p_3$.
Table 1
Models and priors used in the empirical work.
Column headings: VAR coefficients; $\Sigma_t$; $A_t$
Table 2
Results using Benchmark Prior for mixture innovation TVP-VAR with multivariate stochastic volatility and restricted versions of this model.
Column headings: Model; Marginal likelihood; $E(\log L)$; $E(p_1 \mid Y)$; $E(p_2 \mid Y)$; $E(p_3 \mid Y)$
In addition to the marginal likelihood, we also present the expected value of the log-likelihood function. The Technical
appendix discusses how these measures of model performance are calculated and how the expected value of the
log-likelihood can be interpreted as the empirical Bayesian metric described in Carlin and Louis (2000, Section 6.5.1) and is
closely related to conventional information criteria. In Table 2, the column labelled ‘‘$E(\log L)$’’ presents this measure of
model performance.
Regardless of whether we look at the posteriors for $p_1$, $p_2$ and $p_3$, the marginal likelihoods or the expected
log-likelihoods, the story that comes through is a strong one. We are finding that all three of our sets of parameters
($\alpha_t$, $A_t$ and $\Sigma_t$) do change over time and in a way that is closer to being the gradual evolution of the TVP-VAR than
the abrupt breaks of conventional structural break models. Consistent with the Great Moderation of the business
cycle, we are finding most evidence for evolution of the error variances. We elaborate on these points in the next
paragraph.
Our mixture innovation TVP-VAR with stochastic volatility estimates all three of our transition probabilities to be above
0.8 indicating that, in any time period, there is a very high probability that parameters will change. We are thus finding
support for a model that is close to Primiceri’s model (although, of course, we did not impose this on the data). It is worth
noting that our model has a slightly higher marginal likelihood and expected log-likelihood than Primiceri’s model. Among
the other restricted versions of our model, the one which imposes $A_t$ as being constant does the best. This is the model of
Cogley and Sargent (2005). The remaining models clearly are receiving little support. The standard VAR with time-invariant
parameters does the worst. The TVP-VAR with constant error covariance (used in Cogley and Sargent, 2001) also does very
poorly. It is also worth noting that there does seem to be time variation in the VAR coefficients as the model which restricts
$\alpha_t$ to be constant receives little support. Finally, we discuss whether breaks in $\alpha_t$, $A_t$ and $\Sigma_t$ are occurring at the same time.
Given that we are close to Primiceri’s model (which, by definition, has breaks occurring in every period for all three
parameter blocks), we know immediately that there will be strong dependence between break times for $\alpha_t$, $A_t$ and $\Sigma_t$. Given
$K_{1t}$, $K_{2t}$ and $K_{3t}$ are dummy variables (and, hence, correlations are not that informative), we use as a metric of dependence
the posterior mean of the proportion of times when $K_{1t} = K_{2t}$, $K_{1t} = K_{3t}$ and $K_{2t} = K_{3t}$, respectively. This measure of
similarity between break timing in different parameter blocks is 0.939, 0.809 and 0.804, respectively. Thus, as expected,
there is a strong dependence in break timing across parameter blocks.
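The dependence metric described above can be computed from posterior draws of the indicators as in the following sketch. The draws shown are hypothetical toy values, not output from the model, and the function names are illustrative.

```python
def agreement(K_a, K_b):
    """Proportion of periods in which two 0/1 indicator sequences coincide."""
    return sum(1 for a, b in zip(K_a, K_b) if a == b) / len(K_a)

def posterior_mean_agreement(draws_a, draws_b):
    """Average the agreement proportion over paired MCMC draws."""
    values = [agreement(a, b) for a, b in zip(draws_a, draws_b)]
    return sum(values) / len(values)

# Two hypothetical draws of K_1 and K_2 over four periods.
K1_draws = [[1, 1, 0, 1], [1, 1, 1, 1]]
K2_draws = [[1, 1, 1, 1], [1, 0, 1, 1]]
similarity_12 = posterior_mean_agreement(K1_draws, K2_draws)
```

A value near one indicates that the two parameter blocks tend to break (or not break) in the same periods, which is how the reported figures of 0.939, 0.809 and 0.804 should be read.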
All of the results discussed so far use the Benchmark Prior (or suitably restricted variants of it). We have carried out a
prior sensitivity analysis and results are similar, even when we make substantive changes in the prior hyperparameter
values. As one example, consider the Few Breaks prior which expresses extremely strong views that the transition
probabilities are near zero (i.e. $E(p_j) = 0.001$ with standard deviation 0.010). Using this prior, the posteriors for the
transition probabilities are $E(p_1|\text{Data}) = 0.18$, $E(p_2|\text{Data}) = 0.87$ and $E(p_3|\text{Data}) = 0.27$ (with posterior standard deviations
of 0.04, 0.04 and 0.07, respectively). Thus, even though we have used prior information that the transition probabilities are
near zero, data information is so strong that the posteriors are pulled strongly in directions which suggest gradual change
in all parameters. Note that, in the case of $\alpha_t$ and $A_t$, we are finding that the prior does have some effect (e.g. $E(p_3|\text{Data}) = 0.82$
with the Benchmark Prior and $E(p_3|\text{Data}) = 0.27$ with the Few Breaks prior), but the point estimates still indicate
gradual evolution of coefficients (i.e. even a transition probability of 0.27 indicates we would expect a break to occur about
once a year). As before, the evidence of parameter change is greatest for $\Sigma_t$, but is still appreciable for $\alpha_t$ and $A_t$. The
marginal likelihood for this model is 19.72, which is a slight improvement over the Benchmark Prior. This is consistent with
the fact that the Few Breaks prior is much more informative than the Benchmark Prior (thus, resulting in a model which is
more parsimonious which, holding other things equal, will increase the marginal likelihood), but imposing this prior
information causes only a small reduction in model fit.
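The "about once a year" reading of a transition probability of 0.27 is simple arithmetic, sketched below. It assumes quarterly data (four periods per year) and that a break occurs independently in each period with probability p; the function names are hypothetical.

```python
def expected_breaks_per_year(p, periods_per_year=4):
    """Expected number of breaks per year when each period breaks with probability p."""
    return p * periods_per_year

def expected_periods_between_breaks(p):
    """Mean waiting time (in periods) until the next break: mean of a geometric distribution."""
    return 1.0 / p

per_year = expected_breaks_per_year(0.27)        # 0.27 * 4 = 1.08 breaks per year
wait = expected_periods_between_breaks(0.27)     # roughly 3.7 quarters between breaks
```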
In summary, thus far we have made a strong case in favor of our mixture innovation extension of a TVP-VAR with
stochastic volatility. In terms of the controversies in the macroeconomics literature, we are finding (like Primiceri, 2005)
that parameter change in the error covariance matrix is of predominant importance, but that evolution in the VAR
coefficients is appreciable enough that it should not be neglected. However, so far our argument has been purely statistical,
using reduced form models. The question of what kind of implications this parameter evolution has for our understanding
of monetary policy has to now be addressed.
[Figure: time-series plot over 1960–2010; legend: Benchmark, Primiceri.]
[Figure: time-series plot over 1960–2010; legend: Benchmark, Primiceri, Hetero VAR.]
restriction that $\alpha_t$ is constant over time). These are labelled “Benchmark”, “Primiceri” and “Hetero VAR”, respectively, in
the figures.
Before discussing impulse responses, we begin with some evidence directly relating to the parameters of these models.
Figs. 1–3 plot the point estimates of the standard deviations of the errors in the measurement equation (i.e. the posterior
mean of the square roots of the diagonal elements of Ht ). These figures do indicate substantial variation in volatility.5 The
general patterns in these graphs are similar to those noted by others in the literature. For instance, our Figs. 1–3 look very
similar to Figs. 1(a)–(c) in Primiceri (2005). There is the same increase in inflation volatility until 1975 and, subsequently, a
tendency (with many exceptions, particularly in the early 1980s) for it to decline. The volatility in unemployment equation
error spikes around 1975. The volatility of the monetary policy shock from the interest rate equation shows a big increase in
the early 1980s before becoming much lower afterwards (with the interesting exception of a substantial increase in 2001).
With regards to comparing the three different models listed above, they all do capture the same broad patterns of
volatility for all variables. However, some interesting differences do exist. This is most noticeable in Fig. 1 where the
Heteroskedastic VAR yields a much smoother pattern of volatility for the inflation equation. There are also noticeable
divergences between our model and that of Primiceri (2005), particularly in the crucial mid-1970s through early 1980s
5 To keep the figures clear, we have not put measures of uncertainty (e.g. 10th/90th percentile bands) in these figures. These do indicate some degree
of imprecision, but the change in volatility is still large relative to this imprecision.
ARTICLE IN PRESS
G. Koop et al. / Journal of Economic Dynamics & Control 33 (2009) 997–1017 1005
[Figure: time-series plot over 1960–2010; legend: Benchmark, Primiceri, Hetero VAR.]
time period where most of the change in the parameters appears to happen. For the other two equations (see Figs. 2 and 3),
fewer differences exist. Our mixture innovation model and the TVP-VAR with stochastic volatility are yielding quite similar
patterns of volatility, but the Heteroskedastic VAR diverges from this pattern a few times, particularly in the
unemployment equation. However, with respect to the monetary shock (Fig. 3), these three models are yielding very
similar results.
Results using the Few Breaks prior are given in the Empirical appendix. Suffice it to note here that they look similar to
those obtained using the Benchmark Prior, but are slightly smoother.
[Figure: impulse responses plotted against horizon (0–25); legend: 1975, 1981, 1996, 2006.]
[Figure: impulse responses plotted against horizon (0–25); legend: 1975, 1981, 1996, 2006.]
The response of the interest rate to the monetary shock (see Fig. 6) is very nearly the same in every time period, but the
responses of inflation and the unemployment rate exhibit more interesting patterns. The point estimate of the response of
inflation to the monetary shock (Fig. 4) in 1975Q1 is very different from all the other years. At short and medium horizons,
the pattern is that monetary shocks are having less of an effect as time goes by. In 1975Q1 there is a (positive) hump-
shaped response of inflation to a monetary shock. This apparent price puzzle (which is probably due to the high inflation,
high interest rate environment of the period6), has vanished in later periods. The point estimate of the response of the
unemployment rate to the monetary shock also shows a similar pattern, but the vanishing hump occurs later. That is, there
is a large (positive) hump-shaped response of unemployment to a monetary shock (at short and medium horizons) for both
1975Q1 and 1981Q3, but this, to all intents and purposes, vanishes for 1996Q1 and 2006Q3.
The preceding statements about the evolution of the effects of monetary shocks were based on point estimates of
impulse responses. To keep the figures readable, we did not put measures of uncertainty associated with the point
estimates. These tend to be quite large. In Fig. 4, the point estimate of the impulse response in 1975Q1 is quite different
from 1981Q3. Fig. 7 relates to the difference in these impulse responses between these two years. It plots the point estimate
6 Alternatively, the price puzzle could be evidence of misspecification and, e.g., that more variables should be included in the VAR.
[Figure: impulse responses plotted against horizon (0–25); legend: 1975, 1981, 1996, 2006.]
[Figure: posterior median with 10th and 90th percentiles, plotted against horizon (0–25).]
Fig. 7. Difference between impulse responses, 1975–1981 (response of inflation to monetary shock).
of this difference along with the 10th and 90th percentiles of the posterior. Other years and other differences between
impulse response functions exhibit a similar pattern. The basic pattern is that a horizontal line at zero always lies between
the 10th and 90th percentiles (i.e. lies within our 80% credible interval). These credible intervals are calculated pointwise.
Thus, the fact that each credible interval contains zero individually does not necessarily imply that jointly there are no
interesting differences in impulse responses over time. However, our previous discussion about the systematic patterns in
the evolution of impulse responses must be qualified due to the inaccuracy of point estimates.
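A minimal sketch of how such pointwise bands could be computed from posterior draws is given below. The function name `pointwise_band` and the simulated draws are hypothetical; the arrays are assumed to hold impulse-response draws with shape (number of draws, number of horizons).

```python
import numpy as np

def pointwise_band(irf_draws_a, irf_draws_b, lower=10, upper=90):
    """Pointwise percentile band for the difference between two sets of impulse-response draws."""
    diff = np.asarray(irf_draws_a) - np.asarray(irf_draws_b)   # (n_draws, n_horizons)
    lo, med, hi = np.percentile(diff, [lower, 50, upper], axis=0)
    contains_zero = (lo <= 0) & (0 <= hi)                       # checked horizon by horizon
    return lo, med, hi, contains_zero

# Simulated draws, for illustration only (not the paper's posterior output).
rng = np.random.default_rng(0)
draws_a = rng.normal(loc=0.1, scale=1.0, size=(2000, 5))
draws_b = np.zeros((2000, 5))
lo, med, hi, contains_zero = pointwise_band(draws_a, draws_b)
```

Because the check is made separately at each horizon, `contains_zero` being true everywhere says nothing about the joint behaviour of the difference across horizons, which is the caveat raised in the text.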
[Figure: impulse responses plotted against horizon (0–25); legend: Benchmark, Primiceri.]
[Figure: impulse responses plotted against horizon (0–25); legend: Benchmark, Primiceri.]
Figs. 8–11 compare results for our Benchmark model and the model of Primiceri (2005) for our four different
representative time periods. It can be seen that, in all of our time periods, the two models are giving basically the same
story. However, the fact that we are unable to even plot the Heteroskedastic VAR results on a figure with the same scaling
implies that the story of our model is quite different from the Heteroskedastic VAR.
5. Conclusion
There has been much interest in the recent macroeconomic literature in the transmission of monetary policy shocks
and the volatility of such shocks. In particular, questions arise over whether they have changed over time and, if so, what
form the change takes (e.g. has the change been gradual or abrupt). In this paper, we develop a model which allows us to
directly address such issues. Instead of estimating a model which assumes a particular form of parameter change (i.e. TVP
models assume gradual evolution of parameters whereas conventional structural break models assume a small number of
abrupt breaks), our model allows us to estimate the type of change which is occurring.
We use our model in an empirical context involving three standard variables: inflation, unemployment and interest
rates and make standard identifying assumptions. Our results are clear. Relative to a traditional (homoskedastic) VAR, there
[Figure: impulse responses plotted against horizon (0–25); legend: Benchmark, Primiceri.]
[Figure: impulse responses plotted against horizon (0–25); legend: Benchmark, Primiceri.]
is overwhelming evidence of parameter change. The strongest change relates to the error variances, but there also seems to
be appreciable change in VAR coefficients and error covariances. Furthermore, this change is gradual as opposed to being
abrupt.
Relative to the existing literature, our model is yielding results which are quite close to those of Primiceri (2005). Our
model is also yielding results which are close (but not quite so close) to those obtained from the model of Cogley and
Sargent (2005). The restrictions on the error covariance matrix in the latter paper do not receive statistical support, but
freeing up the restrictions has only small economic implications. Using a different modelling strategy, Sims and Zha (2006)
find evidence in support of a model where error variances and covariances change, but VAR coefficients do not. In the
framework of our model, restricting VAR coefficients to be constant has only minor implications for the volatility of
exogenous shocks, but has a substantive impact on impulse response functions.
Acknowledgments
We would like to thank two anonymous referees and seminar participants at the European Central Bank and the Rimini
Centre for Economic Analysis for helpful comments.
Appendix A
A.1. Technical appendix: posterior computation, prior and impulse response analysis
The models used in this paper all begin with the TVP-VAR given in (1) and (2) in the body of the text. A key step in any of
our MCMC algorithms will be to draw the states, $\alpha = (\alpha_1', \ldots, \alpha_T')'$. For known values of $H_t$, $Q_t$ and $R_t$, this can be done using
any of the standard algorithms for state space models. We use the algorithm of Durbin and Koopman (2002) which (for the
reasons given in that paper) is more efficient than other popular alternatives. State space algorithms such as this require a
treatment of the initial condition $\alpha_1$. We do this by writing (1) as
$$y_t = Z_t \alpha_0 + Z_t \alpha_t + \varepsilon_t$$
and then initializing the algorithm for drawing states by setting $\alpha_1 = 0$. Note that $\alpha_0$ can be interpreted as benchmark VAR
coefficients and the state equation as capturing deviations from this benchmark. The case where $R_t = 0$ for $t = 1, \ldots, T$ then
produces the standard VAR with time-invariant parameters.
Our MCMC algorithm involves drawing from the posterior of $\alpha_0$ conditional on the states and other model parameters.
This is straightforward since we can re-arrange the previous equation as
$$y_t - Z_t \alpha_t = Z_t \alpha_0 + \varepsilon_t$$
and standard results for the multivariate Normal regression model (see, e.g., Koop, 2003, pp. 140–141) can be used with
$y_t - Z_t \alpha_t$ as the dependent variable. In our models with stochastic volatility, we also use the Durbin and Koopman (2002)
algorithm for the elements relating to the measurement error covariance matrix. In these cases, we treat initial conditions
in the same manner.
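The conditional draw of the benchmark coefficients can be sketched as a standard Normal regression update with $y_t - Z_t \alpha_t$ as the dependent variable. The code below is a sketch under stated assumptions, not the authors' implementation: it assumes a Normal prior for the benchmark coefficients with known mean and covariance, a known measurement error covariance, and hypothetical names (`draw_alpha0`, `prior_mean`, `prior_cov`); the data are simulated.

```python
import numpy as np

def draw_alpha0(y, Z, alpha, H, prior_mean, prior_cov, rng):
    """Draw alpha_0 given the states, assuming alpha_0 ~ N(prior_mean, prior_cov).

    y: (T, p) data; Z: (T, p, k) regressors; alpha: (T, k) state deviations;
    H: (p, p) or (T, p, p) measurement error covariance(s).
    """
    T, p, k = Z.shape
    H = np.broadcast_to(H, (T, p, p))
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec.copy()
    rhs = prior_prec @ prior_mean
    for t in range(T):
        resid = y[t] - Z[t] @ alpha[t]            # dependent variable y_t - Z_t alpha_t
        ZtHinv = Z[t].T @ np.linalg.inv(H[t])
        post_prec += ZtHinv @ Z[t]
        rhs += ZtHinv @ resid
    post_cov = np.linalg.inv(post_prec)
    post_cov = 0.5 * (post_cov + post_cov.T)      # remove numerical asymmetry
    post_mean = post_cov @ rhs
    return rng.multivariate_normal(post_mean, post_cov), post_mean, post_cov

# Simulated example (all values hypothetical): constant coefficients, tiny noise.
rng = np.random.default_rng(0)
T, p, k = 200, 2, 2
Z = rng.normal(size=(T, p, k))
alpha = np.zeros((T, k))
true_alpha0 = np.array([1.0, -2.0])
H = 1e-4 * np.eye(p)
y = np.einsum("tpk,k->tp", Z, true_alpha0) + rng.normal(scale=0.01, size=(T, p))
draw, post_mean, post_cov = draw_alpha0(y, Z, alpha, H, np.zeros(k), 100.0 * np.eye(k), rng)
```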
Our MCMC algorithms involve cycling through the full posterior conditional distributions. For simplicity, we do not list
all the conditioning arguments. But we stress that all of the posteriors noted below (which are labelled as being conditional
on ‘‘Data’’) are the full conditionals required to set up a valid MCMC algorithm.
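and
$$Q^{-1} \sim W(\underline{\nu}_Q, \underline{Q}^{-1}). \tag{A.2}$$
where
$$\bar{\nu}_H = T + \underline{\nu}_H$$
and
$$\bar{H}^{-1} = \left[\underline{H} + \sum_{t=1}^{T} (y_t - Z_t \alpha_t)(y_t - Z_t \alpha_t)'\right]^{-1}.$$
where
$$\bar{\nu}_Q = T + \underline{\nu}_Q$$
and
$$\bar{Q}^{-1} = \left[\underline{Q} + \sum_{t=1}^{T} (\alpha_{t+1} - \alpha_t)(\alpha_{t+1} - \alpha_t)'\right]^{-1}.$$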
A posterior simulator for this model involves drawing the states using the algorithm of Durbin and Koopman (2002) and
drawing the other model parameters from (A.3) and (A.4).
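The Wishart updates for the two precision matrices share one pattern: the posterior degrees of freedom add T to the prior value, and the posterior scale is the inverse of the prior matrix plus the sum of outer products of the relevant innovations. The sketch below illustrates that pattern. It assumes integer degrees of freedom (so a Wishart draw can be built as a sum of outer products of Normal vectors); the function names and simulated residuals are hypothetical.

```python
import numpy as np

def draw_wishart(df, scale, rng):
    """Draw from Wishart(df, scale) for integer df >= dim, as a sum of outer products."""
    dim = scale.shape[0]
    x = rng.multivariate_normal(np.zeros(dim), scale, size=df)
    return x.T @ x

def draw_precision(innovations, prior_df, prior_scale_matrix, rng):
    """Posterior draw of a precision matrix (e.g. the inverse of H or of Q).

    innovations: (T, m) array, e.g. rows y_t - Z_t alpha_t or alpha_{t+1} - alpha_t.
    prior_scale_matrix: the prior matrix that is added to the sum of outer products.
    """
    T = innovations.shape[0]
    post_df = T + prior_df
    post_scale = np.linalg.inv(prior_scale_matrix + innovations.T @ innovations)
    post_scale = 0.5 * (post_scale + post_scale.T)   # remove numerical asymmetry
    return draw_wishart(post_df, post_scale, rng)

# Simulated residuals with variance 0.25, so the precision should be close to 4.
rng = np.random.default_rng(1)
resid = rng.normal(scale=0.5, size=(5000, 2))
H_inv_draw = draw_precision(resid, prior_df=3, prior_scale_matrix=np.eye(2), rng=rng)
```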
$$y_t^* = A_t (y_t - Z_t \alpha_t),$$
where $\mathrm{var}(y_t^*) = \Sigma_t \Sigma_t'$ which is a diagonal matrix. Let $y_{j,t}^*$ for $j = 1, \ldots, p$ denote the $j$th element of $y_t^*$, $y_{j,t}^{**} = \ln[(y_{j,t}^*)^2 + c]$ and
$y_t^{**} = (y_{1,t}^{**}, \ldots, y_{p,t}^{**})'$. Note that $c$ is referred to as an offset constant which has no effect on the following theoretical
derivations. Following standard practice we set $c = 0.001$.
We can now write our specification for $\Sigma_t$ as a state space model with measurement equation given by
$$y_t^{**} = 2h_t + e_t \tag{A.5}$$
and state Eq. (4).7 The only problem with using standard state space algorithms is that $e_t$ is not Normally distributed. Note,
however, that since $y_{j,t}^*$ and $y_{i,t}^*$ are independent of one another (for $i \neq j$), this independence property will carry over to
$e_t = (e_{1t}, \ldots, e_{pt})'$. Thus, we can draw on the univariate results of Kim et al. (1998) as relating to $e_{jt}$. Although $e_{jt}$ is not
Normal, Kim et al. (1998) show how its distribution can be approximated to an extremely high degree of accuracy by a
mixture of seven Normals with means and variances given in their Table 4. If $S_{jt} \in \{1, 2, 3, \ldots, 7\}$ denotes which of the seven
Normals $e_{jt}$ is drawn from, we can construct $S_j = (S_{j1}, \ldots, S_{jT})'$ and $S = (S_1', \ldots, S_p')'$ as component indicators for all elements
of $e_t$. Conditional on $S$ (and $\alpha$ and other parameters), (A.5) and (4) is a Normal linear state space model and, hence, we can
use the algorithm of Durbin and Koopman (2002) to draw $h_t$.
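The transformation leading to the measurement equation (A.5) can be sketched as follows: residuals are premultiplied by the lower-triangular matrix, squared, offset by c and logged. The function name and the small numerical inputs are hypothetical; array shapes are assumptions stated in the docstring.

```python
import numpy as np

def log_squared_residuals(y, Z, alpha, A, c=0.001):
    """Compute ln[(y*_t)^2 + c] element by element, with y*_t = A_t (y_t - Z_t alpha_t).

    y: (T, p); Z: (T, p, k); alpha: (T, k); A: (T, p, p) lower triangular with unit diagonal.
    """
    resid = y - np.einsum("tpk,tk->tp", Z, alpha)   # y_t - Z_t alpha_t
    y_star = np.einsum("tpq,tq->tp", A, resid)      # A_t times the residual
    return np.log(y_star ** 2 + c)                  # offset constant c keeps the log finite

# One period, two variables, one coefficient (hypothetical numbers).
y = np.array([[1.0, 2.0]])
Z = np.array([[[1.0], [1.0]]])
alpha = np.array([[0.5]])
A = np.array([[[1.0, 0.0], [0.5, 1.0]]])
y_ss = log_squared_residuals(y, Z, alpha, A)
```

Here the residual is (0.5, 1.5), so the transformed residual is (0.5, 1.75) before squaring and taking logs.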
The strategy above requires that we draw from the posterior of $S$ conditional on the model parameters and
states. Kim et al. (1998) derive the appropriate posterior conditional. Let $q_i$, $m_i$ and $v_i^2$ for $i = 1, \ldots, 7$ be the
component probability, mean and variance of each of the components in the Normal mixture (obtained from their
Table 4). Then
$$\Pr(S_{it} = j|\text{Data}, h_{i,t}) \propto q_j f_N(y_{i,t}^{**}|2h_{i,t} + m_j - 1.2704, v_j^2) \tag{A.6}$$
for $j = 1, \ldots, 7$, $i = 1, \ldots, p$ and $t = 1, \ldots, T$.
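A sketch of the indicator draw in (A.6) follows. The mixture weights, means and variances are passed in as arguments; the two-component values used in the example are hypothetical placeholders chosen to make the outcome obvious, and in practice the seven-component values would be taken from Table 4 of Kim et al. (1998). The function name is also hypothetical.

```python
import numpy as np

def sample_indicators(y_ss, h, q, m, v2, rng):
    """Sample mixture component indicators with probabilities proportional to
    q_j * N(y** | 2h + m_j - 1.2704, v_j^2).

    y_ss, h: arrays of identical shape; q, m, v2: length-J weights, means, variances.
    Returns (indicators taking values 1..J, matrix of normalised probabilities).
    """
    q, m, v2 = (np.asarray(a, dtype=float) for a in (q, m, v2))
    y_ss = np.asarray(y_ss, dtype=float)[..., None]
    h = np.asarray(h, dtype=float)[..., None]
    mean = 2.0 * h + m - 1.2704
    log_w = np.log(q) - 0.5 * np.log(2 * np.pi * v2) - 0.5 * (y_ss - mean) ** 2 / v2
    log_w -= log_w.max(axis=-1, keepdims=True)        # stabilise before exponentiating
    prob = np.exp(log_w)
    prob /= prob.sum(axis=-1, keepdims=True)
    u = rng.random(prob.shape[:-1] + (1,))
    idx = (prob.cumsum(axis=-1) < u).sum(axis=-1)     # inverse-CDF sampling
    idx = np.minimum(idx, q.size - 1)                 # guard against rounding in the cumsum
    return idx + 1, prob

# Hypothetical two-component mixture (NOT the Kim et al. values): means 0 and 10 after the shift.
rng = np.random.default_rng(2)
ind, prob = sample_indicators(np.zeros((3, 2)), np.zeros((3, 2)),
                              q=[0.5, 0.5], m=[1.2704, 11.2704], v2=[1.0, 1.0], rng=rng)
```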
To complete the description of the MCMC algorithm relating to $\Sigma_t$, we need to work out the conditional posterior for $W$
(where $W$ is defined after Eq. (4)). We use a Wishart prior for $W^{-1}$:
$$W^{-1} \sim W(\underline{\nu}_W, \underline{W}^{-1}). \tag{A.7}$$
The posterior for $W^{-1}$ (conditional on the states) is then Wishart:
$$W^{-1}|\text{Data} \sim W(\bar{\nu}_W, \bar{W}^{-1}), \tag{A.8}$$
where
$$\bar{\nu}_W = T + \underline{\nu}_W$$
and
$$\bar{W}^{-1} = \left[\underline{W} + \sum_{t=1}^{T} (h_{t+1} - h_t)(h_{t+1} - h_t)'\right]^{-1}.$$
Thus, to handle stochastic volatility in $\Sigma_t$, we add to the MCMC algorithm for Model 1 steps which draw $h$ using the state
space model (A.5) and (4), $S$ using (A.6) and $W$ using (A.8).
Next we describe an algorithm for drawing from $A_t$, the unrestricted elements of which we stack by rows into a
$p(p-1)/2$ vector as $a_t = (a_{21,t}, a_{31,t}, a_{32,t}, \ldots, a_{p(p-1),t})'$. These are allowed to evolve according to the state equation (5). We
can transform the original measurement equation so that the Durbin and Koopman (2002) algorithm can be used to draw
the states. This can be done as follows. Define
$$\hat{y}_t = y_t - Z_t \alpha_t$$
and
$$A_t \hat{y}_t = \xi_t,$$
where $\xi_t$ is independent $N(0, \Sigma_t \Sigma_t')$ (and independent of $\zeta_t$). We can use the structure of $A_t$ to isolate $\hat{y}_t$ on the left-hand side
and write
$$\hat{y}_t = C_t a_t + \xi_t. \tag{A.9}$$
7 We treat the initial conditions as in Primiceri (2005) by drawing from the training sample prior.
Primiceri (2005, p. 845) gives a general definition of $C_t$. For our empirical work we have $p = 3$ and, for this case,
$$C_t = \begin{bmatrix} 0 & 0 & 0 \\ -\hat{y}_{1,t} & 0 & 0 \\ 0 & -\hat{y}_{1,t} & -\hat{y}_{2,t} \end{bmatrix},$$
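The form of the regressor matrix for three variables can be checked numerically. With a lower-triangular matrix with unit diagonal, the relation "A times the residual equals the error" rearranges to "the residual equals the regressor matrix times the stacked free elements plus the error", which is why the non-zero entries carry minus signs. The sketch below verifies this identity; the function name `build_C` and all numbers are hypothetical.

```python
import numpy as np

def build_C(y_hat):
    """Regressor matrix for p = 3, given y_hat = y_t - Z_t alpha_t (length 3)."""
    y1, y2, _ = y_hat
    return np.array([[0.0, 0.0, 0.0],
                     [-y1, 0.0, 0.0],
                     [0.0, -y1, -y2]])

# Hypothetical free elements (a21, a31, a32) and residual vector.
a = np.array([0.3, -0.2, 0.5])
A = np.array([[1.0, 0.0, 0.0],
              [a[0], 1.0, 0.0],
              [a[1], a[2], 1.0]])
y_hat = np.array([1.0, -2.0, 0.5])
xi = A @ y_hat                          # A_t y_hat_t = xi_t
reconstructed = build_C(y_hat) @ a + xi # should recover y_hat_t
```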
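$$C_j^{-1} \sim W(\underline{\nu}_{Cj}, \underline{C}_j^{-1}). \tag{A.10}$$
The posterior for $C^{-1}$ (conditional on the states) is then Wishart:
$$C_j^{-1}|\text{Data} \sim W(\bar{\nu}_{Cj}, \bar{C}_j^{-1}), \tag{A.11}$$
where
$$\bar{\nu}_{Cj} = T + \underline{\nu}_{Cj}$$
and
$$\bar{C}_j^{-1} = \left[\underline{C}_j + \sum_{t=1}^{T} (a_{t+1}^{(j)} - a_t^{(j)})(a_{t+1}^{(j)} - a_t^{(j)})'\right]^{-1}$$
and $a_t^{(j)}$ are the elements of $a_t$ corresponding to $C_j$.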
To summarize, to handle the variation in $A_t$, we add to the MCMC algorithm steps which draw $a_t$ (for $t = 1, \ldots, T$) using
the state space model (A.9) and (5), and $C$ using (A.11). To obtain draws for the structural VAR (see Eq. (9)), we can use
the transformation $U_t = A_t^{-1} \Sigma_t$.
and
$$\bar{\beta}_{2j} = \underline{\beta}_{2j} + T - \sum_{t=1}^{T} K_{jt}.$$
The MCMC algorithm for the time-varying parameter model (set out previously in this appendix) still, with one minor
alteration, works (except now that the formulae set out above are additionally conditional on $K$). The alteration is that the
degrees of freedom parameters, $\bar{\nu}_Q$, $\bar{\nu}_W$ and $\bar{\nu}_C$, all have $T$ in their formulae which should be changed to $\sum_{t=1}^{T} K_{1t}$, $\sum_{t=1}^{T} K_{2t}$
and $\sum_{t=1}^{T} K_{3t}$, respectively.
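Conditional on the break indicators, the update for a transition probability and the modified degrees of freedom can be sketched as below. The second Beta parameter follows the formula in the text; the first (prior value plus the number of breaks) is the standard Beta-Bernoulli conjugate update and is stated here as an assumption, since that expression does not survive in the text above. Function names and numbers are hypothetical.

```python
import numpy as np

def draw_transition_prob(K_j, beta1, beta2, rng):
    """Draw p_j from its Beta posterior given a 0/1 indicator series K_j."""
    K_j = np.asarray(K_j)
    T = K_j.size
    n_breaks = int(K_j.sum())
    post_b1 = beta1 + n_breaks            # assumed: standard conjugate update
    post_b2 = beta2 + T - n_breaks        # as in the text: prior value + T - sum of K_jt
    return rng.beta(post_b1, post_b2), post_b1, post_b2

def posterior_degrees_of_freedom(K_j, prior_df):
    """Wishart degrees of freedom with T replaced by the number of breaks."""
    return int(np.sum(K_j)) + prior_df

rng = np.random.default_rng(3)
K = np.array([1, 0, 1, 1])                # hypothetical indicator draws
p_draw, b1, b2 = draw_transition_prob(K, beta1=1.0, beta2=1.0, rng=rng)
df = posterior_degrees_of_freedom(K, prior_df=40)
```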
To complete our MCMC algorithm, we must specify a way of drawing $K$. The posterior for $K$ conditional on the states
takes a simple form. This motivated some early authors (e.g. McCulloch and Tsay, 1993) to draw from $K$ conditional on the
states. However, Gerlach et al. (2000) point out some limitations of such a strategy. Most importantly it can be extremely
inefficient since the states and $K$ can be very highly correlated with one another. They develop an algorithm which
integrates out the states analytically and draws from $p(K_t|\text{Data}, K_{(-t)})$ where $K_{(-t)}$ denotes all the elements of $K$ except for $K_t$.
For state space models, Gerlach et al. (2000) use notation $x_{s,t}$ for all observations from $s$ to $t$ on any variable, $x$, and show
that
$$p(K_t|\text{Data}, K_{(-t)}) \propto p(y_{t+1,T}|y_{1,t}, K)\, p(y_t|y_{1,t-1}, K_{1,t})\, p(K_t|K_{(-t)}). \tag{A.13}$$
The term $p(K_t|K_{(-t)})$ is simply the hierarchical prior and, thus, easy to draw from. Gerlach et al. (2000, pp. 820–822) set out
an efficient algorithm for drawing from the other terms $p(y_{t+1,T}|y_{1,t}, K)$ and $p(y_t|y_{1,t-1}, K_{1,t})$.
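Once the three terms in (A.13) have been evaluated at both values of a binary indicator, turning them into a break probability is a matter of normalising, preferably on the log scale. The sketch below shows only that final step; it does not implement the Gerlach et al. (2000) recursions that produce the terms, and the function name and inputs are hypothetical.

```python
import numpy as np

def prob_break(log_future, log_current, log_prior):
    """Probability that K_t = 1, given length-2 arrays of log terms evaluated at K_t = 0 and 1."""
    log_post = (np.asarray(log_future, dtype=float)
                + np.asarray(log_current, dtype=float)
                + np.asarray(log_prior, dtype=float))
    log_post -= log_post.max()            # stabilise before exponentiating
    w = np.exp(log_post)
    return w[1] / w.sum()

# Hypothetical log terms: only the current-period term distinguishes the two cases.
prob = prob_break(log_future=[0.0, 0.0],
                  log_current=np.log([0.2, 0.8]),
                  log_prior=np.log([0.5, 0.5]))
```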
As discussed in Giordani and Kohn (2008), we can draw $K_{1t}$, $K_{2t}$ and $K_{3t}$ separately from one another in the context of
the three state space algorithms which make up the blocks of the MCMC algorithm for the TVP model with stochastic volatility.
Formally, this amounts to drawing from $p(K_{1t}|\text{Data}, K_{(-t)}, K_{2t}, K_{3t})$, $p(K_{2t}|\text{Data}, K_{(-t)}, K_{1t}, K_{3t})$ and $p(K_{3t}|\text{Data}, K_{(-t)}, K_{1t}, K_{2t})$.
That is, drawing $\alpha_t$ in the TVP model involves use of the algorithm of Durbin and Koopman (2002) conditional on all the
model parameters including $H_t$ (see our discussion of Model 1). $K_{2t}$ and $K_{3t}$ are used in the definition of $H_t$ in Model 3. Thus,
the algorithm of Gerlach et al. (2000) can be combined with Durbin and Koopman (2002) to draw from $K_{1t}$ and the VAR
coefficients (conditional on all other model parameters including $K_{2t}$ and $K_{3t}$). Similarly, the algorithm of Gerlach et al.
(2000) can be combined with Durbin and Koopman (2002) to draw from $K_{3t}$ and $A_t$ (conditional on all other model
parameters including $K_{1t}$ and $K_{2t}$). Finally, the algorithm of Gerlach et al. (2000) can be combined with our extension of
Kim et al. (1998) to draw from $K_{2t}$ and $\Sigma_t$ (conditional on all other model parameters including $K_{1t}$ and $K_{3t}$).
For the TVP-VAR the prior we use is the same as that used in Primiceri (2005). That is, we use a training sample prior
with the first 10 years of data to choose many of the key prior hyperparameters. To be precise, we use the training sample
and a time-invariant VAR to produce OLS estimates of the VAR coefficients, $\hat{\alpha}_0$, and the error covariance matrix, $\hat{\Omega}$, and
decompose the latter as in (3) to produce $\hat{a}_0$ and $\hat{h}_0$ (where these are both vectors stacking the free elements as we did with
$A_t$ and $\Sigma_t$). We also obtain OLS estimates of the variance–covariance matrices of $\hat{\alpha}_0$ and $\hat{a}_0$ which we label $\hat{V}_\alpha$ and $\hat{V}_a$. Using
these, we construct the priors for the initial conditions in each of our state equations as
$$\alpha_0 \sim N(\hat{\alpha}_0, 4\hat{V}_\alpha),$$
$$a_0 \sim N(\hat{a}_0, 4\hat{V}_a)$$
and
$$\log(h_0) \sim N(\log(\hat{h}_0), I_3).$$
Next we describe the priors for the error variances in the state equations. Note that we are choosing small degrees of
freedom parameters (relative to sample size) and, thus, these priors contain a relatively small amount of information
(relative to the data). For (A.2) we set $\underline{\nu}_Q = 40$ and $\underline{Q} = 0.0001\hat{V}_\alpha$. For (A.7) we set $\underline{\nu}_W = 4$ and $\underline{W} = 0.0001 I_3$. For (A.10), we
set $\underline{\nu}_{C1} = 2$, $\underline{\nu}_{C2} = 3$ and $\underline{C}_j = 0.01\hat{V}_{aj}$ for $j = 1, 2$, where $\hat{V}_{aj}$ is the block corresponding to $C_j$ taken from $\hat{V}_a$.
For the TVP-VAR this completes the specification of the prior. For restricted versions of this model (e.g. the
homoskedastic TVP-VAR or the standard time-invariant VAR) we use the same prior for the parameters which are left
unrestricted.
The preceding prior choices were the same as Primiceri (2005) and were calibrated with the TVP-VAR with stochastic
volatility in mind. With our mixture innovation extension of the TVP-VAR, we have to additionally elicit the prior
hyperparameters $\underline{\beta}_{1j}$ and $\underline{\beta}_{2j}$. These are discussed in the empirical section. With regards to the remaining parameters, we
make one alteration to Primiceri’s prior. The latter was a prior calculated for a TVP-VAR with stochastic volatility which
assumed a structural break occurred in every time period (a “many small breaks” model). We want our prior for the
mixture innovation extension to allow for this, but also to allow for fewer breaks, potentially of a larger magnitude.
Accordingly, we allow the mean of the error covariance matrices for the state equation to depend on our prior about the
number of breaks which occur. Note that the Beta prior in (A.12) implies that
$$E(p_j) = \frac{\underline{\beta}_{1j}}{\underline{\beta}_{1j} + \underline{\beta}_{2j}}.$$
If we let $T_j' = E(p_j)T$, we modify our previous prior hyperparameters as $\underline{Q} = 0.0001\hat{V}_\alpha T/T_1'$, $\underline{W} = 0.0001 I_3 T/T_2'$ and
$\underline{C}_j = 0.01\hat{V}_{aj} T/T_3'$. Thus, if we set $E(p_j) = 1$ we get Primiceri’s prior, but if we use a prior for $p_j$ which implies fewer breaks,
then our prior for the state equation error variances allows for large shifts in the parameters to occur.
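The rescaling of the prior scale matrices can be illustrated numerically. Since the expected number of breaks is the prior mean of the transition probability times T, the scaling factor T divided by that number reduces to one over the prior mean. The sketch below uses hypothetical Beta hyperparameters and an identity matrix as a stand-in for the training-sample covariance estimate.

```python
import numpy as np

def prior_break_scale(beta1, beta2):
    """Factor T / T'_j with T'_j = E(p_j) * T, which simplifies to 1 / E(p_j)."""
    expected_p = beta1 / (beta1 + beta2)   # mean of a Beta(beta1, beta2) prior
    return 1.0 / expected_p

# Hypothetical hyperparameters: E(p) = 0.5, so breaks are expected in half of the periods.
factor = prior_break_scale(1.0, 1.0)
V_alpha_hat = np.eye(2)                    # placeholder for the OLS covariance estimate
Q_prior = 0.0001 * V_alpha_hat * factor    # prior scale allows larger shifts when breaks are rarer
```

The fewer breaks the prior expects, the larger the factor, so each break that does occur is allowed to be bigger.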
The Gerlach et al. (2000) algorithm allows us to calculate the marginal likelihood and the expected value of the
likelihood. Let $Y$ stack all the data on the dependent variables and $\lambda$ denote all the parameters in the model except for $K_1$, $K_2$
and $K_3$ and the states themselves. Eq. (3) and Lemmas 3 and 4 of Gerlach et al. (2000) describe how we can calculate
$p(Y|K_1, \lambda)$ for the model studied in that paper. We can adapt this result using any one of our three state equations. That is,
this result provides us with $p(Y|K_1, K_2, K_3, \lambda, a, h)$. By averaging over MCMC draws of all of these conditioning arguments
(i.e. $K_1, K_2, K_3, \lambda, a, h$), we can obtain the expected value of the log-likelihood function. To calculate the marginal likelihood,
we use the inverse of these draws of the likelihoods in the approach to marginal likelihood calculation suggested by
Gelfand and Dey (1994). Note that the likelihood function used in the Gelfand and Dey approach could be defined in several
ways and the basic theory underlying this approach would still hold. For instance, the likelihood conditional on all the
states, $p(Y|K_1, K_2, K_3, \lambda, \alpha, a, h)$, could be used in the Gelfand–Dey algorithm. However, such an approach would
computationally be even less efficient than ours. To obtain more reliable estimates of the marginal likelihood, it is
desirable to analytically integrate out as much as possible. As noted above, this is exactly what the Gerlach et al. (2000)
algorithm allows us to do. That is, this approach involves analytically integrating out one set of the states.
Finally, note that some of the models set elements of $K$ to particular values and, for these, we simply condition on these
values. For instance, for the Benchmark model with $\alpha_t$ being constant, we calculate $p(Y|K_{11} = K_{12} = \cdots = K_{1T} = 0)$.
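The Gelfand and Dey (1994) idea can be sketched generically: the reciprocal of the marginal likelihood is estimated by the posterior average of f divided by prior times likelihood, where f is any proper density whose support lies within that of the posterior. The code below is an illustration on a toy conjugate Normal model, not the paper's computation; the function names are hypothetical, and f is deliberately set to the exact posterior so the result can be checked against the known answer.

```python
import numpy as np

def gelfand_dey_log_ml(log_f, log_prior, log_lik):
    """Log marginal likelihood estimate from posterior draws (Gelfand-Dey identity)."""
    log_ratio = np.asarray(log_f) - np.asarray(log_prior) - np.asarray(log_lik)
    log_mean_ratio = np.logaddexp.reduce(log_ratio) - np.log(log_ratio.size)
    return -log_mean_ratio                 # the average estimates 1 / p(Y)

def expected_log_likelihood(log_lik):
    """Average of the log-likelihood over posterior draws."""
    return float(np.mean(log_lik))

def norm_logpdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

# Toy model: y | theta ~ N(theta, 1), theta ~ N(0, 1); then theta | y ~ N(y/2, 1/2), y ~ N(0, 2).
rng = np.random.default_rng(4)
y_obs = 1.3
theta = rng.normal(y_obs / 2, np.sqrt(0.5), size=1000)   # exact posterior draws
log_lik = norm_logpdf(y_obs, theta, 1.0)
log_ml = gelfand_dey_log_ml(norm_logpdf(theta, y_obs / 2, 0.5),   # f = exact posterior here
                            norm_logpdf(theta, 0.0, 1.0),
                            log_lik)
exact = norm_logpdf(y_obs, 0.0, 2.0)
ell = expected_log_likelihood(log_lik)
```

In realistic applications f would typically be an approximation to the posterior (for example a truncated Normal), and the estimate would then carry simulation error.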
The use of the expected log-likelihood can be motivated as in Section 6.5.1 of Carlin and Louis (2000). Note that Carlin
and Louis’s penalized likelihood criteria are closely related to conventional information criteria such as the Schwarz
criteria, but (instead of evaluating them at the maximum likelihood estimate) use the posterior and are based on the
expected value of the log of the likelihood function. Like information criteria, such features do not involve the prior (except
insofar as the prior enters the posterior and, thus, the MCMC algorithm) and, thus, will be less sensitive to prior choice (and
can be considered as approximations to the log of the marginal likelihood).
Finally, we turn to the calculation of impulse responses. In linear (time-invariant) VARs, impulse responses can be taken
directly from the vector moving average (VMA) representation implied by the VAR. However, with a TVP-VAR the implied
VMA is changing over time. Suppose the VMA representation of a standard VAR is given by
$$y_t = \sum_{i=0}^{\infty} \theta_i u_{t-i},$$
then the usual result is that an impulse response $h$ periods in the future is the appropriate element of $\theta_h$. With a TVP-VAR
the implied VMA will, of course, have time-varying coefficients:
$$y_t = \sum_{i=0}^{\infty} \theta_{t-i,i} u_{t-i}.$$
This raises two issues when calculating impulse responses. The first is that the impulse responses will be changing over
time. Hence, we have to either plot impulse responses for every time period or choose a few time periods for detailed study.
We adopt both these strategies in our empirical work. A second and more subtle issue arises due to the treatment of shocks
other than the one being perturbed. To explain this issue, suppose we are interested in the effect of a shock of size one (to
the structural errors in the measurement equation) which occurs at time $t$ on the variables at time $t + h$. Strictly speaking,
an impulse response is usually interpreted as a difference in conditional expectations such as
$$E(y_{t+h}|I_t, u_t = 1) - E(y_{t+h}|I_t),$$
where $I_t$ denotes information through time $t$. In any nonlinear time series model, these expectations can be calculated
using simulation methods (as in Koop, 1996). However, this can be computationally demanding, so it is much easier to
simply take the structural VAR coefficients at time $t$ (i.e. $\alpha_t$ and $U_t$) and calculate a conventional impulse response function.
Table A1
Posterior median of VAR coefficients (10th/90th percentiles in parentheses).
In linear models, these two strategies are identical, but with nonlinear models they can be slightly different. Nevertheless,
in this paper we adopt this second simpler strategy. Formally, it can be interpreted as an impulse response function
calculated assuming all shocks to the model (including the shocks to the state equations) between time $t$ and $t + h$ are
simply set to their expected values of zero.
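The simpler strategy amounts to freezing the coefficients at their time-t values and computing an ordinary VAR impulse response. A minimal sketch using the companion form is given below; the function name, argument layout and numerical values are hypothetical, with the impact matrix standing in for the structural matrix at time t.

```python
import numpy as np

def impulse_responses(lag_coefs, impact, horizons):
    """Conventional impulse responses holding the time-t coefficients fixed.

    lag_coefs: list of (p, p) lag matrices [B_1, ..., B_L] evaluated at time t.
    impact: (p, p) matrix mapping structural shocks into the variables on impact.
    Returns an array of shape (horizons + 1, p, p); entry [h, i, j] is the response
    of variable i at horizon h to a unit shock j.
    """
    p = impact.shape[0]
    L = len(lag_coefs)
    companion = np.zeros((p * L, p * L))
    companion[:p, :] = np.hstack(lag_coefs)
    companion[p:, :-p] = np.eye(p * (L - 1))
    out = np.empty((horizons + 1, p, p))
    power = np.eye(p * L)
    for h in range(horizons + 1):
        out[h] = power[:p, :p] @ impact    # top-left block of companion^h gives the VMA matrix
        power = companion @ power
    return out

# Hypothetical coefficients: one-lag and two-lag examples with an identity impact matrix.
irf1 = impulse_responses([0.5 * np.eye(2)], np.eye(2), 3)
irf2 = impulse_responses([0.5 * np.eye(2), 0.2 * np.eye(2)], np.eye(2), 2)
```

Repeating the calculation for each posterior draw of the time-t coefficients gives the posterior distribution of the response at that date.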
This appendix contains additional empirical results not included in the body of the text.
Tables A1 and A2 in this appendix give point estimates (posterior medians) and measures of uncertainty (i.e. the 10th
and 90th percentile of each posterior) of the VAR coefficients ($\alpha_t$) and the lower-triangular Choleski decomposition of the
error covariance matrix in the structural VAR (i.e. $U_t = A_t^{-1} \Sigma_t$). With regards to the former, note that we are working with
two lags and, to keep the tables as brief as possible, do not present results for the intercepts. Following Primiceri (2005), we
present results for 1975Q1, 1981Q4 and 1996Q1. Given that we have extended the data set, we also present results for
2006Q3.
To aid in interpretation of the tables, note that any parameter will relate to one of our three equations which we call the
inflation (dp), unemployment rate (u) and interest rate (r) equations, respectively. It can also relate to any variable. We use
this equation/variable notation to identify parameters in the tables. So, for instance, the $(3, 1)$ element of $U_t$ is in the interest
rate equation and is the coefficient on the error in the inflation equation. This we label as r/dp. For the VAR coefficients, we
will have, e.g., the coefficient on the second lag of the unemployment rate in the inflation equation. This we label as
$dp_t/u_{t-2}$.
Figs. A1–A3 present the posterior means of the standard deviations of the errors in our three equations using the Few
Breaks prior. Thus, they are comparable to Figs. 1–3 in the paper, but use a different prior.
Table A2
Posterior median of Ut (10th/90th percentiles in parentheses).
[Figure: three time-series plots over 1960–2010.]
References
Bernanke, B., Mihov, I., 1998. Measuring monetary policy. Quarterly Journal of Economics 113, 869–902.
Boivin, J., Giannoni, M., 2006. Has monetary policy become more effective? Review of Economics and Statistics 88, 445–462.
Carlin, B., Louis, T., 2000. Bayes and Empirical Bayes Methods for Data Analysis, second ed. Chapman & Hall, Boca Raton.
Chib, S., 1998. Estimation and comparison of multiple change-point models. Journal of Econometrics 86, 221–241.
Christiano, L., Eichenbaum, M., Evans, C., 1999. Monetary shocks: what have we learned and to what end? In: Taylor, J., Woodford, M. (Eds.), Handbook of
Macroeconomics, vol. 1A. Elsevier, New York, pp. 65–148.
Cogley, T., Sargent, T., 2001. Evolving post-World War II inflation dynamics. NBER Macroeconomics Annual 16, 331–373.
Cogley, T., Sargent, T., 2005. Drifts and volatilities: monetary policies and outcomes in the post WWII U.S. Review of Economic Dynamics 8, 262–302.
Cogley, T., Morozov, S., Sargent, T., 2005. Bayesian fan charts for U.K. inflation: forecasting and sources of uncertainty in an evolving monetary system.
Journal of Economic Dynamics and Control 29, 1893–1925.
Durbin, J., Koopman, S., 2002. A simple and efficient simulation smoother for state space time series analysis. Biometrika 89, 603–616.
Gelfand, A., Dey, D., 1994. Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society Series B 56, 501–514.
George, E., Sun, D., Ni, S., 2008. Bayesian stochastic search for VAR model restrictions. Journal of Econometrics 142, 553–580.
Gerlach, R., Carter, C., Kohn, R., 2000. Efficient Bayesian inference in dynamic mixture models. Journal of the American Statistical Association 95, 819–828.
Giordani, P., Kohn, R., 2008. Efficient Bayesian inference for multiple change-point and mixture innovation models. Journal of Business and Economic
Statistics 26, 66–77.
Kim, S., Shephard, N., Chib, S., 1998. Stochastic volatility: likelihood inference and comparison with ARCH models. Review of Economic Studies 65,
361–393.
Koop, G., 1996. Parameter uncertainty and impulse response analysis. Journal of Econometrics 72, 135–149.