Evaluating competing predictive distributions. An
out-of-sample forecast simulation study.
Bachelor’s Thesis in Statistics
Andreas C. Collett∗
January 7, 2015
Abstract†
This thesis aims to formulate a simple measurement that evaluates predic-
tive distributions of out-of-sample forecasts between two competing models.
Predictive distributions form a large part of today’s forecast models used for
policy making. The possibility to compare predictive distributions between
models is important for policy makers who make informed decisions based
on probabilities. We conduct simulation studies to estimate autoregressive
models and vector autoregressive models with Bayesian inference. The for-
mulated measurement uses out-of-sample forecasts and predictive distribu-
tions to evaluate the full forecast error probability distribution by forecast
horizon. We find the measurement to be accurate; it can be used to evaluate
single forecasts or to calibrate forecast models.
Keywords: Autoregressive, out-of-sample forecast, Bayesian inference, Gibbs
sampling, prior distribution, posterior distribution and predictive distribu-
tion.
Department of Statistics
Autumn Semester 2014
Course code SU-39434
∗ Correspondence to author: andreascollett@gmail.com.
† I am deeply grateful to my supervisor Professor Emeritus Daniel Thorburn for his commitment, time, notes and discussions.
Contents
1. Introduction
2. Related Literature
3.1 Bayesian Inference
3.2 Autoregression
3.3 Vector Autoregression
4. Evaluating the Predictive Distribution
5. Simulation Study
6. Hyperparameters
7.1 Results
7.2 Univariate Simulation Results
7.3 Multivariate Simulation Results
8. Conclusion
Appendix
A1. Gibbs Sampler
1. Introduction
This thesis aims to formulate a simple measurement that evaluates predic-
tive distributions of out-of-sample forecasts between two competing models.
Predictive distributions form a large part of today’s forecast models used for
policy making. The possibility to compare predictive distributions between
models is important for policy makers who make informed decisions based
on probabilities. Out-of-sample forecasts are used to mimic the situation
forecasters experience in real time, and are used by academics in forecast
methodology research and by practitioners to calibrate forecast models. By
combining predictive distributions and out-of-sample forecasts one can evaluate
the forecast error probability distribution. Earlier forecast evaluation
literature has tended to focus on point forecasts, either directly from a model
or via a certain value in the predictive distribution, rather than evaluating
the full predictive distribution. This results in a loss of information about the
uncertainty of the forecasts and the forecast model. The contribution of
this thesis is to formulate a simple measurement that uses this information
to evaluate forecasts at multiple horizons. There is recent research that
addresses this subject in various forms: Geweke and Amisano (2008, 2012),
Warne et al. (2013) and Bauwens et al. (2014). However, this literature is
small relative to the literature on evaluating point forecasts.
We generate data samples from univariate autoregressions (AR) and a
multivariate vector autoregression (VAR) with known true parameters. We use
Bayesian methods to fit AR and VAR models to the data and obtain
posterior inference and predictive distributions. A restrictive Gibbs sampler
is implemented to conduct the Bayesian inference. These methods allow us
to produce predictive distributions and to explore the theory and application
of Bayesian analysis through a simple example. The restrictive Gibbs
sampler is a popular method for obtaining posterior inference in time series
analysis. Therefore the thesis also gives an introduction to Bayesian analysis
and its application in time series.
The structure of the simulated data allows us to use simple statistical theory
to evaluate the posterior inference, summarized in Table 1, and the use of
out-of-sample forecasts allows us to evaluate which model produces the most
accurate predictive distribution. The diagonal in Table 1 represents the
situation when the simulated data is estimated with the correct model. The
lower left outcome occurs when the data is simulated from an AR model but
is estimated with a VAR model. In this case the VAR model will not suffer
from misspecification but will include irrelevant independent variables. This will
cause a (small) increase in the estimated variance, which in turn will lead
to a wider predictive distribution. The upper right outcome occurs when the
data is simulated from a VAR model but is estimated with an AR model.
In this case the AR model will be misspecified, i.e. the model suffers from
omitted variable bias. This will cause a (large) increase in the estimated
variance, which in turn will lead to a wider predictive distribution. The
formulated measurement accounts for both the size of the forecast error and the
probability that this forecast error would occur.
Table 1: Underlying Statistical Theory.

                    Simulation of Data
Model       Univariate                         Multivariate
AR(p)       Optimal                            Misspecification
VAR(p)      Irrelevant Independent Variables   Optimal
We formulate a measurement that uses out-of-sample forecasts and predictive
distributions to evaluate the full forecast error probability distribution by
forecast horizon. We are able to validate the accuracy of the measurement
against statistical theory, but we find that the autoregressive model and the
vector autoregressive model with the same lag length have difficulty producing
dissimilar predictive distributions. However, we are able to separate the
predictive distributions of the models by letting both be correctly specified
while allowing for a high degree of correlation between the error terms across
equations in the vector autoregressive model. From this we find that the
formulated measurement is able to measure the accuracy of the full forecast
error distribution. The measurement can be used as a forecast evaluation
technique for single forecasts or to calibrate forecast models.
The rest of the thesis is structured as follows. Section 2 presents the empirical
research closest to the research question. Section 3 describes the empirical
methodology. Section 4 describes the evaluation method for the predictive
distribution. Section 5 describes the simulation method. Section 6 accounts
for the selection of the hyperparameters in the priors. Section 7 discusses
the results and model comparisons.
2. Related Literature
We will present two articles that evaluate predictive distributions. The methods
used are technical and we will not describe them in depth. Instead
we will mention the methods and give a brief summary of the results.
Interested readers are referred to the articles under consideration.
Geweke and Amisano (2012) compare the forecast performance of, and construct
model combinations for, three models: the dynamic factor model, the
dynamic stochastic general equilibrium model and the vector autoregressive
model. They use several analytical techniques to evaluate the forecast
performance and to construct model combinations: pooling of predictive densities,
analysis of predictive variances, probability integral transform tests and
Bayesian model averaging. They find two improvements that increase forecast
accuracy substantially. The first improvement is to use the full Bayesian
predictive distribution instead of the posterior mode of the parameters. The
second improvement is to construct a model combination by equally
weighted pooling of the predictive densities from the three models, instead
of relying on the individual predictive distribution from each model. This
result is considerably better than when Bayesian model averaging is used for
the same purpose.
Bauwens et al. (2014) compare the forecast performance of two models that
allow for structural breaks against a wide range of alternative models which
do not allow for structural breaks. They evaluate forecast performance by
two metrics. First, they use root mean squared forecast errors (RMSE) to
evaluate point forecasts. The median of the predictive distribution is used
as point forecast. Second, they use the average of log predictive likelihoods
(APL), which is the predictive density evaluated at the observed outcome.
The APL is estimated by a nonparametric kernel smoother, using draws from
the predictive simulator. They find that no single model is consistently
better than the alternatives in the presence of structural breaks. One source
for this uncertainty about the forecast performance is that the two metrics
yield substantially different conclusions. They find that the structural break
models seem to dominate the non-structural break models in terms of RMSE,
but the opposite is often true in terms of APL.
3.1 Bayesian Inference
To describe Bayesian inference the simple linear regression model will be
examined. Consider the model

yt = Xtβ + εt  (1)

where yt is a T × 1 vector representing the dependent variable, Xt is a T × K
matrix and εt ~iid N(0, σ²), T is the number of observations and K is the
number of independent variables. Our purpose is to obtain estimates of the
K × 1 vector β and the scalar σ². These can be obtained by maximizing the
likelihood function

l(yt | β, σ²) = (2πσ²)^(−T/2) exp[−(yt − Xtβ)′(yt − Xtβ) / (2σ²)]  (2)

which yields the maximum likelihood estimates³ β̂MLE = (Xt′Xt)⁻¹(Xt′yt) and
σ̂²MLE = (yt − Xtβ̂MLE)′(yt − Xtβ̂MLE)/T. According to the likelihood principle the
likelihood function contains all the information in the data about the parameters
β and σ². This is where the difference between classical (or frequentist)
inference and Bayesian inference becomes apparent. Bayesian inference
incorporates prior beliefs about the parameters in the estimation process in
the form of probability distributions. This results in the joint posterior
distribution

p(β, σ² | yt) = l(yt | β, σ²)p(β, σ²) / p(yt) ∝ l(yt | β, σ²)p(β, σ²)  (3)

where p(β, σ²) is the prior distribution and p(yt) is the density of the data,
or marginal likelihood. The marginal likelihood, p(yt), does not depend on β
or σ² and can thus be considered a constant. This yields the unnormalized
joint posterior distribution, which is proportional to the likelihood function
times the prior distribution. However, Karlsson (2013) stresses the importance
of the marginal likelihood for model comparison. Several conclusions can
be drawn from the joint posterior distribution. First, the joint posterior
distribution represents the probability distribution of the parameters β and
σ² once the prior distribution has been updated with the information in
the observed data, yt. Second, if the prior distribution is vague (or flat)
it can be considered almost constant, which causes the estimates to
be similar to those of classical inference, i.e. the likelihood function will
determine the estimates. This also occurs when the information in the data
is rich, i.e. a large number of observations. There is a large literature on
Bayesian inference in macroeconomics, where the data tend to have a small
number of observations and the model requires a large number of parameters
to be estimated, for example the vector autoregressive model. Given the
joint posterior distribution, the marginal posterior distributions conditional
on the data, p(β | yt) and p(σ² | yt), can be obtained by integrating β and
σ² out of the joint posterior distribution, one at a time:

p(β | yt) = ∫ p(β, σ² | yt) dσ²  (4)

p(σ² | yt) = ∫ p(β, σ² | yt) dβ.  (5)

³ Note that β̂MLE is equal to the ordinary least squares estimator, β̂OLS, while σ̂²MLE is a biased estimate of the variance because it does not deduct the number of estimated parameters from the number of observations in the denominator, as is done in σ̂²OLS.
For the simple regression model specified there exist analytical (or closed-form)
results for the integrals. But for more complex models or particular
prior distributions there may not exist analytical results for the integrals.
Then numerical or simulation techniques, such as Markov chain Monte Carlo
(MCMC) simulation, are required to obtain estimates of β and σ². We will
impose the restriction of stability on our AR and VAR processes, i.e. we are
restricted to evaluate a certain range of the distributions of β. This will be
implemented by the Gibbs sampler (see Appendix A1), which will allow us
to sample from the range of this distribution. Even if there exist analytical
results and no restrictions are imposed, there are still situations where
simulation is suitable; this is the case in forecasting. Forecasts with a horizon
beyond the one-step-ahead horizon are nonlinear and can only be obtained
by simulation.
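To see why multi-step forecasts call for simulation, consider a short sketch (our own illustration, anticipating the AR(p) model of Section 3.2; function and variable names are ours). Each posterior draw of the parameters is iterated forward h steps, adding a fresh error draw at each step:

```python
import numpy as np

def predictive_draws(y_last, betas, sig2s, h, seed=0):
    """Iterate an AR(p) forward h steps for each posterior draw (beta, sigma^2)."""
    rng = np.random.default_rng(seed)
    p = betas.shape[1]
    out = np.empty(len(betas))
    for m, (b, s2) in enumerate(zip(betas, sig2s)):
        hist = list(y_last[-p:])  # most recent p observations, oldest first
        for _ in range(h):
            # b is (beta_1, ..., beta_p); pair beta_1 with the newest value
            hist.append(b @ hist[-1:-p - 1:-1] + rng.normal(0.0, np.sqrt(s2)))
        out[m] = hist[-1]
    return out
```

The h-step forecast is a nonlinear function of the parameters (products of the βi enter through the iterated substitution), which is why each draw is propagated forward individually rather than plugging in a point estimate.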
The key element in Bayesian inference is the prior belief of the researcher,
i.e. the prior distribution, p(β, σ²), in (3). The prior distribution allows
the researcher to address the uncertainty about the parameters before the
data have been taken into account; this is done by specifying a probability
distribution for each parameter. Prior distributions are classified into two
categories, noninformative and informative. Noninformative prior distributions
are used when the researcher does not have prior beliefs about the parameters,
when the prior beliefs exist with a third party, i.e. are not known,
and for scientific reports where differences in prior beliefs could have an
impact on the result. The noninformative prior distribution puts a uniform
distribution on the parameters, which forces the estimates to be determined
by the data while still reaping the benefits of Bayesian analysis. Informative
prior distributions are used when the researcher has prior beliefs about the
parameters and incorporates these beliefs into the prior distribution. This is
accomplished by assigning hyperparameters⁴ or by restricting the parameter
range. It is typically difficult to assess the prior belief in practice; therefore
it is essential that the joint posterior distribution is proper and to assess
the posterior inference with sensitivity analysis. According to Sun and Ni
(2004), there exist situations in which the posterior is improper even though
the full conditional distributions used for MCMC are all proper.
One reason for the use of Bayesian inference is that it produces predictive
distributions. This enables assessment of the probability of an outcome, which is
more relevant to policy decisions than evaluating a certain point forecast of a
model⁵. The essence of Bayesian inference is that the predictive distribution
accounts for the uncertainty about the future and that the joint posterior
distribution accounts for the uncertainty about the parameters:

p(yt+1:t+h) = ∫∫ f(yt+1:t+h | yt, β, σ²) p(β, σ² | yt) dβ dσ².  (6)

This results in a predictive distribution of forecasts p(yt+h) at each forecast
horizon h. The predictive distribution enables the researcher to address the
probability that a certain outcome will occur. This is useful in many ways:
the predictive distribution can be described by measures of central tendency,
and distressed scenarios can be assessed by evaluating quantiles of the predictive
distribution.
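As an illustration (our own sketch, not part of the thesis code), given a vector of draws from a predictive distribution, central tendency, interval forecasts and distressed-scenario probabilities can all be read off directly:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical predictive draws for y_{t+h}; in practice these would come
# from the Gibbs sampler described below.
draws = rng.normal(loc=1.5, scale=0.8, size=3000)

point_forecast = np.median(draws)           # a measure of central tendency
lo, hi = np.quantile(draws, [0.05, 0.95])   # 90% predictive interval
distress = np.mean(draws < 0.0)             # probability of a distressed outcome
```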
3.2 Autoregression
We will use Bayesian methods to estimate the parameters in an autoregressive
process, AR(p), where p is the number of lags of the univariate time
series yt:

yt = Σ_{i=1}^{p} βi yt−i + εt,  t = 1, ..., T  (7)

where εt ~iid N(0, σ²). This model is the same as in (1), where Xt is a T × p
matrix (without a constant) consisting of p lags of the time series yt. An
important difference between the models in (1) and (7) is that in the AR model
the dependent variable is not independently and identically distributed: yt
depends on past values of itself.

⁴ The hyperparameters represent the parameters in the prior distribution and are called hyperparameters to distinguish them from the parameters in the model.
⁵ There are methods in the frequentist view to generate an approximative predictive distribution, for example by bootstrapping. These distributions are, however, mostly tighter than the Bayesian predictive distribution because they do not take into account the uncertainty about the parameters. These methods will not be evaluated in this thesis.
The Normal-Gamma prior distribution is the conjugate prior for a normal
distribution: the posterior distribution and the prior distribution are in the
same family of distributions. In the Normal-Gamma prior, the parameter
vector, β, is normally distributed conditional on the variance, σ², and the
variance follows the (inverse) Gamma distribution. The prior mean is

p(β | σ²) ∼ N(β0, σ²H̄)  (8)

where β0 is a p × 1 vector representing the researcher's prior belief about the
parameter values. The prior variance-covariance matrix, H, is equal to
the researcher's prior belief of the variance, σ², times the p × p diagonal
matrix H̄ representing the researcher's uncertainty about the parameters.
Larger values on the diagonal of σ²H̄ result in larger variances around the
prior means. The prior variance is

p(σ²) ∼ iΓ(α/2, θ0/2)  (9)

where α represents the prior degrees of freedom and θ0 represents the prior
scale parameter. Holding the prior degrees of freedom fixed and letting the
prior scale increase results in an (inverse) Gamma distribution with an
increasing mean, i.e. the prior belief about the value of σ² increases. Holding
the prior scale fixed and letting the prior degrees of freedom increase results
in an (inverse) Gamma distribution that is more tightly centred around the
mean, i.e. the prior belief about σ² becomes tighter. This is illustrated in
Figure 1. Specifying the prior belief depends on several factors. Practitioners
set the prior beliefs to their own views, while researchers tend to set these to
the OLS estimates to let the data influence the estimates more than the prior
beliefs, which is viewed as a more coherent and accepted academic approach.
But it also depends on the number of observations and parameters: if there is
a large number of observations relative to the number of parameters, the
influence of the data will be stronger. If there is a small number of observations
relative to the number of parameters, known as overparametrization
in the literature, then one must specify a strong prior.
Figure 1: The (left) figure illustrates the effect on the (inverse) Gamma distribution
as the scale parameter θ0 takes the values {1, 2, 3, 4}, holding the degrees
of freedom constant at α = 1. The (right) figure illustrates the effect on the
(inverse) Gamma distribution as the degrees of freedom α takes the values {1, 2, 3, 4},
holding the scale parameter constant at θ0 = 1.
The conditional posterior distributions of β and σ² are

p(β | σ², yt) = N(M, V)  (10)

p(σ² | β, yt) = iΓ(τ1/2, θ1/2)  (11)

where

M = (H⁻¹ + (1/σ²)Xt′Xt)⁻¹ (H⁻¹β0 + (1/σ²)Xt′yt)  (12)

V = (H⁻¹ + (1/σ²)Xt′Xt)⁻¹  (13)

τ1 = α + T  (14)

θ1 = θ0 + (yt − Xtβ)′(yt − Xtβ)  (15)

and β0, H̄, α and θ0 are hyperparameters specified by the researcher. Note
that there exist analytical results for the Normal-Gamma prior distribution,
but we will use the Gibbs sampler to obtain parameter estimates and
predictive distributions. Table 2 describes the implemented restrictive Gibbs
sampler.
Table 2: Restrictive Gibbs Sampler for an AR(p) model.

To illustrate the Gibbs sampler we will examine the first sample, m = 1.
We start by sampling β^(1) from p(β | (σ²)^(0), yt) in (10). It is important
to note that (12) and (13) depend on σ², and the Gibbs sampler needs
an initial value for this parameter, denoted (σ²)^(0), which is specified
by the researcher. We set (σ²)^(0) to the OLS estimate σ̂². Having
obtained M, in (12), and V, in (13), we can sample β^(1) from
p(β | (σ²)^(0), yt) by

β̂^(1) = M + [r(V)^(1/2)]′

where r is a 1 × p vector of draws from the standard normal distribution
and (V)^(1/2) is the Cholesky decomposition of V. We impose the restriction
that β̂^(1) must come from a stable AR process, i.e. all the roots, z, of
the polynomial βp(z) = 1 − β1z − β2z² − ... − βpzᵖ must have a modulus
greater than one, |z| > 1. Once β̂^(1) is obtained we can sample (σ²)^(1) from
p(σ² | β^(1), yt), the inverse Gamma distribution in (11). Note that the
posterior degrees of freedom τ1, in (14), and the posterior scale parameter θ1,
in (15), require the researcher to specify α and θ0. A sample from the
inverse Gamma distribution is structured as

(σ̂²)^(1) = θ1 / (x0x0′)

where x0 is a 1 × τ1 vector of draws from the standard normal distribution.
A sample from the predictive distribution for forecast horizon h is
structured as

ŷ^(1)_{t+h} = Σ_{i=1}^{h−1} β̂^(1)_i ŷ^(1)_{t+h−i} + Σ_{i=h}^{p} β̂^(1)_i y_{t+h−i} + ε̂^(1)_{t+h}

where ε̂^(1)_{t+h} = r√((σ̂²)^(1)) and r is a single draw from the standard normal
distribution. This process is repeated for M iterations until we have obtained
β^(1), ..., β^(M), (σ²)^(1), ..., (σ²)^(M) and ŷ^(1)_{t+h}, ..., ŷ^(M)_{t+h}. The first 1, ..., B
iterations are discarded; thus β^(B+1), ..., β^(M), (σ²)^(B+1), ..., (σ²)^(M) and
ŷ^(B+1)_{t+h}, ..., ŷ^(M)_{t+h} are used for the empirical distributions. The iterations 1, ..., B
are known as burn-in iterations and are required for the Gibbs sampler
to converge. There is, however, no guarantee that the Gibbs sampler will
perform well or that it will converge. In this thesis we set M = 4,000
and B = 1,000 to ensure convergence.
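The steps in Table 2 can be sketched compactly in Python (our own illustrative implementation; function and variable names are ours, the prior covariance H is passed in fixed as in (12)–(13), and the stability restriction is checked via the companion-matrix eigenvalues rather than the polynomial roots):

```python
import numpy as np

def gibbs_ar(y, p, beta0, H, alpha, theta0, M=4000, B=1000, seed=0):
    """Restrictive Gibbs sampler for an AR(p) model, following Table 2."""
    rng = np.random.default_rng(seed)
    N = len(y)
    X = np.column_stack([y[p - i:N - i] for i in range(1, p + 1)])  # lags
    yv = y[p:]
    T = len(yv)
    Hinv = np.linalg.inv(H)
    # (sigma^2)^(0): OLS estimate, used as the starting value
    beta_ols = np.linalg.solve(X.T @ X, X.T @ yv)
    sigma2 = np.mean((yv - X @ beta_ols) ** 2)
    betas, sig2s = [], []
    for m in range(M):
        # Draw beta from N(M, V) in (10), restricted to a stable AR process
        V = np.linalg.inv(Hinv + (X.T @ X) / sigma2)
        Mn = V @ (Hinv @ beta0 + (X.T @ yv) / sigma2)
        while True:
            beta = Mn + np.linalg.cholesky(V) @ rng.standard_normal(p)
            comp = np.vstack([beta, np.eye(p)[:-1]])  # companion matrix
            if np.max(np.abs(np.linalg.eigvals(comp))) < 1.0:
                break  # stable draw accepted
        # Draw sigma^2 from iGamma(tau1/2, theta1/2) in (11)
        resid = yv - X @ beta
        tau1 = alpha + T
        theta1 = theta0 + resid @ resid
        x0 = rng.standard_normal(int(tau1))
        sigma2 = theta1 / (x0 @ x0)
        if m >= B:  # discard the burn-in iterations
            betas.append(beta)
            sig2s.append(sigma2)
    return np.array(betas), np.array(sig2s)
```

Rejection sampling inside the loop implements the restriction |z| > 1 on the roots: draws whose companion matrix has an eigenvalue of modulus one or more are simply discarded and redrawn, which is why computation time grows as the process approaches the stability boundary.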
3.3 Vector Autoregression
We will use Bayesian methods to estimate a vector autoregressive process,
VAR(p), where p is the number of lags of each time series. The VAR
model is a system-of-equations model that allows the endogenous variables
to affect each other simultaneously. Furthermore, the error terms can be
correlated across equations. A structural shock in one of the error terms
may cause a shock in all error terms, causing contemporaneous movement in
all endogenous variables. The VAR(p) with n endogenous variables, without
constants, deterministic or exogenous variables, is defined as

yt = Σ_{i=1}^{p} Bi yt−i + εt,  t = 1, ..., T  (16)

where yt is an n × 1 vector, Bi is an n × n matrix, εt is an n × 1 vector and
εt ~iid N(0, Σ). The endogenous variables in the VAR model are not iid;
they depend on past values of yt. We can express (16) in compact matrix form if
we define xt = (yt−1, ..., yt−p):

Yt = XtB + Et  (17)

where Yt and Et are T × n matrices, Xt = (x1, ..., xT) is a T × np matrix
and B = (B1, ..., Bp) is an np × n matrix. Note that the parameter matrix
can be stacked into an n²p × 1 vector by b = vec(B).
The Normal-Wishart prior distribution is the conjugate prior for a multivariate
normal distribution: the posterior distribution and the prior distribution
are in the same family of distributions. In the Normal-Wishart prior,
the parameter vector, b, is normally distributed conditional
on the variance-covariance matrix, Σ, and the variance-covariance matrix
follows the (inverse) Wishart distribution:

p(b | Σ) ∼ N(b0, Σ ⊗ H̄)  (18)

p(Σ) ∼ iW(S̄, α)  (19)

where ⊗ is the Kronecker product and b0 represents the researcher's prior
belief of the parameter values. We follow Kadiyala and Karlsson (1993, 1997)
in specifying the matrix H̄, the prior scale matrix S̄ and the prior degrees
of freedom α. The np × np diagonal matrix H̄ has diagonal elements equal to

λ0λ1 / (l^λ2 s²ᵢ)

where l refers to the lag length, s²ᵢ refers to the OLS estimate of the
variance from an AR(p) model and i refers to the endogenous variable in the
i-th equation. The prior scale is an n × n diagonal matrix with diagonal
elements equal to (α − n − 1)λ0⁻¹s²ᵢ, and the prior degrees of freedom satisfy

α = max{n + 2, n + 2h − T}  (20)

to ensure the existence of the prior variances of the regression parameters and
the posterior variances of the predictive distribution at forecast horizon h.
Following the guidelines of Kadiyala and Karlsson (1993, 1997) we only need
to specify the hyperparameters b0, λ0, λ1 and λ2. The interpretation of the
λ hyperparameters is as follows:

λ0 controls the overall tightness of the prior on the covariance matrix.
λ1 controls the tightness of the prior on the coefficients of the first lag.
λ2 controls the rate at which coefficients on longer lags are shrunk more tightly towards zero.

The prior variance-covariance matrix, H, is obtained by V(b) = (α −
n − 1)⁻¹S̄ ⊗ H̄. Due to the imposed Kronecker structure we are not able
to specify individual prior variances and standard deviations; instead we are
forced to treat all equations symmetrically.
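Under these definitions the prior pieces can be assembled mechanically. A sketch (our own helper, following the formulas as stated above; the lag-major ordering of the H̄ diagonal is our assumption):

```python
import numpy as np

def nw_prior(s2, n, p, lam0, lam1, lam2, h, T):
    """Prior pieces of the Normal-Wishart prior as specified above.
    s2: length-n array of AR(p) OLS variance estimates s_i^2."""
    s2 = np.asarray(s2, dtype=float)
    # H-bar: np x np diagonal with entries lam0*lam1 / (l^lam2 * s_i^2),
    # stacked lag by lag (l = 1, ..., p), variable by variable within each lag
    diag = np.concatenate([lam0 * lam1 / (l ** lam2 * s2) for l in range(1, p + 1)])
    Hbar = np.diag(diag)
    alpha = max(n + 2, n + 2 * h - T)             # prior degrees of freedom, (20)
    Sbar = np.diag((alpha - n - 1) / lam0 * s2)   # prior scale matrix
    return Hbar, Sbar, alpha
```

With this scale matrix the implied prior mean of Σ under iW(S̄, α) is diag(λ0⁻¹s²ᵢ), since E[Σ] = S̄/(α − n − 1).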
The conditional posterior distributions of b and Σ are

p(b | Σ, Yt) ∼ N(M, V)  (21)

p(Σ | b, Yt) ∼ iW(Σ̄, T + α)  (22)

where

M = (H⁻¹ + Σ⁻¹ ⊗ Xt′Xt)⁻¹ (H⁻¹b0 + (Σ⁻¹ ⊗ Xt′Xt)b̂)  (23)

V = (H⁻¹ + Σ⁻¹ ⊗ Xt′Xt)⁻¹  (24)

Σ̄ = S̄ + (Yt − XtB)′(Yt − XtB)  (25)

and b̂ is the OLS estimate of b. Kadiyala and Karlsson (1997) and Karlsson
(2013) provide the analytical results for this prior, but we will use the Gibbs
sampler to obtain parameter estimates and predictive distributions. The
restrictive Gibbs sampler implemented in Table 2 is essentially the same for the
Normal-Wishart prior: b̂ is sampled from (21) and Σ̂ is sampled from (22).
There is, however, one step in the Gibbs sampler for the VAR model that will
affect the predictive distributions. This is due to the fact that the predictive
distribution accounts for the uncertainty about the future, combined with the
fact that the VAR model allows the error terms to be correlated across the
equations. The Gibbs sampler will draw a sample m from the predictive
distribution at forecast horizon h:

ŷ^(m)_{t+h} = Σ_{i=1}^{h−1} B̂^(m)_i ŷ^(m)_{t+h−i} + Σ_{i=h}^{p} B̂^(m)_i y_{t+h−i} + ε̂^(m)_{t+h}

where ε̂^(m)_{t+h} = r[Σ̂^(m)]^(1/2) and r is a 1 × n vector of draws from the standard
normal distribution. The term [Σ̂^(m)]^(1/2) is the square root of the estimated
variance-covariance matrix obtained by the Cholesky decomposition, a
triangular matrix. Therefore, the order of the variables in the
VAR model is important. For example, in the bivariate VAR model the
first equation will have two elements of uncertainty added to the forecast at
each draw, while the second equation will only have one element of
uncertainty added to the forecast at each draw. Therefore, more uncertainty
will be added to the predictive distribution of y1 than to that of y2.
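As an illustration (our own sketch, with a hypothetical 2 × 2 covariance matrix), multiplying standard normal draws by a Cholesky factor reproduces the intended correlation structure across equations:

```python
import numpy as np

# Hypothetical variance-covariance matrix: variances 2 and 3, correlation 0.75
Sigma = np.array([[2.0, 0.75 * np.sqrt(6.0)],
                  [0.75 * np.sqrt(6.0), 3.0]])
L = np.linalg.cholesky(Sigma)  # triangular factor, Sigma = L @ L.T

rng = np.random.default_rng(0)
r = rng.standard_normal((100000, 2))  # rows of iid standard normal draws
eps = r @ L.T                         # correlated shocks, Cov(eps) ~ Sigma

emp_cov = np.cov(eps.T)
```

Because the factor is triangular, one variable's shock is built from a single standard normal draw while the other mixes both draws, which is why reordering the variables in the VAR redistributes the added uncertainty across equations.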
4. Evaluating the Predictive Distribution

To explain the measurement, consider a predictive distribution at a specific
forecast horizon, h. The problem is to structure a measurement that accounts
for two factors: (1) the accuracy of the forecasts⁶ and (2) the probabilities
that the forecasts occur.

The most common method to examine forecast accuracy is to use out-of-sample
forecasts. This allows the researcher to mimic a real-time situation.
The observed data is divided into two sets: data for parameter estimation and
actual values for forecast evaluation. The forecast error for each observation
in the predictive distribution at h is

ŷ^(m)_{t+h} − yᵃ_{t+h}

where ŷ^(m)_{t+h} is the m-th Gibbs sample from the predictive distribution at horizon h,
and yᵃ_{t+h} is the actual value corresponding to the forecast.

⁶ Note: when we use the terminology forecast, we refer to one element within the predictive distribution if nothing else is specified.
The most common method to visualize a distribution is the histogram, which
will serve as a tool to describe the concept of the measurement. The data is
classified into bins represented by rectangles, where the height of a rectangle
represents the number of data points within the interval of the bin. The
histogram can be normalized to represent the probability of each bin, with
the condition that the probabilities of the bins sum to one. In Figure 2 we
can see the forecast error probability distributions of two competing
models; it is clear that the left graph is more accurate than the right graph.
We can conclude that these graphs serve our purpose, i.e. we can determine
which graph is most accurate by examining the probabilities of the forecast
errors.
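The normalization just described can be sketched as follows (our own example, standing in for the histograms in Figure 2):

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.normal(0.0, 1.0, size=5000)   # forecast errors from one model
counts, edges = np.histogram(errors, bins=30)
probs = counts / counts.sum()              # probability of each bin
# probs sums to one, so bar heights can be read directly as probabilities
```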
Figure 2: The (left) graph is the forecast error probability distribution of the
AR(2) model at h = 2 for univariate simulation of y2. The (right) graph is
the forecast error probability distribution of the VAR(2) model at h = 2 for
univariate simulation of y2. The red line indicates zero forecast error.
Examining each forecast error distribution is time-consuming if, for example,
a practitioner is calibrating a forecast model. Therefore we would like
to summarize the forecast error distribution by an expected value. This
will not, however, yield information about the accuracy of the forecast error
distribution that we intend to measure; instead it will contain information
about the bias of the forecast error distribution. To allow the same conclusion
to be drawn as for the graphs in Figure 2 we will examine the squared
forecast errors⁷. The expected value of the squared forecast error is

ē_{t+h} = Σ_{m=1}^{M−B} (M − B)⁻¹ (ŷ^(m)_{t+h} − yᵃ_{t+h})² = Σ_{m=1}^{M−B} pm (ŷ^(m)_{t+h} − yᵃ_{t+h})²  (26)

where ē_{t+h} is the expected value of the squared forecast error and M − B is
the number of stored samples in the Gibbs sampler. The probabilities sum
to one, Σ_{m=1}^{M−B} pm = 1. A tight forecast error distribution centred
around zero will produce a small expected value of the squared forecast
error, while a wide forecast error distribution not centred at zero, or
skewed away from zero, will produce a larger expected value of the squared
forecast error. The expected value of the squared forecast error is not
informative in itself; it is only informative when put in relative terms to a competing
model. We will use the notation ē^AR(p)_{t+h} to represent the expected value of the
squared forecast error at forecast horizon h for the AR(p) model and ē^VAR(p)_{t+h}
for the corresponding representation for the VAR(p) model.
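A minimal sketch of the measurement in (26) (our own illustration; `draws` stands in for the stored Gibbs samples ŷ^(m)_{t+h} and `actual` for yᵃ_{t+h}):

```python
import numpy as np

def expected_squared_forecast_error(draws, actual):
    """Equal-weight version of (26): mean of (yhat_m - y_actual)^2."""
    draws = np.asarray(draws, dtype=float)
    return np.mean((draws - actual) ** 2)

# A tight distribution centred on the actual value scores lower than a wide,
# biased one, which is what the measurement is designed to capture.
rng = np.random.default_rng(0)
tight = 1.0 + rng.normal(0.0, 0.5, size=3000)   # centred on actual = 1.0
wide = 1.0 + rng.normal(0.8, 2.0, size=3000)    # biased and dispersed

e_tight = expected_squared_forecast_error(tight, 1.0)
e_wide = expected_squared_forecast_error(wide, 1.0)
```

Comparing `e_tight` with `e_wide` mirrors comparing ē^AR(p)_{t+h} with ē^VAR(p)_{t+h}: only the relative magnitude carries information.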
5. Simulation Study

We generate the data for the variables y1 and y2 by (pseudo-)simulation from
univariate AR models and a multivariate VAR model. This will allow us to
obtain results corresponding to Table 1 (the simulation of data represents
the columns of Table 1). Both the univariate and the multivariate simulation
will create data for the variables y1 and y2 with TS = 200 observations⁸.

Univariate Simulation: The two time series y1 and y2 are simulated from
two AR(2) models. Both AR models are conditional on being stable, which is
fulfilled when the moduli of the eigenvalues of the companion matrix

[β1  β2]
[ 1   0]

are less than one. We impose a stricter condition: the moduli of the eigenvalues
must be less than 0.850. This is motivated by the implemented restrictive
Gibbs sampler, which requires large increases in computational time as the
moduli of the eigenvalues approach one. The series will be simulated by:

y1,t = 0.70y1,t−1 + 0.10y1,t−2 + ε1,t, ε1,t ∼ N(0, 2)  (27)
y2,t = 0.35y2,t−1 + 0.30y2,t−2 + ε2,t, ε2,t ∼ N(0, 3).  (28)

⁷ There are also other reasons to use the squared forecast errors: it penalizes outliers to a high degree and it has nice properties with respect to the normal distribution. Note, however, that our forecast error distributions are t-distributed because they are estimated from a model.
⁸ Note that y1 and y2 depend on past observations. We have specified start values for each simulation process. To mitigate the effect of start values we generated 250 observations of y1 and y2 and discarded the first 50 observations, resulting in TS = 200.
The parameters in (27) are chosen so that one of the eigenvalues has a
modulus close to the chosen criterion and so that β1 is large and β2 small.
The parameters in (28) are chosen so that β1 and β2 are closer to each other
than in (27), positive and not close to zero. The pairs of β1 and β2 in (27)
and (28) are not identical, and the variances of y1 and y2 are different. The
correlations between y1 and y2 should on average be zero, but due to the
randomness of simulation they will not be constant, which in turn will affect
the estimation. The left graph in Figure 3 shows 100 correlations between
y1 and y2; the average is -0.008 and the median is -0.008.
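The univariate design above can be sketched in a few lines. The function below is illustrative: it imposes the stability condition on the companion matrix (moduli below 0.850), simulates 250 observations, and discards the first 50 as described in footnote 8. The function name and random seeds are not from the thesis.

```python
import numpy as np

def simulate_ar2(beta1, beta2, sigma2, T=200, burn=50, seed=0):
    """Simulate a stable AR(2) process, discarding a burn-in period."""
    # Stability: the moduli of the companion-matrix eigenvalues must be < 0.850.
    companion = np.array([[beta1, beta2], [1.0, 0.0]])
    assert np.all(np.abs(np.linalg.eigvals(companion)) < 0.850)
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2), T + burn)   # N(0, sigma2) errors
    y = np.zeros(T + burn)
    for t in range(2, T + burn):
        y[t] = beta1 * y[t - 1] + beta2 * y[t - 2] + eps[t]
    return y[burn:]                                    # keep the last T draws

# The two series in (27) and (28):
y1 = simulate_ar2(0.70, 0.10, sigma2=2.0, seed=0)
y2 = simulate_ar2(0.35, 0.30, sigma2=3.0, seed=1)
```

For the parameters in (27) the companion-matrix eigenvalues are approximately 0.822 and -0.122, and for (28) they are 0.750 and -0.400, so both stability checks pass.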
Multivariate Simulation: The two time series y1 and y2 are simulated
from a bivariate VAR(2) model. The VAR model is conditional on being
stable, which is fulfilled when the moduli of the eigenvalues of the companion
matrix

[ B1  B2 ]
[ I2   0 ]

are less than one. We again impose the stricter condition that the moduli
of the eigenvalues must be less than 0.850. The series will be simulated by
y1,t = 0.70y1,t−1 + 0.10y1,t−2 + ε1,t
y2,t = −0.25y1,t−1 + 0.35y2,t−1 + 0.50y1,t−2 + 0.30y2,t−2 + ε2,t
(29)
where the variance-covariance matrix is

Σ = [ 2        0.75√6 ]
    [ 0.75√6   3      ]                    (30)

where the error terms between the equations have a correlation of 0.750 and
the variances of y1 and y2 are different.
Three factors have determined the choice of the parameter matrix B. First,
the elements in B are chosen so that the correlation that arises from the
parameters is balanced, i.e. the correlation between y1 and y2 is approx-
imately 0.750. Second, the first equation, y1, in (29) should not depend
on the parameters from y2. The only relationship between the variables in
the first equation in (29) is the correlation between error terms across the
two equations. Both models are correctly specified, but the AR model dis-
cards the correlated error terms across equations, while the VAR model
includes irrelevant independent variables and adds uncertainty to the pre-
dictive distribution due to the correlated error terms across equations. Both
these effects will make the predictive distribution of the VAR model wider.
Third, the second equation, y2, in (29) should depend on the parameters
of y1. Estimating y2 with an AR model will therefore result in misspecification.
Due to the randomness of simulation the correlations between y1 and y2 will
not be constant, which in turn will affect the estimation. The right graph in
Figure 3 shows 100 correlations between y1 and y2; the average is 0.745 and
the median is 0.741.
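The multivariate design can be sketched analogously. This is a minimal sketch: the parameter matrices B1, B2 and Σ are taken from (29) and (30), the stability condition on the companion matrix mirrors the text, and the function name and seed are illustrative.

```python
import numpy as np

def simulate_var2(B1, B2, Sigma, T=200, burn=50, seed=0):
    """Simulate a stable bivariate VAR(2), discarding a burn-in period."""
    n = B1.shape[0]
    # Stability check on the companion matrix [[B1, B2], [I, 0]].
    companion = np.vstack([np.hstack([B1, B2]),
                           np.hstack([np.eye(n), np.zeros((n, n))])])
    assert np.all(np.abs(np.linalg.eigvals(companion)) < 0.850)
    rng = np.random.default_rng(seed)
    eps = rng.multivariate_normal(np.zeros(n), Sigma, T + burn)
    y = np.zeros((T + burn, n))
    for t in range(2, T + burn):
        y[t] = B1 @ y[t - 1] + B2 @ y[t - 2] + eps[t]
    return y[burn:]

B1 = np.array([[0.70, 0.00], [-0.25, 0.35]])   # lag-1 coefficients of (29)
B2 = np.array([[0.10, 0.00], [0.50, 0.30]])    # lag-2 coefficients of (29)
Sigma = np.array([[2.0, 0.75 * np.sqrt(6)],
                  [0.75 * np.sqrt(6), 3.0]])   # the matrix in (30)
y = simulate_var2(B1, B2, Sigma)               # columns are y1 and y2
```

Because the first row of B1 and B2 contains zeros for y2, the y1 equation depends on y2 only through the correlated errors, exactly as the design intends.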
Figure 3: The (left) graph is the histogram of correlations between y1 and y2
generated by 100 univariate simulations. The (right) graph is the histogram
of correlations between y1 and y2 generated by 100 multivariate simulations.
6. Hyperparameters
As mentioned in Section 3.3, Kadiyala and Karlsson (1993, 1997) suggest
a set of guidelines to standardize the restrictions on the parameters in the
Normal-Wishart prior distribution. This allows the researcher to only spec-
ify a number of hyperparameters. Informally, we can think of the inverse
Wishart distribution as a multivariate version of the inverse Gamma distri-
bution. This allows us to align the Normal-Gamma prior in the AR and the
Normal-Wishart prior in the VAR, by using the guidelines of Kadiyala and
Karlsson (1993, 1997).
We align the Normal-Gamma prior to the Normal-Wishart prior in three
steps. First, we specify the hyperparameters for the prior mean variance,
in (8), in the AR(2) model

H = σ²H̄ = [ λ1²   0      ]
           [ 0     λ1²λ2² ]

where we specify σ² = 1 and choose the diagonal of H̄ to have the same
variances for the lags as the matrix V(b) = (α − n − 1)⁻¹ S̄ ⊗ H̄ has for
own lags in the Normal-Wishart prior. Second, the prior degrees of freedom
α, in (14), will be determined by (20), which results in α = 3. Third, the
prior scale parameter θ0, in (15), will be determined by θ0 = (α − n − 1)λ0⁻¹σ̂²;
with α = 3 this simplifies to θ0 = λ0⁻¹σ̂², where σ̂² is the OLS estimate.
Now we turn to examine the guidelines of Kadiyala and Karlsson (1993,
1997) for the Normal-Wishart prior. The prior mean variance V(b) =
(α − n − 1)⁻¹ S̄ ⊗ H̄ and the prior scale matrix S̄, with the diagonal
(α − n − 1)λ0⁻¹si², i = 1, 2, both depend on the prior degrees of freedom α.
By (20) we determine that α = 4, so that the scale matrix S̄ is

S̄ = [ λ0⁻¹s1²   0       ]
    [ 0         λ0⁻¹s2² ]
and the prior mean variance V(b) = S̄ ⊗ H̄ is the 8 × 8 diagonal matrix
with diagonal

( λ1², (s1λ1/s2)², λ1²λ2², (s1λ1λ2/s2)², (s2λ1/s1)², λ1², (s2λ1λ2/s1)², λ1²λ2² ).

Notice that the first and third diagonal elements of this matrix are equal to
the diagonal elements of the prior mean variance in the Normal-Gamma prior.
This alignment between the Normal-Gamma prior and the Normal-Wishart
prior allows us to control the parameter restrictions of both priors by only
specifying the prior means, β and b0, in (8) and (18) for each prior and the
hyperparameters λ0, λ1, λ2 for both priors. For the Normal-Gamma prior
we will set the prior mean to β = (0, 0) and for the Normal-Wishart prior
we will set the prior mean to b0 = (0, 0, 0, 0, 0, 0, 0, 0)′. The hyperparameters
will be set to λ0 = λ1 = λ2 = 1.
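The alignment can be verified numerically. The sketch below is an illustration under the reconstruction used here: H̄ collects the scaled per-lag variances, S̄ the equation scales, and with α = 4 and n = 2 the factor (α − n − 1) equals one. The values of s1 and s2 are arbitrary; the own-lag diagonal elements of S̄ ⊗ H̄ reduce to λ1² and λ1²λ2² regardless, matching the Normal-Gamma prior.

```python
import numpy as np

lam0 = lam1 = lam2 = 1.0     # hyperparameters as set in the text
s1, s2 = 1.3, 1.8            # equation scale estimates (illustrative values)

# H-bar holds the scaled per-lag variances and S-bar the equation scales;
# with alpha = 4 and n = 2, (alpha - n - 1) = 1, so V(b) = S-bar kron H-bar.
H_bar = lam0 * np.diag([lam1**2 / s1**2, lam1**2 / s2**2,
                        lam1**2 * lam2**2 / s1**2, lam1**2 * lam2**2 / s2**2])
S_bar = np.diag([s1**2 / lam0, s2**2 / lam0])
V_b = np.kron(S_bar, H_bar)  # 8 x 8 prior mean variance

# The own-lag diagonal elements (positions 0, 2, 5, 7) reduce to the
# Normal-Gamma prior variances lam1^2 and lam1^2 * lam2^2: the scale
# factors s1 and s2 cancel in the Kronecker product.
own_lags = V_b[[0, 2, 5, 7], [0, 2, 5, 7]]
```

The cross-lag elements retain the ratios s1/s2 and s2/s1, which is what widens the prior on the cross-variable coefficients.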
7.1 Results
To validate the measurement we want to obtain results corresponding to those
in Table 1. We will attempt to verify these by examining posterior variances
and the difference in the measurement between two competing models by
forecast horizon.
But first we describe the kind of data we analyse in this section. In
Section 5, we generated the data for the univariate and multivariate models.
From this we construct out-of-sample forecasts: we produce forecasts for
the horizons h = 1, 2, ..., 10, leaving Tp = TS − h = 190 observations for
parameter estimation. Predictive distributions are estimated and the expected
value of the squared forecast error, i.e. the measurement, is calculated for
ten forecast horizons. This step of simulating data and calculating the mea-
surement is repeated one hundred times. Each model thus produces
measurement data in the form of two 100 × 10 matrices, where
the first matrix is for the univariate simulated data and the second is for the
multivariate simulated data.
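The simulate-and-measure loop can be organised as below. This is an illustrative skeleton only: `simulate` and `fit_and_measure` are hypothetical stand-ins for the data generation of Section 5 and the Bayesian estimation of the predictive distribution, and the per-horizon holdout here simply reserves the last h observations.

```python
import numpy as np

def measurement_matrix(simulate, fit_and_measure, n_sims=100, horizons=10):
    """Build the n_sims x horizons matrix of the measurement: one simulated
    data set per row, one forecast horizon per column."""
    out = np.zeros((n_sims, horizons))
    for i in range(n_sims):
        y = simulate(seed=i)                   # fresh simulated series
        for h in range(1, horizons + 1):
            train, target = y[:-h], y[-h]      # hold out the last h observations
            # expected squared forecast error under the predictive distribution
            out[i, h - 1] = fit_and_measure(train, target, h)
    return out
```

Running this once per model and per data-generating process yields the two 100 × 10 matrices described above.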
First, we will examine the posterior variance. The results are presented
in Tables 3 and 5; the coefficient is the expected value of the mean posterior
variance, resulting from one hundred simulations of the data. This is moti-
vated by two reasons: (1) to summarize the data, as each simulation of the
data yields a mean posterior variance and the corresponding ninety-five per-
cent probability interval of the posterior variance; (2) deviations from Table
1 should be caused by the random correlation in the data due to simulation,
and therefore the expected value of the mean posterior variance yields a more
robust conclusion.
Second, we will examine the difference in the measurement between two
competing models at each forecast horizon. The mean of this difference will
determine which predictive distribution is most accurate. We represent
this by the linear regression described in (1), where the dependent variable is
the difference in the measurement between the competing models by forecast
horizon and the independent variable is a constant:
(ē_t+h^AR(2))_i − (ē_t+h^VAR(2))_i = ϕ + εi,   i = 1, ..., 100   (31)
The coefficient of the constant, ϕ, represents the mean of the difference in
the measurement between the competing models by forecast horizon. This
enables the following hypothesis test:

H0 : ϕ = 0
HA : ϕ ≠ 0

If ϕ < 0 then the predictive distribution of the AR(2) is more accurate than
that of the VAR(2). If ϕ > 0 then the predictive distribution of the VAR(2)
is more accurate than that of the AR(2). If H0 is not rejected, then we cannot
conclude that one predictive distribution is more accurate than the other.
The estimation of this model is similar to the AR(p) model described in
Section 3.1 and the Gibbs sampler described in Table 2. We set the hyper-
parameters as follows: β0 = 0, σ² = 1, θ0 = σ̂² and α = 3 according to (20).
The results are presented in Tables 4 and 6.
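In the thesis the constant ϕ in (31) is estimated with the Gibbs sampler; as a simplified stand-in, the sketch below computes the mean difference per horizon with a normal-approximation ninety-five percent interval. It conveys the same decision rule: H0 is rejected at a horizon when the interval excludes zero, and a negative ϕ favours the AR model.

```python
import numpy as np

def phi_by_horizon(e_ar, e_var, z=1.96):
    """Mean measurement difference per horizon with an approximate 95%
    interval. e_ar and e_var are 100 x 10 matrices of the measurement
    (one row per simulated data set, one column per horizon)."""
    d = e_ar - e_var                            # negative -> AR more accurate
    phi = d.mean(axis=0)
    se = d.std(axis=0, ddof=1) / np.sqrt(d.shape[0])
    lower, upper = phi - z * se, phi + z * se
    # H0: phi = 0 is rejected at a horizon when the interval excludes zero.
    reject = (lower > 0) | (upper < 0)
    return phi, lower, upper, reject
```

The Bayesian probability intervals reported in Tables 4, 6 and 7 play the role of `lower` and `upper` here.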
7.2 Univariate Simulation Results
We start by examining the results of the estimated posterior variance and
covariance from the AR(2) and VAR(2) models for the univariate simulations
of the data. The first column in Table 1 shows the expected outcomes. The
AR model is correctly specified, while the VAR model includes irrelevant
independent variables, which is expected to increase the estimated posterior
variance.
The results are presented in Table 3. For the variable y1, we conclude that
the expected value of the mean posterior variance is equal to 2.006 for the
AR and 2.007 for the VAR. Therefore we conclude that the inclusion of irrele-
vant independent variables does increase the estimated posterior variance as
expected. However, this increase is smaller than we expected. For the vari-
able y2, we conclude that the expected value of the mean posterior variance
is 2.962 for the AR model and 2.965 for the VAR model. The increase in
estimated posterior variance is somewhat larger than for y1 but still smaller
than expected. The expected value of the mean posterior covariance in the
VAR model is 0.006, which is close to zero as we expect, since y1 and
y2 are simulated independently of each other. From the left graph in Figure
3, we conclude that the correlation between y1 and y2 is on average -0.008,
but ranges approximately from -0.300 to 0.350.
Overall, we conclude that the estimated posterior variances follow the ex-
pected theory in Table 1, but the increase in estimated posterior variance
due to the inclusion of irrelevant variables in the VAR model is smaller than
we expected.
Table 3: Expected Values of Mean Posterior Variance/Covariance.
Univariate Simulation of the Data
AR(2) VAR(2)
E[Coef.] E[95% PI] E[Coef.] E[95% PI]
V(y1) 2.006 [ 1.638 ; 2.455] 2.007 [ 1.639 ; 2.456]
COV(y1,y2) - - 0.006 [-0.348 ; 0.361]
V(y2) 2.962 [ 2.419 ; 3.620] 2.965 [ 2.421 ; 3.633]
E[.] is the expected value of 100 simulations of the data.
We now turn to examine the results of the linear regression (31) for the
measurement we have formulated. From our findings about the estimated
posterior variance we expect the forecast error probability distributions of
the AR model to be tighter and more centred around zero than those of the
VAR model, i.e. the measurement should be smaller for the AR model than
for the VAR model. The expected value of the estimated posterior covariance
between y1 and y2 was close to zero, implying only a minimal increase of
uncertainty in the predictive distribution of y1 caused by the Cholesky de-
composition. Therefore, we expect to reject H0 and find that the parameter
ϕ is negative.
The results are presented in Table 4. For the variable y1, we conclude that
we reject H0 at one of the ten forecast horizons. We find that ϕ is negative
at the second forecast horizon as expected, i.e. the ninety-five percent prob-
ability interval does not cover zero. On average at this forecast horizon,
the predictive distribution of the AR model is more accurate than that of
the VAR model. For all other forecast horizons H0 cannot be rejected. For
the variable y2, we conclude that we reject H0 at one of the ten forecast
horizons. We find that ϕ is negative at the first forecast horizon as expected.
On average at this forecast horizon, the predictive distribution of the AR
model is more accurate than that of the VAR model. For all other forecast
horizons H0 cannot be rejected.
Overall, we cannot conclude that the AR model produces more accurate pre-
dictive distributions than the VAR model for the univariate simulated data.
It seems that the small increase in estimated posterior variance caused by the
inclusion of irrelevant variables is not large enough to distinguish between
the two models' predictive distributions.
Table 4: Regression Results.
Univariate Simulation of the Data
y1 y2
ϕ 95% PI ϕ 95% PI
h=1 -0.111 [-0.377 ; 0.155] -0.906 [-1.768 ; -0.063]
h=2 -0.527 [-0.976 ; -0.060] 0.080 [-0.536 ; 0.698]
h=3 -0.235 [-0.615 ; 0.132] -0.559 [-1.330 ; 0.233]
h=4 -0.197 [-0.568 ; 0.166] 0.261 [-0.263 ; 0.789]
h=5 -0.085 [-0.535 ; 0.358] -0.448 [-1.268 ; 0.358]
h=6 -0.016 [-0.384 ; 0.347] -0.157 [-0.597 ; 0.292]
h=7 0.161 [-0.245 ; 0.562] 0.387 [-0.060 ; 0.833]
h=8 0.042 [-0.359 ; 0.449] 0.186 [-0.128 ; 0.501]
h=9 0.004 [-0.434 ; 0.446] 0.005 [-0.264 ; 0.273]
h=10 0.060 [-0.381 ; 0.488] 0.135 [-0.148 ; 0.413]
PI stands for probability interval.
7.3 Multivariate Simulation Results
We start by examining the results of the estimated posterior variance and
covariance from the AR(2) and VAR(2) models for the multivariate simulations
of the data. The second column in Table 1 shows the expected outcomes.
The VAR model is correctly specified, while the AR model is misspecified,
which is expected to increase the estimated posterior variance.
The results are presented in Table 5. For the variable y1, which is not de-
termined by y2, we conclude that the expected value of the mean posterior
variance is equal to 1.980 for the AR and 1.983 for the VAR. We conclude
that the inclusion of irrelevant independent variables causes an increase in the
estimated posterior variance, which is the same conclusion as in the univariate
simulation of the data. For the variable y2, we conclude that the results follow
the statistical theory well. The expected value of the mean posterior variance
is 3.291 for the AR model and 3.025 for the VAR model. The increase in
estimated posterior variance due to misspecification is large. The expected
value of the mean posterior covariance in the VAR model is 1.821, which is
close to the covariance in equation (30), 0.75 × √6 ≈ 1.837. The choice of
elements in the parameter matrix B has balanced the correlation between y1
and y2 to the same correlation specified for the error terms. From the right
graph in Figure 3, we conclude that the correlation between y1 and y2 is on
average 0.745, but ranges approximately from 0.550 to 0.850.
Overall, we conclude that the estimated posterior variances follow the ex-
pected theory in Table 1. The increase in the estimated posterior variance
due to misspecification is large, as expected. We reach the same conclusion
about the inclusion of irrelevant variables as in the univariate simulation of
the data.
Table 5: Expected Values of Mean Posterior Variance/Covariance.
Multivariate Simulation of the Data
AR(2) VAR(2)
E[Coef.] E[95% PI] E[Coef.] E[95% PI]
V(y1) 1.980 [ 1.617 ; 2.420] 1.983 [ 1.619 ; 2.429]
COV(y1,y2) - - 1.821 [ 1.423 ; 2.303]
V(y2) 3.291 [ 2.689 ; 4.027] 3.025 [ 2.469 ; 3.703]
E[.] is the expected value of 100 simulations of the data.
We now turn to examine the results of the linear regression (31) for the mea-
surement. We expect ϕ to be negative for y1: the AR model does not suffer
from misspecification and the expected value of the mean posterior covari-
ance is large. Therefore we expect the predictive distributions of the VAR
model to be wide for y1 due to the Cholesky decomposition. From the finding
about the estimated posterior variance for y2, we expect to reject H0 and
find that the parameter ϕ is positive.
The results are presented in Table 6. For the variable y1, we conclude that we
reject H0 at all forecast horizons except the first. We find that ϕ is negative
for the second to tenth forecast horizons, as we expected due to the Cholesky
decomposition. On average, the forecast accuracy of the AR relative to the
VAR increases over the forecast horizons. For the variable y2, we conclude
that we reject H0 for three out of ten forecast horizons. We find that ϕ is
negative for the eighth to tenth forecast horizons, which is opposite to what
we expected. We also conclude that the magnitude of the differences is large
for the ninth and tenth forecast horizons.
Table 6: Regression Results.
Multivariate Simulation of the Data
y1 y2
ϕ 95% PI ϕ 95% PI
h=1 -0.029 [-0.159 ; 0.109] 0.272 [-0.050 ; 0.604]
h=2 -0.251 [-0.467 ; -0.038] 0.050 [-0.257 ; 0.349]
h=3 -0.321 [-0.553 ; -0.087] 0.536 [ 0.145 ; 0.938]
h=4 -0.289 [-0.563 ; -0.030] 0.172 [-0.192 ; 0.539]
h=5 -0.219 [-0.470 ; 0.039] 0.334 [-0.104 ; 0.792]
h=6 -0.338 [-0.597 ; -0.068] -0.257 [-0.640 ; 0.125]
h=7 -0.308 [-0.614 ; -0.005] -0.211 [-0.614 ; 0.203]
h=8 -0.333 [-0.598 ; -0.062] -0.492 [-0.910 ; -0.066]
h=9 -0.483 [-0.755 ; -0.209] -0.701 [-1.107 ; -0.293]
h=10 -0.466 [-0.718 ; -0.210] -0.905 [-1.289 ; -0.523]
PI stands for probability interval.
Table 7 shows the same analysis as before, but this time we have changed the
order of the variables when estimating the VAR model. For the variable y1
we conclude that we reject H0 at the second, fourth, fifth and ninth forecast
horizons. We find that ϕ is negative for the second forecast horizon and
positive for the fourth, fifth and ninth forecast horizons. For the variable y2
we conclude that we reject H0 for five out of the ten forecast horizons. We
find that ϕ is negative at the first, sixth, eighth, ninth and tenth forecast
horizons, as expected due to the Cholesky decomposition. Both these results
show the strong effect of the Cholesky decomposition. The variable y2
handles the extra uncertainty added to the predictive distribution better
than y1. This is most likely due to the misspecification of the AR model for
the y2 variable.
Table 7: Changed Order Of Variables in VAR.
Multivariate Simulation of the Data
y2 y1
ϕ 95% PI ϕ 95% PI
h=1 -2.054 [-3.041 ; -1.001] -0.940 [-1.956 ; 0.022]
h=2 -0.165 [-0.946 ; 0.614] -1.506 [-2.507 ; -0.480]
h=3 -0.415 [-0.836 ; 0.018] 0.132 [-0.282 ; 0.549]
h=4 -0.310 [-0.698 ; 0.086] 0.295 [ 0.077 ; 0.514]
h=5 -0.307 [-1.020 ; 0.409] 0.599 [ 0.088 ; 1.119]
h=6 -0.680 [-1.267 ; -0.089] 0.150 [-0.387 ; 0.666]
h=7 -0.433 [-0.924 ; 0.055] 0.388 [-0.008 ; 0.775]
h=8 -0.569 [-0.966 ; -0.168] 0.158 [-0.150 ; 0.463]
h=9 -0.829 [-1.290 ; -0.367] 0.388 [ 0.049 ; 0.726]
h=10 -0.936 [-1.369 ; -0.512] 0.178 [-0.238 ; 0.579]
PI stands for probability interval.
Overall, we can conclude that the AR model produces more accurate predic-
tive distributions than the VAR model for the multivariate simulated data.
This was expected for y1, due to the Cholesky decomposition and the fact
that neither the AR nor the VAR model was misspecified for this variable.
For y2, however, it seems that the misspecification is too small to have an
effect on the predictive distribution. Instead the VAR is outperformed by
the AR at longer horizons.
Summing up our results: from both the univariate and the multivariate re-
sults we conclude that the estimated posterior variances follow the statistical
theory of Table 1, but the magnitude of these effects is smaller than ex-
pected. When analyzing the predictive distribution by the measurement for
the univariate simulation of the data, we find it difficult to verify the in-
crease in estimated posterior variance in the predictive distributions predicted
by the statistical theory. Only at two horizons are we able to distinguish that
the AR models outperform the VAR models. This difficulty arises because
the increase in the estimated posterior variance is too small to separate the
predictive distributions. From the multivariate simulation of the data we
conclude that there is a large effect of the Cholesky decomposition in the
first equation of the estimated VAR model. We find that the VAR
model produces inferior predictive distributions of y1 at all but the first
horizon. This result is not as strong for y2, where five of the horizons produce
inferior predictive distributions. We also conclude that this effect increases
with the forecast horizon.
We conclude that autoregressive and vector autoregressive models with
a lag length of two have difficulty producing dissimilar predictive distribu-
tions. This has made it difficult to assess the accuracy of the measurement,
but by examining the effect of the Cholesky decomposition in VAR models
we are able to validate the accuracy of the measurement.
8. Conclusion
We conduct simulation studies to formulate a measurement that evaluates
the forecast accuracy of predictive distributions. We use Bayesian methods
to estimate posterior distributions and predictive distributions for the autoregres-
sive model and the vector autoregressive model. By the use of out-of-sample
forecasts and predictive distributions we are able to evaluate the full distri-
bution of forecast errors. We are also able to validate the accuracy of the
measurement, especially by allowing for correlated error terms across equa-
tions in the vector autoregressive model.
We formulate a measurement that uses out-of-sample forecasts and predic-
tive distributions to evaluate the full forecast error probability distribution by
forecast horizon. The measurement can be used as a forecast evaluation
technique for single forecasts or to calibrate forecast models. Furthermore,
we recommend that practitioners use the measurement together with several
other forecast evaluation techniques.
For further research we recommend that the measurement be evaluated
with models that are not from the same family (to ensure differences in
predictive distributions), with models that treat conditional heteroskedasticity
differently, in case studies of outliers such as financial crises, and against a
wide range of forecast evaluation techniques.
Appendix
A1. Gibbs Sampler
To explain the intuition behind the Gibbs sampler we borrow a summary
put forward by Ciccarelli and Rebucci (2003), with the mathematical no-
tation adapted to Section 3.1.
In many applications the analytical integration of p(β, σ²|yt) may be diffi-
cult or even impossible to implement. This problem, however, can often
be solved by using numerical integration based on Monte Carlo simulation
methods.
One particular method used in the literature to solve estimation problems
similar to those discussed here is the Gibbs sampler. The Gibbs sampler
is a recursive Monte Carlo method which only requires that the full
conditional posterior distributions of the parameters of interest, p(β|σ², yt)
and p(σ²|β, yt), are known. The Gibbs sampler starts from an arbi-
trary value for β^(0) or (σ²)^(0), and samples alternately from the density of
each element of the parameter vector, conditional on the value of the other
element sampled in the previous iteration and on the data. Thus, the Gibbs
sampler samples recursively as follows:
β^(1) from p(β | (σ²)^(0), yt)
(σ²)^(1) from p(σ² | β^(1), yt)
β^(2) from p(β | (σ²)^(1), yt)
(σ²)^(2) from p(σ² | β^(2), yt)
...
β^(m) from p(β | (σ²)^(m−1), yt)
(σ²)^(m) from p(σ² | β^(m), yt)

and so on.
The vectors ϑ^(m) = (β^(m), (σ²)^(m)) form a Markov chain and, for a suffi-
ciently large number of iterations (say m ≥ M), can be regarded as draws
from the true joint posterior distribution. Given a large sample of draws
from this limiting distribution, any posterior moment or marginal density
of interest can then be estimated consistently by the corresponding
sample average.
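For a regression with a conjugate Normal prior on β and an inverse-Gamma prior on σ², the scheme above can be sketched as follows. The conditional posteriors used here are the standard conjugate forms; the prior notation (b0, α, θ0) mirrors the text, while the function name, defaults and remaining details are illustrative.

```python
import numpy as np

def gibbs_ar(y, X, M=2000, burn=500, b0=None, H0=None, alpha=3.0,
             theta0=1.0, seed=0):
    """Gibbs sampler alternating between p(beta | sigma2, y) and
    p(sigma2 | beta, y) for a regression y = X beta + eps."""
    rng = np.random.default_rng(seed)
    T, k = X.shape
    b0 = np.zeros(k) if b0 is None else b0        # prior mean of beta
    H0 = np.eye(k) if H0 is None else H0          # prior variance of beta
    H0_inv = np.linalg.inv(H0)
    sigma2 = 1.0                                  # arbitrary starting value
    draws_b, draws_s = [], []
    for m in range(M):
        # beta | sigma2, y  ~  Normal(b_post, V_post)
        V_post = np.linalg.inv(H0_inv + X.T @ X / sigma2)
        b_post = V_post @ (H0_inv @ b0 + X.T @ y / sigma2)
        beta = rng.multivariate_normal(b_post, V_post)
        # sigma2 | beta, y  ~  inverse-Gamma((alpha + T)/2, (theta0 + RSS)/2)
        resid = y - X @ beta
        sigma2 = ((theta0 + resid @ resid) / 2.0) / rng.gamma((alpha + T) / 2.0)
        if m >= burn:
            draws_b.append(beta)
            draws_s.append(sigma2)
    return np.array(draws_b), np.array(draws_s)
```

For an AR(p) model, X would contain the lagged values of y; discarding the first `burn` iterations removes the dependence on the starting value, and posterior moments are then sample averages over the retained draws.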
References

Luc Bauwens, Gary Koop, Dimitris Korobilis, and Jeroen VK Rombouts. The
contribution of structural break models to forecasting macroeconomic series.
Journal of Applied Econometrics, 2014.

Andrew P Blake and Haroon Mumtaz. Applied Bayesian econometrics for
central bankers. Number 4 in Technical Books. Centre for Central Banking
Studies, Bank of England, 2012. URL http://ideas.repec.org/b/ccb/tbooks/4.html.

Matteo Ciccarelli and Alessandro Rebucci. BVARs: A Survey of the Recent
Literature with an Application to the European Monetary System. Rivista
di Politica Economica, 93(5):47–112, September 2003. URL
http://ideas.repec.org/a/rpo/ripoec/v93y2003i5p47-112.html.

A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari, and D.B.
Rubin. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC
Texts in Statistical Science. Taylor & Francis, 2013. ISBN 9781439840955.
URL http://books.google.se/books?id=ZXL6AQAAQBAJ.

John Geweke and Gianni Amisano. Comparing and evaluating Bayesian
predictive distributions of asset returns. Working Paper Series 0969, Euro-
pean Central Bank, November 2008. URL
http://ideas.repec.org/p/ecb/ecbwps/20080969.html.

John Geweke and Gianni Amisano. Prediction using several macroeconomic
models, 2012.

K Rao Kadiyala and Sune Karlsson. Forecasting with generalized Bayesian
vector autoregressions. Journal of Forecasting, 12(3-4):365–378, 1993.

K Rao Kadiyala and Sune Karlsson. Numerical Methods for Estimation
and Inference in Bayesian VAR-Models. Journal of Applied Econometrics,
12(2):99–132, March–April 1997. URL
http://ideas.repec.org/a/jae/japmet/v12y1997i2p99-132.html.

Sune Karlsson. Chapter 15: Forecasting with Bayesian vector autoregression.
In Graham Elliott and Allan Timmermann, editors, Handbook of Economic
Forecasting, volume 2, Part B, pages 791–897. Elsevier, 2013. doi:
10.1016/B978-0-444-62731-5.00015-4. URL
http://www.sciencedirect.com/science/article/pii/B9780444627315000154.

Gary Koop and Dimitris Korobilis. Bayesian multivariate time series methods
for empirical macroeconomics. Now Publishers Inc, 2010.

Dongchu Sun and Shawn Ni. Bayesian analysis of vector-autoregressive models
with noninformative priors. Journal of Statistical Planning and Inference,
121(2):291–309, 2004.

Anders Warne, Günter Coenen, and Kai Christoffel. Predictive likelihood
comparisons with DSGE and DSGE-VAR models. Working Paper Series
1536, European Central Bank, April 2013. URL
http://ideas.repec.org/p/ecb/ecbwps/20131536.html.
International Journal of Mathematics and Statistics Invention (IJMSI) International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)
inventionjournals
 
Edison S Statistics
Edison S StatisticsEdison S Statistics
Edison S Statistics
teresa_soto
 
A review of statistics
A review of statisticsA review of statistics
A review of statistics
edisonre
 
journal in research
journal in research journal in research
journal in research
rikaseorika
 
research journal
research journalresearch journal
research journal
rikaseorika
 
published in the journal
published in the journalpublished in the journal
published in the journal
rikaseorika
 
SLR Assumptions:Model Check Using SPSS
SLR Assumptions:Model Check Using SPSSSLR Assumptions:Model Check Using SPSS
SLR Assumptions:Model Check Using SPSS
Nermin Osman
 
STRUCTURAL EQUATION MODEL (SEM)
STRUCTURAL EQUATION MODEL (SEM)STRUCTURAL EQUATION MODEL (SEM)
STRUCTURAL EQUATION MODEL (SEM)
AJHSSR Journal
 
ALPHA LOGARITHM TRANSFORMED SEMI LOGISTIC DISTRIBUTION USING MAXIMUM LIKELIH...
ALPHA LOGARITHM TRANSFORMED SEMI LOGISTIC  DISTRIBUTION USING MAXIMUM LIKELIH...ALPHA LOGARITHM TRANSFORMED SEMI LOGISTIC  DISTRIBUTION USING MAXIMUM LIKELIH...
ALPHA LOGARITHM TRANSFORMED SEMI LOGISTIC DISTRIBUTION USING MAXIMUM LIKELIH...
BRNSS Publication Hub
 
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning ModelsEmpirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
mlaij
 
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning ModelsEmpirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
mlaij
 
Review Parameters Model Building & Interpretation and Model Tunin.docx
Review Parameters Model Building & Interpretation and Model Tunin.docxReview Parameters Model Building & Interpretation and Model Tunin.docx
Review Parameters Model Building & Interpretation and Model Tunin.docx
carlstromcurtis
 
ProjectWriteupforClass (3)
ProjectWriteupforClass (3)ProjectWriteupforClass (3)
ProjectWriteupforClass (3)
Jeff Lail
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzer
Dennis Sweitzer
 
binary logistic assessment methods and strategies
binary logistic assessment methods and strategiesbinary logistic assessment methods and strategies
binary logistic assessment methods and strategies
mikaelgirum
 
International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI) International Journal of Mathematics and Statistics Invention (IJMSI)
International Journal of Mathematics and Statistics Invention (IJMSI)
inventionjournals
 
Edison S Statistics
Edison S StatisticsEdison S Statistics
Edison S Statistics
teresa_soto
 
A review of statistics
A review of statisticsA review of statistics
A review of statistics
edisonre
 
Ad

Evaluating competing predictive distributions

  • 1. Evaluating competing predictive distributions. An out-of-sample forecast simulation study. Bachelor's Thesis in Statistics. Andreas C. Collett∗ January 7, 2015. Abstract† This thesis aims to formulate a simple measurement that evaluates predictive distributions of out-of-sample forecasts between two competing models. Predictive distributions form a large part of today's forecast models used for policy making. The possibility to compare predictive distributions between models is important for policy makers who make informed decisions based on probabilities. We conduct simulation studies to estimate autoregressive models and vector autoregressive models with Bayesian inference. The formulated measurement uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution by forecast horizon. We find the measurement to be accurate; it can be used to evaluate single forecasts or to calibrate forecast models. Keywords: autoregressive, out-of-sample forecast, Bayesian inference, Gibbs sampling, prior distribution, posterior distribution and predictive distribution. Department of Statistics, autumn semester 2014, course code SU-39434. ∗ Correspondence to author: [email protected]. † I am deeply grateful to my supervisor Professor Emeritus Daniel Thorburn for his commitment, time, notes and discussions.
  • 2. Contents: 1. Introduction 2; 2. Related Literature 4; 3.1 Bayesian Inference 5; 3.2 Autoregression 7; 3.3 Vector Autoregression 11; 4. Evaluating the Predictive Distribution 13; 5. Simulation Study 15; 6. Hyperparameters 17; 7.1 Results 19; 7.2 Univariate Simulation Results 20; 7.3 Multivariate Simulation Results 22; 8. Conclusion 26; Appendix 27; A1. Gibbs Sampler 27.
  • 3. 1. Introduction. This thesis aims to formulate a simple measurement that evaluates predictive distributions of out-of-sample forecasts between two competing models. Predictive distributions form a large part of today's forecast models used for policy making. The possibility to compare predictive distributions between models is important for policy makers who make informed decisions based on probabilities. Out-of-sample forecasts are used to mimic the situation forecasters experience in real time, and are used by academics in forecast methodology research and by practitioners to calibrate forecast models. By combining predictive distributions and out-of-sample forecasts one can evaluate the forecast error probability distribution. Earlier forecast evaluation literature has tended to focus on point forecasts, either taken directly from a model or from a certain value in the predictive distribution, rather than evaluating the full predictive distribution. This results in a loss of information about the uncertainty of the forecasts and the forecast model. The contribution of this thesis is to formulate a simple measurement that uses this information to evaluate forecasts at multiple horizons. There is recent research that addresses this subject in various forms: Geweke and Amisano (2008, 2012), Warne et al. (2013) and Bauwens et al. (2014). However, this literature is small relative to the literature on evaluating point forecasts. We generate data samples from univariate autoregressions (AR) and a multivariate vector autoregression (VAR) with known true parameters. We use Bayesian methods to estimate AR and VAR models on these data to obtain posterior inference and predictive distributions. A restrictive Gibbs sampler is implemented to conduct the Bayesian inference. These methods allow us to produce predictive distributions and to explore the theory and application of Bayesian analysis through a simple example.
The restrictive Gibbs sampler is a popular method for obtaining posterior inference in time series analysis. The thesis therefore gives an introduction to Bayesian analysis and its application in time series. The structure of the simulated data allows us to use simple statistical theory to evaluate the posterior inference, summarized in Table 1, and the use of out-of-sample forecasts allows us to evaluate which model produces the most accurate predictive distribution. The diagonal of Table 1 represents the situation where the simulated data are estimated with the correct model. The lower left outcome occurs when the data are simulated from an AR model but estimated with a VAR model. In this case the VAR model will not suffer from misspecification but will include irrelevant independent variables. This will
  • 4. cause a (small) increase in the estimated variance, which in turn will lead to a wider predictive distribution. The upper right outcome occurs when the data are simulated from a VAR model but estimated with an AR model. In this case the AR model will be misspecified, i.e. the model suffers from omitted variable bias. This will cause a (large) increase in the estimated variance, which in turn will lead to a wider predictive distribution. The formulated measurement accounts for both the size of the forecast error and the probability that this forecast error would occur.
Table 1: Underlying Statistical Theory.
                 Simulation of Data
Model     Univariate                          Multivariate
AR(p)     Optimal                             Misspecification
VAR(p)    Irrelevant Independent Variables    Optimal
We formulate a measurement that uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution by forecast horizon. We are able to validate the accuracy of the measurement against statistical theory, but we find that the autoregressive model and the vector autoregressive model with the same lag length have difficulty producing dissimilar predictive distributions. However, we are able to separate the predictive distributions of the models by letting both be correctly specified but allowing for a high degree of correlation between the error terms across equations in the vector autoregressive model. From this we find that the formulated measurement is able to measure the accuracy of the full forecast error distribution. The measurement can be used as a forecast evaluation technique for single forecasts or to calibrate forecast models. The rest of the thesis is structured as follows. Section 2 presents the empirical research closest to the research question. Section 3 describes the empirical methodology. Section 4 describes the evaluation method of the predictive distribution.
Section 5 describes the simulation method. Section 6 accounts for the selection of the hyperparameters in the priors. Section 7 discusses the results and model comparisons.
  • 5. 2. Related Literature. We present two articles that evaluate predictive distributions. The methods used are technical and we will not go into depth describing them. Instead we mention the methods and give a brief summary of the results. Interested readers are referred to the articles under consideration. Geweke and Amisano (2012) compare the forecast performance of, and construct model combinations for, three models: the dynamic factor model, the dynamic stochastic general equilibrium model and the vector autoregressive model. They use several analytical techniques to evaluate forecast performance and to construct model combinations: pooling of predictive densities, analysis of predictive variances, probability integral transform tests and Bayesian model averaging. They find two improvements that increase forecast accuracy substantially. The first improvement is to use the full Bayesian predictive distribution instead of the posterior mode for the parameters. The second improvement is to construct the model combination by equally weighted pooling of the predictive densities from the three models, instead of relying on the individual predictive distribution from each model. This result is considerably better than when Bayesian model averaging is used for the same purpose. Bauwens et al. (2014) compare the forecast performance of two models that allow for structural breaks against a wide range of alternative models which do not. They evaluate forecast performance by two metrics. First, they use root mean squared forecast errors (RMSE) to evaluate point forecasts, with the median of the predictive distribution used as the point forecast. Second, they use the average of log predictive likelihoods (APL), which is the predictive density evaluated at the observed outcome. The APL is estimated by a nonparametric kernel smoother, using draws from the predictive simulator.
They find that no single model is consistently better than the alternatives in the presence of structural breaks. One source of this uncertainty about forecast performance is that the two metrics yield substantially different conclusions. They find that the structural break models seem to dominate the non-structural break models in terms of RMSE, but the opposite is often true in terms of APL.
  • 6. 3.1 Bayesian Inference. To describe Bayesian inference the simple linear regression model will be examined. Consider the model y_t = X_t β + ε_t (1), where y_t is a T × 1 vector representing the dependent variable, X_t is a T × K matrix, ε_t ~ iid N(0, σ²), T is the number of observations and K is the number of independent variables. Our purpose is to obtain estimates of the K × 1 vector β and the scalar σ². These can be obtained by maximizing the likelihood function l(y_t | β, σ²) = (2πσ²)^(−T/2) exp[−(y_t − X_t β)′(y_t − X_t β) / (2σ²)] (2), which yields the maximum likelihood estimates³ β̂_MLE = (X_t′X_t)⁻¹(X_t′y_t) and σ̂²_MLE = (y_t − X_t β̂_MLE)′(y_t − X_t β̂_MLE)/T. According to the likelihood principle the likelihood function contains all the information in the data about the parameters β and σ². This is where the difference between classical (or frequentist) inference and Bayesian inference becomes apparent. Bayesian inference incorporates prior beliefs about the parameters into the estimation process in the form of probability distributions. This results in the joint posterior distribution p(β, σ² | y_t) = l(y_t | β, σ²)p(β, σ²)/p(y_t) ∝ l(y_t | β, σ²)p(β, σ²) (3), where p(β, σ²) is the prior distribution and p(y_t) is the density of the data, or marginal likelihood. The marginal likelihood, p(y_t), does not depend on β or σ² and can thus be considered a constant. This yields the unnormalized joint posterior distribution, which is proportional to the likelihood function times the prior distribution. However, Karlsson (2013) stresses the importance of the marginal likelihood for model comparison. Several conclusions can be drawn from the joint posterior distribution. First, the joint posterior distribution represents the probability distribution of the parameters β and σ² when the prior distribution has been updated with the information in the observed data, y_t.
Second, if the prior distribution is vague (or flat) then it can be considered almost constant; this causes the estimates to ³ Note that β̂_MLE is equal to the ordinary least squares estimator, β̂_OLS, while σ̂²_MLE is a biased estimate of the variance because it does not deduct the number of estimated parameters from the number of observations in the denominator, as is done in σ̂²_OLS.
  • 7. be similar to those of classical inference, i.e. the likelihood function will determine the estimates. This also occurs when the information in the data is rich, i.e. a large number of observations. There is a large literature on Bayesian inference in macroeconomics, where the data tend to have a small number of observations and the model requires a large number of parameters to be estimated, for example the vector autoregressive model. Given the joint posterior distribution, the marginal posterior distributions conditional on the data, p(β | y_t) and p(σ² | y_t), can be obtained by integrating σ² and β out of the joint posterior distribution, one at a time: p(β | y_t) = ∫ p(β, σ² | y_t) dσ² (4) and p(σ² | y_t) = ∫ p(β, σ² | y_t) dβ (5). For the simple regression model specified there exist analytical (or closed form) results for the integrals. But for more complex models or particular prior distributions there may not exist analytical results for the integrals. Then numerical or simulation techniques are required to obtain estimates of β and σ², such as Markov Chain Monte Carlo (MCMC) simulation. We will impose the restriction of stability on our AR and VAR processes, i.e. we are restricted to evaluating a certain range of the distribution of β. This will be implemented by the Gibbs sampler (see Appendix A1), which will allow us to sample from this range of the distribution. Even if analytical results exist and no restrictions are imposed, there are still situations where simulation is suitable; this is the case in forecasting. Forecasts with a horizon beyond the one-step-ahead horizon are nonlinear and can only be obtained by simulation. The essential key in Bayesian inference is the prior belief of the researcher, i.e. the prior distribution, p(β, σ²), in (3).
The prior distribution allows the researcher to address the uncertainty about the parameters before the data have been taken into account; this is done by specifying a probability distribution for each parameter. Prior distributions are classified into two categories, noninformative and informative. Noninformative prior distributions are used when the researcher does not have prior beliefs about the parameters, when the prior beliefs exist with a third party, i.e. are not known, and for scientific reports where differences in prior beliefs could affect the result. The noninformative prior distribution puts a uniform distribution on the parameters, which forces the estimates to be determined by the data while still reaping the benefits of Bayesian analysis. Informative prior distributions are used when the researcher has prior beliefs about the
  • 8. parameters and incorporates these beliefs into the prior distribution. This is accomplished by assigning hyperparameters⁴ or by restricting the parameter range. It is typically difficult to assess the prior belief in practice; therefore it is essential that the joint posterior distribution is proper and that the posterior inference is assessed with sensitivity analysis. According to Sun and Ni (2004), there exist situations in which the posterior is improper even though the full conditional distributions used for MCMC are all proper. One reason for using Bayesian inference is that it produces predictive distributions. This enables assessment of the probability of an outcome, which is more coherent with policy decisions than evaluating a certain point forecast of a model⁵. The essence of Bayesian inference is that the predictive distribution accounts for the uncertainty about the future and that the joint posterior distribution accounts for the uncertainty about the parameters: p(y_{t+1:t+h}) = ∫∫ f(y_{t+1:t+h} | y_t, β, σ²) p(β, σ² | y_t) dβ dσ² (6). This results in a predictive distribution of forecasts p(y_{t+h}) at each forecast horizon h. The predictive distribution enables the researcher to address the probability that a certain outcome will occur. This is useful in many ways: the predictive distribution can be described by measures of central tendency and used to assess distressed scenarios by evaluating quantiles of the predictive distribution. 3.2 Autoregression. We will use Bayesian methods to estimate the parameters in an autoregressive process, AR(p), where p is the number of lags to use for the univariate time series y_t: y_t = Σ_{i=1}^{p} β_i y_{t−i} + ε_t, t = 1, ..., T (7), where ε_t ~ iid N(0, σ²). This model is the same as in (1), where X_t is a T × p matrix (without a constant) consisting of p lags of the time series y_t.
⁴ The hyperparameters represent the parameters in the prior distribution and are called hyperparameters to distinguish them from the parameters in the model. ⁵ There are methods in the frequentist view to generate an approximate predictive distribution, for example by bootstrapping. These distributions are, however, mostly tighter than the Bayesian predictive distribution because they do not take into account the uncertainty about the parameters. These methods will not be evaluated in this thesis.
  • 9. An important difference between the models in (1) and (7) is that in the AR model the dependent variable is not independently and identically distributed; y_t depends on past values of itself. The Normal-Gamma prior distribution is the conjugate prior for a normal distribution: the posterior distribution and the prior distribution are in the same family of distributions. In the Normal-Gamma prior distribution, the parameter vector, β, is normally distributed conditional on the variance, σ², and the variance follows the (inverse) Gamma distribution. The prior mean is β | σ² ~ N(β₀, σ²H̄) (8), where β₀ is a p × 1 vector representing the researcher's prior belief about the parameter values. The prior mean variance-covariance matrix, H, is equal to the researcher's prior belief about the variance, σ², times the p × p diagonal matrix H̄ representing the researcher's uncertainty about the parameters. Larger values on the diagonal of σ²H̄ result in larger variances around the prior means. The prior variance is σ² ~ iΓ(α/2, θ₀/2) (9), where α represents the prior degrees of freedom and θ₀ represents the prior scale parameter. Holding the prior degrees of freedom fixed and letting the prior scale increase results in an (inverse) Gamma distribution with an increasing mean, i.e. the prior belief about the value of σ² increases. Holding the prior scale fixed and letting the prior degrees of freedom increase results in an (inverse) Gamma distribution that is more tightly centred around the mean, i.e. the prior belief about σ² becomes tighter. This is illustrated in Figure 1. Specifying the prior belief depends on several factors. Practitioners set the prior beliefs to their own views, while researchers tend to set them to the OLS estimates to let the data influence the estimates more than the prior beliefs. This is viewed as the more coherent and accepted academic approach.
But it also depends on the number of observations and parameters: if there is a large number of observations relative to the number of parameters, the influence of the data will be stronger. If there is a small number of observations relative to the number of parameters, known as overparametrization in the literature, then one must specify a strong prior.
  • 10. Figure 1: The (left) figure illustrates the effect on the (inverse) Gamma distribution as the scale parameter θ₀ takes the values {1, 2, 3, 4}, holding the degrees of freedom constant at α = 1. The (right) figure illustrates the effect on the (inverse) Gamma distribution as the degrees of freedom α takes the values {1, 2, 3, 4}, holding the scale parameter constant at θ₀ = 1. The conditional posterior distributions of β and σ² are p(β | σ², y_t) = N(M, V) (10) and p(σ² | β, y_t) = iΓ(τ₁/2, θ₁/2) (11), where M = (H̄⁻¹ + (1/σ²)X_t′X_t)⁻¹(H̄⁻¹β₀ + (1/σ²)X_t′y_t) (12), V = (H̄⁻¹ + (1/σ²)X_t′X_t)⁻¹ (13), τ₁ = α + T (14), θ₁ = θ₀ + (y_t − X_tβ)′(y_t − X_tβ) (15), and β₀, H̄, α and θ₀ are hyperparameters specified by the researcher. Note that there exist analytical results for the Normal-Gamma prior distribution, but we will use the Gibbs sampler to obtain parameter estimates and predictive distributions. Table 2 describes the implemented restrictive Gibbs sampler.
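As a concrete sketch of the conditional posterior moments in (12)–(15), the following Python snippet computes M, V, τ₁ and θ₁ for a small regression. This is an illustrative reconstruction, not code from the thesis; the data and hyperparameter values are assumptions.

```python
import numpy as np

# Illustrative data and hyperparameters (assumptions, not from the thesis).
rng = np.random.default_rng(1)
T, p = 200, 2
X = rng.normal(size=(T, p))                 # T x p regressor matrix
y = X @ np.array([0.4, 0.3]) + rng.normal(size=T)

beta0 = np.zeros(p)                         # prior mean beta_0 in (8)
Hbar = 10.0 * np.eye(p)                     # prior uncertainty matrix H-bar in (8)
alpha, theta0 = 1.0, 1.0                    # prior degrees of freedom and scale in (9)
sigma2 = 1.0                                # conditioning value of sigma^2
beta = np.array([0.4, 0.3])                 # conditioning value of beta

Hinv = np.linalg.inv(Hbar)
V = np.linalg.inv(Hinv + X.T @ X / sigma2)             # eq. (13)
M = V @ (Hinv @ beta0 + X.T @ y / sigma2)              # eq. (12)
tau1 = alpha + T                                       # eq. (14)
theta1 = theta0 + (y - X @ beta) @ (y - X @ beta)      # eq. (15)
```

With a vague prior (large diagonal of H̄), M lies very close to the OLS estimate, consistent with the text's remark that a flat prior lets the likelihood determine the estimates.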
  • 11. Table 2: Restrictive Gibbs Sampler for an AR(p) model. To illustrate the Gibbs sampler we examine the first sample, m = 1. We start by sampling β⁽¹⁾ from p(β | (σ²)⁽⁰⁾, y_t) in (10). It is important to note that (12) and (13) depend on σ² and that the Gibbs sampler needs an initial value for this parameter, denoted (σ²)⁽⁰⁾, which is specified by the researcher. We set (σ²)⁽⁰⁾ to the OLS estimate σ̂². Having obtained M in (12) and V in (13), we can sample β⁽¹⁾ from p(β | (σ²)⁽⁰⁾, y_t) by β̂⁽¹⁾ = M + [r(V)^(1/2)]′, where r is a 1 × p vector of draws from the standard normal distribution and (V)^(1/2) is the Cholesky decomposition of V. We impose the restriction that β̂⁽¹⁾ must come from a stable AR process, i.e. all roots z of the polynomial β_p(z) = 1 − β₁z − β₂z² − ... − β_p z^p must have modulus greater than one, |z| > 1. Once β̂⁽¹⁾ is obtained we can sample (σ²)⁽¹⁾ from p(σ² | β⁽¹⁾, y_t), the inverse Gamma distribution in (11). Note that the posterior degrees of freedom τ₁ in (14) and the posterior scale parameter θ₁ in (15) require the researcher to specify α and θ₀. A sample from the inverse Gamma distribution is constructed as (σ̂²)⁽¹⁾ = θ₁/(x₀x₀′), where x₀ is a 1 × τ₁ vector of draws from the standard normal distribution. A sample from the predictive distribution at forecast horizon h is constructed as ŷ⁽¹⁾_{t+h} = Σ_{i=1}^{h−1} β̂⁽¹⁾_i ŷ⁽¹⁾_{t+h−i} + Σ_{i=h}^{p} β̂⁽¹⁾_i y_{t+h−i} + ε̂⁽¹⁾_{t+h}, where ε̂⁽¹⁾_{t+h} = r √((σ̂²)⁽¹⁾) and r is a single draw from the standard normal distribution. This process is repeated for M iterations until we have obtained β⁽¹⁾, ..., β⁽ᴹ⁾, (σ²)⁽¹⁾, ..., (σ²)⁽ᴹ⁾ and ŷ⁽¹⁾_{t+h}, ..., ŷ⁽ᴹ⁾_{t+h}. The first 1, ..., B iterations are discarded, so that β⁽ᴮ⁺¹⁾, ..., β⁽ᴹ⁾, (σ²)⁽ᴮ⁺¹⁾, ..., (σ²)⁽ᴹ⁾ and ŷ⁽ᴮ⁺¹⁾_{t+h}, ..., ŷ⁽ᴹ⁾_{t+h} are used for the empirical distributions. The iterations 1, ..., B are known as burn-in iterations and are required for the Gibbs sampler to converge.
There is, however, no guarantee that the Gibbs sampler will perform well or that it will converge. In this thesis we set M = 4,000 and B = 1,000 to ensure convergence.
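The steps of Table 2 can be sketched in Python as follows. This is an illustrative re-implementation under assumed defaults (β₀ = 0, H̄ = 10I, α = θ₀ = 1), not the author's code; the stability restriction is checked through the companion-matrix eigenvalues, which is equivalent to the polynomial-root condition in the text.

```python
import numpy as np

def restrictive_gibbs_ar(y, p=2, M=4000, B=1000, alpha=1.0, theta0=1.0,
                         h=4, seed=0):
    """Restrictive Gibbs sampler for an AR(p): draw beta from (10) subject
    to stability, sigma2 from (11), and simulate h-step predictive draws."""
    rng = np.random.default_rng(seed)
    T = len(y) - p
    # Row t of X holds lags 1..p of y for target y_t.
    X = np.column_stack([y[p - i - 1:len(y) - i - 1] for i in range(p)])
    yt = y[p:]
    beta0, Hinv = np.zeros(p), np.linalg.inv(10.0 * np.eye(p))
    # Initialize sigma2 at the OLS residual variance, as in Table 2.
    sigma2 = float(np.var(yt - X @ np.linalg.lstsq(X, yt, rcond=None)[0]))
    betas, sig2s, preds = [], [], []
    for m in range(M):
        V = np.linalg.inv(Hinv + X.T @ X / sigma2)            # eq. (13)
        Mean = V @ (Hinv @ beta0 + X.T @ yt / sigma2)         # eq. (12)
        L = np.linalg.cholesky(V)
        while True:  # reject draws from unstable AR processes
            beta = Mean + L @ rng.standard_normal(p)
            comp = np.zeros((p, p))
            comp[0] = beta
            comp[1:, :-1] = np.eye(p - 1)
            if np.max(np.abs(np.linalg.eigvals(comp))) < 1.0:
                break
        theta1 = theta0 + (yt - X @ beta) @ (yt - X @ beta)   # eq. (15)
        x0 = rng.standard_normal(int(alpha + T))              # tau1 draws
        sigma2 = theta1 / (x0 @ x0)                           # inverse-Gamma draw
        path = list(y[-p:])                                   # predictive simulation
        for _ in range(h):
            path.append(sum(beta[i] * path[-1 - i] for i in range(p))
                        + rng.standard_normal() * np.sqrt(sigma2))
        if m >= B:                                            # keep post burn-in draws
            betas.append(beta), sig2s.append(sigma2), preds.append(path[p:])
    return np.array(betas), np.array(sig2s), np.array(preds)
```

A typical call would be `betas, s2, preds = restrictive_gibbs_ar(data, p=2, M=4000, B=1000)`, after which each column of `preds` is an empirical predictive distribution for one horizon.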
  • 12. 3.3 Vector Autoregression. We will use Bayesian methods to estimate a vector autoregressive process, VAR(p), where p is the number of lags to use for each time series. The VAR model is a system-of-equations model that allows the endogenous variables to simultaneously affect each other. Furthermore, the error terms can be correlated across equations. A structural shock in one of the error terms may cause a shock in all error terms, causing contemporaneous movement in all endogenous variables. The VAR(p) with n endogenous variables, without constants, deterministic or exogenous variables, is defined as y_t = Σ_{i=1}^{p} B_i y_{t−i} + ε_t, t = 1, ..., T (16), where y_t is an n × 1 vector, B_i is an n × n matrix, ε_t is an n × 1 vector and ε_t ~ iid N(0, Σ). The endogenous variables in the VAR model are not iid; they depend on past values of y_t. We can express (16) in compact matrix form if we define x_t = (y_{t−1}, ..., y_{t−p}): Y_t = X_tB + E_t (17), where Y_t and E_t are T × n matrices, X_t = (x_1, ..., x_T) is a T × np matrix and B = (B_1, ..., B_p) is an np × n matrix. Note that the parameter matrix can be stacked into an n(np) × 1 vector by b = vec(B). The Normal-Wishart prior distribution is the conjugate prior for a multivariate normal distribution: the posterior distribution and the prior distribution are in the same family of distributions. In the Normal-Wishart prior distribution, the parameter vector, b, is normally distributed conditional on the variance-covariance matrix, Σ, and the variance-covariance matrix follows the (inverse) Wishart distribution: p(b | Σ) ~ N(b₀, Σ ⊗ H̄) (18) and p(Σ) ~ iW(S̄, α) (19), where ⊗ is the Kronecker product and b₀ represents the researcher's prior belief about the parameter values. We follow Kadiyala and Karlsson (1993, 1997) in specifying the matrix H̄, the prior scale matrix S̄ and the prior degrees of freedom α. The np × np diagonal matrix H̄ has diagonal elements equal to λ₀λ₁ / (l^{λ₂} s²_i),
  • 13. where l refers to the lag length, s²_i refers to the OLS estimate of the variance from an AR(p) model, and i refers to the endogenous variable in the ith equation. The prior scale is an n × n diagonal matrix with diagonal equal to (α − n − 1)λ₀⁻¹s²_i, and the prior degrees of freedom satisfy α = max{n + 2, n + 2h − T} (20) to ensure existence of the prior variances of the regression parameters and the posterior variances of the predictive distribution at forecast horizon h. Following the guidelines of Kadiyala and Karlsson (1993, 1997) we only need to specify the hyperparameters b₀, λ₀, λ₁ and λ₂. The interpretation of the λ hyperparameters is as follows: λ₀ controls the overall tightness of the prior on the covariance matrix, λ₁ controls the tightness of the prior on the coefficients on the first lag, and λ₂ controls the degree to which coefficients on longer lags are shrunk more tightly towards zero. The prior mean variance-covariance matrix, H, is obtained by V(b) = (α − n − 1)⁻¹ S̄ ⊗ H̄; due to the imposed Kronecker structure we are not able to specify individual prior variances and standard deviations. Instead we are forced to treat all equations symmetrically. The conditional posterior distributions of b and Σ are p(b | Σ, Y_t) ~ N(M, V) (21) and p(Σ | b, Y_t) ~ iW(Σ̄, T + α) (22), where M = (H⁻¹ + Σ⁻¹ ⊗ X_t′X_t)⁻¹(H⁻¹b₀ + (Σ⁻¹ ⊗ X_t′X_t)b̂) (23), V = (H⁻¹ + Σ⁻¹ ⊗ X_t′X_t)⁻¹ (24), Σ̄ = S̄ + (Y_t − X_tB)′(Y_t − X_tB) (25), and b̂ is the OLS estimate of b. Kadiyala and Karlsson (1997) and Karlsson (2013) provide the analytical results for this prior, but we will use the Gibbs sampler to obtain parameter estimates and predictive distributions. The restrictive Gibbs sampler implemented in Table 2 is essentially the same for the Normal-Wishart prior: b̂ is sampled from (21) and Σ̂ is sampled from (22).
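To make the prior construction concrete, here is a small sketch of the H̄ diagonal, the prior scale diagonal and the prior degrees of freedom in (20). The function names, argument order and example values are assumptions for illustration, not code from the thesis.

```python
import numpy as np

def hbar_diagonal(s2, p, lam0, lam1, lam2):
    """Diagonal of the np x np matrix H-bar: lam0*lam1 / (l**lam2 * s_i^2)
    for lag l = 1..p, where s2 holds the AR(p) residual variances s_i^2,
    one per endogenous variable i = 1..n."""
    return np.array([lam0 * lam1 / (l ** lam2 * si2)
                     for l in range(1, p + 1) for si2 in s2])

def prior_scale_diagonal(s2, alpha, lam0):
    """Diagonal of the n x n prior scale S-bar: (alpha - n - 1) * s_i^2 / lam0."""
    n = len(s2)
    return (alpha - n - 1) * np.asarray(s2) / lam0

def prior_dof(n, h, T):
    """Prior degrees of freedom, eq. (20): alpha = max{n + 2, n + 2h - T}."""
    return max(n + 2, n + 2 * h - T)
```

Note how `hbar_diagonal` shrinks longer lags harder: for `lam2 = 1` the prior variance on lag l coefficients falls off as 1/l, matching the stated role of λ₂.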
  • 14. There is, however, one step in the Gibbs sampler for the VAR model that will affect the predictive distributions. This is due to the fact that the predictive distribution accounts for the uncertainty about the future, combined with the fact that the VAR model allows the error terms to be correlated across the equations. The Gibbs sampler will draw a sample m from the predictive distribution at forecast horizon h: ŷ⁽ᵐ⁾_{t+h} = Σ_{i=1}^{h−1} B̂⁽ᵐ⁾_i ŷ⁽ᵐ⁾_{t+h−i} + Σ_{i=h}^{p} B̂⁽ᵐ⁾_i y_{t+h−i} + ε̂⁽ᵐ⁾_{t+h}, where ε̂⁽ᵐ⁾_{t+h} = r[Σ̂⁽ᵐ⁾]^(1/2) and r is a 1 × n vector of draws from the standard normal distribution. The term [Σ̂⁽ᵐ⁾]^(1/2) is the square root of the estimated variance-covariance matrix obtained by the Cholesky decomposition, so that the upper triangle of [Σ̂⁽ᵐ⁾]^(1/2) consists of zeros. Therefore, the order of the variables in the VAR model is important. For example, in the bivariate VAR model the first equation will have two elements of uncertainty added to the forecast for each draw, while the second equation will have only one element of uncertainty added to the forecast at each draw. Therefore, there will be more uncertainty added to the predictive distribution of y₁ than of y₂. 4. Evaluating the Predictive Distribution. To explain the measurement, consider a predictive distribution at a specific forecast horizon, h. The problem is to structure a measurement that accounts for two factors: (1) the accuracy of the forecasts⁶ and (2) the probability that the forecasts occur. The most common method to examine forecast accuracy is to use out-of-sample forecasts. This allows the researcher to mimic a real-time situation. The observed data are divided into two sets: data for parameter estimation and actual values for forecast evaluation. The forecast error for each observation in the predictive distribution at h is ŷ⁽ᵐ⁾_{t+h} − yᵃ_{t+h}, where ŷ⁽ᵐ⁾_{t+h} is the mth Gibbs sample in the predictive distribution at horizon h and yᵃ_{t+h} is the actual value corresponding to the forecast.
6 Note: when we use the terminology forecast, we refer to one element within the predictive distribution unless otherwise specified.
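The h-step predictive draw described above can be sketched as an iteration that feeds earlier forecast steps back in and adds a correlated shock at every step. The parameter values below are hypothetical; the shock is formed as ε = L r with L the lower-triangular Cholesky factor of Σ̂, so how the n independent normal draws are attributed across equations depends on the variable ordering, which is the ordering effect discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)

# One retained draw (m) of the VAR(2) parameters (hypothetical values, n = 2).
B1 = np.array([[0.5, 0.0], [0.2, 0.3]])
B2 = np.array([[0.1, 0.0], [0.1, 0.2]])
Sigma = np.array([[2.0, 1.5], [1.5, 3.0]])
L = np.linalg.cholesky(Sigma)     # lower-triangular square root of Sigma

y_tm1, y_t = np.array([0.8, 0.2]), np.array([1.0, 0.5])  # last two observations

# Iterate the forecast forward h steps; lags beyond the horizon use actual
# data, lags inside the horizon use the previously drawn forecast steps.
h = 4
hist = [y_tm1, y_t]
for _ in range(h):
    r = rng.standard_normal(2)        # n independent standard normal draws
    eps = L @ r                       # correlated shock for this step
    hist.append(B1 @ hist[-1] + B2 @ hist[-2] + eps)

forecast_path = np.array(hist[2:])    # one h x n predictive-path draw
```

Repeating this for every retained Gibbs draw yields the predictive distribution at each horizon.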
The most common method to visualize a distribution is the histogram, which will serve as a tool to describe the concept of the measurement. The data is classified into bins represented by rectangles, where the height of each rectangle represents the number of data points within the interval of the bin. The histogram can be normalized to represent the probability of each bin, with the condition that the probabilities of all bins sum to one. In Figure 2 we can see the forecast error probability distributions of two competing models; it is clear that the left graph is more accurate than the right graph. We can conclude that these graphs serve our purpose, i.e. we can determine which graph is most accurate by examining the probabilities of the forecast error.

Figure 2: The (left) graph is the forecast error probability distribution of the AR(2) model at h = 2 for the univariate simulation of y2. The (right) graph is the forecast error probability distribution of the VAR(2) model at h = 2 for the univariate simulation of y2. The red line indicates zero forecast error.

Examining each forecast error distribution is time-consuming if, for example, a practitioner is calibrating a forecast model. Therefore we would like to summarize the forecast error distribution by its expected value. This will not, however, yield information about the accuracy of the forecast error distribution that we intend to measure; instead it will contain information about the bias of the forecast error distribution. To allow the same conclusion to be drawn as from the graphs in Figure 2, we will examine the squared
forecast errors7. The expected value of the squared forecast error is

ē_{t+h} = Σ_{m=1}^{M−B} (M − B)^{-1} (ŷ(m)_{t+h} − yᵃ_{t+h})² = Σ_{m=1}^{M−B} p_m (ŷ(m)_{t+h} − yᵃ_{t+h})²   (26)

where ē_{t+h} is the expected value of the squared forecast error and M − B is the number of stored samples in the Gibbs sampler. The probabilities sum to one, Σ_{m=1}^{M−B} p_m = 1. A tight forecast error distribution centred around zero will produce a small expected value of the squared forecast error, while a wide forecast error distribution not centred at zero, or skewed away from zero, will produce a larger one. The expected value of the squared forecast error is not informative in itself; it is only informative relative to a competing model. We will use the notation ē^{AR(p)}_{t+h} for the expected value of the squared forecast error at forecast horizon h for the AR(p) model and ē^{VAR(p)}_{t+h} for the corresponding quantity for the VAR(p) model.

5. Simulation Study

We generate the data for the variables y1 and y2 by (pseudo-)simulation from univariate AR models and a multivariate VAR model. This will allow us to obtain results corresponding to Table 1 (the simulations of data represent the columns of Table 1). Both the univariate and the multivariate simulation create data for the variables y1 and y2 with TS = 200 observations8.

Univariate Simulation: The two time series y1 and y2 are simulated from two AR(2) models. Both AR models are conditional on being stable, which is fulfilled when the moduli of the eigenvalues of the companion matrix

[ β1  β2 ]
[ 1   0  ]

are less than one. We impose a stricter condition: the moduli of the eigenvalues must be less than 0.850. This is motivated by the implemented restrictive

7 There are also other reasons to use the squared forecast errors: it penalizes outliers to a high degree and it has convenient properties under the normal distribution.
Note, however, that our forecast error distributions are t-distributed because they are estimated from a model.
8 Note that y1 and y2 depend on past observations. We have specified start values for each simulation process. To mitigate the effect of the start values we generated 250 observations of y1 and y2 and discarded the first 50, resulting in TS = 200.
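With equal weights p_m = 1/(M − B), the measurement in (26) reduces to the mean squared deviation of the stored predictive draws from the actual value. A minimal sketch, with simulated normal draws standing in for Gibbs output (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

y_actual = 1.0                                   # actual value at horizon h

# Stored predictive draws at horizon h: M - B = 1000 retained Gibbs samples.
y_draws_wide = rng.normal(1.2, 0.8, size=1000)   # wide and slightly biased
y_draws_tight = rng.normal(1.0, 0.2, size=1000)  # tight around the actual

# Expected squared forecast error, eq. (26), with p_m = 1/(M - B).
e_bar_wide = np.mean((y_draws_wide - y_actual) ** 2)
e_bar_tight = np.mean((y_draws_tight - y_actual) ** 2)

# The tighter, better-centred predictive distribution scores lower, so in a
# pairwise comparison it would be judged the more accurate one.
```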
Gibbs sampler, which requires large increases in computational time as the moduli of the eigenvalues approach one. The series are simulated by

y1,t = 0.70y1,t−1 + 0.10y1,t−2 + ε1,t,  ε1,t ∼ N(0, 2)   (27)
y2,t = 0.35y2,t−1 + 0.30y2,t−2 + ε2,t,  ε2,t ∼ N(0, 3).   (28)

The parameters in (27) are chosen so that one of the eigenvalues has a modulus close to the chosen criterion, with β1 large and β2 small. The parameters in (28) are chosen so that β1 and β2 are closer to each other than in (27); they should be positive and not close to zero. The pairs of β1 and β2 in (27) and (28) have been chosen not to be identical, and the variances of y1 and y2 are different. The correlation between y1 and y2 should on average be zero, but due to the randomness of the simulation the correlation between y1 and y2 will not be constant, which in turn will affect the estimation. The left graph in Figure 3 shows 100 correlations between y1 and y2; the average is -0.008 and the median is -0.008.

Multivariate Simulation: The two time series y1 and y2 are simulated from a bivariate VAR(2) model. The VAR model is conditional on being stable, which is fulfilled when the moduli of the eigenvalues of the companion matrix

[ B1  B2 ]
[ I2  0  ]

are less than one. We again impose the stricter condition that the moduli of the eigenvalues must be less than 0.850. The series are simulated by

y1,t = 0.70y1,t−1 + 0.10y1,t−2 + ε1,t
y2,t = −0.25y1,t−1 + 0.35y2,t−1 + 0.50y1,t−2 + 0.30y2,t−2 + ε2,t   (29)

where the variance-covariance matrix is

Σ = [ 2        0.75√6 ]
    [ 0.75√6   3      ]   (30)

so that the error terms between the equations have a correlation of 0.750 and the variances of y1 and y2 are different. Three factors have determined the choice of the parameter matrix B. First, the elements in B are chosen so that the correlation that arises from the parameters is balanced, i.e. the correlation between y1 and y2 is approximately 0.750.
Second, the first equation, y1, in (29) should not depend on the parameters of y2. The only relationship between the variables, in
the first equation in (29), is the correlation between the error terms across the two equations. Both models are correctly specified, but the AR model discards the correlated error terms across equations, while the VAR model includes irrelevant independent variables and adds uncertainty to the predictive distribution due to the correlated error terms across equations. Both these effects will make the predictive distribution of the VAR model wider. Third, the second equation, y2, in (29) should depend on the parameters of y1. Estimating y2 with an AR model will therefore result in misspecification. Due to the randomness of the simulation the correlation between y1 and y2 will not be constant, which in turn will affect the estimation. The right graph in Figure 3 shows 100 correlations between y1 and y2; the average is 0.745 and the median is 0.741.

Figure 3: The (left) graph is the histogram of correlations between y1 and y2 generated by 100 univariate simulations. The (right) graph is the histogram of correlations between y1 and y2 generated by 100 multivariate simulations.

6. Hyperparameters

As mentioned in Section 3.3, Kadiyala and Karlsson (1993, 1997) suggest a set of guidelines to standardize the restrictions on the parameters in the Normal-Wishart prior distribution. This allows the researcher to specify only a small number of hyperparameters. Informally, we can think of the inverse Wishart distribution as a multivariate version of the inverse Gamma distribution. This allows us to align the Normal-Gamma prior in the AR model and the Normal-Wishart prior in the VAR model, by using the guidelines of Kadiyala and
Karlsson (1993, 1997). We align the Normal-Gamma prior to the Normal-Wishart prior in three steps. First, we specify the hyperparameters for the prior mean variance, in (8), in the AR(2) model:

H = σ² H̄ = [ λ1²   0           ]
            [ 0     (λ1/2^λ2)²  ]

We specify σ² = 1 and the diagonal of H̄ to have the same variances for the lags as the matrix V(b) = (α − n − 1)^{-1} S̄ ⊗ H̄ has for own lags in the Normal-Wishart prior. Second, the prior degrees of freedom α, in (14), are determined by (20), which results in α = 3. Third, the prior scale parameter θ0, in (15), is determined by θ0 = (α − n − 1)λ0^{-1}σ̂²; with α = 3 this simplifies to θ0 = λ0^{-1}σ̂², where σ̂² is the OLS estimate.

Now we turn to the guidelines of Kadiyala and Karlsson (1993, 1997) for the Normal-Wishart prior. The prior mean variance V(b) = (α − n − 1)^{-1} S̄ ⊗ H̄ and the prior scale matrix S̄, with diagonal (α − n − 1)λ0^{-1}s_i², both depend on the prior degrees of freedom, α. By (20) we determine that α = 4, which results in the scale matrix

S̄ = [ λ0^{-1}s1²   0           ]
    [ 0            λ0^{-1}s2²  ]

and the prior mean variance V(b) = S̄ ⊗ H̄ is the 8 × 8 diagonal matrix with diagonal elements

λ1²,  (s1λ1/s2)²,  (λ1/2^λ2)²,  (s1λ1/(s2 2^λ2))²,  (s2λ1/s1)²,  λ1²,  (s2λ1/(s1 2^λ2))²,  (λ1/2^λ2)².

Notice that the first and third diagonal elements of this matrix are equal to the diagonal elements of the prior mean variance in the Normal-Gamma prior.
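The alignment above can be checked numerically by building H̄ and S̄ from the hyperparameters and forming the Kronecker product. The sketch below uses the reconstruction of the diagonal given above and hypothetical values for s1 and s2; the choice of H̄ is reverse-engineered so that diag(S̄ ⊗ H̄) reproduces the listed elements:

```python
import numpy as np

# Hypothetical hyperparameter values and OLS residual standard deviations.
lam0, lam1, lam2 = 1.0, 1.0, 1.0
s1, s2 = 1.4, 1.7
decay = 2.0 ** lam2                 # lag-2 shrinkage factor

# Prior scale S_bar: alpha = 4 and n = 2 give (alpha - n - 1) = 1.
S_bar = np.diag([s1 ** 2 / lam0, s2 ** 2 / lam0])

# H_bar chosen so that diag(S_bar kron H_bar) matches the 8 x 8 diagonal.
H_bar = lam0 * lam1 ** 2 * np.diag(
    [1 / s1 ** 2, 1 / s2 ** 2, 1 / (s1 * decay) ** 2, 1 / (s2 * decay) ** 2]
)

V_b = np.kron(S_bar, H_bar)
d = np.diag(V_b)
# d[0] and d[2] equal the Normal-Gamma diagonal (lam1^2, (lam1 / 2^lam2)^2).
```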
This alignment between the Normal-Gamma prior and the Normal-Wishart prior allows us to control the parameter restrictions of both priors by specifying only the prior means, β and b0, in (8) and (18) for each prior, and the hyperparameters λ0, λ1, λ2 for both priors. For the Normal-Gamma prior we set the prior mean to β = (0, 0)′ and for the Normal-Wishart prior we set the prior mean to b0 = (0, 0, 0, 0, 0, 0, 0, 0)′. The hyperparameters are set to λ0 = λ1 = λ2 = 1.

7.1 Results

To validate the measurement we want to obtain results corresponding to those in Table 1. We will attempt to verify these by examining posterior variances and the difference in the measurement between two competing models by forecast horizon. But first we describe the kind of data we analyse in this section. In Section 5 we generated the data for the univariate and multivariate models. From this we construct out-of-sample forecasts for the horizons h = 1, 2, ..., 10, leaving Tp = TS − h = 190 observations for parameter estimation. Predictive distributions are estimated and the expected value of the squared forecast error, i.e. the measurement, is calculated for ten forecast horizons. This step of simulating data and calculating the measurement is repeated one hundred times. As a result, each model produces measurement data in the form of two 100 × 10 matrices, where the first matrix is for the univariate simulated data and the second is for the multivariate simulated data.

First, we examine the posterior variance. The results are presented in Tables 3 and 5; the coefficient is the expected value of the mean posterior variance, resulting from one hundred simulations of the data.
This is motivated by two reasons: (1) to summarize the data, each simulation of the data yields a mean posterior variance and the corresponding ninety-five percent probability interval of the posterior variance; (2) deviations from Table 1 should be caused by the random correlation in the data due to simulation, so the expected value of the mean posterior variance yields a more robust conclusion. Second, we examine the difference in the measurement between two competing models at each forecast horizon. The mean of this difference determines which predictive distribution is most accurate. We represent
this by the linear regression described in (1), where the dependent variable is the difference in the measurement between the competing models by forecast horizon and the only regressor is a constant:

(ē^{AR(2)}_{t+h})_i − (ē^{VAR(2)}_{t+h})_i = ϕ + ε_i,  i = 1, ..., 100   (31)

The coefficient of the constant, ϕ, represents the mean of the difference in the measurement between the competing models by forecast horizon, which enables the following hypothesis test:

H0 : ϕ = 0
HA : ϕ ≠ 0

If ϕ < 0 then the predictive distribution of the AR(2) is more accurate than that of the VAR(2). If ϕ > 0 then the predictive distribution of the VAR(2) is more accurate than that of the AR(2). If H0 is not rejected, we cannot conclude that one predictive distribution is more accurate than the other. The estimation of this model is similar to the AR(p) model described in Section 3.1 and the Gibbs sampler described in Table 2. We set the hyperparameters as follows: β0 = 0, σ² = 1, θ0 = σ̂² and α = 3 according to (20). The results are presented in Tables 4 and 6.
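The thesis estimates (31) with the same Bayesian machinery as the AR(p) model. As a simple frequentist sketch of the same idea, one can compute the mean difference in the measurement over the 100 replications together with a 95% interval, and reject H0 when the interval does not cover zero (the simulated differences below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical differences e_AR - e_VAR over 100 simulations at one horizon;
# a negative mean favours the AR model's predictive distribution.
d = rng.normal(-0.5, 1.0, size=100)

phi_hat = d.mean()                       # estimate of phi in (31)
se = d.std(ddof=1) / np.sqrt(len(d))     # standard error of the mean
lo, hi = phi_hat - 1.96 * se, phi_hat + 1.96 * se

reject_h0 = (lo > 0.0) or (hi < 0.0)     # interval does not cover zero
```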
The increase in estimated posterior variance is somewhat larger than for y1 but still smaller than expected. The expected value of the mean posterior covariance in the VAR model is 0.006, which is close to zero, as we expect, since y1 and
y2 are simulated independently of each other. From the (left) graph in Figure 3, we conclude that the correlation between y1 and y2 is on average -0.008, but ranges approximately from -0.300 to 0.350. Overall, we conclude that the estimated posterior variances follow the expected theory in Table 1, but the increase in estimated posterior variance due to the inclusion of irrelevant variables in the VAR model is smaller than we expected.

Table 3: Expected Values of Mean Posterior Variance/Covariance.
Univariate Simulation of the Data

              AR(2)                         VAR(2)
              E[Coef.]  E[95% PI]           E[Coef.]  E[95% PI]
V(y1)         2.006     [ 1.638 ; 2.455]    2.007     [ 1.639 ; 2.456]
COV(y1,y2)    -         -                   0.006     [-0.348 ; 0.361]
V(y2)         2.962     [ 2.419 ; 3.620]    2.965     [ 2.421 ; 3.633]

E[.] is the expected value of 100 simulations of the data.

We now turn to the results for the linear regression (31) of the measurement we have formulated. From our findings about the estimated posterior variance we expect the forecast error probability distributions of the AR model to be tighter and more centred around zero than those of the VAR model, i.e. the measurement should be smaller for the AR model than for the VAR model. The expected value of the estimated posterior covariance between y1 and y2 was close to zero, implying a minimal increase of uncertainty in the predictive distribution of y1 caused by the Cholesky decomposition. Therefore, we expect to reject H0 and find that the parameter ϕ is negative.

The results are presented in Table 4. For the variable y1, we reject H0 at one of the ten forecast horizons. We find that ϕ is negative at the second forecast horizon as expected, i.e. the ninety-five percent probability interval does not cover zero. On average at this forecast horizon, the predictive distribution of the AR model is more accurate than that of the VAR model. For all other forecast horizons H0 cannot be rejected.
For the variable y2, we reject H0 at one of the ten forecast horizons. We find that ϕ is negative at the first forecast horizon as expected. On average at this forecast horizon, the predictive distribution of the AR model is more accurate than that of the VAR model. For all other forecast horizons H0 cannot be rejected.
Overall, we cannot conclude that the AR model produces more accurate predictive distributions than the VAR model for the univariate simulated data. It seems that the small increase in estimated posterior variance caused by the inclusion of irrelevant variables is not large enough to distinguish between the two models' predictive distributions.

Table 4: Regression Results. Univariate Simulation of the Data

         y1                            y2
         ϕ       95% PI                ϕ       95% PI
h=1      -0.111  [-0.377 ;  0.155]     -0.906  [-1.768 ; -0.063]
h=2      -0.527  [-0.976 ; -0.060]      0.080  [-0.536 ;  0.698]
h=3      -0.235  [-0.615 ;  0.132]     -0.559  [-1.330 ;  0.233]
h=4      -0.197  [-0.568 ;  0.166]      0.261  [-0.263 ;  0.789]
h=5      -0.085  [-0.535 ;  0.358]     -0.448  [-1.268 ;  0.358]
h=6      -0.016  [-0.384 ;  0.347]     -0.157  [-0.597 ;  0.292]
h=7       0.161  [-0.245 ;  0.562]      0.387  [-0.060 ;  0.833]
h=8       0.042  [-0.359 ;  0.449]      0.186  [-0.128 ;  0.501]
h=9       0.004  [-0.434 ;  0.446]      0.005  [-0.264 ;  0.273]
h=10      0.060  [-0.381 ;  0.488]      0.135  [-0.148 ;  0.413]

PI stands for probability interval.

7.3 Multivariate Simulation Results

We start by examining the results of the estimated posterior variance and covariance from the AR(2) and VAR(2) models for the multivariate simulations of the data. The second column in Table 1 shows the expected outcomes. The VAR model is correctly specified, while the AR model is misspecified, which is expected to increase the estimated posterior variance.

The results are presented in Table 5. For the variable y1, which is not determined by y2, we conclude that the expected value of the mean posterior variance is equal to 1.980 for the AR and 1.983 for the VAR. We conclude that the inclusion of irrelevant independent variables causes an increase in the estimated posterior variance, the same conclusion as in the univariate simulation of the data. For the variable y2, we conclude that the results follow the statistical theory well. The expected value of the mean posterior variance is 3.291 for the AR model and 3.025 for the VAR model.
The increase in estimated posterior variance due to misspecification is large. The expected value of the mean posterior covariance in the VAR model is 1.821, which is close to the covariance in equation (30), 0.75 × √6 ≈ 1.837. The choice of elements in the parameter matrix B has balanced the correlation between y1
and y2 to the same correlation specified for the error terms. From the (right) graph in Figure 3, we conclude that the correlation between y1 and y2 is on average 0.745, but ranges approximately from 0.550 to 0.850. Overall, we conclude that the estimated posterior variances follow the expected theory in Table 1. The increase in the estimated posterior variance due to misspecification is large, as expected. We reach the same conclusion on the inclusion of irrelevant variables as in the univariate simulation of the data.

Table 5: Expected Values of Mean Posterior Variance/Covariance.
Multivariate Simulation of the Data

              AR(2)                         VAR(2)
              E[Coef.]  E[95% PI]           E[Coef.]  E[95% PI]
V(y1)         1.980     [ 1.617 ; 2.420]    1.983     [ 1.619 ; 2.429]
COV(y1,y2)    -         -                   1.821     [ 1.423 ; 2.303]
V(y2)         3.291     [ 2.689 ; 4.027]    3.025     [ 2.469 ; 3.703]

E[.] is the expected value of 100 simulations of the data.

We now turn to the results for the linear regression (31) of the measurement. We expect ϕ to be negative for y1: the AR model does not suffer from misspecification and the expected value of the mean posterior covariance is large, so we expect the predictive distributions of the VAR model to be wide for y1 due to the Cholesky decomposition. From the findings about the estimated posterior variance for y2, we expect to reject H0 and find that the parameter ϕ is positive.

The results are presented in Table 6. For the variable y1, we reject H0 at all forecast horizons except the first. We find that ϕ is negative for the second to tenth forecast horizons, as we expected due to the Cholesky decomposition. On average, the forecast accuracy of the AR relative to the VAR increases over the forecast horizons. For the variable y2, we reject H0 for three out of ten forecast horizons. We find that ϕ is negative for the eighth to tenth forecast horizons, which is opposite to what we expected.
We also conclude that the magnitude of the differences is large for the ninth and tenth forecast horizons.
Table 6: Regression Results. Multivariate Simulation of the Data

         y1                            y2
         ϕ       95% PI                ϕ       95% PI
h=1      -0.029  [-0.159 ;  0.109]      0.272  [-0.050 ;  0.604]
h=2      -0.251  [-0.467 ; -0.038]      0.050  [-0.257 ;  0.349]
h=3      -0.321  [-0.553 ; -0.087]      0.536  [ 0.145 ;  0.938]
h=4      -0.289  [-0.563 ; -0.030]      0.172  [-0.192 ;  0.539]
h=5      -0.219  [-0.470 ;  0.039]      0.334  [-0.104 ;  0.792]
h=6      -0.338  [-0.597 ; -0.068]     -0.257  [-0.640 ;  0.125]
h=7      -0.308  [-0.614 ; -0.005]     -0.211  [-0.614 ;  0.203]
h=8      -0.333  [-0.598 ; -0.062]     -0.492  [-0.910 ; -0.066]
h=9      -0.483  [-0.755 ; -0.209]     -0.701  [-1.107 ; -0.293]
h=10     -0.466  [-0.718 ; -0.210]     -0.905  [-1.289 ; -0.523]

PI stands for probability interval.

Table 7 shows the same analysis as before, but this time we have changed the order of the variables when estimating the VAR model. For the variable y1 we reject H0 at the second, fourth, fifth and ninth forecast horizons. We find that ϕ is negative for the second forecast horizon and positive for the fourth, fifth and ninth forecast horizons. For the variable y2 we reject H0 for five out of the ten forecast horizons. We find that ϕ is negative at the first, sixth, eighth, ninth and tenth forecast horizons, as expected due to the Cholesky decomposition. Both these results show the strong effect of the Cholesky decomposition. The variable y2 handles the extra uncertainty added to the predictive distribution better than y1. This is most likely due to the misspecification of the AR model for the y2 variable.

Table 7: Changed Order Of Variables in VAR.
Multivariate Simulation of the Data

         y2                            y1
         ϕ       95% PI                ϕ       95% PI
h=1      -2.054  [-3.041 ; -1.001]     -0.940  [-1.956 ;  0.022]
h=2      -0.165  [-0.946 ;  0.614]     -1.506  [-2.507 ; -0.480]
h=3      -0.415  [-0.836 ;  0.018]      0.132  [-0.282 ;  0.549]
h=4      -0.310  [-0.698 ;  0.086]      0.295  [ 0.077 ;  0.514]
h=5      -0.307  [-1.020 ;  0.409]      0.599  [ 0.088 ;  1.119]
h=6      -0.680  [-1.267 ; -0.089]      0.150  [-0.387 ;  0.666]
h=7      -0.433  [-0.924 ;  0.055]      0.388  [-0.008 ;  0.775]
h=8      -0.569  [-0.966 ; -0.168]      0.158  [-0.150 ;  0.463]
h=9      -0.829  [-1.290 ; -0.367]      0.388  [ 0.049 ;  0.726]
h=10     -0.936  [-1.369 ; -0.512]      0.178  [-0.238 ;  0.579]

PI stands for probability interval.
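The ordering effect seen in Table 7 follows from the Cholesky factorization not being invariant to permutations of the variables: swapping y1 and y2 permutes Σ, and the first-ordered variable's shock carries its full variance while the later variable only carries its residual variance. A small sketch using the simulation covariance from (30):

```python
import numpy as np

# Error covariance from eq. (30).
sigma = np.array([[2.0, 0.75 * np.sqrt(6.0)],
                  [0.75 * np.sqrt(6.0), 3.0]])

# Permutation that swaps the order of y1 and y2.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
sigma_swapped = P @ sigma @ P.T

L_orig = np.linalg.cholesky(sigma)          # factor under ordering (y1, y2)
L_swap = np.linalg.cholesky(sigma_swapped)  # factor under ordering (y2, y1)

# The first diagonal element squared equals the full variance of whichever
# variable is ordered first: 2 for y1 in the original order, 3 for y2 swapped.
first_var_orig = L_orig[0, 0] ** 2
first_var_swap = L_swap[0, 0] ** 2
```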
Overall, we can conclude that the AR model produces more accurate predictive distributions than the VAR model for the multivariate simulated data. This was expected for y1, since neither the AR nor the VAR was misspecified and the Cholesky decomposition widens the VAR predictive distribution. For y2, however, it seems that the misspecification is too small to have an effect on the predictive distribution; instead the VAR is outperformed by the AR at longer horizons.

Summing up our results: from both the univariate and multivariate results we conclude that the estimated posterior variances follow the statistical theory of Table 1, but the magnitudes of these effects are smaller than expected. When analysing the predictive distributions by the measurement for the univariate simulation of the data, we find it difficult to verify the increase in estimated posterior variance in the predictive distributions that the statistical theory implies. Only at two horizons are we able to distinguish that the AR models outperform the VAR models. This difficulty arises because the increase in the estimated posterior variance is too small to separate the predictive distributions. From the multivariate simulation of the data we conclude that there is a large effect of the Cholesky decomposition on the first equation of the estimated VAR model. We find that the VAR model produces inferior predictive distributions of y1 for all horizons. This result is not as strong for y2, where five of the horizons produce inferior predictive distributions. We also conclude that this effect increases with the forecast horizon.

We conclude that autoregressive and vector autoregressive models with a lag length of two have difficulty producing dissimilar predictive distributions. This has made it difficult to assess the accuracy of the measurement, but by examining the effect of the Cholesky decomposition in VAR models we are able to validate the accuracy of the measurement.
8. Conclusion

We conduct simulation studies to formulate a measurement that evaluates the forecast accuracy of predictive distributions. We use Bayesian methods to obtain posterior inference and predictive distributions for the autoregressive model and the vector autoregressive model. Through out-of-sample forecasts and predictive distributions we are able to evaluate the full distribution of forecast errors. We are also able to validate the accuracy of the measurement, especially by allowing for correlated error terms across equations in the vector autoregressive model.

We formulate a measurement that uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution by forecast horizon. The measurement can be used as a forecast evaluation technique for single forecasts or to calibrate forecast models. Furthermore, we recommend that practitioners use the measurement alongside several other forecast evaluation techniques.

For further research we recommend that the measurement be evaluated with models that are not from the same family, to ensure differences in the predictive distributions; with models that treat conditional heteroskedasticity differently; in case studies of outliers such as financial crises; and against a wide range of forecast evaluation techniques.
Appendix

A1. Gibbs Sampler

To explain the intuition behind the Gibbs sampler we borrow a summary put forward by Ciccarelli and Rebucci (2003), with the mathematical notation adapted to Section 3.1. In many applications the analytical integration of p(β, σ² | y_t) may be difficult or even impossible to implement. This problem, however, can often be solved by using numerical integration based on Monte Carlo simulation methods. One particular method used in the literature to solve estimation problems similar to those discussed in the paper is the Gibbs sampler. The Gibbs sampler is a recursive Monte Carlo method which only requires that the full conditional posterior distributions of the parameters of interest, p(β | σ², y_t) and p(σ² | β, y_t), are known. The Gibbs sampler starts from an arbitrary value β(0) or (σ²)(0), and samples alternately from the density of each element of the parameter vector, conditional on the value of the other element sampled in the previous iteration and on the data. Thus, the Gibbs sampler samples recursively as follows:

β(1) from p(β | (σ²)(0), y_t)
(σ²)(1) from p(σ² | β(1), y_t)
β(2) from p(β | (σ²)(1), y_t)
(σ²)(2) from p(σ² | β(2), y_t)
...
β(m) from p(β | (σ²)(m−1), y_t)
(σ²)(m) from p(σ² | β(m), y_t)

and so on. The vectors ϑ(m) = (β(m), (σ²)(m)) form a Markov chain and, for a sufficiently large number of iterations (say m ≥ M), can be regarded as draws from the true joint posterior distribution. Given a large sample of draws from this limiting distribution, any posterior moment or marginal density of interest can then be estimated consistently with the corresponding sample average.
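The recursion above can be written in a few lines. The sketch below implements a two-block Gibbs sampler for a conjugate normal linear model, drawing β | σ² from a normal and σ² | β from an inverse Gamma; the prior values and data are hypothetical and the code is illustrative, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated regression data: y = X beta + eps.
T = 200
X = rng.normal(size=(T, 1))
beta_true, sigma2_true = 0.6, 1.5
y = X @ np.array([beta_true]) + rng.normal(0.0, np.sqrt(sigma2_true), size=T)

# Hypothetical conjugate prior: beta ~ N(b0, H), sigma2 ~ iG(a0/2, t0/2).
b0, H = np.zeros(1), np.eye(1)
a0, t0 = 3.0, 1.0

M_iter, burn = 3000, 500
beta_draw, sig2_draw = np.zeros(1), 1.0
betas, sig2s = [], []

for m in range(M_iter):
    # Block 1: beta | sigma2, y  ~  Normal.
    V = np.linalg.inv(np.linalg.inv(H) + (X.T @ X) / sig2_draw)
    Mn = V @ (np.linalg.inv(H) @ b0 + (X.T @ y) / sig2_draw)
    beta_draw = rng.multivariate_normal(Mn, V)
    # Block 2: sigma2 | beta, y  ~  inverse Gamma.
    resid = y - X @ beta_draw
    shape = (a0 + T) / 2.0
    scale = (t0 + resid @ resid) / 2.0
    sig2_draw = scale / rng.gamma(shape)   # inverse-Gamma draw via a Gamma draw
    if m >= burn:                          # keep draws after the burn-in
        betas.append(beta_draw[0])
        sig2s.append(sig2_draw)

beta_hat, sig2_hat = np.mean(betas), np.mean(sig2s)
```

With conjugate conditionals both blocks are exact draws, so the retained samples approximate the joint posterior and their averages approximate the posterior means.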
References

Luc Bauwens, Gary Koop, Dimitris Korobilis, and Jeroen V.K. Rombouts. The contribution of structural break models to forecasting macroeconomic series. Journal of Applied Econometrics, 2014.

Andrew P. Blake and Haroon Mumtaz. Applied Bayesian econometrics for central bankers. Number 4 in Technical Books. Centre for Central Banking Studies, Bank of England, 2012. URL http://ideas.repec.org/b/ccb/tbooks/4.html.

Matteo Ciccarelli and Alessandro Rebucci. BVARs: A Survey of the Recent Literature with an Application to the European Monetary System. Rivista di Politica Economica, 93(5):47-112, September 2003. URL http://ideas.repec.org/a/rpo/ripoec/v93y2003i5p47-112.html.

A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari, and D.B. Rubin. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2013. ISBN 9781439840955. URL http://books.google.se/books?id=ZXL6AQAAQBAJ.

John Geweke and Gianni Amisano. Comparing and evaluating Bayesian predictive distributions of asset returns. Working Paper Series 0969, European Central Bank, November 2008. URL http://ideas.repec.org/p/ecb/ecbwps/20080969.html.

John Geweke and Gianni Amisano. Prediction using several macroeconomic models, 2012.

K. Rao Kadiyala and Sune Karlsson. Forecasting with generalized Bayesian vector auto regressions. Journal of Forecasting, 12(3-4):365-378, 1993.

K. Rao Kadiyala and Sune Karlsson. Numerical Methods for Estimation and Inference in Bayesian VAR-Models. Journal of Applied Econometrics, 12(2):99-132, March-April 1997. URL http://ideas.repec.org/a/jae/japmet/v12y1997i2p99-132.html.

Sune Karlsson. Chapter 15 - Forecasting with Bayesian vector autoregression.
In Graham Elliott and Allan Timmermann, editors, Handbook of Economic Forecasting, volume 2, Part B, pages 791-897. Elsevier, 2013. doi: 10.1016/B978-0-444-62731-5.00015-4. URL http://www.sciencedirect.com/science/article/pii/B9780444627315000154.
Gary Koop and Dimitris Korobilis. Bayesian multivariate time series methods for empirical macroeconomics. Now Publishers Inc, 2010.

Dongchu Sun and Shawn Ni. Bayesian analysis of vector-autoregressive models with noninformative priors. Journal of Statistical Planning and Inference, 121(2):291-309, 2004.

Anders Warne, Günter Coenen, and Kai Christoffel. Predictive likelihood comparisons with DSGE and DSGE-VAR models. Working Paper Series 1536, European Central Bank, April 2013. URL http://ideas.repec.org/p/ecb/ecbwps/20131536.html.