
Introduction to

Bayesian Time Series Econometrics

Joshua Chan

Australian National University


6 July 2013
Instructor

Name: Joshua Chan (Josh or Dr. Chan)

Website:

http://people.anu.edu.au/joshua.chan/

Email:
[email protected]
Plan for Today

Four 50-minute Sessions


◦ vector autoregression or VAR
◦ unobserved components model (Kalman filter, precision
sampler)
◦ time-varying parameter VAR
◦ Bayesian model comparison (harmonic mean estimator, Chib’s
method, etc.)
Vector Autoregression

Consider the VARn(p) model:

$$
y_t = b + B_1 y_{t-1} + \cdots + B_p y_{t-p} + \epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma),
$$

where b is an n × 1 vector of intercepts and each B_i is an n × n coefficient matrix

(So basically a regression, but with multiple equations)


An Example

For example, if n = 2 and p = 1:

$$
\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
+
\begin{pmatrix} B_{11,1} & B_{12,1} \\ B_{21,1} & B_{22,1} \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix},
$$

where

$$
\begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix}
\sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{11}^2 & \sigma_{12} \\ \sigma_{21} & \sigma_{22}^2 \end{pmatrix} \right).
$$
Estimation

If we can write the system as

y = Xβ + ǫ,

we can use the Gibbs sampler for the linear regression model

So let’s try to do that

(Work out the previous example first)


An Example (Continued)

Write

$$
\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
+
\begin{pmatrix} B_{11,1} & B_{12,1} \\ B_{21,1} & B_{22,1} \end{pmatrix}
\begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix}
$$

as

$$
\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix}
=
\begin{pmatrix}
1 & y_{1,t-1} & y_{2,t-1} & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & y_{1,t-1} & y_{2,t-1}
\end{pmatrix}
\begin{pmatrix} b_1 \\ B_{11,1} \\ B_{12,1} \\ b_2 \\ B_{21,1} \\ B_{22,1} \end{pmatrix}
+
\begin{pmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{pmatrix},
$$

or

$$
y_t = \left( I_2 \otimes [1, y_{t-1}'] \right)\beta + \epsilon_t.
$$
In the Form of a Regression

In general, rewrite the VARn (p) model as:

$$
y_t = X_t\beta + \epsilon_t,
$$

where $X_t = I_n \otimes [1, y_{t-1}', \ldots, y_{t-p}']$ and $\beta = \mathrm{vec}([b, B_1, \ldots, B_p]')$.

Then stack the observations over t to get

$$
y = X\beta + \epsilon,
$$

where

$$
y = \begin{pmatrix} y_1 \\ \vdots \\ y_T \end{pmatrix}, \qquad
X = \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix}, \qquad
\epsilon \sim N(0, I_T \otimes \Sigma).
$$
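
The MATLAB code below relies on a helper SURform2 to build this stacked X. The helper itself is not listed in these slides; here is a minimal sketch of what such a function might do, under the assumption that it stacks X_t = I_n ⊗ x_t' for each row x_t' of the raw regressor matrix:

% Sketch of a SUR-form builder (SURform2 itself is not shown in the slides).
% Xraw is T x m, with row t equal to [1, y_{t-1}', ..., y_{t-p}'];
% the output stacks X_t = I_n kron Xraw(t,:) into a sparse (T*n) x (m*n) matrix.
function bigX = SURform_sketch(Xraw, n)
    [T, m] = size(Xraw);
    blocks = cell(T, 1);
    for t = 1:T
        blocks{t} = kron(speye(n), Xraw(t,:));   % X_t = I_n kron x_t'
    end
    bigX = vertcat(blocks{:});                   % X = [X_1; ...; X_T]
end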
VAR: Likelihood

Since

$$(y \mid \beta, \Sigma) \sim N(X\beta,\; I_T \otimes \Sigma),$$

the likelihood function is given by

$$
f(y \mid \beta, \Sigma) = |2\pi (I_T \otimes \Sigma)|^{-\frac{1}{2}}\, e^{-\frac{1}{2}(y - X\beta)'(I_T \otimes \Sigma)^{-1}(y - X\beta)}
= (2\pi)^{-\frac{Tn}{2}} |\Sigma|^{-\frac{T}{2}}\, e^{-\frac{1}{2}(y - X\beta)'(I_T \otimes \Sigma^{-1})(y - X\beta)}.
$$

Here we have used

$$|I_T \otimes \Sigma| = |\Sigma|^T, \qquad (I_T \otimes \Sigma)^{-1} = I_T \otimes \Sigma^{-1}.$$

Also note that since

$$(y_t \mid \beta, \Sigma) \sim N(X_t \beta, \Sigma),$$

another way to write the likelihood is

$$
f(y \mid \beta, \Sigma) = (2\pi)^{-\frac{Tn}{2}} |\Sigma|^{-\frac{T}{2}}\, e^{-\frac{1}{2}\sum_{t=1}^T (y_t - X_t\beta)'\Sigma^{-1}(y_t - X_t\beta)}.
$$
Inverse-Wishart Distribution

An n × n random matrix Z is said to have an inverse-Wishart distribution with shape parameter α > 0 and scale matrix W if its pdf is given by

$$
f(Z; \alpha, W) = \frac{|W|^{\alpha/2}}{2^{n\alpha/2}\,\Gamma_n(\alpha/2)}\, |Z|^{-\frac{\alpha+n+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(W Z^{-1})},
$$

where Γ_n is the multivariate gamma function and tr(·) is the trace function

We write Z ∼ InvWishart(α, W)

Moment: E Z = W/(α − n − 1) for α > n + 1
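
A quick numerical sanity check of the mean formula, assuming MATLAB's iwishrnd (Statistics Toolbox) uses the same (scale, shape) parameterization as above, which is how it is used later in VAR3.m; the parameter values here are illustrative:

% Check E[Z] = W/(alpha - n - 1) by Monte Carlo for illustrative values.
n = 3; alpha = 10; W = eye(n);
R = 5000;
Zbar = zeros(n);
for r = 1:R
    Zbar = Zbar + iwishrnd(W, alpha)/R;   % average of InvWishart(alpha, W) draws
end
disp(Zbar)                                % should be close to W/6 = 0.1667*I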
Priors and Gibbs Sampler

Independent priors for β and Σ:

β ∼ N(β 0 , Vβ ), Σ ∼ InvWishart(ν0 , S0 )

Use Gibbs sampler to estimate the model

We need to derive (1) (Σ | y, β) and (2) (β | y, Σ)


Sample (Σ | y, β)

Recall that for conformable matrices A, B, C

tr(ABC) = tr(BCA) = tr(CAB)

(1) f(Σ | y, β):

$$
f(\Sigma \mid y, \beta) \propto |\Sigma|^{-\frac{\nu_0+n+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(S_0 \Sigma^{-1})}\; |\Sigma|^{-\frac{T}{2}}\, e^{-\frac{1}{2}\sum_{t=1}^T (y_t - X_t\beta)'\Sigma^{-1}(y_t - X_t\beta)}
$$
$$
\propto |\Sigma|^{-\frac{\nu_0+n+T+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(S_0\Sigma^{-1})}\, e^{-\frac{1}{2}\mathrm{tr}\left[\sum_{t=1}^T (y_t - X_t\beta)(y_t - X_t\beta)'\,\Sigma^{-1}\right]}
$$
$$
\propto |\Sigma|^{-\frac{\nu_0+n+T+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}\left[\left(S_0 + \sum_{t=1}^T (y_t - X_t\beta)(y_t - X_t\beta)'\right)\Sigma^{-1}\right]}
$$

Now, compare the generic InvWishart(ν, S) kernel

$$
|\Sigma|^{-\frac{\nu+n+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(S\Sigma^{-1})}
$$

with

$$
|\Sigma|^{-\frac{\nu_0+T+n+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}\left[\left(S_0 + \sum_{t=1}^T (y_t - X_t\beta)(y_t - X_t\beta)'\right)\Sigma^{-1}\right]}
$$

Hence,

$$
(\Sigma \mid y, \beta) \sim \mathrm{InvWishart}\left(\nu_0 + T,\; S_0 + \sum_{t=1}^T (y_t - X_t\beta)(y_t - X_t\beta)'\right)
$$
Sample (β | y, Σ)

(2) f(β | y, Σ):

$$
f(\beta \mid y, \Sigma) \propto e^{-\frac{1}{2}(\beta - \beta_0)'V_\beta^{-1}(\beta - \beta_0)}\, e^{-\frac{1}{2}(y - X\beta)'(I_T \otimes \Sigma^{-1})(y - X\beta)}
$$
$$
\propto e^{-\frac{1}{2}\left[\beta'\left(V_\beta^{-1} + X'(I_T\otimes\Sigma^{-1})X\right)\beta - 2\beta'\left(V_\beta^{-1}\beta_0 + X'(I_T\otimes\Sigma^{-1})y\right)\right]}
$$

Hence,

$$(\beta \mid y, \Sigma) \sim N(\hat\beta, D_\beta),$$

where

$$
D_\beta = \left(V_\beta^{-1} + X'(I_T\otimes\Sigma^{-1})X\right)^{-1}, \qquad
\hat\beta = D_\beta\left(V_\beta^{-1}\beta_0 + X'(I_T\otimes\Sigma^{-1})y\right).
$$
Gibbs Sampler for the VAR

Pick some initial values β (0) = a0 and Σ(0) = B0 > 0. Then,


repeat the following steps from r = 1 to R:
1. Draw Σ(r ) ∼ f (Σ | y, β (r −1) ) (inverse-Wishart).
2. Draw β (r ) ∼ f (β | y, Σ(r ) ) (multivariate normal).
Empirical Example

U.S. data: GDP growth, CPI inflation rate, unemployment rate, and the Fed funds rate

From 1947:Q1 to 2011:Q4

Estimate a VAR4(3), i.e., n = 4 variables and p = 3 lags


MATLAB Code

% VAR3.m
nloop = 11000; burnin = 1000;
load 'USdata.csv';
Z0 = USdata(1:3,:); Z = USdata(4:end,:);   % first 3 rows provide the initial lags
[T n] = size(Z);
longZ = reshape(Z',T*n,1);                 % stack the data: y = (y_1', ..., y_T')'
k = n+3*n^2;                               % number of VAR coefficients (p = 3 lags)

%% prior
temp1 = ones(k,1); temp1(1:(3*n+1):k) = 1/100;   % looser prior (variance 100) on the intercepts
invVbeta = sparse(1:k,1:k,temp1');               % prior precision of beta
nu0 = 10; S0 = 7*eye(n);
MATLAB Code

%% compute and define a few things


X = [ones(T,1) [Z0(3,:); Z(1:end-1,:)] ...
[Z0(2:end,:); Z(1:end-2,:)] [Z0; Z(1:end-3,:)] ];
bigX = SURform2(X,n);
newnu = T + nu0;

%% initialize for storage


store_Sig = zeros(nloop-burnin,n,n);
store_beta = zeros(nloop-burnin,k);

%% initialize the chain


beta = (bigX’*bigX)\(bigX’*longZ);
err = reshape(longZ - bigX*beta,n,T);
Sig = err*err’/T;
invSig = Sig\speye(n);
MATLAB Code

for loop = 1:nloop

%% sample beta
XinvSig = bigX'*kron(speye(T),invSig);
XinvSigX = XinvSig*bigX;
invDbeta = invVbeta + XinvSigX;                       % posterior precision of beta
betahat = invDbeta\(XinvSig*longZ);                   % posterior mean of beta
beta = betahat + chol(invDbeta,'lower')'\randn(k,1);  % precision sampler draw

%% sample Sig
err = reshape(longZ - bigX*beta,n,T);
newS = S0 + err*err';
Sig = iwishrnd(newS,newnu);
invSig = Sig\speye(n);

%% store the draws after burn-in
if loop>burnin
    i = loop-burnin;
    store_beta(i,:) = beta';
    store_Sig(i,:,:) = Sig;
end
end
State Space Models: Overview

Now we study a general class of models called state space models (focus on the linear Gaussian case)

High-dimensional, flexible models

Traditionally estimated by Kalman-filter-based methods

Here we introduce some new algorithms (Chan and Jeliazkov, 2009; McCausland et al., 2011) that do this without using the KF (easier derivation and faster algorithms)
State Space Models: Definition

A state space model consists of two modeling levels

In the first level, observations are related to the latent or unobserved variables, called states, according to the observation or measurement equation

yt = Xt θ t + ǫt , ǫt ∼ N(0, Σ)

In the second level, the evolution of the states is modeled via the state or transition equation

θ t = θ t−1 + ζ t , ζ t ∼ N(0, Ω)
Kalman Filter: Overview

Kalman Filter: a system of recursive equations that is used for two purposes:
◦ sample from the conditional density f (θ | y, Σ, Ω) (typically a
very high-dimensional normal density)
◦ evaluate the integrated or observed-data likelihood f (y | Σ, Ω)
(as opposed to the complete-data likelihood f (y | θ, Σ, Ω))

Turns out we can accomplish both tasks with some new sparse
matrix algorithms without using KF
Unobserved Components Model

We start with the simplest state space model: the unobserved components model

The measurement equation is given by

yt = τ t + ǫ t , ǫt ∼ N(0, σ 2 )

The states, in turn, are initialized with τ1 ∼ N(τ0 , ω02 ) for some
known constants τ0 and ω02 , and evolve according to the transition
equation
τt = τt−1 + ut , ut ∼ N(0, ω 2 )
for t = 2, . . . , T
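
To fix ideas, a minimal sketch that simulates data from this model; the parameter values are illustrative and not taken from the slides:

% Simulate T observations from the unobserved components model.
T = 300; sig2 = 1; omega2 = 0.1; tau0 = 0; omega02 = 1;   % illustrative values
tau = zeros(T,1);
tau(1) = tau0 + sqrt(omega02)*randn;          % tau_1 ~ N(tau_0, omega_0^2)
for t = 2:T
    tau(t) = tau(t-1) + sqrt(omega2)*randn;   % transition: tau_t = tau_{t-1} + u_t
end
y = tau + sqrt(sig2)*randn(T,1);              % measurement: y_t = tau_t + eps_t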
Priors and Estimation

Independent priors for σ 2 and ω 2

σ 2 ∼ InvGamma(νσ2 , Sσ2 ), ω 2 ∼ InvGamma(νω2 , Sω2 ),

A Gibbs sampler is constructed by sampling through


1. (τ | y, σ 2 , ω 2 ),
2. (σ 2 | y, τ , ω 2 ),
3. (ω 2 | y, τ , σ 2 ).

One main difficulty is to sample (τ | y, σ², ω²), which is a T-dimensional density
Sample (τ | y, σ 2, ω 2): Overview

The joint posterior is given by

f (τ , σ 2 , ω 2 | y) ∝ f (y | τ , σ 2 )f (τ | ω 2 )f (σ 2 )f (ω 2 ),

where f (σ 2 ) and f (ω 2 ) are the inverse-Gamma priors

First show that f (τ | y, σ 2 , ω 2 ) is a normal density

Then discuss how one can sample from it efficiently


An Expression for f (y | τ , σ 2)

Rewrite the measurement equation as

y = τ + ǫ, ǫ ∼ N(0, σ 2 IT )

Hence, we have

$$
f(y \mid \tau, \sigma^2) = (2\pi\sigma^2)^{-\frac{T}{2}}\, e^{-\frac{1}{2\sigma^2}(y-\tau)'(y-\tau)}.
$$
An Expression for f (τ | ω 2 )

For simplicity, assume τ0 = 0

Rewrite the transition equation as:

$$
\underbrace{\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 1
\end{pmatrix}}_{H}
\begin{pmatrix} \tau_1 \\ \tau_2 \\ \tau_3 \\ \vdots \\ \tau_T \end{pmatrix}
=
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_T \end{pmatrix}
$$

That is,

$$
H\tau = u, \qquad u \sim N(0, \Omega),
$$

where Ω = diag(ω₀², ω², ..., ω²)
Note that |H| = 1 and hence H is invertible

Since τ = H−1 u, Eτ = 0 and

Var(τ ) = H−1 Ω(H−1 )′

Finally,
τ ∼ N(0, (H′ Ω−1 H)−1 )

(Recall (AB)−1 = B−1 A−1 )


It follows that

$$
f(\tau \mid \omega^2) = |2\pi (H'\Omega^{-1}H)^{-1}|^{-\frac{1}{2}}\, e^{-\frac{1}{2}\tau'(H'\Omega^{-1}H)\tau}
= (2\pi)^{-\frac{T}{2}} |\Omega|^{-\frac{1}{2}}\, e^{-\frac{1}{2}\tau'(H'\Omega^{-1}H)\tau}
= (2\pi)^{-\frac{T}{2}} (\omega_0^2)^{-\frac{1}{2}} (\omega^2)^{-\frac{T-1}{2}}\, e^{-\frac{1}{2}\tau'(H'\Omega^{-1}H)\tau}.
$$
An Expression for f (τ | y, σ 2, ω 2)

Combining f(τ | ω²) and f(y | τ, σ²):

$$
f(\tau \mid y, \sigma^2, \omega^2) \propto f(y \mid \tau, \sigma^2)\, f(\tau \mid \omega^2)
\propto e^{-\frac{1}{2}\left[\frac{1}{\sigma^2}(y-\tau)'(y-\tau) + \tau'(H'\Omega^{-1}H)\tau\right]}
\propto e^{-\frac{1}{2}\left[\tau'\left(H'\Omega^{-1}H + \sigma^{-2}I_T\right)\tau - \frac{2}{\sigma^2}\tau' y\right]}
$$

Hence,

$$(\tau \mid y, \sigma^2, \omega^2) \sim N(\hat\tau, K^{-1}),$$

where K = H'Ω⁻¹H + σ⁻²I_T and τ̂ = σ⁻²K⁻¹y
Sample (τ | y, σ 2, ω 2)

Since f(τ | y, σ², ω²) is multivariate normal, sampling from it might seem easy

Main difficulty: the covariance matrix K⁻¹ is a full T × T matrix (computing its Cholesky factor, e.g., is time-consuming)

However, the precision matrix K = H'Ω⁻¹H + σ⁻²I_T is sparse

Computations involving sparse matrices are much quicker

[Figure: sparsity patterns for T = 257. Left panel: the precision matrix K is tridiagonal with nz = 769 nonzero entries. Right panel: the covariance matrix K⁻¹ is a full matrix with nz = 66049 nonzero entries.]
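
A minimal sketch that reproduces this comparison (the parameter values are illustrative; they do not affect the sparsity pattern):

% Compare the sparsity of K and K^{-1} for the unobserved components model.
T = 257; omega02 = 5; omega2 = .25^2; sig2 = 1;          % illustrative values
H = speye(T) - sparse(2:T,1:T-1,ones(1,T-1),T,T);        % first-difference matrix
invOmega = sparse(1:T,1:T,[1/omega02 1/omega2*ones(1,T-1)]);
K = H'*invOmega*H + speye(T)/sig2;                       % sparse precision matrix
subplot(1,2,1); spy(K);                                  % tridiagonal: 3*T - 2 = 769 nonzeros
subplot(1,2,2); spy(inv(K));                             % full: T^2 = 66049 nonzeros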
Precision Sampler

To generate R independent draws from N(τ̂, K⁻¹) of dimension T, carry out the following steps:
1. Compute the lower Cholesky factorization K = BB'.
2. Generate Z = (Z₁, ..., Z_T)' by drawing Z₁, ..., Z_T ∼ N(0, 1).
3. Output U = τ̂ + (B')⁻¹Z.
4. Repeat Steps 2 and 3 independently R times.
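
A minimal MATLAB sketch of these steps (the function name is illustrative):

% Precision sampler: draw R samples from N(tauhat, K^{-1}) using only
% the sparse precision matrix K.
function draws = precision_sampler(tauhat, K, R)
    T = length(tauhat);
    B = chol(K,'lower');              % Step 1: K = B*B'
    draws = zeros(T,R);
    for r = 1:R
        Z = randn(T,1);               % Step 2: Z ~ N(0, I_T)
        draws(:,r) = tauhat + B'\Z;   % Step 3: U = tauhat + (B')^{-1} Z
    end
end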
Quick Check

Recall that U = τ̂ + (B')⁻¹Z, where Z ∼ N(0, I_T)

U is an affine transformation of normal random variables, so it is normal

Easy to check that EU = τ̂

The covariance matrix is

$$
\mathrm{Var}(U) = (B')^{-1}\,\mathrm{Var}(Z)\,\left((B')^{-1}\right)' = (B')^{-1}B^{-1} = (BB')^{-1} = K^{-1}.
$$
Sample (σ 2 | y, τ , ω 2)

(2) f(σ² | y, τ, ω²):

$$
f(\sigma^2 \mid y, \tau, \omega^2) \propto f(y \mid \tau, \sigma^2)\, f(\sigma^2)
\propto (\sigma^2)^{-(\nu_{\sigma^2}+1)}\, e^{-\frac{S_{\sigma^2}}{\sigma^2}}\, (\sigma^2)^{-\frac{T}{2}}\, e^{-\frac{1}{2\sigma^2}(y-\tau)'(y-\tau)}
\propto (\sigma^2)^{-(\nu_{\sigma^2}+T/2+1)}\, e^{-\frac{1}{\sigma^2}\left(S_{\sigma^2} + (y-\tau)'(y-\tau)/2\right)}
$$

Hence,

$$
(\sigma^2 \mid y, \tau, \omega^2) \sim \mathrm{InvGamma}\left(\nu_{\sigma^2} + T/2,\; S_{\sigma^2} + (y-\tau)'(y-\tau)/2\right)
$$
Sample (ω 2 | y, τ , σ 2)

Recall that the transition equation is given by

τt = τt−1 + ut , ut ∼ N(0, ω 2 )

for t = 2, . . . , T

Hence, another way to write f(τ | ω²) is

$$
f(\tau \mid \omega^2) = f(\tau_1)\,(2\pi\omega^2)^{-\frac{T-1}{2}}\, e^{-\frac{1}{2\omega^2}\sum_{t=2}^T(\tau_t - \tau_{t-1})^2}
$$

(3) f(ω² | y, τ, σ²):

$$
f(\omega^2 \mid y, \tau, \sigma^2) \propto f(\tau \mid \omega^2)\, f(\omega^2)
\propto (\omega^2)^{-(\nu_{\omega^2}+1)}\, e^{-\frac{S_{\omega^2}}{\omega^2}}\,(\omega^2)^{-\frac{T-1}{2}}\, e^{-\frac{1}{2\omega^2}\sum_{t=2}^T(\tau_t-\tau_{t-1})^2}
\propto (\omega^2)^{-\left(\nu_{\omega^2}+\frac{T-1}{2}+1\right)}\, e^{-\frac{1}{\omega^2}\left(S_{\omega^2} + \frac{1}{2}\sum_{t=2}^T(\tau_t-\tau_{t-1})^2\right)}
$$

Hence,

$$
(\omega^2 \mid y, \tau, \sigma^2) \sim \mathrm{InvGamma}\left(\nu_{\omega^2} + \frac{T-1}{2},\; S\right),
$$

where S = S_{ω²} + Σ_{t=2}^T (τ_t − τ_{t−1})²/2
Gibbs Sampler for the Unobserved Components Model

Pick some initial values τ (0) = a0 , σ 2(0) = b0 > 0, and


ω 2(0) = c0 > 0. Then, repeat the following steps from r = 1 to R:
1. Draw τ (r ) ∼ f (τ | y, σ 2(r −1) , ω 2(r −1) ) (multivariate normal).
2. Draw σ 2(r ) ∼ f (σ 2 | y, τ (r ) , ω 2(r −1) ) (inverse-gamma).
3. Draw ω 2(r ) ∼ f (ω 2 | y, τ (r ) , σ 2(r ) ) (inverse-gamma).
MATLAB Code

% UC_RW.m
nloop = 11000; burnin = 1000;
load 'USCPI.csv'; Y = USCPI;
T = length(Y);

%% initialize for storage
store_tau = zeros(nloop-burnin,T);
store_theta = zeros(nloop-burnin,2);

%% prior
invVtau = 1/5;                              % prior precision of tau_1 (tau_0 = 0, omega_0^2 = 5)
nusig0 = 5; Ssig0 = 4*(nusig0-1);           % prior mean of sig2 is 4
nuomega0 = 5; Somega0 = .25^2*(nuomega0-1); % prior mean of omega2 is 0.25^2
MATLAB Code

%% initialize the Markov chain
sig2 = 1; omega2 = 1;

%% compute a few things outside the loop
H = speye(T) - sparse(2:T,1:(T-1),ones(1,T-1),T,T);   % first-difference matrix H

for loop = 1:nloop

%% sample tau
invOmega = sparse(1:T,1:T, ...
    [invVtau 1/omega2*ones(1,T-1)],T,T);   % Omega^{-1}
invDtau = H'*invOmega*H + 1/sig2*speye(T); % precision matrix K
Ctau = chol(invDtau,'lower');
tauhat = invDtau\(Y/sig2);                 % posterior mean of tau
tau = tauhat + Ctau'\randn(T,1);           % precision sampler draw
MATLAB Code

%% sample sig2
newSsig = Ssig0 + sum((Y - tau).^2)/2;
sig2 = 1/gamrnd(nusig0+T/2,1/newSsig);

%% sample omega2
u = tau(2:end) - tau(1:T-1);
newSomega = Somega0 + sum(u.^2)/2;
omega2 = 1/gamrnd(nuomega0+(T-1)/2,1./newSomega);

if loop>burnin
i = loop-burnin;
store_tau(i,:) = tau’;
store_theta(i,:) = [sig2 omega2];
end
end
Time-Varying Parameter VAR

Consider again the VARn (p) but now with time-varying parameters:

yt = bt + B1t yt−1 + · · · + Bpt yt−p + ǫt , ǫt ∼ N(0, Σ)

Or equivalently,

$$y_t = X_t\beta_t + \epsilon_t,$$

where $X_t = I_n \otimes [1, y_{t-1}', \ldots, y_{t-p}']$ and $\beta_t = \mathrm{vec}([b_t, B_{1t}, \ldots, B_{pt}]')$.
The state equation is given by

β t = β t−1 + ut , ut ∼ N(0, Q)

for t = 2, . . . , T

Initialized with β₁ ∼ N(β₀, Q₀) for some known constant matrices β₀ and Q₀

The covariance matrix is typically assumed to be diagonal:

$$Q = \mathrm{diag}(q_1, \ldots, q_k)$$
Priors and Estimation

Independent priors for Σ and Q

Σ ∼ InvWishart(ν0 , S0 ), qi ∼ InvGamma(ν0i , S0i )

A Gibbs sampler is constructed by sampling through


1. (β | y, Σ, Q),
2. (Σ | y, β, Q),
3. (Q | y, β, Σ).
An Expression for f (y | β, Σ)

Rewrite the measurement equation as

$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, I_T\otimes\Sigma),$$

where β = (β₁', ..., β_T')' stacks the time-varying coefficients and

$$
X = \begin{pmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & X_T
\end{pmatrix}
$$

Hence, we have

$$
f(y \mid \beta, \Sigma) = (2\pi)^{-\frac{Tn}{2}}|\Sigma|^{-\frac{T}{2}}\, e^{-\frac{1}{2}(y-X\beta)'(I_T\otimes\Sigma^{-1})(y-X\beta)}.
$$
An Expression for f (β | Q)

For simplicity, assume β 0 = 0

Rewrite the transition equation as:

$$
\underbrace{\begin{pmatrix}
I_k & 0 & 0 & \cdots & 0 \\
-I_k & I_k & 0 & \cdots & 0 \\
0 & -I_k & I_k & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -I_k & I_k
\end{pmatrix}}_{H}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_T \end{pmatrix}
=
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_T \end{pmatrix}
$$

That is,

$$H\beta = u, \qquad u \sim N(0,\Omega),$$

where Ω = diag(Q₀, Q, ..., Q)
Again |H| = 1 implies that H is invertible

Since

$$\beta \sim N\left(0, (H'\Omega^{-1}H)^{-1}\right),$$

it follows that

$$
f(\beta \mid Q) = (2\pi)^{-\frac{Tk}{2}}\,|Q_0|^{-\frac{1}{2}}\,|Q|^{-\frac{T-1}{2}}\, e^{-\frac{1}{2}\beta'(H'\Omega^{-1}H)\beta}.
$$
An Expression for f (β | y, Σ, Q)

Combining f(β | Q) and f(y | β, Σ):

$$
f(\beta \mid y, \Sigma, Q) \propto f(\beta \mid Q)\, f(y \mid \beta, \Sigma)
\propto e^{-\frac{1}{2}\left[(y-X\beta)'(I_T\otimes\Sigma^{-1})(y-X\beta) + \beta'(H'\Omega^{-1}H)\beta\right]}
\propto e^{-\frac{1}{2}\left[\beta'\left(H'\Omega^{-1}H + X'(I_T\otimes\Sigma^{-1})X\right)\beta - 2\beta'X'(I_T\otimes\Sigma^{-1})y\right]}
$$

Hence,

$$(\beta \mid y, \Sigma, Q) \sim N(\hat\beta, K^{-1}),$$

where

$$
K = H'\Omega^{-1}H + X'(I_T\otimes\Sigma^{-1})X, \qquad
\hat\beta = K^{-1}X'(I_T\otimes\Sigma^{-1})y
$$
Sample (β | y, Σ, Q)

Note that

$$K = H'\Omega^{-1}H + X'(I_T\otimes\Sigma^{-1})X$$

is again a sparse matrix

Moreover,

$$\hat\beta = K^{-1}X'(I_T\otimes\Sigma^{-1})y$$

can be computed quickly

Use the precision sampler to sample (β | y, Σ, Q)

Sample (Σ | y, β, Q)

(2) f(Σ | y, β, Q):

$$
f(\Sigma \mid y, \beta, Q) \propto |\Sigma|^{-\frac{\nu_0+n+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(S_0\Sigma^{-1})}\;
|\Sigma|^{-\frac{T}{2}}\, e^{-\frac{1}{2}\sum_{t=1}^T (y_t - X_t\beta_t)'\Sigma^{-1}(y_t - X_t\beta_t)}
$$
$$
\propto |\Sigma|^{-\frac{\nu_0+n+T+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(S_0\Sigma^{-1})}\, e^{-\frac{1}{2}\mathrm{tr}\left[\sum_{t=1}^T(y_t - X_t\beta_t)(y_t - X_t\beta_t)'\,\Sigma^{-1}\right]}
$$
$$
\propto |\Sigma|^{-\frac{\nu_0+n+T+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}\left[\left(S_0 + \sum_{t=1}^T(y_t - X_t\beta_t)(y_t - X_t\beta_t)'\right)\Sigma^{-1}\right]}
$$

Hence,

$$(\Sigma \mid y, \beta, Q) \sim \mathrm{InvWishart}(\nu_0 + T,\; S),$$

where S = S₀ + Σ_{t=1}^T (y_t − X_tβ_t)(y_t − X_tβ_t)'

Sample (Q | y, β, Σ)

Recall that Q = diag(q₁, ..., q_k) is diagonal

Can show that

$$
(q_i \mid y, \beta, \Sigma) \sim \mathrm{InvGamma}\left(\nu_{0i} + (T-1)/2,\; S_i\right),
$$

where S_i = S_{0i} + Σ_{t=2}^T (β_{i,t} − β_{i,t−1})²/2
Gibbs Sampler for the TVP-VAR

Pick some initial values β (0) = a0 , Σ(0) = B0 , and Q(0) = C0 .


Then, repeat the following steps from r = 1 to R:
1. Draw β (r ) ∼ f (β | y, Σ(r −1) , Q(r −1) ) (multivariate normal).
2. Draw Σ(r ) ∼ f (Σ | y, β (r ) , Q(r −1) ) (inverse-Wishart).
3. Draw Q(r ) ∼ f (Q | y, β (r ) , Σ(r ) ) (independent
inverse-gammas).
MATLAB Code

for loop = 1:nloop

%% sample beta
XinvSig = bigX'*kron(speye(T),invSig);
XinvSigX = XinvSig*bigX;
invOmega = sparse(1:Tk,1:Tk, ...
    [1./Vbeta; repmat(invQ,T-1,1)]');      % Omega^{-1} = diag(Q_0, Q, ..., Q)^{-1}
invDbeta = H'*invOmega*H + XinvSigX;       % precision matrix K
C = chol(invDbeta,'lower');
betahat = C'\(C\(XinvSig*Y));              % posterior mean of beta
beta = betahat + C'\randn(Tk,1);           % precision sampler draw
MATLAB Code

%% sample Sig
e1 = reshape(Y-bigX*beta,n,T);             % measurement errors
newS1 = S01 + e1*e1';
invSig = wishrnd(newS1\speye(n),newnu1);   % draw Sig^{-1} ~ Wishart, i.e., Sig ~ InvWishart
Sig = invSig\speye(n);

%% sample Q
e2 = reshape(H*beta,k,T);                  % state innovations (first column is beta_1)
newS2 = S02 + sum(e2(:,2:end).^2,2)/2;
invQ = gamrnd(newnu2,1./newS2);            % q_i^{-1} ~ Gamma, i.e., q_i ~ InvGamma
Q = 1./invQ;
end
Evaluating the Integrated Likelihood

The integrated likelihood is defined as

$$
f(y \mid \Sigma, Q) = \int f(y \mid \beta, \Sigma)\, f(\beta \mid Q)\, d\beta
$$

Very high-dimensional integration

It can be evaluated using the Kalman filter (often very slow)

Turns out it can be quickly evaluated using sparse matrix algorithms

By Bayes' Theorem, we have

$$
f(\beta \mid y, \Sigma, Q) = \frac{f(y \mid \beta, \Sigma)\, f(\beta \mid Q)}{f(y \mid \Sigma, Q)}
$$

Or equivalently,

$$
f(y \mid \Sigma, Q) = \frac{f(y \mid \beta, \Sigma)\, f(\beta \mid Q)}{f(\beta \mid y, \Sigma, Q)}
$$

Note that although the RHS appears to depend on β, it does not: the identity holds for every β

Pick any β = β*:

$$
f(y \mid \Sigma, Q) = \frac{f(y \mid \beta^*, \Sigma)\, f(\beta^* \mid Q)}{f(\beta^* \mid y, \Sigma, Q)}
$$

The RHS involves evaluating only normal densities

Used in Chan and Jeliazkov (2009) and Chan and Eisenstat (2013)
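
A minimal sketch of this evaluation in MATLAB, reusing objects from the TVP-VAR sampler above (Y, bigX, H, invSig, invOmega, invDbeta, betahat and the scalars T, n, k); the choice β* = β̂ is for convenience:

% Evaluate log f(y | Sig, Q) = log f(y | beta*, Sig) + log f(beta* | Q)
%                              - log f(beta* | y, Sig, Q), with beta* = betahat.
betastar = betahat;
e = Y - bigX*betastar;                      % stacked measurement errors
llike = -T*n/2*log(2*pi) + T/2*log(det(invSig)) ...
    - .5*e'*kron(speye(T),invSig)*e;        % log f(y | beta*, Sig)
lprior = -T*k/2*log(2*pi) + .5*sum(log(full(diag(invOmega)))) ...
    - .5*betastar'*(H'*invOmega*H)*betastar;             % log f(beta* | Q); uses |H| = 1
CK = chol(invDbeta,'lower');                             % invDbeta is the posterior precision K
lpost = -T*k/2*log(2*pi) + sum(log(full(diag(CK))));     % log f(beta* | y, Sig, Q)
lml = llike + lprior - lpost;               % log integrated likelihood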
Bayesian Model Comparison

Consider the problem of comparing models M1 , . . . , MK

Each model M_k is formally defined by a likelihood function f(y | θ_k, M_k) and a prior distribution f(θ_k | M_k)

We make the model indicator M_k explicit, and θ_k is a model-specific parameter vector
Posterior Odds Ratio

One popular criterion to compare models M_i against M_j is the posterior odds ratio:

$$
\mathrm{PO}_{ij} = \frac{f(M_i \mid y)}{f(M_j \mid y)}
= \underbrace{\frac{f(M_i)}{f(M_j)}}_{\text{prior odds ratio}} \times \underbrace{\frac{f(y \mid M_i)}{f(y \mid M_j)}}_{\text{Bayes factor}},
$$

where

$$
f(y \mid M_k) = \int f(y \mid \theta_k, M_k)\, f(\theta_k \mid M_k)\, d\theta_k
$$

is the marginal likelihood for model M_k


When f(M_i) = f(M_j), i.e., when the models are equally probable a priori, then

$$
\mathrm{PO}_{ij} = \frac{f(y \mid M_i)}{f(y \mid M_j)}
$$

It boils down to computing the marginal likelihood for each model

In general, computing

$$
f(y \mid M_k) = \int f(y \mid \theta_k, M_k)\, f(\theta_k \mid M_k)\, d\theta_k
$$

is non-trivial
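
A tiny numerical illustration (the two log marginal likelihoods below are made-up values, just to show the arithmetic; with equal prior probabilities and only two models under comparison, the posterior odds ratio equals the Bayes factor):

% Work in logs to avoid underflow when converting to posterior probabilities.
lml_i = -1520.3; lml_j = -1525.8;        % hypothetical log marginal likelihoods
logBF_ij = lml_i - lml_j;                % log Bayes factor of M_i against M_j
postprob_i = 1/(1 + exp(-logBF_ij));     % P(M_i | y) when only M_i and M_j are compared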
Marginal Likelihood Estimation

From now on we omit the dependence on the model Mk

So just write f (y) instead of f (y | Mk )

Two popular methods to estimate the marginal likelihood: the


modified harmonic mean estimator (Gelfand and Dey, 1994) and
Chib’s method (Chib, 1995; Chib and Jeliazkov, 2001)
Modified Harmonic Mean Estimator

For any density g(θ), we have

$$
\mathbb{E}\left[\left.\frac{g(\theta)}{f(\theta)\, f(y \mid \theta)}\,\right| y\right]
= \int \frac{g(\theta)}{f(\theta)\, f(y \mid \theta)}\, f(\theta \mid y)\, d\theta
= \int \frac{g(\theta)}{f(\theta)\, f(y \mid \theta)}\, \frac{f(\theta)\, f(y \mid \theta)}{f(y)}\, d\theta
= \frac{1}{f(y)}\int g(\theta)\, d\theta
= \frac{1}{f(y)}
$$

This identity is true for any g such that ∫ g(θ) dθ = 1
Since

$$
\mathbb{E}\left[\left.\frac{g(\theta)}{f(\theta)\, f(y \mid \theta)}\,\right| y\right] = \frac{1}{f(y)},
$$

we consider the estimator

$$
\widehat{f(y)} = \left(\frac{1}{R}\sum_{r=1}^R \frac{g(\theta^{(r)})}{f(\theta^{(r)})\, f(y \mid \theta^{(r)})}\right)^{-1},
$$

where θ^{(1)}, ..., θ^{(R)} are posterior draws
Summary

The modified harmonic mean estimator of Gelfand and Dey (1994):

1. Obtain draws θ^{(1)}, ..., θ^{(R)} from f(θ | y) using, e.g., the Gibbs sampler.
2. Compute f̂(y).
How to choose a 'good' g?

Geweke (1999) suggests a normal approximation to the posterior distribution with a tail truncation

Can prove that the resulting estimator f̂(y) has finite variance

Some comments:
◦ easy to code up
◦ works well in low-dimensional models
◦ bias can be substantial in high-dimensional models (e.g.,
latent data models)
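
A minimal sketch of the estimator with Geweke's truncated-normal g. The inputs theta (an R × d matrix of posterior draws), lprior and llike (R × 1 vectors of log-prior and log-likelihood values at those draws) are assumed to be available from the main run:

% Modified harmonic mean estimator of the log marginal likelihood.
p = 0.95;                                    % truncation probability for g
[R, d] = size(theta);
thetabar = mean(theta)';                     % posterior mean
Qtheta = cov(theta);                         % posterior covariance
dev = theta - repmat(thetabar',R,1);
md2 = sum((dev/Qtheta).*dev,2);              % squared Mahalanobis distances
lg = -log(p) - d/2*log(2*pi) - .5*log(det(Qtheta)) - .5*md2;   % log g(theta)
lg(md2 > chi2inv(p,d)) = -inf;               % g = 0 outside the truncation region
lw = lg - lprior - llike;                    % log of g/(f(theta) f(y|theta))
maxlw = max(lw);
lml = -(maxlw + log(mean(exp(lw - maxlw)))); % log f-hat(y), via log-sum-exp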
Chib’s Method

Chib's Method is based on the identity

$$
f(y) = \frac{f(y \mid \theta)\, f(\theta)}{f(\theta \mid y)},
$$

which is true for any θ (in the support of the posterior)

Taking θ = θ*, we have

$$
\log f(y) = \log f(y \mid \theta^*) + \log f(\theta^*) - \log f(\theta^* \mid y)
$$

The first two terms on the RHS can often be evaluated analytically; only need to estimate the third term
Example: a 3-Block Gibbs Sampler

Suppose we have the output from a 3-block Gibbs sampler:

$$f(\theta_1 \mid y, \theta_2, \theta_3), \qquad f(\theta_2 \mid y, \theta_1, \theta_3), \qquad f(\theta_3 \mid y, \theta_1, \theta_2)$$

The three conditional distributions are fully known (so they can be evaluated exactly)

Goal: estimate

$$
\log f(\theta^* \mid y) = \log f(\theta_1^*, \theta_2^*, \theta_3^* \mid y)
= \log f(\theta_1^* \mid y) + \log f(\theta_2^* \mid y, \theta_1^*) + \log f(\theta_3^* \mid y, \theta_1^*, \theta_2^*)
$$
First, f (θ ∗3 | y, θ ∗1 , θ ∗2 ) can be computed exactly

Second, note that

$$
f(\theta_1^* \mid y) = \int\!\!\int f(\theta_1^*, \theta_2, \theta_3 \mid y)\, d\theta_2\, d\theta_3
= \int\!\!\int f(\theta_1^* \mid y, \theta_2, \theta_3)\, f(\theta_2, \theta_3 \mid y)\, d\theta_2\, d\theta_3
$$

Hence, f(θ₁* | y) can be estimated by

$$
\widehat{f(\theta_1^* \mid y)} = \frac{1}{R}\sum_{r=1}^R f\left(\theta_1^* \mid y, \theta_2^{(r)}, \theta_3^{(r)}\right),
$$

where {θ₂^{(r)}, θ₃^{(r)}} are sampled from f(θ₂, θ₃ | y)
Similarly, note that

$$
f(\theta_2^* \mid y, \theta_1^*) = \int f(\theta_2^*, \theta_3 \mid y, \theta_1^*)\, d\theta_3
= \int f(\theta_2^* \mid y, \theta_1^*, \theta_3)\, f(\theta_3 \mid y, \theta_1^*)\, d\theta_3
$$

Hence, f(θ₂* | y, θ₁*) can be estimated by

$$
\widehat{f(\theta_2^* \mid y, \theta_1^*)} = \frac{1}{R}\sum_{r=1}^R f\left(\theta_2^* \mid y, \theta_1^*, \theta_3^{(r)}\right),
$$

where {θ₃^{(r)}} are sampled from f(θ₃ | y, θ₁*)
Draws from f (θ 3 | y, θ ∗1 ) can be obtained via a reduced run
Initialize θ₂^{(0)} = a₀ and θ₃^{(0)} = b₀. Then, repeat the following steps from r = 1 to R:
1. Draw θ₂^{(r)} ∼ f(θ₂ | y, θ₁*, θ₃^{(r−1)}).
2. Draw θ₃^{(r)} ∼ f(θ₃ | y, θ₁*, θ₂^{(r)}).
Example: Summary

First, consider the identity

log f (y) = log f (y | θ ∗ ) + log f (θ ∗ ) − log f (θ ∗ | y)

Both f(y | θ*) and f(θ*) are often known. It suffices to estimate f(θ* | y) = f(θ₁*, θ₂*, θ₃* | y)

Next, note that

log f (θ ∗ | y) = log f (θ ∗1 | y) + log f (θ ∗2 | y, θ ∗1 ) + log f (θ ∗3 | y, θ ∗1 , θ ∗2 )

◦ f (θ ∗3 | y, θ ∗1 , θ ∗2 ) is known
◦ f (θ ∗1 | y) can be estimated using draws from the main run
◦ f (θ ∗2 | y, θ ∗1 ) can be estimated using draws from a reduced run
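
Putting the pieces together, a schematic MATLAB sketch of the whole procedure. All names here are hypothetical: dens1/dens2/dens3 evaluate the three full conditional densities, draw2/draw3 sample from them, (t1s, t2s, t3s) is the chosen point θ*, and lpri_star, llike_star are log f(θ*) and log f(y | θ*):

% Chib's method for a 3-block Gibbs sampler (schematic).
R = size(draws2,1);                       % draws2, draws3: main-run draws from f(theta2, theta3 | y)
f1hat = 0;
for r = 1:R                               % estimate f(theta1* | y) from the main run
    f1hat = f1hat + dens1(t1s, draws2(r,:), draws3(r,:))/R;
end
t3 = t3s; f2hat = 0;
for r = 1:R                               % reduced run with theta1 fixed at theta1*
    t2 = draw2(t1s, t3);
    t3 = draw3(t1s, t2);
    f2hat = f2hat + dens2(t2s, t1s, t3)/R;   % estimate f(theta2* | y, theta1*)
end
f3 = dens3(t3s, t1s, t2s);                % f(theta3* | y, theta1*, theta2*) is known exactly
logpost = log(f1hat) + log(f2hat) + log(f3);
logml = llike_star + lpri_star - logpost; % log f(y) by Chib's identity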
Comments:
◦ more programming effort is required
◦ more blocks require more reduced runs
◦ even more complicated when MH is involved (Chib and
Jeliazkov, 2001)
◦ works very well for high-dimensional latent data models
