IntroBayesTimeSeries2
Joshua Chan
Website:
http://people.anu.edu.au/joshua.chan/
Email:
[email protected]
Plan for Today
y_t = b + B_1 y_{t−1} + · · · + B_p y_{t−p} + ε_t,
where ε_t ∼ N(0, Σ)
If we can write the VAR in the regression form
y = Xβ + ε,
we can use the Gibbs sampler for the linear regression model
Write the bivariate VAR(1)
(y_{1,t}, y_{2,t})' = (b_1, b_2)' + [B_{11,1}  B_{12,1}; B_{21,1}  B_{22,1}] (y_{1,t−1}, y_{2,t−1})' + (ε_{1,t}, ε_{2,t})'
as
(y_{1,t}, y_{2,t})' = [1  y_{1,t−1}  y_{2,t−1}  0  0  0; 0  0  0  1  y_{1,t−1}  y_{2,t−1}] (b_1, B_{11,1}, B_{12,1}, b_2, B_{21,1}, B_{22,1})' + (ε_{1,t}, ε_{2,t})'
Or
y_t = (I_2 ⊗ [1, y'_{t−1}]) β + ε_t
In the Form of a Regression
y_t = X_t β + ε_t,
where X_t = I_n ⊗ [1, y'_{t−1}, . . . , y'_{t−p}] and β = vec([b, B_1, · · · , B_p]').
y = Xβ + ε,
where
y = (y'_1, . . . , y'_T)', X = (X'_1, . . . , X'_T)', ε ∼ N(0, I_T ⊗ Σ)
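To make the stacking concrete, here is a minimal MATLAB sketch (my addition, not from the slides) that builds X_t = I_n ⊗ [1, y'_{t−1}, . . . , y'_{t−p}] for each t and stacks the rows into a sparse regressor matrix; the names Z, Z0 and bigX mirror the VAR3.m fragment shown later, but the loop itself is an illustrative assumption.
% Sketch: build X_t and the stacked regressor matrix X (here called bigX)
% Assumes Z is the T x n matrix of observations and Z0 holds the p presample rows
[T, n] = size(Z);
p = size(Z0, 1);
k = n*(1 + n*p);                         % number of VAR coefficients
Zfull = [Z0; Z];                         % presample rows on top
bigX = sparse(T*n, k);
for t = 1:T
    lags = reshape(Zfull(t+p-1:-1:t,:)', 1, n*p);        % [y_{t-1}', ..., y_{t-p}']
    bigX((t-1)*n+1:t*n, :) = kron(speye(n), [1, lags]);   % X_t = I_n kron [1, lags]
end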
VAR: Likelihood
Since
(y | β, Σ) ∼ N(Xβ, I_T ⊗ Σ),
the likelihood function is given by
f(y | β, Σ) = |2π(I_T ⊗ Σ)|^{−1/2} e^{−(1/2)(y−Xβ)'(I_T⊗Σ)^{−1}(y−Xβ)}
= (2π)^{−Tn/2} |Σ|^{−T/2} e^{−(1/2)(y−Xβ)'(I_T⊗Σ^{−1})(y−Xβ)},
where we use |I_T ⊗ Σ| = |Σ|^T and (I_T ⊗ Σ)^{−1} = I_T ⊗ Σ^{−1}
Also recall the inverse-Wishart density: for an n × n positive definite matrix Z,
f(Z; α, W) = |W|^{α/2} / (2^{nα/2} Γ_n(α/2)) · |Z|^{−(α+n+1)/2} e^{−(1/2) tr(W Z^{−1})}
We write Z ∼ InvWishart(α, W)
Moment: E[Z] = W/(α − n − 1) for α > n + 1
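As a quick sanity check of this parameterisation, the following MATLAB sketch (my addition, with purely illustrative values for α and W) verifies the mean formula by simulation using iwishrnd, which uses the same (α, W) convention as the code later in the notes.
% Sketch: Monte Carlo check that E[Z] = W/(alpha - n - 1) for Z ~ InvWishart(alpha, W)
n = 3; alpha = 10; W = eye(n) + 0.5*ones(n);   % illustrative values only
R = 50000; Zsum = zeros(n);
for r = 1:R
    Zsum = Zsum + iwishrnd(W, alpha);
end
Zsum/R                 % sample mean of the draws ...
W/(alpha - n - 1)      % ... should be close to the theoretical mean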
Priors and Gibbs Sampler
β ∼ N(β_0, V_β), Σ ∼ InvWishart(ν_0, S_0)
(1) f(Σ | y, β):
f(Σ | y, β) ∝ |Σ|^{−(ν_0+n+1)/2} e^{−(1/2) tr(S_0 Σ^{−1})} · |Σ|^{−T/2} e^{−(1/2) ∑_{t=1}^T (y_t−X_t β)' Σ^{−1} (y_t−X_t β)}
∝ |Σ|^{−(ν_0+n+T+1)/2} e^{−(1/2) tr(S_0 Σ^{−1})} e^{−(1/2) tr[∑_{t=1}^T (y_t−X_t β)(y_t−X_t β)' Σ^{−1}]}
∝ |Σ|^{−(ν_0+n+T+1)/2} e^{−(1/2) tr[(S_0 + ∑_{t=1}^T (y_t−X_t β)(y_t−X_t β)') Σ^{−1}]}
Now, compare the InvWishart(ν, S) kernel
f(Σ) ∝ |Σ|^{−(ν+n+1)/2} e^{−(1/2) tr(S Σ^{−1})},
with
f(Σ | y, β) ∝ |Σ|^{−(ν_0+T+n+1)/2} e^{−(1/2) tr[(S_0 + ∑_{t=1}^T (y_t−X_t β)(y_t−X_t β)') Σ^{−1}]}
Hence,
(Σ | y, β) ∼ InvWishart(ν_0 + T, S_0 + ∑_{t=1}^T (y_t − X_t β)(y_t − X_t β)')
Sample (β | y, Σ)
(2) f(β | y, Σ):
f(β | y, Σ) ∝ e^{−(1/2)(β−β_0)' V_β^{−1} (β−β_0)} e^{−(1/2)(y−Xβ)'(I_T⊗Σ^{−1})(y−Xβ)}
∝ e^{−(1/2)[β'(V_β^{−1} + X'(I_T⊗Σ^{−1})X)β − 2β'(V_β^{−1}β_0 + X'(I_T⊗Σ^{−1})y)]}
Hence,
(β | y, Σ) ∼ N(β̂, D_β),
where
D_β = (V_β^{−1} + X'(I_T ⊗ Σ^{−1})X)^{−1},  β̂ = D_β (V_β^{−1}β_0 + X'(I_T ⊗ Σ^{−1})y)
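A minimal MATLAB sketch of this draw (my addition; it assumes bigX, longZ, invVbeta and invSig are available as in the VAR3.m fragment below, and that the prior mean is stored in a vector beta0, e.g. zeros(k,1)):
% Sketch: one draw from (beta | y, Sigma) ~ N(betahat, D_beta)
XiS = bigX' * kron(speye(T), invSig);            % X'(I_T kron Sigma^{-1})
Kbeta = invVbeta + XiS * bigX;                   % D_beta^{-1} (posterior precision)
betahat = Kbeta \ (invVbeta*beta0 + XiS*longZ);  % posterior mean
C = chol(Kbeta, 'lower');                        % Kbeta = C*C'
beta = betahat + C' \ randn(k,1);                % covariance (C*C')^{-1} = D_beta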
Gibbs Sampler for the VAR
% VAR3.m (fragment): data setup and priors
nloop = 11000; burnin = 1000;
load 'USdata.csv';
Z0 = USdata(1:3,:); Z = USdata(4:end,:);    % first 3 rows are presample values (p = 3)
[T, n] = size(Z);
longZ = reshape(Z',T*n,1);                  % stacked data vector y = (y_1', ..., y_T')'
k = n + 3*n^2;                              % number of VAR coefficients
%% prior
temp1 = ones(k,1); temp1(1:2*n+1:k) = 1/100;
invVbeta = sparse(1:k,1:k,temp1');          % prior precision V_beta^{-1}: unit precision, variance 100 on selected coefficients
nu0 = 10; S0 = 7*eye(n);
MATLAB Code
%% sample Sig   (inside the MCMC loop; newnu = nu0 + T is set before the loop)
err = reshape(longZ - bigX*beta,n,T);       % residuals, one column per t
newS = S0 + err*err';                       % S_0 + sum_t (y_t - X_t beta)(y_t - X_t beta)'
Sig = iwishrnd(newS,newnu);                 % draw from InvWishart(nu0 + T, newS)
invSig = Sig\speye(n);
end
State Space Models: Overview
In the first level, the observations are related to the states via the measurement equation
y_t = X_t θ_t + ε_t, ε_t ∼ N(0, Σ)
In the second level, the evolution of the states is modeled via the state or transition equation
θ_t = θ_{t−1} + ζ_t, ζ_t ∼ N(0, Ω)
Kalman Filter: Overview
It turns out that we can accomplish both tasks with some new sparse matrix algorithms, without using the Kalman filter (KF)
Unobserved Components Model
y_t = τ_t + ε_t, ε_t ∼ N(0, σ²)
The states, in turn, are initialized with τ_1 ∼ N(τ_0, ω_0²) for some known constants τ_0 and ω_0², and evolve according to the transition equation
τ_t = τ_{t−1} + u_t, u_t ∼ N(0, ω²)
for t = 2, . . . , T
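A short MATLAB sketch (my addition, with purely illustrative parameter values) that simulates one path from this model may help fix ideas:
% Sketch: simulate data from the unobserved components model
T = 250; sig2 = 1; omega2 = 0.1; tau0 = 0; omega02 = 9;   % illustrative values only
tau = zeros(T,1);
tau(1) = tau0 + sqrt(omega02)*randn;          % tau_1 ~ N(tau0, omega0^2)
for t = 2:T
    tau(t) = tau(t-1) + sqrt(omega2)*randn;   % random-walk transition
end
y = tau + sqrt(sig2)*randn(T,1);              % measurement equation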
Priors and Estimation
f(τ, σ², ω² | y) ∝ f(y | τ, σ²) f(τ | ω²) f(σ²) f(ω²),
y = τ + ε, ε ∼ N(0, σ² I_T)
Hence, we have
f(y | τ, σ²) = (2πσ²)^{−T/2} e^{−(1/2σ²)(y−τ)'(y−τ)}
An Expression for f (τ | ω 2 )
That is,
Hτ = u, u ∼ N(0, Ω),
where H is the T × T first-difference matrix (1 on the main diagonal, −1 on the first sub-diagonal) and Ω = diag(ω_0², ω², . . . , ω²)
Note that |H| = 1 and hence H is invertible
Finally,
τ ∼ N(0, (H'Ω^{−1}H)^{−1})
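In MATLAB, H and the implied prior precision are cheap to build as sparse matrices; a minimal sketch (my addition; omega2 and omega02 are illustrative names for ω² and ω_0²):
% Sketch: build H (first-difference matrix) and the prior precision H'*inv(Omega)*H
H = speye(T) - sparse(2:T, 1:T-1, ones(1,T-1), T, T);      % 1 on diagonal, -1 on sub-diagonal
invOmega = sparse(1:T, 1:T, [1/omega02, repmat(1/omega2,1,T-1)]);
priorPrec = H' * invOmega * H;                              % precision of tau given omega^2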
Combining f (τ | ω 2 ) and f (y | τ , σ 2 )
f(τ | y, σ², ω²) ∝ f(y | τ, σ²) f(τ | ω²)
∝ e^{−(1/2)[σ^{−2}(y−τ)'(y−τ) + τ'(H'Ω^{−1}H)τ]}
∝ e^{−(1/2)[τ'(H'Ω^{−1}H + σ^{−2}I_T)τ − 2σ^{−2}τ'y]}
Hence,
(τ | y, σ², ω²) ∼ N(τ̂, K^{−1}),
where K = H'Ω^{−1}H + σ^{−2}I_T and τ̂ = σ^{−2}K^{−1}y
Sample (τ | y, σ², ω²)
[Figure: sparsity (spy) plots of the T × T precision matrix K (nz = 769) and of its inverse K^{−1} (nz = 66049)]
Precision Sampler
To generate R independent draws from N(τ̂, K^{−1}) of dimension T, carry out the following steps:
1. Compute the lower Cholesky factorization K = BB'.
2. Generate Z = (Z_1, . . . , Z_T)' by drawing Z_1, . . . , Z_T ∼ N(0, 1).
3. Output U = τ̂ + (B')^{−1}Z.
4. Repeat Steps 2 and 3 independently R times.
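A minimal MATLAB sketch of the precision sampler applied to (τ | y, σ², ω²) (my addition; it reuses H, invOmega and priorPrec from the sketch above, and Y and sig2 as in the UC_RW.m code below):
% Sketch: one draw of tau via the precision sampler
K = priorPrec + speye(T)/sig2;        % K = H'*inv(Omega)*H + sigma^{-2} I_T
B = chol(K, 'lower');                 % lower Cholesky factor, K = B*B'
tauhat = K \ (Y/sig2);                % tauhat = sigma^{-2} K^{-1} y
tau = tauhat + B' \ randn(T,1);       % draw from N(tauhat, K^{-1})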
Quick Check: U = τ̂ + (B')^{−1}Z has mean τ̂ and covariance (B')^{−1}B^{−1} = (BB')^{−1} = K^{−1}, as required.
(2) f(σ² | y, τ, ω²):
f(σ² | y, τ, ω²) ∝ f(y | τ, σ²) f(σ²)
∝ (σ²)^{−(ν_{σ²}+1)} e^{−S_{σ²}/σ²} · (σ²)^{−T/2} e^{−(1/2σ²)(y−τ)'(y−τ)}
∝ (σ²)^{−(ν_{σ²}+T/2+1)} e^{−(1/σ²)(S_{σ²} + (y−τ)'(y−τ)/2)}
Hence,
(σ² | y, τ, ω²) ∼ InvGamma(ν_{σ²} + T/2, S_{σ²} + (y − τ)'(y − τ)/2)
Sample (ω² | y, τ, σ²)
τ_t = τ_{t−1} + u_t, u_t ∼ N(0, ω²)
for t = 2, . . . , T
f(ω² | y, τ, σ²) ∝ f(τ | ω²) f(ω²)
∝ (ω²)^{−(ν_{ω²}+1)} e^{−S_{ω²}/ω²} · (ω²)^{−(T−1)/2} e^{−(1/2ω²) ∑_{t=2}^T (τ_t−τ_{t−1})²}
∝ (ω²)^{−(ν_{ω²}+(T−1)/2+1)} e^{−(1/ω²)(S_{ω²} + ∑_{t=2}^T (τ_t−τ_{t−1})²/2)}
Hence,
(ω² | y, τ, σ²) ∼ InvGamma(ν_{ω²} + (T−1)/2, S),
where S = S_{ω²} + ∑_{t=2}^T (τ_t − τ_{t−1})²/2
Gibbs Sampler for the Unobserved Components Model
% UC_RW.m (fragment): data and priors
nloop = 11000; burnin = 1000;
load 'USCPI.csv'; Y = USCPI;
T = length(Y);
%% prior
invVtau = 1/5;                               % prior precision for the initial state (1/omega_0^2)
nusig0 = 5; Ssig0 = 4*(nusig0-1);            % sig2 ~ InvGamma(nusig0, Ssig0), prior mean 4
nuomega0 = 5; Somega0 = .25^2*(nuomega0-1);  % omega2 ~ InvGamma(nuomega0, Somega0), prior mean 0.25^2
MATLAB Code
%% sample sig2
newSsig = Ssig0 + sum((Y - tau).^2)/2;
sig2 = 1/gamrnd(nusig0+T/2,1/newSsig);       % InvGamma draw via the reciprocal of a Gamma draw
%% sample omega2
u = tau(2:end) - tau(1:T-1);                 % state innovations u_t = tau_t - tau_{t-1}
newSomega = Somega0 + sum(u.^2)/2;
omega2 = 1/gamrnd(nuomega0+(T-1)/2,1./newSomega);
if loop>burnin
    i = loop-burnin;
    store_tau(i,:) = tau';
    store_theta(i,:) = [sig2 omega2];
end
end
Time-Varying Parameter VAR
Consider again the VAR_n(p), but now with time-varying parameters:
y_t = b_t + B_{1t} y_{t−1} + · · · + B_{pt} y_{t−p} + ε_t
Or equivalently,
y_t = X_t β_t + ε_t,
where
X_t = I_n ⊗ [1, y'_{t−1}, . . . , y'_{t−p}], β_t = vec([b_t, B_{1t}, · · · , B_{pt}]')
The state equation is given by
β_t = β_{t−1} + u_t, u_t ∼ N(0, Q)
for t = 2, . . . , T
Stacking over t, we have
y = Xβ + ε, ε ∼ N(0, I_T ⊗ Σ),
where β = (β'_1, . . . , β'_T)' and X = diag(X_1, X_2, . . . , X_T) is block-diagonal
Hence, we have
f(y | β, Σ) = (2π)^{−Tn/2} |Σ|^{−T/2} e^{−(1/2)(y−Xβ)'(I_T⊗Σ^{−1})(y−Xβ)}
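A sketch (my addition) of how the block-diagonal X can be assembled as a sparse matrix, reusing the lag construction from the earlier VAR sketch (Zfull stacks the presample rows on top of the data):
% Sketch: X = diag(X_1, ..., X_T) for the TVP-VAR, stored sparsely
k = n*(1 + n*p);                          % coefficients per period
bigX = sparse(T*n, T*k);
for t = 1:T
    lags = reshape(Zfull(t+p-1:-1:t,:)', 1, n*p);
    bigX((t-1)*n+1:t*n, (t-1)*k+1:t*k) = kron(speye(n), [1, lags]);
end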
An Expression for f (β | Q)
That is,
Hβ = u, u ∼ N(0, Ω),
where H is now the Tk × Tk block analogue of the first-difference matrix above and Ω = diag(Q_0, Q, . . . , Q)
Again |H| = 1 implies that H is invertible
Since
β ∼ N(0, (H'Ω^{−1}H)^{−1}),
it follows that
f(β | Q) = (2π)^{−Tk/2} |Q_0|^{−1/2} |Q|^{−(T−1)/2} e^{−(1/2) β'(H'Ω^{−1}H)β}
An Expression for f (β | y, Σ, Q)
Combining f(β | Q) and f(y | β, Σ):
f(β | y, Σ, Q) ∝ f(β | Q) f(y | β, Σ)
∝ e^{−(1/2)[(y−Xβ)'(I_T⊗Σ^{−1})(y−Xβ) + β'(H'Ω^{−1}H)β]}
∝ e^{−(1/2)[β'(H'Ω^{−1}H + X'(I_T⊗Σ^{−1})X)β − 2β'X'(I_T⊗Σ^{−1})y]}
Hence,
(β | y, Σ, Q) ∼ N(β̂, K^{−1})
Note that
K = H'Ω^{−1}H + X'(I_T ⊗ Σ^{−1})X
is again a sparse matrix
Moreover,
β̂ = K^{−1}X'(I_T ⊗ Σ^{−1})y
can be computed quickly
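A minimal MATLAB sketch of this draw (my addition; it assumes bigX is the block-diagonal regressor matrix, H the Tk × Tk first-difference matrix, invOmega = diag(Q_0^{−1}, Q^{−1}, . . . , Q^{−1}), invSig = Σ^{−1} and longZ the stacked data vector — all names are assumptions following the earlier sketches):
% Sketch: one draw of beta via the precision sampler in the TVP-VAR
XiS = bigX' * kron(speye(T), invSig);     % X'(I_T kron Sigma^{-1})
K = H'*invOmega*H + XiS*bigX;             % posterior precision
betahat = K \ (XiS*longZ);                % posterior mean
B = chol(K, 'lower');
beta = betahat + B' \ randn(T*k,1);       % draw from N(betahat, K^{-1})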
(2) f(Σ | y, β, Q):
f(Σ | y, β, Q) ∝ |Σ|^{−(ν_0+n+1)/2} e^{−(1/2) tr(S_0 Σ^{−1})} · |Σ|^{−T/2} e^{−(1/2) ∑_{t=1}^T (y_t−X_t β_t)' Σ^{−1} (y_t−X_t β_t)}
∝ |Σ|^{−(ν_0+n+T+1)/2} e^{−(1/2) tr(S_0 Σ^{−1})} e^{−(1/2) tr[∑_{t=1}^T (y_t−X_t β_t)(y_t−X_t β_t)' Σ^{−1}]}
∝ |Σ|^{−(ν_0+n+T+1)/2} e^{−(1/2) tr[(S_0 + ∑_{t=1}^T (y_t−X_t β_t)(y_t−X_t β_t)') Σ^{−1}]}
Hence,
(Σ | y, β, Q) ∼ InvWishart(ν_0 + T, S),
where S = S_0 + ∑_{t=1}^T (y_t − X_t β_t)(y_t − X_t β_t)'
Sample (Q | y, β, Σ)
%% sample Sig   (draws Sigma^{-1} ~ Wishart(inv(newS1), newnu1), i.e. Sigma ~ InvWishart(newnu1, newS1))
e1 = reshape(Y-bigX*beta,n,T);               % measurement errors, one column per t
newS1 = S01 + e1*e1';
invSig = wishrnd(newS1\speye(n),newnu1);
Sig = invSig\speye(n);
%% sample Q    (here Q is diagonal; each diagonal element has an inverse-gamma posterior)
e2 = reshape(H*beta,k,T);                    % state innovations u_t = beta_t - beta_{t-1}
newS2 = S02 + sum(e2(:,2:end).^2,2)/2;
invQ = gamrnd(newnu2,1./newS2);              % element-wise Gamma draws for the precisions
Q = 1./invQ;
end
Evaluating the Integrated Likelihood
By Bayes' theorem,
f(β | y, Σ, Q) = f(y | β, Σ) f(β | Q) / f(y | Σ, Q)
Or equivalently,
f(y | Σ, Q) = f(y | β, Σ) f(β | Q) / f(β | y, Σ, Q)
Note that the RHS does not depend on β (the equality holds for all β)
Pick any β = β*:
f(y | Σ, Q) = f(y | β*, Σ) f(β* | Q) / f(β* | y, Σ, Q)
Used in Chan and Jeliazkov (2009) and Chan and Eisenstat (2013)
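A sketch (my addition, not the authors' code) of how the identity can be used in practice: one convenient choice, assumed here, is β* = β̂, so the denominator is a Gaussian density evaluated at its own mean; all variable names follow the earlier sketches.
% Sketch: evaluate log f(y | Sigma, Q) at beta* = betahat
e  = longZ - bigX*betahat;                            % y - X*beta*
cS = chol(invSig, 'lower');
llike  = -T*n/2*log(2*pi) + T*sum(log(diag(cS))) ...
         - 0.5*e'*kron(speye(T), invSig)*e;           % log f(y | beta*, Sigma)
cP = chol(H'*invOmega*H, 'lower');
lprior = -T*k/2*log(2*pi) + sum(log(diag(cP))) ...
         - 0.5*betahat'*(H'*invOmega*H)*betahat;      % log f(beta* | Q)
cK = chol(K, 'lower');
lpost  = -T*k/2*log(2*pi) + sum(log(diag(cK)));       % log f(beta* | y, Sigma, Q), at the mean
lml = llike + lprior - lpost;                         % log f(y | Sigma, Q)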
Bayesian Model Comparison
The posterior odds ratio in favour of M_i against M_j is
PO_ij = f(M_i | y)/f(M_j | y) = f(M_i)/f(M_j) × f(y | M_i)/f(y | M_j),
where the first factor is the prior odds ratio, the second is the Bayes factor, and
f(y | M_k) = ∫ f(y | θ_k, M_k) f(θ_k | M_k) dθ_k
is the marginal likelihood under model M_k
If the prior odds ratio is 1, the posterior odds ratio reduces to the Bayes factor:
PO_ij = f(y | M_i)/f(y | M_j)
In general, computing
f(y | M_k) = ∫ f(y | θ_k, M_k) f(θ_k | M_k) dθ_k
is non-trivial
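As a purely illustrative calculation (hypothetical numbers, not estimates from any of the models above): if log f(y | M_1) = −1250.2 and log f(y | M_2) = −1253.4, then BF_12 = exp(−1250.2 − (−1253.4)) = exp(3.2) ≈ 24.5, so with equal prior odds the posterior odds favour M_1 by roughly 25 to 1.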
Marginal Likelihood Estimation
Since
E[ g(θ) / (f(θ) f(y | θ)) | y ] = 1/f(y),
we can estimate 1/f(y) by averaging g(θ)/(f(θ) f(y | θ)) over posterior draws of θ
This identity is true for any g such that ∫ g(θ) dθ = 1
Some comments:
◦ easy to code up
◦ works well in low-dimensional models
◦ bias can be substantial in high-dimensional models (e.g., latent data models)
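A minimal MATLAB sketch of the resulting estimator (my addition; store_theta follows the UC_RW.m storage convention, while log_g, log_prior and log_like are hypothetical functions evaluating the log of the chosen density g, the log prior and the log likelihood):
% Sketch: estimate log f(y) by averaging g(theta)/(f(theta) f(y|theta)) over posterior draws
R = size(store_theta, 1);
logw = zeros(R,1);
for r = 1:R
    th = store_theta(r,:)';
    logw(r) = log_g(th) - log_prior(th) - log_like(th);
end
c = max(logw);                                % log-sum-exp trick for numerical stability
log_ml = -(c + log(mean(exp(logw - c))));     % estimate of log f(y)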
Chib’s Method
By Bayes' theorem,
f(y) = f(y | θ) f(θ) / f(θ | y)
Taking θ = θ*, we have
f(y) = f(y | θ*) f(θ*) / f(θ* | y)
Suppose θ is partitioned into three blocks (θ_1, θ_2, θ_3) whose three full conditional distributions are fully known (so they can be evaluated exactly)
Goal: estimate
log f(θ* | y) = log f(θ*_1, θ*_2, θ*_3 | y)
= log f(θ*_1 | y) + log f(θ*_2 | y, θ*_1) + log f(θ*_3 | y, θ*_1, θ*_2)
First, f (θ ∗3 | y, θ ∗1 , θ ∗2 ) can be computed exactly
Next, f(θ*_1 | y) can be estimated by
f̂(θ*_1 | y) = (1/R) ∑_{r=1}^R f(θ*_1 | y, θ_2^{(r)}, θ_3^{(r)}),
where {θ_2^{(r)}, θ_3^{(r)}} are sampled from f(θ_2, θ_3 | y)
Similarly, note that
f(θ*_2 | y, θ*_1) = ∫ f(θ*_2, θ_3 | y, θ*_1) dθ_3
= ∫ f(θ*_2 | y, θ*_1, θ_3) f(θ_3 | y, θ*_1) dθ_3
Hence, it can be estimated by
f̂(θ*_2 | y, θ*_1) = (1/R) ∑_{r=1}^R f(θ*_2 | y, θ*_1, θ_3^{(r)}),
where {θ_3^{(r)}} are sampled from f(θ_3 | y, θ*_1)
Draws from f(θ_3 | y, θ*_1) can be obtained via a reduced run:
Initialize θ_2^{(0)} = a_0 and θ_3^{(0)} = b_0. Then, repeat the following steps for r = 1 to R:
1. Draw θ_2^{(r)} ∼ f(θ_2 | y, θ*_1, θ_3^{(r−1)}).
2. Draw θ_3^{(r)} ∼ f(θ_3 | y, θ*_1, θ_2^{(r)}).
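A sketch of the reduced run in MATLAB (my addition; sample_theta2, sample_theta3 and logdens_theta2 are hypothetical function handles for the full conditional samplers and for evaluating log f(θ*_2 | y, θ*_1, θ_3)):
% Sketch: reduced run with theta1 fixed at theta1_star
theta2 = a0; theta3 = b0;
logf2 = zeros(R,1);
for r = 1:R
    theta2 = sample_theta2(y, theta1_star, theta3);
    theta3 = sample_theta3(y, theta1_star, theta2);
    logf2(r) = logdens_theta2(theta2_star, y, theta1_star, theta3);
end
c = max(logf2);                              % average on the log scale (discard a burn-in in practice)
logf2hat = c + log(mean(exp(logf2 - c)));    % estimate of log f(theta2_star | y, theta1_star)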
Example: Summary
◦ f (θ ∗3 | y, θ ∗1 , θ ∗2 ) is known
◦ f (θ ∗1 | y) can be estimated using draws from the main run
◦ f (θ ∗2 | y, θ ∗1 ) can be estimated using draws from a reduced run
Comments:
◦ more programming effort is required
◦ more blocks require more reduced runs
◦ even more complicated when MH is involved (Chib and
Jeliazkov, 2001)
◦ works very well for high-dimensional latent data models