Bayesian Structural Time Series Models
Steven L. Scott
Outline
- Introduction to time series modeling
  - Regression
  - ARMA
  - Smoothing
  - Structural time series
- Applications
- Extensions
Regression models
yt = β0 + β1 t + β2 xt + εt,
where β1 t is the linear trend term.
Airline passengers
An example from elementary textbooks
[Figure: AirPassengers (left) and log10(AirPassengers) (right), 1950–1960.]
air <- log10(AirPassengers)
time <- 1:length(air)
months <- time %% 12
months[months == 0] <- 12
months <- factor(months, labels = month.name)
reg <- lm(air ~ poly(time, 2) + months)
plot(reg$residuals)

[Figure: regression residuals plotted against fitted values and against time.]
[Figure: autocorrelation function of the residuals, lags 0–20, showing strong serial correlation.]
Serial correlation is cured by locality.
Introduction to time series modeling
ARMA models
Stationary vs Nonstationary
See code in stationary.R
[Figure: simulated sample paths of the stationary AR(1) process yt = .95 yt−1 + εt and of the nonstationary random walk yt = yt−1 + εt.]
Variance of an AR(1)
  yt = φ yt−1 + εt
     = φ(φ yt−2 + εt−1) + εt
     = ...
     = Σ_{i=0}^{t} φ^i εt−i
so Var(yt) = σ² Σ_{i=0}^{t} φ^{2i}, which converges as t grows when |φ| < 1 and grows without bound when φ = 1.
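The variance behavior implied by the derivation above can be checked by simulation. This Python sketch is illustrative (the talk's own code is in R, in stationary.R); the path counts and seed are arbitrary choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(phi, n, n_paths=2000):
    """Simulate y_t = phi * y_{t-1} + e_t, e_t ~ N(0, 1), starting from y_0 = 0."""
    y = np.zeros((n_paths, n + 1))
    for t in range(1, n + 1):
        y[:, t] = phi * y[:, t - 1] + rng.standard_normal(n_paths)
    return y

ar1 = simulate(0.95, 1000)   # stationary: Var(y_t) -> 1 / (1 - 0.95^2), about 10.3
walk = simulate(1.0, 1000)   # random walk: Var(y_t) = t, grows without bound
print(ar1[:, -1].var(), walk[:, -1].var())
```

The AR(1) paths stabilize around the limiting variance 1/(1 − φ²); the random-walk paths fan out forever.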
Smoothing
Exponential smoothing
st = αyt + (1 − α)st−1
turns out to be the Kalman filter for the “local level” model.
Holt-Winters or “double exponential smoothing” captures a trend.
This is the Kalman filter for the “local linear trend” model.
“Triple” exponential smoothing can handle seasonality as well, but
the formulas are getting ridiculous!
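The equivalence between exponential smoothing and the Kalman filter for the local level model can be seen directly: the Kalman update for the level is m ← m + K(y − m) = K·y + (1 − K)·m, and the gain K settles to a constant, which plays the role of α. A minimal Python sketch (illustrative variances and seed; the talk's software is R):

```python
import numpy as np

def local_level_filter(y, sigma_obs2, sigma_level2, m0=0.0, P0=10.0):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.
    The mean update is exactly s_t = K*y_t + (1-K)*s_{t-1}, and the
    gain K converges to a constant -- the smoothing weight alpha."""
    m, P = m0, P0                       # one-step-ahead mean and variance
    means, gains = [], []
    for obs in y:
        K = P / (P + sigma_obs2)        # Kalman gain
        m = m + K * (obs - m)           # filtered mean (exponential smoothing form)
        P = (1 - K) * P + sigma_level2  # predictive variance for the next step
        means.append(m)
        gains.append(K)
    return np.array(means), np.array(gains)

rng = np.random.default_rng(1)
mu = np.cumsum(rng.normal(0, 0.5, 200))         # latent level
y = mu + rng.normal(0, 1.0, 200)                # observations
means, gains = local_level_filter(y, sigma_obs2=1.0, sigma_level2=0.25)
print(gains[-1])   # the steady-state gain is the smoothing weight alpha
```

The steady-state gain solves P = (1 − K)P + q with K = P/(P + r), so α is determined by the signal-to-noise ratio q/r rather than chosen by hand.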
Structural time series models
Observation equation
  yt = Ztᵀ αt + εt,   εt ∼ N(0, Ht)
- yt is the observed data at time t.
- αt is a vector of latent variables (the “state”).
- Zt and Ht are structural parameters (partly known).
Transition equation
  αt+1 = Tt αt + Rt ηt,   ηt ∼ N(0, Qt)
- Tt, Rt, and Qt are structural parameters (partly known).
- ηt may be of lower dimension than αt.
[Table: the state vector, Zt, and Tt for the trend, seasonal, and regression state components.]
Example:
The “basic structural model” with a regression effect and S seasons can be
written
  yt = μt + γt + βᵀxt + εt
       (trend)  (seasonal)  (regression)
  μt = μt−1 + δt−1 + ut
  δt = δt−1 + vt
  γt = −Σ_{s=1}^{S−1} γt−s + wt
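The basic structural model is easy to simulate directly from its defining equations. A Python sketch (illustrative: S = 4 seasons, arbitrary variances and seed, regression term omitted for brevity; the airline data would use S = 12):

```python
import numpy as np

rng = np.random.default_rng(2)
S, n = 4, 400                       # 4 "seasons" here; the airline data would use 12
sd_obs, sd_level, sd_slope, sd_seas = 0.5, 0.1, 0.01, 0.1

mu, delta = 0.0, 0.1
gamma = list(rng.normal(0, 0.5, S - 1))        # initial seasonal effects
y = np.empty(n)
for t in range(n):
    # New seasonal effect: minus the sum of the previous S-1 effects plus noise,
    # so any S consecutive effects sum to a single noise term w_t.
    g = -sum(gamma[-(S - 1):]) + rng.normal(0, sd_seas)
    gamma.append(g)
    y[t] = mu + g + rng.normal(0, sd_obs)      # observation equation
    mu = mu + delta + rng.normal(0, sd_level)  # trend: level
    delta = delta + rng.normal(0, sd_slope)    # trend: slope

gamma = np.array(gamma)
cycle_sums = np.array([gamma[i:i + S].sum() for i in range(len(gamma) - S + 1)])
print(round(float(cycle_sums.mean()), 3))
```

The check at the end illustrates why this is a sensible seasonal model: every full cycle of S effects sums to a single mean-zero noise term, so the seasonal pattern averages out over a cycle while still being allowed to drift.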
- Local level
- Local linear trend
- Generalized local linear trend
- Autoregressive models
[Figures: simulated local level paths for several parameter settings; with tau = 0, sigma = 1 the level is constant and the series is just noise around a fixed mean.]
The model is
  yt = μt + εt,                    εt ∼ N(0, σ²)
  μt = μt−1 + δt−1 + ημ,t−1,       ημ,t ∼ N(0, τμ²)
  δt = δt−1 + ηδ,t−1,              ηδ,t ∼ N(0, τδ²)
y = β0 + β1 t + εt.
- With change ∆t, the expected increase in y is β1 ∆t.
- Now each ∆t = 1, and β1 = δt is a changing slope.
- Neat fact! The posterior mean of the local linear trend model is a smoothing spline.
Steven L. Scott (Google) bsts August 10, 2015 24 / 100
Simulating local linear trend
3 simulations with σlevel = 1, σslope = .25 , σobs = .5
[Figure: three simulated local linear trend paths, each wandering far from the origin.]
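The simulation can be reproduced with a few lines of code. A Python sketch using the sigmas stated above (the talk's own code is R; seed and path count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def sim_llt(n, sd_level=1.0, sd_slope=0.25, sd_obs=0.5):
    """One draw from the local linear trend model with the sigmas above."""
    mu, delta = 0.0, 0.0
    y = np.empty(n)
    for t in range(n):
        y[t] = mu + rng.normal(0, sd_obs)      # observation
        mu += delta + rng.normal(0, sd_level)  # level step
        delta += rng.normal(0, sd_slope)       # slope step
    return y

paths = np.array([sim_llt(100) for _ in range(2000)])
# The accumulating slope dominates: Var(y_t) grows roughly like t^3 * sd_slope^2 / 3,
# which is why the three paths in the figure drift so far apart.
print(paths[:, -1].std())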
Modeling seasonality
- In the “classroom regression model”
  - We used a dummy variable for each “season.”
  - Left one season out (set its coefficient to zero).
- In state space models
    γt = −Σ_{s=1}^{S−1} γt−s + ηt−1
Example
Modeling the air passengers data
library(bsts)
data(AirPassengers)
y <- log10(AirPassengers)
ss <- AddLocalLinearTrend(
list(), ## No previous state specification.
y) ## Peek at the data to specify default priors.
ss <- AddSeasonal(
ss, ## Adding state to ss.
y, ## Peeking at the data.
nseasons = 12) ## 12 "seasons"
model <- bsts(y, state.specification = ss, niter = 1000)
plot(model)
plot(model, "help")
plot(model, "comp") ## "components"
plot(model, "resid") ## "residuals"
[Figure: output of plot(model) for the airline data.]
plot(model)
- “Fuzzy line” shows the posterior distribution of the state at time t.
- Blue dots are actual observations.
[Figure: posterior distributions of the trend and seasonal.12.1 components, shown on a common scale (top row) and on each component's own scale (bottom row).]
[Figure: twelve panels showing each month's seasonal effect over 1950–1960.]
Setting priors
AddLocalLinearTrend(
state.specification = NULL,
y,
level.sigma.prior = NULL, # SdPrior
slope.sigma.prior = NULL, # SdPrior
initial.level.prior = NULL, # NormalPrior
initial.slope.prior = NULL, # NormalPrior
sdy,
initial.y)
SdPrior(sigma.guess,
sample.size = .01,
initial.value = sigma.guess,
fixed = FALSE,
upper.limit = Inf)
Prediction
### Predict the next 24 periods.
pred <- predict(model, horizon = 24)
[Figure: the observed series followed by the 24-month posterior predictive distribution.]
MCMC and the Kalman filter
[Figure: graphical model for a state space model. The states αt−2, αt−1, αt form a Markov chain; each observation yt depends only on its own state αt.]
  yt = Ztᵀ αt + εt,   εt ∼ N(0, Ht)
  αt+1 = Tt αt + Rt ηt,   ηt ∼ N(0, Qt)
Simulation smoother
Durbin and Koopman (2002) thought of a clever way to simulate from p(α|y):
1. Simulate data with the wrong mean, but the right variance.
2. Subtract off the wrong mean, and put in the right one.
Durbin and Koopman (2012, Section 4.6.2) give a “fast state smoother” that
can quickly compute E(αt|y) (without computing each Pt).
- The DK simulation smoother requires two Kalman filters (for y and ỹ) and two “fast state smoothers.”
- It works even if Rt is not full rank.
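The two-step recipe can be sketched concretely for the scalar local level model. This Python sketch is illustrative (the talk's software is R/bsts; all parameter values and the seed are arbitrary), and it uses an ordinary Kalman filter plus fixed-interval smoother for E(α|y) rather than the fast state smoother:

```python
import numpy as np

rng = np.random.default_rng(8)
n, r, q, P1 = 100, 1.0, 0.3, 10.0   # obs var H, state var Q, initial state var

def smoothed_mean(y):
    """E(alpha | y) for the local level model: Kalman filter + backward smoother."""
    a = np.empty(n); P = np.empty(n)
    ap, Pp = 0.0, P1                 # one-step-ahead mean and variance
    for t in range(n):
        K = Pp / (Pp + r)
        a[t] = ap + K * (y[t] - ap)  # filtered mean
        P[t] = (1 - K) * Pp          # filtered variance
        ap, Pp = a[t], P[t] + q
    ahat = np.empty(n); ahat[-1] = a[-1]
    for t in range(n - 2, -1, -1):   # backward (RTS) recursion
        J = P[t] / (P[t] + q)
        ahat[t] = a[t] + J * (ahat[t + 1] - a[t])
    return ahat

def simulate_model():
    """Draw (alpha+, y+) from the model itself."""
    steps = np.r_[rng.normal(0, np.sqrt(P1)), rng.normal(0, np.sqrt(q), n - 1)]
    alpha = np.cumsum(steps)
    return alpha, alpha + rng.normal(0, np.sqrt(r), n)

alpha, y = simulate_model()          # pretend y is the observed data
ahat = smoothed_mean(y)

# DK draw: simulate with the wrong mean but the right variance, then swap means.
alpha_plus, y_plus = simulate_model()
draw = ahat + (alpha_plus - smoothed_mean(y_plus))
print(draw[:3])
```

Because α⁺ − E(α|y⁺) has the right posterior variance and mean zero, adding it to E(α|y) yields a draw from p(α|y).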
Break time!
Linear regression
Posterior distribution
With the conjugate prior β | σ² ∼ N(b, σ²Ω), the posterior is
  β | σ², y ∼ N(β̃, σ²V),
where
  β̃ = V(Xᵀy + Ω⁻¹b)
  V⁻¹ = XᵀX + Ω⁻¹
is the sum of the information in the prior (Ω⁻¹) and the data (XᵀX).
The marginal likelihood follows by solving Bayes' rule for p(y):
  p(y) = p(y|β, σ²) p(β|σ²) p(σ⁻²) / [ p(β|σ², y) p(σ⁻²|y) ].
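The posterior-mean formula is a two-line computation. A Python sketch with simulated data (dimensions, prior precision, and seed are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

b = np.zeros(p)                  # prior mean
Omega_inv = 0.01 * np.eye(p)     # prior information (deliberately weak)

V_inv = X.T @ X + Omega_inv      # posterior information = data info + prior info
beta_tilde = np.linalg.solve(V_inv, X.T @ y + Omega_inv @ b)
print(beta_tilde)
```

With a weak prior the posterior mean lands essentially on the least squares estimate; a stronger Omega_inv shrinks it toward b.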
Sparse modeling
- If there are many predictors, one could expect many of them to have zero coefficients.
- Spike and slab priors set some coefficients to zero with positive probability.
[Figure: two examples of prior, likelihood, and posterior densities for a coefficient β under a spike-and-slab prior.]
- This argument absurdly overstates the case (because not all predictors are exchangeable), but any algorithm that claims to find “the right” model with this many candidates should be viewed with suspicion.
- Some variables are obviously helpful.
- Some are obviously garbage.
- With some you're not sure. This is where the win comes from.
Bayesian regression and spike-and-slab priors
Let γj = 1 if βj is included in the model and γj = 0 otherwise, e.g.
  γ = (1, 0, 0, 1, · · · , 1, 0, 0).
Now factor the prior distribution p(β, γ, σ⁻²) = p(βγ | γ, σ²) p(σ⁻²) p(γ).

A useful parameterization
Notation:
- bγ means the elements of b where γ = 1.
- Ωγ⁻¹ means the rows and columns of Ω⁻¹ where γ = 1.

  γ ∼ Π_j πj^{γj} (1 − πj)^{1−γj}          (“spike”)
  βγ | γ, σ² ∼ N(bγ, σ²(Ωγ⁻¹)⁻¹)           (“slab”)
  1/σ² ∼ Γ(df/2, ss/2)

This prior is conditionally conjugate given γ.
Prior elicitation
(A static regression fits the state space form with αt = 1, Zt = βᵀxt, Tt = 1, Rt = 0.)
Outline
Applications
Nowcasting with Google Trends
Causal Impact
Extensions
Nowcasting
Maintaining “real time” estimates of infrequently observed time series.
- Example: weekly initial claims for unemployment (ICNSA).
- A leading indicator of recessions.
- Can we estimate the current value before it is officially released?
[Figure: ICNSA weekly initial claims, in thousands.]
Predictor                 Inclusion Probability
unemployment.office       1.00
filing.for.unemployment   0.94
idaho.unemployment        0.47
sirius.internet.radio     0.14
sirius.internet           0.11

[Figure: scaled initial claims with the predictor series overlaid; remaining lines shaded by inclusion probability.]
plot(model, "components")
[Figure: posterior distributions of the trend, seasonal, and regression components of the nowcasting model, 2004–2012.]
Did it help?
CompareBstsModels(list("pure time series" = model1,
"with Google Trends" = model2))
[Figure: cumulative absolute one-step-ahead prediction error for the pure time series model vs. the model with Google Trends predictors (scaled values). The Trends model's advantage shows up mainly during the recession; in normal times the two perform similarly.]
Outline
Applications
Nowcasting with Google Trends
Causal Impact
Extensions
Example
Real Google advertiser. 6-week ad campaign. Random shift added to both axes.
[Figure: daily clicks over the campaign window.]
Problem statement
- Synthetic controls: a more realistic counterfactual model than difference-in-differences (DnD).
- CausalImpact: extends DnD and synthetic controls using BSTS.
[Figure: schematic of the pre-intervention and post-intervention periods.]
Potential outcomes
- Let yjst denote the value of Y for unit j under treatment s at time t. T is the time of the market intervention.
- What we observe:
  - Before T we observe yj0t for everyone.
  - After T we observe yj1t for the actor and yk0t for the potential controls k ≠ j.
- If we could also observe yj0t for the actor, then yj1t − yj0t would be the treatment effect.
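The potential-outcomes logic can be sketched with a toy counterfactual forecast. This Python sketch is mine, not the CausalImpact algorithm: it replaces the BSTS model with a plain pre-period regression on a single control series, and all numbers (series lengths, the true effect of 3.0) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
T0, T1 = 80, 20                        # pre- and post-intervention lengths
control = rng.normal(100, 5, T0 + T1)                  # an untreated control series
actor = 0.8 * control + rng.normal(0, 1, T0 + T1)      # actor's untreated outcome y_j0t
true_effect = 3.0
actor[T0:] += true_effect              # treatment turns on at time T = T0

# Fit the pre-period relationship, forecast the counterfactual y_j0t after T,
# and difference it from what was actually observed (y_j1t).
slope, intercept = np.polyfit(control[:T0], actor[:T0], 1)
counterfactual = intercept + slope * control[T0:]
est = (actor[T0:] - counterfactual).mean()
print(round(float(est), 2))
```

CausalImpact follows the same template but builds the counterfactual from a BSTS model with spike-and-slab selection over many candidate controls, which also yields posterior uncertainty for the effect.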
Case study
A Google advertiser ran a marketing experiment.
[Figure: daily clicks in the treatment region, colored by period: before, during, and after the campaign.]
Case study
Google advertiser. Treated vs. Untreated regions
[Figure: weekly comparison, weeks −4 through 7.]
Case study
Google advertiser. Competitor’s clicks as predictors
[Figure: weekly comparison using competitors' clicks as predictors, weeks −4 through 7.]
Case study
Google advertiser. Untreated regions. Competitor’s sales as predictors
[Figure: weekly comparison in untreated regions, using competitors' sales as predictors, weeks −4 through 7.]
Case study
Summary
Outline
Applications
Extensions
Normal mixtures
Longer term forecasting
Retail Sales
(Excluding Food Service)
[Figure: monthly retail sales, plotted as RSXFS / 1000.]
  yt = μt + εt,                    εt ∼ N(0, σ²)
  μt = μt−1 + δt−1 + ημ,t−1,       ημ,t ∼ Tν(0, τμ²)
  δt = δt−1 + ηδ,t−1,              ηδ,t ∼ Tν(0, τδ²)
[Figure: posterior distributions of the level and slope innovation standard deviations under the Student and Gaussian models; the Student posteriors concentrate on smaller values.]
I Because the model is aware that occasional large errors can occur, the
“standard deviation” parameters can be smaller.
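The mechanism is just the heavy tails of the t distribution: with the same scale parameter, a Tν innovation produces far more large shocks than a Gaussian one, so the scale itself need not inflate to explain occasional level shifts. A quick Python illustration (ν = 4, τ = 1, and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n, nu, tau = 5000, 4, 1.0
gauss = rng.normal(0, tau, n)
student = tau * rng.standard_t(nu, n)  # same scale parameter, heavier tails

# Count shocks larger than 4 scale units under each distribution. The t model
# generates them routinely; the Gaussian almost never does, so a Gaussian model
# must inflate its sigma to accommodate rare large errors.
print((np.abs(gauss) > 4).mean(), (np.abs(student) > 4).mean())
```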
Impact on predictions
[Figure: the original data and long-horizon prediction intervals under the Gaussian and Student models, 2009–2014.]
- The extreme quantiles of the predictions under the Student model are wider than under the Gaussian model.
- The central (e.g. 95%, 90%) intervals are narrower.
Similar tricks can be used to model probit, logit, and Poisson responses,
and even dynamic support vector machines by expressing these
distributions as normal mixtures.
Longer term forecasting
  yt = μt + εt,                    εt ∼ N(0, σ²)
  μt = μt−1 + δt−1 + ημ,t−1,       ημ,t ∼ N(0, τμ²)
  δt = D + ρ(δt−1 − D) + ηδ,t,     ηδ,t ∼ N(0, τδ²)
- Now the slopes δt follow an AR(1) process instead of a random walk.
- If |ρ| < 1 then the AR(1) is stationary, so it does not entirely forget the past.
- D is the “long run” slope of the series.
- The slope can locally deviate from D, but it will eventually return.
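Mean reversion of the slope is easy to verify by simulation. A Python sketch (D, ρ, and the innovation scale are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(7)
n, D, rho, sd_slope = 20000, 0.5, 0.9, 0.1

delta = np.empty(n)
delta[0] = D
for t in range(1, n):
    delta[t] = D + rho * (delta[t - 1] - D) + rng.normal(0, sd_slope)

# Stationary AR(1): long-run mean D, long-run variance sd_slope^2 / (1 - rho^2).
print(delta.mean(), delta.var())
```

Because the slope keeps returning to D, long-horizon forecasts grow at roughly D per period instead of inheriting whatever slope the series happened to end on, and the forecast intervals fan out more slowly than under the random-walk slope.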
[Figure: the original data and long-horizon predictions under the random-walk slope and the AR(1) slope, 2009–2014.]