
Bayesian Structural Time Series Models

Steven L. Scott

August 10, 2015


Welcome!

The goal for the day is to introduce you to:


I basic ideas in structural time series modeling,
I regression modeling with spike and slab priors, and
I the bsts R package.
Course notes and materials at https://goo.gl/VUWUC9



Some good books
For structural time series, and time series in general.

Harvey; Durbin and Koopman; West and Harrison;
Chatfield; Brockwell and Davis; Petris et al.


Introduction to time series modeling

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions



Introduction to time series modeling

Strategies for time series models

I Regression
I ARMA
I Smoothing
I Structural time series



Introduction to time series modeling

Regression models

Introductory statistics courses teach students to fit models like

    y_t = β_0 + β_1 t + β_2 x_t + ε_t        (the β_1 t term is the linear time trend)

1. The trend probably won’t follow a parametric form.


2. Even if it does, the residuals will be autocorrelated.



Introduction to time series modeling

Airline passengers
An example from elementary textbooks

[Figure: two panels, AirPassengers on the raw scale and log10(AirPassengers), both plotted against Time, 1950-1958.]


Introduction to time series modeling

Linear time trend doesn’t quite fit


See air-passengers-bsts.R

air <- log10(AirPassengers)
time <- 1:length(air)
months <- time %% 12
months[months == 0] <- 12
months <- factor(months, label = month.name)
reg <- lm(air ~ time + months)

[Figure: residuals vs. fitted values from this regression; the residuals range from about -0.06 to 0.06.]


Quadratic time trend
Misses serial correlation

reg <- lm(air ~ poly(time, 2) + months)
plot(reg$residuals)
acf(reg$residuals)

[Figure: residuals plotted against time, and the ACF of the residuals.]

I Predictions between months 80 - 100 predictably too low.
I Between months 100 - 120 predictably too high.

Serial correlation is cured by locality.
Introduction to time series modeling

ARMA models

ARMA(P,Q) models have the form

    y_t = Σ_{p=1}^{P} φ_p y_{t-p} + Σ_{q=0}^{Q} θ_q ε_{t-q}.

Some features that make ARMA models difficult:


1. yt must be stationary. If non-stationary then take differences until it
becomes stationary.
2. If yt contains a seasonal component, then seasonal differencing is also
required.
3. Harder to think about. (Regression of y on x vs of ∆52 ∆2 y on x).
ARMA models can be written as a special case of state space models.



Introduction to time series modeling

Stationary vs Nonstationary
See code in stationary.R

sample.size <- 1000


number.of.series <- 1000
many.ar1 <- matrix(nrow = sample.size, ncol =number.of.series)
for (i in 1:number.of.series) {
many.ar1[, i] <- arima.sim(model = list(ar = .95),
n = sample.size)
}
many.random.walk <- matrix(nrow = sample.size,
ncol = number.of.series)
for (i in 1:number.of.series) {
many.random.walk[, i] <- cumsum(rnorm(sample.size))
}
par(mfrow = c(1, 2))
plot.ts(many.ar1, plot.type = "single")
plot.ts(many.random.walk, plot.type = "single")



Introduction to time series modeling

What it looks like


Single series

[Figure: a single simulated series of length 1000 from each model.]

AR1:  y_t = 0.95 y_{t-1} + ε_t          Random walk:  y_t = y_{t-1} + ε_t


Introduction to time series modeling

What it looks like


Many series

[Figure: all 1000 simulated series overlaid, for each model.]

AR1:  y_t = 0.95 y_{t-1} + ε_t          Random walk:  y_t = y_{t-1} + ε_t


Introduction to time series modeling

Variance
AR1

    y_t = φ y_{t-1} + ε_t
        = φ(φ y_{t-2} + ε_{t-1}) + ε_t
        = ...
        = Σ_{i=0}^{t} φ^i ε_{t-i}.

If |φ| < 1 then as t → ∞, Var(y_t) → Var(ε_t)/(1 − φ²).

Random walk

    y_t = Σ_{i=0}^{t} ε_i
    Var(y_t) = σ² t

Variance diverges to ∞.
Introduction to time series modeling

Smoothing
Exponential smoothing
st = αyt + (1 − α)st−1
turns out to be the Kalman filter for the “local level” model.
Holt-Winters or “double exponential smoothing” captures a trend.

st = αyt + (1 − α)(st−1 + bt−1 )


bt = β(st − st−1 ) + (1 − β)bt−1

This is the Kalman filter for the “local linear trend” model.
“Triple” exponential smoothing can handle seasonality as well, but
the formulas are getting ridiculous!

I What happens if you want to include a regression component?


I Confidence about the “smoothed” estimate?
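As a quick illustration (not part of the course code), base R's HoltWinters() implements these smoothing recursions; a minimal sketch fitting all three variants to the airline data:

air <- log10(AirPassengers)
hw.level <- HoltWinters(air, beta = FALSE, gamma = FALSE)  # simple exponential smoothing
hw.trend <- HoltWinters(air, gamma = FALSE)                # "double" smoothing: adds a trend
hw.full  <- HoltWinters(air)                               # "triple" smoothing: adds seasonality
plot(hw.full)   # fitted values overlaid on the data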



Introduction to time series modeling

Advantages of structural time series models

I All the flexibility of regression models.

I The locality of ARMA models and smoothing.

I Can handle non-stationarity.

I Modular, so easy to combine with other additive components.
I All those “smoothing parameters” become variances that can be
  estimated from data.



Structural time series models

Outline

Introduction to time series modeling

Structural time series models


Models for trend
Modeling seasonality

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions



Structural time series models

Structural time series models


State space form

There are two pieces to a structural time series model


Observation equation

    y_t = Z_t'α_t + ε_t,        ε_t ∼ N(0, H_t)

I y_t is the observed data at time t.
I α_t is a vector of latent variables (the “state”).
I Z_t and H_t are structural parameters (partly known).

Transition equation

    α_{t+1} = T_t α_t + R_t η_t,        η_t ∼ N(0, Q_t)

I T_t, R_t, and Q_t are structural parameters (partly known).
I η_t may be of lower dimension than α_t.


Structural time series models

Structural time series models are modular


Add your favorite trend, seasonal, regression, holiday, etc. models to the mix

[Table: each state component (trend, seasonal, regression) contributes its own block to the state vector and to Z_t and T_t.]


Structural time series models

Example:
The “basic structural model” with a regression effect and S seasons can be
written

    y_t = µ_t + γ_t + β'x_t + ε_t        (trend + seasonal + regression)
    µ_t = µ_{t-1} + δ_{t-1} + u_t
    δ_t = δ_{t-1} + v_t
    γ_t = − Σ_{s=1}^{S-1} γ_{t-s} + w_t

I Local linear trend: “level” µ_t + “slope” δ_t.
I Seasonal: S − 1 dummy variables with time-varying coefficients.
  Sums to zero in expectation.


Structural time series models Models for trend

Some models for trend

I Local level
I Local linear trend
I Generalized local linear trend
I Autoregressive models



Structural time series models Models for trend

Understanding the local level model


I The local level model is

    y_t = µ_t + ε_t,              ε_t ∼ N(0, σ²)
    µ_t = µ_{t-1} + η_{t-1},      η_t ∼ N(0, τ²)

I A compromise between the random walk model (when σ² = 0) and
  the constant mean model (when τ² = 0).
I In the random walk model, your forecast of the future (given data to
  time t) is y_t.
I In the constant mean model, your forecast is ȳ.
I The larger the ratio σ²/τ², the closer this model is to the “constant
  mean model”.
I In “state space form”
    T_t = 1, Z_t = 1, R_t = 1, H_t = σ², Q_t = τ².
Structural time series models Models for trend

Simulating the local level model


local-level.R
[Figure: three simulated local level series of length 100: tau = 1, sigma = 0 (a pure random walk); tau = 0, sigma = 1 (a constant mean plus noise); and tau = 1, sigma = 0.5.]
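A minimal sketch of the kind of simulation local-level.R performs (assumed, not the file’s actual contents):

set.seed(8675309)
SimulateLocalLevel <- function(n, tau, sigma) {
  mu <- cumsum(rnorm(n, 0, tau))   # state mu_t: a random walk with innovation sd tau
  mu + rnorm(n, 0, sigma)          # observation y_t: state plus noise with sd sigma
}
par(mfrow = c(3, 1))
plot.ts(SimulateLocalLevel(100, tau = 1, sigma = 0))    # pure random walk
plot.ts(SimulateLocalLevel(100, tau = 0, sigma = 1))    # constant mean plus noise
plot.ts(SimulateLocalLevel(100, tau = 1, sigma = 0.5))  # the compromise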
Structural time series models Models for trend

Local linear trend


local-linear-trend.R

I The model is

    y_t = µ_t + ε_t,                        ε_t ∼ N(0, σ²)
    µ_t = µ_{t-1} + δ_{t-1} + η_{µ,t-1},    η_{µ,t} ∼ N(0, τ_µ²)
    δ_t = δ_{t-1} + η_{δ,t-1},              η_{δ,t} ∼ N(0, τ_δ²)

I We normally think of a “linear trend” as y = β_0 + β_1 t + ε_t.
I With change ∆t, the expected increase in y is β_1 ∆t.
I Now each ∆t = 1, and β_1 = δ_t is a changing slope.
I Neat fact! The posterior mean of the local linear trend model is a
  smoothing spline.
Simulating local linear trend
Three simulations with σ_level = 1, σ_slope = 0.25, σ_obs = 0.5

[Figure: three simulated local linear trend series of length 100.]
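A corresponding sketch for the local linear trend (again assumed; local-linear-trend.R has the actual code):

set.seed(12345)
n <- 100
slope <- cumsum(rnorm(n, 0, 0.25))        # delta_t: the slope is itself a random walk
level <- cumsum(slope + rnorm(n, 0, 1))   # mu_t: accumulates the slope plus level noise
y <- level + rnorm(n, 0, 0.5)             # y_t: observation noise on top
plot.ts(y)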
Structural time series models Modeling seasonality

Modeling seasonality
I In the “classroom regression model”
I We used a dummy variable for each “season.”
I Left one season out (set its coefficient to zero).
I In state space models
    γ_t = − Σ_{s=1}^{S-1} γ_{t-s} + η_{t-1}

I γ_summer = −(γ_spring + γ_winter + γ_fall) + η_{t-1}
I Mean over the year is zero.
I State is S − 1 dimensional.
I Only one dimension of randomness.

For S = 4 seasons (state dimension 3):

    Z_t = (1, 0, 0)'      T_t = [ −1 −1 −1 ;  1 0 0 ;  0 1 0 ]      R_t = (1, 0, 0)'
Structural time series models Modeling seasonality

Example
Modeling the air passengers data

data(AirPassengers)
y <- log10(AirPassengers)
ss <- AddLocalLinearTrend(
list(), ## No previous state specification.
y) ## Peek at the data to specify default priors.
ss <- AddSeasonal(
ss, ## Adding state to ss.
y, ## Peeking at the data.
nseasons = 12) ## 12 "seasons"
model <- bsts(y, state.specification = ss, niter = 1000)
plot(model)
plot(model, "help")
plot(model, "comp") ## "components"
plot(model, "resid") ## "residuals"



Structural time series models Modeling seasonality

Posterior distribution of state

[Figure: pointwise posterior distribution of the state plotted over 1950-1960, with the observed values as dots.]

plot(model)
I “Fuzzy line” shows posterior distribution of state at time t.
I Blue dots are actual observations.



Structural time series models Modeling seasonality

Contributions from each component

[Figure: contributions of the trend and seasonal.12.1 components to the state mean, plotted on a common scale.]

plot(model, "comp") ## "components"



Structural time series models Modeling seasonality

Contributions from each component

[Figure: the same component contributions, with each panel drawn on its own scale.]

plot(model, "comp", same.scale = FALSE) ## "components"



Evolution of the seasonal component
[Figure: twelve panels, Season 1 through Season 12, each showing the posterior distribution of that season’s effect over 1950-1958.]


Structural time series models Modeling seasonality

Setting priors

AddLocalLinearTrend(
state.specification = NULL,
y,
level.sigma.prior = NULL, # SdPrior
slope.sigma.prior = NULL, # SdPrior
initial.level.prior = NULL, # NormalPrior
initial.slope.prior = NULL, # NormalPrior
sdy,
initial.y)



Structural time series models Modeling seasonality

Priors on standard deviations

SdPrior(sigma.guess,
sample.size = .01,
initial.value = sigma.guess,
fixed = FALSE,
upper.limit = Inf)

I This puts a gamma prior on 1/σ 2 .


I Shape (α) = sigma.guess2 × sample.size/2
I Scale (β) = sample.size/2
I If you specify an upper limit on σ then the support will be truncated.
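For example, to express a strong belief that the trend innovations are small, the prior objects can be passed straight into the state specification (a sketch; SdPrior is provided by the Boom package, on which bsts depends):

ss <- AddLocalLinearTrend(
  list(), y,    # y is the series being modeled
  level.sigma.prior = SdPrior(sigma.guess = 0.01, sample.size = 32),
  slope.sigma.prior = SdPrior(sigma.guess = 0.001, sample.size = 32,
                              upper.limit = 0.01))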



Structural time series models Modeling seasonality

What’s in the model object


Varies depending on how the function was called.
> names(model)
[1] "sigma.obs" "sigma.trend.level"
[3] "sigma.trend.slope" "sigma.seasonal.12"
[5] "final.state" "state.contributions"
[7] "one.step.prediction.errors" "has.regression"
[9] "state.specification" "original.series"

I MCMC draws of model parameters (each one is named).


I Draws of the “final” state vector (used for forecasting).
I Draws of each component’s contributions to the state mean.
I Draws of the one-step-ahead prediction errors (from the Kalman filter).
I A logical value indicating whether the model has a (static) regression
component.
I The state specification you used to call the model.
I A copy of the original data series.
Structural time series models Modeling seasonality

Prediction
### Predict the next 24 periods.
pred <- predict(model, horizon = 24)

### Plot prediction along with last 36 observations


### from training series.
plot(pred, plot.original = 36)
[Figure: the last 36 observations of the training series followed by the 24-period forecast, 1958-1963.]
MCMC and the Kalman filter

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions



MCMC and the Kalman filter

Gibbs sampling for state space models


1. Simulate α ∼ p(α|θ, y) using the Kalman filter and simulation
smoother.
2. Simulate θ ∼ p(θ|α, y).
3. Goto 1.
Simulating p(θ|α, y) is done on a model-by-model basis, but for most models it
is trivial.
I For models with only variance parameters, compute the right “sum of
squared errors” and draw the variances.
I Example: local level model:

    p(1/σ_α² | α) ∝ p(1/σ_α²) σ_α^{-T} exp( − Σ_t (α_t − α_{t-1})² / (2σ_α²) )

  If the prior is p(1/σ_α²) = Ga(df/2, ss/2), then

    p(1/σ_α² | α) = Ga( (df + T)/2 ,  (ss + Σ_t (α_t − α_{t-1})²)/2 ).
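In code, this full conditional amounts to a single rgamma() call; a sketch with hypothetical names:

## Draw sigma.alpha given the sampled state alpha, under a Ga(df/2, ss/2)
## prior on 1/sigma.alpha^2.
DrawSigmaAlpha <- function(alpha, df = 1, ss = 0.1) {
  sse <- sum(diff(alpha)^2)                         # sum_t (alpha_t - alpha_{t-1})^2
  precision <- rgamma(1, shape = (df + length(alpha)) / 2,
                      rate = (ss + sse) / 2)
  sqrt(1 / precision)
}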
MCMC and the Kalman filter

The Kalman filter

[Graph: α_{t-2} → α_{t-1} → α_t, with each α_s emitting the corresponding y_s.]

I The graph shows the conditional independence relationships among


the latent and observed variables in the model.



MCMC and the Kalman filter

The Kalman filter

[Graph: α_{t-2} → α_{t-1} → α_t, with each α_s emitting the corresponding y_s.]

I At time t − 1 we start off knowing the mean and variance of αt−1


given y1 , . . . , yt−2 . (recursion)



MCMC and the Kalman filter

The Kalman filter

[Graph: α_{t-2} → α_{t-1} → α_t, with each α_s emitting the corresponding y_s.]

I At time t − 1 we start off knowing the mean and variance of αt−1


given y1 , . . . , yt−2 . (recursion)
I Then we observe yt−1 .



MCMC and the Kalman filter

The Kalman filter


[Graph: α_{t-2} → α_{t-1} → α_t, with each α_s emitting the corresponding y_s.]

I At time t − 1 we start off knowing the mean and variance of αt−1


given y1 , . . . , yt−2 . (recursion)
I Then we observe yt−1 .
I The Kalman filter computes p(αt |y1 , . . . , yt−1 ),
and the incremental likelihood: p(yt−1 |y1 , . . . , yt−2 ).
MCMC and the Kalman filter

The Kalman equations


Recall the state space form of the model is

    y_t = Z_t'α_t + ε_t,            ε_t ∼ N(0, H_t)
    α_{t+1} = T_t α_t + R_t η_t,    η_t ∼ N(0, Q_t)

The Kalman filter recursively computes p(α_{t+1} | y_{1,...,t}) = N(a_{t+1}, P_{t+1}).

    v_t = y_t − Z_t'a_t                          (1-step prediction error)
    F_t = Z_t'P_t Z_t + H_t                      (forecast variance)
    K_t = T_t P_t Z_t F_t⁻¹                      (Kalman gain . . .
    a_{t+1} = T_t a_t + K_t v_t                  . . . is a regression coefficient)
    P_{t+1} = T_t P_t (T_t − K_t Z_t')' + R_t Q_t R_t'

The derivation is tedious, but elementary.
You can use Bayes’ rule, or properties of the multivariate normal.
See [Durbin and Koopman(2012)] or many other sources.
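For the scalar local level model (Z_t = T_t = R_t = 1, H_t = σ², Q_t = τ²) these equations collapse to a few lines; a sketch of one filtering step (a hypothetical helper, not the bsts internals):

KalmanStepLocalLevel <- function(y, a, P, sigma2, tau2) {
  v <- y - a                   # 1-step prediction error
  F <- P + sigma2              # forecast variance
  K <- P / F                   # Kalman gain
  list(a = a + K * v,          # E(alpha_{t+1} | y_{1:t})
       P = P * (1 - K) + tau2, # Var(alpha_{t+1} | y_{1:t})
       loglik = dnorm(y, mean = a, sd = sqrt(F), log = TRUE))  # incremental likelihood
}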
MCMC and the Kalman filter

Forward and backward

I The Kalman filter marches forward through the data, collecting


information.

I There are corresponding algorithms that march backward through the


data, distributing information.
I Kalman smoother (useful for EM algorithm) computes p(αt |y).
I Simulation smoother
[Carter and Kohn(1994), Frühwirth-Schnatter(1995),
de Jong and Shepard(1995), Durbin and Koopman(2002)]

I The output of the Kalman filter + simulation smoother is an exact


draw from p(α|y, θ).



MCMC and the Kalman filter

Simulation smoother
[Durbin and Koopman(2002)] thought of a clever way to simulate p(α|y).
1. Simulate data with the wrong mean, but the right variance.
2. Subtract off the wrong mean, and put in the right one.

The argument goes like this:


1. For multivariate normal (α, y), Var (α|y) is independent of y.
2. Simulate fake data (α, ỹ) from a structural time series model. The
conditional distribution (α|ỹ) has the same variance as (α|y).
3. Subtract E (α|ỹ) from your simulated α’s, and add E (α|y).

[Durbin and Koopman(2012)] (Section 4.6.2) have a “fast state smoother” that
can quickly compute E (αt |y) (without computing each Pt ).
I The DK simulation smoother requires two Kalman filters (for y and ỹ) and
two “fast state smoothers.”
I Works even if Rt is not full rank.
MCMC and the Kalman filter

Break time!

Let’s take 15 minutes.



Bayesian regression and spike-and-slab priors

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions



Bayesian regression and spike-and-slab priors

Linear regression

I “Bayesian regression” is just the ordinary linear model

    y_{n×1} ∼ N( X_{n×k} β_{k×1}, σ² I_{n×n} )

  with a prior on β and σ.

I A convenient prior distribution is p(β, σ²) = p(β|σ²) p(σ⁻²), where

    β | σ² ∼ N(b, σ²Ω)        1/σ² ∼ Γ(df/2, ss/2).

I This prior is conjugate to the regression likelihood (i.e. prior and
  posterior are from the same model family).


Bayesian regression and spike-and-slab priors

Posterior distribution

I Write (prior) × (likelihood), do some algebra, and you get

    β | σ², y ∼ N(β̃, σ²V)        1/σ² | y ∼ Γ(DF/2, SS/2)

  where

    V⁻¹ = X'X + Ω⁻¹              β̃ = V(X'y + Ω⁻¹b)
    DF = df + n                  SS = ss + y'y + b'Ω⁻¹b − β̃'V⁻¹β̃
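These formulas are a few lines of R (a sketch; X, y, and the prior quantities b, Omega.inv, df, ss are assumed inputs):

PosteriorRegression <- function(X, y, b, Omega.inv, df, ss) {
  V.inv <- crossprod(X) + Omega.inv                             # X'X + Omega^{-1}
  beta.tilde <- solve(V.inv, crossprod(X, y) + Omega.inv %*% b)
  DF <- df + length(y)
  SS <- ss + sum(y^2) + t(b) %*% Omega.inv %*% b -
    t(beta.tilde) %*% V.inv %*% beta.tilde
  list(beta.tilde = drop(beta.tilde), V.inv = V.inv,
       DF = DF, SS = drop(SS))
}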


Bayesian regression and spike-and-slab priors

Some useful facts about the posterior distribution

I The posterior mean

β̃ = V (XT y + Ω−1 b)

is the information-weighted average of the OLS estimate and the prior


mean. (XT y = XT Xβ̂)

I The (scaled) posterior information

V −1 = XT X + Ω−1

is the sum of the information in the prior (Ω−1 ) and data (XT X).

I If Ω⁻¹ is positive definite then so is V⁻¹ (and thus V). Saves you
  from perfect collinearity, k > n, etc.



Bayesian regression and spike-and-slab priors

Using default values makes prior specification easier

b = 0             (Helpful to cheat a tiny bit and set b_0 = ȳ.)

Ω⁻¹ = κ X'X/n     X'X/n is the average information in a single
                  observation. κ is the “number of prior observations”
                  worth of weight given to the prior.

E(σ²) ≈ ss/df     df is the weight (number of prior observations) given to
                  your guess at σ².

I Now “specifying the prior” means supplying 3 numbers: κ, df, and
  your guess at σ².
I If you don’t want to guess at σ², peek at the sample variance of y,
  and guess at R², where σ² = (1 − R²) × (sample variance).

Some useful default values: κ = 1, df = 1, R² = 0.5.


Bayesian regression and spike-and-slab priors

The marginal distribution of the data.

Because regression models are Gaussian, we can do some of the hard
integrals we can’t do in other models.

    p(β, σ⁻² | y) = p(y | β, σ²) p(β | σ²) p(σ⁻²) / p(y)
                  = p(β | σ², y) p(σ⁻² | y)

Solve for

    p(y) = p(y | β, σ²) p(β | σ²) p(σ⁻²) / [ p(β | σ², y) p(σ⁻² | y) ].


Bayesian regression and spike-and-slab priors

Sparse modeling

I If there are many predictors, one could expect many of them to have
zero coefficients.

I Machine learning people like to use “penalized log likelihood.”


I Lasso, elastic net, Dantzig selector, etc.
I Penalties to log likelihood can be interpreted as log prior distributions.
I These induce sparsity at the mode, but not in the distribution
(zero probability mass at zero).

I Spike and slab priors set some coefficients to zero with positive
probability.



The “lasso” (and related priors) are not sparse
They induce sparsity at the mode, but not in the posterior distribution

    p(β) ∝ exp( − Σ_j |β_j| )

[Figure: two panels showing the prior, the likelihood, and the resulting posterior for a single coefficient, under a weak likelihood (left) and a stronger likelihood (right).]


Bayesian regression and spike-and-slab priors

Why is this important?


I Penalized methods make a single decision about which variables are
included / excluded.
I With 100 predictors there are 2^100 models, which is about 10^30.
I Avogadro’s number is 6 × 10^23, so if each model was an atom, the
  space of models would have 1.66 million moles of mass.
I A mole of carbon is (by definition) 12g, so that’s about 20,000 kg, or
  22 (US) tons.
I So finding the best model in a space of 100 predictors is analogous to
  finding the best atom in a 22 ton block of carbon.

I This argument absurdly overstates the case (because not all predictors
are exchangeable), but any algorithm that claims to find “the right”
model with this many candidates should be viewed with suspicion.
I Some variables are obviously helpful.
I Some are obviously garbage.
I With some you’re not sure. This is where the win comes from.
Bayesian regression and spike-and-slab priors

Spike and slab priors


[George and McCulloch (1997)]

I We think most elements of β are zero.


I Let γ_j = 1 if β_j ≠ 0 and γ_j = 0 if β_j = 0.

γ = (1, 0, 0, 1, · · · , 1, 0, 0)
I Now factor the prior distribution

p(β, γ, σ −2 ) = p(βγ |γ, σ 2 )p(σ 2 |γ)p(γ)



Bayesian regression and spike-and-slab priors

A useful parameterization
This prior is conditionally conjugate given γ.

Notation
I b_γ means the elements of b where γ = 1.
I Ω⁻¹_γ means the rows and columns of Ω⁻¹ where γ = 1.

    γ ∼ Π_j π_j^{γ_j} (1 − π_j)^{1−γ_j}            “Spike”
    β_γ | γ, σ² ∼ N( b_γ, σ² (Ω⁻¹_γ)⁻¹ )           “Slab”
    1/σ² ∼ Γ( df/2, ss/2 )


Bayesian regression and spike-and-slab priors

Prior elicitation

    π_j = “expected model size” / number of predictors
    b = 0 (vector)
    Ω⁻¹ = κ { α X'X + (1 − α) diag(X'X) } / n
    ss/df = (1 − R²_expected) s_y²
    df = 1

I The Ω−1 expression is κ observations worth of prior information.


I It can help to average Ω−1 with its diagonal.
I Prior elicitation is 4 numbers: expected model size, expected R 2 , beta
weight (κ), and sigma weight (df ).
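In the BoomSpikeSlab package these four numbers map onto arguments of SpikeSlabPrior; a sketch (argument names as I understand that package, so treat them as assumptions):

library(BoomSpikeSlab)
prior <- SpikeSlabPrior(x = X, y = y,                 # design matrix and response
                        expected.model.size = 5,      # sets the pi_j
                        expected.r2 = 0.5,            # guess at R^2
                        prior.df = 1,                 # sigma weight (df)
                        prior.information.weight = 1, # beta weight (kappa)
                        diagonal.shrinkage = 0.5)     # averaging of X'X with its diagonal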



Bayesian regression and spike-and-slab priors

Gibbs sampling for spike and slab regression


For each variable j, draw γ_j | γ_{−j}, y.

    γ | y ∼ C(y) · |Ω⁻¹_γ|^{1/2} p(γ) / ( |V⁻¹_γ|^{1/2} SS_γ^{DF/2 − 1} )

I Each γ_j only assumes the values 0 or 1, so the full conditional of γ_j
  only has 2 values. Compute them both and normalize.
I The symbols in this formula are the same as on the “Posterior
  distribution” slide, but with γ subscripts.
I V_γ is the posterior variance of β_γ.
I SS_γ is a “sum of squares”.
I A |γ| × |γ| matrix needs to be inverted to compute p(γ|y).
  Cheap! (if there are lots of 0’s).
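This sampler is exposed directly through lm.spike in BoomSpikeSlab; a sketch on simulated data (the expected.model.size argument is assumed to be forwarded to the spike-and-slab prior):

library(BoomSpikeSlab)
set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 3 * X[, 1] - 2 * X[, 7] + rnorm(n)        # only two nonzero coefficients
model <- lm.spike(y ~ X, niter = 1000, expected.model.size = 3)
plot(model)      # marginal inclusion probabilities (the default plot)
summary(model)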


Bayesian regression and spike-and-slab priors

A regression component in a structural time series model

The Kalman filter requires matrix-matrix multiplication at each step.


I With T time points and latent state dimension m the complexity is
O(Tm3 ).
I With pain, the exponent on m can be lowered, but it is still important
to keep the dimension down, where possible.

A regression component β T xt can be added to the Kalman filter at the


cost of a single dimension.

    α_t = 1,  Z_t = β'x_t,  T_t = 1,  R_t = 0



Bayesian regression and spike-and-slab priors

MCMC for spike-and-slab bsts

I Time series parameters: θ


I Regression coefficients β.

1. Simulate α ∼ p(α|θ, β, y).
   I Note the conditioning on β.
   I You’re effectively subtracting off the regression component, then fitting
     the “state space model” to the residuals.
2. Set y*_t = y_t − Z_t'α_t
   I Simulate p(θ|α)
   I Simulate β|y*
The components are independent so the simulation could be done in
parallel, but θ is so trivial it usually isn’t worth the effort.



Bayesian regression and spike-and-slab priors

“Orthogonal Data Augmentation”


A neat trick!

I If p(β|σ²) were diagonal and independent of σ², then the only thing
  keeping p(β|σ, y) from being diagonal is X'X.
I What if you happened to have a set of x’s lying around which, if added
  to your data, would make X'X diagonal?
I You’d need to know the y’s that go along with these x’s, so you’d
  have a missing data problem.
Step 1 Find the x’s needed to diagonalize X'X.
Step 2 Repeat the following steps:
  1. Simulate the missing y’s given β and σ².
  2. Simulate β and σ² given complete data.



Bayesian regression and spike-and-slab priors

Pros and cons of ODA

Pro I The γi ’s can be sampled independently, as can βi |γi .


I This can be done in parallel, and is really really fast.
Cons I You have to decompose the whole XT X matrix once at
the beginning of the algorithm to find the necessary x’s.
I Some of the x’s can have high leverage.
I High leverage points determine where the line goes.
I If the missing data determine the line, and the line
determines the missing data then you have slow mixing.
bsts includes support for ODA, but it is still experimental at this point.



Applications

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications
Nowcasting with Google Trends
Causal Impact

Extensions



Applications Nowcasting with Google Trends

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications
Nowcasting with Google Trends
Causal Impact

Extensions



Applications Nowcasting with Google Trends

Nowcasting
Maintaining “real time” estimates of infrequently observed time series.

I US weekly initial claims for unemployment (ICNSA).
I Recession leading indicator.
I Can we learn this week’s number before it is released?
I We’d need a real time signal correlated with the outcome.

[Figure: ICNSA (thousands of claims), weekly, 2004-2012.]


Google searches are a real time indicator of public interest
Applications Nowcasting with Google Trends

Google trends public interface


I Get it from google.com/trends
I Click the little gear to “download as CSV.”
I Data are percentage of all search traffic, normalized so the maximum
is 100.
I You can restrict by type of search, time range, geo, or search category
(“vertical”).

There are ∼600 verticals


I Hierarchical: “Automotive” vertical has “Hybrid and Alternative
Vehicles” subvertical.
I If you compare your search to a “vertical” and then “download as
CSV” then you get the vertical’s series too.
I That’s ∼600 “public interest indices” you can use to predict YOUR
  time series!
Applications Nowcasting with Google Trends

Individual search queries


Google correlate can provide the most highly correlated individual queries (up to 100)



Applications Nowcasting with Google Trends

Posterior inclusion probabilities


With expected model size = 3, and the top 100 predictors from correlate

plot(model, "coef", inc = .1)

[Figure: posterior inclusion probabilities for the top predictors: unemployment.office, filing.for.unemployment, idaho.unemployment, sirius.internet.radio, and sirius.internet.]

I Only showing inclusion probabilities ≥ .1.
I Shading shows Pr(β_j > 0 | y).
  I White: positive coefficients
  I Black: negative coefficients
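A sketch of how such a model can be set up through bsts’s regression interface (the data frame claims, holding the ICNSA series and the Google Correlate predictors, is hypothetical; extra arguments are assumed to be forwarded to the spike-and-slab prior):

ss <- AddLocalLinearTrend(list(), claims$y)
ss <- AddSeasonal(ss, claims$y, nseasons = 52)       # weekly data
model <- bsts(y ~ ., state.specification = ss, data = claims,
              niter = 1000, expected.model.size = 3)
plot(model, "coef", inc = .1)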



Applications Nowcasting with Google Trends

What got chosen?


plot(model, "predictors", inc = .1)

[Figure: the actual series together with the selected predictors (inclusion probabilities: unemployment.office 1.00, filing.for.unemployment 0.94, idaho.unemployment 0.47, sirius.internet.radio 0.14, sirius.internet 0.11), all as scaled values, 2004-2012.]

I Solid blue line: actual
I Remaining lines shaded by inclusion probability.


Applications Nowcasting with Google Trends

How much explaining got done?


Dynamic distribution plot shows evolving pointwise posterior distribution of state
components.

plot(model, "components")

[Figure: dynamic distribution plots of the trend, seasonal.52.1, and regression components, 2004-2012.]


Applications Nowcasting with Google Trends

Did it help?
CompareBstsModels(list("pure time series" = model1,
"with Google Trends" = model2))
[Figure: top panel, cumulative absolute one-step-ahead prediction error for the “pure time series” and “with Google Trends” models; bottom panel, the series in scaled values, 2004-2012.]

I Plot shows cumulative absolute one-step-ahead prediction error.
I The regressors are not very helpful during normal times.
I They help the model to quickly adapt to the recession.


Applications Causal Impact

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications
Nowcasting with Google Trends
Causal Impact

Extensions



Applications Causal Impact

Measuring advertising effectiveness is a tricky business


I know that half my advertising dollars are wasted.
I just don’t know which half.
John Wanamaker

I One of the basic promises of online advertising is measurement.


I It is supposed to be easy.
I Change something (e.g. increase bid on Google).
I Look to see how many incremental ad clicks you get.
I Life is never easy.
I Ad clicks and native search clicks interact in complicated ways.
I Tough to get “incremental clicks” attributable to the ad campaign.
- Ad clicks can cannibalize native search clicks.
- Ads have a branding effect that can:
1. Be hard to measure,
2. Drive native search clicks,
3. Outlast the campaign.
Applications Causal Impact

Example
Real Google advertiser. 6-week ad campaign. Random shift added to both axes.

[Figure: the advertiser’s daily clicks, April through June.]
Applications Causal Impact

Problem statement

I An actor engages in a market intervention.


I Has a sale.
I Begins (or modifies) an advertising campaign.
I Introduces (or adopts) a new product.
I Other similar actors don’t engage in the intervention.
I We have data on both the actor and the similar actors prior to the
intervention.

I Question: What was the effect of the intervention?


I Total change to the bottom line.
I How quickly did changes begin to occur?
I How quickly did the effect begin to die out?



Difference in differences
An old trick from econometrics. Only measures at two points.
Applications Causal Impact

Synthetic controls
A more realistic counterfactual model than DnD

Abadie et al. (2003, 2010) suggested synthetic controls as counterfactuals.


I Weighted averages of untreated actors used to forecast actor of
interest.
I Weights (0 ≤ wi ≤ 1) estimated so that “synthetic control” series
matches actor’s series in pre-treatment period.
I Difference from forecast is estimated treatment effect.
Good Allows multiple controls, captures temporal effects.
Bad Scaling issues (California vs. Rhode Island), sign constraints
(negative correlations?), other time series?
Especially problematic for marketing. You know your sales, but not your
competitor’s sales.



Applications Causal Impact

CausalImpact
Extends DnD and synthetic controls using BSTS

I Use data in the pre-treatment period to build a flexible time series
  model for the series of interest.
I Forecast the time series over the intervention period given data from
the pre-treatment period.
I Can use contemporaneous regressors in the forecast.
I Model fit is based on pre-treatment data.
I Deviations from the forecast are the “treatment effect.”

I Assumes “no interference between units.” Often violated. Benign if


effect on untreated is small relative to effect on treated.
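This approach is packaged in the CausalImpact R package (built on bsts); a minimal sketch on simulated data:

library(CausalImpact)
set.seed(1)
x <- 100 + arima.sim(model = list(ar = 0.99), n = 100)  # control series
y <- 1.2 * x + rnorm(100)                               # response tracks the control
y[71:100] <- y[71:100] + 10                             # intervention effect after t = 70
impact <- CausalImpact(data = cbind(y, x),
                       pre.period = c(1, 70),
                       post.period = c(71, 100))
plot(impact)
summary(impact)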



The picture
Simulated data

[Figure: simulated data with the pre-intervention and post-intervention periods marked.]
Applications Causal Impact

Potential outcomes

I Let yjst denote the value of Y for unit j under treatment s at time t.
T is the time of the market intervention.

I What we observe:
Before T We observe yj0t for everyone
After T We observe yj1t for the actor and yk0t for the potential
controls k ≠ j.

I If we could also observe yj0t for the actor then yj1t − yj0t would be
the treatment effect.

I For t > T we have a model for yj0t |yk0t .



Applications Causal Impact

Case study
A Google advertiser ran a marketing experiment.

I Google search ads ran 6 weeks.


I Response is total search related visits to the site.
I Native search clicks.
I Ad clicks.

I 95 of 190 “designated marketing areas” received the ads. (DMA’s are


areas that can receive distinct TV ads).



Applications Causal Impact

This particular advertiser ran an experiment


Plot shows clicks from treated vs untreated geos. Each dot is a time point.
[Figure: scatterplot of clicks in the treatment region vs. clicks in the control region, with points marked as before, during, or after the campaign.]
Applications Causal Impact

Case study
Google advertiser. Treated vs. Untreated regions
[Figure: week-by-week panels (week -4 through week 7), covering the pre-intervention, intervention, and post-intervention periods.]
Applications Causal Impact

Case study
Google advertiser. Competitor’s clicks as predictors
[Figure: week-by-week panels (week -4 through week 7), covering the pre-intervention, intervention, and post-intervention periods.]
Applications Causal Impact

Case study
Google advertiser. Untreated regions. Competitor’s sales as predictors
[Figure: week-by-week panels (week -4 through week 7), covering the pre-intervention, intervention, and post-intervention periods.]
Applications Causal Impact

Case study
Summary

                        Clicks     %     95% Interval
vs. Untreated (1)       84,100    20     (15, 26)%
vs. Competitors (2)     84,800    21     (13, 26)%
A-A (placebo) test       8,000     2     (−5, 6)%

I Need experimental data to do analysis 1.
I Analysis 2 is observational, but replicates the experimental results.
I Using Google trends (instead of competitor information) gets about
the same results.
I Google trends are publicly available, while competitor clicks are not.
I Many more potential controls for Google trends. Spike and slab
variable selection / model averaging is useful for selecting appropriate
control groups.



Extensions

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions
Normal mixtures
Longer term forecasting



Extensions Normal mixtures

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions
Normal mixtures
Longer term forecasting



Extensions Normal mixtures

Relaxing Gaussian assumptions

I All “model matrices” in state space models are subscripted by t.
I We can replace the Gaussian assumptions with conditionally
  Gaussian assumptions, where there is a latent variable at each t
  determining the means and variances.

The MCMC now looks like this
I Draw latent variables w = (w_1, . . . , w_T) given α and θ.
I Draw α ∼ p(α | y, θ, w)
I Draw θ ∼ p(θ | y, α, w).

Example: The T distribution is a mixture of normals
(a normal divided by a chi-square).
I w ∼ Ga(ν/2, ν/2)
I y | w ∼ N(µ, σ²/w)



Extensions Normal mixtures

Example: retail sales


RSXFS: retail sales, excluding food service

[Figure: Retail Sales (Excluding Food Service), RSXFS / 1000, monthly, 1992-2012.]

I Monthly data, already seasonally adjusted.
I Catastrophic drop in 2008.
I Shift in slope of local linear trend is too large to be handled
  by Gaussian assumption.


Extensions Normal mixtures

Local linear trend with student T errors


rsxfs-analysis.R

    y_t = µ_t + ε_t,                        ε_t ∼ N(0, σ²)
    µ_t = µ_{t-1} + δ_{t-1} + η_{µ,t-1},    η_{µ,t} ∼ T_ν(0, τ_µ²)
    δ_t = δ_{t-1} + η_{δ,t-1},              η_{δ,t} ∼ T_ν(0, τ_δ²)

I This is an old Bayesian trick to ensure “robustness.”


I If you tell the model that occasional large errors are possible, it is not
surprised by occasional large errors.
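In bsts the Student-t trend is a drop-in replacement for the Gaussian one; a sketch of what rsxfs-analysis.R presumably does (the object rsxfs holding the series is assumed):

library(bsts)
y <- rsxfs                                   # RSXFS series, e.g. downloaded from FRED
ss.gaussian <- AddLocalLinearTrend(list(), y)
ss.student  <- AddStudentLocalLinearTrend(list(), y)
model.gaussian <- bsts(y, state.specification = ss.gaussian, niter = 1000)
model.student  <- bsts(y, state.specification = ss.student,  niter = 1000)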



Extensions Normal mixtures

Comparing dispersion parameters


Under the Gaussian and T models

[Figure: posterior draws of the level and slope standard deviations under the Student and Gaussian models.]

I Because the model is aware that occasional large errors can occur, the
  “standard deviation” parameters can be smaller.



Extensions Normal mixtures

Impact on predictions

[Figure: three panels, the original RSXFS data and predictions for 2009-2014 under the Gaussian and Student models.]

I The extreme quantiles of the predictions under the Student model are
  wider than under the Gaussian model.
I The central (e.g. 95%, 90%) intervals are narrower.


Extensions Normal mixtures

More normal mixtures

Similar tricks can be used to model probit, logit, and Poisson responses,
and even dynamic support vector machines by expressing these
distributions as normal mixtures.



Extensions Longer term forecasting

Outline

Introduction to time series modeling

Structural time series models

MCMC and the Kalman filter

Bayesian regression and spike-and-slab priors

Applications

Extensions
Normal mixtures
Longer term forecasting



Extensions Longer term forecasting

Long term predictions


I The local linear trend model is focused on detecting short term
changes in the trend.
I Very flexible, but it forgets the past quickly.
I A less flexible, but more robust trend model is

    y_t = µ_t + ε_t,                        ε_t ∼ N(0, σ²)
    µ_t = µ_{t-1} + δ_{t-1} + η_{µ,t-1},    η_{µ,t} ∼ N(0, τ_µ²)
    δ_t = D + ρ(δ_{t-1} − D) + η_{δ,t},     η_{δ,t} ∼ N(0, τ_δ²)
I Now the slopes δt follow an AR(1) process instead of a random walk.
I If |ρ| < 1 then AR(1) is stationary, so it does not entirely forget the
past.
I D is the “long run” trend of the series.
I The slope can locally deviate from D, but it will eventually return.
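In bsts this trend is provided (in recent versions, as I understand it) by AddSemilocalLinearTrend, previously called AddGeneralizedLocalLinearTrend; a sketch:

library(bsts)
ss <- AddSemilocalLinearTrend(list(), y)     # y holds the series to forecast
model <- bsts(y, state.specification = ss, niter = 1000)
pred <- predict(model, horizon = 60)         # long-horizon forecast
plot(pred)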



Extensions Longer term forecasting

[Figure: three panels, the original RSXFS data, predictions under the local linear trend, and long term predictions under the AR(1)-slope trend, 2009-2014.]


References

Carter, C. K. and Kohn, R. (1994).
On Gibbs sampling for state space models.
Biometrika 81, 541–553.
de Jong, P. and Shepard, N. (1995).
The simulation smoother for time series models.
Biometrika 82, 339–350.
Durbin, J. and Koopman, S. J. (2002).
A simple and efficient simulation smoother for state space time series analysis.
Biometrika 89, 603–616.
Durbin, J. and Koopman, S. J. (2012).
Time Series Analysis by State Space Methods.
Oxford University Press.
Frühwirth-Schnatter, S. (1995).
Bayesian model discrimination and Bayes factors for linear Gaussian state space models.
Journal of the Royal Statistical Society, Series B: Methodological 57, 237–246.
George, E. and McCulloch, R. (1997).
Approaches for Bayesian Variable Selection.
Statistica Sinica 7, 339–374.

