Rolling Regression Theory
Abstract
We find the asymptotic distribution for rolling linear regression models using various
window widths. The limiting distribution depends on the width of the rolling window,
and on a “bias process” that is typically ignored in practice. Based on the distribution,
we tabulate critical values used to find uniform confidence intervals for the average
values of regression parameters over the windows. We propose a corrected rolling
regression technique that removes the bias process by rolling over smoothed parameter
estimates. The procedure is illustrated using a series of Monte Carlo experiments.
The paper includes an empirical example to show how the confidence bands suggest
alternative conclusions about the persistence of inflation.
1 Introduction
Rolling regression is often employed in many applied fields as a method to characterize
changing relationships over time. As a simple robustness check, regression parameters are
estimated using some fraction of the data early in the sample. The fixed fraction is then
“rolled” through the sample, so that the estimated regression parameters may vary over time.
This intuitive procedure is one method of examining the stability of statistical relationships
over time. A cursory search will reveal that there are rolling regression routines written
in statistical computing languages or packages such as R, SAS, STATA, Matlab, RATS,
Python, Eviews, and Excel.
∗ We thank Graham Elliott, Bruce Hansen, and Ulrich Müller for comments on earlier versions of this
paper.
† Contact information: Zongwu Cai, Department of Economics, University of Kansas, Lawrence, KS, 66045
([email protected]); Ted Juhl, School of Business, University of Kansas, Lawrence, KS 66045 ([email protected])
As part of the prototypical exercise of reporting rolling regression estimates, researchers
often plot bands around the point estimates as a way to conduct some type of ocular infer-
ence about whether there are changes in relationships over time. The regression bands are
constructed using estimated standard errors from the regression parameters in the relevant
time period for the rolling window. Then, these estimated standard errors are multiplied
by critical values from the standard normal distribution. Recent papers using rolling regression
with confidence bands include Swanson and Williams (2014), Linnainmaa and Roberts
(2017), Adrian et al. (2015), Blanchard (2018), Georgiev et al. (2018), Jiménez et al.
(2017), and López-Salido et al. (2017), among others.1
In this paper, we characterize the population parameters that rolling regression is at-
tempting to estimate and we find the distribution of the rolling estimator. We provide new
critical values that are used to construct asymptotically correct confidence bands for the
estimated function. As part of this exercise, we show that rolling regression contains a bias
process that may inhibit inference about the true population parameters. We develop a new
procedure to estimate rolling regression parameters that is not affected by the bias process.
The original idea of rolling regression is an intuitive one, in that we want to use regression
over different time intervals to examine how the relationship may have changed. The window
width may appear to be ad-hoc, but it is often based on equating the window width with
some number of observations that seems appropriate for time series estimation, or based on
decades, or some other relevant time frame. However, the results of this paper suggest that
rolling regression is a compromise of the usual bias variance tradeoff. In particular, we can
obtain the usual parametric convergence rates for rolling regression estimates (rather than
nonparametric ones), although with a different limiting distribution.
The remainder of the paper is structured as follows. In Section 2, we develop the model,
assumptions, and asymptotic distribution results for the existing rolling regression procedures.
In Section 3, we propose and discuss a new procedure to deal with the bias process.
Section 4 provides Monte Carlo evidence for the competing procedures.
An empirical example is treated in Section 5 to illustrate differences in results based on our
techniques versus the traditional confidence bands for the persistence of inflation. Section 6
concludes. Finally, all technical proofs are gathered in Section 7.
1 Rolling regression is also used in forecasting in work by Clark and McCracken (2009) in a framework
that allows for structural change in regression parameters.
$$y_t = x_t^\top \beta + \epsilon_t, \quad 1 \le t \le T,$$
where xt is a p-dimensional regressor. Let λ be the fraction of the total sample of T obser-
vations that is used in the rolling sample of data. The rolling regression estimator uses the
[T λ] observations, where [x] denotes the integer part of x, and we index each of the periods
with r so that we have
$$\hat\beta_\lambda(r) = \left(\frac{1}{T\lambda}\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1} \frac{1}{T\lambda}\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s y_s.$$
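The rolling estimator defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `rolling_ols`, the simulated data, and the small tolerance added before taking the integer part are all assumptions made for the example.

```python
import numpy as np

def rolling_ols(x, y, lam):
    """Rolling OLS over windows of m = [T * lam] observations.

    x: (T, p) regressors, y: (T,) response.
    Returns (ends, betas): betas[i] is the estimate for the window whose
    last observation is ends[i] (1-based), i.e. r = ends[i] / T.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T, p = x.shape
    m = int(np.floor(T * lam + 1e-9))   # integer part, guarded against rounding
    ends = np.arange(m, T + 1)          # last observation of each window
    betas = np.empty((ends.size, p))
    for i, e in enumerate(ends):
        xw, yw = x[e - m:e], y[e - m:e]
        # Solve (sum x x') b = sum x y; the 1/(T lam) factors cancel.
        betas[i] = np.linalg.solve(xw.T @ xw, xw.T @ yw)
    return ends, betas

# Illustrative data with constant coefficients (1, 2); purely hypothetical.
rng = np.random.default_rng(0)
T = 400
x = np.column_stack([np.ones(T), rng.normal(size=T)])
y = x @ np.array([1.0, 2.0]) + rng.normal(size=T)
ends, betas = rolling_ols(x, y, lam=0.20)
```

With T = 400 and λ = 0.20 each window holds 80 observations, so `betas` has one row for every window end point from observation 80 to observation 400.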
Clearly, if a constant coefficient regression model is correctly specified, then the rolling
regression estimator would estimate the same parameter, β, in each of the sub-samples using
λ × 100% of the data. Such an exercise is not interesting as β is constant over time, and
rolling estimators will be inefficient relative to the full sample regression estimator. Consider
the model
$$y_t = x_t^\top \beta\left(\frac{t}{T}\right) + \epsilon_t$$
so that the regression parameter changes over time. The data are assumed to become more
dense around a point t as T increases, a device employed in many studies, beginning notably
in Robinson (1989); see, for example, Cai (2007) for details. Given that the regression
parameter is potentially changing at each point in time, as T increases, we hope to estimate
the “average” value of β(t/T ) in the rolling window. To be specific, define
$$\bar\beta_\lambda(r) = \frac{1}{\lambda}\int_{r-\lambda}^{r} \beta(u)\,du,$$
which is the population parameter of interest indexed by λ and r. This quantity represents
the average of the coefficients at point r given a rolling window fraction λ.
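As a worked illustration of this population quantity, consider a linear coefficient path. The path β(u) = a + bu is an assumption chosen only because the integral is available in closed form; it is not a case taken from the paper.

```latex
\bar\beta_\lambda(r)
  = \frac{1}{\lambda}\int_{r-\lambda}^{r} (a + b u)\,du
  = a + \frac{b}{2\lambda}\left[r^2 - (r-\lambda)^2\right]
  = a + b\left(r - \frac{\lambda}{2}\right)
```

Under this assumed path the window average equals the coefficient evaluated at the midpoint of the window, r − λ/2, so a wider window shifts the target further back from the end point r.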
Our first task is to ascertain whether rolling regression indeed provides a consistent
estimate of the parameter β̄λ (r). We make several assumptions about the data generating
process for the results in this section. Our theory makes use of the characterizations of
processes from Zhou and Wu (2010). First, we allow for time varying processes, a form
of non-stationarity. In particular, given that we are attempting to estimate time-varying
parameters, it is natural to allow for non-stationary processes. Changing values of β(t/T )
necessarily induces non-stationarity in yt . Moreover, if the model is dynamic, then xt is also
non-stationary if β(t/T ) changes. To this end, consider processes depending on deterministic
functions coupled with the iid process vt . Let Ft = {. . . , vt−1 , vt }. The processes for the
covariates and errors are given by
$$x_t = G\left(\frac{t}{T}, \mathcal{F}_t\right) \quad \text{and} \quad \epsilon_t = H\left(\frac{t}{T}, \mathcal{F}_t\right),$$
respectively, where the functions G and H allow for non-stationary processes in that the
moments may change over time. The index t is scaled by the sample size T so that the data
are assumed to be observed more densely as we collect more observations. In particular,
define the second moment matrix of the xt process as
$$M(t) = E\left[G\left(\frac{t}{T}, \mathcal{F}_t\right) G\left(\frac{t}{T}, \mathcal{F}_t\right)^\top\right].$$
The process associated with $x_t \epsilon_t$ is denoted GH, and the covariance matrix of this product
process is
$$\Omega(t) = E\left[GH\left(\frac{t}{T}, \mathcal{F}_t\right) GH\left(\frac{t}{T}, \mathcal{F}_t\right)^\top\right].$$
Statisticians characterize dependence in data in several ways: linear processes, α-mixing,
β-mixing, etc. Recent papers by Chen and Hong (2012) and Cai (2007) both make use of
β-mixing and α-mixing assumptions, but the data are assumed to be stationary in both
papers. The assumption in Inoue et al. (2017) allows the data to be near-epoch dependent
(NED). As argued in Inoue et al. (2017), the NED assumption is more general than the
α-mixing assumption, allows for heterogeneity over time, which is necessary for a time-varying
parameter framework, and overcomes several undesirable features of the α-mixing assumption,
as addressed by Lu and Linton (2007). In this paper, we follow Zhou and Wu (2008) and
allow for non-stationary processes that might arise from dynamic models with time-varying
parameters. To this end, let v0′ be an iid copy of the variable v0 that is part of Fj . Define
Fj∗ = {. . . , v−1 , v0′ , v1 , . . . , vj−1 , vj }. We wish to characterize the dependence of a process by
measuring the effects of a shock to the system. For the variable xt , define
$$\delta_q(x, j) = \sup_{t}\left\| G\left(\frac{t}{T}, \mathcal{F}_j\right) - G\left(\frac{t}{T}, \mathcal{F}_j^*\right)\right\|_q$$
for some q ≥ 1, which is a measure of the effect of a shock after j periods. Limiting the
allowable dependence in a process amounts to specifying suitable rates of decay for δq (x, j)
as j increases.
The dependence in these processes over time is a separate issue from whether the process
is stationary. One class of non-stationary processes is the unit root process, where the
autoregressive parameter is related to both dependence of the process over time as well as
whether the variance of the process is constant over time. In the framework of this paper,
we allow the processes to be non-stationary in a way that is separate from the dependence
over time. To this end, we denote a process G to be stochastically Lipschitz continuous if
$$\sup_{0\le s\le t\le T}\left\| G\left(\frac{s}{T}, \mathcal{F}_0\right) - G\left(\frac{t}{T}, \mathcal{F}_0\right)\right\|_2 \le c_1\frac{(t-s)}{T}.$$
Assumption 4 The functions G and GH are stochastically Lipschitz continuous processes.
Assumption 1 allows for the regression parameters to vary over time, perhaps with dis-
continuities. The martingale difference assumption could be relaxed and would require a
long-run variance estimator for inference procedures. In the present case, we can assume a
dynamic model may be specified to remove any correlation in $\epsilon_t$. Moreover, because of
Assumptions 3 and 4, we allow for dynamic models with changing parameters, so that we
can assume martingale differences for the error process. In particular, autoregressive models
with time varying coefficients are considered in Zhang and Wu (2012). They show that, given
(standard) conditions on the time varying roots of the characteristic function, the process is
locally stationary and they characterize the decay in dependence. These models are shown
to satisfy similar conditions to the conditions in this paper. Assumption 5 is sufficient for
the rolling regression estimator to exist in the limit for any of the rolling windows one might
consider.
We now provide the limiting distribution of the rolling regression estimator with its proof
given in Section 7.
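Theorem 2.1 Suppose that Assumptions 1 to 5 hold and that r ∈ [λ, 1]. Then,
$$\sqrt{T}\left(\hat\beta_\lambda(r) - \bar\beta_\lambda(r) - B_T(r)\right) \Rightarrow \left(\int_{r-\lambda}^{r} M(s)\,ds\right)^{-1}\left[Q(r) - Q(r-\lambda)\right],$$
where
$$B_T(r) = \left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1}\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top \beta\left(\frac{s}{T}\right) - \frac{1}{T\lambda}\sum_{s=[rT-T\lambda]}^{[rT]} \beta\left(\frac{s}{T}\right)$$
serves as the asymptotic bias term, ⇒ indicates weak convergence, and Q(r) denotes a p-dimensional
Gaussian process with covariance $E[Q(r_1)Q(r_2)^\top] = \int_0^{\min(r_1,r_2)} \Omega(s)\,ds$.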
The result provides several insights toward the use of rolling regression. First, the limiting
distribution involves Q(r), a functional of Brownian motion. In the search for confidence
bands of the “average” of the regression parameters, using critical values from a standard
normal distribution is incorrect.2 As we illustrate in the Monte Carlo section, confidence bands
for the average parameter vector, β̄λ (r), that are built from standard normal critical values
will be too narrow, so that coverage probabilities are well below target levels.
Intuitively, the smaller the rolling window, the wider the confidence bands that are
required.
In addition to the finding of the limiting distribution involving functionals of Brownian
motion, the distribution is affected by a bias process denoted BT (r). If the parameter vector
β(s/T ) is constant over the entire sample, then this process disappears. However, in such
cases, rolling regression is not interesting. If $x_s x_s^\top$ is unrelated to β(s/T ), then the process
will have zero mean, and hence we can apply a functional central limit theorem to it. In this
case, we can think of this term as generating an additional term in the variance process.
If $x_s x_s^\top$ is related to β(s/T ), then the bias process has non-zero mean. The intuition for
the bias process is that regression over the rolling interval weights the data, and hence
the parameter vector, giving more weight to observations with larger values of $x_s x_s^\top$. If these
quantities are related to the parameter vector, then we may find inconsistent estimates of the
average, given that some parameter values are over-weighted. A related phenomenon arises
for fixed effects regression in panel data models. In particular, Campello et al. (2019) show
that if cross-sectional units are heterogeneous in slope, a bias may result if the heterogeneity
is related to the second moments of the regressors.
Theorem 2.1 allows for both $x_t$ and $\epsilon_t$ to exhibit a form of non-stationary behavior,
which complicates the limiting distribution of the rolling estimator. The following corollary
provides a simplification for the case where $x_t$ and $\epsilon_t$ are both stationary. First, the bias
process $B_T(r)$ disappears. In addition, M (s) and Ω(s) are constant, so that $\int_{r-\lambda}^{r} M(s)\,ds = \lambda M$,
where $M = E(x_s x_s^\top)$, and $\int_0^{\min(r_1,r_2)} \Omega(s)\,ds = \min(r_1, r_2)\,\Omega$ with $\Omega = E(x_s x_s^\top \epsilon_s^2)$.
2 A large literature on testing for structural change with unknown change point illustrates the need for
different critical values in hypothesis testing.
so that the autoregressive coefficient φ(t/T ) is constant but the intercept term is changing
over the sample. It is easy to show, regardless of the width of the rolling window, that the
asymptotic bias for estimating the parameter φ is given by
$$k_1^2 (1-\phi)(1-\phi^2)\big/\left[k_1^2(1-\phi^2) + 12\sigma^2\right].$$
With φ = 0, this expression reduces to
$$\left(1 + 12\sigma^2/k_1^2\right)^{-1}.$$
This simple case is much like the result in Perron (1989), where an omitted (broken) trend in
the data generating process causes a bias in the estimates of the autoregressive parameters.
Even if there is no serial correlation, the estimated autoregressive parameter may be close
to one. The larger the magnitude of the omitted trend, the more bias. In general, the bias
process BT (r) will be larger in magnitude as there is more correlation in the second moment
of xt and the parameter vector β(t/T ).
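The omitted-trend bias can be checked with a short simulation. The sketch below is hedged in two ways: the data generating process y_t = k1 (t/T) + φ y_{t−1} + ε_t is an assumed form (the model equation is not reproduced in this section), and only the case φ = 0 is examined, where the limiting value is 1/(1 + 12σ²/k1²). The helper names `ar1_slope` and `simulate_trend_ar` are hypothetical.

```python
import numpy as np

def ar1_slope(y):
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    X = np.column_stack([np.ones(y.size - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[1]

def simulate_trend_ar(T, k1, phi, sigma, rng):
    """Assumed DGP: y_t = k1 * (t / T) + phi * y_{t-1} + eps_t."""
    eps = rng.normal(scale=sigma, size=T)
    y = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = k1 * (t + 1) / T + phi * prev + eps[t]
        y[t] = prev
    return y

rng = np.random.default_rng(1)
k1, sigma, T = 10.0, 1.0, 5000
y = simulate_trend_ar(T, k1, phi=0.0, sigma=sigma, rng=rng)
estimate = ar1_slope(y)                          # true AR coefficient is 0
limit = 1.0 / (1.0 + 12.0 * sigma**2 / k1**2)    # limiting value at phi = 0
print(f"OLS slope: {estimate:.3f}, limiting value: {limit:.3f}")
```

Even though the series has no serial correlation beyond the trend, the fitted autoregressive coefficient lands close to the limiting value rather than near zero, which is the effect described in the text.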
The limiting distribution is a functional of the process given by Q(r). The usual (incorrect)
procedure for constructing confidence bands is to calculate a standard error estimate for
each of the sub-periods in the rolling regression, and then employ standard normal critical
values. For a rolling regression indexed by r, we would use a variance estimator given by
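$$\hat V(\hat\beta_\lambda(r)) = \left(\sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top\right)^{-1}\left(\sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top \hat\epsilon_s^2\right)\left(\sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top\right)^{-1}.$$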
If there is no bias process, it is easy to see that the standardized process will converge to a
Gaussian process, Q̃(r) with covariance
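$$E\left[\tilde Q(r_1)\tilde Q(r_2)^\top\right] = \left(\int_{r_1-\lambda}^{r_1}\Omega(s)\,ds\right)^{-1/2}\int_{r_2-\lambda}^{r_1}\Omega(s)\,ds\left(\int_{r_2-\lambda}^{r_2}\Omega(s)\,ds\right)^{-1/2}$$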
for r1 < r2 (or 0 if r1 < r2 − λ). The process has variance Ik , and the dependence arises
from the non-zero covariance when r1 > r2 − λ.
From the limiting distribution of rolling regression estimators, we see that if we are
attempting to construct uniform confidence bands for β̄λ (r), standard normal critical values
are inappropriate. To this end, we want to find critical values θλ such that
$$P\left(\sup_{r\in[\lambda,1]}\left|\tilde Q(r)\right| \le \theta_\lambda\right) = 0.95.$$
If the variance process is constant over r, so that Ω(s) = Ω, then the distribution of Q̃(r)
is a function of standard Brownian Motion, adjusted for different values of the fraction of
data used in rolling, λ. To illustrate the distribution under this case when k = 1, we simulate
critical values by generating random variables $u_t$ from a standard normal distribution. Then,
the standard Brownian Motion W (r) is simulated with $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} u_t$, where T = 10,000. For
values of λ, find $\sup_{r\in[\lambda,1]} \frac{1}{\sqrt{\lambda}}|W(r) - W(r-\lambda)|$, and repeat 100,000 times for values of λ
from 0.05 to 0.85. The resulting critical values are analogous to those calculated in Andrews
(1993) for tests of structural change in regression models. The 0.90, 0.95, and 0.99 quantiles
of empirical distribution are provided in Table 1.
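The simulation just described can be sketched as follows. To keep the example quick it uses a far smaller grid and fewer replications than the T = 10,000 and 100,000 draws reported in the text, so the resulting quantiles are rougher (and, because the supremum is taken over a coarser grid, typically somewhat smaller) than the Table 1 entries. The function name and default arguments are assumptions.

```python
import numpy as np

def rolling_sup_critical_value(lam, T=1000, reps=2000, level=0.95, seed=0):
    """Simulated quantile of sup_r |W(r) - W(r - lam)| / sqrt(lam)."""
    rng = np.random.default_rng(seed)
    m = max(1, int(np.floor(T * lam + 1e-9)))   # window length in grid steps
    u = rng.standard_normal((reps, T))
    # W on the grid 0, 1/T, ..., 1 (one simulated path per row).
    W = np.concatenate(
        [np.zeros((reps, 1)), np.cumsum(u, axis=1)], axis=1
    ) / np.sqrt(T)
    increments = W[:, m:] - W[:, :-m]           # W(r) - W(r - m/T), r in [m/T, 1]
    # Dividing by sqrt(m / T) gives every increment unit variance.
    stats = np.abs(increments).max(axis=1) / np.sqrt(m / T)
    return float(np.quantile(stats, level))

cv_20 = rolling_sup_critical_value(0.20)     # 20% rolling window
cv_full = rolling_sup_critical_value(1.00)   # full sample: a single |N(0, 1)| draw
```

With λ = 1 there is only one window, so the statistic is the absolute value of a single standard normal and the 0.95 quantile should be near the familiar 1.96; shrinking λ adds many overlapping windows and pushes the quantile upward.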
The width of the confidence bands increases as the rolling window becomes smaller. For
example, if we use 10% of the sample in the rolling window, a 95% confidence band extends 3.499
times the relevant standard error on either side of the estimate. The factor for a 20% window is 3.265,
while an 85% window width reduces the factor to 2.377. If one increases the window width
to 100% of the sample, we obtain the usual 1.96. These are the critical values that are
appropriate for confidence bands of rolling regressions in the absence of the bias process. At
a minimum, these tables provide values to replace critical values from the standard normal
tables when applying rolling regression.
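In practice, swapping in the tabulated values only changes the multiplier applied to each standard error. The sketch below copies a few 95% entries from Table 1; the point estimates and standard errors are made-up numbers for illustration, and the helper `uniform_band` is hypothetical.

```python
import numpy as np

# 95% critical values copied from Table 1 for a few window fractions.
TABLE1_CV95 = {0.10: 3.499, 0.20: 3.265, 0.50: 2.825, 0.85: 2.377}

def uniform_band(estimates, std_errors, lam):
    """Return (lower, upper) band using the Table 1 value for lam."""
    cv = TABLE1_CV95[lam]
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    return estimates - cv * std_errors, estimates + cv * std_errors

# Made-up rolling estimates and standard errors, for illustration only.
est = np.array([0.50, 0.55, 0.60])
se = np.array([0.10, 0.10, 0.10])
lo, hi = uniform_band(est, se, lam=0.20)
naive_width = 2 * 1.96 * se      # band width with standard normal values
uniform_width = hi - lo          # band width with the Table 1 value
```

For a 20% window the uniform band is 3.265 / 1.96, or roughly 1.67, times as wide as the naive pointwise band.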
In the more realistic case of a time varying variance process governed by Ω(s), critical
values will depend on Ω(s) which enters in the Gaussian process. We follow Hansen (1996)
and simulate the limiting distribution of the process so that we have the proper critical
values for the process associated with Ω(s). To this end, define the following estimators,
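$$\hat\epsilon_s = y_s - x_s^\top \hat\beta_\lambda(s/T)$$
and let $v_s$ be a generated standard normal variable. Then, we can simulate the process Q̃(r)
via
$$\tilde U(r) = \left(\frac{1}{T}\sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top \hat\epsilon_s^2\right)^{-1/2}\frac{1}{\sqrt{T}}\sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s \hat\epsilon_s v_s.$$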
Table 1. Critical Values for Rolling Confidence Bands
Confidence Level Confidence Level Confidence Level
λ 0.90 0.95 0.99 λ 0.90 0.95 0.99 λ 0.90 0.95 0.99
0.050 3.498 3.708 4.148 0.320 2.795 3.059 3.590 0.590 2.423 2.707 3.262
0.060 3.440 3.648 4.092 0.330 2.780 3.048 3.581 0.600 2.404 2.695 3.260
0.070 3.395 3.614 4.074 0.340 2.767 3.028 3.564 0.610 2.389 2.685 3.252
0.080 3.346 3.570 4.016 0.350 2.753 3.016 3.533 0.620 2.380 2.672 3.266
0.090 3.311 3.535 3.992 0.360 2.730 3.000 3.533 0.630 2.380 2.674 3.248
0.100 3.271 3.499 3.956 0.370 2.712 2.983 3.523 0.640 2.357 2.655 3.238
0.110 3.245 3.478 3.945 0.380 2.700 2.970 3.512 0.650 2.343 2.636 3.214
0.120 3.213 3.444 3.912 0.390 2.690 2.962 3.496 0.660 2.332 2.628 3.208
0.130 3.178 3.415 3.895 0.400 2.677 2.942 3.481 0.670 2.322 2.623 3.214
0.140 3.159 3.388 3.866 0.410 2.662 2.935 3.477 0.680 2.309 2.600 3.176
0.150 3.125 3.356 3.843 0.420 2.646 2.920 3.469 0.690 2.288 2.585 3.163
0.160 3.105 3.344 3.823 0.430 2.635 2.917 3.462 0.700 2.283 2.578 3.176
0.170 3.080 3.319 3.808 0.440 2.617 2.896 3.449 0.710 2.264 2.562 3.155
0.180 3.056 3.299 3.780 0.450 2.603 2.887 3.436 0.720 2.254 2.551 3.145
0.190 3.035 3.284 3.784 0.460 2.591 2.870 3.419 0.730 2.239 2.540 3.135
0.200 3.013 3.265 3.755 0.470 2.582 2.854 3.410 0.740 2.221 2.524 3.136
0.210 2.993 3.240 3.729 0.480 2.563 2.846 3.398 0.750 2.212 2.512 3.115
0.220 2.976 3.225 3.701 0.490 2.554 2.834 3.384 0.760 2.202 2.501 3.079
0.230 2.942 3.194 3.704 0.500 2.544 2.825 3.388 0.770 2.190 2.491 3.093
0.240 2.926 3.182 3.708 0.510 2.530 2.818 3.367 0.780 2.167 2.473 3.062
0.250 2.921 3.174 3.691 0.520 2.513 2.796 3.355 0.790 2.155 2.461 3.058
0.260 2.896 3.154 3.656 0.530 2.499 2.779 3.342 0.800 2.154 2.459 3.059
0.270 2.876 3.134 3.660 0.540 2.483 2.770 3.326 0.810 2.134 2.438 3.029
0.280 2.863 3.120 3.646 0.550 2.468 2.758 3.322 0.820 2.114 2.424 3.024
0.290 2.844 3.103 3.621 0.560 2.459 2.749 3.317 0.830 2.112 2.414 3.027
0.300 2.828 3.091 3.619 0.570 2.443 2.733 3.295 0.840 2.095 2.399 2.995
0.310 2.818 3.078 3.600 0.580 2.428 2.718 3.284 0.850 2.072 2.377 2.999
The simulated (standardized) process will have the same covariance structure as the limiting
distribution in Q̃(r), so that we can construct bands using the maximum of the absolute value
of the appropriate entry of the process, in combination with the element from the variance
matrix to obtain standard errors. The procedure can be applied under the assumption that
the bias process is zero, as would be the case if the second moments of xt are unrelated to
the time-varying regression parameters.
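For a single regressor, the multiplier simulation of the standardized process can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: it handles only the scalar case (k = 1), takes the residuals as given inputs, and the function name, the number of draws, and the example data are hypothetical.

```python
import numpy as np

def multiplier_critical_value(x, resid, lam, reps=1000, level=0.95, seed=0):
    """Simulated quantile of sup_r |U(r)| for a scalar regressor.

    x, resid: length-T arrays of regressor values and residuals.
    """
    x = np.asarray(x, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T = x.size
    m = int(np.floor(T * lam + 1e-9))            # window length
    score = x * resid
    # Window sums of x_s^2 * resid_s^2 via differences of cumulative sums.
    c2 = np.concatenate([[0.0], np.cumsum(score**2)])
    scale = np.sqrt(c2[m:] - c2[:-m])
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((reps, T))           # multipliers v_s
    cs = np.concatenate(
        [np.zeros((reps, 1)), np.cumsum(score * v, axis=1)], axis=1
    )
    window_sums = cs[:, m:] - cs[:, :-m]
    # The 1/T and 1/sqrt(T) factors in U(r) cancel in this ratio.
    sup_stats = np.abs(window_sums / scale).max(axis=1)
    return float(np.quantile(sup_stats, level))

# Stand-in data: residuals whose variance rises over the sample.
rng = np.random.default_rng(4)
T = 500
x = rng.normal(size=T)
resid = (1.0 + np.arange(T) / T) * rng.normal(size=T)
cv = multiplier_critical_value(x, resid, lam=0.20)
```

The returned value plays the role of the Table 1 entry when the variance process is not constant: the band at each r is the point estimate plus or minus `cv` times the corresponding standard error.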
3 Estimating Average Slope
Given that the goal of rolling regression appears to be estimating the average slope of the
regression function as we move the window through time, we propose to estimate that quan-
tity directly. That is, we seek to estimate the integral of the regression coefficient over the
subintervals of the sample data. Our intent in the direct estimation of β̄λ (r) is to avoid the
potential bias process given by BT (r).
The local linear estimator for β(t/T ) was analyzed in Cai (2007) for stationary α-mixing
data, and is given by β̃(t/T ), where
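$$\tilde\beta(t/T) = \begin{pmatrix} I_p & 0_p \end{pmatrix}\begin{pmatrix} S_{T,0} & S_{T,1}^\top \\ S_{T,1} & S_{T,2} \end{pmatrix}^{-1}\begin{pmatrix} V_{T,0} \\ V_{T,1} \end{pmatrix}$$
with
$$S_{T,\ell}(t/T) = \frac{1}{Th}\sum_{s=1}^{T} x_s x_s^\top \left(\frac{t-s}{Th}\right)^{\ell} K_{t,s}, \qquad V_{T,\ell}(t/T) = \frac{1}{Th}\sum_{s=1}^{T} x_s y_s \left(\frac{t-s}{Th}\right)^{\ell} K_{t,s},$$
$K_{t,s} = K((t-s)/(Th))$, and K(·) being a kernel function. It is well known that the nonparametric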
estimator is consistent and point-wise normally distributed. Zhou and Wu (2010) extend
the work of Johnston (1982) to include uniform confidence bands in a time series setting
for the function over the entire range of the data. The uniform confidence bands converge
at an even slower rate than the usual nonparametric estimators. In addition, there is an
additional bias term which depends on the second derivative of β(t/T ) at each point, which
adds another obstacle to the construction of confidence bands.
We propose the following estimator for β̄λ (r):
$$\hat\beta^*_\lambda(r) = \frac{1}{T\lambda}\sum_{s=[rT-T\lambda+1]}^{[Tr]} \tilde\beta\left(\frac{s}{T}\right).$$
The intuition for our estimator is that we hope to combine each estimator of β(s/T ) for the
relevant range. By choosing an appropriate bandwidth parameter, we can eliminate the bias
process altogether. We list three additional assumptions for the data generating process and
the bandwidth.
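The two-step idea (local linear estimates first, then window averages) can be sketched as follows. Several choices here are assumptions made for the illustration: the Epanechnikov kernel is one second-order kernel supported on [−1, 1], the defaults c2 = 0.75 and δ = 0.30 mirror values used in the Monte Carlo section, and the local regression uses (s − t)/(Th) rather than (t − s)/(Th), which only flips the sign of the slope terms and leaves the level estimate unchanged.

```python
import numpy as np

def epanechnikov(z):
    """Second-order kernel supported on [-1, 1]."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)

def local_linear_beta(x, y, h):
    """Local linear estimates of beta(t/T) at t = 1, ..., T.

    x: (T, p), y: (T,), h: bandwidth on the rescaled time axis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T, p = x.shape
    grid = np.arange(1, T + 1) / T
    out = np.empty((T, p))
    for i, u in enumerate(grid):
        z = (grid - u) / h
        w = epanechnikov(z)
        keep = w > 0
        # Weighted regression of y on [x, x * z]; the first p
        # coefficients estimate beta(u).
        Z = np.hstack([x[keep], x[keep] * z[keep, None]])
        sw = np.sqrt(w[keep])
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], y[keep] * sw, rcond=None)
        out[i] = coef[:p]
    return out

def smoothed_rolling(x, y, lam, c2=0.75, delta=0.30):
    """Average the local linear estimates over each rolling window."""
    x = np.asarray(x, dtype=float)
    T, p = x.shape
    h = c2 * T ** (-delta)
    beta_tilde = local_linear_beta(x, y, h)
    m = int(np.floor(T * lam + 1e-9))
    csum = np.vstack([np.zeros((1, p)), np.cumsum(beta_tilde, axis=0)])
    return (csum[m:] - csum[:-m]) / m      # row i: window ending at obs m + i

# Illustrative data with a slowly rising slope (hypothetical path).
rng = np.random.default_rng(2)
T, lam = 300, 0.20
u = np.arange(1, T + 1) / T
x = rng.normal(size=(T, 1))
beta_path = 0.8 * u
y = x[:, 0] * beta_path + 0.5 * rng.normal(size=T)
est = smoothed_rolling(x, y, lam)
m = int(np.floor(T * lam + 1e-9))
cs = np.concatenate([[0.0], np.cumsum(beta_path)])
target = (cs[m:] - cs[:-m]) / m            # discrete analogue of the window average
mean_abs_err = float(np.mean(np.abs(est[:, 0] - target)))
```

The comparison against `target` is possible only because the coefficient path is known in this simulated example; with real data one would report `est` together with the uniform bands.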
Assumption 7 The bandwidth is chosen such that $h = c_2 T^{-\delta}$ with $1/4 < \delta < 1/3$.
Assumption 8 The kernel function K(z) is second order and takes the value 0 outside of
[−1, 1].
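Theorem 3.1 Suppose that Assumptions 1-8 hold with r ∈ (λ, 1). Then,
$$\sqrt{T}\left(\hat\beta^*_\lambda(r) - \bar\beta_\lambda(r)\right) \Rightarrow \left[Q_2(r) - Q_2(r-\lambda)\right],$$
where $Q_2(r)$ is a p-dimensional Gaussian process with covariance matrix
$E\left[Q_2(r_1)Q_2(r_2)^\top\right] = \int_0^{\min(r_1,r_2)} \Lambda(s)\,ds$, and
$$\Lambda(s) = \frac{1}{\lambda^2} M(s)^{-1}\Omega(s)M(s)^{-1}.$$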
The new estimator is consistent for β̄λ (r). Moreover, like the naive rolling estimator,
the limiting distribution of our new statistic is also a function of Q(r). Hence, we can
employ the generated critical values to construct uniform confidence bands for β̄λ (r). This
rolling average smoothed estimator can be viewed as a bias corrected version of rolling
regression, and the limiting distribution still involves a similar Gaussian process. The bias
is absent because we are directly estimating the average of the parameter values through
the two step process. We first estimate the time-varying parameters directly via local-linear
estimation, and then we average those estimates. Similar results are often obtained in semi-
parametric models involving averages of nonparametric estimates. The advantage of this
procedure is that the averaging operation provides a faster rate of convergence that does not
depend on the nonparametric bandwidth rate. In this way, the rolling estimator provides a
computationally tractable procedure with the same interpretation as the traditional rolling
regression procedure. However, our method now has the correct uniform size, where the
traditional rolling procedure would have bands that are too narrow, resulting in incorrect
coverage.
Construction of the appropriate confidence bands is similar to the method used in Zhang
and Wu (2012) along with our tabulated critical values. We estimate the standard errors of
the modified rolling estimators. Let
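$$\tilde S(s) = \begin{pmatrix} S_{T,0}(s) & S_{T,1}(s)^\top \\ S_{T,1}(s) & S_{T,2}(s) \end{pmatrix}, \qquad \tilde\Omega_\ell(s) = \frac{1}{Th}\sum_{r=1}^{T} x_r x_r^\top \tilde\epsilon_r^2 \left(\frac{s-r}{Th}\right)^{\ell} K\left(\frac{s-r}{Th}\right),$$
$$\tilde\Omega(s) = \begin{pmatrix} \tilde\Omega_0(s) & \tilde\Omega_1(s) \\ \tilde\Omega_1(s) & \tilde\Omega_2(s) \end{pmatrix}, \quad \text{and} \quad \tilde\epsilon_r = y_r - x_r^\top \tilde\beta(r).$$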
Then, the standard errors are estimated via the variance matrix
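$$\frac{1}{T^2\lambda^2}\sum_{s=[T(r-\lambda)]}^{[Tr]} \left(I_p \;\vdots\; 0_p\right)\tilde S(s)^{-1}\tilde\Omega(s)\tilde S(s)^{-1}\left(I_p \;\vdots\; 0_p\right)^\top = \frac{1}{T^2}\sum_{s=[T(r-\lambda)]}^{[Tr]} \tilde\Lambda(s).$$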
The standardized estimator will be governed by the process Q̃2 (r), which is Gaussian with
covariance process
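$$E\left[\tilde Q_2(r_1)\tilde Q_2(r_2)^\top\right] = \left(\int_{r_1-\lambda}^{r_1}\Lambda(s)\,ds\right)^{-1/2}\int_{r_2-\lambda}^{r_1}\Lambda(s)\,ds\left(\int_{r_2-\lambda}^{r_2}\Lambda(s)\,ds\right)^{-1/2}.$$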
As in the case of the traditional rolling regression, the appropriate critical values for the
rolling average slope estimator are not the usual values associated with the standard normal
distribution. In the most restrictive case where Ω(s) and the second moment matrix of the
regressors M (s) are constant over s, we can employ the critical values appearing in Table 1.
If Ω(s) and M (s) are time-varying, we can use the estimated components of Λ(s) so that
we can simulate using
$$\tilde U_2(r) = \left(\frac{1}{T}\sum_{s=[T(r-\lambda)+1]}^{[Tr]} \tilde\Lambda(s)\right)^{-1/2}\frac{1}{\sqrt{T}}\sum_{s=[T(r-\lambda)+1]}^{[Tr]} \tilde\Lambda(s)^{1/2} v_s.$$
4 Monte Carlo Studies
The theorems from Sections 2 and 3 show that rolling regression estimators have a limiting
distribution that depends on functionals of Brownian motion, and those new critical values
are provided in Table 1. The purpose of Theorem 3.1 is to provide a bias corrected estimate
of the average slope.
We explore the performance of various rolling regression estimators in this section. In
particular, the naive rolling regression estimator that uses standard normal critical values
is denoted RO for Rolling OLS. A second estimator uses the same rolling OLS point estimates with the new critical
values, and we refer to this estimator as adjRO. This rolling estimator accounts for Q(r) in
the limiting distribution, but does not correct for the possible bias process BT (r). Finally, we
include three versions of the corrected rolling regression estimator proposed in Section 3. The
estimator depends on a bandwidth parameter for the time varying parameter regression at
the first stage. The form of the allowable bandwidth is $h = c_2 T^{-\delta}$, where $1/4 < \delta < 1/3$.
We use δ = 0.30 and set c2 = 0.5, 0.75, and 1. The estimators are denoted SRb1, SRb2, and
SRb3.
Before we consider rolling regression, we illustrate the bias that one encounters if an
autoregressive process has an omitted trend, which could be considered a case where λ = 1.
To this end, we simulate the process discussed in Section 2 with a missing trend. The bias
of the estimated AR parameter is indexed by k1 , the magnitude of the missing trend. We
simulate an AR process with φ = 0 and consider values of k1 ranging from 0 to 15, with
T = 200 and 1000 replications for each value of k1 . The resulting bias for OLS and the
smoothed estimates (Sb1, Sb2, Sb3) appears in Figure 1 (see later). We note that there is
a small negative bias for the smoothed regression estimators. However, it does not change
as the magnitude of the missing trend grows, which illustrates that this procedure removes
the bias process from the estimator. Moreover, the bias in OLS grows as the omitted trend
indexed via k1 grows larger, consistent with the results of Perron (1989).
Given the potential for bias in rolling estimators, we illustrate several data generating
processes using rolling estimators. For each experiment, we allow for time series dimensions
T = 200, 400, and 600. The rolling window is set for λ = 0.20 so that there are 40, 80,
and 120 observations used in each estimation window. The number
of replications for each experiment is set to 10,000. The parameter of interest is β̄λ (r),
the average value of β(r) over each relevant subset of time. Our experiments report the
estimated coverage probability for a 95% uniform confidence band for β̄λ (r). We report the
mean average deviation over the range. In addition, we list the average width of the uniform
confidence band. For example, if two competing procedures both have 95% coverage, we
would prefer the method generating narrower bands.
Denote the current naive technology of using rolling OLS and employing critical values
from the standard normal distribution as RO (rolled OLS). If we use the adjusted critical
values tabulated in Table 1 with rolled OLS, we list the procedure as adjRO. Our smoothed
rolling procedure using the adjusted critical values is indexed by the bandwidths as SRb1,
SRb2, and SRb3.3
Our first experiment uses a simple static regression model given by
$$y_t = 2x_t + \epsilon_t$$
Table 2. Static Regression
Next, we generate data from a simple autoregressive model with no parameter changes.
The data generating process is given by
$$y_t = 0.80\,y_{t-1} + \epsilon_t$$
Given the rolling fraction of λ = 0.20, the procedures are attempting to estimate the AR
parameter based on 40 observations. The SRb3 procedure is best for this data generating
process, with small MAD and narrow confidence bands, but the adjusted rolling OLS pro-
cedure performs respectably. Again, the naive traditional rolling OLS procedure performs
poorly, with coverage of 17% even with a sample of 600 observations.
The next process we consider has a changing autoregressive coefficient
The adjusted rolling OLS estimator performs well once the sample size reaches 600 (ef-
fectively 120 with λ = 0.2), while the smoothed rolled estimators with larger bandwidths
perform well for all sample sizes and have narrower confidence bands.
For the next experiment, the autoregressive coefficients do not change, but there is an
omitted trend variable. This type of data generating process will cause bias in the autoregressive
parameter if one uses standard OLS, as illustrated in Figure 1. The process
is given by
$$y_t = 2 \times (t/T) + \rho y_{t-1} + \epsilon_t.$$
We report the coverage for the uniform bands for the autoregressive coefficient in Table 5.
The biases in standard OLS are apparent in both the rolled OLS and adjusted rolled
OLS based confidence bands with coverage of 0% for all sample sizes. The bias appears in
the MAD rows of Table 5 where both adjusted rolling OLS and rolling OLS are over five
times larger than the smoothed rolling estimators.
Figure 1: Bias for OLS and the smoothed estimates (Sb1, Sb2, Sb3). [Plot titled "AR Bias"; vertical axis: Bias; horizontal axis: k.]
The next data generating process is given by
so that the autoregressive coefficient starts at zero, and oscillates between 0.80 and zero.
The results are displayed in Table 6.
The performance of rolling OLS is again poor, and the adjusted rolling OLS is also below
nominal confidence levels. The smoothed rolling procedures improve as the sample size
increases, up to coverage of 84%.
Finally, we treat the case where the intercept term first increases and then decreases,
with
$$y_t = \sin[2\pi \times (t/T)] + \rho y_{t-1} + \epsilon_t.$$
Table 7. AR, Omitted trend, sin[2π × (t/T )]
accurate confidence bands. However, from the theorems in earlier sections, we know that bias
results when the second moments of the regressors are related to the parameters in the model.
A well-known case of this phenomenon is the time varying autoregressive model, which we
also explore in the Monte Carlo experiments. We see that employing rolling regression with
the use of the time-varying coefficient regression of Cai (2007) in the first stage removes the
bias from rolling regression.
5 An Empirical Application
Rolling regression was employed by O’Reilly and Whelan (2005) to examine the persistence
of inflation over time in the Euro area. Suppose that one estimates an AR(3) model of U.S.
inflation given by
observations from February of 1947 to May of 2020. As a preliminary analysis, we estimate
the AR(4) model and test for serial correlation in the residuals. We fail to reject the null of
no serial correlation at the 5% level, and we also reject the unit root hypothesis at the 1%
level. We proceed to analyze rolling estimates of the persistence parameter ω1 . Given our
preliminary findings of no unit root, our rolling analysis is not intended to check for unit
roots but to look for changing persistence. To this end, we consider a rolling window over
the sample of data. Adjusting for endpoints, we are left with 880 observations, so that using
λ = 0.20 results in using 176 observations in the rolling window.
The naive approach just estimates the parameter ω with OLS using 176 observations for
each window and includes heteroskedasticity robust standard errors. However, the confidence
bands incorporate the incorrect critical value of 1.96. Given that λ = 0.20, using Table 1
gives the appropriate critical value that accounts for the uniform nature of the confidence
bands, with a value of 3.265, just as used in the Monte Carlo experiments. We plot the
resulting rolling estimator of ω and the competing confidence bands in Figure 2. That is,
we plot the rolling OLS (RO) regression estimator with the incorrect confidence bands, and
the smoothed rolling regression (SR) with the corrected bands that account for rolling
over multiple periods.
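The window arithmetic and the effect of the critical value on band width can be sketched as follows. This is a minimal illustration on simulated data, not the inflation series: the AR(1) settings, the HC0 form of the robust standard errors, and the function name are assumptions. It only swaps the critical value around rolling OLS estimates, so it does not reproduce the SR procedure, which is centered on smoothed estimates.

```python
import numpy as np

def rolling_ols_bands(y, X, lam, crit):
    """Rolling OLS with White (HC0) standard errors and bands of +/- crit * se."""
    T, k = X.shape
    w = int(round(lam * T))                  # e.g. 0.20 * 880 = 176
    n_win = T - w + 1
    beta = np.empty((n_win, k))
    se = np.empty((n_win, k))
    for i in range(n_win):
        Xw, yw = X[i:i + w], y[i:i + w]
        XtX_inv = np.linalg.inv(Xw.T @ Xw)
        b = XtX_inv @ Xw.T @ yw
        u = yw - Xw @ b                      # window residuals
        meat = (Xw * u[:, None] ** 2).T @ Xw # sum of u_t^2 * x_t x_t'
        V = XtX_inv @ meat @ XtX_inv         # sandwich covariance
        beta[i] = b
        se[i] = np.sqrt(np.diag(V))
    return beta, beta - crit * se, beta + crit * se, w

# Illustrative data only: an AR(1) with 881 draws, giving 880 usable rows.
rng = np.random.default_rng(1)
n = 881
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.1 + 0.5 * series[t - 1] + rng.standard_normal()
y = series[1:]
X = np.column_stack([np.ones(n - 1), series[:-1]])

beta, lo_naive, hi_naive, w = rolling_ols_bands(y, X, lam=0.20, crit=1.96)
_, lo_unif, hi_unif, _ = rolling_ols_bands(y, X, lam=0.20, crit=3.265)
print(w)  # 176
```

With the same standard errors, moving from 1.96 to 3.265 widens every band by the factor 3.265/1.96, roughly 1.67, which is the sense in which the pointwise bands are too narrow for uniform statements.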
There are several conclusions from the empirical exercise of comparing the existing
naive procedure (RO) with the new technique (SR). The first is that the naive
estimate is not always contained in the corrected bands, suggesting that the bias process
is nonzero. For example, from May of 1969 through August of 1980, the naive rolling OLS
estimate is outside our bands, and the same is true later in the sample in November of 1991.
In addition, the naive RO procedure was shown to have narrow bands, so that confidence
levels are far below the prescribed target. A by-product of the incorrectly narrow bands of
the naive rolling OLS procedure is the illusion of a more volatile inflation persistence. That
is, the smoothed rolling regression estimates, with more accurate confidence bands and lack
of bias, indicate a gradual increase in persistence in the early 1970s, a peak in the mid
1980s, and a gradual decline until the early 2000s.
The examples with an autoregressive process show that a change in the trend or mean of
the process results in upward bias in the estimated persistence. In light of this effect, we
estimate the α parameter, which also influences the mean of the autoregressive process. The
smoothed rolling estimate is shown in Figure 3. We anticipate that the more the estimate
of α changes, the more upward bias we will see in the estimate of persistence. From Figure
3 coupled with Figure 2, this is indeed the case from May of 1964 until May of 1979, where
the naive rolling OLS estimate is much higher than our smoothed estimate, SR.
[Figure 2: Inflation Persistence. Series plotted: SR, RO.]
The use of our new procedure corrects for the poor numerical performance of confidence
bands by increasing the width. Moreover, autoregressive processes estimated via rolling OLS
are susceptible to upward bias in the persistence estimate arising from changes in the mean
of the process through the change in intercept parameters. We mitigate this bias to isolate
the persistence from the change in mean of the process. In the present case, increases in
the level of inflation are misinterpreted by researchers relying on rolling OLS as increases in
persistence.
[Figure 3: Estimated Intercept. Series plotted: SR.]
6 Conclusion
The results in this paper provide an asymptotic analysis of the ubiquitous rolling regression
estimator for a class of potentially non-stationary processes. Our analysis covers processes
whose non-stationarity may arise from changing parameters in the model. In particular, it
covers classes of varying-parameter autoregressions.
In the simplest cases, the usual procedure for using point-wise confidence bands
based on plus and minus 1.96 times the standard error will lead to confidence bands that
are much too narrow. The limiting distribution is a functional of Gaussian processes, but
is readily tabulated. Our results suggest that one should use our new critical values which
depend on the window width to determine uniform confidence bands.
In addition to the new critical values, we showed the potential for a bias process arising
from a relationship between the distribution of the regressors and the regression parameters.
From an empirical standpoint, a dynamic model will be most susceptible to such a process.
However, we propose a procedure of averaging smooth coefficient time-varying regression
estimators over the relevant window. The resulting distribution is the same as in the case
with no bias process, and should be used when applying rolling regression in dynamic models.
The empirical example covers time-varying persistence in inflation, and we show that the
new corrected bands suggest that the persistence is less variable than previous studies would
suggest.
Rolling regression is a natural procedure, and one employed in many recent empirical
studies. The choice of window width for the typical rolling procedure is often based on having
“enough” observations in order to estimate parameters, or based on some relevant time frame
for the question at hand. Our results show that one can retain the original idea of rolling
regression, as well as the usual parametric convergence rates, but the statistical distribution
is adjusted to obtain proper coverage. Moreover, the adjusted rolling regression procedures
are simple to implement, with narrower confidence bands relative to fully nonparametric
time-varying coefficient models with uniform confidence bands.
7 Proofs
Proof of Theorem 2.1:
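$$
\begin{aligned}
\sqrt{T}\left(\hat{\beta}_\lambda(r) - \bar{\beta}_\lambda(r)\right)
&= \sqrt{T}\left[\left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1}
   \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s y_s - \bar{\beta}_\lambda(r)\right] \\
&= \sqrt{T}\left[\left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1}
   \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \left(x_s^\top \beta\!\left(\frac{s}{T}\right) + \epsilon_s\right)
   - \bar{\beta}_\lambda(r)\right] \\
&= \sqrt{T}\left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1}
   \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \epsilon_s + \sqrt{T} B_T(r) \\
&\quad + \sqrt{T}\left(\frac{1}{T\lambda}\sum_{s=[rT-T\lambda+1]}^{[rT]} \beta\!\left(\frac{s}{T}\right)
   - \bar{\beta}_\lambda(r)\right)
\end{aligned}
$$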
where $B_T(r)$ is the bias process defined in the statement of the theorem. The final term is
o(1) by Riemann integrability of β(s/T). Then applying Corollary 2 of Wu and Zhou (2011)
for locally stationary processes, we have
$$
\frac{1}{\sqrt{T}} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \epsilon_s \Rightarrow Q(r) - Q(r-\lambda).
$$
Now consider $x_s x_s^\top$. The process $x_s x_s^\top - M(s/T)$ is mean zero, and following the proof of
so that
$$
\frac{1}{T} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top \xrightarrow{p} \int_{r-\lambda}^{r} M(s)\,ds
$$
uniformly in r.
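The displays above can be combined into a single limit. The following is a hedged summary rather than the authors' wording: it assumes that the two limits hold jointly, that $\int_{r-\lambda}^{r} M(s)\,ds$ is invertible for every r considered, and it writes the first term of the decomposition as the product of $\left(T^{-1}\sum x_s x_s^\top\right)^{-1}$ and $T^{-1/2}\sum x_s \epsilon_s$.

```latex
\sqrt{T}\left(\hat{\beta}_\lambda(r) - \bar{\beta}_\lambda(r)\right) - \sqrt{T} B_T(r)
  = \left(\frac{1}{T}\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1}
    \frac{1}{\sqrt{T}}\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \epsilon_s + o(1)
  \Rightarrow \left(\int_{r-\lambda}^{r} M(s)\,ds\right)^{-1}\left[Q(r) - Q(r-\lambda)\right].
```

The o(1) term is the final term of the decomposition, and the limit follows from the continuous mapping theorem under the stated assumptions.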
Proof of Theorem 3.1 For the local-linear estimator, define the term
$$
R_{T,0}(s/T) = \frac{1}{Th} \sum_{t=[Th]}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t
$$
where
$$\xi_T = T^{-1/2} h^{-1} + h$$
from equations (26) and (27) of Zhou and Wu (2010) and (A.3) of Zhang and Wu (2012).
We have
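$$
\begin{aligned}
\sqrt{T}\left(\hat{\beta}^{*}_\lambda(r) - \bar{\beta}_\lambda(r)\right)
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} \left(\tilde{\beta}(s/T) - \beta(s/T)\right) \\
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} M(G, s/T)^{-1} R_{T,0}(s/T)
   + O_p\!\left(T^{1/2} \kappa_T \xi_T\right),
\end{aligned}
$$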
where the last term is op (1) given our bandwidth choice. Then
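$$
\begin{aligned}
\sqrt{T}\left(\hat{\beta}^{*}_\lambda(r) - \bar{\beta}_\lambda(r)\right)
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} M(G, s/T)^{-1}
   \frac{1}{Th} \sum_{t=[Th]}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t + o_p(1) \\
&= \frac{1}{\sqrt{T}\lambda} \sum_{t=[Th]}^{[T(1-h)]} \frac{1}{Th}
   \sum_{s=[rT-T\lambda+1]}^{[rT]} M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t + o_p(1) \\
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} \sum_{t=[Th]}^{[T(1-h)]}
   M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right) \frac{1}{Th}\, x_t \epsilon_t + o_p(1).
\end{aligned}
$$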
Consider
$$
D(r) = \sum_{s=[Th]}^{[rT]} \sum_{t=[Th]}^{[T(1-h)]} M(G, s/T)^{-1}\,
       \frac{1}{Th} K\!\left(\frac{t-s}{Th}\right) \frac{1}{\sqrt{T}\lambda}\, x_t \epsilon_t .
$$
We write
Consider the second term on the right hand side. Denote the eigenvalue decomposition as
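$M(G, s/T) = \Gamma(s/T)\Theta(s/T)\Gamma(s/T)^{-1}$ so that
$$
\begin{aligned}
\left\| \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right) \right\|
&\le \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} \left\| M(G, s/T)^{-1} \right\| K\!\left(\frac{t-s}{Th}\right) \\
&= \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right)
   \mathrm{tr}\!\left[\Gamma(s/T)\Theta(s/T)^{-2}\Gamma(s/T)^{-1}\right] \\
&\le \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) c_M^{-2} k
\end{aligned}
$$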
where $c_M$ is the lower bound of the eigenvalues of $M(s/T)$. Note that $t \le [Tr]$, so that $t < s$.
Then
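$$
\frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right)
\sim \int_{r}^{1-h} \frac{1}{h} K\!\left(\frac{u-v}{h}\right) dv
= \int_{\frac{r-u}{h}}^{\frac{1-h-u}{h}} K(z)\,dz
$$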
where z = (v − u)/h, u < 1 − h, and r < u. Then as h → 0, this integral is O(h), since
K(z) = 0 for |z| ≥ 1. Hence, D1 (r) converges in probability to 0 uniformly in r. The
argument is similar for D2 (r).
For D∗ (r) we note that
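$$
\frac{1}{Th} \sum_{s=[Th]}^{[T(1-h)]} M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right)
\sim \int_{h}^{1-h} \frac{1}{h} M(G, v)^{-1} K\!\left(\frac{u-v}{h}\right) dv
= \int_{\frac{u-r}{h}}^{\frac{u-r+\lambda}{h}} M(G, u - zh)^{-1} K(z)\,dz
$$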
Then
$$
D^{*}(r) = \frac{1}{\sqrt{T}\lambda} \sum_{t=[Th]}^{[rT]} M(G, t/T)^{-1} x_t \epsilon_t + o_p(1)
$$
$$
\Lambda(s) = \frac{1}{\lambda^{2}} M(s)^{-1} \Omega(s) M(s)^{-1}.
$$
References
Adrian, T., R.K. Crump and E. Moench (2015). Regression-Based Estimation of Dynamic
Asset Pricing Models. Journal of Financial Economics, 118, 211-244.
Andrews, D.W.K. (1993). Tests for Parameter Instability and Structural Change With
Unknown Change Point. Econometrica, 61, 821-856.
Blanchard, O. (2018). Should We Reject the Natural Rate Hypothesis? Journal of
Economic Perspectives, 32, 97-120.
Cai, Z. (2007). Trending Time-Varying Coefficient Time Series Models with Serially
Correlated Errors. Journal of Econometrics, 136, 163-188.
Campello, M., A. Galvao and T. Juhl (2019). Testing for Slope Heterogeneity Bias in Panel
Data Models. Journal of Business and Economic Statistics, 37, 749-760.
Chen, B. and Y. Hong (2012). Testing for Smooth Structural Changes in Time Series
Models via Nonparametric Regression. Econometrica, 80, 1157-1183.
Clark, T.E. and M.W. McCracken (2009). Improving Forecast Accuracy by Combining
Recursive and Rolling Forecasts. International Economic Review, 50, 363-395.
Georgiev, I., D.I. Harvey, S.J. Leybourne and A.R. Taylor (2018). Testing for Parameter
Instability in Predictive Regression Models. Journal of Econometrics, 204, 101-118.
Hansen, B.E. (1996). Inference When a Nuisance Parameter Is Not Identified Under the
Null Hypothesis. Econometrica, 64, 413-430.
Inoue, A., L. Jin and B. Rossi (2017). Rolling Window Selection for Out-of-Sample
Forecasting With Time-Varying Parameters. Journal of Econometrics, 196, 55-67.
Jiménez, G., S. Ongena, J.-L. Peydró and J. Saurina (2017). Macroprudential Policy,
Countercyclical Bank Capital Buffers, and Credit Supply: Evidence from the Spanish
Dynamic Provisioning Experiments. Journal of Political Economy, 125, 2126-2177.
Johnston, G. (1982). Probabilities of Maximal Deviations for Nonparametric Regression
Function Estimates. Journal of Multivariate Analysis, 12, 402-414.
Linnainmaa, J.T. and M.R. Roberts (2018). The History of the Cross Section of Stock
Returns. Review of Financial Studies, 31, 2606-2649.
López-Salido, D., J.C. Stein and E. Zakrajšek (2017). Credit-Market Sentiment and the
Business Cycle. Quarterly Journal of Economics, 132, 1373-1426.
Lu, Z. and O. Linton (2007). Local Linear Fitting Under Near Epoch Dependence.
Econometric Theory, 23, 37-70.
O’Reilly, G. and K. Whelan (2005). Has Euro-Area Inflation Persistence Changed Over
Time? Review of Economics and Statistics, 87, 709-720.
Perron, P. (1989). The Great Crash, the Oil Price Shock and the Unit Root Hypothesis.
Econometrica, 57, 1361-1401.
Swanson, E.T. and J.C. Williams (2014). Measuring the Effect of the Zero Lower Bound
on Medium- and Longer-Term Interest Rates. American Economic Review, 104, 3154-3185.
Wu, W.B. and Z. Zhou (2011). Gaussian Approximations for Non-Stationary Multiple
Time Series. Statistica Sinica, 21, 1397-1413.
Zhao, Z. and W.B. Wu (2008). Confidence Bands in Nonparametric Time Series Regression.
Annals of Statistics, 36, 1854-1878.
Zhou, Z. and W.B. Wu (2010). Simultaneous Inference of Linear Models with Time Varying
Coefficients. Journal of the Royal Statistical Society, Series B, 72, 513-531.