The Distribution of Rolling Regression Estimators∗†

Zongwu Cai, Ted Juhl
University of Kansas
January 4, 2021

Abstract
We find the asymptotic distribution for rolling linear regression models using various
window widths. The limiting distribution depends on the width of the rolling window,
and on a “bias process” that is typically ignored in practice. Based on the distribution,
we tabulate critical values used to find uniform confidence intervals for the average
values of regression parameters over the windows. We propose a corrected rolling
regression technique that removes the bias process by rolling over smoothed parameter
estimates. The procedure is illustrated using a series of Monte Carlo experiments.
The paper includes an empirical example to show how the confidence bands suggest
alternative conclusions about the persistence of inflation.

Keywords: Parameter instability; Nonparametric estimation; Rolling regressions; Uniform
confidence intervals; Nonstationary.

1 Introduction
Rolling regression is often employed in many applied fields as a method to characterize
changing relationships over time. As a simple robustness check, regression parameters are
estimated using some fraction of the data early in the sample. The fixed fraction is then
“rolled” through the sample, so that the estimated regression parameters may vary over time.
This intuitive procedure is one method of examining the stability of statistical relationships
over time. A cursory search will reveal that there are rolling regression routines written
in statistical computing languages or packages such as R, SAS, STATA, Matlab, RATS,
Python, Eviews, and Excel.

∗ We thank Graham Elliott, Bruce Hansen, and Ulrich Müller for comments on earlier versions of this
paper.
† Contact information: Zongwu Cai, Department of Economics, University of Kansas, Lawrence, KS, 66045
([email protected]); Ted Juhl, School of Business, University of Kansas, Lawrence, KS 66045 ([email protected])

As part of the prototypical exercise of reporting rolling regression estimates, researchers
often plot bands around the point estimates as a way to conduct some type of ocular infer-
ence about whether there are changes in relationships over time. The regression bands are
constructed using estimated standard errors from the regression parameters in the relevant
time period for the rolling window. Then, these estimated standard errors are multiplied
by critical values from the standard normal distribution. Recent papers using rolling regres-
sion with confidence bands include Swanson and Williams (2014), Linnainmaa and Roberts
(2017), Adrian et al. (2015), Blanchard (2018), Georgiev et al. (2018), Jiménez et al.
(2017), and López-Salido et al. (2017), among others.1
In this paper, we characterize the population parameters that rolling regression is at-
tempting to estimate and we find the distribution of the rolling estimator. We provide new
critical values that are used to construct asymptotically correct confidence bands for the
estimated function. As part of this exercise, we show that rolling regression contains a bias
process that may inhibit inference about the true population parameters. We develop a new
procedure to estimate rolling regression parameters that is not affected by the bias process.
The original idea of rolling regression is an intuitive one, in that we want to use regression
over different time intervals to examine how the relationship may have changed. The window
width may appear to be ad-hoc, but it is often based on equating the window width with
some number of observations that seems appropriate for time series estimation, or based on
decades, or some other relevant time frame. However, the results of this paper suggest that
rolling regression is a compromise of the usual bias variance tradeoff. In particular, we can
obtain the usual parametric convergence rates for rolling regression estimates (rather than
nonparametric ones), although with a different limiting distribution.
The remainder of the paper is structured as follows. In Section 2, we develop the model,
assumptions, and asymptotic distribution results for the existing rolling regression procedures.
In Section 3, a new procedure is proposed and discussed to deal with the bias process.
Section 4 provides Monte Carlo evidence for the competing procedures.
An empirical example is treated in Section 5 to illustrate differences in results based on our
techniques versus the traditional confidence bands for the persistence of inflation. Section 6
concludes. Finally, all technical proofs are gathered in Section 7.

1 Rolling regression is also used in forecasting in work by Clark and McCracken (2009) in a framework
that allows for structural change in regression parameters.

2 Model and Assumptions

It is natural to ask what we are attempting to estimate by employing rolling regression.
Consider a standard regression model given by

\[ y_t = x_t^\top \beta + \epsilon_t, \qquad 1 \le t \le T, \]

where xt is a p-dimensional regressor. Let λ be the fraction of the total sample of T obser-
vations that is used in the rolling sample of data. The rolling regression estimator uses the
[T λ] observations, where [x] denotes the integer part of x, and we index each of the periods
with r so that we have
\[
\hat\beta_\lambda(r) = \left( \frac{1}{T\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top \right)^{-1} \frac{1}{T\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s y_s .
\]
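The estimator above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `rolling_ols`, the simulated data, and the use of NumPy are assumptions made here. Each window holds [Tλ] consecutive observations, and the factors 1/(Tλ) cancel, so each window is an ordinary least squares fit.

```python
import numpy as np

def rolling_ols(y, X, lam):
    """Rolling OLS over windows of w = [T * lam] consecutive observations.

    Row i of the result is the estimate from observations i+1, ..., i+w
    (1-based), which corresponds to r = (i + w) / T in the text.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, p = X.shape
    # Integer part of T * lam; the tiny offset guards against float rounding.
    w = int(np.floor(T * lam + 1e-9))
    if not p <= w <= T:
        raise ValueError("window must hold at least p and at most T observations")
    betas = np.empty((T - w + 1, p))
    for i in range(T - w + 1):
        Xw, yw = X[i:i + w], y[i:i + w]
        # Solve the normal equations (X'X) b = X'y for this window.
        betas[i] = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    return betas

# Illustrative data (not from the paper): constant slope of 2 plus noise.
rng = np.random.default_rng(0)
T = 200
X = rng.standard_normal((T, 1))
y = 2.0 * X[:, 0] + 0.5 * rng.standard_normal(T)
betas = rolling_ols(y, X, lam=0.20)
print(betas.shape)  # one row per window position
```

With T = 200 and λ = 0.20 each window contains 40 observations, so the loop produces one estimate per window end point from observation 40 onward.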

Clearly, if a constant coeﬃcient regression model is correctly specified, then the rolling
regression estimator would estimate the same parameter, β, in each of the sub-samples using
λ × 100% of the data. Such an exercise is not interesting as β is constant over time, and
rolling estimators will be inefficient relative to the full sample regression estimator. Consider
the model
\[ y_t = x_t^\top \beta\!\left(\frac{t}{T}\right) + \epsilon_t \]
so that the regression parameter changes over time. The data are assumed to become more
dense around a point t as T increases, a device employed in many studies, beginning notably
in Robinson (1989); see, for example, Cai (2007) for details. Given that the regression
parameter is potentially changing at each point in time, as T increases, we hope to estimate
the “average” value of β(t/T ) in the rolling window. To be specific, define
\[
\bar\beta_\lambda(r) = \frac{1}{\lambda} \int_{r-\lambda}^{r} \beta(u)\,du,
\]

which is the population parameter of interest indexed by λ and r. This quantity represents
the average of the coeﬃcients at point r given a rolling window fraction λ.
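As a small worked illustration of this target, the sketch below approximates the window average numerically for a hypothetical coefficient path with one break at u = 0.5. The path, the function name `average_beta`, and the midpoint rule are assumptions chosen for illustration only.

```python
import numpy as np

def average_beta(beta, r, lam, n_grid=10_000):
    """Midpoint-rule approximation of (1/lam) * integral of beta(u) over [r - lam, r]."""
    edges = np.linspace(r - lam, r, n_grid + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    # Averaging over equally spaced midpoints equals the integral divided by lam.
    return float(np.mean(beta(mid)))

def beta_break(u):
    """Hypothetical path: the coefficient equals 1 before u = 0.5 and 2 afterwards."""
    return np.where(u < 0.5, 1.0, 2.0)

# The window [0.4, 0.6] straddles the break, so the target lies between 1 and 2.
print(average_beta(beta_break, r=0.6, lam=0.2))
```

When the window straddles a break, the target is a weighted mix of the two regimes, which is why the rolling estimate moves gradually rather than jumping at the break date.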
Our first task is to ascertain whether rolling regression indeed provides a consistent
estimate of the parameter β̄λ (r). We make several assumptions about the data generating
process for the results in this section. Our theory makes use of the characterizations of
processes from Zhou and Wu (2010). First, we allow for time varying processes, a form
of non-stationarity. In particular, given that we are attempting to estimate time-varying
parameters, it is natural to allow for non-stationary processes. Changing values of β(t/T )
necessarily induces non-stationarity in yt . Moreover, if the model is dynamic, then xt is also
non-stationary if β(t/T ) changes. To this end, consider processes depending on deterministic
functions coupled with the iid process vt . Let Ft = {. . . , vt−1 , vt }. The processes for the
covariates and errors are given by
\[
x_t = G\!\left(\frac{t}{T}, \mathcal{F}_t\right) \quad\text{and}\quad \epsilon_t = H\!\left(\frac{t}{T}, \mathcal{F}_t\right),
\]

respectively, where the functions G and H allow for non-stationary processes in that the
moments may change over time. The index t is scaled by the sample size T so that the data
are assumed to be observed more densely as we collect more observations. In particular,
define the second moment matrix of the xt process as
\[
M(t) = E\left[ G\!\left(\frac{t}{T}, \mathcal{F}_t\right) G\!\left(\frac{t}{T}, \mathcal{F}_t\right)^{\top} \right].
\]

The process associated with $x_t \epsilon_t$ is denoted $GH$, and the covariance matrix of this product
process is
\[
\Omega(t) = E\left[ GH\!\left(\frac{t}{T}, \mathcal{F}_t\right) GH\!\left(\frac{t}{T}, \mathcal{F}_t\right)^{\top} \right].
\]
Statisticians characterize dependence in data in several ways; linear processes, α-mixing,
β-mixing, etc. Recent papers by Chen and Hong (2012) and Cai (2007) both make use of
β-mixing and α-mixing assumptions, but the data are assumed to be stationary in both
papers. The assumption in Inoue et al. (2017) allows the data to be near-epoch dependent
(NED). As argued in Inoue et al. (2017), the NED assumption is more general than the
α-mixing assumption, allows for heterogeneity over time, which is necessary for a time-varying
parameter framework, and overcomes several undesirable features of the α-mixing assumption
as addressed by Lu and Linton (2007). In this paper, we follow Zhou and Wu (2008) and
allow for non-stationary processes that might arise from dynamic models with time-varying
parameters. To this end, let v0′ be an iid copy of the variable v0 that is part of Fj . Define
Fj∗ = {. . . , v−1 , v0′ , v1 , . . . , vj−1 , vj }. We wish to characterize the dependence of a process by
measuring the effects of a shock to the system. For the variable xt , define
\[
\delta_q(x, j) = \sup_{t} \left\| G\!\left(\frac{t}{T}, \mathcal{F}_j\right) - G\!\left(\frac{t}{T}, \mathcal{F}_j^{*}\right) \right\|_q
\]
for some q ≥ 1, which is a measure of the eﬀect of a shock after j periods. Limiting the
allowable dependence in a process amounts to specifying suitable rates of decay for δq (x, j)
as j increases.
The dependence in these processes over time is a separate issue from whether the process
is stationary. One class of non-stationary processes is the unit root process, where the
autoregressive parameter is related to both dependence of the process over time as well as
whether the variance of the process is constant over time. In the framework of this paper,
we allow the processes to be non-stationary in a way that is separate from the dependence
over time. To this end, we call a process G stochastically Lipschitz continuous if
\[
\sup_{0 \le s \le t \le T} \left\| G\!\left(\frac{s}{T}, \mathcal{F}_0\right) - G\!\left(\frac{t}{T}, \mathcal{F}_0\right) \right\|_2 \le c_1 \left| \frac{t - s}{T} \right|
\]

for finite c1 > 0.

Assumption 1 The true data generating process is given by

\[ y_t = x_t^\top \beta\!\left(\frac{t}{T}\right) + \epsilon_t , \]
and β(u) is Riemann integrable on [0, 1].

Assumption 2 The error process $\epsilon_t$ is a martingale difference with respect to $\mathcal{F}_t$, so that

\[ E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0. \]

Assumption 3 For $x_t \epsilon_t$ we have

\[ N^{1/4+\gamma} \sum_{j=N}^{\infty} \delta_4(x\epsilon, j) < \infty \]

for some γ > 0, and for $x_t^2$ we have $\sum_{j=N}^{\infty} \delta_4(x, j) < \infty$.

Assumption 4 The functions G and GH are stochastically Lipschitz continuous processes.

Assumption 5 The smallest eigenvalue of M (t) is bounded away from zero.

Assumption 1 allows for the regression parameters to vary over time, perhaps with dis-
continuities. The martingale difference assumption could be relaxed and would require a
long-run variance estimator for inference procedures. In the present case, we can assume a
dynamic model may be specified to remove any correlation in $\epsilon_t$. Moreover, because of
Assumptions 3 and 4, we allow for dynamic models with changing parameters, so that we
can assume martingale differences for the error process. In particular, autoregressive models
with time varying coefficients are considered in Zhang and Wu (2012). They show that, given
(standard) conditions on the time varying roots of the characteristic function, the process is
locally stationary and they characterize the decay in dependence. These models are shown
to satisfy similar conditions to the conditions in this paper. Assumption 5 is sufficient for
the rolling regression estimator to exist in the limit for any of the rolling windows one might
consider.
We now provide the limiting distribution of the rolling regression estimator with its proof
given in Section 7.

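Theorem 2.1 Suppose that Assumptions 1 to 5 hold and that r ∈ [λ, 1]. Then,
\[
\sqrt{T}\,\bigl(\hat\beta_\lambda(r) - \bar\beta_\lambda(r) - B_T(r)\bigr) \Rightarrow \left( \int_{r-\lambda}^{r} M(s)\,ds \right)^{-1} \bigl[ Q(r) - Q(r-\lambda) \bigr],
\]
where
\[
B_T(r) = \left( \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top \right)^{-1} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top \left[ \beta\!\left(\frac{s}{T}\right) - \frac{1}{T\lambda} \sum_{s=[rT-T\lambda]}^{[rT]} \beta\!\left(\frac{s}{T}\right) \right]
\]
serves as the asymptotic bias term, ⇒ indicates weak convergence, and Q(r) denotes a
p-dimensional Gaussian process with covariance $E[Q(r_1)Q(r_2)^\top] = \int_0^{\min(r_1,r_2)} \Omega(s)\,ds$.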

The result provides several insights toward the use of rolling regression. First, the limiting
distribution involves Q(r), a functional of Brownian motion. In the search for confidence
bands of the “average” of the regression parameters, using critical values from a standard
normal distribution is incorrect.2 As we illustrate in the Monte Carlo section, confidence bands
for the estimate of the average parameter vector, β̄λ (r), that are built from standard normal
critical values will be too narrow, so that coverage probabilities are well below target levels.
Intuitively, the smaller the rolling window, the wider the confidence bands that are
required.
In addition to the finding of the limiting distribution involving functionals of Brownian
motion, the distribution is affected by a bias process denoted BT (r). If the parameter vector
β(s/T ) is constant over the entire sample, then this process disappears. However, in such
cases, rolling regression is not interesting. If $x_s x_s^\top$ is unrelated to β(s/T ), then the process
will have zero mean, and hence we can apply a functional central limit theorem to it. In this
case, we can think of this term as generating an additional term in the variance process.
If $x_s x_s^\top$ is related to β(s/T ), then the bias process has non-zero mean. The intuition for
the bias process is that regression over the rolling interval will weight the data, and hence
the parameter vector, with more weight on observations with larger values of $x_s x_s^\top$. If these
quantities are related to the parameter vector, then we may find inconsistent estimates of the
average, given that some parameter values are over-weighted. A related phenomenon arises
for fixed effects regression in panel data models. In particular, Campello et al. (2019) show
that if cross-sectional units are heterogeneous in slope, a bias may result if the heterogeneity
is related to the second moments of the regressors.
Theorem 2.1 allows for both $x_t$ and $\epsilon_t$ to exhibit a form of non-stationary behavior,
which complicates the limiting distribution of the rolling estimator. The following corollary
provides a simplification for the case in which $x_t$ and $\epsilon_t$ are both stationary. First, the bias
process $B_T(r)$ disappears. In addition, M (s) and Ω(s) are constant, so that $\int_{r-\lambda}^{r} M(s)\,ds = \lambda M$,
where $M = E(x_s x_s^\top)$, and $\int_0^{\min(r_1,r_2)} \Omega(s)\,ds = \min(r_1, r_2)\,\Omega$ with $\Omega = E(x_s x_s^\top \epsilon_s^2)$.

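To illustrate the bias process, we consider a simple autoregressive model given by

\[
y_t = \alpha\!\left(\frac{t}{T}\right) + \phi\!\left(\frac{t}{T}\right) y_{t-1} + \epsilon_t
\quad\text{where}\quad
\begin{pmatrix} \alpha(t/T) \\ \phi(t/T) \end{pmatrix} = \begin{pmatrix} k_1 \times t/T \\ \phi \end{pmatrix},
\]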

so that the autoregressive coefficient φ(t/T ) is constant but the intercept term is changing
over the sample. It is easy to show, regardless of the width of the rolling window, that the
asymptotic bias for estimating the parameter φ is given by

\[ \frac{k_1^2 (1 - \phi)(1 - \phi^2)}{k_1^2 (1 - \phi^2) + 12\sigma^2} . \]

To simplify further, if φ = 0, the bias is

\[ \left[ 1 + 12\sigma^2 / k_1^2 \right]^{-1} . \]

2 A large literature on testing for structural change with unknown change point illustrates the need for
different critical values in hypothesis testing.

This simple case is much like the result in Perron (1989), where an omitted (broken) trend in
the data generating process causes a bias in the estimates of the autoregressive parameters.
Even if there is no serial correlation, the estimated autoregressive parameter may be close
to one. The larger the magnitude of the omitted trend, the more bias. In general, the bias
process BT (r) will be larger in magnitude as there is more correlation in the second moment
of xt and the parameter vector β(t/T ).
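To make the size of this bias concrete, the sketch below simply evaluates the asymptotic bias expression stated above for a few trend magnitudes. The function name `ar_slope_bias` and the chosen values of k1 and σ² are illustrative assumptions, not values taken from the paper.

```python
def ar_slope_bias(k1, phi, sigma2):
    """Evaluate k1^2 (1 - phi)(1 - phi^2) / [k1^2 (1 - phi^2) + 12 sigma^2]."""
    num = k1**2 * (1.0 - phi) * (1.0 - phi**2)
    den = k1**2 * (1.0 - phi**2) + 12.0 * sigma2
    return num / den

# Illustrative values: phi = 0 and sigma^2 = 1, with a growing trend slope k1.
for k1 in (0.0, 1.0, 5.0, 15.0):
    print(k1, round(ar_slope_bias(k1, phi=0.0, sigma2=1.0), 4))
```

With φ = 0 the expression reduces to 1 / (1 + 12σ²/k1²), which rises from 0 toward 1 as k1 grows, matching the remark that the estimated autoregressive parameter may be close to one even without serial correlation.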

2.1 Critical Values

The limiting distribution is a functional of the process given by Q(r). The usual (incorrect)
procedure for constructing confidence bands is to calculate a standard error estimate for
each of the sub-periods in the rolling regression, and then employ standard normal critical
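values. For a rolling regression indexed by r, we would use a variance estimator given by
\[
\hat V(\hat\beta_\lambda(r)) = \left( \sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top \right)^{-1} \left( \sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top \hat\epsilon_s^2 \right) \left( \sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top \right)^{-1} .
\]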
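The variance estimator for a single window can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `window_sandwich` and the simulated window are hypothetical, and the residuals are the OLS residuals from the same window. The last lines form the usual (incorrect) pointwise band with the standard normal critical value 1.96.

```python
import numpy as np

def window_sandwich(y, X):
    """OLS estimate and heteroskedasticity-robust variance for one window."""
    A = X.T @ X                                # sum of x_s x_s'
    beta = np.linalg.solve(A, X.T @ y)
    resid = y - X @ beta                       # within-window OLS residuals
    meat = (X * resid[:, None] ** 2).T @ X     # sum of x_s x_s' * resid_s^2
    A_inv = np.linalg.inv(A)
    return beta, A_inv @ meat @ A_inv

# Hypothetical window of 40 observations with an intercept and one regressor.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(40), rng.standard_normal(40)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(40)
beta_hat, V_hat = window_sandwich(y, X)
se = np.sqrt(np.diag(V_hat))
naive_band = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])
print(naive_band)
```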

If there is no bias process, it is easy to see that the standardized process will converge to a
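Gaussian process, Q̃(r), with covariance
\[
E\left[ \tilde Q(r_1) \tilde Q(r_2)^\top \right] = \left( \int_{r_1-\lambda}^{r_1} \Omega(s)\,ds \right)^{-1/2} \int_{r_2-\lambda}^{r_1} \Omega(s)\,ds \left( \int_{r_2-\lambda}^{r_2} \Omega(s)\,ds \right)^{-1/2}
\]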

for r1 < r2 (or 0 if r1 < r2 − λ). The process has variance Ik , and the dependence arises
from the non-zero covariance when r1 > r2 − λ.
From the limiting distribution of rolling regression estimators, we see that if we are
attempting to construct uniform confidence bands for β̄λ (r), standard normal critical values
are inappropriate. To this end, we want to find critical values θλ such that
\[
P\left( \sup_{r \in [\lambda, 1]} \left| \tilde Q(r) \right| \le \theta_\lambda \right) = 0.95.
\]

If the variance process is constant over r, so that Ω(s) = Ω, then the distribution of Q̃(r)
is a function of standard Brownian motion, adjusted for different values of the fraction of
data used in rolling, λ. To illustrate the distribution under this case when k = 1, we simulate
critical values by generating random variables $u_t$ from a standard normal distribution. Then,
the standard Brownian motion W (r) is simulated with $\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} u_t$, where T = 10,000. For each
value of λ, we find $\sup_{r \in [\lambda,1]} \frac{1}{\sqrt{\lambda}} |W(r) - W(r-\lambda)|$, and repeat 100,000 times for values of λ
from 0.05 to 0.85. The resulting critical values are analogous to those calculated in Andrews
(1993) for tests of structural change in regression models. The 0.90, 0.95, and 0.99 quantiles
of the empirical distribution are provided in Table 1.
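The simulation described above can be sketched as follows. This is a scaled-down illustration rather than a reproduction of Table 1: the function name `simulate_sup_stat` is hypothetical, and T and the number of replications are deliberately much smaller than the 10,000 and 100,000 used in the text, so the quantiles it prints will only approximate the tabulated values.

```python
import numpy as np

def simulate_sup_stat(lam, T=1000, reps=2000, seed=0):
    """Draws of sup over r in [lam, 1] of |W(r) - W(r - lam)| / sqrt(lam)."""
    rng = np.random.default_rng(seed)
    w = int(round(lam * T))                    # window length in grid steps
    u = rng.standard_normal((reps, T))
    # Discretized Brownian motion on the grid 0, 1/T, ..., 1 with W(0) = 0.
    W = np.concatenate([np.zeros((reps, 1)), np.cumsum(u, axis=1)], axis=1) / np.sqrt(T)
    incr = W[:, w:] - W[:, :-w]                # W(r) - W(r - lam) for each grid r >= lam
    return np.max(np.abs(incr), axis=1) / np.sqrt(lam)

draws = simulate_sup_stat(lam=0.20)
print(np.quantile(draws, [0.90, 0.95, 0.99]))
```

Because the supremum is taken over a coarser grid than in the paper, the simulated quantiles tend to sit slightly below the tabulated ones; increasing T and the replication count narrows the gap.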
The width of the confidence bands increases as the rolling window becomes smaller. For
example, if we use 10% of the sample in the rolling window, a 95% confidence band will extend 3.499
times the relevant standard error on each side of the estimate. The factor for a 20% window is 3.265,
while an 85% window width reduces the factor to 2.377. If one increases the window width
to 100% of the sample, we obtain the usual 1.96. These are the critical values that are
appropriate for confidence bands of rolling regressions in the absence of the bias process. At
a minimum, these tables provide values to replace critical values from the standard normal
tables when applying rolling regression.
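As a small usage sketch, the snippet below builds a band from a point estimate and a standard error using a handful of the 0.95 entries of Table 1. The dictionary `CV95`, the function `uniform_band`, and the example estimate and standard error are hypothetical conveniences introduced here; only the critical values themselves come from the table.

```python
# 95% critical values copied from Table 1 for a few window fractions.
CV95 = {0.10: 3.499, 0.20: 3.265, 0.50: 2.825, 0.80: 2.459, 0.85: 2.377}

def uniform_band(estimate, std_err, lam, table=CV95):
    """Band estimate +/- cv * std_err, with cv looked up by window fraction."""
    cv = table[lam]
    return estimate - cv * std_err, estimate + cv * std_err

lower, upper = uniform_band(0.80, 0.05, lam=0.20)   # hypothetical estimate and SE
naive_lower, naive_upper = 0.80 - 1.96 * 0.05, 0.80 + 1.96 * 0.05
# The uniform band is wider than the pointwise band by the factor 3.265 / 1.96.
print((upper - lower) / (naive_upper - naive_lower))
```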
In the more realistic case of a time varying variance process governed by Ω(s), critical
values will depend on Ω(s) which enters in the Gaussian process. We follow Hansen (1996)
and simulate the limiting distribution of the process so that we have the proper critical
values for the process associated with Ω(s). To this end, define the residuals

\[ \hat\epsilon_s = y_s - x_s^\top \hat\beta_\lambda(s/T) \]

and let $v_s$ be a generated standard normal variable. Then, we can simulate the process Q̃(r)
via
\[
\tilde U(r) = \left( \frac{1}{T} \sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s x_s^\top \hat\epsilon_s^2 \right)^{-1/2} \frac{1}{\sqrt{T}} \sum_{s=[T(r-\lambda)+1]}^{[Tr]} x_s \hat\epsilon_s v_s .
\]

Table 1. Critical Values for Rolling Confidence Bands
Confidence Level Confidence Level Confidence Level
λ 0.90 0.95 0.99 λ 0.90 0.95 0.99 λ 0.90 0.95 0.99
0.050 3.498 3.708 4.148 0.320 2.795 3.059 3.590 0.590 2.423 2.707 3.262
0.060 3.440 3.648 4.092 0.330 2.780 3.048 3.581 0.600 2.404 2.695 3.260
0.070 3.395 3.614 4.074 0.340 2.767 3.028 3.564 0.610 2.389 2.685 3.252
0.080 3.346 3.570 4.016 0.350 2.753 3.016 3.533 0.620 2.380 2.672 3.266
0.090 3.311 3.535 3.992 0.360 2.730 3.000 3.533 0.630 2.380 2.674 3.248
0.100 3.271 3.499 3.956 0.370 2.712 2.983 3.523 0.640 2.357 2.655 3.238
0.110 3.245 3.478 3.945 0.380 2.700 2.970 3.512 0.650 2.343 2.636 3.214
0.120 3.213 3.444 3.912 0.390 2.690 2.962 3.496 0.660 2.332 2.628 3.208
0.130 3.178 3.415 3.895 0.400 2.677 2.942 3.481 0.670 2.322 2.623 3.214
0.140 3.159 3.388 3.866 0.410 2.662 2.935 3.477 0.680 2.309 2.600 3.176
0.150 3.125 3.356 3.843 0.420 2.646 2.920 3.469 0.690 2.288 2.585 3.163
0.160 3.105 3.344 3.823 0.430 2.635 2.917 3.462 0.700 2.283 2.578 3.176
0.170 3.080 3.319 3.808 0.440 2.617 2.896 3.449 0.710 2.264 2.562 3.155
0.180 3.056 3.299 3.780 0.450 2.603 2.887 3.436 0.720 2.254 2.551 3.145
0.190 3.035 3.284 3.784 0.460 2.591 2.870 3.419 0.730 2.239 2.540 3.135
0.200 3.013 3.265 3.755 0.470 2.582 2.854 3.410 0.740 2.221 2.524 3.136
0.210 2.993 3.240 3.729 0.480 2.563 2.846 3.398 0.750 2.212 2.512 3.115
0.220 2.976 3.225 3.701 0.490 2.554 2.834 3.384 0.760 2.202 2.501 3.079
0.230 2.942 3.194 3.704 0.500 2.544 2.825 3.388 0.770 2.190 2.491 3.093
0.240 2.926 3.182 3.708 0.510 2.530 2.818 3.367 0.780 2.167 2.473 3.062
0.250 2.921 3.174 3.691 0.520 2.513 2.796 3.355 0.790 2.155 2.461 3.058
0.260 2.896 3.154 3.656 0.530 2.499 2.779 3.342 0.800 2.154 2.459 3.059
0.270 2.876 3.134 3.660 0.540 2.483 2.770 3.326 0.810 2.134 2.438 3.029
0.280 2.863 3.120 3.646 0.550 2.468 2.758 3.322 0.820 2.114 2.424 3.024
0.290 2.844 3.103 3.621 0.560 2.459 2.749 3.317 0.830 2.112 2.414 3.027
0.300 2.828 3.091 3.619 0.570 2.443 2.733 3.295 0.840 2.095 2.399 2.995
0.310 2.818 3.078 3.600 0.580 2.428 2.718 3.284 0.850 2.072 2.377 2.999

The simulated (standardized) process will have the same covariance structure as the limiting
distribution in Q̃(r), so that we can construct bands using the maximum of the absolute value
of the appropriate entry of the process, in combination with the element from the variance
matrix to obtain standard errors. The procedure can be applied under the assumption that
the bias process is zero, as would be the case if the second moments of xt are unrelated to
the time-varying regression parameters.
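For a single regressor, the multiplier simulation described before Table 1 can be sketched as below. The function name `simulate_multiplier_sup` and the data are assumptions made for illustration. In particular, the example passes full-sample OLS residuals as a simple stand-in, whereas the text defines the residuals from the rolling estimates. In the scalar case the factors of T cancel, so each window statistic is a window sum of x·ε̂·v divided by the square root of the window sum of x²ε̂².

```python
import numpy as np

def simulate_multiplier_sup(x, resid, lam, reps=1000, seed=0):
    """Draws of sup_r |U(r)| for a scalar regressor using normal multipliers."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T = x.size
    w = int(np.floor(T * lam + 1e-9))          # integer part of T * lam
    score = x * resid                          # x_s * residual_s
    c2 = np.concatenate([[0.0], np.cumsum(score ** 2)])
    scale = np.sqrt(c2[w:] - c2[:-w])          # sqrt of window sums of x^2 * resid^2
    v = rng.standard_normal((reps, T))         # generated multipliers v_s
    c1 = np.concatenate([np.zeros((reps, 1)), np.cumsum(score * v, axis=1)], axis=1)
    U = (c1[:, w:] - c1[:, :-w]) / scale       # standardized window sums
    return np.max(np.abs(U), axis=1)

# Hypothetical data with an error scale that rises over the sample.
rng = np.random.default_rng(0)
T = 400
x = rng.standard_normal(T)
sd = 0.5 + np.linspace(0.0, 1.0, T)
y = 2.0 * x + sd * rng.standard_normal(T)
resid = y - x * (x @ y) / (x @ x)              # full-sample residuals as a stand-in
sup_draws = simulate_multiplier_sup(x, resid, lam=0.20)
print(np.quantile(sup_draws, 0.95))            # case-specific 95% critical value
```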

3 Estimating Average Slope
Given that the goal of rolling regression appears to be estimating the average slope of the
regression function as we move the window through time, we propose to estimate that quantity
directly. That is, we seek to estimate the integral of the regression coefficient over the
subintervals of the sample data. Our intent in the direct estimation of β̄λ (r) is to avoid the
potential bias process given by BT (r).
The local linear estimator for β(t/T ) was analyzed in Cai (2007) for stationary α-mixing
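data, and is given by β̃(t/T ), where
\[
\tilde\beta(t/T) = \begin{pmatrix} I_p & 0_p \end{pmatrix}
\begin{pmatrix} S_{T,0} & S_{T,1}^\top \\ S_{T,1} & S_{T,2} \end{pmatrix}^{-1}
\begin{pmatrix} V_{T,0} \\ V_{T,1} \end{pmatrix}
\]
with
\[
S_{T,\ell}(t/T) = \frac{1}{Th} \sum_{s=1}^{T} x_s x_s^\top \left( \frac{t-s}{Th} \right)^{\ell} K_{t,s}, \qquad
V_{T,\ell}(t/T) = \frac{1}{Th} \sum_{s=1}^{T} x_s y_s \left( \frac{t-s}{Th} \right)^{\ell} K_{t,s},
\]
$K_{t,s} = K((t-s)/(Th))$, and K(·) being a kernel function. It is well known that the nonparametric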
estimator is consistent and point-wise normally distributed. Zhou and Wu (2010) extend
the work of Johnston (1982) to include uniform confidence bands in a time series setting
for the function over the entire range of the data. The uniform confidence bands converge
at an even slower rate than the usual nonparametric estimators. In addition, there is an
additional bias term which depends on the second derivative of β(t/T ) at each point, which
adds another obstacle to the construction of confidence bands.
We propose the following estimator for β̄λ (r):

\[
\hat\beta^{*}_\lambda(r) = \frac{1}{T\lambda} \sum_{s=[rT-T\lambda+1]}^{[Tr]} \tilde\beta\!\left(\frac{s}{T}\right).
\]

The intuition for our estimator is that we hope to combine each estimator of β(s/T ) for the
relevant range. By choosing an appropriate bandwidth parameter, we can eliminate the bias
process altogether. We list three additional assumptions for the data generating process and
the bandwidth.
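The two-step idea can be sketched as follows: first estimate β(t/T) at every sample point by local linear regression, then average those estimates over each rolling window. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the Epanechnikov kernel written inline, the simulated data, and the bandwidth value are all illustrative, and the first step is computed as a kernel-weighted least squares fit, which solves the same normal equations as the S and V matrices above.

```python
import numpy as np

def epanechnikov(z):
    """Second-order kernel supported on [-1, 1]."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)

def local_linear_tv(y, X, h):
    """Local linear estimates of beta(t/T) for t = 1, ..., T."""
    T, p = X.shape
    grid = np.arange(1, T + 1) / T
    out = np.empty((T, p))
    for i, u in enumerate(grid):
        z = (u - grid) / h                     # (t - s) / (T h)
        k = epanechnikov(z)
        Z = np.hstack([X, X * z[:, None]])     # level and slope regressors
        ZtK = Z.T * k                          # kernel-weighted design
        coef = np.linalg.solve(ZtK @ Z, ZtK @ y)
        out[i] = coef[:p]                      # keep the level part only
    return out

def smoothed_rolling(y, X, lam, h):
    """Average the local linear estimates over each rolling window."""
    beta_tilde = local_linear_tv(y, X, h)
    T, p = X.shape
    w = int(np.floor(T * lam + 1e-9))          # integer part of T * lam
    csum = np.vstack([np.zeros((1, p)), np.cumsum(beta_tilde, axis=0)])
    return (csum[w:] - csum[:-w]) / w

# Hypothetical data: slope drifting linearly from 1 to 2 over the sample.
rng = np.random.default_rng(0)
T = 200
X = rng.standard_normal((T, 1))
beta_path = 1.0 + np.arange(1, T + 1) / T
y = X[:, 0] * beta_path + 0.5 * rng.standard_normal(T)
est = smoothed_rolling(y, X, lam=0.20, h=0.75 * T ** -0.30)
print(est.shape)
```

Using cumulative sums for the second step keeps the averaging cost linear in T; the first step dominates the run time because it solves one small weighted regression per sample point.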

Assumption 6 The function β(s) has three continuous derivatives.

Assumption 7 The bandwidth is chosen such that $h = c_2 T^{-\delta}$ with 1/4 < δ < 1/3.

Assumption 8 The kernel function K(z) is second order and takes the value 0 outside of
[−1, 1].

Assumption 6 imposes smoothness conditions on the behavior of the regression coefficients
over time. Since we are estimating the average of the coefficients, this assumption simplifies
the results. Assumption 7 is required so that the estimate of the average slope will converge
at the usual parametric rate, and Assumption 8 is useful to limit the observations that enter
the smoothed estimators. The Epanechnikov kernel is a popular example of such a kernel
and we employ this kernel in our Monte Carlo and empirical examples. We state the theorem
for the modified procedure below with its proof relegated to Section 7.

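Theorem 3.1 Suppose that Assumptions 1-8 hold with r ∈ (λ, 1). Then,
\[
\sqrt{T}\,\bigl(\hat\beta^{*}_\lambda(r) - \bar\beta_\lambda(r)\bigr) \Rightarrow \bigl[ Q_2(r) - Q_2(r-\lambda) \bigr],
\]
where $Q_2(r)$ is a p-dimensional Gaussian process with covariance matrix
$E[Q_2(r_1) Q_2(r_2)^\top] = \int_0^{\min(r_1,r_2)} \Lambda(s)\,ds$, and
\[
\Lambda(s) = \frac{1}{\lambda^2} M(s)^{-1} \Omega(s) M(s)^{-1} .
\]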

The new estimator is consistent for β̄λ (r). Moreover, like the naive rolling estimator,
the limiting distribution of our new statistic is also a function of a Gaussian process, here Q2 (r). Hence, we can
employ the generated critical values to construct uniform confidence bands for β̄λ (r). This
rolling average smoothed estimator can be viewed as a bias corrected version of rolling
regression, and the limiting distribution still involves a similar Gaussian process. The bias
is absent because we are directly estimating the average of the parameter values through
the two step process. We first estimate the time-varying parameters directly via local-linear
estimation, and then we average those estimates. Similar results are often obtained in
semiparametric models involving averages of nonparametric estimates. The advantage of this
procedure is that the averaging operation provides a faster rate of convergence that does not
depend on the nonparametric bandwidth rate. In this way, the rolling estimator provides a
computationally tractable procedure with the same interpretation as the traditional rolling
regression procedure. However, our method now has the correct uniform size, where the
traditional rolling procedure would have bands that are too narrow, resulting in incorrect
coverage.
Construction of the appropriate confidence bands is similar to the method used in Zhang
and Wu (2012) along with our tabulated critical values. We estimate the standard errors of
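the modified rolling estimators. Let
\[
\tilde S(s) = \begin{pmatrix} S_{T,0}(s) & S_{T,1}(s)^\top \\ S_{T,1}(s) & S_{T,2}(s) \end{pmatrix}, \qquad
\tilde\Omega_\ell(s) = \frac{1}{Th} \sum_{r=1}^{T} x_r x_r^\top \tilde\epsilon_r^2 \left( \frac{s-r}{Th} \right)^{\ell} K\!\left( \frac{s-r}{Th} \right),
\]
\[
\tilde\Omega(s) = \begin{pmatrix} \tilde\Omega_0(s) & \tilde\Omega_1(s) \\ \tilde\Omega_1(s) & \tilde\Omega_2(s) \end{pmatrix}, \qquad \text{and} \qquad \tilde\epsilon_r = y_r - x_r^\top \tilde\beta(r).
\]
Then, the standard errors are estimated via the variance matrix
\[
\frac{1}{T^2 \lambda^2} \sum_{s=[T(r-\lambda)]}^{[Tr]} \left[ I_p \,\vdots\, 0_p \right] \tilde S(s)^{-1} \tilde\Omega(s) \tilde S(s)^{-1} \left[ I_p \,\vdots\, 0_p \right]^\top
= \frac{1}{T^2} \sum_{s=[T(r-\lambda)]}^{[Tr]} \tilde\Lambda(s).
\]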

The standardized estimator will be governed by the process Q̃2 (r), which is Gaussian with
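covariance process
\[
E\left[ \tilde Q_2(r_1) \tilde Q_2(r_2)^\top \right] = \left( \int_{r_1-\lambda}^{r_1} \Lambda(s)\,ds \right)^{-1/2} \int_{r_2-\lambda}^{r_1} \Lambda(s)\,ds \left( \int_{r_2-\lambda}^{r_2} \Lambda(s)\,ds \right)^{-1/2}
\]
for r1 < r2 (or 0 if r1 < r2 − λ).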

We note that the BT (r) process is removed from the limiting distribution due to the
restrictions on the smoothing parameter h. In addition, if $x_t$ and $\epsilon_t$ are stationary, the
distribution simplifies further since $\Omega(s) = E(x_s x_s^\top \epsilon_s^2)$ is constant.

3.1 Critical Values

As in the case of the traditional rolling regression, the appropriate critical values for the
rolling average slope estimator are not the usual values associated with the standard normal
distribution. In the most restrictive case where Ω(s) and the second moment matrix of the
regressors M (s) are constant over s, we can employ the critical values appearing in Table 1.
If Ω(s) and M (s) are time-varying, we can use the estimated components of Λ(s) so that
we can simulate using
\[
\tilde U_2(r) = \left( \frac{1}{T} \sum_{s=[T(r-\lambda)+1]}^{[Tr]} \tilde\Lambda(s) \right)^{-1/2} \frac{1}{\sqrt{T}} \sum_{s=[T(r-\lambda)+1]}^{[Tr]} \tilde\Lambda(s)^{1/2} v_s .
\]

4 Monte Carlo Studies
The theorems from Sections 2 and 3 show that rolling regression estimators have a limiting
distribution that depends on functionals of Brownian motion, and those new critical values
are provided in Table 1. The purpose of Theorem 3.1 is to provide a bias corrected estimate
of the average slope.
We explore the performance of various rolling regression estimators in this section. In
particular, the naive rolling regression estimator that uses standard normal critical values
is denoted RO for Rolling OLS. A second estimator is considered but uses the new critical
values, and we refer to this estimator as adjRO. This rolling estimator accounts for Q(r) in
the limiting distribution, but does not correct for the possible bias process BT (r). Finally, we
include three versions of the corrected rolling regression estimator proposed in Section 3. The
estimator depends on a bandwidth parameter for the time varying parameter regression at
the first stage. The form of the allowable bandwidth is $h = c_2 T^{-\delta}$ where 1/4 < δ < 1/3.
We use δ = 0.30 and set c2 = 0.5, 0.75, and 1. The estimators are denoted SRb1, SRb2, and
SRb3.
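For orientation, the bandwidths implied by these choices can be computed directly. The helper name `bandwidth` is hypothetical; the sample sizes 200, 400, and 600 are the ones used in the rolling experiments, and the count 2hT is only a rough indication of how many observations fall inside the support of a kernel that vanishes outside [−1, 1].

```python
def bandwidth(c2, T, delta=0.30):
    """Bandwidth h = c2 * T^(-delta) as a fraction of the sample."""
    return c2 * T ** (-delta)

for T in (200, 400, 600):
    for label, c2 in (("SRb1", 0.5), ("SRb2", 0.75), ("SRb3", 1.0)):
        h = bandwidth(c2, T)
        # Roughly 2 * h * T observations receive positive kernel weight.
        print(T, label, round(h, 4), int(2 * h * T))
```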
Before we consider rolling regression, we illustrate the bias that one encounters if an
autoregressive process has an omitted trend, which could be considered a case where λ = 1.
To this end, we simulate the process discussed in Section 2 with a missing trend. The bias
of the estimated AR parameter is indexed by k1 , the magnitude of the missing trend. We
simulate an AR process with φ = 0 and consider values of k1 ranging from 0 to 15, with
T = 200 and 1000 replications for each value of k1 . The resulting bias for OLS and the
smoothed estimates (Sb1, Sb2, Sb3) appears in Figure 1 (see later). We note that there is
a small negative bias for the smoothed regression estimators. However, it does not change
as the magnitude of the missing trend grows, which illustrates that this procedure removes
the bias process from the estimator. Moreover, the bias in OLS grows as the omitted trend
indexed via k1 grows larger, consistent with the results of Perron (1989).
Given the potential for bias in rolling estimators, we illustrate several data generating
processes using rolling estimators. For each experiment, we allow for time series dimensions
T = 200, 400, and 600. The rolling window is set to λ = 0.20 so that there are 40, 80,
and 120 observations used in each estimation window. The number
of replications for each experiment is set to 10,000. The parameter of interest is β̄λ (r),
the average value of β(r) over each relevant subset of time. Our experiments report the
estimated coverage probability for a 95% uniform confidence band for β̄λ (r). We also report the
mean absolute deviation (MAD) over the range. In addition, we list the average width of the uniform
confidence band. For example, if two competing procedures both have 95% coverage, we
would prefer the method generating narrower bands.
Denote the current naive technology of using rolling OLS and employing critical values
from the standard normal distribution as RO (rolled OLS). If we use the adjusted critical
values tabulated in Table 1 with rolled OLS, we list the procedure as adjRO. Our smoothed
rolling procedure using the adjusted critical values is indexed by the bandwidths as SRb1,
SRb2, and SRb3.3
Our first experiment uses a simple static regression model given by

\[ y_t = 2 x_t + \epsilon_t \]

where $\epsilon_t \sim N(0, 0.25)$. The results are summarized in Table 2.
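A scaled-down version of this experiment for the two rolling OLS procedures can be sketched as follows. Several ingredients are assumptions made here rather than details given in the text: the regressor is drawn as standard normal, the standard errors are the heteroskedasticity-robust ones from Section 2.1, only 500 replications are used, and the helper names are hypothetical. A replication counts as covered when the largest absolute t-ratio across all windows stays below the critical value, 1.96 for RO and 3.265 (Table 1, λ = 0.20) for adjRO.

```python
import numpy as np

def window_sums(a, w):
    """Sums of a over every block of w consecutive entries."""
    c = np.concatenate([[0.0], np.cumsum(a)])
    return c[w:] - c[:-w]

def sup_t_static(T, lam, rng, beta=2.0, sigma=0.5):
    """Largest absolute rolling t-ratio for y = beta * x + eps in one sample."""
    x = rng.standard_normal(T)                 # assumption: regressor law is not stated
    y = beta * x + sigma * rng.standard_normal(T)
    w = int(np.floor(T * lam + 1e-9))          # integer part of T * lam
    sxx = window_sums(x * x, w)
    b = window_sums(x * y, w) / sxx            # rolling OLS slope
    # Window sum of x^2 * (y - x b)^2, expanded so cumulative sums can be reused.
    meat = (window_sums(x**2 * y**2, w) - 2.0 * b * window_sums(x**3 * y, w)
            + b**2 * window_sums(x**4, w))
    se = np.sqrt(meat) / sxx
    return np.max(np.abs(b - beta) / se)

rng = np.random.default_rng(0)
sup_t = np.array([sup_t_static(200, 0.20, rng) for _ in range(500)])
cov_ro = np.mean(sup_t <= 1.96)                # standard normal critical value
cov_adj = np.mean(sup_t <= 3.265)              # Table 1 value for lam = 0.20
print(cov_ro, cov_adj)
```

Since the band is uniform, coverage is judged on the whole path of rolling estimates at once, which is why a pointwise critical value falls far short of the nominal level.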

We first examine the rolling OLS (RO) estimator using the (incorrect) standard normal
distribution critical values. The coverage for the procedure is far below the nominal 95%, and
ranges from around 19% to 26% depending on the sample size. The rolling OLS with adjusted
critical values from Table 1 is much closer to the intended coverage, reaching as high as 91%
when the sample size is 600. The smoothed rolling procedures generally have better coverage than
adjRO, most clearly at T = 200. SRb2 performs best, with coverage between 95.3% and 95.9% that varies little
with the sample size. The mean absolute deviation (MAD) is similar for all procedures,
which is to be expected since the coverage issues arise from incorrect critical values. The
smoothed procedures have narrower uniform bands (average width) than adjRO while having coverage
closer to the nominal level; only the RO bands are narrower, and their coverage is far too low.
³ In addition to the procedures we evaluated in our Monte Carlo experiment, we also used simulated
critical values to account for the possibility of time varying variances, which were present in the experiments.
However, the case specific simulated critical values for the time varying variance models were similar to those
in Table 1. We attempted to construct pathological examples of time varying variances but the results were
similar to data generating processes with fixed variances.
Table 2. Static Regression

                       SRb1     SRb2     SRb3     adjRO    RO
Coverage    T = 200    0.8924   0.9575   0.9839   0.8357   0.1940
            T = 400    0.9052   0.9586   0.9819   0.8991   0.2386
            T = 600    0.9081   0.9530   0.9773   0.9152   0.2623
MAD         T = 200    0.0577   0.0520   0.0478   0.0650   0.0650
            T = 400    0.0414   0.0382   0.0355   0.0455   0.0455
            T = 600    0.0336   0.0314   0.0294   0.0367   0.0367
Av. Width   T = 200    0.3918   0.4001   0.4098   0.4978   0.2988
            T = 400    0.2800   0.2837   0.2883   0.3585   0.2152
            T = 600    0.2293   0.2318   0.2348   0.2944   0.1767

Next, we generate data from a simple autoregressive model with no parameter changes.
The data generating process is given by

$$y_t = 0.80\, y_{t-1} + \epsilon_t,$$

where $\epsilon_t \sim N(0, 0.25)$. The results are given in Table 3.
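The autoregressive design can be sketched in the same way. The helpers `simulate_ar1` and `rolling_ar1` are hypothetical names; the zero starting value and the reading of N(0, 0.25) as variance 0.25 are assumptions made here, not details given in the text.

```python
import numpy as np

def simulate_ar1(rho_path, sigma, rng, y0=0.0):
    """Generate y_t = rho_path[t] * y_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    T = len(rho_path)
    eps = rng.normal(0.0, sigma, size=T)
    y = np.empty(T)
    prev = y0
    for t in range(T):
        prev = rho_path[t] * prev + eps[t]
        y[t] = prev
    return y

def rolling_ar1(y, window):
    """Rolling no-intercept OLS of y_t on y_{t-1}."""
    lag, cur = y[:-1], y[1:]
    sxy = np.cumsum(np.concatenate(([0.0], lag * cur)))
    sxx = np.cumsum(np.concatenate(([0.0], lag * lag)))
    return (sxy[window:] - sxy[:-window]) / (sxx[window:] - sxx[:-window])

rng = np.random.default_rng(1)
y = simulate_ar1(np.full(600, 0.8), 0.5, rng)  # constant rho = 0.80
rho_hat = rolling_ar1(y, 120)                  # lambda = 0.20 of T = 600
```

Passing a full path of coefficients rather than a scalar means the same simulator can be reused for the time-varying designs considered next.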

Table 3. Constant AR, ρ = 0.80

                       SRb1     SRb2     SRb3     adjRO    RO
Coverage    T = 200    0.3502   0.7647   0.9210   0.6154   0.0552
            T = 400    0.4406   0.7947   0.9231   0.8041   0.1266
            T = 600    0.4865   0.8076   0.9224   0.8579   0.1734
MAD         T = 200    0.1975   0.1350   0.1059   0.1124   0.1124
            T = 400    0.1200   0.0832   0.0666   0.0672   0.0672
            T = 600    0.0893   0.0630   0.0513   0.0511   0.0511
Av. Width   T = 200    0.5782   0.5623   0.5592   0.6724   0.4036
            T = 400    0.3871   0.3773   0.3753   0.4480   0.2689
            T = 600    0.3075   0.3008   0.2995   0.3686   0.2213

Given the rolling fraction of λ = 0.20, the procedures are attempting to estimate the AR
parameter based on as few as 40 observations. The SRb3 procedure is best for this data generating
process, with small MAD and narrow confidence bands, but the adjusted rolling OLS
procedure performs respectably. Again, the naive traditional rolling OLS procedure performs
poorly, with coverage of 17% even with a sample of 600 observations.
The next process we consider has a changing autoregressive coefficient,

$$y_t = \rho(t/T)\, y_{t-1} + \epsilon_t \quad \text{with} \quad \rho(t/T) = 0.8\,[1 - (t/T)],$$

and our results appear in Table 4.
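For a time-varying coefficient, the band is evaluated against the window average of the coefficient path rather than its value at a single date. The sketch below computes a discrete version of that target for the linear path above; treating the simple mean of ρ(s/T) over the window as β̄λ(r) is an assumption made here, since the formal definition appears earlier in the paper.

```python
import numpy as np

def window_average(path, window):
    """Mean of `path` over each run of `window` consecutive points."""
    c = np.cumsum(np.concatenate(([0.0], np.asarray(path, dtype=float))))
    return (c[window:] - c[:-window]) / window

T, lam = 200, 0.20
window = int(round(lam * T))
u = np.arange(1, T + 1) / T        # rescaled time t/T
rho = 0.8 * (1.0 - u)              # coefficient path from the text
rho_bar = window_average(rho, window)
```

Because the path is linear, each window average equals the path evaluated at the window midpoint, so the target declines from roughly 0.72 in the first window to roughly 0.08 in the last.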

Table 4. AR, ρ = 0.80[1 − (t/T)]

                       SRb1     SRb2     SRb3     adjRO    RO
Coverage    T = 200    0.6907   0.8937   0.9631   0.7701   0.1451
            T = 400    0.7559   0.9109   0.9651   0.8702   0.1950
            T = 600    0.7815   0.9156   0.9642   0.8989   0.2236
MAD         T = 200    0.1417   0.1131   0.0983   0.1235   0.1235
            T = 400    0.0948   0.0784   0.0696   0.0837   0.0837
            T = 600    0.0748   0.0632   0.0569   0.0672   0.0672
Av. Width   T = 200    0.6856   0.7028   0.7195   0.8850   0.5313
            T = 400    0.4916   0.5000   0.5080   0.6380   0.3830
            T = 600    0.4040   0.4094   0.4146   0.5248   0.3150

The adjusted rolling OLS estimator performs well once the sample size reaches 600
(effectively 120 with λ = 0.2), while the smoothed rolled estimators with larger bandwidths
perform well for all sample sizes and have narrower confidence bands.
For the next experiment, the autoregressive coefficients do not change, but there is an
omitted trend variable. This type of data generating process will cause bias in the
autoregressive parameter if one uses standard OLS, which is illustrated in Figure 1. The process
is given by

$$y_t = 2\,(t/T) + \rho\, y_{t-1} + \epsilon_t.$$

We report the coverage for the uniform bands for the autoregressive coefficient in Table 5.
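The direction of the omitted-trend bias can be seen in a small full-sample simulation. This is only a sketch under stated assumptions: the value ρ = 0.3 is hypothetical (the text does not give ρ for this design), the naive regression is taken to have no intercept, and the innovation standard deviation is set to 0.5.

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho = 600, 0.3                        # rho is a hypothetical value
trend = 2.0 * np.arange(1, T + 1) / T    # omitted trend 2 * (t/T)
eps = rng.normal(0.0, 0.5, size=T)

y = np.empty(T)
prev = 0.0
for t in range(T):
    prev = trend[t] + rho * prev + eps[t]
    y[t] = prev

lag, cur = y[:-1], y[1:]

# Naive AR(1) regression that leaves the trend out.
rho_naive = (lag @ cur) / (lag @ lag)

# Regression that includes the trend as a regressor.
X = np.column_stack([trend[1:], lag])
coef, *_ = np.linalg.lstsq(X, cur, rcond=None)
rho_with_trend = coef[1]
```

When the trend is left out, the lagged level absorbs it and the naive estimate is pushed toward one, while the regression that includes the trend stays near the assumed ρ.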
The biases in standard OLS are apparent in both the rolled OLS and adjusted rolled
OLS based confidence bands, with coverage of 0% for all sample sizes. The bias also appears in
the MAD rows of Table 5, where the MAD for both adjusted rolling OLS and rolling OLS is
roughly five times or more that of the smoothed rolling estimators.

[Figure 1: Bias for OLS and the smoothed estimates (Sb1, Sb2, Sb3). Panel title: AR Bias; horizontal axis: k, from 0 to 15; vertical axis: Bias, from -0.3 to 1.1.]

Table 5. Omitted trend, 2 × (t/T)

                       SRb1     SRb2     SRb3     adjRO    RO
Coverage    T = 200    0.8601   0.9480   0.9830   0.0000   0.0000
            T = 400    0.8859   0.9490   0.9783   0.0000   0.0000
            T = 600    0.8917   0.9497   0.9774   0.0000   0.0000
MAD         T = 200    0.1151   0.1031   0.0942   0.5620   0.5620
            T = 400    0.0820   0.0753   0.0699   0.5663   0.5663
            T = 600    0.0674   0.0626   0.0586   0.5678   0.5678
Av. Width   T = 200    0.7278   0.7570   0.7781   0.7692   0.4618
            T = 400    0.5323   0.5471   0.5576   0.5545   0.3347
            T = 600    0.4410   0.4508   0.4577   0.4559   0.2737

The next data generating process is given by

$$y_t = \rho(t/T)\, y_{t-1} + \epsilon_t \quad \text{with} \quad \rho(t/T) = 0.8 \sin[2\pi (t/T)],$$

so that the autoregressive coefficient starts at zero and oscillates between 0.80 and −0.80.
The results are displayed in Table 6.

Table 6. AR, ρ = 0.80 sin[2π × (t/T)]

                       SRb1     SRb2     SRb3     adjRO    RO
Coverage    T = 200    0.6669   0.8111   0.8484   0.6594   0.0808
            T = 400    0.7212   0.8347   0.8535   0.6928   0.0766
            T = 600    0.7445   0.8408   0.8477   0.6683   0.0621
MAD         T = 200    0.1334   0.1214   0.1256   0.1202   0.1202
            T = 400    0.0895   0.0833   0.0876   0.0848   0.0848
            T = 600    0.0710   0.0668   0.0708   0.0710   0.0710
Av. Width   T = 200    0.6222   0.6384   0.6581   0.8040   0.4826
            T = 400    0.4439   0.4528   0.4637   0.5789   0.3475
            T = 600    0.3635   0.3695   0.3773   0.4763   0.2859

The performance of rolling OLS is again poor, and the adjusted rolling OLS is also below
nominal confidence levels. The SRb1 and SRb2 procedures improve as the sample size
increases, and the smoothed procedures reach coverage of roughly 84% to 85%.
Finally, we treat the case where the intercept term rises and falls over the sample,
with

$$y_t = \sin[2\pi (t/T)] + \rho\, y_{t-1} + \epsilon_t.$$

The results are presented in Table 7.

The estimators based on OLS perform poorly and get worse as the sample size increases.
Our smoothed rolling estimators perform well and improve as the sample size increases. The
MAD gets smaller and the width of the confidence bands declines with increases in sample
size.
Table 7. AR, Omitted trend, sin[2π × (t/T)]

                       SRb1     SRb2     SRb3     adjRO    RO
Coverage    T = 200    0.8860   0.9676   0.9836   0.2161   0.0025
            T = 400    0.8956   0.9622   0.9830   0.0458   0.0001
            T = 600    0.9024   0.9591   0.9825   0.0086   0.0000
MAD         T = 200    0.1010   0.0962   0.0916   0.1909   0.1909
            T = 400    0.0807   0.0725   0.0674   0.1736   0.1736
            T = 600    0.0663   0.0610   0.0569   0.1682   0.1682
Av. Width   T = 200    0.7273   0.7561   0.7743   0.9395   0.5640
            T = 400    0.5319   0.5465   0.5562   0.6846   0.4110
            T = 600    0.4407   0.4505   0.4572   0.5651   0.3392

The Monte Carlo experiments suggest that for static models, using the new critical values
from Table 1 provides a simple and effective way to adjust the rolling estimators to obtain
accurate confidence bands. However, from the theorems in earlier sections, we know that bias
results when the second moments of the regressors are related to the parameters in the model.
A well-known case of this phenomenon is the time-varying autoregressive model, which we
also explore in the Monte Carlo experiments. We see that employing rolling regression with
the time-varying coefficient regression of Cai (2007) in the first stage removes the
bias from rolling regression.
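The two-stage idea (smooth first, then roll) can be sketched for a single regressor as follows. This is an illustrative sketch, not the authors' implementation: the Epanechnikov kernel, the bandwidth passed as `h`, the evaluation at every sample point including the boundaries, and the function names are all assumptions made here.

```python
import numpy as np

def local_linear_beta(y, x, h):
    """Local-linear estimate of beta(s/T) at each s for y_t = x_t * beta(t/T) + eps_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    u = np.arange(1, T + 1) / T
    beta = np.empty(T)
    for s in range(T):
        d = (u - u[s]) / h
        # Epanechnikov kernel weights (an assumed kernel choice).
        w = np.where(np.abs(d) < 1.0, 0.75 * (1.0 - d**2), 0.0)
        # Local level and local slope of the coefficient path.
        Z = np.column_stack([x, x * (u - u[s])])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        beta[s] = coef[0]
    return beta

def smoothed_rolling(y, x, h, window):
    """Average the first-stage smoothed estimates over each rolling window."""
    beta = local_linear_beta(y, x, h)
    c = np.cumsum(np.concatenate(([0.0], beta)))
    return (c[window:] - c[:-window]) / window
```

Because the first stage fits a local level and a local slope, a coefficient path that is exactly linear in t/T is recovered without error when there is no noise, which gives a convenient sanity check for an implementation.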

5 An Empirical Application
Rolling regression was employed by O’Reilly and Whelan (2005) to examine the persistence
of inflation over time in the Euro area. Suppose that one estimates an AR(4) model of U.S.
inflation given by

$$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \phi_4 y_{t-4} + \epsilon_t.$$

A common transformation that is used to evaluate persistence is

$$\Delta y_t = \alpha + \omega y_{t-1} + \gamma_1 \Delta y_{t-1} + \gamma_2 \Delta y_{t-2} + \gamma_3 \Delta y_{t-3} + \epsilon_t.$$

If ω = φ1 + φ2 + φ3 + φ4 − 1 is zero, the autoregressive process has a unit root, which is an
extreme form of persistence. If the process is stationary, ω will be negative, and the closer
ω is to −1, the less persistent is inflation. We use data on inflation from the CPI, with monthly
observations from February of 1947 to May of 2020. As a preliminary analysis, we estimate
the AR(4) model and test for serial correlation in the residuals. We fail to reject the null of
no serial correlation at the 5% level, and we also reject the unit root hypothesis at the 1%
level. We proceed to analyze rolling estimates of the persistence parameter ω. Given our
preliminary findings of no unit root, our rolling analysis is not intended to check for unit
roots but to look for changing persistence. To this end, we consider a rolling window over
the sample of data. Adjusting for endpoints, we are left with 880 observations, so that using
λ = 0.20 results in using 176 observations in the rolling window.
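The link between the AR(4) coefficients and the persistence parameter can be checked numerically. In the sketch below, the expression for ω is the one given in the text; the expressions for γ1, γ2, γ3 follow from standard algebra and are not stated in the text, and the function name `ar_to_adf` and the example coefficients are hypothetical.

```python
import numpy as np

def ar_to_adf(phi):
    """Map AR(p) coefficients to (omega, gamma) in the differenced form.

    omega = sum(phi) - 1 and gamma_j = -(phi_{j+1} + ... + phi_p).
    """
    phi = np.asarray(phi, dtype=float)
    omega = phi.sum() - 1.0
    gamma = -np.array([phi[j:].sum() for j in range(1, len(phi))])
    return omega, gamma

phi = [0.5, 0.2, 0.1, 0.05]          # hypothetical AR(4) coefficients
omega, gamma = ar_to_adf(phi)        # omega = -0.15
```

In this example the coefficients sum to 0.85, so ω = −0.15: a stationary but fairly persistent process, in line with the interpretation that values of ω closer to zero indicate more persistence.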
The naive approach just estimates the parameter ω with OLS using 176 observations for
each window and includes heteroskedasticity robust standard errors. However, the confidence
bands incorporate the incorrect critical value of 1.96. Given that λ = 0.20, using Table 1
gives the appropriate critical value that accounts for the uniform nature of the confidence
bands, with a value of 3.265, just as used in the Monte Carlo experiments. We plot the
resulting rolling estimator of ω and the competing confidence bands in Figure 2. That is,
we plot the rolling OLS (RO) regression estimator with the incorrect confidence bands, and
the smoothed rolling regression (SR) with the corrected bands that account for rolling
over multiple periods.
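The effect of switching critical values is simple arithmetic, sketched below. The two critical values and the sample figures are taken from the text; the helper `band` and the example estimate and standard error are hypothetical.

```python
def band(estimate, std_error, crit):
    """Confidence band endpoints for one window."""
    half = crit * std_error
    return estimate - half, estimate + half

POINTWISE_CV = 1.96    # standard normal value used by the naive approach
UNIFORM_CV = 3.265     # value quoted in the text for lambda = 0.20

n_obs, lam = 880, 0.20
window = int(round(lam * n_obs))          # 176 observations per window
width_ratio = UNIFORM_CV / POINTWISE_CV   # about 1.67

# Hypothetical estimate and standard error, for illustration only.
naive = band(-0.40, 0.10, POINTWISE_CV)
uniform = band(-0.40, 0.10, UNIFORM_CV)
```

For a given standard error the uniform band is about two thirds wider than the pointwise one, which matches the ratio of the adjRO and RO average widths reported in Table 2 (0.4978 against 0.2988).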
There are several conclusions from the empirical exercise of comparing the
existing naive procedure (RO) with the new technique (SR). The first is that the naive
estimate is not always contained in the corrected bands, suggesting that the bias process is nonzero.
For example, from May of 1969 through August of 1980, the naive rolling OLS
estimate is outside our bands, and the same is true later in the sample, in November of 1991.
In addition, the naive RO procedure was shown to have narrow bands, so that confidence
levels are far below the prescribed target. A by-product of the incorrectly narrow bands of
the naive rolling OLS procedure is the illusion of more volatile inflation persistence. That
is, the smoothed rolling regression estimates, with more accurate confidence bands and lack
of bias, indicate a gradual increase in persistence in the early 1970s, a peak in the mid
1980s, and a gradual decline until the early 2000s.
[Figure 2: Confidence bands for SR and RO. Panel title: Inflation Persistence; horizontal axis: month, from November 1961 to May 2019; vertical axis from -1.5 to 0.1; series: SR and RO.]

The examples with an autoregressive process show that a change in the trend
or mean of the process results in upward bias in the estimated persistence. In light of this effect, we
estimate the α parameter, which also influences the mean of the autoregressive process. The
smoothed rolling estimate is shown in Figure 3. We anticipate that the more the estimate
of α changes, the more upward bias we will see in the estimate of persistence. From Figure
3 coupled with Figure 2, this is indeed the case from May of 1964 until May of 1979, where
the naive rolling OLS estimate is much higher than our smoothed estimate, SR.
The use of our new procedure corrects for the poor numerical performance of confidence
bands by increasing the width. Moreover, autoregressive processes estimated via rolling OLS
are susceptible to upward bias in the persistence estimate arising from changes in the mean
of the process through the change in intercept parameters. We mitigate this bias to isolate
the persistence from the change in mean of the process. In the present case, increases in
the level of inflation are misinterpreted by researchers relying on rolling OLS as increases in
persistence.

[Figure 3: The smoothed rolling estimate. Panel title: Estimated Intercept; horizontal axis: month, from November 1961 to May 2019; vertical axis from 0.09 to 0.19.]

6 Conclusion
The results in this paper provide an asymptotic analysis of the ubiquitous rolling regression
estimator for a class of potentially non-stationary processes. Our analysis allows for
forms of non-stationarity that arise from changing parameters
in the model, including classes of varying parameter autoregressions.
In the simplest cases, the usual procedure of using point-wise confidence bands
based on plus and minus 1.96 times the standard error will lead to confidence bands that
are much too narrow. The limiting distribution is a functional of Gaussian processes, but
is readily tabulated. Our results suggest that one should use our new critical values which
depend on the window width to determine uniform confidence bands.
In addition to the new critical values, we showed the potential for a bias process arising
from a relationship between the distribution of the regressors and the regression parameters.
From an empirical standpoint, a dynamic model will be most susceptible to such a process.
However, we propose a procedure of averaging smooth coefficient time-varying regression
estimators over the relevant window. The resulting distribution is the same as in the case
with no bias process, and should be used when applying rolling regression in dynamic models.
The empirical example covers time-varying persistence in inflation, and we show that the
new corrected bands suggest that the persistence is less variable than previous studies would
suggest.
Rolling regression is a natural procedure, and one employed in many recent empirical
studies. The choice of window width for the typical rolling procedure is often based on having
“enough” observations in order to estimate parameters, or based on some relevant time frame
for the question at hand. Our results show that one can retain the original idea of rolling
regression, as well as the usual parametric convergence rates, but the statistical distribution
is adjusted to obtain proper coverage. Moreover, the adjusted rolling regression procedures
are simple to implement, with narrower confidence bands relative to fully nonparametric
time-varying coefficient models with uniform confidence bands.

7 Proofs
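Proof of Theorem 2.1:

$$
\begin{aligned}
\sqrt{T}\left(\hat\beta_\lambda(r) - \bar\beta_\lambda(r)\right)
&= \sqrt{T}\left[\left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s y_s - \bar\beta_\lambda(r)\right] \\
&= \sqrt{T}\left[\left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \left(x_s^\top \beta\!\left(\frac{s}{T}\right) + \epsilon_s\right) - \bar\beta_\lambda(r)\right] \\
&= \sqrt{T}\left[\left(\sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top\right)^{-1} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \epsilon_s\right] + \sqrt{T}\, B_T(r) \\
&\quad + \sqrt{T}\left[\frac{1}{T\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} \beta\!\left(\frac{s}{T}\right) - \bar\beta_\lambda(r)\right],
\end{aligned}
$$

where $B_T(r)$ is the bias process defined in the statement of the theorem. The final term is
$o(1)$ by Riemann integrability of $\beta(s/T)$. Then applying Corollary 2 of Wu and Zhou (2011)
for locally stationary processes, we have

$$\frac{1}{\sqrt{T}} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s \epsilon_s \Rightarrow Q(r) - Q(r - \lambda).$$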

Now consider $x_s x_s^\top$. The process $x_s x_s^\top - M(s/T)$ is mean zero, and following the proof of
Lemma 6 of Zhou and Wu (2010), we apply Doob’s inequality to

$$\frac{1}{\sqrt{T}} \sum_{s=1}^{t} \left(x_s x_s^\top - M(s/T)\right),$$

so that

$$\frac{1}{T} \sum_{s=[rT-T\lambda+1]}^{[rT]} x_s x_s^\top \xrightarrow{p} \int_{r-\lambda}^{r} M(s)\, ds$$

uniformly in r.
Proof of Theorem 3.1: For the local-linear estimator, define the term

$$R_{T,0}(s/T) = \frac{1}{T} \sum_{t=[Th]}^{[T(1-h)]} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t.$$

Given our bandwidth choice, we have

$$\sup_{h \le s/T \le 1-h} \left\| M(G, s/T)\{\tilde\beta(s/T) - \beta(s/T)\} - R_{T,0}(s/T) \right\| = O_p(\kappa_T \xi_T),$$

where

$$\kappa_T = (Th)^{-1} T^{1/\iota} + (Th \log T)^{1/2} + h^2,$$

$$\xi_T = T^{-1/2} h^{-1} + h,$$

from equations (26) and (27) of Zhou and Wu (2010) and (A.3) of Zhang and Wu (2012).
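We have

$$
\begin{aligned}
\sqrt{T}\left[\hat\beta^*_\lambda(r) - \bar\beta_\lambda(r)\right]
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} \left[\tilde\beta(s/T) - \beta(s/T)\right] \\
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} M(G, s/T)^{-1} R_{T,0}(s/T) + O_p(T^{1/2} \kappa_T \xi_T),
\end{aligned}
$$

where the last term is $o_p(1)$ given our bandwidth choice. Then

$$
\begin{aligned}
\sqrt{T}\left[\hat\beta^*_\lambda(r) - \bar\beta_\lambda(r)\right]
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} M(G, s/T)^{-1} \frac{1}{Th} \sum_{t=[Th]}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t + o_p(1) \\
&= \frac{1}{\sqrt{T}\lambda} \sum_{t=[Th]}^{[T(1-h)]} \frac{1}{Th} \sum_{s=[rT-T\lambda+1]}^{[rT]} M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t + o_p(1) \\
&= \frac{1}{\sqrt{T}\lambda} \sum_{s=[rT-T\lambda+1]}^{[rT]} \frac{1}{Th} \sum_{t=[Th]}^{[T(1-h)]} M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t + o_p(1).
\end{aligned}
$$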

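Consider

$$D(r) = \frac{1}{T} \sum_{s=[Th]}^{[rT]} \frac{1}{\sqrt{T}\lambda} \sum_{t=[Th]}^{[T(1-h)]} M(G, s/T)^{-1} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t.$$

We write

$$D(r) = D^*(r) - D_1(r) + D_2(r),$$

$$D^*(r) = \frac{1}{T\lambda} \sum_{s=[Th]}^{[T(1-h)]} \frac{1}{\sqrt{T}} \sum_{t=[Th]}^{[Tr]} M(G, s/T)^{-1} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t,$$

$$D_1(r) = \frac{1}{T\lambda} \sum_{s=[Tr]+1}^{[T(1-h)]} \frac{1}{\sqrt{T}} \sum_{t=[Th]}^{[Tr]} M(G, s/T)^{-1} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t,$$

$$D_2(r) = \frac{1}{T\lambda} \sum_{s=[Th]}^{[Tr]} \frac{1}{\sqrt{T}} \sum_{t=[Tr]+1}^{[T(1-h)]} M(G, s/T)^{-1} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) x_t \epsilon_t.$$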

We show that $D_1(r)$ and $D_2(r)$ converge to zero uniformly in r.

$$\|D_1(r)\| \le \left\| \frac{1}{\sqrt{T}\lambda} \sum_{t=[Th]}^{[Tr]} x_t \epsilon_t \right\| \left\| \frac{1}{T} \sum_{s=[Tr]+1}^{[T(1-h)]} M(G, s/T)^{-1} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) \right\|.$$

Consider the second term on the right hand side. Denote the eigenvalue decomposition as
$M(G, s/T) = \Gamma(s/T)\Theta(s/T)\Gamma(s/T)^{-1}$ so that

$$
\begin{aligned}
\left\| \frac{1}{T} \sum_{s=[Tr]+1}^{[T(1-h)]} M(G, s/T)^{-1} \frac{1}{h} K\!\left(\frac{t-s}{Th}\right) \right\|
&\le \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} \left\| M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right) \right\| \\
&= \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) \sqrt{\operatorname{tr}\left[\Gamma(s/T)\Theta(s/T)^{-2}\Gamma(s/T)^{-1}\right]} \\
&\le \frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) c_M^{-2k},
\end{aligned}
$$

where $c_M$ is the lower bound of eigenvalues of $M(s/T)$. Note that $t \le [Tr]$, so that $t < s$.
Then

$$
\frac{1}{Th} \sum_{s=[Tr]+1}^{[T(1-h)]} K\!\left(\frac{t-s}{Th}\right) \sim \int_{r}^{1-h} \frac{1}{h} K\!\left(\frac{u-v}{h}\right) dv = \int_{\frac{r-u}{h}}^{\frac{1-h-u}{h}} K(z)\, dz,
$$

where $z = (v - u)/h$, $u < 1 - h$, and $r < u$. Then as $h \to 0$, this integral is $O(h)$, since
$K(z) = 0$ for $|z| \ge 1$. Hence, $D_1(r)$ converges in probability to 0 uniformly in r. The
argument is similar for $D_2(r)$.
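For $D^*(r)$ we note that

$$
\begin{aligned}
\frac{1}{Th} \sum_{s=[Th]}^{[T(1-h)]} M(G, s/T)^{-1} K\!\left(\frac{t-s}{Th}\right)
&\sim \int_{h}^{1-h} \frac{1}{h} M(G, v)^{-1} K\!\left(\frac{u-v}{h}\right) dv \\
&= \int_{\frac{u-r}{h}}^{\frac{u-r+\lambda}{h}} M(G, u - zh)^{-1} K(z)\, dz \\
&= M(G, u)^{-1} + O(h).
\end{aligned}
$$

Then

$$D^*(r) = \frac{1}{\sqrt{T}\lambda} \sum_{t=[Th]}^{[rT]} M(G, t/T)^{-1} x_t \epsilon_t + o_p(1)$$

uniformly in r. Combining these results, we have

$$\sqrt{T}\left[\hat\beta^*_\lambda(r) - \bar\beta_\lambda(r)\right] = D^*(r) - D^*(r - \lambda) + o_p(1).$$

Given this representation, we apply Corollary 2 of Wu and Zhou (2011), so that

$$\frac{1}{\sqrt{T}\lambda} \sum_{t=[rT-T\lambda+1]}^{[rT]} M(G, t/T)^{-1} x_t \epsilon_t \Rightarrow Q_2(r) - Q_2(r - \lambda),$$

where $Q_2(r)$ is a p dimensional Gaussian process with covariance
$E\left[Q_2(r_1) Q_2(r_2)^\top\right] = \int_0^{\min(r_1, r_2)} \Lambda(s)\, ds$, and

$$\Lambda(s) = \frac{1}{\lambda^2} M(s)^{-1} \Omega(s) M(s)^{-1}.$$
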
References

Adrian, T., R.K. Crump and E. Moench (2015). Regression-Based Estimation of Dynamic
Asset Pricing Models. Journal of Financial Economics, 118, 211-244.

Andrews, D.W.K. (1993). Tests for Parameter Instability and Structural Change With
Unknown Change Point. Econometrica, 61, 821-856.

Blanchard, O. (2018). Should We Reject the Natural Rate Hypothesis? Journal of
Economic Perspectives, 32, 97-120.

Cai, Z. (2007). Trending Time-Varying Coefficient Time Series Models with Serially
Correlated Errors. Journal of Econometrics, 136, 163-188.

Campello, M., A. Galvao and T. Juhl (2019). Testing for Slope Heterogeneity Bias in Panel
Data Models. Journal of Business and Economic Statistics, 37, 749-760.

Chen, B. and Y. Hong (2012). Testing for Smooth Structural Changes in Time Series
Models via Nonparametric Regression. Econometrica, 80, 1157-1183.

Clark, T.E. and M.W. McCracken (2009). Improving Forecast Accuracy by Combining
Recursive and Rolling Forecasts. International Economic Review, 50, 363-395.

Georgiev, I., D.I. Harvey, S.J. Leybourne and A.R. Taylor (2018). Testing for Parameter
Instability in Predictive Regression Models. Journal of Econometrics, 204, 101-118.

Hansen, B.E. (1996). Inference When a Nuisance Parameter Is Not Identified Under the
Null Hypothesis. Econometrica, 64, 413-430.

Inoue, A., L. Jin and B. Rossi (2017). Rolling Window Selection for Out-of-Sample
Forecasting With Time-Varying Parameters. Journal of Econometrics, 196, 55-67.

Jiménez, G., S. Ongena, J.-L. Peydró and J. Saurina (2017). Macroprudential Policy,
Countercyclical Bank Capital Buffers, and Credit Supply: Evidence from the Spanish
Dynamic Provisioning Experiments. Journal of Political Economy, 125, 2126-2177.

Johnston, G. (1982). Probabilities of Maximal Deviations for Nonparametric Regression
Function Estimates. Journal of Multivariate Analysis, 12, 402-414.

Linnainmaa, J.T. and M.R. Roberts (2018). The History of the Cross Section of Stock
Returns. Review of Financial Studies, 31, 2606-2649.

López-Salido, D., J.C. Stein and E. Zakrajšek (2017). Credit-Market Sentiment and the
Business Cycle. Quarterly Journal of Economics, 132, 1373-1426.

Lu, Z. and O. Linton (2007). Local Linear Fitting Under Near Epoch Dependence.
Econometric Theory, 23, 37-70.

O’Reilly, G. and K. Whelan (2005). Has Euro-Area Inflation Persistence Changed Over
Time? Review of Economics and Statistics, 87, 709-720.

Perron, P. (1989). The Great Crash, the Oil Price Shock and the Unit Root Hypothesis.
Econometrica, 57, 1361-1401.

Robinson, P.M. (1989). Nonparametric Estimation of Time-Varying Parameters. In
Statistical Analysis and Forecasting of Economic Structural Change, ed. by P. Hackl,
Springer Verlag: New York.

Swanson, E.T. and J.C. Williams (2014). Measuring the Effect of the Zero Lower Bound
on Medium- and Longer-Term Interest Rates. American Economic Review, 104, 3154-3185.

Wu, W.B. and Z. Zhou (2011). Gaussian Approximations for Non-Stationary Multiple
Time Series. Statistica Sinica, 21, 1397-1413.

Zhang, T. and W.B. Wu (2012). Inference of Time-Varying Regression Models. Annals of
Statistics, 40, 1376-1402.

Zhao, Z. and W.B. Wu (2008). Confidence Bands in Nonparametric Time Series Regression.
Annals of Statistics, 36, 1854-1878.

Zhou, Z. and W.B. Wu (2010). Simultaneous Inference of Linear Models with Time Varying
Coefficients. Journal of the Royal Statistical Society, Series B, 72, 513-531.

The Art and Science of Trading by Adam Grimespdf
80% (5)
The Art and Science of Trading by Adam Grimespdf
274 pages
Isle Royale WorkBook 2016
0% (1)
Isle Royale WorkBook 2016
24 pages
Chapter 5 Solutions Solution Manual Introductory Econometrics For Finance
No ratings yet
Chapter 5 Solutions Solution Manual Introductory Econometrics For Finance
9 pages
Executive Briefing For The Santa Fe Grill Case Study
0% (1)
Executive Briefing For The Santa Fe Grill Case Study
10 pages
14-BEJ671
No ratings yet
14-BEJ671
22 pages
Multi-Dimensional Point Process Models in R
No ratings yet
Multi-Dimensional Point Process Models in R
27 pages
CIR8
No ratings yet
CIR8
15 pages
Econometrics Chapter Six (1)
No ratings yet
Econometrics Chapter Six (1)
80 pages
Time Series and Sequential Data
No ratings yet
Time Series and Sequential Data
143 pages
Stable Marked Point Processes
No ratings yet
Stable Marked Point Processes
35 pages
Chapter11 PDF
No ratings yet
Chapter11 PDF
29 pages
skataric_sontag_nonhomogeneous_markov_ecc2014
No ratings yet
skataric_sontag_nonhomogeneous_markov_ecc2014
6 pages
entropy-21-00713
No ratings yet
entropy-21-00713
16 pages
Intro of Time Series
No ratings yet
Intro of Time Series
18 pages
Statistical Inference For Time-Inhomogeneous
No ratings yet
Statistical Inference For Time-Inhomogeneous
27 pages
Answers Review Questions Econometrics
84% (25)
Answers Review Questions Econometrics
59 pages
09 Ba402
No ratings yet
09 Ba402
22 pages
Statistical Modeling of Difusion Processes With Free Knot Splines
No ratings yet
Statistical Modeling of Difusion Processes With Free Knot Splines
24 pages
Qu Zhongjun
No ratings yet
Qu Zhongjun
41 pages
Jordan Philips Dynamac Stata
No ratings yet
Jordan Philips Dynamac Stata
30 pages
Thu 1340 Engle
No ratings yet
Thu 1340 Engle
36 pages
Elliott
No ratings yet
Elliott
24 pages
Testing For Cross-Sectional Dependence in Panel Data Models
No ratings yet
Testing For Cross-Sectional Dependence in Panel Data Models
13 pages
Time Series Economic Forecasting
No ratings yet
Time Series Economic Forecasting
4 pages
Path-Integral Evolution of Multivariate Systems With Moderate Noise
No ratings yet
Path-Integral Evolution of Multivariate Systems With Moderate Noise
15 pages
An Autoregressive Distributed Lag Modelling Approach To Cointegration Analysis
No ratings yet
An Autoregressive Distributed Lag Modelling Approach To Cointegration Analysis
33 pages
Multi-Variate Stochastic Volatility Modelling Using Wishart Autoregressive Processes
No ratings yet
Multi-Variate Stochastic Volatility Modelling Using Wishart Autoregressive Processes
13 pages
s00181-014-0870-2
No ratings yet
s00181-014-0870-2
13 pages
Pesaran - Shin - An Auto Regressive Distributed Lag Modelling Approach To Cointegration Analysis
No ratings yet
Pesaran - Shin - An Auto Regressive Distributed Lag Modelling Approach To Cointegration Analysis
33 pages
Bacry-kozhemyak-muzy-2008-Log Normal Continuous Cascades Aggregation Properties and Estimation-Applications To Financial Time Series
No ratings yet
Bacry-kozhemyak-muzy-2008-Log Normal Continuous Cascades Aggregation Properties and Estimation-Applications To Financial Time Series
27 pages
14.1 - Autoregressive Models: Autocorrelation and Partial Autocorrelation
No ratings yet
14.1 - Autoregressive Models: Autocorrelation and Partial Autocorrelation
34 pages
Time Series Analysis
No ratings yet
Time Series Analysis
5 pages
Press (1972)
No ratings yet
Press (1972)
6 pages
Integer Autoregressive Models With Structural Breaks
No ratings yet
Integer Autoregressive Models With Structural Breaks
18 pages
Engle 1982
No ratings yet
Engle 1982
22 pages
Identi Cation, Estimation and Testing of Conditionally Heteroskedastic Factor Models
No ratings yet
Identi Cation, Estimation and Testing of Conditionally Heteroskedastic Factor Models
0 pages
Discrete-Event Simulation of Fluid Stochastic Petri Nets
No ratings yet
Discrete-Event Simulation of Fluid Stochastic Petri Nets
9 pages
Time Series Prediction - Predicting Stock Price
No ratings yet
Time Series Prediction - Predicting Stock Price
8 pages
The Conditional Autoregressive Geometric Process Model For Range Data
No ratings yet
The Conditional Autoregressive Geometric Process Model For Range Data
35 pages
Pss Stata 2017
No ratings yet
Pss Stata 2017
27 pages
Averaging Oscillations With Small Fractional Damping and Delayed Terms
No ratings yet
Averaging Oscillations With Small Fractional Damping and Delayed Terms
20 pages
00 panels1e
No ratings yet
00 panels1e
20 pages
02 Basic Tools and Techniques
No ratings yet
02 Basic Tools and Techniques
116 pages
Econometrica: Eywords
No ratings yet
Econometrica: Eywords
51 pages
Financial Econometrics
No ratings yet
Financial Econometrics
16 pages
A Flexible Regime Switching Model With Pairs Trading Application to the S&P 500 High-frequency Stock Returns
No ratings yet
A Flexible Regime Switching Model With Pairs Trading Application to the S&P 500 High-frequency Stock Returns
15 pages
Assigment
100% (2)
Assigment
13 pages
Bayesian Dynamic Modelling: Bayesian Theory and Applications
No ratings yet
Bayesian Dynamic Modelling: Bayesian Theory and Applications
27 pages
Ku Satsu 160225
No ratings yet
Ku Satsu 160225
11 pages
Chapter 5
No ratings yet
Chapter 5
17 pages
236-240
No ratings yet
236-240
5 pages
Jump-Diffusion Stock-Return Model With Weighted Fitting of Time-Dependent Parameters
No ratings yet
Jump-Diffusion Stock-Return Model With Weighted Fitting of Time-Dependent Parameters
6 pages
8-Time Series Analysis
No ratings yet
8-Time Series Analysis
15 pages
Introductory Econometrics For Finance Chris Brooks Solutions To Review - Chapter 3
100% (2)
Introductory Econometrics For Finance Chris Brooks Solutions To Review - Chapter 3
7 pages
Dynamic Linear Models, Recursive Least Squares and Steepest-Descent Learning
No ratings yet
Dynamic Linear Models, Recursive Least Squares and Steepest-Descent Learning
11 pages
partialnongaussian
No ratings yet
partialnongaussian
18 pages
Dissetacao Mestrado
No ratings yet
Dissetacao Mestrado
19 pages
Econometrics Journal - 2002 - Lüutkepohl - Maximum Eigenvalue Versus Trace Tests For The Cointegrating Rank of A VAR
No ratings yet
Econometrics Journal - 2002 - Lüutkepohl - Maximum Eigenvalue Versus Trace Tests For The Cointegrating Rank of A VAR
24 pages
A Splitting Strategy For The Calibration of Jump-Diffusion Models
No ratings yet
A Splitting Strategy For The Calibration of Jump-Diffusion Models
34 pages
Co-Clustering: Models, Algorithms and Applications
From Everand
Co-Clustering: Models, Algorithms and Applications
Gérard Govaert
No ratings yet
Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications
From Everand
Stationary and Related Stochastic Processes: Sample Function Properties and Their Applications
Harald Cramér
4/5 (2)
Analytical Methods of Optimization
From Everand
Analytical Methods of Optimization
D. F. Lawden
No ratings yet
Lectures on the Coupling Method
From Everand
Lectures on the Coupling Method
Torgny Lindvall
No ratings yet
Syllabus of BBA II Year
No ratings yet
Syllabus of BBA II Year
26 pages
Research Proposal
No ratings yet
Research Proposal
1 page
Chapter 7. Statistical Intervals For A Single Sample
No ratings yet
Chapter 7. Statistical Intervals For A Single Sample
102 pages
Smart Skills Research Paper Questions
No ratings yet
Smart Skills Research Paper Questions
8 pages
1-28
100% (1)
1-28
42 pages
Zagheni Weber2015
No ratings yet
Zagheni Weber2015
13 pages
B1. Ecology and Life
No ratings yet
B1. Ecology and Life
72 pages
(AMALEAKS - BLOGSPOT.COM) Statistics (STAT-112) - Grade 11 Week 1-10
No ratings yet
(AMALEAKS - BLOGSPOT.COM) Statistics (STAT-112) - Grade 11 Week 1-10
100 pages
Practice Questions
No ratings yet
Practice Questions
6 pages
Case Studies in The Mathematical Statistics Course
No ratings yet
Case Studies in The Mathematical Statistics Course
5 pages
Lesson No. 28
No ratings yet
Lesson No. 28
5 pages
Logistic: Regression Sigmoid Function
No ratings yet
Logistic: Regression Sigmoid Function
4 pages
Chapter 1 Introduction
No ratings yet
Chapter 1 Introduction
8 pages
12 Sim Hw2 Sol
No ratings yet
12 Sim Hw2 Sol
3 pages
ML Project Report
No ratings yet
ML Project Report
16 pages
Technical Intelligence and Organizational Effectiveness of Foods and Beverages Manufacturing Firms in South South, Nigeria
No ratings yet
Technical Intelligence and Organizational Effectiveness of Foods and Beverages Manufacturing Firms in South South, Nigeria
8 pages
GRU-based Attention Mechanism For Human Activity Recognition
No ratings yet
GRU-based Attention Mechanism For Human Activity Recognition
6 pages
CONSOLATA NJERI..Docx..Bak
No ratings yet
CONSOLATA NJERI..Docx..Bak
27 pages
Repeated Measures ANOVA
No ratings yet
Repeated Measures ANOVA
41 pages
Multicollinearity Samiji
No ratings yet
Multicollinearity Samiji
13 pages
DWDM - Case Study On Weka - Ceb624
No ratings yet
DWDM - Case Study On Weka - Ceb624
13 pages
Chapter # 3: Research Methodology
No ratings yet
Chapter # 3: Research Methodology
3 pages
Inverted 5 Spot
No ratings yet
Inverted 5 Spot
12 pages
Stats Problems
No ratings yet
Stats Problems
9 pages
Hypothesis Testing
No ratings yet
Hypothesis Testing
63 pages
Solutions To Problem Set 1
No ratings yet
Solutions To Problem Set 1
4 pages
Assignment No 1
No ratings yet
Assignment No 1
5 pages

Rolling Regression Theory

Uploaded by

Rolling Regression Theory

Uploaded by

∗†

The Distribution of Rolling Regression Estimators

Keywords: Parameter instability; Nonparametric estimation; Rolling regressions; Uniform

2 Model and Assumptions

for finite c1 > 0.

Assumption 1 The true data generating process is given by

Assumption 2 The error process $\varepsilon_t$ is a martingale difference with respect to $\mathcal{F}_t$, so that

Assumption 3 For $x_t \varepsilon_t$ we have

Assumption 5 The smallest eigenvalue of $M(t)$ is bounded away from zero.

To illustrate the bias process, we consider a simple autoregressive model given by

To simplify further, if φ = 0, bias is

2.1 Critical Values

Assumption 6 The function β(s) has three continuous derivatives.

Assumption 6 imposes smoothness conditions on the behavior of the regression coefficients

for $r_1 < r_2$ (or 0 if $r_1 < r_2 - \lambda$).

3.1 Critical Values

where $\varepsilon_t \sim N(0, 0.25)$. The results are summarized in Table 2.

SRb1 SRb2 SRb3 adjRO RO

T = 200 0.0577 0.0520 0.0478 0.0650 0.0650

T = 200 0.3918 0.4001 0.4098 0.4978 0.2988

where $\varepsilon_t \sim N(0, 0.25)$. The results are given in Table 3.

Table 3. Constant AR, ρ = 0.80

SRb1 SRb2 SRb3 adjRO RO

T = 200 0.1975 0.1350 0.1059 0.1124 0.1124

T = 200 0.5782 0.5623 0.5592 0.6724 0.4036

$y_t = \rho(t/T)\, y_{t-1} + \varepsilon_t$ with $\rho(t/T) = 0.8\,[1 - (t/T)]$,

and our results appear in Table 4.
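
To make the design concrete, the following is a minimal sketch (not the authors' simulation code) of the time-varying AR(1) process above together with a plain rolling OLS estimate of the autoregressive coefficient. The function names, the seed, and the window of 40 observations are hypothetical choices for illustration, and 0.25 is assumed to be the error variance, so the standard deviation is 0.5.

```python
import numpy as np

def simulate_tv_ar(T, rho_fn, sigma=0.5, seed=0):
    """Simulate y_t = rho(t/T) * y_{t-1} + eps_t with y_0 = 0.

    sigma=0.5 assumes N(0, 0.25) denotes a variance of 0.25.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T + 1)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = rho_fn(t / T) * y[t - 1] + eps[t]
    return y

def rolling_ar1(y, window):
    """Rolling OLS slope (no intercept) of y_t on y_{t-1}."""
    x, z = y[:-1], y[1:]          # regressor and regressand
    n = len(z)
    est = np.empty(n - window + 1)
    for i in range(n - window + 1):
        xs, zs = x[i:i + window], z[i:i + window]
        est[i] = (xs @ zs) / (xs @ xs)
    return est

T = 200
window = 40  # hypothetical window width, chosen only for illustration
y = simulate_tv_ar(T, lambda s: 0.8 * (1.0 - s))
rho_hat = rolling_ar1(y, window)
```

Each entry of `rho_hat` estimates an average of ρ(t/T) over its window rather than the value at a single date, which is why the choice of window width matters for inference.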

Table 4. AR, ρ = 0.80[1 − (t/T )]

SRb1 SRb2 SRb3 adjRO RO

T = 200 0.1417 0.1131 0.0983 0.1235 0.1235

T = 200 0.6856 0.7028 0.7195 0.8850 0.5313

OLS Sb1 Sb2 Sb3

Table 5. Omitted trend, 2 × (t/T )

SRb1 SRb2 SRb3 adjRO RO

T = 200 0.1151 0.1031 0.0942 0.5620 0.5620

T = 200 0.7278 0.7570 0.7781 0.7692 0.4618

$y_t = \rho(t/T)\, y_{t-1} + \varepsilon_t$ with $\rho(t/T) = 0.8 \sin[2\pi \times (t/T)]$,

Table 6. AR, ρ = 0.80 sin[2π × (t/T )]

SRb1 SRb2 SRb3 adjRO RO

T = 200 0.1334 0.1214 0.1256 0.1202 0.1202

T = 200 0.6222 0.6384 0.6581 0.8040 0.4826

The results are presented in Table 7.

SRb1 SRb2 SRb3 adjRO RO

T = 200 0.1010 0.0962 0.0916 0.1909 0.1909

T = 200 0.7273 0.7561 0.7743 0.9395 0.5640

$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \phi_4 y_{t-4} + \varepsilon_t$.

A common transformation that is used to evaluate persistence is

$\Delta y_t = \alpha + \omega y_{t-1} + \gamma_1 \Delta y_{t-1} + \gamma_2 \Delta y_{t-2} + \gamma_3 \Delta y_{t-3} + \varepsilon_t$.
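
As a worked step, the mapping between the two parameterizations can be sketched by subtracting $y_{t-1}$ from both sides of the AR(4) and rewriting each deeper lag in terms of $y_{t-1}$ and lagged differences. This is the standard algebra for this reparameterization, written out here for convenience rather than quoted from the paper.

```latex
\begin{align*}
y_{t-k} &= y_{t-1} - \sum_{j=1}^{k-1} \Delta y_{t-j}, \qquad k = 2, 3, 4, \\
\Delta y_t &= \alpha + (\phi_1 + \phi_2 + \phi_3 + \phi_4 - 1)\, y_{t-1}
  - (\phi_2 + \phi_3 + \phi_4)\, \Delta y_{t-1} \\
  &\quad - (\phi_3 + \phi_4)\, \Delta y_{t-2}
  - \phi_4\, \Delta y_{t-3} + \varepsilon_t, \\
\omega &= \phi_1 + \phi_2 + \phi_3 + \phi_4 - 1, \quad
\gamma_1 = -(\phi_2 + \phi_3 + \phi_4), \quad
\gamma_2 = -(\phi_3 + \phi_4), \quad
\gamma_3 = -\phi_4 .
\end{align*}
```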

If $\omega = \phi_1 + \phi_2 + \phi_3 + \phi_4 - 1$ is zero, the autoregressive process has a unit root, which is an

Figure 2: Confidence bands for SR and RO

Figure 3: The smoothed rolling estimate.

Lemma 6 of Zhou and Wu (2010), we apply Doob’s inequality to

Given our bandwidth choice, we have

$\kappa_T = (Th)^{-1}\left[T^{1/\iota} + (Th \log T)^{1/2}\right] + h^2$

$D(r) = D^*(r) - D_1(r) + D_2(r)$

We show that $D_1(r)$ and $D_2(r)$ converge to zero uniformly in $r$.

$= M(G, u)^{-1} + O(h)$.

uniformly in $r$. Combining these results, we have

Given this representation, we apply Corollary 2 of Wu and Zhou (2011), so that

Robinson, P.M. (1989). Nonparametric Estimation of Time-Varying Parameters. in Sta-

Zhang, T. and W.B. Wu (2012). Inference of Time-Varying Regression Models. Annals of
