Change Point Analysis for Time Series
Change Point Analysis for Time Series
Lajos Horváth
Gregory Rice
Change Point
Analysis
for Time Series
Springer Series in Statistics
Series Editors
Peter Bühlmann, Seminar für Statistik, ETH Zürich, Zürich, Switzerland
Peter Diggle, Dept. Mathematics, University Lancaster, Lancaster, UK
Ursula Gather, Dortmund, Germany
Scott Zeger, Baltimore, MD, USA
Springer Series in Statistics (SSS) is a series of monographs of general interest that
discuss statistical theory and applications.
The series editors are currently Peter Bühlmann, Peter Diggle, Ursula Gather,
and Scott Zeger. Peter Bickel, Ingram Olkin, and Stephen Fienberg were editors of
the series for many years.
Lajos Horváth . Gregory Rice
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book started with a review article on change point analysis that we wrote for
the journal TEST in 2014. The primary goal of that article was to review and extend
some standard change point techniques to data that exhibited serial dependence, e.g.,
time series. The article was received warmly, and it generated some quite stimulating
discussion. Given this, and the crippling boredom of the pandemic lockdowns of
2020, we decided to expand that article into a book.
In the past decade, the pace of research in change point analysis has grown
tremendously. The topics covered in this book only cover a portion of recent
developments that are close to our own research interests. The focus is hence
on asymptotic results in change point analysis when the data are time series. We
consider such results in many different settings, including in applications to change
point analysis in popular regression and time series models, as well as to high-
dimensional and function valued time series.
We have tried to write this book so that it will be useful both as a reference
and as a textbook for researchers or graduate students who are trying to learn more
about the subject. Each chapter concludes with bibliographic notes and a number of
exercises that can be used to evaluate one’s grasp of the material, or in structuring
a reading or topics course. After the first chapter, which covers foundational
asymptotic results for cumulative sum processes derived from stationary variables,
each subsequent chapter contains real data examples, mainly in the areas of
economics, finance, and environmetrics, that illustrate the practical application of
the asymptotic results developed.
This book would not have been possible without the help and contributions of
a great many collaborators, students, and friends over the years. These especially
include Jaromir Antoch, Alexander Aue, Patrick Bardsley, István Berkes, Cooper
Boniece, Julian Chan, Shoja Chenourri, Stefan Fremdt, Tomasz Górecki, Robertas
Gabrys, Edit Gombay, Siegfried Hörmann, Zsuzsanna Horváth, Marie Hus̆ková,
Claudia Kirch, Mario Kühn, Piotr Kokoszka, Bo Li, Hemei Li, Shiqing Ling,
Zhenya Liu, Shanlin Lu, Nirian Martin, Curtis Miller, Leandro Pardo, William
Pouliot, Ron Reeder, Matthew Reimherr, Johannes Schauer, Qi-Man Shao, Ozan
vii
viii Preface
ix
x Contents
A Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
A.1 Weak Convergence and Approximations of Sums . . . . . . . . . . . . . . . . . . . . 501
A.1.1 Weak Convergence of the Empirical Processes
Based on Stationary Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
A.2 Properties of Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
A.3 Functional Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
Notation and Conventions
sargmax g(x) = min{y : g(y) = maxx∈A g(x)} ; the smallest maximal argument of
.
x∈A
g over the set A.
D D
→, .= ; converges in distribution, equal in distribution.
.
P a.s.
→, .→ ; converges in probability, converges almost surely.
.
D (I )
. → ; for a hypercube .I ⊆ Rd , weak convergence in the Skorokhod topology on
.D(I ).
A and B of .R.
.log+ x = max{log x, 1}.
E | |
. ∅ = 0, . ∅ = 1 ; empty sum is equal to zero, empty product is equal to one.
.0, .I ; zero vector in .R , identity matrix in .R
d d×d . d is made clear by context.
xiii
Chapter 1
Cumulative Sum Processes
In this chapter we introduce the basic change point in the mean problem for
scalar observations. We see that the most logical and straightforward approaches to
detect such a change point lead to the consideration of weighted functionals of the
cumulative sum (CUSUM) processes computed from the observed data. As such,
we begin by developing a comprehensive asymptotic theory for CUSUM processes
under conditions that allow for serial dependence in the observations. This includes
a careful analysis of how weights applied to the CUSUM process affect the limiting
distribution of its functionals, and extensions to multivariate observations.
1.1 Introduction
Here .k ∗ denotes a potential change point in the mean, and assuming that
holds, .μ0 and .μA describe the means of the observations before and after the change
point .k ∗ , respectively. The model (1.1.1) is called the At Most One Change (AMOC)
in the mean model. Detecting a change point may then be framed as a hypothesis
H0 : μ0 = μA ,
. (1.1.2)
i.e. there is no change in the mean of the observations, versus the alternative
HA : μ0 /= μA .
. (1.1.3)
If the change point .k ∗ were known, then the hypotheses .H0 and .HA revert to a two
sample problem to test for a difference in the population means between the samples
2
.X1 , . . . , Xk ∗ and .Xk ∗ +1 , . . . , XN . Assuming for the moment that .σ is also known,
it would be natural to test .H0 versus .HA using the two sample z-test. After some
simple algebra, this amounts to rejecting .H0 in favor of .HA for large values of the
test statistic
X̄k ∗ ,1 − X̄k ∗ ,2
. , (1.1.4)
σ [N/(k ∗ (N − k ∗ ))]1/2
where
1E E
k N
1
X̄k,1 =
. Xi and X̄k,2 = Xi . (1.1.5)
k N −k
i=1 i=k+1
If for instance the errors .E1 , . . . , EN are independent and identically distributed
normal random variables with a known variance, this is equivalent with the
likelihood ratio test for the equality of the means of two samples before and after
the point .k ∗ . Moreover, if .E1 , . . . , EN are independent and identically distributed,
but not necessarily normally distributed, then the above statistic has approximately
a standard normal distribution for large N due to the central limit theorem.
Since the change point .k ∗ is unknown, a logical test statistic to test .H0 versus .HA
is to maximize the two-sample test statistic over all potential change point locations,
which leads to considering the statistic
We will reject .H0 in favor of .HA or large values of .TN,1 . Some immediate
observations related to the definition of .TN,1 are as follows. Elementary algebra
gives
⎛ ⎞1/2 ||E |
k E ||
k N
[k(N − k)]1/2 N |
. |X̄k,1 − X̄k,2 | = | Xi − Xi | . (1.1.6)
N 1/2 k(N − k) | N |
i=1 i=1
1.1 Introduction 3
The process appearing on the right-hand side of (1.1.6) can be expressed in terms of
the cumulative sum (CUSUM) process,
⎛ ⎞
LN
E LNt⎦ E ⎠
t⎦ N
1 ⎝
.ZN (t) = Xi − Xi , t ∈ [0, 1], (1.1.7)
N 1/2 N
i=1 i=1
where .Lx⎦ denotes the integer part of x. The statistic .TN,1 is hence the maximum
of the weighted CUSUM process .ZN (t) with the weight function .w(t) = [t (1 −
t)]1/2 , 0 < t < 1, i.e.
|ZN (t)|
TN,1 =
. sup .
t∈[1/N,1−1/N ] [t (1 − t)]1/2
TN,1
. lim =σ a.s. (1.1.8)
N →∞ (2 log log N)1/2
As a result, when .H0 holds and the observations are independent and identically
P
distributed, the statistic .TN,1 satisfies .TN,1 → ∞ as .N → ∞. What then is an
appropriate threshold that if it is exceeded by .TN,1 we favor .H0 over .HA ? Or more
generally what is the approximate null distribution of .TN,1 ? We may ask the same
questions about more general change point test statistics of the form
|ZN (t)|
. sup
t∈(0,1) w(t)
for a weight function .w(·). In order to answer these questions, and many others
like it that arise in change point analysis, we begin with a detailed account of the
asymptotic properties of .ZN /w and its functionals in terms of the properties of
the model errors and the weight functions .w(·). Since most data that we wish to
apply change point analysis to are sequentially collected, either as time series or by
observing other sequential processes, it is to be expected that the observations are
serially dependent. As such it is useful to understand how serial dependence among
the observations influences the distribution of the process .ZN /w.
We say that .{W (x), x ≥ 0} is a Wiener process, or a standard Brownian motion,
if it is a continuous Gaussian process defined for .x ≥ 0 with .EW (x) = 0 and
.EW (x)W (y) = min(x, y). For the construction and properties of the Wiener
process, we refer to Csörgő and Révész (1981). The continuous Gaussian process
4 1 Cumulative Sum Processes
Assumption 1.1.1 For each N there are two independent Wiener processes
{WN,1 (t), 0 ≤ t ≤ N/2}, .{WN,2 (t), 0 ≤ t ≤ N/2} and .σ > 0 such that
.
| k |
|E |
| |
. sup | Ei − σ WN,1 (k)| = oP (N 1/2 )
1≤k≤N/2 |i=1
|
and
| N |
| E |
| |
. sup | Ei − σ WN,2 (N − k)| = oP (N 1/2 ).
N/2<k<N | i=k+1
|
( | | )
vm = E |Ei − Ei,m
.
∗ |ν 1/ν
≤ am−α with some a > 0 and α > 2,
∗ = g(η , . . . , η ∗ ∗ ∗
where .Ei,m i i−m+1 , ηi−m , ηi−m−1 , . . .), where .{ηk , k ∈ Z} are
independent, identically distributed copies of .η0 , independent of .{ηj , j ∈ Z}.
The notion of .Lν -decomposibility is discussed in greater detail in Sect. A.1
of the appendix. .Lν -decomposable sequences include observations generated from
1.1 Introduction 5
most stationary time series models, which are often defined through an underlying
innovation sequence and structural equations. These include stationary solutions
to autoregressive moving average (ARMA) models, generalized autoregressive
conditionally heteroscedastic (GARCH) models, as well as many other stationary
sequences. In essence Definition 1.1.1(2) aims to characterize the rate at which the
error sequence .Ei can be approximated by a sequence exhibiting a finite range of
dependence. We note that .Lν -decomposable errors satisfy Assumption 1.2.2 (see
Sect. A.1 of the appendix).
One of the great advantages afforded by the framework of .Lν -decomposibility
is the ease in which it generalizes to more complicated spaces and settings. For
example, if the observations and errors take their values in a general normed space
rather than .R, we may replace the absolute value .| · | in Definition 1.1.1 with the
norm .|| · || on the space, and immediately get a useful and general characterization
of weakly dependent sequences in that space. As such we make use of this notion
throughout this text to unify the study of asymptotics in change point analysis for
serially dependent observations over many different spaces and settings.
Under Assumption 1.1.1, the CUSUM process can be approximated by a
Brownian bridge.
Theorem 1.1.1 If .H0 of (1.1.2) and Assumption 1.1.1 are satisfied, we may define
a sequence of Brownian bridges .{BN (t), 0 ≤ t ≤ 1} such that
Proof We note that under the null hypothesis .ZN (t) does not depend on the mean,
and so may be expressed in terms of the errors .E1 , . . . , EN . We write
⎧ ⎛ ⎞
⎪ Ek LN/2⎦
E EN
⎪
⎪ k
⎪
⎪
⎪ Ei − ⎝ Ei + Ei ⎠ ,
⎪
⎪ i=1 N
⎪
⎪ i=1 i=LN/2⎦ +1
Ek
k E
N ⎨
if 1 ≤ k ≤ N/2, ⎛ ⎞
. Ei − Ei = (1.1.9)
N ⎪
⎪ EN
−
LN/2⎦
E E
N
⎪
⎪ N k ⎝ Ei ⎠ ,
i=1 i=1
⎪
⎪ − Ei + Ei +
⎪
⎪ N
⎪
⎪ i=k+1 i = 1 i = LN/2⎦ + 1
⎩
if N/2 < k < N.
By Assumption 1.1.1,
|/ k \ |
| E k E
N |
| |
. max | Ei − Ei − σ ┌N (k)| = oP (N 1/2 ). (1.1.11)
1≤k≤N | N |
i=1 i=1
If
then for each N , .BN is a continuous Gaussian process, and one can check that
EBN (t) = 0 and .EBN (t)BN (s) = min(t, s)−ts. This implies that .BN is distributed
.
= oP (1),
since by the uniform continuity of the Brownian bridge (see Appendix A.2)
U
⨅
Theorem 1.1.1 implies the weak convergence of functionals of .ZN that are
continuous with respect to the supremum norm. For example, the Kolmogorov–
Smirnov and Cramér–von Mises functionals of .ZN satisfy under Assumption (1.1.1)
and .H0 :
1 D
. sup |ZN (t)| → sup |B(t)|, (1.1.13)
σ 0≤t≤1 0≤t≤1
and
/ 1 / 1
1 D
.
2
ZN (t)dt → B 2 (t)dt. (1.1.14)
σ2 0 0
Two asymptotically size .α tests of .H0 under Assumption 1.1.1 may be constructed
by rejecting .H0 if either statistic on the left-hand side of (1.1.13) or (1.1.14) exceed
the .1 − α quantile of their limiting distributions. In order to practically implement
tests, one usually needs to replace .σ 2 with a consistent estimator, say .σ̂N2 , satisfying
|σ̂N2 − σ 2 | = oP (1).
. (1.1.15)
1.2 Weak Convergence of Weighted CUSUM Processes 7
1 D
. sup |ZN (t)| → sup |B(t)|
σ̂N 0≤t≤1 0≤t≤1
and
/ 1 / 1
1 D
.
2
ZN (t)dt → B 2 (t)dt.
σ̂N2 0 0
It has been observed that functionals of the weighted CUSUM process .ZN /w with
weights .w(·) that ascribe more weight to the CUSUM process near zero and one
can improve the power of CUSUM statistics to detect changes that might occur
anywhere in the sample, especially near the end points. In this subsection we
consider the asymptotic properties of weighted CUSUM processes.
Before proceeding, we note that the Brownian bridge .{B(t), 0 ≤ t ≤ 1} is
“symmetric” in the sense that it has the same behavior around 0 and 1. However,
.{ZN (t), 0 ≤ t ≤ 1} behaves differently around 0 and 1. According to its definition,
such it may be shown that .ZN (t)/w(t) → ∞ a.s., as .t → 1, for any N , if .w(1) = 0.
So following Csörgő and Horváth (1993) we modify the definition of .ZN (t). Let
QN (t) = ZN (t (N + 1)/N),
. 0<t <1 (1.2.1)
and define
|QN (t)|
. TN,2 = sup .
0<t<1 w(t)
Assumption 1.2.1 (i) .infδ≤t≤1−δ w(t) > 0 for all .0 < δ < 1/2, (ii) .w(t) is non
decreasing in a neighbourhood of 0, (iii) .w(t) is non increasing in a neighbourhood
of 1.
Typical examples of weight functions that we consider are of the form .w(t) =
[t (1 − t)]β , which satisfy Assumption 1.2.1. Let
/ 1 ⎛ ⎞
1 cw 2 (t)
I (w, c) =
. exp − dt. (1.2.4)
0 t (1 − t) t (1 − t)
The necessary and sufficient condition for (1.2.3) is given by an integral test
concerning .I (w, c).
Theorem 1.2.1 If Assumption 1.2.1 is satisfied, then (1.2.3) holds if and only if
I (w, c) < ∞ for some .c > 0.
.
Csörgő and Horváth (1993), (p. 181) contains a detailed proof of Theorem 1.2.1.
In order to establish (1.2.2), we need a stronger condition than Assumption 1.1.1
containing a rate of approximation:
Assumption 1.2.2 For each N there are two independent Wiener processes
{WN,1 (t), 0 ≤ t ≤ N/2}, .{WN,2 (t), 0 ≤ t ≤ N/2}, .σ > 0 and .ζ < 1/2 such
.
that
| k |
|E |
| |
. sup k −ζ | Ei − σ WN,1 (k)| = OP (1)
1≤k≤N/2 | |
i=1
and
| N |
| E |
−ζ | |
. sup (N − k) | Ei − σ WN,2 (N − k)| = OP (1).
N/2<k<N | |
i=k+1
1 D |B(t)|
. TN,2 → sup , (1.2.5)
σ 0≤t≤1 w(t)
where .{BN (t), 0 ≤ t ≤ 1} is the Brownian bridge of (1.1.12). Using Theorem A.2.1
we show that
|W (Nt) − W (LNT ⎦ )
. sup = OP (1), (1.2.7)
1/(N +1)≤t≤1−1/(N +1) (log+ Nt)1/2
where .{W (t), t ≥ 0} is a Wiener process. According to Theorem A.2.1 for any
M > 0 we have that
.
⎧ ⎞
|W (Nt) − W (LNT ⎦ )
.P sup >M
1/(N +1)≤t≤1−1/(N +1) (log+ (N t))1/2
⎧ ⎞
|W (k + s) − W (k)
≤P max sup >M
1/2≤k≤N +1 0≤s≤1 (log+ (k))1/2
∞
⎧ ⎞
E |W (k + s) − W (k)
≤ P sup >M
0≤s≤1 (log+ (k))1/2
k=0
∞
E ⎛ ⎞
M2
≤ c1 k exp − log k
3
k=2
→ 0, as M → ∞,
where .c1 is a positive constant. This gives (1.2.7). By Assumption 1.2.2 we get
| |
|LN t⎦ |
|E |
sup (N t)−ζ | E − σ W (N t) | = OP (1) (1.2.8)
.
| i N,1 |
1/(N +1)≤t≤1/2 | i=1 |
and
| |
| E |
| N |
sup (N (1 − t))−ζ | E − σ W (N (1 − t))|
.
| i N,2 |
1/2<t<1−1/(N +1) |i=LN t⎦ +1 |
= OP (1). (1.2.9)
= OP (1).
Since .I (w, c) < ∞ with some .c > 0 and Assumption 1.2.1(ii) holds, Csörgő and
Horváth (1993), (p. 180) yields
t 1/2
. lim = 0, (1.2.12)
t→0 w(t)
and therefore (1.2.11) follows from (1.2.6). It follows similarly that for all .x > 0
⎧ ⎞
|QN (t) − BN (t)|
. lim lim sup P sup >x = 0. (1.2.13)
δ→0 N→∞ 1−δ≤t≤1−1/(N +1) w(t)
and similarly
|QN (t)|
. sup = oP (1).
1−1/(N +1)<t≤1 w(t)
if and only if
/ 1 [t (1 − t)]p/2
. dt < ∞. (1.2.15)
0 w(t)
Mimicking the proof of Theorem 1.2.2 one can verify the following result:
Theorem 1.2.4 If .p ≥ 1, .H0 of (1.1.2), Assumptions 1.2.1(i), 1.2.2 and (1.2.15)
are satisfied, then
/
1 D
1 |B(t)|p
. TN,3 → dt, (1.2.16)
σp 0 w(t)
1 |QN (t)|
TN,5 =
. sup ,
σ 0<t<1 [t (1 − t)]1/2
P
where .QN (t) is defined in (1.2.1). As a result of (1.1.8) we have that .TN,4 → ∞
P
and .TN,5 → ∞ under .H0 . Also, due to Theorem 1.2.1, (1.2.2) cannot hold since
.I ([t (1 − t)]
1/2 , c) = ∞ for all .c > 0. One alteration of .T
N,4 that leads to a statistic
with a well defined limiting distribution under .H0 and Assumption 1.1.1 is to “trim”
the domain on which the CUSUM process is maximized. This leads to statistics of
the form
⎛ ⎞1/2 ||E |
k E ||
k N
1 1 N |
.TN,6 = TN,1 = max | Xi − Xi | ,
σ σ LN α1 ⎦ ≤k≤LN α2 ⎦ k(N − k) | N |
i=1 i=1
where .0 < α1 < α2 < 1. It follows from Theorem 1.1.1 and the continuous
mapping theorem that
D |B(t)|
TN,6 →
. sup , (1.2.17)
α1 ≤t≤α2 [t (1 − t)]1/2
|B(t)| D
. sup = sup |V (u)|,
α1 ≤t≤α2 [t (1 − t)]1/2 0≤u≤c(α1 ,α2 )
where
⎛ ⎞
1 α2 α1
c(α1 , α2 ) =
. log − log
2 1 − α2 1 − α1
Two drawbacks of statistics sharing the form of .TN,6 are that they have reduced
power to detect change points that occur outside of the interval .[LNα1 ⎦ , LNα2 ⎦ ],
and they also depend on the practitioner’s choice of .α1 and .α2 .
Alternatively, it can be shown that .TN,4 and .TN,5 converge in distribution to
extreme-value laws upon proper centralization and normalization, which we now
show. The following result is sometimes referred to in the literature as a “Darling–
Erdős” result, since the basic idea behind it is to apply a Gaussian approximation as
introduced in Darling and Erdős (1956).
Let for .x > 0
1 1
a(x) = (2 log x)1/2
. and b(x) = 2 log x + log log x − log π. (1.2.18)
2 2
Theorem 1.2.5 If .H0 of (1.1.2) and Assumption 1.2.2 are satisfied, then
{ } ( )
. lim P a (log N) TN,5 ≤ x + b (log N) = exp −2e−x
N →∞
for all .x ∈ R.
Proof We divide the unit interval .(0, 1) into 5 subintervals. Let .t1 = t1 (N ) =
(log N )4 /N, .t2 = t2 (N ) = 1 − (log N)4 /N, and define
and
1 |QN (t)|
AN,5 =
. sup .
σ 1−1/(N +1)<t<1 [t (1 − t)]1/2
|BN (t)|
. sup = OP ((log log log N )1/2 ), (1.2.20)
1/(N +1)≤t≤t1 [t (1 − t)]1/2
14 1 Cumulative Sum Processes
|BN (t)|
. sup = OP ((log log log N )1/2 ), (1.2.21)
t2 ≤t≤1−1/(N +1) [t (1 − t)] 1/2
and
1 |BN (t)| P
. sup → 1, (1.2.22)
(2 log log N) t1 ≤t≤t2 [t (1 − t)]1/2
1/2
where .BN is the Brownian bridge defined in (1.2.6). The approximation in (1.2.6)
with (1.2.20)–(1.2.22) yields that
1 |QN (t)| P
. sup → σ,
(2 log log N)1/2 1/(N +1)≤t≤1−1/(N +1) [t (1 − t)]
1/2
and
and similarly,
⎛ ⎞
.AN,5 = OP N −1/2 .
Hence we only need the to establish the extreme value limit result for the
supremum of the standardized Brownian bridge on .[1/(N +1), 1−1/(N +1)], noting
that the distribution of .BN does not depend on N. This is shown in Theorem A.2.3
of the appendix, which implies that
⎧ ⎛ ⎞ ⎛ ⎞⎞
1 |BN (t)| 1
. lim P a rN sup ≤x+a rN
1/(N +1)≤t≤1/(N +1) [t (1 − t)]
N →∞ 2 1/2 2
= exp(−2e−x )
for all .x ∈ R, where .rN = (1 − 1/(N + 1))2 (N + 1)2 . Elementary arguments give
that
| ⎛ ⎞ |
| 1 |
. |a r − a(log N)| = o((log log N )−1/2 )
| 2
N |
and
| ⎛ ⎞ |
| 1 |
| |
| 2 rN − b(log N)| = o(1),
. b
and
/ ∞
b∗ (p) =
. |x|p φ(x)dx,
−∞
where
⎛ ⎞
1 1 2
.φ(x) = exp − x
(2π )1/2 2
It may be shown under the no-change null hypothesis, and with independent and
identically distributed errors in model (1.1.1), that the limiting distribution of
.sup0<t<1 |QN (t)|/[t (1 − t)] for .κ > 1/2 is not the supremum of a Gaussian pro-
κ
|E |
| |
cess, but rather is determined by the random variable .sup1≤k<∞ (1/k κ ) | kj =1 Xj |.
In order to modify such statistics so that their limit distribution is a functional of
a Gaussian process, and to increase their power against change points that may lie
near the end points of the sample, Rényi suggested applying such heavier weights
to the CUSUM process, but with an alternate trimming scheme when compared to
that used in (1.2.17). Let the trimming parameters .t1 = t1 (N ) < t2 = t2 (N ) satisfy
the following condition:
Assumption 1.2.3 (i) .min(t1 (N ), 1−t2 (N )) → 0, (ii) .N min(t1 (N ), 1−t2 (N )) →
∞.
We define the statistic
where
rN = min(t1 (N ), 1 − t2 (N )).
. (1.2.27)
The limiting distribution of .TN,7 may be expressed using the random variables
D D
a1 (κ) and .a2 (κ), which we take to be independent such that .a1 (κ) = a2 (κ) =
.
rN rN
. lim = γ1 , lim = γ2 , (1.2.28)
N →∞ t1 (N ) N →∞ 1 − t2 (N )
and
κ−1/2 κ−1/2
a(κ) = max(γ1
. a1 (κ), γ2 a2 (κ)). (1.2.29)
Theorem 1.2.7 If .H0 of (1.1.2), Assumptions 1.2.2 and 1.2.3 are satisfied, and .κ >
1/2, then
D
TN,7 → a(κ).
.
Proof It follows from Assumption 1.2.2 that with the sequence of Brownian bridges
in (1.1.12) we have
Hence
⎛ ⎞
= OP (N t1 )ζ −1/2 + (N (1 − t2 ))ζ −1/2
= oP (1)
using Assumption 1.2.3(ii), since .ζ < 1/2. Now the result follows from Theo-
rem A.2.5. U
⨅
18 1 Cumulative Sum Processes
Remark 1.2.2 The “difference in sample mean” version of the statistic suggested
by Rényi is
| |
.TN,8 = max |X̄k,1 − X̄k,2 | ,
a(N )≤k≤b(N )
where .X̄k,1 and .X̄k,2 are the sample means of the first k and last .N − k observations,
as defined in (1.1.5). To get the limit distribution of .TN,8 , we apply Theorem 1.2.7
with .κ = 1.
The .Lp analogue of .TN,7 is
/
κ−p/2+1 1 t2 |QN (t)|p
. TN,9 = rN dt,
σp t1 [t (1 − t)]κ
where .rN is defined in (1.2.27). Let .b1 (p, κ) and .b2 (p, κ) be independent random
variables such that
/ ∞
D D |W (t)|p
.b1 (p, κ) = b2 (p, κ) = dt.
1 tκ
where .γ1 and .γ2 are defined in (1.2.28). The proof of the below result mimics
that of Theorem 1.2.7, where the salient difference lies in replacing the use of
Theorem A.2.5 with Theorem A.2.6.
Theorem 1.2.8 Let .p ≥ 1. If .H0 of (1.1.2), Assumptions 1.2.2 and 1.2.3 are
satisfied, and .κ > p/2 + 1, then
D
.TN,9 → b(p, κ).
where .Xi , μ0 , μA and .E i take values in .Rd , .EE i = 0, and .k ∗ is an unknown change
point. One often wishes to test .H0 of (1.1.2) against the alternative .HA in (1.1.3)
with .μ0 and .μA replacing .μ0 and .μA , respectively.
We introduce the d dimensional cumulative sums (CUSUM) process
⎛ ⎞
LN
E LNt⎦ E ⎠
t⎦ N
ZN (t) = N −1/2 ⎝
. Xi − Xi , 0 ≤ t ≤ 1.
N
i=1 i=1
Let .|| · || and .·T denote respectively the Euclidean or Frobenius norm and the
transpose of vectors and matrices. We replace Assumption 1.1.1 with
Assumption 1.3.1 For each N there are two independent Gaussian processes
{┌ N,1 (t), 0 ≤ t ≤ N/2}, .{┌ N,2 (t), 0 ≤ t ≤ N/2} with values in .Rd such that
.
|| k ||
||E ||
|| ||
. sup || E i − ┌ N,1 (k)|| = oP (N 1/2 ),
1≤k≤N/2 || ||
i=1
|| N ||
|| E ||
|| ||
. sup || E i − ┌ N,2 (N − k)|| = oP (N 1/2 ),
N/2<k≤N || ||
i=k+1
.EBN (t) = 0 and .EBN (t)BT N (s) = (min(t, s) − ts)E. Hence the coordinates of
1/2
.BN (t) are dependent Brownian bridges scaled with .σ
i,i , 1 ≤ i ≤ d. If .E is
nonsingular, the weak convergence in (1.3.2) can be rewritten as
| |
| |
. sup |ZT
N (t)E −1
ZN (t) − B̄T
N (t) B̄N (t) | = oP (1), (1.3.3)
0≤t≤1
20 1 Cumulative Sum Processes
where .B̄(t) = (B̄N,1 (t), . . . , B̄N,d (t))T and .{B̄N,i (t), 0 ≤ t ≤ 1}, i ∈ {1, . . . , d}
are independent Brownian bridges. To use (1.3.2) or (1.3.3) in statistical inference,
we often need a consistent estimator .Ê N satisfying
such that
| k |
|E |
| |
. sup k −ζ | E i − ┌ N,1 (k)| = OP (1)
1≤k≤N/2 | |
i=1
and
| N |
| E |
−ζ | |
. sup (N − k) | E i − ┌ N,2 (N − k)| = OP (1),
N/2<k<N | |
i=k+1
QN (t) = ZN (t (N + 1)/N),
. 0 ≤ t ≤ 1. (1.3.4)
and
||QN (t) − BN (t)||
N 1/2−ζ
. sup = OP (1), (1.3.6)
1/(N +1)≤t≤1−1/(N +1) [t (1 − t)]ζ
where
and
⎧ x ( )
⎪
⎪┌ N,1 (x) − ┌ N,1 (N/2) + ┌ N,2 (N/2) ,
⎪
⎪
⎪
⎨
N
if 0 ≤ x ≤ N/2,
.┌ N (x) =
⎪ N −x ( ) (1.3.7)
⎪
⎪−┌ N,2 (N − x) + ┌ N,1 (N/2) + ┌ N,2 (N/2) ,
⎪
⎪ N
⎩
if N/2 ≤ x≤N.
Elementary arguments yield that .BN ∈ Rd is a Gaussian process, and for .t, s ∈
[0, 1], .EBN (t) = 0, and .EBN (t)BTN (s) = (min(t, s) − ts)E. The proofs of The-
orems 1.2.2–1.2.8 use weighted approximations. Using the analogous multivariate
approximations in (1.3.5) and (1.3.6), we can obtain asymptotic approximations for
vector valued processes or for corresponding quadratic forms of those processes.
Assumption 1.3.3 .E is a nonsingular matrix.
If Assumption 1.3.3 holds, we can linearly transform the limit process into a
multivariate process with components that are independent Brownian bridges. Since
the distribution of .{BN (t), 0 ≤ t ≤ 1} does not depend on N , we define .B and .B̄ as
D
. {B(t), 0 ≤ t ≤ 1} = {BN (t), 0 ≤ t ≤ 1} and
{ }D{ }
B̄(t), 0 ≤ t ≤ 1 = E −1/2 B̄N (t), 0 ≤ t ≤ 1 ,
where .E 1/2 is the (positive definite) square root of .E, and .E −1/2 = (E 1/2 )−1 . We
recall the integral criterion based on .I (w, c) from Theorem 1.2.1. Based on the
weighted approximation of the .Rd valued CUSUM process .QN (t), we have that
and
QT −1
N (t)E QN (t) D ||B̄(t)||2
. sup → sup (1.3.8)
0<t<1 w 2 (t) 2
0<t<1 w (t)
if and only .I (w, c) < ∞ for some .c > 0. It is more difficult to obtain the general-
ization of the Darling–Erdős result of Theorem 1.2.5 to .QN (t). Since .cov(B(t)) =
t (1−t)E and .cov(B̄(t)) = t (1−t)Id , a standardized vector-valued CUSUM process
is obtained by weighting the CUSUM process with the weight function .[t (1−t)]1/2 .
The limit distribution of .sup1/(N +1)≤t≤1−1/(N +1) ||B(t)||/[t (1 − t)]1/2 is unknown
for general .E. It is known though for .B̄ (t), and is given in Theorem A.2.7. Let
d
a(x) = (2 log x)1/2
. and bd (x) = 2 log x + log log x −log ┌(d/2), (1.3.9)
2
22 1 Cumulative Sum Processes
for all .x ∈ R.
Proof It follows as in the proof (1.2.24) in establishing Theorem 1.2.5 that
| |
| (QT −1 1/2 (BT −1 1/2 |
| N (t)E QN (t)) N (t)E BN (t)) |
. | sup − sup |
|0<t<1 [t (1 − t)]1/2 1/(N +1)≤t≤1−1/(N +1) [t (1 − t)]1/2 |
⎛ ⎞
= oP (log log N)−1/2 .
We note that
⎧ d ⎞
{ }
T −1 D E 2
. BN (t)E BN (t), 0 ≤ t ≤ 1 = Bi (t), 0 ≤ t ≤ 1 ,
i=1
where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d} are independent Brownian bridges. Now
the result follows from Theorem A.2.7 in the Appendix. U
⨅
/ / d \p/2 / d \p/2
E E | |
d
a∗ (p, d) =
. xi2 yi2 (2π(1 − exp(−|u|))−1/2
R2d+1 i=1 i=1 i=1
⎛ ⎛ ⎞
1
× exp − (x 2 + yi2 − 2 exp(−|u|/2)xi yi
2(1 − exp(−|u|)) i
| |
d ⎞ /| |
d
\/ d \
| |
− φ(xi )φ(yi ) × xi yi du,
i=1 i=1 i=1
where .┌(x) is the Gamma function. The proof of the following result is similar to
that of Theorem 1.3.1.
1.3 Multivariate CUSUM Processes 23
Theorem 1.3.2 Let .p ≥ 1. If .H0 , Assumptions 1.3.2 and 1.3.3 are satisfied, then
⎛ ⎞1/2 ⎛ / −1 ⎞
1 1 (QT
N (t)E QN (t))
p/2
D
. dt − 2b∗ (p, d) → N ,
4a∗ (p, d) log N 0 [t (1 − t)]p/2+1
D D ||W(t)||
ā1,E (d, κ) = ā2,E (κ) = sup
. .
1≤t<∞ tκ
Let
κ−1/2 κ−1/2
āE (d, κ) = max(γ1
. ā1,E (d, κ), γ2 ā2,E (d, κ)). (1.3.10)
Similarly, let .b̄1,E (p, κ), b̄2,E (p, κ) be independent and identically distributed with
/ ∞
D D ||W(t)||p
.b̄1,E (p, κ) = b̄2,E (p, κ) = dt,
1 tκ
and define
κ−p/2−1 κ−p/2−1
b̄E (p, κ) = γ1
. b̄1,E (p, κ) + γ2 b̄2,E (p, κ).
Theorem 1.3.3 If .H0 , Assumptions 1.2.3 and 1.3.2 hold, and .κ > 1/2, then we
have
where .rN and .āE (d, κ) are defined in (1.2.27) and (1.3.10).
The last result of this section gives the convergence in distribution of the integrals
of the heavily weighted CUSUM process.
24 1 Cumulative Sum Processes
Theorem 1.3.4 Let .p ≥ 1. If .H0 , Assumptions 1.2.3 and 1.3.2 hold, .κ > p/2 + 1,
then we have
/ t2
κ−p/2−1 ||QN (t)|| D
.r dt → b̄E (p, κ),
t1 [t (1 − t)]
N κ
∗ (t).
The following result is a restatement of Theorem 1.3.5 for the processes .zj,N
Theorem 1.3.6 We assume that .H0 , Assumptions 1.3.2 and 1.3.3 hold.
(i) If Assumption 1.2.1 is satisfied and .I (w, c) < ∞ with some .c > 0, then for any
.p ∈ {1, . . . , d},
∗
−1/2 |zN,j (t)| D |Bj (t)|
. max max λ → max max ,
1≤j ≤p 0<t<1 j w(t) 1≤j ≤p 0<t<1 w(t)
= exp(−2pe−x ),
1.4 Exercises
Ei = ρEi−1 + ηi ,
. i ∈ Z,
where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 , E|η0 |ν < ∞ with some ν > 2 and |ρ| < 1. Show that there
is a Wiener process {W (x), x ≥ 0} such that
| k |
|E |
| σ |
.| Ei − W (k)| = oP (k 1/ν ).
| 1−ρ |
i=1
E
p E
q
.Ei = φj Ei−j + ηi + ψj ηi−j , i ∈ Z,
j =1 j =1
26 1 Cumulative Sum Processes
where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 and E|η0 |ν < ∞ with some ν > 2. Find τ and construct a
Wiener process {W (x), x ≥ 0} such that
| k |
|E |
| |
.| Ei − τ W (k)| = oP (k 1/ν ).
| |
i=1
E
M
Ei =
. cl ηi−l i ∈ Z,
l=−M
where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 and E|η0 |ν < ∞ with some ν > 2 (finite order linear process).
Find τ and construct a Wiener process {W (x), x ≥ 0} such that
| k |
|E |
| |
.| Ei − τ W (k)| = oP (k 1/ν ).
| |
i=1
E
p E
q
Ei =
. φj Ei−j + ηi + ψj ηi−j , i ∈ Z,
j =1 j =1
where X̄j,k is defined in (2.6.1). Compute the limit distribution of TN under the null
hypothesis.
Exercise 1.4.8 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = μ2 = . . . = μN against the at least
one change alternative using the statistic
{
TN =
. max N −3/2 j (k − j )|X̄0,j − X̄j,k | + (k − j )(l − k)|X̄j,k − X̄k,l |
1≤j <k<l
}
+(l − k)(N − l)|X̄k,l − X̄l,N | .
We have seen that under the no change in the mean null hypothesis .H0 , and
assuming the observations satisfy a functional version of the central limit theorem
(Assumptions 1.1.1 and 1.2.2), that the asymptotic distribution of many functionals
of the CUSUM process may be computed. Since the CUSUM process arises as
the objective function in maximally selecting two sample test statistics to test .H0
versus .HA , it stands to reason that, in the presence of change points in the series, the
functionals of the CUSUM process that we have considered should be consistent
in the sense that they diverge in probability to positive infinity as the sample size
grows. One goal of this chapter is to carefully quantify the asymptotic behaviour of
the CUSUM process in the presence of change points.
When several change points are thought to exist in the sequence of observations,
it is natural to estimate their locations with the points at which the CUSUM process
achieves its largest values. In this section we define such estimators, and establish
their asymptotic properties in the presence of a change point in the series, as well
as under local alternatives in which the magnitude of the change in the mean of
the series decreases with the sample size. These can be used to compute confidence
intervals for the change point locations.
There exist a great many methods to detect and estimate multiple change points in
a sequence of observations, and we discuss the consistency properties of two such
approaches: binary segmentation, and model selection techniques using penalized
loss functions (e.g. information criteria).
To begin, we consider the case when there is exactly one change in the mean, i.e. the
observations follow (1.1.1) under .HA . Letting again .μ0 and .μA denote the means
before and after the change, the CUSUM process may be written as
Σ
k
k Σ
N Σ
k
k Σ
N
. Xi − Xi = Ei − Ei + V (k), (2.1.1)
N N
i=1 i=1 i=1 i=1
where
⎧
⎪ k(N − k ∗ )
⎨ ΔN , if 1 ≤ k ≤ k ∗ ,
.V (k) = ∗ N (2.1.2)
⎩ k (N − k) ΔN ,
⎪
if k ∗ + 1 ≤ k ≤ N,
N
and .ΔN = μ0 − μA . We allow that the change magnitude .ΔN might depend on N,
and may vanish as N increases. Similarly we use .θN = k ∗ /N to denote the break
fraction.
Theorem 2.1.1 We assume that .HA of (1.1.3) and Assumption 1.2.2 are satisfied.
(i) If .0 ≤ κ < 1/2, then
|QN (t)| P
. sup → ∞
0<t<1 [t (1 − t)] κ
if and only if
|QN (t)| P
(log log N)−1/2 sup
. → ∞
0<t<1 [t (1 − t)]
1/2
if and only if
|QN,E (t)|
. sup = OP (1),
0<t<1 [t (1 − t)]
κ
where
⎛ ⎞
L(NΣ
+1)t⎦
L(N + 1)t⎦ Σ ⎠
N
1 ⎝
QN,E (t) =
. Ei − Ei .
N 1/2 N
i=1 i=1
2.1 CUSUM Statistics in the Presence of Change Points 31
Note that .V (k ∗ ) = NθN (1 − θN )ΔN , and that the largest value that .[N 2κ V (k)]/
[k(N − k)]κ takes for .k ∈ {1, . . . , N } is at .k = k ∗ . As a result, we get that
|V (L(N + 1)t⎦|
. sup N −1/2 = [θN (1 − θN )]1−κ N 1/2 |ΔN | → ∞ (2.1.5)
0<t<1 [t (1 − t)]κ
if and only if (2.1.3) holds, giving part .(i) of Theorem 2.1.1. The proof of part .(ii)
requires only replacing Theorem 1.2.2 with Theorem 1.2.5. ⨆
⨅
Theorem 2.1.1 can be interpreted that for changes that are close to the boundary,
i.e. .θN close to zero or one, smaller changes are more easily detected for larger
values of .κ. Regardless of the value of .κ though, it is easiest to detect changes that
occur in the middle of the sample .(θN = 1/2).
Next we consider the behavior of functionals of weighted CUSUM processes
under local alternatives when the change occurs at a location bounded away from
the end points of the sample.
Δ
μ0 = μ
. and μA = μ + with Δ /= 0.
N 1/2
Let
⎧
t (1 − θ ), if 0 ≤ t ≤ θ
gθ (t) =
.
θ (1 − t), if 0 ≤ t ≤ 1.
Theorem 2.1.2 If .HA of (1.1.3) holds, and Assumptions 1.2.2, 2.1.1 and 2.1.2 are
satisfied, then
t 1/2 (1 − t)1/2
. → 0 (t → 0) and → 0 (t → 1)
w(t) w(t)
(see Section 4.1 of Csörgő and Horváth (1993)). Hence .sup0<t<1 |gθ (t)|/w(t)
is finite. As such according to Theorem 1.2.1 the limit in (2.1.6) is finite with
probability 1. It follows from elementary calculation that for a positive constant .c1 ,
32 2 Change Point Analysis of the Mean
|N −1/2 V (L(N + 1)t⎦) − Δg(t)| ≤ c1 t/N for .t ≤ θ , and .|N −1/2 V (L(N + 1)t⎦) −
.
We showed in the proof of Theorem 1.2.2 that under Assumption 1.2.2 there exists
a sequence of Brownian bridges .{BN (t), 0 ≤ t ≤ 1} such that
Now the result follows from (2.1.7) and (2.1.8) since the distribution of .{BN , 0 ≤
t ≤ 1} does not depend on N . ⨆
⨅
The following result can be used to obtain a description of the power function of
functionals of weighted CUSUM processes under condition (2.1.3).
Theorem 2.1.3 If .HA of (1.1.3), and Assumption 1.2.2 are satisfied, .0 ≤ κ < 1/2,
max{1 − lim supN →∞ θN , lim infN →∞ θN } > 0, and
.
then we have
⎧ ⎫
1 |QN (t)| D
. max − [θN (1 − θN )]1−κ N 1/2 |ΔN | → N,
σ [θN (1 − θN )]1/2−κ 0<t<1 [t (1 − t)]κ
|QN,E (t)|
. sup = OP (1). (2.1.10)
0<t<1 (1 − t)]
[t κ
2.1 CUSUM Statistics in the Presence of Change Points 33
For .0 < δ < 1, we define the events .BN,1 = {(1 − δ)θN ≤ θ̂N ≤ (1 + δ)θN }c ,
and
⎧ ⎫
QN (t) QN (t)
.BN,2 = max > max ,
t∈[(1−δ)θN ,(1+δ)θN ]c [t (1 − t)]κ t∈[(1−δ)θN ,(1+δ)θN ] [t (1 − t)]κ
where
QN (t)
θ̂N = sargmax
.
t∈(0,1) [t (1 − t)]κ
is the smallest maximal argument of .QN (t)/[t (1 − t)]κ . Evidently .BN,1 ⊆ BN,2 .
Using the definition of V , we have that
V (LNt⎦)
. max
t∈[(1−δ)θN ,(1+δ)θN ]c N 1/2 [t (1 − t)]κ
≤ N 1/2 [θN (1 − θN )]1−κ ΔN max{δ 1−κ , (1 − δ)1−κ } + rN,1 ,
where .rN,1 = O(1) is deterministic and arises from approximating .LN t⎦ with Nt.
We have already seen that
V (LNt⎦)
. max = N 1/2 [θN (1 − θN )]1−κ ΔN ,
t∈[(1−δ)θN ,(1+δ)θN ] N 1/2 [t (1 − t)]κ
which is the value of .V (t) achieved at .t = θN . As a result we have that .BN,2 ⊆ BN,3 ,
where
⎧
.BN,3 = N (θN (1 − θN ))1−κ ΔN max{δ 1−κ , (1 − δ)1−κ } + cN,1
1/2
QN,E (t)
+ max
t∈[(1−δ)θN ,(1+δ)θN ]c [t (1 − t)]κ
⎫
QN,E (θN )
> N 1/2 [θN (1 − θN )]1−κ ΔN + .
[θN (1 − θN )]κ
Since .max0<t<1 QN,E (t)/[t (1 − t)]κ = OP (1) under Assumption 1.2.2, and
1−κ , (1 − δ)1−κ } < 1, we have that .lim
.max{δ N →∞ P (BN,3 ) = 0. It follows that
{ }
. lim P (1 − δ)θN ≤ θ̂N ≤ (1 + δ)θN = lim 1 − P (BN,1 ) = 1. (2.1.11)
N →∞ N →∞
34 2 Change Point Analysis of the Mean
In order to complete the proof of the Theorem, we assume .lim supN→∞ θN < 1;
i.e. .θN cannot be too close to 1. The case when .lim infN→∞ θN > 0 can be proven
similarly. Using Assumption 1.2.2, we may define a sequence of Brownian bridges
.{BN (t), 0 ≤ t ≤ 1} such that
Now
| |
1 | B(t) B(θN ) |
sup | | − | (2.1.13)
[θN (1 − θN )] |
.
|t−θN |≤δθN [t (1 − t)]
1/2−κ κ κ
θN
⎧ | |
1 | B(t) B(θ ) ||
⎪
⎪ |
⎨ θ 1/2−κ sup | [t (1 − t)]κ − [θ (1 − θ )]κ | , if θN → θ
D ||t−θ|≤δθ |
→ | W (s) |
⎪
⎩ sup | κ − W (1)|| , if θN → 0,
⎪ |
|s−1|≤δ s
where .{W (t), t ≥ 0} is a Wiener process. The first part follows from the almost sure
continuity of .B(·). To prove the second part (2.1.13), we use that .B(t) = W (t) −
tW (1), and hence
| |
1 | W (t) − tW (1) W (θN ) − θN W (1) |
sup | − |
.
1/2−κ | [t (1 − t)]κ [θN (1 − θN )]κ |
θN |t−θN |≤δθN
| |
1 | W (t) W (θN ) |
= 1/2−κ |
sup | κ − | + oP (1)
θN |t−θN |≤δθN t θNκ |
Using the almost sure continuity of .B(t) and .W (t) at .t = 1, we conclude that for all
x>0
.
⎧ | |
1 | B(t) B(θN ) |
. lim lim sup P sup | | − |
δ→0 N→∞ 1/2−κ [t (1 − t)]κ [θ (1 − θ )] κ|
θN |t−θN |≤δθN N N
⎫
> x = 0. (2.1.14)
2.1 CUSUM Statistics in the Presence of Change Points 35
1 |QN (t)|
. max
[θN (1 − θN )] 1/2−κ 0<t<1 [t (1 − t)]κ
⎧ ⎫
1 σ BN (θN ) N −1/2 V (L(N + 1)t⎦)
= + sup + oP (1)
[θN (1 − θN )]1/2−κ [θN (1 − θN )]κ |t−θN |≤δ [t (1 − t)]κ
⎧ ⎫
1 σ BN (θN )
= + [θN (1 − θN )] 1−κ 1/2
N Δ + oP (1),
[θN (1 − θN )]1/2−κ [θN (1 − θN )]κ
since .V (L(N + 1)t⎦)/[t (1 − t)]κ reaches its largest value at .θN . The distribution of
.BN (θN )/[θN (1 − θN )]
1/2 is standard normal for each N, and therefore the proof is
complete. ⨆
⨅
We may also establish similar results for integrated functionals of the CUSUM
process in the presence of a change point.
Theorem 2.1.4 If .HA of (1.1.3) and Assumption 1.2.2 are satisfied, .p ≥ 1 , .κ <
p/2 + 1, then we have
(i)
⎧ 1 |QN (t)|p P
. dt → ∞
0 [t (1 − t)] κ
if and only if
where .c1 is a positive constant. Hence the first part of Theorem 2.1.4 follows from
Theorem 1.2.4.
To prove the second part we note that
⎧⎧ ⎧ p
⎫
1 |QN (t)|p 1 QN (t)
. lim P dt = dt =1
N→∞ 0 [t (1 − t)]κ 0 [t (1 − t)]κ
and
⎧ ⎧ 1
(N −1/2 V (L(N + 1)t⎦))p
1 p
QN (t)
. dt − dt
0 [t (1 − t)]κ
0 [t (1 − t)]κ
⎧ 1
VN (t)(N −1/2 V (L(N + 1)t⎦))p−1
=p dt + oP ((N 1/2 ΔN )p−1 ).
0 [t (1 − t)]κ
Using Assumption 1.2.2 we can define a sequence of Brownian bridges .{BN (t), 0 ≤
t ≤ 1} such that
⎧ 1 VN (t)
. (N −1/2 V (L(N + 1)t⎦))p−1 dt
0 [t (1 − t)]
κ
⎧ 1
σ BN (t)
= (N −1/2 V (L(N + 1)t⎦))p−1 dt + oP ((N 1/2 ΔN )p−1 ).
0 [t (1 − t)]
κ
The distribution of .BN does not depend on N, so once again using the definition of
V (k) we have that
.
⎧ 1 σ BN (t)
(N −1/2 V (L(N + 1)t⎦))p−1 dt
0 [t (1 − t)]κ dt
⎧ 1 ⎛
D σ B(t)
= (N ΔN )
1/2 p−1
t (1 − LNθ ⎦/N)1{0 ≤ t ≤ LNθ⎦/N }
0 [t (1 − t)]
κ
⎞p−1
+ (1 − t)(LNθ ⎦/N)1{LNθ ⎦/N ≤ t ≤ 1} dt.
Since
⎛
(N 1/2 ΔN )p−1 t (1−LNθ⎦/N)1{0 ≤ t ≤ LNθ⎦/N }
.
⎞p−1
p−1
+ (1 − t)LNθ ⎦/N1{LNθ⎦/N ≤ t ≤ 1} → gθ (t),
2.1 CUSUM Statistics in the Presence of Change Points 37
In some cases it is of interest, rather than to consider the alternative (1.1.3) that
the means simply differ, to evaluate through testing whether the means differ by
some “relevant” or “significant” amount. This leads to considering the alternative
hypotheses
HA : |μ0 − μA | > Δ0 ,
. (2.1.15)
where .Δ0 ≥ 0 is a practitioner specified threshold. In this case the null hypothesis
is
H0 : |μ0 − μA | ≤ Δ0 .
. (2.1.16)
We recall the notation .X̄k,1 and .X̄k,2 , denoting the empirical means of .{X1 , . . . , Xk }
and .{Xk+1 , . . . , XN }, defined in (1.1.5). We reject the null hypothesis of (2.1.16)
in favor of the hypothesis in (2.1.15) if there exists a k such that .Δ̂k = |X̄k,1 −
X̄k,2 | is significantly larger than .Δ0 . Since .Δ̂k is not a reliable estimator of .ΔN for
small or large values of k, it is natural to consider Rényi style statistics described
in Sect. 1.2.2, which involve trimming the domain on which .Δ̂k is maximized. In
particular, we consider the statistic
⎛ ⎞
. D̂N = N 1/2
max |X̄k,1 − X̄k,2 | − Δ0 .
a(N )≤k≤N −b(N )
Theorem 2.1.5 We assume that .HA of (1.1.3), Assumption 1.2.2, 2.1.1 are satisfied
and
Let
P
D̂N → −∞.
.
D
D̂N → N (ζ, σ 2 /[θ (1 − θ )]),
.
where .N (ζ, σ 2 /[θ (1 − θ )]) is a normal random variable with mean .ζ and variance
.σ /[θ (1 − θ )].
2
(iii) If .ζ = ∞, then
P
D̂N → ∞.
.
Proof The difference between the means .X̄k,1 and .X̄k,2 may be decomposed as
⎛ k ⎞
N Σ k Σ
N
X̄k,1 − X̄k,2
. = Xi − Xi
k(N − k) N
i=1 i=1
⎛ k ⎞
N Σ k Σ
N
= Ei − Ei + v(k),
k(N − k) N
i=1 i=1
where
⎧
⎪
⎪ N − k∗
⎨ ΔN , if 1 ≤ k ≤ k ∗ ,
.v(k) =
N − k
⎪
⎪ k∗
⎩ ΔN , if k ∗ + 1 ≤ k ≤ N.
k
We assume without loss of generality that .ΔN > 0. It is established in the proof of
Theorem 1.2.7 that
| k |
|Σ k Σ ||
N ⎛ ⎞
N | −1/2 −1/2
. max | Ei − Ei | = OP aN + bN .
a(N )≤k≤N −b(N ) k(N − k) | N |
i=1 i=1
2.1 CUSUM Statistics in the Presence of Change Points 39
The function .v(k) reaches its largest value for .k ∈ {1, . . . , N } at .k ∗ , and .v(k ∗ ) =
ΔN . Also for any .0 < δ < θ ,
⎛ ⎞
N N
. max v(k) < 1 − δ min , .
∗
|k −k|≥N δ k ∗ + Nδ N − (k ∗ − δN )
.|k̂N − k ∗ | = oP (N ),
where
k̂N =
. sargmax |X̄k,1 − X̄k,2 |.
k∈{a(N ),...,N −b(N )}
B(t) B(θ )
. sup → , a.s.,
|t−θ|≤δ t (1 − t) θ (1 − θ )
N 1/2 (ΔN − Δ0 ) → ζ , the result (ii) follows. When .|ζ | = ∞, then the random
.
denote the smallest maximal argument of the weighted CUSUM process. Under the
AMOC model and .HA , .k̂N may be used as an estimator of .k ∗ , and .θ̂N = k̂N /N
serves as an estimator of the break fraction .θ .
In order to describe the asymptotic behavior of .k̂N , we define a triangular drift
term, and two sided Brownian motion, as
⎧
⎨ (1 − κ)(1 − θ ) + κθ, if t < 0
.mκ (t) = 0, if t = 0 (2.2.1)
⎩
(1 − κ)θ + κ(1 − θ ), if t > 0,
and
⎧
W1 (−t), if t < 0
W (t) =
. (2.2.2)
W2 (t), if t ≥ 0,
where .{W1 (t), t ≥ 1} and .{W2 (t), t ≥ 1} are independent Wiener processes. There
is an almost surely unique random variable .ξ(κ) defined as
It is interesting to note that the random variable .ξ(κ, θ ) does not depend on .θ if
κ = 1/2, and does not depend on .κ when .θ = 1/2. The density function of .ξ(κ) is
.
where
ΔN → 0 and
. NΔ2N → ∞,
then
Δ2N D
.
2
(k̂N − k ∗ ) → ξ(κ).
σ
(ii) If .κ = 1/2,
then
Δ2N D
.
2
(k̂N − k ∗ ) → ξ(1/2).
σ
Proof We only prove Theorem 2.2.1(i), and the second part can be proven similarly
with minor modifications. It follows from (2.1.11) that
. |k̂N − k ∗ | = oP (N ). (2.2.4)
C
a = aN =
. .
Δ2N
Notice that for any .C > 0, .aN = o(1). On account of (2.2.5) we can assume that
N α ≤ k ≤ Nβ, for any .α < θ < β. We recall (2.1.1) and introduce
.
⎛ ⎞2κ ⎛Σ ⎞2
k Σ
k N
N
Qk =
. Ei − Ei + V (k)
k(N − k) N
i=1 i=1
42 2 Change Point Analysis of the Mean
⎛ ∗ ⎞2
⎛ ⎞2κ Σ
k ∗ Σ
N
N ⎝ k
− Ei − Ei + V (k ∗ )⎠
k ∗ (N − k ∗ ) N
i=1 i=1
= Qk,1 + · · · + Qk,6 ,
where
⎧⎛ ⎞2κ ⎛ ⎞2κ ⎫ ⎛Σ
⎞2
k Σ
N k
N N
Qk,1
. = − Ei − Ei ,
k(N − k) k (N − k ∗ )
∗ N
i=1 i=1
⎧ ⎫
⎛ ⎞2κ ⎨Σ k∗
k Σ Σ k ∗ Σ ⎬
k N N
N
Qk,2 = Ei − Ei + Ei − Ei
k ∗ (N − k ∗ ) ⎩ N N ⎭
i=1 i=1 i=1 i=1
⎧ ⎫
⎨Σ k∗
Σ k − k ∗ Σ ⎬
k N
× Ei − Ei − Ei , (2.2.6)
⎩ N ⎭
i=1 i=1 i=1
⎛⎛ ⎞2κ ⎛ ⎞2κ ⎞⎛ ⎞
N N Σ
k
k Σ
N
∗
Qk,3 = 2 V (k) − V (k ) Ei − Ei ,
k(N − k) k (N − k ∗ )
∗ N
i=1 i=1
⎛ ⎞2κ ⎛ ⎞ Σ
N
N k∗ − k
Qk,4 = 2 V (k ∗ ) Ei ,
k (N − k ∗ )
∗ N
i=1
⎛ ⎞2κ ⎛ ⎞
N Σ
k Σ
k∗
∗
Qk,5 =2 ∗ V (k ) Ei − Ei ,
k (N − k ∗ )
i=1 i=1
⎛ ⎞2κ ⎛ ⎞2κ
N N
Qk,6 = V 2 (k) − V 2 (k ∗ ),
k(N − k) k (N − k ∗ )
∗
1 D 1
a 1/2
. sup |WN (k ∗ − t)| = a 1/2 sup |W (t)|
0≤t≤k ∗ −a k∗ −t a≤t≤k ∗ t
D 1
= sup |W (t)|
1≤t≤k ∗ /a t
1
→ sup |W (t)| a.s.,
1≤t<∞ t
and since .N 1/2 |ΔN | → ∞, (2.2.8) implies (2.2.7) when .l = 1. Now we use
Assumption 2.1.1, and we get from (2.2.8) and (2.2.9) that
⎛ ⎞ ⎛ ⎞
1 1 1
. max |Qk,2 | = OP + OP = oP (1).
∗
|k−k |≥a N 1−2κ Δ2N |k ∗ − k| N |ΔN |
1/2 NΔ2N
Hence (2.2.7) is proven when .l = 2. The same calculations can be used to establish
(2.2.7) when .l = 3 and .4.
Applying (2.2.9) we conclude that
1 1
. max |Qk,5 | = 1/2 OP (1), (2.2.10)
|k−k ∗ |≥a N 1−2κ Δ2N |k ∗ − k| C
44 2 Change Point Analysis of the Mean
where the .OP (1) term does not depend on C. Using again the mean value theorem
there are .c1 > 0 and .c2 > 0 such that
This completes the proof of (2.2.5). It may be established similarly as (2.2.7) that
N −(1−2κ)
. max |Qk,l | = oP (1), if l = 1, 2, 3, 4. (2.2.12)
|k−k ∗ |≤Cσ 2 /Δ2N
Hence the limit distribution of .k̂N is determined by .Qk,5 and .Qk,6 . It follows from
elementary calculation that
| |
| −(1−2κ) |
sup. |N Qk ∗ +sσ 2 /Δ2 ,6 + 2[θ (1 − θ )]1−2κ σ 2 |s|mκ (s)| = o(1). (2.2.13)
N
−C≤s≤C
where .{W (t), −∞ < t < ∞} is the two sided Wiener process of (2.2.2). Since
{W (t), −∞ < t < ∞} and .{−W (t), −∞ < t < ∞} have the same distribution,
.
Σ
6
D [−C,C]
N −(1−2κ)
. Qk ∗ +sσ 2 /Δ2 ,l −→
N
l=1
If
⎛ ⎞κ ||Σ |
k Σ ||
k N
N |
k̂N,C =
. sargmax | Xi − Xi |
k∈{1,...,N }, |k ∗ −k|≤Cσ 2 Δ2N k(N − k) | N |
i=1 i=1
2.2 The Asymptotic Properties of Change Point Estimators 45
(2.2.15) yields
Δ2N ⎛ ∗
⎞ D
. k̂ N,C − k → argmax|s|≤C (W (s) − |s|mκ (s)) .
σ2
Since
. lim ΔN = Δ /= 0.
N →∞
In this case, the limiting distribution of .k̂N depends on the joint distribution of the
partial sums of the errors. Let
⎧
⎪ −1
Σ
⎪
⎪
⎪
⎪ − Ei , if l < 0,
⎪
⎪
⎨ i=l
.S(l) = 0, if l = 0, (2.2.16)
⎪
⎪
⎪
⎪ Σ
l
⎪
⎪
⎪
⎩ Ei , if l > 0,
i=1
D
k̂N − k ∗ → ξκ,Δ .
.
Proof We follow the proof of Theorem 2.2.1 with some modifications. We observe
that (2.2.4) still holds true due to (2.1.11). Instead of (2.2.5), we need to show that
|k̂N − k ∗ | = OP (1).
. (2.2.17)
46 2 Change Point Analysis of the Mean
Using (2.2.4) we can assume that .Nα ≤ k̂N ≤ Nβ for any .0 < α < θ < β < 1.
We use the decomposition of .Qk in (2.2.6). Let .C > 0. We aim to show that
1
. max |Qk,l | = oP (1), if l = 1, 2, 3, 4. (2.2.18)
1≤k≤k ∗ −C N 1−2κ (k ∗ − k)
1
. max∗ |Qk,1 | = OP (N −1/2 ). (2.2.19)
1≤k≤k −C N 1−2κ (k ∗ − k)
Hence (2.2.18) holds when .l = 2. Similar arguments can be used to prove (2.2.18)
when .l = 3 and 4. Since
| |
| k∗ |
1 1 || Σ ||
max |Qk,5 | = OP (1) max∗ Ei | ,
1≤k≤k −C k ∗ ||
.
1≤k≤k ∗ −C N 1−2κ (k ∗ − k)
i=k+1 |
where the .OP (1) term does not depend on C, (2.2.20) implies that for all .x > 0,
⎧ ⎫
1
. lim lim sup P max∗ |Q k,5 | > x = 0. (2.2.21)
C→∞ N →∞ 1≤k≤k −C N 1−2κ (k ∗ − k)
Using the mean value theorem we can find .c1 > 0 and .c2 > 0 such that
1
. max |Qk,l | = oP (1), if l = 1, 2, 3, 4.
N 1−2κ |k ∗ −k|≤C
By definition,
⎧ ⎛ ⎛ ⎞
⎪ ∗ N − k ∗ ⎞1−2κ k∗
Σ
⎪
⎪ k
⎪
⎪ 2 ΔN ⎝− Ei ⎠ , if 1 ≤ k < k ∗
⎪
⎪ N N
1 ⎨ i=k+1
. Qk,5 = 0, if k = k ∗ ,
N 1−2κ ⎪
⎪ ⎛ ∗ ⎞
⎪
⎪ k N − k ∗ 1−2κ Σ
k
⎪
⎪
⎪2
⎩ ΔN Ei , if k ∗ + 1 ≤ k < N.
N N ∗ i=k +1
and therefore
{ }
N −(1−2κ) (Qk ∗ +l,5 + Qk ∗ +l,6 ), |l| ≤ C (2.2.23)
D
{ }
→ 2[θ (1 − θ )]1−2κ (ΔS(l) − Δ2 |l|mκ (l), |l| ≤ C .
If
⎛ ⎞κ ||Σ |
k Σ ||
k N
N |
k̂N,C =
. sargmax | Xi − Xi |
k∈{1,...,N }, |k ∗ −k|≤C k(N − k) | N |
i=1 i=1
D
{ }
k̂N,C − k ∗ → argmax ΔS(l) − Δ2 |l|mκ (l) : |l| ≤ C .
.
Since
{ }
argmax ΔS(l) − Δ2 |l|mκ (l) : |l| ≤ C
.
{ }
→ argmax ΔS(l) − Δ2 |l|mκ (l) a.s.,
. lim |ΔN | = ∞.
N →∞
. lim P {k̂N = k ∗ } = 1,
N →∞
|t|σ 2
ΔS(tσ 2 /Δ2 ) − Δ2
. mκ (t) ≈ σ 2 (W (t) − |t|mκ (t)),
Δ2
where .{W (t), −∞ < t < ∞} is the two sided Wiener process of (2.2.2). The only
unknown in the process on the right hand side above is the parameter .σ 2 , which is
more easily estimated.
2.3 Multiple Changes in the Mean 49
We have seen that tests based on several functionals of the CUSUM process are
consistent to detect a single change point, and that natural estimators of a single
change point, defined as the maximal argument of the CUSUM process, are in a
certain sense consistent and have tractable asymptotic distributions upon proper
normalization depending on the size of the change.
CUSUM processes may also be used to detect and estimate more than one change
point, and in this section we explore the asymptotic behavior of weighted CUSUM
processes when there are multiple change points in the mean. In particular, we
consider the model
Σ
R+1
Xi =
. μj 1{kj∗−1 ≤ i < kj∗ } + Ei , i ∈ {1, . . . , N }. (2.3.1)
j =1
Here .μ1 /= μ2 /= . . . /= μR+1 denote the means of the observations .Xi , which
change at the change points .1 = k0∗ < k1∗ < · · · < kR∗ < kR+1 ∗ = N + 1. We
note that model (2.3.1) includes the AMOC model, and the model with .R = 2 and
.μ1 = μ3 is often referred to as the epidemic change point model. We let .ΔN =
Σ
k
k Σ
N Σ
k
k Σ
N
. Xi − Xi = Ei − Ei + V̄ (k),
N N
i=1 i=1 i=1 i=1
where
Σ
i−1
k Σ
R+1
V̄ (k) =
. μl (kl∗ − kl−1
∗
) + (k − ki−1 )μi − μl (kl∗ − kl−1
∗
),
N
l=1 l=1
∗
if ki−1 <k≤ ki∗ , (2.3.2)
i ∈ {1, ..., R + 1}. We assume in several of the asymptotic results below that the
.
change points are well spaced, and the size of changes are bounded:
Assumption 2.3.1
(i) .ki∗ = LN θi ⎦, .i ∈ {1, . . . , R} and .0 < θ1 < · · · < θR < 1.
(ii) .max1≤l≤R+1 |ΔN,l | < ∞.
It follows from a simple calculation that under Assumption 2.3.1
| |
| V̄ (Nt) |
| |
. sup
| N − v̄N (t)| = o (ΔN ) ,
0≤t≤1
50 2 Change Point Analysis of the Mean
and
Σ
i−1 Σ
R+1
v̄N (t) =
. μl (θl −θl−1 )+(t−θi−1 )μi−1 −t μl (θl −θl−1 ), if θi−1 <t≤θi ,
l=1 l=1
1 ≤ i ≤ R + 1. Here we use the convention that .θ0 = 0 and .θR+1 = 1. We note that
.
ΔN,l might depend on N . In particular we will consider the case when .ΔN,l → 0.
.
The drift function .v̄N (t) is a polygonal function with knots at .θ1 , . . . , θR . We
can extend Theorems 2.1.1–2.1.4 to multiple changes; largely this is an exercise in
replacing .V (x) with .V̄ (x). We note that under Assumption 2.3.1, the test statistics
that are supremum functionals of CUSUM processes will diverge in probability to
infinity if .N 1/2 max1≤l≤R+1 |ΔN,l | → ∞, i.e. at least one of the changes in the
mean is not too small.
(l,u)
k̂ = sargmax SN
. (k).
k∈{l,...,u}
(l,u)
Typically we will consider a change point to be detected in a sub-sample if .TN
exceeds a user specified threshold .ρN . For example, .ρN might be taken to be a
2.3 Multiple Changes in the Mean 51
(l,u)
suitable quantile of the approximate (asymptotic) distribution of .TN , or .ρN might
be a scaled function of N, for instance .ρN = σ (log N)1/2 . With this notation, the
binary segmentation algorithm is described as follows:
BINSEG(1, N, .ρN ) returns a set of estimated change points .K̂ = {k̂1 , . . . , k̂R̂ },
sorted into increasing order, and an estimated number of change points .R̂ = |K̂|.
We now aim to study the asymptotic consistency properties of these estimators
when the detector used is a supremum functional of a weighted CUSUM process.
Interestingly, if weights are not applied in computing the detector, binary segmen-
tation may not lead to a consistent method to estimate R, as illustrated with the
following example.
Example 2.3.1 Let .{Ei , i ≥ 1} be independent identically distributed random
variables with .EEi = 0 and .EEi2 = 1. Consider observations generated so that
⎧
⎨ 2 + Ei , if 1 ≤ i ≤ LN/3⎦,
.Xi = 1 + Ei , if LN/3⎦ < i ≤ L2N/3⎦,
⎩
Ei , if L2N/3⎦ < i ≤ N.
A plot of the mean of .Xi with respect to i looks like a “staircase” with equal steps.
Consider observations generated so that
LN
Σ LN
LNt⎦ Σ Σ LNt⎦ Σ
t⎦ N t⎦ N
. Xi − Xi = Ei − Ei + vN (t)
N N
i=1 i=1 i=1 i=1
vN (t)
. → v(t), as N → ∞,
N
where
⎧
⎨ t, if 0 ≤ t ≤ 1/3,
.v(t) = 1/3, if 1/3 ≤ t ≤ 2/3,
⎩
1 − t, if 2/3 ≤ t ≤ 1.
We note that
| k |
|Σ k Σ ||
N
|
. max | Ei − Ei | = OP (N 1/2 ).
1≤k≤N | N |
i=1 i=1
As we will show in detail below, since the drift term .vN (t) is asymptotically of
higher order than the CUSUM process of the errors, the asymptotic properties of
the change point estimator .k̂N are largely determined by the drift term. In this case,
since the drift term takes it largest value at each .k ∈ {LN/3⎦, . . . , L2N/3⎦}, and
is flat on this interval, the change point estimator as the maximal argument of the
CUSUM process is determined by the CUSUM of the errors. If for example .k̂N =
k̂N (0) denotes the change point estimator based on the standard CUSUM statistic,
then .k̂N /N converges to a non-degenerate random variable that is supported on the
interval .[1/3, 2/3], and is not a consistent estimator of a change point.
Example 2.3.1 suggests that binary segmentation applied using functionals of
the CUSUM process without weights can lead to overestimation of the number of
change points.
If binary segmentation is based on the weighted CUSUM .ZN (N t/(N +
1))/[t (1 − t)]κ , .0 < κ ≤ 1/2, and .ΔN is bounded away from zero, then the method
is consistent in that the number of change points R is correctly estimated with
probability tending to one, and, conditioning on .R̂ = R, the centered estimators
.k̂i − ki are bounded in probability. In view of Theorem 2.2.2, these rate conditions
cannot be improved.
Towards establishing this result, the following lemma shows that when weights
are applied the drift function cannot have “flat” segments. As a result, points at
which the maximum is reached will asymptotically coincide with change points.
Let
Σ
k−1 Σ
R+1
v(t) =
. (θi − θi−1 )μi + (t − θk−1 )μk − t (θi − θi−1 )μi (2.3.3)
i=1 i=1
only at c or d if .infc<t<d |v(t)| > 0. Moreover, on .[c, d] .vN (t) is either strictly
increasing, strictly decreasing, or is decreasing on .[c, a) and increasing on .(a, d]
for some .a ∈ (c, d).
Proof Due to its definition, .v(t) is linear on .[c, d]. If .|v(t)| is constant on .[c, d],
say .|v(t)| = b > 0 for .t ∈ [c, d], then the derivative of .b/[t (1 − t)]κ is .−bκ[t (1 −
t)]−κ−1 (1 − 2t). If .c < d ≤ 1/2, then .b/[t (1 − t)]κ is strictly decreasing on .[c, d],
if .1/2 ≤ c < d, then .b/[t (1 − t)]κ is strictly increasing on .[c, d]. In these cases
the maximum over .[c, d] occurs at c and d, respectively. If .c < 1/2 < d, then
.b/[t (1 − t)] is strictly decreasing until .1/2 and then strictly increasing. Once again
κ
the maximum can only occur at c or d. As such, we may turn to the case where v
on .[c, d] is linear with a nonzero slope term. Since multiplying .|v(t)| by a constant
does not change the location of its maximum, may consider instead the function
t +b
f (t) =
. on [c, d].
[t (1 − t)]κ
If .v(t) changes sign at a, then we can consider the function .|v(t)| on the intervals
.[c, a] and .[a, d], and show that the maximum of .|v(t)| is reached at c and d since
.v(a) = 0. Thus we can assume that the sign of .v(t) does not change on .[c, d]. If
.v(t) is positive on .(c, d), we need to show that the maximum is uniquely reached at
c or d. If .v(t) is negative on .(c, d), we need to prove that the minimum is achieved
only at c or d. The function .[t (1 − t)]−κ is strictly increasing on .[1/2, 1] we get that
for all .1/2 ≤ t < s ≤ 1,
s+b t +b
. >
(s(1 − s))κ [t (1 − t)]κ
since the product of two strictly increasing functions is strictly increasing. It follows
that the maximum is at d and the minimum is at c. As a result we may turn to the
case where .0 < c < 1/2. Elementary calculations give
h(t)
f ' (t) =
. with h(t) = −(1 − 2κ)t 2 + (2κb − κ + 1)t − κb.
[t (1 − t)]κ+1
The roots of .h(t) giving the points were .f ' (t) = 0 are
( )1/2
2κb − κ + 1 − (2κb − κ + 1)2 − 4(1 − 2κ)κb
.t1 = , (2.3.4)
2(1 − 2κ)
54 2 Change Point Analysis of the Mean
and
( )1/2
2κb − κ + 1 + (2κb − κ + 1)2 − 4(1 − 2κ)κb
.t2 =
2(1 − 2κ)
We consider the following cases. (1) $f(t)$ is increasing at $c$, i.e. $h(c) > 0$.

(i) $1/2 < d < 1$. We have already shown that $f(t)$ is increasing on $[1/2,1]$, and therefore $h(d) > 0$. Since $1-2\kappa \ge 0$, the quadratic $h$ is concave, so $h(c) > 0$ and $h(d) > 0$ imply $h(t) \ge 0$ on $[c,d]$. Hence $f$ is increasing over $[c,d]$ and its maximum occurs at $d$.
(ii) $d < 1/2$. We claim that $h(d) < 0$ leads to a contradiction. Assume that $h(d) < 0$. Then
$$\kappa b \ge \frac{d^2(2\kappa-1) - d(\kappa-1)}{1-2d},$$
while $h(c) > 0$ gives
$$\kappa b \le \frac{c^2(2\kappa-1) - c(\kappa-1)}{1-2c},$$
and therefore
$$\frac{d^2(2\kappa-1) - d(\kappa-1)}{1-2d} \le \frac{c^2(2\kappa-1) - c(\kappa-1)}{1-2c}. \qquad (2.3.5)$$
Using that $c, d \le 1/2$, one can verify that (2.3.5) holds if and only if
$$1-\kappa \le (1-2\kappa)(d + c - 2cd).$$
Observing that
$$d + c - 2cd = c(1-2d) + d \le \tfrac{1}{2}(1-2d) + d = \tfrac{1}{2},$$
we get
$$1 - \kappa \le \tfrac{1}{2}(1-2\kappa),$$
which is a contradiction. If $h(d) > 0$, the argument in part (1)(i) gives that $f(t)$ increases on $[c,d]$.
(2) $f(t)$ decreases at $c$, so $h(c) < 0$.

(i) If there are no roots of $h(t)$ between $c$ and $d$, then $f(t)$ decreases on $[c,d]$, so the unique maximum and minimum are at $c$ and $d$, respectively.

(ii) If there is exactly one root of $h(t)$ on $[c,d]$, then it is the smaller root $t_1$ of (2.3.4); $f(t)$ attains its smallest value at $t_1$ and increases on $[t_1,d]$, so the maximum is reached at $c$ or $d$. This covers the case when $c+b > 0$. Assume now that $c+b < 0$. Since $b < -c < 0$, we have $4(1-2\kappa)\kappa b < 0$, and therefore
$$t_1 \le \frac{2\kappa b - \kappa + 1 - |2\kappa b - \kappa + 1|}{2(1-2\kappa)} \le 0,$$
so $t_1 < c$ and there cannot be two roots of $h$ between $c$ and $d$; case (2)(i) then yields that the maximum is at $c$ or $d$. ⨆⨅
Since this property is inherited by the drift term when weights are applied in computing the CUSUM detectors, binary segmentation at each stage will asymptotically identify a change point correctly when one is present and well separated from the prior change point estimators. By carefully keeping track of the magnitude of the CUSUM processes of the model errors at each stage, we may establish the following consistency result.
Theorem 2.3.1 Suppose the observations follow model (2.3.1) with errors that are $L^{\nu}$-decomposable for some $\nu > 2$, Assumption 2.3.1 holds, $\Delta_N$ is bounded away from zero, and $0 < \kappa \le 1/2$. If the threshold parameter $\rho_N$ in binary segmentation satisfies
$$\frac{(\log N)^{1/\nu}}{\rho_N} + \frac{\rho_N}{N^{1/2}} \to 0, \quad \text{as } N \to \infty,$$
then $P\{\hat R = R\} \to 1$, and, conditionally on $\hat R = R$,
$$\max_{1\le i\le R}|\hat k_i - k_i^*| = O_P(1). \qquad (2.3.6)$$
We delay the proof of Theorem 2.3.1 to Chap. 8 (see Theorem 8.2.2), which
establishes this result for binary segmentation based on norms of weighted CUSUM
processes in general separable Hilbert space. The assumption that .ΔN is bounded
away from zero is critical in obtaining the .OP (1) rate for the change point estimators
detailed in (2.3.6), while still assuming minimal conditions on the threshold
parameter .ρN and moment and weak–dependence conditions on the model errors
in (2.3.1). For $\Delta_N$ shrinking to zero such that $\Delta_N^2 N \to \infty$,
$$P\left(\{\hat R = R\} \cap \left\{\max_{1\le i\le R}\Delta_{N,i}^2|\hat k_i - k_i^*| \le r_N\right\}\right) \to 1 \qquad (2.3.7)$$
may be established for example when the model errors are linear processes with
innovations that have sub-Gaussian (or sub-exponential) tails. The rate of estimation
for the change point locations in (2.3.7) is optimal in view of Theorem 2.2.1. Further
discussion of these results may be found in Sect. 2.7 and Remark 2.3.3.
Another popular method to estimate the number and locations of the change points
is to use model selection criteria. We consider a candidate model consisting of S
change points in the means of the observations occurring at times .r1 < · · · < rS as
an estimate of the model (2.3.1). In order to evaluate the quality of such a model,
we measure its fidelity to the data using the loss function .M(S, r1 , . . . , rS ), which
we aim to minimize with respect to the number and locations of the change points.
We imagine that .M(·) is such that small values indicate a good fit. An example is to
use least squares loss. In this case we compute the sample means for each segment
determined by the change point candidates,
$$\hat X_{r_{i-1},r_i} = \frac{1}{r_i - r_{i-1}}\sum_{l=r_{i-1}+1}^{r_i}X_l, \quad i \in \{1,\dots,S+1\},$$
and set
$$M = M(S, r_1, \dots, r_S) = \sum_{i=1}^{S+1}\sum_{l=r_{i-1}+1}^{r_i}\left(X_l - \hat X_{r_{i-1},r_i}\right)^2.$$
It is evident that the above $M$ is minimized with the value zero if $S = N$ and $r_i = i$, $i \in \{1,\dots,N\}$, i.e. when each point is identified as a change point. To control the number of estimated change points, a penalty is added to the measure of fit, leading to the penalized criterion
$$M_{PEN}(S, r_1, r_2, \dots, r_S) = \log\left(\sum_{i=1}^{S+1}\sum_{l=r_{i-1}+1}^{r_i}\left(X_l - \hat X_{r_{i-1},r_i}\right)^2 w(l; r_i, r_{i-1})\right) + \mathcal{P}(N,S),$$
where the function $w(\cdot)$ determines the weights. Typical choices of the penalty function generate the well known information criteria used in model selection. For example, $\mathcal{P}(N,S) = (N+2S)/N$ leads to the Akaike information criterion, whereas $\mathcal{P}(N,S) = (S\log N)/N$ is often referred to as the Bayesian or Schwarz information criterion.
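As a small illustration, the penalized least squares criterion for a fixed candidate segmentation might be computed as follows. This is a sketch assuming unit weights $w(l; r_i, r_{i-1}) = 1$; the function name and interface are illustrative only.

```python
import numpy as np

def penalized_lsq(x, cps, penalty="bic"):
    """Penalized least-squares criterion for a candidate segmentation.

    cps: sorted interior change point candidates r_1 < ... < r_S.
    Unit weights are used; the penalty follows the text.
    """
    N, S = len(x), len(cps)
    bounds = [0] + list(cps) + [N]
    rss = sum(np.sum((x[a:b] - np.mean(x[a:b])) ** 2)
              for a, b in zip(bounds[:-1], bounds[1:]))
    pen = (N + 2 * S) / N if penalty == "aic" else S * np.log(N) / N
    return np.log(rss) + pen
```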
Another similar approach is to replace least squares with a maximized likelihood, making use of distributional assumptions on the model errors. We compute the likelihood function, assuming that the means change exactly $S$ times at the possible times of change $r_1 < \cdots < r_S$, resulting in $L_{\max}(S, r_1, \dots, r_S)$. We then minimize a penalized criterion built from the CUSUM-type measures of fit
$$H(r_{i-1}, r_i) = \frac{1}{N}\max_{r_{i-1} < k \le r_i}\left(\sum_{l=r_{i-1}+1}^{k}X_l - \frac{k - r_{i-1}}{r_i - r_{i-1}}\sum_{l=r_{i-1}+1}^{r_i}X_l\right)^2, \quad i \in \{1,\dots,S+1\}.$$
We then define
$$M_{PEN}^{(H)}(S, r_1, \dots, r_S) = \max_{1\le i\le S+1}H(r_{i-1}, r_i) + g(S)m_N.$$
The number and the locations of the change points are estimated with $\hat R$ and $\hat k_1, \dots, \hat k_{\hat R}$ satisfying
$$M_{PEN}^{(H)}(\hat R, \hat k_1, \dots, \hat k_{\hat R}) = \min_{S\in\mathbb{N}}\;\min_{1 < r_1 < r_2 < \cdots < r_S < N}M_{PEN}^{(H)}(S, r_1, \dots, r_S). \qquad (2.3.8)$$
Assumption 2.3.2 (i) $g(x)$ is a positive and strictly increasing function, and (ii) $m_N \to \infty$ and $m_N/(N\Delta_{N,l}^2) \to 0$ for all $l \in \{1,\dots,R+1\}$.

Theorem 2.3.2 If Assumptions 2.3.1 and 2.3.2 hold and
$$N\Delta_{N,l}^2 \to \infty \quad \text{for all } l \in \{1,\dots,R+1\},$$
then for the estimators $\hat R$ and $\hat k_1, \dots, \hat k_{\hat R}$ defined in (2.3.8) and all $\varepsilon > 0$,
$$\lim_{N\to\infty}P\left(\{\hat R = R\} \cap \left\{\max_{1\le i\le R}|\hat k_i - k_i^*| < \varepsilon N\right\}\right) = 1. \qquad (2.3.9)$$

Proof By (2.3.8),
$$M_{PEN}^{(H)}(\hat R, \hat k_1, \dots, \hat k_{\hat R}) = \min_{S\in\mathbb{N}}\;\min_{1 < r_1 < r_2 < \cdots < r_S < N}\left(H_N(S, r_1, \dots, r_S) + g(S)m_N\right).$$
Hence the random part is bounded in probability for any fixed $S$, and the asymptotic behaviour of $M_{PEN}^{(H)}(S, r_1, \dots, r_S)$ is governed by the drift and penalty terms. Since the drift term vanishes when $S = R$ and $r_i = k_i$,
$$M_{PEN}^{(H)}(R, k_1, \dots, k_R) = g(R)m_N + O_P(1). \qquad (2.3.10)$$
If $S < R$, then for any choice $1 \le r_1 < \cdots < r_S \le N$ there is at least one change point $k_i^*$ between $r_{j-1}$ and $r_j$ for some $1 \le j \le S+1$, and at least one of $k_i^* - r_{j-1}$ or $r_j - k_i^*$ is proportional to $N$ as a consequence of Assumption 2.3.1. Hence, as in (2.1.5), $H_N(S, r_1, \dots, r_S)$ is of the exact order $N\min_{1\le l\le R}\Delta_{N,l}^2$ in probability,
$$\frac{H_N(S, r_1, \dots, r_S)}{N\min_{1\le l\le R}\Delta_{N,l}^2} = O_P(1), \qquad (2.3.11)$$
with the ratio also bounded away from zero in probability, and therefore, since $m_N/(N\Delta_{N,l}^2) \to 0$,
$$\frac{M_{PEN}^{(H)}(S, r_1, \dots, r_S)}{m_N} \xrightarrow{P} \infty.$$
If $S > R$, then the drift term in $M_{PEN}^{(H)}$ is again made to vanish by setting $R$ of the $r_i$'s equal to $k_1^*, \dots, k_R^*$, which implies that
$$M_{PEN}^{(H)}(S, r_1, \dots, r_S) \ge O_P(1) + g(S)m_N.$$
Since $g$ is strictly increasing, $g(S)m_N > g(R)m_N$ for $S > R$, so comparing with (2.3.10) shows that the minimum in (2.3.8) is attained at $S = R$ with probability tending to one, proving the first part of (2.3.9). By Assumptions 2.3.1 and 2.3.2(ii), the fact that $\max_{1\le i\le R}|\hat k_i - k_i^*| = o_P(N)$ follows by the same argument that establishes (2.3.11). ⨆⨅
Assumption 2.3.2 allows for a variety of choices of $m_N$ and $g(k)$ that lead to consistent estimates in the sense of (2.3.9). For instance, if the changes shrink to zero more slowly than $(\log N/N)^{1/2}$, then the BIC penalty satisfies Assumption 2.3.2.
Remark 2.3.1 Although binary segmentation and model selection criteria methods are motivated by different considerations, they are comparable in the following way. In the simple multiple change in the means model (2.3.1), the problem of estimating the number of change points and their locations is equivalent to a model selection problem.

Lemma 2.3.2 If $\hat k = N\theta + o_P(N)$ for some $0 < \theta < 1$, then
$$N^{-1/2+\kappa}\max_{|\hat k - k| \le |\hat k - N\theta|}\frac{1}{|\hat k - k|^{\kappa}}\left|\sum_{i=\hat k}^{k}E_i\right| = o_P(1),$$
since (2.3.13) holds. The distribution of $\{W_N(x), x \ge 0\}$ does not depend on $N$, and by the scale transformation of the Wiener process we have
$$N^{-1/2+\kappa}\max_{\lfloor \alpha N\rfloor < k \le N\theta}\;\max_{k < j \le N\theta}\frac{1}{(j-k)^{\kappa}}|W_N(j) - W_N(k)| \qquad (2.3.14)$$
$$\stackrel{\mathcal{D}}{=}\max_{\lfloor N\alpha\rfloor/N < k/N \le \theta}\;\max_{k/N < j/N \le \theta}\frac{1}{(j/N - k/N)^{\kappa}}|W(j/N) - W(k/N)|$$
$$\to \sup_{\alpha < t \le \theta}\;\sup_{t < s \le \theta}\frac{1}{(s-t)^{\kappa}}|W(s) - W(t)| \quad \text{a.s.}$$
The existence of the limit in (2.3.14) follows from Theorem A.2.2. Theorem A.2.2 also yields that
$$\sup_{\alpha < t \le \theta}\;\sup_{t < s \le \theta}\frac{1}{(s-t)^{\kappa}}|W(s) - W(t)| \to 0 \quad \text{a.s.}$$
as $\theta - \alpha \to 0$.
In order to construct confidence intervals for the times of change, we refine the initial estimators $\hat k_1, \dots, \hat k_R$. Let
$$\tilde k_l = \operatorname{sargmax}_{k\in\{\hat k_{l-1},\dots,\hat k_{l+1}\}}\left(\frac{1}{(k-\hat k_{l-1})(\hat k_{l+1}-k)}\right)^{\kappa}\left|\sum_{i=\hat k_{l-1}+1}^{k}X_i - \frac{k-\hat k_{l-1}}{\hat k_{l+1}-\hat k_{l-1}}\sum_{i=\hat k_{l-1}+1}^{\hat k_{l+1}}X_i\right|,$$
$l \in \{1,\dots,R\}$, where $\hat k_0 = 0$ and $\hat k_{R+1} = N$. To reflect the multiple changes we associate with each change a limit variable $\bar\xi_l$, $l \in \{1,\dots,R\}$: as in the single change point case, the limiting distribution is again the location of the maximum of the two sided Wiener process with the triangular drift of (2.2.2), now formed with the drift $\bar m_l(t)$ of (2.3.15), and $\bar\xi_l$ is defined in (2.3.16).

Theorem 2.3.3 If Assumptions 2.3.1 and 2.3.3 hold, and
$$\Delta_{N,l} \to 0 \quad \text{and} \quad N\Delta_{N,l}^2 \to \infty, \qquad (2.3.17)$$
then
$$\frac{\Delta_{N,1}^2}{\sigma^2}\left(\tilde k_1 - k_1^*\right),\;\frac{\Delta_{N,2}^2}{\sigma^2}\left(\tilde k_2 - k_2^*\right),\;\dots,\;\frac{\Delta_{N,R}^2}{\sigma^2}\left(\tilde k_R - k_R^*\right)$$
are asymptotically independent, and for each $l \in \{1,\dots,R\}$,
$$\frac{\Delta_{N,l}^2}{\sigma^2}\left(\tilde k_l - k_l^*\right)\xrightarrow{\mathcal{D}}\bar\xi_l.$$
Proof To simplify the presentation we discuss the details when $R = 2$. Assumption 2.3.3 may be taken to mean that
$$|\hat k_1 - k_1^*| = o_P(N) \quad \text{and} \quad |\hat k_2 - k_2^*| = o_P(N). \qquad (2.3.18)$$
Using Lemma 2.3.2, we need only consider the weighted CUSUM statistics computed from the observations $\{X_i, 1 \le i \le k_2^*\}$ to compute $\tilde k_1$, and $\{X_i, k_1^* \le i \le N\}$ to compute $\tilde k_2$. Hence the limit distributions of the estimators follow from Theorem 2.2.1. The proof of Theorem 2.2.1 also shows that, apart from the trend, the limiting distribution of $\tilde k_1$ is determined by the sum of the $E_i$'s for $i = k_1^* - \lfloor N\delta\rfloor, k_1^* - \lfloor N\delta\rfloor + 1, \dots, k_1^* + \lfloor N\delta\rfloor$, and similarly the limit distribution of $\tilde k_2$ is determined by the sum of the $E_i$'s for $i = k_2^* - \lfloor N\delta\rfloor, \dots, k_2^* + \lfloor N\delta\rfloor$, for any $\delta > 0$. The asymptotic independence of the two estimators results from the assumed $L^{\nu}$ decomposability; see Theorem A.1.3. ⨆⨅
Remark 2.3.2 Theorem 2.3.3 suggests that an approximate $1-\alpha$ confidence interval for the $l$'th change point $k_l^*$ can be computed as
$$k_l^* \in \left(\tilde k_l - \frac{q_\xi(\kappa, 1-\alpha/2)\,\hat\sigma_N^2}{\hat\Delta_{N,l}^2},\;\tilde k_l - \frac{q_\xi(\kappa, \alpha/2)\,\hat\sigma_N^2}{\hat\Delta_{N,l}^2}\right),$$
where $q_\xi(\kappa,\alpha)$ is the $\alpha$ quantile of the random variable $\bar\xi_l$ defined in (2.3.16), $\hat\sigma_N^2$ is an estimator of the variance parameter in (2.3.1), and
$$\hat\Delta_{N,l} = \frac{1}{\hat k_{l+1}-\tilde k_l}\sum_{j=\tilde k_l+1}^{\hat k_{l+1}}X_j - \frac{1}{\tilde k_l - \hat k_{l-1}}\sum_{j=\hat k_{l-1}+1}^{\tilde k_l}X_j.$$
Here $\{S(j), j\in\mathbb{Z}\}$ and $\bar m_l(t)$ are defined in (2.2.16) and (2.3.15), respectively. When the magnitudes of the changes do not shrink to zero, the refined estimators
$$\frac{\Delta_{N,1}^2}{\sigma^2}\left(\tilde k_1 - k_1^*\right),\;\frac{\Delta_{N,2}^2}{\sigma^2}\left(\tilde k_2 - k_2^*\right),\;\dots,\;\frac{\Delta_{N,R}^2}{\sigma^2}\left(\tilde k_R - k_R^*\right)$$
remain asymptotically independent, and in this case
$$\tilde k_l - k_l^* \xrightarrow{\mathcal{D}}\bar\xi_{l,\Delta_l}.$$
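A sketch of the resulting interval computation follows. Here `xi_quantiles` stands in for quantiles of the limit law $\bar\xi_l$, which in practice would be obtained by simulating the argmax of a two sided Wiener process with triangular drift; all names are illustrative.

```python
import numpy as np

def change_point_ci(x, k_tilde, k_prev, k_next, sigma2_hat, xi_quantiles, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a change point location.

    xi_quantiles: function u -> u-quantile of the limit law of Theorem 2.3.3.
    """
    # Estimated jump size from the means on the two refined segments.
    delta_hat = np.mean(x[k_tilde:k_next]) - np.mean(x[k_prev:k_tilde])
    scale = sigma2_hat / delta_hat ** 2
    lower = k_tilde - xi_quantiles(1 - alpha / 2) * scale
    upper = k_tilde - xi_quantiles(alpha / 2) * scale
    return lower, upper
```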
Remark 2.3.3 We have seen that the change point tests studied in this section asymptotically reject the no-change in the mean null hypothesis in the presence of one or more mean changes. Since the standardized CUSUM process is equivalent to the likelihood ratio test when the errors are independent normal random variables, it would be natural to derive the likelihood ratio assuming that under the alternative there are exactly $R \ge 2$ changes. Maximizing such a likelihood ratio with respect to the locations of $R$ changes in the mean, after some algebra, leads to maximizing terms in the random part of the CUSUM process of the form
$$\max_{1\le l<k\le N}T_{k,l}, \quad \text{where } T_{k,l} = \frac{1}{k-l}\left(\sum_{i=l+1}^{k}E_i\right)^2.$$
Similar terms will also appear in studying the asymptotic behaviour of the estimators
in binary segmentation under shrinking magnitudes of the changes, and under an
increasing number of change points. Since .max1≤l<k≤N Tk,l ≥ max1≤i≤N Ei2 ,
the rate at which this random variable diverges will depend on the rate of decay
of the cumulative distribution function of the .Ei ’s. We refer to Révész (1990)
and Shao (1995) for optimal, almost sure limit results on .max1≤l<k≤N Tk,l . In
order to obtain a divergence at no more than a logarithmic rate, these random
variables must have a well defined moment generating function. In the asymptotic
results presented to this point, we often have made use of invariance principles
for errors that allow them to be replaced with independent identically distributed
normal random variables, but such approximations fail for .max1≤l<k≤N Tk,l . The
distribution of .max1≤l<k≤N Tk,l depends on the cumulative distribution function
of the errors even in case of independent, identically distributed random variables.
In general it makes sense to avoid estimating the mean of a series on segments
that only contain a few observations, so we might require the length of each sub-
segment considered in, for example, binary segmentation to be at least as large as
.h = h(N). To consider asymptotics in this case we would need a limit result for
.LN (h) = max1≤l<k≤N,k−l≥h Tk,l . Now approximations for the partial sums of the
.Ei ’s can be used to derive the asymptotic distribution of .LN (h). The choice of h
will depend on the rate at which the partial sums may be approximated by a Wiener
process. Due to the optimality of the Komlós et al. (1975, 1976) approximation, in
case of independent and identically distributed errors, h can be small if .Ei has high
moments. Using the main results in Berkes et al. (2014) (see Theorem A.1.2), the
arguments used for independent and identically distributed random variables can be
extended to the dependent case.
In this section we considered the behaviour of the weighted CUSUM under the alternative for scalar observations. Using the methods discussed in Sect. 1.3, one can extend our results to vector valued observations. One possibility to estimate the time of change is
$$\hat k_N = \operatorname{sargmax}_{k\in\{2,\dots,N-1\}}\frac{1}{[k(N-k)]^{\kappa}}\left(\sum_{i=1}^{k}\mathbf{X}_i - \frac{k}{N}\sum_{i=1}^{N}\mathbf{X}_i\right)^{T}\mathbf{A}\left(\sum_{i=1}^{k}\mathbf{X}_i - \frac{k}{N}\sum_{i=1}^{N}\mathbf{X}_i\right), \qquad (2.3.20)$$
where $\mathbf{A}$ is a positive definite weight matrix. With $\mathbf{z}_N(t) = (z_{N,1}(t), z_{N,2}(t), \dots, z_{N,d}(t))^T$ denoting the vector of coordinatewise CUSUM processes and $0 \le \kappa \le 1/2$, we suggest the maximum norm based estimator
$$\hat k^{(1)} = \operatorname{sargmax}_{k\in\{2,\dots,N-1\}}\left(\frac{N}{k(N-k)}\right)^{\kappa}\max_{1\le j\le d}|z_{N,j}(k/(N+1))|. \qquad (2.3.21)$$
Motivated by the problem of testing for changes in the mean parameter, to this point we have studied the asymptotic properties of the CUSUM process of the raw observations. In many cases though we are interested in evaluating the presence of change points in other quantities describing the distribution of the observations, for instance the median. In order to perform such change point analyses, it is useful to study CUSUM processes derived from the empirical distribution function. We let $F_1, \dots, F_N$ denote the respective cumulative distribution functions (CDFs) of the observations $X_1, \dots, X_N$, and test the null hypothesis
$$H_0: F_1 = F_2 = \cdots = F_N \qquad (2.4.1)$$
against the multiple change point alternative, with $k_0^* = 1 < k_1^* < \cdots < k_R^* < k_{R+1}^* = N$,
$$H_A: F_i = F^{(l)}, \quad \text{for } k_{l-1}^* < i \le k_l^*, \; l \in \{1,\dots,R+1\}, \qquad (2.4.2)$$
where $F^{(l)} \ne F^{(l+1)}$.
In some cases one may assume that the median $m$ is known, for instance if we wish to test whether $m$ changes during the sample away from a historical baseline. If $m$ is unknown, we estimate it with the sample median $\hat m_N$, and replace $m$ with $\hat m_N$ in the definition of $Q_N(t)$, leading to the process
$$\hat Q_N(t) = N^{-1/2}\left(\sum_{i=1}^{\lfloor (N+1)t\rfloor}1\{X_i \le \hat m_N\} - \frac{\lfloor (N+1)t\rfloor}{N}\sum_{i=1}^{N}1\{X_i \le \hat m_N\}\right), \quad t \in [0,1].$$
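The process $\hat Q_N$ is simple to compute on the grid $t = k/(N+1)$; a sketch:

```python
import numpy as np

def median_cusum(x):
    """Q-hat_N(t) at t = k/(N+1), k = 1,...,N, using the sample median."""
    N = len(x)
    ind = (x <= np.median(x)).astype(float)
    S = np.cumsum(ind)
    k = np.arange(1, N + 1)
    return (S - k / N * S[-1]) / np.sqrt(N)
```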
We now aim to establish the asymptotic properties of $Q_N$ and $\hat Q_N$ when the observations arise from a strictly stationary process $\{X_i, i\in\mathbb{Z}\}$. In this case, under Assumption 2.4.1, $\{U_i = F(X_i), i\in\mathbb{Z}\}$ is also a stationary sequence, but with uniform marginal distributions over the unit interval. If the process $\{X_i, i\in\mathbb{Z}\}$ is $L^p$-decomposable for some $p > 0$ and $\alpha > 4$ in Definition 1.1.1, then the infinite series
$$\tau^2 = \sum_{l=-\infty}^{\infty}E\left[(1\{U_0 \le 1/2\} - 1/2)(1\{U_l \le 1/2\} - 1/2)\right]$$
is absolutely convergent.
Theorem 2.4.1 If $H_0$ of (2.4.1) and Assumption 2.4.1 are satisfied, and $\{X_i, i\in\mathbb{Z}\}$ is $L^p$-decomposable for some $p > 0$ and $\alpha > 4$ in Definition 1.1.1, then we can define a sequence of Brownian bridges $\{B_N(t), 0\le t\le 1\}$ such that
$$\sup_{0\le t\le 1}|Q_N(t) - \tau B_N(t)| = o_P(1) \qquad (2.4.3)$$
and
$$\sup_{0\le t\le 1}|\hat Q_N(t) - \tau B_N(t)| = o_P(1). \qquad (2.4.4)$$
Proof Under $H_0$ we may express $\hat Q_N$ in terms of the uniform variables $U_i$ through
$$\hat R_N(t) = N^{-1/2}\left(\sum_{i=1}^{\lfloor (N+1)t\rfloor}1\{U_i \le \hat U_N(1/2)\} - \frac{\lfloor (N+1)t\rfloor}{N}\sum_{i=1}^{N}1\{U_i \le \hat U_N(1/2)\}\right),$$
where $\hat U_N(1/2)$ is the median of $U_1, \dots, U_N$. It follows from Theorem A.1.4 that we can define a sequence of Wiener processes $\{W_N(t), 0\le t\le 1\}$ such that
$$\sup_{0\le t\le 1}\left|N^{-1/2}\sum_{i=1}^{\lfloor (N+1)t\rfloor}(1\{U_i \le 1/2\} - 1/2) - \tau W_N(t)\right| = o_P(1), \qquad (2.4.5)$$
and we set
$$\bar R_N(t) = N^{-1/2}\sum_{i=1}^{\lfloor (N+1)t\rfloor}\left(1\{U_i \le \hat U_N(1/2)\} - \hat U_N(1/2)\right).$$
It follows once again from Theorem A.1.4 that there are continuous Gaussian processes $\{\Gamma_N(t,x), 0\le t,x\le 1\}$ such that
$$\sup_{0\le t,x\le 1}\left|N^{-1/2}\sum_{i=1}^{\lfloor (N+1)t\rfloor}(1\{U_i \le x\} - x) - \Gamma_N(t,x)\right| = o_P(1), \qquad (2.4.6)$$
with $E\Gamma_N(t,x) = 0$ and $E\Gamma_N(t,x)\Gamma_N(t',x') = \gamma(x,x')\min(t,t')$, where
$$\gamma(x,x') = \sum_{l=-\infty}^{\infty}E\left[(1\{U_0 \le x\} - x)\left(1\{U_l \le x'\} - x'\right)\right]. \qquad (2.4.7)$$
Hence we conclude
$$\sup_{0\le t\le 1}\left|\bar R_N(t) - \Gamma_N(t, \hat U_N(1/2))\right| = o_P(1).$$
It follows from Horváth (1984) (see also Csörgő and Horváth (1993, pp. 24 and 25)) and (2.4.6) that
$$\left|\hat U_N(1/2) - \frac{1}{2}\right| = O_P(N^{-1/2}). \qquad (2.4.8)$$
Combining (2.4.8) with the continuity of $\{\Gamma_N(t,x), 0\le t,x\le 1\}$ gives
$$\sup_{0\le t\le 1}\left|\Gamma_N(t, \hat U_N(1/2)) - \Gamma_N(t, 1/2)\right| = o_P(1).$$
Hence
$$\sup_{0\le t\le 1}\left|\bar R_N(t) - \Gamma_N(t, 1/2)\right| = o_P(1),$$
and therefore
$$\sup_{0\le t\le 1}\left|\bar R_N(t) - t\bar R_N(1) - \left(\Gamma_N(t,1/2) - t\Gamma_N(1,1/2)\right)\right| = o_P(1).$$
We use the notation $k_0^* = 0$ and $k_{R+1}^* = N$. In order to make rigorous statements about the asymptotic properties of $Q_N$, we assume that between each pair of change points the process is stationary and $L^{\nu}$-decomposable, resulting in a piecewise $L^{\nu}$-decomposable process. This is formalized in the following assumption.

Assumption 2.4.3 $X_i = g_l(\eta_i, \eta_{i-1}, \dots)$, $k_{l-1}^* < i \le k_l^*$, $1 \le l \le R+1$, where $g_1, g_2, \dots, g_{R+1}$ are deterministic, measurable functions, $g_l: S^{\infty} \to \mathbb{R}$, $E|X_i|^{\nu} < \infty$ with some $\nu > 4$, $\{\eta_i, i\in\mathbb{Z}\}$ are independent and identically distributed random variables with values in a measurable space $S$, and for $k_{l-1}^* < i \le k_l^*$
$$v_{m,l} = \left(E|X_i - X_{i,m}^*|^{\nu}\right)^{1/\nu} \le a_l m^{-\alpha} \quad \text{with some } a_l > 0 \text{ and } \alpha > 4,$$
where $X_{i,m}^* = g_l(\eta_i, \dots, \eta_{i-m+1}, \eta_{i-m}^*, \eta_{i-m-1}^*, \dots)$, and $\{\eta_k^*, k\in\mathbb{Z}\}$ are independent, identically distributed copies of $\eta_0$, independent of $\{\eta_j, j\in\mathbb{Z}\}$, for $k_{l-1}^* < i \le k_l^*$, $l \in \{1,\dots,R+1\}$.
It follows from Theorem A.1.4 in the appendix that
$$\sup_{-\infty<x<\infty}\left|\hat F_N(x) - F_N^*(x)\right| = O_P(N^{-1/2}), \qquad (2.4.9)$$
where
$$\hat F_N(x) = \frac{1}{N}\sum_{i=1}^{N}1\{X_i \le x\} \qquad (2.4.10)$$
and
$$F_N^*(x) = \sum_{l=1}^{R+1}\frac{k_l^* - k_{l-1}^*}{N}F^{(l)}(x).$$
Let $m_N^*$ denote the median of $F_N^*$, and set
$$\bar A_N = \sum_{l=1}^{R+1}\left(\frac{k_l^*}{N} - \frac{k_{l-1}^*}{N}\right)F^{(l)}(m_N^*).$$

Theorem 2.4.2 If
$$N^{1/2}\max_{1\le l\le R+1}|p_N(l)| \to \infty, \qquad (2.4.11)$$
then
$$\max_{0\le t\le 1}|Q_N(t)| \xrightarrow{P} \infty \qquad (2.4.12)$$
and
$$\max_{0\le t\le 1}|\hat Q_N(t)| \xrightarrow{P} \infty. \qquad (2.4.13)$$
Proof As in (2.4.8) (see Csörgő and Horváth (1993, pp. 24 and 25)), the sample median $\hat m_N$ is bounded in probability. If $k_{l-1}^* < k \le k_l^*$, then we have the decomposition
$$\sum_{i=1}^{k}1\{X_i \le \hat m_N\} = \sum_{j=1}^{l-1}\sum_{i=k_{j-1}^*+1}^{k_j^*}\left(1\{X_i \le \hat m_N\} - F^{(j)}(\hat m_N)\right) + \sum_{i=k_{l-1}^*+1}^{k}\left(1\{X_i \le \hat m_N\} - F^{(l)}(\hat m_N)\right)$$
$$+ \sum_{j=1}^{l-1}\sum_{i=k_{j-1}^*+1}^{k_j^*}\left(F^{(j)}(\hat m_N) - F^{(j)}(m_N^*)\right) + (k - k_{l-1}^*)\left(F^{(l)}(\hat m_N) - F^{(l)}(m_N^*)\right)$$
$$+ \sum_{j=1}^{l-1}(k_j^* - k_{j-1}^*)F^{(j)}(m_N^*) + (k - k_{l-1}^*)F^{(l)}(m_N^*).$$
Under the alternative,
$$\hat m_N \xrightarrow{P} m^*,$$
where $m^*$ denotes the median of $F^*$. Note that, according to Theorem 2.4.1, each centered partial sum
$$\sum_{i=k_{l-1}^*+1}^{k}\left(1\{X_i \le \hat m_N\} - F^{(l)}(\hat m_N)\right)$$
satisfies
$$\max_{k_{l-1}^* \le k \le k_l^*}\left|\sum_{i=k_{l-1}^*+1}^{k}\left(1\{X_i \le \hat m_N\} - F^{(l)}(\hat m_N)\right)\right| = O_P(N^{1/2}).$$
Assumption 2.4.2 and (2.4.14), together with the mean value theorem, yield
$$\max_{1\le k\le N}\left|\sum_{j=1}^{l-1}\sum_{i=k_{j-1}^*+1}^{k_j^*}\left(F^{(j)}(\hat m_N) - F^{(j)}(m_N^*)\right) + (k - k_{l-1}^*)\left(F^{(l)}(\hat m_N) - F^{(l)}(m_N^*)\right)\right| = O_P(N^{1/2}).$$
Thus we conclude
$$\max_{0\le t\le 1}|\hat Q_N(t)| \ge N^{-1/2}T_N + O_P(1),$$
where
$$T_N = \max_{1\le k\le N}\left|\sum_{j=1}^{l-1}\sum_{i=k_{j-1}^*+1}^{k_j^*}F^{(j)}(m_N^*) + (k - k_{l-1}^*)F^{(l)}(m_N^*) - \frac{k}{N}\sum_{j=1}^{R+1}\sum_{i=k_{j-1}^*+1}^{k_j^*}F^{(j)}(m_N^*)\right|.$$
By restricting the maximum to the points $k = k_l^*$, $l \in \{1,\dots,R\}$, we see that
$$T_N \ge \max_{1\le l\le R+1}\left|\sum_{j=1}^{l}p_N(j)\right|,$$
which, combined with (2.4.11), completes the proof. ⨆⨅
Now the CUSUM process is redefined to depend on the quantile level $u \in (0,1)$ as well:
$$\tilde Q_N(t,u) = N^{-1/2}\left(\sum_{i=1}^{\lfloor (N+1)t\rfloor}1\{X_i \le \hat q_N(u)\} - \frac{\lfloor (N+1)t\rfloor}{N}\sum_{i=1}^{N}1\{X_i \le \hat q_N(u)\}\right), \qquad (2.4.15)$$
$0 < t < 1$, where $\hat q_N(u)$ denotes the empirical $u$-quantile of the sample.
Theorem 2.4.3 If $H_0$ of (2.4.1) and Assumptions 2.4.1–2.4.3 are satisfied, and the sequence $\{X_i, i\in\mathbb{Z}\}$ is $L^{\nu}$-decomposable for some $\nu > 4$ with parameter $\alpha > 4$, then we can define a sequence of Gaussian processes $\{\tilde\Gamma_N(t,u), 0\le t,u\le 1\}$ such that
$$\sup_{0<t,u<1}\left|\tilde Q_N(t,u) - \tilde\Gamma_N(t,u)\right| = o_P(1),$$
where $E\tilde\Gamma_N(t,u) = 0$ and $E\tilde\Gamma_N(t,u)\tilde\Gamma_N(t',u') = \gamma(u,u')(\min(t,t') - tt')$, with $\gamma(u,u')$ defined in (2.4.7).
Proof By Assumption 2.4.1, we can again apply the probability integral transformation, letting $U_i = F(X_i)$, $1 \le i \le N$. Using (2.4.6), we get
$$\sup_{0\le t,u\le 1}\left|N^{-1/2}\sum_{i=1}^{\lfloor (N+1)t\rfloor}\left(1\{U_i \le \hat U_N(u)\} - \hat U_N(u)\right) - \Gamma_N(t, \hat U_N(u))\right| = o_P(1),$$
where
$$\hat U_N(u) = \inf\left\{x: \frac{1}{N}\sum_{i=1}^{N}1\{U_i \le x\} \ge u\right\}, \quad 0 < u < 1.$$
Similarly to (2.4.8), Csörgő and Horváth (1993, pp. 24–25) and (2.4.6) imply
$$\sup_{0\le u\le 1}\left|\hat U_N(u) - u\right| = O_P(N^{-1/2}). \qquad (2.4.16)$$
Using again the continuity of $\{\Gamma_N(t,u), 0\le t,u\le 1\}$ and (2.4.16), we get
$$\sup_{0\le t,u\le 1}\left|\Gamma_N(t, \hat U_N(u)) - \Gamma_N(t,u)\right| = o_P(1).$$
Thus we conclude
$$\sup_{0\le t,u\le 1}\left|\tilde Q_N(t,u) - \left(\Gamma_N(t,u) - t\Gamma_N(1,u)\right)\right| = o_P(1).$$
The result follows by taking $\tilde\Gamma_N(t,u) = \Gamma_N(t,u) - t\Gamma_N(1,u)$. ⨆⨅
The behaviour of $\{\tilde Q_N(t,u), 0\le t,u\le 1\}$ under the multiple change point alternative in the distributions of the $X_i$'s can be studied along the lines of Theorem 2.4.2. The times of changes can be estimated with the locations of the maximum of $\sup_{0\le u\le 1}|\tilde Q_N(t,u)|$ with respect to $t$.
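A sketch of computing $\tilde Q_N(t,u)$ on a grid, together with the integral-type statistic used in the data examples below, follows; the grid sizes are arbitrary choices.

```python
import numpy as np

def quantile_cusum_grid(x, n_u=50):
    """Values of Q-tilde_N(t,u) on the grid t = k/(N+1), u = j/(n_u+1)."""
    N = len(x)
    k = np.arange(1, N + 1)
    u = np.arange(1, n_u + 1) / (n_u + 1)
    q = np.quantile(x, u)                      # empirical u-quantiles
    ind = (x[:, None] <= q[None, :]).astype(float)
    S = np.cumsum(ind, axis=0)                 # partial sums in t
    return (S - np.outer(k / N, S[-1])) / np.sqrt(N)

def cvm_statistic(x, n_u=50):
    """Grid approximation of the double integral of Q-tilde_N^2 over (t,u)."""
    Q = quantile_cusum_grid(x, n_u)
    return np.mean(Q ** 2)
```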
Example 2.5.1 (River Nile Data) Figure 2.1 displays the yearly measurements of the flow of the river Nile measured at Aswan over $N = 100$ years starting from 1871. A change in the mean of the series appears to take place around the year 1900. In order to test for the presence of a change point in the mean of the series, we consider the CUSUM process $Q_N(t)$ in (1.2.1), and its weighted versions $Q_N(t)/[t(1-t)]^{\gamma}$. Figure 2.2 displays $Q_N(t)/[t(1-t)]^{\gamma}$ as a function of $t \in (0,1)$ for $\gamma \in \{0, 1/4, 1/2\}$. Each process has a distinct peak corresponding to the year 1898, which coincides with the year that construction began on the Aswan Low Dam. We remark on the effect of changing $\gamma$ in Fig. 2.2: although the location of the peak of the CUSUM process remains largely unaffected, larger values of $\gamma$ amplify the values of the CUSUM process closer to the end points.
Fig. 2.1 The time series of annual flow measurements of the Nile river taken at Aswan, measured in $10^8\,\mathrm{m}^3$, from 1871–1970
In order to evaluate the statistical significance of the observed break in the series, we may calculate an approximate p-value of a test of $H_0$ versus $H_A$ in (1.1.2) and (1.1.3), using the approximation obtained from Theorem 1.2.2: when $\gamma \in \{0, 1/4\}$,
$$p = P\left(\sup_{0<t<1}\frac{|B(t)|}{[t(1-t)]^{\gamma}} > \sup_{0<t<1}\frac{1}{\hat\sigma_N}\frac{|Q_N(t)|}{[t(1-t)]^{\gamma}}\;\Bigg|\;X_1, \dots, X_N\right),$$
where $\hat\sigma_N^2$ is an estimator for the long-run variance of the model errors in (1.1.1), and $B$ is a Brownian bridge that is independent of the sample. We discuss in Chap. 3 below how such variance estimators may be obtained. Using for example the kernel-lag window estimator $\hat\sigma_N^2$ defined in (3.1.18) below leads to approximate p-values of zero when $\gamma \in \{0, 1/4\}$. The horizontal dotted lines in Fig. 2.2 show the approximate null 95% quantiles of $\sup_{0<t<1}|Q_N(t)|/[t(1-t)]^{\gamma}$, $\gamma \in \{0, 1/4\}$. We see that the test statistics far exceed these values, suggesting the presence of a change point.
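A Monte Carlo version of this p-value computation is sketched below; it uses the ordinary sample variance as a placeholder for the long-run variance estimator of Chap. 3, and all names are illustrative.

```python
import numpy as np

def weighted_cusum_pvalue(x, gamma=0.25, lrv=None, n_sim=5000, seed=0):
    """Approximate p-value for sup |Q_N(t)|/[t(1-t)]^gamma via simulated
    Brownian bridges."""
    rng = np.random.default_rng(seed)
    N = len(x)
    k = np.arange(1, N)
    t = k / N
    q = np.abs(np.cumsum(x)[:-1] - t * np.sum(x)) / np.sqrt(N)
    stat = np.max(q / (t * (1 - t)) ** gamma) / np.sqrt(lrv if lrv else np.var(x))
    sims = np.empty(n_sim)
    for b in range(n_sim):
        w = np.cumsum(rng.standard_normal(N)) / np.sqrt(N)   # Wiener path
        bridge = np.abs(w[:-1] - t * w[-1])                  # Brownian bridge
        sims[b] = np.max(bridge / (t * (1 - t)) ** gamma)
    return np.mean(sims >= stat)
```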
In order to evaluate the statistical significance of $\sup_{0<t<1}|Q_N(t)|/[t(1-t)]^{1/2}$, we may instead appeal to the Darling-Erdős approximation derived in Theorem 1.2.5.

Fig. 2.2 Plots of the weighted CUSUM process $|Q_N(t)|/[t(1-t)]^{\gamma}$ over $t \in (0,1)$ for $\gamma \in \{0, 1/4, 1/2\}$ computed from the annual Nile river flow series. The horizontal black dotted line shows the 95% quantile of $\hat\sigma_N\sup_{0\le t\le 1}|B(t)|$, and the horizontal red dotted line shows the 95% quantile of $\hat\sigma_N\sup_{0\le t\le 1}|B(t)|/[t(1-t)]^{1/4}$. The blue dotted line shows the approximate 95% null quantile of $\sup_{0<t<1}|Q_N(t)|/[t(1-t)]^{1/2}$ described in (2.5.1)

Theorem 1.2.5 suggests that an approximation of the 95% null quantile of $\sup_{0<t<1}|Q_N(t)|/[t(1-t)]^{1/2}$ is of the form
$$\hat\sigma_N\,\frac{q_{G,0.95} + 2\log\log N + \tfrac{1}{2}\log\log\log N - \tfrac{1}{2}\log\pi}{(2\log\log N)^{1/2}}, \qquad (2.5.1)$$
where $q_{G,0.95}$ is the 95th quantile of the Gumbel law with cumulative distribution function $F(x) = \exp(-2e^{-x})$. This threshold is also displayed in Fig. 2.2.
Example 2.5.2 (Array Comparative Genomic Hybridization Data) Array com-
parative genomic hybridization (A-CGH) data consist of log-ratios of normalized
gene expression intensities from disease versus control samples, indexed by their
location on the human genome. Figure 2.3 displays a sub-sequence of A-CGH
data obtained for the study of genetic aberrations in 26 patients with Glioblastoma
Multiforme obtained from Lai et al. (2005). Aberrations in this case appear as
change-points in the mean of the A-CGH sequence, and so we aimed to perform
a multiple change point analysis on this series. In order to detect change points, we
applied binary segmentation based on the standardized CUSUM, estimating the change point location at each stage with
$$\hat k_N = \operatorname{sargmax}_{k\in\{1,\dots,N\}}\left(\frac{N}{k(N-k)}\right)^{1/2}\left|\sum_{i=1}^{k}X_i - \frac{k}{N}\sum_{i=1}^{N}X_i\right|.$$
Using the threshold $\rho_N = (2\log N)^{1/2}$ leads to the estimation of four change points, as seen in the top panel of Fig. 2.3, which roughly coincide with the visible aberrations in the series. Using instead the threshold $\rho_N = (\log N)^{1/2}$ leads to the estimation of eight change points, as seen in the bottom panel of Fig. 2.3. In order to select a segmentation, we computed the BIC of each segmentation as a function of the number of change points estimated, as in Sect. 2.3.2. The candidate change point models are ordered in each stage of the binary segmentation in descending order based on the size of the normalized detector computed in that stage. These are displayed in Fig. 2.4. We see in this plot that there is a large drop in the BIC occurring at $S = 4$ change points, after which the BIC levels off. This suggests using a four change point model, which agrees with the initial binary segmentation using the threshold $\rho_N = (2\log N)^{1/2}$.
In order to evaluate the uncertainty in the estimates of the change point locations, we computed, using the standard CUSUM process to refine the initial change point estimators, approximate 95% confidence intervals for each change point location as described in Remark 2.3.2. These intervals are plotted in the top panel of Fig. 2.3 as transparent red bands. We observed that these intervals are quite narrow, and localize the visible starting and ending indices of the aberrations. We remark that, notwithstanding clear violations of the assumptions underpinning Theorem 2.3.3, we view these intervals as being conservative, since they are constructed under the assumption that the change in the mean is shrinking as a function of the sample size.
Fig. 2.3 The series of normalized A-CGH measurements from chromosome 7 in sample GBM29
as shown in Figure 4 of Lai et al. (2005). The segmentation of the mean in the top panel was
obtained using binary segmentation with the threshold ρN = (2 log N )1/2 , which yielded four
change points, whereas the segmentation in the bottom panel is based on binary segmentation with
threshold ρN = (log N )1/2 , yielding eight change points. The top panel shows using transparent
red bands the approximate confidence intervals for the change points as stated in Remark 2.3.2
Fig. 2.4 A plot of the BIC as a function of the number of change points obtained by binary segmentation
right hand panel of Fig. 2.6. It appears to support that the centered series is mean
stationary and approximately serially uncorrelated.
A natural question here is whether or not the series appears to undergo changes in other aspects of its distribution. This can be investigated by examining the CUSUM process $\tilde Q_N(t,u)$ introduced in (2.4.15). A natural test statistic to evaluate further changes in the distribution is
$$\int_0^1\!\!\int_0^1\tilde Q_N^2(t,u)\,dt\,du.$$
Fig. 2.5 Time series of the mean yearly temperature measured in degrees Celsius in Prague between 1775 and 1990. Binary segmentation using the standard and weighted CUSUM processes for the mean leads to estimates of change points at locations $\hat k_1 = 1837$ and $\hat k_2 = 1934$
Fig. 2.6 Left panel shows plots of the weighted CUSUM process $|Q_N(t)|/[t(1-t)]^{\gamma}$ over $t \in (0,1)$ for $\gamma \in \{0, 1/4\}$, along with approximate 95% null critical values. Peaks are visible at estimated change point locations corresponding to $\hat k_1 = 1837$ and $\hat k_2 = 1934$. The right panel shows the ACF of the series centered based on these estimated change points using the piecewise constant mean estimate shown in Fig. 2.5
Fig. 2.7 Plots of $|\tilde Q_N(t,u)|$ as defined in (2.4.15) for the Prague mean yearly temperature (left), and for the centered Prague mean yearly temperature series using the change point estimates $\hat k_1 = 1837$ and $\hat k_2 = 1934$ (right). Tests for changes in the distribution of these series based on $\int_0^1\int_0^1\tilde Q_N^2(t,u)\,dt\,du$ are significant at the 5% level for the original series, but not for the centered series
these change point estimates, the p-value was 0.8715, indicating that there do not
appear to be significant changes in the distribution of the re-centered series. A plot
of |Q̃N (t, u)| for the centered series is shown in the right hand panel of Fig. 2.7,
from which one may see that large peaks in the process no longer appear.
2.6 Exercises
where
$$\bar X_{j,k} = \frac{1}{k-j}\sum_{i=j+1}^{k}X_i. \qquad (2.6.1)$$
where X̄j,k is defined in (2.6.1). Investigate the limiting behavior of TN under the
exactly one change in the mean alternative when the mean changes from μ1 to μk1 +1
at time k1 .
Exercise 2.6.3 Assume that $X_i = \mu_i + E_i$, where $\{E_i, i\in\mathbb{Z}\}$ is a stationary AR(1) sequence defined by
$$E_i = \rho E_{i-1} + \eta_i, \quad i\in\mathbb{Z},$$
where $\{\eta_i, i\in\mathbb{Z}\}$ are independent and identically distributed random variables with $E\eta_0 = 0$, $E\eta_0^2 = \sigma^2$, $E|\eta_0|^{\nu} < \infty$ with some $\nu > 2$, and $|\rho| < 1$. We wish to test $H_0: \mu_1 = \cdots = \mu_N$ against the one change in the mean alternative. We use
$$T_N = \max_{1\le m<N}\left|\bar X_{0,m} - \bar X_{m,N}\right|,$$
where $\bar X_{j,k}$ is defined in (2.6.1). Compute the limiting distribution of $T_N$ under the null hypothesis.
Exercise 2.6.4 Assume that $X_i = \mu_i + E_i$, where $\{E_i, i\in\mathbb{Z}\}$ is a stationary AR(1) sequence defined by
$$E_i = \rho E_{i-1} + \eta_i, \quad i\in\mathbb{Z},$$
where $\{\eta_i, i\in\mathbb{Z}\}$ are independent and identically distributed random variables with $E\eta_0 = 0$, $E\eta_0^2 = \sigma^2$, $E|\eta_0|^{\nu} < \infty$ with some $\nu > 2$, and $|\rho| < 1$. We wish to test $H_0: \mu_1 = \cdots = \mu_N$ against the one change in the mean alternative. We use
$$T_N = \max_{1\le m<N}\left|\bar X_{0,m} - \bar X_{m,N}\right|,$$
where $\bar X_{j,k}$ is defined in (2.6.1). Investigate the limiting behavior of $T_N$ under the exactly one change in the mean alternative when the mean changes from $\mu_1$ to $\mu_{k_1+1}$ at time $k_1$.
Exercise 2.6.5 Let $X_1, X_2, \dots, X_N$ be stationary and serially uncorrelated random variables with $EX_i^2 = \sigma^2$. Show that
$$E\left(\sum_{i=1}^{k}X_i - \frac{k}{N}\sum_{i=1}^{N}X_i\right)^2 = \frac{\sigma^2 k(N-k)}{N}.$$
$$\frac{N^{1/2}}{(\log\log N)^{1/2}}\left|\lambda_{k_1} - \lambda_{k_2}\right| \to \infty,$$
and $k_1 = \lfloor N\theta_1\rfloor$, $0 < \theta_1 < 1$.
Exercise 2.6.8 Assume that $X_i = \mu_i + E_i$, where $\{E_i, i\in\mathbb{Z}\}$ are independent and identically distributed random variables with $EE_0 = 0$, $EE_0^2 = \sigma^2$ and $E|E_0|^{\nu} < \infty$ with some $\nu > 2$. We wish to test $H_0: \mu_1 = \cdots = \mu_N$ against the at most two changes alternative $H_A: \mu_1 = \cdots = \mu_{k_1} \ne \mu_{k_1+1} = \cdots = \mu_{k_2} \ne \mu_{k_2+1} = \cdots = \mu_N$. We use the statistic
$$T_N = N^{-3/2}\max_{1\le k<m<N}\left\{k(m-k)\left|\bar X_{0,k} - \bar X_{k,m}\right| + (m-k)(N-m)\left|\bar X_{k,m} - \bar X_{m,N}\right|\right\},$$
where $\bar X_{j,k}$ is defined in (2.6.1). Assume that we have exactly two changes, at $\lfloor N\theta_1\rfloor$ and $\lfloor N\theta_2\rfloor$, $0 < \theta_1 < \theta_2 < 1$. Find a condition on the sizes of the changes in the mean which implies that $T_N \to \infty$ in probability.
Exercise 2.6.10 Assume that $X_i = \mu_i + E_i$, where $\{E_i, i\in\mathbb{Z}\}$ are independent and identically distributed random variables with $EE_0 = 0$, $EE_0^2 = \sigma^2$ and $E|E_0|^{\nu} < \infty$ with some $\nu > 2$. We wish to test $H_0: \mu_1 = \cdots = \mu_N$ against the at most two changes alternative $H_A: \mu_1 = \cdots = \mu_{k_1} \ne \mu_{k_1+1} = \cdots = \mu_{k_2} \ne \mu_{k_2+1} = \cdots = \mu_N$. We use the statistic
$$T_N = N^{-3/2}\max_{1\le k<m<N}\left\{k(m-k)\left|\bar X_{0,k} - \bar X_{k,m}\right| + (m-k)(N-m)\left|\bar X_{k,m} - \bar X_{m,N}\right|\right\}.$$
Assume that
$$P\{E_0 > x\} = cx^{-\alpha}, \quad x \ge x_0, \text{ with some } c > 0 \text{ and } \alpha > 0.$$
Show that
$$\liminf_{N\to\infty}P\left\{T_N > N^{1/\alpha}\right\} > 0.$$
Grabovsky et al. (2000) introduced kernel type estimators with more general
weights for the time of change, and obtained their asymptotic properties in case
of independent identically distributed errors in the exactly one change model.
Dümbgen (1991) and Antoch et al. (1995) proved Theorem 2.2.1 for independent
observations. Later Antoch et al. (1997) extended their result to linear processes; see
also Antoch and Hušková (1999) for a review. Kurozumi (2018) and Hušková and
Kirch (2010) consider the development of confidence intervals for the time of change.
The density function of ξ(κ) in Theorem 2.2.1 was computed by Ferger (1994); see
also (Csörgő and Horváth 1997, p. 177).
Binary segmentation is often credited to Scott and Knott (1974) and Vostrikova
(1981). The first consistency result for binary segmentation of the mean of
independent variables, with a fixed number of change points that do not shrink to zero, appears to have been Korostelev (1988). A consistency result for Gaussian model errors with a potentially increasing number of potentially shrinking changes appeared in the PhD thesis Venkatraman (1992), and was revisited in Fryzlewicz (2014). Lemma 2.3.1 was proven in Venkatraman (1992) when $\kappa = 1/2$. Sub-Gaussianity of
the tails of the model errors is a typical ingredient in establishing such consistency
results, which is used to produce logarithmic bounds for random variables of the
form
| b |
| Σ |
1 | |
. max | Ei | .
1≤a<b<c≤N (c − a) 1/2 | |
i=a+1
It may be shown, for example, that when binary segmentation is applied in the presence of multiple, potentially shrinking, change points, the first change point estimator in binary segmentation is within a radius of $O_P(1/\Delta_N^2)$ of a change point (cf. Theorem 2.2.1). However, in the second stage, the CUSUM of the innovations is maximized over a random interval. The maximal absolute value of the CUSUM process of the model errors may then be bounded by variables of the form $\max_{1\le i\le C/\Delta_N^2}|E_i|$. As such, when $\Delta_N$ shrinks, it is convenient to make use of strong tail conditions on the variables to get effective bounds on such maxima.
Bai and Perron (1998, 2003) extend the binary segmentation method to linear
models. Yao (1988) and Lee (1995) used Schwarz’s criteria to estimate the number
of changes in independent normal observations. Serbinowska (1996) applied the
same method to binomial observations. Pan and Chen (2006) and Ciuperca (2011)
applied a more general penalty function to find the number of changes in general
time series models. Bai (1995) modifies the binary segmentation method and derives
the asymptotic properties of the estimators for the time of change. Chen et al.
(2011) compares the binary segmentation and the maximum residual method. The
minimum description length is used as the criterion for segmentation in Davis et al.
(2006), and it is minimised using a genetic algorithm to reduce computational
complexity. For a review of penalty terms and information criteria that may be used
in Sect. 2.3.2, see Claeskens and Hjort (2008).
Multiple change point detection and estimation methods have been intensively
studied, even for univariate scalar data, in the last two decades. Many of these
improve upon the weaknesses of simple binary segmentation, including its propen-
sity to perform poorly for change points that are close together, or when there are
many change points. Some notable modern methods include SMUCE (Frick et al.
2014; Dette et al. 2020a), Wild-binary segmentation (Fryzlewicz 2014; Fryzlewicz
and Rao 2014), seeded binary segmentation (Kovács et al. 2023), MOSUM (Kirch
and Klein 2021), kernel based methods (Arlot et al. 2019), and PELT (Killick et al.
2012), among many others; see also Wang et al. (2020). Excellent reviews may be
found in Cho and Kirch (2021) and Yu (2020). A numerical comparison of many of
these methods may be found in Shi et al. (2022).
One of the first nonparametric change point procedures was introduced in Page
(1954, 1955). Sections 2.2 and 2.3 in Csörgő and Horváth (1997) discuss the
extensions of Page’s procedure in case of independent observations. They also
provide several references on applications of nonparametric statistics to change
point analysis. Their methodology is based on the theory of empirical and quantile
processes which we also used in Sect. 2.4. Empirical process techniques for depen-
dent data are surveyed in Dehling et al. (2002, 2009). Hoga (2018a,b) investigates
changes in the quantiles with applications to risk measures and tail indices. Hušková
and Kirch (2008) and Boldea et al. (2019) advocated resampling methods to improve
finite sample performance of several statistical methods. Gerstenberger (2018) uses
Wilcoxon statistics, along the lines of Sect. 2.4 to estimate the time of change, which
also fall within the scope of the U-statistic based methods in Dehling et al. (2022,
2015). By considering the random functions Xi |→ 1{Xi ≤ ·}, functional data
methods, see Chap. 8, can be applied to motivate similar methods as discussed
in Sect. 2.4; see e.g. Sharipov et al. (2016). Holmes et al. (2013) and Bücher
et al. (2019) provide several nonparametric tests for changes in distribution. An
outlier robust method was proposed in Fearnhead and Rigaill (2019). Empirical
characteristic function based methods to detect and estimate changes in distribution
are developed in Huśková and Meintanis (2006b,a), Hlávka et al. (2017), and
Matteson and James (2014).
If the observation process {Xi , i ∈ Z} is formed from independent and
identically distributed random variables, then the approximating Gaussian process
{┌˜ N (t, u), 0 ≤ u, t ≤ 1} in Theorem 2.4.3 has a simple covariance structure:
E ┌˜ N (t, u) = 0 and E ┌˜ N (t, u)┌˜ N (s, v) = (min(t, s) − ts)(min(u, v) − uv). Hence
for each N the process {┌˜ N (t, u), 0 ≤ u, t ≤ 1} is a “Brownian pillow”; it is
tied down at all edges of the unit square. The Brownian pillow appeared in the
paper of Blum et al. (1961), who provided critical values for the square integral
of the Brownian pillow. Koning and Protasov (2003) obtained a representation for
the Brownian pillow which can be used to simulate critical values for several other
functionals.
Applications of change point analysis to climate data have been reviewed in
Reeves et al. (2007). Example 2.5.2 was motivated by an example in Killick and
Eckley (2014).
Chapter 3
Variance Estimation, Change Points
in Variance, and Heteroscedasticity
The variance parameter
$$\sigma^2 = \lim_{N\to\infty}\operatorname{Var}\left(\frac{1}{N^{1/2}}\sum_{i=1}^{N}E_i\right) \qquad (3.1.1)$$
is defined in terms of the asymptotic variance of the scaled sample mean of the model errors. This asymptotic quantity is often termed the long-run variance, and also coincides with a scalar multiple of the spectral density of the sequence $\{E_i, i\in\mathbb{Z}\}$ evaluated at frequency zero. To begin this
chapter, we discuss typical estimators of the long–run variance, and establish their
asymptotic consistency. We then turn to the problem of performing change point
analysis for the second order properties of a process. The chapter concludes with
studying how asymptotic properties of change point methods for the mean are
affected by heteroscedasticity, or changes in the variance, of the model errors in
model (1.1.1).
3.1 Estimation of Long-Run Variances and Covariance Matrices

Suppose for the moment that $H_0$ in the basic AMOC change point model (1.1.1) holds, so that for each $i$, $X_i = E_i$, and further that $\{E_i, i\in\mathbb{Z}\}$ is a stationary sequence with $EE_i^2 < \infty$ and autocovariance function $\gamma_E(k) = \operatorname{cov}(E_0, E_k)$. It follows from elementary calculations that
$$\operatorname{Var}\left(\frac{1}{N^{1/2}}\sum_{i=1}^{N}E_i\right) = \sum_{k=-(N-1)}^{N-1}\left(1 - \frac{|k|}{N}\right)\gamma_E(k).$$
Hence, if
$$\sum_{k=-\infty}^{\infty}|\gamma_E(k)| < \infty,$$
then the long-run variance $\sigma^2 = \sum_{k=-\infty}^{\infty}\gamma_E(k)$ is well defined. When the errors are serially uncorrelated, $\sigma^2$ reduces to $\operatorname{Var}(E_0)$, which may be estimated with the sample variance
$$\hat\sigma_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar X_N\right)^2, \quad \text{where } \bar X_N = \frac{1}{N}\sum_{i=1}^{N}X_i. \qquad (3.1.2)$$
The asymptotic consistency of .σ̂N2 follows from the ergodic theorem, when the
model errors are ergodic.
Recall that a measurable function $a$ is said to be slowly varying at infinity if, for every $c > 0$,
$$\lim_{x\to\infty}\frac{a(cx)}{a(x)} = 1.$$
Theorem 3.1.2 We assume that $H_0$ of (1.1.2) and Assumption 3.1.1 hold, and $EX_0^4 < \infty$. If $\operatorname{cov}(X_0^2, X_k^2) = O(a(k))$, where $a(k)$ is a strictly decreasing function, slowly varying at infinity, then
$$|\hat\sigma_N^2 - \sigma^2| = O_P\left(a^{1/2}(N)\right).$$
Proof Since under the null hypothesis $\hat\sigma_N^2$ does not depend on the common mean, we have that
$$\hat\sigma_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}E_i^2 - \frac{N}{N-1}\bar E_N^2, \quad \text{with } \bar E_N = \frac{1}{N}\sum_{i=1}^{N}E_i,$$
and
$$\left|\sum_{k=-(N-1)}^{N-1}\left(1 - \frac{|k|}{N}\right)\operatorname{cov}\left(E_0^2, E_k^2\right)\right| \le 4\sum_{k=0}^{N-1}\left|\operatorname{cov}\left(E_0^2, E_k^2\right)\right|.$$
for all $t > 0$. By the monotone density theorem for slowly varying functions in Bingham et al. (1987, pp. 159–160),
$$\sum_{k=0}^{N}\left|\operatorname{cov}\left(E_0^2, E_k^2\right)\right| \le CNa(N).$$
$$\hat\sigma_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}(E_i - \bar E_N)^2 + \frac{k^*(N-k^*)}{N^2}\Delta_N^2 + O_P\left(\frac{|\Delta_N|}{N^{1/2}}\right). \qquad (3.1.5)$$
The relation (3.1.5) shows that the usual sample variance will tend to overestimate $\sigma^2$ when change points are present in the sequence. The bias term $\Delta_N^2 k^*(N-k^*)/N^2$ may be thought of as quantifying the "additional variance" that appears in the series due to the presence of change points. It is of note that this bias term does not asymptotically vanish when the change point $k^*$ is proportional to $N$ (i.e. under Assumption 2.1.1) and $\liminf_{N\to\infty}|\Delta_N| > 0$. Using such an estimator in practice will have the effect of reducing the power of change point detection methods.
There are several ways, though, to reduce the bias in such a variance estimator due to change points. One may note that the bias term arises because the sample mean $\bar X_N$ used in defining $\hat\sigma_N^2$ in (3.1.2) does not properly center the series under $H_A$. If instead we center the series before and after a candidate change point $k$ using the running averages
$$\bar X_k = \frac{1}{k}\sum_{i=1}^{k}X_i \quad \text{and} \quad \tilde X_k = \frac{1}{N-k}\sum_{i=k+1}^{N}X_i, \qquad (3.1.6)$$
then the statistic
$$\frac{1}{N}\left\{\sum_{i=1}^{k}(X_i - \bar X_k)^2 + \sum_{i=k+1}^{N}(X_i - \tilde X_k)^2\right\}$$
would estimate $\sigma^2$. The fact that this statistic, as a function of $k$, is expected to reach its smallest value when $k = k^*$ suggests using the estimator
$$\tilde\sigma_N^2 = \frac{1}{N}\min_{1\le k<N}\left\{\sum_{i=1}^{k}(X_i - \bar X_k)^2 + \sum_{i=k+1}^{N}(X_i - \tilde X_k)^2\right\}. \qquad (3.1.7)$$
Theorem 3.1.3 We assume that the AMOC alternative in (1.1.3) holds. If Assumptions 1.2.2, 3.1.1 and 3.1.2 are satisfied, then
$$\tilde\sigma_N^2 \xrightarrow{P} \sigma^2.$$
Proof If $1 \le k \le k^*$, then
$$\sum_{i=1}^{k}(X_i - \bar X_k)^2 = \sum_{i=1}^{k}(E_i - \bar E_k)^2, \quad \text{with } \bar E_k = \frac{1}{k}\sum_{i=1}^{k}E_i,$$
and
$$\sum_{i=k+1}^{N}(X_i - \tilde X_k)^2 = \sum_{i=k+1}^{k^*}\left(E_i - \tilde E_k + \frac{N-k^*}{N-k}\Delta_N\right)^2 + \sum_{i=k^*+1}^{N}\left(E_i - \tilde E_k - \frac{k^*-k}{N-k}\Delta_N\right)^2,$$
with
$$\tilde E_k = \frac{1}{N-k}\sum_{i=k+1}^{N}E_i.$$
Now
$$\sum_{i=1}^{k}(E_i - \bar E_k)^2 = \sum_{i=1}^{k}E_i^2 - k\bar E_k^2.$$
Under Assumption 1.2.2, we have by the law of the iterated logarithm that
$$\max_{1\le k\le k^*}k\bar E_k^2 = \max_{1\le k\le k^*}\left(\frac{1}{k^{1/2}}\sum_{i=1}^{k}E_i\right)^2 = O_P(\log\log N).$$
Next,
$$\sum_{i=k+1}^{k^*}\left(E_i - \tilde E_k + \frac{N-k^*}{N-k}\Delta_N\right)^2 = \sum_{i=k+1}^{k^*}E_i^2 + (k^*-k)\tilde E_k^2 + (k^*-k)\frac{(N-k^*)^2}{(N-k)^2}\Delta_N^2$$
$$- 2\tilde E_k\sum_{i=k+1}^{k^*}E_i + 2\frac{N-k^*}{N-k}\Delta_N\sum_{i=k+1}^{k^*}E_i - 2(k^*-k)\tilde E_k\frac{N-k^*}{N-k}\Delta_N,$$
and
$$\sum_{i=k^*+1}^{N}\left(E_i - \tilde E_k - \frac{k^*-k}{N-k}\Delta_N\right)^2 = \sum_{i=k^*+1}^{N}E_i^2 + (N-k^*)\tilde E_k^2 + (N-k^*)\frac{(k^*-k)^2}{(N-k)^2}\Delta_N^2$$
$$- 2\tilde E_k\sum_{i=k^*+1}^{N}E_i - 2\frac{k^*-k}{N-k}\Delta_N\sum_{i=k^*+1}^{N}E_i + 2(N-k^*)\tilde E_k\frac{k^*-k}{N-k}\Delta_N.$$
Elementary arguments give
$$\max_{1\le k\le k^*}(N-k)\tilde E_k^2 = O_P(1), \qquad \max_{1\le k\le k^*}\left|\tilde E_k\sum_{i=k^*+1}^{N}E_i\right| = O_P(1),$$
$$\max_{1\le k\le k^*}\frac{k^*-k}{N-k}\left|\Delta_N\sum_{i=k^*+1}^{N}E_i\right| = O_P(|\Delta_N|N^{1/2}),$$
$$\max_{1\le k\le k^*}(N-k^*)\frac{k^*-k}{N-k}\left|\tilde E_k\Delta_N\right| = O_P(|\Delta_N|N^{1/2}).$$
Combining these bounds establishes the claim. ⨆⨅

A related estimator uses an estimated change point $\hat k$ in place of the minimizing $k$:
$$\bar\sigma_N^2 = \frac{1}{N}\left\{\sum_{i=1}^{\hat k}(X_i - \bar X_{\hat k})^2 + \sum_{i=\hat k+1}^{N}(X_i - \tilde X_{\hat k})^2\right\}, \qquad (3.1.10)$$
where $\bar X_k$ and $\tilde X_k$ are defined in (3.1.6). The estimators defined in Chap. 2 satisfy
that under the no change null hypothesis $\hat k/N = \hat\theta_N$ converges to a non-degenerate distribution, while under the AMOC alternative $\hat k/N = \hat\theta_N \to \theta$ in probability. Under these conditions, and with minor modifications of the proof of Theorem 3.1.3, we may show that
$$\bar\sigma_N^2 \xrightarrow{P} \sigma^2.$$
The definitions of $\tilde\sigma_N^2$ and $\bar\sigma_N^2$ can be extended to the case when multiple change points are present in the means of the observations. Here we consider the $R$ change point model of (2.3.1),
$$X_i = \sum_{j=1}^{R+1}\mu_j 1\{k_{j-1}^* \le i < k_j^*\} + E_i, \quad i \in \{1,\dots,N\}, \qquad (3.1.11)$$
where $k_j^*$, $j\in\{1,\dots,R\}$, denote change points in the mean, satisfying $1 = k_0^* < k_1^* < \cdots < k_R^* < k_{R+1}^* = N+1$. Letting $1 < k_1 < \cdots < k_S < N$ denote candidate
values for these changes, we compute the segment means
$$\bar X_{k_i} = \frac{1}{k_i - k_{i-1}}\sum_{j=k_{i-1}+1}^{k_i}X_j, \quad i \in \{1,\dots,S+1\}, \qquad (3.1.12)$$
and define
$$\tilde\sigma_{N,S}^2 = \min_{1\le k_1<k_2<\cdots<k_S\le N}\frac{1}{N}\sum_{i=1}^{S+1}\sum_{j=k_{i-1}+1}^{k_i}\left(X_j - \bar X_{k_i}\right)^2, \qquad (3.1.13)$$
where $S$ is a user specified upper bound on the number of possible changes. This estimator works well if $N$ is large and $S$ does not greatly overestimate the number of changes. We can modify the estimator $\bar\sigma_N^2$ to be based instead on estimators for the number and locations of the change points. Suppose we have estimates of the change points in the mean $\hat k_1 < \cdots < \hat k_{\hat R}$, with $\hat R$ denoting the estimator for the number of changes $R$. We define
$$\bar\sigma_{N,\hat R}^2 = \frac{1}{N}\sum_{i=1}^{\hat R+1}\sum_{j=\hat k_{i-1}+1}^{\hat k_i}\left(X_j - \bar X_{\hat k_i}\right)^2. \qquad (3.1.14)$$
One can prove that if the estimators $\hat k_1, \dots, \hat k_{\hat R}$ and $\hat R$ satisfy Assumption 2.3.3, namely that for all $\varepsilon > 0$, $\lim_{N\to\infty}P(\{\hat R = R\}\cap\{\max_{1\le i\le R}|\hat k_i - k_i^*| < \varepsilon N\}) = 1$, then
$$\bar\sigma_{N,\hat R}^2 \xrightarrow{P} \sigma^2.$$
Each of these estimators can analogously be defined for vector valued observations taking values in $\mathbb{R}^d$. Consider a stationary vector-valued time series $\{\mathbf{X}_t \in \mathbb{R}^d, t\in\mathbb{Z}\}$. The covariance matrix of $N^{1/2}$ times the sample mean can be written as
$$\frac{1}{N}E\left(\sum_{i=1}^{N}(\mathbf{X}_i - E\mathbf{X}_0)\right)\left(\sum_{i=1}^{N}(\mathbf{X}_i - E\mathbf{X}_0)\right)^T = \sum_{k=-(N-1)}^{N-1}\left(1 - \frac{|k|}{N}\right)\boldsymbol{\gamma}_k,$$
where $\boldsymbol{\gamma}_k = E(\mathbf{X}_0 - E\mathbf{X}_0)(\mathbf{X}_k - E\mathbf{X}_k)^T$ is the autocovariance matrix of the series at lag $k$. We then define the long-run covariance matrix as
$$\lim_{N\to\infty}\frac{1}{N}E\left(\sum_{i=1}^{N}(\mathbf{X}_i - E\mathbf{X}_0)\right)\left(\sum_{i=1}^{N}(\mathbf{X}_i - E\mathbf{X}_0)\right)^T = \boldsymbol{\Sigma}.$$
First we consider again the case when the observations are uncorrelated.

Assumption 3.1.3 $\{\mathbf{X}_i, i\in\mathbb{Z}\}$ is an uncorrelated stationary sequence, $E\|\mathbf{X}_0\|^2 < \infty$ and $\operatorname{cov}(\mathbf{X}_0, \mathbf{X}_k) = \mathbf{0}$ for all $k \ne 0$, where $\mathbf{0}$ denotes the $d\times d$ zero matrix.

If Assumption 3.1.3 holds, then the sample covariance is the natural estimator for $\boldsymbol{\Sigma}$. Let
$$\hat{\boldsymbol{\Sigma}}_N = \frac{1}{N-1}\sum_{i=1}^{N}\left(\mathbf{X}_i - \bar{\mathbf{X}}_N\right)\left(\mathbf{X}_i - \bar{\mathbf{X}}_N\right)^T, \quad \text{with } \bar{\mathbf{X}}_N = \frac{1}{N}\sum_{i=1}^{N}\mathbf{X}_i.$$
As a result of its definition, $\hat{\boldsymbol{\Sigma}}_N$ is a non-negative definite matrix. It can be shown as in (3.1.3) that under the null hypothesis, and assuming the series $\{\mathbf{X}_i, i\in\mathbb{Z}\}$ is ergodic,
$$\hat{\boldsymbol{\Sigma}}_N \to \boldsymbol{\Sigma} \quad \text{a.s.} \qquad (3.1.15)$$
In order to define an estimator for $\boldsymbol{\Sigma}$ that is consistent not only under the null hypothesis but also when the series contains change points in the mean, one can easily modify the definition of $\hat{\boldsymbol{\Sigma}}_N$ as we did in defining $\tilde\sigma_N^2$, $\bar\sigma_N^2$, $\tilde\sigma_{N,R^*}^2$ and $\bar\sigma_{N,\hat R}^2$.
If the observations are serially correlated, then the long-run variance $\sigma^2$ defined in (3.1.1) may in principle depend on the entire autocovariance function $\gamma_E(l)$. It is natural to estimate this function with the empirical autocovariance function
$$\hat\gamma_l = \frac{1}{N-l}\sum_{i=1}^{N-l}(X_i - \bar X_N)(X_{i+l} - \bar X_N), \quad 0 \le l < N, \qquad (3.1.16)$$
where $\bar X_N$ is the sample mean. Note that this estimator may not be used to estimate $\gamma_E(l)$ for $l$ larger than $N-1$. A naïve estimator of $\sigma^2$ is obtained by replacing the unknown autocovariances in (3.1.1) with estimators, leading to
$$\hat\sigma_N^2(\text{Naïve}) = \sum_{l=1-N}^{N-1}\hat\gamma_l. \qquad (3.1.17)$$
Such an estimator is not consistent in general; instead the sample autocovariances are downweighted at large lags using a kernel $K$ and bandwidth $h$, leading to the kernel-lag window estimator
$$\hat\sigma_N^2 = \hat\sigma_{N,LRV}^2 = \sum_{l=1-N}^{N-1}K\left(\frac{l}{h}\right)\hat\gamma_l. \qquad (3.1.18)$$
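For concreteness, a sketch of (3.1.18) with the Bartlett kernel follows; the default bandwidth $h = N^{1/3}$ is an illustrative choice, not a recommendation from the text.

```python
import numpy as np

def lrv_bartlett(x, h=None):
    """Kernel-lag window long-run variance estimator (3.1.18) with the
    Bartlett kernel K(u) = max(1 - |u|, 0)."""
    N = len(x)
    h = h or N ** (1 / 3)
    xc = x - np.mean(x)
    sigma2 = np.sum(xc ** 2) / N          # lag-0 term
    for l in range(1, int(np.ceil(h))):
        w = 1 - l / h                     # Bartlett weight K(l/h)
        gamma_l = np.sum(xc[:-l] * xc[l:]) / (N - l)
        sigma2 += 2 * w * gamma_l         # lags l and -l
    return sigma2
```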
Assumption 3.1.5
(i) .K(0) = 1,
(ii) .K(u) = K(−u),
(iii) there is .c > 0 such that .K(u) = 0, if .u /∈ [−c, c],
(iv) .sup−c<u<c |K(u)| < ∞,
All of the kernels listed above satisfy Assumption 3.1.5, with the exception of the quadratic spectral kernel, which is supported on the entire real line. In some consistency results and bandwidth selection methods it is useful to specify the polynomial degree of the kernel near the origin. A kernel function $K$ is said to be of order $q$ if
$$0 < \lim_{x\to 0}\frac{1 - K(x)}{|x|^q} < \infty. \qquad (3.1.19)$$
When no finite $q$ satisfying (3.1.19) exists, the kernel is said to be of infinite order. The Bartlett kernel is of order one, whereas the Parzen, Tukey-Hanning, quadratic spectral, and Daniell kernels are of order two. Kernels that are equal to one in a neighborhood of the origin are of infinite order.
We now turn to studying the asymptotic consistency of kernel-bandwidth estimators of the long-run variance of the form (3.1.18). While the asymptotic results established in Chap. 1 typically only require a Gaussian approximation result for the partial sum process of the observations, as in Assumption 1.1.1, such conditions are not sufficient to establish the consistency of the long-run variance estimators. We often require moment bounds for the partial sums of the errors and their autocovariances. It is convenient and quite general to state such moment and weak dependence requirements in terms of $L^{\nu}$-decomposability as introduced in Definition 1.1.1. For example, if $\{E_j, j\in\mathbb{Z}\}$ is $L^{\nu}$-decomposable for $\nu \ge 2$, then $E_{l,l}^*$, as defined in Definition 1.1.1, is independent of $E_0$. It follows then by the Cauchy-Schwarz inequality that
$$|\gamma_E(l)| = |\operatorname{cov}(E_0, E_l)| = |\operatorname{cov}(E_0, E_l - E_{l,l}^*)| \le (EE_0^2)^{1/2}\left(E|E_l - E_{l,l}^*|^2\right)^{1/2}.$$
Hence when $\{E_j, j\in\mathbb{Z}\}$ is $L^{\nu}$-decomposable for $\nu \ge 2$, so that $(E|E_l - E_{l,l}^*|^2)^{1/2} \le al^{-\alpha}$ for constants $a > 0$ and $\alpha > 2$, $\gamma_E(l)$ is an absolutely summable sequence, and the long-run variance is well defined.
Theorem 3.1.4 If the no-change null hypothesis $H_0$ of (1.1.2) holds, Assumptions 3.1.4 and 3.1.5 are satisfied, and the model errors are $L^{\nu}$-decomposable for some $\nu \ge 4$, then
$$\hat\sigma_N^2 \xrightarrow{P} \sigma^2.$$
Proof By the dominated convergence theorem,
$$\sum_{l=-(N-1)}^{N-1}K\left(\frac{l}{h}\right)\gamma_E(l) \to \sigma^2.$$
Since
$$\bar E_N = O_P(N^{-1/2}), \qquad (3.1.20)$$
we get that
$$\bar E_N^2\left|\sum_{l=-h}^{h}K\left(\frac{l}{h}\right)\right| = O_P\left(\frac{h}{N}\right) = o_P(1).$$
It suffices to consider the sums over $l \ge 0$, since the same estimates can be used for $l < 0$. We note that
$$E\left(\sum_{l=0}^{h}K\left(\frac{l}{h}\right)\frac{1}{N-l}\sum_{i=1}^{N-l}\left(E_iE_{i+l} - \gamma_E(l)\right)\right)^2 \qquad (3.1.25)$$
$$= \sum_{l=0}^{h}\sum_{l'=0}^{h}K\left(\frac{l}{h}\right)K\left(\frac{l'}{h}\right)\frac{1}{N-l}\,\frac{1}{N-l'}\sum_{i=1}^{N-l}\sum_{j=1}^{N-l'}E\left[(E_iE_{i+l} - \gamma_E(l))(E_jE_{j+l'} - \gamma_E(l'))\right].$$
If $1 \le i \le i+l$ and $i \le j \le j+k$, by stationarity we have
$$E\left[(E_iE_{i+l} - \gamma_E(l))\left(E_jE_{j+k} - \gamma_E(k)\right)\right] = E\left[(E_0E_l - \gamma_E(l))\left(E_{j-i}E_{j-i+k} - \gamma_E(k)\right)\right]. \qquad (3.1.26)$$
and
$$E\left|(E_0E_l - \gamma_E(l))E_{j,j-l}^*\left(E_{j+k} - E_{j+k,j+k-l}^*\right)\right| \le c_5(j+k-l)^{-\alpha}.$$
Since $E(E_0E_l - \gamma_E(l))E_{j,j-l}^*E_{j+k,j+k-l}^* = 0$, we get
$$\sum_{A_1}\left|E\left[(E_0E_l - \gamma_E(l))\left(E_jE_{j+k} - \gamma_E(k)\right)\right]\right| \le c_6h\sum_{l=0}^{h}\sum_{j>l}(j-l+1)^{-\alpha} \le c_7h, \qquad (3.1.27)$$
and
$$E|E_0E_j(E_l - E_{l,l-j}^*)E_{j+k,j+k-l}^*| \le c_8(l-j+1)^{-\alpha}.$$
For all $(j,k,l)\in A_2$, $EE_0E_jE_{l,l-j}^*E_{j+k,j+k-l}^* = E[E_0E_j]E[E_{l,l-j}^*E_{j+k,j+k-l}^*] = \gamma_j\gamma_{j+k-l}$, since we can assume in the definitions of $E_{l,l-j}^*$ and $E_{j+k,j+k-l}^*$ that we use the same $\{\eta_n^*, -\infty < n < l\}$. Hence
$$\sum_{A_2}\left|E\left[(E_0E_l - \gamma_E(l))\left(E_jE_{j+k} - \gamma_E(k)\right)\right]\right| \le c_9h. \qquad (3.1.28)$$
the kernel as well as the autocorrelations of the errors that must be estimated. It is recommended in Andrews (1991) that $B$ be approximated using a parametric model for the errors, for instance an autoregressive model. When an autoregressive process of order one is used in this step, and the kernel is of order $q = 1$, the optimal bandwidth may be estimated with
$$\hat h = B^*\left[\hat\alpha(1)N\right]^{1/3}, \quad \text{where } \hat\alpha(1) = \frac{4\hat\rho^2}{(1-\hat\rho^2)^2}. \qquad (3.1.30)$$
estimators as in (3.1.18). It may be shown under the AMOC model (1.1.1) and Assumption 2.1.1 that
$$\frac{\hat\gamma_j - \gamma_j}{\theta(1-\theta)\Delta_N^2} \xrightarrow{P} 1, \quad \text{as } N\to\infty, \qquad (3.1.31)$$
where $\Delta_N = \mu_1 - \mu_A$, the difference between the means before and after the change. It follows from (3.1.31), and Assumptions 3.1.4 and 3.1.5, that
$$\hat\sigma_N^2 \xrightarrow{P} \infty$$
if $\lim_{N\to\infty}|\Delta_N| > 0$, where $\hat\sigma_N^2$ is defined in (3.1.18). Hence if the possible change is not taken into account, the power of the various tests introduced in Chap. 2 will be reduced. One can show in fact that
$$\frac{\hat\sigma_N^2}{h\theta(1-\theta)\Delta_N^2} \xrightarrow{P} \int_{-c}^{c}K(u)\,du.$$
In other words, $\hat\sigma_N$ increases approximately linearly with $|\Delta_N|$. As a result, when CUSUM statistics are standardized by $\hat\sigma_N$, it can be more difficult to detect larger changes than smaller ones. This issue is sometimes referred to as the "non-monotonic power problem".
The methods used to modify the sample variance in (3.1.7) and (3.1.10)–(3.1.14) can also be applied in the setting of long-run variance estimation. Recall the notation under (2.3.1), where $k_j^*$, $j\in\{1,\dots,R\}$, denote change points in the mean, satisfying $1 = k_0^* < k_1^* < \cdots < k_R^* < k_{R+1}^* = N+1$. Letting $1 < k_1 < \cdots < k_S < N$ denote candidate values for these changes, we compute the segment means
$$\bar X_{k_i} = \frac{1}{k_i - k_{i-1}}\sum_{j=k_{i-1}+1}^{k_i}X_j, \quad 1 \le i \le S+1,$$
with $k_0 = 0$ and $k_{S+1} = N$, as before. We modify the estimators for $\gamma_E(l)$ by centering each observation at its segment mean,
$$\bar E_j = X_j - \bar X_{k_i}, \quad \text{if } k_{i-1} < j \le k_i. \qquad (3.1.32)$$
Then define
$$\tilde\gamma_l = \begin{cases}\dfrac{1}{N-l}\displaystyle\sum_{i=1}^{N-l}\bar E_i\bar E_{i+l}, & \text{if } 0 \le l < N,\\[2ex] \dfrac{1}{N-|l|}\displaystyle\sum_{i=1-l}^{N}\bar E_i\bar E_{i+l}, & \text{if } -N < l < 0,\end{cases}$$
and set
$$\hat\sigma_N^2(k_1, \dots, k_S) = \sum_{l=-(N-1)}^{N-1}K\left(\frac{l}{h}\right)\tilde\gamma_l.$$
Minimizing over the candidate change points yields $\tilde\sigma_{N,R^*}^2$, where $S$ is a user-selected upper bound for the number of changes. Similarly, using estimates $1 < \hat k_1 < \hat k_2 < \cdots < \hat k_{\hat R} < N$ of the locations and number of the changes, we define
$$\bar\sigma_{N,\hat R}^2 = \hat\sigma_N^2(\hat k_1, \hat k_2, \dots, \hat k_{\hat R}). \qquad (3.1.33)$$
Along the lines of the proof of Theorem 3.1.3, one can show that
$$\tilde\sigma_{N,R^*}^2 \xrightarrow{P} \sigma^2.$$
If $\hat R$ and $\hat\theta_{i,N} = \hat k_i/N$ are asymptotically consistent estimators for $R$ and $\theta_i$, it may also be verified that
$$\bar\sigma_{N,\hat R}^2 \xrightarrow{P} \sigma^2.$$
These estimators for the long-run variance can be extended to vector valued observations. The empirical autocovariance matrices are defined as
$$\hat{\boldsymbol{\gamma}}_l = \begin{cases}\dfrac{1}{N-l}\displaystyle\sum_{i=1}^{N-l}(\mathbf{X}_i - \bar{\mathbf{X}}_N)(\mathbf{X}_{i+l} - \bar{\mathbf{X}}_N)^T, & \text{if } 0 \le l < N,\\[2ex] \dfrac{1}{N-|l|}\displaystyle\sum_{i=1-l}^{N}(\mathbf{X}_i - \bar{\mathbf{X}}_N)(\mathbf{X}_{i+l} - \bar{\mathbf{X}}_N)^T, & \text{if } -N < l < 0,\end{cases} \qquad (3.1.34)$$
where, as before,
$$\bar{\mathbf{X}}_N = \frac{1}{N}\sum_{i=1}^{N}\mathbf{X}_i.$$
The kernel estimator of the long-run covariance matrix is then
$$\hat{\boldsymbol{\Sigma}}_N = \sum_{l=1-N}^{N-1}K\left(\frac{l}{h}\right)\hat{\boldsymbol{\gamma}}_l. \qquad (3.1.35)$$
l=1−N
( || ||ν )1/ν
vm = E ||E i − E ∗i,m ||
. ≤ cm−α with some c > 0 and α > 2,
where .|| · || is the Euclidean norm in .Rd , .E ∗i,m = g(ηi , . . . , ηi−m+1 , ηi−m
∗ ,
∗ ∗
ηi−m−1 , . . .), where .{ηk , k ∈ Z} are independent, identically distributed copies
of .η0 , independent of .{ηj , j ∈ Z}.
Theorem 3.1.5 If $H_0$ of (1.3.1) and Assumptions 3.1.4–3.1.5 hold, and $\{\mathbf{X}_i, i\in\mathbb{Z}\}$ is $L^{\nu}$-decomposable for $\nu \ge 4$, then
$$\hat{\boldsymbol{\Sigma}}_N \xrightarrow{P} \boldsymbol{\Sigma}.$$
Similarly to the univariate case, one can show that if a change occurs in the multivariate mean, then $\hat{\boldsymbol{\Sigma}}_N$ has a bias that diverges as a function of the sample size. One can show under the AMOC model, where the time of change is $k^* = \lfloor N\theta\rfloor$ as in Assumption 2.1.1, that
$$\frac{\hat{\boldsymbol{\Sigma}}_N}{h} \xrightarrow{P} \theta(1-\theta)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_A)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_A)^T\int_{-c}^{c}K(u)\,du, \qquad (3.1.36)$$
where $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_A$ are the mean vectors before and after the change. Hence
$$\|\hat{\boldsymbol{\Sigma}}_N^{-1}\| \xrightarrow{P} 0.$$
The result in (3.1.36) provides the exact rate for $\hat{\boldsymbol{\Sigma}}_N^{-1}$; namely, $\hat{\boldsymbol{\Sigma}}_N^{-1}$ is asymptotically of the order $1/h$. Since functionals of $Q_N(\cdot)$ converge in probability to infinity faster than $h$, one can prove that the statistics in (1.3.8) and Theorem 1.3.1 remain consistent under the AMOC model with $\boldsymbol{\Sigma}$ replaced by its estimator $\hat{\boldsymbol{\Sigma}}_N$. According to (3.1.36), the largest empirical eigenvalue of $\hat{\boldsymbol{\Sigma}}_N$ tends to infinity with the sample size at the same rate as $h$. Hence all statistics in Theorems 1.3.5 and 1.3.6 are also consistent under the AMOC model.
We can modify the definition of $\hat{\boldsymbol{\Sigma}}_N$ as in the scalar case to account for $S$ possible changes in the mean. If $1 < k_1 < \cdots < k_S < N$ denote the possible times of changes in the mean vector, we define
$$\bar{\mathbf{X}}_{k_i} = \frac{1}{k_i - k_{i-1}}\sum_{j=k_{i-1}+1}^{k_i}\mathbf{X}_j, \quad i \in \{1,\dots,S+1\},$$
where $k_0 = 0$ and $k_{S+1} = N$, as before. The estimator for the covariance matrix at lag $l$ is defined through
$$\bar{\mathbf{E}}_j = \mathbf{X}_j - \bar{\mathbf{X}}_{k_i}, \quad \text{if } k_{i-1} < j \le k_i.$$
3.1 Estimation of Long–Run Variances and Covariance Matrices 109
Let
⎧
⎪ N −l
⎪
⎪ 1 E
⎪
⎪ Ē i Ē T
i+l , if 0 ≤ l < N,
⎨N −l
i=1
γ̃ l =
.
⎪
⎪ 1 E
N
⎪
⎪ Ē i Ē T
⎪
⎩ N − |l| i+l , if − N < l < 0,
i=−(l−1)
which is implicitly a function of the candidate change points .1 < k1 < k2 < . . . <
kS < N. Now we define the estimator for .E from the centered observations with
E
N −1 ⎛ ⎞
l
Ê N (k1 , . . . , kS ) =
. K γ̃ l .
h
l=1−N
2 , we define
Similarly to .σ̃N,S
where
and S is a user specified upper bound for the number of changes. As in defining
σ̄ 2 we may define
.
N,R̂
where .k̂1 , . . . , k̂R̂ are the estimated times of the changes. It can be shown if the true
number of changes is less than .R ∗ , then
P
Ẽ N,R ∗ → E.
.
Also, if the estimators for the number and times of changes are consistent, then
P
Ē N,R̂ → E.
.
It is known that in some cases the estimators of .σ 2 of (3.1.1) converge to the true
long–run variance parameter slowly, and moreover are often sensitive to the choice
110 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
where
| k |
|E k E ||
N
(1) |
.T =| Xl − Xl |
k,10 | N |
l=1 l=1
and
| i | | N |
|E i E ||
k |E N −i E
N |
(2) | | |
.T = max | Xl − Xl | + max | Xl − Xl | .
k,10 1≤i≤k | k | k+1≤i<N | N −k |
l=1 l=1 l=i+1 l=k+1
Similarly we define
E
N −1 (1)
Tk,11
TN,11 =
.
(2)
,
k=1 Tk,11
where
⎛ k ⎞2
(1)
E k E
N
.T
k,11 = Xl − Xl
N
l=1 l=1
and
⎛ i ⎞2 −1
⎛ ⎞2
(2)
E
k E i E
k E
N E
N
N −i E
N
.T = Xl − Xl + Xl − Xl .
k,11
k N −k
i=1 l=1 l=1 i=k+1 l=i+1 l=k+1
We note that .TN,10 and .TN,11 do not depend on the unknown mean under the
null hypothesis. It follows from Assumption 1.1.1 that under the no change null
hypothesis
LN
E t⎦
D [0,1]
N −1/2
. (Xl − μ) −→ σ W (t), (3.1.38)
l=1
where .{W (t), 0 ≤ t ≤ 1} denotes a Wiener process. Using (3.1.38) we get that
{ } D2 [0,1] { }
. N −1/2 TLN
(1)
t⎦,10 , N −1/2 (2)
T LN t⎦,10 , 0 ≤ t ≤ 1 −→ σ L (1)
t,10 , σ L (2)
t,10 , 0 ≤ t ≤ 1 ,
3.1 Estimation of Long–Run Variances and Covariance Matrices 111
where
(1)
Lt,10 = |W (t) − tW (1)|,
.
and
| | | |
| | | 1−u |
.L
(2) | | |
= sup |W (u) − uW (t)|+ sup |W (1) − W (u) − (W (1) − W (t))|| .
t,10
0<u≤t t≤u<1 1 − t
Hence, again under Assumption 1.1.1 and the null hypothesis we get that
D L(1)
t,10
TN,10 → sup
.
(2)
. (3.1.39)
0<t<1 Lt,10
To prove that .TN,10 is asymptotically consistent under the one change in the mean
alternative of (1.1.2), we note that
Tk(1)
∗ ,10
TN,10 ≥
.
(2)
.
Tk ∗ ,10
1 P
. T (1)
∗ → θ (1 − θ ),
N |ΔN | k ,10
1/2
(1)
1 Tk ∗ ,10 D θ (1 − θ )
. → .
N |ΔN | T (2)
1/2 (2)
σ Tθ,10
k ∗ ,10
where
/ t
(1)
.L
t,11 = (W (u) − uW (1))2 du
0
112 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
and
/ t⎛ ⎞2
(2) u
.L
t,11 = W (u) − W (1) du
0 t
/ 1⎛ ⎞2
1−u
+ W (1) − W (u) − (W (1) − W (t)) du.
t 1−t
/ 1 (1)
1 D Lt,11
. TN,11 → (2)
dt. (3.1.40)
N 0 Lt,11
1 P
. TN,11 → ∞,
N
H0 : σ02 = σA2
. (3.2.2)
HA : σ02 /= σA2 .
. (3.2.3)
3.2 Changes in Variances and Covariances 113
In order that the parameters .μ, .σ0 and .σA are uniquely identified, we make the
following assumption.
1 E
N
. X̂N = Xi .
N
i=1
Further, if .H0 holds the average of the centered and squared observations
is the natural estimator for the variance parameter .σ02 . The CUSUM process for the
variance is
⎛ ⎞
LN
E t⎦ EN
−1/2 ⎝ LNt⎦
.ZN (t) = N σ̂i2 − σ̂i2 ⎠ .
N
i=1 i=1
Theorem 3.2.1 We assume that .H0 of (3.2.1) is satisfied along with Assump-
tions 1.2.1 and 3.2.1, and that the series .{Ei i ∈ Z} in (3.2.1) is .Lν –decomposable
for some .ν ≥ 4.
(i) If .I (w, c) < ∞ for some .c > 0, where .I (w, c) is defined in (1.2.4), then
for all .x ∈ R.
114 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
we get that
E
k
k E 2 E
N k
k E
N
. σ̂i2 − σ̂i = (Xi − μ)2 − (Xi − μ)2 − 2Rk , (3.2.5)
N N
i=1 i=1 i=1 i=1
where
┌ k ┐
E k E
N
Rk =
. Xi − Xi (X̂N − μ). (3.2.6)
N
i=1 i=1
which when combined with (3.2.7) imply that the remainder term .Rk does not
influence the limit distribution of functionals of the weighted CUSUM of .σ̂i2 ’s.
Since .Xi being .Lν –decomposable implies that .(Xi − μ)2 is .Lν/2 –decomposable,
the result follows from Theorem 1.2.5 of Chap. 1. The proof of this result is omitted
since it follows from Theorem 2.1.1. ⨆
⨅
We may also obtain similar results to those in Sects. 2.1 and 2.2 regarding the
asymptotic properties of the CUSUM process for the variance under .HA in (3.2.3).
As in Theorem 2.2.1, we denote .k ∗ /N = θN and let .σA2 = σA,N 2 depend on the
sample size in order to study local alternatives where .σA,N converge to .σ02 , and
2
.θN → 0 or .θN → 1.
3.2 Changes in Variances and Covariances 115
Theorem 3.2.2 We assume that .HA of (3.2.3) is satisfied along with Assump-
tions 1.2.1 and 3.2.1, and that the series .{Ei i ∈ Z} in (3.2.1) is .Lν –decomposable
for some .ν ≥ 4.
(i) If .0 ≤ κ < 1/2, then
(ii) Also,
An estimator for the time of change in the variance in the AMOC in the variance
model is defined as
⎧⎛ ⎞κ ||E |⎫
k E 2 ||
k N
N |
.k̂N = k̂N (κ) = sargmax σ̂i −
2
| σ̂i | , (3.2.10)
k∈{1,...,N } k(N − k) | N |
i=1 i=1
where .0 ≤ κ ≤ 1/2. The following result is analogous to Theorem 2.2.1. Its proof
follows using the decomposition of the variance CUSUM process in (3.2.5), and
with minor modifications of Theorem 2.2.1.
Theorem 3.2.3 We assume that .HA of (3.2.3) is satisfied along with Assump-
tions 1.2.1 and 3.2.1, and that the series .{Ei i ∈ Z} in (3.2.1) is .Lν –decomposable
for some .ν ≥ 4.
(i) If .0 ≤ κ < 1/2,
then
N(σ02 − σA2 )2 D
. (θ̂N − θ ) → ξ(κ).
τ2
116 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
(ii) If .κ = 1/2
N
|σ02 − σA2 | → 0 and
. (σ 2 − σA2 )2 → ∞,
(log log N)1/2 0
then
N(σ02 − σA2 )2 D
. (θ̂N − θ ) → ξ(1/2).
τ2
where .τ 2 , .mκ (t) and .ξ(κ) are defined in (3.2.4), (2.2.1) and (2.2.3).
Remark 3.2.1 Once again the condition that .ΔN = |σ02 − σA2 | → 0 gives that
the limiting distribution of .k̂N does depend on the full joint distribution of the
2
.Ei ’s, but instead only depends on the long–run variance parameter .τ . If instead
|k̂N − k ∗ | = OP (1).
.
Another popular model allows for changes in the mean and variance to occur at the
same time:
⎧
μ0 + σ0 Ei , if 1 ≤ i ≤ k ∗ ,
.Xi = (3.2.11)
μA + σA Ei , if k ∗ + 1 ≤ i ≤ N.
Here we take Assumption 3.2.1 as granted, so that .μ0 , μA , σ0 , and .σA are unknown
parameters representing the means and variances before and after the change
point .k ∗ . The null hypothesis of interest is the stability of the mean and variance
parameters:
H0 : μ0 = μA and σ0 = σA .
. (3.2.12)
HA : μ0 /= μA and/or σ0 /= σA .
. (3.2.13)
3.2 Changes in Variances and Covariances 117
None of the tests discussed up to this point are “universally consistent” against the
alternative of (3.2.13). A natural idea is to combine the outcomes of two tests; one
for the stability for the mean, and another for the stability of the variance. Deriving
joint asymptotics for such tests is exceedingly complicated. An alternate approach is
to test for the change in the mean, and then based on the result of that test perform a
subsequent test for a change in the variance. Our results concerning change point
tests for the mean have assumed so far that the model errors form a stationary
sequence. This assumption may appear to be frequently violated, and evidently does
not hold under model (3.2.13). This motivates studying the behaviour of CUSUM
based tests when the model errors are generally heteroscedastic, which we take up
in Sect. 3.3 below.
Before turning to that situation though, we discuss for a moment the asymptotic
properties of the CUSUM process for the variance when the mean is estimated using
segmentation based on a consistent change point estimator. Such an estimator under
heteroscedasticity of the error process is introduced in Sect. 3.3. Let .k̂ denote an
estimator for the time of change in the mean in model 3.2.11. We segment the
observations into two sub–samples .{X1 , X2 , . . . , Xk̂ } and .{Xk̂+1 , Xk̂+2 , . . . , XN },
and compute the corresponding sample means:
1E E
k̂ N
1
X̂k̂,1 =
. Xi and X̂k̂,2 = Xi .
k̂ i=1 N − k̂
i=k̂+1
Assume for the moment that .k ∗ < k̂. This means that .k̂ − k ∗ observations are
incorrectly centered. If .k ∗ < k ≤ k̂, then using
we get
E
k̂ E
k̂ E
k̂
. (Xi − X̂k̂,1 )2 = (Xi − μA )2 + 2(μA − X̂k̂,1 ) (Xi − μA )
i=k ∗ +1 i=k ∗ +1 i=k ∗ +1
E
k̂
+ (μA − X̂k̂,1 )2 .
i=k ∗ +1
and
| |
| k̂ |
| E |
.| − |
A | = oP (N
1/2
| (Xi μ ) ).
|i=k ∗ +1 |
Using now Theorem 3.2.3 (cf. also Remark 3.2.1) we obtain that
E
k̂ ⎛ ⎞
. (μA − X̂k̂,1 )2 = OP (k̂ − k ∗ )(μ0 − μA )2 .
i=k ∗ +1
These arguments can be adjusted to give the same approximations for .k̂ < k ∗ . Hence
| |
| |
. sup |Z̃N (t) − ZN (t)| = oP (1),
0<t<1
so the estimation of .k ∗ does not change the asymptotic properties of the variance
CUSUM processes. For example, such arguments can be used to establish Theo-
rem 3.2.1 when the process .ZN (t) is replaced with .Z̃N (t). The same reasoning can
be applied when more than one mean change point estimator is used to segment the
mean. We showed in this section how methods to perform change point analysis for
the mean can be modified to detect and estimate changes in the variance. We only
considered AMOC in the variance models, although these methods may be extended
to cover multiple points in the variance models.
3.2 Changes in Variances and Covariances 119
where .EE i = 0 and .EE i E Ti = I. This model allows for AMOC in the covariance
matrix of the vectors, from .E 0 to .E A , at time .k ∗ . We may phrase detecting a change
point under model (3.2.14) as a hypothesis test of .H0 : E 0 = E A , versus HA :
E 0 /= E A .
To construct a test statistic for distinguishing between .H0 and .HA , we let .vech(·)
be the operator that stacks the columns on and below the diagonal of a symmetric
.d × d matrix as a vector in .R , where .d = d(d + 1)/2.
d
When .H0 holds and .μ = 0 in (3.2.14), the expected values of the .d dimensional
vectors .vech(Xj XTj ) are constant for .j ∈ {1, . . . , N }. Consequently, a vector valued
CUSUM process as in Sect. 1.3 can be constructed as
⎛ ⎞
LN t⎦
1 ⎝E ┌ ┐ LNt⎦ EN ┌ ┐
.ZN (t) = √ vech Xj XT
j − vech Xj XT
j
⎠ , 0 ≤ t ≤ 1.
N j =1 N
j =1
Under .H0 and when the errors .{E i , i ∈ Z} have suitably decaying autocovariance
matricies, for instance if they are .Lν –decomposable for some .ν > 4, it may be
shown the long–run covariance
E ⎛ ┌ ┐ ┌ ┐⎞
E=
. Cov vech X0 XT T
0 , vech Xj Xj
j ∈Z
120 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
We may obtain the following result by applying the results in Sect. 1.3, in
particular (1.3.8). Note that under .H0 and when the errors .{E i , i ∈ Z} are .Lν –
decomposable in .Rd for some .ν > 4, then the vectors .Yj = vech(Xj XT j ) in .R are
d
.L
ν/2 –decomposable in .Rd .
Theorem 3.2.4 Suppose .H0 holds and that the errors in model (3.2.14), .{E i , i ∈ Z}
are .Lν –decomposable for some .ν > 4. If .E is non-singular, and .||E − Ê|| = oP (1),
then
D E
d
ΛN −→ sup
. Bl2 (t), as N → ∞,
0≤t≤1 l=1
and
d /
E 1
D
ΩN −→
. Bl2 (t)dt as N → ∞.
l=1 0
The exact distribution of the limiting random variables in Theorem 3.2.4 are
computed in Kiefer (1959).
The location of the change point in (3.2.14) may be estimated with
In studying CUSUM–based methods to perform change point analysis for the mean
in Chap. 2, we assumed that the model errors in (1.1.1) were strictly stationary.
In many cases though we are interested in conducting change point analysis for
the mean of a series that appears to display other deviations from stationarity,
for instance changes in the variance. In this subsection we study the asymptotic
behaviour of CUSUM–based tests for changes in the mean for series exhibiting
heteroscedasticity.
3.3 Heteroscedastic Errors 121
Specifically we consider the model in (1.1.1), but with errors that do not
necessarily have homogenous variance:
Assumption 3.3.1 .Ei = a(i/N)ei , i ∈ {1, ..., N}, with .Eei = 0, and .Eei2 = 1.
The function a we take to satisfy the following assumption.
E
R+1
a(t) =
. rl 1{θl−1 < t ≤ θl },
l=1
with .0 = θ0 < θ1 < θ2 < . . . < θR < θR+1 = 1 and .r1 /= r2 /= . . . /= rR+1 . In
this model the variance jumps from .rl2 to .rl+1
2 at time .LNθl ⎦.
(ii) (polynomially evolving variances) in this case .a 2 (x) is a non negative polyno-
mial, which includes linearly and quadratically changing variances.
We assume that the innovation sequence .{ei , i ∈ Z}’s satisfies the functional central
limit theorem:
Assumption 3.3.3 There is .σ > 0 such that
LN
E t⎦
D [0,1]
N −1/2
. ei −→ σ W (t),
i=1
LN
E t⎦
D [0,1]
N −1/2
. Ei −→ W (b(t)),
i=1
122 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
Proof Let
E
k
S(k) =
. ei and S(0) = 0.
i=1
E
k E
k
. Ei = a(i/N)ei = a(k/N )S(k)
i=1 i=1
E
k−1
− S(i) (a((i + 1)/N) − a(i/N)) , 1 ≤ k ≤ N.
i=1
E
N −1
= oP (N 1/2 ) sup |a(t)| + oP (N 1/2 ) |a((l + 1)/N) − a(l/N)|
0≤t≤1 l=1
= oP (N 1/2 ).
3.3 Heteroscedastic Errors 123
By the Jordan decomposition theorem (see Hewitt and Stromberg, 1969, p. 266),
there are two non-decreasing functions such that .a(t) = a1 (t) − a2 (t). Focusing on
the function .a1 (t) we have
E
k−1
. WN (l)(a1 ((l + 1)/N) − a1 (l/N))
l=1
E
k−1 / l+1
= WN (l) da1 (x/N)
l=1 l
/ k k−1 /
E l+1
= WN (x)da1 (x/N) + (WN (l) − WN (x))da1 (x/N).
0 l=1 l
By the modulus of continuity of the Wiener process (see Appendix A.2) we have
that
and therefore
| / k |
| E
k−1 |
| |
. |a1 (k/N)WN (k)− WN (l)(a1 ((l + 1)/N)−a1 (l/N))− a1 (x/N)dWN (x)|
| 0 |
l=1
= OP ((log N ) 1/2
).
Similarly,
| / k |
| E
k−1 |
| |
. |a2 (k/N)WN (k)− WN (l)(a2 ((l + 1)/N)−a2 (l/N))− a2 (x/N)dWN (x)|
| 0 |
l=1
= OP ((log N)1/2 ),
resulting in
| k / k |
|E |
| |
. max | El − σ a(x/N)dWN (x)| = oP (N 1/2 ).
1≤k≤N | 0 |
l=1
124 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
Let
/ t
UN (t) =
. a(x/N)dWN (x), 0 ≤ t ≤ N.
0
= OP ((log N) 1/2
),
since
/ k+s
. max sup a 2 (x/N)dx ≤ 4(a12 (1) + a22 (1)).
0≤k≤N −1 0≤s≤1 k
D [0,1]
ZN (t) −→ σ ┌(t) with ┌(t) = W (b(t)) − tW (b(1)),
.
The covariance function .C(t, s) evidently differs from the covariance function
of the Brownian bridge so long as b is not equal to a constant multiple of
the identity function, which upon inspecting (3.3.1) occurs if a is non-constant.
The approximation of the distribution function of .sup0<t<1 |┌(t)|, which requires
estimating .C, is a difficult problem that we will turn to momentarily. Before doing
so we note that approximations as in Corollary 3.3.1 can also be established for
weighted functionals of .ZN . As we have seen in Sect. 1.2, in this case we need a
rate of approximation of the partial sums of the .ei ’s by a Gaussian process.
Assumption 3.3.4 For each N there are two independent Wiener processes
{WN,1 (t), 0 ≤ t ≤ N/2}, .{WN,2 (t), 0 ≤ t ≤ N/2}, .σ > 0 and .ζ < 1/2 such
.
that
| k |
|E |
−ζ | |
. sup k | ei − σ WN,1 (k)| = OP (1)
1≤k≤N/2 | |
i=1
and
| N |
| E |
−ζ | |
. sup (N − k) | ei − σ WN,2 (N − k)| = OP (1).
N/2<k<N | |
i=k+1
Theorem 3.3.2 We assume that .H0 of (1.1.2), Assumptions 1.2.1, 3.3.1, 3.3.2
and 3.3.4 are satisfied and
(ii) If .p ≥ 1 and
/ 1 [t (1 − t)]p/2
. < ∞,
0 w(t)
then
/ /
1 |ZN (N t/(N + 1))|p D
1 |┌(t)|p
. dt → dt,
0 w(t) 0 w(t)
Proof The result can be proven by combining the methods of the proofs of
Theorems 1.2.2, 1.2.4 and 3.3.1. ⨅
⨆
We note that (3.3.3) holds in the important case when the variance exhibits
change points, or in other words is piecewise constant.
Self–normalized Darling–Erdős type result as in Theorem 1.2.5 can also be
derived for such heteroscedastic processes. However, these results are difficult to
apply since they depend on estimating .b(t) and its limiting behaviour near 0 and 1.
A similar problem is discussed in Sect. 4.3 in the context of regression models.
We now illustrate how to use Hilbert space techniques to approximate the
/1
critical values for the weighted Cramér–von Mises statistics . 0 ┌ 2 (t)/w(t)dt. The
Karhunen–Loéve expansion, see e.g. pg. 188 of Hsing and Eubank (2015), yields
that
/ 1 ∞E
┌ 2 (t)
. dt = λi Ni2 ,
0 w(t)
i=1
where .N1 , N2 , . . . are independent and identically distributed standard normal ran-
dom variables, and .λ1 ≥ λ2 ≥ . . . are the eigenvalues of the kernel integral operator
associated with the kernel .C(t, s)/[w(t)w(s)] defined in (3.3.2). Specifically,
/ 1 C(t, s)
λi φi (t) =
. φi (s)ds, 1 ≤ i < ∞. (3.3.4)
0 w(t)w(s)
The eigenvalues .λ1 ≥ λ2 ≥ . . ., are unknown but we can estimate them from the
sample. The first step is the estimation of the covariance function .C. We will define
an estimator .ĈN below that is .L2 consistent, such that
⎛ ⎞2
/ 1/ 1 ĈN (t, s) − C(t, s)
. dtds = oP (1). (3.3.5)
0 0 w(t)w(s)
∞
E E
d
. λi Ni2 ≈ λ̂i,N Ni2 (3.3.7)
i=1 i=1
to obtain approximate critical values of the Cramér–von Mises statistics. The choice
of d is similar to choice of the number of the eigenvectors (or eigenfunctions) in
principal component analysis. Large d reduces the empirical bias but introduces
higher variance in (3.3.7). We also note an upper bound for the rate of convergence
in (3.3.6) is the order of convergence in (3.3.5).
Now we discuss an estimator for .C satisfying (3.3.5). According to the definition
of .C this requires estimating .b(·) defined in (3.3.1). The proofs of the results are
based on approximating moments of sums, for which we make use of the .Lν –
decomposability of the .ei ’s.
To illustrate the method, first we assume that the errors are uncorrelated, so that
Eej el = 0, if j /= l.
. (3.3.8)
Let
LN t⎦
1 E 1 E
N
.b̂N (t) = (Xi − X̄N )2 , where X̄N = Xi . (3.3.9)
N N
i=1 i=1
Theorem 3.3.3 If .H0 of (1.1.2), Assumptions 1.2.1, 3.3.1, 3.3.2, and (3.3.8) hold,
and the innovations .{ei , i ∈ Z} are .Lν –decomposable for some .ν ≥ 4, then
Proof Since the function .b(·) is continuous and monotone on .[0, 1], it is enough to
show that for every .0 ≤ t ≤ 1,
P
b̂N (t) → b(t).
.
We write
LN t⎦
1 E 2 LN t⎦
b̂N (t) =
. a (i/N)ei2 − (X̄N − μ)2
N N
i=1
and
LN t⎦
1 E
X̄N − μ =
. a(i/N)ei .
N
i=1
128 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
Since .a(·) is bounded, we get due to the .Lν –decomposability of the errors that
⎛ ⎞2
LN
E t⎦ ⎛ ⎞
1 1
.E ⎝ a(i/N)ei ⎠ = O ,
N N
i=1
|X̄N − μ| = oP (1).
.
Also, using .Lν –decomposability, it follows that the series .{ei2 − Eei2 , i ∈ Z} is
.L
ν/2 –decomposable, from which it follows using Theorem A.3.1 that
| |ν
|LN |
|E t⎦
|
.E | 2 2 2 |
a (i/N)(ei − σ )| ≤ c1 N ν/2 .
|
| i=1 |
Therefore
LN t⎦
1 E 2
. a (i/N)(ei2 − σ 2 ) = oP (1).
N
i=1
Since
LN t⎦ / t
1 E 2 2
. σ a (i/N) → σ 2
a 2 (t)dt,
N 0
i=1
ĈN (t, s) = b̂N (min(t, s)) − t b̂N (s) − s b̂N (t) + ts b̂N (1)
.
correlations of the first .LNt⎦ observations to build our estimator in the general case.
For any .1 ≤ k ≤ N we define for .0 ≤ l ≤ k
⎧
⎪
⎪ 1 E
k−l
⎪
⎪ (Xi − X̄N )(Xi+l − X̄N ), 0 ≤ l ≤ k − 1,
⎪
⎨N
i=1
.γ̃k,l =
⎪ 1 E
k
⎪
⎪
⎪
⎪ (Xi − X̄N )(Xi+l − X̄N ), −(k − 1) ≤ l < 0.
⎩N
i=−(l−1)
3.3 Heteroscedastic Errors 129
The estimator for .b(·) is the long–run variance estimator computed from
{X1 , . . . , XLN t⎦ }. We note that .{Xi , 1 ≤ i ≤ N } is not necessarily a stationary
.
sequence, althoughΣwe still think of the “long–run variance” as the limit of the
variance of .N −1/2 N i=1 Xi . We let
LNE
t⎦−1 ⎛ ⎞
l
b̃N (t) =
. K γ̃LN t⎦,l , (3.3.10)
h
l=−(LN t⎦−1)
The estimator for .C(t, s) is defined as before but we use .b̃N (t) instead of .b̂N (t).
It follows from Theorem 3.3.4 that
ĈN (t, s) = b̃N (min(t, s)) − t b̃N (s) − s b̃N (t) + ts b̃N (1)
. (3.3.11)
means occur and the size of the change. The proof of the following result is similar
to that of Theorem 2.1.1.
Theorem 3.3.5 We assume that .HA of (1.1.3), Assumptions 3.3.1, 3.3.2, 3.3.4 are
satisfied and
if and only if
We note that Górecki et al. (2018) discusses the consistency of the CUSUM
method with heteroscedastic data for other potential alternatives, including multiple
changes in the mean and a polynomially increasing mean after the change.
Limit theorems for the location of the time of change estimator are also affected
by heteroscedasticity in the data. We recall
| k |
|E k E ||
N
1 |
.k̂N = k̂N (κ) = sargmax Xi −
κ |
Xi | .
k∈{1,...,N −1} [k(N − k)] | N
i=1
|
i=1
The asymptotic properties of .k̂N are similar to those presented in Theorem 2.2.1,
although a possible jump in the variance of the model errors introduces some
differences. Let .a(θ −) = limx↓θ a(x), and .a(θ +) = limx↑θ a(x). Define
⎧⎛ ⎞1/2
⎪
⎪ 2a 2 (θ −)
⎪
⎨ 2 W1 (−t), if t < 0,
∗
.W (t) =
a (θ −) + a 2 (θ )
⎛ ⎞1/2 (3.3.12)
⎪
⎪ 2a 2 (θ )
⎪
⎩ 2 W2 (t), if t ≥ 0,
a (θ −) + a 2 (θ )
where .{W1 (t), t ≥ 0} and .{W2 (t), t ≥ 0} are independent Wiener processes. If
a(x) is continuous at .θ , then .W ∗ is a standard two sided Wiener process as in
.
then
2Δ2N ⎛ ⎞ D
∗
. k̂ N − k → ξ ∗ (κ).
a 2 (θ −) + a 2 (θ )
Proof We follow the proof of Theorem 2.2.1, but now we assume that .κ = 0.
Repeating the calculations we obtain that the limit distribution of .k̂N is determined
by
⎛ ⎞
E
k k∗
E
k ∗ (N − k ∗ )
Qk,5
. =2 ΔN ⎝ Ei − Ei ⎠
N
i=1 i=1
3.3 Heteroscedastic Errors 131
and
⎛⎛ ⎞2
k(N − k ∗ )
Qk,6 =
. 1{1 ≤ k ≤ k ∗ }
N
⎛ ⎞2 ⎛ ⎞2 ⎞
k ∗ (N − k) k ∗ (N − k ∗ )
+ 1{k ∗ < k ≤ N} − Δ2N
N N
on the set .|k − k ∗ | ≤ A/Δ2N , where A is an arbitrary positive scalar. It follows that
for .C > 0, and .c > 0,
| |
|1 |
. sup || Qk∗+cs/Δ2 ,6 + 2cθ (1 − θ )|s|m0 (s)|| = o(1). (3.3.13)
−C≤s≤C N N
and for .s ≥ 0
k ∗ +cs/Δ2N
1 k ∗ (N − k ∗ ) E
. Qk ∗ +cs/Δ2 ,5 = −2 ΔN Ei .
N N N
i=k ∗ +1
Now we use Theorem A.1.1 to define two independent Wiener processes .WN,1 and
WN,2 such that for all .A > 0
.
| |
| k∗
|
| E |
| |
. sup |ΔN ei − ΔN WN,1 (−s)| = oP (1) (3.3.14)
|
−A≤s≤0 | |
i=k ∗ +s/Δ2N +1 |
and
| |
| k ∗ +s/Δ2N |
| E |
. sup | ei − WN,2 (s)|| = oP (1). (3.3.15)
| ΔN
0≤s≤A | i=k ∗ +1 |
132 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
Using Abel’s summation formula, as in the proof of Theorem 3.3.1, (3.3.14) and
(3.3.15) imply that for the sums of the .Ei ’s:
| |
| k∗ / k∗ |
| E |
| |
. sup |ΔN Ei − a(x/N)dWN,1 (xΔ2N )| = oP (1)
|
−A≤s≤0 | ∗
k +s/ΔN +1
2 |
i=k ∗ +s/Δ2N +1 |
(3.3.16)
and
| |
| k ∗ +s/Δ2N / k ∗ +s/Δ2 |
| E N |
. sup | Ei − 2 |
a(x/N)dWN,2 (xΔN )| = oP (1). (3.3.17)
| ΔN
0≤s≤A | i=k ∗ +1 k∗ |
The processes
/ k∗
UN,1 (s) =
. a(x/N)dWN,1 (xΔ2N )
k ∗ +s/Δ2N +1
and
/ k ∗ +s/Δ2N
UN,2 (s) =
. a(x/N)dWN,2 (xΔ2N )
k∗
and similarly
| |
| |
. sup |EUN,2 (s)UN,2 (s ' ) − a 2 (θ +) min(s, s ' )| = o(1).
0≤s,s ' ≤A
1 D [−C,C]
. Qk ∗ +cs/Δ2 ,5 −→ 2θ (1 − θ )c1/2 W̃ (s), (3.3.18)
N N
where
⎧
a(θ −)W1 (−t), if t < 0,
W̃ (t) =
. (3.3.19)
a(θ )W2 (t), if t ≥ 0,
3.3 Heteroscedastic Errors 133
Taking
a 2 (θ −) + a 2 (θ )
c=
.
2
leads to .{c1/2 W̃ (t), −∞ < t < ∞} having the same distribution as .{W ∗ (t), −∞ <
t < ∞}. For any .C > 0 we define
| k |
|E k E ||
N
|
.k̂N (C) = sargmax | Xi − Xi | .
{k : |k ∗ −k|≤C} |
i=1
N
i=1
|
Since .{W ∗ (s), −∞ < s < ∞} and .{W ∗ (s), −∞ < s < ∞} have the same distribu-
tion, (3.3.13) and (3.3.18) imply
2Δ2N ⎛ ⎞ D
∗
. k̂ N (C) − k → argmax|s|≤C {W ∗ (s) − |s|m0 (s)}.
a 2 (θ −) + a 2 (θ )
Observing that
We define
{ }
ξ̄Δ (κ) = argmaxl ΔS̄(l) − Δ2 |l|mκ (l) ,
. (3.3.20)
where .mκ (t) is defined in (2.2.1). The proof of the below result is similar to the
above and Theorem 2.2.2.
Theorem 3.3.7 If .HA of (1.1.3), Assumptions 2.1.1, 3.3.1, and 3.3.2 hold, and the
errors .{ei , i ∈ Z} are .Lν −decomposable for some .ν > 4, and .0 ≤ κ ≤ 1/2.
. lim ΔN = Δ /= 0.
N →∞
then
D
. k̂N − k ∗ → ξ̄Δ (κ),
Example 3.4.1 (River Nile Data Revisited) In this example we revisit the change
point analysis of the river Nile flow series considered in Example 2.5.1. In particular,
we turn our attention to investigating the effect of using different long–run variance
estimators in order to compute the critical levels for the maximized CUSUM process
considered.
A plot of the empirical autocorrelation function (ACF) of the river Nile flow
series is given in the left–hand panel of Fig. 3.1. It is apparent that the magnitude
of the autocorrelation observed in the series is significantly larger than what one
would expect for a sequence of independent and identically distributed variables.
This might be attributed to either the presences of genuine autocorrleation in the
sequence, changes in the mean of the series that are not taken into account when
the ACF is estimated, or both. If autocorrelation is present in the sequence, then
we may estimate the parameter σ 2 using the kernel–bandwidth LRV estimator
in (3.1.18). With the aim of producing such an estimator based on the Bartlett
kernel, we computed the automatic bandwidth parameter based on fitting an
autoregressive process of order one to the series as defined in (3.1.30), giving
ĥ ≈ 6.5. The automatic bandwidth computed as detailed in Newey and West
(1987) was similar. The resulting long–run variance estimator with this bandwidth
2
was σ̂N,LRV = 86537.37. This value is nearly four times larger than the sample
variance σ̂N = 28637.95. Figure 3.2 shows a plot of |QN (t)| with a horizontal
2
black dotted line indicating the 95% quantile of σ̂N sup0≤t≤1 |B(t)|, which also
3.4 Data Examples 135
1.0
1.0
0.8
0.8
0.6
0.6
ACF
0.4
0.4
ACF
0.2
0.2
0.0
0.0
−0.2
−0.2
0 5 10 15 20 0 5 10 15 20
Lag Lag
Fig. 3.1 The left–hand panel show a plot of the ACF of the river Nile flow series. It appears that
the series exhibits significant autocorrelation. This could be attributed to the fact that a potential
mean change in the series is not accounted for in calculating the ACF. The right hand panel shows
the ACF of the river Nile series that was centered using a single change point estimate k̂N (0) as
described in (3.1.12)
appears in Fig. 2.2, and the horizontal blue dotted line shows the 95% quantile of
σ̂N,LRV sup0≤t≤1 |B(t)|. We see that in this case the maximum absolute value of
the CUSUM process exceeds both thresholds, suggesting that our earlier conclusion
that the series contains a change point remains the same when we factor in the
observed autocorrelation in the sequence by estimating σ 2 using a long–run variance
estimator.
Given that a potential change point in the mean of the series would also lead to
large values of the ACF, yet another option is the estimate the variance parameter
σ 2 after centering the data taking into account potential change points as in (3.1.33).
We estimated the location of a single change in the mean using k̂N (0) at the
location k̂N (0) = 28, and then centered the series using the mean estimates before
and after this change as in (3.1.32). The ACF of the resulting series appears in
the right hand panel of Fig. 3.1, which shows that centering the data based on a
change point estimator appears to remove most of the observed autocorrelation.
Estimating the bandwidth parameter in the same way based on the change point–
centered data results in ĥ ≈ 2.5, and an updated long–run variance estimate of
2 = 19020.28. The horizontal red dotted line in Fig. 3.2 shows the 95% quantile
σ̄N,1
of σ̄N,1 sup0≤t≤1 |B(t)|.
In this case the conclusions of the analysis do not change as a result of the
method used to estimate σ 2 . One should generally check whether this is the case
in conducting change point analyses.
136 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
600
process |QN (t)| for the river ^ N,LRV
Nile flow series. The σ
horizontal black dotted line ^N
σ
500
shows the 95% quantile of
σ̂N sup0≤t≤1 |B(t)|, which σN,1
also appears in Fig. 2.2, the
400
horizontal blue dotted line
shows the 95% quantile of
σ̂N,LRV sup0≤t≤1 |B(t)|, and
the horizontal red dotted line
QN(t)
300
shows the 95% quantile of
σ̄N,1 sup0≤t≤1 |B(t)|
200
100
0
8
6
Percent Change Sticky CPI
4
2
0
−2
Fig. 3.3 Plot of the monthly percentage change at an annual rate of the Sticky Consumer Price
Index (CPI) over the period from January, 2010 to March, 2023. A change in the mean level of the
series appears to occur just after a change in the variability of the series
3
|ZN(t)|
3
bN(t)
2
^
2
1
1
0
t t
Fig. 3.4 The left–hand panel shows the unweighted absolute CUSUM process |ZN |, along with
95% and 99% estimated quantiles of the random variable sup0<t<1 |┌(t)| defined in Theorem 3.3.2,
where the covariance kernel of the process ┌ is estimated using b̃N . The right–hand panel shows
the processes b̂N (uncorrelated) and b̃N (correlated) as described in (3.3.9) and (3.3.10)
centered based on a preliminary change point estimator before computing b̂N (t) and
b̃N (t), this appears to reflect the increased variance of the underlying series. We also
noticed that the series has reasonably strong autocorrelation, suggesting that using
b̃N is more appropriate when estimating the limiting distributions in Theorem 3.3.2.
We used b̃N to estimate the covariance kernel in (3.3.11), and subsequently used
138 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
simulation to estimate the quantiles of sup0<t<1 |┌(t)| in Theorem 3.3.2. The 95%
and 99% estimated quantiles of this distribution are also shown in the left–hand
panel of 3.4, which indicate that the change point in the level of the series observed
is highly significant. Binary segmentation applied after segmenting the series based
on this change point estimate suggest that no further change points of significance
can be detected.
Example 3.4.3 (Log–Returns Covariation and Volatility) In this example we
study the log returns of the adjusted closing stock prices between July 19, 1993, and
March 19 , 2009 (N = 3941), of the 12 companies as studied in Aue et al. (2009a).
The 12 companies considered are listed in Table 3.1, and include four companies in
the airline sector, four in the automotive sector and four in the energy sector. The
raw data we consider take the form pj,l , denoting the price of stock l at time j . The
l’th coordinate of Xj is defined as the centered log–return of stock l,
⎛ ⎞ ⎛ ⎞
1 E
3941
pj +1,l pi+1,l
xj,l
. = log − log , j = 1, . . . , 3941; l = 1, . . . , 12.
pj,l 3941 pi,l
i=1
1 E
j
. γ̂j (k, l) = yi,k yi,l , j = 101, . . . , 3941; k, l = 1, . . . , 12.
100
i=j −100+1
0.008
ALK ALK and AMR
AMR ALK and CAL
CAL ALK and LUV
LUV AMR and CAL
AMR and LUV
0.006
CAL and LUV
0.004
0.002
0.000
1994 1996 1998 2000 2002 2004 2006 2008 1994 1996 1998 2000 2002 2004 2006 2008
Fig. 3.5 The volatilities (left) and cross–volatilities (right) of the log–returns from the stocks in
the airline sector. Changes in the level of the cross–volatilities are apparent following 2001 and
2007
For the airline sector, the corresponding rolling averages are shown in Fig. 3.1.
There appear to be several locations at which a change in the covariance of the
returns occurs. To further assess this conjecture, we computed the test statistic value
Ω3941 = 60.07 as described in Sect. 3.2.2, where the long–run covariance matrix
E was computed according to Eq. 3.1.35 with a Bartlett kernel and bandwidth h =
log10 N . Since d = 12, we have d = 78. The approximate 95% null–quantile of the
statistic Ω3941 computed from Theorem 3.2.4 was 12.00. Therefore, there is strong
evidence against the hypothesis that there is no change in the covariance matrix
(Fig. 3.5).
In order to detect multiple changes in the covariance matrix, we applied binary
segmentation using the change point estimator (3.2.15). A summary and findings
of this application of binary segmentation are reported in Table 3.2. A number of
the detected changes can be readily associated with major historical events. For
example, the estimated changes in 2001 may be linked to the bursting of the dot–
com bubble and the September 11 attacks, while the break dates in 1997 and 1998
may be connected to the Asian financial crisis, and the collapse of the hedge fund
Long–Term Capital Management and the Russian financial crisis, respectively. The
detected breaks in 2007 and 2008 can be related to the collapse of the housing
market in the United States and several European countries. The detected change–
point on September 9, 2008 predates the collapse of the investment bank Lehman
Brothers by three trading days.
140 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
3.5 Exercises
Exercise 3.5.1 Show that if the observations {Xi , i ∈ Z} are strictly stationary
with EX0 = 0 and EX04 < ∞, then σ̂N2 (Naïve) defined in (3.1.17) is not a consistent
estimator of the long–run variance of the series.
Exercise 3.5.2 We assume that {Xi , i ∈ Z} is Lν –decomposable for some ν > 4.
Let
1 E
Q(k) = ⎛ ⎞
. (Xi − Xj )2
k
1≤i<j ≤k
2
and define
and 0, if t /∈ [2/N, 1 − 2/N]. Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.3 We assume that X1 , . . . , XN are Lν –decomposable for some ν > 4.
Let
1 E
Q(k) = ⎛ ⎞
. ||Xi − Xj ||2
k
1≤i<j ≤k
2
3.5 Exercises 141
and define
and 0, if t /∈ [2/N, 1 − 2/N]. Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.4 We assume that X1 , . . . , XN are Lν –decomposable for some ν > 4
and EXi = 0. Let
1E
k
.Q(k) = Xi Xi+1 , 1≤k ≤N −1
k
i=1
and define
and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.5 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4. Let
1E 1 E
k N
Q(k) =
. (Xi − X̄N )(Xi+1 − X̄N ), 1 ≤ k ≤ N − 1 with X̄N = Xi
k N
i=1 i=1
and define
and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.6 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let
1E
k
Q(k) =
. Xi Xi+1 , 1≤k ≤N −1
k
i=1
and define
and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t)/[t (1 − t)]γ converges in D[0, 1] for
all 0 ≤ γ < 1/2 and determine the limit.
Exercise 3.5.7 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let
1E
k
.Q(k; l) = Xi Xi+l , 1≤k ≤N −l
k
i=1
and 0, if t /∈ [l/N, 1 − l/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.8 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let
1E T
k
Q(k) =
. Xi Xi+1 , 1≤k ≤N −1
k
i=1
and define
and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.9 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let
1 E || ||
k
|| ||
Q(k) =
. ||Xi XT
i+1 || , 1≤k ≤N −1
k
i=1
and define
and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
3.6 Bibliographic Notes and Remarks 143
Parzen (1957) and Grenander and Rosenblatt (1957) introduced the kernel estimator
for the long–run variance. It was generalized to define long–run covariance matrices
in Newey and West (1987), Andrews (1991), and Andrews and Monahan (1992)
who also discussed the optimal choice of the bandwidth or smoothing parameter.
Sun et al. (2008) obtains optimal windows for robust testing procedures. Politis and
Romano (1995) advocated the flat top kernel as a way to reduce the bias of the kernel
estimator. Liu and Wu (2010) proves the central limit theorem for kernel estimates of
the long–run variance. The idea of ratio statistics appeared in Kim (2000). Horváth
et al. (2008) applied the idea of ratio statistics to find changes in the mean. See also
Pešta and Wendler (2020). Surgailis et al. (2008) used the ratios of second order
increments of strongly dependent observations. Shao (2015) and Shao and Zhang
(2010) review some self–normalized methods to conduct change point analysis with
time series similar to those in Sect. 3.1.3 that can be viewed as CUSUM statistics
normalized by a long–run variance estimator that has a fixed (rather than growing the
with the sample size) bandwidth. Betken (2016) develops a non–parameteric self–
normalized change point detection procedure. The non-monotonic power problem
is discussed in Crainiceanu and Vogelsang (2007) and Vogelsang (1997).
We demonstrated that CUSUM methods are extended easily for changes in other
summaries of the observations that may be represented as expected values, like
the variance or covariance matrix. For example, Berkes et al. (2009a) introduced
standardized tests to see if the covariances of linear process change during the
observation period. Galeano and Pena (2007) studies possible changes in the
covariances of dependent vectors. Aue et al. (2009a), Wied et al. (2012) and Steland
(2020) applied CUSUM statistics to detect changes in the covariance structure of
dependent vectors. A robust method to test for changes in the scale of multivariate
observations based on data depth was put forward in Chenouri et al. (2020). For
further results on detecting changes in the second order behaviour of time series we
refer to Wied et al. (2012) and Bücher et al. (2014).
Inclán and Tiao (1994), Gombay and Horváth (1994), Davis et al. (1995), Lee and
Park (2001), Deng and Perron (2008), Antoch et al. (1997), Berkes et al. (2009a),
Aue et al. (2009a), Wied et al. (2012), Wied et al. (2013) and Zhou (2013) propose
tests when the mean and/or the variance are changing under the alternative, i.e.
heteroscedastic errors can occur under the alternative. Dalla et al. (2020) and Xu
(2015) point out that in some applications the errors are heteroscedastic, which
should be taken into account when we test the validity of the no–change in the
mean null hypothesis. Busetti and Taylor (2004), Cavaliere et al. (2011), Cavaliere
and Taylor (2008), Hansen (1992) and Harvey et al. (2006) investigate change point
tests when some type of non–stationarity is exhibited by the data, including for
second order properties in Dette et al. (2019). The discussion in Sect. 3.3 is based on
Górecki et al. (2017). We assumed in Sect. 3.3 that the volatility might be changing
144 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity
under the null as well as under the alternative. Wu and Xiao (2018) devise methods
to test if the volatility of the errors is non-constant a function of time.
Estimating the variance of the underlying model errors in the presence of a shift
in the mean has been studied in a number of different contexts. See for example Axt
and Fried (2020), Fryzlewicz (2014), and Gallagher et al. (2022).
Chapter 4
Regression Models
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 145
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_4
146 4 Regression Models
variables. We begin by studying the case when .(yi , xi ) follow a linear model with
AMOC in the linear regression parameters. In particular, we assume that
⎛
xT
i β 0 + Ei , 1 ≤ i ≤ k∗,
yi =
.
T (4.1.1)
xi β A + Ei , k ∗ + 1 ≤ i ≤ N.
. H0 : β 0 = β A , (4.1.2)
HA : β 0 /= β A .
. (4.1.3)
A natural approach to test .H0 versus .HA is to test, for each .k ∈ {1, . . . , N},
for the equality of the linear model parameters between the two samples .(yi , xi ),
.i ∈ {1, . . . , k} and .(yj , xj ), .j ∈ {k + 1, . . . , N }, and then use as evidence against
.H0 the most significant of those N tests. When the model errors are independent
and identically distributed normal random variables with variance .EEi2 = σ 2 , the
likelihood ratio for this two sample test reduces to
⎛ 2 + σ̂ 2
⎞N/2
σ̂k,1 k,2
Λk =
.
2
, (4.1.4)
σ̂N,1
where
1 E 1 E
k N
2
σ̂k,1
. = (yi − xT β̂
i k,1 ) 2
, and σ̂ 2
k,2 = (yi − xT 2
i β̂ k,2 ) . (4.1.5)
N N
i=1 i=k+1
Here .β̂ k,1 and .β̂ k,2 are the least-squares estimators of the vector of regression
parameters based on, respectively, the first k and last .N − k observations. These
estimators are defined by
⎛ ⎞−1 ⎛ ⎞−1
β̂ k,1 = XT
. k,1 Xk,1 XT
k,1 Yk,1 , β̂ k,2 = XT
k,2 Xk,2 XT
k,2 Yk,2 , (4.1.6)
4.1 Change Point Detection Methods for Linear Models 147
In order that these estimators are asymptotically well defined, we require that the
matrices .XT T
k,1 Xk,1 and .Xk,2 Xk,2 are non singular for large enough k and .N − k
which will be implied by
Assumption 4.1.1 The matrix .A = {ai,j , i, j ∈ {1, . . . , d}} ∈ Rd×d is a non-
singular matrix, where .ai,j = Ex0,i x0,j .
Let
∞
E ┌ ┐
D=
. E E0 El x0 xT
l . (4.1.7)
l=−∞
We show in the proof of Theorem 4.1.1 that the infinite sum defining .D is absolutely
convergent if, for example, the vector valued process .{zi = (xT T
i , Ei ) , i ∈ Z}
taking values in .R d+1 ν
is strictly stationary and .L –decomposable, as defined in
Definition 3.1.1, for some .ν > 4.
Asymptotic consistency of the least squares estimators in (4.1.6) requires the
following assumption:
D [0,1] 1
t (1 − t)(−2 log ΛLN t⎦ ) −→
. (┌(t) − t┌(1))T A−1 (┌(t) − t┌(1)),
σ2
E
k E
N
2
.N σ̂k,1 = Ei2 − Dk,1 and 2
N σ̂k,2 = Ei2 − Dk,2 ,
i=1 i=k+1
148 4 Regression Models
where
⎛ ⎞T ⎛ ⎞−1
. Dk,1 = XT
k,1 Ek,1 XT
k,1 Xk,1 XT
k,1 Ek,1 ,
⎛ ⎞T ⎛ ⎞−1
. Dk,2 = XT
k,2 Ek,2 XT
k,2 Xk,2 XT
k,2 Ek,2 ,
and
Ek,1 = (E1 , E2 , . . . , Ek )T ,
. Ek,2 = (Ek+1 , Ek+2 , . . . , EN )T . (4.1.8)
and therefore
⎛ ⎛ || ||ν/2 ⎞2/ν
|| T ∗ ∗ T ||
. E ||xi xi − xi,l (xi,l ) || ≤ c1 l−α (4.1.9)
approximate . ki=1 (xi xT i − A) with a vector valued Wiener process. By the law
of the iterated logarithm for Wiener processes (see Breiman, 1968, p. 64),
|| k ||
||E ||
1 || T ||
. max || (xi xi − A)|| = OP (1). (4.1.10)
1≤k≤N (k log+ log(k))1/2 || ||
i=1
|| ||
1 || T ||
. max || Xk,1 k,1 || = OP (1),
E (4.1.12)
1≤k≤N (k log+ log(k))1/2
4.1 Change Point Detection Methods for Linear Models 149
|| ||
1 || T ||
. max || X k,2 E k,2 || = OP (1), (4.1.13)
1≤k<N ((N − k) log+ log(N − k))1/2
| k |
|E |
1 | 2 |
. max | (Ei − σ )| = OP (1),
2
(4.1.14)
1≤k≤N (k log+ log(k))1/2 | |
i=1
and
| N |
| E |
1 | 2 |
. max | (Ei − σ )| = OP (1).
2
(4.1.15)
1≤k<N ((N − k) log+ log(N − k))1/2 | |
i=k+1
2 + σ̂ 2
σ̂k,1 k,2
. − 2 log Λk = −N log 2
(4.1.16)
σ̂N,1
⎡ 2 + σ̂ 2
⎤
σ̂k,1 k,2
= −N 2
− 1 + Rk ,
σ̂N,1
where
⎛ 2
⎞⎡ 2 + σ̂ 2
⎤2
σ̂N,1 σ̂k,1 k,2
|Rk | ≤ N
.
2 + σ̂ 2
+1 2
−1 .
σ̂k,1 k,2 σ̂N,1
2
σ̂N,1
. max 2 + σ̂ 2
= OP (1),
d<k<N −d σ̂k,1 k,2
and
| | ⎛⎛ ⎛ ⎞ ⎞
| 1 1 || log log N 1/2
|
.| − 2 | = OP .
| σ̂N,1
2 σ | N
Now
⎛ ⎞2 1 ( )2
.
2
σ̂k,1 + σ̂k,2
2
− σ̂N,1
2
= 2 Dk,1 + (Dk,2 − D1,N )
N
4 ⎛ 2 ⎞
≤ 2 Dk,1 + (Dk,2 − D1,N )2 .
N
150 4 Regression Models
1 ⎛ ⎞
. max 2
Dk,1 + (Dk,2 − D1,N )2 = OP (1),
d<k≤N/2 log+ log k
and
1 ⎛ ⎞
. max 2
Dk,2 + (Dk,1 − D1,N )2 = OP (1).
N/2≤k<N −d log+ log(N − k)
and
N
. max |Rk | = OP (1).
N/2≤k<N −d log+ log(N − k)
Let
⎛ ⎛
1 1 T 1
Zk =
. S (k)A−1 S(k) + (S(N ) − S(k))T A−1 (S(N ) − S(k))
2
σ k N −k
⎞
1
− ST (N )A−1 S(N ) (4.1.17)
N
with
E
k
S(k) = Xk EkT =
. xi Ei . (4.1.18)
i=1
We showed that
k 1/2
. max |−2 log Λk − Zk | = OP (1) (4.1.19)
d<k<N −d (log+ log k)3/2
and
(N − k)1/2
. max |−2 log Λk − Zk | = OP (1). (4.1.20)
d<k<N −d (log+ log(N − k))3/2
4.1 Change Point Detection Methods for Linear Models 151
⎛ ⎞2/ν
. E||xi Ei − x∗i,l Ei,l
∗ ν/2
|| ≤ c2 l−α ,
with some .c2 > 0. Now the result in Theorem 4.1.1 follows from Theorem A.1.3.
⨆
⨅
Repeating the arguments used in Sect. 1.3, we can also derive from (4.1.19)–
(4.1.21) that
|
|
| max (−2 log Λk ) (4.1.23)
| d<k<N −d
.
⎛ ⎛ ⎞T ⎛ ⎛ ⎞|
N k k |
− max S(k) − S(N ) A −1
S(k) − S(N ) ||
d<k<N −d σ k(N − k)
2 N N
= oP (1/ log N).
The limit process depends on the unknown matrix .D and error variance .σ 2 , which
must be estimated in order to make use of Theorem 4.1.1 for testing .H0 versus .HA .
However, this limiting process does have a simple form when the model errors are
serially uncorrelated, or evolve as volatility processes, which we characterize by the
following assumption.
Assumption 4.1.3 .E(Ei |Fi ) = 0, where .Fi is the .σ –algebra generated by the
variables .{xl , El−1 , −∞ < l ≤ i}.
Using Assumption 4.1.3 we get that .D = σ 2 A, and therefore
⎛ ⎫ ⎧ d ⎫
1 D E
T −1
. (┌(t)−t┌(1)) A (┌(t)−t┌(1)), 0 ≤ t ≤ 1 = Bi (t), 0 ≤ t ≤ 1 ,
2
σ2
i=1
152 4 Regression Models
where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d}, are independent standard Brownian
bridges. We recall from Theorem A.2.7
d
a(x) = (2 log x)1/2
. and bd (x) = 2 log x + log log x − log ┌ (d/2),
2
where .┌ (x) is the Gamma function. The following result is then a consequence of
Theorems 4.1.1 and 1.3.1.
Theorem 4.1.2 We assume that .H0 of (4.1.2) and Assumptions 4.1.1–4.1.3 are
satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable for some .ν > 4.
ν
(i) We have
( ) D[0,1] E
d
t (1 − t) −2 log ΛLN t⎦ −→
. Bi2 (t), (4.1.24)
i=1
where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d}, are independent Brownian bridges.
(ii) Also,
⎛ ⎫
. lim P a(log N) max (−2 log Λk ) 1/2
≤ x + bd (log N)
N →∞ d<k<N −d
= exp(−2e−x ) (4.1.25)
for all x, where .a(x) and .bd (x) are defined in (1.3.9).
(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2, then
where .I is the .d × d identity matrix, .rN and .āI (d, κ) are defined in (1.2.27) and
(1.3.10).
Since .−2 log ΛLN t⎦ can be approximated with a (weighted) CUSUM process one
can prove that
1 E 2
d
t (1 − t) ( ) D[0,1]
.
2
−2 log ΛL(N +1)t⎦ −→ 2
Bi (t),
w (t) w (t)
i=1
assuming that .w(t) satisfies Assumption 1.2.1 and .I (w, c) is finite for some .c > 0,
where .I (w, c) is defined in (1.2.4). It is interesting to note that if Assumption 4.1.3
holds, then the limit distributions of functionals of the .log likelihood process are
parameter free. The limit distributions are known in some cases (see Shorack and
Wellner, 1986), but it is also possible to use simulations to obtain critical values.
4.1 Change Point Detection Methods for Linear Models 153
The proof of Theorem 4.1.1 shows that the likelihood ratio test statistic is approx-
imately a CUSUM process of the weighted model errors. As such, and since the
model errors are unknown, it is natural to test for homogeneity of the model
parameters by evaluating for changes in the mean of the empirical residuals
. Êi = yi − xT
i β̂ N,1 , (4.1.26)
where .β̂ N,1 is the least squares estimator of the regression parameters computed
from the entire sample. Let
⎛ ⎞
LN
E LN
LNt⎦ E E
t⎦ N t⎦
. ẐN (t) = N −1/2 ⎝ xi Êi − ⎠
xi Êi = N −1/2
xi Êi (4.1.27)
N
i=1 i=1 i=1
be the CUSUM process of the weighted residuals .xi Êi , .i ∈ {1, . . . , N }. The identity
in (4.1.27) is a consequence of the definition of .β̂ N,1 . Sometimes we refer to
.ẐN (t) as the partial sum process of the weighted residuals. Although the right most
expression in (4.1.27) is the simplest, we often use both representations of .ẐN (t). It
is natural to normalize this process by .D−1 defined in (4.1.7), and so we assume the
following.
Assumption 4.1.4 .D is a non–singular matrix.
Theorem 4.1.3 We assume that .H0 of (4.1.2), Assumptions 1.2.1, 4.1.1, 4.1.2,
and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable
ν
1 ⎛ T ⎞1/2
. sup ẐN ((N + 1)t/N)D−1 ẐN ((N + 1)t/N)
0<t<1 w(t)
⎛ d ⎞1/2
D 1 E
→ sup 2
Bi (t) ,
0<t<1 w(t) i=1
(ii) Also,
⎛ ⎛ ⎛ ⎞1/2
N
. lim P a(log N) max
N →∞ d<k<N −d k(N − k)
⎡⎛ ⎞T ⎛ k ⎞⎤1/2
Ek E ⎫
× ⎣ xi Êi D−1
xi Êi ⎦ ≤ x + bd (log N)
i=1 i=1
= exp(−2e−x )
for all .x ∈ R, where .a(x) and .bd (x) are defined in (1.3.9).
(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2, then
κ−1/2 ||ẐT −1
N (t)D ẐN (t)||
1/2
D
rN
. sup → āI (d, κ),
t1 ≤t≤t2 (t (1 − t))κ
where .I is the .d × d identity matrix, .rN and .āI (d, κ) are defined in (1.2.27) and
(1.3.10).
Proof Since .H0 holds, we have
Êi = Ei − xT
. i (β̂ N,1 − β 0 ) (4.1.28)
and therefore
E
k
k E
N E
k
k E
N
. xi Êi − xi Êi = xi Ei − xi Ei
N N
i=1 i=1 i=1 i=1
⎛ k ⎞
E k E T
k
− xi xT
i − xi xi (β̂ N,1 − β 0 ).
N
i=1 i=1
We assume that .I (w, c) < ∞ for some .c > 0, so using Theorem 1.2.2 for each
coordinate we get that
|| ||
||L(N +1)t⎦ ||
N −1/2 || E L(N + 1)t⎦ EN
||
sup || T
xi xi − T ||
xi xi || = OP (1). (4.1.30)
.
||
w(t) ||
0<t<1 i=1
N
i=1 ||
4.1 Change Point Detection Methods for Linear Models 155
Hence (4.1.29) and Theorem 1.3.3 imply the third part of the theorem. ⨆
⨅
Similarly to Theorem 1.3.5, one can use the maximum norm of the CUSUM of
the weighted residuals. Let
1 || | D 1 || |
. max sup ẑN,j (t)| → max sup Bj (t)| ,
1≤j ≤d 0<t<1 w(t) 1≤j ≤d 0<t<1 w(t)
= exp(−2de−x )
Now we state the principal component version of Theorem 4.1.4, which may be
applied to the projections of the CUSUM process onto the eigenvectors .v1 , . . . , vp
for .p ∈ {1, . . . , d}.
Theorem 4.1.5 We assume that .H0 of (4.1.2), .p ∈ {1, . . . , d}, Assump-
tions 1.2.1, 4.1.1, 4.1.2 and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is
ν
.L –decomposable for some .ν > 4.
(i) If Assumption 1.2.1 holds and .I (w, c) < ∞ for some .c > 0, then
1 | | D 1 || |
. max sup |z̃N,j (t)| → max sup Bj (t)| ,
1≤j ≤p 0<t<1 λ1/2 w(t) 1≤j ≤p 0<t<1 w(t)
j
where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, ..., p}, are independent Brownian bridges.
(ii) Also,
⎛ ⎫
1 | |
. lim P a(log N) max sup |z̃N,j (t)| ≤ x + b(log N)
N →∞ 1≤j ≤p 1/N <t<1−1/N 1/2
λj w(t)
= exp(−2pe−x )
1 E T
N
ÂN =
. xi xi . (4.1.34)
N
i=1
|| ||
|| ||
. ||ÂN − A|| = OP (N −1/2 ).
4.1 Change Point Detection Methods for Linear Models 157
The matrix .D depends on the distribution of unobservable model errors .{Ei , i ∈ Z}.
In their place we use the residuals .Êi of (4.1.26). The long–run covariance matrix
estimator discussed in Sect. 3.1 is now computed from .xi Êi , .i ∈ {1, . . . , N }. Let
⎧
⎪ N −l
⎪
⎪ 1 E
⎪
⎪ xi xT
i+l Êi Êi+l , if 0 ≤ l < N,
⎨N −l
i=1
γ̂ l =
.
⎪
⎪ 1 E
N
⎪
⎪ xi xT
⎪
⎩ N − |l| i+l Êi Êi+l , if − N < l < 0,
i=−(l−1)
E
N −1 ⎛ ⎛ ⎞
l
D̂N =
. K γ̂ l . (4.1.35)
h
l=−(N −1)
Theorem 4.1.6 If .H0 of (4.1.2), Assumptions 3.1.4, 3.1.5, 4.1.1, 4.1.2 and 4.1.4
hold, then
P
D̂N → D.
.
Proof The proof follows from Theorem 3.1.5 upon showing that the residuals .Êi
may be replaced with the model errors .Ei in the estimator .D̂N with negligible
asymptotic error. Using (4.1.28) we have
xi Êi xT
.
T T T
i+l Êi+l = xi Ei xi+l Ei+l − xi Ei (β̂ N,1 − β 0 ) xi+l xi+l
− xi xT T T T T
i (β̂ N,1 − β 0 )xi+l Ei+l + xi xi (β̂ N,1 − β 0 )(β̂ N,1 − β 0 ) xi+l xi+l .
We showed that
Lν –decomposability yields
.
||Ex0 E0 xT
.
−α
l El || = O(l )
and
||Ex0 xT
.
T −α
0 xl xl || = O(l ).
158 4 Regression Models
Thus we get
|| −1 ⎛ ⎛ ⎞ ||
||NE 1 E
N −l ||
|| l T T ||
. || K xi Ei (β̂ N,1 − β 0 ) xi+l xi+l || = oP (1),
|| h N −l ||
l=0 i=1
||N −1 ⎛ ⎛ ⎞ ||
|| E 1 E
N −l ||
|| l ||
. || K xi xT (β̂ N,1 − β 0 )xT Ei+l || = oP (1)
|| h N −l i i+l ||
l=0 i=1
and
|| −1 ⎛ ⎛ ⎞ ||
||NE 1 E
N −l ||
|| l T T T ||
. || K xi xi (β̂ N,1 − β 0 )(β̂ N,1 − β 0 ) xi+l xi+l || = oP (1).
|| h N −l ||
l=0 i=1
As we have seen the quasi–likelihood method used in Sect. 4.1.1 leads to the
consideration of maximally selected standardized CUSUM processes constructed
from the covariates and residuals. Another possibility is to simply directly compare
estimators of the regression parameters before and after all candidate change
points, and use as evidence against .H0 the maximum difference. Formally then
we divide the data into two subsets at time k, .(xi , Ei ), .i ∈ {1, . . . , k} and .(xi , Ei ),
.i ∈ {k + 1, . . . , N }, and obtain the estimators .β̂ k,1 and .β̂ k,2 given in (4.1.6). We
Theorem 4.1.7 We assume that .H0 of (4.1.2) and Assumptions 1.2.1, 4.1.1, 4.1.2
and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable for
ν
some .ν > 4.
(i) If .I (w, c) < ∞ with some .c > 0, then
⎛ d ⎞1/2
1 ⎛ T ⎞1/2 D 1 E
. sup RN (t)AD−1 ARN (t) → sup Bi2 (t) ,
0<t<1 w(t) 0<t<1 w(t) i=1
where .{Bi (t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d}, are independent Brownian bridges.
(ii) Also,
⎛ ⎛ ⎛ ⎞1/2
k(N − k)
. lim P a(log N) max
N →∞ d<k<N −d N
⎡⎛ ⎞T ⎛ ⎞⎤1/2 ⎫
× β̂ k,1 − β̂ k,2 AD−1 A β̂ k,1 − β̂ k,2 ≤ x + bd (log N)
= exp(−2e−x )
for all .x ∈ R, where .a(x) and .bd (x) are defined in (1.3.9).
(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2, then we have
κ−1/2 ||RT −1
N (t)AD ARN (t)||
1/2
D
rN
. sup → āI (d, κ),
t1 ≤t≤t2 (t (1 − t))κ
where .I is the .d × d identity matrix, .rN and .āI (d, κ) are defined in (1.2.27) and
(1.3.10).
160 4 Regression Models
where .Xk,1 , .Xk,2 , .Ek,1 , and .Ek,2 are defined in Eqs. (4.1.6) and (4.1.8). Using
(4.1.10) and (4.1.12) we conclude
||⎛ ⎛⎛ ⎞−1 1 ⎞ ||
k || ||
max || T
Xk,1 Xk,1 − A −1
Xk,1 Ek,1 ||
T
.
d<k<N −d log+ log(k) || k ||
||⎛ ⎞−1 1 || || ||
k || T ||
−1 || || T ||
≤ max ||
≤ || Xk,1 Xk,1 − A || ||Xk,1 Ek,1 ||
d<k<N −d log log k + k
||⎛ ⎞−1 1 ||
k 3/2 || T ||
≤ max || X Xk,1 − A || −1 ||
d<k<N −d (log+ log k)1/2 || k,1 k
|| ||
1 || T ||
× max || Xk,1 E k,1 || = OP (1).
d<k<N −d (k log+ log k)1/2
Similarly,
||⎛ ⎛⎛ ⎞−1 ⎞ ||
N −k || 1 ||
|| XT Xk,2 − −1
Xk,2 Ek,2 ||
T
. max
d<k<N −d log+ log(N − k) || k,2
N −k
A || = OP (1).
1 1
. β̂ k,1 = β 0 + A−1 XT
k,1 Ek,1 + Rk,1 and β̂ k,2 = β 0 + A−1 XT
k,2 Ek,2 + Rk,2 .
k N −k
Therefore
k
. max ||Rk,1 || = OP (1), and
d<k<N −d (log+ log k)1/2
N −k
max ||Rk,2 || = OP (1).
d<k<N −d (log+ log(N − k))1/2
Hence the comparison of the parameters .β̂ k,1 and .β̂ k,2 leads to a CUSUM based
procedure. It follows from elementary calculation that
⎛ ⎛ ⎞
1 T 1 N k
. Xk,1 Ek,2 − XT E k,2 = S(k) − S(N ) ,
k N − k k,2 k(N − k) N
where .rN and .āA−1 DA−1 (d, 1) are defined in (1.2.27) and (1.3.10).
.A and .D may again be estimated as in Sect. 4.1.1.
So far we have studied the case in which the model errors a drawn from a stationary
process. In order to allow for heteroscedasticity in the model errors, we may
generalize the approach developed in Sect. 3.3 to the linear model framework. Recall
Assumptions 3.3.1 and 3.3.2 that the model errors satisfy .Ei = a(i/N)ei for a
stationary sequence .ei with mean zero and variance one, and .a : [0, 1] |→ R
a function of bounded variation. If Assumption 3.3.3 holds so that the sequence
.{xi ei , i ∈ Z} satisfies the functional central limit theorem, and .Exi ei = 0 for all i,
then we can define a sequence of Gaussian processes .{┌ N (t), 0 ≤ t ≤ 1} such that
|| ||
|| ||
|| −1/2 LN
E t⎦
||
|| xi Ei − ┌ N (t)||
. sup
||N || = oP (1),
0≤t≤1 || i=1 ||
and
∞
E
∗
. D = Ex0 e0 xT
l el (4.1.38)
l=−∞
is the long–run covariance matrix of the vector valued process .{xi ei , i ∈ Z}. With
again
⎛ ⎞
LN
E LNt⎦ E
t⎦ N
ZN (t) = N −1/2 ⎝
. xi Ei − xi Ei ⎠ ,
N
i=1 i=1
162 4 Regression Models
we obtain that
where
T
It follows that .E ┌ˆ N (t) = 0 and .E ┌ˆ N (t)┌ˆ N (s) = D(t, s), where
ˆ D
. {┌(t), 0 ≤ t ≤ 1} = {┌ˆ N (t), 0 ≤ t ≤ 1}.
where .{Ni2 , i ≥ 1} are independent standard normal random variables and .λ1 ≥
λ2 ≥ . . . are the eigenvalues of integral operator defined by the matrix valued
function .D. In particular, the eigenvalues satisfy the equation, with .φ i (s) =
(φi,1 (s), . . . ., φi,d (s))T
/ 1 d /
E 1
. λi φ i (t) = D(t, s)φ i (s)ds, 1 ≤ i < ∞, 2
φi,j (s)ds = 1,
0 j =1 0
Êi = yi − xT
. i β̂ N , i ∈ {1, . . . , N},
4.1 Change Point Detection Methods for Linear Models 163
where we use .β̂ N for .β̂ N,1 . The estimator of .D is computed from .r̂ i = xi Êi ,
1 ≤ i ≤ N. First we estimate .b(u) with a kernel lag-window estimator for each
.0 < u ≤ 1. If the estimator is denoted by .b̂N (u), then the plug in estimator for
.D(t, s) is
D̂N (t, s) = b̂N (min(t, s)) − t b̂N (1) − s b̂N (1) + st b̂N (1),
. 0 ≤ t, s ≤ 1.
We follow the estimation technique of Sect. 3.1. The estimator .b̂N (u) is the long–
run covariance matrix estimator computed from .r̂ i , .i ∈ {1, . . . , LNu⎦}. Let
⎧
⎪
⎪ 1 E
k−l
⎪
⎪ (r̂ i − r̄)(r̂ i+l − r̄)T , if l /= 0,
⎪
⎨N
i=1
.γ̂ k,l =
⎪
⎪ 1 E
k
⎪
⎪ (r̂ i − r̄)(r̂ i+l − r̄)T , if l < 0,
⎪
⎩N
i=−(l−1)
where
1 E
N
.r̄ = r̂ i .
N
i=1
LNE
u⎦−1 ⎛ ⎛ ⎞
l
b̂N (u) =
. K γ̂ LN u⎦,l ,
h
l=−(LN u⎦−1)
where .K(t) and h satisfy Assumptions 3.1.5 and 3.1.4. As in the proof of
Theorem 3.3.4, it can be shown that
/ || ||2
|| ||
. ||b̂N (u) − b(u)|| du = oP (1),
Let .λ̂1 ≥ λ̂2 ≥ . . . be the eigenvalues of .D̂N . Combining Theorem A.3.4 and
(4.1.41) we get that
|λ̂i − λi | = oP (1).
.
164 4 Regression Models
∞ ∗
E E
d
. λi Ni2 ≈ λ̂i Ni2 ,
i=1 i=1
Under the alternative .D̂N is not a consistent estimator for .D, but
//
. D̂2N (t, s)dtds = OP (h2 max ||β l+1 − β l ||2 ).
1≤l≤R
Hence
⎛ ⎛ ⎞
.λ̂1 = OP h max ||β l+1 − β l || ,
1≤l≤R
which implies
E
d∗ ⎛ ⎛ ⎞
. λ̂i Ni2 = OP h max ||β l+1 − β l || ,
1≤l≤R
i=1
and therefore the consistency of the suggested procedure follows from (4.1.42).
Assumption 4.2.2
with .k ∗ = LN θN ⎦.
The change in the parameters in the first consistency result we consider is measured
by
ΔN = (β 0 − β A )T A(β 0 − β A ).
.
Theorem 4.2.1 If .HA of (4.1.3), Assumptions 4.1.1, 4.1.2, 4.2.1 and 4.2.2 hold,
and .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable for some .ν > 4, then
ν
−2 log Λk ∗ P
. ⎛ ⎛ ⎞ → 1.
1
N log θN (1 − θN )ΔN + 1
σ2
Proof Using the definition of .β̂ N,1 we have under (4.1.3) that
i=1 i=k ∗ +1
∗
E
k E
N
= (Ei − xT
i ( β̂ N,1 − β 0 )) 2
+ (Ei − xT
i (β̂ N,1 − β A ))
2
i=1 i=k ∗ +1
∗ ∗
E
N E
k E
k
= Ei2 − 2 Ei xT
i (β̂ N,1 − β 0 ) + (β̂ N,1 − β 0 )
T
xi xT
i (β̂ N,1 − β 0 )
i=1 i=1 i=1
E
N E
N
−2 Ei xT
i (β̂ N,1 − β A ) + (β̂ N,1 − β A )
T
xi xT
i (β̂ N,1 − β A )
i=k ∗ +1 i=k ∗ +1
E
N
= Ei2 + NθN (1 − θN )2 ΔN (1 + oP (1))+NθN2 (1 − θN )ΔN (1 + oP (1)).
i=1
166 4 Regression Models
Similarly,
∗ ∗ ∗
E
k E
k E
k
N σ̂k2∗ ,1 =
. Ei2 − 2 Ei xT
i (β̂ k ∗ ,1 − β 0 ) + (β̂ k ∗ ,1 − β 0 )
T
xi xT
i (β̂ k ∗ ,1 − β 0 )
i=1 i=1 i=1
= k ∗ σ 2 (1 + oP (1)),
and
E
N E
N
.N σ̂k2∗ ,2 = Ei2 − 2 Ei xT
i (β̂ k ∗ ,2 − β A )
i=k ∗ +1 i=k ∗ +1
E
N
+ (β̂ k ∗ ,2 − β A )T xi xT
i (β̂ k ∗ ,2 − β A )
i=k ∗ +1
= (N − k ∗ )σ 2 (1 + oP (1)).
1 ⎛ ⎞1/2
T −1
sup1/(N +1)<t<1−1/(N +1) ẐN (t)D ẐN (t)
(t (1 − t))κ P
.
T −1
→ 1.
N [θN (1 − θN )] ((β 0 − β A ) AD A(β 0 − β A ))
1/2 1−κ 1/2
Thus we have
⎧
⎪
⎪ Ek Ek
⎪
⎪ xi Ei − xi xT if 1 ≤ i ≤ k ∗
⎪
⎪ i (β̂ N,1 − β 0 ),
⎪
⎨ i=1
Ek i=1
. xi Êi = E k Ek∗ Ek
⎪
⎪ − T
− − xi xT
i=1 ⎪
⎪
xi E i xi xi ( β̂ N,1 β 0 ) i (β̂ N,1 − β A ),
⎪
⎪ ∗ +1
⎪
⎩
i=1 i=1 i=k
if k ∗ + 1 ≤ i ≤ N.
For .1 ≤ i ≤ k ∗ we write
⎛ ∗ ⎞
E
k
k E
k E
k
. xi xT
i (β̂ N,1 − β 0 ) −
⎝ xi xT
i (β̂ N,1 − β 0 ) + xi xT
i (β̂ N,1 − β A )
⎠
N ∗
i=1 i=1 i=k +1
⎛ k ⎞
E k E T
N
k E
k
= xi xT
i − xi xi (β̂ N,1 − β 0 ) − xi xT
i (β̂ N,1 − β A )
N N ∗
i=1 i=1 i=k +1
and
⎛ ⎛ ⎞κ || ⎛ N
|| E
⎞ ||
||
N2 || ||
. max || xi xT − (N − k ∗ )A (β̂ N,1 − β A )|| = OP (1).
1≤k<N k(N − k) || ∗
i ||
i=k +1
So we conclude
⎛ ⎛ ⎞κ ||
||E
||
||
k E
k N
N2 || ∗ ||
. max || xi Êi − xi Êi − (N − k )A(β̂ N,1 − β A )||
1≤k≤k ∗ k(N − k) || N ||
i=1 i=1
= OP (1).
168 4 Regression Models
Similarly,
⎛ ⎛ ⎞κ ||
||E
||
||
k E
k N
N2 || ∗ ||
. max || xi Êi − xi Êi − k A(β̂ N,1 − β 0 )|| = OP (1).
k ∗ <k<N k(N − k) || N ||
i=1 i=1
Hence
⎧
⎛ ⎛ ⎞κ ⎨⎛E ⎞T
k E
k N
N2
. max xi Êi − xi Êi
1≤k<N k(N − k) ⎩ N
i=1 i=1
⎛ k ⎞⎫1/2
E k E
N ⎬
× D−1 xi Êi − xi Êi
N ⎭
i=1 i=1
⎛ ⎞1/2
= N[θN (1 − θN )]1−κ (β 0 − β A )T AD−1 A(β 0 − β A ) + OP (N 1/2 ).
The second half of the theorem follows similarly and we omit the details. ⨆
⨅
Remark 4.2.1 We note that Theorem 4.2.2 remains true if .κ = 1/2 when we
replace Assumption 4.2.2 with
not consistent when there is a change in the regression parameters. If .HA of (4.1.3)
holds, then the long–run covariance matrix of (4.1.35) satisfies
||D̂N || = OP (h),
. (4.2.2)
P −1 P
. sup t (1 − t)(−2 log ΛLN t⎦ ) → ∞, sup ẐT
N (t)D̂N ẐN (t) → ∞,
0<t<1 0<t<1
4.2 Inference for Change–Points in a Linear Model 169
and
P
. sup RT −1
N (t)AD ARN (t) → ∞
0≤t≤1
We have seen that several different approaches for testing .H0 versus .HA amount
to considering maximally selected CUSUM processes. Naturally the argument at
which these processes attain their maximum may be used to estimate the location of
a change point. In view of (4.1.27), an estimator for the location of a change point
may be defined as
⎛ || k || ⎫
||E ||
1 || ||
k̂N (κ) = k̂N = sargmax
. || xi Êi || , (4.2.3)
k∈{1,...,N −1} (k(N − k))κ || ||
i=1
where .0 ≤ κ ≤ 1/2. The asymptotic properties of this estimator are detailed in the
following result, in which the size of the change in model (4.1.1) is measured by
||A(β 0 − β A )||2
ΔN =
. . (4.2.4)
||AD1/2 (β 0 − β A )||
||β 0 − β A || → 0 and
. N||β 0 − β A ||2 → ∞, (4.2.5)
then
D
Δ2N (k̂N − k ∗ ) → ξ(κ).
.
||β 0 − β A || → 0
. and N ||β 0 − β A ||2 / log log N → ∞,
170 4 Regression Models
then
D
Δ2N (k̂N − k ∗ ) → ξ(1/2),
.
E
k
k E
N E
k
k E
N
. xi Êi − xi Êi = xi Ei − xi Ei + vk ,
N N
i=1 i=1 i=1 i=1
where
⎧ ⎛ ∗ ⎞
⎪
⎪E k Ek E N
⎪
⎪ xi xT
k ⎝ xi xT xi xT ⎠
⎪
⎪ i (β̂ N − β 0 ) − i (β̂ N − β 0 ) + i (β̂ N − β A ) ,
⎪
⎪ N
⎪
⎪ i=1 i=1 i=k +1 ∗
⎪
⎪ ∗,
⎪
⎨ ∗ if 1 ≤ i ≤ k
E E ⎛ ⎛ k ∗
k E T
k k
.vk =
⎪ T T
⎪
⎪ xi xi (β̂ N − β 0 ) + xi xi (β̂ N − β A ) − xi xi (β̂ N − β 0 )
⎪
⎪ N
⎪
⎪ i=1 i=k ∗ +1 i=1
⎪
⎪ E N ⎞
⎪
⎪ T
⎪
⎪
⎩ + xi x i ( β̂ N − β A ,
) if k ∗ + 1 ≤ i ≤ N.
i=k ∗ +1
and
|| N ||
|| E || ⎛ ⎞
−κ || ||
. max (N − k) || xi xT − (N − k)A|| = OP N κ−1/2 .
1≤k<N || i ||
i=k+1
Hence
|| k ||
N κ−1/2 N κ−1/2 ||||E T
||
||
. max ||vk − zk || ≤ max || (x x
i i − A)( β̂ − β 0 ||
)
1≤k≤k ∗ kκ 1≤k≤k ∗ k κ || N
||
i=1
|| ||
|| E k∗ ||
N κ−1/2 |||| k T
||
||
+ max ∗ κ || (x x
i i − A)( β̂ N − β )
0 ||
1≤k≤k k || N i=1 ||
|| ||
N κ−1/2 || k E
N ||
|| ||
+ max ∗ || (xi xT − A)(β̂ N − β A )||
1≤k≤k kκ || N ∗
i ||
i=k +1
= OP (1),
where
⎧
⎪
⎪ k(N − k ∗ )
⎨ A(β 0 − β A ), if 1 ≤ k ≤ k ∗ ,
.zk =
N
⎪
⎪ k ∗ (N − k)
⎩ A(β 0 − β A ), if k ∗ + 1 ≤ k ≤ N.
N
N κ−1/2
. max ||vk − zk || = OP (1).
k ∗ +1≤k<N (N − k)κ
Thus we conclude
|k̂N − k ∗ | = oP (N ).
. (4.2.6)
Let
C
a=
. ,
Δ2N
E
k
k E
N
. xi Êi − xi Êi = Q1 (k) + . . . + Q4 (k),
N
i=1 i=1
172 4 Regression Models
where
E
k
k E
N
Q1 (k) =
. xi Ei − xi Ei ,
N
i=1 i=1
⎛ ⎞
E
k
k E
N
Q2 (k) = − xi xT
i − xi xT
i (β̂ N − (θ β 0 + (1 − θ )β A ),
N
i=1 i=1
E
k
Q3 (k) = xi xT
i (1 − θ )[β 0 − β A ],
i=1
⎡ ⎤
k∗
E E
N
k ⎣ ⎦ (β 0 − β A ).
Q4 (k) = − (1 − θ ) xi xT
i −θ xi xT
i
N ∗i=1 i=k +1
⎛ ⎛ ⎞2κ ||
||E
||2
||
k E
k N
N || ||
. || xi Êi − xi Êi ||
k(N − k) || N ||
i=1 i=1
|| ||2
⎛ ⎛ ⎞2κ ||E ∗
∗ E
||
N || k k
N
||
− || xi Êi − xi Êi ||
∗ ∗
k (N − k ) || ||
|| i=1 N
i=1 ||
⎛ ⎛ ⎞2κ E
4 ⎛ ⎛ ⎞2κ E
4
N 1
= QT
i (k)Qj (k) − QT ∗ ∗
i (k )Qj (k ).
k(N − k) k ∗ (N − k ∗ )
i,j =1 i,j =1
From here we may follow the proof of Theorem 2.2.1 to obtain the limit of the
process .Q4 (k). We obtain that
With
E
N
Q3 (k) =
. xi xT
i θ (β 0 − β A ), if k > k ∗ ,
i=k+1
4.2 Inference for Change–Points in a Linear Model 173
we also have
| ⎛ ⎛ ⎞2κ
| N −k E T
N
−(1−2κ) | N
N
. max | k(N − k) xi Ei Q3 (k)
k ∗ <k≤k ∗ +C/Δ2N N
i=1
⎛ ⎛ ⎞2κ |
− k∗ E |
N
N N |
− xT ∗
i Ei Q3 (k ) − RN,1 (k)| = oP (1),
k ∗ (N − k ∗ ) N
i=1
where
⎧
⎪ ⎛ ⎛ ⎛ ⎛ ⎞2κ E k
⎪
⎪ −(1−2κ) N
⎪
⎪
⎪ N xT
i Ei Q3 (k)
⎪
⎪ k(N − k)
⎪
⎪ ⎛ ⎞2κ E ∗
i=1 ⎞
⎪
⎪
⎪
⎪ − N k
xT E Q (k ∗ ) , if 1 ≤ k ≤ k ∗ ,
⎪
⎪ k ∗ (N −k ∗ ) i=1 i i 3
⎪
⎨ ⎛ ⎛ ⎛ ⎛ ⎞2κ ⎛ E N
⎞
.RN,1 (k) = −(1−2κ) N T
⎪N
⎪ − xi Ei Q3 (k)
⎪
⎪ k(N − k)
⎪
⎪ ⎞2κ ⎛
i=k+1 ⎞
⎪
⎪ ⎛ ⎛ E ⎞
⎪
⎪ N
N
⎪
⎪ − ∗ − T ∗
xi Ei Q3 (k ) ,
⎪
⎪ k (N − k ∗ )
⎪
⎪ ∗ +1
⎪
⎩
i=k
if k ∗ + 1 ≤ k ≤ N,
We note that
| ⎛ ⎛ ⎞2κ ⎡
| N
−(1−2κ) | QT
3 (k)Q3 (k) − (1 − θ ) k
2 2
N
. max | k(N − k)
|k−k ∗ |≤C/Δ2N
⎤
× (β 0 − β A )T AT A(β 0 − β A )
⎛ ⎛ ⎞2κ ⎡
N
− QT ∗ ∗ 2 ∗ 2
3 (k )Q3 (k ) − (1 − θ ) (k )
k ∗ (N − k ∗ )
⎤|
|
× (β 0 − β A ) A A(β 0 − β A ) ||
T T
= oP (1).
174 4 Regression Models
Let
⎛ ⎛ ⎛ ⎛ ⎞2κ
N
RN,2 (k) = N −(1−2κ)
. (1 − θ )2 k 2 (β 0 − β A )T AT A(β 0 − β A )
k(N − k)
⎛ ⎛ ⎞2κ
N
− (1 − θ )2 (k ∗ )2
k ∗ (N − k ∗ )
⎞
T T
× (β 0 − β A ) A A(β 0 − β A ) ,
E
∗ |
k
|
× xi xT (β 0 − β )
A |
|
i
i=1
= oP (1).
where
⎧ ⎛ ⎞T ∗
⎪
⎪ ⎛ ⎛ ⎞2κ Ek∗ E k
⎪
⎪ N
⎪
⎪ 2 (1 − θ ) ⎝ − x E ⎠ xi xT
i (β 0 − β A ),
⎪
⎪ k ∗ (N − k ∗ ) i i
⎪
⎪ i=k+1 i=1
⎨ ∗
.RN,3 (k) =
if 1 ≤ k ≤ k
⎪
⎪ ⎛ ⎛ ⎞2κ ⎛ E k
⎞ N
E
⎪
⎪ N
⎪
⎪ 2 ∗ θ xi Ei xi xT
i (β 0 − β A ),
⎪
⎪ k (N − k ∗ )
⎪
⎪ i=k ∗ i=k ∗ +1
⎩
if k ∗ + 1 ≤ k ≤ N.
Using Theorem A.1.1 we can define a sequence two sided Wiener processes
{WN (t), −∞ < t < ∞} such that
.
| |
| −(1−2κ) ||AD1/2 (β 0 − β A )|| ||
. sup N| ∗
RN,3 (k + s/ΔN ) − 2(θ (1 − θ ))
2 1−2κ
WN (s)
| ΔN |
|s|≤C
= oP (1),
4.2 Inference for Change–Points in a Linear Model 175
where the two sided Wiener process is defined in (2.2.2). We observe that by the
choice of .ΔN we have
||AD1/2 (β 0 − β A )||
2(θ (1 − θ ))1−2κ WN (s)
.
ΔN
||A(β 0 − β A )||2
− 2(θ (1 − θ ))1−2κ |s|mκ (s)
Δ2N
||AD1/2 (β 0 − β A )||
= 2(θ (1 − θ )1−2κ (WN (s) − |s|mκ (s)).
ΔN
Since the distribution of .{WN (t), −∞ < t < ∞} does not depend on N, we can
repeat the last step in the proof of Theorem 2.2.1 to conclude. ⨆
⨅
This result pertains to the argument-maximum of the CUSUM process based on
weighted residuals, although a similar result may be established for maximally
selected likelihood ratio statistics.
As described in Chap. 3, for example in (3.1.33), estimators for .D as in (4.1.35)
can be modified using .k̂ = k̂N in order to reduce additional estimated variation
due to the presence of change points. We may compute least–squares regression
parameters estimators .β̂ k̂,1 from .{(yi , xT T
i ), 1 ≤ i ≤ k̂} and .β̂ k̂,2 from .{(yi , xi ), k̂ +
1 ≤ i ≤ N}. Subsequently .D may be estimated with a kernel lag–window long–run
covariance matrix estimate based on .{xi Êi , 1 ≤ i ≤ N }, where
⎧
yi − xT
i β̂ k̂,1 , if 1 ≤ i ≤ k̂,
.Êi =
yi − xT
i β̂ k̂,2 , if k̂ + 1 ≤ i ≤ N.
and
⎛ ⎛ ⎛ ⎞κ ⎫
(2) 1
.k̂ = sargmax max |z̃N,j (k/(N + 1))| ,
N
k∈{1,...,N −1} 1≤j ≤d k(N − k)
176 4 Regression Models
(1)
|k̂N
. − k ∗ | = OP (1/Δ2N ) = oP (N ) and (2)
|k̂N − k ∗ | = OP (1/Δ2N ) = oP (N ),
The methods we discussed can be extended to multiple changes. We assume that the
observations are given by
E
R+1
yi =
. xT
i β j 1{kj −1 ≤ i < kj } + Ei , i ∈ {1, . . . , N }, (4.2.7)
j =1
where .k0 = 0 and .kR+1 = N + 1. As such the times of the changes are .k1 , . . . , kR ,
and at time .kl the regression parameter changes from .β l to .β l+1 . The alternative of
R change points can be characterized by
HA : β 1 /= β 2 /= . . . /= β R+1 .
. (4.2.8)
When .ki = LNθi ⎦, 1 ≤ i ≤ R and .0 < θ1 < θ2 < . . . θR < 1 (Assumption 2.3.1)
holds, it is required that
in order for the tests resulting from Theorems 4.1.2–4.1.5 to be consistent. Similarly
if
We define the sum of the squares assuming that there are S changes at times
1 < r1 < r2 < . . . < rS < N:
.
E
S+1 E
rl
.M(r1 , r2 , . . . , rS , S) = min (yi − xT
i β)
2
β
l=1 i=rl−1 +1
E
rl E
rl
. min (yi − xT
i β) =
2
(yi − xT 2
i β̂ l ) ,
β
i=rl−1 +1 i=rl−1 +1
where .β̂ l is the least-squares estimator of the regression parameter computed from
.{yi , xi , rl−1 < i ≤ rl }. We assume that the penalty function takes the form
.P(N, S) = g(S)mN , and satisfies the conditions
mN → ∞,
. and mN /(N ||β l+1 − β l ||2 ) → 0, 1≤l≤R+1
hold. We note that the penalty terms discussed in Sect. 2.1 can be used. Following
the proof of Theorem 2.3.2, one can show that for all .ε > 0,
⎛ ⎛ ⎛ ⎫⎞
. lim P {R̂ = R} = 1} ∩ max |k̂l − kl | < εN = 1. (4.2.9)
N →∞ 1≤l≤R
we assume that we have preliminary change point estimators satisfying the follow-
ing assumption:
Assumption 4.2.3 The estimators .R̂ and .k̂1 , . . . , k̂R̂ satisfy for all .ε > 0,
⎛ ⎛ ⎛ ⎫⎞
P {R̂ = R} ∩
. max |k̂i − ki | < εN → 1.
i∈{1,...,R}
⎛⎛ ⎞κ ||
|| k
||
|| ⎫
1 || E ||
k̃l = sargmax || xi β̂ l ||
.
|| || ,
k∈{k̂l−1 +1,...,k̂l+1 −1} (k − k̂l−1 )(k̂l+1 − k) || ||
i=k̂l−1 +1
where .β̂ l is the least squares estimator computed from .(yi , xi ), .i ∈ {k̂l−1 , . . . , k̂l+1 }.
The size of change at .kl is measured by
We note since .D and .A are non singular matrices, .ΔN,l is proportional to .||β l+1 −
β l ||. The proof of the following results can be derived from Theorem 4.2.3 similarly
as Theorem 2.3.3 was derived from Theorem 2.2.1.
Theorem 4.2.4 We assume that .HA of (4.2.8), Assumptions 2.3.1, 4.1.1,
4.1.2, 4.1.4, 4.2.3 are satisfied, .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable
ν
for some .ν > 4 and satisfies Assumption 1.3.2 with rate parameter .ζ , and
.0 ≤ κ < 1/2 − ζ . If for all .l ∈ {1, ..., R}
||β l+1 − β l || → 0
. and N||β l+1 − β l ||2 → ∞,
then
D
Δ2N,l (k̃l − kl ) → ξ̄ (κ),
.
scenario, let
where .{W ∗ (t), −∞ < t < ∞} and .m̄l (t) defined in (3.3.19) and (2.3.15). The
normalization also must be changed to reflect non-stationarity of the errors. Let
where .D∗ is defined in (4.1.38). The following result may be established along
the lines of Theorem 2.3.3 where its reliance on Theorem 2.2.1 is replaced by
Theorem 4.2.4.
Theorem 4.2.5 We assume that .HA of (4.2.8), Assumptions 2.3.1, 3.3.1, 3.3.2,
4.1.1, 4.1.2, 4.1.4, and 4.2.3 are satisfied, .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –
ν
decomposable for some .ν > 4 and satisfies Assumption 1.3.2 with .ζ = 1/ν„ and
.0 ≤ κ < 1/2 − 1/ν. If for all .l ∈ {1, . . . , R},
||β l+1 − β l || → 0
. and N ||β l+1 − β l ||2 → ∞,
then
D
Δ2N,l (k̃l − kl ) → ξl∗ (κ),
.
E
k
. xi xT
i ≈ kA, (4.3.1)
i=1
180 4 Regression Models
To derive a test for .H0 of (4.1.2) against .HA of (4.1.3), we may again use a quasi–
likelihood argument as in Sect. 4.1.1. If .{Ei , i ≥ 1} are independent and identically
distributed normal random variables with .EEi = 0 and .EEi2 = σ 2 . If the variance
2
.σ is known, negative two multiplied by the logarithm of the likelihood ratio takes
the form
N ⎛ 2 ⎞
lN (k) =
. σ̂N,1 − [ σ̂ 2
k,1 + σ̂ 2
k,2 ] ,
σ2
2 and .σ̂ 2 are defined in (4.1.5). We also recall the least square estimators
where .σ̂k,1 k,2
β̂ k,1 and .β̂ k,2 defined in (4.1.6). Elementary algebra gives that, under .H0 ,
.
1 T −1
2
σ̂N,1
. − [σ̂k,1
2
+ σ̂k,2
2
]= R C CN,1 C−1
k,2 Rk ,
N k k,1
where
E
k E
N
Ck,1 =
. xi xT
i , Ck,2 = xi xT
i ,
i=1 i=k+1
and
E
k E
k E
N
Rk =
. xi (yi − xT
i β̂ N,1 ) = xi Ei − Ck C−1
N,1 xi yi .
i=1 i=1 i=1
Rk is the partial sum of the weighted residuals of (4.1.26). We then use as evidence
.
1 −1 −1
TN =
. max RT
k Ck,1 CN,1 Ck,2 Rk . (4.3.3)
σ2 p<k<N −p
T̂N =
. max (−2 log Λk ),
p<k<N −p
4.3 Polynomial Regression 181
where .Λk is defined in (4.1.4). We have already investigated the properties of .T̂N in
Sect. 4.1.1, but under assumptions that imply (4.3.1). In the present section we study
.T̂N where the covariates are as in (4.3.2). In order to simplify this presentation, we
| k |
|E |
−ζ | |
. max k | Ei − σ WN,1 (k)| = OP (1)
1≤k≤N/2 | |
i=1
and
| N |
| E |
| |
. max (N − k)−ζ | Ei − σ WN,2 (N − k)| = OP (1)
N/2≤k≤N | |
i=k+1
and
⎛/ 1 ⎫
C2 (t) =
. x i+j dx, 0 ≤ i, j ≤ p .
t
The sum of the weighted errors has a weak limit under Assumption 4.3.2 that may
be expressed using the Gaussian process
⎛ ⎛/ t / t / t ⎞T
┌(t) =
. dW (x), xdW (x), . . . , x p dW (x) ,
0 0 0
Theorem 4.3.1 If .H0 of (4.1.2), Assumptions 4.3.1 and 4.3.2 are satisfied, then
1 −1 −1 D
. max RT
k Ck,1 CN Ck,2 Rk → sup ||Γ 0 (t)||2
σ2 LN δ⎦≤k≤N −LN δ⎦ δ<t<1−δ
for all .0 < δ < 1/2, where .┌ 0 (t) = ┌(t) − C(t)C−1 (1)┌(t).
Proof Elementary calculations give that
|| ||
|| 1 ||
|| ||
. sup
|| N CLN t⎦,1 − C1 (t)|| = o(1), (4.3.4)
0≤t≤1
and
|| ||
|| 1 ||
|| ||
. sup
|| N CLN t⎦,2 − C2 (t)|| = o(1). (4.3.5)
0≤t≤1
LN
E t⎦
1 D [0,1]
. Ei −→ σ W (t), (4.3.6)
N 1/2
i=1
Now
1 D [0,1]p+1
. R
1/2 LN t⎦
−→ ┌ 0 (t)
N
follows from (4.3.4) and (4.3.5). Using again (4.3.4) and (4.3.5) we obtain Theo-
rem 4.3.1. ⨆
⨅
The statistics .TN and .T̂N are maximally selected .log likelihood ratios, and
as a result they are “standardized” in the sense that the expected values of
T −1 C C−1 R and .−2 log Λ are constant. Hence it is expected that the
.R C
k k,1 N k,2 k k
maximally selected .log likelihood ratios must have a limit distribution related to
4.3 Polynomial Regression 183
and
{ }
. lim P T̂N ≤ x + 2 log log N + (p + 1) log log log N − 2a(p + 1)
N →∞
⎛ ⎞
= exp −2e−x/2 (4.3.9)
. lim P {TN ≤ x + 2 log log h(N) + (p + 1) log log log h(N) − 2a(p + 1)}
N →∞
⎛ ⎞
= exp −2e−x/2
and
{ }
. lim P T̂N ≤ x + 2 log log h(N) + (p + 1) log log log h(N) − 2a(p + 1)
N →∞
⎛ ⎞
= exp −2e−x/2
where .h(N) = N(log N)γ , .−∞ < γ < ∞. The parameter .γ can be tuned to
improve the finite sample approximation of the limit result in Theorem 4.3.2, as
discussed in Aue et al. (2008).
184 4 Regression Models
−1 −1 T −1 T −1
.RT
k Ck,1 CN,1 Ck,2 Rk = Rk Ck,1 Rk + Rk Ck,2 Rk .
||β̂ N − β 0 || = OP (N −1/2 ).
. (4.3.10)
We recall
E
k
.S(k) = xi Ei
i=1
Rk = S(k) − Ck (β̂ N − β 0 ).
. (4.3.11)
. max Rk C−1
k,2 Rk = OP (1).
1≤k≤N/2
Let
b1 = b1 (N ) = (log N)α
. and b2 = b2 (N ) = N(log N )−β ,
4.3 Polynomial Regression 185
We note that
⎛/ k ⎛ ⎞ ⎫ ⎛ ⎛ ⎛/ k ⎛ ⎞ ⎞ ⎫
x j D x 2j
. dWN,1 (x), 1 ≤ k ≤ N = W dx , 1 ≤ k ≤ N .
0 N 0 N
where .{W (x), x ≥ 0} is a Wiener process. By the scale transformation of the Wiener
process we have
| ⎛ ⎛/ ⎛ x ⎞2j ⎞| ⎛ ⎞
| k |
max ||W dx || = OP b2
1/2
. (4.3.13)
1≤k≤b2 0 N
and therefore
−1 ( )
. max RT
k Ck,2 Rk = OP (log N)
−β
.
1≤k≤b2
= ST (k)C−1 T T
k,1 S(k) − 2(β̂ N − β 0 ) S(k) + (β̂ N − β 0 ) Ck,1 (β̂ N − β 0 ).
Using again Assumption 4.3.2 and (4.3.10), one can verify that
−1 −1
. max RT T
k Ck,1 Rk = max S (k)Ck,1 S(k) + OP (N
−1/2
(log N )α ),
1≤k≤b1 1≤k≤b1
−1
max RT
k Ck,1 Rk = max ST (k)C−1 −β
k,1 S(k) + OP ((log N) ),
b1 ≤k≤b2 b1 ≤k≤b2
−1
max RT
k Ck,1 Rk = max ST (k)C−1
k,1 S(k) + OP (1).
b2 ≤k≤N/2 b2 ≤k≤N/2
process. Hence we can apply the Darling–Erdős result in Theorem A.2.3 and get
. max ST (k)C−1
k,1 S(k) = OP (log log log N)
1≤k≤b1
max ST (k)C−1
k,1 S(k) = OP (log log log N)
b2 ≤k≤N/2
1
max ST (k)C−1
P
.
b1 ≤k≤b2 k,1 S(k) → c1 ,
log log N
and
. max RT
k Ck,2 T
−1
CN,1 C−1 T −1 −β
k,2 Rk = max S (k)Ck,1 S(k) + OP ((log N ) ).
b1 ≤k≤b2 b1 ≤k≤b2
Thus
⎛ ⎫
−1 −1 −1
. lim P max RT C C C R
k k,2 N,1 k,2 k = max ST
(k)Ck,1 S(k) = 1.
N →∞ 1≤k≤N b1 ≤k≤b2
1 || || ⎛ ⎞
. max ||S(k) − σ ┌ N,1 (k)|| = OP b−(1/2−ζ ) = oP (1/ log N ),
b1 ≤k≤b2 2
k
where
⎛ ⎛/ k / k / k ⎛ x ⎞p ⎞T
x
┌ N,1 (k) =
. dWN,1 (x), dWN,1 (x), . . . , dWN,1 (x) .
0 0 N 0 N
By the modulus of continuity of the Wiener process (see Theorem A.2.2) we have
−1
. max ┌ T
N,1 (k)Ck,1 ┌ N,1 (k)
b1 ≤k≤b2
−1
= sup ┌ T
N,1 (t)(NC1 (t/N))┌ N,1 (t) + oP ((log N)
1/2−α
).
b1 ≤t≤b2
D
. sup ┌ T −1
N,1 (t)(NC1 (t/N)) ┌ N,1 (t) = sup ┌ T (s)C−1
1 (s)┌(s),
b1 ≤t≤b2 b1 /N ≤s≤b2 /N
4.3 Polynomial Regression 187
where
⎛ ⎛/ s / s / s ⎞T
┌(s) =
. dW (x), xdW (x), . . . , x p dW (x) ,
0 0 0
where .{W (x), x ≥ 0} is a Wiener process. Since the same argument can be used on
maxN/2≤k<N , we get that
.
−1 −1 −1
. max RT 2 T
k Ck,2 CN,1 Ck,2 Rk = max σ ┌ N,2 (k)Ck,1 ┌ N,2 (k)
N/2≤k<N b1 ≤k≤b2
Hence the limit distribution is the same as the limit distribution of the maximum of
two independent copies of copies of .supb1 /N ≤s≤b2 /N ┌ T (s)C−11 (s)┌(s). Aue et al.
(2009b) obtained the limit distribution using some results on Legendre polynomials
and the general theory of the maximum of the square norm of stationary Gaussian
processes (Leadbetter et al., 1983; Piterbarg, 1996; Albin, 2001). ⨆
⨅
In the definition of .xi in (4.3.2) we can replace the polynomial with a smooth
function,
⎛ ⎛ ⎞
i
.xi = h , 1 ≤ i ≤ N, (4.3.14)
N
Moreover, there are p linearly independent vectors .a1,1 , a1,2 , . . . , a1,p and non–
negative integers .0 ≤ γ1,1 < γ1,2 < . . . < γ1,p such that
|| ||
|| E
p ||
1 || γ1,i ||
. lim sup ||h(t) − a1,i (1 − t) || < ∞.
t→0 t γ1,p +1 || ||
i=1
E
p
∗
.h0 (t) = h(t) − a0,i t γ0,i
i=1
and
E
p
∗
.h1 (t) = h(t) − a1,i (1 − t)γ1,i .
i=1
and
|| ∗ ||
. ||h (t) − h∗ (s)|| ≤ C1 (1 − s)γ1,p |t − s|, for all 1 − τ1 ≤ s ≤ t ≤ 1.
1 1
Assumptions 4.3.3–4.3.6 are rather mild as it is only required that the regressors
are continuous and linearly independent in .(0, 1), and that they are smoothly
differentiable in a neighborhood of both 0 and 1. In particular, Assumptions 4.3.5
and 4.3.6 are used to determine the leading non–zero terms in the Taylor expansion
of .h(t) at 0 and 1. The polynomial regression satisfies Assumptions 4.3.3–4.3.6. We
detail two important examples where these conditions hold. Let
Example 4.3.1 If the data exhibit cyclical or seasonal behavior, it may be appropri-
ate to specify the regressors as trigonometric functions. Let .p = 2q with q being a
positive integer. One possibility is to pick Fourier frequencies .0 < ω1 < ω2 < . . . <
ωq < 1/2 and define .hl (t) = cos(2π ωl t), 1 ≤ l ≤ q and .hl (t) = sin(2π ωl−k t)
for .q < l ≤ p. This gives
q ⎡
E ⎛ ⎛ ⎞ ⎛ ⎛ ⎞⎤
2π ωl i 2π ωl i
xT
.i β0 = β0,l cos + β0,q+l sin .
N N
l=1
4.4 Non–linear Regression and Generalized Method of Moments 189
We use again the quadratic form .TN of (4.3.3) and .T̂N , the maximally selected
log likelihood ratio of (4.1.4). As in the polynomial case, these two statistics are
.
and
⎧ ⎫
EE12
. lim P TN ≤ x + 2 log log N + p log log log N − 2a(p) = exp(−2ex/2 )
N →∞ σ2
The residual based methods for testing the stability of the parameters in a linear
regression model introduced in Sect. 4.1 can be extended to general non–linear
regression models of the form
yi = h(xi , θ i ) + Ei , i ∈ {1, . . . , N },
.
where the .θ i ’s are d–dimensional parameter vectors. Under the null hypothesis
(1)
H0
. : θ1 = · · · = θN,
190 4 Regression Models
= θ k ∗ /= θ k ∗ +1 = . . . = θ N .
The unknown common parameter vector under .H0(1) is denoted by .θ 0 . Using the
least squares principle, the estimator for .θ 0 is .θ̂ N , the location of the minimizer of
E
N
LN (θ ) =
. (yt − h(xt , θ ))2 ,
t=1
where the minimum is taken over a compact parameter space .O. We make the
following assumptions that are standard in non–linear least squares:
Assumption 4.4.1 The parameter space .O is a compact subset of .Rd , and .θ 0 is an
interior point of .O.
Assumption 4.4.2 There is a function .M : Rd |→ R so that
|| 2 ||
|| ∂ ||
. sup Eh (x0 , θ ) < ∞, sup || ||
|| ∂θ 2 h(xt , θ )|| ≤ M(xt ), EM(x0 ) < ∞,
2
θ∈O θ∈O
|| ||2
|| ∂ ||
||
E || h(x0 , θ 0 )||
|| < ∞, and E[h(x0 , θ 0 ) − h(x0 , θ )] > 0, if θ /= θ 0 .
2
∂θ
Ẽt = yt − h(xt , θ̂ N ).
. (4.4.1)
(1)
Theorem 4.4.1 If .H0 holds, Assumptions 4.4.1 and 4.4.2 are satisfied , and .{zi =
(xT T
i , Ei ) , i ∈ Z} is .L –decomposable for some .ν > 4, then
ν
(ii) If .p ≥ 1 and
/ 1 (t (1 − t))p/2
. < ∞,
0 w(t)
then
/ /
1 |Z̃N (N t/(N + 1))|p D
1 |B(t)|p
. dt → dt,
0 w(t) 0 w(t)
E
N
. g(xt , θ̂N ) = 0,
t=1
where .xt contains both model and instrumental variables. Let .mt (θ ) = g(xt , θ ),
and we assume that the parameter .θ ∈ O, where .O is a compact subset of .R. One
could more generally consider .θ ∈ Rd , .d ≥ 1. Model stability in this case can be
described as
(2)
HA
. 0 = Em1 (θ ) = · · · = Emk ∗ (θ ) /= Emk ∗ +1 (θ )
= · · · = EmN (θ ) for some θ ∈ O.
(2)
Under .H0 , .θ0 denotes the true value of the parameter. We require that the
instrumental variables are stationary and weakly dependent (for example .Lν –
decomposable), which yields that .mt (θ ) is a stationary sequence. The following
assumptions are standard in GMM estimation, see for example the conditions of
Theorem 3.1 of Hansen (1982).
Assumption 4.4.3 .Em0 (θ ) = 0 if and only if .θ = θ0 .
and
| |
| E |
|
−κ |
T
|
. max (T − x) | ms (θ0 ) − σ WT ,2 (T − x)|| = OP (1) (4.4.4)
T /2≤x≤T −1 |s=Lx⎦+1 |
⎛ ⎞
LN
E LNt⎦ E
t⎦ N
1 ⎝
. Z̃N (t) = mt (θ̂N ) − mt (θ̂N )⎠ (4.4.5)
N 1/2 N
i=1 i=1
LN
E t⎦
1
= mt (θ̂N ).
N 1/2
i=1
Theorem 4.4.2 Suppose that .H0(2) holds, Assumptions 4.4.3–4.4.5 are satisfied,
and that .{xi , i ∈ Z} is .Lν –decomposable for some .ν > 4, then the conclusions
of Theorem 4.4.1 remain true with .Z̃N defined in (4.4.5).
We note that these results may also be extended to Darling–Erdős and Rényi–
type functionals of .Z̃N . We refer the proofs of Theorems 4.4.1 and 4.4.2 to Górecki
et al. (2018) and Horváth et al. (2020).
.F1 = · · · = FN . We show that Theorem 2.4.3 remains true when the innovations .Ei
4.5 Changes in the Distributions of the Innovations 193
0 ≤ t ≤ 1, −∞ < x < ∞.
Let F denote the common distribution function under .H0 of (2.4.1). We require in
this case a somewhat stronger condition than Assumption 2.4.1:
E┌ N (t, x) = 0 and
.
∞
E
= (min(t, t ' ) − tt ' ) E[(1{E0 ≤ x} − F (x))(1{El ≤ x ' } − F (x ' )),
l=−∞
Proof To simplify the argument, we assume that the dimension of the covariate
vector is .d = 2. Let
⎛ LN
E t⎦ ⎛ ⎞
R̄N (t, x, u) = N −1/2
. 1{Ei ≤ x + N −1/2 xT
i u} − F (x + N −1/2 T
xi u)
i=1
LN
E t⎦ ⎫
− (1{Ei ≤ x} − F (x)) , 0 ≤ t ≤ 1, x ∈ R, u ∈ R2 .
i=1
where
The proof of (4.5.1) is based on the arguments in Davydov and Zitikis (2008). We
cover .I(C) with the smallest number of squares .U(k, l) of length .1/M, i.e. the
edges of the squares are .(k/M, l/M), ((k + 1)/M, l/M), (k/M, (l + 1)/M) and
.u(k, l) = ((k + 1)/M, (l + 1)/M). It follows from elementary calculation that
| |
| |
. |1{Ei ≤ x + N −1/2 xT
i u} − 1{Ei ≤ x + N
−1/2 T
xi u(k, l)}|
for all .u ∈ U(k, l), where .x̄i is the maximum norm of .xi . Thus we get
| |
| −1/2 T |
. |1{Ei ≤ x + N −1/2 xT
i u} − F (x + N xi u)| (4.5.2)
|
|
≤ |1{Ei ≤ x + N −1/2 (xT i u(k, l) + 4x̄i /M)}
|
|
−F (x + N −1/2 (xTi u(k, l) + 4 x̄i /M)) |
|
|
+ |1{Ei ≤ x + N −1/2 (xT i u(k, l) − 4x̄i /M)}
|
|
−F (x + N −1/2 (xTi u(k, l) − 4x̄i /M))|
| |
| |
+ |F (x + N −1/2 (xTi u(k, l) + 4 x̄ i /M)) − F (x + N −1/2 T
xi u(k, l)) |
| |
| |
+ |F (x + N −1/2 xT
i u(k, l)) − F (x + N −1/2 T
(x i u(k, l) − 4x̄i /M)) |.
≤ c1 (N −1/2 (xT
i (u − u(k, l)) − 4x̄i /M))
2
1
≤ c2 x̄i4 .
NM 2
By the ergodic theorem we have
1 E 4 1 E
N N
. x̄i = OP (1) and ||xi || = OP (1).
N N
i=1 i=1
Thus we conclude
N |
E |
| |
. sup N −1/2 |F (x + N −1/2 xT
i u) − F (x + N −1/2 T
(xi u(k, l) + 4x̄i /M))|
u∈U(k,l) i=1
1 E T E
N N
c3
≤ f (x) |xi (u − u(k, l)) − 4x̄i /M| + 3/2 2 x̄i4
N N M
i=1 i=1
⎛ ⎞
c4 1 E
N
≤ (x̄i4 + 1) . (4.5.4)
M N
i=1
We can use the proof of (4.5.4) to get similar estimates for the increments of
F (x + xT
.
i u). Now (4.5.1) follows from (4.5.2)–(4.5.4). Since (4.5.1) holds for all
C, (4.1.29) implies
∗
|RN (t, x) − RN
. (t, x)| = oP (1),
where
⎛ ⎞
LN
E LNt⎦ E
t⎦ N
∗
RN
. (t, x) = N −1/2 ⎝ 1{Ei ≤ x} − 1{Ei ≤ x}⎠ ,
N
i=1 i=1
0 ≤ t ≤ 1, −∞ < x < ∞.
1 E
N
F̂N (x) =
. 1{Êi ≤ x}
N
i=1
196 4 Regression Models
be the empirical distribution function of the residuals .Ê1 , . . . , ÊN and .F̂N−1 (u) denote
the quantile function of the residuals. Now we compute .Q̃N (t, u) of Sect. 2.4 from
the residuals:
⎛
L(NE+1)t⎦ { }
−1/2 ⎝
.Q̃N (t, u) = N 1 Eˆi ≤ F̂N−1 (u)
i=1
⎞
L(N + 1)t⎦ E { }
N
− 1 Eˆi ≤ F̂N−1 (u) ⎠ , (4.5.5)
N
i=1
0 < t, u < 1. The proof of Theorem 4.5.1 shows that Theorem 2.4.3 remains true if
.
0.25
0.25
1800 - 1878 1800 - 1878
1879 - 1977 1879 - 1977
0.15
0.15
1978 - 2018 1978 - 2018
CO2
CO2
0.05
0.05
-0.05
-0.05
-0.06 -0.02 0.02 0.06 0.000 0.001 0.002 0.003 0.004 0.005
GDP GDP2
Fig. 4.1 Scatter plots of the estimated model based on the three sub–samples obtained by the
segmentation. The left sub–plot shows the scatter of the (GDP) against (CO2 ), and the right sub–
plot shows the scatter of the (GDP)2 against (CO2 )
Here we applied tests for the stability of the model parameters in (4.6.1) based
on the maximum of the standardized–quadratic form
⎡ ⎞⎤1/2
⎛ ⎛ ⎞1/2 ⎛E
k
⎞T ⎛ k
E
N ⎣
.VN = a(log N ) max xi Êi D̂−1 xi Êi ⎦
3<k<N −3 k(N − k)
i=1 i=1
− b3 (log N).
The matrix D̂ was estimated using a Bartlett kernel and bandwidth selected using
the method of Andrews (1991). By Theorem 4.1.3, we expect under stability of the
parameter VN follows approximately a Gumbel law. Using binary segmentation with
the threshold determined as the 95% quantile of the approximate null distribution of
VN , two change points were detected: years 1878, and 1977.
Figure 4.1 displays the scatter plots of the model in the three sub–samples
determined by these change point estimates. We might think of these change points
splitting the historic U.S. economic development into three phases: (1) early growth
phase, from 1800 to 1878, with the coefficients β̂1 = 0.92 and β̂2 = 3.81; (2) middle
growth phase, from 1879 to 1977, with the coefficients β̂1 = 0.96 and β̂2 = −1.40;
(3) late growth phase, from 1978 to 2018, with the coefficients β̂1 = 1.24 and
β̂2 = −6.16. We noticed that while the coefficient corresponding to GDP are always
near one, the parameter β2 appears to fluctuate from positive to negative, which
appears to support inverted U–shaped curve described in the EKC theory.
Example 4.6.2 (COVID-19 Confirmed Cases and Deaths in the U.K.) In this
example we consider a change point analysis of the linear relationship between
COVID-19 deaths and confirmed cases. Change point methods have been applied
frequently to COVID-19 data in order to evaluate the effects of public health
measures and changing environmental conditions on how the pandemic progressed;
see e.g. Jiang et al. (2023). The time series of the number of confirmed cases
198 4 Regression Models
and deaths due to COVID-19 was collected from the GOV.UK website https://
coronavirus.data.gov.uk/details/cases, covering the period from March 11, 2020 to
November 4, 2021 (N = 603). We considered two series: yi , the log differenced
deaths due to COVID-19 in the UK, and xi , the log differenced confirmed cases of
COVID-19 in the UK. We expected a positive correlation between confirmed cases
and future deaths. By calculating the cross–correlation function between the log–
differenced confirmed cases and deaths series, we observed the strongest correlation
between changes in deaths and confirmed cases at a lag of 14 days. Thus, we regress
the log–differenced deaths on the log–differenced confirmed cases, using the linear
model:
11/Mar/20 - 28/Apr/20
29/Apr/20 - 20/Jan/21
21/Jan/21 - 7/Jul/21
8/Jul/21 - 4/Nov/21
50
Diff deaths
0
-50
Fig. 4.2 Scatter plots of the estimated model based on four sub–samples
4.6 Data Examples 199
70000
1400
14-day lagged confirmed cases
deaths
1200
break
realised vaccination rate 30% 50% 70%
50000
1000
confirmed cases
800
deaths
30000
600
400
10000
200
0
0
26/Mar/20 08/Oct/20 22/Apr/21 04/Nov/21
Fig. 4.3 Detected breaks with the shaded areas indicating the first, second and third national
lockdown phases
lagged log–differenced confirmed cases in the four sub–samples. The results show
that there was, conspicuously, a negative relationship between confirmed cases and
deaths in the first sub–sample. This might be attributed to the lack of COVID-19
tests at the beginning of the pandemic. A relatively strong positive relationship
between confirmed cases and deaths emerge in the second and third sub–samples,
while the positive linear relation weakens in the third and fourth sub–sample. This
might be explained by the different seasons spanning each period, the national
lockdown policy in the UK, and the administration of vaccines. The UK instituted
three national lockdowns: the first phase was 26 March to 16 June 2020, second
phase was 31 Oct to 2 Dec 2020, and third phase was 1 Jan to 12 Apr 2021.
Figure 4.3 displays the raw data of confirmed cases and deaths with the detected
breaks as they coincide with the dates of lockdowns and national vaccination
rates. We found then that during the first and third national lockdowns, a positive
relationship between confirmed deaths and lagged cases was maintained, while after
high vaccination rates were achieved following June 2021, the relationship became
weaker.
Example 4.6.3 (Changing Trends in Global Near–Surface Temperature) As
an application of Theorem 4.3.2, Aue et al. (2009b) analyzed the average global
near–surface temperatures collected in the benchmark data set HadCRUT31 (see
Brohan et al., 2006) which is frequently used in climatology to study the impact of
global warming. The data extends earlier versions compiled by Jones (1994) and
Jones and Moberg (2003). HadCRUT3 has been updated making use of additional
observations and advances in the marine component of the data set, blending the
measurements of over 4000 land and marine stations located around the globe. The
time series is commonly used to illustrate the increase in global mean temperatures
since the 1850s. For each year from 1850 until 2008, anomalies in the average
200 4 Regression Models
temperatures are reported in degrees Celsius (◦ C) that are centered using the
baseline temperature calculated as the average from 1961 and 1990. A time series
plot of HadCRUT3 is shown in Fig. 4.4. Aue et al. (2009b) conducted preliminary
model fitting with polynomials of up to seventh order. Inspecting the corresponding
model residuals, they found that neither of these polynomial fits provides an
acceptable description of the whole data set of 159 observations. Hence instead
piecewise quadratic polynomials were fit to the series using binary segmentation
obtained from a repeated application of the test statistic T̂N in Theorem 4.3.3,
with the threshold of binary segmentation determined as the 95% quantile of the
corresponding null limiting distribution. This lead to three change point estimators
and four sub–subsamples with an approximately quadratic trend: years 1874, 1922
and 1944. The resulting estimated trend is also shown in Fig. 4.4, with the estimated
break points shown as vertical lines. This trend is mostly in line with the conclusions
drawn in Brohan et al. (2006), who argue that there have been two major periods of
(unusual and man–made) increases in the average global near-surface temperatures.
The first approximately from 1930 to 1940, and the second in the 1970s.
4.7 Exercises
xT
i β̂ N,1 , 1 ≤ i ≤ N, where β̂ N,1 is the least square estimator. We use the statistic
| k |
|E k E
N |
1 | |
TN =
. max | Êi,0,N − Êi,0,N | , (4.7.1)
rN
1/2 1≤k<N | N |
i=1 i=1
where
⎧ ⎛ ⎞2
⎪
⎨E k
1 E k
.rN = min ⎝Êi,0,k − Êj,0,k ⎠
1≤k≤N ⎪
⎩ i=1 k
j =1
⎛ ⎞2 ⎫
⎪
⎬
E
N
1 E
N
+ ⎝Êi,k,N − Êj,k,N ⎠ ,
N −k ⎪
⎭
i=k+1 j =k+1
N 1/2 ||β 1 − β k1 +1 || → ∞.
.
where
⎧ ⎛ ⎞2
⎪ ⎛ ⎞2
⎨E k
1E
k Ej
1 Ej
.rN = min Êi,0,k − Êl,0,k + ⎝Êi,k,j − Êl,k,j ⎠
1≤k≤N ⎪
⎩ i=1 k j −k
l=1 i=k+1 l=k+1
⎛ ⎞2 ⎫
E
N E
N ⎪
⎬
⎝Êi,j,N 1
+ − Êl,j,N ⎠ ,
j −k ⎪
⎭
i=j +1 l=j +1
Testing for change points in linear models appears to have been initiated in Quandt
(1958) and Quandt (1960) who suggested maximally selected statistics and provided
practical advice how to obtain critical values. Gombay and Horváth (1994), Horváth
(1995) and Horváth and Shao (1995) obtained the limit distributions of some of
the test statistics proposed by Quandt (1958), Quandt (1960) including maximally
selected .F –statistics and the likelihood ratio. McCabe and Harrison (1980) also
contributed to this literature and advise the use of ordinary least squares residuals
rather than recursive cumulative sum control chart (CUSUM)-type tests. Later
McCabe (1988), using a multiple decision theory approach, shows that the CUSUM
test is optimal in a decision theoretic sense for structural stability in scale and
variance models, and also that the CUSUM-of-squares test is similarly optimal for
structural stability in variance of linear regression models. Turning to estimation of
the time change of change, Hušková (1996) gave large sample approximation for
the estimator of the time of change assuming that we have exactly one change in
the regression coefficients during the observation period. The serial independence
of the error terms is assumed in these early articles. Andrews (1993) provides a
general methodology to test for the stability of random systems from an economic
viewpoint. Ghysels et al. (1997), Bai (1999), Bai and Perron (1998) and Hall et al.
(2012) followed the suggestions of Andrews (1993), and they also used maximally
selected statistics, but the maxima were not computed for all observations points,
and a fraction of early and late observations are trimmed. Horváth et al. (2017b)
derived the limit distribution of a maximally selected test which is derived under the
assumption that there are exactly R changes in the parameters. Their statistic is a
maximally selected weighted likelihood ratio.
Bai (1999) used the likelihood ratio test in linear models. He derived the
limit distribution of .maxLN δ⎦≤k≤N −LN δ⎦ (−2 log Λk ) which follows from (4.1.24),
if .0 < δ < 1/2. The second part of Theorem 4.1.2 shows that for .δ = 0 we
obtain a Darling–Erdős type result. Bai (1995) derives the limit distribution of the
estimator for the time of change along the lines of Theorem 4.2.3 with .κ = 1/2, but
under stricter conditions on .ΔN , the size of the change. He also points out that the
estimator is related to the sup–Wald–type statistic. Bai (1999) derived the likelihood
ratio assuming that multiple changes can occur in the regression parameters. Bai
and Perron (1998) uses least squares to detect multiple changes and obtains the
limit distributions of the estimators for the time of change, and Bai and Perron
(2003) investigates computational issues of change point tests. Bai (1995) develops
an asymptotic theory for least absolute deviation estimation of a shift in linear
regressions. Rates of convergence and asymptotic distributions for the estimated
regression parameters and the estimated shift point are also derived. One of the
examples in Horváth et al. (2022) is the heavily weighted CUSUM process of the
residuals. They also derived tests based on the comparison of the estimators when
the change occurs late or early in Horváth et al. (2022).
4.8 Bibliographic Notes and Remarks 205
Nyblom (1989) derives the locally best invariant test as a Lagrange multiplier
test and shows that it is a quadratic form of the sums of the weighted residuals.
Hansen (1997) provides a method to compute critical values for maximally selected
but heavily truncated standardised statistics, like in Theorems 4.1.25(ii)–4.1.7(ii),
but the maximum is taken on .LNα⎦ ≤ k ≤ LNβ⎦ with some .0 < α < β < 1.
Perron et al. (2020) extended the likelihood method to detect changes in regression
parameter and variance at the same time in Csörgő and Horváth (1997) to dependent
observations. Hall et al. (2015) uses least squares with penalty term to fit a change
point model to the data. Kurozumi and Tuvaandorj (2011) considers the issue of
selecting the number of regressors and the number of structural breaks in multivari-
ate regression models in the possible presence of multiple structural changes. They
develop a modified Akaike information criterion, a modified Mallows’ criterion, and
a modified Bayesian information criterion. Lin and Teräsvirta (1994) assumes that
instead of a single jump the regression parameter changes according to a continuous
function after an unknown time.
Theorems 4.1.3 and 4.1.4 are taken from Horváth et al. (2023a) who also
considered the case when the errors are heteroscedastic.
Kulperger (1985) investigates the asymptotic properties of the CUSUM process
of the residuals in polynomial regression. He points out these are different from
the case when (4.3.1) holds. Theorem 4.3.1 is due to Hansen (2000). Albin and
Jarušková (2003) provides test to find changes in linear trends, and they proved
Theorem 4.3.2 when .p = 1. Our proofs of the results in Sect. 4.3 are based on Aue
et al. (2008), Aue et al. (2009b), and Aue et al. (2012). Aue et al. (2008), Aue et al.
(2012) uses the maximally selected likelihood ratio method to test for stability of the
parameter against exactly one change. However, they also showed that the derived
tests are consistent against several changes under the alternative, and discuss the
applicability of the limit results in case of small and moderate sample sizes.
Neumeyer and Keilegom (2009) use nonparametric kernel estimators to check
the stability of the innovations.
We assumed that the variance of the errors remain the same even if the regression
parameters remain the same. The results of the present section can be extended to
cover the changing variance case as well (see Bai, 1997, 1999 and Bai and Perron
(1998, 2003)). If we know that some of the regression parameters do not change,
they may be treated as nuisance parameters.
Chapter 5
Parameter Changes in Time Series
Models
We develop in this chapter the asymptotic theory surrounding change point methods
for many popular time series models. Although up to this point we have generally
taken into consideration potential serial dependence in the observations under study,
in this chapter we are concerned with detecting change points in the parameters for
models specifically designed to capture the serial dependence structure of a time
series. To begin, in Sect. 5.1 we consider change point methods for autoregressive,
moving average (ARMA) models. Dynamic regression models for a scalar time
series modelled jointly with covariate series are considered in Sect. 5.2. Random
coefficient autoregressive models are studied in Sect. 5.3. In Sect. 5.4 we consider
generalized autoregressive conditionally heteroscedastic (GARCH) models along
with other models for conditionally heteroscedastic time series. Extensions of these
approaches to linear and non–linear multivariate time series models are considered
in Sects. 5.5 and 5.6.
.{Ei , i ∈ Z} so that
An autoregressive process of order d (AR(d)) follows the above model where .ψ1 =
· · · = ψr = 0, and similarly a moving average model of order r (MA(r)) is as
above with .φ1 = · · · = φd = 0. Change point analysis of pure AR processes can
be framed simply as a change in a linear model as studied in Chap. 4. An AMOC
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 207
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_5
208 5 Parameter Changes in Time Series Models
where
H0 : β 0 = β A ,
. (5.1.2)
HA : β 0 /= β A .
.
Let .β̂ N be the least squares estimator for the autoregressive parameter that
minimizes as a function of .β the sum of squares
Σ
N
. (yi − xT 2
i β) .
i=d+1
We note that this estimator differs in only finitely many terms with the estimator
obtained by minimizing
Σ
N
. (yi − xT 2
i β) , (5.1.3)
i=1
which depends on unobserved, past values of the series .y0 , . . . , y−d . In the asymp-
totic arguments below to lighten the notation we assume the estimator is computed
to minimize (5.1.3), which as we show is of no asymptotic consequence. Similarly
we may define .β̂ k,1 and .β̂ k,2 , the least-squares estimators of the parameters based
on the first k and last .N − k observations, as in (4.1.6).
We define the model residuals as
Êi = yi − xT
. i β̂ N , (5.1.4)
and write .β 0 = (β1,0 , . . . , βd,0 )T . In this way we may also define the likelihood
ratio statistic as in (4.1.4), and the process of the difference between the estimated
parameters in (4.1.36).
5.1 ARMA Models 209
Under the null hypothesis, there exists a stationary series satisfying (5.1.1) if the
following conditions are satisfied:
Assumption 5.1.1 The roots of the polynomials .1 − β1,0 t − · · · − βd,0 t d and .1 −
β1,A t − · · · − βd,A t d are outside the unit circle in .C.
Assumption 5.1.2 .{Ei , i ∈ Z} are independent and identically distributed random
variables, .EE0 = 0, .E|Ei |ν < ∞ with some .ν > 4.
We discuss at the end of this section how Assumption 5.1.2 can be replaced
with weaker conditions in some cases, including allowing for serially dependent
innovations.
Under these conditions the test statistics introduced in Theorems 4.1.1–4.1.7 have
the same asymptotic distributions as developed in Chap. 4.
Theorem 5.1.1 If .H0 of (5.1.2), Assumptions 5.1.1 and 5.1.2 hold, then Theo-
rems 4.1.1–4.1.7 remain true.
Proof Under .H0 and Assumption 5.1.1 there is a unique, stationary and causal
sequence .{yi , i ∈ Z} satisfying the equation
yi = xT
. i β 0 + Ei , i ∈ Z, (5.1.5)
D = EE02 A,
.
1 Σ 2
N
2
sN
. = Êi .
N
i=1
the complex unit circle, then the new regime defining .yi beyond .k ∗ also admits a
stationary solution.
In analyzing this case further we assume that Assumption 2.1.1 holds, so that
∗
.k = LNθ ⎦ with some .0 < θ < 1. Our strategy is to first show that there exists a
With .β̄, under .HA we write the residuals in the following form:
⎧
Ei + xT T
i (β 0 − β̄) + xi (β̄ − β̂ N ), 1 ≤ i ≤ k∗,
Êi =
.
Ei + xT T
i (β A − β̄) + xi (β̄ − β̂ N ), k ∗ + 1 ≤ k ≤ N.
and
1 Σ
N
. lim xi xT
i = A2 in probability. (5.1.8)
N −k ∗ →∞ N − k ∗ ∗ i=k +1
As we shall see these results may be used to establish the asymptotic behaviour of
the linear model change point test statistics under .HA .
We discuss the proofs of (5.1.6)–(5.1.8) in two cases: when the size of the change
is constant, i.e. the difference between .β 0 and .β A does not depend on N , and when
the change in the parameters is small, so that .||β 0 − β A || → 0 as .N → ∞.
First we consider the case when the size of the change is constant. We define the
stationary sequence .{ŷi , i ∈ Z} as the solution of the AR(d) model
ŷi = x̂T
. i β A + Ei , i ∈ Z,
5.1 ARMA Models 211
j
E|yk ∗ +j − ŷk ∗ +j | ≤ c1 ρ1 ,
. (5.1.9)
with some .0 < ρ1 < 1. We already established (5.1.7) with .A1 = Ex0 xT
0 . Due to
(5.1.9), standard arguments give (5.1.6) with
( )
. β̄ = (θ A1 + (1 − θ )A2 )−1 θ A1 β 0 + (1 − θ )A2 β A
If the processes .{yi , i ∈ Z} and .{ŷi , i ∈ Z} are non degenerate, then .A1 and .A2 are
non singular, so if .||β 0 − β A || > 0, then the maximum of the weighted functions
of .ZN (k), as defined in (4.1.27) and Theorem 4.1.3, converge to .∞ in probability at
rate .N 1/2 .
We now consider when .ΔN = β 0 − β A depends on N, and .||ΔN || → 0 as
.N → ∞. For the sake for notational simplicity, we only consider the AR(2) case,
i.e. .d = 2, but the result may be extended. After .k ∗ , the observations satisfy the
recursion
Yi = GYi−1 + ei ,
.
j −1
Σ
Y
. k ∗ +j =G Yj
k∗ + Gl ek ∗ +j −l .
l=0
||Gm || ≤ c2 ρ2m
. with some 0 < ρ2 < 1, for all m ∈ N.
212 5 Parameter Changes in Time Series Models
where .ȳk is the stationary solution in (5.1.5). Hence (5.1.6) holds with .β̄ = θ β 0 +
(1 − θ )β A and (5.1.7) and (5.1.8) with .A1 = A2 = E x̄0 x̄T 0 . Putting together our
estimates we get
∗ ∗
⎛ ∗
⎞
Σ
k Σ
k Σ
k ⎛ ⎞
. xi Êi = xi Ei + ⎝ xi xTi ⎠ β 0 − β̂ N
i=1 i=1 i=1
⎛ ⎞
= OP N 1/2 + θ (1 − θ )E x̄0 x̄T
0 NΔN (1 + oP (1)).
As a result the maximum of the weighted process .ZN (k), as defined in (4.1.27) and
Theorem 4.1.3, converges in probability to .∞ at rate .N 1/2 ||ΔN ||. In the case of
the self–normalized Darling–Erdős statistic as in Theorem 4.1.3(ii), divergence to
infinity occurs when .N 1/2 ||ΔN ||/(log log N)1/2 → ∞. In both cases we see that
the consistency results as established in Chap. 4 remain true if at least one of the
changes in the regression parameters is not too small (as in Assumption 4.2.2).
Similarly it may be shown that the change point estimator .k̂N defined in
(4.2.3) satisfies the same consistency and asymptotic distributional properties for
autoregressive processes as detailed in Theorem 4.2.3.
The AR(1) case has received special attention in the literature. In this model if
.|β0 | < 1, the solution is stationary, if .|β0 | = 1, then the sequence starting from an
initial value is a random walk, and .|β0 | > 1 gives rise to a process starting from
an initial value that is “explosive". We provide more details on the latter cases in
Sect. 5.1.1 below.
Remark 5.1.1 We assumed in Assumption 5.1.2 that the errors are independent and
identically distributed. This can be replaced with the requirement that .{Ei , i ∈ Z}
is a mean zero, uncorrelated, and stationary sequence. If Assumption 5.1.1 holds,
then under the null hypothesis .{yi , i ∈ Z} remains an .Lν –decomposable Bernoulli
shift, and therefore the results of Chap. 4 can be used to check the stability of the
autoregressive parameters. Section 5.1.1 provides an example how to prove the
decomposability of a linear process when the innovations are from a decomposable
Bernoulli sequence. For a general theory of ARMA–GARCH processes we refer to
Ling and Li (1998), Li et al. (2002), Ling and McAleer (2003a), Ling and McAleer
(2003b) and Francq and Zakoian (2004). We discuss change point detection in
GARCH sequences in Sect. 5.4 below.
Remark 5.1.2 In this section we assumed that .{xi , 1 ≤ i ≤ N}, i.e. .{yi , −d ≤
i ≤ N} are available for statistical analysis. However, we only observe .yi , .i ∈
{1, . . . , N }, so for the few first .xi ’s some initial values replacing .y0 , y−1 , . . . , y−d+1
are needed. A standard approach is to replace these values with the mean of the
5.1 ARMA Models 213
series. The effect of the initial values decays geometrically fast asymptotically
under the assumptions of Theorem 5.1.1, and will not effect the conclusions of
Theorem 5.1.1.
We now turn to the problem of performing change point analysis for the param-
eters in a general ARMA.(d, r) model. A single change point in the ARMA.(d, r)
parameters may be represented in terms of the model
⎧
⎪ φ1,0 yi−1 + . . . +
⎪
⎨ ∗
φd,0 yi−d + Ei + ψ1,0 Ei−1 + . . . + ψr,0 Ei−r ,
if 1 ≤ i ≤ k ,
yi. = (5.1.11)
⎪
⎪ φ yi−1 + . . . + φd,A yi−d + Ei + ψ1,A Ei−1 + . . . + ψr,A Ei−r ,
⎩ 1,A
if k ∗ + 1 ≤ i ≤ N.
model admits a stationary and causal solution before and after the change. Let
and
β = (φ1 , . . . , φd , ψ1 , . . . , ψr )T ∈ Rd+r .
.
We let
denote the true model parameters. For a given parameter vector .β, the model errors
may be estimated by the recursive equations:
Ê1 (β) = y1
.
If i is large, then
Σ
d Σ
r
Ēi (β) = yi −
. βl yi−l − βd+l Ēi−l , i ∈ Z. (5.1.12)
k=1 l=1
Using Assumption 5.1.3, (5.1.12) has a stationary solution for all .β ∈ O, where .O
is a compact subset of .Rd+r such that for every element of .O, Assumption 5.1.3
holds. We assume that .β 0 is in the interior of .O. We note that
with some .0 < ρ < 1 (see Brockwell and Davis, 2006, p. 265). The function
E Ē02 (β) reaches its unique smallest value at .β 0 , and since .E Ē02 (β) is an analytical
.
Σ
k
∂ Ê 2 (β)
Ŝk (β) =
.
i
.
∂β
i=1
If .β̂ N satisfies
ŜN (β̂ N ) = 0,
.
||β̂ N − β 0 || = OP (N −1/2 )
.
5.1 ARMA Models 215
(see Brockwell and Davis, 2006, Section 8.11). Hence under the null hypothesis
Ŝk (β̂ N ) ≈ 0,
. for all k ∈ {1, . . . , N}.
Let
∂ Ēi2 (β) ∂ Ēi (β)
ei (β) =
. = 2Ēi (β) .
∂β ∂β
We note that .{ei (β), i ∈ Z} is a stationary sequence with .Ee0 (β 0 ) = 0. Due to the
construction, .{ei (β), i ∈ Z} are uncorrelated random variables. Let
C = Ee0 (β 0 )eT
. 0 (β 0 ). (5.1.14)
Theorem 5.1.2 We assume that .H0 of (5.1.2), Assumptions 5.1.2 and 5.1.3 are
satisfied.
(i) If .0 ≤ κ < 1/2, then
⎛ ⎞2+2κ
N2
. max ŜT −1
k (β̂ N )C Ŝk (β̂ N )
1≤k<N k(N − k)
D Σ
d+r
1
→ max B 2 (t),
0<t<1 [t (1 − t)]2κ i
i=1
for all .x ∈ R, where .a(x) and .bd+r (x) are defined in (1.2.18).
Proof Due to (5.1.13)
Σ
k
∂ Ē 2 (β) Σ
N
∂ Ēi2 (β)
S̄k,1 (β) =
.
i
and S̄k,2 (β) = ,
∂β ∂β
i=1 i=k+1
216 5 Parameter Changes in Time Series Models
||β̄ N − β 0 || = OP (N −1/2 )
. (5.1.15)
(see Brockwell and Davis, 2006, Sections 8.11). Standard arguments give that
⎛ ⎛ ⎞⎞
1 −1 1
.β̄ N − β 0 = − D SN (β 0 ) 1 + OP ,
N N
where
⎛ ⎞
∂ 2 Ē02 (β 0 )
D=E
. . (5.1.16)
∂β∂β
Next we write
Thus we get
⎛ ⎞1/2 || ⎛ ⎞||
N || ||
max ||S̄k,1 (β̄ N ) − S̄k,1 (β 0 ) − k S̄N,1 (β 0 ) || = OP (1).
.
1≤k≤N k || N ||
We observe that
Hence
⎛ ⎞1/2 || ⎛ ⎞||
|| ||
max
N ||S̄k,1 (β̄ N ) − −S̄k,2 (β 0 ) + N − k S̄N,1 (β 0 ) || = OP (1).
.
1≤k<N N −k || N ||
Theorem A.1.3 implies that for each N there are independent Wiener processes
{WN,1 (x), 0 ≤ x ≤ N/2} and .{WN,2 (x), 0 ≤ x ≤ N/2} with values in .Rd+r such
.
that
1 || ||
||S̄k,1 (β 0 ) − WN,1 (k)|| = OP (1)
. max ζ
1≤k≤N/2 k
and
1 || ||
. max ||S̄k,2 (β 0 ) − WN,2 (N − k)|| = OP (1)
1≤k≤N/2 (N − k) ζ
with some .ζ < 1/2, .EWN,1 (x) = EWN,2 (x) = 0 and .EWN,1 (x)WT
N,1 (y) =
T
EWN,2 (x)WN,2 (y) = min(x, y)C. If
⎧ x
⎪
⎪ W (x) − (WN,1 (N/2) + WN,2 (N/2)), if 0 ≤ x ≤ N/2,
⎨ N,1 N
.BN (x) =
N −x
⎪ −W N,2 (N − x) + (WN,1 (N/2) + WN,2 (N/2)),
⎪
⎩ N
if N/2 ≤ x ≤ N,
then
|| ||
N 1/2−ζ || ||
|| 1 S̄LN t⎦ (β̄ N ) − 1 BN (N t)||
. max || || (5.1.18)
1/(N +1)≤t≤1−1/(N +1) [t (1 − t)] ζ N 1/2 N 1/2
= OP (1)
We showed that .S̄k (β̄ N ) can be approximated with a CUSUM process, and due
to the weighted Gaussian approximation in (5.1.18), the results in Sect. 1.3 imply
Theorem 5.1.2. ⨆
⨅
218 5 Parameter Changes in Time Series Models
To construct a test of .H0 , we must estimate .C. Since the derivatives of the
Êi (β)’s are asymptotically uncorrelated, the sample variance is a sensible and
.
∂ Êi2 (β)
êi (β) =
.
∂β
and
1 Σ
N
ĈN (β) =
. êi (β)êT
i (β). (5.1.19)
N
i=1
so Theorem 5.1.2 remains true with .ĈN (β̂ N ) in place of .C. We also note that
Brockwell and Davis (2006) (Section 8.11) provides a simple and computationally
efficient recursive scheme for calculating .êi .
We may also consider test statistics based on direct comparisons of the estimators
computed from the first k and the last .N − k observations. Let
Σ
k
∂ Ê 2 (β)
Ŝk,1 (β) =
.
i
∂β
i=1
and
Σ
N
∂ Êi2 (β)
Ŝk,2 (β) =
. .
∂β
i=k+1
The least squares (quasi- maximum likelihood) estimators .β̂ k,1 and .β̂ k,1 are the
solutions of
The asymptotic distribution of the process .β̂ k,1 − β̂ k,2 is determined by the
asymptotic behaviour of the partial sums of the variables .ê(β 0 ). We recall the
matrices .C and .D from (5.1.14) and (5.1.16).
5.1 ARMA Models 219
Theorem 5.1.3 We assume that .H0 of (5.1.2), Assumptions 5.1.2 and 5.1.3 are
satisfied.
(i) If .0 ≤ κ < 1/2, then
⎛ ⎞−2+2κ
N2
. max (β̂ k,1 − β̂ k,2 )T (DCD)−1 (β̂ k,1 − β̂ k,2 )
1≤k,N k(N − k)
D Σ
d+r
1
→ max B 2 (t),
0<t<1 [t (1 − t)]2κ i
i=1
for all .x ∈ R, where .a(x) and .bd+r (x) are defined in (1.2.18).
Proof As in the proof of Theorem 5.1.2 we need to consider the estimators .β̄ k,1 and
.β̄ k,2 satisfying
where
Σ
k
∂ Ē 2 (β) Σ
k
S̄k,1 (β) =
.
i
= ēi (β)
∂β
i=1 i=1
and
Σ
N
∂ Ēi2 (β) Σ
N
S̄k,2 (β) =
. = ēi (β).
∂β
i=k+1 i=k+1
Let .O be a compact subset of .Rd+r such the for all .β = (β1 , . . . , βd+r )T ∈ O, the
roots of the polynomials
φ(t, β) = 1 − β1 t − β2 t 2 − . . . − βd t d
.
and
are outside of the unit circle in the complex plane. If .β ∈ O, then the invertible
representation of the ARMA innovations implies that
∞
Σ
Ēi (β) =
. πl (β)yi−l , (5.1.21)
l=0
and therefore .{Ēi2 (β), i ∈ Z} is .Lν –decomposable. Hence by the maximal inequal-
ity (A.3.2) and Chebyshev’s inequality we obtain that
⎧ || k || ⎞
|| 1 Σ ||
|| ||
. lim lim sup P max sup || Ēi (β) − E Ē0 (β)|| > δ = 0,
2 2
(5.1.22)
M→∞ N →∞ M≤k≤N β∈O || k ||
i=1
and
⎧ || || ⎞
|| 1 Σ
N ||
|| ||
lim
. lim sup P max sup || Ēi (β) − E Ē0 (β)|| > δ
2 2
(5.1.23)
M→∞ N →∞ 0≤k≤N −M β∈O || N − k ||
i=k+1
=0
for all .δ > 0. Since .Ēi2 (β) is a continuous function of .β ∈ O, (5.1.22) and (5.1.23)
imply
⎧ || k || ⎞
|| 1 Σ ||
|| ||
. lim lim sup P max sup || Ēi2 (β) − E Ē02 (β)|| > δ = 0, (5.1.24)
M→∞ N →∞ M≤k≤N β∈O || k ||
i=1
and
⎧ || || ⎞
|| 1 Σ
N ||
|| ||
lim lim sup P
. max sup || Ēi (β) − E Ē0 (β)|| > δ
2 2
(5.1.25)
M→∞ N →∞ 0≤k≤N −M β∈O || N − k ||
i=k+1
=0
for all .δ > 0. The function .E Ē02 (β) has its unique minimum at .β 0 , hence it follows
from (5.1.24) and (5.1.25)
and
⎧ ⎞
. lim lim sup P max ||β̄ k,2 − β 0 || > δ = 0. (5.1.27)
M→∞ N →∞ 0≤k<N −M
5.1 ARMA Models 221
Following the proofs of (5.1.26) and (5.1.27), one can show that there is a neighbour-
hood .O∗ ⊂ O, containing .β 0 such that
|| ||
|| 1 ∂ S̄ (β) ||
|| k,1 −1 ||
. max sup || − D̄ (β)|| = oP (1), (5.1.28)
M≤k≤N β∈O∗ || k ∂β ||
as .M, N → ∞, and
⎧ || || ⎞
|| 1 ||
|| ∂ S̄M,2 (β) −1 ||
. lim lim sup P sup || − D̄ (β)|| > δ = 0, (5.1.29)
M→∞ N →∞ β∈O∗ || N − M ∂β ||
Thus we get
1
β̄ k,1 − β 0 = − (D + Rk,1 )S̄k,1 (β 0 )
.
k
and
1
β̄ k,2 − β 0 = −
. (D + Rk,2 )S̄k,2 (β 0 ),
N −k
and
⎧ ⎞
|| ||
. lim lim sup P max ||Rk,2 || > δ = 0
M→∞ N →∞ 0≤k≤N −M
and
⎛ ⎞1/2
N −k
. max ||β̄ k,2 − β 0 || = OP (1).
1≤k<N log log(N − k)
222 5 Parameter Changes in Time Series Models
Since the second derivatives of .S̄k,1 (β)/k are also bounded in a neighbourhood
of .β 0 with probability tending to 1, applying a two term Taylor expansion for the
coordinates of .S̄k,1 we get
k || ||
. max ||k(β̄ k,1 − β 0 ) + DS̄k,1 (β 0 )|| = OP (1), (5.1.30)
1≤k≤N log log k
and similarly
N −k || ||
. max ||(N − K)(β̄ k,2 − β 0 ) + DS̄k,2 (β 0 )|| (5.1.31)
1≤k<N log log(N − k)
= OP (1).
Now we can use Theorem A.1.3, and define for each N two independent Gaussian
processes .{WN,1 (x), 0 ≤ x ≤ N/2} and .{WN,2 (x), 0 ≤ x ≤ N/2} such that
1 || ||
||S̄k,1 + WN,1 (k)|| = OP (1),
. sup ζ
1≤k≤N/2 k
and
1 || ||
. sup ||S̄k,1 + WN,2 (N − k)|| = OP (1)
N/2≤k≤N −1 (N − k)
ζ
with some .ζ < 1/2, and .EWk,1 (x) = EWk,2 (x) = 0, .EWk,1 (x)Wk,1 (x ' ) =
EWk,2 (x)Wk,2 (x ' ) = C min(x, x ' ), where .C is defined in (5.1.14). Next we write
k(N − k)
. (β̄ k,1 − β̄ k,2 )
N
⎧
⎪ k ( )
⎨ Sk,1 − SN/2,1 + SN/2,2 + Rk,3 , if 1 ≤ k ≤ N/2,
= N
⎪ N −k ( )
⎩ −Sk,2 + SN/2,1 + SN/2,2 + Rk,4 , if N/2 ≤ k < N
N
using (5.1.30) and (5.1.31) we get that
1
. max ||Rk,3 || = OP (1),
1≤k≤N/2 log log k
1
. max ||Rk,4 || = OP (1).
N/2≤k<N log log(N − k)
These approximations make it possible to use the results in Sect. A.1 to establish the
results by repeating the arguments in Sect. 1.3. ⨆
⨅
5.1 ARMA Models 223
The norming matrix is .(DCD)−1 = D−1 C−1 D−1 . We may again estimate .C with
.Ĉ(β̂ N ) (5.1.19), and we may define a similar estimator for .D
−1 . Let
1 Σ ∂ 2 ê2i (β̂ N )
N
.D̂−1
N = .
N ∂β∂β
i=1
and therefore
|| ||
|| −1 −1 −1 ||
. ||D̂N ĈN D̂N − (DCD)−1 || = OP (N −1/2 ).
then
⎛ ⎞3/2
1 N2
. max
(log log N)1/2 1≤k<N k(N − k)
⎛ ⎞1/2 P
(β̂ k,1 − β̂ k,2 )T Ĝ−1
N ( β̂ k,1 − β̂ k,2 ) → ∞
holds.
Remark 5.1.3 We assumed for the sake of simplicity that Assumption 5.1.2 holds.
This assumption can be replaced with a decomposable Bernoulli shift assumption
along with some further conditions. The identifiability of .β 0 requires that .EEj Ei =
224 5 Parameter Changes in Time Series Models
where .y0 is an initial value for the recursion. We assume that the innovations are
Lν –decomposable, i.e. we do not require that the .Ei ’s are independent. If
.
where
∞
Σ
ȳi =
. β0l Ei−l . (5.1.34)
l=0
(1)
. H0 : μ0 = μA , (5.1.36)
(1)
HA : μ0 /= μA .
. (5.1.37)
However, the parameter of the AR(1) process driving the error terms stays the same
during the observation period. We recall the CUSUM process
⎛ ⎞
LN
Σ LNt⎦ Σ ⎠
t⎦ N
−1/2 ⎝
ZN (t) = N
. Xi − Xi , 0 < t < 1.
N
i=1 i=1
1 1 D 1
. sup |ZN ((N + 1)t/N)| → sup |B(t)|,
τ̂N 0<t<1 [t (1 − t)]κ
0<t<1 [t (1 − t)]
κ
denote the long–run variance of the AR(1) innovations. Recall that .τ̂N2 is the long–
run variance estimator computed from .X1 , . . . , XN . As we shall see, it does not
estimate any particular long–run variance parameter when .β0 = 1.
226 5 Parameter Changes in Time Series Models
(1)
Theorem 5.1.5 If .H0 of (5.1.36), Assumption 5.1.4 holds, and the innovations in
(5.1.32) are .Lν –decomposable for some .ν > 4, then we can define Wiener processes
.{WN (t), 0 ≤ t ≤ 1} such that
| |
1 | 1 |
sup | ZN (t) − σ ┌N (t)|| = oP (1),
.
|
1/N ≤t≤1−1/N [t (1 − t)]
ζ N 1/2
Proof It follows from .Lν –decomposability of the innovations that for each N there
are two independent Wiener processes .{WN,1 (x), 0 ≤ x ≤ N/2} and .{WN,2 (x), 0 ≤
x ≤ N/2} such that
1 || |
. max yi − σ WN,1 (k)| = OP (1), (5.1.39)
1≤k≤N/2 k ζ̄
and
1 | |
. max |(yN − yk ) − σ WN,2 (N − k)| = OP (1) (5.1.40)
N/2≤k<N (N − k)ζ̄
with .σ > 0 of (5.1.38) and some .ζ̄ < 1/2. By (5.1.39) and (5.1.40) we have
| |
|LN t⎦ ⎛ Nt |
1 | | Σ |
sup y − σ W (x)dx | = OP (N ζ̄ +1 ),
.
+1 | i N,1 |
1/N ≤t≤1/2 t ζ̄
| i=1 0 |
and
| |
| Σ ⎛ N |
1 | N |
sup | (yN − yi )−σ WN,2 (N − x)dx || = OP (N ζ̄ +1 ).
.
|
1/2≤t≤1−1/N (1 − t)ζ̄ +1 |i=LN t⎦ Nt |
5.1 ARMA Models 227
Since
Σ
N LN/2⎦
Σ Σ
N
. yi = yi − (yN − yi ) + (N − LN/2⎦ )[yLN/2⎦ + (yN − yLN/2⎦ )]
i=1 i=1 i=LN/2⎦ +1
we get
|N |
|Σ |
| |
.| yi − σ ┌ˆ N | = OP (N 1/2+ζ̄ ),
| |
i=1
where
⎛ N/2 ⎛ N
┌ˆ N =
. WN,1 (x)dx − [WN,2 (N − x) − (WN,1 (N/2) + WN,2 (N/2))]dx.
0 N/2
If
⎧ ⎛⎛ N t ⎞
⎪
⎪ −3/2 ˆ
WN,1 (x)dx − t ┌N , if 0 ≤ t ≤ 1/2,
⎪
⎪ N
⎪
⎪ ⎛⎛0 N
⎪
⎪
⎨ −3/2
N [WN,2 (N − x) − (WN,1 (N/2)+WN,2 (N/2))]dx
.┌N (t) =
⎪
⎪
Nt ⎞
⎪
⎪ ˆ
⎪
⎪ + (1 − t)┌N ,
⎪
⎪
⎩
if 1/2 ≤ t ≤ 1,
Next we note
D
. {┌N (t), 0 ≤ t ≤ 1} =
⎧⎛ t
⎪
⎪ ˆ
⎨ W1 (x)dx − t ┌, if 0 ≤ t ≤ 1/2,
⎛ 1
0
⎪
⎪ ˆ
⎩ [W2 (1 − x) − (W1 (1/2) + W2 (1/2))]dx + (1 − t)┌, if 1/2 ≤ t ≤ 1,
t
228 5 Parameter Changes in Time Series Models
where
⎛ 1/2 ⎛ 1 1
┌ˆ =
. W1 (x)dx − W2 (1 − x)dx + (W1 (1/2) + W2 (1/2)) ,
0 1/2 2
{W1 (x), 0 ≤ x < ∞} and .{W2 (x), 0 ≤ x < ∞} are independent Wiener processes.
.
Let
⎧
W1 (x), if 0 ≤ x ≤ 1/2,
.W (x) =
−[W2 (1 − x) − (W1 (1/2) + W2 (1/2))], if 1/2 ≤ x ≤ 1.
Computing the covariance function one can verify that .{W (x), x ≥ 0} is a Wiener
process. Hence
⎧⎛ t ⎛ 1 ⎞
D
{┌N (t), 0 ≤ t ≤ 1} =
. W (x) − t W (x)dx, 0 ≤ t ≤ 1 .
0 0
Let
1 Σ
N
ŷN =
. yi .
N
i=1
Lν –decomposability yields
.
E(yi − yk )2 ≤ c5 |i − k|
.
Σ
ch
1 Σ
N
≤ c6 E|yi − ŷN ||yi+l − yi |
N
l=0 i=1
Σ 1 Σ⎛ ⎞1/2
ch N
≤ c6 E(yi − ŷN )2 E(yi+l − yi )2
N
l=0 i=1
⎛ ⎞
= O h3/2 N 1/2 .
5.1 ARMA Models 229
Thus we conclude
| ⎛ ⎞
| 1 Σ ch
1 Σ
N −l
| l
.| K (yi − ŷN )(yi+l − ŷN )
| Nh h N −l
l=0 i=1
⎛ c |
1 Σ
N |
|
− K(u)du 2 (yi − ŷN )2 |
0 N |
i=1
⎛⎛ ⎞ ⎞
h 1/2
= OP = oP (1)
N
and similarly
|
| −1 ⎛ ⎞
| 1 Σ l 1 Σ N
.| K (yi − ŷN )(yi+l − ŷN )
|Nh h N − |l|
| l=−ch i=−(l−1)
⎛ c |
1 Σ
N |
2|
− K(u)du 2 (yi − ŷN ) |
0 N |
i=1
⎛⎛ ⎞ ⎞
h 1/2
= OP = oP (1).
N
The result now follows from the approximation of the partial sums, since .yi is a
partial sum when .β = 1. ⨅
⨆
We may also consider the “explosive case” in which the AR(1) innovations in the
AMOC model satisfy
Assumption 5.1.6 .|β0 | > 1.
Theorem 5.1.6 If .H0(1) of (5.1.36), Assumption 5.1.6 hold, and the innovations in
(5.1.32) are .Lν –decomposable for some .ν > 4, then
| | k | | |||
| |Σ k Σ || || β0
N
||
| |
. ||β0 |
−N
max | Xi − Xi | − | (Y + y0 )||| = oP (1),
| 1≤k≤N | N | β0 − 1 |
i=1 i=1
where
∞
Σ
Y =
. β0−l El . (5.1.41)
l=1
230 5 Parameter Changes in Time Series Models
Proof Since
Σ
i−1
yi = β0i y0 +
. β0l Ei−l ,
l=0
The sequence .{zi , i ∈ Z} is .Lν –decomposable, .ν > 4, and therefore Theorem 1.1.1
yields
| k |
|Σ k Σ ||
N ⎛ ⎞
|
. max | zi − zi | = OP N 1/2 .
1≤k≤N | N |
i=1 i=1
Thus we conclude
| k | | |
|Σ k Σ ||
N | k Σ i ||
N ⎛ ⎞
| | i
. max | yi − yi | = max |β0 − β0 | |Y + y0 | + OP N 1/2
1≤k≤N | N | 1≤k≤N | N |
i=1 i=1 i=1
| | | | ⎛ ⎞
| β0 | | k k N ||
|
=| | |
(Y + y0 )| max |β0 − β0 | +OP N 1/2 .
β0 − 1 1≤k≤N N
When .β0 > 1, the function .x |→ |β0x − (x/N)β0N | has its largest value at .xN =
N − [log(N log β0 )]/ log β0 , and thus we conclude
| |
| k |
.β
−N
max |β − k β N | → 1. (5.1.43)
0 1≤k≤N | 0 N 0|
If .β0 < −1, the one needs to repeat the computations above and consider the
max1≤k≤N for odd and even k separately. In this case (5.1.43) also holds with .β0−N
.
replaced with .|β0 |−N . Hence the first part of the theorem is proven.
Using the representation in (5.1.42), we obtain
1 β0N ⎛ ⎞
ȳN =
. (y0 + Y ) + OP N −1/2 ,
N β0 − 1
5.1 ARMA Models 231
and
⎛ ⎞⎛ ⎞
1 β0N 1 β0N
.(yi − ȳN )(yi+l − ȳN ) = yi − (y0 + Y ) yi+l − (y0 + Y )
N β0 N β0
+ RN,1 (i)
⎛ ⎞⎛ ⎞
1 β0N i+l 1 β0N
= β0 −
i
β0 − (y0 + Y )2
N β0 − 1 N β0 − 1
+ RN,2 (i)
with
−1 ⎛ ⎞ N −l
⎛ ⎞⎛ ⎞
Σ
N
l 1 Σ 1 β0N 1 β0N
i+l
. K β0 −
i
β0 −
h N −l N β0 − 1 N β0 − 1
l=0 i=1
−1⎛ ⎞ ⎛
Σ
N
l 1 β02N +2 −l 1 β02N +1 −l
= K β − β
h N − l β02 − 1 0 N (β0 − 1)2 0
l=0
⎞
1 β02N +1 1 β02N
− + 2
N (β0 − 1)2 N (β0 − 1)2
1 2N β02 1
= β (1 + o(1)),
N 0 β02 − 1 1 − 1/β0
and therefore
Σ −1 ⎛ ⎞ N −l
1 Σ
N
l
Nβ0−2N
. K (yi − ȳN )(yi+l − ȳN )
h N −l
l=0 i=1
P β02 1
→ (Y + y0 )2 .
β02 − 1 β0 − 1
232 5 Parameter Changes in Time Series Models
−(N
Σ −1) ⎛ ⎞ Σ
N
l 1
Nβ0−2N
. K (yi − ȳN )(yi+l − ȳN )
h N − |l|
l=−1 i=−(l−1)
P β02 β0
→ (Y + y0 )2 ,
β02 − 1 β0 − 1
1 D
. sup |ZN ((N + 1)t/N)| → U (t),
τ̂N 0<t<1
where
⎧
⎪ sup |B(t)|,
⎨ 0<t<1 if |β0 | < 1,
U (t) =
.
0, if β0 = 1,
⎪
⎩
1, if |β0 | > 1,
Dynamic regression refers to linear regression models for a time series .{yt , t ∈
Z} that involve both autoregression as well as regression on additional, exogenous,
series. An AMOC dynamic regression model is formulated as follows:
⎧
xT ∗
i β 0 + Ei , 1 ≤ k ≤ k ,
. yk = T ∗ (5.2.1)
xi β A + Ei , k + 1 ≤ k ≤ N,
where the errors .{Ei , i ∈ Z} are independent and identically distributed, and
the covariate .xi = (1, zi,2 , . . . , zi,r , yi−1 , . . . , yi−d )T ∈ Rr+d combines both
exogenous and autoregressive components. We wish to test the null hypothesis
H0 : β 0 = β A
. (5.2.2)
5.2 Dynamic Regression Models 233
HA : β 0 /= β A .
.
β 0 = (β T
.
T T
z,0 , β y,0 ) , β z,0 ∈ Rr , β y,0 ∈ Rd ,
so that .β z,0 and .β y,0 correspond to the regression parameters for the exogenous
and autoregressive terms, respectively. Under the null hypothesis we assume
additionally that the process .{yt , t ∈ Z} is stationary, which is implied when
.zi = (1, zi,2 , . . . , zi,r )
T is .Lν –decomposable, and the following assumption holds.
We also assume
Assumption 5.2.4 .D is non singular.
We recall the integral .I (w, c) from (1.2.4).
Theorem 5.2.1 We assume that .H0 of (5.2.2) and Assumptions 5.2.1–5.2.4 are
satisfied.
(i) If .w(·) satisfies Assumption 1.2.1 and .I (w, c) < ∞ for some .c > 0, then
1 ⎡ T ⎤1/2
. sup ẐN (t)D−1 ẐN (t)
1/(N +1)≤t≤N/(N+1) w(t)
⎛r+d ⎞1/2
D 1 Σ
→ sup 2
Bi (t) ,
0<t<1 w(t) i=1
where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , r+d} are independent Brownian bridges.
(ii) Also,
⎧
1 ⎛ ⎞1/2
−1
. lim P a(log N) sup ẐN T(t)D ẐN (t)
1/(N +1)≤t≤N/(N+1) [t (1 − t)]
N →∞ 1/2
⎞
≤ x + br+d (log N) = exp(−2e−x )
for all .x ∈ R, where .a(t) and .br+d (t) are defined in (1.3.9).
The proof of Theorem 5.2.1 is based on the following two lemmas.
Lemma 5.2.1 We assume that .H0 of (5.2.2) and Assumption 5.2.1 are satisfied,
and that .{zi , i ∈ Z} are .Lν –decomposable for some .ν > 4. Then
(i) .{yi , i ∈ Z} is .Lν –decomposable, and (ii) .{zi zT
i , i ∈ Z} is .L
ν/2 –decompos-
able in .R .r×r
where
Brockwell and Davis (2006) contains the proof of (5.2.4). Equation (5.2.3) implies
the Bernoulli shift representation. By the triangle inequality we have
⎛ |∞ |ν ⎞1/ν
( ) |Σ |
| |
cl (wT
1/ν
. E|yi | = E| β 1,0 + Ei−l )|
ν
(5.2.5)
| i−l |
l=0
∞
Σ
≤ |cl |((E||wi−l ||ν )1/ν ||β 1,0 || + (E|Ei−l |ν )1/ν )
l=0
< ∞.
We introduce
⎧
∗
g(ηk , . . . , ηi−j +1 , ηi−j,i,j ∗
, ηi−j if k > i − j,
(0) −1,i,j , . . .),
.z = ∗ ∗
k,i,j g(ηk,i,j , ηk−1,i,j , . . .), if k ≤ i − j,
∗
where .{ηk,i,j , −∞ < i, j, k < ∞} are independent copies of .η0 , as in Defini-
tion 3.1.1. Let .zk,i,j = ((xk,i,j )T , Ek,i,j )T and .wk,i,j = (1, (zk,i,j )T )T . Now we
(0) (0) (0) (0) (0)
define
∞
Σ
∗ T
.yi,j = cl ((w(0) (0)
i−l,i,j ) β 1,0 + Ei−l,i,j ),
l=0
∗ = g(η , η
which satisfies .yi,j ∗ ∗
i i−1 , . . . , ηi−j +1 , ηi−j,i,j , ηi−j −1,i,j , . . .). Using the
∗
definitions of .yi and .yi,j we have
j −1
Σ
∗
yi − yi,j
. = cl ([wi−l − w(0) (0)
i−l,i,j ] β 1,0 + [Ei−l − Ei−l,i,j ])
T
l=0
∞
Σ
+ cl (wT
i−l β 1,0 + Ei−l )
l=j
∞
Σ
cl ((wi−l,i,j )T β 1,0 + Ei−l,i,j ).
(0) (0)
−
l=j
236 5 Parameter Changes in Time Series Models
(0)
Observing that .(zi , zi,k ) has the same distribution as .(zi , zk,i,j ), by .Lν –
decomposability and the triangle inequality we have
⎛ | |ν ⎞1/ν
|Σ |
|j −1 |
. ⎝E | | ⎠
(0) T
| cl ([wi−l − wi−l,i,j ] β 1,0 |
| l=0 |
j −1
Σ (0) (0)
≤ |cl |((E||wi−l − wi−l,i,j ||ν )1/ν ||β 1,0 || + (E|Ei−l − Ei−l,i,j |ν )1/ν )
l=0
⎛ ⎞
j −1
Σ
=O⎝ l−α ⎠ .
l=1
and
⎛ | |ν ⎞1/ν ⎛ ⎞
|Σ |
|∞ | Σ∞
. ⎝E | cl ((wi−l,i,j )T β 1,0 + Ei−l,i,j )|| ⎠ = O ⎝ ρl⎠ ,
(0) (0)
|
|l=j | l=j
≤ (E||zi ||ν )1/ν (E||zi − z∗i,j ||ν )1/ν + (E(||z∗i,j ||ν )1/ν (E||zi − z∗i,j ||ν )1/ν
⎛ ⎞
= O j −α+1 .
and therefore
|| ||
||⎛ T ⎞−1 1 || ⎛ ⎞
|| − A|| = OP N −3/2 .
.
|| XN XN N || (5.2.6)
where .EN = (E1 , . . . , EN )T . Putting together Assumption 5.2.3, Lemma 5.2.1 and
Theorem A.1.3 we conclude
⎛ ⎞
T
.||XN EN || = OP N
1/2
.
Êi = Ei + xT
. i (β 0 − β̂ N )
and therefore
xi Êi = xi Ei + xi xT
. i (β 0 − β̂ N ).
Thus we get
Σ
k
k Σ
N Σ
k
k Σ
N
. xi Êi − xi Êi = xi Ei − xi Ei
N N
i=1 i=1 i=1 i=1
⎛ k ⎞
Σ k Σ T
N
+ xi xT
i − xi xi (β 0 − β̂ N ). (5.2.8)
N
i=1 i=1
238 5 Parameter Changes in Time Series Models
Due to Lemma 5.2.1 we can use Theorem A.1.3 to obtain a Gaussian approximation
for the CUSUM processes on the right hand side of (5.2.8). However, according
to Lemma 5.2.2, the second term is much smaller than the first. So we can
define Gaussian processes .{BN (t), 0 ≤ t ≤ 1} satisfying .EBN (t) = 0 and
T
.EBN (t)B (s) = D(min(t, s) − ts) such that
N
1
N 1/2−ζ
. sup ||ẐN (t) − BN (t)|| = OP (1) (5.2.9)
1/(N +1)≤t≤N/(N +1) [t (1 − t)]ζ
with some .ζ < 1/2. Due to the weighted approximation in (5.2.9), the same
techniques used in Sect. 4.1.1 can be used to prove the results. ⨆
⨅
We only considered the asymptotic distribution of the supremum functionals of
ẐN (t) in the dynamical model of (5.2.1). Using the weighted approximation these
.
We wish to test
H0 : β0 = βA ,
. (5.3.2)
5.3 Random Coefficient Autoregressive Models 239
against
HA : β0 /= βA .
. (5.3.3)
(iii) .{Ei,2 , i ∈ Z} are independent and identically distributed random variables with
.EE0,2 = 0, .0 < EE
0,2 = σ2 < ∞ and .E|E0,2 | < ∞.
2 2 4
⎛ k ⎞⎛ k ⎞−1
Σ yi yi−1 Σ yi−1 2
β̂k,1
. = , 2 ≤ k ≤ N, (5.3.4)
i=2
1 + yi−1
2
i=2
1 + yi−1
2
and
⎛ ⎞⎛ ⎞−1
Σ
N
yi yi−1 Σ
N 2
yi−1
β̂k,2
. = , 1 ≤ k ≤ N − 1.
i=k+1
1 + yi−1
2
i=k+1
1 + yi−1
2
If .−∞ ≤ E log |β0 +E0,1 | < 0, then the solution of (5.3.1) under the null hypothesis
is close to .ȳi , the unique, causal, stationary solution of
The asymptotic variance of .QN (t) depends on if (5.3.1) has a stationary solution or
not:
⎧⎡ ⎛ ⎞2 ⎛ ⎞2 ⎤ ⎛ ⎞−2
⎪
⎪ ȳ 2 ȳ02
⎪ ȳ
σ2 ⎦ E
⎨ ⎣E
⎪ 0 0
σ1 + E
2 2
,
1 + ȳ02 1 + ȳ02 1 + ȳ02
.η =
2
(5.3.6)
⎪
⎪
⎪
⎪ if − ∞ ≤ E log |β0 + E 0,1 | < 0,
⎩ 2
σ1 , if E log |β0 + E0,1 | ≥ 0.
Theorem 5.3.1 We assume that .H0 of (5.3.2), Assumptions 1.2.1 and 5.3.1 hold,
and .−∞ ≤ E log |β0 + E0,1 | < 0.
(i) If .I (w, c) < ∞ for some .c > 0, then
1 1 D |B(t)|
. sup |QN (t)| → sup ,
η 0<t<1 w(t) 0<t<1 w(t)
⎛ ⎞−1 ⎛ ⎞
Σ
N 2
yi−1 Σ
N 2 E
yi−1 i,1 + yi−1 Ei,2
− .
i=k+1
1 + yi−1
2
i=k+1
1 + yi−1
2
Under the null hypothesis, the recursion defining .yi takes the form
yi = ρi yi−1 + Ei,2 ,
. 1 ≤ i < ∞, (5.3.8)
where .ρi = β0 + Ei,1 . We can solve the recursion in (5.3.8) explicitly. Equation
(5.3.8) implies that
Σ
i | |
i | |
i Σ
i−1 | |
l | |
i
.yi = El,2 ρj + y0 ρj = Ei−l,2 ρi−j +1 + y0 ρj .
l=1 j =l+1 j =1 l=0 j =1 j =1
5.3 Random Coefficient Autoregressive Models 241
If
∞
Σ | |
l
ȳi =
. Ei−l,2 ρi−j +1 . (5.3.10)
l=0 j =1
We note that
Σ
i−1 | |
l | |
i
yi =
. Ei−l,2 ρi−j +1 + y0 ρj . (5.3.11)
l=0 j =1 j =1
The proof of Theorem 5.3.1 is based on two lemmas. In the first it is shown that
the .yi ’s can be replaced with the stationary .ȳi ’s with an asymptotically negligible
consequence. In the second lemma we show that the stationary sequence .{ȳi , i ∈ Z}
is .Lν decomposable.
Lemma 5.3.1 If .H0 of (5.3.2), Assumption 5.3.1 are satisfied and .−∞ ≤
E log |β0 + E0,1 | < 0, then
|
∞ | 2
|
Σ 2 E |
| yi−1 Ei,1 ȳi−1 i,1 |
. | − 2 ||
< ∞ a.s., (5.3.12)
| 1 + yi−1
2 1 + ȳi−1
i=2
∞ |
| |
Σ | yi−1 Ei,2 ȳi−1 Ei,2 ||
. | − 2 ||
< ∞ a.s., (5.3.13)
| 1 + yi−1
2 1 + ȳi−1
i=2
and
∞ |
| |
Σ 2 2 |
| yi−1 ȳi−1 |
. | − | < ∞ a.s., (5.3.14)
| 1 + yi−1
2 1 + ȳi−1 |
2
i=2
Proof Since .E(log |ρ0 | + κ̄/2) = −κ̄/2 < 0, by Lemma 2 of Aue et al. (2006)
there are .ν1 > 0 and .c1 < 1 such that
and
⎛ ⎛ ⎞⎞ν1
Σi
.E ⎝exp ⎝ (log |ρj | + κ̄/2)⎠⎠ ≤ c2 for all 1 ≤ i < ∞. (5.3.16)
j =1
We write
| |
| y2 ȳi2 ||
| i
.| − | ≤ 2|yi − ȳi ||yi + ȳi |
| 1 + yi2 1 + ȳi2 |
and therefore
| |
∞
Σ | y2 ȳi2 || Σ∞
|
. |Ei | | i
− | ≤ 2 |Ei ||yi−1 + ȳi−1 ||yi−1 − ȳi−1 |.
| 1 + yi−1
2 2 |
1 + ȳi−1
i=2 i=2
completing the proof of (5.3.12). Similar arguments give (5.3.13) and (5.3.14). ⨆
⨅
As a result of this we only need to work with the stationary solution. It follows
from (5.3.10) .{ȳi , i ∈ Z} is a Bernoulli shift, namely there is a function .g : R∞ |→
R such that
Lemma 5.3.2 If .H0 of (5.3.2), Assumption 5.3.1 are satisfied and .−∞ ≤
E log |β0 + E0,1 | < 0, then
| |
|E 2 2 |4
Ei+1,1 yi,l
| i+1,1 ȳi |
.E | − | ≤ al−α , (5.3.18)
| 1 + ȳi2 1 + yi,l |
2
| |4
|E Ei+1,2 yi,l ||
| i+1,2 ȳi
.E | − 2 ||
≤ al−α (5.3.19)
| 1 + ȳi2 1 + yi,l
and
| |4
| ȳ 2 2
yi,l |
| i |
.E | − 2 ||
≤ al−α (5.3.20)
| 1 + ȳi2 1 + yi,l
ȳ0 = zl + ul
.
and
y0,l = zl + u∗l ,
.
where
Σ
l−1 | |
j | |
l−1
.zl = E−j,2 ρ−k+1 , ul = ȳ−l ρ−j
j =0 k=1 j =0
and
| |
l−1
u∗l = ȳ−l
.
∗ ∗
ρ−j ,
j =0
244 5 Parameter Changes in Time Series Models
where
Let .β > 0. Using (5.3.15) and (5.3.16) we get via Markov’s inequality,
⎛ ⎞4
ȳ02 −β
E
. 1{|ul | > l } ≤ P {|ul | > l−β } ≤ c1 lβν1 E|ul |ν1 e−lν2 ≤ c2 e−lν3
1 + ȳ02
with some constants .ν1 > 0, ν2 > 0, ν3 > 0 and .c1 , .c2 = c2 (β). Similarly,
⎛ 2
⎞4
y0,l
E
. 1{|u∗l | >l −β
} ≤ c2 e−lν3 .
1 + y0,l
2
|zl |
. ≤ 4.
1 + (zl + ul )2
Since .E1 and .(ȳ0 , y0,l ) are independent the proof of (5.3.18) is complete since it
does not depend on i. Similar arguments give (5.3.19) and (5.3.20). ⨅
⨆
Now we are ready to prove Theorem 5.3.1.
Proof of Theorem 5.3.1 Let
ȳ02
.a0 = E .
1 + ȳ02
5.3 Random Coefficient Autoregressive Models 245
Combining Lemma 5.3.20 with the approximations in Theorem A.1.1, we can define
Wiener processes .{WN,1 (x), x ≥ 0} such that
| k ⎛ ⎞ |
|Σ ȳi2 |
1 | |
. sup | − a0 − c1 WN,1 (k)| = OP (1) (5.3.21)
1≤k≤N k ζ1 | 1 + ȳi2 |
i=1
with some .c1 > 0 and .ζ1 < 1/2. It follows from (5.3.21) and the law of the iterated
logarithm that
| k ⎛ ⎞|
|Σ ȳi2 |
1 | |
. max | − a0 | = OP (1)
1≤k≤N k ζ2 | 1 + ȳi2 |
i=1
Theorem A.1.1 implies, that we can define Wiener processes .{WN,2 (x), x ≥ 0} such
that
| N ⎛ ⎞ |
| Σ ȳi2 |
1 | |
. sup | − a0 − c W
1 N,2 (N − k) | (5.3.22)
1≤k<N (N − k) 1 | 1 + ȳi2 |
ζ
i=k+1
= OP (1)
which implies
|⎛ ⎞−1 |
| Σ |
|
ζ2 | 1
N 2
ȳi 1 ||
. max (N − k) − | = OP (1).
| N −k 1 + ȳi2
1≤k<N | i=k+11
a0 |
Using the decomposability of the Bernoulli shifts in (5.3.19) and (5.3.20) and the
approximations in Theorem A.1.1, we can define two independent Wiener processes
.{WN,3 (x), 0 ≤ x ≤ N/2} and .{WN,4 (x), 0 ≤ x ≤ N/2} such that
| Lx⎦ 2 |
1 || Σ ȳi−1 Ei,1 + ȳi−1 Ei,2 |
. max | − a0 ηWN,3 (x)|| = OP (1)
1≤x≤N/2 x ζ 3
i=2
1 + ȳi−1
2
and
| Σ |
1 | N 2 E
ȳi−1 i,1 + ȳi−1 Ei,2 |
max | − a0 ηWN,4 (N − x)|| = OP (1)
N/2≤x≤N −1 (N − x)ζ3 |
.
i=Lx⎦ +1
1 + ȳi−1
2
246 5 Parameter Changes in Time Series Models
with some .ζ3 < 1/2. Putting together the approximations we get
|⎛ ⎞−1 || |
| | |Σ |
1 | 1Σ k 2
ȳi−1 1 | | k ȳi−1
2 E
i,1 + ȳi−1 Ei,2 |
. max | − || | = OP (1)
1≤k≤N k ζ4 | k 1 + ȳi−1
2 a0 || | 1 + ȳi−1
2 |
| i=2 i=2
and
|⎛ ⎞−1
| Σ
1 | 1
N 2
ȳi−1
max |
.
1≤k≤N−1 (N − k)ζ4 | N −k 1 + ȳi−1
2
| i=k+1
|| N |
1 || || Σ ȳi−1
2 E |
i,1 + ȳi−1 Ei,2 |
− || | = OP (1)
a0 | 1 + ȳi−1
2 |
i=k+1
and
|
1 | x(N − x)
max | (β̂Lx⎦ ,1 − β̂Lx⎦ ,2 ) (5.3.24)
N/2≤x≤N −1 (N − x) 5 |
.
ζ N
⎛ ⎞|
N −x ⎡ ⎤ |
− η −WN,4 (N − x) + WN,4 (N/2) + WN,3 (N/2) || = OP (1)
N
with some .ζ5 < 1/2. By computing the covariance functions we see that
BN (t)
. (5.3.25)
⎧ −1/2 ( ⎡ ⎤)
⎨N W (N t) − t WN,4 (N/2) + WN,3 (N/2) , 0 ≤ t ≤ 1/2,
( N,3 ⎡ ⎤)
= N −1/2 −WN,4 (N (1 − t)) + (1 − t) WN,4 (N/2) + WN,3 (N/2) ,
⎩
1/2 ≤ t ≤ 1
is a Brownian bridge for each N. Due to the approximations (5.3.23) and (5.3.24),
the result now follows as in Sects. 1.2 and 1.2.1. ⨆
⨅
5.3 Random Coefficient Autoregressive Models 247
1 D |B(t)|
. sup |QN (t)| → sup ,
0<t<1 w(t) 0<t<1 w(t)
P {|yi | ≤ i δ } ≤ c1 i −δ .
. (5.3.26)
1 1 1
E
. =E 1{|yi | ≤ i δ } + E 1{|yi | ≥ i δ } ≤ P {|yi | ≤ i δ }
1 + yi
2 1 + yi
2 1 + yi2
+ (1 + i 2δ )−1 ≤ c2 i −δ (5.3.27)
248 5 Parameter Changes in Time Series Models
with some constant .c2 . By Markov’s inequality we have for all .x > 0 and .ζ1 > 1−δ
⎧ ⎞ ⎧ ⎞
1 Σ 1 1 Σ 1
k k
P
. max > x ≤P max max >x
M≤k<∞ k ζ1
i=1
1 + yi2 log M≤l<∞ el ≤k<el+1 k ζ1
i=1
1 + yi2
∞
⎧ ⎞
Σ 1 Σ 1
k
≤ P max >x
ζ
el ≤k<el+1 k 1
l=log M
1 + yi2 i=1
∞
⎧ ⎞
Σ 1 Σ
k
≤ P max > xeζ1 l
l=log M
el ≤k<el+1
i=1
1 + yi
2
⎧ ⎫
∞
Σ ⎨exp(l+1)
Σ 1 ⎬
= P > xe ζ1 l
⎩ 1 + yi2 ⎭
l=log M i=1
∞
Σ Σ
exp(l+1)
1 1
≤ e−ζ1 l E
x
l=log M i=1
1 + yi2
∞
Σ Σ
exp(l+1)
c2
≤ e−ζ1 l i −δ
x
l=log M i=1
c3
≤ M −(ζ1 −(1−δ)) .
x
Hence there is .ζ2 < 1 such that for all .x > 0
⎧ ⎞
1 Σ 1
k
. lim P max ζ >x =0 (5.3.28)
M→∞ M≤k<∞ k 2
i=1
1 + yi2
and
| ⎛ N ⎞−1 |
| |
| Σ y 2 |
. max (N − k)1−ζ2 ||(N − k) i−1
− 1| = OP (1).
|
1≤k≤N −1 | i=k+1
1 + yi−1
2
|
l=i
1 + yl−1
2
⎛ ⎞4 ⎡⎛ ⎞2 ⎛ ⎞2 ⎤
Σ
j
1 Σ El,1 El' ,1
= 4
EEl,1 E + E⎣ ⎦
l=i
1 + yl−1
2
i≤l/=l' ≤j
1 + yl−1
2 1 + yl2' −1
⎛ ⎞4
Σ
j
1
≤ 4
EE0,1 E
l=i
1 + yl−1
2
⎡ ⎛ ⎡ ⎛ ⎞4 ⎤1/2
Σ ⎞4 ⎤1/2
1 ⎣E El' ,1 ⎦
+ 4
EE0,1 E
1 + yl−1 1 + yl2' −1
i≤l/=l' ≤j
⎡⎛ ⎛ ⎞2 ⎞2 ⎤
Σ
j Σj
4 ⎢⎝ ⎥
b ⎠ +⎝ bl−1 ⎠ ⎦
1/2
≤ EE0,1 ⎣ l−1
l=i l=i
⎛ ⎞2
Σj
4 ⎝
≤ 2EE0,1 bl−1 ⎠ ,
l=i
where
1
bl = E
. < 1.
(1 + yl2 )4
bl = O(l−δ ),
. as l → ∞.
| k ⎛ ⎞|4 ⎛ ⎞2
|Σ 2 | Σ
j
| yi−1 |
.E max | Ei,1 − 1 | ≤ c4 ⎝ bl−1 ⎠ .
2≤k≤j | 1 + yi−1
2 |
i=2 l=1
250 5 Parameter Changes in Time Series Models
c5 Σ −4ζ3 l 2l(1−δ)
log N
≤ e e
x4
l=0
c6
≤ 4,
x
and
| N |
| Σ E y |
1 | i,1 i−1 |
. max | 2 ||
= OP (1)
1≤k<N (N − k)ζ5 | 1 + yi−1
i=k+1
with some .ζ5 < 1/2. We proved again approximations for .k(N − k)(β̂k,1 − β̂k,2 )
with CUSUM processes:
| ⎛ k ⎞|
| k(N − k) Σ k Σ
N |
1 | |
. max | (β̂k,1 − β̂k,2 ) − Ei,1 − Ei,1 | = OP (1)
1≤k≤N/2 k ζ6 | N N |
i=1 i=1
and
|
1 | k(N − k)
max | (β̂k,1 − β̂k,2 )
.
N/2≤k≤N −1 (N − k)ζ6 | N
⎛ ⎞|
Σ N
N −k Σ
N |
|
− − Ei,1 + Ei,1 | = OP (1)
N |
i=k+11 i=1
with some .ζ6 < 1/2. We can use again Theorem A.1.1 and define two independent
Wiener processes .{WN,1 (x), 0 ≤ x ≤ N/2} and {.WN,2 (x), 0 ≤ x ≤ N/2} such
that
| k |
1 ||Σ |
|
. max | E i,1 − σ1 WN,1 (k) | = OP (1)
1≤k≤N/2 k ζ 7 | |
i=1
and
| N |
| Σ |
1 | |
. max | Ei,1 − σ1 WN,2 (N − k)| = OP (1)
1≤k≤N/2 (N − k)ζ7 | |
i=k+1
with some .ζ7 < 1/2. The theorem now follows from the results in Sects. 1.2
and 1.2.1. ⨆
⨅
Using the weighted least squares estimators, we may define a test of .H0 versus
HA along the lines of Theorem 5.1.2. Another view that leads to this considering
.
the process .QN is the following. The estimator .β̂k,1 is the solution of the equation
Lk (β̂k,1 ) = 0,
.
252 5 Parameter Changes in Time Series Models
where
Σk
(yi − βyi−1 )yi−1
Lk (β) =
. .
i=2
1 + yi−1
2
If .H0 holds, then .Lk (β̂N,1 ) should be close to 0 for all .2 ≤ k ≤ N. Using the
formula for .β̂N,1 in (5.3.4) we obtain that
⎛N ⎞−1 k
Σk
yi yi−1 ΣN
yi yi−1 Σ yi−1 2 Σ yi−1 2
.Lk (β̂N,1 ) = −
i=2
1 + yi−1
2
i=2
1 + yi−1
2
i=2
1 + yi−1
2
i=2
1 + yi−1
2
⎛ k ⎞⎛ N ⎞
Σ yi−1 2 Σ 2
yi−1
= (β̂k,1 − β̂k,2 )
i=2
1 + yi−1
2
i=k+1
1 + yi−1
2
according to the proof of Theorem 5.3.1. Hence if we reject .H0 for large values
of .max2≤k≤N |Lk (β̂N,1 )|, this test is equivalent with rejecting for large values for
.max2≤k≤N k(N − k)|β̂k,1 − β̂k,2 |. These statistics in fact have the same asymptotic
âN,1
.
2
η̂N = 2
,
âN,2
where
⎛ ⎞2
1 Σ yi − β̂N,1 yi−1
N
âN,1
. =
N
i=2
1 + yi−1
2
and
1 Σ yi2
N
âN,2
. = .
N
i=1
1 + yi2
5.3 Random Coefficient Autoregressive Models 253
Corollary 5.3.1 We assume that the conditions of Theorems 5.3.1 or 5.3.2 hold.
(i) If .I (w, c) < ∞ for some .c > 0, then
1 1 D |B(t)|
. sup |QN (t)| → sup ,
η̂N 0<t<1 w(t) 0<t<1 w(t)
= Ei,1
2 2
yi−1 + Ei,2
2
+ (β0 − β̂N,1 )2 yi−1
2
+ 2(β0 − β̂N,1 )yi−1
2
Ei,1
The proofs of Theorems 5.3.1 and 5.3.2 show that in all cases
|β̂N − β0 | = OP (N −1/2 ),
.
1 Σ 1 Σ yi−1
N 4 N 4 E
yi−1 i,1
. = OP (1), = OP (N −1/2 ),
N
i=2
(1 + yi−1
2 )2 N
i=2
(1 + y 2 )2
i−1
1 Σ yi−1 1 Σ yi−1
N 3 E N 3 E E
i−2 i,1 i−2
. = OP (N −1/2 ), = OP (N −1/2 ).
N
i=2
(1 + y 2 )2
i−1
N
i=2
(1 + y 2 )2
i−1
η̂N = η + OP (N −ζ )
.
ml = LNτl ⎦ ,
. 1 ≤ l ≤ M, 0 < τ1 < τ2 < . . . < τM < 1.
and
⎧ ⎛ ⎞
⎪
⎨E
2
ȳl,0
, if − ∞ ≤ E log |β0 + Eml ,1 | < 0,
.al = 1 + ȳl,0
2 (5.3.32)
⎪
⎩
1, if E log |β0 + Eml ,1 | ≥ 0,
1 ≤ l ≤ M + 1. The limit of the process .QN may in this case be expressed in terms
.
Σ
l−1 η2
j ηl2
.η̄(t) = (τ − τj −1 ) +
2 j
(t − τl−1 ), τl−1 < t ≤ τl , (5.3.33)
j =1
aj al2
1 ≤ l ≤ M + 1.
Sect. 1.2.1) that the standardized CUSUM process is determined by its behaviour
near the beginning and end of the observation period. On these intervals the
asymptotic variance .η0 (t, t) is though proportional to .t (1 − t), where
η0 (t, s) = E (┌(t) − t┌(1)) (┌(s) − s┌(1)) = η̄(min(t, s))−t η̄(s)−s η̄(t)+st η̄(1).
.
The following result describes how Theorems 5.3.1 and 5.3.2 change in this
heteroscedastic situation.
Theorem 5.3.3 We assume that .H0 of (5.3.2), Assumptions 1.2.1, 5.3.4 hold.
(i) If .I (w, c) < ∞ for some .c > 0, then
1 D 1
. sup |QN (t)| → sup |┌(t) − t┌(1)|,
0<t<1 w(t) 0<t<1 w(t)
ml−1 +j
Σ
.Sl (j ) = zi , if 1 ≤ j ≤ ml − ml−1 , 1 ≤ l ≤ M + 1.
i=ml−1 +1
According to the proofs of Lemma 5.3.1, (if .−∞ ≤ log |β0 + Eml ,1 | < 0) and
Theorem 5.3.2 (if .0 ≤ log |β0 + Eml ,1 | < ∞)
| |
Σ
ml | y 2 Ei,1 + yi−1 Ei,2 |
| i−1 |
. | − Sl (k) | = OP (1), 1 ≤ l ≤ M + 1.
| 1 + yi−1
2 |
i=ml−1 +1
2 E +y
i−1 Ei,2 )/(1 + yi−1 )
This means that we may replace the partial sums of .(yi−1 2
i,1
with the partial sums of the interval stationary .zi ’s.
Using Theorem A.1.1 we may define independent Wiener processes .{WN,l,1 (x),
0 ≤ x ≤ (ml − ml−1 )/2}, {WN,l,2 (x), 0 ≤ x ≤ (ml − ml1 )/2}, 1 ≤ l ≤ M + 1
such that
1 || |
. max ζ
Sl (k) − ηl WN,l,1 (k)| = OP (1),
1≤k≤(ml −ml−1 )/2 k
and
1 | |
. max |Sl (ml ) − Sl (k) − ηl WN,l,2 (k)| = OP (1)
(ml −ml−1 )/2<k<ml −ml−1 (ml − k)ζ
5.3 Random Coefficient Autoregressive Models 257
1 ≤ l ≤ M + 1. Now we define
.
Σ
l−1
ηj ηl
ΔN (k) =
. WN,j (mj − mj −1 ) + WN,l (x − ml−1 ),
aj al
j =1
Thus we obtain the following approximation for the difference between the estima-
tors:
| ⎛ ⎞|
1 || k(N − k) k |
|
. max | ( β̂k,1 − β̂k,2 ) − Δ N (k) − Δ N (N ) | (5.3.34)
1≤k≤N/2 k ζ N N
= OP (1)
and
|
1 | k(N − k)
max | (β̂k,1 − β̂k,2 )
N/2≤k≤N −1 (N − k) |
.
ζ N
⎛ ⎞|
N −k |
− −(ΔN (N ) − ΔN (k)) + ΔN (N ) || (5.3.35)
N
= OP (1).
Observing that
{ }
D
. N −1/2 ┌N (N t), 0 ≤ t ≤ 1 = {┌(t), 0 ≤ t ≤ 1} ,
η0 (t, t) is proportional to .t (1 − t), if .0 < t ≤ τ1 or .τM < t < 1, we can repeat the
.
and
| |
| η (t, t) η2 |
| 0 M+1 |
. sup | − 2 | = O(1/ log N ).
1−t1 ≤t≤1−t2 | 1 − t aM+1 |
and
| |
| |
| 1 1 |
. max | 1/2 |QN (t) − WN,M+1,2 (N − N t)|
|
1−t2 ≤t≤1−t1 η (N − Nt) 1/2 |
0 (t, t)
⎛ ⎞
= OP (log N)4(ζ −1/2) .
= exp(−2e−x ).
L(NΣ
+1)t⎦ 2
1 yi−1
ĉN,1 (t) =
. , and
N
i=2
1 + yi−1
2
1 Σ
N 2
yi−1
ĉN,2 (t) = , 0 ≤ t ≤ 1,
N
i=L(N +1)t⎦ +1
1 + yi−1
2
and define
⎧
⎪
⎪
⎪
0, if 0 < t < 2/(N⎛ + 1), ⎞
⎨ N 1/2 ĉ (t)ĉ (t) β̂
N,1 N,2 L(N +1)t⎦ ,1 − β̂L(N +1)t⎦ ,2 ,
.Q̄N (t) =
⎪ if 2/(N + 1) ≤ t < 1 − 2/(N + 1),
⎪
⎪
⎩
0, if 1 − 2/(N + 1) ≤ t < 1.
If the no change in the regression parameter null hypothesis holds, then .cN,1 (t) and
cN,2 (t) converge pointwise to the respective functions
.
Σ
l−1
c1 (t) =
. (τl − τl−1 )al + (t − τl−1 )al , τl−1 < t ≤ τl , 1 ≤ l ≤ M + 1,
j =1
and
Σ
l−1
b(t) =
. (τj − τj −1 )ηj2 + (t − τl−1 )ηl2 , (5.3.36)
j =1
τl−1 < t ≤ τl , 1 ≤ l ≤ M + 1.
The weak limit of the modified process .Q̄N may be expressed in terms of the
Gaussian process
g(t, s) = EO(t)O(s)
. (5.3.38)
= c21 (1)b(min(t, s)) − c1 (1)c1 (t)b(s) − c1 (1)c1 (s)b(t) + c1 (t)c1 (s)b(1).
Theorem 5.3.4 We assume that .H0 of (5.3.2), Assumptions 1.2.1, 5.3.4 hold.
(i) If .I (w, c) < ∞ for some .c > 0, then
1 D 1
. sup |Q̄N (t)| → sup |O(t)|,
0<t<1 w(t) 0<t<1 w(t)
= exp(−2e−x ),
where .a(x), .b(x) are defined in (1.2.18) and .g(t, s) is given in (5.3.38).
Proof Repeating the arguments used in the proofs of Theorems 5.3.1 and 5.3.2 on
the intervals .(ml−1 , ml ], one can show that
( )
. sup |ĉN,1 (t) − c1 (t)| = OP (log N )−ζ
0<t<1
with some .ζ > 0. Thus we follow the proof of Theorem 5.3.3 to establish
Theorem 5.3.4. ⨆
⨅
It may be shown that if the alternative in (5.3.3) is satisfied, such that
N 1/2 |β0 − βA | → ∞,
. (5.3.39)
where here .βA is allowed to depend on N and may converge to .β0 , then
1 P
. sup |Q̄N (t)| → ∞. (5.3.40)
0<t<1 w(t)
According to (5.3.40), we may reject the no change null hypothesis .H0 in (5.3.2) at
asymptotic level .α if
D
{Δ(t), 0 ≤ t ≤ 1} = {W (b(t)), 0 ≤ t ≤ 1},
.
where .{W (x), 0 ≤ x < ∞} is a Wiener process. The function .b(t) is unknown, but
can be consistently estimated from the sample by
LN t⎦
⎛ ⎞2
1 Σ yi − β̂N,1 yi−1
.b̂N (t) = , 0 ≤ t ≤ 1.
N
i=2
1 + yi−1
2
with some .ζ > 0. If .|β0 − βA | is bounded, as .N → ∞, then there is a function .b∗ (t)
such that
where .Ôi (t) = ĉN,2 (t)Wi (b̂N (t)) − ĉN,1 (t)(Wi (b̂N (1)) − Wi (b̂N (t))). Let .cN,L (α)
be defined as
This means that our procedure provides a test with correct asymptotic size and we
reject the null hypothesis with probability going to 1 under the alternative.
We may also consider Cramér–von Mises type tests, and approximations of their
distributions using principal components as in Sect. 3.3 (cf. (3.3.4) and (3.3.7)). The
consistency of these procedures under multiple changes and heteroscedasticity may
also be established as in Sect. 3.3.
A multitude of time series models have been put forward to capture “volatility” or
“conditional heteroscedasticity” in a time series. These terms refer to the phenomena
when a, often presumed stationary, time series .{yi , i ∈ Z} has the property that the
conditional variance of the process .σi2 = var(yi |Fi−1 ) changes as a function of i,
for a suitably defined information filtration .Fi . In this section we study change point
detection procedures for models of this phenomena. Such models are frequently
applied in financial applications to model the returns on asset prices.
Perhaps the simplest volatility model is the Autoregressive Conditionally Het-
eroscedastic model of order one (ARCH(1)). We provide detailed results on this
volatility process, mainly to explain and highlight the theoretical and computational
issues when we deal with changes in volatility models. The ARCH(1) with a
potential change point in the parameters is defined by the recursions
yi = σi Ei ,
. i ≥ 1, (5.4.1)
and
σi2 = ωi + αi yi−1
.
2
, i ≥ 1, (5.4.2)
starting from an initial value .y0 . According to the definition, the conditional
expected value of .yi , conditioned on .{yj , j < i} depends on .yi−1
2 , so the process is
H0 : (ω1 , α1 ) = . . . = (ωN , αN )
. (5.4.3)
5.4 ARCH, GARCH and Other Volatility Processes 263
yi = σi Ei ,
. i≥1 (5.4.5)
and
σi2 = ω0 + α0 yi−1
.
2
, i ≥ 1. (5.4.6)
It would be natural to use least squares to estimate the parameters, but the
consistency of such estimators requires that .Eyi4 < ∞ (cf. Francq and Zakoian,
2010, Chapter 7). It is often unappealing to assume the existence of higher order
moments in many applications, such as when employing such models for financial
data. As suggested in Francq and Zakoian (2010) (Section 7.1.2), it is often better
to use weighted least squares and quasi–likelihood estimators. These are similar to
the RCA.(1) parameter estimates discussed in Sect. 5.3.
The weighted least squares estimates .θ̂ N = (ω̂N , α̂N )T are the solutions of the
equations
Σ
N
yi2 − ω̂N − α̂N yi−1
2
. =0 (5.4.7)
i=1
(ω̂N + α̂N yi−1
2 )2
and
Σ
N
(yi2 − ω̂N − α̂N yi−1
2 )y 2
i−1
. = 0. (5.4.8)
i=1
(ω̂N + α̂N yi−1
2 )2
We further assume that (5.4.3) admits a stationary and causal solution, which is
implied by the following condition:
Assumption 5.4.3 .−∞ ≤ E log(α0 E02 ) < 0.
If .H0 of (5.4.3) holds, then the processes
Σ
k
yi2 − ω̂N − α̂N yi−1
2
TN,1 (k) =
.
i=1
(ω̂N + α̂N yi−1
2 )2
and
Σ
k
(yi2 − ω̂N − α̂N yi−1
2 )y 2
i−1
TN,2 (k) =
.
i=1
(ω̂N + α̂N yi−1
2 )2
ȳi = σ̄i Ei
. (5.4.9)
with
. σ̄i2 = ω0 + α0 ȳi−1
2
, i ∈ Z, (5.4.10)
and
⎛ ⎞2
D = E E02 − 1 C,
.
where
⎛ ⎡ ⎤ ⎡ ⎤⎞
1 ȳ02
⎜E ,E ⎟
⎜ (ω + α0 ȳ02 )2 (ω + α0 ȳ02 )2 ⎟
.C = ⎜ ⎡ 0 ⎤ ⎡ 0 ⎤⎟ .
⎜ ȳ02 ȳ04 ⎟
⎝ ⎠
E , E
(ω0 + α0 ȳ02 )2 (ω0 + α0 ȳ02 )2
We note that Francq and Zakoian (2004) (p. 146) showed that .C is invertible if .E02
has a non-degenerate distribution.
5.4 ARCH, GARCH and Other Volatility Processes 265
Theorem 5.4.1 We assume that .H0 of (5.4.3), Assumptions 1.2.1 and 5.4.1–5.4.3
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then
1 1 ⎡ ⎤1/2
T −1
. max T N (k)D TN (k)
N 1/2 1≤k<N w(k/N )
D 1 ⎛ 2 ⎞1/2
→ sup B1 (t) + B22 (t) ,
0<t<1 w(t)
for all .x ∈ R, where .a(t) and .b2 (t) are defined in (1.3.9).
The applicability of Theorem 5.4.1 requires the estimation of .D. We refer to
Section 7.1.2 of Francq and Zakoian (2010) where several methods are provided to
estimate .D.
The stationary versions of .TN,1 (k) and .TN,2 (k) are
Σ
k
ȳi2 − ω̂N − α̂N ȳi−1
2
TN,3 (k) =
.
i=1
(ω̂N + α̂N ȳi−1
2 )2
and
Σ
k
(ȳi2 − ω̂N − α̂N ȳi−1
2 )ȳ 2
i−1
. TN,4 (k) = .
i=1
(ω̂N + α̂N ȳi−1
2 )2
Lemma 5.4.1 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then
| |
. max |TN,1 (k) − TN,3 (k)| = OP (1)
1≤k≤N
and
| |
. max |TN,2 (k) − TN,4 (k)| = OP (1).
1≤k≤N
266 5 Parameter Changes in Time Series Models
σ12 = ω0 + α0 y02 .
Under Assumption 5.4.3, this relation has an almost sure limit as .m → ∞, which
defines a stationary solution to the ARCH(1) recursion of the form
⎡ ⎤
∞ | |
Σ l
σ̄i2 = ω0 ⎣1 +
.
2
(α0 Ei−j )⎦ .
l=1 j =1
The function .M(t) = E(α0 E02 )t exists for all .t ≤ 2. Since .M(0) = 1 and .M ' (0+) <
0 by Assumption 5.4.3, we get that there is .κ > 0 such that
and
⎛ ⎞κ
| |
i−1 ⎛ ⎞
E ⎝σ12
.
2
(α0 Ei−j )⎠ = O ρ1i .
j =1
and
| |
| 2 |
. |yi − ȳi2 | = O(ρ2i ) a.s. (5.4.13)
5.4 ARCH, GARCH and Other Volatility Processes 267
with some .0 < ρ2 < 1. The result now follows from elementary arguments using
the mean–value theorem to interchange .yi with .ȳi in the summands defining .TN,2
and .TN,3 , which we omit. ⨆
⨅
Let .θ 0 = (ω0 , α0 )T . We also define the vector
⎛ ⎞
Σ
k
Ei2 − 1
⎜ 2 )2 ⎟
⎜ i=1 (ω0 + α0 ȳi−1 ⎟
.e(k) = ⎜ ⎟. (5.4.14)
⎜Σ k
(Ei − 1)ȳi−1 ⎟
2 2
⎝ ⎠
i=1
(ω0 + α0 ȳi−1 )
2 2
Lemma 5.4.2 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then
C−1
θ̂ N − θ 0 =
. e(N ) + RN
N
with
⎛ ⎞
1
. ||RN || = OP . (5.4.15)
N
and
⎛ ⎞
Σ
N 2 )γ1
(ȳi−1 2 )γ1
(ȳi−1
. ( )2γ2 + 2γ
= OP (N )
i=1 ω̂N + α̂N ȳi−1
2 σi−12
for all .0 ≤ γ1 ≤ γ2 . Now by (5.4.13) and (5.4.16), the Eqs. (5.4.7) and (5.4.8) can
be written as
Σ
N
Ei2 − 1 ( )Σ
N
1
. + ω0 − ω̂N
ω + α0 ȳi−1
i=1 0
2
i=1
(ω0 + α0 ȳi−1
2 )2
( )Σ
N 2
ȳi−1
+ α0 − α̂N + RN,1 = 0
i=1
(ω0 + α0 ȳi−1
2 )2
268 5 Parameter Changes in Time Series Models
and
Σ
N
(Ei2 − 1)ȳi−1
2 ( )Σ
N 2
ȳi−1
. + ω0 − ω̂N
i=1
ω0 + α0 ȳi−1
2
i=1
(ω0 + α0 ȳi−1
2 )2
( )Σ
N 4
ȳi−1
+ α0 − α̂N + RN,2 = 0
i=1
(ω0 + α0 ȳi−1
2 )2
with
|RN,1 | = OP (1)
. and |RN,2 | = OP (1).
Let
⎛ ⎞
Σ
N
1 ΣN 2
ȳi−1
⎜ , 2 )2 ⎟
⎜ i=1 (ω0 + α0 ȳi−1
2 )2 (ω0 + α0 ȳi−1 ⎟
.CN = ⎜ i=1 ⎟.
⎜Σ N 2 ΣN 4 ⎟
⎝ ȳi−1 ȳi−1 ⎠
,
i=1
(ω0 + α0 ȳi−1 ) i=1 (ω0 + α0 ȳi−1 )
2 2 2 2
Next we show
|| || ⎛ ⎞
|| 1 ||
|| C − C|| = OP N −1/2 . (5.4.17)
.
|| N N ||
We observe (see Example A.1.2) that .ȳi as well as .σ̄i2 are .Lν –decomposable. Let
⎡ ⎤ ⎡ ⎤
Σ
k | |
l ∞ | |
Σ l
2
σi,k
. = ω0 ⎣1 + 2
(α0 Ei−j )⎦ + ω0 ⎣1 + ∗
(α0 (Ei−j,k )2 )⎦ ,
l=1 j =1 l=k+1 j =1
and
| |κ
| 2 |
E |ȳi2 − yi,k
. | ≤ c1 ρ1k ,
k/(2κ)
where we used .K = ρ1 . Thus we further have that .{1/σ̄i2 , i ∈ Z} is .Lν –
decomposable with the approximating coefficients .vm in Definition (1.1.1) decaying
geometrically. Theorem A.1.1 yields the normality of the sum of .1/σ̄i2 , 1 ≤ i ≤ N
and therefore
|N ⎡ ⎤|
|Σ 1 || ⎛ ⎞
| 1
.| − E 2 | = OP N 1/2 . (5.4.18)
| σ̄i2 σ̄0 |
i=1
The same argument can be used for the other elements of .CN and therefore (5.4.17)
is established. Similarly to (5.4.18) we also have
|N |
|Σ Ei2 − 1 | ⎛ ⎞
| |
.| | = O P N 1/2
and
| 2 )2 |
(ω0 + α0 ȳi−1
i=1
|N |
|Σ (E 2 − 1)ȳ 2 | ⎛ ⎞
| i i−1 |
| | = O P N 1/2
.
| 2 )2 |
(ω0 + α0 ȳi−1
i=1
| k |
|Σ (E 2 − 1)ȳ 2 |
1 | i i−1 |
. max | − WN,1,2 (k) | = OP (1), (5.4.20)
1≤k≤N/2 k ζ | ω + α0 ȳi−1
2 |
i=1 0
| N |
| Σ Ei2 − 1 |
1 | |
. max | − WN,2,1 (N − k) | = OP (1) (5.4.21)
N/2≤k<N (N − k)ζ | ω0 + α0 ȳ 2
i−1
|
i=k+1
and
| N |
| Σ (E 2 − 1)ȳ 2 |
1 | i i−1 |
. max | − WN,2,2 (N − k) | = OP (1) (5.4.22)
1≤k≤N/2 (N − k)ζ | ω0 + α0 ȳ 2
i−1
|
i=k+1
with some .ζ < 1/2, .EWN,1 (u) = EWN,2 (u) = 0 and .EWN,1 (u)WT
N,1 (v) =
T
EWN,2 (u)WN,2 (v) = E(E0 − 1) C min(u, v).
2 2
Proof We showed in the proof of Lemma 5.4.2 that the summands in (5.4.19) and
(5.4.20) are .Lν –decomposable with coefficients .vm in Definition 1.1.1 decaying
geometrically. Hence the result follows from Theorem A.1.3. ⨆
⨅
Proof of Theorem 5.4.1 Let .TN (k) = (TN,3 (k), TN,4 (k))T . Using two term Taylor
expansion with Lemmas 5.4.1 and 5.4.2 we get
|| ⎛ ⎞||
N || ||
||TN (k) − e(k) − k e(N ) || = OP (1),
. max || ||
1≤k≤N k N
and therefore
|| ⎛ ⎞||
1 || ||
||TN (k) − e(k) − k e(N ) || = OP (1)
. max || ||
1≤k≤N k ζ1 N
Theorem 5.4.2 We assume that .H0 of (5.4.3), Assumptions 1.2.1, 5.4.1, 5.4.2 hold
and
. E log(α0 E02 ) /= 0.
then
∞
Σ
e−S(i) σi2 →
. e−S(l) + σ12 , a.s. (i → ∞)
l=1
where
Σ
l
.S(l) = log(α0 Ej2 ).
j =1
272 5 Parameter Changes in Time Series Models
Proof of Theorem 5.4.2 The result follows from the proof of Theorem 5.4.1 when
−∞ ≤ E log(α0 E02 ) < 0. Lemma 5.4.4 yields that
.
| ⎛ k ⎞|
| Σ E2 − 1 Σk 2−1 |
| k E |
. max |TN,2 (k) −
i
− i
| = OP (1).
1≤k≤N | 2
α0 N 2
α0 |
i=1 i=1
Since Assumption 5.4.1 holds, the approximation for partial sums of independent
random variables and the results in Sects. 1.2 and 1.2.1 yield the result in the
explosive case. ⨆
⨅
The parameter .σ 2 is unknown in Theorem 5.4.2, and applying the result to
conduct change point analysis requires its estimation. We use
P
σ̂N2 → σ 2 .
.
Proof We write
Σ
N
(yi2 − ω̂N − α̂N yi−1
2 )2 y 4
i−1
.
i=1
(ω0 + α0 yi−1
2 )4
Σ
N
((Ei2 − 1)(ω0 + α0 yi−1
2 ) − (ω − ω̂ ) − (α − α̂ )y 2 )2 y 4
0 N 0 N i−1 i−1
=
i=1
(ω̂N + α̂N yi−1
2 )4
Σ
N
(Ei2 − 1)yi−1
4
= + RN .
i=1
(ω0 + α0 yi−1
2 )2
We assume that .−∞ ≤ E log(α0 E02 ) < 0. It follows from Lemmas 5.4.1 and 5.4.2
|
N | 2
|
Σ |
| (Ei − 1) (ω0 + α0 yi−1 ) yi−1 (Ei2 − 1)2 ȳi−1
2 2 2 4 4
|
. | − | = OP (1)
| (ω̂N + α̂N yi−1 )
2 4 (ω0 + α0 ȳi−1 ) |
2 2
i=1
5.4 ARCH, GARCH and Other Volatility Processes 273
By (5.4.16)
Σ
N
(ω0 − ω̂N )2 yi−1
4
. = OP (1),
i=1
(ω̂N + α̂N yi−1
2 )4
and
Σ
N
(α0 − α̂N )2 yi−1
8
. = OP (1).
i=1
(ω̂N + α̂N yi−1
2 )4
Thus we conclude
RN
. = oP (1). (5.4.23)
N
Next we assume that .E log(α0 E02 ) > 0. Using Lemma 5.4.4 we get
|
N | 2
|
Σ | (Ei − 1) (ω0 + α0 yi−1 ) yi−1
2 2 2 4
(Ei2 − 1)2 ||
. | − | = OP (1)
| (ω̂N + α̂N yi−1
2 )4 α04 |
i=1
1 Σ (Ei2 − 1)2 P 2
N
. →σ .
N
i=1
α04
Σ
N ∞
Σ
(α0 − α̂N )2 yi−1
8 8
yi−1
. = (α0 − α̂N )2 = OP (1)
i=1
(ω̂N + α̂N yi−1
2 )4
i=1
(ω̂N + α̂N yi−1
2 )4
since
and
∞
Σ 8 ∞
Σ 8
yi−1 yi−1
. ≤ < ∞,
i=1
(ω̂N + α̂N yi−1
2 )4
i=1
(δ + δyi−1
2 )4
where the weighted least squares are maximized on the set .[δ, 1/δ]2 , 0 < δ < 1.
We showed in the proof of Theorem 5.4.2 that
| | ⎛ ⎞
. |α̂N − α0 | = OP N −1/2
and therefore
Σ
N
(α0 − α̂N )2 yi−1
8
. = OP (1),
i=1
(ω̂N + α̂N yi−1
2 )4
since
1 Σ
N 8
yi−1 P
. → 1.
N
i=1
(ω̂N + α̂N yi−1 )
2 4
provide the tools to do so. Due to the weighted approximations in Lemma 5.4.5,
the results in Chap. 1 can be formulated for these differences. The proofs of
the following two lemmas can be established along the lines of the proofs of
Lemmas 5.4.1, 5.4.2 and 5.4.4.
Lemma 5.4.5 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then
1 ||
||
⎛ ⎞
−1
||
||
. max ||k θ̂ k,1 − θ 0 − C e(k) || = OP (1)
1≤k≤N kζ
and
|| ⎛ ⎞ ||
1 || −1 ||
. max ||(N − k) θ̂ k,2 − θ 0 − C (e(N ) − e(k)) || = OP (1)
1≤k<N (N − k)ζ
Lemma 5.4.6 If .H0 of (5.4.3), Assumptions 5.4.1, 5.4.2 are satisfied and
then
| |
| Σ
k 2 − 1|
1 | Ei−1 |
. max |k(α̂k,1 − α0 ) − | = OP (1)
1≤k≤N k ζ | α0 |
2
i=1
and
| |
| ΣN 2 − 1|
1 | Ei−1 |
. max |(N − k)(α̂k,2 − α0 ) − | = OP (1)
1≤k<N (N − k)ζ | α 2
0
|
i=k+1
Σ
N
yi2
. li (θ ) with li (θ ) = 2
+ log σi2 (θ),
i=1
σi (θ )
The quasi–likelihood estimates .θ̃ N = (ω̃N , α̃N )T are defined as the solution of a
minimization problem,
⎧N ⎞
Σ
θ̃ N = argmax
. li (θ ) : θ ∈ [δ, 1/δ] 2
,
i=1
where .0 < δ < 1 is a small tuning parameter. The quasi maximum likelihood
estimate satisfies the equation that
Σ
N
∂li (θ̃ N )
. = 0.
∂θ
i=1
276 5 Parameter Changes in Time Series Models
Since
∂li (θ ) y2 1 y 2 − σ 2 (θ)
. = − 4i + 2 = i 4 i
∂ω σi (θ ) σi (θ) σi (θ )
and
we obtain that .θ̃ N = θ̂ N . As such the methods based on weighted least squares are
in fact equivalent to those based on quasi–maximum–likelihood estimation.
These approaches may be extended to ARCH models of order p (ARCH.(p)),
as well as Generalized ARCH processes of orders p and q (GARCH(.p, q)). When
there are no changes in the model parameters, the ARCH(p) model takes the form
yi = σi Ei ,
. i≥1
and
Σ
p
σi2 = ω0 +
.
2
α0,l yi−l , i ≥ 1,
l=1
Σ
p Σ
q
σi2 = ω0 +
.
2
α0,l yi−l + 2
β0,j σi−j , i ≥ 1,
l=1 j =1
We provide the details for the GARCH(1,1) process to illustrate how we can
test for parameter stability in models of this variety. The GARCH(1,1) process with
varying parameters is defined via the equations:
yi = σi Ei ,
. i≥1 (5.4.24)
and
σi2 = ωi + αi yi−1
.
2
+ βi σi−1
2
, i ≥ 1. (5.4.25)
Under the null hypothesis that the parameters are homogeneous with respect to time:
H0 : (ω1 , α1 , β1 ) = · · · = (ωN , αN , βN ).
. (5.4.26)
5.4 ARCH, GARCH and Other Volatility Processes 277
σi2 = ω0 + α0 yi−1
.
2
+ β0 σi−1
2
, i ≥ 1. (5.4.27)
Σ
N
yi2
. li (θ) with li (θ ) = + log σi2 (θ ), (5.4.28)
i=1
σi2 (θ )
σi2 (θ ) = ω + αyi−1
.
2
+ βσi−1
2
(θ ), i ≥ 1. (5.4.29)
with some .0 < δ < 1, small enough. Since we wish to have a strictly positive
conditional variance process, we impose
Assumption 5.4.4 .ω0 > 0, α0 > 0 and .β0 > 0.
The .log likelihood function is smooth, and therefore
Σ
N
∂li (θ̂ N ) ⎛ ⎞T
. = 0, θ̂ N = ω̂N , α̂N , β̂N , (5.4.30)
∂θ
i=1
Σ
k
∂li (θ̂ N )
ZN (k) =
. , 1 ≤ k ≤ N.
∂θ
i=1
ȳi = σ̄i Ei ,
. i ∈ Z, (5.4.31)
278 5 Parameter Changes in Time Series Models
and
. σ̄i2 = ω0 + α0 ȳi−1
2
+ β0 σ̄i−1
2
, i ∈ Z. (5.4.32)
The necessary and sufficient condition for the existence of the unique, causal,
stationary solution is characterized by the following assumption.
Assumption 5.4.5 . E log(β0 + α0 E02 ) < 0.
The analogue of .li (θ ) using the stationary sequence is
ȳi2
.l̄i (θ ) = + log σ̄i2 (θ )
σ̄i2 (θ )
with
Berkes et al. (2003) and Francq and Zakoian (2010) proved that .G is nonsingular
under minor conditions.
Theorem 5.4.4 We assume that .H0 of (5.4.26), Assumptions 1.2.1, 5.4.1 and 5.4.4
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then
1 1 ⎛ ⎞1/2
T −1
. max Z (k)G ZN (k)
N 1/2 1≤k<N w(k/N ) N
D 1 ⎛ 2 ⎞1/2
→ sup B1 (t) + B22 (t) + B32 (t) ,
0<t<1 w(t)
where .{B1 (t), 0 ≤ t ≤ 1}, {B2 (t), 0 ≤ t ≤ 1} and .{B3 (t), 0 ≤ t ≤ 1} are
independent Brownian bridges.
(ii) Also,
⎧ ⎛ ⎞1/2 ⎛
N ⎞1/2
. lim P a(log N) max ZT
N (k)G−1
ZN (k)
N →∞ 1≤k<N k(N − k)
⎞
≤ x + b3 (log N) = exp(−2e−x )
for all .x ∈ R, where .a(t) and .b3 (t) are defined in (1.3.9).
5.4 ARCH, GARCH and Other Volatility Processes 279
Σ
N
∂ l̄i (θ̄ N )
. = 0.
∂θ
i=1
Let
⎛ ⎞
∂ 2 l̄0 (θ 0 )
.J=E .
∂θ 2
The existence and invertibility of .J is established in Berkes et al. (2003) (see also
Francq and Zakoian, 2010).
Lemma 5.4.7 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
⎛ ⎞
1 Σ ∂ l̄(θ 0 )
N
−1 1
.θ̂ N − θ 0 = −J + OP .
N ∂θ N
i=1
Proof The proof is rather technical so we only provide an outline. A detailed argu-
ment is given in Francq and Zakoian (2010) for general GARCH(.p, q) processes.
The unique stationary and causal solution to the GARCH(1,1) recursion for the
conditional variance is
⎡ ⎤
Σ∞ | |
l
.σ̄i = ω0 ⎣1 + (β0 + α0 Ei−j )⎦ .
2 2
(5.4.33)
l=1 j =1
As in the proof in Lemma 5.4.1, Assumption 5.4.5 implies that there is .κ > 0 such
that
⎛ ⎞κ
.ρ1 = E β0 + α0 E0
2
< 1.
which yields
| |κ
| |
. E |yi2 − ȳi2 | ≤ 2c1 ρ1k .
E|σ̄i2 − σi,k
.
2 κ
| ≤ 2c1 ρ1k
which means that .σi2 , as well as .yi2 are .Lν –decomposable. Berkes et al. (2003)
obtained the representation
∞
Σ
σ̄i2 (θ) = d0 (θ ) +
.
2
dj (θ )yi−j
j =1
j
. sup |dj (θ )| ≤ c2 ρ2 ,
θ ∈O0
then
⎛ ⎞κ
⎛ ⎞
E
. sup |σ̄i2 (θ ) − σ̄i,k
2
(θ)| = O ρ3k .
θ∈O0
5.4 ARCH, GARCH and Other Volatility Processes 281
since
|| N ||
||Σ ∂ l̄i (θ 0 ) || ⎛ ⎞
|| ||
. || || = OP N 1/2 . (5.4.35)
|| ∂θ ||
i=1
The proof of (5.4.35) is given in Berkes et al. (2003) and in Section 7.4 of Francq
and Zakoian (2010). Another proof of (5.4.35) can be based on the decomposability
of .σ̄i2 (θ ). Using the arguments in Lemma 5.2 of Berkes et al. (2003) one can verify
that for all .ν̄ < ν
| |
| ∂ l̄i (θ 0 ) ∂li,k (θ 0 ) |ν̄
| | ≤ c4 ρ k ,
.E
| ∂θ − ∂θ | 4 (5.4.36)
2
yi,k
li,k (θ ) =
.
2 (θ )
+ log σ̄i,k
2
(θ ).
σi,k
The proof of (5.4.37) follows from the martingale properties of .∂ 2 l̄i (θ 0 )/∂θ 2 , and
also from the fact that this sequence if .Lν –decomposable. Thus we get
⎛ ⎞
1 Σ ∂ l̄(θ 0 )
N
−1 1
θ̄ N − θ 0 = −J
. + OP . (5.4.38)
N ∂θ N
i=1
The result in (5.4.34) yields that .|σi2 − σ̄i2 | ≤ c5 ρ5i , for some .c5 > 0, .0 < ρ5 < 1,
and this along with (5.4.38) implies the result. ⨅
⨆
Lemma 5.4.8 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
|| k ⎡ k ⎤||
||Σ ∂ l̄ (θ̂ ) Σ ∂ l̄i (θ 0 ) k Σ ∂ l̄i (θ 0 ) ||
N
1 || i N ||
. max || − − || = OP (1)
1≤k≤N k ζ || ∂θ ∂θ N ∂θ ||
i=1 i=1 i=1
and
|| N ⎡ N ⎤||
|| Σ ∂ l̄ (θ̂ ) Σ ∂ l̄i (θ 0 ) N − k Σ
N
∂ l̄i (θ 0 ) ||
1 || i N ||
. max || − − ||
1≤k≤N (N − k)ζ || ∂θ ∂θ N ∂θ ||
i=k+1 i=k+1 i=1
= OP (1)
The claim in (5.4.39) is proven, in detail, in Francq and Zakoian (2010) (Section
7.4). The result in (5.4.39) implies
|| k ||
||Σ ∂l(θ̂ ) Σk
∂ l̄(θ̂ N ) ||
|| N ||
. max || − || = OP (1).
1≤k≤N || ∂θ ∂θ ||
i=1 i=1
and
⎛ ⎞
1 || ||
||Rk,1 || = OP 1
. max .
1≤k≤N k N
5.4 ARCH, GARCH and Other Volatility Processes 283
Thus we conclude
Σ
k
∂ l̄i (θ 0 ) k Σ ∂ l̄i (θ 0 )
k
ZN (k) =
. − + Rk,2 ,
∂θ N ∂θ
i=1 i=1
with some .ζ < 1/2. The proof of the second part is the same. ⨆
⨅
Lemma 5.4.9 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
we can define independent Gaussian processes .{WN,1 (t), 0 ≤ t ≤ N/2} and
.{WN,2 (t), 0 ≤ t ≤ N/2}, such that .EWN,1 (t) = EWN,2 (t) = 0, EWN,1 (t)
WT T
N,1 (s) = EWN,2 (t)WN,2 (s) = G min(t, s),
|| k ||
||Σ ∂ l̄ (θ ) ||
1 || i 0 ||
. max || − WN,1 (k)|| = OP (1)
1≤k≤N/2 k ζ || ∂θ ||
i=1
and
|| N ||
|| Σ ∂ l̄ (θ ) ||
1 || i 0 ||
. max || − WN,2 (N − k)|| = OP (1)
N/2≤k<N (N − k)ζ || ∂θ ||
i=k+1
The covariance matrix .G may be estimated as outline in Sect. 3.1.1, since the
∂ l̄i (θ 0 )/∂θ ’s are uncorrelated random vectors with mean .0. Hence
.
⎛ ⎞⎛ ⎞T
1 Σ ∂li (θ̂ N )
N
∂li (θ̂ N )
.ĜN =
N ∂θ ∂θ
i=1
Theorem 5.4.5 If .H0 of (5.4.26), Assumptions 1.2.1, 5.4.1 and 5.4.4 are satisfied,
then
1 1/2 D 1 ⎛ 2 ⎞1/2
. max LN (k)) → sup B1 (t) + B22 (t) + B32 (t) ,
1≤k<N w(k/N ) 0<t<1 w(t)
= exp(−2e−x )
for all .x ∈ R, where .a(t) and .b3 (t) are defined in (1.3.9).
This result may be established easily from the following linearization of .θ̂ k,1 and
θ̂ k,2 , which may be established along the lines of Lemma 5.4.7. For more details
.
and
|| ||
|| ⎛ ⎞ Σ
N
∂ l̄i (θ 0 ) ||
1 || −1 ||
. max ||(N − k) θ0 − θ̂k,2 − J || = OP (1),
1≤k<N (N − k)ζ || ∂θ ||
i=k+1
Correspondingly, let
k ⎛
Σ ⎞
∂li (θ ) ∂li (θ ) T
.r̄k (θ ) = , , 1 ≤ k ≤ N.
∂α ∂β
i=1
(1)
where .θ̂ N is the quasi–likelihood estimator in (5.4.30). A test statistic for .H0 may
be based on the functionals of
Theorem 5.4.6 We assume that .H0 of (5.4.26), Assumptions 1.2.1, 5.4.4 and 5.4.6
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then
1 1 ⎛ ⎞1/2
T −1
. max Z̄N (k)F̂N Z̄ N (k)
N 1/2 1≤k<N w(k/N )
D 1 ⎛ 2 ⎞1/2
→ sup B1 (t) + B22 (t) ,
0<t<1 w(t)
(ii) Also,
⎧ ⎛ ⎞1/2 ⎛
N ⎞1/2
−1
. lim P a(log N) max Z̄T
N (k)F̂N Z̄N (k)
N →∞ 1≤k<N (k(N − k))1/2
⎞
≤ x + b2 (log N) = exp(−2e−x )
for all .x ∈ R, where .a(t) and .b2 (t) are defined in (1.3.9).
Towards proving Theorem 5.4.6, we start with an elementary lemma on the
properties of the volatilities under Assumption 5.4.6. We use, as before, .σi2 for
2
.σ (θ0 ). Let
i
Σ
j
S(j ) =
. log(β0 + α0 Ek2 ), S(0) = 0
k=1
and
∞
Σ
R = ω0
. exp(−S(k)) + σ02 (β0 + α0 E02 ).
k=1
It follows from Assumptions 5.4.4 and 5.4.6 that .R < ∞ almost surely and .R ≥
ω0 > 0.
Lemma 5.4.11 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then
| | | |
i l−1 | |
i−1
σi2 = ω0
. +σ02 (β0 + α0 El2 )
l=1 j =1 l=0
Σ
i
= ω0 exp (S(i − 1) − S(i − l)) + σ02 (β0 + α0 El2 ) exp(S(i − 1))
l=1
| |
(
. ∅ = 1). The law of large numbers and Assumption 5.4.6 imply
Σ
i Σ
i−1
. exp (−S(i − l)) = exp (−S(k)) → R a.s.,
l=1 k=0
with
1 ∂σi2 (θ ) 1 ∂σ 2 (θ)
pi,1 (θ ) =
.
2
and pi,1 (θ) = 2 i .
σi α σi β
∞
Σ ∞
1 | | Σ 1 | |
j j
β0 β0
zi,1 =
.
2
Ei−j and zi,2 = .
j =1
β0 β + α0 Ei−k
k=1 0
2
j =1
β0 β + α0 Ei−k
k=1 0
2
Since asymptotically .ω0 disappears from the model, we use .θ̄ 0 = (α0 , β0 ).
Lemma 5.4.12 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then
for all .ω̄ > 1
| |
. sup |pi,1 (θ̄ 0 , ω) − zi,1 | = O(1) a.s. (5.4.42)
1/ω̄≤ω≤ω̄
and
| |
. sup |pi,2 (θ̄ 0 , ω) − zi,2 | = O(1) a.s. (5.4.43)
1/ω̄≤ω≤ω̄
Σ
i−1 Σ
i
σi2 (θ ) = ω
. βk + α 2
β l−1 yi−l + β i σ02
k=0 l=1
Σ
i−1 Σ
i
σi2 = σi2 (θ0 ) = ω0
. β0k + α0 β0l−1 yi−l
2
+ β0i σ02
k=0 l=1
288 5 Parameter Changes in Time Series Models
and
Σ
i−1 Σ
i
σi2 (θ̄0 , ω) = ω
. β0k + α0 β0l−1 yi−l
2
+ β0i σ02 .
k=0 l=1
Since .E log(β0 +α0 E02 ) > log β0 , using again Assumption 5.4.6 we get from Lemma
5.4.11 that
| |
| |
. exp (−S(i − 1)) sup |σi2 (θ̄ 0 , ω) − σi2 (θ 0 )| = o(1) a.s. (5.4.44)
1/ω̄≤ω≤ω̄
1 Σ1 2 Σ | |
i i 2 j 2 (θ )
j −1
yi−j σi−k
pi,1 (θ) =
. yi−j = β
σi2 (θ ) j =1 β j =1
2 (θ )
σi−j σ2
k=1 i−k+1
(θ)
and therefore
Σ i
1
2
yi−j | |
j 2 (θ̄ , ω)
β0 σi−k 0
.pi,1 (θ̄ 0 , ω) = .
2 (θ̄ , ω)
β0 σi−j σ 2 (θ̄ , ω)
j =1 0 k=1 i−k+1 0
Since
2 (θ̄ , ω)
β0 σi−k 2 (θ̄ , ω)
β0 σi−k
0 0 1
. = ≤ ≤ 1,
2
σi−k+1 (θ̄ 0 , ω) ω + α0 yi−k
2 + β0 σi−k
2 (θ̄ , ω)
0
2
α0 Ei−k + β0
(5.4.44) implies
| | j
Σ
i/2 | 2
yi−j | | | σ 2 (θ̄ , ω)
j −1 | 2 | i−k 0
. β0 | − Ei−j |
| 2 (θ̄ , ω)
σi−j 0 | 2
σi−k+1 (θ̄ 0 , ω)
j =1 k=1
| 2 |
1 Σ 2
i/2 | σ (θ 0 ) | ⎛ ⎞
| i−j |
≤ Ei−j | 2 − 1| = OP ρ2i ,
β0 | σi−j (θ̄ 0 , ω) |
j =1
5.4 ARCH, GARCH and Other Volatility Processes 289
Σ
i
j
= ρ3
j =i/2
Σ
i | |
j 2 (θ̄ , ω)
β0 σi−k 0
⎛ ⎞
.
2
= O ρ4i a.s.
j =i/2 k=1
σi−k+1 (θ̄ 0 , ω)
2 2 (θ ) 2
yi−j σi−j 0 yi−j σl2 (θ 0 )
. = Ei−j
2
≤ ≤ sup sup
σi−j (θ̄ 0 , ω) σi−j (θ̄ 0 , ω) σi−j (θ̄0 , ω) 1≤l<∞ 1/ω̄≤ω≤ω̄ σl (θ̄0 , ω)
and
σl2 (θ 0 )
. sup sup = O(1) a.s.
1≤l<∞ 1/ω̄≤ω≤ω̄ σl (θ̄0 , ω)
j |
| |
Σ 2 (θ̄ , ω) |
| β0 σi−k 0 β0 |
≤ | − 2 + β ||
| ω + (α0 Ei−k
2 + β )σ 2 (θ̄ , ω)
0 i−k 0 α 0 E i−k 0
k=1
Σ
j
1
≤ β0 ω .
σ 2 (θ̄ , ω)
k=1 i−k 0
Σ
i/2 Σ
j
=O(1) 2
Ei−j exp(−S(i − k)) a.s. (5.4.48)
j =1 k=1
=O(ρ6i ) a.s.
Σ
i | |
j
β0 a.s.
.
2
Ei−j = O(ρ7i )
j =i/2 k=1
α0 E 2
i−k + β0
and
Σ
i | |
j 2 (θ̄ , ω)
β0 σi−k 0 a.s.
. sup 2
Ei−j = O(ρ7i ),
1/ω̄≤ω≤ω̄ j =i/2 k=1
ω + (α0 Ei−k
2 + β0 )σi−k
2 (θ̄ , ω)
0
Σ i
1 | |
j 2 (θ̄ , ω)
β0 σi−k 0
.pi,2 (θ̄ 0 , ω) = ,
j =1
β0 ω + (α0 Ei−k + β0 )σi−k
k=1 0
2 2 (θ̄ , ω)
0
Let
⎛ ⎞
Σ
j | |
l Σ∞ | |
j
1 β0 1 ⎝ β0 ⎠
zi,j,1
. = 2
Ei−l 2 +β
+ 2
Ei,i−l,j 2 +β
β0 α E β0 α E
l=1 k=1 0 i−k 0 l=j +1 k=1 0 i−k 0
⎛ ⎞
| |
l
β0
×⎝ ⎠
α E2
r=j +1 0 i,i−r,j
+ β0
and
⎛ ⎞
Σ ∞
1 | | Σ 1 ⎝ | |
j l j
β0 β0 ⎠
zi,j,2
. = +
l=1
β0 α E 2 + β0 l=j +1 β0 k=1 α0 Ei−k
k=1 0 i−k
2 +β
0
⎛ ⎞
| |
l
β0
×⎝ ⎠,
α E2
r=j +1 0 i,i−r,j
+ β0
and
⎛ ⎡ ⎤ν ⎞1/ν
. E (1 − Ei2 )(zi,2 − zi,j,2 ) ≤ cρ j (5.4.50)
⎛ ⎞ 1 Σ∞ ⎛ ⎡ ⎤κ ⎞1/κ
β0
≤ E(1 − E02 )κ E ≤ c1 ρ j ,
β0 α0 E 2 + β0
l=j +1
292 5 Parameter Changes in Time Series Models
for some .0 < ρ < 1 as in the proof of (5.4.42). Hence (5.4.49) is proven due to the
definitions of .zi,1 and .zi,j,1 . The same argument gives (5.4.50). ⨆
⨅
Let
⎛ ⎞
∂ 2 li (θ ) ∂ 2 li (θ )
⎜ ∂α 2 ,
.Qi (θ ) = ⎜ 2
∂α∂β ⎟ ⎟
⎝ ∂ li (θ ) ∂ 2 li (θ ) ⎠
,
∂α∂β ∂β 2
and
⎛ || ||ν ⎞1/ν
. E ||Q̄i − Qi,j || ≤ cρ j , (5.4.52)
= OP (1)
Proof of Theorem 5.4.6 It follows from the proof of Lemma 5.4.13 that .{(1 −
Ei )2 (zi,1 , zi,2 ), i ∈ Z} is .Lν – decomposable, and therefore Theorem A.1.3 can be
applied. Hence Assumption 1.3.2 holds, and so (1.3.8) and Theorem 1.3.1 holds in
this case with the covariance matrix
⎡ ⎤
T
.F = E(1 − E0 ) E (z0,1 , z0,2 ) (z0,1 , z0,2 ) .
2 2
Since .{(1 − Ei )2 (zi,1 , zi,2 ), i ∈ Z} are uncorrelated vectors, the sample covariance
of these vectors can be used as an estimator for .F. However, these vectors are not
observed. The result in Lemma 5.4.14 yields that the sample covariance matrix of
the partial derivatives of .{li (θ̂ N ), 1 ≤ i ≤ N } with respect to .(α, β) can be used.
One can then establish that
|| || ⎛ ⎞
|| || −1/2
. ||F̂N − F|| = oP (log log N) ,
The long–run covariance matrix .F̃N computed from .{r 1 , i ∈ {1, ..., N}}. One may
show that
|| || ⎛ ⎞
|| || −1/2
. ||F̃N − F|| = oP (log log N) ,
in the stationary case, where .G̃ denotes the upper .2 × 2 submatrix of .G. Hence
Theorems 5.4.4 and 5.4.6 imply that if .log(β0 + α02 ) /= 0, then
1 1 ⎛ ⎞1/2 D 1 ⎛ 2 ⎞1/2
−1
. max Z̄T
N (k)F̃N Z̄N (k) → sup B1 (t) + B22 (t) ,
N 1/2 1≤k<N w(k/N ) 0<t<1 w(t)
294 5 Parameter Changes in Time Series Models
if .I (w, c) < ∞ with some .c > 0, where .{B1 (t), 0 ≤ t ≤ t} and .{B2 (t), 0 ≤ t ≤ t}
are independent Brownian bridges. Also,
⎧ ⎛ ⎞1/2 ⎛
N ⎞1/2
−1
. lim P a(log N) max Z̄T
N (k) F̃ Z̄N (k)
N →∞ 1≤k<N (k(N − k))1/2 N
⎞
≤ x + b2 (log N) = exp(−2e−x )
for all .x ∈ R, where .a(t) and .b2 (t) are defined in (1.3.9).
Many of the above results may be generalized to vector valued time series models.
In this section we consider a vector valued time series .Y1 , . . . , YN taking values in
.R , and in particular focus on constructing change point detection tests for vector
d
Yi = O(i)Yi−1 + ei ,
. i ≥ 1,
where .O(i) ∈ Rd×d and .ei ∈ Rd is an independent and identically distributed mean
zero error sequence. Under the no-change null hypothesis,
H0 : O(1) = · · · = O(N ).
. (5.5.1)
We use .O0 to denote the common autoregressive matrix under .H0 . The least squares
estimator, .ÔN of .O0 , minimizes
Σ
N
SN (O) =
. (Yi − OYi−1 )T (Yi − OYi−1 ) .
i=1
This estimator takes the form (see Brockwell and Davis, 2006)
⎛ ⎞−1
Σ
N ΣN
ÔN =
. Yi YT
i−1
⎝ Yi−1 YT
i−1
⎠ .
i=1 j =1
5.5 Vector Autoregressive Models 295
Σ
k
ZN (k) =
. êi YT
i , k ∈ {1, . . . , N }.
i=1
where
denote the residuals. The process .ZN (k) takes values in .Rd×d , but in comparison
to previous results it is easier to state limit results for .ZN (k) when it is viewed as a
vector valued process. We instead consider
where .vec(A) denotes the .vec operator that maps a .d × d matrix to a vector in
dimension .d 2 by stacking its columns (see pg. 311 of Abadir and Magnus (2005)).
We assume the following condition on the innovations in the VAR(1) model.
Assumption 5.5.1 .{ei , i ∈ Z} are independent and identically distributed random
vectors, .Ee0 = 0 and .E||e0 ||ν < ∞ with some .ν > 4.
Let
A = Ee0 eT
. 0
B = E Ȳ0 ȲT
. 0
. C=B⊗A
Theorem 5.5.1 We assume that .H0 of (5.5.1), Assumptions 1.2.1 and 5.5.1–5.5.3
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then
⎛ 2 ⎞1/2
1 1 ⎛ ⎞1/2 D 1 Σd
. max zN (k)C−1 zT
N (k) → sup ⎝ Bi2 (t)⎠ ,
N 1/2 1≤k<N w(k/N ) 0<t<1 w(t) i=1
for all .x ∈ R, where .a(t) and .bd 2 (t) are defined in (1.3.9).
Since in the limit in Theorem 5.5.1(i) is the supremum of the sum of .d 2
independent Brownian bridges, it is worthwhile to look at integral functionals,
which are more stable with respect to moderate to large dimensions .d 2 .
Theorem 5.5.2 If .H0 of (5.5.1), Assumptions 1.2.1, 5.5.1–5.5.3 and
⎛ 1 t (1 − t)
. dt < ∞
0 w(t)
N −1 d ⎛ 1 2 2
1 Σ 1 −1 T D Σ Bi (t)
.
2
zN (k)C zN (k) → dt,
N w(k/N ) 0 w(t)
k=1 i=1
since
|| ||
|| l ||
. ||O || ≤ c2 ||Oq ||Ll/q⎦ ,
Σ
k ∞
Σ
Yi,k =
. Ol ei−l + Ol e∗i−l,k ,
l=0 l=k+1
and
|| N ||
||Σ ⎛ ⎞||
|| T ||
. || Yi YT
i − Ȳi Ȳi || = OP (1). (5.5.5)
|| ||
i=1
where
Σ
k
k Σ
N
ZN,1 (k) =
. Ȳi−1 ȲT
i−1 − Ȳi−1 ȲT
i−1
N
i=1 i=1
and
⎛ ⎞
. ||RN || = OP N −1/2 .
Σ
k Σ
k
.ZN (k) = ei YT
i−1 − (ÔN − O0 ) Yi−1 YT
i−1
i=1 i=1
Σ
k
k Σ
N
= ei YT
i−1 − ei YT
i−1
N
i=1 i=1
⎛N ⎞−1 ⎡⎛ k ⎞
Σ
N Σ Σ
+ ei YT
i−1 Yi−1 YT
i−1 Yi−1 YT
i−1
i=1 i=1 i=1
⎛N ⎞⎤
k Σ T
− Yi−1 Yi−1 .
N
i=1
5.5 Vector Autoregressive Models 299
that
|| N ||
||Σ || ⎛ ⎞
|| T ||
. || ei Ȳi−1 || = OP N 1/2 .
|| ||
i=1
and
|| N ||
|| Σ ||
1 || ||
. max || vec(ei ȲT ) − ┌ N,2 (N − k)|| = OP (1),
N/2≤k<N (N − k)ζ || i−1 ||
i=k+1
with some .ζ < 1/2, .E┌ N,1 (t) = ┌ N,2 (t) = 0 and .E┌ N,1 (t)┌ T
N,1 (s)
= E┌ N,2 (t)┌ T
N,2 (s) = min(t, s)C.
Proof We showed in the proof of Lemma 5.5.1 that .{ei ȲT i−1 , i ∈ Z} is .L –
ν
and
⎛ ⎞⎛ ⎞−1
1 Σ
N Σ
N
Ôk,2
. = Yi YT
i−1 Yi YT
i−1 .
N −k
i=k+1 i=k+1
and
⎛ ⎞1/2+ζ
N
. max ||RN (k)|| = OP (1)
1≤k<N k(N − k)
with some .ζ > 0. Hence the results in Theorem 5.5.1 can be extended to
k(N − k) ⎛ ⎞
vN (k) =
. Ôk,1 − Ôk,2 .
N
1 Σ T 1 Σ
N N
ĈN = B̂N ⊗ ÂN ,
. with ÂN = êi êi and B̂N = Yi YT
i .
N N
i=1 i=1
In Sect. 3.2, we discussed change point detection methods for the variance and
covariance matrices. In this section we extend those results to conditionally het-
eroscedastic processes for which we are interested in evaluating for changes in their
correlation structure. We consider again vector valued observations .Y1 , . . . , YN
taking values in .Rd . We assume that these variables are centered.
5.6 Multivariate Volatility Models 301
⎛ ⎞
τi2 (j ) = E yi2 (j )|Fj −1 .
.
The “devolatized” observations are denoted .Y∗i = (yi∗ (1), . . . , yi∗ (d))T with
yi (j )
yi∗ (j ) =
. , j ∈ {1, . . . , d}.
τi (j )
Similarly to Sect. 5.4, we assume the .Yi series evolves according to a multivariate
GARCH–type model, so that
1/2
.Yi = E i ei ,
Assumption 5.6.5 .E||Y0 ||ν < ∞ with some .ν > 4 and .{Yi , i ∈ Z} is .Lr –
decomposable with some .r > 2.
302 5 Parameter Changes in Time Series Models
Let
H0 : ρ1 (k, l) = . . . = ρN (k, l)
. for all k, l ∈ {1, . . . , d}, (5.6.1)
such that ρ1 (j ∗ , l∗ ) = · · · = ρk ∗ (j ∗ , l∗ )
/= ρk ∗ +1 (j ∗ , l∗ ) = · · · = ρN (j ∗ , l∗ ).
Under the null hypothesis the covariance matrix of .Y∗i = (yi∗ (1), . . . , yi∗ (d))T does
not depend on time index i, while under the alternative at least one of the elements
of the covariance matrix changes at an unknown time .k ∗ . In essence then we wish
to perform change point analysis for the mean of the matrices .Y∗i (Y∗i )T . Since these
matrices are symmetric, we may characterize them by .d(d + 1)/2 dimensional
vectors using the .vech operator which stacks the columns of a symmetric matrix
starting with the diagonal into a vector.
To formulate a CUSUM process of such vectors, we introduce
Σ
j
s(j ) =
. ri , s(0) = 0
i=1
with
Assuming that .H0 of (5.6.1) holds, we can define the long–run covariance matrix
∞
Σ
D=
. Er0 rT
l.
l=−∞
We show in the proofs that under Assumption 5.6.5 the infinite sum defining .D
is absolutely convergent. The normalization in our test statistics requires that this
matrix is non–singular.
5.6 Multivariate Volatility Models 303
and
N ⎛ ⎞T ⎛ ⎞
(2) 1 Σ i −1 i
.M
N = 2 s(i) − s(N ) D s(i) − s(N ) .
N N N
i=1
Theorem 5.6.1 If .H0 of (5.6.1) and Assumption 5.6.1–5.6.6 are satisfied, then
(1) D Σ
d(d+1)/2
MN
. → sup Bi2 (t),
0<t<1 i=1
and
(2) D Σ ⎛ 1
d(d+1)/2
.M
N → Bi2 (t)dt,
i=1 0
where .{Bi (t), 0 ≤ t ≤ 1}, i ∈ {1, . . . , d(d + 1)/2} are independent Brownian
bridges.
Theorem A.2.11 can again be used to approximate the distributions of the limits
in Theorem 5.6.1 for large d.
In the definition of .yi∗ (j ), we normalize the observed time series with .τi (j ),
which is not observable. In practice this may be replaced by an estimator .τi (j )
computed from .Y1 , . . . , Yi−1 . We consider parametric models to do so. Suppose
there is a p dimensional parameter .θ ∈ RP such that
We require that .τi (j, θ ) and .τ̄i (j, θ) are uniformly close to each other as .i → ∞.
The true value of .θ is denoted by .θ 0 :
Assumption 5.6.7 There is a closed ball .O0 ⊂ Rp with center .θ 0 and a sequence
.a(i) satisfying .a(i) → 0, .ia(i) → ∞, as .i → ∞, such that
. max sup |τi (j, θ) − τ̄i (j, θ)| = O(a(i)), a.s. (i → ∞).
1≤j ≤d θ ∈O0
304 5 Parameter Changes in Time Series Models
Assumption 5.6.7 means that the difference between the stationary .τi (j, θ ) and
the non stationary .τ̄i (j, θ ) is small, i.e. there is a negligible difference between
estimating .θ 0 based on the information .Y1 , . . . , Yi−1 or .{Ys , s ≤ i − 1} when i
is large. We estimate .θ 0 with .θ̂ N which is consistent with rate .N −1/2 :
Assumption 5.6.8 .||θ̂ N − θ 0 || = OP (N −1/2 ).
The random functions .τi (j, θ) are smooth functions of .θ in a neighbourhood of
θ 0:
.
Assumption 5.6.9 There is a closed ball .O0 ⊂ Rp with center .θ 0 such that
|| ||
|| T ||
. ||τi (j, θ ) − τi (j, θ 0 ) − gi (j )(θ − θ 0 )|| ≤ ḡi ||θ − θ 0 || ,
2
yi (j )
.ŷi (j ) = ,
τ̄i (j, θ̂ N )
where
Σ
N
r̄N =
. r̂i .
i=1
5.6 Multivariate Volatility Models 305
Σ
N −1 ⎛ ⎞
l
D̂N =
. K γ̂ l ,
h
l=−N +1
where K is the kernel and .h = h(N) is the window (smoothing parameter). We refer
to Sect. 3.1 for a discussion on the possible choices of the kernel and the window.
Using the proofs in Sect. 3.1 one can verify that Assumption 5.6.10 is satisfied under
standard conditions on K and .h = h(N).
(1) (2)
Similarly to .MN and .MN we define
⎛ ⎞T ⎛ ⎞
1 i i
max ŝ(i) − ŝ(N ) D̂−1
(1)
M̂N =
.
N ŝ(i) − ŝ(N )
N 1≤i≤N N N
and
N ⎛ ⎞T ⎛ ⎞
(2) 1 Σ i −1 i
.M̂
N = 2 ŝ(i) − ŝ(N ) D̂N ŝ(i) − ŝ(N )
N N N
i=1
with
Σ
i
ŝ(i) =
. r̂j .
j =1
Theorem 5.6.2 If .H0 of (5.6.1) and Assumption 5.6.1–5.6.10 are satisfied, then
D Σ
d(d+1)/2
.M̂N(1) → sup Bi2 (t)
0<t<1 i=1
and
(2) D Σ ⎛ 1
d(d+1)/2
.M̂
N → Bi2 (t)dt,
i=1 0
where .{Bi (t), 0 ≤ t ≤ 1}, i ∈ {1, . . . d(d+1)/2} are independent Brownian bridges.
Before we prove Theorems 5.6.1 and 5.6.2 we discuss a few examples where the
conditions of these theorems are satisfied.
306 5 Parameter Changes in Time Series Models
Example 5.6.1 Bollerslev (1990) and Jeantheau (1998) specified the constant
conditional correlation (CCC.(p, q)) multivariate GARCH model by the following
equations:
E i = Di RDi ,
.
⎛ ⎞T
Di = diag (τi (1), τi (2), . . . , τi (d)) ,
. hi = τi2 (1), τi2 (2), . . . , τi2 (d)
and
Σ
q Σ
p
hi = c +
. Al (Yi−l ◦ Yi−l ) + Bj Yi−j ,
l=1 j =1
.E i = Di Ri Di , (5.6.2)
where .hj is a known function and .ζ j , j ∈ {1, . . . , d} are unknown parameters. The
conditional correlation of .Yi satisfies
and
⎡ ⎤
Qi = θ1 C + θ2 (diag(Qi−1 ))1/2 Y∗i−1 (Y∗i−1 )T (diag(Qi−1 ))1/2 + θ3 Qi−1 ,
.
where .Y∗i are the devolatized observations. It is assumed that .C is positive definite,
.θ1 > 0, θ2 ≥ 0, θ3 ≥ 0 and .θ1 + θ2 + θ3 = 1. The parameters of the process are
.C, ζ 1 , . . . , ζ d , θ2 and .θ3 . Since there are several univariate asymmetric GARCH
models (see Francq and Zakoian, 2010), the cDCC model accounts for possible
asymmetry of the returns. Aielli (2013) points out if .hj (ζ j , . . .) are stationary
and ergodic, then vector valued observations satisfying the cDCC also has these
properties. Carrasco and Chen (2002) and Hörmann (2008) prove that augmented
univariate GARCH processes have these properties under minor conditions. Since
augmented GARCH sequences are .Lν –decomposable (see Carrasco and Chen,
2002), cDCC also has this property. The existence of the higher moments of
augmented GARCH sequences are also discussed in Carrasco and Chen (2002) and
Hörmann (2008). We note Hörmann (2008) proves that the augmented GARCH
processes are decomposable Bernoulli shifts and therefore a process following the
cDCC modell is as well. Using .β–mixing or the Bernoulli shift property, one can
show that Assumption 5.6.10 is satisfied by following the arguments in Wu and
Zaffaroni (2018). In Definition 3.4 of Aielli (2013), the .ζ i ’s, the parameters of the
augmented GARCH sequences, are estimated by QMLE. The proofs in Section 3.3
in Aielli (2013) yield that the estimators obtained in the second and third steps have
the properties in Assumptions 5.6.7–5.6.9.
Example 5.6.3 The dynamic conditional correlation (DCC) GARCH model is an
extension of the CCC and dCCC models in Examples 5.6.1 and 5.6.2. Equations
(5.6.2)–(5.6.4) hold but (5.6.5) is replaced with
( )T
Qi = C + AY∗i−1 Y∗i−1 AT + BQi−1 BT ,
.
Σ
q Σ
p
T
Ei = C +
. Aj Yi−j (Aj Yi−j ) + Bk E i−k BT
k, (5.6.6)
j =1 k=1
QMLE and the variance targeting QMLE (see Comte and Lieberman, 2003, Hafner
and Preminger, 2009, Pedersen and Rahbek, 2014 and Francq et al., 2016). For
the sake of simplicity we assume that .p = q = 1 and the parameters matrices
are denoted by .A and .B. Boussama et al. (2011) proves the existence of a unique
stationary solution of the BEKK equations assuming that the distribution of .e0 is
absolutely continuous with respect to the Lebesgue measure on .Rd , the point .0 is
an interior point of the support of the distribution of .e0 , and the spectral radius
of .A + B is less than 1. In their proofs Boussama et al. (2011) also shows that
the solution is ergodic and geometrically .β–mixing, and their proof can also be
used to establish .Lν –decomposability of the solution. Hence Assumptions 5.6.3
and 5.6.5 hold and we need to assume only that .E||Y0 ||r < ∞. Hafner and
Preminger (2009) provide explicit conditions for the existence of moments. If
Assumption 5.6.6 holds, then the mixing property of .Yi and the existence of the
moments of .||Y0 || yield Assumption 5.6.6 along the lines of the calculations in
Wu and Zaffaroni (2018). The parameters of the BEKK model can be estimated
by the QMLE and the variance targeting QMLE. Hafner and Preminger (2009),
Pedersen and Rahbek (2014) and Francq et al. (2016) establish Assumption 5.6.7
with expnential rate, and the established asymptotic normality in those papers yields
Assumption 5.6.8. Finally, the computation of the second derivatives of .τi (i, θ ) in
Pedersen and Rahbek (2014) (see also Hafner and Preminger, 2009 and Francq et al.,
2016) gives Assumption 5.6.9.
Example 5.6.5 Engle et al. (1990) defined the conditional covariance matrix .E i ,
using a factor model, by the equation
Σ
p
Ei = C +
. λi (j )β j β T
j ,
j =1
and
λi (j ) = ωj + αj yi−1
.
2
(j ) + βj λi−1 (j ),
D [0,1]d
N −1/2 (s(Nu) − Es(Nu)) −→ W(u),
.
Proof It follows from Assumptions 5.6.2–5.6.5 that .yi∗ (k)yi∗ (l), k, l ∈ {1, . . . , d}
is also stationary and .β–mixing with the same rate as .Yi . Since Assumption 5.6.4
implies that there is .τ0 > 0 such that .τi (j ) ≥ τ0 , we get
1 ( )1/2
E|yi∗ (k)yi∗ (l)|r/2 ≤
.
2
E E|yi∗ (k)|r E|yi∗ (l)|r < ∞,
τ0
via the Cauchy–Schwarz inequality and Assumption 5.6.5. Hence the weak con-
vergence of partial sums in Ibragimov (1962) (see also Bradley, 2007) implies the
lemma. ⨆
⨅
Proof of Theorem 5.6.1 Lemma 5.6.1 implies that
⎛ ⎞
−1/2 LNu⎦ D [0,1]d
N
. s(Nu) − s(N ) −→ W(u) − uW(1). (5.6.7)
N
where .{Bi (u), u ≤ u ≤ 1}, .i ∈ {1, . . . , d(d + 1)/2} are independent Brownian
bridges. The result now follows from (5.6.7) and (5.6.8) via the continuous mapping
theorem. ⨆
⨅
Proof of Theorem 5.6.2 It follows from the definition of .ŷi (k) that
ŷi (k)ŷi (l) − yi∗ (k)yi∗ (l) = ai,1 (k, l) + · · · + ai,8 (k, l),
.
where
⎛ ⎞⎛ ⎞
1 1 1 1
ai,1 (k, l) = yi (k)yi (l)
. − −
,
τ̄i (k, θ̂ N ) τi (k, θ̂ N ) τ̄i (l, θ̂ N )
τi (l, θ̂ N )
⎛ ⎞⎛ ⎞
1 1 1 1
ai,2 (k, l) = yi (k)yi (l) − − ,
τ̄i (k, θ̂ N ) τi (k, θ̂ N ) τi (l, θ̂ N ) τi (l, θ 0 )
⎛ ⎞
1 1 yi (l)
ai,3 (k, l) = yi (k) − ,
τ̄i (k, θ̂ N ) τi (k, θ̂ N ) τi (l)
⎛ ⎞⎛ ⎞
1 1 1 1
ai,4 (k, l) = yi (k)yi (l) − − ,
τi (k, θ̂ N ) τi (k, θ 0 ) τ̄i (l, θ̂ N ) τi (l, θ̂ N )
⎛ ⎞⎛ ⎞
1 1 1 1
ai,5 (k, l) = yi (k)yi (l) − − ,
τi (k, θ̂ N ) τi (k, θ 0 ) τi (l, θ̂ N ) τi (l, θ 0 )
310 5 Parameter Changes in Time Series Models
⎛ ⎞
yi (k) 1 1
ai,6 (k, l) = yi (l) − ,
τi (k) τ̄i (l, θ̂ N ) τi (l, θ̂ N )
⎛ ⎞
1 1 yi (l)
ai,7 (k, l) = yi (k) −
τi (k, θ̂ N ) τi (k, θ 0 ) τi (l)
⎛ ⎞
yi (k) 1 1
ai,8 (k, l) = yi (l) − .
τi (k) τi (l, θ̂ N ) τi (l, θ 0 )
Since .τ̄i (k) ≥ τ0 > 0, by Assumptions 5.6.8 and 5.6.9 we have, on account of the
mean value theorem, that
Σ
j Σ
N
−1/2
N
. max |ai,1 (k, l)| = OP (1)N −1/2 |yi (k)yi (l)|a 2 (i),
1≤j ≤N
i=1 i=1
where .a(·) is defined in Assumption 5.6.7. We can assume without loss of generality
that .a(i) is increasing as .i → ∞. Using again Assumption 5.6.7, we can define a
sequence .ai such that as .i → ∞, .i −1/2 ai → 0, and .i 1/2 a(ai ) → 0. Therefore
Σ
N Σ
aN
−1/2 −1/2
N
. |yi (k)yi (l)|a (i) ≤ N
2
|yi (k)yi (l)|a 2 (i)
i=1 i=1
Σ
N
+ N −1/2 |yi (k)yi (l)|a 2 (i) (5.6.9)
i=aN +1
⎛ ⎞
= OP N −1/2 aN + Na 2 (aN ) = oP (1),
where in the last step we used that according to the ergodic theorem
1Σ
L
. |yi (k)yi (l)| → E|y0 (k)y0 (l)|, a.s. L → ∞.
L
i=1
We note .E|y0 (k)y0 (l)| < ∞, since by the Cauchy–Schwarz inequality and
Assumption 5.6.5
⎛ ⎞1/2
E|y0 (k)y0 (l)| ≤ Ey02 (k)Ey02 (l)
. < ∞.
5.6 Multivariate Volatility Models 311
Σ
N ⎛ ⎞Σ
N
N −1/2
. |ai,2 (k, l)| = OP N −1/2 |yi (k)yi (l)|a(i)
i=1 i=1
⎡ ⎤
× ||gi (l)||||θ̂ N − θ 0 || + ḡi ||θ̂ N − θ 0 ||2 .
Σ
N
1 Σ
N
. |yi (k)yi (l)|a(i)||gi (l)||||θ̂ N − θ 0 || = OP (1) |yi (k)yi (l)|a(i)||gi (l)||
N
i=1 i=1
= oP (1),
since
⎛ ⎞1/2
|y0 (k)y0 (l)|||gi (l)|| ≤ E(y0 (k)y0 (l))2 E||gi (l)||2
.
⎛ ⎞1/4 ⎛ ⎞1/2
≤ Ey04 (l)Ey04 (k) E||gi (l)||2
< ∞.
Σ
N
1 Σ
N
. |yi (k)yi (l)|a(i)ḡi ||θ̂ N − θ 0 ||2 = OP (1) |yi (k)yi (l)|a(i)||ḡi ||
N 3/2
i=1 i=1
⎛ ⎞
1 Σ
N
1
= OP (1) max ḡi |yi (k)yi (l)|a(i)
N 1/2 1≤i≤N N
i=1
= oP (1),
1
. max ḡi = oP (1).
N 1/2 1≤i≤N
Similarly,
Σ
N
1 Σ
N
. N −1/2 |ai,3 (k, l)| = OP (1) |yi (k)yi (l)|a(i) = oP (1).
N 1/2
i=1 i=1
312 5 Parameter Changes in Time Series Models
Σ
N
N −1/2
. |ai,j (k, l)| = oP (1), j = 4, 5, 6.
i=1
Σ
N
= OP (1)N −1/2 |yi (k)yi (l)|ḡi ||θ 0 − θ̂ N ||2
i=1
⎛ ⎞
1 Σ
N
−1/2
= OP (1) N max ḡi |yi (k)yi (l)|
1≤i≤N N
i=1
= oP (1).
= oP (1).
= oP (1).
yt = β0 + β1 zt + β2 yt−1 + Ei .
. (5.7.1)
We estimated the model parameters from the entire sample using ordinary least
squares, and the residuals computed as in (4.1.26) and their autocorrelation function
are shown in Fig. 5.2. These indicate that the residuals appear to exhibit light
autocorrelation, and also appear to undergo a change in variation a little past halfway
through the observation period.
We computed the quadratic form of the CUSUM process
⎡ ⎤1/2
ẐT
N (t) D̂−1 Ẑ (t)
N
QN (t) =
. (5.7.2)
[t (1 − t)]1/4
Consumption
0
−1
−2
2.5
Income
0.0
−2.5
Fig. 5.1 Plots of the quarterly percentage change in consumption and income over the period
1970–2016 (N = 188)
obtained via simulation. We saw that this process exceeded the 95% quantile of
the limit distribution in Theorem 5.2.1 (approximate p–value was 0.011), and the
location at which the process was maximized coincided with the third quarter of
the year 2000. Performing binary segmentation around this initial change point
estimator suggested that there are no remaining change points of significance. The
estimators of parameters before and after the change point were (rounded to two
decimal places) β̂ 1 = (0.45, 0.37, 0.14) and β̂ 2 = (0.23, 0.04, 0.52), suggesting
5.7 Data Examples 315
1.0
1
0.8
0.6
0
ACF
0.4
−1
0.2
0.0
−2
0 50 100 150 0 5 10 15 20
Index Lag
Fig. 5.2 Plots of the residuals computed as in (4.1.26) from the model (5.7.1), along with their
ACF plot
3.0
2.5
(ZTN(t)D−1ZN(t))1/2 (t(1−t))1/4
2.0
1.5
^
1.0
0.5
0.0
Fig. 5.3 Plot of the process QN in (5.7.2) computed to evaluate for change points in the model
parameters in the model (5.7.1). The largest value of QN is attained at k̂ = 2000.75, the third
quarter of year 2000
316 5 Parameter Changes in Time Series Models
that level of the average change in consumption decreased and became less (linearly)
related to changes in income.
Example 5.7.2 (Stability of Vector GARCH Models for Emerging Market
Stock Indices) It was indicated by Forbes and Rigobon (2002) that a financial
contagion effect occurs if the “interlinkages” across markets experience a significant
increase after a market event. The actual dates at which conditional correlations
exhibit structural breaks are unknown, although they can be estimated and detected
through statistical methods as described in Sect. 5.6. In this example we conduct a
change point analysis conditional correlations between log–returns modelled using
vector GARCH models, as studied in Barassi et al. (2020). We considered three
groups of emerging stock market price indexes, as well as several benchmark
indices. The three regions Latin America, Central East Europe, and (East) Asia
were considered. The specific indices considered are detailed in Table 5.1. To each
regional group we added the S&P 500 index of the United States, and to the German
DAX 40 and Japanese Nikki 225 indices were added to the CEE and Asia groups,
respectively. The data were taken from the Datastream database and covered the
period 1 September 2006 to 1 of September 2010.
Vectors of log–returns were constructed for each region, to which we fit BEKK
as well as cDCC models using QMLE. To find changes in the correlation structures
of these three datasets, we apply Theorem 5.6.2 to evaluate the significance of
the test statistic M̂N(1) for each model. If a change was detected at the 95% level,
binary segmentation was performed. In this way the data were segmented into
six approximately homogenous subsets. The change point detection results are
displayed in Fig. 5.4. Both models show consistent patterns. The first prominent
change is around February 2007 which then reverted around August 2007. The
Table 5.1 Stock indices considered from three different regions: Latin America, Central East
Europe, and East Asia
Latin America Central East Europe (CEE) East Asia
Argentina (Argentina Czech (Prague SEPX) Hong Kong (Hang Seng)
MERVAL)
Brazil (Brazil Estonia (OMX Tallin) Indonesia (IDX composite)
BOVESPA)
Chile (Chile Santiago SE Hungary (Budapest) South Korea (Korea SE composite)
General)
Mexico (Mexico IPC) Poland (Warsaw General) Malaysia (Malaysia KLCI)
Colombia (Colombia Romania (Romania BET) Philippines (Philippine SE)
IGBC)
Peru (BVL General) Slovakia (Slovakia SAX 16) Singapore (Straits Times)
U.S. S&P 500 Slovenia (Slovenian blue chip) Taiwan (Taiwan SE weighted)
U.S. S&P 500 Thailand (Bangkok S.E.T)
Germany (DAX 40) China (Shanghai S.E. A share)
U.S. S&P 500
Japan (Nikki 225)
5.7 Data Examples 317
1 1
0.8 0.8
0.6 0.6
0.4 0.4
Value
Value
0.2 0.2
(Argentina)
(Argentina)
(Brazil)
0 0 (Brazil)
(Chile)
(Chile)
-0.2 (Mexico) -0.2 (Mexico)
(Colombia)
(Colombia)
-0.4 (Peru) -0.4
(Peru)
Sep.2006 May.2007 Jan.2009 Sep.2008 May.2009 Jan.2010 Sep.2010 Sep.2006 May.2007 Jan.2009 Sep.2008 May.2009 Jan.2010 Sep.2010
Year Year
(a) (b)
Conditional Correlations between US and Central East European markets(BEKK) Conditional Correlations between US and Central East European markets(cDCC)
1.2 1.2
(Czech)
(Estonia)
1 1
(Hungary)
(Poland)
0.8 (Romania) 0.8
(Slovakia)
0.6
(Slovenia) 0.6
0.4 0.4
Value
Value
0.2 0.2
(Czech)
0 0
(Estonia)
(Hungary)
-0.2 -0.2 (Poland)
(Romania)
(Slovakia)
-0.4 -0.4
(Slovenia)
Sep.2006 May.2007 Jan.2009 Sep.2008 May.2009 Jan.2010 Sep.2010 Sep.2006 May.2007 Jan.2009 Sep.2008 May.2009 Jan.2010 Sep.2010
Year Year
(c) (d)
Conditional Correlations between US and East Asian markets(BEKK) Conditional Correlations between US and East Asian markets(cDCC)
1.2 1.2
(Hong Kong) (Hong Kong)
(Indonesia) (Indonesia)
1 (Japan) 1 (Japan)
(South Korea) (South Korea)
(Malaysia) (Malaysia)
0.8 (Philippines) 0.8 (Philippines)
(Singapore) (Singapore)
(Taiwan) (Taiwan)
0.6 (Thailand) 0.6 (Thailand)
(China) (China)
0.4 0.4
Value
Value
0.2 0.2
0 0
-0.2 -0.2
-0.4 -0.4
Sep.2006 May.2007 Jan.2009 Sep.2008 May.2009 Jan.2010 Sep.2010 Sep.2006 May.2007 Jan.2009 Sep.2008 May.2009 Jan.2010 Sep.2010
Year Year
(e) (f)
Fig. 5.4 Conditional correlation between the U.S. (S&P 500) and Latin America, CEE and Asia
estimated using multivariate GARCH models. The vertical lines show change points estimated
using binary segmentation. (a) BEKK model for Latin America (b) cDCC model for Latin
America. (c) BEKK model for CEE. (d) cDCC model for CEE. (e) BEKK model for Asia. (f)
cDCC model for Asia
third change generally coincided with September 2008 and the fourth and fifth
changes occurred in the second half of 2009 and April 2010 respectively. The
East Asian markets appear relatively less connected with the U.S. and tend to have
higher resistance, which might be explained with their closer relation with the large
economies in the area, such as Japan and China.
318 5 Parameter Changes in Time Series Models
Fig. 5.5 Bloomberg Galaxy Crypto Index log prices with the estimated times of changes (vertical
lines)
5.8 Exercises 319
an explosive episode may be spurious. The tests in Theorem 5.3.4. are robust to both
conditional and unconditional volatility, and therefore they lend themselves to being
applied to this dataset. Looking at a graph of the data may suggest the presence of
a break around the September 2021 peak on Fig. 5.5, it is also possible that the
behaviour of the index in the terminal period of the sample is driven by changes in
the volatility of the series rather than larger structural changes.
5.8 Exercises
with
⎛ ⎛ ⎞2
Σk
1 Σ
k
.rN = min ⎝ Êi,0,k − Êl,0,k
1≤k<N k
i=1 l=1
⎛ ⎞2 ⎞
Σ
N
1 Σ
N
+ Êi,k,N − Êl,k,N ⎠,
N −k
i=k+1 l=k+1
where Êi,j,k = yi − ρ̂j,k yi−1 and ρ̂j,k is the least squares estimator for the
autoregressive parameter computed from {yi , j + 1 ≤ i ≤ k}. Compute the limit
distribution of TN under the null hypothesis.
Exercise 5.8.2 We consider the AR(1) model yi = ρi yi−1 + Ei , where {Ei , i ∈ Z}
are independent and identically distributed random variables with EE0 = 0, 0 <
EE02 = σ 2 < ∞ and E|E0 |κ < ∞ with some κ > 2. We wish to test the null
hypothesis H0 : ρ1 = ρ2 = . . . = ρN against the alternative
⎧
yi = ρ1 yi−1 + Ei , 1 ≤ i ≤ k1 ,
yi =
.
yi = ρk1 +1 yi−1 + Ei , k1 + 1 ≤ i ≤ N,
320 5 Parameter Changes in Time Series Models
with
⎛ ⎛ ⎞2
Σ
k
1Σ
k
rN = min ⎝
. Êi,0,k − Êl,0,k
1≤k<N k
i=1 l=1
⎛ ⎞2 ⎞
Σ
N
1 Σ
N
+ Êi,k,N − Êl,k,N ⎠,
N −k
i=k+1 l=k+1
where Êi,j,k = yi − ρ̂j,k yi−1 and ρ̂j,k is the least squares estimator for the
autoregressive parameter computed from {yi , j + 1 ≤ i ≤ k}. Compute the limit
distribution of TN under the null hypothesis.
Exercise 5.8.4 We consider the dynamic regression model yi = xi β + ρyi−1 +
a(i/N)ηi , 1 ≤ i ≤ N, y0 = 0, where a(u), 0 ≤ u ≤ 1 is a Riemann integrable
function and |ρ| < 1. We assume that {xi , 1 ≤ i ≤ N } are independent and
identically distributed random variables with Exi = 0, Exi2 = σ 2 < ∞,{ηi , 1 ≤
i ≤ N } are independent and identically distributed random variables with Eηi =
0, Eηi2 = 1. The two sequences are independent. Compute
1 Σ 2
N
. lim Eyi .
N →∞ N
i=1
Prove that
1 Σ 2
N
. yi converges in probability.
N
i=1
where (βj,k , ρj,k ) is the least squares estimators computed from {xi , yi , j +1 ≤ i ≤
k}.
Exercise 5.8.7 We consider the dynamic regression model yi = xi βi + ρi yi−1 +
a(i/N)ηi , 1 ≤ i ≤ N, y0 = 0, where a(u), 0 ≤ u ≤ 1 is a Riemann integrable
function and |ρ| < 1. We assume that {xi , 1 ≤ i ≤ N } are independent and
identically distributed random variables with Exi = 0, Exi2 = σ 2 < ∞, E|xi |4 <
∞, {ηi , 1 ≤ i ≤ N} are independent and identically distributed random variables
with Eηi = 0, Eηi2 = 1 and E|ηi |4 < ∞. The two sequences are independent. We
wish to test H0 : (β1 , ρ1 ) = (β2 , ρ2 ) = . . . = (βN , ρN ) against the alternative
⎧
xi β1 + ρ1 yi−1 + a(i/N)ηi , 1 ≤ i ≤ k1 ,
yi =
.
xi βk1 +1 + ρk1 +1 yi−1 + a(i/N)ηi , k1 + 1 ≤ i ≤ N
that under the null hypothesis E log ||(B0 + E0,1 )r || < 0 with some positive integer
r. Show that under the null hypothesis (5.8.2) has a stationary solution.
Exercise 5.8.9 We consider the multivariate RCA(1) model of (5.8.2). We assume
that {Ei,1 , i ∈ Z} are independent and identically distributed random vectors in
Rd , EE0,1 = 0, E||E0,1 ||ν < ∞ with some ν > 4. We assume that {Ei,2 , i ∈
Z} are independent and identically distributed random vectors in Rd , EE0,2 =
0, E||E0,2 ||ν < ∞ with some ν > 4. The two sequences are independent. We
wish to test H0 : B1 = B2 = . . . = BN . The common value of the regression
parameter under the null hypothesis is denoted by B0 . We assume that under the
null hypothesis E||(B0 + E0,1 )r || < 1 with some positive integer r. Find a test and
compute its asymptotic distribution under the null hypothesis.
Exercise 5.8.10 We test for the stability of a GARCH(1,1) model. Under the null
hypothesis the model is the stationary sequence
yi = hi Ei
. and h2i = ω0 + α0 yi−1
2
+ β0 Ei−1
2
,
ω0 >, α0 > 0, β0 > and E log(β0 + α0 Ei2 ) < 0. We define the squared residuals by
the recursions
yi2
Êi2 =
. and σ̂i2 = ω̂N + α̂N yi−1
2
+ β̂N σ̂i−1
2
, (5.8.3)
σ̂i2
where (ω̂N , α̂N , β̂N ) is the QMLE for (ω0 , α0 , β0 ). Assuming that (5.8.3) holds,
compute the limit distribution of
| k |
|Σ k Σ 2 ||
N
−1/2 |
TN = N
. max | Êi −
2
Êi | .
1≤k<N | N |
i=1 i=1
Davis et al. (1995) proves that Theorem 4.1.2 holds for AR(d) sequences. They also
investigate the cases when the variance as well as d, the lag of the autoregressive
process change at an unknown time. They use a different but asymptotically
equivalent normalization of the maximally selected log likelihood, and their sim-
ulations show better finite sample properties. Davis et al. (2006) proposes the
minimum description to segment time series data into stationary subsets. Theoretical
justifications are in Davis et al. (2008) and Davis and Yau (2013). Lavielle (1999),
Lavielle and Moulines (2000) and Lavielle and Teyssiére (2006) provide methods
to find multiple changes in time series. Dalla et al. (2020) allows changes in the
mean as well as in the variance. Akashi et al. (2018) and Chakar et al. (2017)
5.9 Bibliographic Notes and Remarks 323
develop robust methods to conduct change point analysis for autoregressive models.
Gombay (2008) derived a method based on score vectors.
Due to its importance and relative simplicity, changes in AR(1) processes have
been investigated by several authors. Chong (2001) assumes stationarity, while Pang
et al. (2014), Pang et al. (2018)) also allow for non–stationarity. Busetti and Harvey
(2001) test for the presence of a random walk in a sequence with several breaks. Zhu
and Ling (2011) uses a likelihood test to find a change from AR(p) to threshold
AR(p) model. The transition to a threshold model is also investigated in Berkes
et al. (2011).
Krämer et al. (1988) apply the weighted sum of the recursive residuals in an
AR(1) dynamic regression model to a change. Vogelsang (1997) uses the maximally
selected sums of functionals of Wald statistics. For further results on dynamic
regression we refer to Bauer (2005) and Guay and Guerre (2006). Ling and Li
(1998), Ling (1999) Ling and McAleer (2003a), Ling and McAleer (2003b) and Li
et al. (2002) provide several results and surveys on ARMA processes with condition-
ally heteroscedastic errors. Kirch and Kamgaing (2012) develop testing procedures
for the detection of structural changes in nonlinear autoregressiveprocesses.
Andél (1976) and Nicholls and Quinn (1982) introduced the random coefficient
model (RCA), and established conditions under which it admits a stationary solution
as well as its basic probabilistic and statistical properties. Schick (1996) and
Janečková and Prášková (2004) obtained several limit theorems for the estimators
of the parameters. Aue et al. (2006) contains a comprehensive study of RCA(1)
sequences in the stationary case. Berkes et al. (2011) extended these results to
the non stationary case. Aue (2004) obtained Gaussian approximations for partial
sums of RCA(1) variables. The proofs of Theorems 5.3.1 and 5.3.2 rely on Horváth
and Trapani (2016). Thavaneswaran et al. (2009) generalizes the stationary case
when the errors follow a stationary GARCH model. Erhardsson (2014) investigates
vector valued stationary RCA processes. Dong and Spielmann (2020) contains some
applications to ruin theory. Kang and Lee (2009) studies parameter change tests in
random coefficient integer–valued autoregressive processes with an application to
polio data. For a survey on RCA processes we refer to Regis et al. (2021). Horváth
and Trapani (2016) used weighted CUSUM to test the stability of RCA models
while Horváth et al. (2024) applied the maximally selected likelihood to the same
problem.
Francq and Zakoian (2010) provides an excellent account of the theory of the
GARCH and other volatility processes. We make use of their methods in Sect. 5.4.
Berkes et al. (2003) studies the properties of the GARCH(p, q) sequence and the
quasi–likelihood estimators of the parameters. Their results are extended to ARMA–
GARCH processes in Francq and Zakoian (2004). We use the quasi–likelihood
method which assumes that the innovations are standard normal random variables.
The results of Sect. 5.4 can be extended when the more general quasi–likelihood
method is used to estimate the parameters. Optimality of the conditions needed to
estimate the parameters of GARCH processes is discussed in Berkes and Horváth
(2003b). Hall and Yao (2003) obtains the limit distributions of the quasi maximum
likelihood estimators when the innovations have heavy tails. Robust estimation
324 5 Parameter Changes in Time Series Models
of the parameters in the ARCH model is discussed in Horváth and Liese (2003)
and Peng and Yao (2003). They show that the Lp estimators can be linearized
so that the change point methods discussed can be extended to such estimators.
Hillebrand (2005) investigates the effect of neglected change points in statistical
inference on volatility processes. Li et al. (2002) reviews some theoretical results
for time series models with GARCH errors, and it is directed towards practitioners.
They discuss various new volatility models, including double threshold ARCH and
GARCH, ARFIMA–GARCH, CHARMA and vector ARMA–GARCH. Hörmann
(2008) shows that the augmented GARCH sequences are decomposable Bernoulli
shifts under natural conditions. Berkes et al. (2011) provides a method to detect
if an AR model changes to a threshold AR model at an unknown time. The
score function based testing is adapted from Berkes et al. (2004) who studies
the stationary GARCH(p, q) case. The comparison of the estimates in volatility
processes has been initiated by Ling (2007). Ling (2007) uses the concept of Near
Epoch Dependence (NED) which is closely related to Lν –decomposability. For
more general results on change point detection in time series we refer to Ling
(2016). Jensen and Rahbek (2004) proves the asymptotic normality of the quasi–
likelihood estimators in explosive GARCH (1,1) models. Pedersen and Rahbek
(2014) shows that the variance targeting method can be extended to multivariate
GARCH models.
In this chapter we pointed out the connection between CUSUM based testing and
other methods to find changes in the parameters of a time series. Bücher et al. (2019)
uses a CUSUM based test for stationarity of the observations. It is an interesting
question to distinguish between non–stationarity and changes in the parameters,
including the mean and the variance; see Busetti and Harvey, 2003, Busetti and
Taylor, 2004, Leybourne et al. (2006). Also, we only consider the alternative when
the parameters abruptly change. However, all the proposed tests have power against
other types of alternatives as well, for example gradual changes. Bin and Yongmiao
(2016) investigate smooth changes in GARCH models.
Our discussion of change points for multivariate time series follows closely Kirch
et al. (2015), where some more general results are proven. The results in Sect. 5.6
are motivated by Barassi et al. (2020). For some interesting applications to finance
we refer to Avalos (2014), Edwards (2020), Ju et al. (2020) and Mitchener and Pina
(2020)
Chapter 6
Sequential Monitoring
Xi = μ0 + Ei , 1 ≤ i ≤ M,
.
where .{Ei , i ∈ Z} is a mean zero, stationary sequence. As such the training sample
is assumed to be drawn from a stationary process. Under the null hypothesis, the
mean does not change as we continue to observe data beyond the training sample.
This is formulated as
H0 : Xi = μ0 + Ei , M + 1 ≤ i < ∞.
. (6.1.1)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 325
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_6
326 6 Sequential Monitoring
This hypothesis is often termed the “open-ended” null hypothesis, since we assume
that we will continue to sample observations into the indefinite future. Under the
alternative the mean changes after observing .k ∗ observations beyond the training
sample, so that
⎛
μ0 + Ei , M + 1 ≤ i ≤ M + k ∗ ,
HA : Xi =
. (6.1.2)
μA + Ei , M + k ∗ + 1 ≤ i < ∞.
H0 : Xi = μ0 + Ei ,
. M + 1 ≤ i ≤ cM , (6.1.3)
against
⎛
μ0 + Ei , M + 1 ≤ i ≤ M + k ∗ ,
HA : Xi =
. (6.1.4)
μA + Ei , M + k ∗ + 1 ≤ i ≤ cM .
In these scenarios .cM is a user-specified length of time after which we will terminate
the sequential change point detection procedure if we have not yet detected a change
point. First we consider the open-ended detection problem. We consider a change
point detector based on the residuals
1 Σ
M
Êi,M = Xi − X̄M ,
. where X̄M = Xl .
M
l=1
The basic procedure we study is to compare the size of the running average of the
residuals based on the mean of the training sample, which are proportional to the
partial sum
Σ
M+k
SM (k) =
. Êl,M , (6.1.5)
l=M+1
alternately consider. We terminate the process and declare that a change in the mean
has occurred at the stopping time
⎛
min {k : |SM (k)| ≥ g(M, k)}
τ (M) =
. (6.1.7)
∞, if |SM (k)| < g(M, k) for all k ≥ 1.
Our aim is to calibrate the boundary function so that the probability that the process
.|SM (k)| will cross .g(M, k) is controlled under .H0 , and is as large as possible under
.HA . This is done by choosing the constant .v = v(α) so that
and
|M |
|Σ | ( )
| |
.| Ei − σ WM,1 (M)| = OP M ζ
| |
i=1
and
| |
| M+Lx⎦ |
1 | Σ |
. sup | El − σ WM,2 (x)|| = OP (1)
ζ |
1≤x<∞ x |l=M+1 |
Assumption 6.1.2 (i) .infδ≤t≤1 w(t) > 0 for all .0 < δ < 1, and (ii) .w(t) is non-
decreasing in a neighbourhood of 0.
Similarly to .I (w, c) of (1.2.4), we define
⎧ 1/2 ⎛ ⎞
∗ 1 cw 2 (t)
I (w, c) =
. exp − dt.
0 t t
328 6 Sequential Monitoring
Our main result gives the asymptotic probability that .τ (M) < ∞ under the null
hypothesis.
Theorem 6.1.1 If .H0 of (6.1.1), Assumptions 6.1.1, 6.1.2 are satisfied and
I ∗ (w, c) < ∞ with some .c > 0, then
.
⎧ ⎫
|W (t)|
. lim P {τ (M) < ∞} = sup < vσ ,
M→∞ 0≤t≤1 w(t)
= oP (1), (6.1.9)
M −1/2 M ζ −1/2
= OP (1) sup x ζ + OP (1) sup .
1≤x≤aM w(x/(M + x)) aM≤x≤M w(x/(M + x))
We have that,
t 1/2
. lim = 0,
t→0 w(t)
and therefore
6.1 Sequential Detection Procedures and Stopping Times 329
M −1/2 x ζ
. sup = O(1)
1≤x≤aM (x/(M + x))1/2
M −1/2 x ζ
. sup = o(1)
aM≤x≤M w(x/(M + x))
M −1/2 ζ M ζ −1/2
= OP (1) sup x + OP (1) sup
M≤x<∞ 1 + x/M M≤x<∞ 1 + x/M
= oP (1).
where .{W1 (t), t ≥ 0} and .{W2 (t), t ≥ 0} are independent Wiener processes. By a
change of variables we get that
┌⎛ ⎛ ⎞┐−1 | ⎛ ⎞ |
x⎞ x | x x |
. sup 1+ w |W2 − W1 (1)|
1≤x<∞ M M + x M M
330 6 Sequential Monitoring
1
= sup |W2 (t) − tW1 (1)|
1/M≤t≤1 (1 + t)w(t/(1 + t))
1
→ sup |W2 (t) − tW1 (1)|
0≤t≤1 (1 + t)w(t/(1 + t))
cM
. lim = c.
M→∞ M
⎧ ⎫
|W (t)|
. lim P {τc (M) < cM } = sup < vσ ,
M→∞ 0≤t≤c/(1+c) w(t)
long as
(M − k ∗ )
. |Δ(M)| → ∞,
M 1/2
6.2 Linear Models 331
(M − k ∗ )|ΔM | P
≥ OP (1) + → ∞.
2M 1/2 w(1/2)
|σ̂M − σ | = oP (1)
.
then Theorem 6.1.1 remains true when .σ is replaced with .σ̂M . Such an estimator
may be computed from the training sample as detailed in Sect. 3.1.
It is also straightforward to extend Theorem 6.1.1 vector-valued observations.
We refer to Exercise 6.6.3 to sequentially detect a change in the means of random
vectors.
yi = xT
. i β 0 + Ei , 1 ≤ i ≤ M, β 0 ∈ Rd , xi ∈ Rd .
H0 : yi = xT
. i β 0 + Ei , M + 1 ≤ i < ∞, (6.2.1)
yi = xT ∗
i β A + Ei , M + k + 1 ≤ i < ∞ with β 0 /= β A . (6.2.2)
Let .β̂ M be the least square estimator for .β 0 using the historical sample,
⎛ ⎞−1
β̂ M = XT
. M XM XM YM ,
332 6 Sequential Monitoring
where
⎞
⎛ ⎛ ⎞
xT
1 y1
⎜ xT ⎟ ⎜ y2 ⎟
⎜ 2⎟ ⎜ ⎟
. XM =⎜ . ⎟ and YM = ⎜ . ⎟.
⎝ .. ⎠ ⎝ .. ⎠
xT
M yM
As in Chap. 4, we will assume that the covariate and error series .{zi = (xT T
i , Ei ) , i ∈
Z} are .L -decomposable for some .ν > 4. In this case we have by the ergodic
ν
theorem that
1 T P
. X XM → A.
M M
The model residuals of the incoming data based on the original parameter
estimate from the training sample are
Êi,M = yi − xT
. i β̂ M , M + 1 ≤ i < ∞. (6.2.3)
To define the monitoring strategy based on these residuals, we define a detector and
a boundary function. Our detector is
⎛ ⎞T ⎛ ⎞
Σ
M+k Σ
M+k
−1
ZM (k) =
. xi Êi,M D xi Êi,M , 1 ≤ k < ∞. (6.2.4)
i=M+1 i=M+1
Theorem 6.2.1 Assume that .H0 of (6.2.1), Assumptions 6.1.2–6.2.1 are satisfied,
I ∗ (w, c) < ∞ with some .c > 0, and .{zi = (xT
.
T
i , Ei ) , i ∈ Z} are .L -decomposable
ν
⎧ ⎛ d ⎞1/2 ⎫
⎨ 1 Σ ⎬
. lim P {τL (M) < ∞} = P sup Wl2 (u) ≤v ,
M→∞ ⎩0<u≤1 w(u) ⎭
l=1
.Êi,M = Ei − xT
i (β̂ M − β 0 ),
where .β 0 is the common value of the regression parameter under the null hypothesis.
Hence
Σ
M+k
k Σ
M Σ
M+k
k Σ T
M
. xi Êi,M − xi Êi,M = xi Ei − xi Ei
M M
i=M+1 i=1 i=M+1 i=1
⎛ ⎞
Σ
M+k
k Σ T
M
− xi xT
i − xi xi (β̂ M − β 0 ).
M
i=M+1 i=1
Since
|| || ⎛ ⎞
|| ||
. ||β̂ M − β 0 || = OP M −1/2 ,
= oP (1).
We also have that .{xi Ei , i ∈ Z} is .Lν/2 -decomposable, and therefore for each
M there are independent Gaussian processes .{WM,1 (x), 1 ≤ x ≤ M} and
.{WM,2 (x), 1 ≤ x < ∞} such that
|| x ||
||Σ ||
1 || ||
. max || xi Ei − WM,1 (x)|| = OP (1)
1≤x≤M x ζ || ||
i=1
334 6 Sequential Monitoring
and
|| M+x ||
|| Σ ||
1 || ||
. sup || xi Ei − WM,2 (x)|| = OP (1)
1≤x<∞ x
ζ || ||
i=M+1
with some .ζ < 1/2, .EWM,1 (x) = EWM,2 (x) = 0 and .EWM,1 (x)WT M,1 (y) =
EWM,2 (x)WT M,2 (y) = D min(x, y). Arguing as in the proof of Theorem 6.1.1 one
can verify
M −1/2
. sup
1≤x<∞ (1 + x/M)w(x/(x + M))
|| M+x ||
|| Σ x Σ
M ⎛ ⎞||
|| x ||
× || xi Ei − xi Ei − WM,2 (x) − WM,1 (M) ||
|| M M ||
i=M+1 i=1
= oP (1).
Observing that
┌⎛ ⎞T
M −1/2 x
. sup WM,2 (x) − WM,1 (M) D−1
1≤x<∞ (1 + x/M)w(x/(x + M)) M
⎛ x ⎞ ┐1/2
× WM,2 (x) − WM,1 (M)
M
⎛ d ⎞1/2
D 1 Σ
→ sup Wi2 (u) ,
0<u≤1 w(u) i=1
where .{Wi (u), 0 ≤ u ≤ 1}, .i ∈ {1, . . . , d} are independent Wiener processes, the
proof is complete. ⨆
⨅
Once again in practice we require an estimator of .D. If the random vectors
xi Ei are uncorrelated, one can use the sample covariance matrix of the weighted
.
residuals .xi Êi,M , computed from the training sample. If the weighted innovations
are correlated, a kernel-bandwidth estimator may be used as in Sect. 3.1.2.
These results can also be used to develop monitoring procedures for the parameters
in many of the time series models studied in Chap. 5. To begin, we consider dynamic
linear models as studied in Sect. 5.2. In this case we assume that the historical
sample satisfies
6.3 Time Series Models 335
.yi = xT
i β 0 + Ei , 1 ≤ i ≤ M, (6.3.1)
We remark that this setup also covers autoregressive models. The open-ended null
hypothesis is stated as
H0 : yi+M = xT
. i+M β 0 + Ei+M , 1 ≤ i < ∞, (6.3.2)
and we wish to detect as quickly as possible a change in the linear model parameters
occurring after .k ∗ additional observations have been obtained,
⎛
xT
i+M β 0 + Ei+M , 1 ≤ i < k∗
HA : yi+M =
.
xT
i+M β A + Ei+M , k ∗ ≤ i < ∞,
the null hypothesis and the additional Assumption 5.2.1 that the polynomial .1 −
β0,r+1 t − · · · − β0,r+d t d has all of its zeros outside of the unit circle in .C, there
exists an .Lν -decomposable sequence .{yi , i ∈ Z} satisfying (6.3.1). We define the
residuals as in (6.2.3) and we use the same boundary function and detector as in
(6.1.6) and (6.2.4).
Theorem 6.3.1 If .H0 of (6.3.2), Assumptions 5.2.1–5.2.4, 6.1.2 are satisfied and
I ∗ (w, c) < ∞ with some .c > 0, then
.
⎧
⎛r+d ⎞1/2 ⎫
1 ⎨
Σ ⎬
. lim P {τL (M) < ∞} = P sup Wl2 (u) ≤v ,
M→∞ ⎩0<u≤1 w(u) ⎭
l=1
Under the null hypothesis the parameters remain stable as we continue to observe
data, so that
Under the alternative, there is a change after .k ∗ observations following the training
sample:
⎛
(β0 + Ei,1 )yi−1 + Ei,2 , 1 ≤ i ≤ k ∗ − 1,
HA : yi+M =
.
(βA + Ei,1 )yi−1 + Ei,2 , k ∗ ≤ i < ∞,
and .β0 /= βA . A natural approach is based on comparing the weighted least squares
estimators of .β0 introduced in Sect. 5.3 based on the training sample to sequentially
updated estimators based on the incoming data. Let .β̂i,j be the least square estimator
as in (5.3.4), computed from .yi+1 , . . . , yj . We define the detector process
⎛ ⎞
ẐM (k) = k β̂M,M+k − β̂1,M ,
. 1 ≤ k < ∞, (6.3.4)
and we again use the boundary function of (6.1.6). We then define the stopping time
⎛
min {k : |ZM (k)| ≥ g(M, k)}
τRCA (M) =
.
∞, if |ZM (k)| < g(M, k) for all k ≥ 1.
Theorem 6.3.2 If .H0 of (6.3.3), Assumptions 5.3.1, 5.3.2, 6.1.2 are satisfied,
E log |β0 + E0,1 | /= 0 and .I ∗ (w, c) < ∞ with some .c > 0, then
.
⎧ ⎫
1
. lim P {τRCA (M) < ∞} = P sup |W (u)| ≤ vη ,
M→∞ 0<u≤1 w(u)
and
| |
| Σ
M+k |
1 | |
. sup |k β̂M,M+k − χi | = OP (1)
1≤k<∞ k ζ | |
i=M+1
6.3 Time Series Models 337
with some .ζ < 1/2. We note that the definition of .χi will change depending on
if .E log |β0 + E0,1 | > 0 or .E log |β0 + E0,1 | < 0. The result then follows as in
Theorem 6.1.1. ⨆
⨅
The parameter .η may be estimated from the training sample as detailed in
Sect. 5.3.
As we have seen to this point, sequential monitoring methods based on the
comparison of estimates for the parameters of interest from the historical and the
incoming observations often can be fashioned into a consistent approach. These
have so far made use of least-squares estimators that have simple forms. Many
models of interest though do not admit such easy-to-work-with estimators, as is the
case for GARCH.(p, q) models. We discuss now sequential monitoring procedures
for the parameters in a GARCH.(1, 1) model. In this case the observations are
assumed to be generated from the model
yi = σi Ei ,
.
σi2 = ω0 + α0 yi−1
.
2
+ β0 σi−1
2
, 1 ≤ i ≤ M.
H0 : σi2 = ω0 + α0 yi−1
.
2
+ β0 σi−1
2
, M +1≤i <∞ (6.3.5)
while under the alternative the parameters change after observing .k ∗ additional
observations
⎧
2 + β σ 2 , M + 1 ≤ i < M + k∗,
ω0 + α0 yi−1 0 i−1
.HA : σi =
2
2 + β σ 2 , M + k ∗ ≤ i < ∞,
ωA + αA yi−1 A i−1
hypothesis, we test if .(α0 , β0 ) remained the same during the observation period,
since .ω0 cannot be identified in the explosive (non-stationary) case. Let .θ̂ M ∈ R3
be the quasi-likelihood estimator based on the training sample, i.e.
⎧ ⎫
ΣM
.θ̂ M = argmax θ ∈ [δ, 1/δ] ,
3
li (θ )
i=1
where
1 Σ
M ⎛ ⎞T
F̂M =
. ui (θ̂ M ) ui (θ̂ M ) . (6.3.6)
M
i=1
⎧ ⎫
1 ⎛ 2 ⎞1/2
. lim {τCH (M) < ∞} = P sup W1 (u) + W2 (u)
2
≤v ,
M→∞ 0<u≤1 w(u)
Σ
M
. ui (θ̂ M ) = 0.
i=1
We also established in the proofs of Theorems 5.4.4 and 5.4.6 that there is an .Lν -
decomposable sequence .{χ i , i ∈ Z} such that
|| M ||
||Σ Σ
M || ( )
|| ||
. || ui (θ 0 ) − χ i || = OP M ζ ,
|| ||
i=1 i=1
and
6.3 Time Series Models 339
|| M+k ||
|| Σ Σ
M+k ||
1 || ||
. sup || ui (θ 0 ) − χ i || = OP (1)
1≤k<∞ k
ζ || ||
i=M+1 i=M+1
with some .ζ < 1/2. We note that the definition of .χ i depends on .E log(β0 +α0 E02 ) <
0 or .E log(β0 + α0 E02 ) > 0. We write
Σ
M+k Σ
M+k ⎛ ⎞
. ui (θ̂ M ) = ui (θ 0 ) + RM (k) θ̂ M − θ 0 .
i=M+1 i=M+1
and
1
. sup ||RM (k)|| = OP (1).
1≤k<∞ k
Thus we get
Σ
M+k Σ
M+k
k Σ
M
. ui (θ̂ M ) = ui (θ̂ M ) − ui (θ̂ M )
M
i=M+1 i=M+1 i=1
Σ
M+k
k Σ
M ⎛ ⎞
= χi − χ i + RM (k) θ̂ M − θ 0
M
i=M+1 i=1
k ⎛ ⎞
+ R̄M θ̂ M − θ 0
M
and
⎛ ⎞
1
||R̄M || = OP
. .
M
Due to the .Lν -decomposability of the vectors .{χ i , i ∈ Z}, for each M we can define
independent Gaussian processes .{WM,1 (x), 0 ≤ x ≤ M} and .{WM,2 (x), 0 ≤ x <
∞}, .EWM,1 (x) = EWM,2 (x) = 0, EWM,1 (x)WT T
M,1 (y) = WM,2 (x)WM,2 (y) =
F min(x, y),
|| x ||
||Σ || ( )
|| ||
. sup || χ i − WM,1 (x)|| = OP M ζ
1≤x≤M || ||
i=1
340 6 Sequential Monitoring
and
|| M+x ||
|| Σ ||
1 || ||
. sup || χ i − WM,2 (x)|| = OP (1)
1≤x<∞ k
ζ || ||
i=M+1
1 Σ
N
P
. ui (θ 0 ) (ui (θ 0 ))T → F.
N
i=1
Using the approximations one can repeat the proof of Theorem 6.1.1 to obtain the
limit result for .τCH (M). ⨆
⨅
Theorem 6.3.3 can be extended to the more general GARCH.(p, q) case. For
details we refer to Berkes et al. (2004).
ΔN > 0
. and N 1/2 ΔN → ∞ (6.4.1)
then
⎧ | k | ⎫
|Σ |
−1/2+α 1 1 | | D
N
. max | (Ei + ΔN )| − N 1−α
ΔN → N (0, 1),
σ 1≤k≤N k α | |
i=1
We note
⎛
kζ 1, if ζ ≤ α,
. max =
1≤k≤N kα N ζ −α , if ζ > α.
Thus we get
| k |
|Σ |
−1/2+α 1 | |
N
. max α | Ei − σ WN (k)| = oP (1).
1≤k≤N k | |
i=1
Now
1 1
. max α
|σ WN (k) + kΔN | ≤ max |σ WN (k)| +(N (1 − δ))1−α ΔN
1≤k≤N (1−δ) k 1≤k≤N (1−δ) k α
1 D 1
N −1/2+α
. max |WN (k)| → sup |W (u)|,
1≤k≤N (1−δ) kα 0<t<1−δ u α
342 6 Sequential Monitoring
N 1/2−α N 1−α ΔN = 1/(N 1/2 ΔN ) → 0. As such we get for all .δ > 0 that
.
⎛ ⎞
1 P
.N −1/2+α max |σ WN (k) + kΔN | − N 1−α ΔN → −∞.
1≤k≤N (1−δ) k α
D
N −1/2
. max |W (k) − W (N)| → sup |W (u) − W (1)|
N (1−δ)k≤N 1−δ≤u≤1
(6.1.1) versus (6.1.2), and the stopping time .τ (M) in (6.1.7). We set the weight
function in the boundary function definition to
w(u) = uκ ,
. 0 ≤ κ < 1/2. (6.4.2)
We consider the case when the change is early, i.e. it occurs relatively close to the
end of training sample and the size of the change .Δ(M) is not too small:
Theorem 6.4.1 If .HA of (6.1.2), Assumptions 6.1.1, 6.1.2, 6.4.2 are satisfied, and
the weight function .w(·) is of the form (6.4.2), then
τ (M) − a(M) D
. → N (0, 1),
b(M)
and
σ
.b(M) = a 1/2 (M).
(1 − κ)|Δ(M)|
M
. → 0, (6.4.3)
M
M1/2 Δ → ∞,
. (6.4.4)
k∗
. → 0, (6.4.5)
M
344 6 Sequential Monitoring
k∗
. → 0, (6.4.6)
M
and
⎛ ⎞
1M MΔ
. v − 1/2 → x. (6.4.7)
σ M M (M/M)κ
We note
⎛ ⎞1/(1−κ) ⎛ ⎞−1
M (1/2−κ)
2
M 1/2−κ ⎛ ⎞(κ−1/2)(1−κ)
. = M 1/2 Δ → 0
Δ3/2−κ Δ
which yields (6.4.4) via Assumption 6.4.2(ii). Applying Assumptions 6.4.2(i), (iii)
and (6.4.8) we obtain
k∗ ⎛ ⎞
. = O Δ1/(1−κ) M (1/2−κ)/(1−κ) M −φ = o(1),
M
proving (6.4.5). It is clear that (6.4.3) and (6.4.5) imply (6.4.6). Since by the
definition of .M
. v − ΔM1−κ M κ−1/2
⎛ ⎛ ⎞1/(1−κ) ⎞
vM 1/2−κ
− σ x v 1/2−κ M (1/2−κ) Δ−3/2+2κ
2
= v − ΔM κ−1/2
Δ
⎛ ⎞1/(1−κ)
= σ x Δκ−1/2 M (κ−1/2)/2 ,
It follows from the definition of .SM (k) of (6.1.5) that under .HA
Σ
M+k
k Σ
M
SM (k) =
. Ei − Ei + Δ(k − k ∗ + 1)1{k ≥ k ∗ }.
M
i=M+1 i=1
and
| M+k |
| Σ |
| |
| Ei − σ WM,2 (k)|
| |
i=M+1
. max
1≤k<k ∗ M 1/2 (1 + k/M)(k/(M + k))κ
kζ
= OP (1) max ∗
1≤k<k M 1/2 (1 + k/M)(k/(M + k))κ
⎛⎛ ⎞ ⎞
k ∗ 1/2−κ
= OP
M
and
|WM,2 (t)| D |W (t)|
. sup = sup
1≤t<k ∗ M 1/2 (t/M)κ 1/M≤t<k ∗ /M uκ
where .{W (u), u ≥ 0} is a Wiener process. We note that by the law of iterated
logarithm for the Wiener process at zero,
|W (t)|
. sup → 0 a.s. (M → ∞).
1/M≤t<k ∗ /M uκ
Thus we have
| M+k |
| Σ |
| |
| Ei |
| |
i=M+1
. max = oP (1). (6.4.10)
1≤k<k ∗ M 1/2 (1 + k/M)(k/(M + k))κ
346 6 Sequential Monitoring
ΔM
. lim =a>0
M→∞ M 1/2 (M/M)κ
with some scalar a, and therefore we conclude from (6.4.9) and (6.4.10) that
⎛ ⎞κ−1/2 ⎛ ⎞
M |SM (k)| ΔM P
. max ∗ 1/2 κ
− 1/2 → −∞.
M 1≤k<k M (1+k/M)(k/(M+k)) M (M/M)κ
= o(1).
= OP (1)Mκ−1/2 max k ζ −κ
k ∗ ≤k≤M
⎧( )1/2−κ
k ∗ /M , if ζ ≤ κ
= OP (1) −1/2
M ζ , if ζ > κ
= oP (1).
Similarly,
⎛ ⎞κ−1/2 |
M | || 1
. max WM,2 (k) || 1/2
| |
M ∗
k ≤k≤M M (1 + k/M)(k/(M + k))κ
|
1 |
− 1/2 | = oP (1).
M (k/M)κ |
Note the distribution of .WM,2 (x) does not depend on M. Let .{W (x), x ≥ 0} be a
Wiener process. Using Lemma 6.4.1 we obtain
⎛ ⎫ ⎛ ⎫
|SM (k)| |SM (k)|
. lim P max ≤ 1 = lim P max ≤1
M→∞ 1≤k≤M g(M, k) M→∞ k ∗ ≤k≤M g(M, k)
⎛ ⎫
|σ W (k) + kΔ|
= lim P max ≤1
M→∞ k ∗ ≤k≤M g(M, k)
⎛ ⎫
|σ W (k) + kΔ|
= lim P max ≤1
M→∞ 1≤k≤M g(M, k)
⎛ ⎫
|σ W (k) + kΔ|
= lim P max ≤ v
M→∞ 1≤k≤M M 1/2 (k/M)κ
⎛ ⎛ ⎞κ ⎛
M |σ W (k) + kΔ|
= lim P max
M→∞ M 1≤k≤M M 1/2 (k/M)κ
⎞
ΔM
− 1/2
M (M/M)κ
⎛ ⎞κ ⎛ ⎞⎫
M ΔM
≤ v − 1/2
M M (M/M)κ
= o(x)
on account of (6.4.7), where .o(x) denotes the standard normal distribution function.
Thus we have
τ (M) P
. → 1.
a(M)
and therefore
M (1/2−κ)/(1−κ)
.τ (M) ≈ v 1/(1−κ) in probability.
|Δ(M)|1/(1−κ)
This implies that the shortest reaction time is achieved if .κ is close to 1/2. We
would react to the change instantaneously if .κ = 1/2 but this is not allowed
in the definition of the stopping time. If in the definition of .g(M, k) we use
.w(u) = (u log+ log(1/u))
1/2 , then following the proof of Theorem 6.4.1 one can
show
⎛ ⎞
.τ (M) = OP (log log M)
1/2
,
in Assumption 6.4.1, that the sum of the .E i ’s can be approximated with Gaussian
processes in .Rd :
Assumption 6.4.3 For each N we can define Gaussian processes .{WN (x), 0 ≤
x ≤ N}, .EWN (x) = 0, EWN (x)WN (y) = min(x, y)J, .J is non-singular, such
that
|| x ||
1 ||
||Σ
||
||
. max || E i − WN (x) || = OP (1)
1≤x≤N x ζ || ||
i=1
N||δ||2 → ∞
. (6.4.13)
then
⎛ ⎛ k ⎞T ⎛ k ⎞
N 3/2−2α ⎝ 1 Σ −1
Σ
. max (E i + δ) J (E i + δ)
σ (N ) 1≤k≤N k 2α
i=1 i=1
⎞ D
−N 2−2α δ T J−1 δ → N (0, 1)
where
σ 2 (N ) = 4δ T J−1 δ
.
and
|⎛ ⎞T |
| k |
N −3/2+2α 1 | Σ |
. max 2α || E i − WN (k) J−1 (WN (k) + kδ)||
||δ|| 1≤k≤N k | i=1 |
|| k ||
1 ||
||Σ
|| N −1+α
|| 1
≤ N −1/2+α max α || E i − WN (k)|| max ||WN (k) + kδ||
1≤k≤N k || || ||δ|| 1≤k≤N k α
i=1
It follows from Assumption 6.4.3, as argued in the proof of Lemma 6.4.1, that
|| k ||
||Σ ||
−1/2+α 1 || ||
.N max || E i − WN (k)|| = oP (1)
1≤k≤N k α || ||
i=1
and
N −1+α 1
. max α ||WN (k) + kδ||
||δ|| 1≤k≤N k
N −1+α 1 N −1+α 1
≤ max α ||WN (k)|| + max α ||kδ||
||δ|| 1≤k≤N k ||δ|| 1≤k≤N k
1 1
≤ N −1/2+α max α ||WN (k)|| + O(1) = oP (1) + O(1).
N 1/2 ||δ|| 1≤k≤N k
Thus we get
N −1+α 1
. max α ||WN (k) + kδ|| = oP (1).
||δ|| 1≤k≤N k
Since the distribution of .{WN (x), x ≥ 1} does not depend on N , according to our
calculations we need to prove only
⎛ ⎫
N −3/2+2α 1
. max 2α (W(k) + kδ)T J−1 (W(k) + kδ) − N 2−2α δ T J−1 δ
||δ|| 1≤k≤N k
D
→ N (0, 1),
6.4 Distribution of the Stopping Time 351
1 ⎛ ⎞
. max 2α
WT (k)J−1 W(k) = OP N 1−2α
1≤k≤N k
and
| | ⎛ ⎞
| |
. max |k 1−2α kδ T J−1 W(k)| = OP ||δ||N 3/2−2α .
1≤k≤N
and
Next we write
Now
N −3/2+2α 1 || T −1 T −1
|
|
. max | W (k)J W(k) − W (N )J W(N ) |
||δ|| (1−δ)N ≤k≤N k 2α
N −3/2
= O(1) max ||W(N ) − W(k)||(||W(N )|| + ||W(k)||)
||δ|| (1−δ)N ≤k≤N
⎛ ⎞
1
= OP = oP (1).
N 1/2 ||δ||
352 6 Sequential Monitoring
Also,
| ⎛ ⎞||
N −3/2+2α || T −1 T −1 |
. max
|N (1−δ)≤k≤N k 1−2α
δ J W(k) − δ J W(N ) |
||δ||
= O(1)N −1/2 max ||W(k) − W(N )||,
N (1−δ)≤k≤N
D
N −1/2
. max ||W(k) − W(N )|| → sup ||W(1) − W(u)||.
N (1−δ)≤k≤N 1−δ≤u≤1
By the almost sure continuity of .{W(u), 0 ≤ u ≤ 1} we conclude that for all .x > 0,
|
⎛
N −3/2+2α || 1
. lim lim sup P max (W(k) + kδ)T J−1 (W(k) + kδ)
δ→0 N →∞ ||δ|| | N (1−δ)≤k≤N k 1−2α
1 ⎛ T ⎞ || ⎫
− max W (N )J W(N )+2kδ J W(N )+k δ J J ||>x =0.
−1 T −1 2 T −1
N (1−δ)≤k≤N k 1−2α
where .N (0, 1) is a standard normal random variable. This completes the proof. ⨆
⨅
The result of Lemma 6.4.2 can be easily rewritten for the norms used in Sect. 6.1.
Namely,
⎛⎛ ⎞T ⎛ k ⎞⎞1/2
⎛ Σk Σ
1 ⎝
N −1/2+α
. max (E i + δ) J−1 (E i + δ) ⎠ (6.4.14)
1≤k≤N k α
i=1 i=1
6.4 Distribution of the Stopping Time 353
⎛ ⎞1/2 ⎫ D
T −1
−N 1−α
δ J δ → N (0, 1),
theorem that the covariates are .Lν -decomposable, there is a matrix .A such that
|| M+k ||
|| Σ ||
1 || T ||
. max || xi xi − kA|| = OP (1).
1≤k<∞ (k log+ log(k))1/2 || ||
i=M+1
.δ = A(β A − β 0 ).
Theorem 6.4.2 If .HA of (6.2.2), Assumptions 4.1.1–4.1.2, 6.2.1, 6.4.4 and (6.4.2)
are satisfied, then
τL (M) − a(M) D
. → N (0, 1),
b(M)
where .N (0, 1) is a standard normal random variable,
⎛ ⎞1/(1−κ)
vM 1/2−κ
a(M) =
.
(δ T D−1 δ)1/2
and
1
.b(M) = a 1/2 (M).
(1 − κ)(δ T D−1 δ)1/2
Proof Let
M = M(M, x)
.
⎛ ┌ ┐1/(1−κ) ⎞1/(1−κ)
1/2−κ 1/2−κ (1/2−κ)2
vM v M
=⎝ T −x ⎠ . (6.4.15)
(δ D−1 δ)1/2 ([δ T D−1 δ)]1/2 )3/2−2κ
354 6 Sequential Monitoring
We note that .M defined above satisfies (6.4.3)–(6.4.6). Along the lines of the proof
of (6.4.7) and (6.4.8) we have
⎛ ⎞1/(1−κ)
vM 1/2−κ
. M ≈
(δ T D−1 δ)1/2
and
⎛ ⎛ ⎞1/2 ⎞
M−1/2+κ vM −1/2+κ − M1−κ δ T D−1 δ
. → x. (6.4.16)
⎛ ⎛ ⎛Σ
k
⎞T ⎛ k
Σ
⎞ ⎞1/2 ⎫
−1/2+κ 1 −1
.P M max (xi Ei + δ) D (xi Ei + δ) ≤c
1≤k≤M k κ
i=1 i=1
⎛ ┌ ⎛⎛ Σk ⎞T ⎛Σk ⎞⎞1/2
−1/2+κ 1 −1
=P M max (xi Ei + δ) D (xi Ei + δ)
1≤k≤M k κ
i=1 i=1
⎛ ⎞1/2 ┐ ⎛ ⎛ ⎞1/2 ⎞⎫
T −1 −1/2+κ T −1
−M 1−κ
δ D δ ≤M cM 1/2−κ
−M 1−κ
δ D δ .
= o(x),
where .o(x) denotes the standard normal distribution function. The limit result in
(6.4.17) is the same as in (6.4.11), and it follows as in the proof of Theorem 6.4.1.
First we note that
τL1−κ − a 1−κ D
. → N (0, 1),
b1
where
┌ 2
┐1/(1−κ)
v 1/2−κ M (1/2−κ) 1
b1 = b1 (M) =
.
T
= T
a 1/2−κ (M).
[(δ D−1 δ)1/2 ]3/2−2κ (δ D−1 δ)1/2
Σ
τCH +M
1
.δ̂ = ui (θ̂ M )
M
i=τCH
to estimate the time of change. We can apply Theorem 6.4.2 to get an upper
bound for .τCH (M), for the reaction time to detect the change. We assume that
in the definition of .τCH (M) we use .w(u) = uκ , 0 ≤ κ < 1/2, .||δ|| → 0 and
.M
1/2 ||δ|| → ∞. Let
⎛ ⎞1/(1−κ)
vM 1/2−κ
.aCH (M) = ⎝ ⎠ ,
T
(δ̂ F̂−1
M δ̂) 1/2
and
√
aCH (M) T −1 −1/2
bCH (M) =
. (δ̂ F̂M δ̂) .
1−κ
356 6 Sequential Monitoring
The size of the change is measure in the change of the derivative of the likelihood
function. If we are interested in measuring the change in .(α0 − αA , β0 − βA ) we
could use the detector
⎛ ⎞T ⎛ ⎞
GM (k) = k(θ̂ M,M+k − θ̂ 0,M ) D̂−1
.
M k(θ̂ M,M+k − θ̂ 0,M ) ,
where .θ̂ i,j is the quasi-maximum likelihood estimator for .α0 , β0 computed from
{yk , i < k ≤ j } and .D̂M is the estimator for the asymptotic covariance of .M 1/2 θ̂ 0,M
.
computed from the training sample. If the corresponding stopping time is denoted
∗ (M), .0 ≤ κ < 1/2, then
by .τCH
∗ (M) − a ∗ (M)
τCH CH D
.
∗ → N (0, 1)
bCH (M)
with
⎛ ⎞1/(1−κ)
vM 1/2−κ
∗
aCH
. (M) = ⎝ T
⎠ ,
(δ̂ D̂−1
M δ̂)
1/2
and
/ ∗ (M)
aCH T
∗
.bCH (M) = (δ̂ D̂−1
M δ̂)
−1/2
.
1−κ
Example 6.5.1 (Monitoring the Exchange Rate Between USD and Pound Ster-
ling) Figure 6.1 shows the daily spot exchange rates1 between U.S. dollars and
Pounds sterling from the year 2022 obtained from the Federal Reserve Bank of St.
1 There were ten missing values due to holidays, which we imputed using linear interpolation
1.35
1.30
USD to Pound sterling
1.25
1.20
1.15
1.10
Fig. 6.1 A plot of the spot exchange rate between US dollars and Pounds sterling from 2022. As
a demonstration of sequential monitoring procedures, we used the data prior to September 1, 2002
(M = 171) as a training sample to monitor for changes in the linear trend of the series
yt = β0 + β1 t + Et .
. (6.5.1)
Z 1/2
k
30
g(M, k)
25
20
15
10
5
0
1/2
A plot of ZM (k) against g(M, k) is shown in Fig. 6.2. We saw that the process
1/2
ZM (k) crossed the critical boundary at the date corresponding to September 27th,
approximately 20 days after the conclusion of the election of a new prime minister
in the UK, and one day after a large drop in the spot-exchange rate.
yt = β1 + β2 t + β3 xt + Et ,
. (6.5.3)
6.6 Exercises 359
0.8
0.6
Shiller HPI
0.4
0.2
0.0
−0.2
Fig. 6.3 First differenced S&P CoreLogic Case-Shiller Home Price Index (HPI) series over a 10
year period from 1991 to the end of 2000 (N = 120), obtained from the Federal Reserve Bank of
St. Louis database (St. Louis, 2023)
where xt is the first differenced real disposable personal income per capita change
at a monthly level over the same period. We used January 1991-December 1995
as the training (historical) sample with M = 60, and considered an open-ended
monitoring procedure from there. An application of the KPSS test (Kwiatkowski
et al., 1992) suggested that the covariate series of first differenced real disposable
personal income per capita is reasonably stationary in the training sample. In order
to compute ZM (k) in (6.2.4), we estimated D using the long-run covariace matrix
estimator in Sect. 3.1.2 with the Bartlett kernel and the bandwidth of (Andrews,
1991) from the training sample.
1/2
Figure 6.4 shows a plot of the process ZM (k) against the boundary function
1/2
g(M, k) of Eq. (6.5.2) as in the previous example. We saw that the process ZM (k)
stayed well below the boundary function g(M, k) for the first approximately year
and a half of monitoring, but then sharply increases after a large shift in the rate of
1/2
increase of the HPI occurring in the later part of the year 1997. The process ZM (k)
crossed the boundary at a location corresponding to November, 1997, which is when
we may have sounded an alarm that the model (6.5.3) appears to have undergone a
structural change.
6.6 Exercises
Z 1k 2
80
g (M,k)
60
40
20
0
⎛ | i |⎞−1
|Σ |
∗ | |
.SM (k) = SM (k) max N −1/2 | (Xl − X̄M )|
1≤i≤N | |
l=1
Compute
{ }
. lim P τ ∗ (M) < ∞
M→∞
Xi = μ0 + ei ,
. 1≤i≤M
H0 : Xi = μ0 + ei ,
. M + 1 ≤ i < ∞.
1 Σ
M
.X̄M = Xi ,
M
i=1
if the null hypothesis, Assumption 6.1.2 hold, and I ∗ (w, c) < ∞ with some c > 0.
Exercise 6.6.4 Show that the
if
k∗
. → 0 and M 1/2 ||β 0 − β A || → ∞,
M
where τL (M) is defined in (6.2.5) and Assumptions 5.2.1–5.2.4, 6.1.2 hold.
Exercise 6.6.5 Define a closed end version of τL (M) and provide its asymptotic
properties.
Exercise 6.6.6 We wish to test the null hypothesis of (6.2.1) using the detector
| M+k |
| Σ k Σ
M |
| |
.Ẑk = | Êi,M − Êi,M | ,
| M |
i=M+1 i=1
where the residuals Êi,M are defined in (6.2.3). Define the corresponding stopping
time and discuss its asymptotic properties.
Exercise 6.6.7 We consider the dynamic AR(1) model, i.e. d = 1 in (6.3.1) and
(6.3.2). Under the alternative the regression parameter changes to 1 immediately
after the historical sample. Show that
Exercise 6.6.10 Show that Lemma 6.4.1 remains true when 1/2 ≤ α < 1.
Exercise 6.6.11 Show that Lemma 6.4.2 remains true when 1/2 ≤ α < 1.
6.7 Bibliographic Notes and Remarks 363
The first monitoring scheme to find changes in regression parameters akin to the
approaches considered in this chapter was introduced by Chu et al. (1996), and it
has become the starting point of substantial research. Chu et al. (1996) used the
weight function .w(u) = (u(a 2 + log(1/u)))1/2 and they also provided an upper
bound for .P {τ (M)} < ∞ with this weight function. Theorem 6.1.1 was obtained
by Aue and Horváth (2004), assuming that the innovations are independent and
identically distributed. Zeileis et al. (2005) and Aue et al. (2014) studied monitoring
schemes in linear models with dependent errors. Changes in dynamic regressions
were investigated in Horváth et al. (2022). Kirch (2007), Kirch (2008) and Hušková
and Kirch (2012) provided resampling methods to find critical values for sequential
monitoring. Hlávka et al. (2012) investigated the sequential detection of changes of
the parameter of autoregressive models, i.e. .d = 0 in the dynamic linear model of
(6.3.1) and (6.3.2). Gösmann et al. (2021) propose a likelihood-ratio based approach
for open-ended sequential monitoring. Leisch et al. (2000) used fluctuation tests to
monitor model parameters. Berkes et al. (2004) and Horváth et al. (2006) justified
the applicability of monitoring in volatility models. Homm and Breitung (2012)
compared several methods to find bubbles in stock markets, detecting changes from
stationary to non stationary segments. Similarly, to find change from stationarity to
non-stationarity is discussed in Steland (2006) and Horváth et al. (2020). Bubble
detection in real time is an interesting application of monitoring methods. Time
series models for this purpose assume a stationary sequence turns into an explosive
or mildly explosive series before returning to a stationary phase or a different type of
explosive segment. We refer to Phillips et al. (2014), Phillips et al. (2015), Phillips
and Shi (2018) and Phillips and Yu (2011) for methods to detect bubbles and the
applications of their methods to several financial bubbles. Bardet and Kengne (2014)
considered more general causal and affine processes. Hoga (2017) investigated
multivariate models. A graph based approach is developed in Chen (2019).
The method proposed in this chapter are highly related to other sequential
monitoring techniques in the vein for Shiryaev’s and Robert’s methodology, see
e.g. Shiryaev (1963), Roberts (1966), Pollak (1987), Moustakides (1986), Siegmund
(2013), Lai and Xing (2010). These typically assume incoming observations are
serially independent. A recent review is Tartakovsky et al. (2014). Aue et al. (2012),
Aue et al. (2014) used monitoring in functional models as in Chap. 8.
The limits in the theorems of Sect. 6.1 do not exist if .w(u) = u1/2 is used in
the definition of the boundary function. Horváth et al. (2007) modified the boundary
function so .w(u) = u1/2 could be incorporated into the procedure.
Chow and Hsiung (1976) proved Lemma 6.4.1 for independent and identically
distributed random variables with positive drift. The proof in this chapter is inspired
by Berkes and Horváth (2003a).
Chapter 7
High-Dimensional and Panel Data
We have considered in several instances, see e.g. Sects. 1.3, 5.5, and 5.6, performing
change point analysis with multivariate time series. In this chapter we change
our notation slightly to denote such multivariate time series data as .Xi,t , t ∈
{1, . . . , T }, i ∈ {1, . . . , N}, where we think of t and T as denoting “time”, and
N denotes the dimension or number of “cross-sectional units” that we observe.
For example, such data might comprise real valued observations of N financial or
economic time series over T time units.
Given the fast proliferation and accessibility of economic and financial data
available today, often at least one or both of N and T are large. This has spurred
efforts to understand how various methods to analyze the data .Xi,t , including change
point analysis, are affected when N or T are large in relation to each other, or
are both large. The case when .N >> T and T is relatively small is sometimes
referred to in the econometrics literature as “panel data”, with the various N cross-
sectional time series referred to as “panels”. Other relationships between N and T
frequently arise, and the analysis of such data generally falls within the scope of
high-dimensional multivariate time series analysis.
In this chapter we discuss the asymptotic theory behind change point methods
for such high-dimensional or panel data. An important consideration throughout is
that we allow both N and T to tend to infinity. We see in many cases that the relative
rates at which N and T diverge have a crucial impact on the form of the limiting
distribution of natural change point test statistics.
To illustrate some of the challenges in this setting, in Sect. 7.1 we begin by
considering change point detection methods for the mean of high-dimensional time
series that are cross-sectionally independent. These methods are adapted to deal with
cross-sectional dependence in form of (linear) common factors in Sect. 7.2. Change
point detection in the context of high-dimensional linear regression is considered in
Sect. 7.3, and for high-dimensional RCA models in Sect. 7.4.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 365
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_7
366 7 High-Dimensional and Panel Data
H0 : t0 > T
. (7.1.2)
HA : 1 < t0 < T .
. (7.1.3)
We note here that we are careful to formulate the change point hypotheses in
terms of the time of change, rather than the magnitude of the changes represented
by .{δi , i ∈ {1, . . . , N }}. An important consideration throughout this Chapter will be
the determination of what “magnitude” of a change is required in order for natural
change point statistics to be able to differentiate .H0 from .HA .
To detect a change in the i’th cross-section, we use the CUSUM process
⎛ ⎞
1 ⎣T x⎦
ZT ,i (x) =
. ST ,i (x) − ST ,i (1) ,
T 1/2 T
where
⎣T
Σ x⎦
.ST ,i (x) = Xi,t ,
t=1
where the .σi2 ’s are some suitably chosen standardization constants. In this section
we assume the errors .ei,t are cross-sectionally independent linear processes
Assumption 7.1.2
(i) for all .i ∈ {1, . . . , N }, .t ∈ {1, . . . , T },
∞
Σ
ei,t =
. ci,l Ei,t−l
l=0
1 Σ
N
. lim sup E|Ei,0 |κ < ∞
N →∞ N
i=1
(vi) there are .c0 and .α > 2 such that for all .1 ≤ i ≤ N, 0 ≤ l < ∞
|ci,l | ≤ c0 (l + 1)−α
.
∞
Σ ∞ Σ
Σ ∞
ai2 =
.
2
ci,l +2 ci,l ci,l+h .
l=0 h=1 l=0
N
. → 0.
T2
368 7 High-Dimensional and Panel Data
We will also consider methods in Sect. 7.2 where we only assume that
min(N, T ) → ∞. Section 7.3 further provides some results when T is fixed
.
and only .N → ∞.
The next theorem gives the limit distribution of the standardized .l2 -aggregated
cross-sectional CUSUM process.
Theorem 7.1.1 If .H0 of (7.1.2) and Assumptions 7.1.1–7.1.3 hold, then
D[0,1]
V̄N,T (x) −→ ┌(x),
.
where .{W (u), u ≥ 0} denotes a Wiener process. The representation in (7.1.5) has
an interesting connection to the Brownian bridge. Namely, if .{B(t), 0 ≤ t ≤ 1} is a
Brownian bridge, then
D
{B(t), 0 ≤ t ≤ 1} = {(1 − t)W (t/(1 − t)), 0 ≤ t ≤ 1} .
.
where
∞
Σ ∞
Σ
∗ ∗ ∗
ei,t
. = ci,l Ei,t−l with ci,l = ci,k .
l=1 k=l+1
7.1 Change in the Means of High-Dimensional Observations 369
Let
⎛ ⎞
⎣T x⎦
1 ⎝Σ ⎣T x⎦ Σ ⎠
T
.QT ,i (x) = Ei,t − Ei,t .
T 1/2 T
t=1 t=1
By (7.1.6) we have
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
k k k 1 2
ZT2 ,i
. = ai2 Q2T ,i + 2ai QT ,i + ηi,k .
T T T T
∗ i ∈ Z} we have
and by the stationarity of .{ei,j
⎛ ( ) ( ∗ )2 ( ∗ )2 ⎞ ( ∗ )2
∗ 2
.
2
Eηi,k ≤ 8 E ei,0 + E ei,k + E ei,T ≤ 24E ei,0 .
with some constant .c2 , and therefore Assumptions 7.1.2(iv) and 7.1.3 yield
|N |
|Σ 1 ⎣T x⎦(T − ⎣T x⎦) ||
1 |
. sup | EZT ,i (x) −
2
| = o(1). (7.1.7)
0≤x≤1 N 1/2 | σi2 T2 |
i=1
370 7 High-Dimensional and Panel Data
where
N ⎧
Σ ⎫
1 ⎣T x⎦(T − ⎣T x⎦)
RN,T (x) =
. QT ,1 −
2
.
N 1/2 T2
i=1
with some constant .c3 . By the Rosenthal inequality (see Petrov 1995, p. 59), we
conclude
⎧ ⎛ ⎞2 ⎫
c4
.EQT ,i ≤ + T 2 EEi,0
4 4 2
T EEi,0 .
T
1 Σ 4
N
≤ c6 EEi,0 ,
TN
i=1
Σ
N ⎛ ⎞
LT ,i =
. λl Q2T ,i (xl ) − EQ2T ,i (xl ) .
l=1
EL2T ,i ≥ c7
. if T ≥ T ∗ , (7.1.9)
with some .c7 > 0 and .T ∗ . Using again the Rosenthal inequality (see Petrov 1995,
p. 59) we get
where .c8 only depends on the .λl ’s. Applying (7.1.9) and (7.1.10) we conclude
⎛N ⎞⎛ N ⎞−1/2
Σ⎛ ⎞2/κ Σ
. E|LT ,i | κ/2 2
ELT ,i
i=1 i=1
⎧⎛ ⎞2/κ ⎫
c9 ⎨ Σ
N ⎬
≤ T 1−κ/2 E|Ei,0 |κ + N 2κ
N 1/2 ⎩ ⎭
i=1
⎧⎛ ⎞2/κ ⎛ ⎞2/κ ⎫
⎨ Σ
N
1 Σ
N ⎬
1−κ/2 1
≤ c10 N (4−κ)/(2κ)
T E|Ei,0 |κ + E|Ei,0 |κ
⎩ N N ⎭
i=1 i=1
→0
on account of Assumptions 7.1.2 and 7.1.3. Using Lyapunov’s central limit theorem
(see Theorem 4.9 of Petrov 1995) we get
1 Σ
N
D
. LT ,i → N, (7.1.11)
N 1/2
i=1
where .N is a normal random variable with zero mean and variance which is a
function of the .xl ’s and .λk ’s. Now the Cramér–Wold lemma (see Billingsley 1968)
gives the convergence of the finite dimensional distributions.
372 7 High-Dimensional and Panel Data
The last step of the proof is the computation of the variance of the limiting normal
random variable in (7.1.11). Since .QT ,i (x) is the CUSUM process of independent
and identically distributed random variables, elementary arguments give
| ⎛ ⎞ ⎛ ⎞|
| |
. max |cov Q2T ,i (x), Q2T ,i (y) − cov B 2 (x), B 2 (y) | → 0,
1≤i≤N
which implies
⎛ ⎞
E┌(x)┌(y) = cov B 2 (x), B 2 (y) ,
.
D
[U (0), U (h)] = [U (0), exp(−h)U (0) + (1 − exp(−2h))1/2 N],
. (7.1.13)
and therefore
Applying Rosenthal’s inequality (see Petrov 1995, p. 59) we obtain for all .1 ≤ l ≤
k ≤ T that
⎧Σ ⎡N ┐
N ⎛ ⎞κ/2 Σ ⎛ ⎞2 κ/4
−κ/4 −κ/2
A1 (k, l) ≤ c1 N
. T E ηi,k − Eηi,k
2 2
+ E ηi,k − Eηi,k
2 2
i=1 i=1
⎡N ┐
Σ
N ⎛ ⎞κ/2 Σ ⎛ ⎞2 κ/4 ⎫
+ E ηi,l − Eηi,l
2 2
+ E ηi,l − Eηi,l
2 2
i=1 i=1
⎧Σ ⎛N ⎞
N
( ) Σ⎛ ⎞ κ/4 ⎫
−κ/4 −κ/2
≤ c2 N T E|ηi,k | + E|ηi,l | +
κ κ
ηi,k + ηi,l
4 4
.
i=1 i=1
∗ γ
E|ei,t
. | ≤ c3 E|Ei,0 |γ
which yields
4
Eηi,k
. ≤ c4 EEi,0
4
and E|ηi,k |κ ≤ c5 E|Ei,0 |κ .
374 7 High-Dimensional and Panel Data
Thus we get
⎧ ⎡ ┐κ/4 ⎫
⎨1 Σ
N
1 Σ 4
N ⎬
A1 (k, l) ≤ c6 T −κ/2
. E|Ei,0 |κ + EEi,0
⎩N N ⎭
i=1 i=1
≤ c7 T −κ/2 . (7.1.15)
A2 (k, l)
.
⎧ ΣN ┌ ⎛ ⎞ ⎛ ⎞
1 k l
=E 1/2
Q T ,i η i,k − Q T ,i ηi,l
(N T ) N N
i=1
⎛ ⎛ ⎞ ⎛ ⎞ ⎞ ┐⎫
k l
− E QT ,i ηi,k − QT ,i ηi,l
N N
⎧Σ | ⎛ ⎞ ⎛ ⎞ |κ/2
1
N
| k k |
≤ c8 |
E |QT ,i ηi,k − EQT ,i ηi,k ||
(N T ) κ/4 N N
i=1
Σ | ⎛ ⎞ ⎛ ⎞ |κ/2
N
| l l |
+ |
E |QT ,i ηi,l − EQT ,i ηi,l ||
N N
i=1
⎡ ⎛ ⎛ ⎞ ⎛ ⎞ ⎞2 ┐κ/4
Σ
N
k k
+ E QT ,i ηi,k − EQT ,i ηi,k
N N
i=1
⎡ ⎛ ⎛ ⎞ ⎛ ⎞ ⎞2 ┐κ/4 ⎫
Σ
N
l l
+ E QT ,i ηi,l − EQT ,i ηi,l
N N
i=1
⎧Σ | ⎛ ⎞ |κ/2 Σ | ⎛ ⎞ |κ/2
1
N
| | N
| |
≤ c9 E |QT ,i k ηi,k | + E |QT ,i l ηi,l |
(N T ) κ/4 | N | | N |
i=1 i=1
⎡ ⎛ ⎛ ⎞ ⎞2 ┐κ/4 ⎡ ⎛ ⎛ ⎞ ⎞2 ┐κ/4 ⎫
Σ
N
k Σ
N
l
+ E QT ,i ηi,k + E QT ,i ηi,l .
N N
i=1 i=1
7.1 Change in the Means of High-Dimensional Observations 375
Using the Cauchy–Schwarz inequality, as in the proof of Lemma (7.1.1), we get for
all .0 < γ ≤ κ/2
| ⎛ ⎞ |γ ⎛ | ⎛ ⎞|2γ ⎞1/2
| k | | k | | |2γ
.E |QT ,i ηi,k || ≤ E ||QT ,i | E |ηi,k |
| N N |
⎧⎛ ⎞1/2 ⎛ ⎞γ /2 ⎫ ⎛ ⎞1/2
−γ +1
≤ c10 T E|Ei,0 | 2γ
+ EEi,0
2
E|Ei,0 |2γ
Thus we have
⎧ ⎡ ┐κ/4 ⎫
1 ⎨1 Σ
N
1 Σ 4
N ⎬ 1
A2 (k, l) ≤ c12
. E|Ei,0 |κ + EEi,0 ≤ c13 .
T κ/4 ⎩N N ⎭ T κ/4
i=1 i=1
Next we introduce
⎧ N ┌ ⎛ ⎞ ⎛ ⎞
1 Σ l k
A3 (k, l) = E
.
2
QT ,i − QT ,i
2
N 1/2 T T
i=1
⎛ ⎛ ⎞ ⎛ ⎞⎞┐⎫κ/2
l l k k
− 1− − 1− ,
T T T T
┌ ⎛ ⎞ ⎛ ⎞ ⎛ ⎛ ⎞ ⎛ ⎞⎞┐γ
l k l l k k
2
. QT ,i − QT ,i2
− 1− − 1−
T T T T T T
| ⎛ ⎞ ⎛ ⎞|γ ⎛ ⎞γ
| k l || l−k
≤ c14 ||Q2T ,i − Q2T ,i + c15 .
T T | T
and
| ⎛ ⎞ ⎛ ⎞|2γ
| l k ||
E ||QT ,i − QT ,i
T |
.
T
⎧ | |2γ | |2γ ⎫
⎨ || Σ
⎪ | ⎛ ⎞2γ |Σ | ⎪
l
| l−k | T
| ⎬
≤ c19 T −γ E || Ei,j || + E || Ei,j ||
⎪
⎩ |j =k+1 | T |j =1 | ⎪ ⎭
{ ⎛ ⎞γ
≤ c20 T −γ (l − k)E|Ei,0 |2γ + (l − k)γ EEi,02
⎛ ⎞ ⎫
l − k 2γ ⎡ ⎛ ⎞γ ┐
+ T E|Ei,0 | + T EEi,0
2γ 2
T
⎛ ⎞
l−k γ
≤ c21 E|Ei,0 |2γ .
T
Since .A3 (k, l) is a sum of independent random variables, we can use again
Rosenthal’s inequality and the upper bounds above with .γ = 2 and .γ = κ/2 to
conclude
⎛ ⎞κ/4
1 Σ
N
l−k
A3 (k, l) ≤ c22
. E|Ei,0 |κ .
T N
i=1
The upper bounds for .A1 (k, l), A2 (k, l) and .A3 (k, l) imply for all .1 ≤ k ≤ l ≤ T
| ⎛ ⎞ ⎛ ⎞|κ/2 ⎛ ⎞ ⎛ ⎞
| l l || l − k κ/4 1 l − k κ/4
|
.E V̄N,T − V̄N,T ≤ c23 + c24 κ ≤ c25 ,
| T T | T T T
which yields
| |κ/2
E |V̄N,T (x) − V̄N,T (y)| ≤ c26 |x − y|κ/4 .
.
Since .κ > 4, the lemma follows from Theorem 12.3 of Billingsley (1968) p. 95. ⨆
⨅
Proof of Theorem 7.1.1 The result follows from Lemmas 7.1.1 and 7.1.2, and
Prohorov’s Theorem. ⨆
⨅
In many applications T is comparable or smaller than N, and so it of interest
to study alternatives to Assumption 7.1.3. It may be shown (see Horváth and
7.1 Change in the Means of High-Dimensional Observations 377
Hušková 2012 ) that if the conditions of Theorem 7.1.1 hold with the exception
of Assumption 7.1.3, then there exists a nonzero function .g(x) such that
N 1/2 D[0,1]
V̄N,T (x) −
. g(x) −→ ┌(x).
T
In other words, if .N/T 2 does not converge to 0, then a non random drift
term appears in the process .V̄N,T . As such in the absence of Assumption 7.1.3
using Theorem 7.1.1 to estimate critical values of functionals (e.g. the supremum
functional) of .V̄N,T will asymptotically always lead to a rejection of .H0 . We discuss
a modification of Theorem 7.1.1 in Sect. 7.2, where we need to only assume that
.min(N, T ) → ∞.
Since the parameters .σi2 in the definition of .V̄N,T are unknown, in practice we
replace them with estimators. These constants describe the long-run variances of the
innovations in each cross-section. It is natural to use kernel-bandwidth estimators
2
.σ̂
T ,i as discussed in Chap. 3 for this purpose. We discussed the consistency of the
long-run estimators in Sect. 3.1. The rate of convergence of .σ̂T2,i to .σi2 puts further
restriction on the asymptotic rates between N and T under which Theorem 7.1.1
may be established. For details we refer to Horváth and Hušková (2012).
The asymptotic behaviour of .V̄N,T under .HA may be established using the results
in Sect. 2.1 for each cross-section. We require the following conditions on the
location of the change point as well as the magnitude of the changes:
t0 t0
0 < lim inf
. ≤ lim sup < 1 (7.1.16)
T T
and
T Σ 2
N
. δi → ∞. (7.1.17)
N 1/2
i=1
Condition (7.1.17) states what is required in terms of the magnitudes of the changes
in the means in order for the supremum functional of .V̄N,T to diverge. Notice that
it does not require that changes occur in each cross-section, although if a constant
change occurs on a positive fraction of the cross-sections, then (7.1.17) reduces to
.min{N, T } → ∞.
378 7 High-Dimensional and Panel Data
where .ζt is the common factor and .φi is the factor loading in the ith cross-section.
We assume that the common factor series .{ζt , t ∈ Z} satisfies the functional central
limit theorem:
⎣T
Σ x⎦
1 D[0,1]
. ζt −→ W (x),
T 1/2
t=1
ξi
ζi =
. , 1 ≤ i ≤ N.
N ρi
In other words, we consider the case where the influence of the common fac-
tors is decreasing with respect to the number of cross-sections considered. If
.sup1≤i<∞ |ξi | < ∞ and .ρi > 1/4 for all .i ∈ {1, . . . , N }, then the dependence
between the cross-sections is so small that Theorem 7.1.1 remains true. If instead
.ρi = 1/4 for all .i ∈ {1, . . . , N}, then
D[0,1]
V̄N,T (x) −→ ┌(x) + ξ0 B(x),
.
1 Σ ξi2
N
ξ0 = lim
. .
N →∞ N σ2
i=1 i
7.2 Panel Models with Common Factors 379
P
. sup |V̄N,T (x)| → ∞.
0<x<1
As such, if the loadings are “large”, then the supremum functional of .V̄N,T will
diverge even if there is no change in the cross-sectional means.
In order to further discuss the affect of common factors in change point detection
and alternative change point test statistics for this scenario in more generality, we
now turn the traditional p-factor model with AMOC in the mean
The time of the changes in the means is .t0 , and the mean of the ith cross-section is
μi which changes to .μi + δi at time .t0 . The cross-sectional dependence is modeled
.
by .ft , t ∈ {1, ..., T }, .ft ∈ Rp and .λi ∈ Rp are the corresponding loadings. We allow
linear as well as non-linear time series .ei,t as unobservable errors. The CUSUM
processes computed from each cross-section are denoted
⎣T
Σ 1 Σ
u⎦ T
Si (u) =
. (yi,t − ȳi,T ), ȳi,T = yi,t ,
T
t=1 t=1
Si (u) = 0, 0 ≤ u < 1/T . The following two observations based on the analysis of
.
V̄N,T to this point suggest an alternative process to consider that overcomes some
.
of the challenges encountered. Although the normalization with .σi2 in the definition
of .V̄N,T in Sect. 7.1 makes it possible that we can centralize the process with the
pivotal function .⎣T u⎦(T − ⎣T u⎦)/T 2 , estimating each .σi2 .i ∈ {1, . . . , N } presents
a challenge. Moreover, the absence of Assumption 7.1.3 and in the presence of
moderately strong common-factors, the process .V̄N,T contains a drift term. This
suggests using a random centralization in computing .V̄N,T ,
N ⎛
Σ ⎞
⎣T u⎦(T − ⎣T u⎦) 2
VN,T (u) =
. Si2 (u) − Si (τ ) , 0 ≤ u ≤ 1,
⎣T τ ⎦(T − ⎣T τ ⎦)
i=1
| l |
|Σ |κ
| |
.E | ei,t | ≤ c0 lκ/2 .
| |
t=1
⎣T
Σ u⎦
si (u) =
. ei,t .
t=1
Assumption 7.2.4
⎛ ⎞
. lim max ρP L T −1/2 si (u), σi W (u) = 0,
min(N,T )→0 1≤i≤N
many parametric models, e.g. ARMA and GARCH, this may be formulated as a
compactness condition on the parameter spaces for the processes. Next we define
1 Σ 4
N
. σ̄ 4 = lim σi , (7.2.2)
N→∞ N
i=1
which already appeared in Theorem 7.1.1. It follows from Assumption 7.2.3 that
σ̄ 4 < ∞ and Assumption 7.2.4 implies that .σ̄ > 0. We also require the mild
.
Let
⎛N ⎞−1
Σ Σ
N
Q = lim
. ||λi ||2 λi λT
i
N →∞
i=1 i=1
and
1 Σ
N
c∗ = lim
. ||λi ||2 ∈ [0, ∞]. (7.2.4)
N →∞ N 1/2
i=1
We assume that the above limits are well defined. The common factors can be
arbitrary, perhaps serially correlated, stationary vector valued processes, but we
assume that they too satisfy the functional central limit theorem. Let .I = Ip×p
be the .p × p identity matrix.
⎣T
Σ u⎦
D[0,1]p
T −1/2
. ft −→ WE (u),
t=1
Gaussian process. The matrix .E is the long-run covariance matrix of the partial sum
of the .ft ’s.
The following three theorems show that the asymptotic behavior of the process
.VN,T depends crucially on the constant .c∗ in (7.2.4). Three different behaviors are
1 D[0,1]
. VN,T (u) −→ ┌(u),
NT 1/2
The difference between Theorems 7.2.1 and 7.2.2 arises because of the different
“strengths” of the common factors. In Theorem 7.2.1 the effect of the common
factor is asymptotically negligible, and hence the limiting process is Gaussian. In
7.2 Panel Models with Common Factors 383
Theorem 7.2.2 the common factor dominates and the limit is a quadratic form of an
Rp valued Gaussian process. Using the .vec operator (see Abadir and Magnus 2005,
.
Thirdly, we consider the case when the errors and the common factors are of the
same order. Since in this case both the error and common factor processes affect the
limit, we have to specify their joint behavior:
Assumption 7.2.7 .{ei,t , i ∈ {1, ..., N}, t ∈ {1, ..., T } and .{ft , t ∈ {1, ..., T }} are
independent.
Theorem 7.2.3 If .H0 of (7.1.2), Assumptions 7.2.1–7.2.6 hold and .0 < c∗ < ∞,
then, as .min(N, T ) → ∞,
1 D[0,1]
⎛ ⎛
. VN,T (u) −→ ┌(u) + c ∗ trace Q BE (u)BT E (u)
T N 1/2
⎞⎞
u(1 − u)
− BE (τ )BT
E (τ ) ,
τ (1 − τ )
1 Σ
T
.zi,t = (yi,t − ȳi,T ) −
2
(yi,s − ȳi,T )2 , 1 ≤ t ≤ T,1 ≤ i ≤ N
T
s=1
T −h
1 Σ
. γ̂i,j (h) = zi,s zj,s+h ,
T −h
s=1
384 7 High-Dimensional and Panel Data
Σ
N Σ
N Σ
H ⎛ ⎞
h
wN,T =
. K γ̂i,j (h)
H
i=1 j =1 h=−H
where K is the kernel satisfying Assumption 3.1.5. Here we need stronger condi-
tions on H , the window (smoothing parameter) than in Assumption 3.1.4:
and
∞
Σ ( )
2
.si = cov ei,0 , ei,h .
h=−∞
⎛ ⎞
We need an assumption on the order of decay of .ER0 RT
h and .cov e
2 , e2 :
i,0 i,h
Assumption 7.2.9
(i) .{ft , t ∈ Z} is a stationary sequence with .Ef0 = 0, .E||f0 ||4 < ∞ and there are
.c1 and .α1 > 2 such that
|| ||
|| T || −α1
.E ||R0 Rh || ≤ c1 (1 + |h|)
⎛ ⎞
.
2
cov ei,0 2
, ei,h ≤ c2 (1 + |h|)−α2 .
Theorem 7.2.4 We assume that .H0 of (7.1.2), Assumptions 3.1.5 and 7.2.2–7.2.9
are satisfied.
7.2 Panel Models with Common Factors 385
wN,T P Σ N
. → a0 , where a0 = lim s2i .
NT N →∞
i=1
wN,T P
. → a0 + qT Rq,
NT
where
⎛ ⎞−1
Σ
N Σ
N
q = lim
. ||λi ||
2
vec(λi λT
i ).
N →∞
i=1 i=1
wN,T P T
.
⎛N ⎞2 → q Rq.
Σ
||λi ||2
i=1
1 D[0,1]
. VN,T (u) −→ Δ(u),
(T wN,T )1/2
where
⎧ −1/2
⎪
⎪ a0 ┌(u), if c∗ = 0,
⎪
⎪
⎪ ⎛ ⎞−1/2 ┌ ⎛ ⎛
⎪
⎪ + T
+ Q BE (u)BT
⎪
⎪ a0 c∗ q Rq ┌(u) c∗ trace E (u)
⎪
⎪ ⎞⎞ ┐
⎪
⎨ u(1 − u)
.Δ(u) = − BE (τ )BT
E (τ ) , if 0 < c∗ < ∞,
⎪
⎪ τ (1 − τ )
⎪
⎪ ⎛ ⎞ −1/2 ⎛ ⎛
⎪
⎪ qT Rq trace Q BE (u)BT
⎪
⎪ E (u)
⎪
⎪ ⎞⎞
⎪
⎪ u(1 − u)
⎩− BE (τ )BT
E (τ ) , if c∗ = ∞
τ (1 − τ )
In Corollary 7.2.1 we normalize .VN,T (u) with the same random sequence and
only the form on the limit distribution depends on .c∗ . This makes it possible to use
resampling to approximate the distribution of functionals of .Δ(u).
The normalizing sequence in Corollary 7.2.1 works well under .H0 but without
modification tends to overestimate under .HA , thereby reducing the power of the
tests. This is due to the fact that centering each series by .ȳi,T tends to overestimate
the variances of the idiosyncratic errors under .HA . We already discussed this issue
in Sect. 3.1. In order to mitigate this problem, when computing .wN,T we center
each cross-section by taking into account a potential change point in the mean. To
do this, in each cross section we estimate a potential change point using a standard
CUSUM estimator described in Sect. 2.1. We then calculate the normalization term
.w̄N,T after recentering each cross section taking into account this potential change
point. Hence we expect the behaviour of .w̄N,T to be similar under .H0 and .HA . We
assume that (7.1.16) holds. If .0 ≤ c∗ < ∞, then only the cross-sections dominate
or they have the same influence as the common factors. If .0 ≤ c∗ < ∞ and (7.1.16)
is satisfied, then
1 P
. sup |V̄N,T | → ∞. (7.2.6)
(T w̄N,T )1/2 0≤u≤1
If .c∗ = ∞ and
Σ
N
T δi2
i=1
. → ∞,
Σ
N
||λi ||
2
i=1
which implies
⎛ ⎞2 ⎛ ⎞
⎣T u⎦ ⎣T u⎦
2
.Si (u) = si (u) − si (1) + 2 si (u) − si (1)
T T
⎛ ⎞
⎣T
Σ u⎦ ΣT
⎣T u⎦
× ⎝ λT i ft − λT
i ft ⎠
T
t=1 t=1
⎛ ⎞2
⎣T
Σ ⎣T u⎦ T Σ ⎠
u⎦ T
+ ⎝λT
i ft − λi ft .
T
t=1 t=1
7.2 Panel Models with Common Factors 387
Let
Σ
N
ZN,T (u) =
. ξi (u)
i=1
with
⎛ ⎞2
⎣T u⎦ ⎣T u⎦(T − ⎣T u⎦)
ξi (u) = si (u) −
. si (1) − σi2 .
T T
1 D[0,1]
. ZN,T (u) −→ ┌ 0 (u),
N 1/2 T
respectively.
Proof Using Assumptions 7.2.2–7.2.4 we get that the processes .ξi (u), i ∈
{1, ..., N } are independent and .Eξi (u) = 0. We start with the proof of tightness.
Applying Rosenthal’s inequality (see Petrov 1995, p. 59) we conclude
|N |κ/2 ⎧N
|Σ | Σ
| |
.E | (ξi (u) − ξi (v))| ≤ c1 E|ξi (u) − ξi (v)|κ/2
| |
i=1 i=1
⎛ ⎞κ/4 ⎫
Σ
N ⎬
+ E(ξi (u) − ξi (v))2 . (7.2.7)
⎭
i=1
For .0 ≤ v ≤ u ≤ 1 we write
⎛ ⎞
⎣T u⎦(T − ⎣T u⎦) ⎣T v⎦(T − ⎣T u⎦)
ξi (u) − ξi (v) = si2 (u) − si2 (v) − σi2
. −
T T
⎛ ⎞ ⎛⎛ ⎞2 ⎛ ⎞ ⎞
⎣T u⎦ ⎣T v⎦ ⎣T u⎦ ⎣T v⎦ 2 2
−2 − si (1) + − si (1).
T T T T
388 7 High-Dimensional and Panel Data
| |κ/2 ⎡ ┐
| |
E |si2 (u) − si2 (v)| ≤ E |si (u) − si (v)|κ/2 (|si (u)| + |si (v)|)κ/2
.
⎛ | |κ ⎞1/2
| ⎣T
Σ u⎦ |
|
κ/2 ⎝ |
|
≤2 E| ei,t || ⎠
|t=⎣T v⎦+1 |
⎛ | | | | ⎞
|⎣T u⎦ |κ |⎣T v⎦ |κ 1/2
|Σ | |Σ |
× ⎝E || ei,t || + E || ei,t || ⎠
| t=1 | | t=1 |
|⎡⎛ ┐ |κ/2
| ⎣T u⎦ ⎞2 ⎛ ⎣T v⎦ ⎞2 |
| |
.E | − si (1)| ≤ c5 T κ/2 (u − v)κ/4 ,
2
| T T |
and therefore
Tightness of the process .ZN,T in .D[0, 1] now follows from Billingsley (1968) (p.
95).
Next we show the convergence of the finite dimensional distributions. Let .M ≥ 1
be an integer, .0 ≤ u1 < . . . < uM ≤ uM and .α1 , α2 , . . . , αM be constants. We
define
Σ
M
ηi =
. αl ξi (ul ).
l=1
7.2 Panel Models with Common Factors 389
where
⎧ | ⎫
|1 ⎛ ⎞||
Ji,T
. = 1 sup || ξi (u) − σi2 Bi,T
2
(u) − u(1 − u) || > δ .
0≤u≤1 T
and
⎡ ┐
| ⎛ ⎞|κ̄
| |
E
. sup |σi2 Bi,T
2
(u) − u(1 − u) | Ji,T (7.2.13)
0≤u≤1
⎛ ⎞2κ̄/κ
| ⎛ ⎞|κ/2 ( )(κ−2κ̄)/κ
| |
≤ E sup |σi2 Bi,T
2
(u) − u(1 − u) | E{Ji,T }
0≤u≤1
⎛ ⎞2κ̄/κ
| ⎛ ⎞|κ/2
| |
≤δ (κ−2κ̄)/κ
E sup |σi2 Bi,T
2
(u) − u(1 − u) |
0≤u≤1
⎛ ⎞2κ̄/κ
|⎛ ⎞|κ/2
| 2 |
≤ δ (κ−2κ̄)/κ σi2κ̄ E sup | Bi,T (u) − u(1 − u) | .
0≤u≤1
Using the distribution of the Brownian bridge one can easily verify that
|⎛ ⎞|κ/2
| 2 |
E sup | Bi,T
. (u) − u(1 − u) | ≤ c10 . (7.2.14)
0≤u≤1
We note
| | ⎛ ⎞2
|1 |
| | −1/2
| T ξi (u)| ≤ σi + 4 sup T
2
. sup si (u) . (7.2.15)
0≤u≤1 0≤u≤1
| ⎛ ⎞
|1 ⎛ ⎞|| 2
. lim max E sup || ξi (u) − σi Bi,T (u) − u(1 − u) ||
2 2
min(N,T )→∞ 1≤i≤N 0≤u≤1 T
=0 (7.2.17)
7.2 Panel Models with Common Factors 391
and
| ⎛ ⎞
|1 ⎛ ⎞|| κ̄
. lim |
max E sup | ξi (u) − σi Bi,T (u) − u(1 − u) |
2 2 | (7.2.18)
min(N,T )→∞ 1≤i≤N 0≤u≤1 T
= 0.
Σ
N
1 Σ
N
. Eη 2
= E η̄i2 + o(N) (7.2.19)
T2 i
i=1 i=1
and
Σ
N
1 Σ
N
. E|ηi | =
κ̄
E|η̄i |κ̄ + o(N), (7.2.20)
T2
i=1 i=1
where
Σ
N
η̄i = σi2
. αl (B 2 (ul ) − ul (1 − ul ))
l=1
Hence Lyapunov’s theorem gives that the finite dimensional distributions of .ZN,T
converge to a multivariate normal distribution. Our arguments also show that
.E┌ (u) = 0 and
0
⎡ ┐ Σ
N
. E┌ 0 (u)┌ 0 (v) = E (B 2 (u) − u(1 − u))(B 2 (v) − v(1 − v)) lim σi4 .
N →∞
i=1
⎛N ⎞−1 ⎧ ⎛ ⎞⎫2
Σ N ⎨
Σ ⎣T
Σ u⎦ Σ
T ⎬
1 ⎝ ⎣T u⎦ ⎠
. ||λi ||2 λT f t − f t
T ⎩ i T ⎭
i=1 i=1 t−1 t−1
D[0,1]
⎛ ⎞
−→ trace QBE (u)BT
E (u) ,
The result now follows from Assumption 7.2.6 and the definition of .Q. ⨆
⨅
Lemma 7.2.3 If .H0 of (7.1.2), and Assumptions 7.2.2–7.2.4 hold, then
| ⎛ ⎞| ⎛ ⎛ ⎞1/2 ⎞
|N ⎛ ⎞ ⎣T |
|Σ ⎣T u⎦ Σ u⎦
| ΣN
. sup | si (u) − si (1) ⎝λT ft⎠|| = OP ⎝T ||λi ||2 + T⎠.
| i
0≤u≤1 | i=1 T
t=1 | i=1
Proof Let
Σ
N ⎛ ⎞
⎣T u⎦
.ẐN,T (u) = λi si (u) − si (1) , 0 ≤ u ≤ 1.
T
i=1
Σ
N
2
.aN = ||λi ||2 + 1.
i=1
Let .λi,k be the kth coordinate of .λi . Following the proof of Lemma 7.2.1 one can
prove that
|N ┌ ⎛ ⎞ ⎛ ⎞┐||κ
|Σ ⎣T u⎦ ⎣T v⎦
| |
.E | λi,k si (u) − si (1) − si (v) − si (1) |
| T T |
i=1
⎧ ⎛ ⎞κ/2 ⎫
⎨ 1 Σ N
1 Σ
N ⎬
≤ c3 |λ i,k | κ
+ λ 2
|u − v|κ/2
⎩ aN
κ 2
aN
i,k ⎭
i=1 i=1
≤ c4 |u − v|κ/2 ,
1 Σ
N
.
κ |λi,k |κ ≤ c5 .
aN
i=1
|N ⎛ ⎞||
|Σ
| 2 ⎣T u⎦(T − ⎣T u⎦) |
. sup | Si (u) − σi
2
− ZN,T (u) | = oP (N 1/2 T ).
0≤u≤1 | T |
i=1
394 7 High-Dimensional and Panel Data
So using again Lemma 7.2.1, the proof can be completed via the continuous
mapping theorem. ⨆
⨅
Proof of Theorem 7.2.2 It follows from Lemmas 7.2.1–7.2.1 and the assumption
c∗ = ∞ that
.
| ⎧⎛ Σ ⎣T
Σ ⎞2
⎣T u⎦ Σ T Σ
u⎦
| N N T
|
. sup VN,T (u) −
T
λi ft − λi ft
| T
0≤u≤1 i=1 t=1 i=1 t=1
⎛Σ ⎣T
Σ τ⎦ ⎞2 ⎫|
⎣T τ ⎦ Σ T Σ |
N N T
⎣T u⎦(T − ⎣T u⎦) |
− λT
i ft − λi ft |
⎣T τ ⎦(T − ⎣T τ ⎦) T
i=1 t=1 i=1 t=1
⎛ ⎞
Σ
N
= oP T ||λi ||2 ,
i=1
yi,t = xT
.
T
i,t (β i + δ i 1{t > t0 }) + λi ft + ei,t ,
the common factors .ft ; .t ∈ {1, ..., T } .ft ∈ Rp and .λi ∈ Rp are the corresponding
loadings. Under the null hypothesis
H0 : t0 > T ,
. (7.3.2)
7.3 High-Dimensional Linear Regression 395
.yi,t = xT T
i,t β i + λi ft + ei,t , i ∈ {1, . . . , N }, t ∈ {1, . . . , T }. (7.3.3)
Following the methods presented in Sects. 4.1.1 and 4.1.3, we define the CUSUM
processes of the residuals from each cross-section
⎣T
Σ u⎦
Si,T (u) =
. Êi,t ,
t=1
.Êi,t = yi,t − xT
i,t β̂ i,T , i ∈ {1, . . . , N }, t ∈ {1, . . . , T },
with .β̂ i,T denoting the least squares estimator for .β i of the i’th cross-section. We
suggest using functionals of the .l2 -aggregated cross-sectional CUSUM processes
⎛ 2 ⎞
1 Σ
N
Si,T (u) ⎣T u⎦(T − ⎣T u⎦)
V̄N,T (u) =
. − , (7.3.4)
N 1/2 T σi2 T2
i=1
N ⎛
Σ ⎞
⎣T u⎦(T − ⎣T u⎦) 2
VN,T (u) =
.
2
Si,T (u) − Si,T (τ )
⎣T τ ⎦(T − ⎣T τ ⎦)
i=1
with some .0 < τ < 1. The asymptotic properties of .V̄N,T and .VN,T can be derived
along the lines of the proofs in Sects. 4.1.1 and 4.1.3 assuming that .min(N, T ) →
∞ under various conditions on the strength of the loadings .λi , i ∈ {1, . . . , N }, and
the relative divergence rates of T and N .
In some applications T , the length of the observed time series is much smaller
than N, the number of cross-sections. In these cases a more realistic asymptotic
framework is to consider T to be fixed while N tends to infinity. Following Antoch
et al. (2019), we use the sum of the squared residuals
Σ
N Σ
t
.ṼN (t) = 2
Êi,s , t ∈ {1, . . . , T }.
i=1 s=1
396 7 High-Dimensional and Panel Data
In contrast to the previous chapters, we assume here that the covariates .xi,t ’s are
deterministic. We now list some assumptions under which .ṼN has a Gaussian limit.
Assumption 7.3.1 .Eft = 0 and .E||ft ||2 < ∞, .t ∈ {1, . . . , T }.
Assumption 7.3.2
(i) The innovations .E i = (Ei,1 , Ei,2 , . . . , Ei,T )T , .i ∈ {1, . . . , N } are independent
(ii) .EEi,t = 0, EEi,t Ei,s = 0 and .c1 ≤ σi2 = EEi,t 2 ≤ c for all .i ∈ {1, . . . , N }, 1 ≤
2
s /= t ≤ T with some .0 < c1 < c2 < ∞.
(iii) There is a .κ > 4 such that
1 Σ
N
. lim sup |Ei,t |κ < ∞, for all 1 ≤ t ≤ T ,
N→∞ N
i=1
Assumption 7.3.3 .{ft , t ∈ {1, ..., T }} and .{E i , i ∈ {1, ..., N}} are independent,
Assumption 7.3.4 There is .c3 > 0 such that .||xi,t || ≤ c3 for all .i ∈ {1, . . . , N}, t ∈
{1, . . . , T },
and
Assumption 7.3.5
(i) there are .t1 and .c4 such that
⎛ ⎞−1 ||⎛ ⎞−1 ||
|| t1 ||
Σ
t1
|| Σ ||
xi,s xT exists and || x xT || ≤ c4
. i,s || i,s i,s ||
s=1 || s=1 ||
1 Σ
N
. lim ||λi ||2 = 0.
N→∞ N 1/2
i=1
7.3 High-Dimensional Linear Regression 397
Σ
t
Zi,t =
. xi,s xT
i,s , t ∈ {1, ..., T }, i ∈ {1, ..., N },
s=1
and
Σ
t
Si,t =
. xi,s Es , t ∈ {1, ..., T }, i ∈ {1, ..., N }.
s=1
Under Assumption 7.3.6, last assumption we make implies that the covariance
function of the standardised and normalised .ṼN (t) exists:
Σ
N
AN (t) =
.
2
ai,t
i=1
with
t ⎛
Σ ⎞ ⎛ ⎞
−1 −1
.
2
ai,t = σi2 1 − xT
i,s Zi,T xi,s = σi t − trace(Zi,t Zi,T ) .
2
s=1
Now we can state the limit distribution of .ṼN (t) when the common factors are
negligible.
Theorem 7.3.1 If .H0 and Assumptions 7.3.1–7.3.7 hold, then
{ ⎛ ⎞ } D { }
. N −1/2 ṼN (t) − AN (t) , 1 ≤ t ≤ T → ξ (1) (t), 1 ≤ t ≤ T ,
{ }
where . ξ (1) (t), 1 ≤ t ≤ T has a multivariate Gaussian distribution, with
.Eξ
(1) (t) = 0 and .Eξ (1) (t)ξ (1) (t ' ) = ┌(t, t ' ).
In order to model the situation in which the factor loadings are not negligible, we
replace Assumption 7.3.6 with the following:
Assumption 7.3.8
1 Σ
N
. ||λi ||2 = O(1) with some rN /N 1/2 → ∞.
rN
i=1
N ⎛ ⎞⎛ ⎞
1 Σ 1 T −1 1 T −1
.Q(s, v, z) = lim λi − λi xi,s Zi,T xi,z λi − λi xi,v Zi,T xi,z
N →∞ rN T T
i=1
where
Σ
t
ξ
.
(2)
(t) = fT
s Q(s, v, t)fv .
s,v=1
Similarly to Sects. 7.1 and 7.2, the limiting distribution in Theorem 7.3.2 is
completely determined by the common factors and their loadings.
Due to the complex definitions of the covariances in Theorems 7.3.1 and 7.3.2,
the computation of the distributions of functionals of .ξ (1) (t) and .ξ (2) (t) is difficult
to do without resorting to resampling methods. Antoch et al. (2019) suggests the
wild bootstrap for this purpose. They also discuss the behaviour of .ṼN (t) under the
alternative of a change point.
Theorem 7.3.1 is a consequence of the following two lemmas. We use the
notation
−1 −1
Êi,t = wi,t + ri,t , where wi,t = Ei,t − xT
.
T
i,t Zi,T Si,T and ri,t = λi ft − xi,t Zi,T Ji,T ,
7.3 High-Dimensional Linear Regression 399
with
Σ
t
.Ji,t = xi,v λT
i fv .
v=1
Thus we have
2
Êi,t
. = wi,t
2
+ 2wi,t ri,t + ri,t
2
.
We can assume that the matrix .┌(t, t ' ), 1 ≤ t, t ' ≤ T is nonsingular. If .┌(t, t ' ), 1 ≤
t, t ' ≤ T is singular, we may instead consider a subset of the coordinates of
.{ṼN (t), 1 ≤ t ≤ T } that (after centralization and norming) have a nonsingular
The lemma now follows from applications of Lyapunov’s theorem (see Petrov 1995,
p. 122) and the Cramér–Wold lemma (see Billingsley 1968). ⨆
⨅
Lemma 7.3.2 If .H0 and Assumptions 7.3.1–7.3.7 hold, then
Σ
N ⎛ ⎞
.
2
ri,t = oP N 1/2 ,
i=1
and
Σ
N ⎛ ⎞
. ri,t wi,t = oP N 1/2 .
i=1
Proof We note
−1
2
ri,t
. ≤ 2||λi ||2 ||ft ||2 + 2(xT 2
i,t Zi,T Ji,T ) .
Σ
N ⎛ ⎞
. ||λi ||2 ||ft ||2 = oP N 1/2 .
i=1
which yields
Σ
N Σ
N ⎛ ⎞
−1
. (xT
i,t Zi,T Ji,T )
2
= O(1) max ||fs || 2
||λi ||2 = oP N 1/2 .
1≤s≤T
i=1 i=1
This completes the proof of the first part of the lemma. Towards establishing the
second part, we write
T −1 T −1 T −1 T −1
wi,t ri,t = Ei,t λT
.
T
i ft − Ei,t xi,t Zi,T Ji,T − xi,t Zi,T Si,T λi ft + xi,t Zi,T Si,T xi,t Zi,T Ji,T .
−1
on account of Assumption 7.3.6. Similarly, .EEi,t xT
i,t Zi,T Ji,T = 0 and
⎛N ⎞
Σ ⎛ ⎞
−1
.var Ei,t xT
i,t Zi,T Ji,T = O N 1/2 . (7.3.7)
i=1
Repeating the arguments used in the derivations of (7.3.6) and (7.3.7) one can verify
|N |
|Σ | ⎛ ⎞
| T −1 T |
.| xi,t Zi,T Si,T λi ft | = O N 1/4
| |
i=1
and
|N |
|Σ | ⎛ ⎞
| T −1 T −1 |
.| xi,t Zi,T Si,T xi,t Zi,T Ji,T | = O N 1/4 ,
| |
i=1
Σ
T
−1
2
.ri,t = fT T
t λi λi f t −2 fv λi xT T
i,v Zi,T xi,t λi ft
v=1
Σ
T
−1 T −1
+ fs λi xT T
i,s Zi,T xi,t xi,t Zi,T xi,v λi fv .
s,v=1
and
|N |
|Σ |
| |
.| wi,t ri,t | = oP (rN ),
| |
i=1
We have seen in Sects. 5.3 and 5.4 that estimators for the parameters of RCA time
series models satisfy the central limit theorem in both stationary and explosive
settings. As such it is reasonably straightforward to construct change point detection
procedures for high-dimensional and panel data models in which the cross-sectional
series follow RCA specifications that allow for stationary and explosive cross-
sections, and changes between stationary and explosive regimes. We consider a
model where the cross-sections are RCA(1) sequences cross-correlated by the
presence of common factors:
where .β̂i,T is the weighted least square estimator in the i’th cross-section
⎛ T ⎞⎛ T ⎞−1
Σ yi,t−1 yi,t Σ yi,t−1
2
β̂i,T
. = .
t=2
1 + yi,t−1
2
t=2
1 + yi,t−1
2
Let
⎛ ⎞
⎣T u⎦
1 ⎝Σ ⎣T u⎦ Σ
T
Êi,t Êi,t ⎠,
Si,T (u) =
. −
T 1/2
t=2
(1 + yi,t−1
2 )1/2 T
t=2
(1 + y 2
i,t−1 ) 1/2
0 ≤ u ≤ 1, i ∈ {1, ..., N}
denote the CUSUM process of the residuals of the i’th cross-section. As in Sect. 7.1
we define the sum of the squares of the CUSUM processes
⎛ 2 ⎞
1 Σ
N
Si,T (u) ⎣T u⎦(T − ⎣T u⎦)
. V̄N,T (u) = − ,
N 1/2 2
σ̂i,T T2
i=1
7.4 Changes in the Parameters of RCA Panel Data Models 403
2 are estimators for the variances of .Ê ’s, and as in Sect. 7.2 we define
where .σ̂i,T i,t
N ⎛
Σ ⎞
1 ⎣T u⎦(T − ⎣T u⎦) 2
.VN,T (u) = 2
Si,T (u) − Si,T (τ ) ,
N 1/2 ⎣T τ ⎦(T − ⎣T τ ⎦)
i=1
where .0 < τ < 1. We assume that the innovations for each cross-section are
independent and satisfy fourth order moment conditions as in Assumption 5.3.1
of Chap. 5.
Assumption 7.4.1
(i) .{Ei,,t,1 , t ∈ Z} and .{Ei,t,2 , t ∈ Z}, .i ∈ {1, ..., N}, are independent sequences,
(ii) .{Ei,t,1 , t ∈ Z} are independent and identically distributed random variables
with .EEi,t,1 = 0, .c1 ≤ EEi,t,1 2 = σi,12 ≤ c and .E|E |4 ≤ c for all .i ∈
2 i,1 3
{1, ..., N } with some .0 < c1 , c2 , c3 < ∞,
(iii) .{Ei,t,2 , t ∈ Z} are independent and identically distributed random variables
with .EEi,2 = 0, .c4 ≤ EEi,t,2 2 = σi,2 2 ≤ c and .E|E
5 i,t,2 | ≤ c6 for all .i ∈
4
Assumption 7.4.2 .{Ei,t,1 , Ei,t,2 , t ∈ Z, i ∈ {1, ..., N}} and .{ft , t ∈ {1, ..., T }} are
independent
We assume that the loadings and common factors satisfy the following:
Assumption 7.4.3
(i) .Eft = 0, .Eft fT
t = I and .E||ft || ≤ c8 , with some .ν > 4 and .c8 < ∞,
ν
(ii) .||λi || ≤ c9 ,
(iii)
1 Σ
N
. lim ||λi || = 0.
N →∞ N 1/2
i=1
N
. → 0.
T 1/2
We consider two subsets of the for the cross-sectional series: stationary .A and
explosive .B cross-sections. Whether a cross-section is in either subset is determined
by .E log |βi + Ei,0,1 |:
404 7 High-Dimensional and Panel Data
Assumption 7.4.5
(i) there is .c8 < 0 such that if .i ∈ A, then .E log |βi + Ei,0,1 | ≤ c8 ,
(ii) there is .c9 > 0 such that if .i ∈ B, then .E log |βi + Ei,0,1 | ≥ c9 , and .Ei,0,1 has a
bounded density.
2 , i ∈ {1, ..., N} in the definition of .V̄
We use the estimators .σ̂i,T N,T (u). We show
on the proof of Theorem 7.4.1 that .Si,T (u) is asymptotically the sum of uncorrelated
random variables. Hence we suggest the normalization with
⎛ ⎞2
1 Σ 1 Σ Êi,s,1 yi,s−1
T T
Êi,t,1 yi,t−1
2
.σ̂i,T = − ,
T (1 + yi,t−1 )1/2 T (1 + yi,s−1 )1/2
t=2 s=2
where
We wish to point out that the normalization does not require foreknowledge of the
stationarity properties of the observations in the cross-sections.
Theorem 7.4.1 If .H0 of (7.3.2) and Assumptions 7.4.1–7.4.4 hold, then
D[0,1]
V̄N,T (u) −→ ┌(u),
.
∞
Σ | |
l
wi,t =
. Ei,t−l,2 (βi + Ei,t−j +1 ), t ∈ Z. (7.4.1)
l=0 j =1
D[0,1]
VN,T (u) −→ ┌ ∗ (u),
.
Σ
N
σ̄ 4 = lim
. σi4 .
N →∞
i=1
where
k
Zi,T ,l (k/T ) = Ri,T ,l (k/T ) −
. Ri,T ,l (1), 1 ≤ l ≤ 4, 1 ≤ k ≤ T ,
T
Σ
k
yi,t−1
.Ri,T ,1 (k/T ) = ,
t=2
(1 + yi,t−1
2 )1/2
Σ
k
Ei,t,1 yi,t−1
.Ri,T ,2 (k/T ) = ,
t=2
(1 + yi,t−1
2 )1/2
Σ
k
λT
i ft
Ri,T ,3 (k/T ) =
.
t=2
(1 + yi,t−1
2 )1/2
and
Σ
k
Ei,t,2
. Ri,T ,4 (k/T ) = .
t=2
(1 + yi,t−1
2 )1/2
406 7 High-Dimensional and Panel Data
It follows from
Σ
t | |
t | |
t
yi,t =
. (λT
i fl + Ei,l,2 ) (βj + Ei,j,1 ) + (βi + Ei,j,1 )
l=1 j =l+1 j =1
Σ
t | |
t Σ
t | |
t
= λT
i fl (βj + Ei,j,1 ) + Ei,l,2 (βj + Ei,j,1 )
l=1 j =l+1 l=1 j =l+1
| |
t
+ (βi + Ei,j,1 ).
j =1
= wi,t + λT
i wi,t
with some .0 < c1 , c3 < ∞, 0 < c2 < 1. Due to (7.4.5), we can replace .yi,t with
ȳi,t , the stationary sequence in (7.4.5), so we work with
.
where
k
Z̄i,T ,l (k/T ) = R̄i,T ,l (k/T ) −
. R̄i,T ,l (1), 1 ≤ l ≤ 4, 1 ≤ k ≤ T ,
T
Σ
k
ȳi,t−1
R̄i,T ,1 (k/T ) =
. ,
t=2
(1 + ȳi,t−1
2 )1/2
Σ
k
Ei,t,1 ȳi,t−1
R̄i,T ,2 (k/T ) =
. ,
t=2
(1 + ȳi,t−1
2 )1/2
Σ
k
λT
i ft
R̄i,T ,3 (k/T ) =
.
t=2
(1 + ȳi,t−1
2 )1/2
7.4 Changes in the Parameters of RCA Panel Data Models 407
and
Σ
k
Ei,t,2
. R̄i,T ,4 (k/T ) = .
t=2
(1 + ȳi,t−1
2 )1/2
Next we note
1 1 Σ
. max (βi − β̂i,T )2 Z̄i,T
2
,1 (k/T )
N 1/2 T 2≤k≤N
i∈A
Σ ⎛ ⎞2
1 1 || |
≤ (βi − β̂i,T )2 max Z̄i,T ,1 (k/T )| .
N 1/2 2≤k≤N T 1/2
i∈A
and
⎛ ⎞4
1 || |
.E max Z̄i,T ,1 (k/T )| ≤ c2 , (7.4.6)
2≤k≤N T 1/2
with some constants .c1 and .c2 . Thus we get by the Cauchy–Schwarz inequality
⎧ ⎫
1 1 Σ
E
.
1/2
max (βi − β̂i,T )2 Z̄i,T
2
,1 (k/T ) (7.4.7)
N T 2≤k≤T
i∈A
⎡ ⎛ ⎞4 ┐1/2
1 Σ⎡ ┐1/2 1 || |
≤ E(βi − β̂i,T ) 4
E max Z̄i,T ,1 (k/T )|
N 1/2 2≤k≤T T 1/2
i∈A
1 Σ 1/2 1/2 1
≤ c1 c2
N 1/2 T
i∈A
⎛ ⎞
N 1/2
=O
T
= o(1).
408 7 High-Dimensional and Panel Data
1 Σ 1
. max Z̄ 2 (k/N)
N 1/2 1≤k≤T T 2 i,T ,3
i∈A
⎛ || k
1 Σ 1 ||||Σ ft
≤ ||λi || max
2
||
N 1/2 1≤k≤T T 1/2 || (1 + ȳi,t−1
2 )1/2
i∈A t=2
||⎞2
k Σ
T ||
ft ||
− || .
T (1 + ȳ 2 ) 1/2 ||
t=2 i,t−1
Σ
k
Ei,t,1 wi,t−1
R̄R,T ,5 (k/T ) =
. .
i=2
(1 + wi,t−1
2 )1/2
Using the decomposability of .ȳi,t , wi,t and .wi,t one can verify
1 || 2 |
|
E max
. |R̄R,T ,2 (k/T ) − R̄R,T
2
,5 (k/T )| ≤ c4 ||λi || .
2
(7.4.9)
1≤k≤N T
1 || 2 |
|
E max
. |R̄R,T ,4 (k/T ) − R̄R,T
2
,6 (k/T ) | ≤ c5 ||λi ||2 , (7.4.10)
1≤k≤N T
where
Σ
k
Ei,t,1
R̄R,T ,6 (k/T ) =
. .
i=2
(1 + wi,t−1
2 )1/2
7.4 Changes in the Parameters of RCA Panel Data Models 409
where
and
⎣T
Σ u⎦
1 Ei,t,1 wi,t−1 + Ei,t,2
V̄i,T ,1 (u) =
. .
T 1/2
t=2
(1 + wi,t−1
2 )1/2
Using from Sect. 5.3 that .|yi,t | converges in probability to .∞ at an exponential rate,
if .i ∈ B one can prove
| |
|Σ⎛ ⎞ Σ⎛ ⎞|
1 | |
. sup | S̄i,T (u) − E S̄i,T (u) −
2 2
Vi,T ,2 (u) − EVi,T ,2 (u) | = oP (1),
2 2
0≤u≤1 N 1/2 | |
i∈B i∈B
where
and
⎣T
Σ u⎦
1
V̄i,T ,2 (u) =
. Ei,t,1 .
T 1/2
t=2
D[0,1]
QN,T (u) −→ ┌(u),
. (7.4.11)
D[0,1]
V̄N,T (u) −→ ┌(u).
.
⨆
⨅
The proof of Theorem 7.4.2 similarly combines the results of Sect. 5.3 with the
argument used to establish Theorem 7.1.1, and so we omit the details.
11
1.8
1
1
10
1.7
2
1.6
9
3
1.5
8
4
1.4
7
1.3
Fig. 7.1 The graphs of the exchange rates, 1 = UK, 2 = SI, 3 = CA, 4 = SW (left panel); 1 = DN,
2 = NO, 3 = SD (right panel) with respect to the US dollar
1.2
1.1
1
1
1.1
1.0
1.0
2 3
3
0.9
0.9
0.8
4
0.8
0.7
Fig. 7.2 The graphs of the relative exchange rates, 1 = UK, 2 = SI, 3 = CA, 4 = SW (left panel);
1 = DN, 2 = NO, 3 = SD (right panel) with respect to the US dollar
It is clear from Fig. 7.1 that the exchange rates are between 1.3 and 11, so if the
same proportional change occurs in a cross-section with high values, this change
will be relatively large when compared to the other cross-sections. As such a single
cross-section can disproportionately contribute to the value of the test statistic. To
overcome this problem they rescaled the observations in each cross-section with
the first observation, i.e. with the exchange rate on 03/13/2001. Figure 7.2 contains
the graphs of the relative changes in exchange rates with respect to the US dollar
for the same countries as in Fig. 7.1. Horváth et al. (2017a) repeat the analysis
for the relative changes (rescaled) in the exchange rates with respect to the US
dollar, resulting in rejection and the estimated time of change 303 corresponding
to 05/24/2002. They also construct confidence interval around the estimated time
of change which in includes 297, the estimated time of change in the original (not
scaled) data.
Example 7.5.2 (Change Points in US Macroeconomic Data) We consider detect-
ing change points in the mean of high-dimensional macroeconomic data on the
United States (US). We focus on the FRED–MD data set, which comprises monthly
resolution data on N = 128 macroeconomic variables available from the United
States federal reserve economic database (FRED). The analysis of high-dimensional
macroeconomic panel data has drawn a great deal of attention in the last decade.
One of the most influential papers on this area of research is Stock and Watson
(2012), who used up to 200 time series of macroeconomic variables to investigate
the dynamics of the great recession during 2007–2009. Twenty randomly selected
series from this panel are illustrated in Fig. 7.3.
412 7 High-Dimensional and Panel Data
Fig. 7.3 20 randomly selected series from the FRED-MD data set with vertical lines representing
estimated change points significant at level 0.05 after applying a binary segmentation procedure
(1)
based on the statistic/test VN,T
In total, the 128 time series are taken from the period from June-1999 to June-
2019, with information related to nine areas, including output and income, labor
markets, consumption and orders, orders and inventories, money and credit, interest
rate and exchange rates, prices, and the stock market. McCracken and Ng (2016)
provided detailed descriptions of this data set. They also suggested transformations
of each series towards stationarity so the data are suitable for a factor analysis,
which we applied. They found that the transformed data have a factor structure
similar to the model considered in Stock and Watson (2012). Analysis of this panel
data using the criteria proposed by Bai and Ng (2002) to determine the number of
common factors suggests that the cross-sectional dependence is well explained by
eight common factors.
We take as the goal of the analysis to evaluate for structural breaks in the means
of this high-dimensional time series over the observation period. Change points in
the means of the cross-sections may represent changing phases of the US economy.
To detect such change points, we computed the test statistics
(1) (2)
VN,T = sup |V̄N,T (u)| and VN,T = sup |VN,T (u)|.
.
u∈[0,1] u∈[0,1]
(1) (2)
To approximate the null distributions of VN,T and VN,T , we applied Theorem 7.1.1
and Theorems 7.2.1–7.2.4, respectively. It is discussed in Horváth et al. (2022)
how the normalizing sequences and distributions are approximated. Notably, in
(2)
approximating the null distribution of VN,T , we make use of a factor model-based
bootstrap similar to the method proposed in Cho and Fryzlewicz (2015). We used
7.5 Data Examples 413
Table 7.1 Detected changes and the corresponding relevant events in the FRED-MD data with
the corresponding estimated p-values in brackets
Tests 1st change Event 2nd change Event 3rd change Event
(1)
VN,T Aug/03 (0.00) Labor Mar/08 (0.01) Federal Aug/12 (0.04) Unemployment
market bailout rate bottom
recover
(2)
VN,T Jun/06 (0.00) Housing Mar/08 (0.00) Federal Jan/16 (0.00) Growth rate
boom end bailout decrease
binary segmentation to estimate and detect additional changes in the means. Each
test found that the largest initial change in the macroeconomic structure occurred
in March 2008, which corresponds to when the US government bailout began after
the sub-prime mortgage crisis. However, different dates were found in the second
and third step of a subsequent binary segmentation procedure. The test based on
(2)
VN,T detected changes in June 2006 and January 2016, whereas the test based on
(1)
VN,T detected change points in August 2003 and August 2012. See Table 7.1 for a
summary of these findings.
Example 7.5.3 (Short Time Series in the Capital Asset Pricing Model) As
studied in Antoch et al. (2019), we consider an application of the methods presented
in Sect. 7.3 to the capital asset pricing model (CAPM). The (Fama and French, 1993)
three factor model augmented with the Carhart (1997) momentum factor is defined
as
.yi,t = β1,i,t + x2,t β2,i,t + x3,t β3,i,t + x4,t β4,i,t + x5,t β5,i,t + Ei,t ,
t ∈ {1, ..., T }, i ∈ {1, ..., N}, (7.5.1)
where yi,t denotes the excess return on the mutual fund; x2,t is the market risk
premium; x3,t is the value factor, calculated as the return difference between
portfolios with the highest decile of stocks and the lowest decile of stocks in terms
of the ratio of book equity-to-market equity; x4,t is the value factor, calculated as
the return difference between portfolios with the smallest decile of stocks and the
largest decile of stocks in terms of size; x5,t is the momentum factor calculated as
the return difference between portfolios with the highest decile of stocks and lowest
decile of stocks in terms of recent return (i.e. momentum); and Ei,t is the random
error.
The four factors can be downloaded from Ken French’s data library.1 The raw
dataset on mutual funds contains monthly return data of 6190 US mutual funds
from January 1984 to November 2014 were taken from http://finance.yahoo.com.
Using the Yahoo finance classification, they consider nine categories of the US
mutual funds. These are Large Blend, Large Growth, Large Value, Middle Blend,
1 mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.
414 7 High-Dimensional and Panel Data
Middle Growth, Middle Value, Small Blend, Small Growth and Small Value. These
categories are combinations of the mutual fund size and their investment strategies.
We take as the goal of our analysis to evaluate for change points in the parameters
of model (7.5.1) for mutual funds in these categories.
There are many missing values in the mutual fund dataset because different
mutual funds have different start dates and some of them have already been
terminated. Antoch et al. (2019) selects the mutual funds that have no missing
returns for the period of the subprime crisis (January 2006 to February 2010), so
that T = 50. Since T is small, we adopt the fixed T asymptotic framework of
Theorem 7.3.1. In order to test for change points, we use the test statistic
| |
| |
. sup N −1/2 |ṼN (t) − ÂN (t)| , (7.5.2)
t∈(0,1)
where
Σ
N ⎛ ⎞
ÂN (t) =
.
2
âi,t,1 , 2
âi,t,1 = σ̂i2 trace Z−1
i,t − Z−1
i,T ,
i=1
and σ̂i2 is the sample variance of the linear model residuals in each cross-section.
The null critical values for this statistic were estimated as described in Theorem
7.3.1 using a wild bootstrap as described in Antoch et al. (2019). Change points
were estimated using the maximal argument of the test statistic in (7.5.2).
For all but one fund category, the test statistic in (7.5.2) was above the 5%
critical values during the sub-prime crisis period (middle of 2008 to early 2009).
Interestingly, the test cannot detect changes for the category of Small Value mutual
funds at the 5% significance level. However the Small Value mutual fund category
is significant at the 10% significance level. The coefficients for the four factors
changed first for the Large Blend mutual fund category with estimated break point
in August 2008, and the Small Growth mutual fund category has estimated change
point just a month later in September 2008. The coefficients for Large Growth,
Middle Blend, Middle Growth, and Small Blend appeared to change last in March
2009. It is interesting to note that only the Large Blend indicated a change out of all
large type mutual funds. The Large Blend is defined as a balanced mix of growth
and value stocks, and may more closely resemble the market as a whole.
Figure 7.4 shows the estimated break points for the coefficients for the Mutual
Fund categories with the levels of the S&P 500. The Large Blend category indicated
a structural change when the S&P 500 was well above the level 1200. This could
have been used as a potential trading signal. The lowest point of the S&P 500 was
February 2009, below the 800 mark. Here the change point was detected for the
Large Growth, Middle Blend, Middle Growth, and Small Blend.
7.6 Exercises 415
Fig. 7.4 Estimated change points of the mutual fund returns compared to the level of the S&P 500
7.6 Exercises
We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that
⎛N ⎞−1/2
Σ Σ
N
D[0,1]
. σi2 Si (u) −→ B(u),
i=1 i=1
We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that
1 Σ
N
D[0,1]
. Si (u) −→ ┌(u),
N 1/2
i=1
We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that
1 Σ
N
D[0,1]
. Si∗ (u) −→ B(u),
N 1/2
i=1
1 (1 − ρi2 )1/2
Si (u) =
. ⎣T u⎦(ρ̂i (u) − ρ̂i (1)).
T 1/2 σi
7.6 Exercises 417
We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that
1 Σ
N
D[0,1]
. Si (u) −→ B(u),
N 1/2
i=1
We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The common factors {ft , t ∈ Z} are independent and identically
distributed with Eft = 0 and Eft2 = σ 2 . The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N
and {ft , t ∈ Z} are independent. Show that there is rN such that
1 Σ ∗
N
D[0,1]
. Si (u) −→ B(u),
rN
i=1
1
Si (u) =
. ⎣T u⎦(ρ̂i (u) − ρ̂i (1))
T 1/2
418 7 High-Dimensional and Panel Data
We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N and {ft , t ∈ Z} are
independent. Show that there is rN
1 Σ
N
D[0,1]
. Si (u) −→ B(u),
rN
i=1
N, 1 ≤ j ≤ d. Let
⎛ ⎞ ⎛ ⎞T
⎣T u⎦ ⎣T
1 ⎝Σ Σ u⎦
.Si (u) = (Xi,t − μi )⎠ E −1 i
⎝ (Xi,t − μi )⎠ .
T
t=1 t=1
Show that
1 Σ
N
. (Si (u) − du) converges in D[0, 1]
N 1/2
i=1
N, 1 ≤ j ≤ d. Let
⎛ ⎞ ⎛ ⎞T
⎣T u⎦ ⎣T
1 ⎝Σ Σ u⎦
.Si (u) = (Xi,t − X̄i )⎠ E −1
i
⎝ (Xi,t − X̄i )⎠ ,
T
t=1 t=1
1 Σ
T
Xi =
. Xi,t .
T
t=1
7.7 Bibliographic Notes and Remarks 419
Show that
1 Σ
N Σ
N
. (Si (u) − du(1 − u)) converges in D[0, 1]
N 1/2
i=1 i=1
1 ( )
Si (u) =
. ⎣T u⎦ α̂i (u) − α̂i (1) .
T 1/2
Show that
N ⎛
Σ ⎞
1
. Si2 (u) − ESi2 (u) converges in D[δ, 1 − δ] for all 0 < δ < 1/2,
N 1/2
i=1
if N → ∞, T → ∞ and N/T 2 → 0.
Due to increasing access to extremely large data sets, analysis of high dimensional
observations has received considerable attention. Jirák (2015) studies d dependent
change point tests, each based on a CUSUM-statistic. He provides an asymptotic
theory when the maximum over all test statistics as both the sample size and d tend
to infinity. His methods are based on a consistent bootstrap and an appropriate limit
distribution. This allows for the construction of simultaneous confidence bands for
dependent change point tests, and also to determine the location of the change both
in time and coordinates in high-dimensional time series.
The wild binary segmentation (WBS) of Fryzlewicz (2014) provides consistent
estimation of the number and locations of multiple change-points in scalar data.
Due to its random localisation mechanism, WBS works even for short spacings
between the change-points and/or small jump magnitudes, unlike standard binary
segmentation. In the high-dimensional setting Cho and Fryzlewicz (2015) propose
420 7 High-Dimensional and Panel Data
Functional data analysis concerns methods to analyse data that are naturally viewed
as taking values in infinite dimensional function spaces. Examples include data that
can be imagined as curves or surfaces. A general object of this type is termed a
functional data object. When functional data are observed sequentially over time,
they are referred to as functional time series. Usually the data that are available in
this setting are discrete measurements of such objects from which the full functional
data objects must be reconstructed or estimated using curve fitting techniques. In
some cases the functional data objects are fully observable on the domain on which
they are defined, for instance when they represent probability densities or other
summary functions. Entry points to this area include the text books Ramsey and
Silverman (2002), Horváth and Kokoszka (2012), Kokoszka and Reimherr (2017),
and Hsing and Eubank (2015), as well as the monograph Bosq (2000). These
cover in detail how to reconstruct functional data objects starting from discrete
measurements using curve fitting.
Here we assume that the functional data objects under consideration are fully
observed. As mentioned above this often means that initial discrete data have been
preprocessed using a curve fitting technique to form functional data objects, and one
should be wary of the effect of this on subsequent analysis. Generally if the discrete
measurements of the functional data are dense and the measurements are made
with relatively small error, subsequent analyses will not be sensitive to this step.
In addition, to simplify the presentation we assume that the functional data objects
are stochastic processes with domain .[0, 1] and sample paths in .L2 ([0, 1], R) = L2 ,
the Hilbert space of real valued square integrable functions. In other words, the
observations are stochastic processes .{X(t), 0 ≤ t ≤ 1} such that .X : Ω × [0, 1] →
R, and .||X||2 < ∞ almost surely, where
⎛⎛ 1 ⎞1/2
||f ||2 =
. f 2 (t)dt
0
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 421
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_8
422 8 Functional Data
⎛ ⎛1
is the .L2 norm. Below we use . to denote . 0 . We note that all results below may be
generalized to observations that are general random elements of abstract, separable
Hilbert spaces.
where .EEi (t) = 0 for all .t ∈ [0, 1], and .μ0 , and .μA are unknown mean functions,
with .k ∗ a possible, unknown change point. Since the sample paths of .Xi are assumed
to lie in .L2 , it is natural to define the no change in the mean null hypothesis as
. H0 : ||μ0 − μA || = 0, (8.1.2)
HA : ||μ0 − μA || > 0.
. (8.1.3)
We again model the errors as general, stationary and weakly dependent time
series processes that are decomposable Bernoulli shifts:
Definition 8.1.1 We say .{Ei (t), i ∈ Z, t ∈ [0, 1]} is .Lν -decomposable if .Ei (t) =
g(ηi , ηi−1 , . . .)(t) for some (deterministic) measurable function .g : S∞ → L2 ,
where .{ηj , j ∈ Z} are independent and identically distributed random variables
with values in a measurable space .S, and .Ei (t) = Ei (t, ω) is jointly measurable in
.(t, ω), for each .i ∈ Z. Further .EEi (t) = 0 for all .t ∈ [0, 1], .E||Ei || < ∞ with some
ν
2
.ν > 2, and
( ∗
)1/ν
. E||Ei − Ei,m ||ν2 ≤ am−α with some a > 0, α > 2, (8.1.4)
8.1 Change Detection in the Mean of Functional Observations 423
∗ = g(η , . . . , η ∗ ∗ ∗
where .Ei,l i i−l+1 , ηi−l , ηi−l−1 , . . .), and .{ηi , i ∈ Z} are independent
copies of .η0 , independent of .{ηl , l ∈ Z}.
Assuming the errors are .Lν -decomposable, the series defining the long-run covari-
ance kernel
∞
E
D(t, s) =
. EE0 (t)El (s), t, s ∈ [0, 1], (8.1.5)
l=−∞
is a well defined element of .L2 ([0, 1]2 , R). .D defines a symmetric kernel integral
operator on .L2 , and we use the notation .λ1 ≥ λ2 ≥ . . . ≥ 0 for its ordered set
of eigenvalues and .φ1 , φ2 , . . . to denote the corresponding orthonormal basis of
eigenfunctions satisfying for .t ∈ [0, 1]
⎛⎛
λl φl (t) =
. D(t, s)φl (s)ds, l ∈ N. (8.1.6)
Theorem 8.1.1 If .H0 of (8.1.2) is satisfied and the errors in model (8.1.1) are .Lν -
decomposable, then there exists a sequence of Gaussian processes .{┌N 0 (u, t), 0 ≤
u, t ≤ 1} such that
⎛
. sup (ZN (u, t) − ┌N0
(u, t))2 dt = oP (1),
0≤u≤1
E┌N (u, t) = 0 and .┌N (u, t)┌N (v, s) = min(u, v)D(t, s). Since under the
.
null hypothesis all .L2 functionals of .ZN do not depend on .μ0 , (8.1.8) implies
Theorem 8.1.1 with
0
┌N
. (u, t) = ┌N (u, t) − u┌N (1, t).
424 8 Functional Data
= 0,
⎧ ⎛ ⎫
1
. lim lim sup P sup (ZN ((N + 1)u/N, t) dt > x
2
1−δ≤u<1 [u(1 − u)]
δ→0 N →∞ 2κ
= 0, (8.1.10)
⎧ ⎛ ⎫
1
. lim lim sup P sup 0
(┌N (u, t))2 dt > x =0 (8.1.11)
δ→0 N →∞ 0<u≤δ [u(1 − u)]2κ
and
⎧ ⎛ ⎫
1
. lim lim sup P sup 0
(┌N (u, t))2 dt >x = 0. (8.1.12)
δ→0 N →∞ 1−δ≤u<1 [u(1 − u)]2κ
δ (1/2−κ)ν
≤ c2 ,
xν
which implies (8.1.9). The proof of (8.1.10) goes along the lines of (8.1.9). Next we
note that the distribution of .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} does not depend on N . We
recall that
D
.{┌N
0
(u, t), 0 ≤ u, t ≤ 1} = {┌(u, t) − u┌(1, t), 0 ≤ u, t ≤ 1},
1 P
. sup ||┌(u, ·)||2 → 0 when δ → 0.
0<u≤δ uκ
By checking that the mean and covariance function coincide, one can verify that
with .(λi , φi )i∈N defined in (8.1.6),
D
{┌(u, t), 0 ≤ u, t ≤ 1} = {O(u, t), 0 ≤ u, t ≤ 1},
.
426 8 Functional Data
where
∞
E 1/2
O(u, t) =
. λi Wi (u)φi (t), (8.1.13)
i=1
and .{Wi (u), 0 ≤ u ≤ 1}, i ∈ N are independent Wiener processes. Using the
orthonormality of the eigenfunctions (see Theorem A.3.3) we have for each .0 ≤
u≤1
⎛ ∞
E
. O (u, t)dt =
2
λi Wi2 (u).
i=1
≤ c4 δ 1−2κ log(1/δ) → 0, as δ → 0.
and .{Bi (u), 0 ≤ u ≤ 1}i∈N are independent and identically distributed standard
Brownian bridges. The limiting behaviour of supremum and integral functionals of
the norm of .ZN (u, ·) under the one change alternative are described in the next
theorem.
Theorem 8.1.3 If .HA of (8.1.3) holds, and the errors in model (8.1.1) are .Lν -
decomposable, and
then
⎛
1 P
. sup 2
ZN (u, t)dt → 1
NθN2 (1 − θN )2 ||μ0 − μA || 2
0≤u≤1
and
⎛⎛
1 P 1
.
2
ZN (u, t)dtdu → ,
NθN2 (1 − θN )2 ||μ0 − μA || 2 3
where .k ∗ = ⎣N θN ⎦.
Proof We write
with
⎛ ⎞
⎣N
E ⎣Nu⎦ E
u⎦ N
ZN,E (u, t) = N −1/2 ⎝
. Ei (t) − Ei (t)⎠ ,
N
i=1 i=1
and
⎧
⎪ k(N − k ∗ )
⎨ (μ0 (t) − μA (t)), if 1 ≤ k ≤ k ∗ ,
.vN (k, t) = ∗ N
⎩ k (N − k) (μ0 (t) − μA (t)),
⎪
if k ∗ + 1 ≤ k ≤ N.
N
⎛⎛ 2
Since .sup0≤u≤1 ||ZN,E (u, ·)||2 = OP (1) and . ZN,E (u, t)dudt = OP (1) by
Theorem 8.1.1, (8.1.15) implies that each of the limits of interest are determined
by .vN . It follows from straightforward calculation that .sup0≤u≤1 ||vN (⎣Nu⎦, ·)||2 =
⎛⎛ 2
N θN2 (1−θN )2 ||μ0 −μA ||2 +O(1), and . vN (⎣Nu⎦, t)dudt = N θN2 (1−θN )2 ||μ0 −
μA ||2 /3 + O(1), from which the result follows. ⨆
⨅
We note that for weighted functionals of the CUSUM process as in Theo-
rem 8.1.2, if .0 ≤ κ < 1/2, then similarly it can be shown that
⎛⎛ ⎞2
1 P
. sup ZN (u, t) → ∞,
1/(N +1)<u<1−1/(N +1) [u(1 − u)]κ
if
denote the sample autocovariance kernel of the sample at lag .l, where
1 E
N
. X̄N (t) = Xi (t)
N
i=1
E
N −1 ⎛ ⎞
l
. D̂N (t, s) = K η̂l (t, s),
h
l=−(N −1)
where
∞
E
D(q) (t, s) =
. |l|q EE0 (t)El (s).
l=−∞
hopt = c0 N 1/(1+2q) ,
. (8.1.18)
where
⎛ ⎞1/(1+2q)
c0 = 2q||wD(q) ||2
.
⎛⎛ ⎛⎛ ⎞2 ⎞ ⎛ ⎞−1/(1+2q)
∞
× ||D|| +2
D(u, u)du 2
K (x)dx .
−∞
The constant .c0 may be estimated from the data using initial “pilot” estimates of .D
and .D(q) .
Hypothesis tests for the presence of a change point in model (8.1.1) may be
constructed using the estimator .D̂N . This method is a functional data analog of the
approach in Newey and West (1987).
Since in practice these are applied without foreknowledge of which of .H0 or .HA
hold, it is useful to also know the asymptotic properties of .D̂N under the change
in the mean alternative. Towards this, and in order to evaluate how the asymptotic
behaviour of .D̂N depends on the size of the change, we define for .t, s ∈ [0, 1]
and assume the difference between the means before and after the change can
depend on N .
Theorem 8.1.5 If .HA of (8.1.3) holds, and .{Ei , i ∈ Z} is .Lν -decomposable for
some .ν > 4, Assumptions 2.1.1, 3.1.4 and 3.1.5 hold, then
⎛ ⎛ ⎛ ⎛ ∞ ⎞2
1
. D̂N (t, s) − θN (1 − θN )ΔN (t, s) K(u)du dtds = oP (1).
h −∞
See Horváth et al. (2014) for a proof. It follows that .D̂N (t, s) is not a consistent
estimator of .D(t, s) under the alternative, owing to the fact that the sample mean
.X̄N used in defining the empirical autocovariance kernels does not properly center
430 8 Functional Data
the series under .HA . If the errors are uncorrelated, then .D(t, s) = cov(E0 (t), E0 (s)),
so the sample variance function
1 E
N
D∗N (t, s) =
. (Xi (t) − X̄N (t))(Xi (s) − X̄N (s))
N −1
i=1
can be used in place of the long-run covariance estimator. Along the lines of
Theorems 8.1.4 and 8.1.5, it may be shown that
⎛ ⎛
┐ ∗ ┐2
. DN (t, s) − cov(E0 (t), E0 (s)) dtds = oP (1)
The above results can be combined to establish the consistency of the following
hypothesis tests for .H0 . We consider the test statistics .TN and .MN defined as
⎛⎛ ⎛
TN =
.
2
ZN (u, t)dudt, and MN = sup 2
ZN (u, t)dt.
0≤u≤1
According to Theorem 8.1.3, a consistent test is obtained by rejecting .H0 for large
values of either statistic, and hence to obtain a consistent test with asymptotic size
.α we reject .H0 when
where .cT (α) and .cM (α) satisfy that under .H0
where .{Nk,l , 1 ≤ k, l < ∞} are independent standard normal random variables and
λ1 ≥ λ2 ≥ . . . are the eigenvalues of .D, defined in (8.1.6). The eigenvalues .λi and
.
eigenfunctions .φi can be estimated from the sample using the empirical eigenvalues
and eigenfunctions .λ̂N,1 ≥ λ̂N,2 ≥ . . . defined by
⎛⎛
λ̂N,l φ̂N,l (t) =
. D̂N (t, s)φ̂N,l (s)ds, 1 ≤ l ≤ N − 1. (8.1.19)
As a consequence of Theorem 8.1.4, we get that under .H0 and for any .d ≥ 1
(see Horváth and Kokoszka (2012), p. 34)
This suggests estimating .cT (α) and .cM (α) with .ĉT ,d (α) and .ĉM,d (α) so that
⎧⎛ ⎛ ⎛ ⎞2 ⎫
P
. ┌ 0 (u, t) dudt ≥ ĉT ,d (α) (8.1.20)
⎧ ⎫
⎨Ed
λ̂N,l 2 ⎬
≈P N ≥ ĉT ,d (α) = α,
⎩ (π k)2 k,l ⎭
k,l=1
and
⎧ ⎛⎛ ⎫
⎞2
P
. sup ┌ 0 (u, t) dudt ≥ ĉM,d (α)
0≤u≤1
⎧ ⎫
E
d
≈P sup λ̂N,l Bl (u) ≥ ĉM,d (α) = α.
0≤u≤1 l=1
The values .ĉT ,d (α) and .ĉM,d (α) satisfying this relation may be readily approxi-
E E
mated by simulating . dk,l=1 N2k,l λ̂N,l /(π k)2 and .sup0≤u≤1 dl=1 λ̂N,l Bl (u), given
the sample.
432 8 Functional Data
Hence Theorem 8.1.3 yields that for any .d ∈ N and so long as .h = o(NθN2 (1 −
θN )2 ),
{ } { }
. lim P TN ≥ ĉT ,d (α) = 1, and lim P MN ≥ ĉM,d (α) = 1
N →∞ N →∞
⎛⎛ 2 (u, t) ⎛ 2 (u, t)
ZN ZN
TN (κ) =
. dudt, and MN (κ) = sup dt.
wκ2 (u) 0≤u≤1 wκ2 (u)
⎛ ⎞⎛ ⎞
┌ 0 (u, t) ┌ 0 (v, s) min(u, v) − uv
E
. = D(t, s).
wκ (u) wκ (v) wκ (u)wκ (v)
Using now that .┌ 0 (u, t)/wκ (u) has the same distribution as
∞
O0 (u, t) E λi Bi (u)
1/2
. = φi (t), (8.1.21)
wκ (u) wκ (u)
i=1
similar approximations to the distribution of .TN (κ) and .MN (κ) under .H0 can be
computed as in (8.1.20).
The above test statistics might be considered to be “fully functional” in that they
do not require any initial dimension reduction of the functional data objects. We
now turn to change point detection methods for functional data based on functional
principal component analysis. Using the eigenfunctions .φ1 , φ2 , . . . of (8.1.6) we
define the projections of .ZN into the directions of the eigenfunctions of the d largest
eigenvalues of .D. Let
⎛
ξN,k (u) =
. ZN ((N + 1)u/N, t)φk (t)dt, 0 < u < 1, k ∈ {1, . . . , d}.
1 |ξN,k (u)| D 1
. max sup → sup |Bk (u)|, (8.1.22)
1≤k≤d λ1/2 0<u<1 [u(1 − u)]κ 0<u<1 [u(1 − u)]
κ
k
get that
|⎛ |
1 | |
. sup | (ZN ((N + 1)u/N, t) − ┌N (u, t))φk (t)dt || = oP (1).
0
[u(1 − u)] κ |
0<u<1
⎛ 0
For each N, the joint distribution of .{ ┌N (u, t))φk (t)dt, 0 ≤ u ≤ 1, 1 ≤ k ≤ d} is
⎛ 0
normal with .E ┌N (u, t))φk (t)dt = 0, and the covariance is given by
⎧⎛⎛ ⎞ ⎛⎛ ⎞⎫
E
.
0
┌N (u, t))φk (t)dt 0
┌N (u' , s))φl (s)ds
⎛⎛
= (min(u, u' ) − uu' ) φk (t)D(t, s)φl (s)dtds
⎧
λk (min(u, u' ) − uu' ), if k = l,
=
0, if k /= l.
−1/2
0 (u, t))φ (t)dt are independent Brownian bridges, completing the
Hence .λk ┌N k
proof of (8.1.22).
Observing that .[u(1 − u)]−1/2 |ξk,N (u)|, .1 ≤ k ≤ d are asymptotically
independent, one can establish (8.1.23) as in Theorem 1.2.5. ⨅
⨆
The empirical counterpart to the statistics appearing on the left-hand sides
of (8.1.22) and (8.1.23) are obtained by substituting .λ̂N,k , .φ̂N,k defined in (8.1.19)
for .λk and .φk . Let
⎛
.ξ̂N,k (u) = ZN ((N + 1)u/N, t)φ̂N,k (t)dt (8.1.24)
434 8 Functional Data
be the empirical counterpart of .ξN,k (u). The proof of the following result is left to
the reader, see Exercise 8.6.6.
Theorem 8.1.7 We assume that .H0 of (8.1.2) holds, .{Ei , i ∈ Z} is
Lν -decomposable for some .ν > 4, and that .λ1 > · · · > λd > λd+1 ≥ 0.
.
1 |ξ̂N,k (u)| D 1
. max sup → sup |Bk (u)|, (8.1.25)
1≤k≤d 1/2
λ̂N,k 0<u<1 [u(1 − u)]κ 0<u<1 [u(1 − u)]κ
In order to study the properties of estimators of the change point .k ∗ under .HA , we
consider the following AMOC alternative model that fixes notation in (8.1.1):
⎧
μ(t) + Ei (t), if 1 ≤ i ≤ k ∗ ,
Xi (t) =
. (8.2.1)
μ(t) + ΔN h(t) + Ei (t), if k ∗ + 1 ≤ i ≤ N,
where .||h|| = 1 and .k ∗ = ⎣θ N ⎦. .ΔN then effectively describes the size of the
change, which we allow to depend on the sample size N. A natural estimator is
the location at which the norm of the functional CUSUM process attains its largest
value. Let
and
⎧
⎪ −1 ⎛
E
⎪
⎪
⎪
⎪ − Ei (t)h(t)dt, if k < 0,
⎪
⎪
⎨ i=k
.S(k) = 0, if k = 0, (8.2.4)
⎪
⎪ k ⎛
⎪
⎪ E
⎪
⎪
⎪
⎩ Ei (t)h(t)dt, if k > 0.
i=1
Theorem 8.2.1 We assume that .{Xi , i ∈ {1, . . . , N}} follow model (8.2.1), and
that .{Ei , i ∈ Z} is .Lν -decomposable for some .ν > 4 are satisfied. Under the
shrinking change Assumption 8.2.2,
(i) if .0 ≤ κ < 1/2, then with .ξ(κ) defined in (2.2.3),
Δ2N D
.
2
(k̂N − k ∗ ) → ξ(κ).
τ
(ii) If in addition
then
Δ2N D
.
2
(k̂N − k ∗ ) → ξ(1/2).
τ
436 8 Functional Data
D
{ }
k̂N − k ∗ → argmaxl ΔS(l) − Δ2 |l|mκ (l) .
. (8.2.5)
|k̂ − k ∗ | = oP (N ).
. (8.2.6)
We decompose the CUSUM process into its random and drift components:
E
k
k E
N
k
. Xi (t) − Xi (t) = Sk (t) − SN (t) − zk (t),
N N
i=1 i=1
where
E
k
Sk (t) =
. Ei (t)
i=1
and
⎧
⎪ k(N − k ∗ )
⎨ ΔN h(t), if 1 ≤ k ≤ k ∗ ,
.zk (t) = ∗ (NN− k)
⎪
⎩ k
ΔN h(t), if k ∗ + 1 ≤ k ≤ N.
N
On the other hand, for all .Nα ≤ k ≤ Nβ, .0 < α < β < 1 we have
⎛
1
. zk2 (t)dt → ∞. (8.2.8)
N
Now (8.2.6) follows from (8.2.7) and (8.2.8). The next step is the proof of
We introduce
⎛ ⎞2κ ⎛ ⎞2
N k
Qk (t) =
. Sk (t) − SN (t) − zk (t)
k(N − k) N
⎛ ⎞2κ ⎛ ⎞2
N k∗
− Sk ∗ (t) − SN (t) − z k ∗ (t)
k ∗ (N − k ∗ ) N
with
⎧⎛ ⎞2κ ⎛ ⎞2κ ⎫ ⎛ ⎞2
N N k
.Qk,1 (t) = − Sk (t) − SN (t) ,
k(N − k) k ∗ (N − k ∗ ) N
⎛ ⎞2κ ⎧⎛ ⎞2 ⎛ ⎞2 ⎫
N k k∗
Qk,2 (t) = Sk (t) − SN (t) − Sk ∗ (t) − SN (t) ,
k ∗ (N − k ∗ ) N N
⎛ ⎞2κ ⎛ ⎞
N k
Qk,3 (t) = −2 Sk (t) − SN (t) (zk (t) − zk ∗ (t)),
k(N − k) N
⎛⎛ ⎞2κ ⎛ ⎞2κ ⎞ ⎛ ⎞
N N k
Qk,4 (t) = − − Sk (t) − SN (t) zk ∗ (t),
k(N − k) k ∗ (N − k ∗ ) N
⎛ ⎞2κ
N k − k∗
Qk,5 (t) = 2 SN (t)zk ∗ (t),
k ∗ (N − k ∗ ) N
⎛ ⎞2κ
N
Qk,6 (t) = −2 ∗ (Sk (t) − Sk ∗ (t)) zk ∗ (t),
k (N − k ∗ )
⎛ ⎞2κ ⎛ ⎞2κ
N N
Qk,7 (t) = zk (t) −
2
zk2∗ (t).
k(N − k) k ∗ (N − k ∗ )
We only consider the case when .1 ≤ k ≤ k ∗ , and we may take a similar approach
when .k > k ∗ . Also, as a consequence of (8.3.14), we can assume that .N α ≤ k,
where .0 < α < θ . Let .a = a(N) = C/Δ2N . Using the mean value theorem we get
|⎛ ⎞2κ ⎛ ⎞2κ ||
|
1 | N N |
. max | − | = O(N −1−2κ )
N α≤k≤k ∗ k∗ − k | k(N − k) k ∗ (N − k ∗ ) |
438 8 Functional Data
We note that
⎛ ⎞2κ ⎛ ⎞
N k − k∗
.Qk,2 (t) = 2 Sk (t) − Sk (t) −
∗ SN (t)
k ∗ (N − k ∗ ) N
⎛ ⎞
k k∗
× Sk (t) − SN (t) + Sk ∗ (t) − SN (t)
N N
⎛ ∗ ⎞
⎛ ⎞2κ E k
N ⎝
= −2 ∗ Ei (t)⎠
k (N − k ∗ )
i=k+1
⎛ ⎞
k k∗
× Sk (t) − SN (t) + Sk ∗ (t) − SN (t)
N N
⎛ ⎞2κ ∗
N k −k
+2 ∗ ∗
SN (t)
k (N − k ) N
⎛ ⎞
k k∗
× Sk (t) − SN (t) + Sk ∗ (t) − SN (t) .
N N
and therefore by the maximal inequality for partial sums (see Theorem A.3.1) we
get
|| l ||ν
||E ||
|| ||
.E max || Ei || ≤ c2 k ν/2 (8.2.14)
1≤l≤k || ||
i=1 2
with some constants .c1 > 0 and .c2 > 0. We then obtain from elementary
inequalities that
|| || || k ||
|| k ∗ || ||E ||
1 || || E || 1 || ||
. max∗ ∗ || El ||
|| = max ∗ || Ek ∗ +1−i ||
1≤k≤k −a k − k || || ||
l=k+1 ||2
a≤k<k k
i=1 2
|| k ||
1 ||
||E ∗
||
||
≤ max max || E k +1−i ||
⎣log a⎦−1≤j ≤⎣log k ∗ ⎦+1 ej ≤k≤ej +1 k || ||
i=1 2
|| k ||
||E ||
1 || ||
≤ max max || E k ∗ +1−i || .
⎣log a⎦−1≤j ≤⎣log k ∗ ⎦+1 ej ej ≤k≤ej +1 || ||
i=1 2
⎣log k ∗ ⎦+1 ⎧ || k || ⎫
E ||E ||
|| || −1/2
≤ P max || j
Ek ∗ +1−i || > xe a
ej ≤k≤ej +1 || ||
j =⎣log a⎦−1 i=1 2
∞
E
c2 ej ν/2 c3
≤ ≤ ,
xν ej ν a −ν/2 xν
j =⎣log a⎦−1
where .c3 is a constant. Hence (8.2.13) is proven. Also, using again Theorem 8.1.1
we get
⎛
| ⎛ ⎞|
| ∗ |
|SN (t) Sk (t) − k SN (t) + Sk ∗ (t) − k SN (t) | dt
.
| N N |
|| ||
|| k k ∗ ||
≤ ||SN ||2 ||Sk − SN + Sk − SN ||
|| ∗
||
N N 2
= OP (N ).
and
|| ||
|| k ||
. max ||Sk − SN || = OP (N 1/2 ).
1≤k≤k ∗ || N ||2
1
. max ||zk − zk ∗ ||2 = OP (|ΔN |) .
1≤k<k ∗ k∗ − k
8.2 Estimating Change Points 441
Thus we get
⎛
1
. max |Qk,3 (t)|dt = oP (1). (8.2.17)
αN ≤k<k ∗ 2 ∗
ΔN (k − k)N 1−2κ
and
⎛
1
. max |Qk,5 (t)|dt = oP (1). (8.2.19)
αN ≤k<k ∗ Δ2N (k ∗ − k)N 1−2κ
where the .OP (1) term does not depend on C. Elementary arguments give that there
are .c4 > 0 and .c5 > 0 such that
⎛ ⎛ ⎞
1−2κ ∗
. − c4 ΔN N (k − k) ≤ zk2 (t) − zk2∗ (t) dt
2
(8.2.21)
for all .N α ≤ k < k ∗ . It follows from (8.2.11)–(8.2.19) and (8.2.21) that for all
C > 0 and .α < θ
.
⎛ ⎛ ⎞
1
. max Qk,1 (t) + Qk,2 (t) + . . . + Qk,5 (t) + Qk,5 (t) dt
N α≤k≤k ∗ −C/Δ2N 2
P
→ −∞. (8.2.22)
= 0.
and
|| ||
|| k k ∗ ||
|| ||
||Sk − N SN + Sk − N SN || = OP (N ).
1/2
. max ∗
k ∗ −C/Δ2N ≤k≤k ∗ 2
Similar arguments can be used when .k ∗ < k ≤ k ∗ + C/Δ2N . This completes the
proof of (8.2.24) when .i = 2.
We note that
⎛
. max |Qk,3 (t)|dt
|k−k ∗ |≤C/Δ2N
|| ||
|| k ||
= O(N −2κ ) max |||| Sk − S ||
N || max ||zk − zk ∗ ||2
1≤k≤N N 2 |k−k ∗ |≤C/ΔN
2
⎛ ⎞
N 1/2
= OP N −2κ = oP (N 1−2κ )
|ΔN |
= o(1),
= 0, if s = 0
⎪
⎪
⎪
⎪ ⎛ ∗ ∗ ) ⎞1−2κ k ∗ +s/Δ2N ⎛
E
⎪
⎪ k (N − k
⎪
⎪ −2 Δ Ei (t)h(t)dt, if s > 0.
⎩ N2
N
∗ i=k +1
Since
⎛ the errors .{Ei , i ∈ Z} are .Lν -decomposable, it follows that the scalar sequence
.{ Ei (t)h(t)dt, i ∈ Z} .L -decomposable, and therefore Theorem A.1.1 yields
ν
⎛
1 D[−C,C]
. Qk ∗ +sτ 2 /Δ2 ,4 (t)dt −→ 2(θ (1 − θ ))1−2κ τ 2 W (s), (8.2.26)
N N
444 8 Functional Data
where .{W (s), −∞ < s < ∞} is the two sided Wiener process of (2.2.2) and .τ is
defined in (8.2.3). If .k̂(C) is defined as
⎧
. k̂(C) = min k : |k − k ∗ |
⎛ ⎞κ
⎛ ⎛ k ⎞2
N E k E
N
≤ Cτ 2
/Δ2N and Xi (t) − Xi (t) dt
k(N − k) N
i=1 i=1
⎛ ⎞κ
⎛ ⎛ ⎞2
⎫
N Ej
j E
N
= max ⎝ Xi (t) − ⎠
Xi (t) dt ,
|j −k ∗ |≤Cτ 2 /Δ2N j (N − j ) N
i=1 i=1
Δ2N D
. (k̂(C) − k ∗ ) → argmax(2(θ (1 − θ ))1−2κ τ 2 W (s)
τ2
− 2θ (1 − θ )τ 2 |s|m0 (s) : |s| ≤ C)
= argmax (W (s) − |s|mκ (s) : |s| ≤ C) .
Since
where .ξ(κ) is defined in (2.2.3), the proof is complete when .0 ≤ κ < 1/2.
The proof of the second part of the theorem based on the observation that
|| k ||
1 ||||E ||
||
. max || E i || = OP ((log N)1/ν ). (8.2.27)
1≤k≤N k 1/2 || ||
i=1 2
To prove (8.2.27) we follow the proof of (8.2.15). Let .C > 0. Arguing as in (8.2.15),
on account of (8.2.14) we have
⎧ || k || ⎫
1 ||
||E
||
|| c6
P
. max 1/2 || Ei (t)|| > C(log N) 1/ν
≤ ν.
1≤k≤N k || || C
i=1 2
Since C can be chosen arbitrarily large, (8.2.27) follows. This implies that the drift
term dominates the process. Hence we have that (8.2.6) holds.
In order to establish (8.2.5), we follow the proof of Theorem 8.2.1. We present
detailed results when .κ = 0. We note that
|k̂N − k ∗ | = oP (N ),
.
8.2 Estimating Change Points 445
so we can assume that .⎣Nα⎦ ≤ k̂N ≤ ⎣Nβ⎦, 0 < α < θ < β < 1. Next we show
that
|k̂N − k ∗ | = OP (1)
. (8.2.28)
Let .C > 0. We use again the decomposition in (8.2.10). We only work out the
details for .⎣N α⎦ ≤ k ≤ k ∗ − C. It follows from the proof of (8.2.13) that
|| ||
|| E k∗ ||
||
1 || ||
max E ||
i || = OP (1) (8.2.29)
.
1≤k≤k ∗ −C ∗ ||
k − k ||
i=k+1 || 2
Since
1
. max ||zk − zk ∗ ||2 = OP (1)
1≤k<k ∗ k∗ − k
We note that there are .c1 > 0 and .c2 > 0 such that
⎛
∗
. − c1 N |k − k| ≤ Qk,5 (t)dt ≤ −c2 N |k ∗ − k| (8.2.35)
for all .⎣N α⎦ ≤ k ≤ ⎣Nβ⎦, .0 < α < θ < β < 1. Since the estimates in (8.2.31)–
(8.2.34) are valid for .k ∗ +C ≤ k < N and therefore by (8.2.35) we get for all .C > 0
and .0 < α < θ < β < 1
⎛ ⎛ ⎞
1 P
. max Q k,1 (t) + Q k,2 (t) + Q k,3 + Q k,5 (t) dt → −∞
|k ∗ −k|≥C,⎣N α⎦≤k≤⎣Nβ⎦ 2
for all .x < 0. Hence we can assume that .k̂N is between .k ∗ − C and .k ∗ + C, for a
large .C > 0. By definition
⎧
⎪ k∗
⎪
⎪ k ∗ (N − k ∗ ) E
⎪
⎪ −2 Δ h(t)Ei (t), if k < k ∗ ,
⎪
⎪ N
⎨ i=k+1
Qk,4 (t) =
. 0, if k = k ∗ ,
⎪
⎪
⎪
⎪ k ∗ (N − k ∗ ) E
k
⎪
⎪ if k > k ∗ ,
⎪
⎩ 2 Δ h(t)Ei (t),
N
i=k ∗ +1
Thus we conclude
⎧ ⎛ ⎫
1
. (Qk ∗ +l,4 (t) + Qk ∗ +l,5 (t)dt, |l| ≤ C
N
D
{ ⎛ ⎞}
→ θ (1 − θ ) ΔS(l) − Δ2 |l|m0 (l) .
The final details proceed along the same lines as in Theorem 2.2.2.
⨆
⨅
In order to use Theorem 8.2.1 to perform inference on the change point location,
one must estimate the unknown values .τ 2 and .ΔN . The function .eN (t) = ΔN h(t)
may be estimated with
1 E E
k̂N N
1
X̂k̂,1 =
. Xi (t) and X̂k̂,2 = Xi (t).
k̂N i=1 N − k̂N
i=k̂N +1
||êN ||22 P
. → 1.
Δ2N
with
⎛
eN (t)
gl = gl,N =
. El (t) dt.
||eN ||
⎛
ĝi =
. Êi (t)êN (t)dt.
E
N −1 ⎛ ⎞
l
τ̂N2 =
. K γ̂l ,
h
l=1−N
where
⎧
⎪ N −l
⎪
⎪ 1 E
⎪
⎪ ĝi ĝi+l , if 0 ≤ l < N,
⎨N −l
γ̂l =
.
i=1
⎪
⎪ 1 E
N
⎪
⎪ ĝi ĝi+l , if − N < l < 0.
⎪
⎩ N − |l|
i=1−l
The bandwidth h and kernel K satisfy Assumptions 3.1.4 and 3.1.5. Again under
the conditions of Theorem 8.2.1
τ̂N2 P
. → 1. (8.2.36)
τ2
These can be used to construct confidence intervals for .k ∗ . Under the conditions
of Theorem 8.2.1, it follows that
⎛ ⎞
τ̂N2 O(κ)1−α/2 τ̂N2 O(κ)α/2
. k̂N − , k̂N − (8.2.37)
||êN ||22 ||êN ||22
and
⎧ ⎫
(2) −1/2 1
.k̂ = sargmax max λ̂ |ξ̂N,j (k/N)| ,
N
k∈{1,...,N } 1≤j ≤d N,j (k(N − k))κ
where .λ̂N,1 ≥ λ̂N,2 ≥ . . . ≥ λ̂N,d are the empirical eigenvalues from (8.1.19).
E
R+1
Xi (t) =
. μj (t)1{kj −1 ≤ i < kj } + Ei (t), (8.2.38)
j =1
The indices .k1 , . . . , kR denote the locations of change points in the mean, which we
assume satisfy
Assumption 8.2.3 .ki = ⎣Nθi ⎦, .0 = θ0 < θ1 < · · · < θR < θR+1 = 1.
We consider estimating and performing inference on R and .k1 , . . . , kR in two
stages. First, we develop preliminary consistent estimators of R and .k1 , . . . , kR
using standard binary segmentation. Subsequently, noting that we expect exactly
one change point between the estimates .k̂i and .k̂i+1 , these estimates are refined in
a second stage by considering single change point estimators over the observations
with indices between .k̂i and .k̂i+1 . The asymptotic distribution of the estimators in
this second stage may be determined as in Sect. 8.2.1.
In order to produce preliminary, consistent estimators of R and .k1 , . . . , kR ,
we suggest using binary segmentation, which, as discussed in Chap. 2, involves
sequentially splitting the original sample into two sub-samples based on an initial
change point estimate, estimating change points on each sub-sample, and then
repeating until some stopping criterion is satisfied. To formulate this method applied
to a functional time series, suppose that we have arrived at some point in the
procedure at a sub-sample with a starting index l and an ending index u satisfying
.1 ≤ l < u ≤ N. In order to identify and estimate change points, we consider
sequential estimates of the mean function based on the partial sum process .Sk (t) =
450 8 Functional Data
Ek
k ∈ {1, ..., N}. To estimate changes points on the sub-sample with
j =1 Xj (t), .
ZN (k/N, t)
.
[k/N (1 − k/N )]κ
defined over the sample .Xl , . . . , Xu . Intuitively if there exists one or more change
points in a sub-sample, .||Zkl,u || will be large, and the point .k̂l,u = sargmax ||Zkl,u ||
l<k<u
estimates a change point. Deciding whether to include or exclude .k̂l,u as a potential
change point can be determined by the magnitude of .||Zkl,u ||: if this exceeds some
threshold .ρN , we then include .k̂l,u among the estimated change points, and further
segment the sub-sample. This may be described by the following pseudo-algorithm:
BINSEG(1, N, .ρN ) returns a set of estimated change points .K̂ = {k̂1 , . . . , k̂R̂ },
sorted into increasing order, and an estimated number of change points .R̂ =
|K̂|. These estimates are consistent so long as the errors in (8.2.38) are .Lν -
decomposable:
Theorem 8.2.2 Assume that .Xi , i ∈ {1, ..., N} satisfies the multiple change point
model (8.2.38), Assumption 8.2.3 holds, .ΔN is bounded away from zero, .0 < κ ≤
1/2, and the model errors .{Ei , i ∈ Z} is .Lν -decomposable for some .ν > 2. If .ρN in
the binary segmentation algorithm satisfies
(log N)1/ν ρN
. + 1/2 → 0, N → ∞.
ρN N
8.2 Estimating Change Points 451
type detectors as in Sect. 1.3 are also consistent in the sense of Theorem 8.2.2.
This corollary though should not be used to trivialize the application of binary
segmentation in each of these settings. The practical aspects of applying binary
segmentation in each of these settings and spaces differ in many respects.
Remark 8.2.2 A standard choice of the threshold .ρN is to take it of the form .ρN =
σ̂N (log N )1/2 , where .σ̂N2 = median(||Xi+1 −Xi ||2 /2, i = 2, . . . , N ). Choosing .ρN
as an appropriate quantile of the approximate (limiting) distribution of the CUSUM
detector used is another popular option, but does not lead to consistency unless the
quantile is taken to increase with the sample size at an appropriate rate.
In order to simplify the presentation, we only consider the case when .κ = 1/2
in (8.2.39). Under model (8.2.4), we may write .Zkl,u (t) = Okl,u (t) + Wkl,u (t), where
⎛ ⎞1/2
u−l
k
.Ol,u (t) = (8.2.40)
(u − k)(k − l)
┌ ┐
k−l
× Mk (t) − Ml (t) − (Mu (t) − Ml (t)) ,
u−l
⎛ ⎞1/2
u−l
.Wkl,u (t) = (8.2.41)
(u − k)(k − l)
┌ ┐
k−l
× Ek (t) − El (t) − (Eu (t) − El (t)) ,
u−l
Ek ER+1 Ek
Mk (t) =
. i=1 j =1 μj (t)1{kj −1 ≤ i < kj }, and .Ek (t) = i=1 εi (t) is the
partial sum of the errors.
The binary segmentation algorithm involves estimating change points on sub-
samples, and so in the Lemmas below we use .l and u to denote the starting and
ending indices of the sub-sample under consideration. For any such indices, if there
are change points between .l and u, we use the notation .i0 and .β ≥ 0 to describe the
452 8 Functional Data
starting index and the number of change points between .l and u, so that
.ki0 ≤ l < ki0 +1 < ki0 +2 < . . . < ki0 +β < u ≤ ki0 +β+1 , (8.2.42)
Let .I = {1, 2, . . . , β} be the index set of the change points between .l and u. In
order to prove Theorem 8.2.2, we establish the following result that shows that the
norm of the trend of the functional CUSUM process is maximized only at change
points.
Lemma 8.2.1 Suppose there exists at least one change point between .l and u.
If .k ∗ = sargmax ||Okl,u ||, then .k ∗ = ki for some .i ∈ {1, . . . , R}, with .l ≤ ki ≤
l<k<u
u.
Proof Let .I denote the set of indices of changes points that are between .l and u.
We consider two cases separately: (i) There is one change point between .l and u
and, (ii) There are two or more change points between .l and u.
Case 1: .|I| = 1: Let v denote the single change point between .l and u. Let
the mean functions between .l and .v and .v + 1 and .u be denoted .μ and .μ' ,
respectively. Then we have for .l < k < v,
⎛ ⎞1/2
u−l
||Okl,u || =
. (k − l)||μ|| (8.2.43)
(u − k)(k − l)
⎛ ⎞
k − l 1/2
= (u − l)1/2 ||μ||.
u−k
(v−l)/(u−l), .d2∗ = (v ' −l)/(u−l) denote the break fractions of the consecutive
change points v and .v ' under consideration. Then .0 = d0 < d1 < . . . < dβ <
dβ+1 = 1. Therefore, for any k between v and .v ' , we may rewrite .Okl,u as
⎛ ⎞1/2
u−l
Okl,u =
. (Mk − Ml )
(u − k)(k − l)
E r ⎛ ⎞
k−l
(dj − dj −1 )μi0 +j + − dr μi0 +r+1
u−l
j =1
= (u − l)1/2 ⎛ ⎞ .
k − l u − k 1/2
u−lu−l
⎛ ⎞2
Er
.s(x) = ⎝ (dj − dj −1 )||μi0 +j || + (x − dr )||μi0 +r+1 ||⎠ (8.2.46)
j =1
E
r ⎛ ⎞
+ 2(x − dr ) (dj − dj −1 ) <μi0 +j , μi0 +r+1 > − ||μj ||||μi0 +r+1 ||
j =1
( )
= a'x + b ' 2
+ 2(x − dr )t,
Er
where .a ' = ||μi0 +r+1 ||, .b' = j =1 (dj − dj −1 )||μi0 +j || − dr a ' and
Er ⎛ ⎞
.t = j =1 (dj − dj −1 ) <μi0 +j , μi0 +r+1 > − ||μj ||||μi0 +r+1 || . Notice that
by the Cauchy-Schwarz inequality, .t ≤ 0. Moreover, (8.2.46) can be represented
as .s(x) = a '2 x 2 + 2(t + a ' b' )x + b'2 − 2tdr = ax 2 + bx + c. If .h(x) is extended
to the open unit interval as .s(x)/[x(1 − x)]2 , then we now wish to show is that h
achieves a maximum over the interval .[d1∗ , d2∗ ] at either .d1∗ or .d2∗ . The derivative
of .h(x) is
where .g(x) is a quadratic function with vertex .−c/(a + b), when .(a + b) /= 0.
First notice that .g(0) = −c = −(b'2 − 2tdr ) ≤ 0. Consider three scenarios: (i)
.a + b = 0. (ii) .a + b > 0, and (iii) .a + b < 0.
⎛ ⎛ ⎞⎞1/2
c c c
.x1 = − +1 − , and
a+b a+b a+b
⎛ ⎛ ⎞⎞1/2
c c c
x2 = +1 − .
a+b a+b a+b
Evidently .x2 > 1. Therefore, .g(x) is either positive from 0 to 1 or negative from
0 to .x1 and positive from .x1 to 1. Once again, .h(x) over .[0, 1] is either strictly
increasing, or decreasing and then increasing, respectively.
It follows in all cases that .h(x) is maximized over .[d1∗ , d2∗ ] at either .d1∗ or .d2∗ ,
which implies the statement of the lemma.
⨆
⨅
Remark 8.2.3 We note that the proof of Lemma 8.2.1 implies that the derivative
of h defined in (8.2.46) is non-zero at the point .d1∗ and/or .d2∗ maximizing h on the
interval .[d1∗ , d2∗ ], since if .h' has a zero at .x0 ∈ [d1∗ , d2∗ ], h must be decreasing to
the left of .x0 , and increasing to the right of .x0 , so that .x0 cannot coincide with the
maxima. This implies that a linear approximation of h at its maxima has a slope
bounded away from zero.
Lemma 8.2.2 If on the sub-sample with indices between .l and u,
(u − l)1/2 ΔN mN
. max ||Mk − Ml || ≥ .
l<k<u ((u − k)(k − l))1/2 2 N 1/2
Proof Let .a = max ||Mk − Ml ||. Then we aim to show that .a ≥ mN ΔN /4 under
l<k<u
condition (8.2.48). Let .v = ki0 +r and .v ' = ki0 +r+1 (if v is the right-most change
point between .l and u, let .v ' = u). Further assume that .E[Xv (t)] = μ(t) and
' '
.E[Xv ' (t)] = μ (t). Since .||μ − μ || ≥ ΔN , we get by the reverse triangle inequality
( '
)
that .max ||μ||, ||μ || ≥ ΔN /2. Moreover, (8.2.48) and the definition of .mN imply
that there is no additional change point between .[v − mN , v) and .(v, v + mN ]. Since
then .Mv (t) − Mv−mN (t) = mN μ(t), and .Mv+mN (t) − Mv (t) = mN μ' (t), we get
that
ΔN
. max(||Mv − Mv−mN ||, ||Mv+mN − Mv ||) ≥ mN . (8.2.49)
2
ΔN
Then we claim that . max ||Mk − Ml || ≥ 4 mN . If not,
l<k<u
ΔN
||Mv+mN − Ml || <
. mN , ||Mv − Ml ||
4
ΔN ΔN
< mN , and ||Mv−mN − Ml || < mN . (8.2.50)
4 4
These with the triangle inequality contradict (8.2.49). Furthermore, because
u−l 4
. ≤ ,
(u − k)(k − l) N
we have
⎛ ⎞1/2
u−l ΔN mN
. max ||Mv − Ml || ≥
l<k<u (u − k)(k − l) 2 N 1/2
⨆
⨅
The final lemma required describes conditions on the sub-samples under which
the binary segmentation algorithm will terminate.
Lemma 8.2.3 Suppose that the sub-sample .l and u are such that for a subset of the
sample space .AN , .maxl<k<u ||Wkl,u || ≤ aN and one of the following conditions are
satisfied:
(i) .β = 0, .ki0 < l < u < ki0 +1 ,
(ii) .β = 1, .min(ki0 +1 − l, u − ki0 +1 ) ≤ fN ,
456 8 Functional Data
(i) .β = 0 : Since there is no change point in this case, then on .AN , .||Zkl,u || =
||Wkl,u || ≤ aN .
(ii) .β = 1 : In this case according to Lemma 8.2.1,
⎛ ⎞1/2
ki +1 ki0 +1 − l
Ol,u =
. ||Ol,u0 || = (u − l) 1/2
||μ||
u − ki0 +1
√ √ 1/2
≤ 2B(min(ki0 +1 − l, u − ki0 +1 ))1/2 ≤ 2CfN .
imply that a change point is initially detected with probability converging to one,
i.e. .P {max1≤j ≤N ||Zk1,N || > ρN } → 1 as .N → ∞. First then we wish to show that
Below we let .r(k) = N/[k(N − k)], so that comparing to (8.2.40) and (8.2.41),
Ek
Ok1,N = r 1/2 (k)O0,k
.
1,N , and .W1,N = r
k 1/2 (k)W0,k = r 1/2 (k)
1,N j =1 (εj − ε̄). With
this notation we write for .k < ki
E
5
.||Zk1,N ||2 − ||Zk1,N
i
||2 = Ai,N (k),
i=1
where
0,ki ki
A4,k = 2<r(k)O0,k 0,k
1,N − r(ki )O1,N , W1,N > and A5,N (k) = ||O1,N || − ||O1,N || .
k 2 2
Using the right-hand inequalities of (8.2.55) and (8.2.56), for i such that .ki ∈ Kmax
there exists a positive constants .c5,i so that .maxk∈Ii,N (M), k≤ki A5,N (k) ≤ −c5,i M.
It follows then that
We now aim to show that .A5,N is the dominate term in (8.2.54). By the mean value
theorem, we have .|r(k) − r(ki )| ≤ c6 |k − ki |/N 2 , k ∈ LN . It follows from this and
the left-hand bounds in (8.2.55) and (8.2.56) that for all M,
|| ||2
|A1,N (k)| c7 || 1 0,k ||
max ≤ max || W ||
N 1≤k≤N || N 1/2 1,N ||
.
k∈IN,i (M), k<ki |A5,N (k)|
⎛ ⎞
(log N)2/ν
= OP = oP (1). (8.2.58)
N
i=k+1
|| ||
|| E ki || ┌ ┐
||
c8 || ||
≤ || (εi − ε̄)||
|| ||W0,k
0,ki
1,N || + ||W1,N || . (8.2.59)
N || ||
i=k+1
0,k
We have as above that .max1≤k≤N |||W1,N ||/N 1/2 = OP ((log N)1/ν ), and
0,ki
|||W1,N
. ||/N 1/2 = OP (1). Additionally, for .ζ ∈ (1/2, 1) and using the triangle
inequality and the definition of .IN,i (M),
|| ||
|| ||
|| 1 Eki
||
max || εi − ε̄||
.
||
k∈IN,i (M), k<ki || ki − k ||
i=k+1 ||
|| ||
|| ||
|| 1 Eki
||
≤ || εi ||
max
k∈IN,i (M), k<ki || − || + ||ε̄|| (8.2.60)
|| i
k k
i=k+1 ||
|| ||
|| ||
1 || 1 E
ki
||
≤ 1−ζ max || ε ||
i || + ||ε̄||.
||
k∈IN,i (M), k<ki || (ki − k) ζ
M ||
i=k+1
8.2 Estimating Change Points 459
Since the error terms are assumed to be .Lν −decomposable, .||ε̄|| = oP (1), and by
the stationarity of the errors and .Lν -decomposability,
|| || || ||
|| Eki || ||E ||
|| 1 || D 1 || k ||
|| εi || || εj ||
. max
k∈IN,i (M), k<ki || − ζ || = M≤k<k
max
−1 ζ || ||
|| i
(k k)
i=k+1 ||
i k ||j =1 ||
|| ||
||E ||
1 || k
||
≤ sup ζ || || ε ||
j || = OP (1). (8.2.61)
k≥1 k ||j =1 ||
|A2,N (k)|
. max
k∈IN,i (M), k<ki |A5,N (k)|
|| || ┌ ┐
|| ki ||
c8 || E || ||W0,k
1,N || ||W0,k
1,N ||
i
≤ max || ||
εi − ε̄ || + (8.2.62)
ki − k ||
k∈IN,i (M), k<ki ||i=k+1 || N N
= oP (1).
Regarding .A3,N , first we note that a simple calculation gives, .r(ki )||O0,k1,N || =
i
O(1). We then have by the Cauchy-Schwarz inequality and (8.2.60) and (8.2.61)
that
|| ||
|| ||
|A3,N (k)| ||
0,ki || 1
E
ki
||
. max ≤ max r(ki )||O1,N || || (εi − ε̄)||
||
k∈IN,i (M), k<ki ki − k k∈IN,i (M), k<ki || ki − k ||
i=k+1
1
≤ OP (1).
M 1−ζ
Therefore for all .x > 0,
⎧ ⎫
|A3,N (k)|
. lim lim sup P max > x = 0. (8.2.63)
M→∞ N →∞ k∈IN,i (M), k<ki |A5,N (k)|
Combining (8.2.54), (8.2.57), (8.2.58), (8.2.62), (8.2.63), and (8.2.64), and applying
symmetric reasoning when .k ≥ ki , we get that
which combined with (8.2.53) implies that .dist(k̃1 , K) = OP (1), which in turn
implies that .dist(k̂1 , K) = OP (1).
Assume now by way of induction that .1 = k̂0 < k̂1 < · · · < k̂r < k̂r+1 = N + 1,
.r ≤ R, have been estimated so that .max1≤j ≤r dist(k̂j , K) = OP (1), and for some
' '
.α ∈ (0, 1) .P (mini∈{0,...,r} k̂i+1 − k̂i > α N) → 1. Under these conditions,
|| E || || E
k̂i+1 ||
|| 1
k
|| || 1 ||
.||W
k
|| ≤ |||| εj |||| + |||| εj ||||.
k̂i ,k̂i+1 (k − k̂i )1/2 (k̂i+1 − k)1/2 j =k+1
j =k̂i +1
and
|| E
k̂1 ||
|| 1 ||
. max |||| εj |||| = OP ((log N)1/ν ).
1≤k≤k̂1 (k̂1 − k)1/2 j =k+1
As for the first term, we have again using Theorem A.3.1 that
|| || || ||
|| 1 E k
|| || 1 E k
||
. | |
max || 1/2 | | | |
εj || ≤ max || 1/2 εj |||| = OP ((log N)1/ν ).
1≤k≤k̂1 k 1≤k≤N k
j =1 j =1
As for the second term, let .E > 0, and choose M large enough so that for all N
sufficiently large .P (BN,1 (M)) = P (dist(k̂1 , K) < M) > 1 − E/2. Let for .x > 0
⎧ ⎫
⎨ || || ⎬
|| 1 Ek̂1
||
AxN,1
. = max |||| εj |||| > x .
⎩1≤k≤k̂1 (k̂1 − k)1/2 ⎭
j =k+1
8.2 Estimating Change Points 461
We aim to bound the first term on the right-hand side of the above inequality.
According to the definition of .BN,1 (M) and using the union bound and Theo-
rem A.3.1, we have for a positive constant .c10
P (AxN,1 ∩ BN,1 )
. (8.2.66)
⎧ ⎫
⎨ || E
j || ⎬
|| 1 ||
≤P max max |
max ||| ε || > x
⎩i∈{1,...,R} j ∈{ki −M,...,ki +M} 1≤k≤j (j − k)1/2
i || ⎭
i=k+1
⎧ ⎫
E i +M
R kE ⎨ || E
j || ⎬
|| 1 | |
≤ P max |||| ε |
i ||
| > x ≤ c10 log Nx −ν ,
⎩1≤k≤j (j − k)1/2 ⎭
i=1 j =ki −M i=k+1
Setting .x = C(log N)1/ν with a suitably large constant C gives that .P (AxN,1 ∩
BN,1 ) < E/2. These combined imply (8.2.65).
Let .ιN satisfy .(log N)1/ν /ιN +ιN /ρN → 0, .BN,r (M) = {max1≤j ≤r dist(k̂j , K) ≤
M}, and .Ar,N = {maxi∈{0,...,r} maxk̂i ≤k≤k̂i+1 ||Wk || ≤ ιN }. Then if .r = R,
k̂i ,k̂i+1
the conditions of Lemma 8.2.3 are satisfied on each sub-sample on the set
.BN,r (M) ∩ Ar,N , whose probability tends to 1 as .M, N → ∞, with .aN = ιN ,
and .fN = M. This implies that the procedure terminates on this set. If .r < R, one
of the sub-samples determined by .l = k̂j , and .u = k̂j +1 satisfies (8.2.48) with
.mN ≥ c4 N for a positive constant .c4 , and an additional change point is detected.
we have for .E > 0 and since .limM→∞ lim supN →∞ P (BN,r (M)) = 1 that for any
E > 0,
.
P (B∗N (M ∗ ))
.
By repeating the same arguments used to establish (8.2.52), it can be shown that
for all .l, u such that .{|l − ki | ≤ M, |u − ki ' | ≤ M}, .limM ∗ →∞ lim supN→∞
462 8 Functional Data
Using the segmented data, we obtain the CUSUM estimators for a change point
inbetween .k̂l−1 and .k̂l+1 :
⎧⎛ ⎞κ
k̂l+1 − k̂l−1
.k̃l = sargmax
[j − k̂l−1 ][k̂l+1 − j ]
⎛⎛
k̂l−1 <j <k̂l+1
⎞2
E
j E
k̂l+1 ⎫
⎝ j − k̂l−1
× Xi (t) − Xi (t)⎠ dt ,
k̂l+1 − k̂l−1
i=k̂l−1 +1 i=k̂l−1 +1
Similarly to the presentation in Sect. 8.2.1, we consider both the cases when
ΔN,l → 0 as .N → ∞, as well as when .ΔN,l tends to a constant. In the former
.
case the limiting distribution of .k̃l is the maximal argument of a Gaussian process,
while in the latter it is distributed as the maximal argument of a random walk with
drift constructed from the innovations in (8.2.38).
Theorem 8.2.3 Suppose .κ = 0, the errors in (8.2.38) are .Lν -decomposable, and
Assumption 8.2.3 holds.
(i) If .maxl∈{1,...,R} ΔN,l + 1/(NΔ2N ) → 0 as .N → ∞, then
Δ2N,1 ⎛ ⎞ Δ2N,R ⎛ ⎞
. k̃ 1 − k 1 , . . . , k̃ R − k R (8.2.67)
τ12 τR2
8.3 Change in the Covariance of Functional Observations 463
Δ2N,l ⎛ ⎞ D
. k̃l − kl → argmaxt {W (t) − |t|m̄0,l (t)}, (8.2.68)
τl2
D
{ }
k̃l − kl → argmaxj Δl S(j ) − Δ2l |l|m̄0,l (j ) ,
.
The ideas presented above can be generalized to test for and estimate change points
in other quantities describing the distribution of functional data. In this subsection,
we focus on changes in the covariance or “second order properties” of a functional
time series. We largely omit the proofs of these results in place of references to
source material. In many cases the proofs are similar to those presented above.
Below we use the notation .x ⊗ y for functions .x, y ∈ L2 [0, 1] to denote the
function .(t, s) |→ x(t)y(s) ∈ L2 [0, 1]2 . Often the key observation is that variables
of the form .Xi ⊗ Xi , for instance as might appear in the typical estimator for the
covariance kernel of a functional time series, belong to a separable Hilbert space
when .Xi ∈ L2 [0, 1], and are also .Lν/2 decomposable when .Xi is .Lν -decomposable.
464 8 Functional Data
We consider the following simple single change point model for changes in the
second order properties of a functional data sequence:
⎧
μ(t) + Ei (t), if 1 ≤ i ≤ k ∗
Xi (t) =
. (8.3.1)
μ(t) + Ei,A (t), if k ∗ + 1 ≤ i ≤ N,
where .Eεi (t) = Eεi,A (t) = 0. The series .Xi so defined so that it has a constant
mean function .μ, but the structure of the innovations may change at the point .k ∗ . In
order for allow for general serial dependence among the innovations, we assume that
the innovations before and after the change are each .Lν -decomposable Bernoulli
shifts:
Assumption 8.3.1 .Ei (t) = g(ηi , ηi−1 , . . .)(t) and .Ei,A (t) = gN (ηi , ηi−1 , . . .)(t)
for some (deterministic) measurable functions .g, gN : S∞ → L2 , where .{ηj , j ∈
Z} are independent and identically distributed random variables with values in a
measurable space .S, and .Ei (t) = Ei (t, ω) is jointly measurable in .(t, ω), for each
.i ∈ Z. Further .EEi (t) = EEi,A (t) = 0 for all .t ∈ [0, 1], .E||Ei || , E||Ei,A || < ∞
ν ν
2 2
with some .ν > 4, and with some .a > 0,
( ∗
)1/ν ( )1/ν
. E||Ei − Ei,m ||ν2 ≤ am−α , E||Ei,A − Ei,A,m
∗
||ν2
≤ cm−α with some α > 2, (8.3.2)
Let
denote the covariance kernel of the observations before a potential change point .k ∗ ,
and let
denote the covariance function after the change. We use the notation .CΔ (t, s) =
C(t, s) − CA (t, s) to denote the difference between the covariance kernels before
and after the change point. Testing for changes in the covariance function can be
framed as a hypothesis test of
H0 : ||CΔ || = 0,
. (8.3.3)
versus
HA : ||CΔ || > 0.
. (8.3.4)
8.3 Change in the Covariance of Functional Observations 465
||K||2 < 1,
. (8.3.6)
and
with
If, in addition to (8.3.5), (8.3.6), and (8.3.7), we assume .aN > 0 is sufficiently small
so that
Let
⎛ ⎛ | |
n
K(l) (x1 , xn+1 ) =
. ··· K(x1 , x2 )K(x2 , x3 ) . . . K(xn , xn+1 ) dxi
i=2
Defining analogous quantities with .KN in place of .K, the stationary solutions
to (8.3.5) and (8.3.8) may be written as
∞
E
Ei (t) =
. K(l) [ηi ](t),
l=0
and
∞
E (l)
.Ei,A (t) = KN [ηi ](t).
l=0
For more details see Bosq (2000) and Horváth and Kokoszka (2012). Noting
(l)
that both the norms of the operators .K(l) and .KN decay geometrically in .l,
Assumption 8.3.1 holds in this case. Let
l ⎛
E ⎛ k−1
| |
L (x1 , xn+1 ) =
.
(l)
··· K(xi , xi+1 )k(xk , xk+1 )
k=1 i=1
| |
l | |
l
× K(xj , xj +1 ) dxm ,
j =k+1 m=2
| |
with . i∈∅ = 1. The corresponding operator is
⎛
L(l) [f ](t) =
. L(l) (t, s)f (s)ds, l ≥ 1 and L(0) [f ](t) = f (t).
with
∞
E
.δi (t) = L(l) [ηi−l ](t)
l=0
and
The results for FAR(1) processes can be extended to FAR.(p) processes, as well
as general, linear processes.
Example 8.3.2 Example 8.3.1 can be extended to linear processes defined as
∞
E
.Ei (t) = Ll [ηi−l ](t)
l=1
and
∞
E
. Ei,A (t) = LN,l [ηi−l ](t),
l=1
where .Ll [f ](t) and .LN,l [f ](t) are the integral operators associated with the
functions .Ll (t, s) and .LN,l (t, s). It is assumed that
∞
E
. ||Ll ||2 < ∞.
l=1
and
∞
E
. ||zl ||2 < ∞
l=1
∞
E
. lim sup ||GN,l ||2 < ∞
N →∞ l=0
where
∞ ⎛ ⎛
E
Ā
. l (t, s) = Ll (t, u)(EE0 (u)E0 (v))Ll (v, s)dudv,
l=0
∗
and Ā
. l (s, t) = Āl (t, s) denotes the adjoint kernel.
Example 8.3.3 Following Aue et al. (2017), Cerovecki et al. (2019), and Küchnert
(2020), we define the functional GARCH(1,1) (FGARCH(1,1)) process
and the non-negative parameter functions .ω, .α and .β satisfy the regularity condi-
tions of Theorem 1 of Aue et al. (2017), which imply that a stationary solution .Ei
satisfying (8.3.9) exists in the function space .C[0, 1] of continuous functions defined
on the unit interval. One of these conditions in particular is that .inf0≤t≤1 ω(t) > 0.
A change in the variance of the process may be modelled by changes in these
parameter functions. For example, a “level shift” in the pointwise variance of the
functional observations is induced as in (8.3.1) by setting .Ei,A (t) = σi,A (t)ηi (t),
with
⎛ ⎛
.σi,A (t) = ω(t) + aN c(t) + (s)ds + β(t, s)σi−1
2 2 2
α(t, s)Ei−1 (t, s)ds,
and the function .c is taken to satisfy .inf0≤t≤1 c(t) > 0. Since the stationary solution
σ0 of (8.3.9) is independent of .η0 , we get
.
( )
CΔ (t, s) = Eσi,A (t)σi,A (s) − Eσi (t)σi (s) Eη0 (t)η0 (s),
.
Therefore a change of magnitude .aN to the level of the conditional variance process
induces a change of the same magnitude in the covariance functions. A similar
change arises when any of the other parameter functions are changed, and it may
be shown as above that if
⎛ ⎛
.σA,i (t) = ω+
2 2 2
(α(t, s)+aN δ1 (t, s))Ei−1 (s)ds+ (β(t, s)+aN δ2 (t, s))σi−1 (s)ds,
8.3 Change in the Covariance of Functional Observations 469
In order to test .H0 versus .HA , we consider CUSUM processes of partial sample
estimates of the covariance kernels: for .u, t, s ∈ [0, 1] we let
⎛ ⎣N
E u⎦
ZN (u, t, s) = N −1/2
. (Xi (t) − X̄N (t))(Xi (s) − X̄N (s))
i=1
⎞
⎣Nu⎦ E
N
− (Xi (t) − X̄N (t))(Xi (s) − X̄N (s)) ,
N
i=1
where .X̄N (t) is the sample mean. The asymptotic long-run covariance function of
ZN (u, t, s) contains the term
.
∞
E
D(t, t ' , s, s ' ) =
. cov(E0 (t)E0 (s), El (t ' )El (s)), t, t ' , s, s ' ∈ [0, 1].
l=−∞
(8.3.10)
The following result describes the asymptotic properties of .ZN :
Theorem 8.3.1 If Assumption 8.3.1 holds with .gN = g, which implies that .H0
holds, then we can define a sequence of Gaussian processes .{┌N (u, t, s), 0 ≤
u, t, s ≤ 1} such that
⎛⎛
. sup (ZN (u, t, s) − ┌N (u, t, s))2 dtds = oP (1)
0<u<1
with .E┌N (u, t, s) = 0 and .E┌N (u, t, s)┌N (u' , t ' , s ' ) = (min(u, u' ) −
uu' )D(t, t ' , s, s ' ).
The proof is similar to that of Theorem 8.1.1. The following result is a “covariance
analog” of Theorem 8.1.2.
Theorem 8.3.2 If Assumption 8.3.1 holds with .gN = g, which implies that .H0
holds, and .0 ≤ κ < 1/2 hold, then with the Gaussian processes .{┌N (u, t, s), 0 ≤
u, t, s ≤ 1} of Theorem 8.3.1 we have
⎛⎛
1
. sup (ZN (u, t, s) − ┌N (u, t, s))2 dtds = oP (1).
1/(N +1)<u<1−1/(N +1) [u(1 − u)]
2κ
In many practical settings, before testing for and estimating changes in the
covariance kernel, a second order property, we first test for and estimate changes
in the mean function. It is natural to attempt to re-center the data based on estimates
of the mean change. Letting .k (∗,m) denote the time of a change in the mean as in
470 8 Functional Data
model (8.1.1), which we estimate by .k̄ (m) , we would then compute the estimated
errors
⎧
Xi (t) − X̄k̄ (m) ,1 (t), if 1 ≤ i ≤ k̄ (m) ,
.Ēi (t) =
Xi (t) − X̄k̄ (m) ,2 (t), if k̄ (m) + 1 ≤ i ≤ i ≤ N,
where
(m)
1 E E
k̄ N
1
X̄k̄ (m) ,1 (t) =
. Xi (t) and X̄k̄ (m) ,2 (t) = Xi (t).
k̄ (m) N − k̄ (m)
i=1 i=k̄ (m) +1
(8.3.11)
We then modify the covariance CUSUM process to reflect the different centraliza-
tion applied:
⎛ ⎣N
E ⎞
⎣Nu⎦ E
u⎦ N
ẐN (u, t, s) = N −1/2
. Ēi (t)Ēi (s) − Ēi (t)Ēi (s) .
N
i=1 i=1
The process .ẐN has the same limiting behaviour as .ZN of Theorem 8.3.1 so long as
the change point in the mean is estimated consistently with a rate .oP (N ).
Theorem 8.3.3 If Assumption 8.3.1 is satisfied with .gN = g, and if there is at most
one change in the mean at location .k (∗,m) with estimator .k̄ (m) satisfying
then
⎛⎛ ⎛ ⎞2
. sup ẐN (u, t, s) − ┌N (u, t, s) dtds,
0<u<1
Theorem 8.3.4 We assume that .HA of (8.3.4) holds along with Assumption 8.3.1,
and
(i) If
then
⎛⎛⎛
1 P 1
.
2
ZN (u, t, s)dtdsdu → .
N (θN (1 − θN )) 2
||CΔ ||22 3
then
1
.
N(θN (1 − θN ))2−κ ||C − CA ||22
⎛⎛
1 P
× sup ZN 2
(u, t, s)dtds → 1.
0<u<1 [u(1 − u)]
2κ
under .H0 , and establish the asymptotic consistency of tests based on .TN (κ). For
example in order to estimate the null distribution of .TN = TN (0), let .{┌(u, t, s), 0 ≤
u, t, s ≤ 1} be a Gaussian process with zero mean and .E┌(u, t, s)┌(u' , t ' , s ' ) =
(min(u, u' ) − uu' )D(t, t ' , s, s ' ), where .D is defined in (8.3.10). Noticing that the
covariance function of .┌ is a product of .min(u, u' )−uu' and the covariance function
of a Brownian bridge, it may be shown that
⎛⎛⎛ ∞
E
D λl
. ┌ 2 (u, t, s)dudtds = N2 ,
(π k)2 k,l
k,l=1
where .<φi , φj > = 1{i = j }. The long-run covariance defined in (8.3.10) can also be
estimated from the sample. In order to do so, we define
where
1 E
N
z̄N (t, s) =
. zi (t, s).
N
i=1
E
N −1 ⎛ ⎞
' ' l
D̂N (t, t , s, s ) =
. K γ̂l (t, t ' , s, s ' ).
h
l=−(N −1)
If h and K satisfy Assumptions 3.1.4 and 3.1.5, one can prove following the proof
of Theorem 8.1.4 that under the null hypothesis of no change point, we have
⎛⎛⎛⎛
. (D̂N (t, t ' , s, s ' ) − D(t, t ' , s, s ' ))2 dtdt ' dsds ' = oP (1) (8.3.12)
and
⎛⎛⎛⎛
. D̂2N (t, t ' , s, s ' )dtdt ' dsds ' = OP (h2 ||CΔ ||22 ) (8.3.13)
under the alternative. Now we can estimate the eigenvalues of .D with .λ̂1 ≥ λ̂2 ≥
. . . , the empirical eigenvalues of .D̂N defined as the solutions of the eigenvalue
problem:
⎛⎛
. λ̂i φ̂i (t, s) = D̂N (t, t ' , s, s ' )φ̂i (t ' , s ' )dt ' ds ' , 1 ≤ i < N.
8.3 Change in the Covariance of Functional Observations 473
We only consider asymptotic behaviour of .k̂N when the size of the change .||CΔ ||
tends to zero with the sample size. As we have seen earlier, in this case the limit
distribution of the time of change will be the maximal argument of a Gaussian
process depending on a small number of parameters.
Under Assumption 8.3.3 the covariance function .CA is close to .C in the .L2 sense if
the sample size N is large. We also assume that the standardized difference has a
limit:
Assumption 8.3.4 there is .C∗ (t, s) ∈ L2 ([0, 1] × [0, 1]) such that
⎛⎛ ⎛ ⎞2
CΔ (t, s)
. − C∗ (t, s) dtds = o(1).
||CΔ ||2
Let
τ2
. (8.3.15)
∞
E ⎛⎛ ⎛ ⎛⎛ ⎞
= Cov E0 (t, s)C∗ (t, s)dtds, El (t, s)C∗ (t, s)dtds .
l=−∞
Theorem 8.3.5 We assume that .HA of 8.3.4 holds, and Assumptions 2.1.1
and 8.3.1–8.3.4 are satisfied.
(i) If .0 ≤ κ < 1/2, then
||CΔ ||22 ⎛ ∗
⎞ D
. k̂ N − k → ξ(κ).
τ2
(ii) If in addition
holds, then
||CΔ ||22 ⎛ ∗
⎞ D
. k̂ N − k → ξ(1/2),
τ2
where .ν is from Assumption 8.3.1 and .ξ(κ) is defined in (2.2.3).
The proof is similar to that of Theorems 2.2.1 and 8.2.1, and can be found in
Horváth et al. (2022).
In order to apply Theorem 8.3.5 to, for example, produce confidence intervals for
∗ 2
.k , we require and estimate of .τ in (8.3.15). This is the long-run variance of the
variables
⎛⎛
.ej = Ej (t)Ej (s)CΔ (t, s)dtds
CΔ (t, s) with the standardized difference between the sample covariance kernels
.
Since it is common to use the eigenvalues as well as the trace of the covariance
kernel to estimate the number of functional principal components required in order
to perform effective dimension reduction with functional data, it is also of interest
to investigate change point detection procedures for them. Consider the sequence of
(i) (i) '
1 ≥ λ2 ≥ · · · ≥ 0 of the covariance kernel of the .i th observation .C (t, s) =
.λ
(i)
(i) (i)
and <φj , φk > = 1{j = k}.
Let
d = (λ1 , . . . , λd )T ∈ Rd ,
(i) (i) (i)
.
denote the vector of the first d largest eigenvalues of .C(i) . Formally then we wish to
test
(1) (N )
.H0 : d = · · · = d
(1) (k ∗ ) (k ∗ +1) (N )
HA : d = · · · = d
. /= d = · · · = d ,
where .k ∗ = ⎣θ N ⎦, with .θ ∈ (0, 1). In order to test .H0 versus .HA , we consider
partial sample estimates of the covariance kernel given by
⎣N u⎦
1 E
Ĉu (t, s) =
. (Xi (t) − X̄(t))(Xi (s) − X̄(s)), t, s ∈ [0, 1], u ∈ [1/N, 1],
N
i=1
(8.3.17)
476 8 Functional Data
with .Ĉu = 0 for .0 ≤ u < 1/N. The estimate .Ĉu may be used to define an integral
operator
⎛
ĉu (f )(t) =
. Ĉu (t, s)f (s)ds, u, s ∈ [0, 1]. (8.3.18)
For .u ∈ [0, 1], let .λ̂j (u) denote the ordered eigenvalues of .ĉu with corresponding
orthonormal eigenfunctions .ϕ̂j,u . To consider tests based on the vector of partial
sample estimates of .d , define
⎛ ⎞ d
ˆ ⎣Nx⎦ D [0,1] 1/2 (d)
N
.
1/2
d (x) − d → Ed W (x),
N
Proof It follows from Theorem 2.1 of Aue et al. (2020) that for any fixed .δ with
0 < δ < 1,
.
⎛ ⎞ d
ˆ ⎣Nx⎦ D [δ,1] 1/2 (d)
.N 1/2
d (x) − d → Ed W (x). (8.3.20)
N
8.3 Change in the Covariance of Functional Observations 477
with .C denoting the common covariance kernel under Assumption 8.3.1. Therefore
by Theorem 8.3.1, for all .t > 0,
⎛ | | ⎞
| k ||
. lim lim sup P N 1/2 |
sup |λ̂j (k/N) − λj | ≥ t = 0.
δ→0 N →∞ 1≤k≤⎣N δ⎦ N
Combining the above, the three terms on the right-hand side of (8.3.21) can be made
arbitrarily small for all sufficiently large N and small .δ. This implies that
|| ⎛ ⎞ ||
|| 1/2 || P
sup || N ˆ d (x) − ⎣Nx⎦ d − E 1/2 W(d) (x)|| → 0,
.
|| N d N ||
0≤x≤1
E −1 ⎛ ⎞
1 E(
N
l ˆ )( )T
Êd =
. K l,υ , ˆ l,υ = ϒ̂ i − ϒ̄ ϒ̂ i+l − ϒ̄ ,
h N
l=1−N i∈Il
where .h = h(N) and .K(u) satisfy Assumptions 3.1.4 and 3.1.5, and .Il =
{1, . . . , N − l} if .l ≥ 0 and .Il = {1 − l, . . . , N } if .l < 0, and .ϒ̂ j =
(υ̂i,1 , . . . , υ̂i,d )T is the estimated score vector whose entries are given by
.υ̂i,j = <(Xi − X̄) ⊗ (Xi − X̄) − Ĉ1 , ϕ̂j,1 ⊗ ϕ̂j,1 >, (8.3.23)
while .ϒ̄ is the sample mean of the .ϒ̂ i . In order to test .H0 , we consider the maximal
quadratic form statistic
−1
ζT
N (x)Êd ζ N (x)
.Jd,N (κ) = sup ,
0≤x≤1 [x(1 − x)]2κ
8.3 Change in the Covariance of Functional Observations 479
where
⎛ ⎞
ˆ ⎣Nx⎦ ˆ
ζ N (x) = N
.
1/2
d (x) − d (1) , x ∈ [0, 1].
N
where .σ̂j2 = Êd (j, j ). The following result is a consequence of Theorem 8.3.6.
Corollary 8.3.2 If the conditions of Theorem 8.3.6 are satisfied, and .0 ≤ κ < 1/2,
D E
d Bj2 (x)
.Jd,N (κ) → sup ,
0≤x≤1 j =1 [x(1 − x)]2κ
and
D |Bj (x)|
Ij,N (κ) → sup
. ,
0≤x≤1 [x(1 − x)]κ
where .{Bj (x), 0 ≤ x ≤ 1}, .j ∈ {1, . . . , d}, are independent and identically
distributed standard Brownian bridges.
A test of asymptotic size .α for .H0 is to reject if .JN or .Ij,N exceed the .1 − α
quantile of their limit distributions in Corollary 8.3.2.
Remark 8.3.1 Due to the bias that occurs in estimating the eigenvalues of the
covariance operator near the beginning of the sample, in practice we may use instead
−1
(δ) ζT
N (x)Êd ζ N (x)
. Jd,N (κ) = sup ,
δ≤x≤1 [x(1 − x)]2κ
and
| |
| |
.I
(δ)
= sup
N 1/2 |λ̂j (x) − ⎣Nx⎦ λ̂j (1)| , j = 1, . . . , d,
j,N (κ) | |
δ≤x≤1 σ̂j [x(1 − x)]
κ N
for a user specified trimming parameter .δ. The asymptotic distributions of these
statistics coincide with those described in Corollary 8.3.2 upon replacing the domain
on which the suprema are calculated with .[δ, 1]. We have found the choice .δ = 0.1
seems to work well in practice, and we generally recommend this choice as a default.
480 8 Functional Data
When the data are stationary, the eigenvalue .λj is often used to describe the
variance of .X0 explained by the j th principal component .ϕj by comparing its
magnitude to the cummulative variance of the function .X0 measured by the trace of
the covariance operator
∞
E ⎛
. λj = C(t, t)dt = Tr(c).
j =1
A common criterion for selecting the number of principal components for subse-
quent analysis is to take the minimum d that for which the total variance explained
(TVE) by the first d principal components to exceed a user selected threshold v, that
is,
⎧ ⎫
λ1 + · · · + λd
.d = dv = min d, ≥v . (8.3.24)
Tr(c)
When performing principal component analysis for functional time series it is often
also of interest to evaluate if .Tr(c) is constant in conjunction with the constance of
the largest eigenvalues. A partial sample estimator of the trace is given by
⎣N x⎦
1 E
.TrN (x) = ||Xi − X̄||2 , x ∈ [0, 1]. (8.3.25)
N
i=1
The large-sample behavior of a centered version of the process .TrN is given next.
Theorem 8.3.7 If Assumption 8.3.1 holds with .gN = g, so that the sequence .Xi is
Lν -decomposable for some .ν > 4, then
.
D[0,1]
.N 1/2 [TrN (x) − x Tr(c)] → σT W (x),
N 1/2
MN (κ) = sup
. |TrN (x) − xTrN (1)|,
0≤x≤1 σ̂T [x(1 − x)]
κ
8.4 Heteroscedastic Errors 481
∞
E ⎛ ⎞
l 1 E
2
.σ̂T = K γ̂l , γ̂l = (ξ̂i − ξ̄ )(ξ̂i+l − ξ̄ ),
h N
l=−∞ i∈Il
D |B(x)|
.MN (κ) → sup
0≤x≤1 [x(1 − x)]
κ
Assumption 8.4.1 .ni = ⎣Nτi ⎦, 1 ≤ i ≤ M and .0 < τ1 < τ2 < . . . < τM < 1.
Let .n0 = 0, nM+1 = N, τ0 = 0 and .τM+1 = 1.
Assumption 8.4.2 .{Ei , i ∈ Z} forms a Bernoulli shift on .nl−1 < i ≤ nl , 1 ≤ l ≤
M + 1, i.e.
.Ei (t) = gl (ηi , ηi−1 , . . .)(t) for some (deterministic) measurable function .gl :
( ∗
)1/ν
. E||Ei − Ei,m ||ν2 ≤ cm−α with some α > 2, (8.4.1)
∗ = g (η , . . . , η ∗ ∗
where .Ei,l l i i−l+1 , ηi−l , ηi−l−1 , . . .) for .nl−1 < i ≤ nl , 1 ≤ l ≤
∗
M + 1, and .{ηi , i ∈ Z} are independent copies of .η0 , independent of .{ηl , l ∈ Z}.
According to Assumption 8.4.1 the sequence .{Ei (t), i ∈ Z} is not stationary but it
is stationary on the intervals .nl−1 < i ≤ nl , 1 ≤ l ≤ M + 1. Dependence though
is allowed between the errors in different intervals of stationarity. However, the
volatility changes abruptly. We use the error terms only on some intervals according
to the definition in Assumption 8.4.1. Let
It is clear that
Under Assumption 8.4.1, the series defining .Dl is absolutely convergent in .L2 . The
limit of the partial sums of the .Ei ’s will be a Gaussian process .{┌(u, t), 0 ≤ u, t ≤ 1}
with
E┌(u, t) = 0
. and E┌(u, t)┌(v, s) = D(u, v, t, s), (8.4.4)
0 ≤ u, v, t, s ≤ 1,
E
l−1
D(u, v, t, s) =
. (τj − τj −1 )Dj (t, s) + (min(u, v) − τl−1 )Dl (t, s). (8.4.5)
j =1
Theorem 8.4.1 If .H0 of (8.1.1), Assumptions 8.4.1 and 8.4.2 hold, then we can
define a sequence of Gaussian processes .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} such that
⎛ ⎛ ⎞2
. sup ZN (u, t) − ┌N
0
(u, t) dt = oP (1)
0≤u≤1
8.4 Heteroscedastic Errors 483
where the mean and covariance of the Gaussian process .{┌(u, t), 0 ≤ u, t ≤ 1} are
defined in (8.4.4).
Proof We write the partial sum of the errors as
E
k l−1 ⎛
E nj
E ⎞ E
k
. Ei (t) = Ei (t) + Ei (t), (8.4.6)
i=1 j =1 i=nj −1 +1 i=nl−1 +1
when k satisfies .nl−1 < k ≤ nl . Using Theorem A.3.1, we can define independent
Gaussian processes .{┌N,1 (u, t), 0 ≤ u ≤ τ1 , 0 ≤ t ≤ 1}, {┌N,2 (u, t), τ1 < u ≤
τ2 , 0 ≤ t ≤ 1}, . . . , {┌N,M+1 (u, t), τM < u ≤ 1, 0 ≤ t ≤ 1} such that
E┌N,l (u, t) = 0,
.
= oP (1), 1 ≤ l ≤ M + 1. (8.4.7)
We define
E
l−1
┌N (u, t) =
. ┌N,j (τj − τj −1 , t) + ┌N,l (u − τl−1 , t),
j =1
= oP (1).
484 8 Functional Data
⎣N
E ⎣Nu⎦ E
u⎦ N
ZN (u, t) = N −1/2
. Ei (t) − Ei (t),
N
i=1 i=1
The behaviour of .ZN under the alternative hypothesis of a single change in the
mean is the same up to a first order approximation as the case when the errors are
homogeneous.
Theorem 8.4.2 If .HA of (8.1.3), Assumptions 8.4.1, 8.4.2 hold and
⎛
NθN2 (1 − θN )2
. (μ0 (t) − μA (t))2 dt → ∞,
then
⎛
1 P
. ⎛ sup 2
ZN (u, t)dt → 1
N θN2 (1 − θN )2 (μ0 (t) − μA (t))2 dt 0<u<1
and
⎛⎛
1 P 1
. ⎛ 2
ZN (u, t)dtdu → .
3
NθN2 (1 − θN )2 (μ0 (t) − μA (t))2 dt
Proof The proof is the same as that of Theorem 8.1.3, noting that under
Assumption 8.4.2 the CUSUM process of the errors .ZN,E still satisfies
.sup0≤u≤1 ||ZN,E (u, ·)|| = OP (1) according to Theorem 8.4.1. ⨆
⨅
2
with
⎣NE
u⎦−1 ⎛ ⎞
∗ l
.DN (u, t, s) = K γ̂k,l (t, s).
h
l=−(⎣N u⎦−1)
The basic idea is that we estimate .D using the observations .X1 , X2 , . . . , X⎣N min(u,v)⎦ ,
but centered using .X̄N estimated from the entire sample. The the following theorem
remains true if we replace .X̄N with
1E
k
.X̄k = Xi (t)
k
i=1
in the definition of .γ̂k,l . The proof of this result is omitted, although it follows
similarly as (4.1.41).
Theorem 8.4.3 If .H0 of (8.1.2), Assumptions 3.1.4, 3.1.5, 8.4.1 and 8.4.2 with .ν >
4 hold, then
⎛⎛⎛⎛⎛ ⎞2
. D̂N (u, v, t, s) − D(u, v, t, s) dudvdtds = oP (1),
where .D(u, v, t, s), .D̂N (u, v, t, s) are defined in (8.4.5) and (8.4.9).
The consistent estimator .D̂N can be used to create a consistent test of .H0 based
on .L2 functionals of .ZN . The covariance function of .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} is
D̂0 (u, t) = D̂N (u, v, t, s) − v D̂N (u, 1, t, s) − uD̂N (1, v, t, s) + uv D̂N (1, 1, t, s).
.
As in Sect. 3.3, we noted that the distribution of the estimator for the change
point might be different if the mean and variance change at the same time; see
Theorems 3.3.6 and 3.3.7. Next we discuss the estimation of .k ∗ of (8.2.1) under
Assumptions 8.4.1 and 8.4.2. As in Theorems 3.3.6 and 3.3.7, we have different
limiting distributions depending on if the mean and the volatility changed at the
same time. For the sake of simplicity we consider the case when there is only one
change in the mean, i.e. .HA holds in model (8.1.3). The time of change .k ∗ satisfies
Assumption 2.1.1. In the data we have .M +1 stationary segments for the distribution
of the errors, and the distribution of .k̂N of (8.2.2) depends on the regime in which
the time of change in the mean occurs. If the long-run covariance function of the
functional observations in this regime is .Dj , then the normalization of the estimator
for the time of change will depend on
⎛⎛
βj2 =
. h(t)Dj (t, s)h(s)dsdt, (8.4.10)
Gaussian process
⎧
∗ βj2 W1 (−t), if t < 0
.Wj (t) = (8.4.11)
βj2+1 W2 (t), if t ≥ 0,
where .{W1 (t), t ≥ 0} and .{W2 (t), t ≥ 0} are independent Wiener processes. Let
{ }
.ξj∗ (κ) = argmaxt Wj∗ (t) − |t|mκ (t) , (8.4.12)
where .m0 (t) is defined in (2.2.3). We recall that .k̂N , the estimator for .k ∗ is defined
in (8.2.2).
Theorem 8.4.4 We assume that .HA of (8.1.3), Assumptions 2.1.1, 8.2.2, 8.4.1, and
8.4.2 are satisfied, and .0 ≤ κ < 1/2.
(i) If .τj −1 < θ < τj with some .1 ≤ j ≤ M, then
Δ2N ⎛ ∗
⎞ D
. k̂ N − k → ξ(κ), (8.4.13)
βj2
also holds, then (8.4.13) and (8.4.14) remain true when .κ = 1/2.
Proof We follow the proof of Theorem 8.2.1. Using again the decomposition
in (8.2.10), the same calculations give that we need to consider only .Qk,4 (t) and
∗
.Qk,5 (t), where .|k − k| = OP (1/Δ ).
2
N
Under the conditions of Theorem 8.4.4(i), (8.2.25) has the form
| ⎛ |
| 1 |
. sup | + − |s|m |
1−2κ 2
| N 1−2κ Q k ∗ +sβj /ΔN ,5
2 2 (t)dt 2(θ (1 θ )) βj κ (s) | (8.4.15)
|s|≤C
= o(1),
488 8 Functional Data
for all .C > 0, where .mκ (s) is defined in (2.2.1). Since Assumption 8.2.2 holds, for
any .C > 0, the errors .{Ek , k ∗ − C/Δ2N ≤ k ≤ k ∗ + C/Δ2N } are in the .j th stationary
segment, i.e. .⎣N τj −1 ⎦ < k ∗ − C/Δ2N , k ∗ + C/Δ2N < ⎣Nτj ⎦. Hence Theorem A.1.1
implies that
⎛
1 D[−C,C]
. Qk ∗ +sβ 2 /Δ2 ,4 (t)dt −→ 2(θ (1 − θ ))1−2κ βj2 W (s), (8.4.16)
N 1−2κ l N
for all .C > 0, where .{W (s), −∞ < s < ∞} is the two sided Wiener process
of (2.2.2). Now (8.4.15) and (8.4.16) imply Theorem 8.4.4(i) as (8.2.25) and (8.2.26)
imply Theorem 8.2.1.
If the conditions of the second part of Theorem 8.4.4 hold, the we replace (8.4.15)
with
| ⎛ |
|1 |
. sup | Q (t)dt + 2θ (1 − θ )|s|m (s) | = o(1). (8.4.17)
|N k ∗ +s/ΔN ,5
2 0 |
|s|≤C
where .{W ∗ (s), −∞ < s < ∞} is defined in (8.4.11). Now the arguments used in the
proof of Theorem 8.2.1 (cf. also Theorem 3.3.6) can be applied. Similar arguments
can be used when .κ = 1/2. ⨆
⨅
We can also provide an analogue of Theorem 2.2.2 for functional time series. As
in Theorem 8.4.4 the limit distribution depends on the volatility regime where the
change in the mean occurs. We define the forward and backward partial sums of the
projected errors in the .i th regime:
⎧ −1 ⎛
⎪
⎨ E
(1) − Ei,j (t)h(t)dt, if l < 0,
Sj (l) =
.
⎪
⎩ i=l
0, if l = 0
and
l ⎛
E
(2)
Sj (l) =
. Ei,j (t)h(t)dt, if l > 0,
i=1
8.4 Heteroscedastic Errors 489
where .Ei,j (t) is defined in (8.4.2). If .τj −1 < θ ≤ τj , then the limit is given in term
of
⎧ (1)
Sj (l), if l ≤ 0
.Sj (l) = (2)
Sj (l), if l > 0.
Let
⎛
. ξ̄j = argmaxl {ΔSj (l) − Δ2 |l|m0 (l) h2 (u)du}, (8.4.19)
and
⎛
ξj∗ = argmaxl {ΔSj∗ (l) − Δ2 |l|m0 (l) h2 (u)du}.
. (8.4.20)
In the following result we only consider the case where .κ = 0, and it is proven along
the lines of Theorems 8.2.1 and 8.4.4.
Theorem 8.4.5 We assume that .HA of (8.1.3), Assumptions 2.1.1, 8.2.1, 8.4.1 and
8.4.2 and are satisfied.
(i) If .τj −1 < θ < τj with some .1 ≤ j ≤ M, then
D
. k̂N − k ∗ → ξ̄j ,
D
k̂N − k ∗ → ξj∗ ,
.
Segment 1
5
Segment 2
Segment 3
0
Fig. 8.1 Upper panel: Time series plot of annual temperature profiles at Gayndah Post Office.
Bottom panel: Estimated segmentation according to the fully functional binary segmentation
method with threshold .ρN = σ̂N (log N )1/2 . The two estimated change point are in the years
1953 and 1972
8.5 Data Examples 491
Table 8.1 Summary of results of a change point analysis in the mean of minimum temperature
curves from eight Australian measuring stations. The p-value for a test of the null hypothesis that
each series has a homogenous mean based on the statistic MN was zero in each case. The column
labelled k̂N (0) reports the estimated break date using the fully functional method, CI gives the
corresponding 95% confidence interval computed from (8.2.37)
Station Range k̂N (0) (year) CI (years)
Sydney (Observatory Hill) 1959–2012 1991 (1981, 1994)
Melbourne (Regional Office) 1855–2012 1998 (1989, 2000)
Boulia Airport 1888–2012 1978 (1954, 1981)
Cape Otway Lighthouse 1864–2012 1999 (1949, 2005)
Gayndah Post Office 1893–2009 1972 (1952, 1980)
Gunnedah Pool 1876–2011 1985 (1935, 1992)
Hobart (Ellerslie Road) 1882–2011 1966 (1957, 1969)
Robe Comparison 1884–2011 1981 (1954, 1985)
which generally suggest that each series appears to have a non-homogeneous mean,
with the most prominent change point estimates tending to cluster around the 1960’s
to 1980’s.
Subsequently, we performed binary segmentation as described above to estimate
further change point in each series. The resulting segmentation for the Gayndah
station, which estimated two change points in the years 1953 and 1972, is displayed
in the bottom panel of Fig. 8.1.
Example 8.5.2 (Crude Oil Intra-Day Return Curves) We illustrate the use of the
weighted CUSUM covariance change point detector and estimator in an application
to detect changes in the covariance of intra-day return curves derived from crude oil
futures prices, as considered in Horváth et al. (2022).
We consider two benchmark assets in the international crude oil pricing system:
West Texas Intermediate (WTI) and Brent crude oil futures. The raw data that we
consider were obtained from www.backtestmarket.com, and are comprised of 5-
minute frequency front-month prices of WTI and Brent futures, from 9:00 am to
2:30 pm, each trading day from 12 May, 2018 to 30 Apr, 2020, which totals 502
days. As such there are 67 discrete observations of the price within each day, which
we linearly interpolated to produce intra-day price curves pi (t) for each asset. The
closing prices of these assets over the observation period are plotted in Fig. 8.4, and
visualizations of these price curves for WTI are shown in Fig. 8.2.
We take as a goal of this analysis to evaluate whether the variability of the
curves modelled by their covariance kernel undergoes change points during the
observation period. In order to study this series as a mean stationary functional
time series, we transform them to cumulative intra-day return curves (CIDRs) via
the transformation
Fig. 8.2 Daily price curves of WTI commodity futures obtained by linear interpolation of 5-
minute frequency front-month prices
Fig. 8.3 Daily cumulative intra-day return (CIDR) curves from WTI commodity futures
where pi (t) is the asset price on day i at intra-day time t, and pi (0) is the opening
price at 9:00 am on day i. Figure 8.3 illustrates the CIDR curves constructed from
both collections of asset price curves. We applied a series of hypothesis tests to
evaluate the stationarity, normality, and serial correlation structure of these CIDR
curves; see Horváth et al. (2014), Górecki et al. (2017), and Kokoszka et al. (2017),
respectively, for the details of these tests. The results suggested that both series of
crude oil CIDR curves evolve as approximately mean stationary, non-Gaussian,
serially uncorrelated, and conditionally heteroscedastic functional time series.
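For concreteness, here is a minimal sketch (ours) of the CIDR construction, assuming the 5-minute prices are held in an array with one row per day; the simulated prices below merely stand in for the actual data, which are not reproduced here.

```python
import numpy as np

def cidr_curves(prices):
    """Convert an (N, 67) array of 5-minute intra-day prices into CIDR
    curves R_i(t) = 100 * (log p_i(t) - log p_i(0))."""
    logp = np.log(prices)
    return 100.0 * (logp - logp[:, :1])

# toy usage with simulated positive prices standing in for the real data
rng = np.random.default_rng(1)
prices = 60.0 * np.exp(np.cumsum(0.001 * rng.normal(size=(5, 67)), axis=1))
R = cidr_curves(prices)
print(R.shape, R[:, 0])   # each CIDR curve starts at 0
```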
Fig. 8.4 Daily closing prices of the WTI and Brent commodity futures, with the estimated breaks
in the covariance operator in the corresponding WTI and Brent CIDR curves
Fig. 8.5 Top four panels: Wireframe plots of the SST anomaly data from four months in 1970
(horizontal axes: longitude and latitude; vertical axis: anomaly in degrees C). The entire data set
is comprised of a time series of 396 such surfaces
et al. (2020). The data here are available at a monthly resolution starting in January
of 1970 and ending in December of 2003 (n = 396), and consist of 2 degree latitude
by 2 degree longitude spatial measurements of the monthly average sea surface
temperature in the south Pacific, 2261 spatial observations in total, that are adjusted
by removing the corresponding monthly means over the past 12 years in order to
make anomalous months and trends more apparent. Wireframe plots of the surfaces
representing the first four months of data are shown in Fig. 8.5.
SST anomaly data are used primarily in identifying strong climate trends along
with variations such as the El Niño and La Niña phenomena. One of the main
techniques for identifying these trends and variations in the literature is to employ
an empirical orthogonal function (EOF) analysis. EOF analysis effectively amounts
to conducting PCA, or more generally functional PCA, on the surface valued data
encoding the temperature evolution over time; see Chapter 2.4 in Wikle et al. (2019)
for a discussion of the connections between EOF analysis and PCA.
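A generic illustration (ours) of this connection: with the monthly fields flattened into a data matrix, the EOFs are the right singular vectors of the centered matrix and the PC scores are the corresponding temporal amplitudes. The array shapes mirror the SST example, but this is not the analysis code used in this section.

```python
import numpy as np

def eof_analysis(X, n_modes=3):
    """EOF/PCA of anomaly surfaces: X is (n, p) with n monthly fields,
    each flattened to p spatial pixels. Returns leading eigenvalues,
    EOFs (spatial patterns), and PC scores (temporal amplitudes)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data matrix; EOFs are right singular vectors
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)       # sample covariance eigenvalues
    eofs = Vt[:n_modes]                       # spatial patterns
    scores = U[:, :n_modes] * s[:n_modes]     # PC time series
    return eigvals[:n_modes], eofs, scores

# toy usage: 396 "months" of 2261 "pixels"
rng = np.random.default_rng(2)
X = rng.normal(size=(396, 2261))
lam, eofs, pcs = eof_analysis(X)
print(lam, eofs.shape, pcs.shape)
```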
Table 8.2 Change point estimates along with p-values for tests of stability of the largest
eigenvalue, the first three eigenvalues, and the trace of the covariance operator
Data set | CP $\lambda_1$ (p-value) | CP $(\lambda_1, \lambda_2, \lambda_3)^T$ (p-value) | CP Trace (p-value)
Raw SST data | Apr-76 (0.005) | May-97 (0.000) | May-97 (0.011)
El Niño adjusted | Apr-76 (0.007) | May-97 (0.000) | May-97 (0.010)
El Niño adjusted/detrended | Apr-76 (0.017) | Nov-83 (0.055) | May-97 (0.156)
Bolded p-values are less than 0.05
We take as the goal of this analysis to determine whether the variation explained by the
leading functional principal components of the SST anomaly surfaces is plausibly
homogeneous throughout the sample, and if not, to better pinpoint the sources
of the changes in variation within the series. In this case we envision the data set as
being comprised of a time series of 396 surfaces in $L^2([0,1]^2)$. It is straightforward
to adjust all of the formulae above to this setup.
First we applied our tests for changes in $\lambda_1$, jointly for $(\lambda_1, \ldots, \lambda_3)^T$, and for
the trace of the covariance operator, using the statistics $I_N^{(0.1)}(0)$, $J_{3,N}^{(0.1)}(0)$, and
$M_N(0)$, to the raw SST surfaces. The results are summarized in Table 8.2, which also
contains the results of subsequent tests. These showed that the level of variability
measured by the leading eigenvalues and trace appears to change considerably over
the observation period. As seen in the bottom panel of Fig. 8.6, high peaks in the
CUSUM function for the largest eigenvalue process are observed at the dates of
April 1976 and May 1997. These coincide with well-known strong El Niño events.

[Fig. 8.6: CUSUM processes computed from the raw, El Niño adjusted, and El Niño adjusted/detrended SST series.]

The total variation explained by the first three eigenvalues calculated from the
covariance operator prior to April 1976 is 72%, while for the data past April 1976
it is 54%; hence one may expect, based on this analysis, to need more principal
components to accurately represent the data past April 1976 than prior to this date.
In order to adjust for the El Niño effect, Lawrence et al. (2004) suggest using the
first principal component (PC) series as a proxy for the El Niño variations, which
may then be removed by calculating the residuals of a simple linear regression of
the SST time series at each spatial point on to the leading PC series. Specifically,
let $x_{r,v,i}$ denote the raw, or pixel, SST data at each of the 2261 spatial locations,
and $X_i(t,s)$, $i \in \{1,\ldots,396\}$, denote the corresponding SST anomaly surfaces,
$t, s \in [0,1]$. Then with $\hat\phi_1$ denoting the eigenfunction corresponding to the largest
eigenvalue of the sample covariance operator of the $X_i$'s, the leading PC series is
given by

$$ P_i = \int_{[0,1]^2} \big[X_i(t,s) - \bar X(t,s)\big]\,\hat\phi_1(t,s)\,dt\,ds. $$

The adjustment is carried out by fitting, for each pixel, the simple linear regression

$$ x_{r,v,i} = \beta_{r,v}^{(1)} P_i + \beta_{r,v}^{(0)} + \varepsilon_{r,v,i}, $$

and subsequently considering the residuals $x'_{r,v,i} = x_{r,v,i} - (\hat\beta_{r,v}^{(1)} P_i + \hat\beta_{r,v}^{(0)})$, where
$\beta_{r,v}^{(1)}$ and $\beta_{r,v}^{(0)}$ are estimated via least squares for each pixel. The resulting El Niño
adjusted surfaces are denoted $X_i'$. We also applied our tests to these surfaces, which
suggested that they still contain sizable fluctuations in the eigenvalues and overall
variance as measured by the trace of the covariance operator, with apparent and
prominent changes at approximately the same dates.
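A sketch (ours) of this adjustment, assuming the anomaly fields are flattened into an $(n, p)$ pixel matrix; all function and variable names are ours.

```python
import numpy as np

def el_nino_adjust(X):
    """Remove the leading-PC (El Nino proxy) signal pixel by pixel.
    X: (n, p) anomaly fields. Returns the residual fields X'."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = U[:, 0] * s[0]                       # leading PC series P_i
    Pc = P - P.mean()
    # least-squares slope and intercept of each pixel series on P
    beta1 = Xc.T @ Pc / (Pc @ Pc)            # (p,) slopes
    beta0 = X.mean(axis=0) - beta1 * P.mean()
    return X - (np.outer(P, beta1) + beta0)  # residuals x'_{r,v,i}

# toy usage: fields sharing a common sinusoidal "El Nino-like" signal
rng = np.random.default_rng(3)
common = np.outer(np.sin(np.arange(396) / 20), rng.normal(size=2261))
X = rng.normal(size=(396, 2261)) + common
print(el_nino_adjust(X).shape)
```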
In order to remove remaining variation that was not captured by the anomaly
and El Niño adjustments, Good et al. (2007) suggest removing further residual
variations by fitting trigonometric series at high frequencies to the pixel level data.
We ran our tests one last time on the curves $X_i'$ after detrending them using a moving
average with a relatively small window of 3 years to remove such variation. The
resulting time series length was reduced to 360 after truncating 1.5 years from
each end of the series, where the curves could not be centered in this way. Our tests
show that this detrending evidently achieves some stability in the variability of the
series, with all tests for stability of the eigenvalues and trace yielding p-values
larger than 0.05. We note that using larger moving average windows tended to
decrease these p-values, suggesting that the remaining fluctuations after removing
the El Niño effect are comparatively high-frequency.
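A sketch (ours) of the centered moving-average detrending with a 36-month window; end-point conventions differ slightly, so the convolution below retains 361 months whereas the truncation described above retains 360.

```python
import numpy as np

def ma_detrend(X, window=36):
    """Subtract a centered moving average (window in months) from each
    pixel series, dropping the months near the ends that cannot be
    centered; with window=36 (3 years), 396 months reduce to 361 here."""
    kernel = np.ones(window) / window
    half = window // 2
    trend = np.apply_along_axis(
        lambda z: np.convolve(z, kernel, mode="valid"), 0, X)
    # 'valid' yields n - window + 1 rows; align to the series centre
    return X[half : half + trend.shape[0]] - trend

rng = np.random.default_rng(4)
X = rng.normal(size=(396, 10)) + np.linspace(0, 2, 396)[:, None]
print(ma_detrend(X).shape)   # (361, 10); the book's convention keeps 360
```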
8.6 Exercises
Exercise 8.6.1 Verify that the Gaussian process $\{\Gamma_N^0(u,t),\ 0\le u,t\le 1\}$
defined in the proof of Theorem 8.1.1 satisfies $E\Gamma_N^0(u,t)\Gamma_N^0(v,s) = (\min(u,v) - uv)\,D(t,s)$, $0\le u,v,t,s\le 1$.
Exercise 8.6.2 Verify that the processes $\Gamma$ and $O$ defined in (8.1.13) have the same
distribution by verifying that they are each Gaussian processes with the same mean
and covariance functions.
Exercise 8.6.3 Let $\{X_i(t), i\in\mathbb{Z}, t\in[0,1]\}$ be a stationary sequence of
stochastic processes with $EX_0(t) = 0$ for all $t\in[0,1]$, and $E\|X_0\|^2 < \infty$. Define
$\gamma_1(t,s) = EX_0(t)X_0(s)$, $U(t,s) = \int_0^1 \gamma_1(t,u)\gamma_1(u,s)\,du$, and $u: L^2[0,1]\to L^2[0,1]$ by $u(f)(t) = \int_0^1 U(t,s)f(s)\,ds$. Show that $U$ defines a positive definite,
symmetric kernel integral operator on $L^2[0,1]$ by showing that (i) for all $v\in L^2[0,1]$, $\langle u(v), v\rangle \ge 0$, and (ii) for all $v,w\in L^2[0,1]$, $\langle u(v), w\rangle = \langle v, u(w)\rangle$.
Exercise 8.6.4 Assume $\{\varepsilon_i(t), i\in\mathbb{Z}, t\in[0,1]\}$ satisfies Definition 8.1.1 with
some $\nu\ge 2$. Show that $\{\|\varepsilon_i\|^2, i\in\mathbb{Z}\}$ satisfies the functional central limit theorem,
i.e. there exist constants $\mu$ and $\sigma$ so that

$$ \frac{1}{N^{1/2}}\sum_{i=1}^{\lfloor Nx\rfloor}\big\{\|\varepsilon_i\|^2 - \mu\big\} \;\xrightarrow{D[0,1]}\; \sigma W(x), $$
$$ \frac{1}{\hat\lambda_{N,1}^{1/2}}\,\sup_{0<u<1}\frac{|\hat\xi_{N,1}(u)|}{[u(1-u)]^{\kappa}} \;\xrightarrow{D}\; \sup_{0<u<1}\frac{|B(u)|}{[u(1-u)]^{\kappa}}, $$
Exercise 8.6.8 For $1\le l < u\le N$, show that the generalized CUSUM process
$Z_k^{l,u}(t)$ is equivalent to the weighted CUSUM process

$$ \frac{Z_N(k/N, t)}{\big[k/N\,(1-k/N)\big]^{1/2}} $$
projections. Padilla et al. (2022) use a kernel CUSUM approach along with seeded
binary segmentation to detect and estimate multiple changes for both sparse and
dense functional data. A self-normalized approach to conduct change point analysis
for functional time series based on projections was put forward in Zhang et al.
(2011). A Bayesian method is developed in Li and Ghosal (2021).
A different framework, in which one tests for “relevant” changes in the mean function,
namely those whose norm exceeds a user-specified threshold, is developed
in Dette et al. (2020b). The problem of conducting change point analysis for
the covariance function or operator describing the second order behaviour of
functional data has been comparatively less explored. Jarušková (2013) is the first
to consider the change point detection problem for the covariance operator of
independent functional data, and their approach is based on an initial dimension
reduction step using functional principal component analysis. Stoehr et al. (2021)
generalize change point detection methods for the mean and covariance using
several dimension reduction based approaches, and the test statistics that we
discussed for this problem appear in Jiao et al. (2023), Horváth et al. (2022),
and Sharipov and Wendler (2020). Aue et al. (2020) and Dette and Kutta (2021)
consider change point inference under similar weak dependence conditions for
the spectrum and eigenfunctions, respectively, of covariance operators, the latter
reference considering also the relevant testing framework.
Appendix A
Central limit theory has been among the main focal points of probability research
since its inception. The functional version of the central limit theorem, which
generally refers to the central limit theorem for the partial sum process, appeared
in the fundamental work of Donsker (1951, 1952). Billingsley (1968) remains an
excellent entry point to the study of weak convergence of empirical processes, as
does Vaart and Wellner (1996). Let $\epsilon_1, \epsilon_2, \ldots$ be a sequence of random variables
and define the partial sums process

$$ S_N(t) = \frac{1}{N^{1/2}}\sum_{i=1}^{\lfloor Nt\rfloor}\epsilon_i, \qquad t\in[0,1]. \tag{A.1.1} $$
$\{S_N(t), 0\le t\le 1\}$ converges weakly in the metric space $D[0,1]$, namely the
space of real-valued functions on $[0,1]$ that are right-continuous and have left-hand
limits, endowed with the Skorokhod metric. Using the Skorokhod–Dudley–Wichura
representation theorem, we can reformulate (A.1.2) in the following way: by
enlarging the probability space on which $S_N$ is defined if necessary, there exist
Wiener processes $\{W_N(t), t\in[0,1]\}$, all defined on the common probability space
See Skorokhod (1956), Dudley (1999), and Wichura (1970). The potential
enlargement of the probability space required to obtain approximations as in (A.1.3)
is due to the fact that the sample space on which $S_N$ is defined originally may
not be rich enough to support a Brownian motion. In subsequent results of this
type below we assume such enlargements have already been made if needed. We
generally state results on the weak convergence of empirical processes as in (A.1.2)
as approximations with copies of the limiting process as in (A.1.3).
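As a concrete illustration (our own sketch, not part of the original text), the following snippet simulates the partial sum process $S_N$ of (A.1.1) for centered iid variables; distributional functionals such as $\sup_t |S_N(t)|$ then approximate the corresponding functionals of the Wiener limit.

```python
import numpy as np

def partial_sum_path(eps):
    """S_N(t) of (A.1.1) evaluated on the grid t = k/N, k = 0..N."""
    return np.concatenate(([0.0], np.cumsum(eps))) / np.sqrt(len(eps))

# Monte Carlo distribution of sup_t |S_N(t)| for centered exponentials;
# for large N this approximates the law of sup over [0,1] of |W(t)|
rng = np.random.default_rng(5)
N, reps = 1000, 2000
sups = [np.abs(partial_sum_path(rng.exponential(size=N) - 1.0)).max()
        for _ in range(reps)]
print(np.quantile(sups, [0.50, 0.90, 0.95]))
```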
Starting with the seminal work of Darling and Erdős (1956), convergence rates
of the weak convergence of partial sums became an active area of research, owing
to their many and varied applications in probability and statistics. As weighted
approximations are useful in justifying many asymptotic results in change point
analysis, we state the results we use in that form. Csörgő and
Horváth (1993, 1997) contain the theory and several applications of weighted
approximations of partial sums, as well as of standard empirical and quantile
processes. Motivated by these goals and (A.1.3), we wish to obtain approximations
of the following form: for each $N$, there exist Wiener processes $\{W_{N,1}(t), 0\le t\le N/2\}$ and $\{W_{N,2}(t), 0\le t\le N/2\}$ such that
$$ \max_{1\le k\le N/2}\frac{1}{k^{\zeta}}\bigg|\sum_{i=1}^{k}\epsilon_i - W_{N,1}(k)\bigg| = O_P(1), \tag{A.1.4} $$

$$ \max_{N/2<k<N}\frac{1}{(N-k)^{\zeta}}\bigg|\sum_{i=k+1}^{N}\epsilon_i - W_{N,2}(N-k)\bigg| = O_P(1), \tag{A.1.5} $$
and

$$ S(k) = \epsilon_1 + \epsilon_2 + \cdots + \epsilon_k = W(\tau_1 + \tau_2 + \cdots + \tau_k), \qquad k\in\mathbb{N}. \tag{A.1.8} $$
Although it does not lead to the best possible rate in (A.1.4), we assume that $\nu < 4$. Using the Marcinkiewicz–Zygmund inequality (Marcinkiewicz and Zygmund,
1937) it may be shown that

$$ \bigg|\sum_{i=1}^{k}\tau_i - k\bigg| = o\big(k^{2/\nu}\big) \quad\text{a.s.} $$

Hence by the modulus of continuity of $W$ (see Csörgő and Révész, 1979) we get
that
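The conclusion that follows these two displays is omitted in the text above; it can be sketched as follows (our reconstruction of the standard argument, under the stated assumption $\nu < 4$). Writing $d_k = \sum_{i=1}^{k}\tau_i - k = o(k^{2/\nu})$ a.s., the modulus of continuity of $W$ gives

$$ |S(k) - W(k)| = \big|W(\tau_1+\cdots+\tau_k) - W(k)\big| \le \sup_{|s|\le |d_k|}\big|W(k+s) - W(k)\big| = O\big((|d_k|\log k)^{1/2}\big) = o\big(k^{1/\nu}(\log k)^{1/2}\big) \quad\text{a.s.}, $$

which yields a weighted approximation of the form (A.1.4) for any $\zeta > 1/\nu$.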
satisfies the moment and weak dependence conditions $E\epsilon_i = 0$, $E|\epsilon_i|^{\nu} < \infty$, and

$$ v_m = \big(E\big|\epsilon_i - \epsilon_{i,m}^{*}\big|^{\nu}\big)^{1/\nu} \le a\,m^{-\alpha} \quad\text{with some } a > 0 \text{ and } \alpha > 2, $$

where $\epsilon_{i,m}^{*} = g(\eta_i,\ldots,\eta_{i-m+1},\eta_{i-m}^{*},\eta_{i-m-1}^{*},\ldots)$, and $\{\eta_k^{*}, k\in\mathbb{Z}\}$ are
independent, identically distributed copies of $\eta_0$, independent of $\{\eta_j, j\in\mathbb{Z}\}$.
For brevity we simply say that a sequence satisfying Definition A.1.1 is $L^{\nu}$-decomposable. Note that all Bernoulli shift sequences are strictly stationary
and ergodic; see Breiman (1968), Proposition 6.6. Part (2) of Definition A.1.1
describes how well the sequence $\{\epsilon_i, i\in\mathbb{Z}\}$ can be approximated by
a sequence exhibiting a finite range of dependence. Notice with $\epsilon_{i,m}' = g(\eta_i,\ldots,\eta_{i-m+1},\eta_{i-m}^{*,i},\eta_{i-m-1}^{*,i},\ldots)$, where $\{\eta_k^{*,i}, k\in\mathbb{Z}\}$ are independent,
identically distributed copies of $\eta_0$, independent of $\{\eta_j, j\in\mathbb{Z}\}$, and independent
for each $i$, that $(\epsilon_i, \epsilon_{i,m}^{*}) \stackrel{D}{=} (\epsilon_i, \epsilon_{i,m}')$, so that under Definition A.1.1,

$$ \big(E\big|\epsilon_i - \epsilon_{i,m}'\big|^{\nu}\big)^{1/\nu} \le c\,m^{-\alpha}, $$

and the sequence $\{\epsilon_{i,m}', i\in\mathbb{Z}\}$ is $m$-dependent.
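The coupling in Definition A.1.1 can be illustrated numerically; the sketch below (ours) uses a causal moving average with geometrically decaying weights, so the $\nu$-norm of $\epsilon_i - \epsilon_{i,m}'$ decays like $0.5^m$, faster than any polynomial rate $m^{-\alpha}$.

```python
import numpy as np

def shift(eta):
    """A Bernoulli-shift functional g: the causal moving average
    epsilon_i = sum_{l >= 0} 0.5**l * eta_{i-l}, truncated at L lags."""
    weights = 0.5 ** np.arange(eta.shape[-1])
    return eta @ weights

rng = np.random.default_rng(6)
nu, L, reps = 3.0, 60, 20000
for m in (2, 5, 10, 20):
    eta = rng.normal(size=(reps, L))        # eta_i, eta_{i-1}, ..., per row
    eta_cpl = eta.copy()
    eta_cpl[:, m:] = rng.normal(size=(reps, L - m))  # independent far past
    vm = np.mean(np.abs(shift(eta) - shift(eta_cpl)) ** nu) ** (1 / nu)
    print(m, vm)                            # decays geometrically, ~ 0.5**m
```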
Approximations of the partial sum process as in (A.1.4)–(A.1.7) can also be
established for .Lν -decomposable processes.
Theorem A.1.1 If $\{\epsilon_i, i\in\mathbb{Z}\}$ is $L^{\nu}$-decomposable, then for each $N$ we can define
Wiener processes $\{W_{N,1}(t), 0\le t\le N/2\}$ and $\{W_{N,2}(t), 0\le t\le N/2\}$ such that

$$ \max_{1\le k\le N/2}\frac{1}{k^{\zeta}}\bigg|\sum_{i=1}^{k}\epsilon_i - \sigma W_{N,1}(k)\bigg| = O_P(1), \tag{A.1.9} $$

$$ \max_{N/2<k<N}\frac{1}{(N-k)^{\zeta}}\bigg|\sum_{i=k+1}^{N}\epsilon_i - \sigma W_{N,2}(N-k)\bigg| = O_P(1), \tag{A.1.10} $$

and

$$ \zeta < 1/2, \tag{A.1.12} $$

where

$$ \lim_{N\to\infty}\frac{1}{N}\,E\bigg(\sum_{i=1}^{N}\epsilon_i\bigg)^{2} = \sigma^{2}. \tag{A.1.13} $$
A proof of this result is given in Aue et al. (2014), which we outline for
multivariate $L^{\nu}$-decomposable processes in Theorem A.1.3 below.
Theorem A.1.1 does not provide any information on what value of $\zeta$ can be
taken in (A.1.12). A careful study of the proof of Theorem A.1.3 provides an
upper bound for $\zeta$, but it is far from optimal. The best possible value of $\zeta$ is
attained by the Komlós–Major–Tusnády approximation in the case of independent and
identically distributed random variables, and their results have been extended to
$L^{\nu}$-decomposable processes by Berkes et al. (2014). Although their result is more
Example A.1.1 (Linear Time Series) $\{\epsilon_i, i\in\mathbb{Z}\}$ is said to follow a linear process
if

$$ \epsilon_i = \sum_{l=0}^{\infty} c_l\,\eta_{i-l}, \tag{A.1.14} $$

$$ |c_l| \le c\,l^{-\alpha-1}, \qquad 1\le l < \infty, $$
$$ \epsilon_i = \eta_i h_i \qquad\text{and}\qquad h_i^2 = \omega + \beta_1\epsilon_{i-1}^2 + \beta_2 h_{i-1}^2, \qquad i\in\mathbb{Z}, \tag{A.1.15} $$
The unique strictly stationary solution of (A.1.15) can be written as

$$ h_i^2 = \omega\bigg(1 + \sum_{j=1}^{\infty}\prod_{k=1}^{j}\big(\beta_1\eta_{i-k}^2 + \beta_2\big)\bigg), \tag{A.1.16} $$

with the convention $\prod_{\emptyset} = 1$. If $E\log(\eta_0^2 + 1) < \infty$, then the infinite sum in (A.1.16)
is almost surely finite if and only if $E\log(\beta_1\eta_0^2 + \beta_2) < 0$ (see Berkes et al., 2003
and Francq and Zakoian, 2010). Using the triangle inequality and the independence
of the $\eta_l$'s we get that

$$ \Bigg(E\bigg(\sum_{j=l+1}^{\infty}\prod_{k=1}^{j}\big(\beta_1\eta_{i-k}^2 + \beta_2\big)\bigg)^{\!\nu}\Bigg)^{1/\nu} \le \sum_{j=l+1}^{\infty}\bigg(E\prod_{k=1}^{j}\big(\beta_1\eta_{i-k}^2 + \beta_2\big)^{\nu}\bigg)^{1/\nu} = \sum_{j=l+1}^{\infty}\rho^{j} \tag{A.1.17} $$

with $\rho = \big(E(\beta_1\eta_0^2 + \beta_2)^{\nu}\big)^{1/\nu}$. Hence part (2) of Definition A.1.1 holds for $\{\epsilon_i, i\in\mathbb{Z}\}$ with any $\alpha > 0$, if $E(\beta_1\eta_0^2 + \beta_2)^{\nu} < 1$.
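The moment condition $E(\beta_1\eta_0^2 + \beta_2)^{\nu} < 1$ is easy to check by Monte Carlo for a given innovation law; the following sketch (ours, with standard normal innovations assumed) estimates $\rho$.

```python
import numpy as np

def rho(beta1, beta2, nu, reps=200000, seed=7):
    """Monte Carlo estimate of rho = (E(beta1*eta^2 + beta2)^nu)^(1/nu)
    for standard normal innovations; part (2) of Definition A.1.1 holds
    for the GARCH(1,1) shift whenever rho < 1."""
    rng = np.random.default_rng(seed)
    eta2 = rng.normal(size=reps) ** 2
    return np.mean((beta1 * eta2 + beta2) ** nu) ** (1 / nu)

print(rho(0.10, 0.80, nu=3.0))   # comfortably below 1
print(rho(0.30, 0.69, nu=3.0))   # E(beta1*eta^2 + beta2) < 1 here, yet the
                                 # nu-th moment condition still fails (> 1)
```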
Example A.1.3 (Augmented GARCH Process) Duan (1997) replaced (A.1.15)
with

$$ \epsilon_i = \eta_i h_i \qquad\text{and}\qquad g(h_i) = a(\eta_{i-1})\,g(h_{i-1}) + b(\eta_{i-1}), \qquad i\in\mathbb{Z}, \tag{A.1.18} $$

where $a(x), b(x), g(x)$ are measurable functions and $g(x)$ has a unique inverse. The
variables $\{\eta_l, l\in\mathbb{Z}\}$ are independent and identically distributed random variables.
The unique stationary causal solution of (A.1.18) is

$$ g(h_i) = \sum_{l=1}^{\infty} b(\eta_{i-l})\prod_{k=1}^{l-1} a(\eta_{i-k}) $$

($\prod_{\emptyset} = 1$). Similarly to Example A.1.2,
$$ \Bigg(E\bigg|\sum_{j=l+1}^{\infty} b(\eta_{i-j})\prod_{k=1}^{j-1} a(\eta_{i-k})\bigg|^{\nu}\Bigg)^{1/\nu} \le \sum_{j=l+1}^{\infty}\bigg(E\bigg|b(\eta_{i-j})\prod_{k=1}^{j-1} a(\eta_{i-k})\bigg|^{\nu}\bigg)^{1/\nu} = \big(E|b(\eta_0)|^{\nu}\big)^{1/\nu}\sum_{j=l+1}^{\infty}\rho^{j-1} $$

with $\rho = \big(E|a(\eta_0)|^{\nu}\big)^{1/\nu}$. If $E|b(\eta_0)|^{\nu} < \infty$ and $E|a(\eta_0)|^{\nu} < 1$, then we
have Definition A.1.1 with any $\alpha > 0$. Carrasco and Chen (2002) show that
nearly all univariate GARCH sequences can be written as augmented GARCH
processes satisfying (A.1.18). Hörmann (2008) contains a detailed description of
the properties of augmented GARCH processes.
Example A.1.4 (Random Coefficient Models) Andél (1976) and Nicholls and
Quinn (1982) defined the RCA(1) sequence as the solution of

$$ \epsilon_i = \sum_{l=0}^{\infty}\eta_{i-l,2}\prod_{k=1}^{l-1}\eta_{i-k+1,1} $$

($\prod_{\emptyset} = 1$), which is finite if $E|\log|\eta_{0,2}|| < \infty$ and $E\log|\eta_{0,1}| < 0$. As before,

$$ \Bigg(E\bigg|\sum_{j=l+1}^{\infty}\eta_{i-j,2}\prod_{k=1}^{j-1}\eta_{i-k+1,1}\bigg|^{\nu}\Bigg)^{1/\nu} \le \sum_{j=l+1}^{\infty}\bigg(E\bigg|\eta_{i-j,2}\prod_{k=1}^{j-1}\eta_{i-k+1,1}\bigg|^{\nu}\bigg)^{1/\nu} = \big(E|\eta_{0,2}|^{\nu}\big)^{1/\nu}\sum_{j=l+1}^{\infty}\rho^{j-1}, $$

with $\rho = \big(E|\eta_{0,1}|^{\nu}\big)^{1/\nu}$, so Definition A.1.1 holds for all $\alpha > 0$, if $E|\eta_{0,1}|^{\nu} < 1$. Aue et al. (2006) and Berkes
et al. (2009c) contain the necessary and sufficient condition for the existence of a
unique solution of (A.1.19) and estimation theory for the parameters of the RCA(1).
It is straightforward to generalize Definition A.1.1 to vector valued random
variables. Let $\|\cdot\|$ denote the Euclidean norm of vectors and matrices.
Definition A.1.2 We say that $\{\boldsymbol{\epsilon}_i\in\mathbb{R}^d, i\in\mathbb{Z}\}$ is a vector valued
$L^{\nu}$-decomposable Bernoulli shift if $\boldsymbol{\epsilon}_i = g(\eta_i, \eta_{i-1}, \ldots)$, where $g$ is a
(deterministic) measurable function, $g: S^{\infty}\to\mathbb{R}^d$, $E\boldsymbol{\epsilon}_i = \mathbf{0}$, $E\|\boldsymbol{\epsilon}_i\|^{\nu} < \infty$ with
some $\nu > 2$, $\{\eta_i, i\in\mathbb{Z}\}$ are independent and identically distributed random
variables with values in a measurable space $S$,

$$ \big(E\big\|\boldsymbol{\epsilon}_i - \boldsymbol{\epsilon}_{i,m}^{*}\big\|^{\nu}\big)^{1/\nu} \le c\,m^{-\alpha} \quad\text{with some } c > 0 \text{ and } \alpha > 2, \tag{A.1.20} $$
Theorem A.1.3 If Assumption A.1.2 holds, then for each $N$ we can define Wiener
processes $\{\mathbf{W}_{N,1}(t), 0\le t\le N/2\}$ and $\{\mathbf{W}_{N,2}(t), 0\le t\le N/2\}$ with
$E\mathbf{W}_{N,1}(t) = E\mathbf{W}_{N,2}(t) = \mathbf{0}$ and $E\mathbf{W}_{N,1}(t)\mathbf{W}_{N,1}^{T}(s) = E\mathbf{W}_{N,2}(t)\mathbf{W}_{N,2}^{T}(s) = \min(t,s)\,\boldsymbol{\Sigma}$, such that

$$ \max_{1\le k\le N/2}\frac{1}{k^{\zeta}}\bigg\|\sum_{i=1}^{k}\boldsymbol{\epsilon}_i - \mathbf{W}_{N,1}(k)\bigg\| = O_P(1), \tag{A.1.21} $$

$$ \max_{N/2<k<N}\frac{1}{(N-k)^{\zeta}}\bigg\|\sum_{i=k+1}^{N}\boldsymbol{\epsilon}_i - \mathbf{W}_{N,2}(N-k)\bigg\| = O_P(1), \tag{A.1.22} $$

and

$$ \zeta < 1/2, \tag{A.1.24} $$

where

$$ \lim_{N\to\infty}\frac{1}{N}\,E\Bigg(\sum_{l=1}^{N}\boldsymbol{\epsilon}_l\Bigg)\Bigg(\sum_{l=1}^{N}\boldsymbol{\epsilon}_l\Bigg)^{T} = \boldsymbol{\Sigma}. \tag{A.1.25} $$
Proof The proof is rather technical so we just outline the major steps and explain
why the proof of (A.1.21) implies (A.1.22) and (A.1.23). A detailed proof is given
in Aue et al. (2014). First we define integers $r_i$ so that $r_1 = 1$, and $r_i - r_{i-1} = \lfloor i^a\rfloor$,
with a suitably chosen $a$. Let $T_i = \{r_{i-1}+1, r_{i-1}+2, \ldots, r_i\}$. We will show that
at the points $r_i$,

$$ \bigg\|\sum_{l=1}^{i}\mathbf{Z}_l - \sum_{l=1}^{i}\mathbf{Q}_l\bigg\| = o\big(r_i^{\zeta}\big) \quad\text{a.s.}, \tag{A.1.26} $$

where

$$ \mathbf{Z}_l = \sum_{j\in T_l}\boldsymbol{\epsilon}_j \qquad\text{and}\qquad \mathbf{Q}_l = \sum_{j\in T_l}\mathbf{N}_j, $$

and $\mathbf{N}_j$, $1\le j < \infty$, are independent identically distributed normal random vectors
with $E\mathbf{N}_j = \mathbf{0}$ and $E\mathbf{N}_j\mathbf{N}_j^{T} = \boldsymbol{\Sigma}$. (A.1.26) implies (A.1.21). In order to proceed,
we must verify that the sum of the $\boldsymbol{\epsilon}_l$'s and $\mathbf{N}_l$'s does not change too much if we
we must verify that the sum of the .E l ’s and .Nl ’s does not change too much if we
A Appendix 509
sum these vectors for values of the elements of .Ti . Let .i ∗ be the largest integer such
that .ri ∗ ≤ k. If
|| ||
|| v ||
|| E ||
|| E l ||
ζ
. max
ri ∗ ≤v<ri ∗ +1 || || = O(rk ) a.s. (A.1.27)
||l=ri ∗ +1 ||
and
|| ||
|| E ||
|| v ||
|| Nl ||
ζ
. max ||
ri ∗ ≤v<ri ∗ +1 || || = O(rk ) a.s. (A.1.28)
l=ri ∗ +1 ||
hold, then (A.1.26) implies (A.1.21). These rely on maximal inequalities for
partial sums. Since the .Nl ’s are independent identically distributed normal random
vectors, one can easily verify that the maximum in (A.1.28) is bounded by
.((ri ∗ +1 − ri ∗ ) log(ri ∗ +1 − ri ∗ ))
1/2 which is smaller than .r ζ by the definitions of
k
.rk . Computing the moments of the sum in (A.1.26), standard maximal inequality
where $|I_{lj}| = \lfloor l^b\rfloor$ and $|J_{lj}| = \lfloor l^c\rfloor$, $1\le j\le m(l)$, $0 < c < b < a$. We use $|T|$ for
the cardinality of the set $T$. $R_l$ contains all the remaining elements of $T_l$ which are
not in $J_{lj}\cup I_{lj}$, $1\le j\le m(l)$. We say that the $J_{lj}$'s are short blocks and the $I_{lj}$'s are
long blocks. For $j\in T_l$, consider $\boldsymbol{\epsilon}_{j,n_l}^{*}$ with $n_l = |J_{l1}|$; we recall that $\boldsymbol{\epsilon}_{j,k}^{*}$ is defined in
Assumption A.1.2. We replaced the dependent vectors with $n_l = |J_{l1}|$ independent
vectors for each $l = 1, 2, \ldots$. The next step is to prove that the difference between
the sum of the $\boldsymbol{\epsilon}_j$'s and the $\boldsymbol{\epsilon}_{j,n_l}^{*}$'s is small. Let

$$ \mathbf{Z}_l^{*} = \sum_{j\in T_l}\boldsymbol{\epsilon}_{j,n_l}^{*}. $$
Next we prove that in $\mathbf{Z}_l^{*}$ only the long blocks matter. So let

$$ \mathbf{U}_l^{(1)} = \sum_{j=1}^{m(l)}\sum_{k\in J_{lj}}\boldsymbol{\epsilon}_{k,n_l}^{*} + \sum_{k\in R_l}\boldsymbol{\epsilon}_{k,n_l}^{*}. $$

Similarly,

$$ \mathbf{U}_l^{(2)} = \sum_{j=1}^{m(l)}\sum_{k\in I_{lj}}\boldsymbol{\epsilon}_{k,n_l}^{*}, $$

and therefore

$$ \mathbf{Z}_l^{*} = \mathbf{U}_l^{(1)} + \mathbf{U}_l^{(2)}. $$

Thus $\mathbf{Z}_l^{*}$ is written as the sum of variables in short and long blocks. We note that
$\sum_{k\in J_{lj}}\boldsymbol{\epsilon}_{k,n_l}^{*}$, $1\le j\le m(l)$, are independent random vectors. So similarly to (A.1.27)
we can establish that

$$ \bigg\|\sum_{l=1}^{i}\mathbf{U}_l^{(1)}\bigg\| = o\big(r_i^{\zeta}\big) \quad\text{a.s.} $$
It remains to provide an approximation for $\sum_{l=1}^{i}\mathbf{U}_l^{(2)}$. Using the choice
of $n_l$, we have that

$$ \mathbf{V}_{l,j} = n_l^{-1/2}\sum_{k\in I_{lj}}\boldsymbol{\epsilon}_{k,n_l}^{*}, \qquad 1\le j\le m(l), $$

and

$$ \mathbf{U}_l^{(2)} = n_l^{1/2}\sum_{j=1}^{m(l)}\mathbf{V}_{l,j}. $$
It is clear that $\mathbf{U}_l^{(2)}$, $l = 1, 2, \ldots$, are independent random vectors due to the
definition of the $\boldsymbol{\epsilon}^{*}$'s. Using the observation that $\mathbf{V}_{l,j}$, $1\le j\le m(l)$, are independent
identically distributed random vectors, we can now approximate $\mathbf{U}_l^{(2)}$ with a suitably
constructed normal random vector (see Lemmas S2.3 and S2.4 in Aue et al., 2014).
Thus we can define independent identically distributed normal random vectors
$\{\mathbf{M}_l, l\ge 1\}$ such that $E\mathbf{M}_l = \mathbf{0}$, $E\mathbf{M}_l\mathbf{M}_l^{T} = \boldsymbol{\Sigma}$, and

$$ P\Big\{\big\|(n_l m(l))^{-1/2}\,\mathbf{U}_l^{(2)} - \mathbf{M}_l\big\| > c\,l^{-(2+\rho)}\Big\} \le c_1\,l^{-(2+\rho)} \tag{A.1.29} $$

with some $c_1 > 0$, $\rho > 0$. Using (A.1.29) with the Borel–Cantelli lemma we
conclude

$$ \bigg\|\sum_{l=1}^{i}\mathbf{U}_l^{(2)} - \sum_{l=1}^{i}(n_l m(l))^{1/2}\,\mathbf{M}_l\bigg\| = o\big(r_i^{\zeta}\big) \quad\text{a.s.} \tag{A.1.30} $$
with some $\zeta = \zeta(a,b,c) < 1/2$. We note that there are independent identically
distributed normal random vectors $\mathbf{N}_{l,j}$ with $E\mathbf{N}_{l,j} = \mathbf{0}$ and $E\mathbf{N}_{l,j}\mathbf{N}_{l,j}^{T} = \boldsymbol{\Sigma}$
such that

$$ (m(l)\,n_l)^{1/2}\,\mathbf{M}_l = \sum_{k=1}^{m(l)}\sum_{j\in I_{lk}}\mathbf{N}_{l,j}. $$

The last step in the proof of (A.1.21) is the verification that the sums of the independent
identically distributed random vectors $\mathbf{N}_{l,j}$, $E\mathbf{N}_{l,j} = \mathbf{0}$, $E\mathbf{N}_{l,j}\mathbf{N}_{l,j}^{T} = \boldsymbol{\Sigma}$, on the
short blocks are negligible. Elementary arguments based on the properties of the
normal distribution yield that

$$ \bigg\|\sum_{l=1}^{i}\sum_{k=1}^{m(l)}\sum_{j\in J_{lk}}\mathbf{N}_{l,j}\bigg\| = o\big(r_i^{\zeta}\big) \quad\text{a.s.} $$

and

$$ \bigg\|\sum_{l=1}^{i}\sum_{j\in R_l}\mathbf{N}_{l,j}\bigg\| = o\big(r_i^{\zeta}\big) \quad\text{a.s.}, $$
where $\{A_l, 0\le l < \infty\}$ are $d\times d$ matrices and $\boldsymbol{\eta}_i\in\mathbb{R}^d$ are independent and identically
distributed random vectors. Let $\|\cdot\|$ be a vector norm, and we use $\|\cdot\|$ also for the induced
matrix (linear operator) norm. Similarly to Example A.1.1, if $E\boldsymbol{\eta}_i = \mathbf{0}$, $E\|\boldsymbol{\eta}_i\|^{\nu} < \infty$,
$\nu > 2$ and

$$ \|A_l\| \le c\,l^{-\alpha-1}, $$

$$ \boldsymbol{\epsilon}_i = \sum_{l=0}^{\infty}\prod_{j=0}^{l-1}A(\boldsymbol{\eta}_{i-j})\,b(\boldsymbol{\eta}_{i-l}). \tag{A.1.33} $$

If there is a vector norm $\psi$ such that $E\|b(\boldsymbol{\eta}_0)\|_{\psi} < \infty$ and $E\|A(\boldsymbol{\eta}_0)\|_{\psi}^{\nu} < 1$,
then the infinite sum in (A.1.32) is absolutely convergent with probability one
and $L^{\nu}$-decomposable. As in Example A.1.4, $\{\boldsymbol{\epsilon}_i, i\in\mathbb{Z}\}$ is a decomposable
Bernoulli shift and (A.1.20) holds for all $\alpha$. However, the norm in Assumption A.1.2
must be interpreted as a $\psi$ norm, instead of the Euclidean norm. Pham (1986)
proves that $\{\boldsymbol{\epsilon}_i, i\in\mathbb{Z}\}$ is $\beta$-mixing under these conditions. Carrasco and
Chen (2002) sharpened the $\beta$-mixing bounds and provided several examples of
processes satisfying (A.1.32), including standard and power GARCH models, and the
autoregressive conditional duration model.
Aue et al. (2009b) show that several multivariate processes satisfy Assumption A.1.2. They prove that the constant conditional correlation GARCH models of
Bollerslev (1990) and Jeantheau (1998), and the multivariate exponential GARCH of
Kawakatsu (2006), are decomposable Bernoulli shifts.
$$ F_k(x) = \frac{1}{k}\sum_{i=1}^{k}\mathbf{1}\{X_i \le x\}, $$
where

$$ E K_N(t,x)\,K_N(t',x') = \min(t,t')\sum_{l=-\infty}^{\infty}\Big[P\big\{X_0\le x,\, X_l\le x'\big\} - F(x)F(x')\Big]. \tag{A.1.34} $$
Berkes et al. (2009b) also show that, under the conditions of Theorem A.1.4,
the infinite sum defining the covariance function in (A.1.34) is absolutely
convergent.
Let $\{W(t), t\ge 0\}$ be a Wiener process. The following result appeared first in
Csörgő and Révész (1979), and a detailed proof is given in Csörgő and Révész
(1981).
Theorem A.2.1 For any $\epsilon > 0$ there exists a constant $C = C(\epsilon) > 0$ such that

$$ P\Big\{\sup_{0\le t\le T}\,\sup_{0\le s\le h}\big|W(t+s) - W(t)\big| \ge v\,h^{1/2}\Big\} \le \frac{CT}{h}\exp\Big(-\frac{v^{2}}{2+\epsilon}\Big) $$
$$ r = \frac{(1-t_1)\,t_2}{t_1\,(1-t_2)}, \tag{A.2.2} $$
$$ a(x) = (2\log x)^{1/2} \qquad\text{and}\qquad b(x) = 2\log x + \frac{1}{2}\log\log x - \frac{1}{2}\log\pi. $$
The next theorem is usually called a Darling–Erdős type limit result. It is proven in
Darling and Erdős (1956) and later generalized to more general stationary Gaussian
sequences by Qualls and Watanabe (1972). For a survey on limit results for the
where

$$ \phi(x) = \frac{1}{(2\pi)^{1/2}}\exp\Big(-\frac{1}{2}x^{2}\Big), $$

$$ a(\nu) = \max\big(\gamma_1^{\nu-1/2}\,a_1(\nu),\; \gamma_2^{\nu-1/2}\,a_2(\nu)\big), $$
where

$$ \frac{r}{t_1}\to\gamma_1, \qquad \frac{r}{1-t_2}\to\gamma_2, \qquad\text{as } \min(t_1, 1-t_2)\to 0, \tag{A.2.3} $$

and

$$ r = \min(t_1, 1-t_2). \tag{A.2.4} $$
Theorem A.2.5 If $\nu > 1/2$, $0 < t_1 < t_2 < 1$ and $\min(t_1, 1-t_2)\to 0$, then we
have

$$ r^{\nu-1/2}\sup_{t_1\le t\le t_2}\frac{|B(t)|}{[t(1-t)]^{\nu}} \;\xrightarrow{D}\; a(\nu). $$
Also,

$$ \bigg|\sup_{t_1\le t\le 1/2}\frac{|B(t)|}{[t(1-t)]^{\nu}} - \sup_{t_1\le t\le 1/2}\frac{|W(t)|}{[t(1-t)]^{\nu}}\bigg| \le |W(1)|\sup_{t_1\le t\le 1/2}\frac{t}{[t(1-t)]^{\nu}} = O_P\big(t_1^{1-\nu}\big). \tag{A.2.6} $$
$$ \sup_{1/(t_1\log(1/t_1))\le s\le 1/(2t_1)}\frac{|W(s)|}{s^{\nu}\,(1-st_1)^{\nu}} \le 2^{\nu}\sup_{1/(t_1\log(1/t_1))\le s\le 1/(2t_1)}\frac{|W(s)|}{s^{\nu}} \to 0 \quad\text{a.s.}, $$
$$ \times \sup_{t_2-1/\log(1/(1-t_2))\,\le\, t\,\le\, t_2}\frac{|W(1)-W(t)|}{(1-t)^{\nu}} \tag{A.2.9} $$

$$ = o_P(1). $$
Since .{W (t), 0 ≤ t ≤ 1/2} and .{W (1) − W (t), 1/2 ≤ t ≤ 1} are independent,
we have the independence of .a1 (ν) and .a2 (ν) in the definition of .a(ν). Using once
again the scale transformation of the Wiener process we have
and

$$ (1-t_2)^{1/2-\nu}\sup_{t_2-1/\log(1/(1-t_2))\,\le\, t\,\le\, t_2}\frac{|W(1)-W(t)|}{(1-t)^{\nu}} \;\stackrel{D}{=}\; (1-t_2)^{1/2-\nu}\sup_{t_2-1/\log(1/(1-t_2))\,\le\, t\,\le\, t_2}\frac{|W(1-t)|}{(1-t)^{\nu}} $$

$$ = (1-t_2)^{1/2-\nu}\sup_{1-t_2\,\le\, t\,\le\, 1/\log(1/(1-t_2))}\frac{|W(t)|}{t^{\nu}} \;\stackrel{D}{=}\; \sup_{1\,\le\, t\,\le\, 1/[(1-t_2)\log(1/(1-t_2))]}\frac{|W(t)|}{t^{\nu}} \;\to\; \sup_{1\le t<\infty}\frac{|W(t)|}{t^{\nu}} \quad\text{a.s.} $$
$$ b_1(p,\nu) \stackrel{D}{=} b_2(p,\nu) \stackrel{D}{=} \int_{1}^{\infty}\frac{|W(t)|^{p}}{t^{\nu}}\,dt, $$

and define

$$ b(p,\nu) = \gamma_1^{\nu-p/2-1}\,b_1(p,\nu) + \gamma_2^{\nu-p/2-1}\,b_2(p,\nu), $$
Theorem A.2.6 Let $p\ge 1$. If $\nu > p/2+1$, $0 < t_1 < t_2 < 1$ and $\min(t_1, 1-t_2)\to 0$, then we have

$$ r^{\nu-p/2-1}\int_{t_1}^{t_2}\frac{|B(t)|^{p}}{[t(1-t)]^{\nu}}\,dt \;\xrightarrow{D}\; b(p,\nu), $$
where

$$ A_1 = \int_{t_1}^{s_1}\frac{|B(t)|^{p}}{[t(1-t)]^{\nu}}\,dt, \qquad A_2 = \int_{s_1}^{1/2}\frac{|B(t)|^{p}}{[t(1-t)]^{\nu}}\,dt, $$

$$ A_3 = \int_{1/2}^{s_2}\frac{|B(t)|^{p}}{[t(1-t)]^{\nu}}\,dt, \qquad\text{and}\qquad A_4 = \int_{s_2}^{t_2}\frac{|B(t)|^{p}}{[t(1-t)]^{\nu}}\,dt, $$

with $s_1 = t_1\log(1/t_1)$ and $s_2 = 1-(1-t_2)\log(1/(1-t_2))$. By the mean value
theorem we have
$$ \big||W(t)-tW(1)|^{p} - |W(t)|^{p}\big| \le p\,\big(|W(t)-tW(1)|^{p-1} + |W(t)|^{p-1}\big)\,t\,|W(1)|, $$

and therefore

$$ \bigg|A_1 - \int_{t_1}^{s_1}\frac{|W(t)|^{p}}{[t(1-t)]^{\nu}}\,dt\bigg| \le p\,2^{p}\bigg\{|W(1)|\int_{t_1}^{s_1}\frac{t\,|W(t)|^{p-1}}{[t(1-t)]^{\nu}}\,dt + |W(1)|^{p}\int_{t_1}^{s_1}\frac{t^{p}}{[t(1-t)]^{\nu}}\,dt\bigg\} $$

$$ = O_P\Big(\max\big(s_1^{3/2+p/2-\nu},\; t_1^{3/2+p/2-\nu},\; t_1^{p+1-\nu},\; s_1^{p+1-\nu}\big)\Big), $$

since

$$ E\int_{t_1}^{s_1}\frac{t\,|W(t)|^{p-1}}{[t(1-t)]^{\nu}}\,dt = E|W(1)|^{p-1}\int_{t_1}^{s_1}\frac{t^{1/2+p/2}}{[t(1-t)]^{\nu}}\,dt. $$
Thus we get

$$ t_1^{\nu-p/2-1}\bigg|A_1 - \int_{t_1}^{s_1}\frac{|W(t)|^{p}}{[t(1-t)]^{\nu}}\,dt\bigg| = o_P(1). \tag{A.2.10} $$
Also,

$$ t_1^{\nu-p/2-1}\bigg|\int_{t_1}^{s_1}\frac{|W(t)|^{p}}{[t(1-t)]^{\nu}}\,dt - \int_{t_1}^{s_1}\frac{|W(t)|^{p}}{t^{\nu}}\,dt\bigg| \le \bigg|1 - \frac{1}{(1-s_1)^{\nu}}\bigg|\,t_1^{\nu-p/2-1}\int_{t_1}^{s_1}\frac{|W(t)|^{p}}{t^{\nu}}\,dt = o_P(1). \tag{A.2.11} $$
so by Markov's inequality

$$ t_1^{\nu-p/2-1}\,A_2 = o_P(1). \tag{A.2.12} $$
Now the independence of $\{W(t), 0\le t\le 1/2\}$ and $\{W(1)-W(t), 1/2\le t\le 1\}$
implies the independence of $b_1(p,\nu)$ and $b_2(p,\nu)$. By the scale transformation of
the Wiener process we have

$$ \bigg\{t_1^{\nu-p/2-1}\int_{t_1}^{s_1}\frac{|W(t)|^{p}}{t^{\nu}}\,dt,\;\; (1-t_2)^{\nu-p/2-1}\int_{s_2}^{t_2}\frac{|W(1)-W(t)|^{p}}{(1-t)^{\nu}}\,dt\bigg\} \tag{A.2.15} $$

$$ \stackrel{D}{=} \bigg\{t_1^{\nu-p/2-1}\int_{t_1}^{s_1}\frac{|W(t)|^{p}}{t^{\nu}}\,dt,\;\; (1-t_2)^{\nu-p/2-1}\int_{s_2}^{t_2}\frac{|W(1-t)|^{p}}{(1-t)^{\nu}}\,dt\bigg\} $$

$$ \stackrel{D}{=} \bigg\{t_1^{\nu-p/2-1}\int_{t_1}^{s_1}\frac{|W(t)|^{p}}{t^{\nu}}\,dt,\;\; (1-t_2)^{\nu-p/2-1}\int_{1-t_2}^{1-s_2}\frac{|W^{*}(t)|^{p}}{t^{\nu}}\,dt\bigg\} $$

$$ \stackrel{D}{=} \bigg\{\int_{1}^{s_1/t_1}\frac{|W(t)|^{p}}{t^{\nu}}\,dt,\;\; \int_{1}^{(1-s_2)/(1-t_2)}\frac{|W^{*}(t)|^{p}}{t^{\nu}}\,dt\bigg\}, $$

where $\{W^{*}(t), t\ge 1\}$ is a standard Wiener process, independent of $\{W(t), t\ge 1\}$.
Theorem A.2.6 follows from (A.2.13)–(A.2.15). $\square$
The next result extends the Darling–Erdős limit result of Theorem A.2.3 to $\chi^{2}$-type
processes. Let

$$ a(x) = (2\log x)^{1/2} \qquad\text{and}\qquad b_d(x) = 2\log x + \frac{d}{2}\log\log x - \log\Gamma(d/2), $$

where $\Gamma(t)$ denotes the Gamma function.
Theorem A.2.7 If $0 < t_1 < t_2 < 1$, $\min(t_1, 1-t_2)\to 0$, then for all $x\in\mathbb{R}$ we
have that

$$ P\Bigg\{a\Big(\frac{1}{2}\log r\Big)\Bigg(\sup_{t_1\le t\le t_2}\sum_{i=1}^{d}\frac{B_i^{2}(t)}{t(1-t)}\Bigg)^{1/2} \le x + b_d\Big(\frac{1}{2}\log r\Big)\Bigg\} \to \exp\big(-2e^{-x}\big), $$
$$ a_*(p,d) = \int_{\mathbb{R}^{2d+1}}\Bigg(\sum_{i=1}^{d}x_i^{2}\Bigg)^{p/2}\Bigg(\sum_{i=1}^{d}y_i^{2}\Bigg)^{p/2}\Bigg\{\prod_{i=1}^{d}\big(2\pi(1-\exp(-|u|))\big)^{-1/2}\exp\Big(-\frac{x_i^{2}+y_i^{2}-2\exp(-|u|/2)\,x_i y_i}{2(1-\exp(-|u|))}\Big) - \prod_{i=1}^{d}\phi(x_i)\phi(y_i)\Bigg\}\prod_{i=1}^{d}dx_i\prod_{i=1}^{d}dy_i\,du, $$
where

$$ b_*(p,d) = 2^{p/2}\,\Gamma\Big(\frac{p+d}{2}\Big)\Big/\Gamma\Big(\frac{d}{2}\Big), $$

$$ \frac{1}{(a_*(p,d)\log r)^{1/2}}\Bigg\{\int_{t_1}^{t_2}\frac{1}{[t(1-t)]^{p/2+1}}\Bigg(\sum_{i=1}^{d}B_i^{2}(t)\Bigg)^{p/2}dt - b_*(p,d)\log r\Bigg\} \;\xrightarrow{D}\; N, $$
Due to the covariance structure, $\mathbf{B}(t)$ is called a Brownian bridge in $\mathbb{R}^d$ with
covariance function $\boldsymbol{\Sigma}$. We say that $\{\mathbf{W}(t), t\ge 0\}$ is a Brownian motion with
values in $\mathbb{R}^d$ with covariance $\boldsymbol{\Sigma}$, if it is a Gaussian process with $E\mathbf{W}(t) = \mathbf{0}$
and $E\mathbf{W}(t)\mathbf{W}^{T}(s) = \min(t,s)\,\boldsymbol{\Sigma}$. Next we define the independent random variables
$\bar a_{1,\Sigma}(d,\nu)$ and $\bar a_{2,\Sigma}(d,\nu)$ with distribution

$$ \bar a_{1,\Sigma}(d,\nu) \stackrel{D}{=} \bar a_{2,\Sigma}(d,\nu) \stackrel{D}{=} \sup_{1\le t<\infty}\frac{\|\mathbf{W}(t)\|}{t^{\nu}}. $$
Let

$$ \bar a_{\Sigma}(d,\nu) = \max\big(\gamma_1^{\nu-1/2}\,\bar a_{1,\Sigma}(d,\nu),\; \gamma_2^{\nu-1/2}\,\bar a_{2,\Sigma}(d,\nu)\big). $$

Similarly, $\bar b_{1,\Sigma}(p,\nu)$ and $\bar b_{2,\Sigma}(p,\nu)$ are independent and identically distributed,

$$ \bar b_{1,\Sigma}(p,\nu) \stackrel{D}{=} \bar b_{2,\Sigma}(p,\nu) \stackrel{D}{=} \int_{1}^{\infty}\frac{\|\mathbf{W}(t)\|^{p}}{t^{\nu}}\,dt, $$

and

$$ \bar b_{\Sigma}(p,\nu) = \gamma_1^{\nu-p/2-1}\,\bar b_{1,\Sigma}(p,\nu) + \gamma_2^{\nu-p/2-1}\,\bar b_{2,\Sigma}(p,\nu). $$
Theorem A.2.9 If $\nu > 1/2$, $0 < t_1 < t_2 < 1$ and $\min(t_1, 1-t_2)\to 0$, then we
have

$$ r^{\nu-1/2}\sup_{t_1\le t\le t_2}\frac{\|\mathbf{B}(t)\|}{[t(1-t)]^{\nu}} \;\xrightarrow{D}\; \bar a_{\Sigma}(d,\nu), $$
($M\to\infty$).
(ii) If

$$ \int_{0}^{1}\frac{t(1-t)}{w(t)}\,dt < \infty, $$

then

$$ M^{-1/2}\Bigg(\sum_{i=1}^{M}\int_{0}^{1}\frac{B_i^{2}(t)}{w(t)}\,dt - M c_3\Bigg) \;\xrightarrow{D}\; N(0, c_4), \qquad (M\to\infty) \tag{A.2.17} $$
$$ c_1 = \Big(\frac{1}{2}\Big)^{2-4\kappa}, \qquad c_2 = \Big(\frac{1}{2}\Big)^{1-8\kappa}, $$

$$ c_3 = \int_{0}^{1}\frac{t(1-t)}{w(t)}\,dt, \qquad c_4 = E\Bigg(\int_{0}^{1}\frac{B^{2}(t)-t(1-t)}{w(t)}\,dt\Bigg)^{2}, $$
where $\Gamma(t)$ is a Gaussian process with $E\Gamma(t) = 0$ and $E\Gamma(t)\Gamma(s) = E[(B^{2}(t) - t(1-t))(B^{2}(s) - s(1-s))]$. For every $0 < \delta < 1/2$,

$$ M^{-1/2}\sum_{i=1}^{M}\frac{B_i^{2}(t)-t(1-t)}{w^{2}(t)} \;\xrightarrow{D[\delta,1-\delta]}\; \frac{\Gamma(t)}{w^{2}(t)}. $$
Using that $B_i(t) = W_i(t) - tW_i(1)$, where $\{W_i(t), t\ge 0\}$ are independent Wiener
processes, we get

$$ \Bigg|\sum_{i=1}^{M}\frac{B_i^{2}(t)-t(1-t)}{w^{2}(t)}\Bigg| = \Bigg|\sum_{i=1}^{M}\frac{W_i^{2}(t)-2tW_i(t)W_i(1)+t^{2}W_i^{2}(1)-t(1-t)}{w^{2}(t)}\Bigg| $$

$$ \le \Bigg|\sum_{i=1}^{M}\frac{W_i^{2}(t)-t}{w^{2}(t)}\Bigg| + 2\Bigg|\sum_{i=1}^{M}\frac{tW_i(t)W_i(1)-t^{2}}{w^{2}(t)}\Bigg| + \Bigg|\sum_{i=1}^{M}\big(W_i^{2}(1)-1\big)\Bigg|\frac{t^{2}}{w^{2}(t)}. $$
$$ \le c_5\,|t-s|^{2}, $$

so that $M^{-1/2}\sum_{i=1}^{M}(W_i(t)W_i(1)-t)$ is tight. Since the finite dimensional distributions are
normal, we get

$$ \sup_{0<t<1}\Bigg|M^{-1/2}\sum_{i=1}^{M}\big(W_i(t)W_i(1)-t\big)\Bigg| = O_P(1). $$
Thus we have

$$ \lim_{\delta\to 0}\limsup_{M\to\infty} P\Bigg\{\sup_{0<t\le\delta}\frac{t}{w^{2}(t)}\Bigg|M^{-1/2}\sum_{i=1}^{M}\big(W_i(t)W_i(1)-t\big)\Bigg| > x\Bigg\} = 0 $$
for all $x > 0$. Finally, using the Hájek–Rényi inequality for martingales (see Hall
and Heyde, 1980),

$$ \lim_{\delta\to 0}\limsup_{M\to\infty} P\Bigg\{\sup_{0<t\le\delta} M^{-1/2}\Bigg|\sum_{i=1}^{M}\frac{W_i^{2}(t)-t}{w^{2}(t)}\Bigg| > x\Bigg\} = 0 $$

for all $x > 0$, since $t(1-t)/w^{2}(t) = [t(1-t)]^{1-2\kappa}$ reaches its largest value at
$t = 1/2$. This establishes the required uniform continuity, and therefore
$$ \lim_{\delta\to 0}\limsup_{M\to\infty} P\Bigg\{\sup_{|t-1/2|\le\delta}\Bigg|M^{-1/2}\sum_{i=1}^{M}\frac{B_i^{2}(t)-t(1-t)}{w^{2}(t)} - M^{-1/2}\sum_{i=1}^{M}\frac{B_i^{2}(1/2)-1/4}{w^{2}(1/2)}\Bigg| > x\Bigg\} = 0 $$

and

$$ M^{-1/2}\sum_{i=1}^{M}\frac{B_i^{2}(1/2)-1/4}{w^{2}(1/2)} \;\xrightarrow{D}\; N(0, c_2), $$
Since $B(t) = W(t) - tW(1)$, where $\{W(t), t\ge 0\}$ is a Wiener process, we have
that if $t\le s$, then

$$ EW^{2}(t)W^{2}(s) = E\big[W^{2}(t)\big((W(s)-W(t))^{2} + 2(W(s)-W(t))W(t) + W^{2}(t)\big)\big] = t(s-t) + 3t^{2} \le 3st, $$

and therefore

$$ E\Bigg(\int_{0}^{1/2}\frac{W^{2}(t)}{w(t)}\,dt\Bigg)^{2} = 2\int_{0}^{1/2}\!\!\int_{0}^{s}\frac{EW^{2}(t)W^{2}(s)}{w(t)\,w(s)}\,dt\,ds \le 6\Bigg(\int_{0}^{1/2}\frac{t}{w(t)}\,dt\Bigg)^{2} < \infty. \qquad\square $$
We summarize here some important results used in this book related to change point
analysis of functional data. For a thorough treatment of functional data analysis we
refer to Horváth and Kokoszka (2012) and Kokoszka and Reimherr (2017). For the
sake of notational simplicity, we only state the results for functional data that are
stochastic processes taking values in the space $L^2([0,1],\mathbb{R}) = L^2$ of real valued
square integrable functions defined on the unit interval, but they hold for variables in
general separable Hilbert spaces. We use $\int$ for $\int_0^1$ and

$$ \|f\|_2 = \Big(\int f^{2}(t)\,dt\Big)^{1/2} $$

for the norm in $L^2$. We generalize the concept of Bernoulli shifts and decomposability
to random functions.
Definition A.3.1 We say $\{\epsilon_i(t), i\in\mathbb{Z}, t\in[0,1]\}$ is $L^{\nu}$-decomposable if $\epsilon_i(t) = g(\eta_i, \eta_{i-1}, \ldots)(t)$ for some (deterministic) measurable function $g: S^{\infty}\to L^2$,
where $\{\eta_j, j\in\mathbb{Z}\}$ are independent and identically distributed random variables
with values in a measurable space $S$, and $\epsilon_i(t) = \epsilon_i(t,\omega)$ is jointly measurable in
$(t,\omega)$, for each $i\in\mathbb{Z}$. Further, $E\epsilon_i(t) = 0$ for all $t\in[0,1]$, $E\|\epsilon_i\|_2^{\nu} < \infty$ with some
$\nu > 2$, and

$$ \big(E\big\|\epsilon_i - \epsilon_{i,m}^{*}\big\|_2^{\nu}\big)^{1/\nu} \le c\,m^{-\alpha} \quad\text{with some } \alpha > 2, \tag{A.3.1} $$

where $\epsilon_{i,l}^{*} = g(\eta_i, \ldots, \eta_{i-l+1}, \eta_{i-l}^{*}, \eta_{i-l-1}^{*}, \ldots)$, and $\{\eta_i^{*}, i\in\mathbb{Z}\}$ are independent
copies of $\eta_0$, independent of $\{\eta_l, l\in\mathbb{Z}\}$.
Horváth and Kokoszka (2012) and Hörmann and Kokoszka (2010) provide
several examples of processes satisfying Definition A.3.1. Their examples include
the functional autoregressive process of Bosq (2000), linear processes, bilinear
processes (functional random coefficient models), the functional ARCH of Hörmann
et al. (2013), and the functional GARCH of Aue et al. (2017).
$$ D(t,s) = \sum_{l=-\infty}^{\infty} E\,\epsilon_0(t)\,\epsilon_l(s). $$
l=−∞
Remark A.3.2 In the process of proving Theorem A.3.2, Berkes et al. (2013)
establish that if $\{\epsilon_i(t), i\in\mathbb{Z}, t\in[0,1]\}$ is $L^{\nu}$-decomposable, then the infinite
sum defining $D(t,s)$ is absolutely convergent in $L^2([0,1]^2,\mathbb{R})$.
Corollary A.3.1 If $\{\epsilon_i(t), i\in\mathbb{Z}, t\in[0,1]\}$ is $L^{\nu}$-decomposable, then we
can define two sequences of Gaussian processes $\{\Gamma_{N,1}(u,t), 0\le u,t\le 1\}$ and
$\{\Gamma_{N,2}(u,t), 0\le u,t\le 1\}$ such that

$$ \sup_{0\le u\le 1/2}\int\Bigg(N^{-1/2}\sum_{i=1}^{\lfloor Nu\rfloor}\epsilon_i(t) - \Gamma_{N,1}(u,t)\Bigg)^{2}dt = o_P(1), $$

$$ \sup_{0\le u\le 1/2}\int\Bigg(N^{-1/2}\sum_{i=N-\lfloor Nu\rfloor}^{N}\epsilon_i(t) - \Gamma_{N,2}(u,t)\Bigg)^{2}dt = o_P(1), $$

with $E\Gamma_{N,1}(u,t) = E\Gamma_{N,2}(u,t) = 0$, $E\Gamma_{N,1}(u,t)\Gamma_{N,1}(v,s) = \min(u,v)\,D(t,s)$, and $E\Gamma_{N,2}(u,t)\Gamma_{N,2}(v,s) = \min(u,v)\,D(t,s)$.
The computation of functionals of stochastic processes with sample paths in $L^2$
frequently makes use of the Karhunen–Loève expansion.
Theorem A.3.3 (Karhunen–Loève Expansion) If the process $\{X(t), 0\le t\le 1\}$
satisfies

$$ EX(t) = 0 \qquad\text{and}\qquad \int_0^1 EX^{2}(t)\,dt < \infty, $$

then

$$ X(t) = \sum_{i=1}^{\infty}\lambda_i^{1/2}\,\zeta_i\,\phi_i(t), $$

where

$$ E\zeta_i = 0, \qquad E\zeta_i\zeta_j = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{if } i \ne j, \end{cases} \qquad \lambda_1\ge\lambda_2\ge\cdots\ge 0, $$

$$ \lambda_i\,\phi_i(t) = \int C(t,s)\,\phi_i(s)\,ds, \quad i\ge 1, \qquad\text{and}\qquad \int \phi_i(t)\phi_j(t)\,dt = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{if } i \ne j. \end{cases} $$
$$ \big|\lambda_i^{(1)} - \lambda_i^{(2)}\big| \le \big\|D^{(1)}(t,s) - D^{(2)}(t,s)\big\|_2, $$

$$ \big\|\phi_i^{(1)} - \phi_i^{(2)}\big\| \le a_i\,\big\|D^{(1)}(t,s) - D^{(2)}(t,s)\big\|_2, $$

where, if $i\ge 2$, $a_i = 2\sqrt{2}\,\max\big\{(\lambda_{i-1}^{(2)} - \lambda_i^{(2)})^{-1},\; (\lambda_i^{(2)} - \lambda_{i+1}^{(2)})^{-1}\big\}$, and $a_1 = 2\sqrt{2}\,(\lambda_1^{(2)} - \lambda_2^{(2)})^{-1}$.
For a proof see Lemma 2.2 in Horváth and Kokoszka (2012) and Lemma 4.3 of
Bosq (2000). For further results when some of the eigenvalues are repeated we refer
to Reimherr (2015) and Petrovich and Reimherr (2017).
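The eigenvalue bound is a Weyl-type inequality; its discrete (matrix) analogue can be checked numerically as below (our sketch, for symmetric matrices, with the Frobenius norm standing in for the $L^2$ kernel norm).

```python
import numpy as np

# Discrete analogue of the eigenvalue perturbation bound: for symmetric
# matrices, max_i |lambda_i(D1) - lambda_i(D2)| <= ||D1 - D2||_F
# (a check of the inequality in the matrix setting, not of the
# functional statement itself)
rng = np.random.default_rng(9)
A = rng.normal(size=(30, 30))
D1 = A @ A.T
E = 0.1 * rng.normal(size=(30, 30))
D2 = D1 + (E + E.T) / 2
l1 = np.sort(np.linalg.eigvalsh(D1))[::-1]
l2 = np.sort(np.linalg.eigvalsh(D2))[::-1]
print(np.abs(l1 - l2).max(), np.linalg.norm(D1 - D2, "fro"))
```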
Bibliography
K.M. Abadir, J.R. Magnus, Matrix Algebra, vol. 1 (Cambridge University Press, 2005)
G.P. Aielli, Dynamic conditional correlation: On properties and estimation. J. Bus. Econ. Stat. 31,
282–299 (2013)
F. Akashi, H. Dette, Y. Liu, Change-point detection in autoregressive models with no moment
assumptions. J. Time Ser. Anal. 39(5), 763–786 (2018)
J. Albin, On extremes and streams of upcrossings. Stoch. Process. Appl. 94, 271–300 (2001)
J. Albin, D. Jarušková, On a test statistic for linear trend. Extremes 6, 247–258 (2003)
J. Andél, Autoregressive series with random parameters. Math. Oper. Stat. 7, 735–741 (1976)
D. Andrews, Heteroskedasticity and autocorrelation consistent covariance matrix estimation.
Econometrica 59, 817–858 (1991)
D. Andrews, Tests for parameter instability and structural change with unknown change point.
Econometrica 61, 821–856 (1993)
D. Andrews, J. Monahan. An improved heteroskedasticity and autocorrelation consistent
covariance matrix estimator. Econometrica 60, 953–966 (1992)
J. Antoch, M. Hušková, Asymptotics, Nonparametrics, and Time Series: Estimators of Changes
(CRC Press, Boca Raton, 1999)
J. Antoch, M. Hušková, N. Veraverbeke, Change–point problem and bootstrap. J. Nonparam. Stat.
5, 123–144 (1995)
J. Antoch, M. Hušková, Z. Prášková, Effect of dependence on statistics for determination of
change. J. Stat. Plan. Inference 60, 291–310 (1997)
J. Antoch, J. Hanousek, L. Horváth, M. Hušková, S. Wang, Structural breaks in panel data: Large
number of panels and short length time series. Econom. Rev. 38(7), 828–855 (2019)
S. Arlot, A. Celisse, Z. Harchaoui, A kernel multiple change-point algorithm via model selection.
J. Mach. Learn. Res. 20(162), 1–56 (2019)
P. Aschersleben, M. Wagner, cointReg: Parameter Estimation and Inference in a Cointegrating
Regression (2016). R package version 0.2.0
S. Astill, D.I. Harvey, S.J. Leybourne, A.M.R. Taylor, Y. Zu, Cusum-based monitoring for
explosive episodes in financial data in the presence of time-varying volatility. J. Financ.
Econom. 21, 187–227 (2023)
J. Aston, C. Kirch, Detecting and estimating epidemic changes in dependent functional data. J.
Multivariate Anal. 109, 204–220 (2012)
J.A.D. Aston, C. Kirch, High dimensional efficiency with applications to change point tests.
Electron. J. Stat. 12(1), 1901–1947 (2018)
A. Aue, Strong approximation for RCA(1) time series with applications. Stat. Probab. Lett. 68,
369–382 (2004)
A. Aue, L. Horváth, Delay time in sequential detection of change. Stat. Probab. Lett. 67(3),
221–231 (2004). ISSN 0167-7152
A. Aue, L. Horváth, J. Steinebach, Estimation in random coefficient autoregressive models. J.
Time Series Anal. 27, 61–76 (2006)
A. Aue, L. Horváth, M. Hušková, P. Kokoszka, Testing for changes in polynomial regression.
Bernoulli 14, 637–660 (2008)
A. Aue, S. Hörmann, L. Horváth, M. Reimherr, Break detection in the covariance structure of
multivariate time series models. Ann. Stat. 37, 4046–4087 (2009a)
A. Aue, L. Horváth, M. Hušková, Extreme value theory for stochastic integrals of Legendre
polynomials. J. Multivariate Anal. 100, 1029–1043 (2009b)
A. Aue, L. Horváth, M. Hušková, Segmenting mean-nonstationary time series via trending
regressions. J. Econom. 168, 367–381 (2012)
A. Aue, S. Hörmann, L. Horváth, M. Hušková, Dependent functional linear models with
applications to monitoring structural change. Stat. Sin. 24, 1043–1073 (2014)
A. Aue, L. Horváth, D. Pellatt, Functional generalized autoregressive conditional heteroscedastic-
ity. J. Time Ser. Anal. 38, 3–21 (2017)
A. Aue, G. Rice, O. Sönmez, Detecting and dating structural breaks in functional data without
dimension reduction. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 80, 509–529 (2018)
A. Aue, G. Rice, O. Sönmez, Structural break analysis for spectrum and trace of covariance
operators. Environmetrics 31(1), e2617 (2020)
F. Avalos, Do oil prices drive food prices? the tale of a structural break. J. Int. Money Finance 42,
253–271 (2014)
I. Axt, R. Fried, On variance estimation under shifts in the mean. AStA Adv. Stat. Anal. 104,
417–457 (2020)
J. Bai, Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 15(5), 453–472
(1994)
J. Bai, Least absolute deviation estimation of a shift. Econom. Theory 11, 403–436 (1995)
J. Bai, Estimation of a change point in multiple regression models. Rev. Econ. Stat. 79, 551–563
(1997)
J. Bai, Likelihood ratio tests for multiple structural changes. J. Econom. 91, 299–323 (1999)
J. Bai, Panel data models with interactive fixed effects. Econometrica 77, 1229–1279 (2009)
J. Bai, Common breaks in means and variances for panel data. J. Econom. 157, 78–92 (2010)
J. Bai, S. Ng, Determining the number of factors in approximate factor models. Econometrica 70,
191–221 (2002)
J. Bai, P. Perron, Estimating and testing linear models with multiple structural changes.
Econometrica 66, 47–78 (1998)
J. Bai, P. Perron, Computation and analysis of multiple structural change models. J. Appl. Econom.
18, 1–22 (2003)
P. Bai, A. Safikhani, G. Michailidis, Multiple change points detection in low rank and sparse high
dimensional vector autoregressive models. IEEE Trans. Signal Process. 68, 3074–3089 (2020)
B.H. Baltagi, Econometric Analysis of Panel Data, 6th edn. (Springer, New York, 2021)
M. Barassi, L. Horváth, Y. Zhao, Change point detection in time varying correlation structure. J.
Bus. Econ. Stat. 38, 340–349 (2020)
J.-M. Bardet, W. Kengne, Monitoring procedure for parameter change in causal time series. J.
Multivariate Anal. 125, 204–221 (2014)
M. Barigozzi, H. Cho, P. Fryzlewicz, Simultaneous multiple change-point and factor analysis for
high-dimensional time series. J. Econom. 206(1), 187–225 (2018)
D. Bauer, Estimating linear dynamical systems using subspace methods. Econom. Theory 21,
181–211 (2005)
I. Berkes, L. Horváth, Approximations for the maximum of stochastic processes with drift.
Kybernetika 39, 299–306 (2003a)
I. Berkes, L. Horváth, The rate of consistency of the quasi-maximum likelihood estimator. Stat.
Probab. Lett. 61, 133–143 (2003b)
I. Berkes, L. Horváth, P.S. Kokoszka, GARCH processes: structure and estimation. Bernoulli 9,
201–227 (2003)
I. Berkes, E. Gombay, L. Horváth, P. Kokoszka, Sequential change-point detection in GARCH(p, q)
models. Econom. Theory 20, 1140–1167 (2004)
I. Berkes, E. Gombay, L. Horváth, Testing for changes in the covariance structure of linear
processes. J. Stat. Plan. Inference 139, 2044–2063 (2009a)
I. Berkes, S. Hörmann, J. Schauer, Asymptotic results for the empirical process of stationary
sequences. Stoch. Process. Appl. 119, 1298–1324 (2009b)
I. Berkes, L. Horváth, S. Ling, Estimation in nonstationary random coefficient autoregressive
models. J. Time Series Anal. 30, 395–416 (2009c)
I. Berkes, S. Hörmann, J. Schauer, Split invariance principles for stationary processes. Ann.
Probab. 39, 2441–2473 (2011)
I. Berkes, L. Horváth, G. Rice, Weak invariance principles for sums of dependent random
functions. Stoch. Process. Appl. 123, 385–403 (2013)
I. Berkes, W. Liu, W. Wu, Komlós-major-tusnády approximation under dependence. Ann. Probab.
42, 794–817 (2014)
I. Berkes, L. Horváth, G. Rice, On the asymptotic normality of kernel estimators of the long run
covariance of functional time series. J. Multivariate Anal. 144, 150–175 (2016)
A. Betken, Testing for change-points in long-range dependent time series by means of a self-
normalized Wilcoxon test. J. Time Ser. Anal. 37(6), 785–809 (2016)
P. Billingsley, Convergence of Probability Measures (Wiley, New York, 1968)
B. Chen, Y. Hong, Detecting for smooth structural changes in GARCH models. Econom. Theory
32(3), 740–791 (2016)
N.H. Bingham, C.M. Goldie, J.L. Teugels, Regular Variation. Encyclopedia of Mathematics and
its Applications (Cambridge University Press, 1987)
J.R. Blum, J. Kiefer, M. Rosenblatt, Distribution free tests of independence based on the sample
distribution function. Ann. Math. Stat. 32(2), 485–498 (1961)
O. Boldea, A. Cornea-Madeira, A. Hall, Bootstrapping structural change tests. J. Econom. 213,
359–397 (2019)
T. Bollerslev, Modelling the coherence in short run nominal exchange rates: A multivariate
generalized ARCH model. Rev. Econ. Stat. 72, 498–505 (1990). Reprinted in ARCH: Selected
Readings (ed. R. F. Engle), Oxford University Press (1995)
D. Bosq, Linear Processes in Function Spaces (Springer, New York, 2000)
D. Bosq, D. Blanke, Inference and Prediction in Large Dimensions (Wiley, 2007)
F. Boussama, F. Fuchs, R. Stelzer, Stationarity and geometric ergodicity of bekk multivariate garch
models. Stoch. Process. Appl. 121, 2331–2360 (2011)
R.C. Bradley, Introduction to Strong Mixing Conditions, vols. 1,2,3 (Kendrick Press, 2007)
L. Breiman, Probability. Classics in Applied Mathematics (Society for Industrial and Applied
Mathematics, 1968)
P.J. Brockwell, R.A. Davis, Time Series: Theory and Applications, 2nd edn. (Springer, 2006)
P. Brohan, J.J. Kennedy, I. Harris, S.F.B. Tett, P.D. Jones, Uncertainty estimates in regional and
global observed temperature changes: A new data set from 1850. J. Geophys. Res. 111, D12106
(2006)
B. Bucchia, M. Wendler, Change–point detection and bootstrap for Hilbert space valued random
fields. J. Multivariate Anal. 155, 344–368 (2017)
A. Bücher, I. Kojadinovic, T. Rohmer, J. Segers, Detecting changes in cross-sectional dependence
in multivariate time series. J. Multivariate Anal. 132, 111–128 (2014)
A. Bücher, J.-D. Fermanian, I. Kojadinovic, Combining cumulative sum change-point detection
tests for assessing the stationarity of univariate time series. J. Time Ser. Anal. 40(1), 124–150
(2019)
F. Busetti, A. Harvey, Testing for the presence of a random walk in series with structural breaks.
J. Time Ser. Anal. 22, 127–150 (2001)
F. Busetti, A. Harvey, Further comments on stationarity tests in series with structural breaks at
unknown points. J. Time Ser. Anal. 24, 137–140 (2003)
F. Busetti, A. Taylor, Test of stationarity against a change in persistence. J. Econom. 123, 33–66
(2004)
M.M. Carhart, On persistence in mutual fund performance. J. Finance 52(1), 57–82 (1997)
M. Carrasco, X. Chen, Mixing and moment properties of various GARCH and stochastic volatility
models. Econom. Theory 18, 17–39 (2002)
G. Cavaliere, A. Taylor, Testing for a change in persistence in the presence of non–stationary
volatility. J. Econom. 147, 84–98 (2008)
G. Cavaliere, D. Harvey, S. Leybourne, A. Taylor, Testing for unit roots in the presence of a
possible break in trend and nonstationarity volatility. Econom. Theory 27, 957–991 (2011)
C. Cerovecki, C. Francq, S. Hörmann, J.-M. Zakoian, Functional GARCH models: the quasi-likelihood
approach and its applications. J. Econom. 209, 353–375 (2019)
S. Chakar, E. Lebarbier, C. Lévy-Leduc, S. Robin, A robust approach for estimating change-points
in the mean of an AR(1) process. Bernoulli 23(2), 1408–1447 (2017)
J. Chan, L. Horváth, M. Hušková, Darling–Erdős limit results for change-point detection in panel
data. J. Stat. Plan. Inference 143, 955–970 (2013)
H. Chen, Sequential change-point detection based on nearest neighbors. Ann. Stat. 47(3), 1381–
1407 (2019)
K. Chen, A. Cohen, H. Sackrowitz, Consistent multiple testing for change points. J. Multivariate
Anal. 102, 1339–1343 (2011)
L. Chen, W. Wang, W.B. Wu, Inference of breakpoints in high-dimensional time series. J. Am.
Stat. Assoc. (2021)
S. Chenouri, A. Mozaffari, G. Rice, Robust multivariate change point analysis based on data depth.
Canad. J. Stat. 48(3), 417–446 (2020)
J.-M. Chiou, Y.-T. Chen, T. Hsing, Identifying multiple changes for a functional data sequence
with application to freeway traffic segmentation. Ann. Appl. Stat. 13(3), 1430–1463 (2019)
H. Cho, Change-point detection in panel data via double CUSUM statistic. Electron. J. Stat. 10(2),
2000–2038 (2016)
H. Cho, P. Fryzlewicz, Multiple–change–point detection for high dimensional time series via
sparsified binary segmentation. J. R. Stat. Soc. Ser. B 77, 475–507 (2015)
H. Cho, C. Kirch, Data segmentation algorithms: Univariate mean change and beyond. Econom.
Stat. (2021)
T.T.L. Chong, Structural change in AR(1) models. Econom. Theory 17, 87–155 (2001)
Y.S. Chow, A.C. Hsiung, Limiting behavior of $\max_{j\le n} S_j j^{-d}$ and the first passage times in a
random walk with positive drift. Bull. Inst. Math. Acad. Sin. 4, 35–44 (1976)
C.-S.J. Chu, M. Stinchcombe, H. White, Monitoring structural change. Econometrica 64(5), 1045–
65 (1996)
S.A. Churchill, J. Inekwe, K. Ivanovski, R. Smyth, The environmental Kuznets curve in the OECD:
1870–2014. Energy Econ. 75, 389–399 (2018)
G. Ciuperca, A general criterion to determine the number of change–points. Stat. Probab.Lett. 81,
1267–1275 (2011)
G. Claeskens, N.L. Hjort, Model Selection and Model Averaging (Cambridge University Press,
Leiden, 2008)
F. Comte, O. Lieberman, Asymptotic theory for multivariate GARCH processes. J. Multivariate
Anal. 84, 61–84 (2003)
C.M. Crainiceanu, T.J. Vogelsang, Nonmonotonic power for tests of a mean shift in a time series.
J. Stat. Comput. Simul. 77(6), 457–476 (2007)
M. Csörgő, Some rényi type limit theorems for empirical distribution functions. Ann. Math. Stat.
36, 322–326 (1965)
M. Csörgő, L. Horváth, Rényi–type empirical processes. J. Multivariate Anal. 41, 338–358 (1992)
M. Csörgő, L. Horváth, Weighted Approximations in Probability and Statistics (Wiley, New York,
1993)
M. Csörgő, L. Horváth, Limit Theorems in Change–Point Analysis (Wiley, New York, 1997)
M. Csörgő, P. Révész, How big are the increments of a wiener process. Ann. Probab. 7, 731–737
(1979)
M. Csörgő, P. Révész, Strong Approximations in Probability and Statistics (Academic Press, New
York, 1981)
M. Csörgő, S. Csörgő, L. Horváth, D. Mason, Weighted empirical and quantile processes. Ann.
Probab. 14, 31–85 (1986)
M. Csörgő, L. Horváth, Q. Shao, Convergence of integrals of uniform empirical and quantile
processes. Stoch. Process. Appl. 45, 283–294 (1993)
V. Dalla, L. Giraitis, P.M. Robinson, Asymptotic theory for time series with changing mean and
variance. J. Econom. 219(2), 281–313 (2020)
D.A. Darling, P. Erdős, A limit theorem for the maximum of normalized sum of independent
random variables. Duke Math. J. 23, 143–155 (1956)
R.A. Davis, C.Y. Yau, Consistency of minimum description length model selection for piecewise
stationary time series models. Electron. J. Stat. 7, 381–411 (2013)
R.D. Davis, D. Huang, Y-C. Yao, Testing for a change in the parameter values and order of an
autoregressive model. Ann. Stat. 23, 282–304 (1995)
R.A. Davis, T.C.M. Lee, G.A. Rodriguez-Yam, Structural break estimation for nonstationary time
series models. J. Am. Stat. Assoc. 101, 223–239 (2006)
R.A. Davis, T.C.M Lee, G.A. Rodriguez-Yam, Break detection for a class of nonlinear time series
models. J. Time Ser. Anal. 29, 834–867 (2008)
Y. Davydov, R. Zitikis, On weak convergence of random fields. Ann. Inst. Stat. Math. 60, 345–365
(2008)
J. Dedecker, P. Doukhan, G. Lang, J.R. León, S. Louhichi, C. Prieur, Weak Dependence: With
Examples and Applications (Springer, 2007)
H. Dehling, T. Mikosch, M. Sørensen, Empirical Process Techniques for Dependent Data
(Birkhäuser, 2002)
H. Dehling, O. Durieu, D. Volný, New techniques for empirical processes of dependent data. Stoch.
Process. Appl. 119(10), 3699–3718 (2009)
H. Dehling, K. Vuk, M. Wendler, Change-Point Detection Under Dependence Based on Two-
Sample U-Statistics (Springer New York, New York, 2015), pp. 195–220
H. Dehling, K. Vuk, M. Wendler, Change-point detection based on weighted two-sample U-
statistics. Electron. J. Stat. 16(1), 862–891 (2022)
A. Deng, P. Perron, A non-local perspective on the power properties of the CUSUM and CUSUM of
squares tests for structural change. J. Econom. 142, 212–240 (2008)
H. Dette, T. Kutta, Detecting structural breaks in eigensystems of functional time series. Electron.
J. Stat. 15(1), 944–983 (2021)
H. Dette, W. Wu, Z. Zhou, Change point analysis of correlation in non-stationary time series. Stat.
Sin. 29(2), 611–643 (2019)
H. Dette, T. Eckle, M. Vetter, Multiscale change point detection for dependent data. Scand. J. Stat.
47(4), 1243–1274 (2020a)
H. Dette, K. Kokot, S. Volgushev, Testing relevant hypotheses in functional time series via self-
normalization. J. R. Stat. Soc. Ser. B 82(3), 629–660 (2020b)
Y. Dong, J. Spielmann, Weak limits of random coefficient autoregressive processes and their
application in ruin theory. Insurance Math. Econ. 91, 1–11 (2020)
M. Donsker, An invariance principle for certain probability limit theorems. Mem. Am. Math. Soc.
6 (1951)
M. Donsker, Justification and extension of Doob’s heuristic approach to the Kolmogorov-Smirnov
theorems. Ann. Math. Stat. 23, 277–281 (1952)
J. Duan, Augmented GARCH(p,q) process and its diffusion limit. J. Econom. 79, 97–127 (1997)
R.M. Dudley, Uniform Central Limit Theorems (Cambridge University Press, Cambridge, 1999)
L. Dümbgen, The asymptotic behavior of some nonparametric change–point estimators. Ann. Stat.
19, 1471–1495 (1991)
S. Edwards, Change of monetary regime, contracts, and prices: Lessons from the great depression,
1932–1935. J. Int. Money Finance 108, 102190 (2020)
R.F. Engle, K.F. Kroner, Multivariate simultaneous generalized arch. Econom. Theory 11, 122–
150 (1995)
R.F. Engle, V.K. Ng, M. Rothschild, Asset pricing with a factor-arch covariance structure:
Empirical estimates for treasury bills. J. Econom. 45, 213–237 (1990)
T. Erhardsson, Conditions for convergence of random coefficient ar(1) processes and perpetuities
in higher dimensions. Ann. Stat. 20, 990–1005 (2014)
E.F. Fama, K.R. French, Common risk factors in the returns on stocks and bonds. J. Financ. Econ.
33(1), 3–56 (1993). ISSN 0304-405X
P. Fearnhead, G. Rigaill, Changepoint detection in the presence of outliers. J. Am. Stat. Assoc.
114(525), 169–183 (2019)
Q. Feng, C. Kao, Large-dimensional Panel Data Econometrics (World Scientific, 2021)
D. Ferger, Change–point estimators in case of small disorders. J. Stat. Plan. Inference 40, 33–49
(1994)
J.-D. Fermanian, H. Malongo, On the stationarity of dynamical conditional correlation models.
Econom. Theory 33, 636–663 (2017)
F. Ferraty, P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice (Springer, New
York, 2006)
K.J. Forbes, R. Rigobon, No contagion, only interdependence: measuring stock market comove-
ments. J. Finance 57, 2223–2261 (2002)
C. Francq, J.-M. Zakoian, Maximum likelihood estimation of pure GARCH and ARMA-GARCH
processes. Bernoulli 10, 605–637 (2004)
C. Francq, J-M. Zakoian, GARCH Models: Structure, Statistical Inference and Financial
Applications (Wiley, 2010)
C. Francq, J.-M. Zakoian, Comment on “Quasi–maximum likelihood estimation of GARCH
models with heavy tailed likelihoods" by J. Fan, L. Qi and D. Xiu. J. Bus. Econ. Stat. 32,
198–201 (2014)
C. Francq, L. Horváth, J.-M. Zakoian, Variance targeting estimation of multivariate GARCH
models. J. Financ. Econom. 14, 353–382 (2016)
J. Franke, C. Kirch, J. Kamgaing, Changepoints in time series of counts. J. Time Ser. Anal. 33,
757 (2012)
K. Frick, A. Munk, H. Sieling, Multiscale change point inference (with discussion). J. R. Stat.
Soc. Ser. B 76, 495–580 (2014)
P. Fryzlewicz, Wild binary segmentation for multiple change point detection. Ann. Stat. 42, 2243–
2281 (2014)
P. Fryzlewicz, S. Subba Rao, Multiple-change-point detection for auto-regressive conditional
heteroscedastic processes. J. R. Stat. Soc. Ser. B 76, 903–924 (2014)
P. Galeano, D. Pena, Covariance changes detection in multivariate time series. J. Stat. Plan.
Inference 137, 194–211 (2007)
C. Gallagher, R. Lund, R. Killick, X. Shi, Autocovariance estimation in the presence of
changepoints. J. Korean Stat. Soc. 51, 107–433 (2022)
A.M. Garsia, Continuity properties of Gaussian processes with multidimensional time parameter,
in Proceedings of the 6th Berkeley Symp. Math. Stat. Probab., vol. 2 (University of California
Press, 1970), pp. 369–374
A.M. Garsia, E. Rodemich, H. Rumsey, A real variable lemma and the continuity of paths of some
gaussian processes. Indiana Univ. Math. J. 20, 565–578 (1970)
C. Gerstenberger, Robust Wilcoxon-type estimation of change-point location under short-range
dependence. J. Time Ser. Anal. 39, 90–104 (2018)
E. Ghysels, A. Guay, A. Hall, Predictive tests for structural change with unknown breakpoint. J.
Econom. 82, 209–233 (1997)
E. Gombay, Change detection in autoregressive time series. J. Multivariate Anal. 99(3), 451–464
(2008)
E. Gombay, L. Horváth, An application of the maximum likelihood test to the change–point
problem. Stoch. Process. Appl. 50, 161–171 (1994)
S.A. Good, G.K. Corlett, J.J. Remedios, E.J. Noyes, D.T. Llewellyn-Jones, The global trend in sea
surface temperature from 20 years of advanced very high resolution radiometer data. J. Climate
20(7), 1255–1264 (2007)
T. Górecki, L. Horváth, P. Kokoszka, Change point detection in heteroscedastic time series.
Econom. Stat. 20, 86–117 (2017)
T. Górecki, L. Horváth, P. Kokoszka, Change point detection in heteroscedastic time series.
Econom. Stat. 7, 63–88 (2018). ISSN 2452-3062
J. Gösmann, T. Kley, H. Dette, A new approach for open-end sequential change point monitoring.
J. Time Ser. Anal. 42(1), 63–84 (2021)
I. Grabovsky, L. Horváth, M. Hušková, Limit theorems for kernel-type estimators for the time of
change. J. Stat. Plan. Inference 89(1), 25–56 (2000)
U. Grenander, M. Rosenblatt, Statistical Analysis of Stationary Time Series (Wiley, New York,
1957)
G. Grossman, A. Krueger, Environmental impacts of a North American free trade agreement.
National Bureau of Economic Research, Working Paper No. 3914 (1991)
A. Guay, E. Guerre, A data-driven specification test for dynamic regression models. Econom.
Theory 22, 543–586 (2006)
C.M. Hafner, Alternative assets and cryptocurrencies. J. Risk Financ. Manag. 13(1), 7 (2020)
C.M. Hafner, A. Preminger, On asymptotic theory for multivariate GARCH models. J. Multivariate
Anal. 100, 2044–2054 (2009)
P. Hall, C.C. Heyde, Martingale Limit Theory and its Application (Academic Press, 1980)
P. Hall, M. Hosseini-Nasab, On properties of functional principal components. J. R. Stat. Soc. Ser.
B 68, 109–126 (2006)
P. Hall, Q. Yao, Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71,
285–317 (2003)
A.R. Hall, S. Han, O. Boldea, Inference regarding multiple structural changes in linear models
with endogenous regressors. J. Econom. 170, 281–302 (2012)
A.R. Hall, D.R. Osborn, N. Sakkas, Structural break inference using information criteria in
models estimated by two-stage least squares. J. Time Ser. Anal. 36, 741–762 (2015)
L.P. Hansen, Large sample properties of generalized method of moments estimators. Econometrica
50, 1029–1054 (1982)
B.E. Hansen, Tests for parameter instability in regression with I(1) processes. J. Bus. Econ. Stat.
10, 321–335 (1992)
B.E. Hansen, Approximate asymptotic p values for structural change tests. J. Bus. Econ. Stat. 15,
60–67 (1997)
B.E. Hansen, Testing for structural change in conditional models. J. Econom. 97, 93–115 (2000)
T. Harris, B. Li, J.D. Tucker, Scalable multiple changepoint detection for functional data
sequences. Environmetrics 33, e2710 (2021)
D.I. Harvey, S.J. Leybourne, A.M.R. Taylor, Modified tests for a change in persistence. J. Econom.
134, 441–469 (2006)
E. Hewitt, K. Stromberg, Real and Abstract Analysis (Springer, Berlin, 1969)
E. Hillebrand, Neglecting parameter changes in GARCH models. J. Econom. 129, 121–138 (2005)
Z. Hlávka, M. Hušková, C. Kirch, S. Meintanis, Monitoring changes in the error distribution of
autoregressive models based on Fourier methods. Test 21, 605–634 (2012)
Z. Hlávka, M. Hušková, S.G. Meintanis, Change Point Detection with Multivariate Observations
Based on Characteristic Functions (Springer International Publishing, 2017), pp. 273–290
Y. Hoga, Monitoring multivariate time series. J. Multivariate Anal. 155, 105–121 (2017)
Y. Hoga, A structural break test for extremal dependence in β-mixing random vectors. Biometrika
105(3), 627–643 (2018a)
Y. Hoga, Detecting tail risk differences in multivariate time series. J. Time Ser. Anal. 39, 665–689
(2018b)
M. Holmes, I. Kojadinovic, J. Quessy, Nonparametric tests for change-point detection à la Gombay
and Horváth. J. Multivariate Anal. 115, 16–32 (2013)
L. Horváth, G. Rice, Y. Zhao, Testing for changes in linear models using weighted residuals. J.
Multivariate Anal. 198, 105210 (2023)
L. Horváth, L. Trapani, J. VanderDoes, The maximally selected likelihood ratio test in random
coefficient models. Econom. J. (2024), forthcoming
T. Hsing, R. Eubank, Theoretical Foundations of Functional Data Analysis, with an Introduction
to Linear Operators (Wiley, New York, 2015)
M. Hušková, Estimation of a change in linear models. Stat. Probab. Lett. 26, 13–24 (1996)
M. Hušková, C. Kirch, Bootstrapping confidence intervals for the change-point of time series. J.
Time Ser. Anal. 29, 947–972 (2008)
M. Hušková, C. Kirch, A note on studentized confidence intervals for the change-point. Comput.
Stat. 25, 269–289 (2010)
M. Hušková, C. Kirch, Bootstrapping sequential change-point tests for linear regression. Metrika
75(5), 673–708 (2012)
M. Hušková, S.G. Meintanis, Change-point analysis based on empirical characteristic functions of
ranks. Seq. Anal. 25(4), 421–436 (2006a)
M. Hušková, S.G. Meintanis, Change point analysis based on empirical characteristic functions.
Metrika 63, 145–168 (2006b)
R.J. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice, 3rd edn. (OTexts,
Melbourne, 2021), OTexts.com/fpp3
I.A. Ibragimov, Some limit theorems for stationary processes. Theory Probab. Appl. 7, 349–382
(1962)
I.A. Ibragimov, Y.V. Linnik, Independent and Stationary Sequences of Random Variables (Wolters-
Noordhoff, Groningen, 1971)
C. Inclán, G.C. Tiao, Use of cumulative sums of squares for retrospective detection of change of
variance. J. Am. Stat. Assoc. 89, 913–923 (1994)
H. Janečková, Z. Prášková, CWLS and ML estimates in a heteroscedastic RCA(1) model. Stat.
Decis. 22, 245–259 (2004)
D. Jarušková, Asymptotic behaviour of a test statistic for detection of change in mean of vectors.
J. Stat. Plan. Inference 140, 616–625 (2010)
D. Jarušková, Testing for a change in covariance operator. J. Stat. Plan. Inference 143(9), 1500–
1511 (2013)
D. Jarušková, J. Antoch, Changepoint analysis of Klementinum temperature series. Environmetrics
31(1), e2570 (2020)
T. Jeantheau, Strong consistency of estimators for multivariate ARCH models. Econom. Theory 14,
70–86 (1998)
S.T. Jensen, A. Rahbek, Asymptotic inference for nonstationary GARCH. Econom. Theory 20,
1203–1226 (2004)
F. Jiang, Z. Zhao, X. Shao, Time series analysis of COVID-19 infection curve: A change-point
perspective. J. Econom. 232(1), 1–17 (2023)
S. Jiao, R.D. Frostig, H. Ombao, Break point detection for functional covariance. Scand. J. Stat.
50(2), 477–512 (2023)
M. Jirák, Uniform change point tests in high dimension. Ann. Stat. 43, 2451–2483 (2015)
P.D. Jones, Hemispheric surface air temperature variations: A re-analysis and an update to 1993.
J. Climate 7, 1794–1802 (1994)
P.D. Jones, A. Moberg, Hemispheric and large-scale surface air temperature variations: An
extensive revision and an update to 2001. J. Climate 16, 206–223 (2003)
J. Ju, J.Y. Lin, Q. Liu, K. Shi, Structural changes and the real exchange rate dynamics. J. Int.
Money Finance 107, 102192 (2020)
J. Kang, S. Lee, Parameter change test for random coefficient integer-valued autoregressive
processes with application to polio data analysis. J. Time Ser. Anal. 30(2), 239–258 (2009)
H. Kawakatsu, Matrix exponential GARCH. J. Econom. 134, 95–128 (2006)
J. Kiefer, K-sample analogues of the Kolmogorov-Smirnov and Cramér-von Mises tests. Ann. Math.
Stat. 30, 420–447 (1959)
R. Killick, I. Eckley, changepoint: An R package for changepoint analysis. J. Stat. Softw. 58(3),
1–19 (2014)
R. Killick, P. Fearnhead, I. Eckley, Optimal detection of changepoints with a linear computational
cost. J. Am. Stat. Assoc. 107, 1590–1598 (2012)
J. Kim, Detection of change in persistence of a linear time series. J. Econom. 95, 97–116 (2000)
C. Kirch, Block permutation principles for the change analysis of dependent data. J. Stat. Plan.
Inference 137(7), 2453–2474 (2007)
C. Kirch, Bootstrapping sequential change-point tests. Seq. Anal. 27(3), 330–349 (2008)
C. Kirch, J.T. Kamgaing, Testing for parameter stability in nonlinear autoregressive models. J.
Time Ser. Anal. 33, 365–385 (2012)
C. Kirch, P. Klein, Moving sum data segmentation for stochastic processes based on invariance.
Stat. Sin. 33, 873–892 (2021)
C. Kirch, B. Muhsal, H. Ombao, Detection of changes in multivariate time series with application
to EEG data. J. Am. Stat. Assoc. 110, 1197–1216 (2015)
P. Kokoszka, M. Reimherr, Asymptotic normality of the principal components of functional time
series. Stoch. Process. Appl. 123, 1546–1562 (2013)
P. Kokoszka, M. Reimherr, Introduction to Functional Data Analysis (Chapman and Hall/CRC,
Boca Raton, 2017)
P. Kokoszka, G. Rice, H.L. Shang, Inference for the autocovariance of a functional time series
under conditional heteroscedasticity. J. Multivariate Anal. 162, 32–50 (2017)
J. Komlós, P. Major, G. Tusnády, An approximation of partial sums of independent R.V.'s and the
sample DF. I. Z. Wahrsch. Verw. Gebiete 32, 111–131 (1975)
J. Komlós, P. Major, G. Tusnády, An approximation of partial sums of independent R.V.'s and the
sample DF. II. Z. Wahrsch. Verw. Gebiete 34, 33–58 (1976)
A.J. Koning, V. Protasov, Tail behaviour of Gaussian processes with applications to the Brownian
pillow. J. Multivariate Anal. 87(2), 370–397 (2003)
K.K. Korkas, P. Fryzlewicz, Multiple change-point detection for non-stationary time series using
wild binary segmentation. Stat. Sin. 27, 287–311 (2017)
A.P. Korostelev, On minimax estimation of a discontinuous signal. Theory Probab. Appl. 32(4),
727–730 (1988)
S. Kovács, H. Li, P. Bühlmann, A. Munk, Seeded binary segmentation: A general methodology for
fast and optimal changepoint detection. Biometrika 110(1), 249–256 (2023)
W. Krämer, W. Ploberger, R. Alt, Testing for structural change in dynamic models. Econometrica
56, 1355–1370 (1988)
S. Kühnert, Functional ARCH and GARCH models: A Yule–Walker approach. Electron. J. Stat. 14,
4321–4360 (2020)
R.J. Kulperger, On the residuals of autoregressive processes and polynomial regression. Stoch.
Process. Appl. 21, 107–118 (1985)
E. Kurozumi, Confidence sets for the date of a structural change at the end of a sample. J. Time
Ser. Anal. 39, 850–862 (2018)
E. Kurozumi, P. Tuvaandorj, Model selection criteria in multivariate models with multiple
structural changes. J. Econom. 164(2), 218–238 (2011)
S. Kuznets, Economic growth and income inequality. Am. Econ. Rev. 45, 1–28 (1955)
D. Kwiatkowski, P.C.B. Phillips, P. Schmidt, Y. Shin, Testing the null hypothesis of stationarity
against the alternative of a unit root: How sure are we that economic time series have a unit
root? J. Econom. 54(1), 159–178 (1992)
T.L. Lai, H. Xing, Sequential change-point detection when the pre- and post-change parameters
are unknown. Seq. Anal. 29(2), 162–175 (2010)
W. Lai, M.D. Johnson, R. Kucherlapati, P. Park, Comparative analysis of algorithms for identifying
amplifications and deletions in array CGH data. Bioinformatics 21(19), 3763–3770 (2005)
M. Lavielle, Detection of multiple changes in a sequence of dependent variables. Stoch. Process.
Appl. 83, 79–102 (1999)
M. Lavielle, E. Moulines, Least-squares estimation of an unknown number of shifts in time series.
J. Time Ser. Anal. 21, 33–59 (2000)
N. Neumeyer, I. Van Keilegom, Changepoint tests for the error distribution in nonparametric
regression. Scand. J. Stat. 36, 518–541 (2009)
W.K. Newey, K.D. West, A simple, positive semi-definite, heteroskedasticity and autocorrelation
consistent covariance matrix. Econometrica 55, 703–708 (1987)
D.F. Nicholls, B.G. Quinn, Random Coefficient Autoregressive Models: An Introduction (Springer,
New York, 1982)
J. Nyblom, Testing for the constancy of parameters over time. J. Am. Stat. Assoc. 84, 223–230
(1989)
C.M.M. Padilla, D. Wang, Z. Zhao, Y. Yu, Change-point detection for sparse and dense functional
data in general dimensions, in Advances in Neural Information Processing Systems (2022)
E.S. Page, Continuous inspection schemes. Biometrika 41, 100–115 (1954)
E.S. Page, A test for a change in a parameter occurring at an unknown point. Biometrika 42,
523–527 (1955)
J. Pan, J. Chen, Application of modified information criterion to multiple change point problems.
J. Multivariate Anal. 97, 2221–2241 (2006)
V. Panaretos, S. Tavakoli, Fourier analysis of stationary time series in function space. Ann. Stat.
41(2), 568–603 (2013)
T. Pang, D. Zhang, T.T.L. Chong, Asymptotic inferences for an AR(1) model with a change point:
Stationary and non-stationary cases. J. Time Ser. Anal. 35, 133–150 (2014)
T. Pang, T.T.L. Chong, D. Zhang, Non-identification of structural change in non-stationary
AR(1) models. Econom. Theory 34, 985–1017 (2018)
K. Pape, P. Galeano, D. Wied, Sequential detection of parameter changes in dynamic conditional
correlation models. Appl. Stoch. Models Bus. Ind. 37, 475–495 (2021)
E. Parzen, On consistent estimates of the spectrum of stationary time series. Ann. Math. Stat. 28,
329–348 (1957)
R.S. Pedersen, A. Rahbek, Multivariate variance targeting in the BEKK-GARCH model. Econom.
J. 17, 24–55 (2014)
L. Peng, Q. Yao, Least absolute deviations estimation for ARCH and GARCH models. Biometrika
90, 967–975 (2003)
P. Perron, Y. Yamamoto, J. Zhou, Testing jointly for structural changes in the error variance and
coefficients of a linear regression model. Quant. Econ. 11, 1019–1057 (2020)
M. Pešta, M. Wendler, Nuisance-parameter-free changepoint detection in non-stationary series.
Test 29, 379–408 (2020)
V.V. Petrov, Limit Theorems of Probability Theory (Oxford University Press, Oxford, UK, 1995)
J. Petrovich, M. Reimherr, Asymptotic properties of principal component projections with repeated
eigenvalues. Stat. Probab. Lett. 130, 42–48 (2017)
D.T. Pham, The mixing property of bilinear and generalised random coefficient autoregressive
models. Stoch. Process. Appl. 23, 291–300 (1986)
P.C.B. Phillips, S. Shi, Financial bubble implosion and reverse regression. Econom. Theory
34(4) (2018)
P.C.B. Phillips, V. Solo, Asymptotics for linear processes. Ann. Stat. 20, 971–1001 (1992)
P.C.B. Phillips, J. Yu, Dating the timeline of financial bubbles during the subprime crisis. Quant.
Econ. 2(3), 455–491 (2011)
P.C.B. Phillips, S. Shi, J. Yu, Specification sensitivity in right-tailed unit root testing for explosive
behaviour. Oxford Bull. Econ. Stat. 76(3), 315–333 (2014)
P.C.B. Phillips, S. Shi, J. Yu, Testing for multiple bubbles: Historical episodes of exuberance and
collapse in the S&P 500. Int. Econ. Rev. 56(4), 1043–1078 (2015)
V.I. Piterbarg, Asymptotic Methods in the Theory of Gaussian Processes and Fields, Translations
of Mathematical Monographs, vol. 148 (American Mathematical Society, 1996)
D.N. Politis, J.P. Romano, Bias-corrected nonparametric spectral estimation. J. Time Ser. Anal.
16, 67–103 (1995)
M. Pollak, Average run lengths of an optimal method of detecting a change in distribution. Ann.
Stat. 15(2), 749–779 (1987)
C. Qualls, H. Watanabe, Asymptotic properties of Gaussian processes. Ann. Math. Stat. 43, 580–
596 (1972)
R.E. Quandt, Tests of the hypothesis that a linear regression system obeys two separate regimes.
J. Am. Stat. Assoc. 53, 873–880 (1958)
R.E. Quandt, The estimation of the parameters of a linear regression system obeying two separate
regimes. J. Am. Stat. Assoc. 55, 324–330 (1960)
J.O. Ramsay, B.W. Silverman, Functional Data Analysis (Springer, New York, 2002)
J. Reeves, J. Chen, X.L. Wang, R. Lund, Q. Lu, A review and comparison of changepoint detection
techniques for climate data. J. Appl. Meteorol. Climatol. 46(6), 900–915 (2007)
M. Regis, P. Serra, E.R. van den Heuvel, Random autoregressive models: A structured overview.
Econom. Rev. 41, 207–230 (2021)
M. Reimherr, Functional regression with repeated eigenvalues. Stat. Probab. Lett. 107, 62–70
(2015)
A. Rényi, On the theory of order statistics. Acta Math. Acad. Sci. Hungar. 4, 191–231 (1953)
P. Révész, Random Walk in Random and Non-random Environments (World Scientific, Singapore,
1990)
G. Rice, H.L. Shang, A plug-in bandwidth selection procedure for long-run covariance estimation
with stationary functional time series. J. Time Ser. Anal. 38, 591–609 (2017)
G. Rice, C. Zhang, Consistency of binary segmentation for multiple change-point estimation with
functional data. Stat. Probab. Lett. 180, 109228 (2022)
A. Rinaldo, D. Wang, Q. Wen, R. Willett, Y. Yu, Localizing changes in high-dimensional regression
models, in Proceedings of the International Conference on Artificial Intelligence and Statistics
(2021)
S.W. Roberts, A comparison of some control chart procedures. Technometrics 8, 411–430 (1966)
A. Schick, √N-consistent estimation in a random coefficient autoregressive model. Austral. J.
Stat. 38, 155–160 (1996)
A.J. Scott, M. Knott, A cluster analysis method for grouping means in the analysis of variance.
Biometrics 30(3), 507–512 (1974)
M. Serbinowska, Consistency of an estimator of the number of changes in binomial observations.
Stat. Probab. Lett. 29, 337–344 (1996)
M. Shahbaz, A. Sinha, Environmental Kuznets curve for CO2 emissions: a literature survey. J.
Econ. Stud. 46, 106–168 (2019)
Q.-M. Shao, On a conjecture of Révész. Proc. Am. Math. Soc. 123, 575–582 (1995)
X. Shao, Self-normalization for time series: A review of recent developments. J. Am. Stat. Assoc.
110(512), 1797–1817 (2015)
X. Shao, X. Zhang, Testing for change points in time series. J. Am. Stat. Assoc. 105(491), 1228–
1240 (2010)
O.S. Sharipov, M. Wendler, Bootstrapping covariance operators of functional time series. J.
Nonparam. Stat. 32(3), 648–666 (2020)
O. Sharipov, J. Tewes, M. Wendler, Sequential block bootstrap in a Hilbert space with application
to change point analysis. Canad. J. Stat. 44(3), 300–322 (2016)
X. Shi, C. Gallagher, R. Lund, R. Killick, A comparison of single and multiple changepoint
techniques for time series data. Comput. Stat. Data Anal. 170, 107433 (2022)
A.N. Shiryaev, On optimum methods in quickest detection problems. Theory Probab. Appl. 8,
22–46 (1963)
G.R. Shorack, J.A. Wellner, Empirical Processes with Applications to Statistics (Wiley, 1986)
D. Siegmund, Sequential Analysis: Tests and Confidence Intervals (Springer, New York, 2013)
A.V. Skorokhod, Limit theorems for stochastic processes. Theory Probab. Appl. 1, 261–290 (1956)
St. Louis, MO: Federal Reserve Bank of St. Louis. FRED, Federal Reserve Economic Data (2023)
A. Steland, Monitoring procedures to detect unit roots and stationarity. Econom. Theory 23,
1108–1135 (2006)
A. Steland, Testing and estimating change-points in the covariance matrix of a high-dimensional
time series. J. Multivariate Anal. 177, 104582 (2020)
J.H. Stock, M.W. Watson, Disentangling the channels of the 2007–2009 recession. National
Bureau of Economic Research, No. w18094 (2012)
C. Stoehr, J.A.D. Aston, C. Kirch, Detecting changes in the covariance structure of functional time
series with application to fMRI data. Econom. Stat. 18, 44–62 (2021)
Y. Sun, P.C.B. Phillips, S. Jin, Optimal bandwidth selection in heteroskedasticity-autocorrelation
robust testing. Econometrica 76, 175–194 (2008)
D. Surgailis, G. Teyssière, M. Vaičiulis, Detecting and estimating epidemic changes in dependent
functional data. J. Multivariate Anal. 109, 204–220 (2008)
A. Tartakovsky, I. Nikiforov, M. Basseville, Sequential Analysis: Hypothesis Testing and
Changepoint Detection (Chapman and Hall/CRC, New York, 2014)
A. Thavaneswaran, S.S. Appadoo, M. Ghahramani, RCA models with GARCH innovations. Appl.
Math. Lett. 22, 110–114 (2009)
A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes (Springer, 1996)
E.S. Venkatraman, Consistency results in multiple change-point problems. Technical Report
No. 24, Stanford University (1992)
T.J. Vogelsang, Wald-type tests for detecting breaks in the trend function of a dynamic time series.
Econom. Theory 13, 818–848 (1997)
L.Ju. Vostrikova, Detection of “disorder” in multidimensional random processes. Sov. Math. Dokl.
24, 55–59 (1981)
D. Wang, Y. Yu, A. Rinaldo, Univariate mean change point detection: Penalization, CUSUM and
optimality. Electron. J. Stat. 14(1), 1917–1961 (2020)
R. Wang, C. Zhu, S. Volgushev, X. Shao, Inference for change points in high-dimensional data via
self-normalization. Ann. Stat. 50(2), 781–806 (2022)
M.J. Wichura, On the construction of almost uniformly convergent random variables with given
weakly convergent image laws. Ann. Math. Stat. 41, 284–291 (1970)
D. Wied, W. Krämer, H. Dehling, Testing for a change in correlation at an unknown point in time
using an extended functional delta method. Econom. Theory 28, 570–589 (2012)
D. Wied, D. Ziggel, T. Berens, On the application of new tests for structural changes on global
minimum-variance portfolios. Stat. Pap. 54, 955–975 (2013)
C.K. Wikle, A. Zammit-Mangion, N. Cressie, Spatio-Temporal Statistics with R (Chapman and
Hall/CRC, 2019)
J.M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, 2nd edn. (MIT Press,
2010)
C.-F. Wu, Asymptotic theory of nonlinear least squares estimation. Ann. Stat. 9, 501–513 (1981)
W. Wu, Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci. USA 102,
14150–14154 (2005)
W. Wu, Strong invariance principles for dependent random variables. Ann. Probab. 35, 2294–2320
(2007)
J. Wu, Z. Xiao, A powerful test for changing trends in time series models. J. Time Ser. Anal. 39,
488 (2018)
W. Wu, P. Zaffaroni, Asymptotic theory for spectral density estimates of general multivariate time
series. Econom. Theory 34, 1–22 (2018)
K.L. Xu, Testing for structural change under non-stationary variances. Econom. J. 18, 274–305
(2015)
Y.-C. Yao, Estimating the number of change-points via Schwarz's criterion. Stat. Probab. Lett. 6,
181–189 (1988)
Y. Yu, A review on minimax rates in change point detection and localisation (2020)
A. Zeileis, Econometric computing with HC and HAC covariance matrix estimators. J. Stat. Softw.
11(10), 1–17 (2004)
A. Zeileis, F. Leisch, C. Kleiber, K. Hornik, Monitoring structural change in dynamic econometric
models. J. Appl. Econom. 20, 99–121 (2005)
X. Zhang, X. Shao, K. Hayhoe, D.J. Wuebbles, Testing the structural stability of temporally
dependent functional observations and application to climate projections. Electron. J. Stat.
5, 1765–1796 (2011)
Z. Zhou, Heteroscedasticity and autocorrelation robust structural change detection. J. Am. Stat.
Assoc. 108, 726–740 (2013)
K. Zhu, S. Ling, Likelihood ratio tests for the structural change of an AR(p) model to a threshold
AR(p) model. J. Time Ser. Anal. 33, 223–232 (2011)
E. Zivot, J. Wang, Modelling Financial Time Series with S-PLUS (Springer, New York, 2006)