
Springer Series in Statistics

Lajos Horváth
Gregory Rice

Change Point Analysis for Time Series
Springer Series in Statistics

Series Editors
Peter Bühlmann, Seminar für Statistik, ETH Zürich, Zürich, Switzerland
Peter Diggle, Dept. Mathematics, University Lancaster, Lancaster, UK
Ursula Gather, Dortmund, Germany
Scott Zeger, Baltimore, MD, USA
Springer Series in Statistics (SSS) is a series of monographs of general interest that
discuss statistical theory and applications.
The series editors are currently Peter Bühlmann, Peter Diggle, Ursula Gather,
and Scott Zeger. Peter Bickel, Ingram Olkin, and Stephen Fienberg were editors of
the series for many years.
Lajos Horváth · Gregory Rice

Change Point Analysis for Time Series

Lajos Horváth
Department of Mathematics
University of Utah
Salt Lake City, UT, USA

Gregory Rice
Department of Statistics and Actuarial Science
University of Waterloo
Waterloo, ON, Canada

ISSN 0172-7397    ISSN 2197-568X (electronic)
Springer Series in Statistics
ISBN 978-3-031-51608-5    ISBN 978-3-031-51609-2 (eBook)
https://doi.org/10.1007/978-3-031-51609-2

Mathematics Subject Classification: 60-xx, 62-xx

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


To Margaret, Liam, Elliott, Alice, and Ellie
To Brittney, and Clark
Preface

“The only constant in life is change.”


— Heraclitus

This book started with a review article on change point analysis that we wrote for
the journal TEST in 2014. The primary goal of that article was to review and extend
some standard change point techniques to data that exhibited serial dependence, e.g.,
time series. The article was received warmly, and it generated some quite stimulating
discussion. Given this, and the crippling boredom of the pandemic lockdowns of
2020, we decided to expand that article into a book.
In the past decade, the pace of research in change point analysis has grown
tremendously. The topics treated in this book cover only a portion of recent
developments, those closest to our own research interests. The focus is hence
on asymptotic results in change point analysis when the data are time series. We
consider such results in many different settings, including in applications to change
point analysis in popular regression and time series models, as well as to high-
dimensional and function valued time series.
We have tried to write this book so that it will be useful both as a reference
and as a textbook for researchers or graduate students who are trying to learn more
about the subject. Each chapter concludes with bibliographic notes and a number of
exercises that can be used to evaluate one’s grasp of the material, or in structuring
a reading or topics course. After the first chapter, which covers foundational
asymptotic results for cumulative sum processes derived from stationary variables,
each subsequent chapter contains real data examples, mainly in the areas of
economics, finance, and environmetrics, that illustrate the practical application of
the asymptotic results developed.
This book would not have been possible without the help and contributions of
a great many collaborators, students, and friends over the years. These especially
include Jaromir Antoch, Alexander Aue, Patrick Bardsley, István Berkes, Cooper
Boniece, Julian Chan, Shoja Chenouri, Stefan Fremdt, Tomasz Górecki, Robertas
Gabrys, Edit Gombay, Siegfried Hörmann, Zsuzsanna Horváth, Marie Hušková,
Claudia Kirch, Mario Kühn, Piotr Kokoszka, Bo Li, Hemei Li, Shiqing Ling,
Zhenya Liu, Shanlin Lu, Nirian Martin, Curtis Miller, Leandro Pardo, William
Pouliot, Ron Reeder, Matthew Reimherr, Johannes Schauer, Qi-Man Shao, Ozan


Sönmez, Josef Steinebach, Weiqing Tang, Lorenzo Trapani, Jeremy VanderDoes,


Shixuan Wang, Jia Wang, Gabriel Young, Yaosong Zhan, Chi Zhang, and Yuqian
Zhao. In writing this book, we hope to continue the work and legacy of Miklós
Csörgő. We also give special thanks to Greg Preston for his dedicated work in
creating the bibliography. The second author wishes to acknowledge the support
of Natural Science and Engineering Research Council of Canada, grant NSERC
RGPIN 50503-10477.

Salt Lake City, UT, USA Lajos Horváth


Waterloo, ON, Canada Gregory Rice
2023
Contents

1 Cumulative Sum Processes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Weak Convergence of Weighted CUSUM Processes . . . . . . . . . . . . . . . . . 7
1.2.1 Asymptotics of Standardized CUSUM Processes . . . . . . . . . . . 12
1.2.2 Rényi-Type Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3 Multivariate CUSUM Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.5 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Change Point Analysis of the Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1 CUSUM Statistics in the Presence of Change Points . . . . . . . . . . . . . . . . . 29
2.1.1 Testing for Relevant Changes in the Mean . . . . . . . . . . . . . . . . . . 37
2.2 The Asymptotic Properties of Change Point Estimators . . . . . . . . . . . . . 40
2.3 Multiple Changes in the Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3.1 Binary Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3.2 Model Selection Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.3.3 The Asymptotic Distribution of Multiple Change
Point Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3.4 Multivariate Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.4 Classical CUSUM Tests for Changes in Distribution. . . . . . . . . . . . . . . . . 66
2.5 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.7 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3 Variance Estimation, Change Points in Variance, and
Heteroscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.1 Estimation of Long–Run Variances and Covariance Matrices . . . . . . . 89
3.1.1 Serially Uncorrelated Observations. . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.1.2 Serially Correlated Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.1.3 Ratio–Type and Self–Normalized Statistics . . . . . . . . . . . . . . . . . . 109
3.2 Changes in Variances and Covariances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.2.1 Simultaneous Changes in the Mean and Variance . . . . . . . . . . . 116


3.2.2 Changes in the Covariance Matrix of Vector Valued Observations . . . . . . . . . . . . . . 119
3.3 Heteroscedastic Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.4 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
3.6 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4 Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.1 Change Point Detection Methods for Linear Models . . . . . . . . . . . . . . . . . 145
4.1.1 The Quasi–Likelihood Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.1.2 Residual–Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.1.3 Methods Based on Direct Comparison of Parameter
Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.1.4 Heteroscedasticity in the Model Errors . . . . . . . . . . . . . . . . . . . . . . . 161
4.2 Inference for Change–Points in a Linear Model . . . . . . . . . . . . . . . . . . . . . 164
4.2.1 Estimation of a Single Change Point . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.2.2 Estimation of Multiple Change Points. . . . . . . . . . . . . . . . . . . . . . . . 176
4.3 Polynomial Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
4.4 Non–linear Regression and Generalized Method of Moments . . . . . . . 189
4.5 Changes in the Distributions of the Innovations . . . . . . . . . . . . . . . . . . . . . . 192
4.6 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
4.8 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5 Parameter Changes in Time Series Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.1 ARMA Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.1.1 Non–stationary AR(1) Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.2 Dynamic Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
5.3 Random Coefficient Autoregressive Models . . . . . . . . . . . . . . . . . . . . . . . . . . 238
5.4 ARCH, GARCH and Other Volatility Processes . . . . . . . . . . . . . . . . . . . . . . 262
5.5 Vector Autoregressive Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
5.6 Multivariate Volatility Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
5.7 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
5.9 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
6 Sequential Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
6.1 Sequential Detection Procedures and Stopping Times . . . . . . . . . . . . . . . 325
6.2 Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
6.3 Time Series Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
6.4 Distribution of the Stopping Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
6.5 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
6.7 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

7 High-Dimensional and Panel Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
7.1 Change in the Means of High-Dimensional Observations. . . . . . . . . . . . 366
7.2 Panel Models with Common Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
7.3 High-Dimensional Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
7.3.1 Fixed Length Panels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
7.4 Changes in the Parameters of RCA Panel Data Models . . . . . . . . . . . . . . 402
7.5 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
7.7 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
8 Functional Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
8.1 Change Detection in the Mean of Functional Observations . . . . . . . . . . 422
8.2 Estimating Change Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
8.2.1 Estimating a Single Change Point . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
8.2.2 Estimating Multiple Change Points . . . . . . . . . . . . . . . . . . . . . . . . . . 449
8.3 Change in the Covariance of Functional Observations . . . . . . . . . . . . . . . 463
8.3.1 Changes in the Trace and Eigenvalues of the
Covariance Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
8.4 Heteroscedastic Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
8.4.1 Testing for a Change in the Mean. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
8.4.2 Estimating the Time of Change in Heteroscedastic
Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
8.5 Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
8.7 Bibliographic Notes and Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498

A Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
A.1 Weak Convergence and Approximations of Sums . . . . . . . . . . . . . . . . . . . . 501
A.1.1 Weak Convergence of the Empirical Processes
Based on Stationary Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
A.2 Properties of Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
A.3 Functional Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
Notation and Conventions

N, Z, R, C ; the natural numbers, integers, real numbers, and complex numbers.

sargmax_{x∈A} g(x) = min{y : g(y) = max_{x∈A} g(x)} ; the smallest maximal argument of g over the set A.

→^{D}, =^{D} ; converges in distribution, equal in distribution.

→^{P}, →^{a.s.} ; converges in probability, converges almost surely.

→^{D(I)} ; for a hypercube I ⊆ R^d, weak convergence in the Skorokhod topology on D(I).

|A| ; the number of elements in the set A.

dist(A, B) = min_{a∈A, b∈B} |a − b| ; the minimal distance between two finite subsets A and B of R.

log_+ x = max{log x, 1}.

Σ_∅ = 0, Π_∅ = 1 ; an empty sum is equal to zero, an empty product is equal to one.

0, I ; zero vector in R^d, identity matrix in R^{d×d} ; d is made clear by context.

1{A} ; indicator function of the set A.

cov(X, Y), cov(B) ; covariance between X and Y, variance–covariance matrix of B.
Chapter 1
Cumulative Sum Processes

In this chapter we introduce the basic change point in the mean problem for
scalar observations. We see that the most logical and straightforward approaches to
detect such a change point lead to the consideration of weighted functionals of the
cumulative sum (CUSUM) processes computed from the observed data. As such,
we begin by developing a comprehensive asymptotic theory for CUSUM processes
under conditions that allow for serial dependence in the observations. This includes
a careful analysis of how weights applied to the CUSUM process affect the limiting
distribution of its functionals, and extensions to multivariate observations.

1.1 Introduction

We begin by considering a sequence of real valued observations X_1, ..., X_N of length N satisfying EX_i² < ∞. Perhaps the most basic change point problem is to detect and estimate a single potential change in the mean of the observations occurring at an unknown change point k* ∈ {1, ..., N}. This may be described using the model

X_i = μ_0 + E_i,  if 1 ≤ i ≤ k*,
X_i = μ_A + E_i,  if k* + 1 ≤ i ≤ N.   (1.1.1)

Here k* denotes a potential change point in the mean, and assuming that

EE_i = 0 and EE_i² = σ² for all i ∈ {1, ..., N}

holds, μ_0 and μ_A describe the means of the observations before and after the change point k*, respectively. The model (1.1.1) is called the At Most One Change (AMOC) in the mean model. Detecting a change point may then be framed as a hypothesis
testing problem in which we wish to test the null hypothesis

H_0 : μ_0 = μ_A,   (1.1.2)

i.e. there is no change in the mean of the observations, versus the alternative

H_A : μ_0 ≠ μ_A.   (1.1.3)

If the change point k* were known, then the hypotheses H_0 and H_A revert to a two sample problem to test for a difference in the population means between the samples X_1, ..., X_{k*} and X_{k*+1}, ..., X_N. Assuming for the moment that σ² is also known, it would be natural to test H_0 versus H_A using the two sample z-test. After some simple algebra, this amounts to rejecting H_0 in favor of H_A for large values of the test statistic

( X̄_{k*,1} − X̄_{k*,2} ) / ( σ [N/(k*(N − k*))]^{1/2} ),   (1.1.4)

where

X̄_{k,1} = (1/k) Σ_{i=1}^{k} X_i  and  X̄_{k,2} = (1/(N − k)) Σ_{i=k+1}^{N} X_i.   (1.1.5)

If for instance the errors E_1, ..., E_N are independent and identically distributed normal random variables with a known variance, this is equivalent to the likelihood ratio test for the equality of the means of the two samples before and after the point k*. Moreover, if E_1, ..., E_N are independent and identically distributed, but not necessarily normally distributed, then the above statistic has approximately a standard normal distribution for large N due to the central limit theorem.
Since the change point k* is unknown, a logical approach to test H_0 versus H_A is to maximize the two-sample test statistic over all potential change point locations, which leads to considering the statistic

T_{N,1} = max_{1≤k<N} (k(N − k))^{1/2} |X̄_{k,1} − X̄_{k,2}| / N^{1/2}.

We will reject H_0 in favor of H_A for large values of T_{N,1}. Some immediate observations related to the definition of T_{N,1} are as follows. Elementary algebra gives

[k(N − k)]^{1/2}/N^{1/2} · |X̄_{k,1} − X̄_{k,2}| = [N/(k(N − k))]^{1/2} | Σ_{i=1}^{k} X_i − (k/N) Σ_{i=1}^{N} X_i |.   (1.1.6)

The process appearing on the right-hand side of (1.1.6) can be expressed in terms of the cumulative sum (CUSUM) process,

Z_N(t) = N^{−1/2} ( Σ_{i=1}^{⌊Nt⌋} X_i − (⌊Nt⌋/N) Σ_{i=1}^{N} X_i ),  t ∈ [0, 1],   (1.1.7)

where ⌊x⌋ denotes the integer part of x. The statistic T_{N,1} is hence the maximum of the weighted CUSUM process Z_N(t) with the weight function w(t) = [t(1 − t)]^{1/2}, 0 < t < 1, i.e.

T_{N,1} = sup_{t∈[1/N, 1−1/N]} |Z_N(t)| / [t(1 − t)]^{1/2}.

Since w(t) = [t(1 − t)]^{1/2} is proportional to the standard deviation of Z_N(t), the process Z_N(t)/[t(1 − t)]^{1/2} is sometimes called the standardized CUSUM process.

Interestingly, if the observations X_1, ..., X_N are independent and identically distributed, so that H_0 holds, the law of the iterated logarithm for partial sums (Breiman, 1968, p. 291) implies that

lim_{N→∞} T_{N,1} / (2 log log N)^{1/2} = σ  a.s.   (1.1.8)

As a result, when H_0 holds and the observations are independent and identically distributed, the statistic T_{N,1} satisfies T_{N,1} →^P ∞ as N → ∞. What then is an appropriate threshold that, if exceeded by T_{N,1}, leads us to favor H_A over H_0? Or more generally, what is the approximate null distribution of T_{N,1}? We may ask the same
questions about more general change point test statistics of the form

sup_{t∈(0,1)} |Z_N(t)| / w(t)

for a weight function w(·). In order to answer these questions, and many others like them that arise in change point analysis, we begin with a detailed account of the asymptotic properties of Z_N/w and its functionals in terms of the properties of the model errors and the weight function w(·). Since most data to which we wish to apply change point analysis are sequentially collected, either as time series or by observing other sequential processes, it is to be expected that the observations are serially dependent. As such it is useful to understand how serial dependence among the observations influences the distribution of the process Z_N/w.
We say that {W(x), x ≥ 0} is a Wiener process, or a standard Brownian motion, if it is a continuous Gaussian process defined for x ≥ 0 with EW(x) = 0 and EW(x)W(y) = min(x, y). For the construction and properties of the Wiener process, we refer to Csörgő and Révész (1981). The continuous Gaussian process B(t) = W(t) − tW(1) for 0 ≤ t ≤ 1 is called a Brownian bridge. It follows that EB(t) = 0 and E[B(t)B(s)] = min(t, s) − ts.
In many cases what is needed to derive the asymptotic distribution of functionals of Z_N/w is the following Gaussian approximation, which we phrase as an assumption:

Assumption 1.1.1 For each N there are two independent Wiener processes {W_{N,1}(t), 0 ≤ t ≤ N/2}, {W_{N,2}(t), 0 ≤ t ≤ N/2} and σ > 0 such that

sup_{1≤k≤N/2} | Σ_{i=1}^{k} E_i − σ W_{N,1}(k) | = o_P(N^{1/2})

and

sup_{N/2<k<N} | Σ_{i=k+1}^{N} E_i − σ W_{N,2}(N − k) | = o_P(N^{1/2}).

In case of independent and identically distributed E_i's, σ² may be taken to be their variance, whereas for more general stationary sequences σ² is the long-run variance, which we will introduce below. There is a large literature on the weak convergence of the partial sum process, and in Appendix A.1 we survey the topic and give a general approach allowing one to establish Assumption 1.1.1 for large classes of stationary, serially dependent processes.
An important, general class of examples of error sequences for which Assumption 1.1.1 holds we term "L^ν-decomposable" sequences. This notion characterizes errors that evolve as weakly dependent and stationary processes generated from an underlying independent and identically distributed innovation sequence.
Definition 1.1.1 We say that the scalar sequence {E_i, i ∈ Z} is an L^ν-decomposable Bernoulli shift, or simply L^ν-decomposable, if for some ν ≥ 2,
(1) {E_i, i ∈ Z} is a causal Bernoulli shift, which is to say that E_i = g(η_i, η_{i−1}, ...), where {η_i, i ∈ Z} are independent and identically distributed random variables taking values in a measurable space S, and g is a (deterministic) measurable function, g : S^∞ → R, and
(2) {E_i, i ∈ Z} satisfies the moment and weak dependence conditions EE_i = 0, E|E_i|^ν < ∞, and

v_m = ( E|E_i − E*_{i,m}|^ν )^{1/ν} ≤ a m^{−α}  with some a > 0 and α > 2,

where E*_{i,m} = g(η_i, ..., η_{i−m+1}, η*_{i−m}, η*_{i−m−1}, ...), and {η*_k, k ∈ Z} are independent, identically distributed copies of η_0, independent of {η_j, j ∈ Z}.
The notion of L^ν-decomposability is discussed in greater detail in Sect. A.1 of the appendix. L^ν-decomposable sequences include observations generated from most stationary time series models, which are often defined through an underlying
innovation sequence and structural equations. These include stationary solutions to autoregressive moving average (ARMA) models, generalized autoregressive conditionally heteroscedastic (GARCH) models, as well as many other stationary sequences. In essence, Definition 1.1.1(2) aims to characterize the rate at which the error sequence E_i can be approximated by a sequence exhibiting a finite range of dependence. We note that L^ν-decomposable errors satisfy Assumption 1.2.2 (see Sect. A.1 of the appendix).
One of the great advantages afforded by the framework of L^ν-decomposability is the ease with which it generalizes to more complicated spaces and settings. For example, if the observations and errors take their values in a general normed space rather than R, we may replace the absolute value |·| in Definition 1.1.1 with the norm ||·|| on the space, and immediately get a useful and general characterization of weakly dependent sequences in that space. As such we make use of this notion throughout this text to unify the study of asymptotics in change point analysis for serially dependent observations over many different spaces and settings.
Under Assumption 1.1.1, the CUSUM process can be approximated by a
Brownian bridge.
Theorem 1.1.1 If H_0 of (1.1.2) and Assumption 1.1.1 are satisfied, we may define a sequence of Brownian bridges {B_N(t), 0 ≤ t ≤ 1} such that

sup_{0≤t≤1} |Z_N(t) − σ B_N(t)| = o_P(1).

Proof We note that under the null hypothesis Z_N(t) does not depend on the mean, and so may be expressed in terms of the errors E_1, ..., E_N. We write

Σ_{i=1}^{k} E_i − (k/N) Σ_{i=1}^{N} E_i
  = Σ_{i=1}^{k} E_i − (k/N) ( Σ_{i=1}^{⌊N/2⌋} E_i + Σ_{i=⌊N/2⌋+1}^{N} E_i ),  if 1 ≤ k ≤ N/2,
  = −Σ_{i=k+1}^{N} E_i + ((N − k)/N) ( Σ_{i=1}^{⌊N/2⌋} E_i + Σ_{i=⌊N/2⌋+1}^{N} E_i ),  if N/2 < k < N.   (1.1.9)

Using the Wiener processes of Assumption 1.1.1, we define according to (1.1.9)

Γ_N(x) = W_{N,1}(x) − (x/N) ( W_{N,1}(N/2) + W_{N,2}(N/2) ),  if 0 ≤ x ≤ N/2,
Γ_N(x) = −W_{N,2}(N − x) + ((N − x)/N) ( W_{N,1}(N/2) + W_{N,2}(N/2) ),  if N/2 ≤ x ≤ N.   (1.1.10)
By Assumption 1.1.1,

max_{1≤k≤N} | Σ_{i=1}^{k} E_i − (k/N) Σ_{i=1}^{N} E_i − σ Γ_N(k) | = o_P(N^{1/2}).   (1.1.11)

If

B_N(t) = N^{−1/2} Γ_N(Nt),  0 ≤ t ≤ 1,   (1.1.12)

then for each N, B_N is a continuous Gaussian process, and one can check that EB_N(t) = 0 and EB_N(t)B_N(s) = min(t, s) − ts. This implies that B_N is distributed as a Brownian bridge. Applying (1.1.11), we conclude

sup_{0≤t≤1} |Z_N(t) − σ B_N(t)|
  ≤ max_{1≤k≤N} |Z_N(k/N) − σ B_N(k/N)| + sup_{0≤t≤1−1/N} sup_{0≤s≤1/N} |B_N(t + s) − B_N(t)|
  = o_P(1),

since by the uniform continuity of the Brownian bridge (see Appendix A.2)

sup_{0≤t≤1−1/N} sup_{0≤s≤1/N} |B_N(t + s) − B_N(t)| = o_P(1).   □

Theorem 1.1.1 implies the weak convergence of functionals of Z_N that are continuous with respect to the supremum norm. For example, the Kolmogorov–Smirnov and Cramér–von Mises functionals of Z_N satisfy, under Assumption 1.1.1 and H_0,

(1/σ) sup_{0≤t≤1} |Z_N(t)| →^D sup_{0≤t≤1} |B(t)|,   (1.1.13)

and

(1/σ²) ∫_0^1 Z_N²(t) dt →^D ∫_0^1 B²(t) dt.   (1.1.14)

Two asymptotically size α tests of H_0 under Assumption 1.1.1 may be constructed by rejecting H_0 if either statistic on the left-hand side of (1.1.13) or (1.1.14) exceeds the 1 − α quantile of its limiting distribution. In order to practically implement these tests, one usually needs to replace σ² with a consistent estimator, say σ̂_N², satisfying

|σ̂_N² − σ²| = o_P(1).   (1.1.15)
Combining Slutsky's theorem with (1.1.13) and (1.1.14) we conclude

(1/σ̂_N) sup_{0≤t≤1} |Z_N(t)| →^D sup_{0≤t≤1} |B(t)|

and

(1/σ̂_N²) ∫_0^1 Z_N²(t) dt →^D ∫_0^1 B²(t) dt.

In case of independent and identically distributed or stationary and uncorrelated observations, the sample variance satisfies (1.1.15). We discuss the estimation of σ² for stationary sequences in Chap. 3.
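
The following sketch (ours) assembles these pieces into an asymptotically size α Kolmogorov–Smirnov-type test based on (1.1.13): the 1 − α quantile of sup_{0≤t≤1} |B(t)| is approximated by Monte Carlo, and σ̂_N is taken to be the sample standard deviation, which is adequate for serially uncorrelated observations.

import numpy as np

rng = np.random.default_rng(0)

def sup_bridge_quantile(alpha, n_grid=1000, n_rep=5000):
    """Monte Carlo 1 - alpha quantile of sup_t |B(t)|."""
    steps = rng.normal(size=(n_rep, n_grid)) / np.sqrt(n_grid)
    w = np.cumsum(steps, axis=1)            # Wiener paths on [0, 1]
    t = np.arange(1, n_grid + 1) / n_grid
    b = w - t * w[:, -1:]                   # B(t) = W(t) - t W(1)
    return np.quantile(np.abs(b).max(axis=1), 1 - alpha)

def cusum_test(x, alpha=0.05):
    """Reject H_0 when sup_t |Z_N(t)| / sigma_hat exceeds the limit quantile."""
    N = len(x)
    S = np.cumsum(x)
    k = np.arange(1, N)
    z = (S[:-1] - (k / N) * S[-1]) / np.sqrt(N)
    stat = np.abs(z).max() / np.std(x, ddof=1)   # sample s.d. as sigma_hat
    return stat, stat > sup_bridge_quantile(alpha)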

1.2 Weak Convergence of Weighted CUSUM Processes

It has been observed that functionals of the weighted CUSUM process Z_N/w, with weights w(·) that ascribe more weight to the CUSUM process near zero and one, can improve the power of CUSUM statistics to detect changes that might occur anywhere in the sample, especially near the end points. In this subsection we consider the asymptotic properties of weighted CUSUM processes.
Before proceeding, we note that the Brownian bridge {B(t), 0 ≤ t ≤ 1} is "symmetric" in the sense that it has the same behavior around 0 and 1. However, {Z_N(t), 0 ≤ t ≤ 1} behaves differently around 0 and 1: according to its definition, Z_N(t) is proportional to t around 0, while it is approximately constant near 1. As such it may be shown that Z_N(t)/w(t) → ∞ a.s., as t → 1, for any N, if w(1) = 0. So following Csörgő and Horváth (1993) we modify the definition of Z_N(t). Let

Q_N(t) = Z_N(t(N + 1)/N),  0 < t < 1,   (1.2.1)

and define

T_{N,2} = sup_{0<t<1} |Q_N(t)| / w(t).

We wish to establish conditions under which

T_{N,2} →^D sup_{0<t<1} |B(t)| / w(t),   (1.2.2)

where {B(t), 0 ≤ t ≤ 1} is a Brownian bridge. Evidently (1.2.2) requires that

P ( sup_{0<t<1} |B(t)| / w(t) < ∞ ) = 1.   (1.2.3)

We consider non-negative weight functions with the following properties:


Assumption 1.2.1 (i) inf_{δ≤t≤1−δ} w(t) > 0 for all 0 < δ < 1/2, (ii) w(t) is non decreasing in a neighbourhood of 0, (iii) w(t) is non increasing in a neighbourhood of 1.

Typical examples of weight functions that we consider are of the form w(t) = [t(1 − t)]^β, which satisfy Assumption 1.2.1. Let

I(w, c) = ∫_0^1 (1/(t(1 − t))) exp( −c w²(t) / (t(1 − t)) ) dt.   (1.2.4)

The necessary and sufficient condition for (1.2.3) is given by an integral test concerning I(w, c).

Theorem 1.2.1 If Assumption 1.2.1 is satisfied, then (1.2.3) holds if and only if I(w, c) < ∞ for some c > 0.

Csörgő and Horváth (1993, p. 181) contains a detailed proof of Theorem 1.2.1.
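
A quick numerical check (ours) of the integral test for the family w(t) = [t(1 − t)]^β: truncating (1.2.4) to [ε, 1 − ε] and exploiting the symmetry of the integrand about t = 1/2, the integral stabilizes as ε → 0 when β < 1/2 but grows without bound at β = 1/2, foreshadowing the failure of (1.2.2) for the standardized weight discussed in Sect. 1.2.1.

import numpy as np

def I_wc(beta, c, eps):
    """Integral (1.2.4) over [eps, 1 - eps] for w(t) = [t(1-t)]^beta."""
    t = np.geomspace(eps, 0.5, 200001)        # log-spaced grid; use symmetry
    u = t * (1.0 - t)
    f = np.exp(-c * u ** (2.0 * beta - 1.0)) / u
    return 2.0 * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))  # trapezoid rule

for eps in (1e-3, 1e-5, 1e-7):
    print(eps, I_wc(0.25, 1.0, eps), I_wc(0.5, 1.0, eps))
# beta = 0.25 stabilizes; beta = 0.5 grows like 2 exp(-c) log(1/eps).
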
In order to establish (1.2.2), we need a stronger condition than Assumption 1.1.1 containing a rate of approximation:

Assumption 1.2.2 For each N there are two independent Wiener processes {W_{N,1}(t), 0 ≤ t ≤ N/2}, {W_{N,2}(t), 0 ≤ t ≤ N/2}, σ > 0 and ζ < 1/2 such that

sup_{1≤k≤N/2} k^{−ζ} | Σ_{i=1}^{k} E_i − σ W_{N,1}(k) | = O_P(1)

and

sup_{N/2<k<N} (N − k)^{−ζ} | Σ_{i=k+1}^{N} E_i − σ W_{N,2}(N − k) | = O_P(1).

A thorough discussion of this approximation is given in Appendix A.1, but we remark here that it holds for many stationary processes under mild weak dependence and moment conditions. These include L^ν-decomposable sequences as in Definition 1.1.1.
Under this assumption we may extend (1.1.13) to T_{N,2}.

Theorem 1.2.2 If H_0 of (1.1.2), Assumptions 1.2.1 and 1.2.2 are satisfied, and I(w, c) < ∞ for some c > 0, then

(1/σ) T_{N,2} →^D sup_{0<t<1} |B(t)| / w(t),   (1.2.5)

where {B(t), 0 ≤ t ≤ 1} denotes a Brownian bridge.


Proof First we show that

sup_{1/(N+1)≤t≤1−1/(N+1)} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{1/2} = O_P(1),   (1.2.6)

where {B_N(t), 0 ≤ t ≤ 1} is the Brownian bridge of (1.1.12). Using Theorem A.2.1 we show that

sup_{1/(N+1)≤t≤1−1/(N+1)} |W(Nt) − W(⌊Nt⌋)| / (log_+ Nt)^{1/2} = O_P(1),   (1.2.7)

where {W(t), t ≥ 0} is a Wiener process. According to Theorem A.2.1, for any M > 0 we have that

P ( sup_{1/(N+1)≤t≤1−1/(N+1)} |W(Nt) − W(⌊Nt⌋)| / (log_+(Nt))^{1/2} > M )
  ≤ P ( max_{0≤k≤N+1} sup_{0≤s≤1} |W(k + s) − W(k)| / (log_+(k))^{1/2} > M )
  ≤ Σ_{k=0}^{∞} P ( sup_{0≤s≤1} |W(k + s) − W(k)| / (log_+(k))^{1/2} > M )
  ≤ c_1 Σ_{k=2}^{∞} k exp( −(M²/3) log k )
  → 0, as M → ∞,

where c_1 is a positive constant. This gives (1.2.7). By Assumption 1.2.2 we get

sup_{1/(N+1)≤t≤1/2} (Nt)^{−ζ} | Σ_{i=1}^{⌊Nt⌋} E_i − σ W_{N,1}(Nt) | = O_P(1)   (1.2.8)

and

sup_{1/2<t<1−1/(N+1)} (N(1 − t))^{−ζ} | Σ_{i=⌊Nt⌋+1}^{N} E_i − σ W_{N,2}(N(1 − t)) | = O_P(1).   (1.2.9)

Combining (1.1.9), (1.2.8) and (1.2.9) we obtain (1.2.6). Namely,

sup_{1/(N+1)≤t≤1−1/(N+1)} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{1/2}
  ≤ ( sup_{1/(N+1)≤t≤1−1/(N+1)} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{ζ} ) × sup_{1/(N+1)≤t≤1−1/(N+1)} [t(1 − t)]^{ζ−1/2}
  = O_P(1).

Also, for all 0 < δ < 1/2,

sup_{δ≤t≤1−δ} |Q_N(t) − σ B_N(t)| / w(t) = o_P(1),   (1.2.10)

on account of Assumption 1.2.1(i).
Next we show that for all x > 0

lim_{δ→0} lim sup_{N→∞} P ( sup_{1/(N+1)≤t≤δ} |Q_N(t) − σ B_N(t)| / w(t) > x ) = 0.   (1.2.11)

Since I(w, c) < ∞ with some c > 0 and Assumption 1.2.1(ii) holds, Csörgő and Horváth (1993, p. 180) yields

lim_{t→0} t^{1/2} / w(t) = 0,   (1.2.12)

and therefore (1.2.11) follows from (1.2.6). It follows similarly that for all x > 0

lim_{δ→0} lim sup_{N→∞} P ( sup_{1−δ≤t≤1−1/(N+1)} |Q_N(t) − σ B_N(t)| / w(t) > x ) = 0.   (1.2.13)

Putting together (1.2.10), (1.2.11) and (1.2.13) we conclude

sup_{1/(N+1)≤t≤1−1/(N+1)} |Q_N(t) − σ B_N(t)| / w(t) = o_P(1).

The distribution of {B_N(t), 0 ≤ t ≤ 1} does not depend on N, and therefore

sup_{1/(N+1)≤t≤1−1/(N+1)} |B_N(t)| / w(t) →^D sup_{0<t<1} |B(t)| / w(t),

where {B(t), 0 ≤ t ≤ 1} is a Brownian bridge. Using the definition of Q_N, (1.1.9) and (1.2.12) we get that

sup_{0<t≤1/(N+1)} |Q_N(t)| / w(t) = sup_{0<t≤1/(N+1)} (t/w(t)) N^{−1/2} | Σ_{i=1}^{N} E_i | = o_P(1),
and similarly

sup_{1−1/(N+1)<t<1} |Q_N(t)| / w(t) = o_P(1).

The proof of Theorem 1.2.2 is now complete.   □



Remark 1.2.1 In light of Theorem 1.2.1, (1.2.5) holds if and only if I(w, c) < ∞ for some c > 0.

Theorem 1.2.2 implies the convergence in distribution of the Kolmogorov–Smirnov statistic T_{N,2}. Similarly, we may define the Cramér–von Mises type statistics

T_{N,3} = ∫_0^1 |Q_N(t)|^p / w(t) dt,

where p ≥ 1. If p = 2, T_{N,3} is a weighted Cramér–von Mises statistic, and p = 1, w(t) = [t(1 − t)]^{1/2} gives the Anderson–Darling statistic. We wish to show that

T_{N,3} →^D σ^p ∫_0^1 |B(t)|^p / w(t) dt,   (1.2.14)

where {B(t), 0 ≤ t ≤ 1} is a Brownian bridge. The basic outline of the proof of Theorem 1.2.2 can be followed to obtain (1.2.14). Firstly, we require that the limit in (1.2.14) is well defined. The following theorem obtained by Csörgő et al. (1993) (see Csörgő and Horváth, 1993, pp. 314–316) gives a characterization of the existence of integrals of the weighted Brownian bridge.

Theorem 1.2.3 If Assumption 1.2.1(i) is satisfied, then

P ( ∫_0^1 |B(t)|^p / w(t) dt < ∞ ) = 1

if and only if

∫_0^1 [t(1 − t)]^{p/2} / w(t) dt < ∞.   (1.2.15)

Mimicking the proof of Theorem 1.2.2 one can verify the following result:

Theorem 1.2.4 If p ≥ 1, H_0 of (1.1.2), Assumptions 1.2.1(i), 1.2.2 and (1.2.15) are satisfied, then

(1/σ^p) T_{N,3} →^D ∫_0^1 |B(t)|^p / w(t) dt,   (1.2.16)

where {B(t), 0 ≤ t ≤ 1} is a Brownian bridge.
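
On the grid t_k = k/(N + 1), k = 1, ..., N, the statistic T_{N,3} has a simple Riemann-sum form; the sketch below (ours) covers the weighted Cramér–von Mises case p = 2 and the Anderson–Darling case p = 1, β = 1/2, for weights w(t) = [t(1 − t)]^β.

import numpy as np

def t_n3(x, p=2, beta=0.5):
    """Riemann approximation of T_{N,3} = int |Q_N(t)|^p / w(t) dt."""
    N = len(x)
    S = np.cumsum(x)
    k = np.arange(1, N + 1)
    t = k / (N + 1.0)
    Q = (S - (k / N) * S[-1]) / np.sqrt(N)   # Q_N(k/(N+1)) = Z_N(k/N)
    w = (t * (1.0 - t)) ** beta
    return np.sum(np.abs(Q) ** p / w) / (N + 1.0)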



1.2.1 Asymptotics of Standardized CUSUM Processes

We now return to the supremum functional of the standardized CUSUM process

T_{N,4} = (1/σ) T_{N,1} = (1/σ) max_{1≤k<N} [N/(k(N − k))]^{1/2} | Σ_{i=1}^{k} X_i − (k/N) Σ_{i=1}^{N} X_i |.

Under the null hypothesis, T_{N,4} is equivalent to

T_{N,5} = (1/σ) sup_{0<t<1} |Q_N(t)| / [t(1 − t)]^{1/2},

where Q_N(t) is defined in (1.2.1). As a result of (1.1.8) we have that T_{N,4} →^P ∞ and T_{N,5} →^P ∞ under H_0. Also, due to Theorem 1.2.1, (1.2.2) cannot hold since I([t(1 − t)]^{1/2}, c) = ∞ for all c > 0. One alteration of T_{N,4} that leads to a statistic with a well defined limiting distribution under H_0 and Assumption 1.1.1 is to "trim" the domain on which the CUSUM process is maximized. This leads to statistics of the form

T_{N,6} = (1/σ) max_{⌊Nα_1⌋≤k≤⌊Nα_2⌋} [N/(k(N − k))]^{1/2} | Σ_{i=1}^{k} X_i − (k/N) Σ_{i=1}^{N} X_i |,

where 0 < α_1 < α_2 < 1. It follows from Theorem 1.1.1 and the continuous mapping theorem that

T_{N,6} →^D sup_{α_1≤t≤α_2} |B(t)| / [t(1 − t)]^{1/2},   (1.2.17)

where {B(t), 0 ≤ t ≤ 1} is a Brownian bridge. The result also follows from Theorem 1.2.2 with w(t) = [t(1 − t)]^{1/2} 1{α_1 ≤ t ≤ α_2}. It may be shown that

sup_{α_1≤t≤α_2} |B(t)| / [t(1 − t)]^{1/2} =^D sup_{0≤u≤c(α_1,α_2)} |V(u)|,

where

c(α_1, α_2) = (1/2) ( log(α_2/(1 − α_2)) − log(α_1/(1 − α_1)) )

and {V(u), u ≥ 0} is an Ornstein–Uhlenbeck process, i.e. V is a Gaussian process with EV(u) = 0 and EV(u)V(v) = exp(−|u − v|).
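
Critical values of the trimmed statistic T_{N,6} can therefore be simulated directly from this Ornstein–Uhlenbeck representation. In the sketch below (ours), V is generated by its exact Gaussian AR(1) transition on a grid over [0, c(α_1, α_2)]; the grid size and names are illustrative.

import numpy as np

rng = np.random.default_rng(2)

def c_alpha(a1, a2):
    return 0.5 * (np.log(a2 / (1 - a2)) - np.log(a1 / (1 - a1)))

def sup_ou_quantile(a1, a2, q=0.95, n_grid=4000, n_rep=20000):
    c = c_alpha(a1, a2)
    du = c / n_grid
    phi = np.exp(-du)                       # exact OU transition over du
    s = np.sqrt(1.0 - phi ** 2)
    v = rng.normal(size=n_rep)              # stationary start, V(0) ~ N(0, 1)
    sup = np.abs(v)
    for _ in range(n_grid):
        v = phi * v + s * rng.normal(size=n_rep)
        sup = np.maximum(sup, np.abs(v))
    return np.quantile(sup, q)

print(sup_ou_quantile(0.15, 0.85))          # e.g. alpha_1 = 0.15, alpha_2 = 0.85
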
Two drawbacks of statistics sharing the form of T_{N,6} are that they have reduced power to detect change points that occur outside of the interval [⌊Nα_1⌋, ⌊Nα_2⌋], and they also depend on the practitioner's choice of α_1 and α_2.
Alternatively, it can be shown that T_{N,4} and T_{N,5} converge in distribution to extreme-value laws upon proper centralization and normalization, which we now show. The following result is sometimes referred to in the literature as a "Darling–Erdős" result, since the basic idea behind it is to apply a Gaussian approximation as introduced in Darling and Erdős (1956).
Let for x > 0

a(x) = (2 log x)^{1/2}  and  b(x) = 2 log x + (1/2) log log x − (1/2) log π.   (1.2.18)

Theorem 1.2.5 If H_0 of (1.1.2) and Assumption 1.2.2 are satisfied, then

lim_{N→∞} P { a(log N) T_{N,5} ≤ x + b(log N) } = exp(−2e^{−x})

for all x ∈ R.
Proof We divide the unit interval (0, 1) into 5 subintervals. Let t_1 = t_1(N) = (log N)^4/N, t_2 = t_2(N) = 1 − (log N)^4/N, and define

A_{N,1} = (1/σ) sup_{0<t<1/(N+1)} |Q_N(t)| / [t(1 − t)]^{1/2},
A_{N,2} = (1/σ) sup_{1/(N+1)≤t<t_1} |Q_N(t)| / [t(1 − t)]^{1/2},
A_{N,3} = (1/σ) sup_{t_1≤t≤t_2} |Q_N(t)| / [t(1 − t)]^{1/2},
A_{N,4} = (1/σ) sup_{t_2<t≤1−1/(N+1)} |Q_N(t)| / [t(1 − t)]^{1/2}

and

A_{N,5} = (1/σ) sup_{1−1/(N+1)<t<1} |Q_N(t)| / [t(1 − t)]^{1/2}.

With this notation we can write

T_{N,5} = max_{1≤i≤5} A_{N,i}.

In the first step we establish that

lim_{N→∞} P { T_{N,5} = A_{N,3} } = 1.   (1.2.19)

Theorem A.2.3 yields

sup_{1/(N+1)≤t≤t_1} |B_N(t)| / [t(1 − t)]^{1/2} = O_P((log log log N)^{1/2}),   (1.2.20)

sup_{t_2≤t≤1−1/(N+1)} |B_N(t)| / [t(1 − t)]^{1/2} = O_P((log log log N)^{1/2}),   (1.2.21)

and

(1/(2 log log N)^{1/2}) sup_{t_1≤t≤t_2} |B_N(t)| / [t(1 − t)]^{1/2} →^P 1,   (1.2.22)

where B_N is the Brownian bridge defined in (1.2.6). The approximation in (1.2.6) with (1.2.20)–(1.2.22) yields that

(1/(2 log log N)^{1/2}) sup_{1/(N+1)≤t≤1−1/(N+1)} |Q_N(t)| / [t(1 − t)]^{1/2} →^P σ,

and

A_{N,2} = O_P((log log log N)^{1/2}),  A_{N,4} = O_P((log log log N)^{1/2}).

By the definition of Q_N(t),

A_{N,1} = sup_{0<t≤1/(N+1)} (t/[t(1 − t)]^{1/2}) N^{−1/2} | Σ_{i=1}^{N} E_i | = O_P(N^{−1/2})

and similarly,

A_{N,5} = O_P(N^{−1/2}).

Hence (1.2.19) is proven.


Using again (1.2.8) and (1.2.9) with the decomposition in (1.1.10), we obtain that

sup_{t_1≤t≤t_2} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{1/2}
  ≤ ( sup_{t_1≤t≤t_2} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{ζ} ) sup_{t_1≤t≤t_2} [t(1 − t)]^{ζ−1/2}
  ≤ O_P(N^{ζ−1/2}) (t_1(1 − t_1))^{ζ−1/2}
  = O_P( N^{ζ−1/2} ((log N)^4/N)^{ζ−1/2} )
  = O_P( (log N)^{4(ζ−1/2)} )
  = o_P( (log log N)^{−1/2} ),   (1.2.23)

where we have used that t_2 = 1 − t_1. Thus we conclude

| sup_{0<t<1} |Q_N(t)| / [t(1 − t)]^{1/2} − σ sup_{1/(N+1)≤t≤1−1/(N+1)} |B_N(t)| / [t(1 − t)]^{1/2} | = o_P((log log N)^{−1/2}).   (1.2.24)

Hence we only need to establish the extreme value limit result for the supremum of the standardized Brownian bridge on [1/(N+1), 1 − 1/(N+1)], noting that the distribution of B_N does not depend on N. This is shown in Theorem A.2.3 of the appendix, which implies that

lim_{N→∞} P { a((1/2) log r_N) sup_{1/(N+1)≤t≤1−1/(N+1)} |B_N(t)| / [t(1 − t)]^{1/2} ≤ x + b((1/2) log r_N) } = exp(−2e^{−x})

for all x ∈ R, where r_N = (1 − 1/(N + 1))²(N + 1)². Elementary arguments give that

| a((1/2) log r_N) − a(log N) | = o((log log N)^{−1/2})

and

| b((1/2) log r_N) − b(log N) | = o(1),

completing the proof of Theorem 1.2.5.   □



If σ in the definition of T_{N,4} and T_{N,5} is replaced with an estimator σ̂_N, then Theorem 1.2.5 will remain true if

|σ̂_N − σ| = o_P((log log N)^{−1/2}),   (1.2.25)

which is somewhat stronger than (1.1.15).
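
Since the limit in Theorem 1.2.5 is of Gumbel type, asymptotic critical values for T_{N,5} are available in closed form: setting exp(−2e^{−x}) = 1 − α gives x_α = −log(−log(1 − α)/2), and the test rejects when T_{N,5} > (x_α + b(log N))/a(log N). A small sketch (ours):

import numpy as np

def darling_erdos_critical(N, alpha=0.05):
    L = np.log(N)
    a = np.sqrt(2.0 * np.log(L))                            # a(log N)
    b = 2.0 * np.log(L) + 0.5 * np.log(np.log(L)) \
        - 0.5 * np.log(np.pi)                               # b(log N)
    x = -np.log(-np.log(1.0 - alpha) / 2.0)
    return (x + b) / a

for N in (200, 1000, 5000):
    print(N, darling_erdos_critical(N))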


Theorem 1.2.4 is the L^p version of Theorem 1.2.2, and we can also provide an L^p formulation of the Darling–Erdős type limit result of Theorem 1.2.5. Let for p ≥ 1

a_*(p) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} |xy|^p ( (1/(2π(1 − exp(−2|u|))^{1/2})) exp( −(x² + y² − 2 exp(−|u|) xy) / (2(1 − exp(−2|u|))) ) − φ(x)φ(y) ) dx dy du,

and

b_*(p) = ∫_{−∞}^{∞} |x|^p φ(x) dx,

where

φ(x) = (1/(2π)^{1/2}) exp(−x²/2)

is the standard normal density function.

Theorem 1.2.6 Let p ≥ 1. If H_0 of (1.1.2) and Assumption 1.2.2 are satisfied, then

( 1/(4 a_*(p) log N) )^{1/2} ( (1/σ^p) ∫_0^1 |Q_N(t)|^p / [t(1 − t)]^{1+p/2} dt − 2 b_*(p) log N ) →^D N,

where N denotes a standard normal random variable.

We note that Theorem 1.2.6 remains true if σ is replaced by an estimator σ̂_N satisfying

|σ̂_N − σ| = o_P((log N)^{−1/2}).   (1.2.26)

1.2.2 Rényi-Type Statistics

It may be shown under the no-change null hypothesis, and with independent and identically distributed errors in model (1.1.1), that the limiting distribution of sup_{0<t<1} |Q_N(t)|/[t(1 − t)]^κ for κ > 1/2 is not the supremum of a Gaussian process, but rather is determined by the random variable sup_{1≤k<∞} (1/k^κ) | Σ_{j=1}^{k} X_j |. In order to modify such statistics so that their limit distribution is a functional of a Gaussian process, and to increase their power against change points that may lie near the end points of the sample, Rényi suggested applying such heavier weights to the CUSUM process, but with an alternate trimming scheme when compared to that used in (1.2.17). Let the trimming parameters t_1 = t_1(N) < t_2 = t_2(N) satisfy the following condition:

Assumption 1.2.3 (i) min(t_1(N), 1 − t_2(N)) → 0, (ii) N min(t_1(N), 1 − t_2(N)) → ∞.

We define the statistic

T_{N,7} = r_N^{κ−1/2} (1/σ) sup_{t_1≤t≤t_2} |Q_N(t)| / [t(1 − t)]^κ,
where

r_N = min(t_1(N), 1 − t_2(N)).   (1.2.27)

The limiting distribution of T_{N,7} may be expressed using the random variables a_1(κ) and a_2(κ), which we take to be independent such that a_1(κ) =^D a_2(κ) =^D sup_{1≤t<∞} |W(t)|/t^κ. We introduce

lim_{N→∞} r_N/t_1(N) = γ_1,  lim_{N→∞} r_N/(1 − t_2(N)) = γ_2,   (1.2.28)

and

a(κ) = max( γ_1^{κ−1/2} a_1(κ), γ_2^{κ−1/2} a_2(κ) ).   (1.2.29)

Theorem 1.2.7 If H_0 of (1.1.2), Assumptions 1.2.2 and 1.2.3 are satisfied, and κ > 1/2, then

T_{N,7} →^D a(κ).

Proof It follows from Assumption 1.2.2 that with the sequence of Brownian bridges in (1.1.12) we have

N^{1/2−ζ} sup_{1/(N+1)≤t≤1−1/(N+1)} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{ζ} = O_P(1).

Hence

r_N^{κ−1/2} sup_{t_1≤t≤t_2} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^κ
  ≤ r_N^{κ−1/2} ( sup_{t_1≤t≤t_2} |Q_N(t) − σ B_N(t)| / [t(1 − t)]^{ζ} ) sup_{t_1≤t≤t_2} [t(1 − t)]^{ζ−κ}
  = O_P(N^{ζ−1/2}) r_N^{κ−1/2} ( t_1^{ζ−κ} + (1 − t_2)^{ζ−κ} )
  = O_P( (N t_1)^{ζ−1/2} + (N(1 − t_2))^{ζ−1/2} )
  = o_P(1)

using Assumption 1.2.3(ii), since ζ < 1/2. Now the result follows from Theorem A.2.5.   □

Remark 1.2.2 The "difference in sample mean" version of the statistic suggested by Rényi is

T_{N,8} = max_{a(N)≤k≤b(N)} | X̄_{k,1} − X̄_{k,2} |,

where X̄_{k,1} and X̄_{k,2} are the sample means of the first k and last N − k observations, as defined in (1.1.5). To get the limit distribution of T_{N,8}, we apply Theorem 1.2.7 with κ = 1.
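
Quantiles of the limit variable a(κ) can be approximated by Monte Carlo; in the symmetric case t_1 = 1 − t_2 we have γ_1 = γ_2 = 1, and a(κ) is the maximum of two independent copies of sup_{1≤t<∞} |W(t)|/t^κ. The sketch below (ours) simulates that supremum on a log-spaced grid truncated at a large horizon T; the truncation is harmless since |W(t)|/t^κ → 0 a.s. for κ > 1/2.

import numpy as np

rng = np.random.default_rng(3)

def renyi_sup_samples(kappa, T=1e4, n_grid=20000, n_rep=2000):
    t = np.geomspace(1.0, T, n_grid)      # grid on [1, T]
    dt = np.diff(t, prepend=0.0)          # dt[0] = 1, so Var W(t[0]) = 1
    out = np.empty(n_rep)
    for r in range(n_rep):
        w = np.cumsum(rng.normal(size=n_grid) * np.sqrt(dt))
        out[r] = np.max(np.abs(w) / t ** kappa)
    return out

print(np.quantile(renyi_sup_samples(1.0), [0.90, 0.95, 0.99]))
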
The L^p analogue of T_{N,7} is

T_{N,9} = r_N^{κ−p/2−1} (1/σ^p) ∫_{t_1}^{t_2} |Q_N(t)|^p / [t(1 − t)]^κ dt,

where r_N is defined in (1.2.27). Let b_1(p, κ) and b_2(p, κ) be independent random variables such that

b_1(p, κ) =^D b_2(p, κ) =^D ∫_1^∞ |W(t)|^p / t^κ dt.

The limit distribution is

b(p, κ) = γ_1^{κ−p/2−1} b_1(p, κ) + γ_2^{κ−p/2−1} b_2(p, κ),

where γ_1 and γ_2 are defined in (1.2.28). The proof of the below result mimics that of Theorem 1.2.7, where the salient difference lies in replacing the use of Theorem A.2.5 with Theorem A.2.6.

Theorem 1.2.8 Let p ≥ 1. If H_0 of (1.1.2), Assumptions 1.2.2 and 1.2.3 are satisfied, and κ > p/2 + 1, then

T_{N,9} →^D b(p, κ).

1.3 Multivariate CUSUM Processes

It is relatively straightforward to state and prove multivariate analogs of the results stated in Sects. 1.1 and 1.2. The AMOC in the mean model for vector-valued data may be written as

X_i = μ_0 + E_i,  if 1 ≤ i ≤ k*,
X_i = μ_A + E_i,  if k* + 1 ≤ i ≤ N,   (1.3.1)

where X_i, μ_0, μ_A and E_i take values in R^d, EE_i = 0, and k* is an unknown change point. One often wishes to test H_0 of (1.1.2) against the alternative H_A in (1.1.3) with the vectors μ_0 and μ_A replacing the scalar means, respectively.
We introduce the d dimensional cumulative sum (CUSUM) process

Z_N(t) = N^{−1/2} ( Σ_{i=1}^{⌊Nt⌋} X_i − (⌊Nt⌋/N) Σ_{i=1}^{N} X_i ),  0 ≤ t ≤ 1.

Let ||·|| and (·)^T denote respectively the Euclidean or Frobenius norm and the transpose of vectors and matrices. We replace Assumption 1.1.1 with

Assumption 1.3.1 For each N there are two independent Gaussian processes {Γ_{N,1}(t), 0 ≤ t ≤ N/2}, {Γ_{N,2}(t), 0 ≤ t ≤ N/2} with values in R^d such that

sup_{1≤k≤N/2} || Σ_{i=1}^{k} E_i − Γ_{N,1}(k) || = o_P(N^{1/2}),

sup_{N/2<k≤N} || Σ_{i=k+1}^{N} E_i − Γ_{N,2}(N − k) || = o_P(N^{1/2}),

and EΓ_{N,1}(t) = 0, EΓ_{N,2}(t) = 0, EΓ_{N,1}(t)Γ_{N,1}^T(s) = min(t, s)Σ and EΓ_{N,2}(t)Γ_{N,2}^T(s) = min(t, s)Σ.

The coordinates of Γ_{N,1} and Γ_{N,2} are dependent Brownian motions with variances σ_{i,i}, 1 ≤ i ≤ d, where Σ = {σ_{i,j}, 1 ≤ i, j ≤ d} is the covariance matrix of the limiting Gaussian processes Γ_{N,1} and Γ_{N,2}. Assumption 1.3.1 implies that if H_0 holds, then there are Gaussian processes {B_N(t), 0 ≤ t ≤ 1} with values in R^d such that

sup_{0≤t≤1} ||Z_N(t) − B_N(t)|| = o_P(1),   (1.3.2)

EB_N(t) = 0 and EB_N(t)B_N^T(s) = (min(t, s) − ts)Σ. Hence the coordinates of B_N(t) are dependent Brownian bridges scaled with σ_{i,i}^{1/2}, 1 ≤ i ≤ d. If Σ is nonsingular, the weak convergence in (1.3.2) can be rewritten as

sup_{0≤t≤1} | Z_N^T(t) Σ^{−1} Z_N(t) − B̄_N^T(t) B̄_N(t) | = o_P(1),   (1.3.3)

where B̄_N(t) = (B̄_{N,1}(t), ..., B̄_{N,d}(t))^T and {B̄_{N,i}(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d}, are independent Brownian bridges. To use (1.3.2) or (1.3.3) in statistical inference, we often need a consistent estimator Σ̂_N satisfying

||Σ̂_N − Σ|| = o_P(1).

We discuss the estimation of Σ in Sect. 3.1.
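
For concreteness, here is a sketch (ours) of the quadratic-form statistic appearing in (1.3.3), with Σ̂_N taken as the sample covariance matrix (adequate for serially uncorrelated observations; the general case is treated in Sect. 3.1).

import numpy as np

def quad_form_cusum(X, Sigma_hat=None):
    """max_k Z_N(k/N)^T Sigma^{-1} Z_N(k/N) for an (N, d) sample X."""
    N, d = X.shape
    if Sigma_hat is None:
        Sigma_hat = np.cov(X, rowvar=False)   # sample covariance
    S = np.cumsum(X, axis=0)
    k = np.arange(1, N)[:, None]
    Z = (S[:-1] - (k / N) * S[-1]) / np.sqrt(N)
    Sinv = np.linalg.inv(Sigma_hat)
    return np.max(np.einsum("kd,de,ke->k", Z, Sinv, Z))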


The study of the convergence in distribution of functionals of the weighted .ZN
requires an extension of Assumption 1.2.2 to vector-valued observations, which we
state as follows:
Assumption 1.3.2 For each N there are two independent Gaussian processes
{┌ N,1 (t), 0 ≤ t ≤ N/2}, .{┌ N,2 (t), 0 ≤ t ≤ N/2} with values in .Rd and .ζ < 1/2
.

such that
| k |
|E |
| |
. sup k −ζ | E i − ┌ N,1 (k)| = OP (1)
1≤k≤N/2 | |
i=1

and
| N |
| E |
−ζ | |
. sup (N − k) | E i − ┌ N,2 (N − k)| = OP (1),
N/2<k<N | |
i=k+1

┌ N,1 (t) = 0, .┌ N,2 (t) = 0, .E┌ N,1 (t)┌ T


.
T
N,1 (t) = min(t, s)E and .E┌ N,2 (t)┌ N,2 (t) =
min(t, s)E.
We investigate the asymptotic properties of functionals of the weighted

QN (t) = ZN (t (N + 1)/N),
. 0 ≤ t ≤ 1. (1.3.4)

To extend Theorems 1.2.2–1.2.8 to the vector valued CUSUM process, we require two weighted approximations:

sup_{1/(N+1)≤t≤1−1/(N+1)} ||Q_N(t) − B_N(t)|| / [t(1 − t)]^{1/2} = O_P(1),   (1.3.5)

and

N^{1/2−ζ} sup_{1/(N+1)≤t≤1−1/(N+1)} ||Q_N(t) − B_N(t)|| / [t(1 − t)]^{ζ} = O_P(1),   (1.3.6)

where

B_N(t) = N^{−1/2} Γ_N(Nt),  0 ≤ t ≤ 1,

and

Γ_N(x) = Γ_{N,1}(x) − (x/N) ( Γ_{N,1}(N/2) + Γ_{N,2}(N/2) ),  if 0 ≤ x ≤ N/2,
Γ_N(x) = −Γ_{N,2}(N − x) + ((N − x)/N) ( Γ_{N,1}(N/2) + Γ_{N,2}(N/2) ),  if N/2 ≤ x ≤ N.   (1.3.7)

Elementary arguments yield that B_N ∈ R^d is a Gaussian process, and for t, s ∈ [0, 1], EB_N(t) = 0 and EB_N(t)B_N^T(s) = (min(t, s) − ts)Σ. The proofs of Theorems 1.2.2–1.2.8 use weighted approximations. Using the analogous multivariate approximations in (1.3.5) and (1.3.6), we can obtain asymptotic approximations for vector valued processes or for corresponding quadratic forms of those processes.
Assumption 1.3.3 Σ is a nonsingular matrix.

If Assumption 1.3.3 holds, we can linearly transform the limit process into a multivariate process with components that are independent Brownian bridges. Since the distribution of {B_N(t), 0 ≤ t ≤ 1} does not depend on N, we define B and B̄ as

{B(t), 0 ≤ t ≤ 1} =^D {B_N(t), 0 ≤ t ≤ 1}  and  {B̄(t), 0 ≤ t ≤ 1} =^D {Σ^{−1/2} B_N(t), 0 ≤ t ≤ 1},

where Σ^{1/2} is the (positive definite) square root of Σ, and Σ^{−1/2} = (Σ^{1/2})^{−1}. We recall the integral criterion based on I(w, c) from Theorem 1.2.1. Based on the weighted approximation of the R^d valued CUSUM process Q_N(t), we have that

sup_{0<t<1} ||Q_N(t)|| / w(t) →^D sup_{0<t<1} ||B(t)|| / w(t),

and

sup_{0<t<1} Q_N^T(t) Σ^{−1} Q_N(t) / w²(t) →^D sup_{0<t<1} ||B̄(t)||² / w²(t)   (1.3.8)

if and only if I(w, c) < ∞ for some c > 0. It is more difficult to obtain the generalization of the Darling–Erdős result of Theorem 1.2.5 to Q_N(t). Since cov(B(t)) = t(1 − t)Σ and cov(B̄(t)) = t(1 − t)I_d, a standardized vector-valued CUSUM process is obtained by weighting the CUSUM process with the weight function [t(1 − t)]^{1/2}. The limit distribution of sup_{1/(N+1)≤t≤1−1/(N+1)} ||B(t)||/[t(1 − t)]^{1/2} is unknown for general Σ. It is known though for B̄(t), and is given in Theorem A.2.7. Let

a(x) = (2 log x)^{1/2}  and  b_d(x) = 2 log x + (d/2) log log x − log Γ(d/2),   (1.3.9)
where Γ(·) denotes the Gamma function.

Theorem 1.3.1 If H_0 and Assumptions 1.3.2 and 1.3.3 are satisfied, then

lim_{N→∞} P { a(log N) sup_{0<t<1} (Q_N^T(t) Σ^{−1} Q_N(t))^{1/2} / [t(1 − t)]^{1/2} ≤ x + b_d(log N) } = exp(−2e^{−x})

for all x ∈ R.

Proof It follows as in the proof of (1.2.24) in establishing Theorem 1.2.5 that

| sup_{0<t<1} (Q_N^T(t) Σ^{−1} Q_N(t))^{1/2} / [t(1 − t)]^{1/2} − sup_{1/(N+1)≤t≤1−1/(N+1)} (B_N^T(t) Σ^{−1} B_N(t))^{1/2} / [t(1 − t)]^{1/2} | = o_P((log log N)^{−1/2}).

We note that

{ B_N^T(t) Σ^{−1} B_N(t), 0 ≤ t ≤ 1 } =^D { Σ_{i=1}^{d} B_i²(t), 0 ≤ t ≤ 1 },

where {B_i(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d}, are independent Brownian bridges. Now the result follows from Theorem A.2.7 in the Appendix.   □

The multivariate Darling–Erdős result can also be extended to L^p functionals. Let

a_*(p, d) = ∫_{R^{2d+1}} ( Σ_{i=1}^{d} x_i² )^{p/2} ( Σ_{i=1}^{d} y_i² )^{p/2} [ Π_{i=1}^{d} (2π(1 − exp(−|u|)))^{−1/2} exp( −(x_i² + y_i² − 2 exp(−|u|/2) x_i y_i) / (2(1 − exp(−|u|))) ) − Π_{i=1}^{d} φ(x_i)φ(y_i) ] ( Π_{i=1}^{d} dx_i ) ( Π_{i=1}^{d} dy_i ) du,

where φ(x) denotes the standard normal density function, and

b_*(p, d) = 2^{p/2} Γ((p + d)/2) / Γ(d/2),

where Γ(x) is the Gamma function. The proof of the following result is similar to that of Theorem 1.3.1.

Theorem 1.3.2 Let p ≥ 1. If H_0 and Assumptions 1.3.2 and 1.3.3 are satisfied, then

( 1/(4 a_*(p, d) log N) )^{1/2} ( ∫_0^1 ( Q_N^T(t) Σ^{−1} Q_N(t) )^{p/2} / [t(1 − t)]^{p/2+1} dt − 2 b_*(p, d) log N ) →^D N,

where N denotes a standard normal random variable.


The techniques used in the proofs of Theorems 1.2.5 and 1.3.1 may also be used to establish weak convergence results for Rényi-style functionals of the vector-valued CUSUM process using heavier weights. To state such results we introduce the limiting random variables. Let {W(t), t ≥ 0} be a Brownian motion in R^d with covariance Σ, i.e. it is a Gaussian process with EW(t) = 0 and EW(t)W^T(s) = min(t, s)Σ. We define the independent random variables ā_{1,Σ}(d, κ) and ā_{2,Σ}(d, κ) with distribution

ā_{1,Σ}(d, κ) =^D ā_{2,Σ}(d, κ) =^D sup_{1≤t<∞} ||W(t)|| / t^κ.

Let

ā_Σ(d, κ) = max( γ_1^{κ−1/2} ā_{1,Σ}(d, κ), γ_2^{κ−1/2} ā_{2,Σ}(d, κ) ).   (1.3.10)

Similarly, let b̄_{1,Σ}(p, κ), b̄_{2,Σ}(p, κ) be independent and identically distributed with

b̄_{1,Σ}(p, κ) =^D b̄_{2,Σ}(p, κ) =^D ∫_1^∞ ||W(t)||^p / t^κ dt,

and define

b̄_Σ(p, κ) = γ_1^{κ−p/2−1} b̄_{1,Σ}(p, κ) + γ_2^{κ−p/2−1} b̄_{2,Σ}(p, κ).

Theorem 1.3.3 If H_0 and Assumptions 1.2.3 and 1.3.2 hold, and κ > 1/2, then we have

r_N^{κ−1/2} sup_{t_1≤t≤t_2} ||Q_N(t)|| / [t(1 − t)]^κ →^D ā_Σ(d, κ),

where r_N and ā_Σ(d, κ) are defined in (1.2.27) and (1.3.10).
The last result of this section gives the convergence in distribution of the integrals of the heavily weighted CUSUM process.

Theorem 1.3.4 Let p ≥ 1. If H_0 and Assumptions 1.2.3 and 1.3.2 hold, and κ > p/2 + 1, then we have

r_N^{κ−p/2−1} ∫_{t_1}^{t_2} ||Q_N(t)||^p / [t(1 − t)]^κ dt →^D b̄_Σ(p, κ),

where r_N is defined in (1.2.27).

Remark 1.3.1 The results of Theorems 1.3.3 and 1.3.4 hold when ||Q_N(t)|| is replaced by the standardized quadratic form [Q_N^T(t) Σ^{−1} Q_N(t)]^{1/2}, but then the Wiener process W with values in R^d defining the limiting random variables has I_d as its covariance matrix.


So far we have considered the asymptotic properties of functionals of the Euclidean norm and quadratic forms of multivariate CUSUM processes. It is also natural to consider the supremum norm. Let

z_N(t) = Σ^{−1/2} Q_N(t),  0 ≤ t ≤ 1,   (1.3.11)

and write z_N(t) = (z_{N,1}(t), ..., z_{N,d}(t))^T.
The proofs of the following two results are similar to those of Theorems 1.2.2 and 1.3.1, and so they are omitted here.
Theorem 1.3.5 We assume that H_0 and Assumptions 1.3.2 and 1.3.3 hold.
(i) If Assumption 1.2.1 is satisfied and I(w, c) < ∞ with some c > 0, then

max_{1≤j≤d} sup_{0<t<1} |z_{N,j}(t)| / w(t) →^D max_{1≤j≤d} sup_{0<t<1} |B_j(t)| / w(t),

where {B_1(t), 0 ≤ t ≤ 1}, ..., {B_d(t), 0 ≤ t ≤ 1} are independent Brownian bridges.
(ii) For all x ∈ R,

lim_{N→∞} P ( a(log N) max_{1≤j≤d} sup_{0<t<1} |z_{N,j}(t)| / [t(1 − t)]^{1/2} ≤ x + b(log N) ) = exp(−2de^{−x}),

where a(x) and b(x) are defined in (1.2.18).


Principal component analysis provides another method to transform .QN into a
process with asymptotically independent coordinates, as well as to perform efficient
dimension reduction of the process. Let .λ1 ≥ . . . ≥ λd > 0 and .v1 , . . . , vd be
the eigenvalues and the corresponding eigenvalues of .E. We project .QN into the
directions of the leading p eigenvectors of the matrix .E:

zj,N
. (t) = QT
N (t)vj , 1 ≤ j ≤ p. (1.3.12)
1.4 Exercises 25

∗ (t).
The following result is a restatement of Theorem 1.3.5 for the processes .zj,N
Theorem 1.3.6 We assume that .H0 , Assumptions 1.3.2 and 1.3.3 hold.
(i) If Assumption 1.2.1 is satisfied and .I (w, c) < ∞ with some .c > 0, then for any
.p ∈ {1, . . . , d},


−1/2 |zN,j (t)| D |Bj (t)|
. max max λ → max max ,
1≤j ≤p 0<t<1 j w(t) 1≤j ≤p 0<t<1 w(t)

where .{Bj (t), 0 ≤ t ≤ 1}, .j ∈ {1, . . . , p} are independent Brownian bridges.


(ii) For all .x ∈ R
⎧ ∗ (t)|

−1/2 |zN,j
. lim P a(log N) max max λj ≤ x + b(log N)
N →∞ 1≤j ≤p 0<t<1 [t (1 − t)]1/2

= exp(−2pe−x ),

where .a(x) and .b(x) are defined in (1.2.18).

1.4 Exercises

Exercise 1.4.1 Let {Ei , i ∈ Z} be a stationary AR(1) process, i.e.

Ei = ρEi−1 + ηi ,
. i ∈ Z,

where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 , E|η0 |ν < ∞ with some ν > 2 and |ρ| < 1. Show that there
is a Wiener process {W (x), x ≥ 0} such that
| k |
|E |
| σ |
.| Ei − W (k)| = oP (k 1/ν ).
| 1−ρ |
i=1

Exercise 1.4.2 Let {Ei , i ∈ Z} be a stationary ARMA(p, q) process, i.e.

E
p E
q
.Ei = φj Ei−j + ηi + ψj ηi−j , i ∈ Z,
j =1 j =1
26 1 Cumulative Sum Processes

where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 and E|η0 |ν < ∞ with some ν > 2. Find τ and construct a
Wiener process {W (x), x ≥ 0} such that
| k |
|E |
| |
.| Ei − τ W (k)| = oP (k 1/ν ).
| |
i=1

Exercise 1.4.3 Let {Ei , i ∈ Z} be defined by

E
M
Ei =
. cl ηi−l i ∈ Z,
l=−M

where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 and E|η0 |ν < ∞ with some ν > 2 (finite order linear process).
Find τ and construct a Wiener process {W (x), x ≥ 0} such that
| k |
|E |
| |
.| Ei − τ W (k)| = oP (k 1/ν ).
| |
i=1

Exercise 1.4.4 Show that a stationary ARMA(p, q) sequence

E
p E
q
Ei =
. φj Ei−j + ηi + ψj ηi−j , i ∈ Z,
j =1 j =1

where {ηi , i ∈ Z} are independent and identically distributed random variables


with Eη0 = 0, Eη02 = σ 2 and E|η0 |ν < ∞ with some ν > 4, satisfies Lν -
decomposability.
Exercise 1.4.5 Let X1 , . . . , XN be independent Poisson(λi ), i ∈ {1, . . . , N} ran-
dom variables. We wish to test H0 : λ1 = λ2 = . . . = λN against the alternative
that there is k1 such that λ1 = λ2 = . . . = λk1 /= λk1 +1 = λk1 +2 = . . . = λN . Find
the asymptotic distribution of the maximally selected likelihood ratio statistic under
the null hypothesis.
Exercise 1.4.6 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = μ2 = . . . = μN against the at least
one change alternative using the scan statistic
| |
|k+h
E |
|
−α |
N
|
.TN = h
α−1/2
max (k + h ) X |
i| ,
N 1≤k≤N −hN
N |
| i=k |
1.5 Bibliographic Notes and Remarks 27

where α > 2. Assume that 1 ≤ hN < N and hN → ∞. Compute the limit


distribution of TN under the null hypothesis.
Exercise 1.4.7 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = μ2 = . . . = μN against the at most
two changes alternative HA : μ1 = μ2 = . . . = μk1 /= μk1 +1 = μk1 +2 = . . . =
μk2 /= μk2 +1 = μk2 +2 = . . . = μN . We use the statistic
{ | |
TN = N −3/2
. max max k(m − k) |X̄0,k − X̄k,m | ,
1≤k<m<N
| |}
(m − k)(N − m) |X̄k,m − X̄m,N | ,

where X̄j,k is defined in (2.6.1). Compute the limit distribution of TN under the null
hypothesis.
Exercise 1.4.8 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = μ2 = . . . = μN against the at least
one change alternative using the statistic
{
TN =
. max N −3/2 j (k − j )|X̄0,j − X̄j,k | + (k − j )(l − k)|X̄j,k − X̄k,l |
1≤j <k<l
}
+(l − k)(N − l)|X̄k,l − X̄l,N | .

Compute the limit distribution of TN under the null hypothesis.

1.5 Bibliographic Notes and Remarks

Results on approximation of the partial sum process as in Assumption 1.1.1


have been established in numerous papers for many specialized processes. These
include processes that satisfy mixing conditions, linear processes, and volatility
processes. Horváth and Steinebach (2000) and Horváth and Rice (2014) showed
that a weak approximation of the partial sum process with a rate such as in
Assumption 1.1.1 automatically implies the weak convergence of the weighted
CUSUM process. Csörgő et al. (1986) obtained the first proof of Theorem 1.2.1.
Csörgő and Horváth (1993) discusses the history of the integral tests for the
weighted Brownian bridge and gives a detailed account of Theorem 1.2.1 and its
proof. Shorack and Wellner (1986) is an excellent reference on the weighted Wiener
and Brownian bridges. Chapter 5.3 of Csörgő and Horváth (1993) contains a detailed
proof of Theorem 1.2.3 and gives several application of this result in empirical
process theory. Horváth and Rice (2022) reviews the weak convergence of CUSUM
processes in .Lp spaces.
28 1 Cumulative Sum Processes

Jarušková (2010) obtained Darling–Erdős type results for quadratic forms of


independent vectors. Rényi (1953) introduced the heavily weighted empirical
processes and obtained their asymptotic distribution when the supremum is taken on
a fixed closed sub interval of .[0, 1]. Csörgő (1965) and Csörgő and Horváth (1992)
extended these early results. Horváth et al. (2020) used Rényi’s idea in change point
analysis, and these showed improved power to detect change points that are early
or late in the data. They also extended these results to residuals of many general
models.
Only in special cases do we have explicit formulas for the distributions of
weighted functionals of Brownian bridges. Franke et al. (2012) provides a fast
Monte Carlo algorithm for the computation of quantiles of the supremum norm of
weighted Brownian bridges, employing an adaptive (sequential) time discretization
for the trajectories of the Brownian bridge.
Chapter 2
Change Point Analysis of the Mean

We have seen that under the no change in the mean null hypothesis .H0 , and
assuming the observations satisfy a functional version of the central limit theorem
(Assumptions 1.1.1 and 1.2.2), that the asymptotic distribution of many functionals
of the CUSUM process may be computed. Since the CUSUM process arises as
the objective function in maximally selecting two sample test statistics to test .H0
versus .HA , it stands to reason that, in the presence of change points in the series, the
functionals of the CUSUM process that we have considered should be consistent
in the sense that they diverge in probability to positive infinity as the sample size
grows. One goal of this chapter is to carefully quantify the asymptotic behaviour of
the CUSUM process in the presence of change points.
When several change points are thought to exist in the sequence of observations,
it is natural to estimate their locations with the points at which the CUSUM process
achieves its largest values. In this section we define such estimators, and establish
their asymptotic properties in the presence of a change point in the series, as well
as under local alternatives in which the magnitude of the change in the mean of
the series decreases with the sample size. These can be used to compute confidence
intervals for the change point locations.
There exist a great many methods to detect and estimate multiple change points in
a sequence of observations, and we discuss the consistency properties of two such
approaches: binary segmentation, and model selection techniques using penalized
loss functions (e.g. information criteria).

2.1 CUSUM Statistics in the Presence of Change Points

To begin, we consider the case when there is exactly one change in the mean, i.e. the
observations follow (1.1.1) under .HA . Letting again .μ0 and .μA denote the means

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 29


L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_2
30 2 Change Point Analysis of the Mean

before and after the change, the CUSUM process may be written as

Σ
k
k Σ
N Σ
k
k Σ
N
. Xi − Xi = Ei − Ei + V (k), (2.1.1)
N N
i=1 i=1 i=1 i=1
where

⎪ k(N − k ∗ )
⎨ ΔN , if 1 ≤ k ≤ k ∗ ,
.V (k) = ∗ N (2.1.2)
⎩ k (N − k) ΔN ,

if k ∗ + 1 ≤ k ≤ N,
N
and .ΔN = μ0 − μA . We allow that the change magnitude .ΔN might depend on N,
and may vanish as N increases. Similarly we use .θN = k ∗ /N to denote the break
fraction.
Theorem 2.1.1 We assume that .HA of (1.1.3) and Assumption 1.2.2 are satisfied.
(i) If .0 ≤ κ < 1/2, then

|QN (t)| P
. sup → ∞
0<t<1 [t (1 − t)] κ

if and only if

[θN (1 − θN )]1−κ N 1/2 |ΔN | → ∞.


. (2.1.3)

(ii) Also, when .κ = 1/2,

|QN (t)| P
(log log N)−1/2 sup
. → ∞
0<t<1 [t (1 − t)]
1/2

if and only if

.[θN (1 − θN )]1/2 N 1/2 (log log N)−1/2 |ΔN | → ∞. (2.1.4)

Proof Using Theorem 1.2.2 we see that

|QN,E (t)|
. sup = OP (1),
0<t<1 [t (1 − t)]
κ

where
⎛ ⎞
L(NΣ
+1)t⎦
L(N + 1)t⎦ Σ ⎠
N
1 ⎝
QN,E (t) =
. Ei − Ei .
N 1/2 N
i=1 i=1
2.1 CUSUM Statistics in the Presence of Change Points 31

Note that .V (k ∗ ) = NθN (1 − θN )ΔN , and that the largest value that .[N 2κ V (k)]/
[k(N − k)]κ takes for .k ∈ {1, . . . , N } is at .k = k ∗ . As a result, we get that

|V (L(N + 1)t⎦|
. sup N −1/2 = [θN (1 − θN )]1−κ N 1/2 |ΔN | → ∞ (2.1.5)
0<t<1 [t (1 − t)]κ

if and only if (2.1.3) holds, giving part .(i) of Theorem 2.1.1. The proof of part .(ii)
requires only replacing Theorem 1.2.2 with Theorem 1.2.5. ⨆

Theorem 2.1.1 can be interpreted that for changes that are close to the boundary,
i.e. .θN close to zero or one, smaller changes are more easily detected for larger
values of .κ. Regardless of the value of .κ though, it is easiest to detect changes that
occur in the middle of the sample .(θN = 1/2).
Next we consider the behavior of functionals of weighted CUSUM processes
under local alternatives when the change occurs at a location bounded away from
the end points of the sample.

Assumption 2.1.1 .k ∗ = LNθ⎦ with some .0 < θ < 1.


The local alternative is characterized by the following assumption.
Assumption 2.1.2

Δ
μ0 = μ
. and μA = μ + with Δ /= 0.
N 1/2
Let

t (1 − θ ), if 0 ≤ t ≤ θ
gθ (t) =
.
θ (1 − t), if 0 ≤ t ≤ 1.

Theorem 2.1.2 If .HA of (1.1.3) holds, and Assumptions 1.2.2, 2.1.1 and 2.1.2 are
satisfied, then

|QN (t)| D |σ B(t) − Δgθ (t)|


. sup → sup , (2.1.6)
0<t<1 w(t) 0<t<1 w(t)

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge and .σ is defined in Assumption 1.2.2.


Proof Since .I (w, c) < ∞ for some c, we have that

t 1/2 (1 − t)1/2
. → 0 (t → 0) and → 0 (t → 1)
w(t) w(t)

(see Section 4.1 of Csörgő and Horváth (1993)). Hence .sup0<t<1 |gθ (t)|/w(t)
is finite. As such according to Theorem 1.2.1 the limit in (2.1.6) is finite with
probability 1. It follows from elementary calculation that for a positive constant .c1 ,
32 2 Change Point Analysis of the Mean

|N −1/2 V (L(N + 1)t⎦) − Δg(t)| ≤ c1 t/N for .t ≤ θ , and .|N −1/2 V (L(N + 1)t⎦) −
.

Δg(t)| ≤ c1 (1 − t)/N for .t > θ . From this we obtain that

|N −1/2 V (L(N + 1)t⎦) − Δg(t)|


. sup → 0, as N → ∞. (2.1.7)
0<t<1 w(t)

We showed in the proof of Theorem 1.2.2 that under Assumption 1.2.2 there exists
a sequence of Brownian bridges .{BN (t), 0 ≤ t ≤ 1} such that

|QN,E (t) − σ BN (t)|


. sup = oP (1). (2.1.8)
0<t<1 w(t)

Now the result follows from (2.1.7) and (2.1.8) since the distribution of .{BN , 0 ≤
t ≤ 1} does not depend on N . ⨆

The following result can be used to obtain a description of the power function of
functionals of weighted CUSUM processes under condition (2.1.3).
Theorem 2.1.3 If .HA of (1.1.3), and Assumption 1.2.2 are satisfied, .0 ≤ κ < 1/2,
max{1 − lim supN →∞ θN , lim infN →∞ θN } > 0, and
.

[θN (1 − θN )]1−κ N 1/2 |ΔN | → ∞,


.

then we have
⎧ ⎫
1 |QN (t)| D
. max − [θN (1 − θN )]1−κ N 1/2 |ΔN | → N,
σ [θN (1 − θN )]1/2−κ 0<t<1 [t (1 − t)]κ

where .N denotes the standard normal random variable and .σ is defined in


Assumption 1.2.2.
Proof We establish the result when .ΔN > 0, and the case when .ΔN < 0 can be
handled similarly. As we have seen in the proof of Theorem 2.1.1,

1 |N −1/2 V (L(N + 1)t⎦)|


. sup → 1, (2.1.9)
[θN (1 − θN )]1−κ N 1/2 ΔN 0<t<1 [t (1 − t)]κ

as .N → ∞, where .V (k), 1 ≤ k ≤ N is defined in (2.1.2). According to


Theorem 1.2.2,

|QN,E (t)|
. sup = OP (1). (2.1.10)
0<t<1 (1 − t)]
[t κ
2.1 CUSUM Statistics in the Presence of Change Points 33

Putting together (2.1.9) and (2.1.10) we conclude


⎧ ⎫
|QN (t)| QN (t)
. lim P max = max = 1.
N →∞ 0<t<1 [t (1 − t)] κ 0<t<1 [t (1 − t)]κ

For .0 < δ < 1, we define the events .BN,1 = {(1 − δ)θN ≤ θ̂N ≤ (1 + δ)θN }c ,
and
⎧ ⎫
QN (t) QN (t)
.BN,2 = max > max ,
t∈[(1−δ)θN ,(1+δ)θN ]c [t (1 − t)]κ t∈[(1−δ)θN ,(1+δ)θN ] [t (1 − t)]κ

where
QN (t)
θ̂N = sargmax
.
t∈(0,1) [t (1 − t)]κ

is the smallest maximal argument of .QN (t)/[t (1 − t)]κ . Evidently .BN,1 ⊆ BN,2 .
Using the definition of V , we have that

V (LNt⎦)
. max
t∈[(1−δ)θN ,(1+δ)θN ]c N 1/2 [t (1 − t)]κ
≤ N 1/2 [θN (1 − θN )]1−κ ΔN max{δ 1−κ , (1 − δ)1−κ } + rN,1 ,

where .rN,1 = O(1) is deterministic and arises from approximating .LN t⎦ with Nt.
We have already seen that
V (LNt⎦)
. max = N 1/2 [θN (1 − θN )]1−κ ΔN ,
t∈[(1−δ)θN ,(1+δ)θN ] N 1/2 [t (1 − t)]κ
which is the value of .V (t) achieved at .t = θN . As a result we have that .BN,2 ⊆ BN,3 ,
where

.BN,3 = N (θN (1 − θN ))1−κ ΔN max{δ 1−κ , (1 − δ)1−κ } + cN,1
1/2

QN,E (t)
+ max
t∈[(1−δ)θN ,(1+δ)θN ]c [t (1 − t)]κ

QN,E (θN )
> N 1/2 [θN (1 − θN )]1−κ ΔN + .
[θN (1 − θN )]κ

Since .max0<t<1 QN,E (t)/[t (1 − t)]κ = OP (1) under Assumption 1.2.2, and
1−κ , (1 − δ)1−κ } < 1, we have that .lim
.max{δ N →∞ P (BN,3 ) = 0. It follows that
{ }
. lim P (1 − δ)θN ≤ θ̂N ≤ (1 + δ)θN = lim 1 − P (BN,1 ) = 1. (2.1.11)
N →∞ N →∞
34 2 Change Point Analysis of the Mean

In order to complete the proof of the Theorem, we assume .lim supN→∞ θN < 1;
i.e. .θN cannot be too close to 1. The case when .lim infN→∞ θN > 0 can be proven
similarly. Using Assumption 1.2.2, we may define a sequence of Brownian bridges
.{BN (t), 0 ≤ t ≤ 1} such that

1 |QN,E (t) − σ BN (t)|


. sup = oP (1). (2.1.12)
1/2−κ
θN 1/(N +1)≤t≤2θN [t (1 − t)]κ

Now
| |
1 | B(t) B(θN ) |
sup | | − | (2.1.13)
[θN (1 − θN )] |
.
|t−θN |≤δθN [t (1 − t)]
1/2−κ κ κ
θN
⎧ | |
1 | B(t) B(θ ) ||

⎪ |
⎨ θ 1/2−κ sup | [t (1 − t)]κ − [θ (1 − θ )]κ | , if θN → θ
D ||t−θ|≤δθ |
→ | W (s) |

⎩ sup | κ − W (1)|| , if θN → 0,
⎪ |
|s−1|≤δ s

where .{W (t), t ≥ 0} is a Wiener process. The first part follows from the almost sure
continuity of .B(·). To prove the second part (2.1.13), we use that .B(t) = W (t) −
tW (1), and hence
| |
1 | W (t) − tW (1) W (θN ) − θN W (1) |
sup | − |
.
1/2−κ | [t (1 − t)]κ [θN (1 − θN )]κ |
θN |t−θN |≤δθN
| |
1 | W (t) W (θN ) |
= 1/2−κ |
sup | κ − | + oP (1)
θN |t−θN |≤δθN t θNκ |

and by the scale transformation of the Wiener process


| | | |
| W (t) W (θN ) | D | |
sup | − | = sup | W (s) − W (1)| .
.
| tκ θNκ | | s κ |
|t−θN |≤δθN |s−1|≤δ

Using the almost sure continuity of .B(t) and .W (t) at .t = 1, we conclude that for all
x>0
.

⎧ | |
1 | B(t) B(θN ) |
. lim lim sup P sup | | − |
δ→0 N→∞ 1/2−κ [t (1 − t)]κ [θ (1 − θ )] κ|
θN |t−θN |≤δθN N N

> x = 0. (2.1.14)
2.1 CUSUM Statistics in the Presence of Change Points 35

Putting together (2.1.11), (2.1.12) and (2.1.14) we get that

1 |QN (t)|
. max
[θN (1 − θN )] 1/2−κ 0<t<1 [t (1 − t)]κ
⎧ ⎫
1 σ BN (θN ) N −1/2 V (L(N + 1)t⎦)
= + sup + oP (1)
[θN (1 − θN )]1/2−κ [θN (1 − θN )]κ |t−θN |≤δ [t (1 − t)]κ
⎧ ⎫
1 σ BN (θN )
= + [θN (1 − θN )] 1−κ 1/2
N Δ + oP (1),
[θN (1 − θN )]1/2−κ [θN (1 − θN )]κ

since .V (L(N + 1)t⎦)/[t (1 − t)]κ reaches its largest value at .θN . The distribution of
.BN (θN )/[θN (1 − θN )]
1/2 is standard normal for each N, and therefore the proof is

complete. ⨆

We may also establish similar results for integrated functionals of the CUSUM
process in the presence of a change point.
Theorem 2.1.4 If .HA of (1.1.3) and Assumption 1.2.2 are satisfied, .p ≥ 1 , .κ <
p/2 + 1, then we have
(i)
⎧ 1 |QN (t)|p P
. dt → ∞
0 [t (1 − t)] κ

if and only if

(N 1/2 |ΔN |)p [θN (1 − θN )]p+1−κ → ∞


.

(ii) If in addition, Assumption 2.1.1 holds, then


⎧⎧ 1 ⎧ 1 −1/2 ⎫
1 |QN (t)|p |N V (L(N + 1)t⎦)|p
. dt − dt
p(N 1/2 |ΔN |)p−1 0 [t (1 − t)]κ 0 [t (1 − t)]κ
⎧ 1
D σ B(t) p−1
→ g (t)dt,
0 [t (1 − t)]
κ θ

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge and .σ is defined in


Assumption 1.2.2
Proof The proof follows a similar roadmap to that of Theorem 2.1.1, so we just
outline the main modifications that are needed. We assume once again that .ΔN > 0,
and the .ΔN < 0 case can be handled similarly. Using the definition of .V (k), it may
be checked that

1 1 |N −1/2 V (L(N + 1)t⎦)|p
. dt → c1 ,
(N Δ) [θN (1 − θN )]p+1−κ
1/2 p
0 [t (1 − t)]κ
36 2 Change Point Analysis of the Mean

where .c1 is a positive constant. Hence the first part of Theorem 2.1.4 follows from
Theorem 1.2.4.
To prove the second part we note that
⎧⎧ ⎧ p

1 |QN (t)|p 1 QN (t)
. lim P dt = dt =1
N→∞ 0 [t (1 − t)]κ 0 [t (1 − t)]κ

and
⎧ ⎧ 1
(N −1/2 V (L(N + 1)t⎦))p
1 p
QN (t)
. dt − dt
0 [t (1 − t)]κ
0 [t (1 − t)]κ
⎧ 1
VN (t)(N −1/2 V (L(N + 1)t⎦))p−1
=p dt + oP ((N 1/2 ΔN )p−1 ).
0 [t (1 − t)]κ

Using Assumption 1.2.2 we can define a sequence of Brownian bridges .{BN (t), 0 ≤
t ≤ 1} such that
⎧ 1 VN (t)
. (N −1/2 V (L(N + 1)t⎦))p−1 dt
0 [t (1 − t)]
κ
⎧ 1
σ BN (t)
= (N −1/2 V (L(N + 1)t⎦))p−1 dt + oP ((N 1/2 ΔN )p−1 ).
0 [t (1 − t)]
κ

The distribution of .BN does not depend on N, so once again using the definition of
V (k) we have that
.

⎧ 1 σ BN (t)
(N −1/2 V (L(N + 1)t⎦))p−1 dt
0 [t (1 − t)]κ dt
⎧ 1 ⎛
D σ B(t)
= (N ΔN )
1/2 p−1
t (1 − LNθ ⎦/N)1{0 ≤ t ≤ LNθ⎦/N }
0 [t (1 − t)]
κ
⎞p−1
+ (1 − t)(LNθ ⎦/N)1{LNθ ⎦/N ≤ t ≤ 1} dt.

Since

(N 1/2 ΔN )p−1 t (1−LNθ⎦/N)1{0 ≤ t ≤ LNθ⎦/N }
.

⎞p−1
p−1
+ (1 − t)LNθ ⎦/N1{LNθ⎦/N ≤ t ≤ 1} → gθ (t),
2.1 CUSUM Statistics in the Presence of Change Points 37

as .N → ∞ for each .t ∈ [0, 1], it follows that


⎧ 1 ⎛
σ B(t)
(N 1/2 ΔN )p−1 t (1 − LNθ⎦/N)1{0 ≤ t ≤ LNθ⎦/N }
0 [t (1 − t)]κ
⎞p−1
+ (1 − t)LNθ ⎦/N1{LNθ⎦/N ≤ t ≤ 1} dt
⎧ 1
D σ B(t) p−1
→ g (t)dt,
0 [t (1 − t)]κ θ

which concludes the proof of Theorem 2.1.4(ii). ⨆



We note that the limiting random variable in part .(ii) of Theorem 2.1.4 is
Gaussian.

2.1.1 Testing for Relevant Changes in the Mean

In some cases it is of interest, rather than to consider the alternative (1.1.3) that
the means simply differ, to evaluate through testing whether the means differ by
some “relevant” or “significant” amount. This leads to considering the alternative
hypotheses

HA : |μ0 − μA | > Δ0 ,
. (2.1.15)

where .Δ0 ≥ 0 is a practitioner specified threshold. In this case the null hypothesis
is

H0 : |μ0 − μA | ≤ Δ0 .
. (2.1.16)

We recall the notation .X̄k,1 and .X̄k,2 , denoting the empirical means of .{X1 , . . . , Xk }
and .{Xk+1 , . . . , XN }, defined in (1.1.5). We reject the null hypothesis of (2.1.16)
in favor of the hypothesis in (2.1.15) if there exists a k such that .Δ̂k = |X̄k,1 −
X̄k,2 | is significantly larger than .Δ0 . Since .Δ̂k is not a reliable estimator of .ΔN for
small or large values of k, it is natural to consider Rényi style statistics described
in Sect. 1.2.2, which involve trimming the domain on which .Δ̂k is maximized. In
particular, we consider the statistic
⎛ ⎞
. D̂N = N 1/2
max |X̄k,1 − X̄k,2 | − Δ0 .
a(N )≤k≤N −b(N )

Theorem 2.1.5 We assume that .HA of (1.1.3), Assumption 1.2.2, 2.1.1 are satisfied
and

. min(a(N ), b(N ))Δ2N → ∞. (2.1.17)


38 2 Change Point Analysis of the Mean

Let

. lim N 1/2 (|ΔN | − Δ0 ) = ζ ∈ [−∞, ∞]


N →∞

(i) If .ζ = −∞, then

P
D̂N → −∞.
.

(ii) If .−∞ < ζ < ∞, then

D
D̂N → N (ζ, σ 2 /[θ (1 − θ )]),
.

where .N (ζ, σ 2 /[θ (1 − θ )]) is a normal random variable with mean .ζ and variance
.σ /[θ (1 − θ )].
2

(iii) If .ζ = ∞, then

P
D̂N → ∞.
.

Proof The difference between the means .X̄k,1 and .X̄k,2 may be decomposed as
⎛ k ⎞
N Σ k Σ
N
X̄k,1 − X̄k,2
. = Xi − Xi
k(N − k) N
i=1 i=1
⎛ k ⎞
N Σ k Σ
N
= Ei − Ei + v(k),
k(N − k) N
i=1 i=1

where


⎪ N − k∗
⎨ ΔN , if 1 ≤ k ≤ k ∗ ,
.v(k) =
N − k

⎪ k∗
⎩ ΔN , if k ∗ + 1 ≤ k ≤ N.
k

We assume without loss of generality that .ΔN > 0. It is established in the proof of
Theorem 1.2.7 that
| k |
|Σ k Σ ||
N ⎛ ⎞
N | −1/2 −1/2
. max | Ei − Ei | = OP aN + bN .
a(N )≤k≤N −b(N ) k(N − k) | N |
i=1 i=1
2.1 CUSUM Statistics in the Presence of Change Points 39

The function .v(k) reaches its largest value for .k ∈ {1, . . . , N } at .k ∗ , and .v(k ∗ ) =
ΔN . Also for any .0 < δ < θ ,
⎛ ⎞
N N
. max v(k) < 1 − δ min , .

|k −k|≥N δ k ∗ + Nδ N − (k ∗ − δN )

Since we can choose .δ as small as we wish, it follows similarly as in the proof of


Theorem 2.1.3 that

.|k̂N − k ∗ | = oP (N ),

where

k̂N =
. sargmax |X̄k,1 − X̄k,2 |.
k∈{a(N ),...,N −b(N )}

Thus we obtain that


⎧ ⎫
. lim P max |X̄k,1 − X̄k,2 | = ∗
max (X̄k,1 − X̄k,2 ) = 1,
N →∞ a(N )≤k≤N −b(N ) |k −k|≤δN

for any .δ > 0. We note that


⎛ ∗ ⎞
N Σ
k
k ∗ Σ
N
. ⎝ Ei − Ei ⎠ + ΔN
k ∗ (N − k ∗ ) N
i=1 i=1

≤ max (X̄k,1 − X̄k,2 )


|k ∗ −k|≤δN
⎛ k ⎞
N Σ k Σ
N
≤ max Ei − Ei + ΔN .
|k ∗ −k|≤δN k(N − k) N
i=1 i=1

It follows from Assumption 1.1.1 that


⎛ ⎞
LN
Σ t⎦ Σ
N
N 1/2 ⎝ Nt D [θ−δ,θ+δ] σ B(t)
. Ei − Ei ⎠ −→ ,
N t (N − Nt) N t (1 − t)
i=1 i=1

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge. By the continuity of the Brownian


bridge we have

B(t) B(θ )
. sup → , a.s.,
|t−θ|≤δ t (1 − t) θ (1 − θ )

as .δ → 0. The random variable .B(θ )/[θ (1 − θ )] is normal with mean zero


and variance .[θ (1 − θ )]−1 . If .−∞ < ζ < ∞, then by the hypothesis that
40 2 Change Point Analysis of the Mean

N 1/2 (ΔN − Δ0 ) → ζ , the result (ii) follows. When .|ζ | = ∞, then the random
.

term is asymptotically negligible, from which (i) and (iii) follow. ⨆


2.2 The Asymptotic Properties of Change Point Estimators

We now turn to the asymptotic properties of estimators of .k ∗ and .θ when .HA of


(1.1.3) and Assumption 2.1.1 hold. Let
⎛ ⎞κ ||Σ |
k Σ ||
k N
N |
.k̂N = k̂N (κ) = sargmax | Xi − Xi |
k∈{1,...,N } k(N − k) | N |
i=1 i=1

denote the smallest maximal argument of the weighted CUSUM process. Under the
AMOC model and .HA , .k̂N may be used as an estimator of .k ∗ , and .θ̂N = k̂N /N
serves as an estimator of the break fraction .θ .
In order to describe the asymptotic behavior of .k̂N , we define a triangular drift
term, and two sided Brownian motion, as

⎨ (1 − κ)(1 − θ ) + κθ, if t < 0
.mκ (t) = 0, if t = 0 (2.2.1)

(1 − κ)θ + κ(1 − θ ), if t > 0,

and

W1 (−t), if t < 0
W (t) =
. (2.2.2)
W2 (t), if t ≥ 0,

where .{W1 (t), t ≥ 1} and .{W2 (t), t ≥ 1} are independent Wiener processes. There
is an almost surely unique random variable .ξ(κ) defined as

ξ(κ) = ξ(κ, θ ) = argmaxt {W (t) − |t|mκ (t)} .


. (2.2.3)

It is interesting to note that the random variable .ξ(κ, θ ) does not depend on .θ if
κ = 1/2, and does not depend on .κ when .θ = 1/2. The density function of .ξ(κ) is
.

known, and given by the formula



h(−t; (1 − κ)(1 − θ ) + κθ, (1 − κ)θ + κ(1 − θ )), t ≤ 0
fθ,κ (t) =
.
h(t; (1 − κ)θ + κ(1 − θ ), (1 − κ)(1 − θ ) + κθ ), t > 0,

where

h(u; x, y) = 2x(x + 2y){1 − O((x + 2y)u1/2 )}


.

× exp{2y(x + y)u} − 2x 2 (1 − O(xu1/2 )), u ≥ 0,

and .O(x) denotes the standard normal distribution function.


2.2 The Asymptotic Properties of Change Point Estimators 41

Establishing the asymptotic properties of .k̂N will require the computation of


higher order moments of the CUSUM process of the errors, which necessitates
additional assumptions on the error distribution beyond Assumption 1.2.2. For
this purpose we make use of the .Lν -decomposability framework introduced in
Definition 1.1.1.
Theorem 2.2.1 We assume that .HA of (1.1.3) and Assumption 2.1.1 are satisfied,
and that the model errors in (1.1.1) are .Lν -decomposable.
(i) If .0 ≤ κ < 1/2,

ΔN → 0 and
. NΔ2N → ∞,

then

Δ2N D
.
2
(k̂N − k ∗ ) → ξ(κ).
σ
(ii) If .κ = 1/2,

.ΔN → 0 and NΔ2N / log log N → ∞,

then

Δ2N D
.
2
(k̂N − k ∗ ) → ξ(1/2).
σ

Proof We only prove Theorem 2.2.1(i), and the second part can be proven similarly
with minor modifications. It follows from (2.1.11) that

. |k̂N − k ∗ | = oP (N ). (2.2.4)

In the first step we aim to refine this to

Δ2N |k̂N − k ∗ | = OP (1).


. (2.2.5)

Let for a positive scalar C

C
a = aN =
. .
Δ2N

Notice that for any .C > 0, .aN = o(1). On account of (2.2.5) we can assume that
N α ≤ k ≤ Nβ, for any .α < θ < β. We recall (2.1.1) and introduce
.

⎛ ⎞2κ ⎛Σ ⎞2
k Σ
k N
N
Qk =
. Ei − Ei + V (k)
k(N − k) N
i=1 i=1
42 2 Change Point Analysis of the Mean

⎛ ∗ ⎞2
⎛ ⎞2κ Σ
k ∗ Σ
N
N ⎝ k
− Ei − Ei + V (k ∗ )⎠
k ∗ (N − k ∗ ) N
i=1 i=1

= Qk,1 + · · · + Qk,6 ,

where
⎧⎛ ⎞2κ ⎛ ⎞2κ ⎫ ⎛Σ
⎞2
k Σ
N k
N N
Qk,1
. = − Ei − Ei ,
k(N − k) k (N − k ∗ )
∗ N
i=1 i=1
⎧ ⎫
⎛ ⎞2κ ⎨Σ k∗
k Σ Σ k ∗ Σ ⎬
k N N
N
Qk,2 = Ei − Ei + Ei − Ei
k ∗ (N − k ∗ ) ⎩ N N ⎭
i=1 i=1 i=1 i=1
⎧ ⎫
⎨Σ k∗
Σ k − k ∗ Σ ⎬
k N
× Ei − Ei − Ei , (2.2.6)
⎩ N ⎭
i=1 i=1 i=1
⎛⎛ ⎞2κ ⎛ ⎞2κ ⎞⎛ ⎞
N N Σ
k
k Σ
N

Qk,3 = 2 V (k) − V (k ) Ei − Ei ,
k(N − k) k (N − k ∗ )
∗ N
i=1 i=1
⎛ ⎞2κ ⎛ ⎞ Σ
N
N k∗ − k
Qk,4 = 2 V (k ∗ ) Ei ,
k (N − k ∗ )
∗ N
i=1
⎛ ⎞2κ ⎛ ⎞
N Σ
k Σ
k∗

Qk,5 =2 ∗ V (k ) Ei − Ei ,
k (N − k ∗ )
i=1 i=1
⎛ ⎞2κ ⎛ ⎞2κ
N N
Qk,6 = V 2 (k) − V 2 (k ∗ ),
k(N − k) k (N − k ∗ )

and .V (k) is defined in (2.1.2). Next we show .if l = 1, 2, 3, 4.


1
. max |Qk,l | = oP (1), (2.2.7)
|k−k ∗ |≥a,N α≤k≤Nβ N 1−2κ Δ2N |k ∗ − k|

We only consider the case when .k ≤ k ∗ − a, as the case where .k ≥ k ∗ + a can be


handled by the same arguments. It follows from Theorem A.1.1 that
| k |
|Σ |
| |
. max | Ei | = OP (N 1/2 ). (2.2.8)
1≤k≤N | |
i=1
2.2 The Asymptotic Properties of Change Point Estimators 43

Also, by Theorem A.1.1 there is a sequence of Wiener processes .{WN (t), t ≥ 0}


such that
| |
| Σ k∗ |
1 | |
.a
1/2
max∗ | E − σ W (k ∗
− k) | = OP (a ζ −1/2 ) = oP (1).
∗ |
1≤k≤k −a k − k |
i N |
i=k+1 |

By the scale transformation of the Wiener process we have

1 D 1
a 1/2
. sup |WN (k ∗ − t)| = a 1/2 sup |W (t)|
0≤t≤k ∗ −a k∗ −t a≤t≤k ∗ t

D 1
= sup |W (t)|
1≤t≤k ∗ /a t
1
→ sup |W (t)| a.s.,
1≤t<∞ t

since as .N → ∞, .k ∗ /a = CN Δ2N → ∞ by assumption. Thus we have


| |
| Σ ∗ |
1 | k |
a 1/2 max | Ei || = OP (1). (2.2.9)
.
1≤k≤k ∗ −a k∗ − k |
|i=k+1 |

Using the mean value theorem we obtain that


|⎛ ⎞2κ ⎛ ⎞2κ ||
|
1 | N N |
. max | − | = O(N −1−2κ ),
N α≤k≤Nβ |k ∗ − k| | k(N − k) k ∗ (N − k ∗ ) |

and since .N 1/2 |ΔN | → ∞, (2.2.8) implies (2.2.7) when .l = 1. Now we use
Assumption 2.1.1, and we get from (2.2.8) and (2.2.9) that
⎛ ⎞ ⎛ ⎞
1 1 1
. max |Qk,2 | = OP + OP = oP (1).

|k−k |≥a N 1−2κ Δ2N |k ∗ − k| N |ΔN |
1/2 NΔ2N

Hence (2.2.7) is proven when .l = 2. The same calculations can be used to establish
(2.2.7) when .l = 3 and .4.
Applying (2.2.9) we conclude that

1 1
. max |Qk,5 | = 1/2 OP (1), (2.2.10)
|k−k ∗ |≥a N 1−2κ Δ2N |k ∗ − k| C
44 2 Change Point Analysis of the Mean

where the .OP (1) term does not depend on C. Using again the mean value theorem
there are .c1 > 0 and .c2 > 0 such that

. − c1 N 1−2κ |k ∗ − k|Δ2N ≤ Qk,6 ≤ −c2 N 1−2κ |k ∗ − k|Δ2N , (2.2.11)

if .Nα ≤ k ≤ Nβ. Putting together (2.2.7) and (2.2.11) we conclude


⎛ 4 ⎞
Σ 1 P
. max Qk,l + Qk,6 → −∞.
|k ∗ −k|≥a,N α≤k≤Nβ 2
l=1

Now by (2.2.10) and (2.2.11) we have for all .x > 0 that


⎧ ⎛ ⎞ ⎫
1
. lim lim sup P max Qk,5 + Qk,6 > x = 0.
C→∞ N →∞ |k ∗ −k|≥a,N α≤k≤Nβ 2

This completes the proof of (2.2.5). It may be established similarly as (2.2.7) that

N −(1−2κ)
. max |Qk,l | = oP (1), if l = 1, 2, 3, 4. (2.2.12)
|k−k ∗ |≤Cσ 2 /Δ2N

Hence the limit distribution of .k̂N is determined by .Qk,5 and .Qk,6 . It follows from
elementary calculation that
| |
| −(1−2κ) |
sup. |N Qk ∗ +sσ 2 /Δ2 ,6 + 2[θ (1 − θ )]1−2κ σ 2 |s|mκ (s)| = o(1). (2.2.13)
N
−C≤s≤C

We use now Theorem A.1.1 to obtain that


D [−C,C]
N −(1−2κ) Qk ∗ +sσ 2 /Δ2 ,5
. −→ 2[θ (1 − θ )]1−2κ σ 2 W (s), (2.2.14)
N

where .{W (t), −∞ < t < ∞} is the two sided Wiener process of (2.2.2). Since
{W (t), −∞ < t < ∞} and .{−W (t), −∞ < t < ∞} have the same distribution,
.

(2.2.12)–(2.2.14) imply that

Σ
6
D [−C,C]
N −(1−2κ)
. Qk ∗ +sσ 2 /Δ2 ,l −→
N
l=1

2[θ (1 − θ )]1−2κ σ 2 {W (t) − |s|mκ (s)} . (2.2.15)

If
⎛ ⎞κ ||Σ |
k Σ ||
k N
N |
k̂N,C =
. sargmax | Xi − Xi |
k∈{1,...,N }, |k ∗ −k|≤Cσ 2 Δ2N k(N − k) | N |
i=1 i=1
2.2 The Asymptotic Properties of Change Point Estimators 45

(2.2.15) yields

Δ2N ⎛ ∗
⎞ D
. k̂ N,C − k → argmax|s|≤C (W (s) − |s|mκ (s)) .
σ2
Since

argmax|s|≤C (W (s) − |s|mκ (s)) → ξ(κ) a.s.


.

when .C → ∞, the first part of the theorem is proven. ⨆



Now we turn to the asymptotic distribution of .k̂N when the size of the change, rather
than vanishing as N increases, is asymptotically constant.
Assumption 2.2.1

. lim ΔN = Δ /= 0.
N →∞

In this case, the limiting distribution of .k̂N depends on the joint distribution of the
partial sums of the errors. Let

⎪ −1
Σ



⎪ − Ei , if l < 0,


⎨ i=l
.S(l) = 0, if l = 0, (2.2.16)



⎪ Σ
l



⎩ Ei , if l > 0,
i=1

and define the random variable


{ }
ξκ,Δ = argmaxl ΔS(l) − Δ2 |l|mκ (l) ,
.

where .mκ (t) is defined in (2.2.1).


Theorem 2.2.2 If .HA of (1.1.3), Assumptions 2.1.1, 2.2.1 and .0 ≤ κ ≤ 1/2 hold,
and the model errors in (1.1.1) are .Lν -decomposable, then

D
k̂N − k ∗ → ξκ,Δ .
.

Proof We follow the proof of Theorem 2.2.1 with some modifications. We observe
that (2.2.4) still holds true due to (2.1.11). Instead of (2.2.5), we need to show that

|k̂N − k ∗ | = OP (1).
. (2.2.17)
46 2 Change Point Analysis of the Mean

Using (2.2.4) we can assume that .Nα ≤ k̂N ≤ Nβ for any .0 < α < θ < β < 1.
We use the decomposition of .Qk in (2.2.6). Let .C > 0. We aim to show that

1
. max |Qk,l | = oP (1), if l = 1, 2, 3, 4. (2.2.18)
1≤k≤k ∗ −C N 1−2κ (k ∗ − k)

It follows from (2.2.8) that

1
. max∗ |Qk,1 | = OP (N −1/2 ). (2.2.19)
1≤k≤k −C N 1−2κ (k ∗ − k)

Now using (2.2.8), we get that


| |
| k∗ |
1 | Σ |
max | Ei || = OP (N −1/2 ).
N 1−2κ (k ∗ − k) ||
.
1≤k≤k ∗ −C
i=k+1 |

Our arguments proving (2.2.9) show that for all .C > 0


| |
| k∗ |
1 | Σ |
max | Ei || = OP (1),
.
1≤k≤k ∗ −C k∗ − k |
|i=k+1 |

and for all .x > 0


⎧ | | ⎫
⎨ | k∗ | ⎬
1 | Σ |
lim lim sup P max | E | > x = 0. (2.2.20)
.
⎩1≤k≤k ∗ −C ∗
k −k | i | ⎭
C→∞ N →∞ |i=k+1 |

Hence (2.2.18) holds when .l = 2. Similar arguments can be used to prove (2.2.18)
when .l = 3 and 4. Since
| |
| k∗ |
1 1 || Σ ||
max |Qk,5 | = OP (1) max∗ Ei | ,
1≤k≤k −C k ∗ ||
.
1≤k≤k ∗ −C N 1−2κ (k ∗ − k)
i=k+1 |

where the .OP (1) term does not depend on C, (2.2.20) implies that for all .x > 0,
⎧ ⎫
1
. lim lim sup P max∗ |Q k,5 | > x = 0. (2.2.21)
C→∞ N →∞ 1≤k≤k −C N 1−2κ (k ∗ − k)

Using the mean value theorem we can find .c1 > 0 and .c2 > 0 such that

. − c1 N 1−2κ |k ∗ − k| ≤ Qk,6 ≤ −c2 N 1−2κ |k ∗ − k| (2.2.22)

for all .Nα ≤ k ≤ Nβ.


2.2 The Asymptotic Properties of Change Point Estimators 47

Similar bounds can be obtained for .Qk,l , l = 1, 2, 3, 4, and 5 when .k ∗ + C ≤


k ≤ LNβ⎦. Putting together (2.2.18) and (2.2.22) we conclude
⎛ 4 ⎞
Σ 1 P
. max Qk,l + Qk,6 → −∞,
LN α⎦≤k≤LNβ⎦,|k ∗ −k|≥C 2
l=1

and according to (2.2.21) and (2.2.22)


⎧ ⎛ ⎞ ⎫
1
. lim lim sup P max Q k,5 + Q k,6 > x =0
C→∞ N →∞ LN α⎦≤k≤LNβ⎦,|k ∗ −k|≥C 2

for all .x > 0.


Now we consider .Qk,l for k such that .|k ∗ − k| ≤ C for all .C > 0. In the same
way as (2.2.18) was established, it follows that

1
. max |Qk,l | = oP (1), if l = 1, 2, 3, 4.
N 1−2κ |k ∗ −k|≤C

By definition,
⎧ ⎛ ⎛ ⎞
⎪ ∗ N − k ∗ ⎞1−2κ k∗
Σ

⎪ k

⎪ 2 ΔN ⎝− Ei ⎠ , if 1 ≤ k < k ∗

⎪ N N
1 ⎨ i=k+1
. Qk,5 = 0, if k = k ∗ ,
N 1−2κ ⎪
⎪ ⎛ ∗ ⎞

⎪ k N − k ∗ 1−2κ Σ
k


⎪2
⎩ ΔN Ei , if k ∗ + 1 ≤ k < N.
N N ∗ i=k +1

By the stationarity of .{Ei , i ∈ Z} we have


⎧ ⎫

⎨ Σ
k Σ
k+l ⎬
D
. − Ei 1{−C ≤ l ≤ 0} + Ei 1{1 ≤ l ≤ C} = {S(l), |l| ≤ C}
⎩ ⎭
i=k ∗ +l+1 i=k ∗ +1

where .S(l) is defined in (2.2.16). Therefore


{ } D { }
. N −(1−2κ) Qk ∗ +l,5 , |l| ≤ C → 2[θ (1 − θ )]1−2κ ΔS(l), |l| ≤ C .

Recalling the definition of .Qk,6 , which is deterministic, it follows from elementary


calculations that
| |
| −(1−2κ) ∗ |
. max |N Qk +l,6 + 2|l|[θ (1 − θ )]1−2κ Δ2 mκ (l)| = o(1)
|l|≤C
48 2 Change Point Analysis of the Mean

and therefore
{ }
N −(1−2κ) (Qk ∗ +l,5 + Qk ∗ +l,6 ), |l| ≤ C (2.2.23)

D
{ }
→ 2[θ (1 − θ )]1−2κ (ΔS(l) − Δ2 |l|mκ (l), |l| ≤ C .

If
⎛ ⎞κ ||Σ |
k Σ ||
k N
N |
k̂N,C =
. sargmax | Xi − Xi |
k∈{1,...,N }, |k ∗ −k|≤C k(N − k) | N |
i=1 i=1

then (2.2.23) implies

D
{ }
k̂N,C − k ∗ → argmax ΔS(l) − Δ2 |l|mκ (l) : |l| ≤ C .
.

Since
{ }
argmax ΔS(l) − Δ2 |l|mκ (l) : |l| ≤ C
.

{ }
→ argmax ΔS(l) − Δ2 |l|mκ (l) a.s.,

as .C → ∞. The conclusion of Theorem 2.2.2 follows. ⨆



Remark 2.2.1 So far we have not considered the case when

. lim |ΔN | = ∞.
N →∞

In this case it may be established that

. lim P {k̂N = k ∗ } = 1,
N →∞

if Assumption 2.1.1 holds.


Remark 2.2.2 Theorem 2.2.2 is difficult to apply in practice, since the distribution
of .ξκ,Δ depends on the full joint distribution of the model errors, which is difficult
to estimate. Theorem 2.2.1 may be interpreted that if .|Δ| is small, then

|t|σ 2
ΔS(tσ 2 /Δ2 ) − Δ2
. mκ (t) ≈ σ 2 (W (t) − |t|mκ (t)),
Δ2
where .{W (t), −∞ < t < ∞} is the two sided Wiener process of (2.2.2). The only
unknown in the process on the right hand side above is the parameter .σ 2 , which is
more easily estimated.
2.3 Multiple Changes in the Mean 49

2.3 Multiple Changes in the Mean

We have seen that tests based on several functionals of the CUSUM process are
consistent to detect a single change point, and that natural estimators of a single
change point, defined as the maximal argument of the CUSUM process, are in a
certain sense consistent and have tractable asymptotic distributions upon proper
normalization depending on the size of the change.
CUSUM processes may also be used to detect and estimate more than one change
point, and in this section we explore the asymptotic behavior of weighted CUSUM
processes when there are multiple change points in the mean. In particular, we
consider the model

Σ
R+1
Xi =
. μj 1{kj∗−1 ≤ i < kj∗ } + Ei , i ∈ {1, . . . , N }. (2.3.1)
j =1

Here .μ1 /= μ2 /= . . . /= μR+1 denote the means of the observations .Xi , which
change at the change points .1 = k0∗ < k1∗ < · · · < kR∗ < kR+1 ∗ = N + 1. We
note that model (2.3.1) includes the AMOC model, and the model with .R = 2 and
.μ1 = μ3 is often referred to as the epidemic change point model. We let .ΔN =

min1≤l≤R+1 |ΔN,l |, where .ΔN,l = μl − μl−1 , .i ∈ {1, . . . , R + 1}.


As in the decomposition (2.1.1) in the AMOC model, under model (2.3.1) the
CUSUM process can be expressed as

Σ
k
k Σ
N Σ
k
k Σ
N
. Xi − Xi = Ei − Ei + V̄ (k),
N N
i=1 i=1 i=1 i=1

where

Σ
i−1
k Σ
R+1
V̄ (k) =
. μl (kl∗ − kl−1

) + (k − ki−1 )μi − μl (kl∗ − kl−1

),
N
l=1 l=1

if ki−1 <k≤ ki∗ , (2.3.2)

i ∈ {1, ..., R + 1}. We assume in several of the asymptotic results below that the
.

change points are well spaced, and the size of changes are bounded:

Assumption 2.3.1
(i) .ki∗ = LN θi ⎦, .i ∈ {1, . . . , R} and .0 < θ1 < · · · < θR < 1.
(ii) .max1≤l≤R+1 |ΔN,l | < ∞.
It follows from a simple calculation that under Assumption 2.3.1
| |
| V̄ (Nt) |
| |
. sup
| N − v̄N (t)| = o (ΔN ) ,
0≤t≤1
50 2 Change Point Analysis of the Mean

and

Σ
i−1 Σ
R+1
v̄N (t) =
. μl (θl −θl−1 )+(t−θi−1 )μi−1 −t μl (θl −θl−1 ), if θi−1 <t≤θi ,
l=1 l=1

1 ≤ i ≤ R + 1. Here we use the convention that .θ0 = 0 and .θR+1 = 1. We note that
.

ΔN,l might depend on N . In particular we will consider the case when .ΔN,l → 0.
.

The drift function .v̄N (t) is a polygonal function with knots at .θ1 , . . . , θR . We
can extend Theorems 2.1.1–2.1.4 to multiple changes; largely this is an exercise in
replacing .V (x) with .V̄ (x). We note that under Assumption 2.3.1, the test statistics
that are supremum functionals of CUSUM processes will diverge in probability to
infinity if .N 1/2 max1≤l≤R+1 |ΔN,l | → ∞, i.e. at least one of the changes in the
mean is not too small.

2.3.1 Binary Segmentation

Typically R, the number of change points in model (2.3.1), is unknown, and we


wish to estimate it along with the change point locations. A simple method to extend
any single change point estimation procedure to detect and estimate multiple change
points is to use binary segmentation. The idea behind binary segmentation is simple:
starting from the whole sample, we apply a test, for example using a functional of a
weighted CUSUM process, for the existence of a single change point in the sample.
If that test detects a change point, we estimate it using a change point estimator, and
then we segment the sample into two sub-samples of observations before and after
the estimated change point. The procedure is then repeated on each sub-sample until
no changes are detected or the sub-sample becomes too short. One might expect
that the behaviour of the method will depend critically on how it is decided whether
a change point is detected in the sample, and on how the change point locations
themselves are estimated within each sub-sample.
In order to describe the method rigorously, we define binary segmentation, and
the estimators that it outputs, as an algorithm. Since the procedure involves detecting
and estimating change points on subsegments, we use the notation .l and u to denote
the starting and ending points, respectively, of the sub-sample under consideration.
(l,u) (l,u)
We let .TN = maxk∈{l,...,u} SN (k) denote the change point test statistic, or
detector, that is computed based on the sub-sample defined by .l and u, with the
change point estimator on that segment defined via

(l,u)
k̂ = sargmax SN
. (k).
k∈{l,...,u}

(l,u)
Typically we will consider a change point to be detected in a sub-sample if .TN
exceeds a user specified threshold .ρN . For example, .ρN might be taken to be a
2.3 Multiple Changes in the Mean 51

(l,u)
suitable quantile of the approximate (asymptotic) distribution of .TN , or .ρN might
be a scaled function of N, for instance .ρN = σ (log N)1/2 . With this notation, the
binary segmentation algorithm is described as follows:

Algorithm Binary segmentation: BINSEG(.l, u, ρN )


Inputs: l ← starting index ; u ← ending index ; ρN ← threshold
(l,u) (l,u)
if u − l ≤ 1 , then STOP else Compute k̂ = sargmax SN (k), and TN
k∈{l,...,u}
(l,u)
if TN > ρN then
add k̂ to the set of estimated change points. run BINSEG(l, k̂, ρN ) and BINSEG(k̂ + 1, u,
ρN )
else STOP
end if

BINSEG(1, N, .ρN ) returns a set of estimated change points .K̂ = {k̂1 , . . . , k̂R̂ },
sorted into increasing order, and an estimated number of change points .R̂ = |K̂|.
We now aim to study the asymptotic consistency properties of these estimators
when the detector used is a supremum functional of a weighted CUSUM process.
Interestingly, if weights are not applied in computing the detector, binary segmen-
tation may not lead to a consistent method to estimate R, as illustrated with the
following example.
Example 2.3.1 Let .{Ei , i ≥ 1} be independent identically distributed random
variables with .EEi = 0 and .EEi2 = 1. Consider observations generated so that

⎨ 2 + Ei , if 1 ≤ i ≤ LN/3⎦,
.Xi = 1 + Ei , if LN/3⎦ < i ≤ L2N/3⎦,

Ei , if L2N/3⎦ < i ≤ N.

A plot of the mean of .Xi with respect to i looks like a “staircase” with equal steps.
Consider observations generated so that

LN
Σ LN
LNt⎦ Σ Σ LNt⎦ Σ
t⎦ N t⎦ N
. Xi − Xi = Ei − Ei + vN (t)
N N
i=1 i=1 i=1 i=1

with for .t ∈ [0, 1]




⎪ LNt⎦, if⎛ 1 ≤ LNt⎦ ≤⎞LN/3⎦,

⎨ 3LN/3⎦
.vN (t) = LN/3⎦ + 1 − LNt⎦, if LN/3⎦ < LNt⎦ ≤ L2N/3⎦,

⎪ N

⎩ 3LN/3⎦ − LNt⎦ 3LN/3⎦ , if L2N/3⎦ < LNt⎦ ≤ N.
N
52 2 Change Point Analysis of the Mean

We see that for each .t ∈ [0, 1]

vN (t)
. → v(t), as N → ∞,
N
where

⎨ t, if 0 ≤ t ≤ 1/3,
.v(t) = 1/3, if 1/3 ≤ t ≤ 2/3,

1 − t, if 2/3 ≤ t ≤ 1.

We note that
| k |
|Σ k Σ ||
N
|
. max | Ei − Ei | = OP (N 1/2 ).
1≤k≤N | N |
i=1 i=1

As we will show in detail below, since the drift term .vN (t) is asymptotically of
higher order than the CUSUM process of the errors, the asymptotic properties of
the change point estimator .k̂N are largely determined by the drift term. In this case,
since the drift term takes it largest value at each .k ∈ {LN/3⎦, . . . , L2N/3⎦}, and
is flat on this interval, the change point estimator as the maximal argument of the
CUSUM process is determined by the CUSUM of the errors. If for example .k̂N =
k̂N (0) denotes the change point estimator based on the standard CUSUM statistic,
then .k̂N /N converges to a non-degenerate random variable that is supported on the
interval .[1/3, 2/3], and is not a consistent estimator of a change point.
Example 2.3.1 suggests that binary segmentation applied using functionals of
the CUSUM process without weights can lead to overestimation of the number of
change points.
If binary segmentation is based on the weighted CUSUM .ZN (N t/(N +
1))/[t (1 − t)]κ , .0 < κ ≤ 1/2, and .ΔN is bounded away from zero, then the method
is consistent in that the number of change points R is correctly estimated with
probability tending to one, and, conditioning on .R̂ = R, the centered estimators
.k̂i − ki are bounded in probability. In view of Theorem 2.2.2, these rate conditions

cannot be improved.
Towards establishing this result, the following lemma shows that when weights
are applied the drift function cannot have “flat” segments. As a result, points at
which the maximum is reached will asymptotically coincide with change points.
Let

Σ
k−1 Σ
R+1
v(t) =
. (θi − θi−1 )μi + (t − θk−1 )μk − t (θi − θi−1 )μi (2.3.3)
i=1 i=1

if .θk < t ≤ θk , 1 ≤ k ≤ R + 1 .(θ0 = 0, θR+1 = 1).


2.3 Multiple Changes in the Mean 53

Lemma 2.3.1 We assume that .v(t) is defined in (2.3.3) and .μ1 /= μ2 /= . . . /=


μR+1 and .0 < κ ≤ 1/2. For every .c, d satisfying .θk−1 ≤ c < d ≤ θk with some
.1 ≤ k ≤ R + 1, the function .|v(t)|/[t (1 − t)] reaches its largest value on .[c, d]
κ

only at c or d if .infc<t<d |v(t)| > 0. Moreover, on .[c, d] .vN (t) is either strictly
increasing, strictly decreasing, or is decreasing on .[c, a) and increasing on .(a, d]
for some .a ∈ (c, d).
Proof Due to its definition, .v(t) is linear on .[c, d]. If .|v(t)| is constant on .[c, d],
say .|v(t)| = b > 0 for .t ∈ [c, d], then the derivative of .b/[t (1 − t)]κ is .−bκ[t (1 −
t)]−κ−1 (1 − 2t). If .c < d ≤ 1/2, then .b/[t (1 − t)]κ is strictly decreasing on .[c, d],
if .1/2 ≤ c < d, then .b/[t (1 − t)]κ is strictly increasing on .[c, d]. In these cases
the maximum over .[c, d] occurs at c and d, respectively. If .c < 1/2 < d, then
.b/[t (1 − t)] is strictly decreasing until .1/2 and then strictly increasing. Once again
κ

the maximum can only occur at c or d. As such, we may turn to the case where v
on .[c, d] is linear with a nonzero slope term. Since multiplying .|v(t)| by a constant
does not change the location of its maximum, may consider instead the function

t +b
f (t) =
. on [c, d].
[t (1 − t)]κ

If .v(t) changes sign at a, then we can consider the function .|v(t)| on the intervals
.[c, a] and .[a, d], and show that the maximum of .|v(t)| is reached at c and d since
.v(a) = 0. Thus we can assume that the sign of .v(t) does not change on .[c, d]. If

.v(t) is positive on .(c, d), we need to show that the maximum is uniquely reached at

c or d. If .v(t) is negative on .(c, d), we need to prove that the minimum is achieved
only at c or d. The function .[t (1 − t)]−κ is strictly increasing on .[1/2, 1] we get that
for all .1/2 ≤ t < s ≤ 1,

s+b t +b
. >
(s(1 − s))κ [t (1 − t)]κ

since the product of two strictly increasing functions is strictly increasing. It follows
that the maximum is at d and the minimum is at c. As a result we may turn to the
case where .0 < c < 1/2. Elementary calculations give

h(t)
f ' (t) =
. with h(t) = −(1 − 2κ)t 2 + (2κb − κ + 1)t − κb.
[t (1 − t)]κ+1

The roots of .h(t) giving the points were .f ' (t) = 0 are
( )1/2
2κb − κ + 1 − (2κb − κ + 1)2 − 4(1 − 2κ)κb
.t1 = , (2.3.4)
2(1 − 2κ)
54 2 Change Point Analysis of the Mean

and
( )1/2
2κb − κ + 1 + (2κb − κ + 1)2 − 4(1 − 2κ)κb
.t2 =
2(1 − 2κ)

We consider the following cases (1) .f (t) is increasing at c, i.e. .h(c) > 0.
(i) .1/2 < d < 1. We have already shown that .f (t) is increasing on .[1/2, 1] and
therefore .h(d) > 0. If .t1 = t2 ∈ [c, 1/2], then .h(t) ≥ 0 on .[c, d] so that f
is increasing over .[c, d] and its maximum occurs at d. If .t1 > c, then since
.h(0) = −κb > 0, .b < 0. We note .h(0) < 0 and .h(d) > 0 forces .t2 > d. As a

result, when .t1 > c and .h(d) > 0, f decreases from c to .t1 , and then increases
from .t1 to d, hence the maximum must occur at either c or d. Otherwise .t1 < c.
If .c < t2 < 1/2, then once again .h(d) > 0 cannot hold. Hence in this case
.h(t) ≥ 0 on .[c, d], so .f (t) is increasing on .[c, d] and the maximum occurs at

d.
(ii) .d < 1/2. We claim that .h(d) < 0 leads to a contradiction. Assume that .h(d) <
0. Then

d 2 (2κ − 1) − d(κ − 1)
κb ≥
. .
1 − 2d

Now .h(c) > 0 yields

c2 (2κ − 1) − c(κ − 1)
κb ≤
.
1 − 2c

and therefore

d 2 (2κ − 1) − d(κ − 1) c2 (2κ − 1) − c(κ − 1)


. ≤ . (2.3.5)
1 − 2d 1 − 2c

Using that .c, d ≤ 1/2 one can verify that (2.3.5) holds if and only if

1 − κ ≤ (1 − 2κ)(d + c − 2cd).
.

Observing that

1 1
d + c − 2cd = c(1 − 2d) + d ≤
. (1 − 2d) + d = ,
2 2
we get

1
1−κ ≤
. (1 − 2κ)
2
2.3 Multiple Changes in the Mean 55

which is a contradiction. If .h(d) > 0, the argument in part (1)(i) gives that .f (t)
increases on .[c, d].
(2) .f (t) decreases at c, so .h(c) < 0.
(i) If there are no roots of .h(t) between c and d, then .f (t) decreases on .[c, d], so
the unique maximum and minimum are at c and d.
(ii) If there is exactly one root of .h(t) on .[c, d], then it is the smallest root .t1 and
.f (t) has its smallest value at .t1 and the maximum is at c or d. This covers the

case when .c + b > 0. Assume now that .c + b < 0. Then the smaller root of
.h(t) is .t1 of .(2.3.4). Since .b < −c < 0, we have that .4(1 − 2κ)κb < 0 and

therefore
2κb − κ + 1 − |2κb − κ + 1| 2κb − κ + 1
.t1 ≤ = < 0,
2(1 − 2κ) 1 − 2κ

if .2κb − κ + 1 < 0, It is clear that .t1 ≤ 0 if .2κb − κ + 1 ≥ 0. Since .f (t)


increases on .[t1 , t2 ] so .t2 < c, then we are back to case (2)(i).
(iii) Assume that there are two roots of .h(t) between c and d. Now .f (t) decreases
on .[c, t1 ] so the maximum is reached at c. The function .f (t) starts increasing at
.t1 . Now we apply the same argument as in case (1), and get that .f (t) increases

on .[t1 , d] so the maximum is reached at d. This covers the case when .c+b > 0.
If .c + b < 0, then case (2)(i) yields that .t1 < 0, so we cannot have two roots
between c and d.


Due to this property passed on to the drift term when weights are applied
in computing the CUSUM detectors, binary segmentation at each stage will
asymptotically correctly identify a change point when one is present and well spaced
from the prior change point estimators. By carefully keeping track of the magnitude
of the CUSUM processes of the model errors at each stage, we may establish the
following consistency result.
Theorem 2.3.1 Suppose the observations follow model (2.3.1) with errors that are
.Lν -decomposable for some .ν > 2, Assumption 2.3.1 holds, .ΔN is bounded away
from zero, and .0 < κ ≤ 1/2. Then if the threshold parameter in binary segmentation
.ρN satisfies

(log N)1/ν ρN
. + 1/2 → 0, as N → ∞,
ρN N

then for any sequence .rN satisfying .rN → ∞, arbitrarily slowly,


⎛ ⎞
P {R̂ = R} ∩ { max
. |k̂i − ki∗ | ≤ rN } → 1. (2.3.6)
1≤i≤R
56 2 Change Point Analysis of the Mean

We delay the proof of Theorem 2.3.1 to Chap. 8 (see Theorem 8.2.2), which
establishes this result for binary segmentation based on norms of weighted CUSUM
processes in general separable Hilbert space. The assumption that .ΔN is bounded
away from zero is critical in obtaining the .OP (1) rate for the change point estimators
detailed in (2.3.6), while still assuming minimal conditions on the threshold
parameter .ρN and moment and weak–dependence conditions on the model errors
in (2.3.1). For .ΔN shrinking to zero such that .Δ2N N → ∞,
⎛ ⎞
P {R̂ = R} ∩ { max Δ2N,i |k̂i − ki∗ | ≤ rN } → 1,
. (2.3.7)
1≤i≤R

may be established for example when the model errors are linear processes with
innovations that have sub-Gaussian (or sub-exponential) tails. The rate of estimation
for the change point locations in (2.3.7) is optimal in view of Theorem 2.2.1. Further
discussion of these results may be found in Sect. 2.7 and Remark 2.3.3.

2.3.2 Model Selection Criteria

Another popular method to estimate the number and locations of the change points
is to use model selection criteria. We consider a candidate model consisting of S
change points in the means of the observations occurring at times .r1 < · · · < rS as
an estimate of the model (2.3.1). In order to evaluate the quality of such a model,
we measure its fidelity to the data using the loss function .M(S, r1 , . . . , rS ), which
we aim to minimize with respect to the number and locations of the change points.
We imagine that .M(·) is such that small values indicate a good fit. An example is to
use least squares loss. In this case we compute the sample means for each segment
determined by the change point candidates,

1 Σ
ri
X̂ri−1 ,ri =
. Xl , i ∈ {1, . . . , S + 1},
ri − ri−1
l=ri−1 +1

with .r0 = 0 and .rS+1 = N, and define

Σ
S+1 Σ
ri
. M = M(S, r1 , . . . , rS ) = (Xl − X̂ri−1 ,ri )2 .
i=1 l=ri−1 +1

It is evident is that the above M will be minimized with the value zero if .S = N and
ri = i, .i ∈ {1, . . . , N }, i.e. each point is identified as a change point. To control the
.

number of changes, a penalty term is added to M, so that we instead minimize

MP EN (S, r1 , . . . , rS ) = log(M(S, r1 , . . . , rS )) + P(N, S).


.
2.3 Multiple Changes in the Mean 57

The penalty term is taken to be an increasing function of S. Weighted least squares


may also be preferred to standard least squares, due to the power properties of
weighted CUSUM statistics, in which case we may minimize

MP EN (S, r1 , r2 , . . . , rS )
.
⎛ ⎞
Σ Σ
S+1 ri
= log ⎝ (Xl − X̂ri−1 ,ri )2 w(l; ri , ri−1 )⎠ + P(N, S),
i=1 l=ri−1 +1

where the function .w(·) determines the weights. Typical choices of the penalty
function generate the well known information criteria used in model selection. For
example .P(N, S) = (N + 2S)/N leads to the Akaike information criteria, whereas
.P(N, S) = (S log N)/N is often referred to as the Bayesian or Schwarz information

criteria.
Another similar approach is to replace least squares with a maximized likelihood,
making use of distributional assumptions on the model errors. We compute the
likelihood function, assuming that the means change exactly S times, and the
possible times of changes .r1 < · · · < rS , resulting in .Lmax (S, r1 , . . . , rS ). We
then minimize

MP EN (S, r1 , . . . , rS ) = −2 log Lmax (S, r1 , . . . , rS ) + P(N, S).


.

We now consider a consistency result based on such an approach where standard


CUSUM processes are used to measure the model fidelity to the data. For S change
points occurring at the locations .r1 < · · · < rS , the maximally selected squared
CUSUM statistic for the period .(ri−1 , ri ] is denoted

H (ri−1 , ri )
.

⎛ ⎞2
1 Σ
k
k − ri−1 Σ
ri
= max ⎝ Xl − Xl ⎠ , i ∈ {1, . . . , S + 1}.
N ri−1 <k≤ri ri − ri−1
l=ri−1 +1 l=ri−1 +1

We then define
(H )
MP EN (S, r1 , . . . , rS ) =
. max H (ri−1 , ri ) + g(S)mN
1≤i≤S+1

The number and the locations of the change points are estimated with .R̂ and
k̂1 , . . . , k̂R̂ satisfying
.

(H ) (H )
MP EN (R̂, k̂1 , . . . , k̂R̂ ) = min
. min MP EN (S, r1 , . . . , rS ). (2.3.8)
S∈N 1<r1 <r2 <...<rS <N
58 2 Change Point Analysis of the Mean

Consistency of these estimators is achieved when the innovations satisfy the


Gaussian approximation Assumption 1.1.1, and the penalty term .g(k)mN satisfies

Assumption 2.3.2 (i) .g(x) is a positive and strictly increasing function, and (ii)
mN → ∞ and .mN /(NΔ2N,l ) → 0 for all .l ∈ {1, . . . , R + 1},
.

where again .ΔN,l = μl − μl−1 .l ∈ {1, . . . , R} in (2.3.1).


Theorem 2.3.2 If the observations follow model (2.3.1), Assumptions 1.1.1,
and 2.3.1–2.3.2 hold, and

NΔ2N,l → ∞,
. for all l ∈ {1, . . . , R + 1},

then for the estimators .R̂ and .k̂1 , . . . , k̂R̂ defined in (2.3.8) and all .ε > 0,
⎛ ⎧ ⎫⎞
. lim P {R̂ = R} ∩ max |k̂i − ki∗ | < εN = 1. (2.3.9)
N→∞ 1≤i≤R

Proof We let .HN (S, r1 , . . . , rS ) = max1≤i≤S+1 H (ri−1 , ri ), so that .R̂ and


.k̂1 , . . . , k̂ satisfy

(H )
MP EN (R̂, k̂1 , . . . , k̂R̂ ) = min
. min HN (S) + g(S)mN .
S∈N 1<r1 <r2 <...<rS <N

We decompose the CUSUM into random and non-random terms:


⎛ ⎞
Σ
k
k − ri−1 Σ
ri
N −1/2 ⎝
. Xl − Xl ⎠
ri − ri−1
l=ri−1 +1 l=ri−1 +1
⎛ ⎞
Σ
k
k − ri−1 Σ
ri
= N −1/2 ⎝ El − Ei ⎠ + v(k; ri−1 , ri ),
ri − ri−1
l=ri−1 +1 l=ri−1 +1

where .v(k; ri−1 , ri ), ri−1 < k ≤ ri , 1 ≤ i ≤ S + 1 are non-random drift terms. It


follows from Assumption 1.1.1 that for each fixed S,
⎛ ⎞2
1 ⎝ Σ
k
k − ri−1 Σ
ri
min max El − Ei ⎠
1<r1 <r2 <...<rS <N 1≤i≤S+1 N ri − ri−1
l=ri−1 +1 l=ri−1 +1

D
→ σ sup max max W (xi + si−1 ) − W (si−1 )
0≤s1 ≤s2 ≤...≤sS ≤1 1≤i≤S+1 si−1 ≤xi ≤si
⎞2
xi
− (W (si ) − W (si−1 )) .
si − si−1
2.3 Multiple Changes in the Mean 59

Hence the random part is bounded in probability for any fixed S. So the asymptotic
behaviour of .MP(HEN
)
(S, r1 , . . . , rS ) is governed by the drift and penalty terms. Since
the drift term vanishes when .S = R and .ri = ki ,

MP(HEN
.
)
(R, k1 , . . . , kR ) = g(R)mN + OP (1). (2.3.10)

If .S < R, then for any setting .1 ≤ r1 < · · · < rS ≤ N, there is at least one
change point .ki∗ between .rj −1 and .rj for some .1 ≤ j ≤ S + 1 and at least one of
.ki − rj −1 or .rj − ki is proportional to N as a consequence of Assumption 2.3.1.

Hence as in (2.1.5),

HN (S, r1 , . . . , rS )
. = OP (1), (2.3.11)
N min1≤l≤R Δ2N,l

and therefore using Assumption 2.3.2(ii)

MP(HEN
)
(S, r1 , . . . , rS ) P
. → ∞.
mN

It follows then comparing to (2.3.10) that

. lim P {R̂ < R} = 0.


N →∞

(H )
If .S > R, then the drift term in .MP EN is again made to vanish by setting R of the
' ∗ ∗
.r s equal to .k , . . . , k , which implies that
i 1 R

(H )
MP EN (R̂, k̂1 , . . . , k̂R̂ ) ≥ OP (1) + g(S)mN .
.

Since g is strictly increasing by Assumption 2.3.2(i), for .S > R, .g(S)mN −


g(R)mN → ∞, which implies that

. lim P {R̂ > R} = 0,


N →∞

proving (2.3.9). By Assumptions 2.3.1 and 2.3.2(ii), the fact that .max1≤i≤R |k̂i −
ki∗ | = oP (N ) follows by using the same argument that establishes (2.3.11). ⨆

Assumption 2.3.2 allows for a variety of choices of .mN and .g(k) that lead to
consistent estimates in the sense of (2.3.9). For instance if the changes are shrinking
to zero slower than .(log N/N )1/2 , then the BIC penalty satisfies Assumption 2.3.2.
Remark 2.3.1 Although binary segmentation and model selection criteria methods
are motivated by different considerations, they are comparable in the following
way. In the simple multiple change in the means model (2.3.1), the problem
of estimating the number of change points and their locations is equivalent to
60 2 Change Point Analysis of the Mean

fitting a piecewise constant mean function to the observations. Applying binary


segmentation to estimate this mean function is akin to selecting a model for the mean
using forward selection; we start with the simplest possible model of no change
points in the mean, and change points are added one-by-one until no “significant”
changes are detected. Using model selection criteria offers an alternative way to
arrive at a model. Other model selection techniques, including backward selection
and cross-validation, may also be employed to perform change point analysis. See
Sect. 2.7 for a review of related literature.

2.3.3 The Asymptotic Distribution of Multiple Change Point


Estimators

Next we turn to analyzing the limiting distribution of refinements of the estimators


produced by, for example, binary segmentation. We have discussed several methods
to segment the data and produce estimates of the number of change points .R̂, and
the change point locations .k̂1 < . . . < k̂R̂ . These methods will typically satisfy
the following mild consistency criterion, which we use as an assumption in this
subsection.
Assumption 2.3.3 For all .ε > 0
⎛ ⎧ ⎫⎞
. lim P {R̂ = R} ∩ max |k̂i − ki | < εN = 1. (2.3.12)
N →∞ 1≤i≤R

We note that (2.3.12) in effect allows us to assume in asymptotic arguments that


the number of change points R is known. Also, under (2.3.12) weighted CUSUM
processes of .Lν –decomposable innovations maximized over .k̂i , . . . , k̂i+1 have the
same asymptotic behaviour as if they are maximized over .ki , . . . , ki+1 . This follows
as a consequence of the following result.
Lemma 2.3.2 If the model errors in (2.3.1) satisfy Assumption 1.2.2 with parame-
ter .ζ ,
1
0≤κ<
. − ζ, (2.3.13)
2
and
P
k̂/N → θ ∈ (0, 1),
.

then
| |
|Σ |
1 | k
|
.N
−1/2+κ
max | E |
i | = oP (1).
|
|k̂−k|≤|k̂−N θ| |k̂ − k|κ | |
i=k̂
2.3 Multiple Changes in the Mean 61

Proof We show that for all .x > 0


⎧ | | ⎫
⎨ | j | ⎬
1 |Σ |
. lim lim sup P N −1/2+κ max max | E | > x = 0.
⎩ LαN ⎦<k≤N θ k<j ≤N θ (k − j )κ | i | ⎭
α→θ N →∞ | i=k |

By Assumption 1.2.2 we may define a sequence of Wiener processes .{WN (x), 0 ≤


x ≤ N} such that
| x |
|Σ |
−ζ | |
. sup x | Ei − σ WN (x)| = OP (1),
1≤x≤N | |
i=1

which implies that


| |
|Σ |
1 | j |
.N
−1/2+κ
max max | Ei − σ (WN (j ) − WN (k))|| = oP (1)
LαN ⎦<k≤N θ k<j ≤N θ (k − j )κ |
| i=k |

since (2.3.13) holds. The distribution of .{WN (x), x ≥ 0} does not depend on N , and
by the scale transformation of the Wiener process we have

1
N −1/2+κ
. max max |WN (k) − WN (LNα⎦ + 1)| (2.3.14)
LαN ⎦<k≤N θ k<j ≤N θ (j − k)κ
D 1
= max max |W (j/N ) − W (k/N )|
LN α⎦/N <k/N ≤θ k/N <j/N ≤θ (j/N − k/N )κ
1
→ sup sup |W (s) − W (t)| a.s.
α<t≤θ t<s≤θ (s − t)κ

The existence of the limit in (2.3.14) follows from Theorem A.2.2. Theorem A.2.2
also yields that

1
. sup sup |W (s) − W (t)|
α<t≤θ t<s≤θ (s − t)κ

≤ ξ sup sup (s − t)1/2−κ (log(1/(s − t)))1/2 ,


α<t≤θ t<s≤θ

for a random variable .ξ , and therefore

1
. sup sup |W (s) − W (t)| → 0 a.s.,
α<t≤θ t<s≤θ (s − t)κ

as .α → θ . This completes the proof. ⨆



62 2 Change Point Analysis of the Mean

In order to construct confidence intervals for the time of changes, we refine the
initial estimators .k̂1 , . . . , k̂R . Let
⎛ ⎞κ
1
.k̃l = sargmax
k∈{k̂l−1 ,...,k̂l+1 } (k − k̂l−1 )(k̂l+1 − k)
| |
| Σ
k
k − k̂l−1 Σ
k̂l+1
|
× || Xi − Xi ||,
k̂l+1 − k̂l−1
i=k̂l−1 +1 i=k̂l−1 +1

l ∈ {1, . . . , R}, where .k̂0 = 0 and .k̂R+1 = N. To reflect the multiple changes we
.

also need to modify the drift term of (2.2.1):


⎧ θ
⎪ l+1 − θl θl − θl−1

⎪ (1 − κ) + κ, if t < 0,
⎨ θl+1 − θl−1 θl+1 − θl−1
.m̄l (t) = 0, if t = 0, (2.3.15)


⎩ θl+1 − θl κ + θl − θl−1 (1 − κ),

if t > 0,
θl+1 − θl−1 θl+1 − θl−1

l ∈ {1, . . . , R}. Similarly, the limiting distribution is again the location of the
.

maximum of the two sided Wiener process with triangular drift of (2.2.2):

ξ̄l = argmaxt {W (t) − |t|m̄l (t)}.


. (2.3.16)

Theorem 2.3.3 We assume the observations follow model (2.3.1), Assump-


tions 2.3.1 and 2.3.3 are satisfied, and the model errors are .Lν -decomposable
and satisfy the conditions of Lemma 2.3.2. If additionally .0 ≤ κ < 1/2 − ζ , and for
all .l ∈ {1, . . . , R}

ΔN,l → 0
. and NΔ2N,l → ∞, (2.3.17)

then

Δ2N,1 ⎛ ∗
⎞ Δ2 ⎛

⎞ Δ2N,R ⎛ ⎞
k̃R − kR∗
N,2
.
2
k̃ 1 − k 1 , 2
k̃ 2 − k 2 , . . . , 2
σ σ σ
are asymptotically independent, and for each .l ∈ {1, . . . , R},

Δ2N,l ⎛ ∗
⎞ D
. k̃ l − k l → ξ̄l .
σ2
Proof To simplify the presentation we discuss the details when .R = 2. Assump-
tion 2.3.3 may be taken to mean that

|k̂1 − k1∗ | = oP (N )
. and |k̂2 − k2∗ | = oP (N ). (2.3.18)
2.3 Multiple Changes in the Mean 63

Using Lemma 2.3.2, we only must consider the weighted CUSUM statistics
computed from the observations .{Xi , 1 ≤ i ≤ k2∗ } to compute .k̃1 , and .{Xi , k1∗ ≤
i ≤ N } to compute .k̃2 . Hence the limit distributions of the estimators follow
from Theorem 2.2.1. The proof of Theorem 2.2.1 also shows that apart from the
trend the limiting distribution of .k̃1 is determined by the sum of the .Ei ’s, when
.i = k1 − LN δ⎦, k1 − LNδ⎦ + 1, . . . , k1 + LNδ⎦, and similarly the limit distribution

of .k̃2 is determined by the sum of the .Ei ’s, when .i = k2 − LNδ⎦, k2 − LNδ⎦ +
1, . . . , k2 +LNδ⎦, for any .δ > 0. The asymptotic independence of the two estimators
results from the assumed .Lν decomposability, see Theorem A.1.3. ⨆

Remark 2.3.2 Theorem 2.3.3 suggests that an approximate .1 − α confidence
interval for the .l’th change point .kl∗ can be computed as
⎛ ⎞
∗ qξ (κ, 1 − α/2)σ̂N2 qξ (κ, α/2)σ̂N2
.kl ∈ k̃l − , k̃l − ,
Δ̂2N,l Δ̂2N,l

where .qξ (κ, α) is the .α quantile of the random variable .ξ̄l defined in (2.3.16), .σ̂N2 is
an estimator of the variance parameter in (2.3.1), and

1 Σ
k̂l+1
1 Σ
k̃l
Δ̂N,l =
. Xj − Xj
k̂l+1 − k̃l k̃l − k̂l−1
j =k̃l +1 j =k̂l−1

is an estimator of .ΔN,l . This interval can be thought of as conservative in the sense


that it is derived under the assumption that .ΔN,l asymptotically vanishes.
So far we assumed in Theorem 2.3.3 that all changes are small, i.e. .ΔN,l →
for each .l ∈ {1, . . . , R}. If the size of the change is of fixed size, then the limiting
distribution of the change point estimators coincides with the distribution of the
random variable

ξ̄l,Δl = argmaxj {Δl S(j ) − Δ2l |j |m̄l (j )},


.

where

. lim ΔN,l = Δl /= 0, (2.3.19)


N →∞

{S(j ), j ∈ Z} and .m̄l (t) are defined in (2.2.16) and (2.3.15), respectively. The
.

following results combines Theorems 2.2.2 and 2.3.3.


Theorem 2.3.4 Suppose the observations follow model (2.3.1), Assumptions 2.3.1
and 2.3.3 hold, the model errors are .Lν -decomposable and satisfy the conditions of
Lemma 2.3.2 and .0 ≤ κ < 1/2 − ζ . Then

Δ2N,1 ⎛ ∗
⎞ Δ2 ⎛
N,2 ∗
⎞ Δ2N,R ⎛ ∗

. k̃ 1 − k 1 , k̃ 2 − k 2 , . . . , k̃ R − k R
σ2 σ2 σ2
64 2 Change Point Analysis of the Mean

are asymptotically independent.


If (2.3.19) holds, then

D
k̃l − k ∗ → ξ̄l,Δl .
.

Remark 2.3.3 We have seen that the change point tests studied in this section
asymptotically reject the no-change in the mean null hypothesis in the presence
of one or more mean changes. Since the standardized CUSUM process is equivalent
with the likelihood ratio test when the errors are independent normal random
variables, it would be natural to derive the likelihood ratio assuming that under
the alternative there are exactly .R ≥ 2 changes. Maximizing such a likelihood ratio
with respect to the location of R changes in the mean, after some algebra, leads to
maximizing terms in the random part of the CUSUM process of the form
⎛ ⎞2
1 Σ
k
. max Tk,l , where Tk,l = Ei .
1≤l<k≤N l−k
i=l+1

Similar terms will also appear in studying the asymptotic behaviour of the estimators
in binary segmentation under shrinking magnitudes of the changes, and under an
increasing number of change points. Since .max1≤l<k≤N Tk,l ≥ max1≤i≤N Ei2 ,
the rate at which this random variable diverges will depend on the rate of decay
of the cumulative distribution function of the .Ei ’s. We refer to Révész (1990)
and Shao (1995) for optimal, almost sure limit results on .max1≤l<k≤N Tk,l . In
order to obtain a divergence at no more than a logarithmic rate, these random
variables must have a well defined moment generating function. In the asymptotic
results presented to this point, we often have made use of invariance principles
for errors that allow them to be replaced with independent identically distributed
normal random variables, but such approximations fail for .max1≤l<k≤N Tk,l . The
distribution of .max1≤l<k≤N Tk,l depends on the cumulative distribution function
of the errors even in case of independent, identically distributed random variables.
In general it makes sense to avoid estimating the mean of a series on segments
that only contain a few observations, so we might require the length of each sub-
segment considered in, for example, binary segmentation to be at least as large as
.h = h(N). To consider asymptotics in this case we would need a limit result for

.LN (h) = max1≤l<k≤N,k−l≥h Tk,l . Now approximations for the partial sums of the

.Ei ’s can be used to derive the asymptotic distribution of .LN (h). The choice of h

will depend on the rate at which the partial sums may be approximated by a Wiener
process. Due to the optimality of the Komlós et al. (1975, 1976) approximation, in
case of independent and identically distributed errors, h can be small if .Ei has high
moments. Using the main results in Berkes et al. (2014) (see Theorem A.1.2), the
arguments used for independent and identically distributed random variables can be
extended to the dependent case.
2.3 Multiple Changes in the Mean 65

2.3.4 Multivariate Observations

In this section we considered the behaviour of the weighted CUSUM under the
alternative for scalar observations. Using the methods discussed in Sect. 1.3 one can
extend our results to vector valued observations. One possibility to estimate the time
of change is
⎛ k ⎞T
1 Σ k Σ
N
.k̂N = sargmax Xi − Xi
k∈{2,...,N −1} [k(N − k)]
κ N
i=1 i=1
⎛ ⎞
Σ
k
k Σ
N
×A Xi − Xi , (2.3.20)
N
i=1 i=1

where .A is a symmetric, positive definite weight matrix. The asymptotic properties


of .k̂N can be established along the lines of Theorem 2.2.1. Also, the results of
Theorems 2.3.3 and 2.3.4 can be extended when we have multiple changes in the
means of random vectors. Moreover, due to the fact that . ||x ||A = (x T Ax)1/2 for
a positive definite .A defines a norm on .Rd that is generated by an inner-product,
Theorem 8.2.2 implies that binary segmentation based on the estimator (2.3.20), so
long as .0 < κ ≤ 1/2 is consistent as in Theorem 2.3.1.
Instead of the quadratic form based norm defined by the matrix .A, we may also
use the maximum norm. We recall from (1.3.11) that
⎛ ⎞
L(N +1)t/N
Σ ⎦ L(N + 1)t/N ⎦ Σ ⎠
N
.zN (t) = E −1/2 ⎝ Xi − Xi , 0 < t < 1,
N
i=1 i=1

and .zN (t) = (zN,1 (t), zN,2 (t), . . . , zN,d (t))T and .0 ≤ κ ≤ 1/2. We suggest the
maximum norm based estimator
⎛ ⎞κ
(1) 1
.k̂ = sargmax max |zN,j (k/(N + 1))| (2.3.21)
k∈{2,...,N −1} 1≤j ≤d k(N − k)
N

It can be shown that .|k̂N − k ∗ | = oP (N ) if multivariate analogues of the conditions


(1)

of Theorem 2.2.1 are satisfied.


66 2 Change Point Analysis of the Mean

In a similar fashion we can estimate .k ∗ using the projections defined in (1.3.12).


Let
⎛ ⎞κ
(2) 1 ∗
.k̂
N = sargmax 1≤j max
≤d k(N − k)
|zN,j (k/(N + 1))|. (2.3.22)
k∈{2,...,N −1}

We have again that .|k̂N − k ∗ | = oP (N ) under multivariate analogues of the


(2)

conditions of Theorem 2.2.1.

2.4 Classical CUSUM Tests for Changes in Distribution

Motivated by the problem of testing for changes in the mean parameter, to this
point we have studied the asymptotic properties of the CUSUM process of the raw
observations. In many cases though we are interested in evaluating for the presence
of change points in other quantities describing the distribution of the observations,
for instance the median. In order to perform such change point analyses, it is useful
to study CUSUM processes derived from the empirical distribution function. We let
.F1 , . . . , FN denote the respective cumulative distribution functions (CDFs) of the

observations .X1 , . . . , XN . We are generally interested in testing

.H0 : F1 (x) = · · · = FN (x) (2.4.1)

against the multiple change point alternative, with .k0∗ = 1 < k1∗ < · · · < kR∗ <

kR+1 = N,


HA :
. Fi = F (l) , for kl−1 < i ≤ kl∗ , l ∈ {1, . . . , R + 1}, (2.4.2)

where the CDFs .F (l) , .l ∈ {1, . . . , R + 1}, satisfy .F (l) /= F (l+1) .


We begin with the related problem of evaluating for change points in the
sequence of medians. We suppose the following:
Assumption 2.4.1 .F1 , . . . , FN are absolutely continuous with bounded densities.
Under Assumption 2.4.1, let .mi denote the median of .Xi satisfying .Fi (mi ) =
1/2. When .H0 holds, we let F denote the common CDF of .X1 , . . . , XN , with
median m satisfying .F (m) = 1/2. We introduce the process
⎛ ⎞
L(NΣ
+1)t⎦
L(N+1)t⎦ Σ
N
QN (t) = N −1/2 ⎝
. 1 {Xi ≤ m} − 1 {Xi ≤ m}⎠ , t ∈ [0, 1].
N
i=1 i=1

In some cases one may assume that m is known, for instance if we wish to test
whether m changes during the sample away from a historical baseline median. If m
2.4 Classical CUSUM Tests for Changes in Distribution 67

is unknown, we estimate it with the sample median .m̂N , and replace m with .m̂N in
the definition of .QN (t), leading to the process
⎛ ⎞
L(NΣ
+1)t⎦
L(N + 1)t⎦ Σ {
N
{ } }
Q̂N (t) = N −1/2 ⎝
. 1 Xi ≤ m̂N − 1 Xi ≤ m̂N ⎠ ,
N
i=1 i=1

t ∈ [0, 1].

We now aim to establish the asymptotic properties of .QN and .Q̂N when the
observations arise from a strictly stationary process .{Xi , i ∈ Z}. In this case under
Assumption 2.4.1, .{Ui = F (Xi ), i ∈ Z} is also a stationary sequence, but with
uniform marginal distributions over the unit interval. If the process .{Xi , i ∈ Z} is
p
.L -decomposable for some .p > 0 and .α > 4 in Definition 1.1.1, then the infinite

sequence

Σ
τ2 =
. E [1{(U0 ≤ 1/2} − 1/2)(1{Ui ≤ 1/2} − 1/2)]
l=−∞

is absolutely convergent.
Theorem 2.4.1 If .H0 of (2.4.1) and Assumption 2.4.1 are satisfied, and .{Xi , i ∈ Z}
is .Lp -decomposable for some .p > 0 and .α > 4 in Definition 1.1.1, then we can
define a sequence of Brownian bridges .{BN (t), 0 ≤ t ≤ 1} such that

. sup |QN (t) − τ BN (t)| = oP (1) (2.4.3)


0≤t≤1

and
| |
| |
. sup |Q̂N (t) − τ BN (t)| = oP (1). (2.4.4)
0≤t≤1

Proof By the probability integral transformation, it is enough to consider the


processes
⎛ ⎞
L(NΣ
+1)t⎦ Σ
N
L(N + 1)t⎦
RN (t) = N −1/2 ⎝
. 1 {Ui ≤ 1/2} − 1 {Ui ≤ 1/2}⎠
N
i=1 i=1

and
⎛ L(NΣ
+1)t⎦ { }
R̂N (t) = N −1/2
. 1 Ui ≤ ÛN (1/2)
i=1

L(N + 1)t⎦ Σ {
N }⎞
− 1 Ui ≤ ÛN (1/2) ,
N
i=1
68 2 Change Point Analysis of the Mean

where .ÛN (1/2) is the median of .U1 , . . . , UN . It follows from Theorem A.1.4 that
we can define sequence of Wiener processes .{WN (t), 0 ≤ t ≤ 1} such that
| |
| |
| −1/2 L(NΣ
+1)t⎦
|
. sup |N (1 {U ≤ 1/2} − 1/2) − τ W (t) | = oP (1), (2.4.5)
| i N |
0≤t≤1 | i=1 |

which implies (2.4.3).


We note
| ⎛ ⎞|
| L(N + 1)t⎦ |
|
. sup R̂N (t) − R̄N (t) − R̄N (1) || = oP (1),
| N
0≤t≤1

where
+1)t⎦ ⎛
L(NΣ ⎞
R̄N (t) = N −1/2
. 1{Ui ≤ ÛN (1/2)} − ÛN (1/2) .
i=1

It follows once again from Theorem A.1.4 that there are continuous Gaussian
processes .{┌N (t, x), 0 ≤ t, x ≤ 1} such that
| |
| |
| −1/2 L(NΣ
+1)t⎦
|
. sup | (1{Ui ≤ x} − x) − ┌N (t, x)|| = oP (1), (2.4.6)
|N
0≤t,x≤1 | i=1 |

and .E┌N (t, x) = 0, .E┌N (t, x)┌N (t ' , x ' ) = γ (x, x ' ) min(t, t ' ), where

Σ
'
┌ ( ) ┐
γ (x, x ) =
. E (1{U0 ≤ x} − x) 1{Ul ≤ x ' } − x ' . (2.4.7)
l=−∞

Hence we conclude
| |
| |
. sup |R̄N (t) − ┌N (t, ÛN (1/2))| = oP (1).
0≤t≤1

It follows from Horváth (1984) (see also Csörgő and Horváth (1993, pp. 24 and 25))
and (2.4.6) that
| |
| 1 ||
|
. ÛN (1/2) − = OP (N −1/2 ). (2.4.8)
| 2|

(2.4.8) in combination with the continuity of .{┌N (t, x), 0 ≤ t, x ≤ 1} gives that
| |
| |
. sup |┌N (t, ÛN (1/2)) − ┌N (t, 1/2)| = oP (1).
0≤t≤1
2.4 Classical CUSUM Tests for Changes in Distribution 69

Hence
| |
. sup |R̄N (t) − ┌N (t, 1/2)| = oP (1),
0≤t≤1

and therefore
| |
. sup |R̄N (t) − t R̄N (1) − (┌N (t, 1/2) − t┌N (1, 1/2))| = oP (1).
0≤t≤1

Using the covariance structure of .┌N (t, x) we have that


┌ ┐
E (┌N (t, 1/2) − t┌N (1, 1/2))(┌N (t ' , 1/2) − t ' ┌N (1, 1/2))
.

= (min(t, t ' ) − tt ' )γ (1/2, 1/2).

Since .τ 2 = γ (1/2, 1/2), the proof of (2.4.4) is complete. ⨆



It is interesting to note that replacing the median with the sample median is of no
consequence in terms of the asymptotic properties of .QN .
We now turn to the asymptotic analysis of .QN and .Q̂N assuming there are
multiple changes in distribution as described in .HA of (2.4.2).
Assumption 2.4.2 .F (l) (·) are absolutely continuous with bounded density, for each
.l ∈ {1, ..., R}.

We use the notation .k0 = 0 and .kR+1 = N. In order to make rigorous statements
about the asymptotic properties .QN , we assume that between each pair of change
points the process is stationary and .Lν -decomposable, resulting in a piecewise .Lν -
decomposable process. This is formalized in the following assumption.

Assumption 2.4.3 .Xi = gl (ηi , ηi−1 , . . .), kl−1 < i ≤ kl∗ , 1 ≤ l ≤ R + 1, where
.g1 , g2 , . . . , gR+1 are deterministic, measurable functions, .gi : S
∞ → R, .E|X |ν <
i
∞ with some .ν > 4, .{ηi , i ∈ Z} are independent and identically distributed random
variables with values in a measurable space .S, and for .kl−1 ≤ i ≤ kl
( | | )
.
∗ |ν 1/ν
vm,l = E |Xi − Xi,m ≤ al m−α with some al > 0 and α > 4,

∗ ∗ , η∗
= gl (ηi , . . . , ηi−m+1 , ηi−m ∗
where .Xi,m i−m−1 , . . .), where .{ηk , k ∈ Z} are
independent, identically distributed copies of .η0 , independent of .{ηj , j ∈ Z}, for
∗ ∗
.k
l−1 < i ≤ kl , .l ∈ {1, . . . , R + 1}.
It follows from Theorem A.1.4 in the appendix that
| |
| |
. sup |F̂N (x) − FN∗ (x)| = OP (N −1/2 ), (2.4.9)
−∞<x<∞
70 2 Change Point Analysis of the Mean

where

1 Σ
N
F̂N (x) =
. 1 {Xi ≤ x} (2.4.10)
N
i=1

is the empirical distribution function of the observations, and

Σ
R+1
kl∗ (l)

.FN (x) = F (x).
N
l=1

We define the sample median as

m̂N = inf{x : F̂N (x) ≥ 1/2}.


.

Let

. m∗N = inf{x : FN∗ (x) ≥ 1/2}

denote the median of the distribution function .F ∗ . To avoid degenerate cases, we


use the following assumption.
Assumption 2.4.4 There is a continuous CDF .F ∗ (x) such that
| ∗ |
. sup |F (x) − F ∗ (x)| = o (1) .
N
−∞<x<∞

The sizes of the changes are encoded by


⎛ ⎞
kl∗ k∗
pN (l) = (F (l) (m∗N ) − ĀN )
. − l−1 , l ∈ {1, . . . , R + 1},
N N

where
R+1 ⎛ ∗ ⎞
1 Σ kl∗ kl−1
.ĀN = − F (l) (m∗N ).
N N N
l=1

Theorem 2.4.2 If .HA of (2.4.2), Assumptions 2.4.2–2.4.4 are satisfied, and

N 1/2
. max |pN (l)| → ∞, (2.4.11)
1≤l≤R+1

then
P
. max |QN (t)| → ∞, (2.4.12)
0≤t≤1
2.4 Classical CUSUM Tests for Changes in Distribution 71

and
P
. max |Q̂N (t)| → ∞. (2.4.13)
0≤t≤1

Proof We establish (2.4.13), since (2.4.12) is simpler to show. Assumption 2.4.3


and (2.4.9) yield

|m̂N − m∗N | = OP (N −1/2 )


. (2.4.14)

(see Csörgő and Horváth (1993, pp. 24 and 25)). If .kl−1 < k ≤ kl , then we have

k∗
Σ
k Σ
l−1 Σ
j
Σ
k

. 1{Xi ≤ m̂N } = 1{Xi ≤ m̂N } + (k − kl−1 ) 1{Xi ≤ m̂N }
i=1 j =1 i=kj∗−1 i=kl−1

k∗
Σ
l−1 Σ
j

= (1{Xi ≤ m̂N } − F (j ) (m̂N ))


j =1 i=kj∗−1

Σ
k
+ (k − kl−1 ) 1{Xi ≤ m̂N } − F (l) (m̂N ))

i=kl−1

k∗
Σ
l−1 Σ
j

+ (F (j ) (m̂N ) − F (j ) (m∗N ))
j =1 i=kj∗−1

Σ
k

+ (k − kl−1 ) (F (l) (m̂N ) − F (l) (m∗N ))
i=kl−1

kj∗
Σ
l−1 Σ Σ
k
+ F (j ) (m∗N ) + (k − kl−1

) F (l) (m∗N ).
j =1 i=kj∗−1 ∗
i=kl−1

We note that Assumption 2.4.4 and (2.4.14) imply that

P
m̂N → m∗ ,
.

where .m∗ denotes the median of .F ∗ . Note that according to Theorem 2.4.1, each
scaled partial sum

Σ
k

(k − kl−1
. ) (1{Xi ≤ m̂N } − F (l) (m̂N ))
i=kl−1
72 2 Change Point Analysis of the Mean

satisfies
| |
| Σ |
| k
|
. max ||(k − kl−1

) (1{Xi ≤ m̂N } − F (l) (m̂N ))|| = OP (N 1/2 ).
kl −1≤k≤kl | |
i=kl−1

From this we obtain that


|
| l−1 kj∗
|Σ Σ
|
. max | (1{Xi ≤ m̂N } − F (j ) (m̂N ))
1≤k≤N |
|j =1 i=kj∗−1
|
|
Σ
k
|

+(k − kl−1 ) (1{Xi ≤ m̂N } − F (m̂N ))|| = OP (N 1/2 ).
(l)

i=kl−1 |

Assumption 2.4.2 and (2.4.14) with the mean value theorem yield
|
| l−1 kj∗
|Σ Σ
|
. max | (F (j ) (m̂N ) − F (j ) (m∗N ))
1≤k≤N |
|j =1 i=kj −1

|
|
Σ
k
|

+(k − kl−1 ) (F (l) (m̂N ) − F (l) (m∗N ))|| = OP (N 1/2 ).
i=kl−1 |

Thus we conclude

. sup |Q̂N (t)| = TN + OP (1),


0≤t≤1

where
|
| l−1 kj∗
|Σ Σ Σ
k
|
.TN = max | F (j ) (m∗N ) + (k − kl−1

) F (l) (m∗N )
1≤k≤N |
|j =1 i=kj∗−1 i=kl−1
|
kj∗ |
k Σ Σ (j ) ∗ ||
R
− F (mN )| .
N |
j =1 i=kj∗−1 |
2.4 Classical CUSUM Tests for Changes in Distribution 73

By comparing the maximum only over the points .k = kl∗ , .l ∈ {1, . . . , R}, we see
that
| |
| l |
|Σ |
.TN ≥ max || pN (j )|| ,
1≤l≤R+1 | |
j =1

completing the proof of (2.4.13). ⨆



Theorems 2.4.1 and 2.4.2 may be used to construct asymptotically valid tests for
change points in the median. By altering the above arguments slightly, we may adapt
these results to test for the stability of the entire distribution of the observations. We
recall the empirical distribution function .FN (·) from (2.4.10), and define the quantile
function

. q̂N (u) = inf{x : F̂N (x) ≥ u}, 0 < u < 1.

Now the CUSUM process is redefined to depend on the quantile level .u ∈ (0, 1) as
well:

Q̃N (t, u)
. (2.4.15)
⎛ ⎞
L(NΣ
+1)t⎦ Σ
N
{ } L(N + 1)t⎦ { }
= N −1/2 ⎝ 1 Xi ≤ q̂N (u) − 1 Xi − q̂N (u) ⎠ ,
N
i=1 i=1

0 < t < 1.
.

Theorem 2.4.3 If .H0 of (2.4.1), and Assumption 2.4.1–2.4.3 are satisfied, and the
sequence .{Xi , i ∈ Z} is .Lν -decomposable for some .ν > 4 with parameter .α > 4,
we can define a sequence of Gaussian processes .{┌˜ N (t, u), 0 ≤ t, u ≤ 1} such that
| |
| |
. sup |Q̃N (t, u) − ┌˜ N (t, u)| = oP (1)
0<t,u<1

.E ┌˜ N (t, u) = 0 and .E ┌˜ N (t, u)┌˜ N (t ' , u' ) = γ (u, u' )(min(t, t ' ) − tt ' ), where
'
.γ (u, u ) is defined in (2.4.7).

Proof By Assumption 2.4.1, we can use again apply the probability integral
transformation. We again let .Ui = F (Xi ), 1 ≤ i ≤ N. Using (2.4.6), we get
| |
| N ⎛ ⎞ |
| −1/2 Σ |
. sup |N 1{Ui ≤ ÛN (u)} − ÛN (u) − ┌N (t, ÛN (u))| = oP (1),
0≤t,u≤1 | i=1
|
74 2 Change Point Analysis of the Mean

where
⎧ ⎫
1 Σ
N
.ÛN (u) = inf x : 1{Ui ≤ x} ≥ u , 0 < u < 1.
N
i=1

Similarly to (2.4.8), (Csörgő and Horváth 1993, pp. 24–25) and (2.4.6) imply
| |
| |
. sup |ÛN (u) − u| = OP (N −1/2 ). (2.4.16)
0≤u≤1

Using again the continuity of .{┌N (t, u), 0 ≤ t, u ≤ 1} and (2.4.16) we get
| |
| |
. sup |┌N (t, ÛN (u)) − ┌N (t, u)| = oP (1).
0≤t,u≤1

Thus we conclude
| |
| |
. sup |Q̃N (t, u) − (┌N (t, u) − t┌N (1, u))| = oP (1).
0≤t,u≤1

The result follows by taking .┌˜ N (t, u) = ┌N (t, u) − t┌N (1, u). ⨆

The behaviour of .{Q̃N (t, u), 0 ≤ t, u ≤ 1} under the multiple changes in the
distributions of the .Xi ’s under the alternative can be studied along the lines of
Theorem 2.4.2. The times of changes can be estimated with the locations of the
maximum of .sup0≤u≤1 |Q̃N (t, u)| with respect to t.

2.5 Data Examples

Example 2.5.1 (River Nile Data) Figure 2.1 displays the yearly measurements
of the flow of the river Nile measured at Aswan over N = 100 years starting
from 1817. A change in the mean of the series appears to take place around the
year 1900. In order to test for the presence of a change point in the mean of the
series, we consider the CUSUM process QN (t) in (1.2.1), and its weighted versions
QN (t)/[t (1 − t)]γ . Figure 2.2 displays QN (t)/[t (1 − t)]γ as a function of t ∈ (0, 1)
for γ ∈ {0, 1/4, 1/2}. Each process has a distinct peak corresponding to the year
1898. This coincides with the year that construction began on the Aswan Low Dam.
We remark the effect of changing γ in Fig. 2.2: although the peak of the CUSUM
process remains largely unaffected, we see that larger values of γ amplify the values
of the CUSUM process closer to the end points.
2.5 Data Examples 75

1400
Fig. 2.1 The time series of
annual flow measurements of
the Nile river taken at Aswan
measured in 108 m3 from
1871–1970

1200
1000
Flow in 108m3

800
600

1880 1900 1920 1940 1960

Time (year)

In order to evaluate the statistical significance of the observed break in the series,
we may calculate an approximate p-value of a test of H0 versus HA in (1.1.2) and
(1.1.3), using the approximation obtained from Theorem (1.2.2): when γ ∈ {0, 1/4},
⎧ | ⎫
|B(t)| 1 |QN (t)| ||
p=P
. sup > sup X
γ| 1
, . . . , XN ,
0<t<1 [t (1 − t)] 0<t<1 σ̂N [t (1 − t)]
γ

where σ̂N2 is an estimator for the long-run variance in the model errors in (1.1.1),
and B is a Brownian bridge that is independent of the sample. We discuss in Chap. 3
below how such estimators for the variance may be obtained. Using for example
the kernel-lag window estimator σN2 defined in (3.1.18) below leads to approximate
p-values of zero when γ ∈ {0, 1/4}. The horizontal dotted lines in Fig. 2.2 show
the approximate null 95% quantiles of sup0<t<1 |QN (t)|/[t (1 − t)]γ , γ ∈ {0, 1/4}.
We see that the test statistics far exceed these values, suggesting the presence of a
change point.
In order to evaluate the statistical significance of sup0<t<1 |QN (t)|/[t (1 − t)]1/2 ,
we may instead appeal to the Darling-Erdős approximation derived in Theo-
76 2 Change Point Analysis of the Mean

1000
γ=0
γ = 1/4
800 γ = 1/2
QN(t) (t(1−t))γ

600
400
200
0

Fig. 2.2 Plots of the weighted CUSUM process |QN (t)|/[t (1 − t)]γ over t ∈ (0, 1) for γ ∈
{0, 1/4, 1/2} computed from the annual Nile river flow series. The horizontal black dotted line
shows the 95% quantile of σ̂N sup0≤t≤1 |B(t)|, and the horizontal red dotted line shows the 95%
quantile of σ̂N sup0≤t≤1 |B(t)|/[t (1 − t)]1/4 . The blue dotted line shows the approximate 95% null
quantile of sup0<t<1 |QN (t)|/[t (1 − t)]1/2 described in (2.5.1)

rem 1.2.5. This suggests that an approximation of the 95% null quantile of
sup0<t<1 |QN (t)|/[t (1 − t)]1/2 is of the form

σ̂N (qG,0.95 + b(log N))


. , (2.5.1)
a(log N)

where qG,0.95 is the 95th quantile of the Gumbel law with cumulative distribution
function F (x) = exp(−2e−x ). This threshold is also displayed in Fig. 2.2.
Example 2.5.2 (Array Comparative Genomic Hybridization Data) Array com-
parative genomic hybridization (A-CGH) data consist of log-ratios of normalized
gene expression intensities from disease versus control samples, indexed by their
location on the human genome. Figure 2.3 displays a sub-sequence of A-CGH
data obtained for the study of genetic aberrations in 26 patients with Glioblastoma
Multiforme obtained from Lai et al. (2005). Aberrations in this case appear as
change-points in the mean of the A-CGH sequence, and so we aimed to perform
a multiple change point analysis on this series. In order to detect change points, we
2.5 Data Examples 77

performed binary segmentation as described in Sect. 2.3.1 using the standardized


CUSUM detector sup0<t<1 |QN (t)|/σ̂N [t (1 − t)]1/2 , and change point estimator

⎛ ⎞1/2 ||Σ |
k Σ ||
k N
N |
k̂N = sargmax
. | Xi − Xi | .
k∈{1,...,N } k(N − k) | N |
i=1 i=1

Using the threshold ρN = (2 log N)1/2 leads to the estimation of four change
points, as seen in the top panel of Fig. 2.3, which roughly coincided with the visible
abberations in the series. Using instead the threshold ρN = (log N)1/2 leads to the
estimation of eight change points, as seen in the bottom panel of Fig. 2.3. In order
select a segmentation, we computed the BIC of each segmentation as a function
of the number of change points estimated as in Sect. 2.3.2. The candidate change
point models are ordered in each stage of the binary segmentation in descending
order based on the size of the normalized detector computed in that stage. These
are displayed in Fig. 2.4. We see in this plot that there is a large drop in the BIC
occurring at S = 4 change points, after which the BIC levels off. This suggests
using a four change point model, which agrees with the initial binary segmentation
using the threshold ρN = (2 log N)1/2 .
In order to evaluate the uncertainty in the estimates of the change point locations,
we computed, using the standard CUSUM process to refine the initial change point
estimators, approximate 95% confidence intervals for each change point location as
described in Remark 2.3.2. These intervals are plotted in the top panel of Fig. 2.3
as transparent red bands. We observed that these intervals are quite narrow, and
localize the visible starting and ending indices of the aberrations. We remark that,
withstanding clear violations of the assumptions underpinning Theorem 2.3.3, we
view these intervals as being conservative since they are constructed under the
assumption that the change in the mean is shrinking as a function of the sample
size.

Example 2.5.3 (Prague Temperature Data) In this example we consider a change


point analysis of the “Prague–Klementinum” time series of the mean yearly
temperatures, measured in degrees Celsius, in Prague, Czech Republic, from 1775
until 1990 (N = 215). This time series was named after the monastery in which the
temperature measurements were recorded, and its history is discussed in Jarušková
and Antoch (2020). A plot of the series is shown in Fig. 2.5, in which the mean of
the series appears to undergo at least two level shifts over the observation period.
The weighted CUSUM process |QN (t)|/[t (1 − t)]γ computed from the series
for γ ∈ {0, 1/4} is shown in the left–hand panel of Fig. 2.6. Two peaks in these
CUSUM processes are apparent, both exceeding approximations of the 95% null
critical values of supt∈(0,1) |QN (t)|/[t (1 − t)]γ . These correspond to the change
point estimators k̂1 = 1837 and k̂2 = 1934. The segmented mean using these
change point estimates is shown in Fig. 2.5. We calculated the ACF of the series
after centering by this piecewise constant mean function, which is displayed in the
78 2 Change Point Analysis of the Mean

4
Normalized A−CGH

2
0
−2

0 50 100 150 200

Index
4
Normalized A−CGH

2
0
−2

0 50 100 150 200

Index

Fig. 2.3 The series of normalized A-CGH measurements from chromosome 7 in sample GBM29
as shown in Figure 4 of Lai et al. (2005). The segmentation of the mean in the top panel was
obtained using binary segmentation with the threshold ρN = (2 log N )1/2 , which yielded four
change points, whereas the segmentation in the bottom panel is based on binary segmentation with
threshold ρN = (log N )1/2 , yielding eight change points. The top panel shows using transparent
red bands the approximate confidence intervals for the change points as stated in Remark 2.3.2
2.5 Data Examples 79

6.0
5.8
5.6
BIC

5.4
5.2

2 4 6 8 10 12 14

Number of change points S

Fig. 2.4 A plot of the BIC as a function of the number of change points obtained by binary
segmentation

right hand panel of Fig. 2.6. It appears to support that the centered series is mean
stationary and approximately serially uncorrelated.
A natural question here is whether or not the series appears to undergo changes in
other aspects of its distribution. This can be investigated by examining the CUSUM
process Q̃N (t, u) introduced in (2.4.15). A natural test statistic to evaluate for
further changes in the distribution is to consider
⎧ 1⎧ 1
Q̃2N (t, u)dtdu.
0 0

As a result of Theorem 2.4.3, in the absence of changes in the distribution, this


/ 1/ 1
statistic is approximately distributed as 0 0 ┌ 2 (t, u)dtdu, where ┌ is a mean zero
Gaussian process with covariance described in the statement of the theorem. Due
to the negligible autocorrelation observed in the centered series, when quantifying
the magnitude of Q̃N we proceed under the assumption that the variables are
serially independent. Techniques for estimating the function γ in Theorem 2.4.3
in general are considered in Chap. 8. Under serial independence, it can be shown
that E┌(t, u)┌(t ' , u' ) = (min{t, t ' } − tt ' )(min{u, u' } − uu' ). This process is
80 2 Change Point Analysis of the Mean

11
10
Temperature in C

9
8

1775 1837 1934 1990

Year

Fig. 2.5 Time series of the mean yearly temperature measured in degrees Celsius in Prague
between 1775 to 1990. Binary segmentation using the standard and weighted CUSUM processes
for the mean leads to estimates of change points at locations k̂1 = 1837 and k̂2 = 1934

sometimes referred to as the “Brownian pillow”. As such a p–value for changes


in the distribution can be calculated as
⎧⎧ 1⎧ 1 ⎧ 1⎧ 1 | ⎫
|
p=P 2
┌ (t, u)dtdu > 2
Q̃N (t, u)dtdu|X1 , . . . , XN ,
0 0 0 0

with ┌ distributed as above and independent of X1 , . . . , XN . When applied to the


original temperature series, the p–value of this test was 0.0046, indicating that the
series appears to undergo changes in its distribution. The left-hand panel of Fig. 2.7
shows a surface plot of |Q̃N (t, u)| computed from the original series, from which
“ridges” are apparent, and correspond to the estimated changes in the mean at k̂1 =
1837 and k̂2 = 1934. When the same test is applied to the centered series using
2.5 Data Examples 81

ACF of Centered Series

1.0
2.5
γ=0

0.8
γ = 1/4
2.0

0.6
QN(t) (t(1−t))γ
1.5

ACF
0.4
1.0

0.2
0.5

0.0
0.0

0 5 10 15 20
t Lag

Fig. 2.6 Left panel shows plots of the weighted CUSUM process |QN (t)|/[t (1 − t)]γ over t ∈
(0, 1) for γ ∈ {0, 1/4}, along with approximate 95% null critical values. Peaks are visible at
estimated change point locations corresponding to k̂1 = 1837 and k̂2 = 1934. The right panel
shows the ACF of the series centered based on these estimated change points using the piecewise
constant mean estimate shown in Fig. 2.5
u

t
t

Fig. 2.7 Plots of |Q̃N (t, u)| as defined in (2.4.15) for the Prague mean yearly temperature (left),
and for the centered Prague mean yearly temperature series using the change point estimates
k̂1 = 1837 and k̂2 = 1934 (right). Tests for changes in the distribution of these series based
/ /1 1
on 0 0 Q̃2N (t, u)dtdu are significant at the 5% level for the original series, but not for the centered
series

these change point estimates, the p-value was 0.8715, indicating that there do not
appear to be significant changes in the distribution of the re-centered series. A plot
of |Q̃N (t, u)| for the centered series is shown in the right hand panel of Fig. 2.7,
from which one may see that large peaks in the process no longer appear.
82 2 Change Point Analysis of the Mean

2.6 Exercises

Exercise 2.6.1 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and


identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the one change in
the mean alternative. We use
| |
.TN = max |X̄0,m − X̄m,N | ,
1≤m<N

where

1 Σ
k
X̄j,k =
. Xi . (2.6.1)
k−j
i=j +1

Compute the limiting distribution of TN under the null hypothesis.


Exercise 2.6.2 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the one change in
the mean alternative. We use
| |
TN = max |X̄0,m − X̄m,N | ,
.
1≤m<m

where X̄j,k is defined in (2.6.1). Investigate the limiting behavior of TN under the
exactly one change in the mean alternative when the mean changes from μ1 to μk1 +1
at time k1 .
Exercise 2.6.3 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} is a stationary AR(1)
sequence defined by

Ei = ρEi−1 + ηi
. i ∈ Z,

where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 , E|η0 |ν < ∞ with some ν > 2 and |ρ| < 1. We wish to test
H0 : μ1 = · · · = μN against the one change in the mean alternative. We use
| |
TN = max |X̄0,m − X̄m,N | ,
.
1≤m<m

where X̄j,k is defined in (2.6.1). Compute the limiting distribution of TN under the
null hypothesis.
Exercise 2.6.4 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} is a stationary AR(1)
sequence defined by

Ei = ρEi−1 + ηi
. i ∈ Z,
2.6 Exercises 83

where {ηi , i ∈ Z} are independent and identically distributed random variables with
Eη0 = 0, Eη02 = σ 2 , E|η0 |ν < ∞ with some ν > 2 and |ρ| < 1. We wish to test
H0 : μ1 = · · · = μN against the one change in the mean alternative. We use
| |
.TN = max |X̄0,m − X̄m,N | ,
1≤m<m

where X̄j,k is defined in (2.6.1). Investigate the limiting behavior of TN under the
exactly one change in the mean alternative when the mean changes from μ1 to μk1 +1
at time k1 .
Exercise 2.6.5 Let X1 , X2 , . . . , XN be stationary and serially uncorrelated random
variables with EXi2 = σ 2 . Show that

⎛ k ⎞2
Σ k Σ
N
σ 2 k(N − k)
.E Xi − Xi = .
N N
i=1 i=1

Exercise 2.6.6 Let X1 , X2 , . . . , XN be independent Poisson(λi ), 1 ≤ i ≤ N


random variables. We wish to test H0 : λ1 = λ2 = . . . = λN against the alternative
that there is k1 such that λ1 = λ2 = . . . = λk1 /= λk1 +1 = λk1 +2 = . . . = λN . Find
the maximally selected likelihood ratio.
Exercise 2.6.7 Let X1 , X2 , . . . , XN be independent Poisson(λi ), 1 ≤ i ≤ N
random variables. We wish to test H0 : λ1 = λ2 = . . . = λN against the alternative
that there is k1 such that λ1 = λ2 = . . . = λk1 /= λk1 +1 = λk1 +2 = . . . = λN . Show
that the maximally selected likelihood ratio test is consistent if λk1 = λk1 (N ), λk2 =
λk2 (N ),

N 1/2 | |
. |λ k − λ k | → ∞
1/2 1 2
(log log N)
and k1 = LNθ1 ⎦, 0 < θ1 < 1.
Exercise 2.6.8 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the at most two
changes alternative HA : μ1 = μ2 = . . . = μk1 /= μk1 +1 = μk1 +2 = . . . = μk2 /=
μk2 +1 = μk2 +2 = . . . = μN . We use the statistic
{ | |
.TN = N
−3/2
max k(m − k) |X̄0,k − X̄k,m |
1≤k<m<N
| |}
+(m − k)(N − m) |X̄k,m − X̄m,N | ,

where X̄j,k is defined in (2.6.1). Compute the limit distribution of TN .


Exercise 2.6.9 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
84 2 Change Point Analysis of the Mean

with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the at least one


change alternative using the statistic
| k |
|Σ k Σ ||
N
1 −1/2 |
.TN = N max | Xi − Xi | ,
rN 1≤k<N | N |
i=1 i=1

where rN = min1≤k≤N σ̂k,N ,


⎛ k ⎞
1 Σ Σ
N
2
.σ̂k,N = (Xi − X̄0,k ) +
2
(Xi − X̄k,N ) 2
N
i=1 i=k+1

and X̄j,k is defined in (2.6.1). Assume that we have exactly two changes at LNθ1 ⎦
and LNθ2 ⎦, 0 < θ1 < θ2 < 1. Find a condition on the sizes of the changes in the
mean which implies that TN → ∞ in probability.
Exercise 2.6.10 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the at most two
changes alternative HA : μ1 = μ2 = . . . = μk1 /= μk1 +1 = μk1 +2 = . . . = μk2 /=
μk2 +1 = μk2 +2 = . . . = μN . We use the statistic
{ | |
TN = N −3/2
. max k(m − k) |X̄0,k − X̄k,m |
1≤k<m<N
| |}
+(m − k)(N − m) |X̄k,m − X̄m,N | ,

where X̄j,k is defined in (2.6.1). Show that

.TN → ∞,

if μ1 = μ1 (N ), μk1 +1 = μk1 +1 (N ), μk2 +1 = μk2 +1 (N ),

N 1/2 max(|μ1 − μk1 +1 |, |μk2 − μk2 +1 |) → ∞


.

and k1 = LN θ1 ⎦, k2 = LNθ2 ⎦, 0 < θ1 < θ2 < 1.


Exercise 2.6.11 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the at most two
changes alternative HA : μ1 = μ2 = . . . = μk1 /= μk1 +1 = μk1 +2 = . . . = μk2 /=
μk2 +1 = μk2 +2 = . . . = μN . We use the statistic
{ | |
TN = N −3/2
. max min k(m − k) |X̄0,k − X̄k,m | ,
1≤k<m<N
| |}
(m − k)(N − m) |X̄k,m − X̄m,N | ,
2.7 Bibliographic Notes and Remarks 85

where X̄j,k is defined in (2.6.1). Show that

TN → ∞,
.

if μ1 = μ1 (N ), μk1 +1 = μk1 +1 (N ), μk2 +1 = μk2 +1 (N ), N 1/2 min(|μ1 −


μk1 +1 |, |μk2 − μk2 +1 |) → ∞, and k1 = LNθ1 ⎦, k2 = LNθ2 ⎦, 0 < θ1 < θ2 < 1.
Exercise 2.6.12 Assume that Xi = μi + Ei , where {Ei , i ∈ Z} are independent and
identically distributed random variables with EE0 = 0, EE02 = σ 2 and E|E0 |ν < ∞
with some ν > 2. We wish to test H0 : μ1 = · · · = μN against the at most two
changes alternative HA : μ1 = μ2 = . . . = μk1 /= μk1 +1 = μk1 +2 = . . . = μk2 /=
μk2 +1 = μk2 +2 = . . . = μN . We use the standardized statistic
⎧⎛ ⎞1/2
k(m − k) | |
TN = N
.
−3/2
max max |X̄0,k − X̄k,m | ,
1≤k<m<N m
⎛ ⎞1/2 ⎫
(m − k)(N − m) | |
|X̄k,m − X̄m,N | ,
N −k

where X̄j,k is defined in (2.6.1). We assume that

P {E0 > x} = cx −α ,
. x ≥ x0 with some c > 0 and α > 0.

Show that
{ }
. lim inf P TN > N 1/α > 0
N →∞

under the null hypothesis.

2.7 Bibliographic Notes and Remarks

Grabovsky et al. (2000) introduced kernel type estimators with more general
weights for the time of change, and obtained their asymptotic properties in case
of independent identically distributed errors in the exactly one change model.
Dümbgen (1991) and Antoch et al. (1995) proved Theorem 2.2.1 for independent
observations. Later Antoch et al. (1997) extended their result to linear processes; see
also Antoch and Hušková (1999) for a review. Kurozumi (2018) and Hušková and
Kirch (2010) consider the development of confidence intervals for the time change.
The density function of ξ(κ) in Theorem 2.2.1 was computed by Ferger (1994); see
also (Csörgő and Horváth 1997, p. 177).
Binary segmentation is often credited to Scott and Knott (1974) and Vostrikova
(1981). The first consistency result for binary segmentation of the mean of
86 2 Change Point Analysis of the Mean

independent variables with a fixed number of change points that do not shrink
to zero appears to have been Korostelev (1988). For Gaussian model errors with
a potentially increasing number of, potentially shrinking, changes appeared in
the PhD thesis Venkatraman (1992), and was revisited in Fryzlewicz (2014).
Lemma 2.3.1 was proven in Venkatraman (1992) when κ = 1/2. Sub-Gaussianity of
the tails of the model errors is a typical ingredient in establishing such consistency
results, which is used to produce logarithmic bounds for random variables of the
form
| b |
| Σ |
1 | |
. max | Ei | .
1≤a<b<c≤N (c − a) 1/2 | |
i=a+1

It may be shown for example that when binary segmentation is applied in the
presence of multiple, potentially shrinking, change points, the first change point
estimator in binary segmentation is within a radius of OP (1/Δ2N ) of a change point
(cf. Theorem 2.2.1). However, in the second stage, the maximum of the CUSUM of
the innovations is maximized over a random interval. The maximal absolute value
of the CUSUM process of the model errors may be bounded by variables of the
form max1≤i≤C/Δ2 |Ei |. As such, when ΔN shrinks, it is convenient to make use of
N
strong tail conditions on the variables to get effective bounds on such maxima.
Bai and Perron (1998, 2003) extend the binary segmentation method to linear
models. Yao (1988) and Lee (1995) used Schwarz’s criteria to estimate the number
of changes in independent normal observations. Serbinowska (1996) applied the
same method to binomial observations. Pan and Chen (2006) and Ciuperca (2011)
applied a more general penalty function to find the number of changes in general
time series models. Bai (1995) modifies the binary segmentation method and derives
the asymptotic properties of the estimators for the time of change. Chen et al.
(2011) compares the binary segmentation and the maximum residual method. The
minimum description length is used as the criterion for segmentation in Davis et al.
(2006), and it is minimised using a genetic algorithm to reduce computational
complexity. For a review of penalty terms and information criteria that may be used
in Sect. 2.3.2, see Claeskens and Hjort (2008).
Multiple change point detection and estimation methods have been intensively
studied, even for univariate scalar data, in the last two decades. Many of these
improve upon the weaknesses of simple binary segmentation, including its propen-
sity to perform poorly for change points that are close together, or when there are
many change points. Some notable modern methods include SMUCE (Frick et al.
2014; Dette et al. 2020a), Wild-binary segmentation (Fryzlewicz 2014; Fryzlewicz
and Rao 2014), seeded binary segmentation (Kovács et al. 2023), MOSUM (Kirch
and Klein 2021), kernel based methods (Arlot et al. 2019), and PELT (Killick et al.
2012), among many others; see also Wang et al. (2020). Excellent reviews may be
found in Cho and Kirch (2021) and Yu (2020). A numerical comparison of many of
these methods may be found in Shi et al. (2022).
2.7 Bibliographic Notes and Remarks 87

One of the first nonparametric change point procedures was introduced in Page
(1954, 1955). Sections 2.2 and 2.3 in Csörgő and Horváth (1997) discuss the
extensions of Page’s procedure in case of independent observations. They also
provide several references on applications of nonparametric statistics to change
point analysis. Their methodology is based on the theory of empirical and quantile
processes which we also used in Sect. 2.4. Empirical process techniques for depen-
dent data are surveyed in Dehling et al. (2002, 2009). Hoga (2018a,b) investigates
changes in the quantiles with applications to risk measures and tail indices. Hušková
and Kirch (2008) and Boldea et al. (2019) advocated resampling methods to improve
finite sample performance of several statistical methods. Gerstenberger (2018) uses
Wilcoxon statistics, along the lines of Sect. 2.4 to estimate the time of change, which
also fall within the scope of the U-statistic based methods in Dehling et al. (2022,
2015). By considering the random functions Xi |→ 1{Xi ≤ ·}, functional data
methods, see Chap. 8, can be applied to motivate similar methods as discussed
in Sect. 2.4; see e.g. Sharipov et al. (2016). Holmes et al. (2013) and Bücher
et al. (2019) provides several nonparametric tests for changes in distribution. An
outlier robust method was proposed in Fearnhead and Rigaill (2019). Empirical
characteristic function based methods to detect and estimate changes in distribution
are developed in Huśková and Meintanis (2006b,a), Hlávka et al. (2017), and
Matteson and James (2014).
If the observation process {Xi , i ∈ Z} is formed from independent and
identically distributed random variables, then the approximating Gaussian process
{┌˜ N (t, u), 0 ≤ u, t ≤ 1} in Theorem 2.4.3 has a simple covariance structure:
E ┌˜ N (t, u) = 0 and E ┌˜ N (t, u)┌˜ N (s, v) = (min(t, s) − ts)(min(u, v) − uv). Hence
for each N the process {┌˜ N (t, u), 0 ≤ u, t ≤ 1} is a “Brownian pillow”; it is
tied down at all edges of the unit square. The Brownian pillow appeared in the
paper of Blum et al. (1961), who provided critical values for the square integral
of the Brownian pillow. Koning and Protasov (2003) obtained a representation for
the Brownian pillow which can be used to simulate critical values for several other
functionals.
Applications of change point analysis to climate data have been reviewed in
Reeves et al. (2007). Example 2.5.2 was motivated by an example in Killick and
Eckley (2014).
Chapter 3
Variance Estimation, Change Points
in Variance, and Heteroscedasticity

A crucial step in approximating the distribution of the CUSUM statistics introduced


in Chaps. 1 and 2 is estimating the variance parameter .σ 2 describing the limiting
variance of the partial sum of the observations under Assumption 1.1.1. It may be
shown that the parameter .σ 2 in Assumption 1.1.1 must be defined by the formula
⎛ ⎞
1 E
N
.σ = lim Var ⎝ Ei ⎠ ,
2
N →∞ N 1/2
j =1

i.e. the variance parameter is defined in terms of the asymptotic variance of the
scaled sample mean of the model errors. This asymptotic quantity is often termed
the long–run variance, and also coincides with a scalar multiple of the spectral
density of the sequence .{Ei , i ∈ Z} evaluated at frequency zero. To begin this
chapter, we discuss typical estimators of the long–run variance, and establish their
asymptotic consistency. We then turn to the problem of performing change point
analysis for the second order properties of a process. The chapter concludes with
studying how asymptotic properties of change point methods for the mean are
affected by heteroscedasticity, or changes in the variance, of the model errors in
model (1.1.1).

3.1 Estimation of Long–Run Variances and Covariance


Matrices

Suppose for the moment that .H0 in the basic AMOC change point model (1.1.1)
holds, so that for each i, .Xi = Ei , and further that .{Ei , i ∈ Z} is a stationary sequence

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 89


L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_3
90 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

with .EEi2 < ∞, and autocovariance function .γE (k) = cov(E0 , Ek ). It follows from
elementary calculations that
⎛ ⎞ ⎛ ⎞
1 E
N E
N−1
|k|
Var
. Ei = 1− γE (k).
N 1/2 N
i=1 k=−(N −1)

Hence, if

E
. |γE (k)| < ∞,
k=−∞

then by Lebesgue’s dominated convergence theorem,


⎛ ⎞ ∞
1 E
N E
. lim Var Ei = γE (k) = σ 2 . (3.1.1)
N→∞ N 1/2
i=1 k=−∞

Quality estimates of .σ 2 are needed in order to apply the results of Chap. 1 to


change point problems. Indeed, if we wish to replace .σ 2 with an estimator in the
Darling–Erdős type standardized statistics so that the conclusion of Theorem 1.2.6
still holds, we have seen that basic consistency of the estimator is in general not
enough guarantee that such a replacement has an asymptotically negligible effect;
see for example (1.2.26).

3.1.1 Serially Uncorrelated Observations

To begin, we consider the properties of estimators of .σ 2 when the observations


are serially uncorrelated, so that .σ 2 = γE (0). Although this case is often too
simplistic for many change point problems of interest, studying it brings to light
some important issues we need to consider when estimating .σ 2 in general.
We begin then by assuming the following.
Assumption 3.1.1 .{Ei , i ∈ Z} is strictly stationary sequence, and .EE02 < ∞ and
.cov(E0 , Ek ) = 0 for all .k /= 0.

In this case it is natural to estimate .σ 2 with the sample variance

1 E( 1 E
N N
)2
2
.σ̂N = Xi − X̄N , where X̄N = Xi . (3.1.2)
N −1 N
i=1 i=1

The asymptotic consistency of .σ̂N2 follows from the ergodic theorem, when the
model errors are ergodic.
3.1 Estimation of Long–Run Variances and Covariance Matrices 91

Assumption 3.1.2 .{Ei , i ∈ Z} is an ergodic sequence.


In this case the following holds by the ergodic theorem (see e.g. Breiman, 1968 (p.
113)).
Theorem 3.1.1 If .H0 of (1.1.2) holds, as well as Assumptions 3.1.1 and 3.1.2, then

. σ̂N2 → σ 2 a.s., (3.1.3)

where .σ̂N2 is defined in (3.1.2).


It is shown further in Proposition 6.6 of Breiman (1968) that Bernoulli shifts as
defined in Definition 1.1.1 and Sect. A.1 are strictly stationary and ergodic. Several
volatility processes satisfy Assumptions 3.1.1 and 3.1.2. For example, this is the
case if .Ei = ηi g(ηi−1 , ηi−2 , . . . ) in (1.1.1) when the .ηi ’s are independent identically
distributed random variables with .Eη0 = 0. Nearly all GARCH–type processes can
be written in this form (see Francq and Zakoian, 2010).
We have seen in some cases that we require a rate of convergence for .σN2 . Such a
rate may be given in terms of the rate of decay of the autocovariance of the sequence
of squared values of the series .{Xi2 , i ∈ Z}. We say a function .a : (0, ∞) |→
(−∞, ∞) is slowly varying at infinity if for all .c > 0,

a(cx)
. lim = 1.
x→∞ a(x)

Theorem 3.1.2 We assume that .H0 of (1.1.2), Assumption 3.1.1 hold and .EX04 <
∞.
If .cov(X02 , Xk2 ) = O(a(k)), where .a(k) is a strictly decreasing, slowly varying
function at infinity, then
⎛ ⎞
|σ̂N2 − σ 2 | = OP a 1/2 (N ) .
.

Proof Since under the null hypothesis .σ̂N2 does not depend on the common mean,
we have that

1 E 2 1 E
N N
N
. σ̂N2 = Ei − Ē 2 with ĒN = Ei .
N −1 N −1 N N
i=1 i=1

Using Assumption 3.1.1 we get that


⎛ ⎞
1
2
E ĒN
. =O ,
N
92 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

and therefore by Chebyshev’s inequality


⎛ ⎞
1
ĒN = OP
. .
N 1/2

Using again Assumption 3.1.1 we obtain


⎛N ⎞2 ⎛ ⎞
E E
N −1
|k| ⎛ ⎞
E
. (Ei2 −σ )
2
=N 1− cov E02 , Ek2
N
i=1 k=−(N −1)

and
| |
| NE −1 ⎛ ⎞ ⎛ ⎞| E −1 | ⎛ ⎞|
| |k| | N
| |
| 1− 2 2 |
cov E0 , Ek | ≤ 4 |cov E02 , Ek2 | .
.
|
|k=−(N −1) N | k=0

Using again Chebyshev’s inequality we conclude


⎧ |N −1 | ⎫
|E | 4 1 E ||
N ⎛ ⎞|
1 | 2 | |
P
. | (Ei − σ )| > t ≤ 2 |cov E02 , Ek2 | (3.1.4)
N | | t N
i=1 k=0

for all .t > 0. By the monotone density theorem for slowly varying function in
Bingham et al. (1987) (pp. 159–160)

N |
E ⎛ ⎞|
| |
. |cov ε02 , εk2 | ≤ CNa(N).
k=0

So using .t = a 1/2 (N ), we obtain the result. ⨆



We typically apply Theorem 3.1.2 with .a(k) = 1/(log log k)2 and .a(k) =
1/(log k)2 .
In practice one does not know whether or not change points are present in the
series under consideration, and hence it is useful to investigate the properties of .σ̂N2
in the presence of change points. We now suppose that the observations follow the
AMOC model (1.1.1), in which a change in the mean occurs at time .k ∗ , and the size
of the change is .ΔN = μ0 − μA , where .μ0 and .μA are the means before and after
the change. We showed in Sect. 2.1 that when .HA holds,

E
k E
N
. (Ei − ĒN ) = OP ((k ∗ )1/2 ) and (Ei − ĒN ) = OP ((N − k ∗ )1/2 ).
i=1 i=k ∗ +1
3.1 Estimation of Long–Run Variances and Covariance Matrices 93

Using this we may express .σ̂N2 under .HA as

⎛ ⎞
1 E
N
k ∗ (N − k ∗ ) 2 |ΔN |
σ̂N2 =
. (Ei − ĒN )2 + ΔN + O P . (3.1.5)
N −1 N2 N 1/2
i=1

The relation (3.1.5) suggests that the usual sample variance will tend to overestimate
σ 2 when change points are present in the sequence. The bias term .Δ2N k ∗ (N −
.

k ∗ )/N 2 may be thought to quantify the “additional variance” that appears in the
series due to the presence of change points. It is of note that this bias term does
not asymptotically vanish when the change point .k ∗ is proportional to N (i.e. under
Assumption 2.1.1) and .lim infN →∞ |ΔN | > 0. Using such an estimator in practice
will have the effect of reducing the power of change point detection methods.
There are several ways though to reduce the bias in such a variance estimator
due to change points. One may note that the bias term arises due to the fact that the
sample mean .X̄N used in defining .σ̂ 2 in (3.1.2) does not properly center the series
under .HA . If instead we center the series before and after a candidate change point
k using the running averages

1E E
k N
1
X̄k =
. Xi and X̃k = Xi , (3.1.6)
k N −k
i=1 i=k+1

then if the change point were to occur at k,


⎧ ⎫
1 E
k E
N
. (Xi − X̄k ) + 2
(Xi − X̃k ) 2
N
i=1 i=k+1

would estimate .σ 2 . The fact that this estimator as a function of k is expected to reach
its smallest value when .k = k ∗ suggests using the estimator
⎧ k ⎫
1 E E
N
2
.σ̃N = min (Xi − X̄k ) +
2
(Xi − X̃k ) .
2
(3.1.7)
N 1≤k<N
i=1 i=k+1

Theorem 3.1.3 We assume that the AMOC alternative in (1.1.3) holds. If Assump-
tions 1.2.2, 3.1.1 and 3.1.2 are satisfied and

. lim |ΔN | < ∞,


N →∞

then
P
σ̃N2 → σ 2 .
.
94 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Proof If .1 ≤ k ≤ k ∗ , then

E
k E
k
1E
k
. (Xi − X̄k )2 = (Ei − Ēk )2 with Ēk = Ei ,
k
i=1 i=1 i=1

and

E
N k∗ ⎛
E ⎞2
N − k∗
. (Xi − X̃k ) = Ei − Ẽk +
2
ΔN
N −k
i=k+1 i=k+1

E
N ⎛ ⎞2
k − k∗
+ Ei − Ẽk − ΔN

N −k
i=k +1

with

1 E
N
Ẽk =
. Ei .
N −k
i=k+1

Now

E
k E
k
. (Ei − Ēk )2 = Ei2 − k Ēk2 .
i=1 i=1

Under Assumption 1.2.2, we have by the law of the iterated logarithm that
⎛ ⎞2
1 E
k
. max k Ēk2 = max ∗ Ei = OP (log log N) .
1≤k≤k ∗ 1≤k≤k k 1/2
i=1

Expanding the square gives that

k∗ ⎛
E ⎞2
N − k∗
. Ei − Ẽk + ΔN
N −k
i=k+1
∗ ∗
E
k
(N − k ∗ )2 2 E
k
∗ ∗
= Ei2 + (k − k)Ẽk2 + (k − k) Δ N − 2Ẽ k Ei
(N − k)2
i=k+1 i=k+1

E k∗
N − k∗ N − k∗
+2 ΔN Ei − (k ∗ − k)Ẽk ΔN ,
N −k N −k
i=k+1
3.1 Estimation of Long–Run Variances and Covariance Matrices 95

and

E
N ⎛ ⎞2
k − k∗
. Ei − Ẽk − ΔN

N −k
i=k +1

E
N
(k ∗ − k)2 2 E
N
= Ei2 + (N − k ∗ )Ẽk2 + (N − k ∗ ) Δ N − 2Ẽ k Ei
(N − k)2
i=k ∗ +1 ∗
i=k +1

k − k∗ E
N
k − k∗
−2 ΔN Ei + 2(N − k ∗ )Ẽk ΔN .
N −k ∗
N −k
i=k +1

By Assumption 1.2.2 we get that


| |
| k∗ |
| E |
. max (k − k ∗ )Ẽk2 = OP (1), |
max ∗ |Ẽk Ei || = OP (1),
1≤k≤k ∗ 1≤k≤k |
i=k+1 |
| |
| k∗ |
| N − k∗ E |
max | ΔN Ei || = OP (|ΔN |N 1/2 ),
1≤k≤k ∗ || N − k
i=k+1 |
| |
| N − k∗ |
| ∗
max ∗ |(k − k )Ẽk ΔN || = OP (|ΔN |N 1/2 ),
1≤k≤k N −k

and
| |
| E
N |
| |
. max (N − k)Ẽk2 = OP (1), max ∗ |Ẽk Ei | = OP (1),
1≤k≤k ∗ 1≤k≤k | ∗
|
i=k +1
| |
k − k ∗ || E |
N
|
max ∗ | ΔN Ei | = OP (|ΔN |N 1/2 ),
1≤k≤k N − k | |
i=k ∗ +1
| |
|
∗ | k−k
∗ |
max ∗ (N − k ) |Ẽk ΔN || = OP (|ΔN |N 1/2 ).
1≤k≤k N −k

Hence we obtain for .1 ≤ k ≤ k ∗


⎧ ⎫
1 E
k E
N
.σ̃N (k) = (Xi − X̄k ) + (Xi − X̃k )2
2 2
N
i=1 i=k+1
┌ N ┐
1 E 2 (k ∗ − k)(N − k ∗ ) 2
= Ei + ΔN + Rk,1 (3.1.8)
N N −k
i=1
96 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

and

. max |Rk,1 | = OP (log log N + |ΔN |N 1/2 ).


1≤k≤k ∗

Similar arguments give for .k ∗ < k < N that


┌ N ┐
1 E k ∗ (k − k ∗ )
.σ̃N (k) = Ei2 + Δ2N + Rk,2 ,
2
N k
i=1

where

. max |Rk,2 | = OP (log log N + |ΔN |N 1/2 ). (3.1.9)


k ∗ ≤k≤N

The function .((k ∗ − k)(N − k ∗ )/(N − k)1{1 ≤ k ≤ k ∗ } + [k ∗ (k − k ∗ )/k]1{k <


k ∗ < N})Δ2N as a function of k achieves its smallest value at .k = k ∗ , and from this
the result follows from (3.1.8) and (3.1.9). ⨆

Minimizing the sum of the squares in the definition of .σ̃N2 actually gives an
estimator for the time of change that coincides with .k̂N (1/2) in Chap. 2 Sect. 2.2. A
similar idea then is to make use of an estimator .k̂ of .k ∗ in Assumption 2.1.1. Let
⎛ ⎞
1 Ek̂ E
N
.σ̄N =
2 ⎝ (Xi − X̄ )2 + (Xi − X̃k̂ )2 ⎠ , (3.1.10)
N k̂
i=1 i=k̂+1

where .X̄k and .X̃k are defined in (3.1.6). The estimators defined in Chap. 2 satisfy
that under the no change null hypothesis, .k̂/N = θ̂N converges to a non–degenerate
distribution, while under the AMOC alternative .k̂/N = θ̂N → θ in probability.
Under these conditions and with minor modifications of the proof of Theorem 3.1.3
we may show that

P
σ̄N2 → σ 2 .
.

The definition of .σ̃N2 and .σ̄N2 can be extended to the case when multiple change
points are present in the means of the observations. Here we consider the R change
point model of (2.3.1)

E
R+1
Xi =
. μj 1{kj∗−1 ≤ i < kj∗ } + Ei , i ∈ {1, . . . , N }, (3.1.11)
j =1

where .kj∗ , .j ∈ {1, . . . , R} denote change points in the mean, satisfying .1 = k0∗ <
k1∗ < · · · < kR∗ < kR+1
∗ = N + 1. Letting .1 < k1 < . . . < kS < N denote candidate
3.1 Estimation of Long–Run Variances and Covariance Matrices 97

values for these changes, we estimate .S + 1 mean values .μj , .j ∈ {1, . . . , S + 1}


with

1 E
ki
X̄ki =
. Xj , i ∈ {1, ..., S + 1}. (3.1.12)
ki − ki−1
j =ki−1

From this we may estimate the variance with

1 E E
S+1 ki
( )2
2
σ̃N,S
. = min Xj − X̄ki , (3.1.13)
1≤k1 <k2 <···<kS ≤N N
i=1 j =ki−1

where S is a user specified upper bound on the number of possible changes. This
estimator works well if N is large and S does not greatly overestimate the number
of changes. We can modify the estimator .σ̄N2 to be based instead on estimators for
the number and locations of the change points. Suppose we have estimates of the
change points in the mean .k̂1 < . . . < k̂R̂ , with .R̂ denoting the estimator for the
number of changes R. We define

R̂+1 k̂i ⎛ ⎞2
1 E E
2
σ̄N,
.

= Xj − X̄k̂i . (3.1.14)
N
i=1 j =k̂i−1

One can prove that if the estimators .k̂1 , . . . , k̂R̂ and .R̂ satisfy Assumption 2.3.3,
namely that for all .ε > 0 .limN →∞ P ({R̂ = R} ∩ {max1≤i≤R |k̂i − ki∗ | < εN }) = 1,
then
P
.σ̄ 2 → σ 2.
N,R̂

Each of these estimators can analogously be defined for vector valued obser-
vations taking value in .Rd . Consider a stationary vector-valued time series .{Xt ∈
Rd , t ∈ Z}. We have that the covariance of .N 1/2 times the sample mean can be
written as
⎛N ⎞⎛ N ⎞T
1 E E
. E (Xi − EX0 (Xi − EX0
N
i=1 i=1

E
N −1 ⎛ ⎞
|k|
= 1− E (X0 − EX0 ) (Xk − EXk )T
N
k=−(N −1)

E
N −1 ⎛ ⎞
|k|
= 1− γ k,
N
k=−(N −1)
98 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

where .γ k = E (X0 − EX0 ) (Xk − EXk )T is the autocovariance matrix of the series
at lag k. We then define the long–run covariance matrix as
⎛N ⎞⎛ N ⎞T
1 E E
. lim E (Xi − EX0 ) (Xi − EX0 ) = E.
N →∞ N
i=1 i=1

This limit is well defined if for instance



E
. ||γ k || < ∞.
k=1

First we consider again the case when the observations are uncorrelated, i.e.
Assumption 3.1.3 .{Xi , i ∈ Z} is an uncorrelated stationary sequence, .E||X0 ||2 <
∞ and .cov(X0 , Xk ) = 0 for all .k /= 0, where .0 denotes the .d × d zero matrix.
If Assumption 3.1.3 holds, then the sample covariance is the natural estimator
for .E. Let

1 E( 1 E
N N
)( )T
Ê N =
. Xi − X̄N Xi − X̄N with X̄N = Xi .
N −1 N
i=1 i=1

As a result of its definition .Ê N is a non negative definite matrix. It can be shown
as in (3.1.3) that under the null hypothesis and assuming the series .{Xi , i ∈ Z} is
ergodic,

Ê N → E
. a.s. (3.1.15)

In order to define an estimator for .E that is consistent not only under the null
hypothesis but also when the series contains change points in the mean, one can
easily modify the definition of .Ê N as we did in defining .σ̃N2 , σ̄N2 , σ̃N,R
2
∗ and .σ̄
2 .
N,R̂

3.1.2 Serially Correlated Observations

If the observations are serially correlated, then the long–run variance .σ 2 defined in
(3.1.1) may in principle depend on the entire autocovariance function .γE (l). It is
natural to estimate this function with the empirical autocovariance function based
3.1 Estimation of Long–Run Variances and Covariance Matrices 99

on the observed sample




⎪ 1 E
N−l

⎪ (Xi − X̄N )(Xi+l − X̄N ), if 0 ≤ l < N,
⎪ ll
⎨ N −l
i=1
.γ̂l = (3.1.16)

⎪ 1 EN

⎪ (Xi − X̄N )(Xi+l − X̄N ), if − N < l < 0.

⎩ N − |l|
i=−(l−1)

where .X̄N is the sample mean. Note that this estimator may not be used to
estimate .γE (l) for .l larger than .N − 1. A naïve estimator of .σ 2 is obtained by
replacing the unknown autocovariances in (3.1.1) with estimators, leading to

E
N −1
σ̂N2 (Naïve) =
. γ̂l . (3.1.17)
l=1−N

As in the observation that the long–run variance is proportional to the spectral


density of the error process at frequency zero, this estimator is proportional to
the periodogram of the observed series at frequency zero. It is straightforward to
establish that under the no change null hypothesis, .σ̂N2 (Naïve) is an asymptotically
unbiased estimator of .σ 2 , but is not consistent. The essential reason why this
estimator is not consistent is that although including more estimators .γ̂l will reduce
the long–run variance estimator’s bias, estimators .γ̂l for large lags .l are based on a
small number of pairs of observations .(Xi , Xi+l ), and have high variance.
In order to estimate .σ 2 based on the empirical autocovariance function, one
popular approach is to apply weights to the .γ̂l ’s in an attempt to balance the bias
and variance of the final estimator. One such class of estimators are the kernel–
bandwidth long–run variance estimators. A kernel–bandwidth long–run variance
estimator takes the form

E
N −1 ⎛ ⎞
l
2
.σ̂N = 2
σ̂N,LRV = K γ̂l , (3.1.18)
h
l=1−N

where h is referred to as the bandwidth (sometimes also the window, or smoothing


parameter), and K is the kernel. The following assumptions on h and K are often
used in order to establish the consistency of such an estimator.
Assumption 3.1.4 As .N → ∞, (i) .h = h(N) → ∞, (ii) .h/N → 0.
The kernel–bandwidth estimator .σ̂N2 is in general a biased estimator of .σ 2 .
Assumption 3.1.4(i) is needed in order for the bias of the estimator to asymptotically
vanish, while Assumption 3.1.4(ii) is needed in order for the variance of .σ̂N2 to
asymptotically vanish.
100 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Assumption 3.1.5
(i) .K(0) = 1,
(ii) .K(u) = K(−u),
(iii) there is .c > 0 such that .K(u) = 0, if .u /∈ [−c, c],
(iv) .sup−c<u<c |K(u)| < ∞,

(v) .K(u) is Lipschitz continuous.

Assumption 3.1.5 implies that the kernel function is a symmetric, Lipschitz


continuous function on the real line with compact support. Assumption 3.1.5(iii)
that K has compact support can be relaxed to .limu→∞ ua K(u) = 0 for some .a > 0,
but is satisfied by many oft–used kernels, and greatly simplifies the arguments used
to establish the asymptotic properties of .σ̂N2 . Some of the most popular kernels are
listed here:

1 for |u| ≤ 1;
.KTR (u) = (Truncated kernel)
0 otherwise.

1 − |u| for |u| ≤ 1;
KBT (u) = (Bartlett kernel)
0 otherwise.

⎨ 1 − 6u2 + 6|u|3 for 0 ≤ |u| ≤ 2 ;
1

KPR (u) = 2(1 − |u|) 3 for 2 < |u| ≤ 1;


1 (Parzen kernel)

0 otherwise.

(1 + cos(π u))/2 for |u| ≤ 1;
KTH (u) = (Tukey–Hanning kernel)
0 otherwise.
⎛ ⎞
25 sin(6π u/5)
KQS (u) = − cos(6π u/5) (Quadratic spectral kernel)
12π 2 u2 6π u/5

⎨ sin(π u)
(1 + cos(π u)) for |u| ≤ 1;
KDL (u) = πu (Daniell kernel)
⎩0 otherwise.

⎨ 1, if 0 ≤ |u| ≤ 1/2;
KFT (u) = 2(1 − |u|), if 1/2 < |u| ≤ 1; (Flat-top kernel)

0, if |u| > 1.

All the above kernels satisfy Assumption 3.1.5 with the exception of the
quadratic spectral kernel, which is supported on the entire real line.
In some consistency results and bandwidth selection methods it is often useful to
specify the polynomial degree of the kernel near the origin. A kernel function K is
said to be of order q if

1 − K(x)
0 < lim
. < ∞. (3.1.19)
x→0 |x|q
3.1 Estimation of Long–Run Variances and Covariance Matrices 101

When such a q exists that satisfies (3.1.19), the kernel is said to be of infinite
order. The Bartlett kernel is of order one, whereas the Parzen, Tukey–Hanning,
Quadratic spectral, and Daniell kernels are of order two. Kernels that are equal to
one in a neighborhood of the origin are often said to be of infinite order.
We now turn to studying the asymptotic consistency of kernel–bandwidth
estimators of the long–run variance of the form (3.1.18). While the asymptotic
results established in Chap. 1 typically only require a Gaussian approximation
result for the partial sum process of the observations, as in Assumption 1.1.1, such
conditions are not sufficient to establish the consistency of the long–run variance
estimators. We often require moment bounds for the partial sums of the errors and
their autocovariances. It is convenient and quite general to state such moment and
weak dependence requirements in terms of .Lν –decomposability as introduced in
Definition 1.1.1. For example, if .{Ej , j ∈ Z} is .Lν –decomposable for .ν ≥ 2,
∗ , as defined in Definition 1.1.1, is independent of .E . It follows then by the
then .El,l 0
Cauchy-Schwarz inequality
∗ ∗ 2 1/2
|γE (l)| = |cov(E0 , El )| = |cov(E0 , El − El,l
. )| ≤ (EE02 )1/2 (E|E0 − E0,l | ) .

∗ |2 )1/2 ≤
Hence when .{Ej , j ∈ Z} is .Lν decomposable for .ν ≥ 2, so that .(E|E0 −E0,l
al−α for constants .a > 0 and .α > 2, .γE (l) is an absolutely summable sequence,
and the long–run variance is well defined.
Theorem 3.1.4 If the no–change null hypothesis .H0 of (1.1.2) holds, Assump-
tions 3.1.4 and 3.1.5 are satisfied, and the model errors are .Lν –decomposable for
some .ν ≥ 4, then

P
.σ̂N2 → σ 2

where .σ̂N2 is defined in (3.1.18).


Proof Since if the bandwidth h satisfies Assumption 3.1.4 if and only if the
bandwidth bh satisfies Assumption 3.1.4 for any scalar .b > 0, we may assume
without loss of generality in Assumption 3.1.5(iii) that .c = 1. Note that under the
null hypothesis of no change in the mean,

cov(X0 , Xl ) = cov(E0 , Ek ) = γE (k).


.

By the definition of .γ̂l we have for .l ≥ 0


⎧N −l −l

1 E E
N E
N
.γ̂l = γE (l) + (Ei Ei+l − γE (l)) + ĒN − ĒN Ei − ĒN
2
Ei .
N −l
i=1 i=1 i=l+1

Since by Assumption 3.1.4 and the continuity and boundedness of K under


Assumption 3.1.5, .K (l/ h) γE (l) → γE (l) as .N → ∞ for each .l ∈ Z, and
102 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

|K (l/ h) γE (l)| ≤ supx∈R |K(x)||γE (l)|. Hence by the dominated convergence


.

theorem,

E
N −1 ⎛ ⎞
l
. K γE (l) → σ 2 .
h
l=−(N −1)

Also, since K is bounded and

ĒN = OP (N −1/2 ),
. (3.1.20)

we get that
| h |
| E ⎛ l ⎞| ⎛ ⎞
| | h
2
.ĒN | K | = OP = oP (1).
| h | N
l=−h

As a result of Theorem A.3.1 in the appendix, we get that


⎛ b ⎞2
E
.E Ei ≤ c1 (b − a) with some positive constant c1 , (3.1.21)
i=a

and therefore by the Cauchy–Schwarz inequality for any .0 ≤ l ≤ h, we have


|N −l |
|E |
| |
.E | Ei | ≤ c2 N 1/2 . (3.1.22)
| |
i=1

Since (3.1.22) can be established for .−h ≤ l < 0 as well, we conclude


| h |
|E ⎛ l ⎞ 1 NE−l |
| |
.E | K Ei | ≤ c3 hN −1/2 ,
| h N −l |
l=0 i=1

and therefore by Markov’s inequality


| h ⎛ ⎞ ⎛N −l ⎞| ⎛ ⎞
|E E E
N |
| l 1 | h
.| K Ei + E i | = OP .
| h N −l | N 1/2
l=0 i=1 i=l+1

Using now (3.1.20) we conclude


| h ⎛N −l ⎞|
|E ⎛ l ⎞ 1 E E
N | ⎛ ⎞
| | h
.ĒN | K Ei + E i | = OP = oP (1).
| h N −l | N
l=0 i=1 i=l+1
3.1 Estimation of Long–Run Variances and Covariance Matrices 103

Similar arguments give


| −1 ⎛ N ⎞|
| E ⎛l⎞⎛ 1 ⎞ E E
N+l | ⎛ ⎞
| | h
.ĒN | K Ei + Ei | = OP = oP (1).
| h N − |l| | N
l=−h i=−l1 i=1

Theorem 3.1.4 follows if we are able to prove that


⎛ ⎛ ⎞ N −l
⎞2
E
h
l 1 E
E
. K (Ei Ei+l − γE (l)) → 0. (3.1.23)
h N −l
l=−h i=1

It is enough to show that


⎛ ⎛ ⎞ N −l
⎞2
E
h
l 1 E
E
. K (Ei Ei+l − γE (l)) → 0, (3.1.24)
h N −l
l=0 i=1

since the same estimates can be used for .l < 0. We note that
⎛ h ⎞2
E ⎛ l ⎞ 1 NE −l
.E K (Ei Ei+l − γE (l)) (3.1.25)
h N −l
l=0 i=1

E
h E
h ⎛ ⎞ ⎛ '⎞
l l 1 1
= K K
h h N − l N − l'
l=0 l' =0

E
N −l'
−l NE
┌ ┐
× E (Ei Ei+l − γE (l))(Ej Ej +l' − γE (l)' ) .
i=1 j =1

If .1 ≤ i ≤ i + l, i ≤ j ≤ j + k, by stationarity we have
( )
.E (Ei Ei+l − γE (l)) Ej Ej +k − γE (k) (3.1.26)
( )
= E (E0 El − γE (l)) Ej −i Ej −i+k − γE (k) .

Now if .l < j , then


( )
E (E0 El − γE (l)) Ej −i Ej −i+k − γE (k) = E (E0 El − γE (l)) Ej −i Ej −i+k ,
.

and by the Cauchy–Schwarz inequality and .Lν –decomposability,


| | ⎛ ⎞1/2
| ∗ |
E |(E0 El − γE (l)) (Ej − Ej,j
. −l )E j +k | ≤ E (E0 E l − γ E (l)) 2

× (E(Ej − Ej∗−l )4 )1/4 (EEj4+k )1/4


≤ c4 (j − l + 1)−α
104 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

and
| |
| ∗ ∗ | −α
.E |(E0 El − γE (l)) Ej,j −l (E j +k − E j +k,j +k−l | ≤ c5 (j + k − l) .
)


Since .E(E0 El − γE (l))Ej,j ∗
−l Ej +k,j +k−l = 0, we get

E| ( )|
. |E (E0 El − γE (l)) Ej Ej +k − γE (k) |
A1

E ∞
h E
≤ c6 h (j − l + 1)−α ≤ c7 h, (3.1.27)
l=0 j >l

where .A1 = {0 ≤ k, l ≤ h, l ≤ j ≤ N }. On the set .A2 = {0 ≤ k ≤ h, 1 ≤ j < l}


| |
. |E(E0 El − γE (l))(Ej Ej +k − γk ) − EE0 El Ej Ej +k | ≤ c8 h.

If .(j, k, l) ∈ A2 , .Lν –decomposability and the Cauchy–Schwarz inequality yield

E|E0 Ej El (Ej +k − Ej∗+k,j +k−l )| ≤ c7 (j + k − l)−α


.

and

E|E0 Ej (El − El,l−j
. )Ej∗+k,j +k−l | ≤ c8 (l − j + 1)−α .


For all .(j, k, l) ∈ A2 , .EE0 Ej El,l−j Ej∗+k,j +k−l = E[E0 Ej ]E[El,l−j
∗ Ej∗+k,j +k−l ] =
∗ ∗
γj γj +k−l , since we can assume in the definitions of .El,l−j and .Ej +k,j +k−l that we
use the same .{ηn∗ , −∞ < n < l}. Hence
E| ( )|
. |E (E0 El − γE (l)) Ej Ej +k − γE (k) | ≤ c9 h. (3.1.28)
A2

Applying Assumption 3.1.4(ii), we get (3.1.24) from (3.1.25)–(3.1.28). ⨆



Theorem 3.1.4 requires only minor rate conditions on the bandwidth parameter.
One popular method to choose h in practice is so that it minimizes the mean squared
error:

h∗ = argminh E[σ̂N2 (h) − σ 2 ]2 .


. (3.1.29)

Perhaps unsurprisingly, the optimal bandwidth .h∗ depends in a rather complicated


way on the autocorrelations of the errors, as well as on the parameter .σ 2 that we
wish to estimate. Andrews (1991) and Andrews and Monahan (1992) show that for
finite order kernels, .h∗ satisfying (3.1.29) is approximately of the form .BN 1/(2q+1) ,
where q is the order of the kernel in (3.1.19), and B is a constant depending on
3.1 Estimation of Long–Run Variances and Covariance Matrices 105

the kernel as well as the autocorrelations of the errors that may be estimated. It
is recommended in Andrews (1991) that B be approximated using a parametric
model for the errors, for instance an autoregressive model. When an autoregressive
process of order one is used in this step, and the kernel is of order .q = 1, the optimal
bandwidth may be estimated with

4ρ̂ 2
ĥ = B ∗ [α̂(1)N]1/3 , where α̂(1) =
. . (3.1.30)
(1 − ρ̂ 2 )2

. B ∗ only depends on the kernel K, and .ρ̂ is an estimator of the autoregressive


coefficient. These methods are implemented in several R packages, including
Aschersleben and Wagner (2016) and Zeileis (2004).
To replace .σ with .σ̂N so that the limit results for normalized statistics in
Sect. 1.2.1 remain the same, we need to establish a rate of approximation in
Theorem 3.1.4. One can refine the proof of Theorem 3.1.4 to show that .σ̂N2 − σ 2 =
OP ((h/N )1/2 ). Optimal rates of convergence under .Lν –decomposability of such
estimators are given in Liu and Wu (2010).
As we showed above in the case of uncorrelated observations, in the presence
of change points in the mean the estimator .σ̂N2 is biased and tends to overestimate
2
.σ . This problem is even more pronounced for kernel–bandwidth long–run variance

estimators as in (3.1.18). It may be shown that under the AMOC model (1.1.1) and
Assumption 2.1.1 that

γ̂j − γj P
. → 1, as N → ∞, (3.1.31)
θ (1 − θ )Δ2N

where .ΔN = μ1 − μA , the difference between the means before and after the
change. It follows from (3.1.31), and Assumptions 3.1.4 and 3.1.5 that

P
σ̂N2 → ∞,
.

if .limN →∞ |ΔN | > 0, where .σ̂N2 is defined in (3.1.18). Hence if the possible change
is not taken into account, the power of the various tests introduced in Chap. 2 will
be reduced. One can show in fact that
/
σ̂N2 P
c
. → K(u)du.
hθ (1 − θ )Δ2N −c

In other words .σ̂N increases approximately linearly with .ΔN . As a result, when
CUSUM statistics are standardized by .σ̂N it can be more difficult to detect larger
changes when compared to detecting smaller ones. This issue is sometimes referred
to as the “non-monotonic power problem”.
The methods used to modify the sample variance in (3.1.7) and (3.1.10)–(3.1.14)
can also be applied in the setting of long–run variance estimation. Recall the
106 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

notation under (2.3.1) where .kj∗ , .j ∈ {1, . . . , R} denote change points in the
mean, satisfying .1 = k0∗ < k1∗ < · · · < kR∗ < kR+1 ∗ = N + 1. Letting
.1 < k1 < · · · < kS < N denote candidate values for these changes, we estimate

.S + 1 means values .μj , .j ∈ {1, . . . , S + 1} with

1 E
ki
.X̄ki = Xj , 1 ≤ i ≤ S + 1,
ki − ki−1
j =ki−1

k0 = 0 and .kS+1 = N, as before. We modify the estimators for .γE (l) by centering
.

the data using these mean estimates

Ēj = Xj − X̄ki ,
. if ki−1 < j ≤ ki . (3.1.32)

Then define

⎪ N −l

⎪ 1 E

⎪ Ēi Ēi+l , if 0 ≤ l < N,
⎨N −l
i=1
.γ̃l =

⎪ 1 E
N

⎪ Ēi Ēi+l , if − N < l < 0.

⎩ N − |l|
i=−(l−1)

The corresponding kernel–bandwidth estimator based on S changes at times .k1 <


. . . < kS is

E
N −1 ⎛ ⎞
l
σ̂N2 (k1 , . . . , kS ) =
. K γ̃l .
h
l=−(N −1)

The kernel estimator which allows up to S changes in the mean is defined by


2
σ̃N,S
. = min σ̂N2 (k1 , . . . , kS ),
1<k1 <···<kS ≤N

where S is user–selected upper bound for the number of changes. Similarly, using
1 < k̂1 < k̂ . . . < k̂R̂ < N , estimates and locations of the number of changes, we
.

define

σ̄ 2
. = σ̂N2 (k̂1 , k̂2 , . . . , k̂R̂ ). (3.1.33)
N,R̂

Along the lines of the proof of Theorem 3.1.3 one can show that

P
∗ → σ .
2 2
σ̃N,R
.
3.1 Estimation of Long–Run Variances and Covariance Matrices 107

If .R̂ and .θ̂i,N = k̂i /N are asymptotically consistent estimators for R and .θi , it may
also be verified that
P
2
σ̄N,
.

→ σ 2.

These estimators for the long–run variance can be extended to vector valued
observations. The empirical autocovariance matrices are defined as


⎪ 1 E
N−l

⎪ (Xi − X̄N )(Xi+l − X̄N )T , if 0 ≤ l < N,

⎨N −l
i=1
.γ̂ l = (3.1.34)

⎪ 1 E
N

⎪ (Xi − X̄N )(Xi+l − X̄N ) , T
if − N < l < 0,

⎩ N − |l|
i=−(l−1)

where, as before,

1 E
N
X̄N =
. Xi .
N
i=1

Now the long–run covariance matrix estimator is

E
N −1 ⎛ ⎞
l
Ê N =
. K γ̂ l . (3.1.35)
h
l=1−N

We now give a definition of .Lν –decomposability for vector-valued observations.


Definition 3.1.1 We say that the vector-valued sequence .{E i , i ∈ Z} is an .Lν -
decomposable Bernoulli shift , or simply .Lν −decomposable, if for some .ν ≥ 2,
(1) .{E i , ∈ Z} is a causal Bernoulli shift, which is to say that .E i = g(ηi , ηi−1 , . . .),
where .{ηi , i ∈ Z} are independent and identically distributed random variables
taking values in a measurable space .S, g is a (deterministic) measurable
function, .g : S ∞ → R, and
(2) .{E i , ∈ Z} satisfies the moment and weak dependence conditions .EE i = 0,
.E|E i | < ∞, and
ν

( || ||ν )1/ν
vm = E ||E i − E ∗i,m ||
. ≤ cm−α with some c > 0 and α > 2,

where .|| · || is the Euclidean norm in .Rd , .E ∗i,m = g(ηi , . . . , ηi−m+1 , ηi−m
∗ ,
∗ ∗
ηi−m−1 , . . .), where .{ηk , k ∈ Z} are independent, identically distributed copies
of .η0 , independent of .{ηj , j ∈ Z}.
108 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Theorem 3.1.5 If .H0 of (1.3.1) and Assumptions 3.1.4–3.1.5 hold, and .{Xi , i ∈ Z}
are .Lν –decomposable for .ν ≥ 4, then

P
Ê N → E
.

Proof We can repeat the proof Theorem 3.1.4 coordinate wise. ⨆



According to Theorem 3.1.5, the estimator .Ê N is asymptotically a positive
semidefinite matrix. However, there is no guarantee that it is positive semidefinite
for any fixed N. Newey and West (1987) replaces .K(l/ h) with alternate weights
.KN,l resulting in a positive semidefinite estimator for .E for all N.

Similarly as in the univariate case, one can show that if a change occurs in the
multivariate mean, then .Ê N has a bias that is diverging as a function of the sample
size. One can show under the AMOC model where the time of change .k ∗ = LNθ ⎦
as in Assumption 2.1.1 that
/ c
Ê N P
. → θ (1 − θ )(μ1 − μA )(μ1 − μA )T K(u)du, (3.1.36)
h −c

where .μ1 , .μA are the mean vectors before and after the change. Hence

−1 P
.||Ê N || → 0.

−1 −1
The result in (3.1.36) provides the exact rate for .Ê N . Namely, .Ê N is asymptoti-
cally of the order .1/ h. Since functionals of .QN (·) converge in probability to infinity
faster than h, one can prove that the statistics in (1.3.8) and Theorem 1.3.1 remain
consistent under the AMOC model with .E replaced with its estimator .Ê N .
According to (3.1.36), the largest empirical eigenvalue of .Ê N is tending to infinity
with the sample size at the same rate as h. Hence all statistics in Theorems 1.3.5
and 1.3.6 are also consistent under the AMOC model.
We can modify the definition of .Ê N as in the scalar case to account for S possible
changes in the mean. If .1 < k1 < . . . < kS < N denote the possible times of
changes in the mean vector, we define

1 E
ki
.X̄ki = Xj , i ∈ {1, ..., S + 1},
ki − ki−1
j =ki−1 +1

where .k0 = 0 and .kS+1 = N, as before. The estimator for the covariance matrix of
order lag .l is defined through

Ē j = Xj − X̄ki ,
. if ki−1 < j ≤ ki .
3.1 Estimation of Long–Run Variances and Covariance Matrices 109

Let

⎪ N −l

⎪ 1 E

⎪ Ē i Ē T
i+l , if 0 ≤ l < N,
⎨N −l
i=1
γ̃ l =
.

⎪ 1 E
N

⎪ Ē i Ē T

⎩ N − |l| i+l , if − N < l < 0,
i=−(l−1)

which is implicitly a function of the candidate change points .1 < k1 < k2 < . . . <
kS < N. Now we define the estimator for .E from the centered observations with

E
N −1 ⎛ ⎞
l
Ê N (k1 , . . . , kS ) =
. K γ̃ l .
h
l=1−N

2 , we define
Similarly to .σ̃N,S

Ẽ N,S = Ê N (k̄1 , . . . , k̄S ),


.

where

||Ê N,S (k̄1 , . . . , k̄S )|| =


. min ||Ê N (k1 , . . . , kS )||2 ,
1≤k1 <k2 <...<kS

and S is a user specified upper bound for the number of changes. As in defining
σ̄ 2 we may define
.
N,R̂

.Ē N,R̂ = Ê N (k̂1 , . . . , k̂R̂ ), (3.1.37)

where .k̂1 , . . . , k̂R̂ are the estimated times of the changes. It can be shown if the true
number of changes is less than .R ∗ , then

P
Ẽ N,R ∗ → E.
.

Also, if the estimators for the number and times of changes are consistent, then

P
Ē N,R̂ → E.
.

3.1.3 Ratio–Type and Self–Normalized Statistics

It is known that in some cases the estimators of .σ 2 of (3.1.1) converge to the true
long–run variance parameter slowly, and moreover are often sensitive to the choice
110 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

of the bandwidth parameter .h = h(N). One option to avoid the estimation of .σ 2


entirely is to consider what are sometimes referred to as “ratio–type” or “self–
normalized” statistics. Let
(1)
Tk,10
TN,10 = max
.
(2)
,
1≤k<N Tk,10

where
| k |
|E k E ||
N
(1) |
.T =| Xl − Xl |
k,10 | N |
l=1 l=1

and
| i | | N |
|E i E ||
k |E N −i E
N |
(2) | | |
.T = max | Xl − Xl | + max | Xl − Xl | .
k,10 1≤i≤k | k | k+1≤i<N | N −k |
l=1 l=1 l=i+1 l=k+1

Similarly we define

E
N −1 (1)
Tk,11
TN,11 =
.
(2)
,
k=1 Tk,11

where
⎛ k ⎞2
(1)
E k E
N
.T
k,11 = Xl − Xl
N
l=1 l=1

and
⎛ i ⎞2 −1
⎛ ⎞2
(2)
E
k E i E
k E
N E
N
N −i E
N
.T = Xl − Xl + Xl − Xl .
k,11
k N −k
i=1 l=1 l=1 i=k+1 l=i+1 l=k+1

We note that .TN,10 and .TN,11 do not depend on the unknown mean under the
null hypothesis. It follows from Assumption 1.1.1 that under the no change null
hypothesis

LN
E t⎦
D [0,1]
N −1/2
. (Xl − μ) −→ σ W (t), (3.1.38)
l=1

where .{W (t), 0 ≤ t ≤ 1} denotes a Wiener process. Using (3.1.38) we get that
{ } D2 [0,1] { }
. N −1/2 TLN
(1)
t⎦,10 , N −1/2 (2)
T LN t⎦,10 , 0 ≤ t ≤ 1 −→ σ L (1)
t,10 , σ L (2)
t,10 , 0 ≤ t ≤ 1 ,
3.1 Estimation of Long–Run Variances and Covariance Matrices 111

where
(1)
Lt,10 = |W (t) − tW (1)|,
.

and
| | | |
| | | 1−u |
.L
(2) | | |
= sup |W (u) − uW (t)|+ sup |W (1) − W (u) − (W (1) − W (t))|| .
t,10
0<u≤t t≤u<1 1 − t

Hence, again under Assumption 1.1.1 and the null hypothesis we get that

D L(1)
t,10
TN,10 → sup
.
(2)
. (3.1.39)
0<t<1 Lt,10

To prove that .TN,10 is asymptotically consistent under the one change in the mean
alternative of (1.1.2), we note that

Tk(1)
∗ ,10
TN,10 ≥
.
(2)
.
Tk ∗ ,10

We recall that under Assumption 2.1.1, .k ∗ = LNθ ⎦ and therefore

1 P
. T (1)
∗ → θ (1 − θ ),
N |ΔN | k ,10
1/2

where .ΔN is the size of the change. Hence

(1)
1 Tk ∗ ,10 D θ (1 − θ )
. → .
N |ΔN | T (2)
1/2 (2)
σ Tθ,10
k ∗ ,10

Using again (3.1.38) we obtain that


⎧ ⎫ 2 ⎧ ⎫
1 (1) 1 (2) D [0,1] (1) (2)
. TLN t⎦,11 , TLN t⎦,11 , 0 ≤ t ≤ 1 −→ σ 2 Lt,11 , σ 2 Lt,11 , 0 ≤ t ≤ 1 ,
N N

where
/ t
(1)
.L
t,11 = (W (u) − uW (1))2 du
0
112 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

and
/ t⎛ ⎞2
(2) u
.L
t,11 = W (u) − W (1) du
0 t
/ 1⎛ ⎞2
1−u
+ W (1) − W (u) − (W (1) − W (t)) du.
t 1−t

If .H0 and Assumption 1.1.1 hold, then

/ 1 (1)
1 D Lt,11
. TN,11 → (2)
dt. (3.1.40)
N 0 Lt,11

It can be also shown that under (1.1.2)

1 P
. TN,11 → ∞,
N

if .θN (1 − θN )N 1/2 |ΔN | → ∞, where .θN = k ∗ /N. As a result of the above, an


asymptotically consistent test of .H0 versus .HA of size .α in the AMOC change
model is to reject .H0 when .TN,10 and .TN,11 /N exceed the .1 − α quantiles of the
distributions on the right hand side of (3.1.39) and (3.1.40), respectively.

3.2 Changes in Variances and Covariances

In Chaps. 1 and 2 we studied a variety of methods to perform inference for change


points in the mean of a sequence of observations. Given the simple observation that
higher order moments are defined as the mean of the sequence suitably transformed,
these methods can be readily extended to higher order moments. In this section we
develop some techniques for performing change point analysis for the variance of
the observations. To begin, we assume the sequence of observations .X1 , . . . , XN
has a stable mean .μ, and consider the AMOC in the variance model:

μ + σ0 Ei , if 1 ≤ i ≤ k ∗ ,
Xi =
. (3.2.1)
μ + σA Ei , if k ∗ + i ≤ i ≤ N.

Under the null hypothesis, there is no change in the variance, so that

H0 : σ02 = σA2
. (3.2.2)

whereas under the alternative

HA : σ02 /= σA2 .
. (3.2.3)
3.2 Changes in Variances and Covariances 113

In order that the parameters .μ, .σ0 and .σA are uniquely identified, we make the
following assumption.

Assumption 3.2.1 .EEi = 0 and .EEi2 = 1.


When model (3.2.1) holds, it is natural to estimate the unknown mean .μ simply by
the sample mean,

1 E
N
. X̂N = Xi .
N
i=1

Further, if .H0 holds the average of the centered and squared observations

. σ̂i2 = (Xi − X̂N )2

is the natural estimator for the variance parameter .σ02 . The CUSUM process for the
variance is
⎛ ⎞
LN
E t⎦ EN
−1/2 ⎝ LNt⎦
.ZN (t) = N σ̂i2 − σ̂i2 ⎠ .
N
i=1 i=1

The long–run variance of the sequence .{Xi2 , i ∈ Z} is defined by



E
τ2 =
. cov(X02 , Xl2 ). (3.2.4)
l=−∞

Theorem 3.2.1 We assume that .H0 of (3.2.1) is satisfied along with Assump-
tions 1.2.1 and 3.2.1, and that the series .{Ei i ∈ Z} in (3.2.1) is .Lν –decomposable
for some .ν ≥ 4.
(i) If .I (w, c) < ∞ for some .c > 0, where .I (w, c) is defined in (1.2.4), then

1 |ZN (N t/(N + 1))| D |B(t)|


. sup → sup ,
τ 0<t<1 w(t) 0<t<1 w(t)

where .{B(t), 0 ≤ t ≤ 1} denotes a Brownian bridge.


(ii) Also,
⎧ ⎛ ⎞1/2 ||Ek E
N
|
|
1 N | k |
. lim P a(log N) max | σ̂i2 − σ̂i2 |
N→∞ τ 1≤k<N (k(N − k)) | N |
i=1 i=1

≤ x + b(log N) = exp(−2e−x )

for all .x ∈ R.
114 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2 we have

κ−1/2 1 |ZN (t)| D


kN
. → a(κ),
τ [t (1 − t)] κ

where .kN and .a(κ) are defined in (1.2.27) and (1.2.29).


Proof Since

σ̂i2 = (Xi − μ)2 − 2(Xi − μ)(X̂N − μ) + (X̂N − μ)2 ,


.

we get that

E
k
k E 2 E
N k
k E
N
. σ̂i2 − σ̂i = (Xi − μ)2 − (Xi − μ)2 − 2Rk , (3.2.5)
N N
i=1 i=1 i=1 i=1

where
┌ k ┐
E k E
N
Rk =
. Xi − Xi (X̂N − μ). (3.2.6)
N
i=1 i=1

It follows under .Lν –decomposability that


| |
| |
. |X̂N − μ| = OP (N −1/2 ). (3.2.7)

As a result of Theorem 1.2.5,


| k |
|E k E ||
N
|
| Xi − Xi |
| N |
i=1 i=1
. max = OP (1),
1≤k≤N N 1/2 w(k/N )

which when combined with (3.2.7) imply that the remainder term .Rk does not
influence the limit distribution of functionals of the weighted CUSUM of .σ̂i2 ’s.
Since .Xi being .Lν –decomposable implies that .(Xi − μ)2 is .Lν/2 –decomposable,
the result follows from Theorem 1.2.5 of Chap. 1. The proof of this result is omitted
since it follows from Theorem 2.1.1. ⨆

We may also obtain similar results to those in Sects. 2.1 and 2.2 regarding the
asymptotic properties of the CUSUM process for the variance under .HA in (3.2.3).
As in Theorem 2.2.1, we denote .k ∗ /N = θN and let .σA2 = σA,N 2 depend on the
sample size in order to study local alternatives where .σA,N converge to .σ02 , and
2

.θN → 0 or .θN → 1.
3.2 Changes in Variances and Covariances 115

Theorem 3.2.2 We assume that .HA of (3.2.3) is satisfied along with Assump-
tions 1.2.1 and 3.2.1, and that the series .{Ei i ∈ Z} in (3.2.1) is .Lν –decomposable
for some .ν ≥ 4.
(i) If .0 ≤ κ < 1/2, then

|ZN (N t/(N + 1))| P


. sup → ∞
0<t<1 [t (1 − t)]κ

holds if and only if

[θN (1 − θN )]1−κ N 1/2 |σ02 − σA2 | → ∞.


. (3.2.8)

(ii) Also,

|ZN (N t/(N + 1)| P


(log log N)−1/2 sup
. → ∞ (3.2.9)
0<t<1 [t (1 − t)]κ

holds if and only if

[θN (1 − θN )]1−κ N 1/2 (log log N)−1/2 |σ02 − σA2 | → ∞.


.

An estimator for the time of change in the variance in the AMOC in the variance
model is defined as
⎧⎛ ⎞κ ||E |⎫
k E 2 ||
k N
N |
.k̂N = k̂N (κ) = sargmax σ̂i −
2
| σ̂i | , (3.2.10)
k∈{1,...,N } k(N − k) | N |
i=1 i=1

where .0 ≤ κ ≤ 1/2. The following result is analogous to Theorem 2.2.1. Its proof
follows using the decomposition of the variance CUSUM process in (3.2.5), and
with minor modifications of Theorem 2.2.1.
Theorem 3.2.3 We assume that .HA of (3.2.3) is satisfied along with Assump-
tions 1.2.1 and 3.2.1, and that the series .{Ei i ∈ Z} in (3.2.1) is .Lν –decomposable
for some .ν ≥ 4.
(i) If .0 ≤ κ < 1/2,

.|σ02 − σA2 | → 0 and N(σ02 − σA2 )2 → ∞,

then

N(σ02 − σA2 )2 D
. (θ̂N − θ ) → ξ(κ).
τ2
116 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

(ii) If .κ = 1/2

N
|σ02 − σA2 | → 0 and
. (σ 2 − σA2 )2 → ∞,
(log log N)1/2 0

then

N(σ02 − σA2 )2 D
. (θ̂N − θ ) → ξ(1/2).
τ2

where .τ 2 , .mκ (t) and .ξ(κ) are defined in (3.2.4), (2.2.1) and (2.2.3).
Remark 3.2.1 Once again the condition that .ΔN = |σ02 − σA2 | → 0 gives that
the limiting distribution of .k̂N does depend on the full joint distribution of the
2
.Ei ’s, but instead only depends on the long–run variance parameter .τ . If instead

.limN →∞ |ΔN | = Δ, 0 < Δ ≤ ∞, it may be shown that

|k̂N − k ∗ | = OP (1).
.

If .0 < c < ∞, then, as in Theorem 2.2.2, .k̂N − k ∗ has an asymptotic distribution


which is the location of the maximum of a stochastic process indexed by the integers
that takes the form of a drift plus noise. The noise term is comprised of partial sums
of the centered, squared innovations .Ei2 −1. We also note that if .limN →∞ |ΔN | = ∞,
then it may be shown that .limN →∞ P {k̂N = k ∗ } = 1.

3.2.1 Simultaneous Changes in the Mean and Variance

Another popular model allows for changes in the mean and variance to occur at the
same time:

μ0 + σ0 Ei , if 1 ≤ i ≤ k ∗ ,
.Xi = (3.2.11)
μA + σA Ei , if k ∗ + 1 ≤ i ≤ N.

Here we take Assumption 3.2.1 as granted, so that .μ0 , μA , σ0 , and .σA are unknown
parameters representing the means and variances before and after the change
point .k ∗ . The null hypothesis of interest is the stability of the mean and variance
parameters:

H0 : μ0 = μA and σ0 = σA .
. (3.2.12)

We wish to test this against the alternative

HA : μ0 /= μA and/or σ0 /= σA .
. (3.2.13)
3.2 Changes in Variances and Covariances 117

None of the tests discussed up to this point are “universally consistent” against the
alternative of (3.2.13). A natural idea is to combine the outcomes of two tests; one
for the stability for the mean, and another for the stability of the variance. Deriving
joint asymptotics for such tests is exceedingly complicated. An alternate approach is
to test for the change in the mean, and then based on the result of that test perform a
subsequent test for a change in the variance. Our results concerning change point
tests for the mean have assumed so far that the model errors form a stationary
sequence. This assumption may appear to be frequently violated, and evidently does
not hold under model (3.2.13). This motivates studying the behaviour of CUSUM
based tests when the model errors are generally heteroscedastic, which we take up
in Sect. 3.3 below.
Before turning to that situation though, we discuss for a moment the asymptotic
properties of the CUSUM process for the variance when the mean is estimated using
segmentation based on a consistent change point estimator. Such an estimator under
heteroscedasticity of the error process is introduced in Sect. 3.3. Let .k̂ denote an
estimator for the time of change in the mean in model 3.2.11. We segment the
observations into two sub–samples .{X1 , X2 , . . . , Xk̂ } and .{Xk̂+1 , Xk̂+2 , . . . , XN },
and compute the corresponding sample means:

1E E
k̂ N
1
X̂k̂,1 =
. Xi and X̂k̂,2 = Xi .
k̂ i=1 N − k̂
i=k̂+1

Next we center the observations based on .X̂k̂,1 and .X̂k̂,2 :



(Xi − X̂k̂,1 )2 , if 1 ≤ i ≤ k̂,
.σ̃i
2
=
(Xi − X̂k̂,2 )2 , if k̂ + 1 ≤ i ≤ N.

The CUSUM process of the .σ̃i2 ’s is


⎛ ⎞
LN
E t⎦ E
N
LNt⎦
Z̃N (t) = N −1/2 ⎝
. σ̃i2 − σ̃i2 ⎠ .
N
i=1 i=1

Assume for the moment that .k ∗ < k̂. This means that .k̂ − k ∗ observations are
incorrectly centered. If .k ∗ < k ≤ k̂, then using

(Xi − X̂k̂,1 )2 = (Xi − μA )2 + 2(μA − X̂k̂,1 )(Xi − μA ) + (μA − X̂k̂,1 )2


.
118 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

we get

E
k̂ E
k̂ E

. (Xi − X̂k̂,1 )2 = (Xi − μA )2 + 2(μA − X̂k̂,1 ) (Xi − μA )
i=k ∗ +1 i=k ∗ +1 i=k ∗ +1

E

+ (μA − X̂k̂,1 )2 .
i=k ∗ +1

In general when .|k̂ − k ∗ | = oP (N ), it follows that


| |
| |
. |μA − X̂k̂,1 | = OP (1),

and
| |
| k̂ |
| E |
.| − |
A | = oP (N
1/2
| (Xi μ ) ).
|i=k ∗ +1 |

Using now Theorem 3.2.3 (cf. also Remark 3.2.1) we obtain that

E
k̂ ⎛ ⎞
. (μA − X̂k̂,1 )2 = OP (k̂ − k ∗ )(μ0 − μA )2 .
i=k ∗ +1

These arguments can be adjusted to give the same approximations for .k̂ < k ∗ . Hence
| |
| |
. sup |Z̃N (t) − ZN (t)| = oP (1),
0<t<1

so the estimation of .k ∗ does not change the asymptotic properties of the variance
CUSUM processes. For example, such arguments can be used to establish Theo-
rem 3.2.1 when the process .ZN (t) is replaced with .Z̃N (t). The same reasoning can
be applied when more than one mean change point estimator is used to segment the
mean. We showed in this section how methods to perform change point analysis for
the mean can be modified to detect and estimate changes in the variance. We only
considered AMOC in the variance models, although these methods may be extended
to cover multiple points in the variance models.
3.2 Changes in Variances and Covariances 119

3.2.2 Changes in the Covariance Matrix of Vector Valued


Observations

For vector valued observations .X1 , . . . , XN ∈ Rd , it is often of interest to evaluate


whether or not their second order structure is homogeneous over the observation
period. We consider the multivariate analogue to the model (3.2.1),

μ + E 0 E i , if 1 ≤ i ≤ k ∗ ,
Xi =
. (3.2.14)
μ + E A E i , if k ∗ + 1 ≤ i ≤ N,

where .EE i = 0 and .EE i E Ti = I. This model allows for AMOC in the covariance
matrix of the vectors, from .E 0 to .E A , at time .k ∗ . We may phrase detecting a change
point under model (3.2.14) as a hypothesis test of .H0 : E 0 = E A , versus HA :
E 0 /= E A .
To construct a test statistic for distinguishing between .H0 and .HA , we let .vech(·)
be the operator that stacks the columns on and below the diagonal of a symmetric
.d × d matrix as a vector in .R , where .d = d(d + 1)/2.
d

When .H0 holds and .μ = 0 in (3.2.14), the expected values of the .d dimensional
vectors .vech(Xj XTj ) are constant for .j ∈ {1, . . . , N }. Consequently, a vector valued
CUSUM process as in Sect. 1.3 can be constructed as
⎛ ⎞
LN t⎦
1 ⎝E ┌ ┐ LNt⎦ EN ┌ ┐
.ZN (t) = √ vech Xj XT
j − vech Xj XT
j
⎠ , 0 ≤ t ≤ 1.
N j =1 N
j =1

When .μ is unknown, we replace .Xj with .X̃j = Xj − X with .X =


Σ
(1/N) N j =1 Xj , and consider the mean corrected modification of the CUSUM
process
⎛ ⎞
LN t⎦
1 ⎝E ┌ ┐ LNt⎦ EN ┌ ┐
.Z̃N (t) = √ vech X̃j X̃T
j − vech X̃j X̃T
j
⎠ , 0 ≤ t ≤ 1.
N j =1 N
j =1

Under .H0 and when the errors .{E i , i ∈ Z} have suitably decaying autocovariance
matricies, for instance if they are .Lν –decomposable for some .ν > 4, it may be
shown the long–run covariance
E ⎛ ┌ ┐ ┌ ┐⎞
E=
. Cov vech X0 XT T
0 , vech Xj Xj
j ∈Z
120 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

converges (coordinatewise) absolutely. A consistent estimator .Ê of .E may be


obtained using (3.1.35) based on the observations .Yj = vech(Xj XT j ). The
following two test statistics can be used to test .H0 against .HA . Define
/ 1
ΛN = max Z̃NT (t)Ê −1 Z̃N (t)
. and ΩN = Z̃NT (t)Ê −1 Z̃N (t)dt.
0≤t≤1 0

We may obtain the following result by applying the results in Sect. 1.3, in
particular (1.3.8). Note that under .H0 and when the errors .{E i , i ∈ Z} are .Lν –
decomposable in .Rd for some .ν > 4, then the vectors .Yj = vech(Xj XT j ) in .R are
d

.L
ν/2 –decomposable in .Rd .

Theorem 3.2.4 Suppose .H0 holds and that the errors in model (3.2.14), .{E i , i ∈ Z}
are .Lν –decomposable for some .ν > 4. If .E is non-singular, and .||E − Ê|| = oP (1),
then

D E
d
ΛN −→ sup
. Bl2 (t), as N → ∞,
0≤t≤1 l=1

and
d /
E 1
D
ΩN −→
. Bl2 (t)dt as N → ∞.
l=1 0

The exact distribution of the limiting random variables in Theorem 3.2.4 are
computed in Kiefer (1959).
The location of the change point in (3.2.14) may be estimated with

k̂N = sargmax Z̃NT (k/N)Ê −1 Z̃N (k/N).


. (3.2.15)
1≤k≤N

3.3 Heteroscedastic Errors

In studying CUSUM–based methods to perform change point analysis for the mean
in Chap. 2, we assumed that the model errors in (1.1.1) were strictly stationary.
In many cases though we are interested in conducting change point analysis for
the mean of a series that appears to display other deviations from stationarity,
for instance changes in the variance. In this subsection we study the asymptotic
behaviour of CUSUM–based tests for changes in the mean for series exhibiting
heteroscedasticity.
3.3 Heteroscedastic Errors 121

Specifically we consider the model in (1.1.1), but with errors that do not
necessarily have homogenous variance:

Assumption 3.3.1 .Ei = a(i/N)ei , i ∈ {1, ..., N}, with .Eei = 0, and .Eei2 = 1.
The function a we take to satisfy the following assumption.

Assumption 3.3.2 .a(t), 0 ≤ t ≤ 1 has bounded variation on .[0, 1].


For the definition and properties of functions with bounded variation we refer to
Hewitt and Stromberg (1969). One of the most important results concerning them
is the Jordan decomposition theorem, which states that a function has bounded
variation if and only if it is a difference of two non-decreasing functions.
Some examples of functions a that fit within this framework are:
(i) (abrupt changes in the variances)

E
R+1
a(t) =
. rl 1{θl−1 < t ≤ θl },
l=1

with .0 = θ0 < θ1 < θ2 < . . . < θR < θR+1 = 1 and .r1 /= r2 /= . . . /= rR+1 . In
this model the variance jumps from .rl2 to .rl+1
2 at time .LNθl ⎦.
(ii) (polynomially evolving variances) in this case .a 2 (x) is a non negative polyno-
mial, which includes linearly and quadratically changing variances.
We assume that the innovation sequence .{ei , i ∈ Z}’s satisfies the functional central
limit theorem:
Assumption 3.3.3 There is .σ > 0 such that

LN
E t⎦
D [0,1]
N −1/2
. ei −→ σ W (t),
i=1

where .{W (t), 0 ≤ t ≤ 1} is a Wiener process.


This would hold for example if the series .{ei , i ∈ Z} were .Lν –decomposable. We
recall the definition of the CUSUM processes .{ZN (t), 0 ≤ t ≤ 1} and .{QN (t), 0 ≤
t ≤ 1} from (1.1.7) and (1.2.1).
Theorem 3.3.1 If Assumptions 3.3.1–3.3.3 are satisfied, then

LN
E t⎦
D [0,1]
N −1/2
. Ei −→ W (b(t)),
i=1
122 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

where .{W (t), t ≥ 0} is a Wiener process and


/ t
b(t) = σ 2
. a 2 (u)du. (3.3.1)
0

Proof Let

E
k
S(k) =
. ei and S(0) = 0.
i=1

By Abel’s summation formula, we have

E
k E
k
. Ei = a(i/N)ei = a(k/N )S(k)
i=1 i=1

E
k−1
− S(i) (a((i + 1)/N) − a(i/N)) , 1 ≤ k ≤ N.
i=1

By the Skorokhod–Dudley–Wichura representation theorem (see Shorack and


Wellner (1986), p. 47), Assumption 3.3.3 implies that may define Wiener processes
.{WN (t), t ≥ 0} such that

. max |S(k) − σ WN (k)| = oP (N 1/2 ).


1≤k≤N

Now Assumption 3.3.2 yields


| k ⎛ ⎞|
|E E
k−1 |
| |
. max | Ei − σ a(k/N)WN (k) − WN (l)(a((l + 1)/N) − a(l/N)) |
1≤k≤N | |
l=1 l=1

≤ max |a(k/N )(S(k) − σ WN (k))|


1≤k≤N
|k−1 |
|E |
| |
+ max | (S(l) − σ WN (l))(a((l + 1)/N) − a(l/N))|
1≤k≤N | |
l=1

E
N −1
= oP (N 1/2 ) sup |a(t)| + oP (N 1/2 ) |a((l + 1)/N) − a(l/N)|
0≤t≤1 l=1

= oP (N 1/2 ).
3.3 Heteroscedastic Errors 123

By the Jordan decomposition theorem (see Hewitt and Stromberg, 1969, p. 266),
there are two non-decreasing functions such that .a(t) = a1 (t) − a2 (t). Focusing on
the function .a1 (t) we have

E
k−1
. WN (l)(a1 ((l + 1)/N) − a1 (l/N))
l=1

E
k−1 / l+1
= WN (l) da1 (x/N)
l=1 l

/ k k−1 /
E l+1
= WN (x)da1 (x/N) + (WN (l) − WN (x))da1 (x/N).
0 l=1 l

By the modulus of continuity of the Wiener process (see Appendix A.2) we have
that

. sup sup |WN (x + u) − WN (u)| = OP ((log N)1/2 ).


0≤u≤N 0≤x≤1

Integration by parts gives


/ k / k
WN (k)a1 (k/N) −
. WN (x)da1 (x/N) = a1 (x/N)dWN (x),
0 0

and therefore
| / k |
| E
k−1 |
| |
. |a1 (k/N)WN (k)− WN (l)(a1 ((l + 1)/N)−a1 (l/N))− a1 (x/N)dWN (x)|
| 0 |
l=1

= OP ((log N ) 1/2
).

Similarly,
| / k |
| E
k−1 |
| |
. |a2 (k/N)WN (k)− WN (l)(a2 ((l + 1)/N)−a2 (l/N))− a2 (x/N)dWN (x)|
| 0 |
l=1

= OP ((log N)1/2 ),

resulting in
| k / k |
|E |
| |
. max | El − σ a(x/N)dWN (x)| = oP (N 1/2 ).
1≤k≤N | 0 |
l=1
124 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Let
/ t
UN (t) =
. a(x/N)dWN (x), 0 ≤ t ≤ N.
0

Checking the covariance functions, one can verify that


⎧ ⎛/ t ⎞ ⎫
D
. {UN (t), 0 ≤ t ≤ N} = W a 2 (x/N)dx , 0 ≤ t ≤ N ,
0

where .{W (x), x ≥ 0} is a Wiener process. Next we note that


| ⎛/ ⎞ ⎛/ ⎞|
| k k+s |
. max sup ||W a 2 (x/N)dx − W a 2 (x/N)dx ||
0≤k≤N −1 0≤s≤1 0 0

= OP ((log N) 1/2
),

since
/ k+s
. max sup a 2 (x/N)dx ≤ 4(a12 (1) + a22 (1)).
0≤k≤N −1 0≤s≤1 k

Thus we conclude that


| |
| / Nt |
| −1/2 LN
E t⎦
σ |
. sup N| El − 1/2 a(x/N)dWN (x)|| = oP (1).
|
0≤t≤1 | l=1
N 0 |

Computing the covariance functions again we conclude


⎧ / Nt ⎫
σ D
. a(x/N)dWN (x), 0 ≤ t ≤ 1 = {W (b(t)) , 0 ≤ t ≤ 1} ,
N 1/2 0

completing the proof. ⨆



Corollary 3.3.1 If .H0 of (1.1.1) and Assumptions 3.3.1–3.3.3 are satisfied, then

D [0,1]
ZN (t) −→ σ ┌(t) with ┌(t) = W (b(t)) − tW (b(1)),
.

where .{W (t), 0 ≤ t ≤ 1} is a Wiener process and .b(t) is defined in (3.3.1).


Proof The continuous mapping theorem and Theorem 3.3.1 imply the result. ⨆

It follows from elementary calculation that .E┌(t) = 0 and

C(t, s) = E┌(t)┌(s) = b(min(t, s)) − tb(s) − sb(t) + tsb(1).


. (3.3.2)
3.3 Heteroscedastic Errors 125

The covariance function .C(t, s) evidently differs from the covariance function
of the Brownian bridge so long as b is not equal to a constant multiple of
the identity function, which upon inspecting (3.3.1) occurs if a is non-constant.
The approximation of the distribution function of .sup0<t<1 |┌(t)|, which requires
estimating .C, is a difficult problem that we will turn to momentarily. Before doing
so we note that approximations as in Corollary 3.3.1 can also be established for
weighted functionals of .ZN . As we have seen in Sect. 1.2, in this case we need a
rate of approximation of the partial sums of the .ei ’s by a Gaussian process.
Assumption 3.3.4 For each N there are two independent Wiener processes
{WN,1 (t), 0 ≤ t ≤ N/2}, .{WN,2 (t), 0 ≤ t ≤ N/2}, .σ > 0 and .ζ < 1/2 such
.

that
| k |
|E |
−ζ | |
. sup k | ei − σ WN,1 (k)| = OP (1)
1≤k≤N/2 | |
i=1

and
| N |
| E |
−ζ | |
. sup (N − k) | ei − σ WN,2 (N − k)| = OP (1).
N/2<k<N | |
i=k+1

Theorem 3.3.2 We assume that .H0 of (1.1.2), Assumptions 1.2.1, 3.3.1, 3.3.2
and 3.3.4 are satisfied and

w(b(t)) w(b(1)) − w(b(t))


. lim sup <∞ and lim sup < ∞. (3.3.3)
t→0 w(t) t→1 w(t)

(i) If .I (w, c) < ∞ for some .c > 0, then

|ZN (N t/(N + 1))| D |┌(t)|


. sup → sup .
0<t<1 w(t) 0<t<1 w(t)

(ii) If .p ≥ 1 and
/ 1 [t (1 − t)]p/2
. < ∞,
0 w(t)

then
/ /
1 |ZN (N t/(N + 1))|p D
1 |┌(t)|p
. dt → dt,
0 w(t) 0 w(t)

where .┌ is defined in Corollary 3.3.1.


126 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Proof The result can be proven by combining the methods of the proofs of
Theorems 1.2.2, 1.2.4 and 3.3.1. ⨅

We note that (3.3.3) holds in the important case when the variance exhibits
change points, or in other words is piecewise constant.
Self–normalized Darling–Erdős type result as in Theorem 1.2.5 can also be
derived for such heteroscedastic processes. However, these results are difficult to
apply since they depend on estimating .b(t) and its limiting behaviour near 0 and 1.
A similar problem is discussed in Sect. 4.3 in the context of regression models.
We now illustrate how to use Hilbert space techniques to approximate the
/1
critical values for the weighted Cramér–von Mises statistics . 0 ┌ 2 (t)/w(t)dt. The
Karhunen–Loéve expansion, see e.g. pg. 188 of Hsing and Eubank (2015), yields
that
/ 1 ∞E
┌ 2 (t)
. dt = λi Ni2 ,
0 w(t)
i=1

where .N1 , N2 , . . . are independent and identically distributed standard normal ran-
dom variables, and .λ1 ≥ λ2 ≥ . . . are the eigenvalues of the kernel integral operator
associated with the kernel .C(t, s)/[w(t)w(s)] defined in (3.3.2). Specifically,
/ 1 C(t, s)
λi φi (t) =
. φi (s)ds, 1 ≤ i < ∞. (3.3.4)
0 w(t)w(s)

The eigenvalues .λ1 ≥ λ2 ≥ . . ., are unknown but we can estimate them from the
sample. The first step is the estimation of the covariance function .C. We will define
an estimator .ĈN below that is .L2 consistent, such that
⎛ ⎞2
/ 1/ 1 ĈN (t, s) − C(t, s)
. dtds = oP (1). (3.3.5)
0 0 w(t)w(s)

With such an estimator, estimates .λ̂1,N ≥ · · · ≥ λ̂N,N of the eigenvalues satisfying


(3.3.4) may be obtained by solving the eigenvalue equations
/ 1 ĈN (t, s)
λ̂i,N φ̂i,N (t) =
. φ̂i,N (s)ds, 1 ≤ i ≤ N.
0 w(t)w(s)

It follows from (3.3.5) via Theorem A.3.4 that


| |
| |
. |λ̂i,N − λi | = oP (1). (3.3.6)
3.3 Heteroscedastic Errors 127

Now we use the approximation


E E
d
. λi Ni2 ≈ λ̂i,N Ni2 (3.3.7)
i=1 i=1

to obtain approximate critical values of the Cramér–von Mises statistics. The choice
of d is similar to choice of the number of the eigenvectors (or eigenfunctions) in
principal component analysis. Large d reduces the empirical bias but introduces
higher variance in (3.3.7). We also note an upper bound for the rate of convergence
in (3.3.6) is the order of convergence in (3.3.5).
Now we discuss an estimator for .C satisfying (3.3.5). According to the definition
of .C this requires estimating .b(·) defined in (3.3.1). The proofs of the results are
based on approximating moments of sums, for which we make use of the .Lν –
decomposability of the .ei ’s.
To illustrate the method, first we assume that the errors are uncorrelated, so that

Eej el = 0, if j /= l.
. (3.3.8)

Let
LN t⎦
1 E 1 E
N
.b̂N (t) = (Xi − X̄N )2 , where X̄N = Xi . (3.3.9)
N N
i=1 i=1

Theorem 3.3.3 If .H0 of (1.1.2), Assumptions 1.2.1, 3.3.1, 3.3.2, and (3.3.8) hold,
and the innovations .{ei , i ∈ Z} are .Lν –decomposable for some .ν ≥ 4, then

. sup |b̂N (t) − b(t)| = oP (1).


0<t<1

Proof Since the function .b(·) is continuous and monotone on .[0, 1], it is enough to
show that for every .0 ≤ t ≤ 1,

P
b̂N (t) → b(t).
.

We write
LN t⎦
1 E 2 LN t⎦
b̂N (t) =
. a (i/N)ei2 − (X̄N − μ)2
N N
i=1

and
LN t⎦
1 E
X̄N − μ =
. a(i/N)ei .
N
i=1
128 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Since .a(·) is bounded, we get due to the .Lν –decomposability of the errors that
⎛ ⎞2
LN
E t⎦ ⎛ ⎞
1 1
.E ⎝ a(i/N)ei ⎠ = O ,
N N
i=1

and hence by Chebyshev’s inequality

|X̄N − μ| = oP (1).
.

Also, using .Lν –decomposability, it follows that the series .{ei2 − Eei2 , i ∈ Z} is
.L
ν/2 –decomposable, from which it follows using Theorem A.3.1 that

| |ν
|LN |
|E t⎦
|
.E | 2 2 2 |
a (i/N)(ei − σ )| ≤ c1 N ν/2 .
|
| i=1 |

Therefore
LN t⎦
1 E 2
. a (i/N)(ei2 − σ 2 ) = oP (1).
N
i=1

Since
LN t⎦ / t
1 E 2 2
. σ a (i/N) → σ 2
a 2 (t)dt,
N 0
i=1

the result is proven. ⨆



It follows from Theorem 3.3.3 that

ĈN (t, s) = b̂N (min(t, s)) − t b̂N (s) − s b̂N (t) + ts b̂N (1)
.

satisfies (3.3.5) when .w(t) = 1.


In Theorem 3.3.3 we used that under (3.3.8) the parameter .σ 2 is the variance
of the sum of the .ei ’s. Our arguments show that in the general case we can use
long–run variance estimators. The estimator .b̂N (t) is the sample variance of the first
.LNt⎦ observations in case of uncorrelated data. In a similar way, we use the sample

correlations of the first .LNt⎦ observations to build our estimator in the general case.
For any .1 ≤ k ≤ N we define for .0 ≤ l ≤ k


⎪ 1 E
k−l

⎪ (Xi − X̄N )(Xi+l − X̄N ), 0 ≤ l ≤ k − 1,

⎨N
i=1
.γ̃k,l =
⎪ 1 E
k



⎪ (Xi − X̄N )(Xi+l − X̄N ), −(k − 1) ≤ l < 0.
⎩N
i=−(l−1)
3.3 Heteroscedastic Errors 129

The estimator for .b(·) is the long–run variance estimator computed from
{X1 , . . . , XLN t⎦ }. We note that .{Xi , 1 ≤ i ≤ N } is not necessarily a stationary
.

sequence, althoughΣwe still think of the “long–run variance” as the limit of the
variance of .N −1/2 N i=1 Xi . We let

LNE
t⎦−1 ⎛ ⎞
l
b̃N (t) =
. K γ̃LN t⎦,l , (3.3.10)
h
l=−(LN t⎦−1)

where K and h are respectively kernel and bandwidth parameters satisfying


Assumptions 3.1.5 and 3.1.4. We refer a proof of the following result to Górecki
et al. (2018).
Theorem 3.3.4 If .H0 of (1.1.2), Assumptions 1.2.1, 3.1.4, 3.1.5, 3.3.1, and 3.3.2
hold, and the errors .{ei , i ∈ Z} in (3.3.1) are .Lν –decomposable for some .ν > 4,
then
/ 1
. (b̃N (t) − b(t))2 dt = oP (1).
0

The estimator for .C(t, s) is defined as before but we use .b̃N (t) instead of .b̂N (t).
It follows from Theorem 3.3.4 that

ĈN (t, s) = b̃N (min(t, s)) − t b̃N (s) − s b̃N (t) + ts b̃N (1)
. (3.3.11)

satisfies (3.3.5) when .w(t) = 1.


The consistency of the standard and weighted CUSUM statistic with het-
eroscedastic data can be established similarly as with homoscedastic data. As before

.θN = k /N and .ΔN denote the proportion of the sample when the change in the

means occur and the size of the change. The proof of the following result is similar
to that of Theorem 2.1.1.
Theorem 3.3.5 We assume that .HA of (1.1.3), Assumptions 3.3.1, 3.3.2, 3.3.4 are
satisfied and

b(t) b(1) − b(t)


. lim sup < ∞ and lim sup < ∞.
t→0 t t→0 1−t

If .0 ≤ κ < 1/2, then

|ZN (N t/(N + 1))| P


. sup → ∞
0<t<1 [t (1 − t)]κ

if and only if

(θN (1 − θN ))1−κ N 1/2 |ΔN | → ∞.


.
130 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

We note that Górecki et al. (2018) discusses the consistency of the CUSUM
method with heteroscedastic data for other potential alternatives, including multiple
changes in the mean and a polynomially increasing mean after the change.
Limit theorems for the location of the time of change estimator are also affected
by heteroscedasticity in the data. We recall
| k |
|E k E ||
N
1 |
.k̂N = k̂N (κ) = sargmax Xi −
κ |
Xi | .
k∈{1,...,N −1} [k(N − k)] | N
i=1
|
i=1

The asymptotic properties of .k̂N are similar to those presented in Theorem 2.2.1,
although a possible jump in the variance of the model errors introduces some
differences. Let .a(θ −) = limx↓θ a(x), and .a(θ +) = limx↑θ a(x). Define
⎧⎛ ⎞1/2

⎪ 2a 2 (θ −)

⎨ 2 W1 (−t), if t < 0,

.W (t) =
a (θ −) + a 2 (θ )
⎛ ⎞1/2 (3.3.12)

⎪ 2a 2 (θ )

⎩ 2 W2 (t), if t ≥ 0,
a (θ −) + a 2 (θ )
where .{W1 (t), t ≥ 0} and .{W2 (t), t ≥ 0} are independent Wiener processes. If
a(x) is continuous at .θ , then .W ∗ is a standard two sided Wiener process as in
.

Theorem 2.2.1. Let

ξ ∗ (κ) = argmax{W ∗ (s) − |s|mκ (s)},


.

where .mκ (s) is defined in (2.2.1).


Theorem 3.3.6 We suppose .HA of (1.1.3), and Assumptions 2.1.1, 3.3.1, and 3.3.2
hold, and the errors .{ei , i ∈ Z} are .Lν −decomposable for some .ν > 4. If in
addition .0 ≤ κ ≤ 1/2, and

.ΔN → 0 and NΔ2N → ∞, if 0 ≤ κ < 1/2, or


ΔN → 0 and NΔ2N / log log N → ∞, if κ = 1/2,

then

2Δ2N ⎛ ⎞ D

. k̂ N − k → ξ ∗ (κ).
a 2 (θ −) + a 2 (θ )

Proof We follow the proof of Theorem 2.2.1, but now we assume that .κ = 0.
Repeating the calculations we obtain that the limit distribution of .k̂N is determined
by
⎛ ⎞
E
k k∗
E
k ∗ (N − k ∗ )
Qk,5
. =2 ΔN ⎝ Ei − Ei ⎠
N
i=1 i=1
3.3 Heteroscedastic Errors 131

and
⎛⎛ ⎞2
k(N − k ∗ )
Qk,6 =
. 1{1 ≤ k ≤ k ∗ }
N
⎛ ⎞2 ⎛ ⎞2 ⎞
k ∗ (N − k) k ∗ (N − k ∗ )
+ 1{k ∗ < k ≤ N} − Δ2N
N N

on the set .|k − k ∗ | ≤ A/Δ2N , where A is an arbitrary positive scalar. It follows that
for .C > 0, and .c > 0,
| |
|1 |
. sup || Qk∗+cs/Δ2 ,6 + 2cθ (1 − θ )|s|m0 (s)|| = o(1). (3.3.13)
−C≤s≤C N N

According to the definition of .Qk,5 , if .s < 0, then



1 k ∗ (N − k ∗ ) E
k
. Qk ∗ +cs/Δ2 ,5 = −2 ΔN Ei
N N N
i=k ∗ +cs/Δ2N +1

and for .s ≥ 0

k ∗ +cs/Δ2N
1 k ∗ (N − k ∗ ) E
. Qk ∗ +cs/Δ2 ,5 = −2 ΔN Ei .
N N N
i=k ∗ +1

Now we use Theorem A.1.1 to define two independent Wiener processes .WN,1 and
WN,2 such that for all .A > 0
.

| |
| k∗
|
| E |
| |
. sup |ΔN ei − ΔN WN,1 (−s)| = oP (1) (3.3.14)
|
−A≤s≤0 | |
i=k ∗ +s/Δ2N +1 |

and
| |
| k ∗ +s/Δ2N |
| E |
. sup | ei − WN,2 (s)|| = oP (1). (3.3.15)
| ΔN
0≤s≤A | i=k ∗ +1 |
132 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Using Abel’s summation formula, as in the proof of Theorem 3.3.1, (3.3.14) and
(3.3.15) imply that for the sums of the .Ei ’s:
| |
| k∗ / k∗ |
| E |
| |
. sup |ΔN Ei − a(x/N)dWN,1 (xΔ2N )| = oP (1)
|
−A≤s≤0 | ∗
k +s/ΔN +1
2 |
i=k ∗ +s/Δ2N +1 |
(3.3.16)

and
| |
| k ∗ +s/Δ2N / k ∗ +s/Δ2 |
| E N |
. sup | Ei − 2 |
a(x/N)dWN,2 (xΔN )| = oP (1). (3.3.17)
| ΔN
0≤s≤A | i=k ∗ +1 k∗ |

The processes
/ k∗
UN,1 (s) =
. a(x/N)dWN,1 (xΔ2N )
k ∗ +s/Δ2N +1

and
/ k ∗ +s/Δ2N
UN,2 (s) =
. a(x/N)dWN,2 (xΔ2N )
k∗

are independent Gaussian processes with zero mean, and covariances


/ k∗
'
.EUN,1 (s)UN,1 (s ) = a 2 (x/N)d(xΔ2N ).
k ∗ −min(−s,−s ' )/Δ2N +1

Since .N Δ2N → ∞ we get


| |
| |
. sup |EUN,1 (s)UN,1 (s ' ) − a 2 (θ −) min(−s, −s ' )| = o(1),
−A≤s,s ' ≤0

and similarly
| |
| |
. sup |EUN,2 (s)UN,2 (s ' ) − a 2 (θ +) min(s, s ' )| = o(1).
0≤s,s ' ≤A

Hence for every .C > 0

1 D [−C,C]
. Qk ∗ +cs/Δ2 ,5 −→ 2θ (1 − θ )c1/2 W̃ (s), (3.3.18)
N N

where

a(θ −)W1 (−t), if t < 0,
W̃ (t) =
. (3.3.19)
a(θ )W2 (t), if t ≥ 0,
3.3 Heteroscedastic Errors 133

Taking

a 2 (θ −) + a 2 (θ )
c=
.
2

leads to .{c1/2 W̃ (t), −∞ < t < ∞} having the same distribution as .{W ∗ (t), −∞ <
t < ∞}. For any .C > 0 we define
| k |
|E k E ||
N
|
.k̂N (C) = sargmax | Xi − Xi | .
{k : |k ∗ −k|≤C} |
i=1
N
i=1
|

Since .{W ∗ (s), −∞ < s < ∞} and .{W ∗ (s), −∞ < s < ∞} have the same distribu-
tion, (3.3.13) and (3.3.18) imply

2Δ2N ⎛ ⎞ D

. k̂ N (C) − k → argmax|s|≤C {W ∗ (s) − |s|m0 (s)}.
a 2 (θ −) + a 2 (θ )

Observing that

argmax|s|≤C {W ∗ (s) − |s|m0 (s)} → ξ ∗ (0)


. a.s., as C → ∞,

the proof is complete. ⨆



Theorem 3.3.6 shows how the limit distribution of the CUSUM change point
estimator depends on a change in the variance occurring at the same time. We
note that if the variance changes continuously, then this distribution is similar to
the homoscedastic case. Since .a(θ ), a(θ −) and .a(θ +) are related to the variances
of the .Xi ’s in a neighbourhood of .θ , the result in essence shows that the distribution
of the change point estimator only depends on those observations with indices close
to .Nθ .
If the change in the mean, rather than asymptotically shrinking, remains constant,
we have the following analogue of Theorem 2.2.2 in the heteroscedastic case. Let

⎪ E−1



⎪ −a(θ −) Ei , if l < 0,


⎨ i=l
.S̄(l) = 0, if l = 0,



⎪ E
l

⎪ +)

⎩ a(θ Ei , if l > 0.
i=1
134 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

We define
{ }
ξ̄Δ (κ) = argmaxl ΔS̄(l) − Δ2 |l|mκ (l) ,
. (3.3.20)

where .mκ (t) is defined in (2.2.1). The proof of the below result is similar to the
above and Theorem 2.2.2.
Theorem 3.3.7 If .HA of (1.1.3), Assumptions 2.1.1, 3.3.1, and 3.3.2 hold, and the
errors .{ei , i ∈ Z} are .Lν −decomposable for some .ν > 4, and .0 ≤ κ ≤ 1/2.

. lim ΔN = Δ /= 0.
N →∞

then
D
. k̂N − k ∗ → ξ̄Δ (κ),

where .ξ̄0,Δ is defined in (3.3.20).

3.4 Data Examples

Example 3.4.1 (River Nile Data Revisited) In this example we revisit the change
point analysis of the river Nile flow series considered in Example 2.5.1. In particular,
we turn our attention to investigating the effect of using different long–run variance
estimators in order to compute the critical levels for the maximized CUSUM process
considered.
A plot of the empirical autocorrelation function (ACF) of the river Nile flow
series is given in the left–hand panel of Fig. 3.1. It is apparent that the magnitude
of the autocorrelation observed in the series is significantly larger than what one
would expect for a sequence of independent and identically distributed variables.
This might be attributed to either the presences of genuine autocorrleation in the
sequence, changes in the mean of the series that are not taken into account when
the ACF is estimated, or both. If autocorrelation is present in the sequence, then
we may estimate the parameter σ 2 using the kernel–bandwidth LRV estimator
in (3.1.18). With the aim of producing such an estimator based on the Bartlett
kernel, we computed the automatic bandwidth parameter based on fitting an
autoregressive process of order one to the series as defined in (3.1.30), giving
ĥ ≈ 6.5. The automatic bandwidth computed as detailed in Newey and West
(1987) was similar. The resulting long–run variance estimator with this bandwidth
2
was σ̂N,LRV = 86537.37. This value is nearly four times larger than the sample
variance σ̂N = 28637.95. Figure 3.2 shows a plot of |QN (t)| with a horizontal
2

black dotted line indicating the 95% quantile of σ̂N sup0≤t≤1 |B(t)|, which also
3.4 Data Examples 135

ACF Nile River Series ACF of Centered Nile River Series

1.0
1.0

0.8
0.8

0.6
0.6
ACF

0.4
0.4

ACF

0.2
0.2

0.0
0.0
−0.2

−0.2
0 5 10 15 20 0 5 10 15 20
Lag Lag

Fig. 3.1 The left–hand panel show a plot of the ACF of the river Nile flow series. It appears that
the series exhibits significant autocorrelation. This could be attributed to the fact that a potential
mean change in the series is not accounted for in calculating the ACF. The right hand panel shows
the ACF of the river Nile series that was centered using a single change point estimate k̂N (0) as
described in (3.1.12)

appears in Fig. 2.2, and the horizontal blue dotted line shows the 95% quantile of
σ̂N,LRV sup0≤t≤1 |B(t)|. We see that in this case the maximum absolute value of
the CUSUM process exceeds both thresholds, suggesting that our earlier conclusion
that the series contains a change point remains the same when we factor in the
observed autocorrelation in the sequence by estimating σ 2 using a long–run variance
estimator.
Given that a potential change point in the mean of the series would also lead to
large values of the ACF, yet another option is the estimate the variance parameter
σ 2 after centering the data taking into account potential change points as in (3.1.33).
We estimated the location of a single change in the mean using k̂N (0) at the
location k̂N (0) = 28, and then centered the series using the mean estimates before
and after this change as in (3.1.32). The ACF of the resulting series appears in
the right hand panel of Fig. 3.1, which shows that centering the data based on a
change point estimator appears to remove most of the observed autocorrelation.
Estimating the bandwidth parameter in the same way based on the change point–
centered data results in ĥ ≈ 2.5, and an updated long–run variance estimate of
2 = 19020.28. The horizontal red dotted line in Fig. 3.2 shows the 95% quantile
σ̄N,1
of σ̄N,1 sup0≤t≤1 |B(t)|.
In this case the conclusions of the analysis do not change as a result of the
method used to estimate σ 2 . One should generally check whether this is the case
in conducting change point analyses.
136 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Fig. 3.2 Plot of the CUSUM

600
process |QN (t)| for the river ^ N,LRV
Nile flow series. The σ
horizontal black dotted line ^N
σ

500
shows the 95% quantile of
σ̂N sup0≤t≤1 |B(t)|, which σN,1
also appears in Fig. 2.2, the

400
horizontal blue dotted line
shows the 95% quantile of
σ̂N,LRV sup0≤t≤1 |B(t)|, and
the horizontal red dotted line

QN(t)

300
shows the 95% quantile of
σ̄N,1 sup0≤t≤1 |B(t)|

200
100
0

Example 3.4.2 (Sticky Consumer Price Index) In this example we consider


change point detection in the mean level of the time series of the monthly percentage
change at an annual rate of the Sticky Consumer Price Index (CPI) over the period
from January, 2010 to March, 2023, obtained from the Federal Reserve Economic
Database of St. Louis St. Louis (2023). The Sticky CPI is calculated based on a
subset of goods and services that have price changes infrequently, in some cases due
to the inherent cost of making a price change. These include for example education
and medical care services. The Sticky CPI is thought to better reflect consumer
expectations about future inflation.
This time series is illustrated in Fig. 3.3. An apparent change in the variability
of the series occurs just before an apparent shift in the mean, suggesting that
a heteroscedastic error model as discussed in Sect. 3.3 is more plausible than
assuming stationary errors as in Chap. 1. The unweighted CUSUM process ZN is
shown in the left–hand panel of Fig. 3.4, which has a prominent peak corresponding
to the month September, 2020.
After recentering the data based on this preliminary change point estimator
as in (3.1.32), we computed the rolling variance estimators assuming the model
innovations are serially uncorrelated, b̂N , and serially correlated, b̃N , as described
in (3.3.9) and (3.3.10), respectively. In the case of the serially correlated long–run
variance process in (3.3.10), we used a truncated kernel, and bandwidth computed
from the method of Andrews (1991). These variance processes are shown in the
right–panel of 3.4. In both cases we observe a steep increase in the functions b̂N (t)
and b̃N (t) occurring just before September, 2020. Since the data have been re–
3.4 Data Examples 137

8
6
Percent Change Sticky CPI

4
2
0
−2

2010−1 2020−9 2023−3

Fig. 3.3 Plot of the monthly percentage change at an annual rate of the Sticky Consumer Price
Index (CPI) over the period from January, 2010 to March, 2023. A change in the mean level of the
series appears to occur just after a change in the variability of the series

99% quantile Correlated


4

95% quantile Uncorrelated


5
4

3
|ZN(t)|
3

bN(t)
2
^
2

1
1
0

t t

Fig. 3.4 The left–hand panel shows the unweighted absolute CUSUM process |ZN |, along with
95% and 99% estimated quantiles of the random variable sup0<t<1 |┌(t)| defined in Theorem 3.3.2,
where the covariance kernel of the process ┌ is estimated using b̃N . The right–hand panel shows
the processes b̂N (uncorrelated) and b̃N (correlated) as described in (3.3.9) and (3.3.10)

centered based on a preliminary change point estimator before computing b̂N (t) and
b̃N (t), this appears to reflect the increased variance of the underlying series. We also
noticed that the series has reasonably strong autocorrelation, suggesting that using
b̃N is more appropriate when estimating the limiting distributions in Theorem 3.3.2.
We used b̃N to estimate the covariance kernel in (3.3.11), and subsequently used
138 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

simulation to estimate the quantiles of sup0<t<1 |┌(t)| in Theorem 3.3.2. The 95%
and 99% estimated quantiles of this distribution are also shown in the left–hand
panel of 3.4, which indicate that the change point in the level of the series observed
is highly significant. Binary segmentation applied after segmenting the series based
on this change point estimate suggest that no further change points of significance
can be detected.
Example 3.4.3 (Log–Returns Covariation and Volatility) In this example we
study the log returns of the adjusted closing stock prices between July 19, 1993, and
March 19 , 2009 (N = 3941), of the 12 companies as studied in Aue et al. (2009a).
The 12 companies considered are listed in Table 3.1, and include four companies in
the airline sector, four in the automotive sector and four in the energy sector. The
raw data we consider take the form pj,l , denoting the price of stock l at time j . The
l’th coordinate of Xj is defined as the centered log–return of stock l,

⎛ ⎞ ⎛ ⎞
1 E
3941
pj +1,l pi+1,l
xj,l
. = log − log , j = 1, . . . , 3941; l = 1, . . . , 12.
pj,l 3941 pi,l
i=1

They are combined to form 12-dimensional vector–valued observations


X1 , . . . , X3941 . To illustrate the time variation of the covariance of the observations,
we have computed the rolling average of the volatility and cross-volatility with a
window of size 100 days

1 E
j
. γ̂j (k, l) = yi,k yi,l , j = 101, . . . , 3941; k, l = 1, . . . , 12.
100
i=j −100+1

Table 3.1 The 12 companies Symbol Name Sector


whose stock values are
studied in Example 3.4.3 1 ALK Alaska Air Group, Inc. Airline
2 AMR AMR Corp (American Airlines) Airline
3 CAL Continental Airlines, Inc. Airline
4 LUV Southwest Airlines, Co. Airline
5 F Ford Motor Co. Automotive
6 GM General Motors Corp. Automotive
7 HMC Honda Motor Co. Automotive
8 TM Toyota Motor Corp. Automotive
9 APA Apache Corp. Energy
10 APC Anadarko Petroleum Corp. Energy
11 OXY Occidental Petroleum Corp. Energy
12 RIG Transocean Inc. Energy
3.4 Data Examples 139

0.000 0.002 0.004 0.006 0.008 0.010 0.012 0.014

0.008
ALK ALK and AMR
AMR ALK and CAL
CAL ALK and LUV
LUV AMR and CAL
AMR and LUV

0.006
CAL and LUV

0.004
0.002
0.000
1994 1996 1998 2000 2002 2004 2006 2008 1994 1996 1998 2000 2002 2004 2006 2008

Fig. 3.5 The volatilities (left) and cross–volatilities (right) of the log–returns from the stocks in
the airline sector. Changes in the level of the cross–volatilities are apparent following 2001 and
2007

For the airline sector, the corresponding rolling averages are shown in Fig. 3.1.
There appear to be several locations at which a change in the covariance of the
returns occurs. To further assess this conjecture, we computed the test statistic value
Ω3941 = 60.07 as described in Sect. 3.2.2, where the long–run covariance matrix
E was computed according to Eq. 3.1.35 with a Bartlett kernel and bandwidth h =
log10 N . Since d = 12, we have d = 78. The approximate 95% null–quantile of the
statistic Ω3941 computed from Theorem 3.2.4 was 12.00. Therefore, there is strong
evidence against the hypothesis that there is no change in the covariance matrix
(Fig. 3.5).
In order to detect multiple changes in the covariance matrix, we applied binary
segmentation using the change point estimator (3.2.15). A summary and findings
of this application of binary segmentation are reported in Table 3.2. A number of
the detected changes can be readily associated with major historical events. For
example, the estimated changes in 2001 may be linked to the bursting of the dot–
com bubble and the September 11 attacks, while the break dates in 1997 and 1998
may be connected to the Asian financial crisis, and the collapse of the hedge fund
Long–Term Capital Management and the Russian financial crisis, respectively. The
detected breaks in 2007 and 2008 can be related to the collapse of the housing
market in the United States and several European countries. The detected change–
point on September 9, 2008 predates the collapse of the investment bank Lehman
Brothers by three trading days.
140 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

Table 3.2 A summary of the k̂ k̂ (Date) ΩN Round Significant


results of the segmentation
procedure performed on the 305 1994-09-30 11.60 4 No
entire data set. The estimated 647 1996-02-07 19.80 3 Yes
change-points are listed with 1021 1997-07-31 14.78 4 Yes
their corresponding test 1259 1998-07-13 34.05 2 Yes
statistic value, the round in 1400 1999-02-02 13.60 4 Yes
binary segmentation in which
the change was found, and 1770 2000-07-25 18.65 3 Yes
whether or not it was 1886 2001-01-11 12.07 4 Yes
significant at level 95% based 2101 2001-11-23 60.07 1 Yes
on the statistic ΩN 2358 2002-12-03 14.66 4 Yes
2728 2004-05-24 27.90 3 Yes
3175 2006-03-03 14.70 4 Yes
3589 2007-10-24 28.95 2 Yes
3710 2008-04-18 10.11 4 No
3809 2008-09-09 12.13 3 Yes
3871 2008-12-05 7.47 4 No

3.5 Exercises

Exercise 3.5.1 Show that if the observations {Xi , i ∈ Z} are strictly stationary
with EX0 = 0 and EX04 < ∞, then σ̂N2 (Naïve) defined in (3.1.17) is not a consistent
estimator of the long–run variance of the series.
Exercise 3.5.2 We assume that {Xi , i ∈ Z} is Lν –decomposable for some ν > 4.
Let
1 E
Q(k) = ⎛ ⎞
. (Xi − Xj )2
k
1≤i<j ≤k
2

and define

.ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 2)) , 2/N ≤ t ≤ 1 − 2/N

and 0, if t /∈ [2/N, 1 − 2/N]. Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.3 We assume that X1 , . . . , XN are Lν –decomposable for some ν > 4.
Let
1 E
Q(k) = ⎛ ⎞
. ||Xi − Xj ||2
k
1≤i<j ≤k
2
3.5 Exercises 141

and define

. ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 2)) , 2/N ≤ t ≤ 1 − 2/N

and 0, if t /∈ [2/N, 1 − 2/N]. Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.4 We assume that X1 , . . . , XN are Lν –decomposable for some ν > 4
and EXi = 0. Let

1E
k
.Q(k) = Xi Xi+1 , 1≤k ≤N −1
k
i=1

and define

ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 1)) ,


. 1/N ≤ t < 1 − 1/N

and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.5 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4. Let

1E 1 E
k N
Q(k) =
. (Xi − X̄N )(Xi+1 − X̄N ), 1 ≤ k ≤ N − 1 with X̄N = Xi
k N
i=1 i=1

and define

ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 1)) ,


. 1/N ≤ t < 1 − 1/N

and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.6 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let

1E
k
Q(k) =
. Xi Xi+1 , 1≤k ≤N −1
k
i=1

and define

. ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 1)) , 1/N ≤ t ≤ 1 − 1/N


142 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t)/[t (1 − t)]γ converges in D[0, 1] for
all 0 ≤ γ < 1/2 and determine the limit.
Exercise 3.5.7 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let

1E
k
.Q(k; l) = Xi Xi+l , 1≤k ≤N −l
k
i=1

where l ≥ 0 and define

ZN (t) = N 1/2 t (Q(LNt⎦; l) − Q(N − l; l)) ,


. l/N ≤ t < 1 − l/N

and 0, if t /∈ [l/N, 1 − l/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.8 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let

1E T
k
Q(k) =
. Xi Xi+1 , 1≤k ≤N −1
k
i=1

and define

ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 1)) ,


. 1/N ≤ t < 1 − 1/N

and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
Exercise 3.5.9 We assume that X1 , X2 , . . . , XN are Lν –decomposable for some
ν > 4 and EXi = 0. Let

1 E || ||
k
|| ||
Q(k) =
. ||Xi XT
i+1 || , 1≤k ≤N −1
k
i=1

and define

ZN (t) = N 1/2 t (Q(LNt⎦) − Q(N − 1)) ,


. 1/N ≤ t < 1 − 1/N

and 0, if t /∈ [1/N, 1 − 1/N). Show that ZN (t) converges in D[0, 1] and determine
the limit.
3.6 Bibliographic Notes and Remarks 143

3.6 Bibliographic Notes and Remarks

Parzen (1957) and Grenander and Rosenblatt (1957) introduced the kernel estimator
for the long–run variance. It was generalized to define long–run covariance matrices
in Newey and West (1987), Andrews (1991), and Andrews and Monahan (1992)
who also discussed the optimal choice of the bandwidth or smoothing parameter.
Sun et al. (2008) obtains optimal windows for robust testing procedures. Politis and
Romano (1995) advocated the flat top kernel as a way to reduce the bias of the kernel
estimator. Liu and Wu (2010) proves the central limit theorem for kernel estimates of
the long–run variance. The idea of ratio statistics appeared in Kim (2000). Horváth
et al. (2008) applied the idea of ratio statistics to find changes in the mean. See also
Pešta and Wendler (2020). Surgailis et al. (2008) used the ratios of second order
increments of strongly dependent observations. Shao (2015) and Shao and Zhang
(2010) review some self–normalized methods to conduct change point analysis with
time series similar to those in Sect. 3.1.3 that can be viewed as CUSUM statistics
normalized by a long–run variance estimator that has a fixed (rather than growing the
with the sample size) bandwidth. Betken (2016) develops a non–parameteric self–
normalized change point detection procedure. The non-monotonic power problem
is discussed in Crainiceanu and Vogelsang (2007) and Vogelsang (1997).
We demonstrated that CUSUM methods are extended easily for changes in other
summaries of the observations that may be represented as expected values, like
the variance or covariance matrix. For example, Berkes et al. (2009a) introduced
standardized tests to see if the covariances of linear process change during the
observation period. Galeano and Pena (2007) studies possible changes in the
covariances of dependent vectors. Aue et al. (2009a), Wied et al. (2012) and Steland
(2020) applied CUSUM statistics to detect changes in the covariance structure of
dependent vectors. A robust method to test for changes in the scale of multivariate
observations based on data depth was put forward in Chenouri et al. (2020). For
further results on detecting changes in the second order behaviour of time series we
refer to Wied et al. (2012) and Bücher et al. (2014).
Inclán and Tiao (1994), Gombay and Horváth (1994), Davis et al. (1995), Lee and
Park (2001), Deng and Perron (2008), Antoch et al. (1997), Berkes et al. (2009a),
Aue et al. (2009a), Wied et al. (2012), Wied et al. (2013) and Zhou (2013) propose
tests when the mean and/or the variance are changing under the alternative, i.e.
heteroscedastic errors can occur under the alternative. Dalla et al. (2020) and Xu
(2015) point out that in some applications the errors are heteroscedastic, which
should be taken into account when we test the validity of the no–change in the
mean null hypothesis. Busetti and Taylor (2004), Cavaliere et al. (2011), Cavaliere
and Taylor (2008), Hansen (1992) and Harvey et al. (2006) investigate change point
tests when some type of non–stationarity is exhibited by the data, including for
second order properties in Dette et al. (2019). The discussion in Sect. 3.3 is based on
Górecki et al. (2017). We assumed in Sect. 3.3 that the volatility might be changing
144 3 Variance Estimation, Change Points in Variance, and Heteroscedasticity

under the null as well as under the alternative. Wu and Xiao (2018) devise methods
to test if the volatility of the errors is non-constant a function of time.
Estimating the variance of the underlying model errors in the presence of a shift
in the mean has been studied in a number of different contexts. See for example Axt
and Fried (2020), Fryzlewicz (2014), and Gallagher et al. (2022).
Chapter 4
Regression Models

When two or more variables are observed concurrently, it is often of interest to


evaluate whether their relationship appears to stay constant over time, or if instead
it appears to change. A simple framework to address questions of this type is
when the variables are related to each other through a parametric regression model.
In this case a change in the relationship may be characterized by a change in
the model parameters. This chapter is devoted to the development of asymptotic
methods to perform change point analysis in the context of regression models.
Section 4.1.1 begins with linear regression models, and considers the asymptotic
properties of methods that arise from performing maximally selected likelihood
ratio tests assuming normality of the model errors to perform change point analysis,
which we term the “quasi–likelihood methods”. In Sect. 4.1.3, we consider tests
based directly on comparing sequential estimates of the linear model parameters.
The properties of change point estimators in this setting are developed in Sect. 4.2.1.
These methods are extended to develop change point procedures in the context of
non–linear, generalized method of moments, and polynomial regression in Sects. 4.3
and 4.4. Section 4.5 puts forward methods for performing change point analysis in
the distribution of the model errors in a linear model.

4.1 Change Point Detection Methods for Linear Models

4.1.1 The Quasi–Likelihood Method

We suppose that we have observed a sequence .(yi , xi ), .i ∈ {1, . . . , N }, where


yi ∈ R we think of as being a response or dependent variable, and the .xi =
.

(xi,1 , xi,2 , . . . , xi,d )T ∈ Rd is a d-dimensional vector of covariates or explanatory

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 145
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_4
146 4 Regression Models

variables. We begin by studying the case when .(yi , xi ) follow a linear model with
AMOC in the linear regression parameters. In particular, we assume that

xT
i β 0 + Ei , 1 ≤ i ≤ k∗,
yi =
.
T (4.1.1)
xi β A + Ei , k ∗ + 1 ≤ i ≤ N.

Here .β 0 , β A ∈ Rd , and .{Ei , i ∈ Z} is a sequence of model errors that satisfy


EEi = 0. The no–change null hypothesis is stated as
.

. H0 : β 0 = β A , (4.1.2)

and the alternative of a change in parameters is defined by

HA : β 0 /= β A .
. (4.1.3)

A natural approach to test .H0 versus .HA is to test, for each .k ∈ {1, . . . , N},
for the equality of the linear model parameters between the two samples .(yi , xi ),
.i ∈ {1, . . . , k} and .(yj , xj ), .j ∈ {k + 1, . . . , N }, and then use as evidence against

.H0 the most significant of those N tests. When the model errors are independent

and identically distributed normal random variables with variance .EEi2 = σ 2 , the
likelihood ratio for this two sample test reduces to
⎛ 2 + σ̂ 2
⎞N/2
σ̂k,1 k,2
Λk =
.
2
, (4.1.4)
σ̂N,1

where

1 E 1 E
k N
2
σ̂k,1
. = (yi − xT β̂
i k,1 ) 2
, and σ̂ 2
k,2 = (yi − xT 2
i β̂ k,2 ) . (4.1.5)
N N
i=1 i=k+1

Here .β̂ k,1 and .β̂ k,2 are the least-squares estimators of the vector of regression
parameters based on, respectively, the first k and last .N − k observations. These
estimators are defined by
⎛ ⎞−1 ⎛ ⎞−1
β̂ k,1 = XT
. k,1 Xk,1 XT
k,1 Yk,1 , β̂ k,2 = XT
k,2 Xk,2 XT
k,2 Yk,2 , (4.1.6)
4.1 Change Point Detection Methods for Linear Models 147

where .Yk,1 = (y1 , y2 , . . . , yk )T , .Yk,2 = (yk+1 , yk+2 , . . . , yN )T ,


⎛ ⎞ ⎛ ⎞
x1,1 , x1,2 , . . . , x1,d xk+1,1 , xk+1,2 , . . . , xk+1,d
⎜ x2,1 , x2,2 , . . . , x2,d ⎟ ⎜ xk+2,1 , xk+2,2 , . . . , xk+2,d ⎟
⎜ ⎟ ⎜ ⎟
Xk,1
. = ⎜. ⎟ and Xk,2 = ⎜ . ⎟.
⎝ .. ⎠ ⎝ .. ⎠
xk,1 , xk,2 , . . . , xk,d xN,1 , xN,2 , . . . , xN,d

In order that these estimators are asymptotically well defined, we require that the
matrices .XT T
k,1 Xk,1 and .Xk,2 Xk,2 are non singular for large enough k and .N − k
which will be implied by
Assumption 4.1.1 The matrix .A = {ai,j , i, j ∈ {1, . . . , d}} ∈ Rd×d is a non-
singular matrix, where .ai,j = Ex0,i x0,j .
Let

E ┌ ┐
D=
. E E0 El x0 xT
l . (4.1.7)
l=−∞

We show in the proof of Theorem 4.1.1 that the infinite sum defining .D is absolutely
convergent if, for example, the vector valued process .{zi = (xT T
i , Ei ) , i ∈ Z}
taking values in .R d+1 ν
is strictly stationary and .L –decomposable, as defined in
Definition 3.1.1, for some .ν > 4.
Asymptotic consistency of the least squares estimators in (4.1.6) requires the
following assumption:

Assumption 4.1.2 .EE0 = 0 and .Ex0 E0 = 0.


Theorem 4.1.1 If .H0 of (4.1.2) and Assumptions 4.1.1–4.1.2 are satisfied, and
{zi = (xT
.
T
i , Ei ) , i ∈ Z} is .L –decomposable for some .ν > 4, then
ν

D [0,1] 1
t (1 − t)(−2 log ΛLN t⎦ ) −→
. (┌(t) − t┌(1))T A−1 (┌(t) − t┌(1)),
σ2

where .{┌(t), 0 ≤ t ≤ 1} is a Gaussian process with values in .Rd , .E┌(t) = 0 and


.E┌(t)(┌(s))
T = min(t, s)D.

2 and .σ̂ 2 we have


Proof Using the definitions of .σ̂k,1 k,2

E
k E
N
2
.N σ̂k,1 = Ei2 − Dk,1 and 2
N σ̂k,2 = Ei2 − Dk,2 ,
i=1 i=k+1
148 4 Regression Models

where
⎛ ⎞T ⎛ ⎞−1
. Dk,1 = XT
k,1 Ek,1 XT
k,1 Xk,1 XT
k,1 Ek,1 ,

⎛ ⎞T ⎛ ⎞−1
. Dk,2 = XT
k,2 Ek,2 XT
k,2 Xk,2 XT
k,2 Ek,2 ,

and

Ek,1 = (E1 , E2 , . . . , Ek )T ,
. Ek,2 = (Ek+1 , Ek+2 , . . . , EN )T . (4.1.8)

Let .{z∗i,m = (x∗T ∗ T


i,m , Ei,m ) , i ∈ Z} be the m–dependent approximations of .zi as
defined in Definition 3.1.1. Using .Lν –decomposability and the Cauchy–Schwarz
inequality, we get that
|| ||ν/2 ( || || || || || ||)ν/2
|| ∗ T ||
E ||xi xT
.

i − xi,l (xi,l ) || ≤ E ||xi || ||xi − x∗i,l || + ||x∗i,l || ||xi − x∗i,l ||
⎛ ( || ||)ν/2 (|| || || ||)ν/2 ⎞
≤ 2ν/2 E ||xi || ||xi − x∗i,l || + E ||x∗i,l || ||xi − x∗i,l ||
( )1/2 ( || ||ν )1/2
≤ 21+ν/2 E||x1 ||ν E ||xi − x∗i,l || ,

and therefore
⎛ ⎛ || ||ν/2 ⎞2/ν
|| T ∗ ∗ T ||
. E ||xi xi − xi,l (xi,l ) || ≤ c1 l−α (4.1.9)

with some constant


E .c1 . Since .ν > 4, we can use Theorem A.1.3 in order to

approximate . ki=1 (xi xT i − A) with a vector valued Wiener process. By the law
of the iterated logarithm for Wiener processes (see Breiman, 1968, p. 64),
|| k ||
||E ||
1 || T ||
. max || (xi xi − A)|| = OP (1). (4.1.10)
1≤k≤N (k log+ log(k))1/2 || ||
i=1

The same argument gives


|| N ||
|| E ||
1 || T ||
. max || (xi xi − A)|| = OP (1), (4.1.11)
1≤k<N ((N − k) log+ log(N − k))1/2 || ||
i=k+1

|| ||
1 || T ||
. max || Xk,1 k,1 || = OP (1),
E (4.1.12)
1≤k≤N (k log+ log(k))1/2
4.1 Change Point Detection Methods for Linear Models 149

|| ||
1 || T ||
. max || X k,2 E k,2 || = OP (1), (4.1.13)
1≤k<N ((N − k) log+ log(N − k))1/2
| k |
|E |
1 | 2 |
. max | (Ei − σ )| = OP (1),
2
(4.1.14)
1≤k≤N (k log+ log(k))1/2 | |
i=1

and
| N |
| E |
1 | 2 |
. max | (Ei − σ )| = OP (1).
2
(4.1.15)
1≤k<N ((N − k) log+ log(N − k))1/2 | |
i=k+1

By a two term Taylor expansion we obtain that

2 + σ̂ 2
σ̂k,1 k,2
. − 2 log Λk = −N log 2
(4.1.16)
σ̂N,1
⎡ 2 + σ̂ 2

σ̂k,1 k,2
= −N 2
− 1 + Rk ,
σ̂N,1

where
⎛ 2
⎞⎡ 2 + σ̂ 2
⎤2
σ̂N,1 σ̂k,1 k,2
|Rk | ≤ N
.
2 + σ̂ 2
+1 2
−1 .
σ̂k,1 k,2 σ̂N,1

Putting together (4.1.10)–(4.1.15) we conclude

2
σ̂N,1
. max 2 + σ̂ 2
= OP (1),
d<k<N −d σ̂k,1 k,2

and
| | ⎛⎛ ⎛ ⎞ ⎞
| 1 1 || log log N 1/2
|
.| − 2 | = OP .
| σ̂N,1
2 σ | N

Now
⎛ ⎞2 1 ( )2
.
2
σ̂k,1 + σ̂k,2
2
− σ̂N,1
2
= 2 Dk,1 + (Dk,2 − D1,N )
N
4 ⎛ 2 ⎞
≤ 2 Dk,1 + (Dk,2 − D1,N )2 .
N
150 4 Regression Models

So using again (4.1.10)–(4.1.15) we get

1 ⎛ ⎞
. max 2
Dk,1 + (Dk,2 − D1,N )2 = OP (1),
d<k≤N/2 log+ log k

and
1 ⎛ ⎞
. max 2
Dk,2 + (Dk,1 − D1,N )2 = OP (1).
N/2≤k<N −d log+ log(N − k)

From the above it follows that


N
. max |Rk | = OP (1),
d<k≤N/2 log+ log k

and
N
. max |Rk | = OP (1).
N/2≤k<N −d log+ log(N − k)

Let
⎛ ⎛
1 1 T 1
Zk =
. S (k)A−1 S(k) + (S(N ) − S(k))T A−1 (S(N ) − S(k))
2
σ k N −k

1
− ST (N )A−1 S(N ) (4.1.17)
N

with

E
k
S(k) = Xk EkT =
. xi Ei . (4.1.18)
i=1

We showed that

k 1/2
. max |−2 log Λk − Zk | = OP (1) (4.1.19)
d<k<N −d (log+ log k)3/2

and

(N − k)1/2
. max |−2 log Λk − Zk | = OP (1). (4.1.20)
d<k<N −d (log+ log(N − k))3/2
4.1 Change Point Detection Methods for Linear Models 151

By expanding the products defining .Zk in (4.1.17), we obtain that


⎛ ⎛ ⎞T ⎛ ⎛ ⎞
N k −1 k
.Zk = S(k) − S(N ) A S(k) − S(N ) . (4.1.21)
σ 2 k(N − k) N N

Putting together (4.1.19)–(4.1.21) gives


| ⎛ ⎛ ⎞
|k k
max | 1− (−2 log Λk ) (4.1.22)
d<k<N −d | N
.
N
⎡ ⎛ ⎛ ⎞⎤T ⎡ ⎛ ⎛ ⎞⎤ |
1 k k |
− 2 N −1/2 S(k) − S(N ) A−1 N −1/2 S(k) − S(N ) || = oP (1).
σ N N

Lν –decomposability and the Cauchy–Schwarz inequality once again imply


.

⎛ ⎞2/ν
. E||xi Ei − x∗i,l Ei,l
∗ ν/2
|| ≤ c2 l−α ,

with some .c2 > 0. Now the result in Theorem 4.1.1 follows from Theorem A.1.3.


Repeating the arguments used in Sect. 1.3, we can also derive from (4.1.19)–
(4.1.21) that
|
|
| max (−2 log Λk ) (4.1.23)
| d<k<N −d
.

⎛ ⎛ ⎞T ⎛ ⎛ ⎞|
N k k |
− max S(k) − S(N ) A −1
S(k) − S(N ) ||
d<k<N −d σ k(N − k)
2 N N
= oP (1/ log N).

The limit process depends on the unknown matrix .D and error variance .σ 2 , which
must be estimated in order to make use of Theorem 4.1.1 for testing .H0 versus .HA .
However, this limiting process does have a simple form when the model errors are
serially uncorrelated, or evolve as volatility processes, which we characterize by the
following assumption.

Assumption 4.1.3 .E(Ei |Fi ) = 0, where .Fi is the .σ –algebra generated by the
variables .{xl , El−1 , −∞ < l ≤ i}.
Using Assumption 4.1.3 we get that .D = σ 2 A, and therefore
⎛ ⎫ ⎧ d ⎫
1 D E
T −1
. (┌(t)−t┌(1)) A (┌(t)−t┌(1)), 0 ≤ t ≤ 1 = Bi (t), 0 ≤ t ≤ 1 ,
2
σ2
i=1
152 4 Regression Models

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d}, are independent standard Brownian
bridges. We recall from Theorem A.2.7

d
a(x) = (2 log x)1/2
. and bd (x) = 2 log x + log log x − log ┌ (d/2),
2
where .┌ (x) is the Gamma function. The following result is then a consequence of
Theorems 4.1.1 and 1.3.1.
Theorem 4.1.2 We assume that .H0 of (4.1.2) and Assumptions 4.1.1–4.1.3 are
satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable for some .ν > 4.
ν

(i) We have

( ) D[0,1] E
d
t (1 − t) −2 log ΛLN t⎦ −→
. Bi2 (t), (4.1.24)
i=1

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d}, are independent Brownian bridges.
(ii) Also,
⎛ ⎫
. lim P a(log N) max (−2 log Λk ) 1/2
≤ x + bd (log N)
N →∞ d<k<N −d

= exp(−2e−x ) (4.1.25)

for all x, where .a(x) and .bd (x) are defined in (1.3.9).
(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2, then

κ−1/2 (t (1 − t)(−2 log ΛLN t⎦ ))1/2 D


rN
. sup → āI (d, κ),
t1 ≤t≤t2 (t (1 − t))κ

where .I is the .d × d identity matrix, .rN and .āI (d, κ) are defined in (1.2.27) and
(1.3.10).
Since .−2 log ΛLN t⎦ can be approximated with a (weighted) CUSUM process one
can prove that

1 E 2
d
t (1 − t) ( ) D[0,1]
.
2
−2 log ΛL(N +1)t⎦ −→ 2
Bi (t),
w (t) w (t)
i=1

assuming that .w(t) satisfies Assumption 1.2.1 and .I (w, c) is finite for some .c > 0,
where .I (w, c) is defined in (1.2.4). It is interesting to note that if Assumption 4.1.3
holds, then the limit distributions of functionals of the .log likelihood process are
parameter free. The limit distributions are known in some cases (see Shorack and
Wellner, 1986), but it is also possible to use simulations to obtain critical values.
4.1 Change Point Detection Methods for Linear Models 153

4.1.2 Residual–Based Methods

The proof of Theorem 4.1.1 shows that the likelihood ratio test statistic is approx-
imately a CUSUM process of the weighted model errors. As such, and since the
model errors are unknown, it is natural to test for homogeneity of the model
parameters by evaluating for changes in the mean of the empirical residuals

. Êi = yi − xT
i β̂ N,1 , (4.1.26)

where .β̂ N,1 is the least squares estimator of the regression parameters computed
from the entire sample. Let
⎛ ⎞
LN
E LN
LNt⎦ E E
t⎦ N t⎦
. ẐN (t) = N −1/2 ⎝ xi Êi − ⎠
xi Êi = N −1/2
xi Êi (4.1.27)
N
i=1 i=1 i=1

be the CUSUM process of the weighted residuals .xi Êi , .i ∈ {1, . . . , N }. The identity
in (4.1.27) is a consequence of the definition of .β̂ N,1 . Sometimes we refer to
.ẐN (t) as the partial sum process of the weighted residuals. Although the right most

expression in (4.1.27) is the simplest, we often use both representations of .ẐN (t). It
is natural to normalize this process by .D−1 defined in (4.1.7), and so we assume the
following.
Assumption 4.1.4 .D is a non–singular matrix.
Theorem 4.1.3 We assume that .H0 of (4.1.2), Assumptions 1.2.1, 4.1.1, 4.1.2,
and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable
ν

for some .ν > 4.


(i) If .I (w, c) < ∞ for some .c > 0, then

1 ⎛ T ⎞1/2
. sup ẐN ((N + 1)t/N)D−1 ẐN ((N + 1)t/N)
0<t<1 w(t)
⎛ d ⎞1/2
D 1 E
→ sup 2
Bi (t) ,
0<t<1 w(t) i=1

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d} are independent Brownian bridges.


154 4 Regression Models

(ii) Also,
⎛ ⎛ ⎛ ⎞1/2
N
. lim P a(log N) max
N →∞ d<k<N −d k(N − k)
⎡⎛ ⎞T ⎛ k ⎞⎤1/2
Ek E ⎫
× ⎣ xi Êi D−1
xi Êi ⎦ ≤ x + bd (log N)
i=1 i=1

= exp(−2e−x )

for all .x ∈ R, where .a(x) and .bd (x) are defined in (1.3.9).
(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2, then

κ−1/2 ||ẐT −1
N (t)D ẐN (t)||
1/2
D
rN
. sup → āI (d, κ),
t1 ≤t≤t2 (t (1 − t))κ

where .I is the .d × d identity matrix, .rN and .āI (d, κ) are defined in (1.2.27) and
(1.3.10).
Proof Since .H0 holds, we have

Êi = Ei − xT
. i (β̂ N,1 − β 0 ) (4.1.28)

and therefore

E
k
k E
N E
k
k E
N
. xi Êi − xi Êi = xi Ei − xi Ei
N N
i=1 i=1 i=1 i=1
⎛ k ⎞
E k E T
k
− xi xT
i − xi xi (β̂ N,1 − β 0 ).
N
i=1 i=1

We showed in the proof of Theorem 4.1.1 that

||β̂ N,1 − β 0 || = OP (N −1/2 ).


. (4.1.29)

We assume that .I (w, c) < ∞ for some .c > 0, so using Theorem 1.2.2 for each
coordinate we get that
|| ||
||L(N +1)t⎦ ||
N −1/2 || E L(N + 1)t⎦ EN
||
sup || T
xi xi − T ||
xi xi || = OP (1). (4.1.30)
.
||
w(t) ||
0<t<1 i=1
N
i=1 ||
4.1 Change Point Detection Methods for Linear Models 155

Similarly, Theorem 1.2.5 yields


⎛ ⎛ ⎞1/2 ||
||E
||
k E T ||
k N
N || T ||
. max || xi xi − xi xi || (4.1.31)
d<k<N −d k(N − k) || N ||
i=1 i=1

= OP ((log log N)1/2 ).

By (4.1.29)–(4.1.31) and since .D is the long–run covariance matrix of the process


{xi Ei , i ∈ Z}, Theorem 4.1.3(i) follows from (1.3.8).
.

Similarly, the second part of Theorem 4.1.3 is a consequence of Theorem 1.3.1.


Using Theorem 1.2.7 for the coordinates we obtain
|| k ||
N −1/2 ||E k E T ||
N
|| T || 1/2−κ
. max || xi xi − xi xi || = OP (rN ).
N t1 ≤k≤N t2 ((k/N)(1 − k/N ))κ || N ||
i=1 i=1

Hence (4.1.29) and Theorem 1.3.3 imply the third part of the theorem. ⨆

Similarly to Theorem 1.3.5, one can use the maximum norm of the CUSUM of
the weighted residuals. Let

ẑN (t) = D−1/2 ẐN ((N + 1)t/N)


. (4.1.32)

and .ẑN (t) = (ẑN,1 (t), . . . , ẑN,d (t))T .


The transformation with .D−1/2 gives again asymptotically pivotal limiting dis-
tributions. The following result can be established along the lines of Theorem 1.3.5.
Theorem 4.1.4 We assume that .H0 of (4.1.2), Assumptions 1.2.1, 4.1.1, 4.1.2
and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable
ν

for some .ν > 4.


(i) If Assumption 1.2.1 holds and .I (w, c) < ∞ for some .c > 0

1 || | D 1 || |
. max sup ẑN,j (t)| → max sup Bj (t)| ,
1≤j ≤d 0<t<1 w(t) 1≤j ≤d 0<t<1 w(t)

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , d} are independent Brownian bridges.


(ii) Also,
⎛ ⎫
1 || |
. lim P a(log N) max sup ẑN,j (t)| ≤ x + b(log N)
N →∞ 1≤j ≤d 1/N <t<1−1/N w(t)

= exp(−2de−x )

for all .x ∈ R, where .a(x) and .b(x) are defined in (1.2.18).


156 4 Regression Models

When the dimension d is large, it can be advantageous to consider dimension


reduced versions of the process .ẐN . A standard approach is to apply principal com-
ponent analysis, which readily leads to CUSUM-type statistics with asymptotically
pivotal distributions. Let .λ1 ≥ . . . ≥ λd > 0 and .v1 , . . . , vd denote the eigenvalues
and the corresponding eigenvectors of .D. Next we define

.z̃N,j (t) = ẐT


N (t)vj , 1 ≤ j ≤ d. (4.1.33)

Now we state the principal component version of Theorem 4.1.4, which may be
applied to the projections of the CUSUM process onto the eigenvectors .v1 , . . . , vp
for .p ∈ {1, . . . , d}.
Theorem 4.1.5 We assume that .H0 of (4.1.2), .p ∈ {1, . . . , d}, Assump-
tions 1.2.1, 4.1.1, 4.1.2 and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is
ν
.L –decomposable for some .ν > 4.

(i) If Assumption 1.2.1 holds and .I (w, c) < ∞ for some .c > 0, then

1 | | D 1 || |
. max sup |z̃N,j (t)| → max sup Bj (t)| ,
1≤j ≤p 0<t<1 λ1/2 w(t) 1≤j ≤p 0<t<1 w(t)
j

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, ..., p}, are independent Brownian bridges.
(ii) Also,
⎛ ⎫
1 | |
. lim P a(log N) max sup |z̃N,j (t)| ≤ x + b(log N)
N →∞ 1≤j ≤p 1/N <t<1−1/N 1/2
λj w(t)

= exp(−2pe−x )

for all .x ∈ R, where .a(x) and .b(x) are defined in (1.2.18).


In order to apply these results, the matrices .A and .D must usually be estimated.
A may be estimated with
.

1 E T
N
ÂN =
. xi xi . (4.1.34)
N
i=1

This estimator is consistent so long as the sequence .{xi , i ∈ Z} is ergodic. When


{xi , i ∈ Z} is .Lν –decomposable for some .ν ≥ 4, Theorem A.1.3 implies
.

|| ||
|| ||
. ||ÂN − A|| = OP (N −1/2 ).
4.1 Change Point Detection Methods for Linear Models 157

The matrix .D depends on the distribution of unobservable model errors .{Ei , i ∈ Z}.
In their place we use the residuals .Êi of (4.1.26). The long–run covariance matrix
estimator discussed in Sect. 3.1 is now computed from .xi Êi , .i ∈ {1, . . . , N }. Let

⎪ N −l

⎪ 1 E

⎪ xi xT
i+l Êi Êi+l , if 0 ≤ l < N,
⎨N −l
i=1
γ̂ l =
.

⎪ 1 E
N

⎪ xi xT

⎩ N − |l| i+l Êi Êi+l , if − N < l < 0,
i=−(l−1)

and the kernel lag-window covariance matrix estimator for .D is defined by

E
N −1 ⎛ ⎛ ⎞
l
D̂N =
. K γ̂ l . (4.1.35)
h
l=−(N −1)

Theorem 4.1.6 If .H0 of (4.1.2), Assumptions 3.1.4, 3.1.5, 4.1.1, 4.1.2 and 4.1.4
hold, then

P
D̂N → D.
.

Proof The proof follows from Theorem 3.1.5 upon showing that the residuals .Êi
may be replaced with the model errors .Ei in the estimator .D̂N with negligible
asymptotic error. Using (4.1.28) we have

xi Êi xT
.
T T T
i+l Êi+l = xi Ei xi+l Ei+l − xi Ei (β̂ N,1 − β 0 ) xi+l xi+l

− xi xT T T T T
i (β̂ N,1 − β 0 )xi+l Ei+l + xi xi (β̂ N,1 − β 0 )(β̂ N,1 − β 0 ) xi+l xi+l .

We showed that

||β̂ N,1 − β 0 || = OP (N −1/2 ).


.

Lν –decomposability yields
.

||Ex0 E0 xT
.
−α
l El || = O(l )

and

||Ex0 xT
.
T −α
0 xl xl || = O(l ).
158 4 Regression Models

Thus we get
|| −1 ⎛ ⎛ ⎞ ||
||NE 1 E
N −l ||
|| l T T ||
. || K xi Ei (β̂ N,1 − β 0 ) xi+l xi+l || = oP (1),
|| h N −l ||
l=0 i=1

||N −1 ⎛ ⎛ ⎞ ||
|| E 1 E
N −l ||
|| l ||
. || K xi xT (β̂ N,1 − β 0 )xT Ei+l || = oP (1)
|| h N −l i i+l ||
l=0 i=1

and
|| −1 ⎛ ⎛ ⎞ ||
||NE 1 E
N −l ||
|| l T T T ||
. || K xi xi (β̂ N,1 − β 0 )(β̂ N,1 − β 0 ) xi+l xi+l || = oP (1).
|| h N −l ||
l=0 i=1

Putting together our calculations we conclude


||N −1 ⎛ ⎛ ⎞
|| E 1 E
N −l
|| l
. || K xi xT
i+l Êi Êi+l
|| h N −l
l=0 i=1
⎛ ⎛ ⎞ ||
E
N −1
1 E
N −l ||
l T ||
− K xi xi+l Ei Ei+l || = oP (1)
h N −l ||
l=0 i=1

and similar arguments give


||
|| −1 ⎛ ⎛ ⎞
|| E l 1 EN
|| K xi xT
.
|| i+l Êi Êi+l
||l=−(N −1) h N − |l|
i=−(l−1)
||
−1 ⎛ ⎛ ⎞ ||
E l 1 E
N
||
− xi xT ||
K i+l i i+l || = oP (1).
E E
h N − |l| ||
l=−(N −1) −(l−1)

The result now follows from Theorem 3.1.5. ⨆



The consistency results in Theorem 4.1.6 imply that Theorems 4.1.3 and 4.1.4
remain true when .D is replaced with .D̂N . Theorem 4.1.6 implies that the eigenvalues
and eigenvectors of .D̂N are asymptotically consistent estimators of the theoretical
eigenvalues .λ1 ≥ . . . ≥ λd and eigenvectors .v1 , . . . , vd of .D, and therefore we can
replace the theoretical values with the estimates in Theorem 4.1.5(i).
4.1 Change Point Detection Methods for Linear Models 159

4.1.3 Methods Based on Direct Comparison of Parameter


Estimates

As we have seen the quasi–likelihood method used in Sect. 4.1.1 leads to the
consideration of maximally selected standardized CUSUM processes constructed
from the covariates and residuals. Another possibility is to simply directly compare
estimators of the regression parameters before and after all candidate change
points, and use as evidence against .H0 the maximum difference. Formally then
we divide the data into two subsets at time k, .(xi , Ei ), .i ∈ {1, . . . , k} and .(xi , Ei ),
.i ∈ {k + 1, . . . , N }, and obtain the estimators .β̂ k,1 and .β̂ k,2 given in (4.1.6). We

define the process



⎨ 0, if 0 < t ≤ d/N,
R.N (t) = t (1 − t)N 1/2 (β̂ LN t⎦,1 − β̂ LN t⎦,2 ), if d/N < t < 1 − d/N, (4.1.36)

0, if 1 − d/N ≤ t < 1.

Theorem 4.1.7 We assume that .H0 of (4.1.2) and Assumptions 1.2.1, 4.1.1, 4.1.2
and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable for
ν

some .ν > 4.
(i) If .I (w, c) < ∞ with some .c > 0, then
⎛ d ⎞1/2
1 ⎛ T ⎞1/2 D 1 E
. sup RN (t)AD−1 ARN (t) → sup Bi2 (t) ,
0<t<1 w(t) 0<t<1 w(t) i=1

where .{Bi (t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d}, are independent Brownian bridges.
(ii) Also,
⎛ ⎛ ⎛ ⎞1/2
k(N − k)
. lim P a(log N) max
N →∞ d<k<N −d N
⎡⎛ ⎞T ⎛ ⎞⎤1/2 ⎫
× β̂ k,1 − β̂ k,2 AD−1 A β̂ k,1 − β̂ k,2 ≤ x + bd (log N)

= exp(−2e−x )

for all .x ∈ R, where .a(x) and .bd (x) are defined in (1.3.9).
(iii) If .min(t1 , 1 − t2 ) → 0, .N min(t1 , 1 − t2 ) → ∞ and .κ > 1/2, then we have

κ−1/2 ||RT −1
N (t)AD ARN (t)||
1/2
D
rN
. sup → āI (d, κ),
t1 ≤t≤t2 (t (1 − t))κ

where .I is the .d × d identity matrix, .rN and .āI (d, κ) are defined in (1.2.27) and
(1.3.10).
160 4 Regression Models

Proof Under the null hypothesis


⎛ ⎞−1 ⎛ ⎞−1
β̂ k,1 = β 0 + XT
. k,1 Xk,1 XT
k,1 Ek,1 and β̂ k,2 = β 0 + XT
k,2 Xk,2 XT
k,2 Ek,2 ,

where .Xk,1 , .Xk,2 , .Ek,1 , and .Ek,2 are defined in Eqs. (4.1.6) and (4.1.8). Using
(4.1.10) and (4.1.12) we conclude
||⎛ ⎛⎛ ⎞−1 1 ⎞ ||
k || ||
max || T
Xk,1 Xk,1 − A −1
Xk,1 Ek,1 ||
T
.
d<k<N −d log+ log(k) || k ||
||⎛ ⎞−1 1 || || ||
k || T ||
−1 || || T ||
≤ max ||
≤ || Xk,1 Xk,1 − A || ||Xk,1 Ek,1 ||
d<k<N −d log log k + k
||⎛ ⎞−1 1 ||
k 3/2 || T ||
≤ max || X Xk,1 − A || −1 ||
d<k<N −d (log+ log k)1/2 || k,1 k
|| ||
1 || T ||
× max || Xk,1 E k,1 || = OP (1).
d<k<N −d (k log+ log k)1/2

Similarly,
||⎛ ⎛⎛ ⎞−1 ⎞ ||
N −k || 1 ||
|| XT Xk,2 − −1
Xk,2 Ek,2 ||
T
. max
d<k<N −d log+ log(N − k) || k,2
N −k
A || = OP (1).

As such we have the following approximations for the estimators:

1 1
. β̂ k,1 = β 0 + A−1 XT
k,1 Ek,1 + Rk,1 and β̂ k,2 = β 0 + A−1 XT
k,2 Ek,2 + Rk,2 .
k N −k

Therefore
k
. max ||Rk,1 || = OP (1), and
d<k<N −d (log+ log k)1/2
N −k
max ||Rk,2 || = OP (1).
d<k<N −d (log+ log(N − k))1/2

Hence the comparison of the parameters .β̂ k,1 and .β̂ k,2 leads to a CUSUM based
procedure. It follows from elementary calculation that
⎛ ⎛ ⎞
1 T 1 N k
. Xk,1 Ek,2 − XT E k,2 = S(k) − S(N ) ,
k N − k k,2 k(N − k) N

where .S(k) is defined in (4.1.18). Perhaps unsurprisingly the difference between


these estimators is asymptotically a weighted CUSUM process of the sequence
.{xi Ei , ∈ Z}, as studied in Sect. 4.1.1. ⨆

4.1 Change Point Detection Methods for Linear Models 161

It is interesting to note that according to the proof Theorem 4.1.7(iii)


|| ||
|| || D
(N rN )1/2
. max ||β̂ k,1 − β̂ k,2 || → āA−1 DA−1 (d, 1), (4.1.37)
N t1 (N )≤k≤N t2 (N )

where .rN and .āA−1 DA−1 (d, 1) are defined in (1.2.27) and (1.3.10).
.A and .D may again be estimated as in Sect. 4.1.1.

4.1.4 Heteroscedasticity in the Model Errors

So far we have studied the case in which the model errors a drawn from a stationary
process. In order to allow for heteroscedasticity in the model errors, we may
generalize the approach developed in Sect. 3.3 to the linear model framework. Recall
Assumptions 3.3.1 and 3.3.2 that the model errors satisfy .Ei = a(i/N)ei for a
stationary sequence .ei with mean zero and variance one, and .a : [0, 1] |→ R
a function of bounded variation. If Assumption 3.3.3 holds so that the sequence
.{xi ei , i ∈ Z} satisfies the functional central limit theorem, and .Exi ei = 0 for all i,

then we can define a sequence of Gaussian processes .{┌ N (t), 0 ≤ t ≤ 1} such that
|| ||
|| ||
|| −1/2 LN
E t⎦
||
|| xi Ei − ┌ N (t)||
. sup
||N || = oP (1),
0≤t≤1 || i=1 ||

where .E┌ N (t) = 0 and .┌ N (t)┌ T


N (s) = b(min(t, s)),
/ u

b(u) = D
. a 2 (s)ds,
0

and

E

. D = Ex0 e0 xT
l el (4.1.38)
l=−∞

is the long–run covariance matrix of the vector valued process .{xi ei , i ∈ Z}. With
again
⎛ ⎞
LN
E LNt⎦ E
t⎦ N
ZN (t) = N −1/2 ⎝
. xi Ei − xi Ei ⎠ ,
N
i=1 i=1
162 4 Regression Models

we obtain that

. sup ||ZN (t) − ┌ˆ N (t)|| = oP (1),


0≤t≤1

where

┌ˆ N (t) = ┌ N (t) − t┌ N (1).


.

T
It follows that .E ┌ˆ N (t) = 0 and .E ┌ˆ N (t)┌ˆ N (s) = D(t, s), where

D(t, s) = b(min(t, s)) − tb(s) − sb(t) + tsb(1).


.

Now the integral of the quadratic form satisfies


/ 1 / 1
D
. ||ZN (t)|| dt → 2 ˆ
||┌(t)||2
dt, (4.1.39)
0 0

where for each N

ˆ D
. {┌(t), 0 ≤ t ≤ 1} = {┌ˆ N (t), 0 ≤ t ≤ 1}.

Using the Karhunen-Loéve expansion we have


/ 1 ∞
E
. ˆ
||┌(t)||2
dt = λi Ni2 , (4.1.40)
0 i=1

where .{Ni2 , i ≥ 1} are independent standard normal random variables and .λ1 ≥
λ2 ≥ . . . are the eigenvalues of integral operator defined by the matrix valued
function .D. In particular, the eigenvalues satisfy the equation, with .φ i (s) =
(φi,1 (s), . . . ., φi,d (s))T

/ 1 d /
E 1
. λi φ i (t) = D(t, s)φ i (s)ds, 1 ≤ i < ∞, 2
φi,j (s)ds = 1,
0 j =1 0

where the left-hand integral is carried out coordinate–wise.


The function .D is typically unknown, but can be consistently estimated. We note
that similar techniques are presented in Chap. 8. First we define the residuals as
before

Êi = yi − xT
. i β̂ N , i ∈ {1, . . . , N},
4.1 Change Point Detection Methods for Linear Models 163

where we use .β̂ N for .β̂ N,1 . The estimator of .D is computed from .r̂ i = xi Êi ,
1 ≤ i ≤ N. First we estimate .b(u) with a kernel lag-window estimator for each
.0 < u ≤ 1. If the estimator is denoted by .b̂N (u), then the plug in estimator for

.D(t, s) is

D̂N (t, s) = b̂N (min(t, s)) − t b̂N (1) − s b̂N (1) + st b̂N (1),
. 0 ≤ t, s ≤ 1.

We follow the estimation technique of Sect. 3.1. The estimator .b̂N (u) is the long–
run covariance matrix estimator computed from .r̂ i , .i ∈ {1, . . . , LNu⎦}. Let


⎪ 1 E
k−l

⎪ (r̂ i − r̄)(r̂ i+l − r̄)T , if l /= 0,

⎨N
i=1
.γ̂ k,l =

⎪ 1 E
k

⎪ (r̂ i − r̄)(r̂ i+l − r̄)T , if l < 0,

⎩N
i=−(l−1)

where

1 E
N
.r̄ = r̂ i .
N
i=1

Now .{b̂N (u), 0 ≤ u ≤ 1} is defined as

LNE
u⎦−1 ⎛ ⎛ ⎞
l
b̂N (u) =
. K γ̂ LN u⎦,l ,
h
l=−(LN u⎦−1)

where .K(t) and h satisfy Assumptions 3.1.5 and 3.1.4. As in the proof of
Theorem 3.3.4, it can be shown that
/ || ||2
|| ||
. ||b̂N (u) − b(u)|| du = oP (1),

which implies that


/ / || ||2
|| ||
. ||D̂N (t, s) − D(t, s)|| dtds = oP (1). (4.1.41)

Let .λ̂1 ≥ λ̂2 ≥ . . . be the eigenvalues of .D̂N . Combining Theorem A.3.4 and
(4.1.41) we get that

|λ̂i − λi | = oP (1).
.
164 4 Regression Models

This result suggests approximating the limiting distribution in (4.1.40) with

∞ ∗
E E
d
. λi Ni2 ≈ λ̂i Ni2 ,
i=1 i=1

where .d ∗ is a user-specified, typically large, integer. This approximation may be


used to obtain approximate null critical values and p–values for testing .H0 versus
.HA based on the integrated quadratic form statistic on the left–hand side of 4.1.39.

If Assumption 2.1.1 holds, then there is a scalar .a > 0 so that


/
1 P
. ||ZN (t)||2 dt → a. (4.1.42)
N max1≤l≤R ||β l+1 − β l ||2

Under the alternative .D̂N is not a consistent estimator for .D, but
//
. D̂2N (t, s)dtds = OP (h2 max ||β l+1 − β l ||2 ).
1≤l≤R

Hence
⎛ ⎛ ⎞
.λ̂1 = OP h max ||β l+1 − β l || ,
1≤l≤R

which implies

E
d∗ ⎛ ⎛ ⎞
. λ̂i Ni2 = OP h max ||β l+1 − β l || ,
1≤l≤R
i=1

and therefore the consistency of the suggested procedure follows from (4.1.42).

4.2 Inference for Change–Points in a Linear Model

We begin this section by establishing the asymptotic behaviour of the statistics


in Theorems 4.1.1–4.1.5 under the AMOC alternative (4.1.3). We assume that
asymptotically there are a growing number of observations before and after the
change point.
Assumption 4.2.1 .min(k ∗ , N − k ∗ ) → ∞.
As in Chap. 2, we allow that the size of the change characterized by .||β 0 − β A || may
depend on the sample size and tend zero. The following assumptions characterizes
when the tests developed Sect. 4.1.1 are consistent.
4.2 Inference for Change–Points in a Linear Model 165

Assumption 4.2.2

.N 1/2 [θN (1 − θN )]1−κ ||β 0 − β A || → ∞,

with .k ∗ = LN θN ⎦.
The change in the parameters in the first consistency result we consider is measured
by

ΔN = (β 0 − β A )T A(β 0 − β A ).
.

Theorem 4.2.1 If .HA of (4.1.3), Assumptions 4.1.1, 4.1.2, 4.2.1 and 4.2.2 hold,
and .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable for some .ν > 4, then
ν

−2 log Λk ∗ P
. ⎛ ⎛ ⎞ → 1.
1
N log θN (1 − θN )ΔN + 1
σ2

Proof Using the definition of .β̂ N,1 we have under (4.1.3) that

.β̂ N,1 = [θN β 0 + (1 − θN )β A ](1 + oP (1))

and therefore by Theorem A.1.3



E
k E
N
2
N σ̂N,1
. = (yi − xT β̂
i N,1 ) 2
+ (yi − xT
i β̂ N,1 )
2

i=1 i=k ∗ +1

E
k E
N
= (Ei − xT
i ( β̂ N,1 − β 0 )) 2
+ (Ei − xT
i (β̂ N,1 − β A ))
2

i=1 i=k ∗ +1
∗ ∗
E
N E
k E
k
= Ei2 − 2 Ei xT
i (β̂ N,1 − β 0 ) + (β̂ N,1 − β 0 )
T
xi xT
i (β̂ N,1 − β 0 )
i=1 i=1 i=1

E
N E
N
−2 Ei xT
i (β̂ N,1 − β A ) + (β̂ N,1 − β A )
T
xi xT
i (β̂ N,1 − β A )
i=k ∗ +1 i=k ∗ +1

E
N
= Ei2 + NθN (1 − θN )2 ΔN (1 + oP (1))+NθN2 (1 − θN )ΔN (1 + oP (1)).
i=1
166 4 Regression Models

Similarly,
∗ ∗ ∗
E
k E
k E
k
N σ̂k2∗ ,1 =
. Ei2 − 2 Ei xT
i (β̂ k ∗ ,1 − β 0 ) + (β̂ k ∗ ,1 − β 0 )
T
xi xT
i (β̂ k ∗ ,1 − β 0 )
i=1 i=1 i=1

= k ∗ σ 2 (1 + oP (1)),

and

E
N E
N
.N σ̂k2∗ ,2 = Ei2 − 2 Ei xT
i (β̂ k ∗ ,2 − β A )
i=k ∗ +1 i=k ∗ +1

E
N
+ (β̂ k ∗ ,2 − β A )T xi xT
i (β̂ k ∗ ,2 − β A )
i=k ∗ +1

= (N − k ∗ )σ 2 (1 + oP (1)).

Combining the above approximations gives the result. ⨆



As such the asymptotic consistency of hypothesis testing based on comparing
the test statistics to the quantiles of the limiting distributions in Theorems 4.1.1
1/2
and 4.1.2, when .N 1/2 θN (1 − θN )ΔN → ∞. A similar rate condition can be
established for statistics based on the CUSUM process of the weighted residuals,
as detailed in the following result.
Theorem 4.2.2 If .HA of (4.1.3), Assumptions 4.1.1, 4.1.2, 4.1.4–4.2.2 and .0 ≤
κ < 1/2 hold, then

1 ⎛ ⎞1/2
T −1
sup1/(N +1)<t<1−1/(N +1) ẐN (t)D ẐN (t)
(t (1 − t))κ P
.
T −1
→ 1.
N [θN (1 − θN )] ((β 0 − β A ) AD A(β 0 − β A ))
1/2 1−κ 1/2

where .{ẐN (t), 0 ≤ t ≤ 1} is defined in (4.1.27). Similarly it holds that


⎛ ⎞1/2
RT ∗ −1 ∗
N (k /N)AD ARN (k /N) P
. → 1,
N 1/2 θN (1 − θN )((β 0 − β A )T D−1 (β 0 − β A ))1/2

where .RN (t) is defined in (4.1.36).


Proof Using the definition of the residuals we write

Ei − xT
i (β̂ N,1 − β 0 ), if 1 ≤ i ≤ k ∗ .
Êi =
.
T (4.2.1)
Ei − xi (β̂ N,1 − β A ), if k ∗ + 1 ≤ i ≤ N.
4.2 Inference for Change–Points in a Linear Model 167

Thus we have


⎪ Ek Ek

⎪ xi Ei − xi xT if 1 ≤ i ≤ k ∗

⎪ i (β̂ N,1 − β 0 ),

⎨ i=1
Ek i=1
. xi Êi = E k Ek∗ Ek

⎪ − T
− − xi xT
i=1 ⎪

xi E i xi xi ( β̂ N,1 β 0 ) i (β̂ N,1 − β A ),

⎪ ∗ +1


i=1 i=1 i=k
if k ∗ + 1 ≤ i ≤ N.

It follows from Theorem 4.1.3 (cf. Theorem A.1.3) that


⎛ ⎛ ⎞κ ||E |
|
k E
k N
N2 | |
. max | xi Ei − xi Ei | = OP (N 1/2 ).
1≤k<N k(N − k) | N |
i=1 i=1

For .1 ≤ i ≤ k ∗ we write
⎛ ∗ ⎞
E
k
k E
k E
k
. xi xT
i (β̂ N,1 − β 0 ) −
⎝ xi xT
i (β̂ N,1 − β 0 ) + xi xT
i (β̂ N,1 − β A )

N ∗
i=1 i=1 i=k +1
⎛ k ⎞
E k E T
N
k E
k
= xi xT
i − xi xi (β̂ N,1 − β 0 ) − xi xT
i (β̂ N,1 − β A )
N N ∗
i=1 i=1 i=k +1

and using again Theorem A.1.3


⎛ ⎛ ⎞κ || ⎛ k
|| E
⎞ ||
||
k E T
N
N2 || T ||
. max || xi xi − xi xi (β̂ N,1 − β 0 )|| = OP (1)
1≤k<N k(N − k) || N ||
i=1 i=1

and
⎛ ⎛ ⎞κ || ⎛ N
|| E
⎞ ||
||
N2 || ||
. max || xi xT − (N − k ∗ )A (β̂ N,1 − β A )|| = OP (1).
1≤k<N k(N − k) || ∗
i ||
i=k +1

So we conclude
⎛ ⎛ ⎞κ ||
||E
||
||
k E
k N
N2 || ∗ ||
. max || xi Êi − xi Êi − (N − k )A(β̂ N,1 − β A )||
1≤k≤k ∗ k(N − k) || N ||
i=1 i=1

= OP (1).
168 4 Regression Models

Similarly,
⎛ ⎛ ⎞κ ||
||E
||
||
k E
k N
N2 || ∗ ||
. max || xi Êi − xi Êi − k A(β̂ N,1 − β 0 )|| = OP (1).
k ∗ <k<N k(N − k) || N ||
i=1 i=1

Hence

⎛ ⎛ ⎞κ ⎨⎛E ⎞T
k E
k N
N2
. max xi Êi − xi Êi
1≤k<N k(N − k) ⎩ N
i=1 i=1

⎛ k ⎞⎫1/2
E k E
N ⎬
× D−1 xi Êi − xi Êi
N ⎭
i=1 i=1
⎛ ⎞1/2
= N[θN (1 − θN )]1−κ (β 0 − β A )T AD−1 A(β 0 − β A ) + OP (N 1/2 ).

The second half of the theorem follows similarly and we omit the details. ⨆

Remark 4.2.1 We note that Theorem 4.2.2 remains true if .κ = 1/2 when we
replace Assumption 4.2.2 with

N 1/2 [θN (1 − θN )]1/2 ||β 0 − β A ||/(log log N)1/2 → ∞.


.

In order to produce asymptotically pivotal test statistics we often use estimates


of .D−1 , the inverse of the long run covariance matrix of the weighted errors
.xi Ei , 1 ≤ i ≤ N . As detailed in Chap. 3, such estimates as defined in (4.1.35) are

not consistent when there is a change in the regression parameters. If .HA of (4.1.3)
holds, then the long–run covariance matrix of (4.1.35) satisfies

||D̂N || = OP (h),
. (4.2.2)

where h is the bandwidth parameter that is viewed as a function of the sample


size N . A more general result on the long–run variance estimator is stated in
Theorem 8.1.5 in Chap. 8, and its proof can be adapted to show (4.2.2). According
to Theorems 4.2.1–4.2.2, we obtain that even when .D̂N satisfies (4.2.2), so long as
.h/N → 0 as .N → ∞,

P −1 P
. sup t (1 − t)(−2 log ΛLN t⎦ ) → ∞, sup ẐT
N (t)D̂N ẐN (t) → ∞,
0<t<1 0<t<1
4.2 Inference for Change–Points in a Linear Model 169

and
P
. sup RT −1
N (t)AD ARN (t) → ∞
0≤t≤1

under Assumptions 2.1.1 and 4.2.2.

4.2.1 Estimation of a Single Change Point

We have seen that several different approaches for testing .H0 versus .HA amount
to considering maximally selected CUSUM processes. Naturally the argument at
which these processes attain their maximum may be used to estimate the location of
a change point. In view of (4.1.27), an estimator for the location of a change point
may be defined as
⎛ || k || ⎫
||E ||
1 || ||
k̂N (κ) = k̂N = sargmax
. || xi Êi || , (4.2.3)
k∈{1,...,N −1} (k(N − k))κ || ||
i=1

where .0 ≤ κ ≤ 1/2. The asymptotic properties of this estimator are detailed in the
following result, in which the size of the change in model (4.1.1) is measured by

||A(β 0 − β A )||2
ΔN =
. . (4.2.4)
||AD1/2 (β 0 − β A )||

We note that .ΔN is proportional to .||A(β 0 − β A )||.


Theorem 4.2.3 We assume that .HA of (4.1.3), Assumptions 2.1.1, 4.1.1, 4.1.2
and 4.1.4 are satisfied, and that .{zi = (xT T
i , Ei ) , i ∈ Z} is .L –decomposable
ν

for some .ν > 4.


(i) If .0 ≤ κ < 1/2 and

||β 0 − β A || → 0 and
. N||β 0 − β A ||2 → ∞, (4.2.5)

then
D
Δ2N (k̂N − k ∗ ) → ξ(κ).
.

(ii) If .κ = 1/2 and

||β 0 − β A || → 0
. and N ||β 0 − β A ||2 / log log N → ∞,
170 4 Regression Models

then
D
Δ2N (k̂N − k ∗ ) → ξ(1/2),
.

where .ξ(κ) is defined in (2.2.3).


Proof For notational simplicity we use .β̂ N instead of .β̂ N,1 . We only provide
details for .0 ≤ κ < 1/2. The result can be established for .κ = 1/2 with minor
modifications. This proof follows that of Theorem 2.2.1, and so we highlight the
main differences but refer similar parts to the proof of that result. We write

E
k
k E
N E
k
k E
N
. xi Êi − xi Êi = xi Ei − xi Ei + vk ,
N N
i=1 i=1 i=1 i=1

where
⎧ ⎛ ∗ ⎞

⎪E k Ek E N

⎪ xi xT
k ⎝ xi xT xi xT ⎠

⎪ i (β̂ N − β 0 ) − i (β̂ N − β 0 ) + i (β̂ N − β A ) ,

⎪ N

⎪ i=1 i=1 i=k +1 ∗

⎪ ∗,

⎨ ∗ if 1 ≤ i ≤ k
E E ⎛ ⎛ k ∗
k E T
k k
.vk =
⎪ T T

⎪ xi xi (β̂ N − β 0 ) + xi xi (β̂ N − β A ) − xi xi (β̂ N − β 0 )

⎪ N

⎪ i=1 i=k ∗ +1 i=1

⎪ E N ⎞

⎪ T


⎩ + xi x i ( β̂ N − β A ,
) if k ∗ + 1 ≤ i ≤ N.
i=k ∗ +1

It follows from the proof Theorem 4.1.3 that


|| k ||
N 2κ−1/2 ||E k E
N ||
|| ||
. max || xi Ei − xi Ei || = OP (1).
1≤k<N (k(N − k))κ || N ||
i=1 i=1

We also showed that


|| ⎛ ⎛ ∗ ⎞||
|| k N − k∗ ||
. ||β̂ N − || −1/2
|| β 0 + β A || = OP (N ).
N N

Using again Theorem A.1.3 we get


|| k ||
||E || ⎛ ⎞
−κ || T ||
. max k || xi xi − kA|| = OP N κ−1/2
1≤k≤N || ||
i=1
4.2 Inference for Change–Points in a Linear Model 171

and
|| N ||
|| E || ⎛ ⎞
−κ || ||
. max (N − k) || xi xT − (N − k)A|| = OP N κ−1/2 .
1≤k<N || i ||
i=k+1

Hence
|| k ||
N κ−1/2 N κ−1/2 ||||E T
||
||
. max ||vk − zk || ≤ max || (x x
i i − A)( β̂ − β 0 ||
)
1≤k≤k ∗ kκ 1≤k≤k ∗ k κ || N
||
i=1
|| ||
|| E k∗ ||
N κ−1/2 |||| k T
||
||
+ max ∗ κ || (x x
i i − A)( β̂ N − β )
0 ||
1≤k≤k k || N i=1 ||
|| ||
N κ−1/2 || k E
N ||
|| ||
+ max ∗ || (xi xT − A)(β̂ N − β A )||
1≤k≤k kκ || N ∗
i ||
i=k +1

= OP (1),

where


⎪ k(N − k ∗ )
⎨ A(β 0 − β A ), if 1 ≤ k ≤ k ∗ ,
.zk =
N

⎪ k ∗ (N − k)
⎩ A(β 0 − β A ), if k ∗ + 1 ≤ k ≤ N.
N

Similar arguments give

N κ−1/2
. max ||vk − zk || = OP (1).
k ∗ +1≤k<N (N − k)κ

Thus we conclude

|k̂N − k ∗ | = oP (N ).
. (4.2.6)

Let
C
a=
. ,
Δ2N

where .ΔN is defined in (4.2.4). We note that for all .1 ≤ k ≤ k ∗

E
k
k E
N
. xi Êi − xi Êi = Q1 (k) + . . . + Q4 (k),
N
i=1 i=1
172 4 Regression Models

where

E
k
k E
N
Q1 (k) =
. xi Ei − xi Ei ,
N
i=1 i=1
⎛ ⎞
E
k
k E
N
Q2 (k) = − xi xT
i − xi xT
i (β̂ N − (θ β 0 + (1 − θ )β A ),
N
i=1 i=1

E
k
Q3 (k) = xi xT
i (1 − θ )[β 0 − β A ],
i=1
⎡ ⎤
k∗
E E
N
k ⎣ ⎦ (β 0 − β A ).
Q4 (k) = − (1 − θ ) xi xT
i −θ xi xT
i
N ∗i=1 i=k +1

We use the following decomposition for .k ∈ {k ∗ − C/Δ2N , . . . , k ∗ + C/Δ2N } :

⎛ ⎛ ⎞2κ ||
||E
||2
||
k E
k N
N || ||
. || xi Êi − xi Êi ||
k(N − k) || N ||
i=1 i=1
|| ||2
⎛ ⎛ ⎞2κ ||E ∗
∗ E
||
N || k k
N
||
− || xi Êi − xi Êi ||
∗ ∗
k (N − k ) || ||
|| i=1 N
i=1 ||
⎛ ⎛ ⎞2κ E
4 ⎛ ⎛ ⎞2κ E
4
N 1
= QT
i (k)Qj (k) − QT ∗ ∗
i (k )Qj (k ).
k(N − k) k ∗ (N − k ∗ )
i,j =1 i,j =1

From here we may follow the proof of Theorem 2.2.1 to obtain the limit of the
process .Q4 (k). We obtain that

Δ2N |k̂N − k ∗ | = OP (1).


.

With

E
N
Q3 (k) =
. xi xT
i θ (β 0 − β A ), if k > k ∗ ,
i=k+1
4.2 Inference for Change–Points in a Linear Model 173

we also have
| ⎛ ⎛ ⎞2κ
| N −k E T
N
−(1−2κ) | N
N
. max | k(N − k) xi Ei Q3 (k)
k ∗ <k≤k ∗ +C/Δ2N N
i=1
⎛ ⎛ ⎞2κ |
− k∗ E |
N
N N |
− xT ∗
i Ei Q3 (k ) − RN,1 (k)| = oP (1),
k ∗ (N − k ∗ ) N
i=1

where

⎪ ⎛ ⎛ ⎛ ⎛ ⎞2κ E k

⎪ −(1−2κ) N


⎪ N xT
i Ei Q3 (k)

⎪ k(N − k)

⎪ ⎛ ⎞2κ E ∗
i=1 ⎞



⎪ − N k
xT E Q (k ∗ ) , if 1 ≤ k ≤ k ∗ ,

⎪ k ∗ (N −k ∗ ) i=1 i i 3

⎨ ⎛ ⎛ ⎛ ⎛ ⎞2κ ⎛ E N

.RN,1 (k) = −(1−2κ) N T
⎪N
⎪ − xi Ei Q3 (k)

⎪ k(N − k)

⎪ ⎞2κ ⎛
i=k+1 ⎞

⎪ ⎛ ⎛ E ⎞

⎪ N
N

⎪ − ∗ − T ∗
xi Ei Q3 (k ) ,

⎪ k (N − k ∗ )

⎪ ∗ +1


i=k
if k ∗ + 1 ≤ k ≤ N,

We note that

$$\max_{|k-k^*|\le C/\Delta_N^2}N^{-(1-2\kappa)}\left|\left(\frac{N}{k(N-k)}\right)^{2\kappa}\left[Q_3^T(k)Q_3(k)-(1-\theta)^2k^2(\beta_0-\beta_A)^TA^TA(\beta_0-\beta_A)\right]-\left(\frac{N}{k^*(N-k^*)}\right)^{2\kappa}\left[Q_3^T(k^*)Q_3(k^*)-(1-\theta)^2(k^*)^2(\beta_0-\beta_A)^TA^TA(\beta_0-\beta_A)\right]\right|=o_P(1).$$

Let

$$R_{N,2}(k)=N^{-(1-2\kappa)}\left(\left(\frac{N}{k(N-k)}\right)^{2\kappa}(1-\theta)^2k^2-\left(\frac{N}{k^*(N-k^*)}\right)^{2\kappa}(1-\theta)^2(k^*)^2\right)(\beta_0-\beta_A)^TA^TA(\beta_0-\beta_A),$$

if k ∈ {k* − C/Δ²_N, ..., k* + C/Δ²_N}. Elementary arguments give


$$\sup_{|s|\le C}\left|R_{N,2}(k^*+s/\Delta_N^2)+2(\theta(1-\theta))^{1-2\kappa}\frac{\|A(\beta_0-\beta_A)\|^2}{\Delta_N^2}|s|m_\kappa(s)\right|=o(1).$$

Using our previous arguments we get for all C > 0


$$N^{-(1-2\kappa)}\max_{|k^*-k|\le C/\Delta_N^2}\left|R_{N,1}(k)-R_{N,3}(k)-2\left(\frac{N}{k^*(N-k^*)}\right)^{2\kappa}\left(-\sum_{i=k+1}^{k^*}x_i\varepsilon_i\right)^T\sum_{i=1}^{k^*}x_ix_i^T(\beta_0-\beta_A)\right|=o_P(1),$$

where

$$R_{N,3}(k)=\begin{cases}\displaystyle 2(1-\theta)\left(\frac{N}{k^*(N-k^*)}\right)^{2\kappa}\left(-\sum_{i=k+1}^{k^*}x_i\varepsilon_i\right)^T\sum_{i=1}^{k^*}x_ix_i^T(\beta_0-\beta_A), & \text{if } 1\le k\le k^*,\\[14pt]\displaystyle 2\theta\left(\frac{N}{k^*(N-k^*)}\right)^{2\kappa}\left(\sum_{i=k^*+1}^{k}x_i\varepsilon_i\right)^T\sum_{i=k^*+1}^{N}x_ix_i^T(\beta_0-\beta_A), & \text{if } k^*+1\le k\le N.\end{cases}$$

Using Theorem A.1.1 we can define a sequence of two-sided Wiener processes {W_N(t), −∞ < t < ∞} such that

$$\sup_{|s|\le C}N^{-(1-2\kappa)}\left|R_{N,3}(k^*+s/\Delta_N^2)-2(\theta(1-\theta))^{1-2\kappa}\frac{\|AD^{1/2}(\beta_0-\beta_A)\|}{\Delta_N}W_N(s)\right|=o_P(1),$$

where the two-sided Wiener process is defined in (2.2.2). We observe that by the choice of Δ_N we have

$$2(\theta(1-\theta))^{1-2\kappa}\frac{\|AD^{1/2}(\beta_0-\beta_A)\|}{\Delta_N}W_N(s)-2(\theta(1-\theta))^{1-2\kappa}\frac{\|A(\beta_0-\beta_A)\|^2}{\Delta_N^2}|s|m_\kappa(s)$$
$$=2(\theta(1-\theta))^{1-2\kappa}\frac{\|AD^{1/2}(\beta_0-\beta_A)\|}{\Delta_N}\bigl(W_N(s)-|s|m_\kappa(s)\bigr).$$

Since the distribution of .{WN (t), −∞ < t < ∞} does not depend on N, we can
repeat the last step in the proof of Theorem 2.2.1 to conclude. ⨆

This result pertains to the argument-maximum of the CUSUM process based on
weighted residuals, although a similar result may be established for maximally
selected likelihood ratio statistics.
As described in Chap. 3, for example in (3.1.33), estimators for D as in (4.1.35) can be modified using k̂ = k̂_N in order to reduce additional estimation variability due to the presence of change points. We may compute least–squares regression parameter estimators β̂_{k̂,1} from {(y_i, x_i^T), 1 ≤ i ≤ k̂} and β̂_{k̂,2} from {(y_i, x_i^T), k̂ + 1 ≤ i ≤ N}. Subsequently D may be estimated with a kernel lag–window long–run covariance matrix estimate based on {x_iε̂_i, 1 ≤ i ≤ N}, where

$$\hat\varepsilon_i=\begin{cases}y_i-x_i^T\hat\beta_{\hat k,1}, & \text{if } 1\le i\le\hat k,\\ y_i-x_i^T\hat\beta_{\hat k,2}, & \text{if } \hat k+1\le i\le N.\end{cases}$$
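To make this estimation step concrete, the following Python sketch computes the split residuals above and a Bartlett kernel lag–window estimate of D from {x_iε̂_i}. This is a minimal illustration only; the helper names are ours and not from the text, and the bandwidth is assumed to be supplied by the user.

    import numpy as np

    def split_residuals(y, X, k_hat):
        # Least-squares fits before and after the estimated change point k_hat
        b1 = np.linalg.lstsq(X[:k_hat], y[:k_hat], rcond=None)[0]
        b2 = np.linalg.lstsq(X[k_hat:], y[k_hat:], rcond=None)[0]
        eps = np.empty(len(y))
        eps[:k_hat] = y[:k_hat] - X[:k_hat] @ b1
        eps[k_hat:] = y[k_hat:] - X[k_hat:] @ b2
        return eps

    def bartlett_lrv(V, bandwidth):
        # Bartlett kernel lag-window long-run covariance of the rows of V
        V = V - V.mean(axis=0)
        N = V.shape[0]
        D = V.T @ V / N
        for h in range(1, bandwidth + 1):
            w = 1.0 - h / (bandwidth + 1.0)
            G = V[h:].T @ V[:-h] / N
            D += w * (G + G.T)
        return D

    # D_hat = bartlett_lrv(X * split_residuals(y, X, k_hat)[:, None], bandwidth)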

As in (2.3.21) and (2.3.22), we can define maximum norm based estimators for the time of change. We recall the definitions of ẑ_N(t) and z̃_N(t) from (4.1.32) and (4.1.33). Let
$$\hat k_N^{(1)}=\operatorname{sargmax}_{k\in\{1,\dots,N-1\}}\left\{\left(\frac{1}{k(N-k)}\right)^{\kappa}\max_{1\le j\le d}\bigl|\hat z_{N,j}(k/(N+1))\bigr|\right\},$$

and
$$\hat k_N^{(2)}=\operatorname{sargmax}_{k\in\{1,\dots,N-1\}}\left\{\left(\frac{1}{k(N-k)}\right)^{\kappa}\max_{1\le j\le d}\bigl|\tilde z_{N,j}(k/(N+1))\bigr|\right\},$$

where {ẑ_{N,j}(t), 0 ≤ t ≤ 1, 1 ≤ j ≤ d} and {z̃_{N,j}(t), 0 ≤ t ≤ 1, 1 ≤ j ≤ d} are defined in (4.1.32) and (4.1.33), respectively. These estimators satisfy

$$\bigl|\hat k_N^{(1)}-k^*\bigr|=O_P(1/\Delta_N^2)=o_P(N)\quad\text{and}\quad\bigl|\hat k_N^{(2)}-k^*\bigr|=O_P(1/\Delta_N^2)=o_P(N),$$

if the conditions of Theorem 4.2.3 are satisfied.
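As an illustration, a weighted maximum-norm estimator of this type can be computed as in the short Python sketch below, assuming the standardized CUSUM values ẑ_{N,j}(k/(N+1)) have already been evaluated; the function name is hypothetical. Here sargmax is interpreted as the smallest index attaining the maximum.

    import numpy as np

    def weighted_max_norm_changepoint(z_hat, kappa):
        # z_hat: (N-1) x d array whose k-th row holds z_hat(k/(N+1)), k = 1,...,N-1
        N = z_hat.shape[0] + 1
        k = np.arange(1, N)
        stat = (1.0 / (k * (N - k))) ** kappa * np.abs(z_hat).max(axis=1)
        return int(k[np.argmax(stat)])  # np.argmax returns the first (smallest) maximizer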

4.2.2 Estimation of Multiple Change Points

The methods we discussed can be extended to multiple changes. We assume that the observations are given by

$$y_i=\sum_{j=1}^{R+1}x_i^T\beta_j\,1\{k_{j-1}\le i<k_j\}+\varepsilon_i,\quad i\in\{1,\dots,N\}, \qquad (4.2.7)$$

where k_0 = 0 and k_{R+1} = N + 1. As such, the times of the changes are k_1, ..., k_R, and at time k_l the regression parameter changes from β_l to β_{l+1}. The alternative of R change points can be characterized by

$$H_A:\ \beta_1\ne\beta_2\ne\dots\ne\beta_{R+1}. \qquad (4.2.8)$$

When k_i = ⌊Nθ_i⌋, 1 ≤ i ≤ R, with 0 < θ_1 < θ_2 < ... < θ_R < 1 (Assumption 2.3.1) holds, it is required that

$$N^{1/2}\min_{1\le l\le R}\|\beta_{l+1}-\beta_l\|\to\infty$$

in order for the tests resulting from Theorems 4.1.2–4.1.5 to be consistent. Similarly, if

$$N^{1/2}\min_{1\le l\le R}\|\beta_{l+1}-\beta_l\|(\log\log N)^{-1/2}\to\infty,$$

then the standardized statistics discussed in Theorems 4.1.2–4.1.5 remain consistent.
The quasi-likelihood method of Sect. 4.1.1 can be adapted to test for exactly
R changes in the regression parameter, although this leads to a quite complicated
limiting distribution that is difficult to make use of in practice.
An alternative is to minimize the penalized sum of the squares of the residuals to
estimate R and .k1 < · · · < kR , or employ binary segmentation.

We define the sum of squares assuming that there are S changes at times 1 < r_1 < r_2 < ... < r_S < N:

$$M(r_1,r_2,\dots,r_S,S)=\sum_{l=1}^{S+1}\min_{\beta}\sum_{i=r_{l-1}+1}^{r_l}(y_i-x_i^T\beta)^2$$

(r_0 = 0, r_{S+1} = N). Since taking S = N would give M(r_1, r_2, ..., r_S, S) = 0, we add a penalty term P(N, S) that is a strictly increasing function of S, and instead minimize

$$M_{PEN}(r_1,r_2,\dots,r_S,S)=M(r_1,r_2,\dots,r_S,S)+\mathcal{P}(N,S).$$

Estimators of R and k_1 < ... < k_R may then be defined by

$$M_{PEN}(\hat k_1,\hat k_2,\dots,\hat k_{\hat R},\hat R)=\min_{r_1,r_2,\dots,r_S,S}M_{PEN}(r_1,r_2,\dots,r_S,S).$$

If r_l − r_{l−1} > d, then

$$\min_{\beta}\sum_{i=r_{l-1}+1}^{r_l}(y_i-x_i^T\beta)^2=\sum_{i=r_{l-1}+1}^{r_l}(y_i-x_i^T\hat\beta_l)^2,$$

where β̂_l is the least-squares estimator of the regression parameter computed from {y_i, x_i, r_{l−1} < i ≤ r_l}. We assume that the penalty function takes the form P(N, S) = g(S)m_N, where

$$g(s)\ \text{is a strictly increasing positive function},$$

$$m_N\to\infty,\quad\text{and}\quad m_N/(N\|\beta_{l+1}-\beta_l\|^2)\to0,\quad 1\le l\le R,$$

hold. We note that the penalty terms discussed in Sect. 2.1 can be used. Following the proof of Theorem 2.3.2, one can show that for all ε > 0,

$$\lim_{N\to\infty}P\left(\{\hat R=R\}\cap\left\{\max_{1\le l\le R}|\hat k_l-k_l|<\varepsilon N\right\}\right)=1. \qquad (4.2.9)$$
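One way to carry out this minimization in practice is dynamic programming over candidate segment endpoints. The Python sketch below is a minimal illustration under the simplifying choice g(S) = S, so that the penalty m_N is charged once per segment (which shifts the objective by a constant and leaves the minimizer unchanged); the function names are ours, and no attempt is made at computational efficiency.

    import numpy as np

    def segment_cost(y, X, lo, hi):
        # Sum of squared least-squares residuals on observations lo,...,hi-1
        b = np.linalg.lstsq(X[lo:hi], y[lo:hi], rcond=None)[0]
        r = y[lo:hi] - X[lo:hi] @ b
        return float(r @ r)

    def penalized_segmentation(y, X, m_N, d):
        # Minimize M(r_1,...,r_S,S) + S * m_N; segments must exceed d observations
        N = len(y)
        best = np.full(N + 1, np.inf)  # best[t]: optimal penalized cost of y[0:t]
        best[0] = 0.0
        prev = np.zeros(N + 1, dtype=int)
        for t in range(d + 1, N + 1):
            for s in range(0, t - d):
                c = best[s] + segment_cost(y, X, s, t) + m_N
                if c < best[t]:
                    best[t], prev[t] = c, s
        cps, t = [], N  # backtrack the estimated change points
        while t > 0:
            t = prev[t]
            if t > 0:
                cps.append(t)
        return sorted(cps)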

Binary segmentation, as described in Sect. 2.3.1, can also be used to estimate R and k_1 < ... < k_R. The test statistics described in Theorems 4.1.2–4.1.5 may be used as detectors with weight function w(t) = [t(1−t)]^κ for 0 < κ ≤ 1/2. The consistency of this approach can be established along the lines of Theorem 8.2.2. Using such preliminary change point estimators as defined above, the limiting distribution of refined estimators can be established as in Sect. 2.1, which is of use in producing confidence intervals for the change points. Similarly to Theorem 2.3.3,

we assume that we have preliminary change point estimators satisfying the follow-
ing assumption:
Assumption 4.2.3 The estimators R̂ and k̂_1, ..., k̂_{R̂} satisfy, for all ε > 0,

$$P\left(\{\hat R=R\}\cap\left\{\max_{i\in\{1,\dots,R\}}|\hat k_i-k_i|<\varepsilon N\right\}\right)\to1.$$

Using these estimators we define refined change point estimators as

$$\tilde k_l=\operatorname{sargmax}_{k\in\{\hat k_{l-1}+1,\dots,\hat k_{l+1}-1\}}\left\{\left(\frac{1}{(k-\hat k_{l-1})(\hat k_{l+1}-k)}\right)^{\kappa}\left\|\sum_{i=\hat k_{l-1}+1}^{k}x_i\bigl(y_i-x_i^T\hat\beta_l\bigr)\right\|\right\},$$

where β̂_l is the least squares estimator computed from (y_i, x_i), i ∈ {k̂_{l−1}, ..., k̂_{l+1}}. The size of the change at k_l is measured by

$$\Delta_{N,l}=\frac{\|A(\beta_{l+1}-\beta_l)\|^2}{\|AD^{1/2}(\beta_{l+1}-\beta_l)\|}.$$

We note that since D and A are non-singular matrices, Δ_{N,l} is proportional to ||β_{l+1} − β_l||. The proof of the following result can be derived from Theorem 4.2.3 similarly as Theorem 2.3.3 was derived from Theorem 2.2.1.
Theorem 4.2.4 We assume that H_A of (4.2.8), Assumptions 2.3.1, 4.1.1, 4.1.2, 4.1.4 and 4.2.3 are satisfied, {z_i = (x_i^T, ε_i)^T, i ∈ Z} is L^ν–decomposable for some ν > 4 and satisfies Assumption 1.3.2 with rate parameter ζ, and 0 ≤ κ < 1/2 − ζ. If for all l ∈ {1, ..., R}

$$\|\beta_{l+1}-\beta_l\|\to0\quad\text{and}\quad N\|\beta_{l+1}-\beta_l\|^2\to\infty,$$

then

$$\Delta_{N,1}^2(\tilde k_1-k_1),\ \Delta_{N,2}^2(\tilde k_2-k_2),\ \dots,\ \Delta_{N,R}^2(\tilde k_R-k_R)$$

are asymptotically independent, and for each 1 ≤ l ≤ R,

$$\Delta_{N,l}^2(\tilde k_l-k_l)\ \stackrel{\mathcal D}{\to}\ \bar\xi(\kappa),$$

where ξ̄(κ) is defined in (2.3.16).


The multiple change regression model of (4.2.7) also allows for the possibility of non-stationary errors as defined in Assumptions 3.3.1 and 3.3.2. As we have seen, when the model errors are of the form Ei = a(i/N)εi, Theorem 4.2.4 does not hold if a(x) has a discontinuity at θ_l. In adapting this result to the multiple change

scenario, let

$$\xi_l^*=\operatorname{argmax}\{W^*(t)-|t|\bar m_l(t)\}, \qquad (4.2.10)$$

where {W*(t), −∞ < t < ∞} and m̄_l(t) are defined in (3.3.19) and (2.3.15). The normalization also must be changed to reflect the non-stationarity of the errors. Let

$$\bar\Delta_{N,l}=\frac{\|A(\beta_{l+1}-\beta_l)\|^2}{\|A(a^2(\theta_l)D^*)^{1/2}(\beta_{l+1}-\beta_l)\|},$$

where D* is defined in (4.1.38). The following result may be established along the lines of Theorem 2.3.3 where its reliance on Theorem 2.2.1 is replaced by Theorem 4.2.4.

Theorem 4.2.5 We assume that H_A of (4.2.8), Assumptions 2.3.1, 3.3.1, 3.3.2, 4.1.1, 4.1.2, 4.1.4, and 4.2.3 are satisfied, {z_i = (x_i^T, ε_i)^T, i ∈ Z} is L^ν–decomposable for some ν > 4 and satisfies Assumption 1.3.2 with ζ = 1/ν, and 0 ≤ κ < 1/2 − 1/ν. If for all l ∈ {1, ..., R},

$$\|\beta_{l+1}-\beta_l\|\to0\quad\text{and}\quad N\|\beta_{l+1}-\beta_l\|^2\to\infty,$$

then

$$\bar\Delta_{N,1}^2(\tilde k_1-k_1),\ \dots,\ \bar\Delta_{N,R}^2(\tilde k_R-k_R)$$

are asymptotically independent, and for each l ∈ {1, ..., R},

$$\bar\Delta_{N,l}^2(\tilde k_l-k_l)\ \stackrel{\mathcal D}{\to}\ \xi_l^*,$$

where ξ*_l is defined in (4.2.10).

4.3 Polynomial Regression

When {z_i = (x_i^T, ε_i)^T, i ∈ Z} is L^ν–decomposable for some ν > 4, we have by the ergodic theorem that

$$\sum_{i=1}^{k}x_ix_i^T\approx kA, \qquad (4.3.1)$$

where A is defined in Assumption 4.1.1. An important example in which (4.3.1) does not hold is when fitting a curve to the responses y_i, i ∈ {1, ..., N}. In this case a polynomial regression model may be formulated as in (4.1.1), where

$$x_i=\left(1,\frac{i}{N},\left(\frac{i}{N}\right)^2,\dots,\left(\frac{i}{N}\right)^p\right)^T,\quad i\in\{1,\dots,N\}. \qquad (4.3.2)$$

To derive a test for H_0 of (4.1.2) against H_A of (4.1.3), we may again use a quasi–likelihood argument as in Sect. 4.1.1. Suppose {ε_i, i ≥ 1} are independent and identically distributed normal random variables with Eε_i = 0 and Eε_i² = σ². If the variance σ² is known, negative two multiplied by the logarithm of the likelihood ratio takes the form

$$\ell_N(k)=\frac{N}{\sigma^2}\left(\hat\sigma_{N,1}^2-[\hat\sigma_{k,1}^2+\hat\sigma_{k,2}^2]\right),$$
where σ̂²_{k,1} and σ̂²_{k,2} are defined in (4.1.5). We also recall the least squares estimators β̂_{k,1} and β̂_{k,2} defined in (4.1.6). Elementary algebra gives that, under H_0,

$$\hat\sigma_{N,1}^2-[\hat\sigma_{k,1}^2+\hat\sigma_{k,2}^2]=\frac{1}{N}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k,$$

where

$$C_{k,1}=\sum_{i=1}^{k}x_ix_i^T,\qquad C_{k,2}=\sum_{i=k+1}^{N}x_ix_i^T,$$

and

$$R_k=\sum_{i=1}^{k}x_i\bigl(y_i-x_i^T\hat\beta_{N,1}\bigr)=\sum_{i=1}^{k}x_i\varepsilon_i-C_{k,1}C_{N,1}^{-1}\sum_{i=1}^{N}x_i\varepsilon_i.$$

R_k is the partial sum of the weighted residuals of (4.1.26). We then use as evidence against H_0 large values of

$$T_N=\frac{1}{\sigma^2}\max_{p<k<N-p}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k. \qquad (4.3.3)$$
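A direct, if computationally naive, evaluation of T_N is sketched below in Python. This is an illustration under the assumption that σ² is known, as in the derivation above, and uses R_k formed from the full-sample residuals.

    import numpy as np

    def T_N(y, X, sigma2):
        # Maximally selected quadratic form (4.3.3), max over p < k < N - p
        N, p1 = X.shape  # p1 = p + 1 columns: 1, i/N, ..., (i/N)^p
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        R = np.cumsum(X * (y - X @ beta_hat)[:, None], axis=0)  # R[k-1] = R_k
        C_full = X.T @ X
        best = -np.inf
        for k in range(p1, N - p1):
            C1 = X[:k].T @ X[:k]
            C2 = C_full - C1
            Rk = R[k - 1]
            val = Rk @ np.linalg.solve(C1, C_full @ np.linalg.solve(C2, Rk))
            best = max(best, val)
        return best / sigma2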

Following instead the quasi–likelihood argument of Sect. 4.1.1 in which the variance σ² is unknown, the likelihood ratio is given in (4.1.4). We again reject H_0 for large values of

$$\hat T_N=\max_{p<k<N-p}(-2\log\Lambda_k),$$

where Λ_k is defined in (4.1.4). We have already investigated the properties of T̂_N in Sect. 4.1.1, but under assumptions that imply (4.3.1). In the present section we study T̂_N where the covariates are as in (4.3.2). In order to simplify the presentation, we assume that the model errors are serially uncorrelated.

Assumption 4.3.1 Eε_i = 0, Eε_i² = σ² and Eε_iε_j = 0, if i ≠ j.
Rather than assuming L^ν–decomposability, all that is required in this case is a weak approximation of the partial sums of the model errors, which we phrase as the following assumption.

Assumption 4.3.2 For each N there are two independent Wiener processes {W_{N,1}(t), 0 ≤ t ≤ N/2} and {W_{N,2}(t), N/2 ≤ t ≤ N} such that

$$\max_{1\le k\le N/2}k^{-\zeta}\left|\sum_{i=1}^{k}\varepsilon_i-\sigma W_{N,1}(k)\right|=O_P(1)$$

and

$$\max_{N/2\le k\le N}(N-k)^{-\zeta}\left|\sum_{i=k+1}^{N}\varepsilon_i-\sigma W_{N,2}(N-k)\right|=O_P(1)$$

with some ζ < 1/2.


The limits of the standardized sums C_{⌊Nt⌋,1}/N and C_{⌊Nt⌋,2}/N are the matrix-valued functions

$$C_1(t)=\left(\int_0^t x^{i+j}\,dx,\ 0\le i,j\le p\right)$$

and

$$C_2(t)=\left(\int_t^1 x^{i+j}\,dx,\ 0\le i,j\le p\right).$$

The sum of the weighted errors has a weak limit under Assumption 4.3.2 that may be expressed using the Gaussian process

$$\Gamma(t)=\left(\int_0^t dW(x),\int_0^t x\,dW(x),\dots,\int_0^t x^p\,dW(x)\right)^T,$$

where {W(s), 0 ≤ s ≤ 1} denotes a Wiener process.


Theorem 4.3.1 If H_0 of (4.1.2) and Assumptions 4.3.1 and 4.3.2 are satisfied, then

$$\frac{1}{\sigma^2}\max_{\lfloor N\delta\rfloor\le k\le N-\lfloor N\delta\rfloor}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k\ \stackrel{\mathcal D}{\to}\ \sup_{\delta<t<1-\delta}\|\Gamma_0(t)\|^2$$

for all 0 < δ < 1/2, where Γ_0(t) = Γ(t) − C_1(t)C_1^{-1}(1)Γ(1).
Proof Elementary calculations give that

$$\sup_{0\le t\le1}\left\|\frac{1}{N}C_{\lfloor Nt\rfloor,1}-C_1(t)\right\|=o(1), \qquad (4.3.4)$$

and

$$\sup_{0\le t\le1}\left\|\frac{1}{N}C_{\lfloor Nt\rfloor,2}-C_2(t)\right\|=o(1). \qquad (4.3.5)$$

Assumption 4.3.2 implies that

$$\frac{1}{N^{1/2}}\sum_{i=1}^{\lfloor Nt\rfloor}\varepsilon_i\ \xrightarrow{\mathcal D[0,1]}\ \sigma W(t), \qquad (4.3.6)$$

where {W(t), 0 ≤ t ≤ 1} is a Wiener process. Using integration by parts, as in Sect. 3.3, (4.3.6) yields

$$\left\{\frac{1}{\sigma}\int_0^t x^j\,d\!\left(\frac{1}{N^{1/2}}\sum_{i=1}^{\lfloor Nx\rfloor}\varepsilon_i\right),\ 0\le j\le p\right\}\ \xrightarrow{\mathcal D[0,1]^{p+1}}\ \left\{\int_0^t x^j\,dW(x),\ 0\le j\le p\right\}.$$

Now

$$\frac{1}{N^{1/2}}R_{\lfloor Nt\rfloor}\ \xrightarrow{\mathcal D[0,1]^{p+1}}\ \Gamma_0(t)$$

follows from (4.3.4) and (4.3.5). Using again (4.3.4) and (4.3.5) we obtain Theorem 4.3.1. ⨆

The statistics T_N and T̂_N are maximally selected log likelihood ratios, and as a result they are "standardized" in the sense that the expected values of R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k and −2 log Λ_k are constant. Hence it is expected that the maximally selected log likelihood ratios must have a limit distribution related to the Darling–Erdős results of Sect. 1.2.1. Let


$$a(p)=\log\left(\frac{2^{p/2}\Gamma(p/2)}{p}\right), \qquad (4.3.7)$$

where Γ(x) denotes the Gamma function.


Theorem 4.3.2 If H_0 of (4.1.2) and Assumptions 4.3.1 and 4.3.2 are satisfied, then

$$\lim_{N\to\infty}P\{T_N\le x+2\log\log N+(p+1)\log\log\log N-2a(p+1)\}=\exp\left(-2e^{-x/2}\right) \qquad (4.3.8)$$

and

$$\lim_{N\to\infty}P\{\hat T_N\le x+2\log\log N+(p+1)\log\log\log N-2a(p+1)\}=\exp\left(-2e^{-x/2}\right) \qquad (4.3.9)$$

for all x ∈ R, where a(x) is defined in (4.3.7).


Remark 4.3.1 The rate of convergence as a function of the sample size in such Darling–Erdős type limit theorems is often slow. One explanation for this arises from examining the proof of the result, in which two limits are successively computed. First, the partial sums of the model errors are approximated with Gaussian processes, and then an asymptotic result is obtained for the maximum of a Gaussian process, which gives rise to the extreme-value limit. Also, the normalizing sequences are somewhat arbitrary, and could be replaced by a host of asymptotically equivalent versions. Monte Carlo simulations suggest that better finite sample performance is achieved if we use

$$\lim_{N\to\infty}P\{T_N\le x+2\log\log h(N)+(p+1)\log\log\log h(N)-2a(p+1)\}=\exp\left(-2e^{-x/2}\right)$$

and

$$\lim_{N\to\infty}P\{\hat T_N\le x+2\log\log h(N)+(p+1)\log\log\log h(N)-2a(p+1)\}=\exp\left(-2e^{-x/2}\right),$$

where h(N) = N(log N)^γ, −∞ < γ < ∞. The parameter γ can be tuned to improve the finite sample approximation of the limit result in Theorem 4.3.2, as discussed in Aue et al. (2008).
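For instance, an approximate level-α critical value based on this modification can be obtained by inverting the Gumbel limit. The Python sketch below is a minimal illustration of the arithmetic only (γ = 0 recovers the normalization of Theorem 4.3.2); the function name is ours.

    import math

    def critical_value(N, p, alpha=0.05, gamma=0.0):
        # Solve exp(-2 * exp(-x/2)) = 1 - alpha for the Gumbel quantile x
        x = -2.0 * math.log(-0.5 * math.log(1.0 - alpha))
        llh = math.log(math.log(N * math.log(N) ** gamma))  # log log h(N)
        a = math.log(2 ** ((p + 1) / 2) * math.gamma((p + 1) / 2) / (p + 1))  # a(p+1) of (4.3.7)
        # Reject H0 when T_N exceeds this threshold
        return x + 2.0 * llh + (p + 1) * math.log(llh) - 2.0 * a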

Polynomial regression will be a special case of trending regression in Theorem 4.3.3, where the assumption that the errors are uncorrelated is removed.
Proof of Theorem 4.3.2 The proof is rather technical so we just outline the main steps. More details are given in Aue et al. (2008). We only consider (4.3.8) since the proof of (4.3.9) is nearly the same. For notational simplicity, we write β̂_N instead of β̂_{N,1}. First we note that C_{N,1} = C_{k,1} + C_{k,2}, and therefore

$$R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k=R_k^TC_{k,1}^{-1}R_k+R_k^TC_{k,2}^{-1}R_k.$$

It follows from Assumption 4.3.2, (4.3.4) and (4.3.5) that

$$\|\hat\beta_N-\beta_0\|=O_P\left(N^{-1/2}\right). \qquad (4.3.10)$$

We recall

$$S(k)=\sum_{i=1}^{k}x_i\varepsilon_i$$

from (4.1.18). We write

$$R_k=S(k)-C_{k,1}(\hat\beta_N-\beta_0). \qquad (4.3.11)$$

Putting together (4.3.4) and (4.3.10) we conclude

$$\max_{1\le k\le N}\left\|\frac{1}{k}C_{k,1}(\hat\beta_N-\beta_0)\right\|=O_P(1).$$

We showed in the proof of Theorem 4.3.1 that

$$\max_{1\le k\le N}\|S(k)\|=O_P\left(N^{1/2}\right),$$

and therefore by (4.3.5)

$$\max_{1\le k\le N/2}R_k^TC_{k,2}^{-1}R_k=O_P(1).$$

Let

$$b_1=b_1(N)=(\log N)^{\alpha}\quad\text{and}\quad b_2=b_2(N)=N(\log N)^{-\beta},$$

where α > 0 and β > 0. Repeating our previous arguments we get

$$\max_{1\le k\le b_2}R_k^TC_{k,2}^{-1}R_k=O_P\left(\frac{1}{N}\max_{1\le k\le b_2}\|S(k)\|^2\right)+O_P\left(\frac{1}{N}\max_{1\le k\le b_2}\|C_{k,1}(\hat\beta_N-\beta_0)\|^2\right)=O_P\left(\frac{1}{N}\max_{1\le k\le b_2}\|S(k)\|^2\right)+O_P\left((\log N)^{-\beta}\right).$$

The approximation in Assumption 4.3.2 implies

$$\max_{0\le j\le p}\ \max_{1\le k\le N}\frac{1}{k^{\zeta}}\left|\sum_{i=1}^{k}\left(\frac{i}{N}\right)^j\varepsilon_i-\sigma\int_0^k\left(\frac{x}{N}\right)^j dW_{N,1}(x)\right|=O_P(1). \qquad (4.3.12)$$

We note that

$$\left\{\int_0^k\left(\frac{x}{N}\right)^j dW_{N,1}(x),\ 1\le k\le N\right\}\ \stackrel{\mathcal D}{=}\ \left\{W\left(\int_0^k\left(\frac{x}{N}\right)^{2j}dx\right),\ 1\le k\le N\right\},$$

where {W(x), x ≥ 0} is a Wiener process. By the scale transformation of the Wiener process we have

$$\max_{1\le k\le b_2}\left|W\left(\int_0^k\left(\frac{x}{N}\right)^{2j}dx\right)\right|=O_P\left(b_2^{1/2}\right) \qquad (4.3.13)$$

and therefore

$$\max_{1\le k\le b_2}R_k^TC_{k,2}^{-1}R_k=O_P\left((\log N)^{-\beta}\right).$$

For the other term we expand, using the decomposition (4.3.11),

$$R_k^TC_{k,1}^{-1}R_k=\bigl(S^T(k)-(\hat\beta_N-\beta_0)^TC_{k,1}\bigr)C_{k,1}^{-1}\bigl(S(k)-C_{k,1}(\hat\beta_N-\beta_0)\bigr)=S^T(k)C_{k,1}^{-1}S(k)-2(\hat\beta_N-\beta_0)^TS(k)+(\hat\beta_N-\beta_0)^TC_{k,1}(\hat\beta_N-\beta_0).$$

Using again Assumption 4.3.2 and (4.3.10), one can verify that

$$\max_{1\le k\le b_1}R_k^TC_{k,1}^{-1}R_k=\max_{1\le k\le b_1}S^T(k)C_{k,1}^{-1}S(k)+O_P\left(N^{-1/2}(\log N)^{\alpha}\right),$$

$$\max_{b_1\le k\le b_2}R_k^TC_{k,1}^{-1}R_k=\max_{b_1\le k\le b_2}S^T(k)C_{k,1}^{-1}S(k)+O_P\left((\log N)^{-\beta}\right),$$

$$\max_{b_2\le k\le N/2}R_k^TC_{k,1}^{-1}R_k=\max_{b_2\le k\le N/2}S^T(k)C_{k,1}^{-1}S(k)+O_P(1).$$

We can approximate S(k) with Gaussian processes as in (4.3.12). According to (4.3.13), the approximating process for each coordinate is a time transformed Wiener process. Hence we can apply the Darling–Erdős result in Theorem A.2.3 and get

$$\max_{1\le k\le b_1}S^T(k)C_{k,1}^{-1}S(k)=O_P(\log\log\log N),$$

$$\max_{b_2\le k\le N/2}S^T(k)C_{k,1}^{-1}S(k)=O_P(\log\log\log N),$$

and

$$\frac{1}{\log\log N}\max_{b_1\le k\le b_2}S^T(k)C_{k,1}^{-1}S(k)\ \xrightarrow{P}\ c_1,$$

with some constant c_1 > 0. Our estimates yield


$$\lim_{N\to\infty}P\left\{\max_{1\le k\le N/2}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k=\max_{b_1\le k\le b_2}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k\right\}=1$$

and

$$\max_{b_1\le k\le b_2}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k=\max_{b_1\le k\le b_2}S^T(k)C_{k,1}^{-1}S(k)+O_P\left((\log N)^{-\beta}\right).$$

Thus

$$\lim_{N\to\infty}P\left\{\max_{1\le k\le N/2}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k=\max_{b_1\le k\le b_2}S^T(k)C_{k,1}^{-1}S(k)\right\}=1.$$

The approximation in Assumption 4.3.2 implies

$$\max_{b_1\le k\le b_2}\frac{1}{k^{1/2}}\left\|S(k)-\sigma\Gamma_{N,1}(k)\right\|=O_P\left(b_1^{-(1/2-\zeta)}\right)=o_P(1/\log N),$$

where

$$\Gamma_{N,1}(k)=\left(\int_0^k dW_{N,1}(x),\int_0^k\frac{x}{N}\,dW_{N,1}(x),\dots,\int_0^k\left(\frac{x}{N}\right)^p dW_{N,1}(x)\right)^T.$$

By the modulus of continuity of the Wiener process (see Theorem A.2.2) we have

$$\max_{b_1\le k\le b_2}\Gamma_{N,1}^T(k)C_{k,1}^{-1}\Gamma_{N,1}(k)=\sup_{b_1\le t\le b_2}\Gamma_{N,1}^T(t)\bigl(NC_1(t/N)\bigr)^{-1}\Gamma_{N,1}(t)+o_P\left((\log N)^{1/2-\alpha}\right).$$

By the scale transformation of the Wiener process we have

$$\sup_{b_1\le t\le b_2}\Gamma_{N,1}^T(t)\bigl(NC_1(t/N)\bigr)^{-1}\Gamma_{N,1}(t)\ \stackrel{\mathcal D}{=}\ \sup_{b_1/N\le s\le b_2/N}\Gamma^T(s)C_1^{-1}(s)\Gamma(s),$$

where

$$\Gamma(s)=\left(\int_0^s dW(x),\int_0^s x\,dW(x),\dots,\int_0^s x^p\,dW(x)\right)^T$$

and {W(x), x ≥ 0} is a Wiener process. Since the same argument can be used on max_{N/2≤k<N}, we get that

$$\max_{N/2\le k<N}R_k^TC_{k,1}^{-1}C_{N,1}C_{k,2}^{-1}R_k=\max_{b_1\le k\le b_2}\sigma^2\,\Gamma_{N,2}^T(k)C_{k,1}^{-1}\Gamma_{N,2}(k)+o_P(1/\log\log N).$$

Hence the limit distribution is the same as the limit distribution of the maximum of two independent copies of sup_{b_1/N≤s≤b_2/N} Γ^T(s)C_1^{-1}(s)Γ(s). Aue et al. (2009b) obtained the limit distribution using some results on Legendre polynomials and the general theory of the maximum of the squared norm of stationary Gaussian processes (Leadbetter et al., 1983; Piterbarg, 1996; Albin, 2001). ⨆

In the definition of x_i in (4.3.2) we can replace the polynomial with a smooth function,

$$x_i=h\left(\frac{i}{N}\right),\quad 1\le i\le N, \qquad (4.3.14)$$

satisfying the following four assumptions:

Assumption 4.3.3 h(u) is continuous on [0, 1].

Assumption 4.3.4 The matrices

$$\int_0^t h(u)h^T(u)\,du\quad\text{and}\quad\int_t^1 h(u)h^T(u)\,du$$

are non–singular for all 0 < t < 1.


The next assumption says that h(u) can be approximated with polynomials in a neighbourhood of 0 and 1:

Assumption 4.3.5 There are p linearly independent vectors a_{0,1}, a_{0,2}, ..., a_{0,p} and non–negative integers 0 ≤ γ_{0,1} < ... < γ_{0,p} such that

$$\limsup_{t\to0}\frac{1}{t^{\gamma_{0,p}+1}}\left\|h(t)-\sum_{i=1}^{p}a_{0,i}t^{\gamma_{0,i}}\right\|<\infty.$$

Moreover, there are p linearly independent vectors a_{1,1}, a_{1,2}, ..., a_{1,p} and non–negative integers 0 ≤ γ_{1,1} < γ_{1,2} < ... < γ_{1,p} such that

$$\limsup_{t\to1}\frac{1}{(1-t)^{\gamma_{1,p}+1}}\left\|h(t)-\sum_{i=1}^{p}a_{1,i}(1-t)^{\gamma_{1,i}}\right\|<\infty.$$

We introduce the differences between h(t) and the approximating polynomials:

$$h_0^*(t)=h(t)-\sum_{i=1}^{p}a_{0,i}t^{\gamma_{0,i}}$$

and

$$h_1^*(t)=h(t)-\sum_{i=1}^{p}a_{1,i}(1-t)^{\gamma_{1,i}}.$$

Assumption 4.3.6 There are C_0, C_1 and 0 < τ_0, τ_1 < 1 such that

$$\|h_0^*(t)-h_0^*(s)\|\le C_0t^{\gamma_{0,p}}|t-s|,\quad\text{for all }0\le s\le t\le\tau_0$$

and

$$\|h_1^*(t)-h_1^*(s)\|\le C_1(1-s)^{\gamma_{1,p}}|t-s|,\quad\text{for all }1-\tau_1\le s\le t\le1.$$

Assumptions 4.3.3–4.3.6 are rather mild, as it is only required that the regressors are continuous and linearly independent on (0, 1), and that they are smooth in a neighborhood of both 0 and 1. In particular, Assumptions 4.3.5 and 4.3.6 are used to determine the leading non–zero terms in the Taylor expansion of h(t) at 0 and 1. Polynomial regression satisfies Assumptions 4.3.3–4.3.6. We detail two important examples where these conditions hold. Let

$$h(t)=(h_1(t),h_2(t),\dots,h_p(t))^T.$$

Example 4.3.1 If the data exhibit cyclical or seasonal behavior, it may be appropriate to specify the regressors as trigonometric functions. Let p = 2q with q being a positive integer. One possibility is to pick Fourier frequencies 0 < ω_1 < ω_2 < ... < ω_q < 1/2 and define h_l(t) = cos(2πω_lt), 1 ≤ l ≤ q, and h_l(t) = sin(2πω_{l−q}t) for q < l ≤ p. This gives

$$x_i^T\beta_0=\sum_{l=1}^{q}\left[\beta_{0,l}\cos\left(\frac{2\pi\omega_li}{N}\right)+\beta_{0,q+l}\sin\left(\frac{2\pi\omega_li}{N}\right)\right].$$
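A design matrix of this form is straightforward to construct; the following short Python sketch (illustrative only) builds the N × 2q matrix of regressors x_i = h(i/N) for chosen frequencies:

    import numpy as np

    def fourier_design(N, omegas):
        # Regressors h(i/N): cosines at each frequency, then sines
        t = np.arange(1, N + 1) / N
        cos_cols = [np.cos(2 * np.pi * w * t) for w in omegas]
        sin_cols = [np.sin(2 * np.pi * w * t) for w in omegas]
        return np.column_stack(cos_cols + sin_cols)  # p = 2q columns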

Example 4.3.2 In many applications it is useful to model the mean behavior of a set of observations through exponential regression functions h_l(t) = exp(α_lt), where α_l ≠ α_{l'} for l ≠ l'.

We use again the quadratic form T_N of (4.3.3) and T̂_N, the maximally selected log likelihood ratio of (4.1.4). As in the polynomial case, these two statistics are asymptotically equivalent under the null hypothesis of no change in the regression.

Theorem 4.3.3 If H_0 of (4.1.2) and Assumptions 4.3.2 and 4.3.3–4.3.6 are satisfied, then

$$\lim_{N\to\infty}P\left\{\frac{E\varepsilon_1^2}{\sigma^2}\hat T_N\le x+2\log\log N+p\log\log\log N-2a(p)\right\}=\exp\left(-2e^{-x/2}\right)$$

and

$$\lim_{N\to\infty}P\left\{\frac{E\varepsilon_1^2}{\sigma^2}T_N\le x+2\log\log N+p\log\log\log N-2a(p)\right\}=\exp\left(-2e^{-x/2}\right)$$

for all x ∈ R, where a(p) is defined in (4.3.7).


Proof We follow the proof of Theorem 4.3.2. First we find the intervals where the maximum is reached, i.e. we need to compute the maximum on the sets [b_1, b_2] and [N − b_2, N − b_1] instead of [p, N − p]. In the second step we prove that on these two sets we can replace h(t) with the polynomial expansions provided in Assumption 4.3.5. The result then follows as in Theorem 4.3.2. For detailed arguments we refer to Aue et al. (2012). ⨆

4.4 Non–linear Regression and Generalized Method of Moments

The residual based methods for testing the stability of the parameters in a linear regression model introduced in Sect. 4.1 can be extended to general non–linear regression models of the form

$$y_i=h(x_i,\theta_i)+\varepsilon_i,\quad i\in\{1,\dots,N\},$$

where the θ_i's are d–dimensional parameter vectors. Under the null hypothesis

$$H_0^{(1)}:\ \theta_1=\dots=\theta_N,$$

and the alternative is formulated as

$$H_A^{(1)}:\ \text{there is a } k^*\in\{1,\dots,N\}\ \text{such that}\ \theta_1=\theta_2=\dots=\theta_{k^*}\ne\theta_{k^*+1}=\dots=\theta_N.$$

The unknown common parameter vector under H_0^{(1)} is denoted by θ_0. Using the least squares principle, the estimator θ̂_N of θ_0 is the minimizer of

$$L_N(\theta)=\sum_{t=1}^{N}(y_t-h(x_t,\theta))^2,$$

where the minimum is taken over a compact parameter space Θ. We make the following assumptions that are standard in non–linear least squares:

Assumption 4.4.1 The parameter space Θ is a compact subset of R^d, and θ_0 is an interior point of Θ.

Assumption 4.4.2 There is a function M : R^d → R so that

$$\sup_{\theta\in\Theta}Eh^2(x_0,\theta)<\infty,\quad\sup_{\theta\in\Theta}\left\|\frac{\partial^2}{\partial\theta^2}h(x_t,\theta)\right\|\le M(x_t),\quad EM^2(x_0)<\infty,$$

$$E\left\|\frac{\partial}{\partial\theta}h(x_0,\theta_0)\right\|^2<\infty,\quad\text{and}\quad E[h(x_0,\theta_0)-h(x_0,\theta)]^2>0,\ \text{if }\theta\ne\theta_0.$$

The conditions formulated in Assumption 4.4.2 are standard in deriving the asymptotic normality of parameter estimates in non–linear regression; see e.g. Wu (1981). Letting the residuals be defined by

$$\tilde\varepsilon_t=y_t-h(x_t,\hat\theta_N), \qquad (4.4.1)$$

we base a test of H_0^{(1)} on the CUSUM process of the residuals


$$\tilde Z_N(t)=\frac{1}{N^{1/2}}\left(\sum_{i=1}^{\lfloor Nt\rfloor}\tilde\varepsilon_i-\frac{\lfloor Nt\rfloor}{N}\sum_{i=1}^{N}\tilde\varepsilon_i\right). \qquad (4.4.2)$$
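In practice Z̃_N is easy to compute once θ̂_N has been obtained. The Python sketch below is one minimal way to do so, assuming the regression function h is supplied by the user and using scipy's generic non-linear least squares solver; it is an illustration, not the authors' implementation.

    import numpy as np
    from scipy.optimize import least_squares

    def residual_cusum(y, x, h, theta0):
        # Fit theta by non-linear least squares, then form (4.4.2)
        fit = least_squares(lambda th: y - h(x, th), theta0)
        eps = y - h(x, fit.x)
        N = len(y)
        partial = np.cumsum(eps)
        k = np.arange(1, N + 1)
        return (partial - k / N * partial[-1]) / np.sqrt(N)  # values at t = k/N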

Theorem 4.4.1 If H_0^{(1)} holds, Assumptions 4.4.1 and 4.4.2 are satisfied, and {z_i = (x_i^T, ε_i)^T, i ∈ Z} is L^ν–decomposable for some ν > 4, then

(i) If I(w, c) < ∞ for some c > 0, then

$$\sup_{0<t<1}\frac{|\tilde Z_N(Nt/(N+1))|}{w(t)}\ \xrightarrow{\mathcal D}\ \sup_{0<t<1}\frac{|B(t)|}{w(t)}.$$
4.4 Non–linear Regression and Generalized Method of Moments 191

(ii) If p ≥ 1 and

$$\int_0^1\frac{(t(1-t))^{p/2}}{w(t)}\,dt<\infty,$$

then

$$\int_0^1\frac{|\tilde Z_N(Nt/(N+1))|^p}{w(t)}\,dt\ \xrightarrow{\mathcal D}\ \int_0^1\frac{|B(t)|^p}{w(t)}\,dt,$$

where {B(t), 0 ≤ t ≤ 1} is a Brownian bridge.


These results may also be considered in the context of generalized method of
moments (GMM) estimation. The basic notation that we use here is inspired by
Chapter 21 of Zivot and Wang (2006). The GMM estimator .θ̂N is generally the
solution of a moment equation satisfying

E
N
. g(xt , θ̂N ) = 0,
t=1

where x_t contains both model and instrumental variables. Let m_t(θ) = g(x_t, θ), and we assume that the parameter θ ∈ Θ, where Θ is a compact subset of R. One could more generally consider θ ∈ R^d, d ≥ 1. Model stability in this case can be described as

$$H_0^{(2)}:\ 0=Em_1(\theta_0)=\dots=Em_N(\theta_0),\quad\text{for a unique }\theta_0\in\Theta,$$

and a single change point at time k* is characterized by

$$H_A^{(2)}:\ 0=Em_1(\theta)=\dots=Em_{k^*}(\theta)\ne Em_{k^*+1}(\theta)=\dots=Em_N(\theta)\quad\text{for some }\theta\in\Theta.$$
Under H_0^{(2)}, θ_0 denotes the true value of the parameter. We require that the instrumental variables are stationary and weakly dependent (for example L^ν–decomposable), which yields that m_t(θ) is a stationary sequence. The following assumptions are standard in GMM estimation; see for example the conditions of Theorem 3.1 of Hansen (1982).
Assumption 4.4.3 Em_0(θ) = 0 if and only if θ = θ_0.

Assumption 4.4.4 There is a function M : R → R so that sup_{θ∈Θ} E|m_0(θ)| < ∞ and sup_{θ∈Θ} |∂m_0(θ)/∂θ| ≤ M(x_0) with EM(x_0) < ∞, and E[∂m_0(θ)/∂θ] is different from zero in a neighborhood of θ_0.

Assumption 4.4.5 There are independent Wiener processes {W_{T,1}(x), 0 ≤ x ≤ T/2} and {W_{T,2}(x), 0 ≤ x ≤ T/2} such that

$$\max_{1\le x\le T/2}x^{-\kappa}\left|\sum_{s=1}^{\lfloor x\rfloor}m_s(\theta_0)-\sigma W_{T,1}(x)\right|=O_P(1) \qquad (4.4.3)$$

and

$$\max_{T/2\le x\le T-1}(T-x)^{-\kappa}\left|\sum_{s=\lfloor x\rfloor+1}^{T}m_s(\theta_0)-\sigma W_{T,2}(T-x)\right|=O_P(1) \qquad (4.4.4)$$

with some σ > 0 and 0 < κ < 1/2.


Ling (2007) establishes general conditions in this setting under which Assump-
tion 4.4.5 holds. In this case we define the CUSUM process .Z̃N using the variables
.mt (θ̂N ) by

$$\tilde Z_N(t)=\frac{1}{N^{1/2}}\left(\sum_{i=1}^{\lfloor Nt\rfloor}m_i(\hat\theta_N)-\frac{\lfloor Nt\rfloor}{N}\sum_{i=1}^{N}m_i(\hat\theta_N)\right)=\frac{1}{N^{1/2}}\sum_{i=1}^{\lfloor Nt\rfloor}m_i(\hat\theta_N). \qquad (4.4.5)$$
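Computationally, (4.4.5) only requires the fitted moment values; the second form applies because the moments sum to zero exactly at θ̂_N. A minimal Python sketch (the function name is ours):

    import numpy as np

    def gmm_cusum(m_hat):
        # m_hat[i] = m_{i+1}(theta_hat); normalized partial sums give (4.4.5)
        return np.cumsum(m_hat) / np.sqrt(len(m_hat))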

Theorem 4.4.2 Suppose that H_0^{(2)} holds, Assumptions 4.4.3–4.4.5 are satisfied, and that {x_i, i ∈ Z} is L^ν–decomposable for some ν > 4. Then the conclusions of Theorem 4.4.1 remain true with Z̃_N defined in (4.4.5).

We note that these results may also be extended to Darling–Erdős and Rényi–type functionals of Z̃_N. For the proofs of Theorems 4.4.1 and 4.4.2 we refer to Górecki et al. (2018) and Horváth et al. (2020).

4.5 Changes in the Distributions of the Innovations

So far we have investigated detecting and estimating possible changes in the parameters of regression models. The innovations may satisfy Assumptions 3.3.1 and 3.3.2, which allow changes in their distributions. Now we discuss a change point detection procedure to evaluate if the distribution of the errors changes at unknown times. As in Sect. 2.4, let F_1, ..., F_N denote the distribution functions of ε_1, ..., ε_N. We wish to test if the null hypothesis of (2.4.1) holds, in particular that F_1 = ... = F_N. We show that Theorem 2.4.3 remains true when the innovations ε_i

are replaced with their estimates ε̂_i of (4.1.26). Let

$$R_N(t,x)=N^{-1/2}\left(\sum_{i=1}^{\lfloor Nt\rfloor}1\{\hat\varepsilon_i\le x\}-\frac{\lfloor Nt\rfloor}{N}\sum_{i=1}^{N}1\{\hat\varepsilon_i\le x\}\right),\quad 0\le t\le1,\ -\infty<x<\infty.$$

Let F denote the common distribution function under H_0 of (2.4.1). We require in this case a somewhat stronger condition than Assumption 2.4.1:

Assumption 4.5.1 F is twice differentiable, and its derivative is bounded on the support of F.
Theorem 4.5.1 If H_0 of (4.1.2) and Assumptions 4.1.1, 4.1.2, and 4.5.1 hold, {z_i = (x_i^T, ε_i)^T, i ∈ Z} is L^ν–decomposable for some ν > 4, and α > 4 in Definition 3.1.1, then we can define a sequence of Gaussian processes {Γ_N(t, x), 0 ≤ t ≤ 1, −∞ < x < ∞} such that

$$\sup_{0\le t\le1}\ \sup_{-\infty<x<\infty}|R_N(t,x)-\Gamma_N(t,x)|=o_P(1),$$

EΓ_N(t, x) = 0 and

$$E\Gamma_N(t,x)\Gamma_N(t',x')=(\min(t,t')-tt')\sum_{l=-\infty}^{\infty}E\bigl[(1\{\varepsilon_0\le x\}-F(x))(1\{\varepsilon_l\le x'\}-F(x'))\bigr],$$

0 ≤ t, t' ≤ 1 and −∞ < x, x' < ∞.

Proof To simplify the argument, we assume that the dimension of the covariate vector is d = 2. Let

$$\bar R_N(t,x,u)=N^{-1/2}\left(\sum_{i=1}^{\lfloor Nt\rfloor}\left(1\{\varepsilon_i\le x+N^{-1/2}x_i^Tu\}-F(x+N^{-1/2}x_i^Tu)\right)-\sum_{i=1}^{\lfloor Nt\rfloor}\left(1\{\varepsilon_i\le x\}-F(x)\right)\right),$$

0 ≤ t ≤ 1, x ∈ R, u ∈ R².

We show that for any C > 0,

$$\sup_{0\le t\le1}\ \sup_{-\infty<x<\infty}\ \sup_{u\in\mathcal I(C)}|\bar R_N(t,x,u)|=o_P(1), \qquad (4.5.1)$$

where

$$\mathcal I(C)=[-C,C]^2.$$

The proof of (4.5.1) is based on the arguments in Davydov and Zitikis (2008). We cover I(C) with the smallest number of squares U(k, l) of side length 1/M, i.e. the corners of the squares are (k/M, l/M), ((k + 1)/M, l/M), (k/M, (l + 1)/M) and u(k, l) = ((k + 1)/M, (l + 1)/M). It follows from elementary calculation that

$$\left|1\{\varepsilon_i\le x+N^{-1/2}x_i^Tu\}-1\{\varepsilon_i\le x+N^{-1/2}x_i^Tu(k,l)\}\right|\le 1\{\varepsilon_i\le x+N^{-1/2}(x_i^Tu(k,l)+4\bar x_i/M)\}-1\{\varepsilon_i\le x+N^{-1/2}(x_i^Tu(k,l)-4\bar x_i/M)\},$$

for all u ∈ U(k, l), where x̄_i is the maximum norm of x_i. Thus we get

$$\left|1\{\varepsilon_i\le x+N^{-1/2}x_i^Tu\}-F(x+N^{-1/2}x_i^Tu)\right| \qquad (4.5.2)$$
$$\le\left|1\{\varepsilon_i\le x+N^{-1/2}(x_i^Tu(k,l)+4\bar x_i/M)\}-F(x+N^{-1/2}(x_i^Tu(k,l)+4\bar x_i/M))\right|$$
$$+\left|1\{\varepsilon_i\le x+N^{-1/2}(x_i^Tu(k,l)-4\bar x_i/M)\}-F(x+N^{-1/2}(x_i^Tu(k,l)-4\bar x_i/M))\right|$$
$$+\left|F(x+N^{-1/2}(x_i^Tu(k,l)+4\bar x_i/M))-F(x+N^{-1/2}x_i^Tu(k,l))\right|$$
$$+\left|F(x+N^{-1/2}x_i^Tu(k,l))-F(x+N^{-1/2}(x_i^Tu(k,l)-4\bar x_i/M))\right|.$$

It follows from Berkes et al. (2009b) that

$$\max_{0\le t\le1}\ \max_{-\infty<x<\infty}\ \max_{|k|,|l|\le MC+1}|\bar R_N(t,x,u(k,l))|=o_P(1). \qquad (4.5.3)$$

If f(x) = F'(x), then for any u ∈ U(k, l) and x ∈ R

$$\left|F(x+N^{-1/2}x_i^Tu)-F(x+N^{-1/2}(x_i^Tu(k,l)+4\bar x_i/M))-f(x)N^{-1/2}(x_i^T(u-u(k,l))-4\bar x_i/M)\right|$$
$$\le c_1\left(N^{-1/2}(x_i^T(u-u(k,l))-4\bar x_i/M)\right)^2\le c_2\frac{1}{NM^2}\bar x_i^4.$$
By the ergodic theorem we have

$$\frac{1}{N}\sum_{i=1}^{N}\bar x_i^4=O_P(1)\quad\text{and}\quad\frac{1}{N}\sum_{i=1}^{N}\|x_i\|=O_P(1).$$

Thus we conclude

$$\sup_{u\in U(k,l)}N^{-1/2}\sum_{i=1}^{N}\left|F(x+N^{-1/2}x_i^Tu)-F(x+N^{-1/2}(x_i^Tu(k,l)+4\bar x_i/M))\right|$$
$$\le f(x)\frac{1}{N}\sum_{i=1}^{N}\left|x_i^T(u-u(k,l))-4\bar x_i/M\right|+\frac{c_3}{N^{3/2}M^2}\sum_{i=1}^{N}\bar x_i^4\le\frac{c_4}{M}\left(\frac{1}{N}\sum_{i=1}^{N}(\bar x_i^4+1)\right). \qquad (4.5.4)$$

We can use the proof of (4.5.4) to get similar estimates for the increments of F(x + x_i^Tu). Now (4.5.1) follows from (4.5.2)–(4.5.4). Since (4.5.1) holds for all C, (4.1.29) implies

$$\sup_{0\le t\le1}\ \sup_{-\infty<x<\infty}|R_N(t,x)-R_N^*(t,x)|=o_P(1),$$

where

$$R_N^*(t,x)=N^{-1/2}\left(\sum_{i=1}^{\lfloor Nt\rfloor}1\{\varepsilon_i\le x\}-\frac{\lfloor Nt\rfloor}{N}\sum_{i=1}^{N}1\{\varepsilon_i\le x\}\right),\quad 0\le t\le1,\ -\infty<x<\infty.$$

The weak approximation of R*_N(t, x) is established in Theorem A.1.4 in the Appendix. ⨆

The limit in Theorem 4.5.1 is not pivotal even in the case of independent ε_i's. However, with the quantile transformation we can find an asymptotically pivotal limit when the innovations are independent. Let

$$\hat F_N(x)=\frac{1}{N}\sum_{i=1}^{N}1\{\hat\varepsilon_i\le x\}$$

be the empirical distribution function of the residuals ε̂_1, ..., ε̂_N and let F̂_N^{-1}(u) denote the quantile function of the residuals. Now we compute Q̃_N(t, u) of Sect. 2.4 from the residuals:

$$\tilde Q_N(t,u)=N^{-1/2}\left(\sum_{i=1}^{\lfloor(N+1)t\rfloor}1\{\hat\varepsilon_i\le\hat F_N^{-1}(u)\}-\frac{\lfloor(N+1)t\rfloor}{N}\sum_{i=1}^{N}1\{\hat\varepsilon_i\le\hat F_N^{-1}(u)\}\right), \qquad (4.5.5)$$

0 < t, u < 1. The proof of Theorem 4.5.1 shows that Theorem 2.4.3 remains true if Q̃_N(t, u) is defined by (4.5.5).
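For completeness, Q̃_N can be evaluated on a grid of (t, u) values as in the Python sketch below. Here F̂_N^{-1} is taken as the left-continuous empirical quantile; other quantile conventions may differ from this choice at finitely many points, and the function name is ours.

    import numpy as np

    def Q_tilde(eps_hat, t_grid, u_grid):
        # Sequential empirical process (4.5.5) computed from residuals eps_hat
        N = len(eps_hat)
        srt = np.sort(eps_hat)
        Q = np.empty((len(t_grid), len(u_grid)))
        for b, u in enumerate(u_grid):
            q = srt[max(int(np.ceil(N * u)) - 1, 0)]  # empirical quantile F_N^{-1}(u)
            ind = eps_hat <= q
            total = ind.sum()
            csum = np.cumsum(ind)
            for a, t in enumerate(t_grid):
                k = int(np.floor((N + 1) * t))
                head = csum[k - 1] if k >= 1 else 0.0
                Q[a, b] = (head - k / N * total) / np.sqrt(N)
        return Q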

If the innovations {ε_i, i ∈ Z} are independent and identically distributed random variables, then the weak limit of {Q̃_N(t, u), 0 ≤ t, u ≤ 1} is the Brownian pillow. We refer to Sect. 2.4 and Example 2.5.3 for a brief discussion of the distributions of functionals of the Brownian pillow.

4.6 Data Examples

Example 4.6.1 (Environmental Kuznets Curve in the US) In this example we consider a change point analysis of the environmental Kuznets curve (EKC) for the United States of America (U.S.). Kuznets (1955) identified an inverted U–shaped relationship between economic growth and income inequality in countries. Grossman and Krueger (1991) later demonstrated that Kuznets's inverted U–shaped association also appears between economic growth and environmental deterioration. An explanation for this is that earlier stages of economic growth often come with environmental degradation due to deforestation, air pollution, and other sources. As a nation's economy develops though, the pace of environmental degradation begins to decrease, eventually even improving, due to the adoption of cleaner technologies. For recent studies in this vein, we refer to Churchill et al. (2018) and Shahbaz and Sinha (2019).

In this application, we are interested in detecting and estimating change points in the EKC in the U.S. We used (log–differenced) CO2 emission per capita as an environmental deterioration indicator, which we modelled in terms of the log–differenced GDP per capita, and its square GDP². The sample covers the period 1800–2018 with annual frequency, providing N = 219 observations in total. This dataset is publicly available at ourworldindata.org. The linear model we consider is

$$(CO_2)_i=\beta_0+\beta_1(GDP)_i+\beta_2(GDP)_i^2+\varepsilon_i,\quad i\in\{1,\dots,218\}. \qquad (4.6.1)$$

Fig. 4.1 Scatter plots of the estimated model based on the three sub–samples (1800–1878, 1879–1977, 1978–2018) obtained by the segmentation. The left sub–plot shows the scatter of (GDP) against (CO2), and the right sub–plot shows the scatter of (GDP)² against (CO2)

Here we applied tests for the stability of the model parameters in (4.6.1) based on the maximum of the standardized quadratic form

$$V_N=a(\log N)\max_{3<k<N-3}\left[\frac{N}{k(N-k)}\left(\sum_{i=1}^{k}x_i\hat\varepsilon_i\right)^T\hat D^{-1}\left(\sum_{i=1}^{k}x_i\hat\varepsilon_i\right)\right]^{1/2}-b_3(\log N).$$

The matrix D̂ was estimated using a Bartlett kernel with the bandwidth selected using the method of Andrews (1991). By Theorem 4.1.3, we expect that under stability of the parameters V_N approximately follows a Gumbel law. Using binary segmentation with the threshold determined as the 95% quantile of the approximate null distribution of V_N, two change points were detected: the years 1878 and 1977.
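The binary segmentation step used here can be organized generically. The Python sketch below (with hypothetical function names, purely for illustration) recursively applies any detector that returns a test statistic together with an estimated change point:

    def binary_segmentation(detector, threshold, lo, hi, min_len):
        # detector(lo, hi) -> (statistic, change point index) on observations lo,...,hi-1
        if hi - lo < 2 * min_len:
            return []
        stat, k = detector(lo, hi)
        if stat <= threshold:
            return []
        return (binary_segmentation(detector, threshold, lo, k, min_len)
                + [k]
                + binary_segmentation(detector, threshold, k, hi, min_len))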
Figure 4.1 displays the scatter plots of the model in the three sub–samples determined by these change point estimates. We might think of these change points as splitting historic U.S. economic development into three phases: (1) an early growth phase, from 1800 to 1878, with the coefficients β̂_1 = 0.92 and β̂_2 = 3.81; (2) a middle growth phase, from 1879 to 1977, with the coefficients β̂_1 = 0.96 and β̂_2 = −1.40; (3) a late growth phase, from 1978 to 2018, with the coefficients β̂_1 = 1.24 and β̂_2 = −6.16. We notice that while the coefficient corresponding to GDP is always near one, the parameter β_2 moves from positive to negative, which appears to support the inverted U–shaped curve described in the EKC theory.
Example 4.6.2 (COVID-19 Confirmed Cases and Deaths in the U.K.) In this
example we consider a change point analysis of the linear relationship between
COVID-19 deaths and confirmed cases. Change point methods have been applied
frequently to COVID-19 data in order to evaluate the effects of public health
measures and changing environmental conditions on how the pandemic progressed;
see e.g. Jiang et al. (2023). The time series of the number of confirmed cases

and deaths due to COVID-19 was collected from the GOV.UK website https://coronavirus.data.gov.uk/details/cases, covering the period from March 11, 2020 to November 4, 2021 (N = 603). We considered two series: y_i, the log differenced deaths due to COVID-19 in the UK, and x_i, the log differenced confirmed cases of COVID-19 in the UK. We expected a positive correlation between confirmed cases and future deaths. By calculating the cross–correlation function between the log–differenced confirmed cases and deaths series, we observed the strongest correlation between changes in deaths and confirmed cases at a lag of 14 days. Thus, we regress the log–differenced deaths on the lagged log–differenced confirmed cases, using the linear model

$$y_i=\beta_0+\beta_1x_{i-14}+\varepsilon_i,\quad i\in\{1,\dots,N\}.$$

In order to evaluate the stability of the model parameters, we considered the CUSUM process ẑ_N(t) = D̂^{-1/2}Ẑ_N((N+1)t/N) defined in (4.1.32), where D̂ was estimated using a Bartlett kernel with bandwidth selected according to the method of Andrews (1991). With ẑ_N(t) = (ẑ_{N,1}(t), ..., ẑ_{N,d}(t))^T, we considered the test statistic

$$Q_N=a(\log N)\max_{1\le j\le2}\ \sup_{1/N<t<1-1/N}\frac{|\hat z_{N,j}(t)|}{w(t)}-b(\log N).$$

According to Theorem 4.1.4, this statistic approximately follows a Gumbel law under parameter stability. We applied binary segmentation based on this statistic with the threshold taken to be the 95% quantile of the distribution in Theorem 4.1.4(ii). This led to a three change point model, with change point estimates on 28 April 2020, 20 January 2021, and 7 July 2021.
Fig. 4.2 Scatter plots of the estimated model based on the four sub–samples

Fig. 4.3 Detected breaks with the shaded areas indicating the first, second and third national lockdown phases; the panel shows the 14-day lagged confirmed cases and deaths series, with the detected breaks and realised vaccination rates of 30%, 50% and 70% marked

Using these change point estimates, we segmented the whole sample into four sub–samples. Figure 4.2 shows the scatter plot of log–differenced deaths versus lagged log–differenced confirmed cases in the four sub–samples. The results show
that there was, conspicuously, a negative relationship between confirmed cases and deaths in the first sub–sample. This might be attributed to the lack of COVID-19 tests at the beginning of the pandemic. A relatively strong positive relationship between confirmed cases and deaths emerges in the second and third sub–samples, while the positive linear relation weakens in the fourth sub–sample. This might be explained by the different seasons spanning each period, the national lockdown policy in the UK, and the administration of vaccines. The UK instituted three national lockdowns: the first phase was 26 March to 16 June 2020, the second phase was 31 October to 2 December 2020, and the third phase was 1 January to 12 April 2021. Figure 4.3 displays the raw data of confirmed cases and deaths with the detected breaks as they coincide with the dates of lockdowns and national vaccination rates. We found that during the first and third national lockdowns, a positive relationship between confirmed deaths and lagged cases was maintained, while after high vaccination rates were achieved following June 2021, the relationship became weaker.
Example 4.6.3 (Changing Trends in Global Near–Surface Temperature) As an application of Theorem 4.3.2, Aue et al. (2009b) analyzed the average global near–surface temperatures collected in the benchmark data set HadCRUT3 (see Brohan et al., 2006), which is frequently used in climatology to study the impact of global warming. The data extends earlier versions compiled by Jones (1994) and Jones and Moberg (2003). HadCRUT3 has been updated making use of additional observations and advances in the marine component of the data set, blending the measurements of over 4000 land and marine stations located around the globe. The time series is commonly used to illustrate the increase in global mean temperatures since the 1850s. For each year from 1850 until 2008, anomalies in the average
temperatures are reported in degrees Celsius (◦C), centered using the baseline temperature calculated as the average from 1961 to 1990. A time series plot of HadCRUT3 is shown in Fig. 4.4.

Fig. 4.4 The global mean temperature anomalies (◦C) with respect to a 1961–1990 baseline (dotted line), and the piecewise quadratic polynomial fit (solid line). The four pieces are obtained using binary segmentation. The three changes in 1874, 1922 and 1944 are indicated as dashed vertical lines

Aue et al. (2009b) conducted preliminary
model fitting with polynomials of up to seventh order. Inspecting the corresponding model residuals, they found that none of these polynomial fits provides an acceptable description of the whole data set of 159 observations. Hence instead piecewise quadratic polynomials were fit to the series using binary segmentation obtained from repeated application of the test statistic T̂_N in Theorem 4.3.3, with the threshold of binary segmentation determined as the 95% quantile of the corresponding null limiting distribution. This led to three change point estimates, the years 1874, 1922 and 1944, and four sub–samples each with an approximately quadratic trend. The resulting estimated trend is also shown in Fig. 4.4, with the estimated break points shown as vertical lines. This trend is mostly in line with the conclusions drawn in Brohan et al. (2006), who argue that there have been two major periods of (unusual and man–made) increases in the average global near-surface temperatures: the first approximately from 1930 to 1940, and the second in the 1970s.
4.7 Exercises

Exercise 4.7.1 Let {x_i, i ∈ Z} be independent and identically distributed random vectors, {ε_i, i ∈ Z} be independent and identically distributed random variables with Eε_0 = 0, 0 < Eε_0² = σ², E|ε_i|^ν < ∞ and E||x_0||^ν < ∞ with some ν > 2. Assume the two sequences are independent. In the linear model y_i = x_i^Tβ_i + ε_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = ... = β_N. The residuals are ε̂_{i,0,N} = y_i − x_i^Tβ̂_{0,N}, 1 ≤ i ≤ N, where β̂_{0,N} is the least squares estimator computed from the full sample. We use the statistic

$$T_N=\frac{1}{r_N^{1/2}}\max_{1\le k<N}\left|\sum_{i=1}^{k}\hat\varepsilon_{i,0,N}-\frac{k}{N}\sum_{i=1}^{N}\hat\varepsilon_{i,0,N}\right|, \qquad (4.7.1)$$

where

$$r_N=\min_{1\le k\le N}\left\{\sum_{i=1}^{k}\left(\hat\varepsilon_{i,0,k}-\frac{1}{k}\sum_{j=1}^{k}\hat\varepsilon_{j,0,k}\right)^2+\sum_{i=k+1}^{N}\left(\hat\varepsilon_{i,k,N}-\frac{1}{N-k}\sum_{j=k+1}^{N}\hat\varepsilon_{j,k,N}\right)^2\right\},$$

ε̂_{i,k,j} = y_i − x_i^Tβ̂_{k,j}, k + 1 ≤ i ≤ j, and β̂_{k,j} is the least squares estimator computed from {y_i, x_i, k + 1 ≤ i ≤ j}. Compute the limit distribution of T_N under the null hypothesis.
Exercise 4.7.2 Let {x_i, i ∈ Z} be independent and identically distributed random vectors, {ε_i, i ∈ Z} be independent and identically distributed random variables with Eε_0 = 0, 0 < Eε_0² = σ², E|ε_i|^ν < ∞ and E||x_0||^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_i^Tβ_i + ε_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = β_2 = ... = β_N against the alternative

$$y_i=\begin{cases}x_i^T\beta_1+\varepsilon_i, & 1\le i\le k_1,\\ x_i^T\beta_{k_1+1}+\varepsilon_i, & k_1+1\le i\le N.\end{cases}$$

We use T_N of (4.7.1). Show that T_N → ∞ in probability under the alternative, if k_1 = ⌊Nθ_1⌋, 0 < θ_1 < 1, β_1 = β_1(N), β_{k_1+1} = β_{k_1+1}(N) and

$$N^{1/2}\|\beta_1-\beta_{k_1+1}\|\to\infty.$$
Exercise 4.7.3 Let {x_i, i ∈ Z} be independent and identically distributed random vectors, {ε_i, i ∈ Z} be independent and identically distributed random variables with Eε_0 = 0, 0 < Eε_0² = σ², E|ε_i|^ν < ∞ and E||x_0||^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_i^Tβ_i + ε_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = ... = β_N. The residuals are ε̂_{i,0,N} = y_i − x_i^Tβ̂_{0,N}, 1 ≤ i ≤ N, where β̂_{0,N} is the least squares estimator computed from the full sample. We use the statistic

$$T_N=\frac{1}{r_N^{1/2}}\max_{1\le k<N}\left|\sum_{i=1}^{k}\hat\varepsilon_{i,0,N}-\frac{k}{N}\sum_{i=1}^{N}\hat\varepsilon_{i,0,N}\right|, \qquad (4.7.2)$$

where

$$r_N=\min_{1\le k<j\le N}\left\{\sum_{i=1}^{k}\left(\hat\varepsilon_{i,0,k}-\frac{1}{k}\sum_{l=1}^{k}\hat\varepsilon_{l,0,k}\right)^2+\sum_{i=k+1}^{j}\left(\hat\varepsilon_{i,k,j}-\frac{1}{j-k}\sum_{l=k+1}^{j}\hat\varepsilon_{l,k,j}\right)^2+\sum_{i=j+1}^{N}\left(\hat\varepsilon_{i,j,N}-\frac{1}{N-j}\sum_{l=j+1}^{N}\hat\varepsilon_{l,j,N}\right)^2\right\},$$

ε̂_{i,k,j} = y_i − x_i^Tβ̂_{k,j}, k + 1 ≤ i ≤ j, and β̂_{k,j} is the least squares estimator computed from {y_i, x_i, k + 1 ≤ i ≤ j}. Compute the limit distribution of T_N under the null hypothesis.
Exercise 4.7.4 Let {x_i, i ∈ Z} be independent and identically distributed random vectors, {ε_i, i ∈ Z} be independent and identically distributed random variables with Eε_0 = 0, 0 < Eε_0² = σ², E|ε_i|^ν < ∞ and E||x_0||^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_i^Tβ_i + ε_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = β_2 = ... = β_N against the alternative

$$y_i=\begin{cases}x_i^T\beta_1+\varepsilon_i, & 1\le i\le k_1,\\ x_i^T\beta_{k_1+1}+\varepsilon_i, & k_1+1\le i\le k_2,\\ x_i^T\beta_{k_2+1}+\varepsilon_i, & k_2+1\le i\le N.\end{cases}$$

We use T_N of (4.7.2). Find conditions that imply that T_N → ∞ in probability under the alternative.
Exercise 4.7.5 Let {x_i, i ∈ Z} be independent and identically distributed random vectors, {ε_i, i ∈ Z} be independent and identically distributed random variables with Eε_0 = 0, 0 < Eε_0² = σ², E|ε_i|^ν < ∞ and E||x_0||^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_i^Tβ_i + ε_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = β_2 = ... = β_N. We use the statistic

$$T_N=N^{-3/2}\max_{1\le k<j<N}\left(k(j-k)\left\|\hat\beta_{0,k}-\hat\beta_{k,j}\right\|+(j-k)(N-j)\left\|\hat\beta_{k,j}-\hat\beta_{j,N}\right\|\right), \qquad (4.7.3)$$

where β̂_{k,j} is the least squares estimator computed from {y_i, x_i, k + 1 ≤ i ≤ j}. Compute the limit distribution of T_N under the null hypothesis.
Exercise 4.7.6 Let {x_i, i ∈ Z} be independent and identically distributed random vectors, {ε_i, i ∈ Z} be independent and identically distributed random variables with Eε_0 = 0, 0 < Eε_0² = σ², E|ε_i|^ν < ∞ and E||x_0||^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_i^Tβ_i + ε_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = β_2 = ... = β_N against the alternative

$$y_i=\begin{cases}x_i^T\beta_1+\varepsilon_i, & 1\le i\le k_1,\\ x_i^T\beta_{k_1+1}+\varepsilon_i, & k_1+1\le i\le k_2,\\ x_i^T\beta_{k_2+1}+\varepsilon_i, & k_2+1\le i\le N.\end{cases}$$

We use T_N of (4.7.3). Show that T_N → ∞ in probability if β_1 = β_1(N), β_{k_1+1} = β_{k_1+1}(N), β_{k_2+1} = β_{k_2+1}(N) and

$$N^{1/2}\min\left(\|\beta_{k_1+1}-\beta_1\|,\|\beta_{k_2+1}-\beta_{k_1+1}\|\right)\to\infty.$$
Exercise 4.7.7 Let {x_i, i ∈ Z} be independent and identically distributed random variables, {η_i, i ∈ Z} be independent and identically distributed random variables with Eη_0 = 0, 0 < Eη_0² = σ², E|η_i|^ν < ∞ and E|x_0|^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_iβ_i + a(i/N)η_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = β_2 = ... = β_N, where a(u), 0 ≤ u ≤ 1, is a Riemann integrable function. We use the statistic

$$T_N=\frac{1}{N^{3/2}}\max_{1\le k<N}k(N-k)\left|\hat\beta_{0,k}-\hat\beta_{k,N}\right|, \qquad (4.7.4)$$

where β̂_{k,j} is the least squares estimator computed from {y_i, x_i, k + 1 ≤ i ≤ j}. Compute the limit distribution of T_N under the null hypothesis.
Exercise 4.7.8 Let {x_i, i ∈ Z} be independent and identically distributed random variables, {η_i, i ∈ Z} be independent and identically distributed random variables with Eη_0 = 0, 0 < Eη_0² = σ², E|η_i|^ν < ∞ and E|x_0|^ν < ∞ with some ν > 2. The two sequences are independent. In the linear model y_i = x_iβ_i + a(i/N)η_i, 1 ≤ i ≤ N, we wish to test H_0: β_1 = β_2 = ... = β_N against the alternative

$$y_i=\begin{cases}x_i\beta_1+a(i/N)\eta_i, & 1\le i\le k_1,\\ x_i\beta_{k_1+1}+a(i/N)\eta_i, & k_1+1\le i\le N,\end{cases}$$

where a(u), 0 ≤ u ≤ 1, is a Riemann integrable function. We use the statistic T_N of (4.7.4). Show that T_N → ∞ in probability, if k_1 = ⌊Nθ_1⌋, 0 < θ_1 < 1, β_1 = β_1(N), β_{k_1+1} = β_{k_1+1}(N) and

$$N^{1/2}|\beta_1-\beta_{k_1+1}|\to\infty.$$

4.8 Bibliographic Notes and Remarks

Testing for change points in linear models appears to have been initiated in Quandt
(1958) and Quandt (1960) who suggested maximally selected statistics and provided
practical advice how to obtain critical values. Gombay and Horváth (1994), Horváth
(1995) and Horváth and Shao (1995) obtained the limit distributions of some of
the test statistics proposed by Quandt (1958), Quandt (1960) including maximally
selected .F –statistics and the likelihood ratio. McCabe and Harrison (1980) also
contributed to this literature and advise the use of ordinary least squares residuals
rather than recursive cumulative sum control chart (CUSUM)-type tests. Later
McCabe (1988), using a multiple decision theory approach, shows that the CUSUM
test is optimal in a decision theoretic sense for structural stability in scale and
variance models, and also that the CUSUM-of-squares test is similarly optimal for
structural stability in variance of linear regression models. Turning to estimation of
the time of change, Hušková (1996) gave large sample approximations for
the estimator of the time of change assuming that we have exactly one change in
the regression coefficients during the observation period. The serial independence
of the error terms is assumed in these early articles. Andrews (1993) provides a
general methodology to test for the stability of random systems from an economic
viewpoint. Ghysels et al. (1997), Bai (1999), Bai and Perron (1998) and Hall et al.
(2012) followed the suggestions of Andrews (1993), and they also used maximally
selected statistics, but the maxima were not computed for all observations points,
and a fraction of early and late observations are trimmed. Horváth et al. (2017b)
derived the limit distribution of a maximally selected test which is derived under the
assumption that there are exactly R changes in the parameters. Their statistic is a
maximally selected weighted likelihood ratio.
Bai (1999) used the likelihood ratio test in linear models. He derived the limit distribution of max_{⌊Nδ⌋≤k≤N−⌊Nδ⌋}(−2 log Λ_k), which follows from (4.1.24) if 0 < δ < 1/2. The second part of Theorem 4.1.2 shows that for δ = 0 we
obtain a Darling–Erdős type result. Bai (1995) derives the limit distribution of the
estimator for the time of change along the lines of Theorem 4.2.3 with .κ = 1/2, but
under stricter conditions on .ΔN , the size of the change. He also points out that the
estimator is related to the sup–Wald–type statistic. Bai (1999) derived the likelihood
ratio assuming that multiple changes can occur in the regression parameters. Bai
and Perron (1998) uses least squares to detect multiple changes and obtains the
limit distributions of the estimators for the time of change, and Bai and Perron
(2003) investigates computational issues of change point tests. Bai (1995) develops
an asymptotic theory for least absolute deviation estimation of a shift in linear
regressions. Rates of convergence and asymptotic distributions for the estimated
regression parameters and the estimated shift point are also derived. One of the
examples in Horváth et al. (2022) is the heavily weighted CUSUM process of the
residuals. They also derived tests based on the comparison of the estimators when
the change occurs late or early in Horváth et al. (2022).

Nyblom (1989) derives the locally best invariant test as a Lagrange multiplier test and shows that it is a quadratic form of the sums of the weighted residuals. Hansen (1997) provides a method to compute critical values for maximally selected but heavily truncated standardised statistics, as in Theorems 4.1.5(ii)–4.1.7(ii), but with the maximum taken over ⌊Nα⌋ ≤ k ≤ ⌊Nβ⌋ with some 0 < α < β < 1. Perron et al. (2020) extended the likelihood method of Csörgő and Horváth (1997) for detecting simultaneous changes in the regression parameter and the variance to dependent observations. Hall et al. (2015) uses least squares with a penalty term to fit a change
point model to the data. Kurozumi and Tuvaandorj (2011) considers the issue of
selecting the number of regressors and the number of structural breaks in multivari-
ate regression models in the possible presence of multiple structural changes. They
develop a modified Akaike information criterion, a modified Mallows’ criterion, and
a modified Bayesian information criterion. Lin and Teräsvirta (1994) assumes that
instead of a single jump the regression parameter changes according to a continuous
function after an unknown time.
Theorems 4.1.3 and 4.1.4 are taken from Horváth et al. (2023a) who also
considered the case when the errors are heteroscedastic.
Kulperger (1985) investigates the asymptotic properties of the CUSUM process
of the residuals in polynomial regression. He points out these are different from
the case when (4.3.1) holds. Theorem 4.3.1 is due to Hansen (2000). Albin and
Jarušková (2003) provides test to find changes in linear trends, and they proved
Theorem 4.3.2 when .p = 1. Our proofs of the results in Sect. 4.3 are based on Aue
et al. (2008), Aue et al. (2009b), and Aue et al. (2012). Aue et al. (2008), Aue et al.
(2012) uses the maximally selected likelihood ratio method to test for stability of the
parameter against exactly one change. However, they also showed that the derived
tests are consistent against several changes under the alternative, and discuss the
applicability of the limit results in case of small and moderate sample sizes.
Neumeyer and Keilegom (2009) use nonparametric kernel estimators to check
the stability of the innovations.
We assumed that the variance of the errors remains the same even when the regression parameters change. The results of the present section can be extended to cover the changing variance case as well (see Bai, 1997, 1999 and Bai and Perron, 1998, 2003). If we know that some of the regression parameters do not change, they may be treated as nuisance parameters.
Chapter 5
Parameter Changes in Time Series Models

We develop in this chapter the asymptotic theory surrounding change point methods
for many popular time series models. Although up to this point we have generally
taken into consideration potential serial dependence in the observations under study,
in this chapter we are concerned with detecting change points in the parameters for
models specifically designed to capture the serial dependence structure of a time
series. To begin, in Sect. 5.1 we consider change point methods for autoregressive,
moving average (ARMA) models. Dynamic regression models for a scalar time
series modelled jointly with covariate series are considered in Sect. 5.2. Random
coefficient autoregressive models are studied in Sect. 5.3. In Sect. 5.4 we consider
generalized autoregressive conditionally heteroscedastic (GARCH) models along
with other models for conditionally heteroscedastic time series. Extensions of these
approaches to linear and non–linear multivariate time series models are considered
in Sects. 5.5 and 5.6.

5.1 ARMA Models

We say that a time series {y_i, i ∈ Z} follows an ARMA process of autoregressive order d and moving average order r (ARMA(r, d)) if there exist scalar parameters φ_1, ..., φ_d and ψ_1, ..., ψ_r (φ_d ≠ 0 and ψ_r ≠ 0), and an innovation sequence {ε_i, i ∈ Z}, so that

$$y_i=\phi_1y_{i-1}+\dots+\phi_dy_{i-d}+\varepsilon_i+\psi_1\varepsilon_{i-1}+\dots+\psi_r\varepsilon_{i-r}.$$

An autoregressive process of order d (AR(d)) follows the above model where .ψ1 =
· · · = ψr = 0, and similarly a moving average model of order r (MA(r)) is as
above with .φ1 = · · · = φd = 0. Change point analysis of pure AR processes can
be framed simply as a change in a linear model as studied in Chap. 4. An AMOC


change point model for an AR(d) process can be written as



xT
i β 0 + Ei , if 1 ≤ i ≤ k ∗ ,
.yi = T (5.1.1)
xi β A + Ei , if k ∗ + 1 ≤ i ≤ N,

where

xi = (yi−1 , yi−2 , . . . , yi−d )T .


.

The null hypothesis of no change in the parameters is

H0 : β 0 = β A ,
. (5.1.2)

and the alternative of a single change is

HA : β 0 /= β A .
.

Let .β̂ N be the least squares estimator for the autoregressive parameter that
minimizes as a function of .β the sum of squares

Σ
N
. (yi − xT 2
i β) .
i=d+1

We note that this estimator differs in only finitely many terms with the estimator
obtained by minimizing

Σ
N
. (yi − xT 2
i β) , (5.1.3)
i=1

which depends on unobserved, past values of the series .y0 , . . . , y−d . In the asymp-
totic arguments below to lighten the notation we assume the estimator is computed
to minimize (5.1.3), which as we show is of no asymptotic consequence. Similarly
we may define .β̂ k,1 and .β̂ k,2 , the least-squares estimators of the parameters based
on the first k and last .N − k observations, as in (4.1.6).
We define the model residuals as

Êi = yi − xT
. i β̂ N , (5.1.4)

and write .β 0 = (β1,0 , . . . , βd,0 )T . In this way we may also define the likelihood
ratio statistic as in (4.1.4), and the process of the difference between the estimated
parameters in (4.1.36).
5.1 ARMA Models 209

Under the null hypothesis, there exists a stationary series satisfying (5.1.1) if the
following conditions are satisfied:
Assumption 5.1.1 The roots of the polynomials .1 − β1,0 t − · · · − βd,0 t d and .1 −
β1,A t − · · · − βd,A t d are outside the unit circle in .C.
Assumption 5.1.2 .{Ei , i ∈ Z} are independent and identically distributed random
variables, .EE0 = 0, .E|Ei |ν < ∞ with some .ν > 4.
We discuss at the end of this section how Assumption 5.1.2 can be replaced
with weaker conditions in some cases, including allowing for serially dependent
innovations.
Under these conditions the test statistics introduced in Theorems 4.1.1–4.1.7 have
the same asymptotic distributions as developed in Chap. 4.
Theorem 5.1.1 If .H0 of (5.1.2), Assumptions 5.1.1 and 5.1.2 hold, then Theo-
rems 4.1.1–4.1.7 remain true.
Proof Under .H0 and Assumption 5.1.1 there is a unique, stationary and causal
sequence .{yi , i ∈ Z} satisfying the equation

yi = xT
. i β 0 + Ei , i ∈ Z, (5.1.5)

where .xi = (yi−1 , . . . , yi−d )T . It is shown in Example A.1.1 that a stationary


AR.(d) sequence satisfying Assumption 5.1.2 is .Lν –decomposable, and hence so is
the sequence .{zi = (xT T
i , Ei ) , i ∈ Z}. As such the conditions of Theorems 4.1.1–
4.1.7 hold, from which this result also follows. ⨆

We note that the long–run covariance matrix .D appearing in (4.1.7) has a simple
form in the autoregressive case. Due to the AR.(d) recursion and Assumptions 5.1.1
and 5.1.2,

D = EE02 A,
.

where in this case .A = (γY (i − j ), i, j ∈ {1, . . . , d}) is the matrix of the


autocovariances of the time series .{yi , i ∈ Z}. We can estimate .A with .ÂN of
(4.1.34), and .EE02 with

1 Σ 2
N
2
sN
. = Êi .
N
i=1

As such there is no need to use a long–run covariance matrix estimator in this


case. It may readily be shown using the approximation of AR(d) sequences with
decomposable Bernoulli shifts that under .H0 as well as Assumptions 5.1.1 and 5.1.2,
| | ⎛ ⎞ || || ⎛ ⎞
| 2 | || ||
. |sN − EE02 | = OP N −1/2 , and ||ÂN − A|| = OP N −1/2 .
210 5 Parameter Changes in Time Series Models

In case of a linear model, we generally assumed that the distribution of the


covariates was unchanged after the change point, so that (4.3.1) holds under the null
as well as under the change point alternative. However, in this setting it is expected
that (4.3.1) will not hold in the presences of change points, since at the change
point .k ∗ the distribution of the process, which defines the linear model covariates,
changes. If the parameters after the change point .β A = (β1,A , . . . , βd,A )T satisfy
.βd,A /= 0, and the roots of the polynomial .1 − β1,A t − · · · − βd,A t are outside of
d

the complex unit circle, then the new regime defining .yi beyond .k ∗ also admits a
stationary solution.
In analyzing this case further we assume that Assumption 2.1.1 holds, so that

.k = LNθ ⎦ with some .0 < θ < 1. Our strategy is to first show that there exists a

deterministic sequence .β̄ = β̄(N ) satisfying


|| || ⎛ ⎞
|| ||
. ||β̂ N − β̄ || = OP N −1/2 . (5.1.6)

With .β̄, under .HA we write the residuals in the following form:

Ei + xT T
i (β 0 − β̄) + xi (β̄ − β̂ N ), 1 ≤ i ≤ k∗,
Êi =
.
Ei + xT T
i (β A − β̄) + xi (β̄ − β̂ N ), k ∗ + 1 ≤ k ≤ N.

We then establish two weak laws of large numbers:



1 Σ T
k
. lim xi xi = A1 in probability, (5.1.7)
k ∗ →∞ k ∗
i=1

and

1 Σ
N
. lim xi xT
i = A2 in probability. (5.1.8)
N −k ∗ →∞ N − k ∗ ∗ i=k +1

As we shall see these results may be used to establish the asymptotic behaviour of
the linear model change point test statistics under .HA .
We discuss the proofs of (5.1.6)–(5.1.8) in two cases: when the size of the change
is constant, i.e. the difference between .β 0 and .β A does not depend on N , and when
the change in the parameters is small, so that .||β 0 − β A || → 0 as .N → ∞.
First we consider the case when the size of the change is constant. We define the
stationary sequence .{ŷi , i ∈ Z} as the solution of the AR(d) model

ŷi = x̂T
. i β A + Ei , i ∈ Z,
5.1 ARMA Models 211

where .x̂i = (ŷi−1 . . . . , ŷi−d )T . We look at .{yk ∗ +j , j ≥ 1} as a sequence starting at



.k . Arguing as before one can show that

j
E|yk ∗ +j − ŷk ∗ +j | ≤ c1 ρ1 ,
. (5.1.9)

with some .0 < ρ1 < 1. We already established (5.1.7) with .A1 = Ex0 xT
0 . Due to
(5.1.9), standard arguments give (5.1.6) with
( )
. β̄ = (θ A1 + (1 − θ )A2 )−1 θ A1 β 0 + (1 − θ )A2 β A

and (5.1.8) with .A2 = E x̂0 x̂T


0.
Hence
⎛ ∗ ⎞
k∗
Σ k∗
Σ Σk ⎛ ⎞
. xi Êi = xi Ei + ⎝ xi xTi ⎠ β 0 − β̂ N
i=1 i=1 i=1
⎛ ⎞
= OP N 1/2 + N θ (1 − θ )A1 (θ A1 + (1 − θ )A2 )−1 A2 (β 0 − β A )(1 + oP (1)).

If the processes .{yi , i ∈ Z} and .{ŷi , i ∈ Z} are non degenerate, then .A1 and .A2 are
non singular, so if .||β 0 − β A || > 0, then the maximum of the weighted functions
of .ZN (k), as defined in (4.1.27) and Theorem 4.1.3, converge to .∞ in probability at
rate .N 1/2 .
We now consider when .ΔN = β 0 − β A depends on N, and .||ΔN || → 0 as
.N → ∞. For the sake for notational simplicity, we only consider the AR(2) case,

i.e. .d = 2, but the result may be extended. After .k ∗ , the observations satisfy the
recursion

Yi = GYi−1 + ei ,
.

where .Yi = (yi , yi−1 )T , ei = (Ei , Ei−1 )T and


⎛ ⎞
β1,A , β2,A
.G = GN = .
1, 0

Thus the observations after the change point .k ∗ may be represented as

j −1
Σ
Y
. k ∗ +j =G Yj
k∗ + Gl ek ∗ +j −l .
l=0

The eigenvalues of .G correspond to the reciprocal of the roots of .1 − β1,A t − β2,A t 2 ,


so under Assumption 5.1.1, the norms of the eigenvalues of .G are strictly less than
1. As a result of Gelfand’s theorem (see Horn and Johnson, 1991, p. 180) there is a
positive constant .c2 such that

||Gm || ≤ c2 ρ2m
. with some 0 < ρ2 < 1, for all m ∈ N.
212 5 Parameter Changes in Time Series Models

Hence for all .j ≥ 1,


( | |ν )1/ν
E |yk ∗ +j − ȳk ∗ +j |
j
. ≤ c3 ρ2 , (5.1.10)

where .ȳk is the stationary solution in (5.1.5). Hence (5.1.6) holds with .β̄ = θ β 0 +
(1 − θ )β A and (5.1.7) and (5.1.8) with .A1 = A2 = E x̄0 x̄T 0 . Putting together our
estimates we get

∗ ∗
⎛ ∗

Σ
k Σ
k Σ
k ⎛ ⎞
. xi Êi = xi Ei + ⎝ xi xTi ⎠ β 0 − β̂ N
i=1 i=1 i=1
⎛ ⎞
= OP N 1/2 + θ (1 − θ )E x̄0 x̄T
0 NΔN (1 + oP (1)).

As a result the maximum of the weighted process .ZN (k), as defined in (4.1.27) and
Theorem 4.1.3, converges in probability to .∞ at rate .N 1/2 ||ΔN ||. In the case of
the self–normalized Darling–Erdős statistic as in Theorem 4.1.3(ii), divergence to
infinity occurs when .N 1/2 ||ΔN ||/(log log N)1/2 → ∞. In both cases we see that
the consistency results as established in Chap. 4 remain true if at least one of the
changes in the regression parameters is not too small (as in Assumption 4.2.2).
Similarly it may be shown that the change point estimator .k̂N defined in
(4.2.3) satisfies the same consistency and asymptotic distributional properties for
autoregressive processes as detailed in Theorem 4.2.3.
The AR(1) case has received special attention in the literature. In this model if
.|β0 | < 1, the solution is stationary, if .|β0 | = 1, then the sequence starting from an

initial value is a random walk, and .|β0 | > 1 gives rise to a process starting from
an initial value that is “explosive". We provide more details on the latter cases in
Sect. 5.1.1 below.
Remark 5.1.1 We assumed in Assumption 5.1.2 that the errors are independent and
identically distributed. This can be replaced with the requirement that .{Ei , i ∈ Z}
is a mean zero, uncorrelated, and stationary sequence. If Assumption 5.1.1 holds,
then under the null hypothesis .{yi , i ∈ Z} remains an .Lν –decomposable Bernoulli
shift, and therefore the results of Chap. 4 can be used to check the stability of the
autoregressive parameters. Section 5.1.1 provides an example how to prove the
decomposability of a linear process when the innovations are from a decomposable
Bernoulli sequence. For a general theory of ARMA–GARCH processes we refer to
Ling and Li (1998), Li et al. (2002), Ling and McAleer (2003a), Ling and McAleer
(2003b) and Francq and Zakoian (2004). We discuss change point detection in
GARCH sequences in Sect. 5.4 below.
Remark 5.1.2 In this section we assumed that .{xi , 1 ≤ i ≤ N}, i.e. .{yi , −d ≤
i ≤ N} are available for statistical analysis. However, we only observe .yi , .i ∈
{1, . . . , N }, so for the few first .xi ’s some initial values replacing .y0 , y−1 , . . . , y−d+1
are needed. A standard approach is to replace these values with the mean of the
5.1 ARMA Models 213

series. The effect of the initial values decays geometrically fast asymptotically
under the assumptions of Theorem 5.1.1, and will not effect the conclusions of
Theorem 5.1.1.
We now turn to the problem of performing change point analysis for the param-
eters in a general ARMA.(d, r) model. A single change point in the ARMA.(d, r)
parameters may be represented in terms of the model

⎪ φ1,0 yi−1 + . . . +

⎨ ∗
φd,0 yi−d + Ei + ψ1,0 Ei−1 + . . . + ψr,0 Ei−r ,
if 1 ≤ i ≤ k ,
yi. = (5.1.11)

⎪ φ yi−1 + . . . + φd,A yi−d + Ei + ψ1,A Ei−1 + . . . + ψr,A Ei−r ,
⎩ 1,A
if k ∗ + 1 ≤ i ≤ N.

The parameter vector before the change is

β 0 = (φ1,0 , . . . , φd,0 , ψ1,0 , . . . , ψr,0 )T ,


.

which changes at time .k ∗ to

β A = (φ1,A , . . . , φd,A , ψ1,A , . . . , ψr,A )T .


.

The change point detection problem may be characterized as a hypothesis test of


H0 : β 0 = β A versus .HA : β 0 /= β A . Under .H0 we study the situation where the
.

model admits a stationary and causal solution before and after the change. Let

φ(t) = 1 − φ1,0 t − φ2,0 t 2 − . . . − φd,0 t d


.

and

ψ(t) = 1 + ψ1,0 t + ψ2,0 t 2 + . . . + ψr,0 t r .


.

We modify Assumption 5.1.1 as


Assumption 5.1.3 (i) .φd,0 /= 0 and .ψr,0 /= 0, (ii) all roots of .φ(t) are outside unit
circle in .C, (iii) all roots of .ψ(t) are outside the unit circle in .C, and (iv) .φ(t) and
.ψ(t) do not share common zeros.

We estimate the parameter .β 0 with the least squares (quasi–maximum likelihood)


method. First we define the residuals, as a function of the parameter

β = (φ1 , . . . , φd , ψ1 , . . . , ψr )T ∈ Rd+r .
.

We let

β 0 = (φ1,0 , . . . , φd,0 , ψ1,0 , . . . , ψr,0 )T


.
214 5 Parameter Changes in Time Series Models

denote the true model parameters. For a given parameter vector .β, the model errors
may be estimated by the recursive equations:

Ê1 (β) = y1
.

Ê2 (β) = y2 − β1 y1 − βd+1 Ê1 (β)


..
.
Σ
d Σ
r
Êi (β) = yi − βl yi−l − βd+l Êi−l (β), if i ≥ max(d, r).
k=1 l=1

If i is large, then

E Êi2 (β) ≈ E Ē02 (β),


.

where .Ēi (β) satisfies the recursion

Σ
d Σ
r
Ēi (β) = yi −
. βl yi−l − βd+l Ēi−l , i ∈ Z. (5.1.12)
k=1 l=1

Using Assumption 5.1.3, (5.1.12) has a stationary solution for all .β ∈ O, where .O
is a compact subset of .Rd+r such that for every element of .O, Assumption 5.1.3
holds. We assume that .β 0 is in the interior of .O. We note that

. E sup (Êi − Ēi )2 = O(ρ i ), (5.1.13)


β∈O

with some .0 < ρ < 1 (see Brockwell and Davis, 2006, p. 265). The function
E Ē02 (β) reaches its unique smallest value at .β 0 , and since .E Ē02 (β) is an analytical
.

function, we get that the derivative of .E Ē02 (β) is .0 at .β 0 . Let

Σ
k
∂ Ê 2 (β)
Ŝk (β) =
.
i
.
∂β
i=1

If .β̂ N satisfies

ŜN (β̂ N ) = 0,
.

then under the null hypothesis

||β̂ N − β 0 || = OP (N −1/2 )
.
5.1 ARMA Models 215

(see Brockwell and Davis, 2006, Section 8.11). Hence under the null hypothesis

Ŝk (β̂ N ) ≈ 0,
. for all k ∈ {1, . . . , N}.

Let
∂ Ēi2 (β) ∂ Ēi (β)
ei (β) =
. = 2Ēi (β) .
∂β ∂β
We note that .{ei (β), i ∈ Z} is a stationary sequence with .Ee0 (β 0 ) = 0. Due to the
construction, .{ei (β), i ∈ Z} are uncorrelated random variables. Let

C = Ee0 (β 0 )eT
. 0 (β 0 ). (5.1.14)

Theorem 5.1.2 We assume that .H0 of (5.1.2), Assumptions 5.1.2 and 5.1.3 are
satisfied.
(i) If .0 ≤ κ < 1/2, then
⎛ ⎞2+2κ
N2
. max ŜT −1
k (β̂ N )C Ŝk (β̂ N )
1≤k<N k(N − k)

D Σ
d+r
1
→ max B 2 (t),
0<t<1 [t (1 − t)]2κ i
i=1

where .{B1 (t), 0 ≤ t ≤ 1}, . . . , {Bd+r (t), 0 ≤ t ≤ 1} are independent Brownian


bridges.
(ii) Also,
⎧ ⎛ ⎞3/2 ⎛
N2 ⎞1/2
.P a(log N) max ŜT
k ( β̂ N )C−1
Ŝ k ( β̂ N )
1≤k<N k(N − k)

( )
≤ x + bd+r (log N) = exp −2e−x

for all .x ∈ R, where .a(x) and .bd+r (x) are defined in (1.2.18).
Proof Due to (5.1.13)

E sup ||Ŝk,1 (β) − S̄k,1 (β)|| = O(ρ k ),


.
β∈O

so it is enough to consider the processes

Σ
k
∂ Ē 2 (β) Σ
N
∂ Ēi2 (β)
S̄k,1 (β) =
.
i
and S̄k,2 (β) = ,
∂β ∂β
i=1 i=k+1
216 5 Parameter Changes in Time Series Models

where .β̄ N satisfies

S̄N,1 (β̄ N ) = S̄0,2 (β̄ N ) = 0.


.

We note that under .H0

||β̄ N − β 0 || = OP (N −1/2 )
. (5.1.15)

(see Brockwell and Davis, 2006, Sections 8.11). Standard arguments give that
⎛ ⎛ ⎞⎞
1 −1 1
.β̄ N − β 0 = − D SN (β 0 ) 1 + OP ,
N N

where
⎛ ⎞
∂ 2 Ē02 (β 0 )
D=E
. . (5.1.16)
∂β∂β

Next we write

S̄k,1 (β̄ N ) = S̄k,1 (β̄ N ) − S̄k,1 (β 0 ) + S̄k,1 (β 0 )


.

and along with a two term Taylor expansion, (5.1.15) yields


⎛ ⎞1/2
N || ||
. max ||S̄k,1 (β̄ N ) − S̄k,1 (β 0 ) − kD(β̄ N − β 0 )|| = OP (1). (5.1.17)
1≤k≤N k

Thus we get
⎛ ⎞1/2 || ⎛ ⎞||
N || ||
max ||S̄k,1 (β̄ N ) − S̄k,1 (β 0 ) − k S̄N,1 (β 0 ) || = OP (1).
.
1≤k≤N k || N ||

We observe that

Sk,1 (β̄ N ) = −Sk,2 (β̄ N )


.

and along the lines of (5.1.17)


⎛ ⎞1/2
N || ||
. max ||S̄k,2 (β̄ N ) − S̄k,2 (β 0 ) + (N − k)D(β̄ N − β 0 )|| = OP (1).
1≤k<N N −k
5.1 ARMA Models 217

Hence
⎛ ⎞1/2 || ⎛ ⎞||
|| ||
max
N ||S̄k,1 (β̄ N ) − −S̄k,2 (β 0 ) + N − k S̄N,1 (β 0 ) || = OP (1).
.
1≤k<N N −k || N ||

Theorem A.1.3 implies that for each N there are independent Wiener processes
{WN,1 (x), 0 ≤ x ≤ N/2} and .{WN,2 (x), 0 ≤ x ≤ N/2} with values in .Rd+r such
.

that
1 || ||
||S̄k,1 (β 0 ) − WN,1 (k)|| = OP (1)
. max ζ
1≤k≤N/2 k

and
1 || ||
. max ||S̄k,2 (β 0 ) − WN,2 (N − k)|| = OP (1)
1≤k≤N/2 (N − k) ζ

with some .ζ < 1/2, .EWN,1 (x) = EWN,2 (x) = 0 and .EWN,1 (x)WT
N,1 (y) =
T
EWN,2 (x)WN,2 (y) = min(x, y)C. If
⎧ x

⎪ W (x) − (WN,1 (N/2) + WN,2 (N/2)), if 0 ≤ x ≤ N/2,
⎨ N,1 N
.BN (x) =
N −x
⎪ −W N,2 (N − x) + (WN,1 (N/2) + WN,2 (N/2)),

⎩ N
if N/2 ≤ x ≤ N,

then
|| ||
N 1/2−ζ || ||
|| 1 S̄LN t⎦ (β̄ N ) − 1 BN (N t)||
. max || || (5.1.18)
1/(N +1)≤t≤1−1/(N +1) [t (1 − t)] ζ N 1/2 N 1/2

= OP (1)

and as in the proof of (1.2.6)


|| ||
1 || 1 1 ||
|| ||
. max
1/(N +1)≤t≤1−1/(N +1) [t (1 − t)]1/2 || N 1/2 S̄LN t⎦ (β̄ N ) − N 1/2 BN (N t)|| = OP (1).

We see that .BN (N t) = 0 and


⎛ ⎞⎛ ⎞T
1 1
E
. BN (N t) B (N s)
1/2 N
= (min(s, t) − st)C.
N 1/2 N

We showed that .S̄k (β̄ N ) can be approximated with a CUSUM process, and due
to the weighted Gaussian approximation in (5.1.18), the results in Sect. 1.3 imply
Theorem 5.1.2. ⨆

218 5 Parameter Changes in Time Series Models

To construct a test of .H0 , we must estimate .C. Since the derivatives of the
Êi (β)’s are asymptotically uncorrelated, the sample variance is a sensible and
.

asymptotically consistent estimate. Let

∂ Êi2 (β)
êi (β) =
.
∂β

and

1 Σ
N
ĈN (β) =
. êi (β)êT
i (β). (5.1.19)
N
i=1

It can be proven under the conditions of Theorem 5.1.2 that


|| || ⎛ ⎞
|| ||
. ||ĈN (β̂ N ) − C|| = OP N −1/2 , (5.1.20)

so Theorem 5.1.2 remains true with .ĈN (β̂ N ) in place of .C. We also note that
Brockwell and Davis (2006) (Section 8.11) provides a simple and computationally
efficient recursive scheme for calculating .êi .
We may also consider test statistics based on direct comparisons of the estimators
computed from the first k and the last .N − k observations. Let

Σ
k
∂ Ê 2 (β)
Ŝk,1 (β) =
.
i
∂β
i=1

and

Σ
N
∂ Êi2 (β)
Ŝk,2 (β) =
. .
∂β
i=k+1

The least squares (quasi- maximum likelihood) estimators .β̂ k,1 and .β̂ k,1 are the
solutions of

Ŝk,1 (β̂ k,1 ) = 0 and


. Ŝk,2 (β̂ k,2 ) = 0.

The asymptotic distribution of the process .β̂ k,1 − β̂ k,2 is determined by the
asymptotic behaviour of the partial sums of the variables .ê(β 0 ). We recall the
matrices .C and .D from (5.1.14) and (5.1.16).
5.1 ARMA Models 219

Theorem 5.1.3 We assume that .H0 of (5.1.2), Assumptions 5.1.2 and 5.1.3 are
satisfied.
(i) If .0 ≤ κ < 1/2, then
⎛ ⎞−2+2κ
N2
. max (β̂ k,1 − β̂ k,2 )T (DCD)−1 (β̂ k,1 − β̂ k,2 )
1≤k,N k(N − k)

D Σ
d+r
1
→ max B 2 (t),
0<t<1 [t (1 − t)]2κ i
i=1

where .{B1 (t), 0 ≤ t ≤ 1}, . . . , {Bd+r (t), 0 ≤ t ≤ 1} are independent Brownian


bridges.
(ii) Also,
⎧ ⎞ ⎛
k(N − k) 1/2 ⎛ T −1
⎞1/2
.P a(log N) max ( β̂ k,1 − β̂ k,2 ) (DCD) ( β̂ k,1 − β̂ k,2 )
1≤k<N N2

( )
≤ x + bd+r (log N) = exp −2e−x

for all .x ∈ R, where .a(x) and .bd+r (x) are defined in (1.2.18).
Proof As in the proof of Theorem 5.1.2 we need to consider the estimators .β̄ k,1 and
.β̄ k,2 satisfying

S̄k,1 (β̄ k,1 ) = 0 and


. S̄k,2 (β̄ k,2 ) = 0,

where
Σ
k
∂ Ē 2 (β) Σ
k
S̄k,1 (β) =
.
i
= ēi (β)
∂β
i=1 i=1

and
Σ
N
∂ Ēi2 (β) Σ
N
S̄k,2 (β) =
. = ēi (β).
∂β
i=k+1 i=k+1

Let .O be a compact subset of .Rd+r such the for all .β = (β1 , . . . , βd+r )T ∈ O, the
roots of the polynomials

φ(t, β) = 1 − β1 t − β2 t 2 − . . . − βd t d
.

and

ψ(t, β) = 1 + βd+1 t + ψd+2 t 2 + . . . + ψd+r t r


.
220 5 Parameter Changes in Time Series Models

are outside of the unit circle in the complex plane. If .β ∈ O, then the invertible
representation of the ARMA innovations implies that

Σ
Ēi (β) =
. πl (β)yi−l , (5.1.21)
l=0

and therefore .{Ēi2 (β), i ∈ Z} is .Lν –decomposable. Hence by the maximal inequal-
ity (A.3.2) and Chebyshev’s inequality we obtain that
⎧ || k || ⎞
|| 1 Σ ||
|| ||
. lim lim sup P max sup || Ēi (β) − E Ē0 (β)|| > δ = 0,
2 2
(5.1.22)
M→∞ N →∞ M≤k≤N β∈O || k ||
i=1

and
⎧ || || ⎞
|| 1 Σ
N ||
|| ||
lim
. lim sup P max sup || Ēi (β) − E Ē0 (β)|| > δ
2 2
(5.1.23)
M→∞ N →∞ 0≤k≤N −M β∈O || N − k ||
i=k+1

=0

for all .δ > 0. Since .Ēi2 (β) is a continuous function of .β ∈ O, (5.1.22) and (5.1.23)
imply
⎧ || k || ⎞
|| 1 Σ ||
|| ||
. lim lim sup P max sup || Ēi2 (β) − E Ē02 (β)|| > δ = 0, (5.1.24)
M→∞ N →∞ M≤k≤N β∈O || k ||
i=1

and
⎧ || || ⎞
|| 1 Σ
N ||
|| ||
lim lim sup P
. max sup || Ēi (β) − E Ē0 (β)|| > δ
2 2
(5.1.25)
M→∞ N →∞ 0≤k≤N −M β∈O || N − k ||
i=k+1

=0

for all .δ > 0. The function .E Ē02 (β) has its unique minimum at .β 0 , hence it follows
from (5.1.24) and (5.1.25)

. max ||β̄ k,1 − β 0 || = oP (1), as M, N → ∞, (5.1.26)


M≤k≤N

and
⎧ ⎞
. lim lim sup P max ||β̄ k,2 − β 0 || > δ = 0. (5.1.27)
M→∞ N →∞ 0≤k<N −M
5.1 ARMA Models 221

Following the proofs of (5.1.26) and (5.1.27), one can show that there is a neighbour-
hood .O∗ ⊂ O, containing .β 0 such that
|| ||
|| 1 ∂ S̄ (β) ||
|| k,1 −1 ||
. max sup || − D̄ (β)|| = oP (1), (5.1.28)
M≤k≤N β∈O∗ || k ∂β ||

as .M, N → ∞, and
⎧ || || ⎞
|| 1 ||
|| ∂ S̄M,2 (β) −1 ||
. lim lim sup P sup || − D̄ (β)|| > δ = 0, (5.1.29)
M→∞ N →∞ β∈O∗ || N − M ∂β ||

for all .δ > 0, where


⎛ ⎞
−1 ∂ 2 Ē02 (β)

. (β) = E .
∂β∂β

Thus we get

1
β̄ k,1 − β 0 = − (D + Rk,1 )S̄k,1 (β 0 )
.
k
and
1
β̄ k,2 − β 0 = −
. (D + Rk,2 )S̄k,2 (β 0 ),
N −k

where the remainder terms .Rk,1 and .Rk,2 satisfy


|| ||
. max ||Rk,1 || = oP (1), as M, N → ∞,
M≤k≤N

and
⎧ ⎞
|| ||
. lim lim sup P max ||Rk,2 || > δ = 0
M→∞ N →∞ 0≤k≤N −M

for all .δ > 0. Thus we get via Theorem 1.3.1 that


⎛ ⎞1/2
k
. max ||β̄ k,1 − β 0 || = OP (1)
1≤k≤N log log k

and
⎛ ⎞1/2
N −k
. max ||β̄ k,2 − β 0 || = OP (1).
1≤k<N log log(N − k)
222 5 Parameter Changes in Time Series Models

Since the second derivatives of .S̄k,1 (β)/k are also bounded in a neighbourhood
of .β 0 with probability tending to 1, applying a two term Taylor expansion for the
coordinates of .S̄k,1 we get

k || ||
. max ||k(β̄ k,1 − β 0 ) + DS̄k,1 (β 0 )|| = OP (1), (5.1.30)
1≤k≤N log log k

and similarly

N −k || ||
. max ||(N − K)(β̄ k,2 − β 0 ) + DS̄k,2 (β 0 )|| (5.1.31)
1≤k<N log log(N − k)
= OP (1).

Now we can use Theorem A.1.3, and define for each N two independent Gaussian
processes .{WN,1 (x), 0 ≤ x ≤ N/2} and .{WN,2 (x), 0 ≤ x ≤ N/2} such that

1 || ||
||S̄k,1 + WN,1 (k)|| = OP (1),
. sup ζ
1≤k≤N/2 k

and
1 || ||
. sup ||S̄k,1 + WN,2 (N − k)|| = OP (1)
N/2≤k≤N −1 (N − k)
ζ

with some .ζ < 1/2, and .EWk,1 (x) = EWk,2 (x) = 0, .EWk,1 (x)Wk,1 (x ' ) =
EWk,2 (x)Wk,2 (x ' ) = C min(x, x ' ), where .C is defined in (5.1.14). Next we write

k(N − k)
. (β̄ k,1 − β̄ k,2 )
N

⎪ k ( )
⎨ Sk,1 − SN/2,1 + SN/2,2 + Rk,3 , if 1 ≤ k ≤ N/2,
= N
⎪ N −k ( )
⎩ −Sk,2 + SN/2,1 + SN/2,2 + Rk,4 , if N/2 ≤ k < N
N
using (5.1.30) and (5.1.31) we get that

1
. max ||Rk,3 || = OP (1),
1≤k≤N/2 log log k

1
. max ||Rk,4 || = OP (1).
N/2≤k<N log log(N − k)

These approximations make it possible to use the results in Sect. A.1 to establish the
results by repeating the arguments in Sect. 1.3. ⨆

5.1 ARMA Models 223

The norming matrix is .(DCD)−1 = D−1 C−1 D−1 . We may again estimate .C with
.Ĉ(β̂ N ) (5.1.19), and we may define a similar estimator for .D
−1 . Let

1 Σ ∂ 2 ê2i (β̂ N )
N
.D̂−1
N = .
N ∂β∂β
i=1

Similarly to (5.1.20) we have


|| ||
|| −1 ||
. ||D̂N − D−1 || = OP (N −1/2 ),

and therefore
|| ||
|| −1 −1 −1 ||
. ||D̂N ĈN D̂N − (DCD)−1 || = OP (N −1/2 ).

The consistency of the tests in Theorem 5.1.3 is a consequence of the behaviour


of the estimators under the alternative hypothesis. If .k ∗ = LN θ ⎦ with some .0 < θ <
1, and
|| ||
N 1/2 ||β 0 − β A || → ∞,
.

then for any estimator .ĜN → G in probability, we have


⎛ ⎞2+2κ
N2
(β̂ k,1 − β̂ k,2 )T Ĝ−1
P
. max N (β̂ k,1 − β̂ k,2 ) → ∞,
1≤k,N k(N − k)

assuming that .ĜN and .G are positive definite matrices. Similarly, if


⎛ ⎞1/2
N || ||
. ||β 0 − β A || → ∞,
log log N

then
⎛ ⎞3/2
1 N2
. max
(log log N)1/2 1≤k<N k(N − k)
⎛ ⎞1/2 P
(β̂ k,1 − β̂ k,2 )T Ĝ−1
N ( β̂ k,1 − β̂ k,2 ) → ∞

holds.
Remark 5.1.3 We assumed for the sake of simplicity that Assumption 5.1.2 holds.
This assumption can be replaced with a decomposable Bernoulli shift assumption
along with some further conditions. The identifiability of .β 0 requires that .EEj Ei =
224 5 Parameter Changes in Time Series Models

0, if .i /= j . This condition holds, for example, for volatility processes, including


several versions of GARCH processes.
Remark 5.1.4 We only discussed the case when under the alternative an
ARMA.(p, q) sequence changes to another ARMA.(p, q). The consistency can
also be proven when the observations change to an ARMA.(p' , q ' ) model.

5.1.1 Non–stationary AR(1) Processes

In Sect. 5.1 we studied several methods to check if the parameters of a stationary


ARMA process change in such a way that both ARMA processes before and after
the change admit stationary solutions. In some cases though we could expect the
model errors to be non–stationary processes. One example of this is when the non–
stationarity takes the form of a unit root or explosive (autoregressive parameter
larger than 1) autoregressive process. In this subsection we present on such problems
in the context of AR(1) models.
First we study the behaviour of the CUSUM process in the model

.yi = β0 yi−1 + Ei , i ∈ {1, . . . , N}, (5.1.32)

where .y0 is an initial value for the recursion. We assume that the innovations are
Lν –decomposable, i.e. we do not require that the .Ei ’s are independent. If
.

Assumption 5.1.4 .|β0 | < 1,


holds, then it is straightforward to see that
( )1/ν
. E |yi − ȳi |ν = O(|β0 |i ), (5.1.33)

where

Σ
ȳi =
. β0l Ei−l . (5.1.34)
l=0

The sequence .{ȳi , i ∈ Z} is stationary and satisfies the recursion

. ȳi = β0 ȳi−1 + Ei , i ∈ Z. (5.1.35)

We define the long–run variance of the stationary solution of (5.1.34) as



Σ
.τ = 2
E ȳ0 ȳi .
l=−∞
5.1 ARMA Models 225

For observations following the AMOC model with AR(1) errors



μ0 + yi , if 1 ≤ i ≤ k ∗ ,
.Xi =
μA + yi , if k ∗ + 1 ≤ i ≤ N.

We wish to test again the null hypothesis

(1)
. H0 : μ0 = μA , (5.1.36)

against the alternative

(1)
HA : μ0 /= μA .
. (5.1.37)

However, the parameter of the AR(1) process driving the error terms stays the same
during the observation period. We recall the CUSUM process
⎛ ⎞
LN
Σ LNt⎦ Σ ⎠
t⎦ N
−1/2 ⎝
ZN (t) = N
. Xi − Xi , 0 < t < 1.
N
i=1 i=1

The long–run variance estimator computed from .X1 , . . . , XN is denoted by .τ̂N2 .


When .|β0 | < 1, we have the following result as in Sect. 1.2.1. The proof is omitted.
(1)
Theorem 5.1.4 If .H0 of (5.1.36), Assumptions 3.1.4, 3.1.5, 5.1.4, .0 ≤ κ < 1/2,
and the innovations in (5.1.32) are .Lν –decomposable, then

1 1 D 1
. sup |ZN ((N + 1)t/N)| → sup |B(t)|,
τ̂N 0<t<1 [t (1 − t)]κ
0<t<1 [t (1 − t)]
κ

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge.


In contrast, we next consider the case when the AR(1) innovations in the AMOC
model have a unit root.
Assumption 5.1.5 .β0 = 1.
In this case the innovations follow a random walk with .Lν –decomposable incre-
ments starting from .y0 . We let

Σ
.σ2 = EE0 El (5.1.38)
l=−∞

denote the long–run variance of the AR(1) innovations. Recall that .τ̂N2 is the long–
run variance estimator computed from .X1 , . . . , XN . As we shall see, it does not
estimate any particular long–run variance parameter when .β0 = 1.
226 5 Parameter Changes in Time Series Models

(1)
Theorem 5.1.5 If .H0 of (5.1.36), Assumption 5.1.4 holds, and the innovations in
(5.1.32) are .Lν –decomposable for some .ν > 4, then we can define Wiener processes
.{WN (t), 0 ≤ t ≤ 1} such that

| |
1 | 1 |
sup | ZN (t) − σ ┌N (t)|| = oP (1),
.
|
1/N ≤t≤1−1/N [t (1 − t)]
ζ N 1/2

with some .ζ < 1, where


⎛ t ⎛ 1
┌N (t) =
. WN (x)dx − t WN (x)dx,
0 0

and .σ 2 is defined in (5.1.38).


If in addition Assumptions 3.1.4 and 3.1.5 also hold, then
| ⎛ c ⎛ 1⎧ ⎛ 1 ⎞2 ||
| τ̂ 2
| N |
.| −σ 2
K(u)du WN (x) − WN (u)du dx | = oP (1).
| Nh −c 0 0 |

Proof It follows from .Lν –decomposability of the innovations that for each N there
are two independent Wiener processes .{WN,1 (x), 0 ≤ x ≤ N/2} and .{WN,2 (x), 0 ≤
x ≤ N/2} such that

1 || |
. max yi − σ WN,1 (k)| = OP (1), (5.1.39)
1≤k≤N/2 k ζ̄

and
1 | |
. max |(yN − yk ) − σ WN,2 (N − k)| = OP (1) (5.1.40)
N/2≤k<N (N − k)ζ̄

with .σ > 0 of (5.1.38) and some .ζ̄ < 1/2. By (5.1.39) and (5.1.40) we have
| |
|LN t⎦ ⎛ Nt |
1 | | Σ |
sup y − σ W (x)dx | = OP (N ζ̄ +1 ),
.
+1 | i N,1 |
1/N ≤t≤1/2 t ζ̄
| i=1 0 |

and
| |
| Σ ⎛ N |
1 | N |
sup | (yN − yi )−σ WN,2 (N − x)dx || = OP (N ζ̄ +1 ).
.
|
1/2≤t≤1−1/N (1 − t)ζ̄ +1 |i=LN t⎦ Nt |
5.1 ARMA Models 227

Since

Σ
N LN/2⎦
Σ Σ
N
. yi = yi − (yN − yi ) + (N − LN/2⎦ )[yLN/2⎦ + (yN − yLN/2⎦ )]
i=1 i=1 i=LN/2⎦ +1

we get
|N |
|Σ |
| |
.| yi − σ ┌ˆ N | = OP (N 1/2+ζ̄ ),
| |
i=1

where
⎛ N/2 ⎛ N
┌ˆ N =
. WN,1 (x)dx − [WN,2 (N − x) − (WN,1 (N/2) + WN,2 (N/2))]dx.
0 N/2

If
⎧ ⎛⎛ N t ⎞

⎪ −3/2 ˆ
WN,1 (x)dx − t ┌N , if 0 ≤ t ≤ 1/2,

⎪ N

⎪ ⎛⎛0 N


⎨ −3/2
N [WN,2 (N − x) − (WN,1 (N/2)+WN,2 (N/2))]dx
.┌N (t) =


Nt ⎞

⎪ ˆ

⎪ + (1 − t)┌N ,



if 1/2 ≤ t ≤ 1,

then for all .0 ≤ ζ < 1,


| |
1 |1 |
sup | ZN (t) − τ ┌N (t)| = OP (N −3/2+ζ ).
.
[t (1 − t)]ζ |N |
1/N ≤t≤1−1/N

Next we note
D
. {┌N (t), 0 ≤ t ≤ 1} =
⎧⎛ t

⎪ ˆ
⎨ W1 (x)dx − t ┌, if 0 ≤ t ≤ 1/2,
⎛ 1
0

⎪ ˆ
⎩ [W2 (1 − x) − (W1 (1/2) + W2 (1/2))]dx + (1 − t)┌, if 1/2 ≤ t ≤ 1,
t
228 5 Parameter Changes in Time Series Models

where
⎛ 1/2 ⎛ 1 1
┌ˆ =
. W1 (x)dx − W2 (1 − x)dx + (W1 (1/2) + W2 (1/2)) ,
0 1/2 2

{W1 (x), 0 ≤ x < ∞} and .{W2 (x), 0 ≤ x < ∞} are independent Wiener processes.
.

Let

W1 (x), if 0 ≤ x ≤ 1/2,
.W (x) =
−[W2 (1 − x) − (W1 (1/2) + W2 (1/2))], if 1/2 ≤ x ≤ 1.

Computing the covariance function one can verify that .{W (x), x ≥ 0} is a Wiener
process. Hence
⎧⎛ t ⎛ 1 ⎞
D
{┌N (t), 0 ≤ t ≤ 1} =
. W (x) − t W (x)dx, 0 ≤ t ≤ 1 .
0 0

Let

1 Σ
N
ŷN =
. yi .
N
i=1

Lν –decomposability yields
.

E(yi − yk )2 ≤ c5 |i − k|
.

and therefore by the Cauchy–Schwarz inequality we have


| ch |
|Σ ⎛ l ⎞ 1 NΣ −l |
| |
.E | K (yi − ŷN )(yi+l − yi )|
| h N −l |
l=0 i=1

Σ
ch
1 Σ
N
≤ c6 E|yi − ŷN ||yi+l − yi |
N
l=0 i=1

Σ 1 Σ⎛ ⎞1/2
ch N
≤ c6 E(yi − ŷN )2 E(yi+l − yi )2
N
l=0 i=1
⎛ ⎞
= O h3/2 N 1/2 .
5.1 ARMA Models 229

Thus we conclude
| ⎛ ⎞
| 1 Σ ch
1 Σ
N −l
| l
.| K (yi − ŷN )(yi+l − ŷN )
| Nh h N −l
l=0 i=1
⎛ c |
1 Σ
N |
|
− K(u)du 2 (yi − ŷN )2 |
0 N |
i=1
⎛⎛ ⎞ ⎞
h 1/2
= OP = oP (1)
N

and similarly
|
| −1 ⎛ ⎞
| 1 Σ l 1 Σ N
.| K (yi − ŷN )(yi+l − ŷN )
|Nh h N − |l|
| l=−ch i=−(l−1)

⎛ c |
1 Σ
N |
2|
− K(u)du 2 (yi − ŷN ) |
0 N |
i=1
⎛⎛ ⎞ ⎞
h 1/2
= OP = oP (1).
N

The result now follows from the approximation of the partial sums, since .yi is a
partial sum when .β = 1. ⨅

We may also consider the “explosive case” in which the AR(1) innovations in the
AMOC model satisfy
Assumption 5.1.6 .|β0 | > 1.
Theorem 5.1.6 If .H0(1) of (5.1.36), Assumption 5.1.6 hold, and the innovations in
(5.1.32) are .Lν –decomposable for some .ν > 4, then
| | k | | |||
| |Σ k Σ || || β0
N
||
| |
. ||β0 |
−N
max | Xi − Xi | − | (Y + y0 )||| = oP (1),
| 1≤k≤N | N | β0 − 1 |
i=1 i=1

where

Σ
Y =
. β0−l El . (5.1.41)
l=1
230 5 Parameter Changes in Time Series Models

If in addition, Assumptions 3.1.4 and 3.1.5 also hold, then


| |
| β02 β0 + 1 |
| −2N 2 2|
. |N |β0 | τ̂N − 2 (Y + y0 ) | = oP (1).
| β0 − 1 β0 − 1 |

Proof Since

Σ
i−1
yi = β0i y0 +
. β0l Ei−l ,
l=0

we get the representation



Σ
yi = β0i y0 + β0i Y − zi with zi =
. β0−l Ei+l . (5.1.42)
l=1

The sequence .{zi , i ∈ Z} is .Lν –decomposable, .ν > 4, and therefore Theorem 1.1.1
yields
| k |
|Σ k Σ ||
N ⎛ ⎞
|
. max | zi − zi | = OP N 1/2 .
1≤k≤N | N |
i=1 i=1

Thus we conclude
| k | | |
|Σ k Σ ||
N | k Σ i ||
N ⎛ ⎞
| | i
. max | yi − yi | = max |β0 − β0 | |Y + y0 | + OP N 1/2
1≤k≤N | N | 1≤k≤N | N |
i=1 i=1 i=1
| | | | ⎛ ⎞
| β0 | | k k N ||
|
=| | |
(Y + y0 )| max |β0 − β0 | +OP N 1/2 .
β0 − 1 1≤k≤N N

When .β0 > 1, the function .x |→ |β0x − (x/N)β0N | has its largest value at .xN =
N − [log(N log β0 )]/ log β0 , and thus we conclude
| |
| k |

−N
max |β − k β N | → 1. (5.1.43)
0 1≤k≤N | 0 N 0|

If .β0 < −1, the one needs to repeat the computations above and consider the
max1≤k≤N for odd and even k separately. In this case (5.1.43) also holds with .β0−N
.

replaced with .|β0 |−N . Hence the first part of the theorem is proven.
Using the representation in (5.1.42), we obtain

1 β0N ⎛ ⎞
ȳN =
. (y0 + Y ) + OP N −1/2 ,
N β0 − 1
5.1 ARMA Models 231

and
⎛ ⎞⎛ ⎞
1 β0N 1 β0N
.(yi − ȳN )(yi+l − ȳN ) = yi − (y0 + Y ) yi+l − (y0 + Y )
N β0 N β0

+ RN,1 (i)
⎛ ⎞⎛ ⎞
1 β0N i+l 1 β0N
= β0 −
i
β0 − (y0 + Y )2
N β0 − 1 N β0 − 1

+ RN,2 (i)

with

.N|β0 |−N max |β0 |−i |RN,1 (i)| = OP (1)


1≤i≤N

and N|β0 |−N max |β0 |−i |RN,2 (i)| = OP (1).


1≤i≤N

Elementary calculations give

−1 ⎛ ⎞ N −l
⎛ ⎞⎛ ⎞
Σ
N
l 1 Σ 1 β0N 1 β0N
i+l
. K β0 −
i
β0 −
h N −l N β0 − 1 N β0 − 1
l=0 i=1
−1⎛ ⎞ ⎛
Σ
N
l 1 β02N +2 −l 1 β02N +1 −l
= K β − β
h N − l β02 − 1 0 N (β0 − 1)2 0
l=0

1 β02N +1 1 β02N
− + 2
N (β0 − 1)2 N (β0 − 1)2

1 2N β02 1
= β (1 + o(1)),
N 0 β02 − 1 1 − 1/β0

and therefore

Σ −1 ⎛ ⎞ N −l
1 Σ
N
l
Nβ0−2N
. K (yi − ȳN )(yi+l − ȳN )
h N −l
l=0 i=1

P β02 1
→ (Y + y0 )2 .
β02 − 1 β0 − 1
232 5 Parameter Changes in Time Series Models

Similar arguments give

−(N
Σ −1) ⎛ ⎞ Σ
N
l 1
Nβ0−2N
. K (yi − ȳN )(yi+l − ȳN )
h N − |l|
l=−1 i=−(l−1)

P β02 β0
→ (Y + y0 )2 ,
β02 − 1 β0 − 1

completing the proof. ⨆



Theorems 5.1.4–5.1.6 imply that

1 D
. sup |ZN ((N + 1)t/N)| → U (t),
τ̂N 0<t<1

where

⎪ sup |B(t)|,
⎨ 0<t<1 if |β0 | < 1,
U (t) =
.
0, if β0 = 1,


1, if |β0 | > 1,

with .{B(t), 0 ≤ t ≤ 1} denoting a standard Brownian bridge. This means if the


critical value is taken from the distribution of .supt∈[0,1] |B(t)| in order to test .H0
based on the maximally selected CUSUM statistic, this testing procedure will not
be consistent under unit root or explosive AR(1) innovations.

5.2 Dynamic Regression Models

Dynamic regression refers to linear regression models for a time series .{yt , t ∈
Z} that involve both autoregression as well as regression on additional, exogenous,
series. An AMOC dynamic regression model is formulated as follows:

xT ∗
i β 0 + Ei , 1 ≤ k ≤ k ,
. yk = T ∗ (5.2.1)
xi β A + Ei , k + 1 ≤ k ≤ N,

where the errors .{Ei , i ∈ Z} are independent and identically distributed, and
the covariate .xi = (1, zi,2 , . . . , zi,r , yi−1 , . . . , yi−d )T ∈ Rr+d combines both
exogenous and autoregressive components. We wish to test the null hypothesis

H0 : β 0 = β A
. (5.2.2)
5.2 Dynamic Regression Models 233

against the alternative

HA : β 0 /= β A .
.

We write the parameter vector .β 0 as

β 0 = (β T
.
T T
z,0 , β y,0 ) , β z,0 ∈ Rr , β y,0 ∈ Rd ,

so that .β z,0 and .β y,0 correspond to the regression parameters for the exogenous
and autoregressive terms, respectively. Under the null hypothesis we assume
additionally that the process .{yt , t ∈ Z} is stationary, which is implied when
.zi = (1, zi,2 , . . . , zi,r )
T is .Lν –decomposable, and the following assumption holds.

Here we write .β y,0 = (βy,0,1 , . . . , βy,0,d )T .


Assumption 5.2.1 .βy,0,d /= 0, and the roots of the polynomial .1 − βy,0,1 t − · · · −
βy,0,d t d are outside of the unit circle in .C.
A change point detection method may again be based on the residuals .Êi = yi −
xT
i N , .i ∈ {1, . . . , N }, where .β̂ N is the least squares estimator
β̂
⎛ ⎞−1
β̂ N = XT
. N X XT
N YN ,

using the notation .YN = (y1 , y2 , . . . , yN )T and


⎛ ⎞
1, z1,2 , . . . , z1,r , y1−1 , . . . , y1−d
⎜ 1, z2,2 , . . . , z2,r , y2−1 , . . . , y2−d ⎟
⎜ ⎟
.XN = ⎜ . . .. ⎟.
⎝ .. .. . ⎠
1, zN,2 , . . . , zN,r , yN−1 , . . . , yN −d

It follows from Lemma 5.2.1 that the limit


1 T P
. XN XN → A
N
exists and we require
Assumption 5.2.2 .A is non singular.
We prove in Lemma 5.2.2 that .β̂ N is asymptotically unbiased if and only if
Assumption 5.2.3 .Ez0,j E0 = 0, j ∈ {1, . . . , r}.
Assumption 5.2.2 implies (see (5.2.3) in the proof of Lemma 5.2.1) that .Exi Ei = 0.
We use the CUSUM process .ẐN (t) of the weighted residuals defined in (4.1.27).
234 5 Parameter Changes in Time Series Models

The long–run variance of the limit of .ẐN (t) is given by in term of



Σ
D=
. E[(x0 E0 )(xT
l El )].
l=−∞

We also assume
Assumption 5.2.4 .D is non singular.
We recall the integral .I (w, c) from (1.2.4).
Theorem 5.2.1 We assume that .H0 of (5.2.2) and Assumptions 5.2.1–5.2.4 are
satisfied.
(i) If .w(·) satisfies Assumption 1.2.1 and .I (w, c) < ∞ for some .c > 0, then

1 ⎡ T ⎤1/2
. sup ẐN (t)D−1 ẐN (t)
1/(N +1)≤t≤N/(N+1) w(t)
⎛r+d ⎞1/2
D 1 Σ
→ sup 2
Bi (t) ,
0<t<1 w(t) i=1

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, . . . , r+d} are independent Brownian bridges.
(ii) Also,

1 ⎛ ⎞1/2
−1
. lim P a(log N) sup ẐN T(t)D ẐN (t)
1/(N +1)≤t≤N/(N+1) [t (1 − t)]
N →∞ 1/2


≤ x + br+d (log N) = exp(−2e−x )

for all .x ∈ R, where .a(t) and .br+d (t) are defined in (1.3.9).
The proof of Theorem 5.2.1 is based on the following two lemmas.
Lemma 5.2.1 We assume that .H0 of (5.2.2) and Assumption 5.2.1 are satisfied,
and that .{zi , i ∈ Z} are .Lν –decomposable for some .ν > 4. Then
(i) .{yi , i ∈ Z} is .Lν –decomposable, and (ii) .{zi zT
i , i ∈ Z} is .L
ν/2 –decompos-

able in .R .r×r

Proof Elementary arguments give



Σ
yi =
. cl (wT
i−l β 1,0 + Ei−l ) (5.2.3)
l=0
5.2 Dynamic Regression Models 235

where

wi = (1, zi,2 , . . . , zi,r )T


.

and .cl satisfies

.|cl | = O(ρ l ) with some 0 < ρ < 1. (5.2.4)

Brockwell and Davis (2006) contains the proof of (5.2.4). Equation (5.2.3) implies
the Bernoulli shift representation. By the triangle inequality we have
⎛ |∞ |ν ⎞1/ν
( ) |Σ |
| |
cl (wT
1/ν
. E|yi | = E| β 1,0 + Ei−l )|
ν
(5.2.5)
| i−l |
l=0

Σ
≤ |cl |((E||wi−l ||ν )1/ν ||β 1,0 || + (E|Ei−l |ν )1/ν )
l=0

< ∞.

We introduce


g(ηk , . . . , ηi−j +1 , ηi−j,i,j ∗
, ηi−j if k > i − j,
(0) −1,i,j , . . .),
.z = ∗ ∗
k,i,j g(ηk,i,j , ηk−1,i,j , . . .), if k ≤ i − j,


where .{ηk,i,j , −∞ < i, j, k < ∞} are independent copies of .η0 , as in Defini-
tion 3.1.1. Let .zk,i,j = ((xk,i,j )T , Ek,i,j )T and .wk,i,j = (1, (zk,i,j )T )T . Now we
(0) (0) (0) (0) (0)

define

Σ
∗ T
.yi,j = cl ((w(0) (0)
i−l,i,j ) β 1,0 + Ei−l,i,j ),
l=0

∗ = g(η , η
which satisfies .yi,j ∗ ∗
i i−1 , . . . , ηi−j +1 , ηi−j,i,j , ηi−j −1,i,j , . . .). Using the

definitions of .yi and .yi,j we have

j −1
Σ

yi − yi,j
. = cl ([wi−l − w(0) (0)
i−l,i,j ] β 1,0 + [Ei−l − Ei−l,i,j ])
T

l=0

Σ
+ cl (wT
i−l β 1,0 + Ei−l )
l=j

Σ
cl ((wi−l,i,j )T β 1,0 + Ei−l,i,j ).
(0) (0)

l=j
236 5 Parameter Changes in Time Series Models

(0)
Observing that .(zi , zi,k ) has the same distribution as .(zi , zk,i,j ), by .Lν –
decomposability and the triangle inequality we have
⎛ | |ν ⎞1/ν
|Σ |
|j −1 |
. ⎝E | | ⎠
(0) T
| cl ([wi−l − wi−l,i,j ] β 1,0 |
| l=0 |
j −1
Σ (0) (0)
≤ |cl |((E||wi−l − wi−l,i,j ||ν )1/ν ||β 1,0 || + (E|Ei−l − Ei−l,i,j |ν )1/ν )
l=0
⎛ ⎞
j −1
Σ
=O⎝ l−α ⎠ .
l=1

Repeating the arguments used in (5.2.5),


⎛ | |ν ⎞1/ν ⎛ ⎞
|∞ | ∞
|Σ | Σ
. ⎝E | cl (wT | ⎠ =O⎝
i−l β 1,0 + Ei−l )| ρl⎠
|
|l=j | l=j

and
⎛ | |ν ⎞1/ν ⎛ ⎞
|Σ |
|∞ | Σ∞
. ⎝E | cl ((wi−l,i,j )T β 1,0 + Ei−l,i,j )|| ⎠ = O ⎝ ρl⎠ ,
(0) (0)
|
|l=j | l=j

completing the first part of Lemma 5.2.1.


Regarding the second part,
⎛ ⎞2/ν
. E||zi zT ∗ ∗ T ν/2
i − zi,j (zi,j ) |
⎛ ⎞2/ν
= E||zi zT ∗ T ∗ T ∗ ∗ T ν/2
i − zi (zi,j ) + zi (zi,j ) − zi,j (zi,j ) ||
⎛ ⎞2/ν ⎛ ⎞2/ν
≤ E||zi zT ∗ T ν/2
i − zi (zi,j ) || + E||zi (z∗i,j )T − z∗i,j (z∗i,j )T ||ν/2
⎛ ⎞2/ν ⎛ ⎞2/ν
= E(||zi ||||zi − z∗i,j ||)ν/2 + E(||z∗i,j ||||zi − z∗i,j ||)ν/2

≤ (E||zi ||ν )1/ν (E||zi − z∗i,j ||ν )1/ν + (E(||z∗i,j ||ν )1/ν (E||zi − z∗i,j ||ν )1/ν
⎛ ⎞
= O j −α+1 .

This shows (ii). ⨆



5.2 Dynamic Regression Models 237

Lemma 5.2.2 If .H0 of (5.2.2) and Assumptions 5.2.1–5.2.3 hold, then


⎛ ⎞
||β̂ N − β 0 || = OP N −1/2 .
.

Proof It follows from Lemma 5.2.1 and Theorem A.1.3 that


|| || ⎛ ⎞
|| 1 T ||
|| X XN − A|| = OP N −1/2
.
|| N N ||

and therefore
|| ||
||⎛ T ⎞−1 1 || ⎛ ⎞
|| − A|| = OP N −3/2 .
.
|| XN XN N || (5.2.6)

Under the null hypothesis


⎛ ⎞−1
β̂ N = β 0 + XT
. N XN XT
N EN , (5.2.7)

where .EN = (E1 , . . . , EN )T . Putting together Assumption 5.2.3, Lemma 5.2.1 and
Theorem A.1.3 we conclude
⎛ ⎞
T
.||XN EN || = OP N
1/2
.

The result now follows from (5.2.6) and (5.2.7). ⨆



Now we ready to prove the main result of this section.
Proof of Theorem 5.2.1 By definition, under the null hypothesis

Êi = Ei + xT
. i (β 0 − β̂ N )

and therefore

xi Êi = xi Ei + xi xT
. i (β 0 − β̂ N ).

Thus we get

Σ
k
k Σ
N Σ
k
k Σ
N
. xi Êi − xi Êi = xi Ei − xi Ei
N N
i=1 i=1 i=1 i=1
⎛ k ⎞
Σ k Σ T
N
+ xi xT
i − xi xi (β 0 − β̂ N ). (5.2.8)
N
i=1 i=1
238 5 Parameter Changes in Time Series Models

Due to Lemma 5.2.1 we can use Theorem A.1.3 to obtain a Gaussian approximation
for the CUSUM processes on the right hand side of (5.2.8). However, according
to Lemma 5.2.2, the second term is much smaller than the first. So we can
define Gaussian processes .{BN (t), 0 ≤ t ≤ 1} satisfying .EBN (t) = 0 and
T
.EBN (t)B (s) = D(min(t, s) − ts) such that
N

1
N 1/2−ζ
. sup ||ẐN (t) − BN (t)|| = OP (1) (5.2.9)
1/(N +1)≤t≤N/(N +1) [t (1 − t)]ζ

with some .ζ < 1/2. Due to the weighted approximation in (5.2.9), the same
techniques used in Sect. 4.1.1 can be used to prove the results. ⨆

We only considered the asymptotic distribution of the supremum functionals of
ẐN (t) in the dynamical model of (5.2.1). Using the weighted approximation these
.

results can be extended to .Lp functionals.


The application of Theorem 5.2.1 requires the estimation of the long–run
covariance matrix .D, the long–run covariance of the vectors .{xi Ei , i ∈ Z}. We need
to replace .Ei with the corresponding residual .Êi = yi − xT
i β̂ N , .i ∈ {1, . . . , N } and
we can use the kernel–bandwidth long–run covariance matrix estimator of Sect. 3.1
computed from .xi Êi , .i ∈ {1, . . . , N }. One can show that under the conditions
of Theorem 5.2.1 the kernel long–run covariance estimator converges to .D in
probability under minor conditions on the kernel and bandwidth.
The consistency of testing procedures based on Theorem 5.2.1 follow similarly
to the results in Sect. 4.2.

5.3 Random Coefficient Autoregressive Models

A popular generalization of the classical AR(1) model is the random coefficient


autoregressive process of order one (RCA(1)). An RCA(1) model for a time series
.{yi , i ∈ Z} takes the form

yi = (β + Ei,1 )yi−1 + Ei,2 ,


.

where .{Ei,1 , i ∈ Z} and .{Ei,2 , i ∈ Z} are independent innovation series. If we wish


to check for the stability of the regression coefficient .β as in Sect. 5.1.1, we may
consider the alternative AMOC model

(β0 + Ei,1 )yi−1 + Ei,2 , if 1 ≤ i ≤ k ∗ ,
.yi = (5.3.1)
(βA + Ei,1 )yi−1 + Ei,2 , if k ∗ + 1 ≤ i ≤ N.

We wish to test

H0 : β0 = βA ,
. (5.3.2)
5.3 Random Coefficient Autoregressive Models 239

against

HA : β0 /= βA .
. (5.3.3)

The following assumptions are typically made of the innovations.


Assumption 5.3.1
(i) .{Ei,1 , i ∈ Z} and .{Ei,2 , i ∈ Z} are independent sequences,
(ii) .{Ei,1 , i ∈ Z} are independent and identically distributed random variables with
.EE0,1 = 0, .0 < EE
0,1 = σ1 < ∞ and .E|E0,1 | < ∞,
2 2 4

(iii) .{Ei,2 , i ∈ Z} are independent and identically distributed random variables with
.EE0,2 = 0, .0 < EE
0,2 = σ2 < ∞ and .E|E0,2 | < ∞.
2 2 4

We note that .EE0,1 = EE0,2 = 0 is needed to identify the regression parameters


in the definition of .yi . To avoid assumptions on the moments of the .yi ’s we
use weighted least squares to estimate the regression parameter. We divide the
observations into two subsets at a candidate change point k, and compute the
weighted least squares estimate of .β with weights .1/(1 + yi−1
2 ), resulting in

⎛ k ⎞⎛ k ⎞−1
Σ yi yi−1 Σ yi−1 2
β̂k,1
. = , 2 ≤ k ≤ N, (5.3.4)
i=2
1 + yi−1
2
i=2
1 + yi−1
2

and
⎛ ⎞⎛ ⎞−1
Σ
N
yi yi−1 Σ
N 2
yi−1
β̂k,2
. = , 1 ≤ k ≤ N − 1.
i=k+1
1 + yi−1
2
i=k+1
1 + yi−1
2

The following result characterizes the asymptotic properties of the process




⎪ 0, if 0 ≤ t < 2/(N + 1),
⎨ 1/2
N [t (1 − t)](β̂L(N +1)t⎦ ,1 − β̂L(N +1)t⎦ ,2 ),
.QN (t) =

⎪ if 2/(N + 1) ≤ t ≤ 1 − 2/(N + 1),

0, if 1 − 2/(N + 1) < t ≤ 1.

If .−∞ ≤ E log |β0 +E0,1 | < 0, then the solution of (5.3.1) under the null hypothesis
is close to .ȳi , the unique, causal, stationary solution of

ȳi = (β0 + Ei,1 )ȳi−1 + Ei,2 ,


. i ∈ Z. (5.3.5)
240 5 Parameter Changes in Time Series Models

The asymptotic variance of .QN (t) depends on if (5.3.1) has a stationary solution or
not:
⎧⎡ ⎛ ⎞2 ⎛ ⎞2 ⎤ ⎛ ⎞−2

⎪ ȳ 2 ȳ02
⎪ ȳ
σ2 ⎦ E
⎨ ⎣E
⎪ 0 0
σ1 + E
2 2
,
1 + ȳ02 1 + ȳ02 1 + ȳ02
.η =
2
(5.3.6)



⎪ if − ∞ ≤ E log |β0 + E 0,1 | < 0,
⎩ 2
σ1 , if E log |β0 + E0,1 | ≥ 0.

Theorem 5.3.1 We assume that .H0 of (5.3.2), Assumptions 1.2.1 and 5.3.1 hold,
and .−∞ ≤ E log |β0 + E0,1 | < 0.
(i) If .I (w, c) < ∞ for some .c > 0, then

1 1 D |B(t)|
. sup |QN (t)| → sup ,
η 0<t<1 w(t) 0<t<1 w(t)

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge and .η is defined in (5.3.6).


(ii) For all .x ∈ R
⎧ ⎛ ⎞ ⎞
1 k(N − k) 1/2
. lim P a(log N) max |β̂k,1 − β̂k,2 | ≤ x + b(log N)
N→∞ η 1<k<N N
= exp(−2e−x ),

where .a(x) and .b(x) are defined in (1.2.18).


If .H0 holds, then
⎛ k ⎞−1 ⎛ k ⎞
Σ 2
yi−1 Σ yi−1
2 E
i,1 + yi−1 Ei,2
. β̂k,1 − β̂k,2 = (5.3.7)
i=2
1 + yi−1
2
i=2
1 + yi−1
2

⎛ ⎞−1 ⎛ ⎞
Σ
N 2
yi−1 Σ
N 2 E
yi−1 i,1 + yi−1 Ei,2
− .
i=k+1
1 + yi−1
2
i=k+1
1 + yi−1
2

Under the null hypothesis, the recursion defining .yi takes the form

yi = ρi yi−1 + Ei,2 ,
. 1 ≤ i < ∞, (5.3.8)

where .ρi = β0 + Ei,1 . We can solve the recursion in (5.3.8) explicitly. Equation
(5.3.8) implies that

Σ
i | |
i | |
i Σ
i−1 | |
l | |
i
.yi = El,2 ρj + y0 ρj = Ei−l,2 ρi−j +1 + y0 ρj .
l=1 j =l+1 j =1 l=0 j =1 j =1
5.3 Random Coefficient Autoregressive Models 241

If

. − ∞ ≤ E log |ρ0 | = −κ̄, where κ̄ > 0, (5.3.9)

then the unique causal (non-anticipative) solution of (5.3.5) is


Σ | |
l
ȳi =
. Ei−l,2 ρi−j +1 . (5.3.10)
l=0 j =1

We note that

Σ
i−1 | |
l | |
i
yi =
. Ei−l,2 ρi−j +1 + y0 ρj . (5.3.11)
l=0 j =1 j =1

The proof of Theorem 5.3.1 is based on two lemmas. In the first it is shown that
the .yi ’s can be replaced with the stationary .ȳi ’s with an asymptotically negligible
consequence. In the second lemma we show that the stationary sequence .{ȳi , i ∈ Z}
is .Lν decomposable.
Lemma 5.3.1 If .H0 of (5.3.2), Assumption 5.3.1 are satisfied and .−∞ ≤
E log |β0 + E0,1 | < 0, then
|
∞ | 2
|
Σ 2 E |
| yi−1 Ei,1 ȳi−1 i,1 |
. | − 2 ||
< ∞ a.s., (5.3.12)
| 1 + yi−1
2 1 + ȳi−1
i=2

∞ |
| |
Σ | yi−1 Ei,2 ȳi−1 Ei,2 ||
. | − 2 ||
< ∞ a.s., (5.3.13)
| 1 + yi−1
2 1 + ȳi−1
i=2

and
∞ |
| |
Σ 2 2 |
| yi−1 ȳi−1 |
. | − | < ∞ a.s., (5.3.14)
| 1 + yi−1
2 1 + ȳi−1 |
2
i=2

Proof Since .E(log |ρ0 | + κ̄/2) = −κ̄/2 < 0, by Lemma 2 of Aue et al. (2006)
there are .ν1 > 0 and .c1 < 1 such that

E|eκ̄/2 ρ0 |ν1 = c1 < 1.


.

Also, Lemma 2 yields that

E|ȳ02 |ν2 < ∞


.
242 5 Parameter Changes in Time Series Models

with some .ν2 > 0. Thus we get


⎛ ⎞ ⎛ ⎞
| |
i Σi Σi
. |ρj | = exp ⎝ log |ρj |⎠ = e−i κ̄/2 exp ⎝ (log |ρj | + κ̄/2)⎠ (5.3.15)
j =1 j =1 j =1

and
⎛ ⎛ ⎞⎞ν1
Σi
.E ⎝exp ⎝ (log |ρj | + κ̄/2)⎠⎠ ≤ c2 for all 1 ≤ i < ∞. (5.3.16)
j =1

We write
| |
| y2 ȳi2 ||
| i
.| − | ≤ 2|yi − ȳi ||yi + ȳi |
| 1 + yi2 1 + ȳi2 |

and therefore
| |

Σ | y2 ȳi2 || Σ∞
|
. |Ei | | i
− | ≤ 2 |Ei ||yi−1 + ȳi−1 ||yi−1 − ȳi−1 |.
| 1 + yi−1
2 2 |
1 + ȳi−1
i=2 i=2

We can assume that .0 < ν3 = min(ν1 , ν2 )/3 < 1 and we conclude


⎛∞ | |⎞ν3
Σ | y2 ȳi2 ||
|
.E |Ei | | i
− 2 ||
| 1 + yi−1
2 1 + ȳi−1
i=2

Σ
≤ 2ν3 E (|Ei ||yi−1 + ȳi−1 ||yi−1 − ȳi−1 |)ν3
i=2
∞ ⎛
Σ ⎞1/3
≤ 2ν3 E|Ei |3ν3 E|yi−1 + ȳi−1 |3ν3 E|yi−1 − ȳi−1 |3ν3
i=2

Σ
≤ c3 e−3i κ̄ν3 /2 < ∞,
i=2

completing the proof of (5.3.12). Similar arguments give (5.3.13) and (5.3.14). ⨆

As a result of this we only need to work with the stationary solution. It follows
from (5.3.10) .{ȳi , i ∈ Z} is a Bernoulli shift, namely there is a function .g : R∞ |→
R such that

.ȳi = g(E i , E i−1 , . . .), E i = (Ei,1 , Ei,2 )T . (5.3.17)


5.3 Random Coefficient Autoregressive Models 243

Let .{E ∗i , i ∈ Z} be independent copies of .E 0 , independent of .{E i , i ∈ Z} and define

yi,l = g(E i , E i−1 , . . . , E i−l−1 , E ∗i−l , E ∗i−l−1 , . . .),


. l ≥ 1.

Lemma 5.3.2 If .H0 of (5.3.2), Assumption 5.3.1 are satisfied and .−∞ ≤
E log |β0 + E0,1 | < 0, then
| |
|E 2 2 |4
Ei+1,1 yi,l
| i+1,1 ȳi |
.E | − | ≤ al−α , (5.3.18)
| 1 + ȳi2 1 + yi,l |
2

| |4
|E Ei+1,2 yi,l ||
| i+1,2 ȳi
.E | − 2 ||
≤ al−α (5.3.19)
| 1 + ȳi2 1 + yi,l

and
| |4
| ȳ 2 2
yi,l |
| i |
.E | − 2 ||
≤ al−α (5.3.20)
| 1 + ȳi2 1 + yi,l

for all .α > 0 with some .a > 0.


Proof Using (5.3.11) we have

ȳ0 = zl + ul
.

and

y0,l = zl + u∗l ,
.

where

Σ
l−1 | |
j | |
l−1
.zl = E−j,2 ρ−k+1 , ul = ȳ−l ρ−j
j =0 k=1 j =0

and

| |
l−1
u∗l = ȳ−l
.
∗ ∗
ρ−j ,
j =0
244 5 Parameter Changes in Time Series Models

where

ȳl∗ = g(E ∗−l , E ∗−l−1 , · · · ) and ρ−j


.
∗ ∗
= β0 + E−j,1 .

Let .β > 0. Using (5.3.15) and (5.3.16) we get via Markov’s inequality,
⎛ ⎞4
ȳ02 −β
E
. 1{|ul | > l } ≤ P {|ul | > l−β } ≤ c1 lβν1 E|ul |ν1 e−lν2 ≤ c2 e−lν3
1 + ȳ02

with some constants .ν1 > 0, ν2 > 0, ν3 > 0 and .c1 , .c2 = c2 (β). Similarly,
⎛ 2
⎞4
y0,l
E
. 1{|u∗l | >l −β
} ≤ c2 e−lν3 .
1 + y0,l
2

Elementary calculations give


| |
| ȳ 2 2
y0,l | 4|z |(|u | + |u∗ |) 2(u2 + (u∗ )2 )
| 0 | l l
.| − |≤ l
+ l l
.
| 1 + ȳ0
2 1 + y0,l |
2 1 + ȳ02 1 + ȳ02

We can assume .l is so large that .l−β < 1/8 and therefore

|zl |
. ≤ 4.
1 + (zl + ul )2

On the set .|ul | ≤ l−β and .|u∗l | ≤ l−β


| |
| ȳ 2 2
yi,l |
| i |
.| − 2 ||
≤ c3 l−β .
| 1 + ȳi2 1 + yi,l

Since .E1 and .(ȳ0 , y0,l ) are independent the proof of (5.3.18) is complete since it
does not depend on i. Similar arguments give (5.3.19) and (5.3.20). ⨅

Now we are ready to prove Theorem 5.3.1.
Proof of Theorem 5.3.1 Let

ȳ02
.a0 = E .
1 + ȳ02
5.3 Random Coefficient Autoregressive Models 245

Combining Lemma 5.3.20 with the approximations in Theorem A.1.1, we can define
Wiener processes .{WN,1 (x), x ≥ 0} such that
| k ⎛ ⎞ |
|Σ ȳi2 |
1 | |
. sup | − a0 − c1 WN,1 (k)| = OP (1) (5.3.21)
1≤k≤N k ζ1 | 1 + ȳi2 |
i=1

with some .c1 > 0 and .ζ1 < 1/2. It follows from (5.3.21) and the law of the iterated
logarithm that
| k ⎛ ⎞|
|Σ ȳi2 |
1 | |
. max | − a0 | = OP (1)
1≤k≤N k ζ2 | 1 + ȳi2 |
i=1

for all .ζ2 > 1/2 and therefore


|⎛ ⎞−1 |
| Σ |
| 1
k 2
ȳi 1 ||
. max k 2 | − = OP (1).
ζ
| k + 2 a0 ||
1≤k≤N | i=1
1 ȳi

Theorem A.1.1 implies, that we can define Wiener processes .{WN,2 (x), x ≥ 0} such
that
| N ⎛ ⎞ |
| Σ ȳi2 |
1 | |
. sup | − a0 − c W
1 N,2 (N − k) | (5.3.22)
1≤k<N (N − k) 1 | 1 + ȳi2 |
ζ
i=k+1

= OP (1)

which implies
|⎛ ⎞−1 |
| Σ |
|
ζ2 | 1
N 2
ȳi 1 ||
. max (N − k) − | = OP (1).
| N −k 1 + ȳi2
1≤k<N | i=k+11
a0 |

Using the decomposability of the Bernoulli shifts in (5.3.19) and (5.3.20) and the
approximations in Theorem A.1.1, we can define two independent Wiener processes
.{WN,3 (x), 0 ≤ x ≤ N/2} and .{WN,4 (x), 0 ≤ x ≤ N/2} such that

| Lx⎦ 2 |
1 || Σ ȳi−1 Ei,1 + ȳi−1 Ei,2 |
. max | − a0 ηWN,3 (x)|| = OP (1)
1≤x≤N/2 x ζ 3
i=2
1 + ȳi−1
2

and
| Σ |
1 | N 2 E
ȳi−1 i,1 + ȳi−1 Ei,2 |
max | − a0 ηWN,4 (N − x)|| = OP (1)
N/2≤x≤N −1 (N − x)ζ3 |
.

i=Lx⎦ +1
1 + ȳi−1
2
246 5 Parameter Changes in Time Series Models

with some .ζ3 < 1/2. Putting together the approximations we get
|⎛ ⎞−1 || |
| | |Σ |
1 | 1Σ k 2
ȳi−1 1 | | k ȳi−1
2 E
i,1 + ȳi−1 Ei,2 |
. max | − || | = OP (1)
1≤k≤N k ζ4 | k 1 + ȳi−1
2 a0 || | 1 + ȳi−1
2 |
| i=2 i=2

and
|⎛ ⎞−1
| Σ
1 | 1
N 2
ȳi−1
max |
.
1≤k≤N−1 (N − k)ζ4 | N −k 1 + ȳi−1
2
| i=k+1
|| N |
1 || || Σ ȳi−1
2 E |
i,1 + ȳi−1 Ei,2 |
− || | = OP (1)
a0 | 1 + ȳi−1
2 |
i=k+1

for all .ζ4 > 0. It follows from (5.3.7)


|
1 || x(N − x)
max (β̂Lx⎦ ,1 − β̂Lx⎦ ,2 ) (5.3.23)
2≤x≤N/2 x ζ5 |
.
N
⎛ |
x ⎡ ⎤⎞ |
− η WN,3 (x) − WN,4 (N/2) + WN,3 (N/2) || = OP (1)
N

and
|
1 | x(N − x)
max | (β̂Lx⎦ ,1 − β̂Lx⎦ ,2 ) (5.3.24)
N/2≤x≤N −1 (N − x) 5 |
.
ζ N
⎛ ⎞|
N −x ⎡ ⎤ |
− η −WN,4 (N − x) + WN,4 (N/2) + WN,3 (N/2) || = OP (1)
N

with some .ζ5 < 1/2. By computing the covariance functions we see that

BN (t)
. (5.3.25)
⎧ −1/2 ( ⎡ ⎤)
⎨N W (N t) − t WN,4 (N/2) + WN,3 (N/2) , 0 ≤ t ≤ 1/2,
( N,3 ⎡ ⎤)
= N −1/2 −WN,4 (N (1 − t)) + (1 − t) WN,4 (N/2) + WN,3 (N/2) ,

1/2 ≤ t ≤ 1

is a Brownian bridge for each N. Due to the approximations (5.3.23) and (5.3.24),
the result now follows as in Sects. 1.2 and 1.2.1. ⨆

5.3 Random Coefficient Autoregressive Models 247

If .E log |β0 + E0,1 | ≥ 0, i.e. there is no stationary solution .{yi , i ∈ Z} to (5.3.1)


under .H0 , we need an additional regularity condition:
Assumption 5.3.2
(i) if .E log |β0 + E0,1 | = 0, then .E0,2 has a bounded density
(ii) if .E log |β0 + E0,1 | > 0, then .P {(β0 + E0,1 )y0 + E0,2 = c} = 0 for all c.
Assumption 5.3.2 implies, as we see in the proofs below, that in both cases .|yi |
converges in probability to .∞.
Theorem 5.3.2 We assume that .H0 of (5.3.2), Assumptions 1.2.1, 5.3.1, 5.3.2 hold
and .E log |β0 + E0,1 | ≥ 0.
(i) If .I (w, c) < ∞ for some .c > 0, then

1 D |B(t)|
. sup |QN (t)| → sup ,
0<t<1 w(t) 0<t<1 w(t)

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge.


(ii) For all .x ∈ R
⎧ ⎛ ⎞1/2 ⎞
k(N − k)
. lim P a(log N) max |β̂k,1 − β̂k,2 | ≤ x + b(log N)
N →∞ 1<k<N N
−x
= exp(−2e ),

where .a(x) and .b(x) are defined in (1.2.18).


Proof We show that the approximations in (5.3.23) and (5.3.24) hold in the non-
stationary case too. These approximations imply the desired result using the limit
results developed in 1.2 and 1.2.1.
Lemma A.4 of Horváth and Trapani (2016) implies that there are constants .0 <
δ < 0 and .c1 > 0 such that

P {|yi | ≤ i δ } ≤ c1 i −δ .
. (5.3.26)

It follows from (5.3.26)

1 1 1
E
. =E 1{|yi | ≤ i δ } + E 1{|yi | ≥ i δ } ≤ P {|yi | ≤ i δ }
1 + yi
2 1 + yi
2 1 + yi2
+ (1 + i 2δ )−1 ≤ c2 i −δ (5.3.27)
248 5 Parameter Changes in Time Series Models

with some constant .c2 . By Markov’s inequality we have for all .x > 0 and .ζ1 > 1−δ
⎧ ⎞ ⎧ ⎞
1 Σ 1 1 Σ 1
k k
P
. max > x ≤P max max >x
M≤k<∞ k ζ1
i=1
1 + yi2 log M≤l<∞ el ≤k<el+1 k ζ1
i=1
1 + yi2

⎧ ⎞
Σ 1 Σ 1
k
≤ P max >x
ζ
el ≤k<el+1 k 1
l=log M
1 + yi2 i=1


⎧ ⎞
Σ 1 Σ
k
≤ P max > xeζ1 l
l=log M
el ≤k<el+1
i=1
1 + yi
2

⎧ ⎫

Σ ⎨exp(l+1)
Σ 1 ⎬
= P > xe ζ1 l
⎩ 1 + yi2 ⎭
l=log M i=1


Σ Σ
exp(l+1)
1 1
≤ e−ζ1 l E
x
l=log M i=1
1 + yi2


Σ Σ
exp(l+1)
c2
≤ e−ζ1 l i −δ
x
l=log M i=1
c3
≤ M −(ζ1 −(1−δ)) .
x
Hence there is .ζ2 < 1 such that for all .x > 0
⎧ ⎞
1 Σ 1
k
. lim P max ζ >x =0 (5.3.28)
M→∞ M≤k<∞ k 2
i=1
1 + yi2

and similar arguments give


⎧ ⎞
1 ΣN
1
. lim lim sup P max >x = 0. (5.3.29)
M→∞ N →∞ 1≤k≤N −M (N − k) ζ 2 1 + yi2
i=k+1

We obtain from (5.3.28) and (5.3.29)


| ⎛ ⎞−1 |
| Σ |
|
1−ζ2 |
k
y 2 |
. max k k i−1
− 1| = OP (1)
| |
| i=2 1 + yi−1
2
1≤k≤N |
5.3 Random Coefficient Autoregressive Models 249

and
| ⎛ N ⎞−1 |
| |
| Σ y 2 |
. max (N − k)1−ζ2 ||(N − k) i−1
− 1| = OP (1).
|
1≤k≤N −1 | i=k+1
1 + yi−1
2
|

Using Assumption 5.3.1 we get via the Cauchy–Schwarz inequality


⎛ ⎞4
Σ
j
El,1 ⎠
E⎝
.

l=i
1 + yl−1
2

⎛ ⎞4 ⎡⎛ ⎞2 ⎛ ⎞2 ⎤
Σ
j
1 Σ El,1 El' ,1
= 4
EEl,1 E + E⎣ ⎦
l=i
1 + yl−1
2
i≤l/=l' ≤j
1 + yl−1
2 1 + yl2' −1
⎛ ⎞4
Σ
j
1
≤ 4
EE0,1 E
l=i
1 + yl−1
2

⎡ ⎛ ⎡ ⎛ ⎞4 ⎤1/2
Σ ⎞4 ⎤1/2
1 ⎣E El' ,1 ⎦
+ 4
EE0,1 E
1 + yl−1 1 + yl2' −1
i≤l/=l' ≤j
⎡⎛ ⎛ ⎞2 ⎞2 ⎤
Σ
j Σj
4 ⎢⎝ ⎥
b ⎠ +⎝ bl−1 ⎠ ⎦
1/2
≤ EE0,1 ⎣ l−1
l=i l=i

⎛ ⎞2
Σj
4 ⎝
≤ 2EE0,1 bl−1 ⎠ ,
l=i

where
1
bl = E
. < 1.
(1 + yl2 )4

As with the proof of (5.3.27)

bl = O(l−δ ),
. as l → ∞.

Theorem 3.1 of Móricz et al. (1982) implies

| k ⎛ ⎞|4 ⎛ ⎞2
|Σ 2 | Σ
j
| yi−1 |
.E max | Ei,1 − 1 | ≤ c4 ⎝ bl−1 ⎠ .
2≤k≤j | 1 + yi−1
2 |
i=2 l=1
250 5 Parameter Changes in Time Series Models

Hence by Markov’s inequality for all .x > 0


⎧ | k | ⎞
|Σ Ei,1 y 2 Σ
k |
1 | i−1 |
P
. max | − Ei,1 | > x
2≤k≤N k ζ3 | 1 + yi−1
2 |
i=2 i=2
⎧ | k | ⎞
|Σ E |
1 | i,1 |
≤P max max | 2 ||
>x
0≤l≤log N el ≤k≤el+1 k ζ3 | 1 + yi−1
i=2
⎧ | k | ⎞
Σ 1 ||Σ Ei,1 ||
log N
≤ P max | 2 ||
>x
el ≤k≤el+1 k ζ3 | 1 + yi−1
l=0 i=2
⎧ | k | ⎞
Σ
log N |Σ E |
| i,1 |
≤ P max | 2 ||
> xe ζ3 l
el ≤k≤el+1 | 1 + yi−1
l=0 i=2
| k |4
1 Σ −4ζ3 l
log N |Σ E |
| i,1 |
≤ 4 e E max | 2 ||
x 1≤k≤e l+1 | 1 + y i−1
l=0 i=2

c5 Σ −4ζ3 l 2l(1−δ)
log N
≤ e e
x4
l=0
c6
≤ 4,
x

with the choice of .(1 − δ) < ζ3 < 1/2. Hence


| k |
|Σ Ei,1 y 2 Σ
k |
1 | i−1 |
. max | − E i,1 | = OP (1), (5.3.30)
2≤k≤N k ζ4 | 1 + yi−1
2 |
i=2 i=2

where .ζ4 < 1/2. Similar argument gives


| N |
| Σ Ei,1 y 2 Σk |
1 | i−1 |
. max | − E i,1 | = OP (1). (5.3.31)
2≤k<N (N − k)ζ4 | 1 + y 2
i−1
|
i=k+1 i=2

Following the proof of (5.3.30) and (5.3.31), one can verify,


| k |
|Σ E y |
1 | i,1 i−1 |
. max | | = OP (1),
2≤k≤N k ζ5 | 1 + yi−1 |
2
i=2
5.3 Random Coefficient Autoregressive Models 251

and
| N |
| Σ E y |
1 | i,1 i−1 |
. max | 2 ||
= OP (1)
1≤k<N (N − k)ζ5 | 1 + yi−1
i=k+1

with some .ζ5 < 1/2. We proved again approximations for .k(N − k)(β̂k,1 − β̂k,2 )
with CUSUM processes:
| ⎛ k ⎞|
| k(N − k) Σ k Σ
N |
1 | |
. max | (β̂k,1 − β̂k,2 ) − Ei,1 − Ei,1 | = OP (1)
1≤k≤N/2 k ζ6 | N N |
i=1 i=1

and
|
1 | k(N − k)
max | (β̂k,1 − β̂k,2 )
.
N/2≤k≤N −1 (N − k)ζ6 | N
⎛ ⎞|
Σ N
N −k Σ
N |
|
− − Ei,1 + Ei,1 | = OP (1)
N |
i=k+11 i=1

with some .ζ6 < 1/2. We can use again Theorem A.1.1 and define two independent
Wiener processes .{WN,1 (x), 0 ≤ x ≤ N/2} and {.WN,2 (x), 0 ≤ x ≤ N/2} such
that
| k |
1 ||Σ |
|
. max | E i,1 − σ1 WN,1 (k) | = OP (1)
1≤k≤N/2 k ζ 7 | |
i=1

and
| N |
| Σ |
1 | |
. max | Ei,1 − σ1 WN,2 (N − k)| = OP (1)
1≤k≤N/2 (N − k)ζ7 | |
i=k+1

with some .ζ7 < 1/2. The theorem now follows from the results in Sects. 1.2
and 1.2.1. ⨆

Using the weighted least squares estimators, we may define a test of .H0 versus
HA along the lines of Theorem 5.1.2. Another view that leads to this considering
.

the process .QN is the following. The estimator .β̂k,1 is the solution of the equation

Lk (β̂k,1 ) = 0,
.
252 5 Parameter Changes in Time Series Models

where

Σk
(yi − βyi−1 )yi−1
Lk (β) =
. .
i=2
1 + yi−1
2

If .H0 holds, then .Lk (β̂N,1 ) should be close to 0 for all .2 ≤ k ≤ N. Using the
formula for .β̂N,1 in (5.3.4) we obtain that
⎛N ⎞−1 k
Σk
yi yi−1 ΣN
yi yi−1 Σ yi−1 2 Σ yi−1 2
.Lk (β̂N,1 ) = −
i=2
1 + yi−1
2
i=2
1 + yi−1
2
i=2
1 + yi−1
2
i=2
1 + yi−1
2

⎛ k ⎞⎛ N ⎞
Σ yi−1 2 Σ 2
yi−1
= (β̂k,1 − β̂k,2 )
i=2
1 + yi−1
2
i=k+1
1 + yi−1
2

≈ a02 k(N − k)(β̂k,1 − β̂k,2 )

according to the proof of Theorem 5.3.1. Hence if we reject .H0 for large values
of .max2≤k≤N |Lk (β̂N,1 )|, this test is equivalent with rejecting for large values for
.max2≤k≤N k(N − k)|β̂k,1 − β̂k,2 |. These statistics in fact have the same asymptotic

distribution under .H0 .


The applications of Theorems 5.3.1 and 5.3.2 require estimating .η in (5.3.6) that
is consistent regardless of the stationarity properties of the sequence. We estimate
2
.η with

âN,1
.
2
η̂N = 2
,
âN,2

where
⎛ ⎞2
1 Σ yi − β̂N,1 yi−1
N
âN,1
. =
N
i=2
1 + yi−1
2

and

1 Σ yi2
N
âN,2
. = .
N
i=1
1 + yi2
5.3 Random Coefficient Autoregressive Models 253

Corollary 5.3.1 We assume that the conditions of Theorems 5.3.1 or 5.3.2 hold.
(i) If .I (w, c) < ∞ for some .c > 0, then

1 1 D |B(t)|
. sup |QN (t)| → sup ,
η̂N 0<t<1 w(t) 0<t<1 w(t)

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge and .η is defined in (5.3.6).


(ii) For all .x ∈ R
⎧ ⎛ ⎞1/2 ⎞
1 k(N − k)
. lim P a(log N) max |β̂k,1 − β̂k,2 | ≤ x + b(log N )
N →∞ η̂N 1<k<N N
= exp(−2e−x ),

where .a(x) and .b(x) are defined in (1.2.18)


Proof It follows from the definition of the model that
⎡ ⎤2
(yi − β̂N,1 yi−1 )2 = (β0 − β̂N,1 )yi−1 + Ei,1 yi−1 + Ei,2
.

= Ei,1
2 2
yi−1 + Ei,2
2
+ (β0 − β̂N,1 )2 yi−1
2
+ 2(β0 − β̂N,1 )yi−1
2
Ei,1

+ 2(β0 − β̂N,1 )yi−1 Ei,2 + 2yi−1 Ei,1 Ei,2 .

The proofs of Theorems 5.3.1 and 5.3.2 show that in all cases

|β̂N − β0 | = OP (N −1/2 ),
.

1 Σ 1 Σ yi−1
N 4 N 4 E
yi−1 i,1
. = OP (1), = OP (N −1/2 ),
N
i=2
(1 + yi−1
2 )2 N
i=2
(1 + y 2 )2
i−1

1 Σ yi−1 1 Σ yi−1
N 3 E N 3 E E
i−2 i,1 i−2
. = OP (N −1/2 ), = OP (N −1/2 ).
N
i=2
(1 + y 2 )2
i−1
N
i=2
(1 + y 2 )2
i−1

For the leading term we have

1 Σ Ei,1 yi−1 + yi−1 Ei,2


N 2 4 2 2
.
N −1 (1 + yi−1
2 )2
i=2
⎧ ⎛ ⎞2 ⎛ ⎞2

⎪ ȳ 2


⎨E 0
σ1 + E
2 0
σ22 + OP (N −ζ ),
= 1 + ȳ0
2 1 + ȳ0
2

⎪ if E log |β0 + E0,1 | < 0,

⎩ 2
σ1 + OP (N −ζ ), if E log |β0 + E0,1 | ≥ 0
254 5 Parameter Changes in Time Series Models

with some .ζ > 0. Similarly,


⎧ ⎛ ⎞

⎨E ȳ02
+ OP (N −ζ ), if E log |β0 + E0,1 | < 0,
âN,2
. = 1 + ȳ02


1 + OP (N −ζ ), if E log |β0 + E0,1 | ≥ 0,

completing the proof of

η̂N = η + OP (N −ζ )
.

with some .ζ > 0. ⨆



The behaviour of the AR(1) and RCA(1) processes are similar in many respects.
For both we may describe regions for their parameters admitting stationary, bound-
ary end explosive regimes. There is though a striking difference between the AR(1)
and the RCA(1) models when we test for the stability of their regression coefficients.
In the AR(1) model the limit of standard change point detector processes will
depend on the regime (cf. Sects. 5.1 and 5.1.1) while for RCA(1) models, such
processes have same asymptotic behaviour in all cases.
These results may be modified to account for the possibility that the distribution
of RCA innovations .{Ei,1 , Ei,2 , 1 ≤ i ≤ N} are “heteroscedastic”, and might change
at times .1 < m1 < . . . < mM < N satisfying the following assumption.
Assumption 5.3.3

ml = LNτl ⎦ ,
. 1 ≤ l ≤ M, 0 < τ1 < τ2 < . . . < τM < 1.

We use the notation .m0 = 0, mM+1 = N, τ0 = 0 and .τM+1 = 1. To allow changes


in the distributions of the errors we replace Assumption 5.3.1 with
Assumption 5.3.4
(i) .{Ei,1 , i ∈ Z} and .{Ei,2 , i ∈ Z} are independent sequences of independent
random variables,
(ii) for each .1 ≤ l ≤ M + 1, .{Ei,1 , ml−1 < i ≤ ml } are identically distributed
random variables with .EEml ,1 = 0, EEm 2
l ,1
= σm2 l ,1 and .E|Eml ,1 |4 < ∞,
(iii) for each .1 ≤ l ≤ M + 1, .{Ei,2 , ml−1 < i ≤ ml } are identically distributed
random variables with .EEml ,2 = 0, EEm 2
l ,2
= σm2 l ,2 and .E|Eml ,2 |4 < ∞,
(iv) if .log |β0 + Eml ,1 | ≥ 0, then .Em{(
l ,1 has a bounded
) density, .1 ≤ l}≤ M + 1,
(v) if .log |β0 + Eml ,1 | > 0, then .P β0 + εml ,1 yml−1 + εml ,2 = c = 0 for all c.
Each regime .{yi , ml−1 < i ≤ ml }, .1 ≤ l ≤ M + 1 may have a different
distribution, with potentially different variances for the estimators in each regime.
If .−∞ ≤ E log |β0 + El,1 | < 0, then the elements of this subsequence can be
approximated with stationary variables .{ȳl,j , j ∈ Z} defined by the recursion

ȳl,j = (β0 + El,j,1 )ȳl,j −1 + El,j,2 ,


. j ∈ Z,
5.3 Random Coefficient Autoregressive Models 255

where .El,j,1 = Ej,1 , .ml−1 < j ≤ ml , and .El,j,1 , j ∈ Z, j /∈ (ml−1 , . . . , ml ] are


independent and identically distributed copies of .Eml ,1 . The random variables .El,j,2
are defined in the same way. Let
⎧⎡ ⎛ ⎞2 ⎛ ⎞2 ⎤

⎪ ȳ 2


⎨ ⎣E
⎪ l,0 l,0 2 ⎦
2
σl,1 +E σl,2 ,
1 + ȳl,0
2 1 + ȳl,0
2
.ηl =
2



⎪ if − ∞ ≤ E log |β0 + Eml ,1 | < 0,
⎩ 2
σl,1 , if E log |β0 + Eml ,1 | ≥ 0,

and
⎧ ⎛ ⎞

⎨E
2
ȳl,0
, if − ∞ ≤ E log |β0 + Eml ,1 | < 0,
.al = 1 + ȳl,0
2 (5.3.32)


1, if E log |β0 + Eml ,1 | ≥ 0,

1 ≤ l ≤ M + 1. The limit of the process .QN may in this case be expressed in terms
.

of a Gaussian process with covariance function .η̄(min(t, s)) with

Σ
l−1 η2
j ηl2
.η̄(t) = (τ − τj −1 ) +
2 j
(t − τl−1 ), τl−1 < t ≤ τl , (5.3.33)
j =1
aj al2

1 ≤ l ≤ M + 1.

It is more difficult to determine the limit distribution of the standardized process


.QN (t), since its variance at each .t ∈ [0, 1] is not asymptotically proportional to
.t (1 − t). We have seen in the proofs of the standardized Darling–Erdős statistics (cf.

Sect. 1.2.1) that the standardized CUSUM process is determined by its behaviour
near the beginning and end of the observation period. On these intervals the
asymptotic variance .η0 (t, t) is though proportional to .t (1 − t), where

η0 (t, s) = E (┌(t) − t┌(1)) (┌(s) − s┌(1)) = η̄(min(t, s))−t η̄(s)−s η̄(t)+st η̄(1).
.

The following result describes how Theorems 5.3.1 and 5.3.2 change in this
heteroscedastic situation.
Theorem 5.3.3 We assume that .H0 of (5.3.2), Assumptions 1.2.1, 5.3.4 hold.
(i) If .I (w, c) < ∞ for some .c > 0, then

1 D 1
. sup |QN (t)| → sup |┌(t) − t┌(1)|,
0<t<1 w(t) 0<t<1 w(t)

where .{┌(t), 0 ≤ t ≤} is a Gaussian process with .E┌(t) = 0 and .E┌(t)┌(s) =


η̄(min(t, s)), .η̄(t) is defined in (5.3.33).
256 5 Parameter Changes in Time Series Models

(ii) For all .x ∈ R


⎧ ⎞
1
. lim P a(log N) sup 1/2
|QN (t)| ≤ x + b(log N) = exp(−2e−x ),
N →∞ 0<t<1 η0 (t, t)

where .a(x) and .b(x) are defined in (1.2.18).


Proof We can assume without loss of generality that .M ≥ 1, since the .M = 0 case
is already covered in Theorem 5.3.1. We show that the approximations developed in
the proofs of Theorems 5.3.1 and 5.3.2 are valid on each segment .(ml−1 , ml ], 1 ≤
l ≤ M + 1, only the variances of the approximating Gaussian processes will depend
on .l. The approximating process on a segment is independent of the approximating
processes on the other segments.
Let
⎧ 2
⎨ ȳl,i−1 Ei,1 + ȳl,i−1 Ei,2 , if − ∞ ≤ E log |β + E

0 ml ,1 | < 0,
.zi = 1 + ȳl,i−1
2


Ei,1 , if E log |β0 + Eml ,1 | ≥ 0,

if .ml−1 < i ≤ ml , 1 ≤ l ≤ M + 1. We define the sums

ml−1 +j
Σ
.Sl (j ) = zi , if 1 ≤ j ≤ ml − ml−1 , 1 ≤ l ≤ M + 1.
i=ml−1 +1

According to the proofs of Lemma 5.3.1, (if .−∞ ≤ log |β0 + Eml ,1 | < 0) and
Theorem 5.3.2 (if .0 ≤ log |β0 + Eml ,1 | < ∞)
| |
Σ
ml | y 2 Ei,1 + yi−1 Ei,2 |
| i−1 |
. | − Sl (k) | = OP (1), 1 ≤ l ≤ M + 1.
| 1 + yi−1
2 |
i=ml−1 +1

2 E +y
i−1 Ei,2 )/(1 + yi−1 )
This means that we may replace the partial sums of .(yi−1 2
i,1
with the partial sums of the interval stationary .zi ’s.
Using Theorem A.1.1 we may define independent Wiener processes .{WN,l,1 (x),
0 ≤ x ≤ (ml − ml−1 )/2}, {WN,l,2 (x), 0 ≤ x ≤ (ml − ml1 )/2}, 1 ≤ l ≤ M + 1
such that
1 || |
. max ζ
Sl (k) − ηl WN,l,1 (k)| = OP (1),
1≤k≤(ml −ml−1 )/2 k

and
1 | |
. max |Sl (ml ) − Sl (k) − ηl WN,l,2 (k)| = OP (1)
(ml −ml−1 )/2<k<ml −ml−1 (ml − k)ζ
5.3 Random Coefficient Autoregressive Models 257

with some .ζ < 1/2 and all .1 ≤ l ≤ M + 1. Let




⎪ WN,l,1 (x), if 0 ≤ x ≤ (ml − ml−1 )/2,

WN,l,1 ((ml − ml−1 )/2) + WN,l,2 ((ml − ml−1 )/2)
.WN,l (x) =

⎪ −WN,l,2 ((ml − ml−1 ) − x),

if (ml − ml−1 )/2 ≤ x ≤ ml − ml−1 ,

1 ≤ l ≤ M + 1. Now we define
.

Σ
l−1
ηj ηl
ΔN (k) =
. WN,j (mj − mj −1 ) + WN,l (x − ml−1 ),
aj al
j =1

ml−1 < x ≤ ml−1 , 1 ≤ l ≤ M + 1.

Thus we obtain the following approximation for the difference between the estima-
tors:
| ⎛ ⎞|
1 || k(N − k) k |
|
. max | ( β̂k,1 − β̂k,2 ) − Δ N (k) − Δ N (N ) | (5.3.34)
1≤k≤N/2 k ζ N N
= OP (1)

and
|
1 | k(N − k)
max | (β̂k,1 − β̂k,2 )
N/2≤k≤N −1 (N − k) |
.
ζ N
⎛ ⎞|
N −k |
− −(ΔN (N ) − ΔN (k)) + ΔN (N ) || (5.3.35)
N
= OP (1).

Observing that
{ }
D
. N −1/2 ┌N (N t), 0 ≤ t ≤ 1 = {┌(t), 0 ≤ t ≤ 1} ,

η0 (t, t) is proportional to .t (1 − t), if .0 < t ≤ τ1 or .τM < t < 1, we can repeat the
.

arguments in Sect. 1.2.


The proof of the second part of the theorem is based again on the approximations
in (5.3.34) and (5.3.35). First we observe
⎧ ⎛
1 1
. lim P sup1/2
|QN (t)| = max sup 1/2
|QN (t)|,
N →∞ 0<t<1 η0 (t, t) t1 ≤t≤t2 η
0 (t, t)
⎞⎞
1
sup 1/2
|QN (t)| = 1,
1−t2 ≤t≤1−t1 η0 (t, t)
258 5 Parameter Changes in Time Series Models

where .t1 = (log N )4 /N and .t2 = 1/ log N. We note that


| |
| η (t, t) η2 |
| 0 1|
. sup | − 2 | = O(1/ log N),
t1 ≤t≤t2 | t a1 |

and
| |
| η (t, t) η2 |
| 0 M+1 |
. sup | − 2 | = O(1/ log N ).
1−t1 ≤t≤1−t2 | 1 − t aM+1 |

It follows from (5.3.34) that


| |
| | ⎛ ⎞
| 1 1 | 4(ζ −1/2)
. max | |Q N (t) − WN,1,1 (N t)| = O P (log N )
t1 ≤t≤t2 | η1/2 (t, t) (N t)1/2 |
0

and
| |
| |
| 1 1 |
. max | 1/2 |QN (t) − WN,M+1,2 (N − N t)|
|
1−t2 ≤t≤1−t1 η (N − Nt) 1/2 |
0 (t, t)
⎛ ⎞
= OP (log N)4(ζ −1/2) .

Since .{WN,1,1 (x), N t1 ≤ x ≤ Nt2 } and .{WN,M+1,2 (x), N (1−t2 ) ≤ x ≤ N(1−t1 )}


are independent, it is shown in Section A.3 of Csörgő and Horváth (1997) that
⎧ ⎛
. lim P a(log N) max sup x −1/2 |WN,1,1 (x)|,
N →∞ (log N )4 ≤x≤N/ log N
⎞ ⎞
−1/2
sup (N − x) |WN,M+1,2 (N − x)| ≤ x + b(log N)
N −N/ log N ≤x≤N −(log N )4
⎧ ⎛ ⎞
= lim P a(log N) max sup x −1/2 |WN,1,1 (x)| ≤ x + b(log N)
N →∞ (log N )4 ≤x≤N/ log N
⎧ ⎞
× lim P sup x −1/2 |WN,M+1,2 (x)| ≤ x + b(log N )
N →∞ (log N )4 ≤x≤N/ log N

= exp(−2e−x ).

The proof of Theorem 5.3.3 is now complete. ⨆



5.3 Random Coefficient Autoregressive Models 259

It is more difficult to estimate the variance and covariance functions of the


Gaussian limit of .QN (t), so we consider a modification to reflect the possible
changes in the variances of the errors. Let

L(NΣ
+1)t⎦ 2
1 yi−1
ĉN,1 (t) =
. , and
N
i=2
1 + yi−1
2

1 Σ
N 2
yi−1
ĉN,2 (t) = , 0 ≤ t ≤ 1,
N
i=L(N +1)t⎦ +1
1 + yi−1
2

and define




0, if 0 < t < 2/(N⎛ + 1), ⎞
⎨ N 1/2 ĉ (t)ĉ (t) β̂
N,1 N,2 L(N +1)t⎦ ,1 − β̂L(N +1)t⎦ ,2 ,
.Q̄N (t) =
⎪ if 2/(N + 1) ≤ t < 1 − 2/(N + 1),



0, if 1 − 2/(N + 1) ≤ t < 1.

If the no change in the regression parameter null hypothesis holds, then .cN,1 (t) and
cN,2 (t) converge pointwise to the respective functions
.

Σ
l−1
c1 (t) =
. (τl − τl−1 )al + (t − τl−1 )al , τl−1 < t ≤ τl , 1 ≤ l ≤ M + 1,
j =1

and

.c2 (t) = c1 (1) − c1 (t), 0 ≤ t ≤ 1,

where .al , 1 ≤ l ≤ M + 1 is defined in (5.3.32). We also define

Σ
l−1
b(t) =
. (τj − τj −1 )ηj2 + (t − τl−1 )ηl2 , (5.3.36)
j =1

τl−1 < t ≤ τl , 1 ≤ l ≤ M + 1.

The weak limit of the modified process .Q̄N may be expressed in terms of the
Gaussian process

.O(t) = c2 (t)Δ(t) − c1 (t)(Δ(1) − Δ(t)), 0 ≤ t ≤ 1, (5.3.37)


260 5 Parameter Changes in Time Series Models

where .{Δ(t), 0 ≤ t ≤ 1} is a Gaussian process with .EΔ(t) = 0 and .EΔ(t)Δ(s) =


b(min(t, s)). Elementary calculations show

g(t, s) = EO(t)O(s)
. (5.3.38)
= c21 (1)b(min(t, s)) − c1 (1)c1 (t)b(s) − c1 (1)c1 (s)b(t) + c1 (t)c1 (s)b(1).

Theorem 5.3.4 We assume that .H0 of (5.3.2), Assumptions 1.2.1, 5.3.4 hold.
(i) If .I (w, c) < ∞ for some .c > 0, then

1 D 1
. sup |Q̄N (t)| → sup |O(t)|,
0<t<1 w(t) 0<t<1 w(t)

where .{O(t), 0 ≤ t ≤ 1} is a Gaussian process defined in (5.3.37).


(ii) For all .x ∈ R
⎧ ⎞
1 1
. lim P a(log N) sup 1/2 (t, t) (ĉ 1/2
| Q̄ N (t)| ≤ x + b(log N )
N →∞ 0<t<1 g N,1 (t)ĉN,2 (t))

= exp(−2e−x ),

where .a(x), .b(x) are defined in (1.2.18) and .g(t, s) is given in (5.3.38).
Proof Repeating the arguments used in the proofs of Theorems 5.3.1 and 5.3.2 on
the intervals .(ml−1 , ml ], one can show that
( )
. sup |ĉN,1 (t) − c1 (t)| = OP (log N )−ζ
0<t<1

with some .ζ > 0. Thus we follow the proof of Theorem 5.3.3 to establish
Theorem 5.3.4. ⨆

It may be shown that if the alternative in (5.3.3) is satisfied, such that

N 1/2 |β0 − βA | → ∞,
. (5.3.39)

where here .βA is allowed to depend on N and may converge to .β0 , then

1 P
. sup |Q̄N (t)| → ∞. (5.3.40)
0<t<1 w(t)

According to (5.3.40), we may reject the no change null hypothesis .H0 in (5.3.2) at
asymptotic level .α if

. sup |Q̄N (t)|/w(t) ≥ c(α), (5.3.41)


0<t<1
5.3 Random Coefficient Autoregressive Models 261

where .c(α) satisfies


⎧ ⎞
1
P
. sup |O(t)| ≥ c(α) = α.
0<t<1 w(t)

Computing the covariance functions, one can verify

D
{Δ(t), 0 ≤ t ≤ 1} = {W (b(t)), 0 ≤ t ≤ 1},
.

where .{W (x), 0 ≤ x < ∞} is a Wiener process. The function .b(t) is unknown, but
can be consistently estimated from the sample by

LN t⎦
⎛ ⎞2
1 Σ yi − β̂N,1 yi−1
.b̂N (t) = , 0 ≤ t ≤ 1.
N
i=2
1 + yi−1
2

One can show that under the conditions of Theorem 5.3.4,


( )
. sup |b̂N (t) − b(t)| = OP (log N)−ζ (5.3.42)
0<t<1

with some .ζ > 0. If .|β0 − βA | is bounded, as .N → ∞, then there is a function .b∗ (t)
such that

. sup |b̂N (t) − b∗ (t)| = oP (1). (5.3.43)


0<t<1

Assume that we simulated independent Wiener processes .Wi (x), 1 ≤ i ≤ L, and


compute the empirical distribution function
⎧ ⎞

L
.FN,L (x) = 1 sup |Ôi (t)|/w(t) ≤ x ,
L 0<t<1
i=1

where .Ôi (t) = ĉN,2 (t)Wi (b̂N (t)) − ĉN,1 (t)(Wi (b̂N (1)) − Wi (b̂N (t))). Let .cN,L (α)
be defined as

cN,L (α) = inf{x : FN,L (x) ≥ 1 − α}.


.

Under the null hypothesis


⎧ ⎞
1
. lim P sup |Q̄N (t)| ≥ cN,L (α) = α
min(N,L)→∞ 0<t<1 w(t)
262 5 Parameter Changes in Time Series Models

and under the alternative


⎧ ⎞
1
. lim P sup |Q̄N (t)| ≥ cN,L (α) = 1.
min(N,L)→∞ 0<t<1 w(t)

This means that our procedure provides a test with correct asymptotic size and we
reject the null hypothesis with probability going to 1 under the alternative.
We may also consider Cramér–von Mises type tests, and approximations of their
distributions using principal components as in Sect. 3.3 (cf. (3.3.4) and (3.3.7)). The
consistency of these procedures under multiple changes and heteroscedasticity may
also be established as in Sect. 3.3.

5.4 ARCH, GARCH and Other Volatility Processes

A multitude of time series models have been put forward to capture “volatility” or
“conditional heteroscedasticity” in a time series. These terms refer to the phenomena
when a, often presumed stationary, time series .{yi , i ∈ Z} has the property that the
conditional variance of the process .σi2 = var(yi |Fi−1 ) changes as a function of i,
for a suitably defined information filtration .Fi . In this section we study change point
detection procedures for models of this phenomena. Such models are frequently
applied in financial applications to model the returns on asset prices.
Perhaps the simplest volatility model is the Autoregressive Conditionally Het-
eroscedastic model of order one (ARCH(1)). We provide detailed results on this
volatility process, mainly to explain and highlight the theoretical and computational
issues when we deal with changes in volatility models. The ARCH(1) with a
potential change point in the parameters is defined by the recursions

yi = σi Ei ,
. i ≥ 1, (5.4.1)

and

σi2 = ωi + αi yi−1
.
2
, i ≥ 1, (5.4.2)

starting from an initial value .y0 . According to the definition, the conditional
expected value of .yi , conditioned on .{yj , j < i} depends on .yi−1
2 , so the process is

conditionally heteroscedastic. Assuming that .y1 , y2 , . . . , yN are observed, we wish


to test the null hypothesis

H0 : (ω1 , α1 ) = . . . = (ωN , αN )
. (5.4.3)
5.4 ARCH, GARCH and Other Volatility Processes 263

against the alternative that there are R changes in the parameters,

HA : (ω1 , α1 ) = . . . = (ωk1 , αk1 ) /= (ωk1 +1 , αk1 +1 ) = . . . = (ωk2 , αk2 ) /= . . .


.

/= (ωkR +1 , αkR +1 ) = . . . = (ωN , αN ). (5.4.4)

It is standard to assume that


Assumption 5.4.1 .{Ei , i ∈ Z} are independent and identically distributed random
variables with .EE0 = 0, .EE02 = 1 and .E|E0 |2ν < ∞ with some .ν > 2.
We note that the assumption .EE02 = 1 is needed to identify .σi2 as the conditional
variance. If .H0 holds, the parameters are denoted by .ω0 and .α0 so (5.4.1) and (5.4.2)
are

yi = σi Ei ,
. i≥1 (5.4.5)

and

σi2 = ω0 + α0 yi−1
.
2
, i ≥ 1. (5.4.6)

It would be natural to use least squares to estimate the parameters, but the
consistency of such estimators requires that .Eyi4 < ∞ (cf. Francq and Zakoian,
2010, Chapter 7). It is often unappealing to assume the existence of higher order
moments in many applications, such as when employing such models for financial
data. As suggested in Francq and Zakoian (2010) (Section 7.1.2), it is often better
to use weighted least squares and quasi–likelihood estimators. These are similar to
the RCA.(1) parameter estimates discussed in Sect. 5.3.
The weighted least squares estimates .θ̂ N = (ω̂N , α̂N )T are the solutions of the
equations

Σ
N
yi2 − ω̂N − α̂N yi−1
2
. =0 (5.4.7)
i=1
(ω̂N + α̂N yi−1
2 )2

and

Σ
N
(yi2 − ω̂N − α̂N yi−1
2 )y 2
i−1
. = 0. (5.4.8)
i=1
(ω̂N + α̂N yi−1
2 )2

To have a non–degenerate, conditionally heteroscedastic process with strictly


positive conditional variances, we require
Assumption 5.4.2 .ω0 > 0 and .α0 > 0.
264 5 Parameter Changes in Time Series Models

We further assume that (5.4.3) admits a stationary and causal solution, which is
implied by the following condition:
Assumption 5.4.3 .−∞ ≤ E log(α0 E02 ) < 0.
If .H0 of (5.4.3) holds, then the processes

Σ
k
yi2 − ω̂N − α̂N yi−1
2
TN,1 (k) =
.

i=1
(ω̂N + α̂N yi−1
2 )2

and

Σ
k
(yi2 − ω̂N − α̂N yi−1
2 )y 2
i−1
TN,2 (k) =
.

i=1
(ω̂N + α̂N yi−1
2 )2

are expected to be close to zero for all .k ∈ {1, . . . , N }. If Assumptions 5.4.2


and 5.4.3 hold, then the equations

ȳi = σ̄i Ei
. (5.4.9)

with

. σ̄i2 = ω0 + α0 ȳi−1
2
, i ∈ Z, (5.4.10)

have a unique, stationary and causal solution. Let

TN (k) = (TN,1 (k), TN,2 (k))T ,


.

and
⎛ ⎞2
D = E E02 − 1 C,
.

where
⎛ ⎡ ⎤ ⎡ ⎤⎞
1 ȳ02
⎜E ,E ⎟
⎜ (ω + α0 ȳ02 )2 (ω + α0 ȳ02 )2 ⎟
.C = ⎜ ⎡ 0 ⎤ ⎡ 0 ⎤⎟ .
⎜ ȳ02 ȳ04 ⎟
⎝ ⎠
E , E
(ω0 + α0 ȳ02 )2 (ω0 + α0 ȳ02 )2

We note that Francq and Zakoian (2004) (p. 146) showed that .C is invertible if .E02
has a non-degenerate distribution.
5.4 ARCH, GARCH and Other Volatility Processes 265

Theorem 5.4.1 We assume that .H0 of (5.4.3), Assumptions 1.2.1 and 5.4.1–5.4.3
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then

1 1 ⎡ ⎤1/2
T −1
. max T N (k)D TN (k)
N 1/2 1≤k<N w(k/N )
D 1 ⎛ 2 ⎞1/2
→ sup B1 (t) + B22 (t) ,
0<t<1 w(t)

where .{B1 (t), 0 ≤ t ≤ 1} and .{B2 (t), 0 ≤ t ≤ 1} are independent Brownian


bridges.
(ii) Also,
⎧ ⎛ ⎞1/2 ⎡ ⎞
N ⎤1/2
T −1
. lim P a(log N) max TN (k)D TN (k)
N →∞ 1≤k<N k(N − k)
( )
= exp −2e−x

for all .x ∈ R, where .a(t) and .b2 (t) are defined in (1.3.9).
The applicability of Theorem 5.4.1 requires the estimation of .D. We refer to
Section 7.1.2 of Francq and Zakoian (2010) where several methods are provided to
estimate .D.
The stationary versions of .TN,1 (k) and .TN,2 (k) are

Σ
k
ȳi2 − ω̂N − α̂N ȳi−1
2
TN,3 (k) =
.

i=1
(ω̂N + α̂N ȳi−1
2 )2

and

Σ
k
(ȳi2 − ω̂N − α̂N ȳi−1
2 )ȳ 2
i−1
. TN,4 (k) = .
i=1
(ω̂N + α̂N ȳi−1
2 )2

Lemma 5.4.1 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then
| |
. max |TN,1 (k) − TN,3 (k)| = OP (1)
1≤k≤N

and
| |
. max |TN,2 (k) − TN,4 (k)| = OP (1).
1≤k≤N
266 5 Parameter Changes in Time Series Models

Proof Iterating the relation (5.4.5) and (5.4.6) leads to


⎡ ⎤
Σ | |
i−m−1 l | |
i−m
.σi2 = ω0 ⎣1 + 2
(α0 Ei−j )⎦ + σ12 2
(α0 Ei−j ), (5.4.11)
l=1 j =1 j =1

σ12 = ω0 + α0 y02 .

Under Assumption 5.4.3, this relation has an almost sure limit as .m → ∞, which
defines a stationary solution to the ARCH(1) recursion of the form
⎡ ⎤
∞ | |
Σ l
σ̄i2 = ω0 ⎣1 +
.
2
(α0 Ei−j )⎦ .
l=1 j =1

The function .M(t) = E(α0 E02 )t exists for all .t ≤ 2. Since .M(0) = 1 and .M ' (0+) <
0 by Assumption 5.4.3, we get that there is .κ > 0 such that

.ρ1 = E(α0 E02 )κ < 1. (5.4.12)

We can assume that .0 < κ < 1 and therefore


⎛ ⎞κ ⎛ ⎞κ
Σ∞ | |
l ∞
Σ | |
l
.E ⎝ )⎠ ≤ E ⎝ (α0 Ei−j )⎠
2 2
(α0 Ei−j
l=i j =1 l=i j =1
∞ ⎛
Σ ⎞l
= E(α0 E02 )κ
l=i
⎛ ⎞
= O ρ1i ,

and
⎛ ⎞κ
| |
i−1 ⎛ ⎞
E ⎝σ12
.
2
(α0 Ei−j )⎠ = O ρ1i .
j =1

So by Markov’s inequality and the Borel–Cantelli lemma we get


| |
| 2 |
. |σi − σ̄i2 | = O(ρ2i ) a.s.

and
| |
| 2 |
. |yi − ȳi2 | = O(ρ2i ) a.s. (5.4.13)
5.4 ARCH, GARCH and Other Volatility Processes 267

with some .0 < ρ2 < 1. The result now follows from elementary arguments using
the mean–value theorem to interchange .yi with .ȳi in the summands defining .TN,2
and .TN,3 , which we omit. ⨆

Let .θ 0 = (ω0 , α0 )T . We also define the vector
⎛ ⎞
Σ
k
Ei2 − 1
⎜ 2 )2 ⎟
⎜ i=1 (ω0 + α0 ȳi−1 ⎟
.e(k) = ⎜ ⎟. (5.4.14)
⎜Σ k
(Ei − 1)ȳi−1 ⎟
2 2
⎝ ⎠
i=1
(ω0 + α0 ȳi−1 )
2 2

Lemma 5.4.2 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then

C−1
θ̂ N − θ 0 =
. e(N ) + RN
N
with
⎛ ⎞
1
. ||RN || = OP . (5.4.15)
N

Proof Francq and Zakoian (2010) (p. 146) showed that


| | ⎛ ⎞ | | ⎛ ⎞
. |ω̂N − ω0 | = OP N −1/2 and |α̂N − α0 | = OP N −1/2 . (5.4.16)

We also observe that by (5.4.13) and the ergodic theorem,


⎛ ⎞
Σ
N
1 1
. |Ei2 − 1| ( )2 + 4 = OP (N ),
i=1 ω̂N + α̂N ȳi−1
2 σi−1

and
⎛ ⎞
Σ
N 2 )γ1
(ȳi−1 2 )γ1
(ȳi−1
. ( )2γ2 + 2γ
= OP (N )
i=1 ω̂N + α̂N ȳi−1
2 σi−12

for all .0 ≤ γ1 ≤ γ2 . Now by (5.4.13) and (5.4.16), the Eqs. (5.4.7) and (5.4.8) can
be written as

Σ
N
Ei2 − 1 ( )Σ
N
1
. + ω0 − ω̂N
ω + α0 ȳi−1
i=1 0
2
i=1
(ω0 + α0 ȳi−1
2 )2

( )Σ
N 2
ȳi−1
+ α0 − α̂N + RN,1 = 0
i=1
(ω0 + α0 ȳi−1
2 )2
268 5 Parameter Changes in Time Series Models

and

Σ
N
(Ei2 − 1)ȳi−1
2 ( )Σ
N 2
ȳi−1
. + ω0 − ω̂N
i=1
ω0 + α0 ȳi−1
2
i=1
(ω0 + α0 ȳi−1
2 )2

( )Σ
N 4
ȳi−1
+ α0 − α̂N + RN,2 = 0
i=1
(ω0 + α0 ȳi−1
2 )2

with

|RN,1 | = OP (1)
. and |RN,2 | = OP (1).

Let
⎛ ⎞
Σ
N
1 ΣN 2
ȳi−1
⎜ , 2 )2 ⎟
⎜ i=1 (ω0 + α0 ȳi−1
2 )2 (ω0 + α0 ȳi−1 ⎟
.CN = ⎜ i=1 ⎟.
⎜Σ N 2 ΣN 4 ⎟
⎝ ȳi−1 ȳi−1 ⎠
,
i=1
(ω0 + α0 ȳi−1 ) i=1 (ω0 + α0 ȳi−1 )
2 2 2 2

Next we show
|| || ⎛ ⎞
|| 1 ||
|| C − C|| = OP N −1/2 . (5.4.17)
.
|| N N ||

We observe (see Example A.1.2) that .ȳi as well as .σ̄i2 are .Lν –decomposable. Let
⎡ ⎤ ⎡ ⎤
Σ
k | |
l ∞ | |
Σ l
2
σi,k
. = ω0 ⎣1 + 2
(α0 Ei−j )⎦ + ω0 ⎣1 + ∗
(α0 (Ei−j,k )2 )⎦ ,
l=1 j =1 l=k+1 j =1

∗ , k, l ∈ Z} are independent copies of .E , independent of .{E , l ∈ Z}.


where .{El,k 0 l
Arguing as in Lemma 5.4.1
| |κ
| 2 |
E |σ̄i2 − σi,k
. | ≤ c1 ρ1k

and
| |κ
| 2 |
E |ȳi2 − yi,k
. | ≤ c1 ρ1k ,

where .c1 > 0, .0 < ρ1 < 1.


5.4 ARCH, GARCH and Other Volatility Processes 269

Hence for any .K > 0 and .ν̄ > 2


| |ν̄ ⎛ 2 ⎞
| 1 1 ||
2 | {
|σ̄i − σi,k } { } ν̄
|
.E | − 2 | ≤E 1 |σ̄i − σi,k | ≤ K + 21 |σ̄i − σi,k | > K
2 2 2 2
| σ̄i2 σi,k | ω02
⎡⎛ ⎞ν̄ ⎤
K { }
≤ 2ν̄ ⎣ + 2ν̄ P |σ̄i2 − σi,k
2
|>K ⎦
2ω02
k/2
≤ c2 ρ1 ,

k/(2κ)
where we used .K = ρ1 . Thus we further have that .{1/σ̄i2 , i ∈ Z} is .Lν –
decomposable with the approximating coefficients .vm in Definition (1.1.1) decaying
geometrically. Theorem A.1.1 yields the normality of the sum of .1/σ̄i2 , 1 ≤ i ≤ N
and therefore
|N ⎡ ⎤|
|Σ 1 || ⎛ ⎞
| 1
.| − E 2 | = OP N 1/2 . (5.4.18)
| σ̄i2 σ̄0 |
i=1

The same argument can be used for the other elements of .CN and therefore (5.4.17)
is established. Similarly to (5.4.18) we also have
|N |
|Σ Ei2 − 1 | ⎛ ⎞
| |
.| | = O P N 1/2
and
| 2 )2 |
(ω0 + α0 ȳi−1
i=1
|N |
|Σ (E 2 − 1)ȳ 2 | ⎛ ⎞
| i i−1 |
| | = O P N 1/2
.
| 2 )2 |
(ω0 + α0 ȳi−1
i=1

Thus we get the representation


⎛ ⎞
RN,1 C−1
θ̂ N − θ 0 =
. C−1 −1
N e(N ) + CN = e(N ) + RN ,
RN,2 N

where .RN satisfies (5.4.15). ⨆



Lemma 5.4.3 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then we
can define vector valued Gaussian processes .{WN,1 (u) = (WN,1,1 (u), WN,1,2 (k))T ,
0 ≤ u ≤ N/2}, .{WN,2 (u) = (WN,2,1 (u), WN,2,2 (k))T , 0 ≤ u ≤ N/2} such that
| k |
|Σ Ei2 − 1 |
1 | |
. max | − WN,1,1 (k) | = OP (1), (5.4.19)
1≤k≤N/2 k ζ | ω0 + α0 ȳi−1
2 |
i=1
270 5 Parameter Changes in Time Series Models

| k |
|Σ (E 2 − 1)ȳ 2 |
1 | i i−1 |
. max | − WN,1,2 (k) | = OP (1), (5.4.20)
1≤k≤N/2 k ζ | ω + α0 ȳi−1
2 |
i=1 0

| N |
| Σ Ei2 − 1 |
1 | |
. max | − WN,2,1 (N − k) | = OP (1) (5.4.21)
N/2≤k<N (N − k)ζ | ω0 + α0 ȳ 2
i−1
|
i=k+1

and
| N |
| Σ (E 2 − 1)ȳ 2 |
1 | i i−1 |
. max | − WN,2,2 (N − k) | = OP (1) (5.4.22)
1≤k≤N/2 (N − k)ζ | ω0 + α0 ȳ 2
i−1
|
i=k+1

with some .ζ < 1/2, .EWN,1 (u) = EWN,2 (u) = 0 and .EWN,1 (u)WT
N,1 (v) =
T
EWN,2 (u)WN,2 (v) = E(E0 − 1) C min(u, v).
2 2

Proof We showed in the proof of Lemma 5.4.2 that the summands in (5.4.19) and
(5.4.20) are .Lν –decomposable with coefficients .vm in Definition 1.1.1 decaying
geometrically. Hence the result follows from Theorem A.1.3. ⨆

Proof of Theorem 5.4.1 Let .TN (k) = (TN,3 (k), TN,4 (k))T . Using two term Taylor
expansion with Lemmas 5.4.1 and 5.4.2 we get
|| ⎛ ⎞||
N || ||
||TN (k) − e(k) − k e(N ) || = OP (1),
. max || ||
1≤k≤N k N

and therefore
|| ⎛ ⎞||
1 || ||
||TN (k) − e(k) − k e(N ) || = OP (1)
. max || ||
1≤k≤N k ζ1 N

with some .ζ1 < 1/2. Similar arguments yield


|| ⎛ ⎞||
|| ||
max
1 ||TN (N ) − TN (k) − e(N ) − e(k) − N − k e(N ) || = OP (1).
.
1≤k<N (N − k) ζ 1 || N ||

On account of Lemma 5.4.3, we can use the results of Sect. 1.3. ⨆



If Assumption 5.4.3 does not hold, the Eqs. (5.4.1) and (5.4.2) have an explosive
solution. In this case, .ω0 cannot be consistently estimated; see Jensen and Rahbek
(2004). In such cases we may still wish to investigate the stability of the .αi ’s. We
employ weighted least squares, and we maximize the process .TN,2 . In this case we
are interested in developing a method that does not require prior knowledge of the
5.4 ARCH, GARCH and Other Volatility Processes 271

stationarity properties of the observations. Let


⎧ ⎡ ⎤

⎪ y04

⎨ E(E0 − 1) E
2 2
, if − ∞ ≤ E log(α0 E02 ) < 0,
(ω0 + α0 y02 )2
σ =
.
2

⎪ E(E02 − 1)2

⎩ , if E log(α0 E02 ) > 0.
α04

Theorem 5.4.2 We assume that .H0 of (5.4.3), Assumptions 1.2.1, 5.4.1, 5.4.2 hold
and

. E log(α0 E02 ) /= 0.

(i) If .I (w, c) < ∞ with some .c > 0, then

1 |TN,2 (k)| D |B(t)|


.
1/2
max → sup ,
σN 1≤k<N w(k/N ) 0<t<1 w(t)

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge.


Also,
⎧ ⎛ ⎞1/2 ⎞
1 N | |
. lim P a(log N) max |TN,2 (k)| ≤ x + b(log N )
N →∞ σ 1≤k<N k(N − k)
( )
= exp −2e−x

for all .x ∈ R, where .a(t) and .b(t) are defined in (1.2.18).


Proof The following lemma follows from the formula for .σi2 in (5.4.11). ⨆

Lemma 5.4.4 If .H0 of (5.4.3), Assumptions 5.4.1 and 5.4.2 are satisfied, and

.E log(α0 E02 ) > 0,

then

Σ
e−S(i) σi2 →
. e−S(l) + σ12 , a.s. (i → ∞)
l=1

where

Σ
l
.S(l) = log(α0 Ej2 ).
j =1
272 5 Parameter Changes in Time Series Models

Proof of Theorem 5.4.2 The result follows from the proof of Theorem 5.4.1 when
−∞ ≤ E log(α0 E02 ) < 0. Lemma 5.4.4 yields that
.

| ⎛ k ⎞|
| Σ E2 − 1 Σk 2−1 |
| k E |
. max |TN,2 (k) −
i
− i
| = OP (1).
1≤k≤N | 2
α0 N 2
α0 |
i=1 i=1

Since Assumption 5.4.1 holds, the approximation for partial sums of independent
random variables and the results in Sects. 1.2 and 1.2.1 yield the result in the
explosive case. ⨆

The parameter .σ 2 is unknown in Theorem 5.4.2, and applying the result to
conduct change point analysis requires its estimation. We use

1 Σ (yi − ω̂N − α̂N yi−1


N 2 )2 y 4
i−1
σ̂N2 =
. .
N
i=1
( ω̂N + α̂N y 2 )4
i−1

Theorem 5.4.3 Under the assumptions of Theorem 5.4.2,

P
σ̂N2 → σ 2 .
.

Proof We write

Σ
N
(yi2 − ω̂N − α̂N yi−1
2 )2 y 4
i−1
.

i=1
(ω0 + α0 yi−1
2 )4

Σ
N
((Ei2 − 1)(ω0 + α0 yi−1
2 ) − (ω − ω̂ ) − (α − α̂ )y 2 )2 y 4
0 N 0 N i−1 i−1
=
i=1
(ω̂N + α̂N yi−1
2 )4

Σ
N
(Ei2 − 1)yi−1
4
= + RN .
i=1
(ω0 + α0 yi−1
2 )2

We assume that .−∞ ≤ E log(α0 E02 ) < 0. It follows from Lemmas 5.4.1 and 5.4.2
|
N | 2
|
Σ |
| (Ei − 1) (ω0 + α0 yi−1 ) yi−1 (Ei2 − 1)2 ȳi−1
2 2 2 4 4
|
. | − | = OP (1)
| (ω̂N + α̂N yi−1 )
2 4 (ω0 + α0 ȳi−1 ) |
2 2
i=1
5.4 ARCH, GARCH and Other Volatility Processes 273

and the ergodic theorem yields

1 Σ (Ei2 − 1)2 ȳi−1


N 4
P
. → σ 2.
N
i=1
(ω0 + α ȳ 2 )2
0 i−1

By (5.4.16)

Σ
N
(ω0 − ω̂N )2 yi−1
4
. = OP (1),
i=1
(ω̂N + α̂N yi−1
2 )4

and

Σ
N
(α0 − α̂N )2 yi−1
8
. = OP (1).
i=1
(ω̂N + α̂N yi−1
2 )4

Thus we conclude
RN
. = oP (1). (5.4.23)
N

Next we assume that .E log(α0 E02 ) > 0. Using Lemma 5.4.4 we get
|
N | 2
|
Σ | (Ei − 1) (ω0 + α0 yi−1 ) yi−1
2 2 2 4
(Ei2 − 1)2 ||
. | − | = OP (1)
| (ω̂N + α̂N yi−1
2 )4 α04 |
i=1

and by the law of large numbers

1 Σ (Ei2 − 1)2 P 2
N
. →σ .
N
i=1
α04

Using again Lemma 5.4.4 we conclude

Σ
N ∞
Σ
(α0 − α̂N )2 yi−1
8 8
yi−1
. = (α0 − α̂N )2 = OP (1)
i=1
(ω̂N + α̂N yi−1
2 )4
i=1
(ω̂N + α̂N yi−1
2 )4

since

|α0 − α̂N | = OP (1)


.
274 5 Parameter Changes in Time Series Models

and

Σ 8 ∞
Σ 8
yi−1 yi−1
. ≤ < ∞,
i=1
(ω̂N + α̂N yi−1
2 )4
i=1
(δ + δyi−1
2 )4

where the weighted least squares are maximized on the set .[δ, 1/δ]2 , 0 < δ < 1.
We showed in the proof of Theorem 5.4.2 that
| | ⎛ ⎞
. |α̂N − α0 | = OP N −1/2

and therefore

Σ
N
(α0 − α̂N )2 yi−1
8
. = OP (1),
i=1
(ω̂N + α̂N yi−1
2 )4

since

1 Σ
N 8
yi−1 P
. → 1.
N
i=1
(ω̂N + α̂N yi−1 )
2 4

Thus we established that (5.4.23) also holds in the explosive case. ⨆



Another quite related method to test .H0 versus .HA is to directly compare
.θ̂ k,1 = (ω̂k,1 , α̂k,1 ) and .θ̂ k,2 = (ω̂k,2 , α̂k,2 )T , the weighted least squares estimators
computed from .yi , .i ∈ {1, . . . , k} and .yi , .i ∈ {k + 1, . . . , N }, respectively. To obtain
limit distributions of the functionals of the weighted differences .θ̂ k,1 − θ̂ k,2 and
.α̂k,1 − α̂k,2 we approximate them with CUSUM processes. The next two lemmas

provide the tools to do so. Due to the weighted approximations in Lemma 5.4.5,
the results in Chap. 1 can be formulated for these differences. The proofs of
the following two lemmas can be established along the lines of the proofs of
Lemmas 5.4.1, 5.4.2 and 5.4.4.
Lemma 5.4.5 If .H0 of (5.4.3) and Assumptions 5.4.1–5.4.3 are satisfied, then

1 ||
||
⎛ ⎞
−1
||
||
. max ||k θ̂ k,1 − θ 0 − C e(k) || = OP (1)
1≤k≤N kζ

and
|| ⎛ ⎞ ||
1 || −1 ||
. max ||(N − k) θ̂ k,2 − θ 0 − C (e(N ) − e(k)) || = OP (1)
1≤k<N (N − k)ζ

with some .ζ < 1/2, and .e(·) is defined in (5.4.14).


5.4 ARCH, GARCH and Other Volatility Processes 275

Lemma 5.4.6 If .H0 of (5.4.3), Assumptions 5.4.1, 5.4.2 are satisfied and

E log(α0 E02 ) > 0,


.

then
| |
| Σ
k 2 − 1|
1 | Ei−1 |
. max |k(α̂k,1 − α0 ) − | = OP (1)
1≤k≤N k ζ | α0 |
2
i=1

and
| |
| ΣN 2 − 1|
1 | Ei−1 |
. max |(N − k)(α̂k,2 − α0 ) − | = OP (1)
1≤k<N (N − k)ζ | α 2
0
|
i=k+1

with some .ζ < 1/2.


A third sensible approach may be based on the quasi–likelihood estimates of
the model parameters. Assuming that the .Ei ’s, the ARCH innovations, are standard
normal random variables, then negative two times the logarithm of the likelihood
function of .y1 , . . . , yN (up to a constant factor) is

Σ
N
yi2
. li (θ ) with li (θ ) = 2
+ log σi2 (θ),
i=1
σi (θ )

where .θ = (ω, α)T and

.σi2 (θ) = ω + αyi−1


2
, i ≥ 1.

The quasi–likelihood estimates .θ̃ N = (ω̃N , α̃N )T are defined as the solution of a
minimization problem,
⎧N ⎞
Σ
θ̃ N = argmax
. li (θ ) : θ ∈ [δ, 1/δ] 2
,
i=1

where .0 < δ < 1 is a small tuning parameter. The quasi maximum likelihood
estimate satisfies the equation that

Σ
N
∂li (θ̃ N )
. = 0.
∂θ
i=1
276 5 Parameter Changes in Time Series Models

Since

∂li (θ ) y2 1 y 2 − σ 2 (θ)
. = − 4i + 2 = i 4 i
∂ω σi (θ ) σi (θ) σi (θ )

and

∂li (θ ) y4 y2 (y 2 − σi2 (θ))yi2


. = − 4i + 2i = i
∂α σi (θ ) σi (θ ) σi4 (θ )

we obtain that .θ̃ N = θ̂ N . As such the methods based on weighted least squares are
in fact equivalent to those based on quasi–maximum–likelihood estimation.
These approaches may be extended to ARCH models of order p (ARCH.(p)),
as well as Generalized ARCH processes of orders p and q (GARCH(.p, q)). When
there are no changes in the model parameters, the ARCH(p) model takes the form

yi = σi Ei ,
. i≥1

and

Σ
p
σi2 = ω0 +
.
2
α0,l yi−l , i ≥ 1,
l=1

whereas in the GARCH specification

Σ
p Σ
q
σi2 = ω0 +
.
2
α0,l yi−l + 2
β0,j σi−j , i ≥ 1,
l=1 j =1

We provide the details for the GARCH(1,1) process to illustrate how we can
test for parameter stability in models of this variety. The GARCH(1,1) process with
varying parameters is defined via the equations:

yi = σi Ei ,
. i≥1 (5.4.24)

and

σi2 = ωi + αi yi−1
.
2
+ βi σi−1
2
, i ≥ 1. (5.4.25)

Under the null hypothesis that the parameters are homogeneous with respect to time:

H0 : (ω1 , α1 , β1 ) = · · · = (ωN , αN , βN ).
. (5.4.26)
5.4 ARCH, GARCH and Other Volatility Processes 277

(5.4.25) is replaced with

σi2 = ω0 + α0 yi−1
.
2
+ β0 σi−1
2
, i ≥ 1. (5.4.27)

Quasi–likelihood estimation is the preferred approach to estimate the parameters


θ 0 = (ω0 , α0 , β0 )T of a GARCH model, due to the latency of the volatility .σi2 .
.

Negative two times the log likelihood is

Σ
N
yi2
. li (θ) with li (θ ) = + log σi2 (θ ), (5.4.28)
i=1
σi2 (θ )

where .θ = (ω, α, β)T , and

σi2 (θ ) = ω + αyi−1
.
2
+ βσi−1
2
(θ ), i ≥ 1. (5.4.29)

The quasi–likelihood estimator is defined as


⎧ ⎞
Σ
N
θ̂ N = argmax
. li (θ) : θ ∈ O0 , O0 = [δ, 1/δ]3
i=1

with some .0 < δ < 1, small enough. Since we wish to have a strictly positive
conditional variance process, we impose
Assumption 5.4.4 .ω0 > 0, α0 > 0 and .β0 > 0.
The .log likelihood function is smooth, and therefore

Σ
N
∂li (θ̂ N ) ⎛ ⎞T
. = 0, θ̂ N = ω̂N , α̂N , β̂N , (5.4.30)
∂θ
i=1

assuming that .δ is small enough. As in case of ARCH observations, we use the


functionals of

Σ
k
∂li (θ̂ N )
ZN (k) =
. , 1 ≤ k ≤ N.
∂θ
i=1

First we consider the case when there is a stationary causal (non-anticipative)


process satisfying

ȳi = σ̄i Ei ,
. i ∈ Z, (5.4.31)
278 5 Parameter Changes in Time Series Models

and

. σ̄i2 = ω0 + α0 ȳi−1
2
+ β0 σ̄i−1
2
, i ∈ Z. (5.4.32)

The necessary and sufficient condition for the existence of the unique, causal,
stationary solution is characterized by the following assumption.
Assumption 5.4.5 . E log(β0 + α0 E02 ) < 0.
The analogue of .li (θ ) using the stationary sequence is

ȳi2
.l̄i (θ ) = + log σ̄i2 (θ )
σ̄i2 (θ )

with

. σ̄i2 (θ) = ω + α ȳi−1


2
+ β σ̄i−1
2
(θ).

We normalize .ZN (k) with the inverse of the matrix


⎡ ⎛ ⎞T ⎤
∂ l̄0 (θ 0 ) ∂ l̄0 (θ 0 )
.G = E .
∂θ ∂θ

Berkes et al. (2003) and Francq and Zakoian (2010) proved that .G is nonsingular
under minor conditions.
Theorem 5.4.4 We assume that .H0 of (5.4.26), Assumptions 1.2.1, 5.4.1 and 5.4.4
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then

1 1 ⎛ ⎞1/2
T −1
. max Z (k)G ZN (k)
N 1/2 1≤k<N w(k/N ) N
D 1 ⎛ 2 ⎞1/2
→ sup B1 (t) + B22 (t) + B32 (t) ,
0<t<1 w(t)

where .{B1 (t), 0 ≤ t ≤ 1}, {B2 (t), 0 ≤ t ≤ 1} and .{B3 (t), 0 ≤ t ≤ 1} are
independent Brownian bridges.
(ii) Also,
⎧ ⎛ ⎞1/2 ⎛
N ⎞1/2
. lim P a(log N) max ZT
N (k)G−1
ZN (k)
N →∞ 1≤k<N k(N − k)

≤ x + b3 (log N) = exp(−2e−x )

for all .x ∈ R, where .a(t) and .b3 (t) are defined in (1.3.9).
5.4 ARCH, GARCH and Other Volatility Processes 279

We refer to Chapter 7 of Francq and Zakoian (2010) for a discussion on the


estimation of .G.
We proceed with providing an outline of the proof of Theorem (5.4.4). We show
that the sequence .ZN (k) can be well approximated with a CUSUM process with a
fast enough rate so that the results in Chap. 1 can be employed.
The quasi–maximum likelihood estimator based on the stationary sequence
( )T
.{ȳi , i ∈ Z} is .θ̄ N = ω̄N , ᾱN , β̄N , which is the solution of

Σ
N
∂ l̄i (θ̄ N )
. = 0.
∂θ
i=1

Let
⎛ ⎞
∂ 2 l̄0 (θ 0 )
.J=E .
∂θ 2

The existence and invertibility of .J is established in Berkes et al. (2003) (see also
Francq and Zakoian, 2010).
Lemma 5.4.7 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
⎛ ⎞
1 Σ ∂ l̄(θ 0 )
N
−1 1
.θ̂ N − θ 0 = −J + OP .
N ∂θ N
i=1

Proof The proof is rather technical so we only provide an outline. A detailed argu-
ment is given in Francq and Zakoian (2010) for general GARCH(.p, q) processes.
The unique stationary and causal solution to the GARCH(1,1) recursion for the
conditional variance is
⎡ ⎤
Σ∞ | |
l
.σ̄i = ω0 ⎣1 + (β0 + α0 Ei−j )⎦ .
2 2
(5.4.33)
l=1 j =1

As in the proof in Lemma 5.4.1, Assumption 5.4.5 implies that there is .κ > 0 such
that
⎛ ⎞κ
.ρ1 = E β0 + α0 E0
2
< 1.

Since we can assume that .κ < 1, we get


⎛ ⎞κ
∞ | |
Σ l
E⎝
. (β0 + α0 Ei−j
2
)⎠ ≤ c1 ρ1k (5.4.34)
l=k+1 j =1
280 5 Parameter Changes in Time Series Models

which yields
| |κ
| |
. E |yi2 − ȳi2 | ≤ 2c1 ρ1k .

It is clear that .σ̄i2 is a Bernoulli shift. Let


⎡ ⎤
Σ
k | |
l ∞ | |
Σ l
2
σi,k
. = ω0 ⎣1 + (β0 + α0 Ei−j
2
)+ ∗
(β0 + α0 (Ei−j,k )2 )⎦ ,
l=1 j =1 l=k+1 j =1

∗ , j, k ∈ Z} are independent copies of .E , independent of .{E , l ∈ Z}.


where .{Ej,k 0 l
We obtain from (5.4.34) that

E|σ̄i2 − σi,k
.
2 κ
| ≤ 2c1 ρ1k

which means that .σi2 , as well as .yi2 are .Lν –decomposable. Berkes et al. (2003)
obtained the representation

Σ
σ̄i2 (θ) = d0 (θ ) +
.
2
dj (θ )yi−j
j =1

and they showed

j
. sup |dj (θ )| ≤ c2 ρ2 ,
θ ∈O0

with some .0 < ρ2 < 1. Hence


⎛ ⎧| |⎫⎞κ
⎨|| Σ
∞ |⎬
|
.E ⎝ sup | | 2 | ⎠
dj (θ )yi−j | ≤ c3 ρ3k
θ∈O0 ⎩| |⎭
j =k+1

with some .0 < ρ3 < 1. If we define


⎡ ⎤
Σ
k | |
l ∞ | |
Σ l
2
.σi,k (θ ) = ω ⎣1 + (β + αEi−j
2
)+ (β ∗
+ α(Ei−j,k )2 )⎦ ,
l=1 j =1 l=k+1 j =1

then
⎛ ⎞κ
⎛ ⎞
E
. sup |σ̄i2 (θ ) − σ̄i,k
2
(θ)| = O ρ3k .
θ∈O0
5.4 ARCH, GARCH and Other Volatility Processes 281

Francq and Zakoian (2010) also proved


⎛ ⎞ ⎛ ⎞
||θ̂ N − θ 0 || = OP N −1/2
. and ||θ̄ N − θ 0 || = OP N −1/2 .

Further they show that there is .Ō, a neighbourhood of .θ 0 , such that


⎧|| ||⎞
|| 1 Σ
N
∂ 3 l̄i (θ) ||
|| ||
. sup || || = OP (1).
θ∈O || N ∂θ 3 ||
i=1

Using a two term Taylor expansion coordinate–wise we get


⎛ ⎞−1 ⎛ ⎞
Σ
N
∂ 2 l̄i (θ 0 ) Σ
N
∂ l̄i (θ 0 ) 1
θ̄ N − θ 0 = −
. + OP ,
i=1
∂θ 2 i=1
∂θ N 3/2

since
|| N ||
||Σ ∂ l̄i (θ 0 ) || ⎛ ⎞
|| ||
. || || = OP N 1/2 . (5.4.35)
|| ∂θ ||
i=1

The proof of (5.4.35) is given in Berkes et al. (2003) and in Section 7.4 of Francq
and Zakoian (2010). Another proof of (5.4.35) can be based on the decomposability
of .σ̄i2 (θ ). Using the arguments in Lemma 5.2 of Berkes et al. (2003) one can verify
that for all .ν̄ < ν
| |
| ∂ l̄i (θ 0 ) ∂li,k (θ 0 ) |ν̄
| | ≤ c4 ρ k ,
.E
| ∂θ − ∂θ | 4 (5.4.36)

0 < ρ4 < 1, where


.

2
yi,k
li,k (θ ) =
.
2 (θ )
+ log σ̄i,k
2
(θ ).
σi,k

We need to show only


|| N ⎛ ⎞||
||Σ ∂ 2 l̄ (θ ) || ⎛ ⎞
|| i 0 ||
. || − J || = OP N 1/2 . (5.4.37)
|| ∂θ 2 ||
i=1
282 5 Parameter Changes in Time Series Models

The proof of (5.4.37) follows from the martingale properties of .∂ 2 l̄i (θ 0 )/∂θ 2 , and
also from the fact that this sequence if .Lν –decomposable. Thus we get
⎛ ⎞
1 Σ ∂ l̄(θ 0 )
N
−1 1
θ̄ N − θ 0 = −J
. + OP . (5.4.38)
N ∂θ N
i=1

The result in (5.4.34) yields that .|σi2 − σ̄i2 | ≤ c5 ρ5i , for some .c5 > 0, .0 < ρ5 < 1,
and this along with (5.4.38) implies the result. ⨅

Lemma 5.4.8 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
|| k ⎡ k ⎤||
||Σ ∂ l̄ (θ̂ ) Σ ∂ l̄i (θ 0 ) k Σ ∂ l̄i (θ 0 ) ||
N
1 || i N ||
. max || − − || = OP (1)
1≤k≤N k ζ || ∂θ ∂θ N ∂θ ||
i=1 i=1 i=1

and
|| N ⎡ N ⎤||
|| Σ ∂ l̄ (θ̂ ) Σ ∂ l̄i (θ 0 ) N − k Σ
N
∂ l̄i (θ 0 ) ||
1 || i N ||
. max || − − ||
1≤k≤N (N − k)ζ || ∂θ ∂θ N ∂θ ||
i=k+1 i=k+1 i=1

= OP (1)

with some .ζ < 1/2.


Proof First we note there is .Ō, a neighbourhood of .θ 0 , such that
⎧ N || ||⎞
Σ ¯ i (θ) ||
|| ∂li (θ ) ∂l
|| ||
. sup
|| ∂θ − ∂θ || = OP (1). (5.4.39)
θ ∈Ō i=1

The claim in (5.4.39) is proven, in detail, in Francq and Zakoian (2010) (Section
7.4). The result in (5.4.39) implies
|| k ||
||Σ ∂l(θ̂ ) Σk
∂ l̄(θ̂ N ) ||
|| N ||
. max || − || = OP (1).
1≤k≤N || ∂θ ∂θ ||
i=1 i=1

Using (5.4.38) we conclude


⎛ k ⎞
Σ
k
∂ l̄(θ̂ N ) Σ
k
∂ l̄(θ 0 ) Σ ∂ 2 l̄(θ 0 ) ⎛ ⎞
. = + θ̂ N − θ 0 + Rk,1
i=1
∂θ
i=1
∂θ
i=1
∂θ 2

and
⎛ ⎞
1 || ||
||Rk,1 || = OP 1
. max .
1≤k≤N k N
5.4 ARCH, GARCH and Other Volatility Processes 283

Along the lines of the proof of (5.4.37), we can derive


|| k ||
||Σ ∂ 2 l̄(θ ) ||
1 || 0 ||
. max || − kJ|| = OP (1)
1≤k≤N k η || ∂θ 2 ||
i=1

for all .η > 1/2. So now by Lemma 5.4.7 we obtain


|| k ||
||Σ ∂ 2 l̄(θ ) || || || ⎛ ⎞
1 || 0 || || || −1/2
. max || − kJ|| ||θ̂ N − θ 0 || = O P N .
1≤k≤N k η || ∂θ 2 ||
i=1

Thus we conclude

Σ
k
∂ l̄i (θ 0 ) k Σ ∂ l̄i (θ 0 )
k
ZN (k) =
. − + Rk,2 ,
∂θ N ∂θ
i=1 i=1

and the error term satisfies


1 || ||
||Rk,2 || = OP (1)
. max ζ
1≤≤N k

with some .ζ < 1/2. The proof of the second part is the same. ⨆

Lemma 5.4.9 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
we can define independent Gaussian processes .{WN,1 (t), 0 ≤ t ≤ N/2} and
.{WN,2 (t), 0 ≤ t ≤ N/2}, such that .EWN,1 (t) = EWN,2 (t) = 0, EWN,1 (t)

WT T
N,1 (s) = EWN,2 (t)WN,2 (s) = G min(t, s),

|| k ||
||Σ ∂ l̄ (θ ) ||
1 || i 0 ||
. max || − WN,1 (k)|| = OP (1)
1≤k≤N/2 k ζ || ∂θ ||
i=1

and
|| N ||
|| Σ ∂ l̄ (θ ) ||
1 || i 0 ||
. max || − WN,2 (N − k)|| = OP (1)
N/2≤k<N (N − k)ζ || ∂θ ||
i=k+1

with some .ζ < 1/2.


Proof The approximations follow from (5.4.36) and Theorem A.1.3. ⨆

Proof of Theorem 5.4.4 Lemmas 5.4.7–5.4.9 show that these results follow as did
Theorems 1.1.1–1.2.5 in Chap. 1. ⨆

284 5 Parameter Changes in Time Series Models

The covariance matrix .G may be estimated as outline in Sect. 3.1.1, since the
∂ l̄i (θ 0 )/∂θ ’s are uncorrelated random vectors with mean .0. Hence
.

⎛ ⎞⎛ ⎞T
1 Σ ∂li (θ̂ N )
N
∂li (θ̂ N )
.ĜN =
N ∂θ ∂θ
i=1

is a consistent estimator under .H0 .


In order to develop a test statistic based directly on comparing the estimators
from the first k and last .N − k observations, let .θ̂ k,1 and .θ̂ k,2 be the quasi–likelihood
estimators. Define
k(N − k) ⎛ ⎞T
−1
⎛ ⎞
.LN (k) = θ̂ k,1 − θ̂ k,2 JG J θ̂ k,1 − θ̂ k,2 .
N 1/2

Theorem 5.4.5 If .H0 of (5.4.26), Assumptions 1.2.1, 5.4.1 and 5.4.4 are satisfied,
then
1 1/2 D 1 ⎛ 2 ⎞1/2
. max LN (k)) → sup B1 (t) + B22 (t) + B32 (t) ,
1≤k<N w(k/N ) 0<t<1 w(t)

where .{Bi (t), 0 ≤ t ≤ 1}, .i ∈ {1, 2, 3} are independent Brownian bridges.


(ii) Also,
⎧ ⎛ ⎞1/2 ⎞
N 1/2
. lim P a(log N) max LN (k) ≤ x + b3 (log N)
N →∞ 1≤k<N k(N − k)

= exp(−2e−x )

for all .x ∈ R, where .a(t) and .b3 (t) are defined in (1.3.9).
This result may be established easily from the following linearization of .θ̂ k,1 and
θ̂ k,2 , which may be established along the lines of Lemma 5.4.7. For more details
.

and a different approach we refer to Ling (2007, 2016).


Lemma 5.4.10 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.4 are satisfied, then
|| ||
|| ⎛ ⎞ Σ
k
∂ l̄i (θ 0 ) ||
1 || −1 ||
. max ||k θ0 − θ̂k,1 − J || = OP (1),
1≤k≤N k ζ || ∂θ ||
i=1

and
|| ||
|| ⎛ ⎞ Σ
N
∂ l̄i (θ 0 ) ||
1 || −1 ||
. max ||(N − k) θ0 − θ̂k,2 − J || = OP (1),
1≤k<N (N − k)ζ || ∂θ ||
i=k+1

with some .ζ < 1/2.


5.4 ARCH, GARCH and Other Volatility Processes 285

We considered the stability of GARCH.(1, 1) processes only to lighten the nota-


tion. The results of Theorems 5.4.4 and 5.4.5 remain true with minor modifications
for GARCH.(p, q) sequences.
It is also of interest to understand how the above change point detection statistics
behave in the absence of Assumption 5.4.5.
Assumption 5.4.6 .E log(β0 + α0 E02 ) > 0,
Under Assumption 5.4.6, their is no stationary solution to (5.4.24) and (5.4.27),
and the process starting from an initial value .y0 is explosive in the sense that its
conditional variance is diverging in probability to positive infinity. In the model of
(5.4.24) and (5.4.25), the null hypothesis of the stability of the parameters in (5.4.26)
cannot be tested since the .ωi ’s cannot be estimated in the explosive case; see e.g.
Jensen and Rahbek (2004). Hence we wish to consider tests in this situation of
(1)
H0
. : (α1 , β1 ) = . . . = (αN , βN ). (5.4.40)

Correspondingly, let

k ⎛
Σ ⎞
∂li (θ ) ∂li (θ ) T
.r̄k (θ ) = , , 1 ≤ k ≤ N.
∂α ∂β
i=1

We use the normalization


⎛ ⎞⎛ ⎞T
1 Σ ∂li (θ̂ N ) ∂li (θ̂ N )
N
∂li (θ̂ N ) ∂li (θ̂ N ))
.F̂N = , , , (5.4.41)
N α β α β
i=1

(1)
where .θ̂ N is the quasi–likelihood estimator in (5.4.30). A test statistic for .H0 may
be based on the functionals of

. Z̄N (k) = r̄k (θ̂ N ), 1 ≤ k ≤ N.

Theorem 5.4.6 We assume that .H0 of (5.4.26), Assumptions 1.2.1, 5.4.4 and 5.4.6
hold.
(i) If .I (w, c) < ∞ with some .c > 0, then

1 1 ⎛ ⎞1/2
T −1
. max Z̄N (k)F̂N Z̄ N (k)
N 1/2 1≤k<N w(k/N )
D 1 ⎛ 2 ⎞1/2
→ sup B1 (t) + B22 (t) ,
0<t<1 w(t)

where .{B1 (t), 0 ≤ t ≤ t} and .{B2 (t), 0 ≤ t ≤ t} are independent Brownian


bridges.
286 5 Parameter Changes in Time Series Models

(ii) Also,
⎧ ⎛ ⎞1/2 ⎛
N ⎞1/2
−1
. lim P a(log N) max Z̄T
N (k)F̂N Z̄N (k)
N →∞ 1≤k<N (k(N − k))1/2

≤ x + b2 (log N) = exp(−2e−x )

for all .x ∈ R, where .a(t) and .b2 (t) are defined in (1.3.9).
Towards proving Theorem 5.4.6, we start with an elementary lemma on the
properties of the volatilities under Assumption 5.4.6. We use, as before, .σi2 for
2
.σ (θ0 ). Let
i

Σ
j
S(j ) =
. log(β0 + α0 Ek2 ), S(0) = 0
k=1

and

Σ
R = ω0
. exp(−S(k)) + σ02 (β0 + α0 E02 ).
k=1

It follows from Assumptions 5.4.4 and 5.4.6 that .R < ∞ almost surely and .R ≥
ω0 > 0.
Lemma 5.4.11 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then

. exp(−S(i − 1))σi2 → R a.s.

Proof The explicit solution of the recursion in (5.4.24) and (5.4.27)

| | | |
i l−1 | |
i−1
σi2 = ω0
. +σ02 (β0 + α0 El2 )
l=1 j =1 l=0

Σ
i
= ω0 exp (S(i − 1) − S(i − l)) + σ02 (β0 + α0 El2 ) exp(S(i − 1))
l=1
| |
(
. ∅ = 1). The law of large numbers and Assumption 5.4.6 imply

Σ
i Σ
i−1
. exp (−S(i − l)) = exp (−S(k)) → R a.s.,
l=1 k=0

completing the proof. ⨆



5.4 ARCH, GARCH and Other Volatility Processes 287

Similarly to the non-stationary RCA(1) and ARCH(1) sequences, we may


approximate the derivatives of the .log–likelihood with stationary sequences. Ele-
mentary arguments yield
⎡ ⎤ ⎡ ⎤
∂li (θ ) yi2 ∂li (θ ) yi2
. = 1− 2 pi,1 (θ ), and = 1− 2 pi,2 (θ )
α σi (θ ) β σi (θ )

with

1 ∂σi2 (θ ) 1 ∂σ 2 (θ)
pi,1 (θ ) =
.
2
and pi,1 (θ) = 2 i .
σi α σi β

In the approximations we use the stationary sequences


Σ ∞
1 | | Σ 1 | |
j j
β0 β0
zi,1 =
.
2
Ei−j and zi,2 = .
j =1
β0 β + α0 Ei−k
k=1 0
2
j =1
β0 β + α0 Ei−k
k=1 0
2

Since asymptotically .ω0 disappears from the model, we use .θ̄ 0 = (α0 , β0 ).
Lemma 5.4.12 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then
for all .ω̄ > 1
| |
. sup |pi,1 (θ̄ 0 , ω) − zi,1 | = O(1) a.s. (5.4.42)
1/ω̄≤ω≤ω̄

and
| |
. sup |pi,2 (θ̄ 0 , ω) − zi,2 | = O(1) a.s. (5.4.43)
1/ω̄≤ω≤ω̄

with some .0 < ρ < 1.


Proof It follows from the GARCH(1,1) recursions that

Σ
i−1 Σ
i
σi2 (θ ) = ω
. βk + α 2
β l−1 yi−l + β i σ02
k=0 l=1

which implies the representations

Σ
i−1 Σ
i
σi2 = σi2 (θ0 ) = ω0
. β0k + α0 β0l−1 yi−l
2
+ β0i σ02
k=0 l=1
288 5 Parameter Changes in Time Series Models

and

Σ
i−1 Σ
i
σi2 (θ̄0 , ω) = ω
. β0k + α0 β0l−1 yi−l
2
+ β0i σ02 .
k=0 l=1

Since .E log(β0 +α0 E02 ) > log β0 , using again Assumption 5.4.6 we get from Lemma
5.4.11 that
| |
| |
. exp (−S(i − 1)) sup |σi2 (θ̄ 0 , ω) − σi2 (θ 0 )| = o(1) a.s. (5.4.44)
1/ω̄≤ω≤ω̄

and therefore there is .0 < ρ1 < 1 such that


| |
| σ 2 (θ̄ , ω) | ⎛ ⎞
| i 0 |
. sup | 2 − 1| = O ρ1i a.s. (5.4.45)
1/ω̄≤ω≤ω̄ | σi (θ 0 ) |

The recursion defining .σi2 (θ ) also imply

1 Σ1 2 Σ | |
i i 2 j 2 (θ )
j −1
yi−j σi−k
pi,1 (θ) =
. yi−j = β
σi2 (θ ) j =1 β j =1
2 (θ )
σi−j σ2
k=1 i−k+1
(θ)

and therefore

Σ i
1
2
yi−j | |
j 2 (θ̄ , ω)
β0 σi−k 0
.pi,1 (θ̄ 0 , ω) = .
2 (θ̄ , ω)
β0 σi−j σ 2 (θ̄ , ω)
j =1 0 k=1 i−k+1 0

Since
2 (θ̄ , ω)
β0 σi−k 2 (θ̄ , ω)
β0 σi−k
0 0 1
. = ≤ ≤ 1,
2
σi−k+1 (θ̄ 0 , ω) ω + α0 yi−k
2 + β0 σi−k
2 (θ̄ , ω)
0
2
α0 Ei−k + β0

(5.4.44) implies
| | j
Σ
i/2 | 2
yi−j | | | σ 2 (θ̄ , ω)
j −1 | 2 | i−k 0
. β0 | − Ei−j |
| 2 (θ̄ , ω)
σi−j 0 | 2
σi−k+1 (θ̄ 0 , ω)
j =1 k=1
| 2 |
1 Σ 2
i/2 | σ (θ 0 ) | ⎛ ⎞
| i−j |
≤ Ei−j | 2 − 1| = OP ρ2i ,
β0 | σi−j (θ̄ 0 , ω) |
j =1
5.4 ARCH, GARCH and Other Volatility Processes 289

with some .0 < ρ2 < 1, on account of

. max i −1/κ Ei2 = O(1) a.s.


1≤i<∞

By Minkowski’s inequality we have for any .κ1 > 2


⎛ ⎛ ⎞κ1 ⎞1/κ1 ⎛ ⎛ ⎞κ1 ⎞1/κ1
Σ
i | |
j 2 (θ̄ , ω)
β0 σi−k 0 Σ
i | |
j
β0
. ⎝E ⎝ ⎠ ⎠ ≤ ⎝E ⎝ ⎠ ⎠
σ2 (θ̄ , ω)
j =i/2 k=1 i−k+1 0
α E 2 + β0
j =i/2 k=1 0 i−k
(5.4.46)
⎛ ⎛ ⎞κ1 ⎞1/κ1
Σ
i | |
j
β0
≤ ⎝E ⎝ ⎠ ⎠
j =i/2
α E 2
k=1 0 i−k
+ β0

Σ
i
j
= ρ3
j =i/2

with some .0 < ρ3 < 1. Thus the Borel–Cantelli lemma yields

Σ
i | |
j 2 (θ̄ , ω)
β0 σi−k 0
⎛ ⎞
.
2
= O ρ4i a.s.
j =i/2 k=1
σi−k+1 (θ̄ 0 , ω)

with some .0 < ρ4 < 1. It follows from the recursion

2 2 (θ ) 2
yi−j σi−j 0 yi−j σl2 (θ 0 )
. = Ei−j
2
≤ ≤ sup sup
σi−j (θ̄ 0 , ω) σi−j (θ̄ 0 , ω) σi−j (θ̄0 , ω) 1≤l<∞ 1/ω̄≤ω≤ω̄ σl (θ̄0 , ω)

and

σl2 (θ 0 )
. sup sup = O(1) a.s.
1≤l<∞ 1/ω̄≤ω≤ω̄ σl (θ̄0 , ω)

Putting together the bounds established above we conclude


| |
| 2 (θ̄ , ω) || ⎛ ⎞
| Σ i
1 2 Σj
β σ
sup ||pi,1 (θ̄0 , ω) − | = O ρi
0 i−k 0
. Ei−j 2 | 5 a.s.,
1/ω̄≤ω≤ω̄ | j =1
β0 σ (θ̄ , ω) |
k=1 i−k+1 0
290 5 Parameter Changes in Time Series Models

for some .0 < ρ5 < 1. Elementary arguments give


| |
| j |
| | | 2 (θ̄ , ω)
β0 σi−k | |
j
β |
| 0

0 | (5.4.47)
| .
|
|k=1 ω + (α0 Ei−k + β0 )σi−k (θ̄ 0 , ω) k=1 α0 Ei−k + β0 |
2 2 2

j |
| |
Σ 2 (θ̄ , ω) |
| β0 σi−k 0 β0 |
≤ | − 2 + β ||
| ω + (α0 Ei−k
2 + β )σ 2 (θ̄ , ω)
0 i−k 0 α 0 E i−k 0
k=1

Σ
j
1
≤ β0 ω .
σ 2 (θ̄ , ω)
k=1 i−k 0

Using (5.4.44) and (5.4.45) we get that


| |
| j |
Σ
i/2
| | | 2 (θ̄ , ω)
β0 σi−k | |
j
β0 |
sup 2
Ei−j | 0
− |
.
| |
|k=1 ω + (α0 Ei−k + β0 )σi−k (θ̄ 0 , ω) k=1 α0 Ei−k + β0 |
2 2 2
1/ω̄≤ω≤ω j =1

Σ
i/2 Σ
j
=O(1) 2
Ei−j exp(−S(i − k)) a.s. (5.4.48)
j =1 k=1

=O(ρ6i ) a.s.

with some .0 < ρ6 < 1. Similarly to (5.4.46)

Σ
i | |
j
β0 a.s.
.
2
Ei−j = O(ρ7i )
j =i/2 k=1
α0 E 2
i−k + β0

and

Σ
i | |
j 2 (θ̄ , ω)
β0 σi−k 0 a.s.
. sup 2
Ei−j = O(ρ7i ),
1/ω̄≤ω≤ω̄ j =i/2 k=1
ω + (α0 Ei−k
2 + β0 )σi−k
2 (θ̄ , ω)
0

with some .0 < ρ7 < 1, completing the proof of (5.4.42).


Since

Σ i
1 | |
j 2 (θ̄ , ω)
β0 σi−k 0
.pi,2 (θ̄ 0 , ω) = ,
j =1
β0 ω + (α0 Ei−k + β0 )σi−k
k=1 0
2 2 (θ̄ , ω)
0

the proof of (5.4.42) is nearly the same as that of (5.4.43). ⨆



5.4 ARCH, GARCH and Other Volatility Processes 291

Let
⎛ ⎞
Σ
j | |
l Σ∞ | |
j
1 β0 1 ⎝ β0 ⎠
zi,j,1
. = 2
Ei−l 2 +β
+ 2
Ei,i−l,j 2 +β
β0 α E β0 α E
l=1 k=1 0 i−k 0 l=j +1 k=1 0 i−k 0
⎛ ⎞
| |
l
β0
×⎝ ⎠
α E2
r=j +1 0 i,i−r,j
+ β0

and
⎛ ⎞
Σ ∞
1 | | Σ 1 ⎝ | |
j l j
β0 β0 ⎠
zi,j,2
. = +
l=1
β0 α E 2 + β0 l=j +1 β0 k=1 α0 Ei−k
k=1 0 i−k
2 +β
0
⎛ ⎞
| |
l
β0
×⎝ ⎠,
α E2
r=j +1 0 i,i−r,j
+ β0

where .{Ei,j,k , −∞ < i, j, k < ∞} are independent copies of .E0 .


Lemma 5.4.13 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then
⎛ ⎡ ⎤ν ⎞1/ν
. E (1 − Ei2 )(zi,1 − zi,j,1 ) ≤ cρ j (5.4.49)

and
⎛ ⎡ ⎤ν ⎞1/ν
. E (1 − Ei2 )(zi,2 − zi,j,2 ) ≤ cρ j (5.4.50)

where .0 < ρ < 1 and .ν is defined in Assumption 5.4.1.


Proof By Assumption 5.4.1 and Minkowski’s inequality we have
⎛ ⎡ ⎛ ⎞

Σ 1 ⎝ | |
j
β0
. ⎝E ⎣(1 − Ei ) ⎠
2 2
Ei,i−l,j 2 +β
β0 α0 E 0
l=j +1 k=1 i−k
⎛ ⎞⎤κ ⎞1/κ
| |
l
β0
×⎝ ⎠⎦ ⎠
α E2
r=j +1 0 i,i−r,j
+ β0

⎛ ⎞ 1 Σ∞ ⎛ ⎡ ⎤κ ⎞1/κ
β0
≤ E(1 − E02 )κ E ≤ c1 ρ j ,
β0 α0 E 2 + β0
l=j +1
292 5 Parameter Changes in Time Series Models

for some .0 < ρ < 1 as in the proof of (5.4.42). Hence (5.4.49) is proven due to the
definitions of .zi,1 and .zi,j,1 . The same argument gives (5.4.50). ⨆

Let
⎛ ⎞
∂ 2 li (θ ) ∂ 2 li (θ )
⎜ ∂α 2 ,
.Qi (θ ) = ⎜ 2
∂α∂β ⎟ ⎟
⎝ ∂ li (θ ) ∂ 2 li (θ ) ⎠
,
∂α∂β ∂β 2

be the matrix of the second order partial derivatives of .li (θ ). Similarly to


Lemma 5.4.13, .Ji (θ ) can be approximated with a matrix .Q̄i of the form .Q̄i =
j(Ei , Ei−1 , . . .). The sequence .Q̄i is .Lν –decomposable, and can be approximated
∗ , E∗
by the sequence .Q̄i,j = q(Ei , Ei−1 , . . . , Ei−j +1 , Ei−j i−j −1 , . . .). The following
result follows similarly to Lemma 5.4.13, and so the details are omitted.
Lemma 5.4.14 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then
for all .ω̄ > 1,
|| ||
. sup ||Qi (θ̄ 0 , ω) − Q̄i || = O(ρ i ) a.s. (5.4.51)
1/ω̄≤ω≤ω̄

and
⎛ || ||ν ⎞1/ν
. E ||Q̄i − Qi,j || ≤ cρ j , (5.4.52)

where .0 < ρ < 1 and .ν > 2 is defined in Assumption 5.4.1.


Proof The matrix .Qi can be computed explicitly, so following the proofs of
Lemmas 5.4.11 and 5.4.13 we get (5.4.51) and (5.4.52), respectively. ⨅

Putting together Lemmas 5.4.11–5.4.14, we get that .∇lk (θ̂ N ) is well approxi-
mated by a CUSUM process.
Lemma 5.4.15 If .H0 of (5.4.26), Assumptions 5.4.1 and 5.4.6 are satisfied, then
⎛ ⎞ζ || ⎛
|| ∂l (θ̂ ) ∂l (θ̂ )

N || k N k N
. max || ,
1≤k≤N −1 k(N − k) || ∂α ∂β
⎛ ⎞||
Σ
k
k Σ
N ||
||
− (1 − Ei ) (zi,1 , zi,2 ) −
2
(1 − Ei )2 (zi,1 , zi,2 ) ||
N ||
i=1 i=1

= OP (1)

for any .ζ > 0.


Using Lemmas 5.4.11–5.4.15, we may establish Theorem 5.4.6.
5.4 ARCH, GARCH and Other Volatility Processes 293

Proof of Theorem 5.4.6 It follows from the proof of Lemma 5.4.13 that .{(1 −
Ei )2 (zi,1 , zi,2 ), i ∈ Z} is .Lν – decomposable, and therefore Theorem A.1.3 can be
applied. Hence Assumption 1.3.2 holds, and so (1.3.8) and Theorem 1.3.1 holds in
this case with the covariance matrix
⎡ ⎤
T
.F = E(1 − E0 ) E (z0,1 , z0,2 ) (z0,1 , z0,2 ) .
2 2

Since .{(1 − Ei )2 (zi,1 , zi,2 ), i ∈ Z} are uncorrelated vectors, the sample covariance
of these vectors can be used as an estimator for .F. However, these vectors are not
observed. The result in Lemma 5.4.14 yields that the sample covariance matrix of
the partial derivatives of .{li (θ̂ N ), 1 ≤ i ≤ N } with respect to .(α, β) can be used.
One can then establish that
|| || ⎛ ⎞
|| || −1/2
. ||F̂N − F|| = oP (log log N) ,

where .F̂N is defined in (5.4.41). ⨆



In practice one will not know whether the stationary or non–stationary GARCH
model is more appropriate for the data. It is useful then, as in case of RCA (1)
processes in Sect. 5.3, to find a normalization of the CUSUM process that can be
used in both cases. Comparing Lemmas 5.4.8 and 5.4.15, we see that the limiting
distribution of the processes considered is determined by the partial sums of the
vectors .∇li (θ 0 ). So if we test for the stability in .H0 of (5.4.40), we only consider
the first two coordinates of .∇li (θ ). Let
⎛ ⎞T
∂li (θ̂ N ) ∂li (θ̂ N )
ri =
. , , i ∈ {1, ..., N}.
∂α ∂β

The long–run covariance matrix .F̃N computed from .{r 1 , i ∈ {1, ..., N}}. One may
show that
|| || ⎛ ⎞
|| || −1/2
. ||F̃N − F|| = oP (log log N) ,

under Assumption 5.4.6, in contrast to


|| || ⎛ ⎞
|| ||
. ||F̃N − G̃|| = oP (log log N)−1/2

in the stationary case, where .G̃ denotes the upper .2 × 2 submatrix of .G. Hence
Theorems 5.4.4 and 5.4.6 imply that if .log(β0 + α02 ) /= 0, then

1 1 ⎛ ⎞1/2 D 1 ⎛ 2 ⎞1/2
−1
. max Z̄T
N (k)F̃N Z̄N (k) → sup B1 (t) + B22 (t) ,
N 1/2 1≤k<N w(k/N ) 0<t<1 w(t)
294 5 Parameter Changes in Time Series Models

if .I (w, c) < ∞ with some .c > 0, where .{B1 (t), 0 ≤ t ≤ t} and .{B2 (t), 0 ≤ t ≤ t}
are independent Brownian bridges. Also,
⎧ ⎛ ⎞1/2 ⎛
N ⎞1/2
−1
. lim P a(log N) max Z̄T
N (k) F̃ Z̄N (k)
N →∞ 1≤k<N (k(N − k))1/2 N


≤ x + b2 (log N) = exp(−2e−x )

for all .x ∈ R, where .a(t) and .b2 (t) are defined in (1.3.9).

5.5 Vector Autoregressive Models

Many of the above results may be generalized to vector valued time series models.
In this section we consider a vector valued time series .Y1 , . . . , YN taking values in
.R , and in particular focus on constructing change point detection tests for vector
d

autoregressive models of order one (VAR(1)). These methods may be generalized to


general VAR(p) models. A VAR(1) model allowing for a changing autoregressive
matrix takes the form,

Yi = O(i)Yi−1 + ei ,
. i ≥ 1,

where .O(i) ∈ Rd×d and .ei ∈ Rd is an independent and identically distributed mean
zero error sequence. Under the no-change null hypothesis,

H0 : O(1) = · · · = O(N ).
. (5.5.1)

We use .O0 to denote the common autoregressive matrix under .H0 . The least squares
estimator, .ÔN of .O0 , minimizes

Σ
N
SN (O) =
. (Yi − OYi−1 )T (Yi − OYi−1 ) .
i=1

This estimator takes the form (see Brockwell and Davis, 2006)
⎛ ⎞−1
Σ
N ΣN
ÔN =
. Yi YT
i−1
⎝ Yi−1 YT
i−1
⎠ .
i=1 j =1
5.5 Vector Autoregressive Models 295

Our analysis is based on the partial sums of the weighted residuals

Σ
k
ZN (k) =
. êi YT
i , k ∈ {1, . . . , N }.
i=1

where

. êi = Yi − ÔN Yi−1 , i ∈ {1, . . . , N }

denote the residuals. The process .ZN (k) takes values in .Rd×d , but in comparison
to previous results it is easier to state limit results for .ZN (k) when it is viewed as a
vector valued process. We instead consider

zN (k) = vec(ZN (k)),


. k ∈ {1, . . . , N},

where .vec(A) denotes the .vec operator that maps a .d × d matrix to a vector in
dimension .d 2 by stacking its columns (see pg. 311 of Abadir and Magnus (2005)).
We assume the following condition on the innovations in the VAR(1) model.
Assumption 5.5.1 .{ei , i ∈ Z} are independent and identically distributed random
vectors, .Ee0 = 0 and .E||e0 ||ν < ∞ with some .ν > 4.
Let

    A = E e_0 e_0^T

be the common covariance matrix of the errors. If

Assumption 5.5.2 There is an integer q ≥ 1 such that ||O_0^q|| < 1

holds, then

    Ȳ_i = O_0 Ȳ_{i−1} + e_i,  i ∈ Z,

has a unique stationary and causal solution. Let

    B = E Ȳ_0 Ȳ_0^T

be the covariance matrix of the stationary vectors Ȳ_i. We also define

    C = B ⊗ A,

where ⊗ denotes the Kronecker (tensor/outer) product of matrices. We normalize
with the matrix C^{−1}, so we need

Assumption 5.5.3 A and B are nonsingular matrices.
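The Kronecker structure of C can be seen from the identity vec(ab^T) = b ⊗ a: since
e_i is independent of Ȳ_{i−1} under H_0,

    E[ vec(e_i Ȳ_{i−1}^T) vec(e_i Ȳ_{i−1}^T)^T ] = E[ Ȳ_{i−1} Ȳ_{i−1}^T ⊗ e_i e_i^T ] = B ⊗ A = C,

so C is exactly the covariance matrix of the vectorized summands e_i Ȳ_{i−1}^T on
which the approximations below are built.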
Theorem 5.5.1 We assume that H_0 of (5.5.1), Assumptions 1.2.1 and 5.5.1–5.5.3
hold.
(i) If I(w, c) < ∞ with some c > 0, then

    (1/N^{1/2}) max_{1≤k<N} (1/w(k/N)) ( z_N^T(k) C^{−1} z_N(k) )^{1/2}  →^D  sup_{0<t<1} (1/w(t)) ( Σ_{i=1}^{d²} B_i²(t) )^{1/2},

where {B_i(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d²} are independent Brownian bridges.
(ii) Also,

    lim_{N→∞} P{ a(log N) max_{1≤k<N} ( N/(k(N − k)) )^{1/2} ( z_N^T(k) C^{−1} z_N(k) )^{1/2} ≤ x + b_{d²}(log N) } = exp(−2e^{−x})

for all x ∈ ℝ, where a(t) and b_{d²}(t) are defined in (1.3.9).
Since the limit in Theorem 5.5.1(i) is the supremum of a sum of d² squared
independent Brownian bridges, it is worthwhile to look at integral functionals,
which are more stable with respect to moderate to large dimensions d².

Theorem 5.5.2 If H_0 of (5.5.1), Assumptions 1.2.1, 5.5.1–5.5.3 and

    ∫_0^1 ( t(1 − t)/w(t) ) dt < ∞

are satisfied, then

    (1/N²) Σ_{k=1}^{N−1} (1/w(k/N)) z_N^T(k) C^{−1} z_N(k)  →^D  Σ_{i=1}^{d²} ∫_0^1 ( B_i²(t)/w(t) ) dt,

where {B_i(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d²} are independent Brownian bridges.


The proofs of Theorems 5.5.1 and 5.5.2 make use of the following linearization
result for the estimator Ô_N.

Lemma 5.5.1 If H_0 of (5.5.1) and Assumptions 5.5.1–5.5.3 are satisfied, then

    || Ô_N − O_0 − (1/N) Σ_{i=1}^N e_i Ȳ_{i−1}^T B^{−1} || = O_P( 1/N ).
Proof The unique stationary causal solution is

    Ȳ_i = Σ_{l=0}^∞ O_0^l e_{i−l},  i ∈ Z.

As such, {Ȳ_i, i ∈ Z} is L^ν–decomposable, and by Assumption 5.5.2

    E || Σ_{l=k+1}^∞ O_0^l e_{i−l} ||^ν ≤ c_1 ρ_1^k,  ρ_1 = ||O_0^q||,          (5.5.2)

since

    || O_0^l || ≤ c_2 || O_0^q ||^{⌊l/q⌋},

where ⌊·⌋ denotes the integer part. Let

    Y_{i,k} = Σ_{l=0}^k O_0^l e_{i−l} + Σ_{l=k+1}^∞ O_0^l e*_{i−l,k},

where {e*_{j,k}, j, k ∈ Z} are independent copies of e_0, independent of {e_j, j ∈ Z}.
Using (5.5.2) we get

    E || Ȳ_i − Y_{i,k} ||^ν ≤ 2 c_1 ρ_1^k.                                      (5.5.3)

Now (5.5.3) yields

    E || Ȳ_i Ȳ_i^T − Y_{i,k} Y_{i,k}^T ||^{ν/2} ≤ c_3 ρ_2^k,  with some 0 < ρ_2 < 1,

and by Theorem A.1.3 we conclude

    || (1/N) Σ_{i=1}^N Ȳ_i Ȳ_i^T − B || = O_P( N^{−1/2} ).                      (5.5.4)

Using again (5.5.2) we obtain

    || Σ_{i=1}^N ( Y_i − Ȳ_i ) || = O_P(1)

and

    || Σ_{i=1}^N ( Y_i Y_i^T − Ȳ_i Ȳ_i^T ) || = O_P(1).                          (5.5.5)

Combining (5.5.3) with Theorem A.1.3 we get

    || Σ_{i=1}^N e_i Ȳ_{i−1}^T || = O_P( N^{1/2} ),

and now (5.5.4) implies the result. □



The next lemma shows that Z_N(k) can be written as a vector–valued CUSUM
process up to an asymptotically negligible remainder.

Lemma 5.5.2 If H_0 of (5.5.1) and Assumptions 5.5.1–5.5.3 are satisfied, then

    Z_N(k) = Σ_{i=1}^k e_i Ȳ_{i−1}^T − (k/N) Σ_{i=1}^N e_i Ȳ_{i−1}^T + R_N Z_{N,1}(k),

where

    Z_{N,1}(k) = Σ_{i=1}^k Ȳ_{i−1} Ȳ_{i−1}^T − (k/N) Σ_{i=1}^N Ȳ_{i−1} Ȳ_{i−1}^T

and

    || R_N || = O_P( N^{−1/2} ).

Proof Applying some matrix algebra gives

    Z_N(k) = Σ_{i=1}^k e_i Y_{i−1}^T − ( Ô_N − O_0 ) Σ_{i=1}^k Y_{i−1} Y_{i−1}^T

           = Σ_{i=1}^k e_i Y_{i−1}^T − (k/N) Σ_{i=1}^N e_i Y_{i−1}^T
             − ( Σ_{i=1}^N e_i Y_{i−1}^T ) ( Σ_{i=1}^N Y_{i−1} Y_{i−1}^T )^{−1} [ Σ_{i=1}^k Y_{i−1} Y_{i−1}^T − (k/N) Σ_{i=1}^N Y_{i−1} Y_{i−1}^T ].

By (5.5.3) we conclude that

    max_{1≤k<∞} || Σ_{i=1}^k e_i ( Y_{i−1} − Ȳ_{i−1} )^T || = O_P(1).

Using that {e_i Ȳ_{i−1}^T, i ∈ Z} is L^ν–decomposable, we conclude by Theorem A.1.3
that

    || Σ_{i=1}^N e_i Ȳ_{i−1}^T || = O_P( N^{1/2} ).

According to (5.5.5), we can replace Y_{i−1} Y_{i−1}^T with Ȳ_{i−1} Ȳ_{i−1}^T in the definition of
Z_N(k) with no asymptotic consequence, which concludes the proof. □

Lemma 5.5.3 If H_0 of (5.5.1) and Assumptions 5.5.1–5.5.3 are satisfied, then we
can define independent Gaussian processes {Γ_{N,1}(t), t ≥ 0} and {Γ_{N,2}(t), t ≥ 0}
such that

    max_{1≤k≤N/2} (1/k^ζ) || Σ_{i=1}^k vec( e_i Ȳ_{i−1}^T ) − Γ_{N,1}(k) || = O_P(1)

and

    max_{N/2≤k<N} ( 1/(N − k)^ζ ) || Σ_{i=k+1}^N vec( e_i Ȳ_{i−1}^T ) − Γ_{N,2}(N − k) || = O_P(1),

with some ζ < 1/2, EΓ_{N,1}(t) = EΓ_{N,2}(t) = 0 and EΓ_{N,1}(t)Γ_{N,1}^T(s)
= EΓ_{N,2}(t)Γ_{N,2}^T(s) = min(t, s) C.

Proof We showed in the proof of Lemma 5.5.1 that {e_i Ȳ_{i−1}^T, i ∈ Z} is L^ν–
decomposable, and therefore Theorem A.1.3 implies the result. □



Proof of Theorems 5.5.1 and 5.5.2 Using Lemmas 5.5.1–5.5.3, these results follow
from Theorems 1.3.1–1.3.2 in Sect. 1.3. □

Similarly to all the other cases in which we are trying to find changes in the
parameters of a time series, one can again use the comparison of the estimators
computed from the first k and the last N − k observations,

    Ô_{k,1} = ( Σ_{i=1}^k Y_i Y_{i−1}^T ) ( Σ_{i=1}^k Y_{i−1} Y_{i−1}^T )^{−1}

and

    Ô_{k,2} = ( Σ_{i=k+1}^N Y_i Y_{i−1}^T ) ( Σ_{i=k+1}^N Y_{i−1} Y_{i−1}^T )^{−1}.

Following the proof of Lemma 5.5.1 one can verify that

    Ô_{k,1} − Ô_{k,2} = ( N/(k(N − k)) ) ( Σ_{i=1}^k e_i Ȳ_{i−1}^T − (k/N) Σ_{i=1}^N e_i Ȳ_{i−1}^T ) B^{−1} + R_N(k),

and

    max_{1≤k<N} ( k(N − k)/N )^{1/2+ζ} || R_N(k) || = O_P(1)

with some ζ > 0. Hence the results in Theorem 5.5.1 can be extended to

    v_N(k) = ( k(N − k)/N ) ( Ô_{k,1} − Ô_{k,2} ).

The matrix C may be estimated by

    Ĉ_N = B̂_N ⊗ Â_N,  with  Â_N = (1/N) Σ_{i=1}^N ê_i ê_i^T  and  B̂_N = (1/N) Σ_{i=1}^N Y_i Y_i^T.

One can show that under the conditions of Theorem 5.5.1

    || Ĉ_N − C || = o_P( (log log N)^{−1/2} ),

so Theorem 5.5.1 remains true if C is replaced with Ĉ_N.

When d² is large, Theorem A.2.11 in the Appendix provides normal approximations
for the limits in Theorems 5.5.1 and 5.5.2, which are useful in reducing the
computational burden of computing these distributions for large d².
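For completeness, a short sketch (again with our own names) of the plug-in
estimator Ĉ_N, with Y and the residuals resid computed as in the earlier snippet:

    import numpy as np

    def estimate_C(Y, resid):
        N = resid.shape[0]
        A_hat = resid.T @ resid / N      # A_hat = (1/N) sum e_hat_i e_hat_i^T
        B_hat = Y[1:].T @ Y[1:] / N      # B_hat = (1/N) sum Y_i Y_i^T
        return np.kron(B_hat, A_hat)     # C_hat_N = B_hat (Kronecker) A_hat

Its inverse can then be passed to the detector sketched earlier in place of C^{−1}.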

5.6 Multivariate Volatility Models

In Sect. 3.2, we discussed change point detection methods for the variance and
covariance matrices. In this section we extend those results to conditionally het-
eroscedastic processes, for which we are interested in evaluating changes in their
correlation structure. We consider again vector valued observations Y_1, ..., Y_N
taking values in ℝ^d. We assume that these variables are centered.
Assumption 5.6.1 EY_i = 0.

We use the notation Y_i = (y_i(1), ..., y_i(d))^T ∈ ℝ^d. The conditional variance of
y_i(j) with respect to its past F_{i−1} = {Y_l, l ≤ i − 1} is

    τ_i²(j) = E( y_i²(j) | F_{i−1} ).

The "devolatized" observations are denoted Y*_i = (y*_i(1), ..., y*_i(d))^T with

    y*_i(j) = y_i(j)/τ_i(j),  j ∈ {1, ..., d}.

Similarly to Sect. 5.4, we assume the Y_i series evolves according to a multivariate
GARCH–type model, so that

    Y_i = Σ_i^{1/2} e_i,

where the following conditions hold.

Assumption 5.6.2 {e_i, i ∈ Z} are independent and identically distributed random
vectors in ℝ^d with Ee_0 = 0, Ee_0 e_0^T = I.

Assumption 5.6.3 Σ_i is measurable with respect to F_{i−1}, and {Σ_i, i ∈ Z} is a
causal Bernoulli–shift in the innovations {e_i, i ∈ Z}.

Under these conditions Σ_i is the conditional covariance matrix of Y_i with respect
to the past F_{i−1}, i.e.

    Σ_i = E( Y_i Y_i^T | F_{i−1} ).

To avoid degenerate cases we also impose

Assumption 5.6.4 There is a positive definite matrix Σ_0 such that Σ_i − Σ_0 is almost
surely positive definite for all i ∈ Z.

Using the notation Σ_i^{1/2} = {σ_i(k, j), 1 ≤ k, j ≤ d}, we can write τ_i(j) = σ_i(j, j).
It follows from Assumption 5.6.4 that there is a positive constant τ_0 > 0 such that

    τ_i(j) ≥ τ_0  for all i and j ∈ {1, ..., d}.

It is a consequence of Assumptions 5.6.2 and 5.6.3 that {Y_i, i ∈ Z} is a stationary
and ergodic sequence. Next we impose a condition on the dependence structure of
Y_i:

Assumption 5.6.5 E||Y_0||^ν < ∞ with some ν > 4 and {Y_i, i ∈ Z} is L^r–
decomposable with some r > 2.
Let

    ρ_i(k, l) = E y*_i(k) y*_i(l),  1 ≤ k, l ≤ d,

be the covariances of the devolatized variables Y*_i. We are interested in evaluating
whether ρ_i(k, l) changes during the sample. This is formulated as the null
hypothesis

    H_0 : ρ_1(k, l) = · · · = ρ_N(k, l)  for all k, l ∈ {1, ..., d},             (5.6.1)

against the alternative

    H_A : there is a k* ∈ {1, ..., N} and j*, l* ∈ {1, ..., d}
          such that ρ_1(j*, l*) = · · · = ρ_{k*}(j*, l*)
                    ≠ ρ_{k*+1}(j*, l*) = · · · = ρ_N(j*, l*).

Under the null hypothesis the covariance matrix of Y*_i = (y*_i(1), ..., y*_i(d))^T does
not depend on the time index i, while under the alternative at least one of the elements
of the covariance matrix changes at an unknown time k*. In essence, then, we wish
to perform change point analysis for the mean of the matrices Y*_i (Y*_i)^T. Since these
matrices are symmetric, we may characterize them by d(d + 1)/2 dimensional
vectors using the vech operator, which stacks the columns of a symmetric matrix
starting with the diagonal into a vector.
To formulate a CUSUM process of such vectors, we introduce

    s(j) = Σ_{i=1}^j r_i,  s(0) = 0,

with

    r_i = vech( y*_i(j) y*_i(k), 1 ≤ j, k ≤ d ).

Assuming that H_0 of (5.6.1) holds, we can define the long–run covariance matrix

    D = Σ_{l=−∞}^∞ E r_0 r_l^T.

We show in the proofs that under Assumption 5.6.5 the infinite sum defining D
is absolutely convergent. The normalization in our test statistics requires that this
matrix is non–singular.
Assumption 5.6.6 D is non–singular.

We use the following two statistics:

    M_N^{(1)} = (1/N) max_{1≤i≤N} ( s(i) − (i/N) s(N) )^T D^{−1} ( s(i) − (i/N) s(N) )

and

    M_N^{(2)} = (1/N²) Σ_{i=1}^N ( s(i) − (i/N) s(N) )^T D^{−1} ( s(i) − (i/N) s(N) ).

Theorem 5.6.1 If H_0 of (5.6.1) and Assumptions 5.6.1–5.6.6 are satisfied, then

    M_N^{(1)}  →^D  sup_{0<t<1} Σ_{i=1}^{d(d+1)/2} B_i²(t)

and

    M_N^{(2)}  →^D  Σ_{i=1}^{d(d+1)/2} ∫_0^1 B_i²(t) dt,

where {B_i(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d(d + 1)/2} are independent Brownian
bridges.

Theorem A.2.11 can again be used to approximate the distributions of the limits
in Theorem 5.6.1 for large d.
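A minimal computational sketch of the two statistics (our own function names;
Ystar is an N × d array of devolatized observations and Dinv an estimate of D^{−1}):

    import numpy as np

    def vech(S):
        # one fixed ordering of the d(d+1)/2 distinct entries of a symmetric
        # matrix; any fixed ordering works as long as Dinv uses the same one
        rows, cols = np.tril_indices(S.shape[0])
        return S[rows, cols]

    def correlation_cusum_stats(Ystar, Dinv):
        N, d = Ystar.shape
        r = np.array([vech(np.outer(y, y)) for y in Ystar])   # r_i, i = 1, ..., N
        s = np.cumsum(r, axis=0)                              # s(i)
        dev = s - np.outer(np.arange(1, N + 1) / N, s[-1])    # s(i) - (i/N) s(N)
        quad = np.einsum('ij,jl,il->i', dev, Dinv, dev)
        return quad.max() / N, quad.sum() / N ** 2            # (M_N^(1), M_N^(2))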
In the definition of y*_i(j), we normalize the observed time series with τ_i(j),
which is not observable. In practice it may be replaced by an estimator τ̄_i(j)
computed from Y_1, ..., Y_{i−1}. We consider parametric models to do so. Suppose
there is a p dimensional parameter θ ∈ ℝ^p such that

    τ_i(j) = τ_i(j, θ),  j ∈ {1, ..., d}, i ∈ {1, ..., N}.

Our estimator τ̄_i(j) for τ_i(j) also depends on θ,

    τ̄_i(j) = τ̄_i(j, θ),  j ∈ {1, ..., d}, i ∈ {1, ..., N}.

We require that τ_i(j, θ) and τ̄_i(j, θ) are uniformly close to each other as i → ∞.
The true value of θ is denoted by θ_0:

Assumption 5.6.7 There is a closed ball O_0 ⊂ ℝ^p with center θ_0 and a sequence
a(i) satisfying a(i) → 0, i a(i) → ∞, as i → ∞, such that

    max_{1≤j≤d} sup_{θ∈O_0} | τ_i(j, θ) − τ̄_i(j, θ) | = O( a(i) ),  a.s. (i → ∞).
Assumption 5.6.7 means that the difference between the stationary τ_i(j, θ) and
the non-stationary τ̄_i(j, θ) is small, i.e. there is a negligible difference between
estimating θ_0 based on the information Y_1, ..., Y_{i−1} or {Y_s, s ≤ i − 1} when i
is large. We estimate θ_0 with θ̂_N, which is consistent with rate N^{−1/2}:

Assumption 5.6.8 ||θ̂_N − θ_0|| = O_P( N^{−1/2} ).

The random functions τ_i(j, θ) are smooth functions of θ in a neighbourhood of
θ_0:

Assumption 5.6.9 There is a closed ball O_0 ⊂ ℝ^p with center θ_0 such that

    | τ_i(j, θ) − τ_i(j, θ_0) − g_i^T(j)(θ − θ_0) | ≤ ḡ_i ||θ − θ_0||²,

for all θ ∈ O_0, where {g_i(j), j ∈ {1, ..., d}, ḡ_i, i ∈ Z} is a stationary and ergodic
sequence with E||g_0(j)||² < ∞ and Eḡ_0² < ∞.

Using the estimated volatilities τ̄_i(j, θ̂_N) we define

    ŷ_i(j) = y_i(j)/τ̄_i(j, θ̂_N),

which can be computed from the sample. Let

    r̂_i = vech( ŷ_i(j) ŷ_i(k), 1 ≤ j, k ≤ d ),  i ∈ {1, ..., N}.

We also need D̂_N, an estimator for D, that satisfies

Assumption 5.6.10 || D̂_N − D || = o_P(1).

A kernel long–run covariance estimator satisfies Assumption 5.6.10 under minor
conditions. Let

    γ̂_l = (1/N) Σ_{i=1}^{N−l} ( r̂_i − r̄_N )( r̂_{i+l} − r̄_N )^T,   if 0 ≤ l < N,
    γ̂_l = (1/N) Σ_{i=−l+1}^N ( r̂_i − r̄_N )( r̂_{i+l} − r̄_N )^T,   if −N < l < 0,

where

    r̄_N = (1/N) Σ_{i=1}^N r̂_i.

Now the estimator is

    D̂_N = Σ_{l=−N+1}^{N−1} K(l/h) γ̂_l,

where K is the kernel and h = h(N) is the window (smoothing parameter). We refer
to Sect. 3.1 for a discussion of the possible choices of the kernel and the window.
Using the proofs in Sect. 3.1 one can verify that Assumption 5.6.10 is satisfied under
standard conditions on K and h = h(N).
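With the Bartlett kernel K(u) = max(0, 1 − |u|), the estimator D̂_N can be sketched
as follows (our own names; r is the N × m array of vectors r̂_i and h is a window
chosen by the user, for example as in Sect. 3.1):

    import numpy as np

    def long_run_cov(r, h):
        N = r.shape[0]
        rc = r - r.mean(axis=0)            # centre at r_bar_N
        D = rc.T @ rc / N                  # lag-0 term, gamma_hat_0
        for l in range(1, N):
            k = max(0.0, 1.0 - l / h)      # Bartlett kernel K(l/h)
            if k == 0.0:
                break
            g = rc[:-l].T @ rc[l:] / N     # gamma_hat_l
            D += k * (g + g.T)             # gamma_hat_l + gamma_hat_{-l} = g + g^T
        return D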
Similarly to M_N^{(1)} and M_N^{(2)} we define

    M̂_N^{(1)} = (1/N) max_{1≤i≤N} ( ŝ(i) − (i/N) ŝ(N) )^T D̂_N^{−1} ( ŝ(i) − (i/N) ŝ(N) )

and

    M̂_N^{(2)} = (1/N²) Σ_{i=1}^N ( ŝ(i) − (i/N) ŝ(N) )^T D̂_N^{−1} ( ŝ(i) − (i/N) ŝ(N) )

with

    ŝ(i) = Σ_{j=1}^i r̂_j.

Theorem 5.6.2 If H_0 of (5.6.1) and Assumptions 5.6.1–5.6.10 are satisfied, then

    M̂_N^{(1)}  →^D  sup_{0<t<1} Σ_{i=1}^{d(d+1)/2} B_i²(t)

and

    M̂_N^{(2)}  →^D  Σ_{i=1}^{d(d+1)/2} ∫_0^1 B_i²(t) dt,

where {B_i(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., d(d + 1)/2} are independent Brownian bridges.

Before we prove Theorems 5.6.1 and 5.6.2 we discuss a few examples where the
conditions of these theorems are satisfied.
Example 5.6.1 Bollerslev (1990) and Jeantheau (1998) specified the constant
conditional correlation (CCC(p, q)) multivariate GARCH model by the following
equations:

    Σ_i = D_i R D_i,

    D_i = diag( τ_i(1), τ_i(2), ..., τ_i(d) ),    h_i = ( τ_i²(1), τ_i²(2), ..., τ_i²(d) )^T,

and

    h_i = c + Σ_{l=1}^q A_l ( Y_{i−l} ∘ Y_{i−l} ) + Σ_{j=1}^p B_j h_{i−j},

where ∘ denotes the Hadamard product of vectors (coordinate-wise multiplication),
R is a correlation matrix, c is a vector with positive coordinates, and A_l, 1 ≤ l ≤ q,
B_j, 1 ≤ j ≤ p are matrices with positive elements. Aue et al. (2009a)
find sufficient conditions that imply Assumptions 5.6.3–5.6.5. We note that Aue
et al. (2009a) prove that the solution is L^ν–decomposable, with the approximating
coefficients v_m in Definition (1.1.1) decaying geometrically. We note that an
alternative is to use mixing conditions as detailed in Bradley (2007).

Francq and Zakoian (2010) also gave a detailed account of the estimation
of the parameters of a CCC(p, q) sequence by quasi–maximum likelihood, and
Assumptions 5.6.7–5.6.9 are established there. In addition to the QMLE, the variance
targeting estimator also satisfies our assumptions (see Francq et al., 2016). Francq
and Zakoian (2014) proposed a new method to estimate parameters utilizing the
covariance structure of the observations. Their proofs show that Assumptions 5.6.7–
5.6.9 hold for their estimators.
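To make the CCC recursion concrete, the following simulation sketch generates a
bivariate CCC(1,1) series under purely illustrative (hypothetical) parameter values;
stationarity of course requires conditions such as those in Aue et al. (2009a), and
any square root of R may be used in place of the Cholesky factor.

    import numpy as np

    def simulate_ccc(N, c, A, B, R, burn=500, rng=np.random.default_rng(0)):
        d = len(c)
        L = np.linalg.cholesky(R)            # so that Cov(Y_i | past) = D_i R D_i
        h, Y = c.copy(), np.zeros(d)
        out = np.empty((N, d))
        for i in range(N + burn):
            h = c + A @ (Y * Y) + B @ h      # h_i = c + A (Y o Y) + B h_{i-1}
            Y = np.sqrt(h) * (L @ rng.standard_normal(d))   # Y_i = D_i R^{1/2} e_i
            if i >= burn:
                out[i - burn] = Y
        return out

    # illustrative values only:
    # c = np.array([0.1, 0.1]); A = np.diag([0.10, 0.15])
    # B = np.diag([0.80, 0.75]); R = np.array([[1.0, 0.4], [0.4, 1.0]])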
Example 5.6.2 Aielli (2013) introduced the corrected dynamic conditional corre-
lation (cDCC) model where

    Σ_i = D_i R_i D_i,                                                          (5.6.2)

and D_i is a diagonal matrix,

    D_i = diag( τ_i(1), τ_i(2), ..., τ_i(d) ).                                  (5.6.3)

It is assumed that y_i(j) is modeled as a univariate GARCH process

    τ_i²(j) = h_j( ζ_j, y_{i−1}(j), y_{i−2}(j), ... ),  j ∈ {1, ..., d},        (5.6.4)

where h_j is a known function and ζ_j, j ∈ {1, ..., d} are unknown parameters. The
conditional correlation of Y_i satisfies

    R_i = ( diag(Q_i) )^{−1/2} Q_i ( diag(Q_i) )^{−1/2}                          (5.6.5)
and

    Q_i = θ_1 C + θ_2 ( diag(Q_{i−1}) )^{1/2} Y*_{i−1} (Y*_{i−1})^T ( diag(Q_{i−1}) )^{1/2} + θ_3 Q_{i−1},

where Y*_i are the devolatized observations. It is assumed that C is positive definite,
θ_1 > 0, θ_2 ≥ 0, θ_3 ≥ 0 and θ_1 + θ_2 + θ_3 = 1. The parameters of the process are
C, ζ_1, ..., ζ_d, θ_2 and θ_3. Since there are several univariate asymmetric GARCH
models (see Francq and Zakoian, 2010), the cDCC model accounts for possible
asymmetry of the returns. Aielli (2013) points out that if the h_j(ζ_j, ...) are stationary
and ergodic, then vector valued observations satisfying the cDCC equations also
have these properties. Carrasco and Chen (2002) and Hörmann (2008) prove that
augmented univariate GARCH processes have these properties under minor conditions.
Since augmented GARCH sequences are L^ν–decomposable (see Carrasco and Chen,
2002), the cDCC also has this property. The existence of the higher moments of
augmented GARCH sequences is also discussed in Carrasco and Chen (2002) and
Hörmann (2008). We note Hörmann (2008) proves that augmented GARCH
processes are decomposable Bernoulli shifts and therefore a process following the
cDCC model is as well. Using β–mixing or the Bernoulli shift property, one can
show that Assumption 5.6.10 is satisfied by following the arguments in Wu and
Zaffaroni (2018). In Definition 3.4 of Aielli (2013), the ζ_i's, the parameters of the
augmented GARCH sequences, are estimated by QMLE. The proofs in Section 3.3
of Aielli (2013) yield that the estimators obtained in the second and third steps have
the properties in Assumptions 5.6.7–5.6.9.
Example 5.6.3 The dynamic conditional correlation (DCC) GARCH model is an
extension of the CCC and cDCC models in Examples 5.6.1 and 5.6.2. Equations
(5.6.2)–(5.6.4) hold but (5.6.5) is replaced with

    Q_i = C + A Y*_{i−1} (Y*_{i−1})^T A^T + B Q_{i−1} B^T,

where C is a positive definite matrix and A, B are d × d matrices. Fermanian and
Malongo (2017) provide general conditions for the existence of a unique stationary
and ergodic solution of the DCC equations, and they prove Assumptions 5.6.3–
5.6.5. Pape et al. (2021) provide estimators for the parameters which satisfy
Assumptions 5.6.7–5.6.9.
Example 5.6.4 Baba, Engle, Kraft and Kroner (see Engle and Kroner, 1995)
introduced the BEKK model, where the conditional covariance matrix satisfies the
recursion

    Σ_i = C + Σ_{j=1}^q A_j Y_{i−j} ( A_j Y_{i−j} )^T + Σ_{k=1}^p B_k Σ_{i−k} B_k^T,    (5.6.6)

where C, A_j, j ∈ {1, ..., q}, B_k, k ∈ {1, ..., p} are d × d matrices and C is
positive definite. The parameters of the BEKK sequences can be estimated by the
QMLE and the variance targeting QMLE (see Comte and Lieberman, 2003, Hafner
and Preminger, 2009, Pedersen and Rahbek, 2014 and Francq et al., 2016). For
the sake of simplicity we assume that p = q = 1, and the parameter matrices
are denoted by A and B. Boussama et al. (2011) prove the existence of a unique
stationary solution of the BEKK equations assuming that the distribution of e_0 is
absolutely continuous with respect to the Lebesgue measure on ℝ^d, the point 0 is
an interior point of the support of the distribution of e_0, and the spectral radius
of A + B is less than 1. In their proofs Boussama et al. (2011) also show that
the solution is ergodic and geometrically β–mixing, and their proof can also be
used to establish L^ν–decomposability of the solution. Hence Assumptions 5.6.3
and 5.6.5 hold, and we need to assume only that E||Y_0||^r < ∞. Hafner and
Preminger (2009) provide explicit conditions for the existence of moments. If
Assumption 5.6.6 holds, then the mixing property of Y_i and the existence of the
moments of ||Y_0|| yield Assumption 5.6.10 along the lines of the calculations in
Wu and Zaffaroni (2018). The parameters of the BEKK model can be estimated
by the QMLE and the variance targeting QMLE. Hafner and Preminger (2009),
Pedersen and Rahbek (2014) and Francq et al. (2016) establish Assumption 5.6.7
with exponential rate, and the asymptotic normality established in those papers
yields Assumption 5.6.8. Finally, the computation of the second derivatives of
τ_i(j, θ) in Pedersen and Rahbek (2014) (see also Hafner and Preminger, 2009 and
Francq et al., 2016) gives Assumption 5.6.9.
Example 5.6.5 Engle et al. (1990) defined the conditional covariance matrix Σ_i,
using a factor model, by the equation

    Σ_i = C + Σ_{j=1}^p λ_i(j) β_j β_j^T,

and

    λ_i(j) = ω_j + α_j y²_{i−1}(j) + β_j λ_{i−1}(j),

where C is a positive definite matrix, ω_j > 0, α_j ≥ 0, β_j ≥ 0, j ∈ {1, ..., p},
and β_1, β_2, ..., β_d are linearly independent vectors. Francq and Zakoian (2010)
point out that the factor model can be written in the BEKK form of Example 5.6.4,
so Assumptions 5.6.3–5.6.9 are satisfied under mild conditions.
The proofs of both Theorems 5.6.1 and 5.6.2 are based on the following lemmas.

Lemma 5.6.1 If H_0 of (5.6.1) and Assumptions 5.6.2–5.6.6 hold, then

    N^{−1/2} ( s(⌊Nu⌋) − E s(⌊Nu⌋) )  →^{D[0,1]}  W(u),

where {W(u), u ≥ 0} is a Gaussian process with EW(u) = 0 and EW(u)W^T(v) =
D min(u, v).
Proof It follows from Assumptions 5.6.2–5.6.5 that {y*_i(k) y*_i(l), k, l ∈ {1, ..., d}}
is also stationary and β–mixing with the same rate as Y_i. Since Assumption 5.6.4
implies that there is τ_0 > 0 such that τ_i(j) ≥ τ_0, we get

    E| y*_i(k) y*_i(l) |^{r/2} ≤ (1/τ_0^r) ( E|y_i(k)|^r E|y_i(l)|^r )^{1/2} < ∞,

via the Cauchy–Schwarz inequality and Assumption 5.6.5. Hence the weak con-
vergence of partial sums in Ibragimov (1962) (see also Bradley, 2007) implies the
lemma. □

Proof of Theorem 5.6.1 Lemma 5.6.1 implies that

    N^{−1/2} ( s(⌊Nu⌋) − (⌊Nu⌋/N) s(N) )  →^{D[0,1]}  W(u) − uW(1).            (5.6.7)

Checking the covariance structure one can verify that

    { D^{−1/2} ( W(u) − uW(1) ), 0 ≤ u ≤ 1 }
        =_D { ( B_1(u), B_2(u), ..., B_{d(d+1)/2}(u) )^T, 0 ≤ u ≤ 1 },         (5.6.8)

where {B_i(u), 0 ≤ u ≤ 1}, i ∈ {1, ..., d(d + 1)/2} are independent Brownian
bridges. The result now follows from (5.6.7) and (5.6.8) via the continuous mapping
theorem. □

Proof of Theorem 5.6.2 It follows from the definition of ŷ_i(k) that

    ŷ_i(k) ŷ_i(l) − y*_i(k) y*_i(l) = a_{i,1}(k, l) + · · · + a_{i,8}(k, l),

where

    a_{i,1}(k, l) = y_i(k) y_i(l) ( 1/τ̄_i(k, θ̂_N) − 1/τ_i(k, θ̂_N) ) ( 1/τ̄_i(l, θ̂_N) − 1/τ_i(l, θ̂_N) ),
    a_{i,2}(k, l) = y_i(k) y_i(l) ( 1/τ̄_i(k, θ̂_N) − 1/τ_i(k, θ̂_N) ) ( 1/τ_i(l, θ̂_N) − 1/τ_i(l, θ_0) ),
    a_{i,3}(k, l) = y_i(k) ( 1/τ̄_i(k, θ̂_N) − 1/τ_i(k, θ̂_N) ) y_i(l)/τ_i(l),
    a_{i,4}(k, l) = y_i(k) y_i(l) ( 1/τ_i(k, θ̂_N) − 1/τ_i(k, θ_0) ) ( 1/τ̄_i(l, θ̂_N) − 1/τ_i(l, θ̂_N) ),
    a_{i,5}(k, l) = y_i(k) y_i(l) ( 1/τ_i(k, θ̂_N) − 1/τ_i(k, θ_0) ) ( 1/τ_i(l, θ̂_N) − 1/τ_i(l, θ_0) ),
    a_{i,6}(k, l) = ( y_i(k)/τ_i(k) ) y_i(l) ( 1/τ̄_i(l, θ̂_N) − 1/τ_i(l, θ̂_N) ),
    a_{i,7}(k, l) = y_i(k) ( 1/τ_i(k, θ̂_N) − 1/τ_i(k, θ_0) ) y_i(l)/τ_i(l),
    a_{i,8}(k, l) = ( y_i(k)/τ_i(k) ) y_i(l) ( 1/τ_i(l, θ̂_N) − 1/τ_i(l, θ_0) ).

Since τ̄_i(k) ≥ τ_0 > 0, by Assumptions 5.6.8 and 5.6.9 we have, on account of the
mean value theorem, that

    N^{−1/2} max_{1≤j≤N} Σ_{i=1}^j | a_{i,1}(k, l) | = O_P(1) N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | a²(i),

where a(·) is defined in Assumption 5.6.7. We can assume without loss of generality
that a(i) is nonincreasing as i → ∞. Using again Assumption 5.6.7, we can define a
sequence a_N such that, as N → ∞, N^{−1/2} a_N → 0 and N^{1/2} a(a_N) → 0. Therefore

    N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | a²(i) ≤ N^{−1/2} Σ_{i=1}^{a_N} | y_i(k) y_i(l) | a²(i)
                                               + N^{−1/2} Σ_{i=a_N+1}^N | y_i(k) y_i(l) | a²(i)    (5.6.9)
                                             = O_P( N^{−1/2} a_N + N^{1/2} a²(a_N) ) = o_P(1),

where in the last step we used that, according to the ergodic theorem,

    (1/L) Σ_{i=1}^L | y_i(k) y_i(l) | → E| y_0(k) y_0(l) |,  a.s. (L → ∞).

We note E| y_0(k) y_0(l) | < ∞, since by the Cauchy–Schwarz inequality and
Assumption 5.6.5

    E| y_0(k) y_0(l) | ≤ ( E y_0²(k) E y_0²(l) )^{1/2} < ∞.
Putting together Assumptions 5.6.7–5.6.9, we conclude via a two term Taylor
expansion that

    N^{−1/2} Σ_{i=1}^N | a_{i,2}(k, l) | = O_P(1) N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | a(i)
                                           × [ ||g_i(l)|| ||θ̂_N − θ_0|| + ḡ_i ||θ̂_N − θ_0||² ].

Following the proof of (5.6.9), one can show that

    N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | a(i) ||g_i(l)|| ||θ̂_N − θ_0||
        = O_P(1) (1/N) Σ_{i=1}^N | y_i(k) y_i(l) | a(i) ||g_i(l)|| = o_P(1),

since

    E | y_0(k) y_0(l) | ||g_0(l)|| ≤ ( E( y_0(k) y_0(l) )² E||g_0(l)||² )^{1/2}
                                   ≤ ( E y_0⁴(l) E y_0⁴(k) )^{1/4} ( E||g_0(l)||² )^{1/2} < ∞.

The same arguments give

    N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | a(i) ḡ_i ||θ̂_N − θ_0||²
        = O_P(1) (1/N^{3/2}) Σ_{i=1}^N | y_i(k) y_i(l) | a(i) ḡ_i
        = O_P(1) ( N^{−1/2} max_{1≤i≤N} ḡ_i ) (1/N) Σ_{i=1}^N | y_i(k) y_i(l) | a(i)
        = o_P(1),

since Eḡ_i² < ∞ implies

    N^{−1/2} max_{1≤i≤N} ḡ_i = o_P(1).

Similarly,

    N^{−1/2} Σ_{i=1}^N | a_{i,3}(k, l) | = O_P(1) N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | a(i) = o_P(1).
By symmetry, we also have

    N^{−1/2} Σ_{i=1}^N | a_{i,j}(k, l) | = o_P(1),  j = 4, 5, 6.

Assumption 5.6.9 implies

    N^{−1/2} max_{1≤j≤N} | Σ_{i=1}^j a_{i,7}(k, l) − Σ_{i=1}^j ( y_i(k) y_i(l)/(τ_i²(k) τ_i(l)) ) g_i^T(k) (θ_0 − θ̂_N) |
        = O_P(1) N^{−1/2} Σ_{i=1}^N | y_i(k) y_i(l) | ḡ_i ||θ_0 − θ̂_N||²
        = O_P(1) ( N^{−1/2} max_{1≤i≤N} ḡ_i ) (1/N) Σ_{i=1}^N | y_i(k) y_i(l) |
        = o_P(1).

Using again the ergodic theorem and Assumption 5.6.8, we conclude

    N^{−1/2} max_{1≤j≤N} | ( Σ_{i=1}^j ( y_i(k) y_i(l)/(τ_i²(k) τ_i(l)) ) g_i(k) − (j/N) Σ_{i=1}^N ( y_i(k) y_i(l)/(τ_i²(k) τ_i(l)) ) g_i(k) )^T (θ_0 − θ̂_N) |
        = O_P(1) (1/N) max_{1≤j≤N} || Σ_{i=1}^j ( y_i(k) y_i(l)/(τ_i²(k) τ_i(l)) ) g_i(k) − (j/N) Σ_{i=1}^N ( y_i(k) y_i(l)/(τ_i²(k) τ_i(l)) ) g_i(k) ||
        = o_P(1).

Hence we obtain that

    N^{−1/2} max_{1≤j≤N} | Σ_{i=1}^j a_{i,7}(k, l) − (j/N) Σ_{i=1}^N a_{i,7}(k, l) | = o_P(1),

and by the same arguments

    N^{−1/2} max_{1≤j≤N} | Σ_{i=1}^j a_{i,8}(k, l) − (j/N) Σ_{i=1}^N a_{i,8}(k, l) | = o_P(1).

Thus we proved that

    N^{−1/2} max_{1≤j≤N} | ( Σ_{i=1}^j ŷ_i(k) ŷ_i(l) − (j/N) Σ_{i=1}^N ŷ_i(k) ŷ_i(l) )
                           − ( Σ_{i=1}^j y*_i(k) y*_i(l) − (j/N) Σ_{i=1}^N y*_i(k) y*_i(l) ) | = o_P(1).

The result now follows from Theorem 5.6.1. □


5.7 Data Examples

Example 5.7.1 (Dynamic Linear Model for Consumption and Income)
Figure 5.1 shows the quarterly percentage changes in consumption and income in
the United States from 1970–2016 (N = 188), obtained from the Federal Reserve
Economic Database (FRED). Jointly modelling these series using a dynamic
regression model was considered in Hyndman and Athanasopoulos (2021).
In this example we consider evaluating for change points in the relationship
between consumption and income within the framework of a dynamic regression
model as in Sect. 5.2. Letting y_t and z_t denote the consumption and income series,
respectively, we posit the dynamic regression model

    y_t = β_0 + β_1 z_t + β_2 y_{t−1} + ε_t.                                    (5.7.1)

We estimated the model parameters from the entire sample using ordinary least
squares, and the residuals computed as in (4.1.26) and their autocorrelation function
are shown in Fig. 5.2. These indicate that the residuals appear to exhibit light
autocorrelation, and also appear to undergo a change in variation a little past halfway
through the observation period.
We computed the quadratic form of the CUSUM process

    Q_N(t) = ( Ẑ_N^T(t) D̂^{−1} Ẑ_N(t) )^{1/2} / ( t(1 − t) )^{1/4}             (5.7.2)

as in Theorem 5.2.1, where D̂ is estimated from Eq. (3.1.37) based on a single
preliminary change point estimator and using the Bartlett kernel with the bandwidth
selection method in Andrews (1991). The results using the bandwidth of Newey
and West (1987) were nearly the same. This process is shown in Fig. 5.3, along with a
Fig. 5.1 Plots of the quarterly percentage change in consumption and income over the period
1970–2016 (N = 188)

horizontal dotted line indicating the 95% quantile of

    sup_{0<t<1} ( 1/( t(1 − t) )^{1/4} ) ( Σ_{i=1}^3 B_i²(t) )^{1/2},

obtained via simulation. We saw that this process exceeded the 95% quantile of
the limit distribution in Theorem 5.2.1 (the approximate p–value was 0.011), and the
location at which the process was maximized coincided with the third quarter of
the year 2000. Performing binary segmentation around this initial change point
estimator suggested that there are no remaining change points of significance. The
estimators of the parameters before and after the change point were (rounded to two
decimal places) β̂_1 = (0.45, 0.37, 0.14) and β̂_2 = (0.23, 0.04, 0.52), suggesting
Fig. 5.2 Plots of the residuals computed as in (4.1.26) from the model (5.7.1), along with their
ACF plot

Fig. 5.3 Plot of the process Q_N in (5.7.2) computed to evaluate for change points in the
parameters of the model (5.7.1). The largest value of Q_N is attained at k̂ = 2000.75, the third
quarter of the year 2000
that the level of the average change in consumption decreased and became less (linearly)
related to changes in income.
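A sketch of the computations in this example follows (our own function name;
Dinv is an estimate of D̂^{−1}, e.g. obtained from a Bartlett kernel estimator as in
Sect. 3.1; the exact definition of Ẑ_N follows Sect. 5.2, and this is only one
natural implementation of it).

    import numpy as np

    def dynamic_regression_QN(y, z, Dinv):
        X = np.column_stack([np.ones(len(y) - 1), z[1:], y[:-1]])  # (1, z_t, y_{t-1})
        yy = y[1:]
        beta = np.linalg.lstsq(X, yy, rcond=None)[0]   # OLS fit of (5.7.1)
        resid = yy - X @ beta
        terms = X * resid[:, None]                     # x_t times residual_t
        S = np.cumsum(terms, axis=0)
        n = len(yy)
        t = np.arange(1, n + 1) / n
        Z = S - t[:, None] * S[-1]                     # bridge-type CUSUM
        quad = np.einsum('ij,jl,il->i', Z[:-1], Dinv, Z[:-1]) / n
        t = t[:-1]
        Q = np.sqrt(quad) / (t * (1.0 - t)) ** 0.25    # Q_N(t) of (5.7.2)
        return Q, np.argmax(Q) + 1                     # process and its argmax

The location of the maximum of Q serves as the preliminary change point
estimator used for binary segmentation in this example.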
Example 5.7.2 (Stability of Vector GARCH Models for Emerging Market
Stock Indices) It was indicated by Forbes and Rigobon (2002) that a financial
contagion effect occurs if the "interlinkages" across markets experience a significant
increase after a market event. The actual dates at which conditional correlations
exhibit structural breaks are unknown, although they can be estimated and detected
through statistical methods as described in Sect. 5.6. In this example we conduct a
change point analysis of conditional correlations between log–returns modelled using
vector GARCH models, as studied in Barassi et al. (2020). We considered three
groups of emerging stock market price indexes, as well as several benchmark
indices. The three regions Latin America, Central East Europe, and (East) Asia
were considered. The specific indices considered are detailed in Table 5.1. To each
regional group we added the S&P 500 index of the United States, and the German
DAX 40 and Japanese Nikkei 225 indices were added to the CEE and Asia groups,
respectively. The data were taken from the Datastream database and covered the
period 1 September 2006 to 1 September 2010.
Vectors of log–returns were constructed for each region, to which we fit BEKK
as well as cDCC models using the QMLE. To find changes in the correlation structures
of these three datasets, we apply Theorem 5.6.2 to evaluate the significance of
the test statistic M̂_N^{(1)} for each model. If a change was detected at the 95% level,
binary segmentation was performed. In this way the data were segmented into
six approximately homogeneous subsets. The change point detection results are
displayed in Fig. 5.4. Both models show consistent patterns. The first prominent
change is around February 2007, which then reverted around August 2007.

Table 5.1 Stock indices considered from three different regions: Latin America, Central East
Europe, and East Asia
Latin America Central East Europe (CEE) East Asia
Argentina (Argentina Czech (Prague SEPX) Hong Kong (Hang Seng)
MERVAL)
Brazil (Brazil Estonia (OMX Tallin) Indonesia (IDX composite)
BOVESPA)
Chile (Chile Santiago SE Hungary (Budapest) South Korea (Korea SE composite)
General)
Mexico (Mexico IPC) Poland (Warsaw General) Malaysia (Malaysia KLCI)
Colombia (Colombia Romania (Romania BET) Philippines (Philippine SE)
IGBC)
Peru (BVL General) Slovakia (Slovakia SAX 16) Singapore (Straits Times)
U.S. S&P 500 Slovenia (Slovenian blue chip) Taiwan (Taiwan SE weighted)
U.S. S&P 500 Thailand (Bangkok S.E.T)
Germany (DAX 40) China (Shanghai S.E. A share)
U.S. S&P 500
Japan (Nikki 225)
Fig. 5.4 Conditional correlation between the U.S. (S&P 500) and Latin America, CEE and Asia
estimated using multivariate GARCH models. The vertical lines show change points estimated
using binary segmentation. (a) BEKK model for Latin America. (b) cDCC model for Latin
America. (c) BEKK model for CEE. (d) cDCC model for CEE. (e) BEKK model for Asia. (f)
cDCC model for Asia

The third change generally coincided with September 2008, and the fourth and fifth
changes occurred in the second half of 2009 and April 2010, respectively. The
East Asian markets appear relatively less connected with the U.S. and tend to have
higher resistance, which might be explained by their closer relation with the large
economies in the area, such as Japan and China.
Example 5.7.3 (Trends in Cryptocurrency Using RCA Models) Cryptocurren-
cies have been extremely volatile during the past few years, which suggests that
they are speculative in nature. Change point analyses of cryptocurrencies have been
the subject of several recent studies, and we refer to Hafner (2020), Astill et al.
(2023), and the references therein. We follow the example of Horváth and Trapani
(2023), who used demeaned Bloomberg Galaxy Crypto Index (BGCI) log prices
between August 2nd, 2017 and April 28th, 2022 (with N = 1237). The data show
heteroscedasticity and we therefore fit an RCA(1) model to this series. We compute
the test statistic

    sup_{0<t<1} ( 1/( t(1 − t) )^κ ) | Q̄_N(t) |

as described in Theorem 5.3.4, which is then compared to the quantiles of the
limiting distribution estimated as described in (5.3.41).
Using binary segmentation, the cryptocurrency index appears to have several
breaks, as shown in Fig. 5.5. This is particularly true during the first period, covering
the second half of 2017. Astill et al. (2023) carry out a similar analysis, and in
a similar time window, in the context of sequential monitoring of Bitcoin prices.
They claimed to have found possible evidence of explosive behaviour during the
summer of 2017, which we did not find here in this retrospective analysis. However,
Astill et al. (2023) also find evidence of heteroscedasticity, stating that it "is [. . . ]
of considerable importance to allow for the presence of time–varying volatility in
the data when investigating whether or not the general upward movement in the
Bitcoin price series is due to explosive episodes", concluding that the detection of

Fig. 5.5 Bloomberg Galaxy Crypto Index log prices with the estimated times of changes (vertical
lines)

an explosive episode may be spurious. The tests in Theorem 5.3.4 are robust to both
conditional and unconditional volatility, and therefore they lend themselves to being
applied to this dataset. While a graph of the data may suggest the presence of
a break around the September 2021 peak in Fig. 5.5, it is also possible that the
behaviour of the index in the terminal period of the sample is driven by changes in
the volatility of the series rather than by larger structural changes.

5.8 Exercises

Exercise 5.8.1 We consider the AR(1) model y_i = ρ_i y_{i−1} + ε_i, where {ε_i, i ∈ Z}
are independent and identically distributed random variables with Eε_0 = 0, 0 <
Eε_0² = σ² < ∞ and E|ε_0|^κ < ∞ with some κ > 2. We wish to test the null
hypothesis H_0 : ρ_1 = ρ_2 = . . . = ρ_N. Under the null hypothesis |ρ_1| < 1. We use

    T_N = ( 1/r_N^{1/2} ) max_{1≤k<N} | Σ_{i=1}^k ε̂_{i,0,N} − (k/N) Σ_{i=1}^N ε̂_{i,0,N} |,

with

    r_N = min_{1≤k<N} ( Σ_{i=1}^k ( ε̂_{i,0,k} − (1/k) Σ_{l=1}^k ε̂_{l,0,k} )²
                        + Σ_{i=k+1}^N ( ε̂_{i,k,N} − (1/(N − k)) Σ_{l=k+1}^N ε̂_{l,k,N} )² ),

where ε̂_{i,j,k} = y_i − ρ̂_{j,k} y_{i−1} and ρ̂_{j,k} is the least squares estimator for the
autoregressive parameter computed from {y_i, j + 1 ≤ i ≤ k}. Compute the limit
distribution of T_N under the null hypothesis.
Exercise 5.8.2 We consider the AR(1) model y_i = ρ_i y_{i−1} + ε_i, where {ε_i, i ∈ Z}
are independent and identically distributed random variables with Eε_0 = 0, 0 <
Eε_0² = σ² < ∞ and E|ε_0|^κ < ∞ with some κ > 2. We wish to test the null
hypothesis H_0 : ρ_1 = ρ_2 = . . . = ρ_N against the alternative

    y_i = ρ_1 y_{i−1} + ε_i,        1 ≤ i ≤ k_1,
    y_i = ρ_{k_1+1} y_{i−1} + ε_i,  k_1 + 1 ≤ i ≤ N,

ρ_1 = ρ_1(N), lim_{N→∞} |ρ_1| < 1. Show that T_N → ∞ in probability, if ρ_{k_1+1} =
ρ_{k_1+1}(N), lim_{N→∞} |ρ_{k_1+1}| < 1 and

    N^{1/2} | ρ_1 − ρ_{k_1+1} | → ∞.

Exercise 5.8.3 We consider the AR(1) model y_i = ρ_i y_{i−1} + ε_i, where {ε_i, i ∈ Z}
is a stationary GARCH(1,1) sequence, i.e. ε_i = h_i η_i, h_i² = ω + α ε²_{i−1} + β h²_{i−1},
ω, α, β > 0, α + β < 1. The innovations {η_i, i ∈ Z} are independent and identically
distributed with Eη_0 = 0, Eη_0² = 1 and E|η_0|^κ < ∞ with some κ > 4. We wish
to test H_0 : ρ_1 = ρ_2 = . . . = ρ_N. Under the null hypothesis |ρ_1| < 1. We use

    T_N = ( 1/r_N^{1/2} ) max_{1≤k<N} | Σ_{i=1}^k ε̂_{i,0,N} − (k/N) Σ_{i=1}^N ε̂_{i,0,N} |,

with

    r_N = min_{1≤k<N} ( Σ_{i=1}^k ( ε̂_{i,0,k} − (1/k) Σ_{l=1}^k ε̂_{l,0,k} )²
                        + Σ_{i=k+1}^N ( ε̂_{i,k,N} − (1/(N − k)) Σ_{l=k+1}^N ε̂_{l,k,N} )² ),

where ε̂_{i,j,k} = y_i − ρ̂_{j,k} y_{i−1} and ρ̂_{j,k} is the least squares estimator for the
autoregressive parameter computed from {y_i, j + 1 ≤ i ≤ k}. Compute the limit
distribution of T_N under the null hypothesis.
Exercise 5.8.4 We consider the dynamic regression model y_i = x_i β + ρ y_{i−1} +
a(i/N) η_i, 1 ≤ i ≤ N, y_0 = 0, where a(u), 0 ≤ u ≤ 1, is a Riemann integrable
function and |ρ| < 1. We assume that {x_i, 1 ≤ i ≤ N} are independent and
identically distributed random variables with Ex_i = 0, Ex_i² = σ² < ∞, and
{η_i, 1 ≤ i ≤ N} are independent and identically distributed random variables with
Eη_i = 0, Eη_i² = 1. The two sequences are independent. Compute

    lim_{N→∞} (1/N) Σ_{i=1}^N E y_i².

Exercise 5.8.5 We consider the dynamic regression model y_i = x_i β + ρ y_{i−1} +
a(i/N) η_i, 1 ≤ i ≤ N, y_0 = 0, where a(u), 0 ≤ u ≤ 1, is a Riemann integrable
function and |ρ| < 1. We assume that {x_i, 1 ≤ i ≤ N} are independent and
identically distributed random variables with Ex_i = 0, Ex_i² = σ² < ∞, E|x_i|⁴ <
∞, and {η_i, 1 ≤ i ≤ N} are independent and identically distributed random
variables with Eη_i = 0, Eη_i² = 1 and E|η_i|⁴ < ∞. The two sequences are
independent. Prove that

    (1/N) Σ_{i=1}^N y_i²  converges in probability.

Exercise 5.8.6 We consider the dynamic regression model y_i = x_i β_i + ρ_i y_{i−1} +
a(i/N) η_i, 1 ≤ i ≤ N, y_0 = 0, where a(u), 0 ≤ u ≤ 1, is a Riemann integrable
function and |ρ_i| < 1. We assume that {x_i, 1 ≤ i ≤ N} are independent and
identically distributed random variables with Ex_i = 0, Ex_i² = σ² < ∞, E|x_i|⁴ <
∞, and {η_i, 1 ≤ i ≤ N} are independent and identically distributed random
variables with Eη_i = 0, Eη_i² = 1 and E|η_i|⁴ < ∞. The two sequences are
independent. We wish to test H_0 : (β_1, ρ_1) = (β_2, ρ_2) = . . . = (β_N, ρ_N) using
the test statistic

    T_N = N^{−3/2} max_{1≤k<N} k(N − k) { | β_{0,k} − β_{k,N} | + | ρ_{0,k} − ρ_{k,N} | },   (5.8.1)

where (β_{j,k}, ρ_{j,k}) are the least squares estimators computed from {x_i, y_i, j + 1 ≤ i ≤
k}. Compute the limit distribution of T_N under the null hypothesis.
Exercise 5.8.7 We consider the dynamic regression model y_i = x_i β_i + ρ_i y_{i−1} +
a(i/N) η_i, 1 ≤ i ≤ N, y_0 = 0, where a(u), 0 ≤ u ≤ 1, is a Riemann integrable
function and |ρ_i| < 1. We assume that {x_i, 1 ≤ i ≤ N} are independent and
identically distributed random variables with Ex_i = 0, Ex_i² = σ² < ∞, E|x_i|⁴ <
∞, and {η_i, 1 ≤ i ≤ N} are independent and identically distributed random
variables with Eη_i = 0, Eη_i² = 1 and E|η_i|⁴ < ∞. The two sequences are
independent. We wish to test H_0 : (β_1, ρ_1) = (β_2, ρ_2) = . . . = (β_N, ρ_N) against
the alternative

    y_i = x_i β_1 + ρ_1 y_{i−1} + a(i/N) η_i,                1 ≤ i ≤ k_1,
    y_i = x_i β_{k_1+1} + ρ_{k_1+1} y_{i−1} + a(i/N) η_i,    k_1 + 1 ≤ i ≤ N,

using T_N of (5.8.1). Show that T_N → ∞ in probability, if k_1 = ⌊Nθ_1⌋, 0 < θ_1 < 1,
β_1 = β_1(N), ρ_1 = ρ_1(N), β_{k_1+1} = β_{k_1+1}(N), ρ_{k_1+1} = ρ_{k_1+1}(N) and

    N^{1/2} min( | β_1 − β_{k_1+1} |, | ρ_1 − ρ_{k_1+1} | ) → ∞.

Exercise 5.8.8 We consider the multivariate RCA(1) model

    Y_i = ( B_i + E_{i,1} ) Y_{i−1} + E_{i,2},                                   (5.8.2)

Y_i ∈ ℝ^d, B_i ∈ ℝ^{d×d}. We assume that {E_{i,1}, i ∈ Z} are independent and identically
distributed random matrices in ℝ^{d×d}, EE_{0,1} = 0, E||E_{0,1}||^ν < ∞ with some ν > 0.
We assume that {E_{i,2}, i ∈ Z} are independent and identically distributed random
vectors in ℝ^d, EE_{0,2} = 0, E||E_{0,2}||^ν < ∞ with some ν > 0. The two sequences
are independent. We wish to test H_0 : B_1 = B_2 = . . . = B_N. The common value
of the regression parameter under the null hypothesis is denoted by B_0. We assume
that under the null hypothesis E log ||(B_0 + E_{0,1})^r|| < 0 with some positive integer
r. Show that under the null hypothesis (5.8.2) has a stationary solution.
Exercise 5.8.9 We consider the multivariate RCA(1) model of (5.8.2). We assume
that {E_{i,1}, i ∈ Z} are independent and identically distributed random matrices in
ℝ^{d×d}, EE_{0,1} = 0, E||E_{0,1}||^ν < ∞ with some ν > 4. We assume that {E_{i,2}, i ∈
Z} are independent and identically distributed random vectors in ℝ^d, EE_{0,2} =
0, E||E_{0,2}||^ν < ∞ with some ν > 4. The two sequences are independent. We
wish to test H_0 : B_1 = B_2 = . . . = B_N. The common value of the regression
parameter under the null hypothesis is denoted by B_0. We assume that under the
null hypothesis E||(B_0 + E_{0,1})^r|| < 1 with some positive integer r. Find a test and
compute its asymptotic distribution under the null hypothesis.
Exercise 5.8.10 We test for the stability of a GARCH(1,1) model. Under the null
hypothesis the model is the stationary sequence

    y_i = h_i ε_i  and  h_i² = ω_0 + α_0 y²_{i−1} + β_0 h²_{i−1},

ω_0 > 0, α_0 > 0, β_0 > 0 and E log(β_0 + α_0 ε_i²) < 0. We define the squared residuals
by the recursions

    ε̂_i² = y_i²/σ̂_i²  and  σ̂_i² = ω̂_N + α̂_N y²_{i−1} + β̂_N σ̂²_{i−1},         (5.8.3)

where (ω̂_N, α̂_N, β̂_N) is the QMLE for (ω_0, α_0, β_0). Assuming that (5.8.3) holds,
compute the limit distribution of

    T_N = N^{−1/2} max_{1≤k<N} | Σ_{i=1}^k ε̂_i² − (k/N) Σ_{i=1}^N ε̂_i² |.

5.9 Bibliographic Notes and Remarks

Davis et al. (1995) prove that Theorem 4.1.2 holds for AR(d) sequences. They also
investigate the cases when the variance as well as d, the lag of the autoregressive
process, change at an unknown time. They use a different but asymptotically
equivalent normalization of the maximally selected log likelihood, and their sim-
ulations show better finite sample properties. Davis et al. (2006) propose the
minimum description length principle to segment time series data into stationary
subsets. Theoretical justifications are in Davis et al. (2008) and Davis and Yau (2013).
Lavielle (1999), Lavielle and Moulines (2000) and Lavielle and Teyssiére (2006)
provide methods to find multiple changes in time series. Dalla et al. (2020) allow
changes in the mean as well as in the variance. Akashi et al. (2018) and Chakar et al.
(2017) develop robust methods to conduct change point analysis for autoregressive
models. Gombay (2008) derived a method based on score vectors.
Due to its importance and relative simplicity, changes in AR(1) processes have
been investigated by several authors. Chong (2001) assumes stationarity, while Pang
et al. (2014) and Pang et al. (2018) also allow for non–stationarity. Busetti and Harvey
(2001) test for the presence of a random walk in a sequence with several breaks. Zhu
and Ling (2011) use a likelihood test to find a change from an AR(p) to a threshold
AR(p) model. The transition to a threshold model is also investigated in Berkes
et al. (2011).
Krämer et al. (1988) apply the weighted sum of the recursive residuals in an
AR(1) dynamic regression model to detect a change. Vogelsang (1997) uses the
maximally selected sums of functionals of Wald statistics. For further results on
dynamic regression we refer to Bauer (2005) and Guay and Guerre (2006). Ling and
Li (1998), Ling (1999), Ling and McAleer (2003a), Ling and McAleer (2003b) and Li
et al. (2002) provide several results and surveys on ARMA processes with condition-
ally heteroscedastic errors. Kirch and Kamgaing (2012) develop testing procedures
for the detection of structural changes in nonlinear autoregressive processes.
Andél (1976) and Nicholls and Quinn (1982) introduced the random coefficient
model (RCA), and established conditions under which it admits a stationary solution
as well as its basic probabilistic and statistical properties. Schick (1996) and
Janečková and Prášková (2004) obtained several limit theorems for the estimators
of the parameters. Aue et al. (2006) contains a comprehensive study of RCA(1)
sequences in the stationary case. Berkes et al. (2011) extended these results to
the non-stationary case. Aue (2004) obtained Gaussian approximations for partial
sums of RCA(1) variables. The proofs of Theorems 5.3.1 and 5.3.2 rely on Horváth
and Trapani (2016). Thavaneswaran et al. (2009) generalize the stationary case
to errors that follow a stationary GARCH model. Erhardsson (2014) investigates
vector valued stationary RCA processes. Dong and Spielmann (2020) contains some
applications to ruin theory. Kang and Lee (2009) study parameter change tests in
random coefficient integer–valued autoregressive processes with an application to
polio data. For a survey on RCA processes we refer to Regis et al. (2021). Horváth
and Trapani (2016) used weighted CUSUM processes to test the stability of RCA
models while Horváth et al. (2024) applied the maximally selected likelihood to the
same problem.
Francq and Zakoian (2010) provides an excellent account of the theory of the
GARCH and other volatility processes. We make use of their methods in Sect. 5.4.
Berkes et al. (2003) studies the properties of the GARCH(p, q) sequence and the
quasi–likelihood estimators of the parameters. Their results are extended to ARMA–
GARCH processes in Francq and Zakoian (2004). We use the quasi–likelihood
method which assumes that the innovations are standard normal random variables.
The results of Sect. 5.4 can be extended when the more general quasi–likelihood
method is used to estimate the parameters. Optimality of the conditions needed to
estimate the parameters of GARCH processes is discussed in Berkes and Horváth
(2003b). Hall and Yao (2003) obtains the limit distributions of the quasi maximum
likelihood estimators when the innovations have heavy tails. Robust estimation
of the parameters in the ARCH model is discussed in Horváth and Liese (2003)
and Peng and Yao (2003). They show that the Lp estimators can be linearized
so that the change point methods discussed can be extended to such estimators.
Hillebrand (2005) investigates the effect of neglected change points in statistical
inference on volatility processes. Li et al. (2002) reviews some theoretical results
for time series models with GARCH errors, and it is directed towards practitioners.
They discuss various new volatility models, including double threshold ARCH and
GARCH, ARFIMA–GARCH, CHARMA and vector ARMA–GARCH. Hörmann
(2008) shows that the augmented GARCH sequences are decomposable Bernoulli
shifts under natural conditions. Berkes et al. (2011) provides a method to detect
if an AR model changes to a threshold AR model at an unknown time. The
score function based testing is adapted from Berkes et al. (2004), who study
the stationary GARCH(p, q) case. The comparison of the estimates in volatility
processes was initiated by Ling (2007), who uses the concept of Near
Epoch Dependence (NED), which is closely related to L^ν–decomposability. For
more general results on change point detection in time series we refer to Ling
(2016). Jensen and Rahbek (2004) proves the asymptotic normality of the quasi–
likelihood estimators in explosive GARCH(1,1) models. Pedersen and Rahbek
(2014) shows that the variance targeting method can be extended to multivariate
GARCH models.
In this chapter we pointed out the connection between CUSUM based testing and
other methods to find changes in the parameters of a time series. Bücher et al. (2019)
use a CUSUM based test for stationarity of the observations. It is an interesting
question to distinguish between non–stationarity and changes in the parameters,
including the mean and the variance; see Busetti and Harvey (2003), Busetti and
Taylor (2004), and Leybourne et al. (2006). Also, we only consider the alternative
when the parameters abruptly change. However, all the proposed tests have power
against other types of alternatives as well, for example gradual changes. Bin and
Yongmiao (2016) investigate smooth changes in GARCH models.
Our discussion of change points for multivariate time series follows closely Kirch
et al. (2015), where some more general results are proven. The results in Sect. 5.6
are motivated by Barassi et al. (2020). For some interesting applications to finance
we refer to Avalos (2014), Edwards (2020), Ju et al. (2020) and Mitchener and Pina
(2020).
Chapter 6
Sequential Monitoring

Up to this point we have been concerned with what is usually referred to as
"retrospective" or "off-line" change point detection and estimation, in which the
goal is to conduct change point analysis retrospectively on an observed series. In this
chapter, we shift our focus to sequential or "online" change point detection methods.
These aim to detect a change point in the data generating process, relative to a stable
training or historical sample, as quickly as possible as we continue to obtain data
sequentially. We begin by developing the framework of such sequential detection
procedures in the context of a simple mean change in Sect. 6.1, which we then
extend to linear and time series models in Sects. 6.2 and 6.3. A key consideration
throughout is the distribution of the stopping time, which is the amount of time
required in order to detect a change point in the data generating process. The
asymptotic distributions of stopping times are investigated in Sect. 6.4.

6.1 Sequential Detection Procedures and Stopping Times

In order to fix ideas, we begin by considering the problem of detecting a change in
the mean of sequentially obtained observations from that of a training (historical)
sample. We assume that the training sample is of the form

    X_i = μ_0 + ε_i,  1 ≤ i ≤ M,

where {ε_i, i ∈ Z} is a mean zero, stationary sequence. As such the training sample
is assumed to be drawn from a stationary process. Under the null hypothesis, the
mean does not change as we continue to observe data beyond the training sample.
This is formulated as

    H_0 : X_i = μ_0 + ε_i,  M + 1 ≤ i < ∞.                                      (6.1.1)
This hypothesis is often termed the "open-ended" null hypothesis, since we assume
that we will continue to sample observations into the indefinite future. Under the
alternative the mean changes after observing k* observations beyond the training
sample, so that

    H_A : X_i = μ_0 + ε_i,  M + 1 ≤ i ≤ M + k*,
          X_i = μ_A + ε_i,  M + k* + 1 ≤ i < ∞.                                 (6.1.2)

In some cases we are also interested in "closed-ended" procedures, in which we
test

    H_0 : X_i = μ_0 + ε_i,  M + 1 ≤ i ≤ c_M,                                    (6.1.3)

against

    H_A : X_i = μ_0 + ε_i,  M + 1 ≤ i ≤ M + k*,
          X_i = μ_A + ε_i,  M + k* + 1 ≤ i ≤ c_M.                               (6.1.4)

In these scenarios c_M is a user-specified length of time after which we will terminate
the sequential change point detection procedure if we have not yet detected a change
point. First we consider the open-ended detection problem. We consider a change
point detector based on the residuals

    ε̂_{i,M} = X_i − X̄_M,  where  X̄_M = (1/M) Σ_{l=1}^M X_l.

The basic procedure we study is to compare the size of the running average of the
residuals based on the mean of the training sample, which are proportional to the
partial sum

    S_M(k) = Σ_{l=M+1}^{M+k} ε̂_{l,M},                                           (6.1.5)

to a detector boundary function

    g(M, k) = v M^{1/2} ( 1 + k/M ) w( k/(k + M) ),                              (6.1.6)

where w(·) is a weight function, and v is a user-specified constant. A typical choice
for the weight function is w(u) = u^κ, 0 ≤ κ < 1/2. Chu et al. (1996) make use
of a boundary function of the form g(u) = [u(a² + log(1/u))]^{1/2}, which we could
alternately consider. We terminate the process and declare that a change in the mean
has occurred at the stopping time

    τ(M) = min{ k : |S_M(k)| ≥ g(M, k) },
    τ(M) = ∞,  if |S_M(k)| < g(M, k) for all k ≥ 1.                              (6.1.7)

Our aim is to calibrate the boundary function so that the probability that the process
|S_M(k)| crosses g(M, k) is controlled under H_0, and is as large as possible under
H_A. This is done by choosing the constant v = v(α) so that

    lim_{M→∞} P{ τ(M) < ∞ } = α  under H_0,

and

    lim_{M→∞} P{ τ(M) < ∞ } = 1  under H_A.                                      (6.1.8)
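Operationally, the open-ended procedure is a simple streaming loop. A minimal
sketch (our own names; stream is any iterable yielding X_{M+1}, X_{M+2}, ...,
and v is calibrated as discussed after Theorem 6.1.1 below):

    import numpy as np

    def monitor_mean(train, stream, v, kappa=0.25, max_steps=None):
        M = len(train)
        xbar = np.mean(train)                # training-sample mean X_bar_M
        S, k = 0.0, 0
        for x in stream:
            k += 1
            S += x - xbar                    # S_M(k), partial sum of residuals
            g = v * np.sqrt(M) * (1 + k / M) * (k / (k + M)) ** kappa
            if abs(S) >= g:
                return k                     # detection: stopping time tau(M)
            if max_steps is not None and k >= max_steps:
                break                        # closed-ended truncation at c_M
        return None                          # no detection

Here kappa in [0, 1/2) corresponds to the weight function w(u) = u^kappa.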

In order to compute the above asymptotic probabilities, we assume the following
conditions on the error terms, akin to Assumption 1.1.1 in Chap. 1:

Assumption 6.1.1 For each M there are independent Wiener processes
{W_{M,1}(x), 0 ≤ x ≤ M} and {W_{M,2}(x), 0 ≤ x < ∞} such that

    | Σ_{i=1}^M ε_i − σ W_{M,1}(M) | = O_P( M^ζ )

and

    sup_{1≤x<∞} (1/x^ζ) | Σ_{l=M+1}^{M+⌊x⌋} ε_l − σ W_{M,2}(x) | = O_P(1)

with some ζ < 1/2 and σ² > 0.

This allows for the approximation of S_M by a Gaussian process. In order to
describe weight functions for which lim_{M→∞} P{τ(M) < ∞} is well defined, we
use a modification of Assumption 1.2.1:

Assumption 6.1.2 (i) inf_{δ≤t≤1} w(t) > 0 for all 0 < δ < 1, and (ii) w(t) is non-
decreasing in a neighbourhood of 0.

Similarly to I(w, c) of (1.2.4), we define

    I*(w, c) = ∫_0^{1/2} (1/t) exp( −c w²(t)/t ) dt.

Our main result gives the asymptotic probability that τ(M) < ∞ under the null
hypothesis.

Theorem 6.1.1 If H_0 of (6.1.1), Assumptions 6.1.1 and 6.1.2 are satisfied and
I*(w, c) < ∞ with some c > 0, then

    lim_{M→∞} P{ τ(M) < ∞ } = P{ σ sup_{0≤t≤1} |W(t)|/w(t) ≥ v },

where {W(t), 0 ≤ t ≤ 1} is a Wiener process.
where .{W (t), 0 ≤ t ≤ 1} is a Wiener process.


Proof We show
| M+x |
M −1/2 | Σ ⎛ ⎞|
| x |
. sup | Êi,M − σ WM,2 (x) − WM,1 (M) |
1≤x<∞ (1 + x/M)w(x/(M + x)) | i=M+1
M |

= oP (1), (6.1.9)

as .M → ∞. By Assumption 6.1.1, we have for any .0 < a < 1


| M+x |
M −1/2 | Σ ⎛ ⎞|
| x |
. sup | Êi,M − σ WM,2 (x) − WM,1 (M) |
1≤x≤M (1 + x/M)w(x/(M + x)) | M |
i=M+1
| M+x |
M −1/2 | Σ |
| |
≤ sup | Ei − σ WM,2 (x)|
1≤x≤aM w(x/(M + x)) | |
i=M+1
|M |
M −1/2 |Σ |
| |
+ sup | Ei − σ WM,1 (M)|
aM≤x≤M w(x/(M + x)) | |
i=1

M −1/2 M ζ −1/2
= OP (1) sup x ζ + OP (1) sup .
1≤x≤aM w(x/(M + x)) aM≤x≤M w(x/(M + x))

We have that,

M −1/2 x ζ (x/(M + x))1/2 M −1/2 x ζ


. sup ≤ sup sup .
1≤x≤aM w(x/(M + x)) 1≤x≤aM w(x/(M + x)) 1≤x≤aM (x/(M + x))1/2

Since .I ∗ (w, c) < ∞, we have that

t 1/2
. lim = 0,
t→0 w(t)

and therefore
6.1 Sequential Detection Procedures and Stopping Times 329

(x/(M + x))1/2 t 1/2


. sup ≤ sup → 0,
1≤x≤aM w(x/(M + x)) 0<t<a/(1+a) w(t)

as .a → 0 for each M. We see that

M −1/2 x ζ
. sup = O(1)
1≤x≤aM (x/(M + x))1/2

for all .a > 0. Also,

M −1/2 x ζ
. sup = o(1)
aM≤x≤M w(x/(M + x))

for any .a > 0. Similar arguments give


| M+x |
M −1/2 | Σ ⎛ ⎞|
| x |
. sup | Êi,M − σ WM,2 (x) − WM,1 (M) |
M≤x<∞ (1 + x/M)w(x/(M + x)) | M |
i=M+1
| M+x |
M −1/2 | Σ |
1 | |
≤ sup sup | Ei − σ WM,1 (x)|
1/2≤u≤1 w(u) M≤x<∞ 1 + x/M | |
i=M+1
|M |
M −1/2 |Σ |
1 | |
+ sup sup | Ei − σ WM,2 (M)|
1/2≤u≤1 w(u) M≤x<∞ 1 + x/M | |
i=1

M −1/2 ζ M ζ −1/2
= OP (1) sup x + OP (1) sup
M≤x<∞ 1 + x/M M≤x<∞ 1 + x/M

= oP (1).

Using the scale transformation of the Wiener process,


⎛ ⎛ ⎞ ⎫
M −1/2 x
. WM,2 (x) − WM,1 (M) , x ≥ 1
(1 + x/M)w(x/(M + x)) M
⎛ ⎛ ⎞ ⎫
D 1 x
= W2 (x/M) − W1 (1) , x ≥ 1 ,
(1 + x/M)w(x/(M + x)) M

where .{W1 (t), t ≥ 0} and .{W2 (t), t ≥ 0} are independent Wiener processes. By a
change of variables we get that
┌⎛ ⎛ ⎞┐−1 | ⎛ ⎞ |
x⎞ x | x x |
. sup 1+ w |W2 − W1 (1)|
1≤x<∞ M M + x M M
330 6 Sequential Monitoring

1
= sup |W2 (t) − tW1 (1)|
1/M≤t≤1 (1 + t)w(t/(1 + t))
1
→ sup |W2 (t) − tW1 (1)|
0≤t≤1 (1 + t)w(t/(1 + t))

almost surely as .M → ∞. By computing covariance functions one can easily verify


that
⎛ ⎫ ⎛ ⎛ ⎞ ⎫
1 D t
. (W2 (t) − tW1 (1)) , t ≥ 0 = W ,t ≥ 0 ,
1+t 1+t

where .{W (u), 0 ≤ u ≤ 1} is a standard Wiener process. Noting that

|W (t/(1 + t))| D |W (t)|


. sup = sup ,
0<t<∞ w(t/(1 + t)) 0<t<1 w(t)

the result follows. ⨆



This approach may be easily modified for closed-ended procedures. With .cM
again denoting the maximum length of the sequential detection procedure in (6.1.3)
and (6.1.4), we define the stopping time

min {k : 1 ≤ k < cM , |SM (k)| ≥ g(M, k)}
τc (M) =
.
cM , if |SM (k)| < g(M, k) for all 1 ≤ k < cM .

Under the assumptions of Theorem 6.1.1, and assuming that

cM
. lim = c.
M→∞ M

⎧ ⎫
|W (t)|
. lim P {τc (M) < cM } = sup < vσ ,
M→∞ 0≤t≤c/(1+c) w(t)

where .{W (t), 0 ≤ t ≤ 1} is a Wiener process.


Before we turn to generalizing these methods beyond monitoring for changes in
the mean of scalar observations, we show briefly that these detection procedures
are consistent under .HA . We take up a more careful analysis of the distributions of
.τ (M) and .τc (M) under .HA in Sect. 6.4. Letting .Δ(M) = μ0 − μA , we see that so

long as

(M − k ∗ )
. |Δ(M)| → ∞,
M 1/2
6.2 Linear Models 331

as .M → ∞, then (6.1.8) holds. The proof of Theorem 6.1.1 yields


| M+x |
M −1/2 | Σ |
| |
. sup | Êi,M |
1≤x<∞ (1 + x/M)w(x/(M + x)) | |
i=M+1

(M − k ∗ )|ΔM | P
≥ OP (1) + → ∞.
2M 1/2 w(1/2)

In order to apply Theorem 6.1.1, require a consistent estimate of the (long-run)


variance .σ in Assumption 6.1.1. If the estimator .σ̂M satisfies

|σ̂M − σ | = oP (1)
.

then Theorem 6.1.1 remains true when .σ is replaced with .σ̂M . Such an estimator
may be computed from the training sample as detailed in Sect. 3.1.
It is also straightforward to extend Theorem 6.1.1 vector-valued observations.
We refer to Exercise 6.6.3 to sequentially detect a change in the means of random
vectors.

6.2 Linear Models

We now consider sequentially monitoring for changes in the parameters of a linear


model from those in a training sample of length M,

yi = xT
. i β 0 + Ei , 1 ≤ i ≤ M, β 0 ∈ Rd , xi ∈ Rd .

We consider here only open-ended monitoring, although these methods may be


modified as in Sect. 6.1 to develop a closed-ended procedure as well. Under the
open-ended null hypothesis, the linear model parameters remain homogenous as we
continue to obtain observations beyond the training sample, i.e.

H0 : yi = xT
. i β 0 + Ei , M + 1 ≤ i < ∞, (6.2.1)

while under the alternative

HA : there is an integer k ∗ ≥ 1 such that yi = xT


.

i β 0 + Ei , M + 1 ≤ i ≤ M + k and

yi = xT ∗
i β A + Ei , M + k + 1 ≤ i < ∞ with β 0 /= β A . (6.2.2)

Let .β̂ M be the least square estimator for .β 0 using the historical sample,
⎛ ⎞−1
β̂ M = XT
. M XM XM YM ,
332 6 Sequential Monitoring

where

⎛ ⎛ ⎞
xT
1 y1
⎜ xT ⎟ ⎜ y2 ⎟
⎜ 2⎟ ⎜ ⎟
. XM =⎜ . ⎟ and YM = ⎜ . ⎟.
⎝ .. ⎠ ⎝ .. ⎠
xT
M yM

As in Chap. 4, we will assume that the covariate and error series .{zi = (xT T
i , Ei ) , i ∈
Z} are .L -decomposable for some .ν > 4. In this case we have by the ergodic
ν

theorem that
1 T P
. X XM → A.
M M
The model residuals of the incoming data based on the original parameter
estimate from the training sample are

Êi,M = yi − xT
. i β̂ M , M + 1 ≤ i < ∞. (6.2.3)

To define the monitoring strategy based on these residuals, we define a detector and
a boundary function. Our detector is
⎛ ⎞T ⎛ ⎞
Σ
M+k Σ
M+k
−1
ZM (k) =
. xi Êi,M D xi Êi,M , 1 ≤ k < ∞. (6.2.4)
i=M+1 i=M+1

The matrix .D is the long-run covariance matrix of (4.1.7), i.e.



Σ
D=
. E (x0 E0 ) (xl El )T .
l=−∞

Assumption 6.2.1 The matrices .A and .D are non-singular.


Using the same boundary function .g(M, k) as in (6.1.6), we define the stopping
time
⎧ { }
1/2
min k : ZM (k) ≥ g(M, k)
.τL (M) = (6.2.5)
1/2
∞, if ZM (k) < g(M, k) for all k ≥ 1.

Theorem 6.2.1 Assume that .H0 of (6.2.1), Assumptions 6.1.2–6.2.1 are satisfied,
I ∗ (w, c) < ∞ with some .c > 0, and .{zi = (xT
.
T
i , Ei ) , i ∈ Z} are .L -decomposable
ν

for some .ν > 4. Then


6.2 Linear Models 333

⎧ ⎛ d ⎞1/2 ⎫
⎨ 1 Σ ⎬
. lim P {τL (M) < ∞} = P sup Wl2 (u) ≤v ,
M→∞ ⎩0<u≤1 w(u) ⎭
l=1

where .{Wi (u), 0 ≤ u ≤ 1}, .i ∈ {1, . . . , d} are independent Wiener processes.


Proof We note

.Êi,M = Ei − xT
i (β̂ M − β 0 ),

where .β 0 is the common value of the regression parameter under the null hypothesis.
Hence

Σ
M+k
k Σ
M Σ
M+k
k Σ T
M
. xi Êi,M − xi Êi,M = xi Ei − xi Ei
M M
i=M+1 i=1 i=M+1 i=1
⎛ ⎞
Σ
M+k
k Σ T
M
− xi xT
i − xi xi (β̂ M − β 0 ).
M
i=M+1 i=1

Since .{zi = (xT T T


i , Ei ) , i ∈ Z} are .L -decomposable for some .ν > 4, .{xi xi , i ∈ Z}
ν
ν/2 T
is .L -decomposable. Hence each coordinate of the partial sums of .xi xi may be
approximated by a Gaussian process. We obtain as in the proof of Theorem 6.1.1
that
|| M+x ||
M −1/2 || Σ Σ
M ||
|| x ||
. sup || xi xT − xi xT || = OP (1).
1≤x<∞ (1 + x/M)w(x/(x + M)) || i
M i ||
i=M+1 i=1

Since
|| || ⎛ ⎞
|| ||
. ||β̂ M − β 0 || = OP M −1/2 ,

(see Sect. 4.1.1) we conclude


||⎛ M+x ⎞ ||
M −1/2 || Σ x Σ T ⎛
M ⎞||
|| T ||
. sup || xi xi − xi xi β̂ M − β 0 ||
1≤x<∞ (1 + x/M)w(x/(x + M)) || M
i=M+1
||
i=1

= oP (1).

We also have that .{xi Ei , i ∈ Z} is .Lν/2 -decomposable, and therefore for each
M there are independent Gaussian processes .{WM,1 (x), 1 ≤ x ≤ M} and
.{WM,2 (x), 1 ≤ x < ∞} such that

|| x ||
||Σ ||
1 || ||
. max || xi Ei − WM,1 (x)|| = OP (1)
1≤x≤M x ζ || ||
i=1
334 6 Sequential Monitoring

and
|| M+x ||
|| Σ ||
1 || ||
. sup || xi Ei − WM,2 (x)|| = OP (1)
1≤x<∞ x
ζ || ||
i=M+1

with some .ζ < 1/2, .EWM,1 (x) = EWM,2 (x) = 0 and .EWM,1 (x)WT M,1 (y) =
EWM,2 (x)WT M,2 (y) = D min(x, y). Arguing as in the proof of Theorem 6.1.1 one
can verify

M −1/2
. sup
1≤x<∞ (1 + x/M)w(x/(x + M))
|| M+x ||
|| Σ x Σ
M ⎛ ⎞||
|| x ||
× || xi Ei − xi Ei − WM,2 (x) − WM,1 (M) ||
|| M M ||
i=M+1 i=1

= oP (1).

Observing that
┌⎛ ⎞T
M −1/2 x
. sup WM,2 (x) − WM,1 (M) D−1
1≤x<∞ (1 + x/M)w(x/(x + M)) M
⎛ x ⎞ ┐1/2
× WM,2 (x) − WM,1 (M)
M
⎛ d ⎞1/2
D 1 Σ
→ sup Wi2 (u) ,
0<u≤1 w(u) i=1

where .{Wi (u), 0 ≤ u ≤ 1}, .i ∈ {1, . . . , d} are independent Wiener processes, the
proof is complete. ⨆

Once again in practice we require an estimator of .D. If the random vectors
xi Ei are uncorrelated, one can use the sample covariance matrix of the weighted
.

residuals .xi Êi,M , computed from the training sample. If the weighted innovations
are correlated, a kernel-bandwidth estimator may be used as in Sect. 3.1.2.

6.3 Time Series Models

These results can also be used to develop monitoring procedures for the parameters
in many of the time series models studied in Chap. 5. To begin, we consider dynamic
linear models as studied in Sect. 5.2. In this case we assume that the historical
sample satisfies
6.3 Time Series Models 335

.yi = xT
i β 0 + Ei , 1 ≤ i ≤ M, (6.3.1)

where .β 0 = (β0,1 , . . . , β0,r+d )T ∈ Rr+d , and


( )T
xi = xi,1 , xi,2 , . . . , xi,r , yi−1 , . . . , yi−d = (xT
.
T T
i,r , yi,d ) .

We remark that this setup also covers autoregressive models. The open-ended null
hypothesis is stated as

H0 : yi+M = xT
. i+M β 0 + Ei+M , 1 ≤ i < ∞, (6.3.2)

and we wish to detect as quickly as possible a change in the linear model parameters
occurring after .k ∗ additional observations have been obtained,

xT
i+M β 0 + Ei+M , 1 ≤ i < k∗
HA : yi+M =
.
xT
i+M β A + Ei+M , k ∗ ≤ i < ∞,

where .β 0 /= β A . We assume as above that the exogenous covariates and model


innovations .{zi = (xT T
i , Ei ) , i ∈ Z} are .L -decomposable for some .ν > 4. Under
ν

the null hypothesis and the additional Assumption 5.2.1 that the polynomial .1 −
β0,r+1 t − · · · − β0,r+d t d has all of its zeros outside of the unit circle in .C, there
exists an .Lν -decomposable sequence .{yi , i ∈ Z} satisfying (6.3.1). We define the
residuals as in (6.2.3) and we use the same boundary function and detector as in
(6.1.6) and (6.2.4).
Theorem 6.3.1 If .H0 of (6.3.2), Assumptions 5.2.1–5.2.4, 6.1.2 are satisfied and
I ∗ (w, c) < ∞ with some .c > 0, then
.


⎛r+d ⎞1/2 ⎫
1 ⎨
Σ ⎬
. lim P {τL (M) < ∞} = P sup Wl2 (u) ≤v ,
M→∞ ⎩0<u≤1 w(u) ⎭
l=1

where .{Wi (u), 0 ≤ u ≤ 1}, .i ∈ {1, . . . , d} are independent Wiener processes.


In Sects. 5.3 and 5.4 we investigated the stability of the parameters of several
popular non-linear time series models, and these methods can also be adapted to the
sequential monitoring setup. For an RCA.(1) model, we assume we have access to a
training sample following

yi = (β0 + Ei,1 )yi−1 + Ei,2 , 1 ≤ i ≤ M.


.

Under the null hypothesis the parameters remain stable as we continue to observe
data, so that

yi = (β0 + Ei,1 )yi−1 + Ei,2 , M + 1 ≤ i < ∞.


. (6.3.3)
336 6 Sequential Monitoring

Under the alternative, there is a change after .k ∗ observations following the training
sample:

(β0 + Ei,1 )yi−1 + Ei,2 , 1 ≤ i ≤ k ∗ − 1,
HA : yi+M =
.
(βA + Ei,1 )yi−1 + Ei,2 , k ∗ ≤ i < ∞,

and .β0 /= βA . A natural approach is based on comparing the weighted least squares
estimators of .β0 introduced in Sect. 5.3 based on the training sample to sequentially
updated estimators based on the incoming data. Let .β̂i,j be the least square estimator
as in (5.3.4), computed from .yi+1 , . . . , yj . We define the detector process
⎛ ⎞
ẐM (k) = k β̂M,M+k − β̂1,M ,
. 1 ≤ k < ∞, (6.3.4)

and we again use the boundary function of (6.1.6). We then define the stopping time

min {k : |ZM (k)| ≥ g(M, k)}
τRCA (M) =
.
∞, if |ZM (k)| < g(M, k) for all k ≥ 1.

Theorem 6.3.2 If .H0 of (6.3.3), Assumptions 5.3.1, 5.3.2, 6.1.2 are satisfied,
E log |β0 + E0,1 | /= 0 and .I ∗ (w, c) < ∞ with some .c > 0, then
.

⎧ ⎫
1
. lim P {τRCA (M) < ∞} = P sup |W (u)| ≤ vη ,
M→∞ 0<u≤1 w(u)

where .η is defined in (5.3.6) and .{W (u), 0 ≤ u ≤ 1} is a Wiener process.


Proof We can follow the proof of Theorems 5.3.1 and 5.3.2 to show that .ẐM (k)
can be approximated with a CUSUM process. Namely, according to the proofs of
Theorems 5.3.1 and 5.3.2 we can define a .Lν -decomposable sequence .{χi , i ∈ Z},
so that
| |
| Σ
M | ( )
| |
. |M β̂1,M − χi | = O P M ζ ,
| |
i=1

and
| |
| Σ
M+k |
1 | |
. sup |k β̂M,M+k − χi | = OP (1)
1≤k<∞ k ζ | |
i=M+1
6.3 Time Series Models 337

with some .ζ < 1/2. We note that the definition of .χi will change depending on
if .E log |β0 + E0,1 | > 0 or .E log |β0 + E0,1 | < 0. The result then follows as in
Theorem 6.1.1. ⨆

The parameter .η may be estimated from the training sample as detailed in
Sect. 5.3.
As we have seen to this point, sequential monitoring methods based on the
comparison of estimates for the parameters of interest from the historical and the
incoming observations often can be fashioned into a consistent approach. These
have so far made use of least-squares estimators that have simple forms. Many
models of interest though do not admit such easy-to-work-with estimators, as is the
case for GARCH.(p, q) models. We discuss now sequential monitoring procedures
for the parameters in a GARCH.(1, 1) model. In this case the observations are
assumed to be generated from the model

yi = σi Ei ,
.

where the volatility process in the training period is assumed to evolve as

σi2 = ω0 + α0 yi−1
.
2
+ β0 σi−1
2
, 1 ≤ i ≤ M.

The open-ended null hypothesis is stated as

H0 : σi2 = ω0 + α0 yi−1
.
2
+ β0 σi−1
2
, M +1≤i <∞ (6.3.5)

while under the alternative the parameters change after observing .k ∗ additional
observations

2 + β σ 2 , M + 1 ≤ i < M + k∗,
ω0 + α0 yi−1 0 i−1
.HA : σi =
2
2 + β σ 2 , M + k ∗ ≤ i < ∞,
ωA + αA yi−1 A i−1

(ω0 , α0 , β0 ) /= (ωA , αA , βA ). Since stationarity is not assumed under the null


.

hypothesis, we test if .(α0 , β0 ) remained the same during the observation period,
since .ω0 cannot be identified in the explosive (non-stationary) case. Let .θ̂ M ∈ R3
be the quasi-likelihood estimator based on the training sample, i.e.
⎧ ⎫
ΣM
.θ̂ M = argmax θ ∈ [δ, 1/δ] ,
3
li (θ )
i=1

where .δ > 0 is a user-specified tuning parameter, and .li (θ ) is defined in (5.4.28).


Let
⎛ ⎞T
∂li (θ ) ∂li (θ)
ui (θ) =
. , .
∂α ∂β
338 6 Sequential Monitoring

The detector process is defined as


⎛ ⎞ ⎛ ⎞T
Σ
M+k Σ
M+k

.ZM (k) = ui (θ̂ M ) F̂−1
M ui (θ̂ M ) ,
l=M+1 l=M+1

where

1 Σ
M ⎛ ⎞T
F̂M =
. ui (θ̂ M ) ui (θ̂ M ) . (6.3.6)
M
i=1

In other words, the detector process is constructed by evaluating the estimating


equation .Eui (θ 0 ) = 0 estimated from the incoming data using the parameter
estimator from the training sample. If a change occurs we expect .ui (θ̂ M ) to differ
significantly from zero for .i ≥ M + k ∗ . A stopping time may then be defined as
⎧ ( ∗ )1/2
min{k : ZM (k) ≥ g(M, k)}
τCH (M) =
. ( ∗ )1/2
∞, if ZM (k) < g(M, k) for all k,

where .g(M, k) is defined in (6.1.6).


We state the asymptotics for .τCH (M) under the null hypothesis:
Theorem 6.3.3 If .H0 of (6.3.5), Assumptions 5.4.1, 5.4.4, 6.1.2 are satisfied,
E log(β0 + α0 E02 ) /= 0 and .I ∗ (w, c) < ∞ with some .c > 0, then
.

⎧ ⎫
1 ⎛ 2 ⎞1/2
. lim {τCH (M) < ∞} = P sup W1 (u) + W2 (u)
2
≤v ,
M→∞ 0<u≤1 w(u)

where .{W1 (u), 0 ≤ u ≤ 1} and .{W2 (u), 0 ≤ u ≤ 1} are independent Wiener


processes.
Proof We showed in Sect. 5.4 that

Σ
M
. ui (θ̂ M ) = 0.
i=1

We also established in the proofs of Theorems 5.4.4 and 5.4.6 that there is an .Lν -
decomposable sequence .{χ i , i ∈ Z} such that
|| M ||
||Σ Σ
M || ( )
|| ||
. || ui (θ 0 ) − χ i || = OP M ζ ,
|| ||
i=1 i=1

and
6.3 Time Series Models 339

|| M+k ||
|| Σ Σ
M+k ||
1 || ||
. sup || ui (θ 0 ) − χ i || = OP (1)
1≤k<∞ k
ζ || ||
i=M+1 i=M+1

with some .ζ < 1/2. We note that the definition of .χ i depends on .E log(β0 +α0 E02 ) <
0 or .E log(β0 + α0 E02 ) > 0. We write

Σ
M+k Σ
M+k ⎛ ⎞
. ui (θ̂ M ) = ui (θ 0 ) + RM (k) θ̂ M − θ 0 .
i=M+1 i=M+1

According to the proofs of Theorems 5.4.4 and 5.4.6,


|| || ⎛ ⎞
|| ||
. ||θ̂ M − θ 0 || = OP M −1/2

and
1
. sup ||RM (k)|| = OP (1).
1≤k<∞ k

Thus we get

Σ
M+k Σ
M+k
k Σ
M
. ui (θ̂ M ) = ui (θ̂ M ) − ui (θ̂ M )
M
i=M+1 i=M+1 i=1

Σ
M+k
k Σ
M ⎛ ⎞
= χi − χ i + RM (k) θ̂ M − θ 0
M
i=M+1 i=1

k ⎛ ⎞
+ R̄M θ̂ M − θ 0
M
and
⎛ ⎞
1
||R̄M || = OP
. .
M

Due to the .Lν -decomposability of the vectors .{χ i , i ∈ Z}, for each M we can define
independent Gaussian processes .{WM,1 (x), 0 ≤ x ≤ M} and .{WM,2 (x), 0 ≤ x <
∞}, .EWM,1 (x) = EWM,2 (x) = 0, EWM,1 (x)WT T
M,1 (y) = WM,2 (x)WM,2 (y) =
F min(x, y),
|| x ||
||Σ || ( )
|| ||
. sup || χ i − WM,1 (x)|| = OP M ζ
1≤x≤M || ||
i=1
340 6 Sequential Monitoring

and
|| M+x ||
|| Σ ||
1 || ||
. sup || χ i − WM,2 (x)|| = OP (1)
1≤x<∞ k
ζ || ||
i=M+1

with some .ζ < 1/2, where

1 Σ
N
P
. ui (θ 0 ) (ui (θ 0 ))T → F.
N
i=1

Using the approximations one can repeat the proof of Theorem 6.1.1 to obtain the
limit result for .τCH (M). ⨆

Theorem 6.3.3 can be extended to the more general GARCH.(p, q) case. For
details we refer to Berkes et al. (2004).

6.4 Distribution of the Stopping Time

It is of interest in each of the above sequential monitoring procedures to know how


long we expect it to take after a change point occurs for the detector process to
exceed the boundary function. This may be better understood by computing the
approximate distribution of the stopping time. The approach used to study the
asymptotic distribution of stopping times is akin to analyzing the distribution of the
location of the maximum of a partial sum process constructed with random variables
that have non-zero means. A change in the parameters introduces a linear drift term
in the detector process that determines the limit distribution of the stopping time
under the alternative.
For the moment we consider a general stationary sequence .{Ei , i ∈ Z} whose
partial sums can be well approximated with a Wiener process:
Assumption 6.4.1 For each N we can define Wiener processes .{WN (x), 0 ≤ x ≤
N } such that
| x |
1 ||Σ |
|
. max | E i − σ WN (x) | = OP (1)
1≤x≤N x ζ | |
i=1

with some .σ > 0 and .ζ < 1/2.


Our next result implies that the maximum of the standardized partial sum process
of random variables with positive means takes its largest value at N, where N is the
number of the random variables in the sum.
Lemma 6.4.1 If Assumption 6.4.1 is satisfied, .0 ≤ α < 1/2, and
6.4 Distribution of the Stopping Time 341

ΔN > 0
. and N 1/2 ΔN → ∞ (6.4.1)

then
⎧ | k | ⎫
|Σ |
−1/2+α 1 1 | | D
N
. max | (Ei + ΔN )| − N 1−α
ΔN → N (0, 1),
σ 1≤k≤N k α | |
i=1

where .N (0, 1) is a standard normal random variable.


Proof It follows from Assumption 6.4.1 that
| k |
|Σ | kζ
−1/2+α 1 | |
.N max | Ei − σ WN (k)| = OP (1)N −1/2+α max α .
1≤k≤N k α | | 1≤k≤N k
i=1

We note

kζ 1, if ζ ≤ α,
. max =
1≤k≤N kα N ζ −α , if ζ > α.

Thus we get
| k |
|Σ |
−1/2+α 1 | |
N
. max α | Ei − σ WN (k)| = oP (1).
1≤k≤N k | |
i=1

Let .0 < δ < 1. We write



1 1
. max |σ WN (k) + kΔN | = max max |σ WN (k) + kΔN | ,
1≤k≤N k α 1≤k≤N (1−δ)kα

1
max |σ WN (k) + kΔN | .
N (1−δ)≤k≤N kα

Now
1 1
. max α
|σ WN (k) + kΔN | ≤ max |σ WN (k)| +(N (1 − δ))1−α ΔN
1≤k≤N (1−δ) k 1≤k≤N (1−δ) k α

and by the scale transformation of the Wiener process

1 D 1
N −1/2+α
. max |WN (k)| → sup |W (u)|,
1≤k≤N (1−δ) kα 0<t<1−δ u α
342 6 Sequential Monitoring

{W (u), 0 ≤ u ≤ 1} is a Wiener process. According to our assumptions


.

N 1/2−α N 1−α ΔN = 1/(N 1/2 ΔN ) → 0. As such we get for all .δ > 0 that
.

⎛ ⎞
1 P
.N −1/2+α max |σ WN (k) + kΔN | − N 1−α ΔN → −∞.
1≤k≤N (1−δ) k α

It follows from elementary calculation that


| |
| |W (k)| |W (N )| ||
−1/2+α | max
N
.
|N (1−δ)≤k≤N k α − N α |
|W (k) − W (N)|
≤ N −1/2+α max
N (1−δ)≤k≤N kα
| |
|1 |
+ N −1/2+α |W (N)| max | − 1 |
N (1−δ)≤k≤N k| α N |
α

≤ (1 − δ)−α N −1/2 max |W (k) − W (N )|


N (1−δ)k≤N

+ N −1/2 |W (N)|αN α (N (1 − δ))−α−1 Nδ.

Using again the scale transformation of the Wiener process we obtain

D
N −1/2
. max |W (k) − W (N)| → sup |W (u) − W (1)|
N (1−δ)k≤N 1−δ≤u≤1

and by the continuity of W we have

. lim sup |W (u) − W (1)| → 0 a.s.


δ→0 1−δ≤u≤1

Thus we conclude that for all .x > 0


⎛ |
| |σ WN (k) + kΔN |
. lim lim sup P N −1/2+α || max
δ→0 N →∞ N (1−δ)≤k≤N kα
| ⎫
|σ WN (N ) + NΔN | ||
− | > x = 0.

It follows from (6.4.1),

. lim P {|σ WN (N ) + NΔN | = σ WN (N ) + NΔN } = 1,


N →∞

completing the proof. ⨆



In order to discuss the limiting distribution of the stopping time introduced above,
we begin by considering the open-ended mean change detection procedure to test
6.4 Distribution of the Stopping Time 343

(6.1.1) versus (6.1.2), and the stopping time .τ (M) in (6.1.7). We set the weight
function in the boundary function definition to

w(u) = uκ ,
. 0 ≤ κ < 1/2. (6.4.2)

We consider the case when the change is early, i.e. it occurs relatively close to the
end of training sample and the size of the change .Δ(M) is not too small:

Assumption 6.4.2 (i) .Δ(M) → 0, (ii) .M 1/2 |Δ(M)| → ∞, and (iii)


⎛ ⎞2

( φ) 1 − 2κ
.k = O M with some 0≤φ< .
2(1 − κ)

Theorem 6.4.1 If .HA of (6.1.2), Assumptions 6.1.1, 6.1.2, 6.4.2 are satisfied, and
the weight function .w(·) is of the form (6.4.2), then

τ (M) − a(M) D
. → N (0, 1),
b(M)

where .N (0, 1) is a standard normal random variable, and the normalizing


sequences are of the form
⎛ ⎞1/(1−κ)
vM 1/2−κ
. a(M) =
|Δ(M)|

and
σ
.b(M) = a 1/2 (M).
(1 − κ)|Δ(M)|

Proof We can assume that .Δ = Δ(M) > 0. Let


⎛ ⎛ ⎞1/(1−κ) ⎞1/(1−κ)
2
vM 1/2−κ v 1/2−κ M (1/2−κ)
.M = M(M, x) = ⎝ − xσ ⎠ .
Δ Δ3/2−2κ

We start by noting a few properties of the function .M. As .M → ∞,

M
. → 0, (6.4.3)
M

M1/2 Δ → ∞,
. (6.4.4)

k∗
. → 0, (6.4.5)
M
344 6 Sequential Monitoring

k∗
. → 0, (6.4.6)
M
and
⎛ ⎞
1M MΔ
. v − 1/2 → x. (6.4.7)
σ M M (M/M)κ

We note
⎛ ⎞1/(1−κ) ⎛ ⎞−1
M (1/2−κ)
2
M 1/2−κ ⎛ ⎞(κ−1/2)(1−κ)
. = M 1/2 Δ → 0
Δ3/2−κ Δ

by Assumption 6.4.2(ii). Hence for any .x ≥ 0,


⎛ ⎞−1/(1−κ)
vM 1/2−κ
.M → 1, (6.4.8)
Δ

and therefore we get, again for any .x ≥ 0,


⎛⎛ ⎞1/(1−κ) ⎞ ⎛⎛
M M 1/2−κ κ−1 ⎞−1/(1−κ) ⎞
. =O M =O M Δ
1/2
.
M Δ

So Assumption 6.4.2(ii) implies (6.4.3). Using (6.4.8) we get


⎛ ⎛ ⎞−1/2(1−κ) ⎞ ⎛⎛
1 1 M 1/2−κ ⎞−(1−2κ)/2(1−κ) ⎞
. =O =O M 1/2
Δ ,
M1/2 Δ Δ Δ

which yields (6.4.4) via Assumption 6.4.2(ii). Applying Assumptions 6.4.2(i), (iii)
and (6.4.8) we obtain

k∗ ⎛ ⎞
. = O Δ1/(1−κ) M (1/2−κ)/(1−κ) M −φ = o(1),
M
proving (6.4.5). It is clear that (6.4.3) and (6.4.5) imply (6.4.6). Since by the
definition of .M

. v − ΔM1−κ M κ−1/2
⎛ ⎛ ⎞1/(1−κ) ⎞
vM 1/2−κ
− σ x v 1/2−κ M (1/2−κ) Δ−3/2+2κ
2
= v − ΔM κ−1/2
Δ
⎛ ⎞1/(1−κ)
= σ x Δκ−1/2 M (κ−1/2)/2 ,

(6.4.7) follows from (6.4.8).


6.4 Distribution of the Stopping Time 345

It follows from the definition of .SM (k) of (6.1.5) that under .HA

Σ
M+k
k Σ
M
SM (k) =
. Ei − Ei + Δ(k − k ∗ + 1)1{k ≥ k ∗ }.
M
i=M+1 i=1

Using Assumption 6.1.1 and (6.4.6) we conclude


|M |
|Σ |
k | |
| Ei | ⎛⎛ ⎞1−κ ⎞
M | | k∗
i=1
. max = OP = oP (1), (6.4.9)
1≤k<k ∗ M 1/2 (1 + k/M)(k/(M + k))κ M

and
| M+k |
| Σ |
| |
| Ei − σ WM,2 (k)|
| |
i=M+1
. max
1≤k<k ∗ M 1/2 (1 + k/M)(k/(M + k))κ

= OP (1) max ∗
1≤k<k M 1/2 (1 + k/M)(k/(M + k))κ
⎛⎛ ⎞ ⎞
k ∗ 1/2−κ
= OP
M

since .ζ < 1/2. By (6.4.6) we have

|WM,2 (k)| |WM,2 (t)|


. max = OP (1) sup
1≤k<k ∗ M 1/2 (1 + k/M)(k/(k + M))κ 1≤t<k ∗ M 1/2 (t/M)κ

and
|WM,2 (t)| D |W (t)|
. sup = sup
1≤t<k ∗ M 1/2 (t/M)κ 1/M≤t<k ∗ /M uκ

where .{W (u), u ≥ 0} is a Wiener process. We note that by the law of iterated
logarithm for the Wiener process at zero,

|W (t)|
. sup → 0 a.s. (M → ∞).
1/M≤t<k ∗ /M uκ

Thus we have
| M+k |
| Σ |
| |
| Ei |
| |
i=M+1
. max = oP (1). (6.4.10)
1≤k<k ∗ M 1/2 (1 + k/M)(k/(M + k))κ
346 6 Sequential Monitoring

Putting together (6.4.3) and (6.4.7) we get

ΔM
. lim =a>0
M→∞ M 1/2 (M/M)κ

with some scalar a, and therefore we conclude from (6.4.9) and (6.4.10) that
⎛ ⎞κ−1/2 ⎛ ⎞
M |SM (k)| ΔM P
. max ∗ 1/2 κ
− 1/2 → −∞.
M 1≤k<k M (1+k/M)(k/(M+k)) M (M/M)κ

Using again Assumption 6.1.1 and (6.4.3) we have


|M |
|Σ |
k | |
⎛ ⎞κ−1/2 | Ei | ⎛⎛ ⎞1/2 ⎞
M M | | M
i=1
. max = OP = oP (1).
M k ∗ ≤k≤M M 1/2 (1+k/M)(k/(k+M))κ M

Similarly, by Assumption 6.4.2(iii) and (6.4.3)


⎛ ⎞κ−1/2
M Δ(k ∗ + 1)
. max
M k ∗ ≤k≤M M 1/2 (1 + k/M)(k/(k + M))κ
⎛⎛ ⎞ ⎛ ⎞ ⎞
M κ−1/2 Δk ∗ M κ
= O(1)
M M 1/2 k ∗
⎛⎛ ⎞(κ−1/2)/(1−κ) ⎞
M 1/2−κ ∗ 1−κ
= O(1) Δ(k )
Δ
⎛ ⎞
= O(1) Δ1+(1/2−κ)/(1−κ) M −(κ−1/2) /(1−κ)+φ(1−κ)
2

= o(1).

The approximation in Assumption 6.1.1 and (6.4.6) yield


| M+k |
| Σ |
| |
⎛ ⎞κ−1/2 | Ei − σ WM,2 (k)|
M | |
i=M+1
. max
M M 1/2 (1 + k/M)(k/(M + k))κ
k ∗ ≤k≤M
⎛ ⎞κ−1/2
M kζ
= OP (1) max
M k ∗ ≤k≤M M 1/2 (1 + k/M)(k/(M + k))κ
⎛ ⎞κ−1/2
M 1 kζ
= OP (1) max
M M 1/2 k ∗ ≤k≤M (k/M)κ
6.4 Distribution of the Stopping Time 347

= OP (1)Mκ−1/2 max k ζ −κ
k ∗ ≤k≤M
⎧( )1/2−κ
k ∗ /M , if ζ ≤ κ
= OP (1) −1/2
M ζ , if ζ > κ

= oP (1).

Similarly,
⎛ ⎞κ−1/2 |
M | || 1
. max WM,2 (k) || 1/2
| |
M ∗
k ≤k≤M M (1 + k/M)(k/(M + k))κ
|
1 |
− 1/2 | = oP (1).
M (k/M)κ |

Note the distribution of .WM,2 (x) does not depend on M. Let .{W (x), x ≥ 0} be a
Wiener process. Using Lemma 6.4.1 we obtain
⎛ ⎫ ⎛ ⎫
|SM (k)| |SM (k)|
. lim P max ≤ 1 = lim P max ≤1
M→∞ 1≤k≤M g(M, k) M→∞ k ∗ ≤k≤M g(M, k)
⎛ ⎫
|σ W (k) + kΔ|
= lim P max ≤1
M→∞ k ∗ ≤k≤M g(M, k)
⎛ ⎫
|σ W (k) + kΔ|
= lim P max ≤1
M→∞ 1≤k≤M g(M, k)
⎛ ⎫
|σ W (k) + kΔ|
= lim P max ≤ v
M→∞ 1≤k≤M M 1/2 (k/M)κ
⎛ ⎛ ⎞κ ⎛
M |σ W (k) + kΔ|
= lim P max
M→∞ M 1≤k≤M M 1/2 (k/M)κ

ΔM
− 1/2
M (M/M)κ
⎛ ⎞κ ⎛ ⎞⎫
M ΔM
≤ v − 1/2
M M (M/M)κ
= o(x)

on account of (6.4.7), where .o(x) denotes the standard normal distribution function.
Thus we have

. lim P {τ (M) ≥ M(M, x)} = o(x). (6.4.11)


M→∞
348 6 Sequential Monitoring

The result in (6.4.11) can be rewritten as


{ }
. lim P τ 1−κ (M) − a 1−κ (M) ≤ xσ a 1/2−κ /Δ
M→∞

= 1 − P {τ (M) ≥ M(M, −x)} = o(x). (6.4.12)

It follows from (6.4.8) and (6.4.11) that

τ (M) P
. → 1.
a(M)

Hence by the mean value theorem we have


⎛⎛ ⎞1/(1−κ) ⎛ ⎞1/(1−κ) ⎞
τ (M) − a(M) 1
. = τ 1−κ
(M) − a 1−κ
(M)
b(M) b(M)
1 ⎛ 1−κ ⎞1/(1−κ)−1 τ 1−κ (M) − a 1−κ (M)
= a (M)(1 + oP (1))
1−κ b(M)
1 τ 1−κ (M) − a 1−κ (M)
= a κ (M) (1 + oP (1)),
1−κ b(M)

so the result follows from (6.4.12). ⨆



It is notable that
τ (M) P
. → 1,
a(M)

and therefore

M (1/2−κ)/(1−κ)
.τ (M) ≈ v 1/(1−κ) in probability.
|Δ(M)|1/(1−κ)

This implies that the shortest reaction time is achieved if .κ is close to 1/2. We
would react to the change instantaneously if .κ = 1/2 but this is not allowed
in the definition of the stopping time. If in the definition of .g(M, k) we use
.w(u) = (u log+ log(1/u))
1/2 , then following the proof of Theorem 6.4.1 one can

show
⎛ ⎞
.τ (M) = OP (log log M)
1/2
,

if .|Δ(M)| > 0 does not depend on M.


Next we consider a multivariate version of Lemma 6.4.1. We assume that .E i ∈ Rd
and the drift of the partial sums is determined by .δ(N ) = δ. We also require, as
6.4 Distribution of the Stopping Time 349

in Assumption 6.4.1, that the sum of the .E i ’s can be approximated with Gaussian
processes in .Rd :

Assumption 6.4.3 For each N we can define Gaussian processes .{WN (x), 0 ≤
x ≤ N}, .EWN (x) = 0, EWN (x)WN (y) = min(x, y)J, .J is non-singular, such
that
|| x ||
1 ||
||Σ
||
||
. max || E i − WN (x) || = OP (1)
1≤x≤N x ζ || ||
i=1

with some .ζ < 1/2.


Lemma 6.4.2 If Assumption 6.4.3 is satisfied, .0 ≤ κ < 1/2 and

N||δ||2 → ∞
. (6.4.13)

then
⎛ ⎛ k ⎞T ⎛ k ⎞
N 3/2−2α ⎝ 1 Σ −1
Σ
. max (E i + δ) J (E i + δ)
σ (N ) 1≤k≤N k 2α
i=1 i=1
⎞ D
−N 2−2α δ T J−1 δ → N (0, 1)

where

σ 2 (N ) = 4δ T J−1 δ
.

and .N (0, 1) denotes a standard normal random variable.


Proof We write
|⎛ ⎞T ⎛ k ⎞
| Σ Σ
N −3/2+2α 1 | k
max | (E i + δ) J−1 (E i + δ)
.
||δ|| 1≤k≤N k 2α |
| i=1 i=1
|
|
− (WN (k) + kδ)T J−1 (WN (k) + kδ)|
|⎛ ⎞T |
| k |
N −3/2+2α 1 || Σ |
≤ max E i − WN (k) J (WN (k) + kδ)||
−1
||δ|| 1≤k≤N k 2α || |
i=1
|⎛ ⎞T ⎛ k ⎞||
| Σ Σ
N −3/2+2α 1 | k
|
+ max 2α || E i − WN (k) J−1 (E i + δ) ||
||δ|| 1≤k≤N k | |
i=1 i=1
350 6 Sequential Monitoring

and
|⎛ ⎞T |
| k |
N −3/2+2α 1 | Σ |
. max 2α || E i − WN (k) J−1 (WN (k) + kδ)||
||δ|| 1≤k≤N k | i=1 |
|| k ||
1 ||
||Σ
|| N −1+α
|| 1
≤ N −1/2+α max α || E i − WN (k)|| max ||WN (k) + kδ||
1≤k≤N k || || ||δ|| 1≤k≤N k α
i=1

It follows from Assumption 6.4.3, as argued in the proof of Lemma 6.4.1, that
|| k ||
||Σ ||
−1/2+α 1 || ||
.N max || E i − WN (k)|| = oP (1)
1≤k≤N k α || ||
i=1

and

N −1+α 1
. max α ||WN (k) + kδ||
||δ|| 1≤k≤N k
N −1+α 1 N −1+α 1
≤ max α ||WN (k)|| + max α ||kδ||
||δ|| 1≤k≤N k ||δ|| 1≤k≤N k
1 1
≤ N −1/2+α max α ||WN (k)|| + O(1) = oP (1) + O(1).
N 1/2 ||δ|| 1≤k≤N k

Thus we get

N −1+α 1
. max α ||WN (k) + kδ|| = oP (1).
||δ|| 1≤k≤N k

Similar arguments give


|⎛ ⎞T ⎛ k ⎞||
| k
N −3/2+2α 1 | Σ Σ |
. max 2α || E i − WN (k) J−1 (E i + δ) || = oP (1).
||δ|| 1≤k≤N k | i=1 |
i=1

Since the distribution of .{WN (x), x ≥ 1} does not depend on N , according to our
calculations we need to prove only
⎛ ⎫
N −3/2+2α 1
. max 2α (W(k) + kδ)T J−1 (W(k) + kδ) − N 2−2α δ T J−1 δ
||δ|| 1≤k≤N k

D
→ N (0, 1),
6.4 Distribution of the Stopping Time 351

where .{W(x), x ≥ 1} is a Gaussian process with .EW(x) = 0 and .EW(x)WT (y) =


J min(x, y). We observe

1 ⎛ ⎞
. max 2α
WT (k)J−1 W(k) = OP N 1−2α
1≤k≤N k

and
| | ⎛ ⎞
| |
. max |k 1−2α kδ T J−1 W(k)| = OP ||δ||N 3/2−2α .
1≤k≤N

Also, if .0 < δ < 1, then

. max k 2−2α δ T J−1 δ = δ T J−1 δ(N (1 − δ))2−α


1≤k≤N (1−δ)

and

. max k 2−2α δ T J−1 δ = δ T J−1 δN 2−α .


N (1−δ)≤k≤N

Thus we conclude that for all .0 < δ < 1



1
. lim inf P max (W(k) + kδ)T J−1 (W(k) + kδ)
N →∞ 1≤k≤N k 2α

1
= max (W(k) + kδ)T J−1 (W(k) + kδ) = 1.
N (1−δ)≤k≤N k 2α

Next we write

. (W(k) + kδ)T J−1 (W(k) + kδ) = WT (k)J−1 W(k)


+ 2kδ T J−1 W(k) + k 2 δ T J−1 δ.

Now

N −3/2+2α 1 || T −1 T −1
|
|
. max | W (k)J W(k) − W (N )J W(N ) |
||δ|| (1−δ)N ≤k≤N k 2α
N −3/2
= O(1) max ||W(N ) − W(k)||(||W(N )|| + ||W(k)||)
||δ|| (1−δ)N ≤k≤N
⎛ ⎞
1
= OP = oP (1).
N 1/2 ||δ||
352 6 Sequential Monitoring

Also,
| ⎛ ⎞||
N −3/2+2α || T −1 T −1 |
. max
|N (1−δ)≤k≤N k 1−2α
δ J W(k) − δ J W(N ) |
||δ||
= O(1)N −1/2 max ||W(k) − W(N )||,
N (1−δ)≤k≤N

and by the scale transformation of .W we have

D
N −1/2
. max ||W(k) − W(N )|| → sup ||W(1) − W(u)||.
N (1−δ)≤k≤N 1−δ≤u≤1

By the almost sure continuity of .{W(u), 0 ≤ u ≤ 1} we conclude that for all .x > 0,
|

N −3/2+2α || 1
. lim lim sup P max (W(k) + kδ)T J−1 (W(k) + kδ)
δ→0 N →∞ ||δ|| | N (1−δ)≤k≤N k 1−2α
1 ⎛ T ⎞ || ⎫
− max W (N )J W(N )+2kδ J W(N )+k δ J J ||>x =0.
−1 T −1 2 T −1
N (1−δ)≤k≤N k 1−2α

Condition (6.4.13) implies for all .0 < δ < 1,


⎛ ⎛ ⎞
1
. lim P max WT (N )J−1 W(N ) + 2kδ T J−1 W(N ) + k 2 δ T J−1
N →∞ N (1−δ)≤k≤N k 1−2α
⎛ ⎞⎫
= N 1−2α WT (N )J−1 W(N ) + 2N δ T J−1 W(N ) + N 2 δJ−1 δ = 1.

Using again (6.4.13) we get


⎛ ⎞
N −3/2+2α 1−2α T 1
. N W (N )J−1 W(N ) = OP = oP (1).
||δ|| N 1/2 ||δ||

Using the normality of .W(N ) we get that


⎛ ⎞1/2
D
δ T J−1 W(N ) =
. Nδ T J−1 δ N (0, 1),

where .N (0, 1) is a standard normal random variable. This completes the proof. ⨆

The result of Lemma 6.4.2 can be easily rewritten for the norms used in Sect. 6.1.
Namely,
⎛⎛ ⎞T ⎛ k ⎞⎞1/2
⎛ Σk Σ
1 ⎝
N −1/2+α
. max (E i + δ) J−1 (E i + δ) ⎠ (6.4.14)
1≤k≤N k α
i=1 i=1
6.4 Distribution of the Stopping Time 353

⎛ ⎞1/2 ⎫ D
T −1
−N 1−α
δ J δ → N (0, 1),

where .N (0, 1) is a standard normal random variable.


Lemma 6.4.2 can be used to derive the asymptotic distribution of the stopping
times to detect changes in linear and dynamic linear models, as well as non-linear
time series models discussed in Sects. 6.2 and 6.3. We consider the stopping time
.τL (M) of Theorem 6.2.1 based on the detector (6.2.4). Since we assume in that

theorem that the covariates are .Lν -decomposable, there is a matrix .A such that
|| M+k ||
|| Σ ||
1 || T ||
. max || xi xi − kA|| = OP (1).
1≤k<∞ (k log+ log(k))1/2 || ||
i=M+1

The size of the change is

.δ = A(β A − β 0 ).

Analogously to Assumption 6.4.2 we require

Assumption 6.4.4 (i) .||δ|| → 0, (ii) .M 1/2 ||δ|| → ∞, and (iii)


⎛ ⎞2

( φ) 1 − 2κ
.k = O M with some 0≤φ< .
2(1 − κ)

Theorem 6.4.2 If .HA of (6.2.2), Assumptions 4.1.1–4.1.2, 6.2.1, 6.4.4 and (6.4.2)
are satisfied, then
τL (M) − a(M) D
. → N (0, 1),
b(M)
where .N (0, 1) is a standard normal random variable,
⎛ ⎞1/(1−κ)
vM 1/2−κ
a(M) =
.
(δ T D−1 δ)1/2
and
1
.b(M) = a 1/2 (M).
(1 − κ)(δ T D−1 δ)1/2

Proof Let

M = M(M, x)
.

⎛ ┌ ┐1/(1−κ) ⎞1/(1−κ)
1/2−κ 1/2−κ (1/2−κ)2
vM v M
=⎝ T −x ⎠ . (6.4.15)
(δ D−1 δ)1/2 ([δ T D−1 δ)]1/2 )3/2−2κ
354 6 Sequential Monitoring

We note that .M defined above satisfies (6.4.3)–(6.4.6). Along the lines of the proof
of (6.4.7) and (6.4.8) we have
⎛ ⎞1/(1−κ)
vM 1/2−κ
. M ≈
(δ T D−1 δ)1/2

and
⎛ ⎛ ⎞1/2 ⎞
M−1/2+κ vM −1/2+κ − M1−κ δ T D−1 δ
. → x. (6.4.16)

Following the proof of Theorem 6.4.1 we need to consider


⎧ ⎛⎛ ⎞T
⎨ 1 Σk
. lim P M −1/2+κ max κ ⎝ (xi Ei + δ)
M→∞ ⎩ 1≤k≤M k
i=1
⎛ k ⎞⎞1/2 ⎫
Σ ⎬
×D−1 (xi Ei + δ) ≤v .

i=1

We arrange the equation as

⎛ ⎛ ⎛Σ
k
⎞T ⎛ k
Σ
⎞ ⎞1/2 ⎫
−1/2+κ 1 −1
.P M max (xi Ei + δ) D (xi Ei + δ) ≤c
1≤k≤M k κ
i=1 i=1
⎛ ┌ ⎛⎛ Σk ⎞T ⎛Σk ⎞⎞1/2
−1/2+κ 1 −1
=P M max (xi Ei + δ) D (xi Ei + δ)
1≤k≤M k κ
i=1 i=1
⎛ ⎞1/2 ┐ ⎛ ⎛ ⎞1/2 ⎞⎫
T −1 −1/2+κ T −1
−M 1−κ
δ D δ ≤M cM 1/2−κ
−M 1−κ
δ D δ .

Using the definition of .M in (6.4.15) we get from (6.4.14) that


⎧ ⎛⎛ ⎫
⎪ ⎞T ⎛ k ⎞⎞1/2 ⎪
⎨ 1 Σ k Σ ⎬
. lim P M −1/2+κ max κ ⎝ (xi Ei +δ) D−1 (xi Ei +δ) ⎠ ≤v
M→∞ ⎪ ⎩ 1≤k≤M k ⎪

i=1 i=1

= o(x),

which means that

. lim P {τL (M) ≥ M(M, x)} = o(x), (6.4.17)


M→∞
6.4 Distribution of the Stopping Time 355

where .o(x) denotes the standard normal distribution function. The limit result in
(6.4.17) is the same as in (6.4.11), and it follows as in the proof of Theorem 6.4.1.
First we note that

τL1−κ − a 1−κ D
. → N (0, 1),
b1

where
┌ 2
┐1/(1−κ)
v 1/2−κ M (1/2−κ) 1
b1 = b1 (M) =
.
T
= T
a 1/2−κ (M).
[(δ D−1 δ)1/2 ]3/2−2κ (δ D−1 δ)1/2

By the mean value theorem,

τL (M) − a(M) 1 τ 1−κ (M) − a 1−κ (M)


. ≈ [a 1−κ (M)]κ/(1−κ) L ,
b(M) 1−κ b(M)

which yields the result by the definition of .b(M). ⨆



It is interesting to note that we have the same limit results in RCA(1) and in
GARCH(1,1) models, regardless of the stationarity properties of the observations
in the historical sample and after the change. However, this remark only applies to
the estimable parameters. For example, the above sequential monitoring procedures
are consistent for changes in the regression parameter in RCA (1) models, and the
approximate distribution of the corresponding stopping time may be computed. In
case of the GARCH (1,1) models of Theorem 6.3.3, we stop at the first time when
we detect that the mean of the vector .ui (θ̂ M ) appears to differ from .0. It can be
shown that .Eui (θ ) does not depend on i if .i > k ∗ is large enough and we could use

Σ
τCH +M
1
.δ̂ = ui (θ̂ M )
M
i=τCH

to estimate the time of change. We can apply Theorem 6.4.2 to get an upper
bound for .τCH (M), for the reaction time to detect the change. We assume that
in the definition of .τCH (M) we use .w(u) = uκ , 0 ≤ κ < 1/2, .||δ|| → 0 and
.M
1/2 ||δ|| → ∞. Let

⎛ ⎞1/(1−κ)
vM 1/2−κ
.aCH (M) = ⎝ ⎠ ,
T
(δ̂ F̂−1
M δ̂) 1/2

and

aCH (M) T −1 −1/2
bCH (M) =
. (δ̂ F̂M δ̂) .
1−κ
356 6 Sequential Monitoring

The matrix .F̂M is defined in (6.3.6). In the GARCH(1,1) case, regardless of


stationarity properties,

τCH (M) − aCH (M) D


. → N (0, 1),
bCH (M)

The size of the change is measure in the change of the derivative of the likelihood
function. If we are interested in measuring the change in .(α0 − αA , β0 − βA ) we
could use the detector
⎛ ⎞T ⎛ ⎞
GM (k) = k(θ̂ M,M+k − θ̂ 0,M ) D̂−1
.
M k(θ̂ M,M+k − θ̂ 0,M ) ,

where .θ̂ i,j is the quasi-maximum likelihood estimator for .α0 , β0 computed from
{yk , i < k ≤ j } and .D̂M is the estimator for the asymptotic covariance of .M 1/2 θ̂ 0,M
.

computed from the training sample. If the corresponding stopping time is denoted
∗ (M), .0 ≤ κ < 1/2, then
by .τCH
∗ (M) − a ∗ (M)
τCH CH D
.
∗ → N (0, 1)
bCH (M)

with
⎛ ⎞1/(1−κ)
vM 1/2−κ

aCH
. (M) = ⎝ T
⎠ ,
(δ̂ D̂−1
M δ̂)
1/2

and
/ ∗ (M)
aCH T

.bCH (M) = (δ̂ D̂−1
M δ̂)
−1/2
.
1−κ

Now .δ̂ = θ̂ 0,M − θ̂ τCH


∗ ,τ ∗ +M .
CH

6.5 Data Examples

Example 6.5.1 (Monitoring the Exchange Rate Between USD and Pound Ster-
ling) Figure 6.1 shows the daily spot exchange rates1 between U.S. dollars and
Pounds sterling from the year 2022 obtained from the Federal Reserve Bank of St.

1 There were ten missing values due to holidays, which we imputed using linear interpolation

between neighboring values.


6.5 Data Examples 357

1.35
1.30
USD to Pound sterling

1.25
1.20
1.15
1.10

2022−01−03 2022−09−01 2022−12−30

Fig. 6.1 A plot of the spot exchange rate between US dollars and Pounds sterling from 2022. As
a demonstration of sequential monitoring procedures, we used the data prior to September 1, 2002
(M = 171) as a training sample to monitor for changes in the linear trend of the series

Louis database (2023). As a demonstration of the sequential monitoring methods


introduced in Sect. 6.2, we consider the problem of monitoring for changes in the
trend of the spot exchange using as a training sample the data from January 3rd,
2022, until September 1st, 2022 (M = 174).
Letting yt denote the spot exchange rate on day t, we see that over this period
yt declines approximately linearly, which motivates monitoring for changes in the
parameters of the time trend model

yt = β0 + β1 t + Et .
. (6.5.1)

In order to compute the process ZM (k) in (6.2.4), we estimated D using the


long-run covariace matrix estimator in Sect. 3.1.2 with the Bartlett kernel and
the bandwidth of Andrews (1991) from the training sample. It is remarkable in
this case that the estimated residuals in the training sample from model (6.5.1)
are strongly serially correlated, highlighting the importance of using a long-run
covariance matrix estimator.
1/2
We compared the process ZM (k) to the boundary function
⎛ ⎞⎛ ⎞0.45
k k
g(M, k) = v0.95 M
.
1/2
1+ , (6.5.2)
M k+M

where v0.95 was computed so that


⎧ ⎛ ⎞1/2 ⎫
⎨ 1 Σ
2 ⎬
P
. sup Wl2 (u) ≤ v0.95 = 0.95.
⎩0<u≤1 u0.45 ⎭
l=1
358 6 Sequential Monitoring

Z 1/2
k
30

g(M, k)
25
20
15
10
5
0

2022−09−02 2022−09−27 2022−10−13


1/2
Fig. 6.2 A plot of ZM (k), described in (6.2.4), computed from the incoming data beyond
September 1st, 2022, and the boundary function g(M, k) in (6.5.2). We saw that the process
1/2
ZM (k) crossed the boundary on September 27, 2022

1/2
A plot of ZM (k) against g(M, k) is shown in Fig. 6.2. We saw that the process
1/2
ZM (k) crossed the critical boundary at the date corresponding to September 27th,
approximately 20 days after the conclusion of the election of a new prime minister
in the UK, and one day after a large drop in the spot-exchange rate.

Example 6.5.2 (Monitoring a Dynamic Linear Model for US Housing Prices)


In this example, we consider online-monitoring for changes in the parameters of
a dynamic linear model used to model the evolution of U.S. housing prices. The
historical prices of U.S. real estate have gone through several booms, like the
California housing boom of the 1880s, the Florida land boom of the 1920s, and
peaks in the national real estate market occurred in 1980s and the 2000s.
The data that we consider consists of the monthly S&P CoreLogic Case-Shiller
Home Price Index (HPI) series over a 10 year period from 1991 to the end of
2000 (N = 120), obtained from the Federal Reserve Bank of St. Louis database
(St. Louis, 2023). These indices are often used as a proxy of US residential real
estate housing prices. This first differenced HPI series is shown in Fig. 6.3, which
generally shows an upward housing price trend in the United States at the national
level over this period.
We take as the goal of this analysis to demonstrate monitoring for changes in a
dynamic linear model for the HPI series. Letting yt denoting the first differenced
HPI series, we consider the dynamic linear model

yt = β1 + β2 t + β3 xt + Et ,
. (6.5.3)
6.6 Exercises 359

0.8
0.6
Shiller HPI

0.4
0.2
0.0
−0.2

1991−01−01 1996−01−01 2000−12−01

Fig. 6.3 First differenced S&P CoreLogic Case-Shiller Home Price Index (HPI) series over a 10
year period from 1991 to the end of 2000 (N = 120), obtained from the Federal Reserve Bank of
St. Louis database (St. Louis, 2023)

where xt is the first differenced real disposable personal income per capita change
at a monthly level over the same period. We used January 1991-December 1995
as the training (historical) sample with M = 60, and considered an open-ended
monitoring procedure from there. An application of the KPSS test (Kwiatkowski
et al., 1992) suggested that the covariate series of first differenced real disposable
personal income per capita is reasonably stationary in the training sample. In order
to compute ZM (k) in (6.2.4), we estimated D using the long-run covariace matrix
estimator in Sect. 3.1.2 with the Bartlett kernel and the bandwidth of (Andrews,
1991) from the training sample.
1/2
Figure 6.4 shows a plot of the process ZM (k) against the boundary function
1/2
g(M, k) of Eq. (6.5.2) as in the previous example. We saw that the process ZM (k)
stayed well below the boundary function g(M, k) for the first approximately year
and a half of monitoring, but then sharply increases after a large shift in the rate of
1/2
increase of the HPI occurring in the later part of the year 1997. The process ZM (k)
crossed the boundary at a location corresponding to November, 1997, which is when
we may have sounded an alarm that the model (6.5.3) appears to have undergone a
structural change.

6.6 Exercises

Exercise 6.6.1 We define the stopping time



min {k : |SM (k)| ≥ g(M, k)}
τ (M) =
.
∞, if |SM (k)| < g(M, k) for all k ≥ 1,
360 6 Sequential Monitoring

Z 1k 2
80

g (M,k)
60
40
20
0

1996−01−01 1997−11−01 1998−07−01


1/2
Fig. 6.4 A plot of ZM (k), described in (6.2.4), computed from the incoming data beyond January,
1/2
1996, and the boundary function g(M, k) in (6.5.2). We saw that the process ZM (k) crossed the
boundary in November, 1997

where SM (k) is given by (6.1.5) and the boundary function is


⎛ ⎞1+κ
k
g(M, k) = vM
.
κ−1/2
1+ , 0 ≤ κ < ∞.
M

Show that under the conditions of Theorem 6.1.1


⎛ ⎫
. lim P {τ (M) < ∞} = P sup (1 − u) |W (u)| ≤ vσ ,
κ
M→∞ 0<u<1

where {W (u), 0 ≤ u ≤ 1} is a Wiener process.


Exercise 6.6.2 We define the self normalized stopping time
⎛ { ∗ (k)| ≥ g(M, k)
}
min k : |SM
τ ∗ (M) =
.
∗ (k)| < g(M, k) for all k ≥ 1,
∞, if |SM

⎛ | i |⎞−1
|Σ |
∗ | |
.SM (k) = SM (k) max N −1/2 | (Xl − X̄M )|
1≤i≤N | |
l=1

where SM (k) is given by (6.1.5) and the boundary function is


6.6 Exercises 361

g(M, k) = vM −1/2 ((k + M)/M)w(k/(k + M)).


. (6.6.1)

Compute
{ }
. lim P τ ∗ (M) < ∞
M→∞

under the conditions of Theorem 6.1.1


Exercise 6.6.3 Extend the result of Theorem 6.1.1 when the observations are in Rd .
We consider the model

Xi = μ0 + ei ,
. 1≤i≤M

and under the null hypothesis

H0 : Xi = μ0 + ei ,
. M + 1 ≤ i < ∞.

Under the alternative there is a change in the mean at time M + k ∗ . We assume


that ei , i ∈ Z are independent and identically distributed random vectors in Rd ,
Ee0 = 0, E||e0 ||κ < ∞ with some κ > 2 and J = e0 eT 0 is a non singular matrix.
Let
⎛┌ ┐T ┌ ┐⎞1/2
Σ
M+k Σ
M+k
ZM (k) = ⎝
. (Xi − X̄M ) J−1 (Xi − X̄M ) ⎠ ,
i=M+1 i=M+1

1 Σ
M
.X̄M = Xi ,
M
i=1

and define the stopping time



min{k : ZM (k) ≥ g(M, k)}
.τ (M) =
∞, if ZM (k) < g(M, k) for all k,

where g(M, k) is defined in (6.6.1). Compute

. lim P {τ (M) < ∞}


M→∞

if the null hypothesis, Assumption 6.1.2 hold, and I ∗ (w, c) < ∞ with some c > 0.
Exercise 6.6.4 Show that the

. lim P {τL (M) < ∞} = 1,


M→∞
362 6 Sequential Monitoring

if

k∗
. → 0 and M 1/2 ||β 0 − β A || → ∞,
M
where τL (M) is defined in (6.2.5) and Assumptions 5.2.1–5.2.4, 6.1.2 hold.
Exercise 6.6.5 Define a closed end version of τL (M) and provide its asymptotic
properties.
Exercise 6.6.6 We wish to test the null hypothesis of (6.2.1) using the detector
| M+k |
| Σ k Σ
M |
| |
.Ẑk = | Êi,M − Êi,M | ,
| M |
i=M+1 i=1

where the residuals Êi,M are defined in (6.2.3). Define the corresponding stopping
time and discuss its asymptotic properties.
Exercise 6.6.7 We consider the dynamic AR(1) model, i.e. d = 1 in (6.3.1) and
(6.3.2). Under the alternative the regression parameter changes to 1 immediately
after the historical sample. Show that

. lim P {τ (M) < ∞} = 1


M→∞

under the alternative of changing from stationarity to a random walk.


Exercise 6.6.8 Let {Ei , i ∈ Z < ∞} be independent and identically distributed
random variables, with EE0 = μ > 0, 0 < σ 2 < ∞ and E|E0 |κ < ∞ with some
κ > 2. Let 0 ≤ α < 1/2. Show that
| |
Nα | 1 Σ
k |
| |
. lim sup | max α Ei − μN 1−α | = σ a.s.
N →∞ (2N log log N)
1/2 |1≤k≤N k |
i=1

Exercise 6.6.9 Let {E i , i ∈ Z < ∞} be independent and identically distributed


random vectors in Rd , with EE 0 = μ /= 0 and E||E 0 ||κ < ∞ with some κ > 2. Let
0 ≤ α < 1/2. Show that
| || k || |
| 1 || || |
Nα | ||Σ || 1−α |
. lim sup | max α || E i || − ||μ||N |<∞ a.s.
N →∞ (N log log N)
1/2 |1≤k≤N k || || |
i=1

Exercise 6.6.10 Show that Lemma 6.4.1 remains true when 1/2 ≤ α < 1.
Exercise 6.6.11 Show that Lemma 6.4.2 remains true when 1/2 ≤ α < 1.
6.7 Bibliographic Notes and Remarks 363

6.7 Bibliographic Notes and Remarks

The first monitoring scheme to find changes in regression parameters akin to the
approaches considered in this chapter was introduced by Chu et al. (1996), and it
has become the starting point of substantial research. Chu et al. (1996) used the
weight function .w(u) = (u(a 2 + log(1/u)))1/2 and they also provided an upper
bound for .P {τ (M)} < ∞ with this weight function. Theorem 6.1.1 was obtained
by Aue and Horváth (2004), assuming that the innovations are independent and
identically distributed. Zeileis et al. (2005) and Aue et al. (2014) studied monitoring
schemes in linear models with dependent errors. Changes in dynamic regressions
were investigated in Horváth et al. (2022). Kirch (2007), Kirch (2008) and Hušková
and Kirch (2012) provided resampling methods to find critical values for sequential
monitoring. Hlávka et al. (2012) investigated the sequential detection of changes of
the parameter of autoregressive models, i.e. .d = 0 in the dynamic linear model of
(6.3.1) and (6.3.2). Gösmann et al. (2021) propose a likelihood-ratio based approach
for open-ended sequential monitoring. Leisch et al. (2000) used fluctuation tests to
monitor model parameters. Berkes et al. (2004) and Horváth et al. (2006) justified
the applicability of monitoring in volatility models. Homm and Breitung (2012)
compared several methods to find bubbles in stock markets, detecting changes from
stationary to non stationary segments. Similarly, to find change from stationarity to
non-stationarity is discussed in Steland (2006) and Horváth et al. (2020). Bubble
detection in real time is an interesting application of monitoring methods. Time
series models for this purpose assume a stationary sequence turns into an explosive
or mildly explosive series before returning to a stationary phase or a different type of
explosive segment. We refer to Phillips et al. (2014), Phillips et al. (2015), Phillips
and Shi (2018) and Phillips and Yu (2011) for methods to detect bubbles and the
applications of their methods to several financial bubbles. Bardet and Kengne (2014)
considered more general causal and affine processes. Hoga (2017) investigated
multivariate models. A graph based approach is developed in Chen (2019).
The method proposed in this chapter are highly related to other sequential
monitoring techniques in the vein for Shiryaev’s and Robert’s methodology, see
e.g. Shiryaev (1963), Roberts (1966), Pollak (1987), Moustakides (1986), Siegmund
(2013), Lai and Xing (2010). These typically assume incoming observations are
serially independent. A recent review is Tartakovsky et al. (2014). Aue et al. (2012),
Aue et al. (2014) used monitoring in functional models as in Chap. 8.
The limits in the theorems of Sect. 6.1 do not exist if .w(u) = u1/2 is used in
the definition of the boundary function. Horváth et al. (2007) modified the boundary
function so .w(u) = u1/2 could be incorporated into the procedure.
Chow and Hsiung (1976) proved Lemma 6.4.1 for independent and identically
distributed random variables with positive drift. The proof in this chapter is inspired
by Berkes and Horváth (2003a).
Chapter 7
High-Dimensional and Panel Data

We have considered in several instances, see e.g. Sects. 1.3, 5.5, and 5.6, performing
change point analysis with multivariate time series. In this chapter we change
our notation slightly to denote such multivariate time series data as .Xi,t , t ∈
{1, . . . , T }, i ∈ {1, . . . , N}, where we think of t and T as denoting “time”, and
N denotes the dimension or number of “cross-sectional units” that we observe.
For example, such data might comprise real valued observations of N financial or
economic time series over T time units.
Given the fast proliferation and accessibility of economic and financial data
available today, often at least one or both of N and T are large. This has spurred
efforts to understand how various methods to analyze the data .Xi,t , including change
point analysis, are affected when N or T are large in relation to each other, or
are both large. The case when .N >> T and T is relatively small is sometimes
referred to in the econometrics literature as “panel data”, with the various N cross-
sectional time series referred to as “panels”. Other relationships between N and T
frequently arise, and the analysis of such data generally falls within the scope of
high-dimensional multivariate time series analysis.
In this chapter we discuss the asymptotic theory behind change point methods
for such high-dimensional or panel data. An important consideration throughout is
that we allow both N and T to tend to infinity. We see in many cases that the relative
rates at which N and T diverge have a crucial impact on the form of the limiting
distribution of natural change point test statistics.
To illustrate some of the challenges in this setting, in Sect. 7.1 we begin by
considering change point detection methods for the mean of high-dimensional time
series that are cross-sectionally independent. These methods are adapted to deal with
cross-sectional dependence in form of (linear) common factors in Sect. 7.2. Change
point detection in the context of high-dimensional linear regression is considered in
Sect. 7.3, and for high-dimensional RCA models in Sect. 7.4.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 365
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_7
366 7 High-Dimensional and Panel Data

7.1 Change in the Means of High-Dimensional Observations

We consider the AMOC change point model

Xi,t = μi + δi 1{t ≥ t0 } + ei,t ,


. t ∈ {1, . . . , T }, i ∈ {1, . . . , N }, (7.1.1)

where in order to identify the mean parameter we assume that

Assumption 7.1.1 .Eei,t = 0, t ∈ {1, . . . , T }, i ∈ {1, . . . , N}.


According to (7.1.1), .μi changes to .μi + δi simultaneously across each cross-
section at the change point .t0 . We note that in this chapter we use .t0 to denote
the change point in order to emphasize that it occurs in time rather than across
dimension. We wish to test the null hypothesis that there are no changes in the
mean, which in this case may be phrased as

H0 : t0 > T
. (7.1.2)

against the alternative

HA : 1 < t0 < T .
. (7.1.3)

We note here that we are careful to formulate the change point hypotheses in
terms of the time of change, rather than the magnitude of the changes represented
by .{δi , i ∈ {1, . . . , N }}. An important consideration throughout this Chapter will be
the determination of what “magnitude” of a change is required in order for natural
change point statistics to be able to differentiate .H0 from .HA .
To detect a change in the i’th cross-section, we use the CUSUM process
⎛ ⎞
1 ⎣T x⎦
ZT ,i (x) =
. ST ,i (x) − ST ,i (1) ,
T 1/2 T

where
⎣T
Σ x⎦
.ST ,i (x) = Xi,t ,
t=1

0 ≤ x ≤ 1, .i ∈ {1, . . . , N}. The testing procedure that we consider is based on


.

functionals of the .l2 -norm aggregated CUSUM process


⎧ ⎫
1 Σ
N
1 2 ⎣T x⎦(T − ⎣T x⎦)
. V̄N,T (x) = ZT ,i (x) − , 0 ≤ x ≤ 1,
N 1/2 σi2 T2
i=1
7.1 Change in the Means of High-Dimensional Observations 367

where the .σi2 ’s are some suitably chosen standardization constants. In this section
we assume the errors .ei,t are cross-sectionally independent linear processes
Assumption 7.1.2
(i) for all .i ∈ {1, . . . , N }, .t ∈ {1, . . . , T },

Σ
ei,t =
. ci,l Ei,t−l
l=0

(ii) the sequences .{Ei,t , t ∈ Z} are independent


(iii) for every i the variables .{Ei,t , t ∈ Z} are independent and identically
distributed
(iv) .EEi,0 = 0, EEi,0
2 = 1 and .E|E |κ < ∞ with some .κ > 4
i,0
(v)

1 Σ
N
. lim sup E|Ei,0 |κ < ∞
N →∞ N
i=1

(vi) there are .c0 and .α > 2 such that for all .1 ≤ i ≤ N, 0 ≤ l < ∞

|ci,l | ≤ c0 (l + 1)−α
.

(vii) there is .δ > 0 such that .ai2 ≥ δ 2 , 1 ≤ i ≤ N, where


Σ ∞ Σ
Σ ∞
ai2 =
.
2
ci,l +2 ci,l ci,l+h .
l=0 h=1 l=0

Assumption 7.1.2(ii) entails that the cross-sectional units are independent. We


relax this assumption in Sect. 7.2 in which we assume that the cross-sections are
correlated through dependence on common factors. Using Assumption 7.1.2 we get
that the long-run variance satisfies
⎛ T ⎞2
1 Σ
.σi
2
= lim ei,t = ai2 ≥ δ 2 . (7.1.4)
T →∞ T
t=1

In discussing the asymptotic properties


√ of the process .V̄N,T , we start with the case
in which T grows faster than . N.
Assumption 7.1.3

N
. → 0.
T2
368 7 High-Dimensional and Panel Data

We will also consider methods in Sect. 7.2 where we only assume that
min(N, T ) → ∞. Section 7.3 further provides some results when T is fixed
.

and only .N → ∞.
The next theorem gives the limit distribution of the standardized .l2 -aggregated
cross-sectional CUSUM process.
Theorem 7.1.1 If .H0 of (7.1.2) and Assumptions 7.1.1–7.1.3 hold, then

D[0,1]
V̄N,T (x) −→ ┌(x),
.

where .{┌(x), 0 ≤ x ≤ 1} is a Gaussian process with .E┌(x) = 0 and .E┌(x)┌(y) =


2x 2 (1 − y)2 , if .0 ≤ x ≤ y ≤ 1.
Checking the covariance functions, one can easily verify
{ ⎛ ⎞ }
D
. {┌(x), 0 ≤ x ≤ 1} = 21/2 (1 − x)2 W x 2 /(1 − x)2 , 0 ≤ x ≤ 1 , (7.1.5)

where .{W (u), u ≥ 0} denotes a Wiener process. The representation in (7.1.5) has
an interesting connection to the Brownian bridge. Namely, if .{B(t), 0 ≤ t ≤ 1} is a
Brownian bridge, then

D
{B(t), 0 ≤ t ≤ 1} = {(1 − t)W (t/(1 − t)), 0 ≤ t ≤ 1} .
.

The proof of Theorem 7.1.1 is based on the following lemmas.


Lemma 7.1.1 If .H0 of (7.1.2) and Assumptions 7.1.1–7.1.3 hold, then the finite
dimensional distributions of .V̄N,T (x) converges to that of .┌(x), where .┌(x) is
defined in Theorem 7.1.1.
Proof According to Phillips and Solo (1992) (see also Bai 1994, p. 470) we have
that
⎛ k ⎞
Σ
k
k Σ
T Σ k Σ
T
. ei,t − ei,t = ai Ei,t − Ei,t + ηi,k , (7.1.6)
T T
t=1 t=1 t=1 t=1
⎛ ⎞
k ∗ ∗ k ∗
ηi,k = ηi,k (T ) = 1 −
. ei,0 − ei,k + ei,T ,
T T

where

Σ ∞
Σ
∗ ∗ ∗
ei,t
. = ci,l Ei,t−l with ci,l = ci,k .
l=1 k=l+1
7.1 Change in the Means of High-Dimensional Observations 369

Let
⎛ ⎞
⎣T x⎦
1 ⎝Σ ⎣T x⎦ Σ ⎠
T
.QT ,i (x) = Ei,t − Ei,t .
T 1/2 T
t=1 t=1

By (7.1.6) we have
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
k k k 1 2
ZT2 ,i
. = ai2 Q2T ,i + 2ai QT ,i + ηi,k .
T T T T

It follows from elementary calculation that


⎛ ⎞ ⎛ ⎞
k k k
ai2 EQ2T ,i
. = σi2 1− .
T T T
∗ we have
Using the definition of .ei,k
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
k ∗ k 1 ∗ 1 k ∗
EQT ,i
. ei,0 = 0, EQT ,i e = 1 − ci,0 2
EEi,0 ,
T T T 1/2 i,k T T
⎛ ⎞ ⎛ T ⎞
k k ∗ k Σ k ∗

EQT ,i e = 2 ci,l − ci,0 EEi,02
T T 3/2 i,T T T
l=T −k

∗ i ∈ Z} we have
and by the stationarity of .{ei,j
⎛ ( ) ( ∗ )2 ( ∗ )2 ⎞ ( ∗ )2
∗ 2
.
2
Eηi,k ≤ 8 E ei,0 + E ei,k + E ei,T ≤ 24E ei,0 .

It follows from Assumption 7.1.2(vi)


( ∗ )2 ( ∗ )2
E ei,0
. = EEi,0
2
ci,l ≤ c1 EEi,0
2

with some constant .c1 . Thus we obtain by (7.1.4)


|N |
|Σ 1 ⎣T x⎦(T − ⎣T x⎦) || N 1/2 1 Σ 2
N
1 |
. sup | EZ 2
(x) − | ≤ c2 EEi,0
0≤x≤1 N 1/2 | σi2
T ,i
T2 | T N
i=1 i=1

with some constant .c2 , and therefore Assumptions 7.1.2(iv) and 7.1.3 yield
|N |
|Σ 1 ⎣T x⎦(T − ⎣T x⎦) ||
1 |
. sup | EZT ,i (x) −
2
| = o(1). (7.1.7)
0≤x≤1 N 1/2 | σi2 T2 |
i=1
370 7 High-Dimensional and Panel Data

Next we show that for all k


| ⎛ ⎞ ⎛ ⎞|
| k k ||
|
. V̄N,T − RN,T = oP (1),
| T T |

where
N ⎧
Σ ⎫
1 ⎣T x⎦(T − ⎣T x⎦)
RN,T (x) =
. QT ,1 −
2
.
N 1/2 T2
i=1

In light of (7.1.7), it is enough to show that


⎛ ⎛ ⎞ ⎛ ⎞⎞2
1 Σ
N
1 2 k k
. max E ZT ,i − Q2T ,i = o(1). (7.1.8)
1≤k≤T N
i=1
σi2 N N

Using again (7.1.6) we obtain that


⎛ ⎛ ⎞ ⎛ ⎞⎞2 ⎛ ⎛ ⎞ ⎞
1 2 k k k 1 1 2 2
E
. ZT ,i − Q2T ,i = E 2ai QT ,i ηi,k + ηi,k
σi2 N N N T 1/2 T
⎧ ⎛ ⎞ ⎫
k 1 2 1 4
≤ 16E ai2 Q2T ,i ηi,k + 2 ηi,k
N T T
⎧ ⎛ ⎞ ⎞1/2 ⎫
1 1 1/2 ⎛
≤ c3 EE 4
+ EE 4
EQ 4
T ,i
T 2 i,0 T i,0

with some constant .c3 . By the Rosenthal inequality (see Petrov 1995, p. 59), we
conclude
⎧ ⎛ ⎞2 ⎫
c4
.EQT ,i ≤ + T 2 EEi,0
4 4 2
T EEi,0 .
T

So by the Cauchy–Schwarz inequality we have


⎛ ⎛ ⎞ ⎛ ⎞⎞2
1 Σ
N
1 2 k k
. E ZT ,i − Q2T ,i
N
i=1
σi2 N N
⎧ ⎫
1 Σ 4 1 1 ⎛ 4 ⎞1/2 2
N
1
≤ c5 EE i,0 + EEi,0 EEi,0
T 3/2 N T N
i=1

1 Σ 4
N
≤ c6 EEi,0 ,
TN
i=1

completing the proof of (7.1.8).


7.1 Change in the Means of High-Dimensional Observations 371

Let .0 < x1 < . . . < xK < 1, .λ1 , . . . , λK be constants and introduce

Σ
N ⎛ ⎞
LT ,i =
. λl Q2T ,i (xl ) − EQ2T ,i (xl ) .
l=1

Observing that .QT ,i (·) is a CUSUM process of independent and identically


distributed random variables, Assumption 7.1.2 yields

EL2T ,i ≥ c7
. if T ≥ T ∗ , (7.1.9)

with some .c7 > 0 and .T ∗ . Using again the Rosenthal inequality (see Petrov 1995,
p. 59) we get

. E|LT ,i |κ ≤ c8 T −κ/2 (T E|Ei,0 |κ + T κ/2 ), (7.1.10)

where .c8 only depends on the .λl ’s. Applying (7.1.9) and (7.1.10) we conclude
⎛N ⎞⎛ N ⎞−1/2
Σ⎛ ⎞2/κ Σ
. E|LT ,i | κ/2 2
ELT ,i
i=1 i=1
⎧⎛ ⎞2/κ ⎫
c9 ⎨ Σ
N ⎬
≤ T 1−κ/2 E|Ei,0 |κ + N 2κ
N 1/2 ⎩ ⎭
i=1
⎧⎛ ⎞2/κ ⎛ ⎞2/κ ⎫
⎨ Σ
N
1 Σ
N ⎬
1−κ/2 1
≤ c10 N (4−κ)/(2κ)
T E|Ei,0 |κ + E|Ei,0 |κ
⎩ N N ⎭
i=1 i=1

→0

on account of Assumptions 7.1.2 and 7.1.3. Using Lyapunov’s central limit theorem
(see Theorem 4.9 of Petrov 1995) we get

1 Σ
N
D
. LT ,i → N, (7.1.11)
N 1/2
i=1

where .N is a normal random variable with zero mean and variance which is a
function of the .xl ’s and .λk ’s. Now the Cramér–Wold lemma (see Billingsley 1968)
gives the convergence of the finite dimensional distributions.
372 7 High-Dimensional and Panel Data

The last step of the proof is the computation of the variance of the limiting normal
random variable in (7.1.11). Since .QT ,i (x) is the CUSUM process of independent
and identically distributed random variables, elementary arguments give
| ⎛ ⎞ ⎛ ⎞|
| |
. max |cov Q2T ,i (x), Q2T ,i (y) − cov B 2 (x), B 2 (y) | → 0,
1≤i≤N

which implies
⎛ ⎞
E┌(x)┌(y) = cov B 2 (x), B 2 (y) ,
.

where .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge. Since the Brownian bridge is


Gaussian with mean zero and covariance function .min(t, s)−ts, tedious calculations
can provide the formula for .cov(B 2 (x), B 2 (y)). We instead use an indirect method
based on the connection between the Brownian bridge and the Ornstein–Uhlenback
process. Let .{U (x), x ≥ 0} be a standard Ornstein–Uhlenback process, i.e. a
Gaussian process with .EU (x) = 0 and

EU (x)U (y) = exp(−|x − y|).


.

We show in (A.2.1) that


⎧ ⎫ ⎧ ⎛ ⎛ ⎞⎞ ⎫
B(x) D 1 x
. , 0 < x < 1 = U log , 0 < x < 1 .
(x(1 − x))1/2 2 1−x

By the stationarity of .U (x) we get for all .0 < x ≤ y < 1 that


┌ ┐ ┌ ⎛ ⎛ ⎞⎞ ⎛ ⎛ ⎞⎞┐
B 2 (x) B 2 (y) 1 x 1 y
E
. = E U2 log U2 log
x(1 − x) y(1 − y) 2 1−x 2 1−y
┌ ⎛ ⎛ ⎞⎞┐
1 y(1 − x)
= E U 2 (0)U 2 log . (7.1.12)
2 x(1 − y)

Using the covariance function of .U (x) we have the representation

D
[U (0), U (h)] = [U (0), exp(−h)U (0) + (1 − exp(−2h))1/2 N],
. (7.1.13)

where .N is a standard normal random variable, independent of .U (0). Apply-


ing (7.1.13), we get that

E[U 2 (0)U 2 (h)] = EU 4 (0) exp(−2h)+1−exp(−2h) = 1+2 exp(−2h),


. h ≥ 0,
7.1 Change in the Means of High-Dimensional Observations 373

and therefore

var(U 2 (0), U 2 (h)) = 2 exp(−2h).


.

Thus from (7.1.12) we obtain that

cov(B 2 (x), B 2 (y)) = 2x 2 (1 − y)2 ,


. 0 ≤ x ≤ y ≤ 1. (7.1.14)

Now the covariance function of .┌(x) follows from (7.1.12). ⨆



Since we showed the convergence of the finite dimensional distributions, the
proof of Theorem 7.1.1 is complete if we show that .V̄N,T (·) is tight in .D[0, 1] (see
Prohorov’s Theorem in Billingsley 1968).
Lemma 7.1.2 If .H0 of (7.1.2) and Assumptions 7.1.1–7.1.3 hold, then .V̄N,T (x) is
tight in .D[0, 1].
Proof We use again (7.1.6). Let
⎛ ⎞
1 ΣN
1 ⎡ 2 ⎛ ⎞┐ κ/2
A1 (k, l) = E
. ηi,k − ηi,l − E ηi,k − ηi,l
2 2 2
.
N 1/2 T σi2
i=1

Applying Rosenthal’s inequality (see Petrov 1995, p. 59) we obtain for all .1 ≤ l ≤
k ≤ T that

⎧Σ ⎡N ┐
N ⎛ ⎞κ/2 Σ ⎛ ⎞2 κ/4
−κ/4 −κ/2
A1 (k, l) ≤ c1 N
. T E ηi,k − Eηi,k
2 2
+ E ηi,k − Eηi,k
2 2

i=1 i=1
⎡N ┐
Σ
N ⎛ ⎞κ/2 Σ ⎛ ⎞2 κ/4 ⎫
+ E ηi,l − Eηi,l
2 2
+ E ηi,l − Eηi,l
2 2

i=1 i=1

⎧Σ ⎛N ⎞
N
( ) Σ⎛ ⎞ κ/4 ⎫
−κ/4 −κ/2
≤ c2 N T E|ηi,k | + E|ηi,l | +
κ κ
ηi,k + ηi,l
4 4
.
i=1 i=1

∗ and Assumption 7.1.2 that there is a constant


It follows from the definition of .ηi,t
.c3 = c3 (γ ) such that

∗ γ
E|ei,t
. | ≤ c3 E|Ei,0 |γ

which yields
4
Eηi,k
. ≤ c4 EEi,0
4
and E|ηi,k |κ ≤ c5 E|Ei,0 |κ .
374 7 High-Dimensional and Panel Data

Thus we get
⎧ ⎡ ┐κ/4 ⎫
⎨1 Σ
N
1 Σ 4
N ⎬
A1 (k, l) ≤ c6 T −κ/2
. E|Ei,0 |κ + EEi,0
⎩N N ⎭
i=1 i=1

≤ c7 T −κ/2 . (7.1.15)

Repeating the arguments leading to (7.1.15), one can verify

A2 (k, l)
.

⎧ ΣN ┌ ⎛ ⎞ ⎛ ⎞
1 k l
=E 1/2
Q T ,i η i,k − Q T ,i ηi,l
(N T ) N N
i=1
⎛ ⎛ ⎞ ⎛ ⎞ ⎞ ┐⎫
k l
− E QT ,i ηi,k − QT ,i ηi,l
N N
⎧Σ | ⎛ ⎞ ⎛ ⎞ |κ/2
1
N
| k k |
≤ c8 |
E |QT ,i ηi,k − EQT ,i ηi,k ||
(N T ) κ/4 N N
i=1

Σ | ⎛ ⎞ ⎛ ⎞ |κ/2
N
| l l |
+ |
E |QT ,i ηi,l − EQT ,i ηi,l ||
N N
i=1
⎡ ⎛ ⎛ ⎞ ⎛ ⎞ ⎞2 ┐κ/4
Σ
N
k k
+ E QT ,i ηi,k − EQT ,i ηi,k
N N
i=1
⎡ ⎛ ⎛ ⎞ ⎛ ⎞ ⎞2 ┐κ/4 ⎫
Σ
N
l l
+ E QT ,i ηi,l − EQT ,i ηi,l
N N
i=1
⎧Σ | ⎛ ⎞ |κ/2 Σ | ⎛ ⎞ |κ/2
1
N
| | N
| |
≤ c9 E |QT ,i k ηi,k | + E |QT ,i l ηi,l |
(N T ) κ/4 | N | | N |
i=1 i=1
⎡ ⎛ ⎛ ⎞ ⎞2 ┐κ/4 ⎡ ⎛ ⎛ ⎞ ⎞2 ┐κ/4 ⎫
Σ
N
k Σ
N
l
+ E QT ,i ηi,k + E QT ,i ηi,l .
N N
i=1 i=1
7.1 Change in the Means of High-Dimensional Observations 375

Using the Cauchy–Schwarz inequality, as in the proof of Lemma (7.1.1), we get for
all .0 < γ ≤ κ/2

| ⎛ ⎞ |γ ⎛ | ⎛ ⎞|2γ ⎞1/2
| k | | k | | |2γ
.E |QT ,i ηi,k || ≤ E ||QT ,i | E |ηi,k |
| N N |
⎧⎛ ⎞1/2 ⎛ ⎞γ /2 ⎫ ⎛ ⎞1/2
−γ +1
≤ c10 T E|Ei,0 | 2γ
+ EEi,0
2
E|Ei,0 |2γ

≤ c11 E|Ei,0 |2γ .

Thus we have
⎧ ⎡ ┐κ/4 ⎫
1 ⎨1 Σ
N
1 Σ 4
N ⎬ 1
A2 (k, l) ≤ c12
. E|Ei,0 |κ + EEi,0 ≤ c13 .
T κ/4 ⎩N N ⎭ T κ/4
i=1 i=1

Next we introduce
⎧ N ┌ ⎛ ⎞ ⎛ ⎞
1 Σ l k
A3 (k, l) = E
.
2
QT ,i − QT ,i
2
N 1/2 T T
i=1
⎛ ⎛ ⎞ ⎛ ⎞⎞┐⎫κ/2
l l k k
− 1− − 1− ,
T T T T

1 ≤ k ≤ l ≤ T . Elementary algebra gives for all .1 ≤ γ ≤ κ/2


.

┌ ⎛ ⎞ ⎛ ⎞ ⎛ ⎛ ⎞ ⎛ ⎞⎞┐γ
l k l l k k
2
. QT ,i − QT ,i2
− 1− − 1−
T T T T T T
| ⎛ ⎞ ⎛ ⎞|γ ⎛ ⎞γ
| k l || l−k
≤ c14 ||Q2T ,i − Q2T ,i + c15 .
T T | T

The Cauchy–Schwarz inequality implies


| ⎛ ⎞ ⎛ ⎞|γ
| k l ||
E ||Q2T ,i − Q2T ,i
T |
.
T
⎧ | ⎛ ⎞ ⎛ ⎞|2γ | ⎛ ⎞ ⎛ ⎞|2γ ⎫1/2
| k l | | k l ||
≤ E ||QT ,i + QT ,i | E |QT ,i
| | − QT ,i
T T T T |
⎧⎛ | ⎛ ⎞|2γ | ⎛ ⎞|2γ ⎞
| k | | |
≤ c16 E ||QT ,i | + E |QT ,i l |
T | | T |
| ⎛ ⎞ ⎛ ⎞|2γ ⎫1/2
| k l ||
× E ||QT ,i − QT ,i .
T T |
376 7 High-Dimensional and Panel Data

Another application of Rosenthal’s inequality yields for all .1 ≤ γ ≤ κ/2


| ⎛ ⎞|2γ ⎛ ⎛ ⎞γ ⎞
| l ||
|
.E QT ,i ≤ c T −γ
T E|E | 2γ
+ T EE 2
≤ c18 E|Ei,0 |2γ ,
| T |
17 i,0 i,0

and
| ⎛ ⎞ ⎛ ⎞|2γ
| l k ||
E ||QT ,i − QT ,i
T |
.
T
⎧ | |2γ | |2γ ⎫
⎨ || Σ
⎪ | ⎛ ⎞2γ |Σ | ⎪
l
| l−k | T
| ⎬
≤ c19 T −γ E || Ei,j || + E || Ei,j ||

⎩ |j =k+1 | T |j =1 | ⎪ ⎭
{ ⎛ ⎞γ
≤ c20 T −γ (l − k)E|Ei,0 |2γ + (l − k)γ EEi,02

⎛ ⎞ ⎫
l − k 2γ ⎡ ⎛ ⎞γ ┐
+ T E|Ei,0 | + T EEi,0
2γ 2
T
⎛ ⎞
l−k γ
≤ c21 E|Ei,0 |2γ .
T

Since .A3 (k, l) is a sum of independent random variables, we can use again
Rosenthal’s inequality and the upper bounds above with .γ = 2 and .γ = κ/2 to
conclude
⎛ ⎞κ/4
1 Σ
N
l−k
A3 (k, l) ≤ c22
. E|Ei,0 |κ .
T N
i=1

The upper bounds for .A1 (k, l), A2 (k, l) and .A3 (k, l) imply for all .1 ≤ k ≤ l ≤ T
| ⎛ ⎞ ⎛ ⎞|κ/2 ⎛ ⎞ ⎛ ⎞
| l l || l − k κ/4 1 l − k κ/4
|
.E V̄N,T − V̄N,T ≤ c23 + c24 κ ≤ c25 ,
| T T | T T T

which yields
| |κ/2
E |V̄N,T (x) − V̄N,T (y)| ≤ c26 |x − y|κ/4 .
.

Since .κ > 4, the lemma follows from Theorem 12.3 of Billingsley (1968) p. 95. ⨆

Proof of Theorem 7.1.1 The result follows from Lemmas 7.1.1 and 7.1.2, and
Prohorov’s Theorem. ⨆

In many applications T is comparable or smaller than N, and so it of interest
to study alternatives to Assumption 7.1.3. It may be shown (see Horváth and
7.1 Change in the Means of High-Dimensional Observations 377

Hušková 2012 ) that if the conditions of Theorem 7.1.1 hold with the exception
of Assumption 7.1.3, then there exists a nonzero function .g(x) such that

N 1/2 D[0,1]
V̄N,T (x) −
. g(x) −→ ┌(x).
T

In other words, if .N/T 2 does not converge to 0, then a non random drift
term appears in the process .V̄N,T . As such in the absence of Assumption 7.1.3
using Theorem 7.1.1 to estimate critical values of functionals (e.g. the supremum
functional) of .V̄N,T will asymptotically always lead to a rejection of .H0 . We discuss
a modification of Theorem 7.1.1 in Sect. 7.2, where we need to only assume that
.min(N, T ) → ∞.

Since the parameters .σi2 in the definition of .V̄N,T are unknown, in practice we
replace them with estimators. These constants describe the long-run variances of the
innovations in each cross-section. It is natural to use kernel-bandwidth estimators
2
.σ̂
T ,i as discussed in Chap. 3 for this purpose. We discussed the consistency of the
long-run estimators in Sect. 3.1. The rate of convergence of .σ̂T2,i to .σi2 puts further
restriction on the asymptotic rates between N and T under which Theorem 7.1.1
may be established. For details we refer to Horváth and Hušková (2012).
The asymptotic behaviour of .V̄N,T under .HA may be established using the results
in Sect. 2.1 for each cross-section. We require the following conditions on the
location of the change point as well as the magnitude of the changes:

t0 t0
0 < lim inf
. ≤ lim sup < 1 (7.1.16)
T T
and

T Σ 2
N
. δi → ∞. (7.1.17)
N 1/2
i=1

Under these conditions


P
. sup |V̄N,T (x)| → ∞.
0<x<1

Condition (7.1.17) states what is required in terms of the magnitudes of the changes
in the means in order for the supremum functional of .V̄N,T to diverge. Notice that
it does not require that changes occur in each cross-section, although if a constant
change occurs on a positive fraction of the cross-sections, then (7.1.17) reduces to
.min{N, T } → ∞.
378 7 High-Dimensional and Panel Data

7.2 Panel Models with Common Factors

Though it is illuminating to consider the case that the cross-sectional series


are independent, in practice this assumption is often unrealistic. A simple and
commonly used approach to model cross-sectional dependence between a large
number of time series is to use common factor models. We shall see that the
asymptotic properties of .V̄N,T are heavily influenced by the presence of cross-
sectional dependence in the form of common factors.
In order to do so, consider the simple “single common-factor” model allowing
for AMOC in the mean is of the form

Xi,t = μi + δi 1{t ≥ t0 } + φi ζt + Ei,t 1 ≤ i ≤ N, 1 ≤ t ≤ T ,


.

where .ζt is the common factor and .φi is the factor loading in the ith cross-section.
We assume that the common factor series .{ζt , t ∈ Z} satisfies the functional central
limit theorem:
⎣T
Σ x⎦
1 D[0,1]
. ζt −→ W (x),
T 1/2
t=1

where .{W (x), x ≥ 0} is a Wiener process. Further we assume that .{ζt , t ∈


{1, ..., T }} and .{Ei,t , t ∈ {1, . . . , T }, i ∈ {1, . . . , N }} are independent, and we
write the factor loadings in the following form:

ξi
ζi =
. , 1 ≤ i ≤ N.
N ρi
In other words, we consider the case where the influence of the common fac-
tors is decreasing with respect to the number of cross-sections considered. If
.sup1≤i<∞ |ξi | < ∞ and .ρi > 1/4 for all .i ∈ {1, . . . , N }, then the dependence

between the cross-sections is so small that Theorem 7.1.1 remains true. If instead
.ρi = 1/4 for all .i ∈ {1, . . . , N}, then

D[0,1]
V̄N,T (x) −→ ┌(x) + ξ0 B(x),
.

where .{B(x), 0 ≤ x ≤ 1} is a Brownian bridge, independent of .{┌(x), 0 ≤ x ≤ 1}


and

1 Σ ξi2
N
ξ0 = lim
. .
N →∞ N σ2
i=1 i
7.2 Panel Models with Common Factors 379

If .|ξi | is bounded below by a positive number and .0 ≤ ρi < 1/4, then

P
. sup |V̄N,T (x)| → ∞.
0<x<1

As such, if the loadings are “large”, then the supremum functional of .V̄N,T will
diverge even if there is no change in the cross-sectional means.
In order to further discuss the affect of common factors in change point detection
and alternative change point test statistics for this scenario in more generality, we
now turn the traditional p-factor model with AMOC in the mean

yi,t = μi + δi 1{t > t0 } + λT


. i ft + ei,t , i ∈ {1, . . . , N }, t ∈ {1, . . . , T }. (7.2.1)

The time of the changes in the means is .t0 , and the mean of the ith cross-section is
μi which changes to .μi + δi at time .t0 . The cross-sectional dependence is modeled
.

by .ft , t ∈ {1, ..., T }, .ft ∈ Rp and .λi ∈ Rp are the corresponding loadings. We allow
linear as well as non-linear time series .ei,t as unobservable errors. The CUSUM
processes computed from each cross-section are denoted

⎣T
Σ 1 Σ
u⎦ T
Si (u) =
. (yi,t − ȳi,T ), ȳi,T = yi,t ,
T
t=1 t=1

Si (u) = 0, 0 ≤ u < 1/T . The following two observations based on the analysis of
.

V̄N,T to this point suggest an alternative process to consider that overcomes some
.

of the challenges encountered. Although the normalization with .σi2 in the definition
of .V̄N,T in Sect. 7.1 makes it possible that we can centralize the process with the
pivotal function .⎣T u⎦(T − ⎣T u⎦)/T 2 , estimating each .σi2 .i ∈ {1, . . . , N } presents
a challenge. Moreover, the absence of Assumption 7.1.3 and in the presence of
moderately strong common-factors, the process .V̄N,T contains a drift term. This
suggests using a random centralization in computing .V̄N,T ,

N ⎛
Σ ⎞
⎣T u⎦(T − ⎣T u⎦) 2
VN,T (u) =
. Si2 (u) − Si (τ ) , 0 ≤ u ≤ 1,
⎣T τ ⎦(T − ⎣T τ ⎦)
i=1

where .τ is a constant satisfying

Assumption 7.2.1 .0 < τ < 1.


This process is similar to .V̄N,T of Sect. 7.1, but in the present case the centering
process is replaced by functionals of the cross-sectional CUSUM processes them-
selves. The aim of doing this is to adapt the centering to the unknown cross-sectional
dependence structure. This idea is similar in spirit to techniques in “self-normalized”
or “ratio-type” statistics introduced in Sect. 3.1.3. The parameter .τ acts as a tuning
parameter that is used to properly center the process.
380 7 High-Dimensional and Panel Data

In order to derive the asymptotic properties of the process .VN,T , we make


use of the following basic assumptions. First, we assume that the cross-sectional
dependence is fully explained by the common factors, so that the idiosyncratic errors
satisfy:

Assumption 7.2.2 .{ei,t , t ∈ Z} are independent sequences, .i ∈ {1, . . . , N }.


and

Assumption 7.2.3 for each i, .{ei,t , t ∈ Z} is a stationary sequence with .Eei,t = 0,


Eei,t ei,s = 0, s /= t, i ∈ {1, . . . , N }, and there are .c0 and .κ > 4 such that
.

| l |
|Σ |κ
| |
.E | ei,t | ≤ c0 lκ/2 .
| |
t=1

We note that Assumption 7.2.3 is satisfied by .Lν -decomposable processes under


mild conditions.
The Prohorov–Lévy distance between measures and the corresponding processes
is denoted by .ρP L . For the definition and basic properties of the Prohorov–Lévy
distance we refer to Billingsley (1968) (p. 238). Basically, weak convergence in
.D[0, 1] is equivalent with the convergence of the corresponding measures using the

Prohorov–Lévy distance. Let

⎣T
Σ u⎦
si (u) =
. ei,t .
t=1

Assumption 7.2.4
⎛ ⎞
. lim max ρP L T −1/2 si (u), σi W (u) = 0,
min(N,T )→0 1≤i≤N

where .{W (u), 0 ≤ u ≤ 1} is a Wiener process, .σi2 = Eei,0


2 and

. lim inf σi > 0.


i→∞

There is a large literature on the weak convergence of partial sums of stationary


processes that is reviewed in the appendix. Assumption 7.2.4 requires that the
convergence of the measures generated by .si (·) is uniform in the Prohorov–Lévy
distance across the cross-sections. Our assumptions on the coefficients of the linear
time series in Chap. 6.2 imply that the convergence of the distributions of the partial
sum processes to a Wiener measure is uniform. If .Lν -decomposable processes are
considered, this condition follows so long as the .Lν -decomposability coefficients
in Definition 1.1.1 are uniformly bounded by a summable sequence. In case of
7.2 Panel Models with Common Factors 381

many parametric models, e.g. ARMA and GARCH, this may be formulated as a
compactness condition on the parameter spaces for the processes. Next we define

1 Σ 4
N
. σ̄ 4 = lim σi , (7.2.2)
N→∞ N
i=1

and the covariance function

g(u, v) = 2u2 (1 − v)2 ,


. 0 ≤ u ≤ v ≤ 1, (7.2.3)

which already appeared in Theorem 7.1.1. It follows from Assumption 7.2.3 that
σ̄ 4 < ∞ and Assumption 7.2.4 implies that .σ̄ > 0. We also require the mild
.

assumption that the factor loadings are essentially bounded as .N → ∞:

Assumption 7.2.5 .λi = λi,N and

. lim sup max ||λi || < ∞.


N →∞ 1≤i≤N

Let
⎛N ⎞−1
Σ Σ
N
Q = lim
. ||λi ||2 λi λT
i
N →∞
i=1 i=1

and

1 Σ
N
c∗ = lim
. ||λi ||2 ∈ [0, ∞]. (7.2.4)
N →∞ N 1/2
i=1

We assume that the above limits are well defined. The common factors can be
arbitrary, perhaps serially correlated, stationary vector valued processes, but we
assume that they too satisfy the functional central limit theorem. Let .I = Ip×p
be the .p × p identity matrix.

Assumption 7.2.6 .Eft = 0, Eft fT


t = I and

⎣T
Σ u⎦
D[0,1]p
T −1/2
. ft −→ WE (u),
t=1

where .{WE (u), 0 ≤ u ≤ 1} is a Gaussian process with .EWE (u) = 0,


EWE (u)WT
.
E (v) = min(u, v)E and .E is a positive-definite matrix.
We note that Assumption 7.2.6 can be expressed that as the Prohorov–Lévy
distance between the measures generated by the partial sum process and the limiting
382 7 High-Dimensional and Panel Data

Gaussian process. The matrix .E is the long-run covariance matrix of the partial sum
of the .ft ’s.
The following three theorems show that the asymptotic behavior of the process
.VN,T depends crucially on the constant .c∗ in (7.2.4). Three different behaviors are

observed depending on if .c∗ = 0, c∗ ∈ (0, ∞), or .c∗ = ∞.


Theorem 7.2.1 If .H0 of (7.1.2), Assumptions 7.2.1–7.2.6 hold and .c∗ = 0, then, as
min(N, T ) → ∞,
.

1 D[0,1]
. VN,T (u) −→ ┌(u),
NT 1/2

where .{┌(u), 0 ≤ u ≤ 1} is a Gaussian process with .E┌(u) = 0 and



u(1 − u) v(1 − v)
.E┌(u)┌(v) = σ̄ g(u, v) − g(τ, v) −
4
g(τ, u)
τ (1 − τ ) τ (1 − τ )

u(1 − u) v(1 − v)
+ g(τ, τ ) ,
τ (1 − τ ) τ (1 − τ )

where .g(u, v) is defined in (7.2.3).


To state the next theorem we define

BE (u) = WE (u) − uWE (1)


. (7.2.5)

to be the generalized p-dimensional Brownian bridge with covariance matrix .E.


Theorem 7.2.2 If .H0 of (7.1.2), Assumptions 7.2.1–7.2.6 hold, and .c∗ = ∞, then,
as .min(N, T ) → ∞,
⎛N ⎞−1
1 Σ D[0,1]
⎛ ⎛
. ||λi ||
2
VN,T (u) −→ trace Q BE (u)BT
E (u)
T
i=1
⎞⎞
u(1 − u)
− BE (τ )BT
E (τ ) .
τ (1 − τ )

The difference between Theorems 7.2.1 and 7.2.2 arises because of the different
“strengths” of the common factors. In Theorem 7.2.1 the effect of the common
factor is asymptotically negligible, and hence the limiting process is Gaussian. In
7.2 Panel Models with Common Factors 383

Theorem 7.2.2 the common factor dominates and the limit is a quadratic form of an
Rp valued Gaussian process. Using the .vec operator (see Abadir and Magnus 2005,
.

p. 283), we get that


⎛ ⎛ ⎞⎞
u(1 − u)
trace Q BE (u)BT
. E (u) − B E (τ )BT
E (τ )
τ (1 − τ )
⎛ ⎞
T u(1 − u)
= (vec(Q)) vec (BE (u)) − vec (BE (τ )) .
τ (1 − τ )

Thirdly, we consider the case when the errors and the common factors are of the
same order. Since in this case both the error and common factor processes affect the
limit, we have to specify their joint behavior:

Assumption 7.2.7 .{ei,t , i ∈ {1, ..., N}, t ∈ {1, ..., T } and .{ft , t ∈ {1, ..., T }} are
independent.
Theorem 7.2.3 If .H0 of (7.1.2), Assumptions 7.2.1–7.2.6 hold and .0 < c∗ < ∞,
then, as .min(N, T ) → ∞,

1 D[0,1]
⎛ ⎛
. VN,T (u) −→ ┌(u) + c ∗ trace Q BE (u)BT E (u)
T N 1/2
⎞⎞
u(1 − u)
− BE (τ )BT
E (τ ) ,
τ (1 − τ )

where the processes .{┌(u), 0 ≤ u ≤ 1} and .{BE (u), 0 ≤ u ≤ 1} are independent


and they are defined in Theorems 7.2.1 and 7.2.2.
Comparing the results in Theorems 7.2.1–7.2.3 to the results in Sect. 7.1, we see
that here we only require .min(N, T ) → ∞, allowing that the number of the cross-
sections is much higher than the length of the observations in each cross-section.
The normalizations of .VN,T are different in each of Theorems 7.2.1–7.2.3,
which makes it difficult to apply the asymptotic results without knowledge of the
“strength” of the common factors. It is hence desirable to construct a normalization
of the process .VN,T that does not assume foreknowledge of .c∗ . Let

1 Σ
T
.zi,t = (yi,t − ȳi,T ) −
2
(yi,s − ȳi,T )2 , 1 ≤ t ≤ T,1 ≤ i ≤ N
T
s=1

and define the empirical cross-covariances as

T −h
1 Σ
. γ̂i,j (h) = zi,s zj,s+h ,
T −h
s=1
384 7 High-Dimensional and Panel Data

. γ̂i,j (h) = γ̂i,j (−h), h < 0. We use the estimator

Σ
N Σ
N Σ
H ⎛ ⎞
h
wN,T =
. K γ̂i,j (h)
H
i=1 j =1 h=−H

where K is the kernel satisfying Assumption 3.1.5. Here we need stronger condi-
tions on H , the window (smoothing parameter) than in Assumption 3.1.4:

Assumption 7.2.8 .H = H (T ), H (T ) → ∞ and .H (T )/T 1/2 → 0 as .T → ∞.


We shall see that .wN,T is able to estimate the long-run variances of the squared
errors in each cross-section as well as the long-run covariance of the quadratic form
of the common factors. Let
⎛ ⎞
T
.Rt = vec ft ft − I .

It follows from Assumption 7.2.6 that .ERt = 0. Next we define



Σ
R=
. ER0 RT
h
h=−∞

and

Σ ( )
2
.si = cov ei,0 , ei,h .
h=−∞
⎛ ⎞
We need an assumption on the order of decay of .ER0 RT
h and .cov e
2 , e2 :
i,0 i,h

Assumption 7.2.9
(i) .{ft , t ∈ Z} is a stationary sequence with .Ef0 = 0, .E||f0 ||4 < ∞ and there are
.c1 and .α1 > 2 such that

|| ||
|| T || −α1
.E ||R0 Rh || ≤ c1 (1 + |h|)

(ii) .Eei,0 = 0, .Eei,0


4 < ∞ and there are .c and .α > 6 such that
2 2

⎛ ⎞
.
2
cov ei,0 2
, ei,h ≤ c2 (1 + |h|)−α2 .

Theorem 7.2.4 We assume that .H0 of (7.1.2), Assumptions 3.1.5 and 7.2.2–7.2.9
are satisfied.
7.2 Panel Models with Common Factors 385

(i) If .c∗ = 0, then

wN,T P Σ N
. → a0 , where a0 = lim s2i .
NT N →∞
i=1

(ii) If .0 < c∗ < ∞, then

wN,T P
. → a0 + qT Rq,
NT
where
⎛ ⎞−1
Σ
N Σ
N
q = lim
. ||λi ||
2
vec(λi λT
i ).
N →∞
i=1 i=1

(iii) If .c∗ = ∞, then

wN,T P T
.
⎛N ⎞2 → q Rq.
Σ
||λi ||2
i=1

Putting together Theorems 7.2.1–7.2.4 we obtain the following result:


Corollary 7.2.1 If .H0 of (7.1.2), Assumptions 3.1.5 and 7.2.1–7.2.9 hold, then

1 D[0,1]
. VN,T (u) −→ Δ(u),
(T wN,T )1/2

where
⎧ −1/2

⎪ a0 ┌(u), if c∗ = 0,


⎪ ⎛ ⎞−1/2 ┌ ⎛ ⎛

⎪ + T
+ Q BE (u)BT

⎪ a0 c∗ q Rq ┌(u) c∗ trace E (u)

⎪ ⎞⎞ ┐

⎨ u(1 − u)
.Δ(u) = − BE (τ )BT
E (τ ) , if 0 < c∗ < ∞,

⎪ τ (1 − τ )

⎪ ⎛ ⎞ −1/2 ⎛ ⎛

⎪ qT Rq trace Q BE (u)BT

⎪ E (u)

⎪ ⎞⎞

⎪ u(1 − u)
⎩− BE (τ )BT
E (τ ) , if c∗ = ∞
τ (1 − τ )

the processes .{┌(u), 0 ≤ u ≤ 1} and .{BE (u), 0 ≤ u ≤ 1} are defined in Theo-


rems 7.2.1 and 7.2.2.
386 7 High-Dimensional and Panel Data

In Corollary 7.2.1 we normalize .VN,T (u) with the same random sequence and
only the form on the limit distribution depends on .c∗ . This makes it possible to use
resampling to approximate the distribution of functionals of .Δ(u).
The normalizing sequence in Corollary 7.2.1 works well under .H0 but without
modification tends to overestimate under .HA , thereby reducing the power of the
tests. This is due to the fact that centering each series by .ȳi,T tends to overestimate
the variances of the idiosyncratic errors under .HA . We already discussed this issue
in Sect. 3.1. In order to mitigate this problem, when computing .wN,T we center
each cross-section by taking into account a potential change point in the mean. To
do this, in each cross section we estimate a potential change point using a standard
CUSUM estimator described in Sect. 2.1. We then calculate the normalization term
.w̄N,T after recentering each cross section taking into account this potential change

point. Hence we expect the behaviour of .w̄N,T to be similar under .H0 and .HA . We
assume that (7.1.16) holds. If .0 ≤ c∗ < ∞, then only the cross-sections dominate
or they have the same influence as the common factors. If .0 ≤ c∗ < ∞ and (7.1.16)
is satisfied, then
1 P
. sup |V̄N,T | → ∞. (7.2.6)
(T w̄N,T )1/2 0≤u≤1

If .c∗ = ∞ and

Σ
N
T δi2
i=1
. → ∞,
Σ
N
||λi ||
2

i=1

then (7.2.6) remains true.


Next we provide detailed proofs of Theorems 7.2.1–7.2.3.
We assume that .H0 holds and in this case the cross-sectional CUSUM processes
do not depend on .μi so we can assume that .μi = 0. The proofs of the results in this
section are based on the decomposition
⎣T
Σ ⎣T u⎦ T Σ
u⎦ T
⎣T u⎦
.Si (u) = si (u) − si (1) + λT
i f t − λ i ft ,
T T
t=1 t=1

which implies
⎛ ⎞2 ⎛ ⎞
⎣T u⎦ ⎣T u⎦
2
.Si (u) = si (u) − si (1) + 2 si (u) − si (1)
T T
⎛ ⎞
⎣T
Σ u⎦ ΣT
⎣T u⎦
× ⎝ λT i ft − λT
i ft ⎠
T
t=1 t=1
⎛ ⎞2
⎣T
Σ ⎣T u⎦ T Σ ⎠
u⎦ T
+ ⎝λT
i ft − λi ft .
T
t=1 t=1
7.2 Panel Models with Common Factors 387

Let

Σ
N
ZN,T (u) =
. ξi (u)
i=1

with
⎛ ⎞2
⎣T u⎦ ⎣T u⎦(T − ⎣T u⎦)
ξi (u) = si (u) −
. si (1) − σi2 .
T T

Lemma 7.2.1 If .H0 of (7.1.2), Assumptions 7.2.2–7.2.4 hold, then

1 D[0,1]
. ZN,T (u) −→ ┌ 0 (u),
N 1/2 T

where .{┌ 0 (u), 0 ≤ u ≤ 1} is a Gaussian process with .E┌ 0 (u) = 0,


.E┌ (u)┌ (v) = σ̄ g(u, v), where .σ̄ , .g(u, v) are defined in (7.2.2) and (7.2.3),
0 0 4 4

respectively.
Proof Using Assumptions 7.2.2–7.2.4 we get that the processes .ξi (u), i ∈
{1, ..., N } are independent and .Eξi (u) = 0. We start with the proof of tightness.
Applying Rosenthal’s inequality (see Petrov 1995, p. 59) we conclude
|N |κ/2 ⎧N
|Σ | Σ
| |
.E | (ξi (u) − ξi (v))| ≤ c1 E|ξi (u) − ξi (v)|κ/2
| |
i=1 i=1
⎛ ⎞κ/4 ⎫
Σ
N ⎬
+ E(ξi (u) − ξi (v))2 . (7.2.7)

i=1

For .0 ≤ v ≤ u ≤ 1 we write
⎛ ⎞
⎣T u⎦(T − ⎣T u⎦) ⎣T v⎦(T − ⎣T u⎦)
ξi (u) − ξi (v) = si2 (u) − si2 (v) − σi2
. −
T T
⎛ ⎞ ⎛⎛ ⎞2 ⎛ ⎞ ⎞
⎣T u⎦ ⎣T v⎦ ⎣T u⎦ ⎣T v⎦ 2 2
−2 − si (1) + − si (1).
T T T T
388 7 High-Dimensional and Panel Data

By Assumption 7.2.2 we can apply Rosenthal’s inequality again for .0 ≤ v ≤ u ≤ 1

| |κ/2 ⎡ ┐
| |
E |si2 (u) − si2 (v)| ≤ E |si (u) − si (v)|κ/2 (|si (u)| + |si (v)|)κ/2
.

⎛ | |κ ⎞1/2
| ⎣T
Σ u⎦ |
|
κ/2 ⎝ |
|
≤2 E| ei,t || ⎠
|t=⎣T v⎦+1 |
⎛ | | | | ⎞
|⎣T u⎦ |κ |⎣T v⎦ |κ 1/2
|Σ | |Σ |
× ⎝E || ei,t || + E || ei,t || ⎠
| t=1 | | t=1 |

≤ c2 21+κ/2 T κ/4 (⎣T u⎦ − ⎣T v⎦)κ/4


≤ c3 21+κ/2 T κ/2 (u − v)κ/4 .

Similar arguments yield


|⎛ ⎞ |κ/2
| ⎣T u⎦ ⎣T v⎦ |
.E | s (u) − s (v) s (1) | ≤ c4 T κ/2 (u − v)κ/4 ,
| T
i
T
i i |

|⎡⎛ ┐ |κ/2
| ⎣T u⎦ ⎞2 ⎛ ⎣T v⎦ ⎞2 |
| |
.E | − si (1)| ≤ c5 T κ/2 (u − v)κ/4 ,
2
| T T |

and therefore

E|ξi (u) − ξi (v)|κ/2 ≤ c6 T κ/2 |u − v|κ/4


. (7.2.8)

and similarly to (7.2.8)

E(ξi (u) − ξi (v))2 ≤ c7 T 2 |u − v|.


. (7.2.9)

Putting together (7.2.7)–(7.2.9) we conclude


|N |κ/2
|Σ | { }
| |
.E | (ξi (u) − ξi (v))| ≤ c8 NT κ/2 + N κ/4 T κ/2 |u − v|κ/4 .
| |
i=1

Tightness of the process .ZN,T in .D[0, 1] now follows from Billingsley (1968) (p.
95).
Next we show the convergence of the finite dimensional distributions. Let .M ≥ 1
be an integer, .0 ≤ u1 < . . . < uM ≤ uM and .α1 , α2 , . . . , αM be constants. We
define
Σ
M
ηi =
. αl ξi (ul ).
l=1
7.2 Panel Models with Common Factors 389

It follows from the proof of (7.2.9) that

. E|ηi |κ/2 ≤ c9 T κ/2 . (7.2.10)

Using Assumption 7.2.4 there are independent Brownian bridges .{Bi,T , 0 ≤ t ≤ 1}


such that for all .δ > 0 there is an .T0 = T0 (δ) such that
⎧ | ⎫
|1 ⎛ ⎞||
P
. | 2 2 |
sup | ξi (u) − σi Bi,T (u) − u(1 − u) | > δ < δ,
0≤u≤1 T

for all .T ≥ T0 . Let .2 ≤ κ̄ < κ/2. Thus we get


| ⎛ ⎞||κ̄
|1
| |
| ξi (u) − σi Bi,T (u) − u(1 − u) |
2 2
.E sup (7.2.11)
0≤u≤1 T
⎡ | |κ̄ ┐
| 1 |
≤ δ κ̄ + E sup || ξi (u)|| Ji,T
T
0≤u≤1
⎡ ┐
| ⎛ ⎞|κ̄
| 2 |
+E sup |σi Bi,T (u) − u(1 − u) | Ji,T ,
2
0≤u≤1

where
⎧ | ⎫
|1 ⎛ ⎞||
Ji,T
. = 1 sup || ξi (u) − σi2 Bi,T
2
(u) − u(1 − u) || > δ .
0≤u≤1 T

By the Hölder inequality,


⎡ | |κ̄ ┐ ⎛ | |κ/2 ⎞2κ̄/κ
|1 | |1 | ( )(κ−2κ̄)/κ
.E sup || ξi (u)|| Ji,T ≤ E sup || ξi (u)|| E{Ji,T }
0≤u≤1 T 0≤u≤1 T
⎛ | |κ/2 ⎞2κ̄/κ
|1 |
≤δ (κ−2κ̄)/κ
E sup | ξi (u)||
| (7.2.12)
0≤u≤1 T
390 7 High-Dimensional and Panel Data

and
⎡ ┐
| ⎛ ⎞|κ̄
| |
E
. sup |σi2 Bi,T
2
(u) − u(1 − u) | Ji,T (7.2.13)
0≤u≤1
⎛ ⎞2κ̄/κ
| ⎛ ⎞|κ/2 ( )(κ−2κ̄)/κ
| |
≤ E sup |σi2 Bi,T
2
(u) − u(1 − u) | E{Ji,T }
0≤u≤1
⎛ ⎞2κ̄/κ
| ⎛ ⎞|κ/2
| |
≤δ (κ−2κ̄)/κ
E sup |σi2 Bi,T
2
(u) − u(1 − u) |
0≤u≤1
⎛ ⎞2κ̄/κ
|⎛ ⎞|κ/2
| 2 |
≤ δ (κ−2κ̄)/κ σi2κ̄ E sup | Bi,T (u) − u(1 − u) | .
0≤u≤1

Using the distribution of the Brownian bridge one can easily verify that
|⎛ ⎞|κ/2
| 2 |
E sup | Bi,T
. (u) − u(1 − u) | ≤ c10 . (7.2.14)
0≤u≤1

We note
| | ⎛ ⎞2
|1 |
| | −1/2
| T ξi (u)| ≤ σi + 4 sup T
2
. sup si (u) . (7.2.15)
0≤u≤1 0≤u≤1

According to our previous arguments, Assumption 7.2.3 implies


| l |κ
| Σ | ⎛ ⎞
| |
.E | ei,t | ≤ c11 (l − k)E|ei,0 |κ + (l − k)κ/2 (Eei,0
2 1/2
)
| |
t=k+1

and therefore by the maximal inequality of Móricz et al. (1982)


| |κ ⎛ ⎞
| |
E sup |T −1/2 si (u)| ≤ c12 E|ei,0 |κ + (Eei,0
.
2 1/2
) . (7.2.16)
0≤u≤1

Putting together (7.2.11)–(7.2.16) we conclude

| ⎛ ⎞
|1 ⎛ ⎞|| 2
. lim max E sup || ξi (u) − σi Bi,T (u) − u(1 − u) ||
2 2
min(N,T )→∞ 1≤i≤N 0≤u≤1 T

=0 (7.2.17)
7.2 Panel Models with Common Factors 391

and
| ⎛ ⎞
|1 ⎛ ⎞|| κ̄
. lim |
max E sup | ξi (u) − σi Bi,T (u) − u(1 − u) |
2 2 | (7.2.18)
min(N,T )→∞ 1≤i≤N 0≤u≤1 T

= 0.

The approximations in (7.2.17) and (7.2.18) yield

Σ
N
1 Σ
N
. Eη 2
= E η̄i2 + o(N) (7.2.19)
T2 i
i=1 i=1

and

Σ
N
1 Σ
N
. E|ηi | =
κ̄
E|η̄i |κ̄ + o(N), (7.2.20)
T2
i=1 i=1

where

Σ
N
η̄i = σi2
. αl (B 2 (ul ) − ul (1 − ul ))
l=1

and .{B(u), 0 ≤ u ≤ 1} is a Brownian bridge. It follows from (7.2.19) and (7.2.20)


that
⎛N ⎞1/κ̄
Σ
E|ηi |κ̄ ⎛ ⎞1/κ̄
T κ̄ N
i=1
.
⎛N ⎞1/2 ≤ c13 ( )1/2 → 0, as min(N, T ) → ∞.
Σ T 2N
Eηi2
i=1

Hence Lyapunov’s theorem gives that the finite dimensional distributions of .ZN,T
converge to a multivariate normal distribution. Our arguments also show that
.E┌ (u) = 0 and
0

⎡ ┐ Σ
N
. E┌ 0 (u)┌ 0 (v) = E (B 2 (u) − u(1 − u))(B 2 (v) − v(1 − v)) lim σi4 .
N →∞
i=1

Now (7.1.14) finishes the proof of the lemma. ⨆



392 7 High-Dimensional and Panel Data

Lemma 7.2.2 If .H0 of (7.1.2), and Assumptions 7.2.2–7.2.4 hold, then

⎛N ⎞−1 ⎧ ⎛ ⎞⎫2
Σ N ⎨
Σ ⎣T
Σ u⎦ Σ
T ⎬
1 ⎝ ⎣T u⎦ ⎠
. ||λi ||2 λT f t − f t
T ⎩ i T ⎭
i=1 i=1 t−1 t−1

D[0,1]
⎛ ⎞
−→ trace QBE (u)BT
E (u) ,

where .{BE (u), 0 ≤ u ≤ 1} is defined in Theorem 7.2.2.


Proof It follows from elementary calculation that
⎧ ⎛ ⎞⎫2
⎨ ⎣T
Σ u⎦
⎣T u⎦ Σ
T ⎬
λT ⎝
i . ft − ft ⎠
⎩ T ⎭
t−1 t−1
⎛⎧ ⎛ ⎞⎫2 ⎞
⎨ ⎣T
Σ u⎦
⎣T u⎦ Σ ⎠⎬ ⎟
T
⎜ ⎝
= trace ⎝ λT f t − ft ⎠
⎩ i T ⎭
t=1 t−1
⎛ ⎞
⎛ ⎣T
Σ u⎦ Σ
T ⎞⎛ ⎣T
Σ u⎦ Σ
T ⎞T
⎣T u⎦ ⎣T u⎦
= trace ⎝λT
i ft − ft ft − f t λi ⎠
T T
i=1 i=1 i=1 i=1
⎛ ⎞
⎛ ⎣T
Σ ⎞⎛ ⎣T ⎞T
⎣T u⎦ Σ Σ ⎣T u⎦ Σ
u⎦ T u⎦ T
= trace ⎝λi λT
i ft − ft ft − ft ⎠.
T T
i=1 i=1 i=1 i=1

The result now follows from Assumption 7.2.6 and the definition of .Q. ⨆

Lemma 7.2.3 If .H0 of (7.1.2), and Assumptions 7.2.2–7.2.4 hold, then
| ⎛ ⎞| ⎛ ⎛ ⎞1/2 ⎞
|N ⎛ ⎞ ⎣T |
|Σ ⎣T u⎦ Σ u⎦
| ΣN
. sup | si (u) − si (1) ⎝λT ft⎠|| = OP ⎝T ||λi ||2 + T⎠.
| i
0≤u≤1 | i=1 T
t=1 | i=1

Proof Let

Σ
N ⎛ ⎞
⎣T u⎦
.ẐN,T (u) = λi si (u) − si (1) , 0 ≤ u ≤ 1.
T
i=1

Using Assumption 7.2.6 we get


| ⎛ ⎞|
|N ⎛ ⎞ ⎣T |
|Σ ⎣T u⎦ Σ u⎦
|
. sup | si (u) − si (1) ⎝ T
λi ft || = OP (T 1/2 ) sup ||ẐN,T (u)||,

|
0≤u≤1 | i=1 T
t=1 | 0≤u≤1
7.2 Panel Models with Common Factors 393

so we need to show only


⎛ ⎛N ⎞1/2 ⎞
Σ
. sup ||ẐN,T (u)|| = OP ⎝T 1/2 ||λi ||2 + T 1/2 ⎠ . (7.2.21)
0≤u≤1 i=1

Assumptions 7.2.3 and 7.2.6 yield that for all .0 ≤ u ≤ 1


|| N ||
||Σ || Σ
N
T || T ||
.||E ẐN,T (u)ẐN,T (u)|| ≤ c1 T || σi λi λi || ≤ c2 T ||λi ||2 ,
2
|| ||
i=1 i=1

so (7.2.21) holds if we can establish the tightness of .{ẐN,T (u)/(T 1/2 aN ), 0 ≤ u ≤


1}, where

Σ
N
2
.aN = ||λi ||2 + 1.
i=1

Let .λi,k be the kth coordinate of .λi . Following the proof of Lemma 7.2.1 one can
prove that
|N ┌ ⎛ ⎞ ⎛ ⎞┐||κ
|Σ ⎣T u⎦ ⎣T v⎦
| |
.E | λi,k si (u) − si (1) − si (v) − si (1) |
| T T |
i=1
⎧ ⎛ ⎞κ/2 ⎫
⎨ 1 Σ N
1 Σ
N ⎬
≤ c3 |λ i,k | κ
+ λ 2
|u − v|κ/2
⎩ aN
κ 2
aN
i,k ⎭
i=1 i=1

≤ c4 |u − v|κ/2 ,

since an application of Assumption 7.2.5 gives

1 Σ
N
.
κ |λi,k |κ ≤ c5 .
aN
i=1

The tightness now follows from Billingsley (1968) p. 95. ⨆



Proof of Theorem 7.2.1 It follows from Lemmas 7.2.1–7.2.2 and the assumption
c∗ = 0 that
.

|N ⎛ ⎞||

| 2 ⎣T u⎦(T − ⎣T u⎦) |
. sup | Si (u) − σi
2
− ZN,T (u) | = oP (N 1/2 T ).
0≤u≤1 | T |
i=1
394 7 High-Dimensional and Panel Data

So using again Lemma 7.2.1, the proof can be completed via the continuous
mapping theorem. ⨆

Proof of Theorem 7.2.2 It follows from Lemmas 7.2.1–7.2.1 and the assumption
c∗ = ∞ that
.

| ⎧⎛ Σ ⎣T
Σ ⎞2
⎣T u⎦ Σ T Σ
u⎦
| N N T
|
. sup VN,T (u) −
T
λi ft − λi ft
| T
0≤u≤1 i=1 t=1 i=1 t=1
⎛Σ ⎣T
Σ τ⎦ ⎞2 ⎫|
⎣T τ ⎦ Σ T Σ |
N N T
⎣T u⎦(T − ⎣T u⎦) |
− λT
i ft − λi ft |
⎣T τ ⎦(T − ⎣T τ ⎦) T
i=1 t=1 i=1 t=1
⎛ ⎞
Σ
N
= oP T ||λi ||2 ,
i=1

so the result follows from Lemma 7.2.2. ⨆



Proof of Theorem 7.2.3 It is a combination of the proofs of Theorems 7.2.1
and 7.2.1. ⨆

Proof of Theorem 7.2.4 For a proof of this result see Horváth et al. (2022). ⨆

7.3 High-Dimensional Linear Regression

We may also consider generalizing the methods in Chap. 4 to the problem of


detecting changes linear regression parameters in the high-dimensional setting. In
this case we suppose that he have observed data .(yi,t , xi,t ), .i ∈ {1, . . . , N } and
.t ∈ {1, . . . , T }, where we think of .yi,t as denoting the i’th cross-section of a

dependent variable or response observed at time t, and .xi,t ∈ Rd is a corresponding


d dimensional vector of covariates or explanatory variables. We replace (7.2.1) with
the regression model

yi,t = xT
.
T
i,t (β i + δ i 1{t > t0 }) + λi ft + ei,t ,

i ∈ {1, . . . , N }, t ∈ {1, . . . , T }. (7.3.1)

The regression parameter of the i’th cross-section is .β i ∈ Rd , which changes to


.β i + δ i at an unknown time .t0 . The cross-sectional dependence is modeled again by

the common factors .ft ; .t ∈ {1, ..., T } .ft ∈ Rp and .λi ∈ Rp are the corresponding
loadings. Under the null hypothesis

H0 : t0 > T ,
. (7.3.2)
7.3 High-Dimensional Linear Regression 395

so in this case (7.3.1) reduces to

.yi,t = xT T
i,t β i + λi ft + ei,t , i ∈ {1, . . . , N }, t ∈ {1, . . . , T }. (7.3.3)

Following the methods presented in Sects. 4.1.1 and 4.1.3, we define the CUSUM
processes of the residuals from each cross-section

⎣T
Σ u⎦
Si,T (u) =
. Êi,t ,
t=1

where the residuals .Êi,t are defined by

.Êi,t = yi,t − xT
i,t β̂ i,T , i ∈ {1, . . . , N }, t ∈ {1, . . . , T },

with .β̂ i,T denoting the least squares estimator for .β i of the i’th cross-section. We
suggest using functionals of the .l2 -aggregated cross-sectional CUSUM processes
⎛ 2 ⎞
1 Σ
N
Si,T (u) ⎣T u⎦(T − ⎣T u⎦)
V̄N,T (u) =
. − , (7.3.4)
N 1/2 T σi2 T2
i=1

where .σi2 are normalizing constants, or

N ⎛
Σ ⎞
⎣T u⎦(T − ⎣T u⎦) 2
VN,T (u) =
.
2
Si,T (u) − Si,T (τ )
⎣T τ ⎦(T − ⎣T τ ⎦)
i=1

with some .0 < τ < 1. The asymptotic properties of .V̄N,T and .VN,T can be derived
along the lines of the proofs in Sects. 4.1.1 and 4.1.3 assuming that .min(N, T ) →
∞ under various conditions on the strength of the loadings .λi , i ∈ {1, . . . , N }, and
the relative divergence rates of T and N .

7.3.1 Fixed Length Panels

In some applications T , the length of the observed time series is much smaller
than N, the number of cross-sections. In these cases a more realistic asymptotic
framework is to consider T to be fixed while N tends to infinity. Following Antoch
et al. (2019), we use the sum of the squared residuals

Σ
N Σ
t
.ṼN (t) = 2
Êi,s , t ∈ {1, . . . , T }.
i=1 s=1
396 7 High-Dimensional and Panel Data

In contrast to the previous chapters, we assume here that the covariates .xi,t ’s are
deterministic. We now list some assumptions under which .ṼN has a Gaussian limit.
Assumption 7.3.1 .Eft = 0 and .E||ft ||2 < ∞, .t ∈ {1, . . . , T }.
Assumption 7.3.2
(i) The innovations .E i = (Ei,1 , Ei,2 , . . . , Ei,T )T , .i ∈ {1, . . . , N } are independent
(ii) .EEi,t = 0, EEi,t Ei,s = 0 and .c1 ≤ σi2 = EEi,t 2 ≤ c for all .i ∈ {1, . . . , N }, 1 ≤
2
s /= t ≤ T with some .0 < c1 < c2 < ∞.
(iii) There is a .κ > 4 such that

1 Σ
N
. lim sup |Ei,t |κ < ∞, for all 1 ≤ t ≤ T ,
N→∞ N
i=1

Assumption 7.3.3 .{ft , t ∈ {1, ..., T }} and .{E i , i ∈ {1, ..., N}} are independent,
Assumption 7.3.4 There is .c3 > 0 such that .||xi,t || ≤ c3 for all .i ∈ {1, . . . , N}, t ∈
{1, . . . , T },
and
Assumption 7.3.5
(i) there are .t1 and .c4 such that
⎛ ⎞−1 ||⎛ ⎞−1 ||
|| t1 ||
Σ
t1
|| Σ ||
xi,s xT exists and || x xT || ≤ c4
. i,s || i,s i,s ||
s=1 || s=1 ||

for all .i ∈ {1, ..., N}.


(iii) There are .t2 and .c5 such that
⎛ ⎞−1 ||⎛ ⎞−1 ||
|| ||
Σ
T || Σ T ||
⎝ ⎠ ||⎝ ⎠ ||
. xi,s xT exists and || xi,s xT || ≤ c5
i,s || i,s ||
s=T −t2 || s=T −t2 ||

for all .i ∈ {1, . . . , N }.


The common factors have negligible effect on the limit distribution of .ṼN if the
following assumption holds.
Assumption 7.3.6

1 Σ
N
. lim ||λi ||2 = 0.
N→∞ N 1/2
i=1
7.3 High-Dimensional Linear Regression 397

In this case, we let

Σ
t
Zi,t =
. xi,s xT
i,s , t ∈ {1, ..., T }, i ∈ {1, ..., N },
s=1

and

Σ
t
Si,t =
. xi,s Es , t ∈ {1, ..., T }, i ∈ {1, ..., N }.
s=1

Under Assumption 7.3.6, last assumption we make implies that the covariance
function of the standardised and normalised .ṼN (t) exists:

Assumption 7.3.7 the function


⎡ ⎤
Σ
N Σ t'
t Σ {⎛ ⎞⎛ ⎞}
'
.┌(t, t ) = lim ⎣ E Ei,s − xT −1
i,s Zi,T Si,T Ei,s ' − xT −1 ⎦
N →∞ i,s ' Zi,T Si,T
i=1 s=1 s ' =1

exists for all .t, t ' ∈ {1, ..., T }.


The asymptotic expected value of .ṼN (t) is

Σ
N
AN (t) =
.
2
ai,t
i=1

with
t ⎛
Σ ⎞ ⎛ ⎞
−1 −1
.
2
ai,t = σi2 1 − xT
i,s Zi,T xi,s = σi t − trace(Zi,t Zi,T ) .
2

s=1

Now we can state the limit distribution of .ṼN (t) when the common factors are
negligible.
Theorem 7.3.1 If .H0 and Assumptions 7.3.1–7.3.7 hold, then
{ ⎛ ⎞ } D { }
. N −1/2 ṼN (t) − AN (t) , 1 ≤ t ≤ T → ξ (1) (t), 1 ≤ t ≤ T ,

{ }
where . ξ (1) (t), 1 ≤ t ≤ T has a multivariate Gaussian distribution, with
.Eξ
(1) (t) = 0 and .Eξ (1) (t)ξ (1) (t ' ) = ┌(t, t ' ).

The limiting distribution in this case is a T dimensional Gaussian vector with


covariance described in Assumption 7.3.7.
398 7 High-Dimensional and Panel Data

In order to model the situation in which the factor loadings are not negligible, we
replace Assumption 7.3.6 with the following:
Assumption 7.3.8

1 Σ
N
. ||λi ||2 = O(1) with some rN /N 1/2 → ∞.
rN
i=1

In this case we modify Assumption 7.3.7 to:


Assumption 7.3.9 the limit

N ⎛ ⎞⎛ ⎞
1 Σ 1 T −1 1 T −1
.Q(s, v, z) = lim λi − λi xi,s Zi,T xi,z λi − λi xi,v Zi,T xi,z
N →∞ rN T T
i=1

exists for all .1 ≤ s, v ≤ z ≤ T , and there are .1 ≤ s0 , v0 ≤ z0 ≤ T such that at


least one element of .Q(s0 , v0 , z0 ) differs from zero.
Under these conditions we obtain an alternate asymptotic distribution of .ṼN (t).
Theorem 7.3.2 If .H0 and Assumptions 7.3.1–7.3.5, 7.3.8 and 7.3.9 hold, then
⎧ ⎫
1 ⎛ ⎞ D
{ }
. ṼN (t) − AN (t) , 1 ≤ t ≤ T → ξ (2) (t), 1 ≤ t ≤ T ,
rN

where

Σ
t
ξ
.
(2)
(t) = fT
s Q(s, v, t)fv .
s,v=1

Similarly to Sects. 7.1 and 7.2, the limiting distribution in Theorem 7.3.2 is
completely determined by the common factors and their loadings.
Due to the complex definitions of the covariances in Theorems 7.3.1 and 7.3.2,
the computation of the distributions of functionals of .ξ (1) (t) and .ξ (2) (t) is difficult
to do without resorting to resampling methods. Antoch et al. (2019) suggests the
wild bootstrap for this purpose. They also discuss the behaviour of .ṼN (t) under the
alternative of a change point.
Theorem 7.3.1 is a consequence of the following two lemmas. We use the
notation
−1 −1
Êi,t = wi,t + ri,t , where wi,t = Ei,t − xT
.
T
i,t Zi,T Si,T and ri,t = λi ft − xi,t Zi,T Ji,T ,
7.3 High-Dimensional Linear Regression 399

with

Σ
t
.Ji,t = xi,v λT
i fv .
v=1

Thus we have
2
Êi,t
. = wi,t
2
+ 2wi,t ri,t + ri,t
2
.

Lemma 7.3.1 If .H0 and Assumptions 7.3.1–7.3.7 hold, then


⎧ ⎛ t ⎞ ⎫
1 Σ
N Σ D
{ }
.
2
wi,s − ai,t
2
,1 ≤ t ≤ T → ξ (1) (t), 1 ≤ t ≤ T ,
N 1/2
i=1 s=1
{ }
where . ξ (1) (t), 1 ≤ t ≤ T is defined in Theorem 7.3.1.
Proof We see that
⎛ ⎞
−1
Ewi,s = σi2 1 − xT
. Z x
i,s i,T i,s

and Assumptions 7.3.2, 7.3.4 and 7.3.7 imply that


2
E|wi,s
. − Ewi,s | ≤ cE|Ei,s |κ
2 κ/2
(7.3.5)

with some c. Thus we get


| t |κ/2
Σ
N |Σ |
| 2 |
. E| wi,s − ai,t | = O(N ).
2
| |
i=1 s=1

We can assume that the matrix .┌(t, t ' ), 1 ≤ t, t ' ≤ T is nonsingular. If .┌(t, t ' ), 1 ≤
t, t ' ≤ T is singular, we may instead consider a subset of the coordinates of
.{ṼN (t), 1 ≤ t ≤ T } that (after centralization and norming) have a nonsingular

asymptotic covariance matrix. It follows then for any .(c̄1 , . . . , c̄T )T /= 0,


⎛ T ⎛ t ⎞⎞
1 Σ Σ Σ
N
. lim var c̄t wi,s − ai,t
2 2
> 0.
N →∞ N
i=1 t=1 s=1

Thus (7.3.5) yields


| T ⎛ t ⎞|κ/2
Σ
N |Σ Σ |
| 2 |
. E| c̄t wi,s − ai,t | = O(N).
2
| |
i=1 t=1 s=1
400 7 High-Dimensional and Panel Data

The lemma now follows from applications of Lyapunov’s theorem (see Petrov 1995,
p. 122) and the Cramér–Wold lemma (see Billingsley 1968). ⨆

Lemma 7.3.2 If .H0 and Assumptions 7.3.1–7.3.7 hold, then

Σ
N ⎛ ⎞
.
2
ri,t = oP N 1/2 ,
i=1

and

Σ
N ⎛ ⎞
. ri,t wi,t = oP N 1/2 .
i=1

Proof We note
−1
2
ri,t
. ≤ 2||λi ||2 ||ft ||2 + 2(xT 2
i,t Zi,T Ji,T ) .

By Assumptions 7.3.1 and 7.3.6 we have

Σ
N ⎛ ⎞
. ||λi ||2 ||ft ||2 = oP N 1/2 .
i=1

Also, by Assumptions 7.3.3 and 7.3.4 we get


−1
.(xT
i,t Zi,T Ji,T ) ≤ c1 ||Ji,T || ≤ c2 ||λi || ||ft || ,
2 2 2 2

which yields

Σ
N Σ
N ⎛ ⎞
−1
. (xT
i,t Zi,T Ji,T )
2
= O(1) max ||fs || 2
||λi ||2 = oP N 1/2 .
1≤s≤T
i=1 i=1

This completes the proof of the first part of the lemma. Towards establishing the
second part, we write
T −1 T −1 T −1 T −1
wi,t ri,t = Ei,t λT
.
T
i ft − Ei,t xi,t Zi,T Ji,T − xi,t Zi,T Si,T λi ft + xi,t Zi,T Si,T xi,t Zi,T Ji,T .

Assumptions 7.3.1–7.3.3 yield that .EEi,t λT


i ft = 0 and
⎛N ⎞
Σ Σ
N ⎛ ⎞
.var Ei,t λT
i ft = O(1) ||λi ||2 = O N 1/2 , (7.3.6)
i=1 i=1
7.3 High-Dimensional Linear Regression 401

−1
on account of Assumption 7.3.6. Similarly, .EEi,t xT
i,t Zi,T Ji,T = 0 and
⎛N ⎞
Σ ⎛ ⎞
−1
.var Ei,t xT
i,t Zi,T Ji,T = O N 1/2 . (7.3.7)
i=1

Repeating the arguments used in the derivations of (7.3.6) and (7.3.7) one can verify
|N |
|Σ | ⎛ ⎞
| T −1 T |
.| xi,t Zi,T Si,T λi ft | = O N 1/4
| |
i=1

and
|N |
|Σ | ⎛ ⎞
| T −1 T −1 |
.| xi,t Zi,T Si,T xi,t Zi,T Ji,T | = O N 1/4 ,
| |
i=1

completing the proof of the lemma. ⨆



Theorem 7.3.1 follows directly from Lemmas 7.3.1 and 7.3.2.
Proof of Theorem 7.3.2 Using the definition of .ri,t we have

Σ
T
−1
2
.ri,t = fT T
t λi λi f t −2 fv λi xT T
i,v Zi,T xi,t λi ft
v=1

Σ
T
−1 T −1
+ fs λi xT T
i,s Zi,T xi,t xi,t Zi,T xi,v λi fv .
s,v=1

Assumptions 7.3.8 and 7.3.9 yield


⎧ ⎫
1 Σ 2
N { }
. ri,t , 1 ≤ t ≤ T → ξ (2) (t), 1 ≤ t ≤ T , a.s.
rN
i=1

It follows from the proofs of Lemmas 7.3.1 and 7.3.2 that


|N |
|Σ | ⎛ ⎞
| |
.| wi,t − AN (t)| = O N 1/2
2
| |
i=1

and
|N |
|Σ |
| |
.| wi,t ri,t | = oP (rN ),
| |
i=1

completing the proof. ⨆



402 7 High-Dimensional and Panel Data

7.4 Changes in the Parameters of RCA Panel Data Models

We have seen in Sects. 5.3 and 5.4 that estimators for the parameters of RCA time
series models satisfy the central limit theorem in both stationary and explosive
settings. As such it is reasonably straightforward to construct change point detection
procedures for high-dimensional and panel data models in which the cross-sectional
series follow RCA specifications that allow for stationary and explosive cross-
sections, and changes between stationary and explosive regimes. We consider a
model where the cross-sections are RCA(1) sequences cross-correlated by the
presence of common factors:

yi,t = (βi + δi 1{t > t0 } + Ei,t,1 )yi,t−1 + λT


. i ft + Ei,t,2 ,

t ∈ {1, . . . , T }, i ∈ {1, . . . , N}.

In this section we are interested in detecting possible changes in the regression


coefficients .βi , .i ∈ {1, . . . , N }. We have seen in Sect. 5.3 that the variance of the
error terms .Ei,t,2 cannot be estimated in the explosive case. One can also consider
changes in the variances of .Ei,t,1 , although we do not pursue such a test here.
Our testing procedures are based on the residuals

Êi,t = yi,t − β̂i,T yi,t−1 , t ∈ {2, ..., T }, i ∈ {1, ..., N },


.

where .β̂i,T is the weighted least square estimator in the i’th cross-section
⎛ T ⎞⎛ T ⎞−1
Σ yi,t−1 yi,t Σ yi,t−1
2
β̂i,T
. = .
t=2
1 + yi,t−1
2
t=2
1 + yi,t−1
2

Let
⎛ ⎞
⎣T u⎦
1 ⎝Σ ⎣T u⎦ Σ
T
Êi,t Êi,t ⎠,
Si,T (u) =
. −
T 1/2
t=2
(1 + yi,t−1
2 )1/2 T
t=2
(1 + y 2
i,t−1 ) 1/2

0 ≤ u ≤ 1, i ∈ {1, ..., N}

denote the CUSUM process of the residuals of the i’th cross-section. As in Sect. 7.1
we define the sum of the squares of the CUSUM processes
⎛ 2 ⎞
1 Σ
N
Si,T (u) ⎣T u⎦(T − ⎣T u⎦)
. V̄N,T (u) = − ,
N 1/2 2
σ̂i,T T2
i=1
7.4 Changes in the Parameters of RCA Panel Data Models 403

2 are estimators for the variances of .Ê ’s, and as in Sect. 7.2 we define
where .σ̂i,T i,t

N ⎛
Σ ⎞
1 ⎣T u⎦(T − ⎣T u⎦) 2
.VN,T (u) = 2
Si,T (u) − Si,T (τ ) ,
N 1/2 ⎣T τ ⎦(T − ⎣T τ ⎦)
i=1

where .0 < τ < 1. We assume that the innovations for each cross-section are
independent and satisfy fourth order moment conditions as in Assumption 5.3.1
of Chap. 5.
Assumption 7.4.1
(i) .{Ei,,t,1 , t ∈ Z} and .{Ei,t,2 , t ∈ Z}, .i ∈ {1, ..., N}, are independent sequences,
(ii) .{Ei,t,1 , t ∈ Z} are independent and identically distributed random variables
with .EEi,t,1 = 0, .c1 ≤ EEi,t,1 2 = σi,12 ≤ c and .E|E |4 ≤ c for all .i ∈
2 i,1 3
{1, ..., N } with some .0 < c1 , c2 , c3 < ∞,
(iii) .{Ei,t,2 , t ∈ Z} are independent and identically distributed random variables
with .EEi,2 = 0, .c4 ≤ EEi,t,2 2 = σi,2 2 ≤ c and .E|E
5 i,t,2 | ≤ c6 for all .i ∈
4

{1, ..., N } with some .0 < c4 , c5 , c6 < ∞.


We also need that the common factors and the errors are independent:

Assumption 7.4.2 .{Ei,t,1 , Ei,t,2 , t ∈ Z, i ∈ {1, ..., N}} and .{ft , t ∈ {1, ..., T }} are
independent
We assume that the loadings and common factors satisfy the following:
Assumption 7.4.3
(i) .Eft = 0, .Eft fT
t = I and .E||ft || ≤ c8 , with some .ν > 4 and .c8 < ∞,
ν

(ii) .||λi || ≤ c9 ,
(iii)

1 Σ
N
. lim ||λi || = 0.
N →∞ N 1/2
i=1

Our final assumption states that the number of cross-sections asymptotically


grows slower then the square root of the number of the observations in each cross-
section:
Assumption 7.4.4

N
. → 0.
T 1/2
We consider two subsets of the for the cross-sectional series: stationary .A and
explosive .B cross-sections. Whether a cross-section is in either subset is determined
by .E log |βi + Ei,0,1 |:
404 7 High-Dimensional and Panel Data

Assumption 7.4.5
(i) there is .c8 < 0 such that if .i ∈ A, then .E log |βi + Ei,0,1 | ≤ c8 ,
(ii) there is .c9 > 0 such that if .i ∈ B, then .E log |βi + Ei,0,1 | ≥ c9 , and .Ei,0,1 has a
bounded density.
2 , i ∈ {1, ..., N} in the definition of .V̄
We use the estimators .σ̂i,T N,T (u). We show
on the proof of Theorem 7.4.1 that .Si,T (u) is asymptotically the sum of uncorrelated
random variables. Hence we suggest the normalization with
⎛ ⎞2
1 Σ 1 Σ Êi,s,1 yi,s−1
T T
Êi,t,1 yi,t−1
2
.σ̂i,T = − ,
T (1 + yi,t−1 )1/2 T (1 + yi,s−1 )1/2
t=2 s=2

where

Êi,t,1 = yi,t − β̂i,T yi,t−1 ,


. t ∈ {2, ..., T }

We wish to point out that the normalization does not require foreknowledge of the
stationarity properties of the observations in the cross-sections.
Theorem 7.4.1 If .H0 of (7.3.2) and Assumptions 7.4.1–7.4.4 hold, then

D[0,1]
V̄N,T (u) −→ ┌(u),
.

where .{┌(u), 0 ≤ u ≤ 1} is a Gaussian process with .E┌(u) = 0 and .E┌(u)┌(v) =


2u2 (1 − v)2 , 0 ≤ u ≤ v ≤ 1.
We note that .┌ already appeared as a limiting process in Theorem 7.1.1.
Due to Assumption 7.4.3, the common factors disappear in the limit. If .i ∈ A,
then .yi,t is close to


Σ | |
l
wi,t =
. Ei,t−l,2 (βi + Ei,t−j +1 ), t ∈ Z. (7.4.1)
l=0 j =1

{wi,t , t ∈ Z} is thus a stationary Bernoulli shift, and under the conditions of


.

Theorem 7.4.1 it is .Lν -decomposable. The asymptotic variance of the CUSUM


processes .Si,T (u) is asymptotically .σi2 u(1 − u), where
⎧ ⎛ 2


⎨σ2 wi,0
i,0,1 E , if i ∈ A
σi2 =
. 1 + wi,0
2 (7.4.2)

⎩ 2 ,
σi,0,1 if i ∈ B.
7.4 Changes in the Parameters of RCA Panel Data Models 405

Theorem 7.4.2 If .H0 of (7.3.2) and Assumptions 7.4.1–7.4.4 hold, then

D[0,1]
VN,T (u) −→ ┌ ∗ (u),
.

where .{┌ ∗ (u), 0 ≤ u ≤ 1} is a Gaussian Process with .E┌ ∗ (u) = 0,



u(1 − u) v(1 − v)
E┌(u)┌(v) = σ̄ 4 g(u, v) −
. g(τ, v) − g(τ, u)
τ (1 − τ ) τ (1 − τ )

u(1 − u) v(1 − v)
+ g(τ, τ ) ,
τ (1 − τ ) τ (1 − τ )

and .g(u, v) = 2u2 (1 − v)2 , 0 ≤ u ≤ v ≤ 1 with

Σ
N
σ̄ 4 = lim
. σi4 .
N →∞
i=1

Proofs of Theorems 7.4.1 The proofs are based on the decomposition

T 1/2 Si,T (k/T ) = (βi − β̂i,T )Zi,T ,1 (k/T ) + Zi,T ,2 (k/T )


.

+ Zi,T ,3 (k/T ) + Zi,T ,4 (k/T ),

where
k
Zi,T ,l (k/T ) = Ri,T ,l (k/T ) −
. Ri,T ,l (1), 1 ≤ l ≤ 4, 1 ≤ k ≤ T ,
T

Σ
k
yi,t−1
.Ri,T ,1 (k/T ) = ,
t=2
(1 + yi,t−1
2 )1/2

Σ
k
Ei,t,1 yi,t−1
.Ri,T ,2 (k/T ) = ,
t=2
(1 + yi,t−1
2 )1/2

Σ
k
λT
i ft
Ri,T ,3 (k/T ) =
.

t=2
(1 + yi,t−1
2 )1/2

and

Σ
k
Ei,t,2
. Ri,T ,4 (k/T ) = .
t=2
(1 + yi,t−1
2 )1/2
406 7 High-Dimensional and Panel Data

It follows from
Σ
t | |
t | |
t
yi,t =
. (λT
i fl + Ei,l,2 ) (βj + Ei,j,1 ) + (βi + Ei,j,1 )
l=1 j =l+1 j =1

Σ
t | |
t Σ
t | |
t
= λT
i fl (βj + Ei,j,1 ) + Ei,l,2 (βj + Ei,j,1 )
l=1 j =l+1 l=1 j =l+1

| |
t
+ (βi + Ei,j,1 ).
j =1

If .i ∈ A, then the sum



Σ | |
l ∞
Σ | |
l
ȳi,t =
. Ei,t−l,2 (βi + Ei,t−j +1 ) + λT
i ft−l (βi + Ei,t−j +1 ) (7.4.3)
l=0 j =1 l=0 j =1

= wi,t + λT
i wi,t

is absolutely convergent with probability one, and satisfies the recursion

ȳi,t = (βi + Ei,t,1 )ȳi,t−1 + λT


. i ft + Ei,t,2 , t ∈ Z. (7.4.4)

Hence there is a .κ > 0 such that


| |κ | |κ
.E |yi,t − ȳi,t | ≤ c1 c2 E |ȳi,t | ≤ c3
t
and (7.4.5)

with some .0 < c1 , c3 < ∞, 0 < c2 < 1. Due to (7.4.5), we can replace .yi,t with
ȳi,t , the stationary sequence in (7.4.5), so we work with
.

T 1/2 S̄i,T (k/T ) = (βi − β̂i,T )Z̄i,T ,1 + Z̄i,T ,2 + Z̄i,T ,3 + Z̄i,T ,4 ,


.

where
k
Z̄i,T ,l (k/T ) = R̄i,T ,l (k/T ) −
. R̄i,T ,l (1), 1 ≤ l ≤ 4, 1 ≤ k ≤ T ,
T

Σ
k
ȳi,t−1
R̄i,T ,1 (k/T ) =
. ,
t=2
(1 + ȳi,t−1
2 )1/2

Σ
k
Ei,t,1 ȳi,t−1
R̄i,T ,2 (k/T ) =
. ,
t=2
(1 + ȳi,t−1
2 )1/2

Σ
k
λT
i ft
R̄i,T ,3 (k/T ) =
.

t=2
(1 + ȳi,t−1
2 )1/2
7.4 Changes in the Parameters of RCA Panel Data Models 407

and

Σ
k
Ei,t,2
. R̄i,T ,4 (k/T ) = .
t=2
(1 + ȳi,t−1
2 )1/2

Next we note
1 1 Σ
. max (βi − β̂i,T )2 Z̄i,T
2
,1 (k/T )
N 1/2 T 2≤k≤N
i∈A

Σ ⎛ ⎞2
1 1 || |
≤ (βi − β̂i,T )2 max Z̄i,T ,1 (k/T )| .
N 1/2 2≤k≤N T 1/2
i∈A

Using the arguments of Sect. 5.3 one can show that

E(T 1/2 (βi − β̂i,T ))4 ≤ c1


.

and
⎛ ⎞4
1 || |
.E max Z̄i,T ,1 (k/T )| ≤ c2 , (7.4.6)
2≤k≤N T 1/2

with some constants .c1 and .c2 . Thus we get by the Cauchy–Schwarz inequality
⎧ ⎫
1 1 Σ
E
.
1/2
max (βi − β̂i,T )2 Z̄i,T
2
,1 (k/T ) (7.4.7)
N T 2≤k≤T
i∈A
⎡ ⎛ ⎞4 ┐1/2
1 Σ⎡ ┐1/2 1 || |
≤ E(βi − β̂i,T ) 4
E max Z̄i,T ,1 (k/T )|
N 1/2 2≤k≤T T 1/2
i∈A
1 Σ 1/2 1/2 1
≤ c1 c2
N 1/2 T
i∈A
⎛ ⎞
N 1/2
=O
T
= o(1).
408 7 High-Dimensional and Panel Data

It follows from elementary calculation that

1 Σ 1
. max Z̄ 2 (k/N)
N 1/2 1≤k≤T T 2 i,T ,3
i∈A
⎛ || k
1 Σ 1 ||||Σ ft
≤ ||λi || max
2
||
N 1/2 1≤k≤T T 1/2 || (1 + ȳi,t−1
2 )1/2
i∈A t=2
||⎞2
k Σ
T ||
ft ||
− || .
T (1 + ȳ 2 ) 1/2 ||
t=2 i,t−1

Similarly to (7.4.6) we have


⎛ || k ||⎞2
1 ||||Σ k Σ ||
T
ft ft ||
E max
. || − || ≤ c3 ,
1≤k≤T T 1/2 || (1 + ȳi,t−1 )
2 1/2 T (1 + ȳi,t−1 ) ||
2 1/2
t=2 t=2

with some constants .c3 . Hence


⎛ ⎞ ⎛ ⎞
1 Σ 1 1 Σ
N
E
. max Z̄ 2
(k/N) = O ||λi || = o(1)
2
(7.4.8)
N 1/2 1≤k≤T T 2 i,T ,3 N 1/2
i∈A i=1

by Assumption 7.4.3. Next we introduce

Σ
k
Ei,t,1 wi,t−1
R̄R,T ,5 (k/T ) =
. .
i=2
(1 + wi,t−1
2 )1/2

Using the decomposability of .ȳi,t , wi,t and .wi,t one can verify

1 || 2 |
|
E max
. |R̄R,T ,2 (k/T ) − R̄R,T
2
,5 (k/T )| ≤ c4 ||λi || .
2
(7.4.9)
1≤k≤N T

Similar arguments give

1 || 2 |
|
E max
. |R̄R,T ,4 (k/T ) − R̄R,T
2
,6 (k/T ) | ≤ c5 ||λi ||2 , (7.4.10)
1≤k≤N T

where

Σ
k
Ei,t,1
R̄R,T ,6 (k/T ) =
. .
i=2
(1 + wi,t−1
2 )1/2
7.4 Changes in the Parameters of RCA Panel Data Models 409

Putting together (7.4.7)–(7.4.10) with the Cauchy–Schwarz inequality, one can


show that
| |
1 ||Σ ⎛ 2 ⎞ Σ⎛ ⎞|
|
. sup
1/2 ||
S̄i,T (u) − E S̄i,T (u) −
2
Vi,T ,1 (u) − EVi,T ,1 (u) | = oP (1),
2 2
0≤u≤1 N |
i∈A i∈A

where

Vi,T ,1 (u) = V̄i,T ,1 (u) − uV̄i,T ,1 (1)


.

and
⎣T
Σ u⎦
1 Ei,t,1 wi,t−1 + Ei,t,2
V̄i,T ,1 (u) =
. .
T 1/2
t=2
(1 + wi,t−1
2 )1/2

Using from Sect. 5.3 that .|yi,t | converges in probability to .∞ at an exponential rate,
if .i ∈ B one can prove
| |
|Σ⎛ ⎞ Σ⎛ ⎞|
1 | |
. sup | S̄i,T (u) − E S̄i,T (u) −
2 2
Vi,T ,2 (u) − EVi,T ,2 (u) | = oP (1),
2 2
0≤u≤1 N 1/2 | |
i∈B i∈B

where

Vi,T ,2 (u) = V̄i,T ,2 (u) − V̄i,T ,2 (1)


.

and
⎣T
Σ u⎦
1
V̄i,T ,2 (u) =
. Ei,t,1 .
T 1/2
t=2

Hence we need to consider the weak convergence of



Σ 1 ⎛
1 ⎞
.QN,T (u) = −
2 2
V i,T ,1 (u) EVi,T ,1 (u)
N 1/2 σ2
i∈A i

Σ 1 ⎛ ⎞
+ 2
2
Vi,T ,2 (u) − EVi,T ,2 (u)
2
,
i∈B
τ i

where .σi2 is defined in (7.4.2).


410 7 High-Dimensional and Panel Data

It follows from Assumption 7.4.1 that .QN,T (u) is a sum of independent


processes and
⎛ ⎞
k k
,j (k/T ) = τi 1− j = 1, 2.
2 2
EVi,T
. ,
T T

Now the result in Sect. 7.1 yields that

D[0,1]
QN,T (u) −→ ┌(u),
. (7.4.11)

where the Gaussian process .{┌(u), 0 ≤ u ≤ 1} is a Gaussian process defined in


Theorem 7.4.1.
It can be shown
⎛ ⎞2 c
6
2
E σ̂i,T
. − σi2 ≤ for all 1 ≤ i ≤ N,
T
and therefore (7.4.11) imply

D[0,1]
V̄N,T (u) −→ ┌(u).
.



The proof of Theorem 7.4.2 similarly combines the results of Sect. 5.3 with the
argument used to establish Theorem 7.1.1, and so we omit the details.

7.5 Data Examples

Example 7.5.1 (Exchange Rates of Many Currencies to USD) As in Horváth


et al. (2017a), we considered conducting change point analysis of the exchange rates
between the US dollar and 23 other currencies, obtained from the federal reserve
economic database (St. Louis, 2023). Figure 7.1 contains the graphs of the exchange
rates between the United Kingdom (UK), Canada (CA), Singapore (SI), Switzerland
(SW), Denmark (DN), Norway (NO) and Sweden (SD). We considered the time
period 03/13/2001—03/11/2003, so we have N = 23 cross-sections and each cross-
section has T = 500 observations.
Using the test statistic V(1)
N,T = supu∈[0,1] |V̄N,T (u)| as in Theorem 7.1.1, the
null hypothesis of no change in the mean of the cross-sections is strongly rejected
(p-value computed as zero). The estimated time of change, i.e. the location of the
maximum of the test statistic coincides with the date 05/16/2002. Since test statistic
as well as the number of available observations are large, the 90%, 95%, and 99%
confidence intervals constructed as described in Horváth et al. (2017a) contain the
single time point 297 (05/16/2002).
7.5 Data Examples 411

11
1.8

1
1

10
1.7

2
1.6

9
3
1.5

8
4
1.4

7
1.3

03/13/2001 05/16/2002 03/11/2003 03/13/2001 05/16/2002 03/11/2003

Fig. 7.1 The graphs of the exchange rates, 1 = UK, 2 = SI, 3 = CA, 4 = SW (left panel); 1 = DN,
2 = NO, 3 = SD (right panel) with respect to the US dollar
1.2

1.1
1
1
1.1

1.0
1.0

2 3
3

0.9
0.9
0.8

4
0.8
0.7

03/13/2001 05/24/2002 03/11/2003 03/13/2001 05/24/2002 03/11/2003

Fig. 7.2 The graphs of the relative exchange rates, 1 = UK, 2 = SI, 3 = CA, 4 = SW (left panel);
1 = DN, 2 = NO, 3 = SD (right panel) with respect to the US dollar

It is clear from Fig. 7.1 that the exchange rates are between 1.3 and 11, so if the
same proportional change occurs in a cross-section with high values, this change
will be relatively large when compared to the other cross-sections. As such a single
cross-section can disproportionately contribute to the value of the test statistic. To
overcome this problem they rescaled the observations in each cross-section with
the first observation, i.e. with the exchange rate on 03/13/2001. Figure 7.2 contains
the graphs of the relative changes in exchange rates with respect to the US dollar
for the same countries as in Fig. 7.1. Horváth et al. (2017a) repeat the analysis
for the relative changes (rescaled) in the exchange rates with respect to the US
dollar, resulting in rejection and the estimated time of change 303 corresponding
to 05/24/2002. They also construct confidence interval around the estimated time
of change which in includes 297, the estimated time of change in the original (not
scaled) data.
Example 7.5.2 (Change Points in US Macroeconomic Data) We consider detect-
ing change points in the mean of high-dimensional macroeconomic data on the
United States (US). We focus on the FRED–MD data set, which comprises monthly
resolution data on N = 128 macroeconomic variables available from the United
States federal reserve economic database (FRED). The analysis of high-dimensional
macroeconomic panel data has drawn a great deal of attention in the last decade.
One of the most influential papers on this area of research is Stock and Watson
(2012), who used up to 200 time series of macroeconomic variables to investigate
the dynamics of the great recession during 2007–2009. Twenty randomly selected
series from this panel are illustrated in Fig. 7.3.
412 7 High-Dimensional and Panel Data

Fig. 7.3 20 randomly selected series from the FRED-MD data set with vertical lines representing
estimated change points significant at level 0.05 after applying a binary segmentation procedure
(1)
based on the statistic/test VN,T

In total, the 128 time series are taken from the period from June-1999 to June-
2019, with information related to nine areas, including output and income, labor
markets, consumption and orders, orders and inventories, money and credit, interest
rate and exchange rates, prices, and the stock market. McCracken and Ng (2016)
provided detailed descriptions of this data set. They also suggested transformations
of each series towards stationarity so the data are suitable for a factor analysis,
which we applied. They found that the transformed data have a factor structure
similar to the model considered in Stock and Watson (2012). Analysis of this panel
data using the criteria proposed by Bai and Ng (2002) to determine the number of
common factors suggests that the cross-sectional dependence is well explained by
eight common factors.
We take as the goal of the analysis to evaluate for structural breaks in the means
of this high-dimensional time series over the observation period. Change points in
the means of the cross-sections may represent changing phases of the US economy.
To detect such change points, we computed the test statistics

(1) (2)
VN,T = sup |V̄N,T (u)| and VN,T = sup |VN,T (u)|.
.
u∈[0,1] u∈[0,1]

(1) (2)
To approximate the null distributions of VN,T and VN,T , we applied Theorem 7.1.1
and Theorems 7.2.1–7.2.4, respectively. It is discussed in Horváth et al. (2022)
how the normalizing sequences and distributions are approximated. Notably, in
(2)
approximating the null distribution of VN,T , we make use of a factor model-based
bootstrap similar to the method proposed in Cho and Fryzlewicz (2015). We used
7.5 Data Examples 413

Table 7.1 Detected changes and the corresponding relevant events in the FRED-MD data with
the corresponding estimated p-values in brackets
Tests 1st change Event 2nd change Event 3rd change Event
(1)
VN,T Aug/03 (0.00) Labor Mar/08 (0.01) Federal Aug/12 (0.04) Unemployment
market bailout rate bottom
recover
(2)
VN,T Jun/06 (0.00) Housing Mar/08 (0.00) Federal Jan/16 (0.00) Growth rate
boom end bailout decrease

binary segmentation to estimate and detect additional changes in the means. Each
test found that the largest initial change in the macroeconomic structure occurred
in March 2008, which corresponds to when the US government bailout began after
the sub-prime mortgage crisis. However, different dates were found in the second
and third step of a subsequent binary segmentation procedure. The test based on
(2)
VN,T detected changes in June 2006 and January 2016, whereas the test based on
(1)
VN,T detected change points in August 2003 and August 2012. See Table 7.1 for a
summary of these findings.
Example 7.5.3 (Short Time Series in the Capital Asset Pricing Model) As
studied in Antoch et al. (2019), we consider an application of the methods presented
in Sect. 7.3 to the capital asset pricing model (CAPM). The (Fama and French, 1993)
three factor model augmented with the Carhart (1997) momentum factor is defined
as

.yi,t = β1,i,t + x2,t β2,i,t + x3,t β3,i,t + x4,t β4,i,t + x5,t β5,i,t + Ei,t ,
t ∈ {1, ..., T }, i ∈ {1, ..., N}, (7.5.1)

where yi,t denotes the excess return on the mutual fund; x2,t is the market risk
premium; x3,t is the value factor, calculated as the return difference between
portfolios with the highest decile of stocks and the lowest decile of stocks in terms
of the ratio of book equity-to-market equity; x4,t is the value factor, calculated as
the return difference between portfolios with the smallest decile of stocks and the
largest decile of stocks in terms of size; x5,t is the momentum factor calculated as
the return difference between portfolios with the highest decile of stocks and lowest
decile of stocks in terms of recent return (i.e. momentum); and Ei,t is the random
error.
The four factors can be downloaded from Ken French’s data library.1 The raw
dataset on mutual funds contains monthly return data of 6190 US mutual funds
from January 1984 to November 2014 were taken from http://finance.yahoo.com.
Using the Yahoo finance classification, they consider nine categories of the US
mutual funds. These are Large Blend, Large Growth, Large Value, Middle Blend,

1 mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.
414 7 High-Dimensional and Panel Data

Middle Growth, Middle Value, Small Blend, Small Growth and Small Value. These
categories are combinations of the mutual fund size and their investment strategies.
We take as the goal of our analysis to evaluate for change points in the parameters
of model (7.5.1) for mutual funds in these categories.
There are many missing values in the mutual fund dataset because different
mutual funds have different start dates and some of them have already been
terminated. Antoch et al. (2019) selects the mutual funds that have no missing
returns for the period of the subprime crisis (January 2006 to February 2010), so
that T = 50. Since T is small, we adopt the fixed T asymptotic framework of
Theorem 7.3.1. In order to test for change points, we use the test statistic
| |
| |
. sup N −1/2 |ṼN (t) − ÂN (t)| , (7.5.2)
t∈(0,1)

where

Σ
N ⎛ ⎞
ÂN (t) =
.
2
âi,t,1 , 2
âi,t,1 = σ̂i2 trace Z−1
i,t − Z−1
i,T ,
i=1

and σ̂i2 is the sample variance of the linear model residuals in each cross-section.
The null critical values for this statistic were estimated as described in Theorem
7.3.1 using a wild bootstrap as described in Antoch et al. (2019). Change points
were estimated using the maximal argument of the test statistic in (7.5.2).
For all but one fund category, the test statistic in (7.5.2) was above the 5%
critical values during the sub-prime crisis period (middle of 2008 to early 2009).
Interestingly, the test cannot detect changes for the category of Small Value mutual
funds at the 5% significance level. However the Small Value mutual fund category
is significant at the 10% significance level. The coefficients for the four factors
changed first for the Large Blend mutual fund category with estimated break point
in August 2008, and the Small Growth mutual fund category has estimated change
point just a month later in September 2008. The coefficients for Large Growth,
Middle Blend, Middle Growth, and Small Blend appeared to change last in March
2009. It is interesting to note that only the Large Blend indicated a change out of all
large type mutual funds. The Large Blend is defined as a balanced mix of growth
and value stocks, and may more closely resemble the market as a whole.
Figure 7.4 shows the estimated break points for the coefficients for the Mutual
Fund categories with the levels of the S&P 500. The Large Blend category indicated
a structural change when the S&P 500 was well above the level 1200. This could
have been used as a potential trading signal. The lowest point of the S&P 500 was
February 2009, below the 800 mark. Here the change point was detected for the
Large Growth, Middle Blend, Middle Growth, and Small Blend.
7.6 Exercises 415

Fig. 7.4 Estimated change points of the mutual fund returns compared to the level of the S&P 500

7.6 Exercises

Exercise 7.6.1 We consider the model

xi,t = μi + Ei,t , 1 ≤ t ≤ Ti , i ∈ {1, . . . , N },


.

EEi,t = 0, 0 < c1 ≤ EEi,t


2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞ with some κ > 2.
i 2 i,t 3
We define
⎣T
Σi u⎦
1 Σ
Ti
1
.Si (u) = 1/2
(xi,t − x̄i ) with x̄i = xi,t .
Ti Ti
t=1 t=1

We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that
⎛N ⎞−1/2
Σ Σ
N
D[0,1]
. σi2 Si (u) −→ B(u),
i=1 i=1

as N → ∞ and min1≤i≤N Ti → ∞, where {B(u), 0 ≤ u ≤ 1} is a Brownian


bridge.
Exercise 7.6.2 We consider the model

xi,t = μi + Ei,t , t ∈ {1, . . . , T }, i ∈ {1, . . . , N },


.
416 7 High-Dimensional and Panel Data

EEi,t = 0, 0 < c1 ≤ EEi,t


2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞ with some κ > 2.
i 2 i,t 3
We define
⎣T
Σ 1 Σ
u⎦ T
1
Si (u) =
. (xi,t − x̄i ) with x̄i = xi,t .
T 1/2 T
t=1 t=1

We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that

1 Σ
N
D[0,1]
. Si (u) −→ ┌(u),
N 1/2
i=1

as N → ∞ and T is fixed, where {┌(u), 0 ≤ u ≤ 1} is a Gaussian process. Compute


the mean and the covariance of {┌(u), 0 ≤ u ≤ 1}.
Exercise 7.6.3 We consider the model

xi,t = μi + Ei,t , 1 ≤ t ≤ Ti , i ∈ {1, . . . , N },


.

EEi,t = 0, 0 < c1 ≤ EEi,t


2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞ with some κ > 4.
i 2 i,t 3
We define
⎣T
Σi u⎦
1 Σ 1 Σ
Ti Ti
1 xi,t − x̄i ( )2
Si∗ (u) =
.
1/2
with x̄i = xi,t and σ̄i2 = xi,t − x̄i .
Ti σ̄i Ti Ti
t=1 t=1 t=1

We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that

1 Σ
N
D[0,1]
. Si∗ (u) −→ B(u),
N 1/2
i=1

as N → ∞ and min1≤i≤N Ti → ∞, where {B(u), 0 ≤ u ≤ 1} is a Brownian


bridge.
Exercise 7.6.4 We consider the model

xi,t = ρi + Ei,t , − < ∞ < t < ∞, i ∈ {1, . . . , N},


.

EEi,t = 0, 0 < c1 ≤ EEi,t


2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞ with some κ > 4.
i 2 i,t 3
We also assume max i∈{1,...,N } |ρi | ≤ c5 < 1. We define ρ̂i (u), the least squares
estimator computed from xi,1 , xi,2 , . . . , xi,⎣T u⎦ . Define

1 (1 − ρi2 )1/2
Si (u) =
. ⎣T u⎦(ρ̂i (u) − ρ̂i (1)).
T 1/2 σi
7.6 Exercises 417

We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N are independent. Show that

1 Σ
N
D[0,1]
. Si (u) −→ B(u),
N 1/2
i=1

as N → ∞ and T → ∞, where {B(u), 0 ≤ u ≤ 1} is a Brownian bridge.


Exercise 7.6.5 We consider the model

. xi,t = μi + λi ft + Ei,t , t ∈ {1, . . . , T }, i ∈ {1, . . . , N },

EEi,t = 0, 0 < c1 ≤ EEi,t


2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞ with some κ > 2.
i 2 i,t 3
We define
⎣T
Σ 1 Σ
u⎦ T
1
Si∗ (u) =
. (xi,t − x̄i ) with x̄i = xi,t .
T 1/2 T
t=1 t=1

We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The common factors {ft , t ∈ Z} are independent and identically
distributed with Eft = 0 and Eft2 = σ 2 . The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N
and {ft , t ∈ Z} are independent. Show that there is rN such that

1 Σ ∗
N
D[0,1]
. Si (u) −→ B(u),
rN
i=1

as N → ∞ and T → ∞, where {B(u), 0 ≤ u ≤ 1} is a Brownian bridge.


Exercise 7.6.6 We consider the model

. xi,t = ρi + λi ft Ei,t , − < ∞ < t < ∞, i ∈ {1, . . . , N},

EEi,t = 0, 0 < c1 ≤ EEi,t 2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞ with some κ > 4.


i 2 i,t 3
We also assume max i∈{1,...,N } |ρi | ≤ c5 < 1. The common factors {ft , t ∈ Z} are
independent and identically distributed with Eft = 0 and Eft2 = σ 2 . We define
ρ̂i (u), the least squares estimator computed from xi,1 , xi,2 , . . . , xi,⎣T u⎦ . Define

1
Si (u) =
. ⎣T u⎦(ρ̂i (u) − ρ̂i (1))
T 1/2
418 7 High-Dimensional and Panel Data

We assume that for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N and {ft , t ∈ Z} are
independent. Show that there is rN

1 Σ
N
D[0,1]
. Si (u) −→ B(u),
rN
i=1

as N → ∞ and T → ∞, where {B(u), 0 ≤ u ≤ 1} is a Brownian bridge.


Exercise 7.6.7 We consider the model

Xi,t = μi + E i,t , i ∈ {1, . . . , N}, t ∈ {1, . . . , T },


.

Xi,t ∈ Rd , μi ∈ Rd . We assume that for each i the errors {Ei,t ∈ Rd , t ∈ Z} are


independent and identically distributed. The sequences {Ei,t , t ∈ Z}, 1 ≤ i ≤ N
are independent. We also assume E i,t = 0, ||E i,t ||κ ≤ c1 < ∞ with some κ > 2,
E i = EE i,t E T
i,t = diag(σi,1 , σi,2 , . . . , σi,d ), 0 < c2 ≤ σi,j ≤ c3 < ∞, 1 ≤ i ≤
2 2 2 2

N, 1 ≤ j ≤ d. Let
⎛ ⎞ ⎛ ⎞T
⎣T u⎦ ⎣T
1 ⎝Σ Σ u⎦
.Si (u) = (Xi,t − μi )⎠ E −1 i
⎝ (Xi,t − μi )⎠ .
T
t=1 t=1

Show that

1 Σ
N
. (Si (u) − du) converges in D[0, 1]
N 1/2
i=1

and determine the limit when N → ∞ and T → ∞.


Exercise 7.6.8 We consider the model

Xi,t = μi + E i,t , i ∈ {1, . . . , N}, t ∈ {1, . . . , T },


.

Xi,t ∈ Rd , μi ∈ Rd . We assume that for each i the errors {E i,t ∈ Rd , t ∈ Z} are


independent and identically distributed. The sequences {E i,t , t ∈ Z}, 1 ≤ i ≤ N
are independent. We also assume EE i,t = 0, E||E i,t ||κ ≤ c1 < ∞ with some κ > 2,
E i = EE i,t E T
i,t = diag(σi,1 , σi,2 , . . . , σi,d ), 0 < c2 ≤ σi,j ≤ c3 < ∞, 1 ≤ i ≤
2 2 2 2

N, 1 ≤ j ≤ d. Let
⎛ ⎞ ⎛ ⎞T
⎣T u⎦ ⎣T
1 ⎝Σ Σ u⎦
.Si (u) = (Xi,t − X̄i )⎠ E −1
i
⎝ (Xi,t − X̄i )⎠ ,
T
t=1 t=1

1 Σ
T
Xi =
. Xi,t .
T
t=1
7.7 Bibliographic Notes and Remarks 419

Show that

1 Σ
N Σ
N
. (Si (u) − du(1 − u)) converges in D[0, 1]
N 1/2
i=1 i=1

and determine the limit when N → ∞, T → ∞ and N/T → 0.


Exercise 7.6.9 We consider the model
⎛ ⎞
2π t
.xi,t = αi sin + Ei,t , 1 ≤ t ≤ T , 1 ≤ i ≤ N.
T

We assume that EEi,t = 0, 0 < c1 ≤ EEi,t 2 = σ 2 ≤ c < ∞, E|E |κ ≤ c < ∞


i 2 i,t 3
with some κ > 2, for each i the errors {Ei,t , t ∈ Z} are independent and identically
distributed and the sequences {Ei,t , t ∈ Z} are independent. Let α̂i (u) be the least
square estimator for αi computed from xi,t , 1 ≤ t ≤ ⎣T u⎦. Let

1 ( )
Si (u) =
. ⎣T u⎦ α̂i (u) − α̂i (1) .
T 1/2
Show that
N ⎛
Σ ⎞
1
. Si2 (u) − ESi2 (u) converges in D[δ, 1 − δ] for all 0 < δ < 1/2,
N 1/2
i=1

if N → ∞, T → ∞ and N/T 2 → 0.

7.7 Bibliographic Notes and Remarks

Due to increasing access to extremely large data sets, analysis of high dimensional
observations has received considerable attention. Jirák (2015) studies d dependent
change point tests, each based on a CUSUM-statistic. He provides an asymptotic
theory when the maximum over all test statistics as both the sample size and d tend
to infinity. His methods are based on a consistent bootstrap and an appropriate limit
distribution. This allows for the construction of simultaneous confidence bands for
dependent change point tests, and also to determine the location of the change both
in time and coordinates in high-dimensional time series.
The wild binary segmentation (WBS) of Fryzlewicz (2014) provides consistent
estimation of the number and locations of multiple change-points in scalar data.
Due to its random localisation mechanism, WBS works even for short spacings
between the change-points and/or small jump magnitudes, unlike standard binary
segmentation. In the high-dimensional setting Cho and Fryzlewicz (2015) propose
420 7 High-Dimensional and Panel Data

the Sparsified Binary Segmentation (SBS) algorithm which aggregates cross-


sectional CUSUM statistics by adding only those that pass a certain threshold.
This “sparsifying” step reduces the impact of irrelevant, noisy contributions, which
is particularly beneficial in high dimensions. See also Liu et al. (2021) and
Cho (2016). Some of these results are extended to non-stationary time series in
Korkas and Fryzlewicz (2017). Frick et al. (2014) introduce a new estimator, the
simultaneous multiscale change point estimator, for the change point problem in
exponential family regression. See also Barigozzi et al. (2018) in the context of
factor models. An unknown step function is estimated by minimizing the number
of change points over the acceptance region of a multiscale test. For multiple
change-points detection of high dimensional time series, Chen et al. (2021) provide
asymptotic theory concerning the consistency and the asymptotic distribution of
the breakpoint statistics and estimated break sizes. The theory backs up a simple
two-step procedure for detecting and estimating multiple change points. Many
methods for change point analysis in high-dimensions are surveyed and compared in
Aston and Kirch (2018) in terms of a natural efficiency criterion. A self-normalized
approach to conduct change point analysis for the mean of high-dimensional data
was proposed in Wang et al. (2022).
Change point analysis of high-dimensional regression and autoregression were
also considered in Rinaldo et al. (2021), Bai et al. (2020).
Panel data modelling has been used in a number of applied problems in econo-
metrics. Baltagi (2021) and Wooldridge (2010) provide excellent introductions to
the most important theoretical results and several applications are also given. Bai
(2009), Bai (2010) investigates changes in the mean and second order properties of
the observations. Feng and Kao (2021) contains a survey of change point detection
in panel data.
Horváth and Hušková (2012) prove Theorem 7.1.1. No weight function was
used in the definition of their test statistics. If approximations can be obtained for
the aggregated CUSUM processes, one could get the limit distribution of typical
weighted statistics. Chan et al. (2013) obtain a partial result assuming that the
cross-sections are independent and the cross-sections are based on independent and
identically distributed random variables.
Chapter 8
Functional Data

Functional data analysis concerns methods to analyse data that are naturally viewed
as taking values in infinite dimensional function spaces. Examples include data that
can be imagined as curves or surfaces. A general object of this type is termed a
functional data object. When functional data are observed sequentially over time,
they are referred to as functional time series. Usually the data that are available in
this setting are discrete measurements of such objects from which the full functional
data objects must be reconstructed or estimated using curve fitting techniques. In
some cases the functional data objects are fully observable on the domain on which
they are defined, for instance when they represent probability densities or other
summary functions. Entry points to this area include the text books Ramsey and
Silverman (2002), Horváth and Kokoszka (2012), Kokoszka and Reimherr (2017),
and Hsing and Eubank (2015), as well as the monograph Bosq (2000). These
cover in detail how to reconstruct functional data objects starting from discrete
measurements using curve fitting.
Here we assume that the functional data objects under consideration are fully
observed. As mentioned above this often means that initial discrete data have been
preprocessed using a curve fitting technique to form functional data objects, and one
should be wary of the effect of this on subsequent analysis. Generally if the discrete
measurements of the functional data are dense and the measurements are made
with relatively small error, subsequent analyses will not be sensitive to this step.
In addition, to simplify the presentation we assume that the functional data objects
are stochastic processes with domain .[0, 1] and sample paths in .L2 ([0, 1], R) = L2 ,
the Hilbert space of real valued square integrable functions. In other words, the
observations are stochastic processes .{X(t), 0 ≤ t ≤ 1} such that .X : Ω × [0, 1] →
R, and .||X||2 < ∞ almost surely, where
⎛⎛ 1 ⎞1/2
||f ||2 =
. f 2 (t)dt
0

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 421
L. Horváth, G. Rice, Change Point Analysis for Time Series, Springer Series
in Statistics, https://doi.org/10.1007/978-3-031-51609-2_8
422 8 Functional Data

⎛ ⎛1
is the .L2 norm. Below we use . to denote . 0 . We note that all results below may be
generalized to observations that are general random elements of abstract, separable
Hilbert spaces.

8.1 Change Detection in the Mean of Functional


Observations

Consider functional observations .X1 , . . . , XN following the simple AMOC in the


mean model

μ0 (t) + Ei (t), if 1 ≤ i ≤ k ∗ ,
.Xi (t) = (8.1.1)
μA (t) + Ei (t), if k ∗ + 1 ≤ i ≤ N,

where .EEi (t) = 0 for all .t ∈ [0, 1], and .μ0 , and .μA are unknown mean functions,
with .k ∗ a possible, unknown change point. Since the sample paths of .Xi are assumed
to lie in .L2 , it is natural to define the no change in the mean null hypothesis as

. H0 : ||μ0 − μA || = 0, (8.1.2)

and the alternative of a change in the mean is defined as

HA : ||μ0 − μA || > 0.
. (8.1.3)

Testing procedures may be based on the functional version of the CUSUM


process:
⎛ ⎞
⎣N
E ⎣Nu⎦ E
u⎦ N
ZN (u, t) = N −1/2 ⎝
. Xi (t) − Xi (t)⎠ , 0 ≤ u, t ≤ 1.
N
i=1 i=1

We again model the errors as general, stationary and weakly dependent time
series processes that are decomposable Bernoulli shifts:
Definition 8.1.1 We say .{Ei (t), i ∈ Z, t ∈ [0, 1]} is .Lν -decomposable if .Ei (t) =
g(ηi , ηi−1 , . . .)(t) for some (deterministic) measurable function .g : S∞ → L2 ,
where .{ηj , j ∈ Z} are independent and identically distributed random variables
with values in a measurable space .S, and .Ei (t) = Ei (t, ω) is jointly measurable in
.(t, ω), for each .i ∈ Z. Further .EEi (t) = 0 for all .t ∈ [0, 1], .E||Ei || < ∞ with some
ν
2
.ν > 2, and

( ∗
)1/ν
. E||Ei − Ei,m ||ν2 ≤ am−α with some a > 0, α > 2, (8.1.4)
8.1 Change Detection in the Mean of Functional Observations 423

∗ = g(η , . . . , η ∗ ∗ ∗
where .Ei,l i i−l+1 , ηi−l , ηi−l−1 , . . .), and .{ηi , i ∈ Z} are independent
copies of .η0 , independent of .{ηl , l ∈ Z}.
Assuming the errors are .Lν -decomposable, the series defining the long-run covari-
ance kernel

E
D(t, s) =
. EE0 (t)El (s), t, s ∈ [0, 1], (8.1.5)
l=−∞

is a well defined element of .L2 ([0, 1]2 , R). .D defines a symmetric kernel integral
operator on .L2 , and we use the notation .λ1 ≥ λ2 ≥ . . . ≥ 0 for its ordered set
of eigenvalues and .φ1 , φ2 , . . . to denote the corresponding orthonormal basis of
eigenfunctions satisfying for .t ∈ [0, 1]
⎛⎛
λl φl (t) =
. D(t, s)φl (s)ds, l ∈ N. (8.1.6)

It may be shown that .Lν −decomposibility of the errors implies that



E
. λl < ∞. (8.1.7)
l=1

Theorem 8.1.1 If .H0 of (8.1.2) is satisfied and the errors in model (8.1.1) are .Lν -
decomposable, then there exists a sequence of Gaussian processes .{┌N 0 (u, t), 0 ≤

u, t ≤ 1} such that

. sup (ZN (u, t) − ┌N0
(u, t))2 dt = oP (1),
0≤u≤1

0 (u, t) = 0 and .┌ 0 (u, t)┌ 0 (v, s) = (min(u, v) − uv)D(t, s).


E┌N
.
N N
Proof It follows from Theorem A.3.1 in Appendix A.3 that there is a sequence of
Gaussian processes .{┌N (u, t), 0 ≤ u, t ≤ 1} such that
⎛ ⎛ ⎞2
⎣N
E u⎦
. sup ⎝N −1/2 Ei (t) − ┌N (u, t)⎠ dt = oP (1), (8.1.8)
0≤u≤1 i=1

E┌N (u, t) = 0 and .┌N (u, t)┌N (v, s) = min(u, v)D(t, s). Since under the
.

null hypothesis all .L2 functionals of .ZN do not depend on .μ0 , (8.1.8) implies
Theorem 8.1.1 with
0
┌N
. (u, t) = ┌N (u, t) − u┌N (1, t).
424 8 Functional Data

0 is evidently a Gaussian process, and it is left as an exercise to check that its


┌N
.

covariance function is .(min(u, v) − uv)D(t, s). ⨅



The next result shows how Theorem 8.1.1 can extended to weighted supremum
functionals of the CUSUM process.
Theorem 8.1.2 If .H0 of (8.1.2) holds and the errors in model (8.1.1) are .Lν -
decomposable, then with the sequence of Gaussian processes .{┌N
0 (u, t), 0 ≤ u, t ≤

1} defined in Theorem 8.1.1 we have


⎧⎛ ⎫1/2
1
. sup (ZN ((N + 1)u/N ), t) − ┌N
0
(u, t))2 dt = oP (1),
0<u<1 [u(1 − u)]
κ

for all .0 ≤ κ < 1/2.


Proof Let .0 < δ < 1/2. It follows from Theorem 8.1.1 that
⎧⎛ ⎫1/2
1
. sup (ZN ((N + 1)u/N, t) − ┌N
0
(u, t))2 dt = oP (1).
δ≤u≤1−δ [u(1 − u)]κ

Next we show that for all .x > 0,


⎧ ⎛ ⎫
1
lim
. lim sup P sup (ZN ((N + 1)u/N, t)2 dt > x (8.1.9)
0<u≤δ [u(1 − u)]
δ→0 N →∞ 2κ

= 0,

⎧ ⎛ ⎫
1
. lim lim sup P sup (ZN ((N + 1)u/N, t) dt > x
2
1−δ≤u<1 [u(1 − u)]
δ→0 N →∞ 2κ

= 0, (8.1.10)
⎧ ⎛ ⎫
1
. lim lim sup P sup 0
(┌N (u, t))2 dt > x =0 (8.1.11)
δ→0 N →∞ 0<u≤δ [u(1 − u)]2κ

and
⎧ ⎛ ⎫
1
. lim lim sup P sup 0
(┌N (u, t))2 dt >x = 0. (8.1.12)
δ→0 N →∞ 1−δ≤u<1 [u(1 − u)]2κ

By an application of Theorem A.3.1 in Appendix A.3, we obtain that


⎧ ⎛ ⎞κ ||
||
||
|| ⎫
|| −1/2 E ||
k
N
. lim lim sup P max ||N Ei || > x = 0,
δ→0 N →∞ 1≤k≤N δ k || ||
i=1 2
8.1 Change Detection in the Mean of Functional Observations 425

since .0 < κ < 1/2. Theorem A.3.1 yields


⎧ || k || ⎫
N −1/2+κ ||E ||
|| ||
.P max || Ei || > x
1≤k≤N δ kκ || ||
i=1 2
⎧ || k || ⎫
||E ||
1 || ||
≤P max max || Ei || > xN 1/2−κ
1≤j ≤⎣log(N δ)⎦+1 ej −1 ≤k≤ej kκ || ||
i=1 2
⎧ || k || ⎫
1 || ||
⎣log(N δ)⎦+1
E ||E ||
≤ P max || E i || > xN 1/2−κ
ej −1 ≤k≤ej k κ || ||
j =1 i=1 2
⎣log(N δ)⎦+1 ⎧
|| k || ⎫
E ||E ||
|| || κ(j −1) 1/2−κ
≤ P max || Ei || > xe N
ej −1 ≤k≤ej || ||
j =1 i=1 2
⎛ || k || ⎞ν
N −ν(1/2−κ)
⎣log(N δ)⎦+1
E ||E ||
−(νκ(j −1)) || ||
≤ e E max || Ei ||
xν 1≤k≤ej || ||
j =1 i=1 2
⎣log(N δ)⎦+1
N −ν(1/2−κ) E
≤ c1 e−(νκ(j −1)) ej ν/2

j =1

δ (1/2−κ)ν
≤ c2 ,

which implies (8.1.9). The proof of (8.1.10) goes along the lines of (8.1.9). Next we
note that the distribution of .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} does not depend on N . We

recall that
D
.{┌N
0
(u, t), 0 ≤ u, t ≤ 1} = {┌(u, t) − u┌(1, t), 0 ≤ u, t ≤ 1},

where .{┌(u, t), 0 ≤ u, t ≤ 1} is a Gaussian process with .E┌(u, t) = and


E┌(u, t)┌(v, s) = min(s, t)D(t, s). We now show that
.

1 P
. sup ||┌(u, ·)||2 → 0 when δ → 0.
0<u≤δ uκ

By checking that the mean and covariance function coincide, one can verify that
with .(λi , φi )i∈N defined in (8.1.6),

D
{┌(u, t), 0 ≤ u, t ≤ 1} = {O(u, t), 0 ≤ u, t ≤ 1},
.
426 8 Functional Data

where

E 1/2
O(u, t) =
. λi Wi (u)φi (t), (8.1.13)
i=1

and .{Wi (u), 0 ≤ u ≤ 1}, i ∈ N are independent Wiener processes. Using the
orthonormality of the eigenfunctions (see Theorem A.3.3) we have for each .0 ≤
u≤1
⎛ ∞
E
. O (u, t)dt =
2
λi Wi2 (u).
i=1

According to Theorem A.2.2, there exist independent and identically distributed


random variables .{ξi , i ≥ 1} with finite moments of all orders such that

|Wi (u)| ≤ ξi u1/2 (log(1/u))1/2 ,


. 0 ≤ u ≤ 1, a.s.

and therefore in light of (8.1.7),


∞ ∞
1 E E 1
E sup
.

λ i Wi
2
(u) ≤ λi E sup 2κ Wi2 (u)
0<u≤δ u 0<u≤δ u
i=1 i=1

≤ c4 δ 1−2κ log(1/δ) → 0, as δ → 0.

This implies (8.1.11). A similar argument gives (8.1.12). ⨆



We note also that with .┌ 0 (u, t) = ┌(u, t) − u┌(1, t), then .{┌ 0 (u, t), 0 ≤ u, t ≤
D
1} = {O0 (u, t), 0 ≤ u, t ≤ 1}, where

E 1/2
O0 (u, t) =
. λi Bi (u)φi (t), (8.1.14)
i=1

and .{Bi (u), 0 ≤ u ≤ 1}i∈N are independent and identically distributed standard
Brownian bridges. The limiting behaviour of supremum and integral functionals of
the norm of .ZN (u, ·) under the one change alternative are described in the next
theorem.
Theorem 8.1.3 If .HA of (8.1.3) holds, and the errors in model (8.1.1) are .Lν -
decomposable, and

.NθN2 (1 − θN )2 ||μ0 − μA ||2 → ∞, (8.1.15)


8.1 Change Detection in the Mean of Functional Observations 427

then

1 P
. sup 2
ZN (u, t)dt → 1
NθN2 (1 − θN )2 ||μ0 − μA || 2
0≤u≤1

and
⎛⎛
1 P 1
.
2
ZN (u, t)dtdu → ,
NθN2 (1 − θN )2 ||μ0 − μA || 2 3

where .k ∗ = ⎣N θN ⎦.
Proof We write

.ZN (u, t) = ZN,E (u, t) + N −1/2 vN (⎣Nu⎦, t), (8.1.16)

with
⎛ ⎞
⎣N
E ⎣Nu⎦ E
u⎦ N
ZN,E (u, t) = N −1/2 ⎝
. Ei (t) − Ei (t)⎠ ,
N
i=1 i=1

and

⎪ k(N − k ∗ )
⎨ (μ0 (t) − μA (t)), if 1 ≤ k ≤ k ∗ ,
.vN (k, t) = ∗ N
⎩ k (N − k) (μ0 (t) − μA (t)),

if k ∗ + 1 ≤ k ≤ N.
N

⎛⎛ 2
Since .sup0≤u≤1 ||ZN,E (u, ·)||2 = OP (1) and . ZN,E (u, t)dudt = OP (1) by
Theorem 8.1.1, (8.1.15) implies that each of the limits of interest are determined
by .vN . It follows from straightforward calculation that .sup0≤u≤1 ||vN (⎣Nu⎦, ·)||2 =
⎛⎛ 2
N θN2 (1−θN )2 ||μ0 −μA ||2 +O(1), and . vN (⎣Nu⎦, t)dudt = N θN2 (1−θN )2 ||μ0 −
μA ||2 /3 + O(1), from which the result follows. ⨆

We note that for weighted functionals of the CUSUM process as in Theo-
rem 8.1.2, if .0 ≤ κ < 1/2, then similarly it can be shown that
⎛⎛ ⎞2
1 P
. sup ZN (u, t) → ∞,
1/(N +1)<u<1−1/(N +1) [u(1 − u)]κ

if

N 1/2 (θN (1 − θN ))1−κ ||μ0 − μA ||2 → ∞.


.
428 8 Functional Data

In order to estimate the distribution of the Gaussian process .┌N 0 appearing

in (8.1.1) and (8.1.2), we now consider estimators of the long-run covariance


function .D(t, s) along the lines of those discussed in Sect. 3.1. Let

⎪ N −l

⎪ 1 E

⎪ (Xi (t) − X̄N (t))(Xi+l (s) − X̄N (s)), if l ≥ 0
⎨N −l
η̂l (t, s) =
.
i=1

⎪ 1 EN

⎪ (Xi (t) − X̄N (t))(Xi+l (s) − X̄N (s)),

⎩ N − |l|
if l < 0,
i=1−l

denote the sample autocovariance kernel of the sample at lag .l, where

1 E
N
. X̄N (t) = Xi (t)
N
i=1

is the sample mean of the functional observations. A natural estimator of .D is


defined as

E
N −1 ⎛ ⎞
l
. D̂N (t, s) = K η̂l (t, s),
h
l=−(N −1)

where K is a kernel function and h is a bandwidth as in Assumptions 3.1.4 and 3.1.5.


The following consistency results are established in Horváth et al. (2013), and
Berkes et al. (2016). See also Panaretos and Tavakoli (2013).
Theorem 8.1.4 If .H0 in (8.1.2) is satisfied, .{Ei , i ∈ Z} is .Lν -decomposable for
some .ν > 4, and the kernel K and bandwidth h satisfy Assumptions 3.1.4 and 3.1.5,
then
⎛⎛⎛ ⎞2
.E D̂N (t, s) − D(t, s) dtds → 0, as N → ∞.

If in addition K is a kernel of order q, i.e. .0 < w = limx→0 [1 − K(x)]x −q < ∞,


and .{Ei , i ∈ Z} is .Lν -decomposable for some .ν > max{2q, 4}, then
⎛ ⎛⎛ ⎞2 ⎞
h
.E||D̂N − D|| = ||D|| +
2 2
D(u, u)du (8.1.17)
N
⎛ ∞ ⎛ ⎞
h
× K 2 (x)dx + h−2q ||wD(q) ||2 + o + h−2q ,
−∞ N
8.1 Change Detection in the Mean of Functional Observations 429

where

E
D(q) (t, s) =
. |l|q EE0 (t)El (s).
l=−∞

We remark that (8.1.17) furnishes a data-driven approach to select the bandwidth


parameter h: we choose h in order to minimize the leading terms on the right-hand
side of (8.1.17). This leads to the choice

hopt = c0 N 1/(1+2q) ,
. (8.1.18)

where
⎛ ⎞1/(1+2q)
c0 = 2q||wD(q) ||2
.

⎛⎛ ⎛⎛ ⎞2 ⎞ ⎛ ⎞−1/(1+2q)

× ||D|| +2
D(u, u)du 2
K (x)dx .
−∞

The constant .c0 may be estimated from the data using initial “pilot” estimates of .D
and .D(q) .
Hypothesis tests for the presence of a change point in model (8.1.1) may be
constructed using the estimator .D̂N . This method is a functional data analog of the
approach in Newey and West (1987).
Since in practice these are applied without foreknowledge of which of .H0 or .HA
hold, it is useful to also know the asymptotic properties of .D̂N under the change
in the mean alternative. Towards this, and in order to evaluate how the asymptotic
behaviour of .D̂N depends on the size of the change, we define for .t, s ∈ [0, 1]

.ΔN (t, s) = [μ0 (t) − μA (t)][μ0 (s) − μA (s)]


= [μ0,N (t) − μA,N (t)][μ0,N (s) − μA,N (s)],

and assume the difference between the means before and after the change can
depend on N .
Theorem 8.1.5 If .HA of (8.1.3) holds, and .{Ei , i ∈ Z} is .Lν -decomposable for
some .ν > 4, Assumptions 2.1.1, 3.1.4 and 3.1.5 hold, then
⎛ ⎛ ⎛ ⎛ ∞ ⎞2
1
. D̂N (t, s) − θN (1 − θN )ΔN (t, s) K(u)du dtds = oP (1).
h −∞

See Horváth et al. (2014) for a proof. It follows that .D̂N (t, s) is not a consistent
estimator of .D(t, s) under the alternative, owing to the fact that the sample mean
.X̄N used in defining the empirical autocovariance kernels does not properly center
430 8 Functional Data

the series under .HA . If the errors are uncorrelated, then .D(t, s) = cov(E0 (t), E0 (s)),
so the sample variance function

1 E
N
D∗N (t, s) =
. (Xi (t) − X̄N (t))(Xi (s) − X̄N (s))
N −1
i=1

can be used in place of the long-run covariance estimator. Along the lines of
Theorems 8.1.4 and 8.1.5, it may be shown that
⎛ ⎛
┐ ∗ ┐2
. DN (t, s) − cov(E0 (t), E0 (s)) dtds = oP (1)

under .H0 , and under .HA


⎛ ⎛
{ ∗ }2
. DN (t, s) − [cov(E0 (t), E0 (s)) − θN (1 − θN )ΔN (t, s)] dtds = oP (1).

The above results can be combined to establish the consistency of the following
hypothesis tests for .H0 . We consider the test statistics .TN and .MN defined as
⎛⎛ ⎛
TN =
.
2
ZN (u, t)dudt, and MN = sup 2
ZN (u, t)dt.
0≤u≤1

According to Theorem 8.1.3, a consistent test is obtained by rejecting .H0 for large
values of either statistic, and hence to obtain a consistent test with asymptotic size
.α we reject .H0 when

.TN ≥ cT (α), or MN ≥ cM (α),

where .cT (α) and .cM (α) satisfy that under .H0

. lim P {TN ≥ cT (α)} = α, and lim P {MN ≥ cM (α)} = α.


N →∞ N →∞

According to Theorem 8.1.1, these critical values can be approximated in order to


satisfy
⎧⎛ ⎛ ⎛ ⎞2 ⎫
P
. ┌ 0 (u, t) dudt ≥ cT (α) = α, and
⎧ ⎛⎛ ⎫
⎞2
P sup ┌ 0 (u, t) dudt ≥ cM (α) = α,
0≤u≤1

where .┌ is a Gaussian process defined in (8.1.5). Owing to (8.1.14) and the


Karhunen-Loéve expansion of the Brownian bridge, see Chapter 5.3 of Shorack
8.1 Change Detection in the Mean of Functional Observations 431

and Wellner (1986), we have that


⎛⎛ ⎛ ⎞2 ∞
E ⎛⎛ ⎞2
D λl
.
0
┌ (u, t) dudt = N 2
k,l , and sup ┌ 0
(u, t) dt
(π k)2 0≤u≤1
k,l=1

E
D
= sup λl Bl (u),
0≤u≤1 l=1

where .{Nk,l , 1 ≤ k, l < ∞} are independent standard normal random variables and
λ1 ≥ λ2 ≥ . . . are the eigenvalues of .D, defined in (8.1.6). The eigenvalues .λi and
.

eigenfunctions .φi can be estimated from the sample using the empirical eigenvalues
and eigenfunctions .λ̂N,1 ≥ λ̂N,2 ≥ . . . defined by
⎛⎛
λ̂N,l φ̂N,l (t) =
. D̂N (t, s)φ̂N,l (s)ds, 1 ≤ l ≤ N − 1. (8.1.19)

As a consequence of Theorem 8.1.4, we get that under .H0 and for any .d ≥ 1
(see Horváth and Kokoszka (2012), p. 34)

. max |λ̂N,l − λl | = oP (1).


1≤l≤d

This suggests estimating .cT (α) and .cM (α) with .ĉT ,d (α) and .ĉM,d (α) so that
⎧⎛ ⎛ ⎛ ⎞2 ⎫
P
. ┌ 0 (u, t) dudt ≥ ĉT ,d (α) (8.1.20)
⎧ ⎫
⎨Ed
λ̂N,l 2 ⎬
≈P N ≥ ĉT ,d (α) = α,
⎩ (π k)2 k,l ⎭
k,l=1

and
⎧ ⎛⎛ ⎫
⎞2
P
. sup ┌ 0 (u, t) dudt ≥ ĉM,d (α)
0≤u≤1
⎧ ⎫
E
d
≈P sup λ̂N,l Bl (u) ≥ ĉM,d (α) = α.
0≤u≤1 l=1

The values .ĉT ,d (α) and .ĉM,d (α) satisfying this relation may be readily approxi-
E E
mated by simulating . dk,l=1 N2k,l λ̂N,l /(π k)2 and .sup0≤u≤1 dl=1 λ̂N,l Bl (u), given
the sample.
432 8 Functional Data

Under .HA , as a result of Theorem 8.1.5 it follows that for any .d ∈ N,


⎛ ⎞
λ̂N,1 = OP h||μ0 (t) − μA (t)||22 , and
. max λ̂N,j = OP (1).
j ∈{2,...,d}

Hence Theorem 8.1.3 yields that for any .d ∈ N and so long as .h = o(NθN2 (1 −
θN )2 ),
{ } { }
. lim P TN ≥ ĉT ,d (α) = 1, and lim P MN ≥ ĉM,d (α) = 1
N →∞ N →∞

under the alternative.


Similar consistency results can be established for tests based on weighted
functionals of .ZN . Letting .wκ (u) = [u(1 − u)]κ , weighted versions of .TN and
.MN may be defined as

⎛⎛ 2 (u, t) ⎛ 2 (u, t)
ZN ZN
TN (κ) =
. dudt, and MN (κ) = sup dt.
wκ2 (u) 0≤u≤1 wκ2 (u)

According to Theorem 8.1.2, under .H0 the limiting Gaussian process of


ZN (u, t)/wκ (u) is .┌ 0 (u, t)/wκ (u), which has mean zero and covariance
.

⎛ ⎞⎛ ⎞
┌ 0 (u, t) ┌ 0 (v, s) min(u, v) − uv
E
. = D(t, s).
wκ (u) wκ (v) wκ (u)wκ (v)

Using now that .┌ 0 (u, t)/wκ (u) has the same distribution as


O0 (u, t) E λi Bi (u)
1/2
. = φi (t), (8.1.21)
wκ (u) wκ (u)
i=1

similar approximations to the distribution of .TN (κ) and .MN (κ) under .H0 can be
computed as in (8.1.20).
The above test statistics might be considered to be “fully functional” in that they
do not require any initial dimension reduction of the functional data objects. We
now turn to change point detection methods for functional data based on functional
principal component analysis. Using the eigenfunctions .φ1 , φ2 , . . . of (8.1.6) we
define the projections of .ZN into the directions of the eigenfunctions of the d largest
eigenvalues of .D. Let

ξN,k (u) =
. ZN ((N + 1)u/N, t)φk (t)dt, 0 < u < 1, k ∈ {1, . . . , d}.

Theorem 8.1.6 We assume that .H0 of (8.1.2) holds, and .{Ei , i ∈ Z} is


Lν -decomposable for some .ν > 2.
.
8.1 Change Detection in the Mean of Functional Observations 433

(i) If .0 ≤ κ < 1/2, then

1 |ξN,k (u)| D 1
. max sup → sup |Bk (u)|, (8.1.22)
1≤k≤d λ1/2 0<u<1 [u(1 − u)]κ 0<u<1 [u(1 − u)]
κ
k

where .{Bi (u), 0 ≤ u ≤ 1}, i ∈ {1, . . . , d} are independent Brownian bridges.


(ii) With .κ = 1/2, then for all .x ∈ R,
⎧ ⎫
1 |ξN,k (u)|
P a(log N) max
. sup ≤ x + b(log N )
0<u<1 [u(1 − u)]
1≤k≤d 1/2 1/2
λk
= exp(−2de−x ), (8.1.23)

where .a(x) and .b(x) are defined in (1.2.18).


Proof Using the Gaussian processes .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} of Theorem 8.1.2 we

get that
|⎛ |
1 | |
. sup | (ZN ((N + 1)u/N, t) − ┌N (u, t))φk (t)dt || = oP (1).
0
[u(1 − u)] κ |
0<u<1
⎛ 0
For each N, the joint distribution of .{ ┌N (u, t))φk (t)dt, 0 ≤ u ≤ 1, 1 ≤ k ≤ d} is
⎛ 0
normal with .E ┌N (u, t))φk (t)dt = 0, and the covariance is given by
⎧⎛⎛ ⎞ ⎛⎛ ⎞⎫
E
.
0
┌N (u, t))φk (t)dt 0
┌N (u' , s))φl (s)ds
⎛⎛
= (min(u, u' ) − uu' ) φk (t)D(t, s)φl (s)dtds

λk (min(u, u' ) − uu' ), if k = l,
=
0, if k /= l.

−1/2
0 (u, t))φ (t)dt are independent Brownian bridges, completing the
Hence .λk ┌N k
proof of (8.1.22).
Observing that .[u(1 − u)]−1/2 |ξk,N (u)|, .1 ≤ k ≤ d are asymptotically
independent, one can establish (8.1.23) as in Theorem 1.2.5. ⨅

The empirical counterpart to the statistics appearing on the left-hand sides
of (8.1.22) and (8.1.23) are obtained by substituting .λ̂N,k , .φ̂N,k defined in (8.1.19)
for .λk and .φk . Let

.ξ̂N,k (u) = ZN ((N + 1)u/N, t)φ̂N,k (t)dt (8.1.24)
434 8 Functional Data

be the empirical counterpart of .ξN,k (u). The proof of the following result is left to
the reader, see Exercise 8.6.6.
Theorem 8.1.7 We assume that .H0 of (8.1.2) holds, .{Ei , i ∈ Z} is
Lν -decomposable for some .ν > 4, and that .λ1 > · · · > λd > λd+1 ≥ 0.
.

(i) If .0 ≤ κ < 1/2, then

1 |ξ̂N,k (u)| D 1
. max sup → sup |Bk (u)|, (8.1.25)
1≤k≤d 1/2
λ̂N,k 0<u<1 [u(1 − u)]κ 0<u<1 [u(1 − u)]κ

where .{Bi (u), 0 ≤ u ≤ 1}, i ∈ {1, . . . , d} are independent Brownian bridges.


(ii) If .κ = 1/2, then for all .x ∈ R
⎧ ⎫
1 |ξ̂N,k (u)|
P a(log N) max
. sup ≤ x + b(log N)
0<u<1 [u(1 − u)]
1≤k≤d 1/2 1/2
λ̂N,k
= exp(−2de−x ), (8.1.26)

where .a(x) and .b(x) are defined in (1.2.18).

8.2 Estimating Change Points

8.2.1 Estimating a Single Change Point

In order to study the properties of estimators of the change point .k ∗ under .HA , we
consider the following AMOC alternative model that fixes notation in (8.1.1):

μ(t) + Ei (t), if 1 ≤ i ≤ k ∗ ,
Xi (t) =
. (8.2.1)
μ(t) + ΔN h(t) + Ei (t), if k ∗ + 1 ≤ i ≤ N,

where .||h|| = 1 and .k ∗ = ⎣θ N ⎦. .ΔN then effectively describes the size of the
change, which we allow to depend on the sample size N. A natural estimator is
the location at which the norm of the functional CUSUM process attains its largest
value. Let

k̂N = k̂N (κ)


. (8.2.2)
⎧⎛ ⎞κ
⎛ ⎛ k ⎞2 ⎫
N E k E
N
= sargmax Xi (t) − Xi (t) dt .
k∈{1,...,N −1} k(N − k) N
i=1 i=1
8.2 Estimating Change Points 435

In order to establish the asymptotic properties of .k̂N , we assume either of the


following conditions on the size of the change:
Assumption 8.2.1 .limN →∞ ΔN = Δ /= 0.

Assumption 8.2.2 .|ΔN | → 0 and .NΔ2N → ∞ as .N → ∞.


We used analogous assumptions when we considered the limiting distribution of
change point estimators for scalar data in Chap. 2. The following result describes
the asymptotic behaviour of .k̂N in both scenarios, in which we let

E ⎛⎛ ⎞ ⎛⎛ ⎞
τ =
.
2
E E0 (t)h(t)dt El (t)h(t)dt , (8.2.3)
l=−∞

and

⎪ −1 ⎛
E



⎪ − Ei (t)h(t)dt, if k < 0,


⎨ i=k
.S(k) = 0, if k = 0, (8.2.4)

⎪ k ⎛

⎪ E



⎩ Ei (t)h(t)dt, if k > 0.
i=1

Theorem 8.2.1 We assume that .{Xi , i ∈ {1, . . . , N}} follow model (8.2.1), and
that .{Ei , i ∈ Z} is .Lν -decomposable for some .ν > 4 are satisfied. Under the
shrinking change Assumption 8.2.2,
(i) if .0 ≤ κ < 1/2, then with .ξ(κ) defined in (2.2.3),

Δ2N D
.
2
(k̂N − k ∗ ) → ξ(κ).
τ
(ii) If in addition

N 1/2 |ΔN |(log N)−1/ν → ∞,


.

then

Δ2N D
.
2
(k̂N − k ∗ ) → ξ(1/2).
τ
436 8 Functional Data

If instead the asymptotically fixed change magnitude Assumption 8.2.1 holds,


then for .0 ≤ κ ≤ 1/2,

D
{ }
k̂N − k ∗ → argmaxl ΔS(l) − Δ2 |l|mκ (l) .
. (8.2.5)

where .mκ is defined in (2.2.1).


Proof The proof follows the basic roadmap of the proof of Theorem 2.2.1. First we
show that

|k̂ − k ∗ | = oP (N ).
. (8.2.6)

We decompose the CUSUM process into its random and drift components:

E
k
k E
N
k
. Xi (t) − Xi (t) = Sk (t) − SN (t) − zk (t),
N N
i=1 i=1

where

E
k
Sk (t) =
. Ei (t)
i=1

and

⎪ k(N − k ∗ )
⎨ ΔN h(t), if 1 ≤ k ≤ k ∗ ,
.zk (t) = ∗ (NN− k)

⎩ k
ΔN h(t), if k ∗ + 1 ≤ k ≤ N.
N

We showed in Theorem 8.1.1 that


⎛ ⎞2κ
⎛ ⎛ ⎞2
N2 k
. max Sk (t) − SN (t) dt = OP (N ). (8.2.7)
1≤k≤N k(N − k) N

On the other hand, for all .Nα ≤ k ≤ Nβ, .0 < α < β < 1 we have

1
. zk2 (t)dt → ∞. (8.2.8)
N

Now (8.2.6) follows from (8.2.7) and (8.2.8). The next step is the proof of

Δ2N |k̂ − k ∗ | = OP (1).


. (8.2.9)
8.2 Estimating Change Points 437

We introduce
⎛ ⎞2κ ⎛ ⎞2
N k
Qk (t) =
. Sk (t) − SN (t) − zk (t)
k(N − k) N
⎛ ⎞2κ ⎛ ⎞2
N k∗
− Sk ∗ (t) − SN (t) − z k ∗ (t)
k ∗ (N − k ∗ ) N

and we decompose .Qk (t) as

Qk (t) = Qk,1 (t) + . . . + Qk,7 (t)


. (8.2.10)

with
⎧⎛ ⎞2κ ⎛ ⎞2κ ⎫ ⎛ ⎞2
N N k
.Qk,1 (t) = − Sk (t) − SN (t) ,
k(N − k) k ∗ (N − k ∗ ) N
⎛ ⎞2κ ⎧⎛ ⎞2 ⎛ ⎞2 ⎫
N k k∗
Qk,2 (t) = Sk (t) − SN (t) − Sk ∗ (t) − SN (t) ,
k ∗ (N − k ∗ ) N N
⎛ ⎞2κ ⎛ ⎞
N k
Qk,3 (t) = −2 Sk (t) − SN (t) (zk (t) − zk ∗ (t)),
k(N − k) N
⎛⎛ ⎞2κ ⎛ ⎞2κ ⎞ ⎛ ⎞
N N k
Qk,4 (t) = − − Sk (t) − SN (t) zk ∗ (t),
k(N − k) k ∗ (N − k ∗ ) N
⎛ ⎞2κ
N k − k∗
Qk,5 (t) = 2 SN (t)zk ∗ (t),
k ∗ (N − k ∗ ) N
⎛ ⎞2κ
N
Qk,6 (t) = −2 ∗ (Sk (t) − Sk ∗ (t)) zk ∗ (t),
k (N − k ∗ )
⎛ ⎞2κ ⎛ ⎞2κ
N N
Qk,7 (t) = zk (t) −
2
zk2∗ (t).
k(N − k) k ∗ (N − k ∗ )

We only consider the case when .1 ≤ k ≤ k ∗ , and we may take a similar approach
when .k > k ∗ . Also, as a consequence of (8.3.14), we can assume that .N α ≤ k,
where .0 < α < θ . Let .a = a(N) = C/Δ2N . Using the mean value theorem we get
|⎛ ⎞2κ ⎛ ⎞2κ ||
|
1 | N N |
. max | − | = O(N −1−2κ )
N α≤k≤k ∗ k∗ − k | k(N − k) k ∗ (N − k ∗ ) |
438 8 Functional Data

and therefore by Theorem A.3.1 we have


⎛ ⎛ ⎞
1 N −1−2κ N
. max |Qk,1 (t)|dt = OP (8.2.11)
N α≤k≤k ∗ N 1−2κ Δ2N (k ∗ − k) Δ2N N 1−2κ
⎛ ⎞
1
= OP = oP (1).
NΔ2N

We note that
⎛ ⎞2κ ⎛ ⎞
N k − k∗
.Qk,2 (t) = 2 Sk (t) − Sk (t) −
∗ SN (t)
k ∗ (N − k ∗ ) N
⎛ ⎞
k k∗
× Sk (t) − SN (t) + Sk ∗ (t) − SN (t)
N N
⎛ ∗ ⎞
⎛ ⎞2κ E k
N ⎝
= −2 ∗ Ei (t)⎠
k (N − k ∗ )
i=k+1
⎛ ⎞
k k∗
× Sk (t) − SN (t) + Sk ∗ (t) − SN (t)
N N
⎛ ⎞2κ ∗
N k −k
+2 ∗ ∗
SN (t)
k (N − k ) N
⎛ ⎞
k k∗
× Sk (t) − SN (t) + Sk ∗ (t) − SN (t) .
N N

By the Cauchy–Schwarz inequality for integrals we have


⎛ | |
| E ∗ || |
| k || k k∗ |
| Ei (t)|| ||Sk (t) − SN (t) + Sk ∗ (t) − SN (t)|| dt
.
|
|i=k+1 | N N
|| ||
|| E ∗ || || ||
|| k || || k k ∗ ||
||
≤ || Ei || || SN + Sk − SN ||
|| ||Sk − ∗
|| .
||i=k+1 || N N 2
2

Theorem 8.1.1 yields that


|| ||
|| k k ∗ ||
|| Sk − SN + Sk ∗ − SN ||
|| = OP (N ).
1/2
. max (8.2.12)
1≤k≤k ∗ || N N 2
8.2 Estimating Change Points 439

Next we show that


|| ||
|| E ∗ ||
1 || k ||
max || E ||
i || = OP (a
−1/2
). (8.2.13)
.
1≤k≤k ∗ −a ∗
k −k ||
||i=k+1 ||
2

It follows from Theorem A.3.1 that


|| l ||ν
||E ||
|| ||
.E || Ei || ≤ c1 (l − k)ν/2
|| ||
i=k 2

and therefore by the maximal inequality for partial sums (see Theorem A.3.1) we
get
|| l ||ν
||E ||
|| ||
.E max || Ei || ≤ c2 k ν/2 (8.2.14)
1≤l≤k || ||
i=1 2

with some constants .c1 > 0 and .c2 > 0. We then obtain from elementary
inequalities that
|| || || k ||
|| k ∗ || ||E ||
1 || || E || 1 || ||
. max∗ ∗ || El ||
|| = max ∗ || Ek ∗ +1−i ||
1≤k≤k −a k − k || || ||
l=k+1 ||2
a≤k<k k
i=1 2
|| k ||
1 ||
||E ∗
||
||
≤ max max || E k +1−i ||
⎣log a⎦−1≤j ≤⎣log k ∗ ⎦+1 ej ≤k≤ej +1 k || ||
i=1 2
|| k ||
||E ||
1 || ||
≤ max max || E k ∗ +1−i || .
⎣log a⎦−1≤j ≤⎣log k ∗ ⎦+1 ej ej ≤k≤ej +1 || ||
i=1 2

Using Markov’s inequality we have


⎧ || || ⎫
⎨ || E ∗ || ⎬
1 || k ||
P max || E || > xa −1/2 (8.2.15)
.
⎩1≤k≤k ∗ −a ∗
k −k || l || ⎭
||l=k+1 ||
2
⎧ || k || ⎫
||E ||
1 || || −1/2
≤P max max || Ek ∗ +1−i || > xa
⎣log a⎦−1≤j ≤⎣log k ∗ ⎦+1 ej ej ≤k≤ej +1 || ||
i=1 2
440 8 Functional Data

⎣log k ∗ ⎦+1 ⎧ || k || ⎫
E ||E ||
|| || −1/2
≤ P max || j
Ek ∗ +1−i || > xe a
ej ≤k≤ej +1 || ||
j =⎣log a⎦−1 i=1 2

E
c2 ej ν/2 c3
≤ ≤ ,
xν ej ν a −ν/2 xν
j =⎣log a⎦−1

where .c3 is a constant. Hence (8.2.13) is proven. Also, using again Theorem 8.1.1
we get

| ⎛ ⎞|
| ∗ |
|SN (t) Sk (t) − k SN (t) + Sk ∗ (t) − k SN (t) | dt
.
| N N |
|| ||
|| k k ∗ ||
≤ ||SN ||2 ||Sk − SN + Sk − SN ||
|| ∗
||
N N 2

= OP (N ).

Thus Assumption 8.2.2 implies


⎛ ⎛ ⎞
1 1
. max |Qk,2 (t)|dt = OP
αN ≤k≤k ∗ Δ2N (k − k ∗ )N 1−2κ N 1/2 |ΔN |
⎛ ⎞
1
+ OP = oP (1). (8.2.16)
N

By the Cauchy–Schwarz inequality we have


⎛ ⎛ ⎞2κ || ||
N || ||
|Qk,3 (t)|dt ≤ 2 ||Sk − k SN || ||zk − zk ∗ ||2 ,
.
k(N − k) || N ||2

and
|| ||
|| k ||
. max ||Sk − SN || = OP (N 1/2 ).
1≤k≤k ∗ || N ||2

Using the definition of .zk we obtain

1
. max ||zk − zk ∗ ||2 = OP (|ΔN |) .
1≤k<k ∗ k∗ − k
8.2 Estimating Change Points 441

Thus we get

1
. max |Qk,3 (t)|dt = oP (1). (8.2.17)
αN ≤k<k ∗ 2 ∗
ΔN (k − k)N 1−2κ

Following the proof of (8.2.11) one can show



1
. max |Qk,4 (t)|dt = oP (1) (8.2.18)
αN≤k<k ∗ 2 ∗
ΔN (k − k)N 1−2κ

and

1
. max |Qk,5 (t)|dt = oP (1). (8.2.19)
αN ≤k<k ∗ Δ2N (k ∗ − k)N 1−2κ

Applying again (8.2.13) we obtain


⎛ ⎛ ⎞
1 a −1/2 |ΔN |N
. max |Qk,6 (t)|dt = OP (1)
1≤k≤k ∗ −a Δ2N N 1−2κ (k ∗ − k) Δ2N N
1
= OP (1), (8.2.20)
C 1/2

where the .OP (1) term does not depend on C. Elementary arguments give that there
are .c4 > 0 and .c5 > 0 such that
⎛ ⎛ ⎞
1−2κ ∗
. − c4 ΔN N (k − k) ≤ zk2 (t) − zk2∗ (t) dt
2
(8.2.21)

≤ −c5 Δ2N N 1−2κ (k ∗ − k)

for all .N α ≤ k < k ∗ . It follows from (8.2.11)–(8.2.19) and (8.2.21) that for all
C > 0 and .α < θ
.

⎛ ⎛ ⎞
1
. max Qk,1 (t) + Qk,2 (t) + . . . + Qk,5 (t) + Qk,5 (t) dt
N α≤k≤k ∗ −C/Δ2N 2
P
→ −∞. (8.2.22)

Combining (8.2.20) and (8.2.21) we get that for all .K < 0


⎧ ⎛ ⎛ ⎞ ⎫
1
. lim lim sup P max Qk,6 (t) + Qk,7 (t) dt > K (8.2.23)
C→∞ N →∞ N α≤k≤k ∗ −C/Δ2N 2

= 0.

The result in (8.2.9) now follows from (8.2.22) and (8.2.23).


442 8 Functional Data

According to (8.2.9), we can assume that .|k̂ − k ∗ | ≤ C/Δ2N , where C is an


arbitrary number. We show that for all .C > 0

. max |Qk,i (t)|dt = oP (N 1−2κ ), i = 1, . . . , 5. (8.2.24)
|k−k ∗ |≤C/Δ2N

Assume that .k ≤ k ∗ . We showed that

. ||SN ||2 = OP (N 1/2 )

and
|| ||
|| k k ∗ ||
|| ||
||Sk − N SN + Sk − N SN || = OP (N ).
1/2
. max ∗
k ∗ −C/Δ2N ≤k≤k ∗ 2

Following (8.2.11) we have



. max |Qk,1 (t)|dt
|k−k ∗ |≤C/Δ2N
⎛ ⎞⎛ ⎛
1
=O max (Sk∗ (t) − Sk (t))2 dt
N 1+2κ Δ2N |k−k ∗ |≤C/Δ2N
⎛ ⎞
|k − k ∗ |
+ 2
SN (t)dt ,
N

which implies (8.2.24) when .i = 1. As in (8.2.16), we have



. max |Qk,2 (t)|dt
k ∗ −C/Δ2N ≤k≤k ∗
⎧ || || ⎫
⎛ ⎞2κ ⎨ || E ∗ || ⎬
N || k || C
≤ max || Ei || + ||SN ||2
k ∗ (N − k ∗ ) ||
⎩k ∗ −C/Δ2N ≤k≤k ∗ || || 2 ⎭
i=k+1 ||
NΔN
2
⎛ ⎞2κ || ||
N || ∗ ||
× max ||Sk − k SN + Sk ∗ − k SN || .
k ∗ (N − k ∗ ) || N N ||
k ∗ −C/Δ2N ≤k≤k ∗ 2

Using Theorem A.3.1 we have that


|| ||
|| E ∗ || ⎛ ⎞
|| k || 1
max || E || = O .
.
|| i || P
|ΔN |
k ∗ −C/Δ2N ≤k≤k ∗ ||i=k+1 ||
2
8.2 Estimating Change Points 443

Similar arguments can be used when .k ∗ < k ≤ k ∗ + C/Δ2N . This completes the
proof of (8.2.24) when .i = 2.
We note that

. max |Qk,3 (t)|dt
|k−k ∗ |≤C/Δ2N
|| ||
|| k ||
= O(N −2κ ) max |||| Sk − S ||
N || max ||zk − zk ∗ ||2
1≤k≤N N 2 |k−k ∗ |≤C/ΔN
2

⎛ ⎞
N 1/2
= OP N −2κ = oP (N 1−2κ )
|ΔN |

on account of Assumption 8.2.2. Similar argument gives (8.2.24) when .i = 4 and 5.


According to (8.2.24), we only need to consider .Qk,6 (t) + Qk,7 (t) when .|k −
k ∗ | ≤ C/Δ2N . We can assume without loss of generality that .ΔN > 0. Elementary
arguments give
| ⎛ |
| 1 |
. |
sup | 1−2κ Qk ∗ +sτ 2 /Δ2 ,7 (t)dt + 2(θ (1 − θ )) τ |s|mκ (s)||
1−2κ 2
(8.2.25)
|s|≤C N N

= o(1),

where .mκ (s) is defined in (2.2.1). Next we write



1
. Qk ∗ +s/Δ2 ,6 (t)dt
N 1−2κ N
⎧ ⎛ ⎞ ⎛
⎪ Ek∗

⎪ k ∗ (N − k ∗ ) 1−2κ

⎪ 2 Δ N Ei (t)h(t)dt, if s < 0,

⎪ N2

⎨ ∗
i=k +s/ΔN +1
2

= 0, if s = 0



⎪ ⎛ ∗ ∗ ) ⎞1−2κ k ∗ +s/Δ2N ⎛
E

⎪ k (N − k

⎪ −2 Δ Ei (t)h(t)dt, if s > 0.
⎩ N2
N
∗ i=k +1

Since
⎛ the errors .{Ei , i ∈ Z} are .Lν -decomposable, it follows that the scalar sequence
.{ Ei (t)h(t)dt, i ∈ Z} .L -decomposable, and therefore Theorem A.1.1 yields
ν


1 D[−C,C]
. Qk ∗ +sτ 2 /Δ2 ,4 (t)dt −→ 2(θ (1 − θ ))1−2κ τ 2 W (s), (8.2.26)
N N
444 8 Functional Data

where .{W (s), −∞ < s < ∞} is the two sided Wiener process of (2.2.2) and .τ is
defined in (8.2.3). If .k̂(C) is defined as

. k̂(C) = min k : |k − k ∗ |

⎛ ⎞κ
⎛ ⎛ k ⎞2
N E k E
N
≤ Cτ 2
/Δ2N and Xi (t) − Xi (t) dt
k(N − k) N
i=1 i=1

⎛ ⎞κ
⎛ ⎛ ⎞2

N Ej
j E
N
= max ⎝ Xi (t) − ⎠
Xi (t) dt ,
|j −k ∗ |≤Cτ 2 /Δ2N j (N − j ) N
i=1 i=1

we obtain from (8.2.25) and (8.2.26) that

Δ2N D
. (k̂(C) − k ∗ ) → argmax(2(θ (1 − θ ))1−2κ τ 2 W (s)
τ2
− 2θ (1 − θ )τ 2 |s|m0 (s) : |s| ≤ C)
= argmax (W (s) − |s|mκ (s) : |s| ≤ C) .

Since

argmax (W (s) − |s|mκ (s) : |s| ≤ C) → ξ(κ) a.s.,


.

where .ξ(κ) is defined in (2.2.3), the proof is complete when .0 ≤ κ < 1/2.
The proof of the second part of the theorem based on the observation that
|| k ||
1 ||||E ||
||
. max || E i || = OP ((log N)1/ν ). (8.2.27)
1≤k≤N k 1/2 || ||
i=1 2

To prove (8.2.27) we follow the proof of (8.2.15). Let .C > 0. Arguing as in (8.2.15),
on account of (8.2.14) we have
⎧ || k || ⎫
1 ||
||E
||
|| c6
P
. max 1/2 || Ei (t)|| > C(log N) 1/ν
≤ ν.
1≤k≤N k || || C
i=1 2

Since C can be chosen arbitrarily large, (8.2.27) follows. This implies that the drift
term dominates the process. Hence we have that (8.2.6) holds.
In order to establish (8.2.5), we follow the proof of Theorem 8.2.1. We present
detailed results when .κ = 0. We note that

|k̂N − k ∗ | = oP (N ),
.
8.2 Estimating Change Points 445

so we can assume that .⎣Nα⎦ ≤ k̂N ≤ ⎣Nβ⎦, 0 < α < θ < β < 1. Next we show
that

|k̂N − k ∗ | = OP (1)
. (8.2.28)

Let .C > 0. We use again the decomposition in (8.2.10). We only work out the
details for .⎣N α⎦ ≤ k ≤ k ∗ − C. It follows from the proof of (8.2.13) that
|| ||
|| E k∗ ||
||
1 || ||
max E ||
i || = OP (1) (8.2.29)
.
1≤k≤k ∗ −C ∗ ||
k − k ||
i=k+1 || 2

and for all .x > 0


⎧ || || ⎫
⎨ || E k∗ || ⎬
1 |||| ||
lim lim sup P max E || > x = 0. (8.2.30)
.
⎩1≤k≤k ∗ −C ||
k ∗ − k ||
i || ⎭
i=k+1 ||
C→∞ N →∞
2

Using (8.2.12) and (8.2.29) we conclude



1
. max |Qk,1 (t)|dt = OP (N −1/2 ). (8.2.31)
1≤k≤k ∗ −C N(k ∗ − k)

Since
1
. max ||zk − zk ∗ ||2 = OP (1)
1≤k<k ∗ k∗ − k

we obtain by Theorem 8.1.1 that



1
. max∗ |Qk,2 (t)|dt = OP (N −1/2 ). (8.2.32)
1≤k≤k −C N(k ∗ − k)

Similar arguments give



1
. max∗ |Qk,3 (t)|dt = OP (N −1/2 ). (8.2.33)
1≤k≤k −C N(k ∗ − k)

Since by Theorem 8.1.1


|| ||
⎛ || Ek∗ ||
1 1 || ||
max |Qk,4 (t)|dt ≤ 2||zk ∗ ||2 max || E ||
i || ,
.
1≤k≤k ∗ −C N (k ∗ − k) 1≤k≤k ∗ −C ∗ ||
N(k − k) ||
i=k+1 || 2
446 8 Functional Data

Eq. (8.2.30) yields that for all .x > 0


⎧ ⎛ ⎫
1
. lim lim sup P max∗ |Qk,4 (t)|dt > x = 0. (8.2.34)
C→∞ N →∞ 1≤k≤k −C N(k ∗ − k)

We note that there are .c1 > 0 and .c2 > 0 such that


. − c1 N |k − k| ≤ Qk,5 (t)dt ≤ −c2 N |k ∗ − k| (8.2.35)

for all .⎣N α⎦ ≤ k ≤ ⎣Nβ⎦, .0 < α < θ < β < 1. Since the estimates in (8.2.31)–
(8.2.34) are valid for .k ∗ +C ≤ k < N and therefore by (8.2.35) we get for all .C > 0
and .0 < α < θ < β < 1
⎛ ⎛ ⎞
1 P
. max Q k,1 (t) + Q k,2 (t) + Q k,3 + Q k,5 (t) dt → −∞
|k ∗ −k|≥C,⎣N α⎦≤k≤⎣Nβ⎦ 2

Combining (8.2.34) and (8.2.35) we conclude


⎧⎛ ⎛ ⎞ ⎫
1
. lim lim sup P Qk,4 (t) + Qk,5 (t) dt > x = 0
C→∞ N →∞ 2

for all .x < 0. Hence we can assume that .k̂N is between .k ∗ − C and .k ∗ + C, for a
large .C > 0. By definition

⎪ k∗

⎪ k ∗ (N − k ∗ ) E

⎪ −2 Δ h(t)Ei (t), if k < k ∗ ,

⎪ N
⎨ i=k+1
Qk,4 (t) =
. 0, if k = k ∗ ,



⎪ k ∗ (N − k ∗ ) E
k

⎪ if k > k ∗ ,

⎩ 2 Δ h(t)Ei (t),
N
i=k ∗ +1

and therefore for all .C > 0


⎧ ⎛ ⎫
1 D
. Qk ∗ +l,4 (t)dt, |l| ≤ C → {2θ (1 − θ )ΔS(l), |l| ≤ C} ,
N

where .{S(l), −∞ < l < ∞} is defined in (8.2.4). Elementary arguments show



1
. max Qk ∗ +l,5 (t)dt + 2θ (1 − θ )Δ2 |l|m0 (l)| = o(1).
|l|≤C N
8.2 Estimating Change Points 447

Thus we conclude
⎧ ⎛ ⎫
1
. (Qk ∗ +l,4 (t) + Qk ∗ +l,5 (t)dt, |l| ≤ C
N
D
{ ⎛ ⎞}
→ θ (1 − θ ) ΔS(l) − Δ2 |l|m0 (l) .

The final details proceed along the same lines as in Theorem 2.2.2.


In order to use Theorem 8.2.1 to perform inference on the change point location,
one must estimate the unknown values .τ 2 and .ΔN . The function .eN (t) = ΔN h(t)
may be estimated with

êN (t) = X̂k̂,2 (t) − X̂k̂,1 (t),


.

using the change point estimator .k̂N , where

1 E E
k̂N N
1
X̂k̂,1 =
. Xi (t) and X̂k̂,2 = Xi (t).
k̂N i=1 N − k̂N
i=k̂N +1

One can show that under the conditions of Theorem 8.2.1,

||êN ||22 P
. → 1.
Δ2N

In order to estimate .τ 2 , we observe that



E
τ2 =
. Eg0 gl
l=−∞

with

eN (t)
gl = gl,N =
. El (t) dt.
||eN ||

It is natural then to employ a kernel long-run variance estimator as discussed in


Sect. 3.1. First, we center the observations using the estimated change point .k̂N ,
creating the estimated residuals

Ei (t) − X̂k̂,1 , if 1 ≤ i ≤ k̂N ,
Êi (t) =
.
Ei (t) − X̂k̂,2 , if k̂N + 1 ≤ i ≤ N.
448 8 Functional Data

gi may then be estimated with


.


ĝi =
. Êi (t)êN (t)dt.

The estimator for .τ 2 is the long-run estimator based on .{ĝi , 1 ≤ i ≤ N }:

E
N −1 ⎛ ⎞
l
τ̂N2 =
. K γ̂l ,
h
l=1−N

where

⎪ N −l

⎪ 1 E

⎪ ĝi ĝi+l , if 0 ≤ l < N,
⎨N −l
γ̂l =
.
i=1

⎪ 1 E
N

⎪ ĝi ĝi+l , if − N < l < 0.

⎩ N − |l|
i=1−l

The bandwidth h and kernel K satisfy Assumptions 3.1.4 and 3.1.5. Again under
the conditions of Theorem 8.2.1

τ̂N2 P
. → 1. (8.2.36)
τ2

These can be used to construct confidence intervals for .k ∗ . Under the conditions
of Theorem 8.2.1, it follows that
⎛ ⎞
τ̂N2 O(κ)1−α/2 τ̂N2 O(κ)α/2
. k̂N − , k̂N − (8.2.37)
||êN ||22 ||êN ||22

where .O(κ)q is the q quantile of the distribution of .ξ(κ) is an approximate .1 − α


confidence interval for .k ∗ .
Another option is to use functional principal component analysis to define change
point estimators. We recall the empirical projections .ξ̂N,k (u) from (8.1.24), and
define
⎧ ⎫
(1) 1
.k̂ = sargmax max | ξ̂
κ N,j
(k/N)| ,
k∈{1,...,N } 1≤j ≤d (k(N − k))
N
8.2 Estimating Change Points 449

and
⎧ ⎫
(2) −1/2 1
.k̂ = sargmax max λ̂ |ξ̂N,j (k/N)| ,
N
k∈{1,...,N } 1≤j ≤d N,j (k(N − k))κ

where .λ̂N,1 ≥ λ̂N,2 ≥ . . . ≥ λ̂N,d are the empirical eigenvalues from (8.1.19).

8.2.2 Estimating Multiple Change Points

In this section we consider estimating and performing inference on multiple change


points in the mean function of a functional time series. In particular we assume an
R change point model, such that

E
R+1
Xi (t) =
. μj (t)1{kj −1 ≤ i < kj } + Ei (t), (8.2.38)
j =1

i ∈ {1, . . . , N}, t ∈ [0, 1],

where .k0 = 1, .kR+1 = N + 1, and .ΔN,j = ||μj − μj +1 || > 0 for all


.j ∈ {1, . . . , R}. Below we let .ΔN = minj ∈{1,...,R} ΔN,j . In order to simplify
the results that follow we assume that the “directions” of the changes in the mean
.hl (t) = [μl (t) − μl−1 (t)]/||μl+1 − μl ||, .l ∈ {1, . . . , R}, do not depend on N .

The indices .k1 , . . . , kR denote the locations of change points in the mean, which we
assume satisfy
Assumption 8.2.3 .ki = ⎣Nθi ⎦, .0 = θ0 < θ1 < · · · < θR < θR+1 = 1.
We consider estimating and performing inference on R and .k1 , . . . , kR in two
stages. First, we develop preliminary consistent estimators of R and .k1 , . . . , kR
using standard binary segmentation. Subsequently, noting that we expect exactly
one change point between the estimates .k̂i and .k̂i+1 , these estimates are refined in
a second stage by considering single change point estimators over the observations
with indices between .k̂i and .k̂i+1 . The asymptotic distribution of the estimators in
this second stage may be determined as in Sect. 8.2.1.
In order to produce preliminary, consistent estimators of R and .k1 , . . . , kR ,
we suggest using binary segmentation, which, as discussed in Chap. 2, involves
sequentially splitting the original sample into two sub-samples based on an initial
change point estimate, estimating change points on each sub-sample, and then
repeating until some stopping criterion is satisfied. To formulate this method applied
to a functional time series, suppose that we have arrived at some point in the
procedure at a sub-sample with a starting index l and an ending index u satisfying
.1 ≤ l < u ≤ N. In order to identify and estimate change points, we consider

sequential estimates of the mean function based on the partial sum process .Sk (t) =
450 8 Functional Data

Ek
k ∈ {1, ..., N}. To estimate changes points on the sub-sample with
j =1 Xj (t), .

indices between .l and u, we consider a generalized CUSUM process .Zkl,u defined


as
⎛ ⎞κ
(u − l)2 1
. Zkl,u (t) =
(u − k)(k − l) (u − l)1/2
┌ ┐
k−l
× Sk (t) − Sl (t) − (Su (t) − Sl (t)) . (8.2.39)
u−l

In (8.2.39) .κ defines the degree of weighting applied to the standard CUSUM


process. Elementary algebra shows that .Zkl,u is equivalent with the weighted
CUSUM process

ZN (k/N, t)
.
[k/N (1 − k/N )]κ

defined over the sample .Xl , . . . , Xu . Intuitively if there exists one or more change
points in a sub-sample, .||Zkl,u || will be large, and the point .k̂l,u = sargmax ||Zkl,u ||
l<k<u
estimates a change point. Deciding whether to include or exclude .k̂l,u as a potential
change point can be determined by the magnitude of .||Zkl,u ||: if this exceeds some
threshold .ρN , we then include .k̂l,u among the estimated change points, and further
segment the sub-sample. This may be described by the following pseudo-algorithm:

Algorithm Functional Binary Segmentation: BINSEG(.l, u, ρN )


Inputs: l ← starting index ; u ← ending index ; ρN ← threshold
if u − l ≤ 1 , then STOP else Define k0 = sargmax ||Zkl,u ||, and Z = ||Zkl,u
0
||
l<k<u
if Z > ρN then
add k0 to the set of estimated change points. run BINSEG(l, k0 , ρN ) and BINSEG(k0 , u, ρN )
else STOP
end if

BINSEG(1, N, .ρN ) returns a set of estimated change points .K̂ = {k̂1 , . . . , k̂R̂ },
sorted into increasing order, and an estimated number of change points .R̂ =
|K̂|. These estimates are consistent so long as the errors in (8.2.38) are .Lν -
decomposable:
Theorem 8.2.2 Assume that .Xi , i ∈ {1, ..., N} satisfies the multiple change point
model (8.2.38), Assumption 8.2.3 holds, .ΔN is bounded away from zero, .0 < κ ≤
1/2, and the model errors .{Ei , i ∈ Z} is .Lν -decomposable for some .ν > 2. If .ρN in
the binary segmentation algorithm satisfies

(log N)1/ν ρN
. + 1/2 → 0, N → ∞.
ρN N
8.2 Estimating Change Points 451

Then for any sequence .rN satisfying .rN → ∞ as .N → ∞, arbitrarily slowly,


⎛ ⎞
. lim P {R̂ = R} ∩ { max |k̂i − ki | ≤ rN } = 1.
N →∞ i∈{1,...,R}

Remark 8.2.1 Since every separable Hilbert space is isometrically isomorphic


to either .Rd or .L2 [0, 1], we obtain as a corollary of Theorem 8.2.2 that binary
segmentation as presented in Chap. 2, and in more general Euclidean spaces, is
consistent under .Lν -decomposability conditions and when the change magnitudes
are bounded away from zero. Since for a symmetric, strictly positive definite matrix
.A ∈ R
d×d , .||x|| T
A = (x Ax)
1/2 defines a norm on .Rd induced by an inner

product under which .R is separable, binary segmentation based on quadratic form-


d

type detectors as in Sect. 1.3 are also consistent in the sense of Theorem 8.2.2.
This corollary though should not be used to trivialize the application of binary
segmentation in each of these settings. The practical aspects of applying binary
segmentation in each of these settings and spaces differ in many respects.
Remark 8.2.2 A standard choice of the threshold .ρN is to take it of the form .ρN =
σ̂N (log N )1/2 , where .σ̂N2 = median(||Xi+1 −Xi ||2 /2, i = 2, . . . , N ). Choosing .ρN
as an appropriate quantile of the approximate (limiting) distribution of the CUSUM
detector used is another popular option, but does not lead to consistency unless the
quantile is taken to increase with the sample size at an appropriate rate.
In order to simplify the presentation, we only consider the case when .κ = 1/2
in (8.2.39). Under model (8.2.4), we may write .Zkl,u (t) = Okl,u (t) + Wkl,u (t), where

⎛ ⎞1/2
u−l
k
.Ol,u (t) = (8.2.40)
(u − k)(k − l)
┌ ┐
k−l
× Mk (t) − Ml (t) − (Mu (t) − Ml (t)) ,
u−l

⎛ ⎞1/2
u−l
.Wkl,u (t) = (8.2.41)
(u − k)(k − l)
┌ ┐
k−l
× Ek (t) − El (t) − (Eu (t) − El (t)) ,
u−l
Ek ER+1 Ek
Mk (t) =
. i=1 j =1 μj (t)1{kj −1 ≤ i < kj }, and .Ek (t) = i=1 εi (t) is the
partial sum of the errors.
The binary segmentation algorithm involves estimating change points on sub-
samples, and so in the Lemmas below we use .l and u to denote the starting and
ending indices of the sub-sample under consideration. For any such indices, if there
are change points between .l and u, we use the notation .i0 and .β ≥ 0 to describe the
452 8 Functional Data

starting index and the number of change points between .l and u, so that

.ki0 ≤ l < ki0 +1 < ki0 +2 < . . . < ki0 +β < u ≤ ki0 +β+1 , (8.2.42)

Let .I = {1, 2, . . . , β} be the index set of the change points between .l and u. In
order to prove Theorem 8.2.2, we establish the following result that shows that the
norm of the trend of the functional CUSUM process is maximized only at change
points.
Lemma 8.2.1 Suppose there exists at least one change point between .l and u.
If .k ∗ = sargmax ||Okl,u ||, then .k ∗ = ki for some .i ∈ {1, . . . , R}, with .l ≤ ki ≤
l<k<u
u.
Proof Let .I denote the set of indices of changes points that are between .l and u.
We consider two cases separately: (i) There is one change point between .l and u
and, (ii) There are two or more change points between .l and u.
Case 1: .|I| = 1: Let v denote the single change point between .l and u. Let
the mean functions between .l and .v and .v + 1 and .u be denoted .μ and .μ' ,
respectively. Then we have for .l < k < v,
⎛ ⎞1/2
u−l
||Okl,u || =
. (k − l)||μ|| (8.2.43)
(u − k)(k − l)
⎛ ⎞
k − l 1/2
= (u − l)1/2 ||μ||.
u−k

From this it is clear that .||Okl,u || is either a monotonically increasing or


identically zero as a function of k for .l < k < v. Similarly, for any k such
that .v < k < u, we have
⎛ ⎞1/2
u−l
||Okl,u || = ||Mk − Ml ||
(u − k)(k − l)
⎛ ⎞1/2
u−l
. = ||Mu − Mk || (8.2.44)
(u − k)(k − l)
⎛ ⎞
u − k 1/2 '
= (u − l)1/2 ||μ ||,
k−l

which is monotonically decreasing or identically zero as a function of k. More-


over, .||Okl,u || cannot be zero over the entire segment .{l, . . . , u} as .||μ − μ' || ≥
ΔN > 0 implies .max(||μ||, ||μ' ||) > ΔN /2 > 0. Thus, .||Ovl,u || = max ||Okl,u ||,
l<k<u
and the maximum v is unique.
Case 2: .|I| > 1 : Let .v = ki0 +r and .v ' = ki0 +r+1 denote two consecutive change
points between .l and u. Define .dj = (ki0 +j − l)/(u − l) for .j ∈ I, and let .d1∗ =
8.2 Estimating Change Points 453

(v−l)/(u−l), .d2∗ = (v ' −l)/(u−l) denote the break fractions of the consecutive
change points v and .v ' under consideration. Then .0 = d0 < d1 < . . . < dβ <
dβ+1 = 1. Therefore, for any k between v and .v ' , we may rewrite .Okl,u as

⎛ ⎞1/2
u−l
Okl,u =
. (Mk − Ml )
(u − k)(k − l)
E r ⎛ ⎞
k−l
(dj − dj −1 )μi0 +j + − dr μi0 +r+1
u−l
j =1
= (u − l)1/2 ⎛ ⎞ .
k − l u − k 1/2
u−lu−l

Let .x = x(k) = (k − l)/(u − l). Then


|| || 2
|| r ||
||E ||
|| (dj − dj −1 )μi +j + (x − dr )μi +r+1 ||
|| 0 0 ||
||j =1 ||
||Okl,u ||2 = (u − l)
.
x(1 − x)
= (u − l)h(x). (8.2.45)
E
Let .s(x) = || rj =1 (dj − dj −1 )μi0 +j + (x − dr )μi0 +r+1 ||2 , the numerator of
.h(x). Expanding s yields

⎛ ⎞2
Er
.s(x) = ⎝ (dj − dj −1 )||μi0 +j || + (x − dr )||μi0 +r+1 ||⎠ (8.2.46)
j =1

E
r ⎛ ⎞
+ 2(x − dr ) (dj − dj −1 ) <μi0 +j , μi0 +r+1 > − ||μj ||||μi0 +r+1 ||
j =1
( )
= a'x + b ' 2
+ 2(x − dr )t,
Er
where .a ' = ||μi0 +r+1 ||, .b' = j =1 (dj − dj −1 )||μi0 +j || − dr a ' and
Er ⎛ ⎞
.t = j =1 (dj − dj −1 ) <μi0 +j , μi0 +r+1 > − ||μj ||||μi0 +r+1 || . Notice that
by the Cauchy-Schwarz inequality, .t ≤ 0. Moreover, (8.2.46) can be represented
as .s(x) = a '2 x 2 + 2(t + a ' b' )x + b'2 − 2tdr = ax 2 + bx + c. If .h(x) is extended
to the open unit interval as .s(x)/[x(1 − x)]2 , then we now wish to show is that h
achieves a maximum over the interval .[d1∗ , d2∗ ] at either .d1∗ or .d2∗ . The derivative
of .h(x) is

(a + b)x 2 + 2cx − c g(x)


. h' (x) = = , (8.2.47)
[x(1 − x)]2 [x(1 − x)]2
454 8 Functional Data

where .g(x) is a quadratic function with vertex .−c/(a + b), when .(a + b) /= 0.
First notice that .g(0) = −c = −(b'2 − 2tdr ) ≤ 0. Consider three scenarios: (i)
.a + b = 0. (ii) .a + b > 0, and (iii) .a + b < 0.

Scenario 1: .a + b = 0 : First we claim that in this case .c /= 0. If .c = 0, then


we have .b' = t = 0. Notice that, .a + b = a '2 + 2(t + a ' b' ) = a '2E . Therefore,
'
.a + b = 0 and .c = 0 implies .a = ||μi0 +r+1 || = 0 and .b =
' r
j =1 (dj −
dj −1 )||μi0 +j || − 0 = 0. Combining these lead to .μi0 +1 = μi0 +2 = . . . =
μi0 +r+1 = 0 which contradicts Assumption 8.2.3 and the assumption that .|I| ≥
2. Under this scenario, .g(x) = c(2x − 1) and .g(0) = −c < 0. This implies .h(x)
decreases on .[0, 0.5] and increases on .[0.5, 1].
Scenario 2: .a + b > 0 : the vertex of .g(x) is .−c/(a + b) which is negative
in this case. Therefore, .g(x) is negative from 0 to some real number .x0 , and is
positive from .x0 to infinity if .c > 0. If .c = 0, then .g(x) is always positive, which
implies .h(x) strictly increases on .[0, 1].
Scenario 3: .a + b < 0 : In this scenario, the vertex .−c/(a + b) will be positive
and the maximum of .g(x) is .−(c2 /(a + b) + c). If the maximum of .g(x) is
negative, then .g(x) is always negative and .h(x) is strictly decreasing. Otherwise,
we have .−(c2 /(a + b) + c) > 0 which implies .c/(a + b) < −1. The roots of
.g(x) are

⎛ ⎛ ⎞⎞1/2
c c c
.x1 = − +1 − , and
a+b a+b a+b
⎛ ⎛ ⎞⎞1/2
c c c
x2 = +1 − .
a+b a+b a+b

Evidently .x2 > 1. Therefore, .g(x) is either positive from 0 to 1 or negative from
0 to .x1 and positive from .x1 to 1. Once again, .h(x) over .[0, 1] is either strictly
increasing, or decreasing and then increasing, respectively.
It follows in all cases that .h(x) is maximized over .[d1∗ , d2∗ ] at either .d1∗ or .d2∗ ,
which implies the statement of the lemma.


Remark 8.2.3 We note that the proof of Lemma 8.2.1 implies that the derivative
of h defined in (8.2.46) is non-zero at the point .d1∗ and/or .d2∗ maximizing h on the
interval .[d1∗ , d2∗ ], since if .h' has a zero at .x0 ∈ [d1∗ , d2∗ ], h must be decreasing to
the left of .x0 , and increasing to the right of .x0 , so that .x0 cannot coincide with the
maxima. This implies that a linear approximation of h at its maxima has a slope
bounded away from zero.
Lemma 8.2.2 If on the sub-sample with indices between .l and u,

l < ki0 +r − mN < ki0 +r + mN < u for some r ∈ I,


. (8.2.48)
8.2 Estimating Change Points 455

where .mN ≤ ζ N mini∈{1,...,R} θi+1 − θi , for some .ζ ∈ (0, 1), then

(u − l)1/2 ΔN mN
. max ||Mk − Ml || ≥ .
l<k<u ((u − k)(k − l))1/2 2 N 1/2

Proof Let .a = max ||Mk − Ml ||. Then we aim to show that .a ≥ mN ΔN /4 under
l<k<u
condition (8.2.48). Let .v = ki0 +r and .v ' = ki0 +r+1 (if v is the right-most change
point between .l and u, let .v ' = u). Further assume that .E[Xv (t)] = μ(t) and
' '
.E[Xv ' (t)] = μ (t). Since .||μ − μ || ≥ ΔN , we get by the reverse triangle inequality
( '
)
that .max ||μ||, ||μ || ≥ ΔN /2. Moreover, (8.2.48) and the definition of .mN imply
that there is no additional change point between .[v − mN , v) and .(v, v + mN ]. Since
then .Mv (t) − Mv−mN (t) = mN μ(t), and .Mv+mN (t) − Mv (t) = mN μ' (t), we get
that
ΔN
. max(||Mv − Mv−mN ||, ||Mv+mN − Mv ||) ≥ mN . (8.2.49)
2
ΔN
Then we claim that . max ||Mk − Ml || ≥ 4 mN . If not,
l<k<u

ΔN
||Mv+mN − Ml || <
. mN , ||Mv − Ml ||
4
ΔN ΔN
< mN , and ||Mv−mN − Ml || < mN . (8.2.50)
4 4
These with the triangle inequality contradict (8.2.49). Furthermore, because

u−l 4
. ≤ ,
(u − k)(k − l) N

we have
⎛ ⎞1/2
u−l ΔN mN
. max ||Mv − Ml || ≥
l<k<u (u − k)(k − l) 2 N 1/2



The final lemma required describes conditions on the sub-samples under which
the binary segmentation algorithm will terminate.
Lemma 8.2.3 Suppose that the sub-sample .l and u are such that for a subset of the
sample space .AN , .maxl<k<u ||Wkl,u || ≤ aN and one of the following conditions are
satisfied:
(i) .β = 0, .ki0 < l < u < ki0 +1 ,
(ii) .β = 1, .min(ki0 +1 − l, u − ki0 +1 ) ≤ fN ,
456 8 Functional Data

(iii) .β = 2, .max(ki0 +1 − l, u − ki0 +2 ) ≤ fN . Then on the set .AN , for a positive


1/2
constant .c1 , . max ||Zkl,u || ≤ c1 max{fN , aN }.
l<k<u

Proof Let .maxl≤k≤u ||Okl,u || = Ol,u .

(i) .β = 0 : Since there is no change point in this case, then on .AN , .||Zkl,u || =
||Wkl,u || ≤ aN .
(ii) .β = 1 : In this case according to Lemma 8.2.1,
⎛ ⎞1/2
ki +1 ki0 +1 − l
Ol,u =
. ||Ol,u0 || = (u − l) 1/2
||μ||
u − ki0 +1
√ √ 1/2
≤ 2B(min(ki0 +1 − l, u − ki0 +1 ))1/2 ≤ 2CfN .

Therefore, . max ||Zkl,u || ≤ max ||Okl,u || + max ||Wkl,u || ≤ Ol,u + aN ≤


l<k<u l<k<u l<k<u
1/2
c2 max{fN , aN }.
ki +1 ki +2 √
(iii) .β = 2 : In this case .Ol,u = max(||Ol,u0 ||, ||Ol,u0 ||) ≤ 2c3 (max(ki0 +1 −
√ 1/2
l, u − ki0 +2 ))1/2 ≤ 2c4 fN . Similarly then in the last case, . max ||Zl,u || ≤
l<k<u
1/2
c5 max{fN , aN }.


Proof of Theorem 8.2.2 Let .K denote the set of change points. The binary segmen-
tation procedure starts with .l = 0 and .u = N . We note that for any starting and
ending indices we have by the triangle inequality that
|| E || || E ||
|| 1
k
|| || 1
u
||
.||Wl,u ||
k ||
≤ || | | |
εj || + ||| εj ||||
(k − l)1/2 (u − k)1/2
j =l+1 j =k+1

= ||Wkl,u,1 || + ||Wkl,u,2 ||. (8.2.51)

It follows by the maximal inequality Theorem A.3.1 that .max1≤k≤N ||Wk1,N,1 || =


OP ((log N )1/ν ) and .max1≤k≤N ||Wk1,N,2 || = OP ((log N )1/ν ), giving
.max1≤k≤N ||W
1,N || = OP ((log N)
k 1/ν ). As long as the number of change points
.R ≥ 1, this in conjunction with the definition of the threshold .ρN and Lemma 8.2.2

imply that a change point is initially detected with probability converging to one,
i.e. .P {max1≤j ≤N ||Zk1,N || > ρN } → 1 as .N → ∞. First then we wish to show that

dist(k̂1 , K) = OP (1), where k̂1 = sargmax ||Zk1,N ||.


. (8.2.52)
1<k<N
8.2 Estimating Change Points 457

Let .Ol,u = maxl≤k≤u ||Okl,u ||, and .Kmax = {ki : ||Ok1,N


i
|| = O1,N }. Evidently
for any .ki ∈ Kmax , .k̂1 = sargmax ||Zk1,N ||2 − ||Zk1,N
i
||2 . It follows as in the proof
1<k<N
of (8.2.6) that there are constants .αi and .βi with .0 < ki−1 < N αi < ki <
Nβi < ki+1 < 1, so that with .LN = ∪i : ki ∈Kmax {⎣Nαi ⎦, . . . , ⎣Nβi ⎦}, and .k̃1 =
sargmax ||Zk1,N ||, then .P (k̂1 = k̃1 ) → 1. Hence it is enough to prove (8.2.52) with
k∈LN
k̃1 replacing .k̂1 . Let .Ii,N (M) = {ki−1 +M, . . . , ki −M, ki +M, . . . , ki+1 −M}∩LN .
.

Note that since .maxk∈LN ||Zk1,N ||2 − ||Zk1,N


i
||2 ≥ 0 for all .ki ∈ Kmax ,
E
P (dist(k̃1 , K) > M) ≤
. P ( max ||Zk1,N ||2 − ||Zk1,N
i
||2 ≥ 0).
k∈Ii,N (M)
i : ki ∈Kmax
(8.2.53)

Below we let .r(k) = N/[k(N − k)], so that comparing to (8.2.40) and (8.2.41),
Ek
Ok1,N = r 1/2 (k)O0,k
.
1,N , and .W1,N = r
k 1/2 (k)W0,k = r 1/2 (k)
1,N j =1 (εj − ε̄). With
this notation we write for .k < ki

E
5
.||Zk1,N ||2 − ||Zk1,N
i
||2 = Ai,N (k),
i=1

where

A1,N (k) = [r(k) − r(ki )]||W0,k


.
1,N || ,
2
(8.2.54)
┌ ┐
A2,k = −r(ki ) ||W0,k
1,N ||
i 2
− ||W 0,k 2
1,N || ,

A3,N (k) = −2r(ki )<O0,k 0,ki 0,k


1,N , W1,N − W1,N >,
i

0,ki ki
A4,k = 2<r(k)O0,k 0,k
1,N − r(ki )O1,N , W1,N > and A5,N (k) = ||O1,N || − ||O1,N || .
k 2 2

Consider .k ∈ {⎣Nαi ⎦, . . . , ki } and .k ∈ {ki , . . . , ⎣Nβi ⎦}. According to the proof


of Lemma 8.2.1, .αi and .βi may be chosen additionally so that .||Ok1,N ||2 is strictly
increasing over .{⎣Nαi ⎦, . . . , ki } and strictly decreasing over .{ki , . . . , ⎣Nβi ⎦}. Rea-
soning as in Remark 8.2.3 and using the mean value theorem, it follows that there
are positive constants .c1 , c2 , c3 and .c4 so that

. − c1 (ki − k) < ||Ok1,N ||2 − ||Ok1,N


i
||2 ≤ −c2 (ki − k), (8.2.55)
k ∈ {⎣Nαi ⎦, . . . , ki },

.c3 (ki − k) < ||Ok1,N ||2 − ||Ok1,N


i
||2 ≤ c4 (ki − k), (8.2.56)
k ∈ {ki , . . . , ⎣Nβi ⎦}.
458 8 Functional Data

Using the right-hand inequalities of (8.2.55) and (8.2.56), for i such that .ki ∈ Kmax
there exists a positive constants .c5,i so that .maxk∈Ii,N (M), k≤ki A5,N (k) ≤ −c5,i M.
It follows then that

. lim lim sup max A5,N = −∞. (8.2.57)


M→∞ N →∞ k∈IN,i (M), k<ki

We now aim to show that .A5,N is the dominate term in (8.2.54). By the mean value
theorem, we have .|r(k) − r(ki )| ≤ c6 |k − ki |/N 2 , k ∈ LN . It follows from this and
the left-hand bounds in (8.2.55) and (8.2.56) that for all M,
|| ||2
|A1,N (k)| c7 || 1 0,k ||
max ≤ max || W ||
N 1≤k≤N || N 1/2 1,N ||
.
k∈IN,i (M), k<ki |A5,N (k)|
⎛ ⎞
(log N)2/ν
= OP = oP (1). (8.2.58)
N

Regarding .A2,N , note that .r(ki ) ≤ c8 /N , and by the Cauchy-Schwarz and


triangle inequalities we have for .k < ki ,
/ \
┌ ┐ E
ki
|A2,N (k)| = r(ki )
. ||W0,k
1,N ||
i 2
− ||W0,k
1,N ||
2
= r(ki ) (εi − ε̄), W0,k
1,N + W0,k
1,N
i

i=k+1
|| ||
|| E ki || ┌ ┐
||
c8 || ||
≤ || (εi − ε̄)||
|| ||W0,k
0,ki
1,N || + ||W1,N || . (8.2.59)
N || ||
i=k+1

0,k
We have as above that .max1≤k≤N |||W1,N ||/N 1/2 = OP ((log N)1/ν ), and
0,ki
|||W1,N
. ||/N 1/2 = OP (1). Additionally, for .ζ ∈ (1/2, 1) and using the triangle
inequality and the definition of .IN,i (M),
|| ||
|| ||
|| 1 Eki
||
max || εi − ε̄||
.
||
k∈IN,i (M), k<ki || ki − k ||
i=k+1 ||
|| ||
|| ||
|| 1 Eki
||
≤ || εi ||
max
k∈IN,i (M), k<ki || − || + ||ε̄|| (8.2.60)
|| i
k k
i=k+1 ||
|| ||
|| ||
1 || 1 E
ki
||
≤ 1−ζ max || ε ||
i || + ||ε̄||.
||
k∈IN,i (M), k<ki || (ki − k) ζ
M ||
i=k+1
8.2 Estimating Change Points 459

Since the error terms are assumed to be .Lν −decomposable, .||ε̄|| = oP (1), and by
the stationarity of the errors and .Lν -decomposability,
|| || || ||
|| Eki || ||E ||
|| 1 || D 1 || k ||
|| εi || || εj ||
. max
k∈IN,i (M), k<ki || − ζ || = M≤k<k
max
−1 ζ || ||
|| i
(k k)
i=k+1 ||
i k ||j =1 ||
|| ||
||E ||
1 || k
||
≤ sup ζ || || ε ||
j || = OP (1). (8.2.61)
k≥1 k ||j =1 ||

Combining the above we obtain that

|A2,N (k)|
. max
k∈IN,i (M), k<ki |A5,N (k)|
|| || ┌ ┐
|| ki ||
c8 || E || ||W0,k
1,N || ||W0,k
1,N ||
i

≤ max || ||
εi − ε̄ || + (8.2.62)
ki − k ||
k∈IN,i (M), k<ki ||i=k+1 || N N

= oP (1).

Regarding .A3,N , first we note that a simple calculation gives, .r(ki )||O0,k1,N || =
i

O(1). We then have by the Cauchy-Schwarz inequality and (8.2.60) and (8.2.61)
that
|| ||
|| ||
|A3,N (k)| ||
0,ki || 1
E
ki
||
. max ≤ max r(ki )||O1,N || || (εi − ε̄)||
||
k∈IN,i (M), k<ki ki − k k∈IN,i (M), k<ki || ki − k ||
i=k+1

1
≤ OP (1).
M 1−ζ
Therefore for all .x > 0,
⎧ ⎫
|A3,N (k)|
. lim lim sup P max > x = 0. (8.2.63)
M→∞ N →∞ k∈IN,i (M), k<ki |A5,N (k)|

Using similar techniques as above it may also be shown that


⎛ ⎞
|A4,N (k)| (log N)1/ν
. max = OP = oP (1). (8.2.64)
k∈IN,i (M), k<ki |A5,N (k)| N 1/2
460 8 Functional Data

Combining (8.2.54), (8.2.57), (8.2.58), (8.2.62), (8.2.63), and (8.2.64), and applying
symmetric reasoning when .k ≥ ki , we get that

. lim lim sup P ( max ||Zk1,N ||2 − ||Zk1,N


i
||2 ≥ 0) = 0,
M→∞ N →∞ k∈IN,i (M)

which combined with (8.2.53) implies that .dist(k̃1 , K) = OP (1), which in turn
implies that .dist(k̂1 , K) = OP (1).
Assume now by way of induction that .1 = k̂0 < k̂1 < · · · < k̂r < k̂r+1 = N + 1,
.r ≤ R, have been estimated so that .max1≤j ≤r dist(k̂j , K) = OP (1), and for some
' '
.α ∈ (0, 1) .P (mini∈{0,...,r} k̂i+1 − k̂i > α N) → 1. Under these conditions,

. max max ||Wk || = OP ((log N)1/ν ). (8.2.65)


i∈{0,...,r} k̂i ≤k≤k̂i+1 k̂i ,k̂i+1

To see this, we have by the triangle inequality that

|| E || || E
k̂i+1 ||
|| 1
k
|| || 1 ||
.||W
k
|| ≤ |||| εj |||| + |||| εj ||||.
k̂i ,k̂i+1 (k − k̂i )1/2 (k̂i+1 − k)1/2 j =k+1
j =k̂i +1

By the strict stationarity of the errors, it is then enough to show that


|| ||
|| 1 E k
||
. ||
max || 1/2 εj |||| = OP ((log N)1/ν ),
1≤k≤k̂1 k
j =1

and
|| E
k̂1 ||
|| 1 ||
. max |||| εj |||| = OP ((log N)1/ν ).
1≤k≤k̂1 (k̂1 − k)1/2 j =k+1

As for the first term, we have again using Theorem A.3.1 that
|| || || ||
|| 1 E k
|| || 1 E k
||
. | |
max || 1/2 | | | |
εj || ≤ max || 1/2 εj |||| = OP ((log N)1/ν ).
1≤k≤k̂1 k 1≤k≤N k
j =1 j =1

As for the second term, let .E > 0, and choose M large enough so that for all N
sufficiently large .P (BN,1 (M)) = P (dist(k̂1 , K) < M) > 1 − E/2. Let for .x > 0
⎧ ⎫
⎨ || || ⎬
|| 1 Ek̂1
||
AxN,1
. = max |||| εj |||| > x .
⎩1≤k≤k̂1 (k̂1 − k)1/2 ⎭
j =k+1
8.2 Estimating Change Points 461

Then for N sufficiently large

P (AxN,1 ) = P (AxN,1 ∩ BN,1 (M)) + P (AxN,1 ∩ BcN,1 ) ≤ P (AxN,1 ∩ BN,1 ) + E/2.


.

We aim to bound the first term on the right-hand side of the above inequality.
According to the definition of .BN,1 (M) and using the union bound and Theo-
rem A.3.1, we have for a positive constant .c10

P (AxN,1 ∩ BN,1 )
. (8.2.66)
⎧ ⎫
⎨ || E
j || ⎬
|| 1 ||
≤P max max |
max ||| ε || > x
⎩i∈{1,...,R} j ∈{ki −M,...,ki +M} 1≤k≤j (j − k)1/2
i || ⎭
i=k+1
⎧ ⎫
E i +M
R kE ⎨ || E
j || ⎬
|| 1 | |
≤ P max |||| ε |
i ||
| > x ≤ c10 log Nx −ν ,
⎩1≤k≤j (j − k)1/2 ⎭
i=1 j =ki −M i=k+1

Setting .x = C(log N)1/ν with a suitably large constant C gives that .P (AxN,1 ∩
BN,1 ) < E/2. These combined imply (8.2.65).
Let .ιN satisfy .(log N)1/ν /ιN +ιN /ρN → 0, .BN,r (M) = {max1≤j ≤r dist(k̂j , K) ≤
M}, and .Ar,N = {maxi∈{0,...,r} maxk̂i ≤k≤k̂i+1 ||Wk || ≤ ιN }. Then if .r = R,
k̂i ,k̂i+1
the conditions of Lemma 8.2.3 are satisfied on each sub-sample on the set
.BN,r (M) ∩ Ar,N , whose probability tends to 1 as .M, N → ∞, with .aN = ιN ,

and .fN = M. This implies that the procedure terminates on this set. If .r < R, one
of the sub-samples determined by .l = k̂j , and .u = k̂j +1 satisfies (8.2.48) with
.mN ≥ c4 N for a positive constant .c4 , and an additional change point is detected.

Letting .k̂ ∗ = sargmax ||Zkl,u ||, and


l<k<u

B∗N (M ∗ ) = {dist(k̂ ∗ , K) > M ∗ ,


. min |k̂ ∗ − k̂i | > α ' N },
{i=0,...,r+1}

we have for .E > 0 and since .limM→∞ lim supN →∞ P (BN,r (M)) = 1 that for any
E > 0,
.

P (B∗N (M ∗ ))
.

≤ P (B∗N (M ∗ ) ∩ BN,r (M1 )) + E/2


E E
≤ P (B∗N (M ∗ ), l = l' , u = u' ) + E/2.
{ki <ki ' , : i ' −i≥2} {|l' −ki |≤M, |u' −ki ' |≤M}

By repeating the same arguments used to establish (8.2.52), it can be shown that
for all .l, u such that .{|l − ki | ≤ M, |u − ki ' | ≤ M}, .limM ∗ →∞ lim supN→∞
462 8 Functional Data

P (B∗N (M ∗ )) = 0, giving that .dist(k̂ ∗ , K) = OP (1), and .P {min{i=0,...,r+1} |k̂ ∗ −


k̂i | > α ' N} → 1, completing the proof. ⨆

Given such a method to produce preliminary estimates of the locations of the
change points, we now consider refining these estimators. Doing so also furnishes
a simple method to produce asymptotically conservative (simultaneous) confidence
intervals for the change points. All that is required of the initial segmentation is that
it satisfies the following basic consistency condition.
Assumption 8.2.4 For all .ε > 0,
⎛ ⎧ ⎫⎞
. lim P {R̂ = R} ∩ max |k̂i − ki | < εN = 1.
N→∞ 1≤i≤R

Using the segmented data, we obtain the CUSUM estimators for a change point
inbetween .k̂l−1 and .k̂l+1 :
⎧⎛ ⎞κ
k̂l+1 − k̂l−1
.k̃l = sargmax
[j − k̂l−1 ][k̂l+1 − j ]
⎛⎛
k̂l−1 <j <k̂l+1
⎞2
E
j E
k̂l+1 ⎫
⎝ j − k̂l−1
× Xi (t) − Xi (t)⎠ dt ,
k̂l+1 − k̂l−1
i=k̂l−1 +1 i=k̂l−1 +1

1 ≤ l ≤ R̂ with .k̂0 = 0 and .k̂R̂+1 = N. In order to describe the asymptotic


.

properties of .k̃l (κ) we define the long-run variance parameter



E ⎛⎛ ⎞ ⎛⎛ ⎞
2
.τl = E E0 (t)hl (t)dt Ej (t)hl (t)dt , 1 ≤ l ≤ R.
j =−∞

Similarly to the presentation in Sect. 8.2.1, we consider both the cases when
ΔN,l → 0 as .N → ∞, as well as when .ΔN,l tends to a constant. In the former
.

case the limiting distribution of .k̃l is the maximal argument of a Gaussian process,
while in the latter it is distributed as the maximal argument of a random walk with
drift constructed from the innovations in (8.2.38).
Theorem 8.2.3 Suppose .κ = 0, the errors in (8.2.38) are .Lν -decomposable, and
Assumption 8.2.3 holds.
(i) If .maxl∈{1,...,R} ΔN,l + 1/(NΔ2N ) → 0 as .N → ∞, then

Δ2N,1 ⎛ ⎞ Δ2N,R ⎛ ⎞
. k̃ 1 − k 1 , . . . , k̃ R − k R (8.2.67)
τ12 τR2
8.3 Change in the Covariance of Functional Observations 463

are asymptotically independent, and

Δ2N,l ⎛ ⎞ D
. k̃l − kl → argmaxt {W (t) − |t|m̄0,l (t)}, (8.2.68)
τl2

where .W (t) is a two-sided Brownian motion, and .m̄κ,l is defined in (2.3.15).


(ii) If instead .ΔN,l → Δl /= 0 as .N → ∞, then (8.2.67) remains true, but instead
of (8.2.68),

D
{ }
k̃l − kl → argmaxj Δl S(j ) − Δ2l |l|m̄0,l (j ) ,
.

where S is defined in (8.2.4).


Sketch of the Proof We can repeat the proof of Theorem 2.3.3 with minor
modifications. Using Assumption 8.2.4 one can show that the CUSUM estimators
computed from .{Xk̂l−1 +1 , Xk̂l−1 +2 , . . . , Xk̂l+1 } and .{Xkl−1 +1 , Xkl−1 +2 , . . . , Xkl+1 }
have the same limit distributions. Hence Theorem 8.2.1 implies the limit result.
The independence follows from the observation that the limit distribution of .k̃l is
determined by the partial sums of .{Ei (t), kl − ⎣Nδ⎦ ≤ i ≤ kl + ⎣Nδ⎦}, where we
can choose .δ > 0 as small as we wish. Due to the .Ei ’s being .Lν -decomposable, we
have the asymptotic independence of .Δ2N,l (k̃l − kl ), 1 ≤ l ≤ R. The proof of this
result is the same as of Theorem 2.3.4, we just need to replace Theorems 2.2.2 and
2.3.3 with Theorem 8.2.3, respectively. ⨆

Confidence intervals for each change point may now be constructed
using (8.2.37).

8.3 Change in the Covariance of Functional Observations

The ideas presented above can be generalized to test for and estimate change points
in other quantities describing the distribution of functional data. In this subsection,
we focus on changes in the covariance or “second order properties” of a functional
time series. We largely omit the proofs of these results in place of references to
source material. In many cases the proofs are similar to those presented above.
Below we use the notation .x ⊗ y for functions .x, y ∈ L2 [0, 1] to denote the
function .(t, s) |→ x(t)y(s) ∈ L2 [0, 1]2 . Often the key observation is that variables
of the form .Xi ⊗ Xi , for instance as might appear in the typical estimator for the
covariance kernel of a functional time series, belong to a separable Hilbert space
when .Xi ∈ L2 [0, 1], and are also .Lν/2 decomposable when .Xi is .Lν -decomposable.
464 8 Functional Data

We consider the following simple single change point model for changes in the
second order properties of a functional data sequence:

μ(t) + Ei (t), if 1 ≤ i ≤ k ∗
Xi (t) =
. (8.3.1)
μ(t) + Ei,A (t), if k ∗ + 1 ≤ i ≤ N,

where .Eεi (t) = Eεi,A (t) = 0. The series .Xi so defined so that it has a constant
mean function .μ, but the structure of the innovations may change at the point .k ∗ . In
order for allow for general serial dependence among the innovations, we assume that
the innovations before and after the change are each .Lν -decomposable Bernoulli
shifts:
Assumption 8.3.1 .Ei (t) = g(ηi , ηi−1 , . . .)(t) and .Ei,A (t) = gN (ηi , ηi−1 , . . .)(t)
for some (deterministic) measurable functions .g, gN : S∞ → L2 , where .{ηj , j ∈
Z} are independent and identically distributed random variables with values in a
measurable space .S, and .Ei (t) = Ei (t, ω) is jointly measurable in .(t, ω), for each
.i ∈ Z. Further .EEi (t) = EEi,A (t) = 0 for all .t ∈ [0, 1], .E||Ei || , E||Ei,A || < ∞
ν ν
2 2
with some .ν > 4, and with some .a > 0,
( ∗
)1/ν ( )1/ν
. E||Ei − Ei,m ||ν2 ≤ am−α , E||Ei,A − Ei,A,m

||ν2
≤ cm−α with some α > 2, (8.3.2)

∗ = g(ηi , . . . , ηi−l+1 , ηi−l∗ , η∗ ∗


where .Ei,l i−l−1 , . . .) and .Ei,A,l = gN (ηi , . . . ,
∗ ∗ ∗ ∗
ηi−l+1 , ηi−l , ηi−l−1 , . . .), with .{ηi , i ∈ Z} .{Ei , i ∈ Z} are independent copies of
.η0 , independent of .{ηl , l ∈ Z}.

Let

. C(t, s) = Cov(Xi (t), Xi (s)) = EEi (t)Ei (s), 1 ≤ i ≤ k∗,

denote the covariance kernel of the observations before a potential change point .k ∗ ,
and let

CA (t, s) = Cov(Xi (t), Xi (s)) = EEk,A (t)Ek,A (s),


. k∗ + 1 ≤ k ≤ N

denote the covariance function after the change. We use the notation .CΔ (t, s) =
C(t, s) − CA (t, s) to denote the difference between the covariance kernels before
and after the change point. Testing for changes in the covariance function can be
framed as a hypothesis test of

H0 : ||CΔ || = 0,
. (8.3.3)

versus

HA : ||CΔ || > 0.
. (8.3.4)
8.3 Change in the Covariance of Functional Observations 465

Before proceeding, we consider several common examples of functional time


series models, and show that changes in their parameters give rise to observations
that follow model (8.3.1) and Assumption 8.3.1.
Example 8.3.1 We consider Functional AutoRegressive models of order 1
(FAR(1)). We assume that .{Ei (t), i ∈ Z, t ∈ [0, 1]} is the stationary solution
of

.Ei (t) = K(t, s)Ei−1 (s)ds + ηi (t). (8.3.5)

It may be shown that the equation in (8.3.5) admits a unique, non-anticipative


stationary solution if

||K||2 < 1,
. (8.3.6)

and

{ηi (t), 0 ≤ t ≤ 1, i ∈ Z} are independent and identically distributed


. (8.3.7)
random functions with Eη0 (t) = 0 and E||η0 ||ν2 < ∞ with some ν > 4.

We define the random functions .Ei,A (t), i ∈ Z as the unique, non-anticipative


stationary solution of

Ei,A (t) =
. KN (t, s)Ei−1,A (s)ds + ηi (t), (8.3.8)

with

KN (t, s) = K(t, s) + aN k(t, s).


.

If, in addition to (8.3.5), (8.3.6), and (8.3.7), we assume .aN > 0 is sufficiently small
so that

||K||2 + aN ||k||2 < 1.


.

Let
⎛ ⎛ | |
n
K(l) (x1 , xn+1 ) =
. ··· K(x1 , x2 )K(x2 , x3 ) . . . K(xn , xn+1 ) dxi
i=2

and the associated operator be



K [f ](t) =
.
(l)
K(l) (t, s)f (s)ds, l ≥ 1 and K(0) [f ](t) = f (t).
466 8 Functional Data

Defining analogous quantities with .KN in place of .K, the stationary solutions
to (8.3.5) and (8.3.8) may be written as

E
Ei (t) =
. K(l) [ηi ](t),
l=0

and

E (l)
.Ei,A (t) = KN [ηi ](t).
l=0

For more details see Bosq (2000) and Horváth and Kokoszka (2012). Noting
(l)
that both the norms of the operators .K(l) and .KN decay geometrically in .l,
Assumption 8.3.1 holds in this case. Let

l ⎛
E ⎛ k−1
| |
L (x1 , xn+1 ) =
.
(l)
··· K(xi , xi+1 )k(xk , xk+1 )
k=1 i=1

| |
l | |
l
× K(xj , xj +1 ) dxm ,
j =k+1 m=2

| |
with . i∈∅ = 1. The corresponding operator is

L(l) [f ](t) =
. L(l) (t, s)f (s)ds, l ≥ 1 and L(0) [f ](t) = f (t).

Elementary calculations give

Ei,A (t) − Ei (t) = aN δi (t) + aN


.
2
Ri,N (t)

with

E
.δi (t) = L(l) [ηi−l ](t)
l=0

and

. lim sup E||R0,N ||ν2 < ∞.


N →∞

From this it follows that

||CΔ − aN (EE0 ⊗ δ0 + Eδ0 ⊗ E0 )||2 = O(aN


.
2
).
8.3 Change in the Covariance of Functional Observations 467

The results for FAR(1) processes can be extended to FAR.(p) processes, as well
as general, linear processes.
Example 8.3.2 Example 8.3.1 can be extended to linear processes defined as

E
.Ei (t) = Ll [ηi−l ](t)
l=1

and

E
. Ei,A (t) = LN,l [ηi−l ](t),
l=1

where .Ll [f ](t) and .LN,l [f ](t) are the integral operators associated with the
functions .Ll (t, s) and .LN,l (t, s). It is assumed that


E
. ||Ll ||2 < ∞.
l=1

For the sake of simplicity we assume

LN,l (t, s) = Ll (t, s) + aN zl (t, s) + aN


.
2
GN,l (t, s),

and

E
. ||zl ||2 < ∞
l=1


E
. lim sup ||GN,l ||2 < ∞
N →∞ l=0

hold. It follows from elementary calculation that



||CΔ − aN (Āl + Āl )||2 = O(aN
.
2
)

where
∞ ⎛ ⎛
E

. l (t, s) = Ll (t, u)(EE0 (u)E0 (v))Ll (v, s)dudv,
l=0


and Ā
. l (s, t) = Āl (t, s) denotes the adjoint kernel.

The last example is an example of a non-linear functional time series process.


468 8 Functional Data

Example 8.3.3 Following Aue et al. (2017), Cerovecki et al. (2019), and Küchnert
(2020), we define the functional GARCH(1,1) (FGARCH(1,1)) process

Ei (t) = σi (t)ηi (t), t ∈ [0, 1],


.

where .Eηi (t) = 0, .Eηi2 (t) = 1, and


⎛ ⎛
σi2 (t) = ω(t) +
.
2
α(t, s)Ei−1 (s)ds + 2
β(t, s)σi−1 (s)ds, (8.3.9)

and the non-negative parameter functions .ω, .α and .β satisfy the regularity condi-
tions of Theorem 1 of Aue et al. (2017), which imply that a stationary solution .Ei
satisfying (8.3.9) exists in the function space .C[0, 1] of continuous functions defined
on the unit interval. One of these conditions in particular is that .inf0≤t≤1 ω(t) > 0.
A change in the variance of the process may be modelled by changes in these
parameter functions. For example, a “level shift” in the pointwise variance of the
functional observations is induced as in (8.3.1) by setting .Ei,A (t) = σi,A (t)ηi (t),
with
⎛ ⎛
.σi,A (t) = ω(t) + aN c(t) + (s)ds + β(t, s)σi−1
2 2 2
α(t, s)Ei−1 (t, s)ds,

and the function .c is taken to satisfy .inf0≤t≤1 c(t) > 0. Since the stationary solution
σ0 of (8.3.9) is independent of .η0 , we get
.

( )
CΔ (t, s) = Eσi,A (t)σi,A (s) − Eσi (t)σi (s) Eη0 (t)η0 (s),
.

and by the mean-value theorem


|| ||2
|| aN c ||
||
.E σi,A − σi − || = O(a 4 ).
|| 2σi ||2 N

From this it may be shown that


⎛⎛ ⎛ ⎧ ⎛ ⎞ ⎛ ⎞ ⎫ ⎞2
1 σ0 (s) σ0 (t)
. CΔ (t, s) − E c(t) + E c(s) aN dtds = O(aN
4
).
2 σ0 (t) σ0 (s)

Therefore a change of magnitude .aN to the level of the conditional variance process
induces a change of the same magnitude in the covariance functions. A similar
change arises when any of the other parameter functions are changed, and it may
be shown as above that if
⎛ ⎛
.σA,i (t) = ω+
2 2 2
(α(t, s)+aN δ1 (t, s))Ei−1 (s)ds+ (β(t, s)+aN δ2 (t, s))σi−1 (s)ds,
8.3 Change in the Covariance of Functional Observations 469

then as .aN tends to zero there exists a nonzero kernel .A so that

||CΔ − aN A||2 = O(aN


.
2
).

In order to test .H0 versus .HA , we consider CUSUM processes of partial sample
estimates of the covariance kernels: for .u, t, s ∈ [0, 1] we let
⎛ ⎣N
E u⎦
ZN (u, t, s) = N −1/2
. (Xi (t) − X̄N (t))(Xi (s) − X̄N (s))
i=1

⎣Nu⎦ E
N
− (Xi (t) − X̄N (t))(Xi (s) − X̄N (s)) ,
N
i=1

where .X̄N (t) is the sample mean. The asymptotic long-run covariance function of
ZN (u, t, s) contains the term
.


E
D(t, t ' , s, s ' ) =
. cov(E0 (t)E0 (s), El (t ' )El (s)), t, t ' , s, s ' ∈ [0, 1].
l=−∞
(8.3.10)
The following result describes the asymptotic properties of .ZN :
Theorem 8.3.1 If Assumption 8.3.1 holds with .gN = g, which implies that .H0
holds, then we can define a sequence of Gaussian processes .{┌N (u, t, s), 0 ≤
u, t, s ≤ 1} such that
⎛⎛
. sup (ZN (u, t, s) − ┌N (u, t, s))2 dtds = oP (1)
0<u<1

with .E┌N (u, t, s) = 0 and .E┌N (u, t, s)┌N (u' , t ' , s ' ) = (min(u, u' ) −
uu' )D(t, t ' , s, s ' ).
The proof is similar to that of Theorem 8.1.1. The following result is a “covariance
analog” of Theorem 8.1.2.
Theorem 8.3.2 If Assumption 8.3.1 holds with .gN = g, which implies that .H0
holds, and .0 ≤ κ < 1/2 hold, then with the Gaussian processes .{┌N (u, t, s), 0 ≤
u, t, s ≤ 1} of Theorem 8.3.1 we have
⎛⎛
1
. sup (ZN (u, t, s) − ┌N (u, t, s))2 dtds = oP (1).
1/(N +1)<u<1−1/(N +1) [u(1 − u)]

In many practical settings, before testing for and estimating changes in the
covariance kernel, a second order property, we first test for and estimate changes
in the mean function. It is natural to attempt to re-center the data based on estimates
of the mean change. Letting .k (∗,m) denote the time of a change in the mean as in
470 8 Functional Data

model (8.1.1), which we estimate by .k̄ (m) , we would then compute the estimated
errors

Xi (t) − X̄k̄ (m) ,1 (t), if 1 ≤ i ≤ k̄ (m) ,
.Ēi (t) =
Xi (t) − X̄k̄ (m) ,2 (t), if k̄ (m) + 1 ≤ i ≤ i ≤ N,

where
(m)
1 E E
k̄ N
1
X̄k̄ (m) ,1 (t) =
. Xi (t) and X̄k̄ (m) ,2 (t) = Xi (t).
k̄ (m) N − k̄ (m)
i=1 i=k̄ (m) +1
(8.3.11)

We then modify the covariance CUSUM process to reflect the different centraliza-
tion applied:

⎛ ⎣N
E ⎞
⎣Nu⎦ E
u⎦ N
ẐN (u, t, s) = N −1/2
. Ēi (t)Ēi (s) − Ēi (t)Ēi (s) .
N
i=1 i=1

The process .ẐN has the same limiting behaviour as .ZN of Theorem 8.3.1 so long as
the change point in the mean is estimated consistently with a rate .oP (N ).
Theorem 8.3.3 If Assumption 8.3.1 is satisfied with .gN = g, and if there is at most
one change in the mean at location .k (∗,m) with estimator .k̄ (m) satisfying

|k̄ (m) − k (∗,m) | = oP (N ),


.

then
⎛⎛ ⎛ ⎞2
. sup ẐN (u, t, s) − ┌N (u, t, s) dtds,
0<u<1

where .{┌N (u, t, s), 0 ≤ u, t, s ≤ 1} is defined in Theorem 8.3.1.


The proof goes along the lines of the arguments used in Sect. 3.2 and it is omitted.
The following result describes the consistency properties of tests for .H0 versus
.HA based on the average and supremum functionals of .ZN .

Theorem 8.3.4 We assume that .HA of (8.3.4) holds along with Assumption 8.3.1,
and

. lim sup ||CA ||2 < ∞.


N →∞
8.3 Change in the Covariance of Functional Observations 471

(i) If

NθN2 (1 − θN )2 ||CΔ ||22 → ∞,


.

then
⎛⎛⎛
1 P 1
.
2
ZN (u, t, s)dtdsdu → .
N (θN (1 − θN )) 2
||CΔ ||22 3

(ii) If .0 ≤ κ < 1/2 and

N(θN (1 − θN ))2−2κ ||C − CA ||22 → ∞,


.

then
1
.
N(θN (1 − θN ))2−κ ||C − CA ||22
⎛⎛
1 P
× sup ZN 2
(u, t, s)dtds → 1.
0<u<1 [u(1 − u)]

The proof is similar to that of Theorem 8.1.3.


Using Theorems 8.3.1 and 8.3.4, we can apply the techniques discussed in
Sect. 8.1 to estimate the critical values of
⎛⎛⎛ 2 (u, t, s)
ZN
TN (κ) =
. dudtds
[u(1 − u)]2κ

under .H0 , and establish the asymptotic consistency of tests based on .TN (κ). For
example in order to estimate the null distribution of .TN = TN (0), let .{┌(u, t, s), 0 ≤
u, t, s ≤ 1} be a Gaussian process with zero mean and .E┌(u, t, s)┌(u' , t ' , s ' ) =
(min(u, u' ) − uu' )D(t, t ' , s, s ' ), where .D is defined in (8.3.10). Noticing that the
covariance function of .┌ is a product of .min(u, u' )−uu' and the covariance function
of a Brownian bridge, it may be shown that
⎛⎛⎛ ∞
E
D λl
. ┌ 2 (u, t, s)dudtds = N2 ,
(π k)2 k,l
k,l=1

where .{Nk,l , 1 ≤ k, l < ∞} are independent, identically distributed standard


normal random variables, and .λ1 ≥ λ2 ≥ . . . satisfy
⎛⎛
λi φi (t, s) =
. D(t, t ' , s, s ' )φi (t ' , s ' )dt ' ds ' , 1 ≤ i < ∞,
472 8 Functional Data

where .<φi , φj > = 1{i = j }. The long-run covariance defined in (8.3.10) can also be
estimated from the sample. In order to do so, we define

zi (t, s) = [Xi (t) − X̄N (t)][Xi (s) − X̄N (s)].


.

The corresponding autocovariance of the .zi ’s at lag .l are defined as

γ̂l (t, t ' , s, s ' )


.

⎪ N −l

⎪ 1 E

⎪ (zi (t, s) − z̄N (t, s))(zi+l (t ' , s ' ) − z̄N (t ' , s ' )), if l ≥ 0
⎨N −l
i=1
=

⎪ 1 EN

⎪ (zi (t, s) − z̄N (t, s))(zi+l (t ' , s ' ) − z̄N (t ' , s ' )), if l < 0,

⎩ N − |l|
i=−(l−1)

where

1 E
N
z̄N (t, s) =
. zi (t, s).
N
i=1

The kernel estimator for .D of (8.3.10) is

E
N −1 ⎛ ⎞
' ' l
D̂N (t, t , s, s ) =
. K γ̂l (t, t ' , s, s ' ).
h
l=−(N −1)

If h and K satisfy Assumptions 3.1.4 and 3.1.5, one can prove following the proof
of Theorem 8.1.4 that under the null hypothesis of no change point, we have
⎛⎛⎛⎛
. (D̂N (t, t ' , s, s ' ) − D(t, t ' , s, s ' ))2 dtdt ' dsds ' = oP (1) (8.3.12)

and
⎛⎛⎛⎛
. D̂2N (t, t ' , s, s ' )dtdt ' dsds ' = OP (h2 ||CΔ ||22 ) (8.3.13)

under the alternative. Now we can estimate the eigenvalues of .D with .λ̂1 ≥ λ̂2 ≥
. . . , the empirical eigenvalues of .D̂N defined as the solutions of the eigenvalue
problem:
⎛⎛
. λ̂i φ̂i (t, s) = D̂N (t, t ' , s, s ' )φ̂i (t ' , s ' )dt ' ds ' , 1 ≤ i < N.
8.3 Change in the Covariance of Functional Observations 473

We may then use



E
d
λ̂l
. T̂N,0 (d) = N2 ,
(π k)2 k,l
k,l=1

as an approximation for the null distribution of .TN . An approximate .α sized test of


H0 versus .HA is to reject .H0 when .TN exceeds the .1 − α quantile of .T̂N,0 (d), which
.

can be approximated via simulation. The consistency of this procedure follows


from (8.3.12) and (8.3.13). This procedure can also be adapted to adjust for a change
in the mean by replacing .zi with

(Xi (t) − X̄k̄ (m) ,1 (t))(Xi (s) − X̄k̄ (m) ,1 (s)), 1 ≤ i ≤ k̄,
zi (t, s) =
.
(Xi (t) − X̄k̄ (m) ,2 (t))(Xi (s) − X̄k̄ (m) ,2 (s)), k̄ + 1 ≤ i ≤ N,

where .X̄k̄ (m) ,1 and .X̄k̄ (m) ,2 are defined in (8.3.11).


Next we consider estimating .k ∗ of (8.3.1). Our estimator for the time of change
in the covariance function is
⎧⎛ ⎞κ
⎛⎛ ⎛E
k
N
k̂N = k̂N (κ) = sargmax
. (Xi (t)
k∈{1,...,N −1} k(N − k)
i=1

− X̄N (t))(Xi (s) − X̄N (s)) (8.3.14)


⎞2 ⎫
k E
N
− (Xi (t) − X̄N (t))(Xi (s) − X̄N (s)) dt .
N
i=1

We only consider asymptotic behaviour of .k̂N when the size of the change .||CΔ ||
tends to zero with the sample size. As we have seen earlier, in this case the limit
distribution of the time of change will be the maximal argument of a Gaussian
process depending on a small number of parameters.

Assumption 8.3.2 .||CΔ ||2 → 0 and .N ||CΔ ||22 → ∞.


The next assumption yields that the change in the structure of the errors is not too
large:
Assumption 8.3.3 with some .ν > 4
|| ||ν
E ||Ei − Ei,A ||2 → 0,
. as N → ∞.
474 8 Functional Data

Under Assumption 8.3.3 the covariance function .CA is close to .C in the .L2 sense if
the sample size N is large. We also assume that the standardized difference has a
limit:

Assumption 8.3.4 there is .C∗ (t, s) ∈ L2 ([0, 1] × [0, 1]) such that
⎛⎛ ⎛ ⎞2
CΔ (t, s)
. − C∗ (t, s) dtds = o(1).
||CΔ ||2

Let

τ2
. (8.3.15)

E ⎛⎛ ⎛ ⎛⎛ ⎞
= Cov E0 (t, s)C∗ (t, s)dtds, El (t, s)C∗ (t, s)dtds .
l=−∞

Theorem 8.3.5 We assume that .HA of 8.3.4 holds, and Assumptions 2.1.1
and 8.3.1–8.3.4 are satisfied.
(i) If .0 ≤ κ < 1/2, then

||CΔ ||22 ⎛ ∗
⎞ D
. k̂ N − k → ξ(κ).
τ2
(ii) If in addition

N 1/2 ||CΔ ||2 (log N)−2/ν → ∞


.

holds, then

||CΔ ||22 ⎛ ∗
⎞ D
. k̂ N − k → ξ(1/2),
τ2
where .ν is from Assumption 8.3.1 and .ξ(κ) is defined in (2.2.3).
The proof is similar to that of Theorems 2.2.1 and 8.2.1, and can be found in
Horváth et al. (2022).
In order to apply Theorem 8.3.5 to, for example, produce confidence intervals for
∗ 2
.k , we require and estimate of .τ in (8.3.15). This is the long-run variance of the

variables
⎛⎛
.ej = Ej (t)Ej (s)CΔ (t, s)dtds

so .τ 2 can be estimated by a kernel long-run variance estimator. As .ej is not


observed, we replace .Ej (t)Ej (s) .[Xj (t)− X̄(t)][Xj (t)− X̄(t)] with the residuals and
8.3 Change in the Covariance of Functional Observations 475

CΔ (t, s) with the standardized difference between the sample covariance kernels
.

before and after .k̂N .


These results can also be extended to multiple changes in the covariance function
as in Theorem 8.2.3.

8.3.1 Changes in the Trace and Eigenvalues of the Covariance


Kernel

Since it is common to use the eigenvalues as well as the trace of the covariance
kernel to estimate the number of functional principal components required in order
to perform effective dimension reduction with functional data, it is also of interest
to investigate change point detection procedures for them. Consider the sequence of
(i) (i) '
1 ≥ λ2 ≥ · · · ≥ 0 of the covariance kernel of the .i th observation .C (t, s) =

(i)

Cov(Xi (t), Xi (s)) satisfying



(i) (i) (i)
.λ φ
j j (t) = C(i) (t, s)φj (s)ds, j ∈ {1, 2, . . .}, (8.3.16)

(i) (i)
and <φj , φk > = 1{j = k}.

Let

d = (λ1 , . . . , λd )T ∈ Rd ,
(i) (i) (i)
.

denote the vector of the first d largest eigenvalues of .C(i) . Formally then we wish to
test
(1) (N )
.H0 : d = · · · = d

against the alternative

(1) (k ∗ ) (k ∗ +1) (N )
HA : d = · · · = d
. /= d = · · · = d ,

where .k ∗ = ⎣θ N ⎦, with .θ ∈ (0, 1). In order to test .H0 versus .HA , we consider
partial sample estimates of the covariance kernel given by

⎣N u⎦
1 E
Ĉu (t, s) =
. (Xi (t) − X̄(t))(Xi (s) − X̄(s)), t, s ∈ [0, 1], u ∈ [1/N, 1],
N
i=1
(8.3.17)
476 8 Functional Data

with .Ĉu = 0 for .0 ≤ u < 1/N. The estimate .Ĉu may be used to define an integral
operator

ĉu (f )(t) =
. Ĉu (t, s)f (s)ds, u, s ∈ [0, 1]. (8.3.18)

For .u ∈ [0, 1], let .λ̂j (u) denote the ordered eigenvalues of .ĉu with corresponding
orthonormal eigenfunctions .ϕ̂j,u . To consider tests based on the vector of partial
sample estimates of .d , define

. ˆ d (x) = (λ̂1 (x), . . . , λ̂d (x))T ,


 x ∈ [0, 1],

ˆ d (x), x ∈ [0, 1]) taking values in


and note that this gives rise to the process .(
Dd [0, 1], the d-dimensional Skorokhod space. We define
.

υi,j = <εi ⊗ εi − E[ε0 ⊗ ε0 ], ϕj ⊗ ϕj >,


. (8.3.19)
i ∈ {1, . . . , N }; j ∈ {1, . . . , d},

ˆ d under .H0 and assuming that .{Xi }i∈Z is .Lν -


In studying the properties of .
(i)
decomposable, in which case the eigenvalues appearing .{λj }{j ∈N} = {λj }{j ∈N}
do not depend on i, we assume that these common eigenvalues are well spaced:
Assumption 8.3.5 .λ1 > · · · > λd > λd+1 ≥ 0.
This ensures that the eigenspaces corresponding to each of the first d eigenvalues
are one dimensional. The following theorem establishes the asymptotic properties
ˆ d (x), x ∈ [0, 1]}.
of a suitably normalized version of the process .{
Theorem 8.3.6 If Assumption 8.3.1 holds with .gN = g, so that the sequence .Xi is
Lν -decomposable for some .ν > 4, and 8.3.5 hold, then for any fixed .δ ∈ (0, 1),
.

⎛ ⎞ d
ˆ ⎣Nx⎦ D [0,1] 1/2 (d)
N
.
1/2
d (x) − d → Ed W (x),
N

where .W(d) denotes a standard d-dimensional Brownian motion, and .Ed a .d × d


covariance matrix with entries

E
Ed (j, j ' ) =
. cov(υ0,j , υi,j ' ), j, j ' = 1, . . . , d.
i=−∞

Proof It follows from Theorem 2.1 of Aue et al. (2020) that for any fixed .δ with
0 < δ < 1,
.

⎛ ⎞ d
ˆ ⎣Nx⎦ D [δ,1] 1/2 (d)
.N 1/2
d (x) − d → Ed W (x). (8.3.20)
N
8.3 Change in the Covariance of Functional Observations 477

Theorem A.3.4 implies that for each .j ∈ {1, . . . , d},


| | || ||
| | || ||
|λ̂j (k/N) − k λj | ≤ 1 ||Ĉk/N − k C|| ,
.
| N | N || N ||2

with .C denoting the common covariance kernel under Assumption 8.3.1. Therefore
by Theorem 8.3.1, for all .t > 0,
⎛ | | ⎞
| k ||
. lim lim sup P N 1/2 |
sup |λ̂j (k/N) − λj | ≥ t = 0.
δ→0 N →∞ 1≤k≤⎣N δ⎦ N

This implies that


⎛ || ⎛ ⎞|| ⎞
|| 1/2 ⎣N x⎦ ||
. lim lim sup P ||
sup ||N ˆ
d (x) − ||
d || ≥ t = 0.
δ→0 N →∞ 0≤x≤δ N

Moreover due to the continuity of Brownian motion, for all .t > 0,


⎛ ⎞
. lim P sup ||W (d)
(x)|| ≥ t = 0.
δ→0 0≤x≤δ

As a result of (8.3.20) and due to the Skorokhod-Dudley-Wichura Theorem, there


exists a sequence of identically distributed standard d dimensional Wiener processes
(d)
.W
N (x) so that
|| ⎛ ⎞ ||
|| 1/2 ||
sup || ˆ d (x) − ⎣Nx⎦ d − E 1/2 W(d) (x)|| a.s.
.
|| N 
N d N || → 0.
δ≤x≤1

For each .t > 0 and .0 < δ < 1,


⎛ || ⎛ ⎞ || ⎞
|| 1/2 ⎣Nx⎦ ||
.P ||
sup ||N ˆ
d (x) −
1/2 (d) ||
d − Ed WN (x)|| ≥ t
0≤x≤1 N
⎛ ⎞
≤P sup ||W(d) (x)|| ≥ t/2
0≤x≤δ

|| ⎛ ⎞ || ⎞
|| 1/2 ⎣Nx⎦ ||
+ P sup || ˆ d (x) − d − Ed WN (x)||
1/2 (d)
||N 
N || ≥ t
δ≤x≤1
⎛ || ⎛ ⎞|| ⎞
|| 1/2 ⎣Nx⎦ ||
+ P sup ||||N ˆ d (x) − d ||
|| ≥ t/2 . (8.3.21)
0≤x≤δ N
478 8 Functional Data

Combining the above, the three terms on the right-hand side of (8.3.21) can be made
arbitrarily small for all sufficiently large N and small .δ. This implies that
|| ⎛ ⎞ ||
|| 1/2 || P
sup || N ˆ d (x) − ⎣Nx⎦ d − E 1/2 W(d) (x)|| → 0,
.
|| N d N ||
0≤x≤1

which establishes the Theorem. ⨆



For .i ∈ Z, let .ϒ i = (υi,1 , . . . , υi,d )T , with .υi,j as in (8.3.19). It is seen
that .Ed is the usual long-run covariance matrix of the stationary sequence .(ϒ i )
in .Rd . Assuming for the moment that the series .{Xi }i∈Z is .Lν -decomposable,
and 8.3.5, and is independent and identically distributed, .Ed (j, j ) in Theorem 8.3.6
would reduce to .2λj , coinciding with standard asymptotic normality results for the
eigenvalues computed from sample covariance operators based on a simple random
sample. As a corollary to Theorem 8.3.6, the limiting distribution of the individual
partial sample empirical eigenvalue estimates is obtained. These asymptotics are
useful in evaluating whether individual eigenvalues have undergone a change.
Corollary 8.3.1 If the conditions of Theorem 8.3.6 are satisfied, then
⎛ ⎞ d
⎣Nx⎦ D [0,1]
N 1/2 λ̂j (x) −
. λj → σj W (x), (8.3.22)
N

where .W (x) a standard one-dimensional Brownian motion, and .σj2 = Ed (j, j ).


In order to use these results to develop a test for a change in the eigenvalues, we
define an estimator for .Ed as

E −1 ⎛ ⎞
1 E(
N
l ˆ )( )T
Êd =
. K  l,υ , ˆ l,υ = ϒ̂ i − ϒ̄ ϒ̂ i+l − ϒ̄ ,
h N
l=1−N i∈Il

where .h = h(N) and .K(u) satisfy Assumptions 3.1.4 and 3.1.5, and .Il =
{1, . . . , N − l} if .l ≥ 0 and .Il = {1 − l, . . . , N } if .l < 0, and .ϒ̂ j =
(υ̂i,1 , . . . , υ̂i,d )T is the estimated score vector whose entries are given by

.υ̂i,j = <(Xi − X̄) ⊗ (Xi − X̄) − Ĉ1 , ϕ̂j,1 ⊗ ϕ̂j,1 >, (8.3.23)

while .ϒ̄ is the sample mean of the .ϒ̂ i . In order to test .H0 , we consider the maximal
quadratic form statistic

−1
ζT
N (x)Êd ζ N (x)
.Jd,N (κ) = sup ,
0≤x≤1 [x(1 − x)]2κ
8.3 Change in the Covariance of Functional Observations 479

where
⎛ ⎞
ˆ ⎣Nx⎦ ˆ
ζ N (x) = N
.
1/2
d (x) − d (1) , x ∈ [0, 1].
N

To evaluate the constancy of individual eigenvalues, we consider the test statistic


| |
| |
.Ij,N (κ) = sup
N 1/2 |λ̂j (x) − ⎣Nx⎦ λ̂j (1)| , j = 1, . . . , d,
| |
0≤x≤1 σ̂j [x(1 − x)]
κ N

where .σ̂j2 = Êd (j, j ). The following result is a consequence of Theorem 8.3.6.
Corollary 8.3.2 If the conditions of Theorem 8.3.6 are satisfied, and .0 ≤ κ < 1/2,

D E
d Bj2 (x)
.Jd,N (κ) → sup ,
0≤x≤1 j =1 [x(1 − x)]2κ

and

D |Bj (x)|
Ij,N (κ) → sup
. ,
0≤x≤1 [x(1 − x)]κ

where .{Bj (x), 0 ≤ x ≤ 1}, .j ∈ {1, . . . , d}, are independent and identically
distributed standard Brownian bridges.
A test of asymptotic size .α for .H0 is to reject if .JN or .Ij,N exceed the .1 − α
quantile of their limit distributions in Corollary 8.3.2.
Remark 8.3.1 Due to the bias that occurs in estimating the eigenvalues of the
covariance operator near the beginning of the sample, in practice we may use instead

−1
(δ) ζT
N (x)Êd ζ N (x)
. Jd,N (κ) = sup ,
δ≤x≤1 [x(1 − x)]2κ

and
| |
| |
.I
(δ)
= sup
N 1/2 |λ̂j (x) − ⎣Nx⎦ λ̂j (1)| , j = 1, . . . , d,
j,N (κ) | |
δ≤x≤1 σ̂j [x(1 − x)]
κ N

for a user specified trimming parameter .δ. The asymptotic distributions of these
statistics coincide with those described in Corollary 8.3.2 upon replacing the domain
on which the suprema are calculated with .[δ, 1]. We have found the choice .δ = 0.1
seems to work well in practice, and we generally recommend this choice as a default.
480 8 Functional Data

When the data are stationary, the eigenvalue .λj is often used to describe the
variance of .X0 explained by the j th principal component .ϕj by comparing its
magnitude to the cummulative variance of the function .X0 measured by the trace of
the covariance operator

E ⎛
. λj = C(t, t)dt = Tr(c).
j =1

A common criterion for selecting the number of principal components for subse-
quent analysis is to take the minimum d that for which the total variance explained
(TVE) by the first d principal components to exceed a user selected threshold v, that
is,
⎧ ⎫
λ1 + · · · + λd
.d = dv = min d, ≥v . (8.3.24)
Tr(c)

When performing principal component analysis for functional time series it is often
also of interest to evaluate if .Tr(c) is constant in conjunction with the constance of
the largest eigenvalues. A partial sample estimator of the trace is given by

⎣N x⎦
1 E
.TrN (x) = ||Xi − X̄||2 , x ∈ [0, 1]. (8.3.25)
N
i=1

The large-sample behavior of a centered version of the process .TrN is given next.
Theorem 8.3.7 If Assumption 8.3.1 holds with .gN = g, so that the sequence .Xi is
Lν -decomposable for some .ν > 4, then
.

D[0,1]
.N 1/2 [TrN (x) − x Tr(c)] → σT W (x),

where W a standard Brownian motion and, with .ξi = ||Xi − μ||2 ,



E
. σT2 = cov(ξ0 , ξi ). (8.3.26)
i=−∞

Theorem 8.3.7 suggests using the test statistic

N 1/2
MN (κ) = sup
. |TrN (x) − xTrN (1)|,
0≤x≤1 σ̂T [x(1 − x)]
κ
8.4 Heteroscedastic Errors 481

where .σ̂T2 is of the form


E ⎛ ⎞
l 1 E
2
.σ̂T = K γ̂l , γ̂l = (ξ̂i − ξ̄ )(ξ̂i+l − ξ̄ ),
h N
l=−∞ i∈Il

where .ξ̂i = ||Xi − X̄||2 and .Il is as above.


Corollary 8.3.3 If the conditions of Theorem 8.3.7 are satisfied, and .0 ≤ κ < 1/2,
then

D |B(x)|
.MN (κ) → sup
0≤x≤1 [x(1 − x)]
κ

where B is a standard Brownian bridge.


A test of asymptotic size .α for the null hypothesis of no change point in the trace of
the covariance kernel is to reject if .MN (κ) exceeds the .1 − α quantile of the limit
distribution in Corollary 8.3.3.

8.4 Heteroscedastic Errors

In many cases of interest, for instance in many financial applications, it is not


plausible that the error term in (8.1.1) is stationary. A more flexible model would
allow the error distribution to also change at several times during the observation
period. In Sect. 8.3 we discussed how to detect changes in the covariance structure
of the observations when there is no change in the mean, or in the presence of
a single change in the mean. We may instead be interested in performing change
point analysis on the mean function, but when there might be multiple changes in
the distribution of the errors. This problem was investigated in Sect. 3.3 in the case
of scalar observations. To formally state such a problem, suppose the observations
still foll model (8.1.1), but that the variance structure of the errors changes and
unknown number M times, occurring at the time .1 < n1 < · · · < nM < N . We
assume that

Assumption 8.4.1 .ni = ⎣Nτi ⎦, 1 ≤ i ≤ M and .0 < τ1 < τ2 < . . . < τM < 1.
Let .n0 = 0, nM+1 = N, τ0 = 0 and .τM+1 = 1.
Assumption 8.4.2 .{Ei , i ∈ Z} forms a Bernoulli shift on .nl−1 < i ≤ nl , 1 ≤ l ≤
M + 1, i.e.
.Ei (t) = gl (ηi , ηi−1 , . . .)(t) for some (deterministic) measurable function .gl :

S∞ → L2 , where .{ηj , j ∈ Z} are independent and identically distributed random


variables with values in a measurable space .S, and .Ei (t) = Ei (t, ω) is jointly
measurable in .(t, ω), for each .i ∈ Z. Further .EEi (t) = 0 for all .t ∈ [0, 1],
482 8 Functional Data

E||Ei ||ν2 < ∞ with some .ν > 2, and


.

( ∗
)1/ν
. E||Ei − Ei,m ||ν2 ≤ cm−α with some α > 2, (8.4.1)

∗ = g (η , . . . , η ∗ ∗
where .Ei,l l i i−l+1 , ηi−l , ηi−l−1 , . . .) for .nl−1 < i ≤ nl , 1 ≤ l ≤

M + 1, and .{ηi , i ∈ Z} are independent copies of .η0 , independent of .{ηl , l ∈ Z}.
According to Assumption 8.4.1 the sequence .{Ei (t), i ∈ Z} is not stationary but it
is stationary on the intervals .nl−1 < i ≤ nl , 1 ≤ l ≤ M + 1. Dependence though
is allowed between the errors in different intervals of stationarity. However, the
volatility changes abruptly. We use the error terms only on some intervals according
to the definition in Assumption 8.4.1. Let

Ei,l = gl (ηi , ηi−1 , . . .), i ∈ Z.


. (8.4.2)

It is clear that

Ei (t) = Ei,l (t), if nl−1 < i ≤ nl , 1 ≤ l ≤ M + 1.


.

We define the long-run covariances for the segments of stationarity as



E
Dl (t, s) =
. EE0,l (t)Ej,l (s), 1 ≤ l ≤ M + 1. (8.4.3)
j =−∞

Under Assumption 8.4.1, the series defining .Dl is absolutely convergent in .L2 . The
limit of the partial sums of the .Ei ’s will be a Gaussian process .{┌(u, t), 0 ≤ u, t ≤ 1}
with

E┌(u, t) = 0
. and E┌(u, t)┌(v, s) = D(u, v, t, s), (8.4.4)
0 ≤ u, v, t, s ≤ 1,

where for .τl−1 < min(u, v) ≤ τl

E
l−1
D(u, v, t, s) =
. (τj − τj −1 )Dj (t, s) + (min(u, v) − τl−1 )Dl (t, s). (8.4.5)
j =1

Theorem 8.4.1 If .H0 of (8.1.1), Assumptions 8.4.1 and 8.4.2 hold, then we can
define a sequence of Gaussian processes .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} such that

⎛ ⎛ ⎞2
. sup ZN (u, t) − ┌N
0
(u, t) dt = oP (1)
0≤u≤1
8.4 Heteroscedastic Errors 483

and for each N


{ } { }
D
.
0
┌N (u, t), 0 ≤ u, t ≤ 1 = ┌ 0 (u, t), 0 ≤ u, t ≤ 1

with ┌ 0 (u, t) = ┌(u, t) − u┌(1, t),

where the mean and covariance of the Gaussian process .{┌(u, t), 0 ≤ u, t ≤ 1} are
defined in (8.4.4).
Proof We write the partial sum of the errors as

E
k l−1 ⎛
E nj
E ⎞ E
k
. Ei (t) = Ei (t) + Ei (t), (8.4.6)
i=1 j =1 i=nj −1 +1 i=nl−1 +1

when k satisfies .nl−1 < k ≤ nl . Using Theorem A.3.1, we can define independent
Gaussian processes .{┌N,1 (u, t), 0 ≤ u ≤ τ1 , 0 ≤ t ≤ 1}, {┌N,2 (u, t), τ1 < u ≤
τ2 , 0 ≤ t ≤ 1}, . . . , {┌N,M+1 (u, t), τM < u ≤ 1, 0 ≤ t ≤ 1} such that

E┌N,l (u, t) = 0,
.

E┌N,l (u, t)┌N,l (v, s) = min(u, v)Dl (t, s), 0 ≤ u, v ≤ τl − τl−1 ,


⎛ ⎛ ⎞2
⎣N
E u⎦
. sup ⎝N −1/2 Ei (t) − ┌N,l (u, t)⎠ dt
0≤u≤τl −τl−1 i=nl−1 +1

= oP (1), 1 ≤ l ≤ M + 1. (8.4.7)

We define

E
l−1
┌N (u, t) =
. ┌N,j (τj − τj −1 , t) + ┌N,l (u − τl−1 , t),
j =1

if u satisfies .τl−1 < u ≤ τl , 1 ≤ l ≤ M + 1. Letting .┌N 0 (u, t) = ┌ (u, t) −


N
u┌N (1, t) and putting together (8.4.6) and (8.4.7), we conclude that
⎛ ⎛ ⎞2
⎣N
E ⎣Nu⎦ E
u⎦ N
. sup ⎝N −1/2 Ei (t) − Ei (t) − ┌N
0
(u, t)⎠ dt (8.4.8)
0≤u≤1 N
i=1 i=1

= oP (1).
484 8 Functional Data

Under the null hypothesis of no change in the mean function,

⎣N
E ⎣Nu⎦ E
u⎦ N
ZN (u, t) = N −1/2
. Ei (t) − Ei (t),
N
i=1 i=1

and therefore (8.4.8) implies the result 8.4.1. ⨆


8.4.1 Testing for a Change in the Mean

The behaviour of .ZN under the alternative hypothesis of a single change in the
mean is the same up to a first order approximation as the case when the errors are
homogeneous.
Theorem 8.4.2 If .HA of (8.1.3), Assumptions 8.4.1, 8.4.2 hold and

NθN2 (1 − θN )2
. (μ0 (t) − μA (t))2 dt → ∞,

then

1 P
. ⎛ sup 2
ZN (u, t)dt → 1
N θN2 (1 − θN )2 (μ0 (t) − μA (t))2 dt 0<u<1

and
⎛⎛
1 P 1
. ⎛ 2
ZN (u, t)dtdu → .
3
NθN2 (1 − θN )2 (μ0 (t) − μA (t))2 dt

Proof The proof is the same as that of Theorem 8.1.3, noting that under
Assumption 8.4.2 the CUSUM process of the errors .ZN,E still satisfies
.sup0≤u≤1 ||ZN,E (u, ·)|| = OP (1) according to Theorem 8.4.1. ⨆

2

⎛ ⎛We2 discussed after Theorem 8.1.1 methods to approximate the distribution of


. ZN (u, t)dudt, which can also be used in this case upon estimating the long-run
covariance function in (8.4.5). Let


⎪ 1 E
k−l

⎪ (Xi (t) − X̄N (t))(Xi+l (s) − X̄N (s)), if l ≥ 0

⎨k−l
i=1
.γ̂k,l (t, s) =

⎪ 1 Ek

⎪ (Xi (t) − X̄N (t))(Xi+l (s) − X̄N (s)), if l < 0.

⎩ k − |l|
i=−(l−1)
8.4 Heteroscedastic Errors 485

The estimator is defined as

D̂N (u, v, t, s) = D∗N (min(u, v), t, s)


. (8.4.9)

with
⎣NE
u⎦−1 ⎛ ⎞
∗ l
.DN (u, t, s) = K γ̂k,l (t, s).
h
l=−(⎣N u⎦−1)

The basic idea is that we estimate .D using the observations .X1 , X2 , . . . , X⎣N min(u,v)⎦ ,
but centered using .X̄N estimated from the entire sample. The the following theorem
remains true if we replace .X̄N with

1E
k
.X̄k = Xi (t)
k
i=1

in the definition of .γ̂k,l . The proof of this result is omitted, although it follows
similarly as (4.1.41).
Theorem 8.4.3 If .H0 of (8.1.2), Assumptions 3.1.4, 3.1.5, 8.4.1 and 8.4.2 with .ν >
4 hold, then
⎛⎛⎛⎛⎛ ⎞2
. D̂N (u, v, t, s) − D(u, v, t, s) dudvdtds = oP (1),

where .D(u, v, t, s), .D̂N (u, v, t, s) are defined in (8.4.5) and (8.4.9).
The consistent estimator .D̂N can be used to create a consistent test of .H0 based
on .L2 functionals of .ZN . The covariance function of .{┌N
0 (u, t), 0 ≤ u, t ≤ 1} is

D0 (u, t) = D(u, v, t, s) − vD(u, 1, t, s) − uD(1, v, t, s) + uvD(1, 1, t, s).


.

Hence we can use the plug in estimator

D̂0 (u, t) = D̂N (u, v, t, s) − v D̂N (u, 1, t, s) − uD̂N (1, v, t, s) + uv D̂N (1, 1, t, s).
.

Let .λ̂N,1 ≥ λ̂N,1 ≥ . . . satisfying


⎛⎛
λ̂N,i φ̂N,i (u, t) =
. D̂0N (u, v, t, s)φ̂N,i (v, s)dvds, 1 ≤ i ≤ N − 1.

The null critical values of the statistic


⎛⎛
.TN =
2
ZN (u, t)dudt
486 8 Functional Data

may be approximated as the quantiles .ĉN,d (α) satisfying


⎧ ⎫
E
d
P
. λ̂N,i N2i ≥ ĉN,d (α) = α,
i=1

where .{Ni , i ≥ 1} are independent standard normal random variables. The


computation of the eigenvalues is somewhat simpler in the homoscedastic case;
compare to (8.1.19). Under the null hypothesis of no change in the mean function
{ }
. lim lim P TN ≥ ĉN,d (α) = α
d→∞ N →∞

and for any .d ≥ 1


{ }
. lim P TN ≥ ĉN,d (α) = 1
N →∞

under the alternative.

8.4.2 Estimating the Time of Change in Heteroscedastic


Models

As in Sect. 3.3, we noted that the distribution of the estimator for the change
point might be different if the mean and variance change at the same time; see
Theorems 3.3.6 and 3.3.7. Next we discuss the estimation of .k ∗ of (8.2.1) under
Assumptions 8.4.1 and 8.4.2. As in Theorems 3.3.6 and 3.3.7, we have different
limiting distributions depending on if the mean and the volatility changed at the
same time. For the sake of simplicity we consider the case when there is only one
change in the mean, i.e. .HA holds in model (8.1.3). The time of change .k ∗ satisfies
Assumption 2.1.1. In the data we have .M +1 stationary segments for the distribution
of the errors, and the distribution of .k̂N of (8.2.2) depends on the regime in which
the time of change in the mean occurs. If the long-run covariance function of the
functional observations in this regime is .Dj , then the normalization of the estimator
for the time of change will depend on
⎛⎛
βj2 =
. h(t)Dj (t, s)h(s)dsdt, (8.4.10)

where .Dj is defined in (8.4.3). The situation is somewhat different if .θ = τj , i.e.


change in the mean and the variance occurs at the same time. In this case asymptotic
distribution of .k̂N is similar to that in Theorem 3.3.6. We define the two sided
8.4 Heteroscedastic Errors 487

Gaussian process

∗ βj2 W1 (−t), if t < 0
.Wj (t) = (8.4.11)
βj2+1 W2 (t), if t ≥ 0,

where .{W1 (t), t ≥ 0} and .{W2 (t), t ≥ 0} are independent Wiener processes. Let
{ }
.ξj∗ (κ) = argmaxt Wj∗ (t) − |t|mκ (t) , (8.4.12)

where .m0 (t) is defined in (2.2.3). We recall that .k̂N , the estimator for .k ∗ is defined
in (8.2.2).
Theorem 8.4.4 We assume that .HA of (8.1.3), Assumptions 2.1.1, 8.2.2, 8.4.1, and
8.4.2 are satisfied, and .0 ≤ κ < 1/2.
(i) If .τj −1 < θ < τj with some .1 ≤ j ≤ M, then

Δ2N ⎛ ∗
⎞ D
. k̂ N − k → ξ(κ), (8.4.13)
βj2

where .ξ(κ) and .βj2 are defined in (2.2.3) and (8.4.10).


(ii) If .θ = τj with some .1 ≤ j ≤ M, then
⎛ ⎞ D
Δ2N k̂N − k ∗ → ξj∗ (κ),
. (8.4.14)

where .ξj∗ (κ) is defined in (8.4.12).


If an addition

N 1/2 |ΔN |(log N)−1/ν → ∞


.

also holds, then (8.4.13) and (8.4.14) remain true when .κ = 1/2.
Proof We follow the proof of Theorem 8.2.1. Using again the decomposition
in (8.2.10), the same calculations give that we need to consider only .Qk,4 (t) and

.Qk,5 (t), where .|k − k| = OP (1/Δ ).
2
N
Under the conditions of Theorem 8.4.4(i), (8.2.25) has the form
| ⎛ |
| 1 |
. sup | + − |s|m |
1−2κ 2
| N 1−2κ Q k ∗ +sβj /ΔN ,5
2 2 (t)dt 2(θ (1 θ )) βj κ (s) | (8.4.15)
|s|≤C

= o(1),
488 8 Functional Data

for all .C > 0, where .mκ (s) is defined in (2.2.1). Since Assumption 8.2.2 holds, for
any C > 0, the errors {E_k, k^* − C/Δ_N^2 ≤ k ≤ k^* + C/Δ_N^2} are in the j-th stationary segment, i.e. ⌊Nτ_{j−1}⌋ < k^* − C/Δ_N^2 and k^* + C/Δ_N^2 < ⌊Nτ_j⌋. Hence Theorem A.1.1 implies that

\frac{1}{N^{1-2\kappa}} \int Q_{k^*+s\beta_j^2/\Delta_N^2,\,4}(t)\,dt \xrightarrow{\;\mathcal D[-C,C]\;} 2(\theta(1-\theta))^{1-2\kappa}\,\beta_j^2\, W(s), \qquad (8.4.16)

for all .C > 0, where .{W (s), −∞ < s < ∞} is the two sided Wiener process
of (2.2.2). Now (8.4.15) and (8.4.16) imply Theorem 8.4.4(i) as (8.2.25) and (8.2.26)
imply Theorem 8.2.1.
If the conditions of the second part of Theorem 8.4.4 hold, then we replace (8.4.15)
with
\sup_{|s|\le C}\left| \frac{1}{N} \int Q_{k^*+s/\Delta_N^2,\,5}(t)\,dt + 2\theta(1-\theta)\,|s|\,m_0(s) \right| = o(1). \qquad (8.4.17)

Since the second order properties of the observations change at k^*, instead of (8.4.16) we have

\frac{1}{N^{1-2\kappa}} \int Q_{k^*+s/\Delta_N^2,\,4}(t)\,dt \xrightarrow{\;\mathcal D[-C,C]\;} 2(\theta(1-\theta))^{1-2\kappa}\, W^*(s), \qquad (8.4.18)

where .{W ∗ (s), −∞ < s < ∞} is defined in (8.4.11). Now the arguments used in the
proof of Theorem 8.2.1 (cf. also Theorem 3.3.6) can be applied. Similar arguments
can be used when .κ = 1/2. ⨆

We can also provide an analogue of Theorem 2.2.2 for functional time series. As in Theorem 8.4.4, the limit distribution depends on the volatility regime in which the change in the mean occurs. We define the forward and backward partial sums of the projected errors in the j-th regime:
S_j^{(1)}(l) = \begin{cases} -\displaystyle\sum_{i=l}^{-1} \int E_{i,j}(t)\,h(t)\,dt, & \text{if } l < 0,\\ 0, & \text{if } l = 0 \end{cases}

and

S_j^{(2)}(l) = \sum_{i=1}^{l} \int E_{i,j}(t)\,h(t)\,dt, \quad \text{if } l > 0,

where E_{i,j}(t) is defined in (8.4.2). If τ_{j−1} < θ ≤ τ_j, then the limit is given in terms of

S_j(l) = \begin{cases} S_j^{(1)}(l), & \text{if } l \le 0\\ S_j^{(2)}(l), & \text{if } l > 0. \end{cases}

Let

\bar\xi_j = \operatorname*{argmax}_l \left\{ \Delta S_j(l) - \Delta^2\, |l|\, m_0(l) \int h^2(u)\,du \right\}, \qquad (8.4.19)
where .m0 (l) is defined in (2.2.1). Similarly,


S_j^*(l) = \begin{cases} S_j^{(1)}(l), & \text{if } l \le 0\\ S_{j+1}^{(2)}(l), & \text{if } l > 0 \end{cases}

and

\xi_j^* = \operatorname*{argmax}_l \left\{ \Delta S_j^*(l) - \Delta^2\, |l|\, m_0(l) \int h^2(u)\,du \right\}. \qquad (8.4.20)

In the following result we only consider the case where .κ = 0, and it is proven along
the lines of Theorems 8.2.1 and 8.4.4.
Theorem 8.4.5 We assume that H_A of (8.1.3) and Assumptions 2.1.1, 8.2.1, 8.4.1, and 8.4.2 are satisfied.
(i) If .τj −1 < θ < τj with some .1 ≤ j ≤ M, then

\hat k_N - k^* \xrightarrow{\;\mathcal D\;} \bar\xi_j,

where ξ̄_j is defined in (8.4.19).


(ii) If .θ = τj with some .1 ≤ j ≤ M, then

\hat k_N - k^* \xrightarrow{\;\mathcal D\;} \xi_j^*,

where ξ_j^* is defined in (8.4.20).



8.5 Data Examples

Example 8.5.1 (Australian Minimum Temperature Profiles) We consider here


an application to annual minimum temperature profile curves obtained from eight measuring stations in Australia as studied in Aue et al. (2018). More precisely, the raw data consist of 365 (366 in leap years) daily measurements of minimum temperatures that were converted into functional objects using 21 Fourier basis functions. These data in the case of the Gayndah post office station are displayed in Fig. 8.1. The observations for each of the eight stations are recorded over different time spans, each roughly equaling 100 years. The data may be downloaded from the Australian Bureau of Meteorology at the URL www.bom.gov.au.
associated to each station, we used the statistic MN = MN (0) to test for the presence
of change points in each series, which we estimated using the estimator k̂N (0), also
computing the confidence interval in (8.2.37). The results are shown in Table 8.1,
Fig. 8.1 Upper panel: Time series plot of annual temperature profiles at Gayndah Post Office. Bottom panel: Estimated segmentation (Segments 1–3) according to the fully functional binary segmentation method with threshold ρ_N = σ̂_N (log N)^{1/2}. The two estimated change points are in the years 1953 and 1972

Table 8.1 Summary of results of a change point analysis in the mean of minimum temperature curves from eight Australian measuring stations. The p-value for a test of the null hypothesis that each series has a homogeneous mean based on the statistic M_N was zero in each case. The column labelled k̂_N(0) reports the estimated break date using the fully functional method; CI gives the corresponding 95% confidence interval computed from (8.2.37)
Station Range k̂N (0) (year) CI (years)
Sydney (Observatory Hill) 1959–2012 1991 (1981, 1994)
Melbourne (Regional Office) 1855–2012 1998 (1989, 2000)
Boulia Airport 1888–2012 1978 (1954, 1981)
Cape Otway Lighthouse 1864–2012 1999 (1949, 2005)
Gayndah Post Office 1893–2009 1972 (1952, 1980)
Gunnedah Pool 1876–2011 1985 (1935, 1992)
Hobart (Ellerslie Road) 1882–2011 1966 (1957, 1969)
Robe Comparison 1884–2011 1981 (1954, 1985)

which generally suggest that each series appears to have a non-homogeneous mean,
with the most prominent change point estimates tending to cluster around the 1960’s
to 1980’s.
Subsequently, we performed binary segmentation as described above to estimate further change points in each series. The resulting segmentation for the Gayndah
station, which estimated two change points in the years 1953 and 1972, is displayed
in the bottom panel of Fig. 8.1.
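The segmentation step itself is simple to express in code. The sketch below (hypothetical names; the variance estimate σ̂_N entering the threshold ρ_N = σ̂_N (log N)^{1/2} must be supplied by the user) recursively splits the sample at the maximizer of a fully functional CUSUM statistic whenever the maximum exceeds the threshold; it is only an illustration of the scheme, not the exact implementation used for Table 8.1:

```python
import numpy as np

def mean_cusum(X):
    """Fully functional CUSUM for a mean change: L2 norm of the partial sum
    process S_k - (k/N) S_N, scaled by N^{-1/2}, maximized over k."""
    N = X.shape[0]
    S = np.cumsum(X, axis=0)
    norms = np.array([np.linalg.norm(S[k - 1] - (k / N) * S[-1])
                      for k in range(1, N)]) / np.sqrt(N)
    k_hat = int(np.argmax(norms)) + 1
    return norms[k_hat - 1], k_hat

def binary_segmentation(X, threshold, min_len=4):
    """Recursively estimate change points in the rows of X (one curve per
    row, evaluated on a common grid) using the CUSUM detector above."""
    changes = []

    def recurse(lo, hi):
        if hi - lo < min_len:       # segment too short to test
            return
        stat, k = mean_cusum(X[lo:hi])
        if stat > threshold:
            changes.append(lo + k)
            recurse(lo, lo + k)
            recurse(lo + k, hi)

    recurse(0, X.shape[0])
    return sorted(changes)
```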
Example 8.5.2 (Crude Oil Intra-Day Return Curves) We illustrate the use of the
weighted CUSUM covariance change point detector and estimator in an application
to detect changes in the covariance of intra-day return curves derived from crude oil
futures prices, as considered in Horváth et al. (2022).
We consider two benchmark assets in the international crude oil pricing system:
West Texas Intermediate (WTI) and Brent crude oil futures. The raw data that we
consider were obtained from www.backtestmarket.com, and are comprised of 5-
minute frequency front-month prices of WTI and Brent futures, from 9:00 am to
2:30 pm, each trading day from 12 May, 2018 to 30 Apr, 2020, which totals 502
days. As such there are 67 discrete observations of the price within each day, which
we linearly interpolated to produce intra-day price curves pi (t) for each asset. The
closing prices of these assets over the observation period are plotted in Fig. 8.4, and
visualizations of these price curves for WTI are shown in Fig. 8.2.
We take as a goal of this analysis to evaluate whether the variability of the
curves modelled by their covariance kernel undergoes change points during the
observation period. In order to study this series as a mean stationary functional
time series, we transform them to cumulative intra-day return curves (CIDRs) via
the transformation

r_i(t) = \log(p_i(t)) - \log(p_i(0)),



Fig. 8.2 Daily price curves of WTI commodity futures obtained by linear interpolation of 5-
minute frequency front-month prices

Fig. 8.3 Daily cumulative intra-day return (CIDR) curves from WTI commodity futures

where pi (t) is the asset price on day i at intra-day time t, and pi (0) is the opening
price at 9:00 am on day i. Figure 8.3 illustrates the CIDR curves constructed from
both collections of asset price curves. We applied a series of hypothesis tests to
evaluate the stationarity, normality, and serial correlation structure of these CIDR
curves, see respectively Horváth et al. (2014), Górecki et al. (2017), and Kokoszka
et al. (2017), for the details of these tests, the results of which suggested that both
series of crude oil CIDR curves evolve as approximately mean stationary, non-
Gaussian, serially uncorrelated and conditionally heteroscedastic functional time
series.
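In code, the CIDR transformation is a single vectorized operation once the interpolated prices are stored in a matrix. The sketch below uses hypothetical variable names and assumes one row per trading day and one column per 5-minute grid point, with the first column holding the 9:00 am opening price:

```python
import numpy as np

def cidr_curves(prices):
    """Cumulative intra-day return curves r_i(t) = log p_i(t) - log p_i(0).

    prices: array of shape (n_days, n_grid); row i holds the linearly
    interpolated price curve p_i on the intra-day grid, and prices[i, 0]
    is the opening price p_i(0)."""
    log_p = np.log(prices)
    return log_p - log_p[:, [0]]  # subtract each day's opening log-price

# Toy example mimicking the dimensions of the crude oil data:
# 502 days with 67 five-minute observations per day.
rng = np.random.default_rng(1)
steps = 0.001 * rng.standard_normal((502, 67))
prices = 60.0 * np.exp(np.cumsum(steps, axis=1))
r = cidr_curves(prices)           # r[i, 0] == 0 for every day i
```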

We applied a test of H0 to each series based on TN (1/4) to detect potential


changes in the covariance of both curve sequences. Following the settings used
in the simulation study, we used the Bartlett window function and the bandwidth
h = N 1/5 in the calculation of the long-run covariance operator. The p-values of
these tests for each asset were zero, suggesting the presence of a change point, and
the change point estimates k̂N (1/4) for the series corresponded to the dates March
5th, 2020 for WTI and February 28, 2020 for Brent. We segmented each series
based on these estimates and performed tests of H0 again within each segment in
order to detect additional change points. One additional change point was detected
at significance level 0.05 in the first segment of each series, and these were estimated
at the locations December 28th, 2018, and December 29th, 2018, for the WTI and Brent series respectively (approximate p-values of 0.041 and 0.048). No further change points of notable significance were detected. The locations of these change points for the WTI series are illustrated in Fig. 8.3. We note that
we did not take into account multiple testing when producing these p-values, and an
application of e.g. the Holm-Bonferroni method would suggest that the second two
estimated change points are not significant at the level 0.05 (Fig. 8.4).
It is noteworthy that the change point in covariance estimates for the WTI and
Brent series are very similar. The first break estimate coincides with the beginning
of the COVID-19 pandemic in the US. The second detected change for each series
at the end of December 2018 coincides with the largest drop in oil-prices that had
been observed in the previous three years, which followed an already long decline
that began in November of that year.

Example 8.5.3 (Sea Surface Temperature Anomaly Data) In this application


we analyze the sea surface temperature (SST) anomaly data set available in the
companion R package to the textbook Wikle et al. (2019), as considered in Aue

Fig. 8.4 Daily closing prices of the WTI and Brent commodity futures, with the estimated breaks
in the covariance operator in the corresponding WTI and Brent CIDR curves


Fig. 8.5 Top four panels: Wireframe plots of the SST anomaly data from the first four months of 1970 (January–April). The entire data set is comprised of a time series of 396 such surfaces

et al. (2020). The data here are available at a monthly resolution starting in January
of 1970 and ending in December of 2003 (n = 396), and consist of 2 degree latitude
by 2 degree longitude spatial measurements of the monthly average sea surface
temperature in the south Pacific, 2261 spatial observations in total, that are adjusted
by removing the corresponding monthly means over the past 12 years in order to
make anomalous months and trends more apparent. Wire frame plots of the surfaces
representing the first four months of data are shown in Fig. 8.5.
SST anomaly data are used primarily in identifying strong climate trends along
with variations such as the El Niño and La Niña phenomena. One of the main
techniques for identifying these trends and variations in the literature is to employ
an empirical orthogonal function (EOF) analysis. EOF analysis effectively amounts
to conducting PCA, or more generally functional PCA, on the surface valued data
encoding the temperature evolution over time; see Chapter 2.4 in Wikle et al. (2019)
for a discussion of the connections between EOF and PCA analysis.

Table 8.2 Change point estimates along with p-values for tests of stability of the largest
eigenvalue, first three eigenvalues, and trace of the covariance operator
Data set | CP λ_1 (p-value) | CP (λ_1, λ_2, λ_3)^T (p-value) | CP Trace (p-value)
Raw SST data | Apr-76 (0.005) | May-97 (0.000) | May-97 (0.011)
El Niño adjusted | Apr-76 (0.007) | May-97 (0.000) | May-97 (0.010)
El Niño adjusted/detrended | Apr-76 (0.017) | Nov-83 (0.055) | May-97 (0.156)
Bolded p-values are less than 0.05

We take as the goal of this analysis to determine if the variation explained by the
leading functional principal components of the SST anomaly surfaces is plausibly
homogeneous throughout the sample, and if not to try and better pinpoint the sources
of the variation changes within the series. In this case we envision the data set as
being comprised of a time series of 396 surfaces in L2 ([0, 1]2 ). It is straightforward
to adjust all formulae above to this setup.
First we applied our tests for changes in λ_1, jointly for (λ_1, λ_2, λ_3)^T, and to the trace of the covariance operator, using the statistics I_N^{(0.1)}(0), J_{3,N}^{(0.1)}(0), and M_N(0) applied to the raw SST surfaces. The results are summarized in Table 8.2, which also
contains the results of subsequent tests. These showed that the level of variability
measured by the leading eigenvalues and trace appear to change considerably over
the observation period. As seen in the bottom panel of Fig. 8.6, high peaks in the
CUSUM function for the largest eigenvalue process are observed on the dates of
April 1976, and May 1997. These coincide with well known strong El Niño events.
The total variation explained by the first three eigenvalues calculated from the
covariance operator prior to April 1976 is 72%, while for the data past April 1976
Fig. 8.6 Plots of (N^{1/2}/σ̂_1)|λ̂_1(x) − (⌊Nx⌋/N)λ̂_1(1)|, 0.1 ≤ x ≤ 1, versus x for the raw, El Niño adjusted, and El Niño adjusted plus detrended SST anomaly series. The horizontal line is the 95% quantile of sup_{0.1≤x≤1}|B(x)|, and hence CUSUM plots that go over this line are approximately significant at the level 0.05. Distinct peaks of the CUSUM charts for the raw and El Niño adjusted series occur at points corresponding to the dates April 1976 and May 1997

is 54%, and hence one may expect based on this analysis to need more principal
components to accurately represent the data past April 1976 compared to prior to
this date.
In order to adjust for the El Niño effect, Lawrence et al. (2004) suggests using the
first principal component (PC) series as a proxy for the El Niño variations, which
may then be removed by calculating the residuals of a simple linear regression of
the SST time series at each spatial point on to the leading PC series. Specifically,
let xr,v,i denote the raw, or pixel, SST data at each of the 2261 spatial locations,
and Xi (t, s), i ∈ {1, . . . , 396} denote the corresponding SST anomaly surfaces,
t, s ∈ [0, 1]. Then with φ̂1 denoting the eigenfunction corresponding to the largest
eigenvalue of the sample covariance operator of the Xi ’s, the leading PC series is
given by
P_i = \iint_{[0,1]^2} \left[ X_i(t,s) - \bar X(t,s) \right] \hat\phi_1(t,s)\,dt\,ds.

Lawrence et al. (2004) then suggests fitting the linear regression

x_{r,v,i} = \beta_{r,v}^{(1)} P_i + \beta_{r,v}^{(0)} + \varepsilon_{r,v,i},

and subsequently considering the residuals x'_{r,v,i} = x_{r,v,i} − (β̂_{r,v}^{(1)} P_i + β̂_{r,v}^{(0)}), where β_{r,v}^{(1)} and β_{r,v}^{(0)} are estimated via least squares for each pixel. The resulting El Niño
adjusted surfaces are denoted Xi' . We also applied our tests to these surfaces, which
suggested that they still contain sizable fluctuations in the eigenvalues and overall
variance as measured by the trace of the covariance operator, with apparent and
prominent changes at approximately the same dates.
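Since the regressor P_i is common to all pixels, the per-pixel least squares fits have a closed form and the whole adjustment reduces to a few matrix operations. The following sketch (hypothetical names; X holds one vectorized anomaly surface per row) is one way to carry it out, not the authors' code:

```python
import numpy as np

def el_nino_adjust(X):
    """Remove the leading-PC (El Nino proxy) component from each pixel.

    X: array of shape (n_months, n_pixels), one vectorized SST anomaly
    surface per row. Returns the residual (adjusted) surfaces."""
    Xc = X - X.mean(axis=0)            # center each pixel series
    # Leading principal component scores: project the centered data onto
    # the first right singular vector (the discretized phi_hat_1).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[0]                     # P has mean zero by construction
    # Per-pixel slope of the regression on P; since P is centered, the
    # fitted intercept equals the pixel mean, which was removed above.
    beta1 = (P @ Xc) / (P @ P)
    return Xc - np.outer(P, beta1)     # x' = x - (beta1 * P + beta0)
```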
In order to remove remaining variation that was not captured by the anomaly
and El Niño adjustments, Good et al. (2007) suggest removing further residual
variations by fitting trigonometric series at high frequencies to the pixel level data.
We ran our tests one last time on the curves Xi' that were detrended using a moving
average with a relatively small window of 3 years to remove such variation. The
resulting time series length was reduced to 360 after truncating 1.5 years off of
each end of the series that could not be centered in this way. Our tests show that
this series/detrending technique evidently achieves some stability in terms of its
variability, with all tests for stability of the eigenvalues and trace yielding p-values
larger than 0.05. We note that using larger moving average windows tended to
decrease these p-values, suggesting that the remaining fluctuations after removing
the El-Niño effect are comparatively high-frequency.

8.6 Exercises

Exercise 8.6.1 Verify that the Gaussian process {Γ_N^0(u,t), 0 ≤ u, t ≤ 1} defined in the proof of Theorem 8.1.1 satisfies EΓ_N^0(u,t)Γ_N^0(v,s) = (min(u,v) − uv)D(t,s), 0 ≤ u, v, t, s ≤ 1.
Exercise 8.6.2 Verify that the processes Γ and O defined in (8.1.13) have the same distribution by verifying that they are each Gaussian processes with the same mean and covariance functions.
Exercise 8.6.3 Let {X_i(t), i ∈ Z, t ∈ [0,1]} be a stationary sequence of stochastic processes with EX_0(t) = 0 for all t ∈ [0,1], and E‖X_0‖^2 < ∞. Define γ_1(t,s) = EX_0(t)X_0(s), U(t,s) = \int_0^1 \gamma_1(t,u)\gamma_1(u,s)\,du, and u : L^2[0,1] → L^2[0,1] by u(f)(t) = \int_0^1 U(t,s)f(s)\,ds. Show that U defines a positive definite, symmetric kernel integral operator on L^2[0,1] by showing that (i) for all v ∈ L^2[0,1], ⟨u(v), v⟩ ≥ 0, and (ii) for all v, w ∈ L^2[0,1], ⟨u(v), w⟩ = ⟨v, u(w)⟩.
Exercise 8.6.4 Assume {ε_i(t), i ∈ Z, t ∈ [0,1]} satisfies Definition 8.1.1 with some ν ≥ 2. Show that {‖ε_i‖^2, i ∈ Z} satisfies the functional central limit theorem, i.e. there exist constants μ and σ so that

\frac{1}{N^{1/2}} \sum_{i=1}^{\lfloor Nx \rfloor} \left\{ \|\varepsilon_i\|^2 - \mu \right\} \xrightarrow{\;\mathcal D[0,1]\;} \sigma W(x),

where W is a standard Brownian motion.


Exercise 8.6.5 Show that Eq. (8.1.16) holds, and that sup_{0≤u≤1} ‖v_N(⌊Nu⌋, ·)‖^2 = Nθ_N^2(1−θ_N)^2 ‖μ_0 − μ_A‖^2 + O(1).
Exercise 8.6.6 Suppose that λ_1 > λ_2 in (8.1.6). Show that under the conditions of Theorem 8.1.7,

\frac{1}{\hat\lambda_{N,1}^{1/2}} \sup_{0<u<1} \frac{|\hat\xi_{N,1}(u)|}{[u(1-u)]^{\kappa}} \xrightarrow{\;\mathcal D\;} \sup_{0<u<1} \frac{|B(u)|}{[u(1-u)]^{\kappa}},

where B is a standard Brownian bridge.


Exercise 8.6.7 Show using the mean value theorem that for 0 < α < θ < 1, 0 ≤ κ ≤ 1/2, and k^* = ⌊Nθ⌋,

\max_{N\alpha \le k \le k^*} \frac{1}{k^* - k} \left| \left( \frac{N}{k(N-k)} \right)^{2\kappa} - \left( \frac{N}{k^*(N-k^*)} \right)^{2\kappa} \right| = O(N^{-1-2\kappa}).

Exercise 8.6.8 For 1 ≤ l < u ≤ N, show that the generalized CUSUM process Z_k^{l,u}(t) is equivalent to the weighted CUSUM process

\frac{Z_N(k/N, t)}{[k/N\,(1 - k/N)]^{1/2}}

defined over the sample X_l, ..., X_u.


Exercise 8.6.9 If {X_i}_{i∈Z} is a mean zero, L^ν-decomposable sequence in L^2[0,1] for some ν ≥ 2, show that the sequence {Y_i}_{i∈Z} taking values in L^2([0,1]^2) and defined by Y_i(t,s) = X_i(t)X_i(s) is L^{ν/2}-decomposable.
Exercise 8.6.10 If {Ei (t), i ∈ Z} is Lν -decomposable for some ν ≥ 1, then show
that for all v ∈ L2 [0, 1], {<Ei , v>, i ∈ Z} is an Lν -decomposable scalar sequence.
Exercise 8.6.11 Let {Xi (t), i ∈ Z, t ∈ [0, 1]} be a sequence of stochastic
processes with EXi (t) = 0 for all t ∈ [0, 1], and E||Xi ||4 ≤ M < ∞, and suppose
we wish to test for the stability of γ1,i (t, s) = EXi (t)Xi+1 (s), i.e. it does not depend
on i. Develop a CUSUM-type test statistic, and establish under what conditions this
statistic converges in distribution to a functional of a Gaussian process.

8.7 Bibliographic Notes and Remarks

The monograph of Ramsay and Silverman (2002) offers an excellent introduction to


functional data analysis. Bosq (2000) and Bosq and Blanke (2007) study theoretical
foundations and extend some results to functional time series. Other textbook length
treatments of functional data can be found in Ferraty and Vieu (2006), Horváth and
Kokoszka (2012) and Kokoszka and Reimherr (2017).
The proofs of Theorems 8.1.4 and 8.1.5 are based on Hörmann et al. (2013) and Aston and Kirch (2012).
Estimating the long-run covariance kernel and choosing the corresponding kernel
and bandwidth parameter are studied in Berkes et al. (2016), Horváth et al. (2016),
and Rice and Shang (2017).
The eigenvalues and eigenfunctions of covariance and long-run covariance func-
tions play an important role in many results and procedures. Asymptotic normality
of the empirical eigenvalues and eigenfunctions of the standard covariance kernel
estimate and long-run covariance kernel are established in Hall and Hosseini-Nasab
(2006), Kokoszka and Reimherr (2013), and Berkes et al. (2016).
Bucchia and Wendler (2017) use resampling methods to improve finite sample
performance of tests based on Hilbert space valued observations. The consistency
of binary segmentation for functional data is studied in Rice and Zhang (2022).
Chiou et al. (2019) develop a greedy algorithm based on a functional CUSUM-type
process to detect and estimate multiple change points with functional data. Harris
et al. (2021) develop a fused-LASSO based approach using principal component
8.7 Bibliographic Notes and Remarks 499

projections. Padilla et al. (2022) use a kernel CUSUM approach along with seeded
binary segmentation to detect and estimate multiple changes for both sparse and
dense functional data. A self-normalized approach to conduct change point analysis
for functional time series based on projections was put forward in Zhang et al.
(2011). A Bayesian method is developed in Li and Ghosal (2021).
A different framework, in which tests are developed for “relevant” changes in the mean function, namely those whose norm exceeds a user specified threshold, is considered in Dette et al. (2020b). The problem of conducting change point analysis for
the covariance function or operator describing the second order behaviour of
functional data has been comparatively less explored. Jarušková (2013) is the first
to consider the change point detection problem for the covariance operator of
independent functional data, and their approach is based on an initial dimension
reduction step using functional principal component analysis. Stoehr et al. (2021)
generalize change point detection methods for the mean and covariance using
several dimension reduction based approaches, and the test statistics that we
discussed for this problem appear in Jiao et al. (2023), Horváth et al. (2022),
and Sharipov and Wendler (2020). Aue et al. (2020) and Dette and Kutta (2021)
consider change point inference under similar weak dependence conditions for
the spectrum and eigenfunctions, respectively, of covariance operators, the latter
reference considering also the relevant testing framework.
Appendix A

A.1 Weak Convergence and Approximations of Sums

Central limit theory has been among the main focal points of probability research
since its inception. The functional version of the central limit theorem, which
generally refers to the central limit theorem for the partial sum process, appeared
in the fundamental work of Donsker (1951, 1952). Billingsley (1968) remains an
excellent entry point to the study of weak convergence of empirical processes, as
well as Vaart and Wellner (1996). Let .E1 , E2 , . . . be a sequence of random variables
and define the partial sums process

S_N(t) = \frac{1}{N^{1/2}} \sum_{i=1}^{\lfloor Nt \rfloor} E_i, \quad t \in [0,1]. \qquad (A.1.1)

Assumption A.1.1 {E_i, i ≥ 1} are independent and identically distributed random variables with EE_i = 0 and EE_i^2 = 1.
Donsker’s theorem states that
D [0,1]
SN (t) −→ W (t),
. (A.1.2)

where .{W (t), 0 ≤ t ≤ 1} is a Wiener process (standard Brownian motion). We


say that .W (t) is a Wiener process, if it is a continuous Gaussian process with
.EW (t) = 0 and .EW (t)W (s) = min(t, s). The result in (A.1.2) means that

.{SN (t), 0 ≤ t ≤ 1} converges weakly in the metric space .D[0, 1], namely the

space of real-valued functions on .[0, 1] that are right-continuous and have left-hand
limits, endowed with the Skorokhod metric. Using the Skorokhod–Dudley–Wichura
representation theorem, we can reformulate (A.1.2) in the following way: By
enlarging the probability space on which .SN is defined if necessary, there exist
Wiener processes .{WN (t), t ∈ [0, 1]}, all defined on the common probability space


as S_N, such that

\sup_{0 \le t \le 1} |S_N(t) - W_N(t)| = o_P(1). \qquad (A.1.3)

See Skorokhod (1956), Dudley (1999), and Wichura (1970). The potential
enlargement of the probability space required to obtain approximations as in (A.1.3)
is due to the fact that the sample space on which .SN is defined originally may
not be rich enough to support a Brownian motion. In subsequent results of this
type below we assume such enlargements have already been made if needed. We
generally state results on the weak convergence of empirical processes as in (A.1.2)
as approximations with copies of the limiting process as in (A.1.3).
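As a quick numerical illustration of (A.1.2) (a simulation sketch added here, not part of the original discussion), one can evaluate S_N(t) on the grid t = k/N for independent standardized errors and compare the distribution of sup_t |S_N(t)| with that of sup_{0≤t≤1}|W(t)|:

```python
import numpy as np

# Simulate S_N(t) = N^{-1/2} * sum_{i <= floor(Nt)} E_i on the grid t = k/N.
# By (A.1.2) and the continuous mapping theorem, sup_t |S_N(t)| converges
# in distribution to sup_{0 <= t <= 1} |W(t)|.
rng = np.random.default_rng(0)
N, reps = 2000, 5000
E = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(reps, N))  # mean 0, var 1
S = np.cumsum(E, axis=1) / np.sqrt(N)
sup_abs = np.abs(S).max(axis=1)
# The 90%, 95%, and 99% quantiles should be close to the known values
# of roughly 1.96, 2.24, and 2.81 for the supremum of |W| on [0, 1].
print(np.quantile(sup_abs, [0.90, 0.95, 0.99]))
```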
Starting with the seminal work of Darling and Erdős (1956), convergence rates
of the weak convergence of partial sums became an active area of research, owing
to their many and varied applications in probability and statistics. As weighted
approximations are useful in justifying many asymptotic results in change point
analysis, we state the results we use in this book in that form. Csörgő and
Horváth (1993, 1997) contain the theory and several applications of weighted
approximations of partial sums, as well as for standard empirical and quantile
processes. Motivated by these goals and (A.1.3), we wish to obtain approximations
in the following form: for each N, there exist Wiener processes .{WN,1 (t), 0 ≤ t ≤
N/2} and .{WN,2 (t), 0 ≤ t ≤ N/2} such that
\max_{1 \le k \le N/2} \frac{1}{k^{\zeta}} \left| \sum_{i=1}^{k} E_i - W_{N,1}(k) \right| = O_P(1), \qquad (A.1.4)

\max_{N/2 < k < N} \frac{1}{(N-k)^{\zeta}} \left| \sum_{i=k+1}^{N} E_i - W_{N,2}(N-k) \right| = O_P(1), \qquad (A.1.5)

\{W_{N,1}(t),\, 0 \le t \le N/2\} \text{ and } \{W_{N,2}(t),\, 0 \le t \le N/2\} \text{ are independent} \qquad (A.1.6)

and

\zeta < 1/2. \qquad (A.1.7)


Under Assumption A.1.1, \{\sum_{i=1}^{k} E_i,\, 1 \le k \le N/2\} and \{\sum_{i=k+1}^{N} E_i,\, N/2 < k \le N\} are independent, so we obtain (A.1.6). Also, \{E_1 + E_2 + \cdots + E_k,\, 1 \le k \le N\} \stackrel{\mathcal D}{=} \{E_N + E_{N-1} + \cdots + E_{N-k+1},\, 1 \le k \le N\}, and therefore (A.1.5) follows from the approximation (A.1.4). The approximation in (A.1.4) may be established under Assumption A.1.1 together with the moment condition

Assumption A.1.2 E|E_i|^ν < ∞ with some ν > 2.

In this case, according to Skorokhod's embedding theorem (see Breiman, 1968, p. 279), there exist a Wiener process {W(x), x ≥ 0} and independent identically distributed non-negative random variables τ_1, τ_2, ... with Eτ_i = 1 and Eτ_i^{ν/2} < ∞ such that

S(k) = E_1 + E_2 + \cdots + E_k = W(\tau_1 + \tau_2 + \cdots + \tau_k), \quad k \in \mathbb N. \qquad (A.1.8)

Although it does not lead to the best possible rate in (A.1.4), we assume that ν < 4. Using the Marcinkiewicz–Zygmund inequality (Marcinkiewicz and Zygmund, 1937) it may be shown that

\left| \sum_{i=1}^{k} \tau_i - k \right| = o(k^{2/\nu}) \quad \text{a.s.}

Hence by the modulus of continuity of W (see Csörgő and Révész, 1979) we get that

\max_{|k-j| \le k^{2/\nu}} |W(k) - W(j)| = O(k^{1/\nu} (\log k)^{1/2}) \quad \text{a.s.}

These imply (A.1.4) with any ζ > 1/ν.


While the Skorokhod embedding leads to a reasonably simple proof of (A.1.4)–
(A.1.8) under Assumptions A.1.1 and A.1.2, the rate can be improved using
more sophisticated arguments. Optimal rates in the approximation of partial sums
of independent and identically distributed random variables are achieved by the
“Hungarian construction” of Komlós, Major, and Tusnády, often referred to as the
KMT construction; see Komlós et al. (1975, 1976). Csörgő and Révész (1981) and Csörgő and Horváth (1993, 1997) provide a detailed history and applications of these approximation results under Assumptions A.1.1 and A.1.2.
A vast related literature exists on extending central limit theorems and invariance
principles to more general stationary processes; see, for example, the monographs
of Ibragimov and Linnik (1971), Bradley (2007), Dedecker et al. (2007), and
Billingsley (1968), among others. Work in this area can be categorized based on
which of the many popular models for serial dependence are assumed to apply to
the stationary process. These include for example linear processes, martingales, and
mixing conditions. Due to their wide applicability to time series models, which are
often built starting from a driving innovation sequence, we consider throughout this
book decomposable Bernoulli shifts. This concept can be easily extended to random
objects taking values in more complex spaces, including vector and function valued
random variables.
Definition A.1.1 We say that the scalar sequence {E_i, i ∈ Z} is an L^ν-decomposable Bernoulli shift for some ν > 2 if (1) {E_i, i ∈ Z} is a causal Bernoulli shift, which is to say that E_i = g(η_i, η_{i−1}, ...), where {η_i, i ∈ Z} are independent and identically distributed random variables taking values in a measurable space S and g is a (deterministic) measurable function, g : S^∞ → R, and (2) {E_i, i ∈ Z} satisfies the moment and weak dependence conditions EE_i = 0, E|E_i|^ν < ∞, and

v_m = \left( E\left| E_i - E_{i,m}^* \right|^{\nu} \right)^{1/\nu} \le a\, m^{-\alpha} \quad \text{with some } a > 0 \text{ and } \alpha > 2,

where E_{i,m}^* = g(\eta_i, \ldots, \eta_{i-m+1}, \eta_{i-m}^*, \eta_{i-m-1}^*, \ldots), and {η_k^*, k ∈ Z} are independent, identically distributed copies of η_0, independent of {η_j, j ∈ Z}.
For brevity we simply say that a sequence satisfying Definition A.1.1 is L^ν-decomposable. Note that all Bernoulli shift sequences are strictly stationary and ergodic; see Breiman (1968), Proposition 6.6. Part (2) of Definition A.1.1 describes how well the sequence {E_i, i ∈ Z} can be approximated by a sequence exhibiting a finite range of dependence. Notice with E'_{i,m} = g(\eta_i, \ldots, \eta_{i-m+1}, \eta_{i-m}^{*,i}, \eta_{i-m-1}^{*,i}, \ldots), where {η_k^{*,i}, k ∈ Z} are independent, identically distributed copies of η_0, independent of {η_j, j ∈ Z}, and independent for each i, that (E_i, E_{i,m}^*) \stackrel{\mathcal D}{=} (E_i, E'_{i,m}), so that under Definition A.1.1,

\left( E\left| E_i - E'_{i,m} \right|^{\nu} \right)^{1/\nu} \le c\, m^{-\alpha},

and the sequence {E'_{i,m}, i ∈ Z} is m-dependent.
Approximations of the partial sum process as in (A.1.4)–(A.1.7) can also be
established for .Lν -decomposable processes.
Theorem A.1.1 If {E_i, i ∈ Z} is L^ν-decomposable, then for each N we can define Wiener processes {W_{N,1}(t), 0 ≤ t ≤ N/2} and {W_{N,2}(t), 0 ≤ t ≤ N/2} such that

\max_{1 \le k \le N/2} \frac{1}{k^{\zeta}} \left| \sum_{i=1}^{k} E_i - \sigma W_{N,1}(k) \right| = O_P(1), \qquad (A.1.9)

\max_{N/2 < k < N} \frac{1}{(N-k)^{\zeta}} \left| \sum_{i=k+1}^{N} E_i - \sigma W_{N,2}(N-k) \right| = O_P(1), \qquad (A.1.10)

\{W_{N,1}(t),\, 0 \le t \le N/2\} \text{ and } \{W_{N,2}(t),\, 0 \le t \le N/2\} \text{ are independent} \qquad (A.1.11)

and

\zeta < 1/2, \qquad (A.1.12)

where

\lim_{N \to \infty} \frac{1}{N} E\left( \sum_{i=1}^{N} E_i \right)^2 = \sigma^2. \qquad (A.1.13)

A proof of this result is given in Aue et al. (2014), which we outline for
multivariate .Lν -decomposable processes in Theorem A.1.3 below.
Theorem A.1.1 does not provide any information on what value of ζ can be taken in (A.1.12). A careful study of the proof of Theorem A.1.3 provides an upper bound for ζ, but it is far from optimal. The best possible value of ζ is attained by the Komlós–Major–Tusnády approximation in the case of independent and identically distributed random variables, and their results have been extended to L^ν-decomposable processes by Berkes et al. (2014). Although their result is more general, we present it for the case of L^ν-decomposable Bernoulli shifts:


Theorem A.1.2 If {E_i, i ∈ Z} is L^ν-decomposable with ν > 4, then there is a constant α_0(ν) so that if α ≥ α_0(ν), then for each N we can define Wiener processes {W_{N,1}(t), 0 ≤ t ≤ N/2} and {W_{N,2}(t), 0 ≤ t ≤ N/2} such that (A.1.9)–(A.1.11) and (A.1.13) hold with ζ = 1/ν.


As we now demonstrate, the processes generated by many time series models are
Lν -decomposable under natural moment conditions on the innovation sequence.
.

Example A.1.1 (Linear Time Series) {E_i, i ∈ Z} is said to follow a linear process if

E_i = \sum_{l=0}^{\infty} c_l\, \eta_{i-l}, \qquad (A.1.14)

where {η_l, l ∈ Z} are independent and identically distributed random variables with Eη_i = 0 and E|η_i|^ν < ∞. Evidently E_i is a Bernoulli shift, and if the scalar coefficients {c_l, l ∈ N} satisfy

|c_l| \le c\, l^{-\alpha-1}, \quad 1 \le l < \infty,

for a constant c > 0, then by the triangle inequality

\left( E\left| \sum_{j=l+1}^{\infty} c_j\, \eta_{i-j} \right|^{\nu} \right)^{1/\nu} \le (E|\eta_0|^{\nu})^{1/\nu} \sum_{j=l+1}^{\infty} |c_j| = O(l^{-\alpha}), \quad \text{as } l \to \infty.

Hence the approximability condition in Definition A.1.1 holds. If E_i is a stationary and causal ARMA(p,q) process, then (A.1.14) holds and |c_l| = O(ρ^l), as l → ∞, with some 0 < ρ < 1. So in this case Definition A.1.1 holds for all α > 0.
Example A.1.2 (GARCH(1,1) Process) {E_i, i ∈ Z} is said to follow a stationary GARCH(1,1) sequence if

E_i = \eta_i h_i \quad \text{and} \quad h_i^2 = \omega + \beta_1 E_{i-1}^2 + \beta_2 h_{i-1}^2, \quad i \in \mathbb Z, \qquad (A.1.15)

where ω > 0, β_1 ≥ 0, β_2 ≥ 0 and {η_l, l ∈ Z} are independent and identically distributed random variables. The unique stationary causal solution of (A.1.15) is a Bernoulli shift of the form

E_i = \eta_i h_i \quad \text{and} \quad h_i^2 = \omega \sum_{l=0}^{\infty} \prod_{k=1}^{l} (\beta_1 \eta_{i-k}^2 + \beta_2), \qquad (A.1.16)

with the convention \prod_{\emptyset} = 1. If E\log(\eta_0^2 + 1) < \infty, then the infinite sum in (A.1.16) is almost surely finite if and only if E\log(\beta_1 \eta_0^2 + \beta_2) < 0 (see Berkes et al., 2003 and Francq and Zakoian, 2010). Using the triangle inequality and the independence of the η_l's we get that

\left( E\left( \sum_{j=l+1}^{\infty} \prod_{k=1}^{j} (\beta_1 \eta_{i-k}^2 + \beta_2) \right)^{\nu} \right)^{1/\nu} \le \sum_{j=l+1}^{\infty} \left( E \prod_{k=1}^{j} (\beta_1 \eta_{i-k}^2 + \beta_2)^{\nu} \right)^{1/\nu} = \sum_{j=l+1}^{\infty} \rho^{j} \qquad (A.1.17)

with ρ = (E(β_1 η_0^2 + β_2)^ν)^{1/ν}. Hence part (2) of Definition A.1.1 holds for {E_i, i ∈ Z} with any α > 0, if E(β_1 η_0^2 + β_2)^ν < 1.
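The geometric decay in (A.1.17) is easy to see numerically. The sketch below (hypothetical code, added only for illustration) evaluates the Bernoulli-shift representation (A.1.16) from a finite innovation history, then rebuilds it after replacing all innovations older than m with independent copies, in the spirit of E^*_{i,m}; the difference shrinks geometrically in m when E(β_1 η_0^2 + β_2)^ν < 1:

```python
import numpy as np

def garch_shift(eta, omega=0.1, b1=0.2, b2=0.5):
    """Evaluate E_i = eta_i * h_i via (A.1.16), truncating the infinite sum
    at the available history eta = (eta_{i-L+1}, ..., eta_i)."""
    eta2 = eta[:-1][::-1] ** 2            # eta_{i-1}^2, eta_{i-2}^2, ...
    prods = np.cumprod(b1 * eta2 + b2)    # prod_{k<=l} (b1*eta_{i-k}^2 + b2)
    h2 = omega * (1.0 + prods.sum())      # l = 0 term plus the l >= 1 terms
    return eta[-1] * np.sqrt(h2)

rng = np.random.default_rng(0)
L, m = 200, 20
eta = rng.standard_normal(L)
e_full = garch_shift(eta)
# E*_{i,m}: keep the m most recent innovations, redraw the older ones.
eta_star = np.concatenate([rng.standard_normal(L - m), eta[-m:]])
e_trunc = garch_shift(eta_star)
print(abs(e_full - e_trunc))              # small, and decreasing in m
```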
Example A.1.3 (Augmented GARCH Process) Duan (1997) replaced (A.1.15) with

E_i = \eta_i h_i \quad \text{and} \quad g(h_i) = a(\eta_{i-1})\, g(h_{i-1}) + b(\eta_{i-1}), \quad i \in \mathbb Z, \qquad (A.1.18)

where a(x), b(x), g(x) are measurable functions and g(x) has a unique inverse. The variables {η_l, l ∈ Z} are independent and identically distributed random variables. The unique stationary causal solution of (A.1.18) is

g(h_i) = \sum_{l=1}^{\infty} b(\eta_{i-l}) \prod_{k=1}^{l-1} a(\eta_{i-k})

(with \prod_{\emptyset} = 1). Similarly to Example A.1.2,

\left( E\left| \sum_{j=l+1}^{\infty} b(\eta_{i-j}) \prod_{k=1}^{j-1} a(\eta_{i-k}) \right|^{\nu} \right)^{1/\nu} \le \sum_{j=l+1}^{\infty} \left( E\left| b(\eta_{i-j}) \prod_{k=1}^{j-1} a(\eta_{i-k}) \right|^{\nu} \right)^{1/\nu} = (E|b(\eta_0)|^{\nu})^{1/\nu} \sum_{j=l+1}^{\infty} \rho^{j-1}

with ρ = (E|a(η_0)|^ν)^{1/ν}. If E|b(η_0)|^ν < ∞ and E|a(η_0)|^ν < 1, then we have Definition A.1.1 with any α > 0. Carrasco and Chen (2002) shows that nearly all univariate GARCH sequences can be written as augmented GARCH processes satisfying (A.1.18). Hörmann (2008) contains a detailed description of the properties of augmented GARCH processes.
Example A.1.4 (Random Coefficient Models) Andél (1976) and Nicholls and Quinn (1982) defined the RCA(1) sequence as the solution of

E_i = \eta_{i,1} E_{i-1} + \eta_{i,2}, \qquad (A.1.19)

where {η_i = (η_{i,1}, η_{i,2})^T, i ∈ Z} are independent and identically distributed random vectors. The unique stationary causal solution of (A.1.19) is

E_i = \sum_{l=0}^{\infty} \eta_{i-l,2} \prod_{k=1}^{l-1} \eta_{i-k+1,1}

(with \prod_{\emptyset} = 1), which is finite if E|\log|\eta_{0,2}|| < \infty and E\log|\eta_{0,1}| < 0. As before,

\left( E\left| \sum_{j=l+1}^{\infty} \eta_{i-j,2} \prod_{k=1}^{j-1} \eta_{i-k+1,1} \right|^{\nu} \right)^{1/\nu} \le \sum_{j=l+1}^{\infty} \left( E\left| \eta_{i-j,2} \prod_{k=1}^{j-1} \eta_{i-k+1,1} \right|^{\nu} \right)^{1/\nu} = (E|\eta_{0,2}|^{\nu})^{1/\nu} \sum_{j=l+1}^{\infty} \rho^{j-1},

with ρ = (E|η_{0,1}|^ν)^{1/ν}, so Definition A.1.1 holds for all α > 0, if E|η_{0,1}|^ν < 1. Aue et al. (2006) and Berkes et al. (2009c) contain the necessary and sufficient condition for the existence of a unique solution of (A.1.19) and estimation theory for the parameters of the RCA(1).
It is straightforward to generalize Definition A.1.1 to vector valued random variables. Let ‖·‖ denote the Euclidean norm of vectors and matrices.

Definition A.1.2 We say that {E_i ∈ R^d, i ∈ Z} is a vector valued L^ν-decomposable Bernoulli shift if E_i = g(η_i, η_{i−1}, ...), where g is a (deterministic) measurable function, g : S^∞ → R^d, EE_i = 0, E‖E_i‖^ν < ∞ with some ν > 2, {η_i, i ∈ Z} are independent and identically distributed random variables with values in a measurable space S, and

\left( E \left\| E_i - E_{i,m}^* \right\|^{\nu} \right)^{1/\nu} \le c\, m^{-\alpha} \quad \text{with some } c > 0 \text{ and } \alpha > 2, \qquad (A.1.20)

where E_{i,j}^* = g(\eta_i, \ldots, \eta_{i-j+1}, \eta_{i-j}^*, \eta_{i-j-1}^*, \ldots), and {η_l^*, l ∈ Z} are independent, identically distributed copies of η_0, independent of {η_j, −∞ < j < ∞}.

Theorem A.1.3 If {E_i, i ∈ Z} satisfies Definition A.1.2, then for each N we can define Wiener processes {W_{N,1}(t), 0 ≤ t ≤ N/2} and {W_{N,2}(t), 0 ≤ t ≤ N/2} such that EW_{N,1}(t) = EW_{N,2}(t) = 0, EW_{N,1}(t)W_{N,1}^T(s) = EW_{N,2}(t)W_{N,2}^T(s) = min(t,s)Σ,

\max_{1 \le k \le N/2} \frac{1}{k^{\zeta}} \left\| \sum_{i=1}^{k} E_i - W_{N,1}(k) \right\| = O_P(1), \qquad (A.1.21)

\max_{N/2 < k < N} \frac{1}{(N-k)^{\zeta}} \left\| \sum_{i=k+1}^{N} E_i - W_{N,2}(N-k) \right\| = O_P(1), \qquad (A.1.22)

\{W_{N,1}(t),\, 0 \le t \le N/2\} \text{ and } \{W_{N,2}(t),\, 0 \le t \le N/2\} \text{ are independent} \qquad (A.1.23)

and

\zeta < 1/2, \qquad (A.1.24)

where

\lim_{N \to \infty} \frac{1}{N}\, E\left( \sum_{l=1}^{N} E_l \right)\left( \sum_{l=1}^{N} E_l \right)^T = \Sigma. \qquad (A.1.25)

Proof The proof is rather technical so we just outline the major steps and explain why the proof of (A.1.21) implies (A.1.22) and (A.1.23). A detailed proof is given in Aue et al. (2014). First we define integers r_i so that r_1 = 1 and ⌊r_i − r_{i−1}⌋ = i^a, with a suitably chosen a. Let T_i = {r_{i−1}+1, r_{i−1}+2, ..., r_i}. We will show that at the points r_i,

\left\| \sum_{l=1}^{i} Z_l - \sum_{l=1}^{i} Q_l \right\| = o(r_i^{\zeta}) \quad \text{a.s.}, \qquad (A.1.26)

where

Z_l = \sum_{j \in T_l} E_j \quad \text{and} \quad Q_l = \sum_{j \in T_l} N_j,

and N_j, 1 ≤ j < ∞, are independent identically distributed normal random vectors with EN_j = 0 and EN_j N_j^T = Σ. (A.1.26) implies (A.1.21). In order to proceed, we must verify that the sums of the E_l's and N_l's do not change too much if we

sum these vectors over only part of the elements of T_i. Let i^* be the largest integer such that r_{i^*} ≤ k. If

\max_{r_{i^*} \le v < r_{i^*+1}} \left\| \sum_{l=r_{i^*}+1}^{v} E_l \right\| = O(r_k^{\zeta}) \quad \text{a.s.} \qquad (A.1.27)

and

\max_{r_{i^*} \le v < r_{i^*+1}} \left\| \sum_{l=r_{i^*}+1}^{v} N_l \right\| = O(r_k^{\zeta}) \quad \text{a.s.} \qquad (A.1.28)

hold, then (A.1.26) implies (A.1.21). These rely on maximal inequalities for partial sums. Since the N_l's are independent identically distributed normal random vectors, one can easily verify that the maximum in (A.1.28) is bounded by ((r_{i^*+1} − r_{i^*}) log(r_{i^*+1} − r_{i^*}))^{1/2}, which is smaller than r_k^ζ by the definition of r_k. Computing the moments of the sum in (A.1.27), a standard maximal inequality argument gives (A.1.27) (see Lemma S.2.1 in Aue et al., 2014).


We use blocking argument to prove (A.1.26). Now we cut .Tl into consecutive
subsets

Tl = Jl1 ∪ Il1 ∪ Jl2 ∪ Il2 ∪ . . . ∪ Jlm (l) ∪ Ilm (l) ∪ Rl ,


.

where .|Ilj | = Llb ⎦ and .|Jlj | = Llc ⎦, 1 ≤ j ≤ m(l), .0 < c < b < a. We use .|T | for
the cardinality of the set T . .Rl contains all the remaining elements of .Tl which are
not in .Jlj ∪ Ilj , 1 ≤ j ≤ m(l). We say that the .Jlj ’s are short blocks and .Ilj ’s are
long blocks. For .j ∈ Tl , let .E ∗j,nl with .nl = |Jl1 |. We recall that .E ∗j,k is defined in
Assumption A.1.2. We replaced the dependent vectors with .nl = |Jl1 | independent
vectors for each .l = 1, 2, . . .. The next step is to prove that the difference between
the sum of the .E j ’s and the .E ∗j,nl ’s is small. Let
E
.Z∗l = E ∗j,nl .
j ∈Tl

Using Definition A.1.2 one can show that

\left\| \sum_{l=1}^{i} Z_l - \sum_{l=1}^{i} Z_l^* \right\| = o(r_i^{\zeta}) \quad \text{a.s.}

Next we prove that in Z_l^* only the large blocks matter. So let

U_l^{(1)} = \sum_{j=1}^{m(l)} \sum_{k \in J_{lj}} E_{k,n_l}^* + \sum_{k \in R_l} E_{k,n_l}^*.

Similarly,

U_l^{(2)} = \sum_{j=1}^{m(l)} \sum_{k \in I_{lj}} E_{k,n_l}^*,

and therefore

Z_l^* = U_l^{(1)} + U_l^{(2)}.

Thus Z_l^* is written as the sum of variables in short and long blocks. We note that \sum_{k \in J_{lj}} E_{k,n_l}^*, 1 ≤ j ≤ m(l), are independent random vectors. So similarly to (A.1.27) we can establish that

\left\| \sum_{l=1}^{i} U_l^{(1)} \right\| = o(r_i^{\zeta}) \quad \text{a.s.}
It remains to provide an approximation for \sum_{l=1}^{i} U_l^{(2)}. Using the choice of n_l, we have that

V_{l,j} = n_l^{-1/2} \sum_{k \in I_{lj}} E_{k,n_l}^*, \quad 1 \le j \le m(l),

are independent and identically distributed random vectors.

are independent and identically distributed random vectors. Also,

1/2
E
m(l)
U(2)
.
l = nl Vl,j .
j =1

It is clear that U_l^{(2)}, l = 1, 2, ..., are independent random vectors due to the definition of the E^*'s. Using the observation that V_{l,j}, 1 ≤ j ≤ m(l), are independent identically distributed random vectors, we can now approximate U_l^{(2)} with a suitably constructed normal random vector (see Lemmas S2.3 and S2.4 in Aue et al., 2014). Thus we can define independent identically distributed normal random vectors {M_l, l ≥ 1} such that EM_l = 0, EM_l M_l^T = Σ and

P\left\{ \left\| (n_l m(l))^{-1/2}\, U_l^{(2)} - M_l \right\| > c\, l^{-(2+\rho)} \right\} \le c_1\, l^{-(2+\rho)} \qquad (A.1.29)

with some c_1 > 0, ρ > 0. Using (A.1.29) with the Borel–Cantelli lemma we conclude

\left\| \sum_{l=1}^{i} U_l^{(2)} - \sum_{l=1}^{i} (n_l m(l))^{1/2} M_l \right\| = o(r_i^{\zeta}) \quad \text{a.s.} \qquad (A.1.30)

with some ζ = ζ(a,b,c) < 1/2. We note that there are independent identically distributed normal random vectors N_{l,j} with EN_{l,j} = 0 and EN_{l,j}N_{l,j}^T = Σ such that

(m(l)\, n_l)^{1/2} M_l = \sum_{k=1}^{m(l)} \sum_{j \in I_{lk}} N_{l,j}.

Now we can write (A.1.30) as

\left\| \sum_{l=1}^{i} U_l^{(2)} - \sum_{l=1}^{i} \sum_{k=1}^{m(l)} \sum_{j \in I_{lk}} N_{l,j} \right\| = o(r_i^{\zeta}) \quad \text{a.s.} \qquad (A.1.31)

The last step in the proof of (A.1.21) is the verification that the sums of the independent identically distributed random vectors N_{l,j} (EN_{l,j} = 0, EN_{l,j}N_{l,j}^T = Σ) over the short blocks are negligible. Elementary arguments based on the properties of the normal distribution yield that

\left\| \sum_{l=1}^{i} \sum_{k=1}^{m(l)} \sum_{j \in J_{lk}} N_{l,j} \right\| = o(r_i^{\zeta}) \quad \text{a.s.}

and

\left\| \sum_{l=1}^{i} \sum_{j \in R_l} N_{l,j} \right\| = o(r_i^{\zeta}) \quad \text{a.s.}

Now the proof of (A.1.21) is complete.


To prove (A.1.22) we need to repeat the proof of (A.1.21) with one modification: we start at N and go backwards, defining the short and long blocks as before. The independence in (A.1.23) follows from the construction. Since we only need approximations on the intervals [1, N/2] and [N/2, N], it is enough to use those blocks which are completely inside [1, N/2] or [N/2, N]. But the sums of the E^*_{k,n_l} on these long blocks are independent. The approximating normal random variables are fitted to these sums, so we can assume that the normal random vectors used in the approximation on [1, N/2] and [N/2, N] are independent. Hence (A.1.23) holds.


We provide two examples of multivariate time series processes where the
assumptions of Theorem A.1.3 hold.

Example A.1.5 We assume that

X_i = \sum_{l=0}^{\infty} A_l\, \eta_{i-l}, \quad i \in \mathbb Z,

where {A_l, 0 ≤ l < ∞} are d×d matrices and the η_i ∈ R^d are independent and identically distributed random vectors. Let ‖·‖ be a vector norm, and we use ‖·‖ also for the induced matrix (linear operator) norm. Similarly to Example A.1.1, if Eη_i = 0, E‖η_i‖^ν < ∞ with ν > 2, and

\|A_l\| \le c\, l^{-\alpha-1},

then X_i is L^ν-decomposable.


Example A.1.6 (Generalized Random Coefficient Models) The random coefficient model in Example A.1.4 was extended to a vector valued model by Pham (1986). In this model E_i satisfies the recursion

E_i = A(\eta_i)\, E_{i-1} + b(\eta_i), \quad i \in \mathbb Z, \qquad (A.1.32)

where E_i ∈ R^d, A(x) is a d×d matrix valued function defined on R^p, b(x) ∈ R^d is a vector valued function defined on R^p, and {η_l, l ∈ Z} are independent and identically distributed random vectors with values in R^p. Let ‖x‖_ψ be a vector norm in R^p and ‖A‖_ψ = sup{‖Ax‖_ψ/‖x‖_ψ : ‖x‖_ψ ≠ 0} be the ψ-norm of matrices. The stationary, ergodic and non anticipative solution of (A.1.32) is

E_i = \sum_{l=0}^{\infty} \prod_{j=0}^{l-1} A(\eta_{i-j})\, b(\eta_{i-l}). \qquad (A.1.33)

If there is a vector norm ψ such that E‖b(η_0)‖_ψ < ∞ and E‖A(η_0)‖_ψ^ν < 1, then the infinite sum in (A.1.33) is absolutely convergent with probability one and L^ν-decomposable. As in Example A.1.4, {E_i, i ∈ Z} is a decomposable Bernoulli shift and (A.1.20) holds for all α. However, the norm in Definition A.1.2 must then be interpreted as a ψ-norm, instead of the Euclidean norm. Pham (1986) proves that {E_i, i ∈ Z} is β-mixing under these conditions. Carrasco and Chen (2002) sharpened the β-mixing bounds and provided several examples of processes satisfying (A.1.32), including standard and power GARCH models, and the autoregressive conditional duration model.
Aue et al. (2009b) shows that several multivariate processes satisfy Definition A.1.2. They prove that the constant conditional correlation GARCH models of Bollerslev (1990) and Jeantheau (1998) and the multivariate exponential GARCH of Kawakatsu (2006) are decomposable Bernoulli shifts.

Remark A.1.1 The assumption of a polynomial rate of decay for the coefficients v_m in Definition A.1.1 could be relaxed to \sum_{m=1}^{\infty} v_m < \infty in order to establish most central limit theorems for the partial sum process; see Wu (2005, 2007). A polynomial rate is helpful to simplify arguments related to the consistency of spectral density and long-run covariance estimation.

A.1.1 Weak Convergence of Empirical Processes Based on Stationary Sequences

Consider again observations .X1 , . . . , XN coming from a strictly stationary process


{X_i, i ∈ Z}. In addition to partial sums, the weak convergence of the empirical process of such variables has also been investigated. Let

F_k(x) = \frac{1}{k} \sum_{i=1}^{k} \mathbf 1\{X_i \le x\}

be the empirical distribution function. Due to stationarity, F_k(x) is an unbiased estimator of F(x) = P{X_1 ≤ x}, the distribution function of the observations. The two-parameter sequential empirical process is defined as

\alpha_N(t,x) = \frac{1}{N^{1/2}} \sum_{i=1}^{\lfloor (N+1)t \rfloor} \left( \mathbf 1\{X_i \le x\} - F(x) \right).

The following result is a special case of Theorem 1 of Berkes et al. (2009b).

Theorem A.1.4 If {X_i, i ∈ Z} is L^ν-decomposable with some ν > 0, and α > 4 in Definition A.1.1, then one can define a sequence of mean zero Gaussian processes {K_N(t,x), t ∈ [0,1], −∞ < x < ∞} such that

\sup_{0<t<1}\; \sup_{-\infty<x<\infty} |\alpha_N(t,x) - K_N(t,x)| = o_P(1),

where

E K_N(t,x)\, K_N(t',x') = \min(t,t') \sum_{l=-\infty}^{\infty} \left[ P\{X_0 \le x,\, X_l \le x'\} - F(x)F(x') \right]. \qquad (A.1.34)

Berkes et al. (2009b) also show that under the conditions of Theorem A.1.4 the infinite sum defining the covariance function in (A.1.34) is absolutely convergent.

A.2 Properties of Gaussian Processes

Let {W(t), t ≥ 0} be a Wiener process. The following lemma appeared first in Csörgő and Révész (1979) and a detailed proof is given in Csörgő and Révész (1981).

Theorem A.2.1 For any ε > 0 there exists a constant C = C(ε) > 0 such that the inequality

P\left\{ \sup_{0 \le t \le T}\; \sup_{0 \le s \le h} |W(t+s) - W(t)| \ge v h^{1/2} \right\} \le \frac{CT}{h} \exp\left( -\frac{v^2}{2+\varepsilon} \right)

holds for every positive v, T and 0 ≤ h ≤ T.


The next result is due to Garsia (1970) and Garsia et al. (1970) (cf. Lemma 4.1 in Csörgő and Horváth, 1993, p. 240).

Theorem A.2.2 There is a non-negative random variable ξ such that

|W(t) - W(s)| \le \xi \left( |t-s| \log(1/|t-s|) \right)^{1/2}

for all 0 ≤ t, s ≤ 1, and E|ξ|^p < ∞ for all p > 0.


Due to the representation of the Brownian bridge {B(t), 0 ≤ t ≤ 1} as B(t) = W(t) − tW(1), Theorems A.2.1 and A.2.2 remain true with the Wiener process replaced by the Brownian bridge.

Let {B(t), 0 ≤ t ≤ 1} be a Brownian bridge. Checking the covariances one can easily verify that

\left\{ \frac{B(t)}{[t(1-t)]^{1/2}},\; 0<t<1 \right\} \stackrel{\mathcal D}{=} \left\{ V\left( \frac{1}{2} \log \frac{t}{1-t} \right),\; 0<t<1 \right\}, \qquad (A.2.1)

where {V(t), t ≥ 0} is an Ornstein–Uhlenbeck process, i.e. a continuous Gaussian process with EV(t) = 0 and EV(t)V(s) = exp(−|t−s|). Let 0 < t_1 ≤ t_2 < 1,

r = \frac{(1-t_1)\, t_2}{t_1\, (1-t_2)}, \qquad (A.2.2)

a(x) = (2 \log x)^{1/2} \quad \text{and} \quad b(x) = 2 \log x + \frac{1}{2} \log\log x - \frac{1}{2} \log \pi.
The next theorem is usually called a Darling–Erdős type limit result. It was proven in Darling and Erdős (1956) and later generalized to more general stationary Gaussian sequences by Qualls and Watanabe (1972). For a survey of limit results for the maximum of stationary Gaussian processes we refer to Leadbetter et al. (1983) and Piterbarg (1996).

Theorem A.2.3 If 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then for all x ∈ R we have that

P\left\{ a\left( \tfrac{1}{2} \log r \right) \sup_{t_1 \le t \le t_2} \frac{|B(t)|}{[t(1-t)]^{1/2}} \le x + b\left( \tfrac{1}{2} \log r \right) \right\} \to \exp(-2e^{-x}),

where r is defined in (A.2.2).


Next we discuss the L^p version of Theorem A.2.3. The result is taken from Csörgő and Horváth (1993, p. 318), who derived it from the central limit theorem for L^p functionals of Ornstein–Uhlenbeck processes proved in Mandl (1968). Let, for p > 0,

a_*(p) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |xy|^p \left\{ \frac{1}{2\pi(1-\exp(-2|u|))^{1/2}} \exp\left( -\frac{x^2 + y^2 - 2\exp(-|u|)\,xy}{2(1-\exp(-2|u|))} \right) - \phi(x)\phi(y) \right\} dx\,dy\,du,

b_*(p) = \int_{-\infty}^{\infty} |x|^p \phi(x)\,dx,

where

\phi(x) = \frac{1}{(2\pi)^{1/2}} \exp\left( -\frac{1}{2} x^2 \right)

is the standard normal density function.

Theorem A.2.4 Let p > 0. If 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then we have

\left( \frac{1}{2 a_*(p) \log r} \right)^{1/2} \left\{ \int_{t_1}^{t_2} \frac{|B(t)|^p}{[t(1-t)]^{1+p/2}}\,dt - b_*(p) \log r \right\} \xrightarrow{\;\mathcal D\;} \mathcal N,

where \mathcal N denotes a standard normal random variable and r is defined in (A.2.2).


Next we consider the limits of functionals of heavily weighted Brownian bridge.
The limit of the maximum is given in terms of .a1 (ν) and .a2 (ν), which are
independent copies of .sup1≤t<∞ |W (t)|/t ν . Define

ν−1/2 ν−1/2
a = max(γ1
. a1 (ν), γ2 a2 (ν)),

where

\frac{r}{t_1} \to \gamma_1, \quad \frac{r}{1-t_2} \to \gamma_2, \quad \text{as } \min(t_1, 1-t_2) \to 0, \qquad (A.2.3)

and

r = \min(t_1,\, 1-t_2). \qquad (A.2.4)

Theorem A.2.5 If ν > 1/2, 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then we have

r^{\nu-1/2} \sup_{t_1 \le t \le t_2} \frac{|B(t)|}{[t(1-t)]^{\nu}} \xrightarrow{\;\mathcal D\;} a(\nu),

where r is defined in (A.2.4).


Proof We use the representation of the Brownian bridge in terms of a Wiener process {W(t), t ≥ 0}:

B(t) = \begin{cases} W(t) - tW(1), & \text{if } 0 \le t \le 1/2\\ -(W(1) - W(t)) + (1-t)W(1), & \text{if } 1/2 \le t \le 1. \end{cases} \qquad (A.2.5)

Also,

\left| \sup_{t_1 \le t \le 1/2} \frac{|B(t)|}{[t(1-t)]^{\nu}} - \sup_{t_1 \le t \le 1/2} \frac{|W(t)|}{[t(1-t)]^{\nu}} \right| \le |W(1)| \sup_{t_1 \le t \le 1/2} \frac{t}{[t(1-t)]^{\nu}} = O_P(t_1^{1-\nu}) \qquad (A.2.6)

and by the scale transformation of the Wiener process we have

\sup_{t_1 \le t \le 1/2} \frac{|W(t)|}{[t(1-t)]^{\nu}} \stackrel{\mathcal D}{=} t_1^{1/2-\nu} \sup_{1 \le s \le 1/(2t_1)} \frac{|W(s)|}{s^{\nu}(1-st_1)^{\nu}}. \qquad (A.2.7)

Applying the law of the iterated logarithm we get that

\sup_{1/(t_1\log(1/t_1)) \le s \le 1/(2t_1)} \frac{|W(s)|}{s^{\nu}(1-st_1)^{\nu}} \le 2^{\nu} \sup_{1/(t_1\log(1/t_1)) \le s \le 1/(2t_1)} \frac{|W(s)|}{s^{\nu}} \to 0 \quad \text{a.s.},

so by (A.2.6) and (A.2.7) we have

\left| t_1^{\nu-1/2} \sup_{t_1 \le t \le 1/2} \frac{|B(t)|}{[t(1-t)]^{\nu}} - t_1^{\nu-1/2} \sup_{t_1 \le t \le 1/\log(1/t_1)} \frac{|W(t)|}{t^{\nu}} \right| = o_P(1). \qquad (A.2.8)

Similarly to (A.2.8) we have

\left| (1-t_2)^{\nu-1/2} \sup_{1/2 \le t \le t_2} \frac{|B(t)|}{[t(1-t)]^{\nu}} - (1-t_2)^{\nu-1/2} \sup_{t_2 - 1/\log(1/(1-t_2)) \le t \le t_2} \frac{|W(1)-W(t)|}{(1-t)^{\nu}} \right| \qquad (A.2.9)

= o_P(1).

Since {W(t), 0 ≤ t ≤ 1/2} and {W(1) − W(t), 1/2 ≤ t ≤ 1} are independent, we have the independence of a_1(ν) and a_2(ν) in the definition of a(ν). Using once again the scale transformation of the Wiener process we have

t_1^{\nu-1/2} \sup_{t_1 \le t \le 1/\log(1/t_1)} \frac{|W(t)|}{t^{\nu}} \stackrel{\mathcal D}{=} \sup_{1 \le s \le 1/(t_1 \log(1/t_1))} \frac{|W(s)|}{s^{\nu}} \to \sup_{1 \le s < \infty} \frac{|W(s)|}{s^{\nu}} \quad \text{a.s.}

and

(1-t_2)^{\nu-1/2} \sup_{t_2 - 1/\log(1/(1-t_2)) \le t \le t_2} \frac{|W(1)-W(t)|}{(1-t)^{\nu}}
\stackrel{\mathcal D}{=} (1-t_2)^{\nu-1/2} \sup_{t_2 - 1/\log(1/(1-t_2)) \le t \le t_2} \frac{|W(1-t)|}{(1-t)^{\nu}}
= (1-t_2)^{\nu-1/2} \sup_{1-t_2 \le t \le 1/\log(1/(1-t_2))} \frac{|W(t)|}{t^{\nu}}
\stackrel{\mathcal D}{=} \sup_{1 \le t \le 1/[(1-t_2)\log(1/(1-t_2))]} \frac{|W(t)|}{t^{\nu}}
\to \sup_{1 \le t < \infty} \frac{|W(t)|}{t^{\nu}} \quad \text{a.s.}

Hence the proof of Theorem A.2.5 is complete. ⊔⊓



Now we consider the integral version of Theorem A.2.5. Let b_1(p,ν) and b_2(p,ν) be independent random variables such that

b_1(p,\nu) \stackrel{\mathcal D}{=} b_2(p,\nu) \stackrel{\mathcal D}{=} \int_{1}^{\infty} \frac{|W(t)|^p}{t^{\nu}}\,dt,

and define

b(p,\nu) = \gamma_1^{\nu-p/2-1}\, b_1(p,\nu) + \gamma_2^{\nu-p/2-1}\, b_2(p,\nu),

where γ_1 and γ_2 are defined in (A.2.3).



Theorem A.2.6 Let p ≥ 1. If ν > p/2 + 1, 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then we have

r^{\nu-p/2-1} \int_{t_1}^{t_2} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt \xrightarrow{\;\mathcal D\;} b(p,\nu),

where r is defined in (A.2.4).

Proof Using again (A.2.5), we write

\int_{t_1}^{t_2} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt = A_1 + \cdots + A_4,

where

A_1 = \int_{t_1}^{s_1} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt, \quad A_2 = \int_{s_1}^{1/2} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt,

A_3 = \int_{1/2}^{s_2} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt, \quad \text{and} \quad A_4 = \int_{s_2}^{t_2} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt,

with s_1 = t_1 \log(1/t_1) and s_2 = 1 − (1−t_2)\log(1/(1−t_2)). By the mean value theorem we have
\left| |W(t) - tW(1)|^p - |W(t)|^p \right| \le p\left( |W(t)-tW(1)|^{p-1} + |W(t)|^{p-1} \right) t|W(1)| \le p\, 2^p \left( |W(t)|^{p-1} + t^{p-1}|W(1)|^{p-1} \right) t|W(1)|,

and therefore

\left| A_1 - \int_{t_1}^{s_1} \frac{|W(t)|^p}{[t(1-t)]^{\nu}}\,dt \right| \le p\, 2^p \left\{ |W(1)| \int_{t_1}^{s_1} \frac{t|W(t)|^{p-1}}{[t(1-t)]^{\nu}}\,dt + |W(1)|^p \int_{t_1}^{s_1} \frac{t^p}{[t(1-t)]^{\nu}}\,dt \right\}
= O_P\left( \max\left( s_1^{3/2+p/2-\nu},\, t_1^{3/2+p/2-\nu},\, t_1^{p+1-\nu},\, s_1^{p+1-\nu} \right) \right),

since

E \int_{t_1}^{s_1} \frac{t|W(t)|^{p-1}}{[t(1-t)]^{\nu}}\,dt = E|W(1)|^{p-1} \int_{t_1}^{s_1} \frac{t^{1/2+p/2}}{[t(1-t)]^{\nu}}\,dt.

Thus we get

t_1^{\nu-p/2-1} \left| A_1 - \int_{t_1}^{s_1} \frac{|W(t)|^p}{[t(1-t)]^{\nu}}\,dt \right| = o_P(1). \qquad (A.2.10)

Also,

t_1^{\nu-p/2-1} \left| \int_{t_1}^{s_1} \frac{|W(t)|^p}{[t(1-t)]^{\nu}}\,dt - \int_{t_1}^{s_1} \frac{|W(t)|^p}{t^{\nu}}\,dt \right| \le \left| 1 - \frac{1}{(1-s_1)^{\nu}} \right| t_1^{\nu-p/2-1} \int_{t_1}^{s_1} \frac{|W(t)|^p}{t^{\nu}}\,dt = o_P(1). \qquad (A.2.11)

An elementary argument gives

t_1^{\nu-p/2-1}\, E \int_{s_1}^{1/2} \frac{|B(t)|^p}{[t(1-t)]^{\nu}}\,dt = E|W(1)|^p\; t_1^{\nu-p/2-1} \int_{s_1}^{1/2} \frac{[t(1-t)]^{p/2}}{[t(1-t)]^{\nu}}\,dt = o(1),

so by Markov's inequality

t_1^{\nu-p/2-1} A_2 = o_P(1). \qquad (A.2.12)

Putting together (A.2.10)–(A.2.12) we conclude

t_1^{\nu-p/2-1} \left| A_1 + A_2 - \int_{t_1}^{s_1} \frac{|W(t)|^p}{t^{\nu}}\,dt \right| = o_P(1). \qquad (A.2.13)

One can show similarly to the proof of (A.2.13) that

(1-t_2)^{\nu-p/2-1} \left| A_3 + A_4 - \int_{s_2}^{t_2} \frac{|W(1)-W(t)|^p}{(1-t)^{\nu}}\,dt \right| = o_P(1). \qquad (A.2.14)

Now the independence of {W(t), 0 ≤ t ≤ 1/2} and {W(1) − W(t), 1/2 ≤ t ≤ 1} implies the independence of b_1(p,ν) and b_2(p,ν). By the scale transformation of the Wiener process we have

\left\{ t_1^{\nu-p/2-1} \int_{t_1}^{s_1} \frac{|W(t)|^p}{t^{\nu}}\,dt,\; (1-t_2)^{\nu-p/2-1} \int_{s_2}^{t_2} \frac{|W(1)-W(t)|^p}{(1-t)^{\nu}}\,dt \right\} \qquad (A.2.15)

\stackrel{\mathcal D}{=} \left\{ t_1^{\nu-p/2-1} \int_{t_1}^{s_1} \frac{|W(t)|^p}{t^{\nu}}\,dt,\; (1-t_2)^{\nu-p/2-1} \int_{s_2}^{t_2} \frac{|W(1-t)|^p}{(1-t)^{\nu}}\,dt \right\}

\stackrel{\mathcal D}{=} \left\{ t_1^{\nu-p/2-1} \int_{t_1}^{s_1} \frac{|W(t)|^p}{t^{\nu}}\,dt,\; (1-t_2)^{\nu-p/2-1} \int_{1-t_2}^{1-s_2} \frac{|W^*(t)|^p}{t^{\nu}}\,dt \right\}

\stackrel{\mathcal D}{=} \left\{ \int_{1}^{s_1/t_1} \frac{|W(t)|^p}{t^{\nu}}\,dt,\; \int_{1}^{(1-s_2)/(1-t_2)} \frac{|W^*(t)|^p}{t^{\nu}}\,dt \right\}

\to (b_1(p,\nu),\, b_2(p,\nu)) \quad \text{a.s.},

where {W^*(t), t ≥ 1} is a standard Wiener process, independent of {W(t), t ≥ 1}. Theorem A.2.6 follows from (A.2.13)–(A.2.15). ⊔⊓

The next result extends the Darling–Erdős limit result of Theorem A.2.3 to χ² processes. Let

a(x) = (2 \log x)^{1/2} \quad \text{and} \quad b_d(x) = 2 \log x + \frac{d}{2} \log\log x - \log \Gamma(d/2),

where Γ(t) denotes the Gamma function.
where .┌(t) denotes the Gamma function.
Theorem A.2.7 If 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then for all x ∈ R we have that

P\left\{ a\left( \tfrac{1}{2}\log r \right) \sup_{t_1 \le t \le t_2} \left( \sum_{i=1}^{d} \frac{B_i^2(t)}{t(1-t)} \right)^{1/2} \le x + b_d\left( \tfrac{1}{2}\log r \right) \right\} \to \exp(-2e^{-x}),

where {B_1(t), 0 ≤ t ≤ 1}, ..., {B_d(t), 0 ≤ t ≤ 1} are independent Brownian bridges.
For a proof of Theorem A.2.7 see Horváth (1993), in which this distribution is
derived from Mandl (1968). More details can also be found in Csörgő and Horváth
(1997).
An L^p version of Theorem A.2.7 is proven in Csörgő and Horváth (1997), which we now outline. Let

a_*(p,d) = \int_{\mathbb R^{2d+1}} \left( \sum_{i=1}^{d} x_i^2 \right)^{p/2} \left( \sum_{i=1}^{d} y_i^2 \right)^{p/2} \Bigg\{ \prod_{i=1}^{d} (2\pi(1-\exp(-|u|)))^{-1/2} \exp\left( -\frac{x_i^2 + y_i^2 - 2\exp(-|u|/2)\,x_i y_i}{2(1-\exp(-|u|))} \right) - \prod_{i=1}^{d} \phi(x_i)\phi(y_i) \Bigg\} \prod_{i=1}^{d} dx_i\, dy_i\, du,

where

b_*(p,d) = 2^{p/2}\, \Gamma\!\left( \frac{p+d}{2} \right) \Big/ \Gamma\!\left( \frac{d}{2} \right),

and Γ(x) is the Gamma function.


Theorem A.2.8 Let p > 0. If 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then we have

\frac{1}{(a_*(p,d) \log r)^{1/2}} \left\{ \int_{t_1}^{t_2} \frac{1}{[t(1-t)]^{p/2+1}} \left( \sum_{i=1}^{d} B_i^2(t) \right)^{p/2} dt - b_*(p,d) \log r \right\} \xrightarrow{\;\mathcal D\;} \mathcal N,

where r is defined in (A.2.4) and \mathcal N is a standard normal random variable.


Next we discuss the generalization of Theorem A.2.5 to the multivariate setting.
Let .{B(t), 0 ≤ t ≤ 1} be a Gaussian process in .Rd with .EB(t) = 0 and
T
.EB(t)B (s) = (min(t, s) − ts)E for a positive definite covariance matrix .E.

Due to the covariance structure, .B(t) is called a Brownian bridge in .Rd with
covariance function .E. We say that .{W(t), t ≥ 1} is a Brownian motion with
values in .Rd with covariance .E, if it is a Gaussian process with .EW(t) = 0
and .EW(t)W(s) = min(t, s)E. Next we define the independent random variables
.ā1,E (d, ν) and .ā2,E (d, ν) with distribution

D D ||W(t)||
ā1,E (d, ν) = ā2,E (d, ν) = sup
. .
1≤t<∞ tν

Let
ν−1/2 ν−1/2
āE (d, ν) = max(γ1
. ā1,E (d, ν), γ2 ā2,E (d, ν)).

Similarly, .b̄1,E (p, ν), b̄2,E (p, ν) are independent and identically distributed,
/ ∞
D D ||W(t)||p
b̄1,E (p, ν) = b̄2,E (p, ν) =
. dt
1 tν

and
ν−p/2−1 ν−p/2−1
b̄E (p, ν) = γ1
. b̄1 (p, ν) + γ2 b̄2 (p, ν).

The constants .γ1 and .γ2 are defined in (1.2.28).



Theorem A.2.9 If ν > 1/2, 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then we have

r^{\nu-1/2} \sup_{t_1 \le t \le t_2} \frac{\|B(t)\|}{[t(1-t)]^{\nu}} \xrightarrow{\;\mathcal D\;} \bar a_{\Sigma}(d,\nu),

where r is defined in (A.2.4).


Theorem A.2.10 Let p ≥ 1. If ν > p/2 + 1, 0 < t_1 < t_2 < 1 and min(t_1, 1 − t_2) → 0, then we have

r^{\nu-p/2-1} \int_{t_1}^{t_2} \frac{\|B(t)\|^p}{[t(1-t)]^{\nu}}\,dt \xrightarrow{\;\mathcal D\;} \bar b_{\Sigma}(p,\nu),

where r is defined in (A.2.4).


Theorems A.2.9 and A.2.10 can be established along similar lines as Theo-
rem A.2.6.
Theorem A.2.11 We assume that {B_i(t), 0 ≤ t ≤ 1}, i ∈ {1, ..., M}, are independent Brownian bridges.

(i) If 0 ≤ κ < 1/2, then, as M → ∞,

M^{-1/2} \left( \sup_{0<t<1} \frac{1}{[t(1-t)]^{2\kappa}} \sum_{i=1}^{M} B_i^2(t) - M c_1 \right) \xrightarrow{\;\mathcal D\;} \mathcal N(0, c_2). \qquad (A.2.16)

(ii) If

\int_{0}^{1} \frac{t(1-t)}{w(t)}\,dt < \infty,

then

M^{-1/2} \left( \sum_{i=1}^{M} \int_{0}^{1} \frac{B_i^2(t)}{w(t)}\,dt - M c_3 \right) \xrightarrow{\;\mathcal D\;} \mathcal N(0, c_4), \quad (M \to \infty), \qquad (A.2.17)
A Appendix 523

where .N (0, c) is a normal random variable with .EN (0, c) = 0,


var(N (0, c)) = c,
.

⎛ ⎞2−4κ ⎛ ⎞1−8κ
1 1
.c1 = , c2 = ,
2 2
/ ⎛/ ⎞2
1 t (1 − t) 1 B 2 (t) − t (1 − t)
c3 =
. dt, c4 = E dt
0 w(t) 0 w(t)

and .{B(t), 0 ≤ t ≤ 1} is a Brownian bridge.


Proof To prove the first part, we claim that if I(w, c) < ∞ with some c > 0, then

\[
M^{-1/2} \sup_{0<t<1} \frac{1}{w^2(t)} \left| \sum_{i=1}^{M} (B_i^2(t) - t(1-t)) \right| \stackrel{D}{\to} \sup_{0<t<1} \frac{|\Gamma(t)|}{w^2(t)}, \tag{A.2.18}
\]

where Γ(t) is a Gaussian process with EΓ(t) = 0 and EΓ(t)Γ(s) = E[(B²(t) − t(1−t))(B²(s) − s(1−s))]. For every 0 < δ < 1/2,

\[
M^{-1/2} \frac{1}{w^2(t)} \sum_{i=1}^{M} (B_i^2(t) - t(1-t)) \;\stackrel{D[\delta,\, 1-\delta]}{\longrightarrow}\; \frac{\Gamma(t)}{w^2(t)}.
\]

Using that B_i(t) = W_i(t) − tW_i(1), where {W_i(t), t ≥ 0} are independent Wiener processes, we get

\[
\begin{aligned}
\left| \sum_{i=1}^{M} \frac{B_i^2(t) - t(1-t)}{w^2(t)} \right|
&= \left| \sum_{i=1}^{M} \frac{W_i^2(t) - 2tW_i(t)W_i(1) + t^2 W_i^2(1) - t(1-t)}{w^2(t)} \right| \\
&\le \left| \sum_{i=1}^{M} \frac{W_i^2(t) - t}{w^2(t)} \right|
+ 2\left| \sum_{i=1}^{M} \frac{tW_i(t)W_i(1) - t^2}{w^2(t)} \right|
+ \left| \sum_{i=1}^{M} (W_i^2(1) - 1) \right| \frac{t^2}{w^2(t)}.
\end{aligned}
\]

By the central limit theorem

\[
M^{-1/2} \left| \sum_{i=1}^{M} (W_i^2(1) - 1) \right| = O_P(1)
\]

and since t^{1/2}/w(t) → 0, as t → 0, we get

\[
\lim_{\delta\to 0} \limsup_{M\to\infty} P\left\{ \sup_{0<t\le\delta} M^{-1/2}\left| \sum_{i=1}^{M} (W_i^2(1) - 1) \right| \frac{t^2}{w^2(t)} > x \right\} = 0
\]

for all x > 0. It follows from elementary calculation that

\[
\begin{aligned}
E|tW_i(t)W_i(1) - sW_i(s)W_i(1)|^4
&\le (E|W_i(1)|^8)^{1/2} \left( E\bigl( |t-s|\,|W_i(t)| + |W_i(t) - W_i(s)| \bigr)^8 \right)^{1/2} \\
&\le c_5 |t-s|^2,
\end{aligned}
\]

since E|W_i(t) − W_i(s)|⁸ = EN⁸(0, 1)|t − s|⁴. Hence M^{-1/2} Σ_{i=1}^{M} (W_i(t)W_i(1) − t) is tight. Since the finite dimensional distributions are normal, we get

\[
\sup_{0<t<1} \left| M^{-1/2} \sum_{i=1}^{M} (W_i(t)W_i(1) - t) \right| = O_P(1).
\]

Thus we have

\[
\lim_{\delta\to 0} \limsup_{M\to\infty} P\left\{ \sup_{0<t\le\delta} \frac{t}{w^2(t)} \left| M^{-1/2} \sum_{i=1}^{M} (W_i(t)W_i(1) - t) \right| > x \right\} = 0
\]

for all x > 0. Finally, using the Hájek–Rényi inequality for martingales (see Hall and Heyde, 1980)

\[
\lim_{\delta\to 0} \limsup_{M\to\infty} P\left\{ \sup_{0<t\le\delta} M^{-1/2} \left| \sum_{i=1}^{M} \frac{W_i^2(t) - t}{w^2(t)} \right| > x \right\} = 0
\]

for all x > 0. Similar arguments give

\[
\lim_{\delta\to 0} \limsup_{M\to\infty} P\left\{ \sup_{1-\delta\le t<1} M^{-1/2} \left| \frac{1}{w^2(t)} \sum_{i=1}^{M} (B_i^2(t) - t(1-t)) \right| > x \right\} = 0
\]

for all x > 0, completing the proof of (A.2.18). By (A.2.18) we have

\[
\lim_{M\to\infty} P\left\{ \sum_{i=1}^{M} \frac{B_i^2(1/2)}{w^2(1/2)} \le \sup_{0<t<1} \sum_{i=1}^{M} \frac{B_i^2(t)}{w^2(t)} \le \sup_{|t-1/2|\le\delta} \sum_{i=1}^{M} \frac{B_i^2(t)}{w^2(t)} \right\} = 1
\]

for all δ > 0, since t(1−t)/w²(t) = [t(1−t)]^{1−2κ} reaches its largest value at t = 1/2. We showed the uniform continuity above, and therefore

\[
\lim_{\delta\to 0} \limsup_{M\to\infty} P\left\{ \sup_{|t-1/2|\le\delta} M^{-1/2} \left| \sum_{i=1}^{M} \frac{B_i^2(t) - t(1-t)}{w^2(t)} - \sum_{i=1}^{M} \frac{B_i^2(1/2) - 1/4}{w^2(1/2)} \right| > x \right\} = 0
\]

for all x > 0. This means that we need only show that

\[
M^{-1/2} \sum_{i=1}^{M} \frac{B_i^2(1/2) - 1/4}{w^2(1/2)} \stackrel{D}{\to} N(0, c_2),
\]

which is a consequence of the central limit theorem.


According to the central limit theorem, the second part of the theorem is proven if we show

\[
E\left( \int_0^1 \frac{B^2(t)}{w(t)}\, dt \right)^2 < \infty.
\]

Due to symmetry, we need to establish only

\[
E\left( \int_0^{1/2} \frac{B^2(t)}{w(t)}\, dt \right)^2 < \infty.
\]

Since B(t) = W(t) − tW(1), where {W(t), t ≥ 0} is a Wiener process, we have

\[
B^2(t) \le 4W^2(t) + 4t^2 W^2(1).
\]

We have by assumption that

\[
E\left( \int_0^{1/2} \frac{t^2 W^2(1)}{w(t)}\, dt \right)^2 = E[W^4(1)] \left( \int_0^{1/2} \frac{t^2}{w(t)}\, dt \right)^2 < \infty.
\]

If t ≤ s, then

\[
\begin{aligned}
EW^2(t)W^2(s) &= E\bigl[ W^2(t)\bigl( (W(s) - W(t))^2 + 2(W(s) - W(t))W(t) + W^2(t) \bigr) \bigr] \\
&= t(s-t) + 3t^2 \\
&\le 3st,
\end{aligned}
\]

and therefore

\[
E\left( \int_0^{1/2} \frac{W^2(t)}{w(t)}\, dt \right)^2 = 2\int_0^{1/2}\!\!\int_0^{s} \frac{EW^2(t)W^2(s)}{w(t)w(s)}\, dt\, ds \le 6\left( \int_0^{1/2} \frac{t}{w(t)}\, dt \right)^2 < \infty. \qquad □
\]
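Part (ii) is simple to check numerically. With the flat weight w(t) ≡ 1, our choice for illustration, the constants are explicit: c₃ = ∫₀¹ t(1−t) dt = 1/6, and c₄ = E(∫₀¹(B²(t) − t(1−t)) dt)² = 1/45, the variance of the Cramér–von Mises limit variable. The sketch below, which is ours and not from the book, compares the sample variance of the normalized statistic in (A.2.17) with c₄.

```python
import numpy as np

rng = np.random.default_rng(3)
M, n, reps = 200, 1000, 500
t = np.arange(1, n) / n
c3 = 1.0 / 6.0

stats = np.empty(reps)
for k in range(reps):
    z = rng.standard_normal((M, n)) / np.sqrt(n)
    w = np.cumsum(z, axis=1)
    bb = w[:, :-1] - np.outer(w[:, -1], t)   # M independent Brownian bridges
    integrals = (bb ** 2).mean(axis=1)       # Riemann sums for \int_0^1 B_i^2(t) dt
    stats[k] = (integrals.sum() - M * c3) / np.sqrt(M)

print("sample variance:", stats.var(), " c4 = 1/45 =", 1.0 / 45.0)
```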


A.3 Functional Observations

We summarize here some important results used in this book related to change point analysis of functional data. For a thorough treatment of functional data analysis we refer to Horváth and Kokoszka (2012) and Kokoszka and Reimherr (2017). For the sake of notational simplicity, we only state the results for functional data that are stochastic processes taking values in the space L²([0, 1], ℝ) = L² of real-valued square integrable functions defined on the unit interval, but they hold for variables taking values in general separable Hilbert spaces. We use ∫ for ∫₀¹ and

\[
\|f\|_2 = \left( \int f^2(t)\, dt \right)^{1/2}
\]

for the norm in L². We generalize the concept of Bernoulli shifts and decomposability to random functions.
Definition A.3.1 We say {E_i(t), i ∈ ℤ, t ∈ [0, 1]} is L^ν-decomposable if E_i(t) = g(η_i, η_{i−1}, …)(t) for some (deterministic) measurable function g : S^∞ → L², where {η_j, j ∈ ℤ} are independent and identically distributed random variables with values in a measurable space S, and E_i(t) = E_i(t, ω) is jointly measurable in (t, ω), for each i ∈ ℤ. Further EE_i(t) = 0 for all t ∈ [0, 1], E‖E_i‖₂^ν < ∞ with some ν > 2, and

\[
\left( E\| E_i - E_{i,m}^{*} \|_2^{\nu} \right)^{1/\nu} \le c m^{-\alpha} \quad\text{with some } \alpha > 2, \tag{A.3.1}
\]

where E*_{i,l} = g(η_i, …, η_{i−l+1}, η*_{i−l}, η*_{i−l−1}, …), and {η*_i, i ∈ ℤ} are independent copies of η₀, independent of {η_l, l ∈ ℤ}.
Horváth and Kokoszka (2012) and Hörmann and Kokoszka (2010) provide several examples of processes satisfying Definition A.3.1. Their examples include the functional autoregressive process of Bosq (2000), linear processes, bilinear processes (functional random coefficient model), the functional ARCH of Hörmann et al. (2013), and the functional GARCH of Aue et al. (2017).
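As a concrete illustration, the following toy generator, which is entirely our own (the kernel, its scaling, and the error curves are arbitrary choices), simulates a functional AR(1) process X_i = Ψ(X_{i−1}) + ε_i on a grid. When the Hilbert–Schmidt norm of the kernel is below one, this process fits Definition A.3.1.

```python
import numpy as np

rng = np.random.default_rng(4)
n_grid, N = 101, 500
t = np.linspace(0, 1, n_grid)

# integral kernel psi(t, s); rescaled so its Hilbert-Schmidt norm is 1/2 < 1
psi = np.exp(-np.abs(t[:, None] - t[None, :]))
psi *= 0.5 / np.sqrt((psi ** 2).mean())

def error_curve():
    # a crude smooth noise curve: a short random cosine expansion
    k = np.arange(1, 6)
    return (rng.standard_normal(5) / k) @ np.cos(np.outer(k, np.pi * t))

X = np.zeros((N, n_grid))
for i in range(1, N):
    # Psi(X_{i-1})(t) = \int psi(t, s) X_{i-1}(s) ds, via a Riemann sum
    X[i] = (psi * X[i - 1][None, :]).mean(axis=1) + error_curve()
```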

Theorem A.3.1 If {E_i(t), i ∈ ℤ, t ∈ [0, 1]} is L^ν-decomposable, then

\[
E\max_{1\le k\le N} \left\| \sum_{i=1}^{k} E_i \right\|_2^{\nu} \le cN^{\nu/2}
\quad\text{and}\quad
E\max_{1\le k\le N} \left\| \sum_{i=k}^{N} E_i \right\|_2^{\nu} \le cN^{\nu/2} \tag{A.3.2}
\]

with some c > 0.


A proof of Theorem A.3.1 was obtained by Berkes et al. (2013).

Remark A.3.1 Berkes et al. (2013) obtained upper bounds for c in (A.3.2). If g of Definition A.3.1 depends on N such that E‖E_{0,N}‖₂^ν → 0 and the constant in (A.3.1) does not depend on N, then c = c_N → 0 in Theorem A.3.1.

The following theorem was also proven in Berkes et al. (2013).
Theorem A.3.2 If {E_i(t), i ∈ ℤ, t ∈ [0, 1]} is L^ν-decomposable, then we can define a sequence of Gaussian processes {Γ_N(u, t), 0 ≤ u, t ≤ 1} such that

\[
\sup_{0\le u\le 1} \int \left( N^{-1/2} \sum_{i=1}^{\lfloor Nu\rfloor} E_i(t) - \Gamma_N(u, t) \right)^2 dt = o_P(1),
\]

EΓ_N(u, t) = 0, EΓ_N(u, t)Γ_N(v, s) = min(u, v)D(t, s) and

\[
D(t, s) = \sum_{l=-\infty}^{\infty} E E_0(t) E_l(s).
\]

Remark A.3.2 In the process of proving Theorem A.3.2 Berkes et al. (2013)
establish that if .{Ei (t), i ∈ Z, t ∈ [0, 1]} is .Lν -decomposable then the infinite
sum defining .D(t, s) is absolutely convergent in .L2 ([0, 1]2 , R).
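In applications the kernel D(t, s) is unknown and must be estimated from the observed curves. A minimal lag-window sketch is given below; it is our simplification (the function name, the Bartlett weights and the fixed bandwidth h are our choices, not the book's recipe) for curves recorded on a common grid.

```python
import numpy as np

def long_run_covariance(X, h):
    """Bartlett-window estimate of D(t, s) from an (N x n_grid) array of curves."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)                 # center the curves
    D = (Xc.T @ Xc) / N                     # lag-0 autocovariance kernel
    for lag in range(1, h + 1):
        C = (Xc[lag:].T @ Xc[:-lag]) / N    # lag-l autocovariance kernel
        D += (1.0 - lag / (h + 1.0)) * (C + C.T)   # Bartlett weights
    return D
```

Data-driven bandwidth choices for estimators of this type are studied in Horváth et al. (2016) and Rice and Shang (2017).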
Corollary A.3.1 If {E_i(t), i ∈ ℤ, t ∈ [0, 1]} is L^ν-decomposable, then we can define two sequences of Gaussian processes {Γ_{N,1}(u, t), 0 ≤ u, t ≤ 1} and {Γ_{N,2}(u, t), 0 ≤ u, t ≤ 1} such that

\[
\sup_{0\le u\le 1/2} \int \left( N^{-1/2} \sum_{i=1}^{\lfloor Nu\rfloor} E_i(t) - \Gamma_{N,1}(u, t) \right)^2 dt = o_P(1),
\]

\[
\sup_{0\le u\le 1/2} \int \left( N^{-1/2} \sum_{i=N-\lfloor Nu\rfloor}^{N} E_i(t) - \Gamma_{N,2}(u, t) \right)^2 dt = o_P(1),
\]

for each N the processes {Γ_{N,1}(u, t), 0 ≤ u ≤ 1/2, 0 ≤ t ≤ 1} and {Γ_{N,2}(u, t), 0 ≤ u ≤ 1/2, 0 ≤ t ≤ 1} are independent, EΓ_{N,1}(u, t) = EΓ_{N,2}(u, t) = 0, EΓ_{N,1}(u, t)Γ_{N,1}(v, s) = min(u, v)D(t, s) and EΓ_{N,2}(u, t)Γ_{N,2}(v, s) = min(u, v)D(t, s).
The computation of functionals of stochastic processes with sample paths in L² frequently makes use of the Karhunen–Loève expansion.
Theorem A.3.3 (Karhunen–Loève Expansion) If the process {X(t), 0 ≤ t ≤ 1} satisfies

\[
EX(t) = 0 \quad\text{and}\quad \int_0^1 EX^2(t)\, dt < \infty,
\]

then

\[
X(t) = \sum_{i=1}^{\infty} \lambda_i^{1/2} \zeta_i \phi_i(t),
\]

where

\[
E\zeta_i = 0, \qquad
E\zeta_i \zeta_j = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{if } i \ne j, \end{cases} \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge 0,
\]

\[
\lambda_i \phi_i(t) = \int C(t, s)\phi_i(s)\, ds, \quad i \ge 1,
\]

and

\[
\int \phi_i(t)\phi_j(t)\, dt = \begin{cases} 1, & \text{if } i = j,\\ 0, & \text{if } i \ne j, \end{cases}
\]

where C(t, s) = EX(t)X(s).


For a proof see Theorem 1.5 of Bosq (2000). We note that if {X(t), 0 ≤ t ≤ 1} is a Gaussian process, then {ζ_i, i ≥ 1} may be taken to be independent standard normal random variables.
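On a grid the expansion is computed from an eigendecomposition of the discretized sample covariance. The sketch below is our own, not the book's: with n_grid points, 1/n_grid plays the role of dt, eigenvectors are rescaled to be orthonormal in L², and the scores are Riemann sums of the inner products ⟨X_i, φ_j⟩.

```python
import numpy as np

def karhunen_loeve(X):
    """Empirical eigenvalues, eigenfunctions and scores from (N x n_grid) curves."""
    n_grid = X.shape[1]
    Xc = X - X.mean(axis=0)
    C = (Xc.T @ Xc) / X.shape[0] / n_grid          # discretized covariance operator
    lam, phi = np.linalg.eigh(C)
    lam = lam[::-1]                                # eigenvalues, decreasing
    phi = phi[:, ::-1] * np.sqrt(n_grid)           # L2([0,1])-orthonormal columns
    scores = Xc @ phi / n_grid                     # <X_i, phi_j> via Riemann sums
    return lam, phi, scores
```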
The consistency of estimators for the eigenvalues of covariance operators is based
on the following result.
Theorem A.3.4 If {D⁽¹⁾(t, s), 0 ≤ t, s ≤ 1} and {D⁽²⁾(t, s), 0 ≤ t, s ≤ 1} are square integrable kernel (covariance) functions (symmetric, non-negative definite) with respective eigenvalues λ₁⁽¹⁾ ≥ λ₂⁽¹⁾ ≥ λ₃⁽¹⁾ ≥ … and λ₁⁽²⁾ ≥ λ₂⁽²⁾ ≥ λ₃⁽²⁾ ≥ …, and corresponding orthonormal eigenfunctions {φ_i⁽¹⁾, i ∈ ℕ} and {φ_i⁽²⁾, i ∈ ℕ}, then

\[
|\lambda_i^{(1)} - \lambda_i^{(2)}| \le \left\| D^{(1)}(t, s) - D^{(2)}(t, s) \right\|_2,
\]

and if λ_{i+1}⁽²⁾ < λ_i⁽²⁾ < λ_{i−1}⁽²⁾ (with the convention λ₀⁽²⁾ = ∞), then

\[
\left\| \phi_i^{(1)} - \phi_i^{(2)} \right\|_2 \le a_i \left\| D^{(1)}(t, s) - D^{(2)}(t, s) \right\|_2,
\]

where, if i ≥ 2, a_i = 2√2 max{(λ_{i−1}⁽²⁾ − λ_i⁽²⁾)⁻¹, (λ_i⁽²⁾ − λ_{i+1}⁽²⁾)⁻¹}, and a₁ = 2√2 (λ₁⁽²⁾ − λ₂⁽²⁾)⁻¹.
For a proof see Lemma 2.2 in Horváth and Kokoszka (2012) and Lemma 4.3 of
Bosq (2000). For further results when some of the eigenvalues are repeated we refer
to Reimherr (2015) and Petrovich and Reimherr (2017).
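Theorem A.3.4 is easy to illustrate numerically; the small check below, which is ours, discretizes two kernels (the Brownian bridge kernel and a perturbation of it by a scaled Ornstein–Uhlenbeck kernel) and verifies that the eigenvalue gaps are dominated by the L² distance of the kernels.

```python
import numpy as np

n = 200
t = np.linspace(0, 1, n)
D1 = np.minimum.outer(t, t) - np.outer(t, t)                # Brownian bridge kernel
D2 = D1 + 0.05 * np.exp(-np.abs(t[:, None] - t[None, :]))   # perturbed kernel

lam1 = np.sort(np.linalg.eigvalsh(D1 / n))[::-1]   # operator eigenvalues, decreasing
lam2 = np.sort(np.linalg.eigvalsh(D2 / n))[::-1]
l2_dist = np.sqrt(((D1 - D2) ** 2).mean())         # ||D1 - D2||_2 on the grid
print(np.abs(lam1 - lam2).max() <= l2_dist)        # prints True
```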
Bibliography

K.M. Abadir, J.R. Magnus, Matrix Algebra, vol. 1 (Cambridge University Press, 2005)
G.P. Aielli, Dynamic conditional correlation: On properties and estimation. J. Bus. Econ. Stat. 31,
282–299 (2013)
F. Akashi, H. Dette, Y. Liu, Change-point detection in autoregressive models with no moment
assumptions. J. Time Ser. Anal. 39(5), 763–786 (2018)
J. Albin, On extremes and streams of upcrossings. Stoch. Process. Appl. 94, 271–300 (2001)
J. Albin, D. Jarušková, On a test statistic for linear trend. Extremes 6, 247–258 (2003)
J. Andél, Autoregressive series with random parameters. Math. Oper. Stat. 7, 735–741 (1976)
D. Andrews, Heteroskedasticity and autocorrelation consistent covariance matrix estimation.
Econometrica 59, 817–858 (1991)
D. Andrews, Tests for parameter instability and structural change with unknown change point.
Econometrica 61, 821–856 (1993)
D. Andrews, J. Monahan. An improved heteroskedasticity and autocorrelation consistent
covariance matrix estimator. Econometrica 60, 953–966 (1992)
J. Antoch, M. Hušková, Asymptotics, Nonparametrics, and Time Series: Estimators of Changes
(CRC Press, Boca Raton, 1999)
J. Antoch, M. Hušková, N. Veraverbeke, Change–point problem and bootstrap. J. Nonparam. Stat.
5, 123–144 (1995)
J. Antoch, M. Hušková, Z. Prášková, Effect of dependence on statistics for determination of
change. J. Stat. Plan. Inference 60, 291–310 (1997)
J. Antoch, J. Hanousek, L. Horváth, M. Hušková, S. Wang, Structural breaks in panel data: Large
number of panels and short length time series. Econom. Rev. 38(7), 828–855 (2019)
S. Arlot, A. Celisse, Z. Harchaoui, A kernel multiple change-point algorithm via model selection.
J. Mach. Learn. Res. 20(162), 1–56 (2019)
P. Aschersleben, M. Wagner, cointReg: Parameter Estimation and Inference in a Cointegrating
Regression (2016). R package version 0.2.0
S. Astill, D.I. Harvey, S.J. Leybourne, A.M.R. Taylor, Y. Zu, Cusum-based monitoring for
explosive episodes in financial data in the presence of time-varying volatility. J. Financ.
Econom. 21, 187–227 (2023)
J. Aston, C. Kirch, Detecting and estimating epidemic changes in dependent functional data. J.
Multivariate Anal. 109, 204–220 (2012)
J.A.D. Aston, C. Kirch, High dimensional efficiency with applications to change point tests.
Electron. J. Stat. 12(1), 1901–1947 (2018)
A. Aue, Strong approximation for RCA(1) time series with applications. Stat. Probab. Lett. 68,
369–382 (2004)


A. Aue, L. Horváth, Delay time in sequential detection of change. Stat. Probab. Lett. 67(3),
221–231 (2004). ISSN 0167-7152
A. Aue, L. Horváth, J. Steinebach, Estimation in random coefficient autoregressive models. J.
Time Series Anal. 27, 61–76 (2006)
A. Aue, L. Horváth, M. Hušková, P. Kokoszka, Testing for changes in polynomial regression.
Bernoulli 14, 637–660 (2008)
A. Aue, S. Hörmann, L. Horváth, M. Reimherr, Break detection in the covariance structure of
multivariate time series models. Ann. Stat. 37, 4046–4087 (2009a)
A. Aue, L. Horváth, M. Hušková, Extreme value theory for stochastic integrals of Legendre
polynomials. J. Multivariate Anal. 100, 1029–1043 (2009b)
A. Aue, L. Horváth, M. Hušková, Segmenting mean-nonstationary time series via trending
regressions. J. Econom. 168, 367–381 (2012)
A. Aue, S. Hörmann, L. Horváth, M. Hušková, Dependent functional linear models with
applications to monitoring structural change. Stat. Sin. 24, 1043–1073 (2014)
A. Aue, L. Horváth, D. Pellatt, Functional generalized autoregressive conditional heteroscedastic-
ity. J. Time Ser. Anal. 38, 3–21 (2017)
A. Aue, G. Rice, O. Sönmez, Detecting and dating structural breaks in functional data without
dimension reduction. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 80, 509–529 (2018)
A. Aue, G. Rice, O. Sönmez, Structural break analysis for spectrum and trace of covariance
operators. Environmetrics 31(1), e2617 (2020)
F. Avalos, Do oil prices drive food prices? the tale of a structural break. J. Int. Money Finance 42,
253–271 (2014)
I. Axt, R. Fried, On variance estimation under shifts in the mean. AStA Adv. Stat. Anal. 104,
417–457 (2020)
J. Bai, Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 15(5), 453–472
(1994)
J. Bai, Least absolute deviation estimation of a shift. Econom. Theory 11, 403–436 (1995)
J. Bai, Estimation of a change point in multiple regression models. Rev. Econ. Stat. 79, 551–563
(1997)
J. Bai, Likelihood ratio tests for multiple structural changes. J. Econom. 91, 299–323 (1999)
J. Bai, Panel data models with interactive fixed effects. Econometrica 77, 1229–1279 (2009)
J. Bai, Common breaks in means and variances for panel data. J. Econom. 157, 78–92 (2010)
J. Bai, S. Ng, Determining the number of factors in approximate factor models. Econometrica 70,
191–221 (2002)
J. Bai, P. Perron, Estimating and testing linear models with multiple structural changes.
Econometrica 66, 47–78 (1998)
J. Bai, P. Perron, Computation and analysis of multiple structural change models. J. Appl. Econom.
18, 1–22 (2003)
P. Bai, A. Safikhani, G. Michailidis, Multiple change points detection in low rank and sparse high
dimensional vector autoregressive models. IEEE Trans. Signal Process. 68, 3074–3089 (2020)
B.H. Baltagi, Econometric Analysis of Panel Data, 6th edn. (Springer, New York, 2021)
M. Barassi, L. Horváth, Y. Zhao, Change point detection in time varying correlation structure. J.
Bus. Econ. Stat. 38, 340–349 (2020)
J.-M. Bardet, W. Kengne, Monitoring procedure for parameter change in causal time series. J.
Multivariate Anal. 125, 204–221 (2014)
M. Barigozzi, H. Cho, P. Fryzlewicz, Simultaneous multiple change-point and factor analysis for
high-dimensional time series. J. Econom. 206(1), 187–225 (2018)
D. Bauer, Estimating linear dynamical systems using subspace methods. Econom. Theory 21,
181–211 (2005)
I. Berkes, L. Horváth, Approximations for the maximum of stochastic processes with drift.
Kybernetika 39, 299–306 (2003a)
I. Berkes, L. Horváth, The rate of consistency of the quasi-maximum likelihood estimator. Stat.
Probab. Lett. 61, 133–143 (2003b)

I. Berkes, L. Horváth, P.S. Kokoszka, GARCH processes: structure and estimation. Bernoulli 9,
201–227 (2003)
I. Berkes, E. Gombay, L. Horváth, P. Kokoszka, Sequential change-point detection in GARCH(p, q) models. Econom. Theory 20, 1140–1167 (2004)
I. Berkes, E. Gombay, L. Horváth, Testing for changes in the covariance structure of linear
processes. J. Stat. Plan. Inference 139, 2044–2063 (2009a)
I. Berkes, S. Hörmann, J. Schauer, Asymptotic results for the empirical process of stationary
sequences. Stoch. Process. Appl. 119, 1298–1324 (2009b)
I. Berkes, L. Horváth, S. Ling, Estimation in nonstationary random coefficient autoregressive
models. J. Time Series Anal. 30, 395–416 (2009c)
I. Berkes, S. Hörmann, J. Schauer, Split invariance principles for stationary processes. Ann.
Probab. 39, 2441–2473 (2011)
I. Berkes, L. Horváth, G. Rice, Weak invariance principles for sums of dependent random
functions. Stoch. Process. Appl. 123, 385–403 (2013)
I. Berkes, W. Liu, W. Wu, Komlós-major-tusnády approximation under dependence. Ann. Probab.
42, 794–817 (2014)
I. Berkes, L. Horváth, G. Rice, On the asymptotic normality of kernel estimators of the long run
covariance of functional time series. J. Multivariate Anal. 144, 150–175 (2016)
A. Betken, Testing for change-points in long-range dependent time series by means of a self-
normalized Wilcoxon test. J. Time Ser. Anal. 37(6), 785–809 (2016)
P. Billingsley, Convergence of Probability Measures (Wiley, New York, 1968)
C. Bin, H. Yongmiao, Detecting for smooth structural changes in GARCH models. Econom. Theory 32(3), 740–791 (2016)
N.H. Bingham, C.M. Goldie, J.L. Teugels, Regular Variation. Encyclopedia of Mathematics and
its Applications (Cambridge University Press, 1987)
J.R. Blum, J. Kiefer, M. Rosenblatt, Distribution free tests of independence based on the sample
distribution function. Ann. Math. Stat. 32(2), 485–498 (1961)
O. Boldea, A. Cornea-Madeira, A. Hall, Bootstrapping structural change tests. J. Econom. 213,
359–397 (2019)
T. Bollerslev, Modelling the coherence in short run nominal exchange rates: A multivariate
generalized ARCH model. Rev. Econ. Stat. 72, 498–505 (1990). Reprinted in ARCH: Selected
Readings (ed. R. F. Engle), Oxford University Press (1995)
D. Bosq, Linear Processes in Function Spaces (Springer, New York, 2000)
D. Bosq, D. Blanke, Inference and Prediction in Large Dimensions (Wiley, 2007)
F. Boussama, F. Fuchs, R. Stelzer, Stationarity and geometric ergodicity of bekk multivariate garch
models. Stoch. Process. Appl. 121, 2331–2360 (2011)
R.C. Bradley, Introduction to Strong Mixing Conditions, vols. 1,2,3 (Kendrick Press, 2007)
L. Breiman, Probability. Classics in Applied Mathematics (Society for Industrial and Applied
Mathematics, 1968)
P.J. Brockwell, R.A. Davis, Time Series: Theory and Applications, 2nd edn. (Springer, 2006)
P. Brohan, J.J. Kennedy, I. Harris, S.F.B. Tett, P.D. Jones, Uncertainty estimates in regional and
global observed temperature changes: A new data set from 1850. J. Geophys. Res. 111, D12106
(2006)
B. Bucchia, M. Wendler, Change–point detection and bootstrap for Hilbert space valued random
fields. J. Multivariate Anal. 155, 344–368 (2017)
A. Bücher, I. Kojadinovic, T. Rohmer, J. Segers, Detecting changes in cross-sectional dependence
in multivariate time series. J. Multivariate Anal. 132, 111–128 (2014)
A. Bücher, J.-D. Fermanian, I. Kojadinovic, Combining cumulative sum change-point detection
tests for assessing the stationarity of univariate time series. J. Time Ser. Anal. 40(1), 124–150
(2019)
F. Busetti, A. Harvey, Testing for the presence of a random walk in series with structural breaks.
J. Time Ser. Anal. 22, 127–150 (2001)
F. Busetti, A. Harvey, Further comments on stationarity tests in series with structural breaks at
unknown points. J. Time Ser. Anal. 24, 137–140 (2003)

F. Busetti, A. Taylor, Test of stationarity against a change in persistence. J. Econom. 123, 33–66
(2004)
M.M. Carhart, On persistence in mutual fund performance. J. Finance 52(1), 57–82 (1997)
M. Carrasco, X. Chen, Mixing and moment properties of various GARCH and stochastic volatility
models. Econom. Theory 18, 17–39 (2002)
G. Cavaliere, A. Taylor, Testing for a change in persistence in the presence of non–stationary
volatility. J. Econom. 147, 84–98 (2008)
G. Cavaliere, D. Harvey, S. Leybourne, A. Taylor, Testing for unit roots in the presence of a
possible break in trend and nonstationarity volatility. Econom. Theory 27, 957–991 (2011)
C. Cerovecki, C. Francq, S. Hörmann, J. Zakoían, Functional GARCH models: the quasi-likelihood
approach and its applications. J. Econom. 209, 353–375 (2019)
S. Chakar, E. Lebarbier, C. Lévy-Leduc, S. Robin, A robust approach for estimating change-points
in the mean of an AR(1) process. Bernoulli 23(2), 1408–1447 (2017)
J. Chan, L. Horváth, M. Hušková, Darling–erdös limit results for change–point detection in panel
data. J. Stat. Plan. Inference 143, 955–970 (2013)
H. Chen, Sequential change-point detection based on nearest neighbors. Ann. Stat. 47(3), 1381–
1407 (2019)
K. Chen, A. Cohen, H. Sackrowitz, Consistent multiple testing for change points. J. Multivariate
Anal. 102, 1339–1343 (2011)
L. Chen, W. Wang, W.B. Wu, Inference of breakpoints in high-dimensional time series. J. Am.
Stat. Assoc. (2021)
S. Chenouri, A. Mozaffari, G. Rice, Robust multivariate change point analysis based on data depth.
Canad. J. Stat. 48(3), 417–446 (2020)
J.-M. Chiou, Y.-T. Chen, T. Hsing, Identifying multiple changes for a functional data sequence
with application to freeway traffic segmentation. Ann. Appl. Stat. 13(3), 1430–1463 (2019)
H. Cho, Change-point detection in panel data via double CUSUM statistic. Electron. J. Stat. 10(2),
2000–2038 (2016)
H. Cho, P. Fryzlewicz, Multiple–change–point detection for high dimensional time series via
sparsified binary segmentation. J. R. Stat. Soc. Ser. B 77, 475–507 (2015)
H. Cho, C. Kirch, Data segmentation algorithms: Univariate mean change and beyond. Econom.
Stat. (2021)
T.T.L. Chong, Structural change in AR(1) models. Econom. Theory 17, 87–155 (2001)
Y.S. Chow, A.C. Hsiung, Limiting behavior of max_{j≤n} S_j j^{−d} and the first passage times in a random walk with positive drift. Bull. Inst. Math. Acad. Sin. 4, 35–44 (1976)
C.-S.J. Chu, M. Stinchcombe, H. White, Monitoring structural change. Econometrica 64(5), 1045–
65 (1996)
S.A. Churchill, J. Inekwe, K. Ivanovski, R. Smyth, The environmental Kuznets curve in the OECD:
1870–2014. Energy Econ. 75, 389–399 (2018)
G. Ciuperca, A general criterion to determine the number of change–points. Stat. Probab.Lett. 81,
1267–1275 (2011)
G. Claeskens, N.L. Hjort, Model Selection and Model Averaging (Cambridge University Press,
Leiden, 2008)
F. Comte, O. Lieberman, Asymptotic theory for multivariate GARCH processes. J. Multivariate
Anal. 84, 61–84 (2003)
C.M. Crainiceanu, T.J. Vogelsang, Nonmonotonic power for tests of a mean shift in a time series.
J. Stat. Comput. Simul. 77(6), 457–476 (2007)
M. Csörgő, Some Rényi type limit theorems for empirical distribution functions. Ann. Math. Stat.
36, 322–326 (1965)
M. Csörgő, L. Horváth, Rényi–type empirical processes. J. Multivariate Anal. 41, 338–358 (1992)
M. Csörgő, L. Horváth, Weighted Approximations in Probability and Statistics (Wiley, New York,
1993)
M. Csörgő, L. Horváth, Limit Theorems in Change–Point Analysis (Wiley, New York, 1997)
M. Csörgő, P. Révész, How big are the increments of a Wiener process. Ann. Probab. 7, 731–737
(1979)

M. Csörgő, P. Révész, Strong Approximations in Probability and Statistics (Academic Press, New
York, 1981)
M. Csörgő, S. Csörgő, L. Horváth, D. Mason, Weighted empirical and quantile processes. Ann.
Probab. 14, 31–85 (1986)
M. Csörgő, L. Horváth, Q. Shao, Convergence of integrals of uniform empirical and quantile
processes. Stoch. Process. Appl. 45, 283–294 (1993)
V. Dalla, L. Giraitis, P.M. Robinson, Asymptotic theory for time series with changing mean and
variance. J. Econom. 219(2), 281–313 (2020)
D.A. Darling, P. Erdős, A limit theorem for the maximum of normalized sum of independent
random variables. Duke Math. J. 23, 143–155 (1956)
R.A. Davis, C.Y. Yau, Consistency of minimum description length model selection for piecewise
stationary time series models. Electron. J. Stat. 7, 381–411 (2013)
R.A. Davis, D. Huang, Y.-C. Yao, Testing for a change in the parameter values and order of an
autoregressive model. Ann. Stat. 23, 282–304 (1995)
R.A. Davis, T.C.M. Lee, G.A. Rodriguez-Yam, Structural break estimation for nonstationary time
series models. J. Am. Stat. Assoc. 101, 223–239 (2006)
R.A. Davis, T.C.M Lee, G.A. Rodriguez-Yam, Break detection for a class of nonlinear time series
models. J. Time Ser. Anal. 29, 834–867 (2008)
Y. Davydov, R. Zitikis, On weak convergence of random fields. Ann. Inst. Stat. Math. 60, 345–365
(2008)
J. Dedecker, P. Doukhan, G. Lang, J.R. León, S. Louhichi, C. Prieur, Weak Dependence: With
Examples and Applications (Springer, 2007)
H. Dehling, T. Mikosch, M. Sørensen, Empirical Process Techniques for Dependent Data
(Birkhäuser, 2002)
H. Dehling, O. Durieu, D. Volný, New techniques for empirical processes of dependent data. Stoch.
Process. Appl. 119(10), 3699–3718 (2009)
H. Dehling, K. Vuk, M. Wendler, Change-Point Detection Under Dependence Based on Two-
Sample U-Statistics (Springer New York, New York, 2015), pp. 195–220
H. Dehling, K. Vuk, M. Wendler, Change-point detection based on weighted two-sample U-
statistics. Electron. J. Stat. 16(1), 862–891 (2022)
A. Deng, P. Perron, A non-local perspective on the power properties of the cusum and cusum of
squares tests for structural change. J. Econom. 142, 212–240 (2008)
H. Dette, T. Kutta, Detecting structural breaks in eigensystems of functional time series. Electron.
J. Stat. 15(1), 944–983 (2021)
H. Dette, W. Wu, Z. Zhou, Change point analysis of correlation in non-stationary time series. Stat.
Sin. 29(2), 611–643 (2019)
H. Dette, T. Eckle, M. Vetter, Multiscale change point detection for dependent data. Scand. J. Stat.
47(4), 1243–1274 (2020a)
H. Dette, K. Kokot, S. Volgushev, Testing relevant hypotheses in functional time series via self-
normalization. J. R. Stat. Soc. Ser. B 82(3), 629–660 (2020b)
Y. Dong, J. Spielmann, Weak limits of random coefficient autoregressive processes and their
application in ruin theory. Insurance Math. Econ. 91, 1–11 (2020)
M. Donsker, An invariance principle for certain probability limit theorems. Mem. Am. Math. Soc.
6 (1951)
M. Donsker, Justification and extension of Doob’s heuristic approach to the Kolmogorov-Smirnov
theorems. Ann. Math. Stat. 23, 277–281 (1952)
J. Duan, Augmented GARCH(p,q) process and its diffusion limit. J. Econom. 79, 97–127 (1997)
R.M. Dudley, Uniform Central Limit Theorems (Cambridge University Press, Cambridge, 1999)
L. Dümbgen, The asymptotic behavior of some nonparametric change–point estimators. Ann. Stat.
19, 1471–1495 (1991)
S. Edwards, Change of monetary regime, contracts, and prices: Lessons from the great depression,
1932–1935. J. Int. Money Finance 108, 102190 (2020)
R.F. Engle, K.F. Kroner, Multivariate simultaneous generalized arch. Econom. Theory 11, 122–
150 (1995)

R.F. Engle, V.K. Ng, M. Rothschild, Asset pricing with a factor-arch covariance structure:
Empirical estimates for treasury bills. J. Econom. 45, 213–237 (1990)
T. Erhardsson, Conditions for convergence of random coefficient ar(1) processes and perpetuities
in higher dimensions. Ann. Stat. 20, 990–1005 (2014)
E.F. Fama, K.R. French, Common risk factors in the returns on stocks and bonds. J. Financ. Econ.
33(1), 3–56 (1993). ISSN 0304-405X
P. Fearnhead, G. Rigaill, Changepoint detection in the presence of outliers. J. Am. Stat. Assoc.
114(525), 169–183 (2019)
Q. Feng, C. Kao, Large-dimensional Panel Data Econometrics (World Scientific, 2021)
D. Ferger, Change–point estimators in case of small disorders. J. Stat. Plan. Inference 40, 33–49
(1994)
J.-D. Fermanian, H. Malongo, On the stationarity of dynamical conditional correlation models.
Econom. Theory 33, 636–663 (2017)
F. Ferraty, P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice (Springer, New
York, 2006)
K.J. Forbes, R. Rigobon, No contagion, only interdependence: measuring stock market comove-
ments. J. Finance 57, 2223–2261 (2002)
C. Francq, J.-M. Zakoian, Maximum likelihood estimation of pure GARCH and ARMA-GARCH
processes. Bernoulli 10, 605–637 (2004)
C. Francq, J-M. Zakoian, GARCH Models: Structure, Statistical Inference and Financial
Applications (Wiley, 2010)
C. Francq, J.-M. Zakoian, Comment on “Quasi–maximum likelihood estimation of GARCH
models with heavy tailed likelihoods" by J. Fan, L. Qi and D. Xiu. J. Bus. Econ. Stat. 32,
198–201 (2014)
C. Francq, L. Horváth, J.-M. Zakoian, Variance targeting estimation of multivariate GARCH
models. J. Financ. Econom. 14, 353–382 (2016)
J. Franke, C. Kirch, J. Kamgaing, Changepoints in time series of counts. J. Time Ser. Anal. 33,
757 (2012)
K. Frick, A. Munk, H. Sieling, Multiscale change point inference (with discussion). J. R. Stat.
Soc. Ser. B 76, 495–580 (2014)
P. Fryzlewicz, Wild binary segmentation for multiple change point detection. Ann. Stat. 42, 2243–
2281 (2014)
P. Fryzlewicz, S. Subba Rao, Multiple-change-point detection for auto-regressive conditional
heteroscedastic processes. J. R. Stat. Soc. Ser. B 76, 903–924 (2014)
P. Galeano, D. Pena, Covariance changes detection in multivariate time series. J. Stat. Plan.
Inference 137, 194–211 (2007)
C. Gallagher, R. Lund, R. Killick, X. Shi, Autocovariance estimation in the presence of
changepoints. J. Korean Stat. Soc. 51, 107–433 (2022)
A.M. Garsia, Continuity properties of Gaussian processes with multidimensional time parameter,
in Proceedings of the 6th Berkeley Symp. Math. Stat. Probab., vol. 2 (University of California
Press, 1970), pp. 369–374
A.M. Garsia, E. Rodemich, H. Rumsey, A real variable lemma and the continuity of paths of some
Gaussian processes. Indiana Univ. Math. J. 20, 565–578 (1970)
C. Gerstenberger, Robust Wilcoxon-type estimation of change-point location under short-range
dependence. J. Time Ser. Anal. 39, 90–104 (2018)
E. Ghysels, A. Guay, A. Hall, Predictive tests for structural change with unknown breakpoint. J.
Econom. 82, 209–233 (1997)
E. Gombay, Change detection in autoregressive time series. J. Multivariate Anal. 99(3), 451–464
(2008)
E. Gombay, L. Horváth, An application of the maximum likelihood test to the change–point
problem. Stoch. Process. Appl. 50, 161–171 (1994)

S.A. Good, G.K. Corlett, J.J. Remedios, E.J. Noyes, D.T. Llewellyn-Jones, The global trend in sea
surface temperature from 20 years of advanced very high resolution radiometer data. J. Climate
20(7), 1255–1264 (2007)
T. Górecki, L. Horváth, P. Kokoszka, Change point detection in heteroscedastic time series.
Econom. Stat. 20, 86–117 (2017)
T. Górecki, L. Horváth, P. Kokoszka, Change point detection in heteroscedastic time series.
Econom. Stat. 7, 63–88 (2018). ISSN 2452-3062
J. Gösmann, T. Kley, H. Dette, A new approach for open-end sequential change point monitoring.
J. Time Ser. Anal. 42(1), 63–84 (2021)
I. Grabovsky, L. Horváth, M. Hušková, Limit theorems for kernel-type estimators for the time of
change. J. Stat. Plan. Inference 89(1), 25–56 (2000)
U. Grenander, M. Rosenblatt, Statistical Analysis of Stationary Time Series (Wiley, New York,
1957)
G. Grossman, A. Krueger, Environmental impacts of a North American free trade agreement.
National Bureau of Economics Research (1991). Working Paper# 3914. Issue Date November
1991
A. Guay, E. Guerre, A data-driven specification test for dynamic regression model. Econom.
Theory 22, 543–586 (2006)
C.M. Hafner, Alternative assets and cryptocurrencies. J. Risk Financ. Manag. 13(1), 7 (2020)
C.M. Hafner, A. Preminger, On asymptotic theory for multivariate GARCH models. J. Multivariate
Anal. 100, 2044–2054 (2009)
P. Hall, C.C. Heyde, Martingale Limit Theory and its Application (Academic Press, 1980)
P. Hall, M. Hosseini-Nasab, On properties of functional principal components. J. R. Stat. Soc. Ser.
B 68, 109–126 (2006)
P. Hall, Q. Yao, Inference in arch and GARCH models with heavy-tailed errors. Econometrica 71,
285–317 (2003)
A.R. Hall, S. Han, O. Boldea, Inference regarding multiple structural changes in linear models
with endogenous regressors. J. Econom. 170, 281–302 (2012)
A.R. Hall, D.R. Osborne, N. Sakkas, Structural break inference using information criteria in
models estimated by two-stage least squares. J. Time Ser. Anal. 36, 741–762 (2015)
L.P. Hansen, Large sample properties of generalized method of moments estimators. Econometrica
50, 1029–1054 (1982)
B.E. Hansen, Tests for parameter instability in regression with i(1) processes. J. Bus. Econ. Stat.
10, 321–335 (1992)
B.E. Hansen, Approximate asymptotic p values for structural change tests. J. Bus. Econ. Stat. 15,
60–67 (1997)
B.E. Hansen, Testing for structural change in conditional models. J. Econom. 97, 93–115 (2000)
T. Harris, B. Li, J.D. Tucker, Scalable multiple changepoint detection for functional data
sequences. Environmetrics 33, e2710 (2021)
D.I. Harvey, S.J. Leybourne, A.M.R. Taylor, Modified tests for a change in persistence. J. Econom.
134, 441–469 (2006)
E. Hewitt, K. Stromberg, Real and Abstract Analysis (Springer, Berlin, 1969)
E. Hillebrand, Neglecting parameter changes in GARCH models. J. Econom. 129, 121–138 (2005)
Z. Hlávka, M. Hušková, C. Kirch, S. Meintanis, Monitoring changes in the error distribution of
autoregressive models based on Fourier methods. Test 21, 605–634 (2012)
Z. Hlávka, M. Hušková, S.G. Meintanis. Change Point Detection with Multivariate Observations
Based on Characteristic Functions (Springer International Publishing, 2017), pp. 273–290
Y. Hoga, Monitoring multivariate time series. J. Multivariate Anal. 155, 105–121 (2017)
Y. Hoga, A structural break test for extremal dependence in β-mixing random vectors. Biometrika
105(3), 627–643 (2018a)
Y. Hoga, Detecting tail risk differences in multivariate time series: Detecting tail risk differences.
J. Time Ser. Anal. 39, 665–689 (2018b)
M. Holmes, I. Kojadinovic, J. Quessy, Nonparametric tests for change-point detection á la Gombay
and Horváth. J. Multivariate Anal. 115, 16–32 (2013)

U. Homm, J. Breitung, Testing for speculative bubbles in stock markets: a comparison of alternative methods. J. Financ. Econom. 10, 198–231 (2012)
S. Hörmann, Augmented GARCH sequences: Dependence structure and asymptotics. Bernoulli
14, 543–561 (2008)
S. Hörmann, P. Kokoszka, Weakly dependent functional data. Ann. Stat. 38, 1845–1884 (2010)
S. Hörmann, L. Horváth, R. Reeder, A functional version of the ARCH model. Econom. Theory
29(2), 267–288 (2013)
R.A. Horn, C.R. Johnson, Topics in Matrix Analysis (Cambridge University Press, 1991)
L. Horváth, Strong approximation of renewal processes. Stoch. Process. Appl. 18, 127–138 (1984)
L. Horváth, The maximum likelihood method for testing changes in the parameters of normal
observations. Ann. Stat. 21, 671–680 (1993)
L. Horváth, Detecting changes in linear regression models. Statistics 26, 189–208 (1995)
L. Horváth, M. Hušková, Change-point detection in panel data. J. Time Ser. Anal. 33, 631–648
(2012)
L. Horváth, P. Kokoszka, Inference for Functional Data with Applications (Springer, New York,
2012)
L. Horváth, F. Liese, Lp-estimators in ARCH models. J. Stat. Plan. Inference 119, 277–310 (2003)
L. Horváth, G. Rice, Extensions of some classical methods in change point analysis (with
discussions). Test 23, 219–290 (2014)
L. Horváth, G. Rice, Limit results for Lp functionals of weighted CUSUM processes, in Trends in
Mathematical, Information and Data Sciences: A Tribute to Leandro Pardo (2022), pp. 51–62
L. Horváth, Q.-M. Shao, Limit theorems for union-intersection tests. J. Stat. Plan. Inference
44, 113–148 (1995)
L. Horváth, J. Steinebach, Testing for changes in the mean and variance of a stochastic process
under weak invariance. J. Stat. Plan. Inference 91, 365–376 (2000)
L. Horváth, L. Trapani, Statistical inference in a random coefficient panel model. J. Econom. 193,
54–75 (2016)
L. Horváth, L. Trapani, Changepoint detection in heteroscedastic random coefficient autoregressive
models. J. Bus. Econ. Stat. 41(4), 1300–1314 (2023)
L. Horváth, P. Kokoszka, A. Zhang, Monitoring constancy of variance in conditionally het-
eroskedastic time series. Econom. Theory 22(3), 373–402 (2006)
L. Horváth, P. Kokoszka, J. Steinebach, On sequential detection of parameter changes in linear
regression. Stat. Probab. Lett. 77, 885–895 (2007)
L. Horváth, Z. Horváth, M. Hušková, Ratio tests for change point detection, in Beyond
Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen,
IMS Collections (IMS, 2008), pp. 293–304
L. Horváth, P. Kokoszka, R. Reeder, Estimation of the mean of functional time series and a two
sample problem. J. R. Stat. Soc. Ser. B 75(1), 103–122 (2013)
L. Horváth, P. Kokoszka, G. Rice, Testing stationarity of functional time series. J. Econom. 179(1),
66–82 (2014)
L. Horváth, G. Rice, S. Whipple, Adaptive bandwidth selection in the long run covariance estimator
of functional time series. Comput. Stat. Data Anal. 100, 676–693 (2016)
L. Horváth, M. Hušková, G. Rice, J. Wang, Asymptotic properties of the CUSUM estimator for
the time of change in linear panel data models. Econom. Theory 33(2), 366–412 (2017a)
L. Horváth, W. Pouliot, S. Wang, Detecting at-most-m-changes in linear regression models. J.
Time Ser. Anal. 38, 552–590 (2017b)
L. Horváth, C. Miller, G. Rice, A new class of change point test statistics of Rényi type. J. Bus.
Econ. Stat. 38(3), 570–579 (2020)
L. Horváth, Z. Liu, S. Lu, Sequential monitoring of changes in dynamic linear models, applied to
U.S. housing market. Econom. Theory 38, 209–272 (2021)
L. Horváth, Z. Liu, G. Rice, Y. Zhao, Detecting common breaks in the means of high dimensional
cross-dependent panels. Econom. J. 25(2), 362–383 (2022)
L. Horváth, G. Rice, Y. Zhao, Change point analysis of covariance functions: A weighted
cumulative sum approach. J. Multivariate Anal. 189, 104877 (2022)

L. Horváth, G. Rice, Y. Zhao, Testing for changes in linear models using weighted residuals J.
Multivariate Anal. 198, 105210 (2023)
L. Horváth, L. Trapani, J. VanderDoes, The maximally selected likelihood ratio test in random
coefficient models. Econom. J. (2024), Forthcoming
T. Hsing, R. Eubank, Theoretical Foundations of Functional Data Analysis, with an Introduction
to Linear Operators (Wiley, New York, 2015)
M. Hušková, Estimation of a change in linear models. Stat. Probab. Lett. 26, 13–24 (1996)
M. Hušková, C. Kirch, Bootstrapping confidence intervals for the change-point of time series. J.
Time Ser. Anal. 29, 947–972 (2008)
M. Hušková, C. Kirch, A note on studentized confidence intervals for the change-point. Comput.
Stat. 25, 269–289 (2010)
M. Hušková, C. Kirch, Bootstrapping sequential change-point tests for linear regression. Metrika
75(5), 673–708 (2012)
M. Huśková, S.G. Meintanis, Change-point analysis based on empirical characteristic functions of
ranks. Seq. Anal. 25(4), 421–436 (2006a)
M. Huśková, S.G. Meintanis, Change point analysis based on empirical characteristic functions.
Metrika 63, 145–168 (2006b)
R.J. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice, 3rd edn. (OTexts,
Melbourne, 2021), OTexts.com/fpp3
I.A. Ibragimov, Some limit theorems for stationary processes. Theory Probab. Appl. 7, 349–382
(1962)
I.A. Ibragimov, Y.V. Linnik, Independent and Stationary Sequences of Random Variables (Wolters-
Nordhoff, The Netherlands, 1971)
C. Inclán, G.C. Tiao, Use of cummulative sums of squares for retrospective detection of change of
variance. J. Am. Stat. Assoc. 89, 913–923 (1994)
H. Janečková, Z. Prášková, CWLS and ML estimates in a heteroscedastic RCA(1) model. Stat.
Decis. 22, 245–259 (2004)
D. Jarušková, Asymptotic behaviour of a test statistic for detection of change in mean of vectors.
J. Stat. Plan. Inference 140, 616–625 (2010)
D. Jarušková, Testing for a change in covariance operator. J. Stat. Plan. Inference 143(9), 1500–
1511 (2013). ISSN 0378-3758
D. Jarušková, J. Antoch, Changepoint analysis of Klementinum temperature series. Environmetrics
31(1), e2570 (2020)
T. Jeantheau, Strong consistency of estimators for multivariate arch models. Econom. Theory 14,
70–86 (1998)
S.T. Jensen, A. Rahbek, Asymptotic inference for nonstationary GARCH. Econom. Theory 20,
1203–1226 (2004)
F. Jiang, Z. Zhao, X. Shao, Time series analysis of covid-19 infection curve: A change-point
perspective. J. Econom. 232(1), 1–17 (2023)
S. Jiao, R.D. Frostig, H. Ombao, Break point detection for functional covariance. Scand. J. Stat.
50(2), 477–512 (2023)
M. Jirák, Uniform change point tests in high dimension. Ann. Stat. 43, 2451–2483 (2015)
P.D. Jones, Hemispheric surface air temperature variations: A re–analysis and an update to 1993.
J. Climate 7, 1794–1802 (1994)
P.D. Jones, A. Moberg, Hemispheric and large-scale surface air temperature variations: An
extensive revision and an update to 2001. J. Climate 16, 206–223 (2003)
J. Ju, J.Y. Lin, Q. Liu, K. Shi, Structural changes and the real exchange rate dynamics. J. Int.
Money Finance 107, 102192 (2020)
J. Kang, S. Lee, Parameter change test for random coefficient integer-valued autoregressive
processes with application to polio data analysis. J. Time Series Anal. 30(2), 239–258 (2009)
H. Kawakatsu, Matrix exponential GARCH. J. Econom. 134, 95–128 (2006)
J. Kiefer, K-sample analogues of the Kolmogorov-Smirnov and Cramer-V. Mises tests. Ann. Math.
Stat. 30, 420–447 (1959)

R. Killick, I. Eckley, Changepoint: An R package for changepoint analysis. J. Stat. Softw. 58(3),
1–19 (2014)
R. Killick, P. Fearnhead, I. Eckley, Optimal detection of changepoints with a linear computational
cost. J. Am. Stat. Assoc. 107, 1590–1598 (2012)
J. Kim, Detection of change in persistence of a linear time series. J. Econom. 95, 97–116 (2000)
C. Kirch, Block permutation principles for the change analysis of dependent data. J. Stat. Plan.
Inference 137(7), 2453–2474 (2007)
C. Kirch, Bootstrapping sequential change-point tests. Seq. Anal. 27(3), 330–349 (2008)
C. Kirch, J.T. Kamgaing, Testing for parameter stability in nonlinear autoregressive models. J.
Time Ser. Anal. 33, 365–385 (2012)
C. Kirch, P. Klein, Moving sum data segmentation for stochastics processes based on invariance.
Stat. Sin. 33, 873–892 (2021)
C. Kirch, B. Muhsal, H. Ombao, Detection of changes in multivariate time series with application
to EEG DAT. J. Am. Stat. Assoc. 110, 1197–1216 (2015)
P. Kokoszka, M. Reimherr, Asymptotic normality of the principal components of functional time
series. Stoch. Process. Appl. 123, 1546–1562 (2013)
P. Kokoszka, M. Reimherr, Introduction to Functional Data Analysis (Chapman and Hall/CRC,
Boca Raton, 2017)
P. Kokoszka, G. Rice, H.L. Shang, Inference for the autocovariance of a functional time series
under conditional heteroscedasticity. J. Multivariate Anal. 162, 32–50 (2017)
J. Komlós, P. Major, G. Tusnády, An approximation of partial sums of independent R.V.’s and the
sample DF.I. Z. Wahrsch. Verwand. Gebiete 32, 111–131 (1975)
J. Komlós, P. Major, G. Tusnády, An approximation of partial sums of independent R.V.’s and the
sample DF.II. Z. Wahrsch. Verw. Geb. 34, 33–58 (1976)
A.J. Koning, V. Protasov, Tail behaviour of gaussian processes with applications to the Brownian
pillow. J. Multivariate Anal. 87(2), 370–397 (2003)
K.K. Korkas, P. Fryzlewicz, Multiple change-point detection for non-stationary time series using
wild binary segmentation. Stat. Sin. 27, 287–311 (2017)
A.P. Korostelev, On minimax estimation of a discontinuous signal. Theory Probab. Appl. 32(4),
727–730 (1988)
S. Kovács, H. Li, P. Bühlmann, A. Munk, Seeded binary segmentation: A general methodology for
fast and optimal changepoint detection. Biometrika 110(1), 249–256 (2023)
W. Krämer, W. Ploberger, R. Alt, Testing for structural change in dynamic models. Econometrica
56, 1355–1370 (1988)
S. Küchnert, Functional arch and GARCH models: A Yule–Walker approach. Electron. J. Stat. 14,
4321–4360 (2020)
R.J. Kulperger, On the residuals of autoregressive processes and polynomial regression. Stoch.
Process. Appl. 21, 107–118 (1985)
E. Kurozumi, Confidence sets for the date of a structural change at the end of a sample. J. Time
Ser. Anal. 39, 850–862 (2018)
E. Kurozumi, P. Tuvaandorj, Model selection criteria in multivariate models with multiple
structural changes. J. Econom. 164(2), 218–238 (2011)
S. Kuznets, Economic growth and income inequality. Am. Econ. Rev. 45, 1–28 (1955)
D. Kwiatkowski, P.C.B. Phillips, P. Schmidt, Y. Shin, Testing the null hypothesis of stationarity
against the alternative of a unit root: How sure are we that economic time series have a unit
root? J. Econom. 54(1), 159–178 (1992)
T.L. Lai, H. Xing, Sequential change-point detection when the pre- and post-change parameters
are unknown. Seq. Anal. 29(2), 162–175 (2010)
W. Lai, M.D. Johnson, R. Kucherlapati, P. Park, Comparative analysis of algorithms for identifying
amplifications and deletions in array CGH data. Bioinformatics 21(19), 3763–3770 (2005)
M. Lavielle, Detection of multiple changes in a sequence of dependent variables. Stoch. Process.
Appl. 83, 79–102 (1999)
M. Lavielle, E. Moulines, Least-squares estimation of an unknown number of shifts in time series.
J. Time Ser. Anal. 21, 33–59 (2000)

M. Lavielle, G. Teyssiére, Detection of multiple change-points in multivariate time series. Lith. Math. J. 46, 287–306 (2006)
S.P. Lawrence, D.T. Llewellyn-Jones, S.J. Smith, The measurement of climate change using data
from the Advanced Very High Resolution and Along Track Scanning Radiometers. J. Geophys.
Res. (Oceans) 109, C08017 (2004)
M.R. Leadbetter, G. Lindgren, H. Rootzen, Extremes and Related Properties of Random Sequences
and Processes (Springer, New York, 1983)
C. Lee, Estimating the number of change points in a sequence of independent normal random
variables. Stat. Probab. Lett. 25, 241–248 (1995)
S. Lee, S. Park, The CUSUM of squares test for scale changes in infinite order moving average
processes. Scand. J. Stat. 28, 625–644 (2001)
F. Leisch, C. Kleiber, K. Hornik, Monitoring structural changes with the generalized fluctuation
test. Econom. Theory 16, 835–854 (2000)
S. Leybourne, R. Taylor, T.-H. Kim, CUSUM of squares-based tests for a change in persistence.
J. Time Ser. Anal. 28, 408–433 (2006)
X. Li, S. Ghosal, Bayesian change point detection for functional data. J. Stat. Plan. Inference 213,
193–205 (2021). 0378-3758
W.K. Li, S. Ling, M. McAleer, A survey of recent theoretical results for time series models with
GARCH errors. J. Econ. Surv. 16, 245–269 (2002)
C.F.J. Lin, T. Teräsvirta, Testing the constancy of regression parameters against continuous
structural change. J. Econom. 62, 211–228 (1994)
S. Ling, On the stationarity and the existence of moments of conditional heteroskedastic ARMA
models. Stat. Sin. 9, 1119–1130 (1999)
S. Ling, Testing for change points in time series models and limiting theorems for NED sequences.
Ann. Stat. 35, 1213–1237 (2007)
S. Ling, Estimation of change points in linear and non-linear time series models. Econom. Theory
32(2), 402–430 (2016)
S. Ling, W.K. Li, Limiting distributions of maximum likelihood estimators for unstable ARMA
models with GARCH errors. Ann. Stat. 26, 84–125 (1998)
S. Ling, M. McAleer, Asymptotic theory for a vector ARMA–GARCH model. Econom. Theory
19, 280–310 (2003a)
S. Ling, M. McAleer, On adaptative estimation in nonstationary ARMA models with GARCH
errors. Ann. Stat. 31, 642–674 (2003b)
W. Liu, W.B. Wu, Asymptotics of spectral density estimates. Econom. Theory 26, 1218–1245
(2010)
H. Liu, C. Gao, R.J. Samworth, Minimax rates in sparse, high-dimensional change point detection.
Ann. Stat. 49(2), 1081–1112 (2021)
P. Mandl, Analytical Treatment of One-Dimensional Markov Processes (Springer, New York,
1968)
J. Marcinkiewicz, A. Zygmund, Sur les fonctions indépendants. Fundam. Math. 29, 60–90 (1937)
D.S. Matteson, N.A. James, A nonparametric approach for multiple change point analysis of
multivariate data. J. Am. Stat. Assoc. 109(505), 334–345 (2014)
B.P.M. McCabe, A multiple decision theory analysis of structural stability in regression. Econom.
Theory 4, 499–508 (1988)
B.P.M. McCabe, M.J. Harrison, Testing the constancy of regression relationships over time using
least squares residuals. J. R. Stat. Soc. Ser. C 29, 142–148 (1980)
M.W. McCracken, S. Ng, Fred-md: A monthly database for macroeconomic research. J. Bus.
Econ. Stat. 34, 574–589 (2016)
K.J. Mitchener, G. Pina, Pegxit pressure. J. Int. Money Finance 107, 102191 (2020)
F.A. Móricz, R.J. Serfling, W.F. Stout, Moment and probability bounds with quasi-superadditive
structure for the maximum partial sums. Ann. Probab. 10, 1032–1040 (1982)
G.V. Moustakides, Optimal stopping times for detecting changes in distributions. Ann. Stat. 14(4),
1379–1387 (1986)

N. Neumeyer, I. Van Keilegom, Changepoint tests for the error distribution in nonparametric
regression. Scand. J. Stat. 36, 518–541 (2009)
W.K. Newey, K.D. West, A simple, positive semi-definite, heteroskedasticity and autocorrelation
consistent covariance matrix. Econometrica 55, 703–708 (1987)
D.F. Nicholls, B.G. Quinn, Random Coefficient Autoregressive Models: An Introduction (Springer,
New York, 1982)
J. Nyblom, Testing for the constancy of parameters over time. J. Am. Stat. Assoc. 84, 223–230
(1989)
C.M.M. Padilla, D. Wang, Z. Zhao, Y. Yu, Change-point detection for sparse and dense functional
data in general dimensions, in Advances in Neural Information Processing Systems (2022)
E.S. Page, Continuous inspection schemes. Biometrika 41, 100–115 (1954)
E.S. Page, A test for a change in a parameter occuring at an unknown point. Biometrika 42,
523–527 (1955)
J. Pan, J. Chen, Application of modified information criterion to multiple change point problems.
J. Multivariate Anal. 97, 2221–2241 (2006)
V. Panaretos, S. Tavakoli, Fourier analysis of stationary time series in function space. Ann. Stat.
41(2), 568–603 (2013)
T. Pang, D. Zhang, T.T.L. Chong, Asymptotic inferences for an AR(1) model with a change point:
stationary and non-stationary cases. J. Time Ser. Anal. 35, 133–150 (2014)
T. Pang, T. Tai-Leung Chong, D. Zhang, Non-identification of structural change in non-stationary AR(1) models. Econom. Theory 34, 985–1017 (2018)
K. Pape, P. Galeano, D. Wied, Sequential detection of parameter changes in dynamic conditional
correlation models. Appl. Stoch. Models Bus. Ind. 37, 475–495 (2021)
E. Parzen, On consistent estimates of the spectrum of stationary time series. Ann. Math. Stat. 28,
329–348 (1957)
R.S. Pedersen, A. Rahbek, Multivariate variance targeting in the BEKK-GARCH model. Econom.
J. 17, 24–55 (2014)
L. Peng, Q. Yao, Least absolute deviations estimation for ARCH and GARCH models. Biometrika
90, 967–975 (2003)
P. Perron, Y. Yamamoto, J. Zhou, Testing jointly for structural changes in the error variance and
coefficients of a linear regression model. Quant. Econ. 11, 1019–1057 (2020)
M. Pešta, M. Wendler, Nuisance-parameter-free changepoint detection in non-stationary series.
Test 29, 379–408 (2020)
V.V. Petrov, Limit Theorems of Probability Theory (Oxford University Press, Oxford, UK, 1995)
J. Petrovich, M. Reimherr, Asymptotic properties of principal component projections with repeated
eigenvalues. Stat. Probab. Lett. 130, 42–48 (2017)
D.T. Pham, The mixing property of bilinear and generalised random coefficient autoregressive
models. Stoch. Process. Appl. 23, 291–300 (1986)
P.C.B. Phillips, S-P. Shi, Financial bubble implosion and reverse regression. Econom. Theory
34(4) (2018)
P.C.B. Phillips, V. Solo, Asymptotics for linear processes. Ann. Stat. 20, 971–1001 (1992)
P.C.B. Phillips, J. Yu, Dating the timeline of financial bubbles during the subprime crisis. Quant.
Econ. 2(3), 455–491 (2011)
P.C.B. Phillips, S. Shi, J. Yu, Specification sensitivity in right-tailed unit root testing for explosive
behaviour. Oxford Bull. Econ. Stat. 76(3), 315–333 (2014)
P.C.B. Phillips, S. Shi, J. Yu, Testing for multiple bubbles: Historical episodes of exuberance and
collapse in the s&p 500. Int. Econ. Rev. 56(4), 1043–1078 (2015)
V.I. Piterbarg, Asymptotic Methods in the Theory of Gaussian Processes and Fields, volume 148
of Memoirs of the American Mathematical Society (American Mathematical Society, 1996)
D.N. Politis, J.P. Romano, Bias-corrected nonparametric spectral estimation. J. Time Ser. Anal.
16, 67–103 (1995)
M. Pollak, Average run lengths of an optimal method of detecting a change in distribution. Ann.
Stat. 15(2), 749–779 (1987)

C. Qualls, H. Watanabe, Asymptotic properties of gaussian processes. Ann. Math. Stat. 43, 580–
596 (1972)
R.E. Quandt, Tests of the hypothesis that a linear regression system obeys two separate regimes.
J. Am. Stat. Assoc. 53, 873–880 (1958)
R.E. Quandt, The estimation of the parameters of a linear regression system obeying two separate
regimes. J. Am. Stat. Assoc. 55, 324–330 (1960)
J.O. Ramsey, B.W. Silverman, Functional Data Analysis (Springer, New York, 2002)
J. Reeves, J. Chen, X.L. Wang, R. Lund, Q. Lu, A review and comparison of changepoint detection
techniques for climate data. J. Appl. Meteorol. Climatol. 46(6), 900–915 (2007)
M. Regis, P. Serra, E.R. van den Heuvel, Random autoregressive models: A structured overview.
Econom. Rev. 41, 207–230 (2021)
M. Reimherr, Functional regression with repeated eigenvalues. Stat. Probab. Lett. 107, 62–70
(2015)
A. Rényi, On the theory of order statistics. Acta Math. Acad. Sci. Hungar. 4, 191–231 (1953)
P. Révész, Random Walk in Random and Non-random Environments (World Scientific, Singapore,
1990)
G. Rice, H.L. Shang, A plug-in bandwidth selection procedure for long-run covariance estimation
with stationary functional time series. J. Time Ser. Anal. 38, 591–609 (2017)
G. Rice, C. Zhang, Consistency of binary segmentation for multiple change-point estimation with
functional data. Stat. Probab. Lett. 180, 109228 (2022). ISSN 0167-7152
A. Rinaldo, D. Wang, Q. Wen, R. Willett, Y. Yu, Localizing changes in high-dimensional regression
models, in Proceedings of the International Conference on Artificial Intelligence and Statistics
(2021)
S.W. Roberts, A comparison of some control chart procedures. Technometrics 8, 411–430 (1966)
A. Schick, √N-consistent estimation in a random coefficient autoregressive model. Austral. J.
Stat. 38, 155–160 (1996)
A.J. Scott, M. Knott, A cluster analysis method for grouping means in the analysis of variance.
Biometrics 30(3), 507–512 (1974)
M. Serbinowska, Consistency of an estimator of the number of changes in binomial observations.
Stat. Probab. Lett. 29, 337–344 (1996)
M. Shahbaz, A. Sinha, Environmental Kuznets curve for CO2 emissions: a literature survey. J.
Econ. Stud. 46, 106–168 (2019)
Q.-M. Shao, On a conjecture of Révész. Proc. Am. Math. Soc. 123, 575–582 (1995)
X. Shao, Self-normalization for time series: A review of recent developments. J. Am. Stat. Assoc.
110(512), 1797–1817 (2015)
X. Shao, X. Zhang, Testing for change points in time series. J. Am. Stat. Assoc. 105(491), 1228–
1240 (2010)
O.S. Sharipov, M. Wendler, Bootstrapping covariance operators of functional time series. J.
Nonparam. Stat. 32(3), 648–666 (2020)
O. Sharipov, J. Tewes, M. Wendler, Sequential block bootstrap in a Hilbert space with application
to change point analysis. Canad. J. Stat. 44(3), 300–322 (2016)
X. Shi, C. Gallagher, R. Lund, R. Killick, A comparison of single and multiple changepoint
techniques for time series data. Comput. Stat. Data Anal. 170, 107433 (2022)
A.N. Shiryaev, On optimum methods in quickest detection problems. Theory Probab. Appl. 8,
22–46 (1963)
G.R. Shorack, J.A. Wellner, Empirical Processes with Applications to Statistics (Wiley, 1986)
D. Siegmund, Sequential Analysis: Tests and Confidence Intervals (Springer, New York, 2013)
A.V. Skorokhod, Limit theorems for stochastic processes. Theory Probab. Appl. 1, 261–290 (1956)
Federal Reserve Bank of St. Louis, FRED, Federal Reserve Economic Data, St. Louis, MO (2023)
A. Steland, Monitoring procedures to detect unit roots and stationarity. Econom. Theory 23,
1108–1135 (2006)
A. Steland, Testing and estimating change-points in the covariance matrix of a high-dimensional
time series. J. Multivariate Anal. 177, 104582 (2020)

J.H. Stock, M.W. Watson, Disentangling the channels of the 2007–2009 recession. National
Bureau of Economic Research, No. w18094 (2012)
C. Stoehr, J.A.D. Aston, C. Kirch, Detecting changes in the covariance structure of functional time
series with application to FMRI data. Econom. Stat. 18, 44–62 (2021)
Y. Sun, P. Phillips, S. Jin, Optimal bandwidth selection in heteroskedasticity–autocorrelation robust
testing. Econometrica 76, 175–194 (2008)
D. Surgailis, G. Teyssiére, M. Vaičiulis, Detecting and estimating epidemic changes in dependent
functional data. J. Multivariate Anal. 109, 204–220 (2008)
A. Tartakovsky, I. Nikiforov, M. Basseville, Sequential Analysis: Hypothesis Testing and Changepoint Detection
(Chapman and Hall/CRC, New York, 2014)
A. Thavaneswaran, S.S. Appadoo, M. Ghahramani, RCA models with GARCH innovations. Appl.
Math. Lett. 22, 110–114 (2009)
A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes (Springer, 1996)
E.S. Venkatraman, Consistency results in multiple change-point problems. Technical Report No. 24, Stanford University, 1992
T.J. Vogelsang, Wald-type tests for detecting breaks in the trend function of a dynamic time series.
Econom. Theory 13, 818–848 (1997)
L.Ju. Vostrikova, Detection of “disorder” in multidimensional random processes. Sov. Math. Dokl.
24, 55–59 (1981)
D. Wang, Y. Yu, A. Rinaldo, Univariate mean change point detection: Penalization, CUSUM and
optimality. Electron. J. Stat. 14(1), 1917–1961 (2020)
R. Wang, C. Zhu, S. Volgushev, X. Shao, Inference for change points in high-dimensional data via
selfnormalization. Ann. Stat. 50(2), 781–806 (2022)
M.J. Wichura, On the construction of almost uniformly convergent random variables with given
weakly convergent image laws. Ann. Math. Stat. 41, 284–291 (1970)
D. Wied, W. Krämer, H. Dehling, Testing for a change in correlation at an unknown point in time
using an extended functional delta method. Econom. Theory 28, 570–589 (2012)
D. Wied, D. Ziggle, T. Berens, On the application of new tests for structural changes on global
minimum-variance portfolios. Stat. Pap. 54, 955–975 (2013)
C.K. Wikle, A. Zammit-Mangion, N. Cressie, Spatio-Temporal Statistics with R (Chapman and
Hall/CRC, 2019)
J.M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, 2nd edn. (MIT Press,
2010)
C.-F. Wu, Asymptotic theory of nonlinear least squares estimation. Ann. Stat. 9, 501–513 (1981)
W. Wu, Nonlinear System Theory: Another Look at Dependence, volume 102 of Proceedings of
The National Academy of Sciences of the United States (National Academy of Sciences, 2005)
W. Wu, Strong invariance principles for dependent random variables. Ann. Probab. 35, 2294–2320
(2007)
J. Wu, Z. Xiao, A powerful test for changing trends in time series models: Test for changing trends
in time series models. J. Time Ser. Anal. 39, 488 (2018)
W. Wu, P. Zaffaroni, Asymptotic theory for spectral density estimates of general multivariate time
series. Econom. Theory 34, 1–22 (2018)
K.L. Xu, Testing for structural change under non–stationary variances. Econom. J. 18, 274–305
(2015)
Y.-C. Yao, Estimating the number of change-points via schwartz’s criterion. Stat. Probab. Lett. 6,
181–189 (1988)
Y. Yu, A review on minimax rates in change point detection and localisation (2020)
A. Zeileis, Econometric computing with HC and HAC covariance matrix estimators. J. Stat. Softw.
11(10), 1–17 (2004)
A. Zeileis, F. Leisch, C. Kleiber, K. Hornik, Monitoring structural change in dynamic econometric
models. J. Appl. Econom. 20, 99–121 (2005)
X. Zhang, X. Shao, K. Hayhoe, D.J. Wuebbles, Testing the structural stability of temporally
dependent functional observations and application to climate projections. Electron. J. Stat.
5, 1765–1796 (2011)

Z. Zhou, Heteroscedasticity and autocorrelation robust structural change detection. J. Am. Stat.
Assoc. 108, 726–740 (2013)
K. Zhu, S. Ling, Likelihood ratio tests for the structural change of an AR(p) model to a threshold
ar(p) model. J. Time Ser. Anal. 33, 223–232 (2011)
E. Zivot, J. Wang, Modelling Financial Time Series with S-PLUS (Springer, New York, 2006)