Adler: Random Fields and Geometry
From Robert
For my parents, Kurt and Berta Adler, who taught me the most important lesson
of all:
Know from whence you have come, to where you are going,
and before Whom you must ultimately report.
Ethics of the Fathers, Chapter 3
From Jonathan
For my wife, Lee-Ann, and daughter, Isobel, for their tremendous support of my
efforts in writing this book and my parents, John and Brenda, for all of their encour-
agement.
Robert J. Adler
Jonathan E. Taylor
With 21 Illustrations
Robert Adler
Faculty of Industrial Engineering and Management
Technion – Israel Institute of Technology
Haifa, 32000
Israel

Jonathan Taylor
Department of Statistics
Stanford University
Sequoia Hall
Stanford, CA 94305-4065
U.S.A.
Preface
Since the term “random field’’ has a variety of different connotations, ranging from
agriculture to statistical mechanics, let us start by clarifying that, in this book, a
random field is a stochastic process, usually taking values in a Euclidean space, and
defined over a parameter space of dimensionality at least 1.
Consequently, random processes defined on countable parameter spaces will not
appear here. Indeed, even processes on R1 will make only rare appearances and,
from the point of view of this book, are almost trivial. The parameter spaces we like
best are manifolds, although for much of the time we shall require no more than that
they be pseudometric spaces.
With this clarification in hand, the next thing that you should know is that this
book will have a sequel dealing primarily with applications.
In fact, as we complete this book, we have already started, together with KW
(Keith Worsley), on a companion volume [8] tentatively entitled RFG-A, or Random
Fields and Geometry: Applications. The current volume—RFG—concentrates on
the theory and mathematical background of random fields, while RFG-A is intended
to do precisely what its title promises. Once the companion volume is published,
you will find there not only applications of the theory of this book, but of (smooth)
random fields in general.
Making a clear split between theory and practice has both advantages and disad-
vantages. It certainly eased the pressure on us to attempt the almost impossible goal
of writing in a style that would be accessible to all. It also, to a large extent, eases the
load on you, the reader, since you can now choose the volume closer to your interests
and so avoid either “irrelevant’’ mathematical detail or the “real world,’’ depending on
your outlook and tastes. However, these are small gains when compared to the major
loss of creating an apparent dichotomy between two things that should, in principle,
go hand-in-hand: theory and application. What is true in principle is particularly true
of the topic at hand, and, to explain why, we shall indulge ourselves in a paragraph
or two of history.
The precursor to both of the current volumes was the 1981 monograph The Geom-
etry of Random Fields (GRF) which grew out of RJA’s (i.e., Robert Adler’s) Ph.D.
thesis under Michael Hasofer. The problem that gave birth to the thesis was an applied
one, having to do with ground fissures due to water permeating through the earth un-
der a building site. However, both the thesis and GRF ended up being directed more
to theoreticians than to subject-matter researchers. Nevertheless, the topics there
found many applications over the past two decades, in disciplines as widespread as
astrophysics and medical imaging.
These applications led to a wide variety of extensions of the material of GRF,
which, while different in extent from what was there, were not really different in kind.
However, in the late 1990s KW found himself facing a brain mapping problem on
the cerebral cortex (i.e., the brain surface) that involved looking at random fields
on manifolds. Jonathan Taylor (JET) looked at this problem and, in somewhat of a
repetition of history, took it to an abstract level and wrote a Ph.D. thesis that completely
revolutionized1 the way one should think about problems involving the geometry
generated by smooth random fields. This, and subsequent material, makes up Part III
of the current, three-part, book.
In fact, this book is really about Part III, and it is there that most of the new
material will be found. Part I is mainly an adaptation of RJA’s 1990 IMS lecture
notes, An Introduction to Continuity, Extrema, and Related Topics for General Gauss-
ian Processes, considerably corrected and somewhat reworked with the intention of
providing all that one needs to know about Gaussian random fields in order to read
Part III. In addition, Part I includes a chapter on stationarity. En passant, we also
included many things that were not really needed for Part III, so that Part I can be
(and often has been) used as the basis of a one-quarter course in Gaussian processes.
Such a course (and, indeed, this book as a whole) would be aimed at students who
have already taken a basic course in measure-theoretic probability and also have some
basic familiarity with stochastic processes.
Part II covers material from both integral and differential geometry. However, the
material here is considerably less standard than that of Part I, and we expect that few
readers other than professional geometers will be familiar with all of it. In addition,
some of the proofs are different from what is found in the standard geometry literature
in that they use properties of Gaussian distributions.2
There are two main aims to Part II. One is to set up an analogue of the critical point
theory of Marston Morse in the framework of Whitney stratified manifolds. What
makes this nonstandard (at least in terms of what most students of mathematics see
as part of their graduate education) is that Morse theory is usually done for smooth
manifolds, preferably without boundaries. Whitney stratified manifolds are only
piecewise smooth, and are permitted any number of edges, corners, etc. This brings
them closer to the objects of integral geometry, to which we devote a chapter. While
the results of this specific chapter are actually subsumed by what we shall have to
say about Whitney stratified manifolds, they have the advantage that they are easy to
state and prove without heavy machinery.
The second aim of Part II is to develop Lipschitz–Killing curvatures in the setting
of Whitney stratified manifolds and to describe their role in what are known as “tube
1 This verb was chosen by RJA and not JET.
2 After all, since we shall by then have the Gaussian Part I behind us, it seems wasteful not
to use it when it can help simplify proofs.
formulas.’’ We shall spend quite some time on this. Some of the material here is
“well known’’ (albeit only to experts) and some, particularly that relating to tube and
Crofton formulas in Gauss space, is new. Furthermore, we derive the tube formulas
for locally convex Whitney stratified manifolds, which is both somewhat more general
than the usual approach for smooth manifolds, and somewhat more practical, since
most of the parameter spaces we are interested in have boundaries. In addition, the
approach we adopt is often unconventional.
These two aims make for a somewhat unusual combination of material and there is
no easily accessible and succinct3 alternative to our Part II for learning about them. In
the same vein, in order to help novice differential geometers, we have included a one-
chapter primer on differential geometry that runs quickly, and often unaesthetically,
through the basic concepts and notation of this most beautiful part of mathematics.
However, although Parts I and II of this book contain much material of intrinsic
interest we would not have written them were it not for Part III, for which they provide
necessary background material. What is it in Part III that justifies close to 300 pages
of preparation? Part III revolves around the excursion sets of smooth, Rk -valued
random fields f over piecewise smooth manifolds M. Excursion sets are subsets of
M given by
A_D ≡ A_D(f, M) = {t ∈ M : f(t) ∈ D}

for D ⊂ R^k.
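As a concrete, entirely hypothetical illustration of this definition, one can approximate an excursion set numerically by sampling a smooth field on a grid and thresholding. The random trigonometric field below is our own toy construction (all names and parameters are ours, not the book's); for D = [u, ∞), the excursion set is simply the Boolean mask f ≥ u.

```python
import numpy as np

# Toy sketch: sample a smooth random field on a grid over M = [0,1]^2 and
# form the excursion set A_D = {t in M : f(t) in D} for D = [u, infinity).
rng = np.random.default_rng(0)

n = 200
x = np.linspace(0.0, 1.0, n)
xx, yy = np.meshgrid(x, x)

# A smooth toy Gaussian field: a random trigonometric polynomial with
# 20 modes, normalized to have (pointwise) variance roughly 1.
f = np.zeros((n, n))
for _ in range(20):
    a, b = rng.standard_normal(2)
    kx, ky = rng.integers(1, 5, size=2)
    phase = 2 * np.pi * (kx * xx + ky * yy)
    f += a * np.cos(phase) + b * np.sin(phase)
f /= np.sqrt(20)

u = 1.0
A_D = f >= u        # Boolean mask approximating the excursion set A_[u,inf)
frac = A_D.mean()   # fraction of the parameter space lying above level u
```

For a unit-variance field the fraction above u = 1 should hover near Ψ(1) ≈ 0.16, though any single realization will deviate.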
A great deal of the sample function behavior of such fields can be deduced from
their excursion sets and a surprising amount from the Euler, or Euler–Poincaré, char-
acteristics of these excursion sets, defined in Part II. In particular, if we denote the
Euler characteristic of a set A by ϕ(A), then much of Part III is devoted to finding
the following expression for their expectation, when f is Gaussian with zero mean
and unit constant variance:
E{ϕ(A_D)} = Σ_{j=0}^{dim M} (2π)^{−j/2} L_j(M) M^k_j(D).    (0.0.1)

Here the L_j(M) are the Lipschitz–Killing curvatures of M with respect to a Riemannian metric induced by the random field f, and the M^k_j(D) are certain Minkowski-like functionals (closely akin to Lipschitz–Killing curvatures) on R^k under Gauss measure.
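To see what (0.0.1) says in the simplest nontrivial case, take M = [0, 1], f real-valued (k = 1) with zero mean and unit variance, and D = [u, ∞); the sum then has only two terms. The normalization of the functionals below is our reading of the conventions used later in the book, so treat this as a sketch rather than a verbatim quotation:

```latex
% Specializing (0.0.1): M = [0,1], k = 1, D = [u,\infty).  Writing
% \lambda_2 = \mathrm{Var}(f'(t)) for the second spectral moment, the
% interval [0,1] has length \lambda_2^{1/2} in the metric induced by f, so
% \mathcal{L}_0(M) = 1, \mathcal{L}_1(M) = \lambda_2^{1/2}, while
% \mathcal{M}^1_0([u,\infty)) = \Psi(u) = \mathbb{P}\{f(t) \ge u\} and
% \mathcal{M}^1_1([u,\infty)) = \varphi(u), the standard normal density.
\mathbb{E}\{\varphi(A_{[u,\infty)})\}
  = \Psi(u) + (2\pi)^{-1/2}\,\lambda_2^{1/2}\,\varphi(u)
  = \Psi(u) + \frac{\lambda_2^{1/2}}{2\pi}\, e^{-u^2/2}.
```

The second term is exactly the Rice upcrossing mean, and Ψ(u) is the boundary correction coming from {f(0) ≥ u}, both of which reappear in the discussion of Rice's formula below.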
If all of this sounds terribly abstract, the truth is that it both is, and is not. It is
abstract, because while (0.0.1) has had many precursors over the last 60 years or so, it
has never before been established in the generality described above. It is also abstract
in that the tools involved in the derivation of (0.0.1) in this setting require some rather
heavy machinery from differential geometry. However, this level of abstraction has
3 The stress here is on “succinct.’’ With the exception of the material on Gauss space, almost
everything that we have to say can be found somewhere in the literatures of integral and
differential geometry, for which there are many excellent texts, some of which we shall list
later. However, all presume a background knowledge that is beyond what we shall require,
and each contains only a subset of the results we shall need.
turned out to pay significant dividends, for not only does it yield insight into earlier
results that we did not have before, but it also has practical implications. For example,
the approach that we shall employ works just as well for nonstationary processes as
it does for stationary ones.4 However, nonstationarity, even on manifolds as simple
as [0, 1]2 , was previously considered essentially intractable. Simply put, this is one
of those rare but constantly pursued examples in mathematics in which abstraction
leads not only to a complete and elegant theory, but also to practical consequences.
An extremely simple and very down-to-earth application of (0.0.1) arises when
the manifold is the unit interval [0, 1], f is real-valued, and D = [u, ∞). In that
case, E{ϕ(AD )} is no more than the mean number of upcrossings of the level u by the
process f , along with a boundary correction term. Consequently, modulo the bound-
ary term, (0.0.1) collapses to no more than the famous Rice formula, undoubtedly
one of the most important results in the applications of smooth stochastic processes.
If you are unfamiliar with Rice’s formula, then you might want to start reading this
book at Section 11.1, where it appears in some detail, together with heuristic, but
instructional, proofs and applications.
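Rice's formula is also easy to check by simulation. The sketch below is our own (all variable names are ours): it uses the random cosine process f(t) = ξ₁ cos(ωt) + ξ₂ sin(ωt), which is stationary with unit variance and second spectral moment λ₂ = ω², and compares Monte Carlo upcrossing counts on [0, 1] with the Rice prediction E{N_u} = (λ₂^{1/2}/2π) e^{−u²/2}.

```python
import numpy as np

# Monte Carlo check of Rice's formula for the random cosine process
#   f(t) = xi1 * cos(w t) + xi2 * sin(w t),  xi1, xi2 i.i.d. N(0, 1),
# which is stationary, zero mean, unit variance, with lambda_2 = w**2.
rng = np.random.default_rng(1)

w = 4 * np.pi                  # two full periods on [0, 1]
u = 1.0
t = np.linspace(0.0, 1.0, 401)

n_sims = 10000
xi = rng.standard_normal((n_sims, 2))
f = xi[:, [0]] * np.cos(w * t) + xi[:, [1]] * np.sin(w * t)

# Count upcrossings of the level u on the grid, path by path.
up = np.sum((f[:, :-1] < u) & (f[:, 1:] >= u), axis=1)
empirical = up.mean()

# Rice's formula: sqrt(lambda_2) / (2 pi) * exp(-u^2 / 2).
rice = w / (2 * np.pi) * np.exp(-u**2 / 2)
```

For this process the agreement is exact in distribution: writing f(t) = R cos(ωt − θ) with R Rayleigh, the number of upcrossings on [0, 1] is 2·1{R > u}, whose mean is precisely the Rice value.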
One of the reasons that Rice’s formula is so important is that it has long been used
as an approximation, for large u, to the excursion probability
P{ sup_{t∈[0,1]} f(t) ≥ u },
itself an object of major practical importance. The heuristic argument behind this
is simple: If f crosses a high level, it is unlikely to do so more than once. Thus, in
essence, the probability that f crosses the level u is close to the probability that there
is an upcrossing of u, along with a boundary correction term. (The correction comes
from the fact that one way for sup_{t∈[0,1]} f(t) to be larger than u is for there to be
no upcrossings but f(0) ≥ u.) Since the number of upcrossings of a high level will
always be small, the probability of an upcrossing is well approximated by the mean
number of upcrossings. Hence Rice’s formula gives an approximation for excursion
probabilities.
If (0.0.1) is the main result of Part III, then the second-most-important result is
that, at the same level of generality and for a wide choice of D, we can find a bound
for the difference
|P{∃ t ∈ M : f(t) ∈ D} − E{ϕ(A_D)}|.
A specific case of this occurs when f takes values in R1 , in which case not only can
we show that, for large u,
P{ sup_{t∈M} f(t) ≥ u } − E{ϕ(A_{[u,∞)})}
is small, but we can provide an upper bound to it that is both sharp and explicit.
Given that the second term here is known from (0.0.1), what this inequality gives is
an excellent approximation to Gaussian excursion probabilities in a very wide setting,
something that has long been a holy grail of Gaussian process theory.
4 Still assuming marginal stationarity, i.e., zero mean and constant variance.
Fig. 0.0.1. A C∞ function defined over a manifold with a C∞ boundary gives excursion sets that have sharp, nondifferentiable corners.
Such an example is shown in Figure 0.0.1, where the parameter space is a disk.
The three excursions of a (C ∞ ) function f above some nominal level are marked on
the function surface, and these lie above the three corresponding components of the
excursion set A[u,∞) . Note that, despite the smoothness of each component of this
example, the excursion set has sharp corners where it intersects with the boundary of
the parameter space. In other words, A[u,∞) is only a piecewise smooth manifold.
It turns out that since we end up with piecewise smooth manifolds for our excursion
sets, there is not a lot saved by not starting with them as parameter spaces as well.5
So now you know what awaits you at the end of the path through this book.
However, traversing the path has value in itself. Wandering, as it does, through the
fields of both probability and geometry, it is a path that we imagine not too many of
you will have traversed before. We hope that you will enjoy the scenery along the
way as much as we have enjoyed describing it. (We also hope, for your sake, that it
will be easier and faster in the reading than it was in the writing.)
We are left now with two tasks: Advising how best to read this book, and offering
our acknowledgments.
5 Of course, we could simplify things considerably by working only with parameter spaces
that have no boundary, something that would be natural, for example, for a differential
geometer. However, this would leave us with a theory that could not handle parameter
spaces as simple as the square and the cube, a situation that would be intolerable from the
point of view of applications.
The best way to read the book is, of course, to start at the beginning and work
through to the end. That was how we wrote it. However, here are some other possibilities,
depending on what you want to get out of it.
(i) A course on Gaussian processes: Chapters 1 through 4 along with Sections 5.1
through 5.4 if you want to learn about stationarity as well. These chapters can
be read in more or less any order; see the comments in the introduction to Part I.
(ii) Random fields on Euclidean spaces, with an accent on geometry: Sections 1.1–
1.4.2 and Chapter 3 for basic Gaussian processes, Sections 4.1, 4.5, and 4.6
for some classical material on extremal distributions, and Chapter 5 on station-
arity. Chapter 6 and Section 9.4 give the basic geometry and Chapter 11 the
random geometry of Gaussian fields. Section 14.4 gives examples of how the
results of Chapter 11 relate to excursion probabilities, and Section 15.10 gives
examples of the non-Gaussian theory. (Note that because you have chosen to
remain in the Euclidean scenario, and so avoid most of the real challenges of
differential geometry, you have been relegated to reading examples instead of
the general case!)
(iii) Probabilistic problems in, and using, differential geometry: Sections 1.1, 1.2
and the results (but not proofs) of Chapter 3 to get a bare-bones introduction to
Gaussian processes, along with Sections 5.5 and 5.6 for some important notation.
As much of Chapter 7 as you need to revise differential-geometric concepts,
followed by Chapters 8, 9, and 10. The punch line is then in Chapters 12 and 13
for Gaussian processes and Chapter 15 in general. It is only in this last chapter
that you will get to see all the geometric preliminaries of Part II in play at once.
(iv) Applications without the theory: Wait for RFG-A. We are working on it!
Now for the acknowledgments. Both RJA and JET owe debts of gratitude to KW,
and we had better acknowledge them now, since we can hardly do it in the preface
of RFG-A.
Beyond our personal debts to KW, not least for getting the two of us together,
the subject matter of this book also owes him an enormous debt of gratitude. It
was during his various extensions and applications of the material of GRF that the
passage between the old Euclidean theory and its newer manifold version began to
take shape. Without his tenacious refusal to leave (applied) problems because the
theory (geometry) seemed too hard, the foundations on which our Part III is based
would never have been laid.
Back to the personal level, we also owe debts of gratitude to numerous students
at the Technion, UC Santa Barbara, Stanford, and the ICE-EM in Brisbane who sat
through courses as we put this volume together, as well as the group at McGill that
went through the book as a reading course with KW. Their enthusiasm, patience, and
refusal to take “it is easy to see that’’ for an answer when it was not all that easy to
see things, not to mention all the typos and errors that they found, has helped iron a
lot of wrinkles out of the final product.
In particular, we would like to thank Nicholas Chamandy, Sourav Chatterjee,
Steve Huntsman, Farzan Rohani, Alessio Sancetta, Armin Schwartzman, and Sreekar
Vadlamani for their questions, comments, and, embarrassingly often, corrections.
The ubiquitous anonymous reviewer also made a number of useful suggestions and
we are suitably grateful to him/her.
The generous support of the U.S.–Israel Binational Science Foundation, the Israel
Science Foundation, the U.S. National Science Foundation, the Louis and Samuel
Seiden Academic Chair, and the Canadian Natural Sciences and Engineering Research
Council over the (too long a) period that we worked on this book are all gratefully
acknowledged.
Finally, don’t forget, after you finish reading this book, to run to your library for
a copy of RFG-A, to see what all of this theory is really good for.
Until such time as RFG-A appears in print, preliminary versions will be available
on our home pages, which is where we shall also keep a list of typos and/or corrections
for this book.
Contents

Preface
1 Gaussian Fields
   1.1 Random Fields
   1.2 Gaussian Variables and Fields
   1.3 Boundedness and Continuity
   1.4 Examples
       1.4.1 Fields on R^N
       1.4.2 Differentiability on R^N
       1.4.3 The Brownian Family of Processes
       1.4.4 Generalized Fields
       1.4.5 Set-Indexed Processes
       1.4.6 Non-Gaussian Processes
   1.5 Majorizing Measures
2 Gaussian Inequalities
   2.1 Borell–TIS Inequality
   2.2 Comparison Inequalities
3 Orthogonal Expansions
   3.1 The General Theory
   3.2 The Karhunen–Loève Expansion
4 Excursion Probabilities
   4.1 Entropy Bounds
   4.2 Processes with a Unique Point of Maximal Variance
   4.3 Examples
   4.4 Extensions

Part II Geometry

References
Part I. Gaussian Processes
If you have not yet read the preface, then please do so now.
Since you have read the preface, you already know a number of important things
about this book, including the fact that Part I is about Gaussian random fields.
The centrality of Gaussian fields to this book is due to two basic factors:
• Gaussian processes have a rich, detailed, and very well-understood general theory,
which makes them beloved by theoreticians.
• In applications of random field theory, as in applications of almost any theory, it is
important to have specific, explicit formulas that allow one to predict, to compare
theory with experiment, etc. As we shall see in Part III, it will be only for Gaussian
(and related; cf. Section 1.4.6 and Chapter 15) fields that it is possible to derive
such formulas, and then only in the setting of excursion sets.
The main reason behind both these facts is the convenient analytic form of the
multivariate Gaussian density, and the related properties of Gaussian fields. This is
what Part I is about.
There are five main collections of basic results that will be of interest to us.
Rather interestingly, although later in the book we shall be interested in Gaussian
fields defined over various types of manifolds, the basic theory of Gaussian fields
is actually independent of the specific geometric structure of the parameter space.
Indeed, after decades of polishing, even proofs gain little in the way of simplification
by restricting to special cases even as simple as R. Thus, at least for a while, we can
and shall work in as wide as possible generality, working with fields defined only on
topological spaces to which we shall assign a natural (pseudo)metric induced by the
covariance structure of the field.
The first set of results that we require, along with related information, form Chap-
ter 1 and are encapsulated in different forms in Theorems 1.3.3 and 1.3.5 and their
corollaries. These give a sufficient condition, in terms of metric entropy, ensuring the
sample path boundedness and continuity of a Gaussian field along with providing in-
formation about moduli of continuity. While this entropy condition is also necessary
for stationary fields, this is not the case in general, and so for completeness we look
briefly at the majorizing measure version of this theory in Section 1.5. However, it
will be a rare reader of this book who will ever need the more general theory.
To put the seemingly abstract entropy conditions into focus, these results will be
followed by a section with a goodly number of extremely varied examples. Despite
the fact that these cover only the tip of a very large iceberg, their diversity shows
the power of the abstract approach, in that all can be treated via the general theory
without further probabilistic arguments. The reader who is not interested in the general
Gaussian theory, and cares mainly about the geometry of fields on RN or on smooth
manifolds, need only read Sections 1.4.1 and 1.4.2 on continuity and differentiability
in this scenario, along with the early parts of Section 1.4.3, needed for understanding
the spectral representation of Chapter 5.
Chapter 2 contains the Borell–TIS (Borell–Tsirelson–Ibragimov–Sudakov) in-
equality and Slepian inequalities (along with some of their relatives). The Borell–TIS
inequality gives a universal bound for the excursion probability
P{ sup_{t∈T} f(t) ≥ u },    (0.0.2)
u > 0, for any centered, continuous Gaussian field. As such, it is a truly basic
tool of Gaussian processes, somewhat akin to Chebyshev’s inequality in statistics or
maximal inequalities in martingale theory. Slepian’s inequality and its relatives are
just as important and basic, and allow one to use relationships between covariance
functions of Gaussian fields to compare excursion probabilities and expectations of
suprema.
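For orientation, the Borell–TIS inequality referred to here can be stated as follows; this is the standard form, given as a sketch, and the precise version with proof appears in Chapter 2:

```latex
% Borell--TIS inequality (standard statement): let f be a centered Gaussian
% field, a.s. bounded on T, and set \sigma_T^2 = \sup_{t \in T} \mathbb{E}\{f^2(t)\}.
% Then \mathbb{E}\{\sup_{t \in T} f(t)\} < \infty and, for every u > 0,
\mathbb{P}\Bigl\{ \sup_{t \in T} f(t)
    - \mathbb{E}\bigl\{\sup_{t \in T} f(t)\bigr\} > u \Bigr\}
  \;\le\; e^{-u^2/2\sigma_T^2}.
```

Note that the tail is governed by σ_T alone, with no reference to the structure of T; this is the sense in which the bound is universal.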
The main result of Chapter 3 is Theorem 3.1.1, which gives an expansion for
a Gaussian field in terms of deterministic eigenfunctions with independent N (0, 1)
coefficients. A special case of this expansion is the Karhunen–Loève expansion of
Section 3.2, with which we expect many readers will already be familiar. Together
with the spectral representations of Chapter 5, they make up what are probably the
most important tools in the Gaussian modeler’s bag of tricks. However, these ex-
pansions are also an extremely important theoretical tool, whose development has
far-reaching consequences.
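For the one such expansion most readers will already know, here is a sketch of the Karhunen–Loève expansion of Brownian motion on [0, 1]; this is a standard fact rather than anything taken from Chapter 3, and the variable names are ours.

```python
import numpy as np

# Karhunen-Loeve expansion of Brownian motion on [0,1]:
#   W(t) = sqrt(2) * sum_n xi_n * sin((n - 1/2) pi t) / ((n - 1/2) pi),
# with xi_n i.i.d. N(0, 1).  Truncating at N terms gives a smooth
# approximation whose pointwise variance converges to Var W(t) = t.
rng = np.random.default_rng(2)

N = 500                                       # truncation level
t = np.linspace(0.0, 1.0, 101)
k = (np.arange(1, N + 1) - 0.5) * np.pi       # frequencies (n - 1/2) * pi
basis = np.sqrt(2) * np.sin(np.outer(t, k)) / k

# One sample path of the truncated expansion:
xi = rng.standard_normal(N)
path = basis @ xi

# Variance check: sum_n 2 sin^2(k_n t) / k_n^2 should be close to t.
var_t = (basis ** 2).sum(axis=1)
```

The truncation error in the variance is of order 1/N, so with N = 500 the approximation to Var W(t) = t is already accurate to a few parts in ten thousand.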
Chapter 4 serves as a basic introduction to what is also one of the central topics
of Part III, the computation of the excursion probabilities for a zero-mean Gaussian
field and a general parameter space T . In Part III we shall develop highly detailed
expansions of the form
P{ sup_{t∈T} f(t) ≥ u } = u^α e^{−u²/(2σ_T²)} Σ_{j=0}^{n} C_j u^{−j} + error,

for large u and appropriate parameters α, σ_T², n, and C_j that depend on both f and T.
However, to do this, we shall have to place specific assumptions on the parameter
space T , in particular assuming that it is a piecewise smooth manifold.6 Without
these assumptions, the best that one can do is to identify α, σT2 , and occasionally C0 ,
and this is what Chapter 4 will do. Furthermore, to keep the treatment down to a
reasonable length, we shall generally concentrate only on upper bounds, rather than
expansions, for Gaussian excursion probabilities.7
Chapter 5, the last of Part I, is somewhat different from the others in that it is
not really about Gaussian processes, but about stationarity and isotropy in general.
The main reason for the generality is that limiting oneself to the Gaussian scenario
gains us so little that it is not worthwhile doing so. The results here, however, will be
crucial for many of the detailed calculations of Part III.
6 For those of you who are already comfortable with the theory of stratified manifolds, our
“piecewise smooth manifolds’’ are Whitney stratified manifolds with convex support cones.
7 There is also a well-developed theory of a Poisson limit nature for probabilities of the form
P{supt∈Tu f (t) ≥ η(x, u)}, for which η(x, u) and the size of Tu grow (to infinity) with
u. In this case, one searches for growth rates that give a limit dependent on x, but not u.
You can find more about this in Aldous [10], which places the Gaussian theory within a
much wider framework of limit results, or in Leadbetter, Lindgren, and Rootzén [97] and
Piterbarg [126], which give more detailed and more rigorous accounts for the Gaussian and
Gaussian-related situation.
There are a number of ways in which you can read Part I of this book. You
should definitely start with Sections 1.1 and 1.2, which have some boring, standard,
but important technical material. From then on, it is very much up to you, since
the remainder of Chapter 1 and the other four chapters of Part I are more or less
independent of one another. A reviewer of the book suggested going from Section 1.2
directly to Chapter 3 to read about how to construct examples of Gaussian processes
via orthogonal expansions, a suggestion that certainly makes historical sense and
would also be more natural for an analyst rather than a probabilist. With or without
Chapter 3, one can go from Section 1.2 to Section 1.3 to learn a little about entropy
methods and from there directly to Chapter 4 to get quickly to extremal properties,
one of the key topics of this book. Chapter 2, on the Borell and Slepian inequalities,
can follow this. We obviously wrote the book in the order that seemed most logical
to us, but you do have a fair number of choices in how to read Part I.
Finally, we repeat what was already said in the preface, that there is nothing new
in Part I beyond perhaps the way some things are presented, and that as a treatment
of the basic theory of Gaussian fields it is not meant to be exhaustive. There are now
many books covering various aspects of this theory, including those by Bogachev
[28], Dudley [56], Fernique [67], Hida and Hitsuda [77], Janson [86], Ledoux and
Talagrand [99], Lifshits [105], and Piterbarg [126]. In terms of what will be important
to us, [56] and [99] stand out from the pack, perhaps augmented with Talagrand’s
review [154]. Finally, while not as exhaustive8 as the others, you might find RJA’s
lecture notes [3], augmented with the corrections in Section 2.1 below, a user-friendly
introduction to the subject.
We shall start with some rather dry, but necessary, technical definitions. As mentioned
in the penultimate paragraph of the introduction to Part I, once you have read them
you have a number of choices as to what to read next.
As you read in the preface, for us a random field is simply a stochastic process,
usually taking values in a Euclidean space, and defined over a parameter space of
dimensionality at least 1. Although we shall be rather loose about exchanging the
terms “field’’ and “process,’’ in general we shall use “field’’ when the geometric
structure of the parameter space is important to us, and shall use “process’’ otherwise.
“Random’’ and “stochastic’’ are, of course, completely synonymous.
Here is a formal definition, which states the obvious (and, we hope, the familiar).
Thus, f (ω) is a function, and (f (ω))(t) its value at time t. In general, however,
we shall not distinguish among
1 The use of T comes from the prehistory of Gaussian processes, and probably stands for
“time.’’ While the whole point of this book is to get away from the totally ordered structure
of R, the notation is too deeply entombed in the collective psyche of probabilists to change
it now. Later on, however, when we move to manifolds as parameter spaces, we shall
emphasize this by replacing T by M. Nevertheless, points in M will still be denoted by t.
We hereby make the appropriate apologies to geometers.
2 More on notation: While we shall follow the standard convention of denoting random
variables by uppercase Latin characters, we shall use lowercase to denote random functions.
The reason for this will become clear in Parts II and III, where we shall need the former
for tangent vectors.
While Φ and Ψ may not be explicit, there are simple, and rather important, bounds
that hold for every x > 0 and become sharp very quickly as x grows. In particular,
in terms of Ψ we have4

    (1/x − 1/x³) ϕ(x) < Ψ(x) < (1/x) ϕ(x).    (1.2.2)

4 Substituting u = x + v/x,

    √(2π) Ψ(x) = ∫_x^∞ e^{−u²/2} du
               = x^{−1} e^{−x²/2} ∫_0^∞ e^{−(2v + v²/x²)/2} dv
               ≥ x^{−1} e^{−x²/2} ∫_0^∞ e^{−v} (1 − v²/(2x²)) dv,

on using the fact that e^{−z} > 1 − z for all z ≥ 0. It is now trivial to check that the remaining
integral is at least 1 − x^{−2}.
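The bounds (1.2.2) are also easy to check numerically; the following sketch is ours, and evaluates the upper tail Ψ via the complementary error function.

```python
import math

# Numerical check of the Mills-ratio bounds (1.2.2):
#   (1/x - 1/x^3) * phi(x) < Psi(x) < (1/x) * phi(x)   for x > 0,
# where phi is the standard normal density and Psi(x) = P{X > x}.
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Psi(x):
    # Upper tail of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2))

for x in [0.5, 1.0, 2.0, 4.0, 8.0]:
    lower = (1 / x - 1 / x**3) * phi(x)
    upper = phi(x) / x
    assert lower < Psi(x) < upper

# The bounds sharpen quickly as x grows: already at x = 4 the upper
# bound overshoots Psi by only about 6 percent.
ratio = (phi(4.0) / 4.0) / Psi(4.0)
```

Note that at x = 1 the lower bound degenerates to 0, so (1.2.2) is only informative for moderately large x, exactly as the text says.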
5 Throughout the book, vectors are taken to be row vectors, and a prime indicates transposition.
The inner product between x and y in R^d is usually denoted by ⟨x, y⟩ or, occasionally, by
x · y, and even by (x, y) when this makes more sense.
6 A d × d matrix C is called nonnegative definite (or positive semidefinite) if xCx′ ≥ 0 for
all x ∈ R^d. A function C : T × T → R is called nonnegative definite if the matrices
(C(t_i, t_j))_{i,j=1}^n are nonnegative definite for all 1 ≤ n < ∞ and all (t_1, …, t_n) ∈ T^n.
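As a quick illustration of footnote 6, nonnegative definiteness of a candidate covariance function can be checked numerically by forming the matrices (C(t_i, t_j)) and inspecting eigenvalues. A sketch (our own helper, using NumPy; C(s, t) = min(s, t), the Brownian motion covariance, serves as a known-good example):

```python
import numpy as np

def is_nonneg_definite(C, tol=1e-10):
    # x C x' >= 0 for all x iff all eigenvalues of the symmetric part are >= 0
    return bool(np.all(np.linalg.eigvalsh((C + C.T) / 2) >= -tol))

t = np.array([0.5, 1.0, 2.0, 3.0])
C = np.minimum.outer(t, t)        # the matrix C(t_i, t_j) = min(t_i, t_j)
assert is_nonneg_definite(C)

# a symmetric matrix that is not a covariance matrix fails the test
bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1
assert not is_nonneg_definite(bad)
```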
7 At various places we shall use the notation | · | to denote any of "absolute value,'' "Euclidean
norm,'' "determinant,'' or "Lebesgue measure,'' depending on the argument, in a natural
fashion. The notation ‖ · ‖ is used only for either the norm of complex numbers or for special
norms, when it usually appears with a subscript. This is unless it is used, as in Chapter 2
in particular, as ‖f‖ to denote the supremum of a function f, which is not a norm at all.
Despite this multitude of uses of a simple symbol, its meaning should always be clear from
the context.
as n → ∞, where m and C are the mean and covariance matrix of the limiting
Gaussian. The norm on vectors is Euclidean, and that on matrices any of the usual ones.
The proofs involve only (1.2.4) and the continuity theorem for characteristic functions.
One immediate consequence of either (1.2.3) or (1.2.4) is that if A is any d × d
matrix and X ∼ N_d(m, C), then

$$XA \;\sim\; N_d\bigl(mA,\; A'CA\bigr). \tag{1.2.6}$$
where C_{11} is an n × n matrix. Then each X_i is N(m_i, C_{ii}), and the conditional
distribution⁹ of X_i given X_j is also Gaussian, with mean vector

$$m_{i|j} \;=\; m_i + C_{ij}C_{jj}^{-1}\bigl(X_j - m_j\bigr) \tag{1.2.7}$$
9 To prove this, define the matrix

$$A \;=\; \begin{pmatrix} I_n & -C_{12}C_{22}^{-1} \\ 0 & I_{d-n} \end{pmatrix}$$

and define Y = (Y_1, Y_2) = AX. Check using (1.2.6) that Y_1 and Y_2 ≡ X_2 are independent,
and use this to obtain (1.2.7) and (1.2.8) for i = 1, j = 2.
1.3 Boundedness and Continuity 11
and covariance matrix

$$C_{i|j} \;=\; C_{ii} - C_{ij}C_{jj}^{-1}C_{ji}. \tag{1.2.8}$$
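Formulas (1.2.7) and (1.2.8) are easy to verify numerically. The sketch below (NumPy, and in the column-vector convention rather than the book's row vectors) checks the two algebraic facts behind footnote 9: X₁ − C₁₂C₂₂⁻¹X₂ is uncorrelated with X₂, and the conditional covariance is again nonnegative definite:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2
B = rng.standard_normal((d, d))
C = B @ B.T                                  # a generic positive definite covariance
C11, C12 = C[:n, :n], C[:n, n:]
C21, C22 = C[n:, :n], C[n:, n:]

M = C12 @ np.linalg.inv(C22)
# Cov(X1 - M X2, X2) = C12 - M C22 = 0: the independence used in footnote 9
assert np.allclose(C12 - M @ C22, 0)

Ccond = C11 - M @ C21                        # conditional covariance, as in (1.2.8)
assert np.all(np.linalg.eigvalsh(Ccond) >= -1e-10)
```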
are called the mean and covariance functions of f. Multivariate¹⁰ Gaussian fields
taking values in R^d are fields for which ⟨α, f_t⟩ is a real-valued Gaussian field for
every α ∈ R^d.
In fact, one can also go in the other direction as well. Given any set T , a function
m : T → R, and a nonnegative definite function C : T × T → R there exists11 a
Gaussian process on T with mean function m and covariance function C.
Putting all this together, we have the important principle that for a Gaussian
process, everything about it is determined by the mean and covariance functions.
The fact that no real structure is required of the parameter space T is what makes
Gaussian fields such a useful model for random processes on general spaces. To build
an appreciation for this, you may want to jump ahead to Section 1.4 to look at some
examples. However, you will get more out of that section if you first bear with us to
answer one of the most fundamental questions in the theory of Gaussian processes:
When are Gaussian processes (almost surely) bounded and/or continuous?
Of course, in order to talk about continuity—i.e., for the notation s → t above to have
some meaning—it is necessary that T have some topology, so we assume that (T , τ )
is a metric space, and that continuity is in terms of the τ -topology. Our first step is to
show that τ is irrelevant to the question of continuity.12 This is rather useful, since
10 Similarly, Gaussian fields taking values in a Banach space B are fields for which α(f_t)
is a real-valued Gaussian field for every α in the topological dual B∗ of B. The covariance
function is then replaced by a family of operators C_{st} : B∗ → B, for which
Cov(α(f_t), β(f_s)) = β(C_{st}α), for α, β ∈ B∗.
11 This is a consequence of the Kolmogorov existence theorem, which, at this level of gen-
erality, can be found in Dudley [55]. Such a process is a random variable in RT and may
have terrible properties, including lack of measurability in t. However, it will always exist.
12 However, τ will come back into the picture when we talk about moduli of continuity later
in this section.
we shall also soon show that boundedness and continuity are essentially the same
problem for Gaussian fields, and formulating the boundedness question requires no
topological demands on T .
To start, define a new metric d on T by

$$d(s,t) \;=\; \bigl\{ E\bigl[(f(s) - f(t))^2\bigr] \bigr\}^{1/2}, \tag{1.3.1}$$

the exchange of limit and expectation justified by the uniform integrability provided
by the fact that since f is Gaussian, boundedness in L² implies boundedness in all L^p.
In other words, a.s. continuity of f implies the continuity of d.
Here is the lemma establishing the irrelevance of τ to the continuity question.
covers A_η for some η = η(ε) > 0, with η(ε) → 0 as ε → 0. That is, whenever
(s, t) ∈ A_η there is an (s′, t′) ∈ B with τ(s, s′), τ(t, t′) ≤ ε. Note that
The astute reader will have noted that in the statement of Lemma 1.3.1 the param-
eter space T was quietly assumed to be compact, and that this additional assumption
was needed in the proof. Indeed, from now on we shall assume that this is always the
case, and shall rely on it heavily. Fortunately, however, it is not a serious problem. As
far as continuity is concerned, if T is σ -compact14 then a.s. continuity on its compact
subsets immediately implies a.s. continuity over T itself. We shall not go beyond
σ -compact spaces in this book. The same is not true for boundedness, nor should it
be.15 However, we shall see that, at least on compact T , boundedness and continuity
are equivalent problems. Furthermore, both depend on how “large’’ the parameter
set T is when size is measured in a metric that comes from the process itself.16 One
way to measure the size of T is via the notion of metric entropy.
Definition 1.3.2. Let f be a centered Gaussian field on T , and d the canonical metric
(1.3.1). Assume that T is d-compact, and write
Bd (t, ε) = {s ∈ T : d(s, t) ≤ ε} (1.3.2)
for the d ball centered on t ∈ T and of radius ε. Let N(T, d, ε) ≡ N(ε) denote the
smallest number of such balls that cover T, and set

$$H(\varepsilon) \;=\; \ln N(\varepsilon). \tag{1.3.3}$$
Then N and H are called the (metric) entropy and log-entropy functions for T (or
f ). We shall refer to any condition or result based on N or H as an entropy condi-
tion/result.
14 T is σ -compact if it can be represented as the countable union of compact sets.
15 Think of simple Brownian motion on R₊. While bounded on every finite interval, it is a
consequence of the law of the iterated logarithm that it is unbounded on R₊.
16 A particularly illuminating example of this comes from the Brownian noise processes of
Section 1.4.3. There we shall see that the same process can be continuous, or discontinuous,
depending on how we specify its parameter set.
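Metric entropy is concrete enough to compute by brute force. The sketch below (our own greedy covering routine, which is within a constant factor of the optimal count) discretizes T = [0, 1] under the canonical metric d(s, t) = |s − t|^{1/2} of Brownian motion, and checks that N(ε) grows at the expected ε⁻² rate:

```python
import math

def covering_number(points, d, eps):
    # greedy cover: repeatedly center a ball at the first uncovered point
    uncovered = list(points)
    n = 0
    while uncovered:
        c = uncovered[0]
        uncovered = [t for t in uncovered if d(c, t) > eps]
        n += 1
    return n

T = [i / 2000 for i in range(2001)]             # discretized [0, 1]
d = lambda s, t: math.sqrt(abs(s - t))          # canonical metric of Brownian motion

counts = []
for eps in [0.5, 0.25, 0.1]:
    N = covering_number(T, d, eps)
    counts.append(N)
    assert N <= 2 / eps**2 + 2                  # N(eps) = O(eps^{-2}), so H(eps) ~ 2 ln(1/eps)
assert counts == sorted(counts)                 # N(eps) grows as eps shrinks
```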
Note that since we are assuming that T is d-compact, it follows that H (ε) < ∞ for
all ε > 0. The same need not be (nor generally is) true for limε→0 H (ε). Furthermore,
note for later use that if we define
$$\operatorname{diam}(T) \;=\; \sup_{s,t\in T} d(s,t), \tag{1.3.4}$$
(Note we can drop the absolute value sign since the supremum here is always non-
negative.)
More precisely, write d^{(2)} for the canonical metric of f_{s,t} on T × T. Then

$$d^{(2)}\bigl((s,t),(s',t')\bigr) \;=\; \Bigl\{E\Bigl[\bigl((f_t - f_s) - (f_{t'} - f_{s'})\bigr)^2\Bigr]\Bigr\}^{1/2} \;\leq\; 2\max\bigl(d(s,s'),\, d(t,t')\bigr),$$

and so

$$N\bigl(\{(s,t) \in T \times T : d(s,t) \leq \delta\},\; d^{(2)},\; \delta\bigr) \;\leq\; N^2(T, d, \delta/2).$$
From these observations, it is immediate that Theorem 1.3.3 implies the following
corollary.
Corollary 1.3.4. Under the conditions of Theorem 1.3.3 there exists a universal constant K such that

$$E\bigl\{\omega_{f,d}(\delta)\bigr\} \;\leq\; K \int_0^{\delta} H^{1/2}(\varepsilon)\, d\varepsilon. \tag{1.3.8}$$
Note that this is not quite enough to establish the a.s. continuity of f . Continuity
is, however, not far away, since the same construction used to prove Theorem 1.3.3
will also give us the following, which, with the elementary tools we have at hand at
the moment,17 neither follows from, nor directly implies, (1.3.8).
17 See, however, Theorem 2.1.3 below to see what one can do with better tools.
Theorem 1.3.5. Under the conditions of Theorem 1.3.3 there exist a random η ∈ (0, ∞) and a universal constant K such that

$$\omega_{f,d}(\delta) \;\leq\; K \int_0^{\delta} H^{1/2}(\varepsilon)\, d\varepsilon, \tag{1.3.9}$$

for all δ < η.
$$\sup_{t\in T} f_t \;=\; \sup_{t\in D} f_t.$$
18 We shall treat stationarity in detail in Chapter 5. For the moment all you need to know is that
under stationarity the expectation E{f (t)} is constant, while the covariance E{f (t)f (s)} is
a function of s − t only.
As trite as this observation is, it ceases to be valid if, for example, we investigate
supt |ft + X| rather than supt (ft + X).
Proof of Theorem 1.3.3. Fix a point t₀ ∈ T and consider f_t − f_{t₀}. In view of Observation 2, we can work with E{sup_T (f_t − f_{t₀})} rather than E{sup_T f_t}. Furthermore, in view of separability, it will suffice to take these suprema over the countable separating set D ⊂ T. To save on notation, we might therefore just as well assume that T is countable, which we now do.
We shall represent the difference ft − ft0 via a telescoping sum, in what is called
a chaining argument, and is in essence an approximation technique. We shall keep
track of the accuracy of the approximations via entropy and simple union bounds.
To build the approximations, first fix some r ≥ 2 and choose the largest i ∈ Z
such that diam(T) ≤ r^{−i}, where diam(T) is measured in terms of the canonical
metric of f. For j > i, take a finite subset Π_j of T such that

$$\sup_{t \in T}\, \min_{s \in \Pi_j} d(s, t) \;\leq\; r^{-j},$$

so that each t ∈ T has an approximation π_j(t) ∈ Π_j with d(t, π_j(t)) ≤ r^{−j}.
For consistency, set Π_i = {t₀} and π_i(t) = t₀ for all t. Consistent with the notation
of Definition 1.3.2, we can choose Π_j to have no more than N_j = N(r^{−j}) points,
To start the proof we look for a bound on the tail of the distribution of sup_T (f_t −
f_{t₀}), for which we need a little notation (carefully chosen to make the last lines of the
proof come out neatly). Define

$$M_j \;=\; N_j N_{j-1}, \qquad a_j \;=\; 2^{3/2}\, r^{-j+1} \bigl[\ln\bigl(2^{j-i} M_j\bigr)\bigr]^{1/2}, \qquad S \;=\; \sum_{j>i} a_j.$$
Then Mj is the maximum number of possible pairs (πj (t), πj −1 (t)) as t varies through
T and aj was chosen so as to make later formulas simplify. Recall (1.2.2), which
implies
$$P\{X \geq u\} \;\leq\; e^{-u^2/2\sigma^2}, \tag{1.3.13}$$
for X ∼ N(0, σ²) and u > 0. Applying (1.3.13) we have, for all u > 0,

$$P\Bigl\{\exists\, t \in T :\; f_{\pi_j(t)} - f_{\pi_{j-1}(t)} > u a_j\Bigr\} \;\leq\; M_j \exp\left(\frac{-u^2 a_j^2}{2\,(2r^{-j+1})^2}\right), \tag{1.3.14}$$

and so

$$P\Bigl\{\sup_{t\in T}\,\bigl(f_t - f_{t_0}\bigr) \geq u S\Bigr\} \;\leq\; \sum_{j>i} M_j \exp\left(\frac{-u^2 a_j^2}{2\,(2r^{-j+1})^2}\right) \;=\; \sum_{j>i} M_j \bigl(2^{j-i} M_j\bigr)^{-u^2} \;\leq\; \sum_{j>i} 2^{-(j-i)u^2} \;\leq\; 2 \cdot 2^{-u^2}$$

for all u ≥ 1. This,
together with the observation that supt∈T (ft − ft0 ) ≥ 0 since t0 ∈ T , immediately
yields
$$f_t^{(n)} \;=\; f_{t_0} + \sum_{j=i+1}^{n} \bigl(f_{\pi_j(t)} - f_{\pi_{j-1}(t)}\bigr) \;\equiv\; f_{\pi_n(t)}.$$

From the construction of the π_j it is immediate that E{|f_{π_n(t)} − f_t|²} → 0 as n → ∞,
and so the fact that f_t^{(n)} converges almost surely implies that it must also converge to f_t.
Since this must be true for any countable subset of points in T, separability gives that it is
true throughout T (cf. footnote 3).
$$E\Bigl\{\sup_{t\in T} f_t\Bigr\} \;\leq\; K S, \tag{1.3.15}$$

with K = 1 + 2∫₁^∞ 2^{−u²} du. Thus, all that remains to complete the proof is to compute S.
Using the definition of S, along with the elementary inequality √(a + b) ≤ √a + √b, gives

$$S \;\leq\; 2^{3/2} \sum_{j>i} r^{-j+1} \Bigl(\sqrt{(j-i)\ln 2} + \sqrt{\ln N_j} + \sqrt{\ln N_{j-1}}\Bigr) \;\leq\; K\Bigl(r^{-i} + \sum_{j\geq i} r^{-j}\sqrt{\ln N_j}\Bigr) \;\leq\; K \sum_{j\geq i} r^{-j}\sqrt{\ln N_j},$$
where K is a constant that may change from line to line, but depends only on r. The
last inequality follows from absorbing the lone term of r −i into the second term of
the sum, which is possible since the very definition of i implies that Ni+1 ≥ 2, and
changing the multiplicative constant K accordingly.
Recalling now the definition of N_j as N(r^{−j}), we have that

$$\varepsilon \leq r^{-j} \;\Rightarrow\; N(\varepsilon) \geq N_j.$$

Thus

$$\int_0^{r^{-i}} \sqrt{\ln N(\varepsilon)}\; d\varepsilon \;\geq\; \sum_{j\geq i} \bigl(r^{-j} - r^{-j-1}\bigr)\sqrt{\ln N_j} \;=\; K \sum_{j\geq i} r^{-j}\sqrt{\ln N_j}.$$
Putting this together with the bound on S and substituting into (1.3.15) gives

$$E\Bigl\{\sup_{t\in T} f_t\Bigr\} \;\leq\; K \int_0^{r^{-i}} H^{1/2}(\varepsilon)\, d\varepsilon.$$
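To see the finiteness that the entropy bound guarantees on a concrete example: for Brownian motion on [0, 1] the reflection principle gives E{sup_{t≤1} f_t} = E|f_1| = √(2/π) ≈ 0.798. A quick Monte Carlo sketch (NumPy; the discrete maximum slightly undershoots the true supremum, so only rough agreement is expected):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 2000
increments = rng.standard_normal((reps, n)) / np.sqrt(n)
paths = increments.cumsum(axis=1)              # Brownian motion sampled on a grid of [0, 1]
sups = np.maximum(paths.max(axis=1), 0.0)      # supremum over the grid (and over t = 0)
assert abs(sups.mean() - np.sqrt(2 / np.pi)) < 0.08
```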
Proof of Theorem 1.3.5. The proof starts with the same construction as in the proof
of Theorem 1.3.3. Note that from the same principles behind the telescoping sum
(1.3.12) defining f_t − f_{t₀} we have that for all s, t ∈ T and J > i,

$$f_t - f_s \;=\; f_{\pi_J(t)} - f_{\pi_J(s)} + \sum_{j>J}\bigl[f_{\pi_j(t)} - f_{\pi_{j-1}(t)}\bigr] - \sum_{j>J}\bigl[f_{\pi_j(s)} - f_{\pi_{j-1}(s)}\bigr]. \tag{1.3.16}$$
Arguing as we did to obtain (1.3.14), and the line or two following, we now see that

$$P\Bigl\{\exists\, s,t \in T :\; |f_{\pi_j(t)} - f_{\pi_j(s)}| \geq 2\, d\bigl(\pi_j(t), \pi_j(s)\bigr) \bigl[\ln\bigl(2^{j-i} N_j^2\bigr)\bigr]^{1/2}\Bigr\} \;\leq\; 2^{i-j}.$$

Since this is a summable series, Borel–Cantelli gives the existence of a random j₀ > i for which, with probability one,

$$j > j_0 \;\Rightarrow\; |f_{\pi_j(t)} - f_{\pi_j(s)}| \;\leq\; 2\, d\bigl(\pi_j(t), \pi_j(s)\bigr) \bigl[\ln\bigl(2^{j-i} N_j^2\bigr)\bigr]^{1/2}$$

for all s, t ∈ T. Essentially the same argument also gives that

$$j > j_0 \;\Rightarrow\; |f_{\pi_j(t)} - f_{\pi_{j-1}(t)}| \;\leq\; 2\, d\bigl(\pi_j(t), \pi_{j-1}(t)\bigr) \bigl[\ln\bigl(2^{j-i} M_j\bigr)\bigr]^{1/2}$$

for all t ∈ T. Putting these into (1.3.16) gives that

$$|f_t - f_s| \;\leq\; K\, d\bigl(\pi_{j_0}(t), \pi_{j_0}(s)\bigr) \bigl[\ln\bigl(2^{j_0-i} N_{j_0}^2\bigr)\bigr]^{1/2} + K \sum_{j>j_0} d\bigl(\pi_j(t), \pi_{j-1}(t)\bigr) \bigl[\ln\bigl(2^{j-i} M_j\bigr)\bigr]^{1/2}$$
if we take d(s, t) ≤ η = r^{−j₀}. The above sums can be turned into integrals just as
we did at the end of the previous proof, which leads to (1.3.9) and so completes the
argument.
Before leaving to look at some examples, you should note one rather crucial fact:
The only Gaussian ingredient in the preceding two proofs was the basic inequality
(1.3.13) giving exp(−u2 /2) as a tail bound for a single N (0, 1) random variable. The
remainder of the proof used little more than the union bound on probabilities and
some clever juggling. Furthermore, it does not take a lot of effort to see that the
square root in the entropy integrals such as (1.3.5) is related to “inverting’’ the square
in exp(−u2 /2), while the logarithm comes from “inverting’’ the exponential. If this
makes you feel that there is a far more general non-Gaussian theory behind all this,
and that it is not going to be very different from the Gaussian one, then you are right.
A brief explanation of how it works is in Section 1.4.6.
1.4 Examples
Our choice of examples was determined primarily by the type of random fields that we
need later in the book. Thus we shall start by looking at random fields on RN , followed
by looking at how to turn questions of differentiability into questions of continuity.
Following these, we shall look at the Brownian family of processes, needed for setting
up the spectral representation theory of Chapter 5. As mentioned above, this family
is also very instructive in understanding why the finiteness of entropy integrals is
a natural condition for continuity. The section on generalized fields is useful as a
recipe for providing examples, and the section on set-indexed processes is there for
fun. Neither of these will appear elsewhere in the book. The non-Gaussian examples
of Section 1.4.6 are worth knowing about already at this stage.
1.4.1 Fields on RN
Returning to Euclidean space after the abstraction of entropy on general metric spaces,
it is natural to expect that conditions for continuity and boundedness will become so
simple to both state and prove that there was really no need to introduce such abstruse
general concepts.
This expectation is both true and false. It turns out that avoiding the notion of
entropy does not make it any easier to establish continuity theorems, and indeed,
reliance on the specific geometry of the parameter space often confounds the basic
issues. On the other hand, the following important result is easy to state without
specifically referring to any abstract notions. To formulate it, let ft be a centered
Gaussian process on a compact T ⊂ RN and define
$$p^2(u) \;=\; \sup_{|s-t|\leq u} E\bigl\{|f_s - f_t|^2\bigr\}, \tag{1.4.1}$$
for all s, t with |s − t| < η. Furthermore, there exists a constant K, dependent only
on the dimension N , and a random δ0 > 0 such that for all δ < δ0 ,
$$\omega_f(\delta) \;\leq\; K \int_0^{p(\delta)} (-\ln u)^{1/2}\, dp(u), \tag{1.4.5}$$
where the modulus of continuity ωf is taken with respect to the Euclidean metric. A
similar bound, in the spirit of (1.3.8), holds for E{ωf (δ)}.
Proof. Note first that since p(u) is obviously nondecreasing in u, the Riemann–
Stieltjes integral (1.4.3) is well defined. The proof that both integrals in (1.4.3)
converge and diverge together and that the convergence of both is assured by (1.4.4)
is simple calculus and left to the reader. Of more significance is relating these integrals
to the entropy integrals of Theorems 1.3.3 and 1.3.5 and Corollary 1.3.4. Indeed, all
the claims of the theorem will follow from these results if we show that
$$\int_0^{\delta} H^{1/2}(\varepsilon)\, d\varepsilon \;\leq\; K \int_0^{p(\delta)} (-\ln u)^{1/2}\, dp(u) \tag{1.4.6}$$
Now note that for each ε > 0, the cube C_L, and so T, can be covered by
[1 + L√N/(2p^{−1}(ε))]^N (Euclidean) N-balls, each of which has radius no more than ε in
the canonical metric d. Thus,

$$\int_0^{\delta} H^{1/2}(\varepsilon)\, d\varepsilon \;\leq\; \sqrt{N} \int_0^{\delta} \Bigl(\ln\bigl(1 + L\sqrt{N}/(2p^{-1}(\varepsilon))\bigr)\Bigr)^{1/2} d\varepsilon \;=\; \sqrt{N} \int_0^{p(\delta)} \Bigl(\ln\bigl(1 + L\sqrt{N}/(2u)\bigr)\Bigr)^{1/2} dp(u) \;\leq\; 2\sqrt{N} \int_0^{p(\delta)} (-\ln u)^{1/2}\, dp(u)$$

for δ sufficiently small.
Despite these drawbacks, the results of Theorem 1.4.1 are, from a practical point
of view, reasonably definitive. For example, we shall see below (Corollary 1.5.5) that
if f is stationary and
$$\frac{K_1}{(-\ln |t|)^{1+\alpha_1}} \;\leq\; C(0) - C(t) \;\leq\; \frac{K_2}{(-\ln |t|)^{1+\alpha_2}}, \tag{1.4.7}$$
for |t| small enough, then f will be sample path continuous if α2 > 0 and discontin-
uous if α1 < 0.
Before leaving the Euclidean case, it is also instructive to see how the above
conditions on covariance functions translate to conditions on the spectral measure
ν of (5.4.1) when f is stationary. Although this involves using concepts yet to be
defined, this is a natural place to describe the result. If you are familiar with spectral
theory, the following will be easy to follow. If not, then you can return here after
reading Chapter 5.
The translation is via standard Tauberian theory, which translates the behavior of
C at the origin to that of ν at infinity (cf., for example, [25]). A typical result is the
following, again in the centered, Gaussian case: If the integral
$$\int_0^{\infty} \bigl(\log(1+\lambda)\bigr)^{1+\alpha}\, \nu(d\lambda)$$
converges for some α > 0, then f is a.s. continuous, while if it diverges for some
α < 0, then f is a.s. discontinuous.
In other words, it is the "high-frequency oscillations'' in the spectral representation
that are controlling the continuity/discontinuity dichotomy. This is hardly surprising.
What is perhaps somewhat more surprising, since we have seen that for Gaussian pro-
cesses continuity and boundedness come together, is that it is these same oscillations
that are controlling boundedness as well.
1.4.2 Differentiability on RN
We shall stay with Euclidean T ⊂ RN for the moment, and look at the question of
a.s. differentiability of centered, Gaussian f .
Firstly, however, we need to define the L2 , or mean square, (partial) derivatives
of a random field.
Choose a point t ∈ R^N and a sequence of k "directions'' t′₁, …, t′_k in R^N, so that
t′ = (t′₁, …, t′_k) ∈ ⊗^k R^N, the k-fold tensor product of R^N with itself. We say that
f has a kth-order L² partial derivative at t, in the direction t′, which we denote by
D^k_{L²} f(t, t′), if the limit

$$D^k_{L^2} f(t, t') \;=\; \lim_{h_1,\dots,h_k \to 0} \frac{1}{\prod_{i=1}^k h_i}\; F\!\left(t,\; \sum_{i=1}^k h_i t_i'\right) \tag{1.4.8}$$

exists in L²,
and the limit above is interpreted sequentially, i.e., first send h₁ to 0, then h₂, etc. A
simple sufficient condition for L² partial differentiability of order k in all directions
and throughout a region T ⊂ R^N is that

$$\lim_{h_1,\dots,h_k,\,h_1',\dots,h_k' \to 0} \frac{1}{\prod_{i=1}^k h_i h_i'}\; E\left\{F\!\left(t, \sum_{i=1}^k h_i t_i'\right) F\!\left(s, \sum_{i=1}^k h_i' s_i'\right)\right\} \tag{1.4.9}$$

exists,
and write B_{N,k}(y, h) for the h-ball centered at y = (t, t′) in the metric induced by
‖·‖_{N,k}. Furthermore, write

$$T_{k,\rho} \;=\; T \times \bigl\{ t' :\; \| t' \|_{\otimes^k \mathbb{R}^N} \in (1-\rho,\, 1+\rho) \bigr\}$$

for the product of T with the ρ-tube around the unit sphere in ⊗^k R^N. This is enough
to allow us to formulate the following theorem.
Theorem 1.4.2. Suppose f is a centered Gaussian random field on an open T ⊂ R^N,
possessing kth-order partial derivatives in the L² sense in all directions everywhere
inside T. Suppose, furthermore, that there exist 0 < K < ∞ and ρ, δ, h₀ > 0 such
that for 0 < η₁, η₂, h < h₀,

$$E\Bigl\{\bigl[F(t, \eta_1 t') - F(s, \eta_2 s')\bigr]^2\Bigr\} \;<\; K \Bigl(-\ln\bigl(\|(t,t') - (s,s')\|_{N,k} + |\eta_1 - \eta_2|\bigr)\Bigr)^{-(1+\delta)}, \tag{1.4.10}$$

for all

$$\bigl((t,t'), (s,s')\bigr) \;\in\; \bigl\{ T_{k,\rho} \times T_{k,\rho} :\; (s,s') \in B_{N,k}\bigl((t,t'), h\bigr) \bigr\}.$$

Then, with probability one, f is k times continuously differentiable; that is, f ∈ C^k(T).
20 This is an immediate consequence of the fact that a sequence X_n of random variables
converges in L² if and only if E{X_n X_m} converges to a constant as n, m → ∞.
Proof. Recalling that we have assumed the existence of L² derivatives, we can define
the Gaussian field

$$F(t, t', \eta) \;=\; \begin{cases} F(t, \eta t'), & \eta \neq 0, \\[2pt] D^k_{L^2} f(t, t'), & \eta = 0, \end{cases}$$

on T̃ = T_{k,ρ} × (−h, h), an open subset of the finite-dimensional vector space R^N ×
⊗^k R^N × R, with norm ‖(t, t′, η)‖ = ‖(t, t′)‖_{N,k} + |η|.
Whether f ∈ C^k(T) is clearly the same issue as whether F ∈ C(T̃), with the issue
of the continuity of f really only being on the hyperplane where η = 0. But this
puts us back into the setting of Theorem 1.4.1, and it is easy to check that condition
(1.4.4) there translates to (1.4.10) in the current scenario.
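For a concrete feel for L² derivatives, take a stationary process on R with covariance C(t) = e^{−t²} (our choice of example, not the book's). Then E{[(f(t+h) − f(t))/h]²} = 2(C(0) − C(h))/h², which converges to −C″(0) = 2 as h → 0, so the first-order L² derivative exists and has variance 2. A sketch:

```python
import math

def q(h):
    # second moment of the normalized increment for C(t) = exp(-t^2):
    # E[((f(t+h) - f(t)) / h)^2] = 2 (C(0) - C(h)) / h^2
    return 2 * (1 - math.exp(-h * h)) / (h * h)

# q(h) -> -C''(0) = 2 as h -> 0, with error of order h^2
for h in [0.1, 0.01, 0.001]:
    assert abs(q(h) - 2.0) < 2 * h * h + 1e-9
```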
Perhaps the most basic of all random fields is a collection of independent Gaussian
random variables. While it is simple to construct such random fields for finite and
even countable parameter sets, deep technical difficulties obstruct the construction
for uncountable parameter sets. The path that we shall take around these difficulties
involves the introduction of random measures, which, at least in the Gaussian case,
are straightforward to formulate.
Let (T , T , ν) be a σ -finite measure space and denote by Tν the collection of sets
of T of finite ν measure. A Gaussian noise based on ν, or “Gaussian ν-noise,’’ is a
random field W : Tν → R such that for all A, B ∈ Tν ,
Proof. In view of the closing remarks of the preceding section, all we need do is
provide an appropriate covariance function on Tν × Tν . Try
21 While the notation “W ’’ is inconsistent with our determination to use lowercase Latin
characters for random functions, we retain it as a tribute to Norbert Wiener, who is the
mathematical father of these processes.
$$C_\nu(A, B) \;=\; \nu(A \cap B). \tag{1.4.14}$$

To see that this is nonnegative definite, note that for any α₁, …, αₙ ∈ R and A₁, …, Aₙ ∈ T_ν,

$$\sum_{i,j} \alpha_i \alpha_j\, \nu(A_i \cap A_j) \;=\; \int_T \Bigl(\sum_i \alpha_i I_{A_i}(x)\Bigr)^2 \nu(dx) \;\geq\; 0.$$
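The same computation can be tested numerically: discretize T, pick a few sets, form the matrix ν(A_i ∩ A_j), and confirm it is nonnegative definite; being a Gram matrix of indicator functions, it is so exactly, up to floating point. A sketch with (normalized) counting measure on a grid:

```python
import numpy as np

rng = np.random.default_rng(3)
grid = 400                                   # discretized T with uniform measure
sets = rng.random((6, grid)) < 0.5           # indicator vectors of 6 random sets
# C[i, j] = nu(A_i ∩ A_j), nu = normalized counting measure on the grid
C = (sets[:, None, :] & sets[None, :, :]).sum(axis=2) / grid
assert np.all(np.linalg.eigvalsh(C.astype(float)) >= -1e-10)
```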
{(t₁, …, t_N) : t_i ≥ 0}. It then makes sense to define a random field on R₊^N itself via
the equivalence
variables. (This is easily checked via the covariance function.) Also, when N > 1,
it follows immediately from (1.4.16) that W_t = 0 when min_k t_k = 0, i.e., when t is
on one of the axes. It is this image, with N = 2, of a sheet tucked in at two sides and
given a good shake, that led Ron Pyke [128] to introduce the name.
A simple simulation of a Brownian sheet, along with its contour lines, is shown
in Figure 1.4.1.
Fig. 1.4.1. A simulated Brownian sheet on [0, 1]2 , along with its contour lines at the zero level.
One of the rather interesting aspects of the contour lines of Figure 1.4.1 is that they
are predominantly parallel to the axes. There is a rather deep reason for this, and it has
generated a rather massive literature. Many fascinating geometrical properties of the
Brownian sheet have been discovered over the years (e.g., [36, 44, 45] and references
therein), and a description of the potential theoretical aspects of the Brownian sheet
is well covered in [91], where you will also find more references. Nevertheless, the
geometrical properties of fields of this kind fall beyond the scope of our interests,
since we shall be concerned with the geometrical properties of smooth (i.e., at least
differentiable) processes only. Since the Brownian motion on R1+ is well known to
be nondifferentiable at all points, it follows from the above comments relating the
sheet to the one-dimensional case that Brownian sheets too are far from smooth.
Nevertheless, we shall still have need of these processes, primarily since they
hold roughly the same place in the theory of multiparameter stochastic processes
that the standard Brownian motion does in one dimension. The Brownian sheet is
a multiparameter martingale (e.g., [36, 84, 170, 171]) and forms the basis of the
multiparameter stochastic calculus. There is a nice review of its basic properties in
[165], which also develops its central role in the theory of stochastic partial differential
equations, and describes in what sense it is valid to describe the derivative
$$\frac{\partial^N W(t_1, \dots, t_N)}{\partial t_1 \cdots \partial t_N}$$
as Gaussian white noise.
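A simulation like the one in Figure 1.4.1 takes only a few lines: on a grid, the sheet is the double cumulative sum of independent Gaussian cell increments, which ties it down (up to one grid cell) along the axes. A sketch that also checks Var W(1, 1) = 1:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50                                           # grid resolution on [0, 1]^2
# one realization: each cell increment is N(0, 1/n^2), the cell's area
W = (rng.standard_normal((n, n)) / n).cumsum(axis=0).cumsum(axis=1)

# Var W(1, 1) = area of [0, 1]^2 = 1: check over many independent sheets
corners = (rng.standard_normal((1000, n, n)) / n).sum(axis=(1, 2))
assert abs(corners.var() - 1.0) < 0.25
```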
The most basic of the sample path properties of the Brownian sheets are the
continuity results of the following theorem. Introduce a partial order on R^N by writing
s ≺ (⪯) t if s_i < (≤) t_i for all i = 1, …, N, and for s ⪯ t let (s, t) = ∏_{i=1}^N [s_i, t_i].
Although W ((s, t)) is already well defined via the original set-indexed process, it
is also helpful to think of it as the "increment'' of the point-indexed W_t over (s, t),
that is,

$$W\bigl((s,t)\bigr) \;=\; \sum_{\alpha \in \{0,1\}^N} (-1)^{N - \sum_{i=1}^N \alpha_i}\; W\bigl(s_1 + \alpha_1(t_1 - s_1),\, \dots,\, s_N + \alpha_N(t_N - s_N)\bigr). \tag{1.4.17}$$
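Formula (1.4.17) is just inclusion–exclusion over the 2^N corners of the rectangle; for N = 2 it reads W(t₁,t₂) − W(s₁,t₂) − W(t₁,s₂) + W(s₁,s₂), and the increment's variance is the Lebesgue measure of the rectangle, consistent with the set-indexed definition. A sketch on a simulated grid sheet (our own helper names):

```python
import numpy as np

def rect_increment(W, i0, j0, i1, j1):
    # inclusion-exclusion increment of the point-indexed sheet over a rectangle
    return W[i1, j1] - W[i0, j1] - W[i1, j0] + W[i0, j0]

rng = np.random.default_rng(5)
n = 50
vals = []
for _ in range(3000):
    W = (rng.standard_normal((n, n)) / n).cumsum(axis=0).cumsum(axis=1)
    vals.append(rect_increment(W, 9, 9, 39, 39))

# the rectangle spans 30/50 of each axis, so its area (= the variance) is 0.36
assert abs(np.var(vals) - 0.36) < 0.05
```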
Theorem 1.4.4. The point and rectangle-indexed Brownian sheets are continuous22
over compact T ⊂ RN .
Proof. We need only consider the point-indexed sheet, since by (1.4.17) its continuity
immediately implies that of the rectangle-indexed version. Furthermore, we lose
nothing by taking T = [0, 1]N . Thus, consider
$$w(s, t) \;=\; E\bigl\{|W(s) - W(t)|^2\bigr\} \;\leq\; 2\prod_{i=1}^{N} (s_i \vee t_i) \;-\; 2\prod_{i=1}^{N} (s_i \wedge t_i).$$

Set a = 2∏_{i=1}^{N−1}(s_i ∨ t_i) and b = 2∏_{i=1}^{N−1}(s_i ∧ t_i). Then 2 ≥ a ≥ b and

$$w(s,t) \;\leq\; a\,(s_N \vee t_N) - b\,(s_N \wedge t_N) \;\leq\; a\,|t_N - s_N| + (a - b)(s_N \wedge t_N),$$

so that
22 For the rectangle-indexed case, we obviously need a metric on the sets forming the parameter
space. The symmetric difference metric of (1.1.1) is natural, and so we use it.
$$w(s, t) \;\leq\; 2|t_N - s_N| + 2\prod_{i=1}^{N-1}(s_i \vee t_i) - 2\prod_{i=1}^{N-1}(s_i \wedge t_i).$$

Iterating this argument over the remaining coordinates gives

$$w(s, t) \;\leq\; 2\sum_{i=1}^{N} |t_i - s_i| \;\leq\; 2N\,|t - s|,$$

and the claimed continuity now follows from Theorem 1.4.1.
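The covariance computation behind this proof can be tested directly: for the Brownian sheet, E|W(s) − W(t)|² = ∏s_i + ∏t_i − 2∏(s_i ∧ t_i), and the bound w(s, t) ≤ 2N|t − s| can be checked on random pairs of points. A sketch for N = 3:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 3
for _ in range(1000):
    s, t = rng.random(N), rng.random(N)
    # w(s, t) = Var W(s) + Var W(t) - 2 Cov(W(s), W(t))
    w = s.prod() + t.prod() - 2 * np.minimum(s, t).prod()
    assert -1e-12 <= w <= 2 * N * np.linalg.norm(t - s) + 1e-12
```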
In the framework of the general set-indexed sheet, Theorem 1.4.4 states that the
Brownian sheet is continuous over A = {all rectangles in T } for compact T , and so
bounded. This is far from a trivial result, for enlarging the parameter set, for the same
process, can lead to unboundedness. The easiest way to see this is with an example.
An interesting, but quite simple example is given by the class of lower layers in
[0, 1]2 . A set A in RN is called a lower layer if s ≺ t and t ∈ A implies s ∈ A. In
essence, restricted to [0, 1]2 these are sets bounded by the two axes and a nonincreasing
line. A specific example is given in Figure 1.4.2, which is part of the proof of the
next theorem.
Theorem 1.4.5. The Brownian sheet on lower layers in [0, 1]2 is discontinuous and
unbounded with probability one.
Proof. We start by constructing some examples of lower layers. Write a generic
point in [0, 1]² as (s, t) and let T₀₁ be the upper right triangle of [0, 1]², i.e., those
points for which s ≤ 1 and t ≤ 1 ≤ s + t. Let C₀₁ be the largest square in
T₀₁, i.e., those points for which ½ < s ≤ 1 and ½ ≤ t ≤ 1.
Continuing this process, for n = 1, 2, . . . , and j = 1, . . . , 2n , let Tnj be the
right triangle defined by s + t ≥ 1, (j − 1)2−n ≤ s < j 2−n , and 1 − j 2−n < t ≤
1 − (j − 1)2−n . Let Cnj be the square filling the upper right corner of Tnj , in which
(2j − 1)2−(n+1) ≤ s < j 2−n and 1 − (2j − 1)2−(n+1) ≤ t < 1 − (j − 1)2−n .
The class of lower layers in [0, 1]2 certainly includes all sets made up by taking
those points that lie between the axes and one of the step-like structures of Figure 1.4.2,
where each step comes from the horizontal and vertical sides of some Tnj with,
perhaps, different n.
Note that since the squares Cnj are disjoint for all n and j , the random variables
W (Cnj ) are independent. Also |Cnj | = 4−(n+1) for all n, j .
Let D be the negative diagonal {(s, t) ∈ [0, 1]2 : s + t = 1} and Lnj = D ∩ Tnj .
For each n ≥ 1, each point p = (s, t) ∈ D belongs to exactly one such interval
Ln,j (n,p) for some unique j (n, p).
For each p ∈ D and M < ∞ the events
have that for almost all ω, the events Enp occur infinitely often. Let n(p) = n(p, ω)
be the least such n.
Since the events Enp (ω) are measurable jointly in p and ω, Fubini’s theorem
implies that with probability one, for almost all p ∈ D (with respect to Lebesgue
measure on D) some Enp occurs, and n(p) < ∞. Let
$$V_\omega \;=\; \bigcup_{p\in D} T_{n(p),\,j(n(p),\,p)}, \qquad A_\omega \;=\; \{(s,t) : s + t \leq 1\} \cup V_\omega, \qquad B_\omega \;=\; A_\omega \setminus \bigcup_{p\in D} C_{n(p),\,j(n(p),\,p)}.$$
These examples should be enough to convince you that the relationship between
a Gaussian process and its parameter space is, as far as continuity and boundedness
are concerned, an important and delicate subject.
The above construction makes sense only when C(t, t) < ∞, for otherwise f
has infinite variance and the integrand in (1.4.19) is not defined. Nevertheless, there
are occasions when (1.4.20) makes sense, even though C(t, t) = ∞. In this case we
shall refer to C as a covariance kernel, rather than as a covariance function.
We shall look at a very important example of such processes in Section 5.3, when
we consider moving averages of Gaussian ν-noise, in which case F will be made up
of translations of the form ϕ_t(s) = F(t − s) for a fixed F ∈ L²(ν).
Another important example arises in the spectral representation of stationary processes.
In Section 5.4 we shall take F to be the family of complex exponentials exp(it · λ)
for fields on R^N, or the family of characters for fields on a general group.²³
While moving averages and stationary processes afford two classes of examples,
there are many more, some of which we shall describe at the end of this section. In
particular, given any positive definite function C on RN × RN , not necessarily finite,
one can define a function-indexed process on
23 On the assumption that you might already be familiar with basic spectral theory, consider
the spectral distribution theorem, Theorem 5.4.1. This is written in the setting of complex-valued
fields. For simplicity, assume that the spectral measure ν has a spectral density g.
Then (5.4.1) gives us that stationary covariances over R^N can be formally written as

$$C(s,t) \;=\; \int_{\mathbb{R}^N} e^{i(t-s)\cdot\lambda}\, g(\lambda)\, d\lambda \;=\; \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} e^{it\cdot\lambda_1}\, g^{1/2}(\lambda_1)\, \delta(\lambda_1, \lambda_2)\, g^{1/2}(\lambda_2)\, e^{-is\cdot\lambda_2}\, d\lambda_1\, d\lambda_2,$$

which is in the form of (1.4.20), with ϕ = e^{it·}, ψ = e^{−is·}. The covariance kernel in the
integrand now involves the Dirac delta function, which is certainly not finite on the diagonal.
It is this issue that will lead to our having to be careful in defining the stochastic integral in
the spectral representation of Theorem 5.4.2.
$$\mathcal{F}_C \;=\; \left\{ \varphi :\; \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \varphi(s)\, C(s,t)\, \varphi(t)\, ds\, dt \;<\; \infty \right\}.$$
The proof requires no more than checking that given such a C on RN × RN , the
corresponding C defined by (1.4.20) determines a finite positive definite, and so
covariance, function on FC × FC .
In general, function-indexed processes of this kind, for which the covariance
kernel C(s, t) is infinite on the diagonal, are known as generalized random fields.24
The question that we shall now look at is when such processes are continuous and
bounded. The answer involves a considerable amount of work, but it is worthwhile at
some stage to go through the argument carefully. It is really the only non-Euclidean
case for which we shall give an involved entropy calculation in reasonably complete
detail.
We start by describing potential function spaces to serve as parameter spaces for
continuous generalized Gaussian fields with such covariance kernels.
Take T ⊂ R^N compact, q > 0, and p = ⌊q⌋. Let C₀, …, C_p and C_q be finite,
positive constants, and let F^{(q)} = F^{(q)}(T, C₀, …, C_p, C_q) be the class of functions
on T whose partial derivatives of orders 0, 1, …, p are bounded by C₀, …, C_p, and
for which the partial derivatives of order p satisfy Hölder conditions of order q − p
with constant C_q. Thus for each ϕ ∈ F^{(q)} and t, t + τ ∈ T,
$$\varphi(t+\tau) \;=\; \sum_{n=0}^{p} \frac{\varphi_n(t,\tau)}{n!} \;+\; R(t,\tau), \tag{1.4.21}$$

where

$$\varphi_n(t,\tau) \;=\; \sum_{j_1=1}^{N} \cdots \sum_{j_n=1}^{N} \frac{\partial^n \varphi(t)}{\partial t_{j_1}\cdots\,\partial t_{j_n}}\; \tau_{j_1}\cdots\tau_{j_n}, \tag{1.4.22}$$

and where

$$\sup_{t\in T} \left|\frac{\partial^n \varphi(t)}{\partial t_{j_1}\cdots\,\partial t_{j_n}}\right| \;\leq\; C_n \qquad\text{and}\qquad |R(t,\tau)| \;\leq\; C_q\, |\tau|^q. \tag{1.4.23}$$
A couple of comments are in order before we start the proof. Firstly, note that since
we have not specified any other metric on F (q) , the continuity claim of the theorem
is in relation to the topology induced by the canonical metric d. There are, of course,
more natural metrics on F (q) , but recall from Lemma 1.3.1 that mere continuity is
independent of the metric, as long as C(ϕ, ψ) is continuous in ϕ and ψ. More detailed
information on moduli of continuity will follow immediately from Theorem 1.3.5,
the relationship of the chosen metric to d, and the entropy bound (1.4.32) below.
Secondly, the condition q > N , while sufficient, is far from necessary. To see
this one need only take the case of f (t) ≡ X, where X is centered Gaussian. In this
case, defining f (ϕ) by (1.4.19) gives a generalized process that is continuous over
F (q) for all q > 0, regardless of the dimension N . To obtain sharper results one
needs to assume more on the specific form of the covariance kernel, which we shall
not do here.
Proof of Theorem 1.4.6. The proof relies on showing that the usual entropy integral
converges, where the entropy of F (q) is measured in the canonical metric d given by
\[
d^2(\varphi,\psi) \;=\; \int_T \int_T \big(\varphi(s)-\psi(s)\big)\, C(s,t)\, \big(\varphi(t)-\psi(t)\big)\, ds\, dt. \tag{1.4.24}
\]
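Though nothing in the proof requires it, the canonical metric (1.4.24) is easy to approximate numerically, which can be a useful sanity check when experimenting with particular kernels. The following sketch is our illustration, not from the text: a midpoint-rule double integral on T = [0, 1], with a hypothetical kernel C_sing carrying an integrable diagonal singularity of the kind the theorem is aimed at.

```python
import math

def canonical_d(phi, psi, C, n=200):
    """Approximate d(phi, psi) from (1.4.24) on T = [0, 1] by a
    midpoint-rule double integral of (phi-psi)(s) C(s,t) (phi-psi)(t)."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    diff = [phi(s) - psi(s) for s in pts]
    total = 0.0
    for i, s in enumerate(pts):
        for j, t in enumerate(pts):
            total += diff[i] * C(s, t) * diff[j]
    return math.sqrt(max(total, 0.0) * h * h)

# Hypothetical kernel with an integrable singularity |s - t|^(-1/2) on the
# diagonal; the diagonal cell itself is dropped, a crude but harmless
# truncation since the singularity is integrable.
C_sing = lambda s, t: abs(s - t) ** -0.5 if s != t else 0.0
```

For the constant kernel C ≡ 1 the double integral factors, so d(ϕ, ψ) is just |∫(ϕ − ψ)|, which gives an exact value to test against.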
We shall obtain a bound on the entropy by explicitly constructing, for each ε > 0,
a finite family F_ε^(q) of functions that serve as an ε-net for F^(q) in the d metric. To
make life notationally easier, we shall assume throughout that T = [0, 1]^N.
Fix ε > 0 and set
Let Z_δ denote the grid of the (1 + ⌊δ^{−1}⌋)^N points in [0, 1]^N of the form
25 There is a similar-looking but incorrect result (“Theorem’’ 1.7) in the lecture notes [3]. We
thank Leonid Mytnik for pointing out that it had to be wrong.
Set
where we have written ϕ^{(n_1,…,n_N)} for the derivative ∂^n ϕ/∂x_1^{n_1} ⋯ ∂x_N^{n_N}, and the
index i runs from 1 to \binom{n+N-1}{N-1}, the number of partitions of n into N parts.
Finally, for ϕ ∈ F^(q), let A^{(n)} = A^{(n)}(ϕ) denote the vector-valued function on
Z_δ defined by A^{(n)}(t_η) = A_η^{(n)}(ϕ). For each ϕ ∈ F^(q), let F_{A^{(n)}(ϕ)} denote the set of
ψ ∈ F^(q) with fixed matrix A^{(n)}(ϕ). Our first task will be to show that the d-radius
of F_{A^{(n)}(ϕ)} is not greater than Cε, where C is a constant depending only on q and N.
All that will then remain will be to calculate how many different collections F_{A^{(n)}(ϕ)}
are required to cover F^(q). In other words, we need to find how many ϕ's are needed
to approximate, in terms of the metric (1.4.24), all functions in F^(q).
Thus, take ϕ_1, ϕ_2 ∈ F_{A^{(n)}(ϕ)}, and set
\[
\varphi \;=\; \varphi_1 - \varphi_2. \tag{1.4.28}
\]
Let ‖·‖_d be the norm induced on F^(q) by the metric d, and ‖·‖_∞ the usual sup
norm. Then
\[
\|\varphi\|_d^2 \;=\; \int_{[0,1]^N}\int_{[0,1]^N} \varphi(s)\, C(s,t)\, \varphi(t)\, ds\, dt,
\qquad
\|\varphi\|_\infty \;=\; \sup_{t\in[0,1]^N} |\varphi(t)|.
\]
We want to show that the ϕ of (1.4.28) has d-norm less than Cε, for some finite
constant C.
Note first, however, that in view of the definition (1.4.27) of the matrix A_δ, we
have for each t_η ∈ Z_δ and each partial derivative ϕ^{(n_1,…,n_N)} of such a ϕ of order
n_1 + ⋯ + n_N = n ≤ p that
the equality following from the definition of the δ_n and the fact that each polynomial
Δ_n of (1.4.22) has fewer than N^n distinct terms.
Thus, for ϕ of the form (1.4.28),
\[
\|\varphi\|_\infty \;\le\; C\delta^q. \tag{1.4.29}
\]
We now turn to ‖ϕ‖_d. Allowing the constant C to change from line to line, we have
\[
\|\varphi\|_d^2 \;=\; \int_{[0,1]^N\times[0,1]^N} \varphi(s)\, C(s,t)\, \varphi(t)\, ds\, dt
\;\le\; C\delta^{2q} \int_{[0,1]^N\times[0,1]^N} C(s,t)\, ds\, dt
\;\le\; C\delta^{2q}.
\]
Thus, by (1.4.25),
as required.
It remains to determine how many collections FA(n) (ϕ) are required to cover F (q) .
Since this is a calculation that is now independent of both Gaussian processes in
general and the above covariance function in particular, we shall only outline how it
is done. The details, which require somewhat cumbersome notation, can be found
in Kolmogorov and Tihomirov [94], which is a basic reference for general entropy
computations.
Consider, for fixed δ, the matrix A_δ, parameterized, as in (1.4.26), by η_i =
0, 1, …, ⌊δ^{−1}⌋, i = 1, …, N, and n = 0, 1, …, p. Fix, for the moment, η_2 =
⋯ = η_N = 0. It is clear from the restrictions (1.4.23), (1.4.27), the definition of the δ_n,
and the fact that each vector A_η^{(n)} has no more than \binom{n+N-1}{N-1} distinct elements, that
there are no more than
\[
O\!\left( \frac{1}{\delta_0}\, \Big(\frac{1}{\delta_1}\Big)^{\binom{N}{N-1}} \cdots \Big(\frac{1}{\delta_p}\Big)^{\binom{p+N-1}{N-1}} \right)
\;=\; O\big(\delta^{-\xi}\big)
\]
(for an appropriate and eventually unimportant ξ) ways to fill in the row of A_δ corresponding
to (η_1, …, η_N) = (0, …, 0).
What remains is to show that because of the rigid continuity conditions on the
functions in F^(q), there exists an absolute constant M = M(q, C_0, …, C_p, C_q) such
that once this first row is determined, there are no more than M ways to complete
the row corresponding to (η_1, …, η_N) = (1, 0, …, 0), and similarly no more than M^2
ways to complete the row corresponding to (η_1, …, η_N) = (2, 0, …, 0), etc. Thus, all
told, there are no more than
\[
O\Big( \delta^{-\xi}\, M^{(1+\lfloor \delta^{-1}\rfloor)^N} \Big) \tag{1.4.31}
\]
ways to fill the matrix Aδ , and so we have a bound for the number of different
collections FA(n) (ϕ) .
Modulo a constant, it now follows from (1.4.25), (1.4.30), and (1.4.31) that the
log entropy function for our process is bounded above by
\[
C_1 \;+\; \frac{C_2\,\xi}{q + (N-\alpha)/2}\, \ln\frac{1}{\varepsilon} \;+\; C_3 \left(\frac{1}{\varepsilon}\right)^{N/q}. \tag{1.4.32}
\]
Since this is integrable if q > N, we are done.
Before leaving function-indexed processes, there are a number of comments that
are worth making that relate them to other problems both within and outside of the
theory of Gaussian processes.
In most of the literature pertaining to generalized Gaussian fields the parameter
space used is the Schwartz space S of infinitely differentiable functions decaying
faster than any polynomial at infinity. Since this is a very small class of functions
(at least in comparison to the classes F^(q) that Theorem 1.4.6 deals with), continuity
over S is automatically assured and therefore not often explicitly treated. However,
considerations of continuity and smaller parameter spaces are of relevance in the
treatment of infinite-dimensional diffusions arising as the solutions of stochastic partial
differential equations, in which solutions over very specific parameter spaces are
often sought. For more on this see, for example, [82, 166, 165].
A class of examples of particular importance to the theory of empirical processes
is that in which the covariance kernel is the product of a Dirac delta δ and a bounded
“density,’’ g, in the sense that
\[
E\{f(\varphi)f(\psi)\} \;=\; \int \varphi(t)\psi(t)\, g(t)\, dt
\;\;\text{``=''}\;\; \int\!\!\int \varphi(s)\,\big[g^{1/2}(s)\,\delta(s,t)\,g^{1/2}(t)\big]\,\psi(t)\, ds\, dt.
\]
Such processes will arise as the stochastic integrals W (ϕ) in Section 5.2, where
W is a Gaussian ν-noise where ν is (now) a probability measure with density g. For
more on this setting, in which the computations are similar in spirit to those made
above, see Dudley [56].
Secondly, it is worth noting that much of what has been said above regarding generalized
fields—i.e., function-indexed processes—can be easily extended to Gaussian
processes indexed by families of measures. For example, if we consider the function
ϕ in (1.4.19) to be the (positive) density of a measure μ on R^N, then by analogy it
makes sense to write
\[
f(\mu) \;=\; \int_{\mathbb{R}^N} f(t)\, \mu(dt),
\]
with the corresponding covariance functional
\[
C(\mu,\nu) \;=\; E\{f(\mu)f(\nu)\} \;=\; \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \mu(ds)\, C(s,t)\, \nu(dt).
\]
Again, as was the case for generalized Gaussian fields, the process f(μ) may be
well defined even if the covariance kernel C diverges on the diagonal. In fact, f(μ)
will be well defined for all μ ∈ M_C, where
\[
M_C \;=\; \Big\{ \mu \,:\, \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} \mu(ds)\, C(s,t)\, \mu(dt) < \infty \Big\}.
\]
We have already met some set-indexed processes in dealing with the Brownian family
of processes in Section 1.4.3, in which we concentrated on Gaussian ν-noise (cf.
(1.4.11)–(1.4.13)) indexed by sets. We saw, for example, that while the Brownian
sheet indexed by rectangles was continuous (Theorem 1.4.4), when indexed by lower
layers in [0, 1]2 it was discontinuous and unbounded (Theorem 1.4.5).
In this section we shall remain in the setting of ν-noise, and look at two classes of
set-indexed processes. The first will be Euclidean, and the parameter spaces will be
classes of sets in compact T ⊂ RN with smooth boundaries. For this example we also
add the assumption that ν has a density bounded away from zero and infinity on T .
For the second example, we look at Vapnik–Červonenkis classes of sets, of singular
importance in statistical learning theory and image processing, and characterized by
certain combinatorial properties. Here the ambient space (in which the sets lie) can be
any measure space. We shall skimp on proofs when they have nothing qualitatively
new to offer. In any case, all that we have to say is done in full detail in Dudley
[53, 56], where you can also find a far more comprehensive treatment and many more
examples.
Our first family is actually closely related to the family F (q) of functions we have
just studied in detail. While we shall need some of the language of manifolds and
homotopy to describe this example, which will be developed only later, in Chapter 6,
it will be basic enough that the average reader should have no trouble following the
argument.
With S N−1 denoting the unit sphere in RN , recall the basic fact that we can
cover it by two patches V1 and V2 , each of which maps via a C ∞ diffeomorphism
F_j : V_j → B^{N−1} to the open ball B^{N−1} = {t ∈ R^{N−1} : |t|^2 < 1}.
Adapting slightly the notation of the previous example, let F (q) (Vj , M) be the
set of all real-valued functions ϕ on Vj such that
(cf. (1.4.21)–(1.4.23)). Furthermore, let F (q) (S N−1 , M) denote the set of all real-
valued functions ϕ on S N−1 such that the restriction of ϕ to Vj is in F (q) (Vj , M).
Now taking the N-fold Cartesian product of copies of F (q) (S N−1 , M), we obtain a
family of functions from S N−1 to RN , which we denote by D(N, q, M), where the
D stands for Dudley, who introduced this family in [52].
Outline of proof. The proof of the unboundedness part of the result is beyond us,
and so you are referred to [56]. As far as the proof of continuity is concerned, what
we need are the following inequalities for the log-entropy, on the basis of which
continuity follows from Theorem 1.3.5:
\[
H\big(I(N,q,M), \varepsilon\big) \;\le\;
\begin{cases}
C\varepsilon^{-2(N-1)/(Nq-N+1)}, & (N-1)/N < q \le 1,\\[2pt]
C\varepsilon^{-2(N-1)/q}, & 1 \le q.
\end{cases}
\]
These inequalities rely on the “simple algebraic geometric construction’’ noted above,
and so we shall not consider them in detail. The details are in [56]. The basic idea,
however, requires little more than noting that there are basically as many sets in
I (N, q, M) as there are functions in D(N, q, M), and we have already seen, in the
previous example, how to count the number of functions in D(N, q, M).
There are also equivalent lower bounds for the log-entropy for certain values of
N and q, but these are not important to us.
We now turn to the so-called Vapnik–Červonenkis, or VC, sets, due, not surprisingly,
to Vapnik and Červonenkis [163, 164]. These sets arise in a very natural way
in many areas, including statistical learning theory and image analysis. The recent
book [162] by Vapnik is a good place to see why.
The arguments involved in entropy calculations for VC classes of sets are of an
essentially combinatoric nature, and so somewhat different from those we have met so
far. We shall therefore look at them somewhat more closely than we did for Dudley
sets. For more details, however, including a discussion of the importance of VC
classes to the problem of finding universal Donsker classes in the theory of empirical
processes, see [56].
Let (E, E) be a measure space. Given a class C of sets in E and a finite set F ⊂ E,
let Δ^C(F) be the number of different sets A ∩ F for A ∈ C. If the number of such
sets is 2^{|F|}, then C is said to shatter F. For n = 1, 2, …, set
26 The construction works as follows: For given ϕ : S^{N−1} → R^N in D(N, q, M), let R_ϕ be
its range and A_ϕ the set of all t ∈ R^N, t ∉ R_ϕ, such that among mappings of S^{N−1} into
R^N \ {t}, ϕ is not homotopic (cf. Definition 6.1.3) to any constant map ψ(s) ≡ r ≠ t. Then
define I_ϕ = R_ϕ ∪ A_ϕ. For an example, try untangling this description for ϕ the identity
map from S^{N−1} to itself to see that what results is I_ϕ = B^N.
The class C is called a Vapnik–Červonenkis class if m^C(n) < 2^n for some n, i.e.,
if V(C) < ∞. The number V(C) is called the VC index of C, and V(C) − 1 is the
largest cardinality of a set shattered by C.
Two extreme but easy examples that you can check for yourself are E = R with
C all half-lines of the form (−∞, t], for which m^C(n) = n + 1 and V(C) = 2, and
E = [0, 1] with C all the open sets in [0, 1]. Here m^C(n) = 2^n for all n, and so
V(C) = ∞ and C is not a VC class.
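Both toy examples are easy to verify by brute force. The sketch below is our illustration, assuming nothing beyond the definitions; it counts the distinct traces F ∩ (−∞, t] of half-lines on a finite set F ⊂ R:

```python
def halfline_traces(F):
    """Distinct subsets of the finite set F of the form F ∩ (-inf, t];
    it suffices to let t run over the points of F plus one value below min(F)."""
    F = sorted(F)
    cuts = [F[0] - 1.0] + F
    return {tuple(x for x in F if x <= t) for t in cuts}

def m_halflines(n):
    # For half-lines the count depends only on |F|, not on the points chosen.
    return len(halfline_traces(list(range(n))))

# m^C(n) = n + 1 < 2^n as soon as n = 2, so V(C) = 2: singletons are
# shattered, but no two-point set is.
```

Running m_halflines(5) gives 6, matching m^C(n) = n + 1.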
A more instructive example, which also leads into the general theory we are after,
is E = R^N with C the collection of half-spaces of R^N. Let Δ(N, n) be the maximal
number of components into which it is possible to partition R^N via n hyperplanes.
Then, by definition, m^C(n) = Δ(N, n). It is not hard to see that Δ must satisfy the
recurrence relation
It thus follows, from (1.4.33), that the half-spaces of RN form a VC class for
all N.
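The recurrence and its consequence are easy to check by machine. In the sketch below, regions(N, n) is our name for the maximal number of components that n hyperplanes in general position cut R^N into; the classical closed form Σ_{k≤N} \binom{n}{k} is polynomial in n of degree N, which is why m^C(n) eventually falls below 2^n:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def regions(N, n):
    """Maximal number of components of R^N cut out by n hyperplanes in
    general position, via the recurrence
    regions(N, n) = regions(N, n - 1) + regions(N - 1, n - 1)."""
    if n == 0 or N == 0:
        return 1
    return regions(N, n - 1) + regions(N - 1, n - 1)

# Three lines in general position cut the plane into 7 pieces; for n > N
# the count grows only polynomially in n, hence is eventually below 2^n.
```

Checking the closed form against the recurrence for small N and n confirms the polynomial growth that makes half-spaces a VC class.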
What is somewhat more surprising, however, is that an inequality akin to (1.4.35),
which we developed only for this special example, holds in wide (even non-Euclidean)
generality.
Since the proof of this result is combinatoric rather than probabilistic, and will be
of no further interest to us, you are referred to either of [163, 56] for a proof.
The importance of Lemma 1.4.8 is that it enables us to obtain bounds on the
entropy function for Gaussian ν-noise over VC classes that are independent of ν.
Theorem 1.4.9. Let W be the Gaussian ν-noise on a probability space (E, E, ν). Let
C be a Vapnik–Červonenkis class of sets in E with V(C) = v. Then there exists
a constant K = K(v) (not depending on ν) such that for 0 < ε ≤ 1/2, the entropy
function for W satisfies
\[
N(\mathcal{C}, \varepsilon) \;\le\; K\varepsilon^{-2v}\, |\ln \varepsilon|^{v}.
\]
Proof. We start with a little counting, and then turn to the entropy calculation proper.
The counting argument is designed to tell us something about the maximum number
of C sets that are a certain minimum distance from one another and can be packed
into E.
Fix ε > 0 and suppose A_1, …, A_m ∈ C, m ≥ 2, with ν(A_i Δ A_j) ≥ ε for i ≠ j.
We need an upper bound on m. Sampling with replacement, select n points at random
from E. The ν-probability that at least one of the sets A_i Δ A_j contains none of these
n points is at most
\[
\binom{m}{2} (1-\varepsilon)^n. \tag{1.4.37}
\]
Choose n = n(m, ε) large enough so that this bound is less than 1. Then
and so for at least one configuration of the n sample points the class C picks out at
least m distinct subsets (since, with positive probability, given any two of the Ai there
is at least one point not in both of them). Thus, by (1.4.36),
\[
m \;\le\; m^{C}(n) \;\le\; n^{v} \;=\; \big(n(m,\varepsilon)\big)^{v}. \tag{1.4.38}
\]
Take now the smallest n for which (1.4.37) is less than 1. For this n we have
m^2 (1 − ε)^{n−1} ≥ 2, so that
\[
n - 1 \;\le\; \frac{2\ln m - \ln 2}{|\ln(1-\varepsilon)|},
\]
and substituting this into (1.4.38) and solving for m yields
\[
m \;\le\; K(v)\, \varepsilon^{-v}\, |\ln \varepsilon|^{v} \qquad\text{for } 0 < \varepsilon \le \tfrac12,
\]
This concludes the counting part of the proof. We can now do the entropy
calculation. Recall that the canonical distance between sets in E is given by
\[
d_\nu(A, B) \;=\; \big[\nu(A \,\Delta\, B)\big]^{1/2}.
\]
Fix ε > 0. In view of the above, there can be no more than m = K(v) ε^{−2v} |2 ln ε|^{v}
sets A_1, …, A_m in C for which d_ν(A_i, A_j) ≥ ε for all i ≠ j. Take an ε-neighborhood
of each of the Ai in the dν metric. (Each such neighborhood is a collection of sets in
E.) The union of these neighborhoods covers C, and so we have constructed an ε-net
of the required size, and are done.
An immediate consequence of the entropy bound of Theorem 1.4.9 is the follow-
ing.
Corollary 1.4.10. Let W be Gaussian ν-noise based over a probability space (E, E, ν).
Then W is continuous over any Vapnick–Červonenkis class of sets in E.
For more about set-indexed processes, particularly from the martingale viewpoint,
see [84].
A natural question to ask is whether the results and methods that we have seen in this
section extend naturally to non-Gaussian fields. We already noted, immediately after
the proof of the central Theorem 1.3.5, that the proof there used normality only once,
and so the general techniques of entropy should be extendable to a far wider setting.
For most of the processes that will concern us, this will not be terribly relevant.
In Chapter 15 we shall concentrate on random fields that can be written in the form
where the g^i are i.i.d. Gaussian and F : R^d → R is smooth. In this setting, continuity
and boundedness of the non-Gaussian f follow deterministically from similar
properties of F and the g^i, and so no additional theory is needed.
Nevertheless, there are many processes that are not attainable in this way, for
which one might a priori expect that the random geometry of Part III of this book might
apply. In particular, we are thinking of the smooth stable fields of Samorodnitsky
and Taqqu [138]. With this in mind, and for completeness, we state Theorem 1.4.11
below. However, other than the “function of Gaussian’’ non-Gaussian scenario, we
know of no cases for which this random geometry has even the beginnings of a parallel
theory.
To set up the basic result, let f_t be a random process defined on a metric space
(T, τ) and taking values in a Banach space (B, ‖·‖_B). Since we are no longer in the
Gaussian case, there is no reason to assume that there is a canonical metric on T to
replace τ. Also, recall that a function ϕ : R → R is called a Young function
if it is even, continuous, convex, and satisfies
\[
\lim_{x\to 0} \frac{\varphi(x)}{x} = 0, \qquad \lim_{x\to\infty} \frac{\varphi(x)}{x} = \infty.
\]
Theorem 1.4.11. Take f as above, and assume that the real-valued process ‖f_t −
f_s‖_B is separable. Let N_τ be the metric entropy function for T with respect to the
metric τ. If there exist an α ∈ (0, 1] and a Young function ϕ such that the following
two conditions are satisfied, then f is continuous with probability one:
\[
E\left\{ \varphi\!\left( \frac{\|f(t) - f(s)\|_B^{\alpha}}{\tau(s,t)} \right) \right\} \;\le\; 1,
\]
\[
\int_{\{u \,:\, N_\tau(u) > 1\}} \varphi^{-1}\big(N_\tau(u)\big)\, du \;<\; \infty.
\]
The best place to read about this is in Ledoux and Talagrand [99].
where B_d is the d-ball of (1.3.2) and the infimum on μ is taken over all probability
measures μ on T.
Furthermore, if f is a.s. bounded, then there exist a probability measure μ and
a universal constant K such that
\[
K \sup_{t\in T} \int_0^{\infty} \sqrt{\ln \frac{1}{\mu(B_d(t,\varepsilon))}}\; d\varepsilon \;\le\; E\Big\{\sup_{t\in T} f_t\Big\}. \tag{1.5.2}
\]
27 “Mathematical completeness’’ should be understood in a relative sense, since our proofs
here will most definitely be incomplete!
A measure μ for which the integrals above are finite for all t is called a majorizing
measure.
Note that the upper limit to the integrals in the theorem is really diam(T ), since
Outline of proof. Start by rereading the proof of Theorem 1.3.3 as far as (1.3.14).
The argument leading up to (1.3.14) was that, eventually, increments fπj (t) −fπj −1 (t)
would be smaller than uaj . However, on the way to this we could have been less
wasteful in a number of our arguments.
For example, we could have easily arranged things so that
which would have given us fewer increments to control. We could have also as-
sumed that
\[
\forall\, t \in \Pi_j, \quad \pi_j(t) = t, \tag{1.5.4}
\]
so that, by (1.5.3),
So let us assume both (1.5.3) and (1.5.4). Then controlling the increments f_{π_j(t)} −
f_{π_{j−1}(t)} actually means controlling the increments f_t − f_{π_{j−1}(t)} for t ∈ Π_j. There
are only N_j such increments, which improves on our previous estimate of N_j N_{j−1}.
This does not make much of a difference, but what does, and this is the core of the
majorizing measure argument, is replacing the original a_j by families of nonnegative
numbers {a_j(t)}_{t∈Π_j}, and then asking when
∀ t ∈ T, ft − ft0 ≤ uS,
where
28 There is a similar extension of Theorem 1.3.5, but we shall not bother with it.
\[
S \;=\; \sup_{t\in T} \sum_{j>i} a_j\big(\pi_j(t)\big).
\]
Thus
\[
P\Big\{\sup_{t\in T}(f_t - f_{t_0}) \ge uS\Big\}
\;\le\; \sum_{j>i} \sum_{t\in \Delta_j} P\big\{f_t - f_{\pi_{j-1}(t)} \ge u\, a_j(t)\big\}
\;\le\; 2 \sum_{j>i} \sum_{t\in \Delta_j} \exp\left( \frac{-u^2\, a_j^2(t)}{(2r^{-j+1})^2} \right), \tag{1.5.7}
\]
where Δ_j = Π_j \ ⋃_{i<k≤j−1} Π_k. (The move from the Π_j to the disjoint Δ_j is crucial,
and made possible by (1.5.5).) This bound is informative only if the right-hand side
is less than or equal to one. Let us see how to ensure this when u = 1. Setting
\[
w_j(t) \;=\; 2 \exp\left( \frac{-a_j^2(t)}{(2r^{-j+1})^2} \right), \tag{1.5.8}
\]
we want to have \(\sum_{j>i} \sum_{t\in \Delta_j} w_j(t) \le 1\). We are now getting close to our majorizing
measure.
Recall that T has long ago been assumed countable. Suppose we have a probability
measure μ supported on T, and for all j > i and all t ∈ Δ_j set w_j(t) = μ({t}). Undo
(1.5.8) to see that this means that we need to take
\[
a_j(t) \;=\; 2 r^{-j+1} \sqrt{\ln \frac{2}{\mu(\{t\})}}.
\]
With this choice, the last sum in (1.5.7) is no more than 2^{1-u^2}, and S is given by
\[
S \;=\; 2 \sup_{t\in T} \sum_{j>i} r^{-j+1} \sqrt{\ln \frac{2}{\mu(\{\pi_j(t)\})}},
\]
all of which ensures that for the arbitrary measure μ we now have
\[
E\Big\{\sup_{t\in T} f_t\Big\} \;\le\; K \sup_{t\in T} \sum_{j>i} r^{-j+1} \sqrt{\ln \frac{2}{\mu(\{\pi_j(t)\})}}. \tag{1.5.9}
\]
This is, in essence, the majorizing measure upper bound. To make it look more
like (1.5.1), note that each map π_j defines a partition A_j of T comprising the sets
A_t = {s ∈ T : π_j(s) = t}, t ∈ Π_j.
With A_j(t) denoting the unique element of A_j that contains t ∈ T, it is not too hard
(but also not trivial) to reformulate (1.5.9) to obtain
\[
E\Big\{\sup_{t\in T} f_t\Big\} \;\le\; K \sup_{t\in T} \sum_{j>i} r^{-j+1} \sqrt{\ln \frac{2}{\mu(A_j(t))}}, \tag{1.5.10}
\]
which is now starting to look a lot more like (1.5.1). To see why this reformulation
works, you should go to [154], which you should now be able to read without even
bothering about notational changes. You will also find there how to turn (1.5.10)
into (1.5.1), and also how to get the lower bound (1.5.2). All of this takes quite a bit
of work, but at least now you should have some idea of how majorizing measures
arose.
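For a glimpse of why (1.5.9) has the right form, consider the toy case of k i.i.d. standard normals with μ uniform. Then d(s, t) = √2 for s ≠ t, each ball B_d(t, ε) with ε < √2 is the singleton {t}, and the integral in (1.5.1) collapses to √2 · √(ln k), which is indeed the order of E{max}. A rough Monte Carlo comparison (our illustration; the seed and sample sizes are arbitrary choices):

```python
import math, random

def mm_integral_iid(k):
    """The integral in (1.5.1) for k iid N(0,1) points with mu uniform:
    mu(B_d(t, eps)) = 1/k for eps < sqrt(2) and 1 afterwards, so the
    integrand equals sqrt(ln k) on (0, sqrt(2)) and vanishes beyond."""
    return math.sqrt(2.0) * math.sqrt(math.log(k))

def expected_max_mc(k, trials=400, seed=1):
    """Crude Monte Carlo estimate of E{max of k iid N(0,1)}."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(k))
               for _ in range(trials)) / trials
```

For k = 500 the estimated E{max} sits within a modest constant factor of mm_integral_iid(500), as the theorem promises.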
Despite the elegance of Theorem 1.5.1, it is not always easy, given a specific
Gaussian process, to find the “right’’ majorizing measure for it. To circumvent this
problem, Talagrand recently [155] gave a recipe for how to wield the technique
without the need to explicitly compute a majorizing measure. However, we are
already familiar with one situation in which there is a simple recipe for building
majorizing measures. This is when entropy integrals are finite.
Thus, let H(ε) be our old friend (1.3.3), and set
\[
g(t) \;=\; \sqrt{\ln \frac{1}{t}}, \qquad 0 < t \le 1. \tag{1.5.11}
\]
Then here is a useful result linking entropy and majorizing measures.
Lemma 1.5.2. If \(\int_0^\infty H^{1/2}(\varepsilon)\, d\varepsilon < \infty\), then there exist a majorizing measure μ and
a universal constant K such that
\[
\sup_{t\in T} \int_0^{\eta} g\big(\mu(B(t,\varepsilon))\big)\, d\varepsilon \;<\; K\Big( \eta\, |\log \eta| + \int_0^{\eta} H^{1/2}(\varepsilon)\, d\varepsilon \Big), \tag{1.5.12}
\]
where δ_{t_k} is the measure giving unit mass to t_k. This will be our majorizing measure.
To check that it satisfies (1.5.12), note first that if ε ∈ (2^{−(n+1)}, 2^{−n}], then
\[
\cdots \;\le\; (n+2)\, 2^{-n} \sqrt{\ln 2} \;+\; 2 \int_0^{2^{-n}} H^{1/2}(\varepsilon)\, d\varepsilon,
\]
Theorem 1.5.3. If f is stationary over a compact Abelian group T , then (1.5.1) and
(1.5.2) hold with μ taken to be normalized Haar measure on T .
A very similar result holds if T is only locally compact. You can find the details
in [99].
Proof. Since (1.5.1) is true for any probability measure on T , it also holds for Haar
measure. Thus we need only prove the lower bound (1.5.2).
Thus, assume that f is bounded, so that by Theorem 1.5.1 there exists a majorizing
measure μ satisfying (1.5.2). We need to show that μ can be replaced by Haar measure
on T , which we denote by ν.
Set
\[
D_\mu \;=\; \sup\{\eta : \mu(B(t,\eta)) < 1/2 \text{ for all } t \in T\},
\]
with D_ν defined analogously. With g as in (1.5.11), (1.5.2) can be rewritten as
\[
\int_0^{D_\mu} g\big(\mu(B(t,\varepsilon))\big)\, d\varepsilon \;\le\; K\, E\Big\{\sup_{t\in T} f_t\Big\},
\]
where the second equality comes from the stationarity of f and the third and fourth
from the properties of Haar measures.
Now note that g(x) is convex over x ∈ (0, 1/2), so that it is possible to define a
function \(\bar g\) that agrees with it on (0, 1/2), is bounded on (1/2, ∞), and convex on all of
R_+. By Jensen's inequality,
\[
\bar g\big(E\{Z(\varepsilon)\}\big) \;\le\; E\big\{\bar g(Z(\varepsilon))\big\}.
\]
That is,
\[
\bar g\big(\nu(B(t_0,\varepsilon))\big) \;\le\; \int_T \bar g\big(\mu(B(t,\varepsilon))\big)\, \nu(dt).
\]
Proof. That (1.5.14) implies (1.5.15) is obvious. That (1.5.16) implies (1.5.14) is
Lemma 1.5.2 together with Theorem 1.5.1. Thus it suffices to show that (1.5.15)
implies (1.5.16), which we shall now do.
Note firstly that by Theorem 1.5.3 we know that
\[
\sup_{t\in T} \int_0^{\infty} g\big(\mu(B(t,\varepsilon))\big)\, d\varepsilon \;<\; \infty \tag{1.5.17}
\]
Proof. Recall from Theorem 1.4.1 that the basic energy integral in (1.5.16) converges
or diverges together with
\[
\int_{\delta}^{\infty} p\big(e^{-u^2}\big)\, du, \tag{1.5.19}
\]
where
\[
p^2(u) \;=\; 2 \sup_{|t|\le u} \big[C(0) - C(t)\big]
\]
(cf. (1.4.2)). Applying the bounds in (1.5.18) to evaluate (1.5.19) and applying the
extension of Theorem 1.5.3 to noncompact groups easily proves the corollary.
2 Gaussian Inequalities
Basic statistics has its Chebyshev inequality, martingale theory has its maximal inequalities,
Markov processes have large deviations, but all pale in comparison to the
power and simplicity of the corresponding basic inequality of Gaussian processes. This
inequality was discovered independently, and established with very different proofs,
by Borell [30] and Tsirelson, Ibragimov, and Sudakov (TIS) [160]. For brevity, we
shall call it the Borell–TIS inequality. In the following section we shall treat it in
some detail.
A much older inequality, which allows comparison between the suprema of dif-
ferent Gaussian processes, is due to Slepian, and we shall describe it and some of its
newer relatives in Section 2.2.
There is a classical result of Landau and Shepp [96] and Marcus and Shepp [110] that
gives a result closely related to (2.1.2), but for the supremum of a general centered
Gaussian process. If we assume that f_t is a.s. bounded, then they showed that
\[
\lim_{u\to\infty} u^{-2} \ln P\Big\{ \sup_{t\in T} f_t > u \Big\} \;=\; -(2\sigma_T^2)^{-1}, \tag{2.1.3}
\]
where
\[
\sigma_T^2 \;=\; \sup_{t\in T} E\{f_t^2\}
\]
is a notation that will remain with us throughout this section. An immediate consequence
of (2.1.3) is that for all ε > 0 and large enough u,
\[
P\Big\{ \sup_{t\in T} f_t > u \Big\} \;\le\; e^{\varepsilon u^2 - u^2/2\sigma_T^2}. \tag{2.1.4}
\]
Since ε > 0 is arbitrary, comparing (2.1.4) and (2.1.1), we reach the rather surprising
conclusion that the supremum of a centered, bounded Gaussian process behaves much
like a single Gaussian variable with a suitably chosen variance.
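This "single Gaussian variable" behavior is already visible in the simplest possible case, k i.i.d. standard normals, where the tail of the maximum has a closed form. A quick numerical illustration (ours, not the text's):

```python
import math

def log_tail_max_iid(k, u):
    """ln P{max of k iid N(0,1) variables > u} = ln(1 - Phi(u)^k),
    computed via erfc/log1p/expm1 to stay accurate far out in the tail."""
    tail = 0.5 * math.erfc(u / math.sqrt(2.0))            # 1 - Phi(u)
    return math.log(-math.expm1(k * math.log1p(-tail)))   # ln(1 - (1-tail)^k)

# Here sigma_T^2 = 1, so (2.1.3) predicts u^{-2} ln P{max > u} -> -1/2:
# the dependence on k is swallowed by lower-order terms.
ratio = log_tail_max_iid(100, 12.0) / 12.0 ** 2
```

For k = 100 and u = 12 the ratio is already within two percent of −1/2, and it only improves as u grows.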
In Chapter 4 we shall see that it requires little more than the basic techniques of
the current chapter, along with the notion of entropy from Chapter 1, to considerably
improve on (2.1.4), in that the exponential term e^{εu^2} can be replaced by a power term
of the form u^α. Later, in Part III, we shall see how to do even better, although then we
shall have to assume more about the random fields.
Now, however, we want to see what can be done with minimal assumptions, and
to see from where (2.1.4) comes. In fact, (2.1.4) and its consequences are all special
cases of a nonasymptotic result due, as mentioned above, independently, and with very
different proofs, to Borell [30] and Tsirelson, Ibragimov, and Sudakov (TIS) [160].
Then
\[
E\{\|f\|\} \;<\; \infty,
\]
and for all u > 0,
\[
P\big\{\|f\| - E\{\|f\|\} > u\big\} \;\le\; e^{-u^2/2\sigma_T^2}. \tag{2.1.5}
\]
so that both (2.1.3) and (2.1.4) follow from the Borell–TIS inequality.
1 Actually, Theorem 2.1.1 is not in the same form as Borell's original inequality, in which
E{‖f‖} was replaced by the median of ‖f‖. However, the two forms are equivalent. For
this and other variations of (2.1.5), including extensions to Banach space–valued processes
for which ‖·‖ is the norm, see the more detailed treatments of [28, 56, 67, 99, 105]. To see
how the Borell–TIS inequality fits into the wider theory of concentration inequalities, see
the recent book [98] by Ledoux.
Indeed, a far stronger result is true, for (2.1.4) can now be replaced by
\[
P\{\|f\| > u\} \;\le\; e^{Cu - u^2/2\sigma_T^2},
\]
where C is a constant depending only on E{‖f‖}, and we know at least how to bound
this quantity from Theorem 1.3.5.
Note that, despite the misleading notation, ‖f‖ ≡ sup_T f_t is not a norm, and very
often one needs bounds on the tail of sup_T |f_t|, which does give a norm. However,
symmetry immediately gives
\[
P\Big\{ \sup_t |f_t| > u \Big\} \;\le\; 2\, P\Big\{ \sup_t f_t > u \Big\}, \tag{2.1.6}
\]
\[
\cdots + 2 \int_{e^{\alpha} \vee E\{\|f\|\}}^{\infty} \exp\left( \frac{-\big(\sqrt{\ln u^{1/\alpha}} - E\{\|f\|\}\big)^2}{2\sigma_T^2} \right) du
\;\le\; e^{\alpha} + E\{\|f\|\} + 4\alpha \int_0^{\infty} u \exp\left( \frac{-(u - E\{\|f\|\})^2}{2\sigma_T^2} \right) \exp\{\alpha u^2\}\, du,
\]
Recall Theorems 1.3.3 and 1.3.5, which established, respectively, the a.s. bound-
edness of f and a bound on the modulus of continuity ωf,d (δ) under essentially
identical entropy conditions. It was rather irritating back there that we had to establish
each result independently, since it is “obvious’’ that one should imply the other. A
simple application of the Borell–TIS inequality almost does this.
Theorem 2.1.3. Suppose that f is a.s. bounded on T. Then f is also a.s. uniformly
continuous (with respect to the canonical metric d) if and only if
\[
\lim_{\eta\to 0} \phi(\eta) = 0, \tag{2.1.8}
\]
where
\[
\phi(\eta) \;=\; E\Big\{ \sup_{d(s,t)<\eta} (f_s - f_t) \Big\}. \tag{2.1.9}
\]
Furthermore, under (2.1.8), for all ε > 0 there exists an a.s. finite random variable
η > 0 such that
\[
\omega_{f,d}(\delta) \;\le\; \phi(\delta)\, |\ln \phi(\delta)|^{\varepsilon} \tag{2.1.10}
\]
for all δ ≤ η.
Proof. We start with necessity. For almost every ω we have
\[
\lim_{\eta\to 0}\, \sup_{d(s,t)<\eta} |f_s(\omega) - f_t(\omega)| = 0.
\]
But (2.1.8) now follows from dominated convergence and Theorem 2.1.2.
For sufficiency, note that from (2.1.8) we can find a sequence {δ_n} with δ_n → 0
such that φ(δ_n) ≤ 2^{−n}. Set δ′_n = min(δ_n, 2^{−n}), and consider the event
\[
A_n \;=\; \Big\{ \sup_{d(s,t)<\delta'_n} |f_s - f_t| > 2^{-n/2} \Big\}.
\]
where the second line is an elementary Gaussian computation and the third uses the
fact that sups,t∈S (ft − fs ) is nonnegative. Consequently, we have that
\[
\delta \;\le\; \sqrt{2\pi}\, \varphi(\delta), \tag{2.1.11}
\]
We now turn to the proof of the Borell–TIS inequality. There are essentially three
quite different ways to tackle this proof. Borell’s original proof relied on isoperimetric
inequalities.2 While isoperimetric inequalities may be natural for a book with the word
“geometry’’ in its title, we shall avoid them, since they involve setting up a number of
concepts for which we shall have no other need. The proof of Tsirelson, Ibragimov,
and Sudakov used Itô’s formula from stochastic calculus. This is one of our3 favorite
proofs, since as one of the too few links between the Markovian and Gaussian worlds
of stochastic processes, it is to be prized.
We shall, however, take a more direct route, which we learned about from the
excellent collection of exercises [38], although its roots are much older. The first step
in this route involves the following two lemmas, which are of independent interest.
where ∇f(x) = (∂f(x)/∂x_i)_{i=1,…,k}.
Proof. It suffices to prove the lemma with f(x) = e^{i⟨t,x⟩} and g(x) = e^{i⟨s,x⟩}, with
s, t, x ∈ R^k. Standard approximation arguments (which is where the requirement
that f is C^2 appears) will do the rest. Write
\[
\phi(t) \;=\; E\big\{e^{i\langle t, X\rangle}\big\} \;=\; e^{-|t|^2/2}
\]
Proof. Let Y be an independent copy of X and α a uniform random variable on [0, 1].
Define the pair (X, Z_α) via
\[
(X, Z_\alpha) \;=\; \big(X,\; \alpha X + \sqrt{1-\alpha^2}\; Y\big).
\]
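This interpolation trick is worth internalizing: for every α, Z_α is again standard Gaussian, and E{X Z_α} = α. A short Monte Carlo check (an illustration of ours, with an arbitrary seed and sample size):

```python
import random, statistics

def interpolated_pair_stats(alpha, n=200_000, seed=0):
    """Sample (X, Z_alpha) with Z_alpha = alpha*X + sqrt(1 - alpha^2)*Y,
    X, Y independent N(0,1); return (mean of Z, variance of Z, corr(X, Z))."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    beta = (1.0 - alpha * alpha) ** 0.5
    zs = [alpha * x + beta * rng.gauss(0.0, 1.0) for x in xs]
    mx, mz = statistics.fmean(xs), statistics.fmean(zs)
    vz = statistics.pvariance(zs, mz)
    cov = statistics.fmean((x - mx) * (z - mz) for x, z in zip(xs, zs))
    return mz, vz, cov / (statistics.pstdev(xs, mx) * vz ** 0.5)
```

Up to sampling error, the mean comes out near 0, the variance near 1, and the correlation near the chosen α.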
4 Recall that the Lipschitz constant of a function on R^k is given by
\[
\|f\|_{\mathrm{Lip}} \;=\; \sup_x |\nabla f(x)| \;=\; \sup_{x\ne y} \frac{|f(x) - f(y)|}{|x-y|}.
\]
Take h as in the statement of the lemma, t ≥ 0 fixed, and define g = e^{th}. Applying
(2.1.12) gives
\[
E\{h(X)\, g(X)\} \;=\; \int_0^1 E\big\{\langle \nabla g(X), \nabla h(Z_\alpha)\rangle\big\}\, d\alpha
\;=\; t\, E\left\{ \int_0^1 \langle \nabla h(X), \nabla h(Z_\alpha)\rangle\, e^{t h(X)}\, d\alpha \right\}
\;\le\; t\, E\big\{ e^{t h(X)} \big\},
\]
Then
\[
E\big\{h(X)\, e^{t h(X)}\big\} \;=\; u'(t)\, e^{u(t)},
\]
so that from the preceding inequality, u′(t) ≤ t. Since u(0) = 0 it follows that
u(t) ≤ t^2/2, and we are done.
The following lemma gives the crucial step toward proving the Borell–TIS in-
equality.
Proof. By scaling it suffices to prove the result for σ = 1. Assume for the moment
that h ∈ C^2. Then, for every t, u > 0,
\[
P\big\{h(X) - E\{h(X)\} > u\big\}
\;\le\; \int_{h(x)-E\{h(X)\}>u} e^{t(h(x)-E\{h(X)\}-u)}\, dP(x)
\;\le\; e^{-tu}\, E\big\{ e^{t(h(X)-E\{h(X)\})} \big\}
\;\le\; e^{\frac12 t^2 - tu},
\]
the last inequality following from (2.1.13). Taking the optimal choice of t = u gives
(2.1.14) for h ∈ C^2.
To remove the C^2 assumption, take a sequence of C^2 approximations to h, each
of which has Lipschitz coefficient no greater than σ, and apply Fatou's lemma.
This completes the proof.
Proof of Theorem 2.1.1. There will be two stages to the proof. Firstly, we shall
establish Theorem 2.1.1 for finite T . We then lift the result from finite to general T .
Thus, let T be finite, so that we can write it as {1, …, k}. In this case we can
replace sup by max, which has Lipschitz constant 1. Theorem 2.1.1 then follows
immediately from Lemma 2.1.6. The general case is a little more delicate.
Let C be the k × k covariance matrix of f on T , with components cij = E{fi fj },
so that
where as usual e_i is the vector with 1 in position i and zeros elsewhere. The first
inequality above is elementary, and the second is Cauchy–Schwarz. But
so that
\[
\Big| \max_i (Ax)_i - \max_i (Ay)_i \Big| \;\le\; \sigma_T\, |x - y|.
\]
In view of the equivalence in law of maxi fi and maxi (AW )i and Lemma 2.1.6, this
establishes the theorem for finite T .
We now turn to lifting the result from finite to general T . This is, almost, an easy
exercise in approximation. For each n > 0 let Tn be a finite subset of T such that
T_n ⊂ T_{n+1} and T_n increases to a dense subset of T. By separability,
\[
\sup_{t\in T_n} f_t \;\longrightarrow\; \sup_{t\in T} f_t \quad\text{a.s.},
\]
Since σT2n → σT2 < ∞ (again monotonically), this would be enough to prove the
general version of the Borell–TIS inequality from the finite-T version if only we
knew that the one worrisome term, E{supT ft }, were definitely finite, as claimed in
2.2 Comparison Inequalities 57
the statement of the theorem. Thus if we show that the assumed a.s. finiteness of f
implies also the finiteness of its mean, we shall have a complete proof to both parts
of the theorem.
We proceed by contradiction. Thus, assume E{‖f‖_T} = ∞, where ‖f‖_T = sup_{t∈T} f_t,
and choose u₀ > 0 such that

  e^{−u₀²/2σ_T²} ≤ 1/4    and    P{ sup_{t∈T} f_t < u₀ } ≥ 3/4.

Now choose n ≥ 1 such that E{‖f‖_{T_n}} > 2u₀, which is possible since E{‖f‖_{T_n}} →
E{‖f‖_T} = ∞. The Borell–TIS inequality on the finite space T_n then gives

  1/2 ≥ 2e^{−u₀²/2σ_T²} ≥ 2e^{−u₀²/2σ_{T_n}²} ≥ P{ |‖f‖_{T_n} − E{‖f‖_{T_n}}| > u₀ }
      ≥ P{ E{‖f‖_{T_n}} − ‖f‖_T > u₀ } ≥ P{ ‖f‖_T < u₀ } ≥ 3/4.
This provides the required contradiction, and so we are done.
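Although the argument above is purely analytic, the finite-T form of the Borell–TIS inequality is easy to probe numerically. The following minimal sketch (dimension, covariance matrix, threshold, and seed are all arbitrary, illustrative choices, not taken from the book) draws a correlated centered Gaussian vector playing the role of f on a finite T, and checks that the empirical probability that sup f exceeds its mean by u stays below the bound e^{−u²/2σ_T²}:

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite-T Gaussian "field": k correlated centered normals with C = A A^T.
k = 50
A = rng.standard_normal((k, k)) / np.sqrt(k)
C = A @ A.T
sigma_T2 = C.diagonal().max()        # sigma_T^2 = max_t E{f_t^2}

# Monte Carlo: distribution of sup_T f = max_i f_i.
n = 200_000
f = rng.standard_normal((n, k)) @ A.T    # rows ~ N(0, C)
M = f.max(axis=1)

u = 2.0
empirical = np.mean(M - M.mean() > u)            # P{sup f - E sup f > u}
borell_tis = np.exp(-u**2 / (2 * sigma_T2))      # Borell-TIS upper bound
assert empirical <= borell_tis
```

The bound is far from tight here; the point of the inequality is its universality, not its sharpness.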
  ∂²h(x)/∂x_i ∂x_j ≥ 0    (2.2.5)

  ∂ϕ/∂c_ii = (1/2) ∂²ϕ/∂x_i² ,    ∂ϕ/∂c_ij = ∂²ϕ/∂x_i ∂x_j ,   i ≠ j.    (2.2.6)
This completes the proof for the case of nonsingular C. The general case can be
handled by approximating a singular C via a sequence of nonsingular covariance
matrices.
Proof of Theorem 2.2.1. By separability, and the final argument in the proof of
the Borell–TIS inequality, it suffices to prove (2.2.2) for T finite. Note that since
E{ft2 } = E{gt2 } for all t ∈ T , (2.2.1) implies that E{fs ft } ≥ E{gs gt } for all s, t ∈ T .
Let h(x) = ∏_{i=1}^k h_i(x_i), where each h_i is a positive, nonincreasing, C² function
satisfying the growth conditions placed on h in the statement of Lemma 2.2.2, and k
is the number of points in T. Note that for i ≠ j,
6 We could actually manage with h twice differentiable only in the sense of distributions. This
would save the approximation argument following (2.2.7) below, and would give a neater,
albeit slightly more demanding, proof of Slepian’s inequality, as in [99].
7 This is, of course, little more than the heat equation of PDE theory.
  ∂²h(x)/∂x_i ∂x_j = h′_i(x_i) h′_j(x_j) ∏_{n≠i,j} h_n(x_n) ≥ 0,

since both h′_i and h′_j are nonpositive. It therefore follows from Lemma 2.2.2 that

  E{ ∏_{i=1}^k h_i(f_i) } ≥ E{ ∏_{i=1}^k h_i(g_i) }.    (2.2.7)
As mentioned above, there are many extensions of Slepian’s inequality, the most
important of which is probably the following.
  E{f_t} = E{g_t}

and

  E{(f_t − f_s)²} ≤ E{(g_t − g_s)²}

for all s, t ∈ T.
In other words, a Slepian-like inequality holds without a need to assume either zero
mean or identical variance for the compared processes. However, in this case we have
only the weaker ordering of expectations of (2.2.3) and not the stochastic domination
of (2.2.2).
The original version of this inequality assumed zero means, and its proof involved
considerable calculus, in spirit not unlike that in the proof of Slepian's inequality (cf.
[66] or [85]). The proof that we give is due to Sourav Chatterjee [37], who, at the
time of writing, is a gifted young graduate student at Stanford. It starts with a simple
and well-known lemma.
Lemma 2.2.4. Let X = (X_1, . . . , X_k) be a vector of centered Gaussian variables
with arbitrary covariance matrix, and let h : R^k → R be C¹, with h and its first-
order derivatives satisfying an o(|x|^d) growth condition at infinity for some finite d.
Then, for 1 ≤ i ≤ k,

  E{X_i h(X)} = Σ_{j=1}^k E{X_i X_j} E{h_j(X)},    (2.2.9)

where h_j = ∂h/∂x_j.
Proof. Assume firstly that the Xj are independent and have unit variance. Then,
  E{X_i h(X)} = (2π)^{−k/2} ∫_{R^k} x_i h(x) e^{−|x|²/2} dx    (2.2.10)
              = (2π)^{−k/2} ∫_{R^k} h_i(x) e^{−|x|²/2} dx
              = E{h_i(X)},
where the second equality is via a simple integration by parts. This is (2.2.9) for this
case.
For the general case, write C = AA^T for the covariance matrix of X, and define
h̃(x) = h(Ax). Let X′ = (X′_1, . . . , X′_k) be a vector of independent, standard
normal variables. Then X ∼ AX′, and writing a_mn for the elements of A and
applying (2.2.10),

  E{X_i h(X)} = E{ Σ_{m=1}^k a_im X′_m h̃(X′) } = Σ_{m=1}^k a_im E{h̃_m(X′)}
              = Σ_{m=1}^k a_im E{ Σ_{j=1}^k a_jm h_j(X) } = Σ_{j=1}^k E{h_j(X)} Σ_{m=1}^k a_im a_jm
              = Σ_{j=1}^k E{X_i X_j} E{h_j(X)},
as required.
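Identity (2.2.9) is also easy to sanity-check by Monte Carlo. In the sketch below the matrix A, the test function h, and the sample size are arbitrary choices made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed "square root" A and covariance C = A A^T (illustrative values only).
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
C = A @ A.T

# Correlated centered Gaussian samples: rows of X are ~ N(0, C).
n = 1_000_000
X = rng.standard_normal((n, 3)) @ A.T

# A smooth test function h(x) = sin(x_1) + x_2 x_3 and its gradient means.
h = lambda x: np.sin(x[:, 0]) + x[:, 1] * x[:, 2]
grad_mean = np.array([np.mean(np.cos(X[:, 0])),   # E{h_1(X)}
                      np.mean(X[:, 2]),           # E{h_2(X)}
                      np.mean(X[:, 1])])          # E{h_3(X)}

i = 0
lhs = np.mean(X[:, i] * h(X))        # E{X_i h(X)}
rhs = C[i] @ grad_mean               # sum_j E{X_i X_j} E{h_j(X)}
assert abs(lhs - rhs) < 0.05         # equal up to Monte Carlo error
```

Both sides estimate the same quantity, so the assertion holds up to sampling noise.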
The Sudakov–Fernique inequality on finite spaces is a special case of the following
result, whose proof, en passant, actually provides more. For general (nonfinite) spaces,
the arguments that we have previously used to go from finite to general parameter
spaces work here as well, giving the classic Sudakov–Fernique inequality with no
extra work.
and

  γ = sup_{1≤i,j≤k} |γ_ij^X − γ_ij^Y| .

Then

  E{max_i X_i} − E{max_i Y_i} ≤ √(2γ log k).    (2.2.11)
Proof. Without loss of generality, we may assume that X and Y are defined on the
same probability space and are independent. Fix β > 0 and define h_β : R^k → R by

  h_β(x) = β^{−1} log( Σ_{i=1}^k e^{βx_i} ).

Note that

  max_i x_i = β^{−1} log( e^{β max_i x_i} ) ≤ β^{−1} log( Σ_{i=1}^k e^{βx_i} )
            ≤ β^{−1} log( k e^{β max_i x_i} ) = β^{−1} log k + max_i x_i .

Thus

  sup_{x∈R^k} | h_β(x) − max_i x_i | ≤ β^{−1} log k.    (2.2.13)
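The two-sided bound (2.2.13) is exact, and a minimal numerical check is immediate; in this sketch the dimension k, the value of β, and the random test points are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def h_beta(x, beta):
    # h_beta(x) = beta^{-1} log sum_i exp(beta x_i), computed with the usual
    # max-shift so that the exponentials cannot overflow.
    m = x.max()
    return m + np.log(np.sum(np.exp(beta * (x - m)))) / beta

k, beta = 100, 5.0
for _ in range(1000):
    x = 3.0 * rng.standard_normal(k)
    gap = h_beta(x, beta) - x.max()
    # (2.2.13): the smooth max overshoots the true max by at most (log k)/beta.
    assert 0.0 <= gap <= np.log(k) / beta
```

Taking β large makes h_β an arbitrarily good smooth surrogate for the maximum, which is exactly how it is used in the proof.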
Furthermore, write X̃_i = X_i − μ_i and Ỹ_i = Y_i − μ_i for the centered variables
(μ_i being the common mean of X_i and Y_i), σ_ij^X = E{X̃_i X̃_j}, σ_ij^Y = E{Ỹ_i Ỹ_j},
and for t ∈ [0, 1] define the random vector Z_t by

  Z_{t,i} = √(1 − t) X̃_i + √t Ỹ_i + μ_i ,

and set ϕ(t) = E{h_β(Z_t)}. By Lemma 2.2.4 and the independence of X and Y,

  E{ (∂h_β/∂x_i)(Z_t) X̃_i } = √(1 − t) Σ_{j=1}^k σ_ij^X E{ (∂²h_β/∂x_i ∂x_j)(Z_t) }

and

  E{ (∂h_β/∂x_i)(Z_t) Ỹ_i } = √t Σ_{j=1}^k σ_ij^Y E{ (∂²h_β/∂x_i ∂x_j)(Z_t) },

so that

  ϕ′(t) = (1/2) Σ_{i,j=1}^k E{ (∂²h_β/∂x_i ∂x_j)(Z_t) } (σ_ij^Y − σ_ij^X).    (2.2.14)
In a moment we shall show that if γ_ij^X ≤ γ_ij^Y then ϕ′ ≥ 0 throughout its range. This
will prove (2.2.12), since it implies that for all β > 0,

  E{h_β(X)} = ϕ(0) ≤ ϕ(1) = E{h_β(Y)}.

Taking β → ∞ and applying (2.2.13) establishes (2.2.12). The more delicate inequal-
ity (2.2.11) requires more information on ϕ′, which comes from a little elementary
calculus.
Note that

  (∂h_β/∂x_i)(x) = e^{βx_i} / Σ_{j=1}^k e^{βx_j} = p_i(x),

where for each x ∈ R^k, the p_i(x) define a probability measure on {1, . . . , k}. It is
then straightforward to check that

  (∂²h_β/∂x_i ∂x_j)(x) = { β(p_i(x) − p_i²(x)),   i = j,
                           −β p_i(x) p_j(x),      i ≠ j.
Thus,

  Σ_{i,j=1}^k (∂²h_β/∂x_i ∂x_j)(x) (σ_ij^Y − σ_ij^X)    (2.2.15)
    = β Σ_{i=1}^k p_i(x)(σ_ii^Y − σ_ii^X) − β Σ_{i,j=1}^k p_i(x)p_j(x)(σ_ij^Y − σ_ij^X)
    = (β/2) Σ_{i,j=1}^k p_i(x)p_j(x) [ (σ_ii^Y + σ_jj^Y − 2σ_ij^Y) − (σ_ii^X + σ_jj^X − 2σ_ij^X) ]
    = (β/2) Σ_{i,j=1}^k p_i(x)p_j(x) (γ_ij^Y − γ_ij^X).

The second equality here uses the fact that Σ_i p_i(x) = 1, while the third one relies
on the fact that

  σ_ii^X + σ_jj^X − 2σ_ij^X = E{(X̃_i − X̃_j)²} = E{(X_i − X_j)²} − (μ_i − μ_j)² = γ_ij^X − (μ_i − μ_j)²,

together with the same identity for Y; since the means agree, the (μ_i − μ_j)² terms
cancel. In particular, (2.2.15) is nonnegative when γ_ij^X ≤ γ_ij^Y, which, via (2.2.14),
gives ϕ′ ≥ 0 as promised. Furthermore, since the p_i(x) sum to one, (2.2.14) and
(2.2.15) give |ϕ′(t)| ≤ βγ/4 for all t, so that

  |E{h_β(Y)} − E{h_β(X)}| = |ϕ(1) − ϕ(0)| ≤ βγ/4.
Combining this with (2.2.13) gives

  E{max_i X_i} − E{max_i Y_i} ≤ βγ/4 + (2 log k)/β,

and choosing β = √(8 log k/γ) gives (2.2.11), as required.
There are many extensions to the Sudakov–Fernique inequality that we shall not
need in this book, but you can find them in the references in the description of Part I.
From those sources you can also find out how to extend the above arguments to obtain
conditions on covariance functions that allow statements of the form
  P{ min_{1≤i≤n} max_{1≤j≤m} X_ij ≥ u } ≥ P{ min_{1≤i≤n} max_{1≤j≤m} Y_ij ≥ u },
along with even more extensive variations due, originally, to Gordon [69]. Gor-
don [70] also shows how to extend the essentially Gaussian computations above to
elliptically contoured distributions.
As we mentioned earlier, Chapter 4, which does not require the material of Chap-
ter 3, continues with the theme of suprema distributions.
3
Orthogonal Expansions
While most of what we shall have to say in this brief chapter is rather theoretical,
it actually covers one of the most important practical aspects of Gaussian modeling.
The basic result is Theorem 3.1.1, which states that every centered Gaussian process
with a continuous covariance function has an expansion of the form
  f(t) = Σ_{n=1}^∞ ξ_n ϕ_n(t),    (3.0.1)
where the ξn are i.i.d. N (0, 1), and the ϕn are certain functions on T determined by
the covariance function C of f . In general, the convergence in (3.0.1) is in L2 (P)
for each t ∈ T , but (Theorem 3.1.2) if f is a.s. continuous then the convergence is
uniform over T , with probability one.
There are many theoretical conclusions that follow from this representation.
For one example, note that since continuity of C will imply that of the ϕn (cf.
Lemma 3.1.4), it follows from (3.0.1) that sample path continuity of f is a “tail
event’’ on the σ -algebra determined by the ξn , from which one can show that centered
Gaussian processes are either continuous with probability one, or discontinuous with
probability one. There is no middle ground. A wide variety of additional zero–one
laws also follow from this representation. Perhaps the most notable is the following,
  P{ lim_{s→t} f_s = f_t for all t ∈ T } = 1
      ⟺  P{ lim_{s→t} f_s = f_t } = 1 for each t ∈ T.
That is, under our ubiquitous assumption that the covariance function C is continuous,
pointwise and global a.s. continuity are equivalent for Gaussian processes. A proof,
in the spirit of what follows, can be found in [3].
The practical implications of (3.0.1) are mainly in the area of simulation. By
truncating the sum (3.0.1) at a point appropriate to the problem at hand, one needs
“only’’ to determine the ϕ_n. However, these arise as the orthonormal basis of a
particular Hilbert space, and can generally be found by solving an eigenfunction
problem.
The fact that C is nonnegative definite implies (u, u)H ≥ 0 for all u ∈ S. Furthermore,
note that the inner product (3.1.1) has the following unusual property:
  (u, C(t, ·))_H = ( Σ_{i=1}^n a_i C(s_i, ·), C(t, ·) )_H = Σ_{i=1}^n a_i C(s_i, t) = u(t),    (3.1.2)
the last line following directly from (3.1.1). Thus it follows that if {u_n} is Cauchy in
the norm ‖·‖_H then it is pointwise Cauchy. The closure of S under this norm is a
space of real-valued functions, denoted by H(C), and called the RKHS of f or of C,
since every u ∈ H(C) satisfies (3.1.2). (The separability of
H(C) follows from the separability of T and the assumption that C is continuous.)
Since all this seems at first rather abstract, consider two concrete examples. Take
T = {1, . . . , n} finite, and f centered Gaussian with covariance matrix C = (c_ij),
c_ij = E{f_i f_j}. Let C^{−1} = (c^{ij}) denote the inverse of C, which exists by positive
definiteness. Then the RKHS of f is made up of all n-dimensional vectors u =
(u_1, . . . , u_n) with inner product

  (u, v)_H = Σ_{i=1}^n Σ_{j=1}^n u_i c^{ij} v_j .
To prove this, we need only check that the reproducing kernel property (3.1.2) holds.1
However, with δ(i, j ) the Kronecker delta function, and Ck denoting the kth row of C,
  (u, C_k)_H = Σ_{i=1}^n Σ_{j=1}^n u_i c^{ij} c_{kj} = Σ_{i=1}^n u_i δ(i, k) = u_k ,
as required.
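The reproducing property just verified reduces to a one-line numerical check. In this sketch the covariance matrix is an arbitrary positive definite matrix chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# An arbitrary positive definite covariance matrix C on T = {1, ..., n}.
n = 5
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)
Cinv = np.linalg.inv(C)

# RKHS inner product (u, v)_H = sum_{i,j} u_i c^{ij} v_j.
inner = lambda u, v: u @ Cinv @ v

u = rng.standard_normal(n)
for k in range(n):
    # Pairing u against the k-th row of C reproduces the k-th coordinate of u.
    assert abs(inner(u, C[k]) - u[k]) < 1e-8
```

The check works for any nonsingular covariance, since C^{−1}C is the identity.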
For a slightly more interesting example, take f = W to be standard Brownian
motion on T = [0, 1], so that C(s, t) = min(s, t). Note that C(s, ·) is differentiable
everywhere except at s, so that following the heuristics developed above we expect
that H (C) should be made up of a subset of functions that are differentiable almost
everywhere.
To both make this statement more precise and prove it, we start by looking at the
space S. Thus, let
n
n
u(t) = ai C(si , t), v(t) = bi C(ti , t),
i=1 i=1
Since the derivative of C(s, t) with respect to t is 1[0,s] (t), the derivative of u is
n
Therefore,
i=1 ai 1[0,si ] (t).
1
(u, v)H = a i bj 1[0,si ] (t)1[0,tj ] (t) dt
0
1
= ai 1[0,si ] (t) bj 1[0,tj ] (t) dt
0
1
= u̇(t)v̇(t) dt.
0
1 A simple proof by contradiction shows that there can never be two different inner products
on S with the reproducing kernel property.
With S under control, we can now look for an appropriate candidate for the RKHS.
Define

  H = { u : u(t) = ∫₀^t u̇(s) ds,  ∫₀¹ (u̇(s))² ds < ∞ },    (3.1.3)

with the inner product (u, v)_H = ∫₀¹ u̇(s)v̇(s) ds computed above. Since S is contained
in H and is dense in it under the corresponding norm, it follows that the H defined
by (3.1.3) is indeed our RKHS. This H is also known, in the setting of diffusion
processes, as a Cameron–Martin space.
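A quick discretized check of the reproducing property in this Cameron–Martin space: pairing a path u against C(s,·) = min(s,·) under the inner product ∫₀¹ u̇v̇ should return u(s). The particular path, the point s, and the grid size below are arbitrary illustrative choices:

```python
import numpy as np

# Grid on [0, 1].
m = 10_000
t = np.linspace(0.0, 1.0, m + 1)
dt = 1.0 / m

# A smooth path u with u(0) = 0, and its derivative.
u = np.sin(np.pi * t) + t**2
udot = np.pi * np.cos(np.pi * t) + 2.0 * t

# d/dt min(s, t) = 1_[0,s](t), so (u, C(s,.))_H = int_0^s udot(t) dt = u(s).
s = 0.37
kdot = (t <= s).astype(float)
inner = np.sum(udot * kdot) * dt      # crude Riemann sum
assert abs(inner - (np.sin(np.pi * s) + s**2)) < 1e-3
```

The residual here is pure discretization error; refining the grid shrinks it at rate O(1/m).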
With a couple of examples under our belt, we can now return to our main task:
setting up the expansion (3.0.1). The first step is finding a countable orthonormal
basis for the separable Hilbert space H (C). We start with H = span{ft , t ∈ T },
with the (covariance) inner product that H inherits as a subspace of L²(P). Next we
define a linear mapping Θ : S → H by

  Θ(u) = Θ( Σ_{i=1}^n a_i C(t_i, ·) ) = Σ_{i=1}^n a_i f(t_i).

Note that Θ is a linear isometry, since for u = Σ a_i C(t_i, ·) and v = Σ b_j C(t_j, ·) in S
it follows from (3.1.1) that E{Θ(u)Θ(v)} = Σ_{i,j} a_i b_j C(t_i, t_j) = (u, v)_H.
Consequently, Θ extends to all of H(C) with range equal to all of H, with all limits
remaining Gaussian. This extension is called the canonical isomorphism between
these spaces.
Since H(C) is separable, we now also know that H is, and proceed to build an
orthonormal basis for it. If {ϕ_n}_{n≥1} is an orthonormal basis for H(C), then setting
ξ_n = Θ(ϕ_n) gives {ξ_n}_{n≥1} as an orthonormal basis for H. In particular, we must have
that the ξ_n are N(0, 1) and

  f_t = Σ_{n=1}^∞ ξ_n E{f_t ξ_n},    (3.1.5)
where the series converges in L²(P). Since Θ is an isometry, it follows from
(3.1.5) that

  E{f_t ξ_n} = (C(t, ·), ϕ_n)_H = ϕ_n(t),    (3.1.6)
the last equality coming from the reproducing kernel property of H (C). Putting
(3.1.6) together with (3.1.5) now establishes the following central result.
Theorem 3.1.1. If {ϕn }n≥1 is an orthonormal basis for H (C), then f has the L2
representation
  f_t = Σ_{n=1}^∞ ξ_n ϕ_n(t),    (3.1.7)
where {ξn }n≥1 is the orthonormal sequence of centered Gaussian variables given by
ξn = (ϕn ).
The equivalence in (3.1.7) is only in L2 ; i.e., the sum is, in general, convergent,
for each t, only in mean square. The following result shows that much more is true
if we know that f is a.s. continuous.
Theorem 3.1.2. If f is a.s. continuous, then the sum in (3.1.7) converges uniformly
on T with probability 1.2
We need two preliminary results before we can prove Theorem 3.1.2. The first is
a convergence result due to Itô and Nisio [83], which, since it is not really part of a
basic probability course,3 we state in full, and the second an easy lemma.
Lemma 3.1.3. Let {Z_n}_{n≥1} be a sequence of symmetric independent random vari-
ables, taking values in a separable, real Banach space B, equipped with the norm
topology. Let X_n = Σ_{i=1}^n Z_i. Then X_n converges with probability one if and only if
there exists a B-valued random variable X such that ⟨X_n, x*⟩ → ⟨X, x*⟩ in proba-
bility for every x* ∈ B*, the topological dual of B.
Lemma 3.1.4. Let {ϕ_n}_{n≥1} be an orthonormal basis for H(C). Then, under our glo-
bal assumption of continuity of the covariance function, each ϕ_n is continuous and

  Σ_{n=1}^∞ ϕ_n²(t)    (3.1.8)

converges uniformly on T.

Proof. Since ‖ϕ_n‖_H = 1, for any s, t ∈ T,

  |ϕ_n(t) − ϕ_n(s)| = |(C(t, ·) − C(s, ·), ϕ_n)_H|
                    ≤ ‖C(t, ·) − C(s, ·)‖_H ‖ϕ_n‖_H
                    = [C(t, t) − 2C(s, t) + C(s, s)]^{1/2},

where the first and last-but-one equalities use the reproducing kernel property and
the one inequality is Cauchy–Schwarz. The continuity of ϕ_n now follows from that
of C.
To establish the uniform convergence of (3.1.8), note that the orthonormal ex-
pansion and the reproducing kernel property imply
  C(t, ·) = Σ_{n=1}^∞ ϕ_n(·)(C(t, ·), ϕ_n)_H = Σ_{n=1}^∞ ϕ_n(·)ϕ_n(t),    (3.1.9)
where the series converges absolutely and uniformly on [0, 1]N × [0, 1]N .
The key to the Karhunen–Loève expansion is the following result.
Lemma 3.2.2. For f on [0, 1]^N as above, {√λ_n ψ_n} is a complete orthonormal system
in H(C).
Proof. Set ϕ_n = √λ_n ψ_n and define

  H = { h : h(t) = Σ_{n=1}^∞ a_n ϕ_n(t), t ∈ [0, 1]^N,  Σ_{n=1}^∞ a_n² < ∞ },

with inner product

  (h, g)_H = Σ_{n=1}^∞ a_n b_n ,

where h = Σ a_n ϕ_n and g = Σ b_n ϕ_n.
To check that H has the reproducing kernel property, note that

  (h(·), C(t, ·))_H = ( Σ_{n=1}^∞ a_n ϕ_n(·), Σ_{n=1}^∞ √λ_n ψ_n(t) ϕ_n(·) )_H
                    = Σ_{n=1}^∞ √λ_n a_n ψ_n(t) = h(t).
It remains to be checked that H is in fact a Hilbert space, and that {√λ_n ψ_n} is both
complete and orthonormal. But all this is standard, given Mercer's theorem.
We can now start rewriting things to get the expansion we want. Remaining with
the basic notation of Mercer's theorem, we have that the RKHS, H(C), consists of
all square-integrable functions h on [0, 1]^N for which

  Σ_{n=1}^∞ λ_n^{−1} ( ∫_T h(t)ψ_n(t) dt )² < ∞.

The Karhunen–Loève expansion of f is obtained by setting ϕ_n = λ_n^{1/2} ψ_n in the
orthonormal expansion (3.1.7), so that

  f_t = Σ_{n=1}^∞ λ_n^{1/2} ξ_n ψ_n(t),    (3.2.3)

on I^N = [0, 1]^N.
For the moment, we set N = 1 and so have the standard Brownian motion
on [0, 1], for which we have already characterized the corresponding RKHS as the
Cameron–Martin space. For Brownian motion (3.2.1) becomes

  λψ(t) = ∫₀¹ min(s, t)ψ(s) ds = ∫₀^t sψ(s) ds + t ∫_t¹ ψ(s) ds.
Returning now to the general Brownian sheet, it is now clear from the product
form (3.2.4) of the covariance function that rather than taking a single sum in the
Karhunen–Loève expansion, it is natural to work with an N -dimensional sum and a
multi-index n = (n1 , . . . , nN ). With this in mind, it follows then from (3.2.5) that
the eigenfunctions and eigenvalues are given by
  ψ_n(t) = 2^{N/2} ∏_{i=1}^N sin( (2n_i + 1)πt_i/2 ),

  λ_n = ∏_{i=1}^N ( 2/((2n_i + 1)π) )².
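For N = 1 these formulas give the classical Karhunen–Loève basis for Brownian motion on [0, 1], and the truncated sum (3.2.3) is exactly the simulation recipe alluded to at the start of the chapter. A minimal sketch (grid size, truncation level, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Karhunen-Loeve data for Brownian motion on [0, 1] (the N = 1 case):
# psi_n(t) = sqrt(2) sin((2n+1) pi t / 2), lambda_n = (2 / ((2n+1) pi))^2.
m, terms = 200, 500
t = np.linspace(0.0, 1.0, m + 1)
n = np.arange(terms)
lam = (2.0 / ((2 * n + 1) * np.pi)) ** 2
psi = np.sqrt(2.0) * np.sin(np.outer((2 * n + 1) * np.pi / 2.0, t))

# One sample path: f_t = sum_n lambda_n^{1/2} xi_n psi_n(t), xi_n iid N(0,1).
xi = rng.standard_normal(terms)
path = (np.sqrt(lam) * xi) @ psi
assert abs(path[0]) < 1e-12           # every psi_n vanishes at t = 0

# The truncated Mercer sum should reproduce C(s, t) = min(s, t).
cov = (lam[:, None] * psi).T @ psi
assert np.abs(cov - np.minimum.outer(t, t)).max() < 0.01
```

The covariance check makes the truncation error visible: increasing `terms` drives the discrepancy with min(s, t) to zero, which is Theorem 3.1.1 (and Mercer's theorem) in action.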
As an aside, there is also an elegant expansion of the Brownian sheet using the
Haar functions, which also works in the set-indexed setting, due to Ron Pyke [129].
Another class of examples that can almost be handled via the Karhunen–Loève
approach is that of stationary fields defined on all of (noncompact) RN . Since what
is about to come is rather imprecise, we shall allow ourselves the standard notational
luxury of the stationary theory of taking our field to be complex-valued, despite the
fact that all that we have done in this chapter was for real-valued processes. In this
case the covariance function is C(s, t) = E{fs ft } = C(t − s) and is a function of
t − s only. It is then easy to find eigenfunctions for (3.2.1) via complex exponentials.
Note that for any λ ∈ R^N, the function e^{i⟨t,λ⟩} (a function of t ∈ R^N) satisfies

  ∫_{R^N} C(s, t) e^{i⟨s,λ⟩} ds = ∫_{R^N} C(t − s) e^{i⟨s,λ⟩} ds = e^{i⟨t,λ⟩} ∫_{R^N} C(u) e^{−i⟨u,λ⟩} du,

so that e^{i⟨t,λ⟩} is an eigenfunction of the covariance operator.
These are, respectively, special cases of the spectral distribution theorem (cf. (5.4.1))
and the spectral representation theorem (cf. (5.4.6)) of Chapter 5 when the spectral
measure is discrete.
Despite the minor irregularity of assuming that f is complex-valued, the above
argument is completely rigorous. On the other hand, what follows is not.
If K_λ ≠ 0 on an uncountable set, then the situation becomes more delicate,
but is nevertheless worth looking at. In this case, one could imagine replacing the
summations in (3.2.6) and (3.2.7) by integrals, to obtain
  C(t) = ∫_{R^N} K_λ e^{i⟨t,λ⟩} dλ

and

  f(t) = ∫_{R^N} K_λ^{1/2} ξ_λ e^{i⟨t,λ⟩} dλ.    (3.2.8)
Everything is well defined in the first of these integrals, but in the second we have
the problem that the ξλ should be independent for each λ, and it is well known that
there is no measurable way to construct such a process.
Nevertheless, we shall see in Chapter 5 that there are ways to get around these
problems, and that when properly formulated, (3.2.8) actually makes sense.
4
Excursion Probabilities
As we have already mentioned more than once before, one of the oldest and most
important problems in the study of stochastic processes of any kind is to evaluate the
excursion probabilities
  P{ sup_{t∈T} f(t) ≥ u },
where f is a random process over some parameter set T . As usual, we shall restrict
ourselves to the case in which f is centered and Gaussian and T is compact for the
canonical metric of (1.3.1).
While the Borell–TIS inequality, Theorem 2.1.1, gives an easy and universal
bound to Gaussian excursion probabilities, it is far from optimal. In fact, comput-
ing excursion probabilities for general Gaussian processes is a surprisingly difficult
problem. Even the simple case of T = [0, 1] is hard. In this one-dimensional case,
there is hope for obtaining an explicit expression for the excursion probability if f is
Markovian. For example, if f is Brownian motion, then, as any elementary textbook
on stochastic processes will tell you, the so-called reflection principle easily yields
that P{sup_{0≤t≤1} f(t) ≥ u} = 2Ψ(u), where Ψ is the Gaussian tail probability (1.2.1).
However, if we turn to stationary processes on R, then there are only five non-
trivial1 Gaussian processes for which the excursion probability has a known analytic
form, and the resulting formulas, while amenable to numerical computation, are not
generally very pretty.2
In each of these five examples the process is either Markovian or close to Marko-
vian, and this is what makes the computation feasible. Similarly, none of the processes
has even a mean square derivative, and so none are particularly smooth. In the case
of smooth processes, or processes over parameter spaces richer than the unit interval,
the very specific tools that work for these very special cases fail completely. What
one can do in the general case is the subject of this chapter and of Chapter 14.
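The reflection-principle formula quoted above also makes a convenient Monte Carlo benchmark. The sketch below discretizes Brownian motion on a crude grid, so the empirical excursion probability matches 2Ψ(u) only up to discretization and sampling error; the grid size, sample count, threshold, and tolerance are all arbitrary:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

# Discretized Brownian motion on [0, 1]: m steps, `paths` independent paths.
m, paths, u = 400, 20_000, 1.0
increments = rng.standard_normal((paths, m)) / math.sqrt(m)
sup = np.maximum(np.cumsum(increments, axis=1).max(axis=1), 0.0)

empirical = np.mean(sup >= u)
Psi = 0.5 * math.erfc(u / math.sqrt(2.0))   # Psi(u) = P{N(0,1) >= u}
# Reflection principle: P{sup_{[0,1]} f >= u} = 2 Psi(u).
assert abs(empirical - 2.0 * Psi) < 0.05
```

The discrete maximum systematically undershoots the continuous supremum, which is one small instance of why excursion probabilities are delicate even in this, the simplest, case.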
1 There is also one close-to-trivial but extremely enlightening case, with covariance function
cos(ωt), for which elementary calculations give the excursion probability. We shall discuss
this in some detail in Section 14.4.4.
2 The five covariance functions are
Our ultimate aim will be to develop, when possible and appropriate, expansions
of the form
  P{ sup_{t∈T} f(t) ≥ u } = u^α e^{−u²/2σ_T²} [ Σ_{j=0}^n C_j u^{−j} + error ]    (4.0.1)
for large u and appropriate parameters α, σT2 , n, and Cj that depend on both f and
T . Furthermore, we would like to be able to identify the constants Cj and also to be
able to say something about the error term.
In Chapter 14 we shall indeed establish such a result, with a full identification
of all constants. However, we shall have to make two major assumptions. One is
that the parameter space T is a piecewise smooth manifold, and the second is that
f is almost surely C 2 . The current chapter will make fewer assumptions, and will
concentrate on general approaches relying heavily on the concept of entropy from
Chapter 1. As a consequence, we shall generally be able to identify only the first term
in (4.0.1). Even then, while we shall be able to determine α and σ_T², only in rare
cases shall we also be able to identify C₀.
Furthermore, so as to keep the treatment down to a reasonable length, we shall
generally concentrate on upper bounds, rather than expansions, for Gaussian excur-
sion probabilities.
This chapter contains six sections. The first four have just been described. The
remaining two are both brief, and are meant as quick introductions to techniques that
can lead to sharper results under additional assumptions. They lie between the general
results of the first four sections and the very sharp results, for smooth Gaussian fields
on structured domains, of Chapter 14.
where in the last case, β ∈ (0, 1), and the computation of the excursion probability is known
only for f (0) conditioned to be 0. For details, see [145, 43] and the review [26].
  P{ sup_{t∈T} f_t ≥ u } ≤ e^{εu²} e^{−u²/2σ_T²},    (4.1.1)
where σT2 = supT E{ft2 } (cf. (2.1.4)). This is in the classic form of a large-deviation
result, and holds with no assumptions beyond the almost sure boundedness of f .
Our aim is to improve on (4.1.1) by placing side conditions on the entropy (cf.
Definition 1.3.2) of f over T . Here is a very easy, and definitely suboptimal, result.
Theorem 4.1.1. Let f be a centered, a.s. continuous Gaussian field over T with
entropy function N. If N(ε) ≤ Kε^{−α}, then for all sufficiently large u,

  P{ sup_{t∈T} f(t) ≥ u } ≤ C_α u^{α+η} e^{−u²/2σ_T²},    (4.1.2)
Proof. Define

  μ(t, ε) = E{ sup_{s∈B_d(t,ε)} f_s }    and    μ(ε) = sup_{t∈T} μ(t, ε),    (4.1.3)

where B_d(t, ε) is a ball of radius ε around t in the canonical metric d of (1.3.1). Since
N(ε) balls of radius ε cover T , it is an immediate consequence of the Borell–TIS
inequality that for u > μ(ε),
  P{ sup_{t∈T} f(t) ≥ u } ≤ N(ε) e^{−(u−μ(ε))²/2σ_T²}.    (4.1.4)
The bound in (4.1.2) can be improved slightly, to C(u log u)^α e^{−u²/2σ_T²}, by choosing
ε = ε(u) in the above proof to satisfy u^{−1} = ε(log(1/ε))^{1/2}. The gain, however,
is rather small, in view of the fact that far stronger results hold. In particular, in a
series of papers [135, 136, 150, 137, 153] by Samorodnitsky and Talagrand, with a
leapfrogging of ideas and techniques,3 the results of the following three theorems,
among others, were obtained. In each, f is a priori assumed to have continuous
sample paths with probability one, σT2 = supT E{ft2 }, and at each appearance, K
represents a universal constant that may differ between appearances.
Theorem 4.1.2. If for some A > σ_T, some α > 0, and some ε₀ ∈ [0, σ_T] we have

  N(T, d, ε) ≤ (A/ε)^α    (4.1.7)

for all ε < ε₀, then for u ≥ σ_T² √(1 + α)/ε₀ we also have

  P{ sup_{t∈T} f_t ≥ u } ≤ ( KAu/(√α σ_T²) )^α Ψ(u/σ_T),    (4.1.8)
  N(T_δ, d, ε) ≤ Aδ^β ε^{−α}.    (4.1.10)

Then for u ≥ 2σ_T√β,

  P{ sup_{t∈T} f_t ≥ u } ≤ A (β^{β/2}/α^{α/2}) K^{α+β} (u/σ_T²)^{α−β} Ψ(u/σ_T).    (4.1.11)
Theorem 4.1.4. Suppose there exist A, B > 0 and α ∈ (0, 2) such that
In a moment we shall turn to proofs of the first two of these results,5 adopting
(and occasionally correcting typos and oversights in) the proofs in [153]. Note that
(4.1.10) raises an interesting question if we (formally) take α = β. In that
case the upper bound is of the form C(α, β)Ψ(u/σ_T), and Ψ(u/σ_T) also serves as a
trivial lower bound, corresponding to C = 1. Thus it is natural to ask if there are
scenarios in which one can in fact also take C = 1 in the upper bound. It turns out
that there are, and we shall look at these in Section 4.2.
Theorems 4.1.2 and 4.1.3 treat entropy functions for which the growth of N in
ε is of a power form, which, at least for processes indexed by points in Euclidean
spaces or smooth manifolds, are the most common. Theorem 4.1.4, which we shall
not prove, handles the scenario of exponential entropy and is a qualitatively different
result. Note that you cannot set α = 0 in this result to recover either Theorem 4.1.2
or 4.1.3. The upper bound given here is, under mild side conditions, also a lower
bound.
Before reading further, you might want to turn now to Section 4.3 to see some
concrete examples for which the above results apply. Here, however, we shall give
the proofs of Theorems 4.1.2 and 4.1.3. The central idea is old, and goes back at
least to the works of Berman [18, 19] and Pickands [123, 124, 125] in the mid to late
1960s, which themselves have roots in Cramér’s classic paper [41].
The basic approach lies in looking for a subset Tmax ⊂ T (very often a unique
point in T ) where the maximal variance is achieved, and then studying two things:
the “size’’ of Tmax (e.g., as measured in terms of metric entropy) and how rapidly
E{ft2 } decays as one moves out of Tmax . The underlying idea is that the supremum
of the process is most likely to occur in or near the region of maximal variance, and
the rate of decay of E{ft2 } outside that region determines how best to account for the
impact of nearby regions.
A key lemma, on which all the proofs we shall give rely, is the following result
of [153]. Recall throughout the assumption of Chapter 1 that the parameter space T
is compact for the pseudometric d, which we shall assume is still in place.
Proof. While the proof is rather technical, the idea behind it is actually quite simple.
It starts by defining the function a_t by

  a_t = E{(f_t − X)X}/σ² ≤ 0,

and the random process g_t via

  f_t = g_t + (1 + a_t)X.
Note that E{gt X} = 0. That is, X and the full process g are independent.
The argument then works as follows. There are essentially three ways for supT ft
to be larger than u. One is for supT gt to be large. One is for each of X and supT gt
to be large enough for the sum to be large, and the last is for X to be large. The
three terms in (4.1.13) correspond, in order, to these three cases, and each is obtained
by conditioning on the distribution of X and then using the Borell–TIS inequality to
handle supT gt . That there is only one term in (4.1.14) comes from the fact that here u
is even larger, and then the last term actually dominates all others. The reason for this
is that from the assumption at ≤ 0 and a < σ it follows that E{X 2 } > supT E{gt2 },
and so it is “easier’’ for X to reach larger values than it is for g. Now read on.
It is trivial to check that for all s, t ∈ T,

  E{(g_t − g_s)²} ≤ E{(f_t − f_s)²}.
However, since X and the process g are independent, we can drop the last condition-
ing, and since
  E{g_t²} ≤ E{(g_t + a_t X)²} = E{(f_t − X)²} ≤ a²,

it follows from (4.1.15) and the Borell–TIS inequality (Theorem 2.1.1) that for 0 ≤
x ≤ u − μ,

  P{ sup_{t∈T} f_t ≥ u | X = x } ≤ exp( −(u − x − μ)²/2a² ).    (4.1.16)
  P{ sup_{t∈T} f_t ≥ u } = ∫_{−∞}^∞ P{ sup_{t∈T} f_t ≥ u | X = x } (e^{−x²/2σ²}/√(2π)σ) dx
                         = ∫_{−∞}^0 + ∫_0^{u−μ} + ∫_{u−μ}^∞
                         = I₁ + I₂ + I₃.
The term I3 is trivially bounded by the last term in (4.1.13). For I1 , bound the
integrand by (4.1.17) and integrate out x to obtain the first term on the right-hand side
of (4.1.13). For the intermediate term, apply (4.1.16) and note, with s = u − μ, that
  I₂ ≤ (1/√(2π)σ) ∫_{−∞}^∞ exp( −(s − x)²/2a² − x²/2σ² ) dx
     = (a/√(a² + σ²)) exp( −s²/2(σ² + a²) ),
evaluating the integral by completing the square. Thus the first part of the lemma,
(4.1.13), is established.
It remains to prove (4.1.14). We start by observing that
  exp( −(u − μ)²/2a² ) = exp( −(u − μ)²/2(a² + σ²) ) exp( −(u − μ)²σ²/2a²(a² + σ²) ).

Since σ > a and u > μ + σ,

  exp( −(u − μ)²σ²/2a²(a² + σ²) ) ≤ exp( −σ²/4a² ) ≤ a/σ.
Thus the sum of the first two terms on the right-hand side of (4.1.13) is bounded
above by

  (2a/σ) exp( −(u − μ)²/2(a² + σ²) ).

Since

  1/(a² + σ²) ≥ 1/σ² − a²/σ⁴,

the previous expression is at most

  (2a/σ) exp( −(u − μ)²/2σ² + u²a²/2σ⁴ )
    ≤ (2au(u − μ)/σ³) exp( u²a²/2σ⁴ ) exp( −(u − μ)²/2σ² ),
on applying the various inequalities between a, σ, u, and μ. Apply the basic Gaussian
tail bound (1.2.2) to bound the above by

  (Kua/σ²) Ψ((u − μ)/σ) exp( u²a²/2σ⁴ ),
for a universal K. To complete the argument note that for x ≥ 1, y ≥ 0,

  Ψ(x − y) ≤ e^{2xy} Ψ(x).

(This is easily proven from the fact that f(y) = e^{2xy}Ψ(x) − Ψ(x − y) satisfies
f(0) = 0 and has positive derivative, since

  f′(y) = 2x e^{2xy} Ψ(x) − (1/√(2π)) e^{−(x−y)²/2} ≥ 0,

again applying the Gaussian tail bound (1.2.2).)
While Lemma 4.1.5 will be the main tool for the proofs of Theorems 4.1.2
and 4.1.3, we first need a counting result about partitioning our (pseudo)metric
space (T , d).
Lemma 4.1.6. Let (T, d) be a metric space and take p, q integers, p < q. Let P_q
be a partition of T into sets of diameter no more than 4^{−q+1}. Take a set of integers k_l,
p ≤ l ≤ q. Then there exists an increasing sequence {P_l}_{p≤l≤q} of partitions of T
with the following properties:

  P_{l+1} is a refinement of P_l; i.e., for each A ∈ P_{l+1} there is an
  A′ ∈ P_l such that A ⊆ A′.    (4.1.18)

  For each set A ∈ P_l, diam(A) ≤ 4^{−l+1}.    (4.1.19)

  Each set of P_l contains at most k_l sets of P_{l+1}.    (4.1.20)

  |P_l| ≤ N(4^{−l}) + |P_{l+1}|/k_l, for all l < q,    (4.1.21)

where N(ε) is the metric entropy function for (T, d) and |P_l| is the number of sets
in P_l.
Proof. We shall construct the partitions P_l by decreasing induction over l; i.e., we
show how to construct P_l once P_{l+1} is given.
Set N = N(4^{−l}) and consider points {t_i}_{1≤i≤N} of T such that

  sup_{t∈T} inf_i d(t, t_i) ≤ 4^{−l}.

For i ≤ N let A_i be the union of all the sets in P_{l+1} that intersect B_d(t_i, 4^{−l}). Thus,
since the sets in P_{l+1} have diameter no larger than 4^{−l}, A_i has diameter at most
4^{−l+1}.
Corollary 4.1.7. Suppose that in Lemma 4.1.6 we have, for some A, α > 0, N(4^{−l}) ≤
(A4^l)^α if l ≥ p and |P_q| ≤ (A4^q)^α. Then, if k_l ≥ 2 · 4^α for all l, we have

  |P_l| ≤ 2(A4^l)^α.
Proof. Using the construction given in the proof of the lemma, and by decreasing
induction over l,

  |P_l| ≤ (A4^l)^α + 2(A4^{l+1})^α/k_l ≤ 2(A4^l)^α.
Proof of Theorem 4.1.2. The argument is by partitioning, much as for the proof of
Theorem 4.1.1. The improvement in the result relies on being more careful as to
how the partitioning is carried out, and from a judicious application of Lemma 4.1.5,
which we did not previously have.
Start by choosing a < σT and μ > 0. Then choose a partition {Ti }i≤N of T into N
compact sets, each of diameter no more than a and for each of which E{supTi ft } ≤ μ.
Then, for u > max(2μ, μ + σT ), it follows from (4.1.14) (by taking X to be f taken
at a point of maximal variance in each Ti ) that
au a 2 u2 2uμ u
P sup ft ≥ u ≤ N 1 + K 2 exp 4
exp 2
, (4.1.22)
t∈T σT 2σT σT σ T
where K is a universal constant. The free variables here are μ and a, and the problem
is that N → ∞ as a, μ → 0. The trick therefore is to find a sequence of partitions
for which N does not grow too fast.
As usual, our first step is to assume that T is finite and once again apply the
argument at the end of the proof of the Borell–TIS inequality for moving from finite
to infinite T .6 Thus we can, and do, assume that T is finite, and can now start the
serious part of the proof.
Since T is finite, we can consider it as a partition of itself, and find a q large enough so that d(s, t) ≥ 4^{−q+1} for all s ≠ t ∈ T. Applying Corollary 4.1.7 with k_l = ⌊2·4^α⌋ + 1 ≤ 3·4^α, and for each l ≤ q with 4^{−l} ≤ ε_0, we find a further partition of T into N ≤ 2(A4^l)^α sets {T_i}_{i≤N} such that, for m ≥ l, each T_i contains at most (3·4^α)^{m−l} sets of P_m, for all i ≤ N. The last inequality follows from repeated applications of (4.1.20). Using this bound in Theorem 1.3.3, it is a simple calculation to see that

E{ sup_{t∈T_i} f_t } ≤ K√α 4^{−l},   (4.1.23)
Proof of Theorem 4.1.3. As before, the proof uses a partitioning argument. This time, however, we start by partitioning the space into regions depending on the size of E{f_t²}. In each of these regions we apply Theorem 4.1.2 to bound the supremum, so that the hard part of the bounding has already been done. The rest of the proof is really just accounting, to make sure that the various pieces add up appropriately.
To start, fix u > 2σ_T√β. Set δ_0 = 0, δ_1 = √β σ_T²/u, and, for k ≥ 1, set δ_k = 2^{k−1}δ_1. For k ≥ 1 set U_k = T_{δ_k} \ T_{δ_{k−1}}. Set ε_0 = δ_1(1 + √α)/√β, and note that

σ_T²(1 + √α)/ε_0 ≤ σ_T²√β/δ_1 = u.
Applying Theorem 4.1.2 to each U_k, we see that

P{ sup_{t∈U_k} f_t ≥ u } ≤ A δ_k^β ( Ku/(√α σ_T²) )^α exp( −u²δ_{k−1}²/(2σ_T⁴) ) Ψ(u/σ_T).
86 4 Excursion Probabilities

Note that

Σ_{k≥1} δ_k^β exp(−u²δ_{k−1}²/(2σ_T⁴)) = δ_1^β + Σ_{k≥2} δ_k^β exp(−u²δ_{k−1}²/(2σ_T⁴))
   ≤ δ_1^β ( 1 + Σ_{k≥2} 2^{βk} exp(−β2^{2k−3}) )
   ≤ (Kδ_1)^β,
the last line requiring β > 1. Consequently, if we set T′ = ∪_{k≤k_0} U_k, we have, recalling the value of δ_1, that

P{ sup_{t∈T′} f_t ≥ u } ≤ (A^β β^{β/2} / α^{α/2}) K^{β+α} (u/σ_T²)^{α−β} Ψ(u/σ_T).   (4.1.24)
For t ∉ T′ we have

E{f_t²} ≤ σ_T² − δ_{k_0−1}² ≤ σ_T² − σ_T²/16 = 15σ_T²/16.
Thus, if we now apply the entropy bound (4.1.10), we see that for u > σ_T√β we have

P{ sup_{t∈T\T′} f_t ≥ u } ≤ A σ_T^β ( Ku/(√α σ_T²) )^α Ψ( 4u/(σ_T√15) ).   (4.1.25)

To conclude, it suffices to check that for x ≥ √β we have Ψ(4x/√15) ≤ (Kβ)^{β/2} x^{−β} Ψ(x), so that the right-hand side of (4.1.25) is dominated by the right-hand side of (4.1.11).
We shall now look at a rather interesting special case. We know already from the discussion following Theorem 4.1.3 that there would seem to be cases in which the trivial lower bound

P{ sup_{t∈T} f(t) ≥ u } ≥ sup_{t∈T} P{f(t) ≥ u} = Ψ(u/σ_T)

is asymptotically sharp.
Suppose that

lim_{δ→0} E{sup_{t∈T_δ} f_t} / δ = 0.   (4.2.2)

Then

lim_{u→∞} P{sup_{t∈T} f(t) ≥ u} / Ψ(u/σ_T) = 1.   (4.2.3)
There is actually a converse to this result, which states that (4.2.3) implies the
existence of a t0 ∈ T for which (4.2.1) and (4.2.2) hold. You can find a proof of the
converse in [151].
Proof. Since it is a triviality that

P{ sup_{t∈T} f_t ≥ u } ≥ P{f_{t_0} ≥ u} = Ψ(u/σ_T),

the theorem will be proven once we establish (4.2.3) with a lim sup rather than a lim.
To start, take η ∈ (0, 1) and, by (4.2.2), δ_0 small enough so that

δ ≤ 2δ_0  ⇒  E{ sup_{t∈T_δ} f_t } ≤ η²δ.   (4.2.4)
Fix u and set α = σ_T²/ηu. Adopting and adapting the main idea in the proof of Theorem 4.1.3, define a nondecreasing sequence of subsets of T, and a sequence of "annuli,'' by setting V_{−1} = ∅ and, for k ≥ 0,

V_k = T_{2^k α},  U_k = V_k \ V_{k−1}.

Then, with p the smallest integer for which 2^p α ≥ δ_0,

T_{δ_0} ⊂ ∪_{0≤k≤p} U_k,

and we shall now try to obtain the bound in (4.2.6) by obtaining a similar bound for each U_k and then combining them via a union bound.
Setting μ_k = E{sup_{t∈V_k} f_t}, (4.2.4) implies that μ_k ≤ αη²2^k for k ≤ p. Setting

ω_k = sup_{t∈V_k} ( E{ (f_t − f_{t_0})² } )^{1/2},

we have

ω_k = sup_{t∈V_k} K E|f_t − f_{t_0}| ≤ K E{ sup_{t∈V_k} |f_t − f_{t_0}| } ≤ Kμ_k ≤ Kαη²2^k,   (4.2.7)

the first equality being a standard Gaussian identity and the second inequality coming from the fact that t_0 ∈ V_k.
By (4.1.14) with X = f_{t_0} we therefore have

P{ sup_{t∈U_0} f_t ≥ u } ≤ (1 + Kη e^{Kη²/2}) e^{2η} Ψ(u/σ_T) ≤ (1 + Kη) Ψ(u/σ_T).
We use a similar argument for the remaining U_k. Take the random variable X of Lemma 4.1.5 to be X_k = (1 − (α2^{k−1})²/σ_T²) f_{t_0}. It is then easy to check that the definition of T_δ implies that (4.1.12) holds. Furthermore, since k ≤ p and δ_0 ≤ σ_T η²,

( E{ (X_k − f_{t_0})² } )^{1/2} = (α2^{k−1})²/σ_T ≤ α2^{k−1} δ_0/σ_T ≤ η²α2^k.

It follows from (4.2.7) and the triangle inequality that if we set

a_k² = sup_{t∈V_k} E{ (f_t − X_k)² },

then a_k ≤ Kη²α2^k. Thus, by (4.1.14) and the fact that η < 1 we have
P{ sup_{t∈U_k} f_t ≥ u } ≤ ( 1 + Kη2^k e^{Kη²2^{2k}} ) e^{Kη2^k} Ψ( u / (σ_T(1 − (α2^{k−1})²/σ_T²)) )
   ≤ e^{Kη2^{2k}} Ψ( u / (σ_T(1 − (α2^{k−1})²/σ_T²)) ).

Note also that

u / (σ_T(1 − (α2^{k−1})²/σ_T²)) ≥ u/σ_T + (α2^{k−1})² u/(2σ_T³).
Since the last sum can be made arbitrarily small by taking η small enough, we have
established (4.2.6) and so the theorem.
4.3 Examples
Theorem 4.2.1 provided us with conditions under which the supremum of a Gaussian process has, at least in the tail, a distribution behaving like that of a single Gaussian variable. Here is a class of examples for which this happens, for which we shall require that

lim_{t→0} p²(t)/|t| = 0.   (4.3.1)
Without doubt the best-known class of random fields satisfying the conditions of the example are the so-called fractional Brownian motions or fields, which have covariance function

C(s, t) = ½( |t|^{2α} + |s|^{2α} − |t − s|^{2α} ).

While these processes have isotropic increments for all α ∈ (0, 1), it is only when α > 1/2 that p² is convex. More information on these processes can be found, for example, in [138].
Proof. The proof requires no more than checking that conditions (4.2.1) and (4.2.2) of Theorem 4.2.1 hold. Since f has stationary increments, (4.2.1) is clearly satisfied by taking t_0 = T. Thus we need only show that (4.2.2) holds, namely,

lim_{δ→0} E{sup_{t∈T_δ} f_t} / δ = 0.

However, since p²(t) is convex, it has a left derivative at each point, and so it is easy to check (draw a picture) that for t ∈ T_δ we have |T − t| ≤ Kδ² for some finite K. Thus, since f has isotropic increments, it suffices to show that

lim_{δ→0} E{sup_{0≤t≤δ²} |f_t|} / δ = 0.

But this follows immediately from an entropy bound such as Theorem 1.3.3 along with the inequality (1.4.6) and condition (4.3.1).
We now look at two applications of Theorem 4.1.3. The first, which treats a nonstationary process on I^N = [0, 1]^N, is designed to show how sample roughness and nonstationarity interact to determine excursion probabilities. Setting β = ∞ gives a result for stationary processes, and setting α = 2 gives a result for (relatively) smooth processes of a form that we shall study in far more detail in Chapter 14.
The second example shows how to use the general theory to handle a Brownian
sheet problem.
when |t| ≤ γ , for some a, γ > 0 and α ∈ [0, 2]. Let σ (t) be positive, continuous,
and nondecreasing (under the usual partial order) on I N such that
Finally, let σ , without a parameter, denote σ (1, . . . , 1). Then there exists a
(computable) finite C = C(N, a, α, b, β, σ ) > 0 such that for sufficiently large u,
P{ sup_{t∈I^N} h(t) > u } ≤ { C u^{−2/β + 2N/α} Ψ(u/σ),  0 < α < 2β,
                               C Ψ(u/σ),                  0 < 2β ≤ α.     (4.3.6)
Proof. Note firstly that if σ is strictly increasing, then h has a unique point of max-
imum variance, at t0 = (1, . . . , 1). Nevertheless, even in this case, the assumptions
we have made are not strong enough to imply that (4.2.2) holds, so that Theorem 4.2.1
(which is designed for examples in which there is a unique point of maximal variance)
need not apply. Consequently, we apply Theorem 4.1.3, so that the proof relies on
finding a good bound for the entropy N(T_δ, d, ε) of the set

T_δ = { t ∈ I^N : E{h_t²} = σ²(t) ≥ σ² − δ² }.
It is an easy calculation from (4.3.4) that Tδ can be covered by a cube of side length
no more than Cδ 2/β b−1/β , where C = C(N) is a dimension-dependent constant.
Furthermore, in view of the definition (4.3.5) of h,

E{ (h_t − h_s)² } = (σ_t − σ_s)² + 2σ_tσ_s(1 − C(t − s))   (4.3.7)
                 ≤ b²|t − s|^{2β} + 2σ²a|t − s|^α.

Combining this with the comments above on the size of T_δ, we have that N(T_δ, d, ε) ≤ Cδ^{2/β}ε^{−2N/α}. The first case in (4.3.6) now follows from Theorem 4.1.3. The second is similar and left to you.
Our last example deals with the pinned Brownian sheet over N -dimensional rect-
angles. Recall from Section 1.4.3 that the Brownian sheet is a Gaussian noise defined
on the Borel subsets of R^N. The set-indexed, pinned Brownian sheet, based on a probability measure ν on the Borel sets of I^N, is the set-indexed, zero-mean Gaussian process B with covariance function

E{ B(A)B(A′) } = ν(A ∩ A′) − ν(A)ν(A′).   (4.3.8)

Example 4.3.3. Let B be the set-indexed, pinned Brownian sheet based on a probability measure ν and defined over the collection A_N of N-dimensional rectangles in I^N. Assume that ν has a bounded density that is everywhere positive.7 Then there exists a finite C = C(ν) > 0 such that for large enough u,

P{ sup_{A∈A_N} B(A) > u } ≤ C u^{2(2N−1)} e^{−2u²}.   (4.3.9)
where a(u) ≃ b(u) ⇐⇒ lim_{u→∞} a(u)/b(u) is well defined. (Note: the factor (N − 1)! is missing in [62, 63].)
In the one-dimensional case there is no loss of generality in taking ν to be uniform measure on [0, 1], in which case B is the Brownian bridge. We can then do better than (4.3.10). In particular, any standard graduate probability textbook will prove, via an iterative use of the classic reflection principle, that

P{ sup_{t∈[0,1]} |B(t)| > u } = 2 Σ_{k=1}^∞ (−1)^{k−1} e^{−2k²u²},

the first term of which, of course, we have in (4.3.10), albeit without a precise constant. Note that the factor of 2 here comes from the fact that we have looked at the supremum of the absolute value of B. The fact that it is precisely 2 comes from the "well-known'' fact that high maxima and low minima of Gaussian processes are asymptotically independent (e.g., [97]).
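The alternating reflection-principle series for the Brownian bridge is easy to evaluate numerically, and the calculation makes visible how quickly the first term 2e^{−2u²} dominates. A short Python sketch (truncation at 50 terms is far more than needed, since the terms decay like e^{−2k²u²}):

```python
import math

def p_sup_abs_bridge(u, terms=50):
    """P{ sup_{[0,1]} |B(t)| > u } = 2 * sum_{k>=1} (-1)^{k-1} exp(-2 k^2 u^2)."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * u * u)
                     for k in range(1, terms + 1))

# The first term 2 exp(-2 u^2) already dominates for moderate u; its exponent
# is exactly the one appearing in (4.3.10).
for u in (0.5, 1.0, 2.0):
    print(u, p_sup_abs_bridge(u), 2.0 * math.exp(-2.0 * u * u))
```

At u = 1 the full series and its first term already agree to about three decimal places, which is the sense in which the leading exponential order in (4.3.9)-(4.3.10) captures the tail.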
P{ sup_{t∈I^N} B(t) > u } ≤ C u^{2(N−1)} e^{−2u²}.   (4.3.10)
Proof. The first point to note is that since ν is a probability measure, it follows from (4.3.8) that

sup_{A∈A_N} E{B(A)²} = sup_{t∈I^N} E{B(t)²} = 1/4.
Assuming all its conditions hold, Theorem 4.1.3 therefore immediately gives the
exponent of the exponentials in (4.3.9) and (4.3.10). The remainder of the proof
involves an unentertaining amount of straightforward algebra, so we shall be content
with only an outline. In particular, we shall discuss only (4.3.9). The point-indexed
result (4.3.10) is easier.
As in the proof of Example 4.3.2 we need to find a bound for the entropy N(T_δ, d, ε), where T_δ is now given by

T_δ = { A ∈ A_N : E{B²(A)} ≥ 1/4 − δ² }
    = { A ∈ A_N : ν(A) − ν²(A) ≥ 1/4 − δ² }.
If ν is Lebesgue measure, which we now assume to keep the notation manageable, this simplifies to

T_δ = { A ∈ A_N : ½(1 − δ) ≤ ν(A) ≤ ½(1 + δ) }.   (4.3.11)

To approximate the sets in T_δ in terms of the canonical metric, take N-dimensional rectangles whose endpoints sit on the points of the lattice

L_ε^N = { t ∈ [0, 1]^N : t_i = n_iε², n_i ∈ {0, 1, . . . , [ε^{−2}]}, i = 1, . . . , N }.

We can choose each of the first 2N − 1 endpoint coordinates n_i in at most ε^{−2} ways, and the last (in view of (4.3.11)) in at most C_N δε^{−2} ways, so that

N(T_δ, d, ε) ≤ C_N ε^{−4N} δ.
Substituting this into (4.1.10) and (4.1.11) is all we need in order to establish (4.3.9)
and so complete the argument.
If ν is not Lebesgue measure, but has an everywhere positive bounded density,
the same argument works with a few more constants.
4.4 Extensions
There is almost no limit to the number of extensions and variations that exist of the
results of the preceding two sections. Obviously, the more one is prepared to assume,
the more one can obtain.
lim_{h→0} L(h)/h = 0.
Dobric, Marcus, and Weber [48] have shown that if we assume a little more about
L(h) we can improve Theorem 4.2.1 as follows.
Theorem 4.4.1. Suppose there exists a unique t_0 ∈ T such that E{f_{t_0}²} = 1 = sup_T E{f_t²}, and two functions ω_1, ω_2, concave for h ∈ [0, h̄] for some h̄ > 0 with ω_i(0) = 0. Define

h_i(u) = sup{ h : ω_i(h)/h² = u },  i = 1, 2.
If
and
sup_{t∈T_h} E{ [f_t − f_{t_0} E(f_t f_{t_0})]² } < (2 − ε)h²

for h ∈ [0, h̄] for some ε > 0, then there exist constants C_1 and C_2 such that for all u large enough,

e^{C_1 u ω_1(h_1(u))} ≤ P{sup_T f_t ≥ u} / Ψ(u) ≤ e^{C_2 u ω_2(h_2(u))}.
lim sup_{u→∞}  P{sup_{t∈T} f(t) ≥ u} / ( Ψ(u) exp(k_1(1 − ε)uω_1(h_1(u))) )  ≥ 1.
You can find a proof of this result in [48]. It differs in detail, but not in kind, from
what we have seen here.
There are also asymptotic bounds of a somewhat different nature due to Lifshits
[104], who, in a quite general setting, has shown that
4.5 The Double-Sum Method 95
P{ sup_{t∈T} f_t > u } ≃ ( (2 − p)(u^{2−p} − du^{1−p}) / (pσ_T²) )^{1/2}
   × exp( −(2 − p)u²/(2pσ_T²) − d(p − 1)u/(pσ_T²) + d²/(2σ_T²) ),
where Λ = {λ_ij}_{1≤i,j≤N} is the matrix of second-order spectral moments (cf. (5.5.2)). In this case, one can show that

lim_{u→∞}  P{ sup_{t∈[0,T]^N} f_t ≥ u } / ( u^N Ψ(u) )  =  (det Λ)^{1/2} / (2π)^{N/2}.   (4.4.2)
Why this result should be true is something we shall soon investigate in detail and
over parameter spaces far more general than cubes in Part III of the book. Indeed,
we shall obtain far more precise results, of the ilk of (4.0.1).
The single summations are treated much as we did before, by choosing a point t_k ∈ T_k and writing

P{ sup_{t∈T_k} f(t) ≥ u } = ∫_{−∞}^{∞} P{ sup_{t∈T_k} f(t) ≥ u | f(t_k) = x } p_{t_k}(x) dx,
then we would be basically done, since then the double-sum term would easily be
seen to be of lower order than the single sum. Such independence obviously does not
hold, but if we choose the sizes of the Tk in such a fashion that a “typical component’’
of an excursion set is considerably smaller than this size, and manage to show that
high extrema are independent of one another, then we are well on the way to a proof.
The details, which are heavy, are all in Piterbarg [126]. Piterbarg’s monograph
is also an excellent source of worked examples, and includes a number of rigorous
computations of excursion probabilities for many interesting examples of processes
and fields.
We close this chapter with somewhat of a red herring, in that we shall describe an
approach that makes a lot of sense, that has been used and developed with considerable
success by Piterbarg [126] and others, but which we shall nevertheless not explore
further. Despite this, it does give a good heuristic feeling for results we shall develop
in Part III, particularly those of Chapter 14.
We leave the general setting of the previous sections, and concentrate on smooth
random fields over structured parameter spaces. In particular, we shall adopt the
regularity conditions of Chapter 12, in which f will be a smooth Gaussian process
defined over a Whitney stratified manifold M. Since we shall get around to defining
these manifolds only in Section 8.1, you can either return to the current discussion after reading that section, or simply assume that "an N-dimensional Whitney stratified manifold'' is no more than another way of saying "a unit cube I^N ⊂ R^N.'' You will not lose much in the way of intuition in doing so.
Furthermore, since the current section deals with a general idea rather than specific
results, we shall not be more precise about exactly what conditions we require and
shall simply assume that everything we write is well defined. The appropriate rigor
will come in Part III of the book.
Our aim is to connect the excursion probability of f with the mean number of its
local maxima of a certain type.
We start by noting that an N -dimensional Whitney stratified manifold M can be
written as the disjoint union
M = ∪_{k=0}^{N} ∂_k M,   (4.6.1)

of open k-dimensional submanifolds (cf. (9.3.5)). (If you are working with I^N, then this is just a decomposition of I^N into k-dimensional facets, so that ∂_N M = M° is the interior of the cube, ∂_{N−1}M the union of the interiors of the (N − 1)-dimensional faces, and so forth, down to ∂_0 M, which is made up of the 2^N vertices of the cube.)
Let
MuE (M) = # (extended outward maxima of f in M with ft ≥ u) .
In the terminology of Section 9.1, MuE is more precisely defined as the number of
extended outward critical points t ∈ ∂k M ⊂ M of f for which the Hessian of
f |∂k M has maximal index. For the example of I N , these points are the local maxima
t ∈ ∂k M, k = 0, . . . , N, of f |∂k M for which f (tn ) ↑ f (t) whenever the tn ∈ ∂k+1 M
converge to t monotonically (for the usual partial order on RN ) along a direction
normal to ∂k M at t.
Since the notion of extended local maxima9 includes the 0-dimensional sets ∂0 M
(i.e., the “corners’’ of M) it is not hard to see that if supt∈M ft ≥ u, then f must have
at least one extended outward maximum on M, and vice versa (see the argument in
Section 14.1.1 if you want details). Consequently, we can argue as follows:
9 Note that if we write
Mu (M) = #(local maxima of f in M with ft ≥ u),
then one can actually carry through all of the following argument with MuE replaced by
Mu . In fact, this was Piterbarg’s original argument (cf. [126]). Since it is always true that
Mu ≥ MuE , this means that the upper bound given in (4.6.5) below is poorer with this
change. Since the lower bound involves a difference of two terms, each of which is larger
with this change, it is unclear which variation of the argument is better, although one expects
the version with MuE rather than Mu to give the tighter bound. In some sense, the difference
is of academic interest only, since neither bound can be explicitly evaluated. When we turn
to the explicit computations for the Gaussian case in Chapter 14, we shall in any case adopt
a somewhat different approach.
98 4 Excursion Probabilities
P{ sup_{t∈M} f_t ≥ u } = P{M_u^E(M) ≥ 1} ≤ Σ_{k=0}^{N} P{M_u^E(∂_k M) ≥ 1} ≤ Σ_{k=0}^{N} E{M_u^E(∂_k M)}.   (4.6.2)
This gives a simple upper bound for excursion probabilities in terms of the mean
number of outward extended maxima. For a lower bound, note that
{ sup_{t∈M} f_t ≥ u } = {M_u^E(M°) ≥ 1, M_u^E(∂M) = 0} ∪ {M_u^E(∂M) ≥ 1},   (4.6.3)

where ∂M = ∪_{k=0}^{N−1} ∂_k M. Now assume that f, as a function on the N-dimensional
manifold M, does not have any critical points on ∂M. Write
{M_u^E(M°) ≥ 1, M_u^E(∂M) = 0} = {M_u^E(M°) = 1, M_u^E(∂M) = 0} ∪ {M_u^E(M°) ≥ 2, M_u^E(∂M) = 0}.
To compute the probabilities of the above two events, set
pk = P{MuE (M ◦ ) = k},
and note that
E{M_u^E(M°)} = P{M_u^E(M°) = 1} + Σ_{k=2}^{∞} k p_k,
so that
P{M_u^E(M°) = 1, M_u^E(∂M) = 0}
 = P{M_u^E(M°) = 1} − P{M_u^E(M°) = 1, M_u^E(∂M) ≥ 1}
 = E{M_u^E(M°)} − Σ_{k=2}^{∞} k p_k − P{M_u^E(M°) = 1, M_u^E(∂M) ≥ 1}.
In a similar vein,
P{M_u^E(M°) ≥ 2, M_u^E(∂M) = 0} = Σ_{k=2}^{∞} p_k − P{M_u^E(M°) ≥ 2, M_u^E(∂M) ≥ 1}.
Putting the last two equalities together with (4.6.3) immediately gives us that

P{ sup_{t∈M} f_t ≥ u } = E{M_u^E(M°)} − Σ_{k=2}^{∞} (k − 1)p_k + P{M_u^E(∂M) ≥ 1}   (4.6.4)
      − P{M_u^E(M°) ≥ 1, M_u^E(∂M) ≥ 1}
   ≥ E{M_u^E(M°)} − ½ E{M_u^E(M°)[M_u^E(M°) − 1]}

on noting that k − 1 ≤ k(k − 1)/2 for k ≥ 2. Iterate this argument through ∂_k M,
0 ≤ k ≤ N − 1, and combine it with (4.6.2) to obtain the following.
Σ_{k=0}^{N} E{M_u^E(∂_k M)} ≥ P{ sup_{t∈M} f_t ≥ u }   (4.6.5)
   ≥ Σ_{k=0}^{N} [ E{M_u^E(∂_k M)} − ½ E{M_u^E(∂_k M)[M_u^E(∂_k M) − 1]} ].
Of course, we have not really proven Theorem 4.6.1 with the rigor that it deserves,
so you should feel free to call it a conjecture rather than a theorem.
What you should note about this theorem/conjecture is that it makes no distri-
butional assumptions on f beyond ensuring that all the terms it involves are well
defined and finite. It certainly does not require that f be Gaussian. Consequently, it
covers a level of generality far beyond the Gaussian theory we have treated so far.
It also makes sense that it would be quite a general phenomenon for the negative terms in (4.6.5) to be of smaller order than the others. After all, for M_u^E(∂_k M)[M_u^E(∂_k M) − 1] to be nonzero, there must be at least two extended outward maxima of f|_{∂_k M} above the level u on ∂_k M, and this is unlikely to occur if u is large.
Thus Theorem 4.6.1 seems to hold a lot of promise for approximating extremal
probabilities in general, assuming that we could actually compute explicit expressions
for the simple expectations in (4.6.5) along with some useful bounds for the product
expectations.
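Although neither side of (4.6.5) can be computed in closed form in general, the two inequalities themselves are elementary facts about integer-valued counts, and the whole mechanism can be watched at work in simulation. A Python sketch (the random trigonometric field, the grid, the level u, and the crude discrete notion of an "outward maximum'' at an endpoint are all illustrative choices, not the book's definitions):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_smooth_field(n_grid=200):
    """One smooth stationary Gaussian path on [0, 2*pi], built as a random
    trigonometric sum with unit variance (frequencies and weights illustrative)."""
    t = np.linspace(0.0, 2.0 * np.pi, n_grid)
    lam = np.arange(1, 6)
    w = 1.0 / lam**2
    w = w / w.sum()
    xi, eta = rng.standard_normal(lam.size), rng.standard_normal(lam.size)
    return (np.sqrt(w) * (xi * np.cos(np.outer(t, lam)) +
                          eta * np.sin(np.outer(t, lam)))).sum(axis=1)

def count_high_maxima(f, u):
    """Local maxima of the discretized path above level u, counting an endpoint
    as an (outward) maximum when it exceeds its single neighbor."""
    ends = int(f[0] >= u and f[0] > f[1]) + int(f[-1] >= u and f[-1] > f[-2])
    interior = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:]) & (f[1:-1] >= u)
    return ends + int(interior.sum())

u, reps = 1.5, 4000
counts = np.array([count_high_maxima(sample_smooth_field(), u) for _ in range(reps)])
p_exc = (counts >= 1).mean()                          # sup >= u iff some high maximum exists
upper = counts.mean()                                  # estimate of E{M_u}
lower = counts.mean() - 0.5 * (counts * (counts - 1.0)).mean()
print(lower <= p_exc <= upper)
```

On the grid the sandwich holds deterministically, for exactly the reason given in the text: for the second factorial moment to contribute, at least two high maxima must occur simultaneously, which is rare for large u.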
In Chapter 11 we shall work hard to obtain generic expressions that, in principle,
allow us to compute all these expectations. In particular, we shall have Theorem 11.2.1
for the simple means and Theorem 11.5.1 for the factorial moments in (4.6.5). Un-
fortunately, however, carrying out the computations in practice will turn out to be
impossible, even in the case that f is Gaussian. Consequently, we shall be forced to
take a less-direct path to approximating excursion probabilities, by evaluating mean
Euler characteristics that actually approximate the first-order expectations in (4.6.5).
The details of this argument will appear in Chapter 14.
First of all, however, we shall need to spend quite some time in Part II of this
book on studying geometry.
5
Stationary Fields
Stationarity has always been the backbone of almost all examples in the theory of
Gaussian processes for which specific computations were possible. As described in
the preface, one of the main reasons we shall be studying Gaussian processes on
manifolds is to get around this assumption. Nevertheless, despite the fact that we
shall ultimately try to avoid it, we invest a chapter on the topic for two reasons:
• Many of the results of Part III are significantly easier to interpret when specialized
down to cases under which stationarity holds.
• Even in the nonstationary case, many of the detailed computations of Part III can
be considered as deriving from a “local conversion to pseudostationarity,’’ or,
even more so, to pseudoisotropy. This will be taken care of there via the “induced
Riemannian metric’’ defined in Section 12.2. Knowledge of what happens under
stationarity is therefore important for knowing what to do in the general case.
We imagine that many of you will be familiar with most of the material of this
chapter and so will skip it and return only when specific details are required later. For
the newcomers, you should be warned that our treatment is full enough only to meet
our specific needs and that both style and content are occasionally a little eclectic. In
other words, you should go elsewhere for fuller, more standard treatments. References
will be given along the way.
The most important classic results of this chapter are the spectral distribution and
spectral representation theorems for RN of Section 5.4. However, the most important
results for us will be some of the consequences of these theorems for relationships
between spectral moments that are concentrated in Section 5.5. This is the one section
of this chapter that you will almost definitely need to come back to, even if you have
decided that you are familiar enough with stationarity to skip this chapter for the
moment.
valued processes. Hence, unless otherwise stated, we shall assume throughout this chapter that f(t) = f_R(t) + if_I(t) takes values in the complex plane C and that E{|f(t)|²} = E{f_R²(t) + f_I²(t)} < ∞. (Both f_R and f_I are, obviously, to be real-valued.) As for a definition of normality in the complex scenario, we first define a complex random variable to be Gaussian if the vector of its two components is bivariate Gaussian.1 A complex process f is Gaussian if Σ_i α_{t_i} f_{t_i} is a complex Gaussian variable for all finite sequences {t_i} and complex {α_{t_i}}.
We also need some additional assumptions on the parameter space T . In particular,
we require that it have a group structure2 and an operation with respect to which
the field is stationary. Consequently, we now assume that T has such a structure,
“+’’ represents the binary operation on T and “−’’ represents inversion. As usual,
t − s = t + (−s). For the moment, we need no further assumptions on the group.
Since ft ∈ C, it follows that the mean function m(t) = E{f (t)} is also complex-
valued, as is the covariance function, which we redefine for the complex case as
C(s, t) = E{[f (s) − m(s)][f (t) − m(t)]}, (5.1.1)
• C(s, t) = C(t, s), which becomes the simple symmetry C(s, t) = C(t, s) if f
(and so C) is real-valued.
• For any k ≥ 1, t1 , . . . , tk ∈ T , and z1 , . . . , zk ∈ C, the Hermitian form
k k
i=1 j =1 C(ti , tj )zi zj is always real and nonnegative. We summarize this,
as before, by saying that C is nonnegative definite.
(The second of these properties follows from the equivalence of the double-sum to
E{ ki=1 [f (ti ) − m(ti )]zi 2 }.)
Suppose for the moment that T is Abelian. A random field f is called strictly
homogeneous or strictly stationary over T , with respect to the group operation +, if
its finite-dimensional distributions are invariant under this operation. That is, for any
k ≥ 1 and any set of points τ, t1 , . . . , tk ∈ T ,
(f(t_1), . . . , f(t_k)) =^L (f(t_1 + τ), . . . , f(t_k + τ)),   (5.1.2)

where =^L denotes equality in law.
Note that none of the above required the Gaussianity we have assumed up until
now on f . If, however, we do add the assumption of Gaussianity, it immediately
follows from the structure (1.2.3) of the multivariate Gaussian density3 that a weakly
stationary Gaussian field will also be strictly stationary if C′(s, t) = E{[f(s) − m(s)][f(t) − m(t)]} (without the conjugation) is also a function only of s − t. If f is real-valued, then since C′ ≡ C it follows that all weakly stationary real-valued Gaussian fields are also strictly stationary, and the issue of qualifying adjectives is moot.4
If T is not Abelian we must distinguish between left and right stationarity. We
say that a random field f on T is right-stationary if (5.1.2) holds and that f is left-
stationary if f (t) = f (−t) is right-stationary. The corresponding conditions on the
covariance function change accordingly.
In order to build examples of stationary processes, we need to make a brief
excursion into (Gaussian) stochastic integration.
We return to the setting of Section 1.4.3, so that we have a σ -finite5 measure space
(T , T ,ν), along with the Gaussian ν-noise W defined over T . Our aim will be to
establish the existence of integrals of the form
f (t)W (dt), (5.2.1)
T
noise and replace conditions (1.4.11)–(1.4.13) with the following three requirements
for all A, B ∈ T :
Note that in the Gaussian case (5.2.4) is really equivalent to the seemingly stronger
(1.4.13), since zero covariance and independence are then equivalent.
The second restriction is that the integrand f in (5.2.1) is deterministic. Remov-
ing this assumption would lead us to having to define the Itô integral, which is a
construction for which we shall have no need.
Since, by (5.2.3), W is a finitely additive (signed) measure, (5.2.1) is evocative
of Lebesgue integration. Consequently, we start by defining the stochastic version
for simple functions

f(t) = Σ_{i=1}^{n} a_i 1_{A_i}(t),   (5.2.5)

for which we set

W(f) ≡ ∫_T f(t) W(dt) = Σ_{i=1}^{n} a_i W(A_i).   (5.2.6)
It follows immediately from (5.2.2) and (5.2.4) that in this case W(f) has zero mean and variance given by Σ_i a_i² ν(A_i). Think of W(f) as a mapping from simple functions
in L2 (T , T , ν) to random variables6 in L2 (P) ≡ L2 ( , F, P). The remainder of the
construction involves extending this mapping to a full isomorphism from L2 (ν) ≡
L2 (T , T , ν) onto a subspace of L2 (P). We shall use this isomorphism to define the
integral.
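For a finite partition, the construction (5.2.5)-(5.2.6) can be simulated directly, with the W(A_i) taken as independent N(0, ν(A_i)) variables; the zero mean, the variance Σ_i a_i²ν(A_i), and (anticipating the isometry discussed next) the preservation of inner products can then be checked empirically. A Python sketch with illustrative values of ν(A_i), a_i, and b_i:

```python
import numpy as np

rng = np.random.default_rng(7)

# A finite partition A_1,...,A_n with measures nu(A_i), and simple functions
# f = sum_i a_i 1_{A_i}, g = sum_i b_i 1_{A_i}; all values are illustrative.
nu = np.array([0.5, 1.0, 2.0, 0.25])
a = np.array([1.0, -2.0, 0.5, 3.0])
b = np.array([0.5, 1.0, -1.0, 2.0])

# Gaussian nu-noise on the partition: independent W(A_i) ~ N(0, nu(A_i)),
# one row per independent realization.
reps = 200000
W = rng.standard_normal((reps, len(nu))) * np.sqrt(nu)

Wf = W @ a          # W(f) = sum_i a_i W(A_i), as in (5.2.6)
Wg = W @ b

ok_var = abs(Wf.var() - (a**2 * nu).sum()) < 0.15           # Var W(f) = sum a_i^2 nu(A_i)
ok_cov = abs((Wf * Wg).mean() - (a * b * nu).sum()) < 0.15  # inner products preserved
print(ok_var, ok_cov)
```

The second check is precisely the L²(ν)-to-L²(P) isometry on simple functions that the full construction extends by continuity.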
Let S = S(T ) denote the class of simple functions of the form (5.2.5) for some
finite n. Note first there is no problem with the consistency of the definition (5.2.6)
over different representations of f . Furthermore, W clearly defines a linear mapping
on S that preserves inner products. To see this, write f, g ∈ S as

f(t) = Σ_{i=1}^{n} a_i 1_{A_i}(t),  g(t) = Σ_{i=1}^{n} b_i 1_{A_i}(t),

in terms of the same partition, to see that the L²(P) inner product between W(f) and W(g) is given by

E{W(f)W(g)} = Σ_{i=1}^{n} a_i b_i ν(A_i) = ⟨f, g⟩_{L²(ν)}.
Note also that since L2 limits of Gaussian random variables remain Gaussian (cf.
(1.2.5) and the discussion above it) under the additional assumption that W is a
Gaussian noise, it follows that W (f ) is also Gaussian.
With our integral defined, we can now start looking at some examples of what
can be done with it.
for some C. However, from (5.2.9) and the invariance of ν under the group operation,

E{f(t)f(s)} = ∫_T F(t − u)F(s − u) ν(du)
           = ∫_T F(t − s + v)F(v) ν(dv)
           = C(t − s),
A similar but slightly more sophisticated construction also yields a more general
class of examples, in which we think of the elements g of a group G acting on the
elements t of an underlying space T . This will force us to change notation a little
and, for the argument to be appreciated in full, to assume that you also know a little
about manifolds. If you do not, then you can return to this example later, after having
read Chapter 6, or simply take the manifold to be RN . In either case, you may still
want to read the very concrete and quite simple examples at the end of this section
now.
Thus, taking the elements g of a group G acting on the elements t of an underlying
space T , we denote the identity element of G by e and the left and right multiplication
maps by Lg and Rg . We also write Ig = Lg ◦ Rg−1 for the inner automorphism of G
induced by g.
Since we are now working in more generality, we shall also drop the commutativity
assumption that has been in force so far. This necessitates some additional definitions,
since we must distinguish between left and right stationarity. We say that a random
field f on G is strictly left-stationary if for all n, all (g1 , . . . , gn ), and any g0 ,
L
(f (g1 ), . . . , f (gn )) = (f ◦ Lg0 (g1 ), . . . , f ◦ Lg0 (gn )).
It is called strictly right-stationary if f (g) = f (g −1 ) is strictly left-stationary and
strictly bistationary, or simply strictly stationary, if it is both left and right strictly
stationary. As before, if f is Gaussian and has constant mean and covariance function
C satisfying
for some C , then f is right-stationary. If f is not Gaussian, but has constant mean and
(5.3.2) holds, then f is weakly left-stationary. Weak right-stationarity and stationarity
are defined analogously.
We can now start collecting the building blocks of the construction, which will be
of a left-stationary Gaussian random field on a group G. An almost identical argument
will construct a right-stationary field. It is then easy to see that this construction will
give a bistationary field on G only if it is unimodular, i.e., if any left Haar measure
on G is also right invariant.
We first add the condition that G be a Lie group, i.e., a group that is also a C ∞
manifold such that the maps taking g to g −1 and (g1 , g2 ) to g1 g2 are both C ∞ . We
say G has a smooth (C ∞ ) (left) action on a smooth (C ∞ ) manifold T if there exists
a map θ : G × T → T satisfying, for all t ∈ T and g1 , g2 ∈ G,
θ (e, t) = t,
θ (g2 , θ(g1 , t)) = θ (g2 g1 , t).
Furthermore, we assume that θg∗ (ν) is absolutely continuous with respect to ν, with
Radon–Nikodym derivative
dθg∗ (ν)
D(g) = (t), (5.3.4)
dν
independent of t. We call such a measure ν left relatively invariant under G. It is
easy to see that D(g) is a C ∞ homomorphism from G into the multiplicative group
of positive real numbers, i.e., D(g1 g2 ) = D(g1 )D(g2 ). We say that ν is left invariant
with respect to G if it is left relatively invariant and D ≡ 1.
Here, finally, is the result.
Lemma 5.3.2. Suppose G acts smoothly on a smooth manifold T and ν is left rela-
tively invariant under G. Let D be as in (5.3.4) and let W be Gaussian ν-noise on
T . Then for any F ∈ L2 (T , ν),
f(g) = (1/√D(g)) W(F ∘ θ_{g^{−1}})
is a left stationary Gaussian random field on G.
= C(g1−1 g2 ).
It is easy to find simple examples to which Lemma 5.3.2 applies. The most
natural generic example of a Lie group acting on a manifold is its action on itself.
In particular, any right Haar measure is left relatively invariant, and this is a way to
generate stationary processes. To apply Lemma 5.3.2 in this setting one needs only to
start with a Gaussian noise based on a Haar measure on G. In fact, this is the example
(5.3.1) with which we started this section.
A richer but still concrete example of a group G acting on a manifold T is given
by the group of rigid motions GN = GL(N, R) × RN acting7 on T = RN . For
g = (A, t) and s ∈ RN , set
θ (A, t)(s) = As + t.
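That θ really is a group action is a two-line check once the group law on G_N is written out; the composition rule (A_2, t_2)(A_1, t_1) = (A_2A_1, A_2t_1 + t_2) is the one forced by requiring θ(g_2, θ(g_1, s)) = θ(g_2g_1, s). A Python sketch verifying both action axioms numerically (dimensions and random elements are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

def act(g, s):
    """theta((A, t))(s) = A s + t."""
    A, t = g
    return A @ s + t

def compose(g2, g1):
    """Group law on pairs (A, t) making theta an action:
    (A2, t2)(A1, t1) = (A2 A1, A2 t1 + t2)."""
    (A2, t2), (A1, t1) = g2, g1
    return (A2 @ A1, A2 @ t1 + t2)

N = 3
e = (np.eye(N), np.zeros(N))                        # identity element
g1 = (rng.standard_normal((N, N)), rng.standard_normal(N))
g2 = (rng.standard_normal((N, N)), rng.standard_normal(N))
s = rng.standard_normal(N)

print(np.allclose(act(e, s), s),
      np.allclose(act(g2, act(g1, s)), act(compose(g2, g1), s)))
```

Both axioms, θ(e, t) = t and θ(g_2, θ(g_1, t)) = θ(g_2g_1, t), hold to machine precision.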
The moving averages of the previous section gave us examples of stationary fields that
were rather easy to generate in quite general situations from Gaussian noise. Now,
however, we want to look at a general way of generating all stationary fields, via the
so-called spectral representation. This is quite a simple task when the parameter set
is RN , but rather more involved when a general group is taken as the parameter space
and issues of group representations arise. Thus we shall start with the Euclidean
case, which we treat in detail, and then discuss some aspects of the general case in the
following section. In both cases, while an understanding of the spectral representation
is a powerful tool for understanding stationarity and a variety of sample path properties
of stationary fields, it is not necessary for what comes later in the book.
We return to the setting of complex-valued fields, take T = RN , and assume, as
usual, that E{ft } = 0. Furthermore, since we are now working only with stationary
processes, it makes sense to abuse notation somewhat and write C(t) for the covariance
function C(s, s + t), which is independent of s. With this convention, we have the
following result, which dates back to Bochner [27], in the setting of (nonstochastic)
Fourier analysis, a proof of which can be found in almost any text on Fourier analysis.

Theorem 5.4.1 (spectral distribution theorem). A continuous function C : RN → C
is nonnegative definite, and so the covariance function of a mean square continuous,
centered, stationary random field on RN , if and only if there exists a finite measure ν,
called the spectral measure, such that
C(t) = ∫_{RN} e^{i⟨t,λ⟩} ν(dλ) (5.4.1)
for all t ∈ RN .

Understanding of the result comes from the spectral representation theorem
(Theorem 5.4.2), for which we need to extend somewhat the stochastic integration
of Section 5.2.
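As a numerical sanity check of the spectral distribution theorem (an illustration of ours, not part of the text): any C of the form ∫ e^{itλ} ν(dλ) with ν a finite measure must be nonnegative definite, so every quadratic form Σ_{j,k} a_j ā_k C(t_j − t_k) is real and nonnegative. For a discrete spectral measure this can be verified directly:

```python
import cmath, random

# A discrete spectral measure: atoms ν({λ_k}) = p_k with p_k ≥ 0.
atoms = [(-1.3, 0.5), (0.4, 1.0), (2.0, 0.25)]

def C(t):
    """C(t) = ∫ e^{i t λ} ν(dλ) for the discrete measure above."""
    return sum(p * cmath.exp(1j * t * lam) for lam, p in atoms)

# Nonnegative definiteness: sum_{j,k} a_j conj(a_k) C(t_j - t_k) ≥ 0,
# since the form equals Σ_k p_k |Σ_j a_j e^{i t_j λ_k}|².
random.seed(0)
ts = [random.uniform(-5, 5) for _ in range(8)]
a = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]
q = sum(a[j] * a[k].conjugate() * C(ts[j] - ts[k])
        for j in range(8) for k in range(8))
assert q.real >= -1e-10 and abs(q.imag) < 1e-10
```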
As usual, we start with a finite measure ν on RN , and define a complex ν-noise to be a
C-valued, set-indexed process W satisfying
E{W(A)} = 0, E{|W(A)|²} = ν(A),
W(A ∪ B) = W(A) + W(B) a.s. whenever A ∩ B = ∅,
E{W(A)W(B)∗} = ν(A ∩ B), (5.4.5)
where ∗ denotes complex conjugation. In terms of such a noise, the spectral representation
theorem (Theorem 5.4.2) states that the random field
f_t = ∫_{RN} e^{i⟨t,λ⟩} W(dλ) (5.4.6)
has covariance
C(s, t) = ∫_{RN} e^{i⟨s−t,λ⟩} ν(dλ), (5.4.7)
and so is stationary, and that, conversely, every centered, mean square continuous,
stationary random field on RN can be represented in the form (5.4.6).
Proof. The fact that (5.4.6) generates a stationary field with covariance (5.4.7) is an
immediate consequence of (5.4.5). What needs to be proven is the statement that all
stationary fields can be represented as in (5.4.6). We shall only sketch the basic idea
of the proof, leaving the details to the reader. (They can be found in almost any book
on time series—our favorite is [34]—for processes on either Z or R and the extension
to RN is trivial.)
For the first step, set up a mapping from10 H = span{ft , t ∈ RN } ⊂ L2 (P) to
K = span{eit· , t ∈ RN } ⊂ L2 (ν) via
Σ_{j=1}^n a_j f(t_j) ↦ Σ_{j=1}^n a_j e^{i⟨t_j, ·⟩}. (5.4.8)
There is also a corresponding real form of the spectral representation (5.4.6). The
fact that the spectral representation yields a real-valued process also implies certain
symmetries14 on the spectral process W . In particular, it turns out that there are two
independent real-valued μ-noises, W1 and W2 , such that15
f_t = ∫_{R+×R^{N−1}} cos(⟨λ, t⟩) W1(dλ) + ∫_{R+×R^{N−1}} sin(⟨λ, t⟩) W2(dλ). (5.4.11)
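For a spectral measure with finitely many atoms, the real representation (5.4.11) is easy to simulate: W1 and W2 simply put independent N(0, p_k) weights on each atom. The sketch below (our own discrete setup, not from the text) draws realizations and checks by Monte Carlo that E{f_s f_t} matches the stationary covariance Σ_k p_k cos(λ_k(s − t)).

```python
import math, random

# Atoms (λ_k, p_k) of a spectral measure kept in the half-line λ ≥ 0,
# as in the representation (5.4.11).
atoms = [(0.7, 1.0), (1.9, 0.5), (3.2, 0.25)]

def sample_field(ts, rng):
    """One realization of f_t = Σ_k √p_k [cos(λ_k t) ξ_k + sin(λ_k t) ξ'_k]."""
    xi = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in atoms]
    return [sum(math.sqrt(p) * (math.cos(lam * t) * u + math.sin(lam * t) * v)
                for (lam, p), (u, v) in zip(atoms, xi)) for t in ts]

def C(tau):
    """Covariance implied by the representation: C(τ) = Σ_k p_k cos(λ_k τ)."""
    return sum(p * math.cos(lam * tau) for lam, p in atoms)

# Monte Carlo check that E{f_s f_t} depends only on s - t.
rng = random.Random(1)
s, t, n = 0.3, 1.1, 20000
acc = 0.0
for _ in range(n):
    f = sample_field([s, t], rng)
    acc += f[0] * f[1]
assert abs(acc / n - C(s - t)) < 0.1
```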
5.5 Spectral Moments

Since they will be very important later on, we now take a closer look at spectral
measures and, in particular, their moments. It turns out that these contain a lot of
simple, but very useful, information. Given the spectral representation (5.4.7), that is,
C(t) = ∫_{RN} e^{i⟨t,λ⟩} ν(dλ), (5.5.1)
we define the spectral moments
λ_{i1···iN} = ∫_{RN} λ1^{i1} · · · λN^{iN} ν(dλ), (5.5.2)
for all (i1 , . . . , iN ) with ij ≥ 0. Recalling that stationarity implies that C(t) = C(−t)
and ν(A) = ν(−A), it follows that the odd-ordered spectral moments, when they
exist, are zero; i.e.,
13 There is nothing special about the half-space λ1 ≥ 0 taken in this representation. Any
half-space will do.
14 To establish this rigorously, we really need the inverse to (5.4.6), expressing W as an integral
involving f , which we do not have.
15 In one dimension, it is customary to take W1 as a μ-noise and W2 as a (2ν1)-noise, which
at first glance is different from what we have. However, noting that when N = 1,
∫ sin(λt)W2(dλ) = 0 when λ = 0, it is clear that the two definitions in fact coincide in
this case.
λ_{i1···iN} = 0 if Σ_{j=1}^N ij is odd. (5.5.3)
When they exist, we write the mean square partial derivatives
∂^k f(t)/(∂t_{i1} · · · ∂t_{ik}) = D^k_{L²} f(t, (e_{i1}, . . . , e_{ik}))
of f of various orders.
It is then straightforward to see that the covariance function of such partial deriva-
tives must be given by
E{ (∂^k f(s)/∂s_{i1} · · · ∂s_{ik}) (∂^k f(t)/∂t_{i1} · · · ∂t_{ik}) } = ∂^{2k} C(s, t)/(∂s_{i1} ∂t_{i1} · · · ∂s_{ik} ∂t_{ik}). (5.5.4)
Here are some important special cases of the above, for which we adopt the shorthand
fj = ∂f/∂tj and fij = ∂ 2 f/∂ti ∂tj along with a corresponding shorthand for the
partial derivatives of C:
(i) fj has covariance function −Cjj and thus variance λ_{2e_j} = −Cjj(0), where ej ,
as usual, is the vector with a 1 in the j th position and zeros elsewhere.
16 If you decide to check this for yourself using (5.4.6) and (5.4.7)—which is a worthwhile
exercise—make certain that you recall the fact that the covariance function is defined as
E{f (s)f (t)}, or you will make the same mistake RJA did in [2] and forget the factor of
(−1)α+β in the first line. Also, note that although (5.5.5) seems to have some asymmetries
in the powers, these disappear due to the fact that all odd-ordered spectral moments, like all
odd-ordered derivatives of C, are identically zero.
(ii) f(t) and fj(t) are uncorrelated at every fixed point; i.e.,
E{f(t)fj(t)} = −Cj(0) = 0 (5.5.6)
for all j and all t. If f is Gaussian, this is equivalent to independence. Note that
(5.5.6) does not imply that f and fj are uncorrelated as processes. In general,
for s ≠ t, we will have that E{f(s)fj(t)} = −Cj(s − t) ≠ 0.
(iii) Taking α = γ = δ = 1, β = 0 in (5.5.5) gives that
It will be important for us in later chapters that some of the relationships of the previous
section continue to hold under a condition weaker than stationarity. Of particular
interest is knowing when (5.5.6) holds; i.e., when f (t) and fj (t) are uncorrelated.
Suppose that f has constant variance, σ 2 = C(t, t), throughout its domain of
definition, and that its L2 first-order derivatives all exist. In this case, analogously to
(5.5.5), we have that
E{f(t)fj(t)} = (∂/∂tj) C(t, s)|_{s=t} = (∂/∂sj) C(t, s)|_{s=t}. (5.6.1)
Since constant variance implies that (d/dtj)C(t, t) ≡ 0, and since this derivative is
the sum of the two (equal) partial derivatives in (5.6.1), both must be identically zero.
That is, f and its first-order derivatives are uncorrelated.
One can, of course, continue in this fashion. If first derivatives have constant
variance, then they, in turn, will be uncorrelated with second derivatives, in the sense
that fi will be uncorrelated with fij for all i, j . It will not necessarily be true, however,
that fi and fj k will be uncorrelated if i = j and i = k. This will, however, be true
if the covariance matrix of all first-order derivatives is constant.
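The relationship between spectral moments and derivatives is easy to check numerically: for a stationary covariance C(t) = Σ_k p_k cos(λ_k t) on R¹, the variance of the first derivative is λ2 = Σ_k λ_k² p_k = −C″(0). A finite-difference sketch (our own illustration):

```python
import math

# Spectral atoms (λ_k, p_k) of a discrete spectral measure on R.
atoms = [(0.5, 1.0), (1.5, 0.5), (2.5, 0.25)]

def C(t):
    """Stationary covariance C(t) = Σ_k p_k cos(λ_k t)."""
    return sum(p * math.cos(lam * t) for lam, p in atoms)

lambda2 = sum(p * lam ** 2 for lam, p in atoms)    # second spectral moment

# -C''(0), via a central difference, should reproduce λ2.
h = 1e-4
second_deriv = (C(h) - 2 * C(0.0) + C(-h)) / h ** 2
assert abs(-second_deriv - lambda2) < 1e-4
```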
5.7 Isotropy
An interesting special class of homogeneous random fields on RN that often arises in
applications in which there is no special meaning attached to the coordinate system
being used is the class of isotropic fields. These are characterized17 by the property
that the covariance function depends only on the Euclidean length |t| of the vector t,
so that
C(t) = C(|t|). (5.7.1)
Isotropy has a number of surprisingly varied implications for both the covariance
and spectral distribution functions, and is actually somewhat more limiting than it
might at first seem. For example, we have the following result, due to Matérn [111].

Theorem 5.7.1. Let f be a mean square continuous, isotropic random field on RN
with covariance function C. Then C(τ) ≥ −C(0)/N for all τ ≥ 0.

Proof. Isotropy implies that C can be written as a function on R+ only. Let τ be any
positive real number. We shall show that C(τ ) ≥ −C(0)/N .
Choose any t1 , . . . , tN+1 in RN for which |ti − tj | = τ for all i = j . Then, by
(5.7.1),
E{ ( Σ_{k=1}^{N+1} f(tk) )² } = (N + 1)[C(0) + N C(τ)].
Since the left-hand side is nonnegative, so is C(0) + N C(τ), which is the claimed bound.
The restriction of isotropy also has significant simplifying consequences for the
spectral measure ν of (5.4.1). Let θ : RN → RN be a rotation, so that |θ(t)| = |t| for
all t. Isotropy then implies C(t) = C(θ (t)), and so the spectral distribution theorem
implies
∫_{RN} e^{i⟨t,λ⟩} ν(dλ) = ∫_{RN} e^{i⟨θ(t),λ⟩} ν(dλ) (5.7.2)
= ∫_{RN} e^{i⟨t,θ^{−1}(λ)⟩} ν(dλ) = ∫_{RN} e^{i⟨t,λ⟩} ν_θ(dλ),
where ν_θ(A) = ν(θ(A)). Since this holds for every rotation θ, the spectral measure
must itself be invariant under rotations, and so cannot place positive mass on any
single point of RN away from the origin. In particular, it is not possible to have a spectral measure
degenerate at one point, unless that point is the origin. The closest the spectral measure
of an isotropic field can come to this sort of behavior is to have all its probability
concentrated in an annulus of the form
{λ ∈ RN : a ≤ |λ| ≤ b}.
In such a case it is not hard to see that the field itself is then composed of a “sum’’ of
waves traveling in all directions but with wavelengths between 2π/b and 2π/a only.
Another consequence of isotropy is that the spherical symmetry of the spectral
measure significantly simplifies the structure of the spectral moments, and hence the
correlations between various derivatives of f . In particular, it follows immediately
from (5.5.5) that
E{fi(t)fj(t)} = −E{f(t)fij(t)} = λ2 δij, (5.7.3)
where δij is the Kronecker delta and λ2 = ∫_{RN} λi² ν(dλ), which is independent of the
value of i. Consequently, if f is Gaussian, then the first-order derivatives of f are
independent of one another, as they are of f itself.
Since isotropy has such a limiting effect on the spectrum, it is natural to ask how
the spectral distribution and representation theorems are affected under isotropy. The
following result, due originally to Schoenberg [140] (in a somewhat different setting)
and Yaglom [177], describes what happens.
Theorem 5.7.2. For C to be the covariance function of a mean square continuous,
isotropic, random field on RN it is necessary and sufficient that
C(t) = ∫_0^∞ [ J_{(N−2)/2}(λ|t|) / (λ|t|)^{(N−2)/2} ] μ(dλ), (5.7.4)
where μ is a finite measure on R+ and Jm is the Bessel function of the first kind of
order m, that is,
Jm(x) = Σ_{k=0}^∞ (−1)^k (x/2)^{2k+m} / [ k! Γ(k + m + 1) ].
Proof. The proof consists in simplifying the basic spectral representation by using
the symmetry properties of ν.
We commence by converting to polar coordinates, (λ, θ1 , . . . , θN−1 ), λ ≥ 0,
(θ1 , . . . , θN−1 ) ∈ S N−1 , where S N−1 is the unit sphere in RN . Define a measure μ
on R+ by setting μ([0, λ]) = ν(B N (λ)), and extending as usual, where B N (λ) is the
N-ball of radius λ and ν is the spectral measure of (5.4.1).
Then, on substituting into (5.4.1) with t = (|t|, 0, . . . , 0) and performing the
coordinate transformation, we obtain
C(|t|) = ∫_0^∞ ∫_{S^{N−1}} exp(i|t|λ cos θ_{N−1}) σ(dθ) μ(dλ),
with σ denoting the usual surface measure on S^{N−1}, whose total mass we denote by s_N,
where
s_N = 2π^{N/2} / Γ(N/2), N ≥ 1, (5.7.5)
is the surface area18 of S N−1 .
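The formula (5.7.5) is easy to check for small N, where the surface areas are familiar (a quick illustration of ours):

```python
import math

def surface_area(N):
    """s_N = 2 π^{N/2} / Γ(N/2), the surface area of S^{N-1} ⊂ R^N."""
    return 2 * math.pi ** (N / 2) / math.gamma(N / 2)

assert abs(surface_area(1) - 2) < 1e-12            # the two points ±1
assert abs(surface_area(2) - 2 * math.pi) < 1e-12  # circumference of S^1
assert abs(surface_area(3) - 4 * math.pi) < 1e-12  # area of S^2
```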
The inside integral can be evaluated in terms of Bessel functions to yield, up to a
multiplicative constant,
∫_0^π e^{iλ|t| cos θ} sin^{N−2} θ dθ ∝ J_{(N−2)/2}(λ|t|) / (λ|t|)^{(N−2)/2},
which, on absorbing s_{N−2} and the remaining constant into μ, completes the proof.
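The Bessel kernels appearing in (5.7.4) are easy to evaluate directly from the defining series for Jm; the short partial-sum implementation below (ours, adequate for moderate x) can be checked against a tabulated value and the half-integer identity:

```python
import math

def J(m, x, terms=40):
    """Partial sum of J_m(x) = Σ_k (-1)^k (x/2)^{2k+m} / (k! Γ(k+m+1))."""
    return sum((-1) ** k * (x / 2) ** (2 * k + m) /
               (math.factorial(k) * math.gamma(k + m + 1))
               for k in range(terms))

assert abs(J(0, 0.0) - 1.0) < 1e-12          # J_0(0) = 1
assert abs(J(0, 2.0) - 0.2238907791) < 1e-9  # tabulated value of J_0(2)
```

Note that the series makes sense for the half-integer orders m = (N − 2)/2 needed in (5.7.4), since Γ(k + m + 1) is defined there too.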
For small values of the dimension N , (5.7.4) can be simplified even further. For
example, substituting N = 2 into (5.7.4) yields that in this case,
C(t) = ∫_0^∞ J_0(λ|t|) μ(dλ),
while substituting N = 3 and evaluating the inner integral easily yields that in
this case,
C(t) = ∫_0^∞ [ sin(λ|t|) / λ|t| ] μ(dλ).
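The N = 3 simplification can be verified against (5.7.4) using the half-integer identity J_{1/2}(x) = √(2/(πx)) sin x, so that J_{1/2}(λ|t|)/(λ|t|)^{1/2} is proportional to sin(λ|t|)/λ|t|, with the constant √(2/π) absorbed into μ. A numerical check of ours using the defining series:

```python
import math

def J(m, x, terms=40):
    """Series for the Bessel function J_m (enough terms for moderate x)."""
    return sum((-1) ** k * (x / 2) ** (2 * k + m) /
               (math.factorial(k) * math.gamma(k + m + 1))
               for k in range(terms))

# For N = 3, the kernel J_{1/2}(x) / x^{1/2} in (5.7.4) is proportional
# to sin(x)/x, the form quoted in the text (constant absorbed into μ).
for x in (0.5, 1.0, 2.0, 4.0):
    lhs = J(0.5, x) / math.sqrt(x)
    rhs = math.sqrt(2 / math.pi) * math.sin(x) / x
    assert abs(lhs - rhs) < 1e-9
```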
Given the fact that the covariance function of an isotropic field takes such a special
form, it is natural to seek a corresponding form for the spectral representation of the
field itself. Such a representation does in fact exist and we shall now describe it,
albeit without giving any proofs. These can be found, for example, in the book by
Wong [169], or as special cases in the review by Yaglom [178], which is described in
Section 5.8 below. Another way to verify it would be to check that the representation
given in Theorem 5.7.3 below yields the covariance structure of (5.7.4). Since this
is essentially an exercise in the manipulation of special functions and not of intrinsic
probabilistic interest, we shall avoid the temptation to carry it out.
The spectral representation of isotropic fields on RN is based on the so-called
spherical harmonics19 on the (N − 1)-sphere, which form an orthonormal basis for
18 When N = 1, we have the “boundary’’ of the “unit sphere’’ [−1, 1] ⊂ R. This is made up
of the two points ±1, which, in counting measure, have total measure 2. Hence s1 = 2 makes
sense.
19 We shall also avoid giving details about spherical harmonics. A brief treatment would add
little to understanding them. The kind of treatment required to, for example, get the code
correct in programming a simulation of an isotropic field using the representations that
follow will, in any case, send you back to the basic reference of Erdélyi [61] followed by
some patience in sorting out a software help reference. A quick web search will yield you
many interactive, colored examples of these functions within seconds.
the space of square-integrable functions on S N−1 equipped with the usual surface
measure. We shall denote them by {h^{(N−1)}_{ml}, l = 1, . . . , dm, m = 0, 1, . . . }, where
the dm are known combinatorial coefficients.20
Now use the spectral decomposition
f(t) = ∫_{RN} e^{i⟨t,λ⟩} W(dλ),
where, once again, we work in polar coordinates. Note that since W is a ν-noise,
where ν is the spectral measure, information about the covariance of f has been coded
into the Wml . From this family, define a family of mutually uncorrelated, stationary,
one-dimensional processes {fml } by
f_{ml}(r) = ∫_0^∞ [ J_{m+(N−2)/2}(λr) / (λr)^{(N−2)/2} ] W_{ml}(dλ),
where, as in the spectral representation (5.4.6), one has to justify the existence of this
L2 stochastic integral. These are all the components we need in order to state the
following.

Theorem 5.7.3. Let f be a centered, mean square continuous, isotropic random field
on RN . Then, writing t in polar coordinates as (r, θ) ∈ R+ × S^{N−1}, f can be
represented as
f(t) = Σ_{m=0}^∞ Σ_{l=1}^{dm} f_{ml}(r) h^{(N−1)}_{ml}(θ). (5.7.6)
In other words, isotropic random fields can be decomposed into a countable num-
ber of mutually uncorrelated stationary processes with a one-dimensional parameter,
a result that one would not intuitively expect. As noted above, there is still a hidden
spectral process in (5.7.6), entering via the Wml and fml . This makes for an important
difference between (5.7.6) and the similar-looking Karhunen–Loève expansion we
saw in Section 3.2. Another difference lies in the fact that while it is possible to truncate
the expansion (5.7.6) to a finite number of terms and retain isotropy, this is not true
of the standard Karhunen–Loève expansion. In particular, isotropic fields can never
have finite Karhunen–Loève expansions. For a heuristic argument as to why this is
the case, recall from Section 5.7 that under isotropy the spectral measure must be
invariant under rotations, and so cannot be supported on a finite, or even countable,
number of points. Consequently, one also needs an uncountable number of inde-
pendent variables in the spectral noise process to generate the process via (5.4.6).
20 The spherical harmonics on S^{N−1} are often written as {h^{(N−1)}_{m,l1,...,l_{N−2},±l_{N−1}}}, where 0 ≤
l_{N−1} ≤ · · · ≤ l1 ≤ m. The constants dm in our representation can be computed from this.
In this notation, the isotropic covariance (5.7.4) can also be written, after renormalizing μ, as
C(|t|) = ∫_0^∞ G_N(λ|t|) μ(dλ), where
G_N(x) = (2/x)^{(N−2)/2} Γ(N/2) J_{(N−2)/2}(x)
is normalized so that G_N(0) = 1.

5.8 Stationarity over Groups
We have already seen in Section 5.3 that the appropriate setting for stationarity is that
in which the parameter set has a group structure. In this case it made sense, in general,
to talk about left and right stationarity (cf. (5.3.2) and (5.3.3)). Simple “stationarity’’
requires both of these and so makes most sense if the group is abelian (commutative).
In essence, the spectral representation of a random field over a group is intimately
related to the representation theory of the group. This, of course, is far from being a
simple subject. Furthermore, its level of difficulty depends very much on the group
in question, and so it is correspondingly not easy to give a general spectral theory for
random fields over groups. The most general results in this area are in the paper by
Yaglom [178] already mentioned above, and the remainder of this section is taken
from there.21
We shall make life simpler by assuming for the rest of this section that T is a
locally compact Abelian (LCA) group. As before, we shall denote the binary group
operation by +, while − denotes inversion. The Fourier analysis of LCA groups is
well developed (e.g., [133]) and based on characters. A homomorphism γ from T
to the multiplicative group of complex numbers is called a character if |γ(t)| = 1
for all t ∈ T and if
γ(t + s) = γ(t)γ(s), s, t ∈ T .
If T = RN under the usual addition, then the characters are given by the family
γ_λ(t) = e^{i⟨t,λ⟩}, λ ∈ RN ,
fields over RN . If T = ZN , again under addition, the characters are as for T = RN ,
but λ is restricted to [−π, π ]N . If T = RN under rotation rather than addition, then
the characters are the spherical harmonics on S N−1 .
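The character conditions are easy to verify in the simplest discrete example, T = Z_n, whose characters are γ_k(t) = e^{2πikt/n} (and whose dual group is again Z_n). A small check of ours:

```python
import cmath, math

# Characters of the finite abelian group Z_n: γ_k(t) = e^{2πi k t / n}.
def character(k, n):
    return lambda t: cmath.exp(2j * math.pi * k * t / n)

n = 12
for k in range(n):
    g = character(k, n)
    for s in range(n):
        for t in range(n):
            # homomorphism property γ(t + s) = γ(t)γ(s), and |γ| = 1
            assert abs(g((s + t) % n) - g(s) * g(t)) < 1e-9
            assert abs(abs(g(t)) - 1) < 1e-12
```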
The set of all continuous characters also forms a group, Γ say, called the dual
group, with composition defined by
(γ1 + γ2)(t) = γ1(t)γ2(t).
There is also a natural topology on Γ (cf. [133]) that gives it an LCA structure and
under which the map (γ, t) → γ(t) : Γ × T → C is continuous. The spectral
distribution theorem in this setting can now be written as
distribution theorem in this setting can now be written as
C(t) = ∫_Γ (γ, t) ν(dγ), (5.8.1)
where the finite spectral measure ν is on the σ-algebra generated by the topology on
Γ. The spectral representation theorem can be correspondingly written as
f(t) = ∫_Γ (γ, t) W(dγ), (5.8.2)
where W is a ν-noise on Γ.
Special cases now follow from basic group-theoretic results. For example, if T
is discrete, then Γ is compact, as we noted already for the special case of T = ZN .
Consequently, the integral in the spectral representation (5.8.2) is actually a sum and
21 There is also a very readable, albeit less exhaustive, treatment in Hannan [76]. In addition,
Letac [103] has an elegant exposition based on Gelfand pairs and Banach algebras for
processes indexed by unimodular groups, which, in a certain sense, give a generalization of
isotropic fields over RN . Ylinen [179] has a theory for noncommutative locally compact
groups that extends the results in [178].
where the Wml are uncorrelated with variance depending only on m. This, of course,
is simply (5.7.6) once again, derived from a more general setting. Similarly, the
covariance function can be written as
C(θ1, θ2) = C(θ12) = Σ_{m=0}^∞ σm² C_m^{(N−1)/2}(cos(θ12)), (5.8.4)
where θ12 is the angular distance between θ1 and θ2, and the C_m^{(N−1)/2} are the Gegenbauer
polynomials.
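The Gegenbauer polynomials in (5.8.4) satisfy the standard three-term recurrence n C_n^α(x) = 2(n + α − 1)x C_{n−1}^α(x) − (n + 2α − 2) C_{n−2}^α(x), which makes covariances of the form (5.8.4) easy to evaluate. A sketch of ours, with toy coefficients σ_m²:

```python
import math

def gegenbauer(m, alpha, x):
    """C_m^α(x) via the standard three-term recurrence."""
    if m == 0:
        return 1.0
    c_prev, c = 1.0, 2 * alpha * x          # C_0 and C_1
    for n in range(2, m + 1):
        c_prev, c = c, (2 * (n + alpha - 1) * x * c
                        - (n + 2 * alpha - 2) * c_prev) / n
    return c

# With α = 1 these are Chebyshev polynomials of the 2nd kind: U_2 = 4x² - 1.
assert abs(gegenbauer(2, 1.0, 0.3) - (4 * 0.09 - 1)) < 1e-12

# A toy covariance on a sphere of the form (5.8.4), with α = (N-1)/2:
sigma2 = [1.0, 0.5, 0.25]
def C(theta12, alpha=0.5):
    return sum(s * gegenbauer(m, alpha, math.cos(theta12))
               for m, s in enumerate(sigma2))

assert C(0.0) >= C(math.pi / 3)    # maximal at zero angular distance
```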
Other examples for the LCA situation follow in a similar fashion from (5.8.1) and
(5.8.2) by knowing the structure of the dual group Γ.
The general situation is much harder and, as has already been noted, relies heavily
on knowing the representation of T . In essence, given a representation of G on GL(H )
for some Hilbert space H , one constructs a (left or right) stationary random field on
G via the canonical white noise on H . The construction of Lemma 5.3.2 with T = G
and H = L2 (G, μ) can be thought of as a special example of this approach. For
further details, you should go to the references given earlier in this section.
Part II
Geometry
If you have not yet read the preface, then please do so now.
Since you have read the preface, you already know that central to much of what
we shall be looking at in Part III is the geometry of excursion sets
Au ≡ Au(f, T) = {t ∈ T : f(t) ≥ u}.
There are two approaches to this geometry: one via integral geometry, and one via the
calculus of manifolds; the latter is needed for a number of problems related to random
fields even on the “flat’’ manifold RN.
This approach is crucial if you want to understand the full theory. Furthermore, since
some of the proofs of later results, even in the integral-geometric setting, are more
natural in the manifold scenario, it is essential if you need to see full proofs. How-
ever, the jump in level of mathematical sophistication between the two approaches is
significant, so that unless you feel very much at home in the world of manifolds you
are best advised to read the integral-geometric story first.
Chapter 6 handles all the integral geometry that we shall need. The treatment
there is detailed, complete, and fully self-contained. This is not the case when we
turn to differential geometry, where the theory is too rich to treat in full. We start with
a quick and nasty revision of basic differential geometry in Chapter 7, most of which
is standard graduate-course material. Chapter 8 treats piecewise smooth manifolds,
which provide the link between the geometry of Chapter 6, with its sharp corners and
edges, and the smooth manifolds of Chapter 7. In Chapter 9 we look at Morse theory
in the piecewise smooth setting. While Morse theory in the smooth setting is, once
again, standard material, the piecewise smooth case is less widely known. In fact,
Chapter 9 actually contains some new results that, unlike most of the rest of Part II,
we were not able to find elsewhere.
In the passage from integral to differential geometry a number of things will
happen. Among them, the space T (time, or multidimensional time) will become
M (manifold)23 and the Minkowski functionals will become the Lipschitz–Killing
curvatures.
While Lipschitz–Killing curvatures are well-known objects in global differential
geometry, we do not imagine that they will be too well known to the probabilist reader.
Hence we devote Chapter 10 to a study of the so-called tube formulas, originally
due to Hermann Weyl [168] in Euclidean spaces. In addition, as we have already
noted in the preface, some of the proofs are different from what is found in the
standard geometry literature, in that they rely on properties of Gaussian distributions.
Furthermore, Chapter 10 also discusses a relatively new generalization of the Weyl
results to manifolds in Gauss space, due to JET [158].
The tube formulas of Chapter 10 can also be exploited to develop a formal ap-
proximation to the excursion probability
P{ sup_{t∈M} f(t) ≥ u }
for certain Gaussian fields with finite orthonormal expansions,24 and we shall look at
this in Sections 10.2 and 10.6.
All of Part II, old and new, is crucial for understanding the general proofs of
Part III.
23 A true transition from T to M would also have the elements t of T becoming elements
p (points) of M. However, as we have already mentioned, this seems to be too heavy a
psychological price for a probabilist to pay, so we shall remain with points t ∈ M. For this
mortal sin of misnotation we beg forgiveness from our geometer colleagues.
24 The most general case will be treated in detail in Chapter 14 via Morse-theoretic techniques.
6
Integral Geometry
Our aim in this chapter is to develop a framework for handling excursion sets, which
we now redefine in a nonstochastic framework:
Au ≡ Au(f, T) = {t ∈ T : f(t) ≥ u}. (6.0.1)
Throughout this chapter, T will be a “nice’’ (in a sense to be specified soon) subset
of RN , and our tools will be those of integral geometry.
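On a finite grid, excursion sets as in (6.0.1) are nothing more than boolean masks, which is how they are handled computationally. A tiny illustration of ours, not part of the theory:

```python
import math

# Excursion set of a sampled function on a grid: A_u = {t : f(t) ≥ u},
# represented as a boolean mask over a regular lattice.
def excursion_set(f, grid, u):
    return [[f(x, y) >= u for x in grid] for y in grid]

grid = [i / 10 for i in range(-10, 11)]
A = excursion_set(lambda x, y: math.exp(-(x * x + y * y)), grid, 0.5)

# {e^{-|t|²} ≥ 1/2} is the closed disk |t|² ≤ ln 2: the mask is True
# at the origin and False near the corners of the square.
assert A[10][10] and not A[0][0]
```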
6.1 Basic Integral Geometry

By a k-plane E (parallel to the axes) we mean a set of the form
E = {t ∈ RN : tj = aj, j ∈ J; −∞ < tj < ∞, j ∉ J},
where J ⊆ {1, . . . , N} is of size N − k and the aj are fixed constants.
If a set A can be written as a finite union of basic sets B1, . . . , Bm, then A is called
a basic complex, and the collection
p = p(A) = {B1, . . . , Bm}
is called a partition of A, and their number, m, is called the order of the partition.
Clearly, partitions are in no way unique, nor, despite the terminology, are the elements
of a partition necessarily disjoint.
The class of all basic complexes, which we shall denote by CB , or CBN if we need
to emphasize its dimension, possesses a variety of useful properties. For example,
if A ∈ CB , then E ∩ A ∈ CB for every k-plane E. In fact, if E is a k-plane with
k ≤ N and A ∈ CBN , we have E ∩ A ∈ CBk . To prove this it suffices to note
that if
p = {B1 , . . . , Bm }
is a partition of A, then
p′ = {E ∩ B1, . . . , E ∩ Bm}
is a partition of E ∩ A.
We now look for a functional ϕ on CB , the Euler characteristic, satisfying
ϕ(A) = 0 if A = ∅, ϕ(A) = 1 if A is a nonempty basic set, (6.1.1)
and the additivity property
ϕ(A ∪ B) = ϕ(A) + ϕ(B) − ϕ(A ∩ B) (6.1.2)
whenever A, B, A ∪ B, A ∩ B ∈ CB .
An important result of integral geometry states that not only does a functional
possessing these two properties exist, but it is uniquely determined by them. We shall
prove this by obtaining an explicit formulation for ϕ, which will also be useful in its
own right.
Let p = p(A) be a partition of order m of some A ∈ CB into basic sets. Define
the characteristic of the partition to be
1 A note for the purist: As noted earlier, our definition of CB is dependent on the choice of basis,
which is what loses us rotation invariance. An easy counterexample in CB² is the descending
staircase ⋃_{j=1}^∞ Bj, where Bj = {(x, y) : 0 ≤ x ≤ 2^{−j}, 1 − 2^{1−j} ≤ y ≤ 1 − 2^{−j}}. This is
actually a basic set with relation to the natural axes, but not even a basic complex if the axes
are rotated 45 degrees, since then it cannot be represented as the union of a finite number
of basic sets.
Hadwiger’s [75] original definition of basic complexes was basis-independent but cov-
ered a smaller class of sets. In essence, basic sets were defined as above (but relative to a
coordinate system) and basic complexes were required to have a representation as a finite
union of basic sets for every choice of coordinate system. Thus our descending staircase is
not a basic complex in his scenario.
While more restrictive, Hadwiger’s approach gives a rotation-invariant theory. The
reasons for our choice will become clearer later on, when we return to a stochastic setting.
See, in particular, Theorem 11.3.3 and the comments following it. See also the axis-free
approach of Section 9.2.
2 Note that if A ∩ B ≠ ∅, then A ∪ B is not necessarily a basic complex. For a counterexample,
take A to be the descending staircase of the previous footnote, and let B be the line segment
{(x, y) : y = 1 − x, x ∈ [0, 1]}. There is no way to represent A ∪ B as the union of a finite
number of basic sets, essentially because of the infinite number of single point intersections
between A and B, or, equivalently, the infinite number of holes in A ∪ B.
3 A functional on a lattice of sets that satisfies (6.1.2) and ϕ(A) = 0 if A = ∅ is called an
evaluation; cf. [92].
130 6 Integral Geometry
κ(A, p) = Σ^{(1)} ε(Bj1) − Σ^{(2)} ε(Bj1 ∩ Bj2) + · · · (6.1.3)
+ (−1)^{n+1} Σ^{(n)} ε(Bj1 ∩ · · · ∩ Bjn) + · · ·
+ (−1)^{m+1} ε(B1 ∩ · · · ∩ Bm),
where Σ^{(n)} denotes summation over all subsets {j1, . . . , jn} of {1, . . . , m}, 1 ≤ n ≤
m, and ε is an indicator function for basic sets, defined by
ε(A) = 0 if A = ∅, ε(A) = 1 if A ≠ ∅ is basic.
Then, if a functional ϕ satisfying (6.1.1) and (6.1.2) does in fact exist, it follows
iteratively from these conditions and the definition of basic complexes that for any
A ∈ CB and any partition p,
ϕ(A) = κ(A, p). (6.1.4)
Thus, given existence, uniqueness of ϕ will follow if we can show that κ(A, p) is
independent of the partition p.
Proof. The main issue is that of existence, which we establish by induction. When
N = 1, basics are simply closed intervals or points, or the empty set. Thus, setting
ϕ(A) = number of disjoint intervals and isolated points making up A
yields a functional satisfying ϕ(A) = κ(A, p) for every p and A ∈ CB¹, for which
(6.1.1) and (6.1.2) are clearly satisfied.
Now let N > 1 and assume that for all spaces of dimension k less than N we have
a functional ϕ k on CBk for which ϕ k (A) = κ(A, p) for all A ∈ CBk and every partition
p of A. Choose one of the vectors ej , and for x ∈ (−∞, ∞) let Ex (which depends
on j ) denote the (N − 1)-plane
Ex = {t ∈ RN : tj = x}. (6.1.5)
Let A ∈ CBN and let p = p(A) = {B1 , . . . , Bm } be one of its partitions. Then clearly
the projections onto E0 of the cross-sections A∩Ex are all in CBN−1 , so that there exists
a partition-independent functional ϕx defined on {A ∩ Ex , A ∈ CBN } determined by
f (A, x) = ϕx (A ∩ Ex ). (6.1.6)
However, by the induction hypothesis and (6.1.6), we have from (6.1.3) that
f(A, x) = Σ^{(1)} ε(Bj1 ∩ Ex) − Σ^{(2)} ε(Bj1 ∩ Bj2 ∩ Ex) + · · · .
Assume that the intersection Bj1 ∩ · · · ∩ Bjk is nonempty; otherwise, there is nothing
to prove. Write ε(x) for ε(Bj1 ∩ · · · ∩ Bjk ∩ Ex). Since ε(x) is zero when the
intersection of the Bji with Ex is empty and one otherwise, we have for some finite
a and b that ε(x) = 1 if a ≤ x ≤ b and ε(x) = 0 otherwise. Thus ε(x) is a step
function, taking at most two values. Hence f(A, x), being the sum of a finite number
of such functions, is again a step function, with a finite number of discontinuities.
Thus the left-hand limits
f(A, x−) = lim_{δ↓0} f(A, x − δ)
always exist. Now define a function h, which is nonzero at only a finite number of
points x, by
h(A, x) = f(A, x) − f(A, x−),
and define
ϕ(A) = Σ_x h(A, x), (6.1.9)
where the summation is over the finite number of x for which the summand is nonzero.
Note that since f is independent of p, so are h and ϕ.
Thus we have defined a functional on CB , and we need only check that (6.1.1)
and (6.1.2) are satisfied to complete this section of the proof. Firstly, note that if B is
a basic set and B ≠ ∅, and if a and b are the extreme points of the linear set formed
by projecting B onto ej, then f(B, x) = 1 if a ≤ x ≤ b and equals zero otherwise.
Thus h(B, a) = 1, while h(B, x) = 0 for x ≠ a, so that ϕ(B) = 1. This is (6.1.1)
since ϕ(∅) = 0 is obvious. Now let A, B, A ∪ B, A ∩ B all belong to CB . Then the
projections onto E0 of the intersections
A ∩ Ex, B ∩ Ex, (A ∪ B) ∩ Ex, A ∩ B ∩ Ex
all belong to CB^{N−1}, and so, by the induction hypothesis, ϕx is additive over them.
This additivity is inherited first by f, then by h, and finally, on summing over x, by ϕ,
so that (6.1.2) is established and we have the existence of our functional ϕ. It may,
however, depend on the partition p used in its definition.
For uniqueness, note that since by (6.1.4), κ(A, p ) = ϕ(A) for any partition p ,
we have that κ(A, p ) is independent of p and so that ϕ(A) is independent of p. That
is, we have the claimed uniqueness of ϕ.
Finally, since κ(A, p) is independent of the particular choice of the vector ej
appearing in the proof, so is ϕ.
The proof of Theorem 6.1.1 actually contains more than we have claimed in
the statement of the result, since in developing the proof, we actually obtained an
alternative way of computing ϕ(A) for any A ∈ CB . This is given explicitly in the
following theorem, for which Ex is as defined in (6.1.5).
Theorem 6.1.2. For basic complexes A ∈ CBN , the Euler characteristic ϕ, as defined
by (6.1.4), has the following equivalent iterative definition:
ϕ(A) = { number of disjoint intervals in A if N = 1;
Σ_x [ϕ(A ∩ Ex) − ϕ(A ∩ Ex−)] if N > 1, (6.1.10)
where
ϕ(A ∩ Ex−) = lim_{δ↓0} ϕ(A ∩ E_{x−δ}),
and the summation is over all real x for which the summand is nonzero.
This theorem is a simple consequence of (6.1.9) and requires no further proof.
Note that it also follows from the proof of Theorem 6.1.1 (cf. the final sentence there)
that the choice of vector ej used to define Ex is irrelevant.4
The importance of Theorem 6.1.2 lies in the iterative formulation it gives for ϕ,
for using this, we shall show in a moment how to obtain yet another formulation that
makes the Euler characteristic of a random excursion set amenable to probabilistic
investigation.
Figure 6.1.1 shows an example of this iterative procedure in R2 . Here the vertical
axis is taken to define the horizontal 1-planes Ex . The values of ϕ(A ∩ Ex ) appear
closest to the vertical axis, with the values of h to their left. Note in particular the set
with the hole “in the middle.’’ It is on sets like this, and their counterparts in higher
dimensions, that the characteristic ϕ and the number of connected components of
the set differ. In this example they are, respectively, zero and one. For the moment,
ignore the arrows and what they are pointing at.
4 It is also not hard to show that if A is a basic complex with respect to each of two coordinate
systems (which are not simple relabelings of one another) then ϕ(A) will be the same for
both. However, this is taking us back to the original formulation of [75], which we have
already decided is beyond what we need here.
To understand how this works in higher dimensions, you should try to visualize
some N-dimensional examples to convince yourself that for the N -dimensional unit
ball, B N , and the N -dimensional unit sphere, S N−1 ,
ϕ(B^N) = 1, ϕ(S^{N−1}) = 1 + (−1)^{N−1}. (6.1.11)
It is somewhat less easy (and, indeed, quite deep in higher dimensions) to see that
if K_{N,k} denotes B^N with k nonintersecting cylindrical holes drilled through it, then,
since both K_{N,k} and its boundary belong to CB^N,
ϕ(K_{N,k}) = 1 + (−1)^N k,
while
ϕ(∂K_{N,k}) = [1 + (−1)^{N−1}](1 − k).
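Values such as (6.1.11), and ϕ = 0 for a set with a single hole, are easy to confirm on pixelated approximations, where a union of closed unit squares is a basic complex and ϕ can be computed as vertices − edges + faces of the induced cubical complex. A small sketch of ours:

```python
# Euler characteristic of a binary image, viewed as a union of closed
# unit squares (pixels): χ = V - E + F over the induced cubical complex.
def euler_char(img):
    rows, cols = len(img), len(img[0])
    on = {(i, j) for i in range(rows) for j in range(cols) if img[i][j]}
    V, E, F = set(), set(), len(on)
    for (i, j) in on:
        V.update({(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)})
        E.update({((i, j), (i + 1, j)), ((i, j), (i, j + 1)),
                  ((i + 1, j), (i + 1, j + 1)), ((i, j + 1), (i + 1, j + 1))})
    return len(V) - len(E) + F

solid = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
assert euler_char(solid) == 1   # a disk-like blob: ϕ = 1
assert euler_char(ring) == 0    # one hole: ϕ = 0, as in the text
```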
T = [s, t] = Π_{i=1}^N [si, ti], −∞ < si < ti < ∞. (6.2.1)
For our first definition, we need to decompose T into a disjoint union of open sets,
starting with its interior, its faces, edges, etc. More precisely, a face J , of dimension
k, is defined by fixing a subset σ (J ) of {1, . . . , N} of size k and a sequence of N − k
zeros and ones, which we write as ε(J) = {εj, j ∉ σ(J)}, so that
J = {v ∈ T : vj = (1 − εj)sj + εj tj if j ∉ σ(J), sj < vj < tj if j ∈ σ(J)}. (6.2.2)
We start with some simple assumptions on f and, as usual, write the first- and
second-order partial derivatives of f as fj = ∂f/∂tj , fij = ∂ 2 f/∂ti ∂tj .
Definition 6.2.1. Let T be a bounded rectangle in RN and let f be a real-valued
function defined on an open neighborhood of T .
Then, if for a fixed u ∈ R the following three conditions are satisfied for each
face J of T for which N ∈ σ(J), we say that f is suitably regular with respect to T
at the level u.
The first two conditions of suitable regularity are meant to ensure that the boundary
∂Au = {t ∈ T : f (t) = u} of the excursion set is smooth in the interior T ◦ of T and
that its intersections with ∂T are also smooth. The last condition is a little more subtle,
since it relates to the curvature of ∂Au both in the interior of T and on its boundary.
The main importance of suitable regularity is that it gives us the following theorem.
Theorem 6.2.2. Let f : RN → R1 be suitably regular with respect to a bounded
rectangle T at level u. Then the excursion set Au (f, T ) is a basic complex.
The proof of Theorem 6.2.2 is not terribly long, but since it is not crucial to
what will follow, it can be skipped at first reading. The reasoning behind it is all in
Figure 6.2.1, and after understanding this you can skip to the examples immediately
following the proof without losing much.
For those of you remaining with us, we start with a lemma.
Lemma 6.2.3. Let f : RN → R1 be suitably regular with respect to a bounded
rectangle T at the level u. Then there are only finitely many t ∈ T for which
The inverse mapping theorem6 implies that such a neighborhood will exist if the
N × N matrix (∂g i /∂tj ) has a nonzero determinant at t. However, this matrix has
the following elements:

∂g^1/∂t_j = f_j(t) for j = 1, …, N,
∂g^i/∂t_j = f_{i−1,j}(t) for i = 2, …, N, j = 1, …, N.
Since t satisfies (6.2.7), all elements in the first row of this matrix, other than the N th,
are zero. Expanding the determinant along this row gives us that it is equal to

(−1)^{N+1} f_N(t) det D(t), (6.2.9)

where D(t) is as defined in (6.2.6). Since (6.2.7) is satisfied, (6.2.5) and (6.2.6) imply,
respectively, that neither fN (t) nor the determinant of D(t) is zero, which, in view
of (6.2.9), is all that is required.
Proof of Theorem 6.2.2. When N = 1 we are dealing throughout with finite collec-
tions of intervals, and so the result is trivial.
Now take the case N = 2. We need to show how to write Au as a finite union of
basic sets.
Consider the set of points t ∈ T satisfying either

f(t) = u and f_1(t) = 0 (6.2.10)

or

f(t) = u and f_2(t) = 0. (6.2.11)
For each such point draw a line containing the point and parallel to either the horizontal
or vertical axis, depending, respectively, on whether (6.2.10) or (6.2.11) holds. These
lines form a grid over T , and it is easy to check that the connected regions of Au within
each cell of this grid are basic. Furthermore, these sets have intersections that are
either straight lines, points, or empty, and Lemma 6.2.3 (applied to the original axes
and a 90◦ rotation of them) guarantees that there are only a finite number of them, so
that they form a partition of Au . (An example of this partitioning procedure is shown
in Figure 6.2.1. The dots mark the points where either (6.2.10) or (6.2.11) holds.)
This provides the required partition, and we are done.
6 The inverse mapping theorem goes as follows: Let U ⊂ R^N be open and g = (g^1, …, g^N) : U → R^N a function possessing continuous first-order partial derivatives ∂g^i/∂t_j, i, j = 1, …, N. Then if the matrix (∂g^i/∂t_j) has a nonzero determinant at some point t ∈ U, there exist open neighborhoods U_1 and V_1 of t and g(t), respectively, and a function g^{−1} : V_1 → U_1, for which g^{−1}(g(s)) = s for all s ∈ U_1.
Lemma 6.2.3 again guarantees the finiteness of the partition. The details are left
to you.7
We now attack the problem of obtaining a simple way of computing the Euler
characteristic of Au . As you are about to see, this involves looking at each Au ∩ J ,
J ∈ ∂k T , 0 ≤ k ≤ N, separately. We start with the simple example T = I 2 , in
which ∂2 T = T o , ∂1 T is composed of four open intervals parallel to the axes, and
∂0 T contains the four vertices of the square. Since this is a particularly simple case,
we shall pool ∂1 T and ∂0 T , and handle them together as ∂T .
Thus, let f : R2 → R1 be suitably regular with respect to I 2 at the level u.
Consider the summation (6.1.10) defining ϕ(A_u(f, I^2)), that is,

ϕ(A_u) = Σ_{x∈(0,1]} {ϕ(A_u ∩ E_x) − ϕ(A_u ∩ E_{x⁻})}, (6.2.12)
where now Ex is simply the straight line t2 = x, and so nx = ϕ(Au ∩ Ex ) counts the
number of distinct intervals in the cross-section Au ∩Ex . The values of x contributing
to the sum correspond to values of x where nx changes.
It is immediate from the continuity of f that contributions to ϕ(Au ) can occur
only when Ex is tangential to ∂Au (Type I contributions) or when f (0, x) = u or
f (1, x) = u (Type II contributions). Consider the former case first.
If E_x is tangential to ∂A_u at a point t, then f_1(t) = 0. Furthermore, since f(t) = u on ∂A_u, we must have f_2(t) ≠ 0, as a consequence of suitable regularity. Thus,
in the neighborhood of such a point and on the curve ∂Au , t2 can be expressed as an
implicit function of t1 by
f (t1 , g(t1 )) = u.
7 You should at least try the three-dimensional case, to get a feel for the source of the conditions
on the various faces of T in the definition of suitable regularity.
f (t) = u (6.2.13)
and
f1 (t) = 0. (6.2.14)
Conversely, for each point satisfying (6.2.13) to (6.2.15) there is a unit contribution
of Type I to ϕ(Au ). Note that there is no contribution of Type I to ϕ(Au ) from points
on the boundary of I 2 because of the regularity condition (6.2.5). Thus we have a one-
to-one correspondence between unit contributions of Type I to ϕ(Au ) and points in the
interior of I 2 satisfying (6.2.13) to (6.2.15). It is also easy to see that contributions of
+1 will correspond to points for which f11 (t) < 0 and contributions of −1 to points
for which f11 (t) > 0. Furthermore, because of (6.2.6) there are no contributing
points for which f11 (t) = 0.
Consider now Type II contributions to ϕ(A), which is best done by looking first
at Figure 6.2.2.
The eight partial and complete disks there lead to a total Euler characteristic of 8.
The three sets A, B, and C are accounted for by Type I contributions, since in each
case the above analysis will count +1 at their lowest points. We need to account for
the remaining five sets, which means running along ∂I 2 and counting points there.
In fact, what we need to do is count +1 at the points marked with a •. There is never
8 The implicit function theorem goes as follows: Let U ⊂ R^N be open and F : U → R possess continuous first-order partial derivatives. Suppose that at t∗ ∈ U, F(t∗) = u and F_N(t∗) ≠ 0. Then the equation F(t) = u can be solved for t_N as a function of (t_1, …, t_{N−1}) in a neighborhood of t∗.
a need to count −1 on the boundary. Note that on the two vertical sections of ∂I 2 we
count +1 whenever we enter the set (at its intersection with ∂I 2 ) from below. There
is never a contribution from the top side of I 2 . For the bottom, we need to count the
number of connected components of its intersection with the excursion set, which
can be done either on “entering’’ or “leaving’’ the set in the positive t1 direction. We
choose the latter, and so must also count a +1 if the point (1, 0) is covered.
Putting all this together, we have a Type II contribution to ϕ(A) whenever one of the following four sets of conditions is satisfied:

t = (t_1, 0), f(t) = u, f_1(t) < 0,
t = (0, t_2), f(t) = u, f_2(t) > 0,
t = (1, t_2), f(t) = u, f_2(t) > 0,    (6.2.16)
t = (1, 0), f(t) > u.
The above argument has established the following, which in Chapter 9, with com-
pletely different techniques and a much more sophisticated and powerful language,
will be extended to parameter sets in RN and on C 2 manifolds with piecewise smooth
boundaries:
Theorem 6.2.4. Let f be suitably regular with respect to I 2 at the level u. Then the
Euler characteristic of its excursion set Au (f, I 2 ) is given by the number of points t
in the interior of I^2 satisfying (6.2.13) and (6.2.14) with f_{11}(t) < 0, minus the number of such points with f_{11}(t) > 0, plus the number of points on the boundary of I^2 satisfying one of the four sets of
conditions in (6.2.16).
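Theorem 6.2.4 computes ϕ(A_u) by counting critical points of a smooth f. As a sanity check on what the Euler characteristic actually counts, here is a sketch (our own, not from the text) that computes ϕ = V − E + F directly for a pixelated planar set, treating each pixel as a closed unit square and counting the vertices, edges, and faces of the resulting cubical complex:

```python
import numpy as np

def euler_char(pixels):
    """Euler characteristic of a union of closed unit squares (pixels),
    computed as V - E + F for the induced cubical complex."""
    pts = np.argwhere(pixels)
    verts, edges = set(), set()
    for i, j in pts:
        # the four corners of pixel [i, i+1] x [j, j+1]
        for di in (0, 1):
            for dj in (0, 1):
                verts.add((i + di, j + dj))
        # its four boundary edges, keyed by (anchor corner, direction)
        edges.update({((i, j), 'i'), ((i, j + 1), 'i'),
                      ((i, j), 'j'), ((i + 1, j), 'j')})
    return len(verts) - len(edges) + len(pts)

solid = np.ones((3, 3), dtype=bool)   # a filled square: one component
annulus = solid.copy()
annulus[1, 1] = False                 # ... with one hole punched out
print(euler_char(solid), euler_char(annulus))  # 1 0
```

A single component gives 1, a component with one hole gives 0, matching the "components minus holes" interpretation of ϕ in the plane.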
This is what we have been looking for in a doubly simple case: The ambient
dimension was only 2, and the set T a simple square. There is another proof of
Theorem 6.2.4 in Section 9.4, built on Morse theory. There you will also find a gener-
alization of this result to I N for all finite N , although the final point set representation
is a little different from that of (6.2.16). Morse theory is also the key to treating far
more general parameter spaces. Nevertheless, what we have developed so far, along
with some ingenuity, does let us treat some more general cases, for which we give
an example. You should be able to fill in the details of the computation by yourself,
although some hand waving may be necessary. Here is the example:
(b) If t ∈ ∂T ∩ ∂Au , and the tangent to ∂T is not parallel to the horizontal axis, then
let fup (t) be the derivative of f in the direction of the tangent to ∂T pointing
in the positive t2 direction. (Two such tangent vectors appear as τC and τF in
Figure 6.2.3.) Furthermore, take the derivative of f with respect to t1 in the
direction pointing into T . Call this f⊥ . (It will equal either f1 or −f1 , depending
on whether the angles θ in Figure 6.2.3 from the horizontal to the τ vector develop
in a counterclockwise or clockwise direction, respectively.) Now mark t as a •
(and so count as +1) if f⊥ (t) < 0 and fup (t) > 0. There are no ◦ points in this
class.
(c) If t ∈ ∂T ∩ ∂Au , and the tangent to ∂T is parallel to the horizontal axis, but t is
not included in an open interval of ∂T that is parallel to this axis, then proceed as in
(b), simply defining f⊥ to be f1 if the tangent is above ∂T and −f1 otherwise.
(d) If t ∈ ∂T ∩ ∂Au belongs to an open interval of ∂T that is parallel to the horizontal
axis (as in Figure 6.2.4), then mark it as a • if T is above ∂T and f1 (t) < 0.
(Thus, as in Figure 6.2.4, points such as B and C by which Au “hangs’’ from ∂T
will never be counted.)
(e) Finally, if t ∈ ∂T ∩ Au has not already been marked, and coincides with one of
the points that contribute to the Euler characteristic of T itself (e.g., A, B, and J
in Figure 6.2.3), then mark it exactly as it was marked in computing ϕ(T ).
All told, this can be summarized as follows.
Theorem 6.2.5 (Worsley [174]). Let T ⊂ R2 be compact with boundary ∂T that is
twice differentiable everywhere except, at most, at a finite number of points. Let f
be suitably regular for T at the level u. Let χ (Au (f, T )) be the number of points in
the interior of T satisfying (6.2.17) minus the number satisfying (6.2.18).
Denote the number of points satisfying (b)–(d) above as χ∂T , and denote the sum
of the contributions to ϕ(T ) of those points described in (e) by ϕ∂T . Then
ϕ(A) = χ (A) + χ∂T + ϕ∂T . (6.2.19)
Theorem 6.2.5 can be extended to three dimensions (also in [174]). In principle, it
is not too hard to guess what has to be done in higher dimensions as well. Determining
whether a point in the interior of T and on ∂Au contributes a +1 or −1 will depend
on the curvature of ∂Au , while if t ∈ ∂T , both this curvature and that of ∂T will have
roles to play.
It is clear that these kinds of arguments are going to get rather messy very quickly,
and a different approach is advisable. This is provided via the critical point theory of
differential topology, which we shall develop in Chapter 9. However, before doing
this we want to develop some more geometry in the still relatively simple scenario of
Euclidean space and describe how all of this relates to random fields.
6.3 Intrinsic Volumes
In Section 6.1 we defined the Euler characteristic ϕ. It turned out to be integer-valued, although we did not demand this in the beginning,
and has an interpretation in terms of “counting’’ the various topological components
of a set. But there is more to life than mere counting, and one would also like to be
able to say things about the volume of sets, the surface area of their boundaries, their
curvatures, etc. In this vein, it is natural to look for a class of N additional position- and rotation-invariant functionals {L_j}_{j=1}^N that are also additive in the sense of (6.1.2) and scale with dimensionality in the sense that L_j(λA) = λ^j L_j(A) for all λ > 0. The basic construction involves tubes: for ρ > 0, the tube of radius ρ around A is the set

Tube(A, ρ) = {x ∈ R^N : d(x, A) ≤ ρ}, (6.3.2)

where d(x, A) = inf_{y∈A} ||x − y|| is the usual Euclidean distance from the point x to the set A. An example is given
in Figure 6.3.1, in which A is the inner triangle and Tube(A, ρ) the larger triangular
object with rounded-off corners.
With λ_N denoting, as usual, Lebesgue measure in R^N, Steiner's formula states10 that λ_N(Tube(A, ρ)) has a finite expansion in powers of ρ. In particular,

λ_N(Tube(A, ρ)) = Σ_{j=0}^{N} ω_{N−j} ρ^{N−j} L_j(A), (6.3.3)

where

ω_j = λ_j(B(0, 1)) = π^{j/2}/Γ(j/2 + 1) = s_j/j (6.3.4)
9 Moving from basic complexes down to the very special case of convex sets is a severe
restriction in terms of what we need, and we shall lift this restriction soon. Nevertheless,
convex sets are a comfortable scenario in which to first meet intrinsic volumes.
10 There is a more general version of Steiner's formula for the case in which T and its tube are embedded in R^{N′}, where N′ > N = dim(A). In that case (6.3.2) still holds, but with N replaced by N′. See Theorem 10.5.6 for more details in the manifold setting.
is the volume of the unit ball in Rj . (Recall that sj was the corresponding surface
area; cf. (5.7.5).)
We shall see a proof of (6.3.3) later on, in Chapter 10, in a far more general
scenario, but it is easy to see from Figure 6.3.1 from where the result comes.
To find the area (i.e., two-dimensional volume) of the enlarged triangle, one need only sum three terms: the area of the triangle itself, the areas of the three rectangles of width ρ sitting along its sides (in total, ρ times the perimeter), and the areas of the three corner sectors, which fit together to form a disk of radius ρ. In other words,

λ_2(Tube(A, ρ)) = λ_2(A) + ρ · length(∂A) + π ρ².
Comparing this to (6.3.3), it now takes only a little thought to guess what the
intrinsic volumes must measure. If the ambient space is R2 , then L2 measures area,
L1 measures boundary length, while L0 gives the Euler characteristic. In R3 , L3
measures volume, L2 measures surface area, L1 is a measure of cross-sectional di-
ameter, and L0 is again the Euler characteristic. In higher dimensions, it takes some
imagination, but LN and LN−1 are readily seen to measure volume and surface area,
while L0 is always the Euler characteristic. Precisely why this happens, how it in-
volves the curvature of the set and its boundary, and what happens in less-familiar
spaces forms much of the content of Section 7.6 and is treated again, in fuller detail,
in Chapter 10.
In the meantime, you can try checking for yourself, directly from (6.3.3) and a
little first-principles geometry, that for an N -dimensional cube of side length T the
intrinsic volumes are given by

L_j([0, T]^N) = \binom{N}{j} T^j. (6.3.5)
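For N = 2, (6.3.3) and (6.3.5) can be checked concretely: the tube around the square [0,T]² consists of the square, four T-by-ρ side strips, and four quarter-disks at the corners. The following sketch (our own illustration; the function names are ours) verifies that this direct computation agrees with Steiner's formula:

```python
import math

def omega(j):
    """Volume of the unit ball in R^j, as in (6.3.4)."""
    return math.pi ** (j / 2) / math.gamma(j / 2 + 1)

def steiner_square(T, rho):
    """Right-hand side of Steiner's formula (6.3.3) for [0,T]^2,
    using the intrinsic volumes L_j = C(2,j) T^j of (6.3.5)."""
    return sum(omega(2 - j) * rho ** (2 - j) * math.comb(2, j) * T ** j
               for j in range(3))

def tube_area_square(T, rho):
    # Direct geometry: the square, four T-by-rho side strips, and
    # four quarter-disks of radius rho at the corners.
    return T ** 2 + 4 * T * rho + math.pi * rho ** 2

T, rho = 3.0, 0.5
print(abs(steiner_square(T, rho) - tube_area_square(T, rho)) < 1e-12)  # True
```

Note that L_1([0,T]²) = 2T is half the perimeter, consistent with the coefficient ω_1 = 2 in front of it.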
Similar computations work for balls. Since Tube(B_λ^N, ρ) = B_{λ+ρ}^N, where B_λ^N is the ball of radius λ in R^N,

λ_N(Tube(B_λ^N, ρ)) = (λ + ρ)^N ω_N = Σ_{j=0}^{N} \binom{N}{j} λ^j ρ^{N−j} ω_N
                    = Σ_{j=0}^{N} ω_{N−j} ρ^{N−j} \binom{N}{j} λ^j (ω_N/ω_{N−j}),

so that, comparing with (6.3.3), L_j(B_λ^N) = \binom{N}{j} λ^j ω_N/ω_{N−j}.
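The binomial rearrangement above can be checked numerically; the sketch below (ours, with hypothetical function names) verifies for N = 3 that the intrinsic volumes of the ball read off from the expansion do reproduce the tube volume via Steiner's formula:

```python
import math

def omega(j):
    """Volume of the unit ball in R^j, as in (6.3.4)."""
    return math.pi ** (j / 2) / math.gamma(j / 2 + 1)

def L_ball(N, j, lam):
    """Intrinsic volumes of the N-ball of radius lam: C(N,j) lam^j omega_N/omega_{N-j}."""
    return math.comb(N, j) * lam ** j * omega(N) / omega(N - j)

N, lam, rho = 3, 2.0, 0.25
lhs = omega(N) * (lam + rho) ** N          # the tube is just a larger ball
rhs = sum(omega(N - j) * rho ** (N - j) * L_ball(N, j, lam)
          for j in range(N + 1))
print(abs(lhs - rhs) < 1e-10)  # True
```

In particular L_N(B_λ^N) = ω_N λ^N is the volume, as it should be.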
For S_λ^{N−1}, the sphere of radius λ in R^N, a similar argument, using the fact that

Tube(S_λ^{N−1}, ρ) = B_{λ+ρ}^N \ B_{λ−ρ}^N for ρ < λ,

yields

L_j(S_λ^{N−1}) = 2 \binom{N}{j} (ω_N/ω_{N−j}) λ^j = 2 \binom{N−1}{j} (s_N/s_{N−j}) λ^j (6.3.8)

if N − 1 − j is even and 0 otherwise.
Further examples can be found, for example, in [139].
A useful normalization of the intrinsic volumes is given by the so-called Minkowski functionals, defined as

M_j(A) = (j! ω_j) L_{N−j}(A), (6.3.9)

so that, when expressed in terms of the M_j, Steiner's formula now reads like a Taylor series expansion:

λ_N(Tube(A, ρ)) = Σ_{j=0}^{N} (ρ^j/j!) M_j(A). (6.3.10)
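The Taylor-like form (6.3.10) is easy to verify for the disk of radius λ in R², whose tube of radius ρ is just the disk of radius λ + ρ. A small sketch (our own illustration; variable names are ours):

```python
import math

def omega(j):
    """Volume of the unit ball in R^j, as in (6.3.4)."""
    return math.pi ** (j / 2) / math.gamma(j / 2 + 1)

# Intrinsic volumes of the disk of radius lam in R^2:
# L_0 = Euler characteristic, L_1 = half the perimeter, L_2 = area.
lam, rho = 1.5, 0.3
L = [1.0, math.pi * lam, math.pi * lam ** 2]

# Minkowski functionals via (6.3.9): M_j = j! * omega_j * L_{N-j}
N = 2
M = [math.factorial(j) * omega(j) * L[N - j] for j in range(N + 1)]

# Steiner's formula in Taylor form (6.3.10)
tube = sum(rho ** j / math.factorial(j) * M[j] for j in range(N + 1))
print(abs(tube - math.pi * (lam + rho) ** 2) < 1e-12)  # True
```

Here M_0 = πλ² is the area, M_1 = 2πλ the perimeter, and M_2 = 2π, so (6.3.10) reduces to π(λ + ρ)² = πλ² + 2πλρ + πρ².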
It is an important and rather deep fact, due to Weyl [168] in the manifold setting,11
that the Lj are independent of the ambient space in which sets sit. Because of
the reversed numbering system and the choice of constants, this is not true of the
Minkowski functionals.12
11 See also [73, 92] and Chapter 10, where we develop this in detail.
12 To see why the M_j are dependent on the ambient space, let i_{N,M} be the standard inclusion of R^N into R^M, M ≥ N, defined by i_{N,M}(x) = (x_1, …, x_N, 0, …, 0) ∈ R^M
There is another way to define intrinsic volumes that works just as well for our
main scenario of basic complexes as it does for convex sets, based on the idea of
kinematic density.13 Let G_N = R^N ⋊ O(N) be the isometry group (of rigid motions) of R^N, and equip it with the Haar measure μ_N, normalized to be Lebesgue measure on R^N and the invariant probability measure on the rotation group O(N). A formula of Hadwiger states that

L_j(A) = [N j] ∫_{G_N} ϕ(A ∩ g E_{N−j}) μ_N(dg), (6.3.11)

where [N j] = \binom{N}{j} ω_N/(ω_j ω_{N−j}) is a flag coefficient and E_{N−j} is a fixed (N − j)-dimensional affine subspace of R^N.
The way around this will be to develop a somewhat different notion, that of Lipschitz–Killing curvatures, which we shall do in Chapters 7 and 8 in the setting of manifolds.
for x ∈ R^N. Consider M > N, and note that the polynomials λ_M(Tube(A, ρ)) and λ_M(Tube(i_{N,M}(A), ρ)) lead to different geometric interpretations. For example, for a curve C in R², M_1(C) will be proportional to the arc length of C and M_2(C) = 0, while M_2(i_{2,3}(C)), rather than M_1(i_{2,3}(C)), is proportional to arc length.
13 We shall meet this again, in a slightly different format and in far more detail, as a special
case of Crofton’s formula, Theorem 13.1.1. In Chapter 13 we shall also develop a sig-
nificant extension of the Hadwiger formula (6.3.11), and use it to lift results about Euler
characteristics to results about Lipschitz–Killing curvatures.
Consider the last term here for large u. In that case both ψ(A) and ψ(B) will
be small. If A and B are disjoint, then it is reasonable to expect that the events
{sup_A f > u} and {sup_B f > u} will be close to independent,14 and so the final term
in (6.3.15) would be a product of two very small terms, and so of smaller order than
either.
14 This would happen, for example, if A and B were sufficiently distant for the values of f in
A and B to be close to independent. It turns out that at least for Gaussian f , these heuristics
are true even if A and B are close, as long as u is large enough. We shall not treat these
results, but you can find examples in [17] and [97].
where ϕ is the Euler characteristic and Au an excursion set, is one of the main punch
lines of this book and of the last few years of research in Gaussian random fields.
Proving this, in wide generality, and computing the coefficients cj in (6.3.17) as
functions of u is what much of Part III of this book is about.
If you are interested mainly in nice Euclidean parameter spaces, and do not need
to see the details of all the proofs, you can now comfortably skip the rest of Part II,
with the exception of Section 9.4, which gives a version of Theorem 6.2.4 in general
dimensions. The same is true if you have a solid background in differential geometry,
even if you plan to follow all the proofs later on. You can return later when the need
to confirm notation arises.
7
Differential Geometry
As we have said more than once, this chapter is intended to serve as a rapid and
noncomprehensive introduction to differential geometry, basically in the format of a
“glossary of terms.’’ Most of it will be familiar to those who have taken a couple of courses in differential geometry, and it is hopefully informative enough to allow the uninitiated1
to follow the calculations in later chapters. However, to go beyond merely following
the arguments there and to reach the level of a real understanding of what is going
on, it will be necessary to learn the material from its classical sources.2
Essentially all that we have to say is “well known,’’ in the sense that it appears
in textbooks of differential geometry. Thus, the reader familiar with these books will
be able to skim the remainder of this chapter, needing only to pick up our notation
and emphases.
7.1 Manifolds
A differentiable manifold is a mathematical generalization, or abstraction, of objects
such as curves and surfaces in RN . Intuitively, it is a smooth set with a locally defined,
Euclidean, coordinate system.
Manifolds without boundary
We call M a topological N -manifold if M is a locally compact Hausdorff space such
that for every t ∈ M, there exist an open U ⊂ M containing t, an open Ũ ⊂ RN ,
and a homeomorphism ϕ : U → Ũ .
1 For first-timers we have added numerous footnotes and simple examples along the way that
are meant to help them through the general theory. In general, we shall be satisfied with the
exposition as long as we have not achieved the double failure of both boring the expert and
bamboozling the novice. We also apologize to the experts for attempting the impossible:
to reduce their beautiful subject matter to a “glossary.’’
2 For the record, the books that we found most useful were Boothby [29], Jost [88], Millman
and Parker [114], Morita [116], and O’Neill [121]. The two recent books by Lee [100, 101]
stand out from the pack as being particularly comprehensive and easy to read, and we highly
recommend them as the right place to start.
Tangent spaces
For a manifold M embedded in RN , such as a curve or a surface, it is straightforward
to envision what is meant by a tangent vector at a point t ∈ M. It is no more than
a vector with origin at t, sitting in the tangent plane to M at t. Given such a vector,
v, one can differentiate functions f : M → R along the direction v. Thus, to each v
there corresponds a local derivative. For abstract manifolds, the basic notion is not
that of a vector sitting in a tangent plane, but rather that of a differential operator.
To see how this works, we start with the simplest case, in which M = RN and
everything reduces to little more than renaming familiar objects. Here we can manage
with the atlas containing the single chart (M, i_{N,N}), where i_{N,N} is the inclusion map. We change notation after the inclusion, writing x = i_{N,N}(t) and M′ = i_{N,N}(M) (= R^N). To every vector X_x with origin x ∈ M′, we can assign a linear map from C¹(M′) to R as follows:
If f ∈ C¹(M′) and X_x is a vector of the form X_x = x + Σ_{i=1}^N a_i e_i, where {e_i}_{1≤i≤N} is the standard basis for R^N, we define the first-order differential operator4
3 That is, both ϕ_i ∘ ϕ_j^{−1} and its inverse ϕ_j ∘ ϕ_i^{−1} are k-times differentiable as functions from subsets of R^N to subsets of R^N.
4 Hopefully, the standard usage of X_x to denote both a vector and a differential operator will not cause too much confusion. In this simple case, they clearly amount to essentially the same thing. In general, they are the same by definition.
Xx by its action on f :
X_x f = Σ_{i=1}^N a_i (∂f/∂x_i)|_x. (7.1.2)
for any f ∈ C¹(M). Since ϕ_{∗,t}^{−1} is linear, it follows that the set

{ϕ_{∗,t}^{−1}(∂/∂x_i|_{ϕ(t)})}_{1≤i≤N} (7.1.4)
5 The germs of f at x are the (equivalence) class of functions g for which f ≡ g over some
open neighborhood U of x. All local properties of f at x depend only on the germs to which
f belongs.
and is referred to as the natural basis for Tt M in the chart (U, ϕ).
(g∗ Xt )h = Xt (h ◦ g),
for any h ∈ C 1 (N ).
Vector fields
Since each X_t ∈ T_t M is a linear map on C¹(M) satisfying the product rule, we can construct a first-order differential operator X, called a vector field, that takes C^k functions (k ≥ 1) to real-valued functions via (Xf)(t) = X_t f.
In other words, a vector field is a map that assigns, to each t ∈ M, a tangent vector
Xt ∈ Tt M. Thus, in a chart (U, ϕ) with coordinates (x1 , . . . , xN ), a vector field can
be represented (cf. (7.1.5)) as
X_t = Σ_{i=1}^N a_i(t) (∂/∂x_i)|_t. (7.1.6)
If the ai are C k functions, we can talk about C k vector fields. Note that for j ≥ 1, a
C k vector field maps C j (M) to C min(j −1,k) (M).
6 C k (M; N), the space of “k-times differentiable functions from M to N ,’’ is defined analo-
gously to C 1 (M). Thus f ∈ C k (M; N) if for all t ∈ M, there are a chart (UM , ϕM ) for M
and a neighborhood V of t with V ⊂ UM such that f (V ) ⊂ UN for some chart (UN , ϕN )
for N for which the composite map ϕN ◦ f ◦ (ϕM )−1 : ϕM (V ) → ϕN (f (V )) is C k in the
usual, Euclidean, sense.
Vector bundles
A C l vector bundle is a triple (E, M, F ), along with a map π : E → M, where
E and M are, respectively, (N + q)- and N -dimensional manifolds of class at least
C l and F is a q-dimensional vector space. The manifold E is locally a product.
That is, every t ∈ M has a neighborhood U such that there is a homeomorphism
ϕU : U ×F → π −1 (U ) satisfying (π ◦ϕU )(t, fU ) = t, for all fU ∈ F . Furthermore,
for any two such overlapping neighborhoods U, V with t ∈ U ∩ V , we require that
Tangent bundles
Perhaps the most natural example of a vector bundle is obtained from the collection
of all tangent spaces Tt M, t ∈ M. This can be parameterized in a natural way into a
manifold, T (M), called the tangent bundle.
In a chart (U, ϕ), any tangent vector Xt , t ∈ U , can be represented as in (7.1.6),
so the set of all tangent vectors at points t ∈ U is a 2N-dimensional space, with local coordinates ϕ̃(X_t) = (x_1(t), …, x_N(t); a_1(t), …, a_N(t)). Call this E_t, and call the
projection of Et onto its last N coordinates Ft . Denote the union (over t ∈ M) of
the Et by E, and of the Ft by F , and define the natural projection π : E → M given
by π(Xt ) = t, for Xt ∈ Et . The triple (E, M, F ), along with π , defines a vector
bundle, which we call the tangent bundle of M and denote by T (M).
The tangent bundle of a manifold is itself a manifold, and as such can be given a
differential structure in the same way that we did for M. To see this, suppose M is a
C k manifold and note that an atlas {Ui , ϕi }i∈I on M determines a covering on T (M),
the charts {π^{−1}(U_i), ϕ̃_i}_{i∈I} of which determine a topology on T(M), the smallest topology such that the ϕ̃|_{π^{−1}(U)} are homeomorphisms. In any two charts (U, ϕ_U) with
local coordinates (x1 , . . . , xN ) and (V , ϕV ) with local coordinates (y1 , . . . , yN ) with
U ∩ V = ∅, a vector Xt ∈ T (M) is represented by
X_t = Σ_{j=1}^N a_j(t) (∂/∂x_j)|_t = Σ_{i=1}^N b_i(t) (∂/∂y_i)|_t, where

b_i(t) = Σ_{j=1}^N a_j(t) (∂y_i/∂x_j), (7.1.7)
and ∂yi /∂xj is the first-order partial derivative of the real-valued function yi with
respect to xj .
Since ϕ_U ∘ ϕ_V^{−1} is a C^k diffeomorphism, we have that ϕ̃_U ∘ ϕ̃_V^{−1} is a C^{k−1} diffeomorphism, having lost one order of differentiation because of the partial derivatives
in (7.1.7). In summary, the atlas {Ui , ϕi }i∈I determines a C k−1 differential structure
on T (M) as claimed.
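The transformation rule (7.1.7) can be checked numerically. In the following sketch (our own illustration; the chart change g, the test function f, and all numerical values are hypothetical choices, and numpy is assumed), a vector with components a in one coordinate system gets components b = Ja in the other, and both representations give the same directional derivative of a test function:

```python
import numpy as np

# A chart change y = g(x) on R^2 (linear here, so its Jacobian is constant)
def g(x):     return np.array([x[0] + x[1], x[0] - x[1]])
def g_inv(y): return np.array([(y[0] + y[1]) / 2, (y[0] - y[1]) / 2])

J = np.array([[1.0,  1.0],   # Jacobian dy_i/dx_j of g
              [1.0, -1.0]])

f = lambda x: np.sin(x[0]) * x[1] ** 2   # a test function on the manifold
x0 = np.array([0.3, 0.7])
a = np.array([2.0, -1.0])                # components of X_t in x-coordinates
b = J @ a                                # (7.1.7): b_i = sum_j a_j dy_i/dx_j

def directional(h, p, v, eps=1e-6):
    """Central-difference directional derivative of h at p along v."""
    return (h(p + eps * v) - h(p - eps * v)) / (2 * eps)

Xf_x = directional(f, x0, a)                          # sum_j a_j df/dx_j
Xf_y = directional(lambda y: f(g_inv(y)), g(x0), b)   # sum_i b_i d(f o g_inv)/dy_i
print(abs(Xf_x - Xf_y) < 1e-6)  # True
```

The agreement is exactly the chain rule: the operator X_t is the same object, only its components change between charts.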
7.2 Tensor Calculus
Tensors are basically linear operators on vector spaces. They have a multitude of
uses, but there are essentially only two main approaches to setting up the (somewhat
heavy) definitions and notation that go along with them. When tensors appear in
applied mathematics or physics, the definitions involve high-dimensional arrays that
transform (as the ambient space is transformed) according to certain definite rules.
The approach we shall adopt, however, will be the more modern differential-geometric
one, in which the transformation rules result from an underlying algebraic structure
via which tensors are defined. This approach is neater and serves two of our purposes:
We can fit everything we need into only six pages, and the approach is essentially
coordinate-free.7 The latter is of obvious importance for the manifold setting. The
downside of this approach is that if this is the first time you see this material, it is very
easy to get lost among the trees without seeing the forest. Thus, since one of our main
uses of tensors will be for volume (and, later, curvature) computations on manifolds,
we shall accompany the definitions with a series of footnotes showing how all of
this relates to simple volume computations in Euclidean space. Since manifolds are
locally Euclidean, this might help a first-timer get some feeling for what is going on.
7 Ultimately, however, we shall need to use the array notation when we come to handling
specific examples. It appears, for example, in the connection forms of (7.3.14) and the
specific computations of Section 7.7.
where L(E; F) denotes the set of (multi)linear8 mappings between two vector spaces E and F. We denote the space of tensors of order (n, m) by T_m^n, where n is said to be the covariant order9 and m the contravariant order.
Let T(V) = ⊕_{i,j=1}^∞ T_j^i(V) be the direct sum of all the tensor spaces T_j^i(V).
where S(k) is the symmetric group of permutations of k letters and εσ is the sign of
the permutation σ . It is called symmetric if
For example, computing the determinant of the matrix formed from N -vectors gives
an alternating covariant tensor of order N on RN .
For k ≥ 0, we denote by Λ^k(V) (respectively, Sym(T_0^k(V))) the space of alternating (symmetric) covariant k-tensors on V, and by

Λ^∗(V) = ⊕_{k=0}^∞ Λ^k(V),  Sym^∗(V) = ⊕_{k=0}^∞ Sym(T_0^k(V)),
Aγ(v_1, …, v_k) = (1/k!) Σ_{σ∈S(k)} ε_σ γ(v_{σ(1)}, …, v_{σ(k)}),

Sγ(v_1, …, v_k) = (1/k!) Σ_{σ∈S(k)} γ(v_{σ(1)}, …, v_{σ(k)}).
Grassmann algebra
The bilinear, associative wedge product,10 or exterior product, ∧ : Λ^∗(V) × Λ^∗(V) → Λ^∗(V), is defined on Λ^r(V) × Λ^s(V) by

α ∧ β = ((r + s)!/(r! s!)) A(α ⊗ β). (7.2.1)
B_{Λ^∗(V)} = {θ_0} ∪ ∪_{j=1}^N {θ_{i_1} ∧ ⋯ ∧ θ_{i_j} : i_1 < i_2 < ⋯ < i_j}, (7.2.2)
10 Continuing the example of footnote 9, take u, v ∈ R3 and check that this definition gives
(θ1 ∧ θ2 )(u, v) = (u1 v2 − v1 u2 ), which is, up to a sign, the area of the parallelogram
with corners 0, π(u), π(v), and π(u) + π(v), where π(u) is the projection of u onto the
plane spanned by the first two coordinate axes. That is, this particular alternating covariant
tensor of order 2 performs an area computation, where “areas’’ may be negative. Now take
u, v, w ∈ R3 , and take the wedge product of θ1 ∧ θ2 with θ3 . A little algebra gives that
(θ_1 ∧ θ_2 ∧ θ_3)(u, v, w) = det[u, v, w], where [u, v, w] is the matrix with u in the first column, etc. In other words, it is the (signed) volume of the parallelepiped three of whose
edges are u, v, and w. Extending this example to N dimensions, you should already be able
to guess a number of important facts, including the following:
(i) Alternating covariant tensors of order n have a lot to do with computing n-dimensional
(signed) volumes.
(ii) The wedge product is a way of combining lower-dimensional tensors to generate higher-
dimensional ones.
(iii) Our earlier observation that Λ^k(V) ≡ {0} if k > N translates, in terms of volumes, to the
trivial observation that the k-volume of an N -dimensional object is zero if k > N .
(iv) Since area computations and determinants are intimately related, so will be tensors and
determinants.
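The identification in footnote 10 of wedge products of the dual basis with minors of a matrix is easy to check numerically. The following sketch (our own; `wedge_basis` is a hypothetical helper, and numpy is assumed) evaluates θ_{i_1} ∧ ⋯ ∧ θ_{i_k} on k vectors as a k × k determinant:

```python
import numpy as np

def wedge_basis(indices):
    """The alternating k-tensor theta_{i_1} ^ ... ^ theta_{i_k} on R^N,
    evaluated on k vectors as the minor det(vectors[b][indices[a]])."""
    def tensor(*vectors):
        A = np.array([[v[i] for v in vectors] for i in indices])
        return np.linalg.det(A)
    return tensor

u = np.array([1.0, 2.0, 0.5])
v = np.array([0.0, 1.0, 3.0])
w = np.array([2.0, 0.0, 1.0])

t12 = wedge_basis([0, 1])        # theta_1 ^ theta_2 (0-based indices)
print(np.isclose(t12(u, v), u[0] * v[1] - v[0] * u[1]))  # True: signed area

t123 = wedge_basis([0, 1, 2])    # theta_1 ^ theta_2 ^ theta_3
print(np.isclose(t123(u, v, w),
                 np.linalg.det(np.column_stack([u, v, w]))))  # True: signed volume
```

Swapping two arguments flips the sign, exactly the alternating property that makes these tensors compute signed areas and volumes.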
The unique inner product on ∗ (V ) that makes this basis orthogonal is the “cor-
responding’’ inner product to which we referred above.
Algebra of double forms
Define Λ^{n,m}(V) to be the linear span of the image of Λ^n(V) × Λ^m(V) under the map ⊗, and define, with some abuse of notation, Λ^∗(V) ⊗ Λ^∗(V) = ⊕_{n,m=0}^∞ Λ^{n,m}(V). An element of Λ^{n,m}(V) is called a double form of type (n, m). Note that an (n, m) double form is alternating in its first n and last m variables.
The double wedge product · on Λ^∗(V) ⊗ Λ^∗(V) is defined on tensors of the form γ = α ⊗ β ∈ Λ^∗(V) × Λ^∗(V) by

(α ⊗ β) · (α′ ⊗ β′) = (α ∧ α′) ⊗ (β ∧ β′), (7.2.3)
and then extended by linearity. For γ ∈ Λ^{n,m}(V) and θ ∈ Λ^{p,q}(V) this gives γ · θ ∈ Λ^{n+p,m+q}(V), for which

(γ · θ)((u_1, …, u_{n+p}), (v_1, …, v_{m+q})) (7.2.4)
  = (1/(n! m! p! q!)) Σ_{σ∈S(n+p)} Σ_{ρ∈S(m+q)} ε_σ ε_ρ γ((u_{σ_1}, …, u_{σ_n}), (v_{ρ_1}, …, v_{ρ_m}))
      × θ((u_{σ_{n+1}}, …, u_{σ_{n+p}}), (v_{ρ_{m+1}}, …, v_{ρ_{m+q}})).
Note that a double form of type (n, 0) is simply an alternating covariant tensor, so that, comparing (7.2.4) with (7.2.1), it is clear that the restriction of · to ⊕_{j=0}^N Λ^{j,0}(V) is just the usual wedge product.
We shall be most interested in the restriction of this product to

Λ^{∗,∗}(V) = ⊕_{j=0}^∞ Λ^{j,j}(V).
For γ = α ⊗ β ∈ Λ^{1,1}(V), the trace is defined by Tr(γ) = ⟨α, β⟩, and, more generally, if {v_1, …, v_N} is an orthonormal basis for V and γ ∈ Λ^{k,k}(V), then

Tr(γ) = (1/k!) Σ_{a_1,…,a_k=1}^N γ((v_{a_1}, …, v_{a_k}), (v_{a_1}, …, v_{a_k})). (7.2.6)
If γ ∈ 0,0 (V ), then γ ∈ R and we define Tr(γ ) = γ . One can also check that
while the above seemingly depends on the choice of basis, the trace operator is, in
fact, basis-independent, a property generally referred to as invariance.
There is also a useful extension of (7.2.6) to powers of symmetric forms γ ∈ Λ^{1,1}(V). Using (7.2.5) to compute γ^j, and (7.2.6) to compute its trace, it is easy to check that

Tr(γ^j) = j! detr_j((γ(v_i, v_j))_{i,j=1,…,N}), (7.2.7)

where detr_j(A) denotes the sum of the determinants of the j × j principal minors of A.
When there is more than one inner product space under consideration, say V1 and
V2 , we shall denote their traces by Tr V1 and Tr V2 .
For a more complete description of the properties of traces, none of which we
shall need, you could try [64, Section 2]. This is not easy reading, but is worth the
effort.
I = Σ_{i=1}^N θ_i ⊗ θ_i, (7.2.10)
gives what is known as the Grassmann bundle of M. As was the case for vector fields, one can define C^j, j ≤ k − 1, sections of Λ^∗(M), and these are called the C^j differential forms15 of mixed degree.
13 The finite dimensionality of the spaces T^n_m(T_t M), noted earlier, is crucial to make this construction work.
14 Included in this “differential structure’’ is a set of rules describing how tensor fields transform
under transformations of the local coordinate systems, akin to what we had for simple vector
fields in (7.1.7).
In fact, there is a lot hidden in this seemingly simple sequence of constructions. In
particular, recall that at the beginning of our discussion of tensors we mentioned that the
tensors of applied mathematics and physics are defined via arrays that transform according
to very specific rules. However, nowhere in the path we have chosen have these demands
explicitly appeared. They are, however, implicit in the constructions that we have just
carried out.
15 The reason for the addition of the adjective “differential’’ will be explained later; cf. (7.3.16)
and the discussion following it.
Similarly, carrying out the same construction over the Λ^k(T_t M) gives the bundle
of (differential) k-forms on M, while carrying it out for Λ^{*,*}(T_t M) yields the bundle
of double (differential) forms on M.
If you followed the footnotes while we were developing the notion of tensors, you
will have noted that we related them to the elementary notions of area and volume in
the simple Euclidean setting of M = RN . In general, of course, they are somewhat
more complicated. In fact, if you think back, we do not as yet even have a notion
of simple distance on M, let alone notions of volume. For this, we need to add a
Riemannian structure to the manifold.
Riemannian manifolds
Formally, a C k−1 Riemannian metric g on a C k manifold M is a C k−1 section of
Sym(T02 (M)) such that for each t in M, gt is positive definite; that is, gt (Xt , Xt ) ≥ 0
for all t ∈ M and Xt ∈ Tt M, with equality if and only if Xt = 0. Thus a C k−1
Riemannian metric determines a family of smoothly (C k−1 ) varying inner products
on the tangent spaces Tt M.
A C k manifold M together with a C k−1 Riemannian metric g is called a C k
Riemannian manifold (M, g).
where D 1 ([0, 1]; M)(s,t) is the set of all piecewise C 1 maps c : [0, 1] → M with
c(0) = s, c(1) = t. When the Riemannian metric g is unambiguous, we write τg
as τ .
A curve in M connecting two points s, t ∈ M, along which the infimum in (7.3.1)
is achieved, is called a geodesic connecting them. Geodesics need not be unique.
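The length functional implicit in (7.3.1) is easy to approximate numerically. As a sketch (our illustration, not from the book), take the hyperbolic upper half-plane with metric g_{(x,y)}(v, v) = (v_x² + v_y²)/y²; a vertical segment is a geodesic of this metric, and the geodesic distance from (0, 1) to (0, 2) is log 2.

```python
import math

# Length of a piecewise-C^1 curve c : [0, 1] -> M under a Riemannian metric g,
# approximated by a Riemann sum of g_c(t)(c'(t), c'(t))^(1/2).
def curve_length(c, metric, n=20000):
    total, dt = 0.0, 1.0 / n
    for i in range(n):
        t = (i + 0.5) * dt
        p = c(t)
        # central-difference approximation of the velocity c'(t)
        q0, q1 = c(t - dt / 2), c(t + dt / 2)
        v = tuple((b - a) / dt for a, b in zip(q0, q1))
        total += math.sqrt(metric(p, v)) * dt
    return total

# Hyperbolic upper half-plane: g_(x,y)(v, v) = (v_x^2 + v_y^2) / y^2.
metric = lambda p, v: (v[0] ** 2 + v[1] ** 2) / p[1] ** 2

# The vertical segment from (0, 1) to (0, 2); its hyperbolic length is log 2.
length = curve_length(lambda t: (0.0, 1.0 + t), metric)
```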
The gradient ∇f of f ∈ C¹(M) is the unique continuous vector field on M such that
\[
g_t(\nabla f_t, X_t) = X_t f \tag{7.3.2}
\]
for every vector field X.
The second problem is that Yc(δ) ∈ Tc(δ) R3 and Yt ∈ Tt R3 , so that the above difference
is also not well defined. However, fixing a basis for R3 , we can use the natural
identifications of Tt R3 and Tc(δ) R3 with R3 to define the difference Yc(δ) − Yt and so,
under appropriate conditions on Y , to also define the limit in (7.3.3).
The third problem is clearest in the case in which Y is also a tangent vector field,
and for reasons that will become clearer later we decide that we would like a notion
of derivative that also yields a tangent vector in Tt M. The above construction will
not necessarily do this, even when (7.3.3) exists and yields a vector in Tt R3 . This
last problem can, however, be easily solved by projecting (using the natural basis for
Tt R3 ) the limit (7.3.3) onto Tt M ⊂ Tt R3 .
This solves the problem of differentiating vector fields when M is embedded in a
general Euclidean space, since there is really nothing special about dimension three
in the above construction. We shall denote the result of the above construction by
∇X Y ≡ (∇X Y )(t),
and call it the covariant derivative16 of Y in the direction X. Note that unless the
surface M is flat, ∇X Y (t) is quite different from the usual derivative Xt Yt . In partic-
ular, while ∇X Y (t) ∈ Tt M, the same is not generally true of Xt Yt . Furthermore, in
general, ∇X Y = ∇Y X.
It is not trivial to extend the above construction to general manifolds, and so we
shall adopt a path of definition by characterization, rather than by construction. As
a first step, we define the notion of a (linear) connection on the tangent bundle of a
manifold M as a bilinear mapping ∇ : T(M) × T(M) → T(M), written (X, Y) ↦ ∇_X Y,
which is C^∞(M)-linear in X, additive in Y, and satisfies the Leibniz rule
∇_X(fY) = (Xf)Y + f∇_X Y.
It is easy to check that the covariant derivative we defined above for the Euclidean
setting satisfies all of the above three requirements. Furthermore, it is quite easy to
generate connections on a general manifold.
In particular, given a chart (U, ϕ) on an N-dimensional manifold M, suppose that
we have N³ functions {Γ^k_{ij} : U → R}_{1≤i,j,k≤N}. If, using the natural bases for the
T_t M, we define
\[
\nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_j} = \sum_{k=1}^{N} \Gamma^k_{ij} \frac{\partial}{\partial x_k}, \tag{7.3.7}
\]
and extend ∇ to T(M) × T(M) by linearity, then it is easy to check that the three
conditions required of a connection are satisfied on U. It is then standard fare to patch
charts together to construct a connection on the full manifold, given enough functions
Γ^k_{ij}. In other words, if we can determine the Γ^k_{ij} for a given connection, then we have
determined the connection. Note, however, that there is uniqueness here in only one
direction. It is clear that the Γ^k_{ij} determine a connection. It is neither clear nor in
general true that on a given Riemannian manifold there is only one connection.
For uniqueness, we need to demand a little more. We start by noticing that the construction
we gave of the Euclidean connection, with its projection onto tangent spaces,
actually determines a unique set of Γ^k_{ij}. This follows from properties (7.3.4)–(7.3.6),
the representation (7.3.7), and the following two consequences of the construction:
16 Hopefully, the double usage of ∇ for both gradient and the covariant derivative (and, soon,
the Riemannian connection) will not cause too many difficulties. Note that, like the gradient,
the connection “knows’’ about the Riemannian metric g. The reason for this will become
clear soon.
\[
\nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_j} = \nabla_{\partial/\partial x_j} \frac{\partial}{\partial x_i}, \tag{7.3.8}
\]
\[
\frac{\partial}{\partial x_i}\Big\langle \frac{\partial}{\partial x_j}, \frac{\partial}{\partial x_k} \Big\rangle
= \Big\langle \nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_j}, \frac{\partial}{\partial x_k} \Big\rangle
+ \Big\langle \frac{\partial}{\partial x_j}, \nabla_{\partial/\partial x_i} \frac{\partial}{\partial x_k} \Big\rangle, \tag{7.3.9}
\]
where [X, Y ]f = XYf − Y Xf is the so-called Lie bracket17 of the vector fields X
and Y .18
It is a matter of simple algebra to derive from (7.3.10) and (7.3.11) that for C¹
vector fields X, Y, Z,
\[
2\, g(\nabla_X Y, Z) = X g(Y, Z) + Y g(X, Z) - Z g(X, Y)
+ g([X, Y], Z) - g([X, Z], Y) - g([Y, Z], X).
\]
This equation is known as Koszul's formula. Its importance lies in the fact that the
right-hand side depends only on the metric g and differential-topological notions.
Consequently, it gives a coordinate-free formula that actually determines ∇.
Another way to determine the Levi-Civitá connection is via an extension of
(7.3.7), which was written in terms of the natural bases of the T_t M. For this, we
need the notion of a C^k orthonormal frame field {E_i}_{1≤i≤N}, which is a C^k section
of the orthonormal (with respect to g) frame bundle O(M). Then, given a connection
∇ and an orthonormal frame field E on M, we define N³ functions Γ^k_{ij} by the
relationships
\[
\nabla_{E_i} E_j = \sum_{k=1}^{N} \Gamma^k_{ij} E_k,
\]
extending by linearity. The functions Γ^k_{ij} are known as the Christoffel symbols
(of the second kind) of the connection ∇ (with respect to the given orthonormal frame
field).
Yet another way to determine the Levi-Civitá connection, which is essentially a
rewriting of the previous paragraph, is to define a collection of N² 1-forms {θ_i^j}_{i,j=1}^N,
known as connection forms, via the requirement that
\[
\nabla_X E_i = \sum_{j=1}^{N} \theta_i^j(X)\, E_j. \tag{7.3.14}
\]
The importance of Koszul's formula for determining the connection is now clear. We
shall see in detail how to compute the θ_i^j for some examples in Section 7.7. In general,
they are determined by (7.3.19) below, in which {θ_i}_{1≤i≤N} denotes the orthonormal
dual frame corresponding to {E_i}_{1≤i≤N}. To understand (7.3.19) we need one more
concept, that of the (exterior) differential of a k-form.
If f : M → R is C¹, then its exterior derivative or differential, df, is the 1-form
defined by
\[
df(E_{i_t}) = f_i(t) = E_{i_t} f, \tag{7.3.15}
\]
so that
\[
df = \sum_{i=1}^{N} f_i\, \theta_i.
\]
If \(\theta = \sum_{i=1}^N h_i \theta_i\) is a 1-form, then its exterior derivative is the 2-form
\[
d\theta = \sum_{i=1}^{N} dh_i \wedge \theta_i. \tag{7.3.16}
\]
Note that the exterior derivative of a 0-form (i.e., a function) gave a 1-form, and
that of a 1-form gave a 2-form. There is a general notion of exterior differentiation,
which in general takes k-forms to (k + 1)-forms,19 but which we shall not need.
19 This is why we used the terminology of “differential forms’’ when discussing Grassmann
bundles at the end of Section 7.1.
We do now, however, have enough to define the 1-forms θ_i^j of (7.3.14). They are
the unique set of N² 1-forms satisfying the following two requirements:
\[
d\theta_i - \sum_{j=1}^{N} \theta_j \wedge \theta_j^i = 0, \tag{7.3.17}
\]
\[
\theta_i^j + \theta_j^i = 0. \tag{7.3.18}
\]
Geodesics are characterized by the equation ∇_ċ ċ = 0.
Hence, for every ċ(0) ∈ Tt M, there is a unique geodesic leaving t in the direction
ċ(0)/|ċ(0)|, obtained by solving the above second-order differential equation.
This procedure is formalized by the exponential map, which can be used to build the chart
\[
\varphi_t = E^{-1} \circ \exp^{-1} : U \to \mathbb{R}^N. \tag{7.3.21}
\]
where ∇X is the Levi-Civitá connection of (M, g). The equality here is a consequence
of (7.3.11) and (7.3.2).
Note the obvious but important point that if t is such that ∇f (t) = 0,21 then
Xf (t) = 0 for all X ∈ Tt M (cf. (7.3.2)), and so by (7.3.22) it follows that
∇ 2 f (X, Y )(t) = XYf (t). Consequently, at these points the Hessian is indepen-
dent of the metric g.
We now have the tools and vocabulary to start making mathematical sense out of
our earlier comments linking tensors and differential forms to volume computations.
However, rather than computing only volumes, we shall need general measures over
manifolds. Since a manifold M is a locally compact topological space, there is
no problem defining measures over it, and by Riesz’s theorem, these are positive,
bounded linear functionals on Cc0 (M), the c denoting compact support. This descrip-
tion, however, does not have much of a geometric flavor to it, and so we shall take a
different approach. The branch of mathematics of “adding a geometric flavor’’ to this
description is known, appropriately, as geometric measure theory. A user-friendly
introduction to this beautiful area of mathematics can be found in Morgan’s book
[115]. The definitive treatment, however, is still to be found in Federer’s treatise
[65]. We shall not, however, have need for much of this rather heavy machinery.
\[
(g^* \alpha)(X_1, \dots, X_k) = \alpha(g_* X_1, \dots, g_* X_k). \tag{7.4.1}
\]
The pullback has many desirable properties, the main ones being that it commutes
with both the wedge product of forms and the exterior derivative d (cf. (7.3.16)).
With the notion of a pullback defined, we can add one more small piece of nota-
tion. Take a chart (U, ϕ) of M, and recall the notation of (7.1.5) in which we used
{∂/∂xi }1≤i≤N to denote both the natural Euclidean basis of ϕ(U ) and its push-forward
to T (U ). We now do the same with the notation
dx1 , . . . , dxN , (7.4.2)
which we use to denote both the natural dual coordinate basis in RN (so that dxi (v) =
vi ) and its pullback under ϕ.
Now we can start defining integration, all of which hinges on a single definition:
If A ⊂ R^N, and f : A → R is Lebesgue integrable over A, then we define
\[
\int_A f\, dx_1 \wedge \dots \wedge dx_N = \int_A f(x)\, dx, \tag{7.4.3}
\]
where {dx_i}_{1≤i≤N}, as above, is the natural dual coordinate basis in R^N, and the
right-hand side is simply Lebesgue integration.
Since the wedge products in the left-hand side of (7.4.3) generate all N-forms on
R^N (cf. (7.2.2)), this and additivity solve the problem of integrating N-forms on R^N
in full generality.
Now we turn to manifolds. For a given chart (U, ϕ) and an N-form α on U, we
define the integral of α over V ⊂ U as
\[
\int_V \alpha = \int_{\varphi(V)} (\varphi^{-1})^* \alpha, \tag{7.4.4}
\]
Given an oriented manifold M with atlas as above, one can also define the notion
of an oriented (orthonormal) frame field over M. This is a C⁰ (orthonormal) frame
field {E_{1t}, . . . , E_{Nt}} over M that, for each chart (U, ϕ) in the atlas, is consistent with
the orientation of the chart at each point of U.
From this, and the fact that Ω is a differential form, it follows that for X_i ∈ T_t M its
value is determined by the inner products g_t(X_i, X_j), where g is the Riemannian metric.
This determines, for each t, a positive definite matrix (g_{ij}(t))_{1≤i,j≤N}. Then, for A ⊂ U,
\[
\int_A \Omega \equiv \int_{\varphi(A)} (\varphi^{-1})^* \Omega \tag{7.4.9}
\]
\[
= \int_{\varphi(A)} \sqrt{\det(g_{ij} \circ \varphi^{-1})}\; dx_1 \wedge \dots \wedge dx_N
= \int_{\varphi(A)} \sqrt{\det(g_{ij} \circ \varphi^{-1})}\; dx,
\]
where the crucial intermediate term comes from (7.4.3)–(7.4.8) and some algebra.
Under orientability, both this integral and that in (7.4.4) can be extended to larger
subsets of M by additivity.22 This yields a σ-finite measure μ associated to Ω,
called Riemannian measure, which, with some doubling up of notation, we shall also
write as Vol_g when convenient.
For obvious reasons, we shall often write the volume form as dx1 ∧ · · · ∧ dxN ,
where the 1-forms dxi are the (dual) basis of T ∗ (M).
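The last line of (7.4.9) can be checked numerically. As a sketch (ours): on the unit sphere S² with spherical coordinates (θ, φ), the round metric is g = diag(1, sin² θ), so that √det(g_ij) = sin θ, and integrating over the chart recovers the surface area 4π.

```python
import math

# Riemannian measure of the unit sphere S^2 in spherical coordinates (theta, phi):
# g = diag(1, sin^2 theta), so sqrt(det(g_ij)) = sin(theta).
def sphere_area(n=2000):
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        # the phi integral contributes a factor 2*pi, since sin(theta)
        # does not depend on phi
        total += math.sin(theta) * dtheta * 2.0 * math.pi
    return total

area = sphere_area()  # should be close to 4*pi
```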
An important point to note is that Volg agrees with the N -dimensional Hausdorff
measure23 associated with the metric τ induced by g, and so we shall also often write
22 The last line of (7.4.9), when written out in longhand, should be familiar to most readers as
an extension of the formula giving the “surface area’’ of a regular N-dimensional surface.
The extension lies in the fact that an arbitrary Riemannian metric now appears, whereas in
the familiar case there is only one natural candidate for g.
23 If M is an N-dimensional manifold, treat (M, τ ) as a metric space, where τ is the geodesic
metric given by (7.3.1). The diameter of a set S ⊂ M is then diam(S) = sup{τ (s, t) : s, t ∈
S}, and for integral n, the Hausdorff n-measure of A ⊂ M is defined by
it as H_N. In this case we shall usually write integrals as \(\int_M h(t)\, d\mathcal{H}_N(t)\) rather than
\(\int_M h\, \mathcal{H}_N\), which would be more consistent with our notation so far.
We now return to the issue of orientability. In setting up the volume form Ω, we
first fixed an orthonormal frame field and then demanded that Ω_t(E_{1t}, . . . , E_{Nt}) =
+1 for all t ∈ M (cf. (7.4.6)). We shall denote M, along with this orientation, by
M⁺. However, there is another orientation of M for which Ω = −1 when evaluated
on the orthonormal basis. We write this manifold as M⁻. On an orientable manifold,
there are only two such possibilities.
In fact, it is not just Ω that can be used to determine an orientation. Any nonvanishing
continuous N-form α on an orientable manifold can be used to determine
one of two orientations, with the orientation being determined by the sign of α on the
orthonormal basis at any point on M. We can thus talk about the “orientation induced
by α.’’
With an analogue for Lebesgue measure in hand, we can set up the analogues of
Borel measurability and (Lebesgue) integrability in the usual ways. Furthermore, it
follows from (7.4.3) that there is an inherent smoothness24 in the above construction.
In particular, if M is compact, then for any continuous, nonvanishing N-form α that
induces the same orientation on M as does Ω, there is an Ω-integrable function dα/dΩ
for which
\[
\int_M \alpha = \int_M \frac{d\alpha}{d\Omega}\, \Omega.
\]
For obvious reasons, dα/dΩ is called the Radon–Nikodym derivative of α.
From now on, and without further comment, we shall assume that all manifolds in
this book are orientable, with the frame fields and orientation chosen so as to make
the volume form positive.
Coarea formula
The following important result, due to Federer [64] and known as his coarea formula,
allows us to break up integrals over manifolds into iterated integrals over submanifolds
of lower dimension. Consider a differentiable map f : M → N between two
Riemannian manifolds of dimensions m and n, respectively, with m ≥ n.
\[
\mathcal{H}_n(A) = \omega_n 2^{-n} \lim_{\varepsilon \downarrow 0} \inf \sum_i (\operatorname{diam} B_i)^n,
\]
where for each ε > 0, the infimum is taken over all collections {Bi } of open τ -balls in M
whose union covers A and for which diam(Bi ) ≤ ε. As usual, ωn is the volume of the unit
ball in Rn . For the moment, we need only the case n = N . When both are defined, and the
underlying metric is Euclidean, Hausdorff and Lebesgue measures agree.
Later we shall need a related concept, that of Hausdorff dimension. If A ⊂ M, the
Hausdorff dimension of A is defined as
\[
\dim(A) = \inf\Big\{ \alpha : \lim_{\varepsilon \downarrow 0} \inf \sum_i (\operatorname{diam} B_i)^\alpha = 0 \Big\},
\]
Federer's coarea formula [64] states that for differentiable25 f and for g ∈
L¹(M, B(M), H_m),
\[
\int_M g(t)\, Jf(t)\, d\mathcal{H}_m(t) = \int_N d\mathcal{H}_n(u) \int_{f^{-1}(u)} g(s)\, d\mathcal{H}_{m-n}(s). \tag{7.4.13}
\]
There is not a great deal of simplification in this case beyond the fact that it is
easy to see what the functional J is. On the other hand, if M = N = R^N, then
Jf = |det ∇f|, and
\[
\int_{\mathbb{R}^N} g(t)\, |\det \nabla f(t)|\, dt = \int_{\mathbb{R}^N} du \int_{f^{-1}\{u\}} g(s)\, d\mathcal{H}_0(s) \tag{7.4.15}
\]
\[
= \int_{\mathbb{R}^N} \Big( \sum_{t : f(t) = u} g(t) \Big)\, du.
\]
We shall return to (7.4.15) in Section 11.4 and to the coarea formula in general in
Chapter 15, where it will play a very important role in our calculations.
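A one-dimensional sketch (ours) of (7.4.15), with the hypothetical choices f(t) = t² and g(t) = e^{−t²}: since f⁻¹(u) = {−√u, +√u} for u > 0, both sides reduce to ∫₀^∞ 2e^{−u} du = 2.

```python
import math

# Midpoint-rule Riemann sum over [a, b].
def riemann(fn, a, b, n=200000):
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

g = lambda t: math.exp(-t * t)

# Left-hand side of (7.4.15): integrate g(t)|f'(t)| = g(t) * 2|t| over (a large
# piece of) the real line; the tails beyond |t| = 10 are negligible.
lhs = riemann(lambda t: g(t) * 2.0 * abs(t), -10.0, 10.0)

# Right-hand side: for u > 0, f^{-1}(u) = {-sqrt(u), +sqrt(u)}, so the inner
# "integral" is the two-point sum g(-sqrt(u)) + g(sqrt(u)) = 2 exp(-u).
rhs = riemann(lambda u: g(-math.sqrt(u)) + g(math.sqrt(u)), 0.0, 100.0)
```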
25 Federer’s setting is actually somewhat more general than this, since he works with Lipschitz
mappings. In doing so he replaces derivatives with “approximate derivatives’’ throughout.
7.5 Curvature
We now come to what is probably the most central of all concepts in differential ge-
ometry, that of curvature. In essence, much of what we have done so far in developing
the calculus of manifolds can be seen as no more than setting up the basic tools for
handling the ideas to follow.
Curvature is the essence that makes the local properties of manifolds inherently
different from simple Euclidean space, where curvature is always zero. Since there
are many very different manifolds, and many different Riemannian metrics, there are
a number of ways to measure curvature. In particular, curvature can be measured
in a somewhat richer fashion for manifolds embedded in ambient spaces of higher
dimension than it can for manifolds for which no such embedding is given.26 A
simple example of this is given in Figure 7.5.1, where you should think of the left-
hand circle S 1 as being embedded in the plane, while the right-hand circle exists
without any embedding. In the embedded case there are notions of “up’’ and “down,’’
with the two arrows at the top and bottom of the circle pointing “up.’’ In one case the
circle curves “away’’ from the arrow, in the other, “toward’’ it, so that any reasonable
treatment of curvature has to be able to handle this difference. However, for the
nonembedded case, in which there is nothing external to the circle, the curvature
must be the same everywhere. In what follows, we shall capture the first, richer,
notion of curvature via the second fundamental form of the manifold, and the second
via its curvature tensor, which is related to the second fundamental form, when both
are defined, via the Gauss equation (7.5.9) below.
26 We have used this term often already, albeit in a descriptive sense. The time has come to
define it properly: Suppose f : M → M̂ is C¹. Take t ∈ M and charts (U, ϕ) and (V, ψ)
containing t and f(t), respectively. The rank of f at t is defined to be the rank of the
mapping ψ ◦ f ◦ ϕ^{-1} : ϕ(U) → ψ(V) between Euclidean spaces. If f is everywhere
of rank dim M, then it is called an immersion; if it is everywhere of rank dim M̂, then it is
called a submersion. Note that this is a purely local property of M.
If, furthermore, f is a one-to-one homeomorphism of M onto its image f(M) (with its
topology as a subset of M̂), then we call f an embedding of M in M̂ and refer to M as an
embedded (sub)manifold and to M̂ as the ambient manifold. This is a global property, and
amounts to the fact that M cannot “intersect’’ itself in M̂.
Finally, let M and M̂ be Riemannian manifolds with metrics g and ĝ, respectively. Then
we say that (M, g) is an isometrically embedded Riemannian manifold of (M̂, ĝ) if, in
addition to the above, g = f*ĝ, where f*ĝ is the pullback of ĝ (cf. (7.4.1)).
Note that if [X, Y ] = 0, as is the case when Xt and Yt are coordinate vectors in the
natural basis of some chart, then R(X, Y ) = ∇X ∇Y − ∇Y ∇X , and so the operator R
is the first measure of lack of commutativity of ∇X and ∇Y mentioned above.
The (Riemannian) curvature tensor, also denoted by R, is defined by
R(X, Y, Z, W ) = g ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z, W (7.5.2)
= g(R(X, Y )Z, W ),
where the R in the second line is, obviously, the curvature operator. It is easy to check
that for RN , equipped with the standard Euclidean metric, R ≡ 0.28
The definition (7.5.2) of R is not terribly illuminating, although one can read it
as “the amount, in terms of g and in the direction W , by which ∇X and ∇Y fail to
commute when applied to Z.’’ To get a better idea of what is going on, it is useful to
think of planar sections of Tt M.
For any t ∈ M, we call the span of two linearly independent vectors Xt , Yt ∈ Tt M
the planar section spanned by Xt and Yt , and denote it by π(Xt , Yt ). Such a planar
section is determined by any pair of orthonormal vectors E1t , E2t in π(Xt , Yt ), and
κ(π ) = −R (E1t , E2t , E1t , E2t ) = R (E1t , E2t , E2t , E1t ) (7.5.3)
is called the sectional curvature of the planar section. It is independent of the choice of
basis. Sectional curvatures are somewhat easier to understand than the curvature ten-
sor, but essentially equivalent, since it is easy to check from the symmetry properties
of the curvature tensor that it is uniquely determined by the sectional curvatures.
We shall later need a further representation of R, somewhat reminiscent of the
representation (7.3.14) for the Riemannian connection. The way that R was defined
27 Note that the curvature operator depends on the underlying Riemannian metric g via the
dependence of the connection on g.
28 Manifolds for which R ≡ 0 are called flat spaces. However, not only Euclidean space is
flat. It is easy to check, for example, that the cylinder S 1 × R is also flat when considered
as a manifold embedded in R3 with the usual Euclidean metric.
\[
R = \frac{1}{2} \sum_{i,j=1}^{N} \Omega_{ij} \otimes \theta_i \wedge \theta_j, \tag{7.5.4}
\]
Projection formula
Although it is more a result from linear algebra than differential geometry, since we are
about to start talking about projections, and these will appear frequently throughout
the book, we recall the following:
Suppose v is an element of a vector space V with inner product ⟨·, ·⟩, and
w_1, . . . , w_n are vectors in V. Then the projection of v onto the span of the w_j is
given by
\[
P_{\mathrm{span}(w_1, \dots, w_n)}\, v = \sum_{i,j=1}^{n} \langle w_i, v \rangle\, g^{ij}\, w_j, \tag{7.5.6}
\]
where the g^{ij} are the elements of G^{-1}, and G is the matrix with elements g_{ij} =
⟨w_i, w_j⟩.
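Formula (7.5.6) is easily verified numerically. The following sketch (ours) projects a random v ∈ R⁶ onto the span of three random vectors and compares the result with an ordinary least-squares projection.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 6
W = rng.standard_normal((n, N))   # rows are the (not necessarily orthogonal) w_i
v = rng.standard_normal(N)

# Projection of v onto span(w_1, ..., w_n) via (7.5.6):
# P v = sum_{i,j} <w_i, v> g^{ij} w_j, where G = (<w_i, w_j>) and (g^{ij}) = G^{-1}.
G = W @ W.T
coeffs = np.linalg.solve(G, W @ v)   # the coefficient vector (sum_i <w_i, v> g^{ij})_j
Pv = W.T @ coeffs

# Cross-check: the projection is also the least-squares fit of v in span(w_i).
Pv_lstsq = W.T @ np.linalg.lstsq(W.T, v, rcond=None)[0]
```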
where π_M is the canonical projection on the tangent bundle T(M), and T(M) and
T^⊥(M) are embedded subbundles of T(M̂).
The second fundamental form of M in M̂ can now be defined to be the operator
S from T(M) × T(M) to T^⊥(M) satisfying
\[
S(X, Y) = P_{T^\perp(M)}\, \widehat{\nabla}_X Y = \widehat{\nabla}_X Y - \nabla_X Y, \tag{7.5.8}
\]
where ∇̂ is the connection of the ambient manifold M̂ and ∇ that of M.
In the special case that M̂ is a manifold with smooth boundary ∂M̂ = M, the above
simplifies to
\[
S^2\big((X, Y), (Z, W)\big) = -2\Big( R^{\partial \widehat{M}}(X, Y, Z, W) - R^{\widehat{M}}(X, Y, Z, W) \Big). \tag{7.5.10}
\]
Now let ν denote a unit normal vector field on M, so that ν_t ∈ T_t^⊥ M for all
t ∈ M. Then the scalar second fundamental form of M in M̂ for ν is defined, for
X, Y ∈ T(M), by
\[
S_\nu(X, Y) = \widehat{g}\big(S(X, Y), \nu\big), \tag{7.5.11}
\]
where the S on the right-hand side refers to the second fundamental form
(7.5.8). Note that despite its name, the scalar second fundamental form is not a differential
form, since it is symmetric (rather than alternating) in its arguments. When there is
no possibility of confusion we shall drop the qualifier “scalar’’ and refer also to S_ν as
the second fundamental form.
In view of the fact that ∇̂ is compatible with the metric ĝ (cf. (7.3.11)), we also
have the so-called Weingarten equation: for X, Y ∈ T(M),
\[
S_\nu(X, Y) = -\widehat{g}\big(\widehat{\nabla}_X \nu, Y\big).
\]
The associated operator S_ν on T(M), determined by g(S_ν(X), Y) = S_ν(X, Y)
for all Y ∈ T(M), is known as the shape operator. It has N real eigenvalues,
known as the principal curvatures of M in the direction ν, and the corresponding
eigenvectors are known as the principal curvature vectors.
All of the above becomes quite familiar and particularly useful if M is a simple
surface determined by the graph of f : R^N → R, with the usual Euclidean metric.
In this case, the principal curvatures are simply the eigenvalues of the Hessian
(∂²f/∂x_i ∂x_j)_{i,j=1}^N (cf. (7.3.22)). In particular, if (M, g) is a surface in R³ with the
induced Euclidean metric, then the product of these eigenvalues is known as Gaussian
curvature, and it is Gauss's Theorema Egregium that it is an isometry invariant of
(M, g), i.e., it is independent of the embedding of M in R³. In the next section we
shall see that the Gaussian curvature is not unique as far as isometric invariance goes
and that there are also other invariants, which can be obtained as integrals of mixed
powers of curvature and second fundamental forms over manifolds.
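At a critical point of f this is easy to check by hand. As a small numerical sketch (ours; f(x, y) = x² + 2y² − xy is a hypothetical example whose gradient vanishes at the origin), the principal curvatures there are the eigenvalues of the Hessian, and their product is the Gaussian curvature of the graph surface at that point.

```python
import numpy as np

# Hessian at the origin of f(x, y) = x^2 + 2y^2 - xy (grad f vanishes there):
# f_xx = 2, f_yy = 4, f_xy = f_yx = -1.
H = np.array([[2.0, -1.0],
              [-1.0, 4.0]])

principal_curvatures = np.linalg.eigvalsh(H)       # eigenvalues 3 +/- sqrt(2)
gaussian_curvature = principal_curvatures.prod()   # their product = det H = 7
```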
In Section 6.3, in the context of integral geometry, we described the notions of intrinsic
volumes and Minkowski functionals and how they could be used, via Steiner’s formula
(6.3.3), to find an expression for the tube around a convex Euclidean set. In Chapter 10
we shall extend these results considerably, looking at tubes around submanifolds of
some larger manifold. In doing so we shall encounter extensions of intrinsic volumes
known as Lipschitz–Killing curvatures, and study them in some depth.
However, while the detailed calculations of Chapter 10 are not needed for Part III,
the same is not true of Lipschitz–Killing curvatures, which will have a crucial role to
play there. Hence we define them now, with no further motivation.
Let (M, g) be a C² Riemannian manifold. For each t ∈ M, the Riemannian
curvature tensor R_t given by (7.5.2) is in Λ^{2,2}(T_t M), so that for j ≤ dim(M)/2 it
makes sense to talk about the jth power R_t^j of R_t. We can also take the trace
\[
\operatorname{Tr}^M(R^j)(t) = \operatorname{Tr}^{T_t M}(R_t^j).
\]
While (7.6.1) is a tidy formula, the integral is not always easy to compute. Perhaps
the easiest example is given by the Lipschitz–Killing curvatures of S N , which you
should be able to compute directly from (7.6.1). We shall carry out the computation
at the end of the following section, after having looked a little more carefully at some
of the components making up the trace in the integrand.
Christoffel symbols
We first met Christoffel symbols back in Section 7.3, where we used them to motivate
the construction of Riemannian connections. Now, however, we want to work in the
opposite direction. That is, given M and g, which uniquely define a Levi-Civitá
connection, we want to represent this connection in a form conducive to performing
computations. In doing this we shall recover the Christoffel symbols of Section 7.3
as well as develop a related class of symbols.
We start with {E_i}_{1≤i≤N}, the standard30 coordinate vector fields on R^N. This
also gives the natural basis in the global chart (R^N, i), where i is the inclusion map.
We now define31 the so-called Christoffel symbols of the first kind of ∇,
\[
\Gamma_{ijk} = g(\nabla_{E_i} E_j, E_k), \qquad 1 \le i, j, k \le N. \tag{7.7.1}
\]
We further define
gij = g(Ei , Ej ). (7.7.2)
Despite the possibility of some confusion, we also denote the corresponding matrix
function by g = (g_{ij})_{i,j=1}^N, doubling up on the notation for the metric.
With this notation it now follows via a number of successive applications of
(7.3.10) and (7.3.11) that
\[
\Gamma_{ijk} = \big( E_j g_{ik} - E_k g_{ij} + E_i g_{jk} \big)/2. \tag{7.7.3}
\]
We need two more pieces of notation, the elements g^{ij} of the inverse matrix g^{-1}
and the Christoffel symbols of the second kind of ∇, defined by
\[
\Gamma^k_{ij} = \sum_{s=1}^{N} g^{ks}\, \Gamma_{ijs}.
\]
30 Note that while the E_i might be “standard,’’ there is no reason why they should be the “right’’
coordinate system to use for a given g. In particular, they are no longer orthonormal, since
g_{ij} of (7.7.2) need not be a Kronecker delta. Thus, although we start here, we shall soon
leave this choice of basis for an orthonormal one.
31 An alternative, and somewhat better motivated, definition of the Γ_{ijk} comes by taking the
vector fields {E_i} to be orthonormal with respect to the metric g. In that case, they can
be defined via their role in determining the Riemannian connection through the set of N²
equations \(\nabla_{E_i} E_j = \sum_{k=1}^N \Gamma_{ijk} E_k\). Taking this as a definition, it is easy to see that (7.7.1)
must also hold. In general, the Christoffel symbols are dependent on the choice of basis.
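Equations (7.7.1)–(7.7.3) and the symbols of the second kind lend themselves to a direct numerical check. The following sketch (ours) evaluates them by finite differences for the round metric of the unit 2-sphere in coordinates (θ, φ), for which the classical values Γ^θ_{φφ} = −sin θ cos θ and Γ^φ_{θφ} = cos θ/sin θ are known.

```python
import math

# Metric of the round unit 2-sphere in coordinates (t0, t1) = (theta, phi).
def g(t):
    return [[1.0, 0.0], [0.0, math.sin(t[0]) ** 2]]

def dg(i, j, k, t, h=1e-6):
    """Partial derivative E_i g_{jk}, by central differences."""
    tp, tm = list(t), list(t)
    tp[i] += h
    tm[i] -= h
    return (g(tp)[j][k] - g(tm)[j][k]) / (2.0 * h)

def christoffel_first(i, j, k, t):
    """Gamma_{ijk} = (E_j g_{ik} - E_k g_{ij} + E_i g_{jk}) / 2, as in (7.7.3)."""
    return (dg(j, i, k, t) - dg(k, i, j, t) + dg(i, j, k, t)) / 2.0

def christoffel_second(k, i, j, t):
    """Gamma^k_{ij} = sum_s g^{ks} Gamma_{ijs}; this metric is diagonal."""
    ginv = [1.0, 1.0 / math.sin(t[0]) ** 2]
    return ginv[k] * christoffel_first(i, j, k, t)

t = (1.0, 0.5)
gamma_theta_phiphi = christoffel_second(0, 1, 1, t)  # expect -sin(1) cos(1)
gamma_phi_thetaphi = christoffel_second(1, 0, 1, t)  # expect cos(1) / sin(1)
```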
Riemannian curvature
With the definitions above, it is now an easy and standard exercise to show that if the
metric g is C², then
\[
R^E_{ijkl} = R\big((E_i, E_j), (E_k, E_l)\big) \tag{7.7.4}
\]
\[
= \sum_{s=1}^{N} g_{sl}\big( E_i \Gamma^s_{jk} - E_j \Gamma^s_{ik} \big)
+ \sum_{s=1}^{N} \big( \Gamma_{isl}\, \Gamma^s_{jk} - \Gamma_{jsl}\, \Gamma^s_{ik} \big)
\]
\[
= E_i \Gamma_{jkl} - E_j \Gamma_{ikl}
+ \sum_{s,t=1}^{N} \big( \Gamma_{iks}\, g^{st}\, \Gamma_{jlt} - \Gamma_{jks}\, g^{st}\, \Gamma_{ilt} \big).
\]
Returning to the definition of the curvature tensor, and writing {de_i}_{1≤i≤N} for the
dual basis of {E_i}_{1≤i≤N}, it now follows (after some algebra) that the curvature tensor
itself can be written as
\[
R = \frac{1}{4} \sum_{i,j,k,l=1}^{N} R^E_{ijkl}\, (de_i \wedge de_j) \otimes (de_k \wedge de_l). \tag{7.7.5}
\]
Next, we express the curvature tensor in an orthonormal frame field. To this end,
let X = {Xi }1≤i≤N be a section of the orthonormal frame bundle O(M), having dual
frames {θi }1≤i≤N , so that
\[
\theta_i = \sum_{i'=1}^{N} g^{1/2}_{ii'}\, de_{i'}, \tag{7.7.6}
\]
where g^{1/2} is given by
\[
(g^{1/2})_{ij} = g(E_i, X_j),
\]
and the notation comes from the easily verified fact that g^{1/2} (g^{1/2})' = g, so that g^{1/2} is
a square root of g.
It follows that
\[
R = \frac{1}{4} \sum_{i,j,k,l=1}^{N} R^X_{ijkl}\, (\theta_i \wedge \theta_j) \otimes (\theta_k \wedge \theta_l), \tag{7.7.7}
\]
where
\[
R^X_{ijkl} = \sum_{i',j',k',l'=1}^{N} R^E_{i'j'k'l'}\, g^{-1/2}_{ii'}\, g^{-1/2}_{jj'}\, g^{-1/2}_{kk'}\, g^{-1/2}_{ll'}
= R\big((X_i, X_j), (X_k, X_l)\big),
\]
and you are free to interpret the g^{-1/2}_{ij} as either the elements of (g^{1/2})^{-1} or of a square
root of g^{-1}.
In the notation of (7.5.4), we then have
\[
R = \frac{1}{2} \sum_{i,j=1}^{N} \Omega_{ij} \otimes \theta_i \wedge \theta_j.
\]
With the product and general notation of (7.2.4) we can thus write R^k as
\[
R^k = \frac{1}{2^k} \sum_{i_1, \dots, i_{2k}=1}^{N}
\Big( \bigwedge_{l=1}^{k} \Omega_{i_{2l-1} i_{2l}} \Big) \otimes \Big( \bigwedge_{l=1}^{k} \theta_{i_{2l-1}} \wedge \theta_{i_{2l}} \Big),
\]
where
\[
\Big( \bigwedge_{l=1}^{k} \Omega_{i_{2l-1} i_{2l}} \Big)(X_{a_1}, \dots, X_{a_{2k}})
= \frac{1}{2^k} \sum_{\sigma \in S(2k)} \varepsilon_\sigma \prod_{l=1}^{k}
\Omega_{i_{2l-1} i_{2l}}\big(X_{a_{\sigma(2l-1)}}, X_{a_{\sigma(2l)}}\big)
= \frac{1}{2^k} \sum_{\sigma \in S(2k)} \varepsilon_\sigma \prod_{l=1}^{k}
R^X_{i_{2l-1}\, i_{2l}\, a_{\sigma(2l-1)}\, a_{\sigma(2l)}}.
\]
It follows that
\[
R^k\big((X_{a_1}, \dots, X_{a_{2k}}), (X_{a_1}, \dots, X_{a_{2k}})\big)
= \frac{1}{2^{2k}} \sum_{i_1, \dots, i_{2k}=1}^{N}
\delta^{(a_1, \dots, a_{2k})}_{(i_1, \dots, i_{2k})}
\Big( \sum_{\sigma \in S(2k)} \varepsilon_\sigma \prod_{l=1}^{k}
R^X_{i_{2l-1}\, i_{2l}\, a_{\sigma(2l-1)}\, a_{\sigma(2l)}} \Big),
\]
so that
\[
\operatorname{Tr}(R^k) = \frac{1}{(2k)!} \sum_{a_1, \dots, a_{2k}=1}^{N}
R^k\big((X_{a_1}, \dots, X_{a_{2k}}), (X_{a_1}, \dots, X_{a_{2k}})\big) \tag{7.7.8}
\]
\[
= \frac{1}{2^{2k}} \sum_{a_1, \dots, a_{2k}=1}^{N} \sum_{\sigma \in S(2k)}
\varepsilon_\sigma \prod_{l=1}^{k} R^X_{a_{2l-1}\, a_{2l}\, a_{\sigma(2l-1)}\, a_{\sigma(2l)}}.
\]
This is the equation we have been searching for to give a “concrete’’ example of
the general theory.
A little thought will show that while the above was presented as an example of
a computation on RN , it is, in fact, far more general. Indeed, you can reread the
above, replacing RN by a general manifold and the Ei by a family of local coordinate
systems that are in some sense “natural’’ for computations. Then (7.7.8) still holds,
as do all the equations leading up to it. Thus the title of this section is somewhat of a
misnomer, since the computations actually have nothing to do with Euclidean spaces!
when N − 1 − j = 0, and zero otherwise. Here t is any point on SλN−1 . Thus, all
we need to do is compute the trace. We shall indicate how to do this, and leave the
remaining algebra needed to obtain (7.7.9) to you.
The key point to note is that if X, Y, U, V are unit vectors in T_t S_λ^{N-1}, then
\[
R(X, Y, U, V) =
\begin{cases}
-\lambda^{-2}, & (X, Y) = (U, V), \\
\lambda^{-2}, & (X, Y) = (V, U), \\
0 & \text{otherwise}.
\end{cases} \tag{7.7.10}
\]
This can be checked directly from the computations described above. Alternatively, at least
for the nonzero cases, you can use the “well-known’’ (and easily checkable) result that the
curvature of S_λ^2 is λ^{-2} and then use (7.5.3) to go from two to general dimensions.
Once we have (7.7.10), then (7.7.8) and a little counting gives
\[
\operatorname{Tr}(R^k) = (-1)^k\, \lambda^{-2k}\, 2^{-k}\, \frac{(N-1)!}{(N-1-2k)!}.
\]
This, along with some more algebra, will yield (7.7.9), as required.
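The counting can also be delegated to a few lines of brute force. The following sketch (ours) evaluates the right-hand side of (7.7.8) using the sphere curvature (7.7.10) and compares it with the closed form above, here for S⁴ (so N − 1 = 4 and λ = 1).

```python
import itertools
import math

def perm_sign(p):
    """Sign of a permutation given as a sequence of indices."""
    sign, q = 1, list(p)
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            if q[i] > q[j]:
                sign = -sign
    return sign

def trace_Rk(dim, k, lam=1.0):
    """Tr(R^k) via the last line of (7.7.8), for the sphere S_lam^(N-1) with
    dim = N - 1, using (7.7.10): R_abcd = -lam^(-2)(d_ac d_bd - d_ad d_bc)."""
    def R(a, b, c, d):
        return -(lam ** -2) * (float(a == c and b == d) - float(a == d and b == c))
    total = 0.0
    for a in itertools.product(range(dim), repeat=2 * k):
        for s in itertools.permutations(range(2 * k)):
            prod = float(perm_sign(s))
            for l in range(k):
                prod *= R(a[2 * l], a[2 * l + 1], a[s[2 * l]], a[s[2 * l + 1]])
                if prod == 0.0:
                    break
            total += prod
    return total / 2 ** (2 * k)

def closed_form(dim, k, lam=1.0):
    """Tr(R^k) = (-1)^k lam^(-2k) 2^(-k) (N-1)!/(N-1-2k)!, with dim = N - 1."""
    return ((-1.0) ** k * lam ** (-2 * k) * 2.0 ** (-k)
            * math.factorial(dim) / math.factorial(dim - 2 * k))
```

For S⁴ this gives Tr(R) = −6 and Tr(R²) = 6, matching the closed form.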
It then follows from the definition of the scalar second fundamental form (cf.
(7.5.11)) that
\[
S_\nu = \sum_{i=1}^{N-1} \gamma_{iN} \otimes de_i^*, \tag{7.7.11}
\]
If we now define \(\Gamma^*_{ijk} = g(\nabla_{E^*_i} E^*_j, E^*_k)\), then the connection forms γ_{iN} can be
expressed33 as
33 It is often possible to write things in a format that is computationally more convenient. In
particular, if the metric is Euclidean and if it is possible to explicitly determine functions
a_{ij} such that
\[
E^*_{it} = \sum_{k=1}^{N} a_{ik}(t)\, E_{kt},
\]
then
\[
\Gamma^*_{jiN}(t) = \sum_{k,l,m=1}^{N} a_{jk}(t)\, a_{Nl}(t)\, \frac{\partial a_{im}(t)}{\partial t_k}\, g_{ml}(t).
\]
\[
\gamma_{iN} = \sum_{j=1}^{N-1} \Gamma^*_{jiN}\, de_j^*. \tag{7.7.13}
\]
If, as for the curvature tensor, we now choose a smooth section X of O(M) with
dual frames θ such that on ∂M, XN = ν, similar calculations yield that
    S_\nu = \sum_{i,j=1}^{N-1} S^X_{ij}\, \theta_i \otimes \theta_j = \sum_{i=1}^{N-1} \theta_{iN} \otimes \theta_i,

where

    S^X_{ij} = \sum_{i',j'=1}^{N-1} g_{ii'}^{-1/2}\, g_{jj'}^{-1/2}\, \Gamma_{j'i'N}, \qquad \theta_{iN} = \sum_{j=1}^{N-1} S^X_{ij}\, \theta_j.
    R^k S_\nu^j = \frac{1}{2^k} \sum_{a_1,\dots,a_p=1}^{N-1} \left( \prod_{l=1}^{k} \Omega_{a_{2l-1} a_{2l}} \right) \wedge \left( \prod_{m=1}^{j} \theta_{a_{2k+m} N} \right) \otimes \prod_{l=1}^{p} \theta_{a_l},   (7.7.14)
So far, all that we have had to say about manifolds and calculus on manifolds has
been of a local nature; i.e., it depended only on what was happening in individual
charts. However, looking back at what we did in Sections 6.1–6.3 in the setting of
integral geometry, we see that this is not going to solve our main problem, which
is understanding the global structure of excursion sets of random fields now defined
over manifolds.
In order to handle this, we are going to need a reasonably heavy investment in
notation leading to various notions of piecewise smooth spaces.1 The investment
will be justified by ultimately producing results, in Part III, that are both elegant and
applicable. To understand the need for piecewise smooth spaces, two simple examples
should suffice: the sphere S 2 , which is a C ∞ manifold without boundary, and the unit
cube I 3 , a flat manifold with a boundary made up of six faces that intersect at twelve
edges, themselves intersecting at eight vertices. The cube, faces, edges, and vertices
are themselves flat C ∞ manifolds, of dimensions 3, 2, 1, and 0, respectively.
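The face counts of the cube generalize directly. As a small sanity check (our own, not from the text), a j-dimensional face of I^N is obtained by fixing N − j coordinates at 0 or 1, and the alternating sum over strata recovers the Euler characteristic of the (contractible) cube:

```python
from math import comb

def face_counts(N):
    """Number of j-dimensional faces of I^N, j = 0, ..., N."""
    return [comb(N, j) * 2 ** (N - j) for j in range(N + 1)]

counts = face_counts(3)        # I^3: [8 vertices, 12 edges, 6 faces, 1 solid]
assert counts == [8, 12, 6, 1]

euler = sum((-1) ** j * c for j, c in enumerate(counts))
assert euler == 1              # the cube is contractible
```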
In the first case, if f ∈ C k (S 2 ), the excursion set Au (S 2 , f ) is made of smooth
subsets of S 2 , each one bounded by a C k curve. In the second case, for f ∈ C k (I 3 ),
while the individual components of Au (I 3 , f ) will have a C k boundary away from
∂I 3 , their boundaries will also have faces, edges, and vertices where they intersect
with ∂I 3 . We already know from Section 6.2 that when we attempt to find point
set representations for the Euler characteristics of excursion sets, these boundary
intersections are important (e.g., (6.2.16) for the case of I 2 ). This is even the case if
the boundary of the parameter set is itself smooth (e.g., Theorem 6.2.5). Consequently,
as soon as we permit as parameter spaces manifolds with boundaries, we are going to
require techniques to understand how these boundaries intersect with excursion sets.
1 We shall soon meet many types of piecewise smooth spaces, including stratified manifolds
and Whitney stratified manifolds, in tame and locally convex (or not) versions. We shall
therefore use the term “piecewise smooth’’ in a loose generic sense, referring to any or all
of these examples. In the formal statements of results, however, we shall be careful about
specifying which case is under consideration. Unfortunately—since one would prefer a
general theory—this is necessary, since very often, results that appear at first as if they
should hold in wider generality than stated do not.
8 Piecewise Smooth Manifolds
Thus, what we are searching for is a framework that will cover basically smooth
spaces, which, however, are allowed to have edges, corners, etc. This will ultimately
involve blending both integral and differential geometry. There are many ways to do
this, none of which could be considered canonical. The two most popular approaches
are based on the related, but nonequivalent, theories of Whitney stratified manifolds,
for which the standard reference is the monograph by Goresky and MacPherson [72],
and sets of finite reach, as developed by Federer [64, 65] with extensions by Zähle
[182] and others. In this book we shall adopt an approach based on Whitney stratified
manifolds, and in the remainder of the chapter we shall define carefully what these
are, and then proceed, both here and in Chapter 9, to add some additional restrictions
needed to make the results of Part III work.
The resulting treatment is therefore occasionally unmotivated, and it would only
be natural for you to ask why we suddenly add one side condition or another, and if
this one, why not another. The rather unsatisfactory answer is that we add conditions
that we require to make later proofs work. Indeed, it is probably instructional for you
to know that although this and the following chapter are only the eighth and ninth out
of fifteen, they were the last to be finished, since we had to return here time and again
to tailor conditions to match what we could prove. Our only consolation is that this
phenomenon seems to be endemic to the integral/differential geometry interface in
general; i.e., one chooses a basic framework, and then appends conditions to generate
proofs that morally should not require the additional conditions but in practice, do.2
Note, however, that if you are interested only in parameter spaces that are man-
ifolds without boundary then you can go directly to Chapter 9 and skip the current
one. However, you will then have to forgo fully understanding how to handle excur-
sion sets over parameter spaces as simple as cubes, which, from the point of view of
applications, is a rather significant loss.
Finally, we note that much of the material of this chapter is not completely
standard in differential geometry, so even those readers comfortable with
Chapter 7 may find it useful, at least for establishing notation.
    M = \bigsqcup_{l=0}^{\dim M} \partial_l M,
For the first two, the decompositions look locally the same at every point in R.
However, Z3 is fundamentally different at 0. A priori, this means that some of the
expressions for the intrinsic volumes described in Section 10.7 seem as if they may
depend on the decomposition of M. That this is not the case is a somewhat subtle
issue, to which we shall return when the need arises.4
So far, we have not imposed any regularity on how the respective l-dimensional
boundaries ∂l M are “glued’’ together. In the case of I 3 , and the natural decomposi-
tion described above, the pieces all fit together in a nice fashion. For more general
parameter spaces, we shall have to impose some further regularity on M.
Returning to I 3 for motivation, note that every point t ∈ I 3 has a neighborhood
that is isomorphic5 to the product of an open neighborhood of the stratum that contains
t and a cone. For example, each vertex of I 3 has a neighborhood that is isomorphic
to a closed octant in R3 , and every point t in an edge of I 3 has a neighborhood
isomorphic to the product of an open interval around 0 and a closed quadrant in R2 .
This “locally conic’’property of I 3 will be essential to the Morse theory developed
in Chapter 9, and that for general stratified spaces follows from Whitney’s conditions
(A) and (B) below. These conditions are regularity conditions imposed on a stratified
space that, in particular, imply that each point t ∈ M has a neighborhood isomorphic
to the product of an open subset of the stratum S containing t and a cone6 Cone(LS )
with base LS over a stratified space LS , the link of M at t.
A stratified space (M, Z) is said to satisfy Whitney condition (A) at t ∈ S if the
following holds for every stratum S̃ ≠ S:

(A) If t_n → t ∈ S such that t_n ∈ S̃ for all n and the sequence of tangent spaces
T_{t_n}S̃ converges in the Grassmannian^7 bundle of dim(S̃)-dimensional tangent
(sub)spaces of M̃ to some τ ⊂ T_t M̃, then τ ⊃ T_t S. A limit of such tangent
spaces is called a generalized tangent space of M at t.
4 This point is treated in detail in [122], where a distinction is also made between a decomposition
and a stratification, the latter being equivalence classes of decompositions.
However, since any decomposition uniquely determines a stratification, for our purposes
we can assume that we are given a decomposition Z of M and talk about the stratifica-
tion induced by Z. Reference [122] also raises the possibility of parameterizing strata by
something known as “depth’’ rather than by dimension.
5 Two stratified spaces (M_1, Z_1) and (M_2, Z_2) of class C^l are said to be isomorphic if there
exists a map H : M_1 → M_2 that is an isomorphism of the partially ordered sets Z_1 and Z_2.
That is, for each S ∈ Z_1, H(S) ∈ Z_2, and this map is an isomorphism when S and H(S) are
considered as elements of the partially ordered sets Z_1 and Z_2, respectively. Furthermore,
H is such that for each S ∈ Z_1, H|_S : S → H(S) is a diffeomorphism of class C^l.
6 Recall that the cone Cone(L) over a topological space L is defined as the quotient space
Y = (L × [0, 1))/∼, where y_1 ∼ y_2 ⇐⇒ y_1 = (x_1, 0), y_2 = (x_2, 0) for x_1, x_2 ∈ L.
If the topological space is also a real vector space V , then a cone is a subset K of V for
which λK = K for all λ > 0. (Technically, this is a positive cone, but all our cones will be
of this form, so we shall drop the qualifier.) Then K can be written in the form
K = {λx : x ∈ Kbase , λ ≥ 0} .
We refer to K as the cone over the base Kbase and denote it by Cone(Kbase ).
7 The Grassmannian bundle referred to here is the natural one, namely the union of the
Grassmannian manifolds of dim(S̃)-dimensional subspaces of T_t M̃, where the union is
taken over all t ∈ M̃, and the collection is parameterized similarly to the tangent bundle.
8.1 Whitney Stratified Spaces
It is easy to show that condition (B) implies condition (A), and it is not too much
more difficult to show that whether condition (B) is satisfied at t is independent of
the chart (cf. [122]). A stratified space (M, Z) is called a Whitney stratified space if
it satisfies Whitney condition (B) (and hence condition (A)) at every t ∈ M.
Examples of Whitney stratified spaces abound, and include the following:
    \varphi(M) = \sum_{j=0}^{\dim M} (-1)^j\, \alpha_j(S_M),   (8.1.1)
where α_j(S_M) is the number of j-dimensional facets in S_M. Despite the fact that
there is no uniqueness for triangulations, and so the right-hand side here would seem
to depend on S_M, it is a basic result of algebraic topology that the Euler characteristic
is well defined and independent of the triangulation.
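The alternating-sum computation can be carried out concretely. The sketch below (our own example) triangulates S² as the boundary of the 3-simplex on vertices {0, 1, 2, 3}, whose facets are the nonempty proper subsets of the vertex set:

```python
from itertools import combinations

vertices = range(4)
# alpha[j] = number of j-dimensional facets in the triangulation
alpha = [len(list(combinations(vertices, j + 1))) for j in range(3)]
# 4 vertices, 6 edges, 4 triangles
chi = sum((-1) ** j * a for j, a in enumerate(alpha))
assert alpha == [4, 6, 4] and chi == 2     # chi(S^2) = 2
```

Refining the triangulation changes each α_j but, as noted above, leaves the alternating sum unchanged.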
Finally, we turn to the issue of transverse intersections. We say that two Whitney
stratified submanifolds, M_1 and M_2, subsets of the same ambient N-dimensional
manifold M̃, intersect transversally if for each pair (j, k) with 0 ≤ j ≤ dim(M_1)
and 0 ≤ k ≤ dim(M_2), and each t ∈ ∂_j M_1 ∩ ∂_k M_2, the dimension of

    \mathrm{span}\{X_t + Y_t : X_t \in T_t^{\perp} \partial_j M_1,\ Y_t \in T_t^{\perp} \partial_k M_2\}   (8.1.2)

is equal to 2N − j − k ≥ 0.
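For top-dimensional strata the span condition is a rank condition on normal vectors. A hypothetical example (not from the text): two surfaces in R^3 (N = 3, j = k = 2) meet transversally at a point exactly when their normals there span a space of dimension 2N − j − k = 2, i.e., when they are not parallel. Rank is computed by a small Gaussian elimination:

```python
def rank(rows, eps=1e-12):
    """Rank of a small real matrix, via Gaussian elimination with pivoting."""
    rows = [list(map(float, r)) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > eps), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and abs(rows[i][c]) > eps:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

n1 = (0.0, 0.0, 1.0)          # normal to the plane z = 0
n2 = (0.0, 1.0, 1.0)          # normal to the plane y + z = 0
assert rank([n1, n2]) == 2    # transverse: dimension 2N - j - k = 2
assert rank([n1, n1]) == 1    # the same plane twice: not transverse
```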
Now suppose that f ∈ C^k(M̃) is such that its excursion set over M̃, A_u(f, M̃) =
f^{-1}[u, ∞), is a Whitney stratified manifold. If M is a Whitney stratified submanifold
of M̃, and if A_u(f, M̃) and M intersect transversally, then A_u(f, M) will also be a
Whitney stratified manifold, with a stratification inherited from those of A_u(f, M̃)
and M.
The punch line of the above paragraph is that if we start with Whitney stratified
manifolds as the parameter spaces of random fields, and if our random fields behave
well, then excursion sets will also be Whitney stratified manifolds. This closure property,
together with the fact that Whitney stratified manifolds provide a natural setting for
both smooth and angular parameter spaces, is one of the main reasons that this class
of sets will be, for us, the right choice for Part III.
    \dot c(0) = \sum_{i=1}^{N} \dot c_i(0)\, \frac{\partial}{\partial x_i}\bigg|_{c(0)}
St (M1 ∩ M2 ) = St M1 ∩ St M2 . (8.2.2)
This follows from the simple observation that the support cone at t ∈ M1 ∩ M2 is the
set of all directions in which one can leave t ∈ M1 ∩ M2 while remaining in M1 ∩ M2 .
Thus each such direction must be contained in the support cones of both M1 and M2 ,
i.e., it must lie in the intersection of the two support cones.
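The identity (8.2.2) can be illustrated by direct sampling. In the sketch below (our own example, with M_1 = {y ≥ 0} and M_2 = {x ≥ 0} at t = 0), a direction lies in a support cone exactly when a single small step from t stays in the set; this is adequate for these closed half-planes, so the computation illustrates, rather than proves, (8.2.2):

```python
from math import cos, sin, pi

in_M1 = lambda p: p[1] >= 0          # the upper half-plane
in_M2 = lambda p: p[0] >= 0          # the right half-plane
eps = 1e-3

def cone(member, n=360, offset=1e-4):
    """Indices of sampled directions along which one can leave t = 0 in the set."""
    return {k for k in range(n)
            if member((eps * cos(2 * pi * k / n + offset),
                       eps * sin(2 * pi * k / n + offset)))}

S1 = cone(in_M1)
S2 = cone(in_M2)
S12 = cone(lambda p: in_M1(p) and in_M2(p))   # support cone of M1 cap M2
assert S12 == S1 & S2                          # (8.2.2), on the sample
```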
Figure 8.2.1 shows a part of the (shaded) support cones for two domains in R2 ,
where the bases of the cones are the points at the base of the concavity in each domain.
Note that while neither of the domains is itself convex, the smooth domain always
has convex support cones, while the domain with the concave cusp does not.
We now have all we need to define a family of Whitney stratified spaces that will
be important for us later.
Definition 8.2.1. A Whitney stratified space (M, Z) is called locally convex if the
support cone St M is convex for every t ∈ M.
Under this definition, the smooth domain of Figure 8.2.1 is locally convex, while
the domain with concave cusp is not. These examples are actually quite generic, since
the main import of the convex support cone assumption is to exclude sharp, concave
cusps. Similarly, while the N -cube I N is locally convex (and indeed, convex), its
boundary ∂I N is not.
With the notion of support cones fresh in our minds, this is probably a good place
to define their duals, for which we need to assume that M̃ has a Riemannian structure.
This we do, writing g̃ for the Riemannian metric on M̃ and g or g̃|_{∂_j M} for the metric
it induces on M or ∂_j M.
This allows us to define a dual to each support cone S_t M, known as the normal
cone of M at t and defined by

    N_t M = \{X_t \in T_t \tilde M : \tilde g(X_t, Y_t) \le 0 \text{ for all } Y_t \in S_t M\}.   (8.2.3)
In the two domains in Figure 8.2.1, assuming the usual Euclidean metric, the
normal cones at the base of the concavity are the outward normal in the first case, and
empty in the other.
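The duality (8.2.3) is easy to check by sampling in a simple case (our own example, with the Euclidean metric): for the unit square at the vertex t = (0, 0), the support cone is the first quadrant, and the normal cone should come out as the third quadrant:

```python
from math import cos, sin, pi

def directions(n, offset):
    return [(cos(2 * pi * k / n + offset), sin(2 * pi * k / n + offset))
            for k in range(n)]

# sampled support cone of I^2 at the vertex (0, 0): the first quadrant
support = [d for d in directions(720, 1e-4) if d[0] >= 0 and d[1] >= 0]

def in_normal_cone(X):
    """(8.2.3), Euclidean metric: <X, Y> <= 0 for every Y in the support cone."""
    return all(X[0] * Y[0] + X[1] * Y[1] <= 1e-12 for Y in support)

# the dual should come out as the third quadrant
for X in directions(360, 2e-4):
    assert in_normal_cone(X) == (X[0] <= 0 and X[1] <= 0)
```

(The two direction grids are offset slightly differently so that no test direction is exactly orthogonal to a sampled support direction.)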
A more interesting example is given in Figure 8.2.2 for the tetrahedron. In this
case we have shown the (truncated) normal cones at (a) a vertex, (b) a point on an edge,
and (c) a point on a face. What remains is a point in the interior of the tetrahedron, for
which the normal cone is empty. Note that in each case the dimension of the normal
cone is the codimension of the stratum in which the base point sits, with respect to
that of the ambient manifold (in this case, R3 ).
Definition 8.3.1. Let M ⊂ M̃ be a C^l stratified space with stratification (Z, S), with
l a nonnegative integer. Then M is said to be a cone space of class C^l and depth 0 if
it is the topological sum of countably many connected C^l manifolds, the strata S of
which are the unions of connected components of equal dimension.^{10}
A space M is said to be a cone space of class C^{l,m} (m ≥ 0) and depth d + 1
(d ≥ 0) if every t ∈ S ⊂ M has a neighborhood U ⊂ M̃ such that U ∩ M is C^m
diffeomorphic to (U ∩ S) × Cone(L_S), where L_S is a compact C^l cone space of
8 The point at which we shall need C 2 cone spaces is in the proof of Theorem 9.2.6 in the
following chapter. It is not clear to us at this stage whether the assumption that the manifolds
there are also cone spaces is necessary for the result to hold, or merely a requirement of our
method of proof. However, we were unable to find a proof without this assumption.
9 In [122], cone spaces are defined only for m = 0, i.e., only for homeomorphisms.
Unfortunately, we shall need a little more for the proof of Theorem 9.2.6.
10 Note that these are true manifolds, and not manifolds with boundary.
8.3 Cone Spaces

    \bigcup_{t \in S} S_t M

of support cones.
For an example, consider the important special case of convex simplicial com-
plexes in Euclidean spaces. Such spaces have canonical stratifications given by the
facets. In this case, for a given t ∈ M, the “link’’ LS is naturally identified with the
unit normal vectors of the supporting hyperplanes at t, chosen to point toward the
interior of M. Furthermore, if we denote the span of a facet S by [S], it is clear that
[S] ⊕ Cone(LS ) is St M, where ⊕ denotes Minkowski addition
A ⊕ B = {x + y : x ∈ A, y ∈ B} (8.3.1)
(cf. [141]).
For a simple example of a C^{∞,0} cone space that is not C^{∞,m} for any m ≥ 1,
consider the so-called Neil's parabola (cf. [122]) given by

    M_{\mathrm{Neil}} = \{(s, t) \in \mathbb{R}^2 : s^3 = t^2\}.

Its stratification is given by the origin and the two legs {(s, t) ∈ M_Neil : t > 0} and
{(s, t) ∈ M_Neil : t < 0}.
MNeil is clearly homeomorphic to Cone({−1, 1}), which we can identify with the
graph
{(s, t) ∈ R2 : s = |t|}.
However, there is no diffeomorphism of the plane that maps MNeil to Cone({−1, 1}).
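A small numeric illustration (our own, not from the text) of what goes wrong: parameterizing the two legs as (u², u³), the secant directions from the origin along both legs converge to the same direction (1, 0) — a cusp — whereas for the graph s = |t| they converge to two distinct directions, and a plane diffeomorphism cannot turn the one configuration into the other:

```python
from math import hypot

def secant(p):
    """Unit direction from the origin to the point p."""
    r = hypot(*p)
    return (p[0] / r, p[1] / r)

for u in (1e-3, -1e-3):
    s, t = u * u, u ** 3                 # on MNeil: s^3 = (u^2)^3 = (u^3)^2 = t^2
    d = secant((s, t))
    # both legs approach the origin in the direction (1, 0): a cusp
    assert abs(d[0] - 1.0) < 1e-2 and abs(d[1]) < 1e-2

# the graph s = |t| (the cone over {-1, 1}): two distinct limiting directions
d_plus, d_minus = secant((1e-3, 1e-3)), secant((1e-3, -1e-3))
assert hypot(d_plus[0] - d_minus[0], d_plus[1] - d_minus[1]) > 1.0
```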
9 Critical Point Theory
In the preceding chapter we set up the two main geometric tools that we shall need
in Part III of the book. The first of these are piecewise smooth manifolds of one kind
or another, which will serve there as parameter spaces for our random fields, as well
as appearing in the proofs. The second are the Lipschitz–Killing curvatures that we
met briefly in Chapter 7 and shall look at far more closely, in the piecewise smooth
scenario, in Chapter 10. These will appear in the answers to the questions we shall
ask. Between the questions and the answers will lie considerable computation, and
the main geometric tool that we shall need there is the topic of this short chapter.
Critical point theory, also known as Morse theory, is a technique for describing
various global topological characteristics of manifolds via the local behavior, at crit-
ical points, of functions defined over the sets. We have already seen a version of
this back in Section 6.2, where we obtained point set representations for the Euler
characteristic of excursion sets (cf. Theorems 6.2.4 and 6.2.5, which gave point set
representations for excursion sets in R2 , over squares and over bounded sets with C 2
boundaries). Our aim now is to set up an analogous set of results for the excursion
sets of C 2 functions defined over C 3 piecewise smooth spaces. We shall show in
Section 9.4 how to specialize these back down to the known, Euclidean, examples of
Section 6.2.
A full development of this theory for manifolds, which goes well beyond what we
shall need, is in the classic treatise of Morse and Cairns [117], but you can also find
a very readable introduction to this theory in the recent monograph of Matsumoto
[112]. The standard theory, however, concentrates on smooth manifolds, as opposed
to the piecewise smooth case that we need. The standard reference in this case is the
excellent monograph of Goresky and MacPherson [72].
A critical point of f̃ is a point t ∈ M̃ such that ∇f̃_t = 0. Points that are not critical are called
regular.
Now take M to be compact, N-dimensional, and C² piecewise smooth, embedded
in a C³ ambient manifold (M̃, g̃), writing, as usual, g for the induced metric on M.
Extending the notion of critical points to f = f̃|_M requires taking note of the fact that
the various boundaries in M are of different dimensions and so, in essence, involves
repeating the above definition for each f|_{∂_j M}. However, our heavy investment in
notation now starts to pay dividends, since it is easy to see from the general definition
that a point t ∈ ∂_j M, for some 0 ≤ j ≤ N, is a critical point if and only if

    \nabla \tilde f_t \in T_t^{\perp} \partial_j M   (9.1.1)

(cf. (7.5.7) for notation). Thus we need work only with the single function f̃ and not,
explicitly at least, with its various restrictions.1
This definition implies that all points in ∂_0 M are to be considered as critical points,
and, when dim(M) = dim(M̃), that critical points of f|_{∂_N M} ≡ f̃|_{M°} are just critical
points of f̃ in the sense of the initial definition.
We call the set

    \bigcup_{j=0}^{N} \{t \in \partial_j M : \nabla \tilde f_t \in T_t^{\perp} \partial_j M\}
the set of critical points of f|M . All other points are known as regular points.
A critical point t ∈ ∂_j M of f|_M is called nondegenerate if the covariant Hessian
∇²f|_{T_t ∂_j M} is nondegenerate, when considered as a bilinear mapping. A function
f̃ ∈ C²(M̃) is said to be nondegenerate on M if all the critical points of f|_M are
nondegenerate. The tangential Morse index
1 This assumes, however, that one remembers where all these spaces are sitting, or (9.1.1)
makes little sense. Assuming that the ambient space M̃ is N-dimensional (N = dim(M)),
we have, on the one hand, that ∇f̃ is also N-dimensional, whereas T_t^⊥ ∂_j M is (N − j)-
dimensional. It is important, therefore, to think of T_t^⊥ ∂_j M as a subspace of T_t^⊥ M for the
inclusion to make sense. Overall, all of these spaces are dependent on M̃ and its Riemannian
metric.
9.2 The Normal Morse Index
The normal Morse index, α, is a measure of local change in the topology of the
manifold. In that sense it is very much like the Euler–Poincaré functional ϕ of
(6.1.10). However, unlike ϕ, it records this change for any normal direction, and later
we shall take averages over all directions. In this way we overcome the drawback of
the theory of Chapter 6, which was axis-dependent (cf. footnote 1 there).
Beyond its use in Morse theory, α will also play a key role in the definition of
Lipschitz–Killing curvature measures for Whitney stratified spaces in Chapter 10.
    T^{\perp} M = \bigcup_{S \in Z} T^{\perp} S
is unchanged for 0 < δ < δ_0 and 0 < ε < ε_0. This set is easily seen to be a
Whitney stratified manifold embedded in R^{dim(M̃)}, and so has a well-defined Euler
characteristic given either via (8.1.1) or via the integral geometry of Chapter 6 if the
setup there happens to apply here as well. For 0 < δ < δ0 and 0 < ε < ε0 , denote it
by χ (ν).
We then define the (normal) Morse index of M at t in the direction ν to be
α(ν) ≡ α(ν; M) = 1 − χ (ν). (9.2.2)
In fact, the normal Morse index is really dependent only on the structure of
the support cone St M, which, recall, we can express as Tt S ⊕ Kt for some cone
Kt ⊂ Tt S ⊥ . Moreover, a little thought shows that it is actually the structure of the
cone Kt that is important, and so it therefore makes sense to work with
α(νt ; Kt ) = α(νt ) = α(νt ; M).
Actually, it is not immediately clear that α(ν) is well defined, i.e., that the above
Euler characteristic depends only on the vector ν ∈ T ⊥ S (more precisely on its
The Morse index takes a particularly simple form for convex polytopes. In this
case it is easy to check that

    \alpha(\nu) = \begin{cases} 1, & -\nu \in (N_t M)^{\circ}, \\ 0, & \text{otherwise}, \end{cases}   (9.2.3)
Furthermore, since the support cones of M1 ∩M2 for two Whitney stratified subspaces
are the intersections of the corresponding support cones of M1 and M2 (cf. (8.2.2)),
we can rewrite (9.2.4) as
We have avoided one technicality in the above discussion, related to what are known
as generalized tangent spaces. To define these, let S_1 ⊂ S̄_2 (S_1 ≠ S_2) be two strata
of a Whitney stratified manifold, and take t ∈ S̄_1 ⊂ S̄_2 and {t_n} a sequence of points
in S_1 converging to t. Then a generalized tangent space at t is any limit

    \lim_{t_n \to t} T_{t_n} S_1.
2 Note that the Morse index can actually be defined independently of the Riemannian metric,
although then we would need to define it on the stratified conormal bundle of M. Since
all our examples are Riemannian, we prefer to think of α as an integer-valued map on the
stratified normal bundle instead.
of all generalized tangent spaces coming from S has Hausdorff dimension less
than dim(S) in the appropriate Grassmannian.5
(ii) Wherever the normal Morse index is defined, we have
|α(νt ; M)| ≤ C.
We now come to a simple definition, to which, since it is so central to all that fol-
lows, we have devoted an entire subsection. Recall that cone spaces are defined in
Definition 8.3.1, local convexity at Definition 8.2.1, and tame spaces were defined
above.
In general, we shall not require that our manifolds be locally convex, although
many formulas (and some proofs) become easier in this case. However, this will be
a crucial assumption for Chapter 14, where we shall prove what is our main result
about excursion probabilities for smooth Gaussian fields.
An issue that will recur often throughout Part III of the book will be the geometry of
a set M1 ∩ M2 , where both M1 and M2 are stratified spaces. The simplest and almost
ubiquitous example will arise when we look at excursion sets, which we can write as
M ∩ f −1 (u),
    T_{L_2}\nu = P^{\perp}_{L_1}\nu - \sum_{i,j=1}^{\mathrm{Codim}(L_2)} \langle P_{L_1}\nu,\, P_{L_1} w_i \rangle\, \tilde g^{ij}\, P^{\perp}_{L_1} w_j,   (9.2.11)

where the g̃^{ij} are the entries of the inverse of the matrix with elements

    \tilde g_{ij} = \langle P_{L_1} w_i,\, P_{L_1} w_j \rangle.   (9.2.12)

Then

    \alpha(\nu; \tilde K_1 \cap \tilde K_2) = \alpha(T_{L_2}\nu; \tilde K_1) \cdot \alpha(\nu; P_{L_1}\tilde K_2).   (9.2.13)
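The map T_{L_2} can be checked numerically. The sketch below uses hypothetical data (our own: R^4, with L_1 and L_2 of codimension 2 each, so Codim(L_1 ∩ L_2) = 4, and with g̃_{ij} = ⟨P_{L_1}w_i, P_{L_1}w_j⟩ as our reading of (9.2.12)) and verifies the two defining properties used in the proof: T_{L_2}ν lies in L_1^⊥, and ν − T_{L_2}ν lies in L_2^⊥, the span of the w_i:

```python
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def sub(a, b): return [x - y for x, y in zip(a, b)]
def scale(c, a): return [c * x for x in a]

def proj_span(basis, v):
    """Orthogonal projection of v onto span(basis), via Gram-Schmidt."""
    ortho, out = [], [0.0] * len(v)
    for b in basis:
        for e in ortho:
            b = sub(b, scale(dot(b, e) / dot(e, e), e))
        ortho.append(b)
    for e in ortho:
        out = [o + c for o, c in zip(out, scale(dot(v, e) / dot(e, e), e))]
    return out

vs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]   # L1-perp = span(vs)
ws = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]   # L2-perp = span(ws)
def P_L1perp(x): return proj_span(vs, x)
def P_L1(x): return sub(x, P_L1perp(x))

nu = [0.3, -1.2, 0.7, 2.0]      # any vector of (L1 cap L2)-perp = R^4 here
g = [[dot(P_L1(ws[i]), P_L1(ws[j])) for j in range(2)] for i in range(2)]
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

T = P_L1perp(nu)                # (9.2.11), term by term
for i in range(2):
    for j in range(2):
        coef = dot(P_L1(nu), P_L1(ws[i])) * ginv[i][j]
        T = sub(T, scale(coef, P_L1perp(ws[j])))

# T_{L2} nu lies in L1-perp ...
assert max(abs(a - b) for a, b in zip(P_L1perp(T), T)) < 1e-9
# ... and nu - T_{L2} nu lies in L2-perp = span(ws)
resid = sub(sub(nu, T), proj_span(ws, sub(nu, T)))
assert max(abs(x) for x in resid) < 1e-9
```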
Proof. While (9.2.10) may seem like a tidier and more natural result than (9.2.13),
the latter result will actually be rather important for us, and we shall also prove it first.
We start with a little notation, for which we fix a point t ∈ L̃_1 ∩ L̃_2 and look at
a variety of vectors, matrices, and spaces at t. However, since the notation is heavy
enough without carrying the dependence on t, it will not appear explicitly in any of what
follows. Note, however, that as far as the cones K_1 and K_2 are concerned, t is always
at the apex of the cone.
Enlarge the set V to

    \tilde V = \{v_1, \dots, v_n, v_{n+1}, \dots, v_{\mathrm{Codim}(L_1)}\}

in such a way that span(Ṽ) = L_1^⊥ and

    v_{n+i} \in \mathrm{span}(\tilde K_1)^{\perp}, \qquad 1 \le i \le \mathrm{Codim}(L_1) - n.
10 The vectors that cause us problems at this stage are those that may annihilate generalized
tangent spaces, which in the current simple scenario have measure zero.
11 Since we normally write the Morse index as depending on a vector ν_t emanating from a
point t, and here there is no reference to t, note that we are implicitly assuming that given
a ν ∈ (L_1 ∩ L_2)^⊥, the relevant t is the point in L_1 ∩ L_2 from which it emanates.
12 The fact that w_j ∈ L_2^⊥ for all 1 ≤ j ≤ m comes from the orthogonal structure in (9.2.8).
It is also straightforward to check that the transformation T_{L_2} is independent of the choice
of extension of W to W̃.
    g_{ij} = \langle v_i, v_j \rangle

and

    v_i^* = \sum_{j=1}^{\mathrm{Codim}(L_1)} g^{ij} v_j, \qquad 1 \le i \le n.
It is straightforward to check that this set of vectors forms a basis for the normal
cone K̃_1^*, which we can now write as

    \tilde K_1^* = \left\{ \sum_{i=1}^{n} a_i v_i^* : a_i \le 0,\ 1 \le i \le n \right\} \oplus \left\{ v : P_{\mathrm{span}(\tilde K_1)}\, v = 0 \right\}
               = \left\{ \sum_{i=1}^{\mathrm{Codim}(L_1)} a_i v_i^* : a_i \le 0,\ 1 \le i \le n \right\}.

Similarly, define a set of vectors w_i^* ∈ L_2^⊥, starting with the w_i rather than with
the v_i, and use these to write an analogous representation for the normal cone K̃_2^*
of K̃_2.
Given this notation, we are now in a position to start the proof.
Given this notation, we are now in a position to start the proof.
Since intersections of simplicial cones are simplicial cones and simplicial cones
are convex, it follows from (9.2.3) that their normal Morse index is actually an
indicator function. Specifically, for ν ∈ (L_1 ∩ L_2)^⊥,

    \alpha(\nu; \tilde K_1 \cap \tilde K_2) = 1_{(\tilde K_1 \cap \tilde K_2)^*}(-\nu),   (9.2.14)

where

    (\tilde K_1 \cap \tilde K_2)^* = \tilde K_1^* \oplus \tilde K_2^*.

In particular, for ν ∈ (L_1 ∩ L_2)^⊥,

    -\nu \in \tilde K_1^* \oplus \tilde K_2^*

    b_i^* = \sum_{j=1}^{\mathrm{Codim}(L_1 \cap L_2)} g^{ij} b_j   (9.2.15)
    \langle w_i, w_j \rangle, \qquad 1 \le i, j \le \mathrm{Codim}(L_2),

in its lower right corner. As usual, the g^{ij} are the elements of the inverse of the matrix
of the g_{ij}.
Note, for later use, that the above construction implies that for 1 ≤ i ≤ Codim(L_1)
and Codim(L_1) + 1 ≤ j ≤ Codim(L_1 ∩ L_2),
With the above notation, we can now write the normal Morse index of K̃_1 ∩ K̃_2 as

    \alpha(\nu; \tilde K_1 \cap \tilde K_2) = \left( \prod_{j=1}^{n} 1_{[0,\infty)}(\langle \nu, b_j^* \rangle) \right) \cdot \left( \prod_{j=\mathrm{Codim}(L_1)+1}^{\mathrm{Codim}(L_1)+m} 1_{[0,\infty)}(\langle \nu, b_j^* \rangle) \right).   (9.2.17)
Our goal now is to relate each factor in the product to the factors on the right-hand
sides of (9.2.10) and (9.2.13). We start with (9.2.13).
Recall that the dual basis (9.2.15) is also uniquely determined by the orthonormality
relationship

    \langle b_i^*, b_j \rangle = \delta_{ij}

for all b_i ∈ B. Therefore, for any sequence of reals c_j,

    \left\langle \nu + \sum_{l \ne j} c_l b_l,\ b_j^* \right\rangle = \langle \nu, b_j^* \rangle.
In particular, taking ν ∈ (L_1 ∩ L_2)^⊥ and 1 ≤ l ≤ n, and noting that the w_j are then
orthogonal to b_l^*, we have

    \langle \nu, b_l^* \rangle = \left\langle \nu - \sum_{i,j=1}^{\mathrm{Codim}(L_2)} \langle P_{L_1}\nu,\, P_{L_1} w_i \rangle\, \tilde g^{ij}\, w_j,\ b_l^* \right\rangle   (9.2.18)
                    = \left\langle P^{\perp}_{L_1}\nu - \sum_{i,j=1}^{\mathrm{Codim}(L_2)} \langle P_{L_1}\nu,\, P_{L_1} w_i \rangle\, \tilde g^{ij}\, P^{\perp}_{L_1} w_j,\ b_l^* \right\rangle.

Here we have used the decomposition

    \nu = P^{\perp}_{L_1}\nu + P_{L_1}\nu = P^{\perp}_{L_1}\nu + \sum_{i,j=1}^{\mathrm{Codim}(L_2)} \langle P_{L_1}\nu,\, P_{L_1} w_i \rangle\, \tilde g^{ij}\, P_{L_1} w_j,
where the matrix g̃ was defined in (9.2.12), and the representation of P_{L_1}ν follows
from the fact that the codimension assumptions of the theorem ensure that the
{P_{L_1} w_j}_{j=1,...,Codim(L_2)} are linearly independent and span the orthogonal
complement of L_1^⊥ in (L_1 ∩ L_2)^⊥.
Returning to (9.2.18), noting the definition (9.2.11) of the mapping T_{L_2} and the
fact that T_{L_2}ν ∈ L_1^⊥, we now have

    \langle \nu, b_l^* \rangle = \langle T_{L_2}\nu, b_l^* \rangle = \langle T_{L_2}\nu, P^{\perp}_{L_1} b_l^* \rangle = \langle T_{L_2}\nu, v_l^* \rangle,   (9.2.19)

where the final equality follows from the observation that for 1 ≤ i, l ≤ n,

    \langle v_i, P^{\perp}_{L_1} b_l^* \rangle = \langle v_i, b_l^* \rangle = \delta_{il}

(cf. (9.2.16)). By the uniqueness of the dual basis this implies that P^{\perp}_{L_1} b_l^* = v_l^*.
Substituting (9.2.19) into (9.2.17), we find that the first term on the right-hand
side there, and so also in (9.2.13), is given by

    \prod_{l=1}^{n} 1_{[0,\infty)}(\langle T_{L_2}\nu, v_l^* \rangle) = 1_{\tilde K_1^*}(-T_{L_2}\nu) = \alpha(T_{L_2}\nu; \tilde K_1),
where K_1 and K_2 are cones in R^N for which the following conditions hold:

(i) There exist sequences K_{1,n} and K_{2,n} of cones over simplicial complexes with
K_{j,n} → K_j in the Hausdorff metric.
(ii) For each j and almost all ν we have α(ν; K_{j,n} ⊕ L_j) → α(ν; K_j ⊕ L_j) as
n → ∞.

Let Codim(L_1 ∩ L_2) = Codim(L_1) + Codim(L_2). Then, for almost every ν ∈
(L_1 ∩ L_2)^⊥,

    \alpha(\nu; \tilde K_1 \cap \tilde K_2) = \alpha(T_{L_2}\nu; \tilde K_1) \cdot \alpha(\nu; P_{L_1}\tilde K_2)
                              = \alpha(\nu; P_{L_2}\tilde K_1) \cdot \alpha(\nu; P_{L_1}\tilde K_2).
Proof. Note first that the result of Theorem 9.2.3 extends trivially to the case in which
the simplicial cones K1 and K2 are replaced by cones over simplicial complexes.
This follows from footnote 9, on applying Theorem 9.2.3 to the individual simplices
making up the complex and then exploiting the additivity (9.2.4).
Thus the result holds when each K̃_j is replaced by a K_{j,n} ⊕ L_j. Conditions (i)
and (ii) of the theorem now imply the result in general.
Before turning to our main result, we note the following corollary to the preceding
results.
directions.
Note also that P_{L_2}K_1 ⊇ K_1 ∩ L_2. Now apply the appropriate theorem or corollary.
Now that we have a set of basic results for cones, we can turn to the manifold
version of these results, which is what we shall actually need, but for which we offer
only an outline of a proof.
Suppose that for each j and k, M_{1j} and M_{2k} intersect transversally and

    \mathrm{Codim}(M_{1j} \cap M_{2k}) = \mathrm{Codim}(M_{1j}) + \mathrm{Codim}(M_{2k}).   (9.2.20)

Fix a t ∈ M_{1j} ∩ M_{2k}. Then, for every such t and almost every ν_t ∈ T_t(M_{1j} ∩ M_{2k})^⊥,
where T_{jk} : T_t(M_{1j} ∩ M_{2k})^⊥ → T_t M_{1j}^⊥ is defined by

    T_{jk}\nu = P^{\perp}_{T_t M_{1j}}\nu - \sum_{r,s=1}^{\mathrm{Codim}(M_{2k})} \langle P_{T_t M_{1j}}\nu,\, P_{T_t M_{1j}} w_r \rangle\, \tilde g^{rs}\, P^{\perp}_{T_t M_{1j}} w_s,

where {w_1, ..., w_{Codim(M_{2k})}} is a collection of vectors spanning T_t M_{2k}^⊥ and the g̃^{rs}
are the entries of the inverse of the matrix with elements

    \tilde g_{rs} = \langle P_{T_t M_{1j}} w_r,\, P_{T_t M_{1j}} w_s \rangle.
Proof. Starting the proof is easy; finishing it is, to say the least, “tedious.’’
To start it, note that the fact that St (M1 ∩ M2 ) = St M1 ∩ St M2 is an immediate
consequence of the definition of support cones, and so the first equality in (9.2.21) is
trivial.
To continue, note that α(·, M) was actually defined on the Euclidean image of M
under the exponential map. Thus we can assume that we are in Euclidean space.
The rest (i.e., both (9.2.21) and (9.2.22)) would now follow with very little work
from Theorem 9.2.3 and Corollary 9.2.5 if it were true that M1 and M2 were simplicial
complexes. Unfortunately, this, in general, is not the case.
The last definition we need before stating Morse's theorem is that of a Morse function.

Definition 9.3.1. A function f̃ ∈ C²(M̃), where M is a C² Whitney stratified manifold
embedded in a C³ ambient manifold M̃, is called a Morse function on M if it satisfies
the following two conditions on each stratum ∂_k M, k = 0, ..., dim(M):

(i) f|_{∂_k M} is nondegenerate on ∂_k M.
(ii) The restriction of f̃ to the closure ∂_k M‾ = ⋃_{j=0}^{k} ∂_j M has no critical points on ⋃_{j=0}^{k−1} ∂_j M.

Note that (ii) is equivalent to requiring the following:

(iii) At each critical point t of f|_{∂_k M}, ∇f|_{∂_k M, t} is a nondegenerate tangent vector.17
14 There are many references that could be given here, but perhaps the most appropriate
approximation is due to Cheeger et al. [39]. To see how to use it in the setting of tame
manifolds, see the papers by Zähle [182] and Bröcker and Kuppe [33].
15 In our case the approach of Zähle [182] is probably the most appropriate.
16 You might ask why, if we had not planned to prove Theorem 9.2.6, we bothered with the
proof of Theorem 9.2.3 and its corollaries.
The reason is that we could not find anything like Theorem 9.2.3 in the literature, and,
since the product result here for the Morse index is crucial for later parts of the book, we
felt duty bound to prove it. On the other hand, the move from simplicial complexes to tame
manifolds, while certainly not easy, is more standard fare, and so a description of the proof
as “tedious but straightforward’’ is not unjustified.
17 Cf. (9.2.6) and the discussion preceding it.
9.3 Morse’s Theorem for Stratified Spaces 207
With all the definitions cleared up, we have the necessary ingredients to state the following version of Morse's theorem, due to Goresky and MacPherson [72].

Theorem 9.3.2 (Morse's theorem for stratified spaces). Let (M, Z) be a compact C² Whitney stratified space embedded in a C³ Riemannian manifold (M̃, g̃), and let f̃ ∈ C²(M̃) be a Morse function on M. Then, setting f = f̃|_M,

$$\varphi(M) = \sum_{j=0}^{N}\ \sum_{\{t \in \partial_j M:\ \nabla \tilde f_t \in T_t^{\perp} \partial_j M\}} (-1)^{\iota_{f,\partial_j M}(t)}\, \alpha\bigl(P_{T_t^{\perp}\partial_j M}\, \nabla \tilde f_t\,;\, M\bigr), \qquad (9.2.21')\ (9.3.1)$$

where $P_{T_t^{\perp}\partial_j M}$ is the orthogonal projection onto (T_t ∂_j M)^⊥, ϕ(M) is the Euler characteristic of M, and the ι_{f,∂_j M}(t) are the tangential Morse indices of (9.1.2).
If the support cones St M are convex for each t ∈ M, i.e., M is locally convex,
then (cf. (9.2.3)) the above theorem reads as follows.
Corollary 9.3.3 (Morse's theorem for locally convex manifolds). Let (M, Z) be a compact C², locally convex, Whitney stratified space embedded in a C³ Riemannian manifold (M̃, g̃), and let f̃ ∈ C²(M̃) be a Morse function on M. Then, setting f = f̃|_M,

$$\varphi(M) = \sum_{j=0}^{N}\ \sum_{\{t \in \partial_j M:\ \nabla \tilde f_t \in T_t^{\perp} \partial_j M\}} (-1)^{\iota_{f,\partial_j M}(t)}\, \mathbb{1}_{\{-\nabla \tilde f_t \in N_t M\}}. \qquad (9.3.2)$$
The points counted in the above corollary, that is, the critical points t of f|_{∂_j M} for which −∇f̃_t ∈ N_t(M), are given a special name. For excursion sets, the version of Morse's theorem that we shall use is the following.

Corollary 9.3.5 (Morse's theorem for excursion sets). Let M, M̃, f̃, and f be as in Corollary 9.3.3, and let u be a regular value of f|_{∂_k M} for every k. Then, for A_u = {t ∈ M : f(t) ≥ u},

$$\varphi(A_u) = \sum_{j=0}^{N}\ \sum_{\{t \in \partial_j M:\ \nabla \tilde f_t \in T_t^{\perp} \partial_j M,\ \tilde f_t > u\}} (-1)^{\iota_{f,\partial_j M}(t)}\, \mathbb{1}_{\{-\nabla \tilde f_t \in N_t M\}}. \qquad (9.3.3)$$
Proof. As usual, write A_u = M ∩ f̃⁻¹[u, ∞). If −f̃ were a Morse function on A_u, and if we changed the condition f̃_t > u to f̃_t ≥ u on the right-hand side of (9.3.3), then the corollary would merely be a restatement of Morse's theorem, and there would be nothing to prove. However, the change is not obvious, and −f̃ is not a Morse function on A_u, since it is constant on the strata of A_u that are subsets of M ∩ f̃⁻¹{u}.

The bulk of the proof involves finding a Morse function f★ on A_u that agrees with −f̃ on "most'' of this set (thus solving the problem of the "non-Morseness'' of f̃) and that, at the same time, has critical points in one-to-one correspondence with the critical points of f̃ on M above the level u. (This allows us to replace f̃_t ≥ u with f̃_t > u.)
The space M ∩ f̃⁻¹[u, ∞) can be decomposed into j-dimensional strata of the form ∂_j M ∩ f̃⁻¹(u, ∞) and ∂_{j+1} M ∩ f̃⁻¹{u}. More formally, consider f̃ as a function on M̃, which, since it is a manifold, has no boundary. Then

$$\begin{aligned}
\partial_N \tilde f^{-1}[u,\infty) &= \tilde f^{-1}(u,\infty),\\
\partial_{N-1} \tilde f^{-1}[u,\infty) &= \tilde f^{-1}\{u\},\\
\partial_j \tilde f^{-1}[u,\infty) &= \emptyset \quad \text{for } j = 0, \dots, N-2.
\end{aligned}$$

Since f̃ is a Morse function and u is a regular value of f|_{∂_k M} for all k, it follows that M and f̃⁻¹[u, ∞) intersect transversally as subsets of M̃. Therefore A_u is a (locally convex) regular subspace of M̃ that can be decomposed as

$$\begin{aligned}
A_u &= \bigcup_{j=0}^{N} \bigl(\partial_j M \cap \tilde f^{-1}(u,\infty)\bigr) \cup \bigl(\partial_{j+1} M \cap \tilde f^{-1}\{u\}\bigr) \qquad(9.3.5)\\
&= \Biggl(\bigcup_{j=0}^{N} \partial_j M \cap \tilde f^{-1}(u,\infty)\Biggr) \cup \Biggl(\bigcup_{j=0}^{N-1} \partial_{j+1} M \cap \tilde f^{-1}\{u\}\Biggr).
\end{aligned}$$
It is the last term here that gives rise to problems, since it is here that f̃|_{A_u} loses its property of being a Morse function (on A_u). Thus we search for a replacement for f̃ that is well behaved on this boundary set.

Since f̃ is a Morse function on M, it has only finitely many critical points inside a relatively compact neighborhood V of M. Furthermore, exploiting the fact that u is a regular value of f|_{∂_k M} for every k, there exists an ε > 0 such that

U_ε = f̃⁻¹(u − ε, u + ε) ∩ M ∩ V

contains no critical points of f̃|_M. It is standard fare that there exists an h ∈ C²(M̃) that is a Morse function on f̃⁻¹{u} and that is zero outside of U_ε. Furthermore, since V is compact, there exist K_f and K_h such that |∇h| < K_h and |∇f̃|_{∂_j M, t}| > K_f for all t ∈ ∂_j M ∩ U_ε, 1 ≤ j ≤ N.
It then follows that the function

$$f^{\star} = -\tilde f + \frac{K_f}{3K_h}\, h$$

is a Morse function on A_u. By our choice of h, the critical points of f★|_{A_u} agree with those of f̃ on M ∩ U_ε^c. Furthermore, ∇f★|_{∂_j M} ≡ π_{T_t ∂_j M} ∇f★ can never be zero on U_ε ∩ ∂_j M, and so there are no critical points of f★ at all in this region. Consequently, the critical points of f★ on M ∩ f̃⁻¹[u, ∞) are in one-to-one correspondence with those of f̃ on M ∩ f̃⁻¹(u, ∞).

The result then follows from the fact that, by Theorem 9.2.6 (cf. (9.2.22)), the normal Morse indices at these points are the same for M as they are for M ∩ f̃⁻¹[u, ∞). □
With basic Morse theory for piecewise smooth spaces under our belt, it is now time to
look at one rather important example for which everything becomes quite simple. The
example is that of the N-dimensional cube I N = [0, 1]N , and the ambient space is
RN with the usual Euclidean metric. In particular, we want to recover Theorem 6.2.4,
which gave a point set representation for the Euler characteristic of the excursion set
of smooth functions over the square.
To recover Theorem 6.2.4 for the unit square we use Morse's theorem 9.3.2 in its original version. Reserving the notation f for the function of interest that generates the excursion sets, write the f of Morse's theorem as f_m. We are interested in computing the Euler characteristic of the set

A_u = {t ∈ I² : f(t) ≥ u},

and for the Morse function we take the height function

f_m(t) = f_m(t₁, t₂) = t₂.

Now assume that f is "suitably regular'' in the sense of Definition 6.2.1. This is almost
enough to guarantee that fm is a Morse function over I 2 for the ambient manifold
R2 . Unfortunately, however, all the points along the top and bottom boundaries of I 2
are degenerate critical points for fm . We get around this by replacing I 2 with a tilted
version, Iε2 , obtained by rotating the square through ε degrees, as in Figure 9.4.1.
To compute ϕ(Au ) we now apply (9.3.1), and so need to characterize the various
critical points and their indices. The first fact to note is that, using the usual coordinate
system, we have ∇fm = (0, 1), and so there are no critical points of fm in A◦u . Thus
we can restrict interest to the boundary ∂A_u, which we break into three parts:

(i) points t ∈ (I_ε²)° ∩ ∂A_u;
(ii) points t ∈ ∂I_ε² ∩ A_u, but not vertices of the square;
(iii) the four vertices of the square.
9.4 The Euclidean Case 211
An example of each of these three classes appears in Figure 9.4.1, where the excursion
set of f appears along with contour lines in the interiors of the various components.
At points of type (i), f (t) = u. Furthermore, since the normal cone Nt (Au ) is
then the one-dimensional vector space normal to ∂Au , −∇fm = (0, −1) ∈ Nt (Au ) at
points for which ∂f/∂t1 = 0 and ∂f/∂t2 > 0. Such a point is at the base of the arrow
coming out of the disk in Figure 9.4.1. Differentiating between points that contribute
+1 and −1 to the Euler characteristic involves looking at ∂ 2 f/∂t12 . Comparing with
Theorem 6.2.4, we see that we have characterized the contributions of (6.2.17) and
(6.2.18) to the Euler characteristic.
We now turn to points of type (ii). Since ∇f_m is constant and, after the tilting, no edge of I_ε² is horizontal, the restriction of f_m to ∂I_ε² has no critical points on (∂I_ε² ∩ A_u)°. We can therefore add the endpoints of the intervals making up ∂I_ε² ∩ A_u to those of type (iii). One of these
appears as the base of the leftmost arrow on the base of Figure 9.4.1. The rightmost
arrow extends from such a vertex.
For points of these kinds, the normal cone is a closed wedge in R2 , and it is left
to you to check that the contributions of these points correspond (on taking ε → 0) to those described by (6.2.16).
This gives us Theorem 6.2.4, which trivially extends to any rectangle in R2 . You
should now check that Theorem 6.2.5, which computed the Euler characteristic for a
subset of R2 with piecewise C 2 boundary, also follows from Morse’s theorem, using
the same fm .
The above argument was really unnecessarily complicated, since it did not use
Corollary 9.3.5, which we built specifically for the purpose of handling excursion
sets. Nevertheless, it did have the value of connecting the integral-geometric and
differential-geometric approaches.
Now we apply Corollary 9.3.5. We again assume that f is suitably regular at the
level u in the sense of Definition 6.2.1, which suffices to guarantee that it is a Morse
function over the C 2 piecewise smooth space I N for the ambient manifold RN and
that the conditions of the corollary apply.
Write J_k ≡ ∂_k I^N for the collection of faces of dimension k in I^N (cf. (6.2.2)). With this notation, we can rewrite the sum (9.3.3) as

$$\varphi\bigl(A_u(f, I^N)\bigr) = \sum_{k=0}^{N} \sum_{J \in \mathcal{J}_k} \sum_{i=0}^{k} (-1)^i\, \mu_i(J), \qquad (9.4.1)$$

where each face J ∈ J_k is of the form

J = {t ∈ I^N : t_j = ε_j if j ∉ σ(J), 0 < t_j < 1 if j ∈ σ(J)}

for some subset σ(J) ⊆ {1, . . . , N} of size k and some ε_j ∈ {0, 1}, j ∉ σ(J). Set ε*_j = 2ε_j − 1. Working with the definition of the C_i, it is then not hard to see that μ_i(J) is given by the number of points t ∈ J satisfying the following four conditions:
$$\begin{aligned}
& f(t) \ge u, &&(9.4.2)\\
& f_j(t) = 0, \quad j \in \sigma(J), &&(9.4.3)\\
& \varepsilon_j^{*} f_j(t) > 0, \quad j \notin \sigma(J), &&(9.4.4)\\
& \mathrm{index}\bigl(f_{mn}(t)\bigr)_{m,n \in \sigma(J)} = k - i, &&(9.4.5)
\end{aligned}$$
where, as usual, subscripts denote partial differentiation, and, consistent with the
definition of the index of a critical point, we define the index of a matrix to be the
number of its negative eigenvalues.
In Figure 9.4.2 there are three points that contribute to ϕ(Au (f, I 2 )). One, in
the center of the upper left disk, contributes via J = (I 2 )◦ = J2 . That on the right
side contributes via J = “right side’’ ∈ J1 , and that on the lower left corner via
J = {0} ∈ J0 .
The representation (9.4.1) of the Euler characteristic of an excursion set, along
with the prescription in (9.4.2)–(9.4.5) as to how to count the contributions of various
points to the sum, is clearly a tidier way of writing things than what we obtained via
integral-geometric methods. Nevertheless, it is now clear that the two are essentially just different versions of the same basic result. However, it is the compactness of (9.4.1)
that will be of importance to us in the upcoming computations for random excursion
sets in Part III of the book.
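Although nothing later depends on it, the point-set representation (9.4.1) is easy to sanity-check numerically. The sketch below is our own illustration, not from the text: it computes the Euler characteristic of a pixelated excursion set directly from its cubical-complex structure, as χ = V − E + F, and recovers the values that the Morse-theoretic count predicts — 1 for a disk-shaped excursion set and 0 for an annulus.

```python
import math

def excursion_euler_char(f, u, n=200):
    """Euler characteristic of the excursion set {t in [0,1]^2 : f(t) >= u},
    approximated by the cubical complex whose 2-cells are the grid squares
    whose centers satisfy f >= u; chi = V - E + F."""
    verts, edges, faces = set(), set(), 0
    h = 1.0 / n
    for i in range(n):
        for j in range(n):
            if f((i + 0.5) * h, (j + 0.5) * h) >= u:
                faces += 1
                for di in (0, 1):
                    for dj in (0, 1):
                        verts.add((i + di, j + dj))          # corner vertices
                edges.update({((i, j), (i + 1, j)),          # bottom edge
                              ((i, j + 1), (i + 1, j + 1)),  # top edge
                              ((i, j), (i, j + 1)),          # left edge
                              ((i + 1, j), (i + 1, j + 1))}) # right edge
    return len(verts) - len(edges) + faces

# A single bump: the excursion set is a disk of radius 0.2, so chi = 1.
bump = lambda t1, t2: -((t1 - 0.5) ** 2 + (t2 - 0.5) ** 2)
print(excursion_euler_char(bump, -0.04))  # -> 1

# A ridge along a circle: the excursion set is an annulus, so chi = 0.
ring = lambda t1, t2: -abs(math.hypot(t1 - 0.5, t2 - 0.5) - 0.3)
print(excursion_euler_char(ring, -0.05))  # -> 0
```

Shared vertices and edges are deduplicated by the sets, so only the topology of the union of filled squares matters, mirroring the fact that ϕ(A_u) depends on A_u alone and not on the function generating it.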
10
Volume of Tubes
In Chapters 7 and 8 we invested a good deal of time and energy in developing the many
results we need from differential geometry. The time has now come to begin to reap
the benefits of our investment, while at the same time developing some themes a little
further for later exploitation. This chapter focuses on the celebrated volume-of-tubes
formula of Weyl [73, 168], which expresses the Lebesgue volume of a tube of radius
ρ around a set M embedded in Rl or S(Rl ) in terms of the radius of the tube1 and
the Lipschitz–Killing curvatures of M (see Theorem 10.5.6). It is an interesting fact that, although this is a book about probability claiming applications to statistics,² and although Weyl's formula is today the basis of a large literature in geometry, the volume-of-tubes formulas were originally inspired by a statistical problem. This problem, along with its solution due to Hotelling [79], was related to regression analysis and involved the one-dimensional volume-of-tubes problem on a sphere, not unrelated to the computation we shall do in a moment.
Much of the work in using Weyl’s tube formula and its generalizations lies in
deriving explicit expressions for Lipschitz–Killing curvature measures. This being
the case, even if you are comfortable with the material of Chapter 7 and are not
interested in the volume-of-tubes approximation per se, you will find it useful to read
Sections 10.7 and 10.9 of this chapter. These describe the Lipschitz–Killing curvature
measures of stratified Whitney spaces and the generalized Lipschitz–Killing curvature
measures that will be needed in Chapter 15.
Beginning in the late 1980s and early 1990s, the usefulness of the tube formula
in statistics was rediscovered, and there were a number of works applying these
formulas to statistical problems, as in [87, 93]. A particularly interesting paper for
us was a 1993 paper by Jiayang Sun [147], which came out of a thesis under David
Siegmund. It had an early version (albeit not easily recognizable as such) of the
1 A word on notation: If (T , d) is a metric space, then the sphere of radius λ in T is denoted
by Sλ (T ), with S(T ) ≡ S1 (T ). When T = RN we continue, when convenient, to use the
notation SλN−1 and S N−1 for Sλ (RN ) and S(RN ), respectively.
2 Despite current appearances!
simple³ approximation

$$\mathbb{P}\Bigl\{\sup_{t\in M} f_t \ge u\Bigr\} \approx \sum_{j=0}^{N} \rho_j(u)\, \mathcal{L}_j(\psi(M)), \qquad (10.0.1)$$

where the functions ρ_j(u) are built from the Hermite polynomials H_j, which we shall meet in more detail in Chapter 11 (cf. (11.6.9)), and the L_j are Lipschitz–Killing curvatures. Finally, ψ is a mapping from the parameter space M to a unit sphere. Exactly which mapping, and a sphere of which dimension, will be clear after you have read Sections 10.2 and 10.6.
As we shall soon see, the above approximation, given enough side conditions, is
actually rather simple to derive using volume-of-tube techniques. Furthermore, when
both are defined, it agrees with the approximation that we shall derive in Chapter 14,
which is based on the expected Euler characteristic and which is much harder to es-
tablish. Unfortunately, however, the volume-of-tubes approach has the disadvantage
of being restricted to a somewhat small class of Gaussian processes, in that they
are required to possess a finite Karhunen–Loève expansion. Specifically, these are
processes defined on a manifold M that can be expressed as

$$f(t,\omega) = \langle \psi_t, \xi(\omega)\rangle_{\mathbb{R}^l} = \sum_{j=1}^{l} \xi_j(\omega)\, \psi_j(t), \qquad (10.0.2)$$

for some smooth mapping ψ : M → S(R^l),⁴ where the ξ_j are independent standard Gaussians.
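To make the class of finite Karhunen–Loève processes concrete, here is a minimal simulation sketch — ours, not the book's — of the simplest nontrivial example: l = 2 and ψ(t) = (cos t, sin t), the cosine process f(t) = ξ₁ cos t + ξ₂ sin t. Since |ψ(t)| ≡ 1, the process has constant unit variance, which the code checks empirically.

```python
import math, random

random.seed(0)

# psi embeds the parameter space in S(R^2): |psi(t)| = 1 for every t,
# so (10.0.2) yields a process with constant unit variance.
psi = lambda t: (math.cos(t), math.sin(t))

def sample_path(ts):
    # xi_1, xi_2 independent standard Gaussians, as in (10.0.2) with l = 2
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    return [x1 * math.cos(t) + x2 * math.sin(t) for t in ts]

ts = [0.0, 0.7, 2.1]
paths = [sample_path(ts) for _ in range(100000)]
# empirical variance at each parameter point; all should be close to 1
vars_ = [sum(p[k] ** 2 for p in paths) / len(paths) for k in range(len(ts))]
print([round(v, 1) for v in vars_])  # -> [1.0, 1.0, 1.0]
```

The same two Gaussian coefficients drive the whole path, which is exactly what makes the sup of such a process reducible to a question about ξ/|ξ| on the sphere.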
The volume-of-tubes-based derivation of the approximation (10.0.1) also has the disadvantage that all results on its accuracy are heavily Gaussian in nature, and there is no known way to even begin thinking about how to extend them to non-Gaussian processes. In any case, we shall not address the accuracy of the approximation in
this chapter, since it will follow as a corollary to the accuracy of the expected Euler
characteristic approach in Chapter 14 and the equivalence of the two approaches [148]
for finite Karhunen–Loève processes.
3 This is “simple’’ as long as you are also comfortable applying the same adjective to
Lipschitz–Killing curvatures, defined below in (10.5.4).
4 While if M = I N , this expansion may indeed arise from the L2 (I N ) expansion of a
covariance function as in Section 3.2, this is not necessary, the main issue being the existence
of a finite orthogonal expansion of some kind. Consequently, these processes are more
appropriately referred to as finite orthogonal expansion processes, although we shall retain
the nomenclature “finite Karhunen–Loève expansion processes’’ for historical reasons.
10.1 The Volume-of-Tubes Problem
For a metric space (T, τ), the tube of radius ρ around A ⊂ T is, as we have already seen in the Euclidean setting, defined as

$$\mathrm{Tube}(A,\rho) = \{x \in T : \tau(x, A) \le \rho\} = \bigcup_{y \in A} B_{\tau}(y, \rho), \qquad (10.1.1)$$

where

$$\tau(x, A) = \inf_{y \in A} \tau(x, y). \qquad (10.1.2)$$

The volume-of-tubes problem is that of computing

μ(Tube(A, ρ)),

for some measure μ, as a function of ρ. When this is not possible, it seeks approximations to μ(Tube(A, ρ)).
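For intuition, the sketch below — our illustration, with hypothetical helper names — approximates μ(Tube(A, ρ)) for Lebesgue measure in R², taking A to be a unit segment. The exact answer is the Steiner-type value 2ρ · length + πρ²: a rectangle plus two half-disks.

```python
import math

def tube_area(dist_to_A, rho, box, h=0.002):
    """Grid approximation to the Lebesgue area of Tube(A, rho) in R^2,
    counting midpoints of grid cells within distance rho of A."""
    (x0, x1), (y0, y1) = box
    nx, ny = int(round((x1 - x0) / h)), int(round((y1 - y0) / h))
    count = sum(1 for i in range(nx) for j in range(ny)
                if dist_to_A(x0 + (i + 0.5) * h, y0 + (j + 0.5) * h) <= rho)
    return count * h * h

# A = the unit segment from (0,0) to (1,0); tau(x, A) via clamped projection.
def dist_to_segment(x, y):
    s = min(max(x, 0.0), 1.0)     # nearest parameter value on the segment
    return math.hypot(x - s, y)

rho = 0.3
approx = tube_area(dist_to_segment, rho, ((-0.4, 1.4), (-0.4, 0.4)))
exact = 2 * rho * 1.0 + math.pi * rho ** 2   # rectangle + two half-disks
print(abs(approx - exact) < 0.02)  # -> True
```

The exact value here already has the polynomial-in-ρ structure that Weyl's formula will deliver in general.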
Of course, the possibility of such a computation, or the accuracy and validity of
such an approximation, will depend on the properties of the measure μ as well as
the set A. The simplest spaces to work on are the metric spaces Rl with its natural
metric structure and S(Rl ) with the geodesic metric from the induced Riemannian
structure on S(Rl ). The natural measures to consider in these examples are Lebesgue
measure on Rl and surface measure on S(Rl ), which agree with the Riemannian
measures induced by the canonical Riemannian structures. These are the two main
examples that we shall consider. In addition, in Section 10.9 we shall also investigate
the Gaussian volume of tubes, where the metric space is Rl with its standard metric
5 Since μ is a general measure, it would really be more appropriate to talk of the measure of
tubes rather than their “volumes.’’ However, we shall stay with the more standard “volume’’
terminology.
and the measure⁶ μ = γ_{R^l}, the distribution of a random vector W ∼ N(0, I_{l×l}). This
example will play a crucial role in the important computations of Chapter 15.
As for the type of sets A we need to consider: in order to derive explicit formulas for μ(Tube(A, ρ)), we shall have to restrict ourselves to sets A that are embedded piecewise C² submanifolds of R^l or S(R^l). For much of the discussion we shall
also limit ourselves to Lebesgue measure in the first case and surface measure in the
second. Both of these, of course, are the volume, or Hausdorff, measure induced by
the standard Euclidean metric in Rl (cf. footnote 23 of Chapter 7). In the final section
of the chapter we shall see what happens when μ is Gauss measure on Rl .
We now start a discussion of the connection between the excursion probability (10.0.1)
and the volume of tubes for finite Karhunen–Loève processes. We shall conclude it
only in Section 10.6, when we shall have more information on tube formulas.
Take a Gaussian process f, which is a restriction to a locally convex submanifold M of a process f̃ on an ambient manifold M̃ that has the representation (10.0.2), that is,

$$\tilde f(t,\omega) = \langle \psi_t, \xi(\omega)\rangle_{\mathbb{R}^l} = \sum_{j=1}^{l} \xi_j(\omega)\, \psi_j(t). \qquad (10.2.1)$$
Note that we have introduced a new parameter here. Whereas we still use N to denote dim(M) = dim(M̃), the new parameter l denotes the order of the orthogonal expansion.
Note also that the vector ξ/|ξ | is uniformly distributed on S(Rl ), independently
of |ξ |, which is distributed as the square root of a χl2 random variable. We shall write
ηl to denote the distribution of ξ/|ξ |, that is, the uniform measure over S(Rl ).
Finally, we shall assume that f̃ has constant unit variance, so that (10.2.1) immediately implies

$$|\psi(t)|^2 = \sum_{j=1}^{l} \psi_j^2(t) = 1 \qquad (10.2.2)$$

for all t ∈ M̃. This being the case, we can define the map ψ : t ↦ (ψ₁(t), . . . , ψ_l(t)), an embedding of M in S(R^l). More significantly, we can define a random field f on ψ(M) ⊂ S(R^l) by setting

f_x(ω) = ⟨x, ξ(ω)⟩_{R^l}
6 For a finite-dimensional Hilbert space H, we shall use γ_H to denote the canonical Gaussian random vector on H. That is, if X ∼ γ_H, then for any orthonormal basis ζ₁, . . . , ζ_{dim(H)}, the sequence ⟨ζ₁, X⟩_H, . . . , ⟨ζ_{dim(H)}, X⟩_H is a collection of independent standard Gaussian random variables.
for all x ∈ ψ(M). Note that f has the simple covariance function

E{f_x f_y} = ⟨x, y⟩_{R^l},

and thus there is no problem taking a version of it on all of S(R^l) with this covariance function. This process is known as the canonical (isotropic) Gaussian field on S(R^l). Apart from our need of it now, it will play a central role in the calculations of Chapter 15, and more details on it can be found in Section 15.6.

For our current purposes, we note that, since ξ/|ξ| is uniform on S(R^l), it is trivial that, for A ⊂ S(R^l),

P{ξ/|ξ| ∈ Tube(A, ρ)} = η_l(Tube(A, ρ)).
We now have enough to write out the basic equation in the volume-of-tubes
approach for such processes:
$$\begin{aligned}
\mathbb{P}\Bigl\{\sup_{t\in M} f_t \ge u\Bigr\}
&= \int_0^{\infty} \mathbb{P}\Bigl\{\sup_{t\in M} f_t \ge u \,\Bigm|\, |\xi| = r\Bigr\}\, \mathbb{P}_{|\xi|}(dr)\\
&= \int_0^{\infty} \mathbb{P}\Bigl\{\sup_{t\in M} \langle \psi_t, \xi\rangle \ge u \,\Bigm|\, |\xi| = r\Bigr\}\, \mathbb{P}_{|\xi|}(dr)\\
&= \int_u^{\infty} \mathbb{P}\Bigl\{\sup_{t\in M} \langle \psi_t, \xi\rangle \ge u \,\Bigm|\, |\xi| = r\Bigr\}\, \mathbb{P}_{|\xi|}(dr)\\
&= \int_u^{\infty} \mathbb{P}\Bigl\{\sup_{t\in M} \langle \psi_t, \xi/r\rangle \ge u/r \,\Bigm|\, |\xi| = r\Bigr\}\, \mathbb{P}_{|\xi|}(dr)\\
&= \int_u^{\infty} \mathbb{P}\Bigl\{\sup_{s\in \psi(M)} \langle s, \xi/r\rangle \ge u/r \,\Bigm|\, |\xi| = r\Bigr\}\, \mathbb{P}_{|\xi|}(dr)\\
&= \int_u^{\infty} \eta_l\bigl(\mathrm{Tube}(\psi(M), \cos^{-1}(u/r))\bigr)\, \mathbb{P}_{|\xi|}(dr)\\
&= \frac{\Gamma(l/2)}{2\pi^{l/2}} \int_u^{\infty} \mathcal{H}_{l-1}\bigl(\mathrm{Tube}(\psi(M), \cos^{-1}(u/r))\bigr)\, \mathbb{P}_{|\xi|}(dr)\\
&= \frac{\Gamma(l/2)}{2\pi^{l/2}}\, \mathbb{E}\Bigl\{\mathcal{H}_{l-1}\bigl(\mathrm{Tube}(\psi(M), \cos^{-1}(u/|\xi|))\bigr)\, \mathbb{1}_{\{|\xi| \ge u\}}\Bigr\}.
\end{aligned} \qquad (10.2.6)$$
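The chain of equalities (10.2.6) can be checked by simulation in the simplest case: l = 2, ψ(t) = (cos t, sin t), and M = [0, π/2], so that ψ(M) is a quarter-circle. A geodesic tube of radius θ around the quarter-circle has normalized η₂-measure (π/2 + 2θ)/(2π) — the tube does not wrap around the circle for the radii that occur here. The following sketch is our check, not the book's; the arc, sample sizes, and tolerance are all our assumptions.

```python
import math, random

random.seed(1)
u, n = 1.0, 200000

hits, radii = 0, []
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    radii.append(math.hypot(x1, x2))
    # sup of f_t = x1*cos(t) + x2*sin(t) over t in [0, pi/2], exactly:
    # |xi| if the angle of xi lies in the arc, else the larger endpoint value.
    sup = radii[-1] if (x1 >= 0 and x2 >= 0) else max(x1, x2)
    hits += sup >= u
lhs = hits / n  # Monte Carlo estimate of P{sup f >= u}

# Right-hand side of (10.2.6) for l = 2: Gamma(1)/(2*pi) times the arc length
# of the tube of radius arccos(u/r), i.e., (pi/2 + 2*theta)/(2*pi) on average.
rhs = sum((math.pi / 2 + 2 * math.acos(u / r)) / (2 * math.pi)
          for r in radii if r >= u) / n

print(abs(lhs - rhs) < 0.01)  # -> True
```

Both quantities estimate the same excursion probability, so they agree up to Monte Carlo error; the restriction of the integral to r ≥ u is visible in the code as the filter `r >= u`.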
Therefore, the excursion probability on the left of (10.0.1) is a weighted average of the volumes of tubes around ψ(M) of varying radii, and being able to compute

H_{l−1}(Tube(ψ(M), ρ))

will go a long way toward computing (10.0.1). In particular, if it were true that, in some approximate sense,

$$\mathcal{H}_{l-1}\bigl(\mathrm{Tube}(\psi(M), \rho)\bigr) \approx \sum_j \widetilde G_{j,l}(\rho)\, \mathcal{L}_j(\psi(M)) \qquad (10.2.7)$$

for appropriate functions G̃_{j,l}, then substituting (10.2.7) into (10.2.6) would lead to an approximation of the form (10.0.1).
10.3 Local Geometry of Tube(M, ρ)

This section, as its title suggests, describes the local geometry of Tube(M, ρ), where
throughout, we shall assume that M is a locally convex, C 2 , Whitney stratified man-
ifold. To formally define the local geometry of the tube, we first have to specify the
ambient space in which M is assumed to be embedded and the metric space in which
the tube lives. Although this sounds rather pedantic, it is important in appreciating
the intrinsic nature of the final form of the volume-of-tubes formula. As a locally
convex space, M by definition needs an ambient space M̃ in which to be embedded, but M̃ itself might be embedded in a larger Riemannian manifold (M̂, ĝ), so that we have the inclusions⁸

M ⊂ M̃ ⊂ M̂.

In this case,

$$\mathrm{Tube}(M, \rho) = \{x \in \widehat M : d_{\widehat M}(x, M) \le \rho\}, \qquad (10.3.1)$$

and the quantities Q appearing in the final form of the tube formula will be intrinsic, in the sense that Q(M) = Q(i(M)) for any isometric embedding i.
We now begin to describe the local structure of Tube(M, ρ), beginning with a lin-
earization of M near t ∈ M, which we use as motivation for an explicit description
of Tube(M, ρ). Our ultimate goal is to come up with an explicit parameterization of
the tube that we can use to compute its volume via the differential-geometric tools of
Chapter 7.
As with all linearizations, there will be a region over which it works well, and
so we shall need to define various notions of critical radius for Tube(M, ρ). As we
shall see, Weyl’s tube formula is valid only for values of ρ smaller than the critical
radius of M, since for larger ρ the explicit parameterization of the tube breaks down.
The linearization we shall use is essentially a linearization of the metric projection ξ_M : M̂ → M given by

$$\xi_M(s) = \mathrm{argmin}_{t \in M}\, d(s, t), \qquad (10.3.2)$$

along with the projection onto the support cone S_t M,

$$P_{S_t M}(X_t) = \mathrm{argmin}_{Y_t \in S_t M} |X_t - Y_t| = \mathrm{argmin}_{Y_t \in S_t M} |P_{T_t \widetilde M} X_t - Y_t|. \qquad (10.3.3)$$
A simple convexity argument then shows that X_t projects to the origin if and only if P_{T_t M̃} X_t ∈ N_t M. Alternatively, if P_{T_t M̃} X_t ∉ N_t M, then there exists some point Y_t ≠ 0 in S_t M such that |P_{T_t M̃} X_t − Y_t| < |X_t|.
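As a concrete example of the metric projection (10.3.2) — our illustration, not the text's — take M to be the unit circle in R². Then ξ_M(s) = s/|s| for every s ≠ 0, and the projection is unique everywhere except at the origin, the failure point one expects from a critical radius of 1. The brute-force sketch below recovers the analytic projection.

```python
import math

def project_to_circle(s, n=100000):
    """Brute-force metric projection xi_M(s): the nearest point to s on a
    fine discretization of the unit circle M."""
    best, best_d = None, float("inf")
    for k in range(n):
        t = 2 * math.pi * k / n
        p = (math.cos(t), math.sin(t))
        d = math.hypot(s[0] - p[0], s[1] - p[1])
        if d < best_d:
            best, best_d = p, d
    return best

s = (0.3, 0.4)                       # any point other than the origin
r = math.hypot(*s)
analytic = (s[0] / r, s[1] / r)      # xi_M(s) = s/|s|, unique for s != 0
numeric = project_to_circle(s)
err = math.hypot(numeric[0] - analytic[0], numeric[1] - analytic[1])
print(err < 1e-3)  # -> True
```

At the origin the argmin is the whole circle, which is exactly the kind of breakdown the critical radius is designed to exclude.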
This argument is based on a local linearization of M and M̃ at t ∈ M; hence it holds only locally. However, using the fact that near t, M can be approximated by S_t M, it suffices to establish that

$$\{s \in \widehat M : \xi_M(s) = t,\ d(s, t) \le \rho\} = \exp_{\widehat M}\bigl(\{X_t \in T_t \widehat M : P_{T_t \widetilde M} X_t \in N_t M,\ |X_t| \le \rho\}\bigr),$$

so that

$$\mathrm{Tube}(M, \rho) = \bigcup_{t \in M}\ \bigcup_{\{X_t \in N_t M:\ |X_t| \le \rho\}} \exp_{\widehat M}(t, X_t). \qquad (10.3.4)$$

The second equality is based on the local linearization described above; i.e., if X_t ∉ N_t M, then there exists a point s ∈ M such that d(exp_{M̂}(t, X_t), s) < |X_t|, so that exp_{M̂}(t, X_t) does not project to t.

For the final expression in (10.3.4) to be useful, we also need that, for each t ∈ M,

exp_{M̂} : B(T_t M, ρ) → M̂

is a diffeomorphism onto its range. This means that ρ should be less than

$$\rho_c(\widehat M) = \sup\bigl\{r : \exp_{\widehat M} : B(T_t M, r) \to \widehat M \text{ is a diffeomorphism for all } t \in M\bigr\}. \qquad (10.3.5)$$
For each t ∈ M and each direction X_t ∈ N_t M ∩ S(T_t M), there is a largest radius, which we denote by ρ_{c,l}(t, X_t), with the property that for r < ρ_{c,l}(t, X_t) the point exp_{M̂}(t, rX_t) projects uniquely to the point t. Taking the infimum over N_t M, we define ρ_c : M → R, the local critical radius of M in M̂ at t, as

$$\rho_c(t) = \inf_{X_t \in N_t M \cap S(T_t M)} \rho_{c,l}(t, X_t). \qquad (10.3.7)$$

Therefore, for ρ ≤ min(ρ_c(M, M̂), ρ_c(M̂)), the final expression in (10.3.4) is a disjoint union. Consequently, for such ρ, Tube(M, ρ) is the image of the region

$$\bigl\{(t, X_t, r) : X_t \in N_t M \cap S(T_t M),\ 0 \le r < \rho\bigr\}$$

under the map F : (t, X_t, r) ↦ exp_{M̂}(t, rX_t). For values of ρ larger than min(ρ_c(M, M̂), ρ_c(M̂)), the map F may not be one-to-one. However, it is always one-to-one when restricted to the region

$$\bigl\{(t, X_t, r) : X_t \in N_t M \cap S(T_t M),\ 0 \le r < \min\bigl(\rho, \rho_{c,l}(t, X_t)\bigr)\bigr\}. \qquad (10.3.9)$$
We now turn to describing the region (10.3.9) in more detail, essentially stratifying
Tube(M, ρ). After this, computing the volume of Tube(M, ρ) reduces to computing
an integral over each of its strata.
As usual, we assume that M is an N-dimensional stratified Whitney space (M, Z), decomposed as the disjoint union

$$M = \bigsqcup_{j=0}^{N} \partial_j M.$$

Finally, Tube(M, ρ) is the disjoint union of the images of the sets D_j(ρ) under the maps F_j = F ∘ G_j, so that we have

$$F_j : (t, s, r) \mapsto \exp_{\widehat M}\Bigl(t,\ r \sum_i s_i\, \eta_i(t)\Bigr). \qquad (10.3.11)$$

That is,

$$\mathrm{Tube}(M, \rho) = \bigsqcup_j F_j(D_j(\rho)). \qquad (10.3.12)$$
Note that while this gives the tube in M̂ around M, the same construction gives the tube in M̃ around M if we merely replace exp_{M̂} in (10.3.11) by exp_{M̃}, although the critical radius of M in M̃ is, in general, different from the critical radius of M in M̂.
where

$$S_j(r) = \{s \in \widehat M : d_{\widehat M}(s, \partial_j M) = r\}, \qquad (10.4.3)$$

and Ω_{j,r} is the volume form induced on S_j(r) by α_μ.
Alternatively, we can pull back (cf. (7.4.1)) Ω_{j,r} to the level sets making up D_j(ρ) to obtain
9 Note that whereas we stated the coarea theorem for differentiable functions, the distance
function to ∂j M can be nondifferentiable on a set of dimension up to l −1. This, however, is
a set of too small dimension to affect the integral in (10.4.2). See footnote 25 in Chapter 7.
$$\mu\bigl(F_j(D_j(\rho))\bigr) = \int_0^{\rho} \int_{\partial_j M \times S(\mathbb{R}^{l-j})} \mathbb{1}_{D_j(\rho)}(t, s, r)\, f_{\mu}(F_{j,r}(t, s))\, F^{*}_{j,r}(\Omega_{j,r})\, dr, \qquad (10.4.4)$$

where F_{j,r} is the partial map

F_{j,r}(t, s) = F_j(t, s, r).

We are left with computing the pullback F*_{j,r}(Ω_{j,r}), as well as the integration over D_j(ρ).
over Dj (ρ).
The pullback acts on bases via

$$F^{*}_{j,r}(\Omega_{j,r})\bigl(X_{1,(t,s)}, \dots, X_{l-1,(t,s)}\bigr) = \det\Bigl(\bigl\langle F_{j,r*}\, X_{k,(t,s)},\ W_{m,F_{j,r}(t,s)}\bigr\rangle\Bigr)_{1 \le k, m \le l-1}, \qquad (10.4.5)$$

where X_{j,(t,s)} ∈ T_{(t,s)}(∂_j M × S(R^{l−j})) and (W_{1,F_{j,r}(t,s)}, . . . , W_{l−1,F_{j,r}(t,s)}) is any (suitably oriented) orthonormal basis of the tangent space at F_{j,r}(t,s) to the level set (10.4.3), a hypersurface in M̂. The inner product here is, of course, that of M̂.

Therefore, in order to evaluate F*_{j,r}(Ω_{j,r}) it will, in general, be necessary to choose a suitably oriented orthonormal basis (W_{1,F_{j,r}(t,s)}, . . . , W_{l−1,F_{j,r}(t,s)}) of T_{F_{j,r}(t,s)} S_j(r). We now treat two specific cases, in both of which the exponential map has an explicit form that makes computations feasible.¹⁰
Furthermore, we shall ultimately want to replace the integral in (10.4.4) by an integral with respect to the natural volume form on ∂_j M × S(R^{l−j}), and so we note here, for later use, the fact that if α is an n-form on an n-dimensional Riemannian manifold with volume form Ω, and X₁, . . . , X_n is an orthonormal basis for T_t M, then

$$\alpha_t = \alpha_t(X_1, \dots, X_n)\, \Omega_t. \qquad (10.4.6)$$
10.4.3 Subsets of R^l
We now treat the case in which the ambient space is M̂ = R^l. Our aim is to compute the determinant in (10.4.5). In this case, since the exponential map is the identity, F_{j,r} is particularly simple and is given by

$$F_{j,r}(t, s) = t + r \sum_{i=1}^{l-j} s_i\, \eta_i(t). \qquad (10.4.7)$$
This allows us to establish the following lemma, which will be a major step toward
computing (10.4.5).
Lemma 10.4.1. For any coordinate systems u on ∂_j M and w on S(R^{l−j}), and any orthonormal frame (η_i)_{1≤i≤l−j} normal to ∂_j M, the following relations hold:

$$F_{j,r*}\Bigl(\frac{\partial}{\partial u_k}\Bigr) = \frac{\partial}{\partial u_k} + r \sum_{i=1}^{l-j} s_i\, \nabla_{\partial/\partial u_k}\, \eta_i(t(u)), \qquad (10.4.8)$$

$$F_{j,r*}\Bigl(\frac{\partial}{\partial w_k}\Bigr) = 0 + r \sum_{i=1}^{l-j} \frac{\partial s_i}{\partial w_k}\, \eta_i(t(u)). \qquad (10.4.9)$$

As a consequence, we have

$$\Bigl\langle F_{j,r*}\Bigl(\frac{\partial}{\partial u_k}\Bigr),\ \sum_{i=1}^{l-j} s_i(w)\, \eta_i(t(u)) \Bigr\rangle = 0, \qquad (10.4.10)$$

$$\Bigl\langle F_{j,r*}\Bigl(\frac{\partial}{\partial w_k}\Bigr),\ \sum_{i=1}^{l-j} s_i(w)\, \eta_i(t(u)) \Bigr\rangle = 0. \qquad (10.4.11)$$
Proof. Note first that throughout the proof, hopefully without too much ambiguity,
we shall identify tangent vectors Xy ∈ Ty Rl with vectors in Rl . Not doing so would
lead to even more unwieldy notation.
To verify (10.4.8) and (10.4.9), let (U, ϕ) be a chart on M, in which case we can rewrite (10.4.7) locally as a mapping from U × S(R^{l−j}) by defining

$$\widetilde F_{j,r}(u, w) = \varphi(u) + r \sum_i s_i(w)\, \eta_i(\varphi(u)). \qquad (10.4.12)$$
With the usual abuse of notation that identifies ∂/∂u_k and ϕ_∗(∂/∂u_k), we can compute F_{j,r∗}(∂/∂u_k) from F̃_{j,r∗}(∂/∂u_k) ≡ F̃_{j,r∗}(ϕ_∗(∂/∂u_k)). To see how to do this, take the standard basis (e₁, . . . , e_l) on R^l and start by noting that for 1 ≤ m ≤ l and 1 ≤ k ≤ l − j,
$$\begin{aligned}
\Bigl(\frac{\partial}{\partial u_k}\, \widetilde F_{j,r}(u, s)\Bigr)_m
&= \Bigl(\frac{\partial \varphi(u)}{\partial u_k}\Bigr)_m + r \sum_i^{l-j} s_i\, \frac{\partial \langle \eta_i(\varphi(u)), e_m\rangle}{\partial u_k}\\
&= \Bigl(\varphi_{*}\frac{\partial}{\partial u_k}\Bigr)_m + r \sum_i^{l-j} s_i\, \Bigl\langle \varphi_{*}\frac{\partial}{\partial u_k}\, \eta_i(\varphi(u)),\ e_m \Bigr\rangle\\
&= \Bigl(\frac{\partial}{\partial u_k}\Bigr)_m + r \sum_i^{l-j} s_i\, \Bigl\langle \frac{\partial}{\partial u_k}\, \eta_i(\varphi(u)),\ e_m \Bigr\rangle,
\end{aligned}$$

where in this case the notational abuse appears in the passage between the last two lines. Note, however, that from the "compatibility'' property of the Levi-Civita connection (cf. (7.3.11)), we have

$$\frac{\partial}{\partial u_k}\bigl\langle \eta_i(\varphi(u)),\, e_m\bigr\rangle
= \bigl\langle \nabla_{\partial/\partial u_k}\, \eta_i(\varphi(u)),\, e_m\bigr\rangle + \bigl\langle \nabla_{\partial/\partial u_k}\, e_m,\, \eta_i(\varphi(u))\bigr\rangle
= \bigl\langle \nabla_{\partial/\partial u_k}\, \eta_i(\varphi(u)),\, e_m\bigr\rangle,$$

since ∇_{∂/∂u_k} e_m ≡ 0.
Substituting this in the above leads to

$$\begin{aligned}
F_{j,r*}\Bigl(\frac{\partial}{\partial u_k}\Bigr) &= \widetilde F_{j,r*}\Bigl(\frac{\partial}{\partial u_k}\Bigr)\\
&= \sum_{m=1}^{l} \Bigl(\frac{\partial}{\partial u_k}\, \widetilde F_{j,r}(u, s)\Bigr)_m \frac{\partial}{\partial e_m}\\
&= \sum_{m=1}^{l} \Bigl(\partial/\partial u_k + r \sum_i s_i\, \frac{\partial}{\partial u_k}\, \eta_i(\varphi(u))\Bigr)_m \frac{\partial}{\partial e_m}\\
&= \sum_{m=1}^{l} \Bigl(\partial/\partial u_k + r \sum_i s_i\, \nabla_{\partial/\partial u_k}\, \eta_i(\varphi(u))\Bigr)_m \frac{\partial}{\partial e_m}\\
&= \partial/\partial u_k + r \sum_i s_i\, \nabla_{\partial/\partial u_k}\, \eta_i(\varphi(u)),
\end{aligned}$$

which is (10.4.8); (10.4.9) follows in the same way. Turning to (10.4.10) and (10.4.11), write

$$a_{ki}(s(w)) = \frac{\partial s_i(w)}{\partial w_k}, \qquad (10.4.13)$$

and note that, since Σ_{i=1}^{l−j} s_i²(w) ≡ 1 on S(R^{l−j}), differentiating with respect to w_k gives

$$\sum_{i=1}^{l-j} a_{ki}(s(w))\, s_i(w) = 0. \qquad (10.4.14)$$
Therefore, by (10.4.9),

$$\begin{aligned}
\Bigl\langle F_{j,r*}\Bigl(\frac{\partial}{\partial w_k}\Bigr),\ \sum_{i=1}^{l-j} s_i(w)\, \eta_i(t(u))\Bigr\rangle
&= \Bigl\langle r \sum_{n=1}^{l-j} a_{kn}(s(w))\, \eta_n(t(u)),\ \sum_{i=1}^{l-j} s_i(w)\, \eta_i(t(u))\Bigr\rangle\\
&= r \sum_{i=1}^{l-j} a_{ki}(s(w))\, s_i(w)\\
&= 0,
\end{aligned}$$

which is (10.4.11); the proof of (10.4.10) is similar. □
We now return to our main task, the computation of the pullback and, specifically, the determinant in (10.4.5). By (10.4.6), it will suffice to do this for an orthonormal basis of T_{(t,s)}(∂_j M × S(R^{l−j})). To this end, let (X_{1,t}, . . . , X_{j,t}) be an orthonormal basis of T_t ∂_j M, and for 1 ≤ m ≤ l − j − 1, let

$$\widetilde X_{m,s} = \sum_{i=1}^{l-j} a_{mi}(s)\, \frac{\partial}{\partial s_i},$$

the a_{mi} chosen so that the X̃_{m,s} form an orthonormal basis of T_s S(R^{l−j}). We also require an orthonormal basis for T_{F_{j,r}(t,s)} S_j(r). By Lemma 10.4.1, it is clear that such a basis is an orthonormal basis for the orthogonal complement, in T_{F_{j,r}(t,s)} R^l, of the vector

$$\eta(t, s) = \sum_{i=1}^{l-j} s_i\, \eta_i(t), \qquad (10.4.15)$$

with base at t ∈ ∂_j M and going through the point F_{j,r}(t, s).¹¹
This motivates the following choice, with the Xj,t and X 3m,s as above: For 1 ≤
m ≤ j , let
l−j
Wj +m,Fj,r (t,s) = ami (s)ηi (t).
i=1
It is not difficult to see that the set {W1,Fj,r (t,s) , . . . , Wl−1,Fj,r (t,s) } is an orthonor-
mal basis of TFj,r (t,s) Sj (r) since it is an orthonormal set and all elements are orthog-
onal to η(t, s).
Returning to (10.4.5), we can now compute

$$\det\Bigl(\bigl\langle F_{j,r*}\, X_{k,(t,s)},\ W_{m,F_{j,r}(t,s)}\bigr\rangle\Bigr)_{1 \le k, m \le l-1} \qquad (10.4.16)$$

for the W_{m,F_{j,r}(t,s)} just constructed, taking X_{1,t}, . . . , X_{j,t}, X̃_{1,s}, . . . , X̃_{l−j−1,s} as our choice for the vectors X_{k,(t,s)}. In view of (10.4.6), this will suffice.
The matrix in (10.4.16) can be broken into four blocks. For the first, we take 1 ≤ k, m ≤ j, so that, in the notation of Lemma 10.4.1 and for some collection of coefficients b_{ik}(t), it has elements

$$\begin{aligned}
\bigl\langle F_{j,r*}\, X_{k,t},\ W_{m,F_{j,r}(t,s)}\bigr\rangle
&= \Bigl\langle F_{j,r*}\Bigl(\sum_{i=1}^{j} b_{ik}(t)\, \frac{\partial}{\partial u_i}\Bigr),\ X_{m,t}\Bigr\rangle\\
&= \Bigl\langle \sum_{i=1}^{j} b_{ik}(t)\Bigl(\frac{\partial}{\partial u_i} + r \sum_{n=1}^{l-j} s_n\, \nabla_{\partial/\partial u_i}\, \eta_n\Bigr),\ X_{m,t}\Bigr\rangle\\
&= \Bigl\langle X_{k,t} + r \sum_{n=1}^{l-j} s_n\, \nabla_{X_{k,t}}\, \eta_n,\ X_{m,t}\Bigr\rangle,
\end{aligned}$$

the second equality here following from (10.4.8) of Lemma 10.4.1, and where η = η(s, t) is the normal vector of (10.4.15). However, noting the orthonormality of the X_{k,t} and applying the Weingarten equation (7.5.12), it is immediate that the above is equal to

$$\delta_{km} + r\, S_{-\eta}(X_{k,t}, X_{m,t}).$$
11 Recall our convention of identifying tangent vectors X_y ∈ T_y R^l with vectors in R^l.
10.4 Computing the Volume of a Tube 229
where we have used (7.2.11) in going from the third to the fourth line.
Putting the above together, we have proven the following theorem, which is one of the two main results of this section.

Theorem 10.4.2. With notation as above, the determinant (10.4.16) is given by

$$\sum_{i=0}^{j} \frac{1}{i!}\, r^{l-j-1+i}\, \mathrm{Tr}^{T_t \partial_j M}\bigl(S^{i}_{-\eta}\bigr),$$

and consequently

$$\mu\bigl(\mathrm{Tube}(M, \rho)\bigr) = \sum_{j=0}^{l} \sum_{i=0}^{j} \int_{\partial_j M} \int_{S(\mathbb{R}^{l-j})} \int_0^{\rho} \mathbb{1}_{D_j(\rho)}(t, s, r)\, f_{\mu}(F_{j,r}(t, s))\, \frac{1}{i!}\, r^{l-j-1+i}\, \mathrm{Tr}^{T_t \partial_j M}\bigl(S^{i}_{-\eta}\bigr)\, dr\, \mathcal{H}_{l-j-1}(ds)\, \mathcal{H}_j(dt), \qquad (10.4.19)$$

where H_{l−j−1} is standard surface measure on S(R^{l−j}) and H_j is the volume measure induced on ∂_j M by the Riemannian metric g̃.
10.4.4 Subsets of S_λ(R^l)

In this section, we treat the case M̂ = S_λ(R^l), the sphere of radius λ in R^l. This case is very similar to the previous one, so we need only go over the major differences. The geodesic metric on S_λ(R^l) is given by

$$\tau_{\lambda}(s, t) = \lambda \cos^{-1}\bigl(\langle s, t\rangle_{\mathbb{R}^l}/\lambda^2\bigr).$$

The exponential mapping exp_{S_λ(R^l)} is also explicitly computable in this case. For a unit vector X_t ∈ T_t S_λ(R^l) and r > 0,

$$\exp_{S_{\lambda}(\mathbb{R}^l)}(t, rX_t) = \cos(r/\lambda)\, t + \lambda \sin(r/\lambda)\, X_t,$$

so that

$$F_{j,r}(t, s) = \cos(r/\lambda)\, t + \lambda \sin(r/\lambda) \sum_{i=1}^{l-1-j} s_i\, \eta_i(t)$$

for some orthonormal frame (η₁(t), . . . , η_{l−1−j}(t)) spanning T_t ∂_j M^⊥, the orthogonal complement of T_t ∂_j M in T_t M̂ = T_t S_λ(R^l).
The orthogonality relations (10.4.10) and (10.4.11) still hold in this case, with the minor change that the w_i of the previous subsection now give a coordinate system on S(R^{l−1−j}) instead of S(R^{l−j}). To be more specific, to construct an orthonormal basis of T_{F_{j,r}(t,s)} S_j(r), we set

$$W_{j+m,\, F_{j,r}(t,s)} = \sum_{i=1}^{l-j-1} a_{mi}(s)\, \eta_i(t),$$
where

$$\widetilde X_m(s) = \sum_{i=1}^{l-j-1} a_{mi}(s)\, \frac{\partial}{\partial s_i}, \qquad 1 \le m \le l-2-j,$$
where the first unit matrix is of size j and the second of size l − j − 1. The vector
η = η(s, t) is as in the previous subsection and S is now the scalar second fundamental
form of ∂j M in Sλ (Rl ) (cf. (7.5.11)).
We thus have the following result.

Theorem 10.4.3. With notation as above, the determinant (10.4.16) is given by

$$\sum_{i=0}^{j} \frac{1}{i!}\, \lambda^{l-2-j+i}\, \cos(r/\lambda)^{j-i}\, \sin(r/\lambda)^{l-j-2+i}\, \mathrm{Tr}^{T_t \partial_j M}\bigl(S^{i}_{-\eta}\bigr),$$

and consequently

$$\mu\bigl(\mathrm{Tube}(M, \rho)\bigr) = \sum_{j=0}^{l-1} \sum_{i=0}^{j} \int_{\partial_j M} \int_{S(\mathbb{R}^{l-1-j})} \int_0^{\rho} \mathbb{1}_{D_j(\rho)}(t, s, r)\, f_{\mu}(F_{j,r}(t, s))\, \frac{1}{i!}\, \lambda^{l-2-j+i} \cos(r/\lambda)^{j-i} \sin(r/\lambda)^{l-j-2+i}\, \mathrm{Tr}^{T_t \partial_j M}\bigl(S^{i}_{-\eta}\bigr)\, dr\, \mathcal{H}_{l-j-2}(ds)\, \mathcal{H}_j(dt).$$
10.5 Weyl's Tube Formula

The formulas of Theorems 10.4.2 and 10.4.3, depending as they do on the embedding of M in its ambient space, are not intrinsic. On the other hand, they are a little more general than the classic Weyl tube formulas, for which μ is either H_l or H_{l−1}.
In order to obtain an intrinsic formulation of these results, and so rederive the
classic Weyl’s tube formulas, we need to specialize down to Hl or Hl−1 and prove
that only intrinsic quantities arise when the integrals in Theorems 10.4.2 and 10.4.3
are evaluated. This is the deep part of Weyl’s tube formula,12 and it is one of the first
appearances of Lipschitz–Killing curvatures.
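In the simplest Euclidean case the intrinsic formula can be seen directly. For a convex set in R² (with the normalization in which L₁ is half the boundary length), Weyl's formula reduces to the Steiner formula area(Tube(M, ρ)) = L₂(M) + 2ρ L₁(M) + πρ² L₀(M); for the unit square this is 1 + 4ρ + πρ². The sketch below is our own numerical check of this, not a computation from the text.

```python
import math

def tube_area_square(rho, h=0.002):
    """Grid estimate of the area of Tube([0,1]^2, rho) in R^2."""
    lo, hi = -rho - 0.05, 1 + rho + 0.05
    n = int(round((hi - lo) / h))
    def dist(x, y):
        dx = max(-x, 0.0, x - 1.0)   # distance to [0,1] in each coordinate
        dy = max(-y, 0.0, y - 1.0)
        return math.hypot(dx, dy)
    count = sum(1 for i in range(n) for j in range(n)
                if dist(lo + (i + 0.5) * h, lo + (j + 0.5) * h) <= rho)
    return count * h * h

rho = 0.25
# For the unit square: L_2 = 1 (area), L_1 = 2 (half perimeter), L_0 = 1.
exact = 1 + 4 * rho + math.pi * rho ** 2
approx = tube_area_square(rho)
print(abs(approx - exact) < 0.03)  # -> True
```

The three terms correspond visibly to the square itself, the four edge rectangles, and the four corner quarter-disks, with the corner contributions assembling into a full disk of radius ρ.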
The derivation that follows is a little unusual, since it involves expectations of
random Gaussian forms, and so is not what you will find in a standard textbook.
Nevertheless, it does fit in nicely with the theme of this book, which attempts to
integrate both geometry and probability.
The drawback of this approach is that the following argument depends on
Lemma 12.3.1, which is still two chapters into the future. With only minor effort,
we could rearrange things so that the lemma appeared here rather than there, but it is
in Chapter 12, where it logically belongs. To make things a little easier for you, we
shall “precall’’ one definition now.
Let V be a vector space and \mu \in \Lambda^{1,1}(V) a double form on V. Furthermore, let \mathrm{Cov}\colon (V \otimes V) \times (V \otimes V) \to \mathbb R be bilinear, symmetric, and nonnegative definite. We think of \mu as a mean function and \mathrm{Cov} as a covariance function, and call W a random, Gaussian 2-form on V \otimes V with mean function \mu and covariance function \mathrm{Cov} if, for all finite collections of pairs (v_{i_1}, v_{i_2}) \in V \otimes V, the W(v_{i_1}, v_{i_2}) have a joint Gaussian distribution with means
\[
\mathbb E\{W(v_{i_1}, v_{i_2})\} = \mu(v_{i_1}, v_{i_2})
\]
and covariances
\[
\mathbb E\big\{\big[W(v_{i_1}, v_{i_2}) - \mu(v_{i_1}, v_{i_2})\big]\cdot\big[W(v_{j_1}, v_{j_2}) - \mu(v_{j_1}, v_{j_2})\big]\big\} = \mathrm{Cov}\big((v_{i_1}, v_{i_2}),\,(v_{j_1}, v_{j_2})\big).
\]
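To make the "precalled" definition concrete, here is a small illustrative sketch of our own (not from the book): we realize a Gaussian random 2-form on V = R^2, taking the simplest admissible covariance — independent standard Gaussian fluctuations of the basis values — and check the defining mean property by simulation. The mean form `mu` below is an arbitrary choice for illustration.

```python
import random

random.seed(0)

# Illustrative sketch (not from the book): a Gaussian random 2-form W on
# V = R^2 with mean form mu and the simplest admissible covariance,
# Cov((e_a, e_b), (e_c, e_d)) = 1 if (a, b) = (c, d) and 0 otherwise,
# i.e., independent standard Gaussian fluctuations of the basis values.

mu = [[1.0, 2.0], [0.0, -1.0]]  # mean function mu(e_a, e_b); arbitrary example

def sample_W():
    # a realization of W, stored via its values on basis pairs
    return [[mu[a][b] + random.gauss(0.0, 1.0) for b in range(2)] for a in range(2)]

def evaluate(W, u, v):
    # bilinear extension: W(u, v) = sum_{a,b} u_a W(e_a, e_b) v_b
    return sum(u[a] * W[a][b] * v[b] for a in range(2) for b in range(2))

# Monte Carlo check of the defining property E{W(e_a, e_b)} = mu(e_a, e_b).
n = 50000
acc = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(n):
    W = sample_W()
    for a in range(2):
        for b in range(2):
            acc[a][b] += W[a][b] / n

print([[round(x, 2) for x in row] for row in acc])  # entrywise close to mu
```

A general covariance would be handled the same way, sampling the four basis values jointly from the corresponding 4 × 4 Gaussian distribution.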
for some constant C and some intrinsic quantity Q(\partial_jM), which will turn out to be related to the intrinsic volumes of M.13
12 In fact, of the computations of Sections 10.3 and 10.4, albeit in the case of manifolds in Rl
and S(Rl ) without boundary, and for Hl and Hl−1 , Weyl wrote that “so far we have hardly
done more than what could have been accomplished by any student in a course of calculus.’’
13 Read \mathrm{Vol}_{\tilde g|_{S(T_t\partial_jM^\perp)}}(ds) and \mathrm{Vol}_{\tilde g|_{\partial_jM}}(dt) for ds and dt here.
Recall that in the Euclidean case, we have S(Tt ∂j M ⊥ ) = S(Rl−j ), while for the
spherical case, S(Tt ∂j M ⊥ ) = S(Rl−j −1 ).
As a first step, define the constants
\[
C(m,i) = \begin{cases} (2\pi)^{i/2}\big/s_{m+i}, & m+i > 0,\\[2pt] 1, & m+i = 0,\end{cases} \tag{10.5.1}
\]
where, as usual, s_m = 2\pi^{m/2}/\Gamma(m/2) is the surface area of the unit sphere in \mathbb R^m.
Our first result relating the quantities in Theorem 10.4.2 to intrinsic quantities is
the following.
with \{\eta_i(t)\}_{1\le i\le l-j} an orthonormal basis of T_t\partial_jM^\perp, the orthogonal complement of T_t\partial_jM in \mathbb R^l.
This is well defined, since \{0\} has only one element, and is effectively equivalent to defining S(\{0\}) = \emptyset. We also adopt the convention, for the scalar second fundamental form, that
\[
S_0^{\,j} = \begin{cases} 1, & j = 0,\\ 0, & \text{otherwise.}\end{cases}
\]
These issues will, thankfully, arise only when we define the Lipschitz–Killing curva-
tures. The conventions are pedantic, but are important to ensure that the Lipschitz–
Killing curvature measures for locally convex spaces, defined in (10.5.4), agree with
the Lipschitz–Killing curvature measures for smooth manifolds defined in (7.6.1).
The proofs of Lemmas 10.5.1 and 10.5.3 are virtually identical and are based on
the following lemma, for which we recall that γRm denotes Gauss measure on Rm
and ηm denotes uniform measure over S(Rm ).
Proof.
\begin{align*}
\mathbb E\{P_i(W)\,\mathbb 1_A(W)\} &= \mathbb E\big\{|W|^i\,P_i(W/|W|)\,\mathbb 1_A(W/|W|)\big\}\\
&= \int_0^\infty\int_{S(\mathbb R^m)} r^i\,P_i(\nu)\,\mathbb 1_A(\nu)\,\eta_m(d\nu)\,P_{|W|}(dr)\\
&= \frac{\Gamma(m/2)}{2\pi^{m/2}}\int_0^\infty r^i\,P_{|W|}(dr)\int_{S(\mathbb R^m)} P_i(\nu)\,\mathbb 1_A(\nu)\,\mathcal H^{m-1}(d\nu)\\
&= \frac12\,(2\pi)^{-m/2}\,2^{(i+m)/2}\,\Gamma\Big(\frac{m+i}{2}\Big)\int_{S(\mathbb R^m)} P_i(\nu)\,\mathbb 1_A(\nu)\,\mathcal H^{m-1}(d\nu),
\end{align*}
which, on comparing with the definition of the C(m, i) at (10.5.1), proves the lemma.
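As a quick numerical sanity check (our own, not from the book), assuming the reading C(m, i) = (2π)^{i/2}/s_{m+i} of (10.5.1), the constant produced in the last line of the proof must coincide with C(m, i):

```python
import math

def s(m):
    # s_m = 2 pi^(m/2) / Gamma(m/2): surface area of the unit sphere in R^m
    return 2.0 * math.pi ** (m / 2.0) / math.gamma(m / 2.0)

def C(m, i):
    # the constants of (10.5.1), read as (2 pi)^(i/2) / s_{m+i} for m + i > 0
    if m + i == 0:
        return 1.0
    return (2.0 * math.pi) ** (i / 2.0) / s(m + i)

# The closing step of the proof produces
# (2 pi)^(-m/2) * 2^((i+m)/2 - 1) * Gamma((m+i)/2); check that it equals C(m, i).
for m in range(1, 8):
    for i in range(0, 8):
        proof_const = ((2.0 * math.pi) ** (-m / 2.0)
                       * 2.0 ** ((i + m) / 2.0 - 1.0)
                       * math.gamma((m + i) / 2.0))
        assert abs(C(m, i) - proof_const) < 1e-12 * proof_const
print("C(m, i) matches the constant in the proof of Lemma 10.5.5")
```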
Proof of Lemma 10.5.1. Our proof is somewhat indirect and, as mentioned above,
proceeds via Gaussian random forms, which are not generally found in geometric
arguments. Rather than taking one expression in (10.5.2) and showing that it is equal
to the other, we shall show that both are given by a certain expectation.
One of the nice aspects of this approach is that it immediately shows that both
sides of (10.5.2) are truly intrinsic,14 since the expectation will, a priori, involve only
intrinsic quantities.
We start with the case j < N , leaving the easier case of j = N for later.
Therefore, let X_{l-j} \sim \gamma_{T_t\partial_jM^\perp} be a canonically distributed Gaussian random vector on T_t\partial_jM^\perp, the orthogonal complement of T_t\partial_jM in T_tM. For a fixed orthonormal basis of T_t\partial_jM^\perp, the random variable
\[
\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-X_{l-j}}^{\,i}\big)
\]
is a homogeneous polynomial of degree i in the components of X_{l-j} in this basis. Hence, by Lemma 10.5.5,
\begin{align*}
\mathbb E\Big\{\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-X_{l-j}}^{\,i}\big)\,\mathbb 1_{N_tM}(X_{l-j})\Big\}
&= C(l-j,i)\int_{S(T_t\partial_jM^\perp)}\mathbb 1_{N_tM}(\nu_{l-j})\,\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-\nu_{l-j}}^{\,i}\big)\,\mathcal H^{l-j-1}(d\nu_{l-j})\\
&= C(l-j,i)\int_{S(\mathbb R^{l-j})}\mathbb 1_{N_tM}(\eta)\,\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-\eta}^{\,i}\big)\,\mathcal H^{l-j-1}(ds),
\end{align*}
With this notation, and noting that the scalar second fundamental form S_\eta is linear in \eta, we can write
\[
\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-X_{l-j}}^{\,i}\big) = \sum_{m=0}^{i}\frac1{m!\,(i-m)!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-X_{l-N}}^{\,m}\,\widetilde S_{-X_{N-j}}^{\,i-m}\big),
\]
where the double forms \widetilde S_{-X_{N-j}} and S_{-X_{l-N}} are independent, since X_{N-j} and X_{l-N} are.
Furthermore,
\[
\mathbb 1_{N_tM}(X_{l-j}) = \mathbb 1_{N_tM}(X_{N-j}).
\]
Consequently, we have
\begin{align*}
\mathbb E\Big\{\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-X_{l-j}}^{\,i}\big)\,\mathbb 1_{N_tM}(X_{l-j})\Big\}
&= \sum_{m=0}^{i}\frac1{m!\,(i-m)!}\,\mathrm{Tr}^{T_t\partial_jM}\,\mathbb E\big\{S_{-X_{l-N}}^{\,m}\,\widetilde S_{-X_{N-j}}^{\,i-m}\,\mathbb 1_{N_tM}(X_{N-j})\big\}\\
&= \sum_{m=0}^{i}\frac1{m!\,(i-m)!}\,\mathrm{Tr}^{T_t\partial_jM}\Big(\mathbb E\big\{S_{-X_{l-N}}^{\,m}\big\}\,\mathbb E\big\{\widetilde S_{-X_{N-j}}^{\,i-m}\,\mathbb 1_{N_tM}(X_{N-j})\big\}\Big),
\end{align*}
where the second-to-last line follows from the Gauss equation (7.5.9) and the last from the fact that \widetilde M = \mathbb R^l, and \mathbb R^l is flat.
Consequently, applying Lemmas 10.5.5 and 12.3.1, we have
\begin{align*}
\mathbb E\Big\{\frac1{i!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-X_{l-j}}^{\,i}\big)\,\mathbb 1_{N_tM}(X_{l-j})\Big\}
&= \sum_{m=0}^{\lfloor i/2\rfloor}\frac{(-1)^m}{m!\,(i-2m)!}\,\mathrm{Tr}^{T_t\partial_jM}\Big(\widetilde R^{\,m}\,\mathbb E\big\{\widetilde S_{-X_{N-j}}^{\,i-2m}\,\mathbb 1_{N_tM}(X_{N-j})\big\}\Big)\\
&= \sum_{m=0}^{\lfloor i/2\rfloor}\frac{(-1)^m}{m!\,(i-2m)!}\,\mathbb E\big\{\mathrm{Tr}^{T_t\partial_jM}\big(\widetilde R^{\,m}\,\widetilde S_{-X_{N-j}}^{\,i-2m}\big)\,\mathbb 1_{N_tM}(X_{N-j})\big\}\\
&= \sum_{m=0}^{\lfloor i/2\rfloor}\frac{(-1)^m}{m!\,(i-2m)!}\,C(N-j,\,i-2m)\int_{S(T_t\partial_jM^\perp)}\mathrm{Tr}^{T_t\partial_jM}\big(\widetilde R^{\,m}\,\widetilde S_{-\nu_{N-j}}^{\,i-2m}\big)\,\mathbb 1_{N_tM}(\nu_{N-j})\,\mathcal H^{N-j-1}(d\nu_{N-j}),
\end{align*}
which is the right-hand side of (10.5.2), and so we are done when j < N.
When j = N we need the conventions that we adopted in Remark 10.5.2 to sort things out. If j = N < l then N_tM = \{0\} \subset T_tM, and by our conventions,
\[
\sum_{m=0}^{\lfloor i/2\rfloor}\frac{(-1)^m}{m!\,(i-2m)!}\,C(N-j,\,i-2m)\int_{S(T_t\partial_jM^\perp)}\mathrm{Tr}^{T_t\partial_jM}\big(\widetilde R^{\,m}\,\widetilde S_{-\nu_{N-j}}^{\,i-2m}\big)\,\mathbb 1_{N_tM}(\nu_{N-j})\,\mathcal H^{N-j-1}(d\nu_{N-j}) = \begin{cases}\mathrm{Tr}^{T_tM}\big(\widetilde R^{\,k}\big), & i = 2k \text{ is even},\\ 0, & i \text{ is odd}.\end{cases}
\]
The right-hand side of (10.5.2) is also 0 in this case, by our conventions and the fact that \widetilde R = 0, and so we are done.
The proof of Lemma 10.5.3 is virtually identical and so we omit it. As in the
previous proof, there are some special cases (j = N = l − 1 and j < N = l) to
be checked, but the conventions established in Remark 10.5.4 again suffice to handle
these.
We are now almost ready to complete the derivation of Weyl's tube formula on \mathbb R^l and S_\lambda(\mathbb R^l). For 0 \le i \le N, define the (signed) Lipschitz–Killing curvature measures [64] of a locally convex space M by
\[
L_i(M,A) = \sum_{j=i}^{N}(2\pi)^{-(j-i)/2}\sum_{m=0}^{\lfloor(j-i)/2\rfloor}\frac{(-1)^m\,C(N-j,\,j-i-2m)}{m!\,(j-i-2m)!}\int_{\partial_jM\cap A}\int_{S(T_t\partial_jM^\perp)}\mathrm{Tr}^{T_t\partial_jM}\big(\widetilde R^{\,m}\,\widetilde S_{\nu_{N-j}}^{\,j-i-2m}\big)\,\mathbb 1_{N_tM}(-\nu_{N-j})\,\mathcal H^{N-j-1}(d\nu_{N-j})\,\mathcal H^{j}(dt) \tag{10.5.4}
\]
\[
L_i(M,A) = \sum_{j=i}^{N}(2\pi)^{-(j-i)/2}\,C(l-j,\,j-i)\int_{\partial_jM\cap A}\int_{S(\mathbb R^{l-j})}\frac1{(j-i)!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{\eta}^{\,j-i}\big)\,\mathbb 1_{N_tM}(-\eta)\,\mathcal H^{l-j-1}(d\eta)\,\mathcal H^{j}(dt), \tag{10.5.5}
\]
As before, the (signed) total masses of the curvature measures give the intrinsic
volumes
Li (M) = Li (M, M). (10.5.6)
\[
\mathcal H^l\big(\mathrm{Tube}(M,\rho)\big) = \sum_{i=0}^{N}\rho^{\,l-i}\,\omega_{l-i}\,L_i(M). \tag{10.5.7}
\]
The upper limit of the sum can be taken as infinity, since Li (M) ≡ 0 for i > N .
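Before moving on, it may help to see (10.5.7) in action numerically. The following Monte Carlo sketch (our own, not from the book) checks the formula for the unit square M = [0,1]^2 in R^2, whose intrinsic volumes are L_0 = 1, L_1 = 2 (half the perimeter), and L_2 = 1:

```python
import math, random

random.seed(1)

# Weyl's formula (10.5.7) for M = [0,T]^2 in R^2 with T = 1:
# H^2(Tube(M, rho)) = rho^2 * omega_2 * L0 + rho * omega_1 * L1 + L2.
T, rho = 1.0, 0.3
weyl = (rho ** 2 * math.pi * 1.0   # rho^2 * omega_2 * L0, omega_2 = pi
        + rho * 2.0 * 2.0          # rho * omega_1 * L1,   omega_1 = 2, L1 = 2
        + 1.0)                     # L2 = area

def dist_to_square(x, y):
    # Euclidean distance from (x, y) to [0,T]^2
    dx = max(0.0 - x, 0.0, x - T)
    dy = max(0.0 - y, 0.0, y - T)
    return math.hypot(dx, dy)

# Monte Carlo estimate of the tube area inside the bounding box.
n = 200000
box = (T + 2 * rho) ** 2
hits = sum(
    dist_to_square(random.uniform(-rho, T + rho), random.uniform(-rho, T + rho)) <= rho
    for _ in range(n)
)
mc = box * hits / n
print(mc, weyl)  # both close to 1 + 4*0.3 + pi*0.09
```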
\[
\mathcal H^l\big(\mathrm{Tube}(M,\rho)\big) = \sum_{j=0}^{N}\sum_{i=0}^{j}\int_{\partial_jM}\int_{S(\mathbb R^{l-j})}\int_0^{\rho}\mathbb 1_{D_j(\rho)}(t,s,r)\,\frac{r^{\,l-i-1}}{(j-i)!}\,\mathrm{Tr}^{T_t\partial_jM}\big(S_{-\eta}^{\,j-i}\big)\,dr\,ds\,dt.
\]
Applying Lemma 10.5.1 to each term in the above summation, followed by integration
with respect to r on [0, ρ], yields the desired conclusion.
where
\[
G_{a,b}(\rho) = \frac{b\,\pi^{b/2}}{\Gamma\big(\frac b2+1\big)}\int_0^{\rho}\cos^a(r)\,\sin^{b-1}(r)\,dr = \frac{b\,\pi^{b/2}}{2\,\Gamma\big(\frac b2+1\big)}\,\overline{IB}_{(a+1)/2,\,b/2}(\cos^2\rho),
\]
with
\[
\overline{IB}_{(a+1)/2,\,b/2}(x) = \int_x^1 y^{(a-1)/2}(1-y)^{(b-2)/2}\,dy
\]
Lemma 10.5.8. The following equivalences hold between the standard and extended
Lipschitz–Killing curvatures:
\[
L^{\kappa}_i(\cdot) = \sum_{n=0}^{\infty}\frac{(-\kappa)^n\,(i+2n)!}{(4\pi)^n\,n!\,i!}\,L_{i+2n}(\cdot) \tag{10.5.11}
\]
and
\[
L_i(\cdot) = \sum_{n=0}^{\infty}\frac{\kappa^n\,(i+2n)!}{(4\pi)^n\,n!\,i!}\,L^{\kappa}_{i+2n}(\cdot). \tag{10.5.12}
\]
Before starting the proof, we note that, at least for some examples, it is easier to compute the L^\kappa_j than the L_j. For example, suppose we use (10.5.10) to compute L^1_j(S(\mathbb R^l)). Then, since the scalar second fundamental form is zero for S(\mathbb R^l) as it sits in itself, the only nonzero curvature is the highest-order one, and this is given by
\[
L^1_{l-1}\big(S(\mathbb R^l)\big) = s_l.
\]
From this and (10.5.12) it is now just a little algebra to check that
\[
L_j\big(S(\mathbb R^l)\big) = 2\binom{l-1}{j}\frac{s_l}{s_{l-j}}
\]
for l-1-j even, and 0 otherwise, a result that we already computed from first principles back in Chapter 6 (cf. (6.3.8)).
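The "little algebra" can also be delegated to a few lines of code. The following sketch (ours, for illustration) implements the inversion (10.5.12) with κ = 1, where the only nonzero curvature on the right-hand side is L^1_{l-1}(S(R^l)) = s_l, and compares the result with the closed form above:

```python
import math

def s(m):
    # s_m = 2 pi^(m/2) / Gamma(m/2): surface area of the unit sphere in R^m
    return 2.0 * math.pi ** (m / 2.0) / math.gamma(m / 2.0)

def L_sphere(l, j):
    # L_j(S(R^l)) via (10.5.12) with kappa = 1: only the term with
    # i + 2n = l - 1 survives, since L^1_{l-1}(S(R^l)) = s_l is the only
    # nonzero extended curvature.
    if (l - 1 - j) % 2 != 0:
        return 0.0
    n = (l - 1 - j) // 2
    return (math.factorial(j + 2 * n)
            / ((4 * math.pi) ** n * math.factorial(n) * math.factorial(j))) * s(l)

# Compare with 2 * binom(l-1, j) * s_l / s_{l-j} (cf. (6.3.8)).
for l in range(2, 8):
    for j in range(l):
        closed = (2.0 * math.comb(l - 1, j) * s(l) / s(l - j)
                  if (l - 1 - j) % 2 == 0 else 0.0)
        assert abs(L_sphere(l, j) - closed) < 1e-9 * (1.0 + abs(closed))
print("L_j(S(R^l)) from (10.5.12) matches (6.3.8)")
```

For l = 3, for example, this gives L_0(S^2) = 2 (the Euler characteristic), L_1 = 0, and L_2 = 4π (the surface area).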
Proof. We shall prove only (10.5.11), since (10.5.12) follows from an almost identical
argument. In both cases the argument is primarily algebraic and rather elementary,
but since keeping track of combinatorial coefficients is not always trivial, and this
lemma will be rather important later, we give the details.
By the definition (10.5.8), L^\kappa_i(M,A) is given by
\begin{align*}
&\sum_{j=i}^{N}(2\pi)^{-(j-i)/2}\sum_{m=0}^{\lfloor(j-i)/2\rfloor}\frac{(-1)^m\,C(N-j,\,j-i-2m)}{m!\,(j-i-2m)!}\\
&\qquad\times\int_{\partial_jM\cap A}\int_{S(T_t\partial_jM^\perp)}\mathrm{Tr}^{T_t\partial_jM}\Big(\Big(\widetilde R+\frac{\kappa}{2}\,I^2\Big)^{m}\widetilde S_{\nu_{N-j}}^{\,j-i-2m}\Big)\,\mathbb 1_{N_tM}(-\nu_{N-j})\,\mathcal H^{N-j-1}(d\nu_{N-j})\,\mathcal H^{j}(dt)\\
&= \sum_{j=i}^{N}(2\pi)^{-(j-i)/2}\sum_{m=0}^{\lfloor(j-i)/2\rfloor}\sum_{n=0}^{m}\binom{m}{n}\Big(-\frac{\kappa}{2}\Big)^{n}\frac{C(N-j,\,j-i-2m)}{m!\,(j-i-2m)!}\\
&\qquad\times\int_{\partial_jM\cap A}\int_{S(T_t\partial_jM^\perp)}\mathrm{Tr}^{T_t\partial_jM}\big(I^{2n}(-\widetilde R)^{m-n}\,\widetilde S_{\nu_{N-j}}^{\,j-i-2m}\big)\,\mathbb 1_{N_tM}(-\nu_{N-j})\,\mathcal H^{N-j-1}(d\nu_{N-j})\,\mathcal H^{j}(dt)\\
&= \sum_{n=0}^{\lfloor(N-i)/2\rfloor}\frac{(4\pi)^{-n}(-\kappa)^n(i+2n)!}{n!\,i!}\sum_{j=i+2n}^{N}(2\pi)^{-(j-i-2n)/2}\sum_{m=0}^{\lfloor(j-i-2n)/2\rfloor}\frac{C(N-j,\,j-i-2m-2n)}{m!\,(j-i-2m-2n)!}\\
&\qquad\times\int_{\partial_jM\cap A}\int_{S(T_t\partial_jM^\perp)}\mathrm{Tr}^{T_t\partial_jM}\big((-\widetilde R)^{m}\,\widetilde S_{\nu_{N-j}}^{\,j-i-2m-2n}\big)\,\mathbb 1_{N_tM}(-\nu_{N-j})\,\mathcal H^{N-j-1}(d\nu_{N-j})\,\mathcal H^{j}(dt),
\end{align*}
which, by (10.5.4), is
\[
\sum_{n=0}^{\infty}\frac{(-\kappa)^n\,(i+2n)!}{(4\pi)^n\,n!\,i!}\,L_{i+2n}(M,A),
\]
as required.
where |\xi|^2 \sim \chi^2_l. On the other hand, Theorem 10.5.7 gives an explicit expression for
\[
\mathcal H^{l-1}\big(\mathrm{Tube}(\psi(M),\,\cos^{-1}(u/|\xi|))\big)
\]
for \cos^{-1}(u/|\xi|) < \rho_c(M, S(\mathbb R^l)). However, as |\xi| \to \infty, the radius of this tube grows and may surpass \rho_c(M, S(\mathbb R^l)), in which case Weyl's formula is not exact.
Formally ignoring this fact, we arrive at the following result.
where
\[
\rho_j(u) = \begin{cases} 1 - \Phi(u), & j = 0,\\[2pt] (2\pi)^{-(j+1)/2}\,H_{j-1}(u)\,e^{-u^2/2}, & j \ge 1,\end{cases}
\]
and H_j is the jth Hermite polynomial (11.6.9).
Remark 10.6.2. It is important to understand the (im)precise meaning of (10.6.1).
The “≈’’ does not indicate approximate equality, asymptotic equality, or any other
relationship that at this stage equates the left- and right-hand sides. Rather, it indi-
cates that the right-hand side comes from taking an expression that is equivalent to the
left-hand side, and substituting into it expressions that, while sometimes equivalent,
are sometimes definitely wrong. Only in Chapter 14 shall we show, via completely
different methods, that the purely formal equivalence in (10.6.1) can be replaced by
asymptotic, in u, equivalence, with specific bounds on the error of doing so.
Proof. Since the approximation formally ignores the condition \rho < \rho_c(M) in Theorem 10.5.7, we have
\[
\mathbb P\Big\{\sup_{t\in T} f_t \ge u\Big\} \approx \frac{\Gamma(l/2)}{2\pi^{l/2}}\sum_{j=0}^{N} L_j(\psi(M))\;\mathbb E\Bigg\{\sum_{n=0}^{\lfloor j/2\rfloor}\frac{(-4\pi)^{-n}\,j!}{n!\,(j-2n)!}\,G_{j-2n,\,l-1+2n-j}\big(\cos^{-1}(u/|\xi|)\big)\,\mathbb 1_{\{|\xi|\ge u\}}\Bigg\},
\]
We artificially decompose \xi as
\[
|\xi|^2 = X_1 + X_2,
\]
Finally,
\begin{align*}
\frac{\Gamma(l/2)}{2\pi^{l/2}}\sum_{n=0}^{\lfloor j/2\rfloor}\frac{(-4\pi)^{-n}\,j!}{n!\,(j-2n)!}\,\mathbb E\big\{G_{j-2n,\,l-1+2n-j}\big(\cos^{-1}(u/|\xi|)\big)\,\mathbb 1_{\{|\xi|\ge u\}}\big\}
&= (2\pi)^{-(j+1)/2}\sum_{n=0}^{\lfloor j/2\rfloor}\frac{j!\,(-1)^n}{n!\,(j-2n)!\,2^n}\int_u^{\infty} x^{j-2n}\,e^{-x^2/2}\,dx\\
&= (2\pi)^{-(j+1)/2}\int_u^{\infty} H_j(x)\,e^{-x^2/2}\,dx\\
&= \rho_j(u),
\end{align*}
as required, with the last line following from the basic properties of Hermite polynomials (cf. (11.6.12)).
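The Hermite property invoked in the last step, ∫_u^∞ H_n(x) e^{-x²/2} dx = H_{n-1}(u) e^{-u²/2}, is easy to confirm numerically. Here is a small sketch of our own, using the three-term recurrence for the (probabilists') Hermite polynomials and a midpoint-rule quadrature:

```python
import math

def hermite(n, x):
    # probabilists' Hermite polynomials: H_0 = 1, H_1 = x,
    # H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def tail_integral(n, u, upper=12.0, steps=60000):
    # midpoint rule for int_u^upper H_n(x) exp(-x^2/2) dx;
    # the integrand is negligible beyond x = 12
    h = (upper - u) / steps
    total = 0.0
    for k in range(steps):
        x = u + (k + 0.5) * h
        total += hermite(n, x) * math.exp(-x * x / 2.0)
    return total * h

u = 1.3
for n in range(1, 6):
    lhs = tail_integral(n, u)
    rhs = hermite(n - 1, u) * math.exp(-u * u / 2.0)
    assert abs(lhs - rhs) < 1e-5
print("int_u^inf H_n e^{-x^2/2} dx = H_{n-1}(u) e^{-u^2/2}, as used above")
```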
So far, we have defined intrinsic volumes for the convex sets of Section 6.3,15 via
integral-geometric techniques, and for the Riemannian manifolds of Section 7.6 and
this chapter via Lipschitz–Killing curvatures, for spaces that possess a local convexity
property. However, for our computations in Part III we shall also need intrinsic
volumes of Whitney stratified spaces for which no convexity of any kind need hold.
15 We also gave a possible, but not very promising, extension to basic complexes via kinematic
integrals at (6.3.11).
Thus, in the current section, after describing the basic conditions under which
they can be defined, we shall define intrinsic volumes in a more general setting. The
defining expression will be a natural generalization of (10.5.4), which covered the
locally convex scenario.
However, while most of what we have done so far was devoted to and motivated
by the relationship between Weyl’s formula and intrinsic volumes, this will no longer
be the case. In fact, as we shall soon see in Section 10.8, Weyl’s formula need not
hold in the non–locally convex setting, and so we need alternative motivation for the
new curvatures. You can find this, among other details, in the differential geometry
literature (e.g., [24, 33]) or you can wait for Chapter 12, where Lipschitz–Killing
curvatures will arise when we compute the expected Euler characteristic of excursion
sets. The extended definition follows, after we make one remark about conventions
concerning the normal Morse index α of Section 9.2.
α(0) = 1.
where \mathcal H^j is the volume form on \partial_jM and \mathcal H^{l-j-1} the volume form17 on S(T_t\partial_jM^\perp), both determined by \tilde g.
We shall assume that all the signed measures defined here are finite.18
As we did before (cf. Lemma 10.5.8), we can again extend these measures to
a one-parameter family of Lipschitz–Killing curvature measures, which we do by
setting
16 The assumption on the set of degenerate tangent vectors is prompted by the same issue that
arose in Section 9.2, that α may not be well defined in vectors that annihilate a generalized
tangent space.
17 Of course, \mathcal H^{l-j-1} really depends on t, but we have more than enough subscripts already.
18 A priori, there is no reason for the L_j(M, \cdot) to be finite. However, for all the examples we consider in this section and indeed, all situations of interest in this book, that will in fact be the case. We therefore invoke this as a standing assumption for the remainder of the book.
\[
L^{\kappa}_i(M,A) = \sum_{n=0}^{\infty}\frac{(-\kappa)^n\,(i+2n)!}{(4\pi)^n\,n!\,i!}\,L_{i+2n}(M,A). \tag{10.7.2}
\]
Furthermore, we can define Lipschitz–Killing curvatures, or intrinsic volumes, of M by
\[
L_j(M) = L_j(M,M), \qquad L^{\kappa}_j(M) = L^{\kappa}_j(M,M). \tag{10.7.3}
\]
Although it is not at all obvious from the definition (10.7.1), it can be shown that the measures L^\kappa_j(M, \cdot) are independent of the stratification of M. Furthermore, L^\kappa_j is
a finitely additive valuation in its first variable and a (signed) measure in the second
one (cf. [24]).
The Lipschitz–Killing curvature L0 (M) is actually the Euler–Poincaré charac-
teristic of M, as defined via (8.1.1) and a triangulation. The fact that the curvature
integral and the Euler characteristic are equivalent is a celebrated result, known as
the Chern–Gauss–Bonnet theorem.
To get a feel for the fact that (10.7.1) is not quite as forbidding as it seems, you
might want to look at the simple example19 of the unit cube in RN now to see how it
works for familiar sets.
19 Consider, for our first example, the cube [0,T]^N equipped with the standard Euclidean metric. Thus, we want to recover (6.3.5), that is,
\[
L_j\big([0,T]^N\big) = \binom{N}{j}\,T^j. \tag{10.7.4}
\]
Since we are in the Euclidean scenario, both the Riemannian curvature and second fundamental form are identically zero, and so the only terms for which the final integral in (10.7.1) is nonzero are those for which l = k - j = 0. Thus all of the sums collapse and (10.7.1) simplifies to
\[
\frac{\Gamma\big(\frac{N-j}2\big)}{2\,\pi^{(N-j)/2}}\int_{\partial_jM}\int_{S(T_t^{\perp}\partial_jM)}\alpha(\nu)\,\mathcal H^{N-j-1}(d\nu)\,\mathcal H^{j}(dt).
\]
(i) Referring to the convex polytope discussion in Section 9.2, \alpha(\nu) = \mathbb 1_{N_t\partial_jM}(-\nu) is zero except on a (1/2^{N-j})th part of the sphere S(T^\perp\partial_jM), and the measure \mathcal H^{N-j-1} is surface measure on S(T^\perp\partial_jM). Consequently,
\[
\int_{S(T_t^{\perp}\partial_jM)}\alpha(\nu)\,\mathcal H^{N-j-1}(d\nu) = \frac{2\pi^{(N-j)/2}}{\Gamma\big(\frac{N-j}2\big)}\,2^{-(N-j)}.
\]
(ii) There are 2^{N-j}\binom{N}{j} disjoint components to \partial_jM, each one a j-dimensional cube of edge length T.
(iii) The volume form on \partial_jM is Lebesgue measure, so that each of the cubes in (ii) has volume T^j.
Now combine all of the above, and (10.7.4) follows.
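The cube curvatures (10.7.4) can also be checked against elementary geometry: the tube around a square decomposes into the square, four edge rectangles, and four quarter-discs, and similarly for the solid cube in R^3. The following sketch (ours, for illustration) compares the Weyl/Steiner expansion with these direct decompositions:

```python
import math

def tube_volume_formula(N, T, rho):
    # Weyl/Steiner expansion with L_j([0,T]^N) = binom(N, j) T^j from (10.7.4):
    #   H^N(Tube([0,T]^N, rho)) = sum_j rho^{N-j} omega_{N-j} binom(N, j) T^j
    def omega(m):
        # volume of the unit ball in R^m
        return math.pi ** (m / 2.0) / math.gamma(m / 2.0 + 1.0)
    return sum(rho ** (N - j) * omega(N - j) * math.comb(N, j) * T ** j
               for j in range(N + 1))

T, rho = 2.0, 0.25
square = T ** 2 + 4 * T * rho + math.pi * rho ** 2          # faces, edges, corners
cube = (T ** 3 + 6 * T ** 2 * rho                           # 6 face slabs
        + 12 * T * (math.pi * rho ** 2 / 4)                 # 12 quarter-cylinder edges
        + (4.0 / 3.0) * math.pi * rho ** 3)                 # 8 octant-ball corners
assert abs(tube_volume_formula(2, T, rho) - square) < 1e-12
assert abs(tube_volume_formula(3, T, rho) - cube) < 1e-12
print("tube volumes of the square and cube match (10.7.4)")
```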
where \mathrm{detr}_j is given by (7.2.8) and the curvature matrix \mathrm{Curv} is given by
\[
\mathrm{Curv}(i,j) = S_{E_N}(E_i, E_j). \tag{10.7.7}
\]
It is important to note that while the elements of the curvature matrix may depend on the choice of basis, \mathrm{detr}_{N-1-j}(\mathrm{Curv}) is independent of the choice, as will be L_j(M, U).
Since all of the simplifications (and examples in the footnotes) that we have just
looked at are locally convex, in the sense that they have convex support cones, we now
describe how to compute the Lipschitz–Killing curvatures for one highly nonconvex
example, a many-pointed star.
By a star, we mean the union of a finite collection of line segments L_1, \ldots, L_r emanating from a point v_0, with endpoints (v_j)_{1\le j\le r}. The ambient space is \mathbb R^2. Then
20 For the ball B^N(T) we want to recover (6.3.7). It is easy to check that the second fundamental form of \partial B^N(T) = S^{N-1}(T), with respect to the inner normal, is a constant T^{-1} over the sphere. Thus the term involving its trace can be taken out of the integral in (10.7.5), leaving only the volume of S^{N-1}(T), given by s_N T^{N-1}. To compute the trace we use (7.2.11), to see that \mathrm{Tr}\,S^{N-1-j} = T^{-(N-j-1)}(N-1)!/j!, so that
\[
L_j\big(B^N(T)\big) = \frac{s_N\,T^{N-1}\,T^{-(N-j-1)}\,(N-1)!}{s_{N-j}\,(N-1-j)!\,j!} = \binom{N-1}{j}\frac{s_N}{s_{N-j}}\,T^j = \binom{N}{j}\frac{\omega_N}{\omega_{N-j}}\,T^j,
\]
which is (6.3.7).
\[
L_j(M, \cdot) = 0, \qquad j \ge 2,
\]
and
\[
L_1(M, A) = \sum_{j=1}^{r}\mathcal H^1(L_j \cap A),
\qquad
L_0\big(M, \{v_j\}\big) = \frac12, \quad 1 \le j \le r.
\]
As for L_0(M, \{v_0\}), note that for \nu a unit vector emanating from v_0,
\[
\alpha(\nu) = 1 - \sum_{j=1}^{r}\mathbb 1_{\{\langle \nu,\, L_j\rangle < 0\}},
\]
Let M_\varphi be a solid angle in the plane of angle \varphi, and let v_0 = (0,0), v_1 = (1,0), v_2 = (\cos\varphi, \sin\varphi) be the three vertices of M_\varphi. Then,
\[
L_2(M_\varphi, U) = \mathcal H^2(U \cap M_\varphi),
\qquad
L_1(M_\varphi, U) = \frac12\,\mathcal H^1(U \cap \partial M_\varphi),
\]
\[
L_0(M_\varphi, U) = \frac{1}{2\pi}\,\mathcal H^1\big(U \cap M_\varphi \cap S(\mathbb R^2)\big) - \frac{\varphi-\pi}{2\pi}\,\mathbb 1_U(v_0) + \frac14\,\mathbb 1_U(v_1) + \frac14\,\mathbb 1_U(v_2).
\]
If M is a Whitney stratified space in \mathbb R^l, then
\[
L_i(M,A) = \sum_{j=i}^{l}(2\pi)^{-(j-i)/2}\,\frac{1}{(j-i)!}\int_{\partial_jM\cap A}\mathbb E\big\{\mathrm{Tr}^{T_t\partial_jM}\big((S_{X_{l-j,t}})^{j-i}\big)\,\alpha(X_{l-j,t})\big\}\,\mathcal H^{j}(dt), \tag{10.7.8}
\]
where for each t \in \partial_jM, X_{l-j,t} \sim \gamma_{T_t\partial_jM^\perp} and \alpha is the normal Morse index at t \in M as it sits in \mathbb R^l.
If M is a Whitney stratified space in S_{\sqrt\kappa}(\mathbb R^l), then
\[
L^{\kappa}_i(M,A) = \sum_{j=i}^{l-1}(2\pi)^{-(j-i)/2}\,\frac{1}{(j-i)!}\int_{\partial_jM\cap A}\mathbb E\big\{\mathrm{Tr}^{T_t\partial_jM}\big((S_{X_{l-1-j,t}})^{j-i}\big)\,\alpha(X_{l-1-j,t})\big\}\,\mathcal H^{j}(dt), \tag{10.7.9}
\]
\[
\mathcal H^l\big(\mathrm{Tube}(M,\rho)\big) = \sum_{j=0}^{l}\rho^{\,l-j}\,\omega_{l-j}\,L_j(M)
\]
As we saw in the earlier sections, the measures Lj (M; ) are signed measures on M
that determine the coefficients in a (Lebesgue) volume-of-tubes expansion around
M when M is embedded in Rl . These measures are adequate to derive Weyl’s tube
formula, but they will need to be generalized to continue our general study of the
volume-of-tubes problem, that is, computing μ(Tube(M, ρ)) for measures other than
Lebesgue.
Our interest in general measures μ is not purely academic, for the main result in
Chapter 15, which treats non-Gaussian random fields, relies heavily on the coefficients
in the tube expansion
Since this is the main expansion that we shall need, for this section we shall assume that the dimensions of both the underlying manifold and its ambient Euclidean space are l. That is, in our earlier notation,
\[
\widetilde M = \mathbb R^l, \qquad \dim(M) = N = l = \dim(\widetilde M).
\]
The generalization of the measures Lj (M, ·) is needed to deal with the term
fμ (Fj,r (t, s)) in Theorem 10.4.2, which was, conveniently, identically 1 in Weyl’s
tube formula. The general strategy is to assume that fμ is smooth enough so that
there is a Taylor series expansion for fμ (Fj,r (t, s)) of the form
\[
f_\mu\big(F_{j,r}(t,s)\big) = f_\mu\big(t + r\,\eta(t,s)\big) = \sum_{m=0}^{\infty}\frac{r^m}{m!}\,\frac{d^m}{d\eta^m}f_\mu(t), \tag{10.9.1}
\]
where, as in (10.4.15), η ≡ η(t, s) is the unit vector from t to Fj,r (t, s), and by
d m f/dηm we mean the mth-order derivative of f in the direction η(t, s).
Using this expansion, we proceed to integrate
\[
\frac{d^m}{d\eta^m}f_\mu(t)
\]
over Sj (r), the hypersurface at distance r from ∂j M for 0 ≤ j ≤ l. After integrat-
ing over Sj (r), we integrate r over [0, ρ], much as we did in deriving Weyl’s tube
formula (10.5.7).
We begin by defining generalized Lipschitz–Killing curvature measures on
SB(Rl ), the sphere bundle of Rl . These measures have finer information than the
basic Lipschitz–Killing curvature measures in that, up to a constant, they encode
surface measure on the hypersurfaces at distance r from M.
Before we explicitly define these measures, we note that they should be measures
on SB(Rl ), the sphere bundle of Rl , which is canonically isomorphic to the product
S(Rl ) × Rl . For simplicity, however, we choose to define the generalized Lipschitz–
Killing measures on S(Rl ) × Rl rather than on SB(Rl ), keeping this canonical iso-
morphism in mind. We also note that when it is convenient, we shall think of them
as measures on SB(Rl ).
Definition 10.9.1. Let (M, Z) ⊂ Rl be an l-dimensional Whitney stratified space
satisfying the requirements of Definition 10.7.2. The generalized Lipschitz–Killing
curvature measures of M, defined on Borel sets A ⊂ Rl , B ⊂ S(Rl ) and supported
on M × S(Rl ), are defined, for 0 ≤ i ≤ l − 1, by
\[
\widetilde L_i(M, A\times B) = \sum_{j=i}^{l}(2\pi)^{-(j-i)/2}\sum_{m=0}^{\lfloor(j-i)/2\rfloor}\frac{(-1)^m\,C(l-j,\,j-i-2m)}{m!\,(j-i-2m)!}\int_{\partial_jM\cap A}\int_{S(T^{\perp}\partial_jM)\cap B}\mathrm{Tr}^{T_t\partial_jM}\big(\widetilde R^{\,m}\,\widetilde S_{\nu_{l-j}}^{\,j-i-2m}\big)\,\alpha(\nu_{l-j})\,\mathcal H^{l-j-1}(d\nu_{l-j})\,\mathcal H^{j}(dt), \tag{10.9.2}
\]
is the subsigma algebra of B(Rl )⊗B(S(Rl )) generated by functions that are constant
over S(Rl ).
Corollary 10.9.2. Let (M, Z) ⊂ Rl be a Whitney stratified space satisfying the re-
quirements of Definition 10.7.2. Then, for any Borel set A ⊂ Rl ,
\[
L_j(M, A) = \widetilde L_j\big(M, A \times S(\mathbb R^l)\big).
\]
Having defined the measures \{\widetilde L_j(M, \cdot)\}_{0\le j\le l}, we now proceed to describe how they
can be used to integrate over ∂ Tube(M, r), the hypersurface of distance r from a
locally convex space M.
Before we attack this task in earnest, we revisit the Minkowski functionals, which
we defined, in the context of integral geometry, in Section 6.3. Recall that they were
defined as
\[
\mathcal M_j(M) = j!\,\omega_j\,L_{l-j}(M),
\]
so that
\[
\mathcal H^l\big(\mathrm{Tube}(M,r)\big) = \sum_{j=0}^{l}\frac{r^j}{j!}\,\mathcal M_j(M).
\]
That is, Weyl’s tube formula can be expressed as a (finite) power series with the
\mathcal M_j(M) as coefficients. This motivates our definition of Minkowski curvature measures as
\[
\mathcal M_j(M, A) = (j!\,\omega_j)\,L_{l-j}(M, A), \tag{10.9.4}
\]
and generalized Minkowski curvature measures as
\[
\widetilde{\mathcal M}_j(M, A\times B) = (j!\,\omega_j)\,\widetilde L_{l-j}(M, A\times B). \tag{10.9.5}
\]
We are now ready to prove the main result of this section, in which we derive a power
series expansion for
μ(Tube(M, ρ))
when \mu has a bounded, analytic density with respect to Lebesgue measure. It is possible to replace analytic functions with Schwartz functions, although the power series expansion below is then only formal (cf. [158]).
Suppose, in addition, that f_\mu is analytic and that for every \varepsilon > 0 there exists a compact K(\varepsilon) \subset \mathbb R^l such that, for all n,
\[
\sum_{j=0}^{l-1}\int_0^{\rho}\frac{r^j}{j!}\sum_{m=0}^{n}\frac{(-r)^m}{m!}\,\big|\widetilde{\mathcal M}_{j+1}\big|\Big(M,\ \big(\mathbb 1_{K(\varepsilon)^c}\circ F_{-r}\big)\times\frac{d^m}{d\eta^m}f_\mu\Big)\,dr < \varepsilon, \tag{10.9.8}
\]
\[
\int_{\mathrm{Tube}(M,\rho)} f_\mu(x)\,d\mathcal H^l(x) = \mu(M) + \sum_{j=0}^{l-1}\int_0^{\rho}\frac{r^j}{j!}\sum_{m=0}^{\infty}\frac{(-r)^m}{m!}\,\widetilde{\mathcal M}_{j+1}\Big(M,\ \frac{d^m}{d\eta^m}f_\mu\Big)\,dr = \mu(M) + \sum_{j=1}^{\infty}\frac{\rho^j}{j!}\,\mathcal M^{\mu}_j(M), \tag{10.9.9}
\]
where
\[
\mathcal M^{\mu}_j(M) = \sum_{m=0}^{j-1}(-1)^{j-1-m}\binom{j-1}{m}\,\widetilde{\mathcal M}_{m+1}\Big(M,\ \frac{d^{\,j-1-m}}{d\eta^{\,j-1-m}}f_\mu\Big). \tag{10.9.10}
\]
As far as (10.9.9) is concerned, there are two things to show. The first is that the first equality is valid, and the second that the final expression is equal to that on the line above it. The first equality arises by formally replacing f_\mu \circ F_{-r} in (10.9.7) by its Taylor series expansion as in (10.9.1), that is,
\[
f_\mu\big(F_{-r}(t,s)\big) = \sum_{m=0}^{\infty}\frac{(-r)^m}{m!}\,\frac{d^m}{d\eta^m}f_\mu(t).
\]
m=0
We do, however, need to check that the resulting series expansion in (10.9.9) converges nicely. Fix \varepsilon > 0 and choose K(\varepsilon) satisfying (10.9.8). Since we long ago agreed that we consider only tubes of finite measure, we can, without loss of generality, assume that
\[
\sum_{m=0}^{n}\frac{(-r)^m}{m!}\,\frac{d^m}{d\eta^m}f_\mu(t)\ \xrightarrow{\ n\to\infty\ }\ f_\mu(t - r\eta)
\]
and
\[
f^n_\mu(s) = \sum_{m=0}^{n}\frac{r(s)^m}{m!}\,\frac{d^m}{d\eta(s)^m}f_\mu\big(t(s)\big)\ \xrightarrow{\ n\to\infty\ }\ f_\mu(s),
\]
with all the truncation errors eventually smaller than \varepsilon.
Thus there are no problems with the convergence in (10.9.9). As for the equality
between the two expressions there, note that
\begin{align*}
\int_{\mathrm{Tube}(M,\rho)} f_\mu(x)\,d\mathcal H^l(x)
&= \mu(M) + \sum_{j=0}^{l-1}\int_0^{\rho}\frac{r^j}{j!}\sum_{m=0}^{\infty}\frac{(-r)^m}{m!}\,\widetilde{\mathcal M}_{j+1}\Big(M,\ \frac{d^m}{d\eta^m}f_\mu\Big)\,dr\\
&= \mu(M) + \sum_{j=0}^{l-1}\sum_{m=0}^{\infty}(-1)^m\,\frac{\rho^{\,j+1+m}}{j!\,m!\,(j+1+m)}\,\widetilde{\mathcal M}_{j+1}\Big(M,\ \frac{d^m}{d\eta^m}f_\mu\Big)\\
&= \mu(M) + \sum_{j=1}^{l}\sum_{m=j}^{\infty}\frac{\rho^m}{m!}\,(-1)^{m-j}\binom{m-1}{j-1}\,\widetilde{\mathcal M}_{j}\Big(M,\ \frac{d^{\,m-j}}{d\eta^{\,m-j}}f_\mu\Big)\\
&= \mu(M) + \sum_{m=1}^{\infty}\frac{\rho^m}{m!}\sum_{j=0}^{m-1}(-1)^{m-j-1}\binom{m-1}{j}\,\widetilde{\mathcal M}_{j+1}\Big(M,\ \frac{d^{\,m-j-1}}{d\eta^{\,m-j-1}}f_\mu\Big),
\end{align*}
which, except for the fact that m and j have switched roles, is precisely what we need.
for suitable locally convex spaces (M, Z) ⊂ Rl . This expansion will be crucial for
our analysis of non-Gaussian processes in Chapter 15.
where
\[
\mathcal M^{\gamma_{\mathbb R^l}}_j(M) = (2\pi)^{-l/2}\sum_{m=0}^{j-1}\binom{j-1}{m}\,\widetilde{\mathcal M}_{m+1}\Big(M,\ H_{j-1-m}\big(\langle \eta, t\rangle\big)\,e^{-|t|^2/2}\Big). \tag{10.9.12}
\]
Proof. Comparing the above with Theorem 10.9.5, it is clear (cf. (10.9.10)) that all we need show is that
\[
\frac{d^{\,j-1-m}}{d\eta^{\,j-1-m}}\,f_\mu(t) = (-1)^{j-1-m}\,H_{j-1-m}\big(\langle \eta, t\rangle\big)\,e^{-|t|^2/2}.
\]
However, this is easy to check from the generating function definition of Hermite polynomials at (11.6.11).
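The generating-function computation behind this identity is short enough to verify numerically. Since |t + rη|² = |t|² + 2r⟨η, t⟩ + r² for a unit vector η, the Hermite generating function e^{xs − s²/2} = Σ_n H_n(x) s^n/n! with s = −r, x = ⟨η, t⟩ gives exactly the claimed directional derivatives of e^{−|t|²/2}. A sketch of ours, with an arbitrary point and direction:

```python
import math

def hermite(n, x):
    # probabilists' Hermite polynomials: H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

t = [0.7, -0.4, 1.1]                    # arbitrary base point in R^3
eta = [1.0 / math.sqrt(3)] * 3          # an arbitrary unit direction
x = sum(a * b for a, b in zip(t, eta))  # <eta, t>
t2 = sum(a * a for a in t)              # |t|^2

for r in (0.3, 0.8, 1.5):
    # direct evaluation of e^{-|t + r eta|^2/2} ...
    direct = math.exp(-sum((a + r * b) ** 2 for a, b in zip(t, eta)) / 2)
    # ... versus the Taylor series with d^j/d eta^j = (-1)^j H_j(<eta,t>) e^{-|t|^2/2}
    series = sum((r ** j / math.factorial(j)) * (-1) ** j * hermite(j, x)
                 * math.exp(-t2 / 2) for j in range(60))
    assert abs(direct - series) < 1e-10
print("d^j/d eta^j e^{-|t|^2/2} = (-1)^j H_j(<eta,t>) e^{-|t|^2/2} verified")
```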
To see how this expansion works in the simplest of cases, take M = [u,\infty) \subset \mathbb R. Since all the \widetilde{\mathcal M}_k are then identically zero for k \ge 2, there is only one term appearing in the sum defining \mathcal M^{\gamma_{\mathbb R^l}}_j(M). Furthermore, since \mathrm{Tube}([u,\infty),\rho) = [u-\rho,\infty), we have that \widetilde{\mathcal M}_1 is supported on \{u\}\times\{+1\} (+1 being the right-pointing unit vector at u), and so
\[
\mathcal M^{\gamma_{\mathbb R^l}}_j\big([u,\infty)\big) = H_{j-1}(u)\,\frac{e^{-u^2/2}}{\sqrt{2\pi}}. \tag{10.9.13}
\]
It is elementary calculus to check that this gives the same expansion, when substituted
into (10.9.11), that comes from a simple Taylor series expansion of the Gaussian tail
probability of (1.2.1). If you want to see the details, we shall go through them in
Section 15.10.1 for expository reasons.
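For readers who do not want to wait for Section 15.10.1, the elementary calculus can be checked numerically now: substituting (10.9.13) into (10.9.11) says that Ψ(u − ρ) = Ψ(u) + Σ_{j≥1} (ρ^j/j!) H_{j−1}(u) φ(u), where Ψ is the Gaussian tail probability and φ the standard Gaussian density. A sketch of ours:

```python
import math

def hermite(n, x):
    # probabilists' Hermite polynomials: H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def gaussian_tail(u):
    # Psi(u) = P{N(0,1) >= u}
    return 0.5 * math.erfc(u / math.sqrt(2.0))

# Tube expansion of the Gaussian tail via (10.9.13): this is the Taylor
# series of Psi at u, since Psi^{(j)}(u) = -(-1)^{j-1} H_{j-1}(u) phi(u).
u, rho = 1.0, 0.7
phi_u = math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
series = gaussian_tail(u) + sum(
    rho ** j / math.factorial(j) * hermite(j - 1, u) * phi_u
    for j in range(1, 40)
)
assert abs(series - gaussian_tail(u - rho)) < 1e-10
print("the tube expansion reproduces the Gaussian tail probability")
```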
We close with the following simple, but rather important, comment.
Remark 10.9.7. Unlike the usual Minkowski functionals, the functionals \mathcal M^{\gamma_{\mathbb R^l}}_j are actually normalized independently of the dimension of \mathbb R^l, in the sense that for all integers k > 0,
\[
\mathcal M^{\gamma_{\mathbb R^l}}_j(\cdot) = \mathcal M^{\gamma_{\mathbb R^{l+k}}}_j(\cdot\times\mathbb R^k).
\]
Hence, from now on, we shall drop the \mathbb R^l in the definition of \mathcal M^{\gamma_{\mathbb R^l}}_j, using the simpler notation \mathcal M^{\gamma}_j. Furthermore, although these functionals were derived under the assumption of local convexity, it actually turns out they can also be defined for non–locally convex sets, as long as the integrals with respect to the measures \widetilde{\mathcal M}_j(M,\cdot) are finite.
We defer the examples of the applications of the results of this last section to
Chapter 15, where you finally will have a chance to see why we care about these
power series expansions.
Part III

The Geometry of Random Fields
You could not possibly have gotten this far without having read the preface, so you
already know that you have finally reached the main part of this book. It is here that
the important results lie and it is here that, after over 250 pages of preparation, there
will also be new results.
With the general theory of Gaussian processes behind us in Part I, and the geometry
of Part II now established, we return to the stochastic setting.
There are three main (classes of) results in this part. The first is an explicit formula
for the expected Euler characteristic of the excursion sets of smooth Gaussian random
fields. In the same way that we divided the treatment of the geometry into two parts,
Chapter 11 will cover the theory of random fields defined over simple Euclidean
domains and Chapter 12 will cover fields defined over Whitney stratified manifolds.
Unlike the case in Chapter 6, however, even if you are primarily interested in the
manifold scenario you will need to read the Euclidean case first, since some of the
manifold computations will be lifted from this case via atlas-based arguments.
As an aside, in the final section (Section 12.6) of Chapter 12 we shall return to a
purely deterministic setting and use our Gaussian field results to provide a probabilistic
proof of the classical Chern–Gauss–Bonnet theorem of differential geometry using
nothing22 but Gaussian processes. This really has nothing to do with anything else
in the book, but we like it too much not to include it.
In Chapter 13 we shall see how to lift the results about the mean Euler charac-
teristics of excursion sets to results about mean Lipschitz–Killing curvatures. The
argument will rely on a novel extension of the classical Crofton formulas about av-
eraged cross-sections of Euclidean sets to a scenario in which the cross-sections are
replaced by intersections of the set with certain random manifolds, and the averaging
is against a Gaussian measure. This "Gaussian Crofton'' formula is completely new and would seem to be of significant independent interest.
The second main result appears in Chapter 14, where we shall finally prove the
result promised long ago, that not only is the difference
\[
\mathbb P\Big\{\sup_{t\in M} f(t) \ge u\Big\} - \mathbb E\big\{\varphi\big(A_u(f,M)\big)\big\}
\]
extremely small for large u, but it can even be bounded in a rigorous fashion. Not only
will this justify our claims that the mean Euler characteristic of excursion sets yields
an excellent approximation to excursion probabilities, but, en passant, we shall also
show that the volume-of-tubes approximation of Chapter 10 can be made rigorous
as well.
In the closing Chapter 15 we finally leave the Gaussian scenario and develop
the third main result, an explicit formula for both the expected Euler characteristics
22 Of course, this cannot really be true. Establishing the Chern–Gauss–Bonnet theorem without
any recourse to algebraic topology would have been a mathematical coup that might even
have made probability theory a respectable topic within pure mathematics. What will be
hidden in the small print is that everything relies on the Morse theory of Section 9.3, and
this, in turn, uses algebraic topology. However, our approach will save myopic probabilists
from having to read the small print.
262 Part III. The Geometry of Random Fields
The main result of this chapter is Theorem 11.7.2 and its corollaries, which give
explicit formulas for the mean Euler characteristic of the excursion sets of smooth
Gaussian fields over rectangles in RN .
The chapter develops in a number of distinct stages. Initially, we shall develop
rather general results that give integral formulas for the expected number of points at
which a vector-valued random field takes specific values. These are Theorem 11.2.1
and its corollaries. Aside from their basic importance in the current setting, these
results will also form the basis of corresponding results for the manifold setting of
Chapter 12. In view of the results of Chapter 6, which relate the global topology of
excursion sets of a function to its local behavior, it should be clear what this has to do
with Euler characteristic computations. However, the results are important beyond
this setting and indeed beyond the setting of this book, so we shall develop them
slowly and carefully.
Before we tackle all this, however, we shall take a moment to look at the classic
Rice formula, which is the simplest special case of our main result and the proof of
which incorporates many of the components of the general scenario.
Indeed, we shall commit even more serious crimes, including exchanging delicate
orders of integration, sometimes involving the Dirac delta function. All will be
justified later.
As a first step, we would like to compute E{Nu+ (0, T )}. To this end, let δx be the
Dirac delta function at x, “defined’’ by the fact that, for any reasonable test function g,
\[
\int_{\mathbb R} \delta_x(y)\,g(y)\,dy = g(x).
\]
Suppose that the upcrossing points of f, i.e., those t \in [0,T] at which f(t) = u and f'(t) > 0, are isolated, so that each one can be covered by a small interval I in which there are no other upcrossings and throughout which f' > 0.
Then, treating \delta as if it were a smooth function, a simple change of variables argument gives that
\[
1 = \int_{\mathbb R}\delta_u(y)\,dy = \int_I \delta_u\big(f(t)\big)\cdot f'(t)\,dt.
\]
Of course, this needs justification, but that is what Theorem 11.2.1 does. Concatenating all such intervals I, and noting that there is no contribution to the following integral from outside of them, we obtain
\[
N^+_u(0,T) = \int_0^T \delta_u\big(f(t)\big)\cdot\mathbb 1_{[0,\infty)}\big(f'(t)\big)\cdot f'(t)\,dt.
\]
Now take expectations, freely exchanging orders of integration and assuming that the pairs of random variables (f(t), f'(t)) have joint probability densities p_t, to see that
\begin{align*}
\mathbb E\big\{N^+_u(0,T)\big\} &= \int_0^T dt\int_{-\infty}^{\infty} dx\int_0^{\infty} dy\ y\,\delta_u(x)\,p_t(x,y) \tag{11.1.2}\\
&= \int_0^T\int_0^{\infty} y\,p_t(u,y)\,dy\,dt.
\end{align*}
This is what could be called Rice’s formula in its most basic form, and it holds for all
processes on R for which the various operations above are justifiable. Note, however,
that it requires no specific distributional assumptions on the process f .
However, it turns out to be remarkably difficult to compute the integral in (11.1.2)
unless f is either Gaussian or a function of Gaussian processes. The case that is central
to the remainder of this book is that in which f is indeed Gaussian, and has zero mean
and constant variance, which for notational convenience we assume is 1. (Otherwise,
the variance can be absorbed into u in all the following formulas.) In that case we
know from Section 5.6 that f(t) and f′(t) are independent for each t, and so, denoting the variance of f′(t) by λ_t, it immediately follows that in this case,2
for all x then the point process is called simple. All the point processes that will appear in
this book will be simple. For further information on point processes, see, for example, [90].
2 To see an expression in the fully nonstationary case, look at [42, formula (13.2.1)]. Note,
however, that this expression is not of closed form, but remains as a rather complicated
integral over [0, T ].
11.1 Rice’s Formula 265
E{N_u^+(0, T)} = ∫_0^T ∫_0^∞ y (2π)^{-1/2} e^{-u²/2} (2πλ_t)^{-1/2} e^{-y²/(2λ_t)} dy dt
             = (e^{-u²/2}/2π) ∫_0^T λ_t^{1/2} dt.    (11.1.3)

In particular, if f is stationary, so that λ_t ≡ λ_2, the second spectral moment, this becomes

E{N_u^+(0, T)} = (T λ_2^{1/2}/2π) e^{-u²/2}.    (11.1.4)
Fig. 11.1.1. Realizations of Gaussian sample paths with the spectrum (11.1.7) and second
spectral moments λ2 = 200 (left) and λ2 = 1,000 (right).
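The stationary, unit-variance form of this computation, E{N_u^+(0, T)} = T λ_2^{1/2} e^{−u²/2}/(2π), is easy to probe numerically. The sketch below (our illustration; all parameter choices are ours) uses the random cosine process f(t) = ξ cos(wt) + ζ sin(wt), with ξ, ζ independent standard normals, which is stationary Gaussian with unit variance and second spectral moment λ_2 = w²:

```python
import numpy as np

# Monte Carlo check of the stationary Rice formula
#   E{N_u^+(0,T)} = T * sqrt(lambda_2) * exp(-u^2/2) / (2*pi)
# for the random cosine process f(t) = xi*cos(w t) + zeta*sin(w t),
# which is stationary Gaussian, unit variance, lambda_2 = w^2.
rng = np.random.default_rng(0)

w, T, u = 2 * np.pi, 1.0, 1.0
t = np.linspace(0.0, T, 4001)

def upcrossings(path, level):
    # points where the path passes from below `level` to above it
    return int(np.count_nonzero((path[:-1] < level) & (path[1:] >= level)))

n_paths = 20000
total = 0
for _ in range(n_paths):
    xi, zeta = rng.standard_normal(2)
    total += upcrossings(xi * np.cos(w * t) + zeta * np.sin(w * t), u)

mc_mean = total / n_paths
rice = T * w * np.exp(-u**2 / 2) / (2 * np.pi)   # here sqrt(lambda_2) = w
print(mc_mean, rice)
```

For this particular process the exact mean is e^{−1/2} ≈ 0.607, since a path has an upcrossing of u = 1 per period exactly when its Rayleigh amplitude exceeds 1.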
We start with a metatheorem about the expected number of points at which a vector-
valued random field takes values in some set. For the moment, we gain nothing by
assuming that our fields are Gaussian and so do not do so. Here is the setting:
For some N, K ≥ 1, let f = (f^1, …, f^N) and g = (g^1, …, g^K), respectively, be R^N- and R^K-valued N-parameter random fields. We need two sets, T ⊂ R^N and
B ⊂ RK . As usual, T is a compact parameter set, but now we add the assumption
that its boundary ∂T has finite HN−1 -measure (cf. footnote 23 in Chapter 7). As for
B, we shall assume that it is open and that its boundary ∂B = B̄ \ B has Hausdorff
dimension K − 1.
As usual, ∇f denotes the gradient of f . Since f takes values in RN , this is now
an N × N matrix of first-order partial derivatives of f ; i.e.,
(∇f)(t) ≡ ∇f(t) ≡ (f_j^i(t))_{i,j=1,…,N} ≡ (∂f^i(t)/∂t_j)_{i,j=1,…,N}.
All the derivatives here are assumed to exist in an almost sure sense.5
4 One might guess that in moving to constant variance, but otherwise nonstationary, random
fields on RN all that might happen would be that the λt of (11.1.3) might be replaced
with some other local derivative, or perhaps determinant of derivatives, of the covariance
function at the point (t, t). This, however, is not the case, and the actual situation, as we
shall soon see, is far more complicated.
5 This is probably too strong an assumption, since, at least in one dimension, Theorem 11.2.1
is known to hold under the assumption that f is absolutely continuous, and so has only a
weak-sense derivative (cf. [109]). However, since we shall need a continuous sample path
derivative later for other things, we assume it now.
11.2 An Expectation Metatheorem 267
Theorem 11.2.1. Let f , g, T and B be as above. Assume that the following conditions
are satisfied for some u ∈ RN :
(a) All components of f , ∇f , and g are a.s. continuous and have finite variances
(over T ).
(b) For all t ∈ T , the marginal densities pt (x) of f (t) (implicitly assumed to exist)
are continuous at x = u.
(c) The conditional densities6 pt (x|∇f (t), g(t)) of f (t) given g(t) and ∇f (t)
(implicitly assumed to exist) are bounded above and continuous at x = u, uni-
formly in t ∈ T .
(d) The conditional densities pt (z|f (t) = x) of det ∇f (t) given f (t) = x, are
continuous for z and x in neighborhoods of 0 and u, respectively, uniformly in
t ∈ T.
(e) The conditional densities p_t(z|f(t) = x) of g(t) given f(t) = x are continuous for all z and for x in a neighborhood of u, uniformly in t ∈ T.
(f) The following moment condition holds:
sup_{t∈T} max_{1≤i,j≤N} E{ |f_j^i(t)|^N } < ∞.    (11.2.1)
(g) The moduli of continuity with respect to the usual Euclidean norm (cf. (1.3.6)) of
each of the components of f , ∇f , and g satisfy
P{ω(η) > ε} = o(η^N) as η ↓ 0.    (11.2.2)
While conditions (a)–(g) arise naturally in the proof of Theorem 11.2.1, they all
but disappear in one of the cases of central interest to us, when the random fields f and
g are Gaussian. In these cases all the marginal and conditional densities appearing
in the conditions of Theorem 11.2.1 are also Gaussian, and so their boundedness
and continuity are immediate, as long as all the associated covariance matrices are
nondegenerate, which is what we need to assume in this case. This will also imply
that all variances are finite and that the moment condition (11.2.1) holds. Thus the
only remaining conditions are the a.s. continuity of f , ∇f , and g, and condition
(11.2.2) on the moduli of continuity.
Note first, without reference to normality, that if ∇f is continuous, then so must7
be f . Thus we have only the continuity of ∇f and g to worry about. However,
we spent a lot of time in Chapter 1 finding conditions that will guarantee this. For
example, we can apply Theorem 1.4.1.
Write C_{f^i} = C_{f^i}(s, t) for the covariance function of f^i, so that C_{f_j^i} = ∂²C_{f^i}/∂s_j∂t_j is the covariance function of f_j^i = ∂f^i/∂t_j. Similarly, let C_{g^i} denote the covariance function of g^i. Then, by (1.4.4), ∇f and g will be a.s. continuous if

max_{i,j} { |C_{f_j^i}(t,t) + C_{f_j^i}(s,s) − 2C_{f_j^i}(s,t)|, |C_{g^i}(t,t) + C_{g^i}(s,s) − 2C_{g^i}(s,t)| } ≤ K |ln |t − s||^{−(1+α)}    (11.2.5)

for some finite K > 0, some α > 0, and all |t − s| small enough.
All that now remains to check is condition (11.2.2) on the moduli of continuity.
Here the Borell–TIS inequality—Theorem 2.1.1—comes into play. Write h for any
of the components of ∇f or g, and H for the random field on T × T defined by
H(s, t) = h(t) − h(s). Then, writing

ω(η) = sup_{s,t: |t−s| ≤ η} |H(s, t)|,

the Borell–TIS inequality gives, for ε > E{ω(η)},

P{ω(η) > ε} ≤ e^{−(ε − E{ω(η)})²/(2σ_η²)},    (11.2.7)

where σ_η² = sup_{s,t: |t−s| ≤ η} E{(H(s, t))²}. But (11.2.5) immediately implies that
σ_η² ≤ K|ln η|^{−(1+α)}, while together with Theorem 1.4.1 it implies a bound of similar order for E{ω(η)}. Substituting this into (11.2.7) gives an upper bound of the form C_ε η^{|ln η|^α}, and so (11.2.6) holds with room to spare, in that it holds for any N and not just N = dim(T).
Putting all of the above together, we have that Theorem 11.2.1 takes the following
much more user friendly form in the Gaussian case.
7 The derivative can hardly exist, let alone be continuous, if f is not continuous! In fact, this
condition was vacuous all along and was included only for “symmetry’’ considerations.
Corollary 11.2.2. Let f and g be centered Gaussian fields over a T that satis-
fies the conditions of Theorem 11.2.1. If for each t ∈ T , the joint distributions of
(f (t), ∇f (t), g(t)) are nondegenerate, and if (11.2.5) holds, then so do (11.2.3)
and (11.2.4).
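Condition (11.2.5) is in fact very weak: any covariance whose derivative-field increments decay polynomially satisfies it, since polynomial decay beats any power of a logarithm. A sketch of the comparison (our remark, not the text's):

```latex
% If C_h(t,t) + C_h(s,s) - 2C_h(s,t) \le L\,|t-s|^{\beta} for some \beta > 0
% (Hölder-type increments), then for |t-s| small enough,
%   L\,|t-s|^{\beta} \le K\,\bigl|\ln|t-s|\bigr|^{-(1+\alpha)}
% for any \alpha > 0, because
\[
  \lim_{h \downarrow 0} h^{\beta}\,\bigl|\ln h\bigr|^{\,1+\alpha} = 0 .
\]
% Hence smooth stationary covariances such as C(t) = e^{-\|t\|^2}, whose
% increments above are O(|t-s|^{2}), comfortably satisfy (11.2.5).
```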
∫_{B(ε)} δ_ε(t) dt = 1.    (11.2.8)
neither overlap nor intersect ∂T . Furthermore, because of (b), we can ensure that η
is small enough so that within each ball, g(t) always lies in either B or the interior of
its complement, but never both.
Let σ (ε) be the ball |f | < ε in the image space of f . From what we have just
established, we claim that we can now choose ε small enough for the inverse image
of σ (ε) in T to be contained within the union of the η spheres. (In fact, if this were
not so, we could choose a sequence of points tn in T not belonging to any η sphere,
and a sequence εn tending to zero such that f (tn ) would belong to σ (εn ) for each
n. Since T is compact, the sequence tn would have a limit point t ∗ in T , for which
we would have f(t*) = 0. Since t* ∉ ∂T by (c), we must have t* ∈ T. Thus t* is contained in the inverse image of σ(ε) for any ε, as must be infinitely many of the t_n.
This contradiction establishes our claim.)
Furthermore, by (b) and the inverse mapping theorem (cf. footnote 6 of Chapter 6)
we can choose ε, η so small that for each η sphere in T , σ (ε) is contained in the f
image of the η sphere, so that the restriction of f to such a sphere will be one-to-one.
Since the Jacobian of the mapping of each η sphere by f is | det ∇f (t)|, it follows
that we can choose ε small enough so that
N_0 = ∫_T δ_ε(f(t)) 1_B(g(t)) |det ∇f(t)| dt.
This follows since each η sphere in T over which g(t) is in B will contribute exactly
one unit to the integral, while all points outside the η spheres will not be mapped onto
σ (ε). Since the left-hand side of this expression is independent of ε, we can take the
limit as ε → 0 to obtain (11.2.9) and thus the theorem.
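The role of δ_ε here can be checked numerically for a concrete deterministic map (our illustration; the map f and all tolerances are our own choices):

```python
import numpy as np

# Numerical version of N0 = integral over T of delta_eps(f(t)) |det grad f(t)| dt
# for the (hypothetical) map f(t) = (t1^2 + t2^2 - 1, t1 - t2) on T = [-2, 2]^2,
# which has exactly two nondegenerate zeros, at t = +/-(1/sqrt(2), 1/sqrt(2)).
# We take g identically 0 and B = R^2, so the indicator 1_B is identically 1.
n = 1201
x = np.linspace(-2.0, 2.0, n)
h = x[1] - x[0]
t1, t2 = np.meshgrid(x, x, indexing="ij")

f1 = t1 ** 2 + t2 ** 2 - 1.0
f2 = t1 - t2
det = -2.0 * (t1 + t2)      # det of the Jacobian [[2 t1, 2 t2], [1, -1]]

eps = 0.1
# normalized indicator of the ball sigma(eps) = {|f| < eps}, standing in for delta_eps
delta = (f1 ** 2 + f2 ** 2 < eps ** 2) / (np.pi * eps ** 2)

N0 = float(np.sum(delta * np.abs(det)) * h ** 2)
print(N0)   # close to 2, the number of zeros of f in T
```

The change-of-variables argument makes the integral exactly 2 for any sufficiently small ε; only quadrature error remains.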
Theorem 11.2.3 does not tell us anything about expectations. Ideally, it would be
nice simply to take expectations on both sides of (11.2.9) and then, hopefully, find an
easy way to evaluate the right-hand side of the resulting equation. While this requires
justification and further assumptions, let us nevertheless proceed in this fashion, just
to see what happens. We then have
E{N_0} = lim_{ε→0} E{ ∫_T δ_ε(f(t)) 1_B(g(t)) |det ∇f(t)| dt }
       = ∫_T ∫_{R^{N(N+1)/2}} ∫_{R^K} 1_B(v) |det ∇y| × lim_{ε→0} ∫_{R^N} δ_ε(x) p_t(x, ∇y, v) dx d∇y dv dt,
To see this, suppose that t is such that f(t) = u. (If there are no such t, there is nothing more to prove.) The fact that f ∈ C¹ implies that there is a neighborhood of t in which f is locally linear: that is, it can be approximated by its tangent plane. Furthermore, since det ∇f(t) ≠ 0, not all the partial derivatives can be zero, and so the tangent plane cannot be at a constant "height.'' Consequently, throughout this neighborhood there is no other point at which f = u. Since T is compact, there can therefore be no more than a finite number of t satisfying f(t) = u.
where the p_t are the obvious densities. Taking the limit in the innermost integral yields

E{N_0} = ∫_T ∫_{R^{N(N+1)/2}} ∫_{R^K} 1_B(v) |det ∇y| p_t(0, ∇y, v) d∇y dv dt    (11.2.10)
       = ∫_T E{ |det ∇f(t)| 1_B(g(t)) | f(t) = 0 } p_t(0) dt.
Of course, interchanging the order of integration and the limiting procedure re-
quires justification. Nevertheless, at this point we can state the following tongue-in-
cheek “corollary’’ to Theorem 11.2.3.
Corollary 11.2.4. If the conditions of Theorem 11.2.3 hold almost surely, as well as
“adequate’’ regularity conditions, then
E{N_u} = ∫_T ∫_{R^{N(N+1)/2}} ∫_{R^K} |det ∇x| 1_B(v) p_t(u, ∇x, v) d∇x dv dt    (11.2.11)
       = ∫_T E{ |det ∇f(t)| 1_B(g(t)) | f(t) = u } p_t(u) dt.
E{|X_1 ⋯ X_n|} ≤ ∏_{i=1}^n [E{|X_i|^n}]^{1/n}.    (11.2.12)
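Inequality (11.2.12) is the generalized Hölder inequality with all exponents equal to n. A quick Monte Carlo sanity check for n = 3 (the mixing matrix A below is an arbitrary choice of ours):

```python
import numpy as np

# Check E{|X1 X2 X3|} <= prod_i [E{|X_i|^3}]^{1/3}  (the n = 3 case of
# (11.2.12)) on a correlated Gaussian triple X = A Z, Z standard normal.
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0, 0.0],
              [0.7, 0.5, 0.0],
              [0.3, 0.3, 0.9]])
Z = rng.standard_normal((3, 1_000_000))
X = A @ Z

lhs = float(np.mean(np.abs(X[0] * X[1] * X[2])))
rhs = float(np.prod([np.mean(np.abs(X[i]) ** 3) ** (1 / 3) for i in range(3)]))
print(lhs, rhs)    # lhs should not exceed rhs
```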
Theorem 11.2.6. Let f , g, B, and T be as in Theorem 11.2.3, but with f and g random
and conditions (a)–(d) there holding in an almost sure sense. Assume, furthermore,
that conditions (b)–(g) of Theorem 11.2.1 hold, with the notation adopted there. Then
E{N_u(f, g; T, B)} ≤ ∫_T dt ∫_B dv ∫_{R^{N(N+1)/2}} |det ∇y| p_t(u, ∇y, v) d∇y.    (11.2.13)
Since all densities are assumed continuous and bounded, the innermost integral clearly converges to

p_t(0 | ∇y, v)

as ε → 0. Furthermore,

∫_{R^N} δ_ε(x) p_t(x | ∇y, v) dx ≤ sup_x p_t(x | ∇y, v) ∫_{R^N} δ_ε(x) dx = sup_x p_t(x | ∇y, v),
We can now turn to the more difficult part of our problem: showing that the upper
bound for E{N(T )} obtained in the preceding theorem also serves as a lower bound
under reasonable conditions. We shall derive the following result.
Theorem 11.2.7. Assume the setup and assumptions of Theorem 11.2.6, along with
(11.2.2) of Theorem 11.2.1, that is, that the moduli of continuity of each of the com-
ponents of f , ∇f , and g satisfy
for all ε > 0. Then (11.2.13) holds with the inequality sign reversed, and so is an
equality.
Since the proof of this theorem is rather involved, we shall start by first describing
the principles underlying it. Essentially, the proof is based on constructing a pathwise
approximation to the vector-valued process f and then studying the zeros of the
approximating process. The approximation is based on partitioning T and replacing
f within each cell of the partition by a hyperplane tangential to f at the cell’s midpoint.
We then argue that if the approximating process has a zero within a certain subset of
a given cell, then f has a zero somewhere in the full cell. Thus the number of zeros
of the approximating process will give a lower bound to the number of zeros of f .
In one dimension, for example, we replace the real-valued function f on T = [0, 1] by a series of approximations f^(n) given by

f^(n)(t) = f((j + 1/2)2^{−n}) + (t − (j + 1/2)2^{−n}) f′((j + 1/2)2^{−n})

for j2^{−n} ≤ t < (j + 1)2^{−n}, and study the zeros of f^(n) as n → ∞. Although this is perhaps not the most natural approximation to use in one dimension, it generalizes easily to higher dimensions.
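The one-dimensional construction is easy to experiment with. The sketch below (the function f is our own example, not from the text) counts the zeros of f^(n), keeping a tangent line's zero only when it falls inside its own dyadic cell:

```python
import numpy as np

def f(t):
    return np.cos(6 * np.pi * t) - 0.3      # six zeros in (0, 1)

def fprime(t):
    return -6 * np.pi * np.sin(6 * np.pi * t)

def zeros_of_tangent_approx(n):
    """Count the zeros of f^(n): in each dyadic cell of width 2^-n, replace f
    by its tangent at the cell midpoint and keep the tangent's zero only when
    it falls inside that same cell."""
    h = 2.0 ** -n
    count = 0
    for j in range(2 ** n):
        c = (j + 0.5) * h
        tau = c - f(c) / fprime(c)          # zero of the tangent line
        if j * h <= tau < (j + 1) * h:
            count += 1
    return count

print({n: zeros_of_tangent_approx(n) for n in (2, 4, 8, 12)})
# the counts settle at 6, the true number of zeros of f on [0, 1]
```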
Proof of Theorem 11.2.7. As usual, we take the level u = 0, and start with some notation. For each n ≥ 1 let Z_n denote the lattice of points in R^N whose components are integer multiples of 2^{−n}, i.e.,

Z_n = {t ∈ R^N : t_j = i_j 2^{−n}, i_j ∈ Z, j = 1, …, N}.
Now fix ε > 0, and for each n ≥ 1 define two half-open hypercubes centered on an arbitrary point t ∈ R^N by

Δ_n(t) = ∏_{j=1}^N [t_j − 2^{−n−1}, t_j + 2^{−n−1}),
Δ_n^ε(t) = ∏_{j=1}^N [t_j − (1 − ε)2^{−n−1}, t_j + (1 − ε)2^{−n−1}).

Set

I_t^n = 1 if N_0(f, g; Δ_n(t), B) ≥ 1 and Δ_n(t) ⊂ T, and I_t^n = 0 otherwise,
Note (since it will be important later) that only those Δ_n(t) that lie wholly within T contribute to the approximations. However, since the points being counted are, by assumption, almost surely isolated, and none lie on ∂T, it follows that, as n → ∞,

N_n → N_0(f, g; T, B)   a.s.
where the moduli of continuity are all taken over T. Furthermore, define M_f by

M_f = max{ max_{1≤j≤N} sup_{t∈T} |f^j(t)|, max_{1≤i,j≤N} sup_{t∈T} |∂f^i/∂t_j (t)| }.
Finally, set

η = εδ² / (2 N! (K + 1)^{N−1}).
Then the conditions of the theorem imply that as n → ∞,

P{ω*(n) > η} = o(2^{−Nn}).    (11.2.16)
We claim that if both ω_f*(n) < η and G_δ^K occur, then, for n large enough, t* ∈ Δ_n^ε(t) implies that (11.2.19) and (11.2.20) hold.
These two facts, which we shall establish in a moment, are enough to make the
remainder of the proof quite simple. From (11.2.19), it follows that by choosing n
large enough for (11.2.16) to be satisfied we have
Using this, making the transformation t → t ∗ given by (11.2.18), and using the
notation of the theorem, we obtain
Noting (11.2.20), the continuity and boundedness assumptions on pt , and the bound-
edness assumptions on the moments of the fji , it follows that, as n → ∞, the last
expression, summed as in (11.2.15), converges to
(1 − ε)^N ∫_T dt ∫_{G_δ^K} |det ∇y| p_t(0, ∇y, v) d∇y dv.
This, of course, completes the proof, barring the issue of establishing that (11.2.19)
and (11.2.20) hold under the conditions preceding them.
Thus, assume that ω*(n) < η and G_δ^K occur, and that the t* defined by (11.2.18) satisfies t* ∈ Δ_n^ε(t). Then (11.2.20) is immediate. The hard part is to establish (11.2.19). To this end, note that (11.2.18) can be rewritten as

|det ∇f(τ) − det ∇f(t)| < εδ²/2    (11.2.22)

under the conditions we require. Thus, since det ∇f(t) > δ, it follows that det ∇f(τ) ≠ 0 for any τ ∈ Δ_n(t), and so the matrix ∇f(τ) is invertible throughout Δ_n(t). Similarly, one can check that g(τ) ∈ B for all τ ∈ Δ_n(t).
Consider first (11.2.19). We need to show that t* ∈ Δ_n^ε(t) implies the existence of at least one τ ∈ Δ_n(t) at which f(τ) = 0.
The mean value theorem9 allows us to write
9 In the form we need it, here is the mean value theorem for Euclidean spaces. A proof can
be found, for example, in [13].
Lemma 11.2.8 (mean value theorem for RN ). Let T be a bounded open set in RN and
let f : T → RN have first-order partial derivatives at each point in T . Let s and t be two
points in T such that the line segment
276 11 Random Fields on Euclidean Spaces
for some points t^1(τ), …, t^N(τ) lying on the line segment L(t, τ), for any t and τ, where ∇f(t^1, …, t^N) is the matrix function ∇f with the elements in the kth column evaluated at the point t^k. Using arguments similar to those used to establish (11.2.22) and the invertibility of ∇f(τ) throughout Δ_n(t), invertibility can be shown for ∇f(t^1, …, t^N) as well. Hence we can rewrite (11.2.24) as
has at least one fixed point. Thus, by (11.2.25), there would be at least one τ ∈ Δ_n(t) for which f(τ) = 0. In other words,

f(t)[∇f(t^1, …, t^N)]^{−1}
  = f(t)[∇f(t)]^{−1} ∇f(t)[∇f(t^1, …, t^N)]^{−1}
  = f(t)[∇f(t)]^{−1} ( I + [∇f(t) − ∇f(t^1, …, t^N)][∇f(t^1, …, t^N)]^{−1} ),

noting (11.2.21) and bounding the rightmost expression using basically the same argument employed for (11.2.22). This completes the proof.
L(s, t) = {u : u = θ s + (1 − θ )t, 0 < θ < 1}
10 In the form we need it, the Brouwer fixed point theorem is as follows. Proofs of this result
are easy to find (e.g., [146]).
We now complete the task of this section, i.e., the proof of Theorem 11.2.1, which
gave the expression appearing in both Theorems 11.2.6 and 11.2.7 for E{Nu }, but
under seemingly weaker conditions than those we have assumed. What remains to
show is that conditions (b)–(d) of Theorem 11.2.3 are satisfied under the conditions
of Theorem 11.2.1. Condition (c) follows immediately from the following rather
intuitive result, taking h = f , n = N − 1, and identifying the T of the lemma with
∂T in the theorems. The claims in (b) and (d) are covered in the three remaining
lemmas of the section. Lemma 11.2.10 has roots going back to Bulinskaya [35].
Proof. We start with the observation that under the conditions of the lemma, for any
ε > 0, there exists a finite Cε such that
P{ max_{i,j} sup_T |∂h^i(t)/∂t_j| < C_ε } > 1 − ε.
Writing ωh for the modulus of continuity of h, it follows from the mean value theo-
rem that
where

E_ε = { ω_h(η) ≤ √n C_ε η for all small enough η }.
In view of (11.2.27) it suffices to show that the sum here can be made arbitrarily
small.
To see this, let tmj be the center point of Bmj . Then, if both Amj and Eε occur, it
follows that for large enough m,
|h(t_mj) − u| ≤ √n C_ε diam(B_mj).
where M is a bound on the densities. Substituting into (11.2.29) and noting (11.2.28)
we are done.
We now turn to the second part of condition (b) of Theorem 11.2.3, relating to
the points t ∈ T satisfying f (t) − u = det ∇f (t) = 0. Note firstly that this would
follow easily from Lemma 11.2.10 if we were prepared to assume that f ∈ C 3 (T ).
This, however, is more than we are prepared to assume. That the conditions of
Theorem 11.2.1 contain all we need is the content of the following lemma.
Lemma 11.2.11. Let f and T be as in Theorem 11.2.1, with conditions (a), (b), (d),
and (g) of that theorem in force. Then, with probability one, there are no points t ∈ T
satisfying f (t) − u = det ∇f (t) = 0.
Proof. As for the proof of the previous lemma, we start with an observation. In
particular, for any ε > 0, there exists a continuous function ωε for which ωε (η) ↓
0 as η ↓ 0, and a finite positive constant Cε , such that P{Eε } > 1 − ε, where now
E_ε = { max_{i,j} sup_{t∈T} |f_j^i(t)| < C_ε,  max_{i,j} ω_{f_j^i}(η) ≤ ω_ε(η) for 0 < η ≤ 1 }.
To see this, the following simple argument11 suffices: For ease of notation, set ω*(η) = max_{i,j} ω_{f_j^i}(η). It then follows from (11.2.2) that there are sequences {c_n} and {η_n}, both decreasing to zero, such that
ing to zero, such that
P{ω*(η_n) < c_n} > 1 − 2^{−n}ε.
Defining ωε (η) = cn for ηn+1 ≤ η < ηn , Borel–Cantelli then gives, for some
η1 > 0, that
P{ω*(η) < ω_ε(η) for all 0 < η ≤ η_1} > 1 − ε/2.
|f (tmj ) − u| ≤ CN Cε diam(Bmj ).
where pt is the density of f (t) and the integral is over a region in RN of volume no
greater than CN diam(Bmj )N .
By assumption (g) of Theorem 11.2.1 the integrand here tends to zero as m → ∞,
and so can be made arbitrarily small. Furthermore, pt is uniformly bounded. Putting
all of this together with (11.2.30) proves the result.
We now turn to the first part of condition (b) of Theorem 11.2.3, relating to the
points t ∈ T satisfying f (t) = 0 and g(t) ∈ ∂B. These are covered by the following
lemma, whose proof is almost identical to that of Lemma 11.2.11 and so is left to you.
Lemma 11.2.12. Let f and T be as in Theorem 11.2.1, with conditions (a), (b), (e),
and (g) of that theorem in force. Then, with probability one, there are no points t ∈ T
satisfying f (t) − u = 0 and g(t) ∈ ∂B.
Recall that in the current Euclidean setting, critical points are those t for which
∇f(t) = 0, and "nondegeneracy'' means that the Hessian ∇²f(t) has nonzero determinant. Note also that in both conditions (ii) and (iii) there is a strong dependence on dimension. In particular, in (ii) the requirement is that the R^{k+1}-valued function (∇(f_{|∂_k T}), det ∇²(f_{|∂_k T})) not have zeros over a k-dimensional parameter set. Regarding (iii), the requirement is that the R^k-valued function ∇(f_{|∂_k T}) defined on a k-dimensional set not have zeros on a subset of dimension k − 1.
In this light, (ii) and (iii) are clearly related to Lemma 11.2.11, and we leave it to
you to check the details that give us the following theorem.
(a) f is, with probability one, C² on an open neighborhood of T, and all second derivatives have finite variance.
(b) For all t ∈ J, the marginal densities p_t(x) of ∇f_{|J}(t) are continuous at 0, uniformly in t.
(c) The conditional densities p_t(z|x) of det ∇²f_{|J}(t) given ∇f_{|J}(t) = x are continuous for (z, x) in a neighborhood of 0, uniformly in t ∈ T.
11.3 Suitable Regularity and Morse Functions 281
(d) On J, the moduli of continuity of f and its first- and second-order partial derivatives all satisfy, for any ε > 0,

P{ω(η) > ε} = o(η^{dim(J)}) as η ↓ 0.
Corollary 11.3.2. Let f be a centered Gaussian field over a finite rectangle T . If for
each t ∈ T , the joint distributions of (fi (t), fij (t))i,j =1,...,N are nondegenerate, and
if for some finite K and all s, t ∈ T ,
max_{i,j} |C_{f_ij}(t, t) + C_{f_ij}(s, s) − 2C_{f_ij}(s, t)| ≤ K |ln |t − s||^{−(1+α)},    (11.3.1)
then the sample functions of f are, with probability one, Morse functions over T .
The next issue is to determine when sample functions are, with probability one,
suitably regular in the sense of Definition 6.2.1. This is somewhat less elegant because
of the asymmetry in the conditions of suitable regularity and their dependence on a
particular coordinate system. Nevertheless, the same arguments as above work here
as well and it is easy (albeit a little time consuming) to see that the following suffices
to do the job.
Theorem 11.3.3. Under the conditions of Corollary 11.3.2 the sample functions of f
are, with probability one, suitably regular over bounded rectangles.
A little thought will show that the assumptions of this theorem would seem to give more than is required. Consider the case N = 2. In that case, condition (6.2.6) of suitable regularity requires that there be no t ∈ T for which f(t) − u = f_1(t) = f_{11}(t) = 0. This is clearly implied by Theorem 11.3.3. However, the theorem also implies that there are no points t for which f(t) − u = f_2(t) = f_{22}(t) = 0, which is not something that we require.
Rather, it is a consequence of a desire to write the conditions of the theorem in a
compact form.
In fact, Theorem 11.3.3 goes even further, in that it implies that for every fixed
choice of coordinate system, the sample functions of f are, with probability one,
suitably regular over bounded rectangles.12
We now turn to what is really the most important case, that of determining suf-
ficient conditions for a random field to be, almost surely, a Morse function over a
piecewise C² Riemannian manifold (M, g). Writing our manifold as usual as

M = ⋃_{j=0}^N ∂_j M    (11.3.2)
12 Recall that throughout our discussion of integral geometry there was a fixed coordinate
system and that suitable regularity was defined relative to this system.
(cf. (9.3.5)), conditions (i)–(iii) still characterize whether f will be a Morse function.
The problem is how to generalize Theorem 11.3.1 to this scenario, since its proof
was based on the results of the previous section, all of which were established in a
Euclidean setting. The trick, of course, is to recall that each of the three required
properties is of a local nature. We can then argue as follows:
Choose a chart (U, ϕ) from a countable atlas covering M. Let t* ∈ U be a critical point of f, and note that this property is independent of the choice of local coordinates. Working therefore with the natural basis for T_t M, it is easy to see that ϕ(t*) is also a critical point of f ∘ ϕ^{−1} on ϕ(U). Furthermore, the covariant Hessian ∇²f(t*) will be degenerate if and only if the same is true of the regular Hessian of f ∘ ϕ^{−1}, and since ϕ is a diffeomorphism, this implies that t* will be a degenerate critical point of f if and only if ϕ(t*) is for f ∘ ϕ^{−1}. It therefore follows that even in the manifold case, we can manage, with purely Euclidean proofs, to establish the following result.
The straightforward but sometimes messy details are left to you.
Thus, we take another route, given conditions that are less elegant but easier to
establish and generally far easier to check in practice. As for Theorem 11.3.4 itself,
we leave it to you to check the details of the (straightforward) proof of the corollary.
Corollary 11.3.5. Take the setup of Theorem 11.3.4 and let f be a centered Gaussian field over M. Let A = {(U_α, ϕ_α)}_{α∈I} be a countable atlas for M such that for every α, the Gaussian field f_α = f ∘ ϕ_α^{−1} on ϕ_α(U_α) ⊂ R^N satisfies the conditions of Corollary 11.3.2 with T = ϕ_α(U_α), f = f_α, and some K_α > 0. Then the sample functions of f are, with probability one, Morse functions over M.
Now take expectations (assuming that this is allowed) of both sides to obtain

∫_{R^N} ϕ(u) E{N_u(f : T)} du = ∫_T E{ ϕ(f(t)) |det ∇f(t)| } dt
                             = ∫_{R^N} ϕ(u) ∫_T E{ |det ∇f(t)| | f(t) = u } p_t(u) dt du.
Since ϕ was arbitrary, this implies that for (Lebesgue) almost every u,

E{N_u(f : T)} = ∫_T E{ |det ∇f(t)| | f(t) = u } p_t(u) dt,    (11.4.1)
which is precisely (11.2.4) of Theorem 11.2.1 with the g there identically equal to 1.
Modulo this restriction on g, which is simple to remove, this is the result we have
worked so hard to prove. The problem, however, is that since it is true only for almost
every u, one cannot be certain that it is true for a specific value of u.
To complete the proof, we need only show that both sides of (11.4.1) are con-
tinuous functions of u and that the assumptions of convenience made above are no
more than that. This, of course, is not as trivial as it may sound. Going through the
arguments actually leads to repeating many of the technical points we went through
in the previous section, and eventually Theorem 11.2.1 reappears with the same long
list of conditions. However (and this is the big gain), the details have no need of the
construction, in the proof of Theorem 11.2.7, of the linear approximation to f .
You can find all the details in [15] and decide for yourself which proof you prefer.
While not at all obvious at first sight, hidden away in Theorem 11.2.1 is another result,
about higher moments of the random variable Nu (f, g : T , B). To state it, we need,
for integral k ≥ 1, the kth partial factorial of a real x defined by
(x)k = x(x − 1) · · · (x − k + 1).
T^k = {t̃ = (t_1, …, t_k) : t_j ∈ T, ∀ 1 ≤ j ≤ k},
f̃(t̃) = (f(t_1), …, f(t_k)) : T^k → R^{Nk},
g̃(t̃) = (g(t_1), …, g(t_k)) : T^k → R^{Kk}.
Replace assumptions (b), (c), and (d) of Theorem 11.2.1 by the following, assumed to hold for each t̃ ∈ T^k \ {t̃ ∈ T^k : t_i = t_j for some i ≠ j}:
Then,

E{(N_u)_k} = ∫_{T^k} E{ ∏_{j=1}^k |det ∇f(t_j)| 1_B(g(t_j)) | f̃(t̃) = ũ } p_t̃(ũ) dt̃    (11.5.1)
           = ∫_{T^k} ∫_{R^{kD}} ∏_{j=1}^k |det D_j| 1_B(v_j) p_t̃(ũ, D̃, ṽ) dD̃ dṽ dt̃,
where
Then the field f3 satisfies all the assumptions of Theorem 11.2.1, uniformly over the
parameter set Tδk .
It therefore follows from (11.2.3) and (11.2.4) that E{N_ũ(f̃, g̃ : T_δ^k, B^k)} is given by either of the integrals in (11.5.1), with the outer integrals taken over T_δ^k rather than T^k.
Let δ ↓ 0. Then, using the fact that f = u only finitely often (cf. footnote 8), it is easy to see that with probability one,

N_ũ(f̃, g̃ : T_δ^k, B^k) ↑ (N_u(f, g : T, B))_k.
As for the basic expectation result, Theorem 11.5.1 takes a far simpler form if f
is Gaussian, and we have the following.
Corollary 11.5.2. Let f and g be centered Gaussian fields over a T that satisfies the conditions of Theorem 11.2.1. If for each t̃ = (t_1, …, t_k) ∈ T^k with distinct t_j ∈ T the joint distributions of {(f(t_j), ∇f(t_j), g(t_j))}_{1≤j≤k} are nondegenerate, and if (11.2.5) holds, then so does (11.5.1).
where

Q = Q(θ) = −(1/2) ∑_{i=1}^n ∑_{j=1}^n θ_i E{X_i X_j} θ_j.
where in the last equation, the sequence r1 , . . . , rn−2 does not include the two numbers
k and j .
The moments of various orders can now be obtained by setting θ = 0 in the
equations of (11.6.5). Since from (11.6.4) we have Qj (0) = 0 for all j , the last (and
most general) equation in (11.6.5) thus leads to
E{X_1 ⋯ X_n} = ∑_{k≠j} E{X_{r_1} ⋯ X_{r_{n−2}}} E{X_j X_k}.
From this relationship and the fact that the Xj all have zero mean it is easy to
deduce the validity of (11.6.1) and (11.6.2). It remains only to determine exactly the
number, M say, of terms in the summation (11.6.2). We note first that while there are
(2m)! permutations of X1 , . . . , X2m , since the sum does not include identical terms,
M < (2m)!. Secondly, for each term in the sum, permutations of the m factors result in
identical ways of breaking up the 2m elements. Thirdly, since E{Xj Xk } = E{Xk Xj },
an interchange of the order in such a pair does not yield a new pair. Thus
M (m!) 2^m = (2m)!,

implying

M = (2m)! / (m! 2^m),
as stated in the lemma.
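The count M = (2m)!/(m! 2^m) can be confirmed by brute-force enumeration (our illustration):

```python
from math import factorial

def pairings(elems):
    """All ways of breaking a list of even length into unordered pairs."""
    if not elems:
        return [[]]
    first, rest = elems[0], elems[1:]
    result = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for p in pairings(remaining):
            result.append([(first, partner)] + p)
    return result

for m in range(1, 6):
    M = len(pairings(list(range(2 * m))))
    # (2m)!/(m! 2^m) equals the double factorial (2m-1)!!, which is also
    # E{Z^{2m}} for a standard normal Z, as (11.6.2) predicts with all X_i equal
    assert M == factorial(2 * m) // (factorial(m) * 2 ** m)
print("M = (2m)!/(m! 2^m) confirmed for m = 1, ..., 5")
```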
For the next lemma we need some notation. Let Δ_N be a symmetric N × N matrix with elements Δ_ij such that each Δ_ij is a zero-mean normal variable with arbitrary variance, but such that the following relationship holds:

E{Δ_ij Δ_kl} = E(i, j, k, l) − δ_ij δ_kl,    (11.6.6)

where E(i, j, k, l) is a symmetric function of its arguments and δ_ij is the Kronecker delta. Then, writing |Δ_N| for det Δ_N,

E{|Δ_{2m+1}|} = 0,    (11.6.7)
E{|Δ_{2m}|} = (−1)^m (2m)! / (m! 2^m).    (11.6.8)
Proof. Relation (11.6.7) is immediate from (11.6.1). Now

|Δ_{2m}| = ∑_{p∈P} η(p) Δ_{1i_1} ⋯ Δ_{2m,i_{2m}},

where P is the set of permutations p = (i_1, …, i_{2m}) of (1, …, 2m) and η(p) is the sign of p. Let Q be the set of the (2m)!/(m! 2^m) ways of grouping (i_1, i_2, …, i_{2m}) into pairs without regard to order, keeping them paired with the first index. Then, by (11.6.6),

E{|Δ_{2m}|} = ∑_{p∈P} η(p) ∑_Q {E(1, i_1, 2, i_2) − δ_{1i_1} δ_{2i_2}} × ⋯ × {E(2m−1, i_{2m−1}, 2m, i_{2m}) − δ_{2m−1,i_{2m−1}} δ_{2m,i_{2m}}}.
It is easily seen that all products involving at least one E term will cancel out because of their symmetry property. Hence

E{|Δ_{2m}|} = ∑_{p∈P} η(p) ∑_Q (−1)^m δ_{1i_1} δ_{2i_2} ⋯ δ_{2m−1,i_{2m−1}} δ_{2m,i_{2m}} = (−1)^m (2m)! / (m! 2^m),

the last equality coming from changing the order of summation and then noting that for only one permutation in P is the product of delta functions nonzero. This completes the proof.
the last line coming from changing the order of summation and then noting that for
only one permutation in P is the product of delta functions nonzero. This completes
the proof.
Note that (11.6.8) in no way depends on the specific (co)variances among the elements of Δ_N. These all disappear in the final result due to the symmetry of E.
Before stating the next result we need to introduce the family of Hermite polynomials. The nth Hermite polynomial is the function

H_n(x) = n! ∑_{j=0}^{⌊n/2⌋} (−1)^j x^{n−2j} / (j! (n − 2j)! 2^j),   n ≥ 0, x ∈ R,    (11.6.9)

where ⌊a⌋ denotes the largest integer less than or equal to a. For convenience later, we also define
H_{−1}(x) = √(2π) Ψ(x) e^{x²/2},   x ∈ R,    (11.6.10)

where Ψ is the tail probability function for a standard Gaussian variable (cf. (1.2.1)).
With the normalization inherent in (11.6.9) the Hermite polynomials form an orthogonal (but not orthonormal) system with respect to standard Gaussian measure on R, in that

(1/√(2π)) ∫_R H_n(x) H_m(x) e^{−x²/2} dx = n! if m = n, and 0 if m ≠ n.
H_n(x) = (−1)^n e^{x²/2} (d^n/dx^n) e^{−x²/2},   n ≥ 0.    (11.6.11)
From this it immediately follows that

∫_u^∞ H_n(x) e^{−x²/2} dx = H_{n−1}(u) e^{−u²/2},   n ≥ 0.    (11.6.12)
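Both the orthogonality relation and (11.6.12) can be verified numerically straight from definition (11.6.9); the quadrature grid and tolerances below are our own choices:

```python
import numpy as np
from math import erfc, exp, factorial, pi, sqrt

def H(n, x):
    """Probabilists' Hermite polynomial H_n from (11.6.9); works on scalars
    and numpy arrays alike, for n >= 0."""
    return factorial(n) * sum(
        (-1) ** j * x ** (n - 2 * j) / (factorial(j) * factorial(n - 2 * j) * 2 ** j)
        for j in range(n // 2 + 1))

def H_minus1(x):
    """H_{-1} from (11.6.10), Psi being the standard Gaussian tail."""
    return sqrt(2 * pi) * 0.5 * erfc(x / sqrt(2)) * exp(x * x / 2)

def trap(y, x):
    """Plain trapezoidal rule (avoids NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

xs = np.linspace(-12.0, 12.0, 120001)
w = np.exp(-xs ** 2 / 2)
Hs = [H(n, xs) for n in range(4)]

# orthogonality: (2 pi)^(-1/2) * integral of H_n H_m e^(-x^2/2) = n! * delta_nm
for n in range(4):
    for m in range(4):
        val = trap(Hs[n] * Hs[m] * w, xs) / sqrt(2 * pi)
        assert abs(val - (factorial(n) if n == m else 0.0)) < 1e-5

# (11.6.12): integral over [u, inf) of H_n e^(-x^2/2) = H_(n-1)(u) e^(-u^2/2)
u = 0.5
ys = np.linspace(u, 12.0, 60001)
for n in range(4):
    lhs = trap(H(n, ys) * np.exp(-ys ** 2 / 2), ys)
    rhs = (H_minus1(u) if n == 0 else H(n - 1, u)) * exp(-u * u / 2)
    assert abs(lhs - rhs) < 1e-5

print("orthogonality and (11.6.12) verified for n, m <= 3")
```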
Corollary 11.6.3. Let Δ_N be as in Lemma 11.6.2, with the same assumptions in force. Let I be the N × N unit matrix, and x ∈ R. Then

E{det(Δ_N − xI)} = (−1)^N H_N(x).

Proof. It follows from the usual Laplace expansion of the determinant that

det(Δ_N − xI) = (−1)^N [x^N − S_1(Δ_N)x^{N−1} + S_2(Δ_N)x^{N−2} + ⋯ + (−1)^N S_N(Δ_N)],    (11.6.13)

where S_k(Δ_N) is the sum of the (N choose k) principal minors of order k of Δ_N. The result now follows trivially from (11.6.7) and (11.6.8).
is a symmetric function of i, j, k, l.
Finally, note, as shown in Section 5.4, that f and its first-order derivatives are independent (at any fixed point t), as are the first- and second-order derivatives (from one another). The field and its second-order derivatives are, however, correlated, with E{f(t) f_ij(t)} = −λ_ij.
Finally, denote the N × N Hessian matrix (f_ij) by ∇²f, and recall that the index of a matrix is defined as its number of negative eigenvalues.
Before turning to the proof of the lemma, there are some crucial points worth
noting. The first is the rather surprising fact that the result depends on the covariance
of f only through some of its derivatives at zero, that is, only through the variance
and second-order spectral moments. This is particularly surprising in view of the fact
that the definition of the μk depends quite strongly on the fij , whose distribution
involves fourth-order spectral moments.
As will become clear from the proof, the disappearance of the fourth-order spectral
moments has a lot to do with the fact that we compute the mean of the alternating
sum in (11.7.6) and do not attempt to evaluate the expectations of the individual μk .
Doing so would indeed involve fourth-order spectral moments. As we shall see in
later chapters, the fact that this is all we need is extremely fortunate, for it is actually
impossible to obtain closed expressions for any of the E{μk }.
Note that \det Q = (\det \Lambda)^{-1/2}. Now take the transformation of R^N given by t \mapsto tQ^{-1}, under which T \to T^Q = \{\tau : \tau = tQ^{-1} \text{ for some } t \in T\}, and define f^Q : T^Q \to \mathbb{R} by
f^Q(t) = f(tQ).
The new process f^Q has covariance function
11.7 The Mean Euler Characteristic 291
T = \prod_{i=1}^{N} [0, T_i],
which has 2^{N-k} \binom{N}{k} faces of dimension k. As opposed to our previous conventions, we
take these faces as closed. Thus, all faces in J_k are subsets of some face in J_{k'} for
all k' > k. (For example, J_N contains only T itself, while J_0 contains the 2^N vertices of T.)
We need one more piece of notation. Take J ∈ J_k. With the λ_{ij} being the spectral
moments of (11.7.1), write \Lambda_J for the k × k matrix with elements \lambda_{ij}, i, j ∈ σ(J).
This is enough to state the following result.
Theorem 11.7.2. Let f be as described at the beginning of this section and T as
above. For real u, let Au = Au (f, T ) = {t ∈ T : f (t) ≥ u} be an excursion set,
and let ϕ be the Euler characteristic. Then
E\{\varphi(A_u)\} = e^{-u^2/2\sigma^2} \sum_{k=1}^{N} \sum_{J \in O_k} \frac{|J|\,|\Lambda_J|^{1/2}}{(2\pi)^{(k+1)/2}\,\sigma^k}\, H_{k-1}\!\Big(\frac{u}{\sigma}\Big) + \Psi\!\Big(\frac{u}{\sigma}\Big).   (11.7.11)
The simplification follows immediately from the spherical symmetry of the spec-
tral measure in this case, which (cf. (5.7.3)) implies that each matrix \Lambda_J is equal to
\lambda_2 I. In fact, looking back into the proof of Lemma 11.7.1, which is where most of
the calculation occurs, you can see that the transformation to the process f^Q is now
rather trivial, since Q = \lambda_2^{-1/2} I (cf. (11.7.7)). Looked at in this light, it is clear that
one of the key points of the proof was a transformation that made the first derivatives
of f behave as if they were those of an isotropic process. We shall see this again, but
at a far more sophisticated level, when we turn to the manifold setting.
Now consider the case N = 1, so that T is simply the interval [0, T ]. Then, using
the definition of the Hermite polynomials given by (11.6.9), it is trivial to check that
E\{\varphi(A_u(f, [0,T]))\} = \Psi\!\Big(\frac{u}{\sigma}\Big) + \frac{T\,\lambda_2^{1/2}}{2\pi\sigma}\, e^{-u^2/2\sigma^2},   (11.7.13)
so that we have recovered (11.1.6) and with it, the Rice formula. Figure 11.7.1 gives
two examples, with σ 2 = 1, λ2 = 200, and λ2 = 1,000.
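The Rice formula (11.7.13) can also be checked by simulation. The sketch below (our own construction, not from the text) uses a stationary random trigonometric sum, for which σ² and λ₂ are known in closed form, and compares the empirical mean Euler characteristic of one-dimensional excursion sets with (11.7.13):

```python
import numpy as np
from math import erfc, exp, pi, sqrt

rng = np.random.default_rng(42)

# Stationary Gaussian process as a random trigonometric sum:
#   f(t) = sum_k a_k cos(w_k t) + b_k sin(w_k t),  a_k, b_k ~ N(0, s2_k) independent,
# for which sigma^2 = sum_k s2_k and lambda_2 = sum_k s2_k w_k^2 exactly.
w = np.array([1.0, 3.0, 7.0])       # hypothetical frequencies
s2 = np.array([0.5, 0.3, 0.2])      # hypothetical component variances
sigma2, lam2 = float(s2.sum()), float((s2 * w ** 2).sum())

T, u = 2.0, 0.8
t = np.linspace(0.0, T, 4001)
cos_wt, sin_wt = np.cos(np.outer(w, t)), np.sin(np.outer(w, t))

def euler_char_1d(path, u):
    """EC of {t : path(t) >= u} in 1D: number of upcrossings, plus 1 if f(0) >= u."""
    above = path >= u
    return int(np.sum(~above[:-1] & above[1:])) + int(above[0])

ecs = []
for _ in range(4000):
    a = rng.normal(0.0, np.sqrt(s2))
    b = rng.normal(0.0, np.sqrt(s2))
    ecs.append(euler_char_1d(a @ cos_wt + b @ sin_wt, u))

sigma = sqrt(sigma2)
rice = 0.5 * erfc(u / (sigma * sqrt(2))) \
    + T * sqrt(lam2) / (2 * pi * sigma) * exp(-u ** 2 / (2 * sigma2))
assert abs(np.mean(ecs) - rice) / rice < 0.1
```

With these (hypothetical) spectral parameters, σ² = 1 and λ₂ = 13, and the empirical mean should fall within a few percent of the formula's value of roughly 1.05.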
Note from (11.7.13) that as u → −∞, we have E{ϕ(Au )} → 1. The excursion
set geometry behind this is simple. Once u < inf T f (t), we have Au ≡ T , and
so ϕ(Au ) = ϕ(T ), which in the current case is 1. This is, of course, a general
phenomenon, independent of dimension or the topology of T .
294 11 Random Fields on Euclidean Spaces
To see this analytically, simply look at the expression (11.7.13), or even (11.7.11)
for general rectangles. In both cases it is trivial that as u → −∞ all terms other than
\Psi(u/\sigma) disappear, while \Psi(u/\sigma) \to 1.
It thus seems not unreasonable to expect that when we turn to a more general
theory (i.e., for T a piecewise smooth manifold) the term corresponding to the last
term in (11.7.11) might be \varphi(T)\,\Psi(u/\sigma). That this is in fact the case can be seen from
the far more general results of Section 12.4 below.
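As a quick numeric illustration (our own sketch, using the σ² = 1, λ₂ = 200 values of Figure 11.7.1 and a hypothetical T = 1):

```python
from math import erfc, exp, pi, sqrt

sigma, lam2, T = 1.0, 200.0, 1.0  # T = 1 is a hypothetical choice

def mean_ec(u):
    """Right-hand side of (11.7.13)."""
    tail = 0.5 * erfc(u / (sigma * sqrt(2)))        # Psi(u / sigma)
    return tail + T * sqrt(lam2) / (2 * pi * sigma) * exp(-u * u / (2 * sigma ** 2))

assert abs(mean_ec(-8.0) - 1.0) < 1e-6   # E{EC} -> phi([0, T]) = 1 as u -> -infinity
assert mean_ec(12.0) < 1e-6              # and -> 0 as u -> +infinity
```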
We now turn to two dimensions, in which case the right-hand side of (11.7.12)
becomes, for σ 2 = 1,
e^{-u^2/2} \left[ \frac{T^2 \lambda_2}{(2\pi)^{3/2}}\, u + \frac{2T\,\lambda_2^{1/2}}{2\pi} \right] + \Psi(u).   (11.7.14)
Figure 11.7.2 gives two examples, again with λ2 = 200 and λ2 = 1,000.
Many of the comments that we made for the one-dimensional case have similar
analogues here and we leave them to you. Nevertheless, we emphasize three points:
(i) You should note, for later reference, how the expression before the exponential
term can be thought of as one of a number of different power series: one in T,
one in u, and one in \lambda_2^{1/2}.
(ii) The geometric meaning of the negative values of (11.7.14) is worth under-
standing. They are due to the excursion sets having, in the mean, more holes
than connected components for (most) negative values of u.
(iii) The impact of the spectral moments is not quite as clear in higher dimensions
as it is in one. Nevertheless, to get a feel for what is happening, look back at
the simulation of a Brownian sheet in Figure 1.4.1. The Brownian sheet is, of
course, both nonstationary and nondifferentiable, and so hardly belongs in our
current setting. Nevertheless, in a finite simulation, it is impossible to “see’’ the
difference between nondifferentiability and large second spectral moments,13 so
consider the simulation in the latter light. You can then see what is happening.
Large spectral moments again lead to local fluctuations generating large numbers
of small islands (or lakes, depending on the level at which the excursion set is
taken), and this leads to larger variation in the values of E{ϕ(Au )}.
In three dimensions, the last case that we write out, (11.7.12) becomes, for σ 2 = 1,
e^{-u^2/2} \left[ \frac{T^3 \lambda_2^{3/2}}{(2\pi)^2}\, u^2 + \frac{3T^2 \lambda_2}{(2\pi)^{3/2}}\, u + \frac{3T\,\lambda_2^{1/2}}{2\pi} - \frac{T^3 \lambda_2^{3/2}}{(2\pi)^2} \right] + \Psi(u).
Note that once again there are a number of different power series appearing here,
although now, as opposed to the two-dimensional case, there is no longer a simple
correspondence between the powers of T, \lambda_2^{1/2}, and u.
The two positive peaks of the curve are due to Au being primarily composed of a
number of simply connected components for large u and primarily of simple holes for
13 Ignore the nonstationarity of the Brownian sheet, since this has no qualitative impact on the
discussion.
negative u. (Recall that in the three-dimensional case the Euler characteristic of a set
is given by the number of components minus the number of handles plus the number
of holes.) The negative values of E{ϕ(Au )} for u near zero are due to the fact that
Au , at those levels, is composed mainly of a number of interconnected, tubular-like
regions, i.e., of handles.
An example14 is given in Figure 11.7.4 that shows an impression of typical ex-
cursion sets of a function on I 3 above high, medium, and low levels.
Fig. 11.7.4. Three-dimensional excursion sets above high, medium, and low levels. For obvious
reasons, astrophysicists refer to these three cases (from left to right) as “meatball,’’ “sponge,’’
and “bubble’’ topologies.
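Both the two- and three-dimensional expansions above, and the one-dimensional (11.7.13), are assembled from the same ingredients: with σ² = 1 and Λ_J = λ₂I, the faces of dimension k contribute C(N, k) T^k λ₂^{k/2} times a Hermite factor. A numeric cross-check (our own code; `hermite`, `rho`, and `mean_ec` are our names, not the text's):

```python
from math import comb, erfc, exp, factorial, pi, sqrt

def hermite(n, x):
    """H_n(x) from (11.6.9)."""
    return factorial(n) * sum(
        (-1) ** j * x ** (n - 2 * j) / (factorial(j) * factorial(n - 2 * j) * 2 ** j)
        for j in range(n // 2 + 1))

def rho(k, u):
    """EC densities rho_k(u) = (2 pi)^{-(k+1)/2} H_{k-1}(u) e^{-u^2/2}; rho_0(u) = Psi(u)."""
    if k == 0:
        return 0.5 * erfc(u / sqrt(2))
    return (2 * pi) ** (-(k + 1) / 2) * hermite(k - 1, u) * exp(-u ** 2 / 2)

def mean_ec(N, T, lam2, u):
    """Sum over face dimensions of C(N, k) T^k lam2^{k/2} rho_k(u) (isotropic, sigma^2 = 1)."""
    return sum(comb(N, k) * T ** k * lam2 ** (k / 2) * rho(k, u) for k in range(N + 1))

T, lam2 = 1.5, 200.0
for u in (-2.0, -0.5, 0.0, 1.0, 3.0):
    Psi = 0.5 * erfc(u / sqrt(2))
    two_d = exp(-u ** 2 / 2) * (T ** 2 * lam2 * u / (2 * pi) ** 1.5
                                + 2 * T * sqrt(lam2) / (2 * pi)) + Psi
    three_d = exp(-u ** 2 / 2) * (T ** 3 * lam2 ** 1.5 * u ** 2 / (2 * pi) ** 2
                                  + 3 * T ** 2 * lam2 * u / (2 * pi) ** 1.5
                                  + 3 * T * sqrt(lam2) / (2 * pi)
                                  - T ** 3 * lam2 ** 1.5 / (2 * pi) ** 2) + Psi
    assert abs(mean_ec(2, T, lam2, u) - two_d) < 1e-9
    assert abs(mean_ec(3, T, lam2, u) - three_d) < 1e-9
```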
In terms of Lipschitz–Killing curvatures, the results above can be summarized as
E\{\varphi(A_u(f,T))\} = \sum_{k=0}^{N} L_k(T)\, \rho_k(u),   (11.7.15)
where
\rho_k(u) = (2\pi)^{-(k+1)/2}\, H_{k-1}(u)\, e^{-u^2/2}, \qquad k \ge 0,
since H_{-1}(u) = \sqrt{2\pi}\, \Psi(u)\, e^{u^2/2} (cf. (11.6.10)).
In fact, the above holds also when T is an N-dimensional, piecewise smooth
manifold, and f has constant variance (i.e., there are no assumptions of isotropy,
or even stationarity). The Lipschitz–Killing curvatures, however, will be somewhat
more complex, depending on a Riemannian metric related to f . This will be the
content of Sections 12.2 and 12.4 of the next chapter. Now, however, we (finally)
turn to the following proof.
f(t) \ge u,   (11.7.17)
f_j(t) = 0, \qquad j \in \sigma(J),   (11.7.18)
\varepsilon_j^* f_j(t) \ge 0, \qquad j \notin \sigma(J),   (11.7.19)
I(t) = \mathrm{index}\big((f_{mn}(t))_{m,n \in \sigma(J)}\big) = k - i,   (11.7.20)
where, as usual, the index of a matrix is the number of its negative eigenvalues.
Our first and main step will hinge on stationarity, which we exploit to replace
the expectation of the sum over J ∈ Jk in (11.7.16) by something simpler.16 Fix a
particular J ∈ Ok —i.e., a face containing the origin—and let P(J ) denote all faces
in Jk (including J itself) that are affine translates of (i.e., parallel to) J . There are
2^{N-k} faces in each such P(J). We can then rewrite the right-hand side of (11.7.16) as
\sum_{k=0}^{N} \sum_{i=0}^{k} (-1)^i \sum_{J \in O_k} \sum_{J' \in P(J)} \mu_i(J').   (11.7.21)
Consider the expectation of the innermost sum. By Theorem 11.2.1 (cf. (11.2.4)) we
can rewrite this as
\sum_{J' \in P(J)} \int_{J'} E\big\{|\det \nabla^2 f_{|J}(t)|\, 1_{\{I(t) = k-i\}}\, 1_{E_{J'}}(t) \,\big|\, \nabla f_{|J}(t) = 0\big\}\, p_{\nabla f_{|J}}(0)\, dt,
where E_{J'}(t) denotes the event that (11.7.17) and (11.7.19) hold.
Further simplification requires one more item of notation. For J ∈ O_k, let P^{\varepsilon}(J)
denote the collection of all sequences \{\varepsilon_j,\ j \notin \sigma(J)\} of ±1's. The elements of
P^{\varepsilon}(J) can be identified with the sequences \varepsilon^*(J') with J' ∈ P(J).
15 “Recall’’ includes extending the results there from cubes to rectangles and changing the
order of summation, but both these steps are trivial.
16 Our approach here is not unlike the first-principles geometry used to prove that
L_j([0,T]^N) = \binom{N}{j}\, T^j.
With this notation, and calling on stationarity, we can replace the last expression by
\sum_{\varepsilon^* \in P^{\varepsilon}(J)} \int_J E\big\{|\det \nabla^2 f_{|J}(t)|\, 1_{\{I(t) = k-i\}}\, 1_{E_{\varepsilon^*}}(t) \,\big|\, \nabla f_{|J}(t) = 0\big\}\, p_{\nabla f_{|J}}(0)\, dt,
where Eε∗ (t) denotes the event that (11.7.17) and (11.7.19) hold for ε ∗ .
Now note the trivial fact that for any J,
\bigcup_{\varepsilon^* \in P^{\varepsilon}(J)} \bigcap_{j \notin \sigma(J)} \{\varepsilon_j^* f_j(t) \ge 0\}
is the sure event, i.e., has probability one. Applying this to the last sum, we see that
it simplifies considerably to
\int_J E\big\{|\det \nabla^2 f_{|J}(t)|\, 1_{\{I(t) = k-i\}}\, 1_{\{f(t) \ge u\}} \,\big|\, \nabla f_{|J}(t) = 0\big\}\, p_{\nabla f_{|J}}(0)\, dt.
Going back to Theorem 11.2.1, we have that this is no more than the expected number
of points t ∈ J for which f(t) ≥ u, f_j(t) = 0 for all j ∈ σ(J), and I(t) = k − i.
If we call the number of points satisfying these conditions μk−i (J ), then putting
all of the above together and substituting into (11.7.16) we see that we need the
expectation of
\sum_{k=0}^{N} \sum_{J \in O_k} (-1)^k \sum_{i=0}^{k} (-1)^{k-i}\, \mu_{k-i}(J).
Lemma 11.7.1 gives us a precise expression for the expectation of the innermost
sum, at least for k ≥ 1, namely
\frac{|J|\, |\Lambda_J|^{1/2}}{(2\pi)^{(k+1)/2}\, \sigma^k}\, H_{k-1}\!\Big(\frac{u}{\sigma}\Big)\, e^{-u^2/2\sigma^2}.
It is left to you to check that for the case k = 0 (i.e., the contribution of J_0, which
contains the 2^N vertices of T), the remaining term is given by \Psi(u/\sigma). Putting all
this together immediately gives (11.7.11), and so the proof is complete.
So far, this entire chapter has concentrated on finding the mean value of the Euler
characteristic of Gaussian excursion sets. However, the Euler characteristic is only
one of N + 1 intrinsic volumes, and it would be useful to know how the remaining
N behave. We shall prove, in the setting of manifolds, in Chapter 13, that
11.9 On the Importance of Stationarity 299
E\big\{L_j(A_u(f,T))\big\} = \sum_{l=0}^{N-j} \begin{bmatrix} j+l \\ l \end{bmatrix} \rho_l(u)\, L_{j+l}(T),   (11.8.1)
where \begin{bmatrix} n \\ k \end{bmatrix} is the flag coefficient defined by (6.3.12).
As noted above, the proof appears only after we have moved to the manifold
setting. Nevertheless, the proof of (11.8.1) in the case of an isotropic field over an
N-dimensional rectangle requires none of the manifold material, and so, if you wish,
you can read it now in Section 13.2.
In essence, this chapter will repeat, for random fields on manifolds, what we have
already achieved in the Euclidean setting.
As there, our first step will be to set up a “metatheorem’’ for computing the mean
number of points at which a random field takes a certain value under specific side
conditions. This turns out to be rather easy to do, involving little more than taking the
Euclidean result and applying it, chart by chart, to the manifold. This is the content
of the first section.
Actually computing the resulting expression for special cases—such as finding the
mean Euler characteristic of excursion sets over Whitney stratified manifolds—turns
out to be somewhat more complicated. We start the computation in Section 12.2,
where, for the second time,1 we shall begin to see why we need our heavy investment
in the Riemannian geometry of Chapter 7. The main contribution of this section will
be to take the parameter space on which our Gaussian process is defined, endow it
with a Riemannian metric induced by the process, and study this metric a little.
Section 12.3 is devoted to some general Gaussian computations, including the
moment formulas for random Gaussian forms that we used when looking at volume-
of-tube formulas. The real work is done in Section 12.4, which puts all the preceding
material together to develop mean Euler characteristic formulas.
The strength of the general approach is shown by the examples of Section 12.5,
and, as we have already mentioned, we indulge ourselves somewhat with a fun proof
of the Chern–Gauss–Bonnet theorem in Section 12.6.
To formulate the metatheorem for manifolds, we need one small piece of notation.
1 The first was in the volume-of-tubes approximation of Chapter 10, which was something of
an aside. It is in the current chapter that we shall see the real need for Riemannian geometry
in a general setting, and not just geometry of RN or S(RN ) with the usual Euclidean metrics.
302 12 Random Fields on Manifolds
Let
N_u \equiv N_u(M) \equiv N_u(f, h; M, B)
denote the number of points t ∈ M at which f(t) = u and h(t) ∈ B.
Theorem 12.1.1. Assume that the following conditions are satisfied for some orthonormal frame field E:
(a) All components of f , ∇fE , and h are a.s. continuous and have finite variances
(over M).
(b) For all t ∈ M, the marginal densities pt (x) of f (t) (implicitly assumed to exist)
are continuous at x = u.
(c) The conditional densities pt (x|∇fE (t), h(t)) of f (t) given h(t) and (∇fE )(t)
(implicitly assumed to exist) are bounded above and continuous at x = u, uni-
formly in t ∈ M.
(d) The conditional densities p_t(z \mid f(t) = x) of \det(\nabla f^j_{E_i}(t)) given f(t) = x are continuous for all z and for x in a neighborhood of u, uniformly in t ∈ M.
(g) The moduli of continuity with respect to the (canonical) metric induced by g (cf.
(7.3.1)) of each component of h, each component of f, and each \nabla f^j_{E_i} all satisfy,
for any ε > 0,
P\{\omega(\eta) > \varepsilon\} = o(\eta^N) \quad \text{as } \eta \downarrow 0.   (12.1.3)
Then
12.1 The Metatheorem on Manifolds 303
E\{N_u\} = \int_M E\big\{|\det(\nabla f_E)|\, 1_B(h) \,\big|\, f = u\big\}\, p(u)\, \mathrm{Vol}_g,   (12.1.4)
where p is the density^2 of f and \mathrm{Vol}_g the volume element on M induced by the
metric g.
Before turning to the proof of the theorem, there are a few points worth noting. The
first is that the conditions of the theorem do not depend on the choice of orthonormal
frame field. Indeed, as soon as they hold for one such choice, not only will they hold
for all orthonormal frame fields but also for any bounded vector field X. In the latter
case the notation will change slightly, and \nabla f^j_{E_i} needs to be replaced by (Xf^j)_i.
Once this is noted, you should note that the only place that the metric g appears in
the conditions is in the definition of the neighborhoods Bτ (t, h) in the final condition.
A quick check of the proof to come will show that any neighborhood system will
in fact suffice. Thus the metric does not really play a role in the conditions beyond
convenience.
Furthermore, the definition of the random variable Nu is totally unrelated to the
metric. From this it follows that the same must be true of its expectation. Conse-
quently, although we require a metric to be able to define the integration in (12.1.4),
the final expression must actually yield a result that is independent of the choice of
g and so be a function only of the “physical’’ manifold and the distribution of f .
However, choosing an appropriate g can greatly simplify the calculation.
Proof. Since M is compact it has a finite atlas. Let (U, ϕ) be one of its charts and
consider the random fields f¯ : ϕ(U ) ⊂ RN → RN and h̄ : ϕ(U ) ⊂ RN → RK
defined by
f¯ = f ◦ ϕ −1 , h̄ = h ◦ ϕ −1 .
and so the expectations of both of these random variables are also identical.
Recall the comments made prior to the proof: All conditions in the theorem that
involve the orthonormal frame field E also hold for any other bounded vector field on
U ⊂ M. In particular, they hold for the natural coordinate vector field {∂/∂xi }1≤i≤N
determined by ϕ.
2 Of course, what is implicit here is that for each t ∈ M, we should really write p as p_t, since
it is the density of f_t. There are also a number of additional places in (12.1.4) where we
could append a t, but since it has been our habit to drop the subscript when working in the
setting of manifolds, we leave it out here as well.
Note that it is implicitly assumed that the integrand in (12.1.4) is a well-defined N-
form on M, or, equivalently, that the expectation term is a well-defined Radon–Nikodym
derivative. That this is the case will follow from the proof.
where ∂/∂xi is the push-forward under ϕ −1 of the natural basis on ϕ(U ). Together
with the definition of integration of differential forms in Section 7.4 this gives us that
The next step involves moving from the natural basis on U to the basis given by
the orthonormal frame field E. Doing so generates two multiplicative factors, which
fortunately cancel. The first comes from the move from the form ∂x1 ∧ · · · ∧ ∂xN
to the volume form Volg , and generates a factor of (det(gij ))−1/2 , where gij (t) =
gt (∂/∂xi , ∂/∂xj ) (cf. (7.4.9)).
The second factor comes from noting that
\frac{\partial}{\partial x_i} f^j = \sum_k g\Big(\frac{\partial}{\partial x_i},\, E_k\Big)\, E_k f^j = \sum_k g^{1/2}_{ik}\, E_k f^j,
where g^{1/2} = \big(g(E_i, \partial/\partial x_j)\big)_{1 \le i,j \le N} is a square root of the matrix g = (g_{ij})_{1 \le i,j \le N}.
Consequently,
\det\big((\partial/\partial x_i) f^j\big) = \big(\det(g_{ij})\big)^{1/2}\, \det(\nabla f_E).
3 The only condition that needs any checking is (11.2.2) on the moduli of continuity. It is
here that the requirement that g be C 1 over M comes into play. The details are left to you.
12.2 Riemannian Structure Induced by Gaussian Fields 305
E\{N_u(U)\} = \int_U E\big\{|\det(\nabla f_E)|\, 1_B(h) \,\big|\, f = u\big\}\, p(u)\, \mathrm{Vol}_g,   (12.1.5)
Ultimately, we shall apply the above corollary to obtain, among other things,
an expression for the expected Euler characteristic of Gaussian excursion sets over
manifolds. Firstly, however, we need to set up some machinery.
Up until now, all our work with Riemannian manifolds has involved a general
Riemannian metric g. Using this, back in Section 7.5 we developed a number of
concepts, starting with connections and leading up to curvature tensors and shape
operators, in corresponding generality.
For our purposes, however, it will turn out that for each random field f on a
piecewise C 2 manifold M, there is only one Riemannian metric that we shall need.
It is induced by the random field f , which we shall assume has zero mean and, with
probability 1, is C 2 over M. It is defined by
gt (Xt , Yt ) = E{(Xt f ) · (Yt f )}, (12.2.1)
We shall call g the metric induced by the random field4 f . The fact that this
definition actually gives a Riemannian metric follows immediately from the positive
semidefiniteness of covariance functions.
Note that at this stage, there is nothing in the definition of the induced metric
that relies on f being Gaussian.5 The definition holds for any C 2 random field.
Furthermore, there are no demands related to stationarity, isotropy, etc.
One way to develop some intuition for this metric is via the geodesic metric τ
that it induces on M. Since τ is given by
\tau(s,t) = \inf_{c \in D^1([0,1];M)(s,t)} \int_{[0,1]} \big[g_{c(t)}\big(c'(t), c'(t)\big)\big]^{1/2}\, dt   (12.2.3)
(cf. (7.3.1)), it follows that the geodesic between two points on M is the curve along
which the expected variance of the derivative of f is minimized.
It is obvious that g is closely related to the covariance function C(s,t) = E\{f_s f_t\}
of f. In particular, it follows from (12.2.1) that
g_t(X_t, Y_t) = X_s Y_t\, C(s,t)\big|_{s=t}.   (12.2.2)
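For a concrete covariance function, the induced metric can be evaluated numerically as the mixed second derivative of C(s, t) at s = t. A sketch (NumPy assumed; the squared-exponential kernel is a hypothetical stand-in for a C² covariance):

```python
import numpy as np

def C(r, s):
    """A hypothetical smooth covariance on R^2: C(r, s) = exp(-|r - s|^2 / 2)."""
    d = np.asarray(r, float) - np.asarray(s, float)
    return np.exp(-0.5 * np.dot(d, d))

def induced_metric(cov, t, h=1e-4):
    """g_ij(t) = d^2 cov(r, s) / dr_i ds_j at r = s = t, by central differences."""
    t = np.asarray(t, float)
    n = len(t)
    g = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            g[i, j] = (cov(t + ei, t + ej) - cov(t + ei, t - ej)
                       - cov(t - ei, t + ej) + cov(t - ei, t - ej)) / (4 * h * h)
    return g

g = induced_metric(C, [0.3, -1.2])
assert np.allclose(g, np.eye(2), atol=1e-6)    # for this kernel, g = I (lambda_2 = 1)
assert np.all(np.linalg.eigvalsh((g + g.T) / 2) > 0)  # positive definite, as a metric must be
```

For this kernel the induced metric is the Euclidean one; for an anisotropic or nonstationary kernel the same code returns a position-dependent g.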
Our first step is to describe the Levi-Civita connection ∇ determined by the induced
metric g. Recall from Chapter 7 that the connection is uniquely determined by
Koszul's formula,
2g(\nabla_X Y, Z) = Xg(Y,Z) + Yg(X,Z) - Zg(X,Y) + g([X,Y], Z) - g([X,Z], Y) - g([Y,Z], X).
In order also to write R in terms of covariances, we recall (cf. (7.3.22)) the covariant
Hessian of a C^2 function f:
\nabla^2 f(X,Y) = XYf - \nabla_X Y f.   (12.2.7)
It follows from the fact that ∇ is torsion-free (cf. (7.3.10)) that \nabla^2 f(X,Y) =
\nabla^2 f(Y,X), and so \nabla^2 f is a symmetric form.
With this definition, we now prove the following useful result, which relates the
curvature tensor R to covariances6 and is crucial for later computations.
Lemma 12.2.1. If f is a zero-mean, C 2 random field on a C 3 Riemannian manifold
equipped with the metric induced by f , then the curvature tensor R on M is given by
-2R = E\big\{(\nabla^2 f)^2\big\},   (12.2.8)
where the square of the Hessian is to be understood in terms of the dot product of
tensors developed at (7.2.4).
Proof. Note that for C^1 vector fields it follows from the definition^7 (12.2.7) that
(\nabla^2 f)^2\big((X,Y),(Z,W)\big) = 2\big[\nabla^2 f(X,Z)\,\nabla^2 f(Y,W) - \nabla^2 f(X,W)\,\nabla^2 f(Y,Z)\big].
Take expectations of this expression and exploit (12.2.6) to check (after a little alge-
bra) that
E\big\{(\nabla^2 f)^2((X,Y),(Z,W))\big\} = 2\big(E[XZf \cdot YWf] - g(\nabla_X Z, \nabla_Y W) - E[XWf \cdot YZf] + g(\nabla_X W, \nabla_Y Z)\big).
Now apply (7.3.11) along with (12.2.6) to see that the last expression is equal to
2\big( XE[Zf \cdot YWf] - E[Zf \cdot XYWf] - g(\nabla_X Z, \nabla_Y W) - YE[XWf \cdot Zf] + E[Zf \cdot YXWf] + g(\nabla_X W, \nabla_Y Z) \big)
= 2\big( Xg(Z, \nabla_Y W) - g(\nabla_X Z, \nabla_Y W) - g(Z, \nabla_{[X,Y]} W) - Yg(\nabla_X W, Z) + g(\nabla_X W, \nabla_Y Z) \big)
= 2\big( g(Z, \nabla_X \nabla_Y W) - g(\nabla_Y \nabla_X W, Z) - g(Z, \nabla_{[X,Y]} W) \big)
= 2R\big((X,Y), (W,Z)\big)
= -2R\big((X,Y), (Z,W)\big),
the first equality following from the definition of the Lie bracket, the second from
(7.3.11), the third from the definition of the curvature tensor R, and the last is trivial.
This establishes8 (12.2.8), which is what we were after.
Many of the Euclidean computations of Section 11.7 were made possible as a result
of convenient independence relationships between f and its first- and second-order
derivatives. The independence of f and ∇f followed from the fact that f had
constant variance, while that of ∇f and the matrix ∇ 2 f followed from stationarity.
Computations were further simplified by a global transformation (cf. (11.7.7)) that
transformed f to being isotropic.
While we shall continue to assume that f has constant variance, we no longer
can assume stationarity nor find easy transformations to isotropy. However, we have
invested considerable effort in setting up the geometry of our parameter space with
the metric induced by f , and now we are about to start profiting from this. We start
with some general computations, which require no specific assumptions.
We start with the variance function
\sigma_t^2 = E\{f_t^2\}.
8 If you are a stickler for detail, you may have noticed that since our assumptions require
only that f be C 2 , it is not at all clear that the terms XY Wf and Y XWf appearing in the
derivation make sense. However, their difference, [X, Y ]Wf , is well defined, and that is
all we have really used.
Assuming, as usual, that f ∈ C^2(M), we also have that σ^2 ∈ C^2(M), in which case
there are no problems in changing the order of differentiation and expectation to see
that, for C^1 vector fields X and Y,
X\sigma^2 = XE\{f^2\} = 2E\{f \cdot Xf\},   (12.2.9)
XY\sigma^2 = 2\big(E\{XYf \cdot f\} + E\{Xf \cdot Yf\}\big),
and
\nabla^2\sigma^2(X,Y) = 2\big(E\{\nabla^2 f(X,Y) \cdot f\} + g(X,Y)\big),
by (12.2.6). You should note that this result requires no assumptions whatsoever. It is
an immediate consequence of the geometry that f induces on M via the induced metric
and the fact that the covariant Hessian ∇ 2 incorporates this metric in its definition.
Putting all the above together gives that
E\big\{\nabla^2 f_t \,\big|\, f_t = x,\ \nabla f_t = v\big\} = -\frac{x}{\sigma_t^2}\, I + \frac{x}{2\sigma_t^2}\, \nabla^2 \sigma_t^2,
where I is the identity double form determined by g, defined in (7.2.10).
Assume now that f has constant variance, which we take to be 1. Then X\sigma^2 \equiv 0,
and the last equality simplifies to give
E\big\{\nabla^2 f_t \,\big|\, \nabla f_t = v,\ f_t = x\big\} = -xI.   (12.2.12)
Now might be a good time to take some time off to look at a few examples.
An extremely important example, which can be treated in detail without too much
pain, is given by the differential structure induced on a compact domain M in RN
by a zero-mean, C 2 Gaussian field. For the moment we shall assume that M has a
C 2 boundary, although at the end of the discussion we shall also treat the piecewise
C 2 case.
We shall show how to explicitly compute both the curvature tensor R and the
shape operator S in terms of the covariance function C, as well as traces of their
powers. We shall also discuss the volume form Volg .
We shall not, in general, assume that f is either stationary or isotropic. In fact,
one of the strengths of the manifold approach is that it handles the nonstationary case
almost as easily as the stationary one.
The basis for our computations is Section 7.7, where we saw how to compute
what we need after starting with a convenient basis. Not surprisingly, we start with
{Ei }1≤i≤N , the standard coordinate vector fields on RN . This also gives the natural
basis in the global chart (RN , i), where i is the inclusion map. We give RN the metric
g induced by f .
From Section 7.7 we know that, as far as the curvature operator is concerned,
everything depends on two sets of functions, the covariances
g_{ij}(t) = g(E_{ti}, E_{tj}) = \frac{\partial^2 C(r,s)}{\partial r_i\, \partial s_j}\bigg|_{(t,t)}   (12.2.16)
and so the curvature tensor, its powers, and their traces are identically zero. As a
consequence, most of the complicated formulas of Section 7.7 simply disappear. The
isotropic situation is, of course, simpler still, since then
g_{ij}(t) \equiv \lambda_2\, \delta_{ij}.
only need to know how to compute the \gamma^*_{jiN} of (7.7.13). While this is not always
easy, if f is stationary and isotropic, then as for the curvature tensor things do sim-
plify somewhat. In particular, if it is possible to explicitly determine functions a_{ij}
such that
E^*_{it} = \sum_{k=1}^{N} a_{ik}(t)\, E_{kt},
then
\gamma^*_{jiN}(t) = \lambda_2 \sum_{k,l=1}^{N} a_{jk}(t)\, \frac{\partial a_{Nl}(t)}{\partial t_k}\, a_{il}(t),   (12.2.20)
so that the summation has dropped one dimension. Far more significant, however,
are the facts that the information about the Riemannian structure of (M, g) is now
summarized in the single parameter λ2 and that this information has been isolated
from the “physical’’ structure of ∂M inherent in the functions aik .
In fact, this can also be seen directly from the definition of the shape operator.
From (12.2.19) it is also easy to check that for any vectors X, Y ,
g(X, Y) = \lambda_2\, \langle X, Y \rangle,
where the right-hand side denotes the Euclidean inner product of X and Y. Consequently, writing S^g for the shape operator under the induced Gaussian metric and S^E
for the standard Euclidean one, we have
S^g = \lambda_2^{-1/2}\, S^E.
At the core of the calculation of the expected Euler characteristic in the Euclidean
case were the results of Lemma 11.6.2 and Corollary 11.6.3 about mean values of
determinants of Gaussian matrices. In the manifold case we shall need a somewhat
more general result.
To start, recall the discussion following (7.2.6). If we view an N × N matrix \Delta
as representing a linear mapping T from R^N to R^N, with \Delta_{ij} = \langle e_i, T e_j \rangle, then \Delta
can also be represented as a double form \gamma \in \Lambda^{1,1}(\mathbb{R}^N), and from the discussion in
Section 7.1,
\det \Delta = \frac{1}{N!}\, \mathrm{Tr}(\gamma^N).   (12.3.1)
Thus it should not be surprising that we now turn our attention to the expectations
of traces of random double forms, for which we need a little notation.
12.3 Another Gaussian Computation 313
and covariances
E\big\{\big(W(v_{i_1}, v_{i_2}) - \mu(v_{i_1}, v_{i_2})\big) \cdot \big(W(v_{j_1}, v_{j_2}) - \mu(v_{j_1}, v_{j_2})\big)\big\} = C\big((v_{i_1}, v_{i_2}),\, (v_{j_1}, v_{j_2})\big).
Lemma 12.3.1. With the notation and conditions described above, and understanding
all powers and products of double forms as being with respect to the double wedge
product of (7.2.4),
E\{W^k\} = \sum_{j=0}^{\lfloor k/2 \rfloor} \frac{k!}{(k-2j)!\, j!\, 2^j}\, \mu^{k-2j} C^j,   (12.3.2)
that is,
E\big\{W^k\big((v_1, \ldots, v_k), (v'_1, \ldots, v'_k)\big)\big\} = \sum_{j=0}^{\lfloor k/2 \rfloor} \frac{k!}{(k-2j)!\, j!\, 2^j}\, \big(\mu^{k-2j} C^j\big)\big((v_1, \ldots, v_k), (v'_1, \ldots, v'_k)\big).
Proof. Since it is easy to check that the standard binomial expansion works also for
dot products, the general form of (12.3.2) will follow from the special case μ = 0,
once we show that for this case,
E\{W^k\} = \begin{cases} 0, & k \text{ odd}, \\[2pt] \dfrac{(2j)!}{j!\, 2^j}\, C^j, & k = 2j. \end{cases}   (12.3.3)
Thus, assume now that μ = 0. The case of odd k in (12.3.3) follows immediately
from (11.6.1), and so we have only the even case to consider.
Recalling the definition (7.2.4) of the dot product of double forms, we have that
W^{2j}\big((v_1, \ldots, v_{2j}), (v'_1, \ldots, v'_{2j})\big) = \sum_{\pi, \sigma \in S(2j)} \varepsilon_{\pi} \varepsilon_{\sigma} \prod_{k=1}^{2j} W(v_{\pi(k)}, v'_{\sigma(k)}),
so that, on taking expectations, E\big\{W^{2j}\big((v_1, \ldots, v_{2j}), (v'_1, \ldots, v'_{2j})\big)\big\} is given by
\frac{(2j)!}{j!} \sum_{\pi, \sigma \in S(2j)} \varepsilon_{\pi} \varepsilon_{\sigma} \prod_{k=1}^{j} E\big\{W(v_{\pi(2k-1)}, v'_{\sigma(2k-1)})\, W(v_{\pi(2k)}, v'_{\sigma(2k)})\big\},
where the combinatorial factor comes from the different ways of grouping the vectors
(v_{\pi(k)}, v'_{\sigma(k)}), 1 \le k \le 2j, into ordered pairs.^9
The last expression can be rewritten as
\frac{(2j)!}{j!\, 2^j} \sum_{\pi, \sigma \in S(2j)} \varepsilon_{\pi} \varepsilon_{\sigma} \prod_{k=1}^{j} \Big[ E\big\{W(v_{\pi(2k-1)}, v'_{\sigma(2k-1)})\, W(v_{\pi(2k)}, v'_{\sigma(2k)})\big\} - E\big\{W(v_{\pi(2k)}, v'_{\sigma(2k-1)})\, W(v_{\pi(2k-1)}, v'_{\sigma(2k)})\big\} \Big]
= \frac{(2j)!}{j!\, 2^j} \sum_{\pi, \sigma \in S(2j)} \varepsilon_{\pi} \varepsilon_{\sigma} \prod_{k=1}^{j} C\big((v_{\pi(2k-1)}, v_{\pi(2k)}),\, (v'_{\sigma(2k-1)}, v'_{\sigma(2k)})\big)
= \frac{(2j)!}{j!\, 2^j}\, C^j\big((v_1, \ldots, v_{2j}), (v'_1, \ldots, v'_{2j})\big),
which establishes (12.3.3).
The following corollary is immediate from Lemma 12.3.1 and the definition
(7.2.6) of the trace operator.
Corollary 12.3.2. With the notation of Lemma 12.3.1,
E\big\{\mathrm{Tr}(W^k)\big\} = \sum_{j=0}^{\lfloor k/2 \rfloor} \frac{k!}{(k-2j)!\, j!\, 2^j}\, \mathrm{Tr}\big(\mu^{k-2j} C^j\big).   (12.3.4)
9 In comparing this with (11.6.2), note that there we had an extra summation over the groupings
into unordered pairs rather than a simple multiplicative factor. We already have each possible
grouping due to the summation over π and σ in S(2j), and since we are keeping pairs ordered
we also lose the factor of 2^{-j} there.
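In the simplest scalar specialization (a single zero-mean Gaussian variable W with variance C), (12.3.3) reduces to the classical even-moment formula E{W^{2j}} = ((2j)!/(j! 2^j)) C^j = (2j-1)!! C^j. The coefficient identity can be checked directly (our own sketch):

```python
from math import factorial

def even_moment_coeff(j):
    """(2j)! / (j! 2^j), the coefficient in (12.3.3) for k = 2j."""
    return factorial(2 * j) // (factorial(j) * 2 ** j)

def double_factorial_odd(m):
    """m!! for odd m: m * (m - 2) * ... * 1."""
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

# (2j)!/(j! 2^j) equals (2j - 1)!!, the number of pairings of 2j objects.
for j in range(1, 8):
    assert even_moment_coeff(j) == double_factorial_odd(2 * j - 1)
```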
12.4 The Mean Euler Characteristic 315
E\{\varphi(A_u)\} = \sum_{j=0}^{N} L_j(M)\, \rho_j(u),   (12.4.1)
where
\rho_j(u) = (2\pi)^{-(j+1)/2}\, H_{j-1}(u)\, e^{-u^2/2}, \qquad j \ge 0,   (12.4.2)
H_j is the jth Hermite polynomial, given by (11.6.9) and (11.6.10), and the L_j(M) are
the Lipschitz–Killing curvatures (7.6.1) of M, calculated with respect to the metric
(12.2.2) induced by f, i.e.,
L_j(M, U) = \begin{cases} \dfrac{(-2\pi)^{-(N-j)/2}}{(N-j)!} \displaystyle\int_U \mathrm{Tr}\big(R^{(N-j)/2}\big)\, \mathrm{Vol}_g & \text{if } N - j \text{ is even}, \\[4pt] 0 & \text{if } N - j \text{ is odd}. \end{cases}   (12.4.3)
Proof. The first consequence of the assumptions on f is that the sample functions of
f are almost surely Morse functions over M. Thus Corollary 9.3.5 gives us that
\varphi(A_u) = \sum_{\{t \in M:\, f_t \ge u,\ \nabla f_t = 0\}} (-1)^{\iota_{-f,M}(t)}
= \sum_{k=0}^{N} (-1)^k\, \#\big\{t \in M : f_t \ge u,\ \nabla f_t = 0,\ \mathrm{index}(-\nabla^2 f_t) = k\big\}.
If we now interchange summation and integration and bracket the factor of (−1)k
together with | det ∇ 2 fE |, then we can drop the absolute value sign on the latter, al-
though there is a remaining factor of −1. This allows us to exchange expectation with
summation, and the factor of 1Ak (∇ 2 fE ) and the sum over k disappear completely.
Now recall that f has constant variance and note that since E is an orthonormal
frame field (with respect to the induced metric g), the components of ∇fE at any t ∈ M
are all independent standard Gaussians. Furthermore, as we saw in Section 12.2.2, the
constant variance of f implies that they are also independent of f (t). Consequently,
the joint probability density of (f, ∇fE ) at the point (x, 0) is simply
\frac{e^{-x^2/2}}{(2\pi)^{(N+1)/2}}.
Thus not only is it known, but it is constant over M.
Noting all this, conditioning on f and integrating out the conditioning allows us
to rewrite the above in the much simpler format
E\{\varphi(A_u)\} = (2\pi)^{-(N+1)/2} \int_u^{\infty} e^{-x^2/2} \int_M E\big\{\det(-\nabla^2 f_E) \,\big|\, \nabla f_E = 0,\ f = x\big\}\, \mathrm{Vol}_g\, dx.   (12.4.5)
Recalling the definition of the trace (cf. (12.3.1)), the innermost integrand can be
written as
\frac{1}{N!}\, E\big\{\mathrm{Tr}\big((-\nabla^2 f)^N\big) \,\big|\, \nabla f_E = 0,\ f = x\big\}.
Since ∇ 2 f is a Gaussian double form, we can use Corollary 12.3.2 to compute the
above expectation, once we recall (12.2.12) and (12.2.13) to give us the conditional
mean and covariance of ∇ 2 f . These give
\frac{(-1)^N}{N!}\, E\big\{\mathrm{Tr}^M\big((\nabla^2 f)^N\big) \,\big|\, \nabla f_E = 0,\ f = x\big\}
= \sum_{j=0}^{\lfloor N/2 \rfloor} \frac{(-1)^j}{(N-2j)!\, j!\, 2^j}\, \mathrm{Tr}^M\big((xI)^{N-2j} (I^2 + 2R)^j\big)
= \sum_{j=0}^{\lfloor N/2 \rfloor} \sum_{l=0}^{j} \frac{(-1)^j}{j!\, 2^j (N-2j)!} \binom{j}{l}\, x^{N-2j}\, \mathrm{Tr}^M\big(I^{N-2l} (2R)^l\big)
= \mathrm{Tr}^M\Bigg( \sum_{l=0}^{\lfloor N/2 \rfloor} \frac{(-1)^l}{l!}\, R^l \sum_{k=0}^{\lfloor (N-2l)/2 \rfloor} \frac{(-1)^k}{2^k (N-2k-2l)!\, k!}\, x^{N-2k-2l}\, I^{N-2l} \Bigg)
= \sum_{l=0}^{\lfloor N/2 \rfloor} \frac{(-1)^l}{l!}\, \mathrm{Tr}^M(R^l)\, H_{N-2l}(x),
where in the last line we have used (7.2.11) and the definition (11.6.9) of the Hermite
polynomials.
Substituting back into (12.4.5) we conclude that E {ϕ (Au )} is given by
(2\pi)^{-(N+1)/2} \sum_{l=0}^{\lfloor N/2 \rfloor} \frac{(-1)^l}{l!} \left[ \int_u^{\infty} H_{N-2l}(x)\, e^{-x^2/2}\, dx \right] \int_M \mathrm{Tr}^M(R^l)\, \mathrm{Vol}_g
= (2\pi)^{-(N+1)/2} \sum_{l=0}^{\lfloor N/2 \rfloor} \frac{(-1)^l}{l!}\, H_{N-2l-1}(u)\, e^{-u^2/2} \int_M \mathrm{Tr}^M(R^l)\, \mathrm{Vol}_g
= \sum_{j=0}^{N} L_j(M)\, \rho_j(u),
where the first equality follows from (11.6.12) and the second from the definitions
(12.4.2) of the ρj and (12.4.3) for the Lj , along with a little algebra.
That is, we have (12.4.1) and so the theorem.
boundary, and the results of Section 11.7. There, as you will recall, the parameter
space was an N -dimensional rectangle, and the Gaussian process was required to be
stationary.
Thus, we return to the setting of Sections 8.1 and 9.3, and take M to be an N-
dimensional regular stratified manifold.
\[
\mathbb{E}\{\varphi(A_u)\} = \sum_{i=0}^{N}\mathcal{L}_i(M)\,\rho_i(u), \tag{12.4.6}
\]
with the single change that the Lipschitz–Killing curvatures Lj are now defined by
(10.7.1), i.e.,
\[
\begin{aligned}
\mathcal{L}_k(M) = \sum_{j=k}^{N}&(2\pi)^{-(j-k)/2}\sum_{l=0}^{\lfloor(j-k)/2\rfloor}\frac{(-1)^l}{l!\,(j-k-2l)!}\, C(N-j,\ j-k-2l) \qquad(12.4.7)\\
&\times\int_{\partial_j M}\int_{S(T_t\partial_j M^{\perp})}\mathrm{Tr}^{T_t\partial_j M}\big(R^l S_{\nu_{N-j}}^{\,j-k-2l}\big)\,\alpha(\nu_{N-j})\,\mathcal{H}_{N-j-1}(d\nu_{N-j})\,\mathcal{H}_j(dt).
\end{aligned}
\]
Remark 12.4.3. If we also require that $M$ be locally convex, then $\alpha(\nu)\equiv \mathbb{1}_{N_t M}(-\nu)$, and so $\alpha$ disappears from (12.4.7), although the integral over $S(T_t\partial_j M^{\perp})$ then becomes an integral over $S(N_t M)$.
\[
\alpha : M \times \mathbb{R}^L \to \mathbb{Z}
\]
by
\[
N_u^{\alpha}(f, h, F; M, B) = \sum_{t\in M:\ f(t)=u,\ h(t)\in B} \alpha\big(t, F(t)\big),
\]
Theorem 12.4.4. Retaining the notation and all conditions of Theorem 12.1.1 and
adopting the notation above, assume also the following:
(i) F is almost surely continuous over M.
(ii) For each t ∈ M, the marginal densities pt (x) of F (t) (implicitly assumed to
exist) are continuous in x.
(iii) The conditional densities pt (x|∇fE (t), h(t), F (t)) of f (t) given h(t), (∇fE )(t),
and F (t) (implicitly assumed to exist) are bounded above and continuous at
x = u, uniformly in t ∈ M.
(iv) The conditional densities pt (z|f (t) = x) of F (t) given f (t) = x are continuous
for all z and for x in a neighborhood of u, uniformly in t ∈ M.
(v) The modulus of continuity of F satisfies (12.1.3).
(vi) |α| is bounded and α is piecewise constant on M × RL . Furthermore, for each
t ∈ M, the set of discontinuities of α(t, ·) has dimension no greater than L − 1
in RL , while for each x ∈ RL the set of discontinuities of α(·, x) has dimension
no greater than N − 1 in RN .
Then
\[
\mathbb{E}\{N_u^{\alpha}\} = \int_M \mathbb{E}\big\{|\det(\nabla f_E)|\; \mathbb{1}_B(h)\,\alpha(t, F)\,\big|\, f = u\big\}\, p_t(u)\,\mathrm{Vol}_g. \tag{12.4.8}
\]
Proof. The proof of the theorem is really no different from that of Theorem 12.1.1,
but we are missing an analogue of its Euclidean precursor, Theorem 11.2.1, with the
extra complication of the $\alpha$ term. However, if you had the patience to follow the details of the proof of Theorem 11.2.1, you will agree that, under the conditions we have added, that proof requires only minor changes to cover the current scenario.
Further (unnecessary) details are left to you.
most of the hard work in proving some of the earlier cases, and so we now face a
proof that is more concerned with keeping track of notation than needing any serious
new computations. Nevertheless, there is something new here, and that is the way
the Morse index and integration over normal cones are handled. These two points
actually take up most of the proof.
We start by recalling the setup, which implies that M has the unique decomposition
\[
M = \bigsqcup_{j=0}^{\dim M} \partial_j M,
\]
where
\[
\mu_{jkn} = \#\big\{t\in\partial_j M :\ f(t)\ge u,\ \nabla f(t)\in N_t M,\ \alpha\big({-P^{\perp}_{T_t\partial_j M}}\nabla f(t)\big) = k,\ \iota_{-f,\partial_j M}(t) = n\big\}.
\]
If t ∈ M ◦ then the normal cone is {0}, and so the expectation of the j = N term
in (12.4.9) has already been computed, since this is the computation of the previous
proof. Nevertheless, we shall rederive it, en passant, in what follows. For this case,
however, it will be important that you recall the conventions set out in Remark 10.5.2,
which we shall use freely.
Fix a j , 0 ≤ j ≤ N , and choose an orthonormal frame field E such that
Ej +1 , . . . , EN are normal to ∂j M, from which it follows that E1 f (t), . . . , EN f (t)
are independent standard Gaussians for each t.
To compute the expectation of (12.4.9) we argue much as we did in the proof of
Theorem 12.4.1, applying Theorem 12.4.4 rather than Theorem 12.1.1 to allow for
the new factor of α (cf. the argument leading up to (12.4.5)).10
This gives us the following, in which ν is an (N − j )-dimensional vector, which,
for convenience, we write as (νj +1 , . . . , νN ):
\[
\begin{aligned}
\mathbb{E}\Bigg\{\sum_{j=0}^{\dim M}&\sum_{k=-K}^{K}\sum_{n=0}^{j}(-1)^n\, k\,\mu_{jkn}\Bigg\}\\
&= \sum_{j=0}^{\dim M}(2\pi)^{-(N+1)/2}\int_u^{\infty} e^{-x^2/2}\,dx\int_{\partial_j M}\mathcal{H}_j(dt)\int_{T_t\partial_j M^{\perp}} e^{-|\nu|^2/2}\,\theta^j_t(\nu, x)\,\alpha(-\nu)\,\mathcal{H}_{N-j}(d\nu),
\end{aligned}
\]
where S is the scalar second fundamental form, given by (7.5.8) and (7.5.12). Equiv-
alently,
\[
\mathbb{E}\big\{\nabla^2 f_{|\partial_j M}\,\big|\, f = x,\ \nabla f_E = (0,\nu)\big\} = -xI + S_{\nu} = -\big(xI + S_{-\nu}\big).
\]
We now have all we need to evaluate $\theta^j_t(\nu, x)$, which, following the argument of the preceding section and using the above conditional expectations and variance, is equal to
\[
\begin{aligned}
\frac{(-1)^j}{j!}\,&\mathbb{E}\big\{\mathrm{Tr}^{\partial_j M}\big((\nabla^2 f_{|\partial_j M})^j\big)\,\big|\, f = x,\ \nabla f_E = (0,\nu)\big\}\\
&= \sum_{k=0}^{\lfloor j/2\rfloor}\frac{(-1)^k}{(j-2k)!\,k!\,2^k}\,\mathrm{Tr}^{\partial_j M}\big((xI + S_{-\nu})^{j-2k}\,(I^2+2R)^k\big).
\end{aligned}
\]
Since $\alpha(c\nu) = \alpha(\nu)$ for every $c > 0$, passing to polar coordinates in $T_t\partial_j M^{\perp}$ gives
\[
\int_{T_t\partial_j M^{\perp}} e^{-|\nu|^2/2}\,\mathrm{Tr}^{\partial_j M}\big(S_{\nu}^{\,j-k-2l} R^l\big)\,\alpha(\nu)\,\mathcal{H}_{N-j}(d\nu)
= C(N-j,\ j-k-2l)\int_{S(T_t\partial_j M^{\perp})}\mathrm{Tr}^{\partial_j M}\big(S_{\nu}^{\,j-k-2l} R^l\big)\,\alpha(\nu)\,\mathcal{H}_{N-j-1}(d\nu).
\]
\[
\begin{aligned}
\mathbb{E}\{\varphi(A_u)\} = \sum_{j=0}^{\dim M}&(2\pi)^{-(N+1)/2}\int_u^{\infty} e^{-x^2/2}\sum_{k=0}^{j} H_k(x)\sum_{l=0}^{\lfloor(j-k)/2\rfloor}\frac{(-1)^l}{l!\,(j-k-2l)!}\, C(N-j,\ j-k-2l)\\
&\times\int_{\partial_j M}\int_{S(T_t\partial_j M^{\perp})}\mathrm{Tr}^{\partial_j M}\big(S_{\nu}^{\,j-k-2l} R^l\big)\,\alpha(\nu)\,\mathcal{H}_{N-j-1}(d\nu)\,\mathcal{H}_j(dt)\,dx\\
= \sum_{k=0}^{N}\rho_k(u)&\sum_{j=k}^{\dim M}(2\pi)^{-(j-k)/2}\sum_{l=0}^{\lfloor(j-k)/2\rfloor}\frac{(-1)^l}{l!\,(j-k-2l)!}\, C(N-j,\ j-k-2l)\\
&\times\int_{\partial_j M}\int_{S(T_t\partial_j M^{\perp})}\mathrm{Tr}^{\partial_j M}\big(S_{\nu}^{\,j-k-2l} R^l\big)\,\alpha(\nu)\,\mathcal{H}_{N-j-1}(d\nu)\,\mathcal{H}_j(dt)
\end{aligned}
\]
after integrating out x (via (11.6.12)) and changing the order of summation.
Comparing the last expression with the definitions (12.4.2) of the ρk and (12.4.7)
of the Lk , the proof is complete.
12.5 Examples
With the hard work behind us, we can now look at some applications of Theorems 12.4.1 and 12.4.2. One of the most powerful implications of the formula
\[
\mathbb{E}\{\varphi(A_u)\} = \sum_{j=0}^{\dim M}\mathcal{L}_j(M)\,\rho_j(u), \tag{12.5.1}
\]
is that for any example, all that needs to be computed are the Lipschitz–Killing curvatures $\mathcal{L}_j(M)$, since the $\rho_j$ are completely determined by (12.4.2) and depend neither on the geometry of $M$ nor on the covariance structure of $f$.
Nevertheless, this is not always easy, and there is no guarantee that explicit forms for the $\mathcal{L}_j(M)$ exist. In fact, more often than not, explicit forms will unfortunately not be available, and one needs to turn to a computer for assistance, performing symbolic or numerical evaluations, and often both.
However, there are some cases that are not too hard, and so we shall look at these.
where the last term is the simple Euclidean inner product. Thus g changes the usual
flat Euclidean geometry of RN only by scaling, and so the geometry remains flat.
This being the case, (6.3.6) gives us the necessary Lipschitz–Killing curvatures, although each $\mathcal{L}_j$ of (6.3.6) needs to be multiplied by a factor of $\lambda_2^{j/2}$. Substituting this into (12.5.1) gives the required result, (11.7.12).
The few lines of algebra needed along the way are left to you.
Also left to you is the nonisotropic case, which is not much harder, although you
will need a slightly adapted version of (6.3.6) that allows for a metric that is a constant
times the Euclidean metric on hyperplanes in RN , but for which the constant depends
on the hyperplane. This will give Theorem 11.7.2 in its full generality.
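For the rectangles of Theorem 11.7.2, the Euclidean Lipschitz–Killing curvatures just alluded to are elementary symmetric functions of the side lengths, and the rescaling by the second spectral moment is a simple power. A quick sketch (our function names; the $\lambda$-scaling is the isotropic case described above):

```python
from itertools import combinations
from math import prod

def lk_rectangle(sides, j):
    # Euclidean Lipschitz-Killing curvatures of prod_i [0, T_i]:
    # L_j is the j-th elementary symmetric function of the side lengths,
    # i.e. the sum over j-element subsets of the product of those sides.
    return sum(prod(c) for c in combinations(sides, j))

def lk_induced(sides, j, lam):
    # Under the metric induced by a unit-variance isotropic field with
    # second spectral moment lam, each L_j is scaled by lam^(j/2).
    return lam ** (j / 2) * lk_rectangle(sides, j)
```

For a $3\times 4$ rectangle this gives $\mathcal{L}_0 = 1$, $\mathcal{L}_1 = 7$, $\mathcal{L}_2 = 12$, the usual intrinsic volumes.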
where $\mathrm{detr}_j$ is given by (7.2.8) and the curvature matrix Curv is given by (10.7.7).
A simple example was given in Figure 6.2.3, for which we discussed finding,
via integral-geometric techniques, the Euler characteristic of Au (f, M), where M
was a C 2 domain in R2 . Although we found a point process representation for the
ϕ(Au ), we never actually managed to use integral-geometric techniques to find its
expectation. The reason is that differential-geometric techniques work much better,
since applying (12.5.2) with N = 2 immediately gives us the very simple result that
\[
\mathcal{L}_2(M) = \mathrm{Area}(M), \qquad \mathcal{L}_1(M) = \tfrac{1}{2}\,\mathrm{length}(\partial M), \qquad \mathcal{L}_0(M) = 1,
\]
which, when substituted into (12.5.1), gives the required expectation.
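To make the recipe concrete: with the $\rho_j$ of (12.4.2) ($\Psi$ the standard Gaussian tail, Hermite convention as in (11.6.9)), the expected Euler characteristic on a planar domain is a three-term sum. A sketch for the unit square with unit second spectral moment (the names and parameter choices are illustrative):

```python
import math

def hermite(n, x):
    # Probabilists' Hermite polynomial via the explicit sum of (11.6.9).
    return sum((-1) ** j * x ** (n - 2 * j) * math.factorial(n)
               / (math.factorial(j) * 2 ** j * math.factorial(n - 2 * j))
               for j in range(n // 2 + 1))

def rho(j, u):
    # EC densities of a unit-variance Gaussian field, cf. (12.4.2):
    # rho_0(u) = Psi(u),
    # rho_j(u) = (2 pi)^{-(j+1)/2} H_{j-1}(u) exp(-u^2/2) for j >= 1.
    if j == 0:
        return 0.5 * math.erfc(u / math.sqrt(2))
    return (2 * math.pi) ** (-(j + 1) / 2) * hermite(j - 1, u) * math.exp(-u * u / 2)

def mean_euler_char(u, lk=(1.0, 2.0, 1.0)):
    # E{phi(A_u)} = sum_j L_j rho_j(u); for the unit square,
    # L_0 = 1, L_1 = length(boundary)/2 = 2, L_2 = Area = 1.
    return sum(Lj * rho(j, u) for j, Lj in enumerate(lk))
```

At high levels the $\mathcal{L}_2\rho_2$ term dominates, recovering the familiar $u e^{-u^2/2}/(2\pi)^{3/2}$ contribution per unit area.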
A historical note is appropriate here: It was the example of isotropic fields on C 2
domains that, in Keith Worsley’s paper [173], was really the genesis of the manifold
approach to Gaussian geometry that has been the central point of this chapter and the
reason for writing this book.
Under isotropy, you should now be able to handle other examples yourself. All
reduce to calculations of Lipschitz–Killing curvatures under a constant multiple of
the standard Euclidean metric, and for many simple cases, such as balls and spheres,
these have already been computed in Chapter 6. What is somewhat harder is the
nonisotropic case.
the sum being over faces of dimension j in T containing the origin and the rest of
the notation as in Theorem 11.7.2. What (12.5.3) shows is that there is no simple
averaging over the boundary of T as there is in the isotropic case. Each piece of
the boundary has its own contribution to make, with its own curvature and second
fundamental form. In the case of a rectangle this is not too difficult to work with. In
the case of a general domain it is not so simple.
In Section 7.7 we saw how to compute curvatures and second fundamental forms
on Euclidean surfaces in terms of Christoffel symbols. In Section 12.2.3 we saw how
to compute Christoffel symbols for the metric induced by f in terms of its covariance
function. For any given example, these two computations need to be coordinated and
then fed into definitions such as (12.4.3) and (12.4.7) of Lipschitz–Killing curvatures.
From here to a final answer is a long path, often leading through computer algebra
packages.
There is, however, one negative result that is worth mentioning here, since it is
sufficiently counterintuitive that it has led many to an incorrect conjecture. As we
mentioned following the proof of Lemma 11.7.1, a first guess at extending (12.5.3) to
the nonstationary
case would be to replace the terms |J |||J |1/2 there by integrals of
the form J |t | dt, where the elements of t are the covariances E{fi (t)fj (t)}.
1/2
Without quoting references, it is a fact that this has been done more than once in
the past. However, it is clear from the computations of the Christoffel symbols in
Section 12.2.3 that this does not work, and additional terms involving third-order
derivatives of the covariance function enter into the computation. The fact that these
terms are all identically zero in the stationary case is probably what led to the errors.11
induce such Riemannian metrics, the integrals needed to evaluate E{ϕ(Au (f, M))}
are significantly easier to calculate.
\[
\mathbb{E}\{\varphi(A_u(f, M))\} = \mathrm{Vol}_g(G)\sum_{k=0}^{\lfloor N/2\rfloor}\frac{(-1)^k\,\rho_{N-2k}(u)}{(2\pi)^k\,k!}\,\mathrm{Tr}^{T_e G}\big(R_e^k\big), \tag{12.5.4}
\]
Proof. This is really a corollary of Theorem 12.4.1, since G has no boundary. Ap-
plying that result, and comparing (12.5.4) with (12.4.1), it is clear that all we need to
show is that
\[
\int_G \mathrm{Tr}^{T_g G}\big(R_g^l\big)\,\mathrm{Vol}_g(dg) = \mathrm{Tr}^{T_e G}\big(R_e^l\big)\,\mathrm{Vol}_g(G).
\]
As promised at the beginning of the chapter, we now give a purely probabilistic proof
of the classical Chern–Gauss–Bonnet theorem. Of course, “purely’’ is somewhat of
an overstatement, since the results on which our proof is based were themselves based
on Morse’s critical point theory.
The Chern–Gauss–Bonnet theorem is one of the most fundamental and important
results of differential geometry, and it gives a representation of the Euler character-
istic of a deterministic manifold in terms of curvature integrals. It has a long and
impressive history, starting in the early nineteenth century with simple Euclidean do-
mains. Names were added to the result as the setting became more and more general.
While it has nothing to do with probability, it can be obtained as a simple corollary to
Theorems 12.4.1 and 12.4.2. Here is a version very close to that originally proven in
1943 by Allendoerfer and Weil [11], for what they called “Riemannian polyhedra.’’
\[
\begin{aligned}
\varphi(M) = \sum_{j=0}^{\dim M}&(2\pi)^{-j/2}\sum_{m=0}^{\lfloor j/2\rfloor}\frac{(-1)^m}{m!\,(j-2m)!}\, C(N-j,\ j-2m) \qquad(12.6.1)\\
&\times\int_{\partial_j M}\int_{S(T_t\partial_j M^{\perp})}\mathrm{Tr}^{T_t\partial_j M}\big(R^m S_{\nu_{N-j}}^{\,j-2m}\big)\,\alpha(\nu_{N-j})\,\mathcal{H}_{N-j-1}(d\nu_{N-j})\,\mathcal{H}_j(dt),
\end{aligned}
\]
where we adopt the notation of (10.7.1) and the convention of Remark 10.5.2.
Proof. Suppose f is a Gaussian random field on M such that f induces13 the metric g.
Suppose, furthermore, that f satisfies all the side conditions of either Theorem 12.4.1
or Theorem 12.4.2, depending on whether M does, or does not have, a boundary.
To save on notation, assume now that $M$ does not have a boundary. Recall that in computing $\mathbb{E}\{\varphi(A_u(f, M))\}$ we first wrote $\varphi(A_u(f, M))$ as an alternating sum of $N$ different terms, each one of the form
\[
\mu_k(u) = \#\{t\in M :\ f_t > u,\ \nabla f_t = 0,\ \mathrm{index}(-\nabla^2 f_t) = k\}.
\]
13 Note that this is the opposite situation to that which we have faced until now. We have
always started with the field f and used it to define the metric g. Now, however, g is given
and we are assuming that we can find an appropriate f .
\[
\mu_k = \#\{t\in M :\ \nabla f_t = 0,\ \mathrm{index}(-\nabla^2 f_t) = k\},
\]
and $\varphi(M)$ is given by an alternating sum of the $\mu_k$. Since $\mu_k(u)$ is bounded by the total number of critical points of $f$, which is an integrable random variable, dominated convergence gives us that
\[
\lim_{u\to-\infty}\mathbb{E}\{\varphi(A_u(f, M))\} = \mathbb{E}\{\varphi(M)\} = \varphi(M),
\]
and the statement of the theorem then follows by first using Theorem 12.4.1 to evaluate
E{ϕ(Au (f, M))} and then checking that the right-hand side of (12.6.1) is, in fact, the
above limit.
If we do not know that g is induced by an appropriate Gaussian field, then we
need to adopt a nonintrinsic approach via Nash’s embedding theorem in order to
construct one.
Nash's embedding theorem [118] states that for any $C^3$, $N$-dimensional Riemannian manifold $(M, g)$, there is an isometry $i_g : M \to i_g(M)\subset\mathbb{R}^{\widetilde N}$ for some finite $\widetilde N$ depending only on $N$. More importantly, $\mathbb{R}^{\widetilde N}$ is to be taken with the usual Euclidean metric.
The importance of this embedding is that it is trivial to find an appropriate $f$ when the space is Euclidean with the standard metric. Any zero-mean, unit-variance, isotropic Gaussian random field, $\widetilde f$ say, on $\mathbb{R}^{\widetilde N}$ whose first partial derivatives have unit variance and that satisfies the nondegeneracy conditions of Theorem 12.4.1 will do. If we now define $f = \widetilde f\circ i_g$ on $M$, then it is easy to see that $f$ induces the metric $g$ on $M$, and so our construction is complete.
Finally, we note that the case of piecewise smooth M follows exactly the same
argument, simply appealing to Theorem 12.4.2 rather than Theorem 12.4.1 and to a
version of the embedding theorem for stratified spaces (cf. [71, 119, 120]).
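In dimension 2 and without boundary, the theorem reduces to the classical Gauss–Bonnet identity $\int_M K\,dA = 2\pi\varphi(M)$, which is easy to confirm numerically. A sketch for the torus of revolution (the parametrization, $R$, $r$, and the grid size are our illustrative choices):

```python
import math

def torus_gauss_bonnet(R=2.0, r=1.0, n=800):
    # Integrate the Gaussian curvature K(v) = cos v / (r (R + r cos v))
    # of a torus of revolution against its area element
    # dA = r (R + r cos v) du dv, over u, v in [0, 2 pi).
    # Gauss-Bonnet: the result is 2 pi times the Euler characteristic,
    # which is 0 for the torus.
    dv = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * dv
        curv = math.cos(v) / (r * (R + r * math.cos(v)))
        area = r * (R + r * math.cos(v)) * (2.0 * math.pi)  # u integrates out
        total += curv * area * dv
    return total
```

The same computation for the unit sphere ($K\equiv 1$, area $4\pi$) gives $4\pi = 2\pi\cdot 2$, matching $\varphi(S^2) = 2$.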
\[
\varphi(M) = \sum_{j=0}^{\dim M}\frac{(2\pi)^{-j/2}}{j!}\int_{\partial_j M}\mathbb{E}\big\{\alpha\big(P^{\perp}_{T_t\partial_j M}\nabla\tilde y;\ M\big)\,\mathrm{Tr}^{T_t\partial_j M}\big((-\nabla^2\tilde y_{|\partial_j M})^j\big)\big\}\,\mathcal{H}_j(dt).
\]
More generally,
\[
\mathcal{L}_0(M, A) = \sum_{j=0}^{\dim M}\frac{(2\pi)^{-j/2}}{j!}\int_{\partial_j M\cap A}\mathbb{E}\big\{\alpha\big(P^{\perp}_{T_t\partial_j M}\nabla\tilde y;\ M\big)\,\mathrm{Tr}^{T_t\partial_j M}\big((-\nabla^2\tilde y_{|\partial_j M})^j\big)\big\}\,\mathcal{H}_j(dt).
\]
13
Mean Intrinsic Volumes
where the Lj on both sides of this equation are computed with respect to the metric
induced on the parameter space M by the zero-mean, unit-variance Gaussian field f
(cf. (12.2.1)) and the Au (f, M) are, as always, the excursion sets of f above the level
u. The functions ρl remain as in the preceding two chapters (cf. (12.4.2)).
In all, we shall give three proofs of (13.0.1). The first two, which make up the
content of the current chapter, exploit the results of Chapters 11 and 12 and use
some nonstochastic results of geometry to move from Euler characteristics to general
Lipschitz–Killing curvatures.
The first proof, in Section 13.2, will be for isotropic fields with unit variance
defined over subsets of RN . In this case the Lk on both sides of (13.0.1) are the
basic intrinsic volumes (volume, surface area, etc.) that we met first in Chapter 6 and
the proof is quite straightforward. Indeed, all we shall need for the proof are two
basic results of integral geometry—the formulas of Crofton and Hadwiger—which
we shall give in the following section.
The second proof of (13.0.1) will be for the case in which f is more general and
M a Whitney stratified space. For this we shall need a non-Euclidean analogue of
Crofton’s formula, which we give in Section 13.3. This is a new result and should be
of independent interest outside the particular application that we have for it.
The third proof will come in Chapter 15 when, via a completely new approach,
we shall re-prove the main results of this and the preceding two chapters.
for all 1 ≤ i ≤ N, where the fi are the usual partial derivatives of f . Then, for every
0 ≤ j ≤ N,
\[
\mathbb{E}\big\{\mathcal{L}_j(A_u(f, M))\big\} = \sum_{l=0}^{N-j}\begin{bmatrix}j+l\\ l\end{bmatrix}\lambda^{l/2}\,\rho_l(u)\,\mathcal{L}_{j+l}(M), \tag{13.2.1}
\]
l=0
where the Lj , on both sides of the equality, are computed with respect to the standard
Euclidean metric on RN .
Proof. We first note that it suffices to establish (13.2.1) for the case λ = 1. The
general case then follows from the scaling properties (6.3.1) of the Lj .
Thus, assuming that $\lambda = 1$, Hadwiger's formula and Theorem 12.4.2 immediately yield
\[
\begin{aligned}
\mathbb{E}\big\{\mathcal{L}_j(A_u(f, M))\big\} &= \int_{\mathrm{Graff}(N, N-j)}\mathbb{E}\big\{\mathcal{L}_0\big(A_u(f, M)\cap V\big)\big\}\, d\lambda^N_{N-j}(V)\\
&= \sum_{l=0}^{N-j}\rho_l(u)\int_{\mathrm{Graff}(N, N-j)}\mathcal{L}_l(M\cap V)\, d\lambda^N_{N-j}(V)\\
&= \sum_{l=0}^{N-j}\begin{bmatrix}j+l\\ l\end{bmatrix}\rho_l(u)\,\mathcal{L}_{j+l}(M),
\end{aligned}
\]
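In the simplest case $N = 1$, $\lambda = 1$, $j = 0$, (13.2.1) reduces to the classical formula $\mathbb{E}\{\mathcal{L}_0(A_u(f, [0, T]))\} = \Psi(u) + T e^{-u^2/2}/2\pi$, which is easy to probe by simulation. Below is a rough sketch (the random-cosine construction, grid, and repetition counts are illustrative choices of ours, so the match is only approximate):

```python
import math
import random

def expected_ec(u, T):
    # Psi(u) * L_0 + rho_1(u) * L_1 with L_0 = 1, L_1 = T, lambda = 1.
    psi = 0.5 * math.erfc(u / math.sqrt(2))
    return psi + T * math.exp(-u * u / 2) / (2 * math.pi)

def simulated_ec(u=1.0, T=10.0, reps=200, K=20, grid=400, seed=3):
    # Approximate a stationary, unit-variance Gaussian process by a random
    # cosine sum f(t) = sqrt(2/K) sum_k cos(w_k t + p_k), w_k ~ N(0, 1),
    # so the second spectral moment is (approximately) 1. On an interval,
    # the Euler characteristic of the excursion set A_u is the number of
    # upcrossings of u plus the indicator of f(0) >= u.
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        w = [rng.gauss(0.0, 1.0) for _ in range(K)]
        p = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(K)]
        amp = math.sqrt(2.0 / K)
        vals = [amp * sum(math.cos(wk * (i * T / grid) + pk)
                          for wk, pk in zip(w, p))
                for i in range(grid + 1)]
        total += (1 if vals[0] >= u else 0)
        total += sum(1 for a, b in zip(vals, vals[1:]) if a < u <= b)
    return total / reps
```

With these (deliberately modest) simulation sizes the estimate typically lands within a few percent of the exact value $\approx 1.124$ at $u = 1$, $T = 10$.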
\[
D_u = \big\{t\in\widetilde M :\ y^1_t = u_1, \ldots, y^k_t = u_k\big\}
\]
\[
\mathbb{E}\big\{\mathcal{L}_j\big(\widetilde M\cap D_{Z_k}\big)\big\} = (2\pi)^{-k/2}\,\frac{[k+j]!}{[j]!}\,\mathcal{L}_{k+j}(\widetilde M), \tag{13.3.1}
\]
where the $\mathcal{L}_j$ on both sides of the equation are computed with respect to the Riemannian metric induced on $\widetilde M$ by the $y^i$.
The reason we call this a “Gaussian Crofton formula’’ should be clear by analogy with the original, Euclidean, Crofton formula (13.1.1). The formulas are similar, but we have replaced the kinematic averaging under the measure $\lambda^N_{N-k}$ with a Gaussian average. Note, also, that the averaging in (13.3.1) is over both $y$ and $Z_k$.
Proof. We start by noting that it suffices to prove (13.3.1) for the Euler characteristic
$\varphi = \mathcal{L}_0$. That is, we need only prove that
\[
\mathbb{E}\big\{\varphi\big(\widetilde M\cap D_{Z_k}\big)\big\} = (2\pi)^{-k/2}\,[k]!\,\mathcal{L}_k(\widetilde M). \tag{13.3.2}
\]
This follows from the following observation: For $j > 0$, take $Z_j\sim\gamma_{\mathbb{R}^j}$ independent of everything else, and write $Z_{k+j}$ for the concatenation of $Z_k$ and $Z_j$. Note that, in distribution,
\[
\widetilde M\cap D_{Z_{k+j}} = \widetilde M\cap D_{Z_j}\cap D_{Z_k}.
\]
Thus, given (13.3.2),
\[
\mathbb{E}\big\{\mathcal{L}_j\big(\widetilde M\cap D_{Z_k}\big)\big\} = \frac{(2\pi)^{j/2}}{[j]!}\,\mathbb{E}\big\{\varphi\big(\widetilde M\cap D_{Z_{k+j}}\big)\big\} = (2\pi)^{-k/2}\,\frac{[k+j]!}{[j]!}\,\mathcal{L}_{k+j}(\widetilde M),
\]
\[
\frac{(2\pi)^{-(\dim\widetilde M - k)/2}}{(\dim\widetilde M - k)!}\int_{D_{Z_k}}\mathbb{E}\big\{\alpha\big(P^{\perp}_{T_t D_{Z_k}}\nabla\tilde y;\ \widetilde M\big)\,\mathrm{Tr}^{T_t D_{Z_k}}\big((-\nabla^2\tilde y_{|D_{Z_k}})^{\dim\widetilde M - k}\big)\big\}\,\mathcal{H}_{\dim\widetilde M - k}(dt),
\]
for some suitably regular random field $\tilde y$, which we can take to be a copy of $y^1$, independent of $y^1, \ldots, y^k$. All that would then remain would be to compute the expectation here, over both $\tilde y$ and $Z_k$.
Unfortunately, the technicalities involved in applying this appealingly simple
argument are not trivial, and so we shall take a more basic route, via Morse theory.
We shall break the proof into two parts, first fixing $Z_k$ and applying Morse theory, which will eventually involve an averaging over $y$ and $\tilde y$. At the second stage we shall also average over $Z_k$.
Thus, with $Z_k = u$, we look for a way to express the Euler characteristic of $\widetilde M\cap D_u = D_u$ in terms of points $\{t : y_t = u\}$ at which $\nabla\tilde y_t$ lies in the span of $\nabla y^1_t, \ldots, \nabla y^k_t$. Denote the latter space by
\[
L_t = \mathrm{span}\big(\nabla y^i_t,\ 1\le i\le k\big).
\]
With the manifold $D_u$ fixed, straightforward calculations show that the Hessian of $\tilde y$ on $D_u$ can be written as
\[
\begin{aligned}
\nabla^2\tilde y_{|D,t} &= \nabla^2\tilde y_t - \sum_{i,j=1}^{k}\big\langle\nabla\tilde y_t,\ \nabla y^i_t\big\rangle\, g^{ij}_t\,\nabla^2 y^j_t \qquad(13.3.3)\\
&= \nabla^2\tilde y_t - \sum_{i,j=1}^{k}\big\langle P_{L_t}\nabla\tilde y_t,\ \nabla y^i_t\big\rangle\, g^{ij}_t\,\nabla^2 y^j_t,
\end{aligned}
\]
where $p_{y,\,P^{\perp}_{L_t}\nabla\tilde y}$ is the joint density of $y$ and $P^{\perp}_{L_t}\nabla\tilde y$, and
\[
J_t = \det(G_t). \tag{13.3.5}
\]
While this formula is similar to many that appear in Chapter 12, there is a new Jacobian term, $J_t$, here that deserves a few words. It arises due to the fact that whereas previously we counted points that were critical with respect to the underlying manifold $\widetilde M$, we are now counting points of the form
\[
\big\{t :\ y_t = u,\ P^{\perp}_{L_t}\nabla\tilde y_t = 0\big\} = \big\{t :\ y^1_t = u_1, \ldots, y^k_t = u_k,\ P^{\perp}_{L_t}\nabla\tilde y_t = 0\big\}.
\]
That is, we now have points critical with respect to a random submanifold. By choosing a convenient basis of $T_t\widetilde M$ it is fairly straightforward to see that the Jacobian
3 Recall that the Wishart$(n, \Sigma)$ distribution is defined as the distribution of a matrix $W$ with elements of the form $W_{ij} = \sum_{m=1}^{n} X_{mi}X_{mj}$, where the $n$ vectors $X_m = (X_{m1}, \ldots, X_{mk})$ are independent, each distributed as $N(0, \Sigma)$.
4 Obviously, we have allowed ourselves considerable latitude in this argument. Filling in all
the details would take a few pages, but would involve little more than taking the references
we have made and rewriting them in the notation of the present problem. We leave the
details to the masochists and/or pedants among you.
(13.3.5) is precisely the one that arises in an application of Corollary 12.4.5 needed
to establish (13.3.4).
Furthermore, noting that since yt and ∇yt are independent the same is true of yt
and Lt , and yt and Jt , we can integrate out u in the above to obtain
\[
\mathbb{E}\big\{\varphi\big(\widetilde M\cap D_{Z_k}\big)\big\}
= \frac{1}{(n-k)!}\int_{\widetilde M}\mathbb{E}\big\{\mathrm{Tr}^{L^{\perp}_t}\big((-\nabla^2\tilde y_{|D})^{n-k}\big)\,J_t\,\big|\,P^{\perp}_{L_t}\nabla\tilde y = 0\big\}\, p_{P^{\perp}_{L_t}\nabla\tilde y}(0)\,\mathcal{H}_n(dt),
\]
where we have used the fact that, given the subspace $L_t$, the pair $(\nabla^2\tilde y_{|D,t}, J_t)$ is conditionally independent of $P^{\perp}_{L_t}\nabla\tilde y$, which then has a standard Gaussian distribution on $L^{\perp}_t$.
We now condition on $(\nabla^2\tilde y_{|D,t}, J_t)$, so that the next step is to compute the conditional expectation
\[
\mathbb{E}\big\{\mathrm{Tr}^{L^{\perp}_t}\big((-\nabla^2\tilde y_{|D})^{n-k}\big)\,J_t\,\big|\,\nabla^2\tilde y_{|D,t},\ J_t\big\}.
\]
Because of the conditioning, this is just the expected value of the trace of a fixed form $\alpha\in\Lambda^{n-k,n-k}(T_t\widetilde M)$, restricted to a random subspace of dimension $n-k$ of $T_t\widetilde M$. Lemma 13.5.1 below shows how to evaluate such expectations, and in our case it follows from the lemma that
\[
\mathbb{E}\big\{\mathrm{Tr}^{L^{\perp}_t}\big((-\nabla^2\tilde y_{|D})^{n-k}\big)\,J_t\,\big|\,\nabla^2\tilde y_{|D,t},\ J_t\big\} = \binom{n}{k}^{-1}\,\mathrm{Tr}^{T_t\widetilde M}\big((-\nabla^2\tilde y_{|D})^{n-k}\big)\,J_t. \tag{13.3.6}
\]
To complete the computation, we now need to evaluate
\[
\mathbb{E}\Big\{\mathrm{Tr}^{T_t\widetilde M}\big((-\nabla^2\tilde y_{|D})^{n-k}\big)\,J_t\Big\},
\]
where, by (13.3.3), $\nabla^2\tilde y_{|D}$ differs from $\nabla^2\tilde y$ by the term
\[
\sum_{i,j=1}^{k}\big\langle P_{L_t}\nabla\tilde y_t,\ \nabla y^i_t\big\rangle\, g^{ij}_t\,\nabla^2 y^j_t.
\]
We want a more user-friendly version of this. To obtain it, for the moment we drop the dependence on $t$ and define
\[
V_i = \sum_{j=1}^{k} g^{-1/2}_{ij}\,\big\langle P_L\nabla\tilde y,\ \nabla y^j\big\rangle,
\]
where the $g^{-1/2}_{ij}$ are the elements of the matrix $G^{-1/2}$ $(= G_t^{-1/2})$. Then, conditional on $\nabla y^1, \ldots, \nabla y^k, \nabla^2 y^1, \ldots, \nabla^2 y^k$, we have $V\sim N(0, I_{k\times k})$. Since the conditional distribution does not depend on the conditioning variables, the $V_i$ are actually independent of them. Furthermore, since the $\nabla y^i$ and $\nabla^2 y^j$ are all independent of one another, we have that for each $t$,
\[
\sum_{i,j=1}^{k}\big\langle P_{L_t}\nabla\tilde y_t,\ \nabla y^i_t\big\rangle\, g^{ij}_t\,\nabla^2 y^j_t \;\stackrel{\mathcal{L}}{=}\; \sum_{i,j=1}^{k} V_i\, W^{-1/2}_{ij}\,\nabla^2 y^j,
\]
Putting all the pieces together, we have enough to prove (13.3.1), and so we are done,
modulo proving Lemmas 13.5.1 and 13.5.2, which we defer to Section 13.5.
Our next step will be to extend Theorem 13.3.1 from manifolds to Whitney stratified manifolds. In fact, one can go marginally further than this. Define a family of additive functionals $\psi^j_k$ on $C^2$ Whitney stratified submanifolds $M$ of a manifold $\widetilde M$ by setting, in the notation of Theorem 13.3.1,
\[
\psi^j_k(M) = \mathbb{E}\big\{\mathcal{L}_j\big(M\cap D_{Z_k}\big)\big\}.
\]
Then the following result shows that the $\psi^j_k$ and the $\mathcal{L}_l$ are very closely related.
\[
\mathbb{E}\big\{\mathcal{L}_j\big(M\cap D_{Z_k}\big)\big\} = (2\pi)^{-k/2}\,\frac{[k+j]!}{[j]!}\,\mathcal{L}_{k+j}(M).
\]
In the notation of the linear functionals described above, this can be written as
\[
\psi^j_k = (2\pi)^{-k/2}\,\frac{[k+j]!}{[j]!}\,\mathcal{L}_{k+j}.
\]
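The bracket factorials here are the flag coefficients of (6.3.12). Assuming the convention $[n]! = n!\,\omega_n$ with $\omega_n$ the volume of the unit ball in $\mathbb{R}^n$ (an assumption on our part, to be checked against (6.3.12)), the constants in these formulas are one-liners:

```python
import math

def omega(n):
    # Volume of the unit ball in R^n.
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def flag_factorial(n):
    # [n]! = n! * omega_n -- the convention assumed here for (6.3.12).
    return math.factorial(n) * omega(n)

def gaussian_crofton_constant(k, j):
    # The factor (2 pi)^{-k/2} [k+j]!/[j]! appearing in the display above.
    return (2 * math.pi) ** (-k / 2) * flag_factorial(k + j) / flag_factorial(j)
```

For example, $\omega_1 = 2$, $\omega_2 = \pi$, so $[1]! = 2$ and $[2]! = 2\pi$.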
Proof. Most of the hard work has been done in the proof of Theorem 13.3.1, and what
remains is some bookkeeping. As was the case for the proof of Theorem 13.3.1, the
strategy of the proof is to derive an expression to which we can apply Lemma 13.5.2.
Similar arguments to those appearing in the proof of Theorem 13.3.1, but taking into
account that we now have boundary terms to deal with, show that
\[
\mathbb{E}\big\{\varphi\big(M\cap D_{Z_k}\big)\big\}
= \sum_{l=k}^{n}\frac{1}{(l-k)!}\int_{\partial_l M}\mathbb{E}\big\{\alpha\big(\eta_t;\ M\cap D_{Z_k}\big)\,\mathrm{Tr}^{L^{\perp}_t}\big((-\nabla^2\tilde y_{|D})^{l-k}\big)\,J_t\,\big|\,P^{\perp}_{L_t}\nabla\tilde y = 0\big\}\, p_{P^{\perp}_{L_t}\nabla\tilde y}(0)\,\mathcal{H}_l(dt), \tag{13.3.9}
\]
where $\tilde y$ is again an independent copy of the $y^j$ being used as a Morse function and $\alpha$ is the normal Morse index of Section 9.2. As for the other terms,
\[
J_t = \det(G_t),
\]
while
\[
L_t = \mathrm{span}\big(P_{T_t\partial_l M}\nabla y^i_t,\ 1\le i\le k\big) \tag{13.3.10}
\]
and we write $L^{\perp}_t$ to denote the normal space to $L_t$ in $T_t\partial_l M$. Finally, we have
\[
\begin{aligned}
\eta_t &= \nabla\tilde y_t - P^{\perp}_{L_t}\nabla\tilde y_t \qquad(13.3.11)\\
&= P^{\perp}_{T_t\partial_l M}\nabla\tilde y_t + \sum_{i,j=1}^{k}\big\langle\nabla\tilde y_t,\ P_{T_t\partial_l M}\nabla y^i_t\big\rangle\, g^{ij}_t\, P_{T_t\partial_l M}\nabla y^j_t,
\end{aligned}
\]
and
\[
\nabla^2\tilde y_{|D,t} = \nabla^2\tilde y_t - \sum_{i,j=1}^{k}\big\langle\nabla\tilde y_t,\ P_{T_t\partial_l M}\nabla y^i_t\big\rangle\, g^{ij}_t\,\nabla^2 y^j_t + S_{\nu_t}, \tag{13.3.12}
\]
where
\[
\nu_t = P^{\perp}_{T_t\partial_l M}\nabla\tilde y_t - \sum_{i,j=1}^{k}\big\langle\nabla\tilde y_t,\ P_{T_t\partial_l M}\nabla y^i_t\big\rangle\, g^{ij}_t\, P^{\perp}_{T_t\partial_l M}\nabla y^j_t. \tag{13.3.13}
\]
The term $\nabla^2\tilde y_{|D,t}$ in (13.3.12) deserves some explanation. It is the Hessian of the restriction of $\tilde y$ to $\partial_l M\cap D_{Z_k}$, and seeing that it has the above form takes some work.
Consider two vector fields $X, W$ on $\partial_l M\cap D_{Z_k}$ and let $\nabla$ denote the Levi-Cività connection of $\partial_l M\cap D_{Z_k}$. Then, from the definition (7.3.22) of the Hessian and the compatibility property (7.3.11) of $\nabla$, we have
\[
\begin{aligned}
\nabla^2\tilde y_{|D}(W, X) &= W\big\langle\nabla\tilde y,\ X\big\rangle - \big\langle\nabla\tilde y,\ \nabla_W X\big\rangle\\
&= W\big\langle\nabla\tilde y,\ X\big\rangle - \big\langle\nabla\tilde y,\ \nabla_W X\big\rangle + \big\langle P^{\perp}_{T_t\partial_l M}\nabla_W X,\ \nabla\tilde y\big\rangle\\
&\qquad + \sum_{r,s=1}^{k}\big\langle\nabla_W X,\ P_{T_t\partial_l M}\nabla y^r\big\rangle\, g^{rs}_t\,\big\langle P_{T_t\partial_l M}\nabla y^s,\ \nabla\tilde y\big\rangle,
\end{aligned}
\]
with the second line coming from the form of the projection of vectors in $T_t\widetilde M$ to $T_t(\partial_l M\cap D_{Z_k})$.
Now note that the right-hand side above equals
\[
\nabla^2\tilde y(W, X) + S_{P^{\perp}_{T_t\partial_l M}\nabla\tilde y}(W, X) - \sum_{r,s=1}^{k}\big\langle\nabla\tilde y,\ P_{T_t\partial_l M}\nabla y^s\big\rangle\, g^{rs}_t\,\nabla^2 y^r_{|\partial_l M}(W, X),
\]
since
\[
\nabla^2 y^r_{|\partial_l M}(W, X) = \nabla^2 y^r(W, X) + \big\langle\nabla_W X,\ P^{\perp}_{T_t\partial_l M}\nabla y^r\big\rangle.
\]
Hence
\[
\nabla^2\tilde y_{|D}(W, X) = \nabla^2\tilde y(W, X) - \sum_{r,s=1}^{k}\big\langle\nabla\tilde y,\ P_{T_t\partial_l M}\nabla y^s\big\rangle\, g^{rs}_t\,\nabla^2 y^r(W, X) + S_{\nu_t}(W, X),
\]
which is (13.3.12). Thus, all the terms in (13.3.9) are well defined and we can turn
to evaluating the right-hand side there.
As in the proof of Theorem 13.3.1, this evaluation relies on getting everything
into just the right form for applying Lemmas 13.5.1 and 13.5.2.
Applying Lemma 13.5.1 to (13.3.9) and using the independence of $\nabla\tilde y$ and $\nabla^2\tilde y$, we immediately find that
\[
\begin{aligned}
\mathbb{E}\big\{\varphi\big(M\cap D_{Z_k}\big)\big\}
&= \sum_{l=k}^{n}\frac{1}{(l-k)!}\int_{\partial_l M}\mathbb{E}\big\{\alpha\big(\eta_t;\ M\cap D_{Z_k}\big)\,\mathrm{Tr}^{L^{\perp}_t}\big((-\nabla^2\tilde y_{|D})^{l-k}\big)\,J_t\,\big|\,P^{\perp}_{L_t}\nabla\tilde y = 0\big\}\, p_{P^{\perp}_{L_t}\nabla\tilde y}(0)\,\mathcal{H}_l(dt)\\
&= \sum_{l=k}^{n}(2\pi)^{-(l-k)/2}\,\frac{k!}{l!}\int_{\partial_l M}\mathbb{E}\big\{\alpha\big(\eta_t;\ M\cap D_{Z_k}\big)\,\mathrm{Tr}^{T_t\partial_l M}\big((-\nabla^2\tilde y_{|D})^{l-k}\big)\,J_t\big\}\,\mathcal{H}_l(dt).
\end{aligned}
\]
We now want to apply Lemma 13.5.2 to the expectation here. The expression above for $\nabla^2\tilde y_{|D}$ is of the right form, but this is not so for the term involving the normal Morse index. However, an application of Theorem 9.2.6 shows that in fact
\[
\alpha\big(\eta_t;\ M\cap D_{Z_k}\big) = \alpha\big(\nu_t;\ M\big),
\]
which is of the right form. (Recall that $\eta_t$ and $\nu_t$ are defined, respectively, by (13.3.11) and (13.3.13).)
Consequently, using this equivalence and Lemma 13.5.2, we find that
\[
\begin{aligned}
\mathbb{E}\big\{\varphi\big(M\cap D_{Z_k}\big)\big\}
&= \sum_{l=k}^{n}(2\pi)^{-(l-k)/2}\,\frac{k!}{l!}\int_{\partial_l M}\mathbb{E}\big\{\alpha(\nu_t;\ M)\,\mathrm{Tr}^{T_t\partial_l M}\big((-\nabla^2\tilde y_{|D})^{l-k}\big)\,J_t\big\}\,\mathcal{H}_l(dt)\\
&= \sum_{l=k}^{n}(2\pi)^{-l/2}\,\frac{[k]!}{(l-k)!}\int_{\partial_l M}\mathbb{E}\big\{\alpha\big(P_{T_t\partial_l M}\nabla\tilde y_t;\ M\big)\,\mathrm{Tr}^{T_t\partial_l M}\big((-\nabla^2\tilde y)^{l-k}\big)\big\}\,\mathcal{H}_l(dt).
\end{aligned}
\]
Once again, similar computations to those appearing at the end of the proof of Theorem 12.4.2 show that the final expression above is equal to $(2\pi)^{-k/2}\,[k]!\,\mathcal{L}_k(M)$, which completes the proof.
where the Lk on both sides of this equation are computed with respect to the metric
induced by f as at (12.2.1), the combinatorial coefficients are defined in (6.3.12),
and the ρl remain as in (12.4.2).
Proof. The proof mimics that of the Euclidean, isotropic version of the result in
Theorem 13.2.1, using the Gaussian Crofton formula of Theorem 13.3.2 rather than
the standard Crofton formula.
We start by fixing 0 ≤ j ≤ dim(M) and again introducing an auxiliary set
of Gaussian processes y 1 , . . . , y j and a Gaussian random variable Zj as in Theo-
rem 13.3.1, satisfying the conditions required there. As before, write
\[
D_u = \big\{t\in M :\ y^1_t = u_1, \ldots, y^j_t = u_j\big\}.
\]
Then, by Theorem 13.3.2,
\[
\mathbb{E}\big\{\mathcal{L}_j(A_u(f, M))\big\} = \frac{(2\pi)^{j/2}}{[j]!}\,\mathbb{E}\big\{\mathcal{L}_0\big(A_u(f, M)\cap D_{Z_j}\big)\big\} = \frac{(2\pi)^{j/2}}{[j]!}\,\mathbb{E}\Big\{\mathbb{E}\big\{\mathcal{L}_0\big(A_u(f, M)\cap D_{Z_j}\big)\,\big|\,D_{Z_j}\big\}\Big\}.
\]
[j ]!
We now note that with probability one, the sets M ∩ DZj are Whitney stratified
manifolds,5 and so we can apply Theorem 12.4.2 to compute the inner expectation
above, giving
\[
\begin{aligned}
\mathbb{E}\big\{\mathcal{L}_j(A_u(f, M))\big\} &= \frac{(2\pi)^{j/2}}{[j]!}\sum_{l=0}^{\dim M-j}\rho_l(u)\,\mathbb{E}\big\{\mathcal{L}_l\big(M\cap D_{Z_j}\big)\big\}\\
&= \frac{(2\pi)^{j/2}}{[j]!}\sum_{l=0}^{\dim M-j}\rho_l(u)\,(2\pi)^{-j/2}\,\frac{[l+j]!}{[l]!}\,\mathcal{L}_{j+l}(M),
\end{aligned}
\]
5 Since the components $y^i_t$ are assumed suitably regular and $Z_j$ is just a random shift of each component, the regularity of $M\cap D_{Z_j}$ is essentially the same as that of the zero sets of the vector-valued field $y$.
where
\[
\Gamma_k(n) = \pi^{k(k-1)/4}\prod_{j=1}^{k}\Gamma\Big(n - \tfrac{1}{2}(j-1)\Big).
\]
We shall need two properties of the Wishart(n, Ik×k ) distribution. The first is
the rather immediate fact that for any W ∼ Wishart(n, Ik×k ) and any orthogonal
matrix A,
\[
A W A^{T} \stackrel{\mathcal{L}}{=} W. \tag{13.5.2}
\]
The second follows immediately from the fact that the density (13.5.1) can be rewritten as
\[
\frac{\prod_{j=1}^{k}\lambda_j^{(n-k-1)/2}\; e^{-\sum_{j=1}^{k}\lambda_j/2}}{2^{nk/2}\,\Gamma_k(n/2)},
\]
where $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $W$, so that
\[
\det(W) \stackrel{\mathcal{L}}{=} \prod_{j=1}^{k}\chi^2_{n+1-j}, \tag{13.5.3}
\]
where we read the right-hand side as “the product of k independent χ 2 variables, with
degrees of freedom running from n + 1 − k to n.’’ An immediate consequence of this
is that
\[
\mathbb{E}\{\det(W)\} = \prod_{j=1}^{k}(n+1-j) = \frac{n!}{(n-k)!}. \tag{13.5.4}
\]
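The chi-squared representation (13.5.3) and the moment identity (13.5.4) are easy to confirm by simulation. A minimal sketch (the sampler names and repetition counts are ours):

```python
import math
import random

def det_wishart_sample(n, k, rng):
    # One draw of det(W), W ~ Wishart(n, I_{k x k}), via (13.5.3):
    # det(W) is distributed as a product of independent chi-squared
    # variables with n, n-1, ..., n-k+1 degrees of freedom.
    d = 1.0
    for j in range(1, k + 1):
        dof = n + 1 - j
        d *= sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))
    return d

def mean_det(n, k, reps=100_000, seed=1):
    # Monte Carlo estimate of E{det(W)}; (13.5.4) predicts n!/(n-k)!.
    rng = random.Random(seed)
    return sum(det_wishart_sample(n, k, rng) for _ in range(reps)) / reps
```

For $n = 5$, $k = 2$ the prediction is $5!/3! = 20$, and the estimate should land within a few percent of it.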
Proof. Firstly, we note that the case j > k is trivial, since then α ≡ 0 by definition.
Secondly, by the linearity of double forms we note that it suffices to consider the case
\[
\alpha = \omega_1\wedge\cdots\wedge\omega_j \cdot \omega_1\wedge\cdots\wedge\omega_j \tag{13.5.5}
\]
for some orthonormal set $\{\omega_1, \ldots, \omega_j\}$. Note that from the definition (7.2.6) of the trace operator,
\[
\mathrm{Tr}^V(\alpha) = \mathrm{Tr}^V\big(\omega_1\wedge\cdots\wedge\omega_j\cdot\omega_1\wedge\cdots\wedge\omega_j\big) = 1. \tag{13.5.6}
\]
Finally, we claim that it also suffices to consider only the case j = k. To see this,
note that for an $\alpha$ of the above type,
\[
\mathrm{Tr}^{L_k}(\alpha) = \sum_{\{i_1, \ldots, i_j\}\subset\{1, \ldots, k\}}\alpha\big(\theta_{i_1}, \ldots, \theta_{i_j},\ \theta_{i_1}, \ldots, \theta_{i_j}\big),
\]
where $\{\theta_1, \ldots, \theta_k\}$ is a (random) orthonormal basis for $L_k$. Note that the individual terms in the sum all have an identical expectation, due to the assumed distributional invariance of $L_k$ (and its subspaces) under orthonormal transformation. Hence
\[
\mathbb{E}\big\{\mathrm{Tr}^{L_k}(\alpha)\big\} = \binom{k}{j}\,\mathbb{E}\big\{\alpha\big(\theta_1, \ldots, \theta_j,\ \theta_1, \ldots, \theta_j\big)\big\} = \binom{k}{j}\,\mathbb{E}\big\{\mathrm{Tr}^{L_j}(\alpha)\big\}. \tag{13.5.7}
\]
Thus, to compute the expected value of the trace of an $\alpha\in\Lambda^{j,j}(V)$ on a uniform subspace of dimension $k$, it suffices to compute it on a uniform subspace of dimension $j$. Consequently, we shall now concentrate on the form (13.5.5) for the case $j = k$.
To this end we introduce an auxiliary set of random vectors, $X_1, \ldots, X_k$, taken to be independent, identically distributed with distribution $\gamma_V$.6 Then, for $\alpha$ of the form (13.5.5), it follows from the definition of the trace operator that
where the matrix $g$ with entries
\[
g_{ij} = \langle X_i, X_j\rangle
\]
has the Wishart$(n, I_{k\times k})$ distribution and is independent of $L_k$.
Let $\widetilde V^* = \mathrm{span}\{\omega_1, \ldots, \omega_k\}$ and, using the usual identification of $V$ and its dual $V^*$, let $\widetilde V$ be the corresponding subspace of $V$. Define
\[
Y_i = P_{\widetilde V} X_i \sim \gamma_{\widetilde V}.
\]
Note that
6 Recall that this means that for any $x\in V^*$, the dual of $V$, the real-valued random variable $x(X)$ is distributed as $N(0, \|x\|^2)$.
However, the right-hand side here has a particularly simple distribution, since
\[
\tilde g_{ij} = \langle Y_i, Y_j\rangle, \qquad \tilde g \sim \mathrm{Wishart}(k, I_{k\times k}),
\]
the last equality following from (13.5.6). Combining this with (13.5.7) completes
the proof.
In particular,
\[
\mathbb{E}\Bigg\{\Bigg(U_{k+1} + \sum_{i,j=1}^{k} Z_i\, W^{-1/2}_{ij}\, U_j\Bigg)^{2l}\det(W)\Bigg\}
= (2\pi)^{-k/2}\,\frac{n!}{(n-k)!}\,\frac{\omega_{n-2l}}{\omega_{n-2l-k}}\,\mathbb{E}\big\{U_1^{2l}\big\}
= (2\pi)^{-k/2}\,\frac{(2l)!}{l!\,2^l}\,\frac{n!}{(n-k)!}\,\frac{\omega_{n-2l}}{\omega_{n-2l-k}}.
\]
where $X_n\sim\chi^2_n$ and $M_{X_n}$ is its moment-generating function. However, this is equal to
\[
\frac{d}{d\lambda}\bigg(\frac{1}{1-2\lambda}\bigg)^{n/2}\bigg|_{\lambda = -|Z|^2/2} = n\bigg(\frac{1}{1+|Z|^2}\bigg)^{n/2+1}.
\]
Substituting this into the last line of (13.5.8) gives that the expectation there is equivalent to
\[
\begin{aligned}
(2\pi)^{-k/2}\,&\mathbb{E}\Bigg\{\prod_{j=1}^{k-1}\chi^2_{n-j}\Bigg\}\, n\int_{\mathbb{R}^k}\big(1+|z|^2\big)^{l/2-n/2-1}\,\mathbb{E}\Bigg\{G_l\Bigg(\frac{U_{k+1}+\sum_{i=1}^{k} z_i U_i}{\sqrt{1+|z|^2}}\Bigg)\Bigg\}\,dz \qquad(13.5.9)\\
&= (2\pi)^{-k/2}\,\frac{n!}{(n-k)!}\,\mathbb{E}\{G_l(U_1)\}\int_{\mathbb{R}^k}\big(1+|z|^2\big)^{l/2-n/2-1}\,dz.
\end{aligned}
\]
Finally, since the integral here is, effectively, that of a multivariate t density with
$n - l - k + 2$ degrees of freedom and covariance parameter $(n-l-k+2)^{-1} I_{k\times k}$ (cf. [12]), we have
\[
\int_{\mathbb{R}^k}\big(1+|z|^2\big)^{l/2-n/2-1}\,dz = \pi^{k/2}\,\frac{\Gamma\big(\frac{n-l-k}{2}+1\big)}{\Gamma\big(\frac{n-l}{2}+1\big)} = \frac{\omega_{n-l}}{\omega_{n-l-k}},
\]
and after putting all the pieces together, the proof is done.
Now suppose that $\widetilde G_r$ is not identically 1. Then the proof proceeds exactly as above up to the point (13.5.9), keeping in mind that we had made the substitution
\[
z_i \;\to\; \tilde z_i = \sum_{j=1}^{k} W^{-1/2}_{ij}\, z_j,
\]
so that the final integral there is replaced with
\[
\int_{\mathbb{R}^k}\widetilde G_r(z)\,\big(1+|z|^2\big)^{l/2-n/2-1}\,dz.
\]
As we have mentioned more than once before, one of the oldest and most important
problems in the study of stochastic processes of any kind is to evaluate the excursion
probabilities
\[
\mathbb{P}\Big\{\sup_{t\in T} f(t) \ge u\Big\}, \tag{14.0.1}
\]
where f is a random process over some parameter set T . As usual, we shall restrict
ourselves to the case in which f is centered and Gaussian and T is compact for the
canonical metric of (1.3.1).
In this chapter we shall restrict ourselves to the case of parameter sets that are
locally convex manifolds, and so shall write M rather than T for the parameter space.
The reason why we require local convexity will become clear from a counterexample
in Section 14.4.4.
In particular, we shall show that for such M and for smooth Gaussian random
fields f ,
\[
\bigg|\,\mathbb{P}\Big\{\sup_{t\in M} f(t)\ge u\Big\} - \mathbb{E}\big\{\varphi\big(A_u(f, M)\big)\big\}\,\bigg| < O\big(e^{-\alpha u^2/2\sigma^2}\big), \tag{14.0.2}
\]
\[
C_0\,\Psi\Big(\frac{u}{\sigma}\Big) + e^{-u^2/2\sigma^2}\sum_{j=1}^{N} C_j\, u^{N-j} + \mathrm{error},
\]
it would be natural to expect that the error term here would be the “next’’ term of what seems like the beginning of an infinite expansion for the excursion probability, and so of order $u^{-1}e^{-u^2/2\sigma^2}$. However, (14.0.3) indicates that this is not the case.
Since we do not know much about how to compute excursion probabilities, but we do know a lot about computing expectations, we turn the probability problem into an expectation problem by introducing the random variable
$$M_u \;=\; \text{the number of global suprema of } f \text{ above the level } u.$$
Suppose that f has, almost surely, a unique global supremum, a situation that will
generally hold (as we shall show) for smooth random fields. Then
$$M_u = \begin{cases} 1, & \sup_{t\in M} f(t) \ge u,\\[2pt] 0, & \text{otherwise.}\end{cases}$$
In order to compute $E\{M_u\}$ we need to find a point process representation for $M_u$ itself, much as we did for the Euler characteristic, in which case the representation came ready-made from Morse theory. We can then use the theory of Chapter 12 to develop an integral expression for $E\{M_u\}$, analogous to those we developed for computing $E\{\varphi(A_u)\}$. Although we were actually able to evaluate the integral expression for $E\{\varphi(A_u)\}$, it turns out to be impossible to evaluate it for $E\{M_u\}$. Consequently, we shall take the difference between the two expressions and work on this in order to bound $|E\{M_u\} - E\{\varphi(A_u)\}|$ or, equivalently, $|P\{\sup_{t\in M} f(t) \ge u\} - E\{\varphi(A_u)\}|$. All
of this is done in Sections 14.1 and 14.2 for quite general smooth random fields.
In other words, Gaussianity is not assumed, and so these arguments lay the basis
for quite a general theory. In Section 14.3 we specialize to the Gaussian case and
derive bounds that begin to look computable. This is the central Theorem 14.3.3 we
mentioned above. Finally, in Section 14.4, we shall see how to apply Theorem 14.3.3
in specific Gaussian scenarios, for which, after some work, the bounds become simple
and explicit.
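Before diving into the technicalities, the flavor of (14.0.2) is easy to see numerically in the simplest setting: a stationary, unit-variance Gaussian process on an interval $[0,T]$, where the expected Euler characteristic of the excursion set is $E\{\varphi(A_u)\} = \Psi(u) + (T/2\pi)\lambda_2^{1/2} e^{-u^2/2}$ (Rice's upcrossing formula plus the contribution of the left endpoint). The Monte Carlo sketch below is our illustration, not from the text; it assumes the covariance $\rho(t) = e^{-t^2/2}$, for which $\lambda_2 = 1$:

```python
import math
import numpy as np

T, u, n_grid, n_reps = 2.0, 2.5, 61, 20000
t = np.linspace(0.0, T, n_grid)

# covariance matrix of the process on the grid, with tiny jitter so that the
# (analytically positive definite, numerically near-singular) matrix factors
C = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2) + 1e-8 * np.eye(n_grid)
L = np.linalg.cholesky(C)

rng = np.random.default_rng(0)
paths = rng.standard_normal((n_reps, n_grid)) @ L.T
mc_prob = float(np.mean(paths.max(axis=1) >= u))   # Monte Carlo P{sup f >= u}

Psi = 0.5 * math.erfc(u / math.sqrt(2.0))                      # P{N(0,1) >= u}
ec_approx = Psi + (T / (2 * math.pi)) * math.exp(-u * u / 2)   # E{phi(A_u)}

print(mc_prob, ec_approx)  # both close to 0.020
```

The two numbers agree to well within Monte Carlo error, which is the content of the "Euler characteristic heuristic" this chapter makes rigorous.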
The first step of this procedure, that of finding a useful point process representation
for Mu , turns out to be surprisingly difficult and extremely technical. We therefore
start with it.
We assume that $\tilde h \in C^2(\widetilde M)$ and that $h = \tilde h|_M$ is a Morse function on $M$.
Our first representation of global suprema lies in the following lemma, whose main contribution lies in its third set of conditions.
Then t is a maximizer of h above the level u if and only if the following three conditions hold:
(i) $h(t) \ge u$.
(ii) $\tilde\nabla \tilde h(t) \in N_t M$, where $N_t M$ is the normal cone to $M$ at $t$. Thus $t$ is an extended outward critical point of $h$ (cf. Definition 9.3.4).
(iii) For all $s \in M \setminus \{t\}$,
$$\frac{h(s) - \alpha^t(s)\,h(t)}{1 - \alpha^t(s)} \;<\; h(t).$$
Note that the condition that $\alpha^t(t) = 1$ ensures that for each $t \in M$, $t$ is a critical point of $\alpha^t$.
On the other hand, if $\alpha^t(s) = 1$, then, by choice of $\alpha^t$, $h(s) = h(t)$, which proves the claim in this case. Now suppose that $t$ is not a maximizer of $h$. Then there exists $s \in M \setminus \{t\}$ such that $h(s) > h(t)$. In particular, for such an $s$, our choice of $\alpha^t$ implies that $\alpha^t(s) < 1$. It follows that
$$h(t) \;<\; \frac{h(s) - \alpha^t(s)\,h(t)}{1 - \alpha^t(s)},$$
which yields a contradiction.
The limit (14.1.1) follows from two applications of l'Hôpital's rule. Specifically, we note that $t$ is a critical point of $h|_{\partial_i M}$ by assumption, and the properties of $\alpha^t$ imply that $t$ must also be a nondegenerate critical point of $\alpha^t$. Therefore,
$$\lim_{u\to 0}\frac{h(c(u)) - \alpha^t(c(u))\,h(t)}{1-\alpha^t(c(u))} \;=\; \lim_{u\to 0}\frac{\frac{d}{du}\left[h(c(u)) - \alpha^t(c(u))\,h(t)\right]}{\frac{d}{du}\left[1-\alpha^t(c(u))\right]} \;=\; \lim_{u\to 0}\frac{\frac{d^2}{du^2}\left[h(c(u)) - \alpha^t(c(u))\,h(t)\right]}{\frac{d^2}{du^2}\left[1-\alpha^t(c(u))\right]}.$$
To compute the numerator and denominator here, note the following: Fix a $t \in \partial_i M$ and take $\beta \in C^2(\partial_i M)$ for which $t$ is a critical point of $\beta$. Let $c : (-\delta,\delta) \to \partial_i M$ be a $C^2$ curve with $c(0) = t$ and $\dot c(0) = X_t$. Then
$$\lim_{u\to 0}\frac{d^2}{du^2}\,\beta(c(u)) \;=\; \nabla^2\beta(t)(X_t, X_t).$$
Apply this fact to each term in the last ratio above and use the assumed properties of $\alpha^t$ to complete the proof.
(Of course, this replacement for (iii) means that (i) and (ii) must automatically hold.) This seems far simpler than the original (iii) and does not require the introduction of the seemingly unnecessary function $\alpha^t$.
However, now think of $h \equiv f$ as a mean zero, unit-variance Gaussian field, and let $\alpha^t(s) = \rho(t,s)$, where $\rho$ is the correlation function of $f$. It is trivial to check that for each $t \ne s$ in $M$,
$$E\left\{f(t)\,f^t(s)\right\} \equiv 0,$$
so that $f(t)$ and $\sup_{s\in M\setminus\{t\}} f^t(s)$ are independent, something that is not true of $f(t)$ and $\sup_{s\in M\setminus\{t\}} f(s)$. It is to gain this independence that we have introduced $\alpha^t$ and $h^t$. In fact, if life were good, we would stop here and count ourselves lucky at having already found a useful point process representation for global suprema, as required in the outline of our plan of attack. However, life is not always good, and indeed, we shall soon see that the function $(s,t) \mapsto h^t(s)$ can have some unpleasant properties.
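The independence claim is easy to check empirically for a single pair of points. With $\rho = \rho(t,s)$ fixed, $f(t)$ and $f^t(s) = (f(s) - \rho f(t))/(1-\rho)$ (the interior form of $f^t$, without any boundary correction term) are jointly Gaussian with covariance $(\rho - \rho)/(1-\rho) = 0$. The simulation below is ours, with an arbitrary choice $\rho = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.5, 200000

# jointly Gaussian pair (f(t), f(s)) with unit variances and correlation rho
ft = rng.standard_normal(n)
fs = rho * ft + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)

# the adjusted process f^t evaluated at s
f_t_of_s = (fs - rho * ft) / (1.0 - rho)

emp_cov_raw = float(np.cov(ft, fs)[0, 1])        # close to rho = 0.5
emp_cov_adj = float(np.cov(ft, f_t_of_s)[0, 1])  # close to 0: decorrelated

print(emp_cov_raw, emp_cov_adj)
```

Since the pair is jointly Gaussian, zero covariance is full independence, which is exactly the property exploited in the text.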
Despite all that we have said so far about the functions $h^t$, we have not looked too closely at their properties. From (14.1.1) we know that $h^t(s)$ is continuous as $s \to t$ in $\partial_j M$, but we have not looked at other directions. In fact, in other directions $h^t$ is badly behaved. It is straightforward to check, again by two applications of l'Hôpital's rule, that
$$\tilde\nabla\tilde h(t) \in N_t M \;\Longleftrightarrow\; \sup_{s\in M\setminus\{t\}} h^t(s) < \infty, \tag{14.1.2}$$
$$\tilde\nabla\tilde h(t) \in N_t M \;\Longrightarrow\; \inf_{s\in M\setminus\{t\}} h^t(s) = -\infty.$$
In other words, there are paths of approach to $t$, in $M$ but not in $\partial_j M$, along which the behavior of $h^t$ is highly singular. We now begin attempting to resolve this singularity, with the solution finally appearing in Corollary 14.1.5, which gives a usable point process representation for global maximizers. The stochastic version of the solution appears in Corollary 14.1.6.
Our aim in this section is to replace the function $h^t$ we studied above by something smoother, although, unfortunately, it will be somewhat more involved. In order to profit from the characterization of global maxima as certain types of extended outward critical points (cf. Lemma 14.1.1), we shall ensure that the replacement function is identical to our original $h^t$ at critical points. Furthermore, we shall show that for this new function, which we shall continue to denote by $h^t$, the function
$$W(t) = \sup_{s\in M\setminus\{t\}} h^t(s) \tag{14.1.3}$$
is well behaved. In the Euclidean setting we take the support cones to be $S_t M = T_t\partial_i M\times K_t$ and write
$$\tilde K = L\times K \subset \mathbb{R}^q = \widetilde M, \qquad \mathcal{X} = L\times\left(K\setminus\{0\}\right), \qquad \partial\mathcal{X} = L\times\left(K\cap S(\mathbb{R}^q)\right),$$
together with the off-diagonal set
$$(L\times K)\setminus\left\{(t,s)\in L\times K : t = s\right\}.$$
$$\alpha^t(s) = \alpha(t,s)$$
satisfies the conditions of Lemma 14.1.1 at every $t \in L$, and such that the Hessian of the partial map $\alpha^t$ is nondegenerate at every $t \in L$. Then any $\tilde h \in C^2(\mathbb{R}^q)$ maps to a continuous function $\tilde h_{\alpha,K}$ on $B(K)$ as follows:
$$\tilde h_{\alpha,K}(t,s) \;=\; \frac{\tilde h(t+s) - \alpha(t,t+s)\,\tilde h(t) - \left\langle \tilde\nabla \tilde h(t),\, s\right\rangle_{\mathbb{R}^q}}{1 - \alpha(t,t+s)}$$
if $(t,s)$ is not in $\partial\mathcal{X}$, and
$$\tilde h_{\alpha,K}(t,s) \;=\; \lim_{(u,v)\to(t,s)} \tilde h_{\alpha,K}(u,v) \;=\; \tilde h(t) - \frac{\nabla^2\tilde h(t)(s,s)}{\nabla^2\alpha(t)(s,s)}$$
for $(t,s) \in \partial\mathcal{X}$.
The term
$$\frac{\left\langle\tilde\nabla\tilde h(t),\, s\right\rangle_{\mathbb{R}^q}}{1-\alpha(t,t+s)}$$
above "resolves'' the singularity along the diagonal in some sense. In effect, it forces every $t \in L$ to be a critical point of the map
$$s \;\mapsto\; \frac{\tilde h(t+s) - \alpha(t,t+s)\,\tilde h(t) - \left\langle\tilde\nabla\tilde h(t),\, s\right\rangle_{\mathbb{R}^q}}{1 - \alpha(t,t+s)}.$$
To see how to exploit this fact, recall that our motivation for introducing $\tilde h_{\alpha,K}$ was to describe the singularities in the function $h^t(s)$ at critical points $t$ of $h|_L$. (Recall also that $L$ takes the place of $T_t\partial_i M$ in our general cone $\tilde K$.) We are therefore interested in critical points $t$ of $h|_L = \tilde h|_L$. Note that if $t$ is a critical point of $h|_L$, then, for all $s \in \mathbb{R}^q$,
With this final (re)definition of $h^t$, (14.1.6) holds for all $t \in L$, and furthermore, for each critical point $t$ of $h|_L$ the two definitions of $h^t$ coincide. For the remainder of this section, as long as we remain in the Euclidean setting, we use the definition (14.1.7). We shall, however, need to adjust it somewhat when we return to the piecewise smooth scenario.
We now derive some properties of our new $h^t$.
Lemma 14.1.3. $P_{L^\perp}\nabla h(t) \in K^*$, the dual of $K \subset L^\perp$, if and only if, for any bounded neighborhood $O_t$ of $t$,
$$h^t(s) \;\le\; \tilde h_{\alpha,K}(t,\, s-t),$$
since the numerator on the right of (14.1.6) is strictly positive of order $O(\delta)$ for $\delta$ small, while the denominator is of order $O(\delta^2)$.
Lemma 14.1.4. If $P_{L^\perp}\nabla h(t) \in (K^*)^\circ$, then, for any bounded neighborhood $O$ of the origin in $\mathbb{R}^q$,
$$W_O(t) \;=\; \sup_{s \in K\cap(\{t\}\oplus O)\setminus\{t\}} h^t(s) \;=\; \sup_{v \in K\cap(O\setminus\{0\})}\left[\tilde h_{\alpha,K}(t,v) + \frac{\left\langle P_{L^\perp}\nabla h(t),\, v\right\rangle_{\mathbb{R}^q}}{1 - \alpha(t, t+v)}\right]$$
is continuous in $t$.
Proof. We first note that by (14.1.6) the two suprema above are equal, and so it suffices to consider the second one.
Consider a convergent sequence $(t, v_n(t))$ in $B(K)$ along which the supremum $W_O(t)$ is approached. Then either $\|v_n(t)\|_{\mathbb{R}^q}$ is bounded away from 0 for all $n$ sufficiently large or $\|v_n(t)\|_{\mathbb{R}^q} \to 0$. In the first case it is immediate that $(t, v^*(t))$ is in $\mathcal{X}$, and, as we will show in a moment, in the second case $(t, v^*(t)) \in \partial\mathcal{X} \cap (L \times S(L))$, where $S(L)$ is the unit sphere in $L$. In other words, the limiting direction $v^*(t)$ is in $S(L)$. Furthermore, if $\|v_n(t)\|_{\mathbb{R}^q} \to 0$, then it is easy to see that the sequence $(t, P_L v_n(t))$ also achieves the supremum $W_O(t)$.
To see why $v^*(t)$ must be in $S(L)$ in the second case, suppose that $(t, v_n(t))$ converges to $(t, v^*(t))$ with $v^*(t) \in K \setminus L$. Since $P_{L^\perp}\nabla h(t) \in (K^*)^\circ$ it follows that
The above implies that for any $t \in L$ and any convergent sequence $(t, v_n(t))$ achieving $W_O(t)$ we can assume either that $\|v_n(t)\|$ is bounded uniformly below, or that $v_n(t) \in L$ for all $n$. Continuity of $W_O(t)$ now follows, since for such sequences both
$$\sup_{s\in B(t,\delta(\varepsilon))}\left|\tilde h_{\alpha,K}(t, v_n(t)) - \tilde h_{\alpha,K}(s, v_n(t))\right|$$
and
$$\sup_{s\in B(t,\delta(\varepsilon))}\left|\frac{\left\langle P_{L^\perp}\nabla h(t),\, v_n(t)\right\rangle_{\mathbb{R}^q}}{1-\alpha(t,\, t+v_n(t))} - \frac{\left\langle P_{L^\perp}\nabla h(s),\, v_n(t)\right\rangle_{\mathbb{R}^q}}{1-\alpha(s,\, s+v_n(t))}\right|$$
can be made arbitrarily small.
Lemmas 14.1.3 and 14.1.4 give us what we need for the Euclidean scenario, that is, a candidate for $h^t$ that is well behaved and that we can plug into our characterization of global suprema. The same arguments also work for piecewise smooth spaces, after some slight modifications.
Specifically, the map defined in (14.1.4) has no natural replacement for a piecewise smooth space. However, we can think of $B(K)$ as a subset of the tangent bundle $T(\mathbb{R}^q)$, so that we can think of this map as a map from $\mathbb{R}^q\times\mathbb{R}^q$ to $T(\mathbb{R}^q)$. Translating this to the piecewise smooth setting, we should therefore look for a replacement map, say $H$, from $\widetilde M\times\widetilde M$ to $T(\widetilde M)$ such that for each $t \in \widetilde M$,
$$H\left(\{t\}\times\widetilde M\right) \;\subset\; T_t\widetilde M. \tag{14.1.8}$$
One of the key properties of the Euclidean map was that for sequences $(t_n, s_n) \in \mathcal{X}$ converging to $(t, 0) \in \partial\mathcal{X}$ for which the unit vector
$$\frac{t_n - s_n}{\|t_n - s_n\|_{\mathbb{R}^q}}$$
Note that at critical points $t$ of $h|_{\partial_i M}$ we always have that $\nabla h|_{\partial_i M}(t) = 0$, so that the term in $\tilde g$ in (14.1.10) disappears, as it does for (i) in Lemma 14.1.5 below. Having made this observation, by working in suitably chosen charts, it is not difficult to prove the following lemma.
for every $t \in M$, and such that the Hessian of the partial map $\alpha^t$ is nondegenerate at every $t \in M$. Furthermore, suppose $H : \widetilde M\times\widetilde M \to T(\widetilde M)$ satisfies (14.1.8) and (14.1.9), and let $h^t$ be defined as in (14.1.10).
Then $t$ is a maximizer of $h$ above the level $u$ if and only if the following three conditions hold:
(i) $h(t) - \tilde g\left(H(t,t),\, \nabla h|_{\partial_i M}(t)\right) \ge u$.
(ii) $\tilde\nabla\tilde h(t) \in N_t M$.
(iii) For all $s \in M\setminus\{t\}$, $h(t) \ge h^t(s)$; that is, $h(t) \ge W(t) = \sup_{s\in M\setminus\{t\}} h^t(s)$,
where $P^\perp_{T_t\partial_i M}$ represents projection onto the orthogonal complement of $T_t\partial_i M$ in $T_t\widetilde M$.
To appreciate why all of this hard work has been necessary to characterize something as simple as a supremum, we first need to translate it all to a stochastic setting.
We also place a new condition on $\rho$ that we have not met before. It ensures that the map $t \mapsto \tilde f(t)$ is an embedding of $\widetilde M$ into $L^2(\Omega, \mathcal{F}, P)$ and rules out "global'' singularities in the process, a term whose meaning will become clearer later. The condition is
$$\rho(t,s) = 1 \;\Longleftrightarrow\; t = s. \tag{14.1.12}$$
This condition rules out, for example, periodic processes on $\mathbb{R}$. However, as we shall see in Section 14.4.4, there is a good reason why this should be the case.
With the above definitions, we can now choose candidates for our functions $\alpha$ and $H$ of the previous subsection. Specifically, we take an orthonormal (with respect to $\tilde g$) frame field $(X_{1,t}, \ldots, X_{\dim\widetilde M,\, t})$ on $T_t\widetilde M$ and set
$$\alpha(t,s) = \rho(t,s), \tag{14.1.13}$$
$$H(t,s) = F(t,s) = \sum_{j=1}^{\dim\widetilde M} \mathrm{Cov}\left(\tilde f(s),\, X_j \tilde f(t)\right) X_{j,t}. \tag{14.1.14}$$
Corollary 14.1.6. In addition to the above assumptions, assume that the maximizers of $\tilde f$ are almost surely isolated and that there are no critical points $t$ of $f|_{\partial_i M}$ such that $P^\perp_{T_t\partial_i M}\nabla\tilde f(t) \in \partial N_t M \subset T_t\partial_i M^\perp$. Then the maximizers of $f$ are the points $t \in \partial_i M$, $0 \le i \le N$, satisfying the following four conditions:
(i) $f(t) - \tilde g\left(F(t,t),\, \nabla f|_{\partial_i M}(t)\right) \ge u$.
(ii) $\nabla f|_{\partial_i M}(t) = 0$.
(iii) $P^\perp_{T_t\partial_i M}\nabla\tilde f(t) \in \left(P^\perp_{T_t\partial_i M} N_t M\right)^\circ$.
(iv) $f(t) \ge \sup_{s\in M\setminus\{t\}} f^t(s)$,
where
$$f^t(s) = \begin{cases}\dfrac{f(s) - \rho(t,s)f(t) - \tilde g\left(F(t,s) - \rho(t,s)F(t,t),\, \nabla f|_{\partial_i M}(t)\right)}{1-\rho(t,s)}, & \rho(t,s) \ne 1,\\[8pt] f(s), & \rho(t,s) = 1.\end{cases} \tag{14.1.15}$$
Note that, for later convenience, we have split the condition that $t$ be an extended outward critical point into two parts, (ii) and (iii).
Proof. Note first that since there are no critical points $t$ of $f|_{\partial_i M}$ such that $P^\perp_{T_t\partial_i M}\nabla\tilde f(t) \in \partial N_t M \subset T_t\partial_i M^\perp$, all global maximizers will be such that $P^\perp_{T_t\partial_i M}\nabla\tilde f(t)$ is in the relative interior of $N_t M$ in $T_t\partial_i M^\perp$. Thus (ii) and (iii) are equivalent to the requirement that $t$ be an extended outward critical point.
Next, note that while $\alpha$ and $\rho$ have been assumed to have almost identical properties, they differ in one important aspect: while we assumed that $\alpha^t(s) = 1$ implied $h(t) = h(s)$, the same cannot be said to hold when $\rho(t,s) = 1$ unless we also assume that $E\{f(t)\}$ is constant and $E\{f^2(t)\} = 1$ for all $t$. Nevertheless, a quick check through the previous proofs will confirm that, given the definition of $f^t$, this difference affects none of the arguments.
Having noted these two facts, we can apply Lemma 14.1.5 on an almost sure basis, with $h^t$ replaced by $f^t$, $\alpha$ by $\rho$, and $H$ by $F$.
We now have enough to finally see the link between $\varphi(A_u)$, the Euler characteristic of the excursion set $\{t \in M : f(t) \ge u\}$, and the global supremum of $f$. To do this, we need one more assumption: that the global maximizer of the process $f$ is almost surely unique. In this case, it follows from Corollary 14.1.6, under the assumptions made there along with uniqueness of the global supremum, that
$$1_{\left\{\sup_{t\in M} f(t)\ge u\right\}} \;=\; \#\left\{\text{extended outward critical points } t \text{ of } f \text{ for which } f(t) \ge u,\ \sup_{s\in M\setminus\{t\}} f^t(s) \le f(t)\right\}. \tag{14.1.16}$$
On the other hand, if we return to Corollary 9.3.4, which gives a point set characterization of the Euler characteristic of excursion sets over piecewise smooth spaces, and compare this to the above, we immediately obtain that
$$\left|1_{\left\{\sup_{t\in M} f(t)\ge u\right\}} - \varphi\left(A_u(f,M)\right)\right| \;\le\; \#\left\{\text{extended outward critical points } t \text{ of } f \text{ for which } f(t) \ge u,\ \sup_{s\in M\setminus\{t\}} f^t(s) > f(t)\right\}.$$
While correct, this relationship is not terribly useful, since we have no way of computing the right-hand side of the inequality. What turns out to be a lot more accessible is to evaluate the expectations of each term on the left and then work with the difference. To do this, we need additional regularity conditions on $f$, and we adopt those of Theorem 11.3.4, along with the assumption that, for all $t \in M$, the relevant random vectors have a density, bounded by some constant $K(t,s)$. Then it is an easy consequence of Lemma 11.2.10 that
where
$$W(t) = \sup_{s\in M\setminus\{t\}} f^t(s),$$
and where $p_{\nabla f|_{\partial_i M},E_i}(t)$ is the density of $\nabla f|_{\partial_i M,E_i}(t)$ and $\mathcal{H}_i$ is the volume measure induced by the Riemannian metric $g$.
Arguing as in the proof of Theorem 12.4.2 (cf. also (12.4.4)), it is not hard to see that, if you check through the proofs leading to Theorem 12.1.1 (which is still "straightforward'' but would take some time), this is all we actually need.
The discrepancy between the mean Euler characteristic of an excursion set and the supremum distribution is therefore given by (14.1.18).
There are three remarkable facts to note about (14.1.18). The first is that it is an identity, not an approximation or a bound. The second is that it holds for a wide class of smooth random fields over a large class of piecewise smooth parameter spaces. The last is that these smooth random fields were never assumed to be Gaussian.
Given (14.1.18), it is not difficult to guess how the rest of the proof should proceed. The key point is that the expectation is taken only over the event $A^{\mathrm{ERR}}_t$, which is included in the event $\{u \le f(t) \le W(t)\}$. Now, if you remember our brief stochastic diversion at the end of Section 14.1.1, you will recall that if $f$ is Gaussian with constant mean and unit variance, then $f(t)$ and $W(t)$ are independent. Thus
$$P\left\{A^{\mathrm{ERR}}_t\right\} \;\le\; P\{u \le f(t) \le W(t)\} \;\le\; P\{u \le f(t)\}\times P\{u \le W(t)\}. \tag{14.1.19}$$
We know from the very general theory of Chapter 4 that for the smooth random fields that we are considering it is likely that
$$P\left\{\sup_{t\in M} f(t) \ge u\right\} \;\approx\; u^{\alpha}\,\sup_{t\in M} P\{u \le f(t)\},$$
for some $\alpha$, where, since the current argument is heuristic, we make no attempt to give a precise mathematical meaning to the $\approx$ here. Furthermore, we should also be able to bound the rightmost probability in (14.1.19) by the general theory of Chapter 4, since it is just a Gaussian excursion probability. If so, it too is going to be exponentially small. Putting these two facts into (14.1.19) makes $P\{A^{\mathrm{ERR}}_t\}$ superexponentially small (when compared to $P\{\sup_{t\in M} f(t) \ge u\}$) and, modulo technicalities, we basically have the result promised at the beginning of the chapter. However, since that result also promised explicit bounds on the rate at which $\mathrm{Diff}_{f,M}(u) \to 0$ as $u \to \infty$, we are also going to have to do some more detailed computation.
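To see the orders of magnitude in the heuristic above, suppose, purely for illustration, that $W(t)$ has a Gaussian-like tail with a hypothetical "critical variance'' $\sigma_c^2 < 1$. Then the product bound in (14.1.19) decays like $e^{-(1+\sigma_c^{-2})u^2/2}$, which is superexponentially smaller than the tail $\Psi(u)$ itself. A quick numerical check (ours, not in the text):

```python
import math

def Psi(u):
    # Gaussian tail probability P{N(0,1) >= u}
    return 0.5 * math.erfc(u / math.sqrt(2.0))

sigma_c2 = 0.5  # hypothetical critical variance of W(t), illustration only
for u in (2.0, 3.0, 4.0):
    bound = Psi(u) * Psi(u / math.sqrt(sigma_c2))  # product bound, (14.1.19)
    ratio = bound / Psi(u)                         # = Psi(u / sigma_c)
    print(u, ratio)  # the ratio collapses superexponentially fast in u
```

The ratio is already below $10^{-7}$ at $u = 4$, which is the sense in which the error is negligible relative to the excursion probability.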
Of course, what is clear from the above description is that to truly exploit (14.1.18) we are going to have to make distributional assumptions. Above, we have assumed
so that
$$\tilde f^t(c(u)) \;=\; \frac{f(c(u)) - \rho(t, c(u))\,f(t)}{1 - \rho(t, c(u))}.$$
The key to circumventing this problem relies on two facts. First, we are really concerned only with large positive values of $f^t$, and, second, we care about the behavior of $f^t$ only on the set
$$\left\{(t,\omega) : P^\perp_{T_t\partial_i M}\nabla\tilde f(t)(\omega) \in \left(P^\perp_{T_t\partial_i M} N_t M\right)^\circ\right\}. \tag{14.2.1}$$
We exploit these facts and introduce in this section a process $\bar f^t$ that has, under appropriate conditions, finite variance while dominating $f^t$ on the set (14.2.1). The good news is that this is the last time we shall have to "adjust'' $f^t$. The final adjustment is important, for not only does it make the proofs work, but the variance of $\bar f^t$ will appear as an important parameter in our final result, related to the $\alpha$ in (14.0.3).
The process $\bar f^t$ is defined by
$$\bar f^t(s) = \begin{cases}\dfrac{f(s) - \rho(t,s)f(t) - \tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, \nabla\tilde f(t)\right)}{1 - \rho(t,s)}, & \rho(t,s)\ne 1,\\[8pt] f(s) - \tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, \nabla\tilde f(t)\right), & \rho(t,s) = 1.\end{cases} \tag{14.2.2}$$
Lemma 14.2.1. We adopt the setting of Section 14.1.4 and the assumptions of Corollary 14.1.6. Then, for every $t$ in the set (14.2.1) of extended outward critical points and for every $s \in M$,
$$\bar f^t(s) \;\ge\; f^t(s).$$
Proof. Since $N_t M$ is a convex cone, it follows that
$$Y_t - P_{N_t M} Y_t \;\in\; N_t M^*$$
for every $Y_t \in T_t\widetilde M$. Since $F^t(s) \in T_t\widetilde M$ for each $s$, the first claim holds.
As for the second, if $t \in \partial_N M$, then $N_t M = T_t\partial_N M^\perp$ and $P_{N_t M}V_t = 0$ for all $V_t \in T_t\partial_N M$. Similarly,
$$\tilde g\left(V_t,\, P^\perp_{T_t\partial_i M}\nabla\tilde f(t)\right) = 0.$$
As far as continuity is concerned, it is not difficult to show that, almost surely, Lemma 14.1.5 holds with $f^t$ replaced by $\bar f^t$, i.e., that $W(t)$ is continuous on the set (14.2.1), where
$$B^{\mathrm{ERR}}_t = \left\{u \le f(t) \le W(t)\right\}.$$
Corollary 14.2.3. With $M$ as in Theorem 14.2.2, assume that $\tilde f$ is a zero-mean Gaussian field satisfying the conditions of Corollary 11.3.5 and that (14.1.12) holds. Then (14.2.5) holds.
We now come to what is the main section of this chapter, and the last part of the program we described at its beginning. That is, we shall now actually obtain an explicit bound for the difference between excursion probabilities and $E\{\varphi(A_u)\}$.
Our first step will be to show that for each $t \in M$, the process $\bar f^t$ of (14.2.2) is uncorrelated with the random vector $\nabla f|_{\partial_i M}(t)$. Hence, in the Gaussian case, $\bar f^t(s)$ is independent of $\nabla f|_{\partial_i M}(t)$. The following lemma requires no side conditions beyond the basic ones on $M$ and the requirement that $f$ and $\rho$ be smooth enough for all terms to be well defined. Since this is much weaker than the conditions of Theorem 14.2.2, we will leave more details for later. It also does not require that $f$ be Gaussian, nor that it have constant variance.
for every $X_t \in T_t\partial_i M$,
where probably the best way to check the last line is to expand all the vectors in the middle expression and then compute.
If, on the other hand, $\rho(t,s) = 1$, then a similar computation shows that
$$\mathrm{Cov}\left(\bar f^t(s),\, X_t f\right) \;=\; \mathrm{Cov}\left(f(s),\, X_t f\right) - \mathrm{Cov}\left(\tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, \nabla\tilde f(t)\right),\, X_t f\right) \;=\; \mathrm{Cov}\left(f(s),\, X_t f\right) - \tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, X_t\right).$$
The conclusion will therefore follow once we prove that, for every $t \in M$,
$$\mathrm{Cov}\left(f(s),\, X_t f\right) \;=\; \tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, X_t\right)$$
and
$$\mathrm{Cov}\left(f(s) - \rho(t,s)f(t),\, X_t f\right) \;=\; \tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, X_t\right).$$
Since the two arguments are similar, we prove only the first equality. The map $F^t$ can be decomposed as
$$F^t(s) = P_{T_t\partial_i M}F^t(s) + P^\perp_{T_t\partial_i M}F^t(s),$$
where
$$P_{T_t\partial_i M}F^t(s) = \sum_{j=1}^{i}\mathrm{Cov}\left(f(s),\, X_j f(t)\right)X_{j,t}, \qquad P^\perp_{T_t\partial_i M}F^t(s) = \sum_{j=i+1}^{q}\mathrm{Cov}\left(f(s),\, X_j f(t)\right)X_{j,t},$$
and $(X_{1,t},\ldots,X_{q,t})$ is chosen such that $(X_{1,t},\ldots,X_{i,t})$ forms an orthonormal basis for $T_t\partial_i M$, while $(X_{i+1,t},\ldots,X_{q,t})$ forms an orthonormal basis for $T_t\partial_i M^\perp$, the orthogonal complement of $T_t\partial_i M$ in $T_t\widetilde M$.
Furthermore, since $\tilde g(X_t, V_t) = 0$ for every $X_t \in T_t\partial_i M$ and $V_t \in N_t M$, it follows that
$$P_{N_t M}F^t(s) = P_{N_t M}P^\perp_{T_t\partial_i M}F^t(s),$$
and, for every $X_t \in T_t\partial_i M$,
$$\begin{aligned}
\tilde g\left(F^t(s) - P_{N_t M}F^t(s),\, X_t\right) &= \tilde g\left(P_{T_t\partial_i M}F^t(s),\, X_t\right) + \tilde g\left(P^\perp_{T_t\partial_i M}F^t(s) - P_{N_t M}P^\perp_{T_t\partial_i M}F^t(s),\, X_t\right)\\
&= \tilde g\left(P_{T_t\partial_i M}F^t(s),\, X_t\right)\\
&= \sum_{j=1}^{i}\mathrm{Cov}\left(f(s),\, X_j f(t)\right)\tilde g\left(X_{j,t},\, X_t\right)\\
&= \mathrm{Cov}\left(f(s),\, X_t f\right),
\end{aligned}$$
and we are done.
If we now assume that $f$ is Gaussian, then the lack of correlation of Lemma 14.3.1 becomes independence, and the independence between the process $\bar f^t$ and $\nabla f|_{\partial_i M}(t)$ allows us to remove the conditioning on $\nabla f|_{\partial_i M}(t)$ in the expression for $\mathrm{Diff}_{f,M}(u)$, regardless of whether $f$ has constant variance.
Corollary 14.3.2. Suppose $f$ is a Gaussian process satisfying the conditions of Corollary 14.2.3. Then, with the notation of Theorem 14.2.2,
$$|\mathrm{Diff}_{f,M}(u)| \;\le\; \sum_{i=0}^{N}\int_{\partial_i M} E\left\{\left|\det\left(-\nabla^2 f|_{\partial_i M,E_i}(t)\right)\right|\,1_{C_t^{\mathrm{ERR}}}\right\} \tag{14.3.1}$$
where
Proof. Comparing this result with Theorem 14.2.2, it is clear that the only thing we need to prove is that if the event $C^{\mathrm{ERR}}_t$ occurs, then the condition $\{u \le f(t) \le W(t)\}$ can be replaced with
and
$$\sigma_c^2(f) = \sup_{t\in M}\sigma_c^2(f,t). \tag{14.3.4}$$
$$\begin{aligned}
\nabla^2 f|_{\partial_i M,E_i}(t) &= \nabla^2 f|_{\partial_i M,E_i}(t) - E\left\{\nabla^2 f|_{\partial_i M,E_i}(t)\,\middle|\,f(t)\right\} + E\left\{\nabla^2 f|_{\partial_i M,E_i}(t)\,\middle|\,f(t)\right\}\\
&= \left(\nabla^2 f|_{\partial_i M,E_i}(t) + f(t)I\right) - f(t)I
\end{aligned}$$
(cf. (12.2.12)). Now expand the determinant in (14.3.1) via a standard Laplace expansion, as we did in (11.6.13), and apply Hölder's inequality for conjugate exponents $p$ and $q$ to see that
$$|\mathrm{Diff}_{f,M}(u)| \;\le\; \sum_{i=0}^{N}\int_{\partial_i M}\sum_{j=0}^{i} E\left\{f(t)^j\, 1_{\{f(t)\ge u\}}\right\}\left(E\left\{\left|\mathrm{detr}_{i-j}\left(-\nabla^2 f|_{\partial_i M,E_i}(t) - f(t)I\right)\right|^p\right\}\right)^{1/p}\left(P\{W(t)\ge u\}\right)^{1/q}\,d\mathcal{H}_i(t),$$
for
$$u \;\ge\; \sup_{t\in M} E\left\{\sup_{s\in M\setminus\{t\}}\left(\bar f^t(s) - E\{\bar f^t(s)\}\right)\right\} + \mu^+.$$
The result now follows after noting that we can choose $q$ arbitrarily close to 1, and $u(q)$ such that for $u \ge u(q)$, the remaining terms are, on a logarithmic scale, arbitrarily small when compared to $u^2$.
There is, of course, a glaring weakness in Theorem 14.3.3, in that (14.3.2) contains a liminf rather than a lim, and an inequality rather than an equality.
Fortunately, the theorem provides a lower bound on the exponential decay of $\mathrm{Diff}_{f,M}(u)$, and this is almost always the important one in any practical situation. We believe that this lower bound is generally tight when a maximizer of $\sigma_c^2(f)$ occurs in the interior of $M$, in the sense that then the term corresponding to $\partial_N M$ in the sum defining $\mathrm{Diff}_{f,M}(u)$ in (14.1.18) is exponentially of the same order as the upper bound. However, we have not been able to round off the desired proof, since it seems difficult to determine the sign of the lower-order terms in the error. This leaves open the (unlikely) possibility that some terms in the sum (14.1.18) cancel one another out, leading to a faster rate of exponential decay.
Thus, in the piecewise smooth setting, it is still open whether (14.3.2) can be improved, and if so, under what conditions.
14.4 Examples
We now look at a number of applications of Theorem 14.3.3. The theorem contains two "unknowns.'' One is the expected Euler characteristic and the other is the parameter $\sigma_c^2(f)$. The expected Euler characteristic is the material of Chapters 11 and 12, so we have no need to treat it again here. Consequently, looking at examples means computing, or trying to understand, $\sigma_c^2(f)$, which, for reasons that will become clear below, we now refer to as the critical variance. We shall start with the simplest possible example, stationary processes on $\mathbb{R}$, and then jump to a rather general scenario. We shall then close the chapter with a number of rather instructive special cases.
It is the general scenario that is actually the most instructive, for the simpler ones are so special that they give little or no insight into what is really going on. Hopefully, after reading the general case, with its elegant geometric interpretation of $\sigma_c^2(f)$, it will be clearer to you where the constructions of the previous sections of this chapter, particularly $\bar f^t$, came from.
The general case also allows us to investigate some very special cases in which the approximation of Theorem 14.3.3 is either perfect ($\sigma_c^2(f) = 0$) or meaningless ($\sigma_c^2(f) = \infty$). Along the way, we shall also spend a little time with the "cosine random field'' (which actually provides the only example for which the approximation is perfect) and see how it provides both examples and counterexamples for various approximation techniques.
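The cosine field mentioned above is worth previewing numerically. For $f(t) = \xi_1\cos t + \xi_2\sin t$ on $[0, 2\pi)$, with $\xi_1, \xi_2$ independent standard normals, $\sup_t f(t) = (\xi_1^2 + \xi_2^2)^{1/2}$ is Rayleigh, so $P\{\sup f \ge u\} = e^{-u^2/2}$ exactly for $u \ge 0$, and this coincides with the Rice upcrossing count over a full period: the Euler characteristic approximation is perfect. A quick check (our sketch, not from the text):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
u, n_reps = 2.0, 200000

xi = rng.standard_normal((n_reps, 2))
sup_f = np.hypot(xi[:, 0], xi[:, 1])   # sup_t (xi1 cos t + xi2 sin t)

mc_prob = float(np.mean(sup_f >= u))
exact = math.exp(-u * u / 2.0)         # Rayleigh tail = e^{-u^2/2}

print(mc_prob, exact)  # both close to 0.1353
```

Here the supremum is computed in closed form as the amplitude of the random sinusoid, so the Monte Carlo step is only verifying the Rayleigh tail.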
This implies that $\sigma_c^2(f, 0) \ge \sigma_c^2(f, t)$ for $0 \le t \le T$. Therefore the critical variance $\sigma_c^2(f)$ is attained at the endpoints $t = 0, T$.
We can summarize the above discussion as follows.
Carrying on in the spirit of the previous example, we now consider isotropic fields over convex sets $M$ in $\mathbb{R}^N$. We add a new condition related to a monotonicity property of the covariance function. Thus, while this example is more general than Example 14.4.1 in that it treats more general parameter spaces, it is more specific in that it requires more of the covariance function. Nevertheless, it is quite general, and gives a very simple and computable formula for the critical variance $\sigma_c^2$.
Suppose that $M$ is compact, piecewise smooth, and convex, and set $T = \mathrm{diam}(M)$, where the diameter is computed in the standard Euclidean sense. Suppose also that
$$\frac{\partial\rho(t)}{\partial t_j} \;\le\; 0 \tag{14.4.5}$$
for all $j$.
Proof. We start by noting that since $f$ is isotropic with unit second spectral moment, the metric induced by $f$ is the standard Euclidean one. Furthermore, we can take $\widetilde M = \mathbb{R}^N$.
We need to compute the maximal variance of the random field $\bar f^t(s)$ given by (14.2.2). Consider the term $P_{N_t M}F^t(s)$ that appears there, where
$$F^t(s) = \sum_{i=1}^{N}\mathrm{Cov}\left(f(s),\, X_i f(t)\right)X_i \tag{14.4.7}$$
and
$$\bar f^t(s) \;=\; \frac{f(s) - \rho(t,s)f(t) - \tilde g\left(F^t(s),\, \nabla\tilde f(t)\right)}{1-\rho(t,s)} \;=\; \frac{f(s) - \rho(t,s)f(t) - \sum_{i=1}^{N}\mathrm{Cov}\left(f(s),\, X_i f(t)\right)X_i f(t)}{1-\rho(t,s)}. \tag{14.4.8}$$
Substituting into (14.4.7) and using the fact that $\dot\rho(s) \le 0$ gives us that $F^t(s)$ is proportional to $X_1$, so that $X_{t,s}$ is parallel to $F^t(s)$.
We turn now to computing the variance of (14.4.8) for given $s, t$. Once again, we can choose $t = 0$ and $s = (\tau, 0, \ldots, 0)$, for $\tau = |s-t|$. If we then substitute (14.4.9) into (14.4.8), a little algebra easily leads to the fact that
and so
By isotropy, we need not take the supremum over $t$ here, and so we fix $t$, taking $t = 0$. We now claim that the supremum over $s$ is achieved as $s \to 0$. If this is true, then isotropy implies that we can take $s \to 0$ along any Euclidean axis. This being the case, two applications of l'Hôpital's rule show that
$$\sigma_c^2(f, M) \;=\; \frac{\partial^4\rho(t)}{\partial t_1^4}\Bigg|_{t=0} - 1,$$
while (5.5.5) and the basic properties of Gaussian distributions give that this, in turn, is the same as $\mathrm{Var}\!\left(\frac{\partial^2 f(t)}{\partial t_1^2}\,\Big|\, f(t)\right)$.
Consequently, modulo the issue of where the supremum in (14.4.10) is achieved, we have completed the proof. We shall return to this issue at the end of the next section.
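As a concrete instance of the last formula (our illustration, with an assumed covariance): take the isotropic covariance $\rho(t) = e^{-|t|^2/2}$, which has unit variance and unit second spectral moment and satisfies the monotonicity condition (14.4.5). Along an axis, $\partial^4\rho/\partial t_1^4\big|_0 = 3$, so the critical variance comes out as $\sigma_c^2 = 3 - 1 = 2$. Finite differences confirm the derivative:

```python
import math

def rho(t1):
    # isotropic covariance restricted to the first axis: rho(t) = exp(-|t|^2/2)
    return math.exp(-t1 * t1 / 2.0)

def fourth_derivative_at_zero(h=1e-2):
    # standard 5-point central stencil for the fourth derivative at 0
    return (rho(2 * h) - 4 * rho(h) + 6 * rho(0.0) - 4 * rho(-h) + rho(-2 * h)) / h ** 4

d4 = fourth_derivative_at_zero()
sigma_c2 = d4 - 1.0
print(d4, sigma_c2)  # d4 close to 3, sigma_c2 close to 2
```

Equivalently, for a unit-variance field with unit second spectral moment, $\mathrm{Var}(f''\,|\,f) = \lambda_4 - \lambda_2^2 = 3 - 1 = 2$, matching the conditional-variance reading of the formula.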
In this section we want to look a little more carefully at the critical variance
$$\sigma_c^2(f) = \sup_{t\in M}\sigma_c^2(f,t), \tag{14.4.11}$$
where
$$\sigma_c^2(f,t) = \sup_{s\in M\setminus\{t\}}\mathrm{Var}\left(\bar f^t(s)\right) \tag{14.4.12}$$
and $\bar f^t$ is given by (14.2.2) (cf. (14.3.3) and (14.3.4)). Our aim is to give a geometrical interpretation of $\sigma_c^2(f)$ in a quite general setting. With this, we shall return to complete the proof of Example 14.4.2.
We shall start by looking closely at (14.4.12), for points$^7$ $t \in \partial_N M = M^\circ$. In later examples we shall see that these are often the most important points to consider. They are also the points for which a geometric interpretation of (14.4.12) is most accessible.
The geometry that comes into play here is that of $H$, the reproducing kernel Hilbert space (RKHS) of Section 3.1. In particular, since we shall retain the assumption that $f$ has constant unit variance, we shall be concerned with the spherical geometry of the unit sphere $S(H)$ in $H$.
Recall, however, that there is a canonical isometry between $H$ and $\mathcal{H}$, the span of $f_t$, $t \in M$. Consequently, we can, and shall, work in $\mathcal{H}$ rather than $H$. Recall also
Footnote 7: Actually, the arguments will work in slightly greater generality. For example, if $M$ is composed of a number of disjoint piecewise smooth manifolds, then they will also hold for all $t$ in the top-dimensional stratum of each of the separate piecewise smooth manifolds. For example, if $M = [0,1]^2\cup(\{3\}\times[0,1])$, then they will hold for all $t \in (0,1)^2\cup(\{3\}\times(0,1))$.
that the unit-variance assumption also implies that the $\mathcal{H}$ inner product between $f(t)$ and $f(s)$ in $S(\mathcal{H})$ is given by
$$\langle f(t),\, f(s)\rangle_{\mathcal{H}} = \rho(t,s).$$
The embedding sends $t$ to $f(t)$ and its push-forward sends $X_t$ to $X_t f$, so that the tangent space $T_{f(t)}f(M)$ is spanned by $(X_{1,t}f, \ldots, X_{N,t}f)$ for any basis $\{X_{1,t},\ldots,X_{N,t}\}$ of $T_t M$. We denote the orthogonal complement of $T_{f(t)}f(M)$ in $T_{f(t)}\mathcal{H}$ by $T^\perp_{f(t)}f(M)$, and the orthogonal complement of $T_{f(t)}f(M)$ in $T_{f(t)}S(\mathcal{H})$ by $N_{f(t)}f(M)$.
Given a point $f(t)$ and a unit normal vector $v_{f(t)} \in N_{f(t)}f(M)$, we denote the geodesic, in $S(\mathcal{H})$, originating at $f(t)$ in the direction $v_{f(t)}$ by $c_{f(t),v_{f(t)}}$. It is easy to check that this is given by
The following lemma links these critical radii of $f(M)$ to the critical variances (14.4.11) and (14.4.12).
Lemma 14.4.3. Under the setup and assumptions of Theorem 14.3.3,
$$\sigma_c^2(f,t) = \begin{cases}\cot^2\!\left(\theta_c(f(t))\right), & 0 \le \theta_c < \frac{\pi}{2},\\[2pt] 0, & \frac{\pi}{2}\le\theta_c < \pi,\end{cases} \tag{14.4.16}$$
for all $t \in \partial_N M$.
Footnote 8: It is important to note at this point that although $\mathcal{H}$ is made up of random variables, the quantity $\theta(f(t), v_{f(t)})$ is deterministic. Furthermore, by the isometry between the RKHS and $\mathcal{H}$, it has essentially the same definition in the RKHS that it has in $\mathcal{H}$.
Proof. Much as in the proof of Lemma 14.2.1 and Example 14.4.2, we start by noting
that if t ∈ ∂N M, then Nt M = Tt ∂N M ⊥ and PNt M Vt = 0 for all Vt ∈ Tt ∂N M. Now
3 such that at every t ∈ ∂N M, the first N
choose an orthonormal frame field for T (M)
vectors in the frame field generate Tt M, while the remaining vectors generate Nt M
3 Then f t simplifies somewhat, and for s = t, t ∈ ∂N M, it is
(as a subspace of Tt M).
easy to check that
g (F t (s), ∇ f3(t))
f (s) − ρ(t, s)f (t) − 3
f t (s) =
1 − ρ(t, s)
N
f (s) − ρ(t, s)f (t) − i=1 Cov(f (s), Xi f (t))Xi f (t)
= .
1 − ρ(t, s)
We now compute the variance of f^t(s). Note that
ρ(t, s) f(t) + ∑_{i=1}^N Cov(f(s), X_i f(t)) X_i f(t)
= ⟨f(t), f(s)⟩_H f(t) + ∑_{i=1}^N ⟨f(s), X_i f(t)⟩_H X_i f(t)
= P_{L_t} f(s),
where L_t = span{f(t), X_1 f(t), . . . , X_N f(t)}.
However, N_{f(t)}(Φ(M)) is the orthogonal complement (in T_{f(t)}H) of the subspace L_t. Consequently,
(1 − ρ(t, s))² Var f^t(s) = Var( f(s) − ρ(t, s) f(t) − ∑_{i=1}^N Cov(f(s), X_i f(t)) X_i f(t) )
= ‖P_{N_{f(t)}(Φ(M))} f(s)‖²_H,
where P_{N_{f(t)}(Φ(M))} represents orthogonal projection onto N_{f(t)}(Φ(M)).
Finally, we have that
σc²(f, t) = sup_{s∈M\{t}} ‖P_{N_{f(t)}(Φ(M))} f(s)‖²_H / (1 − ρ(t, s))². (14.4.17)
We now need to show that the same expression holds for cot²(θc(f(t))). Turning to the picture in terms of geodesics, fix f(t) and v_{f(t)}. Suppose that for a certain r, the point c_{f(t),v_{f(t)}}(r) does not metrically project to f(t). This and a little basic Hilbert sphere geometry implies there is a point f(s) ∈ Φ(M) such that
cos⁻¹(⟨c_{f(t),v_{f(t)}}(r), f(s)⟩_H) = τ(c_{f(t),v_{f(t)}}(r), f(s)) (14.4.18)
< τ(c_{f(t),v_{f(t)}}(r), f(t))
= cos⁻¹(⟨c_{f(t),v_{f(t)}}(r), f(t)⟩_H),
14.4 Examples 379
Therefore,
cot θ(f(t), v_{f(t)}) = sup_{s∈M\{t}} ⟨v_{f(t)}, f(s)⟩_H / (1 − ρ(t, s)).
Taking the supremum over all v_{f(t)} ∈ S(N_{f(t)}(Φ(M))) we obtain
cot(θc(f(t))) = sup_{v_{f(t)}∈S(N_{f(t)}(Φ(M)))} sup_{s∈M\{t}} ⟨v_{f(t)}, f(s)⟩_H / (1 − ρ(t, s))
= sup_{s∈M\{t}} ‖P_{N_{f(t)}(Φ(M))} f(s)‖_H / (1 − ρ(t, s)).
Comparing this with (14.4.17) establishes (14.4.16).
Lemma 14.4.3 gives a very succinct representation for the critical variance σc2 (f )
that requires nothing about the rather complicated process f t that led to its original
definition. Here is another result, part of which relates to the geometric structure
of H and part of which has a rather down-to-earth implication for the correlation
function ρ.
Lemma 14.4.4. Retain the assumptions of Lemma 14.4.3, and suppose that the pair
(t ∗ , s ∗ ) ∈ ∂N M × ∂N M achieves the supremum in
Then,
where
In order to appreciate the geometry behind (14.4.20) you might think as follows:
Given a pair of points in ∂N M that maximize (14.4.19), the three equations (14.4.20),
(14.4.21), and (14.4.22) imply that the tube of radius θc around Φ(M) should self-intersect along a geodesic from t∗ to s∗ in such a way that the tube, viewed locally
from the point t ∗ , shares a hyperplane with the tube viewed locally from the point
s ∗ . Alternatively, at the point of self-intersection the outward-pointing unit normal
vectors should be pointing in opposite directions.
Proof. Our first requirement is to establish (14.4.20). Looking back through the
proof of Lemma 14.4.3 (cf. (14.4.18)), it is immediate that if a pair (t ∗ , s ∗ ) achieves
the supremum in (14.4.19) then there exists a point z ∈ H equidistant from f (t ∗ ) and
f (s ∗ ) and unit vectors vf (t ∗ ) ∈ Nf (t ∗ ) (M) and wf (s ∗ ) ∈ Nf (s ∗ ) (M) such that
If u ≠ 0, then f(t∗), f(s∗), and z are not on the same geodesic, and the triangle inequality implies that
cos(2θc) < ⟨f(t∗), f(s∗)⟩_H,
For such an r, there exist distinct points t(r) and s(r) such that
such that the geodesic connecting t(r) and cz,u∗ (r) is normal to M at t(r), and the
geodesic connecting s(r) and cz,u∗ (r) is normal to M at s(r).
Without loss of generality we assume that
In this case, the geodesic connecting s(r) to c_{z,u∗}(r) is no longer a minimizer of distance once it passes the point c_{z,u∗}(r), which is of distance strictly less than θc from s(r). That is, for some point z(r) along the geodesic connecting s(r) and c_{z,u∗}(r), beyond c_{z,u∗}(r) but of distance strictly less than θc from s(r), there exist points in Φ(M) strictly closer to z(r) than s(r). We therefore have a contradiction, since the critical radius of Φ(M) is θc.
Consequently, we have that the u of (14.4.23) is indeed zero, and so (14.4.21)
and (14.4.22) are proven.
To prove the claims about the partial maps ρ^{t∗}(s) and ρ^{s∗}(t), note first that f(s∗) is a linear combination of f(t∗) and v_{f(t∗)}, which are perpendicular to every vector in T_{f(t∗)}(Φ(M)). Similarly, f(t∗) is a linear combination of f(s∗) and w_{f(s∗)}, which are perpendicular to every vector in T_{f(s∗)}(Φ(M)). The fact that the partial maps ρ^{t∗}(s) and ρ^{s∗}(t) have local maxima then follows from the same contradiction argument used above.
Conclusion of the proof of Example 14.4.2. Recall that what remained was to show
the equivalence of the two suprema in
(cf. (14.4.10)).
It is trivial that the right-hand side of (14.4.24) is no larger than the left-hand side. It is also trivial that the left-hand side is no larger than
sup_{t∈[0,diam(M)]^N} (1 − ρ²(t) − ρ̇²(t)) / (1 − ρ(t))². (14.4.25)
Thus, if we can show that this is equal to the right-hand side of (14.4.24), we shall be done.
Let t ∗ be one of the points achieving the supremum in (14.4.25) with N = 1.
Then Lemma 14.4.4 implies that ρ̇(t ∗ ) = 0, and that t ∗ is a local maximum of ρ(t).
But under the monotonicity assumption of the example, there is only one such point,
given by t ∗ = 0.
We now turn to what is perhaps the grandfather of all Gaussian random fields, the
so-called cosine random field on R^N. It is defined as
f(t) = N^{−1/2} ∑_{k=1}^N f_k(ω_k t_k), (14.4.26)
where f_k(t) = ξ_k cos t + ξ'_k sin t. The ξ_k and ξ'_k are independent, standard Gaussians and the ω_k are positive constants. It is a trivial computation that f is a stationary, centered Gaussian process on R^N with covariance and correlation functions
C(t) = ρ(t) = N^{−1} ∑_{k=1}^N cos(ω_k t_k).
We shall concentrate on the case in which ω_k = 1 for all k, in which case it is easy to check that, writing f_i = ∂f/∂t_i,
E{f(t) f_i(t)} = 0, and E{f_i(t) f_j(t)} = N^{−1} if i = j, 0 if i ≠ j.
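Since the cosine field is a finite sum of Gaussians, its covariance structure is easy to check numerically. The following sketch (plain Python; the particular points t, s and sample count are arbitrary choices, not from the text) estimates E{f(t)f(s)} by Monte Carlo and compares it with C(t − s) = N^{−1} ∑_k cos(t_k − s_k) in the case ω_k = 1.

```python
import math
import random

def cosine_field(t, xi, xi_prime):
    """f(t) = N^{-1/2} * sum_k (xi_k cos t_k + xi'_k sin t_k), i.e. (14.4.26) with omega_k = 1."""
    N = len(xi)
    total = sum(xi[k] * math.cos(t[k]) + xi_prime[k] * math.sin(t[k]) for k in range(N))
    return total / math.sqrt(N)

rng = random.Random(0)
N = 3
t = (0.3, 1.1, 2.0)   # two arbitrary parameter points in R^3
s = (0.9, 0.4, 1.5)

n_samples = 100_000
acc = 0.0
for _ in range(n_samples):
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    xi_prime = [rng.gauss(0.0, 1.0) for _ in range(N)]
    acc += cosine_field(t, xi, xi_prime) * cosine_field(s, xi, xi_prime)
empirical = acc / n_samples

# Stationary covariance: C(tau) = N^{-1} sum_k cos(tau_k), evaluated at tau = t - s.
theoretical = sum(math.cos(t[k] - s[k]) for k in range(N)) / N
print(empirical, theoretical)
```

The two printed values should agree to within Monte Carlo error (a few thousandths here).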
The cosine field is important for a number of reasons, perhaps the most significant
of which is the fact that it is simple enough to enable explicit computations. For
example, set N = 1 and consider the excursion probability
P{ sup_{0≤t≤T} f(t) ≥ u }, (14.4.27)
with T < π. For N = 1 we can write f(t) = R cos(t − θ), where
R = (ξ₁² + (ξ'₁)²)^{1/2} (14.4.28)
and θ is uniform on [0, 2π) and independent of R. Again using the fact that T < π, we note that N_u, the number of upcrossings of the level u by f in [0, T], is either 0 or 1, so that
P{ sup_{0≤t≤T} f(t) ≥ u } = Ψ(u) + E{N_u},
where Ψ denotes the standard Gaussian tail probability.
Looking back at Chapter 11, in particular (11.7.13), what we have just proven is that
P{ sup_{0≤t≤T} f(t) ≥ u } ≡ E{φ(A_u(f, [0, T]))}. (14.4.29)
In other words, the approximation that has been at the center of this chapter is
exact for the one-dimensional cosine process on an interval of length less than π .
On the other hand, if π < T ≤ 2π, then much of the above argument breaks
down and (14.4.29) no longer holds. If T > 2π , then, since the cosine process is
periodic, increasing T has no effect whatsoever on sup[0,T ] f (t), and the left-hand
side of (14.4.29) is independent of T . The right-hand side, however, contains the
term E{Nu }, which grows linearly with T . Hence we see that the assumption in
the central Theorem 14.3.3 that ρ(t, s) = 1 ⇐⇒ t = s, which does not hold for
periodic processes, was necessary.
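For the one-dimensional cosine process the supremum over [0, T] can be computed in closed form from the representation f(t) = R cos(t − θ), which makes the exactness claim easy to test by simulation. The value E{N_u} = (T/2π)e^{−u²/2} used below is the standard Rice-formula computation for this unit-variance process (with unit second spectral moment); it is not derived in the text, so treat this sketch as illustrative.

```python
import math
import random

def sup_cosine(xi, xi_prime, T):
    """Exact sup of f(t) = xi*cos(t) + xi'*sin(t) = R*cos(t - theta) over [0, T], T < 2*pi."""
    R = math.hypot(xi, xi_prime)
    theta = math.atan2(xi_prime, xi) % (2 * math.pi)
    if theta <= T:  # the crest of the cosine falls inside [0, T]
        return R
    # otherwise cos(t - theta) is maximized at an endpoint of [0, T]
    return R * max(math.cos(theta), math.cos(T - theta))

rng = random.Random(1)
T, u = 2.0, 1.0  # T < pi, the regime in which (14.4.29) is exact
n = 200_000
hits = sum(sup_cosine(rng.gauss(0, 1), rng.gauss(0, 1), T) >= u for _ in range(n))
mc = hits / n

Psi = 0.5 * math.erfc(u / math.sqrt(2))               # P{f(0) >= u}
mean_Nu = (T / (2 * math.pi)) * math.exp(-u * u / 2)  # Rice formula for E{N_u}
print(mc, Psi + mean_Nu)  # the two should agree to Monte Carlo accuracy
```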
But more than this can be seen from this very simple example. For example, if
we take an interval of length greater than 2π , so that we get at least a full cycle of the
process, the excursion probability is simply
P{R ≥ u} = e^{−u²/2},
where R is defined in (14.4.28). Note that this does not depend on T. On the other hand, E{φ(A_u(f, [0, T]))} contains a term growing linearly in T, and so here the approximation breaks down. To relate these phenomena to the rest of this chapter, we need to look at the critical variance σc² for this example.
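Once T ≥ 2π the true excursion probability freezes at the Rayleigh tail P{R ≥ u} = e^{−u²/2}, while the expected Euler characteristic retains its linearly growing E{N_u} term. The gap is visible directly in the closed forms (the Rice-formula value for E{N_u} is assumed, as above):

```python
import math

u = 2.0
truth = math.exp(-u * u / 2)            # P{sup f >= u} = P{R >= u} for any T >= 2*pi
Psi = 0.5 * math.erfc(u / math.sqrt(2))

for cycles in (1, 2, 4):
    T = 2 * math.pi * cycles
    ec_approx = Psi + (T / (2 * math.pi)) * math.exp(-u * u / 2)
    print(cycles, round(truth, 4), round(ec_approx, 4))
# truth stays at e^{-2} ~ 0.1353, while the approximation grows without bound in T
```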
Unfortunately, we do not have among the results we have already proven one that
immediately yields σc2 . However, we do have a technique. The technique is that of
the proof of Example 14.4.2. Although the cosine process over R^N is not isotropic, as that example requires, if we restrict f to a cube of the form [0, T]^N with T ≤ π/2, then the same argument as given there also works now. The conclusion is
again that
σc²(f, [0, T]^N) = Var( ∂²f(t)/∂t₁² | f(t) ) = (N − 1)/N². (14.4.30)
Consider also the global supremum over the rectangle T = ∏_{k=1}^N [0, T_k], which, since each summand in (14.4.26) depends on a single coordinate, splits as
M_N ≜ sup_{t∈T} f(t) = N^{−1/2} ∑_{k=1}^N sup_{0≤t_k≤T_k} {ξ_k cos t_k + ξ'_k sin t_k} ≜ N^{−1/2} ∑_{k=1}^N m_k.
This final chapter is, for two reasons, somewhat of an outlier as far as this book is
concerned.
The first reason is that it was initially primarily motivated by applications that re-
quired a non-Gaussian theory, while most of the book has been about purely Gaussian
random fields.
The second reason is that in large part, it stands independent of the preceding
four chapters of Part III of the book and is able to recoup, with completely different
proofs, the main results of Chapters 11–13, although not those of Chapter 14.
Consider the first issue first. If you, like us, have applications in mind, it will take
no effort whatsoever to convince you that not all random fields occurring in the “real
world’’ are Gaussian.1 The term “non-Gaussian’’ is, however, not well defined and
covers too wide a class of generalizations. Consequently, throughout this chapter we
shall limit ourselves to random fields of the form
f(t) = F(y(t)) = F(y₁(t), . . . , y_k(t)), (15.0.1)
where y = (y₁, . . . , y_k) is an R^k-valued Gaussian random field and F: R^k → R is smooth. For the moment, suppose that the y_j are centered and of unit variance, and consider the following three choices for F, where in the third we set k = n + m:
F(x) = ∑_{i=1}^k x_i²,    x₁ √(k−1) / (∑_{i=2}^k x_i²)^{1/2},    (m ∑_{i=1}^n x_i²) / (n ∑_{i=n+1}^{n+m} x_i²). (15.0.2)
2
The corresponding random fields are known as χ² fields with k degrees of freedom, Student's t field with k − 1 degrees of freedom, and the F field with n and m degrees
of freedom. If you have any familiarity with basic statistics you will know that
the corresponding distributions are almost as fundamental to statistical theory as
is the Gaussian distribution. If you are not statistically literate you can consider
these merely as specific examples of general non-Gaussian random fields of the form
(15.0.1). Although we shall return to two of these examples in Section 15.10, the rest
of this chapter will treat the general case.
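At any fixed parameter point t, the vector y(t) of (15.0.1) is just a standard Gaussian vector, so the three transformations of (15.0.2) can be sketched directly. The sample sizes and seed below are arbitrary choices for illustration.

```python
import math
import random

def F_chi2(x):
    """chi^2 field with k = len(x) degrees of freedom."""
    return sum(v * v for v in x)

def F_student(x):
    """Student's t field with k - 1 degrees of freedom, k = len(x)."""
    k = len(x)
    return x[0] * math.sqrt(k - 1) / math.sqrt(sum(v * v for v in x[1:]))

def F_fisher(x, n, m):
    """F field with n and m degrees of freedom, len(x) = n + m."""
    return (m * sum(v * v for v in x[:n])) / (n * sum(v * v for v in x[n:]))

rng = random.Random(2)
k = 5
draws = 100_000
mean_chi2 = 0.0
for _ in range(draws):
    x = [rng.gauss(0, 1) for _ in range(k)]
    mean_chi2 += F_chi2(x)
mean_chi2 /= draws
print(round(mean_chi2, 2))  # should be close to E{chi^2_k} = k = 5
```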
As usual, we shall concentrate on the excursion sets of these random fields, a
problem that can be tackled in at least two quite distinct ways. The standard ap-
proach, which held sway until JET’s thesis [157] appeared in 2001 (cf. also [158]),
was to treat each particular F as a special case and to handle it accordingly. This
invariably involved detailed computations of the kind we met first in Chapter 11 and
that were always related to the underlying Gaussian structure of y and to the specific
transformation F . This approach led to a large number of papers, which provided
not only a nice theory but also a goodly number of doctoral theses, promotions, and
tenure successes.
A more unified and far more efficient approach is based, in essence, on the obser-
vation that
A_u(f, M) = A_u(F(y), M) = {t ∈ M : (F ∘ y)(t) ≥ u} (15.0.3)
= {t ∈ M : y(t) ∈ F⁻¹[u, +∞)} = M ∩ y⁻¹(F⁻¹[u, +∞)).
Thus, the excursion set of a real-valued non-Gaussian f = F ◦ y above a level u is
equivalent to the excursion set for a vector-valued Gaussian y in F −1 [u, ∞), which,
under appropriate assumptions on F , is a manifold with piecewise smooth boundary.
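The identity (15.0.3) is purely set-theoretic and can be illustrated on a discretized parameter set. Everything below — the two-component cosine-type process standing in for y and the grid standing in for M — is a toy choice for the sketch, not a construction from the text.

```python
import math
import random

rng = random.Random(3)

def cosine_process(rng):
    """A toy unit-variance Gaussian process on [0, 2*pi): a*cos(t) + b*sin(t)."""
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    return lambda t: a * math.cos(t) + b * math.sin(t)

y1, y2 = cosine_process(rng), cosine_process(rng)

u = 1.0
grid = [i * 0.01 for i in range(629)]  # discretized M = [0, 2*pi)

def F(x1, x2):
    return x1 * x1 + x2 * x2  # the chi^2 transformation with k = 2

# Left-hand side of (15.0.3): excursion set of the non-Gaussian field f = F(y).
A_f = [t for t in grid if F(y1(t), y2(t)) >= u]

# Right-hand side: pull back the *fixed* set D = F^{-1}[u, inf), a subset of R^2.
def in_D(x1, x2):
    return F(x1, x2) >= u  # membership decided in R^2, independently of M

A_pullback = [t for t in grid if in_D(y1(t), y2(t))]

print(A_f == A_pullback)  # True: the two descriptions pick out the same set
```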
This chapter is all about exploiting this observation in order to compute explicit
expressions for all the mean Lipschitz–Killing curvatures
E{L_i(A_u(f, M))} = E{L_i(M ∩ y⁻¹(F⁻¹([u, +∞))))}. (15.0.4)
In fact, since the function F in (15.0.4) is quite general, so is the set F −1 ([u, +∞)),
which will generally be a stratified manifold in Rk . Consequently, rather than con-
centrating on the usual excursion sets, there is no reason not to take a general subset
D of Rk and study the Lipschitz–Killing curvatures of y −1 (D). Indeed, this is the
path that we shall take, and we shall show that for suitable stratified manifolds M and
suitable D ⊂ Rk , we have
E{L_i(M ∩ y⁻¹(D))} = ∑_{j=0}^{dim M−i} [i+j ¦ j] (2π)^{−j/2} L_{i+j}(M) M^γ_j(D), (15.0.5)
15.1 A Plan of Action 389
where the combinatorial flag coefficients [n ¦ m] are those we met at (6.3.12), and the M^γ_j = M^{γ_{R^k}}_j are the generalized (Gaussian) Minkowski functionals that we met in the Gaussian tube formula of Corollary 10.9.6.
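The flag coefficients are elementary to compute. The sketch below assumes the standard definition via volumes of unit balls, [n ¦ m] = C(n, m) ω_n/(ω_m ω_{n−m}), as at (6.3.12), with ω_n the volume of the unit ball in R^n.

```python
import math

def omega(n):
    """Volume of the unit ball in R^n (omega_0 = 1, omega_1 = 2, omega_2 = pi)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def flag(n, m):
    """Flag coefficient [n; m] = C(n, m) * omega_n / (omega_m * omega_{n-m})."""
    return math.comb(n, m) * omega(n) / (omega(m) * omega(n - m))

print(flag(2, 1))                                  # pi/2 ~ 1.5708
print([round(flag(4, m), 4) for m in range(5)])    # symmetric in m, like binomials
```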
The Lipschitz–Killing curvatures in both (15.0.4) and (15.0.5) are computed with
respect to the Riemannian metric3 induced on M by the individual Gaussian fields
yj . This is good news in the latter case, since the fact that they are therefore identical
to those appearing in the purely Gaussian results of Chapters 11 and 12 means that
they do not need to be recomputed for the non-Gaussian scenario. Consequently,
obtaining a useful final formula for a specific case—such as those of (15.0.2)—involves only knowing how to compute the M^γ_j(F⁻¹[u, +∞)). This usually turns out to be relatively straightforward.
The first step is to establish a representation of the form
E{L_i(M ∩ y⁻¹(D))} = ∑_j L_{i+j}(M) ρ̃(i, j, D), (15.1.1)
where ρ̃ depends on all the parameters given, but not on the distribution of the underlying Gaussian fields y_j.
With (15.1.1) established it is clear that we need only find the form of the function ρ̃. Moreover, we are free to choose simple examples of the manifold M and the Gaussian fields y_j in order to do this, since the result cannot depend on these choices.
The choice we make is that of a specific rotationally invariant field restricted to subsets
of a sphere.
However, even with this choice the computations are nontrivial, and so we ap-
proach them in an indirect fashion. In particular, we shall start by looking at non-
Gaussian fields with finite orthogonal expansions and with coefficients coming from
random variables distributed uniformly over high-dimensional spheres. This will en-
able us to take expectations using an appropriate version of the kinematic fundamental
formula on spheres, Euclidean (flat) relatives of which we already met in Chapter 13
under the titles of Hadwiger’s formula (13.1.2) and Crofton’s formula (13.1.1). These
preliminary expectations are computed in Section 15.6, after we establish the requisite
kinematic fundamental formulas in Section 15.5.
The passage from this scenario to the Gaussian one will be via a limit theorem for
projections of uniform random variables on S n as n → ∞, historically associated with
the name of Poincaré, although we shall need a slight extension of some more recent
versions due to Diaconis and Freedman [46] and their generalizations to matrices
in [47]. Poincaré’s result is given in Section 15.4 and applied to our setting in
Section 15.9, after we first see how it works in the case of the Euler characteristic,
L0 , in Section 15.7. This section effectively rederives the main results of Chapters 11
and 12.
It is in Section 15.9 that we shall finally exploit the non-Lebesgue tube-volume results of Section 10.9 to see how the M^γ_j arise in (15.0.5).
The argument throughout will involve a continual and delicate intertwining of
probability and geometry to obtain a result that, since it involves both expectations
and Lipschitz–Killing curvatures, lies in both areas. As such, we see it as being at
the core of what this book was supposed to be about, and we hope that you will enjoy
reading it as much as we did finding it.6
6 Actually, finding the proofs for this chapter was not always enjoyable. As you will soon see,
the basic idea behind everything is not too difficult, although it is often well camouflaged by
heavy notation and subject to heavy combinatorial manipulation. If you find following the
details occasionally oppressive you can imagine what it was like going through it all for the
first time. However, the final result, Theorem 15.9.5 (i.e., the formal version of (15.0.5))
came out to be so simple and elegant that “enjoy’’ is still the most appropriate verb.
15.2 A Representation for Mean Intrinsic Volumes 391
E{L_i(M ∩ y⁻¹(D))} = ∑_{j=0}^{N−i} L_{i+j}(M) ρ̃(i, j, D). (15.2.1)
The proof is not short, and it proceeds via a series of smaller calculations. The
basic idea is to write M ∩ y⁻¹(D) as the partition
M ∩ y⁻¹(D) = ⋃_{j=0}^{N} ⋃_{l=0}^{j} ( ∂_j M ∩ y⁻¹(∂_{k−l} D) ). (15.2.2)
Since both M and D are regular stratified manifolds and y is, with probability one, C², it follows from the basic properties of these manifolds (cf. Section 8.1) that such a partition exists. If we also assume, without loss of generality,7 that k ≥ dim(M), then it follows from the properties of y that, with probability one, each ∂_j M ∩ y⁻¹(∂_{k−l} D) will be a (random) manifold of dimension j − l in M.
To simplify notation below, we write the strata of M ∩ y⁻¹(D) as
M_{jl} = ∂_j M ∩ y⁻¹(∂_{k−l} D). (15.2.3)
E{L_i(M, M_{jl})} = ∑_{m=0}^{j−i} L_{i+m}(M, ∂_j M) · ρ(i, m, ∂_l D). (15.2.5)
In this section we prove Theorem 15.2.1, noting, however, that you will not need the
techniques of the proof for anything else in this chapter, and that on first reading it
should probably be skipped.
In the notation of the previous section, we shall prove that under the conditions of Theorem 15.2.1,
E{L_0(M, M_{jl})} = ∑_{i=0}^{j} L_i(M, ∂_j M) · ρ(i, ∂_{k−l} D), (15.3.1)
Note, for later use, that given y_t the Jacobians have conditional distributions (cf. footnote 3 of Chapter 13)
J̃_f(t) | y_t ∼ Wishart(j, ⟨∇F_r(y_t), ∇F_s(y_t)⟩) = Wishart(j, J̃_F(y_t)). (15.3.2)
9 It is not difficult to find functions F_l satisfying the conditions we require. For a concrete example, we argue as follows: Extend ∂_{k−l}D geodesic distance δ along unit-speed geodesics emanating from it. Denoting the extension by ∂^δ_{k−l}D, note that since we are prepared to work locally, we can assume that ∂_{k−l}D has positive critical radius, so that we can choose δ sufficiently small for the projection map P_{∂^δ_{k−l}D} to be well defined on Tube(∂_{k−l}D, 2δ). Choosing the standard orthonormal basis for R^k, we can now define F_l(t) coordinatewise, and it is then easy to check that this F_l satisfies all the required conditions.
10 While J̃_f(t) is not strictly the Jacobian J_f(t): T_t ∂_j M → T_{f_t} R^l, we call it the Jacobian because (det(J̃_f(t)))^{1/2} = |J_f(t)|, where the norm |J_f(t)| is the square root of the sum of all l × l principal minors of the matrix of the linear transformation J_f(t) in terms of orthonormal bases of T_t ∂_j M and T_{f_t} R^l.
394 15 Non-Gaussian Geometry
With notation now set, we can start our computation. As a first step, choosing ỹ as a Morse function, note that L_0(M; M_{jl}) is determined by the points11
{ t ∈ ∂_j M : f(t) = 0, y_t ∈ ∂_{k−l}D, P_{T_t M_{jl}} ∇ỹ = 0 }, (15.3.3)
along with the indices of ∇²ỹ_t and the normal Morse indices of M_{jl} in M.
Applying an appropriately reworked version12 of Corollary 12.4.5, an argument essentially identical to those appearing in the proofs of Theorems 13.3.1 and 13.3.2 shows that the expected value of L_0(M ∩ y⁻¹D, M_{jl}) is given by
(1/(j−l)!) ∫_{∂_j M} E{ α(η_t; M ∩ y⁻¹D) Tr^{T_t M_{jl}}((−∇²ỹ|_{M_{jl}})^{j−l}) |J̃_f(t)| 1_t | f(t) = 0, P_{T_t M_{jl}} ∇ỹ = 0 } p_{f,P_{jl}}(0, 0) H_j(dt), (15.3.4)
where, in order to make the formulas a little more manageable, we write 1_t for 1_{∂_{k−l}D}(y(t)), |J̃_f(t)| for (det(J̃_f(t)))^{1/2}, and
η_t = P⊥_{T_t M_{jl}} ∇ỹ_t, (15.3.5)
where y_{∗,t} and ỹ_{∗,t} are the push-forwards of y and ỹ. We shall see that this factors as required, which, after a little further manipulation of expectations, will achieve our goal.
This expectation is virtually identical to the one we evaluated in Theorem 13.3.2,
except for two significant differences:
• The set F_l⁻¹(0) may not be flat as in Theorem 13.3.2, where it was a subspace of R^k.
• The normal Morse index α is not as simple as it was there, since now it can also
depend on other strata of D.
We deal with the normal Morse index first. Theorem 9.2.6 implies that this Morse
index factors into a product of two modified Morse indices, one for M and one for
y −1 D. To set this up properly for the current scenario, note first that the support cone
of M ∩ y⁻¹D at t ∈ M_{jl} is the intersection of two support cones, that is,
S_t M_{jl} = S_t M ∩ S_t(y⁻¹D) ⊂ T_t M.
Let
V_t = span{∇f_i(t), 1 ≤ i ≤ l}
and
T_{V_t} ν_t = P⊥_{T_t ∂_j M} ν_t − ∑_{r,s=1}^{l} ⟨ν_t, P_{T_t ∂_j M} ∇f_r(t)⟩ J̃_f(t)^{rs} P⊥_{T_t ∂_j M} ∇f_s(t).
In particular, we have
where, with some abuse of notation, we have written y∗ (Tt ∂j M) to denote the col-
lection of push-forwards, by y, of vectors in Tt ∂j M. It is clear that, as a random
variable, α₂(η_t) is dependent only on the collection
F_t = (y_t, y_{∗,t}, ỹ_t, ỹ_{∗,t})
of four random variables. Hence, the conditional expectation in (15.3.7) does not
depend on t.
In view of the above observations, and appealing to Lemma 13.5.1, we have that the conditional expectation (15.3.7) can be written as
E{ α(η_t; M ∩ y⁻¹D) Tr^{T_t M_{jl}}((−∇²ỹ|_{M_{jl}})^{j−l}) |J̃_f(t)| 1_t | F_t }
= (l!(j−l)!/j!) E{ α(η_t; M ∩ y⁻¹D) Tr^{T_t ∂_j M}((−∇²ỹ|_{M_{jl}})^{j−l}) |J̃_f(t)| 1_t | F_t }
= (l!(j−l)!/j!) α₂(η_t) |J̃_f(t)| 1_t E{ α₁(η_t) Tr^{T_t ∂_j M}((−∇²ỹ|_{M_{jl}})^{j−l}) | F_t }.
To evaluate the expectation here, we need to look a little more closely at the structure of the Hessian ∇²ỹ|_{M_{jl}}. As in the proof of Theorem 13.3.2, applying the Weingarten equation (7.5.12) gives us that
∇²ỹ|_{M_{jl},t} = ∇²ỹ_t − ∑_{r,s=1}^{l} ⟨∇ỹ_t, ∇f_{r|∂_j M}(t)⟩ J̃_f(t)^{rs} ∇²f_{s|∂_j M}(t).
This depends on the curvature of F_l⁻¹(0). With some perversity (note how terms cancel) but with the future in mind, we acknowledge this dependence by writing
∇²ỹ|_{M_{jl},t} = Λ_t + Γ_t,
where
Λ_t = ∇²ỹ_t + ỹ_t · I − S_{η_t} − ∑_{r,s=1}^{l} ⟨∇ỹ_t, P_{T_t ∂_j M} ∇f_r(t)⟩ J̃_f(t)^{rs} ∑_{u=1}^{k} (∂F_s/∂y_u) (∇²y_u(t) + y_{u,t} · I),
Γ_t = Γ¹_t + Γ²_t,
Γ¹_t = ∑_{r,s=1}^{l} ∑_{u,v=1}^{k} ⟨∇ỹ_t, P_{T_t ∂_j M} ∇f_r(t)⟩ J̃_f(t)^{rs} (∂²F_s/∂y_u ∂y_v)(y_t) ⟨X_t, ∇y_u(t)⟩ · ⟨Y_t, ∇y_v(t)⟩,
Γ²_t = −ỹ_t · I + ∑_{r,s=1}^{l} ∑_{u=1}^{k} ⟨∇ỹ_t, P_{T_t ∂_j M} ∇f_r(t)⟩ J̃_f(t)^{rs} (∂F_s/∂y_u)(y_t) y_{u,t} · I.
15.3 Proof of the Representation 397
Written as above, the curvature of F_l⁻¹(0) is contained in the double form Γ¹_t. Continuing, we thus have that (15.3.7) is given by
(l!/j!) E{ α₁(η_t) α₂(η_t) Tr^{T_t ∂_j M}((−∇²ỹ|_{M_{jl}})^{j−l}) |J̃_f(t)| 1_t | F_t }
= ∑_{m=0}^{j−l} (l!(j−l)!)/((j−l−m)! m! j!) Tr^{T_t ∂_j M}( E{ α₁(η_t) α₂(η_t) · Λ^m_t Γ^{j−l−m}_t |J̃_f(t)| 1_t | F_t } )
= ∑_{m=l}^{j} (l!(j−l)!)/((m−l)! (j−m)! j!) Tr^{T_t ∂_j M}( E{ α₁(η_t) α₂(η_t) · Λ^{j−m}_t Γ^{m−l}_t |J̃_f(t)| 1_t | F_t } ),
where the first equality follows from Corollary 12.3.2 and the second is no more than a change of variables.
We shall apply Lemma 13.5.2, conditional on yt , to each term in the sum above. To
do so, we first identify variables and functions above with the variables and functions
in Lemma 13.5.2. With this in mind, set
Z_i = ∑_{r=1}^{l} ⟨∇ỹ_t, P_{T_t ∂_j M} ∇f_r(t)⟩ (J̃_f(t)^{−1/2})_{ri},
W = J̃_F(y_t)^{−1/2} J̃_f(t) (J̃_F(y_t)^{−1/2})^T,
U_j = (∇²y_j + y_j · I, S_{P⊥_{T_t ∂_j M} ∇y_j}), 1 ≤ j ≤ l,
U_{l+1} = (∇²ỹ, S_{P⊥_{T_t ∂_j M} ∇ỹ}),
G_{j−m} = α₁ · Λ^{j−m}_t,
G̃_{m−l} = α₂ · Γ^{m−l}_t.
It is straightforward to verify that, conditional on y_t, the above random variables have the required conditional distributions. The result after application of the lemma with l̃ = j − m, r̃ = m − l, ñ = j, and k̃ = l is
(j!/(j−l)!) (ω_m/ω_0) (2π)^{−m/2} E{G_{j−m}} E{G̃_{m−l} | y_t}. (15.3.9)
There is only one conditional expectation above because the U_j's are, in fact, independent of y_t.
Applying Lemma 13.5.2 again, along with the fairly obvious fact that E{Γ^m_t} = C_m I^m for any m, we find constants ρ(m, ∂_{k−l}D), which may change from line to line, such that the expected value of the above (after multiplication by the factor of 2π in (15.3.6)) is equivalent to
∑_{m=l}^{j} (l!/((m−l)! (j−m)!)) (2π)^{−(j−m)/2} ρ(m, ∂_{k−l}D) × Tr^{T_t ∂_j M}( E{ α(P⊥_{T_t ∂_j M} ∇ỹ_t; M) · (∇²ỹ_t + ỹ_t · I)^{j−m} } I^{m−l} )
= ∑_{m=l}^{j} (l!/((m−l)! (j−m)!)) (2π)^{−(j−m)/2} ρ(m, ∂_{k−l}D) × Tr^{T_t ∂_j M}( E{ α(P⊥_{T_t ∂_j M} ∇ỹ_t; M) · (∇²ỹ_t + ỹ_t · I)^{j−m} } ),
Fig. 15.4.1. The “laddered’’ region, the left and right edges of which project down to a and b
on the right-pointing axis, is the spherical zone of (15.4.1) for n = 3.
This is the basic “Poincaré limit theorem,’’ but we shall need more. For a start,
we shall need to move from vectors to matrices, but this is really no more than a
change of notation, which we shall handle in a moment. Furthermore, rather than
talk about uniform random variables on spheres, it will be more convenient for us
to choose Haar distributed random orthonormal matrices and use these to transform
fixed points that will then have a spherically uniform distribution. This is also little
more than a notational issue.
More importantly, however, we shall need to know about the convergence of
expectations of functionals, and for this the proof itself becomes considerably more
involved. Consequently, we present Theorem 15.4.1 below without a proof, referring
you to [46, 47] for details.
Now for the notation. With ηn still uniformly distributed on S√n (Rn ), consider
the random vector
Xk,n = π√n,n,k (ηn ),
where πλ,n,k : Sλ (Rn ) → BRk (0, λ), defined by
πλ,n,k (x1 , . . . , xn ) = (x1 , . . . , xk ), (15.4.2)
is projection from Sλ (Rn ) onto the first k ≤ n coordinates. Then the result we
described above can be rephrased by writing that as n → ∞, and for k fixed,
X_{k,n} →^L N(0, I_{k×k}), (15.4.3)
where →^L denotes convergence in distribution (law).
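Poincaré's limit is easy to see empirically: normalize a Gaussian vector to lie on the sphere of radius √n and keep the first k coordinates. The dimensions and sample counts below are arbitrary choices for the sketch.

```python
import math
import random

rng = random.Random(4)
n, k, m = 200, 2, 10_000

projected = []
for _ in range(m):
    g = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    # g/|g| is uniform on the unit sphere, so sqrt(n)*g/|g| is uniform on S_sqrt(n)(R^n)
    projected.append([math.sqrt(n) * x / norm for x in g[:k]])

mean1 = sum(p[0] for p in projected) / m
var1 = sum(p[0] ** 2 for p in projected) / m
cross = sum(p[0] * p[1] for p in projected) / m
print(round(mean1, 3), round(var1, 3), round(cross, 3))  # ~0, ~1, ~0: close to N(0, I_2)
```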
Here is the generalization of Poincaré convergence that we shall need.
Theorem 15.4.1 (Poincaré’s limit [47]). Fix l, k ≥ 1 and suppose that gn ∈ O(n) is
a Haar distributed random orthonormal matrix. Consider the random l × k matrix
X^{l,k,n} with (i, j)th entry given by
(π_{√n,n,k}(√n g_n e_i))_j,
where {e1 , . . . , en } is the usual orthonormal basis of Rn . Then the matrix Xl,k,n
converges in total variation to Xl,k , an l ×k matrix of i.i.d. N (0, 1) random variables.
Furthermore, if F is a real-valued function of matrices satisfying a uniform integrability condition involving the densities dP_{l,k,n}/dP_{l,k}, then
lim_{n→∞} E{F(X^{l,k,n})} = E{F(X^{l,k})}. (15.4.4)
The kinematic fundamental formula (KFF) is undoubtedly one of the most general and
fundamental results in integral geometry, of which many other well-known formulas
are special cases or corollaries. For a full treatment of this result in a variety of
scenarios you should turn to any of the classic references, including [24, 40, 65, 92,
139, 141].
We have already met a related result in the form of Crofton’s formulas of Sec-
tion 13.1. Crofton’s formulas gave expressions for the Lipschitz–Killing curvatures
of “typical’’ cross-sections of subsets M of Rn , that is, of intersections M ∩ V where
the V were subspaces of Rn . The KFF, however, is essentially a formula giving
the “average’’ Lipschitz–Killing curvatures of M1 ∩ gn M2 , where gn is a “typical’’
isometry (i.e., rigid motion) of Rn and M1 and M2 are both subsets of Rn .
For us the KFF will be the key to identifying the functions ρ̃ in results such as (15.1.1), which are themselves key to developing the final expression for the mean
Lipschitz–Killing curvatures of excursion sets in Theorems 15.9.4 and 15.9.5 below.
15.5 Kinematic Fundamental Formulas 401
For the first KFF we take two tame stratified subsets M1 and M2 of Rn with finite
Lipschitz–Killing curvatures.14 Although this is not actually the case we shall need, it
is the simplest and most common version of the KFF and so a good place to start. We
also take Gn , the isometry group of Rn with a Haar measure. Since Gn is isomorphic
to Rn × O(n), its Haar measures are not finite, making it somewhat misleading to
talk of the “average’’ Lipschitz–Killing curvatures we described above or of “typical’’
isometries. Nevertheless, it is possible to normalize Haar measure on Gn in such a
way as to obtain a measure ν_n for which, for any x ∈ R^n and any Borel set A ⊂ R^n,
ν_n{g_n ∈ G_n : g_n x ∈ A} = H_n(A),
where H_n denotes n-dimensional Hausdorff (i.e., Lebesgue) measure. With this normalization,
∫_{G_n} L_i(M₁ ∩ g_n M₂) dν_n(g_n) = ∑_{j=0}^{n−i} [i+j ¦ i] [n ¦ j]⁻¹ L_{i+j}(M₁) L_{n−j}(M₂) (15.5.2)
= ∑_{j=0}^{n−i} (s_{i+1} s_{n+1})/(s_{i+j+1} s_{n−j+1}) L_{i+j}(M₁) L_{n−j}(M₂),
14 There is a fine point here about whether we should require that M₁ and M₂ also be piecewise C². Since our definition of the Lipschitz–Killing curvatures involves curvature, the KFFs
that follow are poorly defined otherwise. On the other hand, there are alternative definitions
of the Lipschitz–Killing curvatures (better called intrinsic volumes than curvatures in this
case) without such strong smoothness assumptions along with corresponding KFFs. Thus,
we shall leave out the reference to smoothness when it is not required, despite the fact that
you will then have to look in the references given above to see what the KFF really means
in these scenarios.
15 In (15.5.2) the L_k are always total curvature measures, or intrinsic volumes. However, there are generalizations of (15.5.2) with curvature measures replacing intrinsic volumes. For further details, which we shall not require, see [80, 141].
Further to the previous footnote, note that there are also versions of (15.5.2) over more
general classes of sets than tame stratified sets. Indeed, much of the recent work around the
KFF has been devoted to seeing just how general the sets may be taken.
A proof of (15.5.2) at (and beyond) the level of generality that we have stated the KFF
can be found, for example, in [33].
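The equivalence of the two forms of the constants in (15.5.2) — flag-coefficient ratios versus ratios of sphere surface areas s_n — can be checked numerically. The definitions of ω_n, s_n, and the flag coefficients below are the standard ones and are assumptions of this sketch.

```python
import math

def omega(n):
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # volume of the unit ball in R^n

def s(n):
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)  # surface area of S^{n-1} in R^n

def flag(n, m):
    return math.comb(n, m) * omega(n) / (omega(m) * omega(n - m))

n = 6
for i in range(n + 1):
    for j in range(n - i + 1):
        lhs = flag(i + j, i) / flag(n, j)
        rhs = s(i + 1) * s(n + 1) / (s(i + j + 1) * s(n - j + 1))
        assert math.isclose(lhs, rhs, rel_tol=1e-12), (i, j, lhs, rhs)
print("both forms of the (15.5.2) constants agree for n =", n)
```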
What will be far more important for us than the KFF on Euclidean space, for reasons
that will soon become clear, is a version of the KFF for subsets of S√n (Rn ) in which
Gn is replaced by Gn,λ , the group of isometries (i.e., rotations) on Sλ (Rn ). Perhaps
not surprisingly, with an appropriate definition of curvature measures, the KFF on
the sphere ends up almost identical to the Euclidean version.
Noting that G_{n,λ} ≅ O(n), we normalize Haar measure ν_{n,λ} on G_{n,λ} much as
we did for Gn in the Euclidean case. That is, for any x ∈ Sλ (Rn ) and every Borel
A ⊂ Sλ (Rn ), we require that
ν_{n,λ}{g_n ∈ G_{n,λ} : g_n x ∈ A} = H_{n−1}(A). (15.5.3)
The KFF on Sλ (Rn ) then reads as follows, where M1 and M2 are tame stratified
spaces in Sλ (Rn ):
∫_{G_{n,λ}} L^{λ⁻²}_i(M₁ ∩ g_n M₂) dν_{n,λ}(g_n) (15.5.4)
= ∑_{j=0}^{n−1−i} [i+j ¦ i] [n−1 ¦ j]⁻¹ L^{λ⁻²}_{i+j}(M₁) L^{λ⁻²}_{n−1−j}(M₂)
= ∑_{j=0}^{n−1−i} (s_{i+1} s_n)/(s_{i+j+1} s_{n−j}) L^{λ⁻²}_{i+j}(M₁) L^{λ⁻²}_{n−1−j}(M₂),
where the functionals L^κ_i(·) are from the one-parameter family defined in (10.5.8).
We first met this process in Section 10.2, when we looked at the volume-of-
tubes approach to computing Gaussian excursion probabilities. In particular, we saw
there that any Gaussian field with a Karhunen–Loève expansion of order l < ∞ and
constant unit variance can be mapped to a process on a subset of S(Rl ), and that the
suprema for the two processes are identical.
In this chapter, however, we have a different task for the isotropic process, for it will be the test process to which we referred earlier for computing the unknown functionals ρ̃ in (15.1.1) and Theorem 15.2.1.
However, rather than dealing directly with the isotropic Gaussian process, we
shall start with a series of approximations to it that are easier to handle and for which,
as noted above, we can compute mean Lipschitz–Killing curvatures via the KFF.
The following result is the key to all that follows in this section. Although it is based
on the one-parameter family Lλi of Lipschitz–Killing curvatures rather than the ones
we are more used to using, we have already seen in Section 15.5.2 that they are better
suited to computations on spheres than are the usual ones. We shall make the change
back to the usual curvatures before we leave this section.
Lemma 15.6.1. Let y (n) be the model process (15.6.2) on a tame stratified space
M ⊂ S(Rl ), with n ≥ l. Then, for any tame stratified space D ⊂ Rk ,
E{L¹_i(M ∩ (y^{(n)})⁻¹D)} (15.6.4)
= ∑_{j=0}^{dim M−i} n^{j/2} [i+j ¦ j] [n−1 ¦ j]⁻¹ L¹_{j+i}(M) · L^{n⁻¹}_{n−1−j}(π⁻¹_{√n,n,k}D) / (s_n n^{(n−1)/2})
= ∑_{j=0}^{dim M−i} (s_{i+1}/(s_{i+j+1} s_{n−j})) L¹_{j+i}(M) · L^{n⁻¹}_{n−1−j}(π⁻¹_{√n,n,k}D) / n^{(n−1−j)/2}.
−1
Remark 15.6.2. It is important to understand what the meaning of π√ n,n,k
D is in
the above lemma, and in all that follows. The problem is that for all t ∈ S√n (Rn ),
π√n,n,k (t) ∈ B√n (Rk ), which may, or may not, cover D. Thus, with the usual
definition
−1 √ (Rn ) : π√
π√ n,n,k
D = t ∈ S n n,n,k (t) ∈ D ,
−1
it follows that π√ n,n,k
D may be only the inverse image of a subset of D.
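The remark reduces to the observation that a coordinate of a point on S_{√n}(R^n) is at most √n in absolute value, so π⁻¹_{√n,n,k}D misses the part of D outside the ball of radius √n. A sketch with k = 1 and D = [a, ∞), an illustrative choice:

```python
import math

def preimage_nonempty(n, a):
    """Can a point of S_sqrt(n)(R^n) have first coordinate >= a?"""
    return math.sqrt(n) >= a

a = 2.0  # D = [2, infinity) as a subset of R^1
for n in (3, 4, 5, 9, 16):
    print(n, preimage_nonempty(n, a))
# For n = 3 the preimage is empty (sqrt(3) < 2); from n = 4 on it is not.
```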
Proof. The first thing that we need to note is that π⁻¹_{√n,n,k}D is a tame stratified space in S(R^n), and so from the construction of y^{(n)} we have
where the second-to-last line follows from the scaling properties of Lipschitz–Killing
curvatures and the last is really no more than a notational change, using (15.5.3).
15.6 A Model Process on the l-Sphere 405
However, applying the KFF (15.5.4) to the last line above, we immediately have that it is equal to
∑_{j=0}^{dim M−i} n^{j/2} [i+j ¦ i] [n−1 ¦ j]⁻¹ · ( L^{n⁻¹}_{j+i}(√n M) / n^{(i+j)/2} ) · ( L^{n⁻¹}_{n−1−j}(π⁻¹_{√n,n,k}D) / (s_n n^{(n−1)/2}) )
= ∑_{j=0}^{dim M−i} n^{j/2} [i+j ¦ j] [n−1 ¦ j]⁻¹ L¹_{j+i}(M) · L^{n⁻¹}_{n−1−j}(π⁻¹_{√n,n,k}D) / (s_n n^{(n−1)/2}),
then
E{L¹_i(M ∩ y⁻¹D)} = lim_{n→∞} E{L¹_i(M ∩ (y^{(n)})⁻¹D)} (15.6.7)
= ∑_{j=0}^{dim M−i} [i+j ¦ i] L¹_{j+i}(M) ρ̃_j(D).
This is starting to take the form of (15.0.5), the result that we are trying to prove. The combinatorial flag coefficients are in place, but both sides of the equation are based on the $\mathcal{L}^1_{j+i}$ curvatures rather than the $\mathcal{L}_{j+i}$, and we have yet to explicitly identify the functions $\tilde\rho_j$. Note the important fact, however, that on the right-hand side of the equation we have already managed to split into product form factors that depend on the underlying manifold $M$ and the set $D$.
As far as the fact that we are using the “wrong’’ Lipschitz–Killing curvatures is concerned, recall that we have already seen in Lemma 10.5.8 that the $\mathcal{L}^1_j$ can be expressed as linear combinations of the $\mathcal{L}_j$, and vice versa. Thus, in principle, it should not be hard to derive a version of (15.6.7) with the usual curvatures. In practice, the computation is not simple, but we shall carry it out in a moment. Once done, it will show that
$$
\mathbb{E}\big\{\mathcal{L}_i(M \cap y^{-1}D)\big\}
= \sum_{j=i}^{\infty}\left[{i+j \atop j}\right]\mathcal{L}_j(M)\sum_{l=0}^{\infty} c_{i,j,l}\,\tilde\rho_l(D).
\tag{15.6.8}
$$
Evaluating the inner sum here is probably the most important part of this chapter, and is the closing step in proving the main result, Theorem 15.9.5. In particular, we shall see that the above sum depends only on $i + j$ and is intimately related to the volume-of-tubes results of Chapter 10.
Before doing this, however, we tackle the easier part of the problem, that of finding a version of (15.6.7) with the usual Lipschitz–Killing curvatures. Since the $\tilde\rho_j$ of (15.6.7), assuming that they exist, are functionals of $D$ and independent of $M$, we prove the following general result under the assumption that the limits (15.6.5) are indeed well defined.
Theorem 15.6.3. Let $M \subset S(\mathbb{R}^l)$ be a tame stratified space, and assume that for $0 \leq j \leq \dim(M)$, the limits
$$
\tilde\rho_j(D) = \lim_{n\to\infty} n^{j/2}\left[{n-1 \atop j}\right]^{-1}
\frac{\mathcal{L}^{n-1}_{n-1-j}\big(\pi^{-1}_{\sqrt n,n,k}D\big)}{s_n\, n^{(n-1)/2}}
\tag{15.6.10}
$$
exist. Then
$$
\mathbb{E}\big\{\mathcal{L}_i(M \cap y^{-1}D)\big\}
= \sum_{l=0}^{\dim M - i}\left[{i+l \atop l}\right]\mathcal{L}_{i+l}(M)\,\rho_l(D),
\tag{15.6.12}
$$
where for $j \geq 1$,
$$
\rho_j(D) = (j-1)!\sum_{l=0}^{\lfloor (j-1)/2\rfloor}
\frac{(-1)^l}{(4\pi)^l\, l!\,(j-1-2l)!}\,\tilde\rho_{j-2l}(D)
\tag{15.6.13}
$$
and $\rho_0(D) = \tilde\rho_0(D)$.
Proof. As usual, set $N = \dim(M)$, and recall the two basic formulas of Lemma 10.5.8 relating the $\{\mathcal{L}_j\}_{j\geq 0}$ and the $\{\mathcal{L}^1_j\}_{j\geq 0}$:
$$
\mathcal{L}^1_i(\cdot) = \sum_{n=0}^{\infty}\frac{(-1)^n\,(i+2n)!}{(4\pi)^n\, n!\, i!}\,\mathcal{L}_{i+2n}(\cdot)
\tag{15.6.14}
$$
and
$$
\mathcal{L}_i(\cdot) = \sum_{n=0}^{\infty}\frac{1}{(4\pi)^n}\,\frac{(i+2n)!}{n!\, i!}\,\mathcal{L}^1_{i+2n}(\cdot).
\tag{15.6.15}
$$
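That the two expansions (15.6.14) and (15.6.15) really are mutual inverses can be checked exactly in rational arithmetic, since the common factor $(4\pi)^{-(n+m)}$ cancels when they are composed and the remaining sum collapses by the binomial theorem (a quick sketch, not code from the book):

```python
from fractions import Fraction
from math import factorial

def r(i, n):
    # rational part (i+2n)!/(n! i!) of the coefficients in (15.6.14)-(15.6.15);
    # the factor (4*pi)^{-(n+m)} is common to all terms and cancels on composing
    return Fraction(factorial(i + 2 * n), factorial(n) * factorial(i))

# composing the signed expansion with the unsigned one must give the identity:
# sum over n + m = p of (-1)^m r(i, n) r(i + 2n, p - n) vanishes for p >= 1
for i in range(8):
    for p in range(1, 8):
        total = sum((-1) ** (p - n) * r(i, n) * r(i + 2 * n, p - n)
                    for n in range(p + 1))
        assert total == 0
```

Indeed, $r(i,n)\,r(i+2n,p-n) = (i+2p)!/(i!\,n!\,(p-n)!)$, so the alternating sum is $(i+2p)!/(i!\,p!)\,(1-1)^p = 0$.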
Combining these with (15.6.7), we have
$$
\mathbb{E}\big\{\mathcal{L}_i(M \cap y^{-1}D)\big\}
= \sum_{n=0}^{\lfloor (N-i)/2\rfloor}\frac{1}{(4\pi)^n}\frac{(i+2n)!}{n!\, i!}
\sum_{j=0}^{N-i-2n}\left[{i+2n+j \atop j}\right]\mathcal{L}^1_{i+2n+j}(M)\,\tilde\rho_j(D)
$$
$$
= \sum_{n=0}^{\lfloor (N-i)/2\rfloor}\frac{1}{(4\pi)^n}\frac{(i+2n)!}{n!\, i!}
\sum_{j=0}^{N-i-2n}\left[{i+2n+j \atop j}\right]\tilde\rho_j(D)
\sum_{l=0}^{\lfloor (N-i-2n-j)/2\rfloor}\frac{(-1)^l\,(i+2n+j+2l)!}{(4\pi)^l\, l!\,(i+2n+j)!}\,\mathcal{L}_{i+2n+j+2l}(M).
$$
408 15 Non-Gaussian Geometry
Rearranging the last expression, we have
$$
\sum_{n=0}^{\lfloor (N-i)/2\rfloor}\frac{1}{(4\pi)^n}\frac{(i+2n)!}{n!\, i!}
\sum_{j=0}^{N-i-2n}\left[{i+2n+j \atop j}\right]\tilde\rho_j
\sum_{l=0}^{\lfloor (N-i-2n-j)/2\rfloor}\frac{(-1)^l\,(i+2n+j+2l)!}{(4\pi)^l\, l!\,(i+2n+j)!}\,\mathcal{L}_{i+2n+j+2l}
$$
$$
= \sum_{j=0}^{N-i}\sum_{n=0}^{\lfloor (N-i-j)/2\rfloor}\frac{1}{(4\pi)^n}\frac{(i+2n)!}{n!\, i!}
\left[{i+2n+j \atop j}\right]\tilde\rho_j
\sum_{l=0}^{\lfloor (N-i-2n-j)/2\rfloor}\frac{(-1)^l\,(i+2n+j+2l)!}{(4\pi)^l\, l!\,(i+2n+j)!}\,\mathcal{L}_{i+2n+j+2l}
$$
$$
= \sum_{j=0}^{N-i}\sum_{\alpha=0}^{\lfloor (N-i-j)/2\rfloor}\sum_{\beta=0}^{\alpha}
\frac{1}{(4\pi)^{\alpha-\beta}}\frac{(i+2(\alpha-\beta))!}{(\alpha-\beta)!\, i!}\,\tilde\rho_j\,
\left[{i+2(\alpha-\beta)+j \atop j}\right]
\frac{(-1)^{\beta}\,(i+j+2\alpha)!}{(4\pi)^{\beta}\,\beta!\,(i+2(\alpha-\beta)+j)!}\,\mathcal{L}_{i+j+2\alpha},
$$
where the first and third equalities come from a change of order of summation and the second from the transformation $(\alpha, \beta) = (n + l, l)$. Again changing the order of summation we find that this is the same as
$$
\sum_{\alpha=0}^{\lfloor (N-i)/2\rfloor}\sum_{j=0}^{N-i-2\alpha}\sum_{\beta=0}^{\alpha}
\frac{1}{(4\pi)^{\alpha}}\frac{(i+2(\alpha-\beta))!}{(\alpha-\beta)!\, i!}\,\tilde\rho_j\,
\left[{i+2(\alpha-\beta)+j \atop j}\right]
\frac{(-1)^{\beta}\,(i+j+2\alpha)!}{\beta!\,(i+2(\alpha-\beta)+j)!}\,\mathcal{L}_{i+j+2\alpha}.
$$
Making now the further change of variables $(m, k) = (\alpha,\ j+2\alpha)$, the above is equivalent to
$$
\sum_{k=0}^{N-i}\sum_{m=0}^{\lfloor (k-1)/2\rfloor}\sum_{\beta=0}^{m}
\frac{1}{(4\pi)^{m}}\frac{(i+2(m-\beta))!}{(m-\beta)!\, i!}\,\tilde\rho_{k-2m}\,
\left[{i+k-2\beta \atop k-2m}\right]
\frac{(-1)^{\beta}\,(i+k)!}{\beta!\,(i+k-2\beta)!}\,\mathcal{L}_{i+k}
$$
$$
= \sum_{k=0}^{N-i}\mathcal{L}_{i+k}\,\frac{(i+k)!}{i!}
\sum_{m=0}^{\lfloor (k-1)/2\rfloor}\tilde\rho_{k-2m}\,\frac{1}{(4\pi)^{m}}
\sum_{\beta=0}^{m}\frac{(i+2(m-\beta))!}{(m-\beta)!}
\left[{i+k-2\beta \atop k-2m}\right]\frac{(-1)^{\beta}}{\beta!}\,\frac{1}{(i+k-2\beta)!},
$$
the last line being just a minor reorganization of the preceding one.
We must therefore show that
$$
\sum_{k=0}^{N-i}\mathcal{L}_{i+k}\,\frac{(i+k)!}{i!}
\sum_{m=0}^{\lfloor (k-1)/2\rfloor}\tilde\rho_{k-2m}\,\frac{1}{(4\pi)^{m}}
\sum_{\beta=0}^{m}\frac{(i+2(m-\beta))!}{(m-\beta)!}
\left[{i+k-2\beta \atop k-2m}\right]\frac{(-1)^{\beta}}{\beta!}\,\frac{1}{(i+k-2\beta)!}
= \sum_{k=0}^{N-i}\left[{i+k \atop k}\right]\mathcal{L}_{i+k}(M)\,\rho_k,
$$
where the functionals $\rho_k$ are those of (15.6.13). Equivalently, we must show that
$$
\rho_k = \left[{i+k \atop k}\right]^{-1}\frac{(i+k)!}{i!}
\sum_{m=0}^{\lfloor (k-1)/2\rfloor}\tilde\rho_{k-2m}\,\frac{1}{(4\pi)^{m}}
\sum_{\beta=0}^{m}\frac{(i+2(m-\beta))!}{(m-\beta)!}
\left[{i+k-2\beta \atop k-2m}\right]\frac{(-1)^{\beta}}{\beta!}\,\frac{1}{(i+k-2\beta)!}
$$
$$
= \sum_{m=0}^{\lfloor (k-1)/2\rfloor}\tilde\rho_{k-2m}\,\frac{k!\,\omega_k\,\omega_i}{\omega_{i+k}}
\sum_{\beta=0}^{m}\frac{(-1)^{\beta}\,\omega_{i+k-2\beta}}{(4\pi)^{m}\,(m-\beta)!\,\beta!\,(k-2m)!\,\omega_{k-2m}\,\omega_{i+2m-2\beta}}.
$$
This is equivalent, by (15.6.13), to an identity that must hold for all integers $k \geq 1$, $i \geq 0$, and which, after some further simple manipulations, reduces to
$$
B\Big(\frac{i}{2}+1,\ \frac{k}{2}\Big)
= \sum_{\beta=0}^{m}(-1)^{m-\beta}\binom{m}{\beta}\,B\Big(\frac{i+2m-2\beta}{2}+1,\ \frac{k-2m}{2}\Big).
$$
That this identity holds is verified in the following lemma, and so the proof is complete.
The identity in question states that, for integers $m \geq 0$ and suitable $\gamma, \delta > 0$ with $\gamma > m$,
$$
B(\gamma, \delta) = \sum_{\beta=0}^{m}(-1)^{m-\beta}\binom{m}{\beta}\,B\big(\gamma - m,\ \delta + m - \beta\big),
$$
which is equivalent to
$$
1 = \frac{1}{B(\gamma, \delta)}\sum_{\beta=0}^{m}(-1)^{m-\beta}\binom{m}{\beta}\int_0^1 p^{\gamma-m-1}(1-p)^{\delta+m-\beta-1}\,dp.
$$
Suppose, then, that $P$ is a random variable with distribution $\mathrm{Beta}(\gamma, \delta)$, so that it has density
$$
f_P(p) = \frac{1}{B(\gamma, \delta)}\,p^{\gamma-1}(1-p)^{\delta-1}, \qquad 0 \leq p \leq 1.
$$
Then,
$$
1 = P^{-m}\big(1-(1-P)\big)^m = \sum_{\beta=0}^{m}(-1)^{m-\beta}\binom{m}{\beta}(1-P)^{m-\beta}P^{-m},
$$
and so
$$
1 = \mathbb{E}\big\{P^{-m}(1-(1-P))^m\big\}
= \frac{1}{B(\gamma, \delta)}\sum_{\beta=0}^{m}(-1)^{m-\beta}\binom{m}{\beta}\int_0^1 p^{\gamma-1-m}(1-p)^{\delta+m-\beta-1}\,dp,
$$
which is precisely what was required.
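The Beta identity is also easy to spot-check numerically via the Gamma-function representation of $B$ (a minimal sketch, not from the book; any $\gamma > m$, $\delta > 0$ will do):

```python
from math import comb, gamma

def beta(a, b):
    # Euler Beta function via the Gamma function
    return gamma(a) * gamma(b) / gamma(a + b)

def rhs(g, d, m):
    # right-hand side of the identity above
    return sum((-1) ** (m - b) * comb(m, b) * beta(g - m, d + m - b)
               for b in range(m + 1))

# spot-check for a few (gamma, delta, m) with gamma > m
for g, d, m in [(3.5, 2.0, 2), (5.0, 0.5, 3), (4.25, 1.5, 1)]:
    assert abs(beta(g, d) - rhs(g, d, m)) < 1e-9
```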
What remains now is to check that the limits (15.6.6) are indeed well defined, and
to evaluate them. Before treating the general case, however, we shall work out one
special one, for which the computations will be simpler and the result familiar.
Our aim in this section is to carry out all the remaining computations needed to
compute mean Lipschitz–Killing curvatures for the isotropic Gaussian process on the
sphere when the set D ⊂ Rk is a half-space.
While this is merely a special case of what will follow in Section 15.9, the calcu-
lations in this case are considerably simpler and aid in understanding what is actually
going on. Furthermore, as we shall see in a moment, the result that we shall obtain is
also a special case of the main result of Chapter 12. Thus it can be thought of as an
independent verification of earlier results, or as a corollary of them. We shall return
15.7 The Canonical Gaussian Field on the l-Sphere 411
to this point in more detail in Section 15.7.2 below. Either way, the current section
should help in understanding the general case.
In terms of the notation of the previous section, we are dealing with the zero-mean
Gaussian process from S(Rl ) to Rk defined by (15.6.3), and the hitting set D taken
to be the half-space
$$
D = \big\{y \in \mathbb{R}^k : \langle y, \eta\rangle \geq u\big\},
\tag{15.7.1}
$$
for some unique unit vector $\eta \in S(\mathbb{R}^k)$ and $u \in \mathbb{R}_+$. The case $k = 1$ obviously yields
precisely the excursion sets treated in Chapter 12. However, even for general k, the
process f defined by
$$f(t) = \langle y(t), \eta\rangle$$
is a zero-mean, unit-variance Gaussian process, and so, writing
$$M \cap y^{-1}D = M \cap f^{-1}[u, +\infty),$$
we are still in the scenario of Chapter 12. Consequently, under appropriate regularity
conditions on M, the results of Chapter 12 (especially Theorem 12.4.2) give us the
mean Euler characteristic of M ∩ f −1 [u, +∞). If we compare this with (15.6.8) with
$i = 0$ (so as to obtain the Euler characteristic $\mathcal{L}_0$) we find that we must have
$$
\sum_{l=0}^{\infty} c_{0,j,l}\,\tilde\rho_l(D) = \rho_j(u) =
\begin{cases}
(2\pi)^{-(j+1)/2}\,H_{j-1}(u)\,e^{-u^2/2}, & j \geq 1,\\
1 - \Phi(u), & j = 0,
\end{cases}
$$
where the $H_j$ are, as usual, the Hermite polynomials (11.6.9), and so we have found our elusive constants for this case.
Nevertheless, our aim is not to use the results of Chapter 12 to help out in the
current calculation, but rather to calculate everything afresh. However, at least now
we know what kind of result we are looking for.
The result we are aiming for, Theorem 15.7.1, states that (15.6.12) holds for the half-space $D$ of (15.7.1), where for $j \geq 1$,
$$\rho_j(D) = (2\pi)^{-(j+1)/2}\,H_{j-1}(u)\,e^{-u^2/2}$$
and
$$\rho_0(D) = \gamma_{\mathbb{R}^k}(D).$$
The proof is based on the following lemma, which we state and prove first.
Lemma 15.7.2. Under the assumptions of Theorem 15.7.1, the $\tilde\rho_l(D)$ defined by (15.6.6) satisfy
$$\tilde\rho_j(D) = (2\pi)^{-(j+1)/2}\,u^{j-1}\,e^{-u^2/2}$$
for $j \geq 1$, and
$$\tilde\rho_0(D) = 1 - \Phi(u).$$
Proof. We begin with the observation that as long as $n > u$, and adopting the notation of the preceding section, we have
$$
\pi^{-1}_{\sqrt n,n,k}D = B_{S_{\sqrt n}(\mathbb{R}^n)}\big(\sqrt n\,\eta,\ \sqrt n\cos^{-1}(u/\sqrt n)\big),
\tag{15.7.3}
$$
a geodesic ball, or spherical cap, in $S_{\sqrt n}(\mathbb{R}^n)$ of radius $\sqrt n\cos^{-1}(u/\sqrt n)$ centered at the point $\sqrt n\,\eta \in S_{\sqrt n}(\mathbb{R}^n)$. An example is given in the shaded region of Figure 15.7.1.
Fig. 15.7.1. The spherical cap (15.7.3) with $k = 1$, $n = 3$, and $\eta$ taken along the right-pointing axis.
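Granting the lemma, the theorem follows by feeding these $\tilde\rho_j$ through the transform (15.6.13): the powers $u^{j-1}$ recombine into the Hermite polynomial $H_{j-1}(u)$. That step can be checked numerically (a sketch assuming the probabilists' Hermite convention of (11.6.9); not code from the book):

```python
from math import exp, factorial, pi

def hermite(n, x):
    # probabilists' Hermite polynomials: He_0 = 1, He_1 = x,
    # He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def rho_tilde(j, u):
    # Lemma 15.7.2 for the half-space D of (15.7.1)
    return (2 * pi) ** (-(j + 1) / 2.0) * u ** (j - 1) * exp(-u * u / 2.0)

def rho(j, u):
    # the transform (15.6.13) applied to the rho_tilde's
    return factorial(j - 1) * sum(
        (-1) ** l / ((4 * pi) ** l * factorial(l) * factorial(j - 1 - 2 * l))
        * rho_tilde(j - 2 * l, u)
        for l in range((j - 1) // 2 + 1))

for j in range(1, 9):
    for u in (0.5, 1.0, 2.3):
        target = (2 * pi) ** (-(j + 1) / 2.0) * hermite(j - 1, u) * exp(-u * u / 2.0)
        assert abs(rho(j, u) - target) < 1e-12
```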
Write $H$ for this cap, and compute $\mathcal{L}^{n-1}_{n-1-j}(H)$ using the extrinsic formula (10.5.10). Note that for $j > 0$, the only contributions to $\mathcal{L}^{n-1}_{n-1-j}(H)$ come from $\partial H$,$^{17}$ which is a sphere of radius $\sqrt{n - u^2}$. (See Figure 15.7.1 again.)
Consequently, up to a constant, $\mathcal{L}^{n-1}_{n-1-j}(H)$ is the integral over $\partial H$ of $\mathrm{Tr}(S_\eta^{j-1})$, and $\partial H$ is also the sphere of radius $\sqrt{n-u^2}$ centered at $u\eta$ in the hyperplane
$$L = \big\{y \in \mathbb{R}^n : \langle y, \eta\rangle = u\big\}.$$
Using this equivalence and substituting into (10.5.10) and (10.5.5) to get the constants right, (15.7.4) now follows on noting that
$$\mathcal{L}_{n-1-j}\big(B_L(u\eta, \sqrt{n-u^2})\big) = \mathcal{L}_{n-1-j}\big(B_{\mathbb{R}^{n-1}}(0, \sqrt{n-u^2})\big).$$
Since we long ago computed the Lipschitz–Killing curvatures for $S(\mathbb{R}^n)$ (cf. (6.3.8)) and we know how they scale (cf. (6.3.1)), the right-hand side of (15.7.4) is now trivial to compute, and so for $j > 0$ and $n > u$, we now have
$$
\mathcal{L}^{n-1}_{n-1-j}\big(\sqrt n\, B_{S(\mathbb{R}^n)}(\eta, \cos^{-1}(u/\sqrt n))\big)
= \Big(\frac{u}{\sqrt n}\Big)^{j-1}\left[{n-1 \atop n-1-j}\right]\omega_{n-1-j}\,(n-u^2)^{(n-1-j)/2}.
\tag{15.7.5}
$$
Substituting this into the limit (15.6.6), we find that
$$
\frac{[j]!\,u^{j-1}\,n^{-(j-1)/2}}{(2\pi)^{j/2}}\left[{n-1 \atop n-1-j}\right]
\frac{\omega_{n-1-j}\,n^{(n-1-j)/2}}{s_n\, n^{(n-1)/2}}\Big(1-\frac{u^2}{n}\Big)^{(n-1-j)/2}
\sim (2\pi)^{-(j+1)/2}\,u^{j-1}\,e^{-u^2/2},
$$
giving, for $j > 0$ and $u > 0$,
$$\tilde\rho_j(D) = (2\pi)^{-(j+1)/2}\,u^{j-1}\,e^{-u^2/2},$$
as required.

17. There are only two terms that could possibly contribute to $\mathcal{L}^{n-1}_{n-1-j}(H)$, since $H$ has only two strata, its interior and its boundary. The interior term makes no contribution, since in (10.5.10) the scalar second fundamental form for the interior term is identically zero. See the discussion following Lemma 10.5.8.
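The scaling property invoked here, $\mathcal{L}_j(\lambda A) = \lambda^j \mathcal{L}_j(A)$, and the tube (Steiner) expansion behind such computations can be sanity-checked on a Euclidean ball, using the classical closed form for its intrinsic volumes (a sketch with standard formulas, not the book's normalization):

```python
from math import comb, gamma, pi

def omega(n):
    # volume of the unit ball in R^n
    return pi ** (n / 2.0) / gamma(n / 2.0 + 1.0)

def lk_ball(n, j, r):
    # classical intrinsic volumes of the n-ball of radius r; note the
    # scaling L_j(r * B) = r^j * L_j(B)
    return comb(n, j) * omega(n) / omega(n - j) * r ** j

n, r, rho = 3, 2.0, 0.5
tube_vol = omega(n) * (r + rho) ** n        # Tube(B_r, rho) is just B_{r+rho}
steiner = sum(omega(n - j) * rho ** (n - j) * lk_ball(n, j, r)
              for j in range(n + 1))
assert abs(tube_vol - steiner) < 1e-9
```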
It remains only to treat the case $j = 0$. But since it is an immediate implication of Poincaré's limit that for all $u > 0$,
$$
\lim_{n\to\infty}\frac{\mathcal{L}^{n-1}_{n-1}\big(\sqrt n\, B_{S(\mathbb{R}^n)}(\eta, \cos^{-1}(u/\sqrt n))\big)}{s_n\, n^{(n-1)/2}} = 1 - \Phi(u),
$$
the case $j = 0$ follows as well, and the lemma is proven.
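Poincaré's limit, that the normalized surface measure of the cap $\{x \in S_{\sqrt n}(\mathbb{R}^n): \langle x, \eta\rangle \geq u\}$ tends to $1 - \Phi(u)$, can be watched converging numerically, since the first coordinate of a uniform point on the unit sphere has density proportional to $(1-t^2)^{(n-3)/2}$ (a rough numerical sketch, not code from the book):

```python
from math import erf, sqrt

def cap_fraction(n, u, steps=4000):
    # normalized surface measure of {x in S_sqrt(n)(R^n): <x, eta> >= u};
    # with x = sqrt(n)*xi, the first coordinate t of xi has density
    # proportional to (1 - t^2)^{(n-3)/2}, so the fraction is a ratio of
    # two one-dimensional integrals (composite Simpson's rule below)
    def integral(a, b):
        h = (b - a) / steps
        total = 0.0
        for i in range(steps + 1):
            t = a + i * h
            w = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
            total += w * (1.0 - t * t) ** ((n - 3) / 2.0)
        return total * h / 3.0
    return integral(u / sqrt(n), 1.0) / integral(-1.0, 1.0)

u = 1.0
gauss_tail = 0.5 * (1.0 - erf(u / sqrt(2.0)))   # 1 - Phi(u)
assert abs(cap_fraction(400, u) - gauss_tail) < 5e-3
```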
The previous subsection contains a new method for deriving the mean values of the Lipschitz–Killing curvatures of the excursion sets of the canonical isotropic process on $S(\mathbb{R}^N)$. Recall that our initial motivation for restricting attention to this one very special process over this rather special parameter space was a consequence of Theorem 15.2.1, which stated that we had only to identify the $\tilde\rho_j$ in order to obtain a far more general result. The other terms, those involving the $\mathcal{L}_j(M)$, had already been identified by Theorem 15.2.1.
However, a quick revision of the proof will show that nowhere did we actually
use Theorem 15.2.1 to prove Theorem 15.7.1. Rather, the proof of this theorem relied
on no more than the KFF and some, essentially algebraic, manipulations.
A natural question, therefore, is to ask how far this approach can be extended.
That is, can we actually manage without Theorem 15.2.1 at all, which was not an easy
result to prove, and still handle more general scenarios?
In fact, we can manage with only the current techniques if we are prepared to limit
ourselves to processes with finite orthogonal expansions. We describe the approach,
without going into technical details.
Recall firstly the discussion of Section 10.2, where we saw that any unit-variance Gaussian process $\tilde f$ with an orthonormal expansion of order $l < \infty$ can be realized as the canonical isotropic process $f$ on an appropriately chosen subset of $S(\mathbb{R}^l)$. In fact, if $\tilde f$ is defined over a tame stratified manifold $M$, then the corresponding subset of $S(\mathbb{R}^l)$, which we denote by $\psi(M)$ for consistency with the notation of Section 10.2, is also a tame stratified manifold.
Now suppose that on $S(\mathbb{R}^l)$ we take the usual Riemannian metric, which is also the metric induced on $S(\mathbb{R}^l)$ by $f$, and on $M$ we take the metric induced by $\tilde f$ (cf. Section 12.2). It then follows from the invariant nature of the Lipschitz–Killing curvatures that if $A \subset M$, then
$$\mathcal{L}^{\tilde f}_j(A) \equiv \mathcal{L}^{f}_j\big(\psi(A)\big),$$
where superscripts have been added to emphasize that the Lipschitz–Killing curvatures on the left are computed with respect to the metric induced by $\tilde f$ on $M$, while on the right they are computed with respect to the metric corresponding to $f$ on $S(\mathbb{R}^l)$.
Putting all this together, we have that Theorem 15.7.1 holds not just for the isotropic process on $S(\mathbb{R}^l)$, but actually on all reasonable manifolds and for all smooth, unit-variance Gaussian processes with a finite expansion. That is, we have reestablished all the main results of Chapters 12 and 13 for these processes.
The natural question to ask now is therefore whether we can extend this approach
to processes without a finite expansion, thereby avoiding all the Morse-theoretic
computations of the earlier chapters and of the crucial Theorem 15.2.1 in the current
chapter.
Our answer is twofold: Firstly, we have not been able to do so, and it is not clear
that the argument can be extended to processes without a finite expansion. Secondly,
and, we think, more important, is the fact that the Morse-theoretic approach was cru-
cial for establishing the accuracy of the expected Euler characteristic approximation
to the excursion probability
$$\mathbb{P}\Big\{\sup_{t\in M} f(t) \geq u\Big\}$$
in Chapter 14. We see no way of adopting an approach based purely on geometric and
tube-theoretic tools that will give the level of accuracy given by the Morse-theoretic
approach.
Thus, despite the fact that in this chapter we are rederiving many of our earlier
results, it seems that we did not waste our time previously. Apparently, there are some
results that are still beyond the reach of the KFF.
In view of the previous section, the light at the end of the tunnel, or chapter, should now be becoming clearer. In order to evaluate the mean Lipschitz–Killing curvatures that we are after, and in view of (15.6.6), what remains is to compute the limits
$$
\lim_{n\to\infty}\frac{\mathcal{L}^{n-1}_{n-1-j}\big(\pi^{-1}_{\sqrt n,n,k}D\big)}{s_n\, n^{(n-1)/2}}
$$
for suitable sets $D$, a computation that is in part geometry and in part asymptotics. In this section we shall concentrate on the geometry.
As a first step we shall need to investigate the structure of the set $\pi^{-1}_{\sqrt n,n,k}D \subset S_{\sqrt n}(\mathbb{R}^n)$ a little more deeply than we have so far. In doing so, we shall also need to introduce the notion of the warped product of Riemannian manifolds, the last technical tool that we shall require from differential geometry in this book.
It is clear that the set $\pi^{-1}_{\sqrt n,n,k}D$ is, topologically, a disjoint union
15.8 Warped Products of Riemannian Manifolds 417
$$
\pi^{-1}_{\sqrt n,n,k}D \;\cong\; \big(D \cap S_{\sqrt n}(\mathbb{R}^k)\big)\ \sqcup\ \Big(\big(D \cap B_{\mathbb{R}^k}(0, \sqrt n)\big)^{\circ} \times S(\mathbb{R}^{n-k})\Big).
\tag{15.8.1}
$$
For example, in Figure 15.7.1, the first part of this union is merely the point at the
extreme right of the spherical cap, while the second part is the cap with this point
removed.
Since we are assuming that $D$ itself is a tame stratified space, the same is true of $\pi^{-1}_{\sqrt n,n,k}D$ and of each of the two components above. Consequently, their intrinsic volumes are well defined. One is easy to compute: since $D \cap S_{\sqrt n}(\mathbb{R}^k)$ is a tame stratified subset of $S_{\sqrt n}(\mathbb{R}^k)$, its intrinsic volumes can be computed using (10.5.8).
The second set in the union is, however, somewhat more complex, since we have written it as a product set and we have not yet developed tools for handling the Riemannian structure of products. Furthermore, what we have written in (15.8.1) is a topological equivalence, and ultimately we shall need precise intrinsic volumes, which are not topological invariants.
The way to handle these problems is twofold. First of all, we need to break the Riemannian structure of products into a product of structures, at least along each stratum of a stratified space. Secondly, we need to keep track of the fact that while (15.8.1) is topologically precise, at each point in $\big(D \cap B_{\mathbb{R}^k}(0, \sqrt n)\big)^{\circ}$ the corresponding $S(\mathbb{R}^{n-k})$ is likely to have a different radius.
In fact, the rightmost part of (15.8.1) is a subset of a warped (Riemannian) product, and each stratum of $\big(D \cap (B_{\mathbb{R}^k}(0, \sqrt n))^{\circ}\big) \times S(\mathbb{R}^{n-k})$ inherits this warped product structure. Once we choose the appropriate warp we can, and shall, treat (15.8.1) no longer as a merely topologically correct relationship, but as if it were correct without qualifiers.
To understand what this means, we take a moment to investigate warped products
and develop a general expression for the intrinsic volumes of their subsets. Once we
have that, we shall be able to develop a concrete expression for the intrinsic volumes of $\pi^{-1}_{\sqrt n,n,k}D$, which we do in Section 15.9.
We now develop some basic calculations needed to compute the intrinsic volumes of
warped products. We were unable to find many of these calculations in the literature
and so cannot adopt our usual policy of simply referring you to a good reference.
Bear with us—they will not be too long.
A (Riemannian) warped product consists of two Riemannian manifolds $(M_1, g_1)$ and $(M_2, g_2)$ and a smooth function $\sigma^2: M_1 \to [0, +\infty)$, known as the warp. The warped product is then defined to be the Riemannian manifold
$$(M_1, M_2, \sigma) = \big(M_1 \times M_2,\ g_1 + \sigma^2 g_2\big).\tag{15.8.2}$$
That is, the warped product of $(M_1, g_1)$ and $(M_2, g_2)$ with warp $\sigma^2$ is a Riemannian manifold $M_1 \times M_2$ such that for each $(t_1, t_2) \in M_1 \times M_2$ the Riemannian metric on $M_1 \times M_2$ is given by$^{18}$
$$g\big(X_{t_1} + Y_{t_2},\ Z_{t_1} + W_{t_2}\big) = g_1\big(X_{t_1}, Z_{t_1}\big) + \sigma^2_{t_1}\, g_2\big(Y_{t_2}, W_{t_2}\big).$$
In the example of interest to us, the first factor is $\big(B_{\mathbb{R}^k}(0, \sqrt n)\big)^{\circ}$, endowed with the metric
$$g_\sigma = g_{\mathbb{R}^k} + \nabla\sigma \otimes \nabla\sigma\tag{15.8.3}$$
for
$$\sigma_t = \big(n - |t|^2\big)^{1/2},\tag{15.8.4}$$
and the Riemannian metric on $S(\mathbb{R}^{n-k})$ is the canonical one inherited from $\mathbb{R}^{n-k}$.
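As a toy illustration of the definition (not an example from the text): the round 2-sphere is the warped product of an interval and a circle with warp $\sigma(t) = \sin t$, and integrating the warped volume form recovers its area:

```python
from math import pi, sin

# S^2 as the warped product of ([0, pi], dt^2) and the unit circle
# (S^1, dtheta^2) with warp sigma(t) = sin(t): the metric is
# dt^2 + sin(t)^2 dtheta^2, so the volume form is sin(t) dt dtheta and the
# total area is 2*pi * integral_0^pi sin(t) dt = 4*pi
steps = 100000
h = pi / steps
area = 2.0 * pi * sum(sin(i * h) * h for i in range(steps))
assert abs(area - 4.0 * pi) < 1e-3
```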
The reason we are interested in this example of a warped product is that each $(n-k-1+j)$-dimensional stratum $\widetilde D_{n-k-1+j}$ of $\pi^{-1}_{\sqrt n,n,k}D$ not meeting $D \cap S_{\sqrt n}(\mathbb{R}^k)$ is of the form
$$\widetilde D_{n-k-1+j} = D_j \times S(\mathbb{R}^{n-k})\tag{15.8.5}$$
for some $j$-dimensional submanifold $D_j$ of the open ball $\big(B_{\mathbb{R}^k}(0, \sqrt n)\big)^{\circ}$. Using this embedding we can compute
$$\mathcal{L}^{1/n}_{n-1-i}\big(\pi^{-1}_{\sqrt n,n,k}D,\ \widetilde D_{n-k-1+j}\big),$$
the contribution of these strata to the intrinsic volumes of $\pi^{-1}_{\sqrt n,n,k}D$.
The first step to computing these contributions is to compute the Levi-Civita connection $\widetilde\nabla^\sigma$, since this is needed in order to compute the second fundamental form of $D_j \times S(\mathbb{R}^{n-k})$ in $\widetilde M_\sigma$.
Consider, therefore, a general warped product $(M_1, M_2, \sigma)$ and denote the Levi-Civita connection on each $M_j$ by $\nabla^j$. Use $E$, or $E_j$, to denote vector fields on $M_1$, identified with their natural extensions to $M_1 \times M_2$. Similarly, $F$, or $F_j$, denotes fields on $M_2$ extended to $M_1 \times M_2$. Then the following relationships hold:
$$\widetilde\nabla^{\sigma}_{E_1}E_2 = \nabla^1_{E_1}E_2, \qquad \widetilde\nabla^{\sigma}_{F_1}F_2 = \nabla^2_{F_1}F_2,\tag{15.8.6}$$
and
$$\widetilde\nabla^{\sigma}_{E}F = \widetilde\nabla^{\sigma}_{F}E = E(\log\sigma)\,F.\tag{15.8.7}$$

18. Since the tangent space $T_{(t_1,t_2)}(M_1 \times M_2)$ is equal to $T_{t_1}M_1 \oplus T_{t_2}M_2$, any vector $\widetilde X_{(t_1,t_2)}$ in it can be written as $X_{t_1} + X_{t_2}$ for unique $X_{t_j} \in T_{t_j}M_j$.
The two relationships in (15.8.6) follow directly from the definition of the Levi-Civita connection and the product structure of $M$. To prove (15.8.7) we apply Koszul's formula (7.3.12). If $F_1, \dots, F_k$ is an orthonormal frame on $(M_2, g_2)$, then it is immediate that
$$\overline F_i = \frac{1}{\sigma}F_i, \qquad 1 \leq i \leq k,$$
are orthonormal vector fields on $(M, g)$.
Koszul's formula then gives us that for any vector field $E$ on $(M_1, g_1)$,
$$g\big(\nabla^{\sigma}_{E}\overline F_i,\ \overline F_j\big) = \frac{1}{2}E\big(g(\overline F_i, \overline F_j)\big) = 0,$$
since all the other terms in the formula are zero either by orthogonality or the product structure of $(M, g)$, and $g(\overline F_i, \overline F_j) = \delta_{ij}$ is constant. Furthermore,
$$g\big(\nabla^{\sigma}_{E}\overline F_i,\ E'\big) = 0$$
for any vector field $E'$ on $M_1$. Consequently,
$$
\nabla^{\sigma}_{E}F_i = \sum_{j=1}^{k} g\big(\nabla^{\sigma}_{E}F_i,\ \overline F_j\big)\overline F_j
= \sum_{j=1}^{k} E(\sigma)\,\delta_{ij}\,\overline F_j
= \frac{E(\sigma)}{\sigma}F_i = E(\log\sigma)F_i,
$$
which establishes (15.8.7).
With the Levi-Civita connection $\widetilde\nabla^\sigma$ determined, the next step toward computing intrinsic volumes lies in understanding the second fundamental forms of the sets $D_j \times S(\mathbb{R}^{n-k})$ as they sit in $\widetilde M_\sigma$, as well as their powers. For this we need to describe the normal spaces $T^{\perp}_{(t,\eta)}\widetilde M_\sigma$ for $t \in B_{\mathbb{R}^k}(0, \sqrt n)$ and $\eta \in S(\mathbb{R}^{n-k})$. A simple argument shows that at a point $(t, \eta) \in D_j \times S(\mathbb{R}^{n-k})$,
$$\big(T_{(t,\eta)}\widetilde M_\sigma\big)^{\perp} = \big(T_t D_j \oplus T_\eta S(\mathbb{R}^{n-k})\big)^{\perp} \cong T_t D_j^{\perp}.\tag{15.8.8}$$
With the various tangent and normal spaces determined, we now let $S$ denote the scalar second fundamental form of $D_j \times S(\mathbb{R}^{n-k})$ in $\widetilde M_\sigma$, and $S_\sigma$ the scalar second fundamental form of $D_j$ in $\big(B_{\mathbb{R}^k}(0, \sqrt n),\ g_\sigma\big)$. Then we have the following lemma.
Lemma 15.8.1. Retaining the above notation, for $0 \leq l \leq n-1$, take $\nu_{k-j}$ a unit normal vector in $T_t D_j^{\perp}$. Then
$$
\frac{1}{l!}\mathrm{Tr}\big(S^{l}_{\nu_{k-j}}\big)
= \sum_{r=0}^{l}\binom{n-k-1}{l-r}(-1)^{l-r}\big(\nu_{k-j}(\log\sigma_t)\big)^{l-r}\,\frac{1}{r!}\mathrm{Tr}\big(S^{r}_{\sigma,\nu_{k-j}}\big).
$$
Proof. Write
$$\nu_{k-j} = \sum_{r=1}^{j} a_r E_r$$
for some constants $a_r$. Applying (15.8.6) and (15.8.7) along with the Weingarten equation (7.5.12) shows that, for each $\nu_{k-j}$, the matrix of the shape operator $S_{\nu_{k-j}}$ in our chosen orthonormal basis is block diagonal with one block, of size $j$, being
$$\big(S_{\sigma,\nu_{k-j}}(E_r, E_s)\big)_{1\leq r,s\leq j}.$$
Lemma 15.9.1. Suppose that
$$\widetilde D_{n-k-1+j} \cap S(\mathbb{R}^k) = \varnothing.$$
Then, for all $i \geq k - j \geq 0$,
$$
\mathcal{L}^{1/n}_{n-1-i}\big(\pi^{-1}_{\sqrt n,n,k}D,\ \widetilde D_{n-k-1+j}\big)
\tag{15.9.2}
$$
$$
= s_{n-k}\sum_{l=0}^{i+j-k}\frac{s_{k+l-j}}{s_i}\binom{n-k-1}{i+j-k-l}\,
\widetilde{\mathcal{L}}_{j-l}\Big(D^{\sigma}_j,\ \sigma^{n+k-2i-2j+2l-1}(2\pi)^{-k/2}\,h_{i+j-k-l}\,\mathbb{1}_D\Big),
$$
where $D^\sigma = D \cap B_{\mathbb{R}^k}(0, \sqrt n)$ is the tame stratified space obtained from the intersection of the embedding of $D$ in $\mathbb{R}^k$ and the open ball of radius $\sqrt n$, endowed with the metric $g_\sigma$ given by (15.8.3), and
$$h_l(t, \nu) = \langle \nu, t\rangle^{l}_{\mathbb{R}^k}.$$
(Note also that the final term in (15.9.2) is an integral against the generalized curvature measure $\widetilde{\mathcal{L}}_{j-l}$.)
Furthermore, suppose that for all $0 \leq r, s \leq i + k - j$,
$$\widetilde{\mathcal{L}}_r\big(D_j,\ |h_s|\,\varphi_k\,\mathbb{1}_D\big) < \infty,\tag{15.9.3}$$
writing $\varphi_k$ for the $k$-dimensional Gaussian density $(2\pi)^{-k/2}e^{-|t|^2/2}$. Then
$$
\lim_{n\to\infty}\frac{1}{s_n\, n^{(n-1)/2}}\,\mathcal{L}^{1/n}_{n-1-i}\big(\pi^{-1}_{\sqrt n,n,k}D,\ \widetilde D_{n-k-1+j}\big)
\tag{15.9.4}
$$
$$
= \sum_{l=0}^{i+j-k}\frac{[k+l-j]!}{[i]!}\binom{i-1}{k+l-j-1}\,
\widetilde{\mathcal{L}}_{j-l}\big(D_j,\ \varphi_k\,h_{i+j-k-l}\,\mathbb{1}_D\big).
$$
Proof. Writing out the definition of the curvatures in (15.9.2), the relevant integrals are with respect to
$$\mathcal{H}_{n-k-1+j}(dt, d\eta) = \sigma_t^{n-1-k}\,\mathcal{H}_{n-1-k}(d\eta)\,\mathcal{H}_j(dt),$$
the Hausdorff measure that $D_j \times S(\mathbb{R}^{n-k})$ inherits from $\widetilde D_\sigma$, the warped product of $(D^\sigma, g_\sigma)$ and $S(\mathbb{R}^{n-k})$ with its usual metric and warp function $\sigma^2$ given by (15.8.4).
By Lemma 15.8.1, observation (15.8.8), and the subsequent remarks about the normal Morse index $\alpha$,
$$
\mathcal{L}^{1/n}_{n-1-i}\big(\pi^{-1}_{\sqrt n,n,k}D,\ \widetilde D_{n-k-1+j}\big)
= C(k-j,\ i+j-k)\,(2\pi)^{-(i+j-k)/2}
\int_{D_j}\int_{S(\mathbb{R}^{n-k})}\int_{S(T_tD_j^{\perp})}
\sigma_t^{n-1-k}\sum_{l=0}^{i+j-k}\binom{n-1-k}{i+j-k-l}(-1)^{i+j-k-l}\big(\nu_{k-j}(\log\sigma_t)\big)^{i+j-k-l}
$$
$$
\times\ \frac{1}{l!}\mathrm{Tr}\big(S^{l}_{\sigma,\nu_{k-j}}\big)\,\alpha(\nu_{k-j})\,
\mathcal{H}_{k-j}(d\nu_{k-j})\,\mathcal{H}_{n-1-k}(d\eta)\,\mathcal{H}_{j}(dt).
$$
Equation (15.9.2) now follows from the fact that
$$
\frac{C(k-j,\ i+j-k)\,(2\pi)^{-(i+j-k)/2}}{C(k-j,\ l)\,(2\pi)^{-l/2}} = \frac{s_{k+l-j}}{s_i},
$$
followed by integrating over $S(\mathbb{R}^{n-k})$ and noting that
$$
\nu_{k-j}(\log\sigma_t) = -\frac{\langle\nu_{k-j}, t\rangle_{\mathbb{R}^k}}{\sigma_t^2}.
$$
As for the second conclusion of the lemma, (15.9.4), note that
$$
\lim_{n\to\infty}\frac{s_{n-k}}{s_n\, n^{(n-1)/2}}\binom{n-k-1}{i+j-k-l}\sigma_t^{n+k-2i-2j+2l-1}
$$
$$
= \lim_{n\to\infty}\frac{s_{n-k}}{s_n\, n^{(n-1)/2}}\binom{n-k-1}{i+j-k-l}\,
n^{(n+k-2i-2j+2l-1)/2}\Big(1-\frac{|t|^2}{n}\Big)^{(n+k-2i-2j+2l-1)/2}
= \frac{(2\pi)^{-k/2}}{(i+j-k-l)!}\,e^{-|t|^2/2}.
$$
15.9 Non-Gaussian Mean Intrinsic Volumes 423
Also,
$$
\frac{s_{k+l-j}}{s_i\,(i+j-k-l)!}
= \frac{s_{k+l-j}\,(k+l-j-1)!}{s_i\,(i-1)!}\binom{i-1}{k+l-j-1}
= \frac{[k+l-j]!}{[i]!}\binom{i-1}{k+l-j-1}.
$$
Finally, since it is not hard to see that there exists a finite $K$ such that for all $n$ large enough,
$$\Big(1-\frac{|t|^2}{n}\Big)^{n/2} \leq K\,e^{-|t|^2/2}$$
for all $t \in \big(B_{\mathbb{R}^k}(0, \sqrt n)\big)^{\circ}$, dominated convergence yields (15.9.4) and we are done.
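The dominating bound is in fact elementary: since $\log(1-x) \leq -x$ for $0 \leq x < 1$, one may even take $K = 1$. A one-line numerical confirmation (a sketch, not from the book):

```python
from math import exp, sqrt

# since log(1 - x) <= -x, we have (1 - t^2/n)^{n/2} <= exp(-t^2/2) pointwise,
# so the dominating constant K can be taken equal to 1
for n in (10, 100, 1000):
    for i in range(1, 200):
        t = sqrt(n) * i / 200.0
        assert (1.0 - t * t / n) ** (n / 2.0) <= exp(-t * t / 2.0) + 1e-15
```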
In Lemma 15.9.1 we computed the contribution of the sets $\widetilde D_{n-k-1+j} = D_j \times S(\mathbb{R}^{n-k})$ to the curvatures $\mathcal{L}^{1/n}_{n-1-i}\big(\pi^{-1}_{\sqrt n,n,k}D\big)$ under the assumption that $\widetilde D_{n-k-1+j} \cap S(\mathbb{R}^k) = \varnothing$.
Our task now is to show that if, in fact, $\widetilde D_{n-k-1+j} \cap S(\mathbb{R}^k) \neq \varnothing$, then there is actually no contribution to $\mathcal{L}^{1/n}_{n-1-i}\big(\pi^{-1}_{\sqrt n,n,k}D\big)$ for $n$ large enough; this is the content of the next lemma.
Proof. Since $D_j \subset S_{\sqrt n}(\mathbb{R}^k)$,
$$\pi^{-1}_{\sqrt n,n,k}D_j = D_j,$$
and so $\pi^{-1}_{\sqrt n,n,k}D_j$ is a $j$-dimensional stratum of $\pi^{-1}_{\sqrt n,n,k}D$. From Definition 10.9.1, we see that such strata contribute only to the intrinsic volumes of order $0$ to $j$, as required.
We now have all that we need to evaluate the elusive limits (15.6.6), namely,
$$
\tilde\rho_j(D) = (2\pi)^{-j/2}\,[j]!\lim_{n\to\infty}\frac{\mathcal{L}^{n-1}_{n-1-j}\big(\pi^{-1}_{\sqrt n,n,k}D\big)}{s_n\, n^{(n-1)/2}},
$$
and via them the all-important functions $\rho_j(D)$ of (15.6.13).
Combining the two lemmas above with this limit, we can now compute
$$
\tilde\rho_i(D) = (2\pi)^{-i/2}\,[i]!\sum_{j=k-i}^{k-1}\sum_{l=0}^{i+j-k}
\frac{[k+l-j]!}{[i]!}\binom{i-1}{k+l-j-1}\,
\widetilde{\mathcal{L}}_{j-l}\big(D_j,\ \varphi_k\,h_{i+j-k-l}\,\mathbb{1}_D\big)
$$
$$
= (2\pi)^{-i/2}\sum_{j=k-i}^{k-1}\sum_{l=0}^{i+j-k}\binom{i-1}{k+l-j-1}\,
\mathcal{M}^{\mathcal{L}}_{k+l-j}\big(D_j,\ \varphi_k\,h_{i+j-k-l}\,\mathbb{1}_D\big)
$$
$$
= (2\pi)^{-i/2}\sum_{m=0}^{i-1}\binom{i-1}{m}\,\mathcal{M}^{\mathcal{L}}_{m+1}\big(D,\ \varphi_k\,h_{i-1-m}\big),
$$
and consequently, by (15.6.13),
$$
\rho_i(D) = (i-1)!\sum_{l=0}^{\lfloor (i-1)/2\rfloor}\frac{(-1)^l}{(4\pi)^l\, l!\,(i-1-2l)!}\,\tilde\rho_{i-2l}(D)
$$
$$
= (i-1)!\sum_{l=0}^{\lfloor (i-1)/2\rfloor}\frac{(-1)^l}{(4\pi)^l\, l!\,(i-1-2l)!}\,(2\pi)^{-(i-2l)/2}
\sum_{m=0}^{i-2l-1}\binom{i-2l-1}{m}\,\mathcal{M}^{\mathcal{L}}_{m+1}\big(D,\ \varphi_k\,h_{i-2l-1-m}\big)
$$
$$
= (2\pi)^{-(i+k)/2}\sum_{m=0}^{i-1}\binom{i-1}{m}\,\mathcal{M}^{\mathcal{L}}_{m+1}\Big(D,\ H_{i-m-1}\big(\langle\eta, t\rangle\big)\,e^{-|t|^2/2}\Big)
$$
$$
= (2\pi)^{-i/2}\,\mathcal{M}^{\gamma}_{i}(D),
$$
the second-to-last line coming from the definition (11.6.9) of the Hermite polynomials, and the last from the definition (10.9.12) of the $\mathcal{M}^{\gamma}_i(D)$.
The main results of this chapter, and indeed of the book, are now simple corollaries.
Theorem 15.9.4. Under the conditions we have been assuming throughout this chapter,
$$
\mathbb{E}\big\{\mathcal{L}_i\big(M \cap y^{-1}(D)\big)\big\}
= \sum_{j=0}^{N-i}\left[{i+j \atop j}\right](2\pi)^{-j/2}\,\mathcal{L}_{i+j}(M)\,\mathcal{M}^{\gamma}_{j}(D).
\tag{15.9.6}
$$
Proof. Theorems 15.6.3 and 15.9.3, which actually require fewer conditions than we have assumed, immediately yield that (15.9.6) holds for the canonical isotropic process $y$ of (15.6.3).
On the other hand, Theorem 15.2.1, which does require all the conditions of the theorem being proven, implies that in general, the mean Lipschitz–Killing curvatures $\mathbb{E}\big\{\mathcal{L}_i(M \cap y^{-1}(D))\big\}$ break up into a sum of Lipschitz–Killing curvatures of $M$ and coefficients that are dependent only on $D$ and some other constants. In particular, they are independent of the covariance structure of the underlying process and its parameter space. Consequently, these coefficients are the same as we computed for the canonical isotropic process on the sphere. That is, (15.9.6) holds in general.
Finally, we have the main theorem of the book, which is a trivial consequence of
the previous one. Indeed, we would have called it a corollary rather than a theorem,
were it not for the fact that it is such an important result.
Theorem 15.9.5. With the same notation and conditions as in Theorem 15.9.4, let $F \in C^2(\mathbb{R}^k, \mathbb{R})$ and define the non-Gaussian random field $f(t) = F(y(t))$. Then
$$
\mathbb{E}\big\{\mathcal{L}_i(A_u(f, M))\big\}
= \sum_{j=0}^{N-i}\left[{i+j \atop j}\right](2\pi)^{-j/2}\,\mathcal{L}_{i+j}(M)\,\mathcal{M}^{\gamma}_{j}\big(F^{-1}[u, +\infty)\big).
$$
15.10 Examples
To round off, we now want to look at three examples of Theorem 15.9.5, one familiar and two new. By “examples,’’ what we mean are explicit computations of the coefficients $\mathcal{M}^{\gamma}_{j}\big(F^{-1}[u, +\infty)\big)$ for particular choices of the function $F$. There is no need to recompute the coefficients $\mathcal{L}_{i+j}$, since these are the same as in the Gaussian case and independent of the choice of $F$.
The familiar example will be the case of Gaussian $f$, in which case, in the notation of Theorem 15.9.5, $k = 1$ and $F(x) \equiv x$. Here we have seen the result many times, including Chapters 11 and 12 for the Euler characteristic and Chapter 13 for the remaining Lipschitz–Killing curvatures, not to mention in Section 15.7 of the present chapter. Thus our aim is simply to check that everything works as it should and to see how to derive the specific from the general.
For the remaining two examples, we take two of the examples of (15.0.2), specifically the so-called $\chi^2$ and $F$ random fields. In all three examples, we shall look for functions $\rho_{j,f}$ such that
$$
\mathbb{E}\big\{\mathcal{L}_0(A_u(f, M))\big\} = \sum_{j=0}^{N}\rho_{j,f}(u)\,\mathcal{L}_j(M).
\tag{15.10.1}
$$
The $\rho_{j,f}$ are generally known as the Euler characteristic (EC) densities for $f$, and by Theorem 15.9.5 they are given by
$$
\rho_{j,f}(u) = (2\pi)^{-j/2}\,\mathcal{M}^{\gamma}_{j}\big(F^{-1}[u, +\infty)\big).
\tag{15.10.2}
$$
Although they are defined via the mean Euler characteristic calculation, it follows from Theorem 15.9.5 that they determine all the other mean Lipschitz–Killing curvatures as well, since
$$
\mathbb{E}\big\{\mathcal{L}_i(A_u(f, M))\big\}
= \sum_{j=0}^{N-i}\left[{i+j \atop j}\right]\rho_{j,f}(u)\,\mathcal{L}_{i+j}(M).
\tag{15.10.3}
$$
Throughout, we shall assume, as we have done so far and without further comment, that the assumptions of Theorem 15.9.5 hold regarding both the manifold $M$ and the process $y$.
Despite this easy path, we shall rederive the result via the Taylor series mentioned in Chapter 10, as a preliminary to the more complicated $\chi^2$ and $F$ cases.
Recall that the $\mathcal{M}^{\gamma}$ arise as coefficients in the formal tube expansion (10.9.11). That is,
$$
\gamma_{\mathbb{R}^l}\big(\mathrm{Tube}(D, \rho)\big) = \gamma_{\mathbb{R}^l}(D) + \sum_{j=1}^{\infty}\frac{\rho^j}{j!}\,\mathcal{M}^{\gamma_{\mathbb{R}^l}}_{j}(D).
\tag{15.10.6}
$$
In the Gaussian case, $D_u = [u, +\infty)$ and $\mathrm{Tube}(D_u, \rho) = D_{u-\rho}$, so that, using
$$
\frac{d^j}{dx^j}e^{-x^2/2} = (-1)^j H_j(x)\,e^{-x^2/2}
$$
(cf. (11.6.11)), we have, via a standard Taylor series expansion of the exponential function, that
$$
\gamma_{\mathbb{R}^k}\big(T(D_u, \rho)\big) = 1 - \Phi(u-\rho)
= 1 - \Phi(u) - \sum_{j=1}^{\infty}\frac{(-\rho)^j}{j!}\,\frac{(-1)^{j-1}}{\sqrt{2\pi}}\,H_{j-1}(u)\,e^{-u^2/2}
= 1 - \Phi(u) + \sum_{j=1}^{\infty}\frac{\rho^j}{j!}\,\frac{1}{\sqrt{2\pi}}\,H_{j-1}(u)\,e^{-u^2/2},
$$
which is, of course, the same as (15.10.4) and so also implies (15.10.5).
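The whole display is easy to confirm numerically: truncating the series reproduces $1-\Phi(u-\rho)$ to machine precision (a quick sketch using the probabilists' Hermite recursion; not code from the book):

```python
from math import erf, exp, factorial, pi, sqrt

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def hermite(n, x):
    # probabilists' Hermite polynomials He_n
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

u, rho = 1.3, 0.4
series = 1.0 - Phi(u) + sum(
    rho ** j / factorial(j) * hermite(j - 1, u) * exp(-u * u / 2.0) / sqrt(2.0 * pi)
    for j in range(1, 40))
assert abs(series - (1.0 - Phi(u - rho))) < 1e-12
```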
Substituting (15.10.5) into (15.10.1) gives the mean Euler characteristic results of
Chapters 11 and 12 (cf. (11.7.15) and Theorems 12.4.1 and 12.4.2), while substituting
into (15.10.3) recaptures the main results of Chapter 13 for the remaining Lipschitz–
Killing curvatures (cf. Theorems 13.2.1 and 13.4.1).
The $\chi^2$ random fields arise in a number of applications (cf. [172]) and are easily defined by taking $F$, in our standard notation, to be
$$G(y) = |y|^2.$$
(The desire for a change of notation will become clear soon.) We shall write the corresponding random processes as
$$g(t) = G(y(t)).$$
The EC densities for these processes19 were originally derived in [2] for N = 2 and in
[172] in general, in both cases from a first-principles approach and with considerable
and complicated computation. We shall work via Theorem 15.9.5, an approach first
adopted in [157, 158]. As you will see in a moment, the general approach is basically
the same as in the simple Gaussian case, although the details of the computation are
a little more complicated.
19 The EC densities for noncentral χ 2 processes are also known, and available in [172] and
[157, 158].
We start the actual computation by noting that if we take $F(x) = |x|$, then $f = F \circ y$ is the square root of the $\chi^2_k$ random field $g$ on $M$. Since the EC densities of $f$ are related to those of the $\chi^2_k$ field $g$ by
$$\rho_{j,g}(u) = \rho_{j,f}(\sqrt u),$$
it suffices to calculate the EC densities of $f$, which turns out to lead to slightly tidier computations.$^{20}$
Following the same path we took for the Gaussian example, we note that for this $f$ the relevant density, that of the norm of a standard Gaussian vector in $\mathbb{R}^k$, is
$$
p_k(x) = \frac{1}{\Gamma(k/2)\,2^{(k-2)/2}}\,x^{k-1}e^{-x^2/2},
$$
and so
$$
\gamma_{\mathbb{R}^k}\big(D_{u-\rho}\big) = \gamma_{\mathbb{R}^k}(D_u) - \sum_{j=1}^{\infty}\frac{(-\rho)^j}{j!}\,\frac{d^{j-1}p_k}{dx^{j-1}}\Big|_{x=u}.
$$
Direct calculations, again exploiting (11.6.11) as in the Gaussian case, show that
$$
\frac{d^{j-1}p_k}{dx^{j-1}}
= \frac{1}{\Gamma(k/2)\,2^{(k-2)/2}}\sum_{i=0}^{j-1}\binom{j-1}{i}\frac{d^{j-1-i}x^{k-1}}{dx^{j-1-i}}\,(-1)^i H_i(x)\,e^{-x^2/2}
= \frac{e^{-x^2/2}}{\Gamma(k/2)\,2^{(k-2)/2}}\sum_{i=0}^{j-1}(-1)^i\binom{j-1}{i}\frac{d^{j-1-i}x^{k-1}}{dx^{j-1-i}}\,H_i(x).
$$
The sum here equals
$$
\sum_{i=0}^{j-1}\binom{j-1}{i}\mathbb{1}_{\{k\geq j-i\}}(-1)^i\frac{(k-1)!}{(k+i-j)!}\,x^{k+i-j}H_i(x)
= x^{k-j}\sum_{i=0}^{j-1}\sum_{l=0}^{\lfloor i/2\rfloor}\binom{j-1}{i}\frac{(k-1)!}{(k+i-j)!}\,\mathbb{1}_{\{k\geq j-i\}}(-1)^{i+l}\frac{i!}{(i-2l)!\,l!\,2^l}\,x^{2i-2l}
$$
$$
= x^{k-j}\sum_{l=0}^{\lfloor (j-1)/2\rfloor}\sum_{i=2l}^{j-1}\binom{j-1}{i}\frac{(k-1)!}{(k+i-j)!}\,\mathbb{1}_{\{k\geq j-i\}}(-1)^{i+l}\frac{i!}{(i-2l)!\,l!\,2^l}\,x^{2i-2l}
$$
$$
= x^{k-j}\sum_{l=0}^{\lfloor (j-1)/2\rfloor}\sum_{m=0}^{j-1-2l}\binom{k-1}{j-1-m-2l}\,\mathbb{1}_{\{k\geq j-m-2l\}}\frac{(-1)^{m+l}\,(j-1)!}{m!\,l!\,2^l}\,x^{2m+2l}.
$$
Assembling the constants via (15.10.2) then yields the following result.
Theorem 15.10.1. The EC densities of the $\chi^2_k$ random field $g$ are given, for $j \geq 1$ and $u > 0$, by
$$
\rho_{j,g}(u) = \frac{u^{(k-j)/2}\,e^{-u/2}}{(2\pi)^{j/2}\,\Gamma(k/2)\,2^{(k-2)/2}}
\sum_{l=0}^{\lfloor (j-1)/2\rfloor}\sum_{m=0}^{j-1-2l}\binom{k-1}{j-1-m-2l}\,\mathbb{1}_{\{k\geq j-m-2l\}}\frac{(-1)^{j-1+m+l}\,(j-1)!}{m!\,l!\,2^l}\,u^{m+l}.
$$
When $j = 0$,
$$\rho_{0,g}(u) = \mathbb{P}\big\{\chi^2_k \geq u\big\}.$$
Perhaps the most important thing to note from this example is that although many
of the formulas above may be long, there is nothing at all difficult in them. All that
was required, once the structure of the sets Du was understood, was basic calculus.
Indeed, this is often the case. However, there are cases, as we shall see from
the following example, for which, as the function F becomes more complex, it
becomes more efficient to backtrack a little and to compute the Minkowski curvature
functionals from first principles.
For the $F$ random field of (15.0.2), built from independent $\chi^2$ fields with $m$ and $n$ degrees of freedom, it is now not hard (although it involves some patience playing with constants) to see from (10.9.2) and (10.9.5) that for $j \geq 1$,
$$
\mathcal{M}^{\gamma_{\mathbb{R}^{m+n}}}_{j}(D_u)
= (j-1)!\sum_{l=0}^{\lfloor (j-1)/2\rfloor}\frac{(-1)^l}{l!\,2^l}\,\frac{1}{(2\pi)^{(m+n)/2}}\int_{\partial D_u}e^{-|x|^2/2}\,\mathcal{M}_{j-2l}(D_u, dx)
\tag{15.10.7}
$$
$$
= (j-1)!\sum_{l=0}^{\lfloor (j-1)/2\rfloor}\frac{(-1)^l}{l!\,2^l}\,\frac{1}{(2\pi)^{(m+n)/2}}
\int_{\partial D_u}\frac{1}{(j-2l-1)!}\,\mathrm{Tr}^{\partial D_u}\big(S^{j-2l-1}_{\partial D_u}\big)\,e^{-|x|^2/2}\,d\mathcal{H}_{m+n-1}(x).
$$
The computation of this last integral is not simple, so we separate it out as a lemma.
Lemma 15.10.2. In the notation above,
$$
\frac{1}{(2\pi)^{(m+n)/2}}\int_{\partial D_u}\frac{1}{k!}\,\mathrm{Tr}^{\partial D_u}\big(S^{k}_{\partial D_u}\big)\,e^{-|x|^2/2}\,d\mathcal{H}_{m+n-1}(x)
$$
$$
= \frac{\Gamma\big(\frac{m+n-k-1}{2}\big)}{2^{(k-1)/2}\,\Gamma\big(\frac{m}{2}\big)\Gamma\big(\frac{n}{2}\big)}
\Big(\frac{mu}{n}\Big)^{(m-1-k)/2}\Big(1+\frac{mu}{n}\Big)^{-(m+n-2)/2}
\sum_{i=0}^{k}(-1)^{k-i}\Big(\frac{mu}{n}\Big)^{i}\binom{m-1}{k-i}\binom{n-1}{i}.
$$
The matrix of $-|\nabla F|^{-1}\nabla^2 F$ in this set of frames is diagonal, with entries
$$
\frac{1}{\sqrt{V\,G(1+G)}}\big(\underbrace{-1, \dots, -1}_{m-1\ \text{times}},\ \underbrace{G, \dots, G}_{n-1\ \text{times}},\ 0\big).
$$
By expanding the trace of the $k$th power of such a diagonal matrix we now obtain, from (15.10.8), that
$$
\frac{1}{k!}\,\mathrm{Tr}^{\partial D_F}\big(S^{k}_{\partial D_F}\big)
= \Big(\frac{1}{\sqrt{V\,G(1+G)}}\Big)^{k}\sum_{i=0}^{k}(-1)^{k-i}\,G^{i}\binom{m-1}{k-i}\binom{n-1}{i}.
$$
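Concretely, the trace expansion in the last display is the statement that $\mathrm{Tr}(S^k)/k!$ picks out the $k$th elementary symmetric function of the diagonal entries, once the common factor $(V\,G(1+G))^{-k/2}$ is pulled out. That combinatorial identity is easy to test directly (a sketch; the values of $m$, $n$, $G$ are arbitrary):

```python
from functools import reduce
from itertools import combinations
from math import comb

def e_k(vals, k):
    # k-th elementary symmetric polynomial of the eigenvalues, which is what
    # Tr(S^k)/k! computes for a diagonal second fundamental form
    return sum(reduce(lambda a, b: a * b, c, 1.0) for c in combinations(vals, k))

m, n, G = 4, 5, 1.7
eigs = [-1.0] * (m - 1) + [G] * (n - 1) + [0.0]
for k in range(0, m + n - 1):
    closed = sum((-1) ** (k - i) * G ** i * comb(m - 1, k - i) * comb(n - 1, i)
                 for i in range(k + 1))
    assert abs(e_k(eigs, k) - closed) < 1e-8
```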
There are many more formulas like these, for a wide variety of non-Gaussian
processes, some simpler, many even more complex.
Complexity, however, is a relative concept. The underlying formula behind all of
the special cases of this last section was the simple result of Theorem 15.9.5, namely,
\[
\mathbb{E}\left\{\mathcal{L}_i\big(A_u(f,M)\big)\right\}
= \sum_{j=0}^{N-i} \begin{bmatrix} i+j \\ j \end{bmatrix}
(2\pi)^{-j/2}\, \mathcal{L}_{i+j}(M)\,
\mathcal{M}^{\gamma}_j\big(F^{-1}[u,+\infty)\big).
\]
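In the display of Theorem 15.9.5 the square-bracketed terms are flag coefficients, which we take here (an assumption in this sketch, not stated in the excerpt) to be [n; k] = C(n, k)·ω_n/(ω_k ω_{n−k}), with ω_n the volume of the unit ball in R^n. They are mechanical to compute:

```python
import math

def omega(n):
    # volume of the unit ball in R^n: pi^{n/2} / Gamma(n/2 + 1)
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def flag(n, k):
    # flag coefficient [n; k] = C(n, k) * omega_n / (omega_k * omega_{n-k})
    return math.comb(n, k) * omega(n) / (omega(k) * omega(n - k))

# [n; k] is symmetric in k <-> n-k, and [n; 0] = 1
print(flag(3, 1), flag(5, 2))
```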
[1] R. J. Adler, Excursions above a fixed level by n-dimensional random fields, J. Appl.
Probab., 13 (1976), 276–289.
[2] R. J. Adler, The Geometry of Random Fields, Wiley Series in Probability and Mathe-
matical Statistics, Wiley, Chichester, UK, 1981.
[3] R. J. Adler, An Introduction to Continuity, Extrema, and Related Topics for General
Gaussian Processes, IMS Lecture Notes–Monograph Series, Vol. 12, Institute of Math-
ematical Statistics, Hayward, CA, 1990.
[4] R. J. Adler, On excursion sets, tube formulae, and maxima of random fields, Ann. Appl.
Probab., 10 (2000), 1–74.
[5] R. J. Adler and L. D. Brown, Tail behaviour for suprema of empirical processes, Ann.
Probab., 14-1 (1986), 1–30.
[6] R. J. Adler and R. Epstein, Some central limit theorems for Markov paths and some
properties of Gaussian random fields, Stochastic Proc. Appl., 24 (1987), 157–202.
[7] R. J. Adler and A. M. Hasofer, Level crossings for random fields, Ann. Probab., 4 (1976),
1–12.
[8] R. J. Adler, J. E. Taylor, and K. J. Worsley, Random Fields and Geometry: Applications,
in preparation.
[9] J. M. P. Albin, On extremal theory for stationary processes, Ann. Probab., 18-1 (1990),
92–128.
[10] D. Aldous, Probability Approximations via the Poisson Clumping Heuristic, Applied
Mathematical Sciences, Vol. 77, Springer-Verlag, New York, 1989.
[11] C. B. Allendoerfer and A. Weil, The Gauss–Bonnet theorem for Riemannian polyhedra,
Trans. Amer. Math. Soc., 53 (1943), 101–129.
[12] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed., Wiley
Series in Probability and Statistics, Wiley–Interscience, Hoboken, NJ, 2003.
[13] T. M. Apostol, Mathematical Analysis, Addison–Wesley, Reading, MA, 1957.
[14] J.-M. Azaïs, J.-M. Bardet, and M. Wschebor, On the tails of the distribution of the
maximum of a smooth stationary Gaussian process, ESAIM Probab. Statist., 6 (2002),
177–184.
[15] J.-M. Azaïs and M. Wschebor, On the distribution of the maximum of a Gaussian field
with d parameters, Ann. Appl. Probab., 15-1A (2005), 254–278.
[16] Yu. K. Belyaev, Point processes and first passage problems, in Proceedings of the 6th
Berkeley Symposium on Mathematics, Statistics, and Probability, Vol. 2, University of
California Press, Berkeley, CA, 1972, 1–17.
436 References
[17] S. M. Berman, Asymptotic independence of the numbers of high and low level crossings
of stationary Gaussian processes, Ann. Math. Statist., 42 (1971), 927–945.
[18] S. M. Berman, Excursions above high levels for stationary Gaussian processes, Pacific
J. Math., 36 (1971), 63–79.
[19] S. M. Berman, Maxima and high level excursions of stationary Gaussian processes,
Trans. Amer. Math. Soc., 160 (1971), 65–85.
[20] S. M. Berman, An asymptotic bound for the tail of the distribution of the maximum of
a Gaussian process, Ann. Inst. H. Poincaré Probab. Statist., 21-1 (1985), 47–57.
[21] S. M. Berman, An asymptotic formula for the distribution of the maximum of a Gaussian
process with stationary increments, J. Appl. Probab., 22-2 (1985), 454–460.
[22] S. M. Berman, Sojourns and Extremes of Stochastic Processes, Wadsworth–Brooks/Cole
Statistics/Probability Series, Wadsworth–Brooks/Cole, Pacific Grove, CA, 1992.
[23] S. M. Berman and N. Kôno, The maximum of a Gaussian process with nonconstant
variance: A sharp bound for the distribution tail, Ann. Probab., 17-2 (1989), 632–650.
[24] A. Bernig and L. Bröcker, Lipschitz–Killing invariants, Math. Nachr., 245 (2002), 5–25.
[25] N. H. Bingham, C. M. Goldie, and J. L. Teugels, Regular Variation, Cambridge Uni-
versity Press, Cambridge, UK, 1987.
[26] I. F. Blake and W. C. Lindsey, Level-crossing problems for random processes, IEEE
Trans. Inform. Theory, IT-19 (1973), 295–315.
[27] S. Bochner, Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse,
Math. Ann., 108 (1933), 378.
[28] V. I. Bogachev, Gaussian Measures, Mathematical Surveys and Monographs, Vol. 62,
American Mathematical Society, Providence, RI, 1998.
[29] W. M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry,
Academic Press, San Diego, 1986.
[30] C. Borell, The Brunn–Minkowski inequality in Gauss space, Invent. Math., 30 (1975),
205–216.
[31] C. Borell, The Ehrhard inequality, C. R. Math. Acad. Sci. Paris, 337-10 (2003), 663–666.
[32] C. Borell, Minkowski sums and Brownian exit times, Ann. Fac. Sci. Toulouse, to appear.
[33] L. Bröcker and M. Kuppe, Integral geometry of tame sets, Geom. Dedicata, 82-1–
3 (2000), 285–323.
[34] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer-Verlag,
New York, 1991.
[35] E. V. Bulinskaya, On the mean number of crossings of a level by a stationary Gaussian
process, Theory Probab. Appl., 6 (1961), 435–438.
[36] R. Cairoli and J. B. Walsh, Stochastic integrals in the plane, Acta Math., 134 (1975),
111–183.
[37] S. Chatterjee, An error bound in the Sudakov–Fernique inequality, preprint,
math.PR/0510424, 2005.
[38] L. Chaumont and M. Yor, Exercises in Probability: A Guided Tour from Measure Theory
to Random Processes, via Conditioning, Cambridge Series in Statistical and Probabilis-
tic Mathematics, Cambridge University Press, Cambridge, UK, 2003.
[39] J. Cheeger, W. Müller, and R. Schrader, On the curvature of piecewise flat spaces, Comm.
Math. Phys., 92 (1984), 405–454.
[40] S. S. Chern, On the kinematic formula in integral geometry, J. Math. Mech., 16 (1966),
101–118.
[41] H. Cramér, A limit theorem for the maximum values of certain stochastic processes,
Teor. Verojatnost. i Primenen., 10 (1965), 137–139.
[42] H. Cramér and M. R. Leadbetter, Stationary and Related Stochastic Processes, Wiley,
New York, 1967.
[43] N. Cressie and R. W. Davis, The supremum distribution of another Gaussian process,
J. Appl. Probab., 18-1 (1981), 131–138.
[44] R. C. Dalang and T. Mountford, Jordan curves in the level sets of additive Brownian
motion, Trans. Amer. Math. Soc., 353 (2001), 3531–3545.
[45] R. C. Dalang and J. B. Walsh, The structure of a Brownian bubble, Probab. Theory
Related Fields, 96 (1993), 475–501.
[46] P. Diaconis and D. Freedman, A dozen de Finetti-style results in search of a theory, Ann.
Inst. H. Poincaré Probab. Statist., 23-2 (suppl.) (1987), 397–423.
[47] P. W. Diaconis, M. L. Eaton, and S. L. Lauritzen, Finite de Finetti theorems in linear
models and multivariate analysis, Scand. J. Statist., 19-4 (1992), 289–315.
[48] V. Dobrić, M. B. Marcus, and M. Weber, The distribution of large values of the supremum
of a Gaussian process, in Colloque Paul Lévy sur les Processus Stochastiques (Palaiseau,
1987), Astérisque, Vols. 157–158, Société Mathématique de France, Paris, 1988, 95–
127.
[49] R. L. Dobrushin, Gaussian and their subordinated self-similar random generalized fields,
Ann. Probab., 7 (1979), 1–28.
[50] J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
[51] R. M. Dudley, Sample functions of the Gaussian process, Ann. Probab., 1 (1973), 66–
103.
[52] R. M. Dudley, Metric entropy of some classes of sets with differentiable boundaries,
J. Approx. Theory, 10 (1974), 227–236.
[53] R. M. Dudley, Central limit theorems for empirical measures, Ann. Probab., 6 (1978),
899–929.
[54] R. M. Dudley, Lower layers in R2 and convex sets in R3 are not GB classes, in A. Beck,
ed., The Second International Conference on Probability in Banach Spaces, Lecture
Notes in Mathematics, Vol. 709, Springer-Verlag, Berlin, New York, Heidelberg, 1978,
97–102.
[55] R. M. Dudley, Real Analysis and Probability, Wadsworth, Belmont, CA, 1989.
[56] R. M. Dudley, Uniform Central Limit Theorems, Cambridge University Press, Cam-
bridge, UK, 1999.
[57] E. B. Dynkin, Markov processes and random fields, Bull. Amer. Math. Soc., 3 (1980),
957–1000.
[58] E. B. Dynkin, Markov processes as a tool in field theory, J. Functional Anal., 50 (1983),
167–187.
[59] E. B. Dynkin, Gaussian and non-Gaussian random fields associated with a Markov
process, J. Functional Anal., 55 (1984), 344–376.
[60] E. B. Dynkin, Polynomials of the occupation field and related random fields, J. Func-
tional Anal., 58 (1984), 20–52.
[61] A. Erdélyi, Higher Transcendental Functions, Vol. II, Bateman Manuscript Project,
McGraw–Hill, New York, 1953.
[62] V. R. Fatalov, Exact asymptotics of large deviations of Gaussian measures in a Hilbert
space, Izv. Nats. Akad. Nauk Armenii Mat., 27-5 (1992), 43–61 (in Russian).
[63] V. R. Fatalov, Asymptotics of the probabilities of large deviations of Gaussian fields:
Applications, Izv. Nats. Akad. Nauk Armenii Mat., 28-5 (1993), 25–51 (in Russian).
[64] H. Federer, Curvature measures, Trans. Amer. Math. Soc., 93 (1959), 418–491.
[65] H. Federer, Geometric Measure Theory, Springer-Verlag, New York, 1969.
[66] X. Fernique, Regularité des trajectoires des fonctions aléatoires gaussiennes, in École
d’Été de Probabilités de Saint-Flour IV: 1974, Lecture Notes in Mathematics, Vol. 480,
Springer-Verlag, Berlin, 1975, 1–96.
[162] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag, New
York, 2000.
[163] V. N. Vapnik and A. Ya. Červonenkis, On the uniform convergence of relative frequen-
cies of events to their probabilities, Theory Probab. Appl., 16 (1971), 264–280.
[164] V. N. Vapnik and A. Ya. Červonenkis, Theory of Pattern Recognition: Statistical
Problems in Learning, Nauka, Moscow, 1974 (in Russian).
[165] J. B. Walsh, An Introduction to Stochastic Partial Differential Equations, École d’été de
probabilités de Saint-Flour XIV—1984, Lecture Notes in Mathematics 1180, Springer-
Verlag, New York, 1986, 265–439.
[166] S. Watanabe, Stochastic Differential Equations and Malliavin Calculus, Springer-
Verlag, Berlin, 1984.
[167] M. Weber, The supremum of Gaussian processes with a constant variance, Probab.
Theory Related Fields, 81-4 (1989), 585–591.
[168] H. Weyl, On the volume of tubes, Amer. J. Math., 61 (1939), 461–472.
[169] E. Wong, Stochastic Processes in Information and Dynamical Systems, McGraw–Hill,
New York, 1971.
[170] E. Wong and M. Zakai, Martingales and stochastic integrals for processes with a multi-
dimensional parameter, Z. Wahrscheinlichkeitstheorie Verw. Gebiete, 29 (1974), 109–
122.
[171] E. Wong and M. Zakai, Weak martingales and stochastic integrals in the plane, Ann.
Probab., 4 (1976), 570–587.
[172] K. J. Worsley, Local maxima and the expected Euler characteristic of excursion sets of
χ2, F, and t fields, Adv. Appl. Probab., 26 (1994), 13–42.
[173] K. J. Worsley, Boundary corrections for the expected Euler characteristic of excursion
sets of random fields, with an application to astrophysics, Adv. Appl. Probab., 27 (1995),
943–959.
[174] K. J. Worsley, Estimating the number of peaks in a random field using the Hadwiger char-
acteristic of excursion sets, with applications to medical images, Ann. Statist., 23 (1995),
640–669.
[175] K. J. Worsley, The geometry of random images, Chance, 9-1 (1997), 27–40.
[176] K. J. Worsley and K. J. Friston, A test for a conjunction, Statist. Probab. Lett., 47-
2 (2000), 135–140.
[177] A. M. Yaglom, Some classes of random fields in n-dimensional space, related to sta-
tionary random processes, Theory Probab. Appl., 2 (1957), 273–320.
[178] A. M. Yaglom, Second-order homogeneous random fields, in Proceedings of the 4th
Berkeley Symposium on Mathematics, Statistics, and Probability, Vol. 2, University of
California Press, Berkeley, CA, 1961, 593–622.
[179] K. Ylinen, Random fields on noncommutative locally compact groups, in H. Heyer, ed.,
Probability Measures on Groups VIII (Oberwolfach), Lecture Notes in Mathematics,
Vol. 1210, Springer-Verlag, Berlin, New York, Heidelberg, 1986, 365–386.
[180] N. D. Ylvisaker, The expected number of zeros of a stationary Gaussian process, Ann.
Math. Statist., 36 (1965), 1043–1046.
[181] A. C. Zaanen, Linear Analysis, North-Holland, Amsterdam, 1956.
[182] M. Zähle, Approximation and characterization of generalized Lipschitz–Killing
curvatures, Ann. Global Anal. Geom., 8 (1990), 249–260.
Notation Index
AD, ix
(N, d), 7
, 8
, 8
, 8
Nd(m, C), Nd(m, Σ), 9
C(s, t), 11
d, 12
Bd(t, ε), 13
N(T, d, ε), 13
H(T, d, ε), 13
ωf, 14
S^(N−1), 36
I(N, q, M), 37
||f||, 50
(u, v)H, 66
Tδ, 78
GN, GL, 108
H, H, 111
K, K, 111
λi1...iN, 112
Jm, 116
sN, 117
hml^(N−1), 117
[n k], 145
Au, 127
ϕ, 130
B^N, 133
Lj, 142
ωj, 142
Bλ^N, 144
Sλ^(N−1), 144
Mj, 144
HN, 150
∂M, 150
M, 150
C^k, 150
Tt M, 152
Xt, 152
g∗, 152
^k(E), 153
T(M), 153
T, Tj^i, T∗, T∗, 155
⊗, 155
S(k), 155
^k(V), Sym(V)∗(V), Sym∗(V), 155
∧, 156
n,m, 157
∗,∗, 157
Tr, 158
g, 160
(M, g), 160
O(M), 160
∇, 161
∇X Y, 162
Γij^k, 164
[X, Y], 163
Ei, 163
θi^j, 164
df, 164
exp, exp(M,g), 165
∇^2, 166
g∗, 167
, Volg, 168
R(X, Y), 172