
3 Proper orthogonal decomposition

The proper orthogonal decomposition (POD) provides a basis for the modal decomposition
of an ensemble of functions, such as data obtained in the course of experiments. Its prop-
erties suggest that it is the preferred basis to use in various applications. The most striking
of these is optimality: it provides the most efficient way of capturing the dominant com-
ponents of an infinite-dimensional process with only finitely many, and often surprisingly
few, “modes.”
The POD was introduced in the context of turbulence by Lumley in [220]. In other
disciplines the same procedure goes by the names: Karhunen–Loève decomposition, prin-
cipal components analysis, singular systems analysis, and singular value decomposition.
The basis functions it yields are variously called: empirical eigenfunctions, empirical
basis functions, and empirical orthogonal functions. According to Yaglom (see [221]),
the POD was introduced independently by numerous people at different times, includ-
ing Kosambi [197], Loève [215], Karhunen [183], Pougachev [285], and Obukhov [272].
Lorenz [216], whose name we have already met in another context, suggested its use in
weather prediction. The procedure has been used in various disciplines other than fluid
mechanics, including random variables [275], image processing [313], signal analysis [5],
data compression [7], process identification and control in chemical engineering [118,119],
and oceanography [286]. Computational packages based on the POD are now readily
available (an early example appeared in [11]).
In the bulk of these applications, the POD is used to analyze experimental data with
a view to extracting dominant features and trends – in particular coherent structures. In
the context of turbulence and other complex spatio-temporal fields, these will typically be
patterns in space and time. However, our goal is somewhat different. As outlined in the
introductory chapter, we wish to use the POD to provide a “relevant” set of basis functions
with which we can identify a low-dimensional subspace on which to construct a model by
projection of the governing equations. The POD will produce the key spatial ingredients,
from which our models will dynamically recreate the coherent structures as time-dependent
mixtures of POD modes. In the following sections we review the theory behind the POD
and present some new results and extensions which are useful in the context of low-
dimensional models. The POD is an especially important tool in this connection, since
it yields an increasing sequence of finite-dimensional subspaces of the full phase space,


chosen to resolve those parts of physical and phase space which contain the dominant
dynamics.
Although we shall almost exclusively apply it to nonlinear problems, it is important to
recognize that the POD is a linear procedure, and the nested sequence of subspaces referred
to above are linear spaces, even if the dynamical systems ultimately to be defined in them,
and the source of the data that generates them, are nonlinear. Linearity is the source of the
method’s strengths as well as its limitations: appealing to results from the theory of linear
operators, we can give a fairly complete account of the properties of representations via
the POD, but in stating optimality results, for example, the reader must remember that we
imply optimality only with respect to other linear representations. By a linear representa-
tion, we mean the superposition by a finite or an infinite sum of modal functions multiplied
by appropriate coefficients, such as a Fourier series. These representations in terms of basis
functions chosen a priori or by the POD are blind to the origin of the functions they are
called upon to represent, which of course may, and in our case do, derive from nonlinear
dynamical processes.
We start by introducing the POD in the simple context of scalar fields. The succeeding
sections describe the properties of representations using empirical basis functions,
precisely characterizing the fields they can reproduce and the sense in which they are
optimal, and explaining how symmetries in the data sets affect them. We also discuss
the relationship between the use of empirical basis functions and the structure of attrac-
tors in phase space. The chapter ends with some comments on the relation of the POD
to other techniques for the statistical analysis of data. Basic derivations and descriptions
of the POD can be found in various books, most notably in the context of turbulence in
Lumley ( [221], Section 3.5ff.), but a number of the results we shall find useful for low-
dimensional models do not appear well known. We have therefore included statements and
proofs of them, relegating some of the more technical derivations and discussions to the
Appendix (Section 3.8). The mathematical basis for the POD is the spectral theory of com-
pact, self-adjoint operators and we sketch some of the relevant background of this in the
Appendix also, along with those elements of measure theory needed for the definition of
averages. Some of the material of this chapter appeared previously in [45].

3.1 Introduction
The fundamental idea behind the POD is straightforward. Suppose that we have an ensemble $\{u^k\}$ of scalar fields, each being a function $u = u(x)$ defined on the domain $0 \leq x \leq 1$. Ultimately we want to use the theory in combination with Fourier decompositions, and so we allow u to be complex valued. In seeking good representations of members of $\{u^k\}$, we need to project each u onto candidate basis functions, and so we assume that the functions belong to an inner product space: for example, the linear, infinite-dimensional Hilbert space $L^2([0,1])$, of square integrable (complex-valued) functions with inner product
$$(f,g) = \int_0^1 f(x)\,g^*(x)\,dx. \qquad (3.1)$$


(More information on $L^2$ spaces was given in Section 1.4.) In this context, we want to find a basis $\{\varphi_j(x)\}_{j=1}^{\infty}$ for $L^2$ that is optimal for the data set in the sense that finite-dimensional representations of the form
$$u_N(x) = \sum_{j=1}^{N} a_j\varphi_j(x) \qquad (3.2)$$

describe typical members of the ensemble better than representations of the same dimen-
sion in any other basis. The notion of “typical” implies use of an averaging operation,
which we denote by · and which is assumed to commute with the spatial integral (3.1)
of the L 2 inner product. Averaging is discussed in more detail in the next section and in
the Appendix, but it is sufficient for now to imagine an ensemble average over a number
of separate experiments forming {u k }, or a time average over an ensemble with members
u k (x) = u(x, tk ), obtained from successive measurements during a single run.
We derive the POD in the context of a general Hilbert space H, with inner product (·, ·);
that is, we do not require the specific definition (3.1). While this approach may seem overly
abstract, it allows us to readily specialize the results to a number of cases of interest, such
as functions of more than one variable and vector-valued functions including the three-
dimensional velocity fields u(x, t) of turbulence. If the abstract formulation is off-putting,
it may help to think of the space H as the space of flowfields at a given time instant, so that
an element of H is a snapshot of the flow.
The mathematical statement of optimality is that we should choose $\varphi$ such that the average (squared) error between u and its projection onto $\varphi$ is minimized:
$$\min_{\varphi\in H}\left\langle\left\|u - \frac{(u,\varphi)}{\|\varphi\|^2}\,\varphi\right\|^2\right\rangle, \qquad (3.3)$$

where $\|\cdot\|$ is the induced norm
$$\|f\| = (f,f)^{1/2}.$$

This is equivalent to maximizing the averaged projection of u onto $\varphi$, suitably normalized:
$$\max_{\varphi\in H}\frac{\big\langle|(u,\varphi)|^2\big\rangle}{\|\varphi\|^2}, \qquad (3.4)$$

where | · | denotes the absolute value. Solution of (3.4) as stated would yield only the best
approximation to the ensemble members by a single function, but the other critical points
of this functional are also physically significant, for they correspond to an entire set of
functions which, taken together, provide the desired basis.
We now have a problem in the calculus of variations: to extremize $\langle|(u,\varphi)|^2\rangle$ subject to the constraint $\|\varphi\|^2 = 1$. The corresponding functional for this constrained variational problem is
$$J[\varphi] = \big\langle|(u,\varphi)|^2\big\rangle - \lambda\big(\|\varphi\|^2 - 1\big), \qquad (3.5)$$


and a necessary condition for extrema is that the functional derivative vanish for all variations $\varphi + \delta\psi\in H$, $\delta\in\mathbb{R}$:
$$\frac{d}{d\delta}J[\varphi + \delta\psi]\Big|_{\delta=0} = 0. \qquad (3.6)$$

From (3.5) we have
$$\frac{d}{d\delta}J[\varphi+\delta\psi]\Big|_{\delta=0} = \frac{d}{d\delta}\Big[\big\langle(u,\varphi+\delta\psi)(\varphi+\delta\psi,u)\big\rangle - \lambda(\varphi+\delta\psi,\varphi+\delta\psi)\Big]\Big|_{\delta=0} = 2\,\mathrm{Re}\Big[\big\langle(u,\psi)(\varphi,u)\big\rangle - \lambda(\varphi,\psi)\Big] = 0,$$
where we use the property of the inner product $(f,g) = (g,f)^*$. Interchanging the order of the averaging operation and the inner product, the quantity in brackets may be written
$$\Big(\big\langle(\varphi,u)\,u\big\rangle,\,\psi\Big) - \lambda(\varphi,\psi), \qquad (3.7)$$
or simply
$$\big(R\varphi - \lambda\varphi,\ \psi\big), \qquad (3.8)$$
where the linear operator R is defined by
$$R\varphi = \big\langle(\varphi,u)\,u\big\rangle. \qquad (3.9)$$

Finally, since $\psi$ is an arbitrary variation, our condition reduces to the eigenvalue problem
$$R\varphi = \lambda\varphi. \qquad (3.10)$$

Thus the optimal basis is given by the eigenfunctions {ϕ j } of the operator R that is
defined from the empirical data u. They are consequently sometimes called empirical
eigenfunctions, or POD modes.
We shall shortly describe some properties of the POD, but first we illustrate with some
examples.

3.1.1 Finite dimensional spaces


Let $H = \mathbb{R}^N$, with the standard inner product $(\mathbf{x},\mathbf{y}) = \mathbf{y}^T\mathbf{x}$. Here the "data" are a collection of M vectors $\mathbf{u}^k\in\mathbb{R}^N$, and the average $\langle\cdot\rangle$ is just an arithmetic mean over the ensemble.
The operator R in (3.9) then becomes
$$R\mathbf{x} = \frac{1}{M}\sum_{k=1}^{M}\big((\mathbf{u}^k)^T\mathbf{x}\big)\,\mathbf{u}^k \qquad (3.11)$$
or, equivalently, R is given by
$$R = \frac{1}{M}\sum_{k=1}^{M}\mathbf{u}^k(\mathbf{u}^k)^T, \qquad R_{ij} = \frac{1}{M}\sum_{k=1}^{M}u_i^k u_j^k, \qquad (3.12)$$


or simply the correlation matrix $\langle\mathbf{u}\mathbf{u}^T\rangle$. Thus, R is a real, symmetric N × N matrix, and
Equation (3.10) is a standard matrix eigenvalue problem on R N . There is a nice geometri-
cal interpretation in this case: the eigenvectors are simply the principal axes of the cloud
of data points {uk } in the N -dimensional vector space. This idea is discussed further in
Section 3.4.2.
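A minimal numerical sketch of this finite-dimensional case, assuming a NumPy environment and a synthetic data cloud (all names and numbers here are illustrative rather than prescriptive), is:

```python
import numpy as np

# Finite-dimensional POD: the data are M vectors u^k in R^N, the average is an
# arithmetic mean, and the POD modes are the eigenvectors of the correlation matrix.
rng = np.random.default_rng(0)
N, M = 5, 200
# Synthetic "cloud" of data points, elongated along a few directions.
u = rng.normal(size=(M, N)) @ np.diag([3.0, 1.0, 0.5, 0.1, 0.05])

R = (u.T @ u) / M                    # R = (1/M) sum_k u^k (u^k)^T, Eq. (3.12)
lam, phi = np.linalg.eigh(R)         # symmetric eigenvalue problem, Eq. (3.10)
order = np.argsort(lam)[::-1]        # order the eigenvalues as lambda_1 >= lambda_2 >= ...
lam, phi = lam[order], phi[:, order]

# The columns of phi are the principal axes of the data cloud; lam[j] is the
# mean squared projection of the data onto the j-th axis.
print(lam)
print(phi[:, 0])                     # dominant POD mode
```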

3.1.2 Scalar-valued functions


Let $H = L^2([0,1])$, with the inner product (3.1). Then for an ensemble of functions u(x), Equation (3.10) becomes
$$R\varphi(x) = \int_0^1\big\langle u(x)u^*(x')\big\rangle\varphi(x')\,dx' = \lambda\varphi(x); \qquad (3.13)$$
the kernel of this integral equation is the averaged autocorrelation function $R(x,x') = \langle u(x)u^*(x')\rangle$.
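A corresponding sketch for scalar fields, in which the integral eigenvalue problem (3.13) is discretized on a uniform grid so that the grid spacing acts as a quadrature weight (the ensemble below is synthetic and purely illustrative), might read:

```python
import numpy as np

# Discretized form of Eq. (3.13) on [0, 1]: the kernel <u(x)u(x')> becomes a
# matrix, and the grid spacing dx supplies the quadrature weight.
rng = np.random.default_rng(1)
n_x, M = 128, 400
x = np.linspace(0.0, 1.0, n_x)
dx = x[1] - x[0]

# Synthetic ensemble: random superpositions of three smooth "structures".
structures = np.array([np.sin(np.pi * k * x) for k in (1, 2, 3)])
u = rng.normal(size=(M, 3)) @ structures          # each row is one realization u^k(x)

K = (u.T @ u) / M                                 # averaged kernel R(x, x') on the grid
lam, phi = np.linalg.eigh(K * dx)                 # include the quadrature weight
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order] / np.sqrt(dx)  # normalize so that int |phi_j|^2 dx = 1

print(lam[:5])                                    # only three eigenvalues are appreciably non-zero
```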

3.1.3 Vector-valued functions


Now let $H = C(\Omega, V)$ denote the space of continuous functions from some (spatial) domain $\Omega\subset\mathbb{R}^3$ to a vector space $V = \mathbb{C}^3$ (e.g., velocity vectors). Define an inner product on H by
$$(\mathbf{u},\mathbf{v}) = \int_\Omega\mathbf{v}^\dagger(\mathbf{x})\,Q\,\mathbf{u}(\mathbf{x})\,d\mathbf{x}, \qquad (3.14)$$
where $\dagger$ denotes the complex-conjugate transpose, and $Q\in\mathbb{C}^{3\times 3}$ is a positive-definite Hermitian matrix. (Often we take Q = I, the identity matrix, but we shall see later that other inner products may sometimes be useful.) Then the eigenvalue problem (3.10) becomes
$$R\boldsymbol{\varphi}(\mathbf{x}) = \int_\Omega\big\langle\mathbf{u}(\mathbf{x})\,\mathbf{u}^\dagger(\mathbf{y})\big\rangle\,Q\,\boldsymbol{\varphi}(\mathbf{y})\,d\mathbf{y} = \lambda\boldsymbol{\varphi}(\mathbf{x}). \qquad (3.15)$$
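In the discrete setting a weighted inner product of the form (3.14) can be handled by symmetrizing the eigenvalue problem. The sketch below assumes, for simplicity, a diagonal positive weight matrix W that lumps together Q and the quadrature weights; the data and names are illustrative:

```python
import numpy as np

# Weighted-inner-product POD: modes of C @ W (generally non-symmetric) are obtained
# from the symmetric problem W^(1/2) C W^(1/2) psi = lambda psi, with phi = W^(-1/2) psi.
rng = np.random.default_rng(2)
n_dof, M = 60, 150                       # n_dof = (grid points) x (velocity components)
u = rng.normal(size=(M, n_dof)) @ np.diag(np.linspace(2.0, 0.1, n_dof))

w = rng.uniform(0.5, 1.5, size=n_dof)    # hypothetical positive weights (Q plus quadrature)
W_half = np.diag(np.sqrt(w))
W_half_inv = np.diag(1.0 / np.sqrt(w))

C = (u.T @ u) / M                        # correlation <u u^T>
lam, psi = np.linalg.eigh(W_half @ C @ W_half)
order = np.argsort(lam)[::-1]
lam, psi = lam[order], psi[:, order]
phi = W_half_inv @ psi                   # POD modes, orthonormal in the W-weighted inner product

print(np.allclose(phi.T @ np.diag(w) @ phi, np.eye(n_dof)))   # phi_i^T W phi_j = delta_ij
```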


3.1.4 Technical properties of the POD


Henceforth we focus on the case H = L 2 ([0, 1]), with inner product (3.1). In the Appendix
we give conditions under which R is a compact self-adjoint operator, in which case spec-
tral theory [304] guarantees that the maximum in (3.4) exists and is equal to the largest
eigenvalue of the integral equation (3.13). Moreover, Hilbert–Schmidt theory assures us
that there is a countable infinity of eigenvalues and eigenfunctions that provides a diagonal
decomposition of the averaged autocorrelation function:
$$R(x,x') = \sum_{j=1}^{\infty}\lambda_j\varphi_j(x)\varphi_j^*(x'), \qquad (3.16)$$

and that the eigenfunctions ϕ j are mutually orthogonal in L 2 . This derivation is stronger
than the variational argument given above, since we are assured a maximum for (3.4) rather
than merely a critical point: solution of (3.10, 3.13) is both necessary and sufficient.


We can order the eigenvalues by $\lambda_j \geq \lambda_{j+1}$, and observe that the fact that the averaged autocorrelation $R(x,x') = \langle u(x)u^*(x')\rangle$ is non-negative definite implies that the integral operator R is non-negative definite. This ensures that $\lambda_j \geq 0$ for all j. As we shall see in Section 3.3, almost every member (in a measure or probabilistic sense) of the ensemble used in the averaging $\langle\cdot\rangle$ leading to $R(x,x')$ can be reproduced by a modal decomposition based on the eigenfunctions $\{\varphi_j\}_{j=1}^{\infty}$:
$$u(x) = \sum_{j=1}^{\infty}a_j\varphi_j(x). \qquad (3.17)$$

(The proper orthogonal decomposition is literally this equation.) Also, the diagonal representation (3.16) of the two point correlation tensor implies that
$$\langle a_j a_k^*\rangle = \delta_{jk}\lambda_j, \qquad (3.18)$$
so that the (random) modal coefficients of the representation are uncorrelated on average. If u(x) is a turbulent velocity field, then the eigenvalues $\lambda_j$ represent twice the average kinetic energy in each mode $\varphi_j$, and thus, picking the subspace spanned by the modes $\{\varphi_j\}_{j=1}^{N}$ corresponding to the first (= largest) N eigenvalues, the representation (3.2) reproduces the most energetic disturbances in the field, as claimed at the outset. The $\lambda_j$ are called empirical eigenvalues.
Thus far we have considered only functions defined on a bounded interval. The
unbounded case, which is more natural in the context of open fluid flows, can be dealt with
in the same way provided that the inner product (now an infinite integral) is well defined,
and that the space of functions still has a countable basis. See the next section for more
remarks on this. In dealing with unbounded domains in practice we either select a finite
subdomain and use periodic boundary conditions (see Section 3.3.3) or we are concerned
with functions which decay to zero rapidly outside a finite region.
Another important point is implicit in Equations (3.17) and (3.18). We have remarked
that non-negative definiteness of R(x, x  ) implies that the empirical eigenvalues λ j are
non-negative themselves, but in general they are not all strictly positive. To produce a com-
plete basis for L 2 we must include all those “additional” eigenfunctions ϕ j with zero eigen-
values, although, in view of (3.18), they carry no information on the original data ensemble.
It is therefore often advantageous to consider representations in terms of only those empiri-
cal eigenfunctions with non-zero eigenvalues. We shall return to this shortly in Section 3.3.
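A short numerical check of (3.17) and (3.18), using the same kind of uniform-grid discretization as in the earlier sketch (synthetic data, illustrative only), is:

```python
import numpy as np

# Verify Eq. (3.18): the POD coefficients a_j of the ensemble members are
# uncorrelated on average, with <|a_j|^2> = lambda_j.
rng = np.random.default_rng(3)
n_x, M = 64, 500
x = np.linspace(0.0, 1.0, n_x)
dx = x[1] - x[0]
u = rng.normal(size=(M, 4)) @ np.array([np.sin(np.pi * k * x) for k in (1, 2, 3, 4)])

K = (u.T @ u) / M
lam, phi = np.linalg.eigh(K * dx)
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order] / np.sqrt(dx)

a = u @ phi * dx                          # coefficients a_j for each realization, Eq. (3.17)
cov = (a.T @ a) / M                       # ensemble average <a_j a_k>
print(np.allclose(cov, np.diag(lam), atol=1e-10))   # Eq. (3.18)
```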

3.2 On domains and averaging


In our introductory discussion we focused on fields depending on a single variable x.
In dealing with turbulence, the fields depend on four variables, three spatial and one
temporal. There is no reason a priori to distinguish between space and time and the multi-
dimensional theory does not enforce such a distinction: for experiments performed in
the one-dimensional spatial domain 0 ≤ x ≤ 1 over times of duration T , one simply
measures correlations with time lags as well as spatial separations and works in the
space L 2 ([0, 1] × [0, T ]), in which case the inner product becomes a double integral


over x and t. However, in view of our intended use of the POD in the derivation of low-
dimensional models, we generally seek only spatial basis functions. The time-dependent
modal coefficients $a_k(t)$ in representations of the form
$$u(x,t) = \sum_k a_k(t)\varphi_k(x) \qquad (3.19)$$
and their multi-dimensional analogs, are determined subsequently via projection of the governing equations.
Depending on the physical situation, one has to decide if the problem at hand should be
treated as stationary or non-stationary in time. If one seeks purely spatial representations
of a space–time field u(x, t) in a statistically stationary problem, the correlation functions
between pairs of points in physical space must be measured with no time lag. Assuming
ergodicity, time may be used to increase the ensemble size by including measurements
taken at appropriately separated intervals during a single experimental run, in which case
the ensemble members may be defined as $u^k(x) = u(x,t_k)$ and $\langle\cdot\rangle$ effectively becomes
a time average. In contrast, if one wishes to represent a non-stationary field as a function
over (x, t)-space, correlations with time lags as well as spatial separations must be mea-
sured and multiple runs of the experiment undertaken to increase the ensemble size. In
such a case one could still derive purely spatial representations from the zero time lag cor-
relation averaged over ensembles chosen from different experiments of the same “age,” for
example, or by appeal to some other condition. We refer the reader to [368] for a discussion
of such issues. In our applications we shall generally assume stationarity in time.
In fluid mechanical applications involving a well-defined (finite) domain in physical
space with well-defined boundary conditions, examination of the mathematical details
associated with the averaging operator is usually unnecessary. However, it is impor-
tant to appreciate that, when dealing with an infinite domain, difficulties may arise.
The appropriate function space for time-stationary problems is $L^2(\Omega)$ where $\Omega$ is the (three-dimensional) spatial domain of the experiment, analogous to the interval [0, 1] in Section 3.1. The appropriate function space for time-dependent problems, for which there is no a priori upper bound on duration, is $L^2_{\mathrm{loc}}(\Omega\times[0,\infty))$, where the subscript loc implies that the $L^2$-norm is finite on finite closed intervals in time (the second variable). For problems in fluid mechanics a finite $L^2$-norm is a reasonable assumption since it corresponds to finite kinetic energy (although if infinite-time experiments were possible then the integrated kinetic energy might become unbounded). Here we merely caution the reader that subtleties arise in the time-dependent case due, for example, to the fact that $L^2_{\mathrm{loc}}(\Omega\times[0,\infty))$ is not a separable space [304] and so does not admit a countable set of basis functions. Most of the analysis in this chapter assumes countable bases.

3.3 Properties of the POD


In much of the literature cited thus far in this chapter the POD is primarily regarded as a
tool for analysis of experimental data. We now wish to view it in a more dynamical context.
Throughout the remainder of the chapter the reader should imagine that the ensembles


from which the autocorrelation function and empirical bases are generated originate from
solutions belonging to the attractor of a dynamical system such as the Navier–Stokes equa-
tions, realized either by a physical or a numerical experiment. Attractors are discussed in
more detail in Chapter 6; for the moment it is enough to think of a set in phase space to
which all solutions starting sufficiently close approach as time increases. In the best case,
the attractor is ergodic, which means that time averages and averages over the part of phase
space containing the attractor coincide. In this case the initial conditions are “forgotten”
as time proceeds, much as in the common hypothesis that certain turbulent flows relax in
physical space to “universal equilibrium states” (see Narasimha [256]).
The following subsections describe properties that are especially important in our use
of the POD to derive low-dimensional models. In the first two we characterize the classes
of functions that can be represented by empirical bases and explain precisely how such
representations preserve properties of the observations from which they are derived and
how they are optimal. We then consider symmetries, showing that in the case of transla-
tion invariance (homogeneous directions), the empirical eigenfunctions are simply Fourier
modes, and we obtain results on ergodic attractors invariant under more general symmetry
groups. We then show how the rate of decay of the empirical eigenvalues determines geo-
metrical properties of the attractor and how theoretical results on the regularity of solutions
of the governing evolution equations are related to this.

3.3.1 Span of the empirical basis


The first step in understanding what can be done with representations using empirical
eigenfunctions is in characterizing the class of functions which can be accurately repre-
sented by the "relevant" elements of the basis: those containing spatial structures having finite energy on average. This is the set $S = \{\sum a_j\varphi_j \mid \sum|a_j|^2 < \infty,\ \lambda_j > 0\}$, or $\mathrm{span}\{\varphi_j \mid j = 1,\ldots,\infty,\ \lambda_j > 0\}$. In this section, equality of functions is interpreted as almost everywhere in the spatial domain $\Omega$ with respect to Lebesgue measure: two functions f and g are equal in this sense if
$$\int_\Omega|f - g|^2\,dx = 0; \qquad (3.20)$$


this is the mathematical definition of “accurately.” We also frequently use a second notion
of almost every member of an ensemble with respect to the probability measure underlying
the averaging operation $\langle\cdot\rangle$. This is denoted by "a.e." In applications this average is typ-
ically a finite sum over a set of realizations or an integral over a finite-time experimental
run, but the theory is developed in the ideal case of infinite data sets.
A standing assumption in this section is that the averaged autocorrelation $R(x,x')$ is a
continuous function. Discontinuities in R can lead to negative values in the power spec-
trum – the Fourier transform of R – and negative energies are unreasonable on physical
grounds. See both Section 3.10 and Appendix 3.13 of Lumley [221] for further details.
We first show that the empirical basis can reconstruct any function that is indistinguish-
able in the sense of (3.20) from a member of the original ensemble $\{u^k\}$. Let $u\in L^2(\Omega)$ be
any such function and {ϕ j } be the orthonormal sequence of empirical eigenfunctions. The


reconstruction of u is a function $u_s(u) = \sum_j(u,\varphi_j)\varphi_j$, belonging to S. (By Parseval's inequality we know that $\sum_j|(u,\varphi_j)|^2$ converges.) We need to show that for a.e. u with respect to the ensemble average, we have $u = u_s(u)$; that is:
$$\big\langle\|u - u_s\|^2\big\rangle = 0. \qquad (3.21)$$

From the real valuedness of u, we have
$$\big\langle\|u(x) - u_s(x)\|^2\big\rangle = \big\langle(u - u_s,\ u - u_s)\big\rangle = \big\langle(u,u)\big\rangle - 2\big\langle(u,u_s)\big\rangle + \big\langle(u_s,u_s)\big\rangle. \qquad (3.22)$$
Since the functions u are the members of the original ensemble, we have
$$\big\langle(u,u)\big\rangle = \Big\langle\int_\Omega u(x)u^*(x)\,dx\Big\rangle = \int_\Omega R(x,x)\,dx, \qquad (3.23)$$
and also
$$\begin{aligned}
-2\big\langle(u,u_s)\big\rangle &= -2\Big\langle\int_\Omega u(x)\Big[\sum_j(u^*,\varphi_j^*)\varphi_j^*(x)\Big]dx\Big\rangle\\
&= -2\Big\langle\int_\Omega u(x)\sum_j\Big[\int_\Omega u^*(x')\varphi_j(x')\,dx'\Big]\varphi_j^*(x)\,dx\Big\rangle\\
&= -2\sum_j\int_\Omega\Big[\int_\Omega\big\langle u(x)u^*(x')\big\rangle\varphi_j(x')\,dx'\Big]\varphi_j^*(x)\,dx\\
&= -2\sum_j\int_\Omega(R\varphi_j)\varphi_j^*\,dx = -2\sum_j\int_\Omega\lambda_j\varphi_j(x)\varphi_j^*(x)\,dx\\
&= -2\int_\Omega R(x,x)\,dx. \qquad (3.24)
\end{aligned}$$
The third term of (3.22) is:
$$\begin{aligned}
\big\langle(u_s,u_s)\big\rangle &= \Big\langle\int_\Omega\Big[\sum_i(u,\varphi_i)\varphi_i(x)\Big]\Big[\sum_j(u^*,\varphi_j^*)\varphi_j^*(x)\Big]dx\Big\rangle\\
&= \sum_{i,j}\big\langle(u,\varphi_i)(u^*,\varphi_j^*)\big\rangle\int_\Omega\varphi_i(x)\varphi_j^*(x)\,dx\\
&= \sum_j\Big\langle\int_\Omega u(x)\varphi_j^*(x)\,dx\int_\Omega u^*(x')\varphi_j(x')\,dx'\Big\rangle\\
&= \sum_j\int_\Omega\Big[\int_\Omega\big\langle u(x)u^*(x')\big\rangle\varphi_j(x')\,dx'\Big]\varphi_j^*(x)\,dx\\
&= \sum_j\int_\Omega\lambda_j\varphi_j(x)\varphi_j^*(x)\,dx.
\end{aligned}$$
Using the continuity of R we can apply Mercer's theorem for the uniform convergence of the series expression for R and interchange summation and integration, obtaining
$$\sum_j\int_\Omega\lambda_j\varphi_j(x)\varphi_j^*(x)\,dx = \int_\Omega R(x,x)\,dx. \qquad (3.25)$$
Combining (3.23), (3.24), and (3.25) we obtain (3.21).


We have shown that almost every member of the original ensemble can be reconstructed
as a linear combination of empirical eigenfunctions having strictly positive eigenvalues.
Now we want to show the converse: that each such eigenfunction can be expressed as
a linear combination of observations. This will imply that any property of the ensemble
members that is preserved under linear combination is inherited by the empirical basis
functions and hence by elements of S.
Let X denote the set of functions (of full measure with respect to the averaging operation) for which reconstructions satisfying (3.21) are possible, and let $\theta$ be any function in S. We claim that there is a sequence $\{b_j\}_{j=1}^{\infty}$ of real numbers and a set of functions $u^j(x)\in X$ for $j = 1,\ldots,\infty$ such that
$$\theta(x) = \sum_{j=1}^{\infty}b_j u^j(x). \qquad (3.26)$$

It immediately follows from (3.26) that, if P is a closed linear property of a subset of functions in $L^2(\Omega)$ (i.e. all $u\in L^2(\Omega)$ with property P form a closed linear subspace) and all the ensemble members $u^k$ share that property, then the eigenfunctions of the POD also share the property. The converse holds too. Equation (3.26) and this remark characterize the "empirical subspace" S.
It remains to justify Equation (3.26). Let S' denote the set of all functions in $L^2(\Omega)$ with representations $\sum_i b_i u^i$ with $u^i\in X$. We can show that $S'^\perp = S^\perp$, from which it follows that $S' = S$, and so the equation indeed holds.
Now $S^\perp$ is exactly the set of functions $\theta$ such that $(\theta,\varphi_i) = 0$ for every $\varphi_i$ with eigenvalue $\lambda_i > 0$. From the first result of this section we have $u(x) = \sum_i b_i\varphi_i(x)$ where $\lambda_i > 0$, for a.e. (almost every) u. Thus we have $(\theta,u) = 0$ a.e. and so $(\theta,\sum_i b_i u^i) = 0$. This shows that $S^\perp\subset S'^\perp$.
To show $S'^\perp\subset S^\perp$ and hence conclude $S'^\perp = S^\perp$, assume that $(\theta,u) = \int_\Omega\theta(x')u^*(x')\,dx' = 0$ for a.e. u. Therefore for a.e. u we have $u(x)\int_\Omega u^*(x')\theta(x')\,dx' = 0$, and taking the average we get
$$\int_\Omega\big\langle u(x,t)u^*(x',t)\big\rangle\theta(x')\,dx' = 0,$$
which, from the eigenvalue Equation (3.13), implies that $(\theta,\varphi_i) = 0$ for every i such that $\lambda_i > 0$.
The classic example of a property which passes from the data ensemble to the empir-
ical basis is incompressibility. If the autocorrelation tensor $\mathbf{R} = \langle\mathbf{u}(\mathbf{x},t)\otimes\mathbf{u}^*(\mathbf{x}',t)\rangle$ is
formed from realizations of divergence-free vector fields u, then the empirical eigenfunc-
tions ϕ j (x) are also divergence-free. This is very useful when we project the Navier–Stokes
equations onto a subspace spanned by a collection of these eigenfunctions in the next


chapter. Other important properties inherited by the eigenfunctions include those of sat-
isfying linear boundary conditions, such as no-slip or no-penetration conditions on fixed
surfaces.
We now have a characterization of the span of the eigenfunctions with strictly positive
eigenvalues. This linear space S exactly coincides with that spanned by all realizations
u k (x) of the original ensemble a.e. with respect to the measure induced by the averaging
operation. (A special case of this result, when the average is a sum on a finite number of
points, as would be the case in a computer experiment, was observed in [20].) From this
we see that the set of empirical eigenfunctions $\{\varphi_j \mid \lambda_j > 0\}$ need not form a complete basis for $L^2(\Omega)$. While S may be infinite-dimensional, it is generally only a subset of the "big" space $L^2(\Omega)$ in which we are working. It is complete only if one includes the kernel of the operator R – all the (generalized) eigenfunctions with zero eigenvalues – but in doing so one loses the major advantage of the POD, for in many applications one can argue on physical grounds that the realizations u(x, t) do not and should not span $L^2(\Omega)$. (Many
strange things may happen in turbulence, but not everything.) In such cases the discussion
of this section highlights a strong property of the POD. Its use limits the space studied
to the smallest linear subspace that is sufficient to describe the observed phenomena. For
our models, the moral is that “you can only describe what you’ve seen already.” As we
shall see later in the book, this can lead to interesting paradoxes and problems as well as to
significant economies.
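The inheritance of linear properties described above is easy to check numerically; the following sketch (synthetic, illustrative data) uses homogeneous boundary values as the shared property:

```python
import numpy as np

# Every realization below vanishes at x = 0 and x = 1; POD modes with positive
# eigenvalue lie in the span of the realizations and therefore vanish there too.
rng = np.random.default_rng(4)
n_x, M = 80, 300
x = np.linspace(0.0, 1.0, n_x)
u = rng.normal(size=(M, 3)) @ np.array([np.sin(np.pi * k * x) for k in (1, 2, 5)])

K = (u.T @ u) / M
lam, phi = np.linalg.eigh(K)
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order]

positive = lam > 1e-12 * lam[0]                      # modes with strictly positive eigenvalue
print(np.allclose(phi[0, positive], 0.0, atol=1e-8),
      np.allclose(phi[-1, positive], 0.0, atol=1e-8))
```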

3.3.2 Optimality
Suppose we have a decomposition of a time-dependent, statistically stationary signal u(x, t) with respect to any orthonormal basis $\{\psi_j(x)\}_{j=1}^{\infty}$:
$$u(x,t) = \sum_j b_j(t)\psi_j(x). \qquad (3.27)$$

If the $\psi_j(x)$ are dimensionless, then the coefficients $b_j(t)$ carry the dimension of the quantity u. If u(x, t) is a velocity and $\langle\cdot\rangle$ is a time average, the average kinetic energy per unit mass over the experiment is given by
$$\Big\langle\frac{1}{2}\int_\Omega u(x,t)u^*(x,t)\,dx\Big\rangle = \frac{1}{2}\sum_{ij}\big\langle b_i(t)b_j^*(t)\big\rangle\int_\Omega\psi_i(x)\psi_j^*(x)\,dx = \frac{1}{2}\sum_i\big\langle b_i(t)b_i^*(t)\big\rangle,$$
and so the average kinetic energy in the ith mode is given by $\frac{1}{2}\langle b_i(t)b_i^*(t)\rangle$ (no summation implied).
implied).
We can now precisely state optimality for the POD. Suppose that we have a stationary random field u(x, t) in $L^2(\Omega)$ and that $\{\varphi_i,\lambda_i \mid i = 1,\ldots,\infty;\ \lambda_i\geq\lambda_{i+1} > 0\}$ is the set of orthonormal empirical eigenfunctions with their associated eigenvalues obtained from time averages of u(x, t). Let


$$u(x,t) = \sum_i a_i(t)\varphi_i(x) \qquad (3.28)$$
be the decomposition with respect to this basis and let $\{\psi_i(x)\}_{i=1}^{\infty}$ be any other arbitrary orthonormal set such that
$$u(x,t) = \sum_i b_i(t)\psi_i(x). \qquad (3.29)$$
Then the following hold:

1. $\langle a_i(t)a_j^*(t)\rangle = \delta_{ij}\lambda_i$; i.e. the POD random coefficients are uncorrelated.
2. For every n we have
$$\sum_{i=1}^{n}\big\langle a_i(t)a_i^*(t)\big\rangle = \sum_{i=1}^{n}\lambda_i \geq \sum_{i=1}^{n}\big\langle b_i(t)b_i^*(t)\big\rangle;$$
i.e. the POD is optimal on average in the class of representations by linear superposition: the first n POD basis functions capture more energy on average than the first n functions of any other basis.

The first assertion derives from the representation of $R(x,x')$, given in Equation (3.16):
$$R(x,x') = \big\langle u(x,t)u^*(x',t)\big\rangle = \Big\langle\sum_i a_i(t)\varphi_i(x)\sum_j a_j^*(t)\varphi_j^*(x')\Big\rangle = \sum_{ij}\big\langle a_i(t)a_j^*(t)\big\rangle\varphi_i(x)\varphi_j^*(x').$$
But we know that
$$R(x,x') = \sum_i\lambda_i\varphi_i(x)\varphi_i^*(x'),$$
and so, since the $\varphi_i^*(x)$ are an orthonormal family in $L^2(\Omega)$, we see that $\langle a_i(t)a_j^*(t)\rangle = \delta_{ij}\lambda_i$.
The second assertion relies on a result on linear operators. Let $\{\psi_j(x)\}_{j=1}^{n}$ be n arbitrary orthonormal vectors in $L^2(\Omega)$ that may be completed to form an orthonormal basis. Let Q denote projection onto $\mathrm{span}\{\psi_1,\ldots,\psi_n\}$. We can express the kernel R in terms of $\{\psi_j\}_{j=1}^{\infty}$ as
$$R(x,x') = \Big\langle\sum_i b_i(t)\psi_i(x)\sum_j b_j^*(t)\psi_j^*(x')\Big\rangle = \sum_{ij}\big\langle b_i b_j^*\big\rangle\psi_i\psi_j^*. \qquad (3.30)$$
We can then write R in operator matrix notation as
$$\begin{bmatrix}
\langle b_1b_1^*\rangle & \langle b_1b_2^*\rangle & \langle b_1b_3^*\rangle & \ldots\\
\langle b_2b_1^*\rangle & \langle b_2b_2^*\rangle & \ldots & \ldots\\
\langle b_3b_1^*\rangle & \ldots & \ldots & \ldots\\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$
and the product $R\circ Q$ yields
$$\begin{bmatrix}
\langle b_1b_1^*\rangle & \langle b_1b_2^*\rangle & \ldots & \langle b_1b_n^*\rangle & 0 & \ldots\\
\langle b_2b_1^*\rangle & \langle b_2b_2^*\rangle & \ldots & \langle b_2b_n^*\rangle & 0 & \ldots\\
\vdots & \vdots & & \vdots & \vdots & \\
\langle b_nb_1^*\rangle & \langle b_nb_2^*\rangle & \ldots & \langle b_nb_n^*\rangle & 0 & \ldots\\
0 & 0 & \ldots & 0 & 0 & \ldots\\
\vdots & \vdots & & \vdots & \vdots & \ddots
\end{bmatrix}.$$
The proof is now completed by appeal to Remark 1.3 in Section V.1.2 (p. 260) of Temam
[367] (cf. Riesz and Nagy [304]), which states that the sum of the first n eigenvalues of
a self-adjoint operator is greater than or equal to the sum of the diagonal terms in any
n-dimensional projection of it:


$$\sum_{i=1}^{n}\lambda_i \geq \mathrm{Tr}(R\circ Q) = \sum_{i=1}^{n}\big\langle b_i b_i^*\big\rangle. \qquad (3.31)$$
i=1 i=1

This characterization is the basis for the claim that the POD is optimal for modeling or
reconstructing a signal u(x, t). It implies that, among all linear decompositions, the POD
is the most efficient in the sense that for a given number of modes, n, the projection on the
subspace spanned by the leading n empirical eigenfunctions contains the greatest possible
kinetic energy on average. Moreover, the time series of the coefficients ai (t) are linearly
uncorrelated.
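A numerical illustration of this optimality statement, comparing the average energy captured by the leading POD modes with that captured by an arbitrary alternative orthonormal basis (here a Fourier-type basis; the data are synthetic and illustrative), might be:

```python
import numpy as np

# For each n, the first n POD modes capture at least as much average energy as
# the first n members of any other orthonormal basis (Section 3.3.2, assertion 2).
rng = np.random.default_rng(5)
n_x, M = 64, 2000
x = np.arange(n_x) / n_x
# Snapshots built from a few localized structures (deliberately not Fourier modes).
structures = np.array([np.exp(-((x - c) ** 2) / 0.005) for c in (0.25, 0.5, 0.75)])
u = rng.normal(size=(M, 3)) @ structures

K = (u.T @ u) / M
lam, phi = np.linalg.eigh(K)
lam, phi = lam[::-1], phi[:, ::-1]                  # descending order

# An alternative orthonormal basis: discrete cosine (Fourier-type) modes.
psi = np.array([np.cos(np.pi * k * (x + 0.5 / n_x)) for k in range(n_x)]).T
psi, _ = np.linalg.qr(psi)                          # enforce orthonormality

for n in (1, 2, 3):
    e_pod = lam[:n].sum()                           # sum of the first n empirical eigenvalues
    e_alt = np.sum((u @ psi[:, :n]) ** 2) / M       # sum_i <b_i^2> for the alternative basis
    print(n, bool(e_pod >= e_alt - 1e-12))
```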

3.3.3 Symmetry
We start by describing a particular kind of symmetry, called homogeneity in the turbu-
lence literature. We say that the averaged two point correlation $R(x,x')$ is homogeneous if $R(x,x') = R(x-x')$, i.e. R depends only on the difference of the two coordinates: it
is translation invariant. In general, homogeneity of a system is defined through multipoint
moments. Here we need only second order moments, but it is important to note that, while
the ensemble of realizations {u k } may be translation invariant on average, individual real-
izations typically are not. Homogeneity occurs in both spatially unbounded systems and
systems with periodic boundary conditions. In either case we may develop R in a Fourier
representation. In the case of a finite domain, we have the series
$$R(x-x') = \sum_k c_k\,e^{2\pi ik(x-x')}. \qquad (3.32)$$
We can then solve the eigenvalue problem (3.13) by substituting the (unique) representation
$$R(x,x') = \sum_k c_k\,e^{2\pi ikx}e^{-2\pi ikx'}, \qquad (3.33)$$
which implies that $\{e^{2\pi ikx}\}$ are exactly the eigenfunctions with eigenvalues $c_k$. Conversely,
if the eigenfunctions are Fourier modes we can write (3.33), which implies (3.32). How-
ever, while homogeneity completely determines the form of the empirical eigenfunctions,


the numerical values and ordering of the eigenvalues depend upon the Fourier spectrum of
the particular data set involved. In summary we can state:

• If $R(x,x') = R(x-x')$ is homogeneous, then the eigenfunctions of the operator $R = \int_\Omega R(x,x')\,(\cdot)\,dx'$ are Fourier modes, and vice versa.

This is especially useful in systems where the domain $\Omega$ is of higher dimension. For example, if $\Omega\subset\mathbb{R}^2$ and the $x_1$-direction is homogeneous, the problem of finding eigenfunctions in the two-dimensional domain is decoupled into a set of one-dimensional problems by writing
$$R(x_1,x_1',x_2,x_2') = R(x_1-x_1',x_2,x_2')$$
and performing the same procedure as above, yielding an eigenvalue problem for each Fourier wavenumber. In the decomposition of the boundary layer used in later chapters we appeal to homogeneity in the spanwise ($x_3$) and streamwise ($x_1$) directions, appropriate to a fully developed flow in, say, a channel or a pipe. Selecting the (finite) domain $[0,L_1]\times[0,L_3]$ in these variables, we may then use a mixed discrete Fourier-empirical decomposition of the form
$$u(\mathbf{x},t) = \sum_{k_1}\sum_{k_3}\sum_n a_{k_1,k_3,n}(t)\,e^{2\pi i\left(\frac{k_1x_1}{L_1}+\frac{k_3x_3}{L_3}\right)}\boldsymbol{\varphi}_n(k_1,k_3;x_2). \qquad (3.34)$$

The vector-valued eigenfunctions $\boldsymbol{\varphi}_n(k_1,k_3;x_2)$ in (3.34) are obtained by solving an operator equation analogous to (3.10) in which the kernel $R(x_1-x_1',x_2,x_2',x_3-x_3')$ is replaced by its Fourier transform in the $x_1$- and $x_3$-directions. More details are given in Chapter 10;
also see [155, 221, 244]. In the turbulent boundary layer, as in other problems with one or
more homogeneous directions, this decomposition produces structures that are not local-
ized in spanwise and streamwise extent, unlike the (instantaneous) events observed. We
return to this issue in Section 3.6, where we show how one can retrieve local structures
from the statistics in certain cases.
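A minimal numerical illustration of the correspondence between homogeneity and Fourier modes, assuming a periodic grid so that a translation-invariant kernel becomes a circulant matrix (the particular kernel below is illustrative), is:

```python
import numpy as np

# For a homogeneous correlation R(x, x') = R(x - x') on a periodic grid, the
# discrete operator is circulant and its eigenvectors are discrete Fourier modes.
n_x = 64
x = np.arange(n_x) / n_x
d = x[:, None] - x[None, :]
R = np.exp(-np.sin(np.pi * d) ** 2 / 0.05)      # a periodic, translation-invariant kernel

lam, phi = np.linalg.eigh(R)
lam, phi = lam[::-1], phi[:, ::-1]

# Each leading eigenvector is a (real combination of) e^{2 pi i k x} for a single
# wavenumber k: its DFT is supported on the pair {k, n_x - k} (or on {0}).
for j in range(4):
    spectrum = np.abs(np.fft.fft(phi[:, j]))
    support = np.nonzero(spectrum > 1e-8 * spectrum.max())[0]
    print(j, support)
```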
Homogeneity or translation invariance is only one of many types of symmetry that
physical systems may exhibit. It is an example of a continuous symmetry group, for we
may make translations by any distance. Discrete groups are also common: in the case of
a boundary layer over a surface treated with riblets – strakes parallel to the mean flow
direction – one would have spanwise symmetry only with respect to translations through
multiples of the riblet spacing. A jet with a lobed mixer would similarly exhibit symmetry
under discrete rotations. However, we stress that while a physical system or a model of it in
the form of a dynamical system may well admit such symmetries, we cannot expect either
individual observations (solutions) or even ensembles of them to share the full underlying
symmetry group. In mathematical terms, the system will generally not be ergodic. A sim-
ple example of this appears in Chapters 6 and 7, in a reflection symmetric two-dimensional
ODE possessing two stable fixed points (Figure 6.9). Any given solution can only approach
one of these, and so to reveal the full structure of the attractor, one must average over a set
of initial conditions chosen in light of the symmetry. More generally, if a system has several
distinct attractors, the time average of a single solution will reproduce just one of these and


so the empirical eigenfunctions generated by time averaging from one experimental run
will enjoy less symmetry than the problem as a whole.
To make a precise statement characterizing the relation between underlying symmetries
and subspaces spanned by the empirical eigenfunctions, we need the notions of equivariant
dynamical systems and invariant subspaces, which are discussed (in the finite dimensional
context) in Chapters 6 and 7. We therefore relegate it to the Appendix where it appears as
Proposition 3. Even to make the informal statement below, we need a few preliminaries,
which we give in the context of finite-dimensional ODEs. Let
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) \qquad (3.35)$$
be an n-dimensional system and $\Gamma$ be a symmetry group acting on the phase space $\mathbb{R}^n$: the elements $\gamma$ of $\Gamma$ being $n\times n$ matrices. To say that (3.35) is equivariant under $\Gamma$ means that the equation
$$\gamma\mathbf{f}(\mathbf{x}) = \mathbf{f}(\gamma\mathbf{x}) \qquad (3.36)$$
holds for every $\gamma\in\Gamma$. As described in more detail in Chapter 7, this implies that if $\mathbf{x}(t)$ is a solution of (3.35), then so is $\gamma\mathbf{x}(t)$ (think of the two fixed points of the reflection symmetric system mentioned above). Equivariance also typically implies that eigenvalues come in multiples, for if A denotes the linearization of f at a point fixed by $\gamma$, then (3.36) gives $A\gamma = \gamma A$, so that
$$\gamma A\mathbf{v} = \gamma\lambda\mathbf{v} = \lambda\gamma\mathbf{v}$$
and also
$$A\gamma\mathbf{v} = \gamma A\mathbf{v} = \lambda\gamma\mathbf{v},$$
implying that if $\mathbf{v}$ is an eigenvector of A with eigenvalue $\lambda$, then so is $\gamma\mathbf{v}$. The eigenspaces
of the operator R are similarly constrained by symmetries, so that one typically expects
several distinct eigenfunctions for a given eigenvalue, corresponding to the same structure
differently oriented or located in physical space.
After this preamble, we can now give the gist of the result:

• If $\varphi_j$ and $\lambda_j$ are the empirical eigenfunctions and eigenvalues generated from a set of solutions (= experiments) $\{u^k\}$ of a dynamical system equivariant under a group $\Gamma$, then a necessary condition for the flow generating $\{u^k\}$ to be ergodic is that each of the finite-dimensional eigenspaces corresponding to a given empirical eigenvalue be invariant under $\Gamma$.

The way one might check this condition experimentally would be to:

1. Perform the experiment to measure $R(x,x')$.
2. Decompose $R(x,x')$ using the POD: $R(x,x') = \sum_i\lambda_i\varphi_i(x)\varphi_i^*(x')$.
3. Check that for every eigenfunction $\varphi_j\in N_{\lambda_m} = \mathrm{span}\{\varphi_j \mid \lambda_j = \lambda_m\}$ and every $\gamma\in\Gamma$, $\gamma(\varphi_j)\in N_{\lambda_m}$.
Aubry et al. [23] computed POD bases from numerical integrations of the Kuramoto–Sivashinsky equation (see Chapter 8) and concluded that for certain values of the bifurca-
tion parameter the system is not ergodic. On the other hand, if one assumes that a system


is ergodic one may use its symmetries to increase the size of the ensemble, generating
additional data sets {γ u k } from a set of observations {u k } which represent only a limited
region of the full attracting set. This approach has been advocated by Sirovich in [330]
and adopted in many studies. However, one should be cautious, as there are examples for
which the partition into ergodic components is finer than the partition into symmetric com-
ponents. In this case the image of the basis obtained by one experiment under the symmetry
group will not produce the basis obtained by the ensemble average measure. See [48] for
further discussion of this point.
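A sketch of this symmetrization of the ensemble for a simple reflection (synthetic, illustrative data; the construction is only appropriate if ergodicity is indeed assumed) might read:

```python
import numpy as np

# Enlarge the ensemble with reflected copies gamma u^k before computing the POD,
# as discussed above. The resulting correlation commutes with the reflection, so
# away from degeneracies each mode is symmetric or antisymmetric about x = 1/2.
rng = np.random.default_rng(6)
n_x, M = 64, 200
x = np.linspace(0.0, 1.0, n_x)
# Realizations biased toward one side of the domain (one "ergodic component").
structures = np.array([np.exp(-((x - 0.40) ** 2) / 0.02),
                       np.exp(-((x - 0.45) ** 2) / 0.02)])
u = rng.normal(size=(M, 2)) @ structures

u_sym = np.vstack([u, u[:, ::-1]])               # append the reflected snapshots
K = (u_sym.T @ u_sym) / u_sym.shape[0]
lam, phi = np.linalg.eigh(K)
lam, phi = lam[::-1], phi[:, ::-1]

print(np.allclose(np.abs(phi[:, 0]), np.abs(phi[::-1, 0]), atol=1e-8))
```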
The ergodicity assumption is questionable, particularly in cases of “small” systems or
special geometries. For example in a square Rayleigh–Bénard cell there is a possibility
that the rotation direction of the single roll selected at the time of onset may never change
throughout the life of the experiment. This indicates that there are at least two distinct
and disjoint parts for the support of the invariant measure, each associated with a rotation
direction, much as in Figure 6.9; hence it is not ergodic. Similar phenomena appear to
occur in the minimal flow unit of the channel flow simulations by Jiménez and Moin [175].

3.3.4 Attractors
If the observations {u k (x)} from which the POD is generated come from a solution (or
solutions) u(x, t) of a dynamical system, then the empirical eigenfunctions and eigen-
values contain information on the attractor(s) of that system. We have already discussed
symmetries. In this subsection we develop this observation in several other ways. We first
give a probabilistic-geometric interpretation of the location of the dynamics in phase space
using Chebyshev’s inequality (Feller [108]):
Chebyshev’s inequality Let x be a vector-valued random variable, with mean x and
variance σ 2 = var (x) = |x − x|2 . Then for any  > 0
σ2
P{|x − x| ≥ } ≤ ,
2
where P denotes probability.
Chebyshev’s inequality expresses the notion that the variance says something about the
frequency of departures from the mean. In our case we define
⎧ ⎫
⎨ ∞ ⎬
Sn () = u ∈ L 2 ()| |(u, ϕm )|2 <  ,
⎩ ⎭
m=n+1

and
Wn () = L 2 () \ Sn ().
Here Sn () is a slab of thickness 2 around the finite-dimensional space span{ϕ1 , . . . , ϕn }
and Wn () is the rest of phase space, outside this slab. Note that Sn () is infinite-
dimensional. We use Chebyshev’s inequality to estimate the fraction of the time spent by
solutions u(x, t) in Sn (). Denote by xn the vector-valued random variable
xn (t) = {(u, ϕm )}∞
m=n+1

Downloaded from https://www.cambridge.org/core. Imperial College London Library, on 09 Sep 2020 at 06:32:13, subject to the Cambridge Core terms of use,
available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511919701.005
84 Proper orthogonal decomposition

representing the infinite "tail" of the process. Then we have $\langle\mathbf{x}_n\rangle = 0$ and $\sigma^2(\mathbf{x}_n) = \sum_{m=n+1}^{\infty}\lambda_m$, and therefore, by Chebyshev's inequality,
$$P\{u\in W_n(\epsilon)\} = P\{|\mathbf{x}_n|\geq\epsilon\} \leq \frac{\sum_{m=n+1}^{\infty}\lambda_m}{\epsilon^2}. \qquad (3.37)$$
To obtain a useful result, one lets $\epsilon\to 0$ as $n\to\infty$ via a sequence $\epsilon_n\to 0$ satisfying
$$\frac{\sum_{m=n+1}^{\infty}\lambda_m}{\epsilon_n^2}\to 0; \qquad (3.38)$$
in other words, the $\epsilon_n$s are chosen such that their squares go to zero more slowly than the decay of the norm in the residual modes. This gives a series of slabs whose thicknesses go to zero while the probability of solutions being in those slabs goes to one.
The problem now becomes that of estimating the rate of decay of the residual eigenvalues $\sum_{m=n+1}^{\infty}\lambda_m$ in the tail. The analytical evidence outlined below suggests that, when the POD basis is used for turbulent flows, this residual decays at least exponentially fast asymptotically. This enables us to take a series $\epsilon_n^2\to 0$ with a slightly smaller exponent. The result is a sequence of slabs with thicknesses going exponentially to zero, while the probability of solutions being in a slab goes exponentially to one. This creates a picture in which an attractor is very thin, although possibly infinite-dimensional. It is reasonable to postulate, even in this case, that the essential dynamics are controlled by a finite number of modes. There is even a technical refinement to this, given as Proposition 4 in the Appendix; also see [112].
In the dynamical systems literature a lot of effort has gone into developing methods
for estimation of dimension of attractors (e.g. [84, 320, 367]). The underlying idea is that,
if the attractor of an infinite-dimensional evolution equation is finite-dimensional, then it
should be possible to extract a finite-dimensional model in the form of a set of ODEs of
comparable dimension. It is natural to try to relate the POD to such ideas, and even to
define a dimension using empirical eigenvalues. Perhaps the most obvious thing to do is to
define a (Karhunen–Loève) dimension as the number of non-zero eigenvalues in the POD,
as suggested by Aubry et al. [20]. However, the number of non-zero eigenvalues is the
dimension of the smallest linear subspace containing the dynamics, and it consequently
provides only a very crude upper bound for the (Hausdorff) dimension of the attractor.
Even if the attractor dimension is finite the linear subspace may not be finite-dimensional.
It is easy to construct an example having a limit cycle (of dimension 1) which “twists”
around in infinitely many directions in the function space and so generates data having an
infinite number of non-zero eigenvalues. Indeed, as Sirovich realised in [331], this naive
definition is not very useful. He suggested the following working definition: “ . . . introduce
dim K L . . ., the number of actual eigenfunctions required so that the captured energy is at
least 90% of the total . . ., and that no neglected mode, on the average, contains more than
1% of the energy contained in the principal eigenfunction mode.”
Note that the above comments are consistent with the celebrated embedding results of
Takens [365] and with the observation that the set of projections of a compact set of Haus-
dorff dimension k on to a (2k + 1)-dimensional subspace is a residual set [234]. In spite
of the finite (and possibly low) dimension of the attractor and the set containing it, the

Downloaded from https://www.cambridge.org/core. Imperial College London Library, on 09 Sep 2020 at 06:32:13, subject to the Cambridge Core terms of use,
available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/CBO9780511919701.005
3.3 Properties of the POD 85

POD will generally have positive eigenvalues with eigenfunctions in the complementary
subspace, unless the set is “flat” and entirely contained in the (2k + 1)-dimensional linear
subspace. As suggested by Equation (3.37) and the example mentioned above, we can only
expect this to hold probabilistically and in an asymptotic sense.
In connection with Takens’s theorem, which is generally applied to the study of phase
space reconstructions effected by delay maps, we note that delay maps are highly nonlinear
(as maps from “true” phase space into the embedding space). The POD spectrum is not
invariant under nonlinear coordinate changes and one cannot expect the dimension in the
delay map to be related to the number of non-zero POD eigenvalues in any simple way.
Berkooz [37] has suggested an approach to this problem via a “conditional POD,” but a
discussion of this would take us too far afield.
The remark following Equations (3.37) and (3.38) depends upon exponential decay of
empirical eigenvalues in the infinite tail. Is this a reasonable expectation? In the case of
turbulent flows the answer appears to be yes both on physical (Tennekes and Lumley [368])
and on mathematical grounds (Promislow [289], Foias et al. [112]).
From a physical viewpoint, exponential decay of the spectrum holds only in the far dissi-
pative range, in which length scales are smaller than the Kolmogorov microscale (Equation
(2.4)). This should not be confused with the power-law decay of the intermediate inertial
range. If asymptotic decay were substantially slower than exponential, high order spatial
derivatives of the velocity field u(x, t) might not exist. Most fluid mechanicians believe
this to be unreasonable in a description of continuum matter.
The relevant mathematical concept is regularity of solutions, which describes the rate of
decay of the tail of the spectrum of instantaneous solutions of a PDE in wavenumber space.
To get a feel for this, consider the linear heat equation with a spatio-temporal “forcing
function”:
u t = u x x + f (x, t) ; u(x, 0) = u 0 (x).

Solution of this equation effectively requires one to integrate twice in space (and once in
time), thereby smoothing the function f (x, t), and the initial data u 0 (x). The effect is,
of course, more marked the higher the spatial wavenumber. See Section 4.2 for explicit
examples. In the Navier–Stokes equation the dissipative term $\nu\Delta\mathbf{u}$ plays a similar rôle to $u_{xx}$, its major influence also being in the far dissipative range.
In the simplest fluid mechanical situation, where the domain is a rectangular box with
periodic boundary conditions and the solutions may be expressed in a (triple) Fourier series
with time-dependent coefficients, Gevrey regularity of class s is the statement that the Fourier coefficients $a_\mathbf{k}$ decay at a rate such that the sum
$$\sum_{\mathbf{k}} a_\mathbf{k}\,e^{\tau|\mathbf{k}|^{2s}}e^{i\mathbf{k}\cdot\mathbf{x}}$$
converges for all $\tau$. In this case the energy spectrum also decays exponentially fast at suf-
converges for all τ . In this case the energy spectrum also decays exponentially fast at suf-
ficiently high wavenumbers. In the context of turbulent flows this applies to the dissipation
region of the spectrum. For an arbitrary (finite) domain one defines regularity in terms of
the decay of the modal coefficients associated with the eigenfunctions of the Stokes opera-
tor in that domain. The asymptotics of the eigenvalues of the Stokes operator are the same


for all “reasonable” domains [83], hence asymptotic results obtained from analysis of the
periodic case remain valid. Rigorous regularity results for the Navier–Stokes equations in
two-dimensional domains are given in [116]. In three dimensions there are no complete
results, but if one assumes that vorticity is bounded above uniformly throughout the flow
domain and in time, one can bypass the blow-up problem for solutions and obtain Gevrey
regularity in this case too, possibly after solutions have evolved for some time [116].
Equipped with regularity of individual solutions, one still needs to show that the appro-
priate averages used in the POD are also uniformly exponentially bounded. This can be
done [37] and the end result is that Gevrey regularity of solutions of the governing evolution
equations implies exponential decay of the empirical eigenvalues. However, as remarked
in [112], such exponential decay results apply only to the far dissipative range of turbu-
lence and so are not directly relevant to the low-dimensional models of interest to us, in
which we truncate far below that range. For more details see Berkooz [37].
For the sake of completeness we mention an interesting result due to Sirovich and
Knight [336] which also concerns the structure of the POD at small spatial scales (high
wavenumbers). Foias et al. [112] conclude from the results of [336] that, under certain
conditions, the asymptotic form of the empirical eigenfunctions is that of Fourier modes.
However, the results of Section 3.3.1 on the span of the eigenfunctions show that this can-
not be true in full generality. Consider an ensemble of realizations that all vanish identically
(or fall below experimental error) on a specific part of the domain. By Proposition 4 of the
Appendix, the eigenfunctions themselves also have to be zero on that part of the domain
and so cannot be asymptotically close to Fourier modes in that region.

3.4 Further results


In this section we briefly describe some recent extensions of, and other developments
related to, the POD. We first discuss an idea which can reduce computational effort and
then briefly remark on the “robustness” of the POD with respect to changes in conditions
under which the data ensemble is generated and on the relation between the POD and
probability density functions.

3.4.1 Method of snapshots


The method of snapshots was suggested by Sirovich in [330]. It is a numerical procedure
that can save time in solving the eigenvalue problem (3.10) necessary for the POD. The idea
is as follows: suppose one performs a numerical simulation on a large number of gridpoints
N , that the number of ensemble members deemed adequate for a description of the process
is M, and N  M. (The fundamental question of determining M is not addressed.) In gen-
eral, since the data functions u k and eigenfunctions ϕ j have been replaced by N -vectors,
the eigenfunction computation would become an N × N eigenvalue problem, as described
in Section 3.1.1. Sirovich observed that this can be reduced to an M × M problem as fol-
lows: suppose that {u^i}_{i=1}^{M} are the realizations of the field and that the inner product on the N-dimensional vector space of realizations is denoted by (·, ·); this is the discretized
version of the inner product in L^2(Ω). From the result on the span of the eigenfunctions in Section 3.3.1 we know that if ϕ is an eigenvector then

\varphi = \sum_{k=1}^{M} a_k u^k, \qquad (3.39)

where the coefficients ak remain to be determined. Following (3.12), the N -dimensional


eigenfunction problem may then be written

\left[ \frac{1}{M} \sum_{i=1}^{M} u^i (u^i)^T \right] \sum_{k=1}^{M} a_k u^k = \lambda \sum_{k=1}^{M} a_k u^k. \qquad (3.40)

The left-hand side may be rearranged to give

\sum_{i=1}^{M} \sum_{k=1}^{M} \left[ \frac{1}{M} (u^k, u^i) a_k \right] u^i,

and we conclude that a sufficient condition for the solution of (3.10) will be to find coefficients a_k such that

\frac{1}{M} \sum_{k=1}^{M} (u^k, u^i) a_k = \lambda a_i; \qquad i = 1, \ldots, M. \qquad (3.41)

This is now (one row of) an M × M eigenvalue problem. Note that in order for (3.41) to be a necessary condition, one needs to assume that the observations {u^i}_{i=1}^M are linearly independent.
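To make the procedure concrete, the following is a minimal numerical sketch of the method of snapshots, assuming Python with NumPy; the function name pod_snapshots and the use of the plain Euclidean inner product as a stand-in for the discretized L^2 inner product are our choices, not part of any standard package.

import numpy as np

def pod_snapshots(X):
    # X is an N x M matrix whose columns are the snapshots u^1, ..., u^M.
    N, M = X.shape
    # M x M matrix C[i, k] = (1/M)(u^k, u^i), cf. (3.41).
    C = (X.T @ X) / M
    lam, A = np.linalg.eigh(C)            # eigenvalues returned in ascending order
    lam, A = lam[::-1], A[:, ::-1]        # reorder so the largest come first
    Phi = X @ A                           # modes as combinations of snapshots, eq. (3.39)
    Phi /= np.linalg.norm(Phi, axis=0)    # normalize (assumes the retained eigenvalues are nonzero)
    return Phi, lam

For linearly independent snapshots this yields the same leading modes and eigenvalues as the N × N problem (3.10), at the cost of solving only an M × M eigenvalue problem.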

3.4.2 Relationship to singular value decomposition


In the finite-dimensional case, the POD reduces to a singular value decomposition of the given dataset. To see this connection, we first stack the snapshots u^k into an N × M data matrix

X = \left[\, u^1 \;\; \cdots \;\; u^M \,\right], \qquad (3.42)
after which the N × N eigenvalue problem (3.10) may be written

\frac{1}{M} X X^T \varphi = \lambda \varphi, \qquad (3.43)

and the expansion (3.39) becomes ϕ = Xa, where a = (a_1, . . . , a_M). The eigenvalue problem (3.41) for the method of snapshots then reduces to an M-dimensional eigenvalue problem:

\frac{1}{M} X^T X a = \lambda a. \qquad (3.44)
Recall that the singular value decomposition (SVD) of a real N × M matrix is given by

X = U \Sigma V^T = \sum_{j=1}^{r} \sigma_j \varphi_j v_j^T, \qquad (3.45)
where U = [ϕ_1 · · · ϕ_N] and V = [v_1 · · · v_M] are orthogonal matrices (U^T U = I_{N×N} and V^T V = I_{M×M}), r is the rank of X, and the N × M matrix Σ is of the form

\Sigma = \begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix}, \quad N \ge M, \qquad \Sigma = \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix}, \quad N \le M, \qquad (3.46)

where Σ_1 is a diagonal matrix of real, non-negative singular values σ_j, arranged in
descending order. A straightforward calculation reveals that if (3.45) is the singular value
decomposition of the data matrix X, then the POD modes are the columns of U, and the
eigenvalues of (3.43) are λ_i = σ_i^2/M:

\frac{1}{M} X X^T \varphi_i = \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} \sigma_j \sigma_k \varphi_j v_j^T v_k \varphi_k^T \varphi_i = \frac{1}{M} \sigma_i^2 \varphi_i, \qquad (3.47)

since {ϕ_j} ⊂ R^N and {v_j} ⊂ R^M are orthonormal sets. Thus, the columns ϕ_i of U (the left singular vectors of X) are the eigenfunctions in (3.43) (the POD modes), with empirical eigenvalues σ_i^2/M.
Also note that the right singular vectors v_j are related to the temporal coefficients a_j in (3.17), and property (3.18) that these are uncorrelated is reflected in the orthogonality of the v_j.
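As a short illustration of this correspondence, the following sketch (assuming NumPy; the array sizes and random data are purely illustrative) checks that the empirical eigenvalues obtained from (3.43) agree with σ_i^2/M computed from the SVD:

import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 20                            # gridpoints and snapshots (illustrative sizes)
X = rng.standard_normal((N, M))           # stand-in data matrix [u^1 ... u^M]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
lam_svd = s**2 / M                                  # empirical eigenvalues from the SVD
lam_dir = np.linalg.eigvalsh(X @ X.T / M)[::-1]     # direct eigenvalues of (3.43), descending

assert np.allclose(lam_svd, lam_dir[:M])            # the nonzero spectra coincide

The columns of U are the POD modes, and the right singular vectors (the rows of Vt) carry, up to scaling, the uncorrelated temporal coefficients mentioned above.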

3.4.3 On inner products for compressible flows


For incompressible flows, typically we are interested in velocity fields u = (u, v, w), and
the standard inner product on L 2 is a natural choice, since its induced norm corresponds
to kinetic energy, as discussed in Section 1.4. For compressible flows, however, the sit-
uation is more complicated. In this case, the vector of quantities of interest contains not
only kinematic variables (velocities), but also thermodynamic variables: density, pressure,
enthalpy, etc. Here, the standard L^2 inner product may not even make sense. For instance, if we use the flow variables q = (ρ, u, v, p), defined on a spatial domain Ω, the L^2 inner product is

\langle q_1, q_2 \rangle = \int_\Omega (\rho_1 \rho_2 + u_1 u_2 + v_1 v_2 + p_1 p_2)\, dx, \qquad (3.48)

which is not dimensionally consistent, since one should not add a velocity and a pressure.
The simplest approach is perhaps to consider the kinematic variables and thermody-
namic variables separately, obtaining separate sets of POD modes. However, this approach
yields many more POD modes, and Galerkin models formed using this approach have been
shown not to perform as well as when a single set of vector-valued modes is used [316].
One approach, as taken in [228], is normalization: if the quantities (ρ, u, v, p) are scaled
by nominal values so that each component is non-dimensional, then combining them as
in (3.48) does make sense. This is perhaps the most versatile approach, because one can
adopt different scalings for different problems. However, there can be too much freedom
here, for it may not be clear which scaling is appropriate.
Another approach is to seek an inner product whose induced norm is also a form of
energy, consistent with the notion of the norm in incompressible flows. For instance, the
stagnation enthalpy of the flow is given by
h_0 = h + \frac{1}{2}(u^2 + v^2 + w^2), \qquad (3.49)
where h is the static enthalpy, and (u, v, w) are velocities. The stagnation energy is defined
analogously, with enthalpy h replaced by the internal energy per unit mass, given for an
ideal gas by E = h/γ , where γ is the ratio of specific heats. Seeking an energy-based
inner product, in [316] an induced norm of the form

\frac{1}{2} \| q \|_\alpha^2 = \int_\Omega \left[ \alpha h + \frac{1}{2}(u^2 + v^2 + w^2) \right] dx \qquad (3.50)
was proposed, where q is the vector of flow variables, and α > 0 is a constant. The inte-
grand is not quadratic, however, because h appears linearly, so it may not be clear how to
write a corresponding inner product. This situation was remedied in [316] by transforming
to the flow variables q = (u, v, w, a), where a is the local speed of sound, which for an
ideal gas satisfies a 2 = (γ − 1)h. A corresponding family of inner products can then be
defined as

\langle q_1, q_2 \rangle_\alpha = \int_\Omega \left( u_1 u_2 + v_1 v_2 + w_1 w_2 + \frac{2\alpha}{\gamma - 1} a_1 a_2 \right) dx, \qquad (3.51)
and parameterized by α, where α = 1 corresponds to using the integral of stagnation
enthalpy as the norm, and taking α = 1/γ corresponds to using stagnation energy. In the
case of a nearly isentropic flow, the choice of flow variables q = (u, v, w, a) also leads to
a particularly simple (quadratic) form of the compressible Navier–Stokes equations, given
in [314] by

u_t + u \cdot \nabla u + \frac{2}{\gamma - 1}\, a \nabla a = \nu \Delta u,

a_t + u \cdot \nabla a + \frac{\gamma - 1}{2}\, a \nabla \cdot u = 0.
Note, however, that neither the integral of the stagnation enthalpy nor the stagnation energy is actually a conserved quantity. The conserved quantity is the total energy, given by

\int_\Omega \left[ \rho E + \frac{1}{2} \rho (u^2 + v^2 + w^2) \right] dx. \qquad (3.52)
Although this norm would perhaps be the most natural from an energetic viewpoint, it is
not obvious how to choose physically meaningful flow variables for which the equations of
motion are reasonably simple. However, it is shown in [314] that in the case of an isentropic
flow, choosing α = 1/(γ − 1) in (3.51) yields an induced norm that is indeed a conserved
quantity.
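As a concrete illustration, the following sketch (assuming NumPy; the function name, the uniform-grid quadrature, and the default γ are our choices, and the 2α/(γ − 1) weighting follows the reconstruction of (3.51) above) evaluates the parameterized inner product for discretized fields q = (u, v, w, a):

import numpy as np

def inner_product_alpha(q1, q2, dx, alpha, gamma=1.4):
    # q1, q2 are arrays of shape (4, npoints) holding (u, v, w, a) on a uniform grid.
    u1, v1, w1, a1 = q1
    u2, v2, w2, a2 = q2
    integrand = u1*u2 + v1*v2 + w1*w2 + (2.0*alpha/(gamma - 1.0)) * a1*a2
    return np.sum(integrand) * dx    # simple quadrature approximating the integral in (3.51)

Taking alpha = 1 recovers the stagnation-enthalpy-based norm and alpha = 1/gamma the stagnation-energy-based norm discussed above.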

3.4.4 On using an empirical basis over a parameter range


As we have remarked several times, and made clear in the results of Section 3.3.1, the abil-
ity of the POD to represent data is entirely dependent on the ensemble of observations, be
they the result of physical experiments or numerical simulations, that goes into the aver-
aging process. A natural question, then, is the “robustness” of the POD under changes in
experimental conditions or parameter settings. Suppose, for example, that empirical eigen-
functions are computed for a channel flow at a Reynolds number (based on centerline
velocity) of 6000 and one wishes to use them to represent flows in the same channel at
Re = 8000. How well can they be expected to perform? There is no firm evidence here,
although some indications are encouraging.
Rodriguez and Sirovich [308, 339], working with numerical simulations of the
one-space-dimensional Ginzburg–Landau equation in a (temporally) chaotic regime,
constructed empirical eigenfunctions and used them to produce low-dimensional pro-
jections. They then produced bifurcation diagrams (see Chapter 6) by varying control
parameters in the resulting low-dimensional ODEs and compared the bifurcation values
with those revealed by the “full” simulations of the original PDE, finding agreement
within a few percent over a wide range of parameter variation. Another model problem,
the Kuramoto–Sivashinsky equation (cf. Chapter 8) was considered in [344]. In neither of
these cases, however, were solutions particularly rich spatially, and the empirical eigen-
functions are rather similar to Fourier modes. Perhaps more remarkably, in experiments
on channel flows, Adrian and his colleagues [214] found evidence of simple scaling of
eigenfunctions with Reynolds number, much as they found earlier in numerical simula-
tions of a randomly forced one-dimensional Burgers equation [72]. In Section 12.4 we
discuss some other instances involving direct numerical simulations of flows in complex
geometries.
Recently Noack and colleagues have proposed the use of parameterized empirical bases {φ_j^κ(x)} to encompass wider operating ranges: an issue of particular importance in control
applications, in which actuators can change flow patterns around bodies. Examples appear
in [207, 248, 249, 329], and in a recent book [267]. Also see [177].
This problem can be avoided if one chooses scaling appropriate to the physical phe-
nomenon under study. For example, as we shall see in the boundary layer models of
Chapter 10, if scaling based on wall variables is employed, the models are unaffected by
changes in Reynolds number.
We close this section by mentioning a connection between the POD and the probability
density function (PDF) in phase (functional) space. The invariant measure associated with
solutions of the Navier–Stokes equations in functional space is an object of great interest;
if one could obtain it explicitly one would have “a solution to turbulence,” since all mul-
tipoint (single time) statistics would then be available. From this point of view the POD
can be seen as the linear change of basis that turns the phase space coordinates into uncor-
related (although probably dependent) random variables. As shown by Hopf’s theory of
turbulence, the characteristic functional of the PDF in functional space may be obtained
by multipoint correlations [164, 353]. This leads us to propose a very simple model for the
PDF in functional space. Using the representation

u(x, t) = \sum_{k} a_k(t) \varphi_k(x), \qquad (3.53)
we assume that the a_k are independent and normally distributed with variance λ_k, a_k ∼ N(0, λ_k), where N(μ, σ^2) denotes the Gaussian distribution with mean μ and variance σ^2.
While this is consistent with the picture the POD gives of the flow in that the coefficients
are uncorrelated and the spectrum is correct, it clearly implies a strong assumption on the
modal dynamics. Nonetheless, in the next section we see that this model is closely related
to another statistical approach to coherent structures in turbulence.
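A minimal sketch of this simple model (assuming NumPy; the function name and array shapes are ours) draws independent Gaussian modal coefficients with the empirical variances and superposes the POD modes:

import numpy as np

def sample_gaussian_pod_model(Phi, lam, n_samples, seed=0):
    # Phi: N x K array whose columns are POD modes; lam: the K empirical eigenvalues.
    rng = np.random.default_rng(seed)
    K = len(lam)
    A = rng.standard_normal((n_samples, K)) * np.sqrt(lam)   # a_k ~ N(0, lambda_k), independent
    return A @ Phi.T                                         # synthetic fields, one per row

By construction the sampled coefficients are uncorrelated and reproduce the empirical spectrum, but, as noted above, the independence assumption is a strong constraint on the modal dynamics.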

3.5 Stochastic estimation


In this section we comment on the connection between the POD and linear stochastic
estimation, as applied by Adrian and co-workers in [1–3,243]. This picks up on the relation
between the POD and probability density functions already mentioned in Section 3.4.4.
We remark that the formulation of the POD and the results developed earlier apply to
space, time, or mixed space-time analyses, all depending on the choice of the averaging
operator (or equivalently, the measure), as long as the assumptions of Proposition 1 in the
Appendix are satisfied. In this regard, an appropriate choice of averaging (i.e. a measure
concentrated on a finite number of points, as would be encountered in a computer sim-
ulation) will produce the “bi-orthogonal” decomposition of Aubry et al. [20]. (Note that
this notion of bi-orthogonality, in both space and time, is distinct from the notion of a
bi-orthogonal set, considered in Chapter 5.)
Stochastic estimation is a method for predicting the conditional probability density func-
tion (CPDF) of a field u(x), given observations u(x′) at other points, or possibly a vector
of events at several points. There are many reasons to seek a CPDF, including its use in
closure models [284] and in order to “produce” coherent structures. We shall outline the
method of linear stochastic estimation since it is simple and enlightening, but for the sake
of simplicity, we limit ourselves to scalar fields and single point vector events and we
consider only estimates linear in u(x′), instead of the full CPDF.
Given u(x′), we seek an estimate for u(x) of the form A(x, x′)u(x′) by requiring the function A(x, x′) to minimize the expression

\left\langle |u(x) - A(x, x')u(x')|^2 \right\rangle, \qquad (3.54)
where ⟨·⟩ denotes an average over an ensemble of realizations, as in the rest of this chapter. As in the derivation in Section 3.1, a necessary condition will be that for any V(x, x′)

\frac{d}{d\delta} \left\langle |u(x) - [A(x, x') + \delta V(x, x')]\, u(x')|^2 \right\rangle \Big|_{\delta = 0} = 0. \qquad (3.55)
But the expression inside the averaging brackets is equal to:

\{u(x) - [A(x, x') + \delta V(x, x')]\, u(x')\} \cdot \{u^*(x) - [A^*(x, x') + \delta V^*(x, x')]\, u^*(x')\},

so that, after differentiating with respect to δ, evaluating at δ = 0 and equating to zero we get:

2\,\mathrm{Re}\big[ V^*(x, x') \langle u(x) u^*(x') \rangle \big] = 2\,\mathrm{Re}\big[ V^*(x, x') A(x, x') \langle u(x') u^*(x') \rangle \big]. \qquad (3.56)
Since V(x, x′) is an arbitrary variation this implies that

\langle u(x) u^*(x') \rangle = A(x, x') \langle u(x') u^*(x') \rangle \qquad (3.57)

and we therefore take

A(x, x') = \frac{\langle u(x) u^*(x') \rangle}{\langle u(x') u^*(x') \rangle}. \qquad (3.58)
The fluctuations typically present in a turbulent system make the assumption that ⟨u(x′)u^*(x′)⟩ is invertible at every point x′ a reasonable one, so that (3.58) will be well defined. We have found that, to provide the “best” linear estimate, A(x, x′) should be precisely the averaged two point correlation function R(x, x′), suitably normalized. Results of Adrian [1] show that the corrections to the CPDF due to higher order nonlinear terms in u(x′) are small, at least for homogeneous turbulence.
We can now introduce the representation (3.16) in terms of the POD to rewrite (3.58) as:

A(x, x') = \frac{\sum_{i=1}^{\infty} \lambda_i \varphi_i(x) \varphi_i^*(x')}{\sum_{j=1}^{\infty} \lambda_j |\varphi_j(x')|^2} \;\stackrel{\mathrm{def}}{=}\; \sum_{i=1}^{\infty} \varphi_i(x) f_i(x'), \qquad (3.59)

where f_i(x') = \lambda_i \varphi_i^*(x') / \sum_{j=1}^{\infty} \lambda_j |\varphi_j(x')|^2. We may therefore interpret f_i(x′) as the relative contribution of the mode ϕ_i to u(x′) on the average. We conclude that linear stochastic estimation is equivalent to assuming that the estimated value of the POD coefficient of the ith mode, given the velocity at x′, is the average contribution of the ith mode to the velocity at x′ multiplied by the given velocity.
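In discretized form the estimate (3.58)–(3.59) is easy to assemble from a snapshot ensemble; the following is a minimal sketch (assuming NumPy; the function name and the single-point scalar setting are ours):

import numpy as np

def lse_estimate(X, i_obs, u_obs):
    # X: N x M array of snapshots of a scalar field; i_obs: index of the observation
    # point x'; u_obs: the observed value u(x').  Returns the estimated field A(x, x') u(x').
    M = X.shape[1]
    R_col = X @ X[i_obs, :].conj() / M      # <u(x) u*(x')> for all x, estimated from the ensemble
    A = R_col / R_col[i_obs].real           # eq. (3.58): normalize by <|u(x')|^2>
    return A * u_obs

Expanding the snapshots in POD modes shows that the returned field is exactly the combination of Equation (3.59) applied to the observed value.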
We now show that we recover exactly the same result from the simplified PDF model based on the POD introduced in Section 3.4, Equation (3.53). There we assumed that a_i ∼ N(0, λ_i) and that these normally distributed coefficients a_i were independent. Let us compute the estimator ⟨u(x)|u(x′)⟩. Since we have an expression for the PDF we can compute this explicitly. Recall from the formula for the conditional expectation of jointly normal variables in probability theory (see [108]) that if x_i ∼ N(0, σ_i^2) for i = 1, . . . , m then

\left\langle x_i \,\Big|\, \sum_{j=1}^{m} x_j = C \right\rangle = \frac{\sigma_i^2 C}{\sum_{j=1}^{m} \sigma_j^2}. \qquad (3.60)

Using (3.60), we have

\left\langle a_i \varphi_i(x') \,\Big|\, \sum_{j=1}^{\infty} a_j \varphi_j(x') = u(x') \right\rangle = \frac{\lambda_i |\varphi_i(x')|^2 u(x')}{\sum_{j=1}^{\infty} \lambda_j |\varphi_j(x')|^2},

which gives

\langle u(x) \,|\, u(x') \rangle = \frac{\sum_{i=1}^{\infty} \lambda_i |\varphi_i(x')|^2 u(x') \varphi_i(x)/\varphi_i(x')}{\sum_{j=1}^{\infty} \lambda_j |\varphi_j(x')|^2} = \frac{\sum_{i=1}^{\infty} \lambda_i \varphi_i^*(x') \varphi_i(x) u(x')}{\sum_{j=1}^{\infty} \lambda_j |\varphi_j(x')|^2}. \qquad (3.61)
Since we estimated ⟨u(x)|u(x′)⟩ by A(x, x′)u(x′), this coincides with the result obtained from linear stochastic estimation, as can be seen by reference to Equation (3.59).
We conclude that the simple PDF model suggested in the previous section results in the
best linear estimator of the conditional PDF of velocity, and that linear stochastic estima-
tion may be viewed as a result of the simple PDF model based on the POD. This reveals a
fundamental connection between the POD and linear stochastic estimation. In addition we
can make the following technical observations based on our previous results:
1. All fields generated by linear stochastic estimation (LSE) possess any closed linear
property that all ensemble members share.
2. Suitable averages of LSE events produce the POD eigenfunctions.
3. All LSE events are linear combinations of POD eigenfunctions.
We remark that one can apply the geometric (Chebyshev) result of Section 3.3.4 to obtain
bounds on the probability of rare LSE events. Bonnet and co-workers [54] have also
studied the relationship between stochastic estimation and the POD, with a view to effi-
cient estimation of instantaneous velocity fields and to developing methods for validating
low-dimensional models.

3.6 Coherent structures and homogeneity


The quest for an unbiased descriptor of coherent structures led us to consider the POD
as a possible tool for extraction of structure from the statistics of an ensemble of obser-
vations on a physical system or solutions of a dynamical system. However, as we saw in
Section 3.3.3, the existence of a homogeneous direction in physical space yields empiri-
cal eigenfunctions which are simply Fourier modes. Superficially, this does not appear to
agree with the observation of localized coherent structures, but one can, of course, produce
a localized structure with a suitable combination of Fourier modes with the appropriate
complex coefficients. The spectrum of empirical eigenvalues supplies the moduli (Equation
(3.33)), but one still has to determine phase relationships that yield instantaneous events.
In the work described subsequently in this book, low-dimensional models themselves
determine the time-varying amplitudes and phases of the Fourier modes, and so this ques-
tion is not particularly relevant for our purposes. However, one can ask for ensemble
averaged coherent structures which reflect the actual Fourier combinations present, in
which case phase information must be estimated from the statistics of the data. The exper-
imentally founded intuition that coherent structures occur randomly in space and time is
the basis for the treatment in this section, which follows Lumley’s application of the shot
noise decomposition [221, 223, 303]. We describe only the simplest case of a scalar field
with a single space variable. At the end of this section we briefly discuss a connection to
pattern analysis techniques such as [109, 359, 372].
Imagine a “building block” f (x), the basic coherent structure concentrated near 0, and
suppose that the process u(x) is generated by randomly sprinkling such blocks on the real
line or on a subinterval [0, L]. To move the structure so its reference point is at y, we
perform the convolution

u(x) = \int \delta(\xi - y) f(x - \xi)\, d\xi,

where δ(ξ − y) is the Dirac delta function based at y. This prompts the following:
Definition A convolution of the type

u(x, t) = \int g(\xi) f(x - \xi)\, d\xi, \qquad (3.62)

where g(ξ ) is a random process in the space of generalized functions, will be called a shot
noise decomposition of u(x).
The goal of “extracting a coherent structure” implies that one wishes to reconstruct the
function f from statistics of u(x, t). With the above definition, we see that a shot noise
decomposition is always possible, and that it is moreover far from unique, since one has
freedom in choosing f and g. In fact if û(k), ĝ(k), and fˆ(k) are the Fourier transforms of
u, g, and f respectively, then û = ĝ fˆ. To remove some ambiguity in the decomposition
and to formalize the notion that g “randomly” sprinkles the deterministic blocks f , we
assume that the random process g is uncorrelated in non-overlapping intervals; i.e.

\langle g(x) g^*(x') \rangle = \eta(x - x'), \qquad (3.63)

where η(x − x′) satisfies

\int \int h(x, x') \eta(x - x')\, dx\, dx' = \int h(x, x)\, dx, \qquad (3.64)

for any continuous function h(x, x′).


We can now partially characterize the power spectrum of the function f :

• If R̂(k) is the Fourier transform of the averaged two point correlation for a homogeneous
process, then R̂(k) = | fˆ(k)|2 .

To derive this result, note that

\langle u(x) u^*(x') \rangle = R(x - x')
= \left\langle \int g(\xi) f(x - \xi)\, d\xi \int g^*(\xi') f^*(x' - \xi')\, d\xi' \right\rangle
= \int\int f(x - \xi) f^*(x' - \xi') \langle g(\xi) g^*(\xi') \rangle\, d\xi\, d\xi'
= \int f(x - y) f^*(x' - y)\, dy.

The last equality comes from (3.64). Changing variables to s = x − y, we obtain

R(x - x') = \int f(s) f^*(x' - x + s)\, ds,

from which the result follows from well-known convolution equalities.
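A quick numerical check of this relation is sketched below (assuming NumPy; the Gaussian bump f, the grid size, and the use of discrete white noise for g as a stand-in for the uncorrelatedness assumption (3.63) are all our choices):

import numpy as np

rng = np.random.default_rng(1)
L, n_real = 256, 5000
x = np.arange(L)
f = np.exp(-0.5 * ((x - L // 2) / 4.0) ** 2)          # a sample "building block"
f_hat = np.fft.fft(f)

spec = np.zeros(L)
for _ in range(n_real):
    g = rng.standard_normal(L)                        # discrete sprinkling: <g(x)g(x')> ~ delta
    u = np.real(np.fft.ifft(np.fft.fft(g) * f_hat))   # u = g convolved (circularly) with f
    spec += np.abs(np.fft.fft(u)) ** 2 / L            # accumulate the power spectrum
spec /= n_real

# the averaged spectrum approaches |f_hat|^2, up to sampling error
print(np.max(np.abs(spec - np.abs(f_hat) ** 2)) / np.max(np.abs(f_hat) ** 2))

The phases of f̂, by contrast, leave no trace in the averaged spectrum; that is the ambiguity the bi-spectrum argument below attempts to resolve.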


Our assumption has effectively specified the modulus of the power spectrum of the build-
ing block; the phase angles are yet to be determined. This approach formalizes rather well
the stochastic sprinkling of deterministic structures in physical space. An extension of the
shot noise formalism to include stochastic sprinkling in time is also possible, the ansatz
being extended to uncorrelatedness in time as well as space. However, coherent structures
are typically thought of as having a characteristic “life cycle,” which does not necessarily
sit well with lack of correlation in time. In this respect the assumption of a single building
block is also restrictive, since we expect to meet more than one spatial form of coherent
structure during the life cycle. These deficiencies reappear when we try to retrieve the
phase information for fˆ.
One possible way to obtain the phase information is from the bi-spectrum [59, 212]; that is, we want to find a second function θ(c) such that

\hat{f}(c) = \hat{R}(c)^{1/2} e^{2\pi i \theta(c)}. \qquad (3.65)

Consider the triple correlation:

\langle u(x) u(x + r_1) u(x + r_2) \rangle
= \left\langle \int g(\xi) f(x - \xi)\, d\xi \int g(\xi') f(x + r_1 - \xi')\, d\xi' \int g(\xi'') f(x + r_2 - \xi'')\, d\xi'' \right\rangle
= \int\int\int f(x - \xi) f(x + r_1 - \xi') f(x + r_2 - \xi'') \langle g(\xi) g(\xi') g(\xi'') \rangle\, d\xi\, d\xi'\, d\xi''. \qquad (3.66)

We now extend the assumption on g to be triply uncorrelated on non-overlapping intervals, in which case the expression of (3.66) becomes

\int f(x) f(x + r_1) f(x + r_2)\, dx = B(r_1, r_2), \qquad (3.67)
which serves as the definition of B(r_1, r_2). Upon taking the Fourier transform of B we obtain

\hat{B}(c_1, c_2) = \int\int B(r_1, r_2)\, e^{-2\pi i (c_1 r_1 + c_2 r_2)}\, dr_1\, dr_2 = \hat{f}(c_1) \hat{f}(c_2) \hat{f}^*(c_1 + c_2)
= \hat{R}^{1/2}(c_1) \hat{R}^{1/2}(c_2) \hat{R}^{1/2}(c_1 + c_2) \cdot e^{2\pi i [\theta(c_1) + \theta(c_2) - \theta(c_1 + c_2)]}. \qquad (3.68)

In Equation (3.68) the “known” quantities are B̂(c1 , c2 ) and R̂(c). As observed in [221]
(see also [244]) this problem is, in general, not exactly solvable, since B̂(c1 , c2 ) may not
be factorizable as the right-hand side prescribes. Moin and Moser [244] observe that this
problem is encountered in other disciplines as well, see [31, 237], and ad hoc procedures
must be invoked. This is exactly where our assumptions on g come back to haunt us. The
lack of an exact solution to the bi-spectrum equation indicates that our assumptions are too
simple, either on the existence of a single building block, or on the statistical behavior of g.
We end with a brief and speculative discussion indicating a potential connection between
the POD and pattern recognition techniques. With the advent of digital image processing,
pattern recognition has become a vast field [313]. We limit ourselves to the relatively basic
procedures used in [109, 359, 372]. Coherent structures were originally found by direct
observations of flow visualizations. The quest for a quantitative procedure for extracting
coherent structures and their dynamics is still a subject of research. Pattern recognition
techniques are designed to mimic the human capability of detecting patterns in a noisy
medium and thus they may be helpful in identifying such flow structures.
In the following discussion the reader should imagine a set of two-dimensional
images. The basic procedure is as follows: one wants to identify a recurrent pattern in
a noisy medium. First pick a template size and fill it with what is conjectured to look like
the coherent structure: this is the first member of the ensemble. The template is then moved
around in each frame of the data set and after each movement a correlation is computed.
Every time the correlation attains a local maximum the corresponding pattern is added
to the ensemble, which is averaged to produce a new reference template. This process is
repeated until the template undergoes no further change. The final template is supposed to
be the coherent structure. Once the coherent structure is deduced, one attempts to find
regions in space well correlated with this structure and to study their contributions to
various statistics.
This, again, is a subjective procedure, although [359] suggests it is a robust one, with
the final template being practically independent of the initial conjectured structure. Our
mathematical understanding of the POD may contribute to a better understanding of the
results of pattern recognition applications. Observe the similarity in mission between the
pattern recognition technique and the shot noise expansion. Both attempt to decompose
the flow into building blocks (although in pattern recognition we concentrate on regions of
the flow with higher correlation with the template). This suggests caution in interpretation
of the resultant template, since, as we saw earlier in this section, any template with a suit-
able power spectrum might decompose the flow, with an appropriate sprinkling function.
This is accentuated by the fact that the experiments of [359], for instance, show a median
correlation of about 0.3. Based on the shot noise decomposition, one can propose a test
for the objectivity of this method, seeing how well the basic building block is reproduced.
Lumley’s example [223] would be a good starting point for such a quest.

3.7 Some applications


We close this chapter with a brief survey of applications of the POD in analysis of exper-
imental and numerically simulated data from various turbulent flows, and from numerical
simulations of model problems. Work with a more dynamical emphasis, including the
derivation and analysis of low-dimensional models, is described in Chapter 12. Neither
there nor in the present section can we give a complete survey of this rapidly evolving field,
and we apologize to those authors whose work has been omitted for reasons of space, or
our own ignorance.

3.7.1 Wall bounded flows


In an early use of the POD in turbulence, Bakewell and Lumley [28] measured two
point correlations of a single velocity component in the wall region of a fully devel-
oped pipe flow. They constructed the autocorrelation tensor using incompressibility and
a closure assumption. The flow was approximately homogeneous in spanwise and stream-
wise directions and, in addition to the mean, they computed only a single eigenfunction
corresponding to the zero wavenumber Fourier mode in the streamwise direction. The
coherent structure was then reconstructed under the assumption of zero phase shift among
the spanwise Fourier modes. This yielded a pair of counter-rotating streamwise rolls much
like those pictured in Figures 2.15 and 2.16.
Using the same facility as [28] – a tunnel running almost pure glycerine as the working
fluid – Herzog [155] performed a fully three-dimensional study of the wall layer of turbu-
lent pipe flow at a Reynolds number of 8750 based on centerline velocity. He measured
streamwise and spanwise velocity components simultaneously with hot film probes and
computed the third component from incompressibility. He had rather low spatial reso-
lution (six points in the normal and spanwise directions, and seven in the streamwise
direction), but, by averaging over long periods, obtained well-converged statistics which
enabled him to compute the three leading eigenfunctions over a fairly wide range of span-
wise and streamwise Fourier wavenumbers. Herzog also measured correlation functions
with time lags, but only processed the zero time lag data to produce purely spatial eigen-
functions. He did not attempt to reconstruct phase relationships, but rather produced “raw”
Fourier-empirical data: eigenfunctions ϕ^n(k_1, k_3; x_2) of the type needed for representation
of Equation (3.34). Indeed, it was these data that Aubry et al. [22] used in constructing
low-dimensional models, as described in Chapter 10. Herzog’s was the first full two point
data set to be measured.
Moin and Moser [244] used the direct numerical simulation of a turbulent channel flow
by Kim et al. [190] at a centerline Reynolds number of 3200 to perform a similar anal-
ysis. Although it is questionable that the statistics are fully converged (fewer than 200
realizations were included in the ensemble), the spatial resolution is excellent and, more-
over, extends across the whole channel, taking in the outer part of the boundary layer as
well as the wall region. The computations employed periodic boundary conditions (Fourier
modes) in the streamwise and spanwise directions, and their raw decomposition took the
same form as Herzog’s. However, they also studied various phase reconstruction meth-
ods. In addition to using the bi-spectrum in the shot noise decomposition of Section 3.6,
they proposed two further techniques, one based on spatial compactness and the other
on continuity of the eigenfunctions with respect to changes in spanwise and streamwise
wavenumber. With the help of these criteria they extracted the “characteristic structures”
which dominate the turbulent kinetic energy in both the wall region and the wall-to-channel
center domain.
Also working with a direct numerical simulation of channel flow, but at centerline
Reynolds numbers of 1500 and 3–4000, Sirovich et al. [332, 333] and Ball et al. [29]
computed a similar Fourier-empirical decomposition across the full channel. Rather than
attempting to estimate phase relations among the Fourier modes, they extracted the tem-
poral behavior of the modal coefficients in an expansion of the type (3.34) directly, by
projecting realizations of the solution on to specific modes. They found strong intermit-
tency, as one would expect from experimental observations of the burst-sweep process,
such as those described in Section 2.5. They also found evidence that plane waves
propagating obliquely can act as triggers for bursting.
In the latter studies, when the full channel is taken as the spatial domain over which the
optimal basis is to be computed, a relatively large number of eigenfunctions is required
to capture, say, 90% of the turbulent kinetic energy on average. Ball et al. [29] give a
figure of about 500, in reasonable agreement with the Liapunov dimension calculations
of Keefe et al. [186]. In quoting this figure, we count each triple consisting of a pair of
span and streamwise wavenumbers and an eigenfunction number as a separate “mode”;
the reader should note that some authors lump Fourier modes together by integrating or
summing, and count only empirical mode numbers. In contrast, when the wall layer is the
domain of interest, significantly fewer modes are necessary to reproduce the same fraction
of average energy. See Figures 2 and 3 in [244] and compare with Figures 10 and 12 of
[241]. In all cases the Fourier-empirical representations display an initial convergence rate
considerably faster than the Fourier-Chebyshev bases used for the numerical computations,
although over the last few percent in energy the empirical basis functions lose much of their
advantage. Moin and Moser [244] note that the first eigenfunction (summed over all Fourier
wavenumbers) captures 23% compared to only 4% for the lowest Chebyshev polynomial,
but that if 90% is to be reproduced, 10 empirical and 12 Chebyshev modes respectively are
required.
Finally we mention the work of Rempfer and Fasel on transition in flat plate boundary
layers. Like Moin and Moser [244], they used a database from a direct numerical sim-
ulation; specifically, one due to Rist and Fasel [305], who employed a finite difference
scheme with “buffer” regions at inlet and outlet to simulate a developing layer. In these
simulations, disturbances are triggered by perturbations symmetric about a mid-plane in
the streamwise/normal direction. This was chosen to match the experiments of Kachanov
et al. [180], and has the effect of breaking the spanwise translation/reflection symmetry
that one might otherwise expect in a computation with periodic boundary conditions in
that direction. Since the developing flow is now inhomogeneous in all spatial directions,
Fourier decompositions cannot be used, and fully three-dimensional POD computations
must be carried out.
In the first part of their work, [297, 299, 302], Rempfer and Fasel use the POD to
probe the spatio-temporal evolution of structures. They find that empirical eigenvalues
and eigenfunctions occur almost in pairs, reflecting the approximate streamwise transla-
tion invariance of relatively slowly growing structures. They identify the leading (pair of)
eigenfunction(s) with the dominant physically observed structure and demonstrate its sim-
ilarity to the lambda vortices revealed in flow visualization. They divide the flow into three
regions at different streamwise locations, thereby investigating the changes in the structures
as they evolve and propagate downstream. Near the inlet, the leading structure is primarily
two-dimensional; moving downstream, three-dimensional effects increase. Time histories
of the modal coefficients reveal that, while the leading (first and second) order structures
oscillate almost sinusoidally, higher order structures, which typically carry less than 5%
of the energy, display less regular, spiky behavior. This is consistent with a propagating
Tollmien–Schlichting wave on which “second order” instabilities are developing.
In the second phase of this work, [296, 300, 301], the authors develop low-dimensional
models similar to those of Aubry et al. [22], and study the energy flow among the various
modal components. We describe this in greater depth in Section 12.2.
3.7.2 Free shear flows


Among the earliest studies of free shear flows by the POD were those of Glauser et al.
[121–123], who considered a jet. They did not account for the growth of the shear layer
but assumed approximate homogeneity in the streamwise direction. They found that most
of the turbulent kinetic energy resided in a “ridge” of wavenumber modes and proposed a
dynamical mechanism for turbulence production. See Section 12.1 for more details.
Sirovich and various co-workers also studied jet flows, using both experimental results
and numerically generated databases. In [335], conditionally sampled realizations of con-
centration fields from a gas jet at Reynolds number 1150 based on exit velocity and nozzle
diameter were used to generate a POD that reflects the large-scale structures in a frame
moving with the convected velocity. The conditioning breaks streamwise homogeneity and
yields a two-dimensional eigenvalue problem. The resulting eigenfunctions, as maps over
the streamwise and radial coordinates, clearly display the lobes responsible for entrainment
and mixing. Kirby et al. [192] performed a similar analysis using data from a large eddy
simulation, and Winter et al. [388] examined mixing of a jet exhausting into a turbulent
cross-flow.
Kirby et al. [193] studied two-dimensional pressure and momentum density fields
from a numerically simulated supersonic shear layer, also using conditional sampling to
“freeze” the convecting structures. They focused on the data compression afforded by use
of empirical eigenfunctions.
Glezer et al. [130] studied a periodically excited plane mixing layer using an extended
POD. Also see Section 12.3.

3.7.3 Rayleigh–Bénard convection


Numerically simulated Rayleigh–Bénard convection problems were studied by Sirovich
et al. [92, 276, 334, 338]. In the first pair of papers the authors made extensive use of
discrete symmetries of the domain, a rectangular parallelepiped, to increase ensemble
sizes as suggested in Section 3.3.3. They also used these symmetries to simplify their
computations by selecting even or odd parities for eigenfunction components. Here such
simplifications are especially important, for there are no homogeneous directions, and
fully three-dimensional empirical eigenfunctions must be found. The latter papers focus
on scaling properties and the computation of Liapunov and Karhunen–Loève dimensions
(see Section 3.3.4). In [337] low-dimensional dynamics and data compression issues were
considered.

3.7.4 Model problems


The POD has been used to analyze numerical solutions of several one- and two-space-
dimensional model equations. Sirovich and Rodriguez [308, 339, cf. 331] studied the
one-space-dimensional complex Ginzburg–Landau equation, investigating the ability of
Galerkin projections onto an empirical subspace computed at one parameter value to
reproduce dynamical and bifurcation behavior over a fairly wide parameter range.
Chambers et al. [72] analyzed a simulation of the viscous Burgers equation defined on
the interval [0, 1] with zero boundary conditions and a random forcing term. They showed
that the empirical eigenfunctions exhibit “viscous” boundary layers near the ends of the
interval and an “outer” region in the interior of the interval essentially independent of the
viscosity parameter.
Kirby and Armbruster [191] applied a conditional POD and a moving POD to study
bifurcation in the Kuramoto–Sivashinsky equation, thereby identifying traveling wave
structures. Their work is similar in spirit to the pattern recognition studies outlined at the
end of Section 3.6. Aubry et al. [20, 21] studied the same problem, paying particular atten-
tion to the relation between the POD and symmetry groups (Section 3.3.3). See Section 7.5
below for more on traveling modes.
Armbruster et al. [12, 13] studied the two-dimensional Kolmogorov flow: the Navier–
Stokes equations in a rectangular domain subject to a spatially periodic steady body force.
They worked with both stream function and vorticity data and investigated how the POD
highlights different flow structures when optimizing the L 2 -norms of these quantities.
In the first paper they argue that the stream function eigenfunctions demonstrate low-
dimensional dynamics while the vorticity eigenfunctions display an enstrophy cascade.
The second paper is more in the spirit of [21]: discrete and continuous symmetries are
used to help identify local and global (homoclinic and heteroclinic) bifurcations. Platt
et al. [279] studied the same flow and, while they did not use the POD, their dynamical
investigations of intermittency share some common ground with [13].

3.8 Appendix: some foundations


The POD relies on the two mathematical concepts treated in this section. The first is aver-
aging, primarily insofar as it is needed to justify interchange of averaging and the inner
product operations. The second is the compactness of the operator R which, via Hilbert–
Schmidt theory, ensures the discrete spectrum and orthonormal eigenfunctions. (It is clear
from its definition that R is self-adjoint.) In this technical appendix we state and prove the
relevant results for these and some other properties used in the body of this chapter.

3.8.1 Probability measures


We first consider averaging. An averaging operation corresponds to a probability measure,
μ, on L^2(Ω), which, in the time stationary case, satisfies

\mu(A) = \mu(S_t^{-1}(A)). \qquad (3.69)
Here St is the solution semi-group or flow map of the dynamical system in question and
A is a subset in phase space. We discuss flow maps for finite-dimensional processes in
Chapter 6; in the present context the reader should think of the mapping induced on
an infinite-dimensional phase space by the solutions of the underlying governing equa-
tions. The existence of a measure corresponding to the physical ensemble average is an
assumption which we phrase as follows:
Ansatz 1 There exists an S_t invariant probability measure μ on L^2(Ω) such that for every (Borel) set A ⊂ L^2(Ω)

\mu(A) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} \mathbf{1}_A(u^k), \qquad (3.70)

where \{u^k\}_{k=0}^{\infty} is a sequence of physically determined states u^k ∈ L^2(Ω).

The task of supplying an ensemble of physically determined states {u k } is left to the


experimentalist. The states should represent instantaneous velocity fields over the full flow
domain which are characteristic of the persistent turbulent activity after transients have
decayed: the “statistical equilibrium states” of the flow [256]. They must be representa-
tive of the system’s behavior in the sense that all statistical moments can, in principle,
be derived from them. In dynamical systems terms, μ(A) encodes the average frequency
with which typical orbits visit the region A of phase space. In the time-stationary case, the
measure μ defined by (3.70) is invariant under the evolution operator St of the system. A
simple one-dimensional example is given towards the end of Section 6.5.
We can now give conditions under which our standing assumption, that averaging and
spatial integration commute, will hold.

Proposition 1 Let μ be a probability measure on L^2(Ω) ∩ C(Ω), where C(Ω) is the set of continuous functions on Ω, such that for every x ∈ Ω we have ⟨|u|^2(x)⟩ < ∞ and ⟨|u|^2(x)⟩^{1/2} ∈ L^2(Ω). Then:

1. ⟨u(x_1)u^*(x_2)⟩ exists for every (x_1, x_2) ∈ Ω × Ω.
2. ⟨u(x_1)u^*(x_2)⟩ is in L^2(Ω × Ω).
3. The following equation holds (interchange of averaging and integration):

\left\langle \int_0^1 \int_0^1 u(x) u^*(x') \varphi(x') \psi^*(x)\, dx'\, dx \right\rangle = \int_0^1 \int_0^1 \langle u(x) u^*(x') \rangle \varphi(x')\, dx'\, \psi^*(x)\, dx. \qquad (3.71)

Proof To show property (1), define f_{x_i}: L^2(Ω) ∩ C(Ω) → R by f_{x_i}(u) = u(x_i). Since we are using continuous functions, the notion of point value is well defined (this is the reason for working in C(Ω)). We have to show that the average

\langle u(x_1) u^*(x_2) \rangle = \int_{L^2(\Omega)} f_{x_1}(u) f_{x_2}^*(u)\, d\mu(u)

exists. The following upper bound establishes this:

\int_{L^2(\Omega)} |f_{x_1}(u) f_{x_2}^*(u)|\, d\mu(u) \le \left[ \int_{L^2(\Omega)} |f_{x_1}|^2\, d\mu(u) \right]^{1/2} \left[ \int_{L^2(\Omega)} |f_{x_2}|^2\, d\mu(u) \right]^{1/2} = \langle |u|^2(x_1) \rangle^{1/2} \langle |u|^2(x_2) \rangle^{1/2}.

To show property (2), we observe that

|\langle u(x_1) u^*(x_2) \rangle \langle u^*(x_1) u(x_2) \rangle| \le \langle |u|^2(x_1) \rangle \langle |u|^2(x_2) \rangle,

and therefore

\int_\Omega \int_\Omega \left| \langle u(x_1) u^*(x_2) \rangle \langle u^*(x_1) u(x_2) \rangle \right| dx_1\, dx_2 \le \left[ \int_\Omega \langle |u|^2(x) \rangle\, dx \right]^2 < \infty.

Finally, to show that Equation (3.71) holds we use Tonelli's theorem, which permits change of order of integration [319]. It is sufficient to show that the integral

\int_\Omega \int_\Omega \langle u(x_1) u^*(x_2) \rangle \varphi(x_2) \psi^*(x_1)\, dx_2\, dx_1

exists. But we have

\left| \int_\Omega \int_\Omega \langle u(x_1) u^*(x_2) \rangle \varphi(x_2) \psi^*(x_1)\, dx_2\, dx_1 \right| \le \left[ \int_\Omega \int_\Omega |\langle u(x_1) u^*(x_2) \rangle|^2\, dx_2\, dx_1 \right]^{1/2} \left[ \int_\Omega \int_\Omega |\varphi(x_2) \psi^*(x_1)|^2\, dx_2\, dx_1 \right]^{1/2} < \infty.

The requirement that ⟨|u|^2(x)⟩^{1/2} ∈ L^2(Ω) is physically reasonable: its interpretation is that kinetic energy must be finite on average in a (finite) spatial domain. The requirement that for every x we have ⟨|u|^2(x)⟩ < ∞ is also physical and corresponds to finite velocity at any point, on average. We take the hypotheses of Proposition 1 as standing assumptions and so throughout we may freely interchange the order of the operations ⟨·⟩ and ∫_Ω · dx.

3.8.2 Compactness of R
We now address the issue of compactness of the operator R introduced in Section 3.1.

Proposition 2 Under the assumptions on u in Proposition 1, and if additionally supp(μ) is precompact, the operator R: L^2(Ω) → L^2(Ω) defined by

R\psi = \int_\Omega \langle u(x_1) u^*(x_2) \rangle \psi(x_2)\, dx_2 \qquad (3.72)

is compact.

Proof Let ψ_n ⇀ ψ be a weakly convergent sequence. We have

\| R(\psi_n - \psi) \| = \left\| \int_\Omega \langle u(x_1) u^*(x_2) \rangle [\psi_n(x_2) - \psi(x_2)]\, dx_2 \right\| = \left\| \left\langle u(x_1) \int_\Omega u^*(x_2) [\psi_n(x_2) - \psi(x_2)]\, dx_2 \right\rangle \right\|. \qquad (3.73)

Given δ > 0 and using the precompactness of supp(μ), we can choose u_1, . . . , u_M to be a δ-spanning set in supp(μ), i.e. for every u ∈ supp(μ) there exists an 1 ≤ i ≤ M such that ‖u − u_i‖ < δ. Let N be such that for all n > N and i = 1, . . . , M:

\left| \int_\Omega u_i^*(x_2) [\psi_n(x_2) - \psi(x_2)]\, dx_2 \right| < \delta.

For u ∈ supp(μ) let Cl(u) be the u_i, 1 ≤ i ≤ M, closest to u; then ‖Cl(u) − u‖ < δ, and we may bound the right-hand side of (3.73) by

\left\| \left\langle u(x_1) \int_\Omega [u^*(x_2) - \mathrm{Cl}(u^*)(x_2)] [\psi_n(x_2) - \psi(x_2)]\, dx_2 \right\rangle \right\| + \left\| \left\langle u(x_1) \int_\Omega \mathrm{Cl}(u^*)(x_2) [\psi_n(x_2) - \psi(x_2)]\, dx_2 \right\rangle \right\| \le \big\| \langle u(x_1) \rangle \big\|\, \delta C + \big\| \langle u(x_1) \rangle \big\|\, \delta.



Choosing δ > 0 small enough we have shown that R takes weakly convergent sequences
to strongly convergent ones. The operator is therefore compact.

We see that the assumption that the support of the invariant measure be contained in a
compact set in phase space suffices to show that the operator R is compact. Compactness
of the attractor in general and the support of the invariant measure in particular has been
established for many of the dissipative systems of interest; see Temam [367]. However, for
the three-dimensional Navier–Stokes equations, this still requires the assumption of global
existence of solutions.

3.8.3 Symmetry and invariant subspaces


We now state and prove the result underlying the discussion of symmetries at the end of
Section 3.3.3. It is given in terms of the probability measure μ corresponding to the average
·, introduced at the beginning of this appendix.

Proposition 3 Suppose that we have a semi-group or flow map S_t: L^2(Ω) → L^2(Ω) for a dynamical system which is equivariant under a linear symmetry group Γ, i.e. every element γ ∈ Γ is a linear transformation of L^2(Ω), such that γ ∘ S_t = S_t ∘ γ. Suppose further that S_t is stationary and let μ be the invariant measure associated with time averages ⟨·⟩. Then a necessary condition for μ to be ergodic is that, for almost every experiment, each of the finite-dimensional eigenspaces N_λ corresponding to a given empirical eigenvalue λ be invariant under Γ.

Proof By ergodicity of μ, the averaged autocorrelation

R(x, x') = \langle u(x, t) u^*(x', t) \rangle

is independent of the initial condition u(x, 0) for every initial condition in a set of full measure X ⊂ L^2(Ω). We can write

R(x, x') = \sum_{i=1}^{\infty} \lambda_i \varphi_i(x) \varphi_i^*(x').

Let N_λ = span{ϕ_i | λ_i = λ}. We want to show that if S_t is ergodic then, for every γ ∈ Γ we have γ(N_λ) = N_λ. It is enough to show γ(N_λ) ⊂ N_λ for every γ, since γ^{-1} ∈ Γ and this
implies γ^{-1}(N_λ) ⊂ N_λ, showing that γ(N_λ) = N_λ. Let {u(x, t) | t ∈ [0, ∞), u(x, 0) ∈ X} be an experiment, then

u(x, t) = \sum_{i=1}^{\infty} a_i(t) \varphi_i(x),

where ⟨a_i a_i^*⟩ = λ_i. Thus, for every γ ∈ Γ,

\tilde{u}(x, t) = \gamma(u(x, t)) = \sum_{i=1}^{\infty} a_i(t) \gamma(\varphi_i(x))

is also an experiment (with probability 1 we have ũ(x, 0) ∈ X). From the independence of the representation of R(x, x′) on the initial condition and the uncorrelatedness of the a_i, we deduce that

R(x, x') = \sum_{i} \lambda_i \gamma(\varphi_i(x)) \gamma(\varphi_i(x'))^*

is a diagonal representation of R(x, x′). In particular, the eigenspaces are orthogonal and are spanned by the functions γ(ϕ_i(x)), so

\gamma(\varphi_i(x)) \in N_{\lambda_i},

which verifies the claim.

3.8.4 Spectral decay and approximate compactness


In Section 3.3.4, we used Chebyshev’s inequality to show that, if the tail of the spectrum of
empirical eigenfunctions decays sufficiently fast, then the probability is high that solutions
of the underlying dynamical system belong to a thin set around a finite-dimensional linear
subspace in phase space. However, this set Sn () is not necessarily compact. In Proposition
2 we saw that precompactness of the support of the invariant measure is sufficient to obtain
compactness of the operator R. Here we obtain a partial converse of this result by showing
that, if the empirical eigenvalues decay rapidly enough, then practically all the support of
the invariant measure is contained in a compact set.

Proposition 4 Let S_t be the flow of a dynamical system in L^2(Ω) ∩ C(Ω) with an invariant measure μ. Denote by ⟨·⟩ the average with respect to this measure. Under the assumptions of Proposition 1, if

\langle u(x) u^*(x') \rangle = \sum_{i} \lambda_i \varphi_i(x) \varphi_i^*(x') \quad \text{with} \quad \lambda_n = o(\exp(-cn))

for some c > 0, then for any ε > 0 there exists a compact set B_ε such that μ(B_ε) > 1 − ε.

Proof Take a sequence ε_n → 0 such that

\frac{\sum_{m=n+1}^{\infty} \lambda_m}{\epsilon_n^2} \to 0, \qquad \sum_{n=1}^{\infty} \frac{\sum_{m=n+1}^{\infty} \lambda_m}{\epsilon_n^2} < \epsilon, \qquad \text{and} \qquad \sum_{n=1}^{\infty} \epsilon_n^2 = r^2 < \infty,
for some constant r. This is possible since λ_n = o(exp(−cn)), and the head of the sequence {λ_n} is of no consequence. Recall from Section 3.3.4 that S_n(ε_n) is a slab of thickness ε_n centered on the finite-dimensional subspace spanned by the (generalized) eigenvectors belonging to the first n eigenvalues. Since \sum \epsilon_n^2 < \infty, the closure of \bigcap_{n=1}^{\infty} S_n(\epsilon_n) is a compact set in L^2(Ω). Indeed, we have

\bigcap_{n=1}^{\infty} S_n(\epsilon_n) \subset \Big\{ u \in L^2 : \sum_{n=1}^{\infty} \sum_{m \ge n} |(u, \varphi_m)|^2 \le \sum_{n=1}^{\infty} \epsilon_n^2 = r^2 < \infty \Big\}.

Changing the order of summation in this expression leads to

\sum_{n=1}^{\infty} \sum_{m \ge n} |(u, \varphi_m)|^2 = \sum_{n=1}^{\infty} n\, |(u, \varphi_n)|^2,

so that we have

\bigcap_{n=1}^{\infty} S_n(\epsilon_n) \subset \Big\{ u \in L^2 : \sum_{n=1}^{\infty} n\, |(u, \varphi_n)|^2 \le \sum_{n=1}^{\infty} \epsilon_n^2 = r^2 < \infty \Big\}.

This defines a precompact set in L^2 because the right-hand side set is the image of the open ball of L^2 of radius r under the compact linear map T defined by: (Tv, ϕ_k) = k^{-1/2}(v, ϕ_k) for every k = 1, 2, . . .
Then, setting

B_\epsilon = \overline{\bigcap_{n=1}^{\infty} S_n(\epsilon_n)}

(the closure of the infinite intersection), we see that

\mu(B_\epsilon) = \mu\Big( \bigcap_{n=1}^{\infty} S_n(\epsilon_n) \Big) = 1 - \mu\Big( \bigcup_{n=1}^{\infty} W_n(\epsilon_n) \Big) \ge 1 - \sum_{n=1}^{\infty} \frac{\sum_{m=n+1}^{\infty} \lambda_m}{\epsilon_n^2} > 1 - \epsilon.

This final result has interesting implications. If we perform a POD on a system which is
not known a priori to have precompact support for its invariant measure and we obtain a
discrete POD with some regularity on the rate of decay of the spectrum, we may conclude
that most of the measure is concentrated on a compact set. This implies that, in a proba-
bilistic sense, we can well approximate the “full” attractor by a compact attractor, albeit
possibly an infinite-dimensional one. It is surprising that such fundamental information
may be obtained from a simple procedure.