0% found this document useful (0 votes)

35 views20 pages

Stein 2011 DiffFilter

This document discusses using difference filters to precondition large covariance matrices that arise when modeling random processes or fields. The condition number of the covariance matrix grows quickly as more observations are taken, making the linear systems difficult to solve. The proposed method filters the covariance matrix using a differencing approach, which theoretically bounds the condition number independent of matrix size for some classes of processes. Numerical experiments show improved performance for solving the preconditioned linear systems iteratively.

Uploaded by

sadasdasdff

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

35 views20 pages

Stein 2011 DiffFilter

Uploaded by

sadasdasdff

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 20

Preprint ANL/MCS-P1888-0511

DIFFERENCE FILTER PRECONDITIONING FOR LARGE

COVARIANCE MATRICES
MICHAEL L. STEIN

, JIE CHEN

, AND MIHAI ANITESCU

Abstract. In many statistical applications one must solve linear systems corresponding to
large, dense, and possibly irregularly structured covariance matrices. These matrices are often ill-
conditioned; for example, the condition number increases at least linearly with respect to the size of
the matrix when observations of a random process are obtained from a xed domain. This paper
discusses a preconditioning technique based on a dierencing approach such that the preconditioned
covariance matrix has a bounded condition number independent of the size of the matrix for some
important process classes. When used in large scale simulations of random processes, signicant
improvement is observed for solving these linear systems with an iterative method.
Key words. Condition number, preconditioner, stochastic process, random eld, spectral anal-
ysis, xed-domain asymptotics
AMS subject classications. 65F35, 60G25, 62M15
1. Introduction. A problem that arises in many statistical applications is the
solution of linear systems of equations for large positive denite covariance matrices
(see, e.g., [15]). An underlying challenge for solving such linear systems is that co-
variance matrices are often dense and ill-conditioned. Specically, if one considers
taking an increasing number of observations of some random process in a xed and
bounded domain, then one often nds the condition number grows without bound
at some polynomial rate in the number of observations. This asymptotic approach
in which an increasing number of observations is taken in a xed region is called
xed-domain asymptotics. It is used extensively in spatial statistics [15] and is being
increasingly used in time series, especially in nance, where high frequency data is
now ubiquitous [2]. Preconditioned iterative methods are usually the practical choice
for solving these covariance matrices, whereby the matrix-vector multiplications and
the choice of a preconditioner are two crucial factors that aect the computational
eciency. Whereas the former problem has been extensively explored, for example,
by using the fast multipole method [9, 3, 6], the latter has not acquired satisfac-
tory answers yet. Some designs of the preconditioners have been proposed (see, e.g.,
[7, 10]); however, their behavior was rarely theoretically studied. This paper proves
that for processes whose spectral densities decay at certain specic rates at high fre-
quencies, the preconditioned covariance matrices have a bounded condition number.
The preconditioners use lters based on simple dierencing operations, which have
long been used to prewhiten (make the covariance matrix closer to a multiple of
the identity) regularly observed time series. However, the utility of such lters for
irregularly observed time series and spatial data is not as well recognized. These cases
are the focus of this work.
Consider a stationary real-valued random process Z(x) with covariance function
k(x) and spectral density f(), which are mutually related by the Fourier transform

Department of Statistics, University of Chicago, Chicago, IL 60637. Email:

[email protected]. Work of this author was supported by U.S. Department of En-
ergy Grant No. DE-SC0002557.

Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439.
Emails: (jiechen, anitescu)@mcs.anl.gov. Work of these authors was supported by the U.S.
Department of Energy, through Contract No. DE-AC02-06CH11357.
1
2 M. L. STEIN, J. CHEN, M. ANITESCU
and the inverse transform:
k(x) =
_
+

f() exp(ix) d, f() =

1
2
_
+

k(x) exp(ix) dx.

In higher dimensions, the process is more often called a random eld. Where boldface
letters denote vectors, a random eld Z(x) in R
d
has the following covariance function
k(x) with spectral density f():
k(x) =
_
R
d
f() exp(i
T
x) d, f() =
1
(2)
d
_
R
d
k(x) exp(i
T
x) dx.
For real-valued processes, both k and f are even functions. This paper describes
results for irregularly sited observations in one dimension and for gridded observations
in higher dimensions. To facilitate the presentation, we will in general use the notation
for d dimensions, except when discussions or results are specic to one dimension. The
covariance matrix K for observations Z(x
j
) at locations x
j
is dened as
K(j, l) covZ(x
j
), Z(x
l
) = k(x
j
x
l
).
Taking f to be nonnegative and integrable guarantees that k is a valid covariance
function. Indeed, for any real vector a,
a
T
Ka =

j,l
a
j
a
l
k(x
j
x
l
) =
_
R
d
f()

j
a
j
exp(i
T
x
j
)

2
d, (1.1)
which is obviously nonnegative as it must be since it equals var
_
j
a
j
Z(x
j
)
_
. The
existence of a spectral density implies that k is continuous.
In some statistical applications, a family of parameterized covariance functions
is chosen, and the task is to estimate the parameters and to uncover the underlying
covariance function that presumably generates the given observed data. Let be the
vector of parameters. We expand the notation and denote the covariance function
by k(x; ). Similarly, we use K() to denote the covariance matrix parameterized by
. We assume that observations y
j
= Z(x
j
) come from a stationary random eld
that is Gaussian with zero mean.
1
The maximum likelihood estimation [13] method
estimates the parameter by nding the maximizer of the log-likelihood function
/() =
1
2
y
T
K()
1
y
1
2
log(det(K()))
m
2
log 2,
where the vector y contains the m observations y
j
. A maximizer

is called a
maximum likelihood estimate of . The optimization can be performed by solving
(assuming there is a unique solution) the score equation
y
T
K()
1
K()

K()
1
y + tr
_
K()
1
K()

_
= 0, , (1.2)
where the left-hand side is nothing but the partial derivative of 2/(). Because of
the diculty of evaluating the trace for a large matrix, Anitescu et al. [1] exploited
1
The case of nonzero mean that is linear in a vector of unknown parameters can be handled with
little additional eort by using maximum likelihood or restricted maximum likelihood [15].
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 3
the Hutchinson estimator of the matrix trace and proposed solving the sample average
approximation of the score equation instead:
F

() := y
T
K()
1
K()

K()
1
y
+
1
N
N

j=1
u
T
j
_
K()
1
K()

_
u
j
= 0, , (1.3)
where the sample vectors u
j
s have independent Rademacher variables as entries. As
the number N of sample vectors tends to innity, the solution

N
of (1.3) converges
to

in distribution:
(V
N
/N)
1/2
(

)
D
standard normal, (1.4)
where V
N
is some positive denite matrix dependent on the Jacobian and the variance
of F(). This error needs to be distinguished from the error in

itself as an estimate
of . Roughly speaking, this convergence result indicates that the th estimated
parameter

has variance of approximately V

N
(, )/N when N is suciently large.
Practical approaches (such as a Newton-type method) for solving (1.3) will need to
evaluate F (possibly multiple times), which in turn requires solving the linear system
K with multiple right-hand sides (y and u
j
s).
If we do not precondition, the condition number of K must grow faster than
linearly in m assuming the observation domain has nite diameter. To prove this, rst
note that we can pick observation locations y
m
and z
m
among x
1
, . . . x
m
such that
[y
m
z
m
[ 0 as m and k continuous implies var
_
1

2
Z(y
m
)
1

2
Z(z
m
)
_
0
as m , so that the minimum eigenvalue of K also tends to 0 as m . To
get a lower bound on the maximum eigenvalue, we note that there exists r > 0 such
that k(x) >
1
2
k(0) for all [x[ r. Assume that the observation domain has a nite
diameter, so that it can be covered by a nite number of balls of diameter r and call
this number B. Then for any m, one of these balls must contain at least m

m/B
observations. The sum of these observations divided by

has variance at least

1
2
m

k(0)
m
2B
k(0), so the maximum eigenvalue of K grows at least linearly with
m. Thus, the ratio of the maximum to the minimum eigenvalue of K and hence its
condition number grows faster than linearly in m. How much faster clearly depends
on the smoothness of Z, but we will not pursue this topic further here.
In what follows, we consider a ltering technique that essentially preconditions
K such that the new system has a condition number that does not grow with the size
of K for some distinguished process classes. Strictly speaking, the ltering operation,
though linear, is not equal to a preconditioner in the standard sense, since it reduces
the size of the matrix by a small number. Thus, we also consider augmenting the
lter to obtain a full-rank linear transformation that serves as a real preconditioner.
However, as long as the rank of the ltering matrix is close to m, maximum likelihood
of based on the ltered observations should generally be nearly as statistically
eective as maximum likelihood based on the full data. In particular, maximum
likelihood estimates are invariant under full rank transformations of the data.
The theoretical results on bounded condition numbers heavily rely on the prop-
erties of the spectral density f. For example, the results in one dimension require
either that the process behaves not too dierently than does Brownian motion or in-
tegrated Brownian motion, at least at high frequencies. Although the restrictions on
4 M. L. STEIN, J. CHEN, M. ANITESCU
f are strong, they do include some models frequently used for continuous time series
and in spatial statistics. As noted earlier, the theory is developed based on xed-
domain asymptotics; and, without loss of generality, we assume that this domain is
the box [0, T]
d
. As the observations become denser, for continuous k the correlations
of neighboring observations tend to 1, resulting in matrices K that are nearly singu-
lar. However, the proposed dierence lters can precondition K so that the resulting
matrix has a bounded condition number independent of the number of observations.
Section 4 gives several numerical examples demonstrating the eectiveness of this
preconditioning approach.
2. Filter for one-dimensional case. Let the process Z(x) be observed at
locations
0 x
0
< x
1
< < x
n
T,
and suppose the spectral density f satises
f()
2
bounded away from 0 and as . (2.1)
The spectral density of Brownian motion is proportional to
2
, so (2.1) says that
Z is not too dierent from Brownian motion in terms of its high frequency behavior.
Dene the process ltered by dierencing and scaling as
Y
(1)
j
= [Z(x
j
) Z(x
j1
)]/
_
d
j
, j = 1, . . . , n, (2.2)
where d
j
= x
j
x
j1
. Let K
(1)
denote the covariance matrix of the Y
(1)
j
s:
K
(1)
(j, l) = cov
_
Y
(1)
j
, Y
(1)
l
_
.
For Z Brownian motion, K
(1)
is a multiple of the identity matrix, and (2.1) is sucient
to show the condition number of K
(1)
is bounded by a nite value independent of the
number of observations.
Theorem 2.1. Suppose Z is a stationary process on R with spectral density f
satisfying (2.1). There exists a constant C depending only on T and f that bounds
the condition number of K
(1)
for all n.
If we let L
(1)
be a bidiagonal matrix with nonzero entries
L
(1)
(j, j 1) = 1/
_
d
j
and L
(1)
(j, j) = 1/
_
d
j
,
it is not hard to see that K and K
(1)
are related by
K
(1)
= L
(1)
KL
(1)
T
.
Note that L
(1)
is rectangular, since the row index ranges from 1 to n and the column
index ranges from 0 to n. It entails a special property that each row sums to zero:
a
T
L
(1)
1 = 0 (2.3)
for any vector a, where 1 denotes the vector of all 1s. It will be clear later that (2.3)
is key to the proof of the theorem. For now we note that if = L
(1)
T
a, then
a
T
K
(1)
a =
T
K = var
_

j
Z(x
j
)
_
with

j
= 0. (2.4)
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 5
Strictly speaking, L
(1)
T
L
(1)
is not a preconditioner, since L
(1)
has more columns
than rows, even though the transformed matrix K
(1)
has a desirable condition prop-
erty. A real preconditioner can be obtained by augmenting L
(1)
. To this end, we
dene, in addition to (2.2),
Y
(1)
0
= Z(x
0
), (2.5)
and let

K
(1)
denote the covariance matrix of all the Y
(1)
j
s, including Y
(1)
0
. Then we
have

K
(1)
=

L
(1)
K

L
(1)
T
,
where

L
(1)
is obtained by adding to L
(1)
the 0th row, with 0th entry equal to 1 and
other entries 0. Clearly,

L
(1)
is nonsingular. Thus,

L
(1)
T

L
(1)
preconditions the matrix
K:
Corollary 2.2. Suppose Z is a stationary process on R with spectral density
f satisfying (2.1). Then there exists a constant C depending only on T and f that
bounds the condition number of

K
(1)
for all n.
We next consider the case where the spectral density f satises
f()
4
bounded away from 0 and as . (2.6)
Integrated Brownian motion, a process whose rst derivative is Brownian motion,
has spectral density proportional to
4
. Thus (2.6) says Z behaves somewhat like
integrated Brownian motion at high frequencies. In this case, the appropriate precon-
ditioner uses second order dierences. Dene
Y
(2)
j
=
[Z(x
j+1
) Z(x
j
)]/d
j+1
[Z(x
j
) Z(x
j1
)]/d
j
2
_
d
j+1
+d
j
, j = 1, . . . , n 1, (2.7)
and denote by K
(2)
the covariance matrix of the Y
(2)
j
s, j = 1, . . . , n 1, namely,
K
(2)
(j, l) = cov
_
Y
(2)
j
, Y
(2)
l
_
.
Then for Z integrated Brownian motion K
(2)
is a tridiagonal matrix with bounded
condition number (see 2.3). This result allows us to show the condition number of
K
(2)
is bounded by a nite value independent of n whenever f satises (2.6).
Theorem 2.3. Suppose Z is a stationary process on R with spectral density f
satisfying (2.6). Then there exists a constant C depending only on T and f that
bounds the condition number of K
(2)
for all n.
If we let L
(2)
be the tridiagonal matrix with nonzero entries
L
(2)
(j, j 1) = 1/(2d
j
_
d
j
+d
j+1
),
L
(2)
(j, j + 1) = 1/(2d
j+1
_
d
j
+d
j+1
),
L
(2)
(j, j) = L
(2)
(j, j 1) L
(2)
(j, j + 1),
for j = 1, . . . , n 1, and let K
(2)
be the covariance matrix of the Y
(2)
j
s, then K and
K
(2)
are related by
K
(2)
= L
(2)
KL
(2)
T
.
6 M. L. STEIN, J. CHEN, M. ANITESCU
Similar to (2.3), the matrix L
(2)
has a property that for any vector a,
a
T
L
(2)
x
0
= 0,
a
T
L
(2)
x
1
= 0,
(2.8)
where x
0
= 1, the vector of all 1s, and x
1
has entries (x
1
)
j
= x
j
. In other words, if
we let = L
(2)
T
a, then
a
T
K
(1)
a =
T
K = var
_
_
_

j
Z(x
j
)
_
_
_
, with

j
= 0 and

j
x
j
= 0.
To yield a preconditioner for K in the strict sense, in addition to (2.7), we dene
Y
(2)
0
= Z(x
0
) +Z(x
n
), and Y
(2)
n
= [Z(x
n
) Z(x
0
)]/(x
n
x
0
).
Accordingly, we augment the matrix L
(2)
to

L
(2)
with

L
(2)
(0, l) =
_

_
1, l = 0
1, l = n
0, otherwise,

L
(2)
(n, l) =
_

_
1/(x
n
x
0
), l = 0
1/(x
n
x
0
), l = n
0, otherwise,
and use

K
(2)
to denote the covariance matrix of the Y
(2)
j
s, including Y
(2)
0
and Y
(2)
n
.
Then, we obtain

K
(2)
=

L
(2)
K

L
(2)
T
.
One can easily verify that

L
(2)
is nonsingular. Thus,

L
(2)
T

L
(2)
becomes a precondi-
tioner for K:
Corollary 2.4. Suppose Z is a stationary process on R with spectral density
f satisfying (2.6). Then there exists a constant C depending only on T and f that
bounds the condition number of

K
(2)
for all n.
We expect that versions of the theorems and corollaries hold whenever, for some
positive integer , f()
2
is bounded away from 0 and as . However,
the given proofs rely on detailed calculations on the covariance matrices and do not
easily extend to larger . Nevertheless, we nd it interesting and somewhat surprising
that no restriction is needed on the spacing of the observation locations, especially
for = 2. These results perhaps give some hope that similar results for irregularly
spaced observations might hold in more than one dimension.
The rest of this section gives proofs of the above results. The proofs make sub-
stantial use of results concerning equivalence of Gaussian measures [11]. In contrast,
the results for the high dimension case (presented in 3) are proved without recourse
to equivalence of Gaussian measures.
2.1. Intrinsic random function and equivalence of Gaussian measures.
We rst provide some preliminaries. For a random process Z (not necessarily station-
ary) on R and a nonnegative integer p, a random variable of the form

n
j=1

j
Z(x
j
)
for which

n
j=1

j
x

j
= 0 for all nonnegative integers p is called an authorized
linear combination of order p, or ALC-p [5]. If, for every ALC-p

n
j=1

j
Z(x
j
), the
process Y (x) =

n
j=1

j
Z(x +x
j
) is stationary, then Z is called an intrinsic random
function of order p, or IRF-p [5].
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 7
Similar to stationary processes, intrinsic random functions have spectral measures,
although they may not be integrable in a neighborhood of the origin. We still use g()
to denote the spectral density with respect to the Lebesgue measure. Corresponding
to these spectral measures are what are known as generalized covariance functions.
Specically, for any IRF-p, there exists a generalized covariance function G(x) such
that for any ALC-p

n
j=1

j
Z(x
j
),
var
_
_
_
n

j=1

j
Z(x
j
)
_
_
_
=
n

j,l=1

l
G(x
j
x
l
).
Although a generalized covariance function G cannot be written as the Fourier trans-
form of a positive nite measure, it is related to the spectral density g by
n

j,l=1

l
G(x
j
x
l
) =
_
+

g()

j=1

j
exp(ix
j
)

2
d
for any ALC-p

n
j=1

j
Z(x
j
).
Brownian motion is an example of an IRF-0 and integrated Brownian motion an
example of an IRF-1. Dening g
r
() = [[
r
, Brownian motion has a spectral density
proportional to g
2
with generalized covariance function c[x[ for some c > 0. Note
that if one sets Z(0) = 0, then covZ(x), Z(s) = minx, s for x, s 0. Integrated
Brownian motion has a spectral density proportional to g
4
with generalized covariance
function c[x[
3
for some c > 0.
We will need to use some results from Stein [17] on equivalence of Gaussian
measures. Let L
T
be the vector space of random variables generated by Z(x) for
x [0, T] and L
T,p
the subspace of L
T
containing all ALC-ps in L
T
, so that L
T

L
T,0
L
T,1
. Let P
T,p
(f) and P
T
(f) be the Gaussian measure for L
T,p
and L
T
,
respectively, when Z has mean 0 and spectral density f. For measures P and Q on
the same measurable space, write P Q to indicate that the measures are equivalent
(mutually absolutely continuous). Since L
T
L
T,p
, for two spectral densities f and
g, P
T
(f) P
T
(g) implies that P
T,p
(f) P
T,p
(g) for all p 0.
2.2. Proof of Theorem 2.1. Let K(h) denote the covariance matrix K associ-
ated to a spectral density h, and similarly for K
(1)
(h),

K
(1)
(h), K
(2)
(h), and

K
(2)
(h).
The main idea of the proof is to upper and lower bound the bilinear form a
T
K
(1)
(f)a
for f satisfying (2.1) by constants times a
T
K
(1)
(g
2
)a. Then since K
(1)
(g
2
) has a con-
dition number 1 independent of n, it immediately follows that K
(1)
(f) has a bounded
condition number, also independent of n.
Let f
0
() = (1 +
2
)
1
and
f
R
() =
_
f(), [[ R
f
0
(), [[ > R
for some R. By (2.1), there exist R and 0 < C
0
< C
1
< such that C
0
f
R
()
f() C
1
f
R
() for all . Then by (1.1) and (2.4), for any real vector a,
C
0
a
T
K
(1)
(f
R
)a a
T
K
(1)
(f)a C
1
a
T
K
(1)
(f
R
)a. (2.9)
By the denition of f
0
, we have P
T,0
(f
0
) P
T,0
(g
2
) [17, Theorem 1]. Since
f
R
= f
0
for [[ > R, by Ibragimov and Rozanov [11, Theorem 17 of Chapter III], we
8 M. L. STEIN, J. CHEN, M. ANITESCU
have P
T
(f
R
) P
T
(f
0
); thus P
T,0
(f
R
) P
T,0
(f
0
). Therefore, by the transitivity of
equivalence, we obtain that P
T,0
(f
R
) P
T,0
(g
2
). From basic properties of equivalent
Gaussian measures (see [11, (2.6) on page 76]), there exist constants 0 < C
2
< C
3
<
such that for any ALC-0,

n
j=0

j
Z(x
j
) with 0 x
j
T for all j,
C
2
var
g2
_
_
_
n

j=0

j
Z(x
j
)
_
_
_
var
fR
_
_
_
n

j=0

j
Z(x
j
)
_
_
_
C
3
var
g2
_
_
_
n

j=0

j
Z(x
j
)
_
_
_
,
where var
f
, for example, indicates that variances are computed under the spectral
density f. Then by (2.4) we obtain
C
2
a
T
K
(1)
(g
2
)a a
T
K
(1)
(f
R
)a C
3
a
T
K
(1)
(g
2
)a. (2.10)
Combining (2.9) and (2.10), we have
C
0
C
2
a
T
K
(1)
(g
2
)a a
T
K
(1)
(f)a C
1
C
3
a
T
K
(1)
(g
2
)a,
and thus the condition number of K
(1)
(f) is upper bounded by C
1
C
3
/(C
0
C
2
).
2.3. Proof of Theorem 2.3. Following a similar argument as in the preceding
proof, the bilinear form a
T
K
(2)
(f)a for f satisfying (2.6) can be upper and lower
bounded by constants times a
T
K
(2)
(g
4
)a. Then it suces to prove that K
(2)
(g
4
) has
a bounded condition number, and thus the theorem holds.
To estimate the condition number of K
(2)
(g
4
), rst note the fact that for any two
ALC-1s

j

j
Z(x
j
) and

j

j
Z(x
j
),

j,l

l
(x
j
x
l
)
3
= 0. (2.11)
Based on the generalized covariance function of g
4
, c[x[
3
, we have
(j, l)-entry of K
(2)
(g
4
) = cov
_
Y
(2)
j
, Y
(2)
l
_
= cov
_
_
_
+1

=1
L
(2)
(j, j +j

)Z(x
j+j
),
+1

=1
L
(2)
(l, l +l

)Z(x
l+l
)
_
_
_
= c
+1

=1
+1

=1
L
(2)
(j, j +j

)L
(2)
(l, l +l

)[x
j+j
x
l+l
[
3
.
Since for any j, Y
(2)
j
is ALC-1, by using (2.11) one can calculate that
(j, l)-entry of K
(2)
(g
4
) =
_

_
c, l = j
cd
j+1
/(2
_
d
j+1
+d
j
_
d
j+2
+d
j+1
) l = j + 1
0, [l j[ > 1,
which means that K
(2)
(g
4
) is a tridiagonal matrix with a constant diagonal c.
To simplify notation, let C(j, l) denote the (j, l)-entry of K
(2)
(g
4
). We have
[C(j 1, j)[ +[C(j, j + 1)[ =
cd
j
2
_
d
j
+d
j1
_
d
j+1
+d
j
+
cd
j+1
2
_
d
j+1
+d
j
_
d
j+2
+d
j+1

c
_
d
j
2
_
d
j+1
+d
j
+
c
_
d
j+1
2
_
d
j+1
+d
j

2
.
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 9
For any vector a,
a
T
K
(2)
(g
4
)a =
n1

j,l=1
a
j
a
l
C(j, l) c
n1

j=1
a
2
j
2
n2

j=1
[a
j
a
j+1
C(j, j + 1)[,
but
2
n2

j=1
[a
j
a
j+1
C(j, j + 1)[
n2

j=1
(a
2
j
+a
2
j+1
)[C(j, j + 1)[

j=1
a
2
j
([C(j 1, j)[ +[C(j, j + 1)[)

2
n1

j=1
a
2
j
.
Therefore,
a
T
K
(2)
(g
4
)a c(1 1/

2) |a|
2
. (2.12)
Similarly, we have a
T
K
(2)
(g
4
)a c(1 + 1/

2) |a|
2
. Thus the condition number of
K
(2)
(g
4
) is at most (1 + 1/

2)/(1 1/

2) = 3 + 2

2.
2.4. Proof of Corollaries 2.2 and 2.4. The proof of Corollary 2.2 is similar to
but simpler than the proof of Corollary 2.4 and is omitted. The main idea of proving
Corollary 2.4 is to consider the following covariance function (3.836.5 in [8] with n = 4
shows that B

is a valid covariance function)

(x) =
_

_
32
3

3
4x
2
+[x[
3
, [x[ 2
1
3
(4 [x[)
3
, 2 < [x[ 4
0, [x[ > 4
for > 0 and the covariance function E(x) = 3e
|x|
(1 +[x[). The function B

has a
spectral density h

() proportional to sin()
4
/()
4
, and E has a spectral density
() = 6(1 +
2
)
2
. Using similar ideas as in the proof of Theorem 2.1, we dene

R
() =
_
f(), [[ R
(), [[ > R
for some R. Then by (2.6), there exist R and 0 < C
0
< C
1
< such that for any
real vector a,
C
0
a
T

K
(2)
(
R
)a a
T

K
(2)
(f)a C
1
a
T

K
(2)
(
R
)a. (2.13)
Furthermore, according to the results in [11, Theorem 17 of Chapter III], when T 2,
P
T
(h

) P
T
() P
T
(
R
), which leads to
C
2
a
T

K
(2)
(h

)a a
T

K
(2)
(
R
)a C
3
a
T

K
(2)
(h

)a (2.14)
for some 0 < C
2
< C
3
< . Combining (2.13) and (2.14), it remains to prove that

K
(2)
(h

) has a bounded condition number, then so does

K
(2)
(f).
10 M. L. STEIN, J. CHEN, M. ANITESCU
When T 2, only the branch [x[ 2 of B

is used, and one can compute the

covariance matrix

K
(2)
(h

) according to the denition of B

entry by entry:
(0, 0)-entry =
128
3

3
8D
2
+ 2D
3
(0, j)-entry =
_
4 +
3
2
D
_
_
d
j+1
+d
j
for j = 1, . . . , n 1
(0, n)-entry = 0
(j, 0)-entry = (0, j)-entry for j = 1, . . . , n 1
(j, l)-entry = (j, l)-entry of K
(2)
(g
4
)/c for j, l = 1, . . . , n 1
(j, n)-entry = (n, j)-entry for j = 1, . . . , n 1
(n, 0)-entry = 0
(n, j)-entry =
_
d
j+1
+d
j
D
_
x
j1
+x
j
+x
j+1

3
2
x
0

3
2
x
n
_
for j = 1, . . . , n 1
(n, n)-entry = 8 2D,
where D = x
n
x
0
, and recall that c is the coecient in the generalized covariance
function corresponding to g
4
. To simplify notation, let H(j, l) denote the (j, l)-entry
of

K
(2)
(h

). Then we have
a
T

K
(2)
(h

)a = a
2
0
H(0, 0) +a
2
n
H(n, n) + 2a
0
n1

j=1
a
j
H(0, j) + 2a
n
n1

j=1
a
j
H(n, j)
+ a
T
K
(2)
(g
4
) a/c, (2.15)
where a is the vector a with a
0
and a
n
removed. For every > 0, using [2xy[ x
2
+y
2
and the Cauchy-Schwartz inequality, we have

2a
0
n1

j=1
a
j
H(0, j)

2
a
2
0
+
1

2
n1

j=1
a
2
j
n1

j=1
H(0, j)
2

2
a
2
0
+
1
2
2
D(8 3D)
2
n1

j=1
a
2
j
. (2.16)
Similarly, for every > 0, using

x
j1
+x
j
+x
j+1

3
2
x
0

3
2
x
n

3D, we have

2a
n
n1

j=1
a
j
H(n, j)

2
a
2
n
+
1

2
n1

j=1
a
2
j
n1

j=1
H(n, j)
2

2
a
2
n
+
18D

2
n1

j=1
a
2
j
. (2.17)
Furthermore, by 2.12,
a
T
K
(2)
(g
4
) a/c (1 1/

2) | a|
2
. (2.18)
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 11
Applying (2.16), (2.17) and (2.18) to (2.15), together with D T 2, we obtain
a
T

K
(2)
(h

)a
_
128
3

3
8D
2
+ 2D
3

2
_
a
2
0
+ (8 2D
2
)a
2
n
+
_
1
1

1
2
2
D(8 3D)
2

18D

2
_
n1

j=1
a
2
j

_
128
3

3
8T
2
+ 2T
3

2
_
a
2
0
+ (8 2T
2
)a
2
n
+
_
1
1

1
2
2
T (8 3T)
2

18T

2
_
n1

j=1
a
2
j
.
Setting = 14T,
2
= 116000T
3
and
2
= 100T yields
a
T

K
(2)
(h

)a
2902
3
T
3
a
2
0
+ 10Ta
2
1
+
_
178359
232000

1

2
_
n1

j=1
a
2
j
.
Since
178359
232000

1

2
.06, the minimum eigenvalue of

K
(2)
(h

) is bounded away from

0 independent of n. Similarly, the maximum eigenvalue is bounded from above, and
thus

K
(2)
(h

) has a bounded condition number.

3. Filter for d-dimensional case. The results in 2 do not easily extend to
higher dimensions. One simple exception is when the observation locations are the
tensor product of d one-dimensional grids and when the covariance function k(x) is the
product of d covariance functions for each dimension (i.e., k(x) = k
1
(x
1
)k
2
(x
2
) . . . k
d
(x
d
)),
in which case the covariance matrix is the Kronecker product of d one-dimensional
covariance matrices, each of which can be preconditioned separately. However, such
models are of questionable relevance in applications [15]. Here, we restrict the lo-
cations of observations to a regular grid. In this case, the second order dierence
lter becomes the standard discrete Laplace operator. A benet is that the Laplace
operator can be recursively applied many times, resulting in essentially a much higher
order dierence ltering.
Since the observation locations are evenly spaced, we use = T/n to denote
the spacing, where n is the number of observations along each dimension. Thus,
the locations are j, 0 j n. Here, j is a vector of integers, and n means
the vector of all ns.
2
Since the locations are on a grid, we use vector indices in
convenience. Thus, the covariance matrix K for the observations Z(j) has entries
K(j, l) = k(j l). We dene the Laplace operator to be
Z(j) =
d

p=1
Z(j e
p
) 2Z(j) +Z(j +e
p
),
where e
p
denotes the unit vector along the pth coordinate. When the operator is
applied times, we denote
Y
[]
j
=

Z(j).
2
Sometimes, boldface letters denote a vector of same entries (such as n meaning a vector of all
ns). Under context, this notation is self-explaining and not to be confused with the notation of a
general vector. Other examples in this paper include 1 and .
12 M. L. STEIN, J. CHEN, M. ANITESCU
Note that this notation is in parallel to the ones in (2.2) and (2.7), with [] meaning the
number of applications of the Laplace operator (instead of the order of the dierence),
and the index j being a vector (instead of a scalar). In addition, we use K
[]
to denote
the covariance matrix of Y
[]
j
, j n :
K
[]
(j, l) = cov
_
Y
[]
j
, Y
[]
l
_
.
We have the following result.
Theorem 3.1. Suppose Z is a stationary random eld on R
d
with spectral density
f satisfying
f() (1 +||)

, (3.1)
where = 4 for some positive integer . Then there exists a constant C depending
only on T and f that bounds the condition number of K
[]
for all n.
Recall that for a(), b() 0, the relationship a() b() indicates that
there exist C
1
, C
2
> 0 such that C
1
a() b() C
2
a(), .
It is not hard to verify that K
[]
and K are related by K
[]
= L
[]
KL
[]
T
, where
L
[]
= L
n+1
L
n1
L
n
and L
s
is an (s 1)
d
(s + 1)
d
matrix with entries
L
s
(j, l) =
_

_
2d, l = j
1, l = j e
p
, p = 1, . . . , d
0, otherwise,
for 1 j s 1. One may also want to have a nonsingular

L
[]
such that the
condition number of

L
[]
K

L
[]
T
is bounded. However, we cannot prove that such an
augmentation yields matrices with bounded condition number, although numerical
results in 5 suggest that such a result may be achievable. Stein [16] applied the
iterated Laplacian to gridded observations in d dimensions to improve approximations
to the likelihood based on the spatial periodogram and similarly made no eort to
recover the information lost by using a less than full rank transformation. It is worth
noting that processes with spectral densities of the form (3.1) observed on a grid bear
some resemblance to Markov random elds [14], which provide an alternative way to
model spatial data observed at discrete locations.
3.1. Proof of Theorem 3.1. First note that if one restricts to observations on
the grid j for j Z
d
, the covariance function k can be written as an integral in
[, ]
d
:
k(j) =
_
R
d
f() exp(i
T
(j)) d =
_
[,]
d
f

() exp(i
T
j) d,
where
f

() =
d

lZ
d
f(
1
( + 2l)). (3.2)
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 13
Denote by k
[]
the covariance function such that k
[]
(j l) = K
[]
(j, l). Then
according to the denition of the operator , we have k
[0]
= k and the recurrence
k
[+1]
(j) =
d

p,q=1
k
[]
(j +(e
p
+e
q
)) 2k
[]
(j +e
p
) +k
[]
(j +(e
p
e
q
))
2k
[]
(j +e
q
) + 4k
[]
(j) 2k
[]
(j e
q
)
+k
[]
(j +(e
p
+e
q
)) 2k
[]
(j e
p
) +k
[]
(j +(e
p
e
q
)).
If we let
k
[]
(j) =
_
[,]
d
f
[]

() exp(i
T
j) d,
then the above recurrence for k
[]
translates to
f
[]

() =
_
d

p=1
4 sin
2
_

p
2
_
_
2
f

(), (3.3)
and for any real vector a, we have
a
T
K
[]
a =

j,ln
a
j
a
l
k
[]
(jl) =
_
[,]
d
f
[]

()

jn
a
j
exp(i
T
j)

2
d.
Therefore, to prove that K
[]
has a bounded condition number, we need to bound the
expression for a
T
K
[]
a given in the above equality.
According to the assumption of f in (3.1), combining (3.2) and (3.3), we have

d
f
[]

()
_
d

p=1
4 sin
2
_

p
2
_
_
2

lZ
d
( +| + 2l|)

=: h

().
Therefore, there exist 0 < C
0
C
1
< independent of and a, such that
C
0
H

(a)
d
a
T
K
[]
a C
1
H

(a), (3.4)
where
H

(a) =
_
[,]
d
h

()

jn
a
j
exp(i
T
j)

2
d.
We proceed to bound the function H

(a).
For any ,= 0, h

() is continuous with h

(0) = 0. When = 0 and = 4,

it can be shown that h
0
() is also continuous, but h
0
(0) = 1. In other words, h

converges to h
0
pointwise except at the origin. Since h

> h

when <

, we
have that h

is upper bounded by h
0
for all . Moreover, by the continuity of h
0
in
[, ]
d
, h
0
has a maximum C
2
. Therefore, h

() C
2
for all and , and thus
H

(a) C
2
_
[,]
d

jn
a
j
exp(i
T
j)

2
d = C
2
(2)
d

jn
a
2
j
. (3.5)
14 M. L. STEIN, J. CHEN, M. ANITESCU
Now we need a lower bound for H

(a). First, note that when [, ]

d
,
h

() sinc
2
(1/2) ||
4
( +||)

.
Therefore, for any 0 < /,
H

(a) sinc
2
(1/2)
_
[,]
d
_
||
+||
_

jn
a
j
exp(i
T
j)

2
d
sinc
2
(1/2)
_
[,]
d
\{}
_
||
+||
_

jn
a
j
exp(i
T
j)

2
d
sinc
2
(1/2)
_

1 +
_

_
[,]
d
\{}

jn
a
j
exp(i
T
j)

2
d. (3.6)
To obtain a lower bound on this last integral, note that
_
[,]
d

jn
a
j
exp(i
T
j)

2
d = (2)
d

jn
a
2
j
and
_

jn
a
j
exp(i
T
j)

2
d
_

_

jn
[a
j
[
_
2
d

(n + 1 2)
d

jn
a
2
j
d
= (n + 1 2)
d
()
d
V
d

jn
a
2
j
(T)
d
V
d

jn
a
2
j
,
where V
d
is the volume of the d-dimensional unit ball, which is always less than 2
d
.
Applying these results to (3.6),
H

(a) sinc
2
(1/2)
_

1 +
_

_
(2)
d
(T)
d
V
d

jn
a
2
j
.
Since this bound holds for any 0 < /, we specically let = 1/T. Then
H

(a) C
3

jn
a
2
j
(3.7)
with
C
3
=
sinc
2
(1/2)[(2)
d
V
d
]
(1 +T)

which is independent of .
Combining (3.4), (3.5) and (3.7), we have
C
0
C
3
|a|
2

d
a
T
K
[]
a C
1
C
2
(2)
d
|a|
2
,
which means that the condition number of K
[]
is bounded by (2)
d
C
1
C
2
/(C
0
C
3
).
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 15
4. Numerical experiments. A class of popularly used covariance functions
that are exible in reecting the local behavior of spatially varying data is the Matern
covariance model [15, 13]:
k(x) =
1
2
1
()
_

2 |x|

_
,
where is the Gamma function and /

is the modied Bessel function of the second

kind of order . The parameter controls the dierentiability of the model, and is
a scale parameter. The corresponding spectral density
f()
_
2

2
+||
2
_
(+d/2)
,
which is dimension dependent. It is clear that with some choices of , f satises the
requirements of the theorems in this paper. For example, when d = 1, the Matern
model with = 1/2 corresponds to Theorem 2.1 and Collorary 2.2, whereas = 3/2
corresponds to Theorem 2.3 and Collorary 2.4. Also, when d = 2, the Matern model
with = 1 corresponds to Theorem 3.1 with = 4, meaning that the Laplace operator
is needed to apply once ( = 1). Whittle [18] argued that the choice of = 1 is
particularly natural for processes in R
2
, in large part because the process is a solution
to a stochastic version of the Laplace equation driven by white noise.
For the above three examples, we plot in Figure 4.1 the curves of the condition
numbers for both K and the ltered versions of K, as the size m of the matrix varies.
The plots were obtained by xing the domain T = 100 and the scale parameter = 7.
For one-dimensional cases, observation locations were randomly generated according
to the uniform distribution on [0, T]. The plots clearly show that the condition number
of K grows very fast with the size of the matrix. With an appropriate lter applied,
on the other hand, the condition number of the ltered covariance matrix stays more
or less the same, a phenomenon consistent with the theoretical results.
The good condition property of the ltered covariance matrix is exploited in the
block preconditioned conjugate gradient (block PCG) solver. The block version of
PCG is used instead of the single vector version because in some applications, such as
the one presented in 1, the linear system has multiple right-hand sides. We remark
that the convergence rate of block PCG depends not on the condition number, but on
a modied condition number of the linear system [12]. Let
j
, sorted increasingly, be
the eigenvalues of the linear system. With s right-hand sides, the modied condition
number is
m
/
s
. Nevertheless, a bounded condition number indicates a bounded
modied condition number, which is desirable for block PCG. Figure 4.2 shows the
results of an experiment where the observation locations were on a 128 128 regular
grid and s = 100 random right-hand sides were used. Note that since K and K
[1]
are
BTTB (block Toeplitz with Toeplitz blocks), they can be further preconditioned by
using a BCCB (block circulant with circulant blocks) preconditioner [4]. Comparing
the convergence history for K, K preconditioned with a BCCB preconditioner, K
[1]
,
and K
[1]
preconditioned with a BCCB preconditioner, we see that the last case clearly
yields the fastest convergence.
Next, we demonstrate the usefulness of the bounded condition number results in
the maximum likelihood problem mentioned in 1. The simulation process without
any ltering is as follows. We rst generated observations y = Z(x) for a Gaussian
16 M. L. STEIN, J. CHEN, M. ANITESCU
10
1
10
2
10
3
10
4
10
0
10
5
10
10
matrix dimension n
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
K
(1)
tilde K
(1)
(a) d = 1, = 1/2, rst order dierence lter
10
1
10
2
10
3
10
4
10
0
10
5
10
10
10
15
10
20
matrix dimension n
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
K
(2)
tilde K
(2)
(b) d = 1, = 3/2, second order dierence lter
10
2
10
4
10
6
10
0
10
2
10
4
10
6
10
8
matrix dimension m
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
K
[1]
(c) d = 2, = 1, Laplace lter once
Fig. 4.1. Condition numbers of K (both unltered and ltered) as the matrix size m varies.
random eld in R
2
with the covariance rule
k(x; ) =
_

2 r
x;
_

2 r
x;
_
, r
x;
=

x
2
1

2
1
+
x
2
2

2
2
,
where = 1,

= [7, 10], and the observation locations x were on a two-dimensional

regular grid of spacing = 100/n. Then we solved the nonlinear system (1.3) by
using the observation vector y and obtained an estimate

N
. For dierent grid sizes
n (matrix size m = n
2
) the computational times were recorded and the accuracies
of the estimates compared to the exact maximum likelihood estimates

(in terms of
condence interval as derived from (1.4)) were compared.
We have noted that the condition number of K grows faster than linearly in
m. Therefore, we instead solved a nonlinear system other than (1.3) to obtain the
estimate

N
. We applied the Laplace operator to the sample vector y once and
obtained a vector y
[1]
. Then we solved the nonlinear system
(y
[1]
)
T
(K
[1]
)
1
K
[1]

(K
[1]
)
1
(y
[1]
) +
1
N
N

j=1
u
T
j
_
(K
[1]
)
1
K
[1]

_
u
j
= 0, (4.1)
where the u
j
s are as in (1.3). This approach is equivalent to estimating the parameter
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 17
0 100 200 300 400
10
10
10
5
10
0
10
5
iteration
r
e
s
i
d
u
a
l

K
K preconditioned
K
[1]
K
[1]
preconditioned
Fig. 4.2. Convergence history of BPCG.
from the sample vector y
[1]
with covariance K
[1]
. The matrix K
[1]
is guaranteed to
have a bounded condition number for all m according to Theorem 3.1.
The simulation was performed on a Linux desktop with 16 cores with 2.66 GHz
frequency and 32 GB of memory. The nonlinear equation (4.1) was solved by using
the Matlab command fsolve, which by default used the trust-region dogleg algo-
rithm. Results are shown in Figure 4.3. As we would expect, as the number m of
observations increases, the estimates

N
tend to become closer to

which generated
the simulation data. Furthermore, despite the fact that N = 100 is xed as m in-
creases, the condence intervals for

N
become increasingly narrow as m increases,
which suggests that it may not be necessary to let N increase with m to insure that
the simulation error

is small compared to the statistical error

. Finally,
as expected, the running time of the simulation scales roughly O(m), which shows
promising practicality for running simulations on much larger grids than 10241024.
10
4
10
6
6
7
8
9
10
11
matrix dimension m

2
(a) Est. parameters with condence interval.
10
4
10
6
10
1
10
2
10
3
10
4
10
5
matrix dimension m
t
i
m
e

(
s
e
c
o
n
d
s
)
64x64 grid
2.56 mins
func eval: 7
128x128 grid
6.62 mins
func eval: 7
256x256 grid
1.1 hours
func eval: 8
512x512 grid
2.74 hours
func eval: 8
1024x1024 grid
11.7 hours
func eval: 8
(b) Running time versus matrix dimension m.
Fig. 4.3. Simulation results of the maximum likelihood problem.
5. Further numerical exploration. This section describes additional numer-
ical experiments. First we consider trying to reduce the condition number of our
matrices by rescaling them to be correlation matrices. Specically, for a covariance
18 M. L. STEIN, J. CHEN, M. ANITESCU
matrix K, the corresponding correlation matrix is given by
C = diag(K)
1/2
K diag(K)
1/2
.
Although C is not guaranteed to have smaller condition number than K, in practice
it often will. For observations on a regular grid and a spatially invariant lter, which
is the case in 3, all diagonal elements of K are equal, so there is no point in rescaling.
For irregular observations, rescaling does make a dierence. For all of the settings
considered in 2, the ratio of the biggest to the smallest diagonal elements of all of the
covariance matrices considered is bounded. It follows that all of the theoretical results
in that section on bounded condition numbers apply to the corresponding correlation
matrices.
10
1
10
2
10
3
10
4
10
0
10
5
10
10
matrix dimension n
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
(2)
C
(2)
tilde K
(2)
tilde C
(2)
(a) d = 1, = 3/2, second order dierence lter
10
1
10
2
10
3
10
4
10
0
10
5
10
10
10
15
matrix dimension n
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
K
(1)
K
(2)
(b) d = 1, = 1, two lters
10
1
10
2
10
3
10
4
10
0
10
5
10
10
10
15
10
20
matrix dimension n
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
K
(1)
K
(2)
(c) d = 1, = 2, two lters
10
2
10
4
10
6
10
0
10
2
10
4
10
6
10
8
matrix dimension m
c
o
n
d
i
t
i
o
n

n
u
m
b
e
r

K
tilde K
[1]
tilde C
[1]
(d) d = 2, = 1, augmented Laplace lter once
Fig. 5.1. Condition numbers of covariance matrices and correlation matrices.
Figure 4.1(b) shows that the ltered covariance matrices

K
(2)
have much larger
condition numbers than does K
(2)
. This result is perhaps caused by the full rank
transformation

L
(2)
that makes the (0, 0) and (n, n) entry of

K
(2)
signicantly dierent
from the rest of the diagonal. For the same setting, Figure 5.1(a) shows that diagonal
rescaling yields much improved resultsthe correlation matrix

C
(2)
has a condition
number much smaller than that of

K
(2)
and close to that of K
(2)
.
Theorems 2.1 and 2.3 indicate the possibility of reducing the condition number
of the covariance matrix for spectral densities with a tail similar to [[
p
for even
DIFFERENCE FILTERS FOR COVARIANCE MATRICES 19
p by applying an appropriate dierence lter. A natural question is whether the
dierence lter can also be applied to spectral densities whose tails are similar to [[
to some negative odd power. Figures 5.1(b) and 5.1(c) show the ltering results for
[[
3
and [[
5
, respectively. In both plots, neither the rst nor the second order
dierence lter resulted in a bounded condition number, but the condition number
of the ltered matrix is greatly reduced. This encouraging result indicates that the
ltering operation may be useful for a wide range of densities (e.g., all Matern models)
that behave like [[
p
at high frequencies, whether or not p is an even integer.
For processes in d > 1 dimension, our result (Theorem 3.1) requires a transfor-
mation L
[]
that reduces the dimension of the covariance matrix by O(n
d1
). One
may want to have a full rank transformation or some transformation that reduces the
dimension of the matrix by at most O(1). We tested one such transformation here
for a R
2
example, which reduced the dimension by four. The transformation

L
[1]
is
dened as follows. When j is not on the boundary, namely, 1 j n 1,

L
[1]
(j, l) =
_

_
4, l = j
2, l = j + (e
p
), p = 1, 2
1, l = j +
_
1
1

0, otherwise.
When j is on the boundary but not at the corner, the denition of

L
[1]
(j, l) is exactly
the same as above, but only for legitimate l, that is, components of l cannot be smaller
than 0 or larger than n. The corner locations are ignored. The condition numbers of
the ltered covariance matrix

K
[1]
=

L
[1]
K

L
[1]
T
and those of the corresponding corre-
lation matrix

C
[1]
are plotted in Figure 5.1(d), for the same covariance function used
in Figure 4.1(c). Indeed, the diagonal entries of

K
[1]
corresponding to the boundary
locations are not too dierent from those not on the boundary; therefore, it is not
surprising that the condition numbers for

K
[1]
and

C
[1]
look similar. It is plausible
that the condition number of

K
[1]
is bounded independent of the size of the grid.
6. Conclusions. We have shown that for stationary processes with certain spec-
tral densities, a rst/second order dierence lter can precondition the covariance
matrix of irregularly spaced observations in one dimension, and the discrete Laplace
operator (possibly applied more than once) can precondition the covariance matrix
of regularly spaced observations in high dimension. Even when the observations are
located within a xed domain, the resulting ltered covariance matrix has a bounded
condition number independent of the number of observations. This result is particu-
larly useful for large scale simulations that require the solves of the covariance matrix
using an iterative method. It remains to investigate whether the results for high
dimension can be generalized for observation locations that are irregularly spaced.
REFERENCES
[1] M. Anitescu, J. Chen, and L. Wang, A matrix-free approach for solving the Gaussian pro-
cess maximum likelihood porblem, Tech. Rep. ANL/MCS-P1857-0311, Argonne National
Laboratory, 2011.
[2] O. E. Barndorff-Nielsen and N. Shephard, Econometric analysis of realized covariation:
High frequency based covariance, regression, and correlation in nancial economics, Econo-
metrica, 72 (2004), pp. 885925.
[3] J. Barnes and P. Hut, A hierarchical O(N log N) force-calculation algorithm, Nature, 324
(1986), pp. 446449.
20 M. L. STEIN, J. CHEN, M. ANITESCU
[4] R. H.-F. Chan and X.-Q. Jin, An Introduction to Iterative Toeplitz Solvers, SIAM, 2007.
[5] J. Chil` es and P. Delfiner, Geostatistics: Modeling Spatial Uncertainty, Wiley, New York,
1999.
[6] Z. Duan and R. Krasny, An adaptive treecode for computing nonbounded potential energy in
classical molecular systems, J. Comput. Chem., 23 (2001), pp. 15491571.
[7] A. C. Faul, G. Goodsell, and M. J. D. Powell, A Krylov subspace algorithm for multi-
quadric interpolation in many dimensions, IMA Journal of Numerical Analysis, 25 (2005),
pp. 124.
[8] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, Academic Press,
Orlando, seventh ed., 2007.
[9] L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, J. Comput. Phys.,
73 (1987), pp. 325348.
[10] N. A. Gumerov and R. Duraiswami, Fast radial basis function interpolation via precondi-
tioned Krylov iteration, SIAM J. Sci. Comput., 29 (2007), pp. 18761899.
[11] I. A. Ibragimov and Y. A. Rozanov, Gaussian Random Processes, Springer-Verlag, New
York, 1978.
[12] D. P. OLeary, The block conjugate gradient algorithm and related methods, Linear Algebra
Appl., 29 (1980), pp. 293322.
[13] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, Cam-
bridge, Massachusets., 2006.
[14] H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications, Chapman
& Hall/CRC, Boca Raton, FL, 2005.
[15] M. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York, 1999.
[16] M. L. Stein, Fixed domain asymptotics for spatial periodograms, Journal of the American
Statistical Association, 90 (1995), pp. 12771288.
[17] , Equivalence of Gaussian measures for some nonstationary random elds, Journal of
Statistical Planning and Inference, 123 (2004), pp. 111.
[18] P. Whittle, On stationary processes in the plane, Biometrika, 41 (1954), pp. 434449.
The submitted manuscript has been created by the University of
Chicago as Operator of Argonne National Laboratory (Argonne)
under Contract No. DE-AC02-06CH11357 with the U.S. Depart-
ment of Energy. The U.S. Government retains for itself, and others
acting on its behalf, a paid-up, nonexclusive, irrevocable world-
wide license in said article to reproduce, prepare derivative works,
distribute copies to the public, and perform publicly and display
publicly, by or on behalf of the Government.

Logit and Probit Models
50% (2)
Logit and Probit Models
11 pages
An Introduction To Nonlinear Filtering
No ratings yet
An Introduction To Nonlinear Filtering
23 pages
Assignment 7
No ratings yet
Assignment 7
2 pages
Fast Adaptive Eigenvalue Decomposition A Maximum Likelihood Approach
No ratings yet
Fast Adaptive Eigenvalue Decomposition A Maximum Likelihood Approach
4 pages
Low Complexity Adaptive Algorithm For Generalized Eigenvalue Decomposition
No ratings yet
Low Complexity Adaptive Algorithm For Generalized Eigenvalue Decomposition
4 pages
IntroBayesTimeSeries2
No ratings yet
IntroBayesTimeSeries2
73 pages
Adaptive Detection With Training Data in Partially Homogeneous
No ratings yet
Adaptive Detection With Training Data in Partially Homogeneous
8 pages
DynamicFactorModelsFactorAugmentedVectorAutoregressions
No ratings yet
DynamicFactorModelsFactorAugmentedVectorAutoregressions
43 pages
2021 Week 5 Chapter3 Control
No ratings yet
2021 Week 5 Chapter3 Control
7 pages
STAT 3008 Applied Regression Analysis Tutorial 2 - Term 2, 2019 20
No ratings yet
STAT 3008 Applied Regression Analysis Tutorial 2 - Term 2, 2019 20
2 pages
chow1969
No ratings yet
chow1969
11 pages
s m s t c Lecture Notes Lecture5
No ratings yet
s m s t c Lecture Notes Lecture5
14 pages
Density Estimation Using Real NVP
No ratings yet
Density Estimation Using Real NVP
32 pages
CCFG 01 R 7 B
No ratings yet
CCFG 01 R 7 B
69 pages
Gec410 Note Vi
No ratings yet
Gec410 Note Vi
50 pages
Giannakis 1989
No ratings yet
Giannakis 1989
5 pages
Jeff 18
No ratings yet
Jeff 18
126 pages
New Method of On-Line Estimation of Noise Coy Ariances and R
No ratings yet
New Method of On-Line Estimation of Noise Coy Ariances and R
6 pages
ASNE94 Paper PDF
No ratings yet
ASNE94 Paper PDF
6 pages
Analysis of Credit Card Fraud Detection Methods
No ratings yet
Analysis of Credit Card Fraud Detection Methods
3 pages
Efficient Estimation of Conditional Covariance Matrices For Dimension Reduction - Solís, Loubes, Marteau - JDS, Bruxelles 2012
No ratings yet
Efficient Estimation of Conditional Covariance Matrices For Dimension Reduction - Solís, Loubes, Marteau - JDS, Bruxelles 2012
44 pages
Relative Performance of Expected and Observed Fisher Information in Covariance Estimation for Maximum Likelihood Estimates
No ratings yet
Relative Performance of Expected and Observed Fisher Information in Covariance Estimation for Maximum Likelihood Estimates
6 pages
A Review On Generative Adversarial Networks: Algorithms, Theory, and Applications
No ratings yet
A Review On Generative Adversarial Networks: Algorithms, Theory, and Applications
28 pages
Classes of Kernels For Machine Learning: A Statistics Perspective
No ratings yet
Classes of Kernels For Machine Learning: A Statistics Perspective
14 pages
9e6cbf4ac9c3320e4e9d5402ab7ac5eb_MIT14_384F13_rec7
No ratings yet
9e6cbf4ac9c3320e4e9d5402ab7ac5eb_MIT14_384F13_rec7
6 pages
Stochastic Reserving - Stochastic Reserving - Mack and Bootstrapping Mack and Bootstrapping (Slides) - Dave Clark
No ratings yet
Stochastic Reserving - Stochastic Reserving - Mack and Bootstrapping Mack and Bootstrapping (Slides) - Dave Clark
12 pages
Chap2 PDF
No ratings yet
Chap2 PDF
15 pages
Note 6: EECS 189 Introduction To Machine Learning Fall 2020 1 Multivariate Gaussians
No ratings yet
Note 6: EECS 189 Introduction To Machine Learning Fall 2020 1 Multivariate Gaussians
9 pages
Environmental Data Sets With Below Detection Limit Observations
No ratings yet
Environmental Data Sets With Below Detection Limit Observations
27 pages
牛颖Introduction to M-estimator
No ratings yet
牛颖Introduction to M-estimator
4 pages
Likelihood Functions For State Space Models With Diffuse Initial Conditions
No ratings yet
Likelihood Functions For State Space Models With Diffuse Initial Conditions
26 pages
Local Linear Regression For Functional Data: Alain Berlinet, Abdallah Elamine, André Mas Université Montpellier 2
No ratings yet
Local Linear Regression For Functional Data: Alain Berlinet, Abdallah Elamine, André Mas Université Montpellier 2
23 pages
Esqu Ivel, Manuel L. (P-NULST-DM2) : Discuss. Math. Probab. Stat. No. 2
No ratings yet
Esqu Ivel, Manuel L. (P-NULST-DM2) : Discuss. Math. Probab. Stat. No. 2
8 pages
BD Solutions PDF
100% (3)
BD Solutions PDF
31 pages
Quality of Life
No ratings yet
Quality of Life
16 pages
Muxim_PL_Stationary
No ratings yet
Muxim_PL_Stationary
19 pages
Maximum Likelihood Estimates of Linear Dynamic Systems (1965)
No ratings yet
Maximum Likelihood Estimates of Linear Dynamic Systems (1965)
14 pages
An Adaptive Unscented Kalman Filter Approach For State Estimation of Nonlinear Continuous-Discrete System
No ratings yet
An Adaptive Unscented Kalman Filter Approach For State Estimation of Nonlinear Continuous-Discrete System
4 pages
METULecture 1
No ratings yet
METULecture 1
15 pages
On The Simlation and Estimation of The MR OU Process With MATLAB
No ratings yet
On The Simlation and Estimation of The MR OU Process With MATLAB
19 pages
ch4 KFderiv
No ratings yet
ch4 KFderiv
22 pages
Stat Fit
No ratings yet
Stat Fit
135 pages
Chapter 3 (PR)
No ratings yet
Chapter 3 (PR)
29 pages
A Step by Step Mathematical Derivation A
No ratings yet
A Step by Step Mathematical Derivation A
32 pages
GSVD and Its Applications in Model Analy
No ratings yet
GSVD and Its Applications in Model Analy
15 pages
(Statistics94.Blogfa.com)Solutions to Selected Problems in Brockwell and Davis
No ratings yet
(Statistics94.Blogfa.com)Solutions to Selected Problems in Brockwell and Davis
31 pages
Oooo
No ratings yet
Oooo
6 pages
Estimation of A Common Multivariate Normal Mean Vector
No ratings yet
Estimation of A Common Multivariate Normal Mean Vector
11 pages
A Well-Conditioned Estimator For Large-Dimensional Covariance Matrices
No ratings yet
A Well-Conditioned Estimator For Large-Dimensional Covariance Matrices
47 pages
EE364a Homework 6 Solutions: I 1,..., K I I I
No ratings yet
EE364a Homework 6 Solutions: I 1,..., K I I I
20 pages
L6 - Kalman Filter
No ratings yet
L6 - Kalman Filter
15 pages
Nonlinear Estimation For Linear Inverse Problems With Error in The Operator
No ratings yet
Nonlinear Estimation For Linear Inverse Problems With Error in The Operator
27 pages
The Functional Central Limit Theorem and Testing For Time Varying Parameters
No ratings yet
The Functional Central Limit Theorem and Testing For Time Varying Parameters
38 pages
Extropy Estimation of Weibull Distribution Under Upper Records
No ratings yet
Extropy Estimation of Weibull Distribution Under Upper Records
7 pages
1981 Estimating the Dimension of a Linear-model_j. Andel, m. g. Perez and a. i. Negrao
No ratings yet
1981 Estimating the Dimension of a Linear-model_j. Andel, m. g. Perez and a. i. Negrao
12 pages
Expectation Maximization: Dekang Lin Department of Computing Science University of Alberta
No ratings yet
Expectation Maximization: Dekang Lin Department of Computing Science University of Alberta
22 pages
SolutionsManual MCstyle 2018
No ratings yet
SolutionsManual MCstyle 2018
40 pages
0 Inference for Diffusion Processes
No ratings yet
0 Inference for Diffusion Processes
20 pages
UDCovariance Factorizationfor Unscented Kalman Filterusing Sequential Measurements Update
No ratings yet
UDCovariance Factorizationfor Unscented Kalman Filterusing Sequential Measurements Update
9 pages
Variational Problems in Machine Learning and Their Solution With Finite Elements
No ratings yet
Variational Problems in Machine Learning and Their Solution With Finite Elements
11 pages
03 Spring Final Soln
No ratings yet
03 Spring Final Soln
3 pages
Industrial Mathematics Institute: Research Report
No ratings yet
Industrial Mathematics Institute: Research Report
25 pages
State-Space Models. ML Estimation. DSGE Models. Examples of State-Space Models (Cont.)
No ratings yet
State-Space Models. ML Estimation. DSGE Models. Examples of State-Space Models (Cont.)
9 pages
CampusX DSMP 2.0 Syllabus
No ratings yet
CampusX DSMP 2.0 Syllabus
66 pages
Dynamic Factor Models M Watson
No ratings yet
Dynamic Factor Models M Watson
43 pages
Arfima
No ratings yet
Arfima
20 pages
Solutions For Practice Set
No ratings yet
Solutions For Practice Set
7 pages
UT Dallas Syllabus For Ee6343.001.07f Taught by Naofal Al-Dhahir (Nxa028000)
No ratings yet
UT Dallas Syllabus For Ee6343.001.07f Taught by Naofal Al-Dhahir (Nxa028000)
6 pages
Optimal Prediction With Memory: Alexandre J. Chorin, Ole H. Hald, Raz Kupferman
No ratings yet
Optimal Prediction With Memory: Alexandre J. Chorin, Ole H. Hald, Raz Kupferman
19 pages
Observers and Kalman Filters: CS 393R: Autonomous Robots
No ratings yet
Observers and Kalman Filters: CS 393R: Autonomous Robots
37 pages
Dynamical Equations For Optimal Nonlinear Filtering: DX F (X, T) DT + W2 (X, T) DZ
No ratings yet
Dynamical Equations For Optimal Nonlinear Filtering: DX F (X, T) DT + W2 (X, T) DZ
12 pages
Simple Linear Regression, Cont.: BIOST 515 January 13, 2004
No ratings yet
Simple Linear Regression, Cont.: BIOST 515 January 13, 2004
23 pages
Solution 3 Problem 1: Let X
No ratings yet
Solution 3 Problem 1: Let X
12 pages
ERDEM, Tulin. An Empirical Analysis of Umbrella Branding
No ratings yet
ERDEM, Tulin. An Empirical Analysis of Umbrella Branding
14 pages
CIN4111200512 Actuarial Statistics III (Stochastic Modelling)
No ratings yet
CIN4111200512 Actuarial Statistics III (Stochastic Modelling)
3 pages
Unit 3 - Estimation And Prediction: θ 1 2 n 1 2 n 1 1 2 2 n n
No ratings yet
Unit 3 - Estimation And Prediction: θ 1 2 n 1 2 n 1 1 2 2 n n
14 pages
Applied Math
No ratings yet
Applied Math
23 pages
Automatica: Guangchen Wang Hua Xiao Guojing Xing
No ratings yet
Automatica: Guangchen Wang Hua Xiao Guojing Xing
6 pages
QRCM
No ratings yet
QRCM
7 pages
R300 Solution Guide 2018M
No ratings yet
R300 Solution Guide 2018M
8 pages
Presentation 2
No ratings yet
Presentation 2
39 pages
NUS ST2334 Lecture Notes
No ratings yet
NUS ST2334 Lecture Notes
56 pages
(Adaptive Computation and Machine Learning) Daphne Koller - Nir Friedman - Probabilistic Graphical Models - Principles and PDF
No ratings yet
(Adaptive Computation and Machine Learning) Daphne Koller - Nir Friedman - Probabilistic Graphical Models - Principles and PDF
1,270 pages
Practical Guide To Logistic Regression - Joseph M. Hilbe (2017)
100% (1)
Practical Guide To Logistic Regression - Joseph M. Hilbe (2017)
170 pages
Deep Reinforcement Learning PDF
No ratings yet
Deep Reinforcement Learning PDF
150 pages
Duda Solutions PDF
No ratings yet
Duda Solutions PDF
77 pages
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet
Understanding Vector Calculus: Practical Development and Solved Problems
From Everand
Understanding Vector Calculus: Practical Development and Solved Problems
Jerrold Franklin
No ratings yet
Direct Linear Transformation: Practical Applications and Techniques in Computer Vision
From Everand
Direct Linear Transformation: Practical Applications and Techniques in Computer Vision
Fouad Sabry
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)

Stein 2011 DiffFilter

Uploaded by

Stein 2011 DiffFilter

Uploaded by

Preprint ANL/MCS-P1888-0511

DIFFERENCE FILTER PRECONDITIONING FOR LARGE

, AND MIHAI ANITESCU

Department of Statistics, University of Chicago, Chicago, IL 60637. Email:

f() exp(ix) d, f() =

k(x) exp(ix) dx.

has variance of approximately V

has variance at least

is a valid covariance function)

) has a bounded condition number, then so does

is used, and one can compute the

) according to the denition of B

) is bounded away from

) has a bounded condition number.

(0) = 0. When = 0 and = 4,

(a). First, note that when [, ]

is the modied Bessel function of the second

= [7, 10], and the observation locations x were on a two-dimensional

You might also like