Spectrum Estimation Techniques For Characterization and Development of WT4 Waveguide (1977) (David J. Thomson)
I. INTRODUCTION
The problem of estimating the spectrum of a stationary time series
has appeared frequently in the scientific literature and myriad ap-
proaches have been suggested. Nonetheless it became apparent during
the course of the development of the WT4 waveguide system that these
methods were inadequate for many of the data sets of interest. The
techniques presented here were therefore developed.
It is commonly stated that the method selected to estimate a spectrum
depends on the ultimate use of the estimate, and unfortunately to some
extent this is true. The method described below is felt to represent an
advance in that the basic technique works well in a variety of cases which
previously would have required individual treatment. The loss calcu-
lations reported in Anderson et al. 1 are indicative of its accuracy.
The procedure which has evolved for estimating spectra can best be
described as robust adaptive prewhitening. Such methods have three
distinct stages: formation of a pilot spectrum estimate, using this esti-
mate to design a prewhitening filter, and finally giving the result as the
ratio of the spectrum of the filtered data to the power transfer function
of the filter. This method is potentially both efficient and robust. The
efficiency of a statistical estimation procedure is the fraction of the information, in the sense of Fisher,² conveyed by the estimate about the parameter being estimated to the total information on this parameter
inherent in the data. An estimation procedure is robust if it remains
efficient over a wide range of conditions and is relatively immune to a
small fraction of outlying or erroneous data.
For the sequential method described here to be efficient, the pilot
estimate must be designed to have a large dynamic range at the expense
of frequency resolution. The second spectrum estimate, which works on
the filtered data, uses the opposite choice and so is chosen on the basis
of frequency resolution. This can be done without incurring a large
penalty in loss of effective dynamic range as this information, acquired
by the pilot estimate, has been transferred to the filter specification. In
one meaning of the term this procedure is robust in that it can normally
handle situations where either estimate alone would fail. By using a
nonlinear filter for the prewhitening operation the procedure may also
sample eigenvalues. By the Szegő theorem (see Grenander and Szegő⁷) this comparison is asymptotically equivalent to comparisons on the spectrum at a frequency spacing of 1/T. This agrees with the conventional Rayleigh resolution, and heuristically a spectrum estimate with this resolution and low bias is likely to be efficient. This argument provides the motivation for the present technique. Simple data windows
with frequency resolution close to 1/T do not provide enough bias pro-
tection. Moreover this is not just a result of not having chosen the right
"simple" data window but the result of fundamental characteristics of
the Fourier transform (see Landau and Pollak⁸). Data windows like the 4π prolate spheroidal wave function which provide the protection from bias have frequency resolution on the order of 4/T and so are inefficient
from this viewpoint. It must be emphasized that the sequential approach
used here potentially has both limitations since it cannot resolve details
spaced by 1/T in frequency when their levels are more than 4 or 5 decades
apart. On the other hand if the spectrum is not quite so pathological and
varies "slowly" over 10 to 15 decades then the method can provide fre-
quency resolutions approaching 1/T with relatively low bias.
For the remainder of this paper we assume that the available data is a sequence of samples {x_t}, t = 0, 1, …, and that the sampling interval
Large data sets can be tested for stationarity using the method described in Thomson.¹² Briefly, the approach compares the different subset estimates, Ŝ_j(ω), using Bartlett's M statistic for heteroscedasticity.
sequence using the previous p data points as a base for the prediction.
When the autoregressive model is correct the residual sequence will be
serially uncorrelated and have a white spectrum. When the data contains
outliers the effect of such filtering is to contaminate the p residuals
following each erroneous point.
The robust filter algorithm is a nonlinear procedure based on an au-
toregressive model which is designed to reduce the effects of occasional
outliers. The output of this filter or the modified data sequence is an
estimate of the uncontaminated process. This sequence is formed by
comparing successive input data points with the value predicted from
the modified sequence. In regions where the prediction errors are "small"
relative to the innovations variance, the modified sequence is essentially
a copy of the input data. When the prediction errors are "large," the
corresponding points of the modified data sequence are the predictions
rather than the data and for intermediate prediction errors the behavior
depends on a weight function. When the modified data sequence is used
as a basis for the final estimate of spectra, the prediction error sequence
is the difference between the predictions and the values of the modified sequence. A compromise must be drawn between rejecting some valid data and accepting occasional errors and, in the robust filter algorithm, this compromise is reflected in the choice of weight function. In Section VII a weight function
motivated by the normal extreme value distribution which has both
intuitive appeal and desirable mathematical properties is described.
2.7 Smoothing
In this definition the data, x, is defined on the domain [0, T], ω is radian frequency, and D is a data window or taper. The data window is normalized according to the convention

∫₀ᵀ D²(t) dt = 1   (2)
so that the resulting spectrum is interpretable in physical units.
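In discrete form the convention of eq. (2) amounts to scaling so that Σ_t D²(t) Δt = 1. A minimal sketch; the spliced-cosine taper used here is only an illustration, not a prescribed choice:

```python
import numpy as np

def normalize_window(D, dt=1.0):
    """Scale a data window so that sum(D**2) * dt = 1, the discrete
    analogue of the normalization convention of eq. (2)."""
    return D / np.sqrt(np.sum(D**2) * dt)

# illustrative 10 percent spliced-cosine taper on T = 1000 samples
T = 1000
D = np.ones(T)
edge = T // 10
ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(edge) / edge))
D[:edge] *= ramp
D[-edge:] *= ramp[::-1]
D = normalize_window(D)
```

With this scaling the resulting spectrum estimate keeps the physical units of the data squared per unit frequency.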
Almost all of the published estimates of spectra are either direct es-
timates, smoothed direct estimates, or rational fits to direct estimates.
When D is constant Ŝ_D is the periodogram. Smoothing the extended periodogram* with appropriate weights corresponds to the various indirect estimates. Similarly an autoregressive or "maximum entropy" estimate may be regarded as an all-pole rational fit to the extended periodogram and Pisarenko²⁶ estimates constitute a generalization of
* In the simple periodogram estimates are computed at a frequency spacing of 1/T.

E{Ŝ_D(ω)} = ∫₀ᵀ ∫₀ᵀ e^{iω(t−u)} D(t) D(u) E{x(t)x(u)} dt du   (3)
† The Capon²⁷ estimate, while superficially similar, is intended for estimating the magnitude of periodic components in a background of known covariance structure.
R̂_u(τ) = (1/(T − |τ|)) ∫₀^{T−|τ|} x(t) x(t + |τ|) dt   (9)

are often used incorrectly to describe the direct estimate, Ŝ_D(ω). Except
for their first moments these two estimates have few properties in
common: one very important difference is that the direct estimate is
positive while the "equivalent" indirect form need not be. Also, because
their common spectral window enters the estimate in fundamentally
different ways, the variances of the two estimates are different.
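The positivity difference is easy to demonstrate numerically. A sketch; the window choice and truncation point are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)

# direct estimate: squared modulus of the Fourier transform of tapered
# data -- nonnegative by construction
D = np.hanning(len(x))
D = D / np.sqrt(np.sum(D**2))
S_direct = np.abs(np.fft.rfft(D * x))**2

# "equivalent" indirect estimate: cosine transform of a truncated sample
# autocovariance -- nothing forces this to stay nonnegative
acov = np.correlate(x, x, mode="full") / len(x)
mid = len(x) - 1
maxlag = 64                            # illustrative truncation point
lags = np.arange(-maxlag, maxlag + 1)
r = acov[mid + lags]                   # rectangular lag window
freqs = np.linspace(0.0, np.pi, 200)
S_indirect = np.array([np.sum(r * np.cos(f * lags)) for f in freqs])
```

For many realizations the truncated-lag form dips below zero at some frequencies, while the direct form cannot.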
B(ω) = | ∫ x̃(ω − ξ) D̃(ξ) dξ |²   (11)

where x̃(ω) is the spectral representation of x (see Doob,²⁸ chapter 10).
By the Cauchy inequality this bias may be bounded so that

B(ω) ≤ ∫ |x̃(ω − ξ)|² dξ ∫ |D̃(ω − ξ)|² dξ   (12)

The first factor of this inequality depends only on the process and, as the integrand is positive, is simply bounded by adding the integral from ω − Ω to ω + Ω and identifying the result using Parseval's theorem. The second factor in the inequality depends only on the data window, D, and

where σ² is the sample variance, c = ΩT/2, and λ₀(c) is the largest eigenvalue of the integral equation with kernel sin c(t − s)/(π(t − s)).
In Thomson et al. empirical studies show that direct estimates using this window are generally superior to several other spectral estimates in common use. Other examples are contained in Thomson.⁵ Windows using approximations to prolate spheroidal wave functions have been described by Kaiser³⁵ and Eberhard,³⁶ and in fact the Parzen window³⁷ can be considered as a fourth-order successive approximation to the 4π prolate window.
Figure 1 shows the 4π prolate data window (and several other windows described below); the low weighting given near the ends of the data is evident. The corresponding spectral windows are shown in Fig. 2, and here the reason for using the 4π prolate taper in situations where the spectrum varies over large ranges is most evident. The frequency scale of this plot has been normalized to units of 1/T so that by a frequency of 4/T the spectral window corresponding to the 4π taper has decayed by more than 10 decades. It should be noted that the curves for the other windows represent envelopes of the spectral windows. The actual spectral windows are similar to that shown for the compound prolate window and decay in an oscillatory manner.
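Tapers of this family are available in standard libraries. A sketch using scipy's discrete prolate spheroidal (Slepian) sequences, assuming the correspondence c = πNW, so that NW = 4 plays the role of the 4π taper:

```python
import numpy as np
from scipy.signal.windows import dpss

# zeroth-order Slepian sequence; assuming c = pi * NW, NW = 4 corresponds
# to the 4*pi prolate taper and NW = 1 to the pi taper
T = 512
D = dpss(T, NW=4)
D = D / np.sqrt(np.sum(D**2))        # normalization convention of eq. (2)

# spectral window on a fine grid; frequency 4/T falls at bin 4 * pad
pad = 16
H = np.abs(np.fft.fft(D, pad * T))**2
H = H / H.max()
# beyond the main lobe the window is many decades down
sidelobe_level = H[5 * pad : pad * T // 2].max()
```

Plotting 10·log10(H) against frequency in units of 1/T reproduces the qualitative behavior described above: a main lobe of width about 4/T and extremely low sidelobes beyond it.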
When the range of the spectrum is known to be small it is clear that a narrower main lobe is preferable. The spliced cosine taper's spectral window, Fig. 2, has a narrower center lobe than the 4π window and its sidelobes decay much faster than those of the default window,

(sin(ωT/2)/(ωT/2))².
Unfortunately the first few sidelobes of this window are too high for it to be usable in many applications where accurate estimates of the fine structure are required. The compound window is less extreme than the 4π taper but distinctly different from the spliced cosine form. From the plot of the spectral windows it can be seen that the main lobe of the compound window is almost as narrow as that of the spliced cosine and also that the first sidelobes are down 27 dB instead of 13 dB. Since the widths of the main lobes are all very close to the same
width, this gain in performance is essentially free and results from the
superior characteristics of the prolate functions. It might also be mentioned that the usual objection to the use of the prolate spheroidal wave functions, namely that they are "impossible" to compute, is false; by using Horner's rule together with the expansion given in Flammer,³⁸ Section 3.2, they may be computed very rapidly. Appendix A gives expansion formulae for the π and 4π prolate data windows.
In anticipation of Section VI it is also interesting to compare the bias
of the estimates of autocorrelation obtained by transforming the various
spectrum estimates. From eq. (7) it is apparent that such estimates of
the autocorrelation function at lag τ will be biased by the factor L_D(τ).
These lag windows are plotted in Fig. 3. From this figure it can be seen
that the bias imposed on the low-order autocorrelations by the win-
dowing techniques is much less than that resulting from the common
positive definite estimate [obtained by replacing the factor T — \t\ in
eq. (9) with T] corresponding to the simple extended periodogram. It
should be noted that if this factor is divided out the resulting unbiased
estimate is not positive definite and frequently results in negative
"prediction variances." For fitting autoregressive models, the low-order autocorrelations are crucial and, as can be seen from the inset in Fig. 3, for τ/T < 0.01 the bias obtained using the 4π prolate window is lower
than that obtained from the extended periodogram on data sets 10 times
as long. The scale of such comparisons can be best judged by noting that
the one-step autocorrelation in the field evaluation test curvature data
is about 0.99983.
† See Bogert, Healy, and Tukey³⁹ for definitions of these terms.

Fig. 3 — Equivalent lag windows for the various data windows (normalized lag, τ/T).
p(s₁, s₂) = (1/(1 − Λ)) exp(−(s₁ + s₂)/(1 − Λ)) I₀(2√(Λ s₁ s₂)/(1 − Λ))   (16)

where both s₁ and s₂ have been standardized to unit level, I₀ is the usual modified Bessel function, and Λ is the correlation between s₁ and s₂ given below by eq. (18).
The characteristics of this distribution are most easily seen by considering the conditional distribution p(s₁|s₂). For this distribution a critical point is given by s₂ = (1 − Λ)/Λ; at this point ∂p(s₁|s₂)/∂s₁ |_{s₁=0} = 0. For lower values of s₂ the distribution resembles the univariate distribution and has its maximum at 0, while for larger values of s₂ the mode approaches s₂.
Figure 4 shows plots of the conditional distribution for s₂ = 0.5 and 1 and for values of Λ appropriate for the 4π prolate window at the frequency offsets tabulated below.

Fig. 4 — Conditional distributions p(s₁|s₂) for s₂ = 0.5 and s₂ = 1, for the 4π prolate window at frequency offsets:

Curve   Offset    Λ
1       0.25/T    0.9775
2       0.50/T    0.9077
3       0.75/T    0.8020
4       1.00/T    0.6740
5       2.00/T    0.2012
tended periodogram) since, for the 4π window, the bias is only localized within a band of ±4/T. In the more general case where the raw spectrum
estimates are correlated, smoothing over a fixed bandwidth is less ef-
fective and conventional smoothing techniques will be characterized by
fewer "equivalent degrees of freedom" than given by the usual estimate.
Most of the work on smoothing assumes that the true spectrum does not
vary appreciably over the width of the smoother and under this ap-
proximation the influence of smoothers on direct estimates is fairly
simple to evaluate.
To assess the effects of smoothing correlated spectrum estimates it
In this equation the first term is large only in the neighborhood of the frequency origin.

(20)

The mean corresponds to the convolution |D̃|² * W̃ but the variance does not correspond to the usual interpretation of a spectral window in the literature on indirect estimates. Using the above definitions the influence of a smoothing operation, or lifter, may be described in the quefrency domain as a linear filter, so that the antespectrum, Σ_{D,W}, or spectrum of the smoothed spectrum estimate, is the product of the antespectrum, Σ_D, and the power transfer function of the lifter. The variance of the smoothed spectrum estimate is the integral, over quefrency, of its antespectrum
so that the estimate Ŝ_{D,W} will have an approximately χ² distribution with

ν = 2 [ ∫ |W̃(Q)|² Σ_D(Q) dQ ]⁻¹   (21)

equivalent degrees of freedom. For direct estimates the antespectrum
Fig. 6 — Antespectra corresponding to various data windows (the antespectrum is the spectrum of the spectrum estimate); normalized quefrency, Q.
(d/dt)[(1 − t²) y′] + λy = 0   (22)

[Figure: equivalent degrees of freedom versus normalized half-width of the smoother in units of 1/T.]

(1/2π) ∫ …   (23)
The first term of this expression is large near the frequency origin but
elsewhere the second term dominates. Since D is an entire function it
is clear that the covariance between the two estimates is governed pri-
characteristics will result in the subsets being correlated for large values
of the offset b. The effect of this correlation is that averaging the different
subset estimates does not give the usual reduction of variance so that
the autoregressive model is unstable when only a few subsets are avail-
able. When the correlation between subsets is low the distribution of the average of k subsets is nearly χ²₂ₖ.
When the spectrum is locally smooth estimates of this type depend,
in addition to the data window, on the two parameters T and b. The
length of the individual subsets depends primarily on the fine structure
of the process and will be discussed in Section V. The relative spacing
of subsets, however, depends largely on the choice of the data window
and in general there is an optimum spacing. Under the usual approxi-
mation that the true spectrum is locally constant or linear and that we
are interested in frequencies away from the origin, eq. (23) simplifies,
and the correlation between subsets becomes the square of the equivalent lag window, L_D(b).
As a measure of effectiveness of this procedure, assume that sufficient data is available to compute k subsets. Standardizing the local spectral level to 1 and denoting the variance of the averaged estimate by V_k(b),

M_k(b) = (1/b) [ V_{k+1}(b) − V_k(b) ]   (25)

= (b/T) [ 1 + 2 Σ_{s=1}^{[T/b]} L_D²(sb) ]   (26)
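The effect of subset correlation on the averaged estimate can be sketched numerically. This is a hedged illustration of the general variance formula for a mean of correlated, unit-level estimates, not the paper's exact computation; the geometric correlation sequence is an assumption:

```python
import numpy as np

def averaged_variance(k, rho):
    """Variance of the mean of k unit-variance subset estimates whose
    correlation at subset offset s is rho[s], with rho[0] = 1."""
    rho = np.asarray(rho, dtype=float)
    s = np.arange(1, k)
    return (1.0 + 2.0 * np.sum((1.0 - s / k) * rho[1:k])) / k

# uncorrelated subsets recover the classical 1/k reduction;
# correlation between closely spaced subsets inflates the variance
rho0 = np.zeros(10); rho0[0] = 1.0
rho1 = 0.5 ** np.arange(10)          # illustrative geometric correlation
v_uncorr = averaged_variance(10, rho0)
v_corr = averaged_variance(10, rho1)
```

Taking rho[s] = L_D²(sb) for a given window and spacing b reproduces the tradeoff discussed above: widely spaced subsets decorrelate but waste data, closely spaced subsets overlap in information.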
(i) Data points in serious error are tagged, either on the basis of vi-
[Figure legend: 4π prolate; c = π compound; c = π prolate; 10% cosine taper; default.]
† Records of long-range mouse data were coded to provide indications of hardware malfunction.
readily calculable and is shown by the dashed line. As a check that this
removal is not distorting the spectrum of the actual distortions in the
tube the ratio, an F statistic, of the average of 10 estimates of the spec-
trum of the detrended vertical curvature to the spectrum of the hori-
Fig. 9 — Example of a B-spline fit to waveguide elevation. The spline is order 4 with knots of multiplicity 4 at −0.1 and 90 meters, and simple knots at 23.5 and 56.5 meters.
mentioned above and use every other subset in the stationarity test.
A further advantage of the use of subsets is that a significant im-
provement in the accuracy of the pilot estimate can often be obtained
by combining the different subset estimates in a robust manner instead
of by the usual arithmetic average. Denoting the ordered subset estimates by Ŝ_j(ω), with Ŝ₁(ω) ≤ Ŝ₂(ω) ≤ … ≤ Ŝ_k(ω), a robust estimate S̃(ω) may be formed as

S̃(ω) = Σ_{j=1}^{k} e_j Ŝ_j(ω)   (27)
Fig. 10 — Example of a spline mean value function (axial distance in meters). Data represent vertical curvature output from a rotating-head mouse.
e_j = 1/(k + 1 − j),   j ≤ k′   (28)

e_j = e_{k′} = 1/(k + 1 − k′),   j > k′
(transverse) filter generating the observed process from white noise. The convolution of the moving average with itself gives the autocovariance function of the process.

Fig. 11 — Canonical moving average representation of vertical curvature gauge output (lag in meters). Representation based on the average of 102 450-meter subsets.

† The discontinuities visible in this plot near 9 and 18 meters are due to the couplings, but due to the randomized guide lengths this effect is rapidly suppressed with increasing separation.
described in Section VII. The use of the prediction error filter is therefore
conditioned on one's ability to estimate the parameters of an autoreg-
ressive model of the process and this problem is the subject of the present
section. Current work on autoregressive modeling uses two distinct approaches: direct solution of the Yule-Walker equations, as described by Pagano,⁵⁷ Ulrych and Bishop,⁵⁸ and Makhoul,⁵⁹ and spectral factorization, as described in Bhansali.⁶⁰,⁶¹ Following a brief review of these two approaches
φ₁ = a₁⁽¹⁾ = ρ₁   (32)

σ₁² = 1 − φ₁²   (33)

σ²_{k+1} = σ²_k (1 − φ²_{k+1})   (36)

In these equations the a_j⁽ᵏ⁾'s are the autoregressive or prediction coefficients for k-step prediction, the sequence {φ_k} is known as the partial autocorrelation function, and σ²_k is the k-step relative prediction error. In the original Levinson algorithm the expansion

σ²_{k+1} = 1 − Σ_{j=1}^{k+1} a_j⁽ᵏ⁺¹⁾ ρ_j   (37)

obtained by substituting eq. (31) into (30) was used in place of eq. (36). Analytically these equations are identical, but the latter is both slower and has much poorer numerical properties than Durbin's form.
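Durbin's form of the recursion can be sketched as follows; a minimal implementation assuming normalized autocorrelations with ρ₀ = 1:

```python
import numpy as np

def levinson_durbin(rho, p):
    """Durbin's recursion: returns the AR (prediction) coefficients
    a_1..a_p and the relative prediction error sigma^2, given normalized
    autocorrelations rho[0..p] with rho[0] = 1."""
    rho = np.asarray(rho, dtype=float)
    a = np.zeros(p + 1)
    sigma2 = 1.0
    for k in range(1, p + 1):
        # reflection coefficient (partial autocorrelation) phi_k
        phi = (rho[k] - np.dot(a[1:k], rho[k - 1:0:-1])) / sigma2
        a[1:k] = a[1:k] - phi * a[k - 1:0:-1]
        a[k] = phi
        sigma2 *= 1.0 - phi * phi     # Durbin's update, cf. eq. (36)
    return a[1:], sigma2

# AR(1) check: a process with rho_k = 0.9**k has a_1 = 0.9 and
# higher partial autocorrelations equal to zero
a, s2 = levinson_durbin(0.9 ** np.arange(4), 3)
```

Because the update multiplies by (1 − φ²_k) at each step, the prediction error is nonincreasing and the recursion stays numerically well behaved, in contrast with the subtraction in eq. (37).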
a₀⁽ᵖ⁾ = −1   (38)

z_t = −(x̂_t − x_t)   (39)
σ² = (1/2π) ∫_{−π}^{π} S_x(ω) exp(−2 Σ_{k=1}^{∞} c_k cos ωk) dω   (43)

that is

c_k = (1/π) ∫₀^{π} cos ωk ln S_x(ω) dω   (44)
It is important to notice that the series in eq. (42) does not include a c₀ term because the constraint imposed by eq. (38) implies that c₀ defines

σ² = exp [ (1/2π) ∫_{−π}^{π} ln S_x(ω) dω ]   (45)
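Eq. (45) can be checked numerically; a sketch assuming a uniform frequency grid over (−π, π]:

```python
import numpy as np

def innovations_variance(S):
    """Kolmogorov's formula, eq. (45): the innovations variance is the
    exponential of the mean log spectrum.  S holds samples of S_x(omega)
    on a uniform grid covering (-pi, pi]."""
    return float(np.exp(np.mean(np.log(S))))

# white noise: a flat spectrum at level 2.5 has innovations variance 2.5
v_white = innovations_variance(np.full(1024, 2.5))

# AR(1), coefficient 0.9, unit innovations: S(w) = 1 / |1 - 0.9 e^{-iw}|^2
w = -np.pi + (np.arange(4096) + 0.5) * (2.0 * np.pi / 4096)
v_ar1 = innovations_variance(1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * w))**2)
```

The AR(1) case recovers the unit innovations variance because the log of |1 − 0.9e^{−iω}|² integrates to zero when the root lies inside the unit circle.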
x̃(ρ e^{iω})   (46)

inside its radius of convergence, i.e., for ρ < 1, rather than on the radius of convergence as does the Wiener approach. With this modification one obtains
timate is described by Davis and Jones,⁷⁷ except that their bias correction

Periodogram estimates are distributed as χ²₂ so that E{Ŝ(ω)} = S(ω). While
this result is based on the Wiener spectral factorization method and so
applies to prediction using the entire past, it appears to give a good in-
dication of the behavior of autoregressive fits even for relatively compact
predictors. Further information on the effects of smoothing on the es-
timated innovations variance is available in Jones. 78
1 − φ²_{k+1}

Similarly the h-step prediction variance, σ²_h, may be obtained from σ²_{h+1}.

P(p) = σ̂²_p (1 + p/T) / (1 − p/T)   (50)

attains its minimum. Within reasonable bounds the actual order selected
6.4 Alternatives
[Flow chart of the estimation procedure: autoregressive model; test residuals for normality; distribution; residual spectrum of differential phase; nonlinear prewhitening; corrections for smoothing, gauge transfer function, and prewhitening; estimate.]
σ̂²(p) = Σ_{n=1}^{p} ( x_n − Σ_{k=1}^{p} a_k⁽ᵖ⁾ x_{n+k} )² + Σ_{n=p+1}^{T} ( x_n − Σ_{k=1}^{p} a_k⁽ᵖ⁾ x_{n−k} )²   (51)
y_k = x_k + v_k   (52)

x̂_n = Σ_{k=1}^{p} a_k⁽ᵖ⁾ x_{n−k}   (53)

the square root of the prediction variance estimate, σ̂_p, with the bias correction given in Davis and Jones.⁷⁷)

w_n = W( (y_n − x̂_n)/σ̂_p )   (54)

In the applications described here W is an even function with W(0) = 1.
The effect of this procedure is to leave the data unmodified where the
prediction errors are small and to replace the data with its prediction
at points where the prediction errors are gross. The action taken when
the prediction errors are near the expected extreme for the given sample
size depends on the weight function which will be discussed below. In
spectrum estimation applications the desired output is usually not the
filtered sequence but rather the prewhitening residuals
z_n = x_n − x̂_n   (56)

Nonlinear operations of this kind generally change the spectrum in complex ways. Because of this the weight or influence function must be chosen in such a way that the spectral content due to the induced nonlinearities is much less than that due to the presence of errors in the data.
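The filter itself can be sketched compactly. This is a minimal illustration: the hard-limiting weight stands in for the extreme-value weight of Section VII, and the 4σ threshold and AR(1) model are illustrative assumptions:

```python
import numpy as np

def robust_filter(x, a, sigma, W):
    """Sketch of the robust (nonlinear) filter: the modified sequence y
    copies x where prediction errors are small and falls back to the AR
    prediction where they are gross.  a = AR coefficients (a_1..a_p),
    sigma = innovations scale, W = weight on |error| / sigma."""
    p = len(a)
    y = np.asarray(x, dtype=float).copy()
    z = np.zeros_like(y)                      # prediction error sequence
    for n in range(p, len(y)):
        pred = np.dot(a, y[n - p:n][::-1])    # predict from modified data
        z[n] = x[n] - pred
        w = W(abs(z[n]) / sigma)
        y[n] = w * x[n] + (1.0 - w) * pred    # intermediate errors: blend
    return y, z

# illustrative hard-limiting weight: keep within 4 sigma, else reject
W = lambda u: 1.0 if u < 4.0 else 0.0

rng = np.random.default_rng(1)
x = np.zeros(500)
for n in range(1, 500):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()
x[250] += 100.0                               # inject a gross outlier
y, z = robust_filter(x, np.array([0.9]), 1.0, W)
```

Because the prediction is formed from the modified sequence rather than the raw data, a single gross error does not contaminate the p residuals that follow it.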
Several different weights have been used. Of these the best found to date is motivated by the extreme value distribution for distributions of exponential type (see Kendall and Stuart⁹¹) and is defined by
in which
u = ^(l-^j (59)
In its most general form, use of the robust filter algorithm is alternated
with the model formation process as shown in Fig. 12. In this iterative
mode the output from the filter is used to generate a better autoregressive
model which is used to filter the data and so on. This kind of iterative
procedure has been used for some difficult data sets and was found to
converge to a stable estimate of spectrum very rapidly. Typically two
or three iterations are required on short series (for example, some dis-
tortions in individual tubes) where the range of the spectrum is very large
and the outliers are small relative to the scale of the process but large
compared to the scale of the innovations. With very large data sets, such as those from complete mode filter sections of the field trial (which average 80,000 data points), a single iteration has been used and found
satisfactory.
If one assumes that this iterative process has converged, it is possible
to describe the distortions introduced into the spectral density estimate.
At convergence the autoregressive parameters, â_k, describe the estimated process, {x̂_t}, and are solutions of the Yule-Walker equations based on

Σ_{j=1}^{p} R̂(k − j) â_j = R̂(k),   k = 1, …, p   (60)
Fig. 14 — WT4 field evaluation test horizontal curvature gauge output. Residuals from a linear prediction error filter (normal quantiles).

Fig. 15 — WT4 field evaluation test horizontal curvature gauge output. Residuals from a linear prediction error filter.
Ŝ(ω) = Ŝ_z(ω) / | Σ_{k=0}^{p} a_k e^{−iωk} |²   (61)

in which Ŝ_z(ω) is a direct estimate of the spectrum of the prewhitened residuals [eq. (56)], and the denominator is the power transfer function of the prediction error filter defined in eq. (39).
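Eq. (61) can be sketched end to end; the AR(1) coefficient, Hanning taper, and FFT length below are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def prewhitened_spectrum(x, a, D, nfft):
    """Sketch of eq. (61): direct estimate of the prewhitening residuals
    divided by the power transfer function of the prediction error filter.
    a = AR coefficients (a_1..a_p), D = data window for the residuals."""
    p = len(a)
    # prewhitening residuals z_n = x_n - sum_k a_k x_{n-k}  [cf. eq. (56)]
    z = np.array([x[n] - np.dot(a, x[n - p:n][::-1])
                  for n in range(p, len(x))])
    D = D / np.sqrt(np.sum(D**2))              # normalization of eq. (2)
    Sz = np.abs(np.fft.rfft(D * z, nfft))**2   # direct estimate, residuals
    w = 2.0 * np.pi * np.arange(nfft // 2 + 1) / nfft
    # |1 - sum_k a_k e^{-iwk}|^2, power transfer of the error filter
    A = 1.0 - sum(a[k] * np.exp(-1j * w * (k + 1)) for k in range(p))
    return Sz / np.abs(A)**2

# illustrative AR(1) series with known coefficient 0.9
rng = np.random.default_rng(2)
x = np.zeros(1200)
for n in range(1, 1200):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()
S = prewhitened_spectrum(x, np.array([0.9]), np.hanning(1199), 2048)
```

Because the residuals are nearly white, the simple taper applied to them needs far less dynamic range than a direct estimate of the raw data would; the division by the transfer function restores the original spectral shape.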
Fig. 16 — WT4 field evaluation trial curvature gauge output. Prediction residuals from the robust filter algorithm (normal quantiles).
U = (1 − t)(1 + t)

Computationally it is advantageous to rewrite the power series using Horner's rule, and for c = 4π the expansion is:

D₀₀(4π, t) = √(…) ((((((((((((((((…) U + 2.6197747176990866×10⁻¹¹) U + 2.9812025862125737×10⁻¹⁰) U + …

and for c = π:

D₀₀(π, t) = √(…) ((((((…) U + 5.3476939016920851×10⁻¹¹) U + 2.2654256220146656×10⁻⁹) U + … ) U + 2.2902051859068017×10⁻¹) …

In the forms given here both functions have been normalized for use with the argument x_t = …; t = 1, 2, …, T.
REFERENCES
1. J. C. Anderson et al., B.S.T.J., to be published.
2. R. A. Fisher, Statistical Methods and Scientific Inference (3d ed), Hafner Press,
1973.
3. M. Arato, "On the Sufficient Statistics for Stationary Gaussian Processes," Theory Probab. Appl., 6 (1961), pp. 199-201.
4. P. Whittle, "Estimation and Information in Stationary Time Series," Arkiv för Matematik, 2 (1953), pp. 423-434.
5. D. J. Thomson, "Spectral Analysis of Short Series," thesis, Polytechnic Institute of
Brooklyn, 1971.
6. M. Loeve, Probability Theory, D. Van Nostrand, 1963.
7. U. Grenander and G. Szego, Toeplitz Forms and Their Applications, Univ. of Cal.
Press, 1958.
8. H. J. Landau and H. O. Pollak, "Prolate Spheroidal Wave Functions, Fourier Analysis
and Uncertainty— II." B.S.T.J., 40, No. 1 (January 1961), pp. 65-84.
9. K. O. Dzhaparidze and A. M. Yaglom, "Asymptotically Efficient Estimation of the Spectrum Parameters of Stationary Stochastic Processes," Proc. Prague Symp. on Asymptotic Statistics, 1, Prague: Charles Univ. Press, 1974.
67. …, Hall, 1974.
N. Levinson, "The Wiener RMS Error Criterion in Filter Design and Prediction," J. Math. Physics, 25 (1947), pp. 261-278. (Reprinted as Appendix B of Wiener.⁶⁸)
68. N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, M.I.T. Press, 1949.
69. J. Durbin, Distribution Theory for Tests Based on the Sample Distribution Function, SIAM, 1973.
70. F. L. Ramsey, "Characterization of the Partial Autocorrelation Function," Ann. Stat.,
90. D. F. Andrews et al., Robust Estimates of Location, Princeton Univ. Press, 1972.
91. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, 1, New York: Hafner, 1963.