Tukey, J. W. (1961) - Discussion, Emphasizing The Connection Between Analysis of Variance and Spectrum Analysis
Technometrics
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/utch20
To cite this article: John W. Tukey (1961) Discussion, Emphasizing the Connection Between Analysis of Variance and
Spectrum Analysis, Technometrics, 3:2, 191-219
VOL. 3, No. 2 TECHNOMETRICS MAY, 1961
tend to make both the two papers and the general subject more understandable
to statisticians, particularly by relating spectrum analysis to statistical
techniques, and to fields of application, more widely familiar to them. Fortunately,
the connection between spectrum analysis and those aspects of the analysis
of variance which emphasize variance components is extremely close.
To make this connection evident, however, we shall have to analyze the
implications and foundations of our procedures and thinking in classical analysis
of variance more deeply than usual. It is fair to say that the spectrum analysis
of a single time series is just a branch of variance component analysis, but
only if one describes its main difference from the classical branches as a re-
quirement for explicit recognition of what is being done and why. In classical
(i.e. single-response analysis-of-variance) variance component analysis, one can
(and most of us do) analyze data quite freely and understandingly with little
thought about what is being done and why it is being done. This is, perhaps
unfortunately, not the case for the time series analysis branch of variance
component analysis.
y_observed(t) = y_fixed(t) + y_random(t)
In this decomposition the “fixed” component is usually thought of as involving
only one, two, or perhaps three values of j, while, both most importantly and
known formulas for the average values of mean squares are, if all population
sizes are infinite:
since these averages will always be twice the variance of the population in the
corresponding cell.
When the underlying situation is at the other extreme, so that only variance
components should be considered, then the labels upon the rows and columns
can wisely be regarded as purely arbitrary. This means that if the same “in-
dividual” were to appear as a row in each of two realizations of the same experi-
ment, the numbers labeling the two rows would be quite unrelated. Such lack
of relationship could be in the nature of the situation, or could have been
enforced by our insistence on a randomization of the row numbers, separately
for each realization, before the data was made available for analysis. But if
the labels are arbitrary, we cannot think of one cell, considered by itself, as
different from another. Similarly, there will be only four kinds of pairs of cells:
identical; in same column but not in same row; in same row but not in same
column; in different rows and columns. And the four corresponding average
square differences would have the following values:
ave (y_ijk − y_ijK)² = 2σ²
ave (y_ijk − y_IjK)² = 2σ² + 2σ²_RC + 2σ²_R
ave (y_ijk − y_iJK)² = 2σ² + 2σ²_RC + 2σ²_C
ave (y_ijk − y_IJK)² = 2σ² + 2σ²_RC + 2σ²_R + 2σ²_C
Knowing either set of four quantities, either the 4 average squared differences,
or σ², σ²_C, σ²_R, and σ²_RC, the other set is very easily calculated.
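The two-way translation between the four average squared differences and the four variance components can be sketched as follows; this is an illustrative rendering of the linear relations above, not code from the original discussion.

```python
# Sketch of the equivalence noted above: the four average squared
# differences and the four variance components determine one another
# through a simple linear map.  Here s2, s2_rc, s2_r, s2_c stand for
# the components written above as sigma^2, sigma^2_RC, etc.

def components_to_differences(s2, s2_rc, s2_r, s2_c):
    d_same_cell = 2 * s2
    d_same_row  = 2 * s2 + 2 * s2_rc + 2 * s2_c   # same row, new column
    d_same_col  = 2 * s2 + 2 * s2_rc + 2 * s2_r   # same column, new row
    d_neither   = 2 * s2 + 2 * s2_rc + 2 * s2_r + 2 * s2_c
    return d_same_cell, d_same_row, d_same_col, d_neither

def differences_to_components(d1, d2, d3, d4):
    # Invert the relations above (d2 = same row, d3 = same column).
    s2    = d1 / 2
    s2_c  = (d4 - d3) / 2
    s2_r  = (d4 - d2) / 2
    s2_rc = (d2 + d3 - d1 - d4) / 2
    return s2, s2_rc, s2_r, s2_c
```

Either set is recovered exactly from the other, which is the arithmetic equivalence the text appeals to.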
Why then do we prefer the first set, since they are arithmetically equivalent?
It must be because of some matter of interpretation. And the interpretation
must involve not the realizations of a single experiment but the comparison
of two or more different experiments. In fact, we feel that, for example, the
sort of change of circumstances which halves or doubles σ²_C while leaving σ²,
σ²_RC, and σ²_R unaffected is easier to understand than the sort which changes
ave (y_ijk − y_iJK)² without affecting its three fellows.
The prime criterion for selecting useful variance components is that we should
be more easily able to understand the changes in the situation which would change
some variance components while leaving others alone.
The formal similarities between the two pairs of mutually related variance-
component schemes, one for the replicated two-way table, and the other for
stationary periodic time series, are very striking, but the actual similarities
go deeper.
What are the simplest changes which we can contemplate making in a situa-
tion involving stationary periodic time functions? They are the results of such
simple linear operations as the result of passing an electrical voltage through
a simple circuit consisting of resistances, condensers, and inductances, or the
result of passing a mechanical motion through a simple linkage of springs,
masses, and dash pots. (Such processes occur, in particular, in almost every
physical or chemical measuring instrument.) Any such linear process will affect
the amplitude and phase of each harmonic in a characteristic way. If its effect
on a pure jth harmonic would be to multiply amplitude by |L_j|, then the
the covariances are taken across the specification, from one realization to another,
WITH AN ENTIRE NEW SAMPLE OF ROWS AND COLUMNS IN
EACH REALIZATION:
cov {y_ijk , y_ijk} = σ² + σ²_RC + σ²_R + σ²_C ,
cov {y_ijk , y_ijK} = σ²_RC + σ²_R + σ²_C ,
cov {y_ijk , y_iJK} = σ²_R ,
cov {y_ijk , y_IjK} = σ²_C .
These covariances across the ensemble are quite analogous to the serial co-
variances in the time series case, which are given by
tifying persons, that i = 3 should refer to a particular person, not to the third
row of some randomly arranged data array.
Yet in a situation where a pure variance component approach is appropriate,
the process of randomly rearranging the rows of the data array generates what
we may think of, without doing too much violence to the situation, as a new
(but clearly not independent) repetition of the experiment. If we fix our eyes
on particular values of i, j, k, I, J, and K, consider all admissible rearrange-
ments of the data array, and then average the simplest quadratic expressions,
we are led to suitable symmetric functions of the original data array which are
natural estimates of the covariances across the ensemble, provided the latter
are given an averaged interpretation.
The usual practice in the spectrum analysis of a single stretch of time series
is entirely analogous to such a procedure. Let us, for example, consider esti-
mating cov (y_1 , y_4). We have the original observations y_1 , y_2 , y_3 , y_4 , y_5 , … .
The results of shifting the time origin, one unit at a time, and always dropping
observations at negative times, are first y_2 , y_3 , y_4 , y_5 , y_6 , … , then y_3 , y_4 ,
y_5 , y_6 , y_7 , … and so on. The pairs (y_1 , y_4), (y_2 , y_5), (y_3 , y_6), … , (y_l , y_{l+3}), …
are “equivalent” (either because stationarity is assumed or because we want
an averaged covariance) and we can calculate a “sample” covariance from these
pairs. Such processes of imitating the sought-for covariance across the ensemble
with a sample “covariance” wandering around the data pattern are inevitable
when only a single realization is available, be it in an analysis-of-variance
situation or a time series situation.
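The “wandering” sample covariance just described can be sketched as follows; this is a hedged illustration rather than anything prescribed in the text, with the lag of 3 imitating cov (y_1 , y_4).

```python
# Pair each observation with the one `lag` steps later and average the
# products of deviations from the two sample means; these pairs are the
# "equivalent" pairs described above.

def sample_lag_covariance(y, lag):
    heads = y[:len(y) - lag]
    tails = y[lag:]
    n = len(heads)
    mean_heads = sum(heads) / n
    mean_tails = sum(tails) / n
    return sum((h - mean_heads) * (t - mean_tails)
               for h, t in zip(heads, tails)) / n
```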
(In the time series situation, if and when we look more deeply into the details
of the situation, we may find that the averages of squares of differences indeed,
as Jowett has suggested [1955, 1957, 1958], have real advantages over co-
variances, insofar as problems associated with trends and very low frequencies
are concerned. But this is for the future to reveal.)
out that probably the most important aspects of cross-spectrum analysis are
cases of (complex-valued, frequency-dependent) regression analysis in which
the analog of a regression coefficient is the ratio of a (complex-valued) cross-
spectrum density to a spectrum density, and is estimated by the corresponding
ratio of estimates of averaged densities. (This fact will not surprise those who
recall that a simple regression coefficient is estimated as the ratio of a sample
covariance to a sample variance, or that a structural regression coefficient is
sometimes estimated as the ratio of a sample covariance component to a sample
variance component.) In studying time series, as in its more classical situations,
regression analysis, whenever there is a suitable regression variable, is a more
sensitive and powerful form of analysis than variance component analysis.
As a consequence, one major reason for learning about spectrum analysis is
as a foundation for learning about cross-spectrum analysis.
The other approaches to data associated, directly or indirectly, with the
analysis of variance and the name of R. A. Fisher also have their analogs in
the analysis of time series. We have already noted, for example, how classical
ω was not near ω₀ , and to nearly eliminate the terms in (ω + ω₀)t + φ if ω is
near ω₀ . The results of smoothing, then, would, if ω is near ω₀ , be close to
[½A·G(ω − ω₀)] cos [(ω − ω₀)t + φ]
and
[½A·G(ω − ω₀)] sin [(ω − ω₀)t + φ]
where G(ω − ω₀) is the magnitude of the transfer function of the smoothing
process (which we have assumed to use symmetrical weights and thus not to
affect phase). In this simple case, a cosinusoidal variation of angular frequency
ω in the original, which may have been quite effectively concealed by larger
contributions at other frequencies, has been demodulated, and appears as a
cosinusoidal variation at the very much reduced angular frequency ω − ω₀ ,
which is likely to be much more evident to the eye. (Complex demodulation,
the calculation and smoothing of two stretches of modulation-products, is neces-
sary if we are to distinguish the results of demodulating cos (ω₀ + δ)t from
the results of demodulating cos (ω₀ − δ)t.)
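Complex demodulation as just described can be sketched as follows; the signal parameters and the uniform smoothing weights are illustrative assumptions, not choices made in the text.

```python
import cmath
import math

# Multiply the series by exp(-i*w0*t) and smooth with symmetric
# (here uniform) weights; twice the modulus of the smoothed product
# recovers roughly the amplitude A, and its angle the phase, of a
# component at (or near) the demodulating frequency w0.

def complex_demodulate(y, w0, span):
    z = [yt * cmath.exp(-1j * w0 * t) for t, yt in enumerate(y)]
    return [sum(z[t - span:t + span + 1]) / (2 * span + 1)
            for t in range(span, len(z) - span)]

A, phi, w0 = 1.5, 0.7, 0.2                        # illustrative choices only
y = [A * math.cos(w0 * t + phi) for t in range(2000)]
demod = complex_demodulate(y, w0, span=100)
mid = demod[len(demod) // 2]
amplitude = 2 * abs(mid)                          # close to A
phase = cmath.phase(mid)                          # close to phi
```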
This technique is the natural extension to the non-periodic case of the ideas
underlying the classical Buys-Ballot table [e.g. Stumpff 1937, pp. 132ff or
Burkhardt 1904, pp. 678-679], the so-called secondary analysis, and Bartels’s
summation dial [Chapman and Bartels 1940, pp. 593-599 or Bartels 1935, pp.
30-31]. It has to be tried out on actual data before its incisiveness and power
is adequately appreciated.
Problems involving the simultaneous behavior of more than two time series
have not been worked on in a wide variety of fields of application, but enough
has been done to point the way and suggest the possibilities. There will be an
increasing number of instances where the corresponding non-time-series problems
would be naturally approached by multiple regression. These can be effectively
approached by multiple cross-spectrum and spectrum techniques which will
be precise analogs of multiple regression in spirit and, if care is taken in choice,
in the algebraic form of their basic equations. The differences which will arise
in the development will stem from:
(1) the fact that regression goes on separately at each frequency (which
produces merely an extensive parallelism of results), and
(2) the fact that regression coefficients will now take complex values rather
than real values (which enables us to learn a little bit more about the
underlying situation).
To my knowledge the multiple-time-series analogs of discriminant functions
and canonical variates have not yet arisen in practice. But there would seem
to be no difficulty in analogizing either or both.
Stationarity
The second application of the general principle is to the assumption of station-
arity, the analog in time series situations to the assumption of constancy of
variance in more classical situations. The assumption of stationarity is one
at which the innocent boggle, sometimes even to the extent of failing to learn
what the data would tell them if asked. Yet I have yet to meet anyone experienced
in the analysis of time series data (Gwilym Jenkins is an outstanding example)
who is over-concerned with stationarity. All of us give some thought to both
possible and likely deviations from stationarity in planning how to collect or
things can, and do happen. The possibility of their occurrence must be carefully
kept in mind. But this fact is not relevant to the point we have just been dis-
cussing.
Surely, if one has both adequate data and scientific or insightful ground to
fear nonstationarity, it will be wise not to average spectra over too long a time.
But the urge to choose the averaging time wisely is strengthened by an under-
standing that all data analyses estimate average spectra.
Wisely-chosen resolution
The third application of the general principle is to the question of the nar-
rowness of the frequency ranges for which we should seek spectrum estimates.
There are infinitely many frequencies. The number of separate frequencies
over which we could seek estimates from a given body of data is limited by
the extent of the data, and grows without limit as longer and longer pieces of
data become available. But it does not follow that we should always, or even
usually, work close to this limit. The analogy with an interaction mean square
effects, where we may need, because of variation from place to place, to esti-
mate the value of the least favorable average response and, perhaps, the fre-
quency with which similarly unfavorable situations will arise in more extended
practice. The situation with time series is exactly similar.
Most of the time we shall be driven to estimation of a spectrum averaged
over repetitions, where the pattern, or the causes, of the changes in spectrum
from repetition to repetition are not understood. This averaging over repeti-
tions, forced on us by alternate ensembles, is superposed upon the averaging
over time within repetition, partially forced upon us by nonstationarity, and
upon the averaging over frequency bands, forced upon us by the limited extent
and amount of our data. What we estimate, then, is an average of averages
of averages. We have come a long way from the idea of a tight specification-
estimation relationship, where everything which is not presupposed should be
estimated. But it is well that we have done so. And no one who has considered
carefully what is estimated by a main effect in a reasonably complex analysis
of variance can maintain that so much averaging is surprising or unusual.
Just as in more conventional areas of statistical application, there are situa-
tions, the comparison of vibration intensity with structural strength being
perhaps the most obvious, where we shall need to estimate not the average
spectrum but some upper limit, perhaps an upper 99% limit, for the spectra
in the various replications, for the spectra of the various alternative ensembles.
But such instances are the exception, not the rule.
value of any of them lies in what the values of the variously defined band-
widths tell us about “resolution”. No one definition, nor even all the defi-
nitions so far given, can tell us all about resolution. As Goodman pointed out
in his verbal discussion, such matters as “rejection slope in db/octave away
from the major lobe” or “db of rejection at a particular frequency” can be
important in particular circumstances. Thus numerical values of bandwidths
according to any definition closely related to “resolution” can help us, but
they will help us most if we regard them as telling us part, not all, of the story.
Choice of resolution
There is one matter upon which I should not like to have my views mis-
understood: the desirability in exploratory work of making spectral analyses
of the same data with different resolutions (usually represented in packaged
systems of calculation of spectrum analysis by the use of varying numbers
of lags in the initial computing step, which is the calculation of sums of lagged
products). Let me be quite clear that, in my judgment and according to my
experience, it definitely is very often desirable in exploratory work, and sometimes
essential, to make analyses of the same data at differing resolutions. Moreover,
it may be equally important to use different window shapes and different pre-
whitenings.
The place where Jenkins and I differ seriously, at least verbally (and I suspect
the difference is more verbal than actual) is in the utility of examining some
sequence of mean lagged products as a firm basis for choosing the number
of such values to be inserted in an appropriate Fourier transformer, and trans-
formed into spectral estimates. Our difference is greater still in connection
with the adequacy of the point of apparent “damping down” of these values
as a basis for choosing this number. It is not that knowledge of the “damping
down” lag is not useful, but rather that, at least in my view, its unthinking
use may be dangerous.
On the one hand, I have known of cases where the useful estimates of power
spectra came from stopping well short of the damping-down point. On the
other hand, if the spectrum were to contain one very large, very broad, very
smooth peak, and a close group of small, narrow peaks, the mean lagged products
would appear to damp down at a lag associated with the width of the large
broad peak, so that a spectrum whose resolution was associated with this
damping-down point would fail to resolve the close group of small peaks. Here,
as in all sorts of data analysis, there is no substitute for careful thought com-
bined with trial of various alternatives.
It is natural to be tempted into calculating more spectrum estimates than
the number of mean lagged products used as their basis. This temptation need
not be a dangerous one, once it is realized that, given the mean lagged products
and the shape of the window, all the possible spectrum estimates lie on a cosine
polynomial of degree equal to the number of lags used. Once the usual number
of spectrum estimates have been calculated, they are enough to determine
this polynomial, and the calculation of further estimates is equivalent to a
process of cosine-polynomial interpolation. This does not mean that calculating
more estimates is useless, or that the results of further calculation will lie close
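The interpolation property just described can be sketched as follows, under illustrative choices of lag window and data: the usual m + 1 estimates determine a cosine polynomial of degree m, and evaluating that polynomial reproduces the estimate at any other frequency.

```python
import math
import random

# Spectrum estimates built from m mean lagged products lie on a cosine
# polynomial of degree m; recovering its coefficients from the usual
# grid of m + 1 estimates (a DCT-I style inversion, with half weights
# at the two grid end points) reproduces every other estimate exactly.

m = 8
random.seed(1)
c = [random.uniform(-1.0, 1.0) for _ in range(m + 1)]   # mean lagged products
w = [0.5 * (1 + math.cos(math.pi * k / m)) for k in range(m + 1)]  # a lag window

def estimate(omega):
    return w[0] * c[0] + 2 * sum(w[k] * c[k] * math.cos(k * omega)
                                 for k in range(1, m + 1))

grid = [estimate(math.pi * j / m) for j in range(m + 1)]  # the usual estimates

def coeff(k):
    s = sum((0.5 if j in (0, m) else 1.0) * grid[j]
            * math.cos(math.pi * j * k / m) for j in range(m + 1))
    return s / m if k in (0, m) else 2 * s / m

a = [coeff(k) for k in range(m + 1)]

def interpolated(omega):
    return sum(a[k] * math.cos(k * omega) for k in range(m + 1))
```

Evaluating `interpolated` at any intermediate frequency agrees with `estimate` to rounding error, which is the cosine-polynomial interpolation described in the text.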
Blurred estimands
In discussing the general principle of parsimony we emphasized the need
to estimate averages over bands of frequencies. This point is so central to spec-
tral analysis as to make its heuristic and intuitive understanding worth con-
siderable effort. Let us begin with classical situations. If one has more degrees
of freedom than variance components, then one can find estimates of some
(and perhaps all) of these variance components whose average values do not
depend upon the other variance components. But once there are more variance
components than degrees of freedom, this need not be the case. Consider a
two-way r-by-c array of observations in which there are r·c + 2 variance com-
ponents, viz. a rows variance component, a columns variance component, and
one variance component for each of the r·c cells. (This is a natural model when
the variance of the cell contributions varies irregularly from cell to cell.) In
this situation there is no estimate of any of the r·c cell variance components
whose average value is free of all the other variance components.
In the time series case there are very many more variance components than
degrees of freedom. For, unless some periodicity assumption holds perfectly
(and I know of not a single instance where it does), a contribution of the form
A cos ωt + B sin ωt
is permissible for any value of ω in some interval. And as all statisticians know
from bitter experience, at least all the things that are permissible will happen.
Thus, in principle, there are infinitely many variance components, one for
each possible ω. And, when the realities of band-limiting and of finite duration
of data are faced, there are only a finite number of observations available, and
Kinds of asymptosis
The purpose of asymptotic theory in statistics is simple: to provide usable
approximations before passage to the limit. Consequently asymptotic results
and asymptotic problems are likely to be of limited utility when the finiteness
of a sample size or of some other quantity is of overwhelming importance.
(Thus, for example, the theorem that maximum likelihood estimates are asymp-
totically normally distributed with a certain variance-covariance matrix is
rarely of any use when there are only 1 or 2 degrees of freedom for error.) It
V: THE MORAL
To analyze time series effectively we must do the same as in any other area
of statistical technique: “Fear the Lord and Shame the Devil” by admitting that:
(1) The complexity of the situation we study is greater than the complexity
of that description of it offered by our estimates.
(2) Balancing of one ill against another in choosing the way data is either
(If these are calculated for k = 0, 1, 2, … , m, some (m + 1)n − m(m + 1)/2,
or roughly m·n, multiplications will be required.) The X_i in this calculation
will be raw, or prewhitened, or otherwise modified observations, from which
means, fitted polynomials, or other fitted trends may or may not have been
subtracted.
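The computing step described in the parenthesis above can be sketched as follows; the data are illustrative only.

```python
# Sums of lagged products S_k = sum over i of X_i * X_{i+k}, for
# k = 0, 1, ..., m.  The k-th sum costs n - k multiplications, so the
# total is (m + 1)n - m(m + 1)/2, roughly m*n when m is much smaller
# than n.

def sums_of_lagged_products(x, m):
    n = len(x)
    sums, multiplications = [], 0
    for k in range(m + 1):
        sums.append(sum(x[i] * x[i + k] for i in range(n - k)))
        multiplications += n - k
    return sums, multiplications

x = [float(i % 7) for i in range(100)]            # illustrative data only
sums, mults = sums_of_lagged_products(x, m=5)     # mults == 6*100 - 15 == 585
```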
Unless unusually careful preparatory steps for the elimination of very low
frequencies were already taken in the preparation of the X_i , the next step
after calculating these sums of lagged products will be adjustment of these
sums of lagged products for means or trends. It is vital to deal in practice with
such adjusted sums of lagged products, as almost everyone who enters upon
time series analysis seems to have to learn for himself. (However, it will save
space and, hopefully, promote clarity if we omit the word “adjusted” during
the remainder of this discussion. We shall omit it.) Having been told of sums
of lagged products, every analyst of variance expects us to go on to mean lagged
products. Going on is inevitable.
There is a question of the appropriate divisor. If we had not corrected for
the mean (or any trend) there are cases to be made for both n and n − k. If
we had corrected for, say, a general linear trend (which absorbs 2 degrees of
freedom), there are cases to be made for n, for n - 2, for n - k and for n - k - 2.
Parzen gives attention, between his (4.6) and (4.7), to some of the reasons for
choosing n or n - 2 rather than n - k or n - k - 2. By analogy with the
analysis of variance we might feel that n - k - 2 (or, when no adjustment
is made, n - k) would be desirable because unbiasedness is good. The un-
biasedness argument is found not to be a strong one in the time series situation.
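The interchangeability of the two divisors can be sketched as follows; the data and the adjustment for the mean are illustrative assumptions.

```python
# Mean lagged products of mean-adjusted data with divisor n and with
# divisor n - k differ only by the known factors n/(n - k), so either
# set of values is a linear combination of the other.

def mean_lagged_products(x, m, divisor):
    n = len(x)
    xbar = sum(x) / n
    d = [xi - xbar for xi in x]                   # adjust for the mean
    out = []
    for k in range(m + 1):
        s = sum(d[i] * d[i + k] for i in range(n - k))
        out.append(s / (n if divisor == "n" else n - k))
    return out

x = [float((3 * i) % 11) for i in range(50)]      # illustrative data only
by_n  = mean_lagged_products(x, 4, "n")
by_nk = mean_lagged_products(x, 4, "n-k")
# each by_nk value is the by_n value rescaled by n/(n - k)
```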
Is this choice an important one for the analyst or investigator whose concern
is with the spectrum? You should be happy to be told that the answer is “no”.
If one’s concern is with the spectrum, then the most important thing about
any quadratic function of the observations is the spectrum window which
expresses the average value of the estimate in terms of the spectrum of the
ensemble. (The next most important thing is, of course, the variability of the
quadratic function.) This is just what we should expect for a variance-com-
ponent problem, where means and other linear combinations of the observations
are without direct interest. For if, in some very complex (probably unbalanced
to begin with, and then peppered with missing plots) analysis of variance,
one is given the values of certain mean squares (or other quadratic functions
of the observations), the first question one concerned with variance components
asks is “How are the average values of these mean squares expressible in terms
of our variance components?”. (The question about stability, “How many
degrees of freedom should be assigned to each?”, is important but secondary.)
If we know the windows associated with our spectrum estimates, we need not
be concerned, in the first instance, with how these estimates were obtained.
And, moreover, any linear combination of the results of dividing the sums of
lagged products by n is also a linear combination of the results of dividing
the sums of lagged products by n − k, and vice versa.
The practicing spectrum analyst need not be concerned with division by
n or n - k, so long as he doesn’t mis-assemble formulas by combining some
which are appropriate for one divisor with others appropriate for the other.
However, those interested in the theory of spectrum analysis do need to give
some attention to this choice, partly because of the reasons given by Parzen,
partly because this choice affects just what functions of frequency the mean
lagged products are Fourier transforms of, partly for various other reasons.
The man who has a practical interest in the autocovariance function, if there
really be such, clearly also has to take an interest in alternative estimates.
Unlikely though it may seem at first, there is a moderately close analogy
between the biased estimates supported by Parzen and biased estimates which
are reasonable in classical analysis of variance. Consider data in a single classi-
fication with r observations in each class, so that the between mean square
has average value σ² + rσ²_b , where σ² is the error variance component, and
σ²_b is the between variance component. If we wish to estimate the population
average corresponding to a particular classification, there is little doubt that
the sample mean for that classification is the most reasonable estimate. But
if we wish to depict the pattern of the population averages corresponding to
all classifications, we should do something about the inflation of this pattern
by error variance; we should replace the pattern of observed means by a suitably
shrunken pattern. (In the simplest cases it may suffice to shrink each classifi-
cation mean toward the grand mean by the factor [σ²_b/(σ² + σ²_b)]^½. In others
the method developed by Eddington for dealing with stellar statistics [Trumpler
and Weaver 1953, pp. 101-104] may need to be applied.) The analogy with
the time series case is reasonably, in fact surprisingly, close. If we wanted to
estimate just one autocovariance, we should undoubtedly use the unbiased
estimate. But if we are concerned with the pattern made by the estimated
values, with the nature of the autocovariance function, we may, as Parzen
points out, do better to use the biased estimate.
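The shrinkage idea can be sketched for a one-way classification with r observations per class; the simple ratio σ̂²_b/(σ̂² + σ̂²_b) used below is one illustrative choice of factor, not necessarily the exact factor intended in the text.

```python
# Estimate the error and between components from the within and between
# mean squares (the between mean square averages sigma^2 + r*sigma_b^2),
# then pull each class mean toward the grand mean by an illustrative
# shrinkage factor s2_b / (s2 + s2_b).

def shrink_class_means(classes):
    c, r = len(classes), len(classes[0])
    means = [sum(cl) / r for cl in classes]
    grand = sum(means) / c
    msw = sum((x - m) ** 2 for cl, m in zip(classes, means)
              for x in cl) / (c * (r - 1))
    msb = r * sum((m - grand) ** 2 for m in means) / (c - 1)
    s2_b = max((msb - msw) / r, 0.0)              # between component
    f = s2_b / (msw + s2_b) if msw + s2_b > 0 else 0.0
    return [grand + f * (m - grand) for m in means], f

shrunk, f = shrink_class_means([[1.0, 3.0], [4.0, 6.0], [7.0, 9.0]])
```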
(The extreme instance of the problem underlying this choice in the time
series case arises when one 5-minute record is “cross-correlated” [really cross-
covarianced] with another 5-minute stretch of the same time series, as recorded
an hour, a day, or a week later. If the spectrum of the ensemble is relatively
sharp, the average value of the covariance will still tend to zero, but the average
value of its square will tend, not to zero, but to a value depending upon the
product of the 5-minute duration with the width of the spectral peak. Thus
if one calculates autocovariances at lags from 24 hours 0 minutes to 25 hours
5 minutes one will almost certainly find an apparently systematic wavy pattern
in the unbiased estimates of autocovariances or autocorrelations computed
for a particular realization. It is natural to believe that this pattern is “real”,
although the true average values of the autocovariances are actually very,
very much smaller in magnitude than the values found from a single realization.
Such patterns can be so regular as to mislead investigators into an unwarranted
belief that the presence of a strikingly accurate underlying clock has been
demonstrated.)
How can I construct a window?
If we leave aside a few matters which really do not matter here, although
some of them are very important elsewhere (such as adjustment for the mean,
other devices for rejection of very low frequencies, and division by n − k not n),
the function of lag by which the mean lagged products are multiplied before
Fourier transformation, and the window (expressed in terms of ω − ω₀ and
ω + ω₀ separately, where ω₀ is the center frequency of the estimate) through
which the spectrum determines the average value of the estimate, are Fourier
transforms of one another. (If you have never followed a derivation of this,
just take it on faith.) Since every lag must be a multiple of the data interval,
one of these functions is a finite array of spikes, spaced one data interval apart.
The other function is a polynomial in cos (ω − ω₀) of an appropriate degree.
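The Fourier-pair relation can be sketched as follows for the triangular (Bartlett) lag window, whose spectral window, the Fejér kernel, is everywhere non-negative; the frequency grid is an illustrative choice.

```python
import math

# A lag window w_k on lags -m..m has the spectral window
# W(omega) = w_0 + 2 * sum_k w_k cos(k*omega); for the triangular
# (Bartlett) lag window w_k = 1 - k/m this is the Fejer kernel,
# which is non-negative at every frequency.

def spectral_window(w, omega):
    return w[0] + 2 * sum(w[k] * math.cos(k * omega)
                          for k in range(1, len(w)))

m = 12
bartlett = [1 - k / m for k in range(m + 1)]
values = [spectral_window(bartlett, math.pi * j / 400) for j in range(401)]
# min(values) is non-negative up to rounding, and the peak W(0) equals m
```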
at most a small fraction of the height of the main lobe, and the resulting spectral
window, often called the Bartlett window, is everywhere positive. If k = 4, which
corresponds to line 8 in Parzen’s Table 1, and to h₄(u) in his Table 2, the minor
lobes are at most a still smaller fraction of the height of the main lobe, and the
resulting spectral window, as Parzen shows, is quite effective.
It would be perfectly possible to use k = 8 or k = 16 if we wished even lower
minor lobes. The cost to us of doing this would be twofold. There would
have to be an increase in computat’ional effort in order to provide mean lagged
products for the additional lags required to give a main lobe of comparable
width. And the shapes of the main lobes would be somewhat less favorable,
since the process of raising the window to a higher and higher power will make
both the minor lobes and the lower portions of the main lobe still lower. As
a result the main lobe will “occupy” a smaller and smaller part of the frequency
band between the zeroes (of the window) which define it, and, consequently,
the variability of the corresponding estimate (leakage aside) will be greater
than that of an estimate with a more “blocky” spectrum window.
As is clear from Parzen's paper, these are not the only useful lag windows,
the "cosine-arch" or "hanning" lag window, which is proportional to "one
plus cosine," being also of practical interest. This latter window was "discovered"
by empirical observation, and the best reason for considering it is the properties
it is found to have.
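The "one plus cosine" shape has a simple consequence worth seeing directly: transforming with the cosine-arch lag window is the same as smoothing the untapered estimates with quarter-half-quarter weights at adjacent frequencies. A small numerical check (the maximum lag m and the symbols used are illustrative, not the paper's notation):

```python
import numpy as np

# Hedged sketch: the cosine-arch (hanning) lag window
# w(t) = 0.5 * (1 + cos(pi * t / m)) on |t| <= m.  Its Fourier transform
# equals the 1/4, 1/2, 1/4 combination of the truncation window's
# transform at frequencies shifted by pi/m -- hanning as smoothing.
m = 12
taus = np.arange(-m, m + 1)
w = 0.5 * (1 + np.cos(np.pi * taus / m))      # 1 at lag 0, 0 at lag m

def transform(weights, omega):
    # finite Fourier transform of a symmetric set of lag weights
    return np.sum(weights * np.exp(-1j * omega * taus))

D = lambda omega: transform(np.ones_like(taus, dtype=float), omega)

for omega in (0.3, 1.1, 2.0):
    lhs = transform(w, omega)
    rhs = 0.25 * D(omega - np.pi / m) + 0.5 * D(omega) + 0.25 * D(omega + np.pi / m)
    assert abs(lhs - rhs) < 1e-9
print("hanning = quarter-half-quarter smoothing: verified")
```

The identity is exact, since 0.5 + 0.25 e^{iπτ/m} + 0.25 e^{−iπτ/m} = 0.5 (1 + cos πτ/m) for every lag τ.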
(Two further easily understandable types of window which may sometimes
prove useful may be obtained, respectively, (i) by taking a truncated normal
distribution as the lag window, (ii) by taking a Čebyšev polynomial for the
spectral window. This last choice makes all minor lobes of equal height, and
as small in comparison with the main lobe as is possible for a given number
of lags. This equality of height, which makes the minor lobes adjacent to the
main lobes lower than those of most other windows but makes minor lobes
far away from the main lobes relatively higher than those of most other windows,
seems to prove to be a disadvantage rather more often than it proves to be an
advantage.)
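The equal-height property of the Čebyšev choice comes from the polynomial itself: on [−1, 1] it oscillates between −1 and +1, so every minor lobe reaches the same peak, while the main lobe corresponds to arguments beyond 1, where the polynomial grows. A hedged sketch (the degree n and stretch factor x0 are illustrative parameters, not values from the paper):

```python
import numpy as np

# Sketch of a Chebyshev-polynomial spectral window.  Where the argument
# lies in [-1, 1] the polynomial T_n oscillates between -1 and +1
# (equal-height minor lobes); where it exceeds 1 we get the main lobe.
n = 8            # polynomial degree, of the order of the number of lags
x0 = 1.02        # stretch factor: controls main-lobe height vs. side lobes
omega = np.linspace(0, np.pi, 10_000)
x = x0 * np.cos(omega / 2)

W = np.cos(n * np.arccos(np.clip(x, -1, 1)))     # T_n(x) on [-1, 1]
W[x > 1] = np.cosh(n * np.arccosh(x[x > 1]))     # T_n(x) beyond 1 (main lobe)

print("main-lobe peak:", W.max())
print("all minor lobes bounded by 1:", bool(np.all(np.abs(W[x <= 1]) <= 1 + 1e-12)))
```

Raising x0 raises the main-lobe peak relative to the fixed side-lobe height, which is how the side lobes are made "as small in comparison with the main lobe as is possible" for the given degree.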
when a new set of spectrum estimates is required, but others feel quite dif-
ferently. Some of the reasons for this difference can be made manifest, and
their mention may serve to illuminate a variety of computational issues.
be found in either Tukey 1959a (pp. 408-411) or Tukey 1959b (pp. 327-330).
These lists unfortunately omitted the 1957 Symposium at the Royal Statistical
Society on the Analysis of Geophysical Time Series [Craddock 1957, Charnock
1957, Rushton and Neumann 1957, and discussion], where further references
to geophysical applications can be found.
Expositions from one point of view or another have been attempted by Press
and Tukey 1956, and Tukey 1959b. There is no substitute for reading Chapman
and Bartels 1940, or one of Bartels's other expositions of similar techniques,
e.g. Bartels 1935.
An account from the point of view of the user has been attempted by Blackman
and Tukey [1959], who give a fair diversity of references.
The more abstract background may be sought in Grenander and Rosenblatt
1957, and in recent papers in Series B of the Journal of the Royal Statistical Society.
No expository account of the analysis of cross-spectra seems so far to exist.
The only substantial reference continues to be the thesis of Goodman [1957],
copies of which I understand can now be obtained from: Office of Scientific
and Engineering Relations (Reprints), Space Technology Laboratories, Inc.
P.O. Box 95001, Los Angeles 45, Calif.
REFERENCES
JULIUS BARTELS, 1935, Random fluctuations, persistence, and quasipersistence in geophysical
and cosmical periodicities, 40 Terr. Magnetism 1-60.
R. B. BLACKMAN AND J. W. TUKEY, 1959, The measurement of power spectra from the point of
view of communications engineering, New York, Dover, 5 + 190 pp. (Reprinted from 37 Bell
System Technical Journal (1958) with added preface and index.)
H. BURKHARDT, 1904, Trigonometrische Interpolation, IIA9a Encyklopädie der Math. Wiss.
642-693.
SYDNEY CHAPMAN AND JULIUS BARTELS, 1940, Geomagnetism, Oxford, Univ. Press (2 vols.).
Especially chap. 16, Periodicities and harmonic analysis in geophysics, pp. 515-605 (in
vol. 2). (Second Edition 1951, photographic reprint with additions to appear.)
H. CHARNOCK, 1957, Notes on the specification of atmospheric turbulence, A120 J. Roy.
Statist. Soc. 398-408 (discussion 425-439).
JEROME CORNFIELD AND JOHN W. TUKEY, 1956, Average values of mean squares in factorials,
27 Annals Math. Statist., 907-949.
J. M. CRADDOCK, 1957, An analysis of the slower temperature variations at Kew Observatory
by means of mutually exclusive band pass filters, A120 J. Roy. Statist. Soc., 387-397 (dis-
cussion 425-439).
R. A. FISHER, 1929, Tests of significance in harmonic analysis, A125 Proc. Roy. Soc. London,
54-59. (Reprinted as paper 16 in his Contributions to Mathematical Statistics, New York,
Wiley, 1950.)
N. R. GOODMAN, 1957, On the joint estimation of the spectra, cospectrum and quadrature
spectrum of a two-dimensional stationary Gaussian process, Scientific Paper No. 10, Engi-
neering Statistics Laboratory, New York University, 1957 (also Ph.D. Thesis, Princeton
University).
ULF GRENANDER AND MURRAY ROSENBLATT, 1957, Statistical Analysis of Stationary Time
Series, New York, Wiley; Stockholm, Almqvist & Wiksell, 300 pp.
H. O. HARTLEY, 1946, The application of some commercial calculating machines to certain
statistical calculations, 8 Suppl. J. Roy. Statist. Soc., 154-183 (especially pp. 167-168).
J. LEITH HOLLOWAY JR., 1958, Smoothing and filtering of time series and space fields, 4
Advances in Geophysics (Ed. H. E. Landsberg) pp. 351-389, New York, Academic Press.
G. H. JOWETT, 1955, Sampling properties of local statistics in stationary stochastic series,
43 Biometrika, 160-169.
G. H. JOWETT, 1957, Statistical analysis using local properties of smooth heteromorphic