
To cite this article: John W. Tukey (1961) Discussion, Emphasizing the Connection Between Analysis of Variance and Spectrum Analysis, Technometrics, 3:2, 191-219.

To link to this article: http://dx.doi.org/10.1080/00401706.1961.10489940

VOL. 3, No. 2    TECHNOMETRICS    MAY, 1961

Discussion, Emphasizing the Connection Between
Analysis of Variance and Spectrum Analysis*
JOHN W. TUKEY
Princeton University
and
Bell Telephone Laboratories

This session was to be expository and to be directed to statisticians. Accordingly, the discussants have a responsibility to provide such comments as may tend to make both the two papers and the general subject more understandable to statisticians, particularly by relating spectrum analysis to statistical techniques, and to fields of application, more widely familiar to them. Fortunately, the connection between spectrum analysis and those aspects of the analysis of variance which emphasize variance components is extremely close.
To make this connection evident, however, we shall have to analyze the implications and foundations of our procedures and thinking in classical analysis of variance more deeply than usual. It is fair to say that the spectrum analysis of a single time series is just a branch of variance component analysis, but only if one describes its main difference from the classical branches as a requirement for explicit recognition of what is being done and why. In classical (i.e. single-response analysis-of-variance) variance component analysis, one can (and most of us do) analyze data quite freely and understandingly with little thought about what is being done and why it is being done. This is, perhaps unfortunately, not the case for the time series analysis branch of variance component analysis.

I: VARIANCE COMPONENTS AND SPECTRUM ANALYSIS


When variance components?
When conducting analyses of data in conventional analysis-of-variance patterns, we sometimes pay attention to individual values of main effects, interactions, and the like. At other times, we pay attention to estimates of variance components. The controlling factor in this choice is the character of the sets of data which would be considered to be other realizations of the same experiment (or of the same patterned observation). Thus, if we were comparing the times taken by the five outstanding runners of the world to run 1500 meters, another realization of the experiment would reasonably involve the same runners, and it would be appropriate to pay attention to individual main effects.
* Prepared in part in connection with research at Princeton University under contract
DA 36-034-ORD-2297 sponsored by the Office of Ordnance Research. Reproduction in whole
or part permitted for purposes of the U. S. Government.
If, however, we were considering the speeds for a standard assembly operation, as shown by five assemblers drawn at random from a pool of 250 assemblers in a large factory, another realization of the same experiment would almost certainly involve a different group of assemblers. Consequently, in analyzing such data, we would pay attention to the estimated variance component for assemblers, since our concern would have been with assemblers as a whole rather than with five particular assemblers. (We are here concerned with the direct issue of what aspect of the classification concerned receives attention, not with the indirect, but perhaps equally important, issue of how the character of this classification affects the proper error term for other main effects, the question sometimes discussed in terms of "fixed, mixed, or random models".)
There is a clear analog to this choice in the Fourier-oriented analysis of time
series.
Let us first consider the case of a function of time which is periodic with known period. If we may choose the time unit for convenience, the period may as well be 2π, and the function will then have (in practice) a Fourier series representation of the form

y(t) = a₀ + Σ_j (a_j cos jt + b_j sin jt)
Let us lay aside for the moment questions of errors of measurement, numbers of (and spacings between) times at which observations are made, and whether j has a finite or infinite range. Since we are statisticians, concerned with a statistical problem, the coefficients a₀, a₁, b₁, a₂, b₂, … are not to be thought of as constant, but rather as having some joint distribution. This joint distribution reflects the functions corresponding to "all the realizations" of the same experiment or observational program. At one extreme, the functions of time representing different realizations might all be very nearly the same. If this is the case, then, given a single realization, it is clearly appropriate to concentrate our attention upon the estimated values of a₀, a₁, b₁, a₂, b₂, … . This is, of course, the situation envisaged in classical harmonic analysis. One opposite extreme, one which you may claim only a statistician would think of, occurs when there are parameters σ₀², σ₁², σ₂², … and the a's and b's are independent normal deviates with ave a₀ = ave a_j = ave b_j = 0, var a₀ = σ₀², var a_j = var b_j = σ_j²/2. Given one realization of such an experiment, it is only reasonable to look at quadratic functions of the observations, and to regard them as telling us about σ₀², σ₁², σ₂², … . Specifically it is appropriate to look at a₀², a₁² + b₁², a₂² + b₂², … and at certain linear combinations of these quantities. In contrast to classical harmonic analysis, this sort of periodic-time-function problem is a variance component problem.
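As a concrete illustration of this "statistician's extreme", here is a minimal numerical sketch (in Python, with arbitrarily chosen σ_j²; all data and names below are made up for illustration): each realization draws fresh coefficients, the quadratic quantities a₀² and a_j² + b_j² have the σ_j² as their average values, and only averaging over many realizations pins the σ_j² down.

```python
import numpy as np

# Sketch of the random-coefficient model: a_j, b_j are independent normal
# deviates with ave 0, var a_0 = sigma_0^2, var a_j = var b_j = sigma_j^2 / 2.
# The sigma_j^2 values are arbitrary choices for illustration.
rng = np.random.default_rng(0)
sigma2 = np.array([1.0, 4.0, 2.0, 0.5])           # sigma_0^2, sigma_1^2, sigma_2^2, sigma_3^2

def one_realization():
    a0 = rng.normal(0.0, np.sqrt(sigma2[0]))
    a = rng.normal(0.0, np.sqrt(sigma2[1:] / 2))  # a_1, a_2, a_3
    b = rng.normal(0.0, np.sqrt(sigma2[1:] / 2))  # b_1, b_2, b_3
    return a0, a, b

# Quadratic functions of a single realization: a_0^2 and a_j^2 + b_j^2.
# Each has average value sigma_j^2, but only one or two degrees of freedom.
a0, a, b = one_realization()
print("a_0^2         :", a0 ** 2)
print("a_j^2 + b_j^2 :", a ** 2 + b ** 2)

# Averaging the same quadratics over many realizations recovers the sigma_j^2.
reps = [one_realization() for _ in range(20000)]
print(np.mean([r[0] ** 2 for r in reps]))                      # near 1.0
print(np.mean([r[1] ** 2 + r[2] ** 2 for r in reps], axis=0))  # near [4.0, 2.0, 0.5]
```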
The model which lies behind the classical tests of significance in harmonic analysis, a line of development finally completed by Fisher [1929], is an incomplete mixture of the two we have just described, in which

y_observed(t) = y_fixed(t) + y_random(t).

In this decomposition the "fixed" component is usually thought of as involving only one, two, or perhaps three values of j, while, both most importantly and most dangerously, the "random" component is thought of as having σ₁² = σ₂² = … = σ_j² = … = σ².
Equality of the σ_j², the analog for periodic functions of being a "white noise", is exactly what would hold if the "random" component consisted only of independent (or merely uncorrelated) observational errors in observations equally spaced through (0, 2π). It is also, unfortunately, exactly what is most unlikely to occur in practice (for reasons to be discussed in a moment). As a consequence, the practical application of such "largest value against all the rest" tests of significance in harmonic analysis is, to say the least, extremely limited. (If only our estimates, a_j² + b_j², of σ_j² had more than two degrees of freedom, we could improve the classical tests of significance by fitting some sort of reasonable dependence of σ_j² upon j, before proceeding to the construction of a significance test. Even with only two degrees of freedom, some such improvement may be possible.)
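The flavor of such a "largest value against all the rest" test can be sketched numerically. What follows is only an illustration of the form of the statistic, the largest a_j² + b_j² compared with their total, with its null distribution calibrated by simulating white noise rather than by the classical exact formula; the series and its length are made up.

```python
import numpy as np

# "Largest value against all the rest": compare the largest of the quantities
# a_j^2 + b_j^2 with their sum.  The null distribution under white noise is
# obtained here by simulation rather than from the exact classical formula.
rng = np.random.default_rng(1)

def largest_vs_rest(y):
    n = len(y)
    coeffs = np.fft.rfft(y)[1:n // 2]     # drop the mean term and the Nyquist term
    power = np.abs(coeffs) ** 2           # proportional to a_j^2 + b_j^2
    return power.max() / power.sum()

n = 128
t = np.arange(n)
y = rng.normal(size=n) + 0.8 * np.cos(2 * np.pi * 10 * t / n)   # noise plus one harmonic
observed = largest_vs_rest(y)

null = np.array([largest_vs_rest(rng.normal(size=n)) for _ in range(2000)])
print("statistic:", observed, "approximate p-value:", np.mean(null >= observed))
```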
Thus, even in the case of periodic time functions, we have some situations
which should be treated almost entirely in terms of means, others which should
be treated entirely in terms of variance components, and still others where


both descriptions should be used together.

The character of time


Time is connected. And functions of time reflect this fact in their structure, not only in the tendency toward continuity shown by individual time functions, but even more obviously in the associated probability structures. When a time function is wisely regarded as generated from constituents coming from different sources, as most are, the individual constituents are not likely to be "white noises." (Not even the measurement error constituent!) And, even more crucially, the processes by which these constituents are combined are not likely to treat different frequencies alike, so that even if the constituents were white noises, their resultant would not be one. Both in the periodic case and the more usual and general case of a continuous spectrum, a random time function is rarely a "white noise".
Another characteristic of time is that it is quite frequently measured from
an arbitrary origin. To be sure, if the simple periodic case has an annual period,
we may place the computing origin of time where we will, but that will not
make 1 January and 1 July the same. But if we are examining the harmonics
of a 400-cycle electrical voltage, there is no equally necessary or special relation
between local time and 400-cycle time. In a repetition of the same experiment, the generator phase at zero local time may well be equally likely to have any value between 0 and 2π. And if this is so, the situation is a stationary one. (This example may help to emphasize that stationarity is a condition "across the ensemble", a condition relating one realization to another, a condition on a whole ensemble; that it is not a condition on single realizations, and most specifically is not a condition of steadiness within individual realizations.)
Finally, phenomena in time are rarely periodic. (In fact, when examined
under a microscope, no known phenomenon is precisely periodic.) Consequently,
an effective Fourier description of real phenomena can rarely be a periodic
description. We must allow all frequencies to contribute, and hence, as Jenkins has explained, must turn to a continuous spectrum.
The statistically vital contrast between situations appropriately describable by means and situations appropriately describable by variances continues here, as we should have expected. The motion of a springboard from which a diver has just leaped requires all frequencies for its description. The motions following successive leaps by a single careful and precise diver will be relatively similar. They will, as a whole, probably be most appropriately described "by means", by a description of the typical time history of board motion. But if no diver is present, if the springboard is vibrating through a very small amplitude because the wind is blowing on the board and its supports, and because the ground itself is vibrating because of vehicle traffic and factory machinery, the situation is likely to be quite different. The characteristics of this "noise-like" motion of the springboard which are maintained from one realization to another are of the nature of variance components rather than means. And of course (as when a big grasshopper jumps off a small, wind- and traffic-vibrated springboard) there are intermediate situations whose description appropriately combines both means and variance components.

Which variance components?


Discussion has proceeded, up to this point, as though the statement of a problem automatically fixed a set of variance components. When we think matters over carefully, we find that this is far from being the case. In an abstract problem, where only the pattern of the observations and the symmetries of their distribution are specified, without any indication of their interpretation or understanding, there is no unique set of variance components. Instead there are many sets, each interconvertible by prescribable formulas into each other. Abstractly, the best we can do is to say that any set of quantities such that each of the second moments (pure and mixed) of the observations can be expressed as a linear combination of the quantities of the set (together with, say, the square of the average of some general mean) can play the formal role of a system of variance components. (If the quantities in some set do not behave like variances we might prefer to call them (together with the squared average) second-moment components rather than variance components, though we shall not be concerned with this particular precision of language here.) Still one set of variance components may be more convenient, and far more useful, than another. Why?

Replicated double classifications


If we examine one of the most classical patterns, a replicated double classification into rows and columns, we can learn why. Let us, then, consider a classical analysis of variance, based on a pattern involving d observations in each of the r·c cells formed by crossing r rows with c columns. The analysis of variance breakdown into sums of squares, degrees of freedom, and mean squares is standard, as are the definitions of variance components. The well-known formulas for the average values of mean squares are, if all population sizes are infinite:

ave {MS | rows} = σ² + d·σ_RC² + dc·σ_R²
ave {MS | cols} = σ² + d·σ_RC² + dr·σ_C²
ave {MS | int}  = σ² + d·σ_RC²
ave {MS | dup}  = σ²


Why did we choose g2, D’,~ , U: and G; as the variance components in terms
of which we are t’o write out such formulas? We could for example, have used
as variance component,s such average values of differences between differently
related pairs of observations as, taking i # 1, j # J, k # K:

ave (yiik - y,,d’


Downloaded by [Michigan State University] at 14:13 12 January 2015

ave (yiik - yiJK)2

ave bilk - yl,J

ave (ytik - yrJd2


Before trying to answer these questions we must look back at some of the implications of the way in which they were asked.
The term "variance component" can be, and is, appropriately used in two different senses. These senses differ in effect, but only when the underlying situations differ, so that no contradictions arise. When the underlying situation is such that it is appropriate to consider means in the first instance (the pigeon-hole model of Cornfield and Tukey 1956 includes such extreme examples), variance components are means over more specific quadratic quantities. In particular, the within-cell or "duplication" variance component σ² is the average of the variances of all the cell populations. If these cell-population variances differ from cell to cell, so too do the values of

ave (y_ijk − y_ijK)²   (i, j fixed, k ≠ K),

since these averages will always be twice the variance of the population in the corresponding cell.
When the underlying situation is at the other extreme, so that only variance components should be considered, then the labels upon the rows and columns can wisely be regarded as purely arbitrary. This means that if the same "individual" were to appear as a row in each of two realizations of the same experiment, the numbers labeling the two rows would be quite unrelated. Such lack of relationship could be in the nature of the situation, or could have been enforced by our insistence on a randomization of the row numbers, separately for each realization, before the data was made available for analysis. But if the labels are arbitrary, we cannot think of one cell, considered by itself, as different from another. Similarly, there will be only four kinds of pairs of cells: identical; in same column but not in same row; in same row but not in same column; in different rows and columns. And the four corresponding average square differences would have the following values:

ave (y_ijk − y_ijK)² = 2σ²
ave (y_ijk − y_iJK)² = 2σ² + 2σ_RC² + 2σ_C²
ave (y_ijk − y_IjK)² = 2σ² + 2σ_RC² + 2σ_R²
ave (y_ijk − y_IJK)² = 2σ² + 2σ_RC² + 2σ_R² + 2σ_C²
Knowing either set of four quantities, either the four average squared differences or σ², σ_C², σ_R², and σ_RC², the other set is very easily calculated.
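The calculation is just a small linear system; a sketch of it, using the four relations displayed above (the component values are arbitrary):

```python
import numpy as np

# Convert between the four average squared differences and the variance
# components (sigma^2, sigma_RC^2, sigma_R^2, sigma_C^2).  The matrix encodes
# the four relations displayed above; inverting it goes the other way.
M = 2 * np.array([[1, 0, 0, 0],          # same cell
                  [1, 1, 0, 1],          # same row, different column
                  [1, 1, 1, 0],          # same column, different row
                  [1, 1, 1, 1]])         # different row and column

components = np.array([1.0, 0.5, 2.0, 3.0])     # arbitrary sigma^2, sigma_RC^2, sigma_R^2, sigma_C^2
avg_sq_diffs = M @ components
print(avg_sq_diffs)
print(np.linalg.solve(M, avg_sq_diffs))         # recovers the original components
```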
Why then do we prefer the first set, since they are arithmetically equivalent? It must be because of some matter of interpretation. And the interpretation must involve not the realizations of a single experiment but the comparison of two or more different experiments. In fact, we feel that, for example, the sort of change of circumstances which halves or doubles σ_R² while leaving σ², σ_RC², and σ_C² unaffected is easier to understand than the sort which changes ave (y_ijk − y_iJK)² without affecting its three fellows.
The prime criterion for selecting useful variance components is that we should
be more easily able to understand the changes in the situation which would change
some variance components while leaving others alone.

Known-period time functions


Let us now consider periodic time functions with a fixed period and a stationary joint distribution. One variance component description has already been given in terms of σ₀², σ₁², σ₂², … . (Normality is a matter of indifference to us in the present instance.) Another can be given in terms of Jowett's serial variance function [Jowett 1955]:

V_h = ½ ave (y(t + h) − y(t))²

which, on account of stationarity, must be the same for all values of t. The formal relation between these two schemes is easily found to be:

V_h = ½ Σ_j (1 − cos jh)·σ_j²
The formal similarities between the two pairs of mutually related variance-
component schemes, one for the replicated two-way table, and the other for
stationary periodic time series, are very striking, but the actual similarities
go deeper.
What are the simplest changes which we can contemplate making in a situa-
tion involving stationary periodic time functions? They are the results of such
simple linear operations as the result of passing an electrical voltage through
a simple circuit consisting of resistances, condensers, and inductances, or the
result of passing a mechanical motion through a simple linkage of springs,
masses, and dash pots. (Such processes occur, in particular, in almost every
physical or chemical measuring instrument.) Any such linear process will affect
the amplitude and phase of each harmonic in a characteristic way. If its effect on a pure jth harmonic would be to multiply amplitude by |L_j|, then the jth variance component of any stationary ensemble of periodic time series (with period 2π) will be multiplied by |L_j|² = L_j L_j*. There is no correspondingly simple result for the serial variance function. Consequently, the frequency-related variance components are much more useful than serial variance functions in dealing with stationary ensembles of fixed-period time functions.
(In highly mathematical language, the frequency variance components are a basis for second moments which simultaneously diagonalize the effects of all operations that are linear and time-shift-invariant, all black boxes in the sense of pp. xyz-uvw.)

It can be done with covariances!


The discussion just given stressed the analogy between classical analysis of variance and the analysis of stationary periodic time series by using averages of squares of differences of observations in both situations. It would have been possible to stress the analogy almost equally well by using covariances in both situations. In the replicated row-by-column pattern, we have, when the covariances are taken across the specification, from one realization to another, WITH AN ENTIRE NEW SAMPLE OF ROWS AND COLUMNS IN EACH REALIZATION:

cov {y_ijk, y_ijk} = σ² + σ_RC² + σ_R² + σ_C²
cov {y_ijk, y_ijK} = σ_RC² + σ_R² + σ_C²
cov {y_ijk, y_iJK} = σ_R²
cov {y_ijk, y_IjK} = σ_C²
These covariances across the ensemble are quite analogous to the serial covariances in the time series case, which are given by

R(h) = cov {y(t), y(t + h)}

where the covariance is again across the ensemble, from realization to realization, and whose relation to the frequency variance components is, formally,

R(h) = σ₀² + Σ_j ½ σ_j² cos jh.
The main reason for approaching the analogy in terms of averages of squared
differences is a pedagogical one. It seems to be easier to think about the averages
of squared differences, when working from one realization to another. After
all, as statisticians we are quite used to thinking about the average value of
some quantity we have managed to measure only once. But it is a much further
cry to think about a covariance of two quantities, each of which has been meas-
ured only once.
The qualitative nature of this distinction between covariances and average squared differences is notably different for the replicated double classification and for stationary ensembles of periodic time series. This is due, in large part, to our tendency to expect the versions of classifications to have names, to try to think in terms of situations where means and main effects are more important than variance components. We feel that if, for example, i is a subscript identifying persons, then i = 3 should refer to a particular person, not to the third row of some randomly arranged data array.
Yet in a situation where a pure variance component approach is appropriate, the process of randomly rearranging the rows of the data array generates what we may think of, without doing too much violence to the situation, as a new (but clearly not independent) repetition of the experiment. If we fix our eyes on particular values of i, j, k, I, J, and K, consider all admissible rearrangements of the data array, and then average the simplest quadratic expressions, we are led to suitable symmetric functions of the original data array which are natural estimates of the covariances across the ensemble, provided the latter are given an averaged interpretation.
The usual practice in the spectrum analysis of a single stretch of time series is entirely analogous to such a procedure. Let us, for example, consider estimating cov (y₁, y₄). We have the original observations y₁, y₂, y₃, y₄, y₅, … . The results of shifting the time origin, one unit at a time, and always dropping observations at negative times, are first y₂, y₃, y₄, y₅, y₆, …, then y₃, y₄, y₅, y₆, y₇, …, and so on. The pairs (y₁, y₄), (y₂, y₅), (y₃, y₆), …, (y_t, y_{t+3}), … are "equivalent" (either because stationarity is assumed or because we want an averaged covariance) and we can calculate a "sample" covariance from these pairs. Such processes of imitating the sought-for covariance across the ensemble with a sample "covariance" wandering around the data pattern are inevitable when only a single realization is available, be it in an analysis-of-variance situation or a time series situation.
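In later terminology this "wandering" sample covariance is simply the sample autocovariance at a given lag; a minimal sketch (with a made-up series) is:

```python
import numpy as np

# The sample "covariance" wandering around the data pattern: estimate
# cov(y_t, y_{t+3}) from a single realization by pairing each observation with
# the one three steps later, exactly as in the shifted stretches above.
def sample_autocovariance(y, lag):
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    return np.mean((y[:-lag] - ybar) * (y[lag:] - ybar))

rng = np.random.default_rng(3)
e = rng.normal(size=1001)
y = e[1:] + 0.9 * e[:-1]                 # a made-up series with short-range dependence
print(sample_autocovariance(y, 1))       # near 0.9 for this construction
print(sample_autocovariance(y, 3))       # near 0
```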
(In the time series situation, if and when we look more deeply into the details of the situation, we may find that the averages of squares of differences indeed, as Jowett has suggested [1955, 1957, 1958], have real advantages over covariances, insofar as problems associated with trends and very low frequencies are concerned. But this is for the future to reveal.)

Black boxes and the general case


A discussion exactly analogous to the one just given for stationary ensembles
of period-2π time series can be given for the general case of a stationary ensemble of time series. We shall not attempt to give details here, trying only to hit the high points.
There are many circumstances under which it is convenient to call any procedure or process (be it computational, physical, or conceptual) which converts an input to an output a black box. In dealing with time series it is convenient to restrict the term black box to procedures or processes which satisfy two further conditions:
(1) The output corresponding to the superposition of two inputs is the superposition of the corresponding outputs.
(2) The only effect of delaying an input by a fixed time is to delay the output by the same time.
If the procedure or process departs from one or both of these conditions, it is conveniently called a colored box, using specific colors when specific sorts of departures are permitted. Some examples of black boxes include:
(a) moving averages, such as

z_t = (1/h)·[y_{t−h+1} + y_{t−h+2} + … + y_t]

(b) time delays

z_t = y_{t−h}

(c) differences

z_t = y_t − y_{t−h}

(d) more general moving linear combinations

z_t = a₀·y_t + a₁·y_{t−1} + … + a_h·y_{t−h}

(e) linear electric networks (which may include amplifiers, transmission lines, and wave guides),
(f) linear mechanical systems,
(g) linear economic systems,
(h) differentiation with respect to time,
(i) integration with respect to time.
Clearly many of the most important computational, physical, and conceptual
processes are black boxes in this sense.
It is easy to show (if we grant a small amount of continuity and a sufficient lack of dependence of present output on what happened at t = −∞) that, if the input to a black box is A·cos (ωt + φ), then the output has to take the form G(ω)·A·cos (ωt + φ + p(ω)), where the amplification G(ω) and the phase shift p(ω) depend only upon ω. This brings every black box into the framework discussed by Jenkins, so that

(spectrum of output) = [G(ω)]²·(spectrum of input).

The important thing about this relation, for our present purposes, is that the variance component associated with a single frequency (or narrow band of frequencies) in the output is determined by the corresponding variance component of the input. There is no mixing up of frequency variance components. This is simultaneously true for all black boxes, and is the basic reason why the user, be he physicist, economist, or epidemiologist, almost invariably finds frequency variance components the most satisfactory choice for any time series problem which should be treated in terms of variance components.
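A small numerical check of this "no mixing" property can be made with any convenient black box; the sketch below uses a three-point moving average and rough periodogram-type spectrum estimates averaged over many realizations (all choices here are illustrative only).

```python
import numpy as np

# Check (spectrum of output) ~ [G(omega)]^2 * (spectrum of input) for one black
# box: a three-point moving average applied to white noise.  Spectra are
# estimated crudely by averaging raw periodograms over many realizations.
rng = np.random.default_rng(4)
n, reps = 256, 400
freqs = np.fft.rfftfreq(n)                               # in cycles per observation
in_spec = np.zeros(len(freqs))
out_spec = np.zeros(len(freqs))
for _ in range(reps):
    x = rng.normal(size=n)
    y = np.convolve(x, np.ones(3) / 3, mode="same")      # the black box
    in_spec += np.abs(np.fft.rfft(x)) ** 2 / reps
    out_spec += np.abs(np.fft.rfft(y)) ** 2 / reps

gain = np.abs((1 + 2 * np.cos(2 * np.pi * freqs)) / 3)   # transfer-function gain of the box
for k in (5, 20, 40, 60):                                # frequencies away from the gain's zero
    print(round(freqs[k], 3), round(out_spec[k] / in_spec[k], 3), round(gain[k] ** 2, 3))
```

At these frequencies the last two printed columns agree closely; near the frequency where the gain vanishes the comparison becomes erratic, since almost nothing of the input is passed there.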

II: OTHER ANALOGIES


I hope that Part I has made the close relationship between spectrum analysis of a single time series and variance component analysis very much clearer. There are similar analogies to other classical techniques. These are worthy of mention here, even though we cannot take the space to describe them in detail.
Even though the cross-spectrum analysis of two or more time series was not discussed in this session (in part because an understanding of the spectrum analysis of one time series is an essential prerequisite), it is important to point out that probably the most important aspects of cross-spectrum analysis are cases of (complex-valued, frequency-dependent) regression analysis in which the analog of a regression coefficient is the ratio of a (complex-valued) cross-spectrum density to a spectrum density, and is estimated by the corresponding ratio of estimates of averaged densities. (This fact will not surprise those who recall that a simple regression coefficient is estimated as the ratio of a sample covariance to a sample variance, or that a structural regression coefficient is sometimes estimated as the ratio of a sample covariance component to a sample variance component.) In studying time series, as in its more classical situations, regression analysis, whenever there is a suitable regression variable, is a more sensitive and powerful form of analysis than variance component analysis. As a consequence, one major reason for learning about spectrum analysis is as a foundation for learning about cross-spectrum analysis.
The other approaches to data associated, directly or indirectly, with the
analysis of variance and the name of R. A. Fisher also have their analogs in
the analysis of time series. We have already noted, for example, how classical
harmonic analysis is the appropriate approach to known-period time functions


when the over-all situation is such that one should look at means rather than
at variances.
In dealing with the mean-like behavior of nonperiodic time functions from
a Fourier point of view, a natural and effective approach is furnished by complex
demodulation, in which the given stretch of data {X_t} is first converted into two stretches of (real) values, viz.

{X_t cos ω₀t}   and   {X_t sin ω₀t}

which can usefully be regarded as the real and (+ or −) imaginary parts of one or the other of the complex stretches of data

{X_t e^{iω₀t}}   or   {X_t e^{−iω₀t}}.

The second step is to smooth the two real-valued stretches, smoothing both in the same way. The simplest smoothing process is the formation of equally-weighted "moving averages", but it is often desirable to use weights which taper down at each end appropriately. The final step is to display the result in various ways, including:
(1) Plotting individual stretches of smoothed values against time.
(2) Plotting corresponding smoothed values against one another, using time as a parameter.
(3) Plotting against time the phase or the magnitude of the complex number whose real and imaginary parts are the corresponding smoothed values.
The interpretation of such plots is usually guided by an understanding of what happens if a particular single frequency or band of frequencies is prominent in the original data. If the original data were simply X_t = A cos (ωt + φ), then the values of the two modulation-product stretches would be

X_t cos ω₀t = ½A cos [(ω − ω₀)t + φ] + ½A cos [(ω + ω₀)t + φ]
X_t sin ω₀t = −½A sin [(ω − ω₀)t + φ] + ½A sin [(ω + ω₀)t + φ]

and the result of smoothing these would be to nearly eliminate both terms if ω was not near ω₀, and to nearly eliminate the terms in (ω + ω₀)t + φ if ω is near ω₀. The results of smoothing, then, would, if ω is near ω₀, be close to

[½A·G(ω − ω₀)] cos [(ω − ω₀)t + φ]

and

[½A·G(ω − ω₀)] sin [(ω − ω₀)t + φ]

where G(ω − ω₀) is the magnitude of the transfer function of the smoothing process (which we have assumed to use symmetrical weights and thus not to affect phase). In this simple case, a cosinusoidal variation of angular frequency ω in the original, which may have been quite effectively concealed by larger contributions at other frequencies, has been demodulated, and appears as a cosinusoidal variation at the very much reduced angular frequency ω − ω₀, which is likely to be much more evident to the eye. (Complex demodulation, the calculation and smoothing of two stretches of modulation-products, is necessary if we are to distinguish the results of demodulating cos (ω₀ + δ)t from the results of demodulating cos (ω₀ − δ)t.)
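A minimal sketch of the procedure as described (equally-weighted smoothing; the series, its frequencies, and the span are all made up for illustration):

```python
import numpy as np

# Complex demodulation as described above: form the two modulation-product
# stretches X_t cos(w0 t) and X_t sin(w0 t), smooth both in the same way, and
# then look at the magnitude and phase of the resulting complex values.
rng = np.random.default_rng(5)
t = np.arange(2000)
w0 = 2 * np.pi * 0.100                        # demodulating angular frequency (made up)
w = 2 * np.pi * 0.102                         # a nearby frequency hidden in the data
x = 0.5 * np.cos(w * t + 1.0) + rng.normal(size=t.size)

def smooth(u, span=101):                      # equally-weighted moving average
    return np.convolve(u, np.ones(span) / span, mode="valid")

real_part = smooth(x * np.cos(w0 * t))
imag_part = smooth(x * np.sin(w0 * t))
z = real_part - 1j * imag_part                # one choice of sign convention

phase = np.unwrap(np.angle(z))                # slope is roughly w - w0
slope = np.polyfit(np.arange(phase.size), phase, 1)[0]
print(np.abs(z).mean())                       # roughly (A/2) * G(w - w0)
print(slope / (2 * np.pi))                    # roughly 0.002 cycles per step
```

Plotted against time, the smoothed magnitude and phase make the concealed low-amplitude cosinusoid at ω visible as a slow rotation at angular frequency ω − ω₀.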
This technique is the natural extension to the non-periodic case of the ideas underlying the classical Buys-Ballot table [e.g. Stumpff 1937, pp. 132ff or Burkhardt 1904, pp. 678-679], the so-called secondary analysis, and Bartels's summation dial [Chapman and Bartels 1940, pp. 593-599 or Bartels 1935, pp. 30-31]. It has to be tried out on actual data before its incisiveness and power are adequately appreciated.
Problems involving the simultaneous behavior of more than two time series
have not been worked on in a wide variety of fields of application, but enough
has been done to point the way and suggest the possibilities. There will be an
increasing number of instances where the corresponding non-time-series problems
would be naturally approached by multiple regression. These can be effectively
approached by multiple cross-spectrum and spectrum techniques which will
be precise analogs of multiple regression in spirit and, if care is taken in choice,
in the algebraic form of their basic equations. The differences which will arise
in the development will stem from:
(1) the fact that regression goes on separately at each frequency (which
produces merely an extensive parallelism of results), and
(2) the fact that regression coefficients will now take complex values rather
than real values (which enables us to learn a little bit more about the
underlying situation).
To my knowledge the multiple-time-series analogs of discriminant functions
and canonical variates have not yet arisen in practice. But there would seem
to be no difficulty in analogizing either or both.

III: PARSIMONY AND ERROR TERMS


Parsimony
It appears to be natural to try to set up statistical problems in such a way that the numerical values of only a few characteristics, each easily estimated from the observations, suffice to complete the fixing of a probability model for the situation. And it appears all too natural to feel that such presuppositions as normality or constancy of variance are important, since, if they fail to hold, the whole situation would not be completely fixed by the values of those characteristics which are easily estimated. But, for all such naturalness, the working statistician knows that it is often useful to estimate the mean of a population whose variance is unknown, and, similarly, that it is often useful to estimate the variance of a population that is non-normal (frequently without trying to assess the nature and amount of its non-normality). For characteristics to be usefully estimated, it is not necessary that their values complete a precisely stated model, although it is frequently the case that results about designing an experiment are only precise in such simple situations. Thus the famous telephone query, "I'm going to do an experiment, how many sheep should I use?" cannot be answered when all else that is known is that the experimenter wants to compare the means of two treatments to a precision of ±1.5 pounds of body weight, or that he wants to assess a simple variance to ±10% of itself. In the first of these instances, precise design would require a precise variance of observation. In the second, precise design would require precise knowledge of distributional shape. Yet experiments can be and are, wisely, if not optimally, designed and validly analyzed in the absence of such precise information.
Insofar as normality is needed only (i) to ensure that knowledge of the spectrum would leave nothing else to learn, or (ii) to ensure that pre-experimental assessments of variability are precise, and these are the only reasons why Jenkins is concerned with normality, normality is not of great practical importance in spectrum analysis.
(It is fortunate that normality is moderately closely approximated to in certain applications, since there are further branches of time series analysis, for example those dealing with numbers of upcrosses or numbers of maxima, for which normality is of crucial importance. Sequences of zeros and ones represent one ultimate expression of non-normality. In some instances, such sequences are usefully studied by spectrum analysis, in others they are not. The difference has to do with which aspects of their behavior are important.)
Indeed there is a very general principle of data analysis upon which all examiners of main effects (in analyses of variance) lean, whether they know it or not. This can be boldly stated as the Principle of Parsimony, viz. IT MAY PAY NOT TO TRY TO DESCRIBE IN THE ANALYSIS THE COMPLEXITIES THAT ARE REALLY PRESENT IN THE SITUATION. Every time that one pays attention to main effects alone, whether because they are so much larger than interactions, or because the interactions cannot be estimated with sufficient precision, or for almost any other reason, one is behaving in accord with this principle. Thus this principle is widely, though usually implicitly, adopted. The same principle applies to the quadratic analysis of time series, to spectrum analysis and its relatives, not just in a single way, but in some three or four separate and distinct ways:
Normality
The first application is to the need, or lack of need, for estimation to a complete specification, for either assuming normality or estimating more complex matters than the spectrum. In most practical situations this need is nonexistent. Knowledge about the spectrum of a probably non-normal ensemble of time functions can be useful, just as knowledge about the mean of a population of imprecisely known variance can be useful. (In either case, once the data has been gathered, consistency of repetition is the appropriate basis for judging the stability of the result, not assumptions about normality or known variance.)

Stationarity
The second application of the general principle is t’o the assumption of station-
arity, t,he analog in time series situations t’o the assumption of constancy of
variance in more classical sit,uations. The assumption of stationarity is one
at which the innocent boggle, sometimes even to the ext’ent of failing to learn
what the data would tell them if asked. Yet I have yet to meet anyone experienced
in the analysis of time series data (Gwilym Jenkins is an outstanding example)
who is over-concerned with statiouarity. All of us give some thought to both
possible and likely deviations from stationarity in planning how to collect or
Downloaded by [Michigan State University] at 14:13 12 January 2015

work up data, but no one of us will allow the possibility of nonstationarity to


keep LB from making estimates of an average spectrum, any more than working
analysis-of-variance statisticians will refrain from estimating a variance com-
ponent because the variability thus assessed may well have to be an average.
The fact that the spectrum is changing with time (or elevation, or azimuth)
need not make it unwise to estimate one, or several, average spectra. The de-
tection of waves 1 millimeter high, 1 kilometer long, with a 10,000 kilometer
fetch (Munk and Snodgrass 1957) was based upon estimates of spectra averaged
over four-hour periods. The crucial point in identifying the length of fetch
was the rate of change of the center frequency of this distinctive, but very
small peak, from one four-hour period to another. Once we admit that we are
estimating an average spectrum, we have admitted that there may well be
other relevant characteristics of the situation beyond the spectrum, that esti-
mation is not completing specification. Such an admission, as this example
shows, is a good thing rather than a bad one.
There seems to be extra reluctance to consider an average spectrum. It is hard to be sure of the principal reasons for this, but a well-founded desire for replication as a basis of security is likely to be one. If only one time series is available for analysis, as is far too often the case in so many economic instances, it is comforting to believe that, somehow, stationarity makes it possible to have "replication" from one time period to another. The truth is not so comforting. Stationarity is frequently absent. Even when stationarity holds, something like "replication" can only occur within the limits of a single stretch of moderate length if the true spectrum is devoid of detailed features (is sufficiently smooth in the small). And it is surely not wise to trust in "replication" that may not be there.
Harry Press notes (private communication) that average spectra may hide
an important departure from stationarity. In an entirely similar way, the use
of analysis of variance on the results of an experiment comparing 12 treatments
in randomized blocks may hide a substantial dependence of variability upon treatment, or a substantial dependence of treatment effect upon block. These things can, and do, happen. The possibility of their occurrence must be carefully kept in mind. But this fact is not relevant to the point we have just been discussing.
Surely, if one has both adequate data and scientific or insightful ground to
fear nonstationarity, it will be wise not to average spectra over too long a time.
But the urge to choose the averaging time wisely is strengthened by an under-
standing that all data analyses estimate average spectra.

Wisely-chosen resolution
The third application of the general principle is to the question of the nar-
rowness of the frequency ranges for which we should seek spectrum estimates.
There are infinitely many frequencies. The number of separate frequencies
over which we could seek estimates from a given body of data is limited by
the extent of the data, and grows without limit as longer and longer pieces of
data become available. But it does not follow that we should always, or even
usually, work close to this limit. The analogy with an interaction mean square
in a row-by-column table is close and persuasive. There are r·c individual estimates of the interaction mean square, each based on just one of the residuals which remain after fitting rows and columns, each involving just one degree of freedom. How often does it pay us to calculate and compare all these separate estimates? Only very rarely. (It is often useful to calculate and compare a few estimates of an interaction mean square, each based on a reasonable portion of the available degrees of freedom.) The position with spectrum estimates is analogous and similar; to be effective we must estimate averages over well-selected frequency ranges. (This is in addition to the averaging over time necessitated by lack of perfect stationarity.) In both instances, interaction mean square and spectral estimate, it does not pay to try to estimate too much detail, even if the detail is really there.

Proper error terms


The question of the proper error term is a classic of t’he analysis of variance,
often relied upon to separate the men from the boys and the pastry cooks.
It is well recognized that, for example, the plot-to-plot error of an agricultural
experiment is almost certain to be too small, specifically because it rules out
place-to-place and year-to-year components of variation. It is not too great
a stretch to consider this question, which arises for time series in an only slightly
different form, a fourth example of the general principle of parsimony. For
while it will not be costly to estimate plot-to-plot variance, it is likely to be
costly to trust it, to use such estimates as error estimates. Even its estimation
may be costly, in the agricultural situation, if the result is to expend too much
effort on choosing the optimum plot size, on doing one’s best to reduce what
may be a minor source of variation. As Jenkins points out at the very end of
his paper, it is not uncommon for spectrum estimates based upon different
experimental repetitions to differ more than might be expected from their
internal behavior. (Statisticians familiar with any of a wide variety of other
situations would be surprised if this were not so, if external error were not larger
than internal error.) As a consequence, it is not likely to be worthwhile to expend
too much effort in using estimates whose windows have optimum widths and optimum detailed shapes, since this may mean exerting a large effort to minimize a minor component of variability.
One way to describe matters is in terms of alternative ensembles. In each
repetition of the experiment, the time series which is actually realized is drawn
from a different ensemble (from a different population each element of which
is a whole time series). Such a description is entirely analogous to a description
of an agricultural experiment in which each local comparison of two treatments
is drawn from a population, but the populations for different “places” or “years”
differ. The fact that matters may be appropriately described in such a way
often affects what we wish to estimate. If an average comparison, in the agri-
cultural situation, depends upon the “place” in a way, or for reasons, that we
do not understand, we are usually driven to estimate, not average responses
at individual places, but rather average responses for all places. (These are
the natural “main effects.“) There are situations, however, as for example
when studying a cheaper substitute to see if it causes occasional deleterious effects, where we may need, because of variation from place to place, to estimate the value of the least favorable average response and, perhaps, the frequency with which similarly unfavorable situations will arise in more extended practice. The situation with time series is exactly similar.
Most of the time we shall be driven to estimation of a spectrum averaged
over repetitions, where the pattern, or the causes, of the changes in spectrum
from repetition to repetition are not understood. This averaging over repeti-
tions, forced on us by alternate ensembles, is superposed upon the averaging
over time within repetition, partially forced upon us by nonstationarity, and
upon the averaging over frequency bands, forced upon us by the limited extent
and amount of our data. What we estimate, then, is an average of averages
of averages. We have come a long way from the idea of a tight specification-
estimation relationship, where everything which is not presupposed should be
estimated. But it is well that we have done so. And no one who has considered
carefully what is estimated by a main effect in a reasonably complex analysis
of variance can maintain that so much averaging is surprising or unusual.
Just as in more conventional areas of statistical application, there are situa-
tions, the comparison of vibration intensity with structural strength being
perhaps the most obvious, where we shall need to estimate not the average
spectrum but some upper limit, perhaps an upper 99% limit, for the spectra
in the various replications, for the spectra of the various alternative ensembles.
But such instances are the exception, not the rule.

Effects upon balance between stability and resolution


In any case, the presence of true differences between repetitions, of differences
between the spectra of the alternative ensembles, will surely force a readjust-
ment of the balance between stability and resolution. The main reason for
estimating average spectral densities over relatively broad frequency bands
is to assure moderate stability of estimate. If variation within ensembles should
be small compared to variation between ensembles, such within-ensemble sta-
bility is of little value to us. Thus we can afford, in such circumstances, to
improve our frequency resolution by estimating spectral densities averaged


over narrower bands. (There will still remain a natural limitation on resolution,
however, associated with the limited duration of the individual ensembles.)

IV: SPECIAL PROBLEMS OF TIME SERIES


Resolution
The notion of resolution, as applied in optics and other branches of physics,
is a well-recognized and useful physical concept. It does not have any single
definition in numerical terms, and it is well that it does not. For the general
idea that “higher resolution” means “capable of detecting more detail” is
clear, while any one way of making it quantitative would not be universally
satisfactory. (If you like, "resolution" is not "unidimensional". But whether you like this fact or not, it would be unwise to make it unidimensional by a fiat of definition.) Jenkins and Parzen have introduced us to a number of definitions of bandwidth. There are, and will be, other such definitions. The value of any of them lies in what the values of the variously defined bandwidths tell us about "resolution". No one definition, nor even all the definitions so far given, can tell us all about resolution. As Goodman pointed out in his verbal discussion, such matters as "rejection slope in db/octave away from the major lobe" or "db of rejection at a particular frequency" can be important in particular circumstances. Thus numerical values of bandwidths according to any definition closely related to "resolution" can help us, but they will help us most if we regard them as telling us part, not all, of the story.

Choice of resolution
There is one matter upon which I should not like to have my views misunderstood: the desirability in exploratory work of making spectral analyses of the same data with different resolutions (usually represented in packaged systems of calculation of spectrum analysis by the use of varying numbers of lags in the initial computing step, which is the calculation of sums of lagged products). Let me be quite clear that, in my judgment and according to my experience, it definitely is very often desirable in exploratory work, and sometimes essential, to make analyses of the same data at differing resolutions. Moreover, it may be equally important to use different window shapes and different pre-whitenings.
The place where Jenkins and I differ seriously, at least verbally (and I suspect the difference is more verbal than actual) is in the utility of examining some sequence of mean lagged products as a firm basis for choosing the number of such values to be inserted in an appropriate Fourier transformer, and transformed into spectral estimates. Our difference is greater still in connection with the adequacy of the point of apparent "damping down" of these values as a basis for choosing this number. It is not that knowledge of the "damping down" lag is not useful, but rather that, at least in my view, its unthinking use may be dangerous.
On the one hand, I have known of cases where the useful estimates of power spectra came from stopping well short of the damping-down point. On the other hand, if the spectrum were to contain one very large, very broad, very smooth peak, and a close group of small, narrow peaks, the mean lagged products would appear to damp down at a lag associated with the width of the large broad peak, so that a spectrum whose resolution was associated with this damping-down point would fail to resolve the close group of small peaks. Here, as in all sorts of data analysis, there is no substitute for careful thought combined with trial of various alternatives.
It is natural to be tempted into calculating more spect’rum estimates than
the number of mean lagged products used as their basis. This t,emptation need
not be a dangerous one, once it is realized that, given the mean lagged products
and the shape of the window, all the possible spectrum estimates lie on a cosine
polynomial of degree equal to the number of lags used. Once the usual number
of spectrum estimates have been calculated, they are enough to determine
this polynomial, and the calculation of further estimates is equivalent to a
process of cosine-polynomial interpolation. This does not mean that calculating
more estimates is useless, or that t,he results of furt,her calculation will lie close
Downloaded by [Michigan State University] at 14:13 12 January 2015

to the results of straight-line interpolation between the points already calculated.


But it does mean that the additional estimates provide no new information,
only more detailed exposition of information already present. And it means
t,hat drawing smooth freehand curves through the original spectral estimates
is often much more useful than com?ecting them by segments of straight lines.
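The point can be checked directly: a windowed estimate of the usual form is a cosine polynomial in frequency, determined entirely by the mean lagged products and the lag window, so ordinates computed at extra frequencies are re-expression rather than new information. A sketch (made-up data, a simple lag window, normalizing constants ignored):

```python
import numpy as np

# A spectrum estimate of the usual windowed form, as a function of frequency f
# (in cycles per observation): estimate(f) = sum_k (2 - [k==0]) w_k c_k cos(2 pi f k),
# a cosine polynomial of degree h.  Normalizing constants are ignored here.
rng = np.random.default_rng(6)
e = rng.normal(size=501)
y = e[1:] + 0.7 * e[:-1]                       # made-up series
n, h = len(y), 20                              # h = number of lags used

ybar = y.mean()
c = np.array([np.mean((y[:n - k] - ybar) * (y[k:] - ybar)) for k in range(h + 1)])
k = np.arange(h + 1)
w = 0.5 * (1 + np.cos(np.pi * k / h))          # a simple tapering lag window

def estimate(f):
    weights = np.where(k == 0, 1.0, 2.0)       # lags +k and -k contribute equally
    return np.sum(weights * w * c * np.cos(2 * np.pi * f * k))

coarse = [estimate(f) for f in np.linspace(0.0, 0.5, h + 1)]   # the usual number of ordinates
fine = [estimate(f) for f in np.linspace(0.0, 0.5, 201)]       # many more: pure re-expression
print(len(coarse), len(fine))
```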

Blurred estimands
In discussing the general principle of parsimony we emphasized the need to estimate averages over bands of frequencies. This point is so central to spectral analysis as to make its heuristic and intuitive understanding worth considerable effort. Let us begin with classical situations. If one has more degrees of freedom than variance components, then one can find estimates of some (and perhaps all) of these variance components whose average values do not depend upon the other variance components. But once there are more variance components than degrees of freedom, this need not be the case. Consider a two-way r-by-c array of observations in which there are r·c + 2 variance components, viz. a rows variance component, a columns variance component, and one variance component for each of the r·c cells. (This is a natural model when the variance of the cell contributions varies irregularly from cell to cell.) In this situation there is no estimate of any of the r·c cell variance components whose average value is free of all the other variance components.
In t’he time series case there are very many more variance components than
degrees of freedom. For, unless some periodicity assumption holds perfectly
(and I know of not a single instance where it does), a contribution of the form
A cos wt + B sin wt
is permissible for any value of w in some interval. And as all statisticians know
from bitter experience, at least all t,he things that are permissible mill happen.
Thus, in principle, there are infinitely many variance components, one for
each possible w. And, when the realities of band-limiting and of finite duration
of data are faced, there are only a finite number of observat,ions available, and
208 JOHN W. TUKFf

hence only a finite number of degrees of freedom. There is no hope of esti-


mating all variance components here, even by using impractically u&able
estimates.
Bracketing undesired effects
Let us return, for the moment, to a situation with a finite number of variance
components, only four of which will enter our discussion. Let us suppose that
we are interested in estimating a particular one of these variance components,
σ₁², and that our choice has narrowed down to three quadratic functions of the observations, whose average values are

ave {A} = σ₁² + 0.04σ₂² − 0.02σ₃² + 0.01σ₄²
ave {B} = σ₁² + 0.06σ₂² + 0.04σ₃² + 0.02σ₄²
ave {C} = σ₁² − 0.08σ₂² − 0.05σ₃² − 0.03σ₄²

So long as we insist on using only a single quadratic function of the observations, the choice of A, whose average value is least affected by σ₂², σ₃², and σ₄², has
a real advantage. But if we were willing to look at two quadratic functions
of the observations together, then B and C are a more effective choice, at least
so far as average values go. For, on the average, one is raised by the other
variance components, while the other is lowered. If, for example, the observa-
tions are replicated m times, so that there are m A’s, m B’s, and m C’s, and
so that, consequently,
B̄ + t s_B/√m
is an upper confidence limit for ave B, while
C̄ - t s_C/√m
is a lower confidence limit for ave C, then the interval
(C̄ - t s_C/√m, B̄ + t s_B/√m)
is a confidence interval for σ₁², without regard for the values of σ₂², σ₃², and σ₄².
(No such confidence interval can be based upon the m values of A.) When-
ever we cannot get estimates (of what we want to estimate) whose average
values are wholly free of what we do not want to estimate, the use of such
paired estimates, one underestimating and the other overestimating, is likely
to be useful and, perhaps, even necessary.
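A minimal sketch of this device in Python, assuming only that the m replicated values of B and of C are averaged and given one-sided t limits; the function name, the confidence level, and the illustrative numbers are inventions for the example:

    import numpy as np
    from scipy import stats

    def bracket_interval(b_vals, c_vals, conf=0.95):
        """Interval for sigma_1^2 from paired quadratic estimates: ave{B} exceeds
        sigma_1^2 and ave{C} falls short of it, so an upper limit for ave{B} and a
        lower limit for ave{C} bracket sigma_1^2 whatever the other components are."""
        b_vals, c_vals = np.asarray(b_vals, float), np.asarray(c_vals, float)
        m = len(b_vals)
        t = stats.t.ppf(conf, df=m - 1)                      # one-sided t quantile
        upper = b_vals.mean() + t * b_vals.std(ddof=1) / np.sqrt(m)
        lower = c_vals.mean() - t * c_vals.std(ddof=1) / np.sqrt(m)
        return lower, upper

    # made-up replicate values of the two quadratic functions (sigma_1^2 = 1 here)
    rng = np.random.default_rng(1)
    B = 1.0 + 0.05 + 0.02 * rng.standard_normal(8)           # ave{B} a little high
    C = 1.0 - 0.08 + 0.02 * rng.standard_normal(8)           # ave{C} a little low
    print(bracket_interval(B, C))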
When we make estimates of spectrum densities, the window which relates
the average value of our estimate to the spectrum is (for the apparently in-
escapable case of equally-spaced data) inevitably a cosine polynomial (of degree
no larger t’han the index of Dhe longest lag used). It can vanish at only a finite
number of points. Consequently its main lobe, which points out the band of
frequencies over which we seek to estimate some average spectrum density,
is inevitably accompanied by minor lobes which allow leakage from the parts
of the spectrum outside the desired band to affect the average value of our
estimate, and hence to affect its individual values. Even if we are willing to
accept the blurring due to averaging within the major lobe, as we must, like
it or not, we are rightly reluctant to face unknown possibilities of leakage from
other parts of the spectrum. The cure is the same as for the example with four
variance components: use two estimates. (This time one estimate should have
all minor lobes negative while the other has all minor lobes positive.) This
general situation is discussed more fully elsewhere [Tukey 1961 (?)], and it
is to be hoped that some suitable pairs of estimates will soon be explicitly
available. (For one pair see Wonnacott 1961.)

Kinds of asymptosis
The purpose of asymptotic theory in statistics is simple: to provide usable
approximations before passage to the limit. Consequently asymptotic results
and asymptotic problems are likely to be of limited utility when the finiteness
of a sample size or of some other quantity is of overwhelming importance.
(Thus, for example, the theorem that maximum likelihood estimates are asymp-
totically normally distributed with a certain variance-covariance matrix is
rarely of any use when there are only 1 or 2 degrees of freedom for error.) It
is sometimes hard, but almost always important, to remember this fact.
Time series analysis follows its usual pattern, “like most statistical areas,
only more so!”, insofar as asymptosis is concerned. For there are three distinct
ways in which time series data could tend toward a simplifying limit:
(1) The total extent of all the stretches of data available could become
more nearly infinite.
(2) The extent of each individual stretch of data could become more nearly
infinite.
(3) The bandwidth of the measurement could become more nearly infinite
(requiring a more nearly vanishing interval between times of recording).
The consequences of these three, which are quite distinct, depend upon whether
the resolution of the estimates to be made (a) remains constant, (b) increases
as fast as the total extent, stretch extent, or bandwidth of the data, or (c) behaves in an
intermediate manner.
If (1) occurs without (2) or (3), the possible resolution does not increase,
so that (a) is the only relevant situation. The stability of individual estimates
of (averaged) spectrum density then increases essentially proportionally to
the total extent of data.
If (2) proceeds, (1) must also. If (2) and (1) proceed without (3), the range
of (aliassed) frequencies to be considered will not change, so that a constant
number of estimates corresponds to constant resolution, and to an increase
in stability essentially proportional to total extent of data. If, on the other
hand, the resolution is increased proportionally to the total extent of data,
the stability of individual estimates will remain constant.
If (3) proceeds without (1) or (2), we may make estimates over a wider and
wider frequency range, but we cannot obtain higher and higher resolution.
For constant resolution, we obtain constant stability.
In practice, where there are several repetitions, several stretches of data,
it may be that we can wisely treat the total extent of all data stretches asymp-
totically (especially when the additional variability in external error should
be considered), but I know of no single practical instance where an asymptotic
treatment of either stretch length or band-limitation gives useful results.
The limitation on ultimate resolution due to limited extent of data stretches,
and the limitation on frequency ranges for which estimates can be made due
to band-limiting, always seem to behave like small-sample phenomena, and
must be faced in detail. They do not at all behave like large-sample phenomena,
where everything can be “smoothed out” and treated in a limiting, continuous
way.

V: THE MORAL
To analyze time series effectively we must do the same as in any other area
of statistical technique: “Fear the Lord and Shame the Devil” by admitting that:
(1) The complexity of the situation we study is greater than the complexity
of that description of it offered by our estimates.
(2) Balancing of one ill against another in choosing the way data is either
to be gathered or to be initially analyzed always requires knowledge
of quantities which cannot be merely hypothesized, and which, in many
cases, we cannot usefully estimate from a single body of data, such
as ratios of (detailed) variance components or extents of non-normality.
Theoretical optimizations based upon specific values of such quantities
may be useful guides, but only when the failure of past experience (and
the present data) to give precise values for these quantities is recognized
and allowed for.
(3) There is no substitute for some sort of repetition as a basis for assessing
stability of estimates and establishing confidence limits.
(4) Asymptotic theory must be a tool, and not a master.
The only difference is that one must be far more conscious of these accept-
ances in time series analysis than in most other statistical areas.
In a single sentence, the moral is: ADMIT THAT COMPLEXITY ALWAYS
INCREASES, FIRST FROM THE MODEL YOU FIT TO THE DATA,
THENCE TO THE MODEL YOU USE TO THINK AND PLAN ABOUT
THE EXPERIMENT AND ITS ANALYSIS, AND THENCE TO THE
TRUE SITUATION.

VI: THREE MYSTERIES


Up to this point, we have been concerned with the fundamentals of time
series analysis and with the close and cogent analogies between time series
analysis and other areas of statistics. As a consequence our remarks have related
most closely to the first of the two papers. It is now time to turn to the second
paper, which grapples with some of the more detailed aspects of time series
analysis. Here it seems best to try to shed light on a few of the aspects which
are likely to seem most mysterious. Our attention will be given to the mysterious
importance of dividing sums of lagged products by n rather than by n - k,
to the mystery of how new window patterns are sought, and to the mysterious
importance of choosing a window.

Does the divisor matter?


The major computational effort, as measured in millions of multiplications
or minutes of machine time, of any conventional careful spectral analysis is
expended on the calculation of the sums of lagged products

Σ_{i=1}^{n-k} X_i X_{i+k}

(If these are calculated for k = 0, 1, 2, ..., m, some (m + 1)n - m(m + 1)/2 ≈ m·n
multiplications will be required.) The X_i in this calculation will be raw,
or prewhitened, or otherwise modified observations, from which means, fitted
polynomials, or other fitted trends may or may not have been subtracted.
Unless unusually careful preparatory steps for the elimination of very low
frequencies were already taken in the preparation of the X_i, the next step
aft’er calculating these sums of lagged products will be adjustment of these
sums of lagged products for means or trends. It is vital to deal in practice with
such adjusted sums of lagged products, as almost everyone who enters upon
time series analysis seems to have to learn for himself. (However, it will save
space and, hopefully, promote clarity if we omit the word “adjusted” during
the remainder of this discussion. We shall omit it.) Having been told of sums
of lagged products, every analyst of variance expects us to go on to mean lagged
products. Going on is inevitable.
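For concreteness, here is a small sketch in Python of the computation just described: mean adjustment, sums of lagged products, and the two competing divisors. The series, the maximum lag, and the function name are illustrative assumptions only.

    import numpy as np

    def lagged_products(x, max_lag):
        """Sums of lagged products of a mean-adjusted series, together with the
        two competing mean lagged products (divisor n versus divisor n - k)."""
        x = np.asarray(x, float)
        x = x - x.mean()                      # the "adjusted" part: remove the mean
        n = len(x)
        k = np.arange(max_lag + 1)
        sums = np.array([np.dot(x[:n - j], x[j:]) for j in k])
        return sums, sums / n, sums / (n - k)

    rng = np.random.default_rng(2)
    x = rng.standard_normal(500)
    sums, mlp_n, mlp_nk = lagged_products(x, 20)

    # Either set of mean lagged products is a diagonal linear transform of the
    # other, so any linear combination of one is a linear combination of the other.
    n, k = len(x), np.arange(21)
    np.testing.assert_allclose(mlp_nk * (n - k) / n, mlp_n)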
There is a question of the appropriate divisor. If we had not corrected for
the mean (or any trend) there are cases to be made for both n and n - k. If
we had corrected for, say, a general linear trend (which absorbs 2 degrees of
freedom), there are cases to be made for n, for n - 2, for n - k and for n - k - 2.
Parzen gives attention, between his (4.6) and (4.7), to some of the reasons for
choosing n or n - 2 rather than n - k or n - k - 2. By analogy with the
analysis of variance we might feel that n - k - 2 (or, when no adjustment
is made, n - k) would be desirable because unbiasedness is good. The un-
biasedness argument is found not to be a strong one in the time series situation.
Is t’his choice an important one for the analyst or investigator whose concern
is with the spectrum? You should be happy to be told that the answer is “no”.
If one’s concern is with the spectrum, then the most important thing about
any quadratic function of the observations is the spectrum window which
expresses the average value of the estimate in terms of the spectrum of the
ensemble. (The next most important thing is, of course, the variability of the
quadratic function.) This is just what we should expect for a variance-com-
ponent problem, where means and other linear combinations of the observations
are without direct interest. For if, in some very complex (probably unbalanced
to begin with, and then peppered with missing plots) analysis of variance,
one is given the values of certain mean squares (or other quadratic functions
of the observations), the first question one concerned with variance components
asks is “How are the average values of these mean squares expressible in terms
of our variance components?”. (The question about stability “How many
degrees of freedom should be assigned to each?” is important but secondary.)
If we know the windows associated with our spectrum estimates, we need not
be concerned, in the first instance, with how these estimates were obtained.
And, moreover, any linear combination of the results of dividing the sums of
lagged products by n is also a linear combination of the results of dividing
the sums of lagged products by n - k, and vice versa.
The practicing spectrum analyst need not be concerned with division by
n or n - k, so long as he doesn't mis-assemble formulas by combining some
which are appropriate for one divisor with others appropriate for the other.
However, those interested in the theory of spectrum analysis do need to give
some attention to this choice, partly because of the reasons given by Parzen,
partly because this choice affects just what functions of frequency the mean
lagged products are Fourier transforms of, partly for various other reasons.
The man who has a practical interest in the autocovariance function, if there
really be such, clearly also has to take an interest in alternative estimates.
Unlikely though it may seem at first, there is a moderately close analogy
between the biased estimates supported by Parzen and biased estimates which
are reasonable in classical analysis of variance. Consider data in a single classi-
fication with r observations in each class, so that the between mean square
has average value σ² + rσ_b², where σ² is the error variance component, and
σ_b² is the between variance component. If we wish to estimate the population
average corresponding to a particular classification, there is little doubt that
the sample mean for that classification is the most reasonable estimate. But
if we wish to depict the pattern of the population averages corresponding to
all classifications, we should do something about the inflation of this pattern
by error variance; we should replace the pattern of observed means by a suitably
shrunken pattern. (In the simplest cases it may suffice to shrink each classifi-
cation mean toward the grand mean by the factor [rσ_b²/(σ² + rσ_b²)]^1/2. In others
the method developed by Eddington for dealing with stellar statistics [Trumpler
and Weaver 1953, pp. 101-104] may need to be applied.) The analogy with
the time series case is reasonably, in fact surprisingly, close. If we wanted to
estimate just one autocovariance, we should undoubtedly use the unbiased
estimate. But if we are concerned with the pattern made by the estimated
values, with the nature of the autocovariance function, we may, as Parzen
points out, do better to use the biased estimate.
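A sketch of the classical shrinkage just described, in Python, assuming a balanced one-way layout; the spread-matching factor used below, the square root of an estimate of rσ_b²/(σ² + rσ_b²), is one simple choice, and every name and number here is purely illustrative.

    import numpy as np

    def shrunken_class_means(data):
        """Pull one-way classification means toward the grand mean so that the
        displayed pattern is not inflated by error variance (balanced layout,
        `data` of shape (classes, r))."""
        data = np.asarray(data, float)
        n_classes, r = data.shape
        class_means = data.mean(axis=1)
        grand_mean = class_means.mean()
        msb = r * np.var(class_means, ddof=1)      # E = sigma^2 + r*sigma_b^2
        msw = data.var(axis=1, ddof=1).mean()      # E = sigma^2
        ratio = max(0.0, msb - msw) / msb          # estimates r*sigma_b^2/(sigma^2 + r*sigma_b^2)
        return grand_mean + np.sqrt(ratio) * (class_means - grand_mean)

    rng = np.random.default_rng(3)
    true_means = rng.normal(0.0, 0.5, size=12)                      # sigma_b = 0.5
    obs = true_means[:, None] + rng.normal(0.0, 2.0, size=(12, 5))  # sigma = 2, r = 5
    print(np.std(obs.mean(axis=1)), np.std(shrunken_class_means(obs)))

The unshrunken means spread far more widely than the true class averages; the shrunken pattern has roughly the right spread.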
(The extreme instance of the problem underlying this choice in the time
series case arises when one 5-minute record is “cross-correlated” [really cross-
covarianced] with another 5-minute stretch of the same time series, as recorded
an hour, a day, or a week later. If the spectrum of the ensemble is relatively
sharp, the average value of the covariance will still tend to zero, but the average
value of its square will tend, not to zero, but to a value depending upon the
product of the 5-minute duration with the width of the spectral peak. Thus
if one calculates autocovariances at lags from 24 hours 0 minutes to 25 hours
5 minutes one will almost certainly find an apparently systematic wavy pattern
in the unbiased estimates of autocovariances or autocorrelations computed
for a particular realization. It is natural to believe that this pattern is “real”,
although the true average values of the autocovariances are actually very,
very much smaller in magnitude than the values found from a single realization.
Such patterns can be so regular as to mislead investigators into an unwarranted
belief that the presence of a strikingly accurate underlying clock has been
demonstrated.)
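The warning is easy to make vivid by simulation. In the Python sketch below (the narrow-band construction and every constant are arbitrary illustrations), the unbiased autocovariances of one realization, at lags far beyond the correlation scale of the process, trace out a smooth and regular oscillation even though their ensemble averages are essentially zero.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20000
    # narrow-band noise near 0.1 cycles per step: slowly varying amplitudes
    # (moving averages of white noise over about 400 steps) modulating a carrier
    slow_a = np.convolve(rng.standard_normal(n), np.ones(400) / 20.0, mode="same")
    slow_b = np.convolve(rng.standard_normal(n), np.ones(400) / 20.0, mode="same")
    t = np.arange(n)
    x = slow_a * np.cos(2 * np.pi * 0.1 * t) + slow_b * np.sin(2 * np.pi * 0.1 * t)

    # "unbiased" autocovariances at lags far beyond the correlation scale,
    # where the true (ensemble-average) values are essentially zero
    lags = 10000 + np.arange(40)
    acov = np.array([np.dot(x[:n - k], x[k:]) / (n - k) for k in lags])
    # the estimates oscillate smoothly and regularly with the lag, and would look
    # quite different for another realization of the same ensemble
    print(np.round(acov[:12], 3))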
How can I construct a window?
If we leave aside a few matters which really do not matter here, although
some of them are very important elsewhere (such as adjustment for the mean,
other devices for rejection of very low frequencies, and division by n - k not n),
the function of lag by which the mean lagged products are multiplied before
Fourier transformation, and the window (expressed in terms of ω - ω₀ and
ω + ω₀ separately, where ω₀ is the center frequency of the estimate) through
which the spectrum determines the average value of the estimate, are Fourier
transforms of one another. (If you have never followed a derivation of this,
just take it on faith.) Since every lag must be a multiple of the data interval,
one of these functions is a finite array of spikes, spaced one data interval apart.
The other function is a polynomial in cos (ω - ω₀) of an appropriate degree.
While the discreteness of time is generally an important aspect of the data,


it is not important for our present purposes, so that we may replace the spiky
lag window by a smooth function of a continuous variable without altering
its Fourier transform in any way which is essential to the present discussion.
(Provided that we began with, say, at least 10-20 spikes.)
to calculate mean lagged products for only a finite number of lags, this continuous
lag window must vanish outside a finite interval. If it were possible, we would
like to have its Fourier transform, the corresponding spectrum window, also
vanish outside a finite interval, for then the average value of the corresponding
spectrum estimate would only involve contributions from a restricted part of
the spectrum.
It is, however, well known that a function and its Fourier transform cannot
both vanish outside finite intervals. Indeed, they cannot both go to zero too
rapidly as their arguments tend to infinity. The standard example of a function
which, together with its Fourier transform, goes to zero rapidly at infinity is
the standard normal density function, which together with its Fourier trans-
form, goes to zero as the negative exponential of half the square of its argument.
Unfortunately, we cannot make use of the normal density as a lag window,
because it does not vanish outside a finite interval.
Every statistician knows, however (or so the phrase goes), how to approxi-
mate a normal distribution by a bounded distribution. It is only necessary to
consider the distribution of means of simple random samples from any bounded
parent distribution. And what parent distribution could be simpler than the
rectangular (uniform) distribution? If we take samples of size k, the Fourier
transform of the distribution of means will be of the form (sin u/u)^k, where
u is a multiple of ω - ω₀, depending upon k and the number of lags used. The
larger is k, the smaller are the minor lobes of this window in comparison with
the main lobe, and the more lags are required to give a main lobe of prescribed
narrowness. If k = 1, which corresponds to a raw Fourier transform of the
mean lagged products, the minor lobes adjacent to the main lobe are about 1/5
the height of the main lobe (and negative), which proves to be impractical.
If k = 2, which corresponds to line 1 in Parzen's Table 1, the minor lobes are
at most about 1/21 the height of the main lobe, and the resulting spectral window, often
called the Bartlett window, is everywhere positive. If k = 4, which corresponds
to line 8 in Parzen's Table 1, and to ha(u) in his Table 2, the minor lobes are
at most about 1/450 the height of the main lobe, and the resulting spectral window, as
Parzen shows, is quite effective.
It would be perfectly possible to use k = 8 or k = 16 if we wished even lower
minor lobes. The cost to us of doing this would be twofold. There would
have to be an increase in computational effort in order to provide mean lagged
products for the additional lags required to give a main lobe of comparable
width. And the shapes of the main lobes would be somewhat less favorable,
since the process of raising the window to a higher and higher power will make
both the minor lobes and the lower portions of the main lobe still lower. As
a result the main lobe will “occupy” a smaller and smaller part of the frequency
band between the zeroes (of the window) which define it, and, consequently,
the variability of the corresponding estimate (leakage aside) will be greater
than that of an estimate with a more “blocky” spectrum window.
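The trade-off is easy to exhibit numerically. The short Python sketch below (the grid of u values is an arbitrary choice) evaluates the spectral window (sin u/u)^k and the relative height of its largest minor lobe for several values of k.

    import numpy as np

    u = np.linspace(1e-9, 40.0, 200001)     # skip u = 0, where the main lobe equals 1
    sinc = np.sin(u) / u

    for k in (1, 2, 4, 8):
        window = sinc ** k                  # spectral window from k-fold smoothing
        beyond_main = u > np.pi             # the first zero of sin(u)/u is at u = pi
        worst = np.max(np.abs(window[beyond_main]))
        print(f"k = {k}: largest minor lobe about 1/{1 / worst:.0f} of the main lobe")

The printed ratios are roughly 1/5, 1/21, 1/450, and 1/200000, at the cost, as noted above, of more lags for a main lobe of given width.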
As is clear from Parzen’s paper, these are not the only useful lag windows,
the “cosine-arch” or “hanning” lag window which is proportional to “one
plus cosine” being also of practical interest. This latter window was “discovered”
by empirical observation, and the best reason for considering it is the properties
it is found to have.
(Two further easily understandable types of window which may sometimes
prove useful may be obtained, respectively, (i) by taking a truncated normal
distribution as the lag window, (ii) by taking a Čebyšev polynomial for the
spectral window. This last choice makes all minor lobes of equal height, and
as small in comparison with the main lobe as is possible for a given number
of lags. This equality of height, which makes the minor lobes adjacent to the
main lobes lower than those of most other windows but makes minor lobes
far away from the main lobes relatively higher than those of most other windows,
seems to prove to be a disadvantage rather more often than it proves to be an
advantage.)

How important is window choice?


We have discussed window carpentry briefly. Now we need to ask what does
it buy us, how much better can we do with a specially constructed window
than with a rather routine one. This question has opposite answers, depending
on whether one relies upon his window to do everything for him, or not.
If one relies solely upon windows, faces a peaky or steeply slanting spectrum,
and is concerned with the behavior of the spectrum where the density is notice-
ably below its highest values, then the quality of workmanship and polish of
the window used can easily be of the utmost importance. (During the early
’50s I spent considerable effort on a variety of ways to improve windows. The
results have never been published because it turned out, as will shortly be
explained, to be easier to avoid the necessity for their use.)
If one applies his windows, actually or effectively, not necessarily to the
original data but, whenever useful, to the results of simple linear modifications
of the original data, chosen so as to depress peaks, to raise valleys, and, where
necessary, to remove narrow peaks (which may appear to be “lines”), he will
rarely, if ever, find any need for anything beyond a window of routinely good
quality, such as the hanning or cosine arch window (or, if a slight increase
in variance of estimate and a substantial increase in computational effort are
worth bearing, the k = 4 window described above). (For discussion of tech-
niques of linear modification see Blackman and Tukey 1959, Holloway 1958,
and, perhaps, the work of the Labroustes referred to by Chapman and Bartels
[1940, p. 992] and Blackman and Tukey [1959, p. MO]). In my own experience
this sort of approach to the problem, which corresponds [Blackman and Tukey
1959, p. 42] to using different window shapes in different frequency bands, is
much easier than seeking out explicit forms for very special windows to meet
each special situation. Moreover [e.g. Blackman and Tukey 1959, pp. 62-63,
Tukey 1959, pp. 315-316], consideration of this technique leads to very helpful
insights into how the data is best gathered in the first place.
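To make the idea concrete, here is a Python sketch under simple assumptions: an autoregressive series with a steep low-frequency peak, a one-term linear modification x_t - a x_{t-1}, a routine hanning window, and division of the resulting estimate by the filter's power transfer function to undo the modification. The constants, the choice a = 0.95, and the function names are illustrative only.

    import numpy as np

    def hanning_spectrum(x, max_lag, freqs):
        """Routine windowed estimate: hanning (cosine-arch) lag window applied
        to mean-adjusted mean lagged products (divisor n)."""
        x = np.asarray(x, float) - np.mean(x)
        n = len(x)
        k = np.arange(max_lag + 1)
        c = np.array([np.dot(x[:n - j], x[j:]) / n for j in k])
        w = 0.5 * (1 + np.cos(np.pi * k / max_lag))
        coef = np.where(k == 0, 1.0, 2.0)
        return np.array([np.sum(coef * w * c * np.cos(2 * np.pi * f * k)) for f in freqs])

    rng = np.random.default_rng(5)
    x = np.zeros(4000)                       # a "red", steeply slanting spectrum
    for t in range(1, len(x)):
        x[t] = 0.95 * x[t - 1] + rng.standard_normal()

    a = 0.95
    y = x[1:] - a * x[:-1]                   # the simple linear modification
    freqs = np.linspace(0.01, 0.5, 50)
    transfer = np.abs(1 - a * np.exp(-2j * np.pi * freqs)) ** 2
    direct = hanning_spectrum(x, 40, freqs)
    recolored = hanning_spectrum(y, 40, freqs) / transfer
    # `recolored` should show far less smearing and leakage from the
    # low-frequency peak than `direct` does at the higher frequencies
    print(np.round(direct[-5:], 3), np.round(recolored[-5:], 3))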
But each of us is entitled to do his calculations as he pleases, so long as he
does adjust his techniques to provide the amounts of precision and stringency
his problems require.

VII : COMPUTATIONAL CONSIDERATIONS


It is important to say something about the role of computational efficiency
and computational choices as considerations in time series analysis. Com-
putational considerations are particularly important in time series analysis,
in part because of the relatively large amounts of data processed, in part because
of the very many multiplications involved in obtaining sums of lagged products,
and in part for more subtle reasons. And it is sometimes hard, especially for
the novice, to separate computational, statistical, and aims-and-purposes con-
siderations, one from another. Yet if they are not separated, neither sound
practices nor sound advice can be understood as such, rather than being taken
on faith.
Computational considerations depend very much on the equipment available.
Crude spectral analysis is possible with paper and pencil [Blackman and Tukey
1959, pp. 151-159], and modestly refined computations have been done on
hand calculators. The beginning of effective spectrum calculation probably
involves the use of punched-card tabulators to obtain sums of lagged products
(by applying progressive digiting to cards obtained by off-set reproduction
[Hartley 1946]) and the conduct of all further computation on hand calculators.
The steps from this to fully automatized spectral analysis on machines of the
capacity and speed of an IBM 7090 or CDC 1604 are many and long. The
reluctance or eagerness with which one faces another hundred thousand multi-
plications depends very strikingly on the equipment available.
And, consequently, so does one's attitude toward using many more lags to
improve window shape or increase resolution, or toward recomputing mean
lagged products whenever new spectrum estimates (estimates differing in re-
solution, in window shape, in prewhitening, or in rejection filtration) are to be
obtained from the same data. In the economy of abundance which goes with
modern electronic computers, I prefer to recompute mean lagged products
when a new set of spectrum estimates is required, but others feel quite dif-
ferently. Some of the reasons for this difference can be made manifest, and
their mention may serve to illuminate a variety of computational issues.

To recompute or not to recompute?


First, recomputation when necessary allows the use of packaged, unified
machine programs, which require only values for a few constants and the data
in order to provide the desired spectrum estimates. This makes it much easier
for those unsophisticated in time series analysis, whether investigators or tech-
nical aides, to process data more easily and effectively. Most data analysis is
going to be done by the unsophisticated. As statisticians we have a responsibility
to package as many techniques as possible for safe and effective use by those
who will analyze data, and who will not understand why the choices in the
package were made wisely or unwisely.
Next, and perhaps more important for the present, is the absence of adequate
facilities for data analysis. There is no data-analytic language analogous to
FORTRAN or ALGOL, in whose terms it is easy to describe the operations of
data-analysis, and, what is far more crucial, I know of no large machine instal-
lation whose operations are adapted to the basic step-by-step character of
most data analysis, in which most answers coming out of the machine will,
after human consideration, return to the machine for further processing. Neither
programming languages nor computer center operations are adapted to step-
wise operation, and all of us who use big machines for data analysis are thus
forced to more unified operation than might otherwise be desirable.
Third, and this consideration is not related or restricted to big machines,
stepwise computation tends to produce stepwise thinking. I believe that step-
wise thinking led to the classical Schuster periodogram, and hence to decades
of ineffective quiescence for frequency-oriented analysis of time series. The
individual steps from data through intermediate results to periodogram ordinates
seemed reasonable each by itself. And while Stumpff's book recognized the
nature of the corresponding spectral window before 1940 [Stumpff 1937, pp.
98-100], nothing was done to provide more useful estimates until people began
to relate average values of estimates to the spectrum of the ensemble of which
the data is one realization. What security we can have in today’s frequency-
oriented time-series analysis comes from over-all thinking, while many of the
most threatening dangers come from step-by-step thinking. Thus we often
do very much better to apply over-all processes (which have been thought
through overall, not merely stepwise) to data than to apply the individual
steps separately. This view does not deny the great desirability of “try, look,
and try something a little different” as the typical pattern of data analysis.
It merely asks that each trial, unless it is extremely exploratory, be thought
through as a unit. It does not even say that it is unwise to calculate sums of
lagged products once and for all. It only calls on those who do so to be sure
that the total processes they apply to data have been thought through as wholes.
It does, however, note that using preplanned packages increases the chances
that such thinking will have been done.
Precision may matter


Finally, there is a question of required precision of arithmetic. Let us approach
this somewhat indirectly. In friendly conversation, James Durbin recently
brought firmly to my attention that there was an alternative to first prewhitening
the observations and then calculating sums of lagged products for these modified
values, remarking that one might instead calculate rather more sums of lagged
products for the original observations, and then calculate the suitable simple
linear combinations of these sums which would be identically equal to the sums
of lagged products for the modified observations. This remark is surely well
taken. The results are algebraically identical. And if spectrum estimates for
the results of enough different prewhitenings of the same data are going to
be required, then the computational path suggested by Durbin will surely have
real advantages. But it behooves us equally to consider the possible disad-
vantages of this alternate approach. Perhaps the greatest of these is the likely
requirement of greater precision of arithmetic (although it is interesting to
note that, if only one set of spectrum estimates is to be calculated, prewhitening
first will even save some multiplications).
This statement about accuracy sounds a little peculiar at first to one familiar
with more classical statistical computations, but when he recalls the advantages
of postponing divisions in calculating sums of squares of deviations (and in
more general analysis-of-variance computations) he becomes aware of the
practical inequivalence of algebraically identical forms of computation.
An adequately prewhitened time series, at least one that is a realization from
an ensemble which produces spectrum estimates which are even a quarter as
variable as those provided by a Gaussian ensemble (most ensembles arising
in practice will produce estimates more variable than those of a Gaussian en-
semble), requires the observations to be recorded to, at most, only the precision
offered by 1.5 to 2 decimal digits [Tukey 1959, pp. 319-320]. But one that is
far from adequately prewhitened may require several decimal digits. This happens
because the spread between the maximum and minimum observations is deter-
mined by the (areas of) peaks in the spectrum, while the precision necessary
to avoid serious loss of information about the spectrum is determined by the
depths of its valleys.
A similar difficulty can arise in so simple a situation as fitting a quadratic
polynomial, though there most statisticians would see the difficulty coming
and evade it. Thus if
y_i = 12.71 + 1,000,000 x_i + 0.03(x_i² - 1/3) + e_i
where x_i ranges from -1 to +1, var e_i = 10⁻⁵, and we seek to find the quadratic
terms by ordinary quadratic regression, it will not suffice to use y-values with
only 7-decimal digits of precision, because rounding to units introduces devia-
tions of up to 0.50 (which is large compared to the maximum quadratic effect
of +0.02) and increases the effective error variance by a factor of more than 3000.
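A short Python sketch of this rounding example, with var e_i taken as 10⁻⁵ as above; the grid of 201 x-values and the use of an ordinary least-squares fit are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(6)
    x = np.linspace(-1.0, 1.0, 201)
    e = rng.normal(0.0, np.sqrt(1e-5), size=x.size)
    y = 12.71 + 1_000_000 * x + 0.03 * (x ** 2 - 1.0 / 3.0) + e

    X = np.column_stack([np.ones_like(x), x, x ** 2 - 1.0 / 3.0])
    fit_exact = np.linalg.lstsq(X, y, rcond=None)[0]
    fit_round = np.linalg.lstsq(X, np.round(y), rcond=None)[0]   # y kept to about 7 digits

    resid_exact = y - X @ fit_exact
    resid_round = np.round(y) - X @ fit_round
    print("quadratic coefficient:", fit_exact[2], "vs", fit_round[2])
    print("effective error variance inflated by about a factor of",
          round(resid_round.var() / resid_exact.var()))

The variance ratio comes out in the thousands, and the quadratic coefficient recovered from the rounded values is no longer determined usefully.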
Similarly, in the time series case, if one is not prepared to prewhiten first,
when desirable, it is necessary to make provision for moderate to high precision
in input data, and correspondingly higher precision in accumulating sums of
lagged products. The most likely result is a program which computes sums of
lagged products in double-precision arithmetic, perhaps even floating-point
double-precision arithmetic. This means extra effort at many stages of the
computation.
No one of these four considerations rules out calculating sums of lagged pro-
ducts once and for all, but each exerts pressure. The combined effect influences
me very much, but I must admit that they might not be as potent if the calcu-
lations with which I was concerned were to be made on quite other computing
equipment.

VIII : OTHER INTRODUCTORY REFERENCES


Where is the statistician to seek further enlightenment about spectral analysis?
It is hard to give extensive lists of highly informative sources, but some guidance
may be helpful.
One useful route for many statisticians will be to turn to instances where
the techniques have been applied. A list of references to recent applications can
be found in either Tukey 1959a (pp. 408-411) or Tukey 1959b (pp. 327-330).
These lists unfortunately omitted the 1957 Symposium at the Royal Statistical
Society on the Analysis of Geophysical Time Series [Craddock 1957, Charnock
1957, Rushton and Neumann 1957, and discussion], where further references
to geophysical applications can be found.
Expositions from one point of view or another have been attempted by Press
and Tukey 1956, and Tukey 1959b. There is no substitute for reading Chapman
and Bartels 1940, or one of Bartels’s other expositions of similar techniques,
e.g. Bartels 1935.
An account from the point of view of the user has been attempted by Black-
man and Tukey [1959], who give a fair diversity of references.
The more abstract background may be sought in Grenander and Rosenblatt
1957, and in recent papers in Series B of the Journal of the Royal Statistical Society.
No expository account of the analysis of cross-spectra seems so far to exist.
The only substantial reference continues to be the thesis of Goodman [1957],
copies of which I understand can now be obtained from: Office of Scientific
and Engineering Relations (Reprints), Space Technology Laboratories, Inc.
P.O. Box 95001, Los Angeles 45, Calif.
REFERENCES
JULIUS BARTELS, 1935, Random fluctuations, persistence, and quasipersistence in geophysical
and cosmical periodicities, 40 Terr. Magnetism 1-60.
R. B. BLACKMAN AND J. W. TUKEY, 1959, The measurement of power spectra from the point of
view of communications engineering, New York, Dover, 5 + 190 pp. (Reprinted from 37 Bell
System Technical Journal (1958) with added preface and index.)
H. BURKHARDT, 1904, Trigonometrische interpolation, IIA9a Encyklopadie der Math. Wiss.
642-693.
SYDNEY CHAPMAN AND JULIUS BARTELS, 1940, Geomagnetism, Oxford, Univ. Press (2 vols.).
Especially chap. 16, Periodicities and harmonic analysis in geophysics, pp. 515-605 (in
vol. 2). (Second Edition 1951, photographic reprint with additions to appear.)
H. CHARNOCK, 1957, Notes on the specification of atmospheric turbulence, A120 J. Roy.
Statist. Soc. 398-408 (discussion 425-439).
JEROME CORNFIELD AND JOHN W. TUKEY, 1956, Average values of mean squares in factorials,
27 Annals Math. Statist., 907-949.
J. M. CRADDOCK, 1957, An analysis of the slower temperature variations at Kew Observatory
by means of mutually exclusive band pass filters, A120 J. Roy. Statist. Soc., 387-397 (dis-
cussion 425-439).
R. A. FISHER, 1929, Tests of significance in harmonic analysis, A125 Proc. Roy. Soc. London,
54-59. (Reprinted as paper 16 in his Contributions to Mathematical Statistics, New York,
Wiley, 1950.)
N. R. GOODMAN, 1957, On the joint estimation of the spectra, cospectrum and quadrature
spectrum of a two-dimensional stationary Gaussian process, Scientific Paper No. 10, Engi-
neering Statistics Laboratory, New York University 1957 (also Ph.D. Thesis, Princeton
University).
ULF GRENANDER AND MURRAY ROSENBLATT, 1957, Statistical Analysis of Stationary Time
Series, New York, Wiley; Stockholm, Almqvist & Wiksell, 300 pp.
H. O. HARTLEY, 1946, The application of some commercial calculating machines to certain
statistical calculations, 8 Suppl. J. Roy. Statist. Soc., 154-183 (Especially pp. 167-168).
J. LEITH HOLLOWAY JR., 1958, Smoothing and filtering of time series and space fields, 4
Advances in Geophysics (Ed. H. E. Landsberg) pp. 351-389, New York, Academic Press.
G. H. JOWETT, 1955, Sampling properties of local statistics in stationary stochastic series,
43 Biometrika, 160-169.
G. H. JOWETT, 1957, Statistical analysis using local properties of smooth heteromorphic
stochastic series, 44 Biometrika, 454-463.
G. H. JOWETT AND WENDY M. WRIGHT, 1958, Jump analysis, 45 Biometrika, 386-399.
W. H. MUNK AND F. E. SNODGRASS, 1957, Measurements of southern swell at Guadalupe
Island, 4 Deep-Sea Research, 272-286.
H. PRESS AND J. W. TUKEY, 1956, Power spectral methods of analysis and application in
airplane dynamics, AGARD Flight Test Manual (Ed. E. J. Durbin) Vol. IV, Part IVC, pp.
IVC:1-IVC:41, North Atlantic Treaty Organization. (Also Bell System Monograph
2606.) (2nd edition, London, Pergamon Press, 1959.)
S. RUSHTON AND J. NEUMANN, 1957, Some applications of time series analysis to atmospheric
turbulence and oceanography, A120 J. Roy. Statist. Soc. 409-425 (discussion 425-439).
KARL STUMPFF, 1937, Grundlagen und Methoden der Periodenforschung, Berlin, Springer,
vii + 332 pp. (Reprinted, Ann Arbor, Edwards, 1945.)
ROBERT J. TRUMPLER AND HAROLD F. WEAVER, 1953, Statistical astronomy, Berkeley, Univ.
Press.
JOHN W. TUKEY, 1959a, The estimation of (power) spectra and related quantities, On Nu-
merical Approximation (Ed. Rudolph E. Langer), pages 389-411, Madison, Wisc., Univ.
Press.
JOHN W. TUKEY, 1959b, An introduction to the measurement of spectra, Probability and
Statistics: The Harald Cramér Volume (Ed. Ulf Grenander), pages 300-330, Stockholm,
Almqvist & Wiksell; New York, Wiley.
JOHN W. TUKEY, 1961 (?), Curves as parameters, and touch estimation, to appear in the
Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability.
(Also circulated as Technical Report No. 39 of the Statistical Techniques Research Group,
Princeton University.)
THOMAS WONNACOTT, 1961, Spectral analysis combining a Bartlett window with an associated
inner window, 3 Technometrics, pp. 237-245.