The Holt-Winters Forecasting Procedure
Author(s): C. Chatfield
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 27, No. 3
(1978), pp. 264-279
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2347162
Accessed: 12-10-2017 18:31 UTC
Royal Statistical Society, Wiley are collaborating with JSTOR to digitize, preserve and
extend access to Journal of the Royal Statistical Society. Series C (Applied Statistics)
This content downloaded from 148.204.57.95 on Thu, 12 Oct 2017 18:31:59 UTC
All use subject to http://about.jstor.org/terms
Appl. Statist. (1978),
27, No. 3, p. 264-279
SUMMARY
The Holt-Winters forecasting procedure is a simple, widely used projection method
which can cope with trend and seasonal variation. However, empirical studies have
tended to show that the method is not as accurate on average as the more complicated
Box-Jenkins procedure. This paper points out that these empirical studies have used
the automatic version of the method, whereas a non-automatic version is also possible
in which subjective judgement is employed, for example, to choose the correct model
for seasonality. The paper re-analyses seven series from the Newbold-Granger study
for which Box-Jenkins forecasts were reported to be much superior to the (automatic)
Holt-Winters forecasts. The series do not appear to have any common properties, but
it is shown that the automatic Holt-Winters forecasts can often be improved by subjective
modifications. It is argued that a fairer comparison would be that between
Box-Jenkins and a non-automatic version of Holt-Winters. Some general recommendations
are made concerning the choice of a univariate forecasting procedure.
The paper also makes suggestions regarding the implementation of the Holt-Winters
procedure, including a choice of starting values.
appropriate model, takes any outliers or discontinuities into consideration and keeps a careful
check on the forecast errors.
This paper presents a new look at the Holt-Winters procedure, describes the analysis of
seven economic series, makes a number of practical suggestions regarding the implementation
of the procedure (both in its automatic and non-automatic form) and finally makes recommendations
regarding the choice of a univariate forecasting procedure, particularly in regard
to deciding between the non-automatic Holt-Winters procedure and the Box-Jenkins approach.
though as we have seen it does actually do this for about one-third of the series. This highlights
a general problem with empirical comparisons, as noted by Box (1970) in a different
context, that they are often made in circumstances which greatly favour one contender and
the result is as expected. For example, when a research worker proposes a new forecasting
method he naturally understands it better than other methods and may well get better results
with it than other methods. With regard to the HW method, empirical comparisons have
hitherto ignored the possibility of using subjective judgement to improve forecasts and so, in
this paper, I attempt to compare the automatic and non-automatic HW procedures.
One of the difficulties of attempting empirical comparisons of forecasting procedures using
real data is that there is no such thing as a "random sample of time series". For my study I
used a specially selected non-random sample for the following reason. One interesting feature
of the results reported by Newbold and Granger (1974) was that BJ sometimes gave much
better forecasts than HW, whereas HW was rarely superior to BJ by more than a small margin.
This was perhaps to be expected given that the BJ procedure allows a choice from a much wider
class of models. By using series for which BJ gave much better forecasts than automatic HW,
I hoped to answer two questions: firstly, whether series of this type have any general
features which would yield guidelines for indicating when BJ is likely to be preferable to automatic
HW and, secondly, whether the HW results can be improved by using a non-automatic
version of the method. I contacted Dr P. Newbold and he kindly sent me seven series from
those examined in the Newbold-Granger comparison for which BJ gave much better forecasts
than HW. These series are listed in Table 1.
TABLE 1
Newbold and Granger (1974) divided each series into two parts; a fitting period, which
was used to fit an appropriate model, and a forecast period when forecasts were compared with
actual observations. The lengths of each period are shown in Table 1.
Unfortunately, records no longer exist of the Box-Jenkins models fitted in the Newbold-
Granger comparison or of the mean-square forecast errors (over the forecast period) which
resulted from using the BJ or automatic HW methods. However, records of the ratio
(BJ m.s.e./HW m.s.e.) do exist. For Series B to G, which all show seasonal variation, this ratio
lies between 0.4 and 0.7. Series C gives the lowest ratio. There were few non-seasonal series
in the Newbold-Granger set, the "worst" cases having a higher ratio between 0.8 and 0.9.
Series A was selected at random from those cases.
The absence of detailed records on the Newbold-Granger comparison is annoying and
raises a general query about empirical studies which are not available for re-examination, but
this should not prevent us achieving our objectives. Admittedly, any improvements that we
make on the automatic HW procedures will be improvements on our findings only, but there
is good reason to suppose that similar improvements will be possible for the Newbold-Granger
automatic results.
Throughout the paper, I not only estimate parameters by least squares but also compare
accuracy by means of mean-square forecast errors, thus effectively assuming a quadratic
cost-of-error function. This assumption is often queried and it has to be admitted that it is
usually made for practical convenience and because there is often no clearly superior alternative
(e.g. see Granger and Newbold, 1973). However, Granger's (1969) results do give some
comfort that the choice of a quadratic cost function is not too critical, particularly if the error
distribution and the cost function are approximately symmetric. To check on this, I re-estimated
the HW smoothing parameters, which are evaluated in Section 4, by minimizing
mean absolute one-step-ahead forecast errors. This made virtually no difference for all seven
series.
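The check described above can be illustrated with a minimal sketch. It is not the author's program: it uses simple exponential smoothing rather than the full HW recursions, a coarse grid rather than a proper optimizer, and the function names are hypothetical. It only shows how the same smoothing constant can be estimated under a quadratic and under an absolute cost-of-error function.

```python
def ses_errors(x, alpha):
    """One-step-ahead errors of simple exponential smoothing.

    The initial level is taken to be the first observation (an assumption
    made for this illustration only).
    """
    level = x[0]
    errors = []
    for obs in x[1:]:
        e = obs - level          # one-step-ahead forecast error
        errors.append(e)
        level += alpha * e       # error-correction form of the update
    return errors


def mean_square(errors):
    """Quadratic cost: mean-square error."""
    return sum(e * e for e in errors) / len(errors)


def mean_absolute(errors):
    """Absolute cost: mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def best_alpha(x, cost, grid=None):
    """Grid value of alpha minimizing the chosen cost of the one-step errors."""
    grid = grid or [i / 20 for i in range(1, 21)]
    return min(grid, key=lambda a: cost(ses_errors(x, a)))
```

Calling `best_alpha(series, mean_square)` and `best_alpha(series, mean_absolute)` on the same fitting period gives the two estimates to be compared.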
Seasonal pattern
Series A is non-seasonal, while Series B loses its small seasonal component after the first
four or five years. The rest of the series are seasonal, but in markedly different ways. For
example, the seasonal variation is relatively small for Series G, but large for F. The size of the
variation increases with the mean for Series E, but decreases for G as the mean increases. For
Series C, the seasonal variation appears to decrease sharply in size after the first two or three
years.
Trend
There is increasing trend for Series B, D, E and G, while the mean level for Series A increases
and decreases sharply many times in a manner reminiscent of a random walk. Series C shows
little trend while Series F shows two turning points, first decreasing, then increasing, and then
decreasing again.
Random component
Some of the series show a high random component (e.g. Series A and C) while others
show a low random component (e.g. Series B).
To summarize the previous remarks, it appears that the seven series exhibit very different
properties. Indeed, some of the series exhibit non-stationary changes in trend or seasonality
which may be thought unusual or exceptional. But, as Harrison and Stevens (1975) have
pointed out, such data series are in no way exceptional or pathological.
Although I have analysed a relatively small sample of time series, I am forced to surmise
that time series, for which BJ performs much better than (automatic) HW, do not appear to
have any general common features.
The reader may now ask if the fitted BJ or HW models have any features in common.
In Section 4 we will see that the HW smoothing parameters vary considerably for the seven
series. As for the BJ models, the ones fitted by Newbold and Granger are no longer available.
This author could refit BJ models (as he has for Series A; see Section 6), but the models might
well be different to those fitted by Newbold and Granger. Given the different properties of the
series, it seems highly unlikely that the BJ models do have any features in common and, even
if they do, it would be unlikely to help in deciding beforehand if a BJ analysis is worth doing.
[FIGS 1-7. Time plots of Series A to G against year (Fig. 1, Series A, through Fig. 7, Series G), with the end of the fitting period marked on each plot.]
TABLE 2
Series     α      β      γ
A         0.8     -     0.20
B         0.3    1.0    0.15
C         0.4    1.0    0.03
D         0.6    1.0    0.10
E         0.6    0.8    0.05
F         0.6    1.0    0.14
G         0.2    0.4    0.20
as some of the values are much higher than those customarily quoted in the literature. For
example, Coutie et al. (1964) say that typical values for sales data are 0.1 < α < 0.3 and
0 < γ < 0.03, while Reid (1975) says typical values might be α = 0.2, β = 0.3 and γ = 0.1 in our
notation. Now changes of 0.1 or 0.2 in the values of α and β will not make much difference,
but the above "typical" values are considerably different from some of my estimates and
would do rather badly for some of the series.
On the basis of my experience with these and other series, I suggest that it is rather
dangerous to try to give "typical" values for the smoothing constants. Thus, the HW smoothing
parameters should not be guessed, but rather estimated from the data.
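As a rough illustration of estimating rather than guessing the smoothing constants, the sketch below minimizes the sum of squared one-step-ahead errors over a grid. It rests on several assumptions that are not taken from the paper: the additive form of the seasonal model, the textbook updating recursions, starting values computed from the first year with zero initial trend, a grid restricted to [0, 1], and hypothetical function names. The arguments `alpha`, `beta` and `gamma` are assumed to be the smoothing constants for the mean, the seasonal factors and the trend respectively.

```python
from itertools import product


def hw_additive_sse(x, period, alpha, beta, gamma):
    """Sum of squared one-step-ahead errors for additive Holt-Winters.

    alpha, beta and gamma smooth the mean, seasonal effects and trend.
    Starting values come from the first year only, with zero initial trend.
    """
    first_year = x[:period]
    level = sum(first_year) / period
    trend = 0.0
    seasonal = [v - level for v in first_year]
    sse = 0.0
    for t in range(period, len(x)):
        s_old = seasonal[t % period]
        forecast = level + trend + s_old
        sse += (x[t] - forecast) ** 2
        new_level = alpha * (x[t] - s_old) + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        seasonal[t % period] = beta * (x[t] - new_level) + (1 - beta) * s_old
        level = new_level
    return sse


def estimate_parameters(x, period, steps=10):
    """Grid search over (alpha, beta, gamma), each in {0, 1/steps, ..., 1}."""
    grid = [i / steps for i in range(steps + 1)]
    return min(product(grid, repeat=3),
               key=lambda p: hw_additive_sse(x, period, *p))
```

Comparing `hw_additive_sse` at the estimated triple with its value at a "typical" triple such as (0.2, 0.3, 0.1) shows how much fit is lost by guessing. A finer grid or a numerical optimizer would normally replace the coarse grid used here.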
Using the optimal smoothing constants, I next computed mean-square one-step-ahead
forecast errors over the respective forecasting periods. From these I calculated the respective
coefficients of determination, namely the proportions of the total corrected sums of squares
which were "explained" by the one-step-ahead forecasts. Although HW was expected to have
inferior accuracy to BJ for these series, the coefficients of determination were all found to be
reasonably high, ranging from 77% for Series C to over 99% for Series G. In the latter case,
the HW method would almost certainly be judged sufficiently accurate even though the BJ
forecasts may be even better.
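The coefficient of determination used here can be sketched as follows. The function name is hypothetical, and the sketch assumes that the "total corrected sum of squares" is taken about the mean of the observations in the forecasting period.

```python
def coefficient_of_determination(actual, forecasts):
    """Proportion of the total corrected sum of squares 'explained' by forecasts."""
    mean = sum(actual) / len(actual)
    total = sum((a - mean) ** 2 for a in actual)                 # corrected total SS
    residual = sum((a - f) ** 2 for a, f in zip(actual, forecasts))
    return 1 - residual / total
```

Perfect forecasts give a value of 1, while forecasting the period mean throughout gives 0.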
additive nor multiplicative, but it seems worth seeing if the additive model is a better approximation
than the multiplicative model. In the event, the additive model gave virtually no improvement
of fit, presumably because the size of the seasonal variation is small compared with the
trend. No doubt there will be other series where the choice of the correct seasonal model will
be crucial. Indeed, the author was once asked by a commercial organization to find out why
their HW forecasts were "blowing up". It transpired that the data had not been plotted, that
a multiplicative model was being used when the data were clearly additive, and that their
computer program "went haywire" as a result because the seasonal factors were not normalized
after each year's cycle.
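The normalization mentioned above might be sketched as follows. The convention assumed here is the common one, namely that multiplicative seasonal factors should average one over a cycle and additive seasonal effects should sum to zero; the function names are hypothetical.

```python
def normalize_multiplicative(factors):
    """Rescale multiplicative seasonal factors so that they average 1."""
    mean = sum(factors) / len(factors)
    return [f / mean for f in factors]


def normalize_additive(effects):
    """Shift additive seasonal effects so that they sum to 0."""
    mean = sum(effects) / len(effects)
    return [e - mean for e in effects]
```

Applying the appropriate function at the end of each year's cycle stops the seasonal terms from drifting and absorbing part of the mean level.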
TABLE 3
Series   r1 (fitting period)   r1 (forecast period)
A              0.38                  0.49
B              0.24                 -0.02
C              0.10                 -0.19
D              0.33                  0.36
E              0.01                  0.13
F              0.42                  0.70
G              0.24                  0.34
One general procedure for improving HW forecasts suggested by Newbold and Granger
(1974) is to take a linear combination of the forecasts from the HW procedure and from a
technique called stepwise autoregression (Granger and Newbold, 1977, Section 5.4). This
combination of two sets of "automatic" forecasts gives results comparable in accuracy to those
of Box-Jenkins. Intuitively, the stepwise autoregression forecasts may be able to take account
of autocorrelation in the time series which cannot be described by a HW model. Here we try
an even simpler method of taking autocorrelation into account which has been suggested by
D. J. Reid for improving forecasts from general exponential smoothing (see Granger and
Newbold, 1977, p. 169). It is called a two-stage forecasting procedure by Gilchrist (1976, p. 262).
The modification consists of fitting a first-order autoregressive model to the HW one-step-ahead
forecast errors, which we denote by {e_t}. The quantity λe_t is added to the one-step-ahead
forecast made at time t, where λ is a new parameter which we will call the autoregressive
parameter. An intuitively sensible estimate of λ is the value of r1 obtained for the forecast
errors in the fitting period using the unmodified HW procedure.
Now, when testing the null hypothesis that a series of N observations come from a purely
random process, it can be shown (see Chatfield, 1975, p. 62) that values of |r1| exceeding
2/√N are significantly different from zero at the 5 per cent level. Box and Pierce (1970) have
shown that, when calculating r1 for the residuals from a fitted ARMA process, 2/√N will generally
provide an over-estimate of the critical value. It seems a reasonable conjecture that this may
also apply to the residuals from a fitted HW seasonal model although the updating procedure
for estimating parameters may introduce additional autocorrelation. I have not attempted a
theoretical analysis as I do not "believe" any of the models to be completely accurate. Rather
I see the method described above as a robust way of treating any departures from the HW
model and suggest 2/√N as an approximate critical value for testing whether values of r1 are
significantly different from zero.
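The routine check and the modified forecast can be sketched as below. The function names are hypothetical, and the usual sample definition of the lag-one autocorrelation coefficient (deviations taken about the mean of the errors) is assumed. The autoregressive parameter is called `lam` in the code.

```python
import math


def lag_one_autocorrelation(errors):
    """Sample autocorrelation coefficient at lag one of the forecast errors."""
    n = len(errors)
    mean = sum(errors) / n
    numerator = sum((errors[t] - mean) * (errors[t + 1] - mean)
                    for t in range(n - 1))
    denominator = sum((e - mean) ** 2 for e in errors)
    return numerator / denominator


def critical_value(n):
    """Approximate 5 per cent critical value for |r1| with n errors."""
    return 2 / math.sqrt(n)


def is_significant(r1, n):
    """True if |r1| exceeds the approximate critical value."""
    return abs(r1) > critical_value(n)


def modified_forecast(hw_forecast, last_error, lam):
    """One-step-ahead HW forecast plus the autoregressive correction."""
    return hw_forecast + lam * last_error
```

In use, `lam` would be set to the value of r1 computed from the fitting-period errors, and the correction applied only when `is_significant` returns true.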
In Table 3 we see that the values of r1 for Series A, B, D, F and G are significantly different
from zero in the fitting period, indicating that it should be possible to improve the HW forecasts
for these series. The modified HW procedure was tried over the forecasting period for
all these five series. For Series B the modification gave slightly worse results: a 2 per cent
increase in mean-square error. But for Series A, D, F and G substantial improvements were
made, averaging a 23 per cent reduction in mean-square error. Averaged over all seven series,
the improvement is 12 per cent. This compares with an average improvement of 20 per cent
reported by Newbold and Granger (1974) when combining HW with stepwise autoregression.
Given the simplicity of the modified HW procedure and Reid's earlier success with a similar
modification, the method seems worthy of further consideration. Indeed I suggest that the
value of r1 should be routinely calculated in the fitting period to see if the modification is
worth trying.
As the modified HW procedure depends on four parameters, namely α, β, γ and λ, it might
appear advantageous to estimate the parameters simultaneously in the fitting period. However,
this gave virtually the same values for our seven series as were obtained by estimating α, β
and γ by the original method and taking λ = r1. As the latter procedure involves minimizing a
sum of squares in only three dimensions, it will be preferred.
For a k-steps-ahead predictor, the modification resulting from the above approach will
become λ^k e_t, which quickly becomes small as k increases, so that the modification will not
help in forecasting several steps ahead. In this connection, we note that Newbold and Granger
found that the advantage enjoyed by BJ over automatic HW also decreased with k.
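A short sketch of why the correction decays geometrically, assuming that the forecast errors follow the first-order autoregressive model with autoregressive parameter λ and a purely random, zero-mean disturbance ε (the symbol ε is introduced here for illustration):

```latex
\begin{aligned}
e_{t+1} &= \lambda e_t + \varepsilon_{t+1},\\
e_{t+k} &= \lambda^{k} e_t + \sum_{j=0}^{k-1} \lambda^{j}\,\varepsilon_{t+k-j},\\
\mathrm{E}\left[\,e_{t+k} \mid e_t\,\right] &= \lambda^{k} e_t .
\end{aligned}
```

For example, with λ = 0.4 the correction is 0.4 e_t at one step ahead, 0.16 e_t at two steps and 0.064 e_t at three steps.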
6. SERIES A
Although the HW procedure has been applied to all seven series in our set, it is not at all
clear that it is an appropriate method for Series A which is non-seasonal and shows no regular
trend. Rather the mean level moves up and down in a "random" way which experienced time-
series analysts will recognize as looking something like a random walk.
For a random walk, successive first differences form a purely random process (or discrete
white noise) and the optimal predictor of the observation at time (t + 1) is simply the previous
observation, namely x_t. Over the forecasting period, this gives a mean square error which is
better than the standard HW procedure though not quite as good as the modified HW
procedure.
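The random-walk benchmark is easily computed. A minimal sketch, with hypothetical function names:

```python
def random_walk_forecasts(x):
    """One-step-ahead forecasts for x[1:], each equal to the previous observation."""
    return x[:-1]


def random_walk_mse(x):
    """Mean-square one-step-ahead error of the random-walk predictor."""
    errors = [obs - prev for obs, prev in zip(x[1:], random_walk_forecasts(x))]
    return sum(e * e for e in errors) / len(errors)
```

The errors of this predictor are simply the first differences of the series, so their mean square can be compared directly with the HW mean-square error over the same period.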
TABLE 4
       Autocorrelation            Autocorrelation
Lag    coefficient         Lag    coefficient
 0        1                  7      -0.10
 1        0.32               8      -0.14
 2        0.03               9      -0.03
 3        0.09              10      -0.01
 4        0.00              11       0.01
 5        0.01              12       0.09
 6       -0.01              13       0.00
The random walk forecast can be further improved as there tend to be several increases or
several decreases in a row. The autocorrelation function of the differenced series is shown in
Table 4, and we see that the coefficient at lag one is the only one significantly different from
zero. This indicates that the first differences form a first-order MA process or, equivalently,
that xt follows an ARIMA model of order (0, 1, 1) namely
Comparing (6.3) with (6.1) we see that θ = α − 1. Now (6.1) is invertible for |θ| < 1, or equivalently
for 0 < α < 2 (see Box and Jenkins, 1970, p. 107). When α lies in the range (1, 2) the
simple exponential smoothing predictor given by (6.2) can still be expressed in the form
7. STARTING VALUES
The analysis of the Series A data highlighted the need to take a careful look at starting-up
values. I subsequently found that the Newbold-Granger starting values (see Granger and
Newbold, 1977, p. 165) were as follows:
(a) Mean: the average observation in the first year, as also suggested by Winters (1960), and
used in my first run.
(b) Trend: this was set to zero, unlike the recommendations of Winters (1960) and Chatfield
(1975). In my first run I used the average monthly difference between the first and second
years' averages.
(c) Seasonal factors: these were calculated from the first year's data only, by comparing each
observation with the overall average in the first year. No adjustments are made for trend
as the initial trend value is set equal to zero. Winters (1960) and Montgomery and Johnson
(1976) suggest averaging over the whole of the fitting period with a trend adjustment, while
my first run used averages over the first two years with a trend adjustment.
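The Newbold-Granger starting values listed in (a) to (c) can be sketched as follows. The function name is hypothetical, and the way each observation is "compared" with the first-year average is an assumption: a ratio is used for the multiplicative model and a difference for the additive model.

```python
def newbold_granger_start(x, period, multiplicative=True):
    """Starting values computed from the first year's data only.

    Returns (mean, trend, seasonal factors). The trend starts at zero, so
    no trend adjustment is applied to the seasonal factors.
    """
    first_year = x[:period]
    level = sum(first_year) / period      # (a) average of the first year
    trend = 0.0                           # (b) initial trend set to zero
    if multiplicative:                    # (c) compare with first-year average
        seasonal = [v / level for v in first_year]
    else:
        seasonal = [v - level for v in first_year]
    return level, trend, seasonal
```

Only the first `period` observations are used, which is what makes these starting values simpler than the alternatives that average over two years or over the whole fitting period.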
I tried the Newbold-Granger starting values on all seven series using the automatic procedure.
Over the fitting period there was little difference in mean-square error on average,
though there were sizeable differences for individual series. For example, Series F was 55 per
cent worse using the Newbold-Granger values, while Series A and D were 26 per cent and
16 per cent better respectively. Over the forecast period, the Newbold-Granger values did
somewhat better on average. Series F was 41 per cent better while Series A was 14 per cent
better if α is restricted to the range (0, 1) but 23 per cent better if α is allowed to take the
value 1.36. The estimated smoothing constants were much the same except for the value of γ.
For Series A we have already seen that the value was reduced from 0.2 to 0.0, while for Series D
the value increased from 0.1 to 0.2.
The choice of starting values clearly requires more investigation and we need more
experience with other sets of data. The choice will depend to some extent on the properties of
the series. For data like Series A, which contain no steady trend, it is better to set the initial
trend value to zero. The advantage of the Newbold-Granger starting values is that they can
also adapt to cope with the situation where trend is present. Thus, on the grounds of simplicity
and accuracy, I am inclined to recommend the use of the Newbold-Granger starting
values, though other starting values may sometimes be appropriate.
I repeated the comparison of the automatic and non-automatic HW procedures using the
Newbold-Granger starting values. Once again, it was easy to improve the automatic forecasts
using subjective judgement. In particular, when a non-seasonal model was used for Series B,
an even larger reduction in mean-square forecast error was obtained than when using the
previous starting values. It was also found that the one-step-ahead forecast errors tended to be
autocorrelated so that further reductions in mean-square error could be made by the method
described in Section 5.3. Thus the non-automatic HW procedure is superior to the automatic
version for both sets of starting values.
(a) Series for which BJ forecasts are much better than forecasts from the automatic HW
procedure do not appear to have any common properties.
(b) It is often easy to improve automatic HW forecasts by simple subjective modifications.
Now, earlier empirical studies have concentrated on comparing the BJ procedure with
automatic procedures, although they are at opposite extremes of complexity and really have
different purposes. In this paper we have concentrated on a non-automatic version of the
HW procedure whose complexity is in-between these two extremes. The differences from the
automatic procedure are as follows.
(a) A careful subjective choice is necessary to choose the correct seasonal model, either
additive, or multiplicative or non-seasonal. It is particularly important not to use a
seasonal model for non-seasonal data.
(b) Subjectively adjust any outliers due to known effects. Also consider omitting the observations
in the early part of the series if their properties are different to those of the later part
of the series.
(c) If the forecast errors are correlated, fit a first-order autoregressive model to them. It is
suggested that the first-order autocorrelation coefficient of errors in the fitting period should
be calculated routinely.
Although these subjective modifications require some effort, it is clearly much less than that
required by the full BJ procedure.
Apart from the above suggestions, this paper also makes the following practical recommendations
for the HW procedure, whether used in its automatic or non-automatic form.
(a) Starting values for the mean, trend and seasonal factors may be calculated from the first
year's data only, with the initial trend-value set equal to zero.
(b) The "typical" values for the HW smoothing parameters which are quoted in the literature
are often a long way from the estimated values, and it is recommended that the smoothing
parameters should always be estimated rather than guessed.
In conclusion I would like to make some general recommendations regarding the choice
of a univariate forecasting procedure. These are based, not only on my experience with the
series analysed in this paper, but also on my experience with other series and on the many
analyses reported in the literature by other authors.
(a) Automatic procedures are appropriate when there are a large number of items to forecast.
For example, in production planning and stock control there may be several hundred (or
even several thousand) series to consider. The automatic version of the HW procedure
seems as good as any.
(b) Non-automatic procedures are appropriate in most other situations, as few series are so
standard that it is safe to treat them automatically. There are many such procedures to
choose from, but my experience is mainly with the non-automatic version of the HW
procedure and the BJ procedure. I have seen little evidence to suggest that any other
method is definitely superior to either or both of these methods, although several other
methods are comparable in accuracy and the reader should use one he feels happy with.
Restricting attention to non-automatic HW and BJ, we should recognize that both have a
place in the forecaster's toolbag and that the choice between them is not easy. Sometimes
practical considerations will rule out the BJ procedure as, for example, if there are insufficient
observations or insufficient expertise available. If this is not the case then I
suggest the following.
(i) For series in which the variation is dominated by trend and seasonal variation (e.g.
those in Figs 2-7), I recommend the use of the non-automatic HW procedure. The BJ
procedure, which requires considerably more effort, will rarely give much improvement
in fit. This is because the effectiveness of the BJ procedure will here depend primarily on
the initial differencing procedure rather than on the ARMA model-fitting stage (see also
the remarks of Akaike, 1973, and Parzen, 1974, p. 725).
(ii) For series in which the random variation is "large" compared with trend and seasonal
variation, the BJ procedure is worth trying. This applies particularly to series, like that in
Fig. 1, which show a random-walk type of behaviour.
(iii) For series showing discontinuities, like that in Fig. 8, there is little point in trying
time-series projection methods. The author was recently asked to produce univariate forecasts
for this series, but refused. The sudden jump corresponds to a sales drive. If
forecasts are required, then informed guesswork is likely to be better than statistical
projections. A useful rule-of-thumb is that if you think you can produce good forecasts
for a series "by eye", then statistical projections will probably work well. But if you
can't, then they won't!
[FIG. 8. Time plot, against year (1 to 7), of a series showing a discontinuity.]
ACKNOWLEDGEMENTS
I am grateful for constructive comments on earlier versions of this paper by a referee,
A. S. C. Ehrenberg, E. McKenzie, P. Newbold, G. J. A. Stern and K. D. C. Stoodley, the latter
having reminded me of the connection between exponential smoothing and an ARIMA (0, 1, 1)
model.
REFERENCES
AKAIKE, H. (1973). Contribution to the discussion of the paper by Chatfield and Prothero. J. R.
Statist. Soc. A, 136, 330.
BOX, G. E. P. (1970). Book review of De Bruyn's book on cusum charts. Rev. Int. Statist. Inst., 38, 305.
BOX, G. E. P. and JENKINS, G. M. (1970). Time-series Analysis, Forecasting and Control. San Francisco:
Holden-Day (rev. edn publ. 1976).
BOX, G. E. P. and PIERCE, D. A. (1970). Distribution of residual autocorrelations in ARIMA time-series models.
J. Amer. Statist. Ass., 65, 1509-1526.
CHATFIELD, C. (1975). The Analysis of Time Series: Theory and Practice. London: Chapman and Hall.
CHATFIELD, C. (1977). Some recent developments in time-series analysis. J. R. Statist. Soc. A, 140, 492-510.
CHATFIELD, C. and PROTHERO, D. L. (1973). Box-Jenkins seasonal forecasting: Problems in a case-study.
J. R. Statist. Soc. A, 136, 295-336.
COOPER, J. P. and NELSON, C. R. (1975). The ex-ante prediction performance of the St Louis and FRB-
MIT-PENN econometric models, and some results on composite predictions. J. of Money, Credit and
Banking, 7, 1-31.
COUTIE, G. A. et al. (1964). Short-term Forecasting. I.C.I. monograph No. 2. Edinburgh: Oliver and Boyd.
DURBIN, J. and MURPHY, M. J. (1975). Seasonal adjustment based on a mixed additive-multiplicative model.
J. R. Statist. Soc. A, 138, 385-410.
GILCHRIST, W. (1976). Statistical Forecasting. London: Wiley.
GRANGER, C. W. J. (1969). Prediction with a generalized cost of error function. Op. Res. Quart., 20, 199-207.
GRANGER, C. W. J. and NEWBOLD, P. (1973). Some comments on the evaluation of economic forecasts.
Appl. Econ., 5, 35-47.
GRANGER, C. W. J. and NEWBOLD, P. (1977). Forecasting Economic Time Series. New York: Academic Press.
GROFF, G. K. (1973). Empirical comparison of models for short range forecasting. Man. Sci., 20, 22-31.
HARRISON, P. J. and STEVENS, C. F. (1975). Bayes forecasting in action: case studies. Warwick: Statistics
Research Report No. 14.
JENKINS, G. M. (1974). Contribution to the discussion of the paper by Newbold and Granger. J. R. Statist.
Soc. A, 137, 148-150.
McKENZIE, E. (1976). A comparison of some standard seasonal forecasting systems. The Statistician, 25,
3-14.
MONTGOMERY, D. C. and CONTRERAS, L. E. (1977). A note on forecasting with adaptive filtering. Op. Res.
Quart., 28, 87-91.
MONTGOMERY, D. C. and JOHNSON, L. A. (1976). Forecasting and Time-series Analysis. New York: McGraw-
Hill.
NEWBOLD, P. and GRANGER, C. W. J. (1974). Experience with forecasting univariate time-series and the
combination of forecasts. J. R. Statist. Soc. A, 137, 131-165.
PARZEN, E. (1974). Some recent advances in time-series modelling. Trans. I.E.E.E. on Automatic Control,
AC-19, 723-730.
REID, D. J. (1975). A review of short-term projection techniques. In Practical Aspects of Forecasting (H. A.
Gordon, ed.), pp. 8-25, London: Operational Research Society.
WINTERS, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Man. Sci., 6, 324-342.
APPENDIX
In practice the seasonal effect may be intermediate between additive and multiplicative, as
in the models described by Durbin and Murphy (1975), but the Holt-Winters procedure
requires one to choose either an additive or multiplicative model. Both models also assume
an additive error of constant variance, but in practice the error variance may increase with the
mean. If both the seasonal and error terms are thought to be multiplicative, then the logarithms
of the data may be analysed using the additive model, although it should be noted that this
effectively assumes that the trend term is also multiplicative.
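The remark about logarithms can be illustrated with a small sketch. The function names are hypothetical. It shows only that an additive forecast built on the log scale turns into a product of terms when transformed back, which is why the trend term is then effectively multiplicative as well.

```python
import math


def to_log_scale(x):
    """Take natural logarithms of strictly positive data."""
    return [math.log(v) for v in x]


def back_transform(log_mean, log_trend, log_seasonal):
    """Additive forecast on the log scale, returned on the original scale.

    exp(mean + trend + seasonal) = exp(mean) * exp(trend) * exp(seasonal),
    so every component acts multiplicatively after back-transformation.
    """
    return math.exp(log_mean + log_trend + log_seasonal)
```

For instance, a log-scale mean of log 100, a trend of log 1.02 and a seasonal effect of log 1.1 give a forecast of 100 × 1.02 × 1.1 on the original scale.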
In order to describe the updating and forecasting procedures, we introduce the following
notation: