Time Series Forecasting Using Holt-Winters Exponential Smoothing
Prajakta S. Kalekar(04329008)
Kanwal Rekhi School of Information Technology
December 6, 2004
Abstract
Many industrial time series exhibit seasonal behavior, such as demand for apparel or toys.
Consequently, seasonal forecasting problems are of considerable importance. This report concentrates
on the analysis of seasonal time series data using Holt-Winters exponential smoothing
methods. The two models discussed here are the Multiplicative Seasonal Model and the Additive
Seasonal Model.
1 Introduction
Forecasting involves making projections about future
performance on the basis of historical and current data.
When the result of an action is of consequence, but cannot be known in advance with precision,
forecasting may reduce decision risk by supplying additional information about the possible outcome.
Once data have been captured for the time series to be forecasted, the analyst’s next step is
to select a model for forecasting. Various statistical and graphic techniques may be useful to the
analyst in the selection process. The best place to start with any time series forecasting analysis is
to graph sequence plots of the time series to be forecasted. A sequence plot is a graph of the data
series values, usually on the vertical axis, against time, usually on the horizontal axis. The purpose
of the sequence plot is to give the analyst a visual impression of the nature of the time series. This
visual impression should suggest to the analyst whether there are certain behavioral “components”
present within the time series. Some of these components such as trend and seasonality are discussed
later in the report. The presence/absence of such components can help the analyst in selecting the
model with the potential to produce the best forecasts.
After selecting a model, the next step is its specification. The process of specifying a forecasting
model involves selecting the variables to be included, selecting the form of the equation of
relationship, and estimating the values of the parameters in that equation.
After the model is specified, its performance characteristics should be verified or validated by
comparison of its forecasts with historical data for the process it was designed to forecast. Error
measures such as MAPE (mean absolute percentage error), RAE (relative absolute error) and MSE
(mean square error) may be used for validating the model. Selection of an error measure has an
important effect on the conclusions about which of a set of forecasting methods is most accurate.
Time-series forecasting assumes that a time series is a combination of a pattern and some
random error. The goal is to separate the pattern from the error by understanding the pattern’s
trend, its long-term increase or decrease, and its seasonality, the change caused by seasonal factors
such as fluctuations in use and demand.
Several methods of time series forecasting are available such as the Moving Averages method,
Linear Regression with Time, Exponential Smoothing etc. This report concentrates on the Holt-
Winters Exponential Smoothing technique as applied to time series that exhibit seasonality.
2 Basic Terminology
The following keywords are used throughout the report:
2.4 Stationarity
Most forecasting techniques require the stationarity conditions to be satisfied, i.e. the statistical
properties of the series (such as its mean and variance) should not change over time.
3 Exponential Smoothing
Exponential smoothing is a procedure for continually revising a forecast in the light of more recent
experience. Exponential smoothing assigns exponentially decreasing weights as the observations get
older. In other words, recent observations are given relatively more weight in forecasting than the
older observations.
St = α ∗ Xt + (1 − α) ∗ St−1 (1)
When applied recursively to each successive observation in the series, each new smoothed value
(forecast) is computed as the weighted average of the current observation and the previous smoothed
observation; the previous smoothed observation was computed in turn from the previous observed
value and the smoothed value before the previous observation, and so on.
Thus, in effect, each smoothed value is the weighted average of the previous observations,
where the weights decrease exponentially depending on the value of parameter (α). If it is equal
to 1 (one) then the previous observations are ignored entirely; if it is equal to 0 (zero), then the
current observation is ignored entirely, and the smoothed value consists entirely of the previous
smoothed value (which in turn is computed from the smoothed observation before it, and so on;
thus all smoothed values will be equal to the initial smoothed value S0 ). In-between values will
produce intermediate results.
Initial Value
The initial value of St plays an important role in computing all the subsequent values. Setting it
to y1 is one method of initialization. Another possibility would be to average the first four or five
observations.
The smaller the value of α, the more important is the selection of the initial value of St .
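The recursion in equation (1), together with the two initialization choices just described, can be written in a few lines of code. The sketch below is illustrative only (the report itself contains no code); the function name and the example series are assumptions.

    def simple_exponential_smoothing(x, alpha, init="first"):
        """Smooth the series x using equation (1).

        init="first" sets the initial smoothed value to the first observation;
        init="mean" averages the first four observations instead, as suggested
        in the text above.
        """
        s0 = x[0] if init == "first" else sum(x[:4]) / len(x[:4])
        smoothed = [s0]
        for xt in x[1:]:
            # S_t = alpha * X_t + (1 - alpha) * S_{t-1}
            smoothed.append(alpha * xt + (1 - alpha) * smoothed[-1])
        return smoothed

    # A small alpha smooths heavily; alpha close to 1 tracks the data closely.
    print(simple_exponential_smoothing([3.0, 4.2, 4.0, 5.1, 6.3, 6.0, 7.2], alpha=0.3))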
Double exponential smoothing is used when the data shows a trend. It computes two components in
each period: a level and a trend. The level is a smoothed estimate of the value of the data at the end
of each period. The trend is a smoothed estimate of average growth at the end of each period. The
specific formulas for double exponential smoothing are:

St = α ∗ yt + (1 − α) ∗ (St−1 + bt−1 ) (2)

bt = γ ∗ (St − St−1 ) + (1 − γ) ∗ bt−1 (3)

where 0 < α < 1 and 0 < γ < 1 are the smoothing constants. Note that the current value of the series
is used to calculate its smoothed value replacement in double exponential smoothing.
Initial Values
There are several methods to choose the initial values for St and bt .
S1 is in general set to y1 .
Three suggestions for b1 are:
b1 = y2 − y1
b1 = [(y2 − y1 ) + (y3 − y2 ) + (y4 − y3 )]/3
b1 = (yn − y1 )/(n − 1)
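As a further illustration (again not part of the original report), the sketch below implements the level and trend recursions of double exponential smoothing with the three initialization choices for b1 listed above; the function and argument names are assumptions.

    def double_exponential_smoothing(y, alpha, gamma, b1_method="average_of_three"):
        """Double exponential smoothing: a level S and a trend b per period.

        S_t = alpha*y_t + (1 - alpha)*(S_{t-1} + b_{t-1})
        b_t = gamma*(S_t - S_{t-1}) + (1 - gamma)*b_{t-1}

        b1_method selects one of the three initializations of b_1 suggested above.
        """
        s = [y[0]]                                   # S_1 = y_1
        if b1_method == "first_difference":
            b = [y[1] - y[0]]
        elif b1_method == "average_of_three":
            b = [((y[1] - y[0]) + (y[2] - y[1]) + (y[3] - y[2])) / 3.0]
        else:                                        # overall slope (y_n - y_1)/(n - 1)
            b = [(y[-1] - y[0]) / (len(y) - 1)]
        for t in range(1, len(y)):
            s.append(alpha * y[t] + (1 - alpha) * (s[-1] + b[-1]))
            b.append(gamma * (s[-1] - s[-2]) + (1 - gamma) * b[-1])
        return s, b                                  # forecast for the next period: s[-1] + b[-1]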
4 Multiplicative Seasonal Model
4.1 Overview
In this model, we assume that the time series is represented by the model

yt = (b1 + b2 t) ∗ St + εt (4)

where
b1 is the base signal, also called the permanent component
b2 is a linear trend component
St is a multiplicative seasonal factor
εt is the random error component
Let the length of the season be L periods.
The seasonal factors are defined so that they sum to the length of the season, i.e.
∑1≤t≤L St = L (5)
The trend component b2, if deemed unnecessary, may be deleted from the model.
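To make the model concrete, the short snippet below generates a synthetic series of the assumed form yt = (b1 + b2 t) ∗ St + εt, with the seasonal factors rescaled to sum to L as in equation (5). All numerical values are arbitrary illustrations, not data used in the report.

    import random

    L = 4                                    # length of the season
    shape = [1.3, 0.8, 0.6, 1.3]             # arbitrary seasonal pattern
    S = [f * L / sum(shape) for f in shape]  # rescale so the factors sum to L, as in (5)

    b1, b2 = 100.0, 2.0                      # permanent component and linear trend
    y = [(b1 + b2 * t) * S[t % L] + random.gauss(0, 3.0) for t in range(6 * L)]
    print([round(v, 1) for v in y])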
4.2 Application of the model
The multiplicative seasonal model is appropriate for a time series in which the amplitude of the
seasonal pattern is proportional to the average level of the series, i.e. a time series displaying
multiplicative seasonality. (Section 2.1)
4.3 Details
This section describes the forecasting equations used in the model along with the initial values to
be used for the parameters.
R̄t = α ∗ (yt /S̄t−L ) + (1 − α) ∗ (R̄t−1 + Ḡt−1 ) (6)

where R̄t is the estimate of the deseasonalized level, Ḡt is the estimate of the trend component, S̄t is
the estimate of the seasonal factor and 0 < α < 1 is a smoothing constant.
4.3.3 Value of forecast
1. Forecast for the next period
The forecast for the next period is given by:

ŷt+1 = (R̄t + Ḡt ) ∗ S̄t+1−L

Note that the best estimate of the seasonal factor for this time period in the season is used,
which was last updated L periods ago.
where x̄i is the average for the season corresponding to the t index, and j is the position
of the period t within the season. The above equation will produce m estimates of the
seasonal factor for each period.
S̄t = (1/m) ∑k=0..m−1 S̄t+kL ,  t = 1, 2, · · · , L (14)

S̄t (0) = L ∗ S̄t / ∑t=1..L S̄t ,  t = 1, 2, · · · , L (15)
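Only the level equation (6) and the seasonal initialization (14)-(15) of this section survive intact in this copy of the report. To give a complete, runnable picture, the sketch below combines equation (6) with the standard multiplicative Holt-Winters updates for the trend and seasonal factors (smoothing constants β and γ) and the initialization of (14)-(15). The function names, the simple averages used for the initial level and trend, and the use of two seasons of start-up data are assumptions, not details taken from the report.

    def initial_seasonal_factors(y, L):
        """Initial seasonal factors from m complete seasons, following (14)-(15)."""
        m = len(y) // L
        season_avg = [sum(y[k * L:(k + 1) * L]) / L for k in range(m)]
        # One raw estimate per observation: y_t divided by the average of its season.
        raw = [[y[k * L + t] / season_avg[k] for k in range(m)] for t in range(L)]
        s = [sum(r) / m for r in raw]            # average the m estimates, as in (14)
        return [f * L / sum(s) for f in s]       # normalize so the factors sum to L, as in (15)

    def holt_winters_multiplicative(y, L, alpha, beta, gamma, init_seasons=2):
        """One-step-ahead forecasts for y after an initialization period of init_seasons*L points."""
        n0 = init_seasons * L
        S = initial_seasonal_factors(y[:n0], L)
        R = sum(y[:L]) / L                               # initial deseasonalized level
        G = (sum(y[L:2 * L]) - sum(y[:L])) / (L * L)     # initial average per-period growth
        forecasts = []
        for t in range(n0, len(y)):
            forecasts.append((R + G) * S[t % L])         # forecast made before observing y_t
            R_new = alpha * y[t] / S[t % L] + (1 - alpha) * (R + G)      # equation (6)
            G = beta * (R_new - R) + (1 - beta) * G                      # standard trend update
            S[t % L] = gamma * y[t] / R_new + (1 - gamma) * S[t % L]     # standard seasonal update
            R = R_new
        return forecasts

Note that the forecast for period t uses the seasonal factor that was last updated L periods earlier, matching the remark in Section 4.3.3.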
5 Additive Seasonal Model
5.1 Overview
In this model, we assume that the time series is represented by the model

yt = b1 + b2 t + St + εt (16)

where
b1 is the base signal, also called the permanent component
b2 is a linear trend component
St is an additive seasonal factor
εt is the random error component
Let the length of the season be L periods.
The seasonal factors are defined so that they sum to zero, i.e.

∑1≤t≤L St = 0 (17)
The trend component b2, if deemed unnecessary, may be deleted from the model.
5.3 Details
This section describes the forecasting equations used in the model along with the initial values to
be used for the parameters.
Ḡt = β ∗ (R̄t − R̄t−1 ) + (1 − β) ∗ Ḡt−1

where 0 < β < 1 is a second smoothing constant. The estimate of the trend component is simply
the smoothed difference between two successive estimates of the deseasonalized level.
Note that the best estimate of the seasonal factor for this time period in the season is used, which
was last updated L periods ago.
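The remaining additive update equations are missing from this copy of the report. The sketch below therefore uses the standard additive Holt-Winters forms, which are consistent with the description above (the trend is the smoothed difference between two successive deseasonalized levels); the function name and argument layout are assumptions.

    def holt_winters_additive_step(y_t, R_prev, G_prev, S_old, alpha, beta, gamma):
        """One update of the standard additive Holt-Winters recursions.

        S_old is the seasonal term for the current position within the season,
        last updated L periods ago; the caller keeps a list of L such terms.
        """
        R = alpha * (y_t - S_old) + (1 - alpha) * (R_prev + G_prev)  # deseasonalized level
        G = beta * (R - R_prev) + (1 - beta) * G_prev                # smoothed trend (see text above)
        S = gamma * (y_t - R) + (1 - gamma) * S_old                  # updated seasonal term
        # The forecast for the next period is R + G plus the seasonal term for the
        # next position in the season (also last updated L periods ago).
        return R, G, S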
6 Experimental Results
Variations of the additive and multiplicative exponential smoothing techniques were applied to
some standard time series data, the results of which are discussed in this section.
• Multiplicative model
Two variations of the model are used:
1. Non-adaptive
In the non-adaptive technique, data of the first two years is used for building the model
and establishing the parameters. Using these parameters, forecasts are made for the
remaining data. Hence, once the values of the parameters α, β and γ are initialized,
they are not modified later. The criterion used for selection of the parameters is MAPE (a
sketch of one possible selection procedure is given below).
– The advantage of this model is that the parameters are initialized only once. Hence,
once the parameters have been established, the forecasting can proceed without any
delay in re-computation of the parameters.
– Another advantage is that past data need not be remembered.
This method is typically suited for series where the parameters remain more or less
constant over a period of time.
2. Adaptive
In the adaptive technique, the original parameters are established using the data of
the first two years. However, the model keeps adapting itself to the changes in the
underlying process. The parameters are adjusted after every two years. The new parameters
may be computed using
– All the data available till that point of time or
– Only k most recent data values.
The first approach requires all past data to be stored.
• Additive model
There are two variations of the model used. These are the same as those of the Multiplicative
model, i.e. Adaptive and Non-Adaptive.
For a given time series, it is possible that the value of L (the seasonality period) may be unknown.
For such a time series, the value of L was varied between 6 and 24.
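The report does not spell out how the parameter values were searched. The sketch below shows one plausible way to establish α, β and γ (and L, when unknown) on the start-up data by minimizing MAPE over a coarse grid, reusing the holt_winters_multiplicative function sketched in Section 4; the grid values and ranges are assumptions.

    def mape(actual, forecast):
        """Mean absolute percentage error, in percent (assumes no zero values)."""
        return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

    def select_parameters(y, L_candidates=range(6, 25), grid=(0.1, 0.2, 0.3, 0.4, 0.5)):
        """Return (mape, L, alpha, beta, gamma) with the lowest MAPE on the series y.

        Assumes y contains more than two full seasons for every candidate L.
        """
        best = None
        for L in L_candidates:
            for a in grid:
                for b in grid:
                    for g in grid:
                        f = holt_winters_multiplicative(y, L, a, b, g)
                        err = mape(y[len(y) - len(f):], f)
                        if best is None or err < best[0]:
                            best = (err, L, a, b, g)
        return best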
6.2.1 d1
The results obtained using the Multiplicative Adaptive technique are summarized in Table 1.
The best (lowest) value of MAPE is obtained for L = 23, with an infinite look back. The MAPE for
the non-adaptive technique is 10431.5, whereas that for the adaptive technique is 109. This suggests
that the series is constantly changing. Consequently, the model parameters need to be constantly
updated according to the newer data.
Figure 1 shows the comparison between the real and the forecasted values for L = 23 with an
infinite look back.
With the best value of MAPE being 109, the exponential smoothing technique does not appear to
be well suited for forecasting this data.
6.3 red-wine
In the case of this time series, the multiplicative non-adaptive exponential smoothing technique gave
better results than the adaptive version. The value of MAPE obtained was 9.01809 with α = 0.2,
β = 0.1 and γ = 0.1.
A summary of the results of Multiplicative Adaptive technique on the time series is shown in
Table 2.
The results suggest that periodic re-computation of the parameters does not necessarily improve
the results. The time series may exhibit certain fluctuations; if the parameters are re-computed
periodically, it is possible that the new set of parameters is affected by these fluctuations.
Figure 2 shows the comparison between the real and the forecasted values for the multiplicative
non-adaptive technique with α = 0.2, β = 0.1 and γ = 0.1.
7 Conclusion
Holt-Winters exponential smoothing is used when the data exhibits both trend and seasonality.
The two main HW models are the Additive model, for time series exhibiting additive seasonality,
and the Multiplicative model, for time series exhibiting multiplicative seasonality.
The model parameters α, β and γ are initialized using the data of the first two years. The error
measure used for selecting the best parameters is MAPE. In the adaptive HW technique, these
parameters are constantly updated in light of the most recently observed data. The motivation
behind using the adaptive technique, as opposed to the non-adaptive technique, is that the time
series may change its behavior and the model parameters should adapt to this change.
Tests carried out on some standard time series data corroborated this assumption.
The value of L and the look-back size also play an important role in the performance of the
adaptive model; hence these parameters were also varied for forecasting the different time series. It
was observed that the adaptive technique with a larger look back in general improved the results
(as compared to the non-adaptive version). However, the adaptive technique with a smaller look back
often performed worse than the non-adaptive technique. This suggests that re-computing the model
parameters on the basis of only a few of the most recent observations may be an unnecessary overhead
which may lead to poor performance. Hence, the adaptive technique should be combined with a
sufficiently large look-back size in order to obtain good results.
Figure 1: Real vs. forecasted values for d1 using the multiplicative adaptive technique with L = 23 and an infinite look back
Figure 2: Multiplicative non-adaptive technique for red-wine with α = 0.2, β = 0.1 and γ = 0.1