Forecasting Introduction: 1 Forecast Error
Version 1.7
Feb 5, 2004
This introduction will cover basic forecasting methods, how to set the parameters of those
methods, and how to measure forecast accuracy.
We will use the following terminology:
$F_i$: Forecast of demand in period $i$.
$D_i$: Actual demand in period $i$.
$F_{t,i}$: Forecast of what the demand will be in period $i$; forecast was made at time $t$.
1 Forecast Error
We will measure how well a forecast performs using measures of forecast error that you’ve
probably seen before. Define δ (delta) as the difference between the forecast and the actual:
$$\delta_i = F_i - D_i.$$
We can perform a variety of calculations on this number to get a feel for how well our forecast
method is performing.
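$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} |\delta_i|$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\delta_i)^2$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|\delta_i|}{D_i}$$
$$\mathrm{RSFE} = \sum_{i=1}^{n} \delta_i$$
$$\mathrm{TS} = \mathrm{RSFE} / \mathrm{MAD}$$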
MAD is the Mean Absolute Deviation, which tells us the average of the absolute values of the
errors. MSE is the Mean Squared Error, which is the average of the squared errors. MAPE is
the Mean Absolute Percentage Error, which takes the absolute error of each forecast, and divides
it by the value of the demand, to get the error as a percentage of the demand, and then averages
these percentage errors. RSFE is the Running Sum of Forecast Errors. Instead of taking the absolute
value of the errors, the positive and negative numbers are allowed to cancel each other out, if that’s
what happens. Finally, the Tracking Signal (TS) takes the RSFE and divides it by the MAD.
MSE is not as widely used. The MAD gives us a picture of the average amount of the error:
on average we are off by ten units, sometimes high, sometimes low. The RSFE tells us whether
our forecast is biased to always be too high, or always be too low. The analogy I like to make is to
archery. The MAD tells us that we miss the bullseye by 10 inches, on average, but it doesn’t tell
us which way we need to correct. The RSFE tells us that cumulatively, we have missed the target
100 inches to the right; so maybe sometimes we miss to the left, but in general, we miss to the
right a lot more than we miss to the left, so we should correct by aiming a bit more to the left.
If we aren’t consistently shooting in the wrong direction, the RSFE should stick close to zero,
sometimes positive, sometimes negative. If it becomes a large positive or negative number, we need
to correct our forecast. But how big is a big enough error that we should do something about it?
That’s where the TS comes in. If the RSFE gets to be as big as, say, 5 times the MAD, we need to
fix something. So divide the RSFE by the MAD, and that’s the TS. If that gets close to 5 (either
positive or negative), we need to re-evaluate our forecasting method.
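The five measures above can be computed in a few lines. The following is a minimal Python sketch, not part of the original notes; the function name `forecast_errors` and the sample numbers are made up for illustration, and it assumes no demand is zero (otherwise MAPE is undefined).

```python
def forecast_errors(forecasts, demands):
    """Return MAD, MSE, MAPE, RSFE and TS for paired forecasts and demands."""
    if len(forecasts) != len(demands) or not forecasts:
        raise ValueError("need two non-empty lists of equal length")
    # delta_i = F_i - D_i, so a positive error means the forecast was too high.
    deltas = [f - d for f, d in zip(forecasts, demands)]
    n = len(deltas)
    mad = sum(abs(e) for e in deltas) / n
    mse = sum(e * e for e in deltas) / n
    # Assumption: every demand is non-zero, otherwise this division fails.
    mape = sum(abs(e) / d for e, d in zip(deltas, demands)) / n
    rsfe = sum(deltas)
    # If every forecast was exact, MAD is 0 and the tracking signal is undefined.
    ts = rsfe / mad if mad > 0 else float("nan")
    return {"MAD": mad, "MSE": mse, "MAPE": mape, "RSFE": rsfe, "TS": ts}


# Made-up numbers, for illustration only.
forecasts = [100, 100, 100, 100]
demands = [90, 110, 95, 95]
metrics = forecast_errors(forecasts, demands)
print(metrics)
```

In this example the errors are 10, −10, 5 and 5, so the MAD is 7.5 while the RSFE is only 10: the large errors cancel, and the tracking signal stays well below 5.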
2 No Trend or Seasonality
If there is no trend or seasonality to your demand, then every day should be pretty much like any
other day.
numbers that eventually become really old, these old numbers drag down the average and should
not be considered representative of our situation. For that reason, we should throw out some of
the oldest data.
In a moving average forecast, we decide that we are only going to take an average of the n
most recent data points. We might use the last 4 months, or 3 years, whatever number we feel
comfortable with. Our forecast for period t is given by:
$$F_t = \frac{\sum_{i=t-n}^{t-1} D_i}{n} = \frac{\sum (\text{last } n \text{ demands})}{n}.$$
If there truly is no trend to the demand, this won’t work quite as well as the simple average,
but if there is any actual trend to the demand, the moving average will work better than the simple
average.
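As a rough illustration of the moving average, here is a short Python sketch. The function name `moving_average_series` and the demand numbers are hypothetical; the data is deliberately trending upward so the difference from the simple average is visible.

```python
def moving_average_series(demands, n):
    """One-step-ahead moving average forecasts.

    The first entry is the forecast for the period right after the first n
    demands; the last entry is the forecast for the next, not yet observed,
    period.
    """
    if n <= 0 or len(demands) < n:
        raise ValueError("need n > 0 and at least n demands")
    # For each t, average the n demands immediately before period t.
    return [sum(demands[t - n:t]) / n for t in range(n, len(demands) + 1)]


# Made-up demand with a steady upward trend.
demands = [10, 12, 14, 16]
ma_forecasts = moving_average_series(demands, 3)
simple_average = sum(demands) / len(demands)
print(ma_forecasts)    # [12.0, 14.0]
print(simple_average)  # 13.0
```

With the trend in this data, the 3-period moving average forecast for the next period (14.0) sits closer to the recent demands than the simple average of everything (13.0), because the oldest point has been dropped.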
$$F_t = \alpha D_{t-1} + (1 - \alpha) F_{t-1}.$$
As you can see, it is very easy to use. All you need to decide ahead of time is what value to use for
α, and you just need to know what the most recent demand was and what the most recent forecast
was, and that’s all you need to make a new forecast. If α = 0, the forecast never changes, and if α
= 1, we just have the naive method. Usual values are around 0.1 to 0.3.
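The update rule is short enough to sketch directly in Python. This is an illustration only: the function name `exponential_smoothing` is made up, and the starting forecast is an assumption, since the formula needs some first forecast to get going and the choice of seed is left to the user here.

```python
def exponential_smoothing(demands, alpha, initial_forecast):
    """Apply F_t = alpha * D_{t-1} + (1 - alpha) * F_{t-1} period by period.

    Returns one forecast per observed demand plus a forecast for the next
    period. `initial_forecast` is the assumed forecast for the first period.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [initial_forecast]
    for demand in demands:
        # Blend the newest demand with the forecast we had made for it.
        forecasts.append(alpha * demand + (1 - alpha) * forecasts[-1])
    return forecasts


demands = [10, 20, 30]  # made-up numbers
never_changes = exponential_smoothing(demands, 0.0, 10)  # alpha = 0
naive = exponential_smoothing(demands, 1.0, 10)          # alpha = 1
halfway = exponential_smoothing(demands, 0.5, 10)
print(never_changes, naive, halfway)
```

The two extreme cases behave as described: with α = 0 every forecast stays at the starting value, and with α = 1 each forecast is simply the previous period's demand.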
The name comes from the fact that we are “smoothing” the numbers, getting a new estimate
each time by modifying last period’s number. The exponential part is a little trickier to explain,
but here goes. If we make a forecast for period 10, we would write:
$$F_{10} = \alpha D_9 + (1 - \alpha) F_9.$$
But, we could ask ourselves, last month, when we made the forecast for period 9, how did we do
it? We used this formula:
$$F_9 = \alpha D_8 + (1 - \alpha) F_8.$$
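So if you look at where the forecast for period 10 really comes from, if we substitute this expression
in for the $F_9$ part, we could write it like this:
$$F_{10} = \alpha D_9 + (1 - \alpha)\left[\alpha D_8 + (1 - \alpha) F_8\right].$$
We can keep going, because $F_8$ was itself made the same way:
$$F_8 = \alpha D_7 + (1 - \alpha) F_7.$$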
What you see then is that the new forecast that we so easily make for period 10 is really a weighted
sum of the past demand figures. Each demand number is getting a different amount of weight, so
this really is a weighted moving average. But how do the weights change? Since α is between 0
and 1, (1 − α) must also be a number between 0 and 1. Any time you multiply together numbers
between 0 and 1, the result is a smaller number. (Try it.) So the weights get smaller exponentially,
and hence the name.
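To see the shrinking weights concretely, the sketch below unrolls the recursion in Python and checks that it matches the step-by-step forecast. The helper name `smoothing_weights`, the demand numbers, and the starting forecast are all made up for illustration. The weight on the demand from $j+1$ periods back works out to $\alpha(1-\alpha)^j$.

```python
def smoothing_weights(alpha, k):
    """Weights on the k most recent demands, newest first: alpha * (1 - alpha)**j."""
    return [alpha * (1 - alpha) ** j for j in range(k)]


alpha = 0.2
past_demands = [50, 60, 55, 65]  # made-up demands, oldest first
oldest_forecast = 58.0           # assumed forecast for the oldest period

# Step-by-step version: apply the smoothing formula once per period.
recursive = oldest_forecast
for demand in past_demands:
    recursive = alpha * demand + (1 - alpha) * recursive

# Unrolled version: weighted sum of demands (newest first) plus what is
# left of the oldest forecast after being multiplied by (1 - alpha) each period.
weights = smoothing_weights(alpha, len(past_demands))
unrolled = sum(w * d for w, d in zip(weights, reversed(past_demands)))
unrolled += (1 - alpha) ** len(past_demands) * oldest_forecast

print(weights)
print(recursive, unrolled)
```

The weights come out as roughly 0.2, 0.16, 0.128, 0.1024: each one is the previous weight times (1 − α), which is the exponential decay the name refers to.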
$$F_t = \alpha D_{t-1} + (1 - \alpha) F_{t-1}.$$
The new demand gives us another data point about what the intercept might be, and the old
forecast is our old estimate of the intercept.
We can (and we are going to) use this idea to update estimates of other things, like a trend.
Although the formula will look different, the idea is the same:
4 Setting Parameters
How should we choose the best values for parameters such as α or β? How do we make our forecasting methods
get the best possible results? The idea is simple: try different parameter values until you get the
MAD (or MSE, or whatever you want to focus on) as small as possible. You could do this by hand,
but using a spreadsheet or some other computer tool is really the only way to do it quickly.
However, there is a risk that by doing this, you will be cooking your method to fit the past
data perfectly, but that doesn’t mean it will work anywhere near that well for the future. Continuing
with baseball analogies, if you could throw the exact same pitch 100 times (I could just stand
there and rewind the ball like it was a video), I ought to eventually be able to hit a home run. But
that doesn’t mean I’m going to hit a home run when I let you throw whatever you want the
next time.
To get around that, a more trusted method is to take one part of your data for tweaking the
parameters, probably the first half or two thirds of the data. Then look to see how the method
performs on the remainder of your data. This is a more accurate portrayal of how it would do once
you gave it some new data.
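One way to sketch this tune-then-check procedure in Python is shown below, using exponential smoothing and the MAD. Everything here is illustrative: the function names, the made-up demand series, the grid of α values in steps of 0.05, the two-thirds split, and the choice to seed the first forecast with the first demand are all assumptions.

```python
def smoothing_mad(demands, alpha, start_forecast):
    """Run exponential smoothing over demands; return (MAD, next forecast)."""
    forecast = start_forecast
    total_abs_error = 0.0
    for demand in demands:
        total_abs_error += abs(forecast - demand)
        forecast = alpha * demand + (1 - alpha) * forecast
    return total_abs_error / len(demands), forecast


def tune_alpha(demands):
    """Pick alpha on the first two thirds of the data, then score the rest."""
    split = (2 * len(demands)) // 3
    train, holdout = demands[:split], demands[split:]
    start = train[0]  # assumption: seed the first forecast with the first demand
    candidates = [k / 20 for k in range(21)]  # 0.00, 0.05, ..., 1.00
    best = min(candidates, key=lambda a: smoothing_mad(train, a, start)[0])
    train_mad, carried_forecast = smoothing_mad(train, best, start)
    # Score the held-out data with alpha frozen, continuing from where training ended.
    holdout_mad, _ = smoothing_mad(holdout, best, carried_forecast)
    return best, train_mad, holdout_mad


demands = [20, 22, 21, 25, 24, 27, 26, 30, 29]  # made-up numbers
best_alpha, train_mad, holdout_mad = tune_alpha(demands)
print(best_alpha, train_mad, holdout_mad)
```

The MAD on the held-out portion is the number to trust. The MAD on the tuning portion is optimistic by construction, since α was chosen to make it small.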
formulas. But if you get it all in correctly, the spreadsheet will update automatically when you
enter new data points.
Secondly, you could also use the “Data Analysis ToolPak.” You should find that under “Tools >
Data Analysis.” If it is not there, go to “Tools > Add-Ins.” In that dialog box, check the box by
Analysis ToolPak. If that does not appear in the dialog box, you need to get out your CDs and
install that part of Excel. After you go to “Tools > Data Analysis,” a dialog box comes up, where
you tell it which cells are the X’s, and which are the Y’s. Tell it where to put the output, but be
careful: if you put it on the sheet you are working on, the following 18 rows will get written over
with the output.
In that output, you will want to look at where the “Intercept” row and the “Coefficient” column
intersect. That is the intercept. The slope is the row below that, in the “X Variable” column, where
it meets the “Coefficient” column. R-squared is in the second row of numbers, under “R Square.”
One problem with this method is that when you add a new data point to the spreadsheet, you
have to go back up to “Tools > Data Analysis” every time to re-run the LR. The spreadsheet can’t
update automatically.
Thirdly, another way to get the intercept and slope is to create a graph of the data. Right click
on the data line, and select “Add Trendline.” In that dialog box, add a linear trendline, and under
“Options,” you can have the equation of the trendline displayed on the graph, and also R-squared.
The trouble with doing the LR in the graph is that you can’t make use of the numbers that appear
in the graph in any calculations.
Finally, the best way to do the LR is to use the SLOPE and INTERCEPT functions. Note that both
take the y values first: SLOPE(range of y values, range of x values) gives you the slope, and
INTERCEPT(y values, x values) gives the intercept. To find out R-Squared, use RSQ(y values, x values).
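For readers working outside a spreadsheet, the same three numbers can be computed by hand. The sketch below is a plain least-squares fit in Python, offered as an illustration rather than a reproduction of Excel's internals; the function name `linear_regression` and the data are made up.

```python
def linear_regression(xs, ys):
    """Least-squares slope, intercept and R-squared for paired x and y values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared is undefined when y never varies; report NaN in that case.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


periods = [1, 2, 3, 4, 5]       # x values
demands = [2, 4, 5, 4, 5]       # made-up y values
slope, intercept, r_squared = linear_regression(periods, demands)
forecast_period_6 = intercept + slope * 6
print(slope, intercept, r_squared, forecast_period_6)
```

For this data the slope is 0.6, the intercept is 2.2 and R-squared is 0.6, so the forecast for period 6 is 2.2 + 0.6 × 6 = 5.8.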
2. Compute a new, smoothed estimate of the trend
3. Use these two new terms to predict the demand for period i + 1:
$$TAF_{i+1} = S_i + T_i.$$
If you want to make a forecast for the period after that, just add on another period’s worth of
growth:
$$TAF_{i+n} = S_i + n\,T_i.$$
This follows the same scheme that the $S_i$ formula follows. $(S_i - S_{i-1})$ shows us how much the
intercept has changed recently, so it is the new information. We’ll give it the weight of β, and give
the old estimate of the trend the weight of (1 − β):
$$T_i = \beta (S_i - S_{i-1}) + (1 - \beta) T_{i-1}.$$
If we take this, and multiply out the parentheses (using the distributive property), we get:
$$T_i = \beta S_i - \beta S_{i-1} + T_{i-1} - \beta T_{i-1} = T_{i-1} + \beta S_i - \beta (S_{i-1} + T_{i-1}).$$
Notice that the last part of this, $S_{i-1} + T_{i-1}$, is just $TAF_i = S_{i-1} + T_{i-1}$. If we write it that way,
we get
$$T_i = T_{i-1} + \beta S_i - \beta\, TAF_i.$$
If we put the last two terms together, we get the following, which is the original equation:
$$T_i = T_{i-1} + \beta (S_i - TAF_i).$$
So even though the two equations look different, they are equivalent.
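The trend-adjusted forecast can be sketched in Python as follows. Two things here are assumptions rather than something stated in the surviving text: the intercept update is taken to be the usual form $S_i = \alpha D_i + (1-\alpha)\,TAF_i$, and the initial intercept and slope are treated as the values in place before the first period. All function names and numbers are made up for illustration.

```python
def trend_weighted_form(s_new, s_old, t_old, beta):
    """T_i = beta * (S_i - S_{i-1}) + (1 - beta) * T_{i-1}."""
    return beta * (s_new - s_old) + (1 - beta) * t_old


def trend_error_form(s_new, taf, t_old, beta):
    """T_i = T_{i-1} + beta * (S_i - TAF_i)."""
    return t_old + beta * (s_new - taf)


def double_exponential_smoothing(demands, alpha, beta, initial_intercept, initial_slope):
    """Return (forecasts, final intercept, final slope).

    `forecasts` has one trend-adjusted forecast per observed period plus one
    for the next period.
    """
    s, t = initial_intercept, initial_slope
    forecasts = []
    for demand in demands:
        taf = s + t                                  # TAF_i = S_{i-1} + T_{i-1}
        forecasts.append(taf)
        s_new = alpha * demand + (1 - alpha) * taf   # assumed intercept update
        t = trend_error_form(s_new, taf, t, beta)
        s = s_new
    forecasts.append(s + t)                          # forecast for the next period
    return forecasts, s, t


demands = [10, 12]  # made-up numbers
forecasts, s_last, t_last = double_exponential_smoothing(demands, 0.5, 0.5, 8, 2)
three_periods_ahead = s_last + 3 * t_last            # S_i + n * T_i with n = 3
print(forecasts, three_periods_ahead)
```

The two helper functions implement the two ways of writing the trend update; given the same inputs they return the same number, which is the equivalence shown algebraically above.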
6 Problems
1. Explain the difference between bias and deviation.
3. Using the following data, create a forecast for each period, using the following methods:
(a) 3 period Moving Average
(b) Weighted Moving Average - 5 periods, you choose the weights
(c) Exponential smoothing - Compute the MAD of it, and play around with the alpha to
try to get the MAD as small as you can get it.
(d) Naïve method
(e) Plot the demands and all of the forecasts on a graph. Which method seemed to work
the best?
4. Using the data for Problem 4, below, create a forecast for each period.
(a) Create the forecasts using Linear Regression. What is the $R^2$ value? What is the MAD
from your forecasts?
(b) Create the forecast for each period using double exponential smoothing. Use α = 0.2,
β = 0.15, and use 55 as the initial intercept, and 4 as the initial slope. What is the
MAD of your forecasts?
Period   Problem 3 Demand   Problem 4 Demand
     1                100                 50
     2                 98                 65
     3                105                 72
     4                102                 69
     5                103                 78
     6                105                 65
     7                108                 78
     8                115                 84
     9                124                 79
    10                120                 64
    11                115                 89
    12                119                 84
    13                126                 88
    14                132                 94
    15                145                 83
    16                129                 84
    17                135                 91
    18                142                104
    19                134                100
    20                154                103