Forecasting Introduction: 1 Forecast Error

The document provides an introduction to basic forecasting methods including calculating forecast error, naive forecasting, simple average forecasting, moving average forecasting, weighted moving average forecasting, and exponential smoothing forecasting. It defines key terms, explains how to calculate common error metrics like MAD, MSE, MAPE, and RSFE/TS, and outlines the basic formulas and assumptions for each forecasting method with a focus on situations without trend or seasonality.

Uploaded by

Sayed Wahab
Copyright
© Attribution Non-Commercial (BY-NC)
Forecasting Introduction

Version 1.7

Dr. Ron Tibben-Lembke

Feb 5, 2004

This introduction will cover basic forecasting methods, how to set the parameters of those
methods, and how to measure forecast accuracy.
We will use the following terminology:
$F_i$      Forecast of demand in period i.
$D_i$      Actual demand in period i.
$F_{t,i}$  Forecast of what the demand will be in period i; forecast was made at time t.

1 Forecast Error
The way we will measure how well a forecast performs is by using measures of forecast error that
you’ve probably seen before. Define δ (delta) as the difference between the forecast and the actual:

$$\delta_i = F_i - D_i.$$

We can perform a variety of calculations on this number to get a feel for how well our forecast
method is performing.
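$$\text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |\delta_i|$$
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\delta_i)^2$$
$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\delta_i|}{D_i}$$
$$\text{RSFE} = \sum_{i=1}^{n} \delta_i$$
$$\text{TS} = \text{RSFE} / \text{MAD}$$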
MAD is the Mean Absolute Deviation, which tells us the average of the absolute values of the
errors. MSE is the Mean Squared Error, which is the average of the squared errors. MAPE is
the Mean Absolute Percentage Error, which takes the absolute error of each forecast, and divides
it by the value of the demand, to get the error as a percentage of the demand, and then averages
these percentage errors. RSFE is the running sum of forecast errors. Instead of taking the absolute
value of the errors, the positive and negative numbers are allowed to cancel each other out, if that’s
what happens. Finally, the Tracking Signal (TS) takes the RSFE and divides it by the MAD.
MSE is not as widely used. The MAD gives us a picture of the average amount of the error:
on average we are off by ten units, sometimes high, sometimes low. The RSFE tells us whether
our forecast is biased to always be too high, or always be too low. The analogy I like to make is to
archery. The MAD tells us that we miss the bullseye by 10 inches, on average, but it doesn’t tell
us which way we need to correct. The RSFE tells us that cumulatively, we have missed the target
100 inches to the right; so maybe sometimes we miss to the left, but in general, we miss to the
right a lot more than we miss to the left, so we should correct by aiming a bit more to the left.
If we aren’t consistently shooting in the wrong direction, the RSFE should stick close to zero,
sometimes positive, sometimes negative. If it becomes a large positive or negative number, we need
to correct our forecast. But how big is a big enough error that we should do something about it?
That’s where the TS comes in. If the RSFE gets to be as big as say, 5 times the MAD, we need to
fix something. So divide the RSFE by the MAD, and that’s the TS. If that gets close to 5, (either
positive or negative) we need to re-evaluate our forecasting method.
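
To make the error measures concrete, here is a minimal Python sketch that computes all five of them from a list of forecasts and a list of actual demands. The function name `forecast_errors` and the dictionary it returns are my own choices for illustration, and the sketch assumes every demand is greater than zero so that MAPE can be computed.

```python
def forecast_errors(forecasts, demands):
    """Return MAD, MSE, MAPE, RSFE and TS for paired forecasts and demands."""
    if len(forecasts) != len(demands) or not forecasts:
        raise ValueError("need two non-empty lists of equal length")
    # delta_i = F_i - D_i, so a positive error means we forecast too high
    deltas = [f - d for f, d in zip(forecasts, demands)]
    n = len(deltas)
    mad = sum(abs(e) for e in deltas) / n
    mse = sum(e * e for e in deltas) / n
    # Assumption: every demand is positive, otherwise MAPE is undefined
    mape = sum(abs(e) / d for e, d in zip(deltas, demands)) / n
    rsfe = sum(deltas)  # signs are kept, so errors can cancel out
    ts = rsfe / mad if mad else 0.0  # a perfect forecast gets a TS of zero
    return {"MAD": mad, "MSE": mse, "MAPE": mape, "RSFE": rsfe, "TS": ts}


if __name__ == "__main__":
    # Made-up numbers: three forecasts against a steady demand of 100
    print(forecast_errors([110, 90, 105], [100, 100, 100]))
```

With these made-up numbers the errors are 10, -10 and 5, so the RSFE is 5 even though the MAD is a bit over 8: most of the error cancels out, which is what an unbiased forecast should look like.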

2 No Trend or Seasonality
If there is no trend or seasonality to your demand, then every day should be pretty much like any
other day.

2.1 Naive Method


With the naive method, your forecast for any day is that it will be exactly like what happened the
day before:
$$F_t = D_{t-1}.$$
This seems so simple that it would seem like it could never work very well, but it does surprisingly
well in a lot of different circumstances.
If demand goes up and down randomly a little bit each day, then the naive method seems like
it would be at a bit of a disadvantage. If demand is a little below average one day, it’s likely that
it will be closer to the usual, or maybe a little above the usual the next day, so yesterday’s low
number would make a forecast that is too low.
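
As a small sketch of the naive method in Python (the function name `naive_forecast` is just an illustrative choice), the forecast list is the demand list shifted by one period. The first period gets `None`, because there is no earlier demand to copy.

```python
def naive_forecast(demands):
    """Forecast for each period is the demand of the period before it."""
    # The first period has no previous demand, so it gets no forecast
    return [None] + list(demands[:-1])


if __name__ == "__main__":
    print(naive_forecast([100, 98, 105]))
```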

2.2 Simple Average


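With a simple average, our forecast for the next day is just the long-term average of all of the sales
data we have. When we forecast period t, we have seen t − 1 demands:
$$F_t = \frac{\text{Sum of all } t-1 \text{ demands so far}}{t-1} = \frac{\sum_{i=1}^{t-1} D_i}{t-1}.$$
Average all of the demand information you have.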
The argument that can be made in favor of this method is that the more data you get, the
more your estimation of the true average gets closer to the real thing. To compare it to baseball,
if a player gets 2 hits in 5 at-bats, that would be a 0.400 batting average. But to really know if a
player is capable of hitting 0.400, we need to see how well the player does over a whole season. By
the time we’ve seen the player in action for a whole season, we really know how good the player
is. The same argument could be made for the simple average: the more sales numbers you get, the
better your estimate really fits the reality.
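
Here is a minimal Python sketch of the simple average, assuming the demands come as a list in period order. The name `simple_average_forecast` is mine. It keeps a running total so that each forecast is the average of every demand seen before that period.

```python
def simple_average_forecast(demands):
    """Forecast for each period is the average of all earlier demands."""
    forecasts = [None]  # nothing to average before the first period
    running_total = 0.0
    for count, demand in enumerate(demands[:-1], start=1):
        running_total += demand
        forecasts.append(running_total / count)
    return forecasts


if __name__ == "__main__":
    print(simple_average_forecast([100, 98, 105, 102]))
```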

2.3 Moving Average


In theory, the simple average works wonderfully for demand that has no growth over time, but in
reality, everyone really has some increase or decrease over time. Because you are averaging some
numbers that eventually become really old, these old numbers drag down the average and should
not be considered representative of our situation. For that reason, we should throw out some of
the oldest data.
In a moving average forecast, we decide that we are only going to take an average of the n
most recent data points. We might use the last 4 months, or 3 years, whatever number we feel
comfortable with. Our forecast for period t is given by:
$$F_t = \frac{\text{Sum of last } n \text{ demands}}{n} = \frac{\sum_{i=t-n}^{t-1} D_i}{n}.$$
If there truly is no trend to the demand, this won’t work quite as well as the simple average,
but if there is any actual trend to the demand, the moving average will work better than the simple
average.
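
A moving average is only a few lines in Python. This is a sketch under the assumption that the demands are a list in period order; the name `moving_average_forecast` is illustrative. Periods that do not yet have n earlier demands get `None`.

```python
def moving_average_forecast(demands, n):
    """Forecast for each period is the average of the n demands before it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    forecasts = []
    for t in range(len(demands)):
        if t < n:
            forecasts.append(None)  # not enough history yet
        else:
            forecasts.append(sum(demands[t - n:t]) / n)
    return forecasts


if __name__ == "__main__":
    print(moving_average_forecast([100, 98, 105, 102, 103], n=3))
```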

2.4 Weighted Moving Average


By switching to the moving average, we have solved the problem the simple average had of consid-
ering too much data. But there still could be a problem in that all of the data we are considering is
getting an equal amount of weight. In some cases, people argue that the most recent information
should get the most consideration in our calculation, and that older information should not get as
much consideration.
A way to fix this is to give each demand point a different amount of weight, and give more weight
to the most recent data points, and less weight to older points. If we are using m periods in our
calculation, we will give a weight of $b_j$ to each period. The oldest data point gets a weight of $b_1$, and
the most recent gets a weight of $b_m$. Usually, people set the weights so that $b_1 \le b_2 \le b_3 \le \dots \le b_m$.
If all of the weights do not sum to 1.0, it won’t really be an average. So to easily make the forecast
a proper average, we will multiply each data point by its relevant weight, and then divide by the
sum of the weights:
$$F_t = \frac{\sum_{i=1}^{m} b_{1+m-i} D_{t-i}}{\sum_{i=1}^{m} b_i}.$$
Although the formula looks complicated, the idea is still relatively simple: if you are including m
periods in your calculation, multiply the oldest data point by $b_1$ and the newest one by $b_m$. Add
them up, and divide by the sum of the $b_i$s.
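
The weighted moving average can be sketched in Python as follows. The function name is mine, and I am assuming the weights are passed oldest first, so the first weight plays the role of b1 and the last one goes with the most recent demand.

```python
def weighted_moving_average_forecast(demands, weights):
    """Weighted average of the last len(weights) demands, oldest weight first."""
    m = len(weights)
    total_weight = sum(weights)
    if m < 1 or total_weight == 0:
        raise ValueError("need at least one weight and a non-zero weight sum")
    forecasts = []
    for t in range(len(demands)):
        if t < m:
            forecasts.append(None)  # not enough history yet
        else:
            window = demands[t - m:t]  # oldest demand first, newest last
            weighted = sum(b * d for b, d in zip(weights, window))
            # Dividing by the weight sum makes it a proper average
            forecasts.append(weighted / total_weight)
    return forecasts


if __name__ == "__main__":
    print(weighted_moving_average_forecast([100, 98, 105, 102], [1, 2, 3]))
```

Because the result is divided by the sum of the weights, weights of 1, 2, 3 give the same forecast as weights of 1/6, 2/6, 3/6.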

2.5 Exponential Smoothing


A more accurate name for this method would be ’exponentially weighted moving average.’ The
idea here is that we will build on the weighted moving average method, but instead of having to
choose the parameters bi for each period, we will only have to choose one parameter, α, (alpha),
which is a number between 0 and 1. This forecast is also easy to write the equation for:

$$F_t = \alpha D_{t-1} + (1 - \alpha) F_{t-1}.$$

We can also write the same thing as:

$$F_t = \alpha (D_{t-1} - F_{t-1}) + F_{t-1}.$$

As you can see, it is very easy to use. All you need to decide ahead of time is what value to use for
α, and you just need to know what the most recent demand was and what the most recent forecast
was, and that’s all you need to make a new forecast. If α = 0, the forecast never changes, and if α
= 1, we just have the naive method. Usual values are around 0.1 to 0.3.
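
Here is a minimal Python sketch of the exponential smoothing update. The formula needs a forecast for the very first period to get started, and the notes do not say how to pick it, so the sketch assumes the first forecast equals the first demand unless you pass `initial_forecast`. Both that parameter and the function name are my own.

```python
def exponential_smoothing_forecast(demands, alpha, initial_forecast=None):
    """Return one forecast per period using F_t = a*D_{t-1} + (1-a)*F_{t-1}."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    # Assumption: with no starting forecast given, use the first demand
    forecast = demands[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for demand in demands[:-1]:
        forecast = alpha * demand + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


if __name__ == "__main__":
    print(exponential_smoothing_forecast([100, 98, 105], alpha=0.2))
```

Trying alpha values of 0 and 1 shows the two extremes: the forecast that never changes, and the naive method.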
The name comes from the fact that we are “smoothing” the numbers, getting a new estimate
each time by modifying last period’s number. The exponential part is a little trickier to explain,
but here goes. If we make a forecast for period 10, we would write:

$$F_{10} = \alpha D_9 + (1 - \alpha) F_9.$$

But, we could ask ourselves, last month, when we made the forecast for period 9, how did we do
it? We used this formula:
$$F_9 = \alpha D_8 + (1 - \alpha) F_8.$$
So if you look at where the forecast for period 10 really comes from, if we substitute this expression
in for the F9 part, we could write it like this:

$$F_{10} = \alpha D_9 + (1 - \alpha) \left[ \alpha D_8 + (1 - \alpha) F_8 \right].$$

$$F_{10} = \alpha D_9 + \alpha (1 - \alpha) D_8 + (1 - \alpha)^2 F_8.$$


But where did F8 come from? Again, the same formula:

$$F_8 = \alpha D_7 + (1 - \alpha) F_7.$$

Substituting that in and arranging, we get:

$$F_{10} = \alpha D_9 + \alpha (1 - \alpha) D_8 + \alpha (1 - \alpha)^2 D_7 + (1 - \alpha)^3 F_7.$$

We could play this game a few more times and get:

$$F_{10} = \alpha D_9 + \alpha (1 - \alpha) D_8 + \alpha (1 - \alpha)^2 D_7 + \alpha (1 - \alpha)^3 D_6 + \alpha (1 - \alpha)^4 D_5 + \alpha (1 - \alpha)^5 D_4 + \dots$$

What you see then is that the new forecast that we so easily make for period 10 is really the sum
of the past demand figures. Each demand number is getting a different amount of weight, so what
this really is is a weighted moving average. But how do the weights change? Since α is between 0
and 1, (1 − α) must also be a number between 0 and 1. Any time you multiply together numbers
between 0 and 1, the result is a smaller number. (Try it.) So the weights get smaller exponentially,
and hence the name.
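
If you would rather see the weights than take my word for it, the following Python sketch lists them and rebuilds the forecast as a weighted sum. The function names are mine. One detail the expansion above trails off before reaching: if you stop substituting at some starting forecast, that starting forecast keeps a leftover weight of (1 − α) raised to the number of demands used, and the sketch includes that term.

```python
def smoothing_weights(alpha, periods):
    """Weight on the demand k periods back is alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(periods)]


def expanded_forecast(demands, alpha, initial_forecast):
    """Next-period forecast written out as a weighted sum of past demands."""
    weights = smoothing_weights(alpha, len(demands))
    newest_first = list(reversed(demands))  # the newest demand gets the biggest weight
    weighted_sum = sum(w * d for w, d in zip(weights, newest_first))
    # The starting forecast keeps whatever weight has not been handed out
    leftover = (1 - alpha) ** len(demands) * initial_forecast
    return weighted_sum + leftover


if __name__ == "__main__":
    print(smoothing_weights(0.2, 5))
    print(expanded_forecast([100, 98, 105], alpha=0.2, initial_forecast=100))
```

The second number printed should match what you get by applying the one-line smoothing formula three times in a row, starting from a forecast of 100.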

3 What is Exponential Smoothing?


The basic idea in exponential smoothing is that we take an average of our old estimate of some
quantity, and some new information about that quantity. In exponential smoothing, we are as-
suming that there is no growth, no trend to the data. So every period, we are just making new
estimates of the intercept.

$$F_t = \alpha D_{t-1} + (1 - \alpha) F_{t-1}.$$

The new demand gives us another data point about what the intercept might be, and the old
forecast is our old estimate of the intercept.
We can (and we are going to) use this idea to update estimates of other things, like a trend.
Although the formula will look different, the idea is the same:

New Estimate = α · New information + (1 − α) · Old Estimate.

4 Setting Parameters
How should we choose the best parameters of α or b? How do we make our forecasting methods
get the best possible results? The idea is simple: try different parameter values until you get the
MAD (or MSE, or whatever you want to focus on) as small as possible. You could do this by hand,
but using a spreadsheet or some other computer tool is really the only way to do it quickly.
However, there is a risk that by doing this, you will be cooking your method to fit the past
data perfectly, but that doesn’t mean it will work anywhere near that well for the future. Continuing
with baseball analogies, if you could throw exactly the same pitch 100 times (I could just stand
there and rewind the ball like it was a video), I ought to eventually be able to hit a home run. But
that doesn’t mean I’m going to hit a home run when I let you throw whatever you want the
next time.
To get around that, a more trusted method is to take one part of your data for tweaking the
parameters, probably the first half or two thirds of the data. Then look to see how the method
performs on the remainder of your data. This is a more accurate portrayal of how it would do once
you gave it some new data.
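
The tuning idea can be sketched in Python as a simple search over candidate values of α, fitted on the first two thirds of the data and then checked on the rest. Everything here is an illustrative assumption rather than a prescribed procedure: the function names, the grid of candidates from 0.01 to 0.99, the two-thirds split, and starting the forecast at the first demand.

```python
def smoothing_mad(demands, alpha, start=0, initial_forecast=None):
    """MAD of exponential smoothing forecasts, counting periods from `start` on."""
    # Assumption: the first forecast equals the first demand unless given
    forecast = demands[0] if initial_forecast is None else initial_forecast
    abs_errors = []
    for period, demand in enumerate(demands):
        if period >= start:
            abs_errors.append(abs(forecast - demand))
        forecast = alpha * demand + (1 - alpha) * forecast
    return sum(abs_errors) / len(abs_errors)


def tune_alpha(demands, train_fraction=2 / 3, candidates=None):
    """Pick the alpha with the smallest MAD on the early data, then test it."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    split = int(len(demands) * train_fraction)
    train = demands[:split]
    best_alpha = min(candidates, key=lambda a: smoothing_mad(train, a))
    train_mad = smoothing_mad(train, best_alpha)
    # Keep smoothing through the whole series, but only score the held-out part
    holdout_mad = smoothing_mad(demands, best_alpha, start=split)
    return best_alpha, train_mad, holdout_mad


if __name__ == "__main__":
    data = [100, 98, 105, 102, 103, 105, 108, 115, 124, 120,
            115, 119, 126, 132, 145, 129, 135, 142, 134, 154]
    print(tune_alpha(data))
```

If the MAD on the held-out part is much worse than the MAD on the tuning part, that is a sign the parameter was cooked to fit the past.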

5 Forecasting with a Trend


When our demand has a trend, there are two main methods that we can use.

5.1 Linear Regression


I assume that you are all familiar with linear regression from your statistics classes. Basically, we
assume that there is a linear relationship between one output (dependent) variable, Y , and the
input (independent) variable, X. In our case, we will be looking at the independent variable as
being time, t, and we think that demand is generally growing over time. We do a linear regression
to get a formula like this:
$$Y(t) = a + bt.$$
For any time value, t, we put it into the equation, and get a straight-line forecast of the demand
for that period.
How well the data are approximated by the line is represented in the term $R^2$. $R^2$ can be
literally interpreted as the “percentage of changes in Y that can be explained by changes in X.”
To do a linear regression in Excel, there are four ways you could do it, presented in order from
the one I like the least to the one I think is the best.
First, you could dust off your statistics book and type in the formulas from it. That sounds
like a lot of work, and there are lots of opportunities to make a mistake when typing in those big
formulas. But if you get it all in correctly, the spreadsheet will update automatically when you
enter new data points.
Secondly, you could also use the “Data Analysis ToolPak.” You should find that under “Tools —
Data Analysis.” If it is not there, go to “Tools — Add-Ins.” In that dialog box, check the box by
Analysis ToolPak. If that does not appear in the dialog box, you need to get out your CDs and
install that part of Excel. After you go to “Tools — Data Analysis,” a dialog box comes up, where
you tell it which cells are the X’s, and which are the Y’s. Tell it where to put the output, but be
careful. If you put it on the sheet you are working on, the following 18 rows will get written over
with the output.
In that output, you will want to look at where the “Intercept” row and the “Coefficient” column
intersect. That is the intercept. The slope is the row below that, in the “X Variable” column, where
it meets the “Coefficient” column. R-squared is in the second row of numbers, under “R Square.”
One problem with this method is that when you add a new data point to the spreadsheet, you
have to go back up to “Tools — Data Analysis” every time to re-run the LR. The spreadsheet can’t
update automatically.
Thirdly, another way to get the intercept and slope is to create a graph of the data. Right click
on the data line, and select “Add Trendline.” In that dialog box, add a linear trendline, and under
“Options,” you can have the equation of the trendline displayed on the graph, and also R-squared.
The trouble with doing the LR in the graph, is that you can’t make use of the numbers that appear
in the graph in any calculations.
Finally, the best way to do the LR is to use the SLOPE and INTERCEPT functions. SLOPE(range
of y values, range of x values) gives you the slope, and INTERCEPT(y values, x values) gives the
intercept. To find out R-Squared, use RSQ(y values, x values). Note that in all three functions
the y values come first.
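
Outside of Excel, the same three numbers can be computed directly from the textbook least-squares formulas. This is a minimal Python sketch with names of my own choosing. It assumes at least two distinct time values and demands that are not all identical, since otherwise the divisions are undefined.

```python
def linear_regression(times, demands):
    """Return (intercept a, slope b, R-squared) for the line Y(t) = a + b*t."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(demands) / n
    # Sums of squared and cross deviations from the means
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, demands))
    s_dd = sum((d - mean_d) ** 2 for d in demands)
    slope = s_td / s_tt
    intercept = mean_d - slope * mean_t
    # With a single input variable, R-squared is the squared correlation
    r_squared = s_td ** 2 / (s_tt * s_dd)
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Made-up demand that grows by exactly 2 each period
    a, b, r2 = linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(a, b, r2)
    print("forecast for period 5:", a + b * 5)
```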

5.2 Double Exponential Smoothing


The only problem with Linear Regression is that it gives all the demand points equal weight when
trying to fit a line. Really, we would like it to try hardest to fit the line to the most recent
data points, and not worry quite so much about fitting the line to the oldest data points. Linear
regression cannot do that.
However, we do remember that exponential smoothing had that type of behavior: give the most
weight to the most recent. There is a way we can adapt exponential smoothing to work with a
trend. This is also known as Holt’s Method. We will define two terms:
$S_i$ our estimate of the level, or intercept, at time i.
$T_i$ our estimate of the trend in period i.
$\mathrm{TAF}_{i+1} = S_i + T_i$, our estimate of what the demand will actually be in period i + 1.
In each period, we will revise our estimate of the level, and our estimate of the trend. To do
that, we will use two different smoothing constants, α, and β. Like α, β is a number between 0.1
and 0.3, usually.

1. At the end of a period, compute a new intercept:

$$S_i = \mathrm{TAF}_i + \alpha (D_i - \mathrm{TAF}_i).$$

We can also write this as:

$$S_i = \alpha D_i + (1 - \alpha) \, \mathrm{TAF}_i.$$

2. Compute a new, smoothed estimate of the trend:

$$T_i = T_{i-1} + \beta (S_i - \mathrm{TAF}_i).$$

3. Use these two new terms to predict the demand for period i + 1:

$$\mathrm{TAF}_{i+1} = S_i + T_i.$$

If you want to make a forecast for n periods ahead, just add on n periods’ worth of
growth:
$$\mathrm{TAF}_{i+n} = S_i + T_i \cdot n.$$
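
The three steps translate almost line for line into Python. This is a sketch with names of my own choosing, and it makes one assumption: the starting values you pass in are treated as the level and trend at time 0, so the first forecast is simply their sum. The numbers in the example are made up for illustration.

```python
def holt_forecast(demands, alpha, beta, initial_level, initial_trend):
    """Return the forecast for every period, plus the final level and trend."""
    # Assumption: the starting values are the level and trend at time 0
    level, trend = initial_level, initial_trend
    forecasts = []
    for demand in demands:
        taf = level + trend                    # forecast made before seeing demand
        forecasts.append(taf)
        level = taf + alpha * (demand - taf)   # step 1: new intercept S_i
        trend = trend + beta * (level - taf)   # step 2: new trend T_i
    return forecasts, level, trend


if __name__ == "__main__":
    forecasts, level, trend = holt_forecast(
        [110, 108], alpha=0.2, beta=0.1, initial_level=100, initial_trend=5
    )
    print(forecasts)
    # Step 3, looking n periods past the last demand
    for n in (1, 2, 3):
        print("forecast", n, "periods ahead:", level + trend * n)
```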

5.3 Is this Right?


When I first saw the equation for the trend, I thought “why doesn’t it look more like the equation
for the intercept?” I expected something like α times the new information, plus (1 − α) times the
previous estimate. It seemed to me that it should really look like

$$T_i = \beta (S_i - S_{i-1}) + (1 - \beta) T_{i-1}.$$

This follows the same scheme that the Si formula follows. (Si − Si−1 ) shows us how much the
intercept has changed recently, so it is the new information. We’ll give it the weight of β, and give
the old estimate of the trend the weight of (1 − β).
If we take this, and multiply out the parentheses (using the distributive property) we get:

$$T_i = \beta S_i - \beta S_{i-1} + T_{i-1} - \beta T_{i-1}.$$

We can rearrange this to be

$$T_i = T_{i-1} + \beta S_i - \beta (S_{i-1} + T_{i-1}).$$

Notice that the last part of this, $S_{i-1} + T_{i-1}$, is just $\mathrm{TAF}_i = S_{i-1} + T_{i-1}$. If we write it that way,
we get
$$T_i = T_{i-1} + \beta S_i - \beta \, \mathrm{TAF}_i.$$
If we put the last two terms together, we get the following, which is the original equation:

$$T_i = T_{i-1} + \beta (S_i - \mathrm{TAF}_i).$$

So even though the two equations look different, they are equivalent.
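
If the algebra leaves you unconvinced, a quick numerical check helps. This Python sketch (function names and numbers are made up for illustration) computes the trend both ways from the same inputs. The two results should agree up to floating-point rounding, as long as the forecast really is the previous level plus the previous trend.

```python
def trend_update_original(prev_trend, beta, level, taf):
    """T_i = T_{i-1} + beta * (S_i - TAF_i)."""
    return prev_trend + beta * (level - taf)


def trend_update_weighted(prev_trend, beta, level, prev_level):
    """T_i = beta * (S_i - S_{i-1}) + (1 - beta) * T_{i-1}."""
    return beta * (level - prev_level) + (1 - beta) * prev_trend


if __name__ == "__main__":
    prev_level, prev_trend, beta = 100.0, 5.0, 0.1
    taf = prev_level + prev_trend  # the equivalence relies on this definition
    level = 106.0                  # a made-up new intercept
    print(trend_update_original(prev_trend, beta, level, taf))
    print(trend_update_weighted(prev_trend, beta, level, prev_level))
```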

6 Problems
1. Explain the difference between bias and deviation.

2. Explain the difference between MAD and MSE.

3. Using the following data, create a forecast for each period, using the following methods:

(a) 3 period Moving Average
(b) Weighted Moving Average - 5 periods, you choose the weights
(c) Exponential smoothing - Compute the MAD of it, and play around with the alpha to
try to get the MAD as small as you can get it.
(d) Naïve method
(e) Plot the demands and all of the forecasts on a graph. Which method seemed to work
the best?

4. Using the data for Problem 4, below, create a forecast for each period.

(a) Create the forecasts using Linear Regression. What is the $R^2$ value? What is the MAD
from your forecasts?
(b) Create the forecast for each period using double exponential smoothing. Use α = 0.2,
β = 0.15, and use 55 as the initial intercept, and 4 as the initial slope. What is the
MAD of your forecasts?
          Problem 3   Problem 4
Period    Demand      Demand
1         100         50
2         98          65
3         105         72
4         102         69
5         103         78
6         105         65
7         108         78
8         115         84
9         124         79
10        120         64
11        115         89
12        119         84
13        126         88
14        132         94
15        145         83
16        129         84
17        135         91
18        142         104
19        134         100
20        154         103
