Neural Network With Specialized Knowledge For Forecasting Intermittent Demand
Introduction
There are a variety of challenges in every area of the implementation of smart
manufacturing, or the smart production factory, and overcoming them requires the
collaboration of different disciplines, both technical and social. The 4th
industrial revolution focuses not only on making machines intelligent with sensors and
IoT (Internet of Things) but also on automating and improving decision-making
processes [1]. One of these processes is inventory control, whose efficiency can be
improved through the use of Industry 4.0 technologies [2].
Inventory control is essential for the company’s competitiveness in any market [3].
One of its main components is demand forecasting. Due to its importance, several
forecasting methods have been developed over the years. Each method achieves its best
results for a particular demand behavior. The demand behavior can be divided into
four types, as follows: (i) erratic, a pattern with high demand size variability and
few zero-demands; (ii) smooth, with low demand size variability and few zero-demands;
(iii) intermittent, a pattern with many zero-demands and low demand size variability;
and (iv) lumpy, with many zero-demands and high demand size variability [4].
Different methods centered on lumpy and intermittent demand have been proposed
in the literature. Single exponential smoothing (SES) [5] was commonly used to
predict demand with these types of behavior; however, it is not as accurate as newer
methods [6]. One of the earlier works focusing exclusively on this topic was [6], in which
Croston's method (CR) was the first to separate the demand size from the demand
occurrence. Later, [7] showed that this method was biased. [8] proposed a
revision to the CR method to approximately correct this bias, known as the
Syntetos-Boylan Approximation (SBA). However, these methods do not account for
item obsolescence. To solve this problem, [9] proposed a method that is updated every
period, known as the TSB (Teunter-Syntetos-Babai) method.
Machine learning (ML) techniques are a promising alternative for predicting these
demand behaviors [10]–[13]. These techniques can identify nonlinear
functions from the data sample without any assumptions about its probabilistic
distribution, making them good candidates for forecasting methods [11]. [10] proposed
radial and elliptical basis function networks to forecast the demand. [11]–[13]
adopted multilayer perceptron (MLP) networks as the forecasting method. However,
these methods only considered the pattern of the demand and not the cause of the
behavior.
In this paper we go further and model a multi-layer perceptron neural network with
internal knowledge of the process that drives the demand. For this, the artificial neural
network (ANN) was implemented on data from the indirect material replenishment
process of a helicopter engine maintenance company. The dataset is composed of 19
items with different characteristics. To benchmark the proposed ANN, the
following traditional methods were employed: moving average, SES, CR, SBA, and
TSB. In addition, the networks from [11]–[13] were implemented. The results were
compared using three different performance measures, and a statistical analysis was
performed to ensure a robust validation of the results.
This work is organized as follows: Section 1 presents a literature review of the
aforementioned forecasting methods; Section 2 describes the methodology followed in
this paper; Section 3 contains the forecast results and the statistical analyses
performed on them; and Section 4 presents the conclusions and future works.
1. Literature Review

Croston's method (CR) [6] was the first to separate the demand size estimate from the
demand occurrence estimate; however, this method was also proved to be biased [7]. To
overcome this bias, [8] proposed a correction to the CR method, making it an
approximately unbiased demand estimator. This method is known as the Syntetos-Boylan
Approximation (SBA). [8] showed the superiority of the SBA method over the CR method
in an experiment with 3,000 stock-keeping units from the automotive industry.
As the CR and SBA methods are only updated when there is a non-zero demand
occurrence, the predictions can become inaccurate after several periods with zero
demand. This happens when an item is at risk of obsolescence. This results in a positive
bias in the prediction. To solve this bias, [9] developed a forecasting method, TSB, that
uses the demand probability instead of the demand interval and is therefore updated every
period. For the demand size, it uses the same formula as the CR and SBA methods.
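To make the differences between these estimators concrete, the sketch below outlines one possible implementation of the CR, SBA, and TSB update rules described above. The initialization strategy, variable names, and default smoothing constants are illustrative assumptions, not taken from the original papers.

```python
# Hedged sketch of the Croston (CR), SBA, and TSB update rules described above.
import numpy as np

def croston_family(demand, alpha=0.1, beta=0.1, variant="CR"):
    """One-step-ahead forecasts for an intermittent demand series.

    variant: "CR" (Croston), "SBA" (Syntetos-Boylan), or "TSB" (Teunter-Syntetos-Babai).
    """
    demand = np.asarray(demand, dtype=float)
    nonzero = demand[demand > 0]
    z_hat = nonzero[0] if len(nonzero) else 1.0   # demand-size estimate (initialization assumed)
    p_hat = 1.0                                   # inter-demand interval estimate (CR/SBA)
    prob_hat = (demand > 0).mean()                # demand-occurrence probability (TSB)
    periods_since = 1
    forecasts = np.empty_like(demand)

    for t, z in enumerate(demand):
        # Forecast issued at the start of period t.
        if variant == "TSB":
            forecasts[t] = prob_hat * z_hat
        elif variant == "SBA":
            forecasts[t] = (1 - beta / 2) * z_hat / p_hat
        else:  # plain Croston
            forecasts[t] = z_hat / p_hat

        # Update the estimates with the demand observed in period t.
        if variant == "TSB":
            prob_hat += beta * (float(z > 0) - prob_hat)  # TSB is updated every period
            if z > 0:
                z_hat += alpha * (z - z_hat)
        elif z > 0:  # CR and SBA update only on non-zero demand
            z_hat += alpha * (z - z_hat)
            p_hat += beta * (periods_since - p_hat)
            periods_since = 1
        else:
            periods_since += 1
    return forecasts
```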
Apart from the traditional methods, there are forecasting methods based on machine
learning techniques, specifically artificial neural networks. ANNs are regarded as being
able to provide good approximations to most functional relationships, and adding
more attributes to the model can improve the prediction results [11]. ANNs can also
model non-linear relations in the data, which are commonly present in intermittent
demand behavior and can be overlooked by most traditional methods. In addition,
the implementation of ANNs requires no assumptions about the statistical
distribution [10]. Altogether, ANNs have great potential for forecasting the
intermittent behavior of the demand [11].
To the best of our knowledge, the first work focusing on predicting intermittent or
lumpy demand using ANNs was [10]. They used three different basis function
networks for predicting the demand: the Gaussian radial basis function (RBF), the
normalized Gaussian radial basis function (NRBF), and the Gaussian elliptical basis
function (EBF) networks. The number of inputs and basis functions in the hidden layer
was optimized for each time series. They implemented two different types of prediction:
in the first, they forecasted the future total demand for a period; in the second, they
forecasted the demand size and the time of the next demand occurrence. Their results
showed that the RBF network was better suited for predicting intermittent demand than
the other ANNs, and it performed better than the CR method.
[11] proposed an MLP network trained by back-propagation to predict the demand size
of the next period. The ANN had three layers: one input layer, one hidden layer, and one
output layer. The input was composed of the past demand and the number of periods
between the last two non-zero demands. The hidden layer was composed of three
neurons, enough to approximate most complex functions [12]. They compared their
ANN to the traditional methods SES, CR, and SBA, and their results indicated that their
ANN was generally superior to the other methods. One downside of the neural networks
is that they use one dataset for training and one for testing, so they are not continuously
updated with new information, whereas the traditional methods are updated every period.
[12] adopted the same network configuration as [11] but changed the input layer: they
used the number of consecutive zero-demand periods instead of the number of periods
between the last two non-zero demands. For comparison, traditional methods (SES, CR,
SBA, and weighted moving averages) were also implemented. Their ANN had the best
overall performance on the dataset compared to the traditional methods.
However, [11], [12] did not compare their networks with other networks. [13] filled
this void by comparing their proposed ANN with the two networks mentioned above and
with the traditional methods CR and SBA. Their ANN combined the inputs from [11],
[12], resulting in three inputs. In their work, they tested different configurations of
networks, including the network architecture, the learning approach, and the learning
method.
2. Methodology
The objective of this paper is to model a multi-layer perceptron neural network to predict
the demand behaviors based on the internal knowledge of the process that drives the
demand.
To evaluate the performance of the proposed MLP, it was compared to traditional
methods and known ANNs. The selected traditional methods were: (i) Moving average;
(ii) Single exponential smoothing; (iii) Croston’s method; (iv) Syntetos-Boylan
Approximation; (v) Teunter-Syntetos-Babai method. The ANNs selected are from [11]–
[13]. Table 1 presents all the methods compared in this work, their compact notation, and
their parameters.
Table 1. Forecasting methods, their notation, and parameters.

| Method | Notation | Parameters |
| Moving average | MAX_Y | X indicates the number of weeks considered for the average and Y the updating frequency in weeks |
| Single exponential smoothing | SES | One smoothing constant |
| Croston's method | CR | Two smoothing constants |
| Syntetos-Boylan approximation | SBA | Two smoothing constants |
| Teunter-Syntetos-Babai method | TSB | Two smoothing constants |
| ANN from [11] | GUTI | Number of hidden layers, number of hidden neurons, learning rate, momentum factor |
| ANN from [12] | MUKH | Number of hidden layers, number of hidden neurons, learning rate, momentum factor |
| ANN from [13] | LOLL | Number of hidden layers, number of hidden neurons, learning rate, momentum factor |
| Proposed ANN | CREP | Number of hidden layers, number of hidden neurons, learning rate, momentum factor |
These methods were applied to a dataset from a helicopter engine MRO company,
composed of 19 weekly time series with 345 data points each.
The methods' performances are analyzed based on three accuracy measures: MAPE
(Mean Absolute Percentage Error), MPE (Mean Percentage Error), and PB (Percentage
Best). The combined analysis of the three measures gives an overall perspective of each
method's behavior.
To enable the comparison of the methods, statistical analyses are performed on the
results of the three accuracy measures, MAPE, MPE, and PB. The first is a
repeated-measures ANOVA (rANOVA), used to identify whether at least one method is
significantly different from the others. The rANOVA was chosen over the standard
ANOVA because all the methods are evaluated on the same sample. If the rANOVA
indicates a difference between the methods, a pairwise paired t-test is conducted to
identify whether two methods are significantly different from each other. The statistical
significance level utilized for both analyses is 0.05.
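As an illustration of this analysis pipeline, the sketch below shows how it could be carried out with statsmodels and SciPy. The data-frame layout and the column names 'series', 'method', and 'value' are assumptions; the paper does not state which software was used.

```python
# Hedged sketch of the statistical comparison: rANOVA followed by pairwise paired t-tests.
from itertools import combinations

import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

def compare_methods(errors: pd.DataFrame, alpha: float = 0.05) -> None:
    """errors: long-format frame with one row per (time series, method) pair and
    columns 'series', 'method', 'value' holding an accuracy-measure value."""
    # Repeated-measures ANOVA: each time series is a "subject" measured once per method.
    ranova = AnovaRM(errors, depvar="value", subject="series", within=["method"]).fit()
    print(ranova.anova_table)

    # If the rANOVA rejects equality of the methods, run pairwise paired t-tests.
    wide = errors.pivot(index="series", columns="method", values="value")
    for m1, m2 in combinations(wide.columns, 2):
        _, p_value = ttest_rel(wide[m1], wide[m2])
        if p_value < alpha:
            print(f"{m1} vs {m2}: p = {p_value:.4f} (significantly different)")
```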
The following subsections discuss in more detail the data utilized in the project, the
proposed ANN, the setting of each method's parameters, and the accuracy measures.
2.1. Data
The modeled ANN employs an actual demand dataset from a helicopter engine
maintenance company based in Rio de Janeiro, Brazil. The dataset is composed of 19
weekly demand time series with 345 observations each. To understand the demand
behavior of the dataset, the squared coefficient of variation of demand (CV²) and the
average inter-demand interval (ADI) were computed. These two variables indicate the
erraticness of the demand size and the intermittency of the demand occurrence,
respectively. Based on [4], they are used to categorize the behavior as erratic, smooth,
intermittent, or lumpy. The threshold values for the categorization (CV² = 0.49 and ADI
= 1.32) were derived from a theoretical comparison of the performance of the SES, CR,
and SBA methods over a constant lead time. Figure 1 shows the scatter of CV² and ADI
for the 19 time series in the dataset, together with the threshold values that separate the
different demand behaviors. As seen in Figure 1, most items have an intermittent
behavior (9 items) or a lumpy behavior (8 items); the smooth and erratic behaviors
correspond to 1 item each. Therefore, methods focusing on the intermittency of the
demand are suitable for the provided data.
Figure 1. CV²-ADI diagram of the 19 time series (ADI on the vertical axis, CV² on the horizontal axis), with the threshold values separating the four quadrants: intermittent (9 items), lumpy (8 items), smooth (1 item), and erratic (1 item).
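For reference, a minimal sketch of this categorization is shown below, assuming CV² is computed over the non-zero demand sizes and ADI as the average number of periods per non-zero demand; the helper name and these computational details are illustrative rather than taken from the paper.

```python
# Hedged sketch of the CV²/ADI computation and the categorization of [4].
import numpy as np

def classify_demand(demand, cv2_cut=0.49, adi_cut=1.32):
    """Classify a demand time series as smooth, erratic, intermittent, or lumpy."""
    demand = np.asarray(demand, dtype=float)
    nonzero = demand[demand > 0]

    # CV²: squared coefficient of variation of the non-zero demand sizes (assumed).
    cv2 = (nonzero.std(ddof=1) / nonzero.mean()) ** 2
    # ADI: average number of periods between non-zero demand occurrences (assumed).
    adi = len(demand) / len(nonzero)

    if adi < adi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"
```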
2.2. Proposed ANN

To identify the variables that drive the demand, the company's indirect material
replenishment process was analyzed. The variables considered are: (i) year; (ii)
trimester; (iii) month; (iv) engine input; (v) engine output; (vi) the number of
consecutive zero-demand periods; (vii) the number of periods between the last two
non-zero demands; and (viii) the last demand. All these variables are input neurons of
the first layer. In addition, the historical information of the last 4 data points of
variables (iv) to (viii) is also used as input, which conveys a sense of the production
history to the ANN. Variables (vi) and (vii) were presented by [12] and [11],
respectively.
The variables were identified through interviews with experts from different areas
related to production, including production planning and control, inventory
management, and manufacturing.
The settings of the ANN were one hidden layer with three hidden neurons, a
learning rate of 0.01, a momentum of 0.01, and 30,000 epochs. The network was trained
with a back-propagation algorithm.
The output layer represents the demand value of the next period and is connected to
all hidden neurons.
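A minimal sketch of this configuration is shown below, assuming scikit-learn as the implementation library (the paper does not name one). The feature matrix X is assumed to hold inputs (i)-(viii) plus the 4-period history of variables (iv)-(viii), and y the next-period demand; the input scaling step is also an assumption.

```python
# Hedged sketch of the proposed MLP configuration (CREP), assuming scikit-learn.
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

crep = make_pipeline(
    StandardScaler(),                 # input scaling is an assumption, not stated in the paper
    MLPRegressor(
        hidden_layer_sizes=(3,),      # one hidden layer with three neurons
        solver="sgd",                 # back-propagation via stochastic gradient descent
        learning_rate_init=0.01,      # learning rate of 0.01
        momentum=0.01,                # momentum of 0.01
        max_iter=30000,               # 30,000 epochs
        random_state=0,
    ),
)

# Usage (X_train, y_train, X_valid are assumed to be prepared as described above):
# crep.fit(X_train, y_train)
# y_pred = crep.predict(X_valid)
```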
2.3. Parameter Settings

The traditional methods SES, CR, SBA, and TSB have smoothing constants that need to
be defined. Based on the recommendations of [6], [15] to choose the parameters in the
range of 0.05 to 0.20, we opted to use 0.1 for all constants. For the moving average,
the configuration MA52_26 was used, that is, the mean of the previous 52 weeks
updated every 26 weeks. The ANNs followed the same configuration as the proposed
network.
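The MA52_26 configuration can be read as in the sketch below, under the assumption that the 52-week average is held constant between the 26-week updates; the function and variable names are illustrative.

```python
# Hedged sketch of the MAX_Y moving average: X-week mean, refreshed every Y weeks.
import numpy as np

def moving_average_forecast(demand, window=52, update_every=26):
    """One-step-ahead forecasts for the MA52_26 configuration described above."""
    demand = np.asarray(demand, dtype=float)
    forecasts = np.full_like(demand, np.nan)   # no forecast before a full window exists
    current = np.nan
    for t in range(window, len(demand)):
        if (t - window) % update_every == 0:   # recompute the average every Y weeks
            current = demand[t - window:t].mean()
        forecasts[t] = current                 # held constant until the next update
    return forecasts
```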
Each of the 19 time series was partitioned into two datasets, one for training (65%) and
one for validation (35%), as in [12]. The division was chronological: the older data
points were used for training and the newer ones for validation. This way, the
traditional methods can be calibrated before they are used, and the ANNs can be
trained and have their parameters set on the same dataset. The forecast results on the
validation dataset were used to assess the methods' performance. One important
distinction is that the traditional methods are continuously updated during the
validation period, while the ANNs are not.
2.4. Accuracy Measures

Different accuracy measures provide different insights into the error of the predictions;
therefore, using only one measure would not give a complete understanding of the
prediction methods [16]. We opted to use only non-scale-dependent measures, so that
statistical analyses on these measures would be possible [17].
The first accuracy measure is the MAPE, the most adopted non-scale-dependent
measure [12]. It provides insight into the magnitude of the bias. The standard
definition of the measure uses the ratio |E_t|/y_t, where E_t = ŷ_t − y_t is the forecast
error, ŷ_t the forecast, and y_t the actual demand in period t. Since there are
zero-demand occurrences, the following modified equation is used:
\mathrm{MAPE} = \frac{\sum_{t=1}^{n} |\hat{y}_t - y_t|}{\sum_{t=1}^{n} y_t}    (1)
The second accuracy measure is the MPE, the ratio of the mean error to the average
demand. It indicates the overall bias of the error; in other words, it indicates whether
the method underestimates or overestimates the demand. The equation of MPE is as
follows:
\mathrm{MPE} = \frac{\sum_{t=1}^{n} (\hat{y}_t - y_t)}{\sum_{t=1}^{n} y_t}    (2)
The last measure is the Percentage Best (PB), which is the percentage of periods in
which one method outperforms the rest according to a specific criterion; in this case,
the absolute error was used.
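For completeness, the three measures can be computed as in the sketch below; the tie-breaking rule in the PB helper is an assumption, since the paper does not specify how ties are handled.

```python
# Hedged sketch of the three accuracy measures: Eqs. (1)-(2) and Percentage Best.
import numpy as np

def mape(y_true, y_pred):
    """Modified MAPE of Eq. (1): sum of absolute errors over total demand."""
    return np.sum(np.abs(y_pred - y_true)) / np.sum(y_true)

def mpe(y_true, y_pred):
    """MPE of Eq. (2): sum of errors over total demand (sign shows over/underestimation)."""
    return np.sum(y_pred - y_true) / np.sum(y_true)

def percentage_best(y_true, forecasts):
    """PB: share of periods in which each method has the smallest absolute error.
    `forecasts` maps a method name to its array of predictions; ties go to the
    first method returned by argmin, which is a simplifying assumption."""
    methods = list(forecasts)
    abs_errors = np.vstack([np.abs(forecasts[m] - y_true) for m in methods])
    winners = abs_errors.argmin(axis=0)          # index of the best method per period
    return {m: float(np.mean(winners == i)) for i, m in enumerate(methods)}
```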
3. Results

The means and variances of the accuracy measures for each method are presented in
Table 2. For a more comprehensive analysis, this section is divided by accuracy
measure.
Table 2. Accuracy measure mean and variance for the forecasting methods.

| Measure | MA52_26 | SES | CR | SBA | TSB | MUKH | GUTI | LOLL | CREP |
| MAPE mean | 1.281 | 1.277 | 1.291 | 1.268 | 1.272 | 1.151 | 1.151 | 1.152 | 1.177 |
| MAPE variance | 0.363 | 0.254 | 0.445 | 0.410 | 0.268 | 0.195 | 0.193 | 0.197 | 0.258 |
| MPE mean | -0.044 | -0.006 | 0.015 | -0.035 | -0.004 | -0.195 | -0.196 | -0.195 | -0.161 |
| MPE variance | 0.111 | 0.017 | 0.158 | 0.142 | 0.024 | 0.077 | 0.074 | 0.078 | 0.094 |
| PB mean | 0.217 | 0.130 | 0.025 | 0.072 | 0.076 | 0.064 | 0.138 | 0.129 | 0.149 |
| PB variance | 0.030 | 0.012 | 0.001 | 0.008 | 0.002 | 0.009 | 0.027 | 0.007 | 0.023 |
3.1. MAPE
From the information in Table 2, it can be seen that the ANNs generally have a lower
MAPE mean than the traditional methods, meaning that their absolute bias is lower. In
addition, the MAPE variance of the ANNs tends to be lower than that of the other methods.
The rANOVA (Table 3) indicates that at least one of the methods is significantly
different from the others (p-value = 0.002 < 0.05). It also indicates that the
time series has a significant effect on the MAPE result.
Table 3. rANOVA for MAPE.

| Source | SS | df | MS | F | p-value | F crit (0.05) |
| Time series | 43.010 | 18 | 2.389 | 98.211 | 2.149E-71 | 1.676 |
| Method | 0.624 | 8 | 0.078 | 3.204 | 0.002 | 2.003 |
| Error | 3.504 | 144 | 0.024 | | | |
| Total | 47.140 | 170 | | | | |
To identify which methods differ, a pairwise paired t-test was conducted; its results are
presented in Table 4. It can be seen that SES is statistically inferior to all the implemented
ANNs, but it is not significantly different from the other traditional methods. The
known conclusion that SBA is less biased than the CR method is validated once more.
Also, the ANN proposed in this paper is the only network statistically less absolute-
biased than the CR method. In contrast, the proposed ANN is the only network not
significantly different from the TSB method. There is no significant difference between
the ANNs considering the MAPE.
Table 4. Significantly different paired methods for the accuracy measures MAPE, MPE, and PB.

| MAPE | MPE | PB |
| SES vs MUKH | SES vs MUKH | MA52_26 vs CR |
| SES vs GUTI | SES vs GUTI | MA52_26 vs SBA |
| SES vs LOLL | SES vs LOLL | MA52_26 vs TSB |
| SES vs CREP | SES vs CREP | MA52_26 vs MUKH |
| CR vs SBA | CR vs SBA | SES vs CR |
| CR vs CREP | CR vs MUKH | CR vs SBA |
| TSB vs MUKH | CR vs GUTI | CR vs TSB |
| TSB vs GUTI | CR vs LOLL | CR vs GUTI |
| TSB vs LOLL | CR vs CREP | CR vs LOLL |
| | SBA vs CREP | CR vs CREP |
| | TSB vs MUKH | TSB vs LOLL |
| | TSB vs GUTI | MUKH vs LOLL |
| | TSB vs LOLL | MUKH vs CREP |
| | TSB vs CREP | |
3.2. MPE
In general, the traditional methods have a better performance than the ANNs considering
the MPE accuracy measure. The traditional methods resulted in MPEs closer to zero,
with an overall MPE mean of -0.0147, and the ANNs tend to underestimate the demand
size, with an overall MPE mean of -0.187.
The results from the rANOVA imply that the forecasting method and the item
significantly impact the MPE of the prediction. Table 5 shows the results from the
rANOVA. Moreover, the results of the t-tests in Table 4 show the paired methods that
can be considered significantly different. The traditional methods SES, CR, and TSB
have a smaller bias than all the ANNs. Also, the CREP network and the SBA method
are significantly different from each other. Among the ANNs, the CREP network had
the value closest to zero, i.e., the smallest bias; however, the t-test did not indicate a
significant difference between the neural networks.
Table 5. rANOVA for MPE.

| Source | SS | df | MS | F | p-value | F crit (0.05) |
| Time series | 5.668 | 18 | 0.315 | 5.469 | 1.304E-09 | 1.676 |
| Method | 1.314 | 8 | 0.164 | 2.852 | 0.006 | 2.003 |
| Error | 8.292 | 144 | 0.058 | | | |
| Total | 15.274 | 170 | | | | |
3.3. PB
While the previous accuracy measures indicate how much each method differs from
the others, the Percentage Best expresses how many times one method is better than the
others. From Table 2, the method with the highest mean PB is the moving average,
followed by the CREP network.
The rANOVA for this measure indicates that the method has a significant impact
on the forecast result, as seen in Table 6. As expected, the time series has no impact
on the measure, because, for each time series, the PB values of the methods sum to
approximately 1.
The pairwise paired t-test results (Table 4) indicate that the moving average is
superior to the other traditional methods (except SES) and to the MUKH network.
The CR method has an inferior performance compared to most other methods.
The PB was the only accuracy measure to indicate a significant difference between the
ANNs: the CREP and LOLL networks have significantly better results than the MUKH
network.
Table 6. rANOVA for PB.

| Source | SS | df | MS | F | p-value | F crit (0.05) |
| Time series | 1.404E-31 | 18 | 7.800E-33 | 5.308E-31 | 1 | 1.676 |
| Method | 0.502 | 8 | 0.063 | 4.269 | 0.0001 | 2.003 |
| Error | 2.116 | 144 | 0.015 | | | |
| Total | 2.619 | 170 | | | | |
Acknowledgement
References
[1] S. Kumar, B. S. Purohit, V. Manjrekar, V. Singh, and B. K. Lad, Investigating the value of integrated
operations planning: A case-based approach from automotive industry, Int. J. Prod. Res., vol. 56, no.
22, pp. 6971–6992, 2018, doi: 10.1080/00207543.2018.1424367.
[2] J. Chen, O. Gusikhin, W. Finkenstaedt, and Y.-N. Liu, Maintenance, Repair, and Operations Parts
Inventory Management in the Era of Industry 4.0, IFAC-PapersOnLine, vol. 52, no. 13, pp. 171–
176, 2019, doi: 10.1016/j.ifacol.2019.11.171.
[3] E. A. Silver, D. F. Pyke, and D. J. Thomas, Inventory and Production Management in Supply Chains.
CRC Press, 2017.
[4] A. A. Syntetos, J. E. Boylan, and J. D. Croston, On the categorization of demand patterns, J. Oper.
Res. Soc., vol. 56, no. 5, pp. 495–503, 2005, doi: 10.1057/palgrave.jors.2601841.
[5] R. G. Brown, Smoothing, forecasting and prediction of discrete time series. Englewood Cliffs, N.J.,
Prentice-Hall, 1962.
[6] J. D. Croston, Forecasting and Stock Control for Intermittent Demands, Oper. Res. Q., vol. 23, no.
2, 1972.
[7] A. A. Syntetos and J. E. Boylan, On the bias of intermittent demand estimates, Int. J. Prod. Econ.,
vol. 71, pp. 457–466, 2001.
[8] A. A. Syntetos and J. E. Boylan, The accuracy of intermittent demand estimates, Int. J. Forecast.,
vol. 21, pp. 303–314, 2005, doi: 10.1016/j.ijforecast.2004.10.001.
[9] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, Intermittent demand: Linking forecasting to
inventory obsolescence, Eur. J. Oper. Res., vol. 214, no. 3, pp. 606–615, 2011, doi:
10.1016/j.ejor.2011.05.018.
[10] L. Carmo and J. Rodrigues, Adaptive forecasting of irregular demand processes, Eng. Appl. Artif.
Intell., vol. 17, pp. 137–143, 2004, doi: 10.1016/j.engappai.2004.01.001.
[11] R. S. Gutierrez, A. O. Solis, and S. Mukhopadhyay, Lumpy demand forecasting using neural
networks, Int. J. Prod. Econ., vol. 111, pp. 409–420, 2008, doi: 10.1016/j.ijpe.2007.01.007.
[12] S. Mukhopadhyay, A. O. Solis, and R. S. Gutierrez, The Accuracy of Non-traditional versus
Traditional Methods of Forecasting Lumpy Demand, J. Forecast., vol. 31, no. 8, pp. 721–735, 2012.
[13] F. Lolli, R. Gamberini, A. Regattieri, E. Balugani, T. Gatos, and S. Gucci, Single-hidden layer neural
networks for forecasting intermittent demand, Int. J. Prod. Econ., vol. 183, pp. 116–128, 2017, doi:
10.1016/j.ijpe.2016.10.021.
[14] C. Xiang, S. Q. Ding, and T. H. Lee, Geometrical Interpretation and Architecture Selection of MLP,
IEEE Trans. Neural Networks, vol. 16, no. 1, pp. 84–96, Jan. 2005, doi: 10.1109/TNN.2004.836197.
[15] F. R. Johnston and J. E. Boylan, Forecasting for Items with Intermittent Demand, J. Oper. Res. Soc.,
vol. 47, no. 1, pp. 113–121, Jan. 1996, doi: 10.1057/jors.1996.10.
[16] S. Makridakis and M. Hibon, The M3-competition: Results, conclusions and implications, Int. J.
Forecast., vol. 16, no. 4, pp. 451–476, 2000, doi: 10.1016/S0169-2070(00)00057-1.
[17] R. Carbone and J. S. Armstrong, Note. Evaluation of extrapolative forecasting methods: Results of
a survey of academicians and practitioners, J. Forecast., vol. 1, no. 2, pp. 215–217, Apr. 1982, doi:
10.1002/for.3980010207.