Research article
ARTICLE INFO

Keywords:
Intelligent supply chain management
Intermittent demand
Combination forecasting
Machine learning
Transfer learning

ABSTRACT

Intermittent demand forecasting is an important challenge in the process of smart supply chain transformation, and accurate demand forecasting can reduce costs and increase efficiency for enterprises. This study proposes an intermittent demand combination forecasting method based on internal and external data, builds intermittent demand feature engineering from the perspective of machine learning, predicts the occurrence of demand with a classification model, and predicts the non-zero demand quantity with a regression model. Based on the strategy selection on the inventory side and the stocking needs on the replenishment side, this study focuses on the optimization of the classification problem, incorporates the internal and external data of the enterprise, and proposes two combination forecasting optimization methods, based on best classification threshold searching and on transfer learning, respectively. These methods are evaluated and validated in multiple dimensions on real data from the auto after-sales business. Compared with other intermittent forecasting methods, the models proposed in this study improve significantly in terms of classification accuracy and forecasting precision, which validates the potential of the combined forecasting framework for intermittent demand and provides an empirical study of the framework in industry practice. The results show that this research can further provide accurate upstream inputs for smart inventory and guarantee intelligent supply chain decision-making in terms of accuracy and efficiency.
https://doi.org/10.1016/j.dsm.2022.04.001
Received 15 December 2021; Received in revised form 3 April 2022; Accepted 7 April 2022
Available online 22 April 2022
2666-7649/© 2022 Xi'an Jiaotong University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
X. Zhuang et al. Data Science and Management 5 (2022) 43–56

1. Introduction

Demand forecasting is an important part of intelligent supply chain management as well as a key concern for many companies. This is because it is an essential foundation for planning activities, such as inventory management, and accurate forecasting is related to every aspect of inventory management (Syntetos and Boylan, 2001), e.g., production planning, marketing planning, procurement planning, inventory planning, scheduling operations, vehicle scheduling, and network planning (Kolassa, 2016; Ma and Fildes, 2017). Demand forecasting runs through the entire supply chain as well as the manufacturing process. It is the key to dealing with uncertainties in the supply chain, supporting the basis for decision-making in subsequent stages, and improving the operational efficiency and agility of the supply chain. Inventory control can benefit greatly from the guidance of accurate demand forecasting. Additionally, maintaining appropriate inventory levels can save companies a lot of unnecessary inventory costs, as well as improve customer satisfaction and bring more revenue by avoiding stock-outs. Therefore, lower inventory costs and higher service levels require more accurate forecasting results (Ghobbar and Friend, 2003).

Intermittent Demand (ID), also known as zero inflation, is a phenomenon in which there is zero demand for a product in many periods (Shenstone and Hyndman, 2005). This phenomenon is particularly common in the aftermarket industry (Willemain et al., 2004), as well as in the aerospace, computer components, electronics, and industrial machinery industries. Demand forecasting usually includes three
dimensions: the warehouse dimension (warehouse, store, etc.), the product dimension (Stock Keeping Unit (SKU), category, etc.), and the time dimension (daily, weekly, monthly, etc.) (Syntetos et al., 2016). In many cases, non-intermittent data are intermittent at fine-grained levels of data decomposition. For example, forecasting at a time granularity from months to days results in a large amount of intermittent data, and zero demand due to out-of-stock or low demand is often found in the SKU-Store dimension. SKUs that are at high risk of obsolescence may be nearing the end of their product life cycle. Therefore, accurate forecasting is especially important for these items (Patton and Steele, 2003).

Intermittent demand has long been a difficult area within the field of supply chain demand forecasting, and its forecasting methods and assessment metrics can be more challenging. In the existing forecasting literature, a lot of attention has been given to the modeling of fast-moving time series and the construction of causal models (Bozos and Nikolopoulos, 2011; Nikolopoulos et al., 2007; Nikolopoulos and Thomakos, 2019; Petropoulos et al., 2014). However, very limited attention has been paid to intermittent demand forecasting (Nikolopoulos et al., 2011). Most people generally believe that intermittent demand forecasting methods work only for spare parts demand forecasting. In reality, this is an inaccurate assumption, since 60% of SKUs in any field of commodity inventory, whether spare parts or consumer goods, exhibit intermittent characteristics (Syntetos et al., 2016). This is because the long tail of sales is often the part of the product range that is overlooked, while the total number of such items is not a minority; they are simply not high-selling enough to be conspicuous. Several studies have argued that such SKUs should receive the highest level of service (Knod and Schonberger, 2001). This argument is largely based on the overall contribution of these SKUs. When viewed individually, such SKUs appear relatively unimportant; however, when they are considered as a whole, the situation changes dramatically. At the same time, there are a large number of expensive items in this category, which also require additional attention and treatment if ranked by value (Nikolopoulos et al., 2011). Moreover, things become even trickier when we assume that intermittent data obey an independent identical distribution, because one can hardly find the corresponding behavior in real intermittent series (Petropoulos et al., 2014).

Intermittent demand series have a large number of zeros, do not have obvious patterns of variation, such as trend and seasonality, and usually show extremely irregular characteristics. Therefore, forecasting for intermittent demand has been recognized as a challenge by the academic community (Swain and Switzer, 1980; Tavares and Almeida, 1983; Watson, 1987; Willemain et al., 2004). Croston (1972) proposed a forecasting method that calculates demand interval and demand size independently of each other, splitting the intermittent demand series into two continuous series, demand size and demand interval, and then forecasting these two continuous series separately using an exponential smoothing algorithm to form the final result. Inspired by Croston (1972), Kourentzes (2013) also predicted intermittent demand from two components. He used a neural network approach to update the values of demand interval and demand size, replacing the simple moving average used in Croston's method. The neural network approach incorporates the association between non-zero demand and demand interval into the model, which results in a dynamic demand rate. Despite poor forecasting results, the neural network approach improves the inventory service level while maintaining the inventory holding cost. After Kourentzes (2013), many scholars tried to study segmented forecasting from the perspective of a combination model, that is, to forecast demand occurrence and demand quantity separately. Jiang et al. (2020) proposed a method that classifies intermittent demand data into two parts, zero value and non-zero value, and fits non-zero values into a mixed zero-truncated Poisson model. When demand occurs, they use the weighted average of the mixed zero-truncated Poisson model as the predicted non-zero demand, which is combined with predicted demand occurrences to form the final forecasting demand series. Roanec and Mladeni (2021) proposed a combination model which shows that global classification models are the best choice when predicting demand event occurrence; when predicting demand sizes, they achieved the best results using a Simple Exponential Smoothing (SES) forecast.

In summary, most current research does not focus on the classification problem in intermittent demand forecasting, and the time series models give generally biased average forecasts: as long as there is non-zero demand in the recent period, there will be only non-zero forecasts for the future. This creates a lot of pressure on the replenishment side in actual business operations, such as inventory costs and backlog costs. Therefore, improving the classification accuracy of demand occurrence is a critical issue. When attempting to solve the classification problem, the existing research does not consider the characteristic factors of demand occurrence comprehensively, but relies only on the characteristics of the time series itself. Machine learning, however, can comprehensively capture the global impact through feature engineering and cross-training. A growing number of studies show that global machine learning time series models (models built with multiple time series) provide better results than local models (models built with a single time series) (Bandara et al., 2020; Salinas et al., 2020).

Based on the actual business problems described above and the difficulty of intermittent demand forecasting, Intermittent Demand Combination Forecasting (IDCF), an intermittent demand combination forecasting method based on internal and external data, is proposed in this study. The basic method divides the problem into two parts. A classification model in the first part gives the probability of demand occurrence, and then a regression model in the second part gives the size of non-zero demand. Finally, the results of the two parts are combined to form the final prediction output. All of the optimizations are adapted and tested on this basis. This study provides a new perspective on the practice of intermittent demand forecasting in the automotive aftermarket and initially validates the potential of the IDCF framework for automotive spare parts forecasting; the framework has been implemented internally for customer use, with significant improvement in the enterprise. Based on the above, this study aims to address the following questions:

Q1: How can the problem of intermittent demand forecasting be decomposed into two stages? How can feature engineering be built and applied to intermittent automotive aftermarket spare parts? How can machine learning models be used to build methods applicable to intermittent demand forecasting in the auto aftermarket, so as to capture the global impact and build global models?

Q2: How can the accuracy of machine learning demand classification prediction be improved, and how can the best classification threshold be determined? When internal data are insufficient, how can transfer learning be used to incorporate external data into model training, so as to improve the accuracy of prediction?

2. Literature review

2.1. Demand classification

Many scholars believe that an accurate classification of demand can provide some guidance in finding an appropriate forecasting method. The most commonly used classification is the ABC classification, also known as the Pareto classification, which is often used in industries to distinguish between categories of different levels of importance. It is named after the Italian economist Vilfredo Pareto, who noted in 1906 that 80% of Italy's land was owned by 20% of the population. More generally, there are similar laws in many fields. In the case of inventory, many case studies confirm the validity of the 80/20 rule (Syntetos et al., 2009, 2010). In inventory management, about 20% of SKUs generate
almost 80% of sales. Generally, more than two categories can be used to control the overall demand category, based on a specific criterion (annual demand size or annual demand value), e.g., 60% of overall annual demand size for category A, 30% for category B, and 10% for category C (Syntetos et al., 2009). Category A SKUs are considered the most critical and therefore require the highest level of service to avoid costly backlogs. Therefore, the ABC classification is also used to determine service levels, where category A is usually set as the highest target, and intermittent demand forecasting methods are used mostly for category C SKUs. However, a problem can occur when the ABC classification method is used for forecasting. If some slow and low-volume non-intermittent SKUs are assigned to category C according to the demand criteria, and the method developed for intermittent demand is used for forecasting on these SKUs, it may produce negative results.

The purpose of forecast-based classification is to select the most appropriate forecasting methods for different categories. Boylan and Johnston (1996) demonstrated the conditions under which Croston's method is more accurate than SES, based on different lead time conditions and demand patterns, through a series of realistic simulation experiments. They found that, based on Mean Square Error (MSE), Croston's method should be used instead of SES if the mean inter-demand interval is greater than 1.25.

Inspired by Williams (1984), Eaves and Kingsman (2017) proposed a classification method for the identification of intermittent demand based on second-order moment variability (transaction variability, demand size variability, and lead time variability), but it lacks practicality for business implementation due to the excessive number of required elements. Another approach proposed by Syntetos et al. (2005) is based on transaction frequency and demand size variability. They proposed two key parameters, one of which is the average demand interval and the other the demand variability coefficient. This classification provides the cut-off values between different demand types, thus classifying demand sequences into four categories: irregular demand, smooth demand, intermittent demand, and lumpy demand. Lumpy demand is also considered to involve more extreme intermittent requirements, and this classification is more intuitive and easier to apply in real business.

Syntetos et al. (2005) re-examined the comparison between methods such as Croston's and SES and discussed the cut-off points for the classification criteria based on theory (rather than simulations), introducing non-zero demand variability as one of the demand classification variables in addition to the Average Demand Interval (ADI). In particular, it is suggested to use the squared coefficient of variation of non-zero demand (CV²) to represent demand stability. This classification is known as the SBC classification method. As shown in Fig. 1, ADI and CV² can classify demand into four categories, namely irregular demand, smooth demand, lumpy demand, and intermittent demand. The threshold values of ADI and CV² are 1.32 and 0.49, respectively, as verified against the intermittent forecasting method, the Syntetos-Boylan Approximation (SBA), and SES, the conventional time series method.

The SBC classification method, which is very intuitive and easy to apply in real business, provides cut-off values between different demand types. It is well validated in the real operations of many organizations dealing with intermittent commodities (Ghobbar and Friend, 2002; Regattieri et al., 2005; Rego and Mesquita, 2015). It is also utilized by many international supply chain software developers and consultants, such as Blue Yonder, Implement Consulting, Llamasoft, and Syncron. This study will focus on intermittent demand and lumpy demand for the exploration of forecasting methods, and the SBC method will be used as a benchmark in demand classification.

2.2. Intermittent demand forecasting methods

The sparsity of intermittent demand poses a great challenge for forecasting. Since the presence of a large number of zero values in intermittent time series makes it difficult to apply conventional forecasting methods, scholars have proposed many different methods for intermittent demand forecasting. Croston's method, proposed by and
named after Croston (1972), was the first algorithm to solve the problem of intermittent demand forecasting. This method splits the intermittent demand series into two continuous series, the demand size and the demand interval. Then, the exponential smoothing algorithm is used to forecast these two continuous series separately, and the final result is formed from the predicted interval and size. However, the forecast estimates of Croston's method are biased. Syntetos and Boylan (2005) proposed the SBA method, arguing that a correction should be made by multiplying by a factor based on the smoothing constant, after which the new forecast value will be close to unbiased. This method offers some advantages in terms of demand forecast improvement, which has been proven by many researchers (Eaves and Kingsman, 2017; Gutierrez et al., 2008; Nikolopoulos et al., 2016; Wingerden et al., 2014).

Many scholars have addressed intermittent demand forecasting from a combinatorial perspective. Bootstrapping has been shown to be effective for intermittent prediction enhancement (Babai et al., 2020). On one hand, the most typical approach, proposed by Willemain et al. (2004), uses a two-stage Markov chain to generate non-zero demand points, and then resamples the demand using historical data. Viswanathan and Zhou (2008), on the other hand, use non-zero demand intervals from historical data to resample and thus generate a non-zero demand interval distribution for the lead time period, which has been validated to be effective. Furthermore, Zhou and Viswanathan (2011) explored the scenarios for which the bootstrap method and the parametric estimation method are suitable, respectively. Some researchers assume that the first prediction point of the lead time is non-zero demand, which also works well in a staged replenishment control system (Hasni et al., 2019a; Teunter and Duncan, 2009). More reviews on bootstrap methods can be found in Hasni et al. (2019b).

Meanwhile, Hua et al. (2007) and Hua and Zhang (2006) follow a similar path by introducing explanatory variables and impact factors to predict intermittent demand. Hua et al. (2007) slice the time series into binary series. One is a zero-demand series, and the other is a non-zero demand series. They attribute the occurrence of demand to autocorrelation or explanatory variables. The former can predict the probability of demand occurrence through Markov chains, and the latter can make the corresponding probability prediction through logistic regression, after which the non-zero demand size is estimated using the bootstrapping method. Hua and Zhang (2006) simply replace the Markov chain method with the Support Vector Machine (SVM). The prediction accuracy of Hua and Zhang (2006) is better than that of Hua et al. (2007), as verified on real data. Nasiri Pour et al. (2008) use a neural network along the lines of a hybrid model to predict whether demand occurs or not, while using exponential smoothing to predict the non-zero demand size. Their proposed neural network model considers four input variables: the demand size at the last point, the number of periods between the previous two demand occurrences, the number of periods between the target period and the last demand occurrence, and the number of periods between the target period and the nearest zero demand period.

However, demand forecasting based on time series algorithms cannot capture the influence of internal and external factors on demand occurrence. It tends to give an average forecast level with few zero values, which cannot distinguish the presence or absence of demand and cannot guide replenishment, thus bringing certain inventory costs and backlog costs. In addition, many methods based on bootstrapping have been attempted for classification problems, but they rely only on the characteristics of the time series itself. Existing research does not fully tap into the advantages of machine learning to capture more influencing factors and make more accurate predictions. Furthermore, newly listed spare parts pose a great forecasting challenge due to insufficient historical data. Therefore, this study will further address the shortcomings of the time series-based approach by focusing on the classification problem and working to give more accurate forecasts of demand occurrence. Moreover, the research will further consider the introduction of external data for transfer learning, so as to expand the existing data volume and give full play to the advantages of enterprise data resources to obtain more accurate forecasts.

2.3. Transfer learning

Data mining and machine learning techniques have achieved significant success in many areas of knowledge engineering, including classification, regression, and clustering (Yang and Wu, 2006). A major assumption of many machine learning and data mining algorithms is that the training and test data must share the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. However, in many practical applications, this assumption is difficult to uphold. For example, in one scenario, we may have a classification task with insufficient data; meanwhile, in another similar scenario, we may have enough training data, which may have a different distribution. In this case, if the samples can be transferred for training, expensive data labelling and human subjective factors of error can be avoided, greatly improving training efficiency and prediction accuracy. There may also be situations in which originally available labeled sample data become unavailable over time; e.g., stock data are very time-sensitive, and a model trained on the previous month's sample is unreliable for predicting the current month's new sample.

Transfer learning is a new machine learning approach that uses existing knowledge to solve problems in different but related domains. It relaxes two basic assumptions of traditional machine learning, allowing the domain, task, and distribution of the training and test sets to differ, and uses existing knowledge to solve learning problems in a target domain for which there is only a small amount of labeled sample data, or even none. Many examples of transfer learning can be observed in the real world. For example, learning the erhu (a traditional Chinese musical instrument) may help in learning the violin. That is, research in transfer learning is based on the fact that people can intelligently apply previously learned knowledge to solve new problems more quickly or effectively. Since 1995, research on transfer learning covering knowledge transfer, model transfer, sample transfer, feature transfer, etc. (Thrun and Pratt, 1998) has attracted increasing scholarly attention. The more factors shared by two different domains, the easier the transfer learning will be. With few shared factors, transfer learning is more difficult, and "negative transfer" can even occur (Dai et al., 2009; Rosenstein et al., 2005).

In the scenario of demand forecasting, since both the target data and the source data are time series data, the same feature columns can be constructed from the time series to perform sample fusion training. The problem is how to use the source domain data to build a reliable model with the target data to predict the target domain data (the source domain data and the target domain data may not have the same data distribution). Therefore, the most suitable transfer idea is instance transfer learning. The idea of instance transfer learning is very intuitive. Its core concept is that although the source domain data cannot be directly and completely reused, some parts of its data can still be used together with some labeled data in the target domain. Dai et al. (2007) proposed a boosting-based algorithm, TrAdaBoost, which is an extension of the AdaBoost algorithm, to solve the instance transfer learning problem. TrAdaBoost assumes that the source and target domain data use exactly the same feature and label fields, but the data distribution in the two domains is different. TrAdaBoost assumes that because of the difference
in distribution between the source and target data, some parts of the source data may be useful for training the target data, but other parts may be useless or may even interfere with the training of the target data. Therefore, the TrAdaBoost algorithm trains the fusion model by using different weight iteration strategies on the target and source data. It uses the same strategy as AdaBoost to update the weights of misclassified samples in the target domain and adopts the opposite strategy to update the weights of misclassified samples in the source data, in order to reduce the "negative" effect of the source data. The "negative" influence of the source data is reduced, and the "positive" influence of the source data on the target domain training is promoted. For each iteration, TrAdaBoost trains the basic classifier on the weighted source and target data but calculates the error only on the target data, as described in Dai et al. (2007).

Jiang and Zhai (2007) proposed a heuristic method to remove training samples from the source domain that may be "misleading" to the target data, based on the difference between the conditional probabilities in the source and target domains. Liao et al. (2005) proposed a new active learning method that can label data in the target domain with the help of the source domain data. Wu and Dietterich (2004) integrated source domain data into an SVM framework to improve classification performance. In addition to being used for classification problems, instance transfer has also been investigated for solving regression problems. Pardoe and Stone (2010) proposed the Two-Stage TrAdaBoost.R2 model, which adapts TrAdaBoost to the requirements of regression tasks.

Based on the above research and summary, this study takes TrAdaBoost as the method benchmark and incorporates it into the intermittent demand forecasting method according to the requirements of the actual business, thus providing a solution, based on external data, to the problem of insufficient data in this scenario.

2.4. Metrics for intermittent demand forecasting

When we train a model, we need evaluation metrics to assess its strengths and weaknesses. Intermittent series are difficult to measure, and typical prediction accuracy metrics are usually not applicable to such problems. Therefore, Hyndman (2006) listed a series of regression prediction evaluation metrics suitable for such sequences.

Scale-Dependent Metrics. This type of indicator is based on e_t = Y_t - F_t, the difference between the true value Y_t and the forecasting value F_t. Typical ones include the Mean Square Error (MSE) and the Mean Absolute Error (MAE), which are calculated using the Eqs. below:

MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - F_i)^2    (1)

MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - F_i|    (2)

The disadvantage of these metrics is that they depend on magnitude, and it is meaningless to compare time series of different magnitudes. Furthermore, if the MAE is used as a loss function, it drives the prediction of a highly intermittent series to zero at each stage, which is obviously difficult to apply in practical scenarios. Syntetos and Boylan (2005) recommend the use of the Geometric Mean Absolute Error, GMAE = \sqrt[n]{\prod_{i=1}^{n} |e_i|}. The disadvantage of this method is that the GMAE is equal to zero whenever any single-period error is zero.

Scale-Free Errors. The most typical evaluation indicator of this type is the Mean Absolute Scaled Error (MASE). The denominator can be seen as the MAE of the naive forecasting method (using yesterday's real sales as tomorrow's demand), so this indicator is robust and scalable (Hyndman, 2006). MASE does not have to worry about a zero denominator: it becomes infinite only if the historical demand is all equal, so it is considered to be the optimal choice among intermittent forecast evaluation metrics. The Eq. is as follows:

MASE = \frac{\frac{1}{n} \sum_{i=1}^{n} |Y_i - F_i|}{\frac{1}{n-1} \sum_{i=2}^{n} |Y_i - Y_{i-1}|}    (3)

Percentage-Error Metrics. This type of indicator is based on p_t = \frac{e_t}{Y_t} \times 100\%, the percentage error relative to the true value. These metrics do not depend on the scale and can be used to compare the merits of a prediction method across different data series. The typical metric is the Mean Absolute Percentage Error (MAPE), which is used in many forecasting studies (Yang et al., 2021). However, in intermittent demand, Y_t is often equal to zero, which makes MAPE unsuitable for such scenarios. Therefore, Sungil and Heeyoung (2016) proposed the Mean Arctan Absolute Percentage Error (MAAPE). MAAPE is based on MAPE, but it is not limited in domain, with a range between 0 and \pi/2. Consequently, it is not heavily biased towards underestimation; instead, it penalizes overestimation and underestimation more symmetrically, and it has a very obvious meaning and high interpretability. The Eqs. for MAPE and MAAPE are shown below.

MAPE = \frac{1}{n} \sum_{t=1}^{n} p_t    (4)

MAAPE = \frac{1}{n} \sum_{t=1}^{n} \arctan(p_t)    (5)

In addition to the evaluation metrics for regression prediction, classification prediction also needs further evaluation in this study. According to data mining theory, the most widely used method to evaluate the prediction ability of a classification model is the confusion matrix (Deng et al., 2016), shown in Table 1. Individual assessment focuses on accuracy, precision, and recall, while comprehensive assessment focuses on the F1 score, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC).

Table 1
Confusion matrix.

True value    Forecasting value
              Positive               Negative
Positive      True Positive (TP)     False Negative (FN)
Negative      False Positive (FP)    True Negative (TN)

In the past, ROC curves were mainly applied in the field of signal detection (Swets, 2014). Nowadays, they are commonly used as criteria for model evaluation (Flach, 2003). ROC curves are evaluation curves consisting of the True Positive Rate (TPR) and the False Positive Rate (FPR), with FPR as the horizontal axis and TPR as the vertical axis. They are calculated from the following Eqs., respectively:

TPR = \frac{TP}{TP + FN}    (6)
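As a concrete companion to the metric definitions above, the following is a minimal Python sketch of MASE (Eq. (3)), MAAPE (Eq. (5)), and TPR (Eq. (6)). It is illustrative rather than the authors' code: the zero-demand handling in `maape` takes the arctan limit (π/2) when the true value is zero but the forecast is not, which is one common convention, and the FPR formula (FP/(FP + TN)) is the standard ROC companion definition, included for completeness.

```python
import math

def mase(y, f):
    # Eq. (3): MAE of the forecast scaled by the MAE of the naive
    # one-step forecast; the denominator is zero only if all of y are equal
    n = len(y)
    mae_forecast = sum(abs(yt - ft) for yt, ft in zip(y, f)) / n
    mae_naive = sum(abs(y[i] - y[i - 1]) for i in range(1, n)) / (n - 1)
    return mae_forecast / mae_naive

def maape(y, f):
    # Eq. (5): arctan of the absolute percentage error; arctan caps the
    # per-period penalty at pi/2, so zero-demand periods stay finite
    terms = []
    for yt, ft in zip(y, f):
        if yt == 0:
            terms.append(0.0 if ft == 0 else math.pi / 2)  # limit of arctan(|e/0|)
        else:
            terms.append(math.atan(abs((yt - ft) / yt)))
    return sum(terms) / len(terms)

def tpr_fpr(tp, fn, fp, tn):
    # Eq. (6): TPR = TP / (TP + FN); FPR = FP / (FP + TN) is the
    # standard rate plotted on the ROC horizontal axis
    return tp / (tp + fn), fp / (fp + tn)

# Example on a short intermittent series
y = [0, 2, 0, 0, 3]
f = [0, 1, 0, 0, 2]
print(mase(y, f))           # 0.4 / 1.75
print(maape(y, f))
print(tpr_fpr(8, 2, 1, 9))  # (0.8, 0.1)
```

Note how MASE stays well defined on this series even though three of the five true values are zero, which is exactly why it is preferred over MAPE for intermittent demand.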
LightGBM is an algorithm framework open-sourced by Microsoft, which has been further modified on the basis of XGBoost and optimized mainly in the following three aspects:

i. Histogram Algorithm. The core benefit of this algorithm lies in its ability to reduce the number of alternative splitting points. In regression problems, for example, the feature values of continuous variables can be discretized into intervals at a certain scale, so that fewer alternative splitting points need to be considered and speed is improved.

ii. Gradient-based One-Side Sampling (GOSS). The core benefit of this algorithm lies in its ability to reduce the number of observations, mainly using purposeful sampling of observations to reduce redundancy in the calculation of the gain of the loss function. This type of purposeful sampling is called one-sided sampling, which simply means sampling a certain class of samples according to a certain strategy while leaving the other classes of samples untouched. In this algorithm, observations with small absolute gradient values are resampled at a corresponding rate during training, while those with large absolute gradient values remain untouched. This is because the gain of the loss function is contributed mainly by the observations with large absolute gradient values, which better distinguish between the different kinds. The algorithm therefore improves both speed and efficiency, while retaining the predictive power of the model to the maximum extent.

iii. Exclusive Feature Bundling (EFB). The core benefit of this algorithm is its ability to reduce the number of features. In many cases, a real sample set cannot be feature-rich. Additionally, there are always a large number of sparse features, in which a large number of observations are marked as zero and a small number are marked as some non-zero value. In general, these sparse features are mutually exclusive. With these characteristics, it is possible to encode a new marker for the corresponding features, so that some mutually exclusive fields can be re-formed into one feature. This effectively reduces the number of features. Furthermore, because of this design, LightGBM can support categorical variables.

With the introduction of these three algorithms, the complexity of LightGBM is greatly reduced. Its main advantage lies in distributed support, so it can be trained in parallel with high efficiency. LightGBM supports both regression and classification tasks. In classification tasks, it can handle unbalanced samples and weight correction. In addition, four of the top five time-series forecasting models in the M5 competition (the world's top forecasting competition for demand forecasting) were based on this algorithm (Makridakis et al., 2020), combining the advantages of engineering speed and predictive accuracy. Therefore, LightGBM is used as the base model in this study.

3. Methodology

As shown in Fig. 2, a total of three intermittent demand combination forecasting methods based on internal and external data are proposed in this Section. Among them, IDCF is the core, extended by the Best Threshold Intermittent Demand Combination Forecasting method (BTIDCF) and the Transfer-Based Intermittent Demand Combination Forecasting method (TBIDCF). BTIDCF is mainly implemented based on internal data, while TBIDCF can incorporate external data for training when internal data are not sufficient. These methods combine their advantages and form a diverse solution to intermittent demand forecasting.

3.1. Intermittent demand combination forecasting (IDCF)

3.1.1. Problem definition

This study proposes the IDCF method. To clarify the problem, we first give the relevant definitions, presented in the following.

Definition 3.1. (basic notation):

Let X be the sample space, which can also be called the target sample (feature) space.
Let L = {0, 1} be the class space, which in this study corresponds to the classification problem of predicting the presence or absence of demand and therefore needs to be defined.
Let Y ⊆ Z be the integer space.
The overall classification dataset C ⊆ {X × L} is collected from the sample space.
The overall regression dataset R ⊆ {X × Y} is collected from the sample space.
The mapping function c: X → L maps a sample x ∈ X to its true category c(x) ∈ L.
The mapping function r: X → Y maps a sample x ∈ X to its true sales r(x) ∈ Y.

Definition 3.2. (test dataset):

N = {(x_i)}, where x_i ∈ X, when i = 1, 2, …, n,

where the dataset N is not labeled and n is the number of elements in the set N.

Definition 3.3. (test prediction set):

F = {(f_i)}, where f_i ∈ R, when i = 1, 2, …, n,

and n is the number of elements in the set F.

Definition 3.4. (training dataset):

Tc = {(x_i, c(x_i))}, where x_i ∈ X, when i = 1, 2, …, m,
Tr = {(x_i, r(x_i))}, where x_i ∈ X and r(x_i) > 0, subject to i ≤ m,

where c(x_i) is the true class of sample x_i and r(x_i) is the true sales of sample x_i, Tc is the classification problem training set, Tr is the regression problem training set, and m is the size of the classification problem training dataset. The size of the regression problem training set is k = len(r(x_i) > 0) (i ≤ m).

Based on the above definitions, the problem can be simply described as follows: given the training datasets Tc and Tr, train the classifier and regressor, respectively, such that for an unlabeled test set N, the classification error and regression error on N are minimized.

3.1.2. Method and description

As shown in the description of Algorithm 1, in the training set, the machine learning model LightGBM is invoked to train the classification model M1 that predicts whether demand occurs. Because the data samples present a serious class imbalance, this is mitigated by adjusting the weights of positive and negative samples. The samples with sales greater than zero in the training data are then screened, and LightGBM is invoked to train the regression model M2 for predicting non-zero demand. In the test set, the classification model M1 is used to predict whether the demand of a sample is zero. If it is not zero, the regression model M2 is used to predict the size of the non-zero demand. Finally, the results of the two parts are combined to obtain the final output.
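The two-stage procedure just described can be sketched as follows. This is a minimal illustration with duck-typed stand-ins for the two LightGBM models; `LastSaleClassifier`, `MeanRegressor`, and `idcf_fit_predict` are our illustrative names, not the paper's:

```python
class LastSaleClassifier:
    """Toy stand-in for M1 (the paper trains a LightGBM classifier)."""
    def fit(self, X, y):
        return self
    def predict(self, X):
        # predict "demand occurs" when the last feature (last sale) is non-zero
        return [1 if x[-1] > 0 else 0 for x in X]

class MeanRegressor:
    """Toy stand-in for M2 (the paper trains a LightGBM regressor)."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean_] * len(X)

def idcf_fit_predict(clf, reg, X_train, y_train, X_test):
    # stage 1: train M1 on "does demand occur?" labels
    clf.fit(X_train, [1 if y > 0 else 0 for y in y_train])
    # stage 2: train M2 only on the rows with non-zero sales
    nz = [(x, y) for x, y in zip(X_train, y_train) if y > 0]
    reg.fit([x for x, _ in nz], [y for _, y in nz])
    # merge: 0 where M1 predicts no demand, M2's estimate otherwise
    return [q if c == 1 else 0
            for c, q in zip(clf.predict(X_test), reg.predict(X_test))]
```

With rows of the form `[last_sale]`, training on X = [[0], [3], [0], [2]], y = [0, 4, 0, 2] and predicting for [[5], [0]] yields [3.0, 0]: a non-zero estimate only where the classifier expects demand.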
Algorithm 1. IDCF description

V = {(v_i)}, where v_i ∈ L, when i = 1, 2, …, q,

where q is the size of the validation sample set.

Definition 3.7. (classification threshold set):

α = {(α_i)}, where α_i ∈ R and α_i ∈ [0.4, 0.6], when i = 1, 2, …, k,

where α_{i+1} − α_i = l is satisfied for any positive integer i, α is the set of classification thresholds, l is the step size, and the size of k depends on the step size l.

Based on the above definitions, the problem can be simply described as follows: given two training datasets Tc and Tr and a validation dataset Tv, train the classifier M1 and the regressor M2, respectively, such that for an unlabeled test set N, the classification error and regression error on N are minimized.

3.2.2. Method and description

As shown in the description of Algorithm 2, the target sample set is first divided into a training sample set, a validation sample set, and a test sample set. Using the machine learning model LightGBM, a classification model M1 is trained on the training set to predict the probability of demand occurrence. A set of classification thresholds ranging from 0.4 to 0.6 is constructed, and a threshold increment step is also set. M1 is called on the validation set to output the demand probabilities of the validation samples: a value of 1 is predicted if the demand probability is greater than the classification threshold, and a value of 0 if it is less. Iterating through each classification threshold, the classification of the validation set is evaluated at each threshold using the AUC value, and the threshold with the maximum AUC value is taken as the best classification threshold. In the test set, the best classification threshold output from the validation set, which matches the data characteristics, is used. After obtaining the classification prediction results, the non-zero demand estimate is then obtained using the M2 regression model. Finally, the two sets of results are combined to obtain the final output.

Algorithm 2. BTIDCF description

3.3. Transfer-based intermittent demand combination forecasting method (TBIDCF)

3.3.1. Problem definition

In this study, we propose an optimized method based on IDCF, called transfer-based intermittent demand combination forecasting (TBIDCF). To clarify the problem, we give further definitions related to this problem, based on Definition 3.1, Definition 3.2, Definition 3.3, and Definition 3.4.

Definition 3.8. (source dataset):

Ts = {(z_i, h(z_i))}, where z_i ∈ Z, when i = 1, 2, …, w,

where Z is the source instance space, the dataset H ⊆ {Z × L} is collected from the source sample space, and the mapping function h: Z → L maps a sample z ∈ Z to its true category h(z) ∈ L.

Therefore, the target dataset Tc in Definition 3.4 and the source dataset Ts together constitute the training set of the new classification model. The difference between Tc and Ts is that Tc has the same distribution as N, while the source dataset Ts is likely to have a different distribution, that is, P(x, y)|x ∈ Ts ≠ P(x, y)|x ∈ N.

Based on the above definition, the problem can be simply described as follows: given very small target datasets Tc and Tr, a large amount of source data Ts, and some unlabeled test set N, the classification error and regression error on N are minimized.

3.3.2. Method and description

As shown in the description of Algorithm 3, the target dataset is cut into a training set and a test set, and external source data are introduced;
Table 2
Intermittent SKUs (3,089) weekly granularity SBC classification descriptive statistics.
Index Average Standard deviation Minimum 25% 50% 75% Maximum
after that, the transfer learning begins. It should be noted here that the original version of TrAdaboost (Dai et al., 2007) uses a modification of the Adaboost model. In Adaboost, the first weak classifier is obtained by training all samples with the same weights. Starting from the second round, the weights of each sample are adjusted before each round according to the classification results of the previous weak classifier: the weights of the samples misclassified in the previous round are increased, and the weights of the correctly classified samples are decreased. The new sample weights are then used to guide the training of the weak classifier in this round, i.e., to locate the deficiencies of the model by updating the weights of the misclassified samples (Freund and Schapire, 1997).

However, in our actual business scenario, the number of SKUs to be optimized is very large. Particularly in automotive spare parts, there are often thousands to tens of thousands of SKUs, and when the hierarchy is refined to bin granularity and the prediction is refined to daily granularity, there will be more. Furthermore, there will be a large amount of source data, which is almost impossible to handle in production with traditional Adaboost; the process is less efficient, and agile response is difficult to achieve. Therefore, in this study we use LightGBM, which is mature in industrial deployment and tens of times faster than XGBoost (Chen and Guestrin, 2016), with only one-fifth of its memory consumption.

This study proposes Adjusted TrAdaboost, which uses LightGBM for optimization while inheriting the core idea of TrAdaboost. It retains the sample weight update strategy of TrAdaboost, increasing the next-round weight of misclassified target data samples and decreasing the next-round weight of misclassified source data samples. After resampling based on the updated sample weights, the weak classifier used in each iteration is replaced with a strong learner, which can greatly improve the efficiency. In the test set, the final weighted classification results are obtained by combining the multiple strong learners and their voting weights. Then, the non-zero demand estimates are obtained using the M2 regression model. Finally, the two sets of results are combined to obtain the final prediction results.

Algorithm 3. TBIDCF description

Input:
A small number of target training samples Tc and Tr; a large number of source data samples Ts; an unlabeled test set N; a classification learner Lc and a regression learner Lr; and the number of iterations S.

Sample weight initialization:
  Target data Tc weight initialization: w_i^{Tc} = 1/m, when i = 1, 2, …, m
  Source data Ts weight initialization: w_i^{Ts} = 1/q, when i = 1, 2, …, q

Training:
  Set β = 1 / (1 + sqrt(2 ln q / S))
  For t = 1, 2, …, S:
  a. Weight normalization: w_i^{Tc} = w_i^{Tc} / (Σ_{i=1}^{m} w_i^{Tc} + Σ_{i=1}^{q} w_i^{Ts}); w_i^{Ts} = w_i^{Ts} / (Σ_{i=1}^{m} w_i^{Tc} + Σ_{i=1}^{q} w_i^{Ts}).
  b. Merge the target data Tc and the source data Ts, and resample based on the updated sample weights to obtain the new dataset Tn.
  c. Call the classification learner Lc, and train the classification model ht: Tn → L on the resampled dataset Tn.
  d. Calculate the error rate of ht on Tc: ε_t = Σ_{i=1}^{m} w_i^{Tc} |h_t(x_i) − c(x_i)| / Σ_{i=1}^{m} w_i^{Tc}, and the voting weight of ht: θ_t = (1/2) log((1 − ε_t) / ε_t).
  e. Set β_t = ε_t / (1 − ε_t).
  f. Set the new weight vector as follows:
     w_i^{Tc} = w_i^{Tc} β_t^{−|h_t(x_i) − c(x_i)|}, when i = 1, 2, …, m;
     w_i^{Ts} = w_i^{Ts} β^{|h_t(x_i) − c(x_i)|}, when i = 1, 2, …, q.
  Output the set of classification models M1 = {h_t} and its set of voting weights O = {θ_t}, when t = 1, 2, …, S.
  Using the regression training sample Tr, call the regression algorithm Lr, and train the demand regression model M2 = f_r(x).

Output:
  Call the set of classification models {h_t}, iterate through each model, and get S predictions on the test set N.
  For i = 1, 2, …, n:
    For t = 1, 2, …, S:
      Call h_t: d_t = h_t(x_i); update d_t = −1 if d_t = 0, and d_t = 1 if d_t = 1.
    Obtain the prediction result set D = {(d_t)}; combined with the model voting weight set O, c_i = sign(Σ_{t=1}^{S} d_t θ_t).
    If c_i = 1, call M2: f_i = f_r(x_i); otherwise, f_i = 0.
  Output the final prediction result set F.

4. Data validation and discussion of results

4.1. Data collection and feature engineering

4.1.1. Data collection

The research data come from a domestic vehicle company, V, specializing in automotive development and sales. V is a global automotive R&D company that produces numerous models, including buses, trucks, and passenger cars. Its business also involves the development and production of industrial equipment. The main demand of V is to establish an integrated and intelligent spare parts supply chain system. The intermittent nature of spare parts poses a great challenge to demand forecasting, which requires further construction of a forecasting framework to support the optimization of an intelligent replenishment and allocation system.

This study uses the spare parts data from the XA warehouse managed by V as target data. The XA warehouse, located in a core transportation hub province, is an important spare parts distribution center. The dataset includes 3,089 SKUs, corresponding to the equipment manufacturing and maintenance replacement of several car types. The target dataset contains weekly granularity data from 2018-08-27 to 2020-12-28. Weekly granularity aggregation is performed because the daily granularity data are extremely sparse; however, the data size is sharply reduced as a result.

In addition to the XA warehouse, V has 7 other warehouses that have accumulated a large amount of data. These datasets are not interoperable, so this study will introduce data from other warehouses on the basis of the target data to test the TBIDCF method and will use the DS warehouse spare parts data as the source data. Since this set contains the historical sales data of 2,167 SKUs, it can be used as a source for transfer learning.

As can be seen from Table 2, even for the weekly granularity data, the average ADI is as high as 10.03, which is greater than the standard threshold of 1.32 and meets the intermittency criterion. The CV² is 9.91, which is much greater than the standard threshold of 0.49. Therefore, the historical demand for most of the spare parts is highly intermittent, and the prediction difficulty is relatively high.
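The weight-update and voting loop of Algorithm 3 can be compressed into the following sketch. It is a simplified illustration: `adjusted_tradaboost`, `learner_factory`, and `weighted_vote` are our names, and any duck-typed fit/predict object stands in for the per-round LightGBM strong learner used in the paper:

```python
import math
import random

def adjusted_tradaboost(target, source, learner_factory, rounds):
    """target/source are lists of (x, label) pairs with labels in {0, 1}."""
    m, q = len(target), len(source)
    w_t, w_s = [1.0 / m] * m, [1.0 / q] * q
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(q) / rounds))
    models, thetas = [], []
    for _ in range(rounds):
        total = sum(w_t) + sum(w_s)                  # a. normalize all weights
        w_t = [w / total for w in w_t]
        w_s = [w / total for w in w_s]
        pool, wts = target + source, w_t + w_s       # b. weighted resampling
        sample = random.choices(pool, weights=wts, k=len(pool))
        h = learner_factory().fit([x for x, _ in sample],
                                  [y for _, y in sample])  # c. train h_t
        # d. error rate on the target data, and the voting weight theta_t
        p_t = h.predict([x for x, _ in target])
        err = sum(w * abs(p - y) for w, p, (_, y) in zip(w_t, p_t, target))
        err = min(max(err / sum(w_t), 1e-6), 0.5 - 1e-6)   # keep log defined
        thetas.append(0.5 * math.log((1.0 - err) / err))
        beta_t = err / (1.0 - err)                   # e.
        # f. raise misclassified target weights, lower misclassified source ones
        p_s = h.predict([x for x, _ in source])
        w_t = [w * beta_t ** (-abs(p - y)) for w, p, (_, y) in zip(w_t, p_t, target)]
        w_s = [w * beta ** abs(p - y) for w, p, (_, y) in zip(w_s, p_s, source)]
        models.append(h)
    return models, thetas

def weighted_vote(models, thetas, x):
    # map each model's {0, 1} output to {-1, +1}; positive weighted sum => class 1
    s = sum(th * (1 if h.predict([x])[0] == 1 else -1)
            for h, th in zip(models, thetas))
    return 1 if s > 0 else 0
```

The replacement of the weak learner with a strong one only changes what `learner_factory` returns; the weight bookkeeping is unchanged from TrAdaboost.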
Table 3
Experimental model and description.

Model          Description
SBA            Classical time series model (Croston variant) (Syntetos and Boylan, 2005)
Bootstrapping  Markov chain (C) + Historical sales sampling (R) (Willemain et al., 2004)
IDCF           LightGBM (C) + LightGBM (R)
BTIDCF         LightGBM (C + Best threshold searching) + LightGBM (R)
TBIDCF         Adjusted TrAdaboost (C) + LightGBM (R)

Note: C is for classification; R is for regression.

4.1.2. Feature engineering

First, the dataset is filtered according to the SBC demand classification method proposed by Syntetos et al. (2005). The set of SKUs with ADI > 1.32 is used as data input, and then the feature engineering is built.

The construction of feature engineering is mainly divided into discrete time features and historical sales aggregation features. Discrete time features are mainly time specific, e.g., year, month, and week information, day of the week, weekend or not, season, quarter, year-end or month-end, holiday, promotion day, etc. Historical sales aggregation features are mainly transformed from historical sales information, e.g., sales from a few days ago and aggregation features within a specific time window (average, quantile, skewness, kurtosis, maximum/minimum, difference, etc.). For the characteristics of intermittent SKU sales, ADI, CV², frequency of demand, percentage of zero demand, historical total/mean/standard deviation of non-zero demand, last non-zero demand, number of days since the last non-zero demand, historical most frequent demand, etc., are also added, along with external factors such as retention data of the car type to which the spare part belongs, type of spare part (production part, repair part, etc.), and high and low flow information. The aforementioned features will be used as the main model inputs.

4.2. Experimental design

As in Table 3, the classical time series model SBA is used as the base model for intermittent demand forecasting and serves as the comparison term in the experiment, while the Markov chain-based combination model with bootstrapping is added as another comparison term. AUC, MASE, and MAAPE are used as evaluation indicators to measure the performance of each forecasting model. Different forecasting periods (4 weeks, 9 weeks, and 12 weeks) are selected to explore how different forecasting period lengths affect the forecasting results in machine learning. In addition, BTIDCF is compared with IDCF to explore the impact of the best-threshold optimization strategy on the prediction results.

Finally, from the same dataset, the set of SKUs in the XA main warehouse with historical data within 3 months (52 in total) is selected. These SKUs have limited historical data and insufficient information, which means that they are not suitable for long-period (9 weeks or 12 weeks) forecasting, so we only select the next 4 weeks as the prediction period without making other long-period forecasting tests. In addition, the purpose of the test is to verify whether transfer learning can work for SKUs with a small amount of historical data; therefore, picking the most suitable prediction period is sufficient. Finally, the next 4 weeks are used as the prediction period, and the DS warehouse source data are introduced. TBIDCF is compared with IDCF to verify the impact of transfer learning on the prediction accuracy of such SKUs.
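The intermittency statistics used for SKU screening above (ADI and CV², with the SBC cut-offs 1.32 and 0.49) can be computed from a raw demand series as follows. This is a hedged sketch using one common convention (ADI as periods per non-zero occurrence; CV² over the non-zero demand sizes):

```python
def adi(demand):
    # Average Demand Interval: periods per non-zero demand occurrence
    nonzero = sum(1 for d in demand if d > 0)
    return len(demand) / nonzero if nonzero else float("inf")

def cv2(demand):
    # squared coefficient of variation of the non-zero demand sizes
    nz = [d for d in demand if d > 0]
    if len(nz) < 2:
        return 0.0
    mean = sum(nz) / len(nz)
    var = sum((d - mean) ** 2 for d in nz) / len(nz)
    return var / mean ** 2
```

A series s would then be flagged as intermittent when adi(s) > 1.32, and as lumpy when, additionally, cv2(s) > 0.49.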
Table 4
Comparison of BTIDCF with other models.

               Forecasting length:         Forecasting length:         Forecasting length:
               4 weeks                     9 weeks                     12 weeks
Model          AUC     MASE    MAAPE      AUC     MASE    MAAPE      AUC     MASE    MAAPE
SBA            0.5000  1.4100  0.6882     0.5000  1.3100  0.6922     0.5000  1.3200  0.6634
Bootstrapping  0.6230  1.5300  0.7979     0.6180  1.4400  0.7332     0.6020  1.3900  0.7287
IDCF           0.8380  1.0300  0.2463     0.8430  0.9800  0.2447     0.8460  0.9600  0.2451
BTIDCF         0.8410  1.0500  0.2315     0.8480  0.9600  0.2392     0.8500  0.9500  0.2391
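The best-threshold search behind the BTIDCF rows above can be sketched as follows. This is a hedged illustration: the paper scores candidate thresholds by AUC on the validation set, while here the scoring function is injected (accuracy is used in the usage example), and `best_threshold` is our name:

```python
def best_threshold(probs, labels, score, lo=0.4, hi=0.6, step=0.01):
    """Sweep candidate classification thresholds over [lo, hi] on a
    validation set and return the one whose binarized predictions score best."""
    best_alpha, best_score = lo, float("-inf")
    alpha = lo
    while alpha <= hi + 1e-12:
        preds = [1 if p > alpha else 0 for p in probs]  # binarize at alpha
        s = score(preds, labels)
        if s > best_score:
            best_alpha, best_score = alpha, s
        alpha += step
    return best_alpha
```

For validation probabilities [0.45, 0.55, 0.41, 0.60] with labels [0, 1, 0, 1] and accuracy as the score, the sweep settles on a threshold of about 0.45, the first value that separates the two classes perfectly.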
based combined prediction model IDCF performs well for different forecasting lengths as well as different evaluation metrics, with an average improvement of about 35% in MASE and 43% in MAAPE. Compared with the bootstrapping-based Markov chain combination model, the average improvement in AUC classification accuracy is about 22%, which shows that the performance of IDCF is superior. After adding the best threshold searching optimization to IDCF, the optimal threshold is positioned at 0.52 by search, rather than the previous standard classification threshold of 0.5. Based on the new classification threshold, the performance of BTIDCF is improved to a certain extent. Compared with IDCF, MASE is improved by about 2% on average, MAAPE by about 1% on average, and the AUC classification accuracy by about 0.5%, which indicates that the threshold search has some effect.

As shown in Fig. 3, this experiment initially verifies the significant advantages of the IDCF method and also demonstrates the feasibility of the best threshold searching strategy.

4.3.2. Results of TBIDCF

As shown in Table 5, SKUs from the XA warehouse with historical data within 3 months were selected for training, and the source data from the DS warehouse dataset were fused for transfer learning. The results show that TBIDCF improves MASE by about 29%, MAAPE by about 20%, and the core AUC classification accuracy by about 12% in comparison to IDCF when predicting sales for the next 4 weeks. This indicates that when the amount of historical data is insufficient, the classification accuracy and prediction accuracy can be effectively improved by introducing external data for transfer training. As shown in Fig. 4, this experiment initially verifies the feasibility as well as the significant advantages of the TBIDCF method for intermittent demand forecasting.

Table 5
Comparison of TBIDCF with IDCF.

Model    Forecasting length: 4 weeks
         AUC     MASE    MAAPE
IDCF     0.5120  1.5200  0.6521
TBIDCF   0.6330  1.2300  0.4436

5.1. Conclusion

This paper focuses on intermittent demand forecasting in industry as its main research target, considers intermittent demand forecasting as a difficult and key point in industrial practice, proposes a machine learning-based intermittent demand combination forecasting framework (IDCF) from the perspective of machine learning, and conducts an in-depth exploration to improve forecasting accuracy. Two optimization methods (BTIDCF and TBIDCF) are proposed and validated based on the business data of real enterprises. The conclusions are as follows.

The intermittent demand combination forecasting method (IDCF) based on machine learning can effectively improve the accuracy of spare parts forecasting. Compared to the prediction results of the traditional time series model, IDCF performs well in different metrics (AUC, MASE, MAAPE), which proves the potential of this combined prediction framework for intermittent demand forecasting.

The BTIDCF method can slightly improve prediction accuracy. The threshold values are traversed through the validation set so as to find the optimal threshold boundaries by accuracy, which offers improvement in regression accuracy and classification precision.

TBIDCF performs well for SKUs with limited historical data. Insufficient historical data will make forecasting extremely difficult. In our research, we tried transfer learning by introducing external data and
initially verified that it can effectively improve classification accuracy. This also provides references and guidelines to enterprises on how to make full use of data resources.

The framework is applicable to the forecasting of automotive spare parts. Through real data validation, it has the value of actual business implementation in terms of accuracy and speed. With certain innovation and robustness, it can further optimize the downstream intelligent procurement business of the supply chain through artificial intelligence and promote the intelligent transformation of the aftermarket spare parts replenishment of enterprises.

5.2. Contribution

This study proposes a forecasting framework for intermittent demand with the following contributions.

i. This study proposes a two-stage methodological framework for intermittent demand forecasting based on machine learning. The intermittent demand forecasting problem is divided into two stages: first predicting whether there is demand and then predicting how much demand there is. Compared with the traditional time-series method and the bootstrapping method, this framework can effectively improve the classification accuracy of predicting the presence or absence of demand and provide an effective benchmark reference for subsequent replenishment and allocation, which can further reduce inventory costs and backlog costs.

ii. Two optimization paths are proposed that focus on the classification problem. Through the best threshold searching strategy and the transfer learning strategy, the integration of internal and external data can be used effectively to help enterprises maximize the use of their own data assets, make the classification model more generalizable, improve the overall classification accuracy, and reduce costs and increase efficiency. Meanwhile, for newly developed spare parts, the problem of limited data is addressed by our method, which can use external data to improve intermittent prediction accuracy.

iii. This study deeply investigates the internal and external influencing factors and establishes a complete feature engineering scheme based on discrete time features, aggregated time features, intermittency features, and external business features, which provides stable input for machine learning and can also give enterprises some guidance for analyzing the factors influencing demand through feature importance analysis.

iv. This research framework supports simultaneous training and output of multiple SKUs, while transforming the model kernel of transfer learning to greatly improve the efficiency of engineering deployment. It greatly enhances training speed through global training, while also capturing the cross influence between SKUs and outputting more robust prediction results. Moreover, the framework has been implemented internally for customer use, with significant improvement for the enterprise.

6. Limitations and future research

The research still has some limitations. For example, compared with IDCF, the BTIDCF method can only slightly improve the prediction accuracy, and the actual AUC value of the TBIDCF method is only 0.633, which is still slightly insufficient. More business factors need to be taken into account to further improve the effectiveness of the classification. Also, only the global optimal threshold is selected in BTIDCF, without further classifying SKUs into different categories and searching for the corresponding optimal threshold, which may lead to a decrease in accuracy. In addition, the selected transfer dataset is not pre-matched for similarity in TBIDCF, which leads to an oversized data pool and affects the training efficiency; a similar-product searching algorithm should be introduced. The above points should be taken into account in future studies. With the goal of improving intermittent demand forecasting accuracy, this paper investigates the potential and the advantages of the combined forecasting model from the perspective of machine learning. However, due to the limits of our knowledge, there are still many shortcomings in the research. Further exploration and practice will be conducted in the following areas in the future.

Exploring a more reasonable evaluation system. This study focuses on the improvement of intermittent demand forecasting accuracy as its core objective, without considering the combination and linkage of this improvement with the downstream out-of-stock rate, inventory fulfillment rate, and other indicators. In addition to the goal of improving demand forecasting accuracy, downstream indicators should also be taken into account, i.e., the improvement of demand forecasting accuracy should positively affect the downstream inventory satisfaction rate, thus forming an integrated linkage evaluation system.

Exploring the idea of integrating machine learning and time series models. Time series models and machine learning each have their own advantages. The traditional time series model can capture trends and seasonal patterns and can give an overall level for the future period. On the one hand, once non-zero sales appear in the recent period, the future forecast is all non-zero, which puts a lot of pressure on the inventory level. On the other hand, machine learning can capture global impacts and can build comprehensive feature engineering using internal and external factors, so as to improve the classification forecast accuracy; however, the overall prediction level lacks an underwriting strategy. Therefore, machine learning models can be further combined with time series models in the future to guarantee the stability of the overall prediction level while improving the demand prediction accuracy as much as possible.

Implementing data validation in more areas. This study is based on the demand for spare parts in the automotive aftermarket, and data validation has been conducted using data from real auto companies. In the future, data will be collected from other fields, such as the consumer goods industry and the home appliance industry. In this way, we can verify the feasibility of the model from a more multi-dimensional and comprehensive perspective and provide practical guidance for more practitioners.

Declaration of competing interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported jointly by funding from the Shandong Industrial Internet Innovation and Entrepreneurship Community, the National Natural Science Foundation of China (Grant No. 71810107003), and the National Social Science Foundation of China (Grant No. 18ZDA109).

References

Altay, N., Rudisill, F., Litteral, L.A., 2008. Adapting Wright's modification of Holt's method to forecasting intermittent demand. Int. J. Prod. Econ. 111 (2), 389–408.
Babai, M.Z., Tsadiras, A., Papadopoulos, C., 2020. On the empirical performance of some new neural network methods for forecasting intermittent demand. IMA J. Manag. Math. 31 (3), 281–305.
Bandara, K., Bergmeir, C., Smyl, S., 2020. Forecasting across time series databases using recurrent neural networks on groups of similar series: a clustering approach. Expert Syst. Appl. 140 (1), 112896.
Boylan, J.E., Johnston, F.R., 1996. Variance laws for inventory management. Int. J. Prod. Econ. 45 (1–3), 343–352.
Bozos, K., Nikolopoulos, K., 2011. Forecasting the value effect of seasoned equity offering announcements. Eur. J. Oper. Res. 214 (2), 418–427.
Bradley, A.P., 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30 (7), 1145–1159.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York.
Croston, J.D., 1972. Forecasting and stock control for intermittent demands. Oper. Res. Q. 23 (3), 289–303.
Dai, W., Yang, Q., Xue, G., et al., 2007. Boosting for transfer learning. In: Proceedings of the 24th International Conference on Machine Learning. ACM, New York.
Dai, W., Jin, O., Xue, G., et al., 2009. EigenTransfer: a unified framework for transfer learning. In: Proceedings of the 26th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco.
Deng, X., Liu, Q., Deng, Y., 2016. An improved method to construct basic probability assignment based on the confusion matrix for classification problem. Inf. Sci. 340–341 (1), 250–261.
Eaves, A.H.C., Kingsman, B.G., 2017. Forecasting for the ordering and stock-holding of spare parts. J. Oper. Res. Soc. 55 (4), 431–437.
Fan, G., Deng, Z., Ye, Q., et al., 2021. Machine learning-based prediction models for patients' no-show in online outpatient appointments. Data Sci. Manag. 2 (1), 45–52.
Nikolopoulos, K., Thomakos, D.D., 2019. Forecasting with the Theta Method: Theory & Applications. Wiley, New Jersey.
Nikolopoulos, K., Goodwin, P., Patelis, A., et al., 2007. Forecasting with cue information: a comparison of multiple regression with alternative forecasting approaches. Eur. J. Oper. Res. 180 (1), 354–368.
Nikolopoulos, K., Syntetos, A.A., Boylan, J.E., et al., 2011. An aggregate-disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis. J. Oper. Res. Soc. 62 (3), 544–554.
Nikolopoulos, K.I., Babai, M.Z., Bozos, K., 2016. Forecasting supply chain sporadic demand with nearest neighbor approaches. Int. J. Prod. Econ. 177 (1), 139–148.
Pardoe, D., Stone, P., 2010. Boosting for regression transfer. In: Proceedings of the 27th International Conference on Machine Learning (ICML 10). Omnipress, Madison, pp. 863–870.
Patton, J.D., Steele, R.J., 2003. Service Parts Handbook. Solomon PR, Virginia.
Petropoulos, F., Makridakis, S., Assimakopoulos, V., et al., 2014. 'Horses for courses' in demand forecasting. Eur. J. Oper. Res. 237 (1), 152–163.
Pour, N.A., Tabar, R.B., Rahimzadeh, A., 2008. A hybrid neural network and traditional approach for forecasting lumpy demand. In: Engineering and Technology, pp. 1028–1034.
Regattieri, A., Gamberi, M., Gamberini, R., et al., 2005. Managing lumpy demand for aircraft spare parts. J. Air Transport. Manag. 11 (6), 426–431.
Rego, J.R., Mesquita, M.A., 2015. Demand forecasting and inventory control: a simulation study on automotive spare parts. Int. J. Prod. Econ. 161, 1–16.
Rožanec, J.M., Mladenić, D., 2021. Reframing demand forecasting: a two-fold approach for lumpy and intermittent demand. Available at: https://arxiv.org/abs/2103.13812v1.
Flach, P.A., 2003. The geometry of ROC space: understanding machine learning metrics Rosenstein, M.T., Marx, Z., Kaelbling, L.P., et al., 2005. To transfer or not to transfer. In:
through ROC isometrics. In: Proceedings of the 20th International Conference on Proceeding of the Neural Information Processing Systems. MIT Press, Cambridge.
International Conference on Machine Learning. AAAI Press, pp. 194–201. Salinas, D., Flunkert, V., Gasthaus, J., et al., 2020. DeepAR: probabilistic forecasting with
Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning autoregressive recurrent networks. Int. J. Forecast. 36 (3), 1181–1191.
and an application to boosting. J. Comput. Syst. Sci. 55 (1), 119–139. Shenstone, L., Hyndman, R.J., 2005. Stochastic models underlying Croston’s method for
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. intermittent demand forecasting. J. Forecast. 24 (6), 389–402.
Stat. 29 (5), 1189–1232. Sungil, K., Heeyoung, K., 2016. A new metric of absolute percentage error for intermittent
Ghobbar, A.A., Friend, C.H., 2002. Sources of intermittent demand for aircraft spare parts demand forecasts. Int. J. Forecast. 32 (3), 669–679.
within airline operations. J. Air Transport. Manag. 8 (4), 221–231. Swain, W., Switzer, B., 1980. Data analysis and the design of automatic forecasting
Ghobbar, A.A., Friend, C.H., 2003. Evaluation of forecasting methods for intermittent systems. In: Proceedings of the Business and Economic Statistics Section. American
parts demand in the field of aviation: a predictive model. Comput. Oper. Res. 30 (14), Statistical Association, Virginia.
2097–2114. Swets, J.A., 2014. Signal Detection Theory and ROC Analysis in Psychology and
Gutierrez, R.S., Solis, A.O., Mukhopadhyay, S., 2008. Lumpy demand forecasting using Diagnostics. Taylor and Francis, London.
neural networks. Int. J. Prod. Econ. 111 (2), 409–420. Syntetos, A.A., 2001. Forecasting of Intermittent Demand. Brunel University, London.
Hasni, M., Aguir, M.S., Babai, M.Z., et al., 2019a. On the performance of adjusted Syntetos, A.A., Boylan, J.E., 2001. On the bias of intermittent demand estimates. Int. J.
bootstrapping methods for intermittent demand forecasting. Int. J. Prod. Econ. 216 Prod. Econ. 71 (1–3), 457–466.
(1), 145–153. Syntetos, A.A., Boylan, J.E., 2005. The accuracy of intermittent demand estimates. Int. J.
Hasni, M., Aguir, M.S., Babai, M.Z., et al., 2019b. Spare parts demand forecasting: a Forecast. 21 (2), 303–314.
review on bootstrapping methods. Int. J. Prod. Res. 57 (15–16), 4791–4804. Syntetos, A.A., Boylan, J.E., Croston, J.D., 2005. On the categorization of demand
Hua, Z., Zhang, B., 2006. A hybrid support vector machines and logistic regression patterns. J. Oper. Res. Soc. 56 (5), 495–503.
approach for forecasting intermittent demand of spare parts. Appl. Math. Comput. Syntetos, A.A., Keyes, M., Babai, M.Z., 2009. Demand categorisation in a European spare
181 (2), 1035–1048. parts logistics network. Int. J. Oper. Prod. Manag. 29 (3), 292–316.
Hua, Z., Zhang, B., Yang, J., et al., 2007. A new approach of forecasting intermittent Syntetos, A.A., Babai, M.Z., Davies, J., et al., 2010. Forecasting and stock control: a study
demand for spare parts inventories in the process industries. J. Oper. Res. Soc. 58 (1), in a wholesaling context. Int. J. Prod. Econ. 127 (1), 103–111.
52–61. Syntetos, A., Babai, Z.M., Boylan, J.E., et al., 2016. Supply chain forecasting: theory,
Hyndman, R.J., 2006. Another look at forecast accuracy metrics for intermittent demand. practice, their gap and the future. Eur. J. Oper. Res. 252 (1), 1–26.
Foresight: Int. J. Appl. Forecast. 4 (1), 43–46. Tavares, L.V., Almeida, L.T., 1983. A binary decision model for the stock control of very
Jiang, J., Zhai, C., 2007. Instance weighting for domain adaptation in NLP. In: slow moving items. J. Oper. Res. Soc. 34 (3), 249–252.
Proceedings of the 45th Annual Meeting of the Association of Computational Teunter, R.H., Duncan, L., 2009. Forecasting intermittent demand: a comparative study.
Linguistics. Association for Computational Linguistics, Prague. J. Oper. Res. Soc. 60 (3), 321–329.
Jiang, A., Tam, K.L., Guo, X., et al., 2020. A new approach to forecasting intermittent Thrun, S., Pratt, L., 1998. Learning to Learn. Kluwer Academic Publishers, Norwell, MA.
demand based on the mixed zero-truncated Poisson model. J. Forecast. 39 (1), 69–83. Verganti, R., 1997. Order overplanning with uncertain lumpy demand: a simplified
Ke, G., Meng, Q., Finley, T., et al., 2017. LightGBM: a highly efficient gradient boosting theory. Int. J. Prod. Res. 35 (12), 3229–3248.
decision tree. In: Proceedings of the 31st International Conference on Neural Viswanathan, S., Zhou, C., 2008. A new Bootstrapping based method for forecasting and
Information Processing Systems. Curran Associates Inc., New York, pp. 3149–3157. safety stock determination for intermittent demand items. Nanyang Technological
Knod, E., Schonberger, R., 2001. Operations Management: Meeting Customer Demands. University, Singapore.
McGraw-Hill, New York. Watson, R.B., 1987. The effects of demand-forecast fluctuations on customer service and
Kolassa, S., 2016. Evaluating predictive count data distributions in retail sales forecasting. inventory cost when demand is lumpy. J. Oper. Res. Soc. 38 (1), 75–82.
Int. J. Forecast. 32 (3), 788–803. Weng, C.G., Poon, J.A., 2008. New evaluation measure for imbalanced datasets. In: 7th
Kourentzes, N., 2013. Intermittent demand forecasts with neural networks. Int. J. Prod. Australasian Data Mining Conference on Data Mining & Analytics. Australian
Econ. 143 (1), 198–206. Computer Society, Inc., Glenelg, pp. 27–32.
Leven, E., Segerstedt, A., 2004. Inventory control with a modified croston procedure and Willemain, T.R., Smart, C.N., Schwarz, H.F., 2004. A new approach to forecasting
erlang distribution. Int. J. Prod. Econ. 90 (3), 361–367. intermittent demand for service parts inventories. Int. J. Forecast. 20 (3), 375–387.
Liao, X., Xue, Y., Carin, L., 2005. Logistic regression with an auxiliary data source. In: Williams, T., 1984. Stock control with sporadic and slow-moving demand. J. Oper. Res.
Proceedings of the 21st International Conference on Machine Learning. Association Soc. 35 (10), 939–948.
for Computing Machinery, New York. Wingerden, E.V., Basten, R., Dekker, R., et al., 2014. More grip on inventory control
Ma, S., Fildes, R., 2017. A retail store SKU promotions optimization model for category through improved forecasting: a comparative study at three companies. Int. J. Prod.
multi-period profit maximization. Eur. J. Oper. Res. 260 (2), 680–692. Econ. 157 (Nov.), 220–237.
Makridakis, S., Spiliotis, E., Assimakopoulos, V., 2020. The M5 uncertainty competition: Wu, P., Dietterich, T.G., 2004. Improving SVM accuracy by training on auxiliary data
results, findings and conclusions. Int. J. Forecast. 36 (1), 224–227. sources. In: Proceedings of the 21st International Conference on Machine Learning.
Morgan Kaufmann, pp. 871–878.
55
X. Zhuang et al. Data Science and Management 5 (2022) 43–56
Yang, Q., Wu, X., 2006. 10 challenging problems in data mining research. Int. J. Inf. Zhou, C., Viswanathan, S., 2011. Comparison of a new bootstrapping method with
Technol. Decis. Making 5 (4), 597–604. parametric approaches for safety stock determi-nation in service parts inventory
Yang, Y., Guo, J., Sun, S., 2021. Tourism demand forecasting and tourists’ search behavior: systems. Int. J. Prod. Econ. 133 (1), 481–485.
evidence from segmented Baidu search volume. Data Sci. Manag. 4 (1), 1–9. Zotteri, G., 2000. The impact of distributions of uncertain lumpy demand on inventories.
Prod. Plann. Control 11 (1), 32–43.
56