Demand Forecasting Using Machine Learning
Acknowledgements
I wish to express my sincere gratitude to Ms. Terri Hoare, my professor and research
supervisor, for her patient guidance from the start, her valuable and constructive
suggestions throughout the planning and development of my research, and her
willingness to give her time so generously. I would also like to thank Mr. John
O'Sullivan, my professor, for his guidance and assistance with the pre-processing of
the data and for offering ample resources in the field of time series forecasting. I
would also like to extend my thanks to H2O.ai for offering me the Driverless AI trial to
conduct the research. Finally, I wish to thank my family for their support and
encouragement throughout my course.
Abstract
This research, titled "Demand forecasting using statistical and machine learning
algorithms", compares the performance of traditional statistical and machine learning
algorithms in forecasting the demand for 50 products. Demand forecasting is a
crucial part of a firm's operations; it aims at predicting and estimating the future
demand for products to aid the decision-making process. The research conducted so
far compares time series models with one another to identify the better model
overall; this research focuses on implementing different algorithms to identify the
variation in performance for each product. Traditional statistical models such as
ARIMA and the Theta method are implemented alongside machine learning
algorithms such as the MLP and a new technology by H2O called Driverless AI. The
accuracy of each algorithm is evaluated using the back-testing technique by splitting
the existing data into a train and a test set. The models are built using the train set,
the demand for the products is forecasted for the final year of the existing data, and
the forecasted values are compared with the values of the test set to compute the
MAPE. After computing the errors of the models, the entire data set is used to
forecast the future demand for the products using each of these algorithms. The
results show that ARIMA accurately forecasts the demand for 10 out of the 50
products, Theta accurately forecasts the demand for 25 out of the 50 products, and
the MLP accurately forecasts the remaining 15 products. ARIMA could not handle the
products with a strong pattern and returned a generic model, whereas the Theta
method and the MLP are able to decompose the data and forecast for products with
a strong pattern. Driverless AI was outperformed by ARIMA and Theta for all the
products and by the MLP for the majority of the products. The conclusion of this
research is that different statistical and machine learning algorithms need to be
implemented when forecasting the demand for a set of products in order to identify
the best performing algorithm for each product.
Table of contents
List of Figures
4.21 PDP and ICE 44
5.1 Product-wise Error comparison of models 50
List of Tables
Table No.   Title   Page no.
4.1 ARIMA MAPE 31
4.2 ARIMA Monthly forecast 32
4.3 Theta MAPE 34
4.4 Theta Monthly forecast 35
4.5 MLP MAPE 37
4.6 MLP Monthly forecast 38
4.7 DAI MAPE 40
5.1 Best 10 ARIMA models 45
5.2 Best 10 Theta models 46
5.3 Best 10 MLP models 47
5.4 Best 10 DAI models 47
5.5 Product-wise Error comparison of models 48
Chapter 1 - Introduction
1.1 Business Problem
Every organization faces constant change in its planning and decision-making
processes. To meet the needs of the organization, some form of forecast is needed;
the more reliable the forecasts, the better the results for planning and decision-
making. Forecasting has challenged management for years, but with the current
advancement in computing, complex forecasting methods can be implemented with
ease. An efficient forecasting system is a requirement in the growing supply chain
management world, as it helps firms handle the demand shifts of their products and
resources. Every firm's goal is to hold enough inventory to meet customer demand
while reducing the cost of buying and stocking that inventory.
For the firm's manufacturing department, the management must forecast the
demand for the products to estimate the raw materials, labour and budget required to
meet that demand. It is important for the organization to plan and schedule these
resources before the customers demand the products. In inventory-holding locations
such as stores, dealers and distribution centres, forecasts are necessary for the
inventory control systems; therefore, the firm must be able to forecast the demand for
each of the products well in advance to have the right amount available to fill
customer demand.
When there is an oversupply of inventory, the organization incurs losses due to the
excess storage space required, depreciation of stock and expiry of items. With an
undersupply of inventory, the organization loses sales and, as a result, goodwill.
Hence, there is a need for reliable forecasts which will allow the organization to cope
with demand shifts and enhance its growth.
Aim: Accurately forecast the future demand for each of the products to aid managerial
decisions.
Objective: To compare the best statistical models and machine learning algorithms in
order to identify the most efficient forecasting model for each of the 50 products, and
to determine whether linear or non-linear methods of time series forecasting prove to
be better and, if so, what factors influence them.
Hypothesis: The performance of algorithms varies for different products.
1.3 Scope
The scope of this research is to help build a demand forecasting tool that includes a
wide range of statistical and machine learning algorithms to accurately forecast the
demand of different products.
1.4 Limitations
The number of univariate time series forecasting algorithms compared in this
research was narrowed down to four, for ease of computation and comparison
across the different products.
To create a strategic plan for implementing the dissertation project, the following
roadmap is used.
• Introduction: This chapter includes the problem definition, research question, aim and objectives, and the hypothesis to be tested.
• Literature Review: This chapter highlights the existing research on time series forecasting with the use of research journals and books, which include the theories, concepts and models of forecasting.
• Methodology: This chapter utilizes the CRISP-DM approach to conduct the research; each of the six phases of the methodology is tailored for this research.
• Data Analysis: The aim of this chapter is to compare the findings and the performance of each algorithm without drawing conclusions from the findings.
• Discussions: This chapter includes the interpretation of the algorithms' results, discussion of the findings and answering of the research question.
• Conclusion: This chapter summarises the findings of the research to come to a conclusion.
Chapter 2 – Literature Review
The current research done for time series forecasting is investigated to identify the
best performing forecasting methods for both Statistical and Machine learning
algorithms. It is a difficult task to conclude which are the best models (Traditional
statistical models, machine learning algorithms or deep learning algorithms) for
forecasting time series data. To better understand the theories, concepts and models
required for demand forecasting, the following books and research journals were
consulted.
(Dietrich et al., 2015) explains the basics of time series and its components. The
book gives an insight into the Autoregressive (AR) model, Moving Average (MA)
model, Autoregressive Integrated Moving Average (ARIMA) model, Autocorrelation
Function (ACF) and Partial Autocorrelation Function (PACF), concluding with the
advantages and disadvantages of the ARIMA model.
(Noor-Ul-Amin, 2010) compares the ARIMAX model with an ANN to forecast the daily
calls to the emergency services due to road accidents, focusing on the concerns of
implementing the ANN and how to work through them. The study states that the
ARIMAX model failed to successfully forecast the call volume because of its low
r-squared value of 38.23%. The feed-forward neural network was implemented using
different orders with a stopping rule, i.e. if there is no decrease in error one more step
will be taken. To find the appropriate neural network, the number of lag terms
(inputs), the number of hidden layers and the number of neurons in the hidden layers
were taken into consideration. The results show that the ANN model outperforms the
ARIMAX model and that the low-order neural network models were better than the
higher-order ANN models. The study also concludes that increasing the lag terms
and the hidden layers did not increase the performance of the model.
(Assimakopoulos and Nikolopoulos, 2002) explains the working of the Theta method
for forecasting univariate time series and the detailed mathematics behind the model. The
journal article explains how the model captures both the long-term trend and the
short-term fluctuations and combines them both to accurately forecast the future.
The study is concluded with the perspectives for the future research of the theta
model. (Hyndman and Billah, 2003) helps to better understand the Theta model and
compares the forecast results of the Theta model with those of Simple Exponential
Smoothing with drift (SES-d). The research suggests that, by using a maximum
likelihood approach to optimize the parameters of SES-d, it can outperform the Theta
model. The paper concludes that the Theta method is a special case of SES-d where
the drift parameter is half the slope of the linear trend line fitted to the data.
(Nikolopoulos et al., 2011) compares the Theta model with Simple Exponential
Smoothing with Drift (SES-d) and the optimised SES-d using the M3 competition time
series dataset. The study suggests that the exponential smoothing method is more
generic at forecasting when compared to the Theta model because of the data
decomposition approach used, i.e. the assumption that the existing trend will
continue or that a similar trend will recur. Comparing the SMAPEs of the two models,
the Theta model outperforms SES-d in the Quarterly and Other M3 subsets with
SMAPEs of 0.30% and 0.36% respectively. The study highlights advances for future
research, such as the use of more than two Theta lines and the use of different
Theta-line combinations for each forecasting horizon.
(Fiorucci, 2016) explains the advances in time series forecasting by comparing four
scientific papers. This thesis explains the standard Theta model for univariate time
series forecasting and compares different models in the "forecTheta" package in R,
such as the Standard Theta method, the Optimised Theta method and DOTM.
(Khaled Safi, 2013) compares two methods to forecast the monthly electricity
consumption in Gaza Strip i.e. ARIMA (Linear model) and Multilayer Perceptron
(Non-linear model). The dataset used contains monthly electricity consumption from
2000 to 2011, using 90% of the data to train the models and the remaining 10% to
test. The MLP model was trained using hidden layers ranging from 1 to 15, the study
shows that the performance of the model varies with different learning rates based
on the varying number of hidden layers. The study concludes that the MLP
outperforms the ARIMA model by offering consistency in the forecasts.
(Kochak and Sharma, 2015) uses a feed-forward neural network (Multilayer
Perceptron) with different back propagation training algorithms such as Batch
Gradient Descent, Variable Learning Rate, Conjugate Gradient Algorithms and
Levenberg-Marquardt algorithm to forecast the monthly sales of Fuel filters for a
year. The neural network was trained using the sales data for the years 2011 to 2013
to forecast for the year 2014. The percentage error was calculated using the actual
and expected values (Back testing), the results suggest that the Levenberg-
Marquardt back propagation algorithm performed better than the other training
methods and delivered a more reliable forecast for the product.
(Massaro et al., 2018) compares the performance of the Deep Learning Algorithm in
RapidMiner with Support Vector Machines (SVM), k-Nearest Neighbour (k-NN),
Gradient Boosted Trees and Decision Trees to forecast the sales using the Walmart
dataset. The Deep Learning algorithm implemented uses a feed-forward multilayer
perceptron (MLP) with an optimize operator that finds the best parameter settings of
the algorithm automatically. The Optimize operator was used for both the Deep
Learning and the Gradient Boosted Trees with an Exponential Rectifier activation
function, 5 hidden layers with 50 neurons each and a quantile cost function. The
results suggest that, based on the average absolute error and the relative average
error, the Deep Learning algorithm is the best algorithm to forecast the sales,
followed by the Gradient Boosted Trees.
(Makridakis et al., 2018) compares the advances in time series forecasting using
previous journals and the results of the M3 competition held by the International
institute of Forecasters and evaluates the performance of 8 traditional statistical
against 8 machine learning algorithms on a large subset of 1045 monthly time series
data used in the M3 competition. Ahmed et al. (2010) evaluate eight machine
learning algorithms, including the Multilayer Perceptron (MLP), Bayesian Neural
Networks (BNN), Radial Basis Functions, Generalized Regression Neural Networks
(GRNN), K-Nearest Neighbour Regression, CART regression trees and Gaussian
Process Regression, on the same subset of 1045 monthly time series. In their
research, different pre-processing methods such as LAGGED-VAL (no special
pre-processing), DIFF (time series differencing) and MOV-AVG (taking moving
averages) are used to compare the models. The results show that different pre-
processing methods lead to differences in model performance; however, the MLP
and Gaussian Process Regression have the best overall performance. The results
suggest that the machine learning algorithms performed better than the traditional
statistical models; amongst the machine learning algorithms the Multilayer Perceptron
(MLP) had the highest accuracy and amongst the traditional statistical models, the
Theta model had the highest accuracy. The paper also concludes that the Machine
learning algorithms require more computational power when compared to the
traditional statistical methods.
(Paoli et al., 2010) conducts the research using Multilayer Perceptron (MLP) and an
ad-hoc time series pre-processing to predict the global solar radiation on a horizontal
surface. This study compares different models such as ARIMA techniques, Bayesian
inference, Markov chains and k-Nearest Neighbours with the MLP. These models
were implemented on the time series with and without pre-processing the data;
without pre-processing, AR(8) and the ANN had better RMSE scores when compared to
Markov chains, Bayesian Inference and k-NN. Pre-processing the data based on
clearness of the sky and the clear sky index reduced the forecasting error of the
ANN by 5-6% when compared to the other models. It is found that the ANN with
clear sky pre-processing provides a solution for the winter months and the ANN
without pre-processing provides a better result for the summer months. The study
concludes by introducing a new differential variable to study the predicted errors
based on the first differential, using this the ANN and the ARMA had a good
accuracy for the predicted tendency.
From the literature review it is evident that the research conducted so far comparing
different models draws its conclusions from the overall performance of those models.
Many researchers have compared different statistical and machine learning models
using the M3 monthly time series data set; the results show that the Theta method
had good accuracy among the statistical models and the MLP had good accuracy
among the machine learning algorithms. Researchers have shown the Theta method
to be a special case of SES-d, but further research states that SES-d gave a more
generic forecast due to the type of data decomposition used. Comparing the MLP
with other forecasting algorithms, researchers have concluded that the MLP has good
accuracy when forecasting time series data. A new automated technology by H2O
called "Driverless AI" is tested in this research as no research journals are available
on it. Based on the existing research conducted in this field, the following models are
chosen for this research: ARIMA, the Theta method, the Multilayer Perceptron (MLP)
and H2O Driverless AI.
Chapter 3 - Research Methodology and Methods
The research was conducted using the CRISP-DM (Cross-Industry Standard
Process for Data Mining) methodology. This methodology provides a structure for
the research that aids better and faster results. The CRISP-DM methodology
organizes the research into six phases; these phases help to better understand the
process and provide a road map to follow when planning and carrying out the
research. The figure below shows the phases of the CRISP-DM model.
Fig 3.1 CRISP-DM Methodology
The arrows show the flow of the process and the frequent dependencies between
the phases. The phases of the CRISP-DM methodology are as follows:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modelling
5. Evaluation
6. Deployment
Each of the phases of the CRISP-DM model will be approached in detail with respect
to the research.
1. Business Understanding
The first and most important phase of the research project is Business
Understanding. This phase aims at understanding the project objectives from a
business perspective, converting this business perspective into a research problem
definition and then creating a plan to achieve these objectives.
The aim of this research is to forecast the demand for the products for the future year
to aid managerial decisions. To forecast the demand for the products, the historical
data is used to analyse the demand and forecast the future year; to achieve this,
univariate time series forecasting is used. There are numerous linear and non-linear
time series forecasting models used in industry, and the best forecasting models are
selected based on the literature review. The objective of this research is to compare
the best statistical models and machine learning algorithms in order to identify the
most efficient forecasting model for each of the 50 products, and to determine
whether linear or non-linear methods of time series forecasting prove to be better
and, if so, what factors influence them.
Time Series
A time series is a sequence of equally spaced values over time, i.e. a collection of
time-based observations (Xt) where each observation is recorded at a specific time
(t). The components of a time series are the long-term trend, the seasonal variation,
the cyclic variation and the random (irregular) fluctuations.
The following figure (Fig 3.2) provides a visual representation of the components of a
time series variable.
Fig 3.2 Time series Components
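As a small illustration of these components (a sketch using a built-in R data set as a
stand-in for one product's monthly demand, not the data used in this research), a
monthly series can be decomposed in R as follows:

# Illustrative sketch only: decompose a monthly series into trend, seasonal and
# random components with base R. 'AirPassengers' is a built-in monthly data set
# used purely as a stand-in for one product's monthly demand.
y <- AirPassengers                            # monthly ts object, frequency = 12
parts <- decompose(y, type = "multiplicative")
plot(parts)                                   # panels: observed, trend, seasonal, random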
2. Data Understanding
The data understanding phase starts with the data collection. Once the required data
is acquired, the data is explored to identify data quality problems if any and insights
on the data are gathered.
To forecast the number of units sold of different products for a future year, a dataset
made available on Kaggle was used. The dataset describes the number of units sold
of 50 different products across 10 different stores daily. The data set contains over
913,000 daily records of units sold over five years (2013 - 2017) and 4 columns: the
date, the store, the item (product) and the sales (units sold).
For this research, the store column is dropped to aggregate the units sold for every
product for a day, i.e. the units sold per product are aggregated by day to calculate
the daily demand for the product.
3. Data Preparation
The data preparation phase consists of all the activities required to process the data
from its initial raw form before it is fed into the models. These tasks include
everything from attribute selection to transforming and cleaning the data. Different
pre-processing is required for the models in R Studio (ARIMA, Theta and MLP) and
the H2O Driverless AI model. The following data pipeline describes the pre-processing
required for implementing the models in R Studio.
• The initial data set consists of five years of data (2013 - 2017); to calculate the test
accuracy of the models, the data is split into a training set of four years of data
(2013 - 2016) and a testing set of one year (2017) using the "dplyr" package in R.
• The initial dataset has the sales of each product by store, which means that there
are multiple sales entries for each product in every store in a day. These entries
need to be aggregated by product for each day to calculate the number of units of
a product sold during that day. The dates are parsed into date format using the
"lubridate" package before the aggregation is carried out.
The full data set list consists of 50 products with the monthly sales of each product
over five years (2013 - 2017), i.e. 60 data points per product (12 x 5); the training set
list consists of 50 unique products with the monthly sales of each product over four
years (2013 - 2016), i.e. 48 data points per product (12 x 4); and the test set list
consists of 50 products with the monthly sales of each product for the last year
(2017), i.e. 12 data points per product.
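The following R sketch illustrates this pipeline; the file name and column names
(date, store, item, sales) are assumptions based on the Kaggle dataset description
rather than the author's original script:

# Sketch of the pre-processing pipeline described above (file and column names
# are assumptions, not taken from the original script).
library(dplyr)
library(lubridate)

raw <- read.csv("train.csv", stringsAsFactors = FALSE)

monthly <- raw %>%
  mutate(date = ymd(date), month = floor_date(date, "month")) %>%  # parse dates
  group_by(item, month) %>%
  summarise(units = sum(sales), .groups = "drop")                  # aggregate over stores and days

# One monthly ts object per product: 2013-2016 for training, 2017 for testing.
train_list <- list()
test_list  <- list()
for (p in sort(unique(monthly$item))) {
  d    <- monthly %>% filter(item == p) %>% arrange(month)
  full <- ts(d$units, start = c(2013, 1), frequency = 12)          # 60 monthly points
  train_list[[p]] <- window(full, end = c(2016, 12))               # 48 points
  test_list[[p]]  <- window(full, start = c(2017, 1))              # 12 points
}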
The pre-processing required for implementing the Driverless AI model is shown
below:
Fig 3.4 Pre-processing H2O driverless AI
The pre-processing steps for the H2O Driverless AI are similar to those of the
previous models; the difference is that the data is saved as two csv files for the
training and testing sets. The training set consists of 2400 rows, i.e. monthly data for
four years (2013 - 2016) for each of the products (48 x 50), and the test set consists
of 600 rows, i.e. monthly data for the year 2017 for each of the products (12 x 50).
The columns of both the training and test sets are the month, the product and the
number of units sold.
4. Modelling
In the modelling phase, several models selected based on thorough research are
applied. Each of the models has specific requirements on how the data must be pre-
processed, hence the need to step back into the data preparation phase. This phase
includes the selection, creation and assessment of the models.
In this research, the models are initially fitted on the training set list to forecast the
next year (2017). These forecasted values are compared with the test set list values
for 2017 and the test accuracy of the models is computed. Once the test errors are
computed, the entire data set list is input into the models to forecast the next year
(2018).
The best models for time series forecasting are selected based on the research
conducted in this field:
I. ARIMA (Autoregressive Integrated Moving Average)
This methodology was originally developed by George Box and Gwilym Jenkins in
the 1970s and is implemented by following three main steps: model identification,
parameter estimation and diagnostic checking.
The first step of the methodology states that any trends or seasonality need to be
removed from the time series to make the time series stationary; the autoregressive
and moving average models are then applied to this stationary time series. A time
series y_t for t = 1, 2, 3, ... is said to be stationary when the following conditions are
true:
a. The expected value (mean) of y_t is constant for all t.
b. The variance of y_t is finite.
c. The covariance of y_t and y_{t+h} depends only on the lag h for all t.
The autocorrelation function (ACF) is used to identify the covariance and the
underlying structure of the variable in the time series. For h = 0,
cov(0) = cov(y_t, y_t) = var(y_t) for all t. According to condition (b), var(y_t) < ∞, and
by condition (c) this variance is the same for all t. When the constant variance is
combined with condition (a), E[y_t] = μ for all t and some constant μ, the points in the
time series will be centered around a constant mean and the variance over time will
appear roughly constant. Since cov(0) is the variance, the ACF is comparable to the
correlation function of the two variables, corr(y_t, y_{t+h}); hence the value of the
ACF lies between -1 and +1. The closer the absolute value of ACF(h) is to 1, the
more useful y_t is as a predictor of y_{t+h}. The quantity h in the ACF is the lag, i.e.
the difference between the time points t and t+h. At lag 0, ACF(0) = 1 because the
ACF provides the correlation of every point with itself (Dietrich et al., 2015).
Autoregressive models for a stationary time series y_t, t = 1, 2, 3, ..., are denoted by
AR(p), i.e. an autoregressive model of order p. In this model, the time series is a
linear combination of its prior p values, y_{t-j} for j = 1, 2, ..., p, plus a random error
term ε_t, also known as a white noise process; this is the random, independent
fluctuation component that occurs in a time series (Dietrich et al., 2015).
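For reference, the standard textbook form of an AR(p) model (stated here for clarity;
it is not reproduced from the original figures) is

$$y_t = \delta + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

where δ is a constant, the φ_j are the autoregressive coefficients and ε_t is the white
noise term.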
Moving average models for a time series y_t centered at zero are denoted by MA(q),
i.e. a moving average model of order q. In this model, the time series is a linear
combination of the current white noise term ε_t and the prior q white noise terms.
The characteristics of the ACF and PACF plots of moving average (MA(q)) models
are somewhat different from those of autoregressive (AR(p)) models (Dietrich et al.,
2015).
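Similarly, the standard textbook form of an MA(q) model is

$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$

where the θ_j are the moving average coefficients.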
To avoid the decision between AR(p) and MA(q) models, these two models are
combined along with differencing (d, the difference between successive y-values),
defining the Autoregressive Integrated Moving Average model denoted by
ARIMA(p, d, q). Seasonal behaviour in the time series must be identified and the
series adjusted accordingly; this can be achieved using a seasonal autoregressive
integrated moving average model denoted by ARIMA(p, d, q) × (P, D, Q)s where
● p, d, q are as described earlier.
● s is the seasonal period
● P is the no. of terms in AR model across the s periods.
● D is the no. of differences applied across the s periods.
● Q is the no. of terms in the MA model across the s periods.
The advantage of the ARIMA model is that the analysis can be carried out using only
the historical time series data for the variable of interest, i.e. ARIMA ignores
additional input variables, which simplifies the forecasting process. The
disadvantages of the ARIMA model are its minimum data requirement and the fact
that the model does not provide an indication of the underlying variables that affect
the outcome of the model.
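As an illustration of how such models can be fitted per product in R (a sketch
assuming the "forecast" package and the train_list of monthly training series built in
the earlier pre-processing sketch; the text does not state which ARIMA
implementation was used):

# Sketch only: fit a seasonal ARIMA per product and forecast 12 months ahead.
library(forecast)

arima_models    <- list()
arima_forecasts <- list()
for (p in seq_along(train_list)) {
  fit <- auto.arima(train_list[[p]])           # selects (p,d,q)(P,D,Q) automatically
  arima_models[[p]]    <- fit
  arima_forecasts[[p]] <- forecast(fit, h = 12) # 12 monthly forecasts for the test year
}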
II. Theta Method
The Theta method modifies the local curvature of the time series through a
coefficient θ. As the value of the Theta coefficient becomes smaller, i.e. the local
curvature is decreased (θ = 0), the degree of deflation becomes larger. The gradual
decrease in the fluctuations reduces the absolute differences between the
successive terms in the series, which reveals the long-term trend in the data (shown
in Fig 3.5).
Fig 3.5 Theta model: Long term trend
Increasing the local curvatures (θ > 1) has the inverse effect on the time series, i.e.
the time series is dilated. The larger the degree of dilation, the greater the
magnification of the series, which reveals its short-term behaviour (shown in Fig 3.6).
This procedure is used to form new time series called "Theta lines".
In the general formulation of the method, the time series is decomposed into two or
more Theta lines. These Theta lines are extrapolated separately and the forecasts
are combined with equal weights. Different forecasting methods can be used for the
extrapolation of the Theta lines. In the simple case the time series is decomposed
into two Theta lines (θ = 0 and θ = 2):

$$X_t = \frac{1}{2}\left[L_t(\theta = 0) + L_t(\theta = 2)\right]$$

where L(θ = 0) is the Theta line with Theta coefficient zero and L(θ = 2) is the Theta
line with Theta coefficient two, i.e. whose second differences are twice those of the
initial time series (Assimakopoulos and Nikolopoulos, 2002).
The first Theta line (θ = 0) flattens the local curvature to describe the long-term trend
in the time series using the linear regression line, and the second Theta line (θ = 2)
doubles the local curvatures to magnify the short-term behaviour of the time series; it
is then extrapolated using simple exponential smoothing (SES). The forecasts from
both Theta lines are combined with equal weights to calculate the final forecast of the
Theta model (Assimakopoulos and Nikolopoulos, 2002).
The advantage of the Theta model is the method chosen to decompose the initial
data into the two components. The Theta model adapts to both the long-term trend
and the short-term behaviour separately without neglecting either, unlike existing
models where a method neglects the long-term trend when it tries to adapt to the
more recent behaviour and vice versa. Rob Hyndman compares the Theta model
with Simple Exponential Smoothing with Drift (SES-d) (Hyndman and Billah, 2003),
and the M3 competition results suggest that the Theta model had a better overall
accuracy. The disadvantage of the Theta model is that if the two Theta lines are not
combined with equal weights, the model will end up being more generic than SES-d.
The Theta model was implemented in R studio using the “forecTheta” package. The
stheta function is used to implement the standard theta model in a for loop for each
of the 50 products. The output of the models is saved as a list of 50 i.e. one model
for each of the 50 products.
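A minimal sketch of this loop is shown below (default stheta arguments are used; the
exact settings of the original implementation are not stated in the text):

# Sketch of the Theta loop described above, using stheta() from "forecTheta".
library(forecTheta)

theta_forecasts <- list()
for (p in seq_along(train_list)) {
  theta_forecasts[[p]] <- stheta(train_list[[p]], h = 12)  # standard Theta, 12-month forecast
}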
III. Multilayer Perceptron (MLP)
The MLP is a type of feed-forward neural network which has a minimum of three
layers of computational units: an input layer, one or more hidden layers and an
output layer. The MLP uses a commonly used supervised learning technique called
backpropagation to train the model.
In the figure above, the MLP has three input nodes, five hidden nodes in a single
layer and one output node. All three inputs feed into each of the hidden neurons
(H_1, H_2, H_3, H_4, H_5) and the combined result is produced at the output node
O_1. The output can be written as

$$O_1 = a_0 + \sum_{j=1}^{5} a_j f\left(a_{0,j} + \sum_{i=1}^{3} a_{i,j} x_i\right)$$

where the x_i are the inputs, the a_{i,j} are the weights of hidden neuron j, the a_{0,j}
are its constants and f() is the transfer function described below.
Each neuron is a conventional regression whose output passes through a transfer
function f() to become nonlinear. Several such neurons are arranged in the network,
through which the inputs are passed. After the neurons have applied their nonlinear
regressions, the results are combined at the output node O_1. The neural network
gains its approximation capabilities through the combination of multiple nonlinear
regressions; using the right number of nodes, any function can be approximated to a
high accuracy. The transfer functions f() commonly used are the logistic sigmoid and
the hyperbolic tangent. The output node uses a linear transfer function to transform
the input values to the network output, acting as a conventional linear regression
(Kourentzes, N.).
If f() is the identity function, the neurons become conventional linear regressions; if
f() is nonlinear, the neuron's behaviour changes based on the weights a_{i,j} and the
constant a_{0,j}. Both the logistic sigmoid and the hyperbolic tangent transfer
functions squash the input into a bounded range, therefore the output cannot
increase or decrease beyond that range of values. As with the ARIMA model, a trend
cannot be forecast by a neural network from the initial time series data; the trend can
be removed using differencing, and appropriate scaling of the data is required
(Kourentzes, N.).
The advantage of the MLP implementation used here is that it automates the choice
of the number of inputs (lags) and the number of hidden layers for each product. The
disadvantage is that the results may vary due to this automatic selection of lags and
hidden layers. Another issue is that differencing, although helpful in pre-processing,
is not the best method to deal with the trend.
The MLP was implemented in R Studio using the "nnfor" package, which is designed
exclusively for time series forecasting using neural networks. This package provides
an automated mlp function that pre-processes the time series (scaling and
differencing) and selects the best number of hidden layers and lag series. The MLP
models are implemented in a for loop for each of the 50 products. The output of the
models is saved as a list of 50, i.e. one model for each of the 50 products.
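A minimal sketch of this loop, reusing the train_list of monthly training series from the
earlier pre-processing sketch (default mlp() settings are assumed):

# Sketch of the MLP loop described above, using the automated mlp() from "nnfor".
library(nnfor)

mlp_models    <- list()
mlp_forecasts <- list()
for (p in seq_along(train_list)) {
  fit <- mlp(train_list[[p]])                  # auto-selects lags and hidden units
  mlp_models[[p]]    <- fit
  mlp_forecasts[[p]] <- forecast(fit, h = 12)  # forecast the next 12 months
}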
IV. H2O Driverless AI
The advantage of Driverless AI for time series is that it automates the entire process
and is reasonably quick at training the model. The disadvantages of Driverless AI are
that its time series forecasting is in the alpha testing stage and that it requires both a
training and a test dataset to forecast, i.e. in the absence of a test dataset the model
cannot be used for forecasting.
The Driverless AI experiments were run on an NC6 cloud instance with one of the
fastest computational GPUs available on the cloud; the NC6 consists of 6 cores,
56 GB of memory and a 380 GB SSD.
The following pipeline best describes how the modelling in H2O Driverless AI works:
The training and testing set data are fed into the model, and the target, time and time
group columns are selected along with the forecasting period. The feature processing
shows the features (both the original ones and those created by Driverless AI) that
are important in driving the model. Using the important features, Driverless AI then
creates the final model, which forecasts the values and compares them with the
existing data.
Fig 3.9 shows how the DAI model was implemented; the model uses GBM (Gradient
Boosted Machines) for the feature evolution and the final pipeline. For time series
forecasting, DAI uses the XGBoost algorithm.
V. Evaluation
In this phase, each of the models is evaluated to check whether it is capable of
achieving the business objectives. Once the models' results are evaluated, the
models are deployed.
A method called back-testing is used to calculate the accuracy of the models: a
portion of the historical data is used to forecast values for the remaining historical
period, and the forecasted values and the actual values are then compared. This
method of model evaluation is chosen to compute the test accuracy, since the time
series models in R Studio return only the training errors.
For measuring the accuracy of each of the models, the existing data of five years
(2013 to 2017) is split into a train and a test set. The train set contains four years of
data (2013 to 2016) and the test set contains the data for the year 2017. The models
are trained on the four years of data to forecast the number of units sold for each
product for the year 2017. The actual values are compared against the forecasted
values and the percentage error is calculated.
MAPE (Mean Absolute Percentage Error) measures the size of the error in
percentage terms. MAPE is used to calculate the error for this study due to the ease
of interpretation. The formula applied is as follows:
$$\text{MAPE} = \frac{1}{n} \sum \frac{|\text{Actual} - \text{Forecast}|}{|\text{Actual}|} \times 100$$
The MAPE is calculated by the month for each product and then aggregated by the
year for the ease of interpretation.
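As a small illustration, the yearly MAPE of one product can be computed in R from
the monthly actual and forecasted values as follows (the numbers in the example are
hypothetical, used only to show the calculation):

# Sketch: MAPE for one product from its monthly actual and forecasted values.
mape <- function(actual, forecast) {
  mean(abs(actual - forecast) / abs(actual)) * 100
}

# hypothetical monthly values for a single product
mape(actual = c(5500, 5600, 7200), forecast = c(5400, 5800, 7100))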
VI. Deployment
Once the models are created, the knowledge gained is presented in a way that the
business can use; this includes using the models created to aid the decision-making
process of the organization. Based on the requirement, the deployment phase can be
a simple generated report or a repeatable process.
In this research, the deployment phase is the generation of the forecasted sales of
the products for the year 2018. The future deployment will be building a demand
forecasting tool for an organization that compares different traditional statistical and
machine learning algorithms to identify the best performing algorithm for every
product.
Chapter 4 – Data Analysis
To identify the most efficient forecasting model for each of the 50 products, the
findings of the research are presented following the road map below.
1. Histogram of the daily units sold
The histogram of the units sold daily reveals a right-skewed (positively skewed)
distribution, which means that the mean is greater than the median. It shows that the
most common daily demand for a product ranges from 20 to 30 units and that very
few products were demanded in large quantities.
2. Line chart of the daily units sold for all products.
A Line chart reveals the overall increasing trend and the seasonality of the units sold
daily for all the products over the five years (2013 - 2017). The blue dots highlight the
number of units sold daily and the black lines highlight the overall trend and
seasonality in the data.
The line chart of the number of units sold aggregated by year reveals a positive
linear trend for each of the 50 products over the years, suggesting that there was an
increase in the demand for the products over the years. Since there are products
with similar demand, there is some overlap between those products.
The Line chart shows the demand for each of the products over five years (2013 -
2017). For the ease of visualization, 10 products are visualized together in one grid.
The products are arranged from left to right with two line charts per row.
Fig 4.6 Line chart of the demand of products 21 -30
The line charts show that the demand for each of the 50 products gradually increased
over the five years. The plots also show that there is a seasonal pattern in the data.
4.2 Model Results
1. ARIMA
The ARIMA model is implemented for the 50 products using the training set, and the
accuracy (MAPE) is calculated by comparing the units sold per product in the testing
set with the forecasted values. The MAPE scores of the models, arranged from the
lowest to the highest, are as follows:
The ARIMA models for the 50 products have MAPE (Mean Absolute Percentage
Error) values ranging from 1.40% to 5.18%. Product 1 has the highest MAPE and
Product 42 has the lowest MAPE score, as shown in fig 4.9.
The entire data set is then used to build the ARIMA model to forecast the monthly
product-wise units sold for the year 2018. The table below shows the forecasted
demand for the products during the year 2018.
Table 4.2 ARIMA Monthly forecast
Product Jan Feb Mar April May June July Aug Sept Oct Nov Dec
1 5588 5783 7235 8311 8853 9465 10371 8789 8080 7844 7947 6291
2 14643 14730 18759 20847 23279 23487 25719 23021 21012 20323 20911 15974
3 9188 9188 11693 13307 14442 14961 16588 14339 13149 12738 13193 9937
4 5610 5871 7152 8170 8901 9247 10012 8872 8101 7692 8062 6183
5 4523 4737 6038 6872 7421 7758 8608 7402 6706 6481 6890 5049
6 14654 14892 18972 21233 23411 23900 26396 23484 21374 20648 21226 16182
7 14599 15092 18862 21782 23354 24328 26856 23309 21130 20629 21080 16103
8 19153 19397 24680 27738 30468 31150 34261 29967 27618 26772 27457 21020
9 12574 13036 16509 18424 20236 21145 23501 20406 18511 18043 18622 13878
10 18880 19018 23888 26912 29581 30131 33523 29194 27068 26102 26842 20303
11 17418 18078 22576 25721 28061 29015 32254 27950 25587 24725 25436 19282
12 17223 17736 22285 25567 27416 28551 31750 27376 25189 24393 24957 18886
13 21408 21658 27134 30732 33828 34742 38131 33456 30549 29594 30390 23028
14 14623 14973 18850 21322 23252 24252 26352 23129 21220 20452 20928 15970
15 22454 22485 28814 32562 35320 36261 40519 34848 32229 31132 32070 24186
16 6426 6578 8503 9472 10431 10699 11806 10139 9519 9225 9358 7191
17 8356 8474 10841 12225 13283 13599 15251 13403 12328 11822 12066 9234
18 21651 21858 27416 31126 33883 35034 38705 33671 30889 29913 30706 23421
19 10403 10605 13330 14957 16211 16923 18572 16237 15050 14535 14902 11482
20 12111 12359 15454 17278 19194 19610 21949 18952 17561 16747 17321 13317
21 9958 10352 13187 14954 16188 16746 18478 16184 14713 14032 14837 11100
22 20592 21119 26435 29909 32444 33493 36917 32471 29812 28657 29635 22766
23 7455 7562 9656 10646 11816 12098 13325 11762 10777 10379 10724 8094
24 16742 16758 21277 24413 26341 27190 30219 26194 23822 23104 23758 18074
25 20649 20890 26500 29794 32609 33446 37133 32093 29734 28392 29266 22608
26 11927 12148 15404 17373 18857 19536 21392 18897 17076 16587 17354 13172
27 5587 5704 7158 8223 9024 9140 10018 8843 8049 8036 8202 6196
28 22240 22566 28703 32387 35417 36311 40388 34566 32453 31104 32084 24433
29 17330 17754 22199 25415 27600 28761 31603 27342 25285 24444 24978 18764
30 10164 10288 13201 14777 16203 16550 18633 15916 14670 14278 14558 11236
31 14722 15306 19137 21903 23626 24315 26955 23178 21415 20768 21147 16452
32 10755 10937 13905 15741 17067 17785 19693 17022 15814 15128 15486 11590
33 17397 17647 22262 25223 27572 28274 31727 27474 24991 24295 25098 18845
34 6408 6531 8293 9334 10096 10324 11394 10194 9241 8936 9239 6962
35 16359 16769 21225 23854 26181 27078 30170 26260 23872 22994 23800 17792
36 19384 19588 24890 28153 30741 31286 35086 30386 28269 27176 27589 20987
37 7258 7284 9407 10426 11469 11902 13065 11568 10428 10031 10540 8000
38 20327 20328 25930 29264 32032 33259 36634 31983 29157 28533 28867 22012
39 11018 11091 14300 16044 17536 17944 19856 17605 15956 15423 16076 12260
40 7512 7467 9432 10634 11752 12034 13282 11704 10561 10314 10670 8106
41 5498 5557 7150 8030 8752 8833 9993 8600 7938 7642 7802 5910
42 8991 9167 11648 13070 14315 14735 16179 14350 13047 12517 13044 9871
43 13003 13516 16686 18967 20579 21107 23432 20126 18878 18033 18888 14246
44 7207 7323 9270 10374 11481 11772 12955 11424 10533 9980 10395 7829
45 20120 20650 25986 29471 31995 33352 36936 31985 29285 28188 29197 22182
46 14935 15030 19028 21129 23461 24249 26550 23399 21380 20575 21522 16332
47 5358 5447 6940 7739 8611 8786 9717 8510 7726 7556 7782 5906
48 12996 13458 16669 18957 20773 21337 23553 20666 19027 18282 18879 14524
49 7222 7466 9290 10536 11794 12057 13349 11748 10442 10140 10604 7915
50 16325 16636 20994 23714 26061 26635 29151 26095 23603 22923 23510 17938
The 10 best ARIMA models are identified based on the lowest overall MAPE scores
calculated using the back-testing technique. The following are the plots of the
demand forecasted by the best 10 ARIMA models:
2. Theta Model
The Theta model is implemented for the 50 products using the training set, and the
accuracy (MAPE) is calculated by comparing the testing set values with the
forecasted values. The MAPE scores of the models, arranged from the lowest to the
highest, are as follows:
The Theta models for the 50 products have MAPE (Mean Absolute Percentage Error)
values ranging from 1.42% to 4.24%. Product 27 has the highest MAPE and Product
49 has the lowest MAPE score, as shown in fig 4.11.
Fig 4.11 Theta MAPE
The entire data set is then used to build the Theta model to forecast the monthly
product-wise units sold for the year 2018. The table below shows the forecasted
demand for the products during the year 2018.
Table 4.4 Theta Monthly forecast
Product Jan Feb Mar April May June July Aug Sept Oct Nov Dec
1 5524 5577 7236 8301 9046 9515 10299 9035 8075 7750 8133 5913
2 14955 15108 19698 22142 24523 25064 27283 24337 21778 20873 21699 15991
3 9529 9472 12419 14006 15514 16002 17444 15426 13837 13241 13799 10119
4 5521 5555 7278 8344 9127 9442 10185 9135 8150 7766 8126 5918
5 4854 4817 6285 7206 7881 8086 8876 7817 6928 6629 7022 5056
6 15065 15007 19844 22428 24600 25221 27761 24587 21923 21058 21852 15877
7 14994 14986 19802 22400 24480 25260 27565 24438 21750 21055 21633 15863
8 19727 19837 26120 29463 32392 33340 36413 31921 28829 27679 28852 20967
9 13081 13203 17267 19326 21337 21999 24196 21243 19103 18403 19064 13891
10 19006 18896 24696 27995 30844 31284 34675 30613 27465 26406 27335 19867
11 18222 18358 24028 27153 29959 30754 33740 29549 26550 25460 26450 19342
12 17668 17838 23402 26573 29171 29960 32860 28808 25854 24858 25956 18855
13 21635 21633 28464 32087 35484 36255 39673 35123 31444 30427 31504 22936
14 15214 15420 19913 22609 24854 25500 27892 24652 21965 21230 21916 16149
15 22880 22662 30023 33755 37179 38056 41766 36912 33215 31702 32950 23967
16 6463 6507 8541 9547 10641 10811 11886 10416 9400 9053 9295 6871
17 8421 8415 11018 12452 13706 14003 15439 13826 12420 11771 12077 8894
18 21605 21800 28274 31880 35441 36309 39757 34998 31433 30191 31341 22863
19 10294 10317 13557 15354 16807 17348 19030 16742 14960 14304 15021 10941
20 12069 12241 15979 17939 19900 20287 22356 19766 17869 17013 17558 12993
21 10243 10349 13561 15271 16868 17287 19017 16833 15001 14309 15014 10941
22 20556 20712 27082 30489 33618 34416 37819 33553 30024 28710 29801 21894
23 7642 7617 10058 11112 12373 12691 13947 12308 11055 10558 11011 7934
24 17020 16980 22253 25180 27625 28479 31198 27571 24646 23585 24492 17976
25 20644 20663 27074 30761 33708 34549 37972 33372 30121 28759 30035 21922
26 12104 12226 16089 18082 19844 20463 22451 19803 17669 16880 17801 12887
27 5610 5559 7364 8291 9270 9471 10345 9201 8199 7911 8124 5957
28 22500 22308 29339 33081 36496 37360 41012 36310 32557 31325 32476 23758
29 17897 18013 23554 26745 29355 30203 32943 29154 26139 25026 26142 18900
30 10220 10285 13453 15159 16732 17022 18871 16571 14877 14322 14794 10850
31 14971 15082 19827 22343 24424 25131 27498 24363 21797 20874 21719 15927
32 11297 11310 14717 16752 18532 18873 20755 18240 16435 15944 16356 11933
33 17614 17802 23338 26157 28954 29563 32626 28740 25662 24542 25724 18785
34 6690 6631 8762 9912 10757 11052 12156 10731 9720 9244 9745 6996
35 17053 17121 22251 25055 27627 28476 31193 27705 24636 23524 24636 17877
36 19925 19979 26069 29439 32769 33353 36610 32428 29057 27974 28848 21044
37 7561 7539 9983 11083 12237 12729 13881 12303 11032 10452 10932 8040
38 20961 20836 27425 30900 34154 35059 38405 34083 30464 29178 30249 22040
39 11335 11296 14870 16733 18419 18831 20798 18387 16389 15665 16456 11877
40 7512 7539 9849 11125 12314 12584 13886 12201 10912 10479 11032 7955
41 5647 5586 7402 8370 9204 9341 10275 9077 8137 7844 8121 5939
42 9366 9430 12350 13939 15299 15825 17274 15251 13711 13089 13708 9947
43 13248 13271 17391 19547 21628 22034 24402 21367 19219 18320 19328 14027
44 7497 7516 9858 11092 12282 12623 13746 12152 10989 10385 10938 7934
45 20621 20639 27205 30540 33585 34692 38266 33471 30006 28787 29844 21842
46 15225 15166 19855 22357 24725 25661 27951 24637 22070 21135 22207 16049
47 5642 5701 7418 8345 9292 9482 10418 9198 8206 7941 8215 6028
48 13177 13298 17352 19603 21693 22133 24297 21469 19346 18461 19208 14052
49 7353 7390 9734 10923 12147 12471 13643 12167 10778 10395 10764 7817
50 16602 16687 21930 24776 27294 27981 30578 27203 24343 23329 24193 17720
The 10 best Theta models are identified based on the lowest overall MAPE scores
calculated using the back-testing technique. The following are the plots of the
demand forecasted by the best 10 Theta models:
3. Multilayer Perceptron (MLP)
The MLP is implemented for the 50 products using the training set, and the accuracy
(MAPE) is calculated by comparing the testing set values with the forecasted values.
The MAPE scores of the models, arranged from the lowest to the highest, are as
follows:
Table 4.5 MLP MAPE
Products Actual value (2017) Forecasted (2017) Error (MAPE) %
9 210697 212253 1.14
21 165680 165752 1.22
37 120406 121939 1.78
11 286882 284980 1.87
12 285834 290091 1.91
23 120694 120884 1.92
34 105790 105140 1.97
33 286000 287020 1.98
29 286468 283901 2.02
36 316195 319727 2.10
6 239989 241815 2.14
48 211365 211799 2.18
35 270604 276528 2.31
24 270935 274724 2.38
26 195809 200137 2.54
41 90533 91656 2.65
47 90680 91085 2.66
1 90153 89778 2.73
20 195063 195271 2.78
28 360768 365479 2.81
2 240421 242042 3.02
8 316911 318399 3.04
44 120516 124126 3.08
15 361586 371091 3.16
18 346448 357268 3.25
30 165616 167135 3.30
39 180487 186314 3.32
50 269939 277491 3.32
7 240039 245698 3.67
46 240533 248582 3.69
38 331005 343455 3.98
22 329896 342728 4.05
4 89783 88883 4.33
42 150110 156614 4.57
19 166193 172531 4.64
16 105467 109842 4.67
10 301861 315424 4.89
27 90362 94324 4.96
13 346565 362563 4.98
5 75807 79578 5.00
17 134905 141651 5.12
25 330786 346671 5.43
14 241122 253889 5.57
32 181231 170434 5.69
49 120644 126255 5.73
43 211261 223264 6.09
45 331783 352323 6.28
31 240618 255898 6.62
40 120498 128029 6.83
3 150802 137748 9.65
The MLP models for the 50 products have MAPE (Mean Absolute Percentage Error)
values ranging from 1.14% to 9.65%. Product 3 has the highest MAPE and Product 9
has the lowest MAPE score, as shown in fig 4.13.
The entire data set is then used to build the MLP model to forecast the monthly
product-wise units sold for the year 2018. The table below shows the forecasted
demand for the products during the year 2018.
Table 4.6 MLP Monthly forecast
Product Jan Feb Mar April May June July Aug Sept Oct Nov Dec
1 5627 5663 7664 8536 9102 9477 10229 9048 8401 7901 8559 6575
2 13725 14937 18961 21546 23516 24626 26640 23746 21870 20189 21396 16518
3 9279 9185 12084 13611 14965 15107 16974 14902 13410 12747 13594 10097
4 5636 5616 7112 8375 9055 9271 10252 8991 8126 8007 8299 6228
5 4740 4895 6250 7181 7737 8123 8632 7948 7043 6912 7249 5500
6 14832 14829 19009 21573 23453 24272 26590 23897 21731 20902 21584 16561
7 14938 14957 19577 21739 23930 24583 27257 23305 21635 20411 21405 15719
8 18830 18916 24902 28115 30630 31960 35074 31557 29094 27864 29004 22204
9 12823 13029 16871 19027 20729 20999 23275 20654 18644 17563 18745 13817
10 19077 19257 25101 28358 30714 31481 34452 30830 28228 27035 27826 20877
11 17742 17836 23055 25917 28375 29362 32317 28464 25734 25042 26057 19742
12 17408 17721 22655 26262 28260 29270 32519 28059 25935 24786 25518 19550
13 20756 20880 27480 31132 34159 35052 38591 34731 31522 30342 31333 23878
14 14311 14755 18626 21341 23139 24282 26559 22939 21364 20003 21296 16069
15 20970 21073 27784 31663 33903 35823 38803 34811 31467 30277 31465 23299
16 6322 6377 8150 9196 10172 10479 11787 10128 9467 8906 9375 6970
17 8100 8346 10933 12175 13208 13718 15105 13593 12230 11780 12121 9228
18 20165 21253 27738 30813 34403 36275 39655 35165 31612 30565 31826 23516
19 10417 10390 13550 15262 16590 17320 18946 16520 15347 14487 15124 11463
20 11933 12205 15884 17552 19242 20145 21687 19451 17810 16931 17937 13543
21 9963 10167 13248 14882 16415 16977 18760 16430 14956 14274 14911 11106
22 19620 19875 26180 29574 32064 33532 37051 33133 30376 28851 29760 22935
23 7622 7389 9541 10713 11685 12117 13489 11572 10703 10451 10653 8032
24 16397 16804 21804 25076 26773 27963 30320 26829 24599 23291 24238 18237
25 19067 19376 24931 28768 31206 32201 35985 32034 28865 27520 28851 21830
26 11564 12006 15669 17584 19254 20053 21747 19458 17526 16781 17410 12680
27 5344 5347 6996 7921 8992 9020 10128 8924 8094 7902 8196 6096
28 21113 21498 27894 31920 34485 35785 40067 35037 32043 30859 31486 24081
29 17510 17679 23257 26408 28176 29530 32280 28747 26280 25031 25985 20231
30 9735 10147 13222 15042 16473 17056 18571 16608 15077 14337 14641 11026
31 14024 14641 18857 21256 23260 24048 26636 23340 21378 20438 21013 16054
32 11301 11426 14746 16728 18278 18659 20695 17985 16762 15825 16475 12318
33 16661 17242 22067 25103 27772 29258 32535 29372 26007 25275 25504 19389
34 6590 6523 8590 9769 10586 11140 12193 10817 9966 9528 9599 7391
35 17911 17910 22380 24691 25631 27729 29371 24826 22394 22518 22435 16773
36 19670 19734 25623 28752 31236 32221 35326 30876 28669 27115 28888 21459
37 7346 7304 9476 10815 11703 12319 13278 11879 10859 10327 10768 7881
38 20420 20642 26634 29574 32699 33468 36690 32260 30009 28650 29747 22773
39 10927 11064 14202 16035 17438 18150 20069 17754 16185 15592 16154 12112
40 7386 7563 9656 10894 12039 12202 13391 11972 10928 10647 10931 8401
41 5607 5534 7378 8167 8956 9197 10175 8948 8192 7912 8199 6207
42 9139 9300 12057 13580 14727 15605 16687 15031 13783 13038 13848 10247
43 12891 13289 17621 19463 21435 22050 23989 21267 19337 18569 19525 14515
44 7401 7383 9557 10859 11766 12457 13494 11948 11115 10624 10796 8378
45 19396 19401 25750 29152 31735 33623 36417 32473 30028 27736 29094 21898
46 14254 14441 18532 20973 22999 24175 26484 23313 21313 20452 21285 16182
47 5490 5627 7218 8406 9060 9449 10525 9177 8102 7906 8115 6182
48 13204 13269 17575 19768 21465 21953 24610 21220 19830 18939 19719 15131
49 7242 7246 9561 10589 11749 12150 13422 11763 10701 10107 10575 7785
50 16576 17379 22071 24566 27074 28168 30326 26930 24287 23542 24147 18361
The 10 best MLP models are identified based on the lowest overall MAPE scores
calculated using the back-testing technique. The following are the plots of the
demand forecasted by the best 10 MLP models:
Fig 4.14 MLP forecast
4. H2O Driverless AI
Fig 4.15 shows the parameters used by the Driverless AI model when creating the
model for demand forecasting.
Fig 4.15 DAI Interface
Driverless AI is implemented for the 50 products using the training set, and the
accuracy (MAPE) is calculated by comparing the testing set values with the
forecasted values. The MAPE scores of the models, arranged from the lowest to the
highest, are as follows:
Table 4.7 DAI MAPE
Products Actual value (2017) Forecasted (2017) Error (MAPE) %
18 346448 353044 7.84
15 361586 365300 7.87
28 360768 364454 7.88
16 105467 106391 7.90
33 286000 297077 7.91
4 89783 92458 7.95
10 301861 313060 8.09
22 329896 341219 8.27
38 331005 339815 8.28
45 331783 342096 8.28
47 90680 92352 8.29
29 286468 298810 8.32
12 285834 298700 8.41
8 316911 326979 8.44
42 150110 158725 8.51
1 90153 92453 8.71
36 316195 327724 8.71
11 286882 299292 8.74
34 105790 104045 8.83
3 150802 156581 8.93
41 90533 92215 9.04
25 330786 343776 9.05
5 75807 77032 9.13
27 90362 89526 10.87
The Driverless AI models for the 50 products have MAPE (Mean Absolute
Percentage Error) values ranging from 5.49% to 10.87%. Product 27 has the highest
MAPE and Product 44 has the lowest MAPE score, as shown in fig 4.16.
Fig 4.16 DAI MAPE
The DAI computes numerous factors that aid in building the model, such as:
• Feature importance
• Shapley features
• K-LIME
• Decision tree
• PDP and ICE
Feature Importance
The DAI computes the contribution of each input variable to the overall predictions of
the model created by Driverless AI and represents it in a horizontal bar chart. The
global feature importance is calculated by aggregating the improvement in the
splitting criterion caused by a single variable across all the decision trees in the DAI
model.
Fig 4.17 Feature Importance
The Shapley feature importance plot shows all the global and local variable
contributions that are credited to the DAI model. The local Shapley values are
calculated by passing each row of the data through a trained tree ensemble and
aggregating the contribution of each input variable for that row. The global Shapley
values are calculated by taking the average of the local Shapley values over each
row of the data set. For a regression task, the Shapley values sum to the prediction of
the DAI model.
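In other words (a standard property of Shapley-based explanations, stated here for
clarity rather than taken from the original figures), for a single row x the model
prediction can be recovered as

$$\hat{y}(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)$$

where φ_0 is the base (bias) value, M is the number of features and φ_i(x) is the local
Shapley contribution of feature i for that row.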
Fig 4.18 Shapley feature plot
K-LIME
K-LIME explains local regions of the DAI model using the original variables in the
dataset, where the local regions are defined by K clusters. In DAI, the input training
data is segmented into K disjoint local regions using K-means clustering. For each
cluster, a local GLM is trained on the original inputs and the predictions made by the
DAI model, and the parameters of this local GLM are used to generate local
explanations of the DAI model. K is selected to maximise the R² between the
predictions of the DAI model and those of the local GLMs. The average trends in the
DAI predictions are captured by a global GLM trained on all the data.
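A self-contained sketch of this idea in R is shown below; it uses simulated stand-in
data and ordinary kmeans() plus lm() in place of H2O's internal implementation:

# Illustrative sketch of the K-LIME idea (not H2O's implementation): cluster the
# rows with k-means, then fit a linear surrogate per cluster that predicts the
# complex model's predictions from the original inputs.
set.seed(42)
X        <- data.frame(x1 = rnorm(600), x2 = rnorm(600))   # stand-in original inputs
dai_pred <- 3 * X$x1 - 2 * X$x2^2 + rnorm(600, sd = 0.1)   # stand-in model predictions
d        <- cbind(X, y = dai_pred)

K        <- 4
clusters <- kmeans(scale(X), centers = K)$cluster           # K disjoint local regions

local_glms <- lapply(seq_len(K), function(k) {
  lm(y ~ x1 + x2, data = d[clusters == k, ])                # local surrogate for cluster k
})
lapply(local_glms, coef)                                     # coefficients act as local explanations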
Fig 4.19 K-LIME
Decision Tree
The surrogate decision tree is created by training a decision tree on the original
inputs and the predictions of the DAI model; it represents the DAI model in the form
of a flow chart. Selecting a node highlights the decision path leading to that prediction.
Fig 4.20 Decision Tree
PDP and ICE
Chapter 5 - Discussion
5.1 Data Overview
The exploratory data analysis (EDA) conducted on the data set (Fig 4.2) reveals a
pattern in the overall demand for the products: the demand has a linear trend and is
seasonal. To support this statement, the overall yearly demand (Fig 4.3) and the
monthly demand (Fig 4.4, 4.5, 4.6, 4.7, 4.8) of each product are visualized using line
charts. The yearly demand for each product confirms the linear trend in demand over
the years, and the monthly demand confirms the seasonal pattern in the demand for
each of the products over the years.
To compare how each of the four models (ARIMA, Theta, MLP, DAI) performed on
this data set, the 10 models of each algorithm with the lowest overall MAPE are
selected.
1. ARIMA
The following table shows the 10 products with the least MAPE scores along with the
order of the ARIMA model that was used to achieve the forecasts.
The lowest MAPEs range from 1.40% to 2.09%. The demand for products
(42, 37, 47, 34, 8) was forecast using an ARIMA (0,1,0)(1,1,0) model, i.e. one order of
non-seasonal differencing and one order of seasonal differencing, meaning the time
series is treated as stationary after the first differences are taken. Since there is a
strong pattern, the model predicts the non-seasonal difference to be constant. The
seasonal (1,1,0) part corresponds to a seasonal AR(1) coefficient, which is the
estimate of the coefficient on the lagged first difference.
The demand for products (32, 44, 3, 29, 9) has ARIMA (0,1,0)(0,1,0), which is a
special case of an ARIMA model with one order of non-seasonal differencing and one
order of seasonal differencing and no constants or other parameters. It means that
the time series has a strong seasonal pattern which cannot be made stationary by
seasonal differencing. Due to this strong seasonal pattern, ARIMA selects the order
(0,1,0)(0,1,0), i.e. a seasonal random walk model, which will not give a good fit
because it predicts the seasonal difference to be constant. The model assumes that
the expected values of the future seasonal differences are equal to the average
seasonal difference calculated over the entire historical time series. The seasonal
random walk model predicts that the future forecasted values will have the same
relative month-to-month changes as the historical values.
From the results, some of the products have a generalized forecasted value because
of these random walk models, which are caused by the multiplicative pattern in the
demand for those products. These ARIMA models still have a low MAPE because the
means of the forecasted values are considered when computing the MAPE, while the
high and low values of the forecast vary by a huge margin, which is visualized in the
forecast plot of the 10 products (fig). This means that when a strong pattern exists the
ARIMA model generates a random walk model, which assumes that the forecasts for
the future year are the same as those of the current year.
Based on the results shown in Table 5.1, the products (42, 37, 47, 34, 8) have the
lowest MAPE scores (excluding the products with random walk models) computed
using the back-testing technique. Therefore, the forecasts of the ARIMA model for the
year 2018 shown in Table 4.2 will be accurate for these products with the lowest
MAPE.
2. Theta Model
The strong pattern in the demand is captured well by the Theta model because of the
decomposition type used; the Theta model identifies the type of seasonality in the
data as multiplicative. In multiplicative seasonality the amplitude of the seasonal
variation is proportional to the level of the series, i.e. the seasonal swings become
wider over the years. The Theta model uses linear regression to capture the trend and
simple exponential smoothing to capture the short-term behaviour, therefore the
Theta model can provide a good forecast for the products even with a strong pattern.
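A minimal sketch with thetaf from the forecast package on the same hypothetical series used in the ARIMA sketch above (when a seasonal pattern is detected, thetaf seasonally adjusts the series with a classical multiplicative decomposition before applying the theta method and reseasonalises the forecasts):

# Theta method back-test on the same hypothetical 2013-2017 series.
theta_fc <- thetaf(train, h = 12)
accuracy(theta_fc, test)["Test set", "MAPE"]     # MAPE on the 2017 hold-out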
Based on the results shown in Table 5.2, the products (49, 33, 21, 9, 41, 32, 29, 35,
5, 47) have the lowest MAPE scores computed using the back-testing technique.
Therefore, the forecasts of the Theta model for the year 2018 shown in Table 4.4 will
be accurate for these products with the lowest MAPE.
3. Multilayer Perceptron (MLP)
From the table, it is evident that the number of lags (inputs) and the number of hidden
layers vary for different products. The MLP models take the strong pattern in the data
into account when forecasting the future year. The forecast plot (fig) for the 10
products shows how the model captures the demand.
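A minimal sketch with the nnfor package used in this research, again on the hypothetical series from the ARIMA sketch. The lag set and the number of hidden nodes below are illustrative assumptions; the fitted object reports the inputs and hidden layers it actually uses:

library(nnfor)

# MLP trained on the hypothetical series, with the 12 most recent monthly lags
# offered as candidate inputs and five hidden nodes.
mlp_fit <- mlp(train, lags = 1:12, hd = 5, reps = 10)
mlp_fc  <- forecast(mlp_fit, h = 12)

# MAPE on the 2017 hold-out, computed month by month and averaged.
mean(100 * abs((test - mlp_fc$mean) / test))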
Based on the results shown in Table 5.3, the products (9, 21, 37, 11, 12, 23, 34, 33,
29, 36) have the lowest MAPE scores computed using the back-testing technique.
Therefore, the forecasts of the MLP model for the year 2018 shown in Table 4.6 will
be accurate for these products with the lowest MAPE.
4. Driverless AI (DAI)
The DAI model automates the entire process once the target column, the time
column, the time group columns and the forecast periods are selected. The DAI
model forecasts the demand using Gradient Boosting Machines (GBM) built with the
XGBoost algorithm. The DAI model returns an overall test score (MAPE) of
7.70 % +/- 0.21. The following table shows the 10 products with the lowest MAPE
scores.
Table 5.4 Best 10 DAI models
Products Error (MAPE) %
44 5.49
49 5.68
37 6.11
17 6.35
9 6.39
40 6.58
20 6.62
23 6.67
48 6.67
35 6.72
The lowest MAPEs range from 5.49 % to 6.72 %. The DAI computes numerous
measures that aid in interpreting the model, such as:
• Feature importance
The most important variable (Fig 4.17) contributing to the overall predictions of the
model is the target lag of product 12, followed by the combined target lags of
products (10, 11, 12, 17, 18, 23).
• Shapley features
The global and local variables (Fig 4.18) contributing to the DAI model were the
target lags of products (10, 11, 12, 17, 18, 23).
• K-LIME
The K-LIME chart (Fig 4.19) shows that the DAI model and the trained global GLM
have captured the actual data well, with an R-squared value of 94.39 %.
Based on the results shown in Table 5.4, the products (44, 49, 37, 17, 9, 40, 20, 23,
48, 35) have the lowest MAPE scores computed using the back-testing technique.
The DAI model requires both a training set and a test set, hence the model was only
used for implementing the back-testing technique. The average errors of the DAI
model are higher than those of the ARIMA, Theta and MLP models. Driverless AI for
time series is in alpha testing; due to this, the feature importance and Shapley plots
show only the 15 most important features (Fig 4.17 and Fig 4.18).
Table 5.5 Product-wise Error comparison of models (continued)
Product ARIMA (%) Theta (%) MLP (%) DAI (%) Best model
19 4.67 3.32 3.22 6.85 MLP
20 3.77 2.71 2.37 6.62 MLP
21 3.29 1.62 1.48 7.26 MLP
22 4.75 3.90 3.09 8.27 MLP
23 3.32 2.95 2.35 6.67 MLP
24 2.87 2.61 3.45 7.24 Theta
25 4.20 3.71 3.34 9.05 MLP
26 2.38 3.70 2.71 7.18 ARIMA
27 4.73 4.24 6.02 10.87 Theta
28 3.66 2.45 2.60 7.88 Theta
29 2.05 1.90 2.12 8.32 Theta
30 3.31 3.34 8.73 7.20 Theta*
31 3.79 3.12 8.56 7.30 Theta
32 1.64 1.85 3.45 7.07 Theta*
33 2.24 1.58 2.65 7.91 Theta
34 1.89 3.08 2.87 8.83 ARIMA
35 2.28 1.91 1.47 6.72 MLP
36 2.64 2.61 2.75 8.71 Theta
37 1.57 2.68 2.90 6.11 ARIMA
38 2.49 2.88 3.28 8.28 Theta*
39 2.85 2.67 1.66 7.64 MLP
40 2.98 3.28 7.29 6.58 ARIMA
41 2.37 1.83 2.15 9.04 Theta
42 1.40 2.51 5.87 8.51 ARIMA
43 3.38 2.45 2.82 7.08 Theta
44 1.65 2.31 2.11 5.49 ARIMA
45 2.58 2.73 5.86 8.28 Theta*
46 3.34 4.02 2.44 7.01 MLP
47 1.86 2.08 3.91 8.29 ARIMA
48 4.01 3.12 5.98 6.67 Theta
49 2.16 1.42 9.84 5.68 Theta
50 2.25 3.10 3.45 7.66 ARIMA
The “*” marks the cases where ARIMA generated a random walk model, so the
model with the next-lowest MAPE is selected instead for that product.
Using the results from Table 5.5, the algorithm that performed best for each of the
products is identified. The forecasts for the year 2018 for each product are selected
from the forecasts of the best-performing algorithm.
The line chart below shows the MAPE of each of the four algorithms for the 50
individual products.
Fig 5.1 Product-wise Error comparison of models
As shown in the legend, the red line shows the MAPE scores of the ARIMA models,
the green line shows the MAPE scores of the Theta models, the blue line shows the
MAPE scores of the MLP models and the black line shows the MAPE scores of the
DAI model. The best algorithm for each of the 50 products varies between the
ARIMA, Theta and MLP models. The ARIMA and Theta models outperform the DAI
model for all the products, while the DAI model performed better than the MLP model
for products 30, 31, 40 and 49.
From the plot (Fig 5.1), it is evident that the performance of the algorithms varies for
different products and that it is necessary to implement different statistical and
machine learning algorithms to better forecast the demand for different products.
Previous research on demand forecasting focuses on finding one overall model that
best fits the data, which may only be the case for a single product. The contributions
of this research are as follows:
• When there is a need to forecast the demand for n products, a set of statistical
and machine learning algorithms needs to be implemented to accurately
forecast the demand for each of the products.
• The performance of the statistical and machine learning algorithms varies for
different products.
• Back-testing proved to be an efficient technique to test the accuracy of time
series forecasting models for each of the products.
• The monthly error of the models needs to be computed for each product to
evaluate the monthly performance of the algorithms.
5.4 Limitations
The limitation of this research is that the average MAPE is used for ease of
comparison of the results across the 50 products. The month-wise error (MAPE) is
computed for each product, i.e. each algorithm has 12 errors per product. The
average of these monthly errors is compared to conclude the results, but looking into
the monthly errors it can be seen that different algorithms perform differently for each
month of a product. The monthly comparison of the errors was difficult due to the
number of products involved, therefore the average error was considered. (The
monthly errors are saved as csv files under the names “Arima Monthly Error, Theta
Monthly Error, MLP Monthly Error and DAI Monthly Error” and are attached along with
the artifacts.)
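As an illustration of the month-wise errors described here, the back-test can be broken down per month before averaging. The sketch below continues the hypothetical ARIMA forecast from Chapter 5 and is not the actual error computation used for the artifacts:

# One MAPE value per month of the 2017 hold-out year.
monthly_mape <- 100 * abs((test - fcast$mean) / test)
data.frame(month = month.abb, mape = as.numeric(monthly_mape))

# The figure reported in the comparison tables is the average of these 12 values.
mean(monthly_mape)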
H2O Driverless AI is a new technology for time series forecasting and is still in its
alpha testing phase. Due to the requirement of both a training set and a test set, only
back-testing was implemented to compare the model performance across the
different products. The DAI model was not used to forecast the demand of the
products for the year 2018.
Chapter 6 – Conclusion
Comparing the results of the statistical and machine learning algorithms, ARIMA
accurately forecasts the demand for 10 of the 50 products, Theta accurately
forecasts the demand for 25 of the 50 products and MLP accurately forecasts the
demand for the remaining 15 products. The comparison of the best 10 models of
each algorithm reveals that a different set of 10 products is identified by each
algorithm (statistical and machine learning). ARIMA generated a random walk model
when there was a strong seasonal pattern in the data, whereas the Theta model was
able to identify the type of seasonality in the data (multiplicative), decompose the
time series accordingly and accurately forecast for those products. Comparing the
statistical models, seasonal data is handled better by the Theta method than by
ARIMA. MLP also performed better for the products where ARIMA generated random
walk models, which suggests that MLP was able to forecast data with a strong
pattern. H2O Driverless AI has a simple GUI that helps business users implement
forecasting models with ease. DAI automates the entire process from preprocessing,
modelling, evaluating and forecasting, and generates an autoDL recipe, but the
product-wise evaluation shows that the ARIMA and Theta methods outperform DAI
for all products. Comparing the overall MAPE of the algorithms for each product, it is
evident that the performance of the statistical and machine learning algorithms varies
for different products, i.e. different types of product demand are captured by different
models. Therefore, when forecasting the demand of different products for an
organization, a set of statistical and machine learning algorithms needs to be
implemented to identify which algorithm accurately captures the demand of each
product, and the best algorithm for that product must then be used to forecast the
future demand.
Future Work
The future work of this research will be to extend the number of algorithms used and
to create a demand forecasting tool for an organization. The findings from this
research can be used as a foundation for building such a tool. The following steps
need to be implemented to build a good demand forecasting tool:
• The demand of each product must be analyzed to identify whether there is any
trend or seasonality in the data.
• The computed error terms can be expressed in MAPE for ease of
interpretation by a business user.
• The best algorithm for each product can be identified based on the lowest
MAPE score; that algorithm must be used to forecast the future demand of that
product.
• The GUI (Graphical User Interface) must be created with the business user in
mind (easy to interpret and visually appealing).
Chapter 7 – Plagiarism and Referencing
Ahmed, N. et al. (2010) ‘An Empirical Comparison of Machine Learning Models for
Time Series Forecasting’, Econometric Reviews, 29(5–6), pp. 594–621.
Dietrich, D. et al. (eds) (2015) Data science & big data analytics: discovering,
analyzing, visualizing and presenting data. Indianapolis, IN: Wiley.
Kourentzes, N. (no date a) ‘Can neural networks predict trended time series?’
Available at: http://kourentzes.com/forecasting/2016/12/28/can-neural-networks-
predict-trended-time-series/ (Accessed: 24 November 2018).
Kourentzes, N. (no date b) ‘New R package nnfor: time series forecasting with neural
networks’. Available at: http://kourentzes.com/forecasting/2017/10/25/new-r-
package-nnfor-time-series-forecasting-with-neural-networks/ (Accessed: 28
November 2018).
Laptev, N. et al. (no date) ‘Time-series Extreme Event Forecasting with Neural
Networks at Uber’, p. 5.
Massaro, A., Maritati, V. and Galiano, A. (2018) ‘Data Mining Model Performance of
Sales Predictive Algorithms Based on Rapidminer Workflows’, International Journal
of Computer Science and Information Technology, 10(3), pp. 39–56. doi:
10.5121/ijcsit.2018.10303.
Nikolopoulos, K. et al. (2011) ‘The Theta Model: An Essential Forecasting Tool for
Supply Chain Planning’, Lecture Notes in Electrical Engineering. doi: 10.1007/978-3-
642-25646-2_56.
Paoli, C. et al. (2010) ‘Forecasting of preprocessed daily solar radiation time series
using neural networks’, Solar Energy, 84(12), pp. 2146–2160. doi:
10.1016/j.solener.2010.08.011.
Install on Azure — Using Driverless AI 1.4.2 documentation (no date). Available at:
http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/install/azure.html
(Accessed: 15 December 2018).
Appendices
This document guides you through the contents of the Artifacts and the necessary
steps to implement the R code for the dissertation project titled “Demand forecasting
using statistical and machine learning algorithms”.
2. Model Results
• ARIMA Results
o ARIMA comparison.csv: This file compares the yearly (2017) units sold
for each product with the forecasted yearly (2017) values to calculate the error.
o ARIMA forecast.csv: This file contains the monthly forecasts of each
product for the year 2018 made by ARIMA.
o ARIMA Monthly Error.csv: This file contains the monthly error (MAPE) of
the forecasts for the year 2017.
• MLP Results
o MLP comparison.csv: This file compares the yearly (2017) units sold for
each product with the forecasted yearly (2017) values to calculate the error.
o MLP forecast.csv: This file contains the monthly forecasts of each
product for the year 2018 made by MLP.
o MLP Monthly Error.csv: This file contains the monthly error (MAPE) of
the forecasts for the year 2017.
• Model.Errors.csv: This file compares the average yearly error terms calculated
for the 4 algorithms.
• Theta Results
o Theta comparison.csv: This file compares the yearly (2017) units sold
for each product with the forecasted yearly (2017) values to calculate the error.
o Theta forecast.csv: This file contains the monthly forecasts of each
product for the year 2018 generated by the Theta method.
o Theta Monthly Error.csv: This file contains the monthly error (MAPE) of
the forecasts for the year 2017.
3. R code
• Demand forecasting.R: Primary code required to execute demand forecasting
using ARIMA, MLP and Theta method.
• H2O DAI preprocessing.R: This R code pre-processes the data for the
requirements of the DAI model and splits it into a train and test set.
4. Readme: Explains the contents of the Artifacts, how to implement the code in R
Studio and the installation process of H2O Driverless AI on the Azure cloud.
Please refer to the readme file attached along with the Artifacts to understand
how the code is implemented in R Studio and how H2O Driverless AI is installed
on the Azure cloud.
List of Abbreviations
• AI – Artificial Intelligence
• AR - Autoregressive
• CORR – Correlation
• COV - Covariance
• DAI – Driverless AI
• MA – Moving Average
• VAR – Variance
• VM – Virtual Machine