Conference Paper Corrections
Demand Forecasting⋆
1 Introduction
Supply Chain Management (SCM) is the backbone of the smooth flow of products,
services, and information. However, it faces uncertainties in capacity, supply,
and demand, which add to production, storage, and delivery costs. The
significance of dealing with these uncertainties cannot be overstated. This
emphasises the crucial impact of demand forecasting on increasing supply chain
performance [1], [?], [3], [4]. Traditionally, two approaches have been
employed: a forward-looking approach, anticipating potential demand over the
next several years, and a backward-looking approach, relying on past or ongoing
capabilities to respond to demand. However, traditional solutions, often relying
on spreadsheet models and statistical methods like moving averages, face
limitations in scalability for large-scale data and struggle to address the complexities
⋆ The research work is supported by the IoT Cloud Research Group, Indian
Institute of Information Technology, Kottayam.
2 Navadersh S et al.
2 Literature Review
A review of literature from 2005 to 2023 reveals a growing trend in
publications related to supply chain demand forecasting, with a focus on big
data analytics (BDA) applications. Notable techniques identified in this review
include neural networks, regression, time-series forecasting (ARIMA), support
vector machines, and decision trees. These techniques showcase the increasing
utilization of BDA in SCM demand forecasting, reflecting a departure from
conventional statistical forecasting approaches. The methodology, merits, and
demerits of the reviewed literature are discussed in Table 1.
| S.No | Title | Proposed Methodology | Merits | Demerits |
|---|---|---|---|---|
| 1 | Improved supply chain management based on hybrid demand forecasts [10] | Neural Networks, Random Forests, Time-Series Forecasting (ARIMA) | Improved accuracy in demand forecasting. | Limited scalability for large-scale data. |
| 2 | Comparative Analysis of Machine Learning Algorithms for Predictive Modeling in Various Domains [11] | Relevance vector machine (KNN), Support Vector Machine, Decision Tree, Genetic Algorithms (GA), LSTM | Contributes to advancing the understanding of machine learning algorithm performance across diverse application domains, facilitating more accurate and robust predictive models | Time-Series Forecasting (ARIMA, SARIMA) are not considered for seasonal trend data |
| 3 | Big data analytics in supply chain management: A state-of-the-art literature review [12] | Data-Mining Algorithms, machine learning, predictive modelling | Understanding of the potential benefits and challenges associated with implementing big data analytics solutions in SCM contexts. | Scope of Ensemble Modelling for enhancing the forecasting accuracy |
| 4 | Demand Forecasting for Textile Products Using Statistical Analysis and Machine Learning Algorithms [13] | Regression models, Decision Trees, and Neural Networks | Improved prediction accuracy | Widely used but not scalable for large-scale data, and limited capabilities in handling uncertainties. |
| 5 | Daily retail demand forecasting using machine learning with emphasis on calendric special days [14] | Machine Learning Techniques, Optimization | Improved forecast accuracy and resource allocation by capturing the nuanced effects of calendric special days. | Computational intensity due to the use of complex algorithms. |
| 6 | A Comparative Study of Demand Forecasting Models for a Multi-Channel Retail Company: A Novel Hybrid Machine | A hybrid framework (ARIMA and LSTM ML Models) for demand forecasting in multi-channel retail | Improved accuracy through model combination, adaptable to diverse retail scenarios. | Requires hyperparameter tuning, potential complexity in model integration. |
Predictive Big Data Analytics For Supply Chain Demand Forecasting
3 Research Objective
An ensemble-based demand forecasting model is proposed. The objectives are:
– To study the adaptability of big data demand forecasting models (SARIMA,
Prophet) to nonlinear demand variations in the input dataset.
– To extract a geospatial feature using K-means clustering.
– To identify the optimal parameters for training the models using grid search.
– To design ensemble architectures that integrate heterogeneous models (SARIMA,
Prophet) to achieve superior predictive accuracy and generalization
capabilities across the diverse features considered.
– To harness Big Data processing platforms (Spark) and the cloud to experiment
with big data demand forecasting.
4 Methodology
This work proposes a time series predictive big data analytics approach using
time series models such as SARIMA and Prophet, together with an ensemble model.
To further improve prediction accuracy, optimal parameters were identified
using manual and grid search methods. Fig. 1 depicts a detailed workflow
diagram of the proposed work.
Extracting Date Feature: The code extracts various features from the DATE
column, such as year, month, day, and weekday. These features capture different
aspects of time, like seasonal variations and weekly trends. Incorporating date
features enriches the dataset and aids in time-based analysis.
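The date-feature step can be sketched as follows. This is a minimal illustration in pandas rather than the authors' code: the DATE column name comes from the text, while the `quantity` column, the helper name `add_date_features`, and the sample rows are hypothetical.

```python
import pandas as pd

def add_date_features(df, date_col="DATE"):
    """Return a copy of df with year, month, day and weekday columns added."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day"] = dates.dt.day
    out["weekday"] = dates.dt.weekday  # Monday = 0, Sunday = 6
    return out

# Hypothetical sample rows, for illustration only.
orders = pd.DataFrame({"DATE": ["2023-01-02", "2023-07-15"], "quantity": [12, 30]})
features = add_date_features(orders)
```

The month and weekday columns are the ones most likely to carry the seasonal and weekly effects mentioned above.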
Outlier Removal: Potential outliers are identified using the Z-score, where X
is an observation, µ is the mean, and σ is the standard deviation of the
feature:
$$ Z = \frac{X - \mu}{\sigma} \tag{1} $$
To preserve the shape of the distribution, reduce the impact of any outliers,
and simplify interpretation, each data point is scaled linearly to fit within
a specific range using Min-Max scaling.
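A minimal sketch of the two preprocessing steps follows. The Z-score cut-off of 3 and the target range [0, 1] are assumptions, since the text states neither, and the function names and sample values are hypothetical.

```python
import numpy as np

def remove_outliers_zscore(values, threshold=3.0):
    """Keep only points whose |Z| (Equation 1) is within the threshold (assumed: 3)."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:  # all values identical: nothing can be an outlier
        return values
    z = (values - values.mean()) / std
    return values[np.abs(z) <= threshold]

def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale values into [lo, hi] (assumed range: [0, 1])."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:  # constant input: map everything to the lower bound
        return np.full_like(values, lo)
    return lo + (values - values.min()) * (hi - lo) / span

# Hypothetical order quantities with one extreme value.
quantities = [10, 12, 11, 13, 9] * 4 + [500]
cleaned = remove_outliers_zscore(quantities)
scaled = min_max_scale(cleaned)
```

Removing outliers before scaling matters here: a single extreme value would otherwise compress all remaining points into a narrow part of the target range.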
The geographical coordinates (latitude and longitude) of orders are clustered
using the K-means algorithm. The Euclidean distance is used to measure the
distance between each pair of data blocks $(d_i, d_j)$ from the dataset $D$
($\{d_i, d_j\} \subseteq D$):
$$ Dist_{eq}(d_i, d_j) = \sqrt{(x_{d_i} - x_{d_j})^2 + (y_{d_i} - y_{d_j})^2} \tag{3} $$
The optimal number of clusters is identified using the Silhouette score [18]. It is
calculated using the mean intra-cluster distance (a) and the mean nearest-cluster
distance (b) for each sample.
The K-means algorithm [19] segregates the geographical coordinates into
clusters, and the resulting cluster label is included as a new feature for
training the model.
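The clustering step can be illustrated with scikit-learn. This is a sketch under stated assumptions, not the authors' Spark implementation: the candidate range of 2 to 6 clusters, the random seed, and the synthetic coordinates are all hypothetical. The silhouette value of a sample is conventionally (b − a) / max(a, b), averaged over all samples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_coordinates(coords, k_candidates=range(2, 7), seed=0):
    """Run K-means for each candidate k and keep the k with the best silhouette score."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
        score = silhouette_score(coords, labels)  # mean of (b - a) / max(a, b)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_score, best_labels

# Synthetic (latitude, longitude) points around three made-up centres.
rng = np.random.default_rng(0)
centres = np.array([[41.6, -93.6], [42.0, -91.6], [41.3, -95.9]])
coords = np.vstack([c + rng.normal(scale=0.05, size=(50, 2)) for c in centres])

best_k, best_score, cluster_id = cluster_coordinates(coords)
# cluster_id can now be appended to the order table as a categorical feature.
```

Plain Euclidean distance on latitude and longitude, as in Equation (3), is a reasonable approximation over a small region; over larger areas a geodesic distance would be more accurate.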
The daily order quantities are aggregated, and time series analysis is
performed to understand patterns and trends. For training the model, the input
data is split into training and testing sets in an 80-20 ratio.
Model Selection: Based on preliminary studies [10], [11], [12], [13], [14],
SARIMA and Prophet models are used to generate forecasts for future orders
based on historical data.
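The aggregation and split described above might look like the sketch below. It assumes the split is chronological (the first 80% of days for training), which is the usual choice for time series but is not stated explicitly in the text. The `quantity` column and the sample data are hypothetical.

```python
import pandas as pd

def daily_train_test_split(orders, date_col="DATE", qty_col="quantity", train_frac=0.8):
    """Sum order quantities per day, then split the daily series chronologically."""
    daily = (
        orders.assign(**{date_col: pd.to_datetime(orders[date_col])})
        .groupby(date_col)[qty_col]
        .sum()
        .sort_index()
    )
    cut = int(len(daily) * train_frac)
    return daily.iloc[:cut], daily.iloc[cut:]

# Hypothetical data: two orders per day over ten days.
orders = pd.DataFrame({
    "DATE": pd.date_range("2023-01-01", periods=10).repeat(2),
    "quantity": range(20),
})
train, test = daily_train_test_split(orders)
```

A random shuffle would leak future observations into the training set, so the cut is made on the sorted date index.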
Mean squared error: The mean squared error (MSE) is calculated by squaring
the residual error for each data point and then computing the average. The
equation for MSE is as follows:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6} $$
MSE values can range from 0 to ∞, with smaller values being preferable.
Root mean squared error: The root mean squared error (RMSE) is the square root
of the MSE, which expresses the error in the same units as the order
quantities:
$$ RMSE = \sqrt{MSE} \tag{7} $$
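As a small worked example of Equations (6) and (7), the two metrics can be computed as below. The actual and predicted values are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared residuals, as in Equation (6)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Square root of the MSE, as in Equation (7)."""
    return mse(y_true, y_pred) ** 0.5

# Hypothetical values: residuals are -10, 5, -5, 10.
actual = [100, 120, 90, 110]
predicted = [110, 115, 95, 100]
mse_value = mse(actual, predicted)    # (100 + 25 + 25 + 100) / 4 = 62.5
rmse_value = rmse(actual, predicted)  # sqrt(62.5), roughly 7.91
```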
Based on these preliminary studies, both SARIMA and Prophet are trained to
generate forecasts for future orders from historical data. The performance of
the SARIMA and Prophet models is evaluated using the metrics reported in
Tables 3 and 4. Further, fine-tuning the parameters of the SARIMA and Prophet
models could potentially lead to improved accuracy and better generalization
to unseen data. Grid search and manual search are explored in this work to
efficiently navigate the high-dimensional parameter space and identify optimal
configurations.
Fig. 3. Actual vs. Predicted Plot for SARIMA Model with Grid Search
Both manual and grid search methods were employed to determine the optimal
parameters. The SARIMA (0, 1, 1)(0, 1, 1, 12) configuration achieved the
lowest MSE among the SARIMA models, suggesting its effectiveness in capturing
the underlying patterns in the data. The results in Table 3 show that the
differences in performance metrics between the grid search and manual methods
were marginal, indicating that the grid search approach might be more
practical due to its automation. The comparison of actual and predicted values
for the SARIMA model with grid search is shown in Fig. 3.
Fig. 4. Actual vs. Predicted Plot for Prophet Model with Grid Search
Fig. 5. Actual vs. Predicted Plot for ENSEMBLE Model using Linear Regression
Finally, ensemble models using weighted average, simple average, and linear
regression are evaluated. The ensemble technique is a powerful approach to
machine learning in which multiple models are combined to improve predictive
performance. Instead of relying on a single model, ensemble methods leverage
the strengths of multiple models to make more accurate predictions. In this
work, the combined predictions from the SARIMA and Prophet models are used for
ensemble modelling. The ensemble model is trained on the combined predictions
and the actual order quantities to leverage the strengths of both models. The
results tabulated in Table 5 show that the linear regression ensemble model
has a lower RMSE. It performs well especially when the relationships between
individual model predictions and the target variable are linear, and it offers
interpretability by providing coefficients for each model’s contribution. The
actual vs. predicted plot for the ensemble model using linear regression is
shown in Fig. 5.
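The three combination schemes can be sketched with NumPy as follows. The base-model predictions here are synthetic stand-ins for the SARIMA and Prophet outputs, and the weights (0.6, 0.4) are hypothetical, since the text does not say how the weighted average was weighted.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def simple_average(preds):
    """Unweighted mean of the base-model predictions (one column per model)."""
    return preds.mean(axis=1)

def weighted_average(preds, weights):
    """Weighted mean; weights are normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return preds @ (w / w.sum())

def fit_linear_ensemble(preds, y_true):
    """Least-squares fit of y on the base predictions plus an intercept."""
    design = np.column_stack([np.ones(len(preds)), preds])
    coef, *_ = np.linalg.lstsq(design, y_true, rcond=None)
    return coef  # [intercept, weight_model_1, weight_model_2]

def predict_linear_ensemble(coef, preds):
    return coef[0] + preds @ coef[1:]

# Synthetic stand-ins for actual demand and the two base forecasts.
rng = np.random.default_rng(42)
actual = 100 + 10 * np.sin(np.arange(60) * 2 * np.pi / 12)
sarima_pred = actual + rng.normal(0, 3, size=60)
prophet_pred = 0.9 * actual + 5 + rng.normal(0, 4, size=60)
preds = np.column_stack([sarima_pred, prophet_pred])

coef = fit_linear_ensemble(preds, actual)
scores = {
    "simple_average": rmse(actual, simple_average(preds)),
    "weighted_average": rmse(actual, weighted_average(preds, [0.6, 0.4])),
    "linear_regression": rmse(actual, predict_linear_ensemble(coef, preds)),
}
```

On the data it was fitted to, the linear regression ensemble cannot do worse than either averaging scheme, because both averages are special cases of a linear combination. Whether that advantage carries over to held-out data has to be checked on a test set.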
This work proposes a time series predictive big data analytics approach. Time
series models such as SARIMA and Prophet, together with an ensemble model, are
used to predict sales demand from the IOWA_LIQUOR_SALES dataset. In addition,
a geographical feature is extracted using K-means clustering and used in the
predictive modelling. The experiments were conducted using five-node Spark
clusters deployed in a cloud environment. The findings underscore the efficacy
of ensemble modelling in augmenting forecasting accuracy, particularly when
integrating clustering-based segmentation with SARIMA and Prophet models.
In future studies, ensemble methods such as stacking, boosting, and bagging,
among others, will be investigated. The strengths of multiple models will be com-
bined in novel ways to construct more robust and adaptive ensemble frameworks,
aiming to elevate the predictive performance beyond the baseline established.
References