Effective Cement Demand Forecasting Using Deep Learning Technology: A Data-Driven Approach For Optimal Demand Forecasting
Fig. 2: Architecture diagram showing the end-to-end flow of the project.
(Source: Open-Source ML Workflow Tool, 360DigiTMG)
II. METHODS & TECHNIQUES

A. Data Collection:
The dataset used in this research was provided by the client, a prominent French cement manufacturer. The data originated primarily from internal company records and captures cement quantities sold in kilotons. It covers a substantial time span, from January 2018 to April 2023, providing a comprehensive view of cement demand over more than five years. Within this window, we gathered cement sales quantities for 16 distinct market codes, each representing a unique market or region. The primary variable of interest is the quantity of cement sold, measured in kilotons; the dataset also includes temporal information, allowing us to track sales over time and uncover temporal patterns. It is worth noting that some market codes contain more than 30% zero values, indicating periods of no sales activity. To ensure the reliability and accuracy of our analysis, we executed a series of data preprocessing steps [Fig. 2], which included addressing data anomalies such as duplicates and missing values; for missing data points, we applied cubic spline interpolation to preserve the temporal coherence and continuity of the time series.
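For illustration, the following sketch shows one way such data could be organized into per-market-code monthly time series. The file name and column names (sales.csv, date, market_code, quantity_kt) are hypothetical placeholders rather than the client's actual schema, and a monthly observation frequency is assumed.

```python
import pandas as pd

# Hypothetical schema: one row per (date, market_code) with the quantity sold in kilotons.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Pivot to a wide frame: one monthly series per market code.
series = (
    df.pivot_table(index="date", columns="market_code",
                   values="quantity_kt", aggfunc="sum")
      .asfreq("MS")                    # enforce a regular month-start index
      .loc["2018-01":"2023-04"]        # study window reported in the paper
)

# Share of zero-sales months per market code (some codes exceed 30%).
zero_share = (series == 0).mean().sort_values(ascending=False)
print(zero_share.head())
```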
B. Data Preprocessing:
In the data cleaning phase, we rigorously addressed data anomalies to ensure the quality and integrity of the dataset. We systematically identified and removed duplicate entries, eliminating redundancy and preventing skewed analysis. For missing data points, we employed cubic spline interpolation, which estimates values between known data points while preserving the temporal coherence of the series, effectively bridging gaps in the data and allowing for more accurate analysis and forecasting [Fig. 2]. Because some market codes had periods with zero sales, we retained these entries as valuable indicators of no-sales activity, since they contribute to understanding temporal patterns.
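A minimal sketch of this cleaning step is shown below, assuming each market code's data is held as a pandas Series with a monthly DatetimeIndex (as in the previous sketch). Pandas' cubic-spline interpolation (backed by SciPy) fills only the true gaps, and the non-negativity clip is an added safeguard against spline overshoot, not a step stated in the study.

```python
import pandas as pd

def clean_series(s: pd.Series) -> pd.Series:
    """Drop duplicate timestamps, then fill gaps with cubic spline interpolation.

    Zero-sales months are kept as genuine observations; only NaN entries
    (true gaps in the record) are interpolated.
    """
    s = s[~s.index.duplicated(keep="first")]      # remove duplicate entries
    s = s.asfreq("MS")                            # expose missing months as NaN
    filled = s.interpolate(method="cubicspline")  # SciPy CubicSpline under the hood
    return filled.clip(lower=0)                   # added safeguard against spline overshoot

# Example: apply to every market code column of the wide frame from the previous sketch.
# cleaned = series.apply(clean_series)
```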
To assess the stationarity of the data, we performed a comprehensive evaluation. We employed the Augmented Dickey-Fuller (ADF) test to test for stationarity statistically [2, 3]; a significant p-value indicated stationarity for some market codes and non-stationarity for others, prompting us to investigate further and transform the data where needed. In addition to the ADF test, we applied the KPSS test, which assesses whether the data is stationary around a deterministic trend; because the two tests have complementary null hypotheses (a unit root for ADF, stationarity for KPSS), agreement between them provides stronger evidence either way. To gain a deeper understanding of the data's behavior, we also examined it for signs of a random walk using an AR(1) model, which allowed us to identify potential non-stationary elements in the time series.
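The sketch below illustrates how these checks could be run with statsmodels. The 0.05 significance level and the interpretation rules simply encode the complementary null hypotheses described above, and the AR(1) coefficient is read as a rough random-walk indicator (values near 1 suggest a unit root).

```python
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_report(s, alpha=0.05):
    """ADF and KPSS tests plus an AR(1) random-walk check for one series."""
    s = s.dropna()

    adf_p = adfuller(s, autolag="AIC")[1]                 # H0: unit root (non-stationary)
    kpss_p = kpss(s, regression="ct", nlags="auto")[1]    # H0: stationary around a trend

    # AR(1) fit: a lag-1 coefficient close to 1 suggests random-walk behaviour.
    phi = AutoReg(s, lags=1).fit().params.iloc[-1]

    return {
        "adf_says_stationary": adf_p < alpha,
        "kpss_says_stationary": kpss_p >= alpha,
        "ar1_coefficient": round(float(phi), 3),
    }

# Example: reports = {code: stationarity_report(cleaned[code]) for code in cleaned.columns}
```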
This rigorous data preprocessing phase ensured that our subsequent analyses and modeling efforts were built upon a solid foundation of clean, coherent, and appropriately transformed data. It also allowed us to make informed decisions regarding stationarity and model selection based on the data's characteristics.
Fig. 3: Train and test MAPE comparison for the various forecasting models.
Fig. 4: Trend line for market code 29, which has the highest sales among the 16 market codes (numbered between 1 and 34).
Fig. 5: Sales plots and total cement quantity sold for market codes 1 to 21.
Fig. 6: Sales plots and total cement quantity sold for market codes 22 to 34.
In the results section, our analysis reveals the culmination of an extensive exploration of various forecasting models to determine the most accurate and effective approach for predicting cement demand. A comprehensive array of models, including ARIMA, SARIMA, SARIMAX, and several others, was meticulously tested and evaluated [Fig. 3]. Among these, the Long Short-Term Memory (LSTM) model emerged as the most adept, consistently yielding the lowest Mean Absolute Percentage Error (MAPE) for the majority of market codes. This model, known for its ability to capture complex temporal dynamics, achieved an impressive MAPE reduction, signifying its prowess in demand forecasting.
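As a hedged illustration of the kind of LSTM model discussed here, the sketch below pairs a small univariate LSTM with a MAPE evaluation. The look-back window, layer size, and training settings are illustrative assumptions rather than the exact configuration used in the study. MAPE is the mean of |actual - forecast| / |actual| expressed as a percentage; zero-sales months are excluded to avoid division by zero.

```python
import numpy as np
from tensorflow import keras

def make_windows(values, lookback=12):
    """Slice a 1-D series into (samples, lookback, 1) windows with next-step targets."""
    X, y = [], []
    for i in range(len(values) - lookback):
        X.append(values[i:i + lookback])
        y.append(values[i + lookback])
    return np.asarray(X, "float32")[..., None], np.asarray(y, "float32")

def mape(actual, forecast):
    """Mean Absolute Percentage Error, skipping zero-sales months to avoid division by zero."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100)

def fit_lstm(train_values, lookback=12):
    """Train a small univariate LSTM; layer size and epochs are illustrative choices."""
    X, y = make_windows(train_values, lookback)
    model = keras.Sequential([
        keras.layers.Input(shape=(lookback, 1)),
        keras.layers.LSTM(64),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=200, batch_size=8, verbose=0)
    return model

# Usage sketch (assumes `values` is one market code's cleaned monthly series, scaled to [0, 1]):
# split = int(len(values) * 0.8)
# model = fit_lstm(values[:split])
# X_test, y_test = make_windows(values[split - 12:])
# print("Test MAPE:", mape(y_test, model.predict(X_test, verbose=0).ravel()))
```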
However, the success of our endeavor extends beyond model selection. It is equally vital to acknowledge the holistic approach taken, which considered a myriad of business constraints. We meticulously optimized cement distribution strategies, minimizing transportation costs, enhancing delivery timelines, and streamlining inventory management. This multi-faceted approach ensured alignment with business objectives while remaining attuned to the constraints inherent in the cement manufacturing industry.