
(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 16, No. 2, 2025

Data Analytics for Product Segmentation and Demand Forecasting of a Local Retail Store Using Python
Arun Kumar Mishra1, Megha Sinha2
Department of Computer Science and Engineering, University College of Engineering and Technology (UCET),
Vinoba Bhave University, Hazaribag -825301, Jharkhand, India1
Department of Computer Science and Engineering, Sarala Birla University, Ranchi-835103, Jharkhand, India2

Abstract—In today's competitive business environment, understanding customers' expectations and choices is essential to the successful operation of a retail store. Demand forecasting also plays an important role in maintaining inventory at an optimum level. This work applies data analytics to product segmentation and demand forecasting in a local retail store, using Python as the programming language for data analytics. Historical sales data of a local store has been used to categorise products into different segments. Statistical techniques and a k-means clustering algorithm have been used to understand the different product segments. Machine learning algorithms and time series models have been used to forecast future sales trends. The resulting business insights allow the retail store to meet customers' expectations, manage inventory at an optimum level and enhance supply chain efficiency. The present work seeks to illustrate how data-driven tactics can enhance operational decision-making in retail.

Keywords—Data analytics; product segmentation; demand forecasting; multicriteria ABC classification; seasonality

I. INTRODUCTION

To ensure business growth and maintain operational efficiency in the current competitive retail environment, it has become important to understand customer choices and to forecast demand for various products. The problems a local retail store faces include, but are not limited to, inventory management, optimisation of sales strategy and fulfilling customers' expectations as market trends change. In this scenario, product segmentation and demand forecasting help overcome these hurdles to run a successful business.

Product segmentation classifies products into groups based on common characteristics such as sales performance, revenue generation, demand trends and consumer preferences. It helps retailers customise marketing strategies, optimise inventory, and improve overall resource allocation. Demand forecasting uses previous sales data to predict future demand trends. It helps retailers make proactive decisions in procurement, inventory management, and supply chain operations.

Python, an object-oriented programming language, provides a rich set of libraries and tools for data analytics. It offers extensive solutions for intricate retail problems, encompassing data pre-treatment, visualisation, machine learning, and time series modelling. Pandas, NumPy, Scikit-learn, and Prophet are particularly well suited to product clustering, trend analysis, and predictive modelling.

This article examines the use of Python-based data analytics methods for efficient product segmentation and demand forecasting in a small retail establishment. The study analyses previous sales data to determine specific product segments for focused marketing and inventory approaches, and constructs predictive models to anticipate future demand, reducing stockouts and excess inventory.

This study's findings emphasise that data-driven techniques can enhance decision-making processes in retail, resulting in greater efficiency, customer satisfaction, and profitability.

II. LITERATURE REVIEW

Generally, applying uniform control measures to all inventory products is inadvisable, since high-value items may be essential to the viability of the firm. The study in [1] discussed ABC and multicriteria ABC analysis. In ABC analysis, products are classified into three classes: A, B and C. Class A items entail significant stock-out expenses and necessitate stringent control measures. The study emphasised that multicriteria ABC analysis is crucial for understanding different product categories: volume drivers, margin drivers, regular movers, and slow movers. In multicriteria analysis, categories A_B and A_C denote volume driver items, B_A and C_A denote margin driver items, categories B_B, B_C and C_B denote regular items, category C_C denotes slow-moving items, and A_A encompasses items that are both margin and volume drivers. The research performed multicriteria ABC analysis on the online retail dataset using data analytics methodologies. The study in [2] proposed a three-phased Multi-Criteria Inventory Classification (MCIC) approach integrating the Analytical Hierarchy Process (AHP), the Fuzzy C-Means (FCM) algorithm, and a newly proposed Revised-Veto (RVeto) phase to adhere to the ABC classification principles and enhance its application and adaptability. Classification based on several criteria is essential to meeting management's needs in the current context. The study in [3] presented a semi-supervised explainable methodology that integrated semi-supervised clustering with explainable artificial intelligence. The semi-supervised method combined intelligent initialisation with a constrained clustering process that directed the classification procedure towards Pareto-distributed items. At the

226 | P a g e
www.ijacsa.thesai.org

same time, explainable artificial intelligence was employed to generate comprehensive micro and macro explanations of inventory categories at both the item and class levels. Applying the suggested method to the automatic classification of chemical items within a distribution organisation demonstrated its efficacy in delivering precise, transparent, and thoroughly explained ABC classifications. The study in [4] presented an optimal multi-criteria ABC inventory classification for supermarkets to manage commodities based on unit price, lead time, and annual usage. Of the 442 items, 30 were categorised as group "A", 31 as group "B", and 27 as group "C" under the new ABC classification, whereas all of these items had been categorised in group "A" under the conventional ABC classification. The study in [5] indicated that AI-based methods achieved higher accuracy than multiple discriminant analysis (MDA). The statistical study showed that SVM delivered superior classification accuracy compared with the other AI methodologies. This finding indicated the potential of AI-driven methodologies for multi-criteria ABC analysis within enterprise resource planning (ERP) systems. The study in [6] presented a case-based multiple-criteria ABC analysis that extends the traditional method by incorporating other factors, such as lead time and SKU criticality. It offered greater managerial flexibility. Decisions on example cases served as input, with preferences for alternatives represented naturally using weighted Euclidean distances, which made the model easy for the decision-maker to understand. The study in [7] examined current portfolio models in procurement that categorise purchases into several product classifications. Case studies from two European automotive OEMs and two vehicle industry suppliers, together with benchmarking interviews at Toyota, Japan, were used to establish a connection between these product categories and various supplier types. Further, it tried to correlate the product categories and supplier types with the specification process, specifically by associating the specification types with their respective generators. The study in [8] conducted supplier segmentation within the automobile sector and proposed four techniques for supplier relationships. Additionally, a four-phase approach for analysing, selecting, and managing decisions on a dynamic relationship strategy with suppliers in the automotive sector was delineated. The study in [9] reviewed the available literature, focusing on market conditions, supplier characteristics, buyer characteristics, and the connections between buyers and suppliers. The study in [10] formulated an innovative methodology for supplier segmentation, using fuzzy logic to segment suppliers in a broiler firm.

The study in [11] performed a comparative analysis of machine learning algorithms for demand forecasting under uncertainty using a synthetic dataset. The algorithms compared were Linear Regression, Decision Tree Regression, Random Forest Regression, Support Vector Machine Regression (SVR) and XGBoost Regression, evaluated on Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The study in [12] proposed a model that integrated time series analysis, boosting and deep learning for demand forecasting. It achieved a significant improvement in accuracy relative to state-of-the-art studies, and the testing used real data from Turkey's SOK Market. The article compared the Decision Tree Classifier, Gaussian Naive Bayes, and K-Nearest Neighbours (KNN); the Gaussian Naive Bayes technique exhibited the greatest accuracy in demand estimation. The study in [13] focused on demand forecasting and consumer satisfaction within the retail sector. It emphasised the importance of precise demand estimation for merchants. The discussion encompassed machine learning methodologies for forecasting product demand. The paper considered variables for prediction, including time, location, and historical data.

III. PRESENT WORK

In this paper, product segmentation was performed using ABC and multicriteria ABC analysis. First, data was collected and prepared for the segmentation exercise. Then, segmentation was performed using Python's inventorize package. After that, classification algorithms were applied to the segmented data, and their performance was evaluated. The workflow is shown in Fig. 1.

Fig. 1. Workflow for product segmentation.
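The preparation and segmentation steps of this workflow can be sketched in plain pandas as follows. This is not the authors' code and it does not call inventorize: the helper names (prepare, abc_classes, segment_label, product_mix), the 80% and 95% cumulative-share cut-offs, and the toy rows are all assumptions, and inventorize's own cut-off rule may differ. Here an item is placed in class A while the items ranked above it account for less than 80% of the total. Following the category definitions in Section II, the first letter of a product-mix code is taken as the volume (‘Quantity’) class and the second as the ‘Revenue’ class.

```python
import pandas as pd

# Hypothetical helpers illustrating the segmentation workflow; not the inventorize API.


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop nulls and duplicates, keep positive quantities, replace 'Month' with 'Date'."""
    df = raw.dropna().drop_duplicates()
    df = df[df["Quantity"] > 0].copy()
    df["Date"] = pd.to_datetime(df["Month"])  # e.g. "2023-01" -> 2023-01-01
    return df.drop(columns="Month")


def abc_classes(values: pd.Series, a_cut: float = 0.80, b_cut: float = 0.95) -> pd.Series:
    """Rank descending; classify by the cumulative share of the items ranked above each item."""
    ordered = values.sort_values(ascending=False)
    share_above = (ordered.cumsum() / ordered.sum()).shift(1, fill_value=0.0)
    classes = pd.Series("C", index=ordered.index)
    classes.loc[share_above < b_cut] = "B"
    classes.loc[share_above < a_cut] = "A"
    return classes


def segment_label(mix: str) -> str:
    """Map a volume_revenue code to the categories named in the literature review."""
    if mix == "A_A":
        return "volume and margin driver"
    if mix in {"A_B", "A_C"}:
        return "volume driver"
    if mix in {"B_A", "C_A"}:
        return "margin driver"
    if mix == "C_C":
        return "slow mover"
    return "regular mover"  # B_B, B_C, C_B


def product_mix(clean: pd.DataFrame) -> pd.DataFrame:
    """Aggregate by SKU, classify on Quantity and on Revenue, then combine the two classes."""
    per_sku = clean.groupby("SKU")[["Quantity", "Revenue"]].sum()
    per_sku["ABC_Quantity"] = abc_classes(per_sku["Quantity"])
    per_sku["ABC_Revenue"] = abc_classes(per_sku["Revenue"])
    per_sku["Product_mix"] = per_sku["ABC_Quantity"] + "_" + per_sku["ABC_Revenue"]
    per_sku["Segment"] = per_sku["Product_mix"].map(segment_label)
    return per_sku


# Toy rows (hypothetical): one duplicate, one null row and one negative quantity.
raw = pd.DataFrame({
    "Month": ["2023-01", "2023-01", "2023-01", "2023-02", "2023-02", "2023-02", "2023-02"],
    "SKU": ["PEN", "PEN", "Paper", "Envelope", "Paper", "PEN", "Envelope"],
    "Quantity": [100, 100, 20, -5, None, 50, 10],
    "Price": [10, 10, 50, 2, 50, 10, 2],
    "Revenue": [1000, 1000, 1000, -10, None, 500, 20],
})
clean = prepare(raw)
mix = product_mix(clean)
print(mix[["Quantity", "Revenue", "Product_mix", "Segment"]])
```

With the real file, raw could instead be loaded with pd.read_excel("Sales_data.xlsx"), assuming the sheet holds the five columns described for the dataset. In the toy data, PEN ends up as A_A, Paper as B_A (a margin driver) and Envelope as B_C (a regular mover).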


The same dataset was analysed for demand trends, and a comparative analysis of different machine learning algorithms was carried out to forecast the demand for A-class products. Fig. 2 shows the workflow for this.

Fig. 2. Workflow for demand trend analysis and forecasting.

Two years of monthly sales data from a local retail store in Hazaribag was captured and stored in a file named ‘Sales_data.xlsx’. A sample of the dataset is displayed in Fig. 3.

Fig. 3. Local retail dataset sample.

The dataset has five columns: ‘Month’, ‘SKU’, ‘Quantity’, ‘Price’, and ‘Revenue’. It has 4339 records and was checked for null values and duplicates. After removing records with null values and eliminating duplicates, the dataset comprised 4089 rows. It was then filtered on quantity, keeping only records with a quantity greater than 0. For analysis purposes, a new column, ‘Date’, was derived from ‘Month’ by converting it to a datetime object, and ‘Month’ was then dropped. Fig. 4 shows a sample of the prepared dataset. The pertinent columns, namely ‘SKU’, ‘Quantity’, and ‘Revenue’, were retained for ABC and multicriteria ABC analysis. Additionally, the data was consolidated by ‘SKU’.

Fig. 4. Sample local retail dataset.

The inventorize module in Python was used to conduct an ABC analysis based on volume. The ABC analysis was then performed on revenue. After that, a multicriteria analysis based on ‘Revenue’ and ‘Quantity’ was performed on the dataset. The resulting dataset was used for a comparative study of classification algorithms, viz. KNN, Decision Tree, Random Forest and Naïve Bayes. The product mix category was treated as the labelled output, while the remaining columns were used as the basis for classification. First, the data was split into training and test sets, with 20% of the data held out for testing. The models were trained and then tested. Confusion matrices were plotted for each model, and accuracy scores were calculated.

The prepared dataset (Fig. 4) was also used to understand the demand trend. A graph of monthly sales over time was plotted to show monthly sales trends. Seasonal decomposition was performed to learn more about seasonality, and its components were plotted. Further analysis compared machine learning algorithms, viz. Linear Regression, Decision Tree, Random Forest, SVR and XGBoost regressor, for demand forecasting of the class A items of the local retail store. Demand forecasting was then performed using these algorithms. The dataset was split into training and test sets, with 20% of the data held out for testing. Predictions were compared for all these items, using MAE, MSE and RMSE as performance metrics. To visualise the results in a single frame, grouped bar graphs were plotted for MAE, MSE, and RMSE. Finally, a graph was plotted to visualise the comparative performance of the machine learning algorithms in this study.

IV. RESULTS AND DISCUSSIONS

The counts from the ABC analysis on volume and revenue are displayed in Fig. 5 and Fig. 6, respectively. Then, multicriteria analysis based on ‘Revenue’ and ‘Quantity’ was performed on the dataset. Fig. 7 displays the product mix count.

The dataset obtained after this analysis was used to perform a comparative study of classification algorithms, viz. KNN, Decision Tree, Random Forest, and Naïve Bayes. Fig. 8 shows the confusion matrices for these algorithms.


Fig. 5. ABC Count by volume.

Fig. 6. ABC Count by revenue.

Fig. 7. Multicriteria ABC analysis.

Fig. 8. Confusion matrix (a) KNN (b) Decision Tree (c) Random Forest and (d) Naïve Bayes classification algorithms.


The comparative chart of the accuracy scores is displayed in Fig. 9.

Fig. 9. Accuracy scores comparison for classification.

The Decision Tree and Random Forest classifiers, with identical accuracy scores, outperformed the KNN and Naïve Bayes algorithms.
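The classification comparison (a single 80/20 train/test split, four classifiers, a confusion matrix and an accuracy score per model) can be sketched with scikit-learn as below. This is an illustrative sketch rather than the authors' code: the two features (per-SKU quantity and revenue), default hyperparameters, the random seed, Gaussian Naive Bayes as the Naïve Bayes variant, and the synthetic three-class data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.2, seed=42):
    """Fit four classifiers on one 80/20 split; return accuracy and confusion matrix per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    models = {
        "KNN": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Naive Bayes": GaussianNB(),
    }
    labels = sorted(np.unique(y))  # same row/column order in every confusion matrix
    results = {}
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "confusion": confusion_matrix(y_test, y_pred, labels=labels),
        }
    return results


# Synthetic per-SKU features (hypothetical): three well-separated product-mix groups,
# each row being (quantity, revenue).
rng = np.random.default_rng(0)
centres = {"A_A": (500.0, 50000.0), "B_B": (150.0, 8000.0), "C_C": (20.0, 500.0)}
X = np.vstack([
    np.column_stack([rng.normal(q, 0.1 * q, 60), rng.normal(r, 0.1 * r, 60)])
    for q, r in centres.values()
])
y = np.repeat(list(centres), 60)

results = compare_classifiers(X, y)
for name, res in results.items():
    print(f"{name:14s} accuracy={res['accuracy']:.3f}")
```

KNN is distance-based and therefore sensitive to the very different scales of quantity and revenue; placing a StandardScaler in front of it is a common choice, but it is left out here because the paper does not describe any scaling.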
Next, a graph of monthly sales over time was plotted to show monthly sales trends. Fig. 10 displays the seasonal decomposition of the same series.

Fig. 10. Seasonality trend for monthly sales data.

There are irregular spikes in this plot, suggesting occasional high demand at irregular intervals. The trend plot shows variability in the dataset, but the variability is not consistent enough to establish an upward or downward trend over time. The seasonal component indicates a persistent seasonal influence, although that influence is very weak, as shown by minor variations in values within each recurring period. The residuals are scattered and show larger fluctuations at the spikes in the data, suggesting that the spikes are not explained by the trend or seasonality. Further analysis compared machine learning algorithms, viz. Linear Regression, Decision Tree, Random Forest, SVR and XGBoost regressor, for demand forecasting of the class A items of the local retail store. Fig. 11 shows the demand pattern for these items.


Fig. 11. Demand pattern for (a) Printed Matter 18% (b) Printed Matter 12% (c) Printing Job Work (d) PEN (e) Envelope (f) Paper (g) Envelopes (h) Letter Head (i) Hard Bord Kut.

Demand for these products was forecast using the machine learning algorithms, and MAE, MSE, and RMSE were calculated. To visualise the results in a single frame, grouped bar graphs have been plotted for MAE, MSE, and RMSE, as shown in Fig. 12, 13, and 14, respectively.

Fig. 12. Comparative performance of machine learning algorithms based on MAE for all class ‘A’ items.

Fig. 13. Comparative performance of machine learning algorithms based on MSE for all class ‘A’ items.

Fig. 14. Comparative performance of machine learning algorithms based on RMSE for all class ‘A’ items.

Fig. 15 shows the comparative performance of the machine learning algorithms compared in this study.
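A sketch of the per-item forecasting comparison, and of how the share of items "won" by each algorithm (as summarised in Fig. 15) can be computed, is given below. Several choices are assumptions not stated in the paper: two lagged monthly quantities as features, a chronological 80/20 split (shuffle=False), default hyperparameters, standard scaling in front of SVR, and lowest RMSE as the criterion for the best model. XGBoost's XGBRegressor is included only if the xgboost package is installed. The helper names are hypothetical and the series are random placeholders, so the printed numbers say nothing about the store.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

try:  # optional dependency
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None


def make_models(seed=0):
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "SVR": make_pipeline(StandardScaler(), SVR()),
    }
    if XGBRegressor is not None:
        models["XGBoost"] = XGBRegressor(n_estimators=200, random_state=seed)
    return models


def evaluate_item(series: pd.Series, lags: int = 2, test_size: float = 0.2) -> pd.DataFrame:
    """MAE, MSE and RMSE of every model on the last 20% of one item's lagged series."""
    frame = pd.DataFrame({"y": series})
    for k in range(1, lags + 1):
        frame[f"lag_{k}"] = series.shift(k)
    frame = frame.dropna()
    X, y = frame.drop(columns="y"), frame["y"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, shuffle=False)
    rows = {}
    for name, model in make_models().items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        rows[name] = {"MAE": mean_absolute_error(y_te, pred), "MSE": mse, "RMSE": np.sqrt(mse)}
    return pd.DataFrame(rows).T


def winner_share(tables: dict) -> pd.Series:
    """Percentage of items for which each model has the lowest RMSE."""
    winners = pd.Series({item: t["RMSE"].idxmin() for item, t in tables.items()})
    return winners.value_counts(normalize=True) * 100


# Placeholder monthly demand for three hypothetical items (24 months each).
rng = np.random.default_rng(2)
idx = pd.date_range("2022-01-01", periods=24, freq="MS")
items = {name: pd.Series(rng.integers(20, 200, 24).astype(float), index=idx)
         for name in ["PEN", "Paper", "Envelope"]}

tables = {name: evaluate_item(s) for name, s in items.items()}
print(tables["PEN"].round(2))
print(winner_share(tables).round(1))
```

RMSE is computed as the square root of MSE so that the sketch does not depend on newer scikit-learn helpers. With nine items, winning four, three and two of them corresponds to shares of 44.4%, 33.3% and 22.2%.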


Fig. 15. Comparative analysis of best-performing machine learning algorithms for demand forecasting.

It can be seen that SVR outperformed the other algorithms for 44.4% of class A items, XGBoost performed best for 33.3% of items, and Random Forest outperformed the other algorithms for 22.2% of class A items.

V. CONCLUSION

Given that clients seek a diverse range of products while expecting greater value for their expenditure, it is imperative to understand the various categories of products, including volume drivers, margin drivers, and regular and slow movers. Multi-criteria ABC analysis is an effective tool for this segmentation. This study evaluated the efficacy of classification algorithms in conducting multi-criteria ABC analysis on a retail dataset. Products have been classified based on the ‘Quantity’ and ‘Revenue’ parameters into A_A, B_A, C_B, B_B, A_B, C_C, A_C and B_C categories. In the contemporary online business landscape, categorising products and identifying suppliers of vital commodities has become crucial for business viability. Consequently, strategic collaborations may be established for commodities that generate volume and margin. The ‘inventorize’ module was an effective Python tool for conducting multi-criteria analysis based on quantity and revenue, and it may help determine the critical items to retain. The results indicate that, among the classification algorithms evaluated on accuracy score, the Random Forest and Decision Tree classifiers exhibited comparable performance. Further, a data-driven approach was applied to analyse demand trends and forecast demand for the class A items of a local retail store, and machine learning algorithms were compared for forecasting these items. In this respect, SVR outperformed the other algorithms for nearly half of the products.

REFERENCES
[1] A. K. Mishra and M. Sinha, "Data Analytics for Multi-criteria ABC Analysis and Supplier Segmentation in Making a Competitive Supply Chain Using Python," in Proceedings of the 10th International Symposium on Fusion of Science and Technology (ISFT-2024), January 4-8, 2024. https://www.jcboseust.ac.in/assets/files/sovenir_PROCEEDING_ISFT_2024_1.pdf
[2] F. Yiğit and S. Esnaf, "A New Fuzzy C-Means and AHP-Based Three-Phased Approach for Multiple Criteria ABC Inventory Classification," IMSS'19, Sakarya University, Sakarya, Turkey, 9-11 September 2019, pp. 633-642.
[3] A. A. Qaffas, M. A. B. Hajkacem, C.-E. B. Ncir and O. Nasraoui, "Interpretable Multi-Criteria ABC Analysis Based on Semi-Supervised Clustering and Explainable Artificial Intelligence," IEEE Access, vol. 11, pp. 43778-43792, 2023, doi: 10.1109/ACCESS.2023.3272403.
[4] A. Yemane, A. M. Semegn and E. Gidey, "ABC Classification for Inventory Optimization (Case Study Family Supermarket)," Industrial Engineering & Management, vol. 10, no. 5, 2021, ISSN: 2169-0316.
[5] M.-C. Yu, "Multi-criteria ABC analysis using artificial-intelligence-based classification techniques," Expert Systems with Applications, vol. 38, pp. 3416-3421, 2011, doi: 10.1016/j.eswa.2010.08.127.
[6] Y. Chen, K. W. Li, D. M. Kilgour and K. W. Hipel, "A case-based distance model for multiple criteria ABC analysis," Computers & Operations Research, vol. 35, no. 3, pp. 776-796, 2008, ISSN 0305-0548, https://doi.org/10.1016/j.cor.2006.03.024.
[7] R. Nellore and K. Söderquist, "Portfolio approaches to procurement: Analysing the missing link to specifications," Long Range Planning, vol. 33, pp. 245-267, 2000.
[8] G. Svensson, "Supplier segmentation in the automotive industry: A dyadic approach of a managerial model," International Journal of Physical Distribution and Logistics Management, vol. 34, pp. 12-38, 2004.
[9] M. Day, G. M. Magnan and M. M. Moeller, "Evaluating the bases of supplier segmentation: A review and taxonomy," Industrial Marketing Management, vol. 39, pp. 625-639, 2010.
[10] J. Rezaei and R. Ortt, "Multi-criteria supplier segmentation using a fuzzy preference relations based AHP," European Journal of Operational Research, vol. 225, pp. 75-84, 2013.
[11] A. K. Mishra, M. Sinha and S. K. Jha, "Comparative analysis of machine learning algorithms for demand forecasting under uncertainty," Computer Science & IT Research Journal, vol. 5, no. 8, pp. 1817-1827, 2024, https://doi.org/10.51594/csitrj.v5i8.1409.
[12] Z. H. Kilimci, A. O. Akyuz, M. Uysal, S. Akyokus, M. O. Uysal, B. Atak Bulbul and M. A. Ekmis, "An improved demand forecasting model using deep learning approach and proposed decision integration strategy for supply chain," Complexity, vol. 2019, no. 1, 9067367, 2019.
[13] A. I. Arif, S. I. Sany, F. I. Nahin and A. S. A. Rabby, "Comparison study: product demand forecasting with machine learning for shop," in 2019 8th International Conference System Modeling and Advancement in Research Trends (SMART), IEEE, Nov. 2019, pp. 171-176.

© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
