
Neural Computing and Applications (2024) 36:4995–5005

https://doi.org/10.1007/s00521-023-09339-6

ORIGINAL ARTICLE

Customer profiling, segmentation, and sales prediction using AI in direct marketing
Mahmoud SalahEldin Kasem1 • Mohamed Hamada2 • Islam Taj-Eddin3

Received: 6 March 2023 / Accepted: 26 November 2023 / Published online: 23 December 2023
© The Author(s) 2023

Abstract
In the current business environment, where the customer is the primary focus, effective communication between marketing
and senior management is vital for success. Effective customer profiling is a cornerstone of strategic decision-making for
digital start-ups seeking sustainable growth and customer satisfaction. This research investigates the clustering of cus-
tomers based on recency, frequency, and monetary (RFM) analysis and employs validation metrics to derive optimal
clusters. The K-means clustering algorithm, coupled with the Elbow method, Silhouette coefficient, and Gap Statistics
method, facilitates the identification of distinct customer segments. The study unveils three primary clusters with unique
characteristics: new customers (Cluster A), best customers (Cluster B), and intermittent customers (Cluster C). For
platform-based Edutech start-ups, Cluster A underscores the importance of tailored learning content and support, Cluster B
emphasizes personalized incentives, and Cluster C suggests re-engagement strategies. By understanding and addressing the
diverse needs of these clusters, digital start-ups can forge enduring connections, optimize customer engagement, and fuel
sustainable business growth.

Keywords Data mining · SVM · Boosting tree · RFM analysis methodology · Deep learning

✉ Mahmoud SalahEldin Kasem
[email protected]

Mohamed Hamada
[email protected]

Islam Taj-Eddin
[email protected]

1 Department of Multimedia Systems, Assiut University, Asyut, Egypt
2 Department of Computer Science, International IT University, Almaty, Kazakhstan
3 Department of Information Technology, Assiut University, Asyut, Egypt

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

1 Introduction

In today's business landscape, companies face the challenge of identifying potential customers who are most likely to respond positively to a product or offer; this is where data mining techniques come into play. With the increasing amount of data available, data mining has become an essential tool for direct marketing efforts, allowing companies to create a prediction response model based on past client purchase data. This study aims to present a data mining preprocessing method for developing a customer profiling system that improves the sales performance of an enterprise. The study uses an RFM analysis methodology to evaluate client capital and a boosting tree for prediction. Furthermore, the study highlights the importance of customer segmentation methods and algorithms in increasing the accuracy of the prediction. The main result of this study is the creation of a customer profile and forecast for the sale of goods, which will assist decision-makers in making strategic marketing decisions. The study is expected to provide valuable insights for companies looking to improve their direct marketing efforts and increase sales performance through data mining-based customer profiling [1–3].

The proposed methodology in this study utilizes the RFM analysis (recency, frequency, and monetary) approach to assess client capital, coupled with a boosting tree algorithm for predictive modeling. Additionally, the study emphasizes the crucial role of customer segmentation methods and algorithms in enhancing prediction accuracy. The primary outcome of this research is the development of
a customer profile that offers valuable insights into customer behavior and sales forecasts for goods. However, there is no explicit mention of any secondary results or derivative findings throughout the introduction.

To achieve the research goal of enhancing sales performance through data-driven customer profiling, the study will address a series of key tasks. These tasks include data collection, a comprehensive study of machine learning methods, specifying the structure of client profiles along with their types and relevant indicators, analyzing and organizing customer data, systematizing international practices for improving client profiling, identifying effective methods for researching client profiles, and exploring the concept of "consumer loyalty" in modern marketing. Furthermore, the research aims to clarify the nature and structure of consumer loyalty, highlight foreign experiences in enhancing consumer loyalty, and emphasize the factors influencing the selection of reward systems for developing comprehensive consumer loyalty programs for goods and services manufacturers. Practical recommendations for the formation of such loyalty programs will also be formulated.

Deep learning is a subfield of machine learning that has seen widespread applications in various industries. Deep learning models have been applied to tasks such as text classification, sentiment analysis, machine translation, speech recognition [4], and table detection and recognition [5–8]. Health care is another industry where deep learning has found several applications, including diagnosis, treatment planning, drug discovery [9], and medical imaging analysis [10–12]. In robotics, deep learning is used for autonomous navigation, object recognition [13–15], and robotic control. Other applications include handwriting recognition for various languages [16–20], question answering [21–25], intrusion detection in IoT [26–28], and energy consumption prediction [29, 30].

The research focuses on studying enterprises and organizations and examining their marketing activities within the context of client policy formation and implementation. Consumers of goods and services are also within the scope of the study. The research delves into the entirety of economic and organizational relationships that arise as firms implement relationship marketing, particularly in creating and implementing programs to build consumer loyalty.

The study's theoretical and methodological foundation is built upon essential research by domestic and international scientists in market economy, management, marketing, consumer, and brand loyalty management. Methodologies such as marketing, economic, and statistical analysis, as well as quantitative and qualitative study principles, were utilized. Expert methods were also employed to substantiate the main research provisions. The study also draws inspiration from the work of Muller and Hamm [31], emphasizing the significance of starting with segmentation, marketing, and customer data adjustment to achieve accurate analysis and profiling.

The scientific novelty of this research lies in the development of scientific and methodological provisions and recommendations focused on creating and implementing a client profiling framework using AI techniques. Additionally, the study identifies the most effective methods for researching client profiles and enhancing consumer loyalty in Kazakhstani enterprises, exemplifying its applicability in a specific context.

In conclusion, this research aims to provide valuable insights for companies seeking to improve their relationship marketing efforts and enhance sales performance through data-driven customer profiling. With the vast array of customer needs, behavior, and preferences observed in online business platforms, customer segmentation becomes crucial. The study will explore customer segmentation, its importance in understanding customer behavior, and the role of AI in this process, offering practical insights for organizations looking to improve customer retention and benefit upgrades through data-driven customer segmentation.

2 Related work

In the field of customer segmentation, researchers have been experimenting with different algorithms to perform segmentation on customer data. Most of these studies have focused on analyzing customer buying history and purchasing behavior to identify segments. In the following paragraphs, related work methods are explained, followed by Table 1, which summarizes their advantages and disadvantages.

According to Jiang and Tuzhilin [32], it is crucial to implement both customer segmentation and buyer targeting in order to enhance marketing performance. These two tasks are integrated into a step-by-step approach; however, the challenge of unified optimization arises. To address this issue, the authors proposed the K-classifiers segmentation algorithm. This method prioritizes allocating more resources to those customers who generate the most returns for the company. A significant number of researchers have discussed various techniques for segmenting customers in their studies. The authors also propose a direct clustering method for grouping customers. Rather than relying on computed statistics, this approach utilizes transactional data from multiple customers. The authors also acknowledge that finding an optimal segmentation solution is computationally difficult (NP-hard). Therefore, the authors present alternative sub-optimal clustering methods. The study then experimentally evaluates the customer


Table 1 Related work methods, advantages and disadvantages

| Paper | Proposed method | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Kashwan [33] | K-means algorithm and a statistical tool | A continuous analysis and online system for e-commerce organization to predict sales | Limited to the use of clustering strategy for determining of market segmentation |
| Brito [34] | Two data mining methods (clustering and sub-cluster discovery) | Better understanding of customer preferences | Limited to redefined industries |
| Ballestar [37] | Utilization of cashback and client behavior on social network sites | Shows the reliance on the position of clients inside an organization | Limited to the use of social network writing to promoting like dedication, person-to-person communication, development of client, and commitment of client |
| Qadadeh [38] | K-means for clustering and self-organized maps for quality of clustering with representation | Involves various procedures for division with expert to further develop organizations | Limited to the use of multiple procedures for segmentation with expert |
| Christy [40] | RFM analysis and extended to other algorithms such as K-means and RM K-means | Good understanding of the need of client and identification of potential clients for organization | Limited to the use of RFM analysis and extended it to other algorithms such as K-means and RM K-means through minor adjustment in K-means clustering |
| Jiang [32] | Direct clustering based on transactional data | Identifies customer segments based on actual customer behavior | Finding an optimal segmentation solution is computationally difficult |
| He [35] | Three-dimensional approach for enhancing CLV, customer satisfaction, and customer behavior | Considers multiple dimensions of customer behavior, leading to more accurate segmentation | Complexity and high computational cost |
| Sheshasaayee [36] | Integrated approach combining RFM and LTV methods with two-phase approach (statistical and clustering) and neural network | Integrates different methods to improve segmentation | Computationally intensive |

segments obtained through direct grouping and finds them to be superior to statistical methods.

Kashwan [33] used a K-means algorithm and a statistical tool to propose a model for continuous analysis and an online framework with which an e-commerce organization can predict sales. They adopted a clustering strategy for determining market segmentation because a well-developed computing-based system is intelligent enough to present results to managers for a fast decision-making cycle.

Brito [34] emphasized that advertising and manufacturing approaches are highly important for customized industries because buying a large variety of products makes it difficult to find specific patterns of customer preferences. As a result, they proposed two different data mining methods, clustering and sub-cluster discovery, for customer segmentation to better understand customer preferences.

He and Li [35] propose a three-dimensional strategy for enhancing customer lifetime value (CLV), customer satisfaction, and customer behavior. The study concludes that consumers have varying needs, and segmentation helps to identify their demands and expectations, which, in turn, leads to providing better service.

Sheshasaayee [36] developed a new integrated approach to segmentation by combining the RFM (recency, frequency, and monetary) and LTV (lifetime value) methods. They employed a two-phase approach, starting with a statistical method in the first phase and then proceeding to clustering in the second phase. The objective is to apply K-means clustering following the two-phase model and then utilize a neural network to improve the segmentation.

Ballestar [37] examined the role of customers in the use of their cashback and determined the business activity and behavior of customers on a social network site. They proposed a model that applied social network analysis to marketing concepts such as loyalty, communication, customer development, and customer engagement to show the reliance on customers' positions within an organization.

Qadadeh [38] proposed the evaluation of data analysis algorithms such as K-means for clustering and self-organized maps for the nature of clustering with visualization. They recommend that involving various procedures for segmentation with experts will further develop organizations such as insurance and study segment elements and behavior of a customer in any customer relationship management dataset.


Several studies have demonstrated the extensive use of RFM technology for customer segmentation and information access. In the context of commercial banks, marketing representatives can employ K-means classification to identify potential customers. To extract valuable insights from customers, data mining methods, including neural networks, C5.0, classification and regression trees, and Chi-squared automatic interaction detector, are highly beneficial for detecting background information related to credit card holders [39].

Christy [40] emphasized that a good understanding of the customer's needs and the identification of potential customers for the organization are achieved through the segmentation process. They performed segmentation using RFM analysis and extended it to other algorithms such as K-means and RM K-means through minor adjustments in K-means clustering.

3 Problem statement

The problem of customer segmentation can be based on various factors such as marketing, sales, support, product, and leadership. Experts in large or small organizations involved in the data analysis process adjust the working group and set the expectations that it will continue to do so in many stages. Some issues that can be resolved through customer segmentation are given below.

• Marketing We can solve the problem by understanding our customer base to effectively reach them. We may not be able to observe the business's email lists using the task to be done, but we can observe ones for business to consumer (B2C) subscription organizations with high website traffic volume.
• Sales Many issues faced by sales representatives can be resolved by this process. We can route prospects to our self-service stream or the most appropriate group within sales, such as startups, small market businesses, and multi-model businesses, based on clear customer segments.
• Support Issues are categorized based on their tool and field. After categorization, it can be used to route support inquiries to the appropriate channels, such as AnswerBot, Alexa, Google Assistant, our help center, or a support representative, to improve customer and business outcomes further.
• Product This process can also resolve issues with product quality. Experts should know which product requests and feedback make the biggest impact on which customer and focus accordingly, instead of by volume alone.
• Leadership It manages the mission run by e-commerce organizations to deliver their service and make a lead. For this, they create a common language for the product and design and go to markets to describe the customers.

In this paper, we proposed a customer segmentation strategy based on various categories. Different clustering methods such as K-means, repetitive median-based K-means (RM K-Means), and self-organized maps were used for segmentation. We proposed a business model for e-commerce organizations based on segmentation according to various categories and recency, frequency, and monetary (RFM) positioning to retain and acquire customers in e-commerce. Acquiring new customers is important, but retaining existing customers is even more important.

4 Model, tools, environment, and technology

4.1 The customer segmentation approach

Client segmentation is a widely used marketing strategy that involves dividing the customer base into smaller groups based on characteristics such as demographics, behavior, and purchasing history. This enables businesses to understand their customer base better and implement more effective marketing strategies. Vector quantization is an algorithm commonly used for client segmentation, automatically grouping customers based on their behavior data. While it may not always achieve optimal results, it provides valuable insights for businesses to target their marketing efforts. A mapping or vector quantizer can be used to divide data into smaller groups. The mapping is an N-level k-dimensional tool that takes various client RFM values as input vectors. It uses a non-negative real distortion measure to represent the difference between the original and reproduced vectors. The error distortion measure, widely used in mathematical applications, is chosen for its computational efficiency [41].

Data mining techniques have emerged as essential tools in market segmentation. This modern approach to market research involves processing vast datasets from databases using intelligent solutions, such as neural networks, evolutionary algorithms (EA), fuzzy theory, RFM, hierarchical clustering, K-means, bagged clustering, kernel methods, Taguchi method, multidimensional scaling, model-based clustering, and rough sets, among others. These techniques offer highly effective and time-efficient means of segmenting the market [42].

Quantizer optimality is determined based on its ability to minimize average distortion. An N-level quantizer is
considered ideal, or globally ideal, if it achieves this, that is, if every other quantizer has a distortion at least as large [40]. Quantizer design aims to obtain an optimal or locally optimal quantizer if possible, and various algorithms have been proposed for this purpose in the literature.

4.2 Machine learning

Interest in machine learning (ML) has grown due to increased processing power and data availability. ML utilizes past experience to enhance performance and make precise predictions. Tasks include classification, regression, ranking, clustering, dimensionality reduction, and complex learning.

• Data Preprocessing and Model Optimization It is crucial for AI model creation, involving cleaning, standardization, transformation, feature extraction, and selection.
• Data Cleaning and Transformation Class imbalance in ML can lead to issues such as improper evaluation metrics and overfitting. Techniques such as oversampling and undersampling can address this.
• Missing Data Handling missing values involves deletion or imputation with estimated values.
• Sampling Preprocessing plays a vital role in AI model creation, impacting performance and interpretability. Techniques such as oversampling and undersampling can address class imbalance but have drawbacks.
• Feature and Variable Selection Feature selection is critical for identifying relevant data and for improving predictive performance and efficiency. Various methods can be used based on the dataset and computational resources.

4.3 Model for customer segmentation and market segmentation

Commonly used models for customer segmentation include:

• Demographic segmentation;
• Recency, frequency, and monetary (RFM) segmentation;
• Customer status and behavioral segmentation.

Segmentation based on gender is a simple yet effective way for organizations to categorize their customer base, allowing for targeted content and promotions for gender-specific events. RFM segmentation is widely used in the direct mail industry to rank customers based on their purchasing history, considering recency, frequency, and monetary value of purchases [34].

Client status and behavior analysis involve categorizing clients into active and lapsed based on their last purchase. Behavioral analysis analyzes past customer behavior, such as shopping habits and brand preferences, to predict future actions. Data analysts use this information to segment clients into categories and develop strategies for customer retention and acquisition [43].

Market division, first defined in 1956, is a method used by organizations to categorize customers based on similar characteristics, such as geographic location, demographics, product usage, and purchasing behavior. The goal is to increase customer satisfaction and maximize efficiency by tailoring marketing efforts to specific segments. One common tool used in market division is clustering, which groups elements with similar values into segments [44–46]. While early market division studies only considered one set of factors, modern market division models take into account multiple sets of factors simultaneously, which is called cooperative market division. There are various market division methods, including K-means clustering, hierarchical clustering, association rule mining, decision trees, and neural networks. The objective is to identify and describe customer groups and reach profitable customer segments [47].

Clustering data is a significant aspect of data mining techniques, involving the utilization of latent class analysis (LCA), prior clustering, and various similarity or distance measures to segment large customer groups based on individual expectations [48]. Sánchez-Fernández [49] presents a conceptual framework centered around tourists' perception of sustainability policies at various destinations, along with a multidimensional measure to assess this construct. Through an empirical analysis conducted across five Mediterranean destinations, the proposed conceptual model was validated, offering substantial empirical evidence supporting the viability of perceived sustainability as a valuable factor in segmentation studies.

4.4 Customer segmentation and client profiling

Market and customer segmentation are often used interchangeably, with market segmentation seen as a high-level strategy and customer segmentation providing a more granular view. The RFM model is a valuable tool for combining customer segmentation and targeting in campaigns [34]. Genetic algorithms can enhance the customer division and targeting process, using the LTV model as a fitness function [38]. Various mathematical methods have been explored for customer segmentation, including statistical techniques, neural networks, genetic algorithms, and K-means fuzzy clustering.

Client profiling involves analyzing client characteristics for tailored marketing strategies, contributing to customer
retention and CRM [50]. Segment profiling focuses on understanding specific customer group attributes to guide marketing strategies. Buyer behavior profiles consider social factors such as timing, benefits sought, usage rate, loyalty, and attitude for targeted marketing efforts [51].

The proposed architecture of customer profiling, depicted in Fig. 1, offers a comprehensive approach to understanding customer behavior and preferences.

Fig. 1 Processes of customer segmentation

5 Results and discussion

5.1 Dataset

The data utilized for our research were sourced from the Marketing Campaign dataset¹, a cross-border dataset that encompasses several key demographic attributes, including age, education level, ID, annual income, marital status, and presence of children in the household, as shown in Table 2.

¹ https://www.kaggle.com/datasets/rodsaldanha/arketing-campaign?select=marketing_campaign.csv

The RFM model employed in this study utilized data from the SAS Institute to calculate the recency, frequency, and monetary rankings, enabling the segmentation of customers into distinct groups. The data comprise the attributes shown in Table 3.

5.2 Preprocessing

This study employs three distinct algorithms to perform customer clustering utilizing RFM analysis. Initially, the data undergo preprocessing to eliminate outliers and extract pertinent instances. Outliers are detected using the z-score method, which expresses each value as its distance from the mean in units of standard deviation. The transformed data therefore have a mean of zero and a standard deviation of one, and values deviating significantly from the mean (zero) are identified as outliers.

5.3 Customer profiling approach

Within this investigation, following the completion of data preprocessing, the dataset proceeds to the customer profiling phase. In this phase, the K-means algorithm is


Table 2 Attributes of the first dataset

| Serial no. | Attributes |
| --- | --- |
| 1 | ID |
| 2 | Year_Birth |
| 3 | Education |
| 4 | Marital_Status |
| 5 | Income |
| 6 | Kidhome |
| 7 | Teenhome |
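
As a rough illustration of the z-score screening described in Sect. 5.2 and of how RFM values might be derived from the attributes listed in Tables 2 and 3, consider the sketch below. The column names come from the tables; everything else is an assumption made for illustration and is not stated in the paper: the toy records, the |z| > 3 cut-off, and the choice to take frequency as the sum of the purchase-count columns and monetary value as the sum of the Mnt* spending columns.

```python
from statistics import mean, pstdev

# Column names follow Tables 2 and 3; how they map to F and M is an assumption.
SPEND_COLS = ["MntWines", "MntFruits", "MntMeatProducts",
              "MntFishProducts", "MntSweetProducts", "MntGoldProds"]
PURCHASE_COLS = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]


def z_scores(values):
    """Distance of each value from the mean, in units of standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def drop_outliers(rows, column, threshold=3.0):
    """Keep rows whose |z-score| on `column` does not exceed the threshold."""
    scores = z_scores([row[column] for row in rows])
    return [row for row, z in zip(rows, scores) if abs(z) <= threshold]


def rfm_features(row):
    """Collapse one customer record into recency, frequency, and monetary values."""
    return {
        "ID": row["ID"],
        "recency": row["Recency"],
        "frequency": sum(row.get(col, 0) for col in PURCHASE_COLS),
        "monetary": sum(row.get(col, 0) for col in SPEND_COLS),
    }


# Hypothetical toy records: eleven ordinary customers plus one extreme income.
customers = [
    {"ID": i, "Income": 40000 + 2000 * i, "Recency": 5 * i,
     "MntWines": 100 + 10 * i, "MntMeatProducts": 50,
     "NumWebPurchases": i % 4, "NumStorePurchases": 3}
    for i in range(11)
]
customers.append({"ID": 99, "Income": 600000, "Recency": 20, "MntWines": 10,
                  "MntMeatProducts": 5, "NumWebPurchases": 1,
                  "NumStorePurchases": 1})

kept = drop_outliers(customers, "Income")          # the extreme income is removed
rfm_table = [rfm_features(row) for row in kept]    # one RFM record per customer
```

With very small samples a cut-off of 3 can never be exceeded, which is why the toy data contain twelve records; on a real dataset the threshold would need to be chosen with the data size and distribution in mind.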

Table 3 Attributes of the second dataset

| Serial no. | Attributes |
| --- | --- |
| 1 | Dt_Customer |
| 2 | Recency |
| 3 | MntWines |
| 4 | MntFruits |
| 5 | MntMeatProducts |
| 6 | MntMeatProducts |
| 7 | MntFishProducts |
| 8 | MntSweetProducts |
| 9 | MntGoldProds |
| 10 | NumDealsPurchases |
| 11 | NumWebPurchases |
| 12 | NumCatalogPurchases |
| 13 | NumStorePurchases |
| 14 | NumWebVisitsMonth |
| 15 | AcceptedCmp1 |
| 16 | AcceptedCmp2 |
| 17 | AcceptedCmp3 |
| 18 | AcceptedCmp4 |
| 19 | AcceptedCmp5 |
| 20 | Complain |
| 21 | Z_CostContact |
| 22 | Z_Revenue |
| 23 | Response |

Fig. 2 Processes of RFM analysis

applied to the dataset for clustering purposes; the K-means process on the RFM analysis is shown in Fig. 2. Subsequently, the outcomes of the K-means clustering are subjected to validation procedures aimed at determining the optimal cluster value (K). This validation is executed using three distinct metrics: the Elbow method, the Silhouette coefficient, and the Gap Statistics method. The graphical representation of these validation results is depicted in Fig. 3 for the Elbow method, Fig. 4 for the Silhouette coefficient, and Fig. 5 for the Gap Statistics method. Importantly, the Matthews correlation coefficient (MCC) scorer, known for its ability to accommodate classes of varying sizes, is employed to assess the effectiveness of the chosen methodology.

The outcome of the Elbow method analysis illustrates a decline in the value of cluster inertia, also known as the sum of squared errors (SSE), as the number of clusters increases. From the graphical representation, it becomes apparent that potential candidates for the optimal K value reside within the range of K = 2 and K = 8. This observation is guided by the appearance of a discernible "elbow" shape in the graph, where the decrease in SSE starts to plateau. Nonetheless, the validation process remains essential and will involve the assessment of the other two metrics for confirming the optimal cluster configuration.

The second validation involves employing the Silhouette coefficient method, which yields the Silhouette scores for each cluster as presented in Table 4. This method allows for a comparative assessment with DP Agustino [52].

The Matthews correlation coefficient (MCC) for the test data was computed as 0.88, suggesting potential challenges in accurately categorizing positive examples within the test set.

Based on the outcomes of the clustering analysis, the findings reveal the existence of three distinct clusters that serve as foundational categories for digital start-ups in executing customer profiling strategies. The initial category (designated as Cluster A) pertains to new customers,
denoting those who have initiated their first purchase of products within the digital start-up's domain. In the context of this research, it is imperative to augment engagement with these new customer segments by aligning strategies with their specific needs, thereby enhancing the prospect of subsequent product repurchases. Particularly, for platform-based Edutech start-ups, augmenting the platform with enriched educational content and providing tailored teacher support emerge as pivotal strategies to amplify customer relevance and sustained interest.

The second category (identified as Cluster B) encompasses the best customers. This category comprises individuals who have engaged in multiple purchases, particularly emphasizing recent transactions. Customers within this cluster exhibit a pronounced potential to be offered each new product launch by the digital start-up. Furthermore, to foster more robust customer loyalty, the provision of exclusive discounts to customers within this category presents an effective approach.

Cluster C, the third category, is composed of intermittent customers. These customers display a sporadic pattern of engagement, characterized by occasional purchases and fluctuations in their interaction frequency. For digital start-ups, devising targeted marketing efforts that encourage consistent engagement from these intermittent customers can enhance their loyalty and transform them into more regular purchasers. Tailored promotions and personalized offers are instrumental in motivating these customers to establish a more enduring connection with the digital start-up's offerings.

Fig. 3 Elbow method

6 Conclusion

In conclusion, this research delved into the critical domain of customer profiling within the context of digital start-ups. Through a comprehensive analysis of clustering algorithms and validation methodologies, we successfully identified distinct customer clusters that offer invaluable insights for tailored business strategies. The utilization of K-means clustering, coupled with validation metrics such as the Elbow method, Silhouette coefficient, and Gap Statistics method, provided a robust foundation for customer segmentation.

The results revealed three primary clusters that serve as significant touchpoints for digital start-ups to refine their customer engagement tactics. Cluster A, representing new

Fig. 4 Silhouette method

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


Neural Computing and Applications (2024) 36:4995–5005 5003

Fig. 5 Gap Statistics method

customers, necessitates tailored approaches to enhance their initial experience and foster
repurchase potential. In the realm of platform-based Edutech start-ups, offering enriched
learning content and personalized support emerges as a potent strategy. Cluster B, housing the
best customers, signifies a vital avenue for product promotion and customer loyalty enhancement.
Customized incentives and exclusive offerings can solidify their engagement and elevate their
lifetime value. Cluster C, comprising intermittent customers, highlights an opportunity to
re-engage and cultivate consistency. Strategic interventions, such as targeted promotions and
individualized incentives, can transform intermittent customers into steady patrons.

In the broader landscape of digital start-ups, the outcomes underscore the paramount importance
of customer profiling in enhancing business outcomes. By acknowledging the nuanced requirements
of different customer clusters, start-ups can forge more meaningful and enduring connections,
thereby fostering growth, customer satisfaction, and long-term success. As the digital landscape
continues to evolve, these insights hold the potential to guide start-ups toward informed
decisions that resonate with their customer base, fostering a symbiotic relationship between
innovation and consumer needs.

7 Future work

In future work, more advanced methods for predicting customer churn may be explored, such as
weighted random forests and hybrid models that can handle unstructured data. This would enable
the extraction of relevant attributes for potential customer segmentation studies in the retail
industry. As highlighted in the literature review, using hybrid models has shown promising
performance gains and could be a strategy to improve the models.

Artificial intelligence has the potential to revolutionize various industries by transforming
existing business processes and creating new business models. Key areas of focus include
consumer engagement, digital manufacturing, smart cities, autonomous vehicles, risk management,
computer vision, and speech recognition. AI has already demonstrated positive results in a range
of sectors including health care, law enforcement, finance, security, trade, manufacturing,
education, mining, and logistics.

Funding Open access funding provided by The Science, Technology & Innovation Funding Authority
(STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Table 4 Silhouette scores for different K values

K value    Silhouette score for Agustino [52]    Silhouette score for our model
2          0.87                                  0.88
3          0.52                                  0.55
4          0.22                                  0.22
5          0.56                                  0.55
6          0.62                                  0.64
7          0.70                                  0.71
8          0.70                                  0.69
9          0.71                                  0.72
10         0.65                                  0.66
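
As a rough illustration of how a silhouette comparison like the one in Table 4 could be produced, the sketch below runs a plain K-means for K = 2 to 6 on synthetic, RFM-like data and reports the mean silhouette coefficient for each K. It is not the authors' pipeline: the toy data, the farthest-point initialisation, and the helper names (toy_rfm, farthest_point_init, kmeans, silhouette_score) are assumptions made for this example, and the paper does not state which implementation was used.

```python
import math
import random


def toy_rfm(seed=1):
    """Synthetic (recency, frequency, monetary) rows on a 0-1 scale; NOT the paper's data."""
    rng = random.Random(seed)
    centers = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.2, 0.9, 0.3)]
    return [tuple(c + rng.uniform(-0.05, 0.05) for c in center)
            for center in centers for _ in range(20)]


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): repeatedly pick the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def silhouette_score(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


points = toy_rfm()
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"K={k}: silhouette={s:.2f}")
print("highest silhouette at K =", best_k)
```

On this toy data the score peaks at K = 3 only because the data were generated with three groups. On real RFM data the silhouette score is usually weighed together with the Elbow and Gap Statistics curves rather than used alone.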


Data availability The data that support the findings of this study are available from the corresponding author upon reasonable request.

Declarations

Conflict of interest The authors declare no conflicts of interest related to this work.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Alsayat A (2023) Customer decision-making analysis based on big social data using machine learning: a case study of hotels in mecca. Neural Comput Appl 35:4701–4722
2. Kalkan IE, Şahin C (2023) Evaluating cross-selling opportunities with recurrent neural networks on retail marketing. Neural Comput Appl 35(8):6247–6263
3. Das S, Nayak J (2021) Customer segmentation via data mining techniques: state-of-the-art review. Comput Intell Data Min: Proc ICCIDM 2022:489–507
4. Chorowski J, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition, arXiv preprint arXiv:1506.07503
5. Abdallah A, Berendeyev A, Nuradin I, Nurseitov D (2022) Tncr: table net detection and classification dataset. Neurocomputing 473:79–97
6. Prasad D, Gadpal A, Kapadni K, Visave M, Sultanpure K (2020) Cascadetabnet: an approach for end to end table detection and structure recognition from image-based documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 572–573
7. Kasem M, Abdallah A, Berendeyev A, Elkady E, Abdalla M, Mahmoud M, Hamada M, Nurseitov D, Taj-Eddin I (2022) Deep learning for table detection and structure recognition: a survey, arXiv preprint arXiv:2211.08469
8. Abdimanap G, Bostanbekov K, Abdallah A, Alimova A, Kurmangaliyev D, Nurseitov D (2022) Enhancing core image classification using generative adversarial networks (gans), arXiv e-prints arXiv–2204
9. Fakoor R, Ladhak F, Nazi A, Huber M (2013) Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the international conference on machine learning, volume 28, ACM, New York, USA, pp 3937–3949
10. Nie L, Wang M, Zhang L, Yan S, Zhang B, Chua T-S (2015) Disease inference from health-related questions via sparse deep learning. IEEE Trans Knowl Data Eng 27:2107–2119
11. Abdallah A, Kasem M, Hamada MA, Sdeek S (2020) Automated question-answer medical model based on deep learning technology. In: Proceedings of the 6th International Conference on Engineering & MIS 2020, pp 1–8
12. Yu L, Hermann KM, Blunsom P, Pulman S (2014) Deep learning for answer sentence selection, arXiv preprint arXiv:1412.1632
13. Logothetis NK, Sheinberg DL (1996) Visual object recognition. Annu Rev Neurosci 19:577–621
14. Nurseitov D, Bostanbekov K, Abdimanap G, Abdallah A, Alimova A, Kurmangaliyev D (2022) Application of machine learning methods to detect and classify core images using gan and texture recognition, arXiv preprint arXiv:2204.14224
15. Mahmoud M, Kang H-S (2023) Ganmasker: a two-stage generative adversarial network for high-quality face mask removal. Sensors 23:7094
16. Mahmoud SA, Ahmad I, Al-Khatib WG, Alshayeb M, Parvez MT, Märgner V, Fink GA (2014) Khatt: an open arabic offline handwritten text database. Pattern Recogn 47:1096–1112
17. Nurseitov D, Bostanbekov K, Kurmankhojayev D, Alimova A, Abdallah A, Tolegenov R (2021) Handwritten Kazakh and Russian (hkr) database for text recognition. Multimed Tools Appl 80:33075–33097
18. Toiganbayeva N, Kasem M, Abdimanap G, Bostanbekov K, Abdallah A, Alimova A, Nurseitov D (2022) Kohtd: Kazakh offline handwritten text dataset. Sig Process Image Commun 108:116827
19. Abdallah A, Hamada M, Nurseitov D (2020) Attention-based fully gated cnn-bgru for Russian handwritten text. J Imag 6:141
20. Daniyar Nurseitov GA, Kairat B, Maksat K, Anel A, Abdelrahman A (2020) Classification of handwritten names of cities and handwritten text recognition using various deep learning models. Adv Sci Technol Eng Syst J 5:934–943
21. Karpukhin V, Oğuz B, Min S, Lewis P, Wu L, Edunov S, Chen D, Yih W-t (2020) Dense passage retrieval for open-domain question answering, arXiv preprint arXiv:2004.04906
22. Chen D, Yih W-t (2020) Open-domain question answering. In: Proceedings of the 58th annual meeting of the association for computational linguistics: tutorial abstracts, pp 34–37
23. Abdallah A, Jatowt A (2023) Generator-retriever-generator: A novel approach to open-domain question answering, arXiv preprint arXiv:2307.11278
24. Abdallah A, Abdalla M, Elkasaby M, Elbendary Y, Jatowt A (2023a) Amurd: annotated multilingual receipts dataset for cross-lingual key information extraction and classification, arXiv preprint arXiv:2309.09800
25. Abdallah A, Piryani B, Jatowt A (2023) Exploring the state of the art in legal qa systems, arXiv preprint arXiv:2304.06623
26. Mahmoud M, Kasem M, Abdallah A, Kang HS (2022) Ae-lstm: autoencoder with lstm-based intrusion detection in iot. In: 2022 International Telecommunications Conference (ITC-Egypt), IEEE, pp 1–6
27. Xu W, Jang-Jaccard J, Singh A, Wei Y, Sabrina F (2021) Improving performance of autoencoder-based network anomaly detection on nsl-kdd dataset. IEEE Access 9:140136–140146
28. Akkad A, Wills G, Rezazadeh A (2023) An information security model for an iot-enabled smart grid in the saudi energy sector. Comput Electr Eng 105:108491
29. Waschneck B, Reichstaller A, Belzner L, Altenmüller T, Bauernhansl T, Knapp A, Kyek A (2018) Optimization of global production scheduling with deep reinforcement learning. Proc Cirp 72:1264–1269
30. Hamada MA, Abdallah A, Kasem M, Abokhalil M (2021) Neural network estimation model to optimize timing and schedule of software projects. In: 2021 IEEE International Conference on Smart Information Systems and Technologies (SIST), IEEE, pp 1–7


31. Müller H, Hamm U (2014) Stability of market segmentation with cluster analysis-a methodological approach. Food Qual Prefer 34:70–78
32. Jiang T, Tuzhilin A (2008) Improving personalization solutions through optimal segmentation of customer bases. IEEE Trans Knowl Data Eng 21:305–320
33. Kashwan KR, Velu C (2013) Customer segmentation using clustering and data mining techniques. Int J Comput Theory Eng 5:856
34. Brito PQ, Soares C, Almeida S, Monte A, Byvoet M (2015) Customer segmentation in a large database of an online customized fashion business. Robot Comput-Integr Manuf 36:93–100
35. He X, Li C (2016) The research and application of customer segmentation on e-commerce websites. In: 2016 6th International Conference on Digital Home (ICDH), IEEE, pp 203–208
36. Sheshasaayee A, Logeshwari L (2017) An efficiency analysis on the tpa clustering methods for intelligent customer segmentation. In: 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), IEEE, pp 784–788
37. Ballestar MT, Grau-Carles P, Sainz J (2018) Customer segmentation in e-commerce: applications to the cashback business model. J Bus Res 88:407–414
38. Qadadeh W, Abdallah S (2018) Customers segmentation in the insurance company (tic) dataset. Proc Comput Sci 144:277–290
39. Lu Z, Peiyi W, Ping C, Xianglong L, Baoqun Z, Longfei M (2019) Customer segmentation algorithm based on data mining for electric vehicles. In: 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), IEEE, pp 77–83
40. Christy AJ, Umamakeswari A, Priyatharsini L, Neyaa A (2021) Rfm ranking-an effective approach to customer segmentation. J King Saud Univ-Comput Inform Sci 33:1251–1257
41. Pranata I, Skinner G (2015) Segmenting and targeting customers through clusters selection & analysis. In: 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), IEEE, pp 303–308
42. Dutta S, Bhattacharya S, Guin KK (2015) Data mining in market segmentation: a literature review and suggestions. In: Proceedings of Fourth International Conference on Soft Computing for Problem Solving: SocProS 2014, Volume 1, Springer, pp 87–98
43. Tsao Y-C, Raj PVRP, Yu V (2019) Product substitution in different weights and brands considering customer segmentation and panic buying behavior. Ind Mark Manage 77:209–220
44. Liu Y, Kiang M, Brusco M (2012) A unified framework for market segmentation and its applications. Expert Syst Appl 39:10292–10302
45. Kim S-Y, Jung T-S, Suh E-H, Hwang H-S (2006) Customer segmentation and strategy development based on customer lifetime value: a case study. Expert Syst Appl 31:101–107
46. Weinstein A (2013) Handbook of market segmentation: Strategic targeting for business and technology firms, Routledge
47. Hosseini M, Shabani M (2015) New approach to customer segmentation based on changes in customer value. J Market Anal 3:110–121
48. Swenson ER, Bastian ND, Nembhard HB (2018) Healthcare market segmentation and data mining: a systematic review. Health Mark Q 35:186–208
49. Sánchez-Fernández R, Iniesta-Bonillo MÁ, Cervera-Taulet A (2019) Exploring the concept of perceived sustainability at tourist destinations: a market segmentation approach. J Travel Tour Market 36:176–190
50. Romdhane LB, Fadhel N, Ayeb B (2010) An efficient approach for building customer profiles from business data. Expert Syst Appl 37:1573–1585
51. Tong L, Wang Y, Wen F, Li X (2017) The research of customer loyalty improvement in telecom industry based on nps data mining. China Communications 14:260–268
52. Agustino DP, Harsemadi IG, Budaya IGBA (2022) Edutech digital start-up customer profiling based on rfm data model using k-means clustering. J Inform Syst Inform 4:724–736

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
