Customer Segmentation Using Machine Learning Model
RESEARCH ARTICLE
Israa Lewaa1,*
1 Department of Business Administration, The British University in Egypt, Egypt
*Corresponding author: Israa Lewaa, Department of Business Administration, The
British University in Egypt, Egypt. E-mail: [email protected]
Abstract: Machine learning encompasses a diverse array of supervised and unsupervised techniques for prediction, classification, and anomaly detection. Customer churn prediction is one of the most prominent applications of these techniques. To forecast whether customers will switch, data scientists use a variety of demographic, social, transactional, and behavioral variables. Unfortunately, many businesses in the United Kingdom still lack the comprehensive and adaptable consumer data required for accurate analyses. As a result, they rely heavily on data produced by enterprise resource planning (ERP) systems, which are primarily transactional, and they are unlikely to invest significantly in marketing research or other customer-related data sources. Consequently, their modeling and forecasting are often limited to transactional data and rarely use advanced techniques such as RFM analysis and machine learning (ML). The major objective of the current work is therefore to combine machine learning with Recency, Frequency and Monetary (RFM) analysis for churn prediction using mostly transactional data. The dataset, an online retail dataset, was obtained from a dataset search website. RFM scores are computed for every customer from the available data, and a churn metric indicates whether or not the customer has made a transaction within a limited period. Two clustering techniques, K-means and DBSCAN, are compared. The results suggest that dividing customers into six distinct clusters is a more practical and straightforward approach.
Keywords: Recurrency problem, statistical approaches, data analysis, machine learning models, artificial intelligence
© The Author(s) 2023. Published by BON VIEW PUBLISHING PTE. LTD. This is an open access article under the CC BY License (https://creativecommons.org/licenses/by/4.0/).
Journal of Data Science and Intelligent Systems Vol. XX Iss. XX yyyy
______________________________________________________________________________
recent transactions, with the end goal of identifying and targeting the most valuable customers for focused, precisely targeted marketing campaigns (Shihab et al., 2019; Smaili & Hachimi, 2023). Each consumer is assigned numerical scores based on these parameters, which makes the analysis objective and data-driven. RFM analysis is rooted in the well-known marketing axiom that "80% of your business comes from 20% of your customers" (Alsayat, 2023; Bratina & Faganel, 2023; Chakraborty, 2023).

RFM is a strategic approach to analyzing and estimating a customer's worth based on three data points: Recency, Frequency, and Monetary Value. Recency measures how long ago the customer made their most recent purchase, Frequency measures how often the customer makes purchases, and Monetary Value measures how much the customer has spent.

RFM analysis can furnish valuable insights about customers and their behavior. However, it does not account for various other factors that shape the customer experience. For instance, targeted marketing strategies may leverage variables such as the type of item purchased or customers' responses to campaigns in order to achieve better outcomes (Bahari & Elayidom, 2015). Moreover, RFM analysis does not take customer demographics such as age, sex and ethnicity into consideration. Therefore, marketers should adopt a more comprehensive approach that accounts for a broad range of factors, for a more accurate and actionable understanding of customers. The literature describes many data integration methods that may help in this case. For more information about statistical data integration, see (Lewaa et al., 2021; Lewaa

RFM analysis is a useful tool for gaining valuable insights into customer behavior. However, it is subject to certain limitations. One of them is its lack of consideration for key factors such as customer demographics or the nature of the purchased items. In light of these limitations, there is a need for a more comprehensive approach that takes a broader range of factors into account and thereby achieves a more accurate understanding of customers.

Furthermore, RFM relies solely on customers' historical data, which implies that it might not accurately forecast their future activities (Rahim et al., 2021; Seymen et al., 2020). In contrast, predictive techniques can uncover potential customer behavioral patterns that may remain undetected by RFM analysis (Maryani et al., 2018; Mohammad & Kashem, 2022). Thus, despite the utility of RFM in analyzing customer data, its limitations in predicting future customer behavior call for the incorporation of more advanced predictive methods.

For the current case study, Recency, Frequency and Monetary values are easy to calculate and understand, but they cover only one aspect of customer behavior. To build high-quality prediction models, data scientists need versatile data about customer needs, opinions, social and economic characteristics, relationships, etc. In many cases such data are hard to harvest, because small and mid-sized companies do not implement a systematic approach for collecting them.

Instead of conducting an exhaustive analysis of the entire customer database, it is more advisable to categorize customers based on distinct characteristics such as their age or geographical location, and subsequently
segmented group, it is possible to fashion a personalized and relevant offer that would be highly appealing to customers who are of high value to the business.

Computing the recency, frequency, and monetary (RFM) scores for practical purposes does not require specialized analytical expertise or advanced mathematical proficiency. Additionally, like any model, RFM models can range from rudimentary to sophisticated. RFM segmentation begins by ranking clients in each of the three categories: recency score, frequency score, and monetary score. Conventionally, this is done on a scale of 1 to 4. A score of 1 designates the top 25% in each category (i.e., the most recent to transact, the most frequent to transact, and those who spent the most), a score of 2 the following 25%, and so on. With an RFM scoring system like this one, an effective marketing strategy can be built by creating customer segments.

Unlike the approaches to customer classification presented in the literature, this study transforms the data using the Box-Cox transformation, which has not been used in previous work, to bring the data closer to a normal distribution. Moreover, this is the first study to apply such customer segmentation in the United Kingdom. Different techniques are compared throughout this paper.

Which machine learning algorithm is best for customer churn prediction is debatable. Data scientists must explore and assess as many candidate algorithms as they can in order to choose the best one. This research proposes and evaluates a method for churn prediction that applies machine learning algorithms to RFM data. A number of candidate algorithms have been evaluated using different input variables.

supervised machine learning model on historical data such as customer purchase history, account activity, and customer service interactions. The model is then used to generate forecasts for new data, including whether or not a new customer is expected to churn within a specific time frame. Customer churn is a classification problem, and the machine learning model can be used to classify whether a customer will churn or not.

2. Literature Review

Jiang and Tuzhilin (2009) indicated that improving marketing performance requires both customer segmentation and buyer targeting. These two interdependent tasks are combined into a systematic approach, which raises the challenge of unified optimization. To address this issue, the authors proposed the K-Classifiers Segmentation algorithm, which allocates more resources to customers who provide higher returns to the organization.

Many authors have contributed to the literature on methods for segmenting customers. Jiang and Tuzhilin (2008) propose a method for clustering customers that, instead of relying solely on computed statistics, uses the transactional data of multiple customers to cluster them directly. The authors also show that finding an optimal segmentation is NP-hard, which calls for the sub-optimal clustering techniques that they devised. They then examined the customer segments generated by this direct grouping method and found that it yielded far superior results compared to the traditional statistical approach.
recognized as partitional techniques. The newly proposed algorithm, however, does not guarantee an optimal solution in all circumstances, although it does reduce the cluster error criterion. According to Saurabh's observations, the execution time of the new approach decreases as the number of clusters increases, which is a significant improvement over traditional methods.

Cho and Moon (2013) proposed a recommendation system customized to users' needs. The system employs weighted frequent pattern mining to identify the most frequently occurring patterns. To identify potential customers, the authors profiled customers with the widely used RFM model. The system assigns a different weight to each transaction when generating the association rules obtained from the mining process. Using the RFM model improves the accuracy of the recommendations provided to customers, thereby increasing the firm's profits.

Lu et al. (2012) conducted an in-depth analysis of customer churn prediction. The authors employed logistic regression and isolated the transactional data to produce a novel prediction model. In their experiments, they observed that customers with the highest churn value can be identified and retained through individualized marketing strategies. Zhang, on the other hand, holds that identifying the root cause of customer churn behavior and satisfying individual needs is an indispensable prerequisite for the sustainable existence of any company.

He and Li (2016) proposed a three-dimensional methodology aimed at enhancing customer lifetime value (CLV), increasing customer satisfaction, and positively influencing customer behavior. Through their research, the authors observed that consumers are a heterogeneous group, each with a unique set of needs, desires, and aspirations. To cater to this diversity of customer preferences, segmentation techniques can be employed to identify and understand customers' demands and expectations, which in turn facilitates the delivery of superior services to these valuable customers.

Sheshasaayee and Logeshwari (2017) developed an integrated approach to segmentation based on the RFM and LTV (Lifetime Value) methods. The approach consists of two phases: a statistical phase followed by a clustering phase. Its primary objective was to perform K-means clustering after the two-phase model and then employ a neural network to improve the overall segmentation.

Zahrotun (2017) used customer data obtained through online channels to identify the best customers by means of Customer Relationship Management (CRM). Applying the CRM paradigm to online shopping allowed the author to pinpoint potential customers through segmentation, which in turn contributes to maximizing company profits. The Fuzzy C-Means Clustering Method was employed to support accurate customer segmentation and marketing strategies. This methodology ultimately allows customers to receive specialized amenities across multiple categories in accordance with their unique needs and preferences.

3. Methodology

The data for the experiments were extracted from the Online Retail Dataset. For every customer, a set of features and metrics was calculated: "Recency" (number of days between the customer's last transaction and the ending date), "Frequency" (number of transactions up to the ending date), and "Monetary" (total amount of
transactions up to the ending date) (Chaudhary et al., 2022; Gustriansyah et al., 2020).

Recency, frequency, and monetary scores have been

Any operation that manipulates raw data to prepare it for subsequent processing is known as data preprocessing, an integral part of data preparation (Shim et al., 2012; Weng, 2017). Preprocessing has consistently been recognized as a pivotal initial stage in the data mining process. Recently, data preparation techniques have changed significantly to facilitate the training of artificial intelligence and machine learning models and to enable inference with these models.

Data preprocessing involves a series of operations that transform data into a structured format that can be processed more quickly and efficiently in data science tasks such as machine learning and data mining (Yoseph & Heikkila, 2018; Wu et al., 2021). It is a critical step, typically performed at the outset of the machine learning and AI development pipeline, to ensure dependable findings.

Table 1 Online retail dataset description
No | Attribute Name | Description
1 | Invoice Number | 6-digit unique number for each
2 | Stock Code | 5-digit unique number for each product
4 | Quantity | Quantity of product per transaction
5 | Invoice Date | Invoice Date and Time

of our study is for the United Kingdom. We therefore select all online retail observations for this country. Then, null values were checked as a preprocessing step. A null value, commonly encountered in relational databases, indicates that the value in a particular column is unknown or missing. A null value should not be confused with an empty string, which applies to character or datetime data types, nor with a zero value, which applies to numeric data types. These distinctions are crucial for interpreting and manipulating the data in a database correctly. Our data for the United Kingdom contain 133600 null values out of a total sample size of 495478.

Recency, Frequency and Monetary values are then calculated. Recency is the number of days between each customer's most recent invoice date and the latest invoice date in the dataset. Frequency is the number of times the customer ID appears in the data. To calculate Monetary, the total price of each transaction line is first computed by multiplying quantity by unit price; the monetary value of each customer is then the sum of these total prices up to the end date.

The fitness of the models for studying customer behavior based on different sets of input factors has been
explored through a number of tests using various input variables:

I) Recency, Frequency and Monetary values;
II) R, F, M and RFM scores;
III) R, F, M and RFM scores, and the Count of Objects per customer.

These input sets are used with both methods so that the methods can be compared and the best method for segmenting customers can be identified.
1 Potential
3 At risk
4 Lost

4. Results

This section presents data visualizations and addresses the non-normality of the data using the Box-Cox transformation. It also presents the results of applying two machine learning approaches, k-means and DBSCAN, and compares these methods.

4.1. Data visualization

Before the clustering process, the distributions of the Recency, Frequency and Monetary values were investigated.

Figure 1 Box plot for Recency values

Figures 1, 2 and 3 clearly show that the data contain outliers. The Box-Cox transformation, a statistical method, is therefore used to transform each variable so that its distribution more closely resembles a normal distribution.
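The Recency, Frequency and Monetary calculation described in Section 3, together with the quartile scores used for input set II, can be sketched in pandas. This is a minimal sketch under stated assumptions rather than the author's code: the column names `CustomerID`, `InvoiceDate`, `Quantity`, `UnitPrice` and `Country`, and the helper names `rfm_table` and `add_quartile_scores`, are hypothetical, the toy rows are invented for illustration, and `pd.qcut` is assumed to be what the text calls the q-cut method. Ties are broken with `rank(method="first")` so that four quartile bins can always be formed; the paper does not say how ties were handled.

```python
import pandas as pd


def rfm_table(lines: pd.DataFrame, country: str = "United Kingdom") -> pd.DataFrame:
    """Recency, Frequency and Monetary values per customer.

    Assumed (hypothetical) columns: CustomerID, InvoiceDate, Quantity,
    UnitPrice, Country.
    """
    # Keep one country and drop lines with no customer ID (null check).
    df = lines[lines["Country"] == country].dropna(subset=["CustomerID"]).copy()
    # Total price of each transaction line = quantity * unit price.
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
    end_date = df["InvoiceDate"].max()  # latest invoice date in the data
    rfm = df.groupby("CustomerID").agg(
        LastDate=("InvoiceDate", "max"),
        Frequency=("InvoiceDate", "count"),  # times the customer ID appears
        Monetary=("TotalPrice", "sum"),
    )
    # Recency = days from the customer's last invoice to the end date.
    rfm["Recency"] = (end_date - rfm["LastDate"]).dt.days
    return rfm[["Recency", "Frequency", "Monetary"]]


def add_quartile_scores(rfm: pd.DataFrame) -> pd.DataFrame:
    """Quartile (q-cut) scores from 1 to 4, where 1 marks the best quarter."""
    out = rfm.copy()
    # Break ties by order of appearance so qcut can always form 4 bins.
    ranked = out.rank(method="first")
    # Lowest Recency (most recent buyers) gets 1.
    out["R"] = pd.qcut(ranked["Recency"], 4, labels=[1, 2, 3, 4])
    # Highest Frequency / Monetary gets 1.
    out["F"] = pd.qcut(ranked["Frequency"], 4, labels=[4, 3, 2, 1])
    out["M"] = pd.qcut(ranked["Monetary"], 4, labels=[4, 3, 2, 1])
    out["RFM"] = out["R"].astype(str) + out["F"].astype(str) + out["M"].astype(str)
    return out


# Toy transaction lines, for illustration only (not the study's data).
lines = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C3", "C3", "C3", "C4", None, "C5"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01", "2011-12-09", "2011-11-01", "2011-12-05", "2011-12-06",
        "2011-12-07", "2011-06-01", "2011-12-09", "2011-12-09"]),
    "Quantity": [2, 1, 5, 1, 1, 1, 10, 3, 1],
    "UnitPrice": [3.0, 4.0, 1.0, 2.0, 2.0, 2.0, 0.5, 1.0, 1.0],
    "Country": ["United Kingdom"] * 8 + ["France"],
})
scores = add_quartile_scores(rfm_table(lines))
print(scores)
```

The line without a customer ID and the non-UK customer C5 are dropped before scoring. With real data, `scores["RFM"].value_counts()` would give a frequency distribution of RFM scores of the kind summarized in Table 3.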
Figure 4 3D plot of the three values after correcting the outlier problem

Figure 4 shows that a few outliers remain, which may affect the clustering results. Outliers are observations that deviate markedly from the rest of the data, in either a positive or a negative direction. Such points can have a disproportionate impact on statistical results, particularly on measures of central tendency such as the mean, and can therefore lead to erroneous inferences and conclusions. We therefore remove these outliers and run our clustering methods without them.

4.2. Approach Q-CUT

After the Recency, Frequency and Monetary values are obtained, the R, F, M and RFM scores are assigned using the Q-CUT (quantile cut) method. Each Recency score ranges from 1 to 4: the first quarter of the Recency values (the most recent customers) takes 1 and the fourth quarter takes 4. In the same manner, Frequency and Monetary take scores from 1 to 4, but here the first quarter of the values (the lowest) takes 4 and the fourth quarter (the highest) takes 1. Therefore, customers whose three scores are all 1 (i.e., 111) belong to the highest category. The reasoning behind Recency is that a customer who has made a recent purchase is more likely to remember the product and therefore more likely to buy or use it again in the future. The frequency distribution of the RFM scores obtained with the q-cut method is summarized in Table 3.

Table 3 Frequency distribution for RFM scoring
RFM scoring | Frequency (number of customers)
444 | 186
111 | 169
344 | 105
433 | 100
211 | 96
… | …
441 | 8
241 | 7
431 | 7
314 | 6
414 | 2

4.3. Clustering approach using k-means

The primary objective of k-means clustering, a widely used vector quantization method that originated in signal processing, is to partition n observations into k clusters, where each observation is assigned to the cluster that has the closest mean value
(also known as the cluster centroid or cluster center). The outcome of this process is a set of Voronoi cells that partition the data space. Note that k-means minimizes within-cluster variances (squared Euclidean distances) rather than ordinary Euclidean distances: it is the geometric median, not the mean, that minimizes the sum of Euclidean distances, and finding it is the more difficult Weber problem. Consequently, k-medians and k-medoids can be employed to obtain better solutions in terms of ordinary Euclidean distance (Ahmed et al.,

question is assigned to the corresponding centroid using an efficient assignment mechanism. Finally, the preceding steps are repeated until the algorithm converges to a locally optimal clustering: the process stops once the clusters obtained are identical to those obtained in the previous iteration (Sinaga & Yang, 2020).

The input for this analysis is the customer dataset
The DBSCAN clustering procedure starts by randomly selecting an unvisited data point from the dataset, and this is repeated until all points have been visited. If at least 'minPoint' data points lie within a radius 'ε' of the chosen point, all of these points are assigned to the same cluster. The cluster is then expanded iteratively by repeating this neighborhood computation for each neighboring point.
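The k-means procedure of Section 4.3 (assign each point to the nearest centroid, recompute the means, stop when the clusters no longer change) and the DBSCAN procedure described above can be sketched in NumPy. This is an illustrative sketch, not the study's implementation: the function names `kmeans` and `dbscan`, the toy points and the explicit initial centroids are assumptions, and `eps` and `min_pts` play the roles of ε and 'minPoint'. For a dataset of the size used here, a library implementation such as scikit-learn's `KMeans` and `DBSCAN` would be the usual choice.

```python
import numpy as np


def kmeans(X, k, init=None, seed=0, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    if init is None:
        # Random start: k distinct observations become the first centroids.
        rng = np.random.default_rng(seed)
        init = X[rng.choice(len(X), size=k, replace=False)]
    centroids = np.array(init, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Stop once the clusters are identical to the previous iteration.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def dbscan(X, eps, min_pts):
    """Cluster ids 0, 1, ... for clustered points and -1 for noise."""
    n = len(X)
    # Pairwise Euclidean distances (fine for small illustrative inputs).
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Each neighbourhood includes the point itself.
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):  # visiting order is arbitrary
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbours[j]) >= min_pts:
                    queue.extend(neighbours[j])  # expand from a core point
        cluster += 1
    return labels


# Two tight groups and one isolated point (toy data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])
km_labels, _ = kmeans(X, k=3, init=X[[0, 3, 6]])
db_labels = dbscan(X, eps=0.5, min_pts=3)
print(km_labels, db_labels)
```

On these points, k-means with k = 3 gives the isolated point (20, 20) a cluster of its own, whereas DBSCAN labels it -1 (noise). This difference in how outliers are treated is one practical consideration when choosing between the two methods.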
a transformation to resemble a normal distribution and this
step improves the results of the study.
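The Box-Cox step referred to above (Section 4.1) can be sketched with SciPy's `scipy.stats.boxcox`, which returns the transformed values together with the maximum-likelihood estimate of λ when no λ is supplied. Box-Cox is defined only for strictly positive data, whereas Recency can be 0 (a purchase on the last invoice date) and Monetary values may be zero or negative. The shift applied to such columns, the `offset` value and the helper name `boxcox_columns` are assumptions, since the paper does not describe how these values were handled.

```python
import numpy as np
from scipy import stats


def boxcox_columns(data, offset=1.0):
    """Box-Cox transform each column, estimating lambda by maximum likelihood.

    Box-Cox: y = (x**lam - 1) / lam for lam != 0, and y = log(x) for lam == 0.
    It needs x > 0, so a column whose minimum is <= 0 is first shifted to
    start at `offset` (an assumption, not taken from the paper).
    """
    data = np.asarray(data, dtype=float)
    out = np.empty_like(data)
    lambdas = []
    for j in range(data.shape[1]):
        col = data[:, j]
        if col.min() <= 0:
            col = col - col.min() + offset
        out[:, j], lam = stats.boxcox(col)
        lambdas.append(lam)
    return out, lambdas


# Right-skewed toy columns standing in for Recency (contains 0) and Monetary.
u = np.linspace(0.0, 3.0, 50)
raw = np.column_stack([np.round(np.exp(u)) - 1, 10 * np.exp(u)])
transformed, lambdas = boxcox_columns(raw)
print("lambdas:", lambdas)
print("skew before:", stats.skew(raw, axis=0))
print("skew after: ", stats.skew(transformed, axis=0))
```

On the toy data, the skewness of the second column drops from roughly 1 to almost 0, because its estimated λ is close to 0 (a log transform).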
5. Conclusion

The major objective of the current work is to combine machine learning with Recency, Frequency and Monetary (RFM) analysis for churn prediction using mostly transactional data. The data were taken from the Online Retail Dataset. RFM scores are computed for every customer from the available data, and a churn metric indicates whether or not the customer has made a transaction within a limited period. Among the approaches presented in the literature, this is the first study to handle outliers in customer segmentation using the Box-Cox transformation. In addition, it is the first study to apply such customer segmentation in the United Kingdom. Moreover, we make

Conflicts of Interest

The author declares that she has no conflicts of interest related to this work.

References

Ahmed, M., Seraj, R., & Islam, S. M. S. (2020). The k-means algorithm: A comprehensive survey and performance evaluation. Electronics, 9(8), 1295.
Alsayat, A. (2023). Customer decision-making analysis based on big social data using machine learning: A case study of hotels in Mecca. Neural Computing and Applications, 35(6), 4701-4722.
Bahari, T. F., & Elayidom, M. S. (2015). An efficient CRM-data mining framework for the prediction of customer behaviour. Procedia Computer Science, 46, 725-731.
Bratina, D., & Faganel, A. (2023). Using Supervised Machine Learning Methods for RFM Segmentation: A Casino Direct Marketing Communication Case. Market-Tržište, 35(1), 7-22.
Chakraborty, A., Mitra, S., Bhattacharjee, M., De, D., & Pal, A. J. (2023). Determining human-coronavirus protein-protein interaction using machine intelligence. Medicine in Novel Technology and Devices, 18, 100228.
Chaudhary, P., Kalra, V., & Sharma, S. (2022, April). A hybrid machine learning approach for customer segmentation using RFM analysis. In International Conference on Artificial Intelligence and Sustainable Engineering: Select Proceedings of AISE 2020 (pp. 87-100).
Cho, Y. S., & Moon, S. C. (2013). Weighted mining frequent pattern based customer's RFM score for personalized u-commerce recommendation system. JoC, 4(4), 36-40.
Gustriansyah, R., Suhandi, N., & Antony, F. (2020). Clustering optimization in RFM analysis based on k-means. Indonesian Journal of Electrical Engineering and Computer Science, 18(1), 470-477.
He, X., & Li, C. (2016, December). The research and application of customer segmentation on e-commerce websites. In 2016 IEEE 6th International Conference on Digital Home (ICDH) (pp. 203-208).
Jiang, T., & Tuzhilin, A. (2008). Improving personalization solutions through optimal segmentation of customer bases. IEEE Transactions on Knowledge and Data Engineering, 21(3), 305-320.
Joung, J., & Kim, H. (2023). Interpretable machine learning-based approach for customer segmentation for new product development from online product reviews. International Journal of Information Management, 70, 102641.
Khajvand, M., Zolfaghar, K., Ashoori, S., & Alizadeh, S. (2011). Estimating customer lifetime value based on RFM analysis of customer purchase behavior: Case study. Procedia Computer Science, 3, 57-63.
Lewaa, I., Hafez, M. S., & Ismail, M. A. (2021). Data integration using statistical matching techniques: A review. Statistical Journal of the IAOS, 37(4), 1391-1410.
Lewaa, I., Hafez, M. S., & Ismail, M. A. (2023). Mixed Statistical Matching Approaches Using a Latent Class Model: Simulation Studies. Journal of Statistics Applications and Probability.
Li, X. Q., Song, L. K., & Bai, G. C. (2022). Deep learning regression-based stratified probabilistic combined cycle fatigue damage evaluation for turbine bladed disks. International Journal of Fatigue, 159, 106812.
Lu, N., Lin, H., Lu, J., & Zhang, G. (2012). A customer churn prediction model in telecom industry using boosting. IEEE Transactions on Industrial Informatics, 10(2), 1659-1665.
Maryani, I., Riana, D., Astuti, R. D., Ishaq, A., & Pratama, E. A. (2018, October). Customer segmentation based on RFM model and clustering techniques with K-means algorithm. In 2018 IEEE Third International Conference on Informatics and Computing (ICIC) (pp. 1-6).
Mohammad, J., & Kashem, M. A. (2022, April). Air Pollution Comparison RFM Model Using Machine Learning Approach. In 2022 IEEE 7th International Conference for Convergence in Technology (I2CT) (pp. 1-5).
Rahim, M. A., Mushafiq, M., Khan, S., & Arain, Z. A. (2021). RFM-based repurchase behavior for customer classification and segmentation. Journal of Retailing and Consumer Services, 61, 102566.
Seymen, O. F., Dogan, O., & Hiziroglu, A. (2020, December). Customer churn prediction using deep learning. In International Conference on Soft Computing and Pattern Recognition (pp. 520-529).
Shah, S., & Singh, M. (2012, May). Comparison of a time efficient modified K-mean algorithm with K-mean and K-medoid algorithm. In 2012 IEEE International Conference on Communication Systems and Network Technologies (pp. 435-437).
Sheshasaayee, A., & Logeshwari, L. (2017, February). An efficiency analysis on the TPA clustering methods for intelligent customer segmentation. In 2017 IEEE International Conference on Innovative Mechanisms for Industry Applications (ICIMIA) (pp. 784-788).
Shihab, S. H., Afroge, S., & Mishu, S. Z. (2019, February). RFM based market segmentation approach using advanced k-means and agglomerative clustering: A comparative study. In 2019 IEEE International Conference on Electrical, Computer and Communication Engineering (ECCE) (pp. 1-4).
Shim, B., Choi, K., & Suh, Y. (2012). CRM strategies for a small-sized online shopping mall based on association rules and sequential patterns. Expert Systems with Applications, 39(9), 7736-7742.
Shirole, R., Salokhe, L., & Jadhav, S. (2021). Customer segmentation using RFM model and k-means clustering. Int. J. Sci. Res. Sci. Technol., 8, 591-597.
Sinaga, K. P., & Yang, M. S. (2020). Unsupervised K-means clustering algorithm. IEEE Access, 8, 80716-80727.
Wu, Z., Zang, C., Wu, C. H., Deng, Z., Shao, X., & Liu, W. (2021). Improving customer value index and consumption forecasts using a weighted RFM model and machine learning algorithms. Journal of Global Information Management (JGIM), 30(3), 1-23.
How to Cite
Lewaa, I. (2023). Customer Segmentation Using Machine Learning Model: An Application of RFM Analysis. Journal of Data Science and Intelligent Systems. https://doi.org/10.47852/bonviewJDSIS32021293