
Customer Segmentation Using Machine Learning Model

The document discusses using machine learning and RFM analysis techniques to predict customer churn. RFM analysis involves quantifying customers based on recency, frequency, and monetary value of purchases. The study aims to address limitations of RFM by incorporating additional variables and predictive modeling. K-means and DBSCAN clustering are applied to segment customers into distinct groups for targeted marketing.


Received: 29 June 2023 | Revised: 21 August 2023 | Accepted: 6 September 2023 | Published online: 8 September 2023

RESEARCH ARTICLE

Customer Segmentation Using Machine Learning Model: An Application of RFM Analysis

Journal of Data Science and Intelligent Systems
yyyy, Vol. XX(XX) 1–5
DOI: 10.47852/bonviewJDSIS32021293

Israa Lewaa1,*
1 Department of Business Administration, The British University in Egypt, Egypt
*Corresponding author: Israa Lewaa, Department of Business Administration, The British University in Egypt, Egypt. E-mail: [email protected]

Abstract: Machine learning encompasses a diverse array of both supervised and unsupervised techniques that facilitate prediction, classification, and anomaly detection. Among the many fields of application for such techniques, customer churn prediction is a prominent one. In order to forecast customer switching, data scientists employ a variety of demographic, social, transactional, and behavioral variables and attributes. Unfortunately, many businesses in the United Kingdom still lack the comprehensive and adaptable consumer data required to perform accurate analyses. As a result, they often rely heavily on data produced by enterprise resource planning (ERP) systems, which is primarily transactional in nature. Consequently, businesses are often limited to modeling and forecasting on transactional data alone, usually without advanced techniques such as RFM analysis and machine learning, and are unlikely to invest significantly in marketing research or other customer-related sources. So, the major objective of the current work is to provide a mix of machine learning and Recency, Frequency and Monetary (RFM) analysis techniques for churn prediction using mostly transactional data. The dataset was taken from a dataset search website that hosts online retail datasets. Every customer's RFM scores are computed based on the available data. A churn metric is defined that indicates whether or not the customer has made a transaction within a limited time. Through this paper, different techniques are compared. We used K-means and DBSCAN clustering. By the end of this paper, it may be inferred that dividing customers into six distinct clusters is a more practical and straightforward approach.

Keywords: Recurrency problem, statistical approaches, data analysis, machine learning models, artificial intelligence

© The Author(s) 2023. Published by BON VIEW PUBLISHING PTE. LTD. This is an open access article under the CC BY License (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The Recency, Frequency, Monetary (RFM) analysis tool is a relatively straightforward but highly effective means of comprehending and assessing consumer behavior predicated on purchase. The RFM technique functions by quantitatively categorizing and classifying patrons based on the recency, frequency, and monetary total of their most recent transactions, with the end goal of identifying and targeting the most valuable customers for the purposes of performing focused and precision-targeted marketing campaigns (Shihab et al., 2019; Smaili & Hachimi, 2023). Each consumer is assigned numerical scores based on these parameters, thereby rendering the analysis objective and data-driven. RFM analysis is rooted in the well-known marketing axiom that "80% of your business comes from 20% of your customers" (Alsayat, 2023; Bratina & Faganel, 2023; Chakraborty, 2023).

RFM is a strategic approach employed in the analysis and estimation of a customer's worth, predicated upon the evaluation of three crucial data points, namely: Recency, Frequency, and Monetary Value. The Recency metric is indicative of the customer's most recent purchase, while Frequency posits the question of how frequently the customer makes purchases. Lastly, Monetary Value delves into the amount expended by the customer.

RFM analysis is a useful tool that can furnish valuable insights about customers and their behavior. However, it must be noted that this approach does not account for various other important factors that are instrumental in shaping the customer experience. For instance, in-depth targeted marketing strategies may leverage diverse variables such as the type of item purchased or customer campaign responses in order to achieve better outcomes (Bahari, 2015). Moreover, it is important to acknowledge that customer demographics, including but not limited to age, sex and ethnicity, are not taken into consideration by RFM analysis. Therefore, it is imperative for marketers to integrate a more comprehensive and nuanced approach that accounts for a broad range of factors for a more accurate and actionable understanding of customers. There are many ways mentioned in the literature for data integration that may help in this case. For more information about statistical data integration, see (Lewaa et al., 2021; Lewaa et al., 2023).

RFM analysis is a useful tool for gaining valuable insights into customer behavior. However, it is subject to certain limitations. One of the limitations of RFM analysis is its lack of consideration for key factors such as customer demographics or the nature of the purchased items. In light of these limitations, there is a pressing need for a more comprehensive approach that takes into account a broader range of factors. By doing so, a more accurate understanding of customers can be achieved.

Furthermore, it should be noted that the usage of RFM solely depends on the historical data of the customers, thereby implying that it might not be able to accurately forecast the future activities of the customers (Rahim et al., 2021; Seymen, 2020). In contrast, predictive techniques possess the capability to unveil the potential customer behavioral patterns which may remain undetected by the RFM analysis (Maryani et al., 2018; Mohammad et al., 2022). This suggests that despite the utility of RFM in analyzing customer data, its limitations in predicting future customer behavior necessitate the incorporation of more advanced predictive methods.

For the current case study, Recency, frequency and monetary values are easy to calculate and understand, but they cover only one aspect of customer behavior. In order to accomplish high-quality prediction models, data scientists need versatile data about customer needs, opinions, socio and economic characteristics, relationship data, etc. In many cases such data is hard to harvest as small and mid-sized companies don’t implement a systematized approach for collecting it.

Instead of conducting an all-encompassing and exhaustive analysis of the entire customer database, it would be more advisable and beneficial to categorize and divide the customers based on their distinct characteristics such as their age or geographical location, and subsequently segregate them into a customer group. Through the implementation of a meticulous and well-structured marketing campaign that is specifically tailored to each
segmented group, it is possible to fashion a personalized and relevant offer that would be highly appealing to customers that possess a high value to the business.

The process of computing the recency, frequency, and monetary (RFM) scores for practical purposes necessitates specialized analytical expertise or advanced mathematical proficiency. Additionally, like any model, the complexity of RFM models can range from rudimentary to sophisticated. The process of RFM segmentation commences by arranging clients in each of the three categories: recency score, frequency score, and monetary score. Conventionally, this is executed on a scale of 1 to 4. A score of 1 designates the uppermost 25% in each category (i.e., the most recent to transact, the most frequent to transact, and those who made the most purchases), with a 2 denoting the following 25%, and so on. By utilizing an RFM scoring system akin to this, one can fabricate an efficacious marketing strategy by creating customer RFM segments.

Among the different approaches presented in the literature for considering customer classification, we fix the data using Box-Cox transformation, which was not used before in the previous work, to ensure the data is normally distributed. Besides, this study is the first study to apply such customer segmentation in the United Kingdom. Through this paper, different techniques are compared.

The important question raised here, namely which machine learning algorithm is best for customer churn prediction, is debatable. Data scientists must explore and assess as many potential candidates as they can in order to choose the best one. This research proposes and evaluates a method for churn prediction utilizing machine learning algorithms on RFM data. Different input variables have been used to evaluate a number of candidate algorithms.

Customer churn prediction involves utilizing machine learning techniques to determine which customers are most likely to discontinue their business relationship with a company. This is typically accomplished by training a supervised machine learning model on historical data such as customer purchase history, account activity, and customer service interactions. The model is subsequently utilized to generate forecasts based on new data, including whether or not a new customer is expected to churn within a specific time frame. Customer churn is a classification challenge, and the machine learning model can be used to classify whether a customer will churn or not.

2. Literature Review

Jiang and Tuzhilin (2009) have indicated that the enhancement of marketing performances necessitates the implementation of both customer segmentation and buyer targeting. These two interdependent tasks are merged into a systematic approach, albeit faced with the challenge of unified optimization. Consequently, to address this issue, the authors have proposed the application of the K-Classifiers Segmentation algorithm. This particular approach emphasizes the allocation of additional resources to customers who provide higher returns to the organization. A multitude of authors have contributed to the literature on diverse methodologies for segmenting customers. In their scholarly work, Jiang and Tuzhilin (2008) propose an innovative method for clustering customers that diverges from the conventional practice of relying solely on computed statistics. Instead, their approach taps into the transactional data of multiple customers to achieve a more direct clustering outcome. The authors also reveal that the task of identifying an optimal segmentation strategy is, in fact, NP-hard, and thus, necessitates the development of various sub-optimal clustering techniques, which Tuzhilin thoughtfully devised. Subsequently, the authors meticulously scrutinized the customer segments that were generated via the direct grouping method and found that this approach yielded far superior results compared to the traditional statistical approach.

Shah and Singh (2012) have introduced a novel clustering approach that shares similarities with the K-means algorithm and K-medoids algorithms. These algorithms are
recognized as partitional techniques. The newly proposed algorithm, however, does not guarantee an optimal solution in all circumstances. On the other hand, it reduces the cluster error criterion. According to Saurabh's observations, the time required for executing the new approach decreases with the increase in the number of clusters, which is a significant improvement over traditional methods.

Cho and Moon (2013) put forth a proposal for a recommendation system that is customized to the needs of the users. The proposed system employs the technique of weighted frequent pattern mining, which is aimed at identifying the patterns that are most frequently occurring. In order to identify the potential customers, the authors have carried out customer profiling through the RFM model, which is widely used for this purpose. The proposed system utilizes varied weights for each transaction to generate the association rules that can be obtained from the mining process. By using the RFM model, the accuracy of the recommendations provided to the customers can be enhanced, thereby leading to an increase in the profits of the firm.

Lu et al. (2012) conducted an in-depth analysis, centered on customer churn prediction. The authors skillfully employed logistic regression and effectively isolated the transactional data to produce an entirely novel prediction model. Through their experimental implementation, the astute researchers observed that customers with the highest churn value can be identified and retained through the deployment of individualized marketing strategies. On the other hand, Zhang subscribes to the notion that the identification of the root cause for customer churn behavior and the satisfaction of individual needs is an indispensable prerequisite for the sustainable existence of any company.

He and Li (2016) have proposed an intricate and multifaceted three-dimensional methodology aimed at enhancing customer lifetime value (CLV), augmenting customer satisfaction, and positively influencing customer behavior. Through their extensive research and analysis, the authors have astutely observed that consumers are indeed a heterogeneous group, each with their unique set of needs, desires, and aspirations. To cater to this diversity and intricacy of customer preferences, segmentation techniques can be effectively employed to identify and understand their demand and expectations, which in turn facilitates the delivery of superior services to these valuable customers.

Sheshasaayee and Logeshwari (2017) developed a novel and innovative approach that is integrated, aimed at segmentation through the use of the RFM and LTV (Life Time Value) methods. This approach was carried out in two distinct phases, with the first phase being a statistical approach and the second phase entailing the performance of clustering. The primary objective of this approach was to perform K-means clustering after the two-phase model and subsequently employ a neural network to enhance the overall segmentation process.

Zahrotun (2017) utilized customer data obtained through online channels to conduct identification of the most superior customers by means of Customer Relationship Management (CRM). This application of the CRM paradigm in the context of online shopping allowed the author to effectively pinpoint potential customers through segmentation, which in turn contributes to the maximization of company profits. To facilitate the accurate implementation of customer segmentation and marketing strategies, the Fuzzy C-Means Clustering Method was employed. This methodology ultimately affords customers the opportunity to receive specialized amenities across multiple categories, all in accordance with their unique needs and preferences.

3. Methodology

The data for the experiments has been extracted from Online Retail Dataset. For every customer, a set of features and metrics have been calculated, as “Recency” (recency from the ending date), “Frequency” (number of transactions till the ending date), and “Monetary” (total amount of
transactions till the ending date) (Chaudhary et al., 2022; Gustriansyah et al., 2020).

Recency, frequency, and monetary scores have been calculated with its RFM analysis feature. RFM scores are calculated using both nested and independent binning with 4 bins. A modification of RFM analysis could be done by using K-means clustering instead of this classic approach. This approach is compared with the classic one (Li et al., 2022; Shirole et al., 2021; Wu et al., 2021). The application of machine learning algorithms was performed in Python (Khajvand et al., 2011; Joung & Kim, 2023).

Any endeavor aimed at manipulating raw data in order to optimize it for subsequent data processing operations is known as data preprocessing, which is an integral part of data preparation (Shim et al., 2012; Weng, 2017). This process has consistently been recognized as a pivotal initial stage in the data mining process. Recently, data preparation techniques have undergone significant modifications in order to facilitate the training of artificial intelligence and machine learning models, as well as to enable conducting inferences against these models.

The process of data preprocessing involves a series of operations aimed at transforming data into a structured format that can be readily processed in various data science tasks, such as machine learning and data mining, in a more expeditious and efficient manner (Yoseph & Heikkila, 2018; Wu et al., 2021). It is a critical step that is typically implemented at the outset of the machine learning and AI development pipeline to guarantee dependable findings.

Table 1 Online retail dataset description

No  Attribute Name   Description
1   Invoice Number   6-digit unique number for each
2   Stock Code       5-digit unique number for each product
4   Quantity         Quantity of product per transaction
5   Invoice Date     Invoice Date and Time
6   Unit Price       Product price per unit
7   Customer ID      5-digit unique number for each customer
8   Country          Country Name

Based on the data description provided in Table 1, our study focuses on the United Kingdom. So, we select all observations for online retail related to this country. Then, null values were checked as a step of preprocessing. A null value, which is commonly encountered in the context of relational databases, represents a scenario wherein the value contained within a particular column is either unknown or missing. It is important to note that a null value should not be conflated with an empty string, which is characteristic of character or datetime data types, nor should it be mistaken for a zero value, which is typical for numeric data types. Such distinctions are crucial to ensure the accurate interpretation and manipulation of data in a given database. In our data for the United Kingdom, we have 133600 null values out of the total sample size of 495478.

Recency, Frequency and Monetary values will be calculated. Recency is calculated by counting the number of days from each customer's latest invoice date to the maximum invoice date in the dataset. Frequency is then calculated by counting the number of times each customer ID is repeated. Moreover, the total price is calculated in order to obtain Monetary. The total price of each transaction is calculated by multiplying quantity by unit price. Then, the Monetary value for each customer is calculated by summing the total price of that customer's transactions till the end date.

So, the fitness of models to study the behavior of the customer based on various sets of input factors has been
explored through a number of tests using various input variables:

I) Recency, Frequency and Monetary values;
II) R, F, M, RFM scores;
III) R, F, M, RFM scores, Count of Objects per every customer.

The above inputs are created for the two methods in order to compare them and reach the best method for segmenting types of customers.

Each customer is allocated three discrete scores for the recency, frequency, and monetary variables. The scoring is executed on a graduated scale from 1 to 4. The uppermost percentile is given a score of 4, and the remaining customers are awarded scores of 3, 2, and 1 correspondingly. The scores can be interpreted as having the characteristics itemized in Table 2.

Table 2 Scores of RFM

Score  Characteristics
1      Potential
2      Can’t lose them
3      At risk
4      Lost

Ultimately, all of the customers are furnished with scores ranging from 444, 443, to 111. The customers who obtain a score of 111 may be referred to as the potential customers of the entity, as they are anticipated to furnish a greater quantum of proceeds to the entity, and conversely for customers who possess a score of 444. Based on this RFM score, each customer can be sorted into its own segment.

4. Results

Through this section, data visualization is presented. Also, the normality issue is solved using the Box-Cox transformation. In addition, the results after applying a number of machine learning approaches, k-means, and DBSCAN are presented. Also, a comparison between these methods is considered.

4.1. Data visualization

Before conducting the clustering process, an investigation was done for the distribution of Recency, Frequency and Monetary values.

Figure 1 Box plot for Recency values

Figure 2 Box plot for Frequency values

Figure 3 Box plot for Monetary values

It is very clear from Figure 1, Figure 2 and Figure 3 that we have outliers. With the use of the statistical method known as the Box-Cox transformation, our target variable is changed such that the data closely resembles a normal distribution.
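The Box-Cox step can be illustrated with a minimal Python sketch. It assumes SciPy is available and uses synthetic, right-skewed values rather than the paper's data; the variable names are hypothetical. Box-Cox is defined only for strictly positive inputs, so a variable that can be zero (for example, a Recency of zero days) would need a positive shift first, which is an assumption of this sketch and not something the study specifies.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed "Monetary" values (illustrative data, not from the paper).
rng = np.random.default_rng(0)
monetary = rng.lognormal(mean=5.0, sigma=1.0, size=1000)

# Box-Cox needs strictly positive input; with lmbda left unset, SciPy
# estimates the power parameter lambda by maximum likelihood.
transformed, lam = stats.boxcox(monetary)

# Skewness close to zero suggests a more symmetric, normal-like shape.
skew_before = stats.skew(monetary)
skew_after = stats.skew(transformed)
print(f"lambda={lam:.3f}, skew before={skew_before:.2f}, skew after={skew_after:.2f}")
```

Comparing skewness before and after is only one possible check; a normal probability plot or a formal normality test could be used instead.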

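The Recency, Frequency and Monetary calculation described in Section 3, together with the quartile scoring on a 1 to 4 scale, might be sketched in Python with pandas as follows. The column names (`CustomerID`, `InvoiceDate`, `Quantity`, `UnitPrice`) are assumptions modeled on Table 1, and the eight transactions are made up for illustration. The scoring direction follows the convention stated in Section 4.2 (the lowest quarter of Recency scores 1, the highest quarter of Frequency and Monetary scores 1); ranking before binning is an added assumption that keeps `pd.qcut` from failing on tied values.

```python
import pandas as pd

# Tiny illustrative transaction table; column names are assumptions modeled on Table 1.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3, 4, 4],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-03-01", "2011-02-10", "2011-01-20",
        "2011-02-15", "2011-03-10", "2010-12-01", "2010-12-15",
    ]),
    "Quantity": [2, 1, 5, 1, 3, 2, 10, 1],
    "UnitPrice": [10.0, 20.0, 4.0, 50.0, 5.0, 7.5, 1.0, 2.0],
})

tx = tx.dropna(subset=["CustomerID"]).copy()      # drop rows with a missing customer ID
tx["TotalPrice"] = tx["Quantity"] * tx["UnitPrice"]
end_date = tx["InvoiceDate"].max()                # the ending date of the observation window

rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (end_date - d.max()).days),  # days since last purchase
    Frequency=("InvoiceDate", "count"),                            # rows per customer ID
    Monetary=("TotalPrice", "sum"),                                # total amount spent
)

# Quartile scores: low Recency is good (score 1); high Frequency/Monetary is good (score 1).
rfm["R"] = pd.qcut(rfm["Recency"].rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)
rfm["F"] = pd.qcut(rfm["Frequency"].rank(method="first"), 4, labels=[4, 3, 2, 1]).astype(int)
rfm["M"] = pd.qcut(rfm["Monetary"].rank(method="first"), 4, labels=[4, 3, 2, 1]).astype(int)
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)
```

In this toy table, customer 3 bought most recently, most often and for the largest total, so it receives the score 111.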

Figure 4 3D plot of the three values after correcting the outlier problem

From Figure 4, we can notice that a few outliers exist, which may affect the results of clustering. Outliers are observations that deviate significantly from the norm, either in a positive or negative direction. These aberrant data points can exert a disproportionate impact on statistical outcomes, particularly on measures of central tendency such as the mean, thereby potentially leading to erroneous inferences and conclusions. So, we remove these outliers and run our clustering methods without them.

4.2. Approach Q-CUT

R, F, M and RFM scores are computed according to the Q-CUT method after obtaining the Recency, Frequency and Monetary values. Each score for Recency took values from 1 to 4. The first quarter of the values of Recency takes 1 and the fourth quarter takes 4. Therefore, customers whose scores are all 1 (i.e., 111) belong to the highest category: a customer who has made a recent purchase is likely to still have the product in mind, which increases the probability that they will buy or use the product again in the future. In the same manner, Frequency and Monetary took scores from 1 to 4, where the first quarter of the values of Frequency and Monetary takes 4 and the fourth quarter takes 1. The frequency distribution for RFM scoring after applying the Q-CUT method is summarized in Table 3.

Table 3 Frequency distribution for RFM scoring

RFM scoring  Frequency
444          186
111          169
344          105
433          100
211          96
…
441          8
241          7
431          7
314          6
414          2

4.3. Clustering approach using k-means

The primary objective of the k-means clustering, which is a widely used vector quantization approach that emerged in the signal processing domain, is to divide n observations into k clusters, wherein each observation is assigned to the cluster that has the closest mean value
(also known as cluster centroid or cluster center). The outcome of this process is the creation of Voronoi cells, which partition the data space. It is worth noting that the geometric median is the only measure that minimizes Euclidean distances. Nevertheless, k-means clustering minimizes within-cluster variances (squared Euclidean distances) rather than regular Euclidean distances; minimizing the latter is the more difficult Weber problem. Consequently, k-medians and k-medoids can be employed to obtain superior Euclidean solutions (Ahmed et al., 2020).

In the realm of cluster analysis, which is a widely-used technique for grouping objects into similar subsets in an unsupervised manner, the elbow method has emerged as a popular heuristic for determining the optimal number of clusters in a given data set. The essence of this approach involves generating a plot of the explained variation, which is a measure of the total variance accounted for by the clustering algorithm, as a function of the number of clusters considered. Subsequently, the point on the curve where a noticeable change in the slope occurs, known as the elbow of the curve, is selected as the most appropriate number of clusters to employ. It is worth mentioning that this method can be extended to other data-driven models, such as principal component analysis, where one aims to capture the maximum amount of variance in a data set using a smaller set of variables, by utilizing the same underlying principle of identifying the elbow point on the relevant curve.

The initial stage of the clustering algorithm is to select a set of k random points as the initial centroids, based on a predetermined value of k. Then, for every data point in the dataset, the Euclidean distance between the data point and each of the chosen centroids is computed. As a third step, the computed distances are compared to determine which centroid has the shortest Euclidean distance, after which the data point in question is assigned to the corresponding centroid. Finally, the preceding steps are repeated, with the ultimate goal being to obtain an optimal clustering solution. The process is halted once the clusters obtained are identical to those obtained in the previous iteration (Sinaga & Yang, 2020).

The input for this analysis is the customer dataset containing n instances and k, the number of clusters. The output is the customer data partitioned into k clusters.

Figure 5 The elbow method for the second method

Based on Figure 5, the best number of clusters is three. The resulting customer segmentation is shown in Figure 6, Figure 7 and Figure 8.

Figure 6 Boxplot within clusters of customers among Recency values
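The k-means steps described in this section (random initial centroids, Euclidean distance computation, nearest-centroid assignment, repetition until the clusters stop changing) can be sketched in plain NumPy. This is an illustrative implementation under stated assumptions, not the code used in the study: the function name `kmeans`, the synthetic two-dimensional blobs, and the handling of empty clusters are all hypothetical choices. The centroid update to the cluster mean is the standard k-means step and is made explicit here.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Steps 2-3: Euclidean distance to every centroid, then nearest-centroid assignment.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once the assignment is identical to the previous iteration.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):      # keep the old centroid if a cluster is empty
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Synthetic stand-in for scaled customer features (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, k=3)
print(np.bincount(labels, minlength=3))
```

Because the initial centroids are random, different seeds can end in different local optima, which is why library implementations usually run several initializations and keep the best one.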

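The elbow heuristic discussed in Section 4.3 might be reproduced along the following lines with scikit-learn. The data are three synthetic blobs standing in for scaled R, F and M features, so the location of the elbow here is a property of this toy data, not a reproduction of Figure 5. The sketch uses the within-cluster sum of squares (`inertia_`), which falls as the explained variation rises.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic blobs standing in for scaled R, F, M features (illustrative data only).
rng = np.random.default_rng(42)
centers = ([0, 0, 0], [6, 6, 0], [0, 6, 6])
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 3)) for c in centers])

ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)   # within-cluster sum of squared distances

# The "elbow" is the k after which further clusters reduce inertia only slightly.
for k, w in zip(ks, inertias):
    print(k, round(w, 1))
```

Plotting `inertias` against `ks` (for example with matplotlib) gives the usual elbow curve; reading the elbow off the plot remains a judgment call.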

Figure 7 Boxplot within clusters of customers among Frequency values

Figure 8 Boxplot within clusters of customers among Monetary values

4.4. Clustering approach using DBSCAN

The DBSCAN clustering algorithm starts by randomly selecting a data point within the dataset and repeats this process until all points have been visited. If at least 'minPoint' data points exist within a specified radius 'ε' of the chosen point, then all such data points are deemed to belong to the same cluster. Finally, the clusters are expanded iteratively by repeating the neighborhood computation for each neighboring point.

Display 1 DBSCAN clustering

Figure 9 2-D clustering using DBSCAN

Figure 10 3-D clustering using DBSCAN
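The DBSCAN procedure described in this section can be tried out with scikit-learn, where `eps` plays the role of the radius ε and `min_samples` the role of minPoint. The sketch below is an illustration on synthetic two-dimensional data, not the study's configuration: the parameter values, the two dense groups and the two isolated points are all assumptions chosen to make the behavior visible.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense synthetic groups plus two isolated points (illustrative data only).
rng = np.random.default_rng(7)
dense_a = rng.normal(loc=[0, 0], scale=0.2, size=(100, 2))
dense_b = rng.normal(loc=[5, 5], scale=0.2, size=(100, 2))
isolated = np.array([[2.5, 2.5], [10.0, -3.0]])
X = np.vstack([dense_a, dense_b, isolated])

# eps is the neighborhood radius and min_samples the minimum number of points
# (the point itself included) needed for a point to seed or extend a cluster.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# scikit-learn marks points that belong to no cluster with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

Unlike k-means, the number of clusters is not supplied in advance, and isolated points are labeled as noise. Since `eps` is a distance, real RFM features would normally be brought to a comparable scale before clustering.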

Figure 11 Clustering using DBSCAN among frequency

Through Figure 11, it may be inferred that dividing customers into six distinct clusters is a more practical and straightforward approach.

5. Conclusion

The major objective of the current work is to provide a mix of machine learning and Recency, Frequency and Monetary (RFM) analysis techniques for churn prediction using mostly transactional data. The dataset was taken from the online retail dataset. Every customer's RFM scores are computed based on the available data. A churn metric is defined that indicates whether or not the customer has made a transaction within a limited time. Among the approaches presented in the literature on customer segmentation, this is the first study to deal with outliers using the Box-Cox transformation. Besides, this study is the first study to apply such customer segmentation in the United Kingdom. Moreover, we make a transformation to resemble a normal distribution and this step improves the results of the study.

Through this paper, different techniques are compared. We used K-means and DBSCAN clustering. By the end of this paper, it may be inferred that dividing customers into six distinct clusters is a more practical and straightforward approach. This is further substantiated by the evident separation of the groups as depicted in the various clustering method plots. Consequently, it is now incumbent upon Marketing Managers and Customer Insight teams to deliberate on the most effective mode of communication or promotional strategy to be utilized, with the intention of converting individuals from one segment to another or potentially directing more customers toward a new segment, ideally positioned at the top right corner of the plots.

In future work, data scientists can improve the prediction process and produce a more accurate estimate of customer turnover by including more pertinent input variables based on the subject area. This would, in turn, give customer relationship strategists a competitive advantage in keeping their profitable clients and reducing unwelcome turnover. Also, future studies should consider other classification methods like Decision Trees, SVM, Neural Networks, and Logistic Regression.

Conflicts of Interest

The author declares that she has no conflicts of interest to this work.

References

Ahmed, M., Seraj, R., & Islam, S. M. S. (2020). The k-means algorithm: A comprehensive survey and performance evaluation. Electronics, 9(8), 1295.

Alsayat, A. (2023). Customer decision-making analysis based on big social data using machine learning: a case study of hotels in Mecca. Neural Computing and Applications, 35(6), 4701-4722.

Bahari, T. F., & Elayidom, M. S. (2015). An efficient CRM-data mining framework for the prediction of
customer behaviour. Procedia computer science, 46, 725-731.

Bratina, D., & Faganel, A. (2023). Using Supervised Machine Learning Methods for RFM Segmentation: A Casino Direct Marketing Communication Case. Market-Tržište, 35(1), 7-22.

Chakraborty, A., Mitra, S., Bhattacharjee, M., De, D., & Pal, A. J. (2023). Determining human-coronavirus protein-protein interaction using machine intelligence. Medicine in Novel Technology and Devices, 18, 100228.

Chaudhary, P., Kalra, V., & Sharma, S. (2022, April). A hybrid machine learning approach for customer segmentation using rfm analysis. In International Conference on Artificial Intelligence and Sustainable Engineering: Select Proceedings of AISE 2020 (pp. 87-100).

Cho, Y. S., & Moon, S. C. (2013). Weighted mining frequent pattern based customer’s RFM score for personalized u-commerce recommendation system. JoC, 4(4), 36-40.

Gustriansyah, R., Suhandi, N., & Antony, F. (2020). Clustering optimization in RFM analysis based on k-means. Indonesian Journal of Electrical Engineering and Computer Science, 18(1), 470-477.

Li, X. Q., Song, L. K., & Bai, G. C. (2022). Deep learning regression-based stratified probabilistic combined cycle fatigue damage evaluation for turbine bladed disks. International Journal of Fatigue, 159, 106812.

Lu, N., Lin, H., Lu, J., & Zhang, G. (2012). A customer churn prediction model in telecom industry using boosting. IEEE Transactions on Industrial Informatics, 10(2), 1659-1665.

Maryani, I., Riana, D., Astuti, R. D., Ishaq, A., & Pratama, E. A. (2018, October). Customer segmentation based on RFM model and clustering techniques with K-means algorithm. In 2018 IEEE Third International Conference on Informatics and Computing (ICIC) (pp. 1-6).

Mohammad, J., & Kashem, M. A. (2022, April). Air Pollution Comparison RFM Model Using Machine Learning Approach. In 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (pp. 1-5).

Rahim, M. A., Mushafiq, M., Khan, S., & Arain, Z. A. (2021). RFM-based repurchase behavior for customer classification and segmentation. Journal of Retailing and Consumer Services, 61, 102566.

Seymen, O. F., Dogan, O., & Hiziroglu, A. (2020, December). Customer churn prediction using deep learning.
In International Conference on Soft Computing and Pattern
He, X., & Li, C. (2016, December). The research and Recognition (pp. 520-529).
application of customer segmentation on e-commerce
websites. In 2016 IEEE 6th International Conference on Shah, S., & Singh, M. (2012, May). Comparison of a time
Digital Home (ICDH) (pp. 203-208). efficient modified K-mean algorithm with K-mean and K-
medoid algorithm. In 2012 IEEE international conference
Jiang, T., & Tuzhilin, A. (2008). Improving personalization on communication systems and network technologies (pp.
solutions through optimal segmentation of customer 435-437).
bases. IEEE transactions on knowledge and data
engineering, 21(3), 305-320. Sheshasaayee, A., & Logeshwari, L. (2017, February). An
efficiency analysis on the TPA clustering methods for
Joung, J., & Kim, H. (2023). Interpretable machine intelligent customer segmentation. In 2017 IEEE
learning-based approach for customer segmentation for International Conference on Innovative Mechanisms for
new product development from online product Industry Applications (ICIMIA) (pp. 784-788).
reviews. International Journal of Information
Management, 70, 102641. Shihab, S. H., Afroge, S., & Mishu, S. Z. (2019, February).
RFM based market segmentation approach using advanced
Khajvand, M., Zolfaghar, K., Ashoori, S., & Alizadeh, S. k-means and agglomerative clustering: a comparative study.
(2011). Estimating customer lifetime value based on RFM In 2019 IEEE International Conference on Electrical,
analysis of customer purchase behavior: Case Computer and Communication Engineering (ECCE) (pp. 1-
study. Procedia computer science, 3, 57-63. 4).

Lewaa, I., Hafez, M. S., & Ismail, M. A. (2021). Data Shim, B., Choi, K., & Suh, Y. (2012). CRM strategies for a
integration using statistical matching techniques: A small-sized online shopping mall based on association rules
review. Statistical Journal of the IAOS, 37(4), 1391-1410. and sequential patterns. Expert Systems with
Applications, 39(9), 7736-7742.
Lewaa, I., Hafez, M. S., & Ismail, M. A. (2023). Mixed
Shirole, R., Salokhe, L., & Jadhav, S. (2021). Customer
Statistical Matching Approaches Using a Latent Class
segmentation using rfm model and k-means clustering. Int.
Model: Simulation Studies. Journal of statistics
J. Sci. Res. Sci. Technol, 8, 591-597.
Applications and probability.
Sinaga, K. P., & Yang, M. S. (2020). Unsupervised K-
means clustering algorithm. IEEE access, 8, 80716-80727.

11
Journal of Data Science and Intelligent Systems Vol. XX Iss. XX yyyy

______________________________________________________________________________

Smaili, M. Y., & Hachimi, H. (2023). New RFM-D

classification model for improving customer analysis and
response prediction. Ain Shams Engineering Journal,
102254.

Weng, C. H. (2017). Revenue prediction by mining

frequent itemsets with customer analysis. Engineering
Applications of Artificial Intelligence, 63, 85-97.

Wu, J., Shi, L., Yang, L., XiaxiaNiu, Li, Y.,

XiaodongCui, ... & Zhang, Y. (2021). User value
identification based on improved RFM model and k-
means++ algorithm for complex data analysis. Wireless
Communications and Mobile Computing, 1-8.

Wu, Z., Zang, C., Wu, C. H., Deng, Z., Shao, X., & Liu, W.
(2021). Improving customer value index and consumption
forecasts using a weighted RFM model and machine
learning algorithms. Journal of Global Information
Management (JGIM), 30(3), 1-23.

Yoseph, F., & Heikkila, M. (2018, December). Segmenting

retail customers with an enhanced RFM and a hybrid
regression/clustering method. In 2018 IEEE International
Conference on Machine Learning and Data Engineering
(iCMLDE) (pp. 108-116).

Zahrotun, L. (2017, November). Implementation of data

mining technique for customer relationship management
(CRM) on online shop tokodiapers. com with fuzzy c-
means clustering. In 2017 IEEE 2nd International
conferences on Information Technology, Information
Systems and Electrical Engineering (ICITISEE) (pp. 299-
303).

How to Cite
Lewaa, I. (2023). Customer Segmentation Using Machine
Learning Model: An Application of RFM Analysis. Journal
of Data Science and Intelligent Systems.
https://doi.org/10.47852/bonviewJDSIS32021293
