
Customer Segmentation Based on RFM Model and Clustering Techniques With K-Means Algorithm

This study focuses on customer segmentation for Nine Reload Credit using the RFM model and K-Means clustering algorithm, analyzing 82,648 transactions from 2017. The research resulted in two clusters, with 63 customers in Cluster 1 and 39 in Cluster 2, providing insights for marketing strategies and customer retention. The findings can assist companies in identifying profitable customer segments and enhancing decision-making in product marketing.


Customer Segmentation based on RFM model and Clustering Techniques With K-Means Algorithm


Ina Maryani 1, Dwiza Riana 2, Rachmawati Darma Astuti 3, Ahmad Ishaq 4, Sutrisno 5, Eva Argarini Pratama 6
1,2,3 STMIK Nusa Mandiri Jakarta; 4,5,6 Universitas Bina Sarana Informatika

Abstract- Every day, customers carry out transactions, and this process generates a large amount of data: 82,648 transactions from January to December 2017. This study aims to perform customer segmentation for Nine Reload Credit using a data mining process based on the RFM model and clustering techniques. The algorithm used to form the clusters is K-Means. Using the RapidMiner 5.2 tool, K-Means produces a visual cluster model that shows the number of customers in each cluster based on the RFM (Recency, Frequency, and Monetary) attributes. Processing the 82,648 transactions with the RFM model yielded 102 customers. The subsequent K-Means cluster analysis placed 63 customers in Cluster 1 and 39 customers in Cluster 2. The company can use these results to identify the category of each customer and to decide how to retain its existing customers.

Keywords—Data Mining; RFM Model; Cluster Analysis; Customer Segmentation; K-Means Algorithm.

I. INTRODUCTION

In today's business competition, customers are the company's main focus for maintaining its advantage. Companies must plan and apply clear strategies for serving customers [1]. A company's primary focus is not on acquiring new potential customers but on selling more products to existing customers, because acquiring new customers is much more expensive than retaining existing ones [2]. In the credit business, data is obtained from historical records and therefore grows continuously, for example the transaction data of each agent. Every agent transaction on a credit server generates abundant data in the form of transaction profiles, and this happens repeatedly. The accumulation of agent transaction data slows down the search for information in that data [3]. Extracting information from such data is called data mining. Data mining is part of knowledge discovery in databases, a process of extracting useful, previously unknown, and hidden information from data [4]. Processing the large volume of available agent transaction data can reveal unknown or hidden information that is useful to the credit business [4]; for example, grouping the agent data can show which agents have the potential to give the company the most profit, which helps the company make product marketing decisions.

The researchers use the RFM (Recency, Frequency, Monetary) model, which is commonly used to group customers by the time of their last visit, their visit frequency, and the revenue they generate for the company [5]. RFM is used because it is easy to use, quick to implement in companies, and easily understood by managers and marketing decision makers [6].

The results of this study can be used as a decision support system in the credit business to map customers and to identify potential customers.

II. LITERATURE REVIEW OF RFM MODEL

Several previous studies have used RFM to analyze sales data. The study in [7] analyzed online (e-commerce) sales and obtained 8 clusters, of which cluster 7 had the highest RFM value; its results give e-commerce entrepreneurs information about each category of customer. The study in [8] used RFM to determine the value of airline customers and found 4 customer categories, each of which requires the company to provide a different service. The study in [1] also used RFM to process exhaust sales transaction data, which was then clustered to categorize the company's customer types.

The RFM technique is based on three simple customer attributes: Recency of purchase, Frequency of purchase, and Monetary value of purchase. The purpose of RFM is to predict future consumer behavior and thereby support better segmentation decisions [9]. Therefore, consumer behavior must be translated into numbers so that it can be used at any time. In this study, the researchers tested the RFM variables on the dataset of credit sale

Authorized licensed use limited to: International Institute of Information Technology-Raipur. Downloaded on April 23,2025 at 12:45:50 UTC from IEEE Xplore. Restrictions apply.
transactions, which contains a very large amount of data: thousands of transactions occur every month, and a total of 82,648 transactions were collected from January to December 2017. After the data is mapped using the RFM variables, it is combined with the K-Means algorithm to categorize each customer, so that the company can determine the category of each customer.

III. REVIEW OF CLUSTER ANALYSIS

Data mining is a process that uses statistics, mathematics, artificial intelligence, and machine learning techniques to extract and identify useful information and related knowledge from large databases. It is part of knowledge discovery in databases, a process of extracting useful, previously unknown, and hidden information from data [4].

Data mining aims to find relationships or patterns that may provide useful indications [10]. The relationships it seeks are relationships between two or more variables in one dimension [10].

This research uses K-Means to group the transaction data for the following reasons:
1. The number of clusters in the data could not be specified manually.
2. The central point of each cluster in the data is unknown.
3. Grouping the customer types by hand is difficult with 82,648 transactions.

K-Means also has the following advantages:
1. It is easy to implement and use.
2. It executes fairly quickly.
3. It is easy to adapt.
4. It is commonly used.

The K-Means algorithm is a distance-based clustering method that partitions data into a number of groups and works on numeric attributes [11].

The steps of the K-Means algorithm are as follows [12]:
a. Determine the number of clusters k to be formed.
b. Generate k centroids (cluster center points) randomly.
c. Calculate the distance from each data point to each centroid using the Euclidean distance in equation (1):

$$ d(x, \mu) = \sqrt{\sum_{i=1}^{n} (x_i - \mu_i)^2} \qquad (1) $$

where d(x, μ) is the distance between data point x and cluster center μ, x_i is the value of the i-th attribute of the data point whose distance is being computed, μ_i is the value of the i-th attribute at the cluster center, and n is the number of attributes.
d. Assign each data point to the group of the centroid closest to it.

IV. A CASE STUDY

The dataset used in this case study is credit sales data from the Nine Reload Credit server. The company accumulates a large amount of data, with thousands of transactions every month, which would be very difficult to analyze manually one record at a time. The researchers analyzed 82,648 customer transactions. The model proposed for determining the profitable customers is described in Figure 1, which shows the steps to determine them.

[Figure 1 (diagram): Transaction Dataset; Data preprocessing; Recency, Frequency, Monetary variables; Find number of clusters (k); Clustering by K-means; Find final Cluster; RFM Segmentation; Marketing strategies]
Fig 1. Framework for Customer Segmentation based on RFM model and Clustering Techniques

In this study, the database consists of the 82,648 collected sales transactions. Table 1 is an example of the sales transaction database.
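The K-Means procedure reviewed in Section III (steps a to d with the Euclidean distance of equation (1), repeated until no data point changes cluster) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the RapidMiner 5.2 implementation used in the study: the function names `euclidean` and `kmeans` are hypothetical, the initial centroids are passed in explicitly instead of being generated randomly (step b) so that the run is reproducible, and an empty cluster simply keeps its previous centroid.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(x: Sequence[float], mu: Sequence[float]) -> float:
    """Equation (1): square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, mu)))


def kmeans(points: List[Point], initial_centroids: List[Point], max_iter: int = 100):
    """Hypothetical helper: K-Means with caller-supplied initial centroids.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Steps c-d: assign each point to the centroid at the smallest distance.
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point moved to another cluster, so the result is stable
        labels = new_labels
        # Recompute each centroid as the mean of its members (assumption:
        # an empty cluster keeps its previous centroid).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Scored R, F, M values of C001-C004 (Table 5); initial centers of Table 6.
    rfm = [(5, 2, 1), (1, 1, 1), (1, 1, 1), (5, 1, 1)]
    labels, centers = kmeans(rfm, [(5, 2, 1), (1, 1, 1)])
    print(labels)   # [0, 1, 1, 0]
    print(centers)  # [(5.0, 1.5, 1.0), (1.0, 1.0, 1.0)]
```

The four-point input is only a toy subset of the 102 customers, so its final centers are not expected to match the centers behind the distances reported in Table 7.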

Table 1. TRANSACTION DATASET

Data Preparation

At this stage the database structure is prepared to simplify the mining process. The preparation process includes three main steps: selection, preprocessing, and data transformation. This process also selects the attributes relevant to the data mining process. The attributes used are listed in Table 2.

Table 2. ATTRIBUTES USED
Field | Information
Agent Name | Used to specify the customer code.
Date | The date of the customer's purchase transaction, used to model Recency and Frequency. Recency is when, within the year, the customer last made a transaction with Nine Reload. Frequency is the number of transactions the customer conducted within a period of one year.
Price | Used to model the Monetary attribute by summing all of the customer's transactions in one year.

All the data in the transaction dataset must first be selected to determine which data can be used for the RFM variables. Selection by the RFM variables reduced the 82,648 transactions to 102 customers. Table 3 shows the dataset in terms of the Recency, Frequency, and Monetary variables.

Table 3. The Description of Recency, Frequency and Monetary
Agent Code | R (date of last transaction) | F (number of transactions) | M (total purchase value)
C001 | 31-12-2017 | 2035 | Rp22,909,504.00
C002 | 18-06-2017 | 339 | Rp5,878,306.00
C003 | 04-11-2017 | 352 | Rp4,525,250.00
C004 | 31-12-2017 | 36 | Rp526,250.00
.... | .... | .... | ....
C102 | 25-01-2017 | 28 | Rp231,375.00

This study collected a sales transaction history dataset of 82,648 transactions from the credit business and first determined the criteria weights based on the recency, frequency, and monetary variables. The weighting was divided into 5 scales (scores), as listed in Table 4.

Table 4. DECISION TABLE AFTER DIGITAL
Weight | R (Recency) | F (Frequency) | M (Monetary)
5 | Shortest: <1 month | Highest: >15000 | So many: >300 million
4 | Short: 1-3 months | High: 8000-15000 | Many: 150-200 million
3 | Regular: 3-5 months | Regular: 5000-8000 | Normal: 100-150 million
2 | Long: 5-8 months | Low: 2000-5000 | Few: 50-100 million
1 | Longest: >8 months | Lower: <2000 | Fewer: <50 million

Once the scale is determined, the next step is to transform the data onto that scale. Table 5 shows a sample of the transformed data.

Table 5. EXAMPLE R-F-M VALUES OF SOME CUSTOMERS AFTER DATA PREPROCESSING
Agent Code | R | F | M
C001 | 5 | 2 | 1
C002 | 1 | 1 | 1
C003 | 1 | 1 | 1
C004 | 5 | 1 | 1
.... | .... | .... | ....
C102 | 1 | 1 | 1

After all the transaction data has been transformed into numeric form, it can be grouped using the K-Means algorithm. Grouping the data into clusters requires the following steps [12]:
1. In this study, the data is grouped into two clusters.
2. The initial center points were determined randomly; the resulting center point of each cluster is shown in Table 6.

Table 6. Initial Center Point
Agent Code | R | F | M
C005 | 5 | 2 | 1
C061 | 1 | 1 | 1

3. The K-Means method allocates each data point to the cluster whose center point is closest to it. To find the closest cluster, the distance from each data point to the center point of each cluster must be calculated.

Table 7. CALCULATION RESULT OF EACH DATA
Customer Code | R | F | M | Distance to C1 | Distance to C2 | Closest Distance
C001 | 5 | 2 | 1 | 0.985150517 | 3.669114335 | 0.985150517
C002 | 1 | 1 | 1 | 3.527989798 | 0.471593045 | 0.471593045
C003 | 1 | 1 | 1 | 3.527989798 | 0.471593045 | 0.471593045
C004 | 5 | 1 | 1 | 0.50619742 | 3.564042648 | 0.50619742
C005 | 5 | 1 | 1 | 0.50619742 | 3.564042648 | 0.50619742

4. After all the data has been assigned to the closest cluster, recalculate each cluster center as the average of the cluster's members.
5. After obtaining the new center point of each cluster, repeat from step 3 until the center points no longer change and no data moves from one cluster to another.

Applying K-Means to the customer transaction dataset took 4 iterations and produced the clusters shown in Figure 2: Cluster 1 has 63 members and Cluster 2 has 39 members.

Fig 2. Graph of Cluster Analysis results

Table 8 and Table 9 list the names of the agents in Cluster 1 and Cluster 2, respectively; the company can make use of these lists.

Table 8. Customer Names in Cluster 1
NO | CUSTOMER CODE | AGENT NAME
1 | C001 | ADAM CELL
2 | C004 | ADIN TRONIK
3 | C005 | ADITIYA CELL
4 | C006 | AIS ALL CELL
5 | C007 | ANIDATUL CELL
6 | C008 | AQILA
7 | C009 | ARA CELL
8 | C010 | ASIH
9 | C011 | ASNEY TRONIK
10 | C012 | ATIKA CELL
11 | C014 | AYTHA CELL
12 | C015 | BARLI TRONIK
13 | C016 | BOYOUT21
14 | C017 | CAHAYA CELL
15 | C018 | DEZTI CELL
16 | C019 | DIA TRONIK
17 | C020 | EB TRONIK
18 | C021 | ERNI CELL
19 | C022 | FAIT CELL
20 | C023 | FITRI CELL
21 | C024 | FITRI POJOK CELL
22 | C025 | GRISELDA CELL
23 | C026 | HERA CELL
24 | C027 | HESTI CELL
25 | C028 | HILYA CELL
26 | C029 | IBU CELL
27 | C030 | LIA CELL
28 | C031 | LIDA CELL
29 | C032 | MUJI ASTUTI
30 | C033 | MUSTIKA
31 | C034 | NABIL CELL
32 | C035 | NDARI CELL
33 | C036 | ONDLENK CELL
34 | C037 | PUJI CELL
35 | C038 | QORY CELL
36 | C039 | RARA CELL
37 | C041 | RASITO
38 | C042 | RISWATI CELL
39 | C043 | RIZA CELL
40 | C044 | RIZKY CELL
41 | C045 | ROKHIM KOMPUTER
42 | C046 | SAHAL CELL
43 | C047 | SEMBILAN RELOAD
44 | C049 | SUSI TRONIK
45 | C050 | TARI
46 | C051 | SUKRON
47 | C054 | TOINK CELL
48 | C056 | UTAMA CELL
49 | C059 | WAHYONO CELL
50 | C060 | YANI CELL
51 | C062 | YUNITA CELL
52 | C063 | AJENG CELL

53 | C064 | ARRASYID RELOAD
54 | C068 | DEDERIZKY CELL
55 | C069 | FAIS CELL
56 | C070 | TASY CELL
57 | C071 | ADIVA CELL
58 | C073 | FAIZAL CELL
59 | C076 | FATH CELL
60 | C077 | LUCAS TRONIK
61 | C079 | LULU CELL
62 | C088 | UNYIEL
63 | C090 | YUNITA CELL

Table 9. Customer Names in Cluster 2
NO | CUSTOMER CODE | AGENT NAME
1 | C002 | ANES CELL
2 | C003 | HAFI CELL
3 | C013 | HERI
4 | C040 | HUYA CELL
5 | C048 | IMA CELL
6 | C052 | JUJU CELL
7 | C053 | INA CELL
8 | C055 | JM IRS
9 | C057 | IBU KECE
10 | C058 | RAFKA RELOAD
11 | C061 | SAMSITI CELL
12 | C065 | SIMPLE PAY
13 | C066 | SUPRI CELL
14 | C067 | TAKIM
15 | C072 | TRIDAYA RELOAD
16 | C074 | ULFA CELL
17 | C075 | SEMBILAN CELL
18 | C078 | SITRIADI CELL
19 | C080 | SOLIH TRONIK
20 | C081 | TRANSZHEN
21 | C082 | NASYAH PULSA
22 | C083 | ADI CELL
23 | C084 | EKA CELL
24 | C085 | ELLA CELL
25 | C086 | WAHYU CELL
26 | C087 | INCES
27 | C089 | KHAYLA CELL
28 | C091 | MUNDRI CELL
29 | C092 | AGUSTIN CELL
30 | C093 | FAKIH CELL
31 | C094 | TABALONG-RELOAD
32 | C095 | MEI-TRONIK
33 | C096 | KALILLA CELL
34 | C097 | DELTRA TRONIK
35 | C098 | DWI
36 | C099 | AJENG JKT
37 | C100 | AYU
38 | C101 | DWI CELL
39 | C102 | EGA CELL

V. CONCLUSION

The main purpose of this research was to segment customers from 82,648 transactions based on the RFM model and then to perform a cluster analysis using K-Means.

The research yielded 102 customers: 63 customers are in Cluster 1 and 39 customers are in Cluster 2. Cluster 1 has a higher average RFM value than Cluster 2.

By knowing the category of each customer, the company is expected to be able to make the right decisions in its marketing strategy.

ACKNOWLEDGMENT

We would like to thank Nine Reload Credit, a credit-selling business, for providing the data used in this study.

REFERENCES

[1] Maryani, Ina, and Dwiza Riana. 2017. "Clustering and Profiling of Customers Using RFM for Customer Relationship Management Recommendations." 2017 5th International Conference on Cyber and IT Service Management, CITSM 2017, 2–7. https://doi.org/10.1109/CITSM.2017.8089258.
[2] Tama, Bayu Adhi. 2010. "Penetapan Strategi Penjualan Menggunakan Association Rules Dalam Konteks CRM" [Determining Sales Strategy Using Association Rules in the Context of CRM]. Jurnal Generic 5 (1):35–38.
[3] Hand, David J. 2007. "Principles of Data Mining." Drug Safety 30 (7):621–22. https://doi.org/10.2165/00002018-200730070-00010.
[4] Ramamohan, Y, K Vasantharao, C Kalyana Chakravarti, and A S K Ratnam. 2012. "A Study of Data Mining Tools in Knowledge Discovery Process." International Journal of Soft Computing and Engineering 2 (3):191–94.
[5] Wongchinsri, Pornwatthana, and Werasak Kuratach. 2016. "A Survey - Data Mining Frameworks in Credit Card Processing." 2016 13th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, ECTI-CON 2016. https://doi.org/10.1109/ECTICon.2016.7561287.
[6] Peiman Alipour Sarvari, Alp Ustundag, and Hidayet Takci. 2014. "Performance Evaluation of Different Customer Segmentation Approaches Based on RFM and Demographics Analysis." Kybernetes 43 (8):1209–23. https://doi.org/10.1108/K-01-2015-0009.
[7] Rachid, et al. 2015. "Combining RFM Model and Clustering Techniques for Customer Value Analysis of a Company selling online." 2015 12th International Conference of Computer Systems and Applications (AICCSA 2015), 1–6.
[8] Liu Jiali and Du Hyung. 2010. "Study on Airline Customer Value Evaluation Based on RFM Model (2010)." 2010 International Conference On Computer Design And Applications (ICCDA 2010), 278–281.
[9] Aviliani, U. Sumarwan, I. Sugema, and A. Saefuddin. 2011. "Segmentasi Nasabah Tabungan Mikro Berdasarkan Recency, Frequency, dan Monetary: Kasus Bank BRI" [Segmentation of Micro-Savings Customers Based on Recency, Frequency, and Monetary: The Case of Bank BRI]. Finance and Banking Journal 13 (1):95–109.
[10] Kusrini Luthfi, Ema Taufiq. 2009. Algoritma Data Mining [Data Mining Algorithms]. Edited by Theresia Ari Prabawati. Yogyakarta: C.V Andi OFFSET. https://books.google.co.id/books?id=Ojclag73O8C&pg=PA3&dq=data+mining+adalah&hl=id&sa=X&ved=0ahUKEwijrefgpYnZAhXBPY8KHWeJCQ4Q6AEIKzAA#v=onepage&q=data mining adalah&f=false.
[11] Lubis, Abdul Haris. 2016. "Model Segmentasi Pelanggan Dengan Kernel K-Means Clustering Berbasis Customer Relationship Management" [Customer Segmentation Model Using Kernel K-Means Clustering Based on Customer Relationship Management]. Jurnal & Penelitian Teknik Informatika 1:36–41.
[12] Rahman, Aulia Tegar; Wiranto; Rini Anggrainingsih. 2017. "Coal Trade Data Clustering Using K-Means (Case Study PT. Global Bangkit Utama)" 6 (1):24–31.
