Research on Customer Segmentation Model by Clustering

Jing Wu
School of Information, Central University of Finance and Economics
No. 39, South Xuyuan Road, Haidian District, Beijing (100081)
+86 010 6228 8663
[email protected]

Zheng Lin
School of Information, Central University of Finance and Economics
No. 39, South Xuyuan Road, Haidian District, Beijing (100081)
+86 010 8124 8085
[email protected]

ABSTRACT
In this paper, we use credit card consumption data as model-building samples and present a framework for building segment-level predictive models that uses a pattern-based clustering approach and signature discovery techniques. We devise a monetary matrix and a fluctuation-rate matrix to study various consumption modes. By clustering on both matrices, we uncover different customer characteristics. Using these characteristics, we build a two-dimensional consumption-based customer segmentation model.

Keywords
clustering analysis; customer segmentation; segmentation model

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICEC’05, August 15–17, 2005, Xi’an, China.
Copyright 2005 ACM 1-59593-112-0/05/08…$5.00.

1. CLUSTERING ANALYSIS
Data mining is an analytic method for extracting useful knowledge and finding useful data patterns in massive data. In market research, clustering is an effective and frequently used method for market segmentation, that is, for identifying target markets and groups of customers. As one of the important analytic methods in data mining, clustering can be used as an independent tool to present the data distribution, observe the characteristics of each cluster, and analyze specific clusters further if needed.

In this paper, customers are segmented by the modes of their consuming behavior using clustering, so that the classification standard for each segmentation dimension is derived from the data. In this way, we ensure that the standards are not established subjectively.

2. CUSTOMER SEGMENTATION MODEL
The concept of customer segmentation was developed by the American marketing expert Wendell R. Smith in the mid-1950s. Customer segmentation refers to classifying customers by their value, demands, preferences and other factors, given clear organizational strategies, business model and target market. Customers who belong to the same group have certain similarities, while different segments have distinct characteristics [1]. A customer segmentation model is built by classifying customers according to certain standards on selected segmentation variables. There are two kinds of consumption-based customer segmentation models [2].

2.1 RFM Model
The RFM segmentation model differentiates important customers by three variables: the customers' consumption interval, frequency and money amount. R represents recency, which is based on the interval between the customer's latest consuming behavior and the present; the shorter the interval, the bigger R is. F represents frequency, the number of consuming behaviors in a period of time. M represents monetary, the amount of money consumed in a period of time. Research shows that the bigger R and F are, the more likely the customer is to make a new trade with the enterprise. Moreover, the bigger M is, the more likely the customer is to respond to the enterprise's products and services again.

The RFM method is very effective for customer segmentation. We first sort customers by their latest consuming date, with the most recent customer first. In this way, customers can be classified into several groups. Then F and M are standardized and sorted in the same way. At this point, each customer is positioned in a three-dimensional space at the coordinate (R, F, M). By calculating R*F*M, the RFM value of each customer is obtained. With these RFM values sorted in descending order, customers can be grouped according to a chosen proportion. For example, for a commercial enterprise, customers whose RFM values are in the top 20 percent can be regarded as its most valuable customers.

2.2 Customer Value Matrix Model
The customer value matrix model is an improved model based on the traditional RFM model. In this model, the customer value matrix consists of the number of purchases (represented by F) and the average purchase amount (represented by A). The average purchase amount replaces two variables in the RFM model between which there is multicollinearity, which eliminates their linear effect on the RFM model.

In the customer value matrix, the base values of F and A are their respective averages. Once the division of the axes is decided, customers are positioned in one of the quadrants of the
customer value matrix. By the values of A and F, customers are classified into four groups in the matrix: customers who like to consume (represented by Ⅰ), customers who are valuable for enterprises (represented by Ⅱ), customers who consume often (represented by Ⅲ), and customers whose behavior is uncertain to enterprises (represented by Ⅳ). The result is presented in Figure 1.

            low F    high F
  high A      Ⅰ        Ⅱ
  low A       Ⅳ        Ⅲ

Figure 1. Customer value matrix.

3. PRACTICE OF CUSTOMER SEGMENTATION MODEL
Here, credit card data from a bank for the year 2003 are used to study the customer segmentation model. Customers are segmented by clustering analysis of their consuming behavior. The two dimensions on which the customer segmentation model is built are the money amount of consumption and the differential ratio of consumption. Enterprises can refer to the model when planning their marketing strategies. The fast clustering algorithm provided by the SPSS (Statistical Package for the Social Sciences) software is used for the clustering analysis. The credit card data include record number, credit card number, credit card type, consuming date, consuming time and so on. Personal information of customers is not included in the data. The data were provided by a government administrative agency, so the results of the research can support government decisions.

3.1 Data Preparation
The validity and accuracy of the later pattern mining depend on the quality of data preparation. The preparation discussed here includes four steps: data cleaning, data integration, data transformation and data aggregation [3].

Before clustering analysis, the data must be cleaned to eliminate records that do not satisfy the premises of the analysis. The dirty data that need cleaning fall into two categories: records that can be recognized directly from their characteristics, such as non-continuous consumption behavior data, and records that are only discovered by clustering analysis, also called isolated points.

Data integration in this research refers to merging the consumption records of each month into one data table, which facilitates clustering. Because the amount of original data is huge and the records are unevenly distributed along the time dimension, we have to aggregate them. In this way, we obtain the expenditure in each month, and thus a series of consumption data along a fixed time dimension.

To summarize: at the beginning, we have 2,142,341 credit card samples selected from one type of credit card of a bank. After integration and aggregation, 252,865 samples are left, and finally 186,835 samples remain for clustering analysis after eliminating non-continuous data.

3.2 The Result of Clustering Analysis of Consumption Expenditure
We establish a matrix of consumption expenditure using the samples. The matrix is represented by Γ and is shown as follows:

\[
\Gamma = \begin{pmatrix}
\chi_{1,1} & \chi_{1,2} & \cdots & \chi_{1,11} \\
\chi_{2,1} & \chi_{2,2} & \cdots & \chi_{2,11} \\
\vdots & \vdots & \ddots & \vdots \\
\chi_{n,1} & \chi_{n,2} & \cdots & \chi_{n,11}
\end{pmatrix}
\]

Γ represents the consumption expenditure of n customers in 11 months. It is the foundation for studying consumption levels. The purpose of clustering this data matrix is to discover subgroups with different consumption levels within the whole customer group, classified by the consumption expenditure in each month.

After five rounds of clustering, we obtain a classification that generally meets the requirements of customer segmentation by consumption level. According to the result of the clustering analysis, customers can be classified into 9 different consumption levels, represented from the lowest to the highest by A1, A2, …, A9. They are shown in Figure 2.

Figure 2. Hierarchical structure of consumption expenditure.

3.3 Clustering Analysis of Consumption Expenditure Differential Ratio
Customers differ in consumption fluctuation, and the purpose of selecting variables for consumption fluctuation is to show the characteristics of each customer's personal consumption behavior. Therefore, the consumption expenditure fluctuation ratio is selected in this paper as the variable describing the fluctuation of consumption behavior.

Based on the consumption expenditure matrix, the matrix of consumption expenditure differential ratio is established in this paper to show and study the fluctuation of consumption expenditure. The matrix of consumption expenditure differential ratio is a matrix that contains the ratios of the difference between a month’s consumption expenditure and the
consumption expenditure of the month before that month to the total annual consumption expenditure of a customer. The ratios show the fluctuation of consumption expenditure in each month. Ω is used here to represent the matrix.

\[
\Omega = \begin{pmatrix}
\dfrac{\chi_{1,2}-\chi_{1,1}}{\sum_{j=1}^{p}\chi_{1,j}} & \dfrac{\chi_{1,3}-\chi_{1,2}}{\sum_{j=1}^{p}\chi_{1,j}} & \cdots & \dfrac{\chi_{1,p}-\chi_{1,p-1}}{\sum_{j=1}^{p}\chi_{1,j}} \\
\dfrac{\chi_{2,2}-\chi_{2,1}}{\sum_{j=1}^{p}\chi_{2,j}} & \dfrac{\chi_{2,3}-\chi_{2,2}}{\sum_{j=1}^{p}\chi_{2,j}} & \cdots & \dfrac{\chi_{2,p}-\chi_{2,p-1}}{\sum_{j=1}^{p}\chi_{2,j}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\chi_{n,2}-\chi_{n,1}}{\sum_{j=1}^{p}\chi_{n,j}} & \dfrac{\chi_{n,3}-\chi_{n,2}}{\sum_{j=1}^{p}\chi_{n,j}} & \cdots & \dfrac{\chi_{n,p}-\chi_{n,p-1}}{\sum_{j=1}^{p}\chi_{n,j}}
\end{pmatrix}
\]

Here p is the number of monthly columns of the consumption expenditure matrix Γ (p = 11 for this data set), so Ω has p − 1 columns.

We cluster the rows of this matrix by the k-means method with k = 2, 3, 4, 5 and 6 respectively. For each value of k, we compare the distances between the centers of all the clusters. We find that when k = 6, the distances are the farthest and the differences between the samples in each cluster are the most evident. Therefore, we choose the result with k = 6 as the final result of our analysis. The distribution of the center of each cluster is shown in Figure 3.

Figure 3. Distribution of the center of each cluster (where k=6). Horizontal axis: cluster (1 to 6); vertical axis: differential ratio (−0.50 to 0.25); legend: variables V2 to V11.

Each cluster in Figure 3 represents a characteristic mode of consumption fluctuation. By analyzing all the modes, we get the following fluctuation modes: strongly steady mode (cluster 5), relatively steady mode (cluster 1), long-term intermittent mode (cluster 2), short-term intermittent mode (cluster 4), periodic mode (cluster 3) and mixed mode (cluster 6).

3.4 Customer Segmentation Model
Since the consumption behavior of any customer falls into a certain consumption level and a certain fluctuation mode, we can analyze consumption behavior along these two aspects. According to the previous clustering results, the dimension of consumption level has 9 classes and the dimension of consumption fluctuation has 6. Combining the two axes, we get 54 groups of customers with different characteristics. They are shown in Figure 4.

  B6 | A1B6 A2B6 A3B6 A4B6 A5B6 A6B6 A7B6 A8B6 A9B6
  B5 | A1B5 A2B5 A3B5 A4B5 A5B5 A6B5 A7B5 A8B5 A9B5
  B4 | A1B4 A2B4 A3B4 A4B4 A5B4 A6B4 A7B4 A8B4 A9B4
  B3 | A1B3 A2B3 A3B3 A4B3 A5B3 A6B3 A7B3 A8B3 A9B3
  B2 | A1B2 A2B2 A3B2 A4B2 A5B2 A6B2 A7B2 A8B2 A9B2
  B1 | A1B1 A2B1 A3B1 A4B1 A5B1 A6B1 A7B1 A8B1 A9B1
        A1   A2   A3   A4   A5   A6   A7   A8   A9

Figure 4. Matrix of customer segmentation based on consumption level and consumption fluctuation.

A1, A2, …, A9 represent 9 different consumption levels, ascending from A1 to A9. B1, B2, …, B6 represent 6 different consumption fluctuation modes: strongly steady mode (B1), relatively steady mode (B2), long-term intermittent mode (B3), short-term intermittent mode (B4), periodic mode (B5) and mixed mode (B6). The vertical axis represents consumption fluctuation modes and the horizontal axis represents consumption levels. The combination of a mode and a level represents a group of customers. For example, A9B1 represents a group of customers with ultra-high, steady consumption, and A4B4 represents a group of customers with middle-level, short-term intermittent consumption.

Customers have different consumption habits in different industries. Therefore, when the model is applied to customer segmentation, large amounts of historical data must be analyzed first so that the dimensions of consumption level and consumption fluctuation can be determined. The practical value of the model depends on the amount of data and the length of the time series. The larger the amount of data and the longer the time series, the more faithfully the model reflects the behavior modes.

The purpose of building a customer segmentation model is to segment customers into different groups so that enterprises can sell their products according to the different needs of the segmented customers. This means establishing different marketing schemes for different target groups of customers according to the subsections of consuming time, the different levels of customers, the combination of products and services, and the emphasis of market positioning.

4. REFERENCES
[1] Del I. Hawkins, Roger J. Best, Kenneth A. Coney. Customers’ Behaviors (seventh edition).
[2] Management Science: A Comparative Research on the Methods of Customer Segmentation Based on Consumption Behavior. 2003.2, Vol. 16.
[3] (Canada) Jiawei Han, Micheline Kamber. The Concept and Techniques of Data Mining.
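
As an illustration of the RFM procedure described in Section 2.1, the following minimal sketch ranks customers on R, F and M, multiplies the three scores, and keeps the top 20 percent. The field names, the sample data and the rank-based scoring are assumptions made for illustration only; in particular, the rank-based scaling is a simple stand-in for the standardization step, which the paper does not specify.

```python
from datetime import date


def rank_scores(values, reverse=False):
    """Map each value to a score in (0, 1]; a larger score is a better rank.

    Rank-based scaling is an assumed stand-in for the paper's
    unspecified standardization step.
    """
    order = sorted(set(values), reverse=reverse)
    position = {v: i + 1 for i, v in enumerate(order)}
    return [position[v] / len(order) for v in values]


def rfm_top_share(customers, today, share=0.2):
    """Return the ids of customers whose R*F*M value is in the top `share`."""
    # Recency: fewer days since the last purchase gives a larger R,
    # so the day counts are ranked in descending order.
    days = [(today - c["last_purchase"]).days for c in customers]
    r = rank_scores(days, reverse=True)
    f = rank_scores([c["frequency"] for c in customers])
    m = rank_scores([c["monetary"] for c in customers])
    rfm = {c["id"]: r[i] * f[i] * m[i] for i, c in enumerate(customers)}
    ranked = sorted(rfm, key=rfm.get, reverse=True)  # descending RFM value
    keep = max(1, round(len(ranked) * share))
    return ranked[:keep]


# Hypothetical sample data, not taken from the paper.
customers = [
    {"id": "c1", "last_purchase": date(2003, 12, 20), "frequency": 30, "monetary": 9000},
    {"id": "c2", "last_purchase": date(2003, 6, 1), "frequency": 5, "monetary": 400},
    {"id": "c3", "last_purchase": date(2003, 12, 1), "frequency": 12, "monetary": 2500},
    {"id": "c4", "last_purchase": date(2003, 1, 15), "frequency": 2, "monetary": 150},
    {"id": "c5", "last_purchase": date(2003, 11, 10), "frequency": 20, "monetary": 6000},
]
most_valuable = rfm_top_share(customers, today=date(2003, 12, 31))
print(most_valuable)
```

Because the three scores are multiplied, a customer who ranks low on any single variable is pushed down the list, which is why the product is sensitive to the scaling chosen for R, F and M.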

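
The quadrant assignment of the customer value matrix in Section 2.2 can be sketched in the same way. The sketch below splits both axes at the average of F and of A and maps each customer to quadrant I to IV following the layout of Figure 1. The function name, the input format and the rule that a value exactly equal to the average counts as "low" are assumptions; the paper does not state how ties are handled.

```python
from statistics import mean

# Quadrant by (A is high, F is high), following Figure 1:
# I = high A, low F; II = high A, high F; III = low A, high F; IV = low A, low F.
QUADRANT = {
    (True, False): "I",
    (True, True): "II",
    (False, True): "III",
    (False, False): "IV",
}


def value_matrix_quadrants(customers):
    """Assign each customer to a quadrant of the customer value matrix.

    `customers` maps a customer id to (F, A): the number of purchases
    and the average purchase amount. Each axis is split at its mean.
    """
    f_base = mean(f for f, _ in customers.values())
    a_base = mean(a for _, a in customers.values())
    return {
        cid: QUADRANT[(a > a_base, f > f_base)]  # assumed tie rule: equal counts as low
        for cid, (f, a) in customers.items()
    }


# Hypothetical sample data, not taken from the paper.
sample = {
    "c1": (2, 900.0),   # few purchases, large average amount
    "c2": (30, 800.0),  # many purchases, large average amount
    "c3": (25, 50.0),   # many purchases, small average amount
    "c4": (3, 40.0),    # few purchases, small average amount
}
print(value_matrix_quadrants(sample))
```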

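
The differential ratio matrix Ω of Section 3.3 can be computed directly from the monthly expenditure matrix Γ. The following is a minimal sketch under stated assumptions: Γ is given as a list of rows with one row per customer and one column per month, the function name is hypothetical, and the check for a zero annual total is an added safeguard that the paper does not discuss.

```python
def differential_ratio_matrix(expenditure):
    """Build the differential ratio matrix from monthly expenditure rows.

    For a row x with p monthly values, the result row holds
    (x[j] - x[j-1]) / sum(x) for each month after the first,
    so it has p - 1 entries.
    """
    omega = []
    for row in expenditure:
        total = sum(row)
        if total == 0:
            # Assumed safeguard: the ratio is undefined without any spending.
            raise ValueError("total expenditure is zero; ratio undefined")
        omega.append([(row[j] - row[j - 1]) / total for j in range(1, len(row))])
    return omega


# Hypothetical sample with 4 months per customer (the paper uses 11).
gamma = [
    [100, 100, 100, 100],  # constant spending
    [0, 400, 0, 400],      # spending every other month
    [50, 150, 250, 50],    # rising, then dropping
]
for ratios in differential_ratio_matrix(gamma):
    print(ratios)
```

A customer with constant monthly spending yields a row of zeros, while alternating spending yields ratios that alternate in sign. Dividing by the annual total makes rows comparable across customers with very different consumption levels.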