2005 Research On Customer Segmentation Model by Clustering
Jing Wu
School of Information, Central University of Finance and Economics
No. 39, South Xuyuan Road, Haidian District, Beijing (100081)
+86 010 6228 8663
[email protected]

Zheng Lin
School of Information, Central University of Finance and Economics
No. 39, South Xuyuan Road, Haidian District, Beijing (100081)
+86 010 8124 8085
[email protected]
customer value matrix. By the values of A and F, customers are classified into four groups in the matrix: customers who like to consume (represented by Ⅰ), customers who are valuable to enterprises (represented by Ⅱ), customers who consume often (represented by Ⅲ), and customers whose behavior is uncertain to enterprises (represented by Ⅳ). The result is presented in Figure 1.

[Figure 1: 2 × 2 grid with A on the vertical axis and F on the horizontal axis, each running from low to high; quadrants Ⅰ and Ⅱ form the top row and Ⅳ and Ⅲ the bottom row.]
Figure 1. Customer value matrix.

3. PRACTICE OF CUSTOMER SEGMENTATION MODEL
Here, credit card data of 2003 from a bank are used to carry out the research on the customer segmentation model. In this research, customers are segmented by clustering analysis of their consuming behavior. The two dimensions on which the customer segmentation model is built are the money amount of consumption and the differential ratio of consumption. Enterprises can refer to the model when designing their marketing strategies. The fast clustering algorithm provided by the SPSS (Statistical Package for the Social Sciences) software kit is used here for clustering analysis. The credit card data include record number, credit card number, credit card type, consuming date, consuming time and so on. Personal information of customers is not included in the data. The data were provided by a government administrative agency, so that the results of the research can support government decisions.

252,865 samples are left, and finally 186,835 samples remain for clustering analysis after eliminating non-continuous data.

3.2 The Result of Clustering Analysis of Consumption Expenditure
We establish a matrix of consumption expenditure from the samples. The matrix is denoted by Γ and is shown below:

\[
\Gamma =
\begin{pmatrix}
\chi_{1,1} & \chi_{1,2} & \cdots & \chi_{1,11} \\
\chi_{2,1} & \chi_{2,2} & \cdots & \chi_{2,11} \\
\vdots & \vdots & \ddots & \vdots \\
\chi_{n,1} & \chi_{n,2} & \cdots & \chi_{n,11}
\end{pmatrix}
\]

Γ represents the consumption expenditure of n customers over 11 months. It is the foundation for the research on consumption levels. The purpose of the clustering analysis of the data matrix is to discover subgroups with different consumption levels within the whole customer group, based on the consumption expenditure in each month.

After five rounds of clustering, we obtain a classification that generally meets the requirements of customer segmentation by consumption level. According to the clustering result, customers can be classified into 9 consumption levels, denoted A1, A2, …, A9 from the lowest level to the highest. They are shown in Figure 2.
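The clustering of the expenditure matrix Γ into consumption levels (Section 3.2) can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: it runs a single pass of plain Lloyd's k-means written in NumPy, on synthetic data, and then renames the clusters A1 to A9 in ascending order of their mean monthly expenditure. The function names, the random initialization, the ordering rule and the synthetic spending scales are all assumptions made for illustration.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct rows chosen at random (SPSS may differ).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        new_centers = centers.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, nearest_center(X, centers)

def consumption_levels(gamma, k=9, seed=0):
    """Cluster the rows of gamma and name the clusters A1..Ak by ascending mean spend."""
    gamma = np.asarray(gamma, dtype=float)
    centers, labels = kmeans(gamma, k, seed=seed)
    order = np.argsort(centers.mean(axis=1))  # cluster ids, lowest mean spend first
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)                # rank[c] = position of cluster c in that order
    levels = np.array([f"A{r + 1}" for r in rank[labels]])
    return levels, centers[order]

# Synthetic stand-in for the n x 11 matrix: 50 customers around each spending scale.
rng = np.random.default_rng(42)
typical_spend = [50, 100, 200, 400, 800, 1600, 3200, 6400, 12800]
gamma = np.vstack([rng.normal(s, 0.1 * s, size=(50, 11)) for s in typical_spend])

levels, ordered_centers = consumption_levels(gamma)
print(ordered_centers.mean(axis=1).round(1))  # mean monthly spend per level, A1 first
```

Because k-means depends on its starting centers, a single run may merge or split some of the synthetic groups; repeating the run with different seeds is one common way to check how stable the levels are.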
consumption expenditure of the month before that month to the
total annual consumption expenditure of a customer. The ratios
show the fluctuation of consumption expenditure in each month.
Ω is used here to represent the matrix.

\[
\Omega =
\begin{pmatrix}
\dfrac{\chi_{1,2}-\chi_{1,1}}{\sum_{j=1}^{p}\chi_{1,j}} & \dfrac{\chi_{1,3}-\chi_{1,2}}{\sum_{j=1}^{p}\chi_{1,j}} & \cdots & \dfrac{\chi_{1,p}-\chi_{1,p-1}}{\sum_{j=1}^{p}\chi_{1,j}} \\
\dfrac{\chi_{2,2}-\chi_{2,1}}{\sum_{j=1}^{p}\chi_{2,j}} & \dfrac{\chi_{2,3}-\chi_{2,2}}{\sum_{j=1}^{p}\chi_{2,j}} & \cdots & \dfrac{\chi_{2,p}-\chi_{2,p-1}}{\sum_{j=1}^{p}\chi_{2,j}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\chi_{n,2}-\chi_{n,1}}{\sum_{j=1}^{p}\chi_{n,j}} & \dfrac{\chi_{n,3}-\chi_{n,2}}{\sum_{j=1}^{p}\chi_{n,j}} & \cdots & \dfrac{\chi_{n,p}-\chi_{n,p-1}}{\sum_{j=1}^{p}\chi_{n,j}}
\end{pmatrix}
\]
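As a small worked illustration of the matrix Ω, the sketch below computes the differential ratios from an expenditure matrix with NumPy. The function name and the two example customers are hypothetical; the only operations used are `numpy.diff` along the month axis and a division by each customer's total expenditure.

```python
import numpy as np

def differential_ratio_matrix(gamma):
    """Month-over-month change divided by each customer's total expenditure.

    gamma has shape (n, p); the result has shape (n, p - 1).
    """
    gamma = np.asarray(gamma, dtype=float)
    totals = gamma.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every customer needs a non-zero total expenditure")
    return np.diff(gamma, axis=1) / totals

# Two hypothetical customers over p = 4 months.
gamma_example = np.array([
    [100.0, 100.0, 100.0, 100.0],  # spends the same amount every month
    [0.0, 300.0, 0.0, 100.0],      # spends in bursts
])
omega = differential_ratio_matrix(gamma_example)
print(omega)  # first row is all zeros; second row is 0.75, -0.75, 0.25
```

A customer with constant spending gets a row of zeros, while bursts of spending produce large positive and negative ratios. Each row also sums to the difference between the last and first month divided by the total, since the intermediate terms cancel.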
We cluster the matrix by the k-means method with k = 2, 3, 4, 5 and 6 in turn. For each value of k, we compare the distances between the centers of all the clusters. We find that when k = 6 the distances are the largest and the differences among the clusters are the most evident. Therefore, we take the result with k = 6 as the final result of our analysis; the distribution of the center of each cluster is shown in Figure 3.

[Figure 3: plot of the cluster centers; vertical axis: differential ratio (ticks from -0.50000 to 0.25000); horizontal axis: cluster (1 to 6); legend: variables V2 to V11.]
Figure 3. Distribution of the center of each cluster (where k=6).

Each cluster in Figure 3 represents a characteristic mode of consumption fluctuation. By analyzing all the clusters, we obtain the following fluctuation modes: strongly steady mode (cluster 5), relatively steady mode (cluster 1), long-term intermittent mode (cluster 2), short-term intermittent mode (cluster 4), periodic mode (cluster 3) and mixed mode (cluster 6).

3.4 Customer Segmentation Model
Since the consumption behavior of any customer falls into a certain consumption level and a certain fluctuation mode, we can analyze consumption behavior along these two dimensions. According to the previous clustering results, the consumption level dimension has 9 classes and the consumption fluctuation dimension has 6, which gives 54 groups of customers with different characteristics. They are shown in Figure 4.

[Figure 4: 6 × 9 grid of cells, each labeled with a level Ai and a mode Bj, with rows B6 (top) to B1 (bottom) and columns A1 to A9.]
Figure 4. Matrix of customer segmentation based on consumption level and consumption fluctuation.

A1, A2, …, A9 represent 9 consumption levels in ascending order from A1 to A9. B1, B2, …, B6 represent 6 consumption fluctuation modes: strongly steady mode (B1), relatively steady mode (B2), long-term intermittent mode (B3), short-term intermittent mode (B4), periodic mode (B5) and mixed mode (B6). The vertical axis represents consumption fluctuation modes and the horizontal axis represents consumption levels. The combination of a mode and a level represents a group of customers. For example, A9B1 represents customers with an ultra-high, steady consumption level, and A4B4 represents customers with a middle, short-term intermittent consumption level.

Customers have different consumption habits in different industries. Therefore, when the model is applied to customer segmentation, large amounts of historical data must be analyzed first so that the dimensions of consumption level and consumption fluctuation can be determined. The practical value of the model depends on the amount of data and the length of the time series: the larger the amount of data and the longer the time series, the more accurately the model reflects actual behavior.

The purpose of building a customer segmentation model is to segment customers into different groups so that enterprises can sell their products according to the different needs of the segmented customers. This means establishing different marketing schemes for different target groups of customers according to the periods of consuming time, the levels of customers, the combination of products and services, and the emphasis of market positioning.
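The two-dimensional segmentation of Section 3.4 can be sketched as a simple cross-tabulation. The example assumes that every customer already carries a consumption level label (A1 to A9) and a fluctuation mode label (B1 to B6) from the two clustering steps; the function names and the four sample customers are hypothetical. Rows are ordered B6 to B1 and columns A1 to A9, following the layout of Figure 4.

```python
from collections import Counter

LEVELS = [f"A{i}" for i in range(1, 10)]  # A1 (lowest) .. A9 (highest)
MODES = [f"B{j}" for j in range(1, 7)]    # B1 (strongly steady) .. B6 (mixed)

def segment_label(level, mode):
    """Name of the customer group, e.g. 'A9' and 'B1' give 'A9B1'."""
    return level + mode

def segment_matrix(levels, modes):
    """Count customers per cell; rows run B6..B1, columns run A1..A9."""
    counts = Counter(zip(modes, levels))
    return [[counts[(m, a)] for a in LEVELS] for m in reversed(MODES)]

# Four hypothetical customers with labels from the two clustering steps.
customer_levels = ["A9", "A4", "A4", "A1"]
customer_modes = ["B1", "B4", "B4", "B6"]

labels = [segment_label(a, m) for a, m in zip(customer_levels, customer_modes)]
matrix = segment_matrix(customer_levels, customer_modes)

print(labels)
for mode, row in zip(reversed(MODES), matrix):
    print(mode, row)
```

Counting customers per cell in this way shows which of the 54 groups are actually populated, which is a natural starting point for deciding where separate marketing schemes are worthwhile.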