Three-Way K-Means - Integrating K-Means and Three-Way Decision
https://doi.org/10.1007/s13042-018-0901-y
ORIGINAL ARTICLE
Abstract
The traditional k-means, which unambiguously assigns each object to exactly one cluster with a crisp boundary, does
not adequately reflect the fact that a cluster may not have a well-defined boundary. This paper presents a three-way
k-means clustering algorithm based on the three-way strategy. In the proposed method, an overlap clustering is used to obtain
the supports (unions of the core regions and the fringe regions) of the clusters, and perturbation analysis is applied to separate
the core regions from the supports. The difference between the support and the core region is regarded as the fringe region
of the specific cluster. Therefore, a three-way explanation of the cluster is naturally formed. The Davies–Bouldin index (DB),
Average Silhouette index (AS) and Accuracy (ACC) are computed using the core regions to evaluate the structure of the
three-way k-means results. The experimental results on UCI data sets and USPS data sets show that this strategy is effective in
improving the structure of clustering results.
International Journal of Machine Learning and Cybernetics
data set and to construct information granules. Based on the rough set theory [26–28], Lingras and West [18] introduced
rough k-means (RKM) clustering, which describes each cluster not only by a center but also by a pair of lower and upper
bounds. The lower and upper approximations are assigned different weights that are used to compute the new centers.
Involving membership degrees, Mitra et al. [24] put forward a rough-fuzzy k-means (RFKM) clustering method, which
incorporates membership into the RKM framework. The lower and upper bounds are determined according to the membership
degrees rather than the individual absolute distances between an object and its neighbors. As a conceptual and algorithmic
bridge between rough sets and fuzzy sets, the shadowed set [29] provides an alternative mechanism for handling uncertainty
and has been successfully used in clustering analysis, resulting in shadowed k-means (SKM) [25].

Three-way decision, as a new field of study for complex problem solving, was proposed by Yao [43–46]. It extends the
commonly used binary-decision models by adding a deferred decision. The main idea of three-way decision is to divide a
universe into three disjoint regions and to adopt different strategies for different regions. The deferred decision is viewed as
the third decision-making behavior when the information is not sufficient to determine the state of an object, that is, whether
the object is accepted or rejected. Many soft computing models, such as interval sets, rough sets, fuzzy sets and shadowed
sets, have tri-partitioning properties and can be reinvestigated within the framework of three-way decision [46]. Clearly, there
may exist at least three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially
(i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, Yu et al.
[49, 50, 52] introduced three-way clustering based on three-way decision theory. In three-way clustering, a cluster is
represented by a pair of nested sets called the core and the support of the cluster, respectively. The core, the difference
between the support and the core (i.e., the fringe region), and the complement of the support give rise to a trisection of the
space. A trisection captures the three types of relationships between a cluster and an object.

As one of the classical hard clustering algorithms, k-means represents a cluster by a single set. For each cluster, the set
naturally divides the space into two regions: objects belong to the cluster if they are in the set; otherwise they do not. Hence,
only two relationships are considered in k-means. In this paper, we present a three-way k-means (TWKM for short)
clustering method by incorporating three-way decisions into k-means clustering. In the proposed method, we view each
cluster as a set and represent it by a core region (Co), a fringe region (Fr) and a trivial region (Tr). Since Tr can be expressed
as the complement of the union of Co and Fr, a three-way cluster can be represented by a pair consisting of the set of core
objects and the set of fringe objects. Each object is either a member of at most one core region or a member of at least one
fringe region. Figure 1 shows one possible cluster produced by the three-way k-means clustering method. The elements in
the fringe region may belong only to this cluster or also to the fringe regions of other clusters.

In order to clarify the main differences between TWKM and the traditional k-means, we take the objects in Fig. 2 as an
example. Figure 3 is the clustering result of the traditional k-means method with k = 2. From the result we can see that an
object is either in C1 or not in C1; the same is true for C2. There is a crisp boundary for each cluster. The requirement of a
sharp boundary leads to easy analytical results, but may not be good enough for characterizing uncertainty. If we apply the
proposed TWKM algorithm to the data set in Fig. 2, we obtain the clustering result shown in Fig. 4.
Fig. 1 A demonstrative cluster of TWKM
Fig. 3 Clustering results of traditional k-means
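The (Co, Fr) representation described above can be sketched as a small data structure. This is a minimal illustration, assuming objects are identified by hashable labels; the class name `ThreeWayCluster` and its methods are hypothetical and not taken from the paper. Because Tr is the complement of Co ∪ Fr, storing the pair (core, fringe) is enough to recover the full trisection of the universe.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Tuple


@dataclass(frozen=True)
class ThreeWayCluster:
    """Hypothetical container: a three-way cluster stored as (core, fringe)."""
    core: FrozenSet[Hashable]
    fringe: FrozenSet[Hashable]

    def __post_init__(self) -> None:
        # Core and fringe of one cluster must not overlap (fringe = support - core).
        if self.core & self.fringe:
            raise ValueError("core and fringe regions overlap")

    @property
    def support(self) -> FrozenSet[Hashable]:
        # Support = union of the core region and the fringe region.
        return self.core | self.fringe

    def trisect(self, universe: FrozenSet[Hashable]) -> Tuple[frozenset, frozenset, frozenset]:
        """Return (Co, Fr, Tr); Tr is the complement of the support in the universe."""
        if not self.support <= universe:
            raise ValueError("cluster contains objects outside the universe")
        return self.core, self.fringe, universe - self.support

    def relation(self, obj: Hashable) -> str:
        # Map an object to one of the three relationships described in the text.
        if obj in self.core:
            return "belong-to fully"
        if obj in self.fringe:
            return "belong-to partially"
        return "not belong-to fully"


if __name__ == "__main__":
    V = frozenset(range(1, 9))
    c1 = ThreeWayCluster(core=frozenset({1, 2, 3}), fringe=frozenset({4, 5}))
    co, fr, tr = c1.trisect(V)
    print(sorted(co), sorted(fr), sorted(tr))   # [1, 2, 3] [4, 5] [6, 7, 8]
    print(c1.relation(4), "|", c1.relation(7))  # belong-to partially | not belong-to fully
```

Making the class immutable keeps a cluster's trisection consistent once it has been computed; a clustering routine would build new instances rather than mutate old ones.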
Figure panels: A universal set, Trisecting, Acting
text classification [13, 15], risk decision [13, 17], government decision [19], conflict analysis [10], web-based support
systems [42], and so on. Many recent studies have further investigated extensions and applications of three-way decision.
For example, Zhang et al. [54] presented a three-way decision model based on two types of classification errors and two
types of uncertain classifications. Yang et al. [41] proposed a unified model of sequential three-way decision and multilevel
incremental processing for complex problem solving by making use of a granular structure. Hao et al. [5] developed a
sequential three-way decision model to solve the optimal scale selection problem in a dynamic multi-scale decision table.
Zhang et al. [53] established a dynamic three-way decision model based on the updating of attribute values. In addition,
Qi et al. [30] introduced three-way decision into formal concept analysis and proposed the notion of a three-way concept,
whose main idea is to incorporate ternary classification into the design of the extension or intension of a concept. Li et al.
[14] proposed an axiomatic approach to describing three-way concepts by means of multi-granularity. Three-way concept
analysis has attracted a lot of research, and a series of related results have been obtained [6, 12, 32, 48]. In the clustering
field, Yu et al. [49, 50, 52] presented a framework of three-way clustering that represents each cluster by a pair of sets
called the core region and the fringe region. Afridi et al. [1] presented a three-way clustering approach for handling missing
data by using the game-theoretic rough set model. Wang and Yao [39] proposed a contraction-and-expansion based
three-way clustering framework called CE3, inspired by the ideas of erosion and dilation from mathematical morphology.
Yu et al. [51] investigated an active three-way clustering method via low-rank matrices. All the above results enrich the
theories and models of three-way decision.

Recently, Singh [33] proposed another three-way method to generate three-way fuzzy concepts and their hierarchical-order
visualization in the concept lattice. Singh's method comes from the theory of neutrosophic sets, which uses three functions,
namely a truth-membership function, an indeterminacy-membership function and a falsity-membership function, to
represent a set. Singh's method shares some similarity with Yao's three-way decision and, at the same time, differs from it
in several ways. On the one hand, both are based on the tri-partition methodology, which provides flexible ways for
human-like problem solving and information processing. On the other hand, Yao's method comes from the
decision-theoretic rough set model, which has led to the concept of three-way decision. A basic idea of Yao's three-way
decision is to divide a universal set into three pairwise disjoint regions and to process the three regions accordingly.
Different from Yao's method, Singh's method divides a universal set into three regions independently, and these regions
may intersect each other. Most recent studies of Singh's method focus on the fuzzy concept lattice [34–36] and its
applications [37].

2.2 Rough k-means

The traditional k-means algorithm [38] proceeds by partitioning n objects into k non-empty subsets. During each partition,
the centroids or means of the clusters are computed as

$$x_i = \frac{\sum_{v \in C_i} v}{|C_i|}, \qquad (1)$$

where v is an object in $C_i$ and $|C_i|$ is the number of objects in cluster $C_i$. The process is repeated until convergence,
i.e., until there are no more new assignments of objects to the clusters.

In the rough set theory [26–28], a rough concept is approximated by a pair of exact concepts, called the lower and upper
approximations. The lower approximation is the set of objects definitely belonging to the vague concept, whereas the upper
approximation is the set of objects possibly belonging to it. Correspondingly, three pairwise disjoint regions are formed,
i.e., the positive, boundary and negative regions. Figure 6 provides a schematic diagram of a rough set X with POS(X),
BND(X) and NEG(X), consisting of granules coming from the rectangular grid.

By incorporating rough set theory into the traditional k-means, Lingras and West [18] introduced RKM clustering. In RKM,
the concept of k-means is extended by viewing each cluster as an interval set or rough set X. It is characterized by the lower
and upper approximations $\underline{R}X$ and $\overline{R}X$, respectively, with the following properties: (i) an object
$x_k$ can be part of at most one lower approximation; (ii) if $x_k \in \underline{R}X$ of cluster X, then simultaneously
$x_k \in \overline{R}X$; (iii) if $x_k$ is not part of any lower approximation, then it belongs to two or more upper
approximations. This permits overlaps between clusters.

In order to compute the centroid of each cluster in RKM, the right side of Eq. (1) is split into two parts. Since the patterns
lying in the lower approximation definitely belong to a rough cluster, they are assigned a higher

Fig. 6 Three disjoint regions in rough set model (panel labels: X, NEG(X), BND(X), POS(X))
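Equation (1) and the rough-set notions above can be illustrated with a short sketch. It assumes objects are rows of a NumPy array and that granules (equivalence classes) are given as sets of row indices; all function names are hypothetical. `rkm_centroid` follows the commonly cited Lingras–West scheme of weighting the lower approximation and the boundary separately, and the weights `w_lower=0.7` and `w_upper=0.3` are illustrative assumptions, not values from this paper.

```python
import numpy as np


def kmeans_centroid(X: np.ndarray, members: list[int]) -> np.ndarray:
    """Eq. (1): the centroid of C_i is the mean of the objects assigned to it."""
    return X[members].mean(axis=0)


def approximations(granules: list[set[int]], target: set[int], universe: set[int]) -> dict:
    """Rough-set regions of `target` from a partition of `universe` into granules.

    Lower approximation = union of granules fully inside the target (POS);
    upper approximation = union of granules that intersect the target.
    """
    lower = set().union(*(g for g in granules if g <= target))
    upper = set().union(*(g for g in granules if g & target))
    return {"POS": lower, "BND": upper - lower, "NEG": universe - upper}


def rkm_centroid(X: np.ndarray, lower, upper, w_lower: float = 0.7, w_upper: float = 0.3) -> np.ndarray:
    """RKM-style centroid: weighted mix of the lower approximation and the boundary.

    The weights are assumed example values with w_lower > w_upper.
    """
    boundary = sorted(set(upper) - set(lower))
    lower = sorted(lower)
    if lower and boundary:
        return w_lower * X[lower].mean(axis=0) + w_upper * X[boundary].mean(axis=0)
    if boundary:  # empty lower approximation: use the boundary objects only
        return X[boundary].mean(axis=0)
    return X[lower].mean(axis=0)  # empty boundary: reduces to Eq. (1)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]])
    print(kmeans_centroid(X, [0, 1, 2]))  # [0.33333333 0.33333333]
    regions = approximations([{0, 1}, {2, 3}], target={0, 1, 2}, universe={0, 1, 2, 3})
    print(regions)  # {'POS': {0, 1}, 'BND': {2, 3}, 'NEG': set()}
    print(rkm_centroid(X, lower=[0, 1], upper=[0, 1, 3]))
```

When a cluster has no boundary objects, the RKM centroid coincides with the k-means centroid of Eq. (1), which is why RKM is usually described as an extension of k-means.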
Property (I) demands that each cluster be non-empty. Property (II) states that any element in V must be in the core or the
fringe region of at least one cluster; it is possible for an element v ∈ V to belong to more than one cluster. Property (III)
requires that the core regions of clusters be pairwise disjoint. Based on the above discussion, we have the following family
of clusters to represent the result of three-way clustering:

the specific cluster. We suppose that the clustering results satisfy the following properties.

• Property 1: An object can be a member of at most one core region.
• Property 2: An object can be a member of one or more fringe regions.
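Properties (I)–(III) and Properties 1–2 can be checked mechanically for a candidate clustering result. The function below is a minimal sketch, assuming each cluster is given as a pair (core, fringe) of Python sets over a universe V. The name `check_three_way_family` is hypothetical, and reading Property (I) as "the core region is non-empty" is our assumption.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Cluster = Tuple[Set[Hashable], Set[Hashable]]  # (core region, fringe region)


def check_three_way_family(universe: Iterable[Hashable], clusters: List[Cluster]) -> List[str]:
    """Return the violated properties; an empty list means all checks pass."""
    V = set(universe)
    problems = []
    # Property (I): every cluster must be non-empty (read here as: non-empty core).
    for i, (core, _fringe) in enumerate(clusters):
        if not core:
            problems.append(f"(I): cluster {i} has an empty core")
    # Property (II): every object lies in the core or fringe of at least one cluster.
    covered = set().union(*(core | fringe for core, fringe in clusters))
    if not V <= covered:
        problems.append(f"(II): uncovered objects {sorted(V - covered, key=repr)}")
    # Property (III) / Property 1: core regions are pairwise disjoint,
    # so each object belongs to at most one core region.
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            shared = clusters[i][0] & clusters[j][0]
            if shared:
                problems.append(f"(III): cores {i} and {j} share {sorted(shared, key=repr)}")
    # Property 2 allows an object to sit in several fringe regions, so nothing to check.
    return problems


if __name__ == "__main__":
    V = {1, 2, 3, 4, 5, 6}
    good = [({1, 2}, {3}), ({5, 6}, {3, 4})]  # object 3 lies in two fringe regions
    bad = [({1, 2}, set()), ({2, 5}, {6})]    # cores overlap on 2; 3 and 4 uncovered
    print(check_three_way_family(V, good))  # []
    print(check_three_way_family(V, bad))   # ['(II): uncovered objects [3, 4]', '(III): cores 0 and 1 share [2]']
```

Returning a list of messages rather than raising on the first failure makes it easy to see every violated property at once when debugging a clustering routine.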
Suppose the centroid of $\mathrm{support}(C_i)$ is $x_i$ and $m_i$ is the number of elements in $\mathrm{support}(C_i)$. We
adopt different strategies for different types. Objects of type I are assigned to the fringe region of $C_i$ because they belong
to at least two clusters. For each object v of type II, we add $m_i$ copies of v to $\mathrm{support}(C_i)$ and denote the new
cluster by $\mathrm{support}(C_i^*)$. We then calculate the new centroid $x_i^*$ of $\mathrm{support}(C_i^*)$ by (5) and the
difference between $x_i^*$ and $x_i$. For a given parameter $\varepsilon_2$, if $|x_i^* - x_i| \le \varepsilon_2$, v is assigned to
the core region of $C_i$; otherwise, v is assigned to the fringe region of $C_i$. Applying the same strategy to each
$\mathrm{support}(C_i)$ $(i = 1, \ldots, k)$, we obtain the core region and fringe region of each cluster. Therefore, a
three-way cluster is naturally formed.

The above clustering procedure is referred to as TWKM clustering, and Algorithm 1 describes its process. In Algorithm 1,
Lines 3 to 15 find the support of each cluster by iteratively updating the centroids of the supports, and Lines 16 to 29
separate the core regions from the support sets by centroid perturbation analysis. Fig. 7 shows the flowchart of Algorithm 1.
The time complexity of Algorithm 1 is $O(tknm) + O(knm)$ and its space complexity is $O((k + n)m)$, where t, n and m
are the numbers of iterations, objects and attributes, respectively.
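The core/fringe separation step can be sketched as follows. The sketch rests on several assumptions that are not spelled out in this excerpt: Eq. (5) is taken to be the ordinary mean of the multiset $\mathrm{support}(C_i^*)$; "type I" objects are taken to be those lying in the supports of two or more clusters and "type II" objects those lying only in $\mathrm{support}(C_i)$; and $|\cdot|$ is taken to be the Euclidean norm. The function name `split_core_fringe` and the example threshold are hypothetical. Under the mean assumption, adding $m_i$ copies of v gives $x_i^* = (m_i x_i + m_i v)/(2m_i) = (x_i + v)/2$, so the test $|x_i^* - x_i| \le \varepsilon_2$ is equivalent to $|v - x_i| \le 2\varepsilon_2$; the code still builds the perturbed support explicitly to mirror the description.

```python
import numpy as np


def split_core_fringe(X: np.ndarray, supports: list[set[int]], eps2: float):
    """Split each support(C_i) into a core region and a fringe region.

    X        : (n, m) array of objects.
    supports : one set of row indices per cluster; supports may overlap.
    eps2     : perturbation threshold (epsilon_2 in the text).
    """
    # Count how many supports each object belongs to (type I vs type II).
    counts: dict[int, int] = {}
    for s in supports:
        for v in s:
            counts[v] = counts.get(v, 0) + 1

    cores, fringes = [], []
    for s in supports:
        idx = sorted(s)
        m_i = len(idx)
        x_i = X[idx].mean(axis=0)  # centroid of support(C_i), assumed to be the mean
        core, fringe = set(), set()
        for v in idx:
            if counts[v] >= 2:  # type I: lies in two or more supports -> fringe
                fringe.add(v)
                continue
            # Type II: append m_i copies of v and recompute the centroid of support(C_i*).
            perturbed = np.vstack([X[idx], np.repeat(X[[v]], m_i, axis=0)])
            x_star = perturbed.mean(axis=0)
            if np.linalg.norm(x_star - x_i) <= eps2:
                core.add(v)
            else:
                fringe.add(v)
        cores.append(core)
        fringes.append(fringe)
    return cores, fringes


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0],
                  [2.0, 2.0], [2.2, 2.0], [2.0, 2.2]])
    supports = [{0, 1, 2, 3}, {3, 4, 5, 6}]  # object 3 is shared (type I)
    cores, fringes = split_core_fringe(X, supports, eps2=0.2)
    print(cores, fringes)
```

Because the shift of the centroid is half the distance from v to $x_i$, objects far from the centroid of their support end up in the fringe even when they belong to no other cluster.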
$$\mathrm{ACC} = \frac{\sum_{c=1}^{k} n_c^{j}}{n}. \qquad (10)$$
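Eq. (10) restricted to core regions can be sketched as below. The definition of $n_c^j$ is not part of this excerpt, so the sketch assumes the common reading that $n_c^j$ is the number of objects of the dominant (majority) class j in cluster c; following the evaluation described in the results, only core objects are counted, so fringe objects drop out of both $n_c^j$ and n. The function name `core_accuracy` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Set


def core_accuracy(cores: List[Set[int]], labels: Sequence[Hashable]) -> float:
    """ACC of Eq. (10) computed on core regions only.

    cores  : one set of object indices per cluster (its core region).
    labels : ground-truth class label of every object, indexed by object.
    Assumes n_c^j is the count of the majority class j inside core c and
    n is the total number of core objects (fringe objects are excluded).
    """
    matched = 0
    n = 0
    for core in cores:
        if not core:
            continue
        counts = Counter(labels[i] for i in core)
        matched += counts.most_common(1)[0][1]  # n_c^j for the dominant class j
        n += len(core)
    return matched / n if n else 0.0


if __name__ == "__main__":
    labels = ["a", "a", "a", "b", "b", "b", "b", "a"]
    cores = [{0, 1, 3}, {4, 5, 6}]  # objects 2 and 7 are fringe objects
    print(round(core_accuracy(cores, labels), 4))  # 0.8333
```

This majority-class reading lets two clusters map to the same class; a stricter variant would use a one-to-one cluster-to-class matching instead.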
in Tables 2, 3, 4, 5, 6 and 7, in which the optimal results among the different algorithms are marked in bold. The total
CPU time of 100 runs on each data set, measured in seconds, is recorded in Table 8 as well.

Tables 2, 3, 4 and 5 show the average and the best performances of the DB and AS values obtained by k-means, k-medoids,
FKM, RKM and TWKM. From Tables 2, 3, 4 and 5, we find that TWKM outperforms the other algorithms in terms of both
DB and AS values, for both the best and the average performances, on most of the data sets. The improvement can be
attributed to the fact that each cluster is represented by its lower bound (the core region), which increases the degree of
separation between clusters and decreases the degree of scatter within clusters, since the fringe regions have been
successfully marked out. Although the DB and AS values obtained by RKM on the WDBC, GLASS and BANK sets are
similar or superior to those obtained by TWKM, the computing time of TWKM is far less than that of RKM.

Tables 6 and 7 list the average and the best performances of ACC, respectively. Both the average and the best ACC values
obtained by TWKM are superior to those obtained by the other methods on WINE, WDBC, OCCUPANCY, MAGIC,
USPS-08 and USPS-49. However, other methods exceed TWKM on GLASS, BANK and USPS-49. This is because ACC is
computed using the core region to represent the corresponding cluster, and the objects in the fringe regions are excluded
from the total number of objects, which means that both $n_c^j$ and n in Eq. (10) become smaller.

Table 1 A description of data sets used in the experiments
ID  Data sets  Samples  Attributes  Classes

Table 2 Average performances of DB value
Data sets  k-means   k-medoids  FKM       RKM       TWKM
WINE       1.3157    1.3553     1.3181    1.3025    1.1846
WDBC       1.1363    1.1336     1.1446    1.1386    1.1266
GLASS      1.2663    0.9679     1.5416    1.1040    1.1440
BANK       1.1913    1.1817     1.1981    1.1592    1.1550
OCCUPANCY  0.6835    0.6833     0.6850    0.6835    0.1019
MAGIC      1.2603    1.3232     1.3920    1.2088    0.7790
USPS-08    1.9291    1.9305     1.9367    1.9312    1.9202
USPS-49    2.5573    2.6774     2.5683    2.5436    2.5009
USPS-3568  2.8197    2.9993     3.4175    2.8195    2.7785

Table 3 Best performances of DB value
Data sets  k-means   k-medoids  FKM       RKM       TWKM
WINE       1.3053    1.3553     1.3181    1.2508    1.1656
WDBC       1.1363    1.1336     1.1446    1.0887    1.1190
GLASS      0.9900    0.9546     1.2874    0.9335    0.8973
BANK       1.1911    1.1479     1.1980    1.1441    1.1550
OCCUPANCY  0.6835    0.6830     0.6850    0.6835    0.1019
MAGIC      1.2603    1.2538     1.3920    1.2083    0.7790
USPS-08    1.9268    1.9228     1.9337    1.9158    1.9157
USPS-49    2.5443    2.6774     2.5404    2.5104    2.4972
USPS-3568  2.5378    2.9165     2.5277    2.5674    2.5258

Table 4 Average performances of AS value
Data sets  k-means   k-medoids  FKM       RKM       TWKM

6 Concluding remarks

In most of the existing studies, a cluster is represented by a single set, and the set naturally divides the space into two
regions: an object belongs to the cluster if it is in the set; otherwise it does not. Clearly, there may be three types of
relationships between an object and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially),
and not belong-to fully. Based on these relationships, we developed a three-way k-means clustering method in this paper by
integrating k-means and three-way decisions. In the proposed method, an overlap clustering is used to obtain the supports of
the clusters and perturbation analysis is applied to separate the core regions from the supports. The differences between the
supports and the core regions are regarded as the fringe regions of the specific clusters. Therefore, a three-way explanation
of the cluster is naturally formed.
Table 5 Best performances of AS value
Data sets  k-means   k-medoids  FKM       RKM       TWKM
WINE       0.4764    0.4608     0.4741    0.5050    0.5356
WDBC       0.5765    0.5828     0.5683    0.6036    0.5848
GLASS      0.4912    0.5441     0.4421    0.6671    0.5739
BANK       0.5004    0.4974     0.4927    0.5254    0.5204
OCCUPANCY  0.7812    0.7811     0.7782    0.7812    0.9924
MAGIC      0.6133    0.6191     0.4426    0.5016    0.7279
USPS-08    0.3297    0.3280     0.3284    0.3323    0.3326
USPS-49    0.2365    0.2098     0.2208    0.2365    0.2373
USPS-3568  0.1766    0.1558     0.1562    0.1781    0.1782

Table 8 The total run time of 100 runs (seconds)
Data sets  k-means   k-medoids  FKM       RKM       TWKM
WINE       0.31      0.83       0.26      90.76     0.55
WDBC       0.38      3.87       0.64      266.12    0.94
GLASS      0.38      1.81       1.17      108.03    1.01
BANK       0.57      19.12      0.95      610.06    3.29
OCCUPANCY  3.57      65.57      21.48     13568.81  44.06
MAGIC      7.98      73.49      40.92     12158.70  64.52
USPS-08    5.24      59.19      49.94     1532.54   16.96
USPS-49    5.15      34.55      12.35     1105.43   19.01
USPS-3568  15.31     211.60     38.78     2777.60   59.53
11. LeCun Y, Bottou L, Bengio Y, Haffner P (1990) USPS zip code handwritten digits database. http://www.ics.uci.edu/mlearn/MLRepository.html
12. Li CP, Li JH, He M (2016) Concept lattice compression in incomplete contexts based on k-medoids clustering. Int J Mach Learn Cybern 7:539–552
13. Li HX, Zhou XZ (2011) Risk decision making based on decision-theoretic rough set: a three-way view decision model. Int J Comput Inf Sys 4:1–11
14. Li JH, Huang CC, Qi JJ, Qian YH, Liu WQ (2017) Three-way cognitive concept learning via multi-granularity. Inf Sci 378:244–263
15. Li W, Miao DQ, Wang WL, Zhang N (2010) Hierarchical rough decision theoretic framework for text classification. In: IEEE international conference on cognitive informatics, pp 484–489
16. Li Y, Zhang C, Swan JR (2000) An information filtering model on the web and its application in jobagent. Knowl-Based Syst 13:285–296
17. Liang DC, Liu D (2015) A novel risk decision-making based on decision-theoretic rough sets under hesitant fuzzy information. IEEE Trans Fuzzy Syst 23:237–247
18. Lingras P, West C (2004) Interval set clustering of web users with rough k-means. J Intell Inf Syst 23:5–16
19. Liu D, Li TR, Liang DC (2012) Three-way government decision analysis with decision-theoretic rough sets. Int J Uncertain Fuzz 20:119–132
20. Liu D, Yao YY, Li TR (2011) Three-way investment decisions with decision-theoretic rough sets. Int J Comput Inf Sys 4:66–74
21. Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
22. Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal 24:1650–1654
23. Mirkin B (1991) Mathematical classification and clustering. Kluwer, Boston
24. Mitra S, Banka H, Pedrycz W (2006) Rough-fuzzy collaborative clustering. IEEE Trans Syst Man Cybern B 36:795–805
25. Mitra S, Pedrycz W, Barman B (2010) Shadowed c-means: integrating fuzzy and rough clustering. Pattern Recognit 43:1282–1291
26. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11:341–356
27. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Boston
28. Pawlak Z (2004) Some issues on rough sets. Trans Rough Sets I 3100:1–58
29. Pedrycz W (1998) Shadowed sets: representing and processing fuzzy sets. IEEE Trans Syst Man Cybern B 28:103–109
30. Qi JJ, Qian T, Wei L (2016) The connections between three-way and classical concept lattices. Knowl-Based Syst 91:143–151
31. Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
32. Shivhare R, Cherukuri AK (2017) Three-way conceptual approach for cognitive memory functionalities. Int J Mach Learn Cybern 8:21–34
33. Singh PK (2016) Three-way fuzzy concept lattice representation using neutrosophic set. Int J Mach Learn Cybern 8:1–11
34. Singh PK (2017) Interval-valued neutrosophic graph representation of concept lattice and its (𝛼, 𝛽, 𝛾)-decomposition. Arab J Sci Eng 43:1–18
35. Singh PK (2018) Similar vague concepts selection using their Euclidean distance at different granulation. Cogn Comput 10:228–241
36. Singh PK (2018) Concept learning using vague concept lattice. Neural Process Lett 48:31–52
37. Singh PK (2017) Medical diagnoses using three-way fuzzy concept lattice and their Euclidean distance. Comp Appl Math 3:1–24
38. Tou JT, Gonzalez RC (1974) Pattern recognition principles. Addison-Wesley, London
39. Wang PX, Yao YY (2018) CE3: A three-way clustering method based on mathematical morphology. Knowl-Based Syst 155:54–65
40. Xu R, Wunsch DC (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16:645–678
41. Yang X, Li TR, Fujita H, Liu D, Yao YY (2017) A unified model of sequential three-way decisions and multilevel incremental processing. Knowl-Based Syst 134:172–188
42. Yao JT (2015) Web-based medical decision support systems for three-way medical decision making with game-theoretic rough sets. IEEE Trans Fuzzy Syst 23:3–15
43. Yao YY (2009) Three-way decision: an interpretation of rules in rough set theory. In: Proceedings of RSKT'09, vol 5589, pp 642–649
44. Yao YY (2010) Three-way decisions with probabilistic rough sets. Inf Sci 180:341–353
45. Yao YY (2011) The superiority of three-way decisions in probabilistic rough set models. Inf Sci 181:1080–1096
46. Yao YY (2012) An outline of a theory of three-way decisions. In: Proceedings of RSCTC'12, vol 7413, pp 1–17
47. Yao YY (2016) Three-way decisions and cognitive computing. Cogn Comput 8:543–554
48. Yao YY (2017) Interval sets and three-way concept analysis in incomplete contexts. Int J Mach Learn Cybern 8:3–20
49. Yu H (2017) A framework of three-way cluster analysis. In: Proceedings of international joint conference on rough sets, pp 300–312
50. Yu H, Jiao P, Yao YY, Wang GY (2016) Detecting and refining overlapping regions in complex networks with three-way decisions. Inf Sci 373:21–41
51. Yu H, Wang XC, Wang GY, Zeng XH (2018) An active three-way clustering method via low-rank matrices for multi-view data. Inf Sci. https://doi.org/10.1016/j.ins.2018.03.009
52. Yu H, Zhang C, Wang GY (2016) A tree-based incremental overlapping clustering method using the three-way decision theory. Knowl-Based Syst 91:189–203
53. Zhang QH, Lv GX, Chen YH, Wang GY (2018) A dynamic three-way decision model based on the updating of attribute values. Knowl-Based Syst 142:71–84
54. Zhang QH, Xia DY, Wang GY (2017) Three-way decision model with two types of classification errors. Inf Sci 420:431–453