Three-Way K-Means - Integrating K-Means and Three-Way Decision

This document summarizes a research article that proposes a new three-way k-means clustering algorithm. The algorithm represents each cluster with a core region, fringe region, and trivial region, allowing objects to belong fully or partially to clusters. This provides a more nuanced characterization than standard k-means. The algorithm is evaluated on benchmark datasets and shown to improve clustering structure over traditional k-means according to several validity indices.


International Journal of Machine Learning and Cybernetics

https://doi.org/10.1007/s13042-018-0901-y

ORIGINAL ARTICLE

Three-way k-means: integrating k-means and three-way decision


Pingxin Wang1,2 · Hong Shi3 · Xibei Yang3 · Jusheng Mi2

Received: 12 December 2017 / Accepted: 12 December 2018


© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
The traditional k-means, which unambiguously assigns each object to a single cluster with a crisp boundary, does not adequately reflect the fact that a cluster may not have a well-defined boundary. This paper presents a three-way k-means clustering algorithm based on the three-way strategy. In the proposed method, an overlap clustering is used to obtain the supports (the unions of the core regions and the fringe regions) of the clusters, and perturbation analysis is applied to separate the core regions from the supports. The difference between the support and the core region is regarded as the fringe region of the specific cluster. Therefore, a three-way explanation of each cluster is naturally formed. The Davies–Bouldin index (DB), Average Silhouette index (AS) and Accuracy (ACC) are computed using the core regions to evaluate the structure of the three-way k-means results. Experimental results on UCI data sets and USPS data sets show that this strategy is effective in improving the structure of clustering results.

Keywords Three-way clustering · Three-way decision · K-means · Cluster validity index

* Correspondence: Pingxin Wang, [email protected] · Hong Shi, [email protected] · Xibei Yang, [email protected] · Jusheng Mi, [email protected]

1 School of Science, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, People's Republic of China
2 College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024, People's Republic of China
3 School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, People's Republic of China

1 Introduction

As one of the most fundamental topics in both data mining and machine learning, clustering has been widely used in data analysis [8, 23]. The aim of clustering is to partition a set of given multivariate samples into several meaningful groups such that all members within a group are similar to each other and samples from different groups exhibit dissimilar characteristics. Research on clustering algorithms has received much attention and a number of clustering methods have been developed over the past decades. A comprehensive review of clustering algorithms can be found in [7, 40].

Roughly speaking, most of the existing clustering methods can be classed into two categories: hierarchical clustering and partitive clustering [9]. In this paper, we focus on the latter, which is categorized as a prototype-based model, i.e., each cluster can be represented by a prototype, leading to a concise description of the original data set. According to whether there is a crisp boundary between clusters, the various partitive clustering algorithms can be divided into hard clustering and soft clustering. Hard clustering methods are based on the assumption that a cluster is represented by a set with a crisp boundary. One of the most widely used hard clustering methods is k-means [21], where each object must be assigned to exactly one cluster. The requirement of a sharp boundary leads to easy analytical results, but may not adequately reflect the fact that a cluster may not have a well-defined boundary. In order to relax this requirement of a crisp boundary, many soft clustering methods based on k-means were proposed for different applications. Incorporating fuzzy sets into k-means clustering, Bezdek [2] proposed fuzzy k-means (FKM), in which a cluster is assumed to be represented by a fuzzy set that models a gradually changing boundary. It is often used to reveal the structure of a data set and to construct information granules. Based on rough set theory [26–28], Lingras and West [18] introduced rough k-means (RKM) clustering, which describes each cluster not only by a center, but also with a pair of lower and upper bounds. The lower and upper approximations are given different weights that are used to compute the new centers. Involving membership degrees, Mitra et al. [24] put forward a rough-fuzzy k-means (RFKM) clustering method, which incorporates membership in the RKM framework. The lower and upper bounds are determined according to the membership degrees, not the individual absolute distances between an object and its neighbors. As a conceptual and algorithmic bridge between rough sets and fuzzy sets, the shadowed set [29] provides an alternative mechanism for handling uncertainty and has been successfully used in clustering analysis, resulting in shadowed k-means (SKM) [25].

Three-way decision, as a new field of study for complex problem solving, was proposed by Yao [43–46]. It is an extension of the commonly used binary-decision models obtained by adding a deferred decision. The main idea of three-way decision is to divide a universe into three disjoint regions and to apply different strategies to the different regions. The deferred decision is viewed as a third decision-making behavior taken when the information is not sufficient to determine the state of an object, that is, whether the object is accepted or rejected. Many soft computing models, such as interval sets, rough sets, fuzzy sets and shadowed sets, have the tri-partitioning property and can be reinvestigated within the framework of three-way decision [46]. It is obvious that there may exist at least three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, Yu et al. [49, 50, 52] introduced three-way clustering based on three-way decision theory. In three-way clustering, a cluster is represented by a pair of nested sets called the core and the support of the cluster, respectively. The core, the difference between the support and the core (i.e., the fringe region), and the complement of the support give rise to a trisection of the space. A trisection captures the three types of relationships between a cluster and an object.

As one of the classical hard clustering algorithms, k-means represents a cluster by a single set. For each cluster, the set naturally divides the space into two regions: objects belong to the cluster if they are in the set, otherwise they do not. Hence, only two relationships are considered in the process of k-means. In this paper, we aim at presenting a three-way k-means (TWKM for short) clustering method by incorporating three-way decisions into k-means clustering. In the proposed method, we view each cluster as a set and represent it by a core region (Co), a fringe region (Fr) and a trivial region (Tr). Since Tr can be expressed as the complement of the union of Co and Fr, we can represent a three-way cluster by a pair consisting of the set of core objects and the set of fringe objects. Each object can be a member of at most one core region, or a member of at least one fringe region. Figure 1 shows one possible cluster produced by the three-way k-means clustering method. The elements in the fringe region may belong only to this cluster or also to the fringe regions of other clusters.

In order to clarify the main differences between TWKM and the traditional k-means, we take the objects in Fig. 2 as an example. Figure 3 is the clustering result of the traditional k-means method with k = 2. From the result we can see that an object is either in C1 or not in C1; the same is true of C2. There is a crisp boundary for each cluster. The requirement of a sharp boundary leads to easy analytical results, but may not be good enough for characterizing the uncertainty. If we apply the proposed TWKM algorithm to the data set in Fig. 2, we obtain the clustering result shown in Fig. 4.

Fig. 1  A demonstrative cluster of TWKM

Fig. 2  Schematic diagram of a data set

Fig. 3  Clustering results of traditional k-means (clusters C1 and C2)
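The (Co, Fr, Tr) representation described above can be made concrete with a small sketch. This is our own illustration, not code from the paper; the class and method names are assumptions chosen for readability.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreeWayCluster:
    """A cluster stored as the pair (core, fringe); the trivial
    region is derived as the complement of their union."""
    core: frozenset
    fringe: frozenset

    def support(self):
        # support(C) = Co(C) ∪ Fr(C)
        return self.core | self.fringe

    def trivial(self, universe):
        # Tr(C) = V − (Co(C) ∪ Fr(C))
        return frozenset(universe) - self.support()

# A toy universe of object ids 0..9.
c = ThreeWayCluster(core=frozenset({0, 1}), fringe=frozenset({2, 3}))
```

When the fringe is empty, `support()` equals the core and the cluster degenerates to an ordinary hard cluster, matching the special case noted in Sect. 2.3.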


Fig. 4  Clustering results of TWKM

Fig. 5  Trisecting-and-acting model: a universal set is trisected into Region I, Region II and Region III, and Strategy I, Strategy II and Strategy III are applied to them, respectively
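The trisecting step of Fig. 5 can be sketched in a few lines. This is an illustrative reading under our own assumptions: a real-valued evaluation function and a pair of thresholds α > β, as formalized later in Definition 1.

```python
def trisect(universe, nu, alpha, beta):
    """Split `universe` into (POS, NEG, BND) regions using an
    evaluation function `nu` and thresholds beta < alpha on the
    usual order of real numbers."""
    pos = {x for x in universe if nu(x) >= alpha}   # acceptance
    neg = {x for x in universe if nu(x) <= beta}    # rejection
    bnd = {x for x in universe if beta < nu(x) < alpha}  # deferment
    return pos, neg, bnd

# Example: evaluate objects by a membership-like score in [0, 1].
regions = trisect({1, 2, 3, 4, 5}, lambda x: x / 5, alpha=0.8, beta=0.4)
```

The three regions are pairwise disjoint by construction and jointly cover the universe, which is exactly the trisecting task of the model.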


We can see that objects near the cluster center are assigned to the core region and the other objects are assigned to the fringe region, which reveals a better structure than the result in Fig. 3.

The procedure of TWKM consists mainly of two steps. The first step is to obtain the support of each cluster, where the support is the union of the core region and the fringe region of the specified cluster. The second step is to separate the core region from the support set. The rest of this paper is organized as follows. In Sect. 2, we review some basic concepts of three-way decision, rough k-means and three-way clustering. In Sect. 3, we present the process and the algorithm of three-way k-means clustering. Evaluations of the algorithm and experimental results are reported in Sects. 4 and 5, respectively. In Sect. 6, we give some concluding remarks and point out some future research problems.

2 Preliminaries

To facilitate the description of the proposed method, we introduce some basic concepts related to this paper, which include three-way decision, rough k-means and three-way clustering.

2.1 Three-way decision

The concept of three-way decision, which was first proposed by Yao [45, 46], is an extension of the commonly used binary-decision model obtained by adding a third option, and is used to interpret the three regions of a rough set. The positive, negative and boundary regions are viewed, respectively, as the regions of acceptance, rejection and noncommitment in a ternary classification. The positive and negative regions can be used to induce rules of acceptance and rejection. Whenever it is impossible to make an acceptance or a rejection decision, the third, noncommitment decision is made. One usually makes a decision based on available information and evidence. When the evidence is insufficient or too weak, it might be impossible to make either a positive or a negative decision, and one chooses an alternative decision that is neither yes nor no. The third option may also be referred to as a deferment decision that requires further judgment. With an ordered evaluation function, the three regions are formally defined as follows.

Definition 1 (Three-way decision with an ordered set [46]) Suppose (L, ⪯) is a totally ordered set, that is, ⪯ is a total order. For two elements α, β with β ≺ α, suppose that the set of designated values for acceptance is given by L+ = {t ∈ L ∣ t ⪰ α} and the set of designated values for rejection is given by L− = {b ∈ L ∣ b ⪯ β}. For an evaluation function ν : U → L, the three regions are defined by:

POS(α,β)(ν) = {x ∈ U ∣ ν(x) ⪰ α},
NEG(α,β)(ν) = {x ∈ U ∣ ν(x) ⪯ β},
BND(α,β)(ν) = {x ∈ U ∣ β ≺ ν(x) ≺ α}.

Three-way decision has been shown to build on solid cognitive foundations and is a class of effective methods commonly used in human problem solving and information processing [47]. The idea of three-way decision is commonly used in real life and widely applied in many fields and disciplines. Many soft computing models for learning uncertain concepts, such as interval sets, rough sets, fuzzy sets and shadowed sets, have the tri-partitioning property and can be reinvestigated within the framework of three-way decisions [45]. The theory of three-way decision embraces ideas from these theories and introduces its own notions, concepts, methods and tools. Recently, Yao [47] presented a trisecting-and-acting model, which is depicted in Fig. 5. The model explains three-way decision in terms of two basic tasks: the first is to divide a universal set into three pairwise disjoint regions, and the second is to develop appropriate strategies for the different regions.

Since three-way decision was proposed by Yao, we have witnessed fast-growing developments and applications of three-way approaches in many fields and disciplines, such as investment decision [17, 20], information filtering [16], text classification [13, 15], risk decision [13, 17], government decision [19], conflict analysis [10], web-based support systems [42], and so on. Many recent studies further investigated extensions and applications of three-way decision. For example, Zhang et al. [54] presented a three-way decision model based on two types of classification errors and two types of uncertain classifications. Yang et al. [41] proposed a unified model of sequential three-way decision and multilevel incremental processing for complex problem solving by making use of a granular structure. Hao et al. [5] developed a sequential three-way decision model to solve the optimal scale selection problem in a dynamic multi-scale decision table. Zhang et al. [53] established a dynamic three-way decision model based on the updating of attribute values. In addition, Qi et al. [30] introduced three-way decision into formal concept analysis and proposed the notion of a three-way concept, whose main idea is to incorporate ternary classification into the design of the extension or intension of a concept. Li et al. [14] proposed an axiomatic approach to describe three-way concepts by means of multi-granularity. The idea of three-way concept analysis has attracted a lot of research, and a series of related results have been obtained [6, 12, 32, 48]. In the clustering field, Yu et al. [49, 50, 52] presented a framework of three-way clustering which represents a cluster by a pair of sets called the core region and the fringe region. Afridi et al. [1] presented a three-way clustering approach for handling missing data by using a game-theoretic rough set model. Wang and Yao [39] proposed a contraction-and-expansion based three-way clustering framework called CE3, inspired by the ideas of erosion and dilation from mathematical morphology. Yu et al. [51] investigated an active three-way clustering method via low-rank matrices. All the above results enrich the theories and models of three-way decision.

Recently, Singh [33] proposed another three-way method to generate three-way fuzzy concepts and their hierarchical-order visualization in the concept lattice. Singh's method comes from the theory of neutrosophic sets, which uses three functions, namely a truth membership function, an indeterminacy membership function and a falsity membership function, to represent a set. Singh's method shares some similarity with Yao's three-way decision and, at the same time, differs from it in several ways. On the one hand, they are both based on the tri-partition methodology, which provides flexible ways for human-like problem solving and information processing. On the other hand, Yao's method comes from the decision-theoretic rough set model, which has led to the concept of three-way decision. A basic idea of Yao's three-way decision is to divide a universal set into three pairwise disjoint regions and to process the three regions accordingly. Different from Yao's method, Singh's method gives a way to divide a universal set into three regions independently, and these regions may intersect each other. Most recent studies of Singh's method focus on the fuzzy concept lattice [34–36] and its applications [37].

2.2 Rough k-means

The traditional k-means algorithm [38] proceeds by partitioning n objects into k non-empty subsets. During each partition, the centroids or means of the clusters are computed as

x_i = ( ∑_{v ∈ C_i} v ) / |C_i|,   (1)

where v ranges over the objects in C_i and |C_i| is the number of objects in cluster C_i. The process is repeated until convergence, i.e., there are no more new assignments of objects to the clusters.

In rough set theory [26–28], a rough concept is approximated by a pair of exact concepts, called the lower and upper approximations. The lower approximation is the set of objects definitely belonging to the vague concept, whereas the upper approximation is the set of objects possibly belonging to it. Correspondingly, three pairwise disjoint regions are formed, i.e., the positive, boundary and negative regions. Figure 6 provides a schematic diagram of a rough set X with POS(X), BND(X) and NEG(X), consisting of granules coming from the rectangular grid.

Fig. 6  Three disjoint regions in the rough set model (X, NEG(X), BND(X), POS(X))

By incorporating rough set theory into traditional k-means, Lingras and West [18] introduced RKM clustering. In RKM, the concept of k-means is extended by viewing each cluster as an interval set or rough set X. It is characterized by the lower and upper approximations R̲X and R̄X, respectively, with the following properties: (i) an object x_k can be part of at most one lower approximation; (ii) if x_k ∈ R̲X of cluster X, then simultaneously x_k ∈ R̄X; (iii) if x_k is not part of any lower approximation, then it belongs to two or more upper approximations. This permits overlaps between clusters.

In order to compute the centroid of each cluster in RKM, the right side of Eq. (1) is split into two parts. Since the patterns lying in the lower approximation definitely belong to a rough cluster, they are assigned a higher


weight, controlled by the parameter w_low. The patterns lying only in the upper approximation are assigned a relatively lower weight, controlled by the parameter w_up, during the computation. The centroid v_i of cluster C_i is determined by

v_i = w_low·A_1 + w_up·B_1,   if R̲C_i ≠ ∅ ∧ R̄C_i − R̲C_i ≠ ∅,
v_i = B_1,                    if R̲C_i = ∅ ∧ R̄C_i − R̲C_i ≠ ∅,      (2)
v_i = A_1,                    if R̲C_i ≠ ∅ ∧ R̄C_i − R̲C_i = ∅,

where

A_1 = ( ∑_{x_k ∈ R̲C_i} x_k ) / |R̲C_i|,   (3)

B_1 = ( ∑_{x_k ∈ (R̄C_i − R̲C_i)} x_k ) / |R̄C_i − R̲C_i|.   (4)

The parameters w_low and w_up correspond to the relative importance of the lower and upper approximations, respectively. Here |R̲C_i| indicates the number of patterns in the lower approximation of cluster C_i, while |R̄C_i − R̲C_i| is the number of patterns in the rough boundary lying between the two approximations. In order to determine the lower and upper approximations of each cluster, Lingras et al. [4] utilized the following rules. Let d_pk = min_{1≤i≤k} d_ik and T = {j : d_jk − d_pk ≤ threshold and p ≠ j}.

1. If T ≠ ∅, then x_k ∈ R̄(C_p) and x_k ∈ R̄(C_j), ∀j ∈ T. Furthermore, x_k is not part of any lower approximation.
2. Otherwise, if T = ∅, then x_k ∈ R̲(C_p). In addition, x_k ∈ R̄(C_p).

It is observed that the performance of the algorithm depends on the choice of w_low, w_up and threshold. The parameter w_low controls the importance of the objects lying within the lower approximation of a cluster in determining its centroid. Hence an optimal selection of these parameters is an issue of reasonable interest. We set w_up + w_low = 1 and 0.5 ≤ w_low ≤ 1.

2.3 Three-way clustering

The framework of three-way clustering was first proposed by Yu [50, 52]. We summarize its basic concepts here.

Assume ℂ = {C_1, …, C_k} is a family of clusters of the universe V = {v_1, …, v_n}. A hard clustering requires that each C_i satisfies the following conditions:

(i) C_i ≠ ∅, i = 1, …, k;
(ii) ⋃_{i=1}^{k} C_i = V;
(iii) C_i ∩ C_j = ∅, i ≠ j.

Property (i) states that no cluster can be empty. Properties (ii) and (iii) state that every v ∈ V belongs to one and only one cluster. In this case, ℂ is a partition of the universe.

It is obvious that there are three types of relationships between an object and a cluster, namely, belong-to definitely, not belong-to definitely, and uncertain. It is therefore more appropriate to use three regions to represent a cluster. Inspired by the ideas of three-way decision, Yu [50, 52] proposed a framework of three-way clustering. In contrast to the general crisp representation of a cluster, three-way clustering represents a three-way cluster C_i as a pair of sets:

C_i = (Co(C_i), Fr(C_i)),

where Co(C_i) ⊂ V and Fr(C_i) ⊂ V. Let Tr(C_i) = V − (Co(C_i) ∪ Fr(C_i)). The three sets Co(C_i), Fr(C_i) and Tr(C_i) naturally form the core region, fringe region and trivial region, respectively, of a cluster. That is:

CoreRegion(C_i) = Co(C_i),
FringeRegion(C_i) = Fr(C_i),
TrivialRegion(C_i) = V − (Co(C_i) ∪ Fr(C_i)).

These subsets have the following properties:

Tr(C_i) ∪ Co(C_i) ∪ Fr(C_i) = V,
Co(C_i) ∩ Fr(C_i) = ∅,
Co(C_i) ∩ Tr(C_i) = ∅,
Fr(C_i) ∩ Tr(C_i) = ∅.

If Fr(C_i) = ∅, the representation of C_i reduces to C_i = Co(C_i). It is a single set and Tr(C_i) = V − Co(C_i). This is the representation of a hard cluster; it means that the representation of a cluster by a single set is a special case of a three-way cluster in which the fringe region is empty.

There are different requirements on Co(C_i) and Fr(C_i). In this paper, we adopt the following properties:

(I) Co(C_i) ≠ ∅, i = 1, …, k;
(II) ⋃_{i=1}^{k} (Co(C_i) ∪ Fr(C_i)) = V;
(III) Co(C_i) ∩ Co(C_j) = ∅, i ≠ j.
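Conditions (I)-(III) are easy to verify mechanically. The following sketch is our own illustration, with each cluster given as a (core, fringe) pair of Python sets; the function name is an assumption.

```python
def satisfies_twc_properties(clusters, universe):
    """Check properties (I)-(III) of a three-way clustering:
    (I) every core region is non-empty, (II) the supports
    (core ∪ fringe) jointly cover the universe, and (III) the
    core regions are pairwise disjoint."""
    cores = [set(co) for co, _ in clusters]
    supports = [set(co) | set(fr) for co, fr in clusters]
    nonempty = all(cores)                                   # (I)
    covers = set().union(*supports) == set(universe)        # (II)
    disjoint = all(cores[i].isdisjoint(cores[j])            # (III)
                   for i in range(len(cores))
                   for j in range(i + 1, len(cores)))
    return nonempty and covers and disjoint

# Fringe regions may overlap (object 3 below), cores may not.
ok = satisfies_twc_properties([({1, 2}, {3}), ({4, 5}, {3, 6})],
                              universe={1, 2, 3, 4, 5, 6})
```

Note that, unlike a hard partition, an object may appear in several fringe regions without violating any of the three properties.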


Property (I) demands that no cluster be empty. Property (II) states that any element of V must be in the core region or the fringe region of at least one cluster. It is possible that an element v ∈ V belongs to more than one cluster. Property (III) requires that the core regions of the clusters be pairwise disjoint. Based on the above discussion, we use the following family of clusters to represent the result of three-way clustering:

ℂ = {(Co(C_1), Fr(C_1)), (Co(C_2), Fr(C_2)), …, (Co(C_k), Fr(C_k))}.

3 The proposed TWKM

We begin our discussion by introducing some notation. We suppose that V = {v_1, …, v_n} is a set of n objects and that ℂ = {(Co(C_1), Fr(C_1)), (Co(C_2), Fr(C_2)), …, (Co(C_k), Fr(C_k))} is the three-way clustering result of V. The union of Co(C_i) and Fr(C_i) is the support set of cluster C_i (i = 1, …, k), i.e.,

support(C_i) = Co(C_i) ∪ Fr(C_i), (i = 1, …, k).

Three-way clustering uses a core region and a fringe region, rather than a single set, to represent a cluster. One of the main tasks in three-way clustering is to construct the core region and the fringe region. Based on the concepts of three-way decision and three-way clustering, we develop the three-way k-means (TWKM for short) clustering algorithm in this section.

As we know, k-means is a classical hard clustering algorithm, which represents a cluster by a single set with a crisp boundary. The main idea of k-means is to use centroids to represent clusters. The process can be viewed as a continuous iteration of the centroids obtained by associating each object with its nearest centroid. Only two relationships are considered in the process of k-means: objects belong to the cluster if they are in the set, otherwise they do not. However, assigning uncertain points to a cluster will reduce the accuracy of the method. From Fig. 1, we know that there are three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, we integrate the idea of three-way decision into k-means and propose three-way k-means (TWKM). TWKM clustering is a three-way clustering method, that is, each cluster is represented by its core region and fringe region. In TWKM, an overlap clustering is used to obtain the supports of the clusters and perturbation analysis is applied to separate the core regions from the supports. The difference between the support and the core region is regarded as the fringe region of the specific cluster. We suppose that the clustering results satisfy the following properties.

• Property 1: An object can be a member of at most one core region.
• Property 2: An object can be a member of at least one fringe region.

The procedure of three-way k-means clustering consists mainly of two steps. The first step is to obtain the support of each cluster, and the second step is to separate the core region from the support. The idea of computing the support comes from RKM. For each object v and k randomly selected centroids x_1, …, x_k, let d(v, x_j) be the distance between v and the centroid x_j. Suppose d(v, x_i) = min_{1≤j≤k} d(v, x_j) and T = {j : d(v, x_j) − d(v, x_i) ≤ ε_1 and i ≠ j}, where ε_1 is a given parameter. Then,

1. If T ≠ ∅, then v ∈ support(C_i) and v ∈ support(C_j), ∀j ∈ T.
2. If T = ∅, then v ∈ support(C_i).

The modified centroid calculation for the above procedure is given by:

x_i = ( ∑_{v ∈ support(C_i)} v ) / |support(C_i)|,   (5)

where i = 1, …, k, v ranges over all objects in support(C_i), and |support(C_i)| is the number of objects in support(C_i). The mean in Eq. (5) basically first gets a coarse idea of the cluster prototype and then proceeds to tune and refine this value using the data in the support. This process is repeated until the modified centroids in the current iteration are identical to those generated in the previous one, namely, until the prototypes are stabilized.

From the above procedure, we get a family of overlapping clusters, which are the unions of the core regions and the fringe regions. How to separate the core regions from the supports is another pivotal problem. We use a centroid perturbation distance, obtained by adding weights to elements, to solve this problem. For a given upper bound support(C_i) (i = 1, …, k), we classify the elements of support(C_i) into two types:

Type I  = {v ∈ support(C_i) ∣ ∃ j = 1, …, k, j ≠ i, v ∈ support(C_j)},
Type II = {v ∈ support(C_i) ∣ ∀ j = 1, …, k, j ≠ i, v ∉ support(C_j)}.


Fig. 7  Flowchart of Algorithm 1

Suppose the centroid of support(C_i) is x_i and m_i is the number of elements in support(C_i). We adopt different strategies for the different types. Objects of Type I are assigned to the fringe region of C_i because they belong to at least two supports. For an object v of Type II, we add m_i copies of v to support(C_i) and denote the new cluster by support(C_i*). We calculate the new centroid x_i* of support(C_i*) by Eq. (5) and the difference between x_i* and x_i. For a given parameter ε_2, if |x_i* − x_i| ≤ ε_2, then v is assigned to the core region of C_i; otherwise, v is assigned to the fringe region of C_i.

Using the same strategy for each support(C_i) (i = 1, …, k), we can obtain the core region and the fringe region of each cluster. Therefore, a three-way cluster is naturally formed. The above clustering procedure is referred to as TWKM clustering. Algorithm 1 describes the process of TWKM clustering: Lines 3 to 15 find the support of each cluster by using an iterative process to update the centroids of the supports, and Lines 16 to 29 separate the core regions from the support sets by using centroid perturbation analysis. Figure 7 shows the flowchart of Algorithm 1. The time complexity of Algorithm 1 is O(tknm) + O(knm) and its space complexity is O((k + n)m), where t, n and m are the numbers of iterations, objects and attributes, respectively.
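The perturbation step can be sketched as below. This is an illustrative reading of the procedure, not the paper's exact Algorithm 1: supports are lists of coordinate tuples, the distance is Euclidean, and shared objects are detected by value equality.

```python
import math

def separate_core(supports, eps2):
    """Step 2 of TWKM (a sketch): objects shared by several supports
    are Type I and go to the fringe; an exclusive (Type II) object v
    is duplicated m_i times and the induced centroid shift is
    compared with eps2 to decide between core and fringe."""
    def mean(pts):
        return tuple(sum(xs) / len(pts) for xs in zip(*pts))

    k = len(supports)
    cores = [[] for _ in range(k)]
    fringes = [[] for _ in range(k)]
    for i, sup in enumerate(supports):
        x_i, m_i = mean(sup), len(sup)
        shared = {tuple(w) for j in range(k) if j != i for w in supports[j]}
        for v in sup:
            if tuple(v) in shared:                 # Type I: in >= 2 supports
                fringes[i].append(v)
                continue
            x_star = mean(list(sup) + [v] * m_i)   # perturbed centroid
            if math.dist(x_star, x_i) <= eps2:     # small shift: core
                cores[i].append(v)
            else:                                  # large shift: fringe
                fringes[i].append(v)
    return cores, fringes
```

Since m_i copies of v are added to a set of m_i objects, the perturbed centroid is the midpoint (x_i + v)/2, so the test effectively asks whether v lies within 2·ε₂ of its support centroid; outlying objects perturb the centroid strongly and fall into the fringe.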


4 Evaluation of the algorithm

The evaluation of clustering, also referred to as cluster validity, is a crucial process to assess the performance of the learning method in identifying relevant groups. A good measure of cluster quality helps in comparing several clustering methods and in analyzing whether one method is superior to another. The following quantitative indices are often used to evaluate the performance of clustering algorithms.

1. Davies–Bouldin index [3, 22] (DB hereafter).

   DB = (1/c) ∑_{i=1}^{c} max_{j≠i} { [S(C_i) + S(C_j)] / d(x_i, x_j) },   (6)

   where S(C_i) and d(x_i, x_j) are the intra-cluster distance and the inter-cluster separation, respectively. S(C_i) is defined as follows:

   S(C_i) = ( ∑_{v ∈ C_i} ‖v − x_i‖ ) / |C_i|.   (7)

   As a function of the ratio of the within-cluster scatter to the between-cluster separation, a lower value means that the clustering is better.

2. Average Silhouette index [31] (AS hereafter).

   AS = (1/n) ∑_{i=1}^{n} S_i,   (8)

   where n is the total number of objects in the set and S_i is the silhouette of object v_i, defined as

   S_i = (b_i − a_i) / max{a_i, b_i},   (9)

   where a_i is the average distance between v_i and all other objects in its own cluster, and b_i is the minimum of the average distances between v_i and the objects of each other cluster.

   The silhouette of each object shows which objects lie well within their cluster and which ones are merely somewhere in between clusters. The average silhouette provides an evaluation of clustering validity. The range of the average silhouette index is [−1, 1]; a larger value means a better clustering result.

3. Accuracy (ACC hereafter).

   ACC = ( ∑_{c=1}^{k} n_c^j ) / n,   (10)

   where n_c^j is the number of common objects in the cth cluster and its matched class j after obtaining a one-to-one match between clusters and classes. The higher the value of ACC is, the better the clustering result is. The value is equal to 1 only when the clustering result is the same as the ground truth.

5 Experimental illustration

To test the performance of our proposed algorithm, six data sets from the UCI Machine Learning repository [4] and three data sets from the USPS ZIP code handwritten digits database [11] are employed in this section. The details of these data sets are shown in Table 1.

In order to identify the quality of three-way k-means clustering, we use all the core regions to form a clustering result and compute the DB index, AS index and ACC using the core region to represent the corresponding cluster. The objects in the fringe regions are excluded from the total number of objects when calculating ACC. A better three-way clustering result should have a lower DB value and higher AS and ACC values. The performances of three-way k-means clustering are presented on ten data sets. For comparison, the performances of k-means [21], k-medoids [9], FKM [2] and RKM [18] are also presented on each data set, with threshold = 0.02, w_up = 0.3 and w_low = 0.7 in RKM. These experiments are repeated 100 times for each data set and the parameters of three-way k-means clustering are ε_1 = 0.02 and ε_2 = 0.0023n, where n is the total number of objects in each set. The average values and the best values of the three indices over the 100 runs are used to compare the overall performances. The results are presented

International Journal of Machine Learning and Cybernetics

in Tables 2, 3, 4, 5, 6 and 7, in which the optimal results among the different algorithms are marked in bold. The CPU computing time of 100 runs on each data set, measured in seconds, is recorded in Table 8 as well.

Tables 2, 3, 4 and 5 demonstrate the average performances and the best performances of the DB value and the AS value by k-means, k-medoids, FKM, RKM and TWKM, respectively. From Tables 2, 3, 4 and 5, we can find that TWKM outperforms the other algorithms in the sense of the DB value and the AS value, both for the best performances and for the average performances, on most of the data sets. The improvement can be attributed to the fact that each cluster is represented by its lower bound, which helps to increase the degree of separation between clusters and decrease the degree of scatter within a cluster, since fringe regions have been successfully marked out. Though the performances of the DB value and the AS value on the WDBC, GLASS and BANK sets by FKM are similar or superior to the results by TWKM, the computing time of TWKM is far less than that of FKM.

Tables 6 and 7 list the average performances and the best performances of the ACC value, respectively. It is not difficult to observe that both the average performances and the best performances of ACC obtained by TWKM are superior to the results obtained by the other methods on WINE, WDBC, OCCUPANCY, MAGIC, USPS-08 and USPS-49. However, other methods exceed TWKM on GLASS, BANK and USPS-3568. This is because ACC is computed by using the core region to represent the corresponding cluster and the objects in the fringe regions are excluded from the total number of objects, which means that $n_{c_j}$ and $n$ both become smaller in Eq. (10).

Table 2  Average performances of DB value

Data sets   k-means   k-medoids   FKM      RKM      TWKM
WINE        1.3157    1.3553      1.3181   1.3025   1.1846
WDBC        1.1363    1.1336      1.1446   1.1386   1.1266
GLASS       1.2663    0.9679      1.5416   1.1040   1.1440
BANK        1.1913    1.1817      1.1981   1.1592   1.1550
OCCUPANCY   0.6835    0.6833      0.6850   0.6835   0.1019
MAGIC       1.2603    1.3232      1.3920   1.2088   0.7790
USPS-08     1.9291    1.9305      1.9367   1.9312   1.9202
USPS-49     2.5573    2.6774      2.5683   2.5436   2.5009
USPS-3568   2.8197    2.9993      3.4175   2.8195   2.7785

Table 3  Best performances of DB value

Data sets   k-means   k-medoids   FKM      RKM      TWKM
WINE        1.3053    1.3553      1.3181   1.2508   1.1656
WDBC        1.1363    1.1336      1.1446   1.0887   1.1190
GLASS       0.9900    0.9546      1.2874   0.9335   0.8973
BANK        1.1911    1.1479      1.1980   1.1441   1.1550
OCCUPANCY   0.6835    0.6830      0.6850   0.6835   0.1019
MAGIC       1.2603    1.2538      1.3920   1.2083   0.7790
USPS-08     1.9268    1.9228      1.9337   1.9158   1.9157
USPS-49     2.5443    2.6774      2.5404   2.5104   2.4972
USPS-3568   2.5378    2.9165      2.5277   2.5674   2.5258

6 Concluding remarks

In most of the existing studies, a cluster is represented by a single set, and the set naturally divides the space into two regions. An object belongs to the cluster if it is in the set; otherwise it does not belong to the cluster. It is obvious that there may be three types of relationships between an object and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. Based on these relationships, we developed the three-way k-means clustering method by integrating k-means and three-way decisions in this paper.
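To make the three relationships concrete, the sketch below assigns objects to core and fringe regions with a hypothetical nearest-versus-second-nearest distance rule. The function name, the `eps` threshold and the rule itself are illustrative assumptions for exposition only, not the construction used by the proposed algorithm.

```python
import math

def three_way_assign(points, centers, eps=1.0):
    """Illustrative three-way assignment: an object clearly closest to one
    center joins that cluster's core region (belong-to fully); an object
    whose two smallest center distances differ by at most eps joins the
    fringe regions of both clusters (belong-to partially); for any single
    cluster, all remaining objects form its trivial region (not belong-to)."""
    k = len(centers)
    core = [[] for _ in range(k)]
    fringe = [[] for _ in range(k)]
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, c), j) for j, c in enumerate(centers))
        (d1, j1), (d2, j2) = dists[0], dists[1]
        if d2 - d1 > eps:          # unambiguous: belong-to fully
            core[j1].append(i)
        else:                      # ambiguous: belong-to partially
            fringe[j1].append(i)
            fringe[j2].append(i)
    return core, fringe
```

For instance, `three_way_assign([(0, 0), (0.1, 0), (5, 5), (2.5, 2.5)], [(0, 0), (5, 5)])` places objects 0–2 into exactly one core region each, while object 3, equidistant from both centers, lands in both fringe regions.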

Table 1  A description of data sets used in the experiments

ID   Data sets   Samples   Attributes   Classes
1    WINE        178       13           3
2    WDBC        569       30           2
3    GLASS       214       9            6
4    BANK        1372      4            2
5    OCCUPANCY   20560     7            2
6    MAGIC       19020     11           2
7    USPS-08     1736      256          2
8    USPS-49     1296      256          2
9    USPS-3568   2420      256          4

Table 4  Average performances of AS value

Data sets   k-means   k-medoids   FKM      RKM      TWKM
WINE        0.4750    0.4608      0.4741   0.4754   0.5249
WDBC        0.5765    0.5828      0.5683   0.5826   0.5832
GLASS       0.4609    0.5338      0.3703   0.5195   0.5136
BANK        0.5002    0.4936      0.4926   0.5134   0.5204
OCCUPANCY   0.7812    0.7808      0.7782   0.7812   0.9924
MAGIC       0.6133    0.5498      0.4426   0.5014   0.7279
USPS-08     0.3293    0.3220      0.3275   0.3277   0.3314
USPS-49     0.2309    0.2098      0.2174   0.2328   0.2366
USPS-3568   0.1651    0.1537      0.0612   0.1687   0.1689
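The DB values reported in Tables 2 and 3 follow Eq. (6). A minimal sketch of Eqs. (6)–(7) is given below; it assumes the representatives $x_i$ are the cluster centers, which is one common reading and may differ in detail from the implementation used in the experiments.

```python
import math

def db_index(clusters, centers):
    """Davies-Bouldin index, Eqs. (6)-(7): for each cluster take the worst
    ratio of summed intra-cluster scatters to inter-cluster separation,
    then average these worst-case ratios over all c clusters."""
    c = len(clusters)
    # Eq. (7): S(C_i) is the mean distance from members to the representative
    S = [sum(math.dist(v, centers[i]) for v in Ci) / len(Ci)
         for i, Ci in enumerate(clusters)]
    # Eq. (6): average over i of max_{j != i} (S(C_i) + S(C_j)) / d(x_i, x_j)
    return sum(max((S[i] + S[j]) / math.dist(centers[i], centers[j])
                   for j in range(c) if j != i)
               for i in range(c)) / c
```

For two tight, well-separated clusters the index is small: `db_index([[(0, 0), (0, 2)], [(10, 0), (10, 2)]], [(0, 1), (10, 1)])` gives 0.2.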


Table 5  Best performances of AS value

Data sets   k-means   k-medoids   FKM      RKM      TWKM
WINE        0.4764    0.4608      0.4741   0.5050   0.5356
WDBC        0.5765    0.5828      0.5683   0.6036   0.5848
GLASS       0.4912    0.5441      0.4421   0.6671   0.5739
BANK        0.5004    0.4974      0.4927   0.5254   0.5204
OCCUPANCY   0.7812    0.7811      0.7782   0.7812   0.9924
MAGIC       0.6133    0.6191      0.4426   0.5016   0.7279
USPS-08     0.3297    0.3280      0.3284   0.3323   0.3326
USPS-49     0.2365    0.2098      0.2208   0.2365   0.2373
USPS-3568   0.1766    0.1558      0.1562   0.1781   0.1782

Table 8  The total run time of 100 runs

Data sets   k-means   k-medoids   FKM      RKM        TWKM
WINE        0.31      0.83        0.26     90.76      0.55
WDBC        0.38      3.87        0.64     266.12     0.94
GLASS       0.38      1.81        1.17     108.03     1.01
BANK        0.57      19.12       0.95     610.06     3.29
OCCUPANCY   3.57      65.57       21.48    13568.81   44.06
MAGIC       7.98      73.49       40.92    12158.70   64.52
USPS-08     5.24      59.19       49.94    1532.54    16.96
USPS-49     5.15      34.55       12.35    1105.43    19.01
USPS-3568   15.31     211.60      38.78    2777.60    59.53
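The ACC evaluation protocol described in Sect. 5 (each cluster represented by its core region, with fringe objects excluded from both $n_{c_j}$ and $n$) can be sketched as follows. Since Eq. (10) itself is defined earlier in the paper, the majority-label matching used here is an illustrative simplification, not the paper's exact formula.

```python
from collections import Counter

def core_accuracy(core_regions, labels):
    """Accuracy over core regions only: fringe objects never appear in a
    core region, so they are excluded from both the hit count n_{c_j} and
    the object total n; each core region is scored against its majority
    true label (an illustrative matching rule)."""
    correct = 0   # sum of n_{c_j} over clusters j
    total = 0     # n, counting core objects only
    for region in core_regions:
        if not region:
            continue
        counts = Counter(labels[i] for i in region)
        correct += counts.most_common(1)[0][1]
        total += len(region)
    return correct / total
```

With `core_regions=[[0, 1, 2], [3, 4]]` and `labels=['a', 'a', 'b', 'b', 'b']`, the majority labels cover 4 of the 5 core objects, giving 0.8.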

Experimental results demonstrate that the new algorithm can significantly improve the structure of clustering results compared with the traditional clustering algorithms. The present study is the first step of the research on three-way k-means. The following are challenges for further research.

1. The parameters ε1 and ε2 have a significant impact on the clustering results. Research on dynamic changes of the parameters will be an interesting topic to be addressed.
2. In the proposed three-way k-means algorithm, the number of clusters is given in advance. However, how to determine the number of clusters is another interesting topic to be addressed.
3. Multigranulation is a developing approach which can be used for constructing approximations of a target concept. From the granular computing point of view, our proposed algorithm is based on a single granulation. How to develop three-way k-means based on multigranulations needs to be further discussed.

Table 6  Average performances of ACC value

Data sets   k-means   k-medoids   FKM      RKM      TWKM
WINE        0.9472    0.9270      0.9494   0.9221   0.9703
WDBC        0.9279    0.9227      0.9279   0.9283   0.9295
GLASS       0.4311    0.4461      0.4233   0.5853   0.4633
BANK        0.5747    0.5814      0.6077   0.5718   0.5735
OCCUPANCY   0.8936    0.8981      0.8982   0.8936   0.9991
MAGIC       0.6491    0.6201      0.5780   0.5864   0.6852
USPS-08     0.8360    0.7839      0.8205   0.8214   0.8390
USPS-49     0.7323    0.6929      0.7277   0.7337   0.7658
USPS-3568   0.7225    0.7392      0.5817   0.7477   0.7251

Table 7  Best performances of ACC value

Data sets   k-means   k-medoids   FKM      RKM      TWKM
WINE        0.9719    0.9270      0.9494   0.9591   0.9818
WDBC        0.9279    0.9227      0.9279   0.9317   0.9327
GLASS       0.5374    0.4766      0.4533   0.8738   0.5780
BANK        0.5758    0.6378      0.6079   0.5869   0.5735
OCCUPANCY   0.8936    0.9030      0.8982   0.8936   0.9991
MAGIC       0.6491    0.6550      0.5780   0.5873   0.6852
USPS-08     0.8416    0.8213      0.8415   0.8410   0.8418
USPS-49     0.7677    0.6929      0.7609   0.7569   0.7690
USPS-3568   0.8694    0.7669      0.7376   0.8320   0.8692

Acknowledgements  The authors would like to thank the editor and the anonymous reviewers for their constructive and valuable comments. This work was supported in part by the National Natural Science Foundation of China (nos. 61503160, 61773012 and 61572242) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (no. 15KJB110004).

References

1. Afridi MK, Azam N, Yao JT, Alanazi E (2018) A three-way clustering approach for handling missing data using GTRS. Int J Approx Reason. https://doi.org/10.1016/j.ijar.2018.04.001
2. Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
3. Bezdek J, Pal N (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cybern B 28:301–315
4. Blake CL, Merz CJ (2005) UCI machine learning repository. http://www.ics.uci.edu/mlearn/MLRepository.html
5. Hao C, Li JH, Fan M, Liu WQ, Tsang ECC (2017) Optimal scale selection in dynamic multi-scale decision tables based on sequential three-way decisions. Inf Sci 415:213–232
6. Huang CC, Li JH, Mei CL, Wu WZ (2017) Three-way concept learning based on cognitive operators: an information fusion viewpoint. Int J Approx Reason 83:218–242
7. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31:651–666
8. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31:264–323
9. Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
10. Lang GM, Miao DQ, Cai MJ (2017) Three-way decision approaches to conflict analysis using decision-theoretic rough set theory. Inf Sci 406:185–207


11. LeCun Y, Bottou L, Bengio Y, Haffner P (1990) USPS zip code handwritten digits database. http://www.ics.uci.edu/mlearn/MLRepository.html
12. Li CP, Li JH, He M (2016) Concept lattice compression in incomplete contexts based on k-medoids clustering. Int J Mach Learn Cybern 7:539–552
13. Li HX, Zhou XZ (2011) Risk decision making based on decision-theoretic rough set: a three-way view decision model. Int J Comput Inf Sys 4:1–11
14. Li JH, Huang CC, Qi JJ, Qian YH, Liu WQ (2017) Three-way cognitive concept learning via multi-granularity. Inf Sci 378:244–263
15. Li W, Miao DQ, Wang WL, Zhang N (2010) Hierarchical rough decision theoretic framework for text classification. In: IEEE international conference on cognitive informatics, pp 484–489
16. Li Y, Zhang C, Swan JR (2000) An information filtering model on the web and its application in jobagent. Knowl-Based Syst 13:285–296
17. Liang DC, Liu D (2015) A novel risk decision-making based on decision-theoretic rough sets under hesitant fuzzy information. IEEE Trans Fuzzy Syst 23:237–247
18. Lingras P, West C (2004) Interval set clustering of web users with rough k-means. J Intell Inf Syst 23:5–16
19. Liu D, Li TR, Liang DC (2012) Three-way government decision analysis with decision-theoretic rough sets. Int J Uncertain Fuzz 20:119–132
20. Liu D, Yao YY, Li TR (2011) Three-way investment decisions with decision-theoretic rough sets. Int J Comput Inf Sys 4:66–74
21. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
22. Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal 24:1650–1654
23. Mirkin B (1991) Mathematical classification and clustering. Kluwer, Boston
24. Mitra S, Banka H, Pedrycz W (2006) Rough-fuzzy collaborative clustering. IEEE Trans Syst Man Cybern B 36:795–805
25. Mitra S, Pedrycz W, Barman B (2010) Shadowed c-means: integrating fuzzy and rough clustering. Pattern Recognit 43:1282–1291
26. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11:341–356
27. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Boston
28. Pawlak Z (2004) Some issues on rough sets. Trans Rough Sets I 3100:1–58
29. Pedrycz W (1998) Shadowed sets: representing and processing fuzzy sets. IEEE Trans Syst Man Cybern B 28:103–109
30. Qi JJ, Qian T, Wei L (2016) The connections between three-way and classical concept lattices. Knowl-Based Syst 91:143–151
31. Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
32. Shivhare R, Cherukuri AK (2017) Three-way conceptual approach for cognitive memory functionalities. Int J Mach Learn Cybern 8:21–34
33. Singh PK (2016) Three-way fuzzy concept lattice representation using neutrosophic set. Int J Mach Learn Cybern 8:1–11
34. Singh PK (2017) Interval-valued neutrosophic graph representation of concept lattice and its (α, β, γ)-decomposition. Arab J Sci Eng 43:1–18
35. Singh PK (2018) Similar vague concepts selection using their Euclidean distance at different granulation. Cogn Comput 10:228–241
36. Singh PK (2018) Concept learning using vague concept lattice. Neural Process Lett 48:31–52
37. Singh PK (2017) Medical diagnoses using three-way fuzzy concept lattice and their Euclidean distance. Comp Appl Math 3:1–24
38. Tou JT, Gonzalez RC (1974) Pattern recognition principles. Addison-Wesley, London
39. Wang PX, Yao YY (2018) CE3: a three-way clustering method based on mathematical morphology. Knowl-Based Syst 155:54–65
40. Xu R, Wunsch DC (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16:645–678
41. Yang X, Li TR, Fujita H, Liu D, Yao YY (2017) A unified model of sequential three-way decisions and multilevel incremental processing. Knowl-Based Syst 134:172–188
42. Yao JT (2015) Web-based medical decision support systems for three-way medical decision making with game-theoretic rough sets. IEEE Trans Fuzzy Syst 23:3–15
43. Yao YY (2009) Three-way decision: an interpretation of rules in rough set theory. In: Proceedings of RSKT'09, vol 5589, pp 642–649
44. Yao YY (2010) Three-way decisions with probabilistic rough sets. Inf Sci 180:341–353
45. Yao YY (2011) The superiority of three-way decisions in probabilistic rough set models. Inf Sci 181:1080–1096
46. Yao YY (2012) An outline of a theory of three-way decisions. In: Proceedings of RSCTC'12, vol 7413, pp 1–17
47. Yao YY (2016) Three-way decisions and cognitive computing. Cogn Comput 8:543–554
48. Yao YY (2017) Interval sets and three-way concept analysis in incomplete contexts. Int J Mach Learn Cybern 8:3–20
49. Yu H (2017) A framework of three-way cluster analysis. In: Proceedings of international joint conference on rough sets, pp 300–312
50. Yu H, Jiao P, Yao YY, Wang GY (2016) Detecting and refining overlapping regions in complex networks with three-way decisions. Inf Sci 373:21–41
51. Yu H, Wang XC, Wang GY, Zeng XH (2018) An active three-way clustering method via low-rank matrices for multi-view data. Inf Sci. https://doi.org/10.1016/j.ins.2018.03.009
52. Yu H, Zhang C, Wang GY (2016) A tree-based incremental overlapping clustering method using the three-way decision theory. Knowl-Based Syst 91:189–203
53. Zhang QH, Lv GX, Chen YH, Wang GY (2018) A dynamic three-way decision model based on the updating of attribute values. Knowl-Based Syst 142:71–84
54. Zhang QH, Xia DY, Wang GY (2017) Three-way decision model with two types of classification errors. Inf Sci 420:431–453

Publisher's Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
