2012 IEEE International Conference on Granular Computing

Top-down vs Bottom-up Methods of Linkage for Asymmetric Agglomerative Hierarchical Clustering
Satoshi Takumi
Master's Program in Risk Engineering
University of Tsukuba
Tsukuba, Japan
[email protected]

Sadaaki Miyamoto
Department of Risk Engineering
University of Tsukuba
Tsukuba, Japan
[email protected]
Abstract: Algorithms of agglomerative hierarchical clustering using asymmetric similarity measures are studied. We classify linkage methods into two categories of bottom-up methods and top-down methods. The bottom-up methods first define a similarity measure between two objects, and extend it to a similarity between clusters. In contrast, top-down methods directly define similarity between clusters. Among the classical linkage methods based on symmetric similarity measures, the single linkage, complete linkage, and average linkage are bottom-up, while the centroid method and the Ward method are top-down. We propose a top-down method and a family of bottom-up methods using asymmetric similarity measures. A dendrogram, which is the output of hierarchical clustering, often has reversals. We show conditions under which a dendrogram has no reversals, and prove that the proposed methods have no reversals in their dendrograms. Two different techniques to show asymmetry in the dendrogram are used. Examples based on real data show how the methods work.
Keywords: hierarchical clustering; asymmetric similarity measures; reversal in dendrogram.
I. INTRODUCTION
Recently, huge-scale data can easily be collected owing to the development of computer technology. Cluster analysis, alias clustering, is a method to handle such data sets and extract useful information without an externally given class label. Clustering is used in numerical taxonomy, the social sciences, engineering, and other fields, and many data analysis software packages nowadays include cluster analysis programs. Clustering techniques are divided into two classes of hierarchical and non-hierarchical methods. The major technique in the first class is the well-known agglomerative hierarchical clustering [1], [3], which is old but has been found useful in a variety of applications. Hierarchical clustering mostly uses symmetric measures of similarity. However, some researchers [4], [8], [9], [11], [12] have studied hierarchical clustering using asymmetric measures.
Let us review for the moment the conventional agglomerative hierarchical methods of clustering that use symmetric similarity (or dissimilarity) measures. Note that both a similarity measure between objects and one between clusters are used. In some linkage methods such as the single linkage, similarity between objects is first defined and then similarity between clusters is defined using the former. We call such a linkage method a "bottom-up" method. In other methods such as the centroid method, a similarity measure between clusters is first defined, and similarity between objects is the special case in which each cluster consists of a single object. Such a method is called here a "top-down" method.
In agglomerative hierarchical clustering using asymmetric measures, the foregoing methods use similarity defined in the bottom-up manner. However, we consider a top-down method herein that uses an asymmetric probabilistic model.
A dendrogram of hierarchical clustering sometimes has reversals [5], [6]. We show a condition under which a dendrogram has no reversals when asymmetric similarity is used, and prove that the proposed method has no reversals. In addition to the proposed "top-down" method, we prove that a family of bottom-up methods that was already discussed [9] also has no reversals in the dendrogram.
Two methods to show asymmetry in a dendrogram are considered. One has already been proposed [8], while the other uses a hypothesis test that is specific to the proposed top-down method.
The rest of this paper is organized as follows. Section 2 describes a general theory of agglomerative hierarchical clustering. Section 3 discusses the difference between top-down and bottom-up methods. Section 4 then considers the two methods of linkage using asymmetric similarity measures. Section 5 discusses theoretical properties concerning tree reversals. Section 6 shows the two methods to show asymmetry in the dendrogram. Section 7 shows applications based on real data sets, and finally Section 8 concludes the paper.
II. AGGLOMERATIVE HIERARCHICAL CLUSTERING
Let the set of objects for clustering be $X = \{x_1, \ldots, x_N\}$. Generally a cluster denoted by $G_i$ is a subset of $X$. The family of clusters is denoted by $\mathcal{G} = \{G_1, G_2, \ldots, G_K\}$, where the clusters form a crisp partition of $X$:
$$\bigcup_{i=1}^{K} G_i = X, \qquad G_i \cap G_j = \emptyset \quad (i \neq j).$$
Moreover, the number of objects in $G$ is denoted by $|G|$.
Agglomerative hierarchical clustering uses a similarity or a dissimilarity measure. We use similarity here: the similarity between two objects $x, y \in X$ is assumed to be given and denoted by $s(x, y)$. The similarity between two clusters is also used; it is denoted by $s(G, G')$ ($G, G' \in \mathcal{G}$) and is also called an inter-cluster similarity.
In the classical setting a similarity measure is assumed to be symmetric:
$$s(G, G') = s(G', G).$$
Let us first describe a general procedure of agglomerative hierarchical clustering [5], [6].

AHC (Agglomerative Hierarchical Clustering) Algorithm:

AHC1: Assume that initial clusters are given by $\mathcal{G} = \{G_1, G_2, \ldots, G_{N_0}\}$, where $G_1, G_2, \ldots, G_{N_0}$ are given initial clusters. Generally $G_j = \{x_j\} \subset X$, hence $N_0 = N$. Set $K = N_0$ ($K$ is the number of clusters and $N_0$ is the initial number of clusters). Calculate $s(G, G')$ for all pairs $G, G' \in \mathcal{G}$.

AHC2: Search the pair of maximum similarity:
$$(G_p, G_q) = \arg\max_{G_i, G_j \in \mathcal{G}} s(G_i, G_j), \qquad (1)$$
and let
$$m_K = s(G_p, G_q) = \max_{G_i, G_j \in \mathcal{G}} s(G_i, G_j). \qquad (2)$$
Merge: $G_r = G_p \cup G_q$. Add $G_r$ to $\mathcal{G}$ and delete $G_p, G_q$ from $\mathcal{G}$. Set $K = K - 1$. If $K = 1$ then stop and output the dendrogram.

AHC3: Update the similarities $s(G_r, G'')$ and $s(G'', G_r)$ for all $G'' \in \mathcal{G}$. Go to AHC2.

End AHC.

Note 1. The calculation of $s(G'', G_r)$ in AHC3 is unnecessary when the measure is symmetric: $s(G_r, G'') = s(G'', G_r)$.
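To fix the notation, the following is a minimal sketch of AHC1-AHC3 in Python (not the authors' implementation), assuming a user-supplied and possibly asymmetric inter-cluster similarity function `sim`; with an asymmetric measure, both directions of every pair must be examined in AHC2 and recomputed in AHC3, as noted above.

```python
from itertools import permutations

def ahc(objects, sim):
    """Generic agglomerative loop (AHC1-AHC3); sim(G, H) may be asymmetric."""
    # AHC1: start from singleton clusters.
    clusters = [frozenset([x]) for x in objects]
    merges = []
    while len(clusters) > 1:
        # AHC2: search the ordered pair (Gp, Gq) of maximum similarity m_K.
        gp, gq = max(permutations(clusters, 2), key=lambda pair: sim(*pair))
        m_k = sim(gp, gq)
        gr = gp | gq                                  # merge: Gr = Gp U Gq
        clusters = [g for g in clusters if g not in (gp, gq)] + [gr]
        merges.append((gp, gq, m_k))                  # merge levels define the dendrogram
        # AHC3: s(Gr, G'') and s(G'', Gr) are simply recomputed by sim on the next
        # pass; an efficient implementation would update them incrementally.
    return merges
```

Any of the inter-cluster similarities discussed below can be passed as `sim`.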
Well-known linkage methods such as the single link, complete link, and average link all assume symmetric similarity or dissimilarity measures [1], [3], [5]. In particular, the single link uses the following inter-cluster similarity definition:
$$s(G, G') = \max_{x \in G,\, y \in G'} s(x, y). \qquad (3)$$
The average link defines the next inter-cluster similarity:
$$s(G, G') = \frac{1}{|G|\,|G'|} \sum_{x \in G,\, y \in G'} s(x, y). \qquad (4)$$
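As a concrete illustration of the bottom-up style, (3) and (4) translate directly into code once the object-level similarity s(x, y) is given; a minimal sketch with illustrative function names:

```python
def single_link(s, G, H):
    # Eq. (3): the largest object-to-object similarity between the two clusters.
    return max(s(x, y) for x in G for y in H)

def average_link(s, G, H):
    # Eq. (4): the mean object-to-object similarity between the two clusters.
    return sum(s(x, y) for x in G for y in H) / (len(G) * len(H))
```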
There are two more linkage methods, the centroid link and the Ward method, which assume that objects are points in a Euclidean space. They use dissimilarity measures related to the Euclidean distance. For example, the centroid link uses the square of the Euclidean distance between the two centroids of the clusters. The above-mentioned five linkage methods all assume the symmetric property of similarity and dissimilarity measures.
For the single link, complete link, and average link, it is known that we have the monotonicity of $m_K$:
$$m_N \geq m_{N-1} \geq \cdots \geq m_2. \qquad (5)$$
If the monotonicity does not hold, we have a reversal in the dendrogram: it means that $G$ and $G'$ are merged into $\hat{G} = G \cup G'$ at the level $m = s(G, G')$, after that $\hat{G}$ and $G''$ are merged at the level $\hat{m} = s(\hat{G}, G'')$, and $\hat{m} > m$ occurs. Reversals in a dendrogram are observed for the centroid method. A simple example of a reversal is shown in Fig. 1.
Figure 1. A simple example of reversal.
III. CONCEPT OF TOP-DOWN AND BOTTOM-UP METHODS
As briefly noted in the introduction, we classify the five well-known linkage methods of agglomerative hierarchical clustering using symmetric similarity or dissimilarity measures. In some linkage methods, similarity between objects is first defined and then similarity between clusters is defined using the former. We call such a linkage method a "bottom-up" method. In other methods, a similarity measure between clusters is first defined, and similarity between objects is the special case in which a cluster consists of a single object. Such a method is called a "top-down" method.
The single linkage, complete linkage, and average linkage are bottom-up methods, since s(x, y) is first defined between a pair of objects, and then s(G, G') is defined: the single linkage uses (3), the average linkage uses (4), and the complete linkage uses
$$s(G, G') = \min_{x \in G,\, y \in G'} s(x, y). \qquad (6)$$
In these methods, it is impossible to define s(G, G') directly.
In contrast, the centroid method and the Ward method are top-down methods, since
$$d(G, G') = \| M(G) - M(G') \|^2 \qquad (7)$$
is used in the centroid method, where d(G, G') is a dissimilarity measure and M(G) is the center of gravity (alias centroid) of G.
In the Ward method, we first define
$$E(G) = \sum_{x_k \in G} \| x_k - M(G) \|^2 \qquad (8)$$
and then
$$d(G, G') = E(G \cup G') - E(G) - E(G'). \qquad (9)$$
In both methods, the dissimilarity d(G, G') between clusters is directly defined without referring to d(x, y).
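For comparison, a minimal sketch of the top-down definitions (7)-(9), assuming each cluster is given as a NumPy array whose rows are the member points; the helper names are illustrative only:

```python
import numpy as np

def centroid_dissimilarity(G, H):
    # Eq. (7): squared Euclidean distance between the centroids M(G) and M(G').
    return float(np.sum((G.mean(axis=0) - H.mean(axis=0)) ** 2))

def within_cluster_error(G):
    # Eq. (8): sum of squared deviations of the members from the centroid M(G).
    return float(np.sum((G - G.mean(axis=0)) ** 2))

def ward_dissimilarity(G, H):
    # Eq. (9): increase of E caused by merging G and G'.
    return (within_cluster_error(np.vstack([G, H]))
            - within_cluster_error(G) - within_cluster_error(H))
```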
In the next section we propose a top-down method for an
asymmetric similarity measure.
IV. ASYMMETRIC SIMILARITY MEASURES
Up to now, methods using asymmetric measures have been bottom-up, as shown in [9], [12]. In this section, we show two asymmetric agglomerative hierarchical methods, one of which is top-down and the other bottom-up. The reason why we discuss these two methods here is that they have no reversals in the dendrograms, as shown in the next section.

A. A probabilistic model

We assume a specific example of handling citations between journals in this section. This example seems very specific, but the proposed model can easily be used for a wide class of real applications. This specialization to citations is thus for the sake of simplicity.
We hence call the objects in X journals. Assume that n(x, y) is the number of citations from x to y: journal x cites y n(x, y) times. Moreover, n(x) is the total number of citations of x, i.e., the number of citations from x to all journals. We have
$$n(x) \geq \sum_{y \in X} n(x, y). \qquad (10)$$
Note that $n(x) = \sum_{y \in X} n(x, y)$ does not hold in general, since X does not generally exhaust all journals in the world.
We can define the estimate of the citation probability from x to y:
$$\pi(x, y) = \frac{n(x, y)}{n(x)}, \qquad (11)$$
which may be generalized to an inter-cluster similarity:
$$\pi(G, G') = \frac{\sum_{x \in G,\, y \in G'} n(x, y)}{\sum_{x \in G} n(x)}. \qquad (12)$$
This measure $\pi(G, G')$ is, however, inconvenient for clustering, as we discuss in the next section. Hence we define asymmetric similarity as follows.
Definition 1. Assume that n(x, y) and n(x) are given as above, and G, G' are two arbitrary clusters of X. Then an average citation probability from G to G' is defined by
$$r(G, G') = \frac{\pi(G, G')}{|G'|} = \frac{\sum_{x \in G,\, y \in G'} n(x, y)}{|G'| \sum_{x \in G} n(x)}. \qquad (13)$$
We also define
$$n(G, G') = \sum_{x \in G,\, y \in G'} n(x, y), \qquad (14)$$
$$n(G) = \sum_{x \in G} n(x). \qquad (15)$$
We then have
$$r(G, G') = \frac{n(G, G')}{|G'|\, n(G)}. \qquad (16)$$
Note that if G = {x} and G' = {y}, we have
$$r(G, G') = r(\{x\}, \{y\}) = \pi(x, y).$$
Hence this measure is based on the citation probability from x to y. We have the following formula for the updating in AHC3.
Proposition 1. When $G_p$ and $G_q$ are merged into $G_r$ ($G_r = G_p \cup G_q$), the updating formula in AHC3 is:
$$r(G_r, G'') = r(G_p \cup G_q, G'') = \frac{n(G_p, G'') + n(G_q, G'')}{|G''|\,\bigl(n(G_p) + n(G_q)\bigr)}, \qquad (17)$$
$$r(G'', G_r) = r(G'', G_p \cup G_q) = \frac{n(G'', G_p) + n(G'', G_q)}{\bigl(|G_p| + |G_q|\bigr)\, n(G'')}. \qquad (18)$$
The proof is straightforward and omitted.
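A minimal sketch of the measure (16), assuming the citation counts are stored as a nested dictionary n_xy[x][y] and the totals as n_x[x] (a hypothetical data layout, not the authors' code):

```python
def n_between(n_xy, G, H):
    # n(G, G') of Eq. (14): total citations from members of G to members of H.
    return sum(n_xy[x].get(y, 0) for x in G for y in H)

def r(n_xy, n_x, G, H):
    # Eq. (16): average citation probability from G to H.
    return n_between(n_xy, G, H) / (len(H) * sum(n_x[x] for x in G))
```

By Proposition 1, r for a merged cluster is determined by the aggregates n(G, G'') and n(G) alone, so an implementation only needs to add these counts when $G_p$ and $G_q$ are merged, instead of rescanning the objects.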
B. Extended updating formula
An extended updating formula was proposed by Yadohisa [12]. Let $G_i$, $G_j$, and $G_k$ be clusters. When $G_i$ and $G_j$ are merged into $G_{ij}$, the updating formula for the dissimilarity $d_{(ij)k}$ from cluster $G_{ij}$ to another cluster $G_k$ is:
$$d_{(ij)k} = \alpha_i^1 f_1(d_{ik}, d_{ki}) + \alpha_j^1 f_1(d_{jk}, d_{kj}) + \beta_1 g_1(d_{ij}, d_{ji}) + \gamma_1 |d_{ik} - d_{jk}|. \qquad (19)$$
Similarly, the updating formula for the dissimilarity $d_{k(ij)}$ is:
$$d_{k(ij)} = \alpha_i^2 f_2(d_{ki}, d_{ik}) + \alpha_j^2 f_2(d_{kj}, d_{jk}) + \beta_2 g_2(d_{ij}, d_{ji}) + \gamma_2 |d_{ki} - d_{kj}|. \qquad (20)$$
Here, $\alpha_i^1, \alpha_j^1, \alpha_i^2, \alpha_j^2, \beta_1, \beta_2, \gamma_1, \gamma_2$ are constants or functions. Moreover, we assume $f_1(x, y) = x$, $f_2(x, y) = y$, and $g_l(x, y) \geq \max\{x, y\}$. This pair of formulas is called the extended updating formula.
Various linkage methods can be represented by these formulas [12], [9], [11], but we omit the details.
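A minimal sketch of the pair (19)-(20); d is assumed to be a nested dictionary of the current asymmetric dissimilarities, and the coefficient and function parameters mirror the symbols above (the names are illustrative only):

```python
def extended_update(d, i, j, k, f1, f2, g1, g2,
                    a1_i, a1_j, b1, c1, a2_i, a2_j, b2, c2):
    """Return (d_(ij)k, d_k(ij)) for the merged cluster G_ij and another cluster G_k."""
    # Eq. (19): dissimilarity from G_ij to G_k.
    d_ij_k = (a1_i * f1(d[i][k], d[k][i]) + a1_j * f1(d[j][k], d[k][j])
              + b1 * g1(d[i][j], d[j][i]) + c1 * abs(d[i][k] - d[j][k]))
    # Eq. (20): dissimilarity from G_k to G_ij.
    d_k_ij = (a2_i * f2(d[k][i], d[i][k]) + a2_j * f2(d[k][j], d[j][k])
              + b2 * g2(d[i][j], d[j][i]) + c2 * abs(d[k][i] - d[k][j]))
    return d_ij_k, d_k_ij
```

The choices $f_1(x, y) = x$, $f_2(x, y) = y$, $g_l(x, y) = \max\{x, y\}$, together with the parameter values listed in Section VII, give one admissible instantiation.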
V. DENDROGRAM WITHOUT A REVERSAL
As noted earlier, the average linkage for symmetric measures has no reversals in the dendrogram [1], [5], [6]. We will show that this property of no reversal also holds for the above two methods. First, we define
$$S(K) = \{\, s(G, G') : \forall (G, G') \in \mathcal{G} \times \mathcal{G},\ G \neq G' \,\}, \qquad (21)$$
where $K$ is the index in AHC and $\mathcal{G}$ changes as $K$ varies, e.g., $|\mathcal{G}| = K$. Hence $S(K)$ is the set of all values of similarity for $K$.
We also denote by $\max S(K)$ the maximum value of $S(K)$: it is exactly $m_K$ given by (2). We have the following lemma.
Lemma 1. If $\max S(K)$ is monotonically non-increasing as $K$ decreases from $N$ to $2$:
$$\max S(N) \geq \max S(N-1) \geq \cdots \geq \max S(2), \qquad (22)$$
then there is no reversal in the dendrogram.
Proof: The proof is almost trivial, since $\max S(K) = m_K$; thus (22) is exactly the same as (5). Q.E.D.
We have the next two propositions regarding the probabilistic model and the extended updating formula.
Proposition 3. Assume that $r(G, G')$ is used. For $G, G', G'' \in \mathcal{G}$, we have
$$r(G \cup G', G'') \leq \max\{r(G, G''), r(G', G'')\}, \qquad (23)$$
$$r(G'', G \cup G') \leq \max\{r(G'', G), r(G'', G')\}. \qquad (24)$$
Hence Lemma 1 is applied and no reversal occurs.
Proof: The two relations can be proved by easy calculations. If these inequalities are satisfied, the set of values in $S(K)$ does not increase, and hence Lemma 1 is applied. Q.E.D.
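The "easy calculations" are essentially the mediant inequality $(a+b)/(c+d) \leq \max\{a/c, b/d\}$ applied to the ratios in (16). The inequalities (23) and (24) can also be checked numerically; a small self-contained sketch with illustrative toy counts:

```python
# Toy citation counts n(x, y) among four journals; n(x) >= sum_y n(x, y), cf. (10).
n_xy = {"a": {"b": 5, "c": 1, "d": 2}, "b": {"a": 4, "c": 3, "d": 1},
        "c": {"a": 2, "b": 2, "d": 6}, "d": {"a": 1, "b": 1, "c": 7}}
n_x = {x: sum(row.values()) + 3 for x, row in n_xy.items()}

def r(G, H):
    # Eq. (16) on the toy data.
    num = sum(n_xy[x].get(y, 0) for x in G for y in H)
    return num / (len(H) * sum(n_x[x] for x in G))

G, Gp, Gpp = {"a"}, {"b"}, {"c", "d"}
assert r(G | Gp, Gpp) <= max(r(G, Gpp), r(Gp, Gpp))    # inequality (23)
assert r(Gpp, G | Gp) <= max(r(Gpp, G), r(Gpp, Gp))    # inequality (24)
```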
Proposition 4. Assume that the extended updating formula is used. If $\alpha_i^l + \alpha_j^l + \beta_l \geq 1$ and $\gamma_l \geq 0$ ($l = 1, 2$), we have
$$d_{(ij)k} \geq \min\{d_{ik}, d_{jk}\}, \qquad (25)$$
$$d_{k(ij)} \geq \min\{d_{ki}, d_{kj}\}. \qquad (26)$$
Hence Lemma 1 is applied and no reversal occurs.
Proof: (i) Let us first prove (25). When $d_{ij} \leq d_{ji}$, we have $d_{ik}, d_{jk} \geq d_{ij}$. Then the left-hand side minus the right-hand side in (25) satisfies
$$d_{(ij)k} - \min\{d_{ik}, d_{jk}\} \geq (\alpha_i^1 + \alpha_j^1 + \beta_1 - 1)\, d_{ij} + \gamma_1 |d_{ik} - d_{jk}| \geq 0, \qquad (27)$$
where we used the assumption $\alpha_i^1 + \alpha_j^1 + \beta_1 \geq 1$ and $\gamma_1 \geq 0$. The case of $d_{ij} \geq d_{ji}$ is handled in a similar way; we omit the detail.
(ii) The proof of (26) is similar to (i); we omit the detail.
Finally, the reason why Lemma 1 can be applied is the same as that for Proposition 3. Q.E.D.
VI. ASYMMETRIC DENDROGRAM
Asymmetric dendrograms have been proposed for agglomerative hierarchical clustering using asymmetric measures [8], [12]. When we plot an asymmetric dendrogram, we use two methods here: the first method has been proposed by Okada and Iwamoto [8], while the second method is new.

A. Representation of asymmetry using the ratio

The first method is based on the ratio of similarity measures [8]. We use an asymmetric dendrogram that uses the ratio of s(G, G') and s(G', G), as in Fig. 2.
Figure 2. Example of asymmetric dendrogram using the ratio.
B. Representation of asymmetry using hypothesis testing

The second method uses hypothesis testing. Here, we use the chi-square test and test the hypothesis $H_0$ that the citation probability from $G_1$ to $G_2$ equals that from $G_2$ to $G_1$. When the input values and the totals are as in Table I, $\chi^2$ is given by the statistic for a 2x2 contingency table [2], where we put
$$O_{11} = n(G_1, G_2), \quad O_{12} = n(G_1) - n(G_1, G_2),$$
$$O_{21} = n(G_2, G_1), \quad O_{22} = n(G_2) - n(G_2, G_1),$$
$$n_1 = n(G_1), \quad n_2 = n(G_2).$$
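Assuming the usual Pearson statistic for the 2x2 table of Table I (the paper's displayed formula is not reproduced here), a minimal sketch follows; the function name and counts are illustrative only, and 6.63 and 3.84 are the 1% and 5% critical values of $\chi^2$ with one degree of freedom [2]:

```python
def chi_square_2x2(n_g1_g2, n_g1, n_g2_g1, n_g2):
    """Pearson chi-square statistic for the 2x2 table of Table I (1 degree of freedom)."""
    o11, o12 = n_g1_g2, n_g1 - n_g1_g2      # row 1: n(G1, G2) and the remaining citations of G1
    o21, o22 = n_g2_g1, n_g2 - n_g2_g1      # row 2: n(G2, G1) and the remaining citations of G2
    n = n_g1 + n_g2
    return n * (o11 * o22 - o12 * o21) ** 2 / (n_g1 * n_g2 * (o11 + o21) * (o12 + o22))

chi2 = chi_square_2x2(n_g1_g2=120, n_g1=800, n_g2_g1=60, n_g2=900)   # toy counts
level = "1%" if chi2 > 6.63 else "5%" if chi2 > 3.84 else "not rejected"
print(round(chi2, 2), level)    # the outcome is drawn on the branch as in Fig. 3
```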
Then, we have an asymmetric dendrogram such as those in Fig. 3 according to the rejection of $H_0$ at two different significance levels, using the $\chi^2$ distribution with one degree of freedom [2].

Table I
THE 2 x 2 CONTINGENCY TABLE.

            class 1   class 2   total
  class 1   O_11      O_12      n_1
  class 2   O_21      O_22      n_2

Figure 3. Example of asymmetric dendrogram using the hypothesis testing: the left figure means that the hypothesis $H_0$ is rejected at the 1% significance level, the center means that $H_0$ is rejected at the 5% significance level, and the right means that $H_0$ is not rejected.
VII. NUMERICAL EXAMPLES
Two data sets were used. They are as follows:
1) The first data set consists of the numbers of citations among eight journals on statistics. The original data, given in [10], are omitted here. We call this data set the citation data.
2) The second data set consists of the numbers of travelers among twenty countries in 2001. The original data, given in [14], are omitted here. We call this the traveler data.
We used the probability model and the extended updating formula. In the latter method, the following parameter values were used:
$$\beta_1 = \beta_2 = 0, \quad \alpha_i^1 = \alpha_j^1 = \alpha_i^2 = \alpha_j^2 = \frac{1}{2}, \quad \gamma_1 = \gamma_2 = 0.$$
Note that these parameters satisfy the condition of no reversals in Section V.
A. Citation data
Figures 4 and 5 were obtained from the probability model, and Figure 6 from the extended updating formula. Figures 4 and 5 use the two different dendrogram representations for the same clusters. In these figures a cluster of three journals, 'JASA', 'AnnSt', and 'ComSt', is formed; they frequently cite one another. An example of asymmetry of citation is found between 'AnnSt' and 'ComSt': the citation from 'ComSt' to 'AnnSt' is stronger than the reverse direction in both the probability model and the updating formula. When the dendrogram with the hypothesis testing was used, the hypotheses at all branches were rejected at the 1% significance level.
B. Traveler data
Figures 7 and 8 were respectively obtained from the probability model and the extended updating formula. We observed similar clusters in both dendrograms. First, countries in the same area are merged, and then clusters reflecting geographical closeness are formed. Moreover, the probability method seems to provide a better-balanced dendrogram than the extended updating formula.
Figure 4. Dendrogram of journal citation data using the probability model with the representation of asymmetry using the ratio.
Figure 5. Dendrogram of journal citation data using the probability model with the representation of asymmetry using the hypothesis testing.
Figure 6. Dendrogram of journal citation data using the extended updating formula with the representation of asymmetry using the ratio.

Figure 7. Dendrogram of traveler data using the probability model with the representation of asymmetry using the ratio.

Figure 8. Dendrogram of traveler data using the extended updating formula with the representation of asymmetry using the ratio.

VIII. CONCLUSION
We discussed what we call bottom-up methods and top-down methods in agglomerative hierarchical clustering using asymmetric similarity measures. Study of top-down methods will lead to the development of a new theory and algorithms in agglomerative hierarchical clustering.
The theory of reversals in dendrograms has also been developed, and it has been proved that the two methods in this paper are without any reversal in the dendrogram. This theory can be applied to other existing and new linkage methods.
We have shown applications to small-scale data sets. In the near future, we will study how to handle large-scale data by agglomerative hierarchical methods using both symmetric and asymmetric measures of similarity.
ACKNOWLEDGMENT
This work has partly been supported by the Grant-in-Aid for Scientific Research, Japan Society for the Promotion of Science, No. 23500269.
REFERENCES
[1] M.R. Anderberg, Cluster Analysis for Applications, Academic Press, New York, 1973.
[2] W.J. Conover, Practical Nonparametric Statistics, Wiley, New York, 1971.
[3] B.S. Everitt, Cluster Analysis, 3rd Edition, Arnold, London, 1993.
[4] L. Hubert, Min and max hierarchical clustering using asymmetric similarity measures, Psychometrika, Vol.38, No.1, pp.63-72, 1973.
[5] S. Miyamoto, Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer, Dordrecht, 1990.
[6] S. Miyamoto, Introduction to Cluster Analysis, Morikita Shuppan, Tokyo, 1999 (in Japanese).
[7] S. Miyamoto, Fuzzy multisets and their generalizations, in C.S. Calude et al., eds., Multiset Processing, Lecture Notes in Computer Science, LNCS 2235, Springer, Berlin, pp.225-235, 2001.
[8] A. Okada, T. Iwamoto, A Comparison before and after the Joint First Stage Achievement Test by Asymmetric Cluster Analysis, Behaviormetrika, Vol.23, No.2, pp.169-185, 1996.
[9] T. Saito, H. Yadohisa, Data Analysis of Asymmetric Structures, Marcel Dekker, New York, 2005.
[10] S.M. Stigler, Citation Patterns in the Journals of Statistics and Probability, Statistical Science, Vol.9, pp.94-108, 1994.
[11] A. Takeuchi, T. Saito, H. Yadohisa, Asymmetric agglomerative hierarchical clustering algorithms and their evaluations, Journal of Classification, Vol.24, pp.123-143, 2007.
[12] H. Yadohisa, Formulation of Asymmetric Agglomerative Clustering and Graphical Representation of Its Result, J. of Japanese Society of Computational Statistics, Vol.15, No.2, pp.309-316, 2002 (in Japanese).
[13] S. Miyamoto, K. Nakayama, Similarity Measures Based on Fuzzy Set Model and Application to Hierarchical Clustering, IEEE Trans. SMC, pp.479-482, 1986.
[14] http://www.unwto-osaka.org/index.html
