FIGURE 10.9. Left: dendrogram obtained from hierarchically clustering the data
from Figure 10.8 with complete linkage and Euclidean distance. Center: the
dendrogram from the left-hand panel, cut at a height of nine (indicated by the dashed
line). This cut results in two distinct clusters, shown in different colors. Right:
the dendrogram from the left-hand panel, now cut at a height of five. This cut
results in three distinct clusters, shown in different colors. Note that the colors
were not used in clustering, but are simply used for display purposes in this figure.
different the two observations are. Thus, observations that fuse at the very
bottom of the tree are quite similar to each other, whereas observations
that fuse close to the top of the tree will tend to be quite different.
This highlights a very important point in interpreting dendrograms that
is often misunderstood. Consider the left-hand panel of Figure 10.10, which
shows a simple dendrogram obtained from hierarchically clustering nine
observations. One can see that observations 5 and 7 are quite similar to
each other, since they fuse at the lowest point on the dendrogram. Observations
1 and 6 are also quite similar to each other. However, it is tempting
but incorrect to conclude from the figure that observations 9 and 2 are
quite similar to each other on the basis that they are located near each
other on the dendrogram. In fact, based on the information contained in
the dendrogram, observation 9 is no more similar to observation 2 than it
is to observations 8, 5, and 7. (This can be seen from the right-hand panel
of Figure 10.10, in which the raw data are displayed.) To put it mathematically,
there are $2^{n-1}$ possible reorderings of the dendrogram, where n
is the number of leaves. This is because at each of the n − 1 points where
fusions occur, the positions of the two fused branches could be swapped
without affecting the meaning of the dendrogram. Therefore, we cannot
draw conclusions about the similarity of two observations based on their
proximity along the horizontal axis. Rather, we draw conclusions about
the similarity of two observations based on the location on the vertical axis
where branches containing those two observations are first fused.
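The ideas above (fusion heights, cutting a dendrogram at a given height as in Figure 10.9, and the $2^{n-1}$ equivalent leaf orderings) can be checked with a small sketch. It assumes complete linkage, Euclidean distance and a tiny hypothetical data set, and stores the dendrogram as the ordered list of its fusions; every function name here is illustrative, not part of any library. The height returned by `fusion_height` is what is usually called the cophenetic distance between two observations.

```python
import math
from itertools import combinations


def complete_linkage_merges(points):
    """Naive agglomerative clustering (complete linkage, Euclidean distance).

    Returns the fusions in the order they occur, as (a, b, height) triples
    where a and b are frozensets of observation indices.
    """
    def linkage(a, b):
        # Complete linkage: largest distance between a point in a and one in b.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges


def cut_at(merges, n, height):
    """Cluster labels obtained by cutting the dendrogram at `height`.

    Replays only the fusions strictly below the cut; this assumes fusion
    heights never decrease, which holds for complete linkage.
    """
    clusters = [frozenset([i]) for i in range(n)]
    for a, b, h in merges:
        if h < height:
            clusters = [c for c in clusters if c != a and c != b] + [a | b]
    labels = [0] * n
    for label, cluster in enumerate(sorted(clusters, key=min)):
        for i in cluster:
            labels[i] = label
    return labels


def fusion_height(merges, i, j):
    """Height where the branches containing i and j first fuse (i != j)."""
    for a, b, h in merges:
        if (i in a and j in b) or (i in b and j in a):
            return h


def leaf_orderings(merges, n):
    """Every left-to-right leaf order obtained by swapping branches at fusions."""
    orders = {frozenset([i]): [[i]] for i in range(n)}
    for a, b, _ in merges:
        pairs = [(left, right) for left in orders[a] for right in orders[b]]
        # Each fusion doubles the number of orders: a-then-b or b-then-a.
        orders[a | b] = ([left + right for left, right in pairs]
                         + [right + left for left, right in pairs])
    a, b, _ = merges[-1]
    return orders[a | b]


# Hypothetical data: five observations with two features each.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
merges = complete_linkage_merges(points)
print(cut_at(merges, 5, 3.0))   # [0, 0, 1, 1, 2]: three clusters
print(cut_at(merges, 5, 10.0))  # [0, 0, 0, 0, 1]: two clusters
orders = leaf_orderings(merges, 5)
print(len(orders))              # 16, i.e. 2 ** (5 - 1)
print(fusion_height(merges, 3, 4), fusion_height(merges, 1, 2))
```

In the ordering [0, 1, 2, 3, 4], observations 3 and 4 sit side by side, yet they fuse only at the top of the tree (height √401 ≈ 20.0), while the equally adjacent pair 1 and 2 fuses at √26 ≈ 5.1: horizontal proximity says nothing, the fusion height says everything.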
396 10. Unsupervised Learning
[Figure: scatter-plot panels of the nine observations, labeled 1 to 9, plotted on axes X1 and X2.]
Linkage    Description
Complete   Maximal intercluster dissimilarity. Compute all pairwise dissimilarities
           between the observations in cluster A and the observations in cluster B,
           and record the largest of these dissimilarities.
Single     Minimal intercluster dissimilarity. Compute all pairwise dissimilarities
           between the observations in cluster A and the observations in cluster B,
           and record the smallest of these dissimilarities. Single linkage can
           result in extended, trailing clusters in which single observations are
           fused one at a time.
Average    Mean intercluster dissimilarity. Compute all pairwise dissimilarities
           between the observations in cluster A and the observations in cluster B,
           and record the average of these dissimilarities.
Centroid   Dissimilarity between the centroid for cluster A (a mean vector of
           length p) and the centroid for cluster B. Centroid linkage can result
           in undesirable inversions.
TABLE 10.2. A summary of the four most commonly used types of linkage in
hierarchical clustering.
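The four linkages in Table 10.2 can be written as small functions of two clusters A and B. The sketch below assumes Euclidean distance as the dissimilarity and uses hypothetical names and data; it also runs a naive agglomerative loop on three points at the corners of an equilateral triangle (a hypothetical example) to show the inversion that centroid linkage can produce, and contrasts it with complete linkage.

```python
import math
from itertools import combinations


def pairwise(A, B):
    """All Euclidean distances between an observation in A and one in B."""
    return [math.dist(a, b) for a in A for b in B]


def complete_linkage(A, B):
    return max(pairwise(A, B))  # largest pairwise dissimilarity


def single_linkage(A, B):
    return min(pairwise(A, B))  # smallest pairwise dissimilarity


def average_linkage(A, B):
    d = pairwise(A, B)
    return sum(d) / len(d)  # mean pairwise dissimilarity


def centroid(cluster):
    """Mean vector (length p) of the observations in the cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def centroid_linkage(A, B):
    return math.dist(centroid(A), centroid(B))


def fusion_heights(points, linkage):
    """Naive agglomerative clustering; returns each fusion height in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Fuse the pair of clusters with the smallest linkage value.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        heights.append(linkage(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return heights


# Hypothetical clusters with p = 2 features.
A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
for f in (complete_linkage, single_linkage, average_linkage, centroid_linkage):
    print(f.__name__, round(f(A, B), 3))

# Equilateral triangle with side 1 (hypothetical data).
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(fusion_heights(triangle, centroid_linkage))  # second height < first
print(fusion_heights(triangle, complete_linkage))  # heights never decrease
```

With centroid linkage, the first two points fuse at height 1, but the centroid of that pair lies only √3/2 ≈ 0.87 from the third point, so the second fusion is drawn below the first: an inversion. Complete linkage fuses the same points at height 1 both times, so its heights never decrease.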
linkage are generally preferred over single linkage, as they tend to yield
more balanced dendrograms. Centroid linkage is often used in genomics,
but suffers from a major drawback in that an inversion can occur, whereby
two clusters are fused at a height below either of the individual clusters in
the dendrogram. This can lead to difficulties in visualization as well as in
interpretation of the dendrogram. The dissimilarities computed in Step 2(b)
of the hierarchical clustering algorithm will depend on the type of linkage
used, as well as on the choice of dissimilarity measure. Hence, the resulting
10.3 Clustering Methods 395