
Artificial Intelligence (CSC9YE)

Hierarchical Clustering

Gabriela Ochoa
[email protected]
Hierarchical Clustering

- K-means clustering requires us to pre-specify the number of clusters K. This can be a disadvantage.
- Hierarchical clustering does not require that we commit to a particular choice of K.
- Hierarchical clustering algorithms produce a hierarchy of clusters, which can be visualised as a dendrogram (tree).
- The dendrogram helps us to decide on the number of clusters K (a minimal SciPy sketch of this workflow follows below).
- There are two types of hierarchical clustering: agglomerative (bottom-up) and divisive (top-down).
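
As a concrete illustration (not from the lecture), here is a minimal SciPy sketch; the data points and the final choice of three clusters are made-up assumptions, but it shows that the hierarchy is built first and K is chosen only afterwards, from the dendrogram.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Made-up two-dimensional examples, purely for illustration
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0],
              [5.2, 4.8], [9.0, 1.0], [9.1, 1.3]])

# Build the whole hierarchy; no number of clusters K is specified here
Z = linkage(X, method="single", metric="euclidean")

# Visualise the hierarchy as a dendrogram (tree)
dendrogram(Z)
plt.ylabel("Height (dissimilarity)")
plt.show()

# Only after inspecting the dendrogram do we commit to a particular K
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)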

1 / 34
Types of Hierarchical Clustering

- Agglomerative (bottom-up, agglomerative nesting)
  - Each example starts as a single-element cluster (leaf).
  - At each step, the two clusters that are the most similar are combined into a new, bigger cluster (node).
  - Iterate until all points are members of a single big cluster (root).
- Divisive (top-down, divisive analysis)
  - Start with a single cluster (root) containing all examples.
  - At each step, the most heterogeneous cluster is divided into two.
  - Iterate until each example is in its own cluster.

2 / 34
Measuring Similarity

- We need a way of measuring the similarity between two clusters.
- As we discussed for K-means, we can measure the (dis)similarity of observations using a distance measure, such as the Euclidean distance.
- How do we measure the dissimilarity between two clusters of observations?
- A number of different cluster agglomeration methods (i.e., linkage methods) have been proposed:
  - Minimum or single linkage
  - Maximum or complete linkage
  - Mean or average linkage
  - Inter-cluster variance (Ward)

3 / 34
Linkage Methods

- The linkage method determines the metric used for merging clusters. Two clusters are merged if they are close.
- All pairwise distances between the observations in cluster A and the observations in cluster B are computed.

Linkage    Description
Single     Minimal inter-cluster distance: the shortest distance between observations of the two clusters.
Complete   Maximal inter-cluster distance: the maximum distance between observations of the two clusters.
Average    Mean inter-cluster distance: the average distance between observations of the two clusters.
Ward       Inter-cluster variance: merge the pair of clusters that gives the smallest increase in the total within-cluster sum of squared differences.

A small code sketch of these criteria follows below.
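
To make the criteria concrete, this sketch (an illustration, not from the slides) computes the single, complete and average linkage distances between two small made-up clusters with NumPy, together with Ward's increase in the within-cluster sum of squares.

import numpy as np

def pairwise_distances(A, B):
    # All Euclidean distances between observations in cluster A and cluster B
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def single_linkage(A, B):    # minimal inter-cluster distance
    return pairwise_distances(A, B).min()

def complete_linkage(A, B):  # maximal inter-cluster distance
    return pairwise_distances(A, B).max()

def average_linkage(A, B):   # mean inter-cluster distance
    return pairwise_distances(A, B).mean()

def ward_increase(A, B):
    # Increase in the within-cluster sum of squared differences caused by merging A and B
    def ss(X):
        return ((X - X.mean(axis=0)) ** 2).sum()
    return ss(np.vstack([A, B])) - ss(A) - ss(B)

# Two made-up clusters of two-dimensional observations
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 1.0]])
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B), ward_increase(A, B))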

4 / 34
Agglomerative Clustering
Pseudo-code

Input: a set of examples without labels

- Let each example form one cluster. For N examples, this means creating N clusters, each containing a single example.
- Find the pair of clusters with the smallest cluster-to-cluster distance.
- Merge the two clusters into one, thus reducing the total number of clusters by one (to N − 1 after the first merge).
- Iterate until all examples are members of a single big cluster.

A runnable sketch of this procedure follows below.
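
A naive, runnable version of this pseudo-code (my own sketch, using single linkage and made-up points; O(N^3) and written for clarity rather than efficiency):

import numpy as np

def agglomerative_single_linkage(X):
    # X: (N, d) array of unlabelled examples; returns the sequence of merges
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    clusters = [[i] for i in range(n)]   # one single-element cluster per example
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest cluster-to-cluster distance
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the two clusters into one, reducing the number of clusters by one
        merges.append((d, clusters[a][:], clusters[b][:]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Made-up points, purely for illustration
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
for d, A, B in agglomerative_single_linkage(X):
    print("merge", A, "and", B, "at distance", round(d, 2))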

5 / 34
Agglomerative Clustering
Intuitive simple example

(Two figures: the raw data and the corresponding dendrogram.)

6 / 34
Real-world Dataset
Violent Crime Rates by US State

Statistics, in arrests per 100,000 residents, in each of the 50 US states in 1973.

- Murder: murder arrests (per 100,000)
- Assault: assault arrests (per 100,000)
- UrbanPop: percent of urban population
- Rape: rape arrests (per 100,000)

7 / 34
Dendrogram after Agglomerative Clustering
Violent Crime Rates by US State

- Each leaf corresponds to one observation.
- As we move up the tree, observations that are similar to each other are combined into branches, which are themselves fused at a greater height.
- The vertical axis indicates dissimilarity: the greater the height of the fusion, the less similar the observations are.

8 / 34
How many clusters?
Violent Crime Rates by US State

- The height at which we cut the dendrogram controls the number of clusters obtained.
- It plays the same role as K in K-means clustering.
- In order to identify sub-groups (i.e. clusters), we need to cut the dendrogram. Here is an example of cutting the tree into 4 groups (a SciPy sketch of such a cut follows below).
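
In code, cutting the tree into a fixed number of groups might look like the sketch below; the file name usarrests.csv, its column layout and the use of complete linkage are assumptions made for illustration.

import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Hypothetical file with the crime data described earlier (one row per US state)
df = pd.read_csv("usarrests.csv", index_col=0)
X = zscore(df[["Murder", "Assault", "UrbanPop", "Rape"]].values)  # standardise the features

Z = linkage(X, method="complete")  # build the hierarchy (complete linkage assumed here)

# Cut the tree into exactly 4 groups; this fixes K after the dendrogram has been inspected
labels = fcluster(Z, t=4, criterion="maxclust")
print(pd.Series(labels, index=df.index).head())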

9 / 34
Agglomerative Clustering
Synthetic small dataset

(Scatter plot of the ten labelled data points.)

10 / 34
Hierarchical Clustering
Synthetic small dataset

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
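
The matrix above can be fed directly to SciPy. The sketch below (an illustration, not part of the slides) runs single linkage, the method used in the walkthrough on the following pages, and prints the merge heights; these should match the distances d in the step tables (1.0, 1.0, 1.4, 2.0, 2.2, 2.8, 2.8, 3.2, 3.6).

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# The symmetric 10 x 10 distance matrix from the slide above
D = np.array([
    [ 0.0,  1.0,  3.6,  4.0,  6.0,  6.1, 10.0,  8.6,  8.6,  5.4],
    [ 1.0,  0.0,  2.8,  3.0,  5.0,  5.1,  9.2,  7.8,  8.1,  5.1],
    [ 3.6,  2.8,  0.0,  2.2,  3.6,  4.2,  6.4,  5.0,  5.4,  3.2],
    [ 4.0,  3.0,  2.2,  0.0,  2.0,  2.2,  7.2,  5.8,  7.1,  5.4],
    [ 6.0,  5.0,  3.6,  2.0,  0.0,  1.0,  6.3,  5.1,  7.1,  6.4],
    [ 6.1,  5.1,  4.2,  2.2,  1.0,  0.0,  7.3,  6.1,  8.1,  7.2],
    [10.0,  9.2,  6.4,  7.2,  6.3,  7.3,  0.0,  1.4,  3.2,  6.1],
    [ 8.6,  7.8,  5.0,  5.8,  5.1,  6.1,  1.4,  0.0,  2.8,  5.0],
    [ 8.6,  8.1,  5.4,  7.1,  7.1,  8.1,  3.2,  2.8,  0.0,  3.6],
    [ 5.4,  5.1,  3.2,  5.4,  6.4,  7.2,  6.1,  5.0,  3.6,  0.0],
])

# linkage expects a condensed distance vector, obtained here with squareform
Z = linkage(squareform(D), method="single")
print(Z[:, 2])  # heights of the successive merges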

11 / 34
Hierarchical Clustering
Single (minimal) distance

(Scatter plot of the ten labelled data points.)

12 / 34
Hierarchical Clustering

Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.

13 / 34
Hierarchical Clustering

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}

14 / 34
Hierarchical Clustering

(Scatter plot of the ten data points at this stage of the merging.)

15 / 34
Hierarchical Clustering

Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
d = 1.4, k = 7: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}
    Merge {7} and {8} since they are the closest: d(7,8)=1.4.

16 / 34
Hierarchical Clustering

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}

17 / 34
Hierarchical Clustering

(Scatter plot of the ten data points at this stage of the merging.)

18 / 34
Hierarchical Clustering

Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
d = 1.4, k = 7: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}
    Merge {7} and {8} since they are the closest: d(7,8)=1.4.
d = 2.0, k = 6: {1, 2}, {3}, {4, 5, 6}, {7, 8}, {9}, {10}
    Merge {4} and {5, 6} since 4 and 5 are the closest: d(4,5)=2.0.

19 / 34
Hierarchical Clustering

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1, 2}, {3}, {4, 5, 6}, {7, 8}, {9}, {10}

20 / 34
Hierarchical Clustering

(Scatter plot of the ten data points at this stage of the merging.)

21 / 34
Hierarchical Clustering

Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
d = 1.4, k = 7: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}
    Merge {7} and {8} since they are the closest: d(7,8)=1.4.
d = 2.0, k = 6: {1, 2}, {3}, {4, 5, 6}, {7, 8}, {9}, {10}
    Merge {4} and {5, 6} since 4 and 5 are the closest: d(4,5)=2.0.
d = 2.2, k = 5: {1, 2}, {3, 4, 5, 6}, {7, 8}, {9}, {10}
    Merge {3} and {4, 5, 6} since 3 and 4 are the closest: d(3,4)=2.2.

22 / 34
Hierarchical Clustering

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1, 2}, {3, 4, 5, 6}, {7, 8}, {9}, {10}

23 / 34
Hierarchical Clustering

(Scatter plot of the ten data points at this stage of the merging.)

24 / 34
Hierarchical Clustering

Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
d = 1.4, k = 7: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}
    Merge {7} and {8} since they are the closest: d(7,8)=1.4.
d = 2.0, k = 6: {1, 2}, {3}, {4, 5, 6}, {7, 8}, {9}, {10}
    Merge {4} and {5, 6} since 4 and 5 are the closest: d(4,5)=2.0.
d = 2.2, k = 5: {1, 2}, {3, 4, 5, 6}, {7, 8}, {9}, {10}
    Merge {3} and {4, 5, 6} since 3 and 4 are the closest: d(3,4)=2.2.
d = 2.8, k = 3: {1, 2, 3, 4, 5, 6}, {7, 8, 9}, {10}
    Merge {1, 2} and {3, 4, 5, 6} as well as {7, 8} and {9}, since 2 and 3 as well as 8 and 9 are the closest: d(2,3)=2.8 and d(8,9)=2.8.

25 / 34
Hierarchical Clustering

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1, 2, 3, 4, 5, 6}, {7, 8, 9}, {10}

26 / 34
Hierarchical Clustering

(Scatter plot of the ten data points at this stage of the merging.)

27 / 34
Hierarchical Clustering
Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
d = 1.4, k = 7: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}
    Merge {7} and {8} since they are the closest: d(7,8)=1.4.
d = 2.0, k = 6: {1, 2}, {3}, {4, 5, 6}, {7, 8}, {9}, {10}
    Merge {4} and {5, 6} since 4 and 5 are the closest: d(4,5)=2.0.
d = 2.2, k = 5: {1, 2}, {3, 4, 5, 6}, {7, 8}, {9}, {10}
    Merge {3} and {4, 5, 6} since 3 and 4 are the closest: d(3,4)=2.2.
d = 2.8, k = 3: {1, 2, 3, 4, 5, 6}, {7, 8, 9}, {10}
    Merge {1, 2} and {3, 4, 5, 6} as well as {7, 8} and {9}, since 2 and 3 as well as 8 and 9 are the closest: d(2,3)=2.8 and d(8,9)=2.8.
d = 3.2, k = 2: {1, 2, 3, 4, 5, 6, 10}, {7, 8, 9}
    Merge {1, 2, 3, 4, 5, 6} and {10} since 3 and 10 are the closest: d(3,10)=3.2.

28 / 34
Hierarchical Clustering

Distance Matrix
1 2 3 4 5 6 7 8 9 10
1 0.0
2 1.0 0.0
3 3.6 2.8 0.0
4 4.0 3.0 2.2 0.0
5 6.0 5.0 3.6 2.0 0.0
6 6.1 5.1 4.2 2.2 1.0 0.0
7 10.0 9.2 6.4 7.2 6.3 7.3 0.0
8 8.6 7.8 5.0 5.8 5.1 6.1 1.4 0.0
9 8.6 8.1 5.4 7.1 7.1 8.1 3.2 2.8 0.0
10 5.4 5.1 3.2 5.4 6.4 7.2 6.1 5.0 3.6 0.0

Clusters: {1, 2, 3, 4, 5, 6, 10}, {7, 8, 9}

29 / 34
Hierarchical Clustering

(Scatter plot of the ten data points at this stage of the merging.)

30 / 34
Hierarchical Clustering
Merging steps (d: merge distance, k: number of clusters):

d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}
    Start with each observation as one cluster.
d = 1.0, k = 8: {1, 2}, {3}, {4}, {5, 6}, {7}, {8}, {9}, {10}
    Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
d = 1.4, k = 7: {1, 2}, {3}, {4}, {5, 6}, {7, 8}, {9}, {10}
    Merge {7} and {8} since they are the closest: d(7,8)=1.4.
d = 2.0, k = 6: {1, 2}, {3}, {4, 5, 6}, {7, 8}, {9}, {10}
    Merge {4} and {5, 6} since 4 and 5 are the closest: d(4,5)=2.0.
d = 2.2, k = 5: {1, 2}, {3, 4, 5, 6}, {7, 8}, {9}, {10}
    Merge {3} and {4, 5, 6} since 3 and 4 are the closest: d(3,4)=2.2.
d = 2.8, k = 3: {1, 2, 3, 4, 5, 6}, {7, 8, 9}, {10}
    Merge {1, 2} and {3, 4, 5, 6} as well as {7, 8} and {9}, since 2 and 3 as well as 8 and 9 are the closest: d(2,3)=2.8 and d(8,9)=2.8.
d = 3.2, k = 2: {1, 2, 3, 4, 5, 6, 10}, {7, 8, 9}
    Merge {1, 2, 3, 4, 5, 6} and {10} since 3 and 10 are the closest: d(3,10)=3.2.
d = 3.6, k = 1: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    Merge the remaining two clusters: d(9,10)=3.6.
31 / 34
Hierarchical Clustering
Single Linkage Cluster Dendrogram
(Single-linkage dendrogram of the ten synthetic points; the heights of the merges correspond to the distances d in the table above, from 1.0 up to 3.6.)
32 / 34
Hierarchical Clustering

(Dendrograms of the synthetic dataset under single, complete and average linkage, shown side by side; the leaf orderings and merge heights differ across the three linkage methods. A SciPy sketch reproducing these plots follows.)
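
Plots like these could be reproduced with SciPy from the distance matrix given earlier; in the sketch below (an illustration, not from the slides) the condensed vector simply lists the upper-triangular distances d(i, j) of that matrix row by row.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Condensed distance vector: the upper triangle of the matrix from the earlier slides
condensed = np.array([
    1.0, 3.6, 4.0, 6.0, 6.1, 10.0, 8.6, 8.6, 5.4,   # d(1, 2..10)
    2.8, 3.0, 5.0, 5.1, 9.2, 7.8, 8.1, 5.1,         # d(2, 3..10)
    2.2, 3.6, 4.2, 6.4, 5.0, 5.4, 3.2,              # d(3, 4..10)
    2.0, 2.2, 7.2, 5.8, 7.1, 5.4,                   # d(4, 5..10)
    1.0, 6.3, 5.1, 7.1, 6.4,                        # d(5, 6..10)
    7.3, 6.1, 8.1, 7.2,                             # d(6, 7..10)
    1.4, 3.2, 6.1,                                  # d(7, 8..10)
    2.8, 5.0,                                       # d(8, 9..10)
    3.6,                                            # d(9, 10)
])

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, method in zip(axes, ["single", "complete", "average"]):
    Z = linkage(condensed, method=method)
    dendrogram(Z, labels=[str(i) for i in range(1, 11)], ax=ax)  # leaves labelled 1..10
    ax.set_title(method.capitalize() + " linkage")
    ax.set_ylabel("Height")
plt.tight_layout()
plt.show()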
33 / 34
Conclusions

- Hierarchical clustering seeks to build a hierarchy of clusters.
- We can decide on the number of clusters after exploring the dendrogram.
- Unsupervised learning is important for understanding the variation and grouping structure of a set of unlabeled data, and can be a useful pre-processor for supervised learning.
- It is intrinsically more difficult than supervised learning because there is no gold standard (like an outcome variable) and no single objective (like test set accuracy).

34 / 34
