Chapter 5 Clustering
• Clustering Quality
• Partitioning clustering
• Hierarchical clustering
Clustering
• Clustering is a data mining (machine learning) technique
that finds similarities between data according to the
characteristics found in the data & groups similar data
objects into one cluster
• Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that members of a cluster are in some sense as close to each other as possible.
• While data points in the same cluster are similar, those in separate clusters are dissimilar to one another.
[Figure: scatter plot of points grouped into clusters.]
Example: clustering
• The example below demonstrates the clustering of
padlocks of the same kind. There are a total of 10
padlocks, which vary in color, size, shape, etc.
Cluster Evaluation: Ground Truth
• We use some labeled data (originally intended for classification)
– Assumption: each class corresponds to one cluster.
• After clustering, a confusion matrix is
constructed. From the matrix, we compute
various measures: entropy, purity (see the sketch
below), precision, recall and F-score.
– Let the classes in the data D be C = (c1, c2, …, ck). The
clustering method produces k clusters, which divide D
into k disjoint subsets, D1, D2, …, Dk.
Evaluation of Cluster Quality using Purity
• Purity measures the extent to which each cluster contains objects of a single class. For cluster Di:
  Purity(Di) = (1/|Di|) max_j |Di ∩ cj|
• The overall purity is the average of the cluster purities, weighted by cluster size.
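A minimal sketch of this computation in Python (the function name and the tiny example are illustrative assumptions, not from the slides):

```python
from collections import Counter

def purity(class_labels, cluster_labels):
    """Weighted purity: each cluster contributes the share of its
    most frequent class, weighted by the cluster's size."""
    assert len(class_labels) == len(cluster_labels)
    n = len(class_labels)
    # Group the true class labels by the cluster they were assigned to.
    clusters = {}
    for c, k in zip(class_labels, cluster_labels):
        clusters.setdefault(k, []).append(c)
    # Sum the size of the majority class in every cluster.
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / n

# Illustrative example: 3 classes, 3 clusters.
classes  = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusters = [ 1,   1,   2,   2,   2,   3,   3,   3 ]
print(purity(classes, clusters))  # 0.875
```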
Similarity and Dissimilarity Between Objects
• The Minkowski distance between two data objects is:
  dis(X, Y) = ( Σ_{i=1}^{n} |xi − yi|^q )^{1/q}
  where X = (x1, x2, …, xn) and Y = (y1, y2, …, yn) are two n-dimensional data objects; n is the size of the vector of attributes of the data objects; q = 1, 2, 3, …
• If q = 1, dis is the Manhattan distance:
  dis(X, Y) = Σ_{i=1}^{n} |xi − yi|
Similarity and Dissimilarity Between Objects
• If q = 2, dis is the Euclidean distance:
  dis(X, Y) = sqrt( Σ_{i=1}^{n} (xi − yi)^2 )
• Cosine Similarity
– If X and Y are two vector attributes of data objects, then the cosine similarity measure is given by:
  cos(X, Y) = (X • Y) / (||X|| ||Y||)
– Example: for two document vectors d1 and d2, cos(d1, d2) = 0.94 (a sketch of these measures follows below).
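Below is a minimal sketch of these three measures in plain Python (function names and the sample vectors are illustrative):

```python
import math

def manhattan(x, y):
    # q = 1: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # q = 2: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    # dot product divided by the product of the vector lengths
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

print(manhattan((2, 10), (5, 8)))                   # 5
print(euclidean((0, 0), (3, 4)))                    # 5.0
print(round(cosine_similarity((1, 2), (2, 4)), 2))  # 1.0
```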
The need for a representative
• Key problem: as you build clusters, how do you
represent the location of each cluster, so that you can
tell which pair of clusters is closest?
• For each cluster assign a centroid (the point closest to all
other points) = the average of its points:
  Cm = ( Σ_{i=1}^{N} t_ip ) / N
  where t_ip denotes the i-th point of cluster Cm and N is the number of points in the cluster.
• Measure inter-cluster distances by the distances between centroids (see the sketch below).
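A minimal sketch of computing a centroid and a centroid-based inter-cluster distance (illustrative helper names, not from the slides):

```python
def manhattan(a, b):
    # Manhattan (q = 1) distance, as defined earlier.
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(points):
    # Component-wise mean of the cluster's points.
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def inter_cluster_distance(cluster_a, cluster_b, dist=manhattan):
    # Distance between two clusters = distance between their centroids.
    return dist(centroid(cluster_a), centroid(cluster_b))

cluster_a = [(4, 4), (8, 4)]
cluster_b = [(24, 4), (24, 12)]
print(centroid(cluster_a))                           # (6.0, 4.0)
print(inter_cluster_distance(cluster_a, cluster_b))  # 22.0
```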
Major Clustering Approaches
• Partitioning clustering approach:
– Construct various partitions and then evaluate them by
some criterion, e.g., minimizing the sum of squared errors
– Typical methods:
• distance-based: K-means clustering
• model-based: expectation maximization (EM)
clustering.
• Hierarchical clustering approach:
– Create a hierarchical decomposition of the set of data (or
objects) using some criterion
– Typical methods:
• Agglomerative vs. Divisive
• Single link vs. Complete link
Partitioning Algorithms: Basic Concept
• Partitioning method: construct a partition of a database D of
n objects into a set of k clusters such that the sum of squared
distances is minimized (see the SSE sketch below)
• Given k, find a partition of k clusters that optimizes the
chosen partitioning criterion
– Global optimum: exhaustively enumerate all partitions
– Heuristic methods: k-means and k-medoids algorithms
– k-means: each cluster is represented by the center of the cluster
• k is the number of clusters to partition the dataset
• "means" refers to the average location of the members of a
particular cluster
– k-medoids or PAM (Partitioning Around Medoids): each cluster is
represented by one of the objects in the cluster
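A minimal sketch of the sum-of-squared-errors criterion mentioned above (illustrative code; the two small clusters are made-up data):

```python
def sse(clusters):
    """Sum of squared Euclidean distances from each point to its
    cluster's centroid, summed over all clusters."""
    total = 0.0
    for points in clusters:
        n = len(points)
        center = tuple(sum(c) / n for c in zip(*points))
        total += sum(sum((a - b) ** 2 for a, b in zip(p, center))
                     for p in points)
    return total

# Two small illustrative clusters.
print(sse([[(1, 1), (2, 2)], [(8, 8), (9, 9)]]))  # 2.0
```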
The K-Means Clustering Method
• Algorithm: given k, the k-means algorithm is implemented
as follows (a runnable sketch follows below):
• Select k points as the initial centroids (the initial
centroids are selected randomly)
• Repeat
– Partition the objects into k nonempty subsets
– Recompute the centroid of each of the k clusters of the
current partition (the centroid is the center, i.e.,
mean point, of the cluster)
– Assign each object to the cluster with the nearest
seed point
• Until the centroids don't change
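Below is a minimal from-scratch sketch of this loop, written in the usual assign-then-update order (function and parameter names are assumptions, not from the slides); the distance function is passed as a parameter so the Manhattan-distance example that follows can reuse it:

```python
def kmeans(points, initial_centroids, dist, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, recompute
    centroids as coordinate-wise means, stop when centroids are stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: new centroid = mean of the cluster's points.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: centroids unchanged
            break
        centroids = new_centroids
    return centroids, clusters
```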
The K-Means Clustering Method
• Example (k = 2):
[Figure: arbitrarily choose k objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; reassign the objects; repeat updating and reassigning until the means no longer change.]
Example Problem
• Cluster the following eight points (with (x, y)
representing locations) into three clusters: A1(2, 10),
A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4),
A7(1, 2), A8(4, 9).
– Assume that the initial cluster centers are: A1(2, 10),
A4(5, 8) and A7(1, 2).
• The distance function between two points a = (x1, y1)
and b = (x2, y2) is defined as:
dis(a, b) = |x2 – x1| + |y2 – y1|
• Use the k-means algorithm to find the optimal centroids that
group the given data into three clusters (a runnable check follows below).
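As a usage check, the kmeans sketch shown after the algorithm slide can be run on this data with the Manhattan distance and the stated initial centers (the variable names below are illustrative):

```python
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
initial = [(2, 10), (5, 8), (1, 2)]   # A1, A4, A7 as the initial centers

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

centroids, clusters = kmeans(points, initial, manhattan)
print(centroids)  # approximately [(3.67, 9.0), (7.0, 4.33), (1.5, 3.5)]
print(clusters)   # [[(2,10),(5,8),(4,9)], [(8,4),(7,5),(6,4)], [(2,5),(1,2)]]
```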
Iteration 1
First, we list all points in the first column of the table below. The initial
cluster centers (centroids) are (2, 10), (5, 8) and (1, 2), chosen
randomly.
(2,10) (5, 8) (1, 2)
Point Mean 1 Mean 2 Mean 3 Cluster
A1 (2, 10) 0 5 9 1
A2 (2, 5) 5 6 4 3
A3 (8, 4) 12 7 9 2
A4 (5, 8) 5 0 10 2
A5 (7, 5) 10 5 9 2
A6 (6, 4) 10 5 7 2
A7 (1, 2) 9 10 0 3
A8 (4, 9) 3 2 10 2
Next, we calculate the distance from each point to each of the
three centroids, using the distance function:
dis(point i, mean j) = |x2 – x1| + |y2 – y1|
Iteration 1
• Starting from point A1 calculate the distance to each of the three
means, by using the distance function:
dis (A1, mean1) = |2 – 2| + |10 – 10| = 0 + 0 = 0
dis(A1, mean2) = |5 – 2| + |8 – 10| = 3 + 2 = 5
dis(A1, mean3) = |1 – 2| + |2 – 10| = 1 + 8 = 9
– Fill these values in the table and decide in which cluster the point (2, 10)
should be placed: the one where the point has the shortest distance to
the mean, i.e., mean 1 (cluster 1), since the distance is 0.
• Next go to the second point A2 and calculate the distance:
dis(A2, mean1) = |2 – 2| + |10 – 5| = 0 + 5 = 5
dis(A2, mean2) = |5 – 2| + |8 – 5| = 3 + 3 = 6
dis(A2, mean3) = |1 – 2| + |2 – 5| = 1 + 3 = 4
– So, we fill in these values in the table and assign the point (2, 5) to
cluster 3 since mean 3 is the shortest distance from A2.
• Analogously, we fill in the rest of the table and place each point in
one of the clusters.
Iteration 1
• Next, we need to re-compute the new cluster centers (means). We do
so, by taking the mean of all points in each cluster.
• For Cluster 1, we only have one point A1(2, 10), which was the old
mean, so the cluster center remains the same.
• For Cluster 2, we have five points and need to take the average of
them as the new centroid, i.e.
( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
• For Cluster 3, we have two points. The new centroid is:
( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
• That was Iteration 1 (epoch 1). Next, we go to Iteration 2 (epoch 2),
Iteration 3, and so on, until the centroids do not change anymore.
– In Iteration 2, we basically repeat the process from Iteration 1, this
time using the new means we computed.
Second epoch
• Using the new centroids, we compute the cluster members.
(2,10) (6, 6) (1.5, 3.5)
Point Mean 1 Mean 2 Mean 3 Cluster
A1 (2, 10) 0 8 7 1
A2 (2, 5) 5 5 2 3
A3 (8, 4) 12 4 7 2
A4 (5, 8) 5 3 8 2
A5 (7, 5) 10 2 7 2
A6 (6, 4) 10 2 5 2
A7 (1, 2) 9 9 2 3
A8 (4, 9) 3 5 8 1
• After the 2nd epoch the results would be:
cluster 1: {A1,A8} with new centroid=(3,9.5);
cluster 2: {A3,A4,A5,A6} with new centroid=(6.5,5.25);
cluster 3: {A2,A7} with new centroid=(1.5,3.5)
Third epoch
• Using the new centroids, we compute the cluster members.
            (3, 9.5)  (6.5, 5.25)  (1.5, 3.5)
Point       Mean 1    Mean 2       Mean 3     Cluster
A1 (2, 10)  1.5       9.25         7          1
A2 (2, 5)   5.5       4.75         2          3
A3 (8, 4)   10.5      2.75         7          2
A4 (5, 8)   3.5       4.25         8          1
A5 (7, 5)   8.5       0.75         7          2
A6 (6, 4)   8.5       1.75         5          2
A7 (1, 2)   9.5       8.75         2          3
A8 (4, 9)   1.5       6.25         8          1
• After the 3rd epoch the results would be:
cluster 1: {A1,A4,A8} with new centroid=(3.66,9);
cluster 2: {A3,A5,A6} with new centroid=(7,4.33);
cluster 3: {A2,A7} with new centroid=(1.5,3.5)
Fourth epoch
• Using the new centroids, we compute the cluster members.
            (3.66, 9)  (7, 4.33)  (1.5, 3.5)
Point       Mean 1     Mean 2     Mean 3     Cluster
A1 (2, 10)  2.66       10.67      7          1
A2 (2, 5)   5.66       5.67       2          3
A3 (8, 4)   9.34       1.33       7          2
A4 (5, 8)   2.34       5.67       8          1
A5 (7, 5)   7.34       0.67       7          2
A6 (6, 4)   7.34       1.33       5          2
A7 (1, 2)   9.66       8.33       2          3
A8 (4, 9)   0.34       7.67       8          1
• After the 4th epoch the clusters and centroids are unchanged:
cluster 1: {A1,A4,A8} with centroid=(3.66,9);
cluster 2: {A3,A5,A6} with centroid=(7,4.33);
cluster 3: {A2,A7} with centroid=(1.5,3.5)
Final results
• Finally, in the 4th epoch there is no change in the cluster
memberships or the centroids, so the algorithm stops.
• The result of the clustering is shown in the figure below.
[Figure: scatter plot of the three final clusters.]
Comments on the K-Means Method
• Strength: relatively efficient: O(tkn), where n is the number of objects,
k the number of clusters, and t the number of iterations. Normally, k, t << n.
• Weaknesses
– Applicable only when the mean is defined; what about
categorical data? Use hierarchical clustering.
– Need to specify k, the number of clusters, in advance.
– Unable to handle noisy data and outliers, since an object with
an extremely large value may substantially distort the
distribution of the data.
• K-Medoids: instead of taking the mean value of the objects in a
cluster as a reference point, a medoid can be used, which is the
most centrally located object in a cluster (see the sketch below).
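A minimal sketch of choosing a cluster's medoid, i.e., the object with the smallest total distance to the other objects in the cluster (illustrative code, not from the slides):

```python
def medoid(points, dist):
    # The medoid is the member with the smallest total distance
    # to all other members of the cluster.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

cluster = [(2, 10), (5, 8), (4, 9)]
print(medoid(cluster, manhattan))  # (4, 9)
```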
Hierarchical Clustering
• As compared to partitioning algorithms, in hierarchical
clustering the data are not partitioned into a particular
set of clusters in a single step; instead of the unstructured
set of clusters returned by partitioning clustering, a
hierarchy of nested clusters is produced.
– It can be visualized as a dendrogram: a tree-like diagram
that records the sequences of merges or splits (a SciPy
sketch follows below).
Dendrogram: Shows How the Clusters are Merged
[Figure: example dendrogram recording the merge order of five points, with the merge distance on the vertical axis.]
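A minimal sketch of producing such a dendrogram with SciPy and Matplotlib (assuming those libraries are available; the five sample points are illustrative):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Five illustrative 2-D points.
points = [(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]

# 'centroid' linkage merges the pair of clusters with the closest centroids.
Z = linkage(points, method="centroid")

dendrogram(Z, labels=["1", "2", "3", "4", "5"])
plt.ylabel("merge distance")
plt.show()
```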
Example of hierarchical clustering
Two main types of hierarchical clustering
• Agglomerative: it is a Bottom Up clustering technique
– Start with all sample units in n clusters of size 1.
– Then, at each step of the algorithm, the pair of clusters with the shortest distance
between them is combined into a single cluster.
– The algorithm stops when all sample units are grouped into one cluster of size n.
• Divisive: it is a Top Down clustering technique
– Start with all sample units in a single cluster of size n.
– Then, at each step of the algorithm, a cluster is partitioned into a pair of
daughter clusters, selected to maximize the distance between the two daughters.
– The algorithm stops when sample units are partitioned into n clusters of size 1.
[Figure: agglomerative clustering merges clusters a, b, c, d, e bottom-up (ab, de, cde, abcde) over steps 0–4, while divisive clustering splits abcde top-down in the reverse order.]
Agglomerative Clustering Algorithm
• The more popular hierarchical clustering technique
• Basic algorithm is straightforward
1. Let each data point be a cluster
2. Compute the proximity matrix
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
• The key operation is the computation of the proximity of two clusters (see the sketch below)
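Below is a minimal from-scratch sketch of centroid-based agglomerative clustering with the Manhattan distance (helper names are assumptions, not from the slides); it repeatedly merges the two clusters whose centroids are closest, as in the example that follows:

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def agglomerative(points, dist=manhattan):
    """Return the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]     # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the closest centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge cluster j into cluster i and drop j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# The five samples from the example that follows.
for a, b, d in agglomerative([(4, 4), (8, 4), (15, 8), (24, 4), (24, 12)]):
    print(a, "+", b, "distance", d)
```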
Example
• Perform an agglomerative clustering of five samples using
two features, X and Y. Calculate the Manhattan distance
between each pair of samples to measure their proximity.
Data item X Y
1 4 4
2 8 4
3 15 8
4 24 4
5 24 12
1 2 3 4 5
1 = (4,4) X 4 15 20 28
2= (8,4) X 11 16 24
3=(15,8) X 13 13
4=(24,4) X 8
5=(24,12) X
Proximity Matrix: second epoch
               {1, 2}    3      4      5
{1,2} = (6,4)    X      13     18     26
3 = (15,8)              X      13     13
4 = (24,4)                     X       8
5 = (24,12)                            X
Proximity Matrix: third epoch
               {1, 2}    3    {4, 5}
{1,2} = (6,4)    X      13     22
3 = (15,8)              X       9
{4,5} = (24,8)                  X
Proximity Matrix: fourth epoch
                 {1, 2}   {3, 4, 5}
{1,2} = (6,4)      X         19
{3,4,5} = (21,8)             X
After this merge all samples are in a single cluster, so the algorithm stops.
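For reference, running the agglomerative sketch given earlier on these five samples reproduces this merge sequence: {1, 2} at distance 4, {4, 5} at distance 8, then 3 joins {4, 5} at distance 9, and finally the two remaining clusters merge at distance 19.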
Assignment
• Single link agglomerative clustering (Abush)
• Complete link agglomerative clustering
(Desalegn)
• Average link agglomerative clustering
(Mohammed)
• Divisive clustering (Samuel & Getu)
• Expectation Maximization clustering (Kamal &
Ali)
• K-Medoid clustering (Sofonias & Yohannes)
Exercise: Hierarchical clustering
• Using the centroid method, apply the agglomerative
clustering algorithm to cluster the following 8
examples. Show the dendrogram.
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8),
A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).