Unit 7: Clustering
Clustering : Application
Clustering : Example
Similarity?
Clustering (Contd.)
Types of Clustering
▷ Hard Clustering:
○ Each record is in one and only one cluster
▷ Soft Clustering:
○ Each record has a probability of being in each cluster
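To make the distinction concrete, here is a minimal sketch assuming scikit-learn is available; the sample data and parameter choices are purely illustrative. K-means gives a hard assignment (one label per record), while a Gaussian mixture gives a per-cluster probability for each record.

```python
# Hard vs. soft clustering on toy 1-D data (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.array([[2.0], [3.0], [4.0], [10.0], [11.0], [12.0]])  # hypothetical records

# Hard clustering: each record gets exactly one cluster label.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(hard_labels)           # e.g. [0 0 0 1 1 1]

# Soft clustering: each record gets a probability of belonging to each cluster.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X))  # one row per record, rows sum to 1
```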
Types of Clustering Algorithms
▷ Hierarchical clustering
○ Divisive clustering (top down)
○ Agglomerative clustering (bottom up)
▷ Partitional clustering
○ K-means clustering
○ K-medoids clustering
○ EM (expectation maximization) clustering
▷ Density-based methods
○ Regions of dense points separated by sparser regions of relatively low density
Types of Clustering Algorithms
Bottom-up (agglomerative) approach:
1. Initially each item x1, . . . , xn is in its own cluster C1, . . . , Cn.
2. Repeat until there is only one cluster left: merge the nearest clusters, say Ci and Cj.
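A minimal from-scratch sketch of this bottom-up loop; the point set and the use of the minimum (single-link) distance between clusters are my own illustrative choices, not taken from the slides.

```python
# Naive agglomerative clustering: start with singleton clusters and repeatedly
# merge the nearest pair of clusters until only one cluster remains.
from itertools import combinations

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def cluster_dist(a, b):
    # distance between clusters = closest pair of points, one from each cluster
    return min(dist(p, q) for p in a for q in b)

points = [(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 9)]   # hypothetical items x1..xn
clusters = [[p] for p in points]                      # each item starts in its own cluster

while len(clusters) > 1:
    i, j = min(combinations(range(len(clusters)), 2),
               key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
    clusters[i] = clusters[i] + clusters[j]           # merge Cj into Ci
    del clusters[j]
    print(clusters)                                   # trace each merge step
```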
Agglomerative algorithm
Single link approach
3. Two clusters are merged if the minimum distance between any two points (one from each cluster) is <= the threshold distance:
   $d(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} d(x, y)$
4. Identify the pair of objects and merge them.
5. Compute all distances from the new cluster and update the distance matrix after the merger.
6. Continue the merging process until only one cluster is left, or stop merging once the distance between clusters is above the threshold.
▷ Single-linkage tends to produce stringy or elongated clusters; that is, clusters are chained together by single objects that happen to be close together.
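The same single-link procedure is available in SciPy; a small sketch, where the example points are hypothetical:

```python
# Single-linkage hierarchical clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

pts = np.array([[1, 1], [1.5, 1], [5, 5], [5, 6], [9, 9]])
Z = linkage(pts, method='single')   # pairwise Euclidean distances by default
print(Z)                            # each row: cluster i, cluster j, merge distance, cluster size
# dendrogram(Z)                     # plots the tree if matplotlib is available
```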
Single Link Example
Agglomerative algorithm
Complete link approach: two clusters are merged based on the maximum distance between any two points, one from each cluster:
   $d(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} d(x, y)$
4. Identify the pair of objects and merge them.
5. Compute all distances from the new cluster and update the distance matrix after the merger.
6. Continue the merging process until only one cluster is left, or stop merging once the distance between clusters is above the threshold.
▷ Clusters tend to be compact and roughly equal in diameter.
Complete Link Example
Agglomerative algorithm
Average link approach: the distance between two clusters is the average of all pairwise distances between their points:
   $d(c_i, c_j) = \dfrac{1}{|c_i|\,|c_j|} \sum_{x \in c_i} \sum_{x' \in c_j} d(x, x')$
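A small sketch contrasting the three cluster-distance definitions above (single, complete, and average link); the example clusters and the use of Euclidean point distance are illustrative choices.

```python
# Cluster-to-cluster distances under the three linkage criteria.
import math

def d(x, y):
    return math.dist(x, y)   # Euclidean distance between two points (Python 3.8+)

def single_link(ci, cj):
    return min(d(x, y) for x in ci for y in cj)

def complete_link(ci, cj):
    return max(d(x, y) for x in ci for y in cj)

def average_link(ci, cj):
    return sum(d(x, y) for x in ci for y in cj) / (len(ci) * len(cj))

ci = [(0, 0), (1, 0)]
cj = [(4, 0), (6, 0)]
print(single_link(ci, cj), complete_link(ci, cj), average_link(ci, cj))  # 3.0 6.0 4.5
```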
Dendrogram: Hierarchical Clustering
• Clustering obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
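Cutting the dendrogram at a chosen height can be done with SciPy's fcluster; a minimal sketch, where the data and the cut level are illustrative:

```python
# Cut a dendrogram at a desired level to obtain flat clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

pts = np.array([[1, 1], [1.5, 1], [5, 5], [5, 6], [9, 9]])
Z = linkage(pts, method='single')

labels = fcluster(Z, t=2.0, criterion='distance')  # cut all merges above height 2.0
print(labels)   # one cluster id per point: each connected component below the cut
```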
▷ Use single and complete link agglomerative clustering to group the data described by the following distance matrix. Show the dendrograms.
Single link dendrogram

After merging A and B:
       AB   C   D
  AB    0   2   5
  C         0   3
  D             0

After merging C with {A, B}:
        ABC   D
  ABC     0   3
  D           0
Complete link dendrogram

After the first merge:
       AB   C   D
  AB    0   4   6
  C         0   3
  D             0

After the second merge:
        ABC   D
  ABC     0   6
  D           0
Complete link: the distance between two clusters is the longest distance between a pair of elements from the two clusters.
▷ Use single and complete link agglomerative clustering to group the data described by the following distance matrix. Show the dendrograms.

   A  B  C  D  E
A  0  1  2  2  3
B  1  0  2  4  3
C  2  2  0  1  5
D  2  4  1  0  3
E  3  3  5  3  0
Single link

   A  B  C  D  E
A  0  1  2  2  3
B  1  0  2  4  3
C  2  2  0  1  5
D  2  4  1  0  3
E  3  3  5  3  0

After merging {A,B} and {C,D}:
       AB  CD   E
  AB    0
  CD    2   0
  E     3   3   0

After merging {A,B,C,D}:
        ABCD   E
  ABCD     0
  E        3   0

(Dendrogram: leaves A, B, C, D, E; merge heights 1, 2, 3.)
Complete Link

D   K   Clusters    Comment
3   1   {ABCDE}     Merge E
Complete link

   A  B  C  D  E
A  0  1  2  2  3
B  1  0  2  4  3
C  2  2  0  1  5
D  2  4  1  0  3
E  3  3  5  3  0

After merging {A,B} and {C,D}:
       AB  CD   E
  AB    0
  CD    4   0
  E     3   5   0

After merging {A,B,E}:
        ABE  CD
  ABE     0
  CD      5   0

(Dendrogram: leaf order A, B, E, C, D; merge heights 1, 3, 5.)
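The worked example above can be checked with SciPy by feeding it the distance matrix directly; a sketch (squareform converts the square matrix into the condensed form that linkage expects):

```python
# Verify the single- and complete-link merges for the 5-point distance matrix.
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

D = np.array([[0, 1, 2, 2, 3],
              [1, 0, 2, 4, 3],
              [2, 2, 0, 1, 5],
              [2, 4, 1, 0, 3],
              [3, 3, 5, 3, 0]], dtype=float)   # order: A, B, C, D, E

condensed = squareform(D)                      # condensed (upper-triangle) form
print(linkage(condensed, method='single'))     # merges at heights 1, 1, 2, 3
print(linkage(condensed, method='complete'))   # merges at heights 1, 1, 3, 5
```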
Average link
Average link cluster
Merge the clusters
Agglomerative algorithm
NOTE:
Start with:
▷ Initial number of clusters = n.
▷ Initial value of the threshold distance d = 0.
▷ Increment d in each iteration.
▷ Check the condition (min dist <= d) for merging.
▷ Continue until you get one cluster.
Weakness of Agglomerative Clustering
Hierarchical Clustering algorithms
▷ Divisive (top-down):
○ Generate a spanning tree (no loops in the graph).
○ Start with all documents belonging to the same cluster.
○ Eventually each node forms a cluster on its own.
▷ Does not require the number of clusters k in advance.
▷ Needs a termination/readout condition.
○ The final state in both agglomerative and divisive clustering is of no use on its own.
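A minimal sketch of the spanning-tree idea: build a minimum spanning tree over the points, then cut its longest edges so that each removal splits one cluster into two. The data and the number of cuts are illustrative, and this uses SciPy routines rather than any specific procedure from the slides.

```python
# Divisive-style clustering via a minimum spanning tree:
# cut the longest MST edges; the remaining connected components are the clusters.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

pts = np.array([[1, 1], [1.5, 1], [5, 5], [5, 6], [9, 9]], dtype=float)
D = cdist(pts, pts)                        # full pairwise distance matrix

mst = minimum_spanning_tree(csr_matrix(D)).toarray()
k = 3                                       # desired number of clusters (illustrative)
cut = np.sort(mst[mst > 0])[-(k - 1):].min()
mst[mst >= cut] = 0                         # remove the k-1 longest MST edges

n_comp, labels = connected_components(csr_matrix(mst), directed=False)
print(labels)                               # cluster id per point, e.g. [0 0 1 1 2]
```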
Overview
Hierarchical Clustering
▷ Split each cluster into two sub-clusters, until each leaf cluster contains only one object.
(Figure: top-down splitting of abcde, with intermediate clusters cde and de and leaves a, b, c, d, e.)
Overview (contd.)
• Max d(E,D) = 3, so split into {E}, {ABCD}.
• Max d(B,C) = 2, so split into {E}, {AB}, {CD}.
• Finally {A}, {B}, {C}, {D}, {E}.
(Figure: spanning tree over the points with edge weights 1, 2, 3.)
Types of Clustering Algorithms
Partitioning Algorithms
K-Means Clustering (contd.)
▷ Example
(Figure: four scatter plots over the range 0 to 10 on both axes, illustrating successive k-means iterations.)
Example:
▷ Initial clusters (random centroid or average): k = 2
▷ c1 = 16, c2 = 22
New centroids:
Iteration 1: c1 = 15.33, c2 = 36.25
Iteration 2: c1 = 18.56, c2 = 45.90
Iteration 3: c1 = 19.50, c2 = 47.89
Iteration 4: c1 = 19.50, c2 = 47.89 (unchanged, so the algorithm has converged)
Do it yourself
x    m1=3   m2=4   |x-m1|   |x-m2|   cluster   new centroid
2     3      4       1        2         1
3     3      4       0        1         1        c1 = 2.5
4     3      4       1        0         2
10    3      4       7        6         2
11    3      4       8        7         2
12    3      4       9        8         2
20    3      4      17       16         2
25    3      4      22       21         2
30    3      4      27       26         2        c2 = 16
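A short sketch that reproduces this table and keeps iterating until the centroids stop changing; it starts from m1 = 3 and m2 = 4 as in the table and uses the absolute difference as the 1-D distance (no empty-cluster handling, which is fine for this data).

```python
# 1-D k-means with k = 2, starting from m1 = 3 and m2 = 4.
data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
centroids = [3.0, 4.0]

while True:
    # assignment step: each point goes to the nearest centroid
    clusters = [[], []]
    for x in data:
        idx = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[idx].append(x)
    # update step: each centroid becomes the mean of its cluster
    new_centroids = [sum(c) / len(c) for c in clusters]
    print(clusters, new_centroids)      # first iteration prints centroids 2.5 and 16
    if new_centroids == centroids:      # converged: assignments no longer change
        break
    centroids = new_centroids
```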
Measurement metric
Euclidean distance: $\sqrt{\sum_i (y_i - x_i)^2}$
X = (2, 0), Y = (-2, -2)
$\sqrt{4^2 + 2^2} = \sqrt{20} = 4.47$
Squared Euclidean distance: $\sum_i (y_i - x_i)^2$
X = (2, 0), Y = (-2, -2)
$(-2 - 2)^2 + (-2 - 0)^2 = 4^2 + 2^2 = 20$
Squared Euclidean is a little better at "noticing" strays, since squaring magnifies large coordinate differences.
Manhattan distance or city block: $\sum_i |y_i - x_i|$
X = (2, 0), Y = (-2, -2)
$|-2 - 2| + |-2 - 0| = 4 + 2 = 6$
Chebyshev metric
Chebyshev distance: $\max_i |y_i - x_i|$
X = (2, 0), Y = (-2, -2)
$|-2 - 2| = 4$, $|-2 - 0| = 2$
$\max(4, 2) = 4$
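A small sketch computing all four metrics above for X = (2, 0) and Y = (-2, -2):

```python
# Euclidean, squared Euclidean, Manhattan and Chebyshev distances for two points.
X = (2, 0)
Y = (-2, -2)

diffs = [abs(y - x) for x, y in zip(X, Y)]        # |y_i - x_i| per coordinate: [4, 2]

squared_euclidean = sum(d ** 2 for d in diffs)    # 20
euclidean = squared_euclidean ** 0.5              # 4.47
manhattan = sum(diffs)                            # 6
chebyshev = max(diffs)                            # 4

print(euclidean, squared_euclidean, manhattan, chebyshev)
```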
A simple example showing the implementation of the k-means algorithm (using K = 2). Use the Euclidean distance metric.
▷ A simple example showing the implementation of the k-means algorithm in two dimensions (using K = 3). Use the Manhattan distance metric.

     X   Y
1    2  10
2    2   5
3    8   4
4    5   8
5    7   5
6    6   4
7    1   2
8    4   9

Initial centroids: c1 = (2, 10), c2 = (5, 8), c3 = (1, 2)
K-means

            Iteration 1                              Iteration 2
     X   Y   c1(2,10)  c2(5,8)  c3(1,2)  Cluster     c1(2,10)  c2(6,6)  c3(1.5,3.5)  Cluster
A1   2  10       0        5        9       C1            0        8         7          C1
A2   2   5       5        6        4       C3            5        5         2          C3
A3   8   4      12        7        9       C2           12        4         7          C2
A4   5   8       5        0       10       C2            5        3         8          C2
A5   7   5      10        5        9       C2           10        2         7          C2
A6   6   4      10        5        7       C2           10        2         5          C2
A7   1   2       9       10        0       C3            9        9         2          C3
A8   4   9       3        2       10       C2            3        5         8          C1
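A sketch reproducing the table above: k-means with K = 3 on points A1..A8, Manhattan distance for the assignment step, and the mean of each cluster's members as the new centroid. The cap of four iterations is an illustrative choice.

```python
# Two-dimensional k-means (K = 3) with Manhattan distance, as in the worked example.
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centroids = [(2, 10), (5, 8), (1, 2)]            # initial c1, c2, c3

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

for iteration in range(1, 5):
    # assignment step: each point goes to the centroid with smallest Manhattan distance
    assign = {name: min(range(3), key=lambda i: manhattan(p, centroids[i]))
              for name, p in points.items()}
    # update step: centroid of each cluster is the mean of its members
    new_centroids = []
    for i in range(3):
        members = [points[n] for n, c in assign.items() if c == i]
        new_centroids.append((sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members)))
    print(iteration, assign, new_centroids)       # iteration 1 gives c2 = (6, 6), c3 = (1.5, 3.5)
    if new_centroids == centroids:
        break
    centroids = new_centroids
```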
EM Model
Soft Clustering
Example
EM Example
Step 1
▷ Suppose θa = 0.6, θb = 0.5
Step 2
▷ H: 5, T: 5
Step 3: compute the likelihood
Step 4
Step 5
Step 6
Step 7
Step 8
EM Example (Do it yourself)
Step 1
▷ Suppose θa = 0.6, θb = 0.5
Step 2
▷ H: 3, T: 2
Step 3: compute the likelihood
First round = 3 heads and 2 tails
Step 4
Coin A (Head)   Coin A (Tail)   Coin B (Head)   Coin B (Tail)
   2.6255            0              2.374            0
▷ Calculate the new probability of getting a head for C1 and C2 (the two coins).
▷ This is the maximization of the parameter θ for each coin.
▷ Θa = 9.44 / (9.44 + 2.67) ≈ 0.78
▷ Θb = 8.5456 / (8.5456 + 3.32) ≈ 0.72
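A compact sketch of the E-step and M-step for the two-coin problem. The observed rounds below are illustrative, not the data from the slides; each round is a count of heads out of a fixed number of tosses, and θa, θb start at 0.6 and 0.5 as in Step 1.

```python
# EM for two coins: estimate each coin's heads probability from rounds of tosses
# when we do not know which coin produced which round.
from math import comb

rounds = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]   # (heads, tosses) per round, illustrative
theta_a, theta_b = 0.6, 0.5                              # initial guesses (Step 1)

def binom(h, n, p):
    return comb(n, h) * p**h * (1 - p)**(n - h)

for step in range(20):
    # E-step: responsibility of coin A for each round, given the current parameters
    heads_a = tails_a = heads_b = tails_b = 0.0
    for h, n in rounds:
        la, lb = binom(h, n, theta_a), binom(h, n, theta_b)
        pa = la / (la + lb)                              # P(coin A | this round)
        heads_a += pa * h;        tails_a += pa * (n - h)
        heads_b += (1 - pa) * h;  tails_b += (1 - pa) * (n - h)
    # M-step: new theta = expected heads / expected tosses attributed to each coin
    theta_a = heads_a / (heads_a + tails_a)
    theta_b = heads_b / (heads_b + tails_b)

print(round(theta_a, 2), round(theta_b, 2))
```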
Limitations