3 Clustering
Clustering
Unsupervised Algorithm
BMSCE - ME
MCL - Python
Can you group these items?
Distance Metrics
$$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$$
$$\text{Manhattan distance} = \sum_{i=1}^{k} |x_i - y_i|$$
Cosine distance
Edit distance
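The four metrics named on this slide can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names are chosen here for readability, cosine distance is taken as one minus cosine similarity, and edit distance is assumed to mean the Levenshtein distance (insertions, deletions, and substitutions each cost 1).

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def edit_distance(s, t):
    """Levenshtein distance between two strings (assumed meaning of 'edit distance')."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, `euclidean_distance((0, 0), (3, 4))` gives 5.0 while `manhattan_distance((0, 0), (3, 4))` gives 7, which shows how the two metrics can rank the same pair of points differently.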
Good Clustering Algorithm
1. The ability to discover some or all of the hidden clusters.
2. Within-cluster similarity and between-cluster dissimilarity.
3. Ability to deal with various types of attributes.
Error Metrics: Sum of squared errors
Cohesion
• Intra-cluster distance
• Sum of squared errors between all the points inside a cluster
• Minimum is better
Separation
• Inter-cluster distance
• Sum of squared errors between points in different clusters
• Maximum is better
• Cohesion : Minimize
• Separation : Maximize
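As a rough illustration of the two quantities, the sketch below uses one common centroid-based formulation, which is an assumption here since the slides do not give formulas: cohesion as the sum of squared distances from each point to its own cluster centroid, and separation as the size-weighted sum of squared distances from each cluster centroid to the overall mean. The helper names are hypothetical.

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cohesion(clusters):
    """Within-cluster sum of squared errors (lower is better)."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(squared_distance(p, c) for p in cluster)
    return total


def separation(clusters):
    """Between-cluster sum of squares (higher is better)."""
    all_points = [p for cluster in clusters for p in cluster]
    overall = centroid(all_points)
    return sum(len(cluster) * squared_distance(centroid(cluster), overall)
               for cluster in clusters)


clusters = [[(1, 1), (1, 3)], [(9, 1), (9, 3)]]
print(cohesion(clusters), separation(clusters))
```

With this formulation the two values add up to the total sum of squares of the data around its overall mean, so for a fixed data set lowering cohesion and raising separation are the same goal.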
Between-Cluster Distance Functions
• Single Linkage: distance between two clusters is defined as the shortest distance between two points, one in each cluster
• Complete Linkage: distance between two clusters is defined as the longest distance between two points, one in each cluster
• Average Linkage: distance between two clusters is defined as the average distance from each point in one cluster to every point in the other cluster
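The three linkage definitions translate almost word for word into code. The sketch below assumes Euclidean distance between points (the slides do not fix a metric) and uses `math.dist`, available in Python 3.8 and later; the function names are illustrative.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Shortest distance over all cross-cluster point pairs."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Longest distance over all cross-cluster point pairs."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean distance over all cross-cluster point pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


a = [(0, 0), (1, 0)]
b = [(3, 0), (5, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
```

For these two clusters the cross-cluster distances are 3, 5, 2, and 4, so single linkage picks 2, complete linkage picks 5, and average linkage gives 3.5.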
Hierarchical Clustering
Divisive Clustering
Agglomerative Clustering
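Agglomerative approach (Hierarchical)
[Figure: dendrogram over the points A, B, C, D, E, F, G, H, read bottom-up. Clusters shown: {A, D, F} and {B, C}; then {A, D, F, H} and {B, C, G}; finally, together with E, the single cluster {A, B, C, D, E, F, G, H}.]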
01 N objects given
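The bottom-up idea of the agglomerative approach can be sketched as follows. This is an illustration under stated assumptions rather than the course's own procedure: every object starts as its own cluster, the two closest clusters are merged repeatedly, closeness is measured with single linkage on Euclidean distance, and the hypothetical `n_clusters` parameter lets the merging stop early.

```python
import math
from itertools import combinations, product


def single_linkage(a, b):
    """Shortest distance between a point of a and a point of b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start: one cluster per object
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(agglomerative(points, n_clusters=2))
```

Swapping `single_linkage` for a complete or average linkage function changes which clusters count as closest, and can therefore change the shape of the resulting hierarchy.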
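Divisive approach (Hierarchical)
[Figure: dendrogram over the points A, B, C, D, E, F, G, H, read top-down. Clusters shown: the single cluster {A, B, C, D, E, F, G, H}; then {A, D, F, H}, {B, C, G}, and E; then {A, D, F} and {B, C}; finally the individual points.]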
Flow of Divisive Method
01 N objects given
02 Assign all the points to a single cluster
03 Partition the cluster into two clusters with maximum distance between them (or two clusters which are least similar)
04 Repeat step 03 until every cluster contains a single point (N clusters)
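The divisive flow above can be sketched as follows. The slides do not say how the two-way partition is found, so the splitting rule here is an assumption made for illustration: the two farthest-apart points of a cluster act as seeds, every other point joins the nearer seed, and the cluster with the largest diameter is split first. The names `split`, `diameter`, and `divisive` are hypothetical.

```python
import math
from itertools import combinations


def split(cluster):
    """Split a cluster in two around its two farthest-apart points (assumed rule)."""
    i, j = max(
        combinations(range(len(cluster)), 2),
        key=lambda ij: math.dist(cluster[ij[0]], cluster[ij[1]]),
    )
    left, right = [cluster[i]], [cluster[j]]
    for k, p in enumerate(cluster):
        if k in (i, j):
            continue
        # Each remaining point joins the nearer seed (ties go left).
        if math.dist(p, cluster[i]) <= math.dist(p, cluster[j]):
            left.append(p)
        else:
            right.append(p)
    return left, right


def diameter(cluster):
    """Largest pairwise distance inside a cluster."""
    return max(math.dist(p, q) for p, q in combinations(cluster, 2))


def divisive(points):
    """Return the list of partitions, from one cluster down to N singletons."""
    clusters = [list(points)]  # step 02: everything in one cluster
    history = [[list(c) for c in clusters]]
    while any(len(c) > 1 for c in clusters):  # step 04: repeat until singletons
        candidates = [c for c in clusters if len(c) > 1]
        widest = max(candidates, key=diameter)
        clusters.remove(widest)
        clusters.extend(split(widest))  # step 03: two-way partition
        history.append([list(c) for c in clusters])
    return history


for level in divisive([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(level)
```

Each entry of the returned history is one level of the hierarchy, so cutting the list at a given index gives the partition with that many clusters plus one.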
K-means Clustering
Partition n objects into k clusters
01 Objective is to partition n given points into k non-empty clusters
K-means Algorithm
• Randomly initialize k points (centroids)
• Assign each object to the cluster of the nearest seed point measured with a specific distance metric
• Compute new centroids of the clusters of the current partition (the centroid is the centre, i.e., mean point, of the cluster)
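The three steps above can be sketched in plain Python. Several details are assumptions not stated on the slide: the initial centroids are drawn from the data points themselves, the distance metric is Euclidean, the assign and update steps repeat until the centroids stop changing (or a hypothetical `max_iter` limit is reached), and a cluster that ends up empty keeps its previous centroid.

```python
import math
import random


def mean_point(points):
    """Centre (coordinate-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: random initial centroids
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Because the starting centroids are random, different seeds can produce different results on less clearly separated data, which is why k-means is often run several times and the partition with the lowest within-cluster sum of squared errors is kept.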
How many clusters in k-means?
Clustering Demos
Agglomerative Clustering
https://youtu.be/XJ3194AmH40?t=4m47s
K-Means Algorithm
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/