Chapter 8: Advanced Cluster Analysis
Outline
Prototype-based
– Fuzzy c-means
– Mixture Model Clustering
– Self-Organizing Maps
Density-based
– Grid-based clustering
– Subspace clustering: CLIQUE
– Kernel-based: DENCLUE
Graph-based
– Chameleon
– Jarvis-Patrick
– Shared Nearest Neighbor (SNN)
Characteristics of Clustering Algorithms
Hard (Crisp) vs Soft (Fuzzy) Clustering
[Figure: a single point x at 2.5, between centroids c1 = 1 and c2 = 5]
Fuzzy C-means
p: fuzzifier (p > 1)
Objective function
$$\mathrm{SSE} = \sum_{j=1}^{k}\sum_{i=1}^{m} w_{ij}^{\,p}\,\mathrm{dist}(\boldsymbol{x}_i, \boldsymbol{c}_j)^2, \qquad \text{subject to } \sum_{j=1}^{k} w_{ij} = 1 \text{ for each point } \boldsymbol{x}_i$$
Bezdek, James C. Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, 1981.
Fuzzy C-means
[Figure: the same example, a point x at 2.5 between centroids c1 = 1 and c2 = 5]
SSE(x) has a minimum value of 1.654 when wx1 = 0.74 and wx2 = 0.26
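These weights follow from minimizing SSE(x) subject to $w_{x1} + w_{x2} = 1$; a short derivation, assuming fuzzifier p = 2:

$$\mathrm{SSE}(x) = w_{x1}^2\,(2.5-1)^2 + w_{x2}^2\,(2.5-5)^2 = 2.25\,w_{x1}^2 + 6.25\,w_{x2}^2$$

Substituting $w_{x2} = 1 - w_{x1}$ and setting the derivative to zero gives $4.5\,w_{x1} - 12.5\,(1 - w_{x1}) = 0$, so $w_{x1} = 12.5/17 \approx 0.74$, $w_{x2} \approx 0.26$, and $\mathrm{SSE}(x) \approx 1.654$.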
Fuzzy C-means
Objective function:
$$\mathrm{SSE} = \sum_{j=1}^{k}\sum_{i=1}^{m} w_{ij}^{\,p}\,\mathrm{dist}(\boldsymbol{x}_i, \boldsymbol{c}_j)^2, \qquad \sum_{j=1}^{k} w_{ij} = 1$$
Initialize a fuzzy pseudo-partition (the weights $w_{ij}$)
Repeat:
– Update centroids: $\boldsymbol{c}_j = \sum_{i=1}^{m} w_{ij}^{\,p}\,\boldsymbol{x}_i \big/ \sum_{i=1}^{m} w_{ij}^{\,p}$
– Update weights: $w_{ij} = \bigl(1/\mathrm{dist}(\boldsymbol{x}_i, \boldsymbol{c}_j)^2\bigr)^{\frac{1}{p-1}} \big/ \sum_{q=1}^{k} \bigl(1/\mathrm{dist}(\boldsymbol{x}_i, \boldsymbol{c}_q)^2\bigr)^{\frac{1}{p-1}}$
Until the centroids do not change
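A minimal NumPy sketch of these two updates; the function and parameter names are my own, not from the text:

```python
import numpy as np

def fuzzy_c_means(X, k, p=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: alternate the centroid and weight updates above."""
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), k))
    W /= W.sum(axis=1, keepdims=True)   # fuzzy pseudo-partition: rows sum to 1
    for _ in range(n_iter):
        Wp = W ** p
        # Update centroids: c_j = sum_i w_ij^p x_i / sum_i w_ij^p
        C = (Wp.T @ X) / Wp.sum(axis=0)[:, None]
        # Update weights from squared distances to each centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        inv = (1.0 / np.maximum(d2, 1e-12)) ** (1.0 / (p - 1))
        W = inv / inv.sum(axis=1, keepdims=True)
    return C, W
```

With p = 2 this reproduces the example above: for x = 2.5 and fixed centroids 1 and 5, a single weight update yields w ≈ (0.74, 0.26).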
[Figure: fuzzy c-means clustering of two-dimensional sample data; each point is shaded by its maximum membership weight (scale 0.5 to 0.95)]
An Example Application: Image Segmentation
Probabilistic Clustering: Example
Probabilistic Clustering: Updating Centroids
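The centroid update has the same form as the fuzzy c-means update, with posterior probabilities in place of membership weights. A minimal sketch of both EM steps, assuming spherical unit-variance Gaussians and equal priors (a simplification; the full algorithm also updates covariances and priors, and the names here are illustrative):

```python
import numpy as np

def em_centroids(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]   # initial centroids
    for _ in range(n_iter):
        # E-step: P(cluster j | x_i) under unit-variance Gaussians, equal priors
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        R = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)))  # stabilized
        R /= R.sum(axis=1, keepdims=True)
        # M-step: each centroid is the probability-weighted mean of all points
        C = (R.T @ X) / R.sum(axis=0)[:, None]
    return C
```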
Probabilistic Clustering Applied to Sample Data
[Figure: mixture-model (EM) clustering of the same sample data; each point is shaded by its maximum cluster probability (scale 0.5 to 0.95)]
Problems with EM
Alternatives to EM
Other approaches
SOM: Self-Organizing Maps
SOM Clusters of LA Times Document Data
Issues with SOM
Grid-based Clustering
Subspace Clustering
Clusters in subspaces
CLIQUE: A Subspace Clustering Algorithm
CLIQUE Algorithm
Limitations of CLIQUE
DENCLUE (DENsity-based CLUstEring)
DENCLUE Algorithm
Find the density function
Identify local maxima (density attractors)
Assign each point to the density attractor
– Follow direction of maximum increase in density
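A minimal sketch of the hill-climbing step above with a Gaussian kernel; the bandwidth h and the mean-shift-style update (whose step always points in the direction of maximum density increase) are my own illustrative choices:

```python
import numpy as np

def climb_to_attractor(p, X, h=0.5, tol=1e-5, max_steps=200):
    """Hill-climb from p to a local maximum (density attractor) of a
    Gaussian kernel density estimate over the data X."""
    for _ in range(max_steps):
        w = np.exp(-((X - p) ** 2).sum(axis=1) / (2 * h * h))
        new_p = (w[:, None] * X).sum(axis=0) / w.sum()   # mean-shift step
        if np.linalg.norm(new_p - p) < tol:              # converged: attractor
            return new_p
        p = new_p
    return p

# Points whose climbs end at (numerically) the same attractor form one
# cluster; attractors with low density can be discarded as noise.
```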
Graph-Based Clustering: Chameleon
Graph-Based Clustering: Sparsification …
– GROUP-AVERAGE: merge two clusters based on their average connectivity
Limitations of Current Merging Schemes
[Figure: four data sets, (a)–(d), illustrating cases that these merging schemes handle poorly]
Relative Interconnectivity
Relative Closeness
Chameleon: Steps
Preprocessing Step:
Represent the data by a graph
– Given a set of points, construct the k-nearest-neighbor (k-NN) graph to capture the relationship between a point and its k nearest neighbors
– The concept of neighborhood is captured dynamically (even if the region is sparse)
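A sketch of this preprocessing step with scikit-learn; the later phases, in which Chameleon partitions this graph with a min-cut package such as hMETIS and then merges partitions, are not shown:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.random((200, 2))                    # toy two-dimensional points

# Sparse weighted k-NN graph: each point is connected to its k nearest
# neighbors, with edge weights equal to the distances.
G = kneighbors_graph(X, n_neighbors=10, mode="distance")
G = G.maximum(G.T)   # symmetrize: keep an edge if either endpoint has it
```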
Chameleon: Steps …
Experimental Results: CHAMELEON
Experimental Results: CURE (15 clusters)
Experimental Results: CURE (9 clusters)
Experimental Results: CHAMELEON
Spectral Clustering
Clustering via Spectral Graph Partitioning …
Spectral Graph Clustering Algorithm
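A compact sketch of the usual pipeline: build a similarity graph, take the eigenvectors of the k smallest eigenvalues of its Laplacian as an embedding, and run k-means on that embedding. The Gaussian similarity and sigma are illustrative choices, not the only options:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))  # similarity graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W              # unnormalized graph Laplacian
    # Eigenvectors of the k smallest eigenvalues give a k-dimensional embedding
    _, U = eigh(L, subset_by_index=[0, k - 1])
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```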
Strengths and Limitations
Graph-Based Clustering: SNN Approach
[Figure: points i and j each link to their nearest neighbors; the SNN edge between them has weight 4, the number of neighbors they share]
If two points are similar to many of the same points, then they are likely similar to one another, even if a direct measurement of similarity does not indicate this.
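A sketch of computing SNN similarity; the neighborhood size k and the function name are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_similarity(X, k=10):
    """SNN similarity of i and j = number of shared k-nearest neighbors."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]  # drop self
    A = np.zeros((len(X), len(X)), dtype=int)
    np.put_along_axis(A, idx, 1, axis=1)   # A[i, j] = 1 if j is a k-NN of i
    return A @ A.T                         # (i, j) entry counts shared neighbors
```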
Jarvis-Patrick Clustering
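Jarvis-Patrick links two points if each appears in the other's k-nearest-neighbor list and they share at least kt of those neighbors; clusters are the connected components of the resulting graph. A sketch under those rules (the parameter values are illustrative):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors

def jarvis_patrick(X, k=10, kt=4):
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X) \
              .kneighbors(X, return_distance=False)[:, 1:]
    A = np.zeros((len(X), len(X)), dtype=int)
    np.put_along_axis(A, idx, 1, axis=1)
    shared = A @ A.T                  # number of shared k-nearest neighbors
    mutual = (A * A.T).astype(bool)   # in each other's k-NN lists
    edges = (shared >= kt) & mutual   # Jarvis-Patrick linking rule
    return connected_components(edges, directed=False)[1]
```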
When Jarvis-Patrick Does NOT Work Well
Combines:
– Graph-based clustering (similarity defined by the number of shared nearest neighbors)
– Density-based clustering (a DBSCAN-like approach)
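One way to realize this combination is to convert SNN similarity into a distance and hand it to DBSCAN. A sketch (k, eps, and min_samples are illustrative; the published SNN algorithm defines density directly from the SNN graph rather than through scikit-learn):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def snn_dbscan(X, k=10, eps=0.5, min_samples=5):
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X) \
              .kneighbors(X, return_distance=False)[:, 1:]
    A = np.zeros((len(X), len(X)), dtype=int)
    np.put_along_axis(A, idx, 1, axis=1)
    shared = A @ A.T                     # SNN similarity, in [0, k]
    dist = 1.0 - shared / k              # convert similarity to a distance
    np.fill_diagonal(dist, 0.0)
    return DBSCAN(eps=eps, min_samples=min_samples,
                  metric="precomputed").fit_predict(dist)
```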
SNN Clustering Algorithm
SNN Density
SNN Clustering Can Handle Other Difficult Situations
[Figure: latitude–longitude map of SNN clusters found in Earth science data]
SST Clusters that Correspond to El Nino Climate Indices
Steinbach, et al (KDD 2003)
[Figure: latitude–longitude map; SNN clusters 75, 78, 67, and 94 closely match the El Nino regions defined by Earth scientists]
[Figure: two latitude–longitude maps of numbered SNN clusters found in sea level pressure (SLP) data]
Pairs of SLP Clusters that Correspond to El-Nino SOI
[Figure: time series for 1982–1994. Left: centroids of SLP clusters 15 and 20 versus the SOI. Right: the difference of cluster centroids 20 and 15 tracks the SOI (corr = 0.78)]
Characteristics of Data, Clusters, and Clustering Algorithms
A cluster analysis is affected by characteristics of
– Data
– Clusters
– Clustering algorithms
Characteristics of Data
High dimensionality
– Dimensionality reduction
Types of attributes
– Binary, discrete, continuous, asymmetric
– Mixed attribute types (e.g., some are continuous, others nominal)
Differences in attribute scales
– Normalization techniques
Size of data set
Noise and Outliers
Properties of the data space
– Can you define a meaningful centroid or a meaningful notion of density?
Characteristics of Clusters
Data distribution
– Parametric models
Shape
– Globular or arbitrary shape
Differing sizes
Differing densities
Level of separation among clusters
Relationship among clusters
Subspace clusters
Characteristics of Clustering Algorithms
Order dependence
Non-determinism
Scalability
Number of parameters
Which Clustering Algorithm?
Type of Clustering
– Taxonomy vs flat
Type of Cluster
– Prototype vs connected regions vs density-based
Characteristics of Clusters
– Subspace clusters, spatial inter-relationships
Comparison of MIN and EM-Clustering
MIN can handle outliers, but noise can join clusters; EM clustering can tolerate noise, but can be strongly affected by outliers.
EM can only be applied to data for which a centroid is meaningful; MIN only requires a meaningful definition of proximity.
EM will have trouble as dimensionality increases, since the number of its parameters (the number of entries in the covariance matrix) grows as the square of the number of dimensions; MIN can work well with a suitable definition of proximity.
EM is designed for Euclidean data, although versions of EM clustering have been developed for other types of data; MIN is shielded from the data type by the fact that it uses a similarity matrix.
MIN makes no distribution assumptions; the version of EM we are considering assumes Gaussian distributions.
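A small experiment that illustrates the shape-related points above, using scikit-learn's single-link agglomerative clustering for MIN and a Gaussian mixture for EM (the dataset is an illustrative choice):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# MIN (single link) follows the two non-globular moon shapes.
min_labels = AgglomerativeClustering(n_clusters=2,
                                     linkage="single").fit_predict(X)

# EM assumes Gaussian clusters and instead splits the moons along
# an elliptical boundary.
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
```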
Comparison of DBSCAN and K-means