CLIQUE Algorithm Grid-Based Subspace Clustering
○ Irrelevance of distances
○ Local feature relevance: different features, or different correlations of features, may be relevant
for different clusters
Apriori principle: If a collection of points S is a cluster in a k-dimensional space, then S is also part
of a cluster in every (k-1)-dimensional projection of this space
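The Apriori principle is what makes a bottom-up search practical: a k-dimensional unit can only be dense if all of its (k-1)-dimensional projections are dense, so candidates can be built from the dense units of the previous level and pruned before any points are counted. The following is a minimal sketch of that candidate-generation step, not the original implementation. The function name generate_candidates and the representation of a unit as a frozenset of (dimension, interval_index) pairs are assumptions made for illustration.

```python
from itertools import combinations


def generate_candidates(dense_units):
    """Join dense (k-1)-dimensional units into k-dimensional candidates.

    Each unit is a frozenset of (dimension, interval_index) pairs
    (a hypothetical representation chosen for this sketch).
    """
    dense_units = set(dense_units)
    candidates = set()
    for a, b in combinations(dense_units, 2):
        merged = a | b
        # A valid join adds exactly one new (dimension, interval) pair.
        if len(merged) != len(a) + 1:
            continue
        # Each dimension may appear at most once in a unit.
        if len({dim for dim, _ in merged}) != len(merged):
            continue
        # Apriori pruning: every (k-1)-dimensional projection must be dense.
        if all(merged - {pair} in dense_units for pair in merged):
            candidates.add(merged)
    return candidates
```

A candidate that survives pruning is not yet known to be dense; its point count still has to be checked against the density threshold in a pass over the data.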
Major Steps of the CLIQUE Algorithm
● Identify subspaces that contain clusters
○ Partition the data space and find the number of points that lie inside each cell of the partition
○ Identify the subspaces that contain clusters using the Apriori principle
● Identify clusters
○ Determine dense units in all subspaces of interest
○ Determine connected dense units in all subspaces of interest
● Generate minimal descriptions for the clusters
○ For each cluster, determine the maximal regions that cover its connected dense units
○ Determine minimal cover for each cluster
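The first two steps above can be sketched for a single, fixed subspace as follows. This is an illustrative sketch under stated assumptions, not the published implementation: the names dense_units and connected_clusters are hypothetical, xi (number of intervals per dimension) and tau (density threshold as a fraction of all points) follow the parameter names commonly used for CLIQUE, and all values are assumed to lie in [lo, hi]. Generating the minimal cluster descriptions is not covered.

```python
from collections import Counter, deque


def dense_units(points, dims, xi, tau, lo=0.0, hi=1.0):
    """Return the grid cells of subspace `dims` holding more than tau * n points."""
    width = (hi - lo) / xi
    counts = Counter()
    for p in points:
        # Map each selected coordinate to its interval index (clamped at the top edge).
        cell = tuple(min(int((p[d] - lo) / width), xi - 1) for d in dims)
        counts[cell] += 1
    threshold = tau * len(points)
    return {cell for cell, c in counts.items() if c > threshold}


def connected_clusters(units):
    """Group dense units that share a common face into clusters (breadth-first search)."""
    remaining = set(units)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            # Neighbours differ by exactly one step along exactly one axis.
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters
```

For example, connected_clusters(dense_units(points, dims=(0, 1), xi=4, tau=0.2)) returns one set of grid cells per cluster in the subspace spanned by dimensions 0 and 1. A full run would repeat this for every subspace that survives the Apriori pruning.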
Comments on CLIQUE
● Strengths
○ Automatically finds subspaces of the highest dimensionality as long as high-density clusters exist in
those subspaces
○ Insensitive to the order of records in input and does not presume some canonical data distribution
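○ Scales linearly with the size of the input and scales well as the number of dimensions in the
data increases: O(C^k + mk), where C is a constant, k is the highest dimensionality of any dense
unit, and m is the number of input points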
○ Simple method and interpretability of results
● Weaknesses
○ As in all grid-based clustering approaches, the quality of the results crucially depends on the
appropriate choice of the number and width of the partitions and grid cells
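The sensitivity to the grid can be seen even in one dimension. The sketch below is illustrative only: the function dense_intervals, the parameters xi (number of intervals) and tau (density threshold as a fraction of all points), and the sample values are assumptions made for this example. It counts points per interval for the same data under three grid resolutions.

```python
from collections import Counter


def dense_intervals(values, xi, tau, lo=0.0, hi=1.0):
    """Return the sorted indices of intervals holding more than tau * n values."""
    width = (hi - lo) / xi
    counts = Counter(min(int((v - lo) / width), xi - 1) for v in values)
    return sorted(i for i, c in counts.items() if c > tau * len(values))


# Four values packed tightly around 0.5, plus two outliers.
values = [0.45, 0.48, 0.52, 0.55, 0.10, 0.90]

for xi in (2, 3, 10):
    print(xi, dense_intervals(values, xi=xi, tau=0.4))
```

With two intervals the whole range is reported as dense, with three intervals only the middle one is, and with ten intervals the group around 0.5 straddles a cell boundary and no interval reaches the threshold.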
References
● R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic Subspace Clustering of High
Dimensional Data for Data Mining Applications. SIGMOD’98
● Charu Aggarwal. An Introduction to Clustering Analysis. In Aggarwal and Reddy (eds.), Data
Clustering: Algorithms and Applications (Chapter 1). CRC Press, 2014
● Kriegel, H.-P., Kröger, P., & Zimek, A. (2009). Clustering high-dimensional data. ACM
Transactions on Knowledge Discovery from Data, 3(1), 1–58.
● Jiawei Han’s video on CLIQUE (extract of a Coursera/UIUC MOOC)
https://www.youtube.com/watch?v=QqkHPJxAXoE
● ELKI framework https://elki-project.github.io/