Graph based clustering
● When flying over a city, one can easily identify forests,
commercial areas, farmlands, riverbeds, etc. based on their
features, without any explicit training.
● Class labels of the data are unknown
● Given a set of data, the task is to establish the existence
of classes or clusters in the data
What is Cluster Analysis?
● Finding groups of objects such that the objects in a group are similar (or
related) to one another and different from (or unrelated to) the objects in
other groups
[Figure: two groups of points; intra-cluster distances are minimized, inter-cluster distances are maximized.]
Application 1: Market Segmentation
● A retail company may collect the following information on households:
• Household income
• Household size
• Occupation of the household’s head
• Distance from nearest urban area
● Identify the following clusters:
• Cluster 1: Small family, high spenders
• Cluster 2: Larger family, high spenders
• Cluster 3: Small family, low spenders
• Cluster 4: Large family, low spenders
• The company can then send personalized advertisements or sales letters to
each household based on how likely they are to respond to specific types of
advertisements.
Application 2: Document Clustering
● Document Clustering:
– Goal: To find groups of documents that are similar to each
other based on the important terms appearing in them.
● Summarization
– Reduce the size of large data sets
● In fact, clustering is one of the most utilized data
mining techniques.
– It has a long history and is used in almost every field, e.g.,
medicine, botany, sociology, biology, marketing,
insurance, libraries, etc.
What is not Clustering?
● Simple segmentation
– Dividing students into different registration groups alphabetically, by
last name
● Results of a query
– Groupings are a result of an external specification
– Clustering is a grouping of objects based on the data
● Supervised classification
– Have class label information
Notion of a Cluster can be Ambiguous
● A clustering algorithm
– Partitional clustering
– Hierarchical clustering
– Density based clustering
– Graph based clustering
– …
● A proximity (similarity, or dissimilarity) function
● Clustering quality
– Inter-cluster distance ⇒ maximized
– Intra-cluster distance ⇒ minimized
● The quality of a clustering result depends on the algorithm, the distance
function, and the application.
Proximity Measure
2. Euclidean Distance (L2 norm: 𝑟 = 2)
d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )
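A minimal sketch in Python; the general Minkowski form with parameter r is an assumption based on the "r = 2" note above:

import math

def minkowski(x, y, r=2):
    # r = 2 gives the Euclidean distance (L2 norm)
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

print(minkowski([0, 0], [3, 4]))  # 5.0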
● Quality of clustering:
– There is usually a separate "quality" function that measures the "goodness" of a cluster.
[Figure: graph-based clustering pipeline; the connected components of the graph become the final clusters.]
Some properties of a Graph
● Formally: G = (V, E, W)
○ V = non-empty set of vertices
○ E = subset of V × V: the edges, consisting of (ordered) pairs of vertices
○ W = set of distances/weights between pairs of vertices
● Directed or undirected
● Degree of a node
○ Number of edges incident on it
○ Undirected degree, in-degree, out-degree
Some properties of Graph
● Walk in a graph between nodes x and y:
○ x = v0 – v1 – v2 – v3 – … – v(t−1) – v(t) = y
○ There is an edge between every pair of consecutive nodes
○ Length of walk = number of hops = number of edges in the walk
● Closed walk: x = y
● Trail: a walk in which no edge is repeated
● Path: a walk in which no vertex is repeated (except possibly start and end)
● Closed path: start vertex = end vertex
● Cycle: a closed path with length ≥ 3
● Vertices x and y are connected: there is a path connecting x to y
● Connected graph: all vertex pairs are connected
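A short sketch checking connectivity with breadth-first search over an adjacency-list representation (the representation itself is introduced on the next slide):

from collections import deque

def is_connected(adj):
    # adj: dict mapping each vertex to the list of its neighbors
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # connected iff BFS from any vertex reaches all vertices
    return len(seen) == len(adj)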
Unweighted Graph Representation
[Figure: undirected graph on vertices A–F with edges A–B, A–E, B–E, C–D, C–F, D–F, E–F.]
● Adjacency matrix:
      A  B  C  D  E  F
   A  0  1  0  0  1  0
   B  1  0  0  0  1  0
   C  0  0  0  1  0  1
   D  0  0  1  0  0  1
   E  1  1  0  0  0  1
   F  0  0  1  1  1  0
● Edge list: (A, B), (A, E), (B, E), (C, D), (C, F), (D, F), (E, F)
● Node list:
   A: B, E
   B: E, A
   C: D, F
   D: C, F
   E: A, B, F
   F: C, D, E
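A minimal sketch building these representations from the edge list in Python:

edges = [("A","B"), ("A","E"), ("B","E"), ("C","D"), ("C","F"), ("D","F"), ("E","F")]
vertices = sorted({u for e in edges for u in e})

# adjacency matrix as a dict of dicts (0/1 entries)
adj_matrix = {u: {v: 0 for v in vertices} for u in vertices}
# node list (adjacency list)
node_list = {u: [] for u in vertices}
for u, v in edges:
    adj_matrix[u][v] = adj_matrix[v][u] = 1  # undirected: symmetric
    node_list[u].append(v)
    node_list[v].append(u)

print(node_list["E"])  # ['A', 'B', 'F']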
Weighted Graph Representation
[Figure: undirected weighted graph on vertices A–F with the edge weights listed below.]
● Adjacency matrix:
      A  B  C  D  E  F
   A  0  2  0  0  8  0
   B  2  0  0  0  1  0
   C  0  0  0  5  0  4
   D  0  0  5  0  0  3
   E  8  1  0  0  0  5
   F  0  0  4  3  5  0
● Edge list: (A, B, 2), (A, E, 8), (B, E, 1), (C, D, 5), (C, F, 4), (D, F, 3), (E, F, 5)
● Node list:
   A: (B, 2), (E, 8)
   B: (E, 1), (A, 2)
   C: (D, 5), (F, 4)
   D: (C, 5), (F, 3)
   E: (A, 8), (B, 1), (F, 5)
   F: (C, 4), (D, 3), (E, 5)
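The same construction extends to weights; a sketch storing the weighted node list as a dict of dicts:

weighted_edges = [("A","B",2), ("A","E",8), ("B","E",1),
                  ("C","D",5), ("C","F",4), ("D","F",3), ("E","F",5)]
W = {}  # weighted node list: vertex -> {neighbor: weight}
for u, v, w in weighted_edges:
    W.setdefault(u, {})[v] = w
    W.setdefault(v, {})[u] = w  # undirected: store both directions

print(W["E"])  # {'A': 8, 'B': 1, 'F': 5}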
Graph Representation
● Adjacency matrix:
[Figure: adjacency matrix plot. Image source: https://matthewlincoln.net/2014/12/20/adjacency-matrix-plots-with-r-and-ggplot2.html]
Relational Data
● Data not represented as graphs can be converted into graphs:
○ every data record = a node of the graph (d1, d2, d3, …)
○ every pair of nodes is connected by an edge: (d1, d2), …, (di, dj)
○ distance(di, dj) = weight of the edge (di, dj)
Commonly used graph models:
Fully connected graph: similarity between the points is decided by a kernel function,
e.g., the Gaussian kernel: s(xi, xj) = exp(−||xi − xj||² / (2σ²)),
where σ controls the sparsity of the graph.
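A minimal sketch of the Gaussian-kernel similarity matrix in numpy (the sample points and the σ value are arbitrary assumptions):

import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # pairwise squared Euclidean distances between rows of X
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
S = gaussian_similarity(X, sigma=1.0)  # entries near 1 for close points, near 0 for far ones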
Commonly used graph models:
K-round MST: similarity between the points is decided by the closeness of the data points. Let G = (V, E) be the complete weighted undirected graph of the dataset.
● The first-round MST of G, say K1, is computed.
● The consecutive MSTs are computed by removing the edges of the MSTs computed in the previous rounds, i.e., Ki is the MST of the graph Gi = (V, E − (E(K1) ∪ … ∪ E(Ki−1))).
● The K-round MST neighborhood graph is then defined as the union of these MSTs: GMST = (V, E(K1) ∪ E(K2) ∪ … ∪ E(KK)).
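A sketch of this construction with scipy (the helper name k_round_mst is an assumption; scipy returns a spanning forest if removing edges disconnects the graph):

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def k_round_mst(X, k):
    # complete weighted graph: pairwise Euclidean distances (assumes distinct points)
    D = squareform(pdist(X))
    union = np.zeros_like(D)
    for _ in range(k):
        T = minimum_spanning_tree(D).toarray()  # this round's MST, Ki
        mask = (T + T.T) > 0
        union[mask] = D[mask]                   # add Ki's edges to the union
        D[mask] = 0                             # remove them before the next round
    return union  # weighted adjacency matrix of the K-round MST graph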
Minimum spanning tree
Prim's Algorithm:
let T be a single vertex x
while (T has fewer than n vertices)
{
find the smallest edge connecting T to G-T
add it to T
}
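A runnable sketch of Prim's algorithm over the weighted node-list (dict of dicts) representation from earlier, using a lazy-deletion priority queue (a common idiom; not necessarily the variant in the cited lecture note):

import heapq

def prim(W, start):
    # W: vertex -> {neighbor: weight}; returns the MST as a list of edges (u, v, w)
    in_tree = {start}
    frontier = [(w, start, v) for v, w in W[start].items()]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(in_tree) < len(W):
        w, u, v = heapq.heappop(frontier)  # smallest edge leaving the tree
        if v in in_tree:
            continue                       # stale entry: v joined the tree earlier
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in W[v].items():
            if x not in in_tree:
                heapq.heappush(frontier, (wx, v, x))
    return mst

# e.g., prim(W, "A") on the weighted node list W built earlier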
Prim's Algorithm - Example
[Figure: a sequence of nine slides stepping through Prim's algorithm on a weighted example graph (edge weights between 1 and 50), growing the tree by the smallest frontier edge at each step until the MST is complete.]
Minimum spanning tree based clustering
Steps for MST based clustering (input dataset: X, number of clusters: K):
1. Build the complete weighted graph of X and compute its MST.
2. Remove the K − 1 edges of largest weight from the MST.
3. Report the resulting connected components as the K clusters.
U. Von. Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (4) (2007) 395–416.
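A compact sketch of these steps using scipy (the function name mst_clustering is hypothetical):

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_clustering(X, K):
    # 1. complete weighted graph + its MST
    D = squareform(pdist(X))
    T = minimum_spanning_tree(D).toarray()
    # 2. remove the K-1 heaviest MST edges
    edges = np.argwhere(T > 0)   # row-major order
    weights = T[T > 0]           # same order as `edges`
    for i, j in edges[np.argsort(weights)][len(weights) - (K - 1):]:
        T[i, j] = 0
    # 3. connected components = clusters
    _, labels = connected_components(T, directed=False)
    return labels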
Graph Cut:
I. Mincut problem: selecting the subsets A and B so as to minimize
cut(A, B) = Σ_{i ∈ A, j ∈ B} w_ij
[Figure: example graph on nodes u1–u7 with edge weights between 0.3 and 1.0.]
Example 1: Let A = {u1, u3, u4, u7}, B = {u2, u5, u6};
then cut(A, B) = 0.4 + 0.5 = 0.9
Example 2: A = {u1, u3, u4, u5, u6, u7}, B = {u2};
then cut(A, B) = 0.7
Drawback: mincut may select a single node as the one cluster.
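A tiny helper computing cut(A, B) from a weighted node list (the three-node graph below is made up for illustration, not the u1–u7 graph from the figure):

def cut(W, A, B):
    # sum of weights of edges with one endpoint in A and the other in B
    return sum(w for u in A for v, w in W[u].items() if v in B)

W = {"a": {"b": 0.4}, "b": {"a": 0.4, "c": 0.5}, "c": {"b": 0.5}}
print(cut(W, {"a", "b"}, {"c"}))  # 0.5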
Graph Cut:
II. RatioCut: finding balanced clusters based on the number of vertices in each cluster:
RatioCut(A, B) = cut(A, B) · (1/|A| + 1/|B|)
[Figure: same example graph on nodes u1–u7.]
Example 1: Let A = {u1, u3, u4, u7}, |A| = 4; B = {u2, u5, u6}, |B| = 3;
then cut(A, B) = 0.4 + 0.5 = 0.9
RatioCut(A, B) = 0.9 · (¼ + ⅓) = 0.525
Example 2: A = {u1, u3, u4, u5, u6, u7}, B = {u2};
then cut(A, B) = 0.7
RatioCut(A, B) = 0.7 · (⅙ + 1) ≈ 0.82
Graph Cut:
III. NCut: finding balanced clusters based on the degree sum of all nodes within each cluster:
NCut(A, B) = cut(A, B) · (1/vol(A) + 1/vol(B))
[Figure: same example graph on nodes u1–u7.]
Let A = {u1, u3, u4, u7}
|A| = number of nodes in A = 4
vol(A) = sum of the degree sums of the nodes in A
= deg-sum(u1) + deg-sum(u3) + deg-sum(u4) + deg-sum(u7)
= 1.9 + 2.5 + 3.0 + 1.7 = 9.1
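Extending the cut helper above to RatioCut and NCut, with vol following the degree-sum definition on this slide:

def vol(W, A):
    # degree sum of all nodes in A
    return sum(sum(W[u].values()) for u in A)

def ratio_cut(W, A, B):
    return cut(W, A, B) * (1 / len(A) + 1 / len(B))

def n_cut(W, A, B):
    return cut(W, A, B) * (1 / vol(W, A) + 1 / vol(W, B))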
Graph Cut:
[Table: side-by-side comparison of the Mincut, RatioCut, and NCut partitions on the example graph.]
Laplacian Matrix:
L = D − W
where D is the diagonal degree matrix and W is the weighted adjacency matrix (the full L for the 8-node example appears on the next slide).
● Computation of the K smallest eigenvectors of L
● Combining the two smallest eigenvectors of L to get the transformed space U
Laplacian Matrix (L):
      0   1   2   3   4   5   6   7
0     3  -1  -1  -1   0   0   0   0
1    -1   3  -1  -1   0   0   0   0
2    -1  -1   3  -1   0   0   0   0
3    -1  -1  -1   4   0   0  -1   0
4     0   0   0   0   3  -1  -1  -1
5     0   0   0   0  -1   3  -1  -1
6     0   0   0  -1  -1  -1   4  -1
7     0   0   0   0  -1  -1  -1   3

Node   Smallest eigenvector   Second smallest eigenvector
0          -0.35                  -0.38
1          -0.35                  -0.38
2          -0.35                  -0.38
3          -0.35                  -0.25
4          -0.35                   0.38
5          -0.35                   0.38
6          -0.35                   0.25
7          -0.35                   0.38

The second smallest eigenvector separates the data points better.
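A sketch reproducing this computation with numpy (np.linalg.eigh returns eigenvalues in ascending order, so the first columns of the eigenvector matrix are the smallest eigenvectors; signs may be flipped relative to the table above):

import numpy as np

# unweighted example graph: two tight groups {0,1,2,3} and {4,5,6,7} joined by edge (3,6)
edges = [(0,1),(0,2),(0,3),(1,2),(1,3),(2,3),(3,6),
         (4,5),(4,6),(4,7),(5,6),(5,7),(6,7)]
W = np.zeros((8, 8))
for i, j in edges:
    W[i, j] = W[j, i] = 1
D = np.diag(W.sum(axis=1))       # degree matrix
L = D - W                        # graph Laplacian, matches the table above
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
U = vecs[:, :2]                  # two smallest eigenvectors = transformed space U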
Other examples: Smile dataset
[Figure: heatmap of U, i.e., the four smallest eigenvectors of L, for the Smile dataset.]
References
● Lecture Notes for Chapter 7, Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar. Downloaded from: https://www-users.cs.umn.edu/~kumar001/dmbook/index.php
● Data Mining: Concepts and Techniques (3rd Edn.) by Jiawei Han, Micheline Kamber and Jian Pei, Morgan Kaufmann (2014).
● http://cse.iitkgp.ac.in/~dsamanta/courses/da/index.html#resources
● Lecture note on "Minimum Spanning Tree" by Swee-Ling Tang
Thank You