Graph based clustering
● When flying over a city, one can easily identify forests,
commercial areas, farmlands, riverbeds, etc. based on their
features, without any explicit training.
● Class labels of the data are unknown
● Given a set of data, the task is to establish the existence
of classes or clusters in the data
What is Cluster Analysis?
● Finding groups of objects such that the objects in a group are similar (or
related) to one another and different from (or unrelated to) the objects in
other groups
[Figure: two groups of points; intra-cluster distances are minimized, inter-cluster distances are maximized.]
Application 1: Market Segmentation
● A retail company may collect the following information on households:
• Household income
• Household size
• Occupation of the household’s head
• Distance from nearest urban area
● Identify the following clusters:
• Cluster 1: Small family, high spenders
• Cluster 2: Larger family, high spenders
• Cluster 3: Small family, low spenders
• Cluster 4: Large family, low spenders
• The company can then send personalized advertisements or sales letters to
each household based on how likely they are to respond to specific types of
advertisements.
Application 2: Document Clustering
● Document Clustering:
– Goal: To find groups of documents that are similar to each
other based on the important terms appearing in them.
● Summarization
– Reduce the size of large data sets
● In fact, clustering is one of the most utilized data
mining techniques.
– It has a long history and is used in almost every field, e.g.,
medicine, botany, sociology, biology, marketing,
insurance, libraries, etc.
What is not Clustering?
● Simple segmentation
– Dividing students into different registration groups alphabetically, by
last name
● Results of a query
– Groupings are a result of an external specification
– Clustering is a grouping of objects based on the data
● Supervised classification
– Have class label information
Notion of a Cluster can be Ambiguous
● A clustering algorithm
– Partitional clustering
– Hierarchical clustering
– Density based clustering
– Graph based clustering
– …
● A proximity (similarity, or dissimilarity) function
● Clustering quality
– Inter-cluster distance ⇒ maximized
– Intra-cluster distance ⇒ minimized
● The quality of a clustering result depends on the algorithm, the distance
function, and the application.
Proximity Measure
2. Euclidean Distance (L2 norm: 𝑟 = 2)
d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )
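A minimal sketch in Python; the general Minkowski form with parameter r is an assumption based on the "r = 2" note above:

import math

def minkowski(x, y, r=2):
    # r = 2 gives the Euclidean distance (L2 norm)
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

print(minkowski([0, 0], [3, 4]))  # 5.0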
● Quality of clustering:
– There is usually a separate "quality" function that measures the "goodness" of a cluster.
[Figure: graph-based clustering pipeline; the connected components of the graph become the final clusters.]
Some properties of a Graph
● Formally: G = (V, E, W)
○ V = non-empty set of vertices
○ E = subset of V × V: the edges, consisting of (ordered) pairs of vertices
○ W = set of distances/weights between pairs of vertices
● Directed or undirected
● Degree of a node
○ Number of edges incident on it
○ Undirected degree, in-degree, out-degree
Some properties of Graph
● Walk in a graph between nodes x and y:
○ x = v0 – v1 – v2 – v3 – … – v(t−1) – v(t) = y
○ There is an edge between every pair of consecutive nodes
○ Length of walk = number of hops = number of edges in the walk
● Closed walk: x = y
● Trail: a walk in which no edge is repeated
● Path: a walk in which no vertex is repeated (except possibly start and end)
● Closed path: start vertex = end vertex
● Cycle: a closed path with length ≥ 3
● Vertices x and y are connected: there is a path connecting x to y
● Connected graph: all vertex pairs are connected
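A short sketch checking connectivity with breadth-first search over an adjacency-list representation (the representation itself is introduced on the next slide):

from collections import deque

def is_connected(adj):
    # adj: dict mapping each vertex to the list of its neighbors
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # connected iff BFS from any vertex reaches all vertices
    return len(seen) == len(adj)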
Unweighted Graph Representation
[Figure: undirected graph on vertices A–F with edges A–B, A–E, B–E, C–D, C–F, D–F, E–F.]
● Adjacency matrix:
      A  B  C  D  E  F
   A  0  1  0  0  1  0
   B  1  0  0  0  1  0
   C  0  0  0  1  0  1
   D  0  0  1  0  0  1
   E  1  1  0  0  0  1
   F  0  0  1  1  1  0
● Edge list: (A, B), (A, E), (B, E), (C, D), (C, F), (D, F), (E, F)
● Node list:
   A: B, E
   B: E, A
   C: D, F
   D: C, F
   E: A, B, F
   F: C, D, E
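A minimal sketch building these representations from the edge list in Python:

edges = [("A","B"), ("A","E"), ("B","E"), ("C","D"), ("C","F"), ("D","F"), ("E","F")]
vertices = sorted({u for e in edges for u in e})

# adjacency matrix as a dict of dicts (0/1 entries)
adj_matrix = {u: {v: 0 for v in vertices} for u in vertices}
# node list (adjacency list)
node_list = {u: [] for u in vertices}
for u, v in edges:
    adj_matrix[u][v] = adj_matrix[v][u] = 1  # undirected: symmetric
    node_list[u].append(v)
    node_list[v].append(u)

print(node_list["E"])  # ['A', 'B', 'F']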
Weighted Graph Representation
[Figure: undirected weighted graph on vertices A–F with the edge weights listed below.]
● Adjacency matrix:
      A  B  C  D  E  F
   A  0  2  0  0  8  0
   B  2  0  0  0  1  0
   C  0  0  0  5  0  4
   D  0  0  5  0  0  3
   E  8  1  0  0  0  5
   F  0  0  4  3  5  0
● Edge list: (A, B, 2), (A, E, 8), (B, E, 1), (C, D, 5), (C, F, 4), (D, F, 3), (E, F, 5)
● Node list:
   A: (B, 2), (E, 8)
   B: (E, 1), (A, 2)
   C: (D, 5), (F, 4)
   D: (C, 5), (F, 3)
   E: (A, 8), (B, 1), (F, 5)
   F: (C, 4), (D, 3), (E, 5)
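The same construction extends to weights; a sketch storing the weighted node list as a dict of dicts:

weighted_edges = [("A","B",2), ("A","E",8), ("B","E",1),
                  ("C","D",5), ("C","F",4), ("D","F",3), ("E","F",5)]
W = {}  # weighted node list: vertex -> {neighbor: weight}
for u, v, w in weighted_edges:
    W.setdefault(u, {})[v] = w
    W.setdefault(v, {})[u] = w  # undirected: store both directions

print(W["E"])  # {'A': 8, 'B': 1, 'F': 5}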
Graph Representation
● Adjacency matrix:
[Figure: adjacency matrix plot. Image source: https://matthewlincoln.net/2014/12/20/adjacency-matrix-plots-with-r-and-ggplot2.html]
Relational Data
● Data not represented as graphs can be converted into graphs:
○ every data record = a node of the graph (d1, d2, d3, …)
○ every pair of nodes is connected by an edge: (d1, d2), …, (di, dj)
○ distance(di, dj) = weight of the edge (di, dj)
Commonly used graph models:
Fully connected graph: similarity between the points is decided by a kernel function,
e.g., the Gaussian kernel: s(xi, xj) = exp(−||xi − xj||² / (2σ²)),
where σ controls the sparsity of the graph.
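A minimal sketch of the Gaussian-kernel similarity matrix in numpy (the sample points and the σ value are arbitrary assumptions):

import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # pairwise squared Euclidean distances between rows of X
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
S = gaussian_similarity(X, sigma=1.0)  # entries near 1 for close points, near 0 for far ones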
Commonly used graph models:
K-round MST: similarity between the points is decided by the closeness of the data points. Let G = (V, E) be the complete weighted undirected graph of the dataset.
● The first-round MST of G, say K1, is computed.
● The consecutive MSTs are computed by removing the edges of the MSTs computed in the previous rounds, i.e., Ki is the MST of the graph Gi = (V, E − (E(K1) ∪ … ∪ E(Ki−1))).
● The K-round MST neighborhood graph is then defined as the union of these MSTs: GMST = (V, E(K1) ∪ E(K2) ∪ … ∪ E(KK)).
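A sketch of this construction with scipy (the helper name k_round_mst is an assumption; scipy returns a spanning forest if removing edges disconnects the graph):

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def k_round_mst(X, k):
    # complete weighted graph: pairwise Euclidean distances (assumes distinct points)
    D = squareform(pdist(X))
    union = np.zeros_like(D)
    for _ in range(k):
        T = minimum_spanning_tree(D).toarray()  # this round's MST, Ki
        mask = (T + T.T) > 0
        union[mask] = D[mask]                   # add Ki's edges to the union
        D[mask] = 0                             # remove them before the next round
    return union  # weighted adjacency matrix of the K-round MST graph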
Minimum spanning tree
Prim's Algorithm:
let T be a single vertex x
while (T has fewer than n vertices)
{
find the smallest edge connecting T to G-T
add it to T
}
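A runnable sketch of Prim's algorithm over the weighted node-list (dict of dicts) representation from earlier, using a lazy-deletion priority queue (a common idiom; not necessarily the variant in the cited lecture note):

import heapq

def prim(W, start):
    # W: vertex -> {neighbor: weight}; returns the MST as a list of edges (u, v, w)
    in_tree = {start}
    frontier = [(w, start, v) for v, w in W[start].items()]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(in_tree) < len(W):
        w, u, v = heapq.heappop(frontier)  # smallest edge leaving the tree
        if v in in_tree:
            continue                       # stale entry: v joined the tree earlier
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in W[v].items():
            if x not in in_tree:
                heapq.heappush(frontier, (wx, v, x))
    return mst

# e.g., prim(W, "A") on the weighted node list W built earlier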
Prim's Algorithm - Example
[Figure: a sequence of nine slides stepping through Prim's algorithm on a weighted example graph (edge weights between 1 and 50), growing the tree by the smallest frontier edge at each step until the MST is complete.]
Minimum spanning tree based clustering
Steps for MST based clustering (input dataset: X, number of clusters: K):
1. Build the complete weighted graph of X and compute its MST.
2. Remove the K − 1 edges of largest weight from the MST.
3. Report the resulting connected components as the K clusters.
U. Von. Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (4) (2007) 395–416.
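A compact sketch of these steps using scipy (the function name mst_clustering is hypothetical):

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_clustering(X, K):
    # 1. complete weighted graph + its MST
    D = squareform(pdist(X))
    T = minimum_spanning_tree(D).toarray()
    # 2. remove the K-1 heaviest MST edges
    edges = np.argwhere(T > 0)   # row-major order
    weights = T[T > 0]           # same order as `edges`
    for i, j in edges[np.argsort(weights)][len(weights) - (K - 1):]:
        T[i, j] = 0
    # 3. connected components = clusters
    _, labels = connected_components(T, directed=False)
    return labels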
Graph Cut:
I. Mincut problem: selecting the subsets A and B so as to minimize
cut(A, B) = Σ_{i ∈ A, j ∈ B} w_ij
[Figure: example graph on nodes u1–u7 with edge weights between 0.3 and 1.0.]
Example 1: Let A = {u1, u3, u4, u7}, B = {u2, u5, u6};
then cut(A, B) = 0.4 + 0.5 = 0.9
Example 2: A = {u1, u3, u4, u5, u6, u7}, B = {u2};
then cut(A, B) = 0.7
Drawback: mincut may select a single node as the one cluster.
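A tiny helper computing cut(A, B) from a weighted node list (the three-node graph below is made up for illustration, not the u1–u7 graph from the figure):

def cut(W, A, B):
    # sum of weights of edges with one endpoint in A and the other in B
    return sum(w for u in A for v, w in W[u].items() if v in B)

W = {"a": {"b": 0.4}, "b": {"a": 0.4, "c": 0.5}, "c": {"b": 0.5}}
print(cut(W, {"a", "b"}, {"c"}))  # 0.5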
Graph Cut:
II. RatioCut: finding balanced clusters based on the number of vertices in each cluster:
RatioCut(A, B) = cut(A, B) · (1/|A| + 1/|B|)
[Figure: same example graph on nodes u1–u7.]
Example 1: Let A = {u1, u3, u4, u7}, |A| = 4; B = {u2, u5, u6}, |B| = 3;
then cut(A, B) = 0.4 + 0.5 = 0.9
RatioCut(A, B) = 0.9 · (¼ + ⅓) = 0.525
Example 2: A = {u1, u3, u4, u5, u6, u7}, B = {u2};
then cut(A, B) = 0.7
RatioCut(A, B) = 0.7 · (⅙ + 1) ≈ 0.82
Graph Cut:
III. NCut: finding balanced clusters based on the degree sum of all nodes within each cluster:
NCut(A, B) = cut(A, B) · (1/vol(A) + 1/vol(B))
[Figure: same example graph on nodes u1–u7.]
Let A = {u1, u3, u4, u7}
|A| = number of nodes in A = 4
vol(A) = sum of the degree sums of the nodes in A
= deg-sum(u1) + deg-sum(u3) + deg-sum(u4) + deg-sum(u7)
= 1.9 + 2.5 + 3.0 + 1.7 = 9.1
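Extending the cut helper above to RatioCut and NCut, with vol following the degree-sum definition on this slide:

def vol(W, A):
    # degree sum of all nodes in A
    return sum(sum(W[u].values()) for u in A)

def ratio_cut(W, A, B):
    return cut(W, A, B) * (1 / len(A) + 1 / len(B))

def n_cut(W, A, B):
    return cut(W, A, B) * (1 / vol(W, A) + 1 / vol(W, B))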
Graph Cut:
[Table: side-by-side comparison of the Mincut, RatioCut, and NCut partitions on the example graph.]
Laplacian Matrix:
L = D − W
where D is the diagonal degree matrix and W is the weighted adjacency matrix (the full L for the 8-node example appears on the next slide).
● Computation of the K smallest eigenvectors of L
● Combining the two smallest eigenvectors of L to get the transformed space U
Laplacian Matrix (L):
      0   1   2   3   4   5   6   7
0     3  -1  -1  -1   0   0   0   0
1    -1   3  -1  -1   0   0   0   0
2    -1  -1   3  -1   0   0   0   0
3    -1  -1  -1   4   0   0  -1   0
4     0   0   0   0   3  -1  -1  -1
5     0   0   0   0  -1   3  -1  -1
6     0   0   0  -1  -1  -1   4  -1
7     0   0   0   0  -1  -1  -1   3

Node   Smallest eigenvector   Second smallest eigenvector
0          -0.35                  -0.38
1          -0.35                  -0.38
2          -0.35                  -0.38
3          -0.35                  -0.25
4          -0.35                   0.38
5          -0.35                   0.38
6          -0.35                   0.25
7          -0.35                   0.38

The second smallest eigenvector separates the data points better.
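A sketch reproducing this computation with numpy (np.linalg.eigh returns eigenvalues in ascending order, so the first columns of the eigenvector matrix are the smallest eigenvectors; signs may be flipped relative to the table above):

import numpy as np

# unweighted example graph: two tight groups {0,1,2,3} and {4,5,6,7} joined by edge (3,6)
edges = [(0,1),(0,2),(0,3),(1,2),(1,3),(2,3),(3,6),
         (4,5),(4,6),(4,7),(5,6),(5,7),(6,7)]
W = np.zeros((8, 8))
for i, j in edges:
    W[i, j] = W[j, i] = 1
D = np.diag(W.sum(axis=1))       # degree matrix
L = D - W                        # graph Laplacian, matches the table above
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
U = vecs[:, :2]                  # two smallest eigenvectors = transformed space U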
Other examples: Smile dataset
[Figure: heatmap of U, i.e., the four smallest eigenvectors of L, for the Smile dataset.]
References
● Lecture Notes for Chapter 7, Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar. Downloaded from: https://www-users.cs.umn.edu/~kumar001/dmbook/index.php
● Data Mining: Concepts and Techniques (3rd Edn.) by Jiawei Han, Micheline Kamber and Jian Pei, Morgan Kaufmann (2014).
● http://cse.iitkgp.ac.in/~dsamanta/courses/da/index.html#resources
● Lecture note on "Minimum Spanning Tree" by Swee-Ling Tang
Thank You