Spectral clustering is an algorithm for partitioning a graph into clusters based on the graph's connectivity and structure. It works by constructing a similarity graph from the data and computing the eigenvectors of the graph Laplacian matrix. These eigenvectors provide an embedding of the data points into a lower-dimensional space, where standard clustering algorithms like k-means can be applied to obtain the final clustering. The intuition is that when the graph has clear clusters or communities, these will be reflected in the eigenvectors.


Spectral Clustering

Aarti Singh

Machine Learning 10-701/15-781


Nov 28, 2012

Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg

Data Clustering

Goal: Given data points X1, ..., Xn and similarities W(Xi, Xj), partition the data into groups so that points within a group are similar and points in different groups are dissimilar.

Graph Clustering

Similarity graph G(V, E, W):
V – vertices (the data points)
E – edge between i and j if their similarity is > 0
W – edge weights (the similarities)

[Figure: similarity graph over the data points]

Partition the graph so that edges within a group have large weights and edges across groups have small weights.
Similarity graph construction

Similarity graphs model local neighborhood relations between data points.

E.g. the epsilon-NN graph:

W_ij = 1 if ||x_i - x_j|| <= ε, and 0 otherwise (ε controls the size of the neighborhood)

or a k-NN graph: W_ij = 1 if x_i is among the k nearest neighbors of x_j or vice versa (the mutual k-NN graph requires both directions, not just one).

[Figure: data points connected into a similarity graph with weights W_ij]
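As a concrete illustration (not from the slides), here is a minimal numpy sketch of the ε-NN construction; the function name and the brute-force distance computation are my own choices:

```python
import numpy as np

def epsilon_nn_graph(X, eps):
    """Weight matrix of the epsilon-NN graph: W_ij = 1 iff ||x_i - x_j|| <= eps."""
    # Pairwise Euclidean distances between the rows of X (shape n x d).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (dists <= eps).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W
```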
Similarity graph construction

Similarity graphs model local neighborhood relations between data points.

E.g. the Gaussian kernel similarity function:

W_ij = exp(-||x_i - x_j||^2 / (2σ^2)) (σ controls the size of the neighborhood)

[Figure: data points connected into a similarity graph with Gaussian weights W_ij]
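A matching sketch for the Gaussian-kernel graph (again with an assumed function name; the kernel form exp(-||x_i - x_j||^2 / (2σ^2)) is the standard one):

```python
import numpy as np

def gaussian_kernel_graph(X, sigma):
    """Fully connected similarity graph with Gaussian (RBF) weights.

    Small sigma -> weights decay quickly -> effectively local neighborhoods.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # conventionally drop self-similarity
    return W
```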
Partitioning a graph into two clusters

Min-cut: partition the graph into two sets A and B such that the weight of the edges connecting vertices in A to vertices in B is minimized.

•  Easy to solve: O(|V||E|) algorithms exist.

•  Not a satisfactory partition: it often isolates vertices.
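For a concrete feel, here is a minimal sketch (not from the slides) using networkx's Stoer-Wagner global min-cut on a made-up toy graph; the node labels and weights are illustrative only:

```python
import networkx as nx

# Toy weighted graph: two dense triangles joined by one light bridge edge.
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 5.0), (1, 2, 5.0), (0, 2, 5.0),   # cluster A
    (3, 4, 5.0), (4, 5, 5.0), (3, 5, 5.0),   # cluster B
    (2, 3, 0.1),                             # weak bridge
])
cut_value, (A, B) = nx.stoer_wagner(G)       # global minimum cut
print(cut_value, sorted(A), sorted(B))       # 0.1 [0, 1, 2] [3, 4, 5]
# If the bridge were heavy instead, the min cut would tend to peel off a
# single low-degree vertex -- the failure mode the slide warns about.
```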
Partitioning a graph into two clusters

Partition the graph into two sets A and B such that the weight of edges connecting vertices in A to vertices in B is minimum, and the sizes of A and B are similar.

Balanced min-cut: min cut(A,B) subject to |A| = |B| (more generally, |A|, |B| ≥ δ)

Ratio cut: min cut(A,B) · (1/|A| + 1/|B|)

Normalized cut: min cut(A,B) · (1/vol(A) + 1/vol(B)), where vol(A) = Σ_{i in A} d_i

But these are NP-hard to solve!

Spectral clustering is a relaxation of these.
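To make the objectives concrete, a small helper (my sketch, with assumed names, following the standard definitions above) that evaluates all three for a given partition:

```python
import numpy as np

def cut_objectives(W, A):
    """Evaluate cut, RatioCut and NCut for a partition (A, B) of {0, ..., n-1}."""
    n = W.shape[0]
    B = np.setdiff1d(np.arange(n), A)
    cut = W[np.ix_(A, B)].sum()                  # total weight crossing the cut
    d = W.sum(axis=1)                            # vertex degrees
    ratio_cut = cut * (1.0 / len(A) + 1.0 / len(B))
    ncut = cut * (1.0 / d[A].sum() + 1.0 / d[B].sum())
    return cut, ratio_cut, ncut
```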
Some graph notation

For a weighted graph with symmetric weight matrix W:
•  degree of vertex i: d_i = Σ_j W_ij
•  degree matrix: D = diag(d_1, ..., d_n)
•  cut between A and B: cut(A,B) = Σ_{i in A, j in B} W_ij
Graph cut

Encode a two-way partition (A, B) as f in {-1, 1}^n, with f_i = 1 if i in A and f_i = -1 if i in B. Then

cut(A,B) = (1/4) f^T (D - W) f

To verify, expand the quadratic form:

f^T (D - W) f = f^T D f - f^T W f = Σ_i d_i f_i^2 - Σ_{i,j} W_ij f_i f_j = (1/2) Σ_{i,j} W_ij (f_i - f_j)^2

and note that (f_i - f_j)^2 = 4 exactly when i and j lie in different groups (each cut edge is counted twice in the double sum).
Graph cut and Graph Laplacian

cut(A,B) = (1/4) f^T (D - W) f = (1/4) f^T L f

where L = D - W is the un-normalized graph Laplacian.

Spectral properties of L:
•  L is symmetric and positive semi-definite, since f^T L f = (1/2) Σ_{i,j} W_ij (f_i - f_j)^2 ≥ 0 for every f
•  its eigenvalues are real and non-negative: 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n
•  the smallest eigenvalue is 0, with corresponding eigenvector 1 (the constant all-ones vector)
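A quick numerical check of the cut identity and of positive semi-definiteness (a sketch; the random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2        # symmetric weights
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W                                        # unnormalized Laplacian

f = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # +-1 partition encoding
A, B = np.arange(n // 2), np.arange(n // 2, n)
cut = W[np.ix_(A, B)].sum()

assert np.isclose(f @ L @ f, 4 * cut)            # f^T L f = 4 cut(A,B)
assert np.all(np.linalg.eigvalsh(L) >= -1e-10)   # L is positive semi-definite
```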
Balanced min-cut

min f^T L f   s.t.   f^T 1 = 0,   f in {-1, 1}^n

(the constraint enforces |A| = |B|, since Σ_i f_i = Σ_i (1_{i in A} - 1_{i in B}) = 0)

The above formulation is still NP-hard, so we relax f from binary to real-valued:

min f^T L f   s.t.   f^T 1 = 0,   f^T f = n,   f in R^n

or, equivalently, minimize the Rayleigh quotient:

min (f^T L f) / (f^T f)   s.t.   f^T 1 = 0,   f in R^n
Relaxation of balanced min-cut

min (f^T L f) / (f^T f)   s.t.   f^T 1 = 0,   f in R^n

By the Rayleigh-Ritz theorem, without the constraint this minimum is λ_min(L), the smallest eigenvalue of L: if f is an eigenvector of L with eigenvalue λ, then

(f^T L f) / (f^T f) = (f^T λ f) / (f^T f) = λ

Recall that the smallest eigenvalue of L is 0, with corresponding eigenvector 1. But f can't be 1, because of the constraint f^T 1 = 0. Therefore the solution f is the eigenvector of L corresponding to the second smallest eigenvalue, a.k.a. the second eigenvector.
Approximation of balanced min-cut

Let f be the second eigenvector of the unnormalized graph Laplacian L. Recover a binary partition as follows:

i in A if f_i ≥ 0
i in B if f_i < 0

[Figure: ideal (binary) solution vs. relaxed (real-valued) solution]

Similar relaxations work for other cut problems:
•  RatioCut – second eigenvector of the unnormalized graph Laplacian L = D - W
•  Normalized cut – second eigenvector of the normalized Laplacian L' = I - D^{-1} W
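Putting the pieces together, here is a minimal numpy sketch of this 2-way procedure (the function name is my own; np.linalg.eigh returns eigenvalues in ascending order, so column 1 is the second eigenvector):

```python
import numpy as np

def two_way_spectral_cut(W):
    """Approximate balanced min-cut: threshold the second eigenvector of L."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    f = eigvecs[:, 1]                      # second (Fiedler) eigenvector
    A = np.where(f >= 0)[0]
    B = np.where(f < 0)[0]
    return A, B
```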
Example (Xing et al. 2001)

[Figure: worked example from Xing et al. 2001]
How to partition a graph into k clusters?

Spectral Clustering Algorithm

1. Construct the similarity graph and its weight matrix W.
2. Form the (normalized) graph Laplacian L'.
3. Compute the first k eigenvectors of L' and stack them as the columns of a matrix U.
4. Treat each row of U as the embedding of one data point and cluster the rows with k-means.

This is a dimensionality reduction: from the n×n similarity matrix to an n×k embedding.
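A sketch of the full k-way algorithm under these assumptions, using the random-walk Laplacian L' = I - D^{-1}W from the earlier slides and scikit-learn's KMeans for the final step (a production implementation would use a symmetric eigensolver on the equivalent symmetric problem):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """k-way spectral clustering sketch with L' = I - D^{-1} W."""
    d = W.sum(axis=1)                           # vertex degrees
    L_rw = np.eye(W.shape[0]) - W / d[:, None]  # random-walk normalized Laplacian
    # L_rw is not symmetric; for a sketch, plain eig + sort is enough.
    eigvals, eigvecs = np.linalg.eig(L_rw)
    order = np.argsort(eigvals.real)
    U = eigvecs[:, order[:k]].real              # n x k embedding (n x n -> n x k)
    # Each row of U is the embedding of one data point; cluster rows with k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```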
Spectral Clustering - Intuition

Eigenvectors of the Laplacian matrix provide an embedding of the data based on similarity: row i of the eigenvector matrix is the embedding of point i. For disconnected subgraphs, L is block diagonal and the embedded points are easy to cluster, e.g. using k-means.

[Figure: block-diagonal Laplacian of disconnected subgraphs and the resulting embedding of each point]
Understanding Spectral Clustering

•  If the graph is connected, the first Laplacian eigenvector is constant (all 1s).

•  If the graph is disconnected (k connected components), the Laplacian is block diagonal, L = diag(L_1, ..., L_k), and the first k Laplacian eigenvectors are the indicator vectors of the components: each is constant on one component and 0 on the rest (or a rotation of these within the eigenspace).

[Figure: block-diagonal Laplacian with blocks L_1, L_2, L_3 and its first three (indicator) eigenvectors]
Understanding Spectral Clustering

•  Is all hope lost if clusters don't correspond to connected components of the graph? No!

•  If clusters are connected loosely (small off-block-diagonal entries), then the 1st Laplacian eigenvector is all 1s, but:
   for two clusters, the second eigenvector finds a balanced cut;
   for k clusters, the first k eigenvectors are slightly perturbed (and possibly rotated) versions of the indicator vectors (Davis-Kahan theorem).
Spectral Clustering - Intuition

The same picture holds when the subgraphs are loosely connected rather than fully disconnected: with small off-block entries ε in L, the eigenvector embedding of each point is only slightly perturbed, and points remain easy to cluster in the embedded space, e.g. using k-means.

[Figure: nearly block-diagonal Laplacian (off-block entries ε) and the perturbed embedding of each point]
k-means vs Spectral clustering

Applying k-means to the Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

[Figure: left, a dataset where both perform the same; right, a dataset where spectral clustering is superior]
k-means vs Spectral clustering

Applying k-means to the Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

[Figure: k-means output vs. spectral clustering output]
k-means vs Spectral clustering

Applying k-means to the Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

[Figure: similarity matrix and the second eigenvector of the graph Laplacian]
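To reproduce this comparison qualitatively, here is a sketch using scikit-learn's built-in KMeans and SpectralClustering on the classic two-moons data (the kernel width gamma is a plausible guess, not a value from the slides):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
sc_labels = SpectralClustering(
    n_clusters=2, affinity="rbf", gamma=30.0, random_state=0
).fit_predict(X)
# k-means splits each moon with a straight boundary; spectral clustering
# recovers the two non-convex moons via the Laplacian eigenvector embedding.
```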


Examples (Ng et al. 2001)

[Figure: spectral clustering results on example datasets from Ng et al. 2001]

Examples: Choice of k (Ng et al. 2001)

[Figure: results for different choices of the number of clusters k]
Some Issues

•  Choice of number of clusters k

The most stable clustering is usually given by the value of k that maximizes the eigengap (the difference between consecutive eigenvalues):

Δ_k = λ_k - λ_{k-1}
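A minimal sketch of this heuristic (assumed function name; it returns the k such that λ_1, ..., λ_k are small and the gap to λ_{k+1} is largest, which matches Δ_k up to the indexing convention):

```python
import numpy as np

def choose_k_by_eigengap(W, k_max=10):
    """Return the k for which the eigengap of the graph Laplacian is largest."""
    L = np.diag(W.sum(axis=1)) - W                    # unnormalized Laplacian
    lam = np.sort(np.linalg.eigvalsh(L))[:k_max + 1]  # lambda_1 <= lambda_2 <= ...
    gaps = lam[1:] - lam[:-1]                         # gap after each lambda_k
    return int(np.argmax(gaps)) + 1                   # k components -> first k eigvals ~ 0
```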
Some Issues

•  Choice of number of clusters k

•  Choice of similarity: choice of kernel, and for Gaussian kernels, choice of σ

[Figure: clustering with a good similarity measure vs. a poor similarity measure]
Some Issues

•  Choice of number of clusters k

•  Choice of similarity: choice of kernel, and for Gaussian kernels, choice of σ

•  Choice of clustering method: k-way vs. recursive 2-way
Spectral clustering summary

•  Algorithms that cluster points using eigenvectors of matrices derived from the data

•  Useful in hard, non-convex clustering problems

•  Obtain a data representation in a low-dimensional space that can be easily clustered

•  A variety of methods use eigenvectors of the unnormalized or normalized Laplacian; they differ in how they derive clusters from the eigenvectors (k-way vs. repeated 2-way)

•  Empirically very successful
