Spectral clustering is an algorithm for partitioning a graph into clusters based on the graph's connectivity and structure. It works by constructing a similarity graph from the data and computing the eigenvectors of the graph Laplacian matrix. These eigenvectors provide an embedding of the data points into a lower-dimensional space, where standard clustering algorithms like k-means can be applied to obtain the final clustering. The intuition is that when the graph has clear clusters or communities, these will be reflected in the eigenvectors.


Spectral Clustering

Aarti Singh

Machine Learning 10-701/15-781


Nov 28, 2012

Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg

Data Clustering

Goal: Given data points X1, ..., Xn and similarities W(Xi, Xj), partition the data into groups so that points within a group are similar and points in different groups are dissimilar.

Graph Clustering

Similarity graph G(V, E, W):
V – vertices (the data points)
E – edge between i and j if their similarity is > 0
W – edge weights (the similarities)

[Figure: similarity graph over the data points]

Partition the graph so that edges within a group have large weights and edges across groups have small weights.
Similarity graph construction

Similarity graphs model local neighborhood relations between data points.

E.g. the epsilon-NN graph:

W_ij = 1 if ||x_i - x_j|| <= ε, and 0 otherwise (ε controls the size of the neighborhood)

or a k-NN graph: W_ij = 1 if x_i is among the k nearest neighbors of x_j or vice versa (the mutual k-NN graph requires both directions, not just one).

[Figure: data points connected into a similarity graph with weights W_ij]
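As a concrete illustration (not from the slides), here is a minimal numpy sketch of the ε-NN construction; the function name and the brute-force distance computation are my own choices:

```python
import numpy as np

def epsilon_nn_graph(X, eps):
    """Weight matrix of the epsilon-NN graph: W_ij = 1 iff ||x_i - x_j|| <= eps."""
    # Pairwise Euclidean distances between the rows of X (shape n x d).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (dists <= eps).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W
```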
Similarity graph construction

Similarity graphs model local neighborhood relations between data points.

E.g. the Gaussian kernel similarity function:

W_ij = exp(-||x_i - x_j||^2 / (2σ^2)) (σ controls the size of the neighborhood)

[Figure: data points connected into a similarity graph with Gaussian weights W_ij]
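A matching sketch for the Gaussian-kernel graph (again with an assumed function name; the kernel form exp(-||x_i - x_j||^2 / (2σ^2)) is the standard one):

```python
import numpy as np

def gaussian_kernel_graph(X, sigma):
    """Fully connected similarity graph with Gaussian (RBF) weights.

    Small sigma -> weights decay quickly -> effectively local neighborhoods.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # conventionally drop self-similarity
    return W
```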
Partitioning a graph into two clusters

Min-cut: partition the graph into two sets A and B such that the weight of the edges connecting vertices in A to vertices in B is minimized.

•  Easy to solve: O(|V||E|) algorithms exist.

•  Not a satisfactory partition: it often isolates vertices.
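For a concrete feel, here is a minimal sketch (not from the slides) using networkx's Stoer-Wagner global min-cut on a made-up toy graph; the node labels and weights are illustrative only:

```python
import networkx as nx

# Toy weighted graph: two dense triangles joined by one light bridge edge.
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 5.0), (1, 2, 5.0), (0, 2, 5.0),   # cluster A
    (3, 4, 5.0), (4, 5, 5.0), (3, 5, 5.0),   # cluster B
    (2, 3, 0.1),                             # weak bridge
])
cut_value, (A, B) = nx.stoer_wagner(G)       # global minimum cut
print(cut_value, sorted(A), sorted(B))       # 0.1 [0, 1, 2] [3, 4, 5]
# If the bridge were heavy instead, the min cut would tend to peel off a
# single low-degree vertex -- the failure mode the slide warns about.
```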
Partitioning a graph into two clusters

Partition the graph into two sets A and B such that the weight of edges connecting vertices in A to vertices in B is minimum, and the sizes of A and B are similar.

Balanced min-cut: min cut(A,B) subject to |A| = |B| (more generally, |A|, |B| ≥ δ)

Ratio cut: min cut(A,B) · (1/|A| + 1/|B|)

Normalized cut: min cut(A,B) · (1/vol(A) + 1/vol(B)), where vol(A) = Σ_{i in A} d_i

But these are NP-hard to solve!

Spectral clustering is a relaxation of these.
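To make the objectives concrete, a small helper (my sketch, with assumed names, following the standard definitions above) that evaluates all three for a given partition:

```python
import numpy as np

def cut_objectives(W, A):
    """Evaluate cut, RatioCut and NCut for a partition (A, B) of {0, ..., n-1}."""
    n = W.shape[0]
    B = np.setdiff1d(np.arange(n), A)
    cut = W[np.ix_(A, B)].sum()                  # total weight crossing the cut
    d = W.sum(axis=1)                            # vertex degrees
    ratio_cut = cut * (1.0 / len(A) + 1.0 / len(B))
    ncut = cut * (1.0 / d[A].sum() + 1.0 / d[B].sum())
    return cut, ratio_cut, ncut
```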
Some graph notation

For a weighted graph with symmetric weight matrix W:
•  degree of vertex i: d_i = Σ_j W_ij
•  degree matrix: D = diag(d_1, ..., d_n)
•  cut between A and B: cut(A,B) = Σ_{i in A, j in B} W_ij
Graph cut

Encode a two-way partition (A, B) as f in {-1, 1}^n, with f_i = 1 if i in A and f_i = -1 if i in B. Then

cut(A,B) = (1/4) f^T (D - W) f

To verify, expand the quadratic form:

f^T (D - W) f = f^T D f - f^T W f = Σ_i d_i f_i^2 - Σ_{i,j} W_ij f_i f_j = (1/2) Σ_{i,j} W_ij (f_i - f_j)^2

and note that (f_i - f_j)^2 = 4 exactly when i and j lie in different groups (each cut edge is counted twice in the double sum).
Graph cut and Graph Laplacian

cut(A,B) = (1/4) f^T (D - W) f = (1/4) f^T L f

where L = D - W is the un-normalized graph Laplacian.

Spectral properties of L:
•  L is symmetric and positive semi-definite, since f^T L f = (1/2) Σ_{i,j} W_ij (f_i - f_j)^2 ≥ 0 for every f
•  its eigenvalues are real and non-negative: 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n
•  the smallest eigenvalue is 0, with corresponding eigenvector 1 (the constant all-ones vector)
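A quick numerical check of the cut identity and of positive semi-definiteness (a sketch; the random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n)); W = (W + W.T) / 2        # symmetric weights
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W                                        # unnormalized Laplacian

f = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # +-1 partition encoding
A, B = np.arange(n // 2), np.arange(n // 2, n)
cut = W[np.ix_(A, B)].sum()

assert np.isclose(f @ L @ f, 4 * cut)            # f^T L f = 4 cut(A,B)
assert np.all(np.linalg.eigvalsh(L) >= -1e-10)   # L is positive semi-definite
```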
Balanced min-cut

min f^T L f   s.t.   f^T 1 = 0,   f in {-1, 1}^n

(the constraint enforces |A| = |B|, since Σ_i f_i = Σ_i (1_{i in A} - 1_{i in B}) = 0)

The above formulation is still NP-hard, so we relax f from binary to real-valued:

min f^T L f   s.t.   f^T 1 = 0,   f^T f = n,   f in R^n

or, equivalently, minimize the Rayleigh quotient:

min (f^T L f) / (f^T f)   s.t.   f^T 1 = 0,   f in R^n
Relaxation of balanced min-cut

min (f^T L f) / (f^T f)   s.t.   f^T 1 = 0,   f in R^n

By the Rayleigh-Ritz theorem, without the constraint this minimum is λ_min(L), the smallest eigenvalue of L: if f is an eigenvector of L with eigenvalue λ, then

(f^T L f) / (f^T f) = (f^T λ f) / (f^T f) = λ

Recall that the smallest eigenvalue of L is 0, with corresponding eigenvector 1. But f can't be 1, because of the constraint f^T 1 = 0. Therefore the solution f is the eigenvector of L corresponding to the second smallest eigenvalue, a.k.a. the second eigenvector.
Approximation of balanced min-cut

Let f be the second eigenvector of the unnormalized graph Laplacian L. Recover a binary partition as follows:

i in A if f_i ≥ 0
i in B if f_i < 0

[Figure: ideal (binary) solution vs. relaxed (real-valued) solution]

Similar relaxations work for other cut problems:
•  RatioCut – second eigenvector of the unnormalized graph Laplacian L = D - W
•  Normalized cut – second eigenvector of the normalized Laplacian L' = I - D^{-1} W
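Putting the pieces together, here is a minimal numpy sketch of this 2-way procedure (the function name is my own; np.linalg.eigh returns eigenvalues in ascending order, so column 1 is the second eigenvector):

```python
import numpy as np

def two_way_spectral_cut(W):
    """Approximate balanced min-cut: threshold the second eigenvector of L."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    f = eigvecs[:, 1]                      # second (Fiedler) eigenvector
    A = np.where(f >= 0)[0]
    B = np.where(f < 0)[0]
    return A, B
```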
Example (Xing et al. 2001)

[Figure: worked example from Xing et al. 2001]
How to partition a graph into k clusters?

Spectral Clustering Algorithm

1. Construct the similarity graph and its weight matrix W.
2. Form the (normalized) graph Laplacian L'.
3. Compute the first k eigenvectors of L' and stack them as the columns of a matrix U.
4. Treat each row of U as the embedding of one data point and cluster the rows with k-means.

This is a dimensionality reduction: from the n×n similarity matrix to an n×k embedding.
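A sketch of the full k-way algorithm under these assumptions, using the random-walk Laplacian L' = I - D^{-1}W from the earlier slides and scikit-learn's KMeans for the final step (a production implementation would use a symmetric eigensolver on the equivalent symmetric problem):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """k-way spectral clustering sketch with L' = I - D^{-1} W."""
    d = W.sum(axis=1)                           # vertex degrees
    L_rw = np.eye(W.shape[0]) - W / d[:, None]  # random-walk normalized Laplacian
    # L_rw is not symmetric; for a sketch, plain eig + sort is enough.
    eigvals, eigvecs = np.linalg.eig(L_rw)
    order = np.argsort(eigvals.real)
    U = eigvecs[:, order[:k]].real              # n x k embedding (n x n -> n x k)
    # Each row of U is the embedding of one data point; cluster rows with k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```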
Spectral Clustering - Intuition

Eigenvectors of the Laplacian matrix provide an embedding of the data based on similarity: row i of the eigenvector matrix is the embedding of point i. For disconnected subgraphs, L is block diagonal and the embedded points are easy to cluster, e.g. using k-means.

[Figure: block-diagonal Laplacian of disconnected subgraphs and the resulting embedding of each point]
Understanding Spectral Clustering

•  If the graph is connected, the first Laplacian eigenvector is constant (all 1s).

•  If the graph is disconnected (k connected components), the Laplacian is block diagonal, L = diag(L_1, ..., L_k), and the first k Laplacian eigenvectors are the indicator vectors of the components: each is constant on one component and 0 on the rest (or a rotation of these within the eigenspace).

[Figure: block-diagonal Laplacian with blocks L_1, L_2, L_3 and its first three (indicator) eigenvectors]
Understanding Spectral Clustering

•  Is all hope lost if clusters don't correspond to connected components of the graph? No!

•  If clusters are connected loosely (small off-block-diagonal entries), then the 1st Laplacian eigenvector is all 1s, but:
   for two clusters, the second eigenvector finds a balanced cut;
   for k clusters, the first k eigenvectors are slightly perturbed (and possibly rotated) versions of the indicator vectors (Davis-Kahan theorem).
Spectral Clustering - Intuition

The same picture holds when the subgraphs are loosely connected rather than fully disconnected: with small off-block entries ε in L, the eigenvector embedding of each point is only slightly perturbed, and points remain easy to cluster in the embedded space, e.g. using k-means.

[Figure: nearly block-diagonal Laplacian (off-block entries ε) and the perturbed embedding of each point]
k-means vs Spectral clustering

Applying k-means to the Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

[Figure: left, a dataset where both perform the same; right, a dataset where spectral clustering is superior]
k-means vs Spectral clustering

Applying k-means to the Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

[Figure: k-means output vs. spectral clustering output]
k-means vs Spectral clustering

Applying k-means to the Laplacian eigenvectors allows us to find clusters with non-convex boundaries.

[Figure: similarity matrix and the second eigenvector of the graph Laplacian]
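To reproduce this comparison qualitatively, here is a sketch using scikit-learn's built-in KMeans and SpectralClustering on the classic two-moons data (the kernel width gamma is a plausible guess, not a value from the slides):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
sc_labels = SpectralClustering(
    n_clusters=2, affinity="rbf", gamma=30.0, random_state=0
).fit_predict(X)
# k-means splits each moon with a straight boundary; spectral clustering
# recovers the two non-convex moons via the Laplacian eigenvector embedding.
```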


Examples (Ng et al. 2001)

[Figure: spectral clustering results on example datasets from Ng et al. 2001]

Examples: Choice of k (Ng et al. 2001)

[Figure: results for different choices of the number of clusters k]
Some Issues

•  Choice of number of clusters k

The most stable clustering is usually given by the value of k that maximizes the eigengap (the difference between consecutive eigenvalues):

Δ_k = λ_k - λ_{k-1}
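A minimal sketch of this heuristic (assumed function name; it returns the k such that λ_1, ..., λ_k are small and the gap to λ_{k+1} is largest, which matches Δ_k up to the indexing convention):

```python
import numpy as np

def choose_k_by_eigengap(W, k_max=10):
    """Return the k for which the eigengap of the graph Laplacian is largest."""
    L = np.diag(W.sum(axis=1)) - W                    # unnormalized Laplacian
    lam = np.sort(np.linalg.eigvalsh(L))[:k_max + 1]  # lambda_1 <= lambda_2 <= ...
    gaps = lam[1:] - lam[:-1]                         # gap after each lambda_k
    return int(np.argmax(gaps)) + 1                   # k components -> first k eigvals ~ 0
```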
Some Issues

•  Choice of number of clusters k

•  Choice of similarity: choice of kernel, and for Gaussian kernels, choice of σ

[Figure: clustering with a good similarity measure vs. a poor similarity measure]
Some Issues

•  Choice of number of clusters k

•  Choice of similarity: choice of kernel, and for Gaussian kernels, choice of σ

•  Choice of clustering method: k-way vs. recursive 2-way
Spectral clustering summary

•  Algorithms that cluster points using eigenvectors of matrices derived from the data

•  Useful in hard, non-convex clustering problems

•  Obtain a data representation in a low-dimensional space that can be easily clustered

•  A variety of methods use eigenvectors of the unnormalized or normalized Laplacian; they differ in how they derive clusters from the eigenvectors (k-way vs. repeated 2-way)

•  Empirically very successful
