Markov Clustering Algorithm

The document describes the Markov Clustering (MCL) algorithm for graph clustering. MCL uses random walks within a graph and two main operations - expansion and inflation. Expansion simulates multiple random walks to enhance flow between well-connected nodes. Inflation increases inequality in flow distribution, favoring nodes that receive more flow. Together, these operations cause flow to accumulate and form high-density clusters. MCL can find overlapping clusters and works by analyzing flow distributions within a graph to reveal its inherent cluster structure.


 Introduction

 Important Concepts in MCL Algorithm

 MCL Algorithm

 The Features of MCL Algorithm

 Summary
Graph Clustering
 Intuition:
◦ Highly connected nodes could be in one cluster
◦ Weakly connected nodes could be in different clusters.
 Model:
◦ A random walk may start at any node
◦ Starting at node r, if a random walk reaches node
t with high probability, then r and t should be
clustered together.
Markov Clustering (MCL)
 Markov process
◦ The probability that a random walk takes an edge at
node u depends only on u and the given edge.
◦ It does not depend on its previous route.
◦ This assumption simplifies the computation.
MCL
 A flow network is used to approximate the
partition.
 An initial amount of flow is injected into
each node.
 At each step, a percentage of the flow at a node
goes to its neighbors via the outgoing
edges.
MCL
 Edge Weight
◦ Similarity between two nodes
◦ Considered as the bandwidth or connectivity.
◦ If an edge has a higher weight than another, then
more flow passes over that edge.
◦ The amount of flow is proportional to the edge
weight.
◦ If there is no edge weight, then we can assign the
same weight to all edges.
Intuition of MCL
 Two natural clusters, A and B (figure)

 When the flow reaches the border nodes, it is more
likely to return into its cluster than to cross the border.
MCL
 When the flow reaches A (a border node), there are
four possible outgoing edges.
◦ Three lead back into the cluster; one leaks out.
◦ ¾ of the flow returns, only ¼ leaks.
 Flow will accumulate in the center of a cluster
(island).
 The border nodes will starve.
 Simulation of Random Flow in a graph

 Two Operations: Expansion and Inflation

 Intrinsic relationship between the MCL process
result and the cluster structure
 Popular description: partition the graph so that
 intra-partition similarity is the highest
 inter-partition similarity is the lowest


 Observation 1:
 The number of higher-length paths in G is
large for pairs of vertices lying in the same
dense cluster
 and small for pairs of vertices belonging to
different clusters
 Observation 2:
 A random walk in G that visits a dense
cluster will likely not leave the cluster until
many of its vertices have been visited
Definitions
 n×n Adjacency matrix A.
◦ A(i,j) = weight on the edge from i to j
◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric

 n×n Transition matrix P.
◦ P is row stochastic
◦ P(i,j) = probability of stepping to node j from node i
= A(i,j) / ∑k A(i,k)

 n×n Laplacian matrix L.
◦ L(i,j) = D(i,j) - A(i,j), where D is the diagonal degree matrix with D(i,i) = ∑k A(i,k)
◦ Symmetric positive semi-definite for undirected graphs
◦ Singular
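
A minimal numeric sketch of these three matrices, assuming a small 3-node path graph and numpy (neither is part of the original slides):

import numpy as np

# Hypothetical 3-node path graph 1 - 2 - 3 (0-indexed), unit edge weights.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

degrees = A.sum(axis=1)        # row sums = node degrees
P = A / degrees[:, None]       # row-stochastic transition matrix: P(i,j) = A(i,j) / sum_k A(i,k)
L = np.diag(degrees) - A       # Laplacian L = D - A: symmetric, positive semi-definite, singular

print(P)
print(L)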
Definitions

(Figure: an example graph with its adjacency matrix A and its row-stochastic
transition matrix P; unit edge weights become transition probabilities of 1 or 1/2.)
What is a random walk

(Figure: the example graph with its transition probabilities of 1 and 1/2;
successive panels show the walk's probability distribution at t = 0, 1, 2, 3.)
Probability Distributions
 xt(i) = probability that the surfer is at node i at time t
 xt+1(i) = ∑j (probability of being at node j at time t) * Pr(j -> i)
= ∑j xt(j) * P(j,i)
 xt = xt-1 P = xt-2 P*P = … = x0 Pt
 What happens when the surfer keeps walking for a long
time?
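
A rough illustration of this iteration, reusing the hypothetical 3-node path graph from the earlier sketch:

import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
P = A / A.sum(axis=1)[:, None]     # row-stochastic transition matrix

x = np.array([1., 0., 0.])         # the surfer starts at node 0
for t in range(50):
    x = x @ P                      # x_{t+1} = x_t P
print(x)
# On this bipartite path graph the distribution keeps oscillating;
# adding self-loops (A + I), as MCL does later, makes it settle down.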
Flow Formulation

• Flow: transition probability from one node to another node.

• Flow matrix: matrix of the flows among all nodes; the ith
column represents the flows out of the ith node. Each column
sums to 1.

Example: for the path graph 1 - 2 - 3, the flow matrix is

          from 1   from 2   from 3
  to 1      0        0.5      0
  to 2     1.0       0        1.0
  to 3      0        0.5      0
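
The example flow matrix above can be reproduced with a short column-normalization sketch (numpy assumed; not part of the original slides):

import numpy as np

# Path graph 1 - 2 - 3 from the example above (0-indexed here).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

flow = A / A.sum(axis=0)           # normalize each column so it sums to 1
print(flow)
# Column j holds the flow out of node j:
# [[0.  0.5 0. ]
#  [1.  0.  1. ]
#  [0.  0.5 0. ]]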
 Measure or sample any of these quantities (higher-length
paths, random walks) and deduce the cluster structure
from the behavior of the sampled quantities.
 Cluster structure will show itself as a peaked
distribution of the quantities.
 A lack of cluster structure will result in a flat
distribution.
 Markov Chain

 Random Walk on Graph

 Some Definitions in MCL


 A random process with the Markov property
 Markov property: given the present state,
future states are independent of the past
states

 At each step the process may change its state


from the current state to another state, or
remain in the same state, according to a
certain probability distribution.
 A walker starts at some arbitrary vertex
 He successively visits new vertices by
arbitrarily selecting one of the outgoing edges
 There is not much difference between a
random walk on a graph and a finite Markov chain.
 Simple Graph

 A simple graph is an undirected graph in which
every nonzero edge weight equals 1.
 Associated Matrix

 The associated matrix of G, denoted MG, is
defined by setting the entry (MG)pq equal to
w(vp, vq)
 Markov Matrix

 The Markov matrix associated with a graph G
is denoted by TG and is formally defined by
letting its qth column be the qth column of MG
normalized to sum to 1
 The associated matrix and the Markov matrix are
actually computed for the matrix M + I
 I denotes the identity matrix (a diagonal matrix
whose nonzero elements equal 1)
 This adds a self-loop to every vertex of the graph,
because it is possible for a walker to stay in the
same place in his next step
 Find higher-length paths
 Starting point: in the associated matrix, the
quantity (Mk)pq has a straightforward
interpretation as the number of paths of
length k between vp and vq
(Figure: an example associated matrix MG and its square with self-loops, (MG+I)2.)
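
A small sketch of this path-counting interpretation, on a hypothetical 4-vertex simple graph (not the one from the slide figure):

import numpy as np

# Hypothetical simple graph on 4 vertices (unit weights, undirected).
M = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

k = 2
print(np.linalg.matrix_power(M, k))              # (M^k)[p, q] = number of length-k paths from p to q
print(np.linalg.matrix_power(M + np.eye(4, dtype=int), k))   # same count with self-loops added,
                                                             # so the walker may also stay put at a step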
 Flow is easier within dense regions than across
sparse boundaries.
 However, in the long run, this effect
disappears.
 Powers of the matrix can be used to find higher-
length paths, but the effect diminishes as the
flow goes on.
 Idea: how can we change the distribution of
transition probabilities so that preferred
neighbours are further favoured and less
popular neighbours are demoted?
 MCL solution: raise all the entries in a given
column to a certain power greater than 1 (e.g.
squaring) and rescale the column so that it
sums to 1 again.
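
A minimal sketch of this step (the function name inflate and the example column are illustrative, not from the slides):

import numpy as np

def inflate(M, r=2.0):
    # Raise every entry of the column-stochastic matrix M to the power r,
    # then rescale each column so it sums to 1 again.
    M = np.power(M, r)
    return M / M.sum(axis=0)

col = np.array([[0.6], [0.3], [0.1]])
print(inflate(col, r=2.0).ravel())   # ~[0.78, 0.20, 0.02]: the preferred neighbour
                                     # is boosted, the weaker ones are demoted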
 Expansion operation: taking the power of the matrix,
which expands flow within dense regions
 Inflation operation: as described above, which
demotes the less favoured regions
The MCL algorithm
Input: A, the adjacency matrix

1. Initialize M to MG, the canonical transition matrix:
   M := MG := (A + I) D^-1
2. Expand: M := M*M
   (enhances flow to well-connected nodes as well as to new nodes)
3. Inflate: M := M.^r (r usually 2), then renormalize the columns
   (increases inequality in each column: "rich get richer, poor get poorer")
4. Prune: remove entries close to zero
   (saves memory)
5. If M has not converged, go back to step 2;
   otherwise output the clusters.
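
A compact sketch of the whole loop, assuming numpy and a dense matrix; the parameter names (r, max_iter, tol, prune) are illustrative and not from the slides:

import numpy as np

def mcl(A, r=2.0, max_iter=100, tol=1e-6, prune=1e-5):
    # Canonical transition matrix M := (A + I) D^-1 (column-stochastic, self-loops added).
    M = A + np.eye(A.shape[0])
    M = M / M.sum(axis=0)
    for _ in range(max_iter):
        last = M.copy()
        M = M @ M                          # expansion: flow to well-connected nodes
        M = np.power(M, r)                 # inflation: "rich get richer, poor get poorer"
        M = M / M.sum(axis=0)              # renormalize columns
        M[M < prune] = 0.0                 # prune near-zero entries to save memory
        M = M / M.sum(axis=0)
        if np.abs(M - last).max() < tol:   # converged (matrix is nearly idempotent)
            break
    return M

On the converged matrix, nodes with a nonzero diagonal entry are the attractors; that interpretation is described on the attractor slides further below.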
Multi-level Regularized MCL

1. Coarsen the input graph repeatedly (input graph -> intermediate
   graphs -> coarsest graph). The coarsest graph captures the global
   topology of the graph, and it is faster to run on smaller graphs first.
2. Run Curtailed R-MCL on the coarsest graph, then project the flow
   onto the next, more refined graph; the projected flow initializes
   the flow matrix of the refined graph.
3. Repeat the Curtailed R-MCL / project-flow step up through the
   intermediate graphs.
4. On the original input graph, run R-MCL to convergence and output
   the clusters.
 http://www.micans.org/mcl/ani/mcl-animation.html
 Find attractor: the node a is an attractor if
Maa is nonzero
 Find attractor system: if a is an attractor, then
the set of its neighbours is called an attractor
system.
 If a node has an arc connecting it to any node of
an attractor system, then that node belongs to the
same cluster as that attractor system.
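
A sketch of this interpretation on a converged flow matrix M (the helper read_clusters and the tolerance eps are illustrative, not from the slides):

import numpy as np

def read_clusters(M, eps=1e-9):
    # M is a converged (idempotent) column-stochastic MCL flow matrix.
    n = M.shape[0]
    attractors = [a for a in range(n) if M[a, a] > eps]   # nonzero diagonal entry
    clusters = []
    for a in attractors:
        members = set(np.flatnonzero(M[a] > eps))         # nodes whose flow ends up at attractor a
        if members not in clusters:                       # attractors of one attractor system
            clusters.append(members)                      # share a cluster; clusters may overlap
    return attractors, clusters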
Attractor Set={1,2,3,4,5,6,7,8,9,10}
The Attractor System is {1,2,3},{4,5,6,7},{8,9},{10}
The overlapping clusters are {1,2,3,11,12,15},{4,5,6,7,13},
{8,9,12,13,14,15},{10,12,13}
 How many steps are required before the
algorithm converges to an idempotent matrix?

 The number is typically somewhere between


10 and 100

 The effect of inflation on cluster granularity

(Table: cluster granularity for different inflation constants r and
loop weights a.)
 MCL simulates a random walk on a graph to find
clusters
 Expansion promotes flow within dense regions, while
inflation demotes flow in the less favoured regions
 There is an intrinsic relationship between the MCL
result and the cluster structure
