Markov Clustering Algorithm

The document describes the Markov Clustering (MCL) algorithm for graph clustering. MCL uses random walks within a graph and two main operations - expansion and inflation. Expansion simulates multiple random walks to enhance flow between well-connected nodes. Inflation increases inequality in flow distribution, favoring nodes that receive more flow. Together, these operations cause flow to accumulate and form high-density clusters. MCL can find overlapping clusters and works by analyzing flow distributions within a graph to reveal its inherent cluster structure.


 Introduction

 Important Concepts in MCL Algorithm

 MCL Algorithm

 The Features of MCL Algorithm

 Summary
Graph Clustering
 Intuition:
◦ Highly connected nodes could be in one cluster
◦ Weakly connected nodes could be in different clusters.
 Model:
◦ A random walk may start at any node
◦ Starting at node r, if a random walk reaches node
t with high probability, then r and t should be
clustered together.
Markov Clustering (MCL)
 Markov process
◦ The probability that a random walk takes an edge at
node u depends only on u and the given edge.
◦ It does not depend on its previous route.
◦ This assumption simplifies the computation.
MCL
 A flow network is used to approximate the
partition.
 An initial amount of flow is injected into
each node.
 At each step, a percentage of the flow at a node
goes to its neighbors via the outgoing
edges.
MCL
 Edge Weight
◦ Similarity between two nodes
◦ Considered as the bandwidth or connectivity.
◦ If an edge has a higher weight than another, then
more flow passes over that edge.
◦ The amount of flow is proportional to the edge
weight.
◦ If there is no edge weight, then we can assign the
same weight to all edges.
Intuition of MCL
 Two natural clusters, A and B (figure)

 When the flow reaches the border nodes, it is more
likely to return into its cluster than to cross the border.
MCL
 When the flow reaches A (a border node), there are
four possible outgoing edges.
◦ Three lead back into the cluster; one leaks out.
◦ ¾ of the flow returns, only ¼ leaks.
 Flow will accumulate in the center of a cluster
(island).
 The border nodes will starve.
 Simulation of Random Flow in a graph

 Two Operations: Expansion and Inflation

 Intrinsic relationship between the MCL process
result and the cluster structure
 Popular description: partition the graph so that
 intra-partition similarity is the highest
 inter-partition similarity is the lowest


 Observation 1:
 The number of higher-length paths in G is
large for pairs of vertices lying in the same
dense cluster
 and small for pairs of vertices belonging to
different clusters
 Observation 2:
 A random walk in G that visits a dense
cluster will likely not leave the cluster until
many of its vertices have been visited
Definitions
 n×n Adjacency matrix A.
◦ A(i,j) = weight on the edge from i to j
◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric

 n×n Transition matrix P.
◦ P is row stochastic
◦ P(i,j) = probability of stepping to node j from node i
= A(i,j) / ∑k A(i,k)

 n×n Laplacian matrix L.
◦ L(i,j) = D(i,j) - A(i,j), where D is the diagonal degree matrix with D(i,i) = ∑k A(i,k)
◦ Symmetric positive semi-definite for undirected graphs
◦ Singular
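
A minimal numeric sketch of these three matrices, assuming a small 3-node path graph and numpy (neither is part of the original slides):

import numpy as np

# Hypothetical 3-node path graph 1 - 2 - 3 (0-indexed), unit edge weights.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

degrees = A.sum(axis=1)        # row sums = node degrees
P = A / degrees[:, None]       # row-stochastic transition matrix: P(i,j) = A(i,j) / sum_k A(i,k)
L = np.diag(degrees) - A       # Laplacian L = D - A: symmetric, positive semi-definite, singular

print(P)
print(L)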
Definitions

(Figure: an example graph with its adjacency matrix A and its row-stochastic
transition matrix P; unit edge weights become transition probabilities of 1 or 1/2.)
What is a random walk

(Figure: the example graph with its transition probabilities of 1 and 1/2;
successive panels show the walk's probability distribution at t = 0, 1, 2, 3.)
Probability Distributions
 xt(i) = probability that the surfer is at node i at time t
 xt+1(i) = ∑j (probability of being at node j at time t) * Pr(j -> i)
= ∑j xt(j) * P(j,i)
 xt = xt-1 P = xt-2 P*P = … = x0 Pt
 What happens when the surfer keeps walking for a long
time?
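
A rough illustration of this iteration, reusing the hypothetical 3-node path graph from the earlier sketch:

import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
P = A / A.sum(axis=1)[:, None]     # row-stochastic transition matrix

x = np.array([1., 0., 0.])         # the surfer starts at node 0
for t in range(50):
    x = x @ P                      # x_{t+1} = x_t P
print(x)
# On this bipartite path graph the distribution keeps oscillating;
# adding self-loops (A + I), as MCL does later, makes it settle down.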
Flow Formulation

• Flow: transition probability from one node to another node.

• Flow matrix: matrix of the flows among all nodes; the ith
column represents the flows out of the ith node. Each column
sums to 1.

Example: for the path graph 1 - 2 - 3, the flow matrix is

          from 1   from 2   from 3
  to 1      0        0.5      0
  to 2     1.0       0        1.0
  to 3      0        0.5      0
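
The example flow matrix above can be reproduced with a short column-normalization sketch (numpy assumed; not part of the original slides):

import numpy as np

# Path graph 1 - 2 - 3 from the example above (0-indexed here).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

flow = A / A.sum(axis=0)           # normalize each column so it sums to 1
print(flow)
# Column j holds the flow out of node j:
# [[0.  0.5 0. ]
#  [1.  0.  1. ]
#  [0.  0.5 0. ]]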
 Measure or sample any of these quantities (higher-length
paths, random walks) and deduce the cluster structure
from the behavior of the sampled quantities.
 Cluster structure will show itself as a peaked
distribution of the quantities.
 A lack of cluster structure will result in a flat
distribution.
 Markov Chain

 Random Walk on Graph

 Some Definitions in MCL


 A random process with the Markov property
 Markov property: given the present state,
future states are independent of the past
states

 At each step the process may change its state


from the current state to another state, or
remain in the same state, according to a
certain probability distribution.
 A walker starts at some arbitrary vertex
 He successively visits new vertices by
arbitrarily selecting one of the outgoing edges
 There is not much difference between a
random walk on a graph and a finite Markov chain.
 Simple Graph

 A simple graph is an undirected graph in which
every nonzero edge weight equals 1.
 Associated Matrix

 The associated matrix of G, denoted MG, is
defined by setting the entry (MG)pq equal to
w(vp, vq)
 Markov Matrix

 The Markov matrix associated with a graph G
is denoted by TG and is formally defined by
letting its qth column be the qth column of MG
normalized to sum to 1
 The associated matrix and the Markov matrix are
actually computed for the matrix M + I
 I denotes the identity matrix (a diagonal matrix
whose nonzero elements equal 1)
 This adds a self-loop to every vertex of the graph,
because it is possible for a walker to stay in the
same place in his next step
 Find higher-length paths
 Starting point: in the associated matrix, the
quantity (Mk)pq has a straightforward
interpretation as the number of paths of
length k between vp and vq
(Figure: an example associated matrix MG and its square with self-loops, (MG+I)2.)
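
A small sketch of this path-counting interpretation, on a hypothetical 4-vertex simple graph (not the one from the slide figure):

import numpy as np

# Hypothetical simple graph on 4 vertices (unit weights, undirected).
M = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

k = 2
print(np.linalg.matrix_power(M, k))              # (M^k)[p, q] = number of length-k paths from p to q
print(np.linalg.matrix_power(M + np.eye(4, dtype=int), k))   # same count with self-loops added,
                                                             # so the walker may also stay put at a step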
 Flow is easier within dense regions than across
sparse boundaries.
 However, in the long run, this effect
disappears.
 Powers of the matrix can be used to find higher-
length paths, but the effect diminishes as the
flow goes on.
 Idea: how can we change the distribution of
transition probabilities so that preferred
neighbours are further favoured and less
popular neighbours are demoted?
 MCL solution: raise all the entries in a given
column to a certain power greater than 1 (e.g.
squaring) and rescale the column so that it
sums to 1 again.
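
A minimal sketch of this step (the function name inflate and the example column are illustrative, not from the slides):

import numpy as np

def inflate(M, r=2.0):
    # Raise every entry of the column-stochastic matrix M to the power r,
    # then rescale each column so it sums to 1 again.
    M = np.power(M, r)
    return M / M.sum(axis=0)

col = np.array([[0.6], [0.3], [0.1]])
print(inflate(col, r=2.0).ravel())   # ~[0.78, 0.20, 0.02]: the preferred neighbour
                                     # is boosted, the weaker ones are demoted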
 Expansion operation: taking the power of the matrix,
which expands flow within dense regions
 Inflation operation: as described above, which
demotes the less favoured regions
The MCL algorithm
Input: A, the adjacency matrix

1. Initialize M to MG, the canonical transition matrix:
   M := MG := (A + I) D^-1
2. Expand: M := M*M
   (enhances flow to well-connected nodes as well as to new nodes)
3. Inflate: M := M.^r (r usually 2), then renormalize the columns
   (increases inequality in each column: "rich get richer, poor get poorer")
4. Prune: remove entries close to zero
   (saves memory)
5. If M has not converged, go back to step 2;
   otherwise output the clusters.
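
A compact sketch of the whole loop, assuming numpy and a dense matrix; the parameter names (r, max_iter, tol, prune) are illustrative and not from the slides:

import numpy as np

def mcl(A, r=2.0, max_iter=100, tol=1e-6, prune=1e-5):
    # Canonical transition matrix M := (A + I) D^-1 (column-stochastic, self-loops added).
    M = A + np.eye(A.shape[0])
    M = M / M.sum(axis=0)
    for _ in range(max_iter):
        last = M.copy()
        M = M @ M                          # expansion: flow to well-connected nodes
        M = np.power(M, r)                 # inflation: "rich get richer, poor get poorer"
        M = M / M.sum(axis=0)              # renormalize columns
        M[M < prune] = 0.0                 # prune near-zero entries to save memory
        M = M / M.sum(axis=0)
        if np.abs(M - last).max() < tol:   # converged (matrix is nearly idempotent)
            break
    return M

On the converged matrix, nodes with a nonzero diagonal entry are the attractors; that interpretation is described on the attractor slides further below.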
Multi-level Regularized MCL

1. Coarsen the input graph repeatedly (input graph -> intermediate
   graphs -> coarsest graph). The coarsest graph captures the global
   topology of the graph, and it is faster to run on smaller graphs first.
2. Run Curtailed R-MCL on the coarsest graph, then project the flow
   onto the next, more refined graph; the projected flow initializes
   the flow matrix of the refined graph.
3. Repeat the Curtailed R-MCL / project-flow step up through the
   intermediate graphs.
4. On the original input graph, run R-MCL to convergence and output
   the clusters.
 http://www.micans.org/mcl/ani/mcl-animation.html
 Find attractor: the node a is an attractor if
Maa is nonzero
 Find attractor system: if a is an attractor, then
the set of its neighbours is called an attractor
system.
 If a node has an arc connecting it to any node of
an attractor system, then that node belongs to the
same cluster as that attractor system.
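
A sketch of this interpretation on a converged flow matrix M (the helper read_clusters and the tolerance eps are illustrative, not from the slides):

import numpy as np

def read_clusters(M, eps=1e-9):
    # M is a converged (idempotent) column-stochastic MCL flow matrix.
    n = M.shape[0]
    attractors = [a for a in range(n) if M[a, a] > eps]   # nonzero diagonal entry
    clusters = []
    for a in attractors:
        members = set(np.flatnonzero(M[a] > eps))         # nodes whose flow ends up at attractor a
        if members not in clusters:                       # attractors of one attractor system
            clusters.append(members)                      # share a cluster; clusters may overlap
    return attractors, clusters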
Attractor Set={1,2,3,4,5,6,7,8,9,10}
The Attractor System is {1,2,3},{4,5,6,7},{8,9},{10}
The overlapping clusters are {1,2,3,11,12,15},{4,5,6,7,13},
{8,9,12,13,14,15},{10,12,13}
 How many steps are required before the
algorithm converges to an idempotent matrix?

 The number is typically somewhere between


10 and 100

 The effect of inflation on cluster granularity

(Table: cluster granularity for different inflation constants r and
loop weights a.)
 MCL simulates a random walk on a graph to find
clusters
 Expansion promotes flow within dense regions, while
inflation demotes flow in the less favoured regions
 There is an intrinsic relationship between the MCL
result and the cluster structure
