An Efficient Community Detection Method Based on Rank Centrality
Physica A
1. Introduction
Many network systems are made of individuals or organizations that are related to each other by various interdependencies like friendship, kinship, etc. These members can be represented as vertices in a graph, and the relationships can be represented as edges between the vertices. There are many different networks in the real world, such as biological networks, ecological (species and their trophic interactions) networks, social (people and their interactions) networks, etc. With the development of complex network research, scientists have found many particular properties in complex networks, e.g., the small world property [1], scale-free degree distributions [2] and modularity [3,4]. Modularity means that there are community structures in networks: communities are subgroups of vertices such that the edges between vertices in the same group are much denser than those between different groups. Identifying communities in complex networks is very important for many real applications. For instance, a community in a social network implies common beliefs among people, and a community in a WWW network indicates a common topic among pages. So far, many algorithms have been developed to detect communities in complex networks. These algorithms can be broadly divided into two classes: vertex clustering methods based on traditional clustering, and newly developed methods based on network topology.
Vertex clustering methods based on traditional clustering include K-means [5], AP (affinity propagation) [6], minimum cut clustering [7] and NMF (nonnegative matrix factorization) clustering [8], etc. Clustering methods are efficient algorithms for the problem of community detection. However, some clustering algorithms have their own defects: for example, K-means is sensitive to its initial seeds, and AP's performance depends heavily on its parameters. Newly developed methods based on network topology include GN [3], modularity-maximizing methods [9–12], OSLOM [13], infomap [14], LPA [15], etc. GN [3] is a classical method which repeatedly deletes the edge with maximal betweenness; but due to the high complexity of computing betweenness, GN is not efficient in large networks with thousands of vertices. Another class of classical methods
partitions networks by maximizing the modularity function of a network [9–12], but all of these suffer from the resolution limit [16]. OSLOM [13] is based on a statistical model and finds statistically significant communities in networks. Infomap [14] is an information-theoretic approach based on random walks, which uses the probability flow of random walks on a network as a proxy for information flows. Raghavan et al. [15] employed a simple label propagation algorithm (LPA) to find communities in large real-world networks. Although some of these algorithms are very fast, they sometimes have low accuracy when the community structure of the network is not clear.
Among these clustering algorithms, K-means [5] is widely used and efficient because of its fast convergence, so in this paper we focus on community detection using the basic idea of K-means. As mentioned above, K-means is very sensitive to its initial seeds; in particular, if the community number is large, it may produce an empty cluster or bad clustering results. Although K-means++ [17] was introduced to solve this problem, it has high time complexity when choosing the initial seeds. Therefore, in this paper we propose a new efficient clustering algorithm called K-rank, based on rank centrality, which is much faster than K-means++ and at the same time solves K-means's seeding problem. First, K-rank finds K seeds via a new initial-seed choice strategy. Second, K-rank classifies the vertices using vertex similarities and then updates the seeds, iterating until convergence. Like the K-means algorithm, K-rank is simple and converges quickly when finding communities. Furthermore, K-rank can be easily extended to directed, weighted and overlapping networks.
The rest of this paper is organized as follows. Section 2 presents the K-rank algorithm. In Section 3 we discuss how K-rank can be extended to directed, weighted and overlapping networks. Experimental results on synthetic and real-world networks are shown in Section 4. Section 5 contains the conclusions and summary.
2. K-rank algorithm
In cluster analysis, how to define vertex similarities is significant. In order to treat community detection as a clustering problem, the most crucial step is how to measure the similarities between vertices. At present, there are many ways to define vertex similarities; Ref. [18] gives an excellent survey, classifying the similarity indices into three types: local, global and quasi-local. In this paper, we choose the signal similarity [19] based on global network topology, which turns the network topology into a geometrical structure of vectors in n-dimensional Euclidean space.
In this section, we present the components of the K-rank algorithm: the signal similarity, how to choose the initial seeds, the parameter-choice problem, the overall K-rank procedure, and how K is chosen.
Signal [19] is a vertex similarity definition based on signaling propagation, which turns the network topology into a geometrical structure of vectors in n-dimensional Euclidean space. For a network with n vertices, every vertex is assumed to be a source which can send, receive, and record signals. In this signaling process, all the vertices record the amount of signals they have received, and at every step each vertex sends all the signals it currently holds to its neighbors and to itself. After c steps, the distribution of signal amounts over the vertices can be viewed as the influence of the source vertex on the whole network. Naturally, compared with vertices in other communities, the vertices of the same community have a similar influence on the whole network. Therefore, after normalizing these n vectors, the distance between each pair of vectors represents the similarity of the corresponding vertices. In fact, the propagation process can be described by a simple and clear mathematical formula. Suppose $A$ is the adjacency matrix of the network and $I_n$ is the $n$-dimensional identity matrix; then the matrix

$$U = (I_n + A)^c \qquad (1)$$

represents the effect of each source vertex on the whole network after c steps.
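For illustration, the propagation of Eq. (1) amounts to a few lines of matrix code. The sketch below is our own, not taken from the paper: it assumes a NumPy environment, builds $U = (I_n + A)^c$, normalizes each row so that only the distribution of influence matters, and measures vertex similarity as the negative Euclidean distance between normalized vectors, the convention the authors adopt later when discussing the threshold µ.

```python
import numpy as np

def signal_vectors(A, c):
    """Simulate c steps of signaling (Eq. (1)): U = (I_n + A)^c.
    Row i of U is the influence vector of source vertex i; each row is
    normalized so that only the shape of the influence vector matters."""
    n = A.shape[0]
    U = np.linalg.matrix_power(np.eye(n) + A, c)
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def similarity(U, i, j):
    """Negative Euclidean distance between influence vectors, so larger
    values (closer to 0) mean more similar vertices."""
    return -np.linalg.norm(U[i] - U[j])
```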
Similar to K-means and K-means++, the choice of initial seeds influences the results of the K-rank algorithm to a great extent. Many techniques have been proposed to solve this problem [20]; among them, K-means++ [17] is the most popular. The difference between K-means and K-means++ is that K-means++ makes the chosen seeds as far away from each other as possible. However, K-means++ needs to compare almost all the vertices in the network before choosing each new initial seed, so if the network is large the time complexity of this process is very high. The proposed K-rank algorithm adopts an effective method based on rank centrality and vertex similarities which also purposely places the chosen seeds as far away from each other as possible, but does not need to compare all the vertices of the network each time.
• Rank centrality. For vertex ranking, we adopt the PageRank authority centrality, originally developed by Brin and Page [21] to rank the authority of web pages using the hyperlink structure of the web. Assume that a random walker follows the structure of the network via the transition matrix $P$ and sometimes jumps to a random vertex according to the probability distribution $v$; then the PageRank vector $r$ satisfies the following equation [21]:

$$r^T = \alpha r^T P + (1 - \alpha) v^T \qquad (2)$$

in which $r$ is the steady-state distribution of the random walk governed by the transition matrix $\alpha P + (1 - \alpha)\mathbf{1}v^T$, where $\mathbf{1}$ is a column of ones. If $v$ is uniform over $V$, the steady-state vector $r$ is referred to as the global PageRank vector (GPR). (A power-iteration sketch is given after this list.)
• Choice of K seeds. After ranking the vertices of the network, we know the order of the vertices by their PageRank (PR) value. The larger a vertex's PR value, the more likely that vertex is to be a good seed. From this view, simply choosing the K vertices with the largest PR values might seem an appropriate way to find the seeds. But as described in Ref. [21], if a page (vertex) has a very large PR value, then the PR values of the other pages (vertices) which link to it are also large. As a result, if we choose the top K vertices by PR value, it is very common to get K seeds that are very near to each other, which violates the rule that the chosen seeds should be as far away from each other as possible. Therefore, this seed-choice method is only slightly better than random choice and is not the best one. How can we find seeds which have large PR values and are at the same time as far away from each other as possible? This is where vertex similarities come in. First, we choose the vertex with the largest PR value as the first seed. Then we consider the vertex with the second largest PR value; if the similarity between it and the first seed is smaller than a threshold µ, we choose it as the second seed. In the same way, at the tth step we choose a vertex as a new seed if its similarities to all the chosen seeds are smaller than µ, continuing until we have K seeds (see the code sketch after this list). In this way, the value of µ determines how separated the seeds are: the smaller µ is, the farther apart the seeds are. Once µ is fixed, we can obtain K seeds by rank centrality and vertex similarities.
• The threshold µ. Parameter selection is a hard problem in machine learning. In seed choice, the threshold µ is an important factor, and we believe that how to estimate it is still an open problem. A small µ means that it is hard to find K seeds among a minority of the vertices: we have to scan a majority of the vertices in order to make sure that the similarities between the seeds are all smaller than the threshold. In the extreme case, all the vertices are considered once, and the choice of seeds is poor because the PR values of most seeds are very low. On the contrary, a large µ means that it is easy to find K seeds among a minority of the vertices; in the extreme case, µ is large enough that the top K vertices by PR value are all chosen as seeds. This choice is also poor because these seeds are not far from each other. So the threshold µ should be neither too large nor too small. Though it is hard to define its exact value, we can estimate it by heuristic and empirical means. If K is small, a smaller µ is needed, whereas a large K means that a larger µ is better. Generally speaking, when choosing seeds we should consider 10%–80% of the vertices to guarantee that the chosen seeds are far from each other and simultaneously have large PR values. Finally, if we use the negative Euclidean distance to measure the similarity between the vertex vectors [19], our experimental results indicate that [−1, −0.6] is a good empirical range for the threshold µ.
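To make the seeding procedure concrete, the sketch below combines the two ingredients just described: a power-iteration solver for Eq. (2) and the greedy, similarity-thresholded scan of vertices in decreasing PR order. This is our own illustrative code, not the authors' implementation; `signal_vectors` and `similarity` refer to the earlier signal-similarity sketch, and parameter defaults such as `alpha=0.85` are conventional assumptions rather than values from the paper.

```python
import numpy as np

def pagerank(A, alpha=0.85, tol=1e-10, max_iter=1000):
    """Solve Eq. (2) by power iteration: r^T <- alpha r^T P + (1 - alpha) v^T,
    with a row-stochastic P built from A and a uniform jump vector v (GPR)."""
    n = A.shape[0]
    deg = A.sum(axis=1).astype(float)
    deg[deg == 0] = 1.0                  # avoid dividing by zero on isolated vertices
    P = A / deg[:, None]
    v = np.full(n, 1.0 / n)
    r = v.copy()
    for _ in range(max_iter):
        r_next = alpha * (r @ P) + (1 - alpha) * v
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

def choose_seeds(r, U, K, mu, similarity):
    """Scan vertices in decreasing PR order; accept a vertex as a new seed
    only if its similarity to every already-chosen seed is below mu.
    May return fewer than K seeds if mu is too strict for the network."""
    order = np.argsort(-r)               # vertices sorted by decreasing PR value
    seeds = [int(order[0])]              # the largest-PR vertex is the first seed
    for v in order[1:]:
        if len(seeds) == K:
            break
        if all(similarity(U, int(v), s) < mu for s in seeds):
            seeds.append(int(v))
    return seeds
```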
First, we focus on a simple undirected, unweighted network G without multiple links or self-loops. The whole K-rank algorithm based on rank centrality for detecting communities is summarized as Algorithm 1. Due to the fast convergence of the K-means and PageRank algorithms, the K-rank algorithm also converges quickly. In fact, K-rank can be regarded as an extended and improved K-means or K-means++. One difference between K-rank and K-means(++) is the manner of updating seeds: K-rank adopts the effective and fast PageRank algorithm to update the seeds. Moreover, K-means and K-means++ need a number of iterations to converge, while K-rank needs few iterations, indeed only one iteration if an appropriate µ is chosen. If the number of vertices and K are very large, K-rank may still produce empty clusters. In our implementation, we deal with this by abandoning the seeds of empty clusters and choosing replacement seeds from the other vertices randomly until no empty cluster remains.
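Since Algorithm 1 itself is not reproduced in this excerpt, the following schematic sketch records our own reading of the loop: choose seeds, assign every vertex to its most similar seed, then replace each community's seed by its highest-PageRank member and repeat until the seeds stabilize. The helpers come from the earlier sketches, and the defaults `c=3` and `mu=-0.8` are arbitrary illustrative choices within the ranges discussed above.

```python
import numpy as np

def k_rank(A, K, c=3, mu=-0.8, alpha=0.85, max_iter=20):
    """Schematic K-rank loop: seed selection, assignment, seed update."""
    U = signal_vectors(A, c)                     # Eq. (1), earlier sketch
    seeds = choose_seeds(pagerank(A, alpha), U, K, mu, similarity)
    for _ in range(max_iter):
        # Assignment step: each vertex joins the community of the seed it is
        # most similar to (a seed is trivially most similar to itself).
        labels = np.array([max(range(K), key=lambda q: similarity(U, v, seeds[q]))
                           for v in range(A.shape[0])])
        # Update step: the new seed of a community is its highest-PageRank member.
        new_seeds = []
        for q in range(K):
            members = np.where(labels == q)[0]
            if members.size == 0:
                # The paper re-seeds empty clusters randomly; this sketch
                # simply keeps the old seed instead.
                new_seeds.append(seeds[q])
                continue
            sub_r = pagerank(A[np.ix_(members, members)], alpha)
            new_seeds.append(int(members[np.argmax(sub_r)]))
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return labels, seeds
```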
In the first step of the K-rank algorithm, we must specify the number of communities (K) as additional information. In this paper, we use the F statistic [19,22] to estimate a proper K. Suppose $U = \{u_1, u_2, \ldots, u_n\}$ is the set of vectors of all vertices and $u_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, where $x_{jk}$ is the $k$th character quantity of $u_j$. Suppose $K$ is the number of communities and $n_i$ is the number of vertices of the $i$th community, whose vertex vectors are $u_{i1}, u_{i2}, \ldots, u_{in_i}$. Let $\bar{x}_{ik} = \frac{1}{n_i}\sum_{j=1}^{n_i} u_{ij}(k)$, $k = 1, 2, \ldots, n$, be the mean characters of the $i$th community, $\bar{u}_i = (\bar{x}_{i1}, \bar{x}_{i2}, \ldots, \bar{x}_{in})$ be the $i$th community's center, and $\bar{u} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ be the center of all the vertices, where $\bar{x}_k = \frac{1}{n}\sum_{j=1}^{n} x_{jk}$, $k = 1, 2, \ldots, n$. Then the F statistic is defined as

$$F = \frac{\sum_{i=1}^{K} n_i \,\|\bar{u}_i - \bar{u}\|^2 \,/\, (K - 1)}{\sum_{i=1}^{K}\sum_{j=1}^{n_i} \|u_{ij} - \bar{u}_i\|^2 \,/\, (n - K)} \qquad (3)$$

where $\|\bar{u}_i - \bar{u}\| = \sqrt{\sum_{k=1}^{n} (\bar{x}_{ik} - \bar{x}_k)^2}$ is the distance between $\bar{u}_i$ and $\bar{u}$, and $\|u_{ij} - \bar{u}_i\|$ is the distance between vertex $u_{ij}$ of the $i$th community and its center $\bar{u}_i$. The numerator of F measures the distance between communities and the denominator the distance within communities, so F is larger when the inter-community distances are larger and the intra-community distances are smaller. When F achieves its maximum, we get the best K. As shown in Ref. [19], the clearer the community structure, the more distinct the maximal value of the F statistic.
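As an illustration of this selection rule, the F statistic of Eq. (3) can be evaluated over a range of candidate K values and the maximizer taken as the community number. The sketch below is our own; it assumes the label vector produced by a clustering run (e.g. the `k_rank` sketch above) and the normalized vertex vectors `U` from the earlier sketches.

```python
import numpy as np

def f_statistic(U, labels, K):
    """F statistic of Eq. (3): between-community variance (numerator)
    over within-community variance (denominator) of the vertex vectors."""
    n = U.shape[0]
    u_bar = U.mean(axis=0)                      # center of all vertex vectors
    between = within = 0.0
    for i in range(K):
        members = U[labels == i]
        u_i = members.mean(axis=0)              # center of community i
        between += len(members) * np.sum((u_i - u_bar) ** 2)
        within += np.sum((members - u_i) ** 2)
    return (between / (K - 1)) / (within / (n - K))
```

Sweeping K over a plausible range and keeping the K with the largest F then implements the selection rule described above.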
The complexity of K-rank consists of the signaling process [19], the seed choice and the iterative process. The time complexity of signal diffusion is $O(cn^3)$ [19] when matrix multiplication is used to simulate the process, where c is the number of propagation steps and n is the number of vertices. But if we simulate the process on the network directly, the corresponding time complexity is $O(c(k + 1)n^2)$ [19], where k is the average degree of the vertices. Compared with the signaling process, the time complexity of the seed choice is trivial, because K is much less than n and the PageRank algorithm is very fast; the GPR can be easily computed by power iteration. The iterative process is similar to the K-means algorithm [5]; the only difference is the strategy of updating the seeds. K-means computes the means of all the community members, while K-rank finds the new seeds by calculating the rank centrality of the communities. As the communities are subnetworks much smaller than the original network, K-rank's seed-update step does not cost too much. If the number of communities K is fixed, the time complexity of K-means clustering is $O(nKt)$, where t is the number of iterations. Consequently, the total complexity of K-rank is $O(c(k + 1)n^2 + nKt)$.
3. Extending K-rank to directed, weighted and overlapping networks

Some existing methods proposed especially for community detection can only find communities in undirected and unweighted networks and cannot be easily extended to directed and weighted ones. However, in the real world there are many networks where edge direction and weights (indicating the strength of the interaction between vertices) are essential features, such as citation networks, web pages, etc. Besides, in some real-world networks, like social networks, vertices sometimes belong to more than one community. Such communities are called overlapping communities, and in practical situations it is very common for communities to overlap, but not all community detection methods can handle these types of networks. Compared with detecting communities in directed and weighted networks, finding overlapping communities is more difficult. In this section, we discuss how K-rank can be extended to directed, weighted and overlapping networks.
Suppose we have a weighted and directed network with n vertices. It can be represented mathematically by an adjacency matrix $W$ whose element $W_{ij}$ denotes the connection strength from vertex $i$ to vertex $j$. Since we are dealing with directed networks, in general $A_{ij} \neq A_{ji}$ and $W_{ij} \neq W_{ji}$. Then in the first step of K-rank we use [19]

$$U = (I_n + W)^c \qquad (4)$$

instead of $U = (I_n + A)^c$. In this way, we can compute vertex similarities incorporating the edges' weights and direction naturally. In the same way, if we regard the original network as a weighted and directed graph, the weights and direction are also easily incorporated into the ranking, because PageRank works on a weighted and directed web graph. Thereafter, the rest of the algorithm is the same as the original K-rank algorithm.
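Concretely, the only change in this first step is that the (generally asymmetric) weight matrix replaces the 0/1 adjacency matrix in the propagation; a variant of the earlier sketch, again our own illustration:

```python
import numpy as np

def signal_vectors_weighted(W, c):
    """Eq. (4): signaling on a weighted/directed network, with the weight
    matrix W in place of the 0/1 adjacency matrix A of Eq. (1)."""
    n = W.shape[0]
    U = np.linalg.matrix_power(np.eye(n) + W, c)
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```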
We have found that overlapping vertices always lie in the boundary area between different communities and that their memberships in the different communities are almost the same. So we first define the vertex membership measure B.
• Membership measures of vertices. Suppose a network is divided into K communities $C_1, C_2, \ldots, C_K$, and we have already got the vertex similarity $S(i, j)$ which denotes the similarity between vertices $i$ and $j$ ($i \neq j$). Then we can define four membership measures of vertices as follows.
(1) Vertex i's maximum membership measure with regard to community j:

$$B_{\max}(i, j) = \max_{x \in C_j,\, x \neq i} S(i, x). \qquad (5)$$

(2) Vertex i's minimum membership measure with regard to community j:

$$B_{\min}(i, j) = \min_{x \in C_j,\, x \neq i} S(i, x). \qquad (6)$$

(3) Vertex i's average membership measure with regard to community j:

$$B_{\mathrm{average}}(i, j) = \frac{1}{N_j} \sum_{x \in C_j,\, x \neq i} S(i, x), \qquad (7)$$

where $N_j$ denotes the number of vertices in community j excluding vertex i.
(4) Vertex i's central membership measure with regard to community j:

$$B_{\mathrm{center}}(i, j) = S(i, e), \qquad (8)$$

where $e$ is the seed of community j and $i \neq e$.
All these membership measures tell us how closely vertex i belongs to community j: the larger the membership measure, the more likely i belongs to community j. It is easy to see that the maximum and minimum membership measures are both sensitive to noise, while the other two measures are robust. Therefore, we choose the central membership measure to quantify the possibility of a vertex belonging to a community.
• Finding overlapping vertices. According to the definitions above, the central membership measure can be regarded as a kind of vertex similarity. Given the community structure of the network and the seed of each community, we can obtain each vertex's central membership measure from the vertex similarities, and based on this measure we can judge whether a vertex is overlapping. Therefore, we first partition the network using the K-rank algorithm to get the communities and the seeds. We then define another threshold ϵ which denotes the minimum value of the similarities between the seeds and the overlapping vertices in their own communities. The value of ϵ cannot be fixed in advance because we do not yet know which vertices are overlapping. So we set the threshold ϵ to different values and calculate all the non-seed vertices' central membership measures with regard to all the communities. If vertex i's central membership measure with regard to community j is no smaller than the threshold ϵ, then vertex i is assigned to community j. After that, if a vertex belongs to more than one community, it is overlapping. Thus, different values of ϵ correspond to different sets of overlapping vertices. Finally, we use an evaluation function Qov [23] (a modularity function for overlapping networks) to find the best cover of the overlapping communities (a code sketch of this procedure is given at the end of this section).
• Estimating the threshold ϵ. In the process of overlapping-vertex detection, the threshold ϵ determines the result of the algorithm. As mentioned above, ϵ denotes the minimum value of the similarities between the seeds and the overlapping vertices in their own communities. Although ϵ is not a fixed value, and in general different networks have different values of ϵ, we can still estimate it from its upper and lower bounds.
Suppose $C_1, C_2, \ldots, C_K$ is a partition of the network G, $e_i$ is the seed of community $C_i$, $x$ is a vertex in community $C_i$, $S(i, j)$ is the similarity between vertices $i$ and $j$, and $K$ is the number of communities. Then the upper bound of ϵ is $\min_{i=1,\ldots,K} \min_{x \in C_i} S(e_i, x)$. We consider two cases. In the first case, if the result of the algorithm is the same as the ground truth, then the value of ϵ is exactly $\min_{i=1,\ldots,K} \min_{x \in C_i} S(e_i, x)$: no matter whether a vertex is overlapping, it must be close to the one or more communities it belongs to, and the similarities between them are all no smaller than ϵ. In the second case, if the result of the algorithm is not the best, then some overlapping vertices were classified as if they were non-overlapping, and the value of ϵ should be smaller than $\min_{i=1,\ldots,K} \min_{x \in C_i} S(e_i, x)$. As for the lower bound of ϵ, we cannot obtain a strict bound; but based on the basic assumption of cluster analysis, that similarities within a cluster are larger than those between clusters, we can relax the constraints and use the similarities between the seeds of different communities to approximate it. A relatively loose lower bound of ϵ is then $\max_{i,j \in 1,\ldots,K,\, i \neq j} S(e_i, e_j)$. To sum up, the value of ϵ should lie in the range

$$\left[\; \max_{i,j \in 1,\ldots,K,\, i \neq j} S(e_i, e_j),\;\; \min_{i=1,\ldots,K} \min_{x \in C_i} S(e_i, x) \;\right] \qquad (9)$$

and the appropriate value of ϵ can be found near the upper bound (see the sketch at the end of this section).
• Why it works. Above, we compute only the non-seed vertices' central membership measures to detect overlapping vertices. But what if the seeds themselves are overlapping? In fact, we do not need to consider whether the seeds are overlapping, because we make three basic assumptions.
Table 1
Parameters of LFR artificial networks for various figures.
Parameter Fig. 2(a, b) Fig. 4(a, b) Fig. 3(a, b) Fig. 5(a, b) Fig. 6(a, b)
Hypothesis 2. Each non-overlapping vertex must take its own community's seed as its center; if the vertex is overlapping, this center is not the only one.
Hypothesis 3. Each seed must take itself as the center of its community.
The networks we deal with must satisfy the basic assumptions above. If a network satisfies them, we obtain the following Hypothesis 4.
Hypothesis 4. If a network satisfies Hypotheses 1–3, then the overlapping vertices in the network cannot be the seeds of communities.
The proof is by contradiction. Assume that overlapping vertices belong to exactly two different communities. In the first case, suppose vertex $e_i$ is an overlapping vertex belonging to different communities $C_i$ and $C_{i'}$, and moreover $e_i$ is the seed of both $C_i$ and $C_{i'}$. Then all the other vertices of $C_i$ and $C_{i'}$ take $e_i$ as their center; in other words, the vertices of the two communities are actually in the same community, which contradicts the assumption that they are in different communities. In the second case, suppose vertex $e_i$ is an overlapping vertex belonging to two different communities $C_i$ and $C_j$, and $e_i$ is the seed of $C_i$ but not of $C_j$. Then by Hypothesis 1 there must be another seed $e_j$ in community $C_j$ which is the center of $e_i$; as a result, the seed $e_i$ takes another vertex as its center, which contradicts Hypothesis 3.
Therefore, based on the above hypotheses, seeds cannot be overlapping, and we can detect the overlapping vertices among the non-seed vertices without considering the seeds.
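As referenced above, the ϵ-bound computation of expression (9) and the ϵ-sweep with Qov selection can be sketched together. This is our own illustrative code, not the authors' implementation: `similarity` is the signal-similarity function from the earlier sketches, `labels` and `seeds` come from a K-rank run, and `qov` stands for some implementation of the overlapping modularity Qov [23], which we do not reproduce here.

```python
def eps_bounds(U, seeds, labels, similarity):
    """Bounds of expression (9): lower bound = the largest similarity between
    two distinct seeds; upper bound = the smallest seed-to-member similarity."""
    K = len(seeds)
    lower = max(similarity(U, seeds[i], seeds[j])
                for i in range(K) for j in range(K) if i != j)
    upper = min(similarity(U, e, v)
                for c, e in enumerate(seeds)
                for v in range(len(labels)) if labels[v] == c and v != e)
    return lower, upper

def best_overlapping_cover(U, seeds, labels, eps_values, similarity, qov):
    """Assign each non-seed vertex to every community whose seed it resembles
    at least eps; keep the cover that maximizes Qov over the eps candidates."""
    best_cover, best_q = None, float("-inf")
    for eps in eps_values:
        cover = []
        for v in range(len(labels)):
            if v in seeds:                        # seeds are never overlapping
                comms = {seeds.index(v)}
            else:
                comms = {c for c, e in enumerate(seeds)
                         if similarity(U, v, e) >= eps}
                if not comms:                     # fall back to the K-rank label
                    comms = {labels[v]}
            cover.append(comms)
        q = qov(cover)
        if q > best_q:
            best_cover, best_q = cover, q
    return best_cover
```

In practice the `eps_values` grid would be taken inside the interval returned by `eps_bounds`, concentrated near its upper end, as the text suggests.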
4. Experimental results

In this section, we evaluate the K-rank algorithm on artificial and real-world networks, including undirected, directed, unweighted, weighted and overlapping networks. For the non-overlapping networks with known community structure, we use accuracy and NMI (normalized mutual information) [24] to measure the agreement between the planted partitions and the results of the algorithms. For the overlapping networks, the extended version of NMI for overlapping communities [25] is used instead.
The algorithms are first compared on two classes of benchmark networks, namely the Girvan and Newman [3] and the Lancichinetti et al. [26] (LFR) benchmark networks. For the former, we generate a set of artificial networks with 128 vertices divided into 4 communities of 32 vertices each. The average degree of each vertex is set to 16; the average number of edges from a vertex to others in its community, denoted by Zin, is varied from 8 to 16, and the average number of edges to vertices in other communities, denoted by Zout, is varied from 8 to 0, such that Zin + Zout = 16 on average. Therefore, the larger Zin is, the more easily we can detect the communities. Lancichinetti et al. [26] present a class of artificial networks whose degree and community size distributions are power laws, as in many real-world networks. The parameters of the latter benchmark networks used in our experiments are given in Table 1: the number of vertices (N), the mixing parameter (mu), the average degree of the vertices (k), the maximum degree of the vertices (kmax), the minimum community size (minc), the maximum community size (maxc), and the exponents of the power-law degree and community size distributions (t1 and t2, respectively). The mixing parameter mu is defined such that every vertex shares a fraction 1 − mu of its links with other vertices in its community and a fraction mu with vertices outside its community.
In all of the experiments, we compare K-rank with five other classical algorithms: LPA [15], BGLL [11], infomap [14], OSLOM [13] and K-means [5] (K-means++'s results are similar to those of K-means, so we list only the K-means results). Because LPA [15] and K-means [5] are both sensitive to their initial input, we run each of them ten times and report the best results.
The results for different Zin in Girvan and Newman's networks [3] are shown in Fig. 1. For both accuracy and NMI, all the methods detect the exact communities when Zin ≥ 11, but when Zin ≤ 10 their performances differ. In general, K-rank and K-means are the best among all the methods, especially when Zin = 8. Infomap and BGLL give bad results when Zin = 8, 9. Though OSLOM's accuracy is 1 when Zin = 9, its performance drops when Zin = 8. LPA has the worst results because it does not work when Zin < 11.

Fig. 1. Comparison of K-rank and other algorithms in Girvan and Newman's networks: (a) accuracy and (b) NMI versus Zin.

Fig. 2. Comparison of K-rank and other algorithms in LFR networks (N = 1000, [minc, maxc] = [10, 50]): (a) accuracy and (b) NMI versus mu.
Figs. 2–6 show the results of the different methods in LFR networks with the parameters listed in Table 1. As in the above analysis, we compare the six algorithms by the accuracy and NMI criteria; the only difference is that we use mu, instead of the Zin used in Girvan and Newman's networks, to control how clear the communities are. The results on LFR networks are similar to those on Girvan and Newman's networks. When mu is small, almost all the methods perform well, but as mu increases some algorithms' performances drop greatly: for example, LPA (mu ≥ 0.6), BGLL and OSLOM (mu ≥ 0.7), and infomap (mu ≥ 0.8) in Fig. 2(a). However, the performances of K-rank and K-means do not drop dramatically; furthermore, they give better results when mu is large. It should be noted that there are differences between the performances of K-means and K-rank. Though K-means is sometimes better than K-rank, it lacks stability when mu is small, such as mu ≤ 0.5 in Figs. 2(a) and 4(a). What is more, as mentioned above, if the network is large (Figs. 3, 5 and 6), the number of communities K is usually large as well, and then K-means hardly finds the communities, because a random choice of initial seeds easily leads to an empty cluster or a bad clustering result during the iterative process. K-rank, however, avoids bad results using our initial-seed choice strategy described above; the experimental results in Figs. 3, 5 and 6 confirm this conclusion. In a word, compared with the other algorithms, the K-rank algorithm performs better and is steadier when dealing with small networks. As for the larger networks (Fig. 6), K-rank's performance is slightly worse than the other four methods when mu is small, but it is the best of all when mu is large. Therefore, K-rank is fit for networks whether or not they have clear communities.
Fig. 3. Comparison of K-rank and other algorithms in LFR networks (N = 5000, [minc, maxc] = [10, 50]): (a) accuracy and (b) NMI versus mu.

Fig. 4. Comparison of K-rank and other algorithms in LFR networks (N = 1000, [minc, maxc] = [20, 100]): (a) accuracy and (b) NMI versus mu.

Fig. 5. Comparison of K-rank and other algorithms in LFR networks (N = 5000, [minc, maxc] = [20, 100]): (a) accuracy and (b) NMI versus mu.

Fig. 6. Comparison of K-rank and other algorithms in LFR networks (N = 10 000, [minc, maxc] = [10, 100]): (a) accuracy and (b) NMI versus mu.

Fig. 7. Comparison of K-rank and other algorithms in LFR directed and weighted networks (N = 1000, [minc, maxc] = [20, 50]): (a) accuracy and (b) NMI versus mut.
Weighted and directed artificial networks are also generated by the LFR benchmark [26]. This type of network needs three new parameters: the mixing parameter for the topology (mut), the mixing parameter for the weights (muw) and the exponent of the weight distribution (beta). In this section, the parameters of the networks we generated are as follows: N = 1000, k = 15, kmax = 50, mut = muw = 0.1–0.9, minc = 20, maxc = 50, beta = 0.6, t1 = 2, t2 = 1. The results for different mut in these LFR networks [26] are shown in Fig. 7. They show that in both accuracy and NMI, K-rank and K-means are the best of all the methods, especially when mut is large. Infomap and BGLL give bad results when mut ≥ 0.7, and OSLOM's performance drops when mut ≥ 0.8. LPA has the worst results because it does not work when mut ≥ 0.6. Similarly, K-means suffers from instability when mut is small (Fig. 7(a)), though it is better than K-rank when mut ≥ 0.7. We should note that the methods we compare with are the extended versions for weighted and directed networks, but their results leave room for improvement. The proposed K-rank algorithm uses vertex similarities and PageRank, which incorporate both the edges' weights and their direction naturally, so the results show that it outperforms the other methods and is better suited to weighted and directed networks.
In order to test our algorithm on overlapping networks, we compare K-rank with four other overlapping community detection algorithms, CFinder [27], CONGA [28], COPRA [29] and LFM [25], on LFR networks. LFR networks with overlapping communities need two new parameters: the number of overlapping vertices (on) and the number of memberships of the overlapping vertices (om). In our experiments, we generate four types of artificial networks; their parameters are as follows:
Table 2
Results of the comparison in networks with 8 overlapping vertices.
Methods Zin
16 15 14 13 12 11 10 9 8
Qov
K-rank 0.75 0.75 0.75 0.75 0.75 0.75 0.74 0.74 0.70
CFinder 0.74 0.74 – 0.07 0.06 0.06 0.06 0.07 0.06
CONGA 0.75 0.64 0.35 0.63 0.28 0.12 0.11 0.08 0.07
COPRA 0.74 0.74 0.74 0.74 0.74 0.74 0.00 0.00 0.00
LFM 0.74 0.74 0.74 0.74 0.74 0.73 0.73 – –
Accuracy
K-rank 1.00 1.00 1.00 1.00 1.00 1.00 0.92 0.86 0.51
CFinder 0.93 0.93 – 0.22 0.22 0.22 0.22 0.21 0.23
CONGA 1.00 0.63 0.31 0.27 0.22 0.23 0.22 0.22 0.23
COPRA 0.93 0.93 0.93 0.91 0.89 0.77 0.23 0.23 0.23
LFM 0.92 0.92 0.91 0.91 0.83 0.82 0.84 – –
NMI
K-rank 1.00 1.00 1.00 1.00 1.00 1.00 0.78 0.53 0.66
CFinder 0.45 0.57 – 0.47 0.46 0.46 0.47 0.46 0.46
CONGA 1.00 0.43 0.61 0.60 0.50 0.47 0.47 0.47 0.47
COPRA 0.58 0.70 0.72 0.57 0.56 0.42 0.37 0.37 0.37
LFM 0.57 0.57 0.68 0.68 0.43 0.55 0.54 – –
Table 3
Results of the comparison in networks with 16 overlapping vertices.
Methods Zin
16 15 14 13 12 11 10 9 8
Qov
K-rank 0.75 0.75 0.75 0.75 0.74 0.74 0.74 0.72 0.71
CFinder 0.74 – 0.07 – 0.06 – 0.06 – –
CONGA 0.75 0.64 0.32 0.46 0.27 0.15 0.19 0.09 0.06
COPRA 0.74 0.74 0.74 0.74 0.62 0.14 0.00 0.00 0.00
LFM 0.74 0.74 0.74 0.74 0.73 0.73 – – –
Accuracy
K-rank 1.00 1.00 1.00 1.00 0.98 0.98 0.85 0.55 0.32
CFinder 0.87 – 0.21 – 0.21 – 0.21 – –
CONGA 1.00 0.50 0.22 0.21 0.21 0.21 0.21 0.21 0.21
COPRA 0.87 0.85 0.81 0.81 0.45 0.29 0.21 0.21 0.21
LFM 0.85 0.86 0.85 0.85 0.84 0.78 – – –
NMI
K-rank 1.00 1.00 1.00 1.00 0.97 0.97 0.76 0.48 0.39
CFinder 0.44 – 0.46 – 0.47 – 0.46 – –
CONGA 1.00 0.74 0.48 0.49 0.49 0.47 0.47 0.47 0.47
COPRA 0.56 0.56 0.43 0.43 0.41 0.41 0.37 0.37 0.37
LFM 0.43 0.43 0.43 0.43 0.53 0.52 – – –
N = 128, k = 16, kmax = 16, maxc = minc = [38, 44, 56, 80], t1 = 2, t2 = 1, on = [8, 16, 32, 64], om = 4. The mu corresponding to each Zin can be calculated as mu = (k − Zin)/k; for example, Zin = 15 means mu = (16 − 15)/16 = 0.0625.
Accuracy and NMI are used to estimate these methods' performances; besides, in order to evaluate the modularity of the results, we compute Qov [23], a modularity function for overlapping networks.
Based on the different Zin values and the different numbers of overlapping vertices, the five methods' experimental results are shown in Tables 2–5. From the tables we can see that K-rank is the best among all the algorithms in most cases, except that CONGA performs well in a very few cases. In the tables, "–" means that the algorithm cannot find meaningful communities in that case. It should be noted that K-rank is sometimes much better than the other algorithms. As we have discussed, K-rank is based on a clustering technique which needs to compute vertex similarities, and we adopt the signal similarity based on the global network topology, so we obtain global information about every pair of vertices. By contrast, among the other four methods, CONGA [28] splits vertices based on local rules, COPRA [29] adopts the idea of label propagation, LFM [25] optimizes a local fitness function, and CFinder [27] finds all the κ-cliques in the network; these four methods use only local vertex information. We believe that it is this use of global information that makes K-rank highly effective, and the experimental results in this section confirm this conclusion.
Table 4
Results of the comparison in networks with 32 overlapping vertices.
Methods Zin
16 15 14 13 12 11 10 9 8
Qov
K-rank 0.75 0.74 0.74 0.74 0.74 0.73 0.72 0.74 0.73
CFinder – 0.06 – – – – – – –
CONGA 0.61 0.53 0.12 0.23 0.07 0.17 0.08 0.09 0.08
COPRA 0.74 0.73 0.72 0.72 0.05 0.00 0.00 0.00 0.00
LFM 0.74 0.73 0.62 0.73 – – – – –
Accuracy
K-rank 1.00 0.97 0.96 0.90 0.89 0.69 0.61 0.41 0.24
CFinder – 0.18 – – – – – – –
CONGA 0.39 0.31 0.19 0.18 0.18 0.18 0.17 0.18 0.18
COPRA 0.75 0.74 0.59 0.44 0.17 0.18 0.18 0.18 0.18
LFM 0.74 0.70 0.54 0.68 – – – – –
NMI
K-rank 1.00 0.83 0.82 0.67 0.45 0.41 0.39 0.39 0.40
CFinder – 0.46 – – – – – – –
CONGA 0.62 0.60 0.46 0.48 0.47 0.48 0.46 0.47 0.47
COPRA 0.43 0.43 0.41 0.37 0.38 0.37 0.37 0.37 0.37
LFM 0.41 0.41 0.39 0.41 – – – – –
Table 5
Results of the comparison in networks with 64 overlapping vertices.
Methods Zin
16 15 14 13 12 11 10 9 8
Qov
K-rank 0.74 0.74 0.74 0.74 0.74 0.74 0.75 0.75 0.74
CFinder – – – – – – – – –
CONGA 0.14 0.13 0.08 0.11 0.11 0.11 0.11 0.08 0.06
COPRA 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
LFM – – – – – – – – 0.56
Accuracy
K-rank 0.91 0.81 0.78 0.60 0.51 0.51 0.50 0.50 0.51
CFinder – – – – – – – – –
CONGA 0.12 0.12 0.13 0.12 0.12 0.11 0.12 0.12 0.12
COPRA 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12
LFM – – – – – – – – 0.11
NMI
K-rank 0.45 0.74 0.42 0.68 0.47 0.57 0.37 0.37 0.38
CFinder – – – – – – – – –
CONGA 0.18 0.37 0.47 0.47 0.18 0.47 0.28 0.47 0.47
COPRA 0.37 0.37 0.37 0.37 0.37 0.37 0.37 0.37 0.37
LFM – – – – – – – – 0.24
Table 6
The real-world networks used in our experiments.
No. Network N E K Ref
In this section, we test the six algorithms on real-world networks. Table 6 lists the networks used, where N is the number of vertices, E is the number of edges and K is the number of communities (prior knowledge). The results
Table 7
Comparison of K -rank and other algorithms in real-world networks.
Methods No.
1 2 3 4 5 6 7 8
Accuracy
K-rank 1.00 0.98 1.00 0.71 0.85 0.93 0.89 0.91
LPA 0.91 0.85 0.77 0.72 0.82 0.80 0.63 0.87
BGLL 0.64 0.85 0.50 0.77 0.72 0.90 0.79 0.88
infomap 0.82 0.85 0.58 0.76 0.78 0.91 0.58 0.75
OSLOM 0.97 0.42 0.87 0.59 0.80 0.91 0.88 0.85
K-means 1.00 0.98 1.00 0.62 0.85 0.90 0.82 –
NMI
K-rank 1.00 0.96 1.00 0.80 0.57 0.92 0.51 0.94
LPA 0.64 0.90 0.60 0.74 0.53 0.86 0.29 0.94
BGLL 0.58 0.94 0.46 0.81 0.51 0.93 0.33 0.95
infomap 0.69 0.94 0.53 0.78 0.53 0.92 0.29 0.90
OSLOM 0.84 0.42 0.55 0.64 0.55 0.91 0.50 0.93
K-means 1.00 0.96 1.00 0.66 0.55 0.90 0.38 –
of the different algorithms on the real-world networks are shown in Table 7. From the table we can see that K-rank outperforms the other algorithms in most cases, though BGLL and K-means sometimes perform well. "–" means that K-means cannot find meaningful communities in the larger PPI network due to a bad choice of initial seeds. We think K-rank works better on these real-world networks largely because of the clustering framework and the high quality of the vertex similarities; besides, the initial-seed choice strategy makes K-rank more effective than K-means.
5. Conclusion
In this paper, we proposed an efficient clustering algorithm, K-rank, based on rank centrality. Similar to the K-means algorithm, the proposed method first finds K seeds which have the largest rank centrality in the network, and then updates these seeds by an iterative technique. K-rank can be easily extended to directed, weighted and overlapping networks. Besides, K-rank is much faster than K-means++ when choosing the initial seeds and can avoid producing empty clusters during the iterative process. The results on synthetic and real-world networks have shown that our method is more efficient than the state-of-the-art algorithms.
Acknowledgments
We thank Bian-Fang Chai, Ya-Fang Li and Li-Yan Ma for their spelling and grammar checks. This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 60905029, 81230086), the Beijing Natural Science Foundation (Grant No. 4112046) and the Fundamental Research Funds for the Central Universities.
References
[1] D.J. Watts, S.H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (1998) 440.
[2] A.L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999) 509.
[3] M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99 (2002) 7821.
[4] M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69 (2004) 026113.
[5] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 1967, p. 281.
[6] B.J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (2007) 972.
[7] B. Yang, D.Y. Liu, et al., Complex network clustering algorithms, J. Softw. 20 (2009) 54.
[8] C. Ding, X. He, H.D. Simon, On the equivalence of nonnegative matrix factorization and spectral clustering, in: Proceedings of the Fifth SIAM International Conference on Data Mining, Newport Beach, CA, 2005.
[9] A. Clauset, M.E.J. Newman, C. Moore, Finding community structure in very large networks, Phys. Rev. E 70 (2004) 066111.
[10] M.E.J. Newman, Finding community structure in networks using the eigenvectors of matrices, Phys. Rev. E 74 (2006) 036104.
[11] V.D. Blondel, J. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, J. Stat. Mech. (2008) 10008.
[12] M.J. Barber, J.W. Clark, Detecting network communities by propagating labels under constraints, Phys. Rev. E 80 (2009) 026129.
[13] A. Lancichinetti, F. Radicchi, J.J. Ramasco, S. Fortunato, Finding statistically significant communities in networks, PLoS One 6 (2011) 0018961.
[14] M. Rosvall, C. Bergstrom, Maps of random walks on complex networks reveal community structure, Proc. Natl. Acad. Sci. USA 105 (2008) 1118.
[15] U.N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Phys. Rev. E 76 (2007) 036106.
[16] S. Fortunato, M. Barthelemy, Resolution limit in community detection, Proc. Natl. Acad. Sci. USA 104 (2007) 36.
[17] D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in: Proceedings of the Eighteenth Annual ACM–SIAM Symposium on Discrete Algorithms, 2007, p. 1027.
[18] Linyuan Lü, Tao Zhou, Link prediction in complex networks: a survey, Physica A 390 (2011) 1150.
[19] H. Yanqing, L. Menghui, Z. Peng, et al., Community detection by signaling on complex networks, Phys. Rev. E 78 (2008) 016115.
[20] D. Steinley, M.J. Brusco, Initializing K -means batch clustering: a critical evaluation of several techniques, J. Classification 24 (2007) 99.
[21] L. Page, S. Brin, R. Motwani, T. Winograd, The pagerank citation ranking: bringing order to the web, Technical Report, Stanford University, 1998.
[22] A. Li, Fuzzy Mathematics and Application, Metallurgical Industry Press, Beijing, 2005.
[23] V. Nicosia, G. Mangioni, V. Carchiolo, M. Malgeri, Extending the definition of modularity to directed graphs with overlapping communities, J. Stat.
Mech. (2009) 03024.
[24] L. Danon, A. Daz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Stat. Mech. (2005) 09008.
[25] A. Lancichinetti, S. Fortunato, J. Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New J. Phys. 11 (2009)
033015.
[26] A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Phys. Rev. E 78 (2008) 046110.
[27] G. Palla, I. Dernyi, I. Farkas, et al., Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (2005)
814.
[28] S. Gregory, An algorithm to find overlapping community structure in networks, in: PKDD, vol. 4702, 2007, p. 91.
[29] S. Gregory, Finding overlapping communities in networks by label propagation, New J. Phys. 12 (2010) 103018.
[30] W.W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33 (1977) 452.
[31] L. Donetti, M.A. Muñoz, Detecting network communities: a new systematic and efficient algorithm, J. Stat. Mech. (2004) P10012.
[32] D. Lusseau, K. Schneider, O.J. Boisseau, P. Haase, E. Slooten, S.M. Dawson, The bottlenose dolphin community of doubtful sound features a large
proportion of long-lasting associations, Behavioral Ecology and Sociobiology 54 (2003) 396.
[33] D.E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Addison-Wesley, Reading, MA, 1993.
[34] V. Krebs, (unpublished). http://www.orgnet.com/.
[35] L.A. Adamic, N. Glance, The political blogosphere and the 2004 US election: divided they blog, in: Proceedings of the International Workshop on Link
Discovery, 2005, p. 36.
[36] J. Vlasblom, S.J. Wodak, Markov clustering versus affinity propagation for the partitioning of protein interaction graphs, BMC Bioinformatics 10 (2009)
99.