Engineering Parallel Algorithms for Community Detection in Massive Networks
Christian L. Staudt and Henning Meyerhenke
Faculty of Informatics, Karlsruhe Institute of Technology (KIT), Germany
{christian.staudt, meyerhenke}@kit.edu

arXiv:1304.4453v4 [cs.DC] 2 Feb 2015

Abstract—The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state-of-the-art competitors. The processing rate of our fastest algorithm often reaches 50M edges/second. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast. Our methods are suitable for massive data sets with billions of edges.¹

Keywords: Disjoint community detection, graph clustering, parallel Louvain method, parallel algorithm engineering, network analysis

¹ A preliminary version of this paper appeared in Proceedings of the 42nd International Conference on Parallel Processing (ICPP 2013) [35].

I. INTRODUCTION

The data volume produced by electronic devices is growing at an enormous rate. Important classes of such data can be modeled by complex networks, which are increasingly used to represent phenomena as varied as the WWW, social relations, and brain topology. The resulting graph data sets can easily reach billions of edges for many relevant applications. Analyzing data of this volume in near real-time challenges the state of the art in terms of hardware, software, and algorithms. A particular challenge is not only the amount of data, but also its structure. Complex networks have topological features which pose computational challenges different from traditional HPC applications: In a scale-free network, the presence of a few high-degree nodes (hubs) among many low-degree nodes generates load balancing issues. In a small-world network, the entire graph can be visited in only a few hops from any source node, which negatively affects cache performance. To enable network analysis methods to scale, we need algorithmic methods that harness parallelism and apply specifically to complex networks.

In this work, we deal with the task of community detection (also known as graph clustering) in large networks, i.e. the discovery of dense subgraphs. Among manifold applications, community detection has been used to counteract search engine rank manipulation [32], to discover scientific communities in publication databases [34], to identify functional groups of proteins in cancer research [15], and to organize content on social media sites [12]. So far, extensive research on community detection in networks has given rise to a variety of definitions of what constitutes a good community and a variety of methods for finding such communities, many of which are described in surveys by Schaeffer [32] and Fortunato [10]. Among these definitions, the lowest common denominator is that a community is an internally dense node set with sparse connections to the rest of the graph. While it can be argued that communities can overlap, we restrict ourselves to finding disjoint communities, i.e. a partition of the node set which uniquely assigns a node to a community. The quality measure modularity [14] formalizes the notion of a good community detection solution by comparing its coverage (fraction of edges within communities) to an expected value based on a random edge distribution model which preserves the degree distribution. Modularity is not without flaws (like the resolution limit [11], which can be partially overcome by different techniques [2], [18], [23]) nor alternatives [37], but has emerged as a well-accepted measure of community quality. This makes modularity our measure of choice. While optimizing modularity is NP-hard [6], efficient heuristics have been introduced which explicitly increase modularity.

For graphs with millions to billions of edges, only (near) linear-time community detection algorithms are practical. Several fast methods have been developed in recent years. Yet, there is a lack of research in adapting these methods to take advantage of parallelism. A recent attempt at assessing the state of the art in community detection was the 10th DIMACS Implementation Challenge on Graph Partitioning and Graph Clustering [1]. DIMACS challenges are scientific competitions in which the participants solve problems from a specified test set, with the aim of high solution quality and high speed. Only two of the 15 submitted implementations for modularity optimization relied on parallelism and only very few could handle graphs with billions of edges in reasonable time. Accordingly, our objective is the development and implementation of parallel community detection heuristics which are able to handle massive graphs quickly while also producing a high-quality solution. In the following, the competitors of the DIMACS challenge will be used for a comparative experimental study. In the design of such heuristics, we necessarily trade off solution quality against running time. The DIMACS challenge also showed that there is no consensus on what running times are acceptable and how desirable an increase in the third decimal place of modularity is. We therefore need to clarify our design goals as follows: In the comparison with other proposed methods, we want to place our algorithms on the Pareto frontier so that they are not dominated, i.e. surpassed in speed and quality at the same time. Secondly, we target a usage scenario: Our algorithms should be suitable as part of interactive data analysis workflows, performed by a data analyst operating a multicore workstation. Networks with billions of edges should be processed in minutes rather than hours, and the solution quality should be competitive with the results of well-established sequential methods.

We implement three standalone parallel algorithms: Label propagation [28] is a simple procedure where nodes adopt the community assignment (label) which is most frequent among their neighbors until stable communities emerge. We implement a parallel version of the approach as the PLP algorithm. The Louvain method [5] is a multilevel technique in which nodes are repeatedly moved to the community of a neighbor if modularity can be improved. We are the first to present a parallel implementation of the method for large inputs, named PLM. We also extend the method by adding a refinement phase on every level, which yields the PLMR algorithm. In addition to these basic algorithms, we also implement a two-phase approach that combines them. It is inspired by ensemble learning, in which the output of several weak classifiers is combined to form a strong one. In our case, multiple base algorithms run in parallel as an ensemble. Their solutions are then combined to form the core communities, representing the consensus of all base algorithms. The graph is coarsened according to the core communities, and then assigned to a single final algorithm. Within this extensible framework, which we call the ensemble preprocessing method (EPP), we apply PLP as base algorithms and PLMR as the final algorithm.

With our shared-memory parallel implementation of community detection by label propagation (PLP), we provide an extremely fast basic algorithm that scales well with the number of processors (considering the heterogeneous structure of the input). The processing rate of PLP reaches 50M edges per second for large graphs, making it suitable for massive data sets. With PLM, we present the first parallel implementation of the Louvain community detection method for massive inputs, and demonstrate that it is both fast and qualitatively strong. We show that solution quality can be further improved by extending the method with a refinement phase on every level of the hierarchy, yielding the PLMR algorithm. The EPP ensemble algorithm can yield a good quality-speed tradeoff on some instances when an even lower time to solution is desired. In comparative experiments, our implementations perform well in comparison to other state-of-the-art algorithms (Sec. V-E and V-F): Three of our algorithms are on the Pareto frontier.

Our community detection software framework, written in C++, is flexible, extensible, and supports rapid iteration between design, implementation and testing required for algorithm engineering [25]. In this work, we focus on specific configurations of algorithms, but future combinations can be quickly evaluated. We distribute our community detection code as a component of NetworKit [36], our open-source network analysis package, which is under continuous development.

II. RELATED WORK

This section gives a short overview of related efforts. For a comprehensive overview of community detection in networks, we refer the interested reader to the aforementioned surveys [32], [10]. Recent developments and results are also covered by the 10th DIMACS Implementation Challenge [1].

Among efficient heuristics for community detection we can distinguish between those based on community agglomeration and those based on local node moves. Agglomerative algorithms successively merge pairs of communities so that an improvement with respect to community quality is achieved. In contrast, local movers search for quality gains which can be achieved by moving a node to the community of a neighbor.

A globally greedy agglomerative method known as CNM [8] runs in O(md log n) for graphs with n nodes and m edges, where d is the depth of the dendrogram of mergers and typically d ∼ log n. Among the few parallel implementations competing in the DIMACS challenge, Fagginger Auer and Bisseling [9] submitted an agglomerative algorithm with an implementation for both the GPU (using NVIDIA CUDA) and the CPU (using Intel TBB). The algorithm weights all edges with the difference in modularity resulting from a contraction of the edge, then computes a heavy matching M and contracts according to M. This process continues recursively with a hierarchy of successively smaller graphs. The matching procedure can adapt to star-like structures in the graph to avoid insufficient parallelism due to small matchings. In the challenge, the CPU implementation competed as CLU_TBB and proved exceptionally fast. Independently, Riedy et al. [30] developed a similar method, which follows the same principle but does not provide the adaptation to star-like structures. An improved implementation, labeled CEL in the following, corresponds to the description in [29].

Community detection by label propagation belongs to the class of local move heuristics. It was originally described by Raghavan et al. [28]. Several variants of the algorithm exist; one of them (under the name peer pressure clustering) is due to Gilbert et al. [13]. The latter use the algorithm as a prototype application within a parallel toolbox that uses numerical algorithms for combinatorial problems. Unfortunately, Gilbert et al. report running times only for a different algorithm, which solves a very specific benchmark problem and is not applicable in our context. A variant of label propagation by Soman and Narang [33] for multicore and GPU architectures exists, which seeks to improve quality by re-weighting the graph.
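The matching-based agglomeration attributed to Fagginger Auer and Bisseling in this section scores edges, computes a heavy matching, and contracts the matched pairs. The C++ sketch below illustrates only the matching step, using a simple greedy heuristic. The greedy strategy, the `ScoredEdge` type and the function name `greedyHeavyMatching` are assumptions made for illustration; they are not taken from CLU_TBB, whose actual matching procedure (including the adaptation to star-like structures) is not specified in this text. The scores stand in for precomputed modularity differences.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical edge record: contracting {u, v} would change modularity by `score`.
struct ScoredEdge {
    int u;
    int v;
    double score;
};

// Greedy heavy matching (an assumed stand-in heuristic): visit edges by
// decreasing score and keep an edge if both endpoints are still unmatched.
// Edges with non-positive score are ignored, since contracting them would
// not improve modularity.
std::vector<ScoredEdge> greedyHeavyMatching(int n, std::vector<ScoredEdge> edges) {
    std::stable_sort(edges.begin(), edges.end(),
                     [](const ScoredEdge& a, const ScoredEdge& b) {
                         return a.score > b.score;
                     });
    std::vector<bool> matched(n, false);
    std::vector<ScoredEdge> matching;
    for (const ScoredEdge& e : edges) {
        if (e.score <= 0.0) break;  // remaining edges cannot improve quality
        if (e.u == e.v || matched[e.u] || matched[e.v]) continue;
        matched[e.u] = true;
        matched[e.v] = true;
        matching.push_back(e);
    }
    return matching;
}
```

On a path 0-1-2-3 whose middle edge has the highest score, this heuristic selects only the middle edge, which also shows why small matchings can limit the parallelism available in each contraction round.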
A locally greedy multilevel algorithm known as the Louvain method [5] combines local node moves with a bottom-up multilevel approach. Bhowmick and Srinivasan [3] presented a previous parallel version of the algorithm. According to their experimental results, our implementation is about four orders of magnitude faster. Noack and Rotta [31] evaluate similar sequential multilevel algorithms, which combine agglomeration with refinement.

Ovelgönne and Geyer-Schulz [27] apply the ensemble learning paradigm to community detection. They develop what they call the Core Groups Graph Clusterer scheme, which we adapt as the Ensemble Preprocessing (EPP) algorithm. They also introduce an iterated scheme in which the core communities are again assigned to an ensemble, creating a hierarchy of solutions/coarsened graphs until quality does not improve any more. Within this framework, they employ Randomized Greedy (RG), a variant of the aforementioned CNM algorithm. It avoids a loss in solution quality that arises from highly unbalanced community sizes. The resulting CGGC algorithm emerged as the winner of the Pareto part of the DIMACS challenge, which related quality to speed according to specific rules. Recently Ovelgönne [26] presented a distributed implementation (based on the big data framework Hadoop) of an ensemble preprocessing scheme using label propagation as a base algorithm. This implementation processes a 3.3 billion edge web graph in a few hours on a 50 machine Hadoop cluster [26, p. 73]. (Our OpenMP-based implementation of the similar EPP algorithm requires only 4 minutes on a shared-memory machine with 16 physical cores.)

From an algorithmic perspective, disjoint community detection is related to graph partitioning (GP). Although the problems are different in important aspects (unbalanced vs balanced blocks, unknown vs known number of blocks, different objectives), algorithms such as the Louvain method or PLMR bear conceptual resemblance to multilevel graph partitioners. Exploiting parallelism has been studied extensively for GP. Several established tools are discussed in recent surveys [4], [7], most of them for machines with distributed memory. Often employed techniques are parallel matchings for coarsening and parallel variants of Fiduccia-Mattheyses for local improvement. These techniques are at best partially helpful in our scenario since vanilla matching-based coarsening is ineffective on complex networks and distributed-memory parallelism is not necessary for us. Related to our work is a recent study on multithreaded GP by LaSalle and Karypis [20], who explore the design space of multithreaded GP algorithms. Their results provide interesting insights, but are not completely transferable to our scenario. Very recently they presented Nerstrand [21], a fast parallel community detection algorithm based on modularity maximization and the multilevel paradigm, using different aggregation schemes. Our work on PLP in this paper has also inspired a very promising parallel multilevel algorithm for partitioning massive complex networks [24].

We observe that most efficient disjoint community detection heuristics make use of agglomeration or local node moves, possibly in combination with multilevel or ensemble techniques. Both basic approaches can be adapted for parallelism, but this is currently the exception rather than the norm in our scenario. In this work we compare our own algorithms with the best currently available, sequential and parallel alike.

III. ALGORITHMS

In this section we formulate and describe our parallel variants of existing sequential community detection algorithms, as well as ensemble techniques which combine them. Implementation details are also discussed. We use the following notation: A graph, the abstraction of a network data set, is denoted as G = (V, E) with a node set V of size n and an edge set E of size m. In the following, edges {u, v} are undirected and have weights ω : E → ℝ⁺. The weight of a set of edges E′ ⊆ E is denoted as $\omega(E') := \sum_{\{u,v\} \in E'} \omega(u, v)$. A community detection solution ζ = {C1, ..., Ck} is a partition of the node set V into disjoint subsets called communities. Equivalently, such a solution can be understood as a mapping where ζ(v) returns the community containing node v. For our implementation, the nodes have consecutive integer identifiers id(v) and edges are pairs of node identifiers. A solution is represented as an array indexed by integer node identifiers and containing integer community identifiers.

A. Parallel Label Propagation (PLP)

a) Algorithm: Community detection by label propagation, as originally introduced by Raghavan et al. [28], extracts communities from a labelling V → ℕ of the node set. Initially, each node is assigned a unique label, and then multiple iterations over the node set are performed: In each iteration, every node adopts the most frequent label in its neighborhood (breaking ties arbitrarily). Densely connected groups of nodes thus agree on a common label, and eventually a globally stable consensus is reached, which usually corresponds to a good solution for the network. Label propagation therefore finds communities in nearly linear time: Each iteration takes O(m) time, and the algorithm has been empirically shown to reach a stable solution in only a few iterations, though not mathematically proven to do so. The number of iterations seems to depend more on the graph structure than the size. More theoretical analysis is done by Kothapalli et al. [16]. The algorithm can be described as a locally greedy coverage maximizer, i.e. it tries to maximize the fraction of edges which are placed within communities rather than across. With its purely local update rule, it tends to get stuck in local optima of coverage which implicitly are good solutions with respect to modularity: A label is likely to propagate through and cover a dense community, but unlikely to spread beyond bottlenecks. The local update rule and the absence of global variables make label propagation well-suited for a parallel implementation.

Algorithm 1 denotes PLP, our parallel variant of label propagation. We adapt the algorithm in a straightforward way to make it applicable to weighted graphs. Instead of the most frequent label, the dominant label in the neighborhood is chosen, i.e. the label l that maximizes $\sum_{u \in N(v) : \zeta(u) = l} \omega(v, u)$. We continue the iteration until the number of nodes which changed their labels falls below a threshold θ.
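The weighted update rule and the threshold-based stopping criterion described above can be illustrated with a short C++ sketch. It is a minimal sketch under stated assumptions, not the PLP implementation distributed with NetworKit: the `Graph` alias, the function names `addEdge`, `dominantLabel` and `labelPropagation`, and the deterministic tie-breaking rule are hypothetical choices for this example. PLP is a parallel algorithm; the sketch iterates over the nodes sequentially and updates labels in place so that its result is reproducible.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical graph type for this sketch: adj[v] lists (neighbor, weight) pairs.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

// Adds the undirected edge {u, v} with weight w.
void addEdge(Graph& adj, int u, int v, double w) {
    adj[u].push_back(std::make_pair(v, w));
    adj[v].push_back(std::make_pair(u, w));
}

// Returns the dominant label in the neighborhood of v, i.e. the label l that
// maximizes the summed weight of edges from v to neighbors labelled l.
// Tie-breaking is an assumption of this sketch: v keeps its current label if
// that label is among the heaviest, otherwise the smallest label wins.
int dominantLabel(const Graph& adj, const std::vector<int>& label, int v) {
    std::map<int, double> weightOf;  // label -> accumulated edge weight
    for (const auto& e : adj[v]) weightOf[label[e.first]] += e.second;

    int best = label[v];             // isolated nodes keep their label
    double bestWeight = 0.0;
    for (const auto& entry : weightOf) {
        if (entry.second > bestWeight) {
            best = entry.first;
            bestWeight = entry.second;
        }
    }
    auto current = weightOf.find(label[v]);
    if (current != weightOf.end() && current->second == bestWeight) best = label[v];
    return best;
}

// Iterates until the number of label changes in one pass is at most theta.
std::vector<int> labelPropagation(const Graph& adj, std::size_t theta) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> label(n);
    for (int v = 0; v < n; ++v) label[v] = v;  // unique initial labels

    std::size_t updated = adj.size();
    while (updated > theta) {
        updated = 0;
        for (int v = 0; v < n; ++v) {          // sequential, in-place updates
            const int l = dominantLabel(adj, label, v);
            if (l != label[v]) {
                label[v] = l;
                ++updated;
            }
        }
    }
    return label;
}
```

On two triangles with heavy internal edges joined by one light edge, the sketch assigns one label per triangle. With unit weights on all edges and this particular tie-breaking rule, a single label can instead spread across the bridge, which is one reason the outcome of label propagation depends on tie-breaking and node order.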
Algorithm 1: PLP: Parallel Label Propagation
  Input: graph G = (V, E)
  Result: communities ζ : V → ℕ
   1  parallel for v ∈ V
   2      ζ(v) ← id(v)
   3  updated ← n
   4  V_active ← V
   5  while updated > θ do
   6      updated ← 0
   7      parallel for v ∈ {u ∈ V_active : deg(u) > 0}
   8          l* ← arg max_l Σ_{u ∈ N(v) : ζ(u) = l} ω(v, u)
   9          if ζ(v) ≠ l* then
  10              ζ(v) ← l*
  11              updated ← updated + 1
  12              V_active ← V_active ∪ N(v)
  13          else
  14              V_active ← V_active \ {v}
  15  return ζ

b) Implementation: We make a few modifications to the original algorithm. In the original description [28], nodes are traversed in random order. Since the cost of explicitly randomizing the node order in parallel is not insignificant, we make this optional and rely on some randomization through parallelism otherwise. We also observe that forgoing randomization has a negligible effect on quality. We avoid unnecessary computation by distinguishing between active and inactive nodes. It is unnecessary to recompute the label weights for a node whose neighborhood labels have not changed in the previous iteration. Nodes which already have the heaviest label become inactive (Algorithm 1, line 14), and are only reactivated if a neighboring node is updated (line 12). We restrict iteration to the set of active nodes. Iterations are repeated until the number of nodes updated falls below a threshold value. The motivation for setting threshold values other than zero is that on some graph instances, the majority of iterations are spent on updating only a very small fraction of high-degree nodes (see Fig. 12 in the supplementary material for an example). Since preliminary experiments have shown that time can be saved and quality is not significantly degraded by simply omitting these iterations, we set an update threshold of θ = n · 10⁻⁵. Note that we do not use the termination criterion specified in [27] as it does not lead to convergence on some inputs. The original criterion is to stop when all nodes have the label of the relative majority in their neighborhood [28].

Label propagation can be parallelized easily by dividing the range of nodes among multiple threads which operate on a common label array. This parallelization is not free of race conditions, since by the time the neighborhood of a node u is evaluated in iteration i to set ζ_i(u), a neighbor v might still have the previous iteration's label ζ_{i−1}(v) or already ζ_i(v). The outcome thus depends on the order of threads. However, these race conditions are acceptable and even beneficial in an ensemble setting since they introduce random variations and increase base solution diversity. This also corresponds to asynchronous updating, which has been found to avoid oscillation of labels on bipartite structures [28]. When dealing with scale-free networks whose degree distribution follows a power law, assigning node ranges of equal size to each thread will lead to load imbalance as computational cost depends on the node degree. Instead of statically dividing the iteration among the threads, guided scheduling (with #pragma omp parallel for schedule(guided)) assigns node ranges of decreasing size from a queue to available threads. This way it can help to overcome load balancing issues, since threads processing large neighborhoods will receive fewer vertices in later phases of the dynamic assignment process. This introduces some overhead, but we observed that guided scheduling is generally superior to static parallelization for PLP and similar methods.

B. Parallel Louvain Method (PLM)

Algorithm: The Louvain method for community detection was first presented by Blondel et al. [5]. It can be classified as a locally greedy, bottom-up multilevel algorithm and uses modularity as the objective function. In each pass, nodes are repeatedly moved to neighboring communities such that the locally maximal increase in modularity is achieved, until the communities are stable. Algorithm 2 denotes this move phase. Then, the graph is coarsened according to the solution (by contracting each community into a supernode) and the procedure continues recursively, forming communities of communities. Finally, the communities in the coarsest graph determine those in the input graph by direct prolongation.

Computation of the objective function modularity is a central part of the algorithm. Let $\omega(u, C) := \sum_{\{u,v\} : v \in C} \omega(u, v)$ be the weight of all edges from u to nodes in community C, and define the volume of a node and a community as $\mathrm{vol}(u) := \sum_{\{u,v\} : v \in N(u)} \omega(u, v) + 2 \cdot \omega(u, u)$ and $\mathrm{vol}(C) := \sum_{u \in C} \mathrm{vol}(u)$, respectively. The modularity of a solution is defined as

\[ \mathrm{mod}(\zeta, G) := \sum_{C \in \zeta} \left( \frac{\omega(C)}{\omega(E)} - \frac{\mathrm{vol}(C)^2}{4\,\omega(E)^2} \right) \tag{III.1} \]

Note that the change in modularity resulting from a node move can be calculated by scanning only the local neighborhood of a node, because the difference in modularity when moving node u ∈ C to community D is:

\[ \Delta\mathrm{mod}(u, C \to D) = \frac{\omega(u, D \setminus \{u\}) - \omega(u, C \setminus \{u\})}{\omega(E)} + \frac{\left( \mathrm{vol}(C \setminus \{u\}) - \mathrm{vol}(D \setminus \{u\}) \right) \cdot \mathrm{vol}(u)}{2 \cdot \omega(E)^2} \]

We introduce a shared-memory parallelization of the Louvain method (PLM, Algorithm 3) in which node moves are evaluated and performed in parallel instead of sequentially. This approach may work on stale data so that a monotonic modularity increase is no longer guaranteed. Suppose that during the evaluation of a possible move of node u other threads have performed moves that affect the ∆mod scores of u. In some cases this can lead to a move of
u that actually decreases modularity. Still, such undesirable decisions can also be corrected in a following iteration, which is why the solution quality is not necessarily worse. Working only on independent sets of vertices in parallel does not provide a solution since the sets would have to be very small, limiting parallelism and/or leading to the undesirable effect of a very deep coarsening hierarchy. Concerns about termination turned out to be theoretical for our set of benchmark graphs, all of which can be successfully processed with PLM. The community size resolution produced by PLM can be varied through a parameter γ in the range [0, 2m], 0 yielding a single community, 1 being standard modularity and 2m producing singletons. Tuning this parameter is a possible practical remedy [18] against modularity's resolution limit.

Algorithm 2: move: Local node moves for modularity gain
  Input: graph G = (V, E), communities ζ : V → ℕ
  Result: communities ζ : V → ℕ
  repeat
      parallel for u ∈ V
          δ ← max_{v ∈ N(u)} {∆mod(u, ζ(u) → ζ(v))}
          C ← ζ(arg max_{v ∈ N(u)} {∆mod(u, ζ(u) → ζ(v))})
          if δ > 0 then
              ζ(u) ← C
  until ζ stable
  return ζ

Algorithm 3: PLM: Parallel Louvain Method
  Input: graph G = (V, E)
  Result: communities ζ : V → ℕ
  ζ ← ζ_singleton(G)
  ζ ← move(G, ζ)
  if ζ changed then
      [G′, π] ← coarsen(G, ζ)
      ζ′ ← PLM(G′)
      ζ ← prolong(ζ′, G, G′, π)
  return ζ

Implementation: The main idea of PLM (Algorithm 3) is to parallelize both the node move phase and the coarsening phase of the Louvain method. Since the computation of the ∆mod scores is the most frequent operation, it needs to be very fast. We store and update some interim values, which is not apparent from the high-level pseudocode in Algorithm 3. An earlier implementation associated with each node a map in which the edge weight to neighboring communities was stored and updated when node moves occurred. A lock for each vertex v protected all read and write accesses to v's map since std::map is not thread-safe. Although this design was meant to avoid redundant computation, we later discovered that it introduces too much overhead (map operations, locks). Recomputing the weight to neighbor communities each time a node is evaluated turned out to be faster. The current implementation only stores and updates the volume of each community. An additional optimization to the PLM implementation eliminated the overhead associated with using an std::map to store for each node the weights of edges leading to neighboring communities. The mechanism was replaced by one std::vector for each of the p threads, leading to an acceleration of a factor of 2 on average, at the cost of a memory overhead of O(p · n). The former version (referred to as PLM*) can still be used optionally under tighter memory constraints.

Graph coarsening according to communities is performed in a straightforward way such that the nodes of a community in G are aggregated to a single node in G′. An edge between two nodes in G′ receives as weight the sum of weights of inter-community edges in G, while self-loops preserve the weight of intra-community edges. A mapping π of nodes in the fine graph to nodes in the coarse graph is also returned. In earlier versions of PLM, the graph coarsening phase proved to be a major sequential bottleneck. We address this problem with a parallel coarsening scheme: Each thread first scans a portion of the edges in G and constructs a coarse graph G′_t of its own. These partial graphs are then combined into G′ by processing each node of G′ in parallel and merging the adjacencies stored in each G′_t.

C. Parallel Louvain Method with Refinement (PLMR)

Following up on the work by Noack and Rotta on multilevel techniques and refinement heuristics [31], we extend the Louvain method by an additional move phase after each prolongation. This makes it possible to re-evaluate node assignments in view of the changes that happened on the next coarser level, giving additional opportunities for modularity improvement at the cost of additional iterations over the node set in each level of the hierarchy. We denote the method and implementation as PLMR for Parallel Louvain Method with Refinement. We present a recursive implementation in Algorithm 4 which uses the same concepts as PLM.

Algorithm 4: PLMR: Parallel Louvain Method with Refinement
  Input: graph G = (V, E)
  Result: communities ζ : V → ℕ
  ζ ← ζ_singleton(G)
  ζ ← move(G, ζ)
  if ζ changed then
      [G′, π] ← coarsen(G, ζ)
      ζ′ ← PLMR(G′)
      ζ ← prolong(ζ′, G, G′, π)
      ζ ← move(G, ζ)
  return ζ

D. Ensemble Preprocessing (EPP)

In machine learning, ensemble learning is a strategy in which multiple base classifiers or weak classifiers are combined to form a strong classifier. Classification in this context can be understood as deciding whether a pair of nodes should belong to the same community. We follow this general idea, which has been applied successfully to graph clustering
before [27]. Subsequently, we describe an ensemble technique, EPP. We also briefly describe algorithms for combining multiple base solutions.

Algorithm 5: EPP: Ensemble Preprocessing
  Input: graph G = (V, E), ensemble size b
  Result: communities ζ : V → ℕ
  parallel for i ∈ [1, b]
      ζ_i ← Base_i(G)
  ζ̄ ← combine(ζ_1, ..., ζ_b)
  [G′, π] ← coarsen(G, ζ̄)
  ζ′ ← Final(G′)
  ζ ← prolong(ζ′, G, G′, π)
  return ζ

In a preprocessing step, we assign G to an ensemble of base algorithms. The graph is then coarsened according to the core communities ζ̄, which represent the consensus of the base algorithms. Coarsening reduces the problem size considerably, and implicitly identifies the contested and the unambiguous parts of the graph. After the preprocessing phase, the coarsened graph G′ is assigned to the final algorithm, whose result is applied to the input graph by prolongation. Our implementation of the ensemble technique EPP is agnostic to the base and final algorithms and can be instantiated with a variety of such algorithms. We instantiate the scheme with PLP as a base algorithm and PLMR as the final algorithm. Thus we achieve massive nested parallelism with several parallel PLP instances running concurrently in the first phase, and proceed in the second phase with the more expensive but qualitatively superior PLMR. This constitutes the EPP algorithm (Algorithm 5). We write EPP(b, Base, Final) to indicate the size of the ensemble b and the types of base and final algorithm.

Implementation: A consensus of b > 1 base algorithms is formed by combining the base solutions ζ_i in the following way: Only if a pair of nodes is classified as belonging to the same community in every ζ_i, then it is assigned to the same community in the core communities ζ̄. Formally, for all node

IV. IMPLEMENTATION AND EXPERIMENTAL SETUP

A. Framework and Settings

The language of choice for all implementations is C++ according to the C++11 standard, allowing us to use object-oriented and functional programming concepts while also compiling to native code. We implemented all algorithms on top of a general-purpose adjacency array graph data structure. Basically, it represents the adjacencies of each node by storing them in an std::vector, allowing for efficient insertions and deletions of nodes and edges. A high-level interface encapsulates the data structure and enables a clear and concise notation of graph algorithms. In particular, our interface conveniently supports parallel programming through parallel node and edge iteration methods which receive a function and apply it to all elements in parallel. Parallelism is achieved in the form of loop parallelization with OpenMP, using the parallel for directive with schedule(guided) where appropriate for improved load balancing.

We publish our source code under a permissive free software license to encourage reproduction, reuse and contribution by the community. Implementations of all community detection algorithms mentioned are part of NetworKit [36], our growing toolkit for network analysis.³ The software combines fast parallel algorithms written in C++ with an interactive Python interface for flexible and interactive data analysis workflows. For representative experiments we average quality and speed values over multiple runs in order to compensate for fluctuations. Table I provides information on the multicore platform used for all experiments.

Table I: Platform for experiments
  phipute1.iti.kit.edu
  compiler   gcc 4.8.1
  CPU        2 x 8 Cores: Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz, 32 threads
  RAM        256 GB
  OS         SUSE 13.1-64

B. Networks
pairs u, v ∈ V : We perform experiments on a variety of graphs from
different categories of real-world and synthetic data sets. Our
focus is on real-world complex networks, but to add variety
∀i ∈ [1, b] ζi (u) = ζi (v) ⇐⇒ ζ̄(u) = ζ̄(v). (III.2) some non-complex and synthetic instances are included as
well. The test set includes web graphs (uk-2002, eu-2005,
We introduce a highly parallel combination algorithm based in-2004, web-BerkStan), internet topology networks
on hashing. With a suitable hash function h(ζ1 (v), . . . , ζb (v)), (as-22july06, as-Skitter, caidaRouterLevel),
the community identifiers of the base solutions are mapped social networks (soc-LiveJournal, fb-Texas84,
to a new identifier ζ̄(v) in the core communities. Except for com-youtube, wiki-Talk, soc-pokec, com-orkut),
unlikely hash collisions, a pair of nodes will be assigned to the scientific coauthorship networks (coAuthorsCiteseer,
same community only if the criterion above is satisfied. We coPapersDBLP), a connectome graph (con-fiber_big),
use a relatively simple function called djb2 due to Bernstein,2 a street network (europe-osm) and synthetic graphs
which appears sufficient for our purposes. The use of a b-way (G_n_pin_pout, kron_g500-simple-logn20,
hash function is fast due to a high degree of parallelism. hyperbolic-268M). Therefore, we cover a range of
graph-structural properties. Real-world complex networks are
2 hash functions: http://www.cse.yorku.ca/~oz/hash.html 3 NetworKit: https://networkit.iti.kit.edu/
7

graph n m maxdeg comp lcc

as-22july06 22963 48436 2390 1 0.3493 results for each network. The Pareto evaluation (Section V-F)
G_n_pin_pout 100000 501198 25 6 0.0040 then aims to condense this into a single performance score.
caidaRouterLevel 192244 609066 1071 308 0.2016
coAuthorsCiteseer 227320 814134 1372 1 0.7629
fb-Texas84 36371 1590655 6312 4 0.1985 A. Parallel Label Propagation (PLP)
com-youtube 1157828 2987624 28754 22939 0.1725
wiki-Talk 2394385 4659565 100029 2555 0.1991 PLP is extremely fast and able to handle the large graphs
web-BerkStan 685231 6649470 84230 677 0.6343
as-Skitter 1696415 11095298 35455 756 0.2930
easily. The “weak classifier” PLP is nonetheless able to detect
in-2004 1382908 13591473 21869 134 0.7013 an inherent community structure and produce a solution with
coPapersDBLP 540486 15245729 3299 1 0.8111 reasonable modularity values, although it cannot distinguish
eu-2005 862664 16138468 68963 1 0.6509
soc-pokec 1632804 22301964 14854 2 0.1223 communities in a Kronecker graph, which has a very weak
soc-LiveJournal 4847571 43369619 20334 1876 0.3667 community structure. To demonstrate strong scaling behavior,
kron_g500-simple... 1048576 44619402 131503 253380 0.2096
con-fiber_big 591428 46374120 5166 727 0.6024 we apply PLP to the large uk-2007-05 web graph and
europe-osm 50912018 54054660 13 1 0.0012 increase the number of threads from 1 to 32 (Figure 1). (Weak
com-orkut 3072627 117185083 33313 187 0.1735
uk-2002 18520486 261787258 194955 38359 0.6892 scaling results on PLP and PLM are shown in Figure 10.)
hyperbolic-268M 6710886 268851810 71585 1 0.7895 A speedup of about factor 8 is achieved when scaling from
uk-2007-05 105896555 3301876564 975419 756936 0.743
1 to 32 threads. Note that we have only 16 physical cores
Table II: Overview of graphs used in experiments and the step from 16 to 32 threads implies hyperthreading,
so that a lower speedup is expected. Our results indicate
that PLP can benefit from increased parallelism. Figure 13
heterogeneous data sets, which makes it impossible to pick an in the supplementary material breaks running times down by
ideal or generic instance from which to generalize. Our main iteration, showing that the vast majority of time is spent in the
test set is chosen such that it can be handled by competing first couple of iterations.
codes as well. It contains 20 networks from different domains.
500 10
With this test set we aim for generalizable results. Note
400 8
that the achievable modularity for a network depends on its

speedup
time [s]

300 6
size and inherent community structure, which may or may
200 4
not be distinctive, and varies widely among the instances.
100 2
The majority of test networks are taken from the collection
0 0
compiled for the 10th DIMACS Implementation Challenge4
1 2 4 8 16 32 1 2 4 8 16 32
as well as the Stanford Large Network Dataset Collection5 threads threads
and are freely available on the web. They are undirected,
unweighted graphs. Table II gives an overview over graph Figure 1: PLP strong scaling on the uk-2007-05 web graph
sizes as well as some structural features: A high maximum
node degree (maxdeg) indicates possible load balancing
issues. The number of connected components (comp) points B. Parallel Louvain Method (PLM)
to isolated single nodes or small groups of nodes. A high For PLM we observe only small deviations in quality
average local clustering coefficient (lcc) is an indicator for between single-threaded and multi-threaded runs, supporting
the presence of dense subgraphs. We evaluate solution quality the argument that the algorithm is able to correct undesirable
and running time for all of our own algorithms as well as decisions due to stale data. PLM detects communities with
several relevant competitors on this set. For those algorithms relatively high modularity in the majority of networks. Even
that can process in reasonable time the largest real-world large instances are processed in no more than a few minutes.
graph available to us, a web graph of the .uk domain with Figure 2 shows the scaling behavior of PLM. Since both
m ≈ 3.3 · 109 , we add further experiments (see Section V-H). the node move phase and the coarsening phase have been
To measure strong scaling, we run our parallel algorithms on parallelized, PLM profits from increased parallelism as well,
this web graph. achieving a speedup of factor 9 for 32 threads. In comparison
to PLP (Figure 6b), we observe that PLP can solve instances
V. E XPERIMENTS AND R ESULTS in only half the time required by PLM, but at a significant
loss of modularity. As discussed in Sec. VI, the communities
In this section we report on a representative subset of our detected by the two algorithms can be markedly different.
experimental results for our different parallel algorithms, as Because the Louvain method for community detection is well-
well as competing codes. Figures 6 and 7 (as well as Fig- known and accepted, we choose the performance of PLM as
ures 16 and 17 in the supplementary material) show running our baseline (Figure 6a) and present quality and running time
time and quality differences broken down by the networks of of other algorithms relative to PLM.
our test set. The bars of the charts are in ascending order
of graph size. We have selected a diverse test set and show C. Parallel Louvain Method with Refinement (PLMR)
4 DIMACS collection: http://www.cc.gatech.edu/dimacs10/downloads.shtml As shown by Figure 6c, adding a refinement phase generally
5 Stanford collection: http://snap.stanford.edu/data/index.html leads to a (sometimes significant) improvement in modularity.
8
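To put the strong-scaling numbers of Sections V-A and V-B into perspective, a speedup S obtained with p workers corresponds to a parallel efficiency of S/p. The following minimal Python sketch relates the approximate speedup factors reported in the text (about 8 for PLP and about 9 for PLM at 32 threads) to the 32 hardware threads and the 16 physical cores of the test machine. The function name is illustrative and not part of NetworKit.

```python
def parallel_efficiency(speedup, workers):
    """Parallel efficiency: achieved speedup divided by the number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Approximate speedup factors at 32 threads, as reported in the text.
reported_speedup = {"PLP": 8.0, "PLM": 9.0}

for name, s in reported_speedup.items():
    per_thread = parallel_efficiency(s, 32)  # relative to 32 hardware threads
    per_core = parallel_efficiency(s, 16)    # relative to 16 physical cores
    print(f"{name}: {per_thread:.2f} per thread, {per_core:.2f} per physical core")
```

Measured against physical cores rather than hardware threads, the efficiency is twice as high, which matches the remark that the step from 16 to 32 threads relies on hyperthreading.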

Figure 2: PLM strong scaling on the uk-2007-05 web graph (panels: running time [s] and speedup versus number of threads, 1 to 32)

This improvement is paid for by a small increase in running time. The results indicate that our proposed extension of the original Louvain method by a refinement phase can efficiently increase solution quality. We also evaluate the scaling behavior of each phase of the PLMR algorithm. In Figure 3 a yellow bar indicates the running time on the finest graph while the red bar stops at the total running time of the phase. Time spent on the finest graph clearly dominates all running times. Our experiments show that the move and refinement phases scale well with the number of threads, while the coarsening phase only partially profits from parallelization. The results on this graph are representative for the trend of the scaling behavior for the algorithm’s phases: Figure 4 shows speedup factors for each of the phases, aggregated over the test set of 20 graphs.

Figure 3: PLMR strong scaling of the move, coarsening and refinement phases (top to bottom) on uk-2007-05 (running time [s] versus number of threads, 1 to 32)

Figure 4: PLMR strong scaling of the move, coarsening and refinement phases (top to bottom) – speedup factors aggregated (speedup versus number of threads, 1 to 32)

D. Ensemble Preprocessing (EPP)

Figure 15 in the supplementary material demonstrates the effectiveness of the ensemble approach. Results were generated by an EPP instance with a 4-piece PLP ensemble and PLMR as final algorithm in comparison to a single PLP instance. We observe that the approach of EPP pays off in the form of improved modularity on most instances, exploiting differences in the base solutions and spending extra time on classifying contested nodes. For larger networks, this comes at a cost of about 5 times the running time of PLP alone. It also becomes clear that for small networks the approach does not pay off as running time becomes dominated by the overhead of the ensemble scheme. In comparison to PLM (Figure 6d), the ensemble approach can be slightly faster on some networks, but quality is slightly worse in most cases. We conclude that the ensemble technique EPP is effective in improving on the quality of a single algorithm. While somewhat lower in modularity, the communities detected are similar (see Sec. VI) to those of the Louvain method. In practice, our acceleration of the PLM algorithm has made the ensemble approach less relevant.

E. Comparison with State-of-the-Art Competitors

In this section we present results for an experimental comparison with several relevant competing community detection codes. These are mainly those which excelled in the DIMACS challenge either by solution quality or time to solution: The agglomerative algorithms CLU_TBB⁶ and RG, as well as CGGC and CGGCi,⁷ ensemble algorithms based on RG. We also include the widely used original sequential Louvain⁸ implementation, as well as the agglomerative algorithm CEL. In contrast to the DIMACS challenge, we run all codes on the same multicore machine (Tab. I) and measure time to solution for sequential and parallel ones alike.

⁶ CLU_TBB: http://www.staff.science.uu.nl/~faggi101/
⁷ RG etc: http://www.umiacs.umd.edu/~mov/
⁸ Louvain: https://sites.google.com/site/findcommunities/

a) Louvain: Although not submitted to the DIMACS competition, the original sequential implementation of the Louvain method is still relatively fast (Figure 7a). The marginally different modularity values in comparison to PLM may be caused by subtle differences in the implementation. For example, Louvain explicitly randomizes the order in which nodes are visited, while we rely on implicit randomization through parallelism. For the smallest graphs, running time values are missing because the implementation reported a running time of zero. Louvain eventually falls behind the parallel algorithm for large graphs, confirming that the overhead and complexity introduced by parallelism is eventually justified when we target massive datasets.

b) CLU_TBB and CEL: CLU_TBB, one of the few parallel entries in the DIMACS competition, is a very fast implementation of agglomerative modularity maximization, solving the larger instances more quickly than PLM (Figure 7b). Qualitatively however, PLM is clearly superior on most networks. Both in terms of modularity and running time, CLU_TBB occupies a middle ground between PLP and PLM, and is qualitatively very similar to our ensemble algorithm EPP. CEL, as another fast parallel program, produced consistently and significantly worse modularity than PLM, failed to produce a solution on some graphs, and is not as fast as PLP.

c) RG, CGGC and CGGCi: Ovelgönne and Geyer-Schulz entered the DIMACS challenge with an ensemble approach conceptually similar to what we have developed in this paper. Their base algorithm is the sequential agglomerative RG, and two ensemble variants exist: CGGC implements an ensemble technique very similar to EPP, while CGGCi iterates the approach. The RG algorithm achieves a high solution quality, surpassing PLM by a small margin on most networks (Figure 7c). Quality is again slightly improved by the ensemble approach CGGC and its iterated version CGGCi (Figure 7d, and Figure 16a in the supplementary material), with the latter surpassing any other heuristic known to us. However, all three are very expensive in terms of computation time, often taking orders of magnitude longer than PLM. We consider running times of several hours for many of our networks no longer viable for the scenario we target, namely interactive data analysis on a parallel workstation.

F. Pareto Evaluation

We have so far presented results broken down by data set to stress that observed effects may vary strongly from one network to another, a sign of the heterogeneity of real-world complex networks. Additionally, we want to give a condensed picture of the results. For this purpose we use the previous experimental data to compute a score for running time and solution quality. The time score is the geometric mean of running time ratios over our test set of networks with the running time of PLM as the baseline, while the modularity score is the arithmetic mean of absolute modularity differences. Figure 5 shows the resulting points. It becomes clear that all algorithms except CEL and EPP are placed on or close to the Pareto frontier. PLP is unrivaled in terms of time to solution, but solution quality is suboptimal. In the middle ground between label propagation and Louvain method, the parallel CLU_TBB achieves about the same modularity as the ensemble approach but beats it in terms of speed. PLM and PLMR emerge as qualitatively strong and fast candidates, closest to the lower right corner. (Their more memory-efficient implementation PLM* is about a factor of 2 slower.) It is also evident that our extended version PLMR can improve solution quality for a small computational extra charge. We recommend both PLM and PLMR as the default algorithms for parallel community detection in large networks. The original sequential implementation of the Louvain method is thus no longer on the Pareto frontier since it cannot benefit from multicore systems. RG and its ensemble combinations have the best modularity scores by a narrow margin, while they are by far the most computationally expensive ones, which places them outside of the application scenario we target.

Figure 5: Pareto evaluation of community detection algorithms (time score on a logarithmic axis versus modularity score, for PLP, PLM, PLM*, PLMR, EPP, CLU_TBB, CEL, Louvain, RG, CGGC and CGGCi)

G. LFR Benchmark

The LFR benchmark [19] is an established method for evaluating community detection algorithms: A generator produces graphs that resemble real complex networks and contain dense communities, which are connected to each other more sparsely the lower the mixing parameter µ. Algorithm performance is measured as the accuracy in recognizing the ground truth communities supplied by the generator, in view of increasing difficulty (µ). Although there are real-world networks that come with supposed ground truth communities (e. g. interest-based groups of online social networks in the SNAP collection), we consider only a synthetic ground truth reliable enough for our purposes. In Figure 8 we plot the agreement (graph-structural Rand index, where 1 is complete agreement) between detected and ground truth communities for our algorithms, and show that the PLM method is able to detect the ground truth even with strong noise (µ = 0.8), while PLP (and hence EPP) is somewhat less robust.
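The two scores defined for the Pareto evaluation in Section V-F can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code used for the paper: the modularity score is taken here as the mean of signed differences (algorithm minus baseline), the running times and modularity values are made-up toy numbers, and all function and variable names are hypothetical.

```python
import math


def time_score(times, baseline_times):
    """Geometric mean of running time ratios relative to the baseline."""
    ratios = [t / b for t, b in zip(times, baseline_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def modularity_score(mods, baseline_mods):
    """Arithmetic mean of modularity differences relative to the baseline."""
    diffs = [m - b for m, b in zip(mods, baseline_mods)]
    return sum(diffs) / len(diffs)


def pareto_frontier(points):
    """Names of all points that are not dominated by another point.

    points maps a name to (modularity_score, time_score); a higher
    modularity score and a lower time score are better.
    """
    frontier = []
    for name, (q, t) in points.items():
        dominated = any(
            q2 >= q and t2 <= t and (q2 > q or t2 < t)
            for other, (q2, t2) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Toy data for three hypothetical algorithms on three networks.
baseline_times = [2.0, 8.0, 4.0]
baseline_mods = [0.80, 0.60, 0.90]
runs = {
    "baseline": (baseline_times, baseline_mods),
    "fast": ([1.0, 2.0, 4.0], [0.70, 0.50, 0.85]),
    "slow": ([4.0, 16.0, 8.0], [0.70, 0.60, 0.80]),
}
points = {
    name: (modularity_score(m, baseline_mods), time_score(t, baseline_times))
    for name, (t, m) in runs.items()
}
frontier = pareto_frontier(points)
print(points)
print(frontier)
```

In this toy setting the baseline scores (0, 1) by construction. The hypothetical "slow" algorithm is dominated by the baseline, while "fast" stays on the frontier because it trades modularity for running time.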

Figure 6: Performance of our algorithms in comparison: PLM serves as the baseline. 32 threads used. (Bar charts per network of the test set. Panel (a) PLM: absolute quality and speed serve as baseline for comparison, shown as modularity and time [s]. Panels (b) PLP, (c) PLMR and (d) EPP(4, PLP, PLMR): modularity difference and time ratio relative to PLM.)
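As a reminder of how the ensemble scheme shown in Figure 6d forms its core communities (Section III, Equation III.2), the hash-based combination step can be sketched as follows. This is a minimal Python illustration, assuming that the b community identifiers of a node are fed to a djb2-style hash one integer at a time. The original djb2 function operates on the characters of a string, so this adaptation, the 64-bit masking and the function names are assumptions rather than the NetworKit implementation.

```python
def djb2(values):
    """djb2-style hash over a sequence of non-negative integers (64-bit)."""
    h = 5381
    for v in values:
        h = (h * 33 + v) & 0xFFFFFFFFFFFFFFFF
    return h


def combine(base_solutions):
    """Map each node's tuple of base community ids to a core community id.

    base_solutions is a list of b lists, each assigning a community id to
    every node. Two nodes receive the same core id only if they agree in
    every base solution (up to hash collisions).
    """
    n = len(base_solutions[0])
    # Each node is handled independently, so this loop is trivially parallel.
    return [djb2(zeta[v] for zeta in base_solutions) for v in range(n)]


# Toy ensemble of b = 3 base solutions on 6 nodes (made-up labels).
zeta_1 = [0, 0, 0, 1, 1, 1]
zeta_2 = [4, 4, 7, 7, 9, 9]
zeta_3 = [2, 2, 2, 2, 3, 3]
core = combine([zeta_1, zeta_2, zeta_3])
print(core)
```

Nodes 0 and 1 agree in all three base solutions and therefore share a core community, as do nodes 4 and 5. Nodes 2 and 3 each disagree with their neighbors in at least one base solution and end up as singletons, which marks them as contested.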

Figure 7: Performance of competitors relative to baseline PLM. 32 threads used for CLU_TBB. (Bar charts per network of the test set, showing modularity difference and time ratio relative to PLM. Panels: (a) Louvain, (b) CLU_TBB, (c) RG, (d) CGGC.)

Figure 8: LFR benchmark (n = 10^5): accuracy in recognizing ground truth while increasing inter-community edges (accuracy versus mixing parameter µ, for PLMR, PLM, PLP and EPP)

H. One More Massive Network

In addition to the experiments that went into the Pareto evaluation, we run our parallel algorithms on the web graph uk-2007-05, at about 3.3 billion edges the largest real-world data set currently available to us. CLU_TBB fails at reading the input file. This leaves us with five of our own parallel algorithms for Figure 9: EPP(4,PLP,PLMR) takes about 219 seconds, while PLM requires about 156 seconds to arrive at a slightly higher modularity. As expected, PLP is by far the fastest algorithm and terminates in less than a minute. If a certain modularity loss (here 0.02) is acceptable, PLP is also an appropriate choice for quickly detecting communities in billion-edge networks. The processing rate for PLP is over 53M edges/second and over 21M edges/second for PLM with respect to a complete run of each algorithm. These rates confirm the suitability of our algorithms for analyzing massive complex networks on a commodity shared-memory server.

Figure 9: Modularity and running time at 32 threads for our parallel algorithms on the massive web graph uk-2007-05 (modularity axis from 0.90 to 1.00. Running times as labeled in the figure: PLP 52 s, PLMR 168 s, PLM* 203 s, PLM 156 s, EPP(4,PLP,PLMR) 219 s)

I. Weak Scaling

For weak scaling experiments, we use a series of synthetic graphs where each graph has twice the size of its predecessor (from log m = 25 . . . 30), and double the number of threads simultaneously from 1 to 32. The graphs were created using a generator [22] based on a unit-disk graph model in hyperbolic geometry [17] (HUD), which produces both a power law degree distribution and distinctive dense communities. Figure 10 shows the results of weak scaling experiments for PLP and PLM. It must be noted that perfect scaling cannot be expected due to the complex structure of the input. The results of the respective last column have been obtained with hyperthreading, which explains the steeper increase. Fig. 14 in the supplementary material shows results for additional weak scaling experiments on synthetic graphs generated with the R-MAT model.

Figure 10: PLP (left) and PLM (right) weak scaling on the series of HUD graphs (running time [s] versus log(m), 25 to 30)

VI. QUALITATIVE ASPECTS

In this work we concentrate on achieving a good tradeoff between high modularity, a widely accepted quality measure for community detection, and low running time. Ideally one should also look for further validation of the detected communities beyond good modularity. This is a difficult task for several reasons. For most networks, we do not have a reliable ground-truth partition, especially because community structure is likely a multi-factorial phenomenon in real networks. Our task is to uncover the hidden community structure of the network. In order to know whether we have succeeded in this data mining task, we would have to check whether the solution helps us to formulate hypotheses to predict and explain real-world phenomena on the basis of network data. Whether one solution is more appropriate than another may strongly depend on the domain of the network. Domain-specific validation of this kind goes beyond the scope of this paper as we focus on parallelization aspects. Also, most sequential counterparts of our algorithms have been validated before, see e. g. [5].

However, we give an example to illustrate differences between our algorithms in a more qualitative way. Coarsening the input graph according to the detected communities yields a community graph, which we then visualize by drawing the size of nodes proportional to the size of the respective community. Figure 11 shows community graphs for the PGPgiantcompo graph, a social network and web of trust resulting from signatures on PGP keys. From top to bottom, the solutions were produced by PLP, PLM, PLMR and EPP(4, PLP, PLMR). It is apparent that PLP has a much finer resolution and detects ca. 1000 small communities. This is true for most of our data sets, but the inverse case also appears. On this network, higher modularity is associated with coarser resolution. PLM, PLMR and EPP(4, PLP, PLMR) have a very similar resolution and divide the network into ca. 100 communities. While PGPgiantcompo is admittedly a very small graph, this example shows how community detection can help to reduce the complexity of networks for visual representation.
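The construction of such a community graph can be sketched as follows. This is a minimal Python illustration rather than the NetworKit implementation: the edge-list input format, the convention of storing intra-community weight as a self-loop, and all names are assumptions made for the example.

```python
from collections import defaultdict


def community_graph(edges, zeta):
    """Coarsen an undirected graph according to a community assignment.

    edges: iterable of (u, v, weight) tuples
    zeta:  dict mapping each node to its community id
    Returns (sizes, coarse_edges). sizes maps a community to its number of
    member nodes (usable as the drawn node size). coarse_edges maps a pair
    (c1, c2) with c1 <= c2 to the accumulated edge weight; c1 == c2 is a
    self-loop carrying the intra-community weight.
    """
    sizes = defaultdict(int)
    for c in zeta.values():
        sizes[c] += 1
    coarse_edges = defaultdict(float)
    for u, v, w in edges:
        cu, cv = zeta[u], zeta[v]
        coarse_edges[(min(cu, cv), max(cu, cv))] += w
    return dict(sizes), dict(coarse_edges)


# Toy input: two triangles joined by a single edge, unit weights.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
zeta = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
sizes, coarse = community_graph(edges, zeta)
print(sizes)
print(coarse)
```

For the toy input the community graph has two nodes of size 3 each, connected by an edge of weight 1, with a self-loop of weight 3 on each node.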

VII. CONCLUSION AND FUTURE WORK

We have developed and implemented several parallel algo-
rithms for community detection, a common and challenging
task in network analysis. Successful techniques and param-
eter settings have been identified in extensive experiments
on synthetic and real-world networks. They include three
standalone parallel algorithms, all of which are placed on the
Pareto frontier with respect to running time and modularity
in an experimental comparison with other state-of-the-art
implementations. While the PLP label propagation algorithm
is extremely fast, its solution might not always be satisfactory
for some applications. PLM is to the best of our knowledge
the first parallel variant of the established Louvain algorithm
which can handle massive inputs. On our machine, it detects
high-quality communities in a network with 3.3 billion edges
in under 3 minutes using 32 threads. Achieving significant par-
allel speedups over the frequently used sequential algorithm,
it can accelerate analysis workflows now and even further
on future multicore systems. Our modification PLMR of this
method adds a refinement phase which enhances modularity
for a small increase in running time.
Our implementations are published as a component of Net-
worKit [36], an open-source package of performant implemen-
tations for established and novel network analysis algorithms.
We invite researchers in algorithm engineering and network
science to benefit from our software development efforts and
to consider contributing to the project. NetworKit is under active
development by several researchers and may be extended by
additional community detection methods in the future, e. g.
considering overlapping communities as well.

Acknowledgements: This work was partially supported by the

project Parallel Analysis of Dynamic Networks — Algorithm Engi-
neering of Efficient Combinatorial and Numerical Methods, which
is funded by the Ministry of Science, Research and the Arts Baden-
Württemberg. We thank Pratistha Bhattarai for her contributions to
the experimental study, Michael Hamann for optimizations to PLM,
and numerous contributors to the NetworKit project.

© 2015 IEEE. Citation information: DOI 10.1109/TPDS.2015.2390633, IEEE Transactions on Parallel and Distributed Systems. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7006796

Figure 11: Community graphs of the PGPgiantcompo web of trust for (top to bottom) PLP, PLM, PLMR and EPP(4, PLP, PLMR)

REFERENCES

[1] D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Eds., Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics. AMS and DIMACS, 2013, no. 588.
[2] J. W. Berry, B. Hendrickson, R. A. LaViolette, and C. A. Phillips, “Tolerating the community detection resolution limit with edge weighting,” Physical Review E, vol. 83, no. 5, p. 056119, 2011.
[3] S. Bhowmick and S. Srinivasan, “A template for parallelizing the Louvain method for modularity maximization,” in Dynamics On and Of Complex Networks, Volume 2. Springer, 2013, pp. 111–124.
[4] C. Bichot and P. Siarry, Graph Partitioning, ser. ISTE. Wiley, 2011.
[5] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
[6] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Trans. Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188, 2008.
[7] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz, “Recent advances in graph partitioning,” arXiv preprint arXiv:1311.3144, 2014.
[8] A. Clauset, M. E. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 066111, 2004.
[9] B. O. Fagginger Auer and R. H. Bisseling, “Graph coarsening and clustering on the GPU,” in Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics. AMS and DIMACS, 2013, no. 588.
[10] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
[11] S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proceedings of the National Academy of Sciences, vol. 104, no. 1, pp. 36–41, 2007.
[12] U. Gargi, W. Lu, V. S. Mirrokni, and S. Yoon, “Large-scale community detection on youtube for topic discovery and exploration,” in International Conference on Weblogs and Social Media, 2011.
[13] J. R. Gilbert, S. Reinhardt, and V. B. Shah, “High-performance graph algorithms from parallel sparse matrices,” in Applied Parallel Computing. State of the Art in Scientific Computing. Springer, 2007, pp. 260–269.
[14] M. Girvan and M. Newman, “Community structure in social and biological networks,” Proc. of the National Academy of Sciences, vol. 99, no. 12, p. 7821, 2002.
[15] P. F. Jonsson, T. Cavanna, D. Zicha, and P. A. Bates, “Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis,” BMC Bioinformatics, vol. 7, p. 2, 2006.
[16] K. Kothapalli, S. Pemmaraju, and V. Sardeshmukh, “On the analysis of a label propagation algorithm for community detection,” in Distributed Computing and Networking, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, vol. 7730, pp. 255–269.
[17] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, “Hyperbolic geometry of complex networks,” Physical Review E, vol. 82, p. 036106, Sep 2010.
[18] R. Lambiotte, “Multi-scale modularity in complex networks,” in Modeling and optimization in mobile, ad hoc and wireless networks (WiOpt), 2010 Proceedings of the 8th International Symposium on. IEEE, 2010, pp. 546–553.
[25] M. Müller-Hannemann and S. Schirra, Eds., Algorithm Engineering: Bridging the Gap between Algorithm Theory and Practice, ser. Lecture Notes in Computer Science, vol. 5971. Springer, 2010.
[26] M. Ovelgönne, “Distributed community detection in web-scale networks,” in Proc. Advances in Social Networks Analysis and Mining (ASONAM ’13), 2013, pp. 66–73.
[27] M. Ovelgönne and A. Geyer-Schulz, “An ensemble learning strategy for graph clustering,” in Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics. AMS and DIMACS, 2013, no. 588.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E, vol. 76, no. 3, p. 036106, 2007.
[29] E. J. Riedy and D. A. Bader, “Multithreaded community monitoring for massive streaming graph data,” in Workshop on Multithreaded Architectures and Applications (MTAAP’13), 2013, pp. 1646–1655.
[30] E. J. Riedy, H. Meyerhenke, D. Ediger, and D. A. Bader, “Parallel community detection for massive graphs,” in Graph Partitioning and Graph Clustering, ser. Contemporary Mathematics. AMS and DIMACS, 2013, no. 588.
[31] R. Rotta and A. Noack, “Multilevel local search algorithms for modularity clustering,” J. Exp. Algorithmics, vol. 16, pp. 2.3:2.1–2.3:2.27, Jul. 2011.
[32] S. E. Schaeffer, “Graph clustering,” Computer Science Review, vol. 1, no. 1, pp. 27–64, 2007.
[33] J. Soman and A. Narang, “Fast community detection algorithm with GPUs and multicore architectures,” in Proc. 25th IEEE Intl. Parallel & Distributed Processing Symposium (IPDPS). IEEE, 2011, pp. 568–579.
[34] C. Staudt, A. Schumm, H. Meyerhenke, R. Görke, and D. Wagner, “Static and dynamic aspects of scientific collaboration networks,” in Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on. IEEE, 2012, pp. 522–526.
[35] C. L. Staudt and H. Meyerhenke, “Engineering high-performance community detection heuristics for massive graphs,” in Proceedings of the 2013 International Conference on Parallel Processing. Conference Publishing Services (CPS), 2013.
[36] C. L. Staudt, A. Sazonovs, and H. Meyerhenke, “NetworKit: An interactive tool suite for high-performance network analysis,” arXiv preprint arXiv:1403.3005, 2014.
[37] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” in Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012, p. 3.

Christian L. Staudt received his Diplom degree in computer science from Karlsruhe Institute of Technology (KIT) in 2012. He is currently a researcher and PhD candidate in the Parallel Computing Group, Institute of Theoretical Informatics, KIT. His research focuses on developing efficient algorithms and software for the analysis of large complex networks. Beyond that, he is interested in how network analysis methods can enable the study of complex systems in various domains.
[19] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for Henning Meyerhenke is an Assistant Professor
testing community detection algorithms,” Physical Review E, vol. 78, (Juniorprofessor) in the Institute of Theoretical
no. 4, p. 046110, 2008. Informatics at Karlsruhe Institute of Technol-
[20] D. LaSalle and G. Karypis, “Multi-threaded graph partitioning,” in Proc.
ogy (KIT), Germany, since October 2011. Be-
27th IEEE Intl. Symposium on Parallel and Distributed Processing
(IPDPS 2013). IEEE Computer Society, 2013, pp. 225–236.
fore joining KIT, Henning was a Postdoctoral
[21] ——, “Multi-threaded modularity based graph clustering using the Researcher in the College of Computing at
multilevel paradigm,” Journal of Parallel and Distributed Computing, Georgia Institute of Technology (USA) and at
2014, http://dx.doi.org/10.1016/j.jpdc.2014.09.012. the University of Paderborn (Germany) as well
[22] M. von Looz, C. L. Staudt, H. Meyerhenke, and R. Prutkin, as a Research Scientist at NEC Laboratories
“Fast generation of dynamic complex networks with underlying Europe. Henning received his Diplom degree
hyperbolic geometry,” Karlsruhe Institute of Technology, Karlsruhe in Computer Science from Friedrich-Schiller-
Reports in Informatics 2014,14, November 2014. [Online]. Available: University Jena, Germany, in 2004 and his Ph.D.
http://digbib.ubka.uni-karlsruhe.de/volltexte/1000043881 in Computer Science from the University of
[23] P. D. Meo, E. Ferrara, G. Fiumara, and A. Provetti, “Mixing local
Paderborn, Germany, in 2008. Dr. Meyerhenke’s main research in-
and global information for community detection in large networks,” J.
Comput. Syst. Sci., vol. 80, no. 1, pp. 72–87, 2014. terests are efficient sequential and parallel algorithms and tools for
[24] H. Meyerhenke, P. Sanders, and C. Schulz, “Parallel graph partitioning applications in network analysis, combinatorial scientific computing,
for complex networks,” in Proc. 29th IEEE Intl. Symposium on Parallel and the life sciences.
and Distributed Processing (IPDPS), 2015, to appear.

APPENDIX

[Plot omitted. Legend: active, updated. Logarithmic y-axis from 10^0 to 10^8; x-axis from 0 to 120.]
Figure 12: Number of active and updated labels per iteration of PLP for the web graph uk-2002.

[Plot omitted. x-axis: iteration; y-axis: running time [ms], logarithmic from 10^0 to 10^5.]
Figure 13: PLP running time in milliseconds per iteration for the uk-2007-05 web graph, at 32 threads.

[Plots omitted. Two panels; x-axis: log(n), from 19 to 24; y-axis: time [s].]
Figure 14: PLP (left) and PLM (right) weak scaling on the series of R-MAT graphs.

[Plots omitted. Left panel x-axis: modularity difference; right panel x-axis: time ratio (logarithmic).]
Figure 15: Difference in quality (left) and running time ratio (right) of EPP(4,PLP,PLMR) compared to a single PLP.

[Plots omitted. Panel (a) CGGCi; x-axes: modularity difference and time ratio.]
Figure 16: Performance of the competitor algorithm CGGCi relative to baseline PLM.

[Plots omitted. Panel (a) PLM*; x-axes: modularity difference and time ratio (logarithmic).]
Figure 17: Performance of our PLM* algorithm relative to baseline PLM.
