Dynamic Community Evolution Analysis Framework for Large-Scale Complex Networks Based on Strong and Weak Events
Abstract—Community evolution remains a heavily researched and challenging area in the analysis of dynamic complex network structures. Currently, the primary limitation of traditional event-based approaches for community evolution analysis is the lack of strict constraint conditions for distinguishing evolutionary events, which entails that as the cardinality of discovered events increases, so does the number of redundant events. Another limitation of existing approaches is the lack of consideration for weak events. Weak events can be generated by small changes in communities, which are empirically prevalent, and are typically not captured by traditional events. To manage these two aforementioned limitations, this research aims to formalize a weak and strong events-based framework, which includes the following newly discovered events: “weak shrink,” “weak expand,” “weak merge,” and “weak split.” Predicated on the community overlapping degree and community membership degree, this article refines the traditional strong events and introduces new constraints for weak events. In addition, a community evolution mining framework, which is based on both strong and weak events, is proposed and denoted as the weak-event-based community evolution method (WECEM). The framework can be summarized as follows: 1) communities in complex networks with adjacent timestamps are compared to determine the community overlapping degree and community membership degree; 2) the values of the community overlapping degree and membership degree are checked against the definitions of events; and 3) weak events are effectively identified. Extensive experimental results, on real and synthetic data sets consisting of dynamic complex networks and online social networks, demonstrate that WECEM is able to identify weak events more effectively than traditional frameworks. Specifically, WECEM outperforms traditional frameworks by 22.9% in the number of discovered strong events. The detection accuracy of evolutionary events is approximately 12.2% higher than that of traditional event-based frameworks. It is also worth noting that, as the cardinality of the data grows, the proposed framework, when compared with traditional frameworks, can more effectively and efficiently mine large-scale complex networks.
Index Terms—Community detection, community evolution
analysis, complex networks, event-based framework, weak events.
Manuscript received March 28, 2019; revised September 23, 2019; accepted December 12, 2019. Date of publication January 7, 2020; date of current version September 16, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61772091, Grant 61802035, Grant 61962006, and Grant 71701026, in part by the Sichuan Science and Technology Program under Grant 2018JY0448, Grant 2019YFG0106, and Grant 2019YFS0067, in part by the Natural Science Foundation of Guangxi under Grant 2018GXNSFDA138005, and in part by the Guangdong Province Key Laboratory of Popular High Performance Computers under Grant 2017B030314073. This article was recommended by Associate Editor J. Lu. (Corresponding author: Nan Han.)
Shaojie Qiao is with the School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China.
Nan Han is with the School of Management, Chengdu University of Information Technology, Chengdu 610103, China (e-mail: [email protected]).
Yunjun Gao is with the College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China.
Rong-Hua Li is with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China, also with the Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen 518060, China, and also with the Guangdong Province Engineering Center of China-made High Performance Data Computing System, Shenzhen University, Shenzhen 518060, China.
Jianbin Huang is with the School of Computer Science and Technology, Xidian University, Xi’an 710071, China.
Heli Sun is with the School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China.
Xindong Wu is with the Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084, China, and also with the Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei 230009, China.
Digital Object Identifier 10.1109/TSMC.2019.2960085
2168-2216 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: VIT University. Downloaded on March 11,2024 at 04:49:13 UTC from IEEE Xplore. Restrictions apply.
6230 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 51, NO. 10, OCTOBER 2021

I. INTRODUCTION
COMPLEX networks are distinct from simple networks, such as lattices or random graphs, in that they have nontrivial topological features. These complex networks are ubiquitous in our everyday lives, and examples include online social networks, such as Facebook and Twitter. Community structures are generally inherent in complex networks, and can be best exemplified by groups of nodes within which the network connections are dense, but between which connections are sparser [1]. Community structures, representing a mesoscale structure of networks, accordingly, are viewed as one of the most important characteristics of complex networks. Such structures provide immense social and economic value in understanding the processes of network formation, growth and shrinkage, information dissemination, and public opinion analysis. Dynamic analysis of complex networks, especially assessing the evolution of communities, can provide insights into: 1) detecting a drastic change in the interaction patterns; 2) understanding the latent structures of complex networks; and 3) forecasting the future trends of networks [2], [3].
Motivation: In real-world applications, community structures represent a dynamically changing phenomenon. Accordingly, communities are in constant flux: growing,
shrinking, emerging, and disappearing altogether. Examples include human migration in social networks, seasonal animal migrations, and topic transfers in blogs. Due to the cardinality of nodes in dynamic complex networks, static community discovery approaches cannot be effectively applied to analyze this evolution of communities. Thus, it follows that research in this area remains fundamentally important in precision marketing, crime prevention, traffic flow forecasting, and network congestion prediction. The following example illustrates the importance of research in community evolution.
Example 1: Sina microblog is the largest blogging system in China. The distribution of the userbase is in constant flux, with users joining and leaving on a regular basis. This dynamic is indicative of a constantly changing network structure. In addition, the userbase represents a multitude of interests on a vast array of topics, which in itself contributes to the ever-changing structure. Mining communities with such a userbase can provide insight and understanding into how the networks change, and identify users’ points of interest. This knowledge can then be applied to social networking platforms to recommend services across various communities.
Challenges: Current research relevant to mining complex and dynamic networks has focused on event-based frameworks, and variants, as proposed by Asur et al. [4]. However, the existing methods have the following disadvantages.
1) Defining an evolutionary event is not straightforward. When a constraint condition is loosely defined, the number of detected events grows quickly, leading to a large number of redundant events.
2) Traditional event-based frameworks work ideally given a single type of event in one community over a given period of time. However, in practice, multiple events often occur within a community simultaneously. In addition, traditional methods are poorly equipped to deal with weak events, which can be defined as events triggered by small changes in the community. These events are not considered evolutionary due to the strict constraints of strong events. Finally, traditional methods generally return low accuracy in evolutionary event discovery.
3) The high time complexity of traditional event-based frameworks makes it difficult to efficiently implement discovery in large-scale dynamic complex networks.
Contributions: In an effort to improve the efficiency and accuracy of traditional event-based frameworks for discovering community evolution, this article focuses on the detection of various types of events occurring in the same communities, as well as the discovery of comparatively higher quality strong events. In this article, a new method for community evolution analysis in mining dynamic networks is proposed, which is based on newly discovered weak events. The contributions of this article are given as follows.
New Concept: We introduce a new concept of weak event based on the community overlapping degree and membership degree. According to this concept, we refine the events of community evolution, and then improve the traditional event-based framework for community evolution.
New Framework: Evolutionary events are categorized into strong and weak events, and the application scenarios of weak events are introduced. Weak events are formally defined, as well as a method for determining such events. In addition, a weak event-based mining framework is proposed, referred to as WECEM.
Extensive Experimental Results: Extensive experiments are conducted in real and synthetic large-scale dynamic networks. We compare the proposed WECEM framework with state-of-the-art community evolution discovery methods to verify the quality, mining accuracy, and runtime performance.

II. RELATED WORK
In recent years, due to the popularity and ubiquity of online social networks and large-scale complex networks, community discovery and evolutionary event mining have attracted a lot of attention [5]–[8]. Most specifically, dynamic networks operate as a powerful signal for forecasting the behavior of individuals, route planning, personalized recommendations, and so on. Within this field, community detection remains fundamental in community evolution analysis. Existing research has done much to progress the field of study, as noted in the following examples. The modularity-based approach [1], [9] is widely used for discovering communities in complex networks, and several improved algorithms [10] have been proposed successively, e.g., Fast-Newman [9], and the Clauset, Newman, and Moore’s algorithm (CNM) [11]. Recently, Hao et al. [12] proposed a technique which integrates formal concept analysis with the clique percolation method, and works to improve the accuracy of community discovery. Mahmood et al. [13] combined complex networks and spatial data mining techniques by mapping network nodes into a geometric space and encoding the position of each node with its geodesic distances from all the other nodes. Palla et al. [14] proposed a clique percolation method that effectively identifies overlapping communities in complex networks. Parsa et al. [15] presented a new method for detecting communities based on an estimation of distribution algorithm. Bouguessa et al. [16] aggregated similar nodes to form small communities, then iteratively combined these small communities until a maximum modularity is reached. Lyzinski et al. [17] obtained a low-dimensional representation by mapping networks to a Euclidean space.
Recent research has been focused on predicting the trend of network development. The most common approach is the event-based framework proposed by Asur et al. [4]. It first detects communities over time, and then mines evolutionary events by comparing overlapping and membership degrees of communities relative to time. Takaffoli et al. [2], [18] formally defined a series of events for community evolution, and proposed a community matching algorithm to identify similar communities. In addition, the concept of “meta-community” was proposed, which entails a series of similar communities with different timestamps. İlhan and Öguducu [19] used the autoregressive integrated moving average model to predict how particular community features change on the next time horizon. Zhu et al. [20] proposed a multi-mode co-clustering approach to detect the hierarchical and
QIAO et al.: DYNAMIC COMMUNITY EVOLUTION ANALYSIS FRAMEWORK FOR LARGE-SCALE COMPLEX NETWORKS 6231
TABLE I
DESCRIPTION OF IMPORTANT SYMBOLS

Fig. 2. Examples of weak events in community evolution. (a) Weak shrink. (b) Weak expand. (c) Weak split. (d) Weak merge.

$C_t^2$ at $t$. The formal definition of these four weak events is described in Section III-B2.
Definition 4 (Community Overlapping Degree): Given community $C_t^p$ at time $t$ and community $C_{t+1}^q$ at time $t+1$, the community overlapping degree of these two communities is defined as the proportion of the number of nodes in the intersection of communities to the number of nodes in the union of communities, as follows [30]:
$$O\left(C_t^p, C_{t+1}^q\right) = \frac{\left|C_t^p \cap C_{t+1}^q\right|}{\left|C_t^p \cup C_{t+1}^q\right|}. \tag{1}$$
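Equation (1) is the Jaccard index of the two node sets. As a minimal sketch, assuming each community is represented as a Python set of node identifiers (the function name `overlap_degree` and the handling of two empty communities are illustrative choices, not part of the original framework), it could be computed as follows.

```python
from typing import Hashable, Set


def overlap_degree(c_t: Set[Hashable], c_next: Set[Hashable]) -> float:
    """Community overlapping degree O of Eq. (1): |A ∩ B| / |A ∪ B|."""
    union = c_t | c_next
    if not union:
        # Assumption: two empty communities are treated as non-overlapping.
        return 0.0
    return len(c_t & c_next) / len(union)


if __name__ == "__main__":
    # Two nodes are shared out of six distinct nodes, so O = 2/6.
    print(overlap_degree({1, 2, 3, 4}, {3, 4, 5, 6}))
```

Identical communities give a value of 1 and disjoint communities give 0, so thresholds on O are naturally chosen in (0, 1].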
highly overlapped and share many similar nodes. This phenomenon is called “Remain,” and can be formalized by the following formula:
$$R\left(C_t^p, C_{t+1}^q\right) = 1 \ \text{iff}\ \exists\, C_{t+1}^q \in \mathcal{C}_{t+1},\ O\left(C_t^p, C_{t+1}^q\right) \ge \theta \tag{3}$$
where $\mathcal{C}_{t+1}$ represents a dynamic network at time $t+1$.
Definition 7 (Form): If the community overlapping degree between community $C_t^p$ and community $C_{t-1}^q$ at time $t-1$ is very low, that is, the community $C_t^p$ at time $t$ has no relationship with other communities at time $t-1$, then a “Form” event occurs. This event can be formalized as
$$F\left(C_{t-1}^q, C_t^p\right) = 1 \ \text{iff}\ \forall\, C_{t-1}^q \in \mathcal{C}_{t-1},\ O\left(C_t^p, C_{t-1}^q\right) < \theta. \tag{4}$$
Definition 8 (Disappear): If the community overlapping degree between community $C_t^p$ and community $C_{t+1}^q$ at $t+1$ is very low, that is, $C_t^p$ at time $t$ has no relationship with other communities at time $t+1$, then a “Disappear” event occurs, which can be modeled as
$$D\left(C_t^p, C_{t+1}^q\right) = 1 \ \text{iff}\ \forall\, C_{t+1}^q \in \mathcal{C}_{t+1},\ O\left(C_t^p, C_{t+1}^q\right) < \theta. \tag{5}$$
Definition 9 (Expand): If community $C_t^p$ at time $t$ belongs to another community at $t+1$, denoted by $C_t^p \subset C_{t+1}^q$, and the number of nodes in $C_t^p$ is less than that of $C_{t+1}^q$, we say that $C_t^p$ “Expands” at time $t+1$, by the following formula:
$$E\left(C_t^p, C_{t+1}^q\right) = 1 \ \text{iff}\ \exists\, C_{t+1}^q \in \mathcal{C}_{t+1},\ 1-\gamma \le S\left(C_{t+1}^q, C_t^p\right) < 1. \tag{6}$$
Definition 10 (Shrink): If community $C_{t+1}^q$ at time $t+1$ belongs to community $C_t^p$ at time $t$, and the number of nodes in $C_{t+1}^q$ is less than that of $C_t^p$, then a “Shrink” event occurs at time $t+1$, which is described as follows:
$$SH\left(C_t^p, C_{t+1}^q\right) = 1 \ \text{iff}\ \exists\, C_t^p \in \mathcal{C}_t,\ 1-\gamma \le S\left(C_t^p, C_{t+1}^q\right) < 1. \tag{7}$$
Because $\gamma \in (0, 1]$, in order to detect Expand and Shrink events effectively, the lower bound on $S$ in the Expand and Shrink constraints is set to $1-\gamma$ rather than $\gamma$, so that the value of $S$ only changes within a limited range.
Definition 11 (Split): If there are $k\,(>1)$ communities $X=\{C_{t+1}^q, \ldots, C_{t+1}^{q+k}\}$ at time $t+1$, each community in $X$ almost belongs to community $C_t^p$, and the community overlapping degree between the union of communities in $X$ and $C_t^p$ is very high, then $C_t^p$ is viewed to “split” into different communities as follows:
$$SP\left(C_t^p, X\right) = 1 \ \text{iff}\ \begin{cases} O\left(C_t^p, X\right) \ge \xi \\ S\left(C_{t+1}^i, C_t^p\right) \ge \xi \quad \forall\, C_{t+1}^i \in X. \end{cases} \tag{8}$$
Definition 12 (Merge): If there are many communities $Y=\{C_t^p, \ldots, C_t^{p+k}\}$ at time $t$, each community in $Y$ almost belongs to community $C_{t+1}^q$, and the community overlapping degree between the union set of communities in $Y$ and $C_{t+1}^q$ is very high, then a “merge” event occurs, which is represented by the following equation:
$$M\left(C_{t+1}^q, Y\right) = 1 \ \text{iff}\ \begin{cases} O\left(Y, C_{t+1}^q\right) \ge \xi \\ S\left(C_t^i, C_{t+1}^q\right) \ge \xi \quad \forall\, C_t^i \in Y. \end{cases} \tag{9}$$
In Definitions 6–12, the parameters $\theta$, $\gamma$, and $\xi$ are tuned by experiments in order to discover as many events as possible. Different from traditional event-based frameworks, the WECEM framework uses these three parameters to control the occurring conditions of each event.
Observation 1: Strong events can be classified into the following types.
1) Form, Disappear, and Remain events involve the existence of communities.
2) Shrink and Expand events are relevant to the change of community sizes.
3) Split and Merge events involve the change of multiple communities.
For the aforementioned three kinds of evolutionary events, if only one parameter, as shown in the definition of each event, is used to control the evolution of communities, it is difficult to accurately detect evolutionary events. Therefore, we use three parameters $\theta$, $\gamma$, and $\xi$ as constraints for events.
Theorem 1: The size of the union of two communities is at least as large as that of each community, that is
$$\left|C_t^p \cup C_{t+1}^q\right| \ge \left|C_t^p\right| \tag{10}$$
$$\left|C_t^p \cup C_{t+1}^q\right| \ge \left|C_{t+1}^q\right|. \tag{11}$$
It is worth noting that (1) is used to determine the overlapping degree of two communities between adjacent timestamps and the persistent relationship between two communities. By Theorem 1, it is unlikely, and most probably a byproduct of randomness, when in real-world situations the union set of two communities can be used to determine the overlapping degree. Equation (1) takes into account the growth and shrinkage of different communities over time.
2) Weak Events: It can be assumed that strong events may be accompanied by weak events, which can lead to small changes in communities. However, such changes cannot be viewed as constituting the change necessary to trigger traditional events as observed by event-based frameworks. At the same time, there exist some changes that do not satisfy the requirements of strong events, despite there being measurable changes in the network. This phenomenon is referred to as a “weak event,” which can serve as a complement to strong events. A formal definition is as follows.
Definition 13 (Weak Shrink): The phenomenon of a slight or measurably small shrink of nodes in communities is called a weak shrink event. This event occurs at the same time as a strong event. Weak shrink events appear in the following three scenarios.
1) When a Remain event occurs, community $C_t^p$ at time $t$ belongs to community $C_{t+1}^q$ at time $t+1$, and the size of the intersection of the two communities, at various time intervals, is less than that of the community at time $t$.
TABLE II
DESCRIPTION OF EVENTS

Algorithm 1 Remain, Disappear, Weak Expand, and Weak Shrink Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: R, WE, WSH, D.
1. for each c_t ∈ C_t do
2.   c_{t+1} = find(C_{t+1}, O(c_t, c_{t+1}) ≥ θ);
3.   if c_{t+1} ≠ ∅ then
4.     R = insert(c_t);
5.     if S(c_t, c_{t+1}) < 1 then
6.       WSH = insert(c_t);
7.     if S(c_{t+1}, c_t) < 1 then
8.       WE = insert(c_t);
9.   else
10.    D = insert(c_t);
11. output R, WE, WSH, D.

3) If there is no community at time t + 1 similar to a community at t, a Disappear event has occurred (lines 9 and 10).
4) Finally, it outputs identified events (line 11).

B. Expand and Shrink Event Detection
The main steps of Algorithm 2 are as follows.
1) It compares communities at time t with communities at time t + 1. If the community membership degree of a community at time t + 1 with respect to a community at time t satisfies (6), an Expand event occurs (lines 1–4).
2) If the community membership degree satisfies (7), a Shrink event occurs (lines 5 and 6).
3) Finally, it outputs identified events (line 7).

Algorithm 2 Expand and Shrink Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: E, SH.
1. for each c_t ∈ C_t do
2.   for each c_{t+1} ∈ C_{t+1} do
3.     if 1−γ ≤ S(c_{t+1}, c_t) < 1 then
4.       E = insert(c_t);
5.     else if 1−γ ≤ S(c_t, c_{t+1}) < 1 then
6.       SH = insert(c_{t+1});
7. output E, SH.

C. Split and Accompanying Event Detection

Algorithm 3 Split and Its Accompanying Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: SP, WSP, WSH.
1. for each c_t ∈ C_t do
2.   for each c_{t+1} ∈ C_{t+1} do
3.     if S(c_{t+1}, c_t) ≥ ξ then
4.       C′ = C′ ∪ c_{t+1};
5.   if O(c_t, C′) ≥ ξ then
6.     SP = insert(c_t);
7.     if S(c_t, C′) < 1 then
8.       WSH = insert(c_t);
9.   else
10.    WSP = insert(c_t);
11. output SP, WSP, WSH.

Algorithm 3 includes the following important steps.
1) It compares communities at time t with all communities at time t + 1. If a community at time t + 1 belongs to the community at time t, it stores this community (lines 1–4).
2) If the community overlapping degree between the union set of these communities and the community at time t satisfies the condition of Split, a Split event occurs (lines 5 and 6). If the community membership degree between the union of these communities and the community at time t is very small, a Weak Shrink event occurs (lines 7 and 8).
3) Otherwise, a Weak Split event occurs (lines 9 and 10).
4) Finally, it outputs identified events (line 11).

D. Form and Accompanying Event Detection
The main steps of Algorithm 4 are as follows.
1) For each community at time t + 1, if there is no community at time t satisfying the forming condition, a Form event occurs (lines 1–4).
2) When a Form event occurs, if the community membership degree of c_{t+1} to c_t is at least θ, a Weak Shrink event occurs (lines 5 and 6). If the community membership degree of c_t to c_{t+1} is at least θ, a Weak Expand event occurs (lines 7 and 8).
3) Finally, it outputs identified events (line 9).

Algorithm 4 Form and Its Accompanying Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: F, WSH, WE.
1. for each c_{t+1} ∈ C_{t+1} do
2.   c_t = find(C_t, O(c_{t+1}, c_t) ≥ θ)
3.   if c_t = ∅ then
4.     F = insert(c_{t+1});
5.     if S(c_{t+1}, c_t) ≥ θ then
6.       WSH = insert(c_t);
7.     if S(c_t, c_{t+1}) ≥ θ then
8.       WE = insert(c_t);
9. output F, WSH, WE.

E. Merge and Accompanying Events Detection
The main steps of Algorithm 5 are as follows.
1) It compares a community at time t + 1 with all communities at t. If multiple communities at time t belong to the community at t + 1, it stores these communities (lines 1–4). If the community overlapping degree between the union of these communities at time t and the community at t + 1 satisfies the merging condition, a Merge event occurs (lines 5 and 6); otherwise, a Weak Merge event occurs (lines 9 and 10). For communities with Merge events, if the union set of these communities does not belong to the community at time t, a Weak Expand event occurs (lines 7 and 8).
2) Finally, it outputs identified events (line 11).

Algorithm 5 Merge and Its Accompanying Events Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: M, WM, WE.
1. for each c_{t+1} ∈ C_{t+1} do
2.   for each c_t ∈ C_t do
3.     if S(c_t, c_{t+1}) ≥ ξ then
4.       C′ = C′ ∪ c_t;
5.   if O(c_{t+1}, C′) ≥ ξ then
6.     M = insert(c_{t+1});
7.     if S(C′, c_{t+1}) < 1 then
8.       WE = insert(c_{t+1});
9.   else
10.    WM = insert(c_{t+1});
11. output M, WM, WE.

F. Algorithm Complexity Analysis
For a network G(V, E) with n nodes and m edges, each of Algorithms 1–5 visits all nodes in the network at adjacent timestamps and then checks whether the number of nodes changes in order to determine which event happens, which is similar to visiting the Cartesian product of nodes at adjacent timestamps. Therefore, the time complexity of Algorithms 1–5 is $O(n^2)$.

V. EXPERIMENTS
A. Experimental Setup
In order to verify the accuracy and efficiency of the proposed community evolution analysis framework, we conduct experiments using real data as well as large-scale synthetic network data sets, including: 1) two types of synthetic dynamic networks generated by the data generator [31] and 2) real dynamic networks, including the DBLP data set [32] and the Facebook data set from New Orleans in 2008 [33]. The details of these data are shown in Table III.
The first type of synthetic data is generated by the dynamic network D3, with parameters listed in Table III(a), without specifying the number of evolutionary events. The second type of synthetic data is generated by the dynamic networks D1 and D2 in Table III(a), in order to estimate the correctness of the WECEM framework, where the D1 dataset generates 50 Form and Disappear events, 10 Merge and Split events, and 50 Shrink and Expand events, while the D2 dataset is specified to have 200 Form and Disappear events, 50 Split and Merge events, and 200 Shrink and Expand events.
As we can see from Table III(a), for the synthetic dynamic network datasets D1, D2, and D3, the dynamic community evolution events were analyzed across 5 time steps, and the time steps are determined based on the following rules: the networks began at t = 1 with around 400 embedded communities, which were constrained to have sizes in [20, 60]. In these three synthetic datasets, twenty percent of node memberships were randomly permuted at each step to simulate users’ movement across communities over time. Then, events were added by the generator. As for the real DBLP dataset in Table III(b), the number of time steps for community evolution analysis is 5 (years). Similarly, for the real Facebook dataset in Table III(c), the number of time steps for community evolution analysis is 12 (months).
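The membership-permutation step of the synthetic setup can be pictured with a short Python sketch. This is only an illustration under assumptions: the exact procedure of the generator [31] is not described here, so the sketch assumes that a random 20% of nodes is selected and their community labels are shuffled among themselves; the function name `permute_memberships` and its parameters are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional


def permute_memberships(
    membership: Dict[Hashable, int],
    fraction: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, int]:
    """Return a copy of `membership` with a random subset of labels shuffled.

    ASSUMPTION: "randomly permuted" is read as shuffling the community labels
    among a randomly chosen `fraction` of the nodes, which preserves the
    overall community sizes.
    """
    rng = rng or random.Random()
    nodes = list(membership)
    k = int(round(fraction * len(nodes)))
    chosen = rng.sample(nodes, k)              # nodes whose labels may move
    labels = [membership[n] for n in chosen]
    rng.shuffle(labels)                        # permute labels among chosen nodes
    updated = dict(membership)                 # leave the input untouched
    for node, label in zip(chosen, labels):
        updated[node] = label
    return updated


if __name__ == "__main__":
    # 100 nodes spread over 5 communities, one simulated time step.
    start = {node: node % 5 for node in range(100)}
    step = permute_memberships(start, fraction=0.2, rng=random.Random(0))
    moved = sum(step[n] != start[n] for n in start)
    print(f"{moved} of {len(start)} nodes changed community")
```

Because labels are only exchanged among the selected nodes, at most 20% of the nodes change community in a step, and some of the selected nodes may keep their original label by chance.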
TABLE III
DESCRIPTION OF EXPERIMENTAL DATA SETS. (a) SYNTHETIC DYNAMIC NETWORK DATA. (b) DBLP NETWORK DATA. (c) FACEBOOK NETWORK DATA

EMA is used to evaluate the accuracy of each framework for detecting evolutionary events.
Fig. 4. Number of discovered events and EMA by the WECEM framework with different θ, γ, and ξ values. (a) Number of discovered events as θ changes. (b) EMA as θ changes. (c) Number of discovered events as γ changes. (d) EMA as γ changes. (e) Number of discovered events as ξ changes. (f) EMA as ξ changes.
number of communities meeting the constraints of overlapping degree and membership degree decreases, and the corresponding events will reduce gradually. When 0 < ξ ≤ 0.6, as ξ grows, the number of communities that satisfy the requirement of membership degree without satisfying the requirement of community overlapping degree grows, thus the numbers of Weak Split and Weak Merge events grow gradually. When 0.6 < ξ ≤ 1, these two conditions cannot be met, so the changing trends of Weak Split and Weak Merge events (Split and Merge events) are nearly the same.
According to Fig. 4(f), when 0 < ξ < 0.7, EMA increases with ξ. When 0.7 < ξ < 1, EMA degrades with ξ. When ξ is small, as ξ grows, the number of discovered events approaches the number of events that actually occurred. When ξ is set to a large value, the number of redundant events grows, which leads to the decrease of EMA. Based on the above discussion, in order to accurately identify evolutionary events, ξ is set to 0.6 in experiments.

C. Quantity Analysis of Detected Events
In this section, we compare the number of correctly detected events among different community detection frameworks.
1) Quantity Comparison of Detected Events on DBLP: Table IV shows the number of detected events on the DBLP data set, and the following observations can be drawn.
1) For each framework, the numbers of Form and Disappear events are the largest compared with other events. This is because DBLP is a coauthor network, in which some scholars publish papers in a given year while other scholars may not. Consequently, several small communities are formed, thus the numbers of Form and Disappear events are larger than those of other events.
2) As shown in Table IV(a), although there are many weak events in the DBLP network, the change of nodes and edges relevant to weak events is so small that it does not cause a qualitative change in the community. Additionally, because weak events accompany strong events, and may even overlap with strong events, discovering weak events can help accurately predict the variation tendency of network structures. However, Asur, Takaffoli, and other event-based frameworks do not work for identifying weak events, as they only focus on strong events, which are easy to find. Actually, the phenomenon of overlapping events rarely appears, and it is difficult for traditional event-based frameworks to detect events with a slowly changing tendency in dynamic networks.
3) By comparing Table IV(a) with Table IV(b), the numbers of events detected by WECEM are larger than those mined by the Asur framework, and WECEM outperforms traditional frameworks by 22.9% in the number of discovered strong events. This is because the definition of events by Asur is strict, which makes it difficult for the Asur framework to detect events. Taking the Remain event as an example, it requires that the number of nodes in a community be exactly the same at adjacent timestamps. Moreover, as for Form events, there should be no similar nodes in a community at time t and t + 1.
4) By comparing Table IV(a) with Table IV(c), we can find that the number of events identified by WECEM is almost the same as that of Takaffoli, since WECEM uses multiple parameters to deal with different events. In
TABLE IV
Q UANTITY C OMPARISON ON DBLP DATASETS . (a) N UMBER OF E VENTS D ETECTED BY WECEM. (b) N UMBER OF E VENTS D ETECTED BY A SUR .
(c) N UMBER OF E VENTS D ETECTED BY TAKAFFOLI
(a)
(b) (c)
addition, community overlapping degree and membership degree work to accurately detect evolutionary events. For a small scale of events, Takaffoli can discover more events than WECEM, because Takaffoli examines multiple networks before and after the time slice: any event that meets the condition is identified and is considered to have occurred.

2) Quantity Comparison of Detected Events on D3 Data: Table V shows the number of events detected by each framework. The most important difference between the D3 data set and the D1 and D2 data sets is that we do not need to manually specify the number of events, which approximates real-world dynamic network structures.

According to [31], at every moment, nodes are selected from the D3 synthetic data set to simulate the changes of the network. This kind of network evolves in a continuous and stable fashion, which is similar to a real dynamic network. Consequently, the numbers of events discovered by these three frameworks over time are relatively stable.

As shown in Table V, the Asur framework cannot handle large-scale synthetic network data. The node changes in each time slice of the simulated network are extracted proportionally from the last moment, so many communities change only slightly, which is not sufficient to satisfy the definition of events in the Asur framework; as a result, every event count is zero for the Asur framework. On the contrary, Tables V(a) and (c) show that the WECEM framework discovers similar numbers of Remain, Form, and Disappear events to the Takaffoli framework, which verifies the effectiveness of the WECEM framework. For the Split and Merge events, Takaffoli discovers more events than WECEM. This is because the values of the parameters in WECEM are set to be small, which results in many redundant events. On the other hand, the Takaffoli framework needs to compare the number of nodes at each time slice in event detection.

In Tables IV(a) and V(a), there are a large number of weak events on each data set. We can conclude that weak events are very common in complex networks. For two networks evolving during a short period of time, there are a large number of weak events because of the slow changes in the number of nodes and edges. Finally, the frequent occurrences of weak events result in strong events.

D. Event Detection Quality Analysis

For the D1 and D2 datasets, we generate a fixed number of events. The EMA measurement is used to verify the accuracy of event detection by comparing the number of discovered events with the actual number of events for the three frameworks. It is noteworthy that we cannot manually specify the number of events in community evolution on the DBLP and Facebook datasets, which come from real networks. It is therefore difficult to estimate the accuracy of event mining on them, so we conduct experiments on the D1 and D2 datasets; the experimental results are shown in Fig. 5.

Fig. 5 shows the accuracy of event detection on the D1 and D2 datasets by each framework. Given that Asur and Takaffoli do not define Shrink and Expand events, we only show the results for Shrink and Expand events from WECEM. The following conclusions can be drawn from Fig. 5.
1) WECEM is more accurate than Asur and Takaffoli in discovering Form and Disappear events that occur in various communities. The reason is that although Asur can guarantee the accuracy of event detection, its event definitions are too strict and the number of events discovered by the Asur framework is very small, so its EMA is very low. On the other hand, the Takaffoli framework considers the change in the number of nodes at all timestamps when mining events, and the changes of communities in synthetic networks are regular, so the number of redundant nodes mined by the Takaffoli framework is higher than that of the WECEM framework.
2) As for the Split and Merge events, the WECEM event discovery accuracy is lower than that of the Asur framework. This can be explained by the fact that we use the data generator designed by Greene et al. [31] to generate dynamic synthetic network datasets in order to mine Split and Merge events, and this data generator
TABLE V
QUANTITY COMPARISON OF EVENTS ON D3 DATASETS. (a) NUMBER OF EVENTS DETECTED BY WECEM. (b) NUMBER OF EVENTS DETECTED BY ASUR. (c) NUMBER OF EVENTS DETECTED BY TAKAFFOLI

Fig. 5. Event detection accuracy comparison on different datasets. (a) EMA on D1 dataset. (b) EMA on D2 dataset.
is developed based on the Asur framework. On one hand, WECEM can identify all of the Split and Merge events due to its relaxed definition of evolutionary events when compared to the Asur framework. On the other hand, more redundant events will be found by the WECEM framework, which leads to a lower accuracy of discovering Merge and Split events compared with the Asur framework. Although some of the events found by WECEM are redundant, the communities with these events have actually changed in the network, and these changes mainly constitute weak events. Similarly, the Takaffoli framework has a lower accuracy and a higher redundancy rate than the WECEM framework, although the Takaffoli framework can identify all Split and Merge events. The reasons for its higher redundancy are twofold: a) the synthetic network data follows some regularity without considering the distribution of events at each timestamp and b) the Takaffoli framework uses the uniform parameter k to control community similarity and discover events in the network, whereas WECEM takes into account three parameters, which play important roles in accurately mining evolutionary events.
3) On average, the EMA of WECEM is 2.13% higher than that of the Asur framework and 12.2% higher than that of the Takaffoli framework. The accuracy of WECEM is higher than that of the Asur framework. This is because WECEM uses multiple parameters to distinguish different kinds of events in order to avoid the overlap of events as well as reduce the redundancy rate of mining events.

E. Efficiency Analysis of Detecting Events

In this section, we compare the execution time of each framework on the five datasets: DBLP, Facebook, D1, D2, and D3. The experimental results are shown in Fig. 6, where the x-axis represents the time interval of two adjacent timestamps. As shown in Fig. 6(e), the y-axis represents the execution time of each framework.

As demonstrated in Fig. 6, with the number of nodes growing, the runtime of these three frameworks increases gradually, and the following conclusions can be made.
1) According to Fig. 6(a), (c), (d), and (e), the Asur framework is the fastest on the DBLP, D1, D2, and D3 datasets, followed by the Takaffoli framework, with the WECEM framework being the slowest. This can be explained
Fig. 6. Execution time comparison of different frameworks on different datasets. (a) DBLP. (b) Facebook. (c) D1. (d) D2. (e) D3.
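The efficiency discussion in this section contrasts frameworks that compare communities only at adjacent timestamps with one that compares them over all timestamps. As a rough illustration of why the second strategy scales worse, the following Python sketch counts the snapshot pairs each strategy would visit. It counts pairs only, ignores the cost of each comparison, and is not a model of the actual Asur, Takaffoli, or WECEM implementations; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def adjacent_pairs(num_snapshots: int) -> List[Tuple[int, int]]:
    """Snapshot pairs (t, t+1): grows linearly with the number of snapshots."""
    return [(t, t + 1) for t in range(num_snapshots - 1)]


def all_pairs(num_snapshots: int) -> List[Tuple[int, int]]:
    """Every pair (s, t) with s < t: grows quadratically with the number of snapshots."""
    return list(combinations(range(num_snapshots), 2))


for n in (5, 10, 20):
    print(n, len(adjacent_pairs(n)), len(all_pairs(n)))
```

For 5, 10, and 20 snapshots the sketch reports 4, 9, and 19 adjacent pairs against 10, 45, and 190 pairs overall, so the gap widens as more timestamps are involved.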
by the fact that WECEM needs to simultaneously identify 11 kinds of events composed of strong events and weak events, while the Asur framework and the Takaffoli framework only need to mine five kinds of elementary events.
2) From Fig. 6(b), the most efficient framework on the Facebook dataset is Asur, followed by WECEM, and the slowest one is Takaffoli. The Facebook dataset involves a large number of nodes over several timestamps, and the Takaffoli framework detects every event by comparing the number of nodes over all timestamps, whereas WECEM and Asur only need to compare the number of nodes at adjacent timestamps. The time complexity of Takaffoli is therefore higher than that of WECEM and Asur. The efficiency of Asur is higher than that of WECEM, as the Asur framework does not need to detect Shrink and Expand events and cannot detect weak events either. For the Facebook dataset, the efficiency of WECEM is 48.83% lower than that of Asur and 67.73% higher than that of Takaffoli. In summary, the proposed WECEM framework can deal with large-scale dynamic networks over several timestamps, and it is more flexible and generic than the Takaffoli framework.

VI. CONCLUSION

In this article, we have explored the fundamental principle and working mechanism of the WECEM framework for weak event mining in the community evolution of dynamic complex networks. WECEM classifies events into strong events and weak events. Two measurements, community overlapping degree and community membership degree, are used to determine the continuity of dynamic communities in complex networks. To calculate the community overlapping degree and community membership degree, the WECEM framework first compares each community at consecutive timestamps and then discovers different events based on these two measurements. The experimental results have indicated that the WECEM framework is effective at mining events. In particular, WECEM can discover weak events, which cannot be handled by other frameworks. In addition, the experimental results have also shown that the WECEM framework is effective at detecting strong as well as weak events. As for mining large-scale dynamic networks, the advantage of WECEM is apparent, since it can detect small changes in the network. Given that WECEM needs much time to mine several kinds of events, its efficiency is lower than that of the traditional frameworks in some cases. In our future work, we will continue to improve the accuracy of event mining by reducing redundant events. Because traditional serial community evolution analysis methods cannot handle big network data, we will also parallelize the WECEM framework to mine larger complex networks with a huge number of nodes and complex relationships.

REFERENCES

[1] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 2, 2004, Art. no. 026113.
[2] M. Takaffoli, F. Sangi, J. Fagnan, and O. R. Zaiane, “Community evolution mining in dynamic social networks,” Soc. Behav. Sci., vol. 22, no. 22, pp. 49–58, 2011.
[3] S. Qiao, N. Han, J. Wang, R.-H. Li, L. A. Gutierrez, and X. Wu, “Predicting long-term trajectories of connected vehicles via the prefix-projection technique,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 7, pp. 2305–2315, Jul. 2018.
[4] S. Asur, S. Parthasarathy, and D. Ucar, “An event-based framework for characterizing the evolutionary behavior of interaction graphs,” ACM Trans. Knowl. Disc. Data, vol. 3, no. 4, pp. 1–36, 2009.
[5] S. Qiao et al., “A fast parallel community discovery model on complex networks through approximate optimization,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 9, pp. 1638–1651, Sep. 2018.
[6] S. Y. Bhat and M. Abulaish, “HOCTracker: Tracking the evolution of hierarchical and overlapping communities in dynamic social networks,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 4, pp. 1019–1031, Apr. 2015.
[7] S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, “A self-adaptive parameter selection trajectory prediction approach via hidden Markov models,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 1, pp. 284–296, Feb. 2015.
[8] S. Qiao, N. Han, W. Zhu, and L. A. Gutierrez, “TraPlan: An effective three-in-one trajectory-prediction model in transportation networks,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 3, pp. 1188–1198, Jun. 2015.
[9] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 70, no. 2, 2004, Art. no. 066111.
[10] M. E. J. Newman, Networks: An Introduction. Oxford, U.K.: Oxford Univ. Press, 2010.
[11] A. Clauset, “Finding local community structure in networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 72, no. 2, 2005, Art. no. 026132.
[12] F. Hao, G. Min, Z. Pei, D.-S. Park, and L. T. Yang, “K-clique community detection in social networks based on formal concept analysis,” IEEE Syst. J., vol. 11, no. 1, pp. 250–259, Mar. 2017.
[13] A. Mahmood, M. Small, S. A. Al-Maadeed, and N. Rajpoot, “Using geodesic space density gradients for network community detection,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 4, pp. 921–935, Apr. 2017.
[14] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, “Network community detection based on spectral clustering,” Nature, vol. 435, no. 7043, pp. 814–818, 2005.
[15] M. G. Parsa, N. Mozayani, and A. Esmaeili, “An EDA-based community detection in complex networks,” in Proc. IEEE 7th Int. Symp. Telecommun., 2014, pp. 476–480.
[16] M. Bouguessa, R. Missaoui, and M. Talbi, “A novel approach for detecting community structure in networks,” in Proc. IEEE 26th Int. Conf. Tools Artif. Intell., 2014, pp. 469–477.
[17] V. Lyzinski, M. Tang, A. Athreya, Y. Park, and C. E. Priebe, “Community detection and classification in hierarchical stochastic blockmodels,” IEEE Trans. Netw. Sci. Eng., vol. 4, no. 1, pp. 13–26, Jan.–Mar. 2017.
[18] M. Takaffoli, R. Rabbany, and O. Zaiane, “Community evolution prediction in dynamic social networks,” in Proc. IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Min., 2014, pp. 9–16.
[19] N. İlhan and Ş. G. Öguducu, “Predicting community evolution based on time series modeling,” in Proc. IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Min., 2015, pp. 1509–1516.
[20] W. Zhu, D. Zhang, X. Zhou, D. Yang, and Z. Yu, “Discovering and profiling overlapping communities in location-based social networks,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 4, pp. 499–509, Apr. 2017.
[21] E. G. Tajeuna, M. Bouguessa, and S. Wang, “Tracking the evolution of community structures in time-evolving social networks,” in Proc. IEEE Int. Conf. Data Sci. Adv. Anal., 2015, pp. 1–10.
[22] T. Falkowski, A. Barth, and M. Spiliopoulou, “Studying community dynamics with an incremental graph mining algorithm,” in Proc. 14th Americas Conf. Inf. Syst., 2008, pp. 1–11.
[23] Y. Wang, B. Wu, and N. Du, “Community evolution of social network: Feature algorithm and model,” Phys. Soc., vol. 804, p. 4356, Apr. 2008.
[24] J. Zhang, Y. Zhu, and Z. Chen, “Evolutionary game dynamics of multiagent systems on multiple community networks,” IEEE Trans. Syst., Man, Cybern., Syst., to be published.
[25] Y. Liu, H. Gao, X. Kang, Q. Liu, R. Wang, and Z. Qin, “Fast community discovery and its evolution tracking in time-evolving social networks,” in Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW), 2015, pp. 13–22.
[26] P. Wang, J. Lü, and X. Yu, “Identification of important nodes in directed biological networks: A network motif approach,” PLoS ONE, vol. 9, no. 8, 2014, Art. no. e106132.
[27] P. Wang, C. Yang, H. Chen, Q. Leng, S. Li, and D. Wang, “Exploring transcription factors reveals crucial members and regulatory networks involved in different abiotic stresses in Brassica Napus L,” BMC Plant Biol., vol. 18, p. 202, Sep. 2018.
[28] P. Wang, J. Lü, X. Yu, and Z. Liu, “Duplication and divergence effect on network motifs in undirected bio-molecular networks,” IEEE Trans. Biomed. Circuits Syst., vol. 9, no. 3, pp. 312–320, Jun. 2015.
[29] P. Wang, Y. Chen, J. Lu, Q. Wang, and X. Yu, “Graphical features of functional genes in human protein interaction network,” IEEE Trans. Biomed. Circuits Syst., vol. 10, no. 3, pp. 707–720, Jun. 2016.
[30] J. Rao, H. Du, X. Yan, and C. Liu, “Detecting overlapping community in social networks based on fuzzy membership degree,” in Proc. 5th Int. Conf. Comput. Soc. Netw., 2016, pp. 99–110.
[31] D. Greene, D. Doyle, and P. Cunningham, “Tracking the evolution of communities in dynamic social networks,” in Proc. IEEE Int. Conf. Adv. Soc. Netw. Anal. Min., 2011, pp. 176–183.
[32] Datatang. (2017). DBLP Co-Author Data Sets. Accessed: Apr. 27, 2017. [Online]. Available: http://www.datatang.com
[33] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, “On the evolution of user interactions in Facebook,” in Proc. ACM Workshop Online Soc. Netw., 2009, pp. 37–42.

Shaojie Qiao received the B.S. and Ph.D. degrees in computer science from Sichuan University, Chengdu, China, in 2004 and 2009, respectively.
From 2007 to 2008, he worked as a Visiting Scholar with the School of Computing, National University of Singapore, Singapore. He is currently a Professor with the School of Software Engineering, Chengdu University of Information Technology, Chengdu. He has authored more than 40 high quality papers, and coauthored more than 90 papers. His research interests include community discovery and artificial intelligence.

Nan Han received the M.S. and Ph.D. degrees in science of prescriptions from the Chengdu University of Traditional Chinese Medicine, Chengdu, China, in 2009 and 2012, respectively.
She is an Associate Professor with the School of Management, Chengdu University of Information Technology, Chengdu. She has authored more than 20 papers and she has participated in several projects supported by the National Natural Science Foundation of China. Her research interests include complex networks and data mining.

Yunjun Gao (Member, IEEE) received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2008.
He is currently a Professor with the College of Computer Science and Technology, Zhejiang University. His research interests include spatial and spatio-temporal databases and spatio-textual data processing.
Prof. Gao is a member of ACM and a Senior Member of CCF.

Rong-Hua Li received the Ph.D. degree in computer science from the Chinese University of Hong Kong, Hong Kong, in 2013.
He is currently an Associate Professor with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, also with the Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen, China, and also with the Guangdong Province Engineering Center of China-made High Performance Data Computing System, Shenzhen University. His research interests include social network analysis and graph data management.
Jianbin Huang received the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, Xi’an, China, in 2007.
He is currently a Professor with the School of Computer Science and Technology, Xidian University. His research interests are in data mining and knowledge discovery.

Xindong Wu (Fellow, IEEE) received the Ph.D. degree in artificial intelligence from the University of Edinburgh, Edinburgh, U.K., in 1993.
He is the President of Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, China, and the Professor with the Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Ministry of Education, Hefei, China. His research interests include data mining and knowledge-based systems.
Prof. Wu is a fellow of the AAAS.