
Dynamic_Community_Evolution_Analysis_Framework_for_Large-Scale_Complex_Networks_Based_on_Strong_and_Weak_Events

The document presents a new framework for analyzing community evolution in large-scale complex networks, focusing on both strong and weak events. The proposed weak-event-based community evolution method (WECEM) addresses limitations of traditional approaches by effectively identifying weak events and refining the detection of strong events, resulting in improved accuracy and efficiency. Experimental results demonstrate that WECEM outperforms existing frameworks in discovering evolutionary events, particularly in dynamic environments such as online social networks.


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 51, NO. 10, OCTOBER 2021 6229

Dynamic Community Evolution Analysis Framework for Large-Scale Complex Networks Based on Strong and Weak Events

Shaojie Qiao, Nan Han, Yunjun Gao, Member, IEEE, Rong-Hua Li, Jianbin Huang, Heli Sun, and Xindong Wu, Fellow, IEEE

Abstract—Community evolution remains a heavily researched and challenging area in the analysis of dynamic complex network structures. Currently, the primary limitation of traditional event-based approaches for community evolution analysis is the lack of strict constraint conditions for distinguishing evolutionary events, which entails that as the cardinality of discovered events increases, so does the number of redundant events. Another limitation of existing approaches is the lack of consideration for weak events. Weak events can be generated by small changes in communities, which are empirically prevalent, and are typically not captured by traditional events. To manage these two limitations, this research aims to formalize a weak and strong events-based framework, which includes the following newly discovered events: “weak shrink,” “weak expand,” “weak merge,” and “weak split.” Predicated on the community overlapping degree and community membership degree, this article refines the traditional strong events and introduces new constraints for weak events. In addition, a community evolution mining framework, which is based on both strong and weak events, is proposed and denoted as the weak-event-based community evolution method (WECEM). The framework can be summarized as follows: 1) communities in complex networks with adjacent timestamps are compared to determine the community overlapping degree and community membership degree; 2) the values of the community overlapping degree and membership degree are checked against the definitions of events; and 3) weak events are effectively identified. Extensive experimental results, on real and synthetic data sets consisting of dynamic complex networks and online social networks, demonstrate that WECEM is able to identify weak events more effectively than traditional frameworks. Specifically, WECEM outperforms traditional frameworks by 22.9% in the number of discovered strong events. The detection accuracy of evolutionary events is approximately 12.2% higher than that of traditional event-based frameworks. It is also worth noting that, as the cardinality of the data grows, the proposed framework, when compared with traditional frameworks, can mine large-scale complex networks more effectively and efficiently.

Index Terms—Community detection, community evolution analysis, complex networks, event-based framework, weak events.

Manuscript received March 28, 2019; revised September 23, 2019; accepted December 12, 2019. Date of publication January 7, 2020; date of current version September 16, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61772091, Grant 61802035, Grant 61962006, and Grant 71701026, in part by the Sichuan Science and Technology Program under Grant 2018JY0448, Grant 2019YFG0106, and Grant 2019YFS0067, in part by the Natural Science Foundation of Guangxi under Grant 2018GXNSFDA138005, and in part by the Guangdong Province Key Laboratory of Popular High Performance Computers under Grant 2017B030314073. This article was recommended by Associate Editor J. Lu. (Corresponding author: Nan Han.)
Shaojie Qiao is with the School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China.
Nan Han is with the School of Management, Chengdu University of Information Technology, Chengdu 610103, China (e-mail: [email protected]).
Yunjun Gao is with the College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China.
Rong-Hua Li is with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China, also with the Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen 518060, China, and also with the Guangdong Province Engineering Center of China-made High Performance Data Computing System, Shenzhen University, Shenzhen 518060, China.
Jianbin Huang is with the School of Computer Science and Technology, Xidian University, Xi’an 710071, China.
Heli Sun is with the School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China.
Xindong Wu is with the Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084, China, and also with the Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei 230009, China.
Digital Object Identifier 10.1109/TSMC.2019.2960085
2168-2216 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: VIT University. Downloaded on March 11,2024 at 04:49:13 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

COMPLEX networks are distinct from simple networks, such as lattices or random graphs, in that they have nontrivial topological features. These complex networks are ubiquitous in our everyday lives, and examples include online social networks, such as Facebook and Twitter. Community structures are generally inherent in complex networks, and can be best exemplified by groups of nodes within which the network connections are dense, but between which connections are sparser [1]. Community structures, representing a mesoscale structure of networks, are accordingly viewed as one of the most important characteristics of complex networks. Such structures provide immense social and economic value in understanding the processes of network formation, growth and shrinkage, information dissemination, and public opinion analysis. Dynamic analysis of complex networks, especially assessing the evolution of communities, can provide insights into: 1) detecting a drastic change in the interaction patterns; 2) understanding the latent structures of complex networks; and 3) forecasting the future trends of networks [2], [3].
Motivation: In real-world applications, community structures represent a dynamically changing phenomenon. Accordingly, communities are in constant flux: growing,
shrinking, emerging, and disappearing altogether. Examples include human migration in social networks, seasonal animal migrations, and topic transfers in blogs. Due to the cardinality of nodes in dynamic complex networks, static community discovery approaches cannot be effectively applied to analyze this evolution of communities. Thus, it follows that research in this area remains fundamentally important in precision marketing, crime prevention, traffic flow forecasting, and network congestion prediction. The following example illustrates the importance of research in community evolution.
Example 1: Sina microblog is the largest blogging system in China. The distribution of the userbase is in constant flux, with users joining and leaving on a regular basis. This dynamic is indicative of a constantly changing network structure. In addition, the userbase represents a multitude of interests on a vast array of topics, which in itself contributes to the ever-changing structure. Mining communities with such a userbase can provide insight and understanding into how the networks change, and identify users’ points of interest. This knowledge can then be applied to social networking platforms to recommend services across various communities.
Challenges: Current research relevant to mining complex and dynamic networks has focused on event-based frameworks, and variants, as proposed by Asur et al. [4]. However, the existing methods have the following disadvantages.
1) Defining an evolutionary event is not straightforward. When a constraint condition is loosely defined, the number of detected events can grow quickly, leading to a large number of redundant events.
2) Traditional event-based frameworks work ideally given a single type of event in one community over a given period of time. However, in practice, multiple events often occur within a community simultaneously. In addition, traditional methods are poorly equipped to deal with weak events, which can be defined as events triggered by small changes in the community. These events are not considered evolutionary due to the strict constraints of strong events. Finally, traditional methods generally return low accuracy in evolutionary event discovery.
3) The high time complexity of traditional event-based frameworks makes it difficult to efficiently implement discovery in large-scale dynamic complex networks.
Contributions: In an effort to improve the efficiency and accuracy of traditional event-based frameworks for discovering community evolution, this article focuses on the detection of various types of events occurring in the same communities, as well as the discovery of comparatively higher quality strong events. In this article, a new method for community evolution analysis in mining dynamic networks is proposed, which is based on newly discovered weak events. The contributions of this article are given as follows.
New Concept: We introduce a new concept of weak event based on the community overlapping degree and membership degree. According to this concept, we refine the events of community evolution, and then improve the traditional event-based framework for community evolution.
New Framework: Evolutionary events are categorized into strong and weak events, and the application scenarios of weak events are introduced. Weak events are formally defined, together with a method for determining such events. In addition, a weak event-based mining framework is proposed, referred to as WECEM.
Extensive Experimental Results: Extensive experiments are conducted in real and synthetic large-scale dynamic networks. We compare the proposed WECEM framework with state-of-the-art community evolution discovery methods to verify the quality, mining accuracy, and runtime performance.

II. RELATED WORK

In recent years, due to the popularity and ubiquity of online social networks and large-scale complex networks, community discovery and evolutionary event mining have attracted a lot of attention [5]–[8]. Most specifically, dynamic networks operate as a powerful signal for forecasting the behavior of individuals, route planning, personalized recommendations, and so on. Within this field, community detection remains fundamental in community evolution analysis. Existing research has done much to progress the field of study, as noted in the following examples. The modularity-based approach [1], [9] is widely used for discovering communities in complex networks, and several improved algorithms [10] have been proposed successively, e.g., Fast-Newman [9], and the Clauset, Newman, and Moore’s algorithm (CNM) [11]. Recently, Hao et al. [12] proposed a technique which integrates formal concept analysis with the clique percolation method, and works to improve the accuracy of community discovery. Mahmood et al. [13] combined complex networks and spatial data mining techniques by mapping network nodes into a geometric space and encoding the position of each node with its geodesic distances from all the other nodes. Palla et al. [14] proposed a clique percolation method that effectively identifies overlapping communities in complex networks. Parsa et al. [15] presented a new method for detecting communities based on an estimation of distribution algorithm. Bouguessa et al. [16] aggregated similar nodes to form small communities, then iteratively combined these small communities until a maximum modularity is reached. Lyzinski et al. [17] obtained a low-dimensional representation by mapping networks to a Euclidean space.
Recent research has been focused on predicting the trend of network development. The most common approach is the event-based framework proposed by Asur et al. [4]. It first detects communities over time, and then mines evolutionary events by comparing overlapping and membership degrees of communities relative to time. Takaffoli et al. [2], [18] formally defined a series of events for community evolution, and proposed a community matching algorithm to identify similar communities. In addition, the concept of “meta-community” was proposed, which entails a series of similar communities with different timestamps. İlhan and Öguducu [19] used the autoregressive integrated moving average model to predict how particular community features change on the next time horizon. Zhu et al. [20] proposed a multi-mode co-clustering approach to detect the hierarchical and


overlapping communities in location-based social networks. Tajeuna et al. [21] formulated the number of nodes shared between two communities as a matrix. This approach could efficiently track the changes of communities during evolution. Falkowski et al. [22] proposed an incremental graph mining algorithm based on the idea of static density clustering, which partitions evolutionary events into positive and negative changes. The method can discover detailed information about evolutionary events. Wang et al. [23] proposed a new method to calculate the importance of core nodes based on the degrees of nodes. The changes are then compared to core nodes over adjacent timestamps to determine the evolution of the given networks. This method uncovered two important phenomena: growing and metabolic processes in networks. Zhang et al. [24] studied the evolutionary game dynamics of multiple community networks. Liu et al. [25] proposed a fast community evolution tracking model, which uses an improved PageRank algorithm to find the core nodes in a network. As a result, the evolutionary events can be detected by adding nodes and edges into core communities over time.
The disadvantages of the aforementioned community evolution analysis methods over event-based frameworks can be summarized as follows.
1) These methods maintain a strict definition for strong events, which greatly limits the scope. This leads to events, which may be equally important, not qualifying under the given criteria. This is the motivation behind the proposed weak events in this article. Existing work, based on the event-based framework, does not take into account the effects of overlapping events, or those of weak events.
2) The computational complexity of the event-based framework is extremely costly, most notably in the Takaffoli framework, which requires the calculation of the difference of each community at all timestamps.
In order to overcome the challenges associated with the traditional event-based frameworks, this article proposes a new weak event-based community evolution analysis algorithm based on strong and weak events. In this article, the algorithm is referred to as WECEM. In discovery, WECEM takes into consideration the overlapping and membership degrees of communities, which allows events between overlapping communities to be distinguished more effectively. Most notably, this enables WECEM to perform without influencing the identification accuracy.
The proposed dynamic community evolution analysis framework can be applied in several applications, for example, the identification of crucial genes in biological networks [26], [27], the design, synthesis, and re-engineering of biological networks for biomedical purposes [28], and networked medicine and biological network control [29].

III. COMMUNITY EVOLUTION ANALYSIS FRAMEWORK BASED ON STRONG AND WEAK EVENTS

In large-scale complex dynamic networks, community structures evolve slowly once the relationships between nodes in networks are formed. Consequently, it becomes difficult to change community structures over a comparatively short period of time. Thus, large overarching changes are infrequent; however, this does not imply that small changes are not frequently occurring during the periods between larger changes. It follows then that if traditional methods are designed only to detect broad, sweeping changes in complex networks, those smaller, weak events are not being detected. This is the motivation behind our proposed weak-event-based approach.

A. Preliminaries

Definition 1 (Dynamic Community): Let $C_t = (v_{t1}, v_{t2}, \ldots, v_{tn})$ be a community at time $t$, where $v$ denotes a node in $C_t$. From $t$ to $t+m$, several events may occur in $C_t$. $\{C_t, A_1, A_2, \ldots, A_m\}$ represents a dynamic community, denoted by $C_{t:t+m}$. The community $C_{t+m} = C_t + A_1 + A_2 + \cdots + A_m$, in which $A_1 \sim A_m$ are evolutionary events happening in $C_t$ from time $t$ to $t+m$.
As time passes, the relationships between nodes in the network gradually change, forming some new communities with different evolutionary events. Here, we first give the definition of traditional evolutionary events, each of which is called a “strong event.”
Definition 2 (Strong Event): The nodes in the network interact with each other, which causes the network structure to change at the next timestamp, and this process is called a strong event. Strong events include: Remain, Shrink, Expand, Split, Merge, Form, and Disappear. The structures and features of the network will change as strong events occur. An illustrative example of strong events is shown in Fig. 1.

Fig. 1. Example of strong events in community evolution.

As shown in Fig. 1, this network has two communities, $C_t^1$ and $C_t^2$, at time $t$. At time $t+1$, $C_{t+1}^2$ splits into two smaller communities as a result of changes in the relationships between nodes 10 and 11 and the other nodes. Simultaneously, the structure of $C_{t+1}^1$ remains unchanged, so $C_{t+1}^1$ is viewed as undergoing a Remain event. At time $t+2$, nodes 13 and 14 join this network, and node 14 establishes a relationship with nodes 3, 4, and 6. At the same time, node 13 establishes a relationship with node 11. The cardinalities of communities $C_{t+1}^1$ and $C_{t+1}^3$ expand due to these changes. At time $t+3$, community $C_{t+3}^1$ shrinks because node 1 exits from $C_{t+3}^1$. At time $t+4$, community $C_{t+4}^3$ disappears completely since all of its nodes


maintain no relationship with external communities. At time $t+5$, the relationships between nodes 4, 5, 7, and 12 become so strong that they cause $C_{t+4}^1$ to merge with $C_{t+4}^2$.
Remark 1: Due to the characteristics of nonrealtime evolving communities in complex networks, measurable change can be a slowly occurring process. There may also be multiple evolutionary events occurring at the same time in similar communities.
Definition 3 (Weak Event): A weak event is triggered by small changes in the community. It is not captured by strong events, yet occurs together with them. Weak events include Weak Shrink, Weak Expand, Weak Split, and Weak Merge events.

Fig. 2. Examples of weak events in community evolution. (a) Weak shrink. (b) Weak expand. (c) Weak split. (d) Weak merge.

As shown in Fig. 2(a), community $C_t^2$ at time $t$ splits and forms communities $C_{t+1}^2$ and $C_{t+1}^3$, as seen at time $t+1$. Simultaneously, node 8 exists in $C_t^2$, and disappears from the network at time $t+1$. It can be observed then that a weak shrink event occurs from $t$ to $t+1$ in $C_t^2$. According to Fig. 2(b), communities $C^1$ and $C^2$ aggregate at time $t+1$, and node 15 joins as well. This leads to the occurrence of a weak expanding event. According to Fig. 2(c), communities $C_t^2$ and $C_t^3$, at time $t$, belong to $C_t^2$, and nodes 8 and 9 disappear at time $t+1$. Thus, it can be deduced that a weak splitting event occurs at time $t+1$ in $C_t^2$ at time $t$. By Fig. 2(d), though communities $C_t^1$ and $C_t^2$ at time $t$ belong to $C^1$, at time $t+1$, nodes 15–17 join this network, and belong to $C_{t+1}^1$ at time $t+1$. Thus, a weak merging event occurs at $t+1$ for communities $C_t^1$ and $C_t^2$ at $t$. The formal definition of these four weak events is described in Section III-B2.
Definition 4 (Community Overlapping Degree): Given community $C_t^p$ at time $t$ and community $C_{t+1}^q$ at time $t+1$, the community overlapping degree of these two communities is defined as the ratio of the number of nodes in the intersection of the communities to the number of nodes in their union, as follows [30]:

$$O\left(C_t^p, C_{t+1}^q\right) = \frac{\left|C_t^p \cap C_{t+1}^q\right|}{\left|C_t^p \cup C_{t+1}^q\right|}. \tag{1}$$

Equation (1) is used to determine the persistence of relationships of nodes between communities at different times.
Definition 5 (Community Membership Degree): The community membership degree of community $C_{t+1}^q$ at time $t+1$ and $C_t^p$ at time $t$ is equal to the ratio of the number of nodes in the intersection of these two communities to the number of nodes in $C_t^p$, which is defined as follows [30]:

$$S\left(C_t^p, C_{t+1}^q\right) = \frac{\left|C_t^p \cap C_{t+1}^q\right|}{\left|C_t^p\right|}. \tag{2}$$

Equation (2) measures the degree to which community $C_{t+1}^q$ belongs to community $C_t^p$. The community membership degree is used to determine whether a community belongs to another one. Specifically, it can be used to discover evolutionary events, such as splitting and merging events.
An event plays a fundamental role in community evolution analysis. The following section provides a detailed description of evolutionary events.

B. Definitions of Evolutionary Events

The nomenclature is provided in Table I.

TABLE I
DESCRIPTION OF IMPORTANT SYMBOLS

1) Strong Events: According to the concepts given by Asur et al. [4] and Takaffoli et al. [18], strong events are defined as follows.
Definition 6 (Remain): Suppose there is a community $C_{t+1}^q$ at time $t+1$ and another community $C_t^p$ at time $t$, and they are


highly overlapped and share many similar nodes. This phenomenon is called “Remain,” and can be formalized by the following formula:

$$R\left(C_t^p, C_{t+1}^q\right) = 1 \quad \text{iff } \exists\, C_{t+1}^q \in C_{t+1},\ O\left(C_t^p, C_{t+1}^q\right) \ge \theta \tag{3}$$

where $C_{t+1}$ represents a dynamic network at time $t+1$.
Definition 7 (Form): If the community overlapping degree between community $C_t^p$ and community $C_{t-1}^q$ at time $t-1$ is very low, that is, the community $C_t^p$ at time $t$ has no relationship with other communities at time $t-1$, then a “Form” event occurs. This event can be formalized as

$$F\left(C_{t-1}^q, C_t^p\right) = 1 \quad \text{iff } \forall\, C_{t-1}^q \in C_{t-1},\ O\left(C_t^p, C_{t-1}^q\right) < \theta. \tag{4}$$

Definition 8 (Disappear): If the community overlapping degree between community $C_t^p$ and community $C_{t+1}^q$ at $t+1$ is very low, that is, $C_t^p$ at time $t$ has no relationship with other communities at time $t+1$, then a “Disappear” event occurs, which can be modeled as

$$D\left(C_t^p, C_{t+1}^q\right) = 1 \quad \text{iff } \forall\, C_{t+1}^q \in C_{t+1},\ O\left(C_t^p, C_{t+1}^q\right) < \theta. \tag{5}$$

Definition 9 (Expand): If community $C_t^p$ at time $t$ belongs to another community at $t+1$, denoted by $C_t^p \subset C_{t+1}^q$, and the number of nodes in $C_t^p$ is less than that of $C_{t+1}^q$, we say that $C_t^p$ “Expands” at time $t+1$, by the following formula:

$$E\left(C_t^p, C_{t+1}^q\right) = 1 \quad \text{iff } \exists\, C_{t+1}^q \in C_{t+1},\ 1 - \gamma \le S\left(C_{t+1}^q, C_t^p\right) < 1. \tag{6}$$

Definition 10 (Shrink): If community $C_{t+1}^q$ at time $t+1$ belongs to community $C_t^p$ at time $t$, and the number of nodes in $C_{t+1}^q$ is less than that of $C_t^p$, then a “Shrink” event occurs at time $t+1$, which is described as follows:

$$SH\left(C_t^p, C_{t+1}^q\right) = 1 \quad \text{iff } \exists\, C_t^p \in C_t,\ 1 - \gamma \le S\left(C_t^p, C_{t+1}^q\right) < 1. \tag{7}$$

Because $\gamma \in (0, 1]$, in order to detect Expand and Shrink events effectively, the left-hand constraints of Expand and Shrink events with respect to $S$ are specified as $1-\gamma$, rather than $\gamma$, so that the value of $S$ only varies within a limited range.
Definition 11 (Split): If there are $k\ (>1)$ communities $X = \{C_{t+1}^q, \ldots, C_{t+1}^{q+k}\}$ at time $t+1$, each community in $X$ almost belongs to community $C_t^p$, and the community overlapping degree between the union of communities in $X$ and $C_t^p$ is very high, $C_t^p$ is viewed to “split” into different communities as follows:

$$SP\left(C_t^p, X\right) = 1 \quad \text{iff } \begin{cases} O\left(C_t^p, X\right) \ge \xi \\ S\left(C_{t+1}^i, C_t^p\right) \ge \xi & \forall\, C_{t+1}^i \in X. \end{cases} \tag{8}$$

Definition 12 (Merge): If there are many communities $Y = \{C_t^p, \ldots, C_t^{p+k}\}$ at time $t$, each community in $Y$ almost belongs to community $C_{t+1}^q$, and the community overlapping degree between the union set of communities in $Y$ and $C_{t+1}^q$ is very high, then a “merge” event occurs, which is represented by the following equation:

$$M\left(C_{t+1}^q, Y\right) = 1 \quad \text{iff } \begin{cases} O\left(Y, C_{t+1}^q\right) \ge \xi \\ S\left(C_t^i, C_{t+1}^q\right) \ge \xi & \forall\, C_t^i \in Y. \end{cases} \tag{9}$$

In Definitions 6–12, the parameters $\theta$, $\gamma$, and $\xi$ are tuned by experiments in order to discover as many events as possible. Different from traditional event-based frameworks, the WECEM framework uses these three parameters to control the occurring conditions of each event.
Observation 1: Strong events can be classified into the following types.
1) Form, Disappear, and Remain events concern the existence of communities.
2) Shrink and Expand events are relevant to the change of community sizes.
3) Split and Merge events involve the change of multiple communities.
For the aforementioned three kinds of evolutionary events, if only one parameter, as shown in the definition of each event, is used to control the evolution of communities, it is difficult to accurately detect evolutionary events. Therefore, we use three parameters $\theta$, $\gamma$, and $\xi$ as constraints for events.
Theorem 1: The size of the union of two communities is no smaller than that of either community, that is

$$\left|C_t^p \cup C_{t+1}^q\right| \ge \left|C_t^p\right| \tag{10}$$

$$\left|C_t^p \cup C_{t+1}^q\right| \ge \left|C_{t+1}^q\right|. \tag{11}$$

It is worth noting that (1) is used to determine the overlapping degree of two communities between adjacent timestamps and the persistent relationship between two communities. By Theorem 1, it is unlikely, and most probably a byproduct of randomness, when in real-world situations the union set of two communities can be used to determine the overlapping degree. Equation (1) takes into account the growth and shrinkage of different communities over time.
2) Weak Events: It can be assumed that strong events may be accompanied by weak events, which can lead to small changes in communities. However, such changes cannot be viewed as constituting the change necessary to trigger traditional events as observed by event-based frameworks. At the same time, there exist some changes that do not satisfy the requirements of strong events, despite there being measurable changes in the network. This phenomenon is referred to as a “weak event,” which can serve as a complement for strong events. A formal definition is as follows.
Definition 13 (Weak Shrink): The phenomenon of a slight or measurably small shrink of nodes in communities is called a weak shrink event. This event occurs at the same time as a strong event. Weak shrink events appear in the following three scenarios.
1) When a Remain event occurs, community $C_t^p$ at time $t$ belongs to community $C_{t+1}^q$ at time $t+1$, and the size of the intersection of the two communities, at various time intervals, is less than that of the community at time $t$.


2) When a Form event occurs, resulting from a community at time $t$ shrinking but not splitting, and a Split event is not observed even though a new community has formed at time $t+1$.
3) When a Split event occurs, the size of the intersection of $X$, which represents the union of communities at time $t+1$, and community $C_t^p$ at time $t$ is less than the size of $C_t^p$ at time $t$.
The above three cases can be formulated as follows:

$$WSH = \begin{cases} S\left(C_t^p, C_{t+1}^q\right) < 1, & \left(C_t^p, C_{t+1}^q\right) \in R \\ S\left(C_{t+1}^q, C_t^p\right) \ge \theta, & \left(C_t^p, C_{t+1}^q\right) \in F,\ \left(C_t^p, C_{t+1}^q\right) \notin SP \\ S\left(C_t^p, X\right) < 1, & \left(C_t^p, X\right) \in SP. \end{cases} \tag{12}$$

Definition 14 (Weak Expand): A weak expanding event occurs alongside strong events, and indicates a slow growth in communities. Weak expanding events appear in the following three scenarios.
1) When a Remain event occurs, the size of the intersection of two communities at time $t$ is less than that of the community at the next timestamp.
2) When a Form event occurs, since the community at time $t$ expands without the occurrence of a Merge event, a new community nevertheless forms at time $t+1$.
3) When a Merge event occurs, the size of the intersection of $Y$ (which represents the union of communities at time $t$) and community $C_{t+1}^q$ at time $t+1$ is less than the size of $C_{t+1}^q$ at time $t+1$.
The above three cases can be modeled as follows:

$$WE = \begin{cases} S\left(C_{t+1}^q, C_t^p\right) < 1, & \left(C_t^p, C_{t+1}^q\right) \in R \\ S\left(C_t^p, C_{t+1}^q\right) \ge \theta, & \left(C_t^p, C_{t+1}^q\right) \in F,\ \left(C_t^p, C_{t+1}^q\right) \notin M \\ S\left(Y, C_{t+1}^q\right) < 1, & \left(Y, C_{t+1}^q\right) \in M. \end{cases} \tag{13}$$

Definition 15 (Weak Split): If a community at time $t+1$ belongs to another community at time $t$, but the union of these communities cannot represent the one at time $t$, then this phenomenon is called a “weak split,” which can be described further with the following formula:

$$WSP = \begin{cases} \forall\, C_{t+1}^i \in X,\ S\left(C_{t+1}^i, C_t^p\right) \ge \xi \\ O\left(X, C_t^p\right) < \xi. \end{cases} \tag{14}$$

Definition 16 (Weak Merge): If some communities at time $t$ belong to one community at time $t+1$, but the union of these communities cannot constitute the community at time $t+1$, this phenomenon is called “weak merge,” which is defined as follows:

$$WM = \begin{cases} \forall\, C_t^i \in Y,\ S\left(C_t^i, C_{t+1}^q\right) \ge \xi \\ O\left(C_t^p, Y\right) < \xi. \end{cases} \tag{15}$$

Remark 2: There is a difference between splitting and merging events in terms of time sequences. For a splitting event, we compare communities at different times in forward time order. Conversely, for a merging event, we have to compare communities in reverse time order.
Weak events are the manifestations of small changes in communities. Since these changes are not apparent, we cannot detect them via traditional event-based frameworks. More generally, vast networks evolve slowly with myriad small changes, which are difficult to detect, but nonetheless serve as the catalyst for strong events. Therefore, a case can be made that detecting weak events can be of equal, if not greater, importance for successfully detecting changing trends in dynamic networks. This can be paramount in helping service providers predict future developments of communities.
Remark 3: When compared with strong events, weak events occur more frequently, and the occurrence of a large number of weak events is an indicator of an eventual strong event.

IV. COMMUNITY EVOLUTION DETECTION ALGORITHM BASED ON WEAK EVENTS

WECEM includes the following steps: 1) detecting Remain, Disappear, and their accompanying events, including Weak Expand and Weak Shrink; 2) detecting Expand and Shrink events; 3) detecting Split, Weak Split, and Weak Shrink events; 4) detecting Form, Weak Shrink, and Weak Expand events; and 5) detecting Merge, Weak Merge, and Weak Expand events. Before discovering evolutionary events, duplicated edges are eliminated and indices of nodes are reordered. In particular, we use the linked-list storage structure as shown in Fig. 3.

Fig. 3. Linked-list storage structure.

We apply the above data structure because a huge volume of network data is generated at different times, which makes it difficult to store the large network structure in a very large matrix. In contrast, linked lists with head nodes can greatly compress the storage space and reduce the cost of determining whether an edge exists.

A. Remain, Disappear, and Accompanying Event Detection
In Definitions 6–16, the parameter θ is specified to the Algorithm 1 can be summarized as follows.
same value, and γ and ξ as well. These three parameters are 1) For each community at time t, if there is a commu-
determined empirically by experimentation. nity at time t + 1 in which the community overlapping
Detailed descriptions of various events are given in Table II. degree with it is bigger than θ , a Remain event occurs
The complexity represents the cardinality of changing nodes in (lines 1–4).
two communities, when events are detected by the framework. 2) If these two communities do not have complete mem-
The absolute value in the last column represents the size of bership relationship, Weak Shrink, and Weak Expand
the corresponding communities. events have occurred (lines 5–8).

3) If there is no community at time t + 1 similar to a community at t, a Disappear event has occurred (lines 9 and 10).
4) Finally, it outputs the identified events (line 11).

Algorithm 1 Remain, Disappear, Weak Expand, and Weak Shrink Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: R, WE, WSH, D.
1. for each c_t ∈ C_t do
2.   c_{t+1} = find(C_{t+1}, O(c_t, c_{t+1}) ≥ θ);
3.   if c_{t+1} ≠ ∅ then
4.     R = insert(c_t);
5.     if S(c_t, c_{t+1}) < 1 then
6.       WSH = insert(c_t);
7.     if S(c_{t+1}, c_t) < 1 then
8.       WE = insert(c_t);
9.   else
10.    D = insert(c_t);
11. output R, WE, WSH, D.

TABLE II
DESCRIPTION OF EVENTS

B. Expand and Shrink Event Detection
The main steps of Algorithm 2 are as follows.
1) It compares communities at time t with communities at time t + 1. If the community membership degree of a community at time t + 1 with respect to a community at time t satisfies (6), an Expand event occurs (lines 1–4).
2) If the community membership degree satisfies (7), a Shrink event occurs (lines 5 and 6).
3) Finally, it outputs the identified events (line 7).

Algorithm 2 Expand and Shrink Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: E, SH.
1. for each c_t ∈ C_t do
2.   for each c_{t+1} ∈ C_{t+1} do
3.     if 1 − γ ≤ S(c_{t+1}, c_t) < 1 then
4.       E = insert(c_t);
5.     else if 1 − γ ≤ S(c_t, c_{t+1}) < 1 then
6.       SH = insert(c_{t+1});
7. output E, SH.

C. Split and Accompanying Event Detection
Algorithm 3 includes the following important steps, which are described after the listing.

Algorithm 3 Split and Its Accompanying Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: SP, WSP, WSH.
1. for each c_t ∈ C_t do
2.   for each c_{t+1} ∈ C_{t+1} do
3.     if S(c_{t+1}, c_t) ≥ ξ then
4.       C′ = C′ ∪ c_{t+1};
5.   if O(c_t, C′) ≥ ξ then
6.     SP = insert(c_t);
7.     if S(c_t, C′) < 1 then
8.       WSH = insert(c_t);
9.   else
10.    WSP = insert(c_t);
11. output SP, WSP, WSH.
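As a rough illustration of Algorithm 3, the following Python sketch walks through the same three branches (Split, Weak Shrink, Weak Split). It is not the authors' Java implementation. The two measures are assumptions made for this sketch, since their definitions lie outside this section: S(a, b) is taken to be |a ∩ b| / |a| and O(a, b) to be |a ∩ b| / max(|a|, |b|). The function names and the guard that skips communities with no qualifying successor are likewise hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

Community = FrozenSet[int]


def membership(a: Community, b: Community) -> float:
    """Assumed S(a, b): fraction of a's members that also belong to b."""
    return len(a & b) / len(a) if a else 0.0


def overlap(a: Community, b: Community) -> float:
    """Assumed O(a, b): shared members relative to the larger community."""
    return len(a & b) / max(len(a), len(b)) if a and b else 0.0


def detect_split_events(
    prev: Iterable[Community], curr: Iterable[Community], xi: float
) -> Tuple[List[Community], List[Community], List[Community]]:
    """Return (SP, WSP, WSH) for communities at time t, following Algorithm 3."""
    curr = list(curr)  # materialize so it can be scanned once per c_t
    split, weak_split, weak_shrink = [], [], []
    for c_t in prev:
        # Lines 2-4: collect successors that mostly lie inside c_t.
        parts = [c for c in curr if membership(c, c_t) >= xi]
        if not parts:
            continue  # assumption: no successor part means no split-type event
        union = frozenset().union(*parts)
        if overlap(c_t, union) >= xi:
            split.append(c_t)  # lines 5-6: Split
            if membership(c_t, union) < 1:
                weak_shrink.append(c_t)  # lines 7-8: some members were lost
        else:
            weak_split.append(c_t)  # lines 9-10: Weak Split
    return split, weak_split, weak_shrink
```

With ξ = 0.6, a ten-node community whose successors together cover eight of its nodes is reported as both Split and Weak Shrink, whereas successors covering only five nodes yield a Weak Split.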

1) It compares each community at time t with all communities at time t + 1. If a community at time t + 1 belongs to the community at time t, it stores this community (lines 1–4).
2) If the community overlapping degree between the union of these communities and the community at time t satisfies the condition of Split, a Split event occurs (lines 5 and 6). If, in addition, the community membership degree of the community at time t with respect to this union is less than 1, a Weak Shrink event occurs (lines 7 and 8).
3) Otherwise, a Weak Split event occurs (lines 9 and 10).
4) Finally, it outputs the identified events (line 11).

D. Form and Accompanying Event Detection
The main steps of Algorithm 4 are as follows.
1) For each community at time t + 1, if there is no community at time t satisfying the forming condition, a Form event occurs (lines 1–4).
2) When a Form event occurs, if the community membership degree of c_{t+1} to c_t is at least θ, a Weak Shrink event occurs (lines 5 and 6). If the community membership degree of c_t to c_{t+1} is at least θ, a Weak Expand event occurs (lines 7 and 8).
3) Finally, it outputs the identified events (line 9).

Algorithm 4 Form and Its Accompanying Event Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: F, WSH, WE.
1. for each c_{t+1} ∈ C_{t+1} do
2.   c_t = find(C_t, O(c_{t+1}, c_t) ≥ θ);
3.   if c_t = ∅ then
4.     F = insert(c_{t+1});
5.     if S(c_{t+1}, c_t) ≥ θ then
6.       WSH = insert(c_t);
7.     if S(c_t, c_{t+1}) ≥ θ then
8.       WE = insert(c_t);
9. output F, WSH, WE.

E. Merge and Accompanying Events Detection
The main steps of Algorithm 5 are as follows.
1) It compares a community at time t + 1 with all communities at t. If multiple communities at time t belong to the community at t + 1, it stores these communities (lines 1–4). If the community overlapping degree between the union of these communities at time t and the community at t + 1 satisfies the merging condition, a Merge event occurs (lines 5 and 6); otherwise, a Weak Merge event occurs (lines 9 and 10). For communities with Merge events, if the union of these communities does not fully belong to the community at time t + 1, a Weak Expand event occurs (lines 7 and 8).
2) Finally, it outputs the identified events (line 11).

Algorithm 5 Merge and Its Accompanying Events Detection
Input: The community set C_t at t, and C_{t+1} at t+1.
Output: M, WM, WE.
1. for each c_{t+1} ∈ C_{t+1} do
2.   for each c_t ∈ C_t do
3.     if S(c_t, c_{t+1}) ≥ ξ then
4.       C′ = C′ ∪ c_t;
5.   if O(c_{t+1}, C′) ≥ ξ then
6.     M = insert(c_{t+1});
7.     if S(C′, c_{t+1}) < 1 then
8.       WE = insert(c_{t+1});
9.   else
10.    WM = insert(c_{t+1});
11. output M, WM, WE.

F. Algorithm Complexity Analysis
For a network G(V, E) with n nodes and m edges, each of Algorithms 1–5 visits all nodes in the network at adjacent timestamps and then checks whether the number of nodes changes in order to determine which event happens. This is similar to visiting the Cartesian product of the nodes at adjacent timestamps. Therefore, the time complexity of Algorithms 1–5 is O(n^2).

V. EXPERIMENTS
A. Experimental Setup
In order to verify the accuracy and efficiency of the proposed community evolution analysis framework, we conduct experiments using real data as well as large-scale synthetic network data sets, including: 1) two types of synthetic dynamic networks generated by the data generator [31] and 2) real dynamic networks, including the DBLP data set [32] and the Facebook data set from New Orleans in 2008 [33]. The details of these data are shown in Table III.
The first type of synthetic data is generated by the dynamic network D3, with parameters listed in Table III(a), without specifying the number of evolutionary events. The second type of synthetic data is generated by the dynamic networks D1 and D2 in Table III(a), in order to estimate the correctness of the WECEM framework. The D1 dataset generates 50 Form and Disappear events, 10 Merge and Split events, and 50 Shrink and Expand events, while the D2 dataset is specified to have 200 Form and Disappear events, 50 Split and Merge events, and 200 Shrink and Expand events.
As we can see from Table III(a), for the synthetic dynamic network datasets D1, D2, and D3, the dynamic community evolution events were analyzed across 5 time steps, and the time steps are determined based on the following rules. The networks began at t = 1 with around 400 embedded communities, which were constrained to have sizes in the range [20, 60]. In these three synthetic datasets, twenty percent of node memberships were randomly permuted at each step to simulate users’ movement across communities over time. Then, events were added by the generator. As for the real DBLP dataset in Table III(b), the number of time steps for community evolution analysis is 5 (years). Similarly, for the real Facebook dataset in Table III(c), the number of time steps for community evolution analysis is 12 (months).
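To make the membership perturbation in the synthetic data concrete, here is a minimal Python sketch of one time step. It is not the generator of [31]. It assumes that "randomly permuted" means shuffling the community labels among a randomly chosen 20% of the nodes, so community sizes are preserved. The function name `permute_memberships` and its parameters are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional


def permute_memberships(
    membership: Dict[Hashable, int],
    fraction: float = 0.2,
    seed: Optional[int] = None,
) -> Dict[Hashable, int]:
    """Return a copy of node -> community with a fraction of labels shuffled.

    Assumption: labels are permuted only among the selected nodes, which
    keeps every community at its original size.
    """
    rng = random.Random(seed)
    nodes = list(membership)
    k = round(fraction * len(nodes))
    chosen = rng.sample(nodes, k)               # nodes whose membership may move
    labels = [membership[n] for n in chosen]
    rng.shuffle(labels)                         # permute labels among chosen nodes
    updated = dict(membership)                  # leave the input untouched
    updated.update(zip(chosen, labels))
    return updated
```

Applying the function repeatedly, once per time step, gives a sequence of snapshots in which at most a fifth of the nodes change community between consecutive steps; evolutionary events would then be added on top, as the text describes.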

TABLE III
DESCRIPTION OF EXPERIMENTAL DATA SETS. (a) SYNTHETIC DYNAMIC NETWORK DATA. (b) DBLP NETWORK DATA. (c) FACEBOOK NETWORK DATA

The proposed WECEM framework is implemented in the Java programming language, and we compare it with the classic Asur framework [4] and the Takaffoli framework [2], where the parameter k of the Asur and the Takaffoli frameworks is set to 0.5 based on experimental studies. The hardware environment includes an Intel Core i7-4710HQ processor and 8 GB of memory. Each framework is executed 3 times on each data set, and we take the average value to show its performance.
Definition 17 [Event Mining Accuracy (EMA)]: EMA represents the accuracy of event detection, which equals the proportion of the number of correctly identified communities to the actual number of communities with events happening

$$
\mathrm{EMA}_P = \frac{\sum_{t \in T} \left| C_t^P \cap \tilde{C}_t^P \right|}{\sum_{t \in T} \max\left( \left| C_t^P \right|, \left| \tilde{C}_t^P \right| \right)}
\tag{16}
$$

where EMA_P represents the EMA of a particular event P, T represents the set of timestamps, C_t^P represents the communities where P occurs at time t as detected by the algorithms, \tilde{C}_t^P represents the true communities where P happens at time t, and EMA is used to evaluate the accuracy of each framework for detecting evolutionary events.

B. Parameter Specification
In order to take into account both the quantity and the quality of discovered events, it is necessary to determine the values of the community existence parameter θ, the community scale changing parameter γ, and the multiple community changing parameter ξ. The accuracy of WECEM depends on these three parameters, so choosing appropriate parameter values helps discover more reliable events. θ and γ are specified by experiments on the D1 data set. Because the D1 data set contains only 10 Split and Merge events and the scale of the data is small, the accuracy of event mining cannot be displayed accurately on D1, so ξ is determined by experiments on the D2 data set. The results are shown in Fig. 4, where Fig. 4(a)–(d) show the experimental results of changing the θ and γ parameters on the D1 data set, and Fig. 4(e) and (f) show the results of specifying different ξ values on the D2 data set.
In Fig. 4(a), as θ grows, the number of events other than Form and Disappear decreases, because Form and Disappear events require that the community overlapping degree between communities at adjacent timestamps is less than θ. When the value of θ is very small, it is difficult to find Form and Disappear events. However, as the value of θ grows, the constraint for Form and Disappear events becomes increasingly loose, and more such events can be discovered. In contrast, as the value of θ grows, the constraint for other events becomes stricter, so the number of discovered events decreases.
As shown in Fig. 4(b), when θ < 0.4, the EMA of Form and Disappear events grows gradually with θ, because the number of detected Form and Disappear events approaches the actual number of these two events. But when θ > 0.4, redundant events increase gradually as θ increases, which leads to a decrease in EMA. According to the experimental results of discovered events on the D1 data set, θ is specified to 0.4 in the following experiments so as to avoid the overlapping of strong events and enable WECEM to find as many events as possible.
As shown in Fig. 4(c), as γ increases, the number of Expand and Shrink events grows gradually, because the constraint range of Expand and Shrink events becomes larger as γ increases. However, the capability of distinguishing these events becomes weaker, and the discovered events will be overlapped by other events, especially for Shrink events. When γ = 1, the bound constraint is the interval [0, 1], and the Shrink event happens in almost all communities.
In Fig. 4(d), as γ grows, the EMA of Expand and Shrink events increases at first and then drops, because when γ is small, the number of events is small, but when γ is very large, there exist several redundant events. In order to identify more events in an effective fashion, the numbers of Shrink and Expand events on the D1 data set are increased to 50 and γ is specified to 0.3.
By Fig. 4(e), the number of Split and Merge events decreases with ξ, because when ξ is set to be large, the
number of communities meeting the constraints of overlapping degree and membership degree decreases, and the corresponding events are gradually reduced. When 0 < ξ ≤ 0.6, as ξ grows, the number of communities that satisfy the requirement of membership degree without satisfying the requirement of community overlapping degree grows, thus the numbers of Weak Split and Weak Merge events grow gradually. When 0.6 < ξ ≤ 1, these two conditions cannot be met, so the changing trends of Weak Split and Weak Merge events (Split and Merge events) are nearly the same.
According to Fig. 4(f), when 0 < ξ < 0.7, EMA increases with ξ. When 0.7 < ξ < 1, EMA degrades with ξ. When ξ is small, as ξ grows, the number of discovered events approaches the number of events that actually occurred. When ξ is specified to a large value, the number of redundant events grows, which leads to a decrease in EMA. Based on the above discussion, in order to accurately identify evolutionary events, ξ is set to 0.6 in experiments.

Fig. 4. Number of discovered events and EMA by the WECEM framework with different θ, γ, and ξ values. (a) Number of discovered events as θ changes. (b) EMA as θ changes. (c) Number of discovered events as γ changes. (d) EMA as γ changes. (e) Number of discovered events as ξ changes. (f) EMA as ξ changes.

C. Quantity Analysis of Detected Events
In this section, we compare the number of correctly detected events among different community detection frameworks.
1) Quantity Comparison of Detected Events on DBLP: Table IV shows the number of detected events on the DBLP data set, and the following observations can be drawn.
1) For each framework, the numbers of Form and Disappear events are the largest compared with other events. This is because DBLP is a coauthor network, in which some scholars publish papers in a given year while other scholars may not. Consequently, several small communities are formed, thus the numbers of Form and Disappear events are larger than those of other events.
2) As shown in Table IV(a), although there are many weak events in the DBLP network, the change of nodes and edges relevant to weak events is so small that it does not cause a qualitative change in the community. Additionally, because weak events accompany strong events, and may even overlap with them, discovering weak events can help accurately predict the variation tendency of network structures. However, Asur, Takaffoli, and other event-based frameworks do not work for identifying weak events, as they focus only on strong events, which are easy to find. In fact, the phenomenon of overlapping events rarely appears, and it is difficult for traditional event-based frameworks to detect events with a slowly changing tendency in dynamic networks.
3) By comparing Table IV(a) with Table IV(b), the numbers of events detected by WECEM are larger than those mined by the Asur framework, and WECEM outperforms traditional frameworks by 22.9% in the number of discovered strong events. This is because the Asur definitions of events are strict, which makes it difficult for the Asur framework to detect events. Taking the Remain event as an example, it requires that the number of nodes in a community be exactly the same at adjacent timestamps. Moreover, for Form events, there should be no similar nodes in a community at time t and t + 1.
4) By comparing Table IV(a) with Table IV(c), we can find that the number of events identified by WECEM is almost the same as that of Takaffoli, since WECEM uses multiple parameters to deal with different events. In
addition, the community overlapping degree and membership degree work together to detect evolutionary events accurately. For a small scale of events, Takaffoli can discover more events than WECEM, because Takaffoli considers multiple networks before and after the time slice: any event that meets the condition is identified, and the corresponding event is considered to have occurred.

TABLE IV
QUANTITY COMPARISON ON DBLP DATASETS. (a) NUMBER OF EVENTS DETECTED BY WECEM. (b) NUMBER OF EVENTS DETECTED BY ASUR. (c) NUMBER OF EVENTS DETECTED BY TAKAFFOLI

2) Quantity Comparison of Detected Events on D3 Data: Table V shows the number of events detected by each framework. The most important difference between the D3 data set and the D1 and D2 data sets is that the number of events does not need to be specified manually, which brings D3 closer to real-world dynamic network structures.
According to [31], at every moment, nodes are selected from the D3 synthetic data set to simulate the changes of the network. This kind of network evolves in a continuous and stable fashion, which is similar to a real dynamic network. Consequently, the numbers of events discovered by these three frameworks over time are relatively stable.
As shown in Table V, the Asur framework cannot handle large-scale synthetic network data, because the node changes in each time slice of the simulated network are extracted proportionally from the last moment, and many communities have small changes, which are not sufficient to satisfy the definition of events in the Asur framework, resulting in zero for each event in the Asur framework. In contrast, Tables V(a) and (c) show that the WECEM framework discovers numbers of Remain, Form, and Disappear events similar to those of the Takaffoli framework, which verifies the effectiveness of the WECEM framework. For the Split and Merge events, Takaffoli discovers more events than WECEM. This is because the values of the parameters in WECEM are set to be small, which results in many redundant events. On the other hand, the Takaffoli framework needs to compare the number of nodes at each time slice in event detection.
In Tables IV(a) and V(a), there are a large number of weak events on each data set. We can conclude that weak events are very common in complex networks. For two networks evolving during a short period of time, there are a large number of weak events because of the slow changes in the number of nodes and edges. Finally, the frequent occurrences of weak events result in strong events.

D. Event Detection Quality Analysis
For the D1 and D2 datasets, we generate a fixed number of events. The EMA measurement is used to verify the accuracy of event detection by comparing the number of discovered events with the actual number of events for these three frameworks. It is noteworthy that we cannot manually specify the number of events in community evolution on the DBLP and Facebook datasets, which are from real networks. It is therefore difficult to accurately estimate the accuracy of event mining on them, so we conduct experiments on the D1 and D2 datasets, and the experimental results are shown in Fig. 5.
Fig. 5 shows the accuracy of event detection on the D1 and D2 datasets by each framework. Given that Asur and Takaffoli do not define Shrink and Expand events, we only show the results about Shrink and Expand events from WECEM. The following conclusions can be drawn from Fig. 5.
1) WECEM is more accurate than Asur and Takaffoli in discovering Form and Disappear events that occur in various communities. The reason is that although Asur can guarantee the accuracy of event detection, its definitions of events are too strict and the number of events discovered by the Asur framework is very small, so its EMA is very low. On the other hand, the Takaffoli framework considers the change of the number of nodes at all timestamps when mining events, and the changes of communities in synthetic networks are regular, so the redundant nodes mined by the Takaffoli framework are more numerous than those of the WECEM framework.
2) As for the Split and Merge events, the WECEM event discovery accuracy is lower than that of the Asur framework. This can be explained by the fact that we use the data generator designed by Greene et al. [31] to generate dynamic synthetic network datasets in order to mine Split and Merge events, and this data generator
is developed based on the Asur framework. On the one hand, WECEM can identify all of the Split and Merge events due to its relaxed definition of evolutionary events when compared to the Asur framework. On the other hand, more redundant events will be found by the WECEM framework, which leads to a lower accuracy of discovering Merge and Split events compared with the Asur framework. Although some of the events found by WECEM are redundant, the communities with these events have actually changed in the network, and these changes mainly constitute weak events. Similarly, the Takaffoli framework has a lower accuracy and a higher redundancy rate than the WECEM framework, although the Takaffoli framework can identify all Split and Merge events. The reasons for its higher redundancy are twofold: a) the synthetic network data follow some regularity without considering the distribution of events at each timestamp and b) the Takaffoli framework uses the uniform parameter k to control community similarity and discover events in the network, whereas WECEM takes into account three parameters, which play important roles in accurately mining evolutionary events.
3) On average, the EMA of WECEM is 2.13% higher than that of the Asur framework and 12.2% higher than that of the Takaffoli framework. The accuracy of WECEM is higher than that of the Asur framework because WECEM uses multiple parameters to separate different kinds of events in order to avoid the overlap of events as well as reduce the redundancy rate of mined events.

TABLE V
QUANTITY COMPARISON OF EVENTS ON D3 DATASETS. (a) NUMBER OF EVENTS DETECTED BY WECEM. (b) NUMBER OF EVENTS DETECTED BY ASUR. (c) NUMBER OF EVENTS DETECTED BY TAKAFFOLI

Fig. 5. Event detection accuracy comparison on different datasets. (a) EMA on D1 dataset. (b) EMA on D2 dataset.

E. Efficiency Analysis of Detecting Events
In this section, we compare the execution time of each framework on five datasets: DBLP, Facebook, D1, D2, and D3. The experimental results are shown in Fig. 6, where the x-axis represents the time interval of two adjacent timestamps. As shown in Fig. 6(e), the y-axis represents the execution time of each framework.

Fig. 6. Execution time comparison of different frameworks on different datasets. (a) DBLP. (b) Facebook. (c) D1. (d) D2. (e) D3.

As demonstrated in Fig. 6, as the number of nodes grows, the runtime of these three frameworks increases gradually, and the following conclusions can be made.
1) According to Fig. 6(a), (c), (d), and (e), the Asur framework runs fastest on the DBLP, D1, D2, and D3 datasets, followed by the Takaffoli framework, with the WECEM framework being the slowest one. This can be explained
by the fact that WECEM needs to simultaneously identify 11 kinds of events composed of strong events and weak events, while the Asur framework and the Takaffoli framework only need to mine five kinds of elementary events.
2) From Fig. 6(b), on the Facebook dataset the most efficient framework is Asur, followed by WECEM, and the slowest one is Takaffoli. Because the Facebook datasets have a large number of nodes involved over several timestamps, and the Takaffoli framework detects every event by comparing the number of nodes over all timestamps, its cost is high. In contrast, WECEM and Asur only need to compare the number of nodes at adjacent timestamps, so the time complexity of Takaffoli is higher than that of WECEM and Asur. The efficiency of Asur is higher than that of WECEM, as the Asur framework does not need to detect Shrink and Expand events and cannot detect weak events either. For the Facebook datasets, the efficiency of WECEM is 48.83% less than Asur and is 67.73% higher than Takaffoli. In summary, the proposed WECEM framework can deal with large-scale dynamic networks over several timestamps, and it is more flexible and generic than the Takaffoli framework.

VI. CONCLUSION
In this article, we have explored the fundamental principle and working mechanism of the WECEM framework for weak event mining in the community evolution of dynamic complex networks. WECEM classifies events into strong events and weak events. Two measurements, community overlapping degree and community membership degree, are used to determine the continuity of dynamic communities in complex networks. To calculate the community overlapping degree and community membership degree, the WECEM framework first compares each community at consecutive timestamps and then discovers different events based on these two measurements. The experimental results have indicated that the WECEM framework is effective at mining events. In particular, WECEM can discover weak events, which cannot be handled by other frameworks. In addition, the experimental results have also shown that the WECEM framework is effective at detecting strong as well as weak events. For mining large-scale dynamic networks, the advantage of WECEM is apparent, since it can detect small changes in the network. Given that WECEM needs much time to mine several kinds of events, its efficiency is lower than that of the traditional frameworks in some cases. In our future work, we will continue to improve the accuracy of event mining by reducing redundant events. Because traditional serial community evolution analysis methods cannot handle big network data, we will also parallelize the WECEM framework to mine larger complex networks with a huge number of nodes and complex relationships.

REFERENCES
[1] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 2, 2004, Art. no. 026113.
[2] M. Takaffoli, F. Sangi, J. Fagnan, and O. R. Zaiane, “Community evolution mining in dynamic social networks,” Soc. Behav. Sci., vol. 22, no. 22, pp. 49–58, 2011.
[3] S. Qiao, N. Han, J. Wang, R.-H. Li, L. A. Gutierrez, and X. Wu, “Predicting long-term trajectories of connected vehicles via the prefix-projection technique,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 7, pp. 2305–2315, Jul. 2018.
[4] S. Asur, S. Parthasarathy, and D. Ucar, “An event-based framework for characterizing the evolutionary behavior of interaction graphs,” ACM Trans. Knowl. Disc. Data, vol. 3, no. 4, pp. 1–36, 2009.
[5] S. Qiao et al., “A fast parallel community discovery model on complex networks through approximate optimization,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 9, pp. 1638–1651, Sep. 2018.


[6] S. Y. Bhat and M. Abulaish, “HOCTracker: Tracking the evolution of hierarchical and overlapping communities in dynamic social networks,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 4, pp. 1019–1031, Apr. 2015.
[7] S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, “A self-adaptive parameter selection trajectory prediction approach via hidden Markov models,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 1, pp. 284–296, Feb. 2015.
[8] S. Qiao, N. Han, W. Zhu, and L. A. Gutierrez, “TraPlan: An effective three-in-one trajectory-prediction model in transportation networks,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 3, pp. 1188–1198, Jun. 2015.
[9] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 70, no. 2, 2004, Art. no. 066111.
[10] M. E. J. Newman, Networks: An Introduction. Oxford, U.K.: Oxford Univ. Press, 2010.
[11] A. Clauset, “Finding local community structure in networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 72, no. 2, 2005, Art. no. 026132.
[12] F. Hao, G. Min, Z. Pei, D.-S. Park, and L. T. Yang, “K-clique community detection in social networks based on formal concept analysis,” IEEE Syst. J., vol. 11, no. 1, pp. 250–259, Mar. 2017.
[13] A. Mahmood, M. Small, S. A. Al-Maadeed, and N. Rajpoot, “Using geodesic space density gradients for network community detection,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 4, pp. 921–935, Apr. 2017.
[14] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, “Network community detection based on spectral clustering,” Nature, vol. 435, no. 7043, pp. 814–818, 2005.
[15] M. G. Parsa, N. Mozayani, and A. Esmaeili, “An EDA-based community detection in complex networks,” in Proc. IEEE 7th Int. Symp. Telecommun., 2014, pp. 476–480.
[16] M. Bouguessa, R. Missaoui, and M. Talbi, “A novel approach for detecting community structure in networks,” in Proc. IEEE 26th Int. Conf. Tools Artif. Intell., 2014, pp. 469–477.
[17] V. Lyzinski, M. Tang, A. Athreya, Y. Park, and C. E. Priebe, “Community detection and classification in hierarchical stochastic blockmodels,” IEEE Trans. Netw. Sci. Eng., vol. 4, no. 1, pp. 13–26, Jan.–Mar. 2017.
[18] M. Takaffoli, R. Rabbany, and O. Zaiane, “Community evolution prediction in dynamic social networks,” in Proc. IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Min., 2014, pp. 9–16.
[19] N. İlhan and Ş. G. Öguducu, “Predicting community evolution based on time series modeling,” in Proc. IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Min., 2015, pp. 1509–1516.
[20] W. Zhu, D. Zhang, X. Zhou, D. Yang, and Z. Yu, “Discovering and profiling overlapping communities in location-based social networks,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 4, pp. 499–509, Apr. 2017.
[21] E. G. Tajeuna, M. Bouguessa, and S. Wang, “Tracking the evolution of community structures in time-evolving social networks,” in Proc. IEEE Int. Conf. Data Sci. Adv. Anal., 2015, pp. 1–10.
[22] T. Falkowski, A. Barth, and M. Spiliopoulou, “Studying community dynamics with an incremental graph mining algorithm,” in Proc. 14th Americas Conf. Inf. Syst., 2008, pp. 1–11.
[23] Y. Wang, B. Wu, and N. Du, “Community evolution of social network: Feature algorithm and model,” Phys. Soc., vol. 804, p. 4356, Apr. 2008.
[24] J. Zhang, Y. Zhu, and Z. Chen, “Evolutionary game dynamics of multiagent systems on multiple community networks,” IEEE Trans. Syst., Man, Cybern., Syst., to be published.
[25] Y. Liu, H. Gao, X. Kang, Q. Liu, R. Wang, and Z. Qin, “Fast community discovery and its evolution tracking in time-evolving social networks,” in Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW), 2015, pp. 13–22.
[26] P. Wang, J. Lü, and X. Yu, “Identification of important nodes in directed biological networks: A network motif approach,” PLoS ONE, vol. 9, no. 8, 2014, Art. no. e106132.
[27] P. Wang, C. Yang, H. Chen, Q. Leng, S. Li, and D. Wang, “Exploring transcription factors reveals crucial members and regulatory networks involved in different abiotic stresses in Brassica Napus L,” BMC Plant Biol., vol. 18, p. 202, Sep. 2018.
[28] P. Wang, J. Lü, X. Yu, and Z. Liu, “Duplication and divergence effect on network motifs in undirected bio-molecular networks,” IEEE Trans. Biomed. Circuits Syst., vol. 9, no. 3, pp. 312–320, Jun. 2015.
[29] P. Wang, Y. Chen, J. Lu, Q. Wang, and X. Yu, “Graphical features of functional genes in human protein interaction network,” IEEE Trans. Biomed. Circuits Syst., vol. 10, no. 3, pp. 707–720, Jun. 2016.
[30] J. Rao, H. Du, X. Yan, and C. Liu, “Detecting overlapping community in social networks based on fuzzy membership degree,” in Proc. 5th Int. Conf. Comput. Soc. Netw., 2016, pp. 99–110.
[31] D. Greene, D. Doyle, and P. Cunningham, “Tracking the evolution of communities in dynamic social networks,” in Proc. IEEE Int. Conf. Adv. Soc. Netw. Anal. Min., 2011, pp. 176–183.
[32] Datatang. (2017). DBLP Co-Author Data Sets. Accessed: Apr. 27, 2017. [Online]. Available: http://www.datatang.com
[33] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, “On the evolution of user interactions in Facebook,” in Proc. ACM Workshop Online Soc. Netw., 2009, pp. 37–42.

Shaojie Qiao received the B.S. and Ph.D. degrees in computer science from Sichuan University, Chengdu, China, in 2004 and 2009, respectively. From 2007 to 2008, he worked as a Visiting Scholar with the School of Computing, National University of Singapore, Singapore. He is currently a Professor with the School of Software Engineering, Chengdu University of Information Technology, Chengdu. He has authored more than 40 high-quality papers, and coauthored more than 90 papers. His research interests include community discovery and artificial intelligence.

Nan Han received the M.S. and Ph.D. degrees in science of prescriptions from the Chengdu University of Traditional Chinese Medicine, Chengdu, China, in 2009 and 2012, respectively. She is an Associate Professor with the School of Management, Chengdu University of Information Technology, Chengdu. She has authored more than 20 papers and has participated in several projects supported by the National Natural Science Foundation of China. Her research interests include complex networks and data mining.

Yunjun Gao (Member, IEEE) received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2008. He is currently a Professor with the College of Computer Science and Technology, Zhejiang University. His research interests include spatial and spatio-temporal databases and spatio-textual data processing. Prof. Gao is a member of ACM and a Senior Member of CCF.

Rong-Hua Li received the Ph.D. degree in computer science from the Chinese University of Hong Kong, Hong Kong, in 2013. He is currently an Associate Professor with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, also with the Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen, China, and also with the Guangdong Province Engineering Center of China-made High Performance Data Computing System, Shenzhen University. His research interests include social network analysis and graph data management.


Jianbin Huang received the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, Xi’an, China, in 2007. He is currently a Professor with the School of Computer Science and Technology, Xidian University. His research interests are in data mining and knowledge discovery.

Heli Sun received the Ph.D. degree in computer science from Xi’an Jiaotong University, Xi’an, China, in 2011. She is currently an Associate Professor with the School of Computer Science and Technology, Xi’an Jiaotong University. She was a Visiting Scholar with the University of California at Los Angeles, Los Angeles, CA, USA, from 2017 to 2018. Her research interests are in graph mining, social network analysis, and smart city.

Xindong Wu (Fellow, IEEE) received the Ph.D. degree in artificial intelligence from the University of Edinburgh, Edinburgh, U.K., in 1993. He is the President of Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, China, and a Professor with the Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Ministry of Education, Hefei, China. His research interests include data mining and knowledge-based systems. Prof. Wu is a fellow of the AAAS.
