Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks

Tian Bian (1,2), Xi Xiao (1), Tingyang Xu (2), Peilin Zhao (2), Wenbing Huang (2), Yu Rong (2), Junzhou Huang (2)
1 Tsinghua University
2 Tencent AI Lab
{tingyangxu, masonzhao, joehhuang}@tencent.com
arXiv:2001.06362v1 [cs.SI] 17 Jan 2020

Abstract
Social media has been developing rapidly due to its nature of spreading new information, which also leads to rumors being circulated. Meanwhile, detecting rumors in such massive amounts of information on social media is becoming an arduous challenge. Therefore, some deep learning methods, such as Recursive Neural Networks (RvNN), have been applied to discover rumors through the way they spread. However, these deep learning methods only take into account the patterns of deep propagation and ignore the structures of wide dispersion in rumor detection. In fact, propagation and dispersion are two crucial characteristics of rumors. In this paper, we propose a novel bi-directional graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to explore both characteristics by operating on both the top-down and the bottom-up propagation of rumors. It leverages a GCN with a top-down directed graph of rumor spreading to learn the patterns of rumor propagation, and a GCN with an oppositely directed graph of rumor diffusion to capture the structures of rumor dispersion. Moreover, the information from the source post is involved in each layer of GCN to enhance the influences from the roots of rumors. Encouraging empirical results on several benchmarks confirm the superiority of the proposed method over the state-of-the-art approaches.

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction
With the rapid development of the Internet, social media has become a convenient online platform for users to obtain information, express opinions and communicate with each other. As more and more people are keen to participate in discussions about hot topics and exchange their opinions on social media, many rumors appear. Due to the large number of users and the easy access to social media, rumors can spread widely and quickly, bringing huge harm to society and causing substantial economic losses. Therefore, given the potential panic and threat caused by rumors, it is urgent to come up with a method to identify rumors on social media efficiently and as early as possible.

Conventional detection methods mainly adopt handcrafted features such as user characteristics, text contents and propagation patterns to train supervised classifiers, e.g., Decision Tree (Castillo, Mendoza, and Poblete 2011), Random Forest (Kwon et al. 2013), and Support Vector Machine (SVM) (Yang et al. 2012). Some studies apply more effective features, such as user comments (Giudice 2010), temporal-structural features (Wu, Yang, and Zhu 2015), and the emotional attitude of posts (Liu et al. 2015). However, those methods mainly rely on feature engineering, which is very time-consuming and labor-intensive. Moreover, those handcrafted features usually lack high-level representations extracted from the propagation and the dispersion of rumors.

Recent studies have exploited deep learning methods that mine high-level representations from propagation paths, trees or networks to identify rumors. Many deep learning models such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recursive Neural Networks (RvNN) (Ma et al. 2016; Ma, Gao, and Wong 2018) are employed since they are capable of learning sequential features from rumor propagation along time. However, these approaches have a significant limitation on efficiency, since temporal-structural features only pay attention to the sequential propagation of rumors and neglect the influences of rumor dispersion. The structures of rumor dispersion also indicate some spreading behaviors of rumors. Thus, some studies have tried to involve the information from the structures of rumor dispersion by invoking Convolutional Neural Network (CNN) based methods (Yu et al. 2017; Yu et al. 2019). CNN-based methods can obtain the correlation features within local neighbors but cannot handle the global structural relationships in graphs or trees (Bruna et al. 2014). Therefore, the global structural features of rumor dispersion are ignored in these approaches. In fact, CNNs are not designed to learn high-level representations from structured data, whereas the Graph Convolutional Network (GCN) is (Kipf and Welling 2017).

So can we simply apply GCN to rumor detection, given that it has made successful progress in various fields, such as social networks (Hamilton, Ying, and Leskovec 2017), physical systems (Battaglia et al. 2016), and chemical drug discovery (Defferrard, Bresson, and Vandergheynst 2016)? The answer is no. As shown in Figure 1(a), GCN, also called undirected GCN (UD-GCN), only aggregates information based on the relationships among relevant posts and loses the sequential order of the follow-up posts. Although UD-GCN has the ability to handle the global structural features of rumor dispersion, it does not consider the direction of rumor propagation, which has been shown to be an important clue for rumor detection (Wu, Yang, and Zhu 2015). Specifically, deep propagation along a relationship chain (Han et al. 2014) and wide dispersion across a social community (Thomas 2007) are two major characteristics of rumors, which calls for a method that serves both.
Figure 1: (a) UD-GCN, the undirected graph with only node relationships; (b) TD-GCN, the deep propagation along a relationship chain from top to bottom; (c) BU-GCN, the aggregation of the wide dispersion within a community to an upper node.

To deal with both the propagation and the dispersion of rumors, in this paper we propose a novel Bi-directional GCN (Bi-GCN), which operates on both the top-down and the bottom-up propagation of rumors. The proposed method obtains the features of propagation and dispersion via two parts, the Top-Down graph convolutional Networks (TD-GCN) and the Bottom-Up graph convolutional Networks (BU-GCN), respectively. As shown in Figure 1(b) and 1(c), TD-GCN forwards information from the parent node of a node in a rumor tree to formulate rumor propagation, while BU-GCN aggregates information from the children nodes of a node in a rumor tree to represent rumor dispersion. Then, the representations of propagation and dispersion pooled from the embeddings of TD-GCN and BU-GCN are merged together through fully connected layers to make the final prediction. Meanwhile, we concatenate the features of the roots in rumor trees with the hidden features at each GCN layer to enhance the influences from the roots of rumors. Moreover, we employ DropEdge (Rong et al. 2019) in the training phase to avoid over-fitting. The main contributions of this work are as follows:

• We leverage Graph Convolutional Networks to detect rumors. To the best of our knowledge, this is the first study employing GCN for rumor detection on social media.
• We propose the Bi-GCN model, which not only considers the causal features of rumor propagation along relationship chains from top to bottom but also obtains the structural features of rumor dispersion within communities through bottom-up gathering.
• We concatenate the features of the source post with those of the other posts at each graph convolutional layer to make comprehensive use of the information from the root feature and achieve excellent performance in rumor detection.

Experimental results on three real-world datasets show that our Bi-GCN method outperforms several state-of-the-art approaches; and for the task of early rumor detection, which is crucial to identify rumors in real time and prevent them from spreading, Bi-GCN also achieves much higher effectiveness.

Related Work
In recent years, automatic rumor detection on social media has attracted a lot of attention. Most previous work on rumor detection mainly focuses on extracting rumor features from text contents, user profiles and propagation structures to learn a classifier from labeled data (Castillo, Mendoza, and Poblete 2011; Yang et al. 2012; Kwon et al. 2013; Liu et al. 2015; Zhao, Resnick, and Mei 2015). Ma et al. (Ma et al. 2015) classified rumors by using time series to model the variation of handcrafted social context features. Wu et al. (Wu, Yang, and Zhu 2015) proposed a graph kernel-based hybrid SVM classifier by combining the RBF kernel with a random-walk-based graph kernel. Ma et al. (Ma, Gao, and Wong 2017) constructed a propagation tree kernel to detect rumors by evaluating the similarities between their propagation tree structures. These methods were not only ineffective but also relied heavily on handcrafted feature engineering to extract informative feature sets.

In order to automatically learn high-level features, a number of recent methods have been proposed to detect rumors based on deep learning models. Ma et al. utilized Recurrent Neural Networks (RNN) to capture the hidden representation from temporal content features (Ma et al. 2016). Chen et al. (Chen et al. 2018) improved this approach by combining attention mechanisms with RNN to focus on text features with different attentions. Yu et al. (Yu et al. 2017) proposed a method based on Convolutional Neural Network (CNN) to learn key features scattered among an input sequence and shape high-level interactions among significant features. Liu et al. (Liu and Wu 2018) incorporated both RNN and CNN to get the user features based on time series. Recently, Ma et al. (Ma, Gao, and Wong 2019) employed adversarial learning to improve the performance of the rumor classifier, where the discriminator is used as a classifier and the corresponding generator improves the discriminator by generating conflicting noises. In addition, Ma et al. built tree-structured Recursive Neural Networks (RvNN) to catch the hidden representation from both propagation structures and text contents (Ma, Gao, and Wong 2018). However, these methods are too inefficient to learn the features of the propagation structure, and they also ignore the global structural features of rumor dispersion.

Compared to the deep learning models mentioned above, GCN is better able to capture global structural features from graphs or trees. Inspired by the success of CNN in the field of computer vision, GCN has demonstrated state-of-the-art performance in various tasks with graph data (Battaglia et al. 2016; Defferrard, Bresson, and Vandergheynst 2016; Hamilton, Ying, and Leskovec 2017). Scarselli et al. (Scarselli et al. 2008) first introduced GCN
as a special message-passing model for either undirected or directed graphs. Later on, Bruna et al. (Bruna et al. 2014) theoretically analyzed graph convolutional methods for undirected graphs based on spectral graph theory. Subsequently, Defferrard et al. (Defferrard, Bresson, and Vandergheynst 2016) developed a method named Chebyshev Spectral CNN (ChebNet), which uses Chebyshev polynomials as the filter. After this work, Kipf et al. (Kipf and Welling 2017) presented a first-order approximation of ChebNet (1stChebNet), where the information of each node is aggregated from the node itself and its neighboring nodes. Our rumor detection model is inspired by GCN.

Preliminaries
We introduce some fundamental concepts that are necessary for our method. First, the notation used in this paper is as follows.

Notation
Let $C = \{c_1, c_2, \ldots, c_m\}$ be the rumor detection dataset, where $c_i$ is the $i$-th event and $m$ is the number of events. $c_i = \{r_i, w^i_1, w^i_2, \ldots, w^i_{n_i-1}, G_i\}$, where $n_i$ refers to the number of posts in $c_i$, $r_i$ is the source post, each $w^i_j$ represents the $j$-th relevant responsive post, and $G_i$ refers to the propagation structure. Specifically, $G_i$ is defined as a graph $\langle V_i, E_i \rangle$ with $r_i$ being the root node (Wu, Yang, and Zhu 2015; Ma, Gao, and Wong 2017), where $V_i = \{r_i, w^i_1, \ldots, w^i_{n_i-1}\}$ and $E_i = \{e^i_{st} \mid s, t = 0, \ldots, n_i - 1\}$ represents the set of edges from the responded posts to the retweeted or responsive posts, as shown in Figure 1(b). For example, if $w^i_2$ is a response to $w^i_1$, there will be a directed edge $w^i_1 \rightarrow w^i_2$, i.e., $e^i_{12}$. If $w^i_1$ is a response to $r_i$, there will be a directed edge $r_i \rightarrow w^i_1$, i.e., $e^i_{01}$. Denote $A_i \in \{0, 1\}^{n_i \times n_i}$ as an adjacency matrix where

$$a^i_{ts} = \begin{cases} 1, & \text{if } e^i_{st} \in E_i, \\ 0, & \text{otherwise}. \end{cases}$$

Denote $X_i = [x^{i\top}_0, x^{i\top}_1, \ldots, x^{i\top}_{n_i-1}]^\top$ as a feature matrix extracted from the posts in $c_i$, where $x^i_0$ represents the feature vector of $r_i$ and each other row $x^i_j$ represents the feature vector of $w^i_j$.

Moreover, each event $c_i$ is associated with a ground-truth label $y_i \in \{F, T\}$ (i.e., False Rumor or True Rumor). In some cases, the label $y_i$ is one of the four finer-grained classes $\{N, F, T, U\}$ (i.e., Non-rumor, False Rumor, True Rumor, and Unverified Rumor) (Ma, Gao, and Wong 2017; Zubiaga et al. 2018). Given the dataset, the goal of rumor detection is to learn a classifier

$$f: C \rightarrow Y,$$

where $C$ and $Y$ are the sets of events and labels respectively, to predict the label of an event based on the text contents, the user information and the propagation structure constructed by the related posts from that event.

Graph Convolutional Networks
Recently, there has been increasing interest in generalizing convolutions to the graph domain. Among all the existing works, GCN is one of the most effective convolution models, whose convolution operation is considered as a general "message-passing" architecture as follows:

$$H_k = M(A, H_{k-1}; W_{k-1}), \tag{1}$$

where $H_k \in \mathbb{R}^{n \times v_k}$ is the hidden feature matrix computed by the $k$-th Graph Convolutional Layer (GCL) and $M$ is the message propagation function, which depends on the adjacency matrix $A$, the hidden feature matrix $H_{k-1}$ and the trainable parameters $W_{k-1}$.

There are many kinds of message propagation functions $M$ for GCN (Bruna et al. 2014; Defferrard, Bresson, and Vandergheynst 2016). Among them, the message propagation function defined in the first-order approximation of ChebNet (1stChebNet) (Kipf and Welling 2017) is as follows:

$$H_k = M(A, H_{k-1}; W_{k-1}) = \sigma(\hat{A} H_{k-1} W_{k-1}). \tag{2}$$

In the above equation, $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the normalized adjacency matrix, where $\tilde{A} = A + I_N$ (i.e., adding self-connections), $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ represents the degree of the $i$-th node, $W_{k-1} \in \mathbb{R}^{v_{k-1} \times v_k}$, and $\sigma(\cdot)$ is an activation function, e.g., the ReLU function.

DropEdge
DropEdge is a novel method to reduce over-fitting for GCN-based models (Rong et al. 2019). In each training epoch, it randomly drops edges from the input graph at a certain rate to generate different deformed copies. As a result, this method augments the randomness and the diversity of the input data, much like randomly rotating or flipping images. Formally, suppose the total number of edges in the graph $A$ is $N_e$ and the dropping rate is $p$; then the adjacency matrix after DropEdge, $A'$, is computed as

$$A' = A - A_{drop}, \tag{3}$$

where $A_{drop}$ is the matrix constructed using $N_e \times p$ edges randomly sampled from the original edge set.

Bi-GCN Rumor Detection Model
In this section, we propose an effective GCN-based method for rumor detection based on rumor propagation and rumor dispersion, named Bi-directional Graph Convolutional Networks (Bi-GCN). The core idea of Bi-GCN is to learn suitable high-level representations from both rumor propagation and rumor dispersion. In our Bi-GCN model, two-layer 1stChebNets are adopted as the fundamental GCN components. As shown in Figure 2, we elaborate the rumor detection process using Bi-GCN in 4 steps.

We first discuss how to apply the Bi-GCN model to one event, i.e., $c_i \rightarrow y_i$ for the $i$-th event. The other events are calculated in the same manner. To better present our method, we omit the subscript $i$ in the following content.
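The preliminaries above (the adjacency convention $a_{ts} = 1$ for an edge $e_{st}$, the 1stChebNet propagation rule of Eq. (2), and DropEdge in Eq. (3)) can be illustrated with a minimal plain-Python sketch. It assumes dense list-of-lists matrices and uses hypothetical helper names (`build_adjacency`, `drop_edge`, `normalize`, `gcn_layer`); it is not the authors' PyTorch implementation and has no trainable parameters or autograd.

```python
import math
import random


def build_adjacency(n, edges):
    """Dense adjacency with a[t][s] = 1 when edge e_st (post s -> reply t) exists."""
    a = [[0.0] * n for _ in range(n)]
    for s, t in edges:
        a[t][s] = 1.0
    return a


def drop_edge(edges, p, rng):
    """Eq. (3): remove int(N_e * p) edges sampled uniformly without replacement."""
    n_drop = int(len(edges) * p)
    dropped = set(rng.sample(range(len(edges)), n_drop))
    return [e for i, e in enumerate(edges) if i not in dropped]


def normalize(a):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, with D~_ii = sum_j A~_ij (row sums)."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[d_inv_sqrt[i] * a_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """Eq. (2) with sigma = ReLU: H_k = ReLU(A_hat H_{k-1} W_{k-1})."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


# Example: source post 0, post 1 replies to 0, post 2 replies to 1 (edges e_01, e_12).
edges = [(0, 1), (1, 2)]
kept = drop_edge(edges, p=0.5, rng=random.Random(0))      # one of the two edges survives
a_hat = normalize(build_adjacency(3, edges))
x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot features, d = 3
h1 = gcn_layer(a_hat, x, [[1.0], [1.0], [1.0]])           # W_0 of shape d x 1
```

With this convention each row of $\hat{A} H$ mixes a post with the post it responds to, so the root row keeps only its own feature while node 1 combines its feature with the root's. In training, DropEdge would be re-sampled at every epoch, as described in the text.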
Figure 2: Our Bi-GCN rumor detection model. $X$ denotes the original feature matrix input to the Bi-GCN model, and $H_k$ is the hidden feature matrix generated from the $k$-th GCL. $X^{root}$ and $H_1^{root}$ represent the matrices extended by the features of the source post.

1 Construct Propagation and Dispersion Graphs
Based on the retweet and response relationships, we construct the propagation structure $\langle V, E \rangle$ for a rumor event $c_i$. Then, let $A \in \mathbb{R}^{n_i \times n_i}$ and $X$ be the corresponding adjacency matrix and feature matrix of $c_i$ based on the spreading tree of rumors, respectively. $A$ only contains the edges from the upper nodes to the lower nodes, as illustrated in Figure 1(b). At each training epoch, a fraction $p$ of the edges is dropped via Eq. (3) to form $A'$, which avoids potential over-fitting issues (Rong et al. 2019). Based on $A'$ and $X$, we can build our Bi-GCN model. Our Bi-GCN consists of two components: a Top-Down Graph Convolutional Network (TD-GCN) and a Bottom-Up Graph Convolutional Network (BU-GCN). The adjacency matrices of the two components are different. For TD-GCN, the adjacency matrix is $A^{TD} = A'$. Meanwhile, for BU-GCN, the adjacency matrix is $A^{BU} = A'^{\top}$. TD-GCN and BU-GCN adopt the same feature matrix $X$.

2 Calculate the High-level Node Representations
After the DropEdge operation, the top-down propagation features and the bottom-up propagation features are obtained by TD-GCN and BU-GCN, respectively.

By substituting $A^{TD}$ and $X$ into Eq. (2) over two layers, we write the equations for TD-GCN as below:

$$H_1^{TD} = \sigma\left(\hat{A}^{TD} X W_0^{TD}\right), \tag{4}$$

$$H_2^{TD} = \sigma\left(\hat{A}^{TD} H_1^{TD} W_1^{TD}\right), \tag{5}$$

where $H_1^{TD} \in \mathbb{R}^{n \times v_1}$ and $H_2^{TD} \in \mathbb{R}^{n \times v_2}$ represent the hidden features of the two-layer TD-GCN. $W_0^{TD} \in \mathbb{R}^{d \times v_1}$ and $W_1^{TD} \in \mathbb{R}^{v_1 \times v_2}$ are the filter parameter matrices of TD-GCN. Here we adopt the ReLU function as the activation function $\sigma(\cdot)$. Dropout (Srivastava et al. 2014) is applied on the GCN Layers (GCLs) to avoid over-fitting. In the same manner as Eqs. (4) and (5), using $\hat{A}^{BU}$ in place of $\hat{A}^{TD}$, we calculate the bottom-up hidden features $H_1^{BU}$ and $H_2^{BU}$ for BU-GCN.

3 Root Feature Enhancement
As we know, the source post of a rumor event always has abundant information to make a wide impact. It is necessary to make better use of the information from the source post, and to learn more accurate node representations from the relationship between the nodes and the source post.

Consequently, besides the hidden features from TD-GCN and BU-GCN, we propose an operation of root feature enhancement to improve the performance of rumor detection, as shown in Figure 2. Specifically, for TD-GCN at the $k$-th GCL, we concatenate the hidden feature vector of every node with the hidden feature vector of the root node from the $(k-1)$-th GCL to construct a new feature matrix as

$$\tilde{H}_k^{TD} = \text{concat}\left(H_k^{TD}, (H_{k-1}^{TD})^{root}\right) \tag{6}$$

with $H_0^{TD} = X$. Therefore, we express TD-GCN with the root feature enhancement by replacing $H_1^{TD}$ in Eq. (5) with $\tilde{H}_1^{TD} = \text{concat}(H_1^{TD}, X^{root})$, and then get $\tilde{H}_2^{TD}$ as follows:

$$H_2^{TD} = \sigma\left(\hat{A}^{TD} \tilde{H}_1^{TD} W_1^{TD}\right), \tag{7}$$

$$\tilde{H}_2^{TD} = \text{concat}\left(H_2^{TD}, (H_1^{TD})^{root}\right). \tag{8}$$
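Steps 1 to 3 for a single event can be sketched as follows: the top-down branch uses $A'$, the bottom-up branch uses its transpose, and each branch applies Eqs. (4), (6), (7) and (8). This is a simplified, inference-only sketch under stated assumptions: plain-Python dense matrices, hypothetical names (`branch`, `root_enhance`, `bi_gcn_node_features`), the root stored as row 0 (following the Notation, where $x_0$ is the source post), DropEdge already applied, and dropout omitted. Note that after root feature enhancement the second-layer weight needs $v_1 + d$ rows, because $\tilde{H}_1$ has $v_1 + d$ columns.

```python
import math


def normalize(a):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2} with row-sum degrees, as in Eq. (2)."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[d[i] * a_tilde[i][j] * d[j] for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def transpose(a):
    return [list(col) for col in zip(*a)]


def root_enhance(h, h_prev, root=0):
    """Eq. (6): append the root node's row of H_{k-1} to every row of H_k."""
    return [row + h_prev[root] for row in h]


def branch(a, x, w0, w1):
    """One two-layer GCN branch with root feature enhancement (no dropout)."""
    a_hat = normalize(a)
    h1 = relu(matmul(matmul(a_hat, x), w0))         # Eq. (4), n x v1
    h1_tilde = root_enhance(h1, x)                  # concat(H_1, X^root), n x (v1 + d)
    h2 = relu(matmul(matmul(a_hat, h1_tilde), w1))  # Eq. (7), w1 is (v1 + d) x v2
    return root_enhance(h2, h1)                     # Eq. (8), n x (v2 + v1)


def bi_gcn_node_features(a_dropped, x, td_weights, bu_weights):
    """Step 1: A_TD = A', A_BU = A'^T. Steps 2-3: run both branches on the same X."""
    h_td = branch(a_dropped, x, *td_weights)
    h_bu = branch(transpose(a_dropped), x, *bu_weights)
    return h_td, h_bu


# Example: root 0 with two direct replies 1 and 2 (a[t][s] = 1 for edge s -> t), d = 2.
a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w0 = [[1.0, 0.0], [0.0, 1.0]]        # d x v1 with v1 = 2
w1 = [[1.0], [1.0], [1.0], [1.0]]    # (v1 + d) x v2 with v2 = 1
h_td, h_bu = bi_gcn_node_features(a, x, (w0, w1), (w0, w1))
```

Because the top-down branch passes messages from parent to child, the root row of `h_td` only aggregates itself (its first entry is 2.0 here), whereas in `h_bu` the two leaves only aggregate themselves. Every output row ends with the root's first-layer features, which is the effect of Eq. (8).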
Similarly, the hidden feature matrices of BU-GCN with root feature enhancement, $\tilde{H}_1^{BU}$ and $\tilde{H}_2^{BU}$, are obtained in the same manner as Eq. (7) and Eq. (8).

4 Representations of Propagation and Dispersion for Rumor Classification
The representations of propagation and dispersion are the aggregations of the node representations of TD-GCN and BU-GCN, respectively. Here we employ mean-pooling operators to aggregate information from these two sets of node representations. It is formulated as

$$S^{TD} = \text{MEAN}(\tilde{H}_2^{TD}), \tag{9}$$

$$S^{BU} = \text{MEAN}(\tilde{H}_2^{BU}). \tag{10}$$

Then, we concatenate the representation of propagation and the representation of dispersion to merge the information as

$$S = \text{concat}(S^{TD}, S^{BU}). \tag{11}$$

Finally, the label of the event $\hat{y}$ is calculated via several fully connected layers and a softmax layer:

$$\hat{y} = \text{Softmax}(\text{FC}(S)), \tag{12}$$

where $\hat{y} \in \mathbb{R}^{1 \times C}$ is a vector of probabilities for all the classes used to predict the label of the event.

We train all the parameters in the Bi-GCN model by minimizing the cross-entropy between the predictions and the ground-truth distributions, $Y$, over all events, $C$. An L2 regularizer is applied in the loss function over all the model parameters.

Experiments
In this section, we first evaluate the empirical performance of our proposed Bi-GCN method in comparison with several baseline models. Then, we investigate the effect of each variant of the proposed method. Finally, we examine the capability of early rumor detection for both the proposed method and the compared methods.

Settings and Datasets
Datasets We evaluate our proposed method on three real-world datasets: Weibo (Ma et al. 2016), Twitter15 (Ma, Gao, and Wong 2017), and Twitter16 (Ma, Gao, and Wong 2017). Weibo and Twitter are the most popular social media sites in China and the U.S., respectively. In all three datasets, nodes refer to users, edges represent retweet or response relationships, and features are the top-5000 words in terms of TF-IDF values, as mentioned in the Bi-GCN Rumor Detection Model section. The Weibo dataset contains two binary labels: False Rumor (F) and True Rumor (T), while the Twitter15 and Twitter16 datasets contain four labels: Non-rumor (N), False Rumor (F), True Rumor (T), and Unverified Rumor (U). The label of each event in Weibo is annotated according to the Sina community management center, which reports various misinformation (Ma et al. 2016). The label of each event in Twitter15 and Twitter16 is annotated according to the veracity tag of the article on rumor-debunking websites (e.g., snopes.com, Emergent.info, etc.) (Ma, Gao, and Wong 2017). The statistics of the three datasets are shown in Table 1.

Table 1: Statistics of the datasets

Statistic                   Weibo           Twitter15     Twitter16
# of posts                  3,805,656       331,612       204,820
# of users                  2,746,818       276,663       173,487
# of events                 4,664           1,490         818
# of True rumors            2,351           374           205
# of False rumors           2,313           370           205
# of Unverified rumors      0               374           203
# of Non-rumors             0               372           205
Avg. time length / event    2,460.7 hours   1,337 hours   848 hours
Avg. # of posts / event     816             223           251
Max # of posts / event      59,318          1,768         2,765
Min # of posts / event      10              55            81

Experimental Setup We compare the proposed method with several state-of-the-art baselines, including:
• DTC (Castillo, Mendoza, and Poblete 2011): A rumor detection method using a Decision Tree classifier based on various handcrafted features to obtain information credibility.
• SVM-RBF (Yang et al. 2012): An SVM-based model with an RBF kernel, using handcrafted features based on the overall statistics of the posts.
• SVM-TS (Ma et al. 2015): A linear SVM classifier that leverages handcrafted features to construct a time-series model.
• SVM-TK (Ma, Gao, and Wong 2017): An SVM classifier with a propagation Tree Kernel on the basis of the propagation structures of rumors.
• RvNN (Ma, Gao, and Wong 2018): A rumor detection approach based on tree-structured recursive neural networks with GRU units that learn rumor representations via the propagation structure.
• PPC_RNN+CNN (Liu and Wu 2018): A rumor detection model combining RNN and CNN, which learns the rumor representations through the characteristics of users in the rumor propagation path.
• Bi-GCN: Our GCN-based rumor detection model utilizing the bi-directional propagation structure.

We implement DTC and the SVM-based models with scikit-learn (https://scikit-learn.org), PPC_RNN+CNN with Keras (https://keras.io/), and RvNN and our method with PyTorch (https://pytorch.org/). To make a fair comparison, we randomly split the datasets into five parts and conduct 5-fold cross-validation to obtain robust results. For the Weibo dataset, we evaluate the Accuracy (Acc.) over the two categories and the Precision (Prec.), Recall (Rec.), and F1 measure (F1) on each class. For the two Twitter datasets, we evaluate Acc. over the four categories and F1 on each class.
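The evaluation protocol described above (a random split into five parts for 5-fold cross-validation, accuracy over all classes, and per-class precision, recall and F1) can be sketched in plain Python. The helper names `five_fold_splits` and `evaluate` are hypothetical, and the seeded shuffle and round-robin fold assignment are our assumptions; the text only states that the datasets are split randomly into five parts.

```python
import random


def five_fold_splits(n_events, seed=0):
    """Randomly split event indices into five parts; yield (train, test) per fold."""
    idx = list(range(n_events))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::5] for k in range(5)]
    for k in range(5):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def evaluate(y_true, y_pred, classes):
    """Return accuracy and a {class: (precision, recall, F1)} dictionary."""
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return acc, scores


# Toy binary example in the Weibo label space {F, T}.
acc, scores = evaluate(["F", "F", "T", "T"], ["F", "T", "T", "T"], ["F", "T"])
# acc = 0.75; F: P = 1.0, R = 0.5, F1 ≈ 0.667; T: P ≈ 0.667, R = 1.0, F1 = 0.8
```

For the four-class Twitter datasets the same `evaluate` call would use `["N", "F", "T", "U"]` as `classes`, and only the F1 entry of each tuple is reported in the tables.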
The parameters of Bi-GCN are updated using stochastic gradient descent, and we optimize the model with the Adam algorithm (Kingma and Ba 2014). The dimension of each node's hidden feature vector is 64. The dropping rate in DropEdge is 0.2 and the dropout rate is 0.5. The training process is iterated over 200 epochs, and early stopping (Yao, Rosasco, and Caponnetto 2007) is applied when the validation loss stops decreasing for 10 epochs. Note that we do not employ SVM-TK on the Weibo dataset due to its exponential complexity on large datasets.

Table 2: Rumor detection results on the Weibo dataset (F: False Rumor; T: True Rumor)

Method         Class   Acc.    Prec.   Rec.    F1
DTC            F       0.831   0.847   0.815   0.831
               T               0.815   0.824   0.819
SVM-RBF        F       0.879   0.777   0.656   0.708
               T               0.579   0.708   0.615
SVM-TS         F       0.885   0.950   0.932   0.938
               T               0.124   0.047   0.059
RvNN           F       0.908   0.912   0.897   0.905
               T               0.904   0.918   0.911
PPC_RNN+CNN    F       0.916   0.884   0.957   0.919
               T               0.955   0.876   0.913
Bi-GCN         F       0.961   0.961   0.964   0.961
               T               0.962   0.962   0.960

Table 3: Rumor detection results on the Twitter15 and Twitter16 datasets (N: Non-Rumor; F: False Rumor; T: True Rumor; U: Unverified Rumor)

Twitter15
Method         Acc.    N (F1)  F (F1)  T (F1)  U (F1)
DTC            0.454   0.415   0.355   0.733   0.317
SVM-RBF        0.318   0.225   0.082   0.455   0.218
SVM-TS         0.544   0.796   0.472   0.404   0.483
SVM-TK         0.750   0.804   0.698   0.765   0.733
RvNN           0.723   0.682   0.758   0.821   0.654
PPC_RNN+CNN    0.477   0.359   0.507   0.300   0.640
Bi-GCN         0.886   0.891   0.860   0.930   0.864

Twitter16
Method         Acc.    N (F1)  F (F1)  T (F1)  U (F1)
DTC            0.473   0.254   0.080   0.190   0.482
SVM-RBF        0.553   0.670   0.085   0.117   0.361
SVM-TS         0.574   0.755   0.420   0.571   0.526
SVM-TK         0.732   0.740   0.709   0.836   0.686
RvNN           0.737   0.662   0.743   0.835   0.708
PPC_RNN+CNN    0.564   0.591   0.543   0.394   0.674
Bi-GCN         0.880   0.847   0.869   0.937   0.865
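The stopping rule in the training setup above (at most 200 epochs, stopping once the validation loss has not decreased for 10 consecutive epochs) reads as a standard patience-based early stopping scheme. A minimal sketch under that reading follows; `train_with_early_stopping` and the `run_epoch` callback (assumed to train for one epoch and return the validation loss) are hypothetical names, and restoring the best checkpoint is left out.

```python
import math


def train_with_early_stopping(run_epoch, max_epochs=200, patience=10):
    """Call run_epoch(epoch) -> validation loss until patience runs out.

    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss, best_epoch, since_best = math.inf, -1, 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            # New best validation loss: remember it and reset the patience counter.
            best_loss, best_epoch, since_best = val_loss, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:
                return best_epoch, best_loss, epoch + 1
    return best_epoch, best_loss, max_epochs


# Toy loss curve: improves for five epochs, then plateaus.
losses = [5.0, 4.0, 3.0, 2.0, 1.0] + [1.5] * 195
print(train_with_early_stopping(lambda e: losses[e]))  # (4, 1.0, 15)
```

In a Bi-GCN training loop, `run_epoch` would also re-sample the DropEdge graph at each epoch, as described in Step 1.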

Overall Performance
Table 2 and Table 3 show the performance of the proposed method and all the compared methods on the Weibo and Twitter datasets, respectively.

First, among the baseline algorithms, we observe that the deep learning methods perform significantly better than those using handcrafted features. This is not surprising, since the deep learning methods are able to learn high-level representations of rumors to capture valid features. This demonstrates the importance and necessity of studying deep learning for rumor detection.

Second, the proposed method outperforms the PPC_RNN+CNN method in terms of all the performance measures, which indicates the effectiveness of incorporating the dispersion structure for rumor detection. Since RNN and CNN cannot process data with a graph structure, PPC_RNN+CNN ignores important structural features of rumor dispersion. This prevents it from obtaining efficient high-level representations of rumors, resulting in worse performance on rumor detection.

Finally, Bi-GCN is significantly superior to the RvNN method. RvNN only uses the hidden feature vectors of all the leaf nodes, so it is heavily impacted by the information of the latest posts. However, the latest posts always lack information such as comments and just follow the former posts. Unlike RvNN, the root feature enhancement allows the proposed method to pay more attention to the information of the source posts, which helps improve our models much more.

Ablation Study
To analyze the effect of each variant of Bi-GCN, we compare the proposed method with TD-GCN, BU-GCN, UD-GCN and their variants without the root feature enhancement. The empirical results are summarized in Figure 3. UD-GCN, TD-GCN, and BU-GCN represent our GCN-based rumor detection models that utilize the UnDirected, Top-Down and Bottom-Up structures, respectively. Meanwhile, "root" refers to a GCN-based model that concatenates root features in the networks, while "no root" represents a GCN-based model without concatenated root features. Some conclusions are drawn from Figure 3. First, Bi-GCN, TD-GCN, BU-GCN, and UD-GCN each outperform their respective variants without the root feature enhancement. This indicates that the source posts play an important role in rumor detection. Second, TD-GCN and BU-GCN cannot always achieve better results than UD-GCN, but Bi-GCN is always superior to UD-GCN, TD-GCN and BU-GCN. This implies the importance of simultaneously considering both the top-down representations from the ancestor nodes and the bottom-up representations from the children nodes. Finally, even the worst results in Figure 3 are better than those of the other baseline methods in Tables 2 and 3 by a large gap, which again
verifies the effectiveness of graph convolution for rumor detection.

Figure 3: The rumor detection performance of the GCN-based methods on three datasets. Panels: (a) Weibo dataset, (b) Twitter15 dataset, (c) Twitter16 dataset.

Early Rumor Detection
Early detection aims to detect rumors at an early stage of propagation, which is another important metric for evaluating the quality of a method. To construct an early detection task, we set up a series of detection deadlines and only use the posts released before each deadline to evaluate the accuracy of the proposed method and the baseline methods. Since it is difficult for the PPC_RNN+CNN method to process data of variable lengths, we cannot obtain accurate results of PPC_RNN+CNN at each deadline in this task, so it is not compared in this experiment.

Figure 4 shows the performance of our Bi-GCN method versus RvNN, SVM-TS, SVM-RBF and DTC at various deadlines for the Weibo and Twitter datasets. From the figure, it can be seen that the proposed Bi-GCN method reaches relatively high accuracy at a very early period after the initial broadcast of the source post. Besides, the performance of Bi-GCN is remarkably superior to the other models at each deadline, which demonstrates that structural features are not only beneficial to long-term rumor detection, but also helpful for the early detection of rumors.

Figure 4: Results of early rumor detection on three datasets. Panels: (a) Weibo dataset, (b) Twitter15 dataset, (c) Twitter16 dataset.

Conclusions
In this paper, we propose a GCN-based model for rumor detection on social media, called Bi-GCN. Its inherent GCN model gives the proposed method the ability to process graph/tree structures and to learn higher-level representations that are more conducive to rumor detection. In addition, we improve the effectiveness of the model by concatenating the features of the source post after each GCL of the GCN. Meanwhile, we construct several variants of Bi-GCN to model the propagation patterns, i.e., UD-GCN, TD-GCN and BU-GCN. The experimental results on three real-world datasets demonstrate that the GCN-based approaches outperform state-of-the-art baselines by very large margins in terms of both accuracy and efficiency. In particular, the Bi-GCN model achieves the best performance by considering both the causal features of rumor propagation along relationship chains, through the top-down propagation pattern, and the structural features of rumor dispersion within communities, through bottom-up gathering.

Acknowledgments
The authors would like to thank Tencent AI Lab and the Tencent Rhino-Bird Elite Training Program for their support. This work is supported by the National Natural Science Foundation of Guangdong Province (2018A030313422), the National Natural Science Foundation of China (Grant No. 61773229, No. 61972219) and the Overseas Cooperation Research Fund of
Graduate School at Shenzhen, Tsinghua University (Grant [Ma et al. 2016] Ma, J.; Gao, W.; Mitra, P.; Kwon, S.; Jansen, B. J.;
No. HW2018002). Wong, K.-F.; and Cha, M. 2016. Detecting rumors from microblogs
with recurrent neural networks. In Ijcai, 3818–3824.
References [Ma, Gao, and Wong 2017] Ma, J.; Gao, W.; and Wong, K.-F. 2017.
Detect rumors in microblog posts using propagation structure via
[Battaglia et al. 2016] Battaglia, P.; Pascanu, R.; Lai, M.; Rezende, kernel learning. In Proceedings of the 55th Annual Meeting of the
D. J.; et al. 2016. Interaction networks for learning about objects, Association for Computational Linguistics (Volume 1: Long Pa-
relations and physics. In Advances in neural information process- pers), 708–717.
ing systems, 4502–4510.
[Ma, Gao, and Wong 2018] Ma, J.; Gao, W.; and Wong, K.-F. 2018.
[Bruna et al. 2014] Bruna, J.; Zaremba, W.; Szlam, A.; and Lecun, Rumor detection on twitter with tree-structured recursive neural
Y. 2014. Spectral networks and locally connected networks on networks. In Proceedings of the 56th Annual Meeting of the As-
graphs. In International Conference on Learning Representations sociation for Computational Linguistics (Volume 1: Long Papers),
(ICLR2014), CBLS, April 2014, http–openreview. 1980–1989.
[Castillo, Mendoza, and Poblete 2011] Castillo, C.; Mendoza, M.; [Ma, Gao, and Wong 2019] Ma, J.; Gao, W.; and Wong, K.-F. 2019.
and Poblete, B. 2011. Information credibility on twitter. In Pro- Detect rumors on twitter by promoting information campaigns with
ceedings of the 20th international conference on World wide web, generative adversarial learning. In The World Wide Web Confer-
675–684. ACM. ence, 3049–3055. ACM.
[Chen et al. 2018] Chen, T.; Li, X.; Yin, H.; and Zhang, J. 2018. [Rong et al. 2019] Rong, Y.; Huang, W.; Xu, T.; and Junzhou, H.
Call attention to rumors: Deep attention based recurrent neural net- 2019. The truly deep graph convolutional networks for node clas-
works for early rumor detection. In Pacific-Asia Conference on sification. arXiv preprint arXiv:1907.10903.
Knowledge Discovery and Data Mining, 40–52. Springer.
[Scarselli et al. 2008] Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagen-
[Defferrard, Bresson, and Vandergheynst 2016] Defferrard, M.; buchner, M.; and Monfardini, G. 2008. The graph neural network
Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural model. IEEE Transactions on Neural Networks 20(1):61–80.
networks on graphs with fast localized spectral filtering. In
[Srivastava et al. 2014] Srivastava, N.; Hinton, G.; Krizhevsky, A.;
Advances in neural information processing systems, 3844–3852.
Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: a simple way
[Giudice 2010] Giudice, K. D. 2010. Crowdsourcing credibility: to prevent neural networks from overfitting. The journal of machine
The impact of audience feedback on web page credibility. In learning research 15(1):1929–1958.
Proceedings of the 73rd ASIS&T Annual Meeting on Navigating
[Thomas 2007] Thomas, S. A. 2007. Lies, damn lies, and rumors:
Streams in an Information Ecosystem-Volume 47, 59. American
an analysis of collective efficacy, rumors, and fear in the wake of
Society for Information Science.
katrina. Sociological Spectrum 27(6):679–703.
[Hamilton, Ying, and Leskovec 2017] Hamilton, W.; Ying, Z.; and [Wu, Yang, and Zhu 2015] Wu, K.; Yang, S.; and Zhu, K. Q. 2015.
Leskovec, J. 2017. Inductive representation learning on large False rumors detection on sina weibo by propagation structures.
graphs. In Advances in Neural Information Processing Systems, In 2015 IEEE 31st international conference on data engineering,
1024–1034. 651–662. IEEE.
[Han et al. 2014] Han, S.; Zhuang, F.; He, Q.; Shi, Z.; and Ao, X. [Yang et al. 2012] Yang, F.; Liu, Y.; Yu, X.; and Yang, M. 2012.
2014. Energy model for rumor propagation on social networks. Automatic detection of rumor on sina weibo. In Proceedings of the
Physica A: Statistical Mechanics and its Applications 394:99–109. ACM SIGKDD Workshop on Mining Data Semantics, 13. ACM.
[Kingma and Ba 2014] Kingma, D. P., and Ba, J. 2014. [Yao, Rosasco, and Caponnetto 2007] Yao, Y.; Rosasco, L.; and
Adam: A method for stochastic optimization. arXiv preprint Caponnetto, A. 2007. On early stopping in gradient descent learn-
arXiv:1412.6980. ing. Constructive Approximation 26(2):289–315.
[Kipf and Welling 2017] Kipf, N., T., and Welling, M. 2017. Semi- [Yu et al. 2017] Yu, F.; Liu, Q.; Wu, S.; Wang, L.; and Tan, T. 2017.
supervised classification with graph convolutional networks. In A convolutional approach for misinformation identification. In
Proceedings of the International Conference on Learning Repre- Proceedings of the 26th International Joint Conference on Artifi-
sentations. cial Intelligence, 3901–3907. AAAI Press.
[Kwon et al. 2013] Kwon, S.; Cha, M.; Jung, K.; Chen, W.; and [Yu et al. 2019] Yu, F.; Liu, Q.; Wu, S.; Wang, L.; and Tan, T. 2019.
Wang, Y. 2013. Prominent features of rumor propagation in online Attention-based convolutional approach for misinformation identi-
social media. In 2013 IEEE 13th International Conference on Data fication from massive and noisy microblog posts. Computers &
Mining, 1103–1108. IEEE. Security 83:106–121.
[Liu and Wu 2018] Liu, Y., and Wu, Y.-F. 2018. Early detection of [Zhao, Resnick, and Mei 2015] Zhao, Z.; Resnick, P.; and Mei, Q.
fake news on social media through propagation path classification 2015. Enquiring minds: Early detection of rumors in social media
with recurrent and convolutional networks. In 32nd AAAI Confer- from enquiry posts. In Proceedings of the 24th International Con-
ence on Artificial Intelligence, AAAI 2018, 354–361. AAAI press. ference on World Wide Web, 1395–1405. International World Wide
[Liu et al. 2015] Liu, X.; Nourbakhsh, A.; Li, Q.; Fang, R.; and Web Conferences Steering Committee.
Shah, S. 2015. Real-time rumor debunking on twitter. In Proceed- [Zubiaga et al. 2018] Zubiaga, A.; Aker, A.; Bontcheva, K.; Li-
ings of the 24th ACM International on Conference on Information akata, M.; and Procter, R. 2018. Detection and resolution of ru-
and Knowledge Management, 1867–1870. ACM. mours in social media: A survey. ACM Computing Surveys (CSUR)
[Ma et al. 2015] Ma, J.; Gao, W.; Wei, Z.; Lu, Y.; and Wong, K.-F. 51(2):32.
2015. Detect rumors using time series of social context informa-
tion on microblogging websites. In Proceedings of the 24th ACM
International on Conference on Information and Knowledge Man-
agement, 1751–1754. ACM.