Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks
Tian Bian,1,2 Xi Xiao,1 Tingyang Xu,2 Peilin Zhao,2 Wenbing Huang,2 Yu Rong,2 Junzhou Huang2
1 Tsinghua University
2 Tencent AI Lab
[email protected], [email protected], [email protected], [email protected]
{tingyangxu, masonzhao, joehhuang}@tencent.com
1 Construct Propagation and Dispersion Graphs

Based on the retweet and response relationships, we construct the propagation structure $\langle \mathcal{V}, \mathcal{E} \rangle$ for a rumor event $c_i$. Then, let $\mathbf{A} \in \mathbb{R}^{n_i \times n_i}$ and $\mathbf{X}$ be the corresponding adjacency matrix and feature matrix of $c_i$ based on the spreading tree of the rumor, respectively. $\mathbf{A}$ only contains the edges from the upper nodes to the lower nodes, as illustrated in Figure 1(b). At each training epoch, $p$ percent of the edges are dropped via Eq. (3) to form $\mathbf{A}'$, which avoids potential over-fitting issues (Rong et al. 2019). Based on $\mathbf{A}'$ and $\mathbf{X}$, we build our Bi-GCN model. Bi-GCN consists of two components: a Top-Down Graph Convolutional Network (TD-GCN) and a Bottom-Up Graph Convolutional Network (BU-GCN). The two components differ only in their adjacency matrices: for TD-GCN, the adjacency matrix is $\mathbf{A}^{TD} = \mathbf{A}'$, while for BU-GCN it is $\mathbf{A}^{BU} = \mathbf{A}'^{\top}$. TD-GCN and BU-GCN adopt the same feature matrix $\mathbf{X}$.
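To make the construction concrete, here is a minimal NumPy sketch of building the top-down adjacency matrix and applying DropEdge; the edge-list format and helper names are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def build_top_down_adjacency(edges, n):
    """A for one event: a 1 marks a retweet/response edge from an upper node
    (parent) to a lower node (child) in the spreading tree."""
    A = np.zeros((n, n), dtype=np.float32)
    for parent, child in edges:  # edges given as (parent_index, child_index) pairs
        A[parent, child] = 1.0
    return A

def drop_edge(A, p=0.2, rng=None):
    """DropEdge (Rong et al. 2019): at each epoch, randomly drop a fraction p
    of the existing edges to form A' and reduce over-fitting."""
    rng = rng or np.random.default_rng()
    keep = rng.random(A.shape) >= p
    return A * keep

# TD-GCN consumes A' directly; BU-GCN consumes its transpose:
# A_td = drop_edge(build_top_down_adjacency(edges, n)); A_bu = A_td.T
```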
2 Calculate the High-level Node Representations

After the DropEdge operation, the top-down propagation features and the bottom-up propagation features are obtained by TD-GCN and BU-GCN, respectively. By substituting $\mathbf{A}^{TD}$ and $\mathbf{X}$ into Eq. (2) over two layers, we write the equations for TD-GCN as follows:

$$\mathbf{H}_1^{TD} = \sigma\left(\hat{\mathbf{A}}^{TD} \mathbf{X} \mathbf{W}_0^{TD}\right), \tag{4}$$

$$\mathbf{H}_2^{TD} = \sigma\left(\hat{\mathbf{A}}^{TD} \mathbf{H}_1^{TD} \mathbf{W}_1^{TD}\right), \tag{5}$$

where $\mathbf{H}_1^{TD} \in \mathbb{R}^{n \times v_1}$ and $\mathbf{H}_2^{TD} \in \mathbb{R}^{n \times v_2}$ represent the hidden features of the two-layer TD-GCN, and $\mathbf{W}_0^{TD} \in \mathbb{R}^{d \times v_1}$ and $\mathbf{W}_1^{TD} \in \mathbb{R}^{v_1 \times v_2}$ are the filter parameter matrices of TD-GCN. We adopt the ReLU function as the activation function $\sigma(\cdot)$, and Dropout (Srivastava et al. 2014) is applied on the GCN layers (GCLs) to avoid over-fitting. The bottom-up hidden features $\mathbf{H}_1^{BU}$ and $\mathbf{H}_2^{BU}$ of BU-GCN are calculated in the same manner as Eq. (4) and Eq. (5).
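As a concrete reference for Eqs. (4)-(5), the following is a minimal PyTorch sketch of one directed GCN component; it assumes $\hat{\mathbf{A}}$ is passed in as a dense normalized adjacency with DropEdge already applied, and the class and layer names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectedGCN(nn.Module):
    """Two graph-convolutional layers, H = sigma(A_hat H W), as in Eqs. (4)-(5).
    Instantiated once with A' for TD-GCN and once with A'^T for BU-GCN."""
    def __init__(self, d, v1, v2, p_dropout=0.5):
        super().__init__()
        self.W0 = nn.Linear(d, v1, bias=False)   # W_0: d -> v1, Eq. (4)
        self.W1 = nn.Linear(v1, v2, bias=False)  # W_1: v1 -> v2, Eq. (5)
        self.p_dropout = p_dropout

    def forward(self, A_hat, X):
        H1 = F.relu(A_hat @ self.W0(X))                            # Eq. (4)
        H1 = F.dropout(H1, self.p_dropout, training=self.training)
        H2 = F.relu(A_hat @ self.W1(H1))                           # Eq. (5)
        return H1, H2
```

Note that once the root feature enhancement of Eq. (7) below is applied, the second layer's input widens from $v_1$ to $v_1 + d$ features.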
3 Root Feature Enhancement

The source post of a rumor event usually carries abundant information and makes a wide impact. It is therefore necessary to make better use of the information of the source post and to learn more accurate node representations from the relationship between each node and the source post.

Consequently, besides the hidden features from TD-GCN and BU-GCN, we propose a root feature enhancement operation to improve the performance of rumor detection, as shown in Figure 2. Specifically, for TD-GCN at the $k$-th GCL, we concatenate the hidden feature vector of every node with the hidden feature vector of the root node from the $(k-1)$-th GCL to construct a new feature matrix:

$$\tilde{\mathbf{H}}_k^{TD} = \operatorname{concat}\left(\mathbf{H}_k^{TD}, (\mathbf{H}_{k-1}^{TD})^{root}\right), \tag{6}$$

with $\mathbf{H}_0^{TD} = \mathbf{X}$. Therefore, we express TD-GCN with the root feature enhancement by replacing $\mathbf{H}_1^{TD}$ in Eq. (5) with $\tilde{\mathbf{H}}_1^{TD} = \operatorname{concat}(\mathbf{H}_1^{TD}, \mathbf{X}^{root})$, and then obtain $\tilde{\mathbf{H}}_2^{TD}$ as follows:

$$\mathbf{H}_2^{TD} = \sigma\left(\hat{\mathbf{A}}^{TD} \tilde{\mathbf{H}}_1^{TD} \mathbf{W}_1^{TD}\right), \tag{7}$$

$$\tilde{\mathbf{H}}_2^{TD} = \operatorname{concat}\left(\mathbf{H}_2^{TD}, (\mathbf{H}_1^{TD})^{root}\right). \tag{8}$$

Similarly, the hidden feature matrices of BU-GCN with the root feature enhancement, $\tilde{\mathbf{H}}_1^{BU}$ and $\tilde{\mathbf{H}}_2^{BU}$, are obtained in the same manner as Eq. (7) and Eq. (8).
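The enhancement in Eqs. (6)-(8) reduces to one broadcast-and-concatenate step per layer. A minimal sketch, assuming the source post is stored at row 0 of each feature matrix (an illustrative convention):

```python
import torch

def root_enhance(H_k, H_prev, root_index=0):
    """Eq. (6): append the root node's previous-layer hidden vector to every
    node's current hidden vector.

    H_k:    (n, v_k)   hidden features at GCL k
    H_prev: (n, v_km1) hidden features at GCL k-1 (with H_0 = X)
    """
    root = H_prev[root_index].expand(H_k.size(0), -1)  # broadcast root row to all n nodes
    return torch.cat([H_k, root], dim=1)               # shape (n, v_k + v_km1)
```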
4 Representations of Propagation and Dispersion for Rumor Classification

The representations of propagation and dispersion are aggregated from the node representations of TD-GCN and BU-GCN, respectively. Here we employ mean-pooling operators to aggregate the information of these two sets of node representations:

$$\mathbf{S}^{TD} = \operatorname{MEAN}\left(\tilde{\mathbf{H}}_2^{TD}\right), \tag{9}$$

$$\mathbf{S}^{BU} = \operatorname{MEAN}\left(\tilde{\mathbf{H}}_2^{BU}\right). \tag{10}$$

Then, we concatenate the representation of propagation and the representation of dispersion to merge the information:

$$\mathbf{S} = \operatorname{concat}\left(\mathbf{S}^{TD}, \mathbf{S}^{BU}\right). \tag{11}$$

Finally, the label $\hat{y}$ of the event is calculated via several fully connected layers and a softmax layer:

$$\hat{y} = \operatorname{Softmax}\left(FC(\mathbf{S})\right), \tag{12}$$

where $\hat{y} \in \mathbb{R}^{1 \times C}$ is a vector of probabilities over all the classes, used to predict the label of the event.

We train all the parameters of the Bi-GCN model by minimizing the cross-entropy between the predictions and the ground-truth distributions, $Y$, over all events, $C$. An $L_2$ regularizer is applied in the loss function over all the model parameters.
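A minimal sketch of the readout in Eqs. (9)-(12); the paper states "several fully connected layers", so the single linear layer here is a simplifying assumption:

```python
import torch
import torch.nn as nn

class RumorClassifier(nn.Module):
    """Mean-pool the TD and BU node representations (Eqs. 9-10), concatenate
    them (Eq. 11), and map the result to class probabilities (Eq. 12)."""
    def __init__(self, v_td, v_bu, num_classes):
        super().__init__()
        self.fc = nn.Linear(v_td + v_bu, num_classes)

    def forward(self, H_td, H_bu):
        s_td = H_td.mean(dim=0)                        # Eq. (9): S^TD
        s_bu = H_bu.mean(dim=0)                        # Eq. (10): S^BU
        s = torch.cat([s_td, s_bu], dim=-1)            # Eq. (11): S
        return torch.log_softmax(self.fc(s), dim=-1)   # Eq. (12), as log-probabilities
```

Training then minimizes the cross-entropy, e.g. `nn.NLLLoss` on these log-probabilities, with the $L_2$ regularizer supplied through the optimizer's `weight_decay`.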
Experiments

In this section, we first evaluate the empirical performance of our proposed Bi-GCN method in comparison with several baseline models. Then, we investigate the effect of each variant of the proposed method. Finally, we examine the capability of early rumor detection for both the proposed method and the compared methods.

Settings and Datasets

Datasets We evaluate our proposed method on three real-world datasets: Weibo (Ma et al. 2016), Twitter15 (Ma, Gao, and Wong 2017), and Twitter16 (Ma, Gao, and Wong 2017). Weibo and Twitter are the most popular social media sites in China and the U.S., respectively. In all three datasets, nodes refer to users, edges represent retweet or response relationships, and features are the top-5000 words extracted in terms of their TF-IDF values, as mentioned in the Bi-GCN Rumor Detection Model section. The Weibo dataset has two binary labels: False Rumor (F) and True Rumor (T), while the Twitter15 and Twitter16 datasets have four labels: Non-rumor (N), False Rumor (F), True Rumor (T), and Unverified Rumor (U). The label of each event in Weibo is annotated according to the Sina community management center, which reports various misinformation (Ma et al. 2016). The label of each event in Twitter15 and Twitter16 is annotated according to the veracity tag of the article on rumor-debunking websites (e.g., snopes.com, Emergent.info) (Ma, Gao, and Wong 2017). The statistics of the three datasets are shown in Table 1.

Table 1: Statistics of the datasets

Statistic                  Weibo          Twitter15    Twitter16
# of posts                 3,805,656      331,612      204,820
# of users                 2,746,818      276,663      173,487
# of events                4,664          1,490        818
# of True rumors           2,351          374          205
# of False rumors          2,313          370          205
# of Unverified rumors     0              374          203
# of Non-rumors            0              372          205
Avg. time length / event   2,460.7 hours  1,337 hours  848 hours
Avg. # of posts / event    816            223          251
Max # of posts / event     59,318         1,768        2,765
Min # of posts / event     10             55           81

Experimental Setup We compare the proposed method with several state-of-the-art baselines:

• DTC (Castillo, Mendoza, and Poblete 2011): A rumor detection method using a Decision Tree classifier based on various handcrafted features to obtain information credibility.
• SVM-RBF (Yang et al. 2012): An SVM-based model with an RBF kernel, using handcrafted features based on the overall statistics of the posts.
• SVM-TS (Ma et al. 2015): A linear SVM classifier that leverages handcrafted features to construct a time-series model.
• SVM-TK (Ma, Gao, and Wong 2017): An SVM classifier with a propagation Tree Kernel, built on the propagation structures of rumors.
• RvNN (Ma, Gao, and Wong 2018): A rumor detection approach based on tree-structured recursive neural networks with GRU units that learns rumor representations via the propagation structure.
• PPC_RNN+CNN (Liu and Wu 2018): A rumor detection model combining RNN and CNN, which learns rumor representations from the characteristics of the users along the rumor propagation path.
• Bi-GCN: Our GCN-based rumor detection model utilizing the bi-directional propagation structure.

We implement DTC and the SVM-based models with scikit-learn (https://scikit-learn.org), PPC_RNN+CNN with Keras (https://keras.io/), and RvNN and our method with PyTorch (https://pytorch.org/). To make a fair comparison, we randomly split the datasets into five parts and conduct 5-fold cross-validation to obtain robust results. For the Weibo dataset, we evaluate the Accuracy (Acc.) over the two categories, and the Precision (Prec.), Recall (Rec.), and F1 measure (F1) on each class. For the two Twitter datasets, we evaluate Acc. over the four categories and F1 on each class.
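As an illustration of this protocol, the per-fold metrics could be computed with standard scikit-learn calls; the split details below are an assumption, not the authors' exact script:

```python
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate_fold(y_true, y_pred):
    """Accuracy over all categories plus per-class Prec./Rec./F1, as reported."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=None)
    return acc, prec, rec, f1

# Random 5-fold split over events; the fixed seed is an illustrative choice.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
```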
Table 2: Rumor detection results on the Weibo dataset (F: False Rumor; T: True Rumor)

Method        Class  Acc.   Prec.  Rec.   F1
DTC           F      0.831  0.847  0.815  0.831
              T             0.815  0.824  0.819
SVM-RBF       F      0.879  0.777  0.656  0.708
              T             0.579  0.708  0.615
SVM-TS        F      0.885  0.950  0.932  0.938
              T             0.124  0.047  0.059
RvNN          F      0.908  0.912  0.897  0.905
              T             0.904  0.918  0.911
PPC_RNN+CNN   F      0.916  0.884  0.957  0.919
              T             0.955  0.876  0.913
Bi-GCN        F      0.961  0.961  0.964  0.961
              T             0.962  0.962  0.960

Table 3: Rumor detection results on the Twitter15 and Twitter16 datasets (N: Non-Rumor; F: False Rumor; T: True Rumor; U: Unverified Rumor)

Twitter15
Method        Acc.   N (F1)  F (F1)  T (F1)  U (F1)
DTC           0.454  0.415   0.355   0.733   0.317
SVM-RBF       0.318  0.225   0.082   0.455   0.218
SVM-TS        0.544  0.796   0.472   0.404   0.483
SVM-TK        0.750  0.804   0.698   0.765   0.733
RvNN          0.723  0.682   0.758   0.821   0.654
PPC_RNN+CNN   0.477  0.359   0.507   0.300   0.640
Bi-GCN        0.886  0.891   0.860   0.930   0.864

Twitter16
Method        Acc.   N (F1)  F (F1)  T (F1)  U (F1)
DTC           0.473  0.254   0.080   0.190   0.482
SVM-RBF       0.553  0.670   0.085   0.117   0.361
SVM-TS        0.574  0.755   0.420   0.571   0.526
SVM-TK        0.732  0.740   0.709   0.836   0.686
RvNN          0.737  0.662   0.743   0.835   0.708
PPC_RNN+CNN   0.564  0.591   0.543   0.394   0.674
Bi-GCN        0.880  0.847   0.869   0.937   0.865

The parameters of Bi-GCN are updated using stochastic gradient descent, and we optimize the model with the Adam algorithm (Kingma and Ba 2014). The dimension of each node's hidden feature vector is 64. The dropping rate in DropEdge is 0.2 and the dropout rate is 0.5. Training runs for 200 epochs, and early stopping (Yao, Rosasco, and Caponnetto 2007) is applied when the validation loss has not decreased for 10 epochs. Note that we do not apply SVM-TK to the Weibo dataset due to its exponential complexity on large datasets.
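A sketch of how these optimization settings could look in PyTorch; the learning rate and weight-decay values are assumptions (the text above fixes only Adam, L2 regularization, 200 epochs, and a patience of 10), and `BiGCN`, `train_one_epoch`, and `validate` are hypothetical helpers:

```python
import torch

model = BiGCN(in_dim=5000, hidden_dim=64, num_classes=4)  # hypothetical model class
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4,
                             weight_decay=1e-4)  # weight_decay realizes the L2 term

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    train_one_epoch(model, optimizer)  # hypothetical; DropEdge is resampled each epoch
    val_loss = validate(model)         # hypothetical validation pass
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:     # early stopping after 10 stagnant epochs
            break
```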
Overall Performance

Table 2 and Table 3 show the performance of the proposed method and all the compared methods on the Weibo and Twitter datasets, respectively.

First, among the baseline algorithms, we observe that the deep learning methods perform significantly better than those using hand-crafted features. This is not surprising, since deep learning methods are able to learn high-level representations of rumors that capture valid features. It demonstrates the importance and necessity of studying deep learning for rumor detection.

Second, the proposed method outperforms PPC_RNN+CNN in terms of all the performance measures, which indicates the effectiveness of incorporating the dispersion structure for rumor detection. Since RNNs and CNNs cannot process graph-structured data, PPC_RNN+CNN ignores important structural features of rumor dispersion. This prevents it from obtaining efficient high-level representations of rumors, resulting in worse rumor detection performance.

Finally, Bi-GCN is significantly superior to RvNN. RvNN only uses the hidden feature vectors of the leaf nodes, so it is heavily influenced by the information of the latest posts. However, the latest posts usually lack information such as comments and merely follow the earlier posts. Unlike RvNN, the root feature enhancement allows the proposed method to pay more attention to the information of the source posts, which helps improve our models much more.

Ablation Study

To analyze the effect of each variant of Bi-GCN, we compare the proposed method with TD-GCN, BU-GCN, and UD-GCN, as well as their variants without the root feature enhancement. The empirical results are summarized in Figure 3. UD-GCN, TD-GCN, and BU-GCN denote our GCN-based rumor detection models utilizing the UnDirected, Top-Down, and Bottom-Up structures, respectively. Meanwhile, "root" refers to a GCN-based model that concatenates root features in the network, while "no root" denotes the same model without concatenating root features. Several conclusions can be drawn from Figure 3. First, Bi-GCN, TD-GCN, BU-GCN, and UD-GCN each outperform their variants without the root feature enhancement. This indicates that the source posts play an important role in rumor detection. Second, TD-GCN and BU-GCN cannot always achieve better results than UD-GCN, but Bi-GCN is always superior to UD-GCN, TD-GCN, and BU-GCN. This implies the importance of simultaneously considering both the top-down representations from the ancestor nodes and the bottom-up representations from the children nodes. Finally, even the worst results in Figure 3 are better than those of the other baseline methods in Tables 2 and 3 by a large gap, which again demonstrates the effectiveness of the GCN-based models for rumor detection.
Figure 3: The rumor detection performance of the GCN-based methods on the three datasets: (a) Weibo, (b) Twitter15, (c) Twitter16.