2023 Rumor Detection Driven by Graph Attention Capsule Network On Dynamic Propagation Structures
https://doi.org/10.1007/s11227-022-04831-7
Abstract
Rumor detection aims to judge the authenticity of posts on social media (such as
Weibo and Twitter), which can effectively prevent the spread of rumors. Many recent
rumor detection methods based on graph neural networks are good at extracting the
global features of rumors. However, each node of the rumor propagation structure
learned by a graph neural network is represented by multiple individual scalar
features, which are insufficient for reflecting the deep-level properties of rumors.
To address this challenge, we propose a novel model named graph attention capsule
network on dynamic propagation structures (GACN) for rumor detection. Specifically,
GACN consists of two components: a graph attention network reinforced by a capsule
network, which encodes static graphs into substructure classification capsules to
mine the deep-level properties of rumors, and a dynamic network framework, which
divides the rumor structure into multiple static graphs in chronological order to
capture the dynamic interactive features in the evolving process of the rumor
propagation structure. Moreover, we use a capsule attention mechanism to combine the
capsules generated from each substructure so that the model focuses more on
informative substructures in rumor propagation. Extensive validation on two
real-world datasets demonstrates the superiority of GACN over baselines.
* Peng Yang
5202 P. Yang et al.
Fig. 1 An example of rumor event on Twitter with a source post and related comments
1 Introduction
The rapid development of social media has changed the way people communicate with
each other in daily life and has contributed to the proliferation of rumors. Rumors
spread quickly and widely, and their proliferation pollutes the social network
ecology and affects users’ access to high-quality information. Taking the COVID-19
epidemic in 2020 as an example, the rapid spread of false information on social
networks caused public panic. Therefore, correctly identifying rumors has become an
important research task for both academia and industry.
A rumor is defined as a story or statement in general circulation without
confirmation or certainty of fact [1]. Figure 1 shows an example of a rumor event on
Twitter with a source post and related comments. Early methods mainly use traditional
machine learning to detect rumors, such as Decision Tree [2], Random Forest [3], and
Support Vector Machine (SVM) [4]. These methods are trained on extracted features
that can effectively represent rumors, such as user features, text content, and
propagation patterns [2, 4–7]. Such methods rely heavily on feature engineering,
which is time-consuming and labor-intensive. Moreover, the handcrafted features are
highly subjective and cannot capture deep-level features. In recent years, to extract
higher-order features, deep learning techniques have been widely applied to rumor
detection, and researchers have proposed many rumor detection models based on CNNs
[8–11] and RNNs [12–14]. However, these models fail to take into account the
characteristics of the rumor propagation structure. Graph models such as GCN [15, 16],
GAT [17], and GraphSage [18, 19] have since emerged, attracting the attention of
numerous researchers. Bian et al. [20] proposed a bidirectional graph convolutional
network structure, in which the upward and downward propagation modes of social media
texts are combined to effectively capture the global features of the rumor structure.
Although graph neural networks have been widely employed in rumor detec-
tion [20–27], certain problems still need to be solved. When the rumor propaga-
tion structure is learned from the graph neural network to the graph embedding,
the learning representation of each text node is considered to comprise multiple
• We propose a new model driven by the graph attention capsule network, GACN
(Graph Attention Capsule Network on Dynamic Propagation Structures), to
effectively mine the deep-level properties of rumors. To the best of our
knowledge, this is the first application of the capsule network to rumor detection.
• To capture the dynamic interactive features in the evolving process of the rumor
propagation structure, we design a strategy that divides the dynamic
propagation structure of a rumor into multiple static graphs in chronological order.
• The GACN model is evaluated on two social network datasets, and the experimental
results demonstrate that the proposed method achieves higher rumor detection
performance than other advanced baselines.
Table 1 Comparison of previous works
Reference  Text content  User information  Propagation structure
[30]       ✓             ×                 ×
[2]        ✓             ✓                 ×
[12]       ✓             ×                 ×
[31]       ✓             ×                 ×
[4]        ×             ✓                 ×
[32]       ×             ✓                 ×
[33]       ✓             ×                 ✓
[34]       ✓             ×                 ✓
[20]       ✓             ×                 ✓
[35]       ✓             ×                 ✓
✓ and × mean that the method has or does not have this capability, respectively
2 Related work
The research on rumor detection is mainly divided into two aspects: (1) text con-
tent and user information, and (2) propagation structure. The comparison of pre-
vious works is summarized in Table 1.
verification type, is collected from the user’s profile. Yang et al. [4] extracted
user characteristics for classification, such as gender, geographic location, and
the number of followers. Castillo et al. [2] utilized user characteristics on Twitter
to detect fake news, which includes the number of followers, number of friends,
age of registration, etc. Liu et al. [32] combined RNN and CNN to capture
users’ characteristics based on time series to improve the performance of rumor
detection.
2.2 Propagation structure
3 Problem Definition
4 Method
In this section, we introduce the overall framework of GACN and its implementation
details. Figure 2 shows an overview of the framework, which is described in the
following sections, and Table 2 summarizes the important notations used in this
paper. GACN consists of two components: the graph attention capsule network, which
mines the deep-level properties of rumors, and the dynamic network framework, which
captures the dynamic interactive features in the evolution process of the rumor
propagation structure.
Fig. 2 Illustration of our GACN model. It consists of two components: the dynamic network framework
(left) and the graph attention capsule network (right)
We design the graph attention capsule network to obtain the deep-level properties
of each propagation structure inspired by capsule network [28]. The network con-
sists of three parts: global features of nodes, source post encoding, and substruc-
ture classification capsules. We first describe how to generate the global features
of nodes.
We construct a propagation graph $\langle V, E \rangle$ for each rumor $c$ from the
response relationships among comments and between comments and the source post.
$V^{(t)} = \{x_1, x_2, \ldots, x_{n^{(t)}}\}$ denotes the nodes in the $t$-th
propagation substructure $S^{(t)} = \langle V^{(t)}, E^{(t)} \rangle$, where $x_1$ is
the node of the source post, $x_i$ ($i > 1$) is the node of a comment, and $E^{(t)}$
is the set of edges that describe the response relationships between nodes. We encode
the textual content of each node using the TF-IDF model [36].
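A minimal sketch of how a propagation structure might be divided into chronologically ordered static graphs is given below. The splitting rule (cumulative snapshots, each holding an equal additional share of the comments) and the names `Post` and `split_into_snapshots` are assumptions made for illustration; the text only states that the structure is divided into multiple static graphs in chronological order.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Post:
    post_id: int
    parent_id: Optional[int]  # None marks the source post (hypothetical convention)
    timestamp: float          # time of posting, any monotonic unit


Snapshot = Tuple[Set[int], Set[Tuple[int, int]]]


def split_into_snapshots(posts: List[Post], num_snapshots: int) -> List[Snapshot]:
    """Return cumulative snapshots S(1), ..., S(T), each as (nodes, edges)."""
    source = next(p for p in posts if p.parent_id is None)
    comments = sorted((p for p in posts if p.parent_id is not None),
                      key=lambda p: p.timestamp)
    snapshots = []
    for t in range(1, num_snapshots + 1):
        # Assumed rule: snapshot t keeps the earliest t/T share of the comments.
        cutoff = (len(comments) * t) // num_snapshots
        visible = comments[:cutoff]
        nodes = {source.post_id} | {p.post_id for p in visible}
        # Keep only response edges whose parent is already visible.
        edges = {(p.parent_id, p.post_id) for p in visible if p.parent_id in nodes}
        snapshots.append((nodes, edges))
    return snapshots
```

Each snapshot can then be encoded independently as a static graph, and the last snapshot equals the full propagation structure.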
First, the affine transformation matrix $\mathbf{B}$ transforms the initial feature
$x_i$ into a hidden vector $h_i^{(0)}$, as shown in Eq. 2. The GAT, proposed by
Veličković et al. [17], is then applied to the hidden vectors to obtain the global
features of each node. When updating the feature vectors of graph nodes, the graph
attention network aggregates the neighbor information according to the attention
weights to obtain a representation that contains the neighbor information. The
detailed steps are presented as follows:
$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} \mathbf{W} h_j^{(l)}\Big) \quad (4)$$
The $\alpha_{ij}$ obtained by Eq. 3 represents the importance weight with which the
features of node $j$ are aggregated into node $i$, where $N_i$ is the set of
first-order neighbors of node $i$, both $\mathbf{W}$ and $a$ are trainable parameters,
$h_i^{(l)}$ is the representation of the $i$-th graph node after $l$ graph attention
layers have been applied sequentially, and LeakyReLU is the activation function. By
summing the vectors of the neighbors according to the weights $\alpha_{ij}$, we obtain
the vector $h_i^{(l+1)}$ of node $i$ at layer $l + 1$ of the graph, as shown in Eq. 4.
We concatenate the vectors $H_i$ obtained from the output of each graph attention
layer for node $i$ to obtain the global vector
$\mathbf{H} \in \mathbb{R}^{N \times p \times d_m}$ on the substructure $S^{(t)}$.
It is formulated as:
$$\mathbf{H} = \mathrm{concat}([H_1, H_2, \ldots, H_N]) \quad (6)$$
where $p$ is the number of graph network layers, $N$ denotes the number of graph
nodes, and $d_m$ is the dimension of the hidden vector produced by each graph
attention layer.
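The aggregation in Eq. 4 can be illustrated with a single-head graph attention layer in plain Python. This is a sketch under stated assumptions: the attention score follows the standard GAT form of Veličković et al. [17], $e_{ij} = \mathrm{LeakyReLU}(a^{T}[\mathbf{W}h_i \,\|\, \mathbf{W}h_j])$ normalized by a softmax over $N_i$; the nonlinearity $\sigma$ is taken to be tanh purely for illustration; and the function names are hypothetical. Every node is assumed to have at least one neighbor (for example, a self-loop).

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, h: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
              W: Matrix, a: Vector) -> Dict[int, Vector]:
    """One attention layer: h maps node id -> feature vector."""
    Wh = {i: matvec(W, v) for i, v in h.items()}
    out = {}
    for i, nbrs in neighbors.items():
        # Unnormalized scores e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        scores = [leaky_relu(sum(p * q for p, q in zip(a, Wh[i] + Wh[j])))
                  for j in nbrs]
        # Softmax over the neighborhood gives the weights alpha_ij.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Eq. 4: weighted sum of transformed neighbors, then the nonlinearity.
        agg = [sum(al * Wh[j][k] for al, j in zip(alphas, nbrs))
               for k in range(len(W))]
        out[i] = [math.tanh(x) for x in agg]
    return out
```

Stacking $p$ such layers and keeping every layer's output for node $i$ yields the per-node vectors that Eq. 6 concatenates.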
4.1.2 Source post encoding
The source post contains rich information about a rumor, which is beneficial to
strengthen the feature representation of graph nodes. In our proposed model, the
transformer encoder [37] is used to encode the source post.
First, we use GloVe [38] to generate a word vector for each word in the source post
$x_1$, giving $\{w_1^r, w_2^r, \ldots, w_{n_r}^r\}$, where $n_r$ refers to the number
of words. Second, the self-attention mechanism is used to measure the importance of
words. It is formulated as:
$$r = \mathrm{mean}(h_1^r, h_2^r, \ldots, h_{n_r}^r) \quad (8)$$
where $h^r$ is the output of the encoder module of the transformer. Last, the final
source post feature $r \in \mathbb{R}^{d_r}$ is obtained by averaging the hidden
vectors of all words, where $d_r$ is the dimension of the source post feature.
After the aggregation of each node through the graph attention network, the obtained
vector $\mathbf{H}$ captures the global features of the propagation structure.
Considering that the source post of the rumor can highlight the typical features of
the rumor, the model concatenates the source post feature with the global features of
each node to obtain the enhanced representation
$\mathbf{A} \in \mathbb{R}^{N \times p \times (d_m + d_r)}$ of each graph node. We use
the one-dimensional convolution $\mathrm{conv1d}(\cdot)$, concatenating the features
of different graph network layers at the same location, to obtain the primary
capsules $\mathbf{H}' \in \mathbb{R}^{N \times q \times d_c}$, where $q$ is the number
of primary capsules:
$$\mathbf{H}' = \mathrm{conv1d}(\mathbf{A}) \quad (10)$$
Substructure classification capsules are obtained by transforming primary capsules,
which indicate the type of rumors. Each substructure classification capsule is a vec-
tor, and the scalar values in the vector represent the deep-level properties of the
rumor.
Node normalization is performed to generate the attention value $\alpha_{primary}$,
which is applied to the primary capsules. To obtain the normalized primary capsules
$\mathbf{U} \in \mathbb{R}^{N \times q \times d_c}$, we employ two fully connected
layers. It is formulated as:
$$\alpha_{primary} = \mathrm{FC}_2(\mathrm{FC}_1(\mathbf{H}')) \quad (11)$$
$$\mathbf{U} = \alpha_{primary} * \mathbf{H}' \quad (12)$$
The dynamic routing mechanism of the capsule network [28] is employed to transform
primary capsules into substructure classification capsules. In the algorithm,
$b_{ij}$ is initialized and the coupling coefficient $m_i$ is obtained, which
represents the contribution of the normalized primary capsule $i$ to the substructure
classification capsules. It is formulated as:
$$b_{ij} = b_{ij} + u_{j|i} \cdot v_j \quad (13)$$
$$m_i = \mathrm{softmax}(b_i) \quad (14)$$
where $b_{ij}$ and $u_{j|i}$ denote the training parameters and prediction vectors,
respectively, of the normalized primary capsule $i$ to the substructure classification
capsule $j$. The prediction vector $u_{j|i}$ is obtained by applying the weight
$w_{ij}$ to the normalized primary capsules $\mathbf{U}$. The initialized coupling
coefficient $m_{ij}$ is used to obtain the intermediate value $s_j$ of the
substructure classification capsule $j$. The squashing activation function [28] is
then applied to $s_j$ to obtain the output $v_j$ of the substructure classification
capsule $j$. The weight $m_{ij}$ is iteratively updated through the prediction vector
$u_{j|i}$ and the capsule output $v_j$. It is formulated as:
$$u_{j|i} = w_{ij} \mathbf{U} \quad (15)$$
$$s_j = \sum_i m_{ij} \, u_{j|i} \quad (16)$$
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \frac{s_j}{\|s_j\|} \quad (17)$$
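The routing loop of Eqs. 13, 14, 16, and 17 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: it takes the prediction vectors $u_{j|i}$ as given, uses the squashing function in the form published by Sabour et al. [28], and assumes three routing iterations with logits initialized to zero; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Eq. 17: keep the direction of s, map its length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(xs: Vector) -> Vector:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is the prediction vector u_{j|i}; returns the outputs v_j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits b_ij
    v: List[Vector] = []
    for _ in range(iterations):
        m = [softmax(row) for row in b]  # Eq. 14: coupling coefficients
        v = []
        for j in range(n_out):
            # Eq. 16: weighted sum of the prediction vectors for capsule j.
            s_j = [sum(m[i][j] * u_hat[i][j][k] for i in range(n_in))
                   for k in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):
            for j in range(n_out):
                # Eq. 13: raise the logit when prediction and output agree.
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

The length of each returned vector is below 1, which is what allows it to be read as a class probability in the loss.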
To effectively capture the dynamic propagation characteristics of rumors, after the
capsule network obtains the classification capsule vectors
$G^{(i)} = [v_1, v_2, \ldots, v_k]$ of the substructure $S^{(i)}$, where $k$ is the
number of rumor classes, we use the self-attention mechanism to measure the
importance of the classification capsules of each substructure, as shown in Fig. 3.
The capsules belonging to the same category in all substructures are formed into a
matrix that serves as the common value of $Q$, $K$, and $V$, where $Q$, $K$, and $V$
are the query, key, and value vectors, respectively. The attentional substructure
classification capsules $h_{G^{(i)}}$ are then calculated according to Eq. 19, where
the scaling factor $\sqrt{d_K}$ is used to keep the gradient stable. The final
classification capsules $\mathbf{I} \in \mathbb{R}^{k \times f}$ are obtained by
averaging over all substructure vectors $h_{G^{(i)}}$, where $k$ is the number of
categories and $f$ is the dimension of the final classification capsules. It is
formulated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\Big(\frac{QK^{T}}{\sqrt{d_K}}\Big) V \quad (19)$$
$$\mathbf{I} = \frac{1}{T} \sum_{i=1}^{T} h_{G^{(i)}} \quad (20)$$
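For a single rumor category, Eqs. 19 and 20 can be sketched as follows. The sketch assumes that the $T$ capsules of that category (one per substructure) are stacked as the rows of a matrix `X` that is used directly as $Q$, $K$, and $V$ without learned projections, and that $d_K$ equals the capsule dimension; the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def self_attention(X: Matrix) -> Matrix:
    """Eq. 19 with Q = K = V = X (rows are per-substructure capsules)."""
    d_k = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in X]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, X)) for c in range(d_k)])
    return out


def final_capsule(X: Matrix) -> List[float]:
    """Eq. 20: average the attended capsules over the T substructures."""
    attended = self_attention(X)
    T = len(attended)
    return [sum(row[c] for row in attended) / T for c in range(len(X[0]))]
```

Applying `final_capsule` once per category gives the $k$ rows of $\mathbf{I}$.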
4.3 Prediction and classification
In this paper, the margin loss function [28] is used as the loss function of our
proposed model. It can be expressed as Eq. 21:
$$Loss_c = \sum_i \Big\{ \tilde{T}_i \max(0, m^{+} - \|\mathbf{I}_i\|)^2 + \lambda (1 - \tilde{T}_i) \max(0, \|\mathbf{I}_i\| - m^{-})^2 \Big\} \quad (21)$$
where $i$ is the type of rumor, $\|\mathbf{I}_i\|$ is the output probability of the
final classification capsule $i$, and $\tilde{T}_i$ is the indicator function of
classification (1 if class $i$ is present and 0 otherwise). $m^{+}$ is the upper
bound, and its term penalizes false negatives, i.e., predicting the absence of class
$i$ when it is actually present. $m^{-}$ is the lower bound, and its term penalizes
false positives, i.e., predicting the presence of class $i$ when it is actually
absent. $\lambda$ is the scaling factor that adjusts the weight of the two terms. The
overall algorithm description of our GACN is shown in Algorithm 1.
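A small worked sketch of the margin loss may help. It follows the form given by Sabour et al. [28], with `max` in both terms, and the default values $m^{+} = 0.9$, $m^{-} = 0.1$, and $\lambda = 0.5$ are the ones reported in [28]; they are assumptions here, not confirmed GACN hyperparameters. The function name is hypothetical.

```python
from typing import List


def margin_loss(lengths: List[float], target: int, m_plus: float = 0.9,
                m_minus: float = 0.1, lam: float = 0.5) -> float:
    """lengths[i] is the capsule length for class i; target is the true class."""
    loss = 0.0
    for i, length in enumerate(lengths):
        if i == target:
            # Present class: penalize a capsule shorter than m_plus.
            loss += max(0.0, m_plus - length) ** 2
        else:
            # Absent class: penalize a capsule longer than m_minus.
            loss += lam * max(0.0, length - m_minus) ** 2
    return loss
```

For example, with four classes and capsule lengths `[0.5, 0.5, 0.0, 0.0]` for a sample of class 0, the present-class term is $0.4^2 = 0.16$ and the one offending absent-class term is $0.5 \times 0.4^2 = 0.08$, so the loss is 0.24.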
5 Experiment
In this section, we first introduce the two mainstream datasets used in the
experiments, Twitter15 [39] and Twitter16 [33], and then describe the baseline
models that are compared with the proposed method. Next, the experimental
evaluation metrics are described. Finally, we describe the parameter settings of
our proposed model.
5.1 Dataset
We evaluate our proposed method on two publicly available datasets, Twitter15 [39]
and Twitter16 [33]. The statistics of the datasets are shown in Table 3.
Twitter15 and Twitter16 were created by Ma et al. [33, 39], who collected rumor
information at different times from Twitter, a well-known international social
networking platform. The two datasets contain 1490 and 818 rumors, respectively,
each labeled with one of four categories: true rumor (TR), false rumor (FR),
unverified rumor (UR), and nonrumor (NR). The division of the Twitter15 and
Twitter16 datasets follows existing research [34], and the experiment is carried
out using fivefold cross-validation.
5.2 Baselines
In this paper, certain state-of-the-art methods for rumor detection are selected as
the baseline models to compare with our proposed model. These models are pre-
sented as follows:
• DTC [2]: A rumor detection model based on global handcrafted features to build
a decision tree classifier to obtain information credibility.
• SVM-RBF [4]: A support vector machine classifier based on RBF kernel which
uses statistical features manually constructed from blog content.
• SVM-TS [33]: A linear support vector machine classifier based on temporal con-
text features.
• SVM-TK [39]: An SVM classifier with a propagation tree kernel on the basis of
the propagation structures of rumors.
• GRU-RNN [12]: An RNN-based model that captures contextual information
from continuous representations of relevant posts over time.
• RvNN [34]: A model based on tree-structured recursive neural networks that
separately models the propagation direction and the diffusion direction to learn
the feature vector representation of the propagation tree.
• GLAN [21]: A model that combines global and local features to build heteroge-
neous graphs and extract features.
• GCAN [23]: A GCN-based model that can describe the rumor propagation mode
and use the dual co-attention mechanism to capture the relationship among
source post, user characteristics and propagation path.
• Bi-GCN [20]: A graph convolution neural network model based on the propaga-
tion direction and diffusion direction of the propagation tree.
• P-BiGAT [22]: A bidirectional graph attention network built on the propagation
tree and the diffusion tree derived from tweet comment and reposting relationships.
5.3 Evaluation metrics
$$F_1 = \frac{2 \times TP}{2 \times TP + FP + FN} \quad (23)$$
where TP (true positive) denotes the number of cases in which the true category
is positive, and the predicted category is positive, FP (false positive) denotes the
number of cases in which the true category is negative and the predicted category is
positive, FN (false negative) denotes the number of cases in which the true category
is positive and the predicted category is negative, and TN (true negative) denotes the
number of cases in which the true category is negative and the predicted category is
negative.
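The counts defined above can be computed per class in a one-vs-rest fashion, as in the sketch below. The accuracy function uses the usual definition (correct predictions divided by all predictions), which is an assumption here; the labels in the usage note are made-up toy data, not results from the paper, and the function names are hypothetical.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str],
                     positive: str) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Eq. 23; defined as 0 when the class is never present or predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With true labels `["NR", "FR", "TR", "UR", "FR", "NR"]` and predictions `["NR", "FR", "FR", "UR", "FR", "TR"]`, the FR class has TP = 2, FP = 1, FN = 0, so its F1 is 4/5 = 0.8, and the overall accuracy is 4/6.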
5.4 Experimental settings
Our experiments are implemented with PyTorch. The optimizer is Adam [40] with a
learning rate of 0.001. Our model uses TF-IDF to initialize the text nodes, with a
dimension of 5000. We adopt GloVe with a dimension of 200 to encode the source
post. The model uses 3 graph attention layers, and the hidden vector dimension of
each layer is 64. The dropout rate is 0.5. The experiments use an early stopping
strategy, which terminates the training when the accuracy on the validation set no
longer improves within 10 iterations. A 5-fold cross-validation method is applied
to evaluate the experimental effects, and the results are averaged as the metrics
of the model. The hyperparameters involved in the experiment are listed in Table 4.
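The early stopping strategy with a patience of 10 can be sketched as a small helper. This is an illustration under assumptions: the monitored quantity is taken to be validation accuracy, training stops once it has failed to improve for `patience` consecutive evaluations, and the class name `EarlyStopping` is hypothetical rather than taken from the authors' code.

```python
from typing import Optional


class EarlyStopping:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best: Optional[float] = None
        self.bad_evals = 0

    def step(self, val_accuracy: float) -> bool:
        """Record one validation result; return True when training should stop."""
        if self.best is None or val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

In a training loop, `step` would be called once after each validation pass, and the loop would break as soon as it returns `True`.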
6 Experimental results
In this section, we validate our proposed model in three aspects: rumor detection
results, ablation study, and visualization.
Table 5 and Table 6 record the detection results of the state-of-the-art models on
Twitter15 and Twitter16. Among them, we rerun the open-source code provided by
Bian et al. [20] and Lu et al. [23] to obtain the experimental results of Bi-GCN and
GCAN.
From the results in Table 5 and Table 6, we can see that our proposed GACN model
outperforms the state-of-the-art baselines on both Twitter15 and Twitter16. Compared
with the best baseline model, GACN improves accuracy by 1.9% on Twitter15 and by
0.4% on Twitter16. The results indicate that our proposed model has better detection
performance.
First, as shown in Table 5 and Table 6, all models based on deep learning (GRU-RNN,
RvNN, GLAN, GCAN, Bi-GCN, P-BiGAT, and GACN) are superior to most of the traditional
models based on handcrafted features. This result confirms that deep learning
methods are more accurate than traditional methods in extracting the deeper features
of rumors.
Second, GCAN, Bi-GCN, P-BiGAT, and GACN are slightly better than RvNN in all metrics.
The main reason is that RvNN is a tree-based model that cannot capture long-distance
dependencies in sequences. The graph-based models can solve this problem by
capturing the global characteristics of rumors, which improves the detection of
rumors.
Last, among the graph-based models, GACN achieves the highest accuracy. One reason
is that the other graph-based models only handle a single static graph and cannot
capture the dynamic interaction characteristics of rumors in the propagation process.
Another is that these models can only extract the scalar features of graph nodes,
which is not enough to effectively mine the deep-level properties of rumors.
Table 7
                            Twitter15                       Twitter16
                            Acc   F1                        Acc   F1
Model                             NR    FR    TR    UR            NR    FR    TR    UR
w/o Dynamic                 88.5  82.9  87.9  95.0  88.7    88.2  77.5  91.4  96.3  87.5
w/o Root                    87.5  92.3  89.0  85.3  83.5    88.8  98.8  87.8  88.9  79.5
w/o Capsule                 87.2  91.6  87.7  84.7  84.7    87.6  91.4  93.5  87.5  78.6
w/o Capsule & w/o Dynamic   85.4  80.3  86.9  92.7  81.9    87.0  78.1  88.9  95.1  85.7
GACN                        88.9  91.0  92.4  87.5  84.5    90.0  96.4  88.9  90.0  84.6
The values in bold indicate the maximum value under different indicators
6.2 Ablation study
• w/o Dynamic: A static graph capsule network model without the dynamic network
framework.
• w/o Root: This model only considers the rumor propagation structure and does
not consider the source post.
• w/o Capsule: This model replaces the capsule network after the graph network
model with a fully connected layer and applies cross-entropy as the loss function.
• w/o Capsule & w/o Dynamic: This model uses neither the capsule network nor the
dynamic network framework.
The results in Table 7 show that our proposed model has the highest accuracy on
Twitter15 and Twitter16, while w/o Capsule & w/o Dynamic achieves the poorest
detection performance. First, the variants that retain the capsule network (GACN,
w/o Root, and w/o Dynamic) achieve higher accuracy than the variants without it
(w/o Capsule and w/o Capsule & w/o Dynamic), which confirms that the capsule network
can better mine the deep-level properties of rumors. Second, compared with w/o Root,
our proposed model achieves better results, indicating that the source post feature
plays an important role in improving the performance of rumor detection. Third, GACN
and w/o Capsule outperform their counterparts without the dynamic network framework
(w/o Dynamic and w/o Capsule & w/o Dynamic, respectively) in terms of accuracy,
which demonstrates the effectiveness of the dynamic propagation structure and
suggests the necessity of capturing the dynamic interactive features in the evolving
process of the rumor propagation structure. In summary, both the dynamic propagation
structure of rumors and the capsule network contribute to rumor detection.
6.3 Visualization
GACN mines the deep-level properties of the rumor by using the capsule network.
To clarify the practical effect of the properties in each capsule, we extract the
values of each channel in the final classification capsules, reduce their
dimensionality with t-SNE [41], and present the visualizations in Table 8. In
Table 8, the categories of rumors are labeled 0, 1, 2, and 3, representing nonrumor,
false rumor, true rumor, and unverified rumor, and the corresponding points in the
figures are colored red, green, yellow, and blue, respectively. The rows of the
table compare two, three, and four categories of rumors. Taking the first row, 0_1,
as an example, we compare nonrumors and false rumors. Given the space limitation of
the paper, we choose the channels that best highlight the properties of rumors for
analysis.
Table 8 shows that when two categories are compared, the vectors of each channel
can distinguish specific categories after dimensionality reduction. For example,
Channel3 can distinguish 1 from 2 well, but has difficulty distinguishing 1 from 3.
In contrast, Channel1 can easily distinguish 1 from 3, but not 1 from 2. This result
indicates that the properties captured by Channel3 may be the factor determining
the difference between 1 and 2, whereas those captured by Channel1 may determine the
difference between 1 and 3. Channel2 can almost completely distinguish 0 from 2 and
2 from 3, which also explains why the types of rumors represented by 0, 2, and 3 can
be completely separated when comparing the three categories 0_2_3. In the
four-category rows, no individual channel can effectively separate every category,
so it is necessary to integrate all channels and combine their respective properties
for classification. Compared with scalar-based deep learning models, the capsule
network can better integrate the properties captured by each channel to improve the
accuracy of rumor detection.
Next, we use t-SNE to reduce the dimensionality of each classification capsule,
and the obtained results are shown in Table 9. Almost every classification capsule
can roughly distinguish different types of rumors, indicating that each
classification capsule focuses on the properties of rumors in different categories.
In contrast to scalar-based neural networks, the capsule network represents the
graph structure of rumors as multiple graph embeddings, each of which contains rich
properties, so it can fully extract the features of rumors.
7 Conclusion
Data availability The datasets analyzed during the current study are available from the corresponding
author on reasonable request.
Declarations
Conflict of interest The authors declare that they have no competing interests.
Consent for Publication All authors have checked the manuscript and have agreed to the submission.
Ethics approval All authors read and approved the final version of the manuscript.
References
1. DiFonzo N, Bordia P (2007) Rumor psychology: Social and organizational approaches
2. Castillo C, Mendoza M, Poblete B (2011) Information credibility on twitter. In: Proceedings of the
20th International Conference on World Wide Web, pp. 675–684
3. Kwon S, Cha M, Jung K, Chen W, Wang Y (2013) Prominent features of rumor propagation in
online social media. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1103–1108.
IEEE
4. Yang F, Liu Y, Yu X, Yang M (2012) Automatic detection of rumor on sina weibo. In: Proceedings
of the ACM SIGKDD Workshop on Mining Data Semantics, pp. 1–7
5. Qazvinian V, Rosengren E, Radev D, Mei Q (2011) Rumor has it: Identifying misinformation in
microblogs. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language
Processing, pp. 1589–1599
6. Wang AH (2010) Don’t follow me: Spam detection in twitter. In: 2010 International Conference on
Security and Cryptography (SECRYPT), pp. 1–10. IEEE
7. Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Patil S, Flammini A, Menczer F (2010) Detecting
and tracking the spread of astroturf memes in microblog streams. arXiv preprint arXiv:1011.3768
8. Yu F, Liu Q, Wu S, Wang L, Tan T et al (2017) A Convolutional Approach for Misinformation Iden-
tification. In: IJCAI, pp. 3901–3907
9. Yu F, Liu Q, Wu S, Wang L, Tan T (2019) Attention-based convolutional approach for misinforma-
tion identification from massive and noisy microblog posts. Comput Secur 83:106–121
10. Azri A, Favre C, Harbi N, Darmont J, Noûs C (2021) Calling to cnn-lstm for rumor detection: A
deep multi-channel model for message veracity classification in microblogs. In: Joint European
Conference on Machine Learning and Knowledge Discovery in Databases, pp. 497–513. Springer
11. Peng Y, Wang J (2021) Rumor detection based on attention cnn and time series of context informa-
tion. Future Internet 13(11):1–18
12. Ma J, Gao W, Mitra P, Kwon S, Jansen BJ, Wong K-F, Cha M (2016) Detecting rumors from micro-
blogs with recurrent neural networks. In: Proceedings of the Twenty-Fifth International Joint Con-
ference on Artificial Intelligence, pp. 3818–3824
13. Ajao O, Bhowmik D, Zargari S (2018) Fake news identification on twitter with hybrid cnn and
rnn models. In: Proceedings of the 9th International Conference on Social Media and Society, pp.
226–230
14. Asghar MZ, Habib A, Habib A, Khan A, Ali R, Khattak A (2021) Exploring deep neural networks
for rumor detection. J Ambient Intell Human Comput 12(4):4315–4333
15. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In:
5th International Conference on Learning Representations, pp. 1–14
16. Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with
fast localized spectral filtering. Adva Neural Inform process syst 29:3844–3852
17. Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention net-
works. stat 1050:4
18. Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In:
Proceedings of the 31st International Conference on Neural Information Processing Systems, pp.
1025–1035
19. Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint
arXiv:1810.00826
20. Bian T, Xiao X, Xu T, Zhao P, Huang W, Rong Y, Huang J (2020) Rumor detection on social media
with bi-directional graph convolutional networks. In: Proceedings of the AAAI Conference on Arti-
ficial Intelligence, vol. 34, pp. 549–556
21. Yuan C, Ma Q, Zhou W, Han J, Hu S (2019) Jointly embedding the local and global relations of
heterogeneous graph for rumor detection. In: 2019 IEEE International Conference on Data Mining
(ICDM), pp. 796–805. IEEE
22. Yang X, Ma H, Wang M (2022) Rumor detection with bidirectional graph attention networks. Secur
Commun Netw 2022:1–13
23. Lu Y-J, Li C-T (2020) Gcan: Graph-aware co-attention networks for explainable fake news detection
on social media. In: Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics, pp. 505–514
24. Song Y-Z, Chen Y-S, Chang Y-T, Weng S-Y, Shuai H-H (2021) Adversary-aware rumor detection.
In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 1371–1382
25. Li J, Bao P, Shen H, Li X (2021) Mistr: A multiview structural-temporal learning framework for
rumor detection. IEEE Transact Big Data 01:1–13
26. Li C, Peng H, Li J, Sun L, Lyu L, Wang L, Yu PS, He L (2022) Joint stance and rumor detection in
hierarchical heterogeneous graph. IEEE Transact Neural Netw Learn Syst 33(6):2530–2542. https://
doi.org/10.1109/TNNLS.2021.3114027
27. Ran H, Jia C, Zhang P, Li X (2022) Mgat-esm: multi-channel graph attention neural network with
event-sharing module for rumor detection. Inform Sci 592:402–416
28. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Proceedings of the
31st international conference on neural information processing systems, pp. 3859–3869
29. Xinyi Z, Chen L (2018) Capsule graph neural network. In: International Conference on Learning
Representations, pp. 1–16
30. Chua AY, Banerjee S (2016) Linguistic predictors of rumor veracity on the internet. In: Proceedings
of the International MultiConference of Engineers and Computer Scientists, vol. 1, pp. 387–391
31. Liu Z, Wei Z, Zhang R (2017) Rumor detection based on convolutional neural network. J Comput
Appl 37(11):3053–3056
32. Liu Y, Wu Y-FB (2018) Early detection of fake news on social media through propagation path clas-
sification with recurrent and convolutional networks. In: Proceedings of the Thirty-Second AAAI
Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence
Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, pp.
354–361
33. Ma J, Gao W, Wei Z, Lu Y, Wong K-F (2015) Detect rumors using time series of social context
information on microblogging websites. In: Proceedings of the 24th ACM International on Confer-
ence on Information and Knowledge Management, pp. 1751–1754
34. Ma J, Gao W, Wong K-F (2018) Rumor detection on twitter with tree-structured recursive neural
networks. In: Proceedings of the 56th Annual Meeting of the Association for Computational Lin-
guistics (Volume 1: Long Papers), pp. 1980–1989
35. Yang X, Lyu Y, Tian T, Liu Y, Liu Y, Zhang X (2021) Rumor detection on social media with graph
structured adversarial learning. In: Proceedings of the Twenty-Ninth International Conference on
International Joint Conferences on Artificial Intelligence, pp. 1417–1423
36. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inform process
manage 24(5):513–523
37. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017)
Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008
38. Pennington J, Socher R, Manning CD (2014) Glove: Global vectors for word representation. In: Pro-
ceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP),
pp. 1532–1543
39. Ma J, Gao W, Wong K-F (2017) Detect rumors in microblog posts using propagation structure via
kernel learning. In: Proceedings of the 55th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pp. 708–717
40. Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: ICLR (Poster), pp. 1–15
41. van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9:2579–2605
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the
author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article
is solely governed by the terms of such publishing agreement and applicable law.