Rumour Detection Based On Graph Convolutional Neuron
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3050563, IEEE Access
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2019.DOI
ABSTRACT Rumor detection is an important research topic in social networks, and many rumor detection models have been proposed in recent years. For the rumor detection task, structural information in a conversation can be used to extract effective features. However, many existing rumor detection models focus on local structural features, while the global structural features between the source tweet and its replies are not effectively used. To make full use of global structural features and content information, we propose a Source-Replies relation Graph (SR-graph) for each conversation, in which every node denotes a tweet, the node feature is its weighted word vectors, and edges denote the interaction between tweets. Based on SR-graphs, we propose an Ensemble Graph Convolutional Neural Net with a Nodes Proportion Allocation Mechanism (EGCN) for the rumor detection task. In experiments, we first verify that the extracted structural features are effective, and then we show the effects of different word-embedding dimensions on multiple test indices. Moreover, we show that our proposed EGCN model is comparable to or even better than the current state-of-the-art machine learning models.
INDEX TERMS Rumour detection; Graph Convolutional Neural Nets; Word-vectors embedding
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
this paper, both the global structure and the local structure are treated as important features for the rumor detection task. We aim to model both the global and local structural information in a uniform framework and then propose a deep ensemble neural network to learn these two kinds of features for conversations. In the proposed model, the structural information is expressed in Source-Replies relation graphs (SR-graphs). In an SR-graph, a node denotes a tweet, the node feature is its weighted word vectors, and edges denote the interaction between tweets. The word vectors, which express the content information, are trained with the Word2Vec model [11]. However, when the structure of an SR-graph is simple, the global structural features of the SR-graphs of rumors and non-rumors may be indistinguishable. To extract distinguishable features from conversations of different lengths, we propose a Nodes Proportion Allocation Mechanism (NPAM) to build an ensemble deep neural network for different conversations. Generally, the complexity of an SR-graph is proportional to its number of nodes: if an SR-graph has few nodes, its structure is always simple, and most simple SR-graphs have similar global structures. Therefore, for simple SR-graphs, the text features and local structural features are more important for rumor detection, while the global structural features are secondary. Conversely, for complex SR-graphs, the structures of rumors and non-rumors are probably distinguishable, and the global structural features and text features are both important for rumor detection. To model this phenomenon effectively, the proposed NPAM is used to ensemble two neural networks, a Text CNN (TCNN) and a GCN, while rumor detection is treated as a classification task. Assuming that the number of nodes in the current SR-graph is N and the number of nodes in the maximal SR-graph is M, the contribution rate of the GCN for classification is defined as N/M, and that of the TCNN as 1 − N/M. We call the resulting model an Ensemble Graph Convolutional Neural Net with Nodes Proportion Allocation Mechanism (EGCN). In experiments, we first verify that the global structural features are effective for rumor detection, and then we study the effects of different word-embedding dimensions on multiple test indices. Moreover, we show that our proposed EGCN model is comparable to or even better than current state-of-the-art models.

Our main contributions can be summarized as follows:

1) To learn both the global and local structural information, we construct an SR-graph for every conversation. In an SR-graph, a node denotes a tweet, and the node feature is its weighted word vectors.
2) To build an effective deep learning model for the rumor detection task, we propose an EGCN model based on a Nodes Proportion Allocation Mechanism (NPAM). Based on NPAM, the text features, local structural features, and global structural features can be learned by an ensemble deep neural network.
3) To obtain a satisfactory EGCN model for the rumor detection task, we explore the optimal values of the dimension of word vectors. Based on experiments, we conclude that the proposed EGCN is optimal in terms of F1 scores when the dimension of word vectors is 25.

The remainder of this paper is organized as follows. Section II introduces the related work, including rumor detection methods that use content features and context features. Section III details the proposed model. In Section IV, experimental results demonstrate the feasibility of the proposed model. The last section provides conclusions and outlook.

II. RELATED WORK
For rumor detection, Zhao et al. [5] assume that rumors will cause Twitter users to question the veracity of tweets. This method is based on content information, but not all rumors provoke inquiry tweets. Based on this research, many scholars focus on the content features of rumors. Zubiaga et al. propose an alternative approach that learns context from breaking news to determine whether a tweet constitutes a rumor [6]. Tolosi et al. distinguish rumors by analyzing the characteristics of different events. However, the features change dramatically across events [7]. McCreadie et al. study the feasibility of using a crowdsourcing platform to identify rumors and non-rumors in social media [8]. Bhattacharjee et al. regard rumor detection as a text classification task [9]. They propose a novel approach to feature construction that reweights the TF-IDF score of some particular terms according to the label information of the training data, and they show that their model reaches performance comparable to an LSTM with GloVe word embeddings for rumor detection on the PHEME datasets. Although these methods can extract effective content features, scholars realize that these rumor detection models cannot reflect the structural information of rumor propagation, and that the exclusive use of content features is not sufficient for the rumor detection task. To further enhance the performance of rumor detection models, scholars consider structural information and introduce structural features into their models. In the proposed models, the structural information is expressed as context features, which are usually local. Context features are extracted by considering relevant information about the social media tweet or fake news [10]. Wu et al. introduce random walk kernels between tweets to a Kernel Support Vector Machine (KSVM) and combine both the content-based kernels and the random walk kernels for rumor detection [12]. Pamungkas et al. use the Jaccard similarity between every tweet and its source as context features [13]. Although these local structural features are useful, they do not make full use of the global structural features for the rumor detection task. Therefore, this paper suggests that structural information can be exploited further for the rumor detection task, and we pay attention to both the global and local structural features of every conversation in our research.

III. THE PROPOSED MODEL
Global structural information means that the interaction between all tweets in a conversation is considered. As we
[Figure residue. Recoverable labels: "Word-vectors embedded to Graphs", "Spatial based GCN", "features".]
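As a rough illustration of how the pieces of the proposed model fit together, the following Python sketch builds an SR-graph, applies one graph convolution step of the form given in Eq. (5), and mixes the two branch outputs with the node-proportion weights of Eq. (1). The function names, the parent-index encoding of replies, the undirected edges, the per-word weights, and the tanh activation are assumptions made for illustration; the paper does not specify them, and the SortPooling layer and the Text CNN branch are omitted.

```python
import numpy as np


def build_sr_graph(parents, word_vectors, word_weights):
    """Return adjacency matrix A and node features X for one conversation.

    parents[i]      : index of the tweet that tweet i replies to (None for the source).
    word_vectors[i] : array of shape (n_words_i, d) for tweet i.
    word_weights[i] : array of shape (n_words_i,); the weighting scheme is an assumption.
    """
    n = len(parents)
    A = np.zeros((n, n))
    for child, parent in enumerate(parents):
        if parent is not None:
            # Undirected reply edge (assumption: the paper only says edges
            # denote the interaction between tweets).
            A[child, parent] = 1.0
            A[parent, child] = 1.0
    # Node feature: weighted average of the word vectors of the tweet.
    X = np.stack([
        np.average(vecs, axis=0, weights=w)
        for vecs, w in zip(word_vectors, word_weights)
    ])
    return A, X


def graph_conv(A, X, W, f=np.tanh):
    """One layer Z = f(D~^-1 A~ X W), with A~ = A + I and D~_ii = sum_j A~_ij."""
    A_tilde = A + np.eye(A.shape[0])
    inv_degree = 1.0 / A_tilde.sum(axis=1, keepdims=True)  # diagonal of D~^-1
    return f(inv_degree * (A_tilde @ X @ W))


def npam_combine(p_gcn, p_text, n_nodes, max_nodes):
    """Ensemble output y = P_G * N/M + P_T * (1 - N/M)."""
    ratio = n_nodes / max_nodes
    return p_gcn * ratio + p_text * (1.0 - ratio)
```

With N = 5 and M = 20, for example, the GCN output is weighted by 0.25 and the Text CNN output by 0.75, so short conversations rely mostly on text features.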
the Text CNN is $P_T$ and the feature output of the GCN is $P_G$, the total feature output of the EGCN is

$$y = P_G \times \frac{N}{M} + P_T \left(1 - \frac{N}{M}\right) \tag{1}$$

The spatial-based GCN has 4 graph convolution layers, a SortPooling layer, and 3 one-dimensional convolution layers. The output of the proposed EGCN is an ensemble of the Text CNN and the spatial-based GCN.

The convolution operation in the Text CNN model can be expressed as follows:

$$Y = \sum_{i=1} W^{(i)} * x + b^{(i)} \tag{2}$$

where $W^{(i)}$ denotes the i-th convolution kernel, which can be optimized by the back-propagation (BP) algorithm. After the convolution operations, a ReLU activation function is used:

$$Y_{conv} = \mathrm{ReLU}(Y) = \max(0, Y) \tag{3}$$

After the convolution layers, features such as keywords are extracted, and then higher-level features are extracted by the pooling layers. The pooled features can be expressed as Formula (4):

$$Y_{pool} = P_{max}(Y_{conv}) \tag{4}$$

where $P_{max}$ denotes the max-pooling operation. The features extracted by the pooling layers are then passed to a fully connected layer.

In the GCN part of the EGCN, given a graph with adjacency matrix $A$ and node features $X$, the graph convolution layer takes the following form:

$$Z = f\left(\tilde{D}^{-1} \tilde{A} X W\right) \tag{5}$$

where $\tilde{A} = A + I$ is the adjacency matrix of the graph with self-loops, $\tilde{D}$ is its diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $W$ is a matrix of trainable graph parameters, $f$ is a nonlinear activation function, and $Z$ is the output activation matrix. We stack multiple graph convolution layers as follows:

$$Z^{t+1} = f\left(\tilde{D}^{-1} \tilde{A} Z^{t} W^{t}\right) \tag{6}$$

After several spatial graph convolution layers, a SortPooling layer [21] is used to sort the feature descriptors, each of which represents a vertex. The SortPooling operation defines a sequence of nodes in the graph. Then, the output of the SortPooling layer is passed to the fully connected layer.

IV. EXPERIMENTS
A. EXPERIMENTAL SETUP
1) Datasets
In this paper, we use the PHEME dataset, which contains Twitter posts made during breaking news. The five breaking news events in PHEME are as follows [6]:
* Charlie Hebdo: two brothers forced their way into the offices of the French satirical weekly newspaper Charlie Hebdo, killing 11 people and wounding 11 more, on January 7, 2015.
* Ferguson: citizens of Ferguson in Missouri, USA, protested after the fatal shooting of an 18-year-old African American, Michael Brown, by a white police officer on August 9, 2014.
* Gencrash: a passenger plane from Barcelona to Dusseldorf crashed in the French Alps on March 24, 2015, killing all passengers and crew. The plane was ultimately found to have been deliberately crashed by the co-pilot.
* Ottawa Shooting: shootings occurred on Ottawa's Parliament Hill, resulting in the death of a Canadian soldier on October 22, 2014.
* Sydney Siege: a gunman held hostage ten customers and eight employees of a Lindt chocolate cafe located at Martin Place in Sydney on December 15, 2014.

The final dataset contains 5,802 annotated tweets, of which 1,972 were classified as rumors and 3,830 as non-rumors. These annotations are distributed differently across the five events, as shown in Table 1.

TABLE 1. Distribution of categories for the five events in the dataset PHEME.

Event             Rumors          Non-rumors      Total
Charlie Hebdo     458 (22.0%)     1621 (78.0%)    2079
Ferguson          284 (24.8%)     859 (75.2%)     1143
Gencrash          238 (50.7%)     231 (49.3%)     469
Ottawa Shooting   470 (52.8%)     420 (47.2%)     890
Sydney Siege      522 (42.8%)     699 (57.2%)     1221
Total             1972 (34.0%)    3830 (66.0%)    5802

The source data are structured as follows. Each event has a directory with two subfolders: rumors and non-rumors. Each of these two folders contains folders named with a tweet ID. The tweet itself can be found in the source-tweet directory of the tweet in question, and the directory reactions holds the set of tweets responding to that source tweet.

2) Evaluation Measures
Accuracy is often treated as a suitable evaluation measure for a classifier. In this paper, we also introduce three other indices: Precision, Recall, and F1. The evaluation indicators are defined as follows:

         Positive               Negative
True     True Positive (TP)     True Negative (TN)
False    False Positive (FP)    False Negative (FN)

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$
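Equations (7) to (9) can be computed directly from confusion-matrix counts. The sketch below is a minimal Python illustration; the function names, the choice of label 1 as the positive (rumor) class, and returning 0.0 when a denominator is zero are assumptions, not details taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary task (positive label is an assumption)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1 as in Eqs. (7)-(9); 0.0 on an empty denominator."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Note that TN does not enter any of the three formulas; it is returned only so that accuracy, (TP + TN) divided by the total count, can be computed from the same counts.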
In Equations (7) to (9), TP and TN refer to the numbers of instances that are correctly classified, and FP and FN are the numbers of instances that are incorrectly classified.

3) Baseline Models
SVM (Content + Context). Support Vector Machines with the cost coefficient selected via nested cross-validation.
Random Forest (Content). Random forest is a classifier that uses multiple trees to train and predict samples.
Naive Bayes (Content). The Naive Bayes method is a classification method based on Bayes' theorem and the assumption of conditional independence between features.
Maximum Entropy (Content). The principle of Maximum Entropy states that the probability distribution that best represents the current state of knowledge is the one with the largest entropy in the context of precisely stated prior data.
CRF (Content + Social). A Conditional Random Field is a discriminative probabilistic model that uses two types of features: content-based features and social features [6].
Text-CNN (Content). A convolutional neural network designed for text data.
Zhao et al., 2015. The classification method proposed by Zhao et al. [7].
TF-IDFB. A novel approach to feature construction that reweights the TF-IDF score of some particular terms by taking into account the label information of the training data [8].
PGNN (Content + Structural). In a PGNN, the adjacency relation is transformed into indicator functions in its defined graph convolution to avoid directly using adjacency matrices [22].

B. EXPERIMENTAL RESULTS AND DISCUSSION
In experiments, the EGCN contains a Text CNN and a GCN. The Text CNN has 3 layers, and the GCN contains 4 graph convolution layers, a SortPooling layer, and 1-dimension convolution layers. The Adam method is used, and the initial learning rate is 1e-5. The structure of source tweets and their replies can be organized as SR-graphs, and different SR-graphs have different numbers of nodes. Both the local structural information and the global structural information are treated as important features for the rumor detection task in this paper. Our experiments are divided into two parts. First, we verify the validity of the global structural features using the GCN model in the proposed EGCN. Then we explore the effect of word vectors of different dimensions and show that our proposed EGCN model is comparable to or even better than the current state-of-the-art models.

1) Effective GCNs based on global structure features
To verify that the extracted structural features of our SR-graphs are effective, in our first experiment we test the F1 scores of the proposed graph structure without embedding word vectors; the node features are the node degrees rather than the weighted word vectors. The test F1 scores are shown in Table 2.

TABLE 2. Test F1 scores of the five events in the dataset PHEME.

Event             Learning rate   Filters     F1 score
Charlie Hebdo     1e-5            32-16-32    0.793
Ferguson          1e-5            32-16-32    0.730
Gencrash          1e-5            32-16-32    0.553
Ottawa shooting   1e-5            32-16-32    0.551
Sydney siege      1e-5            32-16-32    0.650

Compared with traditional machine learning models, the spatial-based GCN can recognize the structural information of different rumor categories. To show the improvement of the proposed GCN over the traditional machine learning methods, we compare our experimental results with the commonly used rumor detection models, and the test F1 scores are shown in Table 3.

As noted in Tables 2 and 3, the SR-graphs in the rumor dataset are helpful for the classification task, which means that both the local and global structures are helpful for rumor detection.

2) Exploring the structures of conversations
To improve the classification results of the model, we use weighted word vectors to replace the degrees in the node features and introduce a node threshold to the model. If a graph contains few nodes, the structure of the source tweet and its replies is simple, and a Text CNN may be effective for the rumor detection task. From the SR-graphs in the datasets, the degree of each node can be calculated. When the number of nodes in an SR-graph is greater than 10, the SR-graphs of most conversations become relatively complex, and this complexity is reflected in the fact that different nodes in one complex SR-graph have different degrees. When the number of nodes in an SR-graph is less than 10, the SR-graphs of some conversations are complex, and the degrees of most nodes are different. However, the SR-graphs of other conversations are relatively simple, which is reflected in the fact that the degrees of all nodes except the source node are close. When the number of nodes in an SR-graph is less than 5, most SR-graphs are simple. Each event has its own node distribution, and the node distributions are shown in Table 4.

TABLE 4. Node distributions of the five events in the dataset PHEME.

Event             max    min    rumors    non-rumors
Charlie Hebdo     346    1      458       1621
Ferguson          289    1      284       859
Gencrash          77     1      238       231
Ottawa shooting   108    1      470       420
Sydney siege      342    1      522       699

As noted in Table 4, the node distributions of different events are unbalanced. The second column shows the maximum number of nodes in the graphs of the corresponding event. The third column shows the minimum number of nodes, and
the last 2 columns show the distribution of rumors and non-rumors.

TABLE 3. Test F1 scores of the GCN and traditional machine learning methods on PHEME.

3) Exploring the Optimal EGCN
To further explore the effects of word vectors of different dimensions and to construct a satisfactory EGCN model, we test 4 different indices on the PHEME dataset with word vectors of different dimensions.

TABLE 5. Node distributions of the five events in the dataset PHEME.

Event             max    min    N ∈ (0, 10]    N ∈ (10, ∞)
Charlie Hebdo     346    1      687            1392
Ferguson          289    1      379            764
Gencrash          77     1      299            170
Ottawa shooting   108    1      387            503
Sydney siege      342    1      328            893

Next, we test the classification indices, aiming to find an optimal dimension of word vectors. The test classification indices are shown in Table 6.

As Table 6 shows, word vectors of different dimensions lead to different test indices. When the dimension of word vectors is 25, the average test indices are satisfactory. After meaningless words are excluded, most tweets are relatively short. For commonly used words, low-dimensional word vectors can achieve satisfactory results. Therefore, we use weighted 25-dimension word vectors as the node features in SR-graphs, and the structure of the proposed EGCN model is then fixed. Table 7 shows the final test indices of the EGCN in our experiments.

To further illustrate the effectiveness of the proposed EGCN, we compare the indices of the EGCN with the commonly used rumor detection models. In reference [22], a PGNN is proposed for a four-class rumor detection task. A PGNN is a kind of GCN, and the adjacency relation in a PGNN is transformed into indicator functions in the graph convolution to avoid directly using adjacency matrices. The F1 scores and other indices are shown in Table 8.

In Table 8, there are 5 events and 15 indices. R.F denotes the Random Forest algorithm, and N.B denotes the Naive Bayes method. We use bold black fonts to mark the optimal solutions and bold blue fonts to mark the suboptimal solutions.

As Table 8 shows, the proposed EGCN achieves at least one best result in every event and accounts for almost all the optimal or suboptimal solutions. Overall, the EGCN obtains 6 best results and 2 suboptimal solutions on the PHEME dataset and performs better than the other models. For comparison, we use an LSTM and a Text CNN, which are commonly used for classification, and the results show that the proposed EGCN performs better than these two commonly used models. The proposed EGCN uses both text features and structural features for classification, and the experiments verify that the extracted text features and structural features are effective for the rumor detection task. Moreover, we also compare with PGNN, a state-of-the-art Graph Neural Network. The experiments show that the proposed EGCN uses the structural information and the text information more effectively and performs better than the PGNN on most indices.

V. CONCLUSIONS AND OUTLOOKS
For the rumor detection task, we propose a deep neural network that treats the rumor detection problem as a classification problem. To obtain satisfactory classification results, we train word vectors based on the Word2Vec model and propose an SR-graph for every source tweet and its replies. Based on the SR-graphs and the corresponding word vectors, we train an EGCN model that achieves results comparable to or even better than the state-of-the-art machine learning models. In experiments, we find that the word vectors are very important for the final performance of the proposed EGCN model, and we use an existing word-embedding model to train the word vectors in this paper. However, we suppose that a word-embedding model designed for Twitter datasets might work better than the existing models. Therefore, we will design an unsupervised word-embedding neural network for Twitter in our next study.

VI. ACKNOWLEDGMENT
This work is supported by the National Key Research and Development Plan (No. 2016YFC0600908) and the National Natural Science Foundation of China under Grants No. 61876186 and No. 61977061.
TABLE 6. The test results of different dimensional word vectors in the dataset PHEME.
TABLE 7. The best test results of the proposed EGCN in the dataset PHEME.