CHEER: Centrality-aware High-order Event Reasoning Network
Figure 2: An overview of our proposed Centrality-aware High-order Event Reasoning Network (CHEER).
Central Events Prediction and EIG Enhancement Having obtained the centrality-aware event embeddings, we use them to predict whether an event is a central event: p_{e_i} = f(c_{e_i} W_c), where f denotes the sigmoid function and W_c ∈ R^{d×1} is a learnable parameter vector.

A General GNN-based DECI Model To predict whether there is a causal relation between events e_i and e_j, we concatenate the "[CLS]" embedding of the document, the event features z_i and z_j, and the event-pair feature z_k, and define the probability of a causal relation as follows:

    p_{e_i,j} = f([h_[CLS] ∥ z_i ∥ z_j ∥ z_k] W_p),    (4)

where f denotes the softmax function, ∥ denotes concatenation, and W_p is a parameter weight matrix. Event-related features are typically initialized with contextualized embeddings via the PLM in Section 3.1 and enhanced through L layers of GNN reasoning. The l-th layer takes a set of node embeddings Z^{(l)} ∈ R^{N×d_in} as input and outputs a new set of node embeddings Z^{(l+1)} ∈ R^{N×d_out}, where N = |V_1| + |V_2| is the number of nodes, and d_in and d_out are the dimensions of the input and output embeddings, respectively. Formally, the output of the l-th layer for node v_i can be written as:

    z_i^{(l+1)} = σ( Σ_{j ∈ N_i} g(z_i^{(l)}, z_j^{(l)}) ),    (5)

where σ denotes a non-linearity, N_i denotes the set of all first-order neighbors of v_i, and g specifies how neighborhood information is aggregated. By stacking L such layers, multi-hop reasoning can be achieved.

EIG Reasoning Network Instantiation

Event & Event-pair Features For an event node e_i, we directly take the centrality-aware event embedding as its initialization:

    z_i^{(0)} = c_{e_i} W_t,  v_i ∈ V_1,    (6)

where 0 denotes the initial state for the following neural layers, and W_t ∈ R^{d×2d} is a parameter weight matrix that projects event nodes to the same size as the event-pair nodes for efficient computation. For an event-pair node (e_i, e_j) → v_k, we concatenate the two corresponding contextualized event embeddings as the event-pair node features:

    z_k^{(0)} = [h_{e_i} ∥ h_{e_j}],  v_k ∈ V_2.    (7)

EIG Reasoning Intuitively, different types of edges carry different semantics and contribute differently to causality prediction. To handle this heterogeneity, the EIG Reasoning Network incorporates the edge features into a self-attention mechanism during aggregation. Specifically, let T denote the number of edge types in EIG. We incorporate the edge features and learn a scalar γ_t (1 ≤ t ≤ T) for each edge type to measure its importance:

    γ_t = r_t W_r,    (8)

where r_t ∈ R^{1×d} is the edge feature specified by the edge type t, and W_r ∈ R^{d×1} is a parameter vector according to t. In this way, we can adaptively adjust the interaction strength between two adjacent nodes by weighing different types of connections with γ_t, which is learned automatically.

Figure 2 illustrates an example of the entire process of CHEER (we take a sub-graph of EIG for brevity). Different edge colors indicate different connection types in EIG; edges with the same color (i.e., the same edge type) share the same γ_t, and each layer has its own set of γ_t^{(l)}. We can then instantiate the aggregation function g as:

    g(z_i^{(l)}, z_j^{(l)}) = f(γ_t^{(l)} + α_ij^{(l)}) (z_j^{(l)} W_v^{(l)}),    (9)

where f denotes the softmax function and W_v^{(l)} ∈ R^{d_in×d_out} is a parameter weight matrix. α_ij is computed by a shared self-attention mechanism (Vaswani et al., 2017) to measure the importance of neighbor j to node i, where W_q, W_k ∈ R^{d_in×d_out} are parameter weight matrices:

    α_ij = (z_i W_q)(z_j W_k)^T / √d_out.    (10)

As shown in Figure 2, the above process can be organized as a matrix multiplication that computes the representations of all nodes simultaneously through a weighted adjacency matrix. Denote by A_ij the (i, j)-element of the binary adjacency matrix A: A_ij is 1 if there is an edge between nodes v_i and v_j, and 0 otherwise. We compute each entry of the edge-aware adjacency matrix as follows, where δ_ij^{(l)} = f(γ_t^{(l)} + α_ij^{(l)}) is the normalized weight:

    A′_ij = δ_ij^{(l)} A_ij.    (11)

Figure 2 shows that the corresponding neighbor node features are aggregated with different weights according to δ_ij to obtain the representation of the target node. Finally, the node representations of layer l can be obtained by:

    Z^{(l+1)} = σ(A′^{(l)} Z^{(l)} W_v^{(l)}).    (12)

3.5 Training

Following ERGO (Chen et al., 2022), we adopt the focal loss (Lin et al., 2017) to alleviate the false-negative issue (i.e., the number of negative samples during training far exceeds that of positives). We adopt the β-balanced variant of the focal loss, which introduces a weighting factor β ∈ [0, 1] for the class "positive" and 1 − β for the class "negative". The loss function L_2 can be written as:

    L_2 = − Σ_{e_i, e_j ∈ D} β_{e_i,j} (1 − p_{e_i,j})^τ log(p_{e_i,j}).    (13)

Therefore, by modifying the event centrality incorporation, the construction of EIG, and the aggregation function, CHEER can degenerate into different GNN-based DECI methods, and thus provides a unified view for better document-level reasoning.
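To make the reasoning layer of Eqs. (9)–(12) and the loss of Eq. (13) concrete, the following NumPy sketch implements one edge-aware layer and the class-balanced focal loss. All names (cheer_layer, masked_softmax, focal_loss), the toy dimensions, the ReLU choice for σ, and the reading of f in Eq. (9) as a softmax over each node's neighbors are our own illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Softmax over each row, restricted to entries where mask is True."""
    scores = np.where(mask, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    e = e * mask
    return e / np.clip(e.sum(axis=-1, keepdims=True), 1e-12, None)

def cheer_layer(Z, A, edge_type, gamma, Wq, Wk, Wv):
    """One edge-aware reasoning layer, following Eqs. (9)-(12).

    Z: (N, d_in) node embeddings; A: (N, N) binary adjacency matrix;
    edge_type: (N, N) integer edge-type ids; gamma: (T,) per-type scalars.
    """
    d_out = Wq.shape[1]
    # Eq. (10): shared scaled dot-product attention scores alpha_ij
    alpha = (Z @ Wq) @ (Z @ Wk).T / np.sqrt(d_out)
    # Eqs. (9)/(11): add the edge-type scalar gamma_t, normalize over neighbors
    delta = masked_softmax(gamma[edge_type] + alpha, A.astype(bool))
    A_prime = delta * A              # edge-aware adjacency matrix, Eq. (11)
    # Eq. (12): aggregate neighbors and apply the non-linearity (ReLU here)
    return np.maximum(A_prime @ (Z @ Wv), 0.0)

def focal_loss(p, y, beta=0.25, tau=2.0):
    """Beta-balanced focal loss over pair probabilities, in the spirit of Eq. (13)."""
    w = np.where(y == 1, beta, 1.0 - beta)   # class-balancing factor
    pt = np.where(y == 1, p, 1.0 - p)        # probability of the true class
    return float(-(w * (1.0 - pt) ** tau * np.log(np.clip(pt, 1e-12, 1.0))).sum())

# Toy run on a random 5-node sub-graph with 3 edge types
rng = np.random.default_rng(0)
N, d_in, d_out, T = 5, 8, 8, 3
Z = rng.normal(size=(N, d_in))
A = (rng.random((N, N)) < 0.5).astype(float)
np.fill_diagonal(A, 0.0)
edge_type = rng.integers(0, T, size=(N, N))
gamma = rng.normal(size=T)
Wq, Wk, Wv = (rng.normal(size=(d_in, d_out)) for _ in range(3))
Z_next = cheer_layer(Z, A, edge_type, gamma, Wq, Wk, Wv)
print(Z_next.shape)  # (5, 8)
```

Stacking this layer L times corresponds to the multi-hop reasoning of Eq. (5); note that a trained model would learn gamma via Eq. (8) rather than sample it.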
Since some baselines cannot handle the inter-sentence scenarios in EventStoryLine, and the

Inter-sentence Evaluation From Table 2, we can observe that: (1) CHEER greatly outperforms all

Model                 | Intra | Inter | Intra + Inter
CHEER                 | 62.6  | 48.4  | 51.4
w/o event centrality  | 60.3  | 46.3  | 49.3
w/o edge features     | 61.4  | 47.6  | 50.4
w/o coref             | 60.8  | 46.9  | 50.1
5 Conclusion

In this paper, we propose a novel centrality-aware high-order event reasoning network (CHEER) to conduct global reasoning for DECI. We first summarize a general GNN-based DECI model and provide a unified view for better understanding. Then we design an Event Interaction Graph (EIG) that involves prior event relations and high-order interactions among event pairs. Finally, we incorporate event centrality information into the EIG reasoning process.

References

Pengfei Cao, Xinyu Zuo, Yubo Chen, Kang Liu, Jun Zhao, Yuguang Chen, and Weihua Peng. 2021. Knowledge-enriched event causality identification via latent structure induction networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4862–4872, Online. Association for Computational Linguistics.

Tommaso Caselli and Piek Vossen. 2017. The Event StoryLine corpus: A new benchmark for causal and
temporal relation extraction. In Proceedings of the Events and Stories in the News Workshop, pages 77–86, Vancouver, Canada. Association for Computational Linguistics.

Meiqi Chen, Yixin Cao, Kunquan Deng, Mukai Li, Kun Wang, Jing Shao, and Yan Zhang. 2022. ERGO: Event relational graph transformer for document-level event causality identification. arXiv preprint arXiv:2204.07434.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Quang Do, Yee Seng Chan, and Dan Roth. 2011. Minimally supervised event causality identification. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 294–303, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Lei Gao, Prafulla Kumar Choubey, and Ruihong Huang. 2019. Modeling document-level causal structures for event causal relation identification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1808–1817, Minneapolis, Minnesota. Association for Computational Linguistics.

Chikara Hashimoto. 2019. Weakly supervised multilingual causality extraction from Wikipedia. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2988–2999, Hong Kong, China. Association for Computational Linguistics.

Chikara Hashimoto, Kentaro Torisawa, Julien Kloetzer, Motoki Sano, István Varga, Jong-Hoon Oh, and Yutaka Kidawara. 2014. Toward future scenario generation: Extracting event causality exploiting semantic relation, context, and association features. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 987–997, Baltimore, Maryland. Association for Computational Linguistics.

Christopher Hidey and Kathy McKeown. 2016. Identifying causal relations using parallel Wikipedia articles. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1424–1433, Berlin, Germany. Association for Computational Linguistics.

Zhichao Hu, Elahe Rahimtoroghi, and Marilyn Walker. 2017. Inference of fine-grained event causality from blogs and films. In Proceedings of the Events and Stories in the News Workshop, pages 52–58, Vancouver, Canada. Association for Computational Linguistics.

Kazuma Kadowaki, Ryu Iida, Kentaro Torisawa, Jong-Hoon Oh, and Julien Kloetzer. 2019. Event causality recognition exploiting multiple annotators' judgments and background knowledge. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5816–5822, Hong Kong, China. Association for Computational Linguistics.

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2999–3007. IEEE Computer Society.

Jian Liu, Yubo Chen, and Jun Zhao. 2020. Knowledge enhanced event causality identification with mention masking generalizations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 3608–3614. ijcai.org.

Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Yubo Ma, Zehao Wang, Yixin Cao, Mukai Li, Meiqi Chen, Kun Wang, and Jing Shao. 2022. Prompt for extraction? PAIE: Prompting argument interaction for event argument extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6759–6774, Dublin, Ireland. Association for Computational Linguistics.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Paramita Mirza. 2014. Extracting temporal and causal relations between events. In Proceedings of the ACL 2014 Student Research Workshop, pages 10–17, Baltimore, Maryland, USA. Association for Computational Linguistics.

Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth. 2018. Joint reasoning for temporal and causal relations. In
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2278–2288, Melbourne, Australia. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2013. Toward a better understanding of causality between verbal events: Extraction and analysis of the causal power of verb-verb associations. In Proceedings of the SIGDIAL 2013 Conference, pages 21–30, Metz, France. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2014a. In-depth exploitation of noun and verb semantics to identify causation in verb-noun pairs. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 161–170, Philadelphia, PA, U.S.A. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2014b. Recognizing causality in verb-noun pairs via noun and verb semantics. In Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL), pages 48–57, Gothenburg, Sweden. Association for Computational Linguistics.

Jiaxin Shi, Shulin Cao, Lei Hou, Juanzi Li, and Hanwang Zhang. 2021. TransferNet: An effective and transparent framework for multi-hop question answering over relation graph. arXiv preprint, abs/2104.07302.

Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 4444–4451. AAAI Press.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Yuan Sui, Shanshan Feng, Huaxiang Zhang, Jian Cao, Liang Hu, and Nengjun Zhu. 2022. Causality-aware enhanced model for multi-hop question answering over knowledge graphs. Knowledge-Based Systems, 250:108943.

Minh Tran Phu and Thien Huu Nguyen. 2021. Graph convolutional networks for event causality identification with rich document-level structures. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3480–3490, Online. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. DocRED: A large-scale document-level relation extraction dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 764–777, Florence, Italy. Association for Computational Linguistics.

Wenpeng Yin, Dragomir Radev, and Caiming Xiong. 2021. DocNLI: A large-scale dataset for document-level natural language inference. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4913–4922, Online. Association for Computational Linguistics.

Kun Zhao, Donghong Ji, Fazhi He, Yijiang Liu, and Yafeng Ren. 2021. Document-level event causality identification via graph inference mechanism. Information Sciences, 561:115–129.

Xinyu Zuo, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao, Weihua Peng, and Yuguang Chen. 2021a. Improving event causality identification via self-supervised representation learning on external causal statement. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2162–2172, Online. Association for Computational Linguistics.

Xinyu Zuo, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao, Weihua Peng, and Yuguang Chen. 2021b. LearnDA: Learnable knowledge-guided data augmentation for event causality identification. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3558–3571, Online. Association for Computational Linguistics.

Xinyu Zuo, Yubo Chen, Kang Liu, and Jun Zhao. 2020. KnowDis: Knowledge enhanced data augmentation for event causality detection via distant supervision. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1544–1550, Barcelona, Spain (Online). International Committee on Computational Linguistics.
ACL 2023 Responsible NLP Checklist

A For every submission:

✓ A1. Did you describe the limitations of your work?
  Limitations

✗ A4. Have you used AI writing assistants when working on this paper?
  Left blank.

B ✓ Did you use or create scientific artifacts?
  4 Experiments

✓ B1. Did you cite the creators of artifacts you used?
  4.1 Experimental Setup

B2. Did you discuss the license or terms for use and / or distribution of any artifacts?
  Not applicable. Left blank.

✓ B3. Did you discuss if your use of existing artifact(s) was consistent with their intended use, provided that it was specified? For the artifacts you create, do you specify intended use and whether that is compatible with the original access conditions (in particular, derivatives of data accessed for research purposes should not be used outside of research contexts)?
  4.1 Experimental Setup

B4. Did you discuss the steps taken to check whether the data that was collected / used contains any information that names or uniquely identifies individual people or offensive content, and the steps taken to protect / anonymize it?
  Not applicable. Left blank.

B5. Did you provide documentation of the artifacts, e.g., coverage of domains, languages, and linguistic phenomena, demographic groups represented, etc.?
  Not applicable. Left blank.

✓ B6. Did you report relevant statistics like the number of examples, details of train / test / dev splits, etc. for the data that you used / created? Even for commonly-used benchmark datasets, include the number of examples in train / validation / test splits, as these provide necessary context for a reader to understand experimental results. For example, small differences in accuracy on large test sets may be significant, while on small test sets they may not be.
  4.1 Experimental Setup

C1. Did you report the number of parameters in the models used, the total computational budget (e.g., GPU hours), and computing infrastructure used?
  No response.

✓ C2. Did you discuss the experimental setup, including hyperparameter search and best-found hyperparameter values?
  4.1 Experimental Setup

✓ C3. Did you report descriptive statistics about your results (e.g., error bars around results, summary statistics from sets of experiments), and is it transparent whether you are reporting the max, mean, etc. or just a single run?
  4.3 Overall Results

✓ C4. If you used existing packages (e.g., for preprocessing, for normalization, or for evaluation), did you report the implementation, model, and parameter settings used (e.g., NLTK, Spacy, ROUGE, etc.)?
  4.1 Experimental Setup

D ✓ Did you use human annotators (e.g., crowdworkers) or research with human participants?
  3.3 Event Centrality Incorporation

✓ D1. Did you report the full text of instructions given to participants, including e.g., screenshots, disclaimers of any risks to participants or annotators, etc.?
  3.3 Event Centrality Incorporation

D2. Did you report information about how you recruited (e.g., crowdsourcing platform, students) and paid participants, and discuss if such payment is adequate given the participants' demographic (e.g., country of residence)?
  Not applicable. Left blank.

D3. Did you discuss whether and how consent was obtained from people whose data you're using/curating? For example, if you collected data via crowdsourcing, did your instructions to crowdworkers explain how the data would be used?
  Not applicable. Left blank.

D4. Was the data collection protocol approved (or determined exempt) by an ethics review board?
  Not applicable. Left blank.

D5. Did you report the basic demographic and geographic characteristics of the annotator population that is the source of the data?
  Not applicable. Left blank.

The Responsible NLP Checklist used at ACL 2023 is adopted from NAACL 2022, with the addition of a question on AI writing assistance.