
CHEER: Centrality-aware High-order Event Reasoning Network

for Document-level Event Causality Identification

Meiqi Chen¹, Yixin Cao², Yan Zhang¹, Zhiwei Liu³
¹Peking University  ²Singapore Management University  ³Meituan
[email protected]

Abstract

Document-level Event Causality Identification (DECI) aims to recognize causal relations between events within a document. Recent studies focus on building a document-level graph for cross-sentence reasoning but ignore important causal structures: there are one or two "central" events that prevail throughout the document, with most other events serving as either their cause or consequence. In this paper, we manually annotate central events for a systematic investigation and propose a novel DECI model, CHEER, which performs high-order reasoning while considering event centrality. First, we summarize a general GNN-based DECI model and provide a unified view for better understanding. Second, we design an Event Interaction Graph (EIG) involving the interactions among events (e.g., coreference) and among event pairs, e.g., causal transitivity: cause(A, B) ∧ cause(B, C) ⇒ cause(A, C). Finally, we incorporate event centrality information into the EIG reasoning network via well-designed features and multi-task learning. We have conducted extensive experiments on two benchmark datasets. The results show great improvements (5.9% F1 gains on average) and demonstrate the effectiveness of each main component.

Figure 1: An example of DECI. The document reads: "A large FIRE broke out at the Waitrose supermarket in Wellington's High Street. Half of the roof at the entrance of the store collapsed during the blaze. It will take a couple of days for repairs. A man has been charged with arson after an investigation over the fire." Solid green lines denote target causal relations and dashed yellow lines denote coreference. FIRE is the central event in this document.

1 Introduction

Event Causality Identification (ECI) aims at identifying causal relations between events within texts. It is a fundamental NLP task that benefits various applications, such as question answering (Shi et al., 2021; Sui et al., 2022) and future event forecasting (Hashimoto, 2019; Bai et al., 2021). Depending on the text length, events may occur within the same sentence (SECI) or span the entire document (DECI). DECI is more practical than SECI but suffers from the lack of clear causal indicators, e.g., causal words such as "because".

Recent DECI works often build a document-level graph for cross-sentence reasoning but ignore important causal structures. Tran Phu and Nguyen (2021) take events as nodes and extract linguistic/discourse relations as edges; they then apply a Graph Neural Network (GNN) to enhance event/node embeddings with their neighbors for final causality prediction. To avoid noisy and exhaustive relation extraction, ERGO (Chen et al., 2022) instead takes each event pair as a node and leverages a GNN on the relational graph for high-order causal transitivity, e.g., cause(A, B) ∧ cause(B, C) ⇒ cause(A, C). However, some useful prior event relations such as coreference are discarded. Moreover, we observe a causal information loss from document to graph: not all events are equally important. There are one or two "central" events that prevail throughout the document, and other events serve either to explain their cause or their consequence (Gao et al., 2019). As shown in Figure 1, the event FIRE is the central event. It is mentioned several times (i.e., the coreferent mentions blaze and fire) and causes almost all the other events (e.g., collapsed and repairs).

In this paper, we propose to consider the above causal structures while leveraging the reasoning power of GNNs. To do so, we highlight the following questions:

• How to identify central events? Are they recognizable?
10804
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
Volume 1: Long Papers, pages 10804–10816
July 9-14, 2023 ©2023 Association for Computational Linguistics
• How to effectively consider such causal structures for cross-sentence reasoning?

To address these issues, we manually annotate central events in the public dataset EventStoryLine (Caselli and Vossen, 2017) and propose a novel DECI model, the Centrality-aware High-order EvEnt Reasoning network (CHEER). We first summarize a general GNN-based DECI model for better understanding. Then, we design an Event Interaction Graph (EIG) that involves interactions between events and among event pairs (i.e., high-order relations). Finally, we incorporate event centrality information into the EIG reasoning network via well-designed features and multi-task learning.

Specifically, for the first challenge, we preserve centrality information in event embeddings using two measures: (i) position centrality, which maintains the order of the sentences where events are located, and (ii) degree centrality, which counts the number of prior relations of each event. The motivation is that a central event usually summarizes the main content at the beginning and almost all the other events are relevant to it. Then, we use the centrality-aware event embeddings for central event prediction. Evaluated on our central event annotations, we found that this centrality modeling method is feasible and effective, with potential for further improvement.

For the second challenge, based on the general GNN-based DECI model, our proposed EIG unifies both event and event-pair graphs, so that we can reason over not only available causal structures but also high-order event relations. In particular, there are three types of edges. First, two event-pair nodes are connected if they share a common event, so that their relational information can be fused for transitivity. Second, we connect event nodes to their corresponding event-pair nodes to enhance event embeddings with high-order reasoning. Moreover, these edge types are further distinguished according to whether the event node is a central event or not. Third, the EIG is also scalable to prior event relations (e.g., coreference) that connect event nodes, if available.

Our contributions can be summarized as follows:

• We propose to consider causal structures (i.e., event centrality and coreference) and manually annotate central events for investigation.

• We design an EIG and propose a novel DECI framework, CHEER, for effective reasoning at the document level.

• Extensive experiments on two benchmark datasets validate the effectiveness of CHEER (5.9% F1 gains on average).

2 Related Work

2.1 Sentence-level ECI

Early feature-based methods explore different resources for causal expressions, such as lexical and syntactic patterns (Riaz and Girju, 2013, 2014b,a), causality cues or markers (Do et al., 2011; Hidey and McKeown, 2016), temporal patterns (Ning et al., 2018), statistical information (Hashimoto et al., 2014; Hu et al., 2017), and weakly supervised data (Hashimoto, 2019; Zuo et al., 2021b). Recently, some methods have leveraged Pre-trained Language Models (PLMs) for the ECI task and have achieved promising performance (Kadowaki et al., 2019; Liu et al., 2020; Zuo et al., 2020). To deal with implicit causal relations, Cao et al. (2021) incorporate external knowledge from ConceptNet (Speer et al., 2017), and Zuo et al. (2021a) learn context-specific causal patterns from external causal statements.

2.2 Document-level ECI

Following the success of sentence-level natural language understanding, many tasks have been extended to the entire document, such as relation extraction (Yao et al., 2019), natural language inference (Yin et al., 2021), and event argument extraction (Ma et al., 2022). DECI poses new challenges due to cross-sentence reasoning and the lack of clear causal indicators. Gao et al. (2019) propose a feature-based method that uses Integer Linear Programming (ILP) to model global causal structures. DSGCN (Zhao et al., 2021) uses a graph inference mechanism to capture interactions among events. RichGCN (Tran Phu and Nguyen, 2021) constructs an event graph and uses a GCN (Kipf and Welling, 2017) to capture relevant connections. However, noise may be introduced in the construction of edges, and the interdependency among event pairs is neglected. ERGO (Chen et al., 2022) builds a relational graph and models interactions between event pairs. Although intuitive, some meaningful event relations such as coreference are ignored. Compared with these methods, CHEER captures high-order interactions among event pairs automatically while remaining compatible with prior event relations. Moreover, we consider the centrality of events to conduct global reasoning.
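For concreteness, the causal transitivity rule that these high-order methods exploit (cause(A, B) ∧ cause(B, C) ⇒ cause(A, C)) can be illustrated as a simple transitive-closure computation. The function below is our own illustration, not part of any cited system; the event names come from Figure 1.

```python
def transitive_closure(causal_pairs):
    """Expand a set of (cause, effect) pairs until no new pair is implied
    by the rule cause(A, B) and cause(B, C) => cause(A, C)."""
    closure = set(causal_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

pairs = {("FIRE", "collapsed"), ("collapsed", "repairs")}
# Transitivity additionally implies ("FIRE", "repairs").
```

In graph-based models such as ERGO and CHEER, this closure is not computed symbolically; instead, message passing between event-pair nodes that share an event lets the model learn such inferences softly.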
Figure 2: An overview of our proposed Centrality-aware High-order Event Reasoning Network (CHEER).

3 Methodology

Given a document D and all its events, DECI is to predict whether there is a causal relation between any two event mentions e_i and e_j in D. As shown in Figure 2, our proposed CHEER includes four main components: (1) a Document Encoder that encodes the document and outputs contextualized representations of events; (2) an Event Interaction Graph that includes event nodes and event-pair nodes for document-level reasoning; (3) Event Centrality Incorporation that incorporates event centrality information in two ways; and (4) an EIG Reasoning Network that improves the quality of event and event-pair representations by conducting inference over the EIG, and then combines the two types of node embeddings for final classification.

3.1 Document Encoder

Given a document D = [x_t]_{t=1}^{L_D}, where D can be of any length L_D, the document encoder aims to output contextualized document and event representations. Almost any PLM can serve as the encoder. In this paper, we leverage pre-trained BERT (Devlin et al., 2019) as the base encoder to obtain contextualized embeddings. Following conventions (Chen et al., 2022), we add special tokens at the start and end of D (i.e., "[CLS]" and "[SEP]"), and insert additional special tokens "<t>" and "</t>" at the start and end of all the events to mark the event positions. Then, we have:

H = [h_1, h_2, ..., h_{L_D}] = Encoder([x_1, x_2, ..., x_{L_D}]),  (1)

where h_i ∈ R^d is the output embedding of token x_i. We then use the embedding of the token "[CLS]" as the document representation and the embedding of the token "<t>" as the event representation.

Considering BERT's original limit that it cannot handle documents longer than 512 tokens, we leverage a dynamic window mechanism: we divide D into several overlapping spans according to a specific step size and input them into BERT separately. For the same event occurring in different spans, we calculate the average of all the embeddings of the corresponding token "<t>" to obtain the final event representation h_{e_i} for event e_i.

3.2 Event Interaction Graph

Our EIG can not only perform high-order inference among event pairs but also be compatible with prior event relations. Specifically, given all the events of document D, we formulate the EIG as G = {V, E}, where V is the set of nodes and E is the set of edges. There are two types of nodes in V: nodes for single events, V_1, and nodes representing pairs of events, V_2. Each node in V_2 is constructed by combining any two events of D.

For global inference, we introduce three main types of edges in E: (1) (event pair)-(event pair) edges E_1 between two event pairs that share at least one event, e.g., the green line between (FIRE, collapsed) and (collapsed, repairs) in Figure 2, motivated by the causal transitivity described in the Introduction; (2) event-(event pair) edges E_2 between an event pair and its two corresponding events, e.g., the pink line between FIRE and (FIRE, collapsed) in Figure 2; and (3) event-event edges E_3 for prior event relations obtained from external knowledge or tools (this type of edge is optional). Take coreference edges as an example (the yellow line between FIRE and fire in Figure 2): they are helpful for causal reasoning, since there is no causal relation between coreferent events themselves. Moreover, coreferent events shall have the same causal relations with other events, which is the so-called coreference consistency. Therefore, both coreference consistency and causal transitivity can be regarded as kinds of high-order reasoning.
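The node and edge construction of Section 3.2 can be sketched as follows. This is a simplification of our EIG: nodes and edges are plain Python tuples rather than graph-library objects, and the helper names are ours.

```python
from itertools import combinations

def build_eig(events, coref_pairs=()):
    """Sketch of the Event Interaction Graph: event nodes (V1), event-pair
    nodes (V2), and the three edge types E1/E2/E3 of Section 3.2."""
    v1 = list(events)                                   # event nodes V1
    v2 = list(combinations(events, 2))                  # event-pair nodes V2
    # E1: two pair nodes sharing at least one event (causal transitivity).
    e1 = [(p, q) for p, q in combinations(v2, 2) if set(p) & set(q)]
    # E2: an event connected to every pair node it participates in.
    e2 = [(e, p) for e in v1 for p in v2 if e in p]
    # E3 (optional): prior event-event relations, e.g., coreference.
    e3 = [pair for pair in coref_pairs
          if pair[0] in v1 and pair[1] in v1]
    return v1, v2, e1, e2, e3
```

For the three events of Figure 1 shown in Figure 2, this yields three pair nodes, three E_1 edges, and six E_2 edges; coreference chains, when available, supply the E_3 edges.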
3.3 Event Centrality Incorporation

Considering the centrality of events is based on the motivation that a central event should play a more important role in global inference. In this section, we introduce two ways of incorporating event centrality information into our model. First, we propose centrality-aware event embeddings, which can be used to predict whether an event is a central event. Given the contextualized event embeddings h_{e_i} output by the document encoder, we apply the following two centrality encoding modules.

Position Centrality Encoding assigns each event an embedding vector c_pos ∈ R^d according to the sentence in which the event is located in the document. We initialize the vector randomly for each position. The motivation is that central events often appear near the front of the document to summarize its core gist. For example, in Figure 2, the first sentence of the document outlines the main context of the story and contains the central event FIRE.

Degree Centrality Encoding assigns each event an embedding vector c_deg ∈ R^d according to the degree of its corresponding event node in the EIG. We initialize the vector randomly for each degree. Intuitively, central events run throughout the document with many repeated mentions; thus, central events have a greater degree. For example, in Figure 2, the degree of the central event FIRE is greater than that of the event collapsed, because FIRE has two coreferent events, blaze and fire.

As the centrality encoding is applied to each event, we directly add it to the contextualized event embeddings. Formally, for an event e_i and its corresponding embedding h_{e_i}, the final centrality-aware event embedding is obtained by:

c_{e_i} = h_{e_i} + c_{pos(e_i)} + c_{deg(e_i)},  (2)

where c_pos, c_deg are obtained by the position and degree centrality encodings of e_i, respectively.

Central Events Prediction and EIG Enhancement Once we obtain the centrality-aware event embeddings, we use them to predict whether an event is a central event: p_{e_i} = f(c_{e_i} W_c), where f denotes the sigmoid function and W_c ∈ R^{d×1} is the parameter weight matrix. If p_{e_i} is greater than 0.5, we regard e_i as a central event. Then, we extend the edge types in E: we further divide the event-(event pair) edges into central-event-(event pair) edges E_21 and normal-event-(event pair) edges E_22, and likewise for the event-event edges. In this way, the interactions of central events on the EIG can have a special influence.

Central Events Annotation We manually annotate central events on the public dataset EventStoryLine to investigate the effect of centrality. Specifically, we annotate central events according to the following rules: (1) a central event should be the focus of the story; (2) almost all other events described in the document should be related to it; (3) coreferent mentions of central events are regarded as central events, too; (4) provided that the main content of the document is expressed correctly and completely, the number of central events should be as small as possible. Following these rules, three annotators completed the task. Each document was annotated by two junior annotators independently. If the answers of the two annotators were inconsistent, a senior annotator checked the answers and made the final decision. The average inter-annotator agreement is 86.4% (Cohen's kappa). For the 258 documents of EventStoryLine, we obtain 352 central events: 166 documents have one central event, 90 documents have two central events, and only 2 documents have three central events (these documents have more than 30 sentences and introduce several independent events). Then, we use the labels to train the model to predict central events:

L_1 = − Σ_{e_i ∈ D} log(p_{e_i}).  (3)

More analysis can be found in Section 4.5.
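A minimal PyTorch sketch of the centrality encodings and the central-event classifier above (Eq. (2) and p_{e_i}): the module name, the maximum sentence/degree counts, and the use of a standard (biased) linear layer for W_c are our simplifications, not details from the paper.

```python
import torch
import torch.nn as nn

class CentralityEncoding(nn.Module):
    """Sketch of Eq. (2): add position- and degree-centrality embeddings to
    the contextualized event embedding, then score central events."""
    def __init__(self, d, max_sentences=64, max_degree=32):
        super().__init__()
        self.pos = nn.Embedding(max_sentences, d)   # sentence index of event
        self.deg = nn.Embedding(max_degree, d)      # node degree in the EIG
        self.w_c = nn.Linear(d, 1)                  # central-event scorer W_c

    def forward(self, h_e, sent_idx, degree):
        # h_e: (num_events, d); sent_idx, degree: (num_events,) long tensors.
        c_e = h_e + self.pos(sent_idx) + self.deg(degree)   # Eq. (2)
        p_e = torch.sigmoid(self.w_c(c_e)).squeeze(-1)      # p_{e_i}
        is_central = p_e > 0.5                              # threshold rule
        return c_e, p_e, is_central
```

The resulting c_e feeds both the central-event prediction loss L_1 (Eq. (3)) and the event-node initialization of the reasoning network (Eq. (6)).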
3.4 EIG Reasoning Network

In this section, we first describe a general GNN-based DECI model, then instantiate our implementation by considering causal structures. Finally, we provide a unified view for better understanding and for discussing existing models.

A General GNN-based DECI Model To predict whether there is a causal relation between events e_i and e_j, we concatenate the "[CLS]" embedding of the document, the event features z_i, z_j, and the event-pair features z_k, and define the probability of a causal relation as follows:

p_{e_{i,j}} = f([h_{[CLS]} ∥ z_i ∥ z_j ∥ z_k] W_p),  (4)

where f denotes the softmax function, ∥ denotes concatenation, and W_p is the parameter weight matrix. Event-related features are typically initialized with contextualized embeddings from the PLM in Section 3.1 and enhanced through L layers of GNN reasoning. The l-th layer takes a set of node embeddings Z^{(l)} ∈ R^{N×d_in} as input and outputs a new set of node embeddings Z^{(l+1)} ∈ R^{N×d_out}, where N = |V_1| + |V_2| is the number of nodes, and d_in and d_out are the dimensions of the input and output embeddings, respectively. Formally, the output of the l-th layer for node v_i can be written as:

z_i^{(l+1)} = σ( Σ_{j ∈ N_i} g(z_i^{(l)}, z_j^{(l)}) ),  (5)

where σ denotes a non-linearity, N_i denotes the set of all first-order neighbors of v_i, and g denotes how neighborhood information is aggregated. By stacking L layers, multi-hop reasoning can be achieved.

EIG Reasoning Network Instantiation

Event & Event-pair Features For an event node e_i, we directly take the centrality-aware event embedding for its initialization:

z_i^{(0)} = c_{e_i} W_t,  v_i ∈ V_1,  (6)

where (0) denotes the initial state for the following neural layers, and W_t ∈ R^{d×2d} is a parameter weight matrix that makes event nodes the same size as the event-pair nodes for efficient computation. As for an event-pair node (e_i, e_j) → v_k, we concatenate the two corresponding contextualized event embeddings as the event-pair node features:

z_k^{(0)} = [h_{e_i} ∥ h_{e_j}],  v_k ∈ V_2.  (7)

EIG Reasoning Intuitively, different types of edges carry different semantics and contribute differently to the causality prediction. To handle this heterogeneity, the EIG Reasoning Network incorporates edge features with a self-attention mechanism during aggregation. Specifically, let T denote the number of edge types in the EIG. We incorporate the edge features and learn a scalar γ_t (1 ≤ t ≤ T) for each edge type to measure its importance:

γ_t = r_t W_r,  (8)

where r_t ∈ R^{1×d} is the edge feature specified by the edge type t, and W_r ∈ R^{d×1} is the parameter vector for type t. In this way, we can adaptively adjust the interaction strength between two adjacent nodes by weighing different types of connections with γ_t, which is learned automatically.

Figure 2 illustrates an example of the entire process of CHEER (we take a sub-graph of the EIG for brevity). Different colors of edges indicate different connection types in the EIG. Edges with the same color (i.e., the same edge type) use the same γ_t. Each layer has its own set of γ_t^{(l)}. We can then instantiate the aggregation function g as:

g(z_i^{(l)}, z_j^{(l)}) = f(γ_t + α_ij^{(l)}) (z_j^{(l)} W_v^{(l)}),  (9)

where f denotes the softmax function and W_v^{(l)} ∈ R^{d_in×d_out} is the parameter weight matrix. α_ij is computed by a shared self-attention mechanism (Vaswani et al., 2017) to measure the importance of neighbor j to i, where W_q, W_k ∈ R^{d_in×d_out} are parameter weight matrices:

α_ij = (z_i W_q)(z_j W_k)^T / √(d_out).  (10)

As shown in Figure 2, the above process can be organized as a matrix multiplication to compute representations for all the nodes simultaneously through a weighted adjacency matrix. Denote by A_ij the (i, j)-element of the binary adjacency matrix A; A_ij is 1 if there is an edge between nodes v_i and v_j, and 0 otherwise. We compute each entry of the edge-aware adjacency matrix as follows, where δ_ij^{(l)} = f(γ_t + α_ij^{(l)}) is the normalized weight:

A′_ij = δ_ij^{(l)} A_ij.  (11)

Figure 2 shows that the corresponding neighbor node features are aggregated with different weights according to δ_ij to obtain the representation of the target node. Finally, the node representations of layer l can be obtained by:

Z^{(l+1)} = σ(A′^{(l)} Z^{(l)} W_v^{(l)}).  (12)
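A single EIG reasoning layer (Eqs. (8)–(12)) can be sketched as below. This is a simplified single-head PyTorch version in which the per-type scalar γ_t is a free parameter rather than being produced from edge features via Eq. (8); all names are our own assumptions.

```python
import math
import torch
import torch.nn as nn

class EIGLayer(nn.Module):
    """One reasoning layer, sketching Eqs. (8)-(12): attention scores offset
    by a learned per-edge-type scalar, masked by the adjacency matrix,
    normalized, and used to aggregate neighbor features."""
    def __init__(self, d_in, d_out, num_edge_types):
        super().__init__()
        self.w_q = nn.Linear(d_in, d_out, bias=False)
        self.w_k = nn.Linear(d_in, d_out, bias=False)
        self.w_v = nn.Linear(d_in, d_out, bias=False)
        # One gamma_t per edge type (here learned directly, not via Eq. (8)).
        self.gamma = nn.Parameter(torch.zeros(num_edge_types))

    def forward(self, z, adj, edge_type):
        # z: (N, d_in); adj: (N, N) binary; edge_type: (N, N) long indices.
        q, k = self.w_q(z), self.w_k(z)
        alpha = q @ k.T / math.sqrt(k.size(-1))          # Eq. (10)
        scores = alpha + self.gamma[edge_type]           # Eq. (9): add gamma_t
        scores = scores.masked_fill(adj == 0, float("-inf"))
        delta = torch.softmax(scores, dim=-1)            # normalized delta_ij
        delta = torch.nan_to_num(delta)                  # rows with no edges
        return torch.relu(delta @ self.w_v(z))           # Eqs. (11)-(12)
```

Stacking several such layers realizes the multi-hop reasoning of Eq. (5) over both event and event-pair nodes.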
3.5 Training

Following ERGO (Chen et al., 2022), we adopt the focal loss (Lin et al., 2017) to alleviate the false-negative issue (i.e., the number of negative samples during training far exceeds that of positives). We adopt the β-balanced variant of the focal loss, which introduces a weighting factor β ∈ [0, 1] for the class "positive" and 1 − β for the class "negative". The loss function L_2 can be written as:

L_2 = − Σ_{e_i, e_j ∈ D} β_{e_{i,j}} (1 − p_{e_{i,j}})^τ log(p_{e_{i,j}}),  (13)

where τ is the focusing hyper-parameter, and β is a weighting hyper-parameter whose value is related to the ratio of positive and negative samples.

Besides, we find that predicting causal and coreference relations jointly brings benefits. One supporting point is that these two types of relations are mutually exclusive. Thus, we leverage the coreference information and perform ternary classification training, i.e., we predict the label of each sample as a causal relation class, a coreference relation class, or a no-relation class (negative samples). The final loss function combines event centrality and causality learning, where λ is a hyper-parameter:

L = λ L_1 + L_2.  (14)

3.6 A Unified View of GNN-based DECI Methods

CHEER is a general framework that first constructs a document-level graph, then incorporates event centrality, and finally conducts reasoning on the graph. In this section, we discuss the differences between CHEER and previous GNN-based DECI methods. Note that only CHEER considers joint training, and we do not discuss loss functions here.

(1) RichGCN (Tran Phu and Nguyen, 2021) has only event nodes and uses the vanilla GCN aggregation function: g(z_i^{(l)}, z_j^{(l)}) = z_j^{(l)} W_v^{(l)}. By removing i) event centrality incorporation, ii) event-pair nodes and their relevant edges, and iii) edge features and the self-attention mechanism, CHEER degenerates into RichGCN's framework.

(2) DSGCN (Zhao et al., 2021) has only event nodes and uses a combination of GCNs: g(z_i^{(l)}, z_j^{(l)}) = Σ_{k=1}^{K} α_k z_j^{(l)} W_v^{(l,k)}, where α_k denotes a feature filter. By removing i) event centrality incorporation, ii) event-pair nodes and their relevant edges, and iii) edge features, and modifying g accordingly, CHEER is scalable to DSGCN.

(3) ERGO (Chen et al., 2022) has only event-pair nodes and performs self-attention aggregation: g(z_i^{(l)}, z_j^{(l)}) = f(α_ij^{(l)}) (z_j^{(l)} W_v^{(l)}). By removing i) event centrality incorporation, ii) event nodes and their relevant edges, and iii) edge features, CHEER degenerates into ERGO's framework.

Therefore, by modifying the event centrality incorporation, the construction of the EIG, and the aggregation function, CHEER can degenerate into different GNN-based DECI methods, and thus provides a unified view for better document-level reasoning.

4 Experiments

4.1 Experimental Setup

Datasets Details We evaluate CHEER on two widely used datasets. EventStoryLine (version 0.9) (Caselli and Vossen, 2017) contains 22 topics, 258 documents, and 5,334 events. Among them, 1,770 intra-sentence and 3,885 inter-sentence event pairs are annotated with causal relations. Following Gao et al. (2019), we group documents according to their topics. Documents in the last two topics are used as development data, and documents in the remaining 20 topics are employed for 5-fold cross-validation. Causal-TimeBank (Mirza, 2014) contains 184 documents and 6,813 events. Among them, 318 event pairs are annotated with causal relations. Following Tran Phu and Nguyen (2021), we employ 10-fold cross-validation and only evaluate ECI performance on intra-sentence event pairs, because the number of inter-sentence event pairs in Causal-TimeBank is quite small (i.e., only 18 pairs). EventStoryLine provides ground-truth event coreference chains, but Causal-TimeBank does not. To solve this, we apply preprocessing steps to Causal-TimeBank: we first pre-train on EventStoryLine and then use the pre-trained model to extract coreference data for Causal-TimeBank. We also use the Stanford CoreNLP toolkit (Manning et al., 2014) as a supplement. After the preprocessing steps, we add event-event coreference edges E_3 to EventStoryLine and Causal-TimeBank. We perform the joint training of Section 3.5 on EventStoryLine. In evaluation, we only report and compare the prediction results of causal relations with baselines.

Implementation Details We set the dynamic window size in Section 3.1 to 256, and divide documents into several overlapping windows with a step size of 32. We implement our method based on the PyTorch version of Huggingface Transformers (Wolf et al., 2020). We use uncased BERT-base (Devlin et al., 2019) as the document encoder. We optimize our model with AdamW (Loshchilov and Hutter, 2019) using a learning rate of 2e-5 with a linear warm-up for the first 8% of steps.
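The overlapping-window splitting described in Section 3.1 and configured here (window 256, step 32) can be sketched as follows; the helper name is ours.

```python
def overlapping_windows(num_tokens, window=256, step=32):
    """Sketch of the dynamic window mechanism (Section 3.1): split a long
    document into overlapping token spans that are encoded separately.
    An event seen in several spans gets the average of its "<t>" embeddings."""
    spans, start = [], 0
    while True:
        spans.append((start, min(start + window, num_tokens)))
        if start + window >= num_tokens:
            break
        start += step
    return spans
```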
We apply layer normalization (Ba et al., 2016) and dropout (Srivastava et al., 2014) between the EIG reasoning network layers. We clip the gradients of model parameters to a max norm of 1.0. We perform early stopping and tune the hyper-parameters by grid search based on development set performance: dropout rate ∈ {0.1, 0.2, 0.3}, focusing parameter τ ∈ {0, 1, 2, 3}, weighting factor β ∈ {0.25, 0.5, 0.75}, and loss weight λ ∈ {0.1, 0.2}. Our model is trained on an NVIDIA RTX 2080 GPU with 24GB memory.

Evaluation Metrics We adopt Precision (P), Recall (R), and F1-score (F1) as evaluation metrics, the same as previous methods (Tran Phu and Nguyen, 2021), to ensure comparability.

4.2 Baselines

We compare our proposed CHEER with various state-of-the-art SECI and DECI methods.

SECI Baselines (1) KMMG (Liu et al., 2020), a mention masking generalization method using external knowledge. (2) KnowDis (Zuo et al., 2020), a knowledge-enhanced distant data augmentation method to alleviate the data-lacking problem. (3) CauSeRL (Zuo et al., 2021a), which learns context-specific causal patterns from external causal statements. (4) LearnDA (Zuo et al., 2021b), which uses knowledge bases to augment training data. (5) LSIN (Cao et al., 2021), which constructs a descriptive graph to leverage external knowledge.

DECI Baselines (1) OP (Caselli and Vossen, 2017), a dummy model that assigns causal relations to event pairs. (2) LR+ and LIP (Gao et al., 2019), feature-based methods that construct document-level structures and use various types of resources. (3) BERT (our implementation), a baseline method that leverages the dynamic window and event marker techniques. (4) RichGCN (Tran Phu and Nguyen, 2021), which constructs a document-level interaction graph and uses a GCN to capture relevant connections. (5) ERGO (Chen et al., 2022), which builds a relational graph and models interactions between event pairs; we compare with its BERT-base implementation for fairness. Because DSGCN (Zhao et al., 2021) neither provides results on the benchmark datasets nor releases code, we do not compare with it here.

4.3 Overall Results

Since some baselines cannot handle the inter-sentence scenarios in EventStoryLine, and the number of inter-sentence event pairs in Causal-TimeBank is quite small (i.e., only 18 pairs), we report the results of the intra- and inter-sentence settings separately.

Table 1: Models' intra-sentence performance on EventStoryLine and Causal-TimeBank; the best results are in bold and the second-best are underlined in the original. [◦] denotes models that use pre-trained BERT-base encoders. Overall, CHEER outperforms previous SOTA methods under a significance test at the 0.05 level.

Model       | EventStoryLine P / R / F1 | Causal-TimeBank P / R / F1
OP          | 22.5 / 98.6 / 36.6        | - / - / -
LR+         | 37.0 / 45.2 / 40.7        | - / - / -
LIP         | 38.8 / 52.4 / 44.6        | - / - / -
KMMG[◦]     | 41.9 / 62.5 / 50.1        | 36.6 / 55.6 / 44.1
KnowDis[◦]  | 39.7 / 66.5 / 49.7        | 42.3 / 60.5 / 49.8
LSIN[◦]     | 47.9 / 58.1 / 52.5        | 51.5 / 56.2 / 53.7
LearnDA[◦]  | 42.2 / 69.8 / 52.6        | 41.9 / 68.0 / 51.9
CauSeRL[◦]  | 41.9 / 69.0 / 52.1        | 43.6 / 68.1 / 53.2
BERT[◦]     | 47.8 / 57.2 / 52.1        | 47.6 / 55.1 / 51.1
RichGCN[◦]  | 49.2 / 63.0 / 55.2        | 39.7 / 56.5 / 46.7
ERGO[◦]     | 49.7 / 72.6 / 59.0        | 58.4 / 60.5 / 59.4
CHEER[◦]    | 56.9 / 69.6 / 62.6        | 56.4 / 69.5 / 62.3

Intra-sentence Evaluation From Table 1, we observe that: (1) CHEER outperforms all the baselines by a large margin on both datasets, which demonstrates its effectiveness. (2) Compared with the feature-based methods OP, LR+, and LIP, models using PLMs greatly boost performance, which verifies that BERT can extract useful text features for the ECI task. We notice that OP achieves the highest Recall on EventStoryLine, which may be because it simply assigns causal relations by mimicking the textual order. This leads to many false positives and thus a low Precision.

Table 2: Models' inter- and (intra+inter)-sentence performance on EventStoryLine.

Model       | Inter-sentence P / R / F1 | Intra + Inter P / R / F1
OP          | 8.4 / 99.5 / 15.6         | 10.5 / 99.2 / 19.0
LR+         | 25.2 / 48.1 / 33.1        | 27.9 / 47.2 / 35.1
LIP         | 35.1 / 48.2 / 40.6        | 36.2 / 49.5 / 41.9
BERT[◦]     | 36.8 / 29.2 / 32.6        | 41.3 / 38.3 / 39.7
RichGCN[◦]  | 39.2 / 45.7 / 42.2        | 42.6 / 51.3 / 46.6
ERGO[◦]     | 43.2 / 48.8 / 45.8        | 46.3 / 50.1 / 48.1
CHEER[◦]    | 45.2 / 52.1 / 48.4        | 49.7 / 53.3 / 51.4
the baselines under both the inter- and (intra+inter)-sentence settings. This demonstrates that CHEER can make better document-level inferences via our effective modeling over the EIG. (2) The overall F1-score of the inter-sentence setting is much lower than that of the intra-sentence setting, which shows the challenge of DECI, where events are scattered across the document without clear causal indicators. Specifically, the BERT baseline achieves competitive performance under the intra-sentence setting. However, it performs much worse than LIP, RichGCN, ERGO, and CHEER under the inter-sentence settings, which indicates that a document-level structure or graph helps capture the global interactions needed for causal relation prediction.

Model                   Intra   Inter   Intra + Inter
CHEER                   62.6    48.4    51.4
w/o event centrality    60.3    46.3    49.3
w/o edge features       61.4    47.6    50.4
w/o coref               60.8    46.9    50.1

Table 3: F1 results of the ablation study on EventStoryLine.

4.4 Ablation Study

To analyze the effect of each main component proposed in CHEER, we evaluate the following ablated models on the EventStoryLine dataset. As shown in Table 3: (1) Effect of Event Centrality (w/o event centrality), which removes the event centrality incorporation introduced in Section 3.3. Removing event centrality leads to information loss from the document to the graph, and the resulting performance degradation proves our contribution to preserving the event centrality information. (2) Effect of Edge Features (w/o edge features), which does not incorporate the edge features of Section 3.4, so the learnable scalar γt is removed from the aggregation function. Removing the edge-aware scalar clearly decreases the performance, which validates the necessity of capturing the semantic information of the different edge features in the EIG. (3) Effect of Coreference (w/o coref), which removes the E3 edges in the EIG and does not use the ground-truth coreference chains as auxiliary training labels. The results indicate that the prior coreference information is helpful for the DECI task and supports us in unifying the event and event-pair graphs.

Figure 3: DECI performance of different ways of event causality incorporation and the corresponding F1 results of central event prediction.

4.5 Event Centrality Investigation

We further analyze the role of central events in the DECI task and the effect of our incorporation methods.

4.5.1 Role of Central Events

In Figure 3, the histograms represent the F1 results of CHEER under the intra/inter/intra+inter settings on EventStoryLine. The three groups represent three different ways of event causality incorporation, and the lines represent the F1 results of central event prediction under these three ways: (1) w/o event centrality, which removes the event centrality incorporation introduced in Section 3.3; (2) CHEER, the original incorporation way; (3) w/ g-t central events, which preserves centrality-aware event embeddings as the event node feature initialization but uses ground-truth central event labels to distinguish edge types. It can be seen that the F1 result of our central event classification reaches nearly 80%, which is feasible and still has space for improvement. We also observe that, compared with using ground-truth labels, the inaccuracy of event centrality prediction limits the performance of DECI. Nevertheless, the performance of event centrality prediction could be higher with more advanced encoding methods.

4.5.2 Case Study

In this section, we conduct a case study to give an intuitive impression of CHEER, choosing the SOTA baseline ERGO for comparison. In Figure 4, we show a piece of text with five events, where quake is the central event (with a coreference earthquake). We notice that: (1) ERGO cannot
achieve the coreference consistency (No.1 and No.4 event pairs), but CHEER can solve this explicitly by introducing prior relations and joint training. (2) ERGO can suffer from the false negative issue (No.3 event pair). For example, when (quake, destroying) receives a positive prediction from (quake, die) but a negative prediction from (die, destroying), it tends to conclude that transitivity does not hold and outputs a wrong prediction. In contrast, CHEER blocks the propagation over these misleading paths by making central events take effect. (3) In the bottom graph, we visualize the normalized weights δ of Equation (11) with (left part) and without (right part) event centrality information. For clarity, we only show some of the main nodes and edges here. We can see that when there is no event centrality incorporation, the δ values of the nodes neighboring (quake, destroying) are relatively even, which makes its prediction disturbed by negative paths, i.e., information from the (die, destroying) node. When event centrality is incorporated, (quake, destroying) pays more attention to the paths where central events are involved, i.e., the quake node and the (quake, die) node. Therefore, CHEER can learn more from such informative neighbors for the DECI task.

‘Several die’ in south Iran quake
November 27, 2005
A powerful earthquake has hit southern Iran, destroying several villages and killing at least three people and injuring others, according to reports.

No.  Event Pair           GT    ERGO   CHEER
1    (quake, die)         Yes   Yes    Yes
2    (die, destroying)    No    No     No
3    (quake, destroying)  Yes   No     Yes
4    (earthquake, die)    Yes   No     Yes

[Graph visualization: the normalized neighbor weights δ around the (quake, destroying) node, with event centrality incorporated (left part) and without it (right part).]

Figure 4: A case study of CHEER.

5 Conclusion

In this paper, we propose a novel centrality-aware high-order event reasoning network (CHEER) to conduct global reasoning for DECI. We first summarize a general GNN-based DECI model and provide a unified view for better understanding. Then we design an Event Interaction Graph (EIG) that involves prior event relations and high-order interactions among event pairs. Finally, we incorporate event centrality via well-designed features and multi-task learning. Extensive experiments show a great improvement of CHEER for both intra- and inter-sentence ECI on two benchmark datasets. Further analysis demonstrates the effectiveness of each main component.

Limitations

Although our modeling of event centrality is feasible and effective, there is still space for improvement. The performance of event centrality prediction could be higher with more advanced encoding methods.

Besides, it is meaningful to further explore the interactions among various types of event relations. Existing datasets cover only limited relation types at once, and many works focus on the identification of causal relations alone. In this paper, although we further consider the effect of coreference relations and perform joint classification, there are still other relations that could be explored, such as temporal relations, subevent relations, etc.

Acknowledgments

This work was supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant, as well as cash and in-kind contributions from the industry partner(s).

References

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. ArXiv preprint, abs/1607.06450.

Long Bai, Saiping Guan, Jiafeng Guo, Zixuan Li, Xiaolong Jin, and Xueqi Cheng. 2021. Integrating deep event-level and script-level information for script event prediction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9869–9878, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Pengfei Cao, Xinyu Zuo, Yubo Chen, Kang Liu, Jun Zhao, Yuguang Chen, and Weihua Peng. 2021. Knowledge-enriched event causality identification via latent structure induction networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4862–4872, Online. Association for Computational Linguistics.

Tommaso Caselli and Piek Vossen. 2017. The event StoryLine corpus: A new benchmark for causal and
temporal relation extraction. In Proceedings of the Events and Stories in the News Workshop, pages 77–86, Vancouver, Canada. Association for Computational Linguistics.

Meiqi Chen, Yixin Cao, Kunquan Deng, Mukai Li, Kun Wang, Jing Shao, and Yan Zhang. 2022. ERGO: Event relational graph transformer for document-level event causality identification. arXiv preprint arXiv:2204.07434.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Quang Do, Yee Seng Chan, and Dan Roth. 2011. Minimally supervised event causality identification. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 294–303, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Lei Gao, Prafulla Kumar Choubey, and Ruihong Huang. 2019. Modeling document-level causal structures for event causal relation identification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1808–1817, Minneapolis, Minnesota. Association for Computational Linguistics.

Chikara Hashimoto. 2019. Weakly supervised multilingual causality extraction from Wikipedia. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2988–2999, Hong Kong, China. Association for Computational Linguistics.

Chikara Hashimoto, Kentaro Torisawa, Julien Kloetzer, Motoki Sano, István Varga, Jong-Hoon Oh, and Yutaka Kidawara. 2014. Toward future scenario generation: Extracting event causality exploiting semantic relation, context, and association features. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 987–997, Baltimore, Maryland. Association for Computational Linguistics.

Christopher Hidey and Kathy McKeown. 2016. Identifying causal relations using parallel Wikipedia articles. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1424–1433, Berlin, Germany. Association for Computational Linguistics.

Zhichao Hu, Elahe Rahimtoroghi, and Marilyn Walker. 2017. Inference of fine-grained event causality from blogs and films. In Proceedings of the Events and Stories in the News Workshop, pages 52–58, Vancouver, Canada. Association for Computational Linguistics.

Kazuma Kadowaki, Ryu Iida, Kentaro Torisawa, Jong-Hoon Oh, and Julien Kloetzer. 2019. Event causality recognition exploiting multiple annotators' judgments and background knowledge. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5816–5822, Hong Kong, China. Association for Computational Linguistics.

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2999–3007. IEEE Computer Society.

Jian Liu, Yubo Chen, and Jun Zhao. 2020. Knowledge enhanced event causality identification with mention masking generalizations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 3608–3614. ijcai.org.

Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Yubo Ma, Zehao Wang, Yixin Cao, Mukai Li, Meiqi Chen, Kun Wang, and Jing Shao. 2022. Prompt for extraction? PAIE: Prompting argument interaction for event argument extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6759–6774, Dublin, Ireland. Association for Computational Linguistics.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Paramita Mirza. 2014. Extracting temporal and causal relations between events. In Proceedings of the ACL 2014 Student Research Workshop, pages 10–17, Baltimore, Maryland, USA. Association for Computational Linguistics.

Qiang Ning, Zhili Feng, Hao Wu, and Dan Roth. 2018. Joint reasoning for temporal and causal relations. In
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2278–2288, Melbourne, Australia. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2013. Toward a better understanding of causality between verbal events: Extraction and analysis of the causal power of verb-verb associations. In Proceedings of the SIGDIAL 2013 Conference, pages 21–30, Metz, France. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2014a. In-depth exploitation of noun and verb semantics to identify causation in verb-noun pairs. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 161–170, Philadelphia, PA, U.S.A. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2014b. Recognizing causality in verb-noun pairs via noun and verb semantics. In Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL), pages 48–57, Gothenburg, Sweden. Association for Computational Linguistics.

Jiaxin Shi, Shulin Cao, Lei Hou, Juanzi Li, and Hanwang Zhang. 2021. TransferNet: An effective and transparent framework for multi-hop question answering over relation graph. ArXiv preprint, abs/2104.07302.

Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 4444–4451. AAAI Press.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

Yuan Sui, Shanshan Feng, Huaxiang Zhang, Jian Cao, Liang Hu, and Nengjun Zhu. 2022. Causality-aware enhanced model for multi-hop question answering over knowledge graphs. Knowledge-Based Systems, 250:108943.

Minh Tran Phu and Thien Huu Nguyen. 2021. Graph convolutional networks for event causality identification with rich document-level structures. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3480–3490, Online. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. DocRED: A large-scale document-level relation extraction dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 764–777, Florence, Italy. Association for Computational Linguistics.

Wenpeng Yin, Dragomir Radev, and Caiming Xiong. 2021. DocNLI: A large-scale dataset for document-level natural language inference. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4913–4922, Online. Association for Computational Linguistics.

Kun Zhao, Donghong Ji, Fazhi He, Yijiang Liu, and Yafeng Ren. 2021. Document-level event causality identification via graph inference mechanism. Information Sciences, 561:115–129.

Xinyu Zuo, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao, Weihua Peng, and Yuguang Chen. 2021a. Improving event causality identification via self-supervised representation learning on external causal statement. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2162–2172, Online. Association for Computational Linguistics.

Xinyu Zuo, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao, Weihua Peng, and Yuguang Chen. 2021b. LearnDA: Learnable knowledge-guided data augmentation for event causality identification. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3558–3571, Online. Association for Computational Linguistics.

Xinyu Zuo, Yubo Chen, Kang Liu, and Jun Zhao. 2020. KnowDis: Knowledge enhanced data augmentation for event causality detection via distant supervision. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1544–1550, Barcelona, Spain (Online). International Committee on Computational Linguistics.
ACL 2023 Responsible NLP Checklist
(✓ = yes, ✗ = no, ☐ = not applicable / no response)

A For every submission:

✓ A1. Did you describe the limitations of your work?
  Limitations

☐ A2. Did you discuss any potential risks of your work?
  Not applicable. Left blank.

✓ A3. Do the abstract and introduction summarize the paper's main claims?
  Abstract & 1 Introduction

✗ A4. Have you used AI writing assistants when working on this paper?
  Left blank.

B ✓ Did you use or create scientific artifacts?
  4 Experiments

✓ B1. Did you cite the creators of artifacts you used?
  4.1 Experimental Setup

☐ B2. Did you discuss the license or terms for use and / or distribution of any artifacts?
  Not applicable. Left blank.

✓ B3. Did you discuss if your use of existing artifact(s) was consistent with their intended use, provided that it was specified? For the artifacts you create, do you specify intended use and whether that is compatible with the original access conditions (in particular, derivatives of data accessed for research purposes should not be used outside of research contexts)?
  4.1 Experimental Setup

☐ B4. Did you discuss the steps taken to check whether the data that was collected / used contains any information that names or uniquely identifies individual people or offensive content, and the steps taken to protect / anonymize it?
  Not applicable. Left blank.

☐ B5. Did you provide documentation of the artifacts, e.g., coverage of domains, languages, and linguistic phenomena, demographic groups represented, etc.?
  Not applicable. Left blank.

✓ B6. Did you report relevant statistics like the number of examples, details of train / test / dev splits, etc. for the data that you used / created? Even for commonly-used benchmark datasets, include the number of examples in train / validation / test splits, as these provide necessary context for a reader to understand experimental results. For example, small differences in accuracy on large test sets may be significant, while on small test sets they may not be.
  4.1 Experimental Setup

C ✓ Did you run computational experiments?
  4 Experiments

☐ C1. Did you report the number of parameters in the models used, the total computational budget (e.g., GPU hours), and computing infrastructure used?
  No response.

The Responsible NLP Checklist used at ACL 2023 is adopted from NAACL 2022, with the addition of a question on AI writing assistance.
✓ C2. Did you discuss the experimental setup, including hyperparameter search and best-found hyperparameter values?
  4.1 Experimental Setup

✓ C3. Did you report descriptive statistics about your results (e.g., error bars around results, summary statistics from sets of experiments), and is it transparent whether you are reporting the max, mean, etc. or just a single run?
  4.3 Overall Results

✓ C4. If you used existing packages (e.g., for preprocessing, for normalization, or for evaluation), did you report the implementation, model, and parameter settings used (e.g., NLTK, Spacy, ROUGE, etc.)?
  4.1 Experimental Setup

D ✓ Did you use human annotators (e.g., crowdworkers) or research with human participants?
  3.3 Event Centrality Incorporation

✓ D1. Did you report the full text of instructions given to participants, including e.g., screenshots, disclaimers of any risks to participants or annotators, etc.?
  3.3 Event Centrality Incorporation

☐ D2. Did you report information about how you recruited (e.g., crowdsourcing platform, students) and paid participants, and discuss if such payment is adequate given the participants' demographic (e.g., country of residence)?
  Not applicable. Left blank.

☐ D3. Did you discuss whether and how consent was obtained from people whose data you're using/curating? For example, if you collected data via crowdsourcing, did your instructions to crowdworkers explain how the data would be used?
  Not applicable. Left blank.

☐ D4. Was the data collection protocol approved (or determined exempt) by an ethics review board?
  Not applicable. Left blank.

☐ D5. Did you report the basic demographic and geographic characteristics of the annotator population that is the source of the data?
  Not applicable. Left blank.
