Language Models Can Improve Event Prediction
Xiaoming Shi1 Siqiao Xue1 Kangrui Wang3 Fan Zhou1 James Y. Zhang1
Jun Zhou1 Chenhao Tan2 Hongyuan Mei3
1 Ant Group   2 UChicago   3 TTIC
{peter.sxm,siqiao.xsq,hanlian.zf,james.z,jun.zhoujun}@antgroup.com
[email protected] {kangrui,hongyuan}@ttic.edu
Abstract
Large language models have shown astonishing performance on a wide range of
reasoning tasks. In this paper, we investigate whether they could reason about
real-world events and help improve the prediction performance of event sequence
models. We design LAMP, a framework that integrates a large language model in
event prediction. Particularly, the language model performs abductive reasoning to
assist an event sequence model: the event model proposes predictions on future
events given the past; instructed by a few expert-annotated demonstrations, the
language model learns to suggest possible causes for each proposal; a search
module retrieves the previous events that match the causes; a scoring function learns
to assess whether the retrieved events could actually have caused the proposal. Through
extensive experiments on several challenging real-world datasets, we demonstrate
that our framework—thanks to the reasoning capabilities of large language models—
could significantly outperform the state-of-the-art event sequence models.
1 Introduction
Prompting large language models (LLMs) such as GPT-3.5 has recently become a standard approach
to perform text-based reasoning tasks. In this paper, we investigate their capabilities in reasoning
about real-world events and improving event prediction. Particularly, we focus on the problem of
modeling sequences of time-stamped events and predicting future events given the past. For example,
in the healthcare domain, we would like to model patients’ sequences of time-stamped hospital
visits and predict their future symptoms given their past diagnoses and treatments. This has been a
long-standing and important problem in machine learning. Large language models are potentially
useful for advancing solutions to this problem because event sequences are often accompanied by
rich text information which large language models excel at handling. For example,
• Healthcare. Each hospital visit has a doctor's note summarizing the visit, including the
department that the patient visited, the clinical measurements and treatments, and any future medical
plans. By reading such textual information, a large language model may be elicited to recall the
medical knowledge that it acquired during pretraining and then reason about future hospital
visits, such as what symptoms or treatments the patient may have.
• Politics. Each political event may generate a series of news articles describing the political agents
involved in it and discussing its possible influences. A language model reading these articles may
recall its knowledge—which is acquired from pretraining—about these agents, their relations, and
fundamental principles in politics such that it could reason about future political events.
• Similar scenarios arise in other domains such as commerce, dialogue, and finance.
In this paper, we propose LAMP, a framework that integrates a large language model in event
prediction. The overview of our framework is illustrated in Figure 1. Given a history of previous
Figure 1: An overview of our framework that leverages a large language model to reason about events. First, an
event sequence model proposes predictions: in this example, we predict the predicate of the structured event type
given its time, subject, and object. Second, a language model suggests cause events, which are matched
against actual previous events to retrieve the most relevant ones. In the end, a neural model learns to assign high
scores to the proposed predictions that are strongly supported by the retrieved evidence.
events, we use a pretrained event sequence model to propose predictions on the future events, which
are then examined with the assistance of an LLM. The LLM learns to perform abductive reasoning:
it is instructed by a few expert-annotated demonstrations, and generates possible causes that may
explain the occurrence of each proposal. Each generated cause serves as a query to search
for similar or relevant events that have actually happened. Then another neural model learns to embed
these retrievals and examine whether they could really justify the corresponding proposal.
We are the first—to the best of our knowledge—to integrate large language models into event sequence
modeling. Our modeling and prediction framework is general: it can incorporate all kinds of event
sequence models and large language models. We experimented with a range of model choices and
demonstrated that large language models could indeed help improve the prediction performance of
event sequence models. On several challenging real-world datasets, our framework significantly
outperforms the current state-of-the-art event sequence models.
Event sequence modeling. The problem is to model event sequences (t1 , k1 ), (t2 , k2 ), . . ., where
0 < t1 < t2 < . . . are times of occurrence and each ki ∈ K is a discrete event type. The goal is
to predict the next event for a given history of events Hi = (t1 , k1 ), . . . , (ti−1 , ki−1 ). Precisely, it
consists of two subtasks: the first is to predict the time ti of the next event; the second is to predict
the type ki of the next event with the knowledge of its time ti .
The standard approach is to build a probabilistic model over the sequences. Such models typically
define an intensity function λk : the intensity value λk (t) is the instantaneous rate that an event of
type k occurs at time t. Given the function λk , one could obtain the minimum Bayes risk (MBR)
prediction of the next event given the history. Particularly, the MBR time prediction t̂i is
\hat{t}_i = \int_{t_{i-1}}^{\infty} t\,\lambda(t)\, \exp\Big(-\int_{t_{i-1}}^{t} \lambda(s)\, ds\Big)\, dt \quad \text{where} \quad \lambda(t) = \sum_{k \in \mathcal{K}} \lambda_k(t)    (1)
and it could be approximated by averaging samples given by the thinning algorithm (Lewis & Shedler,
1979; Liniger, 2009). The MBR type prediction k̂i given time ti is
\hat{k}_i = \arg\max_{k \in \mathcal{K}} \lambda_k(t_i)    (2)
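To make these prediction rules concrete, below is a minimal sketch of how Eqs. (1)-(2) can be computed from an intensity function. It is illustrative rather than our released implementation: the `intensity` interface, the crude intensity upper bound, and the sampling window are assumptions.

```python
import numpy as np

def mbr_predictions(intensity, t_prev, event_types, num_samples=100, rng=None):
    """Sketch of MBR time and type prediction (Eqs. 1-2).

    intensity(k, t) -> instantaneous rate of type k at time t (assumed callable).
    t_prev          -> time of the most recent observed event.
    """
    rng = np.random.default_rng() if rng is None else rng

    def total_intensity(t):
        return sum(intensity(k, t) for k in event_types)

    def sample_next_time(upper_bound):
        # Thinning (Lewis & Shedler, 1979): propose from a homogeneous Poisson
        # process with rate `upper_bound`, accept with prob lambda(t)/upper_bound.
        t = t_prev
        while True:
            t += rng.exponential(1.0 / upper_bound)
            if rng.uniform() <= total_intensity(t) / upper_bound:
                return t

    # Crude upper bound on the total intensity over a short look-ahead window
    # (an illustrative heuristic; a valid bound is assumed in practice).
    upper = 2.0 * max(total_intensity(t_prev + dt) for dt in np.linspace(0.0, 10.0, 50))

    # Eq. (1): the MBR time prediction is approximated by averaging thinning samples.
    t_hat = float(np.mean([sample_next_time(upper) for _ in range(num_samples)]))

    # Eq. (2): the MBR type prediction is the type with the highest intensity at t_hat.
    k_hat = max(event_types, key=lambda k: intensity(k, t_hat))
    return t_hat, k_hat
```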
The intensity function λk is typically learned by maximizing the log-likelihood of the model. For a
time period (0, T ) that contains observed events (t1 , k1 ), . . . , (tI , kI ), the log-likelihood is
\sum_{i=1}^{I} \log \lambda_{k_i}(t_i) \;-\; \int_{0}^{T} \sum_{k \in \mathcal{K}} \lambda_k(t)\, dt    (3)
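The log-likelihood in Eq. (3) can be estimated analogously; the sketch below approximates the integral term by Monte Carlo over the observation window, using the same hypothetical `intensity` interface as above.

```python
import numpy as np

def log_likelihood(intensity, events, event_types, T, num_mc=1000, rng=None):
    """Sketch of Eq. (3) for one sequence observed on (0, T).

    events: list of (t_i, k_i) pairs; intensity(k, t) is assumed callable.
    """
    rng = np.random.default_rng() if rng is None else rng

    # First term: sum of log-intensities at the observed events.
    term1 = sum(np.log(intensity(k_i, t_i)) for t_i, k_i in events)

    # Second term: integral of the total intensity over (0, T),
    # approximated by Monte Carlo with uniformly sampled times.
    times = rng.uniform(0.0, T, size=num_mc)
    total = np.mean([sum(intensity(k, t) for k in event_types) for t in times])
    term2 = T * total

    return term1 - term2
```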
Rich text information. In real-world data, each type k may be represented as a text-based identifier:
in the example of Figure 1, each k is one of the possible interactions between the political entities
(organizations and individuals) in the G20 countries, which can be represented with a structured name
such as Tesla-cooperate-Australia. In addition, each event may have a text mark m that contains
additional information about the event: in Figure 1, each m is a news headline about the event (e.g.,
“EV battery prices go up”). For notation simplicity, we will only mention the mark m of an event
when necessary. While reading such text information, a human may recall their relevant domain
knowledge (e.g., influence of battery prices on Tesla and Australia) and increase their estimate on the
probability that an event of Tesla-cooperate-Australia happens in the near future. An important way
that humans learn such knowledge is reading text such as textbooks, research publications, and news
articles. But event sequence models cannot directly leverage this kind of information.
Large language models. Language models learn by reading text. Over the past years, large
language models that have read nearly the entire internet have shown astonishing performance on
many challenging tasks such as arithmetic reasoning and multi-turn dialogue (Wei et al., 2022b;
OpenAI, 2023). So it seems tempting to pair a large language model with an event sequence model
to improve its prediction performance: the language model has consumed a tremendous amount of
information that the event model may not have seen but should be able to benefit from.
I want you to do the reasoning over social events. I given you an effect
event and you give me four or five cause events. An effect event is an
event that happens. A cause event is believed to be one of the causes that
have triggerred the effect event to happen. Each event consists of a time,
a type (that includes subject, predicate, object), and a news headline
describing the event.

The predicates are restricted to the 20 options below.
1. MAKE STATEMENT
... // Full list are in Appendix E.4.
20. ENGAGE IN MASS VIOLENCE

Now I give you 10 examples. In each example, the first event is the effect
and the next several events are the causes that happened earlier.
... // Examples are in Listing 2.

Now please generate possible causes for
effect
predicate: CONSULT
time: 2022-07-05
subject: CHINA PM
object: YELLEN

Listing 1: Format of our LLM prompt.

## Example 1
effect
predicate: APPEAL
time: 2022-04-23
subject: GERMANY
object: GREEN PROJECT

reasoning:
------------------------
cause event 1
predicate: REDUCE RELATIONS
time: 2022-04-21
subject: EUROPE
object: RUSSIA
headline: Europe determined to ban Russian energy exports.

cause event 2
predicate: DISAPPROVE
time: 2022-03-16
subject: EUROPE
object: RUSSIAN
headline: Europe can endure painful transition to live without Russian oil.

... // Other causes are in Appendix E.4.

## Example 2
... // Other examples in Appendix E.4.

Listing 2: Few-shot examples in our prompt.
L and M are hyperparameters. Ideally, we would like to analyze all the possible candidates (i.e.,
L = ∞ and M = |K|), which is intractable for time prediction and expensive for type prediction.
In practice, we focus on the most plausible candidates to maintain a low computation cost. In our
experiments, our framework already performs remarkably well with small L and M .
3.2 Phase-II: Prompting LLM to Perform Abductive Reasoning
For each proposed event (t, k), our framework selects a set of previous events from its full history as
its supporting evidence e(t, k). The selection is guided by an LLM (e.g., GPT-3.5). Technically, we
prompt the LLM to imagine some possible cause events that—under the LLM’s belief—would be
able to explain the occurrence of this proposal. The imaginary cause events may not exactly match
any actual event in the history, but we could use them as queries to search for the most similar ones.
Prompting is a widely used technique to extract knowledge from an LLM. A prompt is a concise
statement designed to elicit a response from the LLM. It typically includes the task description and a
few demonstrations. Our prompt follows the format in Listings 1 and 2.
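For concreteness, below is a minimal sketch of how a prompt in this format might be assembled before being sent to the LLM. The helper names (`format_event`, `build_prompt`, `call_llm`) and the exact field layout are illustrative assumptions, not our released code.

```python
def format_event(event, with_headline=True):
    """Render one structured event in the style of Listings 1 and 2."""
    lines = [
        f"predicate: {event['predicate']}",
        f"time: {event['time']}",
        f"subject: {event['subject']}",
        f"object: {event['object']}",
    ]
    if with_headline and event.get("headline"):
        lines.append(f"headline: {event['headline']}")
    return "\n".join(lines)

def build_prompt(task_description, demonstrations, proposal):
    """Assemble the task description, few-shot demonstrations, and the proposal to explain."""
    parts = [task_description]
    for i, demo in enumerate(demonstrations, start=1):
        block = [f"## Example {i}", "effect",
                 format_event(demo["effect"], with_headline=False),
                 "reasoning:", "-" * 24]
        for j, cause in enumerate(demo["causes"], start=1):
            block.append(f"cause event {j}\n{format_event(cause)}")
        parts.append("\n".join(block))
    parts.append("Now please generate possible causes for\neffect\n"
                 + format_event(proposal, with_headline=False))
    return "\n\n".join(parts)

# Usage (hypothetical LLM wrapper):
# prompt = build_prompt(TASK_DESCRIPTION, demos,
#                       {"predicate": "CONSULT", "time": "2022-07-05",
#                        "subject": "CHINA PM", "object": "YELLEN"})
# generated_causes = call_llm(prompt)  # e.g., GPT-3.5 via an API client
```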
Each LLM-generated cause event is used as a query to search for D most similar events in the
history, where D > 0 is a hyperparameter. The overall evidence e(t, k) is then defined to be the
union of the previous events retrieved by the LLM-generated causes. Retrieval is vector-based: we
construct a query embedding v_q for the query event, and a key embedding v_k for each actual previous
event; the similarity is measured by the cosine between v_q and v_k, i.e., \frac{v_q^\top v_k}{\|v_q\|\,\|v_k\|}. We use a pretrained
SentenceBERT (SBERT) (Reimers & Gurevych, 2019) as our embedding model. The model takes as
input a text string concatenating the event time t, the text identifier (e.g., subject-predicate-object) of
the event type k, and—if any—the textual mark m of the event. It returns an embedding for each
token, and we take the event embedding v to be the average of the SBERT embeddings.
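A minimal sketch of this retrieval step using the sentence-transformers library follows. The checkpoint name and the event-to-text format are assumptions; note that `encode` already mean-pools token embeddings into a single vector, which plays the role of the averaged event embedding described above.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

sbert = SentenceTransformer("all-mpnet-base-v2")  # any pretrained SBERT checkpoint

def event_to_text(event):
    # Concatenate the time, the type identifier, and (if present) the textual mark.
    pieces = [str(event["time"]),
              f"{event['subject']}-{event['predicate']}-{event['object']}"]
    if event.get("mark"):
        pieces.append(event["mark"])
    return " ".join(pieces)

def retrieve_evidence(generated_causes, history, top_d=10):
    """Return the union of the top-D previous events most similar to each LLM-generated cause."""
    key_vecs = sbert.encode([event_to_text(e) for e in history])
    key_vecs = key_vecs / np.linalg.norm(key_vecs, axis=1, keepdims=True)

    evidence = set()
    for cause in generated_causes:
        q = sbert.encode([event_to_text(cause)])[0]
        q = q / np.linalg.norm(q)
        sims = key_vecs @ q                 # cosine similarity against every key
        for idx in np.argsort(-sims)[:top_d]:
            evidence.add(int(idx))          # union over all generated causes
    return [history[i] for i in sorted(evidence)]
```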
3.3 Phase-III: Ranking Proposals
In this phase, our framework scores each proposed event (t, k) based on the compatibility with its
retrieved evidence e(t, k). Precisely, the score is defined to be
s_{\text{event}}(t, k) \;\overset{\text{def}}{=}\; \exp\big(c((t, k),\, e(t, k))\big)    (4)
The function c takes as input the proposed event (t, k) as well as its evidence e(t, k), and returns a
scalar ∈ R. A high value of c means that this proposal is strongly supported by its retrieved evidence,
and thus is more likely to be an actual event at time t; a low value means that this proposal has no
strong evidence even after we have tried our best to search the history.
Given the most probable M events \{(t, k^{(m)})\}_{m=1}^{M} at time t, we sum their s_{\text{event}} scores to measure
the overall belief of our framework in an event occurring at time t. That is,
s_{\text{time}}(t) \;\overset{\text{def}}{=}\; \sum_{m=1}^{M} s_{\text{event}}(t, k^{(m)})    (5)
Intuitively, this score is high when any of the top-ranked event types at this time can be strongly
supported by the retrieved evidence. Conversely, if even the top-ranked event types have no strong
evidence in the history, then the next event is unlikely to occur at this time.
For time prediction, each proposed time \hat{t}^{(\ell)} (\ell = 1, \ldots, L+1) has a score s_{\text{time}}(\hat{t}^{(\ell)})—more precisely, \sum_{m=1}^{M} s_{\text{event}}(\hat{t}^{(\ell)}, \hat{k}^{(\ell,m)})—and our final prediction is the proposal with the highest score.
For type prediction given time t, each proposed type \hat{k}^{(m)} (m = 1, \ldots, M) has a score s_{\text{event}}(t, \hat{k}^{(m)}), and our framework takes the final prediction to be the type with the highest score.
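Putting Eqs. (4)-(5) together, the following sketch shows how the final predictions are selected. The compatibility network `c` is assumed to be already trained, and the interfaces are illustrative.

```python
import math

def s_event(c, proposal, evidence):
    """Eq. (4): exponentiated compatibility between a proposal and its evidence."""
    return math.exp(c(proposal, evidence))

def rank_type_proposals(c, t, type_proposals, evidence_for):
    """Type prediction given time t: pick the proposed type with the highest s_event."""
    scored = [(k, s_event(c, (t, k), evidence_for(t, k))) for k in type_proposals]
    return max(scored, key=lambda pair: pair[1])[0]

def rank_time_proposals(c, time_proposals, types_at, evidence_for):
    """Time prediction: Eq. (5) sums s_event over the top-M types at each proposed time."""
    def s_time(t):
        return sum(s_event(c, (t, k), evidence_for(t, k)) for k in types_at(t))
    return max(time_proposals, key=s_time)
```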
4 Experiments
Our code is at https://github.com/iLampard/lamp. This repository includes CSV files con-
taining numerical results of our experiments. It also includes qualitative results such as LLM-
generated cause events. Experiment details (e.g., hyperparameters) are in Appendix E.
4.1 Experimental Setup
We conducted experiments on three real-world datasets (see Appendix E.1 for dataset details).
GDELT (Leetaru & Schrodt, 2013). The GDELT Project monitors events all over the world, with
live datasets updated every 15 minutes. We only focused on the political events that happened in
G20 countries from 2022-01-01 to 2022-07-31, ending up with a corpus of 109000 time-stamped
event tokens. This choice of time range guarantees that our data was not included in the training
data of the most recent GPT. The event type k of each token has a structured name of the format
subject-predicate-object. Each predicate is one of the twenty CAMEO codes such as CONSULT and
INVESTIGATE (see Appendix E.4 for a full list); each subject or object is one of the 2279 political
entities (individuals, groups, and states) such as Tesla and Australia. So there are about 104M event
types in total, making this dataset extremely challenging. Each event token has a news headline that
concisely describes the event. We split the dataset into disjoint train, dev, and test sets based on their
dates: the 83100 events that happened before 2022-07-05 are training data; the 16650 events after
2022-07-19 are test data; the 9250 events between these dates are development data.
ICEWS (Boschee et al., 2015). Similar to GDELT, this dataset logs interactions between social-
political entities. We collected 79410 event tokens from 2022-10-11 to 2023-02-28. Its event types
have the same structure as GDELT: each predicate is one of the twenty CAMEO codes; each subject
or object is one of the 2981 political entities. We split the dataset into disjoint train, dev, and test sets
based on their dates: the 41600 events that happened before 2023-01-16 are training data; the 22030
events after 2023-02-01 are test data; the 15780 events between these dates are development data.
Amazon Review (Ni et al., 2019). This dataset contains user reviews on the Amazon shopping
website from 2014-01-04 to 2016-10-02. We focused on the 2500 most active users; each user has
a sequence of product review events. The type k is the category of the product: we selected the 23 most
frequently reviewed categories and grouped all the others into a special OTHER category, ending
up with 24 categories in total. Each review event also has a mark m which is the actual content of
the review. Each of the 2500 sequences is cut into three segments: the events that happened before
2015-08-01 are training data; those after 2016-02-01 are test data; the events between these dates are
dev data. Then we have 49,680 training tokens, 7,020 dev tokens, and 13,090 test tokens.
We experimented with four state-of-the-art event sequence models: NHP (Mei & Eisner, 2017a),
Know-Evolve (KE) (Trivedi et al., 2017), DyRep (Trivedi et al., 2019), and ANHP (Yang et al.,
2022). We evaluated each of them both as a baseline method and as the base model integrated into our LAMP
framework. KE and DyRep require domain-specific knowledge to configure their structurally sparse
architectures: we evaluated them on GDELT since their GDELT-specific architectures are available in
the original papers; we didn't evaluate them on Amazon Review since we do not have such knowledge
for this data. ANHP can incorporate domain knowledge into its architecture, but doing so is optional, so we evaluated
it on both GDELT and Amazon data: on GDELT, we adapted the knowledge used in KE and DyRep
into its structure; on Amazon Review, we used the generic architecture. On Amazon Review, we also
experimented with NHP since it doesn't require any domain-specific structural knowledge.
We experimented with three strong LLMs: GPT-3-davinci (Brown et al., 2020), which we also denote
as G3.0; GPT-3.5-turbo (Brown et al., 2020; Stiennon et al., 2020; Gao et al., 2022), which we also
denote as G3.5; and Llama-2-chat with 13B parameters (Touvron et al., 2023), which we also denote
as llama. For GDELT and ICEWS data, we used 10-shot prompts; for Amazon Review data, we used
8-shot prompts. Each “shot” is a demonstration that contains an effect event followed by one or more
expert-annotated cause events. Prompt examples can be found in Appendix E.4.
4.2 Main Results on Type and Time Prediction
Our main results are displayed in Figure 2. Figure 2a shows the result of each method on GDELT
data. GDELT data is updated every fifteen minutes so the time intervals are regular and thus it is
not interesting to predict them. For type prediction, we focus on predicting certain attributes given
the others, which is more practical than full type prediction. In practice, predicting “which of the
hundreds of millions of events is the most probable” is too difficult and existing models will all
perform disastrously. But answering questions like “what will A do to B” and “to whom A will
do this” is usually useful enough for real applications. Note that attribute prediction is still very
challenging: e.g., there are 45580 distinct predicate-object combinations in GDELT data.
Figure 2: Prediction performance of different methods on each dataset. On GDELT, the upper figure is for object prediction, and the lower figure is for predicate-object joint prediction. On ICEWS, the upper figure is for object prediction, and the lower figure is for time prediction. On Amazon Review, the upper figure is for type prediction, and the lower figure is for time prediction. (a) GDELT. (b) ICEWS. (c) Amazon Review.
We evaluate each model by the quality of its top-ranked predictions. For each baseline model (KE,
DyRep, or ANHP), the list of top predictions contains the top M event types (with known attributes
filled, if any) that have the highest intensities; see section 3.1. Our LLM-enhanced framework takes
the list given by its base model, and sorts it based on the sevent scores of the proposals. Our primary
evaluation metric is the mean rank (MR). This metric has a straightforward interpretation: given a
sorted list of proposed predictions, it measures the average rank of the ground-truth type in the list; a
smaller MR means a higher rank, and thus a better result (e.g., MR = 1 means “ranked at the top on
average”). We also used the mean reciprocal rank (MRR), which is less interpretable but more robust
to bad predictions than MR. Appendix E.2 includes full procedures for computing these metrics. In
our experiments, MR and MRR results yield the same conclusions. So we present MR results in this
section for its straightforward interpretation, but leave MRR results to Appendix F.
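For clarity, here is a small sketch of how MR and MRR can be computed from ranked prediction lists (the full procedures are in Appendix E.2). Restricting the average to tokens whose ground truth appears in the proposal list follows the discussion at the end of this subsection; ranks are 1-indexed.

```python
def mean_rank_and_mrr(ranked_lists, ground_truths):
    """ranked_lists[i] is a list of proposed types sorted from best to worst;
    ground_truths[i] is the true type of held-out token i."""
    ranks = []
    for proposals, truth in zip(ranked_lists, ground_truths):
        if truth in proposals:                      # only tokens covered by the proposals
            ranks.append(proposals.index(truth) + 1)
    assert ranks, "no held-out token is covered by the proposal lists"
    mr = sum(ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return mr, mrr
```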
On ICEWS and Amazon Review, we evaluate each model on time prediction (in addition to type
prediction), which is measured by root mean squared error (RMSE). For each held-out token,
each base model proposes a list of scored predictions; see section 3.1. Our LAMP framework reranks
the list given by its base model. In either case, the final prediction is the highest-ranked proposal.
In each evaluation setting shown in Figure 2, our LLM-enhanced framework substantially and
consistently outperforms the corresponding baseline model across a range of L and M values. All
the results throughout the paper have 95% bootstrap confidence intervals but they are too tight to
be visually obvious in most cases, implying that our improvements are significant. When we draw
more proposals from the base model (i.e., L and M are larger), our framework tends to enjoy a larger
improvement over the base model. For predicate-object joint prediction on GDELT, the ground-truth
type ranks at around 40 in the lists given by the base models, but is moved to around 20 by our
LLM-based adjustment. For object prediction on ICEWS, our method improved the MR results from
around 90 to about 20. Note that it is not fair to compare the same method across L or M .
The MR and MRR are only evaluated on the held-out tokens whose ground-truth event types fall into
the top proposals. But how many such tokens are there? This depends solely on how well the base
event sequence model works. A detailed analysis can be found in Appendix F.2.
4.3 Analysis
We now present our analysis on GDELT, focusing on predicate prediction and object prediction. This setting
is less expensive than time and predicate-object prediction (which require an order of magnitude
more GPT hours and GPT API calls), and the results are well correlated in our pilot experiments. More
analysis can be found in Appendix F, including analysis on the other datasets.
Analysis-I: About LLMs. Section 4.2 only shows the results of the GPT-3.5 version of our
framework. Figure 3 shows the results of the other versions with ANHP as the base model. These
LLMs all help improve the prediction performance but GPT-3.5 and Llama work significantly better
than GPT-3. Interestingly, the Llama version performs competitively with the GPT-3.5 version. Its strong
performance demonstrates that our framework doesn’t have to depend on black-box LLMs; indeed, it
can excel with open-source LLMs like Llama. Note that Llama has considerably fewer parameters
than GPTs. An important reason for its success may be the reinforcement learning from human
feedback (RLHF) technique, which is adopted by GPT-3.5 and Llama-2-chat, but not by GPT-3.
Further investigation may be an interesting direction for future work.

Figure 3: MR of using different LLMs on predicate (left) and object (right) prediction on GDELT.
Figure 4: Effect of the number of evidence events on predicate (left) and object (right) prediction on GDELT.
Figure 5: Effect of the number of shots: MR of predicate (left) and object (right) prediction on GDELT.
Figure 6: Effect of the shot choices: MR of predicate (left) and object (right) prediction on GDELT.
Analysis-II: How many evidence events do we need? In our GDELT experiments, we varied the
number of retrieved evidence events for each proposed prediction. As shown in Figure 4, the results
improve when the number of retrievals increases from 5 to 10, but become worse when the number
further goes up. Apparently, if we retrieve too many "evidence" events, then most of them might not
actually be helpful, so the performance degrades due to the decreased signal-to-noise ratio. We
also tried another retrieval criterion: instead of retrieving a fixed number of previous events, we only
retrieve an event if its similarity score exceeds a prespecified threshold. It turns out that this criterion
had only a negligible impact on the results.
Analysis-III: Prompt design. How will the design of the prompt influence the performance of
our LAMP framework? As shown in Figure 5, LAMP performs better when the prompt includes
a larger number of demonstrations, and the GPT-3.5 version consistently outperforms the GPT-3
version. Interestingly, even 0-shot prompting yields a better result than the baseline ANHP model,
emphasizing that the LLMs are indeed very helpful.
Figure 6 shows how sensitive our framework is to the choice of demonstrations. We use P1 to
denote the set of demonstrations used throughout our main experiments. Then P2 refers to a new
set of demonstrations obtained by randomly replacing half of the demonstrations in P1. As we can
see, changing the demonstrations has only a slight effect on performance, and LAMP consistently
outperforms ANHP, whether using P1 or P2.
We designed the prompt templates without prior experimentation or tuning on any dataset. In a post-hoc
analysis, we tested several different templates on a small subset of GDELT data, and found only
minimal variation in results as long as the task description was clear. Interestingly, a new template
that we tested doesn’t include the full list of possible predicates (see Listing 1), yet the results with
this template closely match those of our original version.
Figure 7: Predicate (left) and object (right) prediction of different retrieval methods on GDELT.
Figure 8: Effect of similarity metric: MR of predicate (left) and object (right) prediction on GDELT.
through its pretraining and fine-tuning processes. Consequently, when presented with a proposal,
it can follow demonstrations and draw upon its internal knowledge to suggest plausible causes.
Benefiting from the strong generalization capability of LLMs, our LAMP framework has a significant
potential for broad applications.
Analysis-V: About retrieval methods. To further investigate the usefulness of LLMs, we tested
several versions of our framework that do not involve the LLM in retrieval: rnd randomly samples 10
previous events from the history for each proposal; rec uses the most recent 10 past events; rec-pre
retrieves the most recent 10 events that share the same predicate with the proposal; rec-obj retrieves
the most recent 10 events that share the same object; bert uses SBERT to embed the text-based
identifier of each event and retrieves the 10 most similar events based on the cosine similarity of
SBERT embeddings. As shown in Figure 7, all these retrieval methods perform significantly worse
than our LLM-based method. Notably, the "most recent" retrievals work poorly. This suggests that
"recent" is not a good inductive bias for our problem setting. On dev data, we found that the cause
events retrieved based on LLM-generated clues are often not "recent".
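A compact sketch of these non-LLM retrieval baselines follows; the helper names are illustrative, and events are assumed to be dictionaries sorted by time.

```python
import random

def retrieve_rnd(history, proposal, d=10):
    # rnd: sample d previous events uniformly at random
    return random.sample(history, min(d, len(history)))

def retrieve_rec(history, proposal, d=10):
    # rec: the d most recent previous events (history is time-ordered)
    return history[-d:]

def retrieve_rec_attr(history, proposal, attr, d=10):
    # rec-pre / rec-obj: the d most recent previous events sharing the proposal's
    # predicate (attr="predicate") or object (attr="object")
    matches = [e for e in history if e[attr] == proposal[attr]]
    return matches[-d:]

# bert: embed the text identifier of the proposal itself with SBERT and retrieve
# the 10 most similar past events (e.g., by reusing retrieve_evidence above with
# the proposal in place of the LLM-generated causes).
```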
We also tested using the edit distance as the similarity score in our LLM-based retrieval method. For
a pair of events, we compute the character-level edit distance (Jurafsky & Martin, 2009) between
their text strings that are otherwise fed into the SBERT embedding model (section 3.2), and define
the similarity to be the reciprocal of this distance. As shown in Figure 8, using this metric yields a
better result than the baseline method, but performs worse than our original framework.
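A minimal sketch of this alternative similarity, using the Levenshtein package referenced in the appendix; the small epsilon guarding against identical strings is our own assumption.

```python
import Levenshtein

def edit_similarity(query_text, key_text):
    """Reciprocal of the character-level edit distance between two event strings."""
    distance = Levenshtein.distance(query_text, key_text)
    return 1.0 / (distance + 1e-6)  # epsilon avoids division by zero for identical strings
```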
Analysis-VI: About data leakage. GPT models are trained on online text data up to 2021.2 This
dataset doesn’t include the GDELT or ICEWS data used in our experiments. Our demonstrations
span a wide time range in training data, and there is a large temporal gap between the training and test
data. Precisely, the most recent demonstration is about an event on 2022-06-07, while the earliest test
event occurred on 2022-07-05. Below are the percentiles for the time differences (in days) between
the test events and the latest demonstration:
percentile:   1%   5%   25%   50%   75%   95%   99%
days:         40   41   44    52    54    55    55
The GPT training data may have covered the Amazon Review data. Therefore, we had a conversation
with GPT-3.5, checking whether it could recall information about specific users or specific reviews.
GPT-3.5 could say nothing specific about the users or reviews, indicating that we do not have an issue
of data leakage. Our conversation can be found in Appendix E.5.
2. GPT-3-davinci's training data goes up to October 2019, while GPT-3.5-turbo's training data goes up to September 2021 (source: https://platform.openai.com/docs/models/gpt-3-5).
et al., 2016), recommendation systems (Du et al., 2015b), social analysis (Guo et al., 2015; Lukasik
et al., 2016; Zhang et al., 2022), and healthcare (Hua et al., 2022; Zhang et al., 2023b).
Recently, there has been a growing body of research that has directed its focus towards the textual
features of the real-world events, such as the reports and news articles about the events. This line of
work includes Deng et al. (2020, 2021, 2022); Han & Ning (2022). Their methods all treat text as
auxiliary features but do not consider reasoning about that text information. Our work is significantly
different from this line of research since we focus on reasoning. Our framework reasons about the
events and their text information (e.g., text identifiers of the event types, associated documents) by
prompting a large language model. The large language model has consumed a massive amount of
text during pretraining and is aware of diverse world knowledge, thus able to perform the kind of
reasoning beyond the capacities of the aforementioned models.
By leveraging large language models, our framework injects domain-specific knowledge into the
deployment of event sequence models. It is related to previous work on configuring domain-specific
knowledge into model architectures (Trivedi et al., 2017, 2019; Mei et al., 2020; Yang et al., 2022). But
our work is significantly different: it doesn’t rely on human domain experts to write down a full set of
knowledge; instead, we extract knowledge from language models via few-shot prompting. Prompting
has become a standard way of applying large language models to reasoning tasks. There has been a
recent surge of work that develops novel prompting techniques for better performance (Wei et al.,
2022a,b; Zhou et al., 2023). Our framework is general and can adopt any prompting methods.
This paper is closely related to research in logical reasoning, which primarily focuses on proving
goals from known facts and rules. A major approach for this problem is backward chaining: it works
backward from the goal, chaining through rules to find known facts that support the proof (Russell &
Norvig, 2010). This approach has been applied to various application problems, including reasoning
and planning in natural language (Ye et al., 2022; Weir & Van Durme, 2022; Kazemi et al., 2023;
Kassner et al., 2023). In our setting, each proposed event is treated as a goal, and previous events
serve as known facts. Like backward chaining, our method identifies previous events that support the
proposal. But our method only performs a single step of reasoning, assuming complete data where all
events are observable. A second step of reasoning is unnecessary since direct causes are observable
and more temporally recent than indirect causes. In cases of incomplete data, multiple reasoning
steps may be required to identify indirect causes. Handling incomplete data will be a non-trivial
extension of our current framework, which we leave to future research.
This paper is also closely related to research in event-centric natural language processing (NLP) (Chen
et al., 2021). Over the past decades, there has been a great amount of progress in this area, including
many datasets and benchmarks (Ning et al., 2020; Li et al., 2020; Han et al., 2021; Wen et al., 2021;
Feng et al., 2022) as well as a diversity of methods for key problems such as event detection and
extraction (Ji & Grishman, 2008; Li et al., 2013; Feng et al., 2016; Lyu et al., 2021; Wang et al., 2021),
relation extraction and prediction (Chan & Roth, 2011; Ning et al., 2018b; Wang et al., 2020; Wen &
Ji, 2021; Li et al., 2022), event induction and summarization (Do et al., 2012; Saravanakumar et al.,
2021; Li et al., 2021b,a; Jin et al., 2022), and temporal reasoning (Ning et al., 2018a; Ballesteros
et al., 2020; Zhou et al., 2020, 2021). Recently, there has been research in leveraging LLMs to solve
these problems (Dror et al., 2023; Li et al., 2023; Zhang et al., 2023a). Our work complements this
line of research, focusing on the problem of event sequence modeling, which has been outside the
scope of classical event-centric NLP.
6 Conclusion
In this paper, we present LAMP, a general modeling and prediction framework that leverages the
abductive reasoning ability of large language models to help improve the prediction performance
of event sequence models. Empirical studies demonstrate that our LLM-enhanced framework can
significantly outperform the state-of-the-art event sequence models. Our findings have significant
implications for future research in the field of event sequence modeling. In principle, an event
sequence model should benefit from a range of reasoning abilities of large language models such as
deductive reasoning, inductive reasoning, commonsense reasoning, and arithmetic reasoning. Further
exploration in this area may lead to impactful innovations.
Acknowledgments
This work was supported by a research gift to the last author by Adobe Research. We thank the
anonymous NeurIPS reviewers and meta-reviewer for their constructive feedback. We also thank our
colleagues at Ant Group, UChicago, and TTIC for helpful discussion.
References
Ballesteros, M., Anubhai, R., Wang, S., Pourdamghani, N., Vyas, Y., Ma, J., Bhatia, P., McKeown, K.,
and Al-Onaizan, Y. Severing the edge between before and after: Neural architectures for temporal
ordering of events. In Proceedings of the Conference on Empirical Methods in Natural Language
Processing (EMNLP), 2020.
Boschee, E., Lautenschlager, J., O’Brien, S., Shellman, S., Starz, J., and Ward, M. ICEWS Coded
Event Data, 2015.
Boyd, A., Bamler, R., Mandt, S., and Smyth, P. User-dependent neural sequence models for
continuous-time event data. In Advances in Neural Information Processing Systems (NeurIPS),
2020.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam,
P., Sastry, G., Askell, A., et al. Language models are few-shot learners. In Advances in Neural
Information Processing Systems (NeurIPS), 2020.
Chen, M., Zhang, H., Ning, Q., Li, M., Ji, H., McKeown, K., and Roth, D. Event-centric natural
language processing. In Proceedings of the Annual Meeting of the Association for Computational
Linguistics (ACL), 2021.
Choi, E., Du, N., Chen, R., Song, L., and Sun, J. Constructing disease network and temporal
progression model via context-sensitive Hawkes process. In IEEE International Conference on
Data Mining (ICDM), 2015.
Deng, S., Rangwala, H., and Ning, Y. Dynamic knowledge graph based multi-event forecasting. In
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, 2020.
Deng, S., Rangwala, H., and Ning, Y. Understanding event predictions via contextualized multilevel
feature learning. In Proceedings of the 30th ACM International Conference on Information and
Knowledge Management, pp. 342–351, New York, NY, USA, 2021. Association for Computing
Machinery. ISBN 9781450384469.
Deng, S., Rangwala, H., and Ning, Y. Causality enhanced societal event forecasting with het-
erogeneous graph learning. In 2022 IEEE International Conference on Data Mining (ICDM),
2022.
Do, Q., Lu, W., and Roth, D. Joint inference for event timeline construction. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.
Dror, R., Wang, H., and Roth, D. Zero-shot on-the-fly event schema induction. In Findings of the
Conference of the European Chapter the Association for Computational Linguistics (Findings of
EACL), 2023.
Du, N., Farajtabar, M., Ahmed, A., Smola, A. J., and Song, L. Dirichlet-Hawkes processes with
applications to clustering continuous-time document streams. In Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 2015a.
Du, N., Wang, Y., He, N., Sun, J., and Song, L. Time-sensitive recommendation from recurrent user
activities. In Advances in Neural Information Processing Systems (NeurIPS), 2015b.
Du, N., Dai, H., Trivedi, R., Upadhyay, U., Gomez-Rodriguez, M., and Song, L. Recurrent marked
temporal point processes: Embedding event history to vector. In Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 2016.
Enguehard, J., Busbridge, D., Bozson, A., Woodcock, C., and Hammerla, N. Neural temporal point
processes for modelling electronic health records. In Proceedings of Machine Learning Research,
volume 136, pp. 85–113, 2020. NeurIPS 2020 Workshop on Machine Learning for Health (ML4H).
Etesami, J., Kiyavash, N., Zhang, K., and Singhal, K. Learning network of multivariate Hawkes
processes: A time series approach. arXiv preprint arXiv:1603.04319, 2016.
Feng, X., Huang, L., Tang, D., Ji, H., Qin, B., and Liu, T. A language-independent neural network
for event detection. In Proceedings of the Annual Meeting of the Association for Computational
Linguistics (ACL), 2016.
Feng, Y., Zhou, B., Wang, H., Jin, H., and Roth, D. Generic temporal reasoning with differential anal-
ysis and explanation. In Proceedings of the Annual Meeting of the Association for Computational
Linguistics (ACL), 2022.
Gao, L., Schulman, J., and Hilton, J. Scaling laws for reward model overoptimization. arXiv preprint
arXiv:2210.10760, 2022.
Guo, F., Blundell, C., Wallach, H., and Heller, K. The Bayesian echo chamber: Modeling social
influence via linguistic accommodation. In Proceedings of the International Conference on
Artificial Intelligence and Statistics (AISTATS), 2015.
Han, R., Hsu, I.-H., Sun, J., Baylon, J., Ning, Q., Roth, D., and Peng, N. Ester: A machine
reading comprehension dataset for reasoning about event semantic relations. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Han, X. and Ning, Y. Text-enhanced multi-granularity temporal graph learning for event prediction.
In Zhu, X., Ranka, S., Thai, M. T., Washio, T., and Wu, X. (eds.), IEEE International Conference
on Data Mining, ICDM 2022, Orlando, FL, USA, November 28 - Dec. 1, 2022, pp. 171–180. IEEE,
2022.
Hawkes, A. G. Spectra of some self-exciting and mutually exciting point processes. Biometrika,
1971.
Hua, W., Mei, H., Zohar, S., Giral, M., and Xu, Y. Personalized dynamic treatment regimes in
continuous time: A Bayesian joint model for optimizing clinical decisions with timing. Bayesian
Analysis, 2022.
Ji, H. and Grishman, R. Refining event extraction through cross-document inference. In Proceedings
of the Annual Meeting of the Association for Computational Linguistics (ACL), 2008.
Ni, J., Li, J., and McAuley, J. Justifying recommendations using distantly-labeled reviews and
fine-grained aspects. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Jin, X., Li, M., and Ji, H. Event schema induction with double graph autoencoders. In Proceedings
of the Conference of the North American Chapter of the Association for Computational Linguistics
(NAACL), 2022.
Jurafsky, D. and Martin, J. H. Speech and language processing : an introduction to natural language
processing, computational linguistics, and speech recognition. 2009.
Kassner, N., Tafjord, O., Sabharwal, A., Richardson, K., Schutze, H., and Clark, P. Language models
with rationality. arXiv preprint arXiv:2305.14250, 2023.
Kazemi, S. M., Kim, N., Bhatia, D., Xu, X., and Ramachandran, D. Lambada: Backward chaining for
automated reasoning in natural language. In Proceedings of the Annual Meeting of the Association
for Computational Linguistics (ACL), 2023.
Leetaru, K. and Schrodt, P. A. GDELT: Global data on events, location, and tone, 1979–2012. In ISA Annual Convention, 2013.
Lewis, P. A. and Shedler, G. S. Simulation of nonhomogeneous Poisson processes by thinning. Naval
Research Logistics Quarterly, 1979.
Li, M., Zareian, A., Zeng, Q., Whitehead, S., Lu, D., Ji, H., and Chang, S.-F. Cross-media structured
common space for multimedia event extraction. In Proceedings of the Annual Meeting of the
Association for Computational Linguistics (ACL), 2020.
Li, M., Li, S., Wang, Z., Huang, L., Cho, K., Ji, H., Han, J., and Voss, C. Future is not one-
dimensional: Graph modeling based complex event schema induction for event prediction. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP),
2021a.
Li, M., Ma, T., Yu, M., Wu, L., Gao, T., Ji, H., and McKeown, K. Timeline summarization based on
event graph compression via time-aware optimal transport. In Proceedings of the Conference on
Empirical Methods in Natural Language Processing (EMNLP), 2021b.
Li, Q., Ji, H., and Huang, L. Joint event extraction via structured prediction with global features. In
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
Li, S., Ji, H., and Han, J. Open relation and event type discovery with type abstraction. In Proceedings
of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Li, S., Zhao, R., Li, M., Ji, H., Callison-Burch, C., and Han, J. Open-domain hierarchical event schema
induction by incremental prompting and verification. In Proceedings of the Annual Meeting of the
Association for Computational Linguistics (ACL), 2023.
Liniger, T. J. Multivariate Hawkes processes. PhD thesis, ETH Zürich, Nr. 18403, 2009.
Liu, S., Yu, H., Liao, C., Li, J., Lin, W., Liu, A. X., and Dustdar, S. Pyraformer: Low-complexity
pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the
International Conference on Learning Representations (ICLR), 2022.
Lukasik, M., Srijith, P. K., Vu, D., Bontcheva, K., Zubiaga, A., and Cohn, T. Hawkes processes for
continuous time sequence classification: An application to rumour stance classification in Twitter.
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL),
2016.
Lyu, Q., Zhang, H., Sulem, E., and Roth, D. Zero-shot event extraction via transfer learning: Chal-
lenges and insights. In Proceedings of the Annual Meeting of the Association for Computational
Linguistics (ACL), 2021.
Mei, H. and Eisner, J. The neural Hawkes process: A neurally self-modulating multivariate point
process. In Advances in Neural Information Processing Systems (NeurIPS), 2017a.
Mei, H. and Eisner, J. The neural Hawkes process: A neurally self-modulating multivariate point
process. In Proceedings of the International Conference on Machine Learning (ICML), 2017b.
Mei, H., Qin, G., Xu, M., and Eisner, J. Neural Datalog through time: Informed temporal modeling
via logical specification. In Proceedings of the International Conference on Machine Learning
(ICML), 2020.
Ning, Q., Feng, Z., Wu, H., and Roth, D. Joint reasoning for temporal and causal relations. In
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2018a.
Ning, Q., Wu, H., and Roth, D. A multi-axis annotation scheme for event temporal relations. In
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2018b.
Ning, Q., Wu, H., Han, R., Peng, N., Gardner, M., and Roth, D. Torque: A reading comprehension
dataset of temporal ordering questions. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), 2020.
Omi, T., Ueda, N., and Aihara, K. Fully neural network based model for general temporal point
processes. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga,
L., and Lerer, A. Automatic differentiation in PyTorch. 2017.
Reimers, N. and Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP),
2019.
Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall, 2010.
Saravanakumar, K. K., Ballesteros, M., Chandrasekaran, M. K., and McKeown, K. Event-driven news
stream clustering using entity-aware contextual embeddings. In Proceedings of the Conference of
the European Chapter the Association for Computational Linguistics (EACL), 2021.
Sharma, K., Zhang, Y., Ferrara, E., and Liu, Y. Identifying coordinated accounts on social media
through hidden influence and group behaviours. In Proceedings of the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2021.
Shchur, O., Biloš, M., and Günnemann, S. Intensity-free learning of temporal point processes. In
Proceedings of the International Conference on Learning Representations (ICLR), 2020.
Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford, A., Amodei, D., and
Christiano, P. F. Learning to summarize with human feedback. In Advances in Neural Information
Processing Systems (NeurIPS), 2020.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S.,
Bhargava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv
preprint arXiv:2307.09288, 2023.
Trivedi, R., Dai, H., Wang, Y., and Song, L. Know-Evolve: Deep temporal reasoning for dynamic
knowledge graphs. In Proceedings of the International Conference on Machine Learning (ICML),
2017.
Trivedi, R., Farajtabar, M., Biswal, P., and Zha, H. DyRep: Learning representations over dynamic
graphs. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
Wang, H., Chen, M., Zhang, H., and Roth, D. Joint constrained learning for event-event relation ex-
traction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP), 2020.
Wang, H., Zhang, H., Chen, M., and Roth, D. Learning constraints and descriptive segmentation for
subevent detection. In Proceedings of the Conference on Empirical Methods in Natural Language
Processing (EMNLP), 2021.
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou,
D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., and Fedus, W. Emergent
abilities of large language models. Transactions of Machine Learning Research (TMLR), 2022a.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E. H., Le, Q., and Zhou, D. Chain of
thought prompting elicits reasoning in large language models. In Advances in Neural Information
Processing Systems (NeurIPS), 2022b.
Weir, N. and Van Durme, B. Dynamic generation of interpretable inference rules in a neuro-symbolic
expert system. arXiv preprint arXiv:2209.07662, 2022.
Wen, H. and Ji, H. Utilizing relative event time to enhance event-event temporal relation extraction. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP),
2021.
Wen, H., Qu, Y., Ji, H., Ning, Q., Han, J., Sil, A., Tong, H., and Roth, D. Event time extraction and
propagation via graph attention networks. In Proceedings of the Conference of the North American
Chapter of the Association for Computational Linguistics (NAACL), 2021.
Xiao, S., Yan, J., Farajtabar, M., Song, L., Yang, X., and Zha, H. Joint modeling of event sequence
and time series with attentional twin recurrent neural networks. arXiv preprint arXiv:1703.08524,
2017.
Xue, S., Shi, X., Zhang, Y. J., and Mei, H. Hypro: A hybridly normalized probabilistic model
for long-horizon prediction of event sequences. In Advances in Neural Information Processing
Systems (NeurIPS), 2022.
Xue, S., Shi, X., Chu, Z., Wang, Y., Zhou, F., Hao, H., Jiang, C., Pan, C., Xu, Y., Zhang, J. Y.,
et al. Easytpp: Towards open benchmarking the temporal point processes. arXiv preprint
arXiv:2307.08097, 2023.
Yang, C., Mei, H., and Eisner, J. Transformer embeddings of irregularly spaced events and their
participants. In Proceedings of the International Conference on Learning Representations (ICLR),
2022.
Ye, A., Cui, C., Shi, T., and Riedl, M. O. Neural story planning. arXiv preprint arXiv:2212.08718,
2022.
Zhang, Q., Lipani, A., Kirnap, O., and Yilmaz, E. Self-attentive Hawkes processes. In Proceedings of
the International Conference on Machine Learning (ICML), 2020.
Zhang, T., Tham, I., Hou, Z., Ren, J., Zhou, L., Xu, H., Zhang, L., Martin, L. J., Dror, R., Li, S., et al.
Human-in-the-loop schema induction. In Proceedings of the Annual Meeting of the Association
for Computational Linguistics (ACL), 2023a.
Zhang, Y., Cao, D., and Liu, Y. Counterfactual neural temporal point process for estimating causal
influence of misinformation on social media. In Advances in Neural Information Processing
Systems (NeurIPS), 2022.
Zhang, Z., Mei, H., and Xu, Y. Continuous-time decision transformer for healthcare applications. In
Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS),
2023b.
Zhou, B., Ning, Q., Khashabi, D., and Roth, D. Temporal common sense acquisition with minimal
supervision. In Proceedings of the Annual Meeting of the Association for Computational Linguistics
(ACL), 2020.
Zhou, B., Richardson, K., Ning, Q., Khot, T., Sabharwal, A., and Roth, D. Temporal reasoning on
implicit events from distant supervision. In Proceedings of the Conference of the North American
Chapter of the Association for Computational Linguistics (NAACL), 2021.
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Bousquet, O., Le, Q.,
and Chi, E. Least-to-most prompting enables complex reasoning in large language models. In
Proceedings of the International Conference on Learning Representations (ICLR), 2023.
Zhu, S., Zhang, M., Ding, R., and Xie, Y. Deep fourier kernel for self-attentive point processes. In
Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS),
2021.
Zuo, S., Jiang, H., Li, Z., Zhao, T., and Zha, H. Transformer Hawkes process. In Advances in Neural
Information Processing Systems (NeurIPS), 2021.
Appendices
A Societal Impacts
Our paper develops a novel method to integrate large language models into temporal modeling. By
describing the model and releasing code, we hope to facilitate the modeling of continuous-time
sequential data in many domains. However, like many other machine learning models, our method
may be applied to unethical ends. For example, its ability to fit data better and make more
accurate predictions could potentially be used for unwanted tracking of individual behavior, e.g., for
surveillance.
B Limitations
Our framework utilizes LLMs such as GPT-3.5 and thus inherits the issues of those models such as
hallucination and biased content. Therefore, the cause events generated by LLMs may be irrelevant,
incorrect, or misleading, thus harming the overall performance of our framework. Also, because
it uses LLMs, our approach requires human-crafted prompts, and the events need to have textual
information (e.g., event types have textual identifiers or marks).
Our framework needs a pretrained event sequence model, and its overall capability is partially
determined by this base model. When the base model can handle events with duration (i.e., events
that last for a period of time, such as holding a tool), our framework will inherit this merit. But our
framework will also inherit some of the technical limitations of its base event sequence model. If the
base model can only handle a finite set of event types, then our framework will inherit this limitation.
If the base model is misspecified or poorly trained, our framework may not work well.
Our compatibility function c is a Transformer-based neural network which is known to be data-hungry.
Though it worked well in our experiments, it might struggle when starved of data.
C Possible Extensions
Our framework currently trains each component separately. A natural improvement will be to jointly
train its three components. In principle, we can iteratively
• refine the base event sequence model with the feedback from the LLM and ranking model;
• learn to prompt the LLM so that it generates cause events of higher quality;
• train the ranking model with proposals and evidence given by the improved base model and LLM.
Another interesting extension of our framework is to apply its propose-justify-rank pipeline to other
problems. At the core of our framework is the LLM, which is able to examine each proposed
prediction drawn from the base model, and suggest clues to justify its validity. This idea seems to be a
natural fit to a wide range of reasoning problems. Question answering is an example: given a question,
an existing model proposes multiple plausible answers; an LLM reads each proposed answer and
reasons about its preconditions; a search module finds out evidence (from local knowledge bases or
the open internet) which may match the LLM-suggested preconditions; a ranking model learns to
score each combination of answer and evidence.
D Method Details
D.1 ANHP
On Amazon data, the ANHP model we used is directly from Yang et al. (2022). On GDELT and
ICEWS data, we modified the definition of event embedding so that it is easier to scale up to hundreds
of millions of event types. Particularly, ANHP defines a dynamic event embedding [\![k]\!](t) for the
event type k at time t. On GDELT and ICEWS, each event type is a tuple of subject-predicate-object,
so we redefine the event embedding to be
[\![k]\!](t) \;\overset{\text{def}}{=}\; \mathrm{concat}\big([\![k^{\text{subject}}]\!](t),\, [\![k^{\text{predicate}}]\!](t),\, [\![k^{\text{object}}]\!](t)\big)    (8)
and keep everything else the same as in the original ANHP.
This modified version is more scalable since it needs to maintain a significantly smaller number
of dynamic embeddings: there are hundreds of millions of possible event types, but there are only
thousands of possible subjects and objects.
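For illustration, the following PyTorch-style sketch shows the factored embedding of Eq. (8): separate embedding tables for subjects, predicates, and objects whose outputs are concatenated. The module and its sizes are illustrative; the actual ANHP implementation additionally makes these embeddings depend on the time t through attention over the history.

import torch
import torch.nn as nn

class FactoredEventEmbedding(nn.Module):
    """Illustrative sketch of the factored embedding in Eq. (8): instead of one
    embedding per subject-predicate-object type, keep separate tables for
    subjects, predicates, and objects and concatenate their embeddings.
    Sizes are placeholders, not necessarily the values used in the paper."""

    def __init__(self, num_subjects, num_predicates, num_objects,
                 d_entity=16, d_predicate=4):
        super().__init__()
        self.subject_emb = nn.Embedding(num_subjects, d_entity)
        self.predicate_emb = nn.Embedding(num_predicates, d_predicate)
        self.object_emb = nn.Embedding(num_objects, d_entity)

    def forward(self, subject_id, predicate_id, object_id):
        # A full ANHP implementation would make these embeddings time-dependent
        # via attention over the history; here we only show the concatenation
        # that defines the factored event embedding.
        return torch.cat([
            self.subject_emb(subject_id),
            self.predicate_emb(predicate_id),
            self.object_emb(object_id),
        ], dim=-1)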
E Experimental Details
E.1 Dataset Details
Table 1 shows statistics about each dataset in our experiments.
For the implementation of Know-Evolve, we used the code from the public Github repository at
https://github.com/rstriv/Know-Evolve (Trivedi et al., 2017) without any license.
For the implementation of DyRep, we used the code from the public Github repository at https:
//github.com/uoguelph-mlrg/LDG (Trivedi et al., 2019) without any license.
For the implementation of MAR@M and MAP@M, we used the code from the public Github repository
at https://github.com/statisticianinstilettos/recmetrics without any license.
For the implementation of Levenshtein distance, we used the code from the public Github repository
at https://github.com/maxbachmann/Levenshtein.git with General Public License 2.0.
To compute the dense representations of text descriptions of events, we use the code from the public
Github repository at https://github.com/UKPLab/sentence-transformers with Apache
License 2.0.
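As a concrete illustration of how these last two libraries are typically called (the encoder name all-MiniLM-L6-v2 below is only an example, not necessarily the model we used):

import Levenshtein
from sentence_transformers import SentenceTransformer, util

# String-level similarity between two event descriptions.
edit_distance = Levenshtein.distance("EXPRESS INTENT TO COOPERATE",
                                     "ENGAGE IN DIPLOMATIC COOPERATION")

# Dense representations of the text descriptions of events.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example model name
texts = ["Europe determined to ban Russian energy exports.",
         "Marshall, Moran seek ban on Russian oil imports."]
embeddings = encoder.encode(texts, convert_to_tensor=True)
cosine_sim = util.cos_sim(embeddings[0], embeddings[1])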
E.4 Prompts
We show one example of the prompt structure used for the GDELT dataset below. The effect event
consists of an event predicate, an event time, a subject name, and an object name, while each cause
event additionally includes an event headline.
I want you to do the reasoning over social events. I given you an effect event
and you give me four or five cause events. An effect event is an event that
happens. A cause event is an event that is believed to be one of the causes that
trigger an effect event to happen. Each event consists of an event headline, an
event predicate, an event time, subject name and object name of describing the
event.
The predicates of the effect and cause events are restricted to 20 options, with
names (in capital) and the descriptions below.
1. MAKE STATEMENT: decline comment, make comments.
2. APPEAL: appeal for material, economic, military, humanitarian aid or
cooperation.
3. EXPRESS INTENT TO COOPERATE: Express intent to engage in material, diplomatic,
military aid.
4. CONSULT: make a visit, host a visit.
5. ENGAGE IN DIPLOMATIC COOPERATION: praise or endorse, defend verbally.
6. ENGAGE IN MATERIAL COOPERATION: cooperate economically, militarily, judicially.
Now I give you 10 examples of event reasoning. In each example, the first event
is the effect event, the next three to five events are cause events that happen
earlier.
## Example 1
effect
predicate: APPEAL
time: 2022−04−23
subject: GERMANY
object: GREEN PROJECT
reasoning:
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
cause event 1
predicate: REDUCE RELATIONS
time: 2022−04−21
subject: EUROPE
object: RUSSIA
headline: Europe determined to ban Russian energy exports.
cause event 2
predicate: DISAPPROVE
time: 2022−03−16
subject: EUROPE
object: RUSSIAN
headline: Europe can endure painful transition to live without Russian oil.
cause event 3
predicate: ENGAGE IN DIPLOMATIC COOPERATION
time: 2022−03−05
subject: BUSINESS
object: GOVERNMENT
headline: Oil prices surge to multi−year highs as Ukraine conflict causes ripple
effect in global oil supplies.
cause event 4
predicate: REJECT
time: 2022−03−04
subject: SENATOR
object: RUSSIA
headline: Marshall, Moran seek ban on Russian oil imports, urge expansion of
domestic production.
cause event 5
predicate: DISAPPROVE
time: 2022−03−03
subject: TRADER
object: RUSSIA
headline: Energy markets in turmoil as European gas climbs 60%
Listing 3: A prompt example used for GDELT.
We show one example of the prompt structure used for the ICEWS dataset below. The effect event
consists of an event predicate, an event time, a subject name, and an object name, while each cause
event additionally includes an event sub predicate.
I want you to do the reasoning over social events. I given you an effect event
and you give me four or five cause events. An effect event is an event that
happens. A cause event is an event that is believed to be one of the causes that
trigger an effect event to happen. Each event consists of an event text, an event
predicate, an event time, subject name and object name of describing the event.
The predicates of the effect and cause events are restricted to 20 options, with
names (in capital) and the descriptions below.
1. MAKE STATEMENT: decline comment, make comments.
2. APPEAL: appeal for material, economic, military, humanitarian aid or
cooperation.
3. EXPRESS INTENT TO COOPERATE: Express intent to engage in material, diplomatic,
military aid.
4. CONSULT: make a visit, host a visit.
5. ENGAGE IN DIPLOMATIC COOPERATION: praise or endorse, defend verbally.
6. ENGAGE IN MATERIAL COOPERATION: cooperate economically, militarily, judicially.
Now I give you 10 examples of event reasoning. In each example, the first event
is the effect event, the next three to five events are cause events that happen
earlier.
## Example 1
effect
predicate: DISAPPROVE
time: 2022−10−28
subject: CHINA
object: JAPAN
reasoning:
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
cause event 1
predicate: ENGAGE IN DIPLOMATIC COOPERATION
time: 2022−10−18
subject: Japan
object: USA
sub predicate: Rally support on behalf of.
cause event 2
predicate: ENGAGE IN MATERIAL COOPERATION
time: 2022−10−16
subject: SOUTH KOREA
object: UNITED NATIONS
sub predicate: Cooperate militarily.
cause event 3
predicate: ENGAGE IN DIPLOMATIC COOPERATION
time: 2022−10−13
subject: SOUTH KOREA
object: JAPAN
sub predicate: Praise or endorse.
cause event 4
predicate: MAKE STATEMENT
time: 2022−10−10
subject: CHINA
object: JAPAN
sub predicate: Make pessimistic comment.
Listing 4: A prompt example used for ICEWS.
The following is a prompt exemplar used for the Amazon Review dataset. Each effect event consists
of the product category and event time while each cause event also includes the content of the product
review.
I want you to do the reasoning over the events that are extracted from online−
shopping review data. I given you an effect event and you give me two to four
cause events. An effect event is an event that happens. A cause event is an event
that is believed to be one of the causes that trigger an effect event to happen.
Each event corresponds to an review submitted by the customer, which consists of
an product category(event type), a product title, an event time, summary text
and review text from the user that describes the feedback of the shopping event.
# Example 1
effect event
product category: Luggage & Travel Gear
event time: 2013−10−19
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
reasoning:
cause event 1
product category: Novelty & More
product title: Sports Katz Peace Out Socks
event time: 2013−09−24
summary text: Peace Out Socks
review text: We ordered these for soccer for my daughter and they worked out well.
They are very cute and have held up. They are a quite thick, which is why I only
gave them 4 stars. They will work better for fall season; I would not recommend
them for summer soccer, if soccer is your intended use.
cause event 2
product category: Luggage & Travel Gear
product title: Wildkin Pink Retro Kickstart Messenger Bag
event time: 2013−09−09
summary text: Wildkin Kickstart Messenger Bag
review text: My daughter, going to 4th grade, wanted a messenger type bag this
year for school. I searched and searched on Amazon for the perfect bag, but all
of the adult bags seemed huge (like they would definately look funny with her
carrying them). Thank goodness I found this bag. It is the perfect size for her,
and her favorite color!.
cause event 3
product category: Children Clothing
product title: Levi’s Girls’ 715 Thick Stitch Bootcut Jean
event time: 2013−09−01
summary text: Thick Stitch Bootcut Jeans
review text: I love Levis and this style is sooo cute! My daughter even likes
them! The price was amazing in these, as we found similar Levis at Dillard’s for
36. The waist of these Levi in particular run small compared to other Levis of
the same size. During the Levis sale on Amazon, we purchased these, the Sweetie
Skimmers, and the Scooter. All of the others ran almost half an inch larger in
the waist. I worry the larger size would be huge on her, but I think I will try
the larger size, as they must be comfortable on her or she will not wear them.
cause event 4
product category: Children Shoes
product title: New Balance KJ750 Running Running Shoe (Lttle Kid/Big Kid)
event time: 2013−02−02
summary text: Cute and durable!
review text: I ordered these for my 8 year old daughter. She usually destroys
shoes within months, but these have held up!
For the full set of prompts used in the paper, please see scripts/gdelt/gdelt_prompts and
scripts/amazon/amazon_prompts in our code.
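For readers who wish to reproduce the pipeline, a toy parser for completions that follow the cause-event format above might look as follows; it is a sketch under the assumption that the LLM echoes the field layout of the demonstrations, and our released scripts may parse outputs differently.

def parse_cause_events(llm_output):
    """Toy parser for LLM completions that follow the cause-event format shown
    above (fields such as predicate, time, subject, object, headline)."""
    causes, current = [], {}
    for raw_line in llm_output.splitlines():
        line = raw_line.strip()
        if line.startswith("cause event"):
            if current:
                causes.append(current)
            current = {}
        elif ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    if current:
        causes.append(current)
    return causes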
We also probed GPT-3.5 with queries about specific records in the Amazon Review data, in order to
check whether it could simply recall the dataset from pretraining; an example exchange is shown below.
input: We are using Amazon Review Data, published by Jianmo Ni, et al., 2018,
which is firstly used in the paper "Justifying recommendations using distantly−
labeled reviews and fined−grained aspects", EMNLP 2019.
input: can you recall any information with reviewer ID A28GK1G2KDXHRP on date 04
17, 2014.
GPT−3.5 output: I apologize, but the Amazon Review dataset does not include
information about the specific date when a review was posted. It only includes
the year and month when the review was posted. Therefore, I cannot provide you
with any information about reviews posted by reviewer ID A28GK1G2KDXHRP on April
17, 2014.
input: Reviewer A0148968UM59JS3Y8D1M bought Hanes Men’s Classic Crew on 01 18,
2014, what review did he wrote?
input: on which date did reviewer A0148968UM59JS3Y8D1M buy Hanes Men’s 7−Pack
ComfortSoft Briefs?
LLMs. We use GPT-3-davinci and GPT-3.5-turbo. We set the temperature to 0 to ensure that our
results are reproducible.
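For reference, a GPT-3.5-turbo call with temperature 0 looks roughly as follows, using the pre-1.0 interface of the openai Python package (GPT-3-davinci is queried through the completion endpoint instead); the prompt variable is a placeholder for one of the prompts in Appendix E.4.

import openai

openai.api_key = "..."  # set via an environment variable in practice
prompt = "..."  # one of the few-shot prompts from Appendix E.4

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # deterministic decoding, for reproducibility
    messages=[{"role": "user", "content": prompt}],
)
cause_text = response["choices"][0]["message"]["content"]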
Training Base Models. For each model, we did a grid search and chose the hyperparameters based on
their performance on the dev set; see Table 2 for the values of the hyperparameters. For KE, DyRep,
and ANHP, the main hyperparameters to tune are the entity embedding size D1, the predicate embedding
size D2, the RNN hidden size D3, and the number of layers L of the attention structure (DyRep and
ANHP). In practice, the optimal D for a model was usually one of 4, 8, 16, or 32; the optimal L was
usually one of 1, 2, 3, or 4. When training the parameters of each base model, we performed early
stopping based on the log-likelihood on the held-out dev set.
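Schematically (this is not our actual training script; build_model, train_one_epoch, and dev_log_likelihood are hypothetical helpers), the grid search with early stopping can be organized as:

import itertools

def grid_search(build_model, train_one_epoch, dev_log_likelihood,
                max_epochs=100, patience=5):
    """Schematic grid search over embedding sizes and attention layers,
    with early stopping on held-out dev log-likelihood. The grid and the
    patience value are illustrative."""
    best_model, best_dev_ll = None, float("-inf")
    for d_entity, d_pred, num_layers in itertools.product(
            [4, 8, 16, 32], [4, 8, 16, 32], [1, 2, 3, 4]):
        model = build_model(d_entity, d_pred, num_layers)
        bad_epochs, best_ll_this_run = 0, float("-inf")
        for _ in range(max_epochs):
            train_one_epoch(model)
            dev_ll = dev_log_likelihood(model)
            if dev_ll > best_ll_this_run:
                best_ll_this_run, bad_epochs = dev_ll, 0
            else:
                bad_epochs += 1
            if bad_epochs >= patience:  # early stopping
                break
        if best_ll_this_run > best_dev_ll:
            best_dev_ll, best_model = best_ll_this_run, model
    return best_model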
Retrieval. For each event, we use few-shot prompting (see Appendix E.4 for examples) to generate
a set of cause events. For each LLM-generated cause, we retrieve the D most similar actual events
from the history, following the procedure in Section 3.2; a sketch of this step is given after the list below.
• For GDELT, we set D = 2 and the average number of retrieved events is 10.
• For ICEWS, we set D = 2 and the average number of retrieved events is 10.
• For Amazon Review, we set D = 4 and the average number of retrieved events is 10.
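The sketch below illustrates this retrieval step, assuming the LLM-generated causes and the historical events have already been encoded with the sentence encoder mentioned above; the function and variable names are ours and not those of the released code.

import torch
from sentence_transformers import util

def retrieve_evidence(cause_embeddings, history_embeddings, history_events, D=2):
    """For each LLM-generated cause, return the D most similar past events.

    cause_embeddings: (num_causes, dim) tensor of encoded cause descriptions.
    history_embeddings: (num_history, dim) tensor of encoded past events.
    history_events: list of the past events themselves.
    """
    retrieved = []
    sims = util.cos_sim(cause_embeddings, history_embeddings)  # (num_causes, num_history)
    for row in sims:
        top = torch.topk(row, k=min(D, len(history_events))).indices.tolist()
        retrieved.extend(history_events[i] for i in top)
    # Duplicates can be removed; on average this yields about 10 events per effect.
    return retrieved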
Training Ranking Model. The energy function used in our ranking model is the same as that proposed
in Xue et al. (2022), which consists of a continuous-time Transformer and an MLP. The hyperparameters
are tuned within a range of values that keep the score function at a similar size to the base ANHP
model. Training this ranking model requires negative samples. On GDELT and ICEWS, we use 5
negative samples for predicate prediction and 20 for object and predicate-object prediction. On
Amazon Review, we use 5 for both type and time prediction.
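The training step can be sketched as a standard contrastive objective over the observed event and its negative samples; the compatibility argument below is a placeholder for the continuous-time Transformer plus MLP of Xue et al. (2022), which we do not reproduce here.

import torch
import torch.nn.functional as F

def contrastive_ranking_loss(compatibility, history, positive, negatives):
    """Illustrative training step for the ranking model.

    compatibility: a network scoring (history with evidence, candidate event);
        in our case a continuous-time Transformer followed by an MLP.
    positive: the event that actually happened next.
    negatives: sampled candidate events that did not happen (5-20 per step).
    """
    candidates = [positive] + list(negatives)
    scores = torch.stack([compatibility(history, c) for c in candidates])
    # The observed event should receive the highest compatibility score.
    target = torch.tensor(0)  # index of the positive candidate
    return F.cross_entropy(scores.unsqueeze(0), target.unsqueeze(0))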
MODEL           DESCRIPTION                  GDELT   ICEWS   AMAZON
KNOW-EVOLVE     Entity embedding size        16      16      NA
                Predicate embedding size     4       4       NA
                RNN hidden size              16      16      NA
DYREP           Entity embedding size        16      16      NA
                Predicate embedding size     4       4       NA
                RNN hidden size              16      16      NA
                Number of attention layers   1       1       NA
NHP             RNN hidden size              NA      NA      36
                Entity embedding size        NA      NA      16
                Predicate embedding size     NA      NA      8
ANHP            Entity embedding size        16      16      32
                Predicate embedding size     4       4       32
                Number of heads              2       1       1
                Number of attention layers   1       1       2
RANKING MODEL   Temporal embedding size      40      40      24
                Hidden size                  60      60      56
                Number of heads              4       3       16
                Number of attention layers   3       3       3
Table 2: Values of hyperparameters used for models trained on the three datasets.
Figure 9: Prediction performance on (a) GDELT, (b) ICEWS, and (c) Amazon Review. On ICEWS, we
show object (upper) and time (lower) prediction; on Amazon Review, we show type (upper) and time
(lower) prediction. (Upper panels report Mean Reciprocal Rank; lower panels report RMSE.)
Computation Cost. All the experiments were conducted on a server with 256GB of RAM, a CPU with
64 logical cores (Intel(R) Xeon(R) Platinum 8163 @ 2.50GHz), and one NVIDIA A100 GPU for
acceleration. The training batch size is 8. For GDELT and ICEWS, the wall-clock training time is:
4.9ms per sequence for KE; 6.7ms per sequence for DyRep; 28.7ms per sequence for ANHP. For
Amazon Review, the wall-clock time is: 3.6ms per sequence for NHP; 5.2ms per sequence for ANHP.
For each of the datasets, the wall-clock time of training the ranking model is 9ms per training sample.
Figure 10: MAP@M and MAR@M on GDELT dataset. (a) MAP on predicate prediction. (b) MAR on
predicate prediction. (c) MAP on object prediction. (d) MAR on object prediction.
MAP@M and MAR@M measure, respectively, the precision of the top-M predictions and the fraction of
actual events that are covered by the predictions; higher is better for both. Their precise definitions (and
formulas) for our setting can be found in Appendix E.2.
As we can see in Figure 10, when M increases, MAR monotonically increases but the change of
MAP may not be monotonic (which is consistent with how MAP and MAR behave in other retrieval
tasks). For predicate prediction, we found that M = 5 yields a good balance between precision and
recall as well as a low financial cost (i.e., GPT API calls). For object prediction, M = 50 seems to
give the best precision-recall balance but we chose M = 20 due to the GPT API budget. For object
prediction, both MAP and MAR are lower than 0.1 (but still significantly better than a random guess
of 0.0015), indicating that this data is very difficult to model.
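For intuition, a common textbook way to compute these metrics is sketched below; the exact formulas for our setting (which may differ in detail) are the ones given in Appendix E.2.

def average_precision_at_m(predicted, actual, M):
    """AP@M for one instance: `predicted` is a ranked list, `actual` a set."""
    predicted = predicted[:M]
    hits, score = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in actual:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(actual), M) if actual else 0.0

def recall_at_m(predicted, actual, M):
    """Fraction of actual items covered by the top-M predictions."""
    return len(set(predicted[:M]) & set(actual)) / len(actual) if actual else 0.0

# MAP@M / MAR@M: average the per-instance values over all test instances.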
F.3 Analysis About Generalization
The LLM in LAMP is instructed by a few demonstrations. On GDELT, these demonstrations cover
only a limited set of predicates and entities. However, the LLM is able to generalize beyond the
demonstrations, suggesting cause events that involve novel predicates and entities beyond those shown
in the demonstrations.
Note that the retrievals involve fewer distinct subjects and objects than the LLM-generated causes. This
is because the LLM generation is often diverse and creative, proposing many novel subject-predicate-
object combinations that involve ground-truth entities but have not actually happened in real history.
Such causes will eventually be grounded to actual previous events that exhibit less diversity and
creativity, thus ending up with fewer distinct subjects and objects in the retrievals. Benefiting from
the strong generalization capability of LLMs, our LAMP framework has significant potential for
broad applications.
F.4 Additional Results About LLMs
We now present additional results comparing different LLMs. They include the MR results of
comparing GPT-3 and GPT-3.5 on GDELT (Figures 11 and 12), MR results of comparing GPT-3 and
GPT-3.5 on Amazon Review (Figure 13), and MRR results corresponding to Figure 3 (Figure 14).
Figure 11: Different LLMs on predicate (left) and object (right) prediction on GDELT dataset, with KE.
Figure 12: Different LLMs on predicate (left) and object (right) prediction on GDELT dataset, with DyRep.
Figure 13: Different LLMs on type prediction on Amazon Review.
Figure 14: MRR results of different LLMs on predicate (left) and object (right) prediction on GDELT.
Figure 15: MRR results corresponding to Figure 5: predicate (left) and object (right) prediction on GDELT.
Figure 16: MRR results corresponding to Figure 6: predicate (left) and object (right) prediction on GDELT.
Figure 17: MRR results on GDELT corresponding to Figure 4: predicate (left) and object (right) prediction.
Figure 18: MR results of type prediction with different similarity metrics on Amazon Review.
Figure 19: About similarity metric: predicate (left) and object (right) prediction on GDELT, with KE.
Figure 20: About similarity metric: predicate (left) and object (right) prediction on GDELT, with DyRep.