
Decoding Prompt Syntax: Analysing its Impact on Knowledge Retrieval in Large Language Models


Stephan Linzbach, GESIS - Leibniz Institute for Social Sciences, Germany
Tim Tressel, Heinrich Heine University, Düsseldorf, Germany
Laura Kallmeyer, Heinrich Heine University, Düsseldorf, Germany
Stefan Dietze, GESIS - Leibniz Institute for Social Sciences and Heinrich Heine University, Germany
Hajira Jabeen, GESIS - Leibniz Institute for Social Sciences, Germany

ABSTRACT

Large Language Models (LLMs), with their advanced architectures and training on massive language datasets, contain unexplored knowledge. One method to infer this knowledge is through the use of cloze-style prompts. Typically, these prompts are designed manually because their phrasing impacts knowledge retrieval performance, even if the LLM encodes the desired information. In this paper, we study the impact of prompt syntax on the knowledge retrieval capacity of LLMs. We use a template-based approach to paraphrase simple prompts into prompts with a more complex grammatical structure. We then analyse the LLM performance for these structurally different but semantically equivalent prompts. Our study reveals that simple prompts work better than more complex sentence forms. Performance for simple relations (1:1) remains best across the syntactical variations, with only a marginal decrease across the different typologies. These results reinforce that simple prompt structures are more effective for knowledge retrieval in LLMs and motivate future research into the impact of prompt syntax on various tasks.

CCS CONCEPTS

• Computing methodologies → Information extraction; Language resources; Lexical semantics.

KEYWORDS

Large Language Models, BERT, syntax-aware prompts, knowledge retrieval

ACM Reference Format:
Stephan Linzbach, Tim Tressel, Laura Kallmeyer, Stefan Dietze, and Hajira Jabeen. 2023. Decoding Prompt Syntax: Analysing its Impact on Knowledge Retrieval in Large Language Models. In Companion Proceedings of the ACM Web Conference 2023 (WWW '23 Companion), April 30–May 04, 2023, Austin, TX, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3543873.3587655

1 INTRODUCTION

Recent advancements in Large Language Models (LLMs) have led to significant progress in various natural language processing tasks such as translation, summarization, and question answering by providing efficient representations of language learned in a self-supervised way. In addition to encoding linguistic and syntactic knowledge [7, 9], recent studies have demonstrated [15, 23] that deep LLMs also capture relational knowledge, enabling them to support basic question answering and reasoning tasks. Petroni et al. [22] show that introducing an information retrieval component which captures the relevant context from an LLM for a relation or factual question significantly improves performance in knowledge extraction tasks, where the context may include both specific semantics and syntactic characteristics. Additionally, leveraging syntactic information while training complex language models has been shown to improve representational quality across languages [26, 28]. Given the beneficial influence of additional syntactic information, it seems natural to question the linguistic reliability of contextualized embeddings when testing for relational knowledge retrieval and extraction. This is further stressed by the debate on the extent to which acquired knowledge generalises beyond the statements seen as part of the training data [14]. In particular, the generalisation of knowledge across different syntactic transformations of seen relational knowledge defines a desirable property of reliable LLMs. Although an in-depth investigation of the dependencies between syntactic information and relational knowledge in LLMs seems promising, it remains underexplored. Based on these insights, we investigate the hypothesis that syntactic structure plays a role in the inference of knowledge from language models.


Typological transformation    Template
simple                        The capital of [S] is [O].
                              [S] maintains diplomatic relations with [O].
compound                      [S] is a country and it's capital is [O].
                              [S] maintains diplomatic relations with countries and [O] is one of them.
complex                       [S] is the country, who's capital is [O].
                              [S] is a country that maintains diplomatic relations with [O].
compound-complex              [O] is a city and it is the city that is the capital of [S].
                              [S] is a country that maintains diplomatic relations with [O].
Table 1: Templates for 'capital of' (1:1) and 'diplomatic relation' (M:N)
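To make the template mechanism concrete, the following minimal sketch fills the Table 1 templates for P36 ('capital of') with a subject and masks the object slot. It assumes prompts are produced by plain string substitution; the helper name fill_prompt is ours, not the authors'.

# Minimal sketch of typology-specific prompt construction via string
# substitution; template strings are taken from Table 1 (relation P36).
TEMPLATES_P36 = {
    "simple": "The capital of [S] is [O].",
    "compound": "[S] is a country and it's capital is [O].",
    "complex": "[S] is the country, who's capital is [O].",
    "compound-complex": "[O] is a city and it is the city that is the capital of [S].",
}

def fill_prompt(template, subject, mask_token="[MASK]"):
    """Insert the subject and replace the object slot with the mask token."""
    return template.replace("[S]", subject).replace("[O]", mask_token)

for typology, template in TEMPLATES_P36.items():
    print(f"{typology:17s} {fill_prompt(template, 'France')}")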

In this paper, we present initial experiments to study this hypothesis and share manually created prompts with the community (https://github.com/Thrasolt/ContextualKnowledgeOfLMs). We have extended T-REx to incorporate different grammatical structures alongside the relations already provided. The key findings of our work are that a simple sentence structure performs better for relational knowledge extraction than complex grammatical constructions. However, the impact of sentence structure is negligible for simpler relations (1:1). Moreover, these relations are easier to extract than complex relations (N:M).

Overall, this paper is organised as follows: first (Related Work), we briefly cover the state of the research relevant to this paper. The main section (Preliminary Experiments) is then divided into four subsections, three of which describe the methodology (Data, Task & Metrics, Prompt Engineering), while the final subsection (Results) presents the performance of different models in our experimental setting. We conclude with a discussion and an outlook (Conclusion and Future Work).

2 RELATED WORK

Since the proposal of transformer-based LLMs which learn representations through Masked Language Modelling (the MLM task) [2], two research fields have emerged: (1) understanding the knowledge inherent in LLMs [25], and (2) enhancing the LLMs' inherent knowledge [33].

Understanding the knowledge inherent in LLMs: Current research generally proposes two different methods to test the self-taught knowledge of LLMs. (a) Prompts that pose knowledge-related tasks in a cloze-text format. This research direction is heavily influenced by the LAMA-probe proposed by [23], a cloze-text dataset that encodes simple relational facts about real-world entities. For example, the prompt 'Where was Dante born [MASK]?' is paired with 'Florence'. Using BERT [2] to predict the missing tokens, the authors show that BERT already carries a surprisingly high amount of relational knowledge. Following Petroni et al.'s [15, 23] findings, Heinzerling et al. [8] focus on entity representations, storage capacity, and paraphrased queries. However, they draw a more critical picture of the storage and query capabilities of these models. Moreover, Roberts et al. [24] investigate how much knowledge can be stored in model parameters. To approximate the storage capacities, they over-fit the model on knowledge triples. Since then, many probing suites have been published to understand the impact of memorization and knowledge types (KMIR [6], KAMEL [13]). Furthermore, the performance improvements achieved by fine-tuning LLMs on the provided prompts have been investigated. For this, an archive with different prompts as well as train, validation, and test splits for the T-REx subset of the LAMA-probe, called LPAQA [12], was created. (b) In addition to prompts, probing tasks are often used to investigate the knowledge encoded in LLMs. This method uses auxiliary classifications with features derived from the frozen network to understand inherent information. For transformer-based language models, probing tasks can be solved by using the output representations [29], the attention information [1], or the information change across the different layers [11, 29]. The information derivable from these features has been used to understand several aspects of the contextualization of the representation [29], the syntactic truthfulness of the attention mechanism [1], and the workflow of the layer-wise processing [11].

Enhancing the LLMs' inherent knowledge: Various types of information are used to enhance a model's inherent knowledge. Approaches range from enhancing lexical word relations [18], in-context semantic abstractions [19], sentiment sensitivity [17, 30], and entity-centred information [5, 21] to improving any knowledge type [20, 31]. Knowledge enhancement approaches also differ in their infusion technique. Proposals that stay closest to pure language modelling only change the probabilities of the corruption task in a way that teaches stance [16] or entity knowledge [27]. Another infusion strategy enhances the model by simultaneously teaching a secondary learning objective. This has been applied to entity [32], sentiment [30], and general linguistic knowledge [20].

In this paper, we focus on understanding the knowledge inherent in LLMs. In particular, we aim to study the impact of syntactical differences while treating LLMs as black-box models. In comparison to Heinzerling et al. [8], we test paraphrasing motivated by linguistics. Additionally, we open the field for new probing tasks [29], i.e. how sentence processing [11] impacts knowledge inference. Thus, we gain insight into information encoding and potential directions for knowledge enhancement strategies.

3 PRELIMINARY EXPERIMENTS

3.1 Data

In this work, we propose that utilizing cloze-text prompts offers a direct means of studying the impact of syntactic features on knowledge retrieval in language models.
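As an illustration of such a cloze-text query (not the authors' evaluation code), the sketch below sends a filled prompt to BERT through the Hugging Face transformers fill-mask pipeline; the library choice and model checkpoint are our assumptions for illustration.

# Illustrative cloze-style query against BERT, assuming the Hugging Face
# `transformers` library is available; this is not the authors' code.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# Simple-typology prompt for relation P36 with the object token masked.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10s}  {prediction['score']:.3f}")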


Model                             Simple   Compound   Complex   Compound-complex
BERT-large-cased                   30.22      16.28     16.99              17.99
BERT-base-cased                    28.19      12.40     12.95              15.77
BERT-base-multilingual-cased       19.99      13.40     13.39              10.97
BERT-large-uncased                  3.38       1.10      1.20               0.47
BERT-base-uncased                   3.07       0.75      1.54               0.77
BERT-base-multilingual-uncased      3.50       0.60      1.83               0.39
Table 2: Knowledge retrieval model comparison on the T-REx data set, average top-1 accuracy in percent (#Triples = 34039)

Model                             Simple   Compound   Complex   Compound-complex
BERT-large-cased                   58.48      41.76     43.73              42.82
BERT-base-cased                    57.74      37.32     39.82              38.96
BERT-base-multilingual-cased       39.29      33.36     30.68              32.17
BERT-large-uncased                 11.34       6.51      6.38               5.14
BERT-base-uncased                   9.33       6.78      6.07               5.36
BERT-base-multilingual-uncased      9.40       4.31      5.29               3.54
Table 3: Knowledge retrieval model comparison on the T-REx data set, average top-10 accuracy in percent (#Triples = 34039)

Knowledge-capturing prompt templates were first used in the LAMA-probe [23]. These templates enable the parsing of subject and object tokens to form a correct sentence. In the test prompts, the mask token replaces the correct object token; thus, the model tries to predict the correct object for a given prefilled prompt. We give examples of such samples here for the relations (1) P36, (2) P108, and (3) P136:

Template:
(1) "The capital of [S] is [O]."
(2) "[S] works for [O]."
(3) "[S] plays [O] music."

Prompt:
(1) "The capital of France is [MASK]."
(2) "Tim Cook works for [MASK]."
(3) "Bruno Mars plays [MASK] music."

Parsed:
(1) "The capital of France is Paris."
(2) "Tim Cook works for Apple."
(3) "Bruno Mars plays funk music."

The T-REx subset of the LAMA-probe relies on the T-REx knowledge base [4] derived from Wikidata triples. The 34039 triples are organized into 41 different Wikidata relations. For each relation, no more than 1000 facts are sub-sampled. All relations have a maximum of 995 and a minimum of 225 facts, with most relations specifying more than 900 facts. The 41 relations cover all possible cardinality types: 1:1, N:1, and N:M.

Cardinality   #Triples   #Relations
1:1                937            2
N:1              20006           23
N:M              13096           16
Total            34039           41
Table 4: Properties of the T-REx subset [23]

3.2 Task & Metrics

In this work, we have limited our typological paraphrasing to the T-REx triples (and corresponding relations) of the LAMA-probe. Given a set of four syntactical typologies T and a set of subject-relation-object triples ⟨s, r, o⟩ named K, we transform K into a set of tuples K_t = {⟨p_r^t(s), o⟩ | ⟨s, r, o⟩ ∈ K}. This is achieved by describing each relation r through a prompt written in typology t ∈ T, named p_r^t. We can use this prompt to parse s so that we obtain p_r^t(s); o is the cloze-prediction target. Given such a set K_t, we measure the performance of a model m for typology t by calculating the top-k accuracy over all tuples in K_t:

\[
\text{top-}k_m\ \text{accuracy} \;=\; \frac{\sum_{(p_r^t(s),\, o) \in K_t} \mathbb{1}\bigl(o \in \text{top-}k_m(p_r^t(s))\bigr)}{|K_t|} \tag{1}
\]

where o is the correct label, top-k_m are the k predictions with the highest probability assigned by the model m, |K_t| is the number of samples, and 1 is the indicator function. In our results, we consider top-{1, 10} accuracy. Note that top-1 accuracy is virtually equal to the evaluation conducted by Petroni et al. [23] with the P@1 metric.

3.3 Prompt Engineering

The LAMA-probe contains a simple sentence template for each of the 41 relations in the T-REx data. The fitness of these templates was manually improved and tested by Petroni et al. [23]. Therefore, they represent a natural starting point for our syntactically motivated prompt paraphrasing. Expanding these templates to different syntactical structures offers insight into the impact of such a transformation on the same knowledge task. We have used four typological transformations, one of which is the same as the LAMA template. Thus, we create three new prompt templates for each of the 41 relations in T-REx. The resulting four templates each provide a unique grammatical structure, theoretically guided by the research of Rodney Huddleston [10].
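To connect the templates of Section 3.1 with the metric in Equation (1), the following sketch scores a small list of masked prompts with BERT and computes the resulting top-k accuracy. It assumes the Hugging Face transformers API and single-token objects; the function and variable names are illustrative, not taken from the paper.

# Sketch of the top-k accuracy of Equation (1) for one typology, assuming
# single-token objects and the Hugging Face `transformers` API.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()

def top_k_hit(prompt, gold_object, k=10):
    """True if the gold object is among the k most probable mask fillers."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
    top_ids = logits[0, mask_pos].topk(k).indices.tolist()
    return tokenizer.convert_tokens_to_ids(gold_object) in top_ids

def top_k_accuracy(tuples, k=10):
    """Average of Equation (1) over (masked prompt, object) pairs."""
    return sum(top_k_hit(p, o, k) for p, o in tuples) / len(tuples)

# Toy example: two 'capital of' triples rendered with the simple typology.
sample = [("The capital of France is [MASK].", "Paris"),
          ("The capital of Italy is [MASK].", "Rome")]
print(f"top-10 accuracy: {top_k_accuracy(sample):.2f}")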


Results   Simple   Compound   Complex   Compound-Complex   #Triples
Total      30.22      16.59     16.08              17.87      34039
1:1        70.65      69.48     67.56              58.16        937
N:1        35.07      18.36     18.18              20.01      20006
N:M        18.77       8.61     10.65              11.24      13096
Table 5: Top-1 accuracy in percent of BERT-large-cased on the T-REx data set

Results   Simple   Compound   Complex   Compound-Complex   #Triples
Total      58.48      42.18     42.75              43.09      34039
1:1        85.17      84.95     84.31              81.32        937
N:1        65.96      43.49     48.47              45.40      20006
N:M        43.57      36.62     29.70              36.18      13096
Table 6: Top-10 accuracy in percent of BERT-large-cased on the T-REx data set
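The per-cardinality rows in Tables 5 and 6 are averages of per-triple hits grouped by the cardinality of each triple's relation. The snippet below is a hypothetical illustration of that grouping; the relation-to-cardinality mapping shown is only a fragment (the paper states that P36 is 1:1 and P530 is M:N), not the full T-REx assignment.

# Hypothetical aggregation of per-triple hits into per-cardinality accuracy,
# as in Tables 5 and 6; the mapping below is an illustrative fragment only.
from collections import defaultdict

CARDINALITY = {"P36": "1:1", "P530": "N:M"}  # fragment; the full map covers all 41 relations

def accuracy_by_cardinality(results):
    """results: iterable of (relation_id, hit) pairs, where hit says whether
    the gold object was among the model's top-k predictions."""
    totals, hits = defaultdict(int), defaultdict(int)
    for relation, hit in results:
        group = CARDINALITY.get(relation, "unknown")
        totals[group] += 1
        hits[group] += int(hit)
    return {group: 100 * hits[group] / totals[group] for group in totals}

# Toy example: two 'capital of' hits, one 'diplomatic relation' miss.
print(accuracy_by_cardinality([("P36", True), ("P36", True), ("P530", False)]))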

In our sentence typology, a simple sentence is defined as a sentence that contains only one main clause (the LAMA-probe templates). A sentence that includes two or more independent clauses is known as a compound sentence, while a sentence that contains an independent clause and one or more dependent clauses is known as a complex sentence. Lastly, a sentence that includes two or more independent clauses and at least one dependent clause is known as a compound-complex sentence. Table 1 shows an example of the four templates for the 1:1 relation P36, describing the predicate 'capital of', and the M:N relation P530, describing the predicate 'diplomatic relation with'.

3.4 Results

We applied these template-based prompts to three BERT variants: BERT-large, BERT-base, and BERT-base-multilingual. We include multilingual BERT to understand the impact of named entity mentions in different languages. Our experiments show that all investigated LLMs perform best on the simple sentence type. Additionally, we find that the cased models outperform the uncased models by a large margin.

Table 2 shows the top-1 accuracy for each model in percent for all four sentence types. We report slightly worse top-1 accuracy results (BERT-base-cased -3.0, BERT-large-cased -1.1) on the LAMA-probe than in the original paper [23]. In contrast to Petroni et al. [23], we consistently evaluated over the whole vocabulary, which had a notable influence on the reliability of the results for the N:M relations. Specifically, Petroni et al. [23] exclude all other valid entities except for the one they test. Nonetheless, our results are reasonably close, given the differing results reported on the same data in other works [34]. Generally, the values for the correct tokens are surprisingly high. The best model was able to predict one-third of the masked tokens correctly. However, most comparable results achieved by the cased models are around 15 to 20 percent accuracy. Most importantly, the average top-1 accuracy varies significantly between the different sentence types, indicating that grammatical structure influences a model's ability to retrieve relational knowledge. This is true for all models under investigation.

From this, we draw four conclusions. First, the BERT-large-cased model outperforms all other models on all four sentence types by at least two and at most four percentage points. Second, there is a chasm in performance between cased and uncased models, as the accuracy of the uncased models is comparatively low. Third, every model has a higher prediction accuracy when queried with the simple sentence type than with the other three types. Finally, the differences in scores among the non-simple sentence types are significantly lower than the variation relative to the simple sentence type. These observations also apply to the results based on the top-10 accuracy, albeit with the expected higher accuracy values; see Table 3.

Table 5 and Table 6 show the average accuracy results for the four sentence types for each of the cardinality relations, for top-1 and top-10, for the BERT-large-cased model. Both results show that the simple sentence type enables a higher accuracy for all three cardinalities. Additionally, in both sets of results, the performance decreases with increasing cardinality, which is intuitive, as the difficulty level increases with the number of possible subjects and objects. For N:M relations, top-1 is an inappropriate metric, as only one guess is allowed per subject.

The results are closest for cardinality 1:1 and furthest apart for N:M, implying that relation extraction works best for simple sentence types and simple relations (1:1). The performance noticeably decreases when either sentence or relation complexity increases. Additionally, the sentence structure (typology) has close to no influence on the top-10 performance for the simple relations (1:1). However, the relations with less mutual information between subject and object co-occurrence (N:1, M:N) show a large decrease in performance for changes in the sentence typology. Hence, the MLM task does not incorporate the rules of syntactic change while preserving semantic equivalence.

4 CONCLUSION AND FUTURE WORK

In this paper, we investigate the impact of prompt syntax on the knowledge retrieval performance of LLMs. To achieve this, we expand the well-known and commonly used T-REx subset of the LAMA-probe to support different syntactical structures of prompts. Our preliminary results show that the impact of syntax is only marginal for simple relations (1:1). In general, simple prompts should be the preferred way of querying.


Most importantly, we show that LLMs indeed struggle to generalise knowledge across grammatical structures. This finding highlights the importance of the relationship between syntax and semantics within LLMs as a crossroad of human and machine language representation. Consequently, we will focus on a deeper analysis of the disparities in information coding for typologically different templates. These disparities may be reflected in the attention mechanism [1], the predicted token distribution [3], or the differences in mask representation among the various typologies per relation.

REFERENCES

[1] Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What Does BERT Look At? An Analysis of BERT's Attention. arXiv preprint arXiv:1906.04341 (2019).
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018).
[3] Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg. 2021. Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals. Transactions of the Association for Computational Linguistics 9 (2021), 160–175.
[4] Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Frederique Laforest, and Elena Simperl. 2018. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
[5] Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, and Tom Kwiatkowski. 2020. Entities as Experts: Sparse Memory Access with Entity Supervision. arXiv preprint arXiv:2004.07202 (2020).
[6] Daniel Gao, Yantao Jia, Lei Li, Chengzhen Fu, Zhicheng Dou, Hao Jiang, Xinyu Zhang, Lei Chen, and Zhao Cao. 2022. KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models. arXiv preprint arXiv:2202.13529 (2022).
[7] Yoav Goldberg. 2019. Assessing BERT's Syntactic Abilities. arXiv:1901.05287 [cs.CL]
[8] Benjamin Heinzerling and Kentaro Inui. 2020. Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries. arXiv preprint arXiv:2008.09036 (2020).
[9] Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, and Roger Levy. 2020. A Systematic Assessment of Syntactic Generalization in Neural Language Models. In Proceedings of ACL. Association for Computational Linguistics.
[10] Rodney Huddleston. 1984. Introduction to the Grammar of English. Cambridge University Press.
[11] Ganesh Jawahar, Benoît Sagot, and Djamé Seddah. 2019. What Does BERT Learn about the Structure of Language?. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics.
[12] Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig. 2020. How Can We Know What Language Models Know? Transactions of the Association for Computational Linguistics 8 (2020), 423–438.
[13] Jan-Christoph Kalo and Leandra Fichtel. 2022. KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. In Proceedings of the Conference on Automated Knowledge Base Construction.
[14] Nora Kassner, Benno Krojer, and Hinrich Schütze. 2020. Are Pretrained Language Models Symbolic Reasoners Over Knowledge? arXiv preprint arXiv:2006.10413 (2020).
[15] Nora Kassner, Benno Krojer, and Hinrich Schütze. 2020. Pre-trained Language Models as Symbolic Reasoners over Knowledge?. In Proceedings of the 24th Conference on Computational Natural Language Learning. CoRR.
[16] Kornraphop Kawintiranon and Lisa Singh. 2021. Knowledge Enhanced Masked Language Model for Stance Detection. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4725–4735.
[17] Pei Ke, Haozhe Ji, Siyang Liu, Xiaoyan Zhu, and Minlie Huang. 2019. SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge. arXiv preprint arXiv:1911.02493 (2019).
[18] Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, and Goran Glavaš. 2019. Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity. arXiv:1909.02339 [cs.CL]
[19] Yoav Levine, Barak Lenz, Or Dagan, Ori Ram, Dan Padnos, Or Sharir, Shai Shalev-Shwartz, Amnon Shashua, and Yoav Shoham. 2019. SenseBERT: Driving Some Sense into BERT. arXiv preprint arXiv:1908.05646 (2019).
[20] Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. 2019. Multi-Task Deep Neural Networks for Natural Language Understanding. arXiv preprint arXiv:1901.11504 (2019).
[21] Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge Enhanced Contextual Word Representations. arXiv preprint arXiv:1909.04164 (2019).
[22] Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. 2020. How Context Affects Language Models' Factual Predictions. arXiv:2005.04611 [cs.CL]
[23] Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. 2019. Language Models as Knowledge Bases?. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). ACL, Hong Kong, China, 2463–2473. https://doi.org/10.18653/v1/D19-1250
[24] Adam Roberts, Colin Raffel, and Noam Shazeer. 2020. How Much Knowledge Can You Pack Into the Parameters of a Language Model? arXiv preprint arXiv:2002.08910 (2020).
[25] Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2021. A Primer in BERTology: What We Know about How BERT Works. Transactions of the Association for Computational Linguistics 8 (2021), 842–866.
[26] Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 5027–5038. https://doi.org/10.18653/v1/D18-1548
[27] Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. arXiv preprint arXiv:1904.09223 (2019).
[28] Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si, Dinghan Shen, Dong Wang, and Lawrence Carin. 2021. Syntactic Knowledge-Infused Transformer and BERT Models. In CEUR Workshop Proceedings, Vol. 3052.
[29] Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, et al. 2019. What Do You Learn from Context? Probing for Sentence Structure in Contextualized Word Representations. arXiv preprint arXiv:1905.06316 (2019).
[30] Hao Tian, Can Gao, Xinyan Xiao, Hao Liu, Bolei He, Hua Wu, Haifeng Wang, and Feng Wu. 2020. SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis. arXiv preprint arXiv:2005.05635 (2020).
[31] Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Guihong Cao, Daxin Jiang, Ming Zhou, et al. 2020. K-Adapter: Infusing Knowledge into Pre-trained Models with Adapters. arXiv preprint arXiv:2002.01808 (2020).
[32] Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, Juanzi Li, and Jian Tang. 2021. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. Transactions of the Association for Computational Linguistics 9 (2021), 176–194.
[33] Chaoqi Zhen, Yanlei Shang, Xiangyu Liu, Yifei Li, Yong Chen, and Dell Zhang. 2022. A Survey on Knowledge-Enhanced Pre-trained Language Models. arXiv preprint arXiv:2212.13428 (2022).
[34] Zexuan Zhong, Dan Friedman, and Danqi Chen. 2021. Factual Probing Is [MASK]: Learning vs. Learning to Recall. In North American Association for Computational Linguistics (NAACL).

