[Figure 2 shows two panels: (a) standard fine-tuning for RE, with a CE loss over relation probabilities (e.g., no_relation, per:employee_of, org:founded_by, per:date_of_birth, per:stateorprovinces_of_residence); and (b) KnowPrompt, with an MLM head, a relation embedding head, knowledge injection of virtual type words (e.g., person, organization, date) and virtual answer words, a structured loss, and synergistic optimization. Example input: "[CLS] Steve Jobs, co-founder of Apple. [SEP] Apple [MASK] Steve Jobs [SEP]".]
Figure 2: Model architectures of fine-tuning for RE (a) and our proposed KnowPrompt approach (b) (best viewed in color). The answer words described in the paper refer to the virtual answer words we propose.
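To make the prompt construction in Figure 2(b) concrete, here is a minimal sketch; the marker and separator strings are plain-text stand-ins for the learned special tokens, not the paper's exact implementation.

```python
# A minimal sketch of building the KnowPrompt input shown in Figure 2(b): the
# sentence followed by a template "subject [MASK] object", with the entities
# wrapped by the virtual type words [sub]/[obj]. The marker strings here are
# illustrative; in the model they are learnable special tokens.
def build_prompt(sentence: str, subject: str, obj: str) -> str:
    template = f"[sub] {subject} [sub] [MASK] [obj] {obj} [obj]"
    return f"{sentence} [SEP] {template} [SEP]"

# Mirrors the example in Figure 2 ([CLS] is prepended by the tokenizer).
print(build_prompt("Steve Jobs, co-founder of Apple.", "Apple", "Steve Jobs"))
```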
... layer of L. Since the virtual type words are designed based on the prior knowledge within relation labels, they can initially perceive the range of entity types and can be further optimized according to the context to express semantic information close to the actual entity type, playing a role similar to the Typed Marker.

Relation Knowledge Injection. Previous studies on prompt-tuning usually form a one-to-one mapping between a label word in the vocabulary and a task label by automatic generation, which incurs a large computational cost in the search process and fails to leverage the abundant semantic knowledge in relation labels for RE. To this end, we assume that there exists a virtual answer word 𝑣′ ∈ V′ in the vocabulary space of PLMs which can represent the implicit semantics of the relation. From this perspective, we expand the MLM head layer of L with extra learnable relation embeddings, serving as the virtual answer word set V′, to completely represent the corresponding relation labels Y. Thus, we can reformalize 𝑝(𝑦|𝑥) as the probability distribution over V′ at the masked position. We propose to encode the semantic knowledge of the labels and thereby facilitate RE. Concretely, we set 𝜙_R = [𝜙_{r_1}, 𝜙_{r_2}, ..., 𝜙_{r_m}] and C_R = [C_{r_1}, C_{r_2}, ..., C_{r_m}], where 𝜙_r represents the probability distribution over the candidate set C_r of semantic words obtained by disassembling the relation label r, and m is the number of relation labels. Furthermore, we adopt a weighted average with 𝜙_r over the embeddings of the words in C_r to initialize these relation embeddings, which injects the semantic knowledge of relations. The specific decomposition process is shown in Table 1, and the learnable relation embedding of the virtual answer word 𝑣′ = M(𝑦) is initialized as follows:

$$\hat{e}_{[rel]}(v') = \phi_r \cdot \mathbf{e}(C_r), \qquad (3)$$

where ê_{[rel]}(𝑣′) is the embedding of the virtual answer word 𝑣′ and e represents the word-embedding layer of L. Note that this knowledgeable initialization of virtual answer words can be regarded as a strong anchor; we can further optimize them based on context to express optimal semantic information, leading to better performance.

4.2 Synergistic Optimization with Knowledge Constraints

Since there exist close interactions and connections between entity types and relation labels, and the virtual type words as well as the answer words should be associated with the surrounding context, we further introduce a synergistic optimization method with implicit structural constraints over the parameter set {ê_{[sub]}, ê_{[obj]}, ê_{[rel]}(V′)} of virtual type words and virtual answer words.

Context-aware Prompt Calibration. Although our virtual type and answer words are initialized based on knowledge, they may not be optimal in the latent variable space, and they should be associated with the surrounding context. Thus, further optimization is necessary to calibrate their representations by perceiving the context. Given the probability distribution 𝑝(𝑦|𝑥) = 𝑝([MASK] = V′ | 𝑥_prompt) over V′ at the masked position, we optimize the virtual type words as well as the answer words with the loss computed as the cross-entropy between y and 𝑝(𝑦|𝑥):

$$\mathcal{J}_{[\mathrm{MASK}]} = -\frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} \mathbf{y} \log p(y|x), \qquad (4)$$

where |X| represents the number of instances in the training set X. The learnable words may adaptively obtain optimal representations for prompt-tuning through this synergistic type and answer optimization.
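As an illustration of Eq. (4), the following minimal sketch restricts the scores at the [MASK] position to the virtual answer words and applies cross-entropy; all tensors are random stand-ins for the LM outputs and the expanded head.

```python
# A minimal sketch of the calibration loss in Eq. (4): restrict the MLM scores
# at the [MASK] position to the virtual answer words V' and apply cross-entropy.
import torch
import torch.nn.functional as F

batch, d, num_rel = 4, 768, 42
hidden_at_mask = torch.randn(batch, d)        # LM hidden states at [MASK]
relation_embedding = torch.randn(num_rel, d)  # ê_[rel](V'), the expanded MLM head
labels = torch.randint(num_rel, (batch,))     # gold relation ids y

logits = hidden_at_mask @ relation_embedding.T  # p([MASK] = V' | x_prompt) scores
j_mask = F.cross_entropy(logits, labels)        # J_[MASK] of Eq. (4)
```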
Table 1: Examples of some relations from the TACREV dataset, and the relation-specific C_sub, C_obj and C_r.

| Relation Labels | C_sub | C_obj | C_r (Disassembled Relation Prepared for Virtual Answer Words) |
| per:country_of_birth | person | country | {"country", "of", "birth"} |
| per:date_of_death | person | date | {"date", "of", "death"} |
| per:schools_attended | person | organization | {"schools", "attended"} |
| org:alternate_names | organization | organization | {"alternate", "names"} |
| org:city_of_headquarters | organization | city | {"city", "of", "headquarters"} |
| org:number_of_employees/members | organization | number | {"number", "of", "employees", "members"} |
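As an illustration of the initialization in Eq. (3) over decompositions like those in Table 1, here is a minimal sketch assuming a RoBERTa-style MLM from HuggingFace transformers; the decomposition dict and the uniform weights phi_r are illustrative stand-ins.

```python
# A minimal sketch of the knowledgeable initialization in Eq. (3): average the
# word embeddings of each relation label's decomposed words to seed the
# learnable relation embeddings (virtual answer words V').
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
word_emb = model.get_input_embeddings().weight  # e(.), the word-embedding layer of L

decomposition = {  # C_r for two relations (cf. Table 1); illustrative subset
    "per:country_of_birth": ["country", "of", "birth"],
    "org:alternate_names": ["alternate", "names"],
}

with torch.no_grad():
    rows = []
    for relation, words in decomposition.items():
        ids = tokenizer(" ".join(words), add_special_tokens=False)["input_ids"]
        phi_r = torch.full((len(ids),), 1.0 / len(ids))   # uniform weights over C_r
        rows.append(phi_r @ word_emb[torch.tensor(ids)])  # ê_[rel](v') = phi_r · e(C_r)

# Learnable relation embeddings extending the MLM head as virtual answer words.
relation_embedding = torch.nn.Parameter(torch.stack(rows))
```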
Implicit Structured Constraints. To integrate structural knowledge into KnowPrompt, we adopt additional structured constraints to optimize the prompts. Specifically, we use a triplet (𝑠, 𝑟, 𝑜) to describe a relational fact, where 𝑠 and 𝑜 represent the virtual type words of the subject and object entities, respectively, and 𝑟 is the relation label within the predefined set of answer words V′. In KnowPrompt, instead of using pre-trained knowledge graph embeddings (which are heterogeneous compared with pre-trained language model embeddings), we directly leverage the output embeddings of the virtual type words and virtual answer words from the LM in the computation. We define the loss J_structured of the implicit structured constraints as follows:

$$\mathcal{J}_{\mathrm{structured}} = -\log \sigma\big(\gamma - d_r(\mathbf{s}, \mathbf{o})\big) - \sum_{i=1}^{n} \frac{1}{n} \log \sigma\big(d_r(\mathbf{s}'_i, \mathbf{o}'_i) - \gamma\big), \qquad (5)$$

$$d_r(\mathbf{s}, \mathbf{o}) = \lVert \mathbf{s} + \mathbf{r} - \mathbf{o} \rVert_2, \qquad (6)$$

where (𝑠′_i, 𝑟, 𝑜′_i) are negative samples, 𝛾 is the margin, 𝜎 is the sigmoid function, and 𝑑_r is the scoring function. For negative sampling, we assign the correct virtual answer word at the [MASK] position and randomly replace the subject or object entity with an irrelevant one to construct corrupted triples, in which the entity has an impossible type for the current relation.
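The following sketch makes Eqs. (5)-(6) concrete. It assumes s, r, o are the LM output embeddings of [sub], the answer word at [MASK], and [obj], with n corrupted triples as negatives; shapes and names are illustrative.

```python
# A minimal sketch of the implicit structured constraint in Eqs. (5)-(6).
import torch
import torch.nn.functional as F

def d_r(s, r, o):
    """TransE-style score of Eq. (6): d_r(s, o) = ||s + r - o||_2."""
    return torch.norm(s + r - o, p=2, dim=-1)

def structured_loss(s, r, o, s_neg, o_neg, gamma=1.0):
    """J_structured of Eq. (5) for one true triple and n negative samples."""
    pos = -F.logsigmoid(gamma - d_r(s, r, o))                 # keep the true triple within the margin
    neg = -F.logsigmoid(d_r(s_neg, r, o_neg) - gamma).mean()  # push corrupted triples beyond it
    return pos + neg

d = 768
s, r, o = torch.randn(d), torch.randn(d), torch.randn(d)
s_neg, o_neg = torch.randn(5, d), torch.randn(5, d)  # n = 5 corrupted triples
loss = structured_loss(s, r, o, s_neg, o_neg)
```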
4.3 Training Details

Our approach has a two-stage optimization procedure. First, we synergistically optimize the parameter set {ê_{[sub]}, ê_{[obj]}, ê_{[rel]}(V′)} of virtual type words and virtual answer words with a large learning rate 𝑙𝑟_1 to obtain the optimal prompt:

$$\mathcal{J} = \mathcal{J}_{[\mathrm{MASK}]} + \lambda \mathcal{J}_{\mathrm{structured}}, \qquad (7)$$

where 𝜆 is a hyperparameter, and J_structured and J_[MASK] are the losses for the knowledge constraints and the [MASK] prediction, respectively. Second, based on the optimized virtual type words and answer words, we utilize the objective function J_[MASK] to tune the parameters of the PLM together with the prompt (optimizing all parameters) with a small learning rate 𝑙𝑟_2. For more experimental details, please refer to the Appendix.
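A minimal, self-contained sketch of this two-stage schedule follows; the tiny linear "PLM" and the two loss surrogates are stand-ins, and the learning rates are illustrative — only the optimization pattern (prompt-only at large lr_1 with Eq. (7), then everything at small lr_2 with J_[MASK] alone) reflects the method.

```python
# A minimal sketch of the two-stage optimization in Section 4.3.
import torch

d, lam = 16, 0.1
plm = torch.nn.Linear(d, d)                     # stand-in for the PLM
prompt = torch.nn.Parameter(torch.randn(3, d))  # {ê_[sub], ê_[obj], ê_[rel](V')}
x, y = torch.randn(8, d), torch.randn(8, d)

def j_mask():        # surrogate for the cross-entropy of Eq. (4)
    return torch.nn.functional.mse_loss(plm(x) + prompt.sum(0), y)

def j_structured():  # surrogate for the constraint of Eq. (5)
    return prompt.norm()

# Stage 1: only the virtual words, joint loss J = J_[MASK] + lambda * J_structured (Eq. 7), large lr_1.
opt1 = torch.optim.AdamW([prompt], lr=3e-4)
for _ in range(10):
    loss = j_mask() + lam * j_structured()
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: all parameters, J_[MASK] only, small lr_2.
opt2 = torch.optim.AdamW([prompt, *plm.parameters()], lr=3e-5)
for _ in range(10):
    loss = j_mask()
    opt2.zero_grad(); loss.backward(); opt2.step()
```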
5 EXPERIMENTS

5.1 Datasets

For comprehensive evaluation, we carry out experiments on five RE datasets: SemEval 2010 Task 8 (SemEval) [26], DialogRE [54], TACRED [63], TACRED-Revisit [1], and Re-TACRED [47]. Statistical details are provided in Table 2 and Appendix A.

Table 2: Statistics of the RE datasets used in the paper, including the numbers of relations and instances in the different splits. For dialogue-level DialogRE, instance refers to the number of documents.

| Dataset | # Train | # Val | # Test | # Rel |
| SemEval | 6,507 | 1,493 | 2,717 | 19 |
| DialogRE | 5,963 | 1,928 | 1,858 | 36 |
| TACRED | 68,124 | 22,631 | 15,509 | 42 |
| TACRED-Revisit | 68,124 | 22,631 | 15,509 | 42 |
| Re-TACRED | 58,465 | 19,584 | 13,418 | 40 |

5.2 Experimental Settings

For fine-tuning vanilla PLMs and our KnowPrompt, we utilize RoBERTa_large in all experiments to make a fair comparison (except for DialogRE, where we adopt RoBERTa_base to compare with previous methods). For test metrics, we use the micro F1 score of RE as the primary metric, considering that F1 assesses the overall trade-off between precision and recall. We use different settings for the standard and low-resource experiments. All detailed settings for our KnowPrompt, Fine-tuning, and PTR can be found in Appendices B, C, and D.

Standard Setting. In the standard setting, we utilize the full D_train for fine-tuning. Considering that entity information is essential for models to understand relational semantics, a series of knowledge-enhanced PLMs have been explored that use knowledge graphs as additional information to enhance PLMs. Specifically, we select SpanBERT [30], KnowBERT [38], LUKE [52], and MTB [3] as strong baselines; these are typical models that use external knowledge to enhance learning objectives, input features, model architectures, or pre-training strategies. We also compare against several SOTA models on DialogRE, where one challenge is that each entity pair can have more than one relation.

Low-Resource Setting. We conduct 8-, 16-, and 32-shot experiments following LM-BFF [15, 22] and measure the average performance across five different randomly sampled splits, each drawn with a fixed seed from the set S_seed. Specifically, we sample 𝑘 instances of each class from the initial training and validation sets to form the few-shot training and validation sets.
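The K-shot sampling protocol might look like the sketch below; the toy dataset and the seed values are illustrative stand-ins.

```python
# A minimal sketch of K-shot sampling: k instances per relation class, drawn
# with fixed seeds so the five few-shot splits are reproducible.
import random
from collections import defaultdict

def sample_k_shot(dataset, k, seed):
    """Sample k instances per relation label with a fixed random seed."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append((text, label))
    subset = []
    for label, examples in sorted(by_label.items()):
        subset.extend(rng.sample(examples, min(k, len(examples))))
    return subset

toy_train = [(f"sentence {i}", f"rel_{i % 3}") for i in range(60)]
splits = [sample_k_shot(toy_train, k=8, seed=s) for s in (1, 2, 3, 4, 5)]  # S_seed
```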
Table 3: Standard RE performance in F1 scores (%) on different test sets. "w/o" means that no additional data is used for pre-training and fine-tuning, while "w/" means the model uses extra data. Note that "†" indicates that we additionally rerun the code of KnowPrompt and PTR with RoBERTa_base for a fair comparison with current SOTA models on DialogRE. Subscripts in red represent the advantage of KnowPrompt over the best baseline results. Best results are in bold.
5.3 Main Results

Standard Result. As shown in Table 3, the knowledge-enhanced PLMs yield better performance than vanilla Fine-tuning. This result illustrates that it is practical to inject task-specific knowledge to enhance models, and indicates that simple fine-tuning cannot fully exploit the knowledge acquired during pre-training.

Note that our KnowPrompt achieves improvements over all baselines, even outperforming the knowledge-enhanced models that use knowledge for data augmentation or architecture enhancement during fine-tuning. On the other hand, even when task-specific knowledge is already contained in knowledge-enhanced PLMs such as LUKE, KnowBERT, SpanBERT, and MTB, it is difficult for fine-tuning to stimulate that knowledge for downstream tasks. Overall, we believe that the development of prompt-tuning is imperative, and KnowPrompt is a simple and effective prompt-tuning paradigm for RE.

Low-Resource Result. From Table 4, KnowPrompt appears to be even more beneficial in low-resource settings. We find that KnowPrompt consistently outperforms the baseline methods Fine-tuning, GDPNet, and PTR on all datasets, especially in the 8-shot and 16-shot experiments. Specifically, our model obtains up to 22.4% and 13.2% absolute improvement on average compared with fine-tuning. As 𝐾 increases from 8 to 32, the improvement of KnowPrompt over the other three methods decreases gradually. For 32-shot, we think the number of labeled instances becomes sufficient, so the rich semantic knowledge injected by our approach induces smaller gains. We also observe that GDPNet performs even worse than Fine-tuning for 8-shot, which reveals that a complex SOTA model from the standard supervised setting may fall off the altar when data are extremely scarce.

Comparison between KnowPrompt and Prompt-Tuning Methods. Typical prompt-tuning methods such as LM-BFF perform outstandingly on text classification tasks (e.g., sentiment analysis and NLI), but they do not target RE, so we cannot directly rerun their code on RE tasks. To the best of our knowledge, PTR is the only method that uses prompts for RE; it is an excellent piece of work from the same period as our KnowPrompt. Thus, we make a comprehensive comparative analysis between KnowPrompt and PTR, summarized in Table 7. The specific analysis is as follows:

Firstly, PTR adopts a fixed number of multi-token answers and LM-BFF leverages actual label words as single-token answers, while KnowPrompt proposes virtual answer words with a single-token answer form. Thus, PTR needs manually formulated rules, which is more labor-intensive, while LM-BFF requires an expensive label search whose cost grows exponentially with the number of categories.

Secondly, essentially owing to this difference in answer form, our KnowPrompt and LM-BFF are model-agnostic and can be plugged into different kinds of PLMs (as shown in Figure 3, our method can be adopted on GPT-2), while PTR fails to generalize to generative LMs due to its multiple discontinuous [MASK] predictions.

Thirdly, the above experiments demonstrate that KnowPrompt is comprehensively comparable to PTR and performs better in low-resource scenarios. Especially for DialogRE, a multi-label classification task, our method exceeds PTR by approximately 5.4 points in the standard supervised setting. This may be attributed to the rule method used by PTR: forcing multiple mask predictions can confuse multi-label predictions.

In a nutshell, the above analysis shows that KnowPrompt is more flexible and widely applicable; meanwhile, it can be aware of knowledge and stimulate it to better serve downstream tasks.

5.4 Ablation Study on KnowPrompt

Effect of Virtual Answer Words Modules: To prove the effect of the virtual answer words and their knowledge injection, we conduct an ablation study, with results shown in Table 5. For -VAW, we adopt one specific token in the relation label as the label word without optimization, and for -Knowledge Injection for VAW, we randomly initialize the virtual answer words and then optimize them. Removing the knowledge injection for virtual answer words has the most significant effect, causing the relation F1 score to drop from 74.3% to 52.5% in the 8-shot setting. This reveals that injecting the semantic knowledge maintained in relation labels is critical for relation extraction, especially in few-shot scenarios.
Table 4: Low-resource RE performance in F1 scores (%) on different test sets. We use K = 8, 16, 32 (# examples per class) for few-shot experiments. Subscripts in red represent the advantage of KnowPrompt over Fine-tuning.

| Split | Methods | SemEval | DialogRE† | TACRED | TACRED-Revisit | Re-TACRED | Average |
| K=8 | Fine-tuning | 41.3 | 29.8 | 12.2 | 13.5 | 28.5 | 25.1 |
| K=8 | GDPNet | 42.0 | 28.6 | 11.8 | 12.3 | 29.0 | 24.7 |
| K=8 | PTR | 70.5 | 35.5 | 28.1 | 28.7 | 51.5 | 42.9 |
| K=8 | KnowPrompt | 74.3 (+33.0) | 43.8 (+14.0) | 32.0 (+19.8) | 32.1 (+18.6) | 55.3 (+26.8) | 47.5 (+22.4) |
| K=16 | Fine-tuning | 65.2 | 40.8 | 21.5 | 22.3 | 49.5 | 39.9 |
| K=16 | GDPNet | 67.5 | 42.5 | 22.5 | 23.8 | 50.0 | 41.3 |
| K=16 | PTR | 81.3 | 43.5 | 30.7 | 31.4 | 56.2 | 48.6 |
| K=16 | KnowPrompt | 82.9 (+17.7) | 50.8 (+10.0) | 35.4 (+13.9) | 33.1 (+10.8) | 63.3 (+13.8) | 53.1 (+13.2) |
| K=32 | Fine-tuning | 80.1 | 49.7 | 28.0 | 28.2 | 56.0 | 48.4 |
| K=32 | GDPNet | 81.2 | 50.2 | 28.8 | 29.1 | 56.5 | 49.2 |
| K=32 | PTR | 84.2 | 49.5 | 32.1 | 32.4 | 62.1 | 52.1 |
| K=32 | KnowPrompt | 84.8 (+4.7) | 55.3 (+3.6) | 36.5 (+8.5) | 34.7 (+6.5) | 65.0 (+9.0) | 55.3 (+6.9) |
Table 5: Ablation study on SemEval. VAW and VTW refer to virtual answer words and virtual type words, respectively.

| Method | K=8 | K=16 | K=32 | Full |
| KnowPrompt | 74.3 | 82.9 | 84.8 | 90.2 |
| -VAW | 68.2 | 72.7 | 75.9 | 85.2 |
| -Knowledge Injection for VAW | 52.5 | 78.0 | 80.2 | 88.0 |
| -VTW | 72.8 | 80.3 | 82.9 | 88.7 |
| -Knowledge Injection for VTW | 68.8 | 79.5 | 81.6 | 88.5 |
| -Structured Constraints | 73.5 | 81.2 | 83.6 | 89.3 |

Effect of Virtual Type Words Modules: We also conduct an ablation study to validate the effectiveness of the design of virtual type words. For -VTW, we directly remove the virtual type words, and for -Knowledge Injection for VTW, we randomly initialize the virtual type words and then optimize them. In the 8-shot setting, directly removing the virtual type words drops performance from 74.3 to 72.8, while randomly initialized virtual type words decrease performance to 68.8, which is much lower than 72.8. This phenomenon may be related to the noise introduced by random initialization; as the number of instances increases, the impact of this noise diminishes.

6 ANALYSIS AND DISCUSSION

6.1 Can KnowPrompt Be Applied to Other LMs?

Since we focus on MLMs (e.g., RoBERTa) in the main experiments, we further extend our KnowPrompt to autoregressive LMs like GPT-2. Specifically, we directly append the prompt template, with [MASK] at the end of the input sequence, for GPT-2. We further apply the relation embedding head by extending the word-embedding layer of the PLM; thus, GPT-2 can generate virtual answer words. We first notice that fine-tuning leads to poor performance with high variance in the low-resource setting, while KnowPrompt based on RoBERTa or GPT-2 achieves impressive improvements with low variance compared with Fine-tuning. As shown in Figure 3, KnowPrompt based on GPT-2 obtains results on par with the RoBERTa-large model, which reveals that our method can unearth the potential of GPT-2 to perform well on natural language understanding tasks such as RE. This finding also indicates that our method is model-agnostic and can be plugged into different kinds of PLMs.

[Figure 3: F1 scores over different K on TACRED-Revisit (y-axis: F1 score).]
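The GPT-2 extension might look like the following sketch: the template is appended so the masked slot is the final position, and that position's hidden state is scored against the relation embedding head. The input string and head initialization are illustrative stand-ins.

```python
# A minimal sketch of the GPT-2 extension in Section 6.1.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

num_rel = 42
relation_embedding = torch.nn.Parameter(torch.randn(num_rel, model.config.n_embd))

text = "Steve Jobs, co-founder of Apple. Apple [MASK] Steve Jobs"
inputs = tokenizer(text, return_tensors="pt")
hidden = model(**inputs).last_hidden_state[:, -1]  # autoregressive LM: use the last position
logits = hidden @ relation_embedding.T             # scores over the virtual answer words V'
```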
Table 7: Comparative statistics between KnowPrompt and PTR, including (1) the answer form of the prompt; (2) labor intensity; (3) MA: whether the method is model-agnostic; (4) ML: the ability of multi-label learning; and (5) CC: the computational complexity.

| Method | Answer Form | Labor | MA | ML | CC |
| LM-BFF | single-token | normal | yes | - | high |
| PTR | multi-token | normal | no | normal | normal |
| Ours | single-token | small | yes | better | normal |

Table 6: Input examples of our KnowPrompt with the top-3 words around [sub] and [obj].

| Input Example of our KnowPrompt | Top-3 words around [sub] | Top-3 words around [obj] |
| x: [CLS] It sold [E1] ALICO [/E1] to [E2] MetLife Inc [/E2] for $162 billion. [SEP] [sub] ALICO [sub] [MASK] [obj] MetLife Inc [obj]. [SEP]  y: "org:member_of" | organization, group, corporation | company, plc, organization |
| x: [CLS] [E1] Ismael Rukwago [/E1], a senior [E2] ADF [/E2] commander, denied any involvement. [SEP] [sub] Ismael Rukwago [sub] [MASK] [obj] ADF [obj]. [SEP]  y: "per:employee_of" | person, commander, colonel | intelligence, organization, command |

We further investigate what the learned virtual answer words express and whether the virtual type words can adaptively reflect the entity types based on context, as shown in Table 6. Specifically, we apply the MLM head over the positions of the virtual type words to obtain their output representations, and retrieve the top-3 vocabulary words nearest each virtual type word according to the L2 distance between the embeddings of the virtual type words and the other words. We observe that, thanks to the synergistic optimization with knowledge constraints, the learned virtual type words can dynamically adjust according to context and play a reminder role for RE.
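The nearest-word probing described above might look like this sketch; random tensors stand in for the PLM's word-embedding table and the learned representation of [sub].

```python
# A minimal sketch of the probing: rank vocabulary words by L2 distance to a
# learned virtual type word and keep the 3 nearest (cf. Table 6).
import torch

vocab_size, d = 50265, 768
word_emb = torch.randn(vocab_size, d)  # e(.), vocabulary embedding table
virtual_sub = torch.randn(d)           # learned representation of [sub]

dists = torch.cdist(virtual_sub[None], word_emb).squeeze(0)  # L2 distance to every word
top3_ids = dists.topk(3, largest=False).indices              # ids of the 3 nearest words
```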