KnowledgeEnhanced Prompt Learning for FewShot Text Classification 2024 Multidisciplinary Digital Publishing Institute MDPI
cognitive computing
Article
Knowledge-Enhanced Prompt Learning for Few-Shot
Text Classification
Jinshuo Liu and Lu Yang *
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of
Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; [email protected]
* Correspondence: [email protected]
Abstract: Classification methods based on fine-tuning pre-trained language models often require a
large number of labeled samples; therefore, few-shot text classification has attracted considerable
attention. Prompt learning is an effective method for addressing few-shot text classification tasks
in low-resource settings. The essence of prompt tuning is to insert tokens into the input, thereby
converting a text classification task into a masked language modeling problem. However, constructing
appropriate prompt templates and verbalizers remains challenging, as manual prompts often require
expert knowledge, while auto-constructing prompts is time-consuming. In addition, the extensive
knowledge contained in entities and relations should not be ignored. To address these issues, we
propose a structured knowledge prompt tuning (SKPT) method, which is a knowledge-enhanced
prompt tuning approach. Specifically, SKPT includes three components: prompt template, prompt
verbalizer, and training strategies. First, we insert virtual tokens into the prompt template based on
open triples to introduce external knowledge. Second, we use an improved knowledgeable verbalizer
to expand and filter the label words. Finally, we use structured knowledge constraints during the
training phase to optimize the model. Extensive experiments on few-shot text classification
tasks under different settings demonstrate the effectiveness of our model.
PLMs and various downstream tasks, since it converts downstream tasks into pre-training
tasks for PLMs. Prompt learning is more effective in low-resource settings because there
are not enough training samples to specify the behavior of the model; therefore, using
text prompts to push the model in the correct direction is particularly effective. One of
the typical ways to use prompt learning is to formalize the text classification task into a
masked language modeling problem, which can better activate the ability of the masked
language model. Additionally, because most prompt learning methods keep
the parameters of the PLM fixed, they train fewer parameters, which reduces
training costs.
Although traditional prompt learning methods have addressed the shortcomings of
fine-tuning PLMs, there are still other issues: on one hand, it is still hard to construct
appropriate prompt templates, because manual prompt templates often require expert
knowledge [8], while auto-constructing prompt templates often incurs a significant computational
cost [9]. On the other hand, the input text includes semantic knowledge as well as
structured knowledge, which is overlooked by existing prompt learning methods.
To address these issues, we propose SKPT for few-shot text classification. Specifi-
cally, SKPT includes three components: prompt template construction, prompt verbalizer
construction, and training strategies. First, we extract open entities and relations through
open information extraction (OpenIE) [10]. Based on the semantic information and the
location of open triples, we propose a structured knowledge template and initialize it.
Second, we use an improved knowledgeable verbalizer to expand and filter the label words.
Finally, we use structured knowledge constraints during the training phase to optimize
the embedding. We conduct experiments on two topic text classification datasets, AG
News [11] and DBpedia [12]. The results demonstrate the effectiveness of SKPT, and we
show the effectiveness of each component through an ablation study.
The main contributions of this paper are as follows:
• We propose a knowledge-enhanced prompt learning method SKPT for few-shot text
classification. Based on open triples, we insert learnable virtual tokens into the prompt
template to introduce external knowledge.
• We use an improved knowledgeable verbalizer, which utilizes external knowledge
bases to expand each class label into a set of label words. We filter the out-of-vocabulary
words for the PLM and assign learnable weights to different label words for training.
• We apply structured knowledge constraints during the training phase through the
specific loss function.
• We perform experiments on two text classification benchmark datasets to illustrate the
effectiveness of SKPT, especially in low-resource settings.
2. Related Works
Our research goal is to enhance prompt learning with external knowledge to address the
few-shot text classification problem. Therefore, we focus on the existing research on both
prompt learning methods and knowledge-enhanced methods.
2.1. Prompt-Tuning
Many large-scale pre-trained models have been open-sourced in the current research
domain, but fine-tuning is still required to adapt PLMs to target downstream tasks. How-
ever, there are still some issues in fine-tuning. On one hand, fine-tuning usually requires
additional network structures for different downstream tasks; for example, we need to
add a classifier for text classification tasks. Therefore, fine-tuning has weak generalization
ability for different tasks. On the other hand, the number of parameters that must
be trained during the fine-tuning process is still considerable.
Since the introduction of GPT-3 [7], prompt-based learning has attracted significant
attention and has achieved remarkable performance in a variety of natural language
processing tasks. By using prompt information, we can formalize a downstream task into
a pre-training task of a pre-trained model. Since then, numerous effective approaches
Big Data Cogn. Comput. 2024, 8, 43 3 of 12
have been developed for prompt learning [13,14]. Schick et al. [15] first propose a prompt
tuning method based on the manual prompt template, which obtains excellent results on
few-shot classification tasks. Prompt learning methods can be categorized into two types:
cloze prompts [16], which insert a slot into the middle of the input text; and prefix
prompts [17,18], where the prompt template tokens come before the input text. Because
manually designing templates requires significant time and experience, and
even experienced prompt designers may fail to find the best prompt
manually [19], some work has begun to explore automatic search methods for prompt
templates and verbalizers.
Shin et al. [9] propose a gradient-guided search to automatically generate templates
and label words. Gao et al. [20] propose a prompt-based method with automatically
searched prompts, which can also select task demonstrations in the context of the input.
However, prompts found by automatic search in a discrete space are often
suboptimal, and automatic search requires a large amount of computing resources.
Recently, some external knowledge-enhanced prompt learning methods have also
been proposed. Liu et al. [21] use external knowledge to design knowledge-enhanced
prompts. KP4SR [22] uses structured knowledge to generate prompts for
sequential recommendation tasks. Hu et al. [23] propose a knowledge-enhanced prompt
method named KPT, which expands label words through external knowledge bases. Since
the key to prompt learning is to design an appropriate prompt template and prompt
verbalizer, we build on the verbalizer of KPT and propose a new knowledge-enhanced prompt
learning method for few-shot text classification. We integrate knowledge into the prompt
template as well as the prompt verbalizer to optimize prompt learning.
during the prompt tuning phase, the core idea behind both approaches is to enhance the
power of pre-trained models with external knowledge for downstream tasks.
3. Methods
The model presented in this paper, named SKPT, consists of three main components:
knowledge-enhanced prompt template, knowledge-enhanced verbalizer, and training
strategies. The model is illustrated in Figure 1. First, we extract open entities and relations
from text data through open information extraction (OpenIE) [10] and propose a structured
knowledge prompt template based on these triples, which initializes the template embedding
by introducing prior knowledge. Second, we use the improved knowledgeable prompt
tuning verbalizer to expand class labels into a set of label words with different levels and
perspectives and then filter them. On one hand, we filter the out-of-vocabulary (OOV)
words for the PLM. On the other hand, we assign learnable weights to different label words
for training. Finally, we design a loss function under structured knowledge constraints
based on the classic translation model TransE and refine the model parameters through the
incorporation of context.
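For orientation, the generic TransE objective [24] scores a triple by the distance between the translated head (head plus relation) and the tail, and a margin ranking loss pushes true triples closer than corrupted ones. The following is a minimal sketch of that standard formulation in plain Python; the L2 norm, the margin value, the toy embeddings, and the function names are illustrative assumptions and not necessarily the exact loss used in SKPT.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||, the TransE score of a triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_ranking_loss(pos_triple, neg_triple, margin=1.0):
    """Hinge loss: a true triple should score at least `margin` lower than a corrupted one."""
    pos = transe_distance(*pos_triple)
    neg = transe_distance(*neg_triple)
    return max(0.0, margin + pos - neg)

# Toy 2-d embeddings (hypothetical values, for illustration only).
head, relation, tail = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
corrupted_tail = [4.0, 5.0]

print(transe_distance(head, relation, tail))
print(margin_ranking_loss((head, relation, tail), (head, relation, corrupted_tail)))
```

In this toy case the true triple satisfies head + relation = tail exactly, so its distance is zero and the loss vanishes once the corrupted triple is farther away than the margin.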
Figure 2. Example of triple extraction. On the top is an unstructured input text; at the bottom are
some example triples corresponding to the original text.
We input a sentence $x$ consisting of $n$ tokens, denoted as $x = \{v_1, v_2, v_3, \ldots, v_n\}$.
After relation extraction, we obtain triples $\{v_h, v_t, v_r\}$, where $v_h$ represents the head
entity, $v_t$ represents the tail entity, and $v_r$ represents the relation. Both entities and relations
can be composed of multiple tokens, which we abbreviate as $v_h$, $v_t$, and $v_r$. Through the
extraction process, we obtain two types of useful information: the entities and relations
themselves, and their positions in the original input.
We introduce virtual template tokens based on the manual template to incorporate
external knowledge. In detail, we insert a token on one side of the located head entity
$v_h$, tail entity $v_t$, and relation $v_r$ as a part of our prompt template. For instance, when the
extracted triple is (“Barack Obama”, “gave”, “his speech”), we insert the virtual token
[entity] before the head entity “Barack Obama” and the tail entity “his speech”, as well
as the virtual token [relation] before the relation “gave”. The use of virtual tokens in the
prompt template allows for the introduction of prior knowledge as well as enables the
incorporation of structured information between relations and entities during the training
process. This is illustrated in Figure 3.
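The token insertion described above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization and that the triple extractor returns token-index spans for the head, relation, and tail; the function name and span format are hypothetical.

```python
def insert_virtual_tokens(tokens, head_span, relation_span, tail_span):
    """Insert [entity] before the head/tail spans and [relation] before the relation span.

    Each span is a (start, end) pair of token indices, with end exclusive.
    """
    markers = {
        head_span[0]: "[entity]",
        relation_span[0]: "[relation]",
        tail_span[0]: "[entity]",
    }
    output = []
    for index, token in enumerate(tokens):
        if index in markers:
            output.append(markers[index])  # virtual token goes right before the span
        output.append(token)
    return output

tokens = "Barack Obama gave his speech".split()
templated = insert_virtual_tokens(
    tokens, head_span=(0, 2), relation_span=(2, 3), tail_span=(3, 5)
)
print(" ".join(templated))
# [entity] Barack Obama [relation] gave [entity] his speech
```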
The template tokens we inserted are virtual and need to be initialized first. The process
of embedding initialization is the process of introducing prior knowledge. The task is text
classification, and open triples, to some extent, contain the main information of the input
text. Therefore, triples also contain the category information of the entire sentence, to
a certain extent. For example, in the sentence “Swimming is a very effective exercise.”,
after performing open information extraction, we know the head entity is “swimming”,
the tail entity is “exercise” and the relation is “is”. Both the head and tail entity point to the
class “sports”.
For entities, we only need to consider whether their categories are included in the
class label set $Y$. The distributions of $v_h$ and $v_t$ over $Y$ are denoted as $\phi_h$ and $\phi_t$, and they
are computed from normalized statistics. In detail, we encode the text of entity
$v_h$ and all the category texts with RoBERTa, then compute the cosine similarity between $v_h$
and every category and normalize these similarities to obtain $\phi_h$; $\phi_t$ is computed in the
same way. We use the pre-trained language model to encode the class label set
$Y$, which gives the class label embedding set $\mathrm{PLM}(Y)$. The weighted
embeddings $e_h$ and $e_t$ of virtual tokens $v_h$ and $v_t$ are then initialized as follows:
$$e_h = \phi_h \cdot \mathrm{PLM}(Y) \tag{2}$$
$$e_t = \phi_t \cdot \mathrm{PLM}(Y) \tag{3}$$
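A minimal sketch of this initialization in plain Python is given below. It assumes that the entity and label encodings are already available as vectors (small hand-written stand-ins are used here instead of RoBERTa outputs) and that the similarities are normalized with a softmax; the text only says "normalize", so the softmax and all function names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_virtual_token(entity_vec, label_vecs):
    """Return (phi, e): the distribution over labels and the weighted label embedding."""
    phi = softmax([cosine(entity_vec, y) for y in label_vecs])
    dim = len(label_vecs[0])
    e = [sum(p * y[d] for p, y in zip(phi, label_vecs)) for d in range(dim)]
    return phi, e

# Hypothetical 2-d stand-ins for the encodings of two class labels and one entity.
label_vecs = [[1.0, 0.0], [0.0, 1.0]]
entity_vec = [0.9, 0.1]
phi_h, e_h = init_virtual_token(entity_vec, label_vecs)
print(phi_h, e_h)
```

The entity vector lies closer to the first label, so the first label receives the larger weight and dominates the initial embedding.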
$$P_{\mathrm{PLM}}([\mathrm{MASK}] = v_o \mid x_{\mathrm{prompt}}) = \frac{1}{|V_{no}|} \sum_{v \in V_{no}} P_{\mathrm{PLM}}([\mathrm{MASK}] = v \mid x_{\mathrm{prompt}}) \tag{4}$$
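Equation (4) averages masked-token probabilities over a token set V_no. A minimal sketch follows, assuming V_no is the set of in-vocabulary tokens that stand in for the word v_o (for instance, its sub-word pieces); that reading, the toy probabilities, and the function name are assumptions rather than details confirmed by the text.

```python
def averaged_mask_probability(mask_probs, token_set):
    """Mean of P([MASK] = v | x_prompt) over the tokens v in token_set, as in Equation (4)."""
    if not token_set:
        raise ValueError("token_set must not be empty")
    return sum(mask_probs[token] for token in token_set) / len(token_set)

# Hypothetical masked-LM probabilities for three vocabulary tokens.
mask_probs = {"basket": 0.25, "ball": 0.5, "news": 0.125}
print(averaged_mask_probability(mask_probs, ["basket", "ball"]))  # 0.375
```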
Second, in few-shot learning, we can calculate the impact of each label word on the
prediction results. The crucial aspect of the verbalizer’s filtering algorithm
is to retain high-quality label words while removing low-quality ones. Consequently, we
assign each label word $v$ in the label word set $V_y$ a weight $w_v$, which is learnable during the
training phase and fixed during the testing phase. The predicted label $\hat{y}$ is as follows:
$$\hat{y} = \operatorname*{argmax}_{y \in Y} \frac{\exp(h(y \mid x_{\mathrm{prompt}}))}{\sum_{y'} \exp(h(y' \mid x_{\mathrm{prompt}}))}, \tag{5}$$
where $h(y \mid x_{\mathrm{prompt}})$ is
4. Experiments
In this section, we conduct a series of experiments with different settings and provide
explanations for benchmark datasets and comparative models.
4.1. Datasets
Our model is evaluated using two topic classification benchmark datasets, AG News [11]
and DBpedia [12]. AG News: AG News is a topic classification dataset about news, where
each sample contains a headline and content body. In this dataset, samples have to be
classified as World, Sports, Business, and Science. Each class contains 30,000 training
samples and 1900 testing samples. DBpedia: The DBpedia ontology dataset includes
fourteen non-overlapping categories selected from DBpedia 2014. Every class contains
40,000 training samples and 5000 testing samples.
We define labeled samples as the training set $D_{train}$ and unlabeled samples as the testing set
$D_{test}$. We randomly select data from the original training set in a k-shot, n-way manner to
construct the few-shot training set $D_{train}$. Here, k is the number of samples per
category; we set k to 5, 10, and 20 in our main experiments. The number of
categories in the dataset is n; for example, n is 14 for DBpedia and 4 for
AG News. The validation set $D_{val}$ contains the same number of samples as
the few-shot training set. The testing set $D_{test}$ is the original testing set of the
benchmark dataset.
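The k-shot, n-way construction can be sketched as below. This is an illustrative sampler, not the authors' script: the seed handling, the function name, and the choice to draw the validation set from the examples left over after sampling the training set are assumptions.

```python
import random
from collections import defaultdict

def sample_k_shot(examples, k, seed=0):
    """Pick k examples per class from (text, label) pairs; returns a shuffled list."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"class {label!r} has fewer than {k} examples")
        subset.extend(rng.sample(pool, k))
    rng.shuffle(subset)
    return subset

# Toy 4-way pool that reuses the AG News class names.
labels = ["World", "Sports", "Business", "Science"]
pool = [(f"{label} text {i}", label) for label in labels for i in range(30)]

train = sample_k_shot(pool, k=5, seed=1)
chosen = set(train)
remaining = [example for example in pool if example not in chosen]
val = sample_k_shot(remaining, k=5, seed=2)  # same size as the few-shot training set
print(len(train), len(val))
```

With k = 5 and n = 4 this yields 20 training and 20 validation examples; for DBpedia (n = 14) the same call would yield 70 of each.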
4.3. Baselines
Fine-tuning: Fine-tuning further trains a PLM on a small amount of labeled training
data to adapt it to a new task or domain. The traditional fine-tuning method feeds the
embedding of [CLS] into a classification layer.
Prompt-tuning: We choose the typical prompt learning method PET [15] as an example of
prompt-tuning. PET uses a manual prompt template, and its
verbalizer maps each category to exactly one label word. To represent the
general prompt-tuning method, we do not include PET’s additional optimization methods
in the experiment.
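To make the prompt-tuning baseline concrete, the sketch below wraps an input in a manual cloze template and maps the masked-position scores to classes with one label word per category. The template wording, the label words, and the mock scores are hypothetical; a real run would obtain the scores from a masked language model.

```python
def build_cloze_prompt(text, mask_token="[MASK]"):
    """Wrap the input in a hypothetical manual template with a single mask slot."""
    return f"A {mask_token} news: {text}"

# One label word per class, as in the prompt-tuning baseline described above.
VERBALIZER = {
    "World": "world",
    "Sports": "sports",
    "Business": "business",
    "Science": "science",
}

def predict_label(mask_scores, verbalizer=VERBALIZER):
    """Pick the class whose label word gets the highest score at the mask position."""
    return max(
        verbalizer,
        key=lambda label: mask_scores.get(verbalizer[label], float("-inf")),
    )

prompt = build_cloze_prompt("The team won the championship final.")
# Hypothetical scores a masked LM might assign to the mask position.
mask_scores = {"world": 0.1, "sports": 0.7, "business": 0.1, "science": 0.1}
print(prompt)
print(predict_label(mask_scores))
```

A knowledgeable verbalizer differs from this baseline in that each class maps to a set of label words rather than a single one.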
Knowledgeable Prompt-tuning (KPT) [23]: This is a typical method of knowledge-
enhanced prompt-tuning, which focuses on incorporating external knowledge into the
prompt verbalizer. Specifically, it utilizes external knowledge bases to introduce external
knowledge, expand the label words, and filter the prompt verbalizer. As our model extends
and improves some of its methods, we consider KPT as one of the baselines.
5. Results
5.1. Main Results
By comparing our model with the baseline methods through experiments, we demon-
strate the effectiveness of SKPT in the few-shot text classification task.
From Table 1, we find that SKPT outperforms fine-tuning, prompt-tuning, and KPT in
most of the experiments, especially in the 5-shot and 10-shot experiments, which supports the
effectiveness of our method.
prompt template, Model1 achieves less impressive results, suggesting the importance of
incorporating knowledge into the template. For most of the results, Model2 achieves
the best performance among the three comparison models, suggesting that the knowledge
verbalizer may not have a significant effect on these datasets.
6. Conclusions
In this paper, we present SKPT for few-shot text classification, which enhances knowl-
edge in three phases. First, we insert structured knowledge prompt template tokens into a
manual template based on open triples, and these virtual template tokens are initialized
by prior knowledge. Second, we use the improved knowledgeable verbalizer to expand
the label words based on external knowledge bases. Third, we use structured knowledge
constraints during the training phase. We find that SKPT achieves good results in the
few-shot text classification task. It achieves the highest F1 score compared with the
baselines, especially in low-resource settings.
However, our research still has certain limitations. We introduce knowledge at three
stages: constructing the prompt template, constructing the prompt verbalizer, and training.
This leads to an intricate process. In the future, we will extract more effective features
from external knowledge graphs or knowledge bases to enhance the performance of PLMs
in few-shot text classification tasks. In our study, we explore the use of SKPT for text
classification, utilizing a masked language model, since the cloze pre-training task is more
appropriate for classification tasks. However, for other generative tasks, it is more suitable
to use decoder-based pre-trained models. SKPT can also inject structured knowledge into
decoder-based models by modifying the initial manual templates. We will explore SKPT to
enhance other models of different architectures.
Author Contributions: Conceptualization, J.L. and L.Y.; methodology, J.L. and L.Y.; software, L.Y.;
validation, J.L. and L.Y.; formal analysis, J.L. and L.Y.; investigation, J.L. and L.Y.; resources, J.L.
and L.Y.; data curation, L.Y.; writing—original draft preparation, J.L. and L.Y.; writing—review and
editing, L.Y.; visualization, L.Y.; supervision, J.L. and L.Y.; project administration, J.L. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (grant
no. U193607).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All data included in this study are available upon request by contacting
the corresponding author.
Conflicts of Interest: We declare that we do not have any commercial or associative interest that
represents a conflict of interest in connection with the work submitted.
References
1. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understand-
ing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
2. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer
learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
3. Rajpurkar, P.; Zhang, J.; Lopyrev, K.; Liang, P. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of
the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 2383–2392.
4. Kowsari, K.; Jafari Meimandi, K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text classification algorithms: A survey.
Information 2019, 10, 150. [CrossRef]
5. Liu, T.; Hu, Y.; Gao, J.; Sun, Y.; Yin, B. Zero-shot text classification with semantically extended graph convolutional network.
In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021;
pp. 8352–8359.
6. Dong, B.; Yao, Y.; Xie, R.; Gao, T.; Han, X.; Liu, Z.; Lin, F.; Lin, L.; Sun, M. Meta-information guided meta-learning for few-shot
relation classification. In Proceedings of the 28th International Conference on Computational Linguistics, Online, 8–13 December
2020; pp. 1594–1605.
7. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al.
Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
8. Shin, R.; Lin, C.; Thomson, S.; Chen, C., Jr.; Roy, S.; Platanios, E.A.; Pauls, A.; Klein, D.; Eisner, J.; Van Durme, B. Constrained
Language Models Yield Few-Shot Semantic Parsers. In Proceedings of the 2021 Conference on Empirical Methods in Natural
Language Processing, Online, 7–11 November 2021; pp. 7699–7715.
9. Shin, T.; Razeghi, Y.; Logan IV, R.L.; Wallace, E.; Singh, S. AutoPrompt: Eliciting Knowledge from Language Models with
Automatically Generated Prompts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
(EMNLP), Online, 16–20 November 2020; pp. 4222–4235.
10. Kolluru, K.; Adlakha, V.; Aggarwal, S.; Mausam; Chakrabarti, S. OpenIE6: Iterative Grid Labeling and Coordination Analysis for
Open Information Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
(EMNLP), Online, 16–20 November 2020; pp. 3748–3761.
11. Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. Adv. Neural Inf. Process. Syst.
2015, 28, 649–657.
12. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.;
et al. Dbpedia—A large-scale, multilingual knowledge base extracted from wikipedia. Semant. Web 2015, 6, 167–195. [CrossRef]
13. Liu, X.; Zheng, Y.; Du, Z.; Ding, M.; Qian, Y.; Yang, Z.; Tang, J. GPT understands, too. arXiv 2023, arXiv:2103.10385.
14. Han, X.; Zhao, W.; Ding, N.; Liu, Z.; Sun, M. Ptr: Prompt tuning with rules for text classification. AI Open 2022, 3, 182–192.
[CrossRef]
15. Schick, T.; Schütze, H. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. In
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume,
Online, 19–23 April 2021; pp. 255–269.
16. Razniewski, S.; Yates, A.; Kassner, N.; Weikum, G. Language models as or for knowledge bases. arXiv 2021, arXiv:2110.04888.
17. Li, X.L.; Liang, P. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting
of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
(Volume 1: Long Papers), Virtual, 1–6 August 2021; pp. 4582–4597.
18. Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021
Conference on Empirical Methods in Natural Language Processing, Online, 7–11 November 2021; pp. 3045–3059.
19. Lee, L.; Johnson, M.; Toutanova, K.; Roark, B.; Frermann, L.; Cohen, S.B.; Lapata, M. Transactions of the Association for Computational
Linguistics; MIT Press: Cambridge, MA, USA, 2017; Volume 5.
20. Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual
Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language
Processing (Volume 1: Long Papers), Virtual, 1–6 August 2021; pp. 3816–3830.
21. Liu, J.; Liu, A.; Lu, X.; Welleck, S.; West, P.; Le Bras, R.; Choi, Y.; Hajishirzi, H. Generated Knowledge Prompting for Commonsense
Reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
Dublin, Ireland, 22–27 May 2022; pp. 3154–3169.
22. Zhai, J.; Zheng, X.; Wang, C.D.; Li, H.; Tian, Y. Knowledge prompt-tuning for sequential recommendation. In Proceedings of the
31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 6451–6461.
23. Hu, S.; Ding, N.; Wang, H.; Liu, Z.; Wang, J.; Li, J.; Wu, W.; Sun, M. Knowledgeable Prompt-tuning: Incorporating Knowledge
into Prompt Verbalizer for Text Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 2225–2240.
24. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data.
Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795.
25. Nickel, M.; Tresp, V.; Kriegel, H.P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the
28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011.
26. Yang, B.; Yih, W.t.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases.
arXiv 2014, arXiv:1412.6575.
27. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI
Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28.
28. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings
of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
29. Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings
of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
30. Zhang, S.; Tay, Y.; Yao, L.; Liu, Q. Quaternion Knowledge Graph Embeddings. In Proceedings of the NeurIPS, Vancouver, BC,
Canada, 8–14 December 2019; pp. 2731–2741.
31. Socher, R.; Chen, D.; Manning, C.D.; Ng, A. Reasoning with neural tensor networks for knowledge base completion. Adv. Neural
Inf. Process. Syst. 2013, 26, 926–934.
32. Wang, X.; Gao, T.; Zhu, Z.; Zhang, Z.; Liu, Z.; Li, J.; Tang, J. KEPLER: A unified model for knowledge embedding and pre-trained
language representation. Trans. Assoc. Comput. Linguist. 2021, 9, 176–194. [CrossRef]
33. He, B.; Zhou, D.; Xiao, J.; Jiang, X.; Liu, Q.; Yuan, N.J.; Xu, T. BERT-MK: Integrating Graph Contextualized Knowledge into
Pre-trained Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020,
Online, 16–20 November 2020; pp. 2281–2290.
34. Peters, M.E.; Neumann, M.; Logan, R.; Schwartz, R.; Joshi, V.; Singh, S.; Smith, N.A. Knowledge Enhanced Contextual Word
Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019;
pp. 43–54.
35. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph and text jointly embedding. In Proceedings of the 2014 Conference on
Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1591–1601.
36. Sun, T.; Shao, Y.; Qiu, X.; Guo, Q.; Hu, Y.; Huang, X.J.; Zhang, Z. CoLAKE: Contextualized Language and Knowledge Embedding.
In Proceedings of the 28th International Conference on Computational Linguistics, Online, 8–13 December 2020; pp. 3660–3670.
37. Xu, W.; Fang, M.; Yang, L.; Jiang, H.; Liang, G.; Zuo, C. Enabling language representation with knowledge graph and structured
semantic information. In Proceedings of the 2021 International Conference on Computer Communication and Artificial
Intelligence (CCAI), Guangzhou, China, 7–9 May 2021; pp. 91–96.
38. RelatedWords. RelatedWords. 2021. Available online: https://relatedwords.org/ (accessed on 9 April 2024).
39. Miller, G.A. WordNet. In Proceedings of the Workshop on Speech and Natural Language—HLT ’91, Harriman, NY, USA, 23–26
February 1992. [CrossRef]
40. Speer, R.; Chin, J.; Havasi, C. Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI
Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
41. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly
Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.