
TYPE Specialty Grand Challenge
PUBLISHED 12 January 2024
DOI 10.3389/frai.2023.1350306

Natural language processing in the era of large language models

Arkaitz Zubiaga*
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom

KEYWORDS
natural language processing, large language models (LLM), language models (LMs), specialty grand challenge, generative AI

OPEN ACCESS

EDITED AND REVIEWED BY
Thomas Hartung, Johns Hopkins University, United States

*CORRESPONDENCE
Arkaitz Zubiaga
[email protected]

RECEIVED 05 December 2023
ACCEPTED 31 December 2023
PUBLISHED 12 January 2024

CITATION
Zubiaga A (2024) Natural language processing in the era of large language models. Front. Artif. Intell. 6:1350306. doi: 10.3389/frai.2023.1350306

COPYRIGHT
© 2024 Zubiaga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

1 Overview
Since their inception in the 1980s, language models (LMs) have been around for more than four decades as a means for statistically modeling the properties observed in natural language (Rosenfeld, 2000). Given a collection of texts as input, a language model computes statistical properties of language from those texts, such as frequencies and probabilities of words and their surrounding context, which can then be used for different purposes including natural language understanding (NLU), generation (NLG), reasoning (NLR) and, more broadly, processing (NLP) (Dong et al., 2019). Such a statistical approach to modeling natural language has sparked debate for decades between those who argue that language can be modeled through the observation and probabilistic representation of patterns, and those who argue that such an approach is rudimentary and that proper understanding of language needs grounding in linguistic theories (Mitchell and Krakauer, 2023).
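
To make this concrete, the following is a minimal sketch of the kind of statistical estimate a count-based language model relies on: a bigram model that derives conditional word probabilities from raw co-occurrence counts. The toy corpus and function names are illustrative only.

```python
# Minimal illustration of a count-based (bigram) language model: it estimates
# P(current word | previous word) from raw co-occurrence counts.
# The toy corpus and function names are illustrative only.
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]

bigram_counts = defaultdict(Counter)
context_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1
        context_counts[prev] += 1

def bigram_prob(prev, curr):
    """Maximum-likelihood estimate of P(curr | prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][curr] / context_counts[prev]

print(bigram_prob("the", "cat"))  # 0.25: "the" is followed by "cat" in 1 of 4 cases
print(bigram_prob("sat", "on"))   # 1.0: "sat" is always followed by "on"
```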
It has only been recently that, as a consequence of the increased availability of text collections and access to improved computational resources, large language models (LLMs) have been introduced to the scientific community, revolutionizing the NLP field (Min et al., 2023). Following the same foundational intuition as the traditional LMs introduced in the 1980s, LLMs scale up the statistical language properties garnered from large text collections. Researchers have demonstrated that, with today's computational resources, it is possible to train much larger models from huge collections of text that can on occasion include almost the entire Web. This is however not without controversy, not least because the use of such large-scale collections of text prioritizes quantity over quality (Li et al., 2023a): one loses control of what data is being fed into the model when the whole Web is being used, which in addition to valuable information contains offensive content and misinformation (Derczynski et al., 2014; Cinelli et al., 2021; Yin and Zubiaga, 2021).
The surge of LLMs has been incremental since the late 2010s and has come in waves. Following a wave that introduced word embedding models such as word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) for the compact representation of words in the form of embeddings, the first major wave came with the emergence of LLMs built on top of the Transformer architecture (Vaswani et al., 2017), including BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019) and T5 (Raffel et al., 2020). A more recent wave has led to a surge of models for generative AI, including chatbots like ChatGPT and Google Bard, as well as open source alternatives such as LLaMa (Touvron et al., 2023), Alpaca (Taori et al., 2023) and Lemur (Xu et al., 2023). These have in turn motivated the creation of different ways of leveraging LLMs, including prompting methods (Liu et al., 2023) such as Pattern Exploiting Training (PET) (Schick and Schütze, 2021) for few-shot text classification, as well as methods for NLG (Sarsa et al., 2022). An LLM is typically pre-trained on existing large-scale datasets, which involves significant computational power and time, whereas the resulting model can later be fine-tuned to specific domains with less effort (Bakker et al., 2022).
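
As an illustration of the cloze-style prompting idea underlying methods such as PET, the sketch below reformulates sentiment classification as a fill-in-the-blank question for a pre-trained masked language model. It assumes the Hugging Face transformers library as one possible implementation; the prompt wording and the verbalizer words are illustrative choices, not the PET training procedure itself, which additionally uses such patterns to fine-tune the model on a handful of labeled examples.

```python
# Sketch of cloze-style prompting for few-shot text classification: the task is
# rephrased as a fill-in-the-blank question so that a pre-trained masked LM can
# answer it without task-specific training. The prompt wording and verbalizer
# words ("great"/"terrible") are illustrative choices, not the PET recipe.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

review = "The plot was predictable and the acting was wooden."
prompt = f"{review} All in all, the movie was <mask>."

# Score only the two verbalizer words and map the more likely one to a label.
# (How verbalizers map onto vocabulary tokens depends on the tokenizer.)
candidates = fill_mask(prompt, targets=["great", "terrible"])
best = max(candidates, key=lambda c: c["score"])
label = "positive" if best["token_str"].strip() == "great" else "negative"
print(label)
```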


In recent years, LLMs have been shown to achieve state-of-the-art performance across many NLP tasks, having in turn become the de facto baseline models in many experimental settings (Mars, 2022). There is however evidence that the power of LLMs can also be leveraged for malicious purposes, including the use of LLMs to assist with cheating on school assignments (Cotton et al., 2023), or to generate content that is offensive or spreads misinformation (Weidinger et al., 2022).

The strong performance of LLMs has also inevitably provoked fear in society that artificial intelligence tools may eventually take up many people's jobs (George et al., 2023), raising questions about the ethical implications they may have for society. This has in turn sparked research, with recent studies suggesting that AI tools should be embraced, as they can in fact support and boost the performance of, rather than replace, human labor (Noy and Zhang, 2023).

2 Limitations and open challenges

The success of LLMs is not without controversy, which is in turn shaping ongoing research in NLP and opening up avenues for further research on improving these LLMs. The following are some of the key limitations of LLMs which need further exploration.

2.1 Black box models

After the release of the first major LLM-based chatbot system to garner mainstream popularity, OpenAI's ChatGPT, concerns emerged around the black box nature of the system. Indeed, there is no publicly available information on how ChatGPT was implemented or on what data was used to train the model. From the perspective of NLP researchers, this raises serious concerns about the transparency of such a model, not only because one does not know what is going on inside it, but also because it hinders reproducibility (Belz et al., 2021). If one runs some experiments using ChatGPT on a particular date, there is no guarantee that somebody else can reproduce those results at a later date (or, arguably, even on the same date), which reduces the validity and potential for impact and generalisability of ChatGPT-based research.

To mitigate the impact, and increase our understanding, of black box models like ChatGPT, researchers have started investigating methods for reverse engineering those models, for example by trying to find out what data a model may have used for training (Shi et al., 2023).

Luckily, however, there has been a recent surge of open source models in the NLP scientific community, which has led to the release of models like Facebook's LLaMa 2 (Touvron et al., 2023) and Stanford's Alpaca (Taori et al., 2023), as well as multilingual models like BLOOM (Scao et al., 2023). Recent studies have also shown that the performance of these open source alternatives is often on par with closed models like ChatGPT (Chen et al., 2023).
2.2 Risk of data contamination

Data contamination occurs when “downstream test sets find their way into the pretrain corpus” (Magar and Schwartz, 2022). When an LLM trained on large collections of text has already seen the data it is then given at test time for evaluation, the model will exhibit an impressive yet unrealistic performance score. Research has in fact shown that data contamination can be frequent and have a significant impact (Deng et al., 2023; Golchin and Surdeanu, 2023). It is therefore crucial that researchers ensure that the test data has not been seen by an LLM before, for a fair and realistic evaluation. This is however challenging, if not nearly impossible, to figure out with black box models, which again encourages the use of open source, transparent LLMs.
with the point above on bias and fairness in LLMs, and therefore
reduces the validity and potential for impact and generalisability of
both could be studied jointly by looking at the reduction of biases
ChatGPT-based research.
and harm.
To mitigate the impact, and increase our understanding,
Some systems, such as OpenAI’s ChatGPT, acknowledge the
of black box models like ChatGPT, researchers have started
risk of producing offensive content in their terms of service1 :
investigating methods for reverse engineering those models, for
example by trying to find out what data a model may have used “Our Services may provide incomplete, incorrect, or
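
One simple way such associations can be probed, sketched below as a general illustration rather than the protocol of any of the studies cited above, is to compare the probabilities a masked language model assigns to gendered words in otherwise identical templates.

```python
# Sketch of a template-based bias probe: compare the probability a masked LM
# assigns to gendered pronouns for different professions. The model, templates
# and word lists are illustrative, not those of the cited studies.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

template = "[MASK] works as a {profession}."
for profession in ["nurse", "engineer"]:
    predictions = fill_mask(template.format(profession=profession), targets=["he", "she"])
    scores = {p["token_str"]: round(p["score"], 3) for p in predictions}
    print(profession, scores)  # a large asymmetry suggests a stereotypical association
```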
2.4 Generation of offensive content

Biases inherent in LLMs are at times exacerbated to the point of generating content that can be deemed offensive (Weidinger et al., 2021). Research in this direction is looking at how to best curate the training data fed to LLMs to avoid learning from offensive samples, as well as at eliciting the generation of those harmful texts to understand their origin (Srivastava et al., 2023). This research is highly linked with the point above on bias and fairness in LLMs, and both could therefore be studied jointly by looking at the reduction of biases and harm.

Some systems, such as OpenAI's ChatGPT, acknowledge the risk of producing offensive content in their terms of service¹:

“Our Services may provide incomplete, incorrect, or offensive Output that does not represent OpenAI's views. If Output references any third party products or services, it doesn't mean the third party endorses or is affiliated with OpenAI.”

1 https://openai.com/policies/terms-of-use


2.5 Privacy

LLMs can also capture sensitive information from their training data. While this information is encoded in embeddings which are not human readable, it has been found (Pan et al., 2020) that an adversarial user can reverse engineer those embeddings to recover the sensitive information, which can have damaging consequences for the individuals concerned. While research investigating these vulnerabilities of LLMs is still in its infancy, there is awareness of the urgency of such research to make LLMs robust to privacy attacks (Guo et al., 2022; Rigaki and Garcia, 2023; Shayegani et al., 2023).
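
As a toy illustration of the underlying threat model (a simplification, not the attack studied in the cited work), an adversary who obtains the embedding of a sensitive sentence and has access to the same or a similar encoder can score candidate strings against it and recover the closest match. The encoder name and example strings below are illustrative assumptions.

```python
# Toy illustration of the threat model (a simplification, not the published
# attack): an adversary holding the embedding of a sensitive sentence and a
# copy of the encoder scores candidate strings against it by cosine similarity.
# The encoder name and example strings are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

leaked_embedding = encoder.encode("Patient John Doe was diagnosed with diabetes")

candidates = [
    "Patient John Doe was diagnosed with diabetes",
    "Patient John Doe was diagnosed with asthma",
    "The weather in London is mild today",
]
scores = util.cos_sim(encoder.encode(candidates), leaked_embedding)
print(candidates[int(scores.argmax())])  # recovers the sensitive sentence
```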
2.6 Imperfect accuracy

Despite initial impressions that LLMs achieve impressive performance, a closer look and investigation into model outputs shows that there is significant room for improvement. Evaluation of LLMs has in turn become a fertile area of research (Chang et al., 2023).

Aware of the many shortcomings and inaccurate outputs of LLMs, the companies responsible for the production and publication of major LLMs all have disclaimers about the limitations of their models. For example, ChatGPT owner OpenAI acknowledges that:

“Output may not always be accurate. You should not rely on Output from our Services as a sole source of truth or factual information, or as a substitute for professional advice.”

Google also warns² about the limitations of its LLM-based chatbot Bard, as follows:

“Bard is an experimental technology and may sometimes give inaccurate or inappropriate information that doesn't represent Google's views.”

“Don't rely on Bard's responses as medical, legal, financial, or other professional advice.”

Facebook also has a similar disclaimer³ for its flagship model LLaMa 2:

“Llama 2's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.”

2 https://support.google.com/bard/answer/13594961?hl=en
3 https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md

2.7 Model hallucination

Responses and outputs generated by LLMs often deviate from common sense, where for example a generated text can start discussing a particular topic, then shift to another, unrelated topic in a way that is not intuitive, or even state wrong facts. LLM hallucination has been defined as “the generation of content that deviates from the real facts, resulting in unfaithful outputs” (Maynez et al., 2020; Rawte et al., 2023). Efforts toward better understanding model hallucination are focusing on different tasks, including detection, explanation, and mitigation (Alkaissi and McFarlane, 2023; Zhang et al., 2023), with some initial solutions proposed to date, such as Retrieval-Augmented Generation (RAG) (Lewis et al., 2020).
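
The intuition behind RAG is to retrieve documents relevant to the input first and then condition generation on them, so that outputs are grounded in retrieved evidence rather than in parametric memory alone. The sketch below illustrates that flow under simplifying assumptions: the corpus and TF-IDF retriever are placeholders, and `generate` is a hypothetical stand-in for whichever LLM call a real system would make.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) idea: retrieve
# documents relevant to the query, then let the LLM generate an answer that is
# conditioned on them. The corpus and TF-IDF retriever are placeholders, and
# `generate` is a hypothetical stand-in for a real LLM call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]

def retrieve(query, docs, k=1):
    vectorizer = TfidfVectorizer().fit(docs + [query])
    doc_vecs, query_vec = vectorizer.transform(docs), vectorizer.transform([query])
    ranked = cosine_similarity(query_vec, doc_vecs)[0].argsort()[::-1]
    return [docs[i] for i in ranked[:k]]

def answer(query):
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only the context below.\nContext: {context}\nQuestion: {query}\nAnswer:"
    return generate(prompt)  # hypothetical LLM call; replace with the API in use

print(retrieve("When was the Eiffel Tower completed?", documents))
```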
2.8 Lack of explainability

The complexity of LLMs means that it is often very difficult to understand why they make certain predictions or produce certain outputs. This also means that it is very difficult to provide explanations of model outputs to system users, which calls for more investigation into furthering the explainability of LLMs (Danilevsky et al., 2020; Gurrapu et al., 2023; Zhao et al., 2023).
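
One line of work in this space computes post-hoc attributions of a prediction to input tokens. The sketch below shows a simple gradient-times-input saliency over a sentiment classifier, assuming the Hugging Face transformers library and a publicly available fine-tuned checkpoint; it is one illustrative technique among many, not a complete solution to explainability.

```python
# Sketch of a simple gradient-times-input saliency explanation: attribute a
# classifier's prediction to input tokens via gradients on the input embeddings.
# The checkpoint name is an illustrative assumption; more robust attribution
# methods (e.g., integrated gradients) exist.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

enc = tokenizer("The film was surprisingly good.", return_tensors="pt")
embeddings = model.get_input_embeddings()(enc["input_ids"]).detach()
embeddings.requires_grad_(True)

logits = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"]).logits
logits[0, logits.argmax()].backward()  # gradient of the predicted class score

saliency = (embeddings.grad * embeddings).norm(dim=-1).squeeze(0)
for token, score in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), saliency.tolist()):
    print(f"{token:>15s} {score:.3f}")
```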
3 Concluding remarks

The introduction and surge in popularity of LLMs have impacted and reshaped NLP research. Slightly over a decade ago, much of NLP research and its methods focused on the representation of words using bag-of-words and TF-IDF based methods and on the use of machine learning algorithms such as Logistic Regression or Support Vector Machine classifiers. The increase in computational capacity to handle large-scale datasets and perform more complex computation has led to the renaissance of deep learning models and in turn the emergence of LLMs. The latter have been shown to achieve unprecedented performance across a range of downstream NLP tasks, but have also opened up numerous avenues for future research aiming to tackle the limitations and weaknesses of LLMs. Much of this research will need to deal with better curation of the data fed to LLMs during training, which in the current circumstances has been shown to carry severe risks in aspects such as fairness, privacy and harm.

Author contributions

AZ: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.


Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References
Alkaissi, H., and McFarlane, S. I. (2023). Artificial hallucinations in chatgpt: implications in scientific writing. Cureus 15, 2. doi: 10.7759/cureus.35179

Bakker, M., Chadwick, M., Sheahan, H., Tessler, M., Campbell-Gillingham, L., Balaguer, J., et al. (2022). Fine-tuning language models to find agreement among humans with diverse preferences. Adv. Neural Inform. Proc. Syst. 35, 38176–38189.

Belz, A., Agarwal, S., Shimorina, A., and Reiter, E. (2021). “A systematic review of reproducibility research in natural language processing,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Kerrville, TX: Association for Computational Linguistics, 381–393.

Chang, Y., Wang, X., Wang, J., Wu, Y., Zhu, K., Chen, H., et al. (2023). A survey on evaluation of large language models. arXiv. doi: 10.48550/arXiv.2307.03109

Chen, H., Jiao, F., Li, X., Qin, C., Ravaut, M., Zhao, R., et al. (2023). Chatgpt's one-year anniversary: are open-source large language models catching up? arXiv. doi: 10.48550/arXiv.2311.16989

Cinelli, M., Pelicon, A., Mozetič, I., Quattrociocchi, W., Novak, P. K., and Zollo, F. (2021). Dynamics of online hate and misinformation. Scient. Rep. 11, 22083. doi: 10.1038/s41598-021-01487-w

Cotton, D. R., Cotton, P. A., and Shipway, J. R. (2023). “Chatting and cheating: ensuring academic integrity in the era of chatgpt,” in Innovations in Education and Teaching International (Oxfordshire: Routledge), 1–12.

Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., and Sen, P. (2020). “A survey of the state of explainable ai for natural language processing,” in Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (Association for Computational Linguistics), 447–459.

Deng, C., Zhao, Y., Tang, X., Gerstein, M., and Cohan, A. (2023). Investigating data contamination in modern benchmarks for large language models. arXiv. doi: 10.48550/arXiv.2311.09783

Derczynski, L., Bontcheva, K., Lukasik, M., Declerck, T., Scharl, A., Georgiev, G., et al. (2014). “Pheme: computing veracity: the fourth challenge of big social data,” in Proceedings of ESWC EU Project Networking (Vienna: Semantic Technology Institute International).

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). “Bert: pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Kerrville, TX: Association for Computational Linguistics, 4171–4186.

Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., Wang, Y., et al. (2019). “Unified language model pre-training for natural language understanding and generation,” in Advances in Neural Information Processing Systems (Red Hook, NY: Curran Associates, Inc.), 32.

Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., et al. (2023). Bias and fairness in large language models: a survey. arXiv. doi: 10.48550/arXiv.2309.00770

George, A. S., George, A. H., and Martin, A. G. (2023). Chatgpt and the future of work: a comprehensive analysis of ai's impact on jobs and employment. Partners Universal Int. Innovat. J. 1, 154–186.

Golchin, S., and Surdeanu, M. (2023). Time travel in llms: tracing data contamination in large language models. arXiv. doi: 10.48550/arXiv.2308.08493

Guo, S., Xie, C., Li, J., Lyu, L., and Zhang, T. (2022). Threats to pre-trained language models: survey and taxonomy. arXiv. doi: 10.48550/arXiv.2202.06862

Gurrapu, S., Kulkarni, A., Huang, L., Lourentzou, I., and Batarseh, F. A. (2023). Rationalization for explainable nlp: a survey. Front. Artif. Intellig. 6, 1225093. doi: 10.3389/frai.2023.1225093

Kotek, H., Dockum, R., and Sun, D. (2023). “Gender bias and stereotypes in large language models,” in Proceedings of The ACM Collective Intelligence Conference (New York, NY: Association for Computing Machinery), 12–24.

Li, M., Zhang, Y., Li, Z., Chen, J., Chen, L., Cheng, N., et al. (2023a). From quantity to quality: boosting llm performance with self-guided data selection for instruction tuning. arXiv. doi: 10.48550/arXiv.2308.12032

Li, Y., Du, M., Song, R., Wang, X., and Wang, Y. (2023b). A survey on fairness in large language models. arXiv. doi: 10.48550/arXiv.2308.10149

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., et al. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inform. Proc. Syst. 33, 9459–9474.

Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. (2023). Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 55, 1–35. doi: 10.1145/3560815

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., et al. (2019). Roberta: a robustly optimized bert pretraining approach. arXiv. doi: 10.48550/arXiv.1907.11692

Magar, I., and Schwartz, R. (2022). “Data contamination: from memorization to exploitation,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Kerrville, TX: Association for Computational Linguistics, 157–165.

Mars, M. (2022). From word embeddings to pre-trained language models: a state-of-the-art walkthrough. Appl. Sci. 12, 8805. doi: 10.3390/app12178805

Maynez, J., Narayan, S., Bohnet, B., and McDonald, R. (2020). “On faithfulness and factuality in abstractive summarization,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Kerrville, TX: Association for Computational Linguistics, 1906.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv. doi: 10.48550/arXiv.1301.3781

Min, B., Ross, H., Sulem, E., Veyseh, A. P. B., Nguyen, T. H., Sainz, O., et al. (2023). Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput. Surv. 56, 1–40. doi: 10.1145/3605943

Mitchell, M., and Krakauer, D. C. (2023). The debate over understanding in AI's large language models. Proc. National Acad. Sci. 120, e2215907120. doi: 10.1073/pnas.2215907120

Navigli, R., Conia, S., and Ross, B. (2023). Biases in large language models: origins, inventory and discussion. ACM J. Data Inform. Qual. 15, 1–21. doi: 10.1145/3597307

Noy, S., and Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. Amsterdam: Elsevier Inc.

Pan, X., Zhang, M., Ji, S., and Yang, M. (2020). “Privacy risks of general-purpose language models,” in 2020 IEEE Symposium on Security and Privacy (SP). San Francisco, CA: IEEE, 1314–1331.

Pennington, J., Socher, R., and Manning, C. D. (2014). “Glove: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stanford, CA: Stanford University, 1532–1543.

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., et al. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 5485–5551.

Rawte, V., Chakraborty, S., Pathak, A., Sarkar, A., Tonmoy, S., Chadha, A., et al. (2023). The troubling emergence of hallucination in large language models - an extensive definition, quantification, and prescriptive remediations. arXiv. doi: 10.18653/v1/2023.emnlp-main.155

Rigaki, M., and Garcia, S. (2023). A survey of privacy attacks in machine learning. ACM Comput. Surv. 56, 1–34. doi: 10.1145/3624010

Rosenfeld, R. (2000). Two decades of statistical language modeling: where do we go from here? Proc. IEEE 88, 1270–1278. doi: 10.1109/5.880083

Sarsa, S., Denny, P., Hellas, A., and Leinonen, J. (2022). “Automatic generation of programming exercises and code explanations using large language models,” in Proceedings of the 2022 ACM Conference on International Computing Education Research (New York, NY: Association for Computing Machinery), 27–43. doi: 10.1145/3501385.3543957

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., et al. (2023). Bloom: a 176b-parameter open-access multilingual language model. arXiv. doi: 10.48550/arXiv.2211.05100

Schick, T., and Schütze, H. (2021). “Exploiting cloze-questions for few-shot text classification and natural language inference,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (Association for Computational Linguistics), 255–269.


Shayegani, E., Mamun, M. A. A., Fu, Y., Zaree, P., Dong, Y., and Abu-Ghazaleh, N. (2023). Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv. doi: 10.48550/arXiv.2310.10844

Shi, W., Ajith, A., Xia, M., Huang, Y., Liu, D., Blevins, T., et al. (2023). Detecting pretraining data from large language models. arXiv. doi: 10.48550/arXiv.2310.16789

Srivastava, A., Ahuja, R., and Mukku, R. (2023). No offense taken: eliciting offensiveness from language models. arXiv. doi: 10.48550/arXiv.2310.00892

Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., et al. (2023). Stanford Alpaca: An Instruction-Following Llama Model. Available online at: https://github.com/tatsu-lab/stanford_alpaca (accessed December 1, 2023).

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., et al. (2023). Llama: open and efficient foundation language models. arXiv. doi: 10.48550/arXiv.2302.13971

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). “Attention is all you need,” in Advances in Neural Information Processing Systems (Red Hook, NY: Curran Associates, Inc.), 30.

Wan, Y., Pu, G., Sun, J., Garimella, A., Chang, K.-W., and Peng, N. (2023). “Kelly is a warm person, Joseph is a role model”: gender biases in llm-generated reference letters. arXiv. doi: 10.18653/v1/2023.findings-emnlp.243

Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., et al. (2021). Ethical and social risks of harm from language models. arXiv. doi: 10.48550/arXiv.2112.04359

Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P.-S., Mellor, J., et al. (2022). “Taxonomy of risks posed by language models,” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. New York, NY: Association for Computing Machinery, 214–229. doi: 10.1145/3531146.3533088

Xu, Y., Su, H., Xing, C., Mi, B., Liu, Q., Shi, W., et al. (2023). Lemur: harmonizing natural language and code for language agents. arXiv. doi: 10.48550/arXiv.2310.06830

Yin, W., and Zubiaga, A. (2021). Towards generalisable hate speech detection: a review on obstacles and solutions. PeerJ Comp. Sci. 7, e598. doi: 10.7717/peerj-cs.598

Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., et al. (2023). Siren's song in the ai ocean: a survey on hallucination in large language models. arXiv. doi: 10.48550/arXiv.2309.01219

Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., et al. (2023). Explainability for large language models: a survey. arXiv. doi: 10.1145/3639372
