
Volume 9, Issue 7, July – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24JUL882

Comprehending and Reducing LLM Hallucinations


Harsh1; Dr. Shobha T2
1Student, BMS College of Engineering, Bengaluru
2Assistant Professor, BMS College of Engineering, Bengaluru

Abstract:- The integration of large language models (LLMs) into many artificial intelligence applications has delivered state-of-the-art performance in tasks such as text mining, text generation, and question answering. Despite this success, the biggest concern with LLMs is the emergence of so-called "hallucinations", especially in text-generation and question-answering systems that rely on LLMs. These hallucinations may lead to the spread of misinformation or to fraud. This article explains the basics of LLM hallucinations and highlights their importance in AI. The work surveys hallucinations across a variety of tasks, including machine translation, summarization, dialogue, content generation, and question answering. Additionally, this article explores potential strategies to reduce hallucinations in order to increase the overall credibility of LLM-generated content.

Keywords:- LLMs, Hallucination, Artificial Intelligence, Hallucination Mitigation, Factualness.

I. INTRODUCTION

The field of large language models (LLMs), which includes GPT-3 (21), InstructGPT (22), FLAN (23), PaLM (24), LLaMA (25), and others, continues to evolve with new developments and large-scale collaborations. While LLMs are good at many things, they also display a flaw that undermines their reliability and trustworthiness: hallucination. Citing Berrios and Dening (30), a hallucination is thought to differ only slightly from an actual perception, the main difference being the absence of supporting evidence. This allows for a nuanced assessment of the connection between hallucination and perception. While many cognitive accounts focus on the analysis of human behaviour, hallucination in artificial systems needs to be examined in its own right. Hallucinations, defined as the generation of content that appears plausible but contains unsupported information or false facts, cause serious problems in important areas such as medicine (8), finance (22) and other sensitive domains. The question at hand is: why do large language models (LLMs) hallucinate? Factors such as a lack of real-world knowledge, bias, or misinformation in the training data can push a model to produce confident but incorrect output. The underlying problem is an incomplete grounding of concepts that leads to anomalous generation.

In this study, a hallucination refers to generated content, such as a text or a response, that appears real, coherent, and factual but deviates from or distorts the original source material or the ground truth (23). Investigating hallucinations in systems based on large language models (LLMs) is important to avoid biases that can influence decision-making strategies and lead to negative outcomes (24).

Identifying and reducing hallucinations has therefore become a priority. Since the launch of ChatGPT in 2022, the world has seen exponential growth in LLM-based applications and tools. Recently, much interest in academia and industry has been directed towards exploring side effects of LLMs, such as hallucination. In a previous study (23), sources of hallucination in downstream tasks were identified and linked back to pre-training on natural language. Techniques for writing effective prompts for LLMs, including the use of NLP metrics, human evaluation, and auxiliary LLMs, are discussed in (25). Another study (16) investigates self-correction, in which LLMs are guided or prompted to revise their own hallucinated output. In contrast to these works, our contribution is a comprehensive review of hallucination in LLMs, covering a variety of detection and mitigation methods together with their advantages and disadvantages. The main contribution of this article is an in-depth analysis of the research available on LLM hallucination. To achieve this goal, we review and categorize relevant studies across various fields and disciplines. We also discuss hallucination detection and mitigation methods for LLMs and evaluate the advantages and disadvantages of these mitigations by examining the principles behind the reported results. The final section, "Future Perspective," suggests future directions and raises open questions about current approaches.


Fig. 1. Types of Language Models (1)

II. RELATED WORK

Hallucination has been discussed in the context of GPT-4 by Bubeck and colleagues (16, 3), who discuss the challenges these outputs pose because of their risks and impact. Open-domain hallucination, which involves fabricated facts and claims beyond the immediate context, is a broader problem and is considered harder to address. This study shows that it is possible to at least partially detect such hallucinations without the need for external resources. The word "hallucination" in this work refers to generated content that is not grounded in knowledge. In practice there are two types of errors: errors that arise from incorrect knowledge (for example, the misconception that people use only 10% of their brain) and errors that arise from fabrication. The two types may require different treatments. Cleaner training data or methods such as RLHF (17) may help reduce knowledge errors. Fabrications, however, which are the focus of our research, pose a greater challenge because they cannot simply be corrected with better training data. This distinction is explained in more detail by Evans et al. (6).

Previous studies specifically examining open-domain hallucination in a setting similar to ours are limited. Some projects, such as (8), aim to understand which hallucinations are most common in a given domain. In a recent independent study in the field of healthcare, Athaluri et al. (1) evaluate hallucinated references empirically. Similar to our method, they use Google searches with exact-match strings for evaluation. Our analysis allows hallucination rates to be compared across different models and, as discussed in previous studies, the issue matters because users give more weight to model output they believe to be correct (16). A recent line of work discusses black-box techniques for testing the trustworthiness of language models (LMs). Although these studies focus on truthfulness and calibration rather than hallucination, their approach is consistent with our work. For example, Kadavath et al. (10) show that judgements about the correctness of LM predictions can be elicited directly from the LMs themselves, such as ChatGPT, through prompting. Lin et al. (12) show that an LM can express uncertainty by generating numbers or words that convey its confidence. Finally, Manakul et al. (13) perform consistency checks over multiple sampled generations. These works all rely on directly querying the model, which directly influenced the design of our study; a minimal sketch of this sampling-based consistency idea is given below. Due to space limitations, we do not delve into hallucinations in open-ended generation (e.g., paraphrasing or summarization), but instead refer to the discussion in the recent survey by Ji et al. (9).
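To make the direct-querying idea concrete, the sketch below samples several answers to the same question and uses their agreement as a rough hallucination signal, in the spirit of the consistency checks of Manakul et al. (13). It is a minimal illustration, not the method of any cited paper: the ask_llm callable, the number of samples, and the string-match agreement measure are all assumptions chosen for brevity.

```python
from collections import Counter


def consistency_score(ask_llm, question, n_samples=5):
    """Sample several answers and measure how often the most common answer recurs.

    ask_llm: assumed callable (question: str, temperature: float) -> str.
    Returns (score, most_common_answer); low agreement suggests possible hallucination.
    """
    answers = [ask_llm(question, temperature=1.0).strip().lower() for _ in range(n_samples)]
    most_common, count = Counter(answers).most_common(1)[0]
    return count / n_samples, most_common


# Hypothetical usage: flag the answer when fewer than half of the samples agree.
# score, answer = consistency_score(my_llm, "Who wrote 'The Selfish Gene'?")
# if score < 0.5:
#     print("Low self-consistency - treat the answer as unreliable:", answer)
```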


Various approaches have been adopted to improve large language models (LLMs), including strategies such as learning from human feedback (Bakker et al., 2022; Ouyang et al., 2022). Ouyang et al. improve LLM-generated content through reinforcement learning from human feedback; their recipe involves fine-tuning the LLM. However, it is known that fine-tuning can often degrade performance on other tasks (Kirkpatrick et al., 2017). In this study, we take a different approach by assuming that the model cannot be modified and that its weights are not accessible.

Another approach applicable to our context was proposed by Burns et al. (2022) and is called Contrast-Consistent Search (CCS). However, CCS needs to turn statements into questions, evaluate the LLM on two contrasting versions of each statement, and access the model's internal representations. These limitations make it impossible to apply CCS to statements generated by an LLM that is reachable only through an API. Furthermore, while CCS only increases accuracy by about 4% over zero-shot LLM querying, our method improves accuracy by almost 20% over zero-shot LLM querying.
III. HALLUCINATION DETECTION

Various methods have been proposed to verify the accuracy and reliability of large language model (LLM) output. Some methods rely on internal signals (e.g., the token log-probabilities of the generated sequence) to quantify the uncertainty of the written sequence (18), (19). However, the public APIs of models such as ChatGPT do not always give users access to this information, making such approaches impractical in many deployment settings. LLM fact-checking systems can instead consult external repositories such as Wikipedia (20) to verify generated statements, although there are concerns about the reliability of content on Wikipedia. A minimal sketch of the log-probability signal, for cases where an API does expose it, is shown below.
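As an illustration of the first family of methods, the sketch below turns per-token log-probabilities into two simple sequence-level uncertainty scores: the average negative log-likelihood and the minimum token probability. It assumes the serving stack returns a list of log-probabilities for the generated tokens; the function name and threshold are illustrative, not part of any cited system.

```python
import math


def sequence_uncertainty(token_logprobs):
    """token_logprobs: list of natural-log probabilities, one per generated token."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)   # higher = less confident overall
    min_prob = math.exp(min(token_logprobs))                # weakest single token
    return avg_nll, min_prob


# Hypothetical usage with log-probabilities returned by the serving API:
# avg_nll, min_prob = sequence_uncertainty(logprobs)
# if min_prob < 0.1:   # illustrative threshold
#     print("At least one token was generated with very low confidence.")
```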
Azaria and Mitchell (21) proposed a method for assessing the truthfulness of statements using the hidden-layer activations of an LLM, which are fed to a small multi-layer classifier. The approach requires supervised training and relies on labelled statements and on the internal states of the LLM, which may not be accessible via an API. A small sketch of this kind of probe on hidden activations is shown below.
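The following sketch illustrates the general idea of training a lightweight truthfulness probe on hidden activations. It is not the classifier of Azaria and Mitchell (21): the pooled-activation features, the use of scikit-learn logistic regression instead of a multi-layer network, and the variable names are simplifications for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_truthfulness_probe(hidden_states, labels):
    """hidden_states: array of shape (n_statements, hidden_dim), one pooled
    activation vector per statement; labels: 1 for true statements, 0 for false."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(np.asarray(hidden_states), np.asarray(labels))
    return probe


def truth_probability(probe, hidden_state):
    """Return the probe's estimate that a single statement is true."""
    return float(probe.predict_proba(np.asarray(hidden_state).reshape(1, -1))[0, 1])
```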
Kadavath et al. (22) introduced a self-evaluation approach. Their study investigates how language models can evaluate the validity of their own responses and predict their accuracy. In this setup, LLMs are prompted to judge the correctness of their previous answers and to output the probability that an answer is true. Larger models, which are well calibrated on diverse multiple-choice questions, can also make such predictions in open-ended tasks, estimating the probability that a proposed answer is correct ("P(True)"). They also estimate their confidence that they know the answer to a question ("P(IK)") without producing it (IK stands for "I know"). A minimal prompt-level sketch of this kind of self-evaluation follows.
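The sketch below shows one way to elicit a P(True)-style self-judgement purely through prompting. It is a rough approximation of the idea, not the protocol of Kadavath et al. (22), who read the probability from the model's token distribution; here the ask_llm callable and the parsing of the reply are assumptions.

```python
SELF_CHECK_TEMPLATE = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Is the proposed answer true? Reply with a single word, True or False, "
    "followed by a probability between 0 and 1."
)


def p_true(ask_llm, question, answer):
    """Ask the model to judge its own answer. ask_llm: assumed callable (prompt) -> str."""
    reply = ask_llm(SELF_CHECK_TEMPLATE.format(question=question, answer=answer))
    verdict = reply.strip().split()
    is_true = bool(verdict) and verdict[0].lower().startswith("true")
    try:
        prob = float(verdict[-1])
    except (ValueError, IndexError):
        prob = 1.0 if is_true else 0.0   # fall back to the verbal verdict alone
    return is_true, prob
```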
Various hallucination detection methods have also been developed for the "zero-resource" setting, in which no external data are available to verify the authenticity of the LLM output. These methods can be divided into grey-box methods and black-box methods (23). The former assume knowledge of the model's internal token distributions; the latter are designed for LLMs exposed through a limited API, with no access to internal states. Different techniques are therefore used for grey-box and black-box detection. Grey-box detection builds on the knowledge acquired during pre-training, in which next-token prediction over large generic corpora captures real-world knowledge and linguistic context. Figure 2 shows how the querying and verification steps work together.

Varshney et al. (24) developed a detection technique of this kind and identified hallucinations in GPT-3.5 output. Their pipeline starts by identifying the key concepts of a generation, comparing entity extraction, keyword extraction, and direct instruction of the model; they use the LLM itself to extract the important concepts from the generated text. A comparison of the three methods shows that instructing the model identifies the key concepts better than entity or keyword extraction. They score each concept with the minimum of its token probabilities and extend the method with a verification-question generation step, based on an answer-aware question-generation model, and with web searches to answer the validation questions. This approach achieved an impressive recall of 88% on GPT-3.5. A simplified sketch of this concept-level scoring is given below.
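The sketch below illustrates the grey-box part of such a pipeline: each key concept is scored by the minimum probability of the tokens that realise it, and low-scoring concepts are queued for verification. The span mapping, the data shapes, and the threshold are illustrative assumptions, not the implementation of Varshney et al. (24).

```python
import math


def flag_uncertain_concepts(concepts, token_spans, token_logprobs, threshold=0.1):
    """concepts: list of concept strings; token_spans: dict mapping each concept to the
    indices of its tokens in the generation; token_logprobs: per-token log-probabilities.
    Returns the concepts whose weakest token falls below the probability threshold."""
    flagged = []
    for concept in concepts:
        probs = [math.exp(token_logprobs[i]) for i in token_spans[concept]]
        if min(probs) < threshold:
            flagged.append((concept, min(probs)))
    return flagged


# Concepts returned here would then be turned into verification questions and
# checked against retrieved evidence (e.g., web search results).
```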


IV. MITIGATING LLM HALLUCINATIONS

Mitigating hallucinations in large language models (LLMs) has become a significant challenge, especially with the worldwide proliferation of LLM-based virtual chatbot agents and question answering systems. Although many methods have been published recently to address this problem, some of them are only partially effective and can even introduce additional hallucinations into the LLM output. Varshney et al. (24) proposed an effective method that reduces hallucination in GPT-3.5 by about 33%. Once a hallucination is detected, the model is prompted to repair it in its output; this process involves removing or rewriting the inaccurate information on the basis of the retrieved evidence (a sketch of such a detect-and-repair loop follows the list below). Although LLM hallucination has become a concern only recently, many mitigation methods based on different principles have been proposed. These methods can be divided into the following categories:
● Fine-tuning
● Knowledge Graphs
● Memory augmentation
● Context Prompts
● Preemptive Strategies
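The following sketch strings the earlier detection step together with a repair step: flagged concepts are turned into verification questions, evidence is retrieved, and the model is asked to rewrite its answer so that it agrees with that evidence. The ask_llm and search callables and the prompt wording are assumptions for illustration; the cited systems differ in their details.

```python
def detect_and_repair(ask_llm, search, question, draft_answer, flagged_concepts):
    """Rewrite a draft answer so it is consistent with retrieved evidence.

    ask_llm: assumed callable (prompt: str) -> str.
    search: assumed callable (query: str) -> str returning a short evidence snippet.
    flagged_concepts: concepts marked as low-confidence by the detection step."""
    evidence = []
    for concept in flagged_concepts:
        verification_q = ask_llm(
            f"Write a short factual question that checks the claim about "
            f"'{concept}' in this answer:\n{draft_answer}"
        )
        evidence.append(f"Q: {verification_q}\nEvidence: {search(verification_q)}")

    repair_prompt = (
        f"Question: {question}\n"
        f"Draft answer: {draft_answer}\n"
        "Evidence:\n" + "\n".join(evidence) + "\n"
        "Rewrite the draft answer so that every claim is supported by the evidence; "
        "remove any claim the evidence does not support."
    )
    return ask_llm(repair_prompt)
```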
Figure 3 provides a visual representation of mitigation methods along with their respective pros and cons. Fine-tuning, a widely adopted technique in machine learning to specialize a pre-trained model with a limited dataset [15], has been employed to mitigate hallucinations in Large Language Models (LLMs), as demonstrated by Lee et al. [16]. However, the high parameter count of LLMs, often in the billions, makes fine-tuning a resource-intensive solution. Knowledge graph methods offer the integration of structured and unstructured knowledge, providing LLMs with a broader foundation for various tasks [17]. However, the challenge lies in the time-consuming process of designing a well-curated knowledge base and the labor-intensive effort required to maintain up-to-date knowledge. Wu et al. [18] proposed a memory-augmented transformer for knowledge-intensive Natural Language Processing (NLP) tasks to address the need for deep learning methods to expand their capabilities with new knowledge. Although memory augmentation has benefited NLP models, its applicability to LLMs remains untested. Prompt-based solutions have recently emerged as a means to "de-hallucinate" LLMs. Jha et al. [19] introduced a self-monitoring prompting framework leveraging formal methods to autonomously identify errors in LLM responses. This framework utilizes the conversational abilities of LLMs to align responses with specified correctness criteria through iterative refinement. Luo et al. [10] proposed Self-Familiarity, a method challenging existing State-of-the-Art (SOTA) techniques by introducing a zero-resource, pre-detection approach to mitigate the risk of LLMs producing inaccurate information. This method extracts conceptual entities from the instruction and employs prompt engineering to derive a familiarity score for each concept. Low instruction-level familiarity scores indicate a higher likelihood of the LLM generating erroneous information, prompting it to refrain from generating a response; a minimal sketch of this abstention logic is given below.
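The sketch below captures the pre-detection idea in a few lines: score the familiarity of each concept mentioned in the instruction before answering, and abstain when the least familiar concept falls below a threshold. The scoring prompt, the 0-to-1 scale, and the threshold are assumptions for illustration, not the actual Self-Familiarity procedure of Luo et al. [10].

```python
def answer_or_abstain(ask_llm, extract_concepts, instruction, threshold=0.3):
    """Abstain when the model appears unfamiliar with a concept in the instruction.

    ask_llm: assumed callable (prompt: str) -> str.
    extract_concepts: assumed callable (text: str) -> list of concept strings."""
    scores = {}
    for concept in extract_concepts(instruction):
        reply = ask_llm(
            f"On a scale from 0 to 1, how familiar are you with '{concept}'? "
            "Reply with a single number."
        )
        try:
            scores[concept] = float(reply.strip())
        except ValueError:
            scores[concept] = 0.0   # unparseable reply is treated as unfamiliar

    if scores and min(scores.values()) < threshold:
        return "I am not confident enough about this topic to answer reliably."
    return ask_llm(instruction)
```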
Feldman et al. [11] developed a method based on context-tagged prompts. They formulated a set of questions and created context prompts to assist LLMs in providing more accurate answers; validation of the context prompts and questions ensured they worked as intended. Experiments with various GPT models were conducted to evaluate the impact of context prompts on the accuracy of LLM responses. A small sketch of a context-tagged prompt is shown below.
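The sketch below shows the general shape of a context-tagged prompt: retrieved or curated context is labelled explicitly and the model is told to rely only on it. The tag format and wording are assumptions for illustration, not the exact templates of Feldman et al. [11].

```python
def context_tagged_prompt(context_passages, question):
    """Build a prompt that labels each supporting passage and instructs the model
    to answer only from the tagged context."""
    tagged = "\n".join(
        f"[CONTEXT {i + 1}] {passage}" for i, passage in enumerate(context_passages)
    )
    return (
        f"{tagged}\n\n"
        "Using only the tagged context above, answer the question. "
        "If the context does not contain the answer, say so.\n"
        f"Question: {question}"
    )


# Hypothetical usage:
# prompt = context_tagged_prompt(["Paris is the capital of France."],
#                                "What is the capital of France?")
# answer = ask_llm(prompt)   # ask_llm is an assumed model-call helper
```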
V. FUTURE PERSPECTIVE

This section outlines considerations regarding Large Language Models' (LLMs) hallucinations and mitigation strategies. Current developments in zero-resource hallucination detection are in their nascent stages, suggesting potential avenues for future exploration to enhance the accuracy and reliability of these techniques across a broader spectrum of scenarios. Black-box hallucination detection poses additional challenges due to the absence of access to the LLM's internal states. Future research in this area could focus on devising novel black-box hallucination detection methods or optimizing existing approaches for greater effectiveness. Another aspect to explore is hallucination detection tailored for specific tasks. While current techniques are generally applicable, task-specific customization may yield more effective results. For example, designing hallucination detection methods for factual question answering could leverage the understanding that factually accurate responses are more likely to be grounded in real-world knowledge. Multimodal LLMs, a novel category capable of handling text, images, and other media types, present a unique challenge for hallucination detection. Despite the complexity, addressing hallucination detection in multimodal LLMs is crucial due to their increasing popularity.

VI. CONCLUSION

In summary, this paper provides a comprehensive review of the hallucination phenomena observed in large language models (LLMs). It classifies different types of hallucination and explores their root causes, which arise from limitations in knowledge, models and inference methods. The authors recommend several mitigation strategies, including improving data quality, improving model design, and incorporating robust verification processes. They also highlight the need to develop reliable evaluation measures to assess the effectiveness of these strategies. Through a combination of theoretical insights and empirical experiments, this article demonstrates the potential of these techniques to reduce LLM hallucinations and thus make LLM output more reliable and effective.

REFERENCES

[1]. V. Raunak, A. Menezes, M. Junczys-Dowmunt, The curious case of hallucinations in neural machine translation, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 1172–1183. URL: https://aclanthology.org/2021.naacl-main.92. doi:10.18653/v1/2021.naacl-main.92.
[2]. N. M. Guerreiro, D. Alves, J. Waldendorf, B. Haddow, A. Birch, P. Colombo, A. Martins, Hallucinations in large multilingual translation models, ArXiv abs/2303.16104 (2023). URL: https://api.semanticscholar.org/CorpusID:257771892.
[3]. D. Dale, E. Voita, J. Lam, P. Hansanti, C. Ropers, E. Kalbassi, C. Gao, L. Barrault, M. R. Costa-jussà, Halomi: A manually annotated benchmark for multilingual hallucination and omission detection in machine translation, ArXiv abs/2305.11746 (2023). URL: https://api.semanticscholar.org/CorpusID:258823059.
[4]. J. Pfeiffer, F. Piccinno, M. Nicosia, X. Wang, M. Reid, S. Ruder, mmt5: Modular multilingual pre-training solves source language hallucinations, ArXiv abs/2305.14224 (2023). URL: https://api.semanticscholar.org/CorpusID:258841429.
[5]. S. Lin, J. Hilton, O. Evans, TruthfulQA: Measuring how models mimic human falsehoods, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3214–3252. URL: https://aclanthology.org/2022.acl-long.229. doi:10.18653/v1/2022.acl-long.229.
[6]. L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. Gonzalez, I. C. Stoica, Judging llm-as-a-judge with mt-bench and chatbot arena, ArXiv abs/2306.05685 (2023). URL: https://api.semanticscholar.org/CorpusID:259129398.


[7]. V. Adlakha, P. BehnamGhader, X. H. Lu, N. Meade, S. Reddy, Evaluating correctness and faithfulness of instruction-following models for question answering, ArXiv abs/2307.16877 (2023). URL: https://api.semanticscholar.org/CorpusID:260334056.
[8]. L. K. Umapathi, A. Pal, M. Sankarasubbu, Med-halt: Medical domain hallucination test for large language models, ArXiv abs/2307.15343 (2023). URL: https://api.semanticscholar.org/CorpusID:260316324.
[9]. N. Dziri, S. Milton, M. Yu, O. Zaiane, S. Reddy, On the origin of hallucinations in conversational models: Is it the datasets or the models?, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 5271–5285. URL: https://aclanthology.org/2022.naacl-main.387. doi:10.18653/v1/2022.naacl-main.387.
[10]. S. Das, S. Saha, R. Srihari, Diving deep into modes of fact hallucinations in dialogue systems, in: Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 684–699. URL: https://aclanthology.org/2022.findings-emnlp.48. doi:10.18653/v1/2022.findings-emnlp.48.
[11]. N. M. Guerreiro, D. Alves, J. Waldendorf, B. Haddow, A. Birch, P. Colombo, A. Martins, Hallucinations in large multilingual translation models, ArXiv abs/2303.16104 (2023). URL: https://api.semanticscholar.org/CorpusID:257771892.
[12]. D. Dale, E. Voita, J. Lam, P. Hansanti, C. Ropers, E. Kalbassi, C. Gao, L. Barrault, M. R. Costa-jussà, Halomi: A manually annotated benchmark for multilingual hallucination and omission detection in machine translation, ArXiv abs/2305.11746 (2023). URL: https://api.semanticscholar.org/CorpusID:258823059.
[13]. J. Pfeiffer, F. Piccinno, M. Nicosia, X. Wang, M. Reid, S. Ruder, mmt5: Modular multilingual pre-training solves source language hallucinations, ArXiv abs/2305.14224 (2023). URL: https://api.semanticscholar.org/CorpusID:258841429.
[14]. S. Lin, J. Hilton, O. Evans, TruthfulQA: Measuring how models mimic human falsehoods, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3214–3252. URL: https://aclanthology.org/2022.acl-long.229. doi:10.18653/v1/2022.acl-long.229.
[15]. L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. Gonzalez, I. C. Stoica, Judging llm-as-a-judge with mt-bench and chatbot arena, ArXiv abs/2306.05685 (2023). URL: https://api.semanticscholar.org/CorpusID:259129398.
[16]. V. Adlakha, P. BehnamGhader, X. H. Lu, N. Meade, S. Reddy, Evaluating correctness and faithfulness of instruction-following models for question answering, ArXiv abs/2307.16877 (2023). URL: https://api.semanticscholar.org/CorpusID:260334056.
[17]. L. K. Umapathi, A. Pal, M. Sankarasubbu, Med-halt: Medical domain hallucination test for large language models, ArXiv abs/2307.15343 (2023). URL: https://api.semanticscholar.org/CorpusID:260316324.
[18]. N. Dziri, S. Milton, M. Yu, O. Zaiane, S. Reddy, On the origin of hallucinations in conversational models: Is it the datasets or the models?, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 5271–5285. URL: https://aclanthology.org/2022.naacl-main.387. doi:10.18653/v1/2022.naacl-main.387.
[19]. S. Das, S. Saha, R. Srihari, Diving deep into modes of fact hallucinations in dialogue systems, in: Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 684–699. URL: https://aclanthology.org/2022.findings-emnlp.48. doi:10.18653/v1/2022.findings-emnlp.48.
[20]. N. Dziri, E. Kamalloo, S. Milton, O. Zaiane, M. Yu, E. M. Ponti, S. Reddy, FaithDial: A Faithful Benchmark for Information-Seeking Dialogue, Transactions of the Association for Computational Linguistics 10 (2022) 1473–1490. URL: https://doi.org/10.1162/tacl_a_00529. doi:10.1162/tacl_a_00529.
[21]. N. Dziri, H. Rashkin, T. Linzen, D. Reitter, Evaluating attribution in dialogue systems: The begin benchmark, Transactions of the Association for Computational Linguistics 10 (2021) 1066–1083. URL: https://api.semanticscholar.org/CorpusID:233481654.
[22]. W. Sun, Z. Shi, S. Gao, P. Ren, M. de Rijke, Z. Ren, Contrastive learning reduces hallucination in conversations, Proceedings of the AAAI Conference on Artificial Intelligence 37 (2023) 13618–13626. URL: https://ojs.aaai.org/index.php/AAAI/article/view/26596. doi:10.1609/aaai.v37i11.26596.
[23]. D. Tam, A. Mascarenhas, S. Zhang, S. Kwan, M. Bansal, C. Raffel, Evaluating the factual consistency of large language models through news summarization, in: Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 5220–5255. URL: https://aclanthology.org/2023.findings-acl.322. doi:10.18653/v1/2023.findings-acl.322.


[24]. M. Cao, Y. Dong, J. Cheung, Hallucinated but factual! inspecting the factuality of hallucinations in abstractive summarization, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3340–3354. URL: https://aclanthology.org/2022.acl-long.236. doi:10.18653/v1/2022.acl-long.236.
[25]. J. Shen, J. Liu, D. Finnie, N. Rahmati, M. Bendersky, M. Najork, "why is this misleading?": Detecting news headline hallucinations with explanations, in: Proceedings of the ACM Web Conference 2023, WWW '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1662–1672. URL: https://doi.org/10.1145/3543507.3583375. doi:10.1145/3543507.3583375.
[26]. Y. Qiu, Y. Ziser, A. Korhonen, E. Ponti, S. B. Cohen, Detecting and mitigating hallucinations in multilingual summarisation, ArXiv abs/2305.13632 (2023). URL: https://api.semanticscholar.org/CorpusID:258841008.
[27]. J. Yu, X. Wang, S. Tu, S. Cao, D. Zhang-li, X. Lv, H. Peng, Z. Yao, X. Zhang, H. Li, C. yan Li, Z. Zhang, Y. Bai, Y.-T. Liu, A. Xin, N. Lin, K. Yun, L. Gong, J. Chen, Z. Wu, Y. P. Qi, W. Li, Y. Guan, K. Zeng, J. Qi, H. Jin, J. Liu, Y. Gu, Y. Gu, Y. Yao, N. Ding, L. Hou, Z. Liu, B. Xu, J. Tang, J. Li, Kola: Carefully benchmarking world knowledge of large language models, ArXiv abs/2306.09296 (2023). URL: https://api.semanticscholar.org/CorpusID:259165244.
[28]. N. Mihindukulasooriya, S. M. Tiwari, C. F. Enguix, K. Lata, Text2kgbench: A benchmark for ontology-driven knowledge graph generation from text, ArXiv abs/2308.02357 (2023). URL: https://api.semanticscholar.org/CorpusID:260611736.
[29]. Y. Li, Y. Du, K. Zhou, J. Wang, W. X. Zhao, J. rong Wen, Evaluating object hallucination in large vision-language models, ArXiv abs/2305.10355 (2023). URL: https://api.semanticscholar.org/CorpusID:258740697.
[30]. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, et al., Language models are few-shot learners, in: Advances in Neural Information Processing Systems, volume 33, 2020.
[31]. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, R. Lowe, Training language models to follow instructions with human feedback, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, volume 35, Curran Associates, Inc., 2022, pp. 27730–27744. URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf.
[32]. J. Wei, M. Bosma, V. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, Q. V. Le, Finetuned language models are zero-shot learners, in: International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=gEZrGCozdqR.
[33]. A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. M. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. C. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. García, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Díaz, O. Firat, M. Catasta, J. Wei, K. S. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, N. Fiedel, Palm: Scaling language modeling with pathways, ArXiv abs/2204.02311 (2022). URL: https://api.semanticscholar.org/CorpusID:247951931.
[34]. H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, ArXiv abs/2302.13971 (2023).
