Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models

P Manakul, A Liusie, MJF Gales - arXiv preprint arXiv:2303.08896, 2023 - arxiv.org
Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly
fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate …

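The entry above names a sampling-consistency approach to zero-resource hallucination detection. Below is a minimal sketch of that idea under stated assumptions: `generate` is a hypothetical callable standing in for any temperature-sampled LLM call, and plain token overlap stands in for the paper's scoring variants (e.g., NLI- or QA-based scoring); it is an illustration, not the authors' implementation.

```python
# Sketch of sampling-based consistency checking: re-sample the model several
# times and flag sentences of the main response that are poorly supported by
# the re-sampled outputs as possible hallucinations.
from typing import Callable, List

def selfcheck_overlap(
    main_response: str,
    prompt: str,
    generate: Callable[[str], str],  # hypothetical LLM call, sampled with temperature > 0
    n_samples: int = 5,
) -> List[float]:
    samples = [generate(prompt) for _ in range(n_samples)]
    scores = []
    for sentence in main_response.split(". "):
        sent_tokens = set(sentence.lower().split())
        if not sent_tokens:
            continue
        # Fraction of the sentence's tokens found in each sampled response;
        # low average support suggests the sentence may be hallucinated.
        support = [
            len(sent_tokens & set(s.lower().split())) / len(sent_tokens)
            for s in samples
        ]
        scores.append(1.0 - sum(support) / len(support))  # higher = less supported
    return scores
```

In practice the overlap heuristic would be replaced by a stronger consistency scorer, but the structure (one main response, several stochastic samples, per-sentence support scores) carries over.
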
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Med-HALT: Medical domain hallucination test for large language models

A Pal, LK Umapathi, M Sankarasubbu - arXiv preprint arXiv:2307.15343, 2023 - arxiv.org
This research paper focuses on the challenges posed by hallucinations in large language
models (LLMs), particularly in the context of the medical domain. Hallucination, wherein …

A culturally sensitive test to evaluate nuanced GPT hallucination

TR McIntosh, T Liu, T Susnjak, P Watters… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
The Generative Pre-trained Transformer (GPT) models, renowned for generating human-like
text, occasionally produce “hallucinations”: outputs that diverge from human expectations …

Red teaming language model detectors with language models

Z Shi, Y Wang, F Yin, X Chen, KW Chang… - Transactions of the …, 2024 - direct.mit.edu
The prevalence and strong capability of large language models (LLMs) present significant
safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive …

CLIFF: Contrastive learning for improving faithfulness and factuality in abstractive summarization

S Cao, L Wang - arXiv preprint arXiv:2109.09209, 2021 - arxiv.org
We study generating abstractive summaries that are faithful and factually consistent with the
given articles. A novel contrastive learning formulation is presented, which leverages both …

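The CLIFF entry above describes a contrastive learning formulation for faithful abstractive summarization. The following is a simplified InfoNCE-style sketch of such an objective, assuming precomputed embeddings for the source document, a faithful summary, and perturbed (unfaithful) summaries; the names and the doc-summary similarity setup are placeholders and not the paper's exact formulation.

```python
# Sketch of a contrastive faithfulness objective: pull the faithful summary's
# representation toward the source document while pushing perturbed,
# unfaithful summaries away.
import torch
import torch.nn.functional as F

def contrastive_faithfulness_loss(
    doc_emb: torch.Tensor,   # (d,)   encoded source document
    pos_emb: torch.Tensor,   # (d,)   encoded faithful summary
    neg_embs: torch.Tensor,  # (k, d) encoded perturbed/unfaithful summaries
    temperature: float = 0.1,
) -> torch.Tensor:
    pos_sim = F.cosine_similarity(doc_emb, pos_emb, dim=0) / temperature
    neg_sims = F.cosine_similarity(doc_emb.unsqueeze(0), neg_embs, dim=1) / temperature
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sims])
    # InfoNCE-style loss: the faithful summary (index 0) should score highest.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```

The negative summaries would typically be generated by perturbing reference summaries (e.g., entity swaps) or by sampling low-faithfulness system outputs; how negatives are constructed is the main design choice in this family of methods.
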
Generating benchmarks for factuality evaluation of language models

D Muhlgay, O Ram, I Magar, Y Levine, N Ratner… - arXiv preprint arXiv …, 2023 - arxiv.org
Before deploying a language model (LM) within a given domain, it is important to measure
its tendency to generate factually incorrect information in that domain. Existing factual …