A survey of language model confidence estimation and calibration

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) have demonstrated remarkable capabilities across a wide range of
tasks in various domains. Despite their impressive performance, the reliability of their output …

Navigating the grey area: How expressions of uncertainty and overconfidence affect language models

K Zhou, D Jurafsky, T Hashimoto - arXiv preprint arXiv:2302.13439, 2023 - arxiv.org
The increased deployment of LMs for real-world tasks involving knowledge and facts makes
it important to understand model epistemology: what LMs think they know, and how their …

Uncertainty in natural language processing: Sources, quantification, and applications

M Hu, Z Zhang, S Zhao, M Huang, B Wu - arXiv preprint arXiv:2306.04459, 2023 - arxiv.org
As a main field of artificial intelligence, natural language processing (NLP) has achieved
remarkable success via deep neural networks. Plenty of NLP tasks have been addressed in …

Understanding and detecting hallucinations in neural machine translation via model introspection

W Xu, S Agrawal, E Briakou, MJ Martindale… - Transactions of the …, 2023 - direct.mit.edu
Neural sequence generation models are known to “hallucinate”, by producing outputs that
are unrelated to the source text. These hallucinations are potentially harmful, yet it remains …

The art of abstention: Selective prediction and error regularization for natural language processing

J Xin, R Tang, Y Yu, J Lin - … of the 59th Annual Meeting of the …, 2021 - aclanthology.org
In selective prediction, a classifier is allowed to abstain from making predictions on low-
confidence examples. Though this setting is interesting and important, selective prediction …
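The abstention setup described in this entry can be sketched in a few lines. This is a minimal illustration, not the paper's method: the function name, the fixed threshold, and the use of the top-class probability as the confidence score are my own assumptions.

```python
import numpy as np

def selective_predict(probs, threshold=0.8):
    """Return the predicted class per example, or -1 (abstain) when the
    top-class probability falls below the confidence threshold."""
    probs = np.asarray(probs, dtype=float)
    confidences = probs.max(axis=1)   # confidence = max class probability
    preds = probs.argmax(axis=1)
    preds[confidences < threshold] = -1  # abstain on low-confidence inputs
    return preds

# A confident example is answered; an uncertain one is abstained on.
selective_predict([[0.95, 0.05], [0.55, 0.45]])  # -> array([ 0, -1])
```

In practice the threshold is tuned on held-out data to trade coverage (fraction of examples answered) against selective accuracy.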

Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration

S Feng, W Shi, Y Wang, W Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite efforts to expand the knowledge of large language models (LLMs), knowledge gaps-
-missing or outdated information in LLMs--might always persist given the evolving nature of …

On the calibration of pre-trained language models using mixup guided by area under the margin and saliency

SY Park, C Caragea - arXiv preprint arXiv:2203.07559, 2022 - arxiv.org
A well-calibrated neural model produces confidence scores (probability outputs) that closely
match its expected accuracy. While prior studies have shown that mixup training …
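The calibration notion this entry relies on is commonly measured by expected calibration error (ECE): bin predictions by confidence and average the gap between mean confidence and empirical accuracy per bin. A minimal sketch, assuming equal-width bins (the function name and bin count are my own choices, not from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width confidence bins: sum of |mean confidence -
    accuracy| per bin, weighted by the fraction of examples in the bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)  # 1.0 if prediction correct
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by bin occupancy
    return ece

# Perfectly calibrated: 75% confidence, 3 of 4 correct -> ECE = 0.
expected_calibration_error([0.75] * 4, [1, 1, 1, 0])  # -> 0.0
```

Lower ECE means the model's stated confidence tracks how often it is actually right, which is the property the mixup-based training above aims to improve.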

Challenges of neural machine translation for short texts

Y Wan, B Yang, DF Wong, LS Chao, L Yao… - Computational …, 2022 - direct.mit.edu
Short texts (STs) appear in a variety of scenarios, including queries, dialog, and entity names.
Most of the existing studies in neural machine translation (NMT) are focused on tackling …

On compositional generalization of neural machine translation

Y Li, Y Yin, Y Chen, Y Zhang - arXiv preprint arXiv:2105.14802, 2021 - arxiv.org
Modern neural machine translation (NMT) models have achieved competitive performance
in standard benchmarks such as WMT. However, there still exist significant issues such as …

Self-training sampling with monolingual data uncertainty for neural machine translation

W Jiao, X Wang, Z Tu, S Shi, MR Lyu, I King - arXiv preprint arXiv …, 2021 - arxiv.org
Self-training has proven effective for improving NMT performance by augmenting model
training with synthetic parallel data. The common practice is to construct synthetic data …