Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

Uncertainty in natural language processing: Sources, quantification, and applications

M Hu, Z Zhang, S Zhao, M Huang, B Wu - arXiv preprint arXiv:2306.04459, 2023 - arxiv.org
As a core field of artificial intelligence, natural language processing (NLP) has achieved
remarkable success via deep neural networks. Many NLP tasks have been addressed in …

LM vs LM: Detecting factual errors via cross examination

R Cohen, M Hamri, M Geva, A Globerson - arXiv preprint arXiv …, 2023 - arxiv.org
A prominent weakness of modern language models (LMs) is their tendency to generate
factually incorrect text, which hinders their usability. A natural question is whether such …
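A minimal sketch of the cross-examination loop in Python. Here query_lm is a hypothetical stand-in for any LM API call, and the prompts and fixed round count are illustrative assumptions, not the paper's exact protocol:

    def cross_examine(claim, query_lm, rounds=3):
        """Return True if the examiner LM judges the claim factual.

        query_lm: callable mapping a prompt string to a reply string
        (a hypothetical stand-in for any chat-completion API).
        """
        transcript = ["Claim under examination: " + claim]
        for _ in range(rounds):
            # The examiner LM probes the claim with a follow-up question.
            question = query_lm(
                "You are fact-checking the claim below. Ask one question "
                "whose answer would reveal an error if the claim is false.\n"
                + "\n".join(transcript))
            # The examinee LM (the model that produced the claim) answers.
            answer = query_lm("Answer concisely: " + question)
            transcript += ["Q: " + question, "A: " + answer]
        # The examiner issues a verdict from the full transcript.
        verdict = query_lm(
            "Given this examination, is the original claim factually "
            "correct? Reply YES or NO.\n" + "\n".join(transcript))
        return verdict.strip().upper().startswith("YES")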

Crawling the internal knowledge-base of language models

R Cohen, M Geva, J Berant, A Globerson - arXiv preprint arXiv …, 2023 - arxiv.org
Language models are trained on large volumes of text, and as a result their parameters
might contain a significant body of factual knowledge. Any downstream task performed by …
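The crawling idea can be sketched as a breadth-first traversal over entities the model mentions. Both query_lm and the subject | relation | object line format are assumptions for illustration; the paper's actual prompting scheme is more elaborate:

    from collections import deque

    def crawl_knowledge(seed, query_lm, max_entities=100):
        """Extract (subject, relation, object) triples by repeatedly
        prompting the LM and enqueueing newly mentioned entities."""
        triples, frontier, seen = [], deque([seed]), {seed}
        while frontier and len(seen) < max_entities:
            entity = frontier.popleft()
            # Ask the LM to verbalize facts it stores about this entity.
            reply = query_lm(
                "List facts about " + entity + ", one per line, "
                "formatted as: subject | relation | object")
            for line in reply.splitlines():
                parts = [p.strip() for p in line.split("|")]
                if len(parts) != 3:
                    continue  # skip malformed lines
                subj, rel, obj = parts
                triples.append((subj, rel, obj))
                if obj not in seen:  # new entity becomes a crawl target
                    seen.add(obj)
                    frontier.append(obj)
        return triples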

GLUE-X: Evaluating natural language understanding models from an out-of-distribution generalization perspective

L Yang, S Zhang, L Qin, Y Li, Y Wang, H Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained language models (PLMs) are known to improve the generalization performance
of natural language understanding models by leveraging large amounts of data during the …

Uncertainty quantification with pre-trained language models: A large-scale empirical analysis

Y Xiao, PP Liang, U Bhatt, W Neiswanger… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained language models (PLMs) have gained increasing popularity due to their
compelling prediction performance in diverse natural language processing (NLP) tasks …
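A standard metric in such empirical analyses is expected calibration error (ECE), which compares a model's stated confidence to its observed accuracy. A generic NumPy version (not code from the paper):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE: |accuracy - mean confidence| per equal-width bin,
        weighted by the fraction of samples falling in that bin."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if not mask.any():
                continue
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() = bin weight
        return ece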

Selectively answering ambiguous questions

JR Cole, MJQ Zhang, D Gillick, JM Eisenschlos… - arXiv preprint arXiv …, 2023 - arxiv.org
Trustworthy language models should abstain from answering questions when they do not
know the answer. However, the answer to a question can be unknown for a variety of …
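One simple abstention recipe in this vein is repeated sampling: draw several answers and respond only when they agree often enough, treating the agreement rate as a confidence proxy. In this sketch, sample_answer is a hypothetical temperature-sampled LM call and the threshold is arbitrary:

    from collections import Counter

    def answer_or_abstain(question, sample_answer, k=10, threshold=0.7):
        """Return the majority answer among k samples, or None to abstain."""
        samples = [sample_answer(question) for _ in range(k)]
        best, count = Counter(samples).most_common(1)[0]
        # Agreement among stochastic samples serves as confidence.
        return best if count / k >= threshold else None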

Improving selective visual question answering by learning from your peers

C Dancette, S Whitehead… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite advances in Visual Question Answering (VQA), the ability of models to
assess their own correctness remains underexplored. Recent work has shown that VQA …

Reliable visual question answering: Abstain rather than answer incorrectly

S Whitehead, S Petryk, V Shakib, J Gonzalez… - … on Computer Vision, 2022 - Springer
Machine learning has advanced dramatically, narrowing the accuracy gap to
humans in multimodal tasks like visual question answering (VQA). However, while humans …
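Such selective models are typically run with a confidence threshold and scored with a penalty for wrong answers, in the spirit of the effective-reliability idea: +1 for a correct answer, -c for an incorrect one, 0 for abstaining. The sketch below is illustrative, not the paper's evaluation code:

    def effective_reliability(confidences, correct, threshold, c=1.0):
        """Mean per-question reward: +1 if answered correctly, -c if
        answered incorrectly, 0 if the model abstains (low confidence)."""
        total = 0.0
        for conf, ok in zip(confidences, correct):
            if conf < threshold:
                continue  # abstain: contributes 0 to the total
            total += 1.0 if ok else -c
        return total / len(confidences)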

The art of saying no: Contextual noncompliance in language models

F Brahman, S Kumar, V Balachandran, P Dasigi… - arXiv preprint arXiv …, 2024 - arxiv.org
Chat-based language models are designed to be helpful, yet they should not comply with
every user request. While most existing work primarily focuses on refusal of "unsafe" …