AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

P Röttger, HR Kirk, B Vidgen, G Attanasio… - arXiv preprint arXiv …, 2023 - arxiv.org
Without proper safeguards, large language models will readily follow malicious instructions
and generate toxic content. This risk motivates safety efforts such as red-teaming and large …

How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions

L Pacchiardi, AJ Chan, S Mindermann… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) can" lie", which we define as outputting false statements
despite" knowing" the truth in a demonstrable sense. LLMs might" lie", for example, when …

Algorithm of thoughts: Enhancing exploration of ideas in large language models

B Sel, A Al-Tawaha, V Khattar, R Jia, M Jin - arXiv preprint arXiv …, 2023 - arxiv.org
Current literature, aiming to surpass the "Chain-of-Thought" approach, often resorts to
external modi operandi involving halting, modifying, and then resuming the generation …

Towards LogiGLUE: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models

M Luo, S Kumbhar, M Parmar, N Varshney… - arXiv preprint arXiv …, 2023 - arxiv.org
Logical reasoning is fundamental for humans yet presents a substantial challenge in the
domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

Measuring and improving chain-of-thought reasoning in vision-language models

Y Chen, K Sikka, M Cogswell, H Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual
assistants that can parse natural queries about the visual content and generate human-like …

Evaluating language model agency through negotiations

TR Davidson, V Veselovsky, M Josifoski… - arXiv preprint arXiv …, 2024 - arxiv.org
Companies, organizations, and governments increasingly exploit Language Models' (LM)
remarkable capability to display agent-like behavior. As LMs are adopted to perform tasks …