Beyond human data: Scaling self-training for problem-solving with language models

A Singh, JD Co-Reyes, R Agarwal, A Anand… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-tuning language models (LMs) on human-generated data remains a prevalent
practice. However, the performance of such models is often limited by the quantity and …

Amortizing intractable inference in large language models

EJ Hu, M Jain, E Elmoznino, Y Kaddar, G Lajoie… - arXiv preprint arXiv …, 2023 - arxiv.org
Autoregressive large language models (LLMs) compress knowledge from their training data
through next-token conditional distributions. This limits tractable querying of this knowledge …

STaR-GATE: Teaching language models to ask clarifying questions

C Andukuri, JP Fränken, T Gerstenberg… - arXiv preprint arXiv …, 2024 - arxiv.org
When prompting language models to complete a task, users often leave important aspects
unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023) …

Quiet-STaR: Language models can teach themselves to think before speaking

E Zelikman, G Harik, Y Shao, V Jayasiri… - arXiv preprint arXiv …, 2024 - arxiv.org
When writing and talking, people sometimes pause to think. Although reasoning-focused
works have often framed reasoning as a method of answering questions or completing …

Doing experiments and revising rules with natural language and probabilistic reasoning

T Piriyakulkij, K Ellis - arXiv preprint arXiv:2402.06025, 2024 - arxiv.org
We build a computational model of how humans actively infer hidden rules by doing
experiments. The basic principle behind the model is that, even if the rule is deterministic …

NExT: Teaching Large Language Models to Reason about Code Execution

A Ni, M Allamanis, A Cohan, Y Deng, K Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
A fundamental skill among human developers is the ability to understand and reason about
program execution. As an example, a programmer can mentally simulate code execution in …

Can a Bayesian Oracle Prevent Harm from an Agent?

Y Bengio, MK Cohen, N Malkin, M MacDermott… - arXiv preprint arXiv …, 2024 - arxiv.org
Is there a way to design powerful AI systems based on machine learning methods that would
satisfy probabilistic safety guarantees? With the long-term goal of obtaining a probabilistic …

Markovian Agents for Truthful Language Modeling

S Viteri, M Lamparth, P Chatain, C Barrett - arXiv preprint arXiv …, 2024 - arxiv.org
Chain-of-Thought (CoT) reasoning could in principle enable a deeper understanding of a
language model's (LM) internal reasoning. However, prior work suggests that some LMs …

Markovian Agents for Faithfulness of Chain-of-Thought Reasoning

S Viteri, M Lamparth, P Chatain, C Barrett - stanfordaialignment.org
Faithful and interpretable reasoning in language models can be achieved by imposing a
bottleneck on the model: generating explanatory notes on how to solve a task, and then …