Approximate Thompson sampling via epistemic neural networks

J Lee, A Xie, A Pacchiano, Y Chandak… - Advances in …, 2024 - proceedings.neurips.cc

Large transformer models trained on diverse datasets have shown a remarkable ability to
learn in-context, achieving high few-shot performance on tasks they were not explicitly …

Cited by 56 · Related articles · All 7 versions

Epistemic neural networks

I Osband, Z Wen, SM Asghari… - Advances in …, 2023 - proceedings.neurips.cc

Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …

Cited by 120 · Related articles · All 6 versions

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Cited by 11 · Related articles · All 3 versions

Making RL with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Cited by 19 · Related articles · All 3 versions

Efficient exploration for LLMs

V Dwaracherla, SM Asghari, B Hao… - arXiv preprint arXiv …, 2024 - arxiv.org

We present evidence of substantial benefit from efficient exploration in gathering human
feedback to improve large language models. In our experiments, an agent sequentially …

Cited by 15 · Related articles · All 3 versions

Position paper: Bayesian deep learning in the age of large-scale AI

T Papamarkou, M Skoularidou, K Palla… - arXiv e …, 2024 - ui.adsabs.harvard.edu

In the current landscape of deep learning research, there is a predominant emphasis on
achieving high predictive accuracy in supervised tasks involving large image and language …

Cited by 19 · Related articles

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

T Papamarkou, M Skoularidou, K Palla… - … on Machine Learning, 2024 - openreview.net

In the current landscape of deep learning research, there is a predominant emphasis on
achieving high predictive accuracy in supervised tasks involving large image and language …

Cited by 17 · Related articles

Reinforcement Learning: An Overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Pearl: A Production-ready Reinforcement Learning Agent

Z Zhu, R de Salvo Braz, J Bhandari, D Jiang… - Journal of Machine …, 2024 - jmlr.org

Reinforcement learning (RL) is a versatile framework for optimizing long-term goals.
Although many real-world problems can be formalized with RL, learning and deploying a …

Cited by 3 · Related articles · All 3 versions

Satisficing exploration for deep reinforcement learning

D Arumugam, S Kumar, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org

A default assumption in the design of reinforcement-learning algorithms is that a decision-
making agent always explores to learn optimal behavior. In sufficiently complex …

Cited by 1 · Related articles · All 4 versions